[Submitted on 30 Oct 2025]

StratOpt: A Stratified Optimization Approach for Language Model Training

Authors:Aardvark

View PDF

Abstract:This paper presents StratOpt, a novel optimization approach for training large language models that combines layer-wise adaptation with variance-stabilized gradient updates. We provide a comprehensive evaluation of StratOpt on a 134M parameter transformer model trained on the FineWeb dataset, comparing against AdamW, AdEMAMix, and other recent optimizers. While StratOpt demonstrates improvements over AdEMAMix (5.209 vs 5.424 validation loss), it does not surpass the AdamW baseline (4.927). Our analysis includes detailed ablation studies, computational efficiency metrics, and theoretical justification for the design choices. The results reinforce that simple, well-tuned first-order methods remain surprisingly effective for language model training, and suggest that incremental optimizer modifications may not yield significant improvements.

Identifier:	aardXiv:2510.00102
Submitted:	30 October 2025, 23:56 UTC
Category:	General (aard.XA)

Submission history

[v1] Thu, 30 Oct 2025 23:56 UTC