[Submitted on 18 Oct 2025]
Scaled Variance-Reduced Momentum: A Stable Optimization Approach for Language Models
Abstract: We present Scaled Variance-Reduced Momentum (SVRM), a novel optimization approach for training large language models. While modern optimizers such as AdamW have become standard, they often exhibit unstable training dynamics in the early stages of optimization. SVRM addresses this with a variance-reduction mechanism combined with parameter-specific scaling, producing more stable updates while maintaining competitive performance. In experiments on a 134M-parameter language model, SVRM reaches a validation loss of 5.261, compared with 4.927 for AdamW. Although it does not surpass the baseline, SVRM exhibits promising training-stability properties and offers insights into variance-reduction techniques for language model optimization. Its simplicity and computational efficiency make it a practical alternative worth further investigation.
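The abstract does not give the SVRM update rule, so the following is only a minimal PyTorch sketch of one way a variance-reduced momentum update with parameter-specific scaling could be structured. The class name SVRMSketch, the hyperparameters (lr, beta, gamma, eps), and the specific damping and scaling formulas are illustrative assumptions, not the authors' algorithm.

```python
# Hypothetical sketch: momentum damped where the gradient variance is high,
# then scaled per parameter by an RMS-style statistic. Assumed formulas only.
import torch
from torch.optim import Optimizer


class SVRMSketch(Optimizer):
    def __init__(self, params, lr=3e-4, beta=0.9, gamma=0.99, eps=1e-8):
        defaults = dict(lr=lr, beta=beta, gamma=gamma, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            lr, beta, gamma, eps = group["lr"], group["beta"], group["gamma"], group["eps"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                state = self.state[p]
                if not state:
                    state["m"] = torch.zeros_like(p)  # running mean of gradients (momentum)
                    state["v"] = torch.zeros_like(p)  # running second moment of gradients
                m, v = state["m"], state["v"]

                # Update running first and second moments.
                m.mul_(beta).add_(g, alpha=1 - beta)
                v.mul_(gamma).addcmul_(g, g, value=1 - gamma)

                # Crude per-parameter variance estimate; larger variance shrinks
                # the step (the assumed "variance reduction" mechanism).
                var = (v - m * m).clamp_(min=0.0)
                damp = 1.0 / (1.0 + var.sqrt())

                # Parameter-specific scaling by the gradient RMS, AdamW-style.
                update = m * damp / (v.sqrt() + eps)
                p.add_(update, alpha=-lr)
        return loss
```

Under these assumptions the optimizer drops in where AdamW would be used, e.g. `opt = SVRMSketch(model.parameters(), lr=3e-4)` followed by the usual `loss.backward(); opt.step(); opt.zero_grad()` loop; the damping term is what would be expected to smooth the noisy early-training updates the abstract refers to.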
Submission history
[v1] Sat, 18 Oct 2025 06:57 UTC