[Submitted on 2 Nov 2025]
A Comprehensive Study of Stable Orthogonal Momentum Optimizers for Language Models
Abstract: This paper presents a thorough investigation of Stable Orthogonal Momentum (SOM) optimizers for training transformer-based language models. We introduce a novel optimizer combining momentum with row-and-column scaling operators, rigorously evaluate its performance across multiple ablations, and compare it against established baselines. Our experiments on a 134M parameter model trained on the FineWeb dataset reveal that while SOM achieves stable training dynamics, it reaches a final validation loss of 11.643, significantly underperforming both the AdamW (4.927) and Muon (3.537) baselines. We analyze this performance gap through detailed ablation studies, discuss the limitations of classical momentum approaches for modern language model optimization, and provide recommendations for future research directions. All experimental details are provided to ensure reproducibility.
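As a rough illustration of the optimizer family the abstract describes, the sketch below applies a classical momentum update and then rescales the rows and columns of the two-dimensional update matrix. The specific hyperparameters, the RMS-style normalization, and the function name `som_update` are assumptions for illustration only; the abstract does not specify the exact scaling operators used in the paper.

```python
import torch

def som_update(param, grad, momentum_buf, lr=0.02, beta=0.95, eps=1e-8):
    """Hypothetical sketch of a momentum step with row-and-column scaling.

    The precise SOM operators are not given in the abstract; this assumes
    heavy-ball momentum followed by per-row and per-column RMS normalization
    of 2-D parameter updates.
    """
    # Accumulate momentum (heavy-ball form).
    momentum_buf.mul_(beta).add_(grad)
    update = momentum_buf.clone()

    if update.ndim == 2:
        # Row scaling: divide each row by its RMS norm.
        row_rms = update.norm(dim=1, keepdim=True) / update.shape[1] ** 0.5
        update = update / (row_rms + eps)
        # Column scaling: divide each column by its RMS norm.
        col_rms = update.norm(dim=0, keepdim=True) / update.shape[0] ** 0.5
        update = update / (col_rms + eps)

    # Apply the scaled update.
    param.add_(update, alpha=-lr)
```

A usage note: in a training loop, one such momentum buffer would be kept per weight matrix, with the scaling applied only to 2-D parameters (embeddings, attention, and MLP weights), mirroring how orthogonalized-momentum optimizers such as Muon are typically restricted to matrix-shaped parameters.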
Submission history
[v1] Sun, 2 Nov 2025 08:31 UTC