[Submitted on 1 Nov 2025]
Stable Orthogonal Adam: A Systematic Study of Orthogonal Momentum Adaptation in Language Model Optimization
Abstract: This paper presents a comprehensive investigation of orthogonal momentum adaptation in Adam-style optimization for language models. We propose StableOrthoAdam, which combines periodic QR-based orthogonalization of momentum with standard AdamW updates. Although theoretically motivated by improving the orthogonality of the optimization trajectory, our method achieves a final validation loss of 7.316 on the FineWeb benchmark using a 134M-parameter Qwen architecture, underperforming both the AdamW (4.927) and Muon (3.537) baselines. Through detailed ablation studies and comparisons with recent orthogonal optimization approaches, we identify key challenges in scaling orthogonal adaptation to full language model training.
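For intuition, here is a minimal PyTorch-style sketch of the update rule the abstract describes: standard AdamW steps whose first-moment (momentum) buffer is periodically replaced by its QR-based orthogonal factor. The class name StableOrthoAdamSketch, the ortho_every interval, the reshape-to-matrix convention, and the norm-preserving rescaling are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming PyTorch; not the authors' implementation.
import torch


class StableOrthoAdamSketch(torch.optim.AdamW):
    """AdamW whose first-moment buffer is periodically re-orthogonalized via QR."""

    def __init__(self, params, ortho_every=100, **adamw_kwargs):
        super().__init__(params, **adamw_kwargs)
        self.ortho_every = ortho_every  # assumed orthogonalization period
        self._steps = 0

    def step(self, closure=None):
        loss = super().step(closure)  # ordinary AdamW update
        self._steps += 1
        if self._steps % self.ortho_every == 0:
            with torch.no_grad():
                for group in self.param_groups:
                    for p in group["params"]:
                        m = self.state.get(p, {}).get("exp_avg")
                        if m is None or m.ndim < 2:
                            continue  # skip 1-D tensors (biases, norm scales)
                        mat = m.reshape(m.shape[0], -1)
                        tall = mat.shape[0] >= mat.shape[1]
                        a = mat if tall else mat.T
                        # Reduced QR of the taller orientation: Q keeps A's
                        # shape and has orthonormal columns.
                        q, _ = torch.linalg.qr(a.float(), mode="reduced")
                        if not tall:
                            q = q.T
                        # Rescale so the orthogonalized momentum keeps its
                        # original Frobenius norm (an assumption of this sketch).
                        q = q * (mat.norm().float() / q.norm().clamp_min(1e-12))
                        m.copy_(q.reshape_as(m).to(m.dtype))
        return loss
```

Usage would be a drop-in replacement, e.g. `opt = StableOrthoAdamSketch(model.parameters(), lr=3e-4, weight_decay=0.1, ortho_every=100)`. Note that, unlike Muon, which orthogonalizes each step's update directly via Newton-Schulz iterations, this sketch only intermittently projects the momentum buffer onto an orthonormal factor.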
Submission history
[v1] Sat, 1 Nov 2025 22:43 UTC