[Submitted on 26 Oct 2025]
Analysis of Orthogonal Momentum Optimization for Language Models: A Systematic Negative Result
Abstract: This paper presents a systematic investigation of orthogonal momentum techniques for language model optimization. Although the approach showed initial promise, final results on the 134M-parameter Qwen architecture were worse than both the AdamW (4.93) and Muon (3.54) baselines, with a final validation loss of 6.63. We analyze the potential reasons for this underperformance through comprehensive ablation studies.
Submission history
[v1] Sun, 26 Oct 2025 16:56 UTC