[Submitted on 31 Oct 2025]
HyMo: A Study of Hybrid Momentum Optimization for Transformer Language Models
Abstract: This paper presents a systematic investigation of hybrid momentum optimization techniques for transformer language models. We examine the feasibility of combining standard momentum updates with selective orthogonalization for large parameter matrices, focusing on training stability and performance tradeoffs. Our experiments on a 134M parameter transformer model demonstrate that while our HyMo optimizer achieves performance comparable to AdamW (validation loss of 4.983 vs 4.927), it does not outperform existing approaches. The study provides insights into the practical challenges of incorporating orthogonal updates into modern language model training pipelines and establishes baseline expectations for similar hybrid approaches.
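To make the abstract's core idea concrete, the sketch below illustrates one plausible form of a hybrid momentum update: plain momentum for small or one-dimensional parameters, and an orthogonalized momentum step (via a Muon-style Newton-Schulz iteration) for large 2-D weight matrices. This is not the paper's HyMo implementation, which is not given here; the optimizer name, hyperparameters, and the `min_dim` selection rule are illustrative assumptions.

```python
# Minimal sketch (NOT the paper's HyMo code): hybrid momentum that applies
# standard SGD-momentum to small/1-D parameters and an orthogonalized
# momentum update (Newton-Schulz iteration, as used in Muon-style optimizers)
# to large 2-D weight matrices. All names and hyperparameters are assumptions.
import torch


def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately map G to a nearby semi-orthogonal matrix."""
    a, b, c = 3.4445, -4.7750, 2.0315        # quintic iteration coefficients (Muon)
    X = G / (G.norm() + 1e-7)                # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X


class HybridMomentum(torch.optim.Optimizer):
    """Hypothetical hybrid optimizer: plain momentum everywhere, with
    orthogonalized updates for large 2-D matrices (attention/MLP weights)."""

    def __init__(self, params, lr=0.02, momentum=0.95, min_dim=256):
        defaults = dict(lr=lr, momentum=momentum, min_dim=min_dim)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                buf = state.setdefault("momentum_buffer", torch.zeros_like(p))
                buf.mul_(group["momentum"]).add_(p.grad)
                update = buf
                # Selective orthogonalization: only large 2-D parameter matrices.
                if p.ndim == 2 and min(p.shape) >= group["min_dim"]:
                    update = newton_schulz_orthogonalize(buf)
                p.add_(update, alpha=-group["lr"])
```

In practice, an optimizer like this would typically be paired with AdamW (rather than raw momentum) for embeddings, norms, and output heads; the abstract's reported comparison against AdamW suggests such a split, but the exact partition is not specified here.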
Submission history
[v1] Fri, 31 Oct 2025 13:33 UTC