[Submitted on 26 Oct 2025]
Comprehensive Analysis of ALMVR: Understanding Limitations in Layer-wise Adaptive Optimization
Abstract: This paper presents a thorough empirical evaluation of ALMVR (Adaptive Layer-wise Momentum Variance Rectification), a novel optimizer for language model training. While ALMVR combines layer-wise momentum adaptation with variance stabilization, our experiments on the FineWeb dataset using a 134M parameter Qwen model show that it underperforms the AdamW baseline by 9.9% in terms of validation loss. Through comprehensive ablation studies and analysis of training dynamics, we identify key limitations in layer-wise adaptation approaches and provide insights into optimizer design challenges. Our negative results contribute to the growing understanding of optimization in large language models and highlight the need for more sophisticated adaptation mechanisms.
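The abstract does not specify ALMVR's update rule, but the combination it describes (layer-wise momentum adaptation plus variance stabilization) can be illustrated with a minimal sketch: Adam-style per-element first and second moments, rescaled per layer by a LARS/LAMB-style trust ratio. The function name `almvr_step`, the trust-ratio choice, and all hyperparameter defaults below are assumptions for illustration, not the paper's actual algorithm.

```python
import math

def almvr_step(params, grads, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of a hypothetical ALMVR-like optimizer (illustrative only).

    `params` and `grads` are lists of layers; each layer is a list of floats.
    Per element: bias-corrected first/second moments as in Adam.
    Per layer: a trust ratio ||w|| / ||step|| (as in LARS/LAMB) that adapts
    the effective learning rate layer by layer.
    """
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    out = []
    for li, (w, g) in enumerate(zip(params, grads)):
        m = state.setdefault(("m", li), [0.0] * len(w))
        v = state.setdefault(("v", li), [0.0] * len(w))
        step = []
        for j, gj in enumerate(g):
            # Exponential moving averages of the gradient and its square.
            m[j] = beta1 * m[j] + (1 - beta1) * gj
            v[j] = beta2 * v[j] + (1 - beta2) * gj * gj
            # Bias correction, then a variance-normalized step.
            m_hat = m[j] / (1 - beta1 ** t)
            v_hat = v[j] / (1 - beta2 ** t)
            step.append(m_hat / (math.sqrt(v_hat) + eps))
        # Layer-wise adaptation: scale the step by a per-layer trust ratio.
        w_norm = math.sqrt(sum(x * x for x in w))
        s_norm = math.sqrt(sum(s * s for s in step))
        trust = w_norm / s_norm if w_norm > 0 and s_norm > 0 else 1.0
        out.append([wj - lr * trust * sj for wj, sj in zip(w, step)])
    return out
```

On a simple quadratic loss (gradient `2w`), one such step shrinks every parameter toward zero, which is the sanity check one would run before comparing any such optimizer against AdamW.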
Submission history
[v1] Sun, 26 Oct 2025 11:53 UTC