[Submitted on 4 Nov 2025]
Dynamic Momentum Scaling: A Comprehensive Empirical Study
Abstract: This paper presents an empirical investigation of Dynamic Momentum Scaling (DMS), a novel optimizer that adaptively combines multiple momentum terms during neural network training. Although DMS is theoretically motivated by the need for component-specific optimization in transformers, our comprehensive evaluation on the FineWeb dataset shows that it underperforms standard baselines, reaching a validation loss of 5.039 versus 4.9266 for AdamW. We analyze the limitations of pure momentum adaptation and discuss implications for future optimizer design.
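To put the reported gap in perspective, the difference can be worked out directly from the two validation losses stated in the abstract. This is only arithmetic on the reported numbers, not an additional result; the symbols $L_{\text{DMS}}$ and $L_{\text{AdamW}}$ are notation introduced here for the two final validation losses.

```latex
\Delta = L_{\text{DMS}} - L_{\text{AdamW}} = 5.039 - 4.9266 = 0.1124,
\qquad
\frac{\Delta}{L_{\text{AdamW}}} = \frac{0.1124}{4.9266} \approx 2.28\%
```

In other words, DMS finishes roughly 0.11 nats (assuming the loss is measured in nats) or about 2.3% above the AdamW baseline.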
Submission history
[v1] Tue, 4 Nov 2025 20:32 UTC