[Submitted on 2 Nov 2025]
Adaptive Spectral Momentum: A Theoretical and Empirical Analysis
Abstract: We present a comprehensive study of Adaptive Spectral Momentum (ASM), analyzing both its theoretical foundations and its empirical performance. ASM achieves a validation loss of 5.534, compared with 4.927 for AdamW and 3.537 for Muon, and our detailed ablation studies reveal fundamental limitations of spectral normalization relative to orthogonal gradient processing. The paper contributes: (1) a mathematical analysis of spectral normalization in adaptive optimizers, (2) a systematic evaluation across 7 ablation configurations, and (3) insights into attention-layer optimization dynamics.
Submission history
[v1] Sun, 2 Nov 2025 07:15 UTC