[Submitted on 6 Nov 2025]
SpectralAdam: Analyzing Gradient Normalization Effects in Language Model Optimization
Abstract: This paper presents a systematic analysis of gradient spectral normalization in transformer-based language model optimization. We introduce SpectralAdam, an Adam variant that adaptively applies spectral normalization to gradients based on their norm changes. Through extensive experiments on the FineWeb benchmark, we demonstrate that while our method achieves modest improvements over AdamW (4.902 vs 4.927 validation loss), it provides valuable insights into gradient normalization dynamics. The paper includes detailed ablation studies, computational cost analysis, and comparisons with state-of-the-art optimizers like OrthoAdam (3.809 loss) and StableAdam (3.888 loss). Our findings suggest that while spectral normalization can stabilize training, orthogonal gradient processing techniques offer superior performance for language model optimization.
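To make the core idea concrete, the following is a minimal sketch of how an Adam-style update with gradient-norm-triggered spectral normalization might look. It is an illustration only: the trigger rule (a Frobenius-norm ratio against the previous step), the `ratio_thresh` parameter, and the function name are assumptions, not details taken from the paper.

```python
import numpy as np

def adam_step_with_spectral_norm(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
                                 eps=1e-8, ratio_thresh=2.0):
    """One Adam-style update with an optional spectral normalization of the
    raw gradient. The trigger heuristic and hyperparameter names here are
    assumptions for illustration, not the paper's exact procedure."""
    # Hypothetical trigger: normalize only when the gradient's Frobenius norm
    # jumps sharply relative to the previous step.
    cur_norm = np.linalg.norm(grad)
    prev_norm = state.get("prev_norm")
    if grad.ndim == 2 and prev_norm is not None and cur_norm > ratio_thresh * prev_norm:
        sigma_max = np.linalg.norm(grad, ord=2)  # spectral norm = largest singular value
        grad = grad / (sigma_max + eps)
    state["prev_norm"] = cur_norm

    # Standard Adam moment estimates with bias correction.
    state["t"] = state.get("t", 0) + 1
    m = betas[0] * state.get("m", np.zeros_like(param)) + (1 - betas[0]) * grad
    v = betas[1] * state.get("v", np.zeros_like(param)) + (1 - betas[1]) * grad**2
    state["m"], state["v"] = m, v
    m_hat = m / (1 - betas[0] ** state["t"])
    v_hat = v / (1 - betas[1] ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)
```

In this reading, the spectral normalization acts as a conditional preconditioning step on matrix-shaped gradients before the usual Adam moment updates; the actual SpectralAdam rule may differ in how the trigger and normalization are defined.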
Submission history
[v1] Thu, 6 Nov 2025 03:13 UTC