[Submitted on 31 Oct 2025]
Layer-Adaptive Sign Momentum: A Novel Optimizer for Transformer Language Models
Abstract: We present Layer-Adaptive Sign Momentum (LASM), a novel optimization method for training transformer-based language models. LASM combines the computational efficiency of sign-based updates with layer-wise adaptation mechanisms and variance-aware momentum scaling. Through extensive experiments on the FineWeb benchmark with a 134M-parameter Qwen architecture, we demonstrate that LASM achieves a validation loss of 4.703, improving upon the AdamW (4.927) and Lion (6.114) baselines. We provide comprehensive ablation studies, implementation details, and an analysis of computational overhead. The paper discusses both the strengths and the limitations of our approach, including its sensitivity to hyperparameters and its generalization across model sizes.
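The abstract does not spell out the update rule, but the ingredients it names (sign-based updates, layer-wise adaptation, variance-aware momentum) suggest a structure like the minimal sketch below. This is an illustrative assumption, not the authors' algorithm: it combines a Lion-style sign update, a LARS/LAMB-style per-layer trust ratio, and momentum damped by an EMA estimate of gradient variance. The class name `LASMSketch` and all hyperparameters are hypothetical.

```python
import torch
from torch.optim import Optimizer

class LASMSketch(Optimizer):
    """Hypothetical sketch of a layer-adaptive, variance-aware sign-momentum step."""

    def __init__(self, params, lr=1e-4, beta=0.9, var_beta=0.99, eps=1e-8):
        defaults = dict(lr=lr, beta=beta, var_beta=var_beta, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            lr, beta = group["lr"], group["beta"]
            var_beta, eps = group["var_beta"], group["eps"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                state = self.state[p]
                if not state:
                    state["momentum"] = torch.zeros_like(p)
                    state["var"] = torch.zeros_like(p)
                m, v = state["momentum"], state["var"]
                # EMA of squared gradients; used to shrink the momentum
                # contribution where gradients are noisy ("variance-aware").
                v.mul_(var_beta).addcmul_(g, g, value=1 - var_beta)
                damp = 1.0 / (1.0 + v.sqrt())
                m.mul_(beta).add_(g * damp, alpha=1 - beta)
                # Sign of momentum gives a scale-free direction (as in Lion).
                update = torch.sign(m)
                # Per-layer trust ratio (LARS/LAMB-style): step size scaled by
                # the ratio of parameter norm to update norm, capped at 1.
                trust = (p.norm() / (update.norm() + eps)).clamp(max=1.0)
                p.add_(update * trust, alpha=-lr)
        return loss
```

Usage would follow the standard PyTorch optimizer loop, e.g. `opt = LASMSketch(model.parameters(), lr=1e-4)` followed by `loss.backward(); opt.step(); opt.zero_grad()`. The paper's actual layer-adaptation and variance-scaling rules may differ.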
Submission history
[v1] Fri, 31 Oct 2025 02:18 UTC