[Submitted on 26 Oct 2025]
Sophia-Lambda: Layer-Adaptive Second-Order Optimization for Language Models
Abstract: We introduce Sophia-Lambda, a layer-adaptive second-order optimizer that combines efficient Hessian estimation with architecture-aware scaling for language model training. On a 134M-parameter Qwen architecture trained on the FineWeb dataset, Sophia-Lambda achieves a validation loss of 4.675, outperforming AdamW (4.927) and standard Sophia (5.091). Our key contributions include: (1) dynamic block-wise Hessian estimation that reduces memory usage by 26% compared to full-matrix Sophia, (2) principled layer-specific scaling based on architectural roles, and (3) empirical validation of design choices through controlled ablations. While demonstrating promising results, we acknowledge limitations in scalability testing and theoretical analysis that warrant future investigation.
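The abstract describes a Sophia-style clipped, Hessian-preconditioned update with an added per-layer scale. The sketch below illustrates one plausible form of such an update; it is based only on the abstract, and the function name, hyperparameter values, and the exact clipping and scaling scheme are assumptions rather than the paper's implementation.

```python
import torch

# Hypothetical sketch of a layer-adaptive, Sophia-style parameter update.
# All names (sophia_lambda_step, layer_scale) and default values are assumed
# for illustration, not taken from the paper.
def sophia_lambda_step(param, exp_avg, hessian_diag, lr, layer_scale,
                       beta1=0.965, rho=0.04, eps=1e-12):
    """One update: clipped Hessian-diagonal preconditioning with a per-layer
    scale factor (our reading of the 'layer-adaptive' component)."""
    grad = param.grad
    # Exponential moving average of gradients (first moment), as in Sophia/Adam.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    # Precondition by an estimate of the Hessian diagonal, then clip elementwise
    # so that poorly estimated curvature cannot produce oversized steps.
    update = exp_avg / torch.clamp(hessian_diag, min=eps)
    update = torch.clamp(update, min=-rho, max=rho)
    # Layer-specific scaling: modulate the step by the layer's architectural
    # role (e.g. embedding vs. attention vs. MLP); scale values are assumptions.
    param.data.add_(update, alpha=-lr * layer_scale)
```

In this reading, the per-layer factor would be chosen once per parameter group (for instance, a smaller scale for embeddings and a larger one for MLP blocks), while the block-wise Hessian estimation mentioned in contribution (1) would supply `hessian_diag` at a reduced memory cost.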
Submission history
[v1] Sun, 26 Oct 2025 04:35 UTC