[Submitted on 31 Oct 2025]
SophiaG: A Geometrically-Informed Second-Order Optimizer for Language Models
Abstract: We present SophiaG, a second-order optimization method for language models that incorporates geometric information through a novel Hessian weighting scheme. Through extensive experiments on a 134M-parameter Qwen model trained on the FineWeb dataset, we demonstrate that SophiaG achieves a 2.9% improvement over standard Sophia but falls short of the AdamW baseline by 2.9%. We analyze the reasons for this performance gap through ablation studies and computational analysis, concluding that while geometric adaptations can improve second-order methods, significant challenges remain in making them competitive with first-order approaches for language model training.
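For orientation, the sketch below illustrates the general shape of a Sophia-style update that such a method builds on: an EMA of gradients, a periodically refreshed diagonal Hessian estimate (here via Hutchinson's estimator), and an element-wise clipped Newton-like step. This is not the paper's SophiaG algorithm or its geometric weighting scheme; all function names and hyperparameter values are illustrative assumptions.

```python
# Illustrative Sophia-style update (assumed recipe, not the paper's SophiaG).
import torch


def hutchinson_diag_hessian(loss_fn, params, n_samples=1):
    """Estimate diag(H) with Hutchinson's estimator: E[z * (H z)], z Rademacher."""
    diag = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        loss = loss_fn()
        grads = torch.autograd.grad(loss, params, create_graph=True)
        zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]  # +/-1 probes
        hvps = torch.autograd.grad(grads, params, grad_outputs=zs)
        for d, z, hvp in zip(diag, zs, hvps):
            d += z * hvp / n_samples
    return diag


@torch.no_grad()
def sophia_like_step(params, grads, exp_avg, hess_diag,
                     lr=1e-4, beta1=0.96, rho=0.04, eps=1e-12, weight_decay=0.1):
    """One update: theta <- theta - lr * clip(m / (rho * h + eps), [-1, 1])."""
    for p, g, m, h in zip(params, grads, exp_avg, hess_diag):
        m.mul_(beta1).add_(g, alpha=1 - beta1)            # gradient EMA
        p.mul_(1 - lr * weight_decay)                      # decoupled weight decay
        ratio = (m / (rho * h.clamp(min=0) + eps)).clamp_(-1.0, 1.0)
        p.add_(ratio, alpha=-lr)                           # clipped Newton-like step
```

In practice the diagonal Hessian estimate is only refreshed every few steps to keep per-step cost close to a first-order optimizer, which is the efficiency trade-off the abstract's computational analysis refers to.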
Submission history
[v1] Fri, 31 Oct 2025 01:24 UTC