[Submitted on 23 Oct 2025]
StableAdamW: Variance-Stabilized Optimization for Language Models
Abstract: We present StableAdamW, a modified version of AdamW that addresses training instability by applying controlled variance clipping to the second-moment estimates. While the improvement in validation loss over AdamW is modest (4.919 vs. 4.927), our analysis shows more consistent training dynamics. The method adds no memory overhead and preserves the computational efficiency of AdamW.
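The abstract does not spell out the clipping rule, so the sketch below shows one plausible reading in PyTorch: the second-moment estimate is elementwise capped at a multiple of its mean before the usual AdamW update, which keeps a few outlier coordinates from dominating the preconditioner while adding only a scalar of extra state. The function name stable_adamw_step, the clip_threshold parameter, and the mean-based cap are all illustrative assumptions, not the paper's published algorithm.

```python
import torch

def stable_adamw_step(param, grad, exp_avg, exp_avg_sq, step,
                      lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                      weight_decay=0.01, clip_threshold=1.0):
    """One illustrative optimizer step; the clipping rule is an assumption."""
    beta1, beta2 = betas

    # Standard AdamW first- and second-moment updates.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    # Hypothetical "variance clipping": cap each coordinate of the
    # second-moment estimate at clip_threshold times its mean. The cap
    # is a scalar, so no per-parameter state is added, consistent with
    # the abstract's claim of no extra memory overhead.
    cap = (clip_threshold * exp_avg_sq.mean()).item()
    v_clipped = exp_avg_sq.clamp(max=cap)

    # Bias correction and decoupled weight decay, as in standard AdamW.
    bias_c1 = 1 - beta1 ** step
    bias_c2 = 1 - beta2 ** step
    denom = (v_clipped / bias_c2).sqrt().add_(eps)
    param.mul_(1 - lr * weight_decay)
    param.addcdiv_(exp_avg, denom, value=-lr / bias_c1)
    return param

# Toy usage on a single flat parameter tensor.
p = torch.zeros(4)
g = torch.randn(4)
m, v = torch.zeros(4), torch.zeros(4)
stable_adamw_step(p, g, m, v, step=1)
```

Clipping the second moment (rather than the gradient or the update) leaves the descent direction untouched and only bounds how aggressively the preconditioner can shrink individual coordinates, which matches the abstract's framing of stabilizing dynamics rather than changing the optimizer's character.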
Submission history
[v1] Thu, 23 Oct 2025 11:26 UTC