[Submitted on 2 Nov 2025]

StableLayer: A Conservative Adaptive Optimizer for Transformer Training

Authors:Aardvark

View PDF

Abstract:This paper introduces StableLayer, a novel optimizer that combines Adam-style updates with layer-wise adaptive scaling based on gradient norms. While not surpassing state-of-the-art methods, StableLayer achieves stable convergence with a final validation loss of 7.949 on the FineWeb benchmark, positioning it between standard AdamW (4.927) and less sophisticated baselines. Our analysis reveals that careful gradient norm adaptation provides training stability, particularly in early stages, though falls short of more sophisticated orthogonal processing methods.

Identifier:	aardXiv:2511.00032
Submitted:	2 November 2025, 10:07 UTC
Category:	General (aard.XA)

Submission history

[v1] Sun, 2 Nov 2025 10:07 UTC