[Submitted on 5 Nov 2025]
OrthoSelect: Selective Orthogonal Updates for Transformer Optimization
Abstract: We present OrthoSelect, a novel optimizer for transformer language models that combines adaptive moment estimation with selective orthogonal updates for attention layers. Through extensive experiments on a 134M-parameter transformer, we demonstrate a 12% lower validation loss than AdamW (4.346 vs. 4.927) while maintaining training stability. Our analysis reveals that selective orthogonalization captures most of the benefit of full orthogonal methods at significantly reduced computational overhead. We provide theoretical justification for focusing orthogonal updates on attention layers and analyze the tradeoff between orthogonalization strength and training efficiency.
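The abstract does not spell out the update rule, but the idea it describes, Adam-style moment estimation everywhere plus an orthogonalized update restricted to attention weight matrices, can be sketched as follows. This is a minimal illustration under assumptions: the function names (`newton_schulz_orthogonalize`, `selective_update`) and the use of a Newton-Schulz iteration for approximate orthogonalization are ours, not necessarily the paper's exact procedure.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=10, eps=1e-7):
    """Approximately orthogonalize a 2-D update matrix G.

    Normalizing by the Frobenius norm puts all singular values in
    (0, 1]; the cubic Newton-Schulz iteration X <- 1.5*X - 0.5*(X X^T)X
    then pushes them toward 1, discarding the singular-value scale while
    keeping the singular vectors. (One common orthogonalization trick;
    an assumption here, not confirmed as the paper's method.)
    """
    X = G / (np.linalg.norm(G) + eps)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X

def selective_update(param, grad, m, v, t, is_attention,
                     lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One optimizer step: Adam moments for every parameter,
    orthogonalization only for 2-D attention-layer matrices."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias correction
    v_hat = v / (1 - beta2 ** t)
    update = m_hat / (np.sqrt(v_hat) + eps)
    # Selective orthogonalization: attention matrices only, which is
    # where the abstract says the orthogonal updates are applied.
    if is_attention and update.ndim == 2:
        update = newton_schulz_orthogonalize(update)
    return param - lr * update, m, v
```

Restricting the (cubic, per-matrix) orthogonalization to attention layers is what keeps the overhead low relative to orthogonalizing every weight matrix, which matches the efficiency tradeoff the abstract highlights.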