[Submitted on 1 Nov 2025]
Selective Orthogonal Adam: Analysis of a Hybrid Optimizer
Abstract: This paper evaluates Selective Orthogonal Adam (SO-Adam), an optimizer combining adaptive momentum with selective orthogonalization for transformer attention layers. Empirical results show SO-Adam achieves a validation loss of 5.036 on a 134M-parameter Qwen architecture, underperforming both the AdamW (4.927) and Muon (3.537) baselines. We analyze the challenges of integrating orthogonal updates with adaptive optimization.
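The abstract does not specify how SO-Adam's orthogonalization is performed. The following is a minimal PyTorch sketch of the general idea, assuming a Muon-style Newton-Schulz iteration applied to the Adam update of 2D attention weight matrices; the function names (so_adam_step, newton_schulz_orthogonalize), the selection rule, and all hyperparameters are hypothetical illustrations, not the paper's method.

import torch

def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    # Approximate the orthogonal (polar) factor of g with a cubic
    # Newton-Schulz iteration, the family of iterations Muon popularized.
    x = g / (g.norm() + eps)  # scale so singular values are <= 1
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.transpose(-2, -1)) @ x
    return x

@torch.no_grad()
def so_adam_step(param, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0.01, orthogonalize=False):
    # One update for a single parameter tensor. Pass orthogonalize=True for
    # 2D attention projection weights; everything else gets a plain AdamW step.
    g = param.grad
    state["t"] = state.get("t", 0) + 1
    m, v, t = state["m"], state["v"], state["t"]
    m.mul_(betas[0]).add_(g, alpha=1 - betas[0])         # first moment
    v.mul_(betas[1]).addcmul_(g, g, value=1 - betas[1])  # second moment
    m_hat = m / (1 - betas[0] ** t)                      # bias correction
    v_hat = v / (1 - betas[1] ** t)
    update = m_hat / (v_hat.sqrt() + eps)
    if orthogonalize and update.ndim == 2:
        # Replace the adaptive update with its nearest orthogonal matrix.
        update = newton_schulz_orthogonalize(update)
    param.mul_(1 - lr * weight_decay)                    # decoupled weight decay
    param.add_(update, alpha=-lr)

To use the sketch, initialize state = {"m": torch.zeros_like(p), "v": torch.zeros_like(p)} per parameter and set orthogonalize=True only for the attention projection matrices; which layers to gate, and how the orthogonal step interacts with Adam's second-moment scaling, is precisely the integration challenge the abstract highlights.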
Submission history
[v1] Sat, 1 Nov 2025 10:35 UTC