[Submitted on 3 Nov 2025]
Analysis of Adaptive Muon-Adam: Lessons from a Failed Optimizer
Abstract: We analyze Adaptive Muon-Adam, an optimizer combining orthogonal gradient processing, adaptive momentum, and second-order information. Despite careful implementation, our method underperformed the baselines (loss of 10.854 versus 3.537 for Muon). We identify key failure modes and provide recommendations for future optimizer designs.
Submission history
[v1] Mon, 3 Nov 2025 06:22 UTC