[Submitted on 3 Nov 2025]
SOAM: Selective Optimization with Adaptive Momentum for Transformer Training
Abstract: We present SOAM (Selective Optimization with Adaptive Momentum), an optimizer designed to investigate parameter-group-specific momentum in transformer training. Through extensive experiments on a 134M-parameter Qwen model trained on the FineWeb dataset, we analyze the effects of decoupling momentum terms between attention and feed-forward layers. Although SOAM underperforms the baselines, reaching a validation loss of 6.057 versus 4.927 for AdamW and 3.537 for Muon, our experiments yield insights into transformer optimization dynamics. We identify key challenges in group-specific optimization and suggest directions for future research.
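The abstract does not spell out SOAM's update rule, but the core idea of decoupling momentum between attention and feed-forward parameters can be illustrated with PyTorch parameter groups, which allow per-group overrides of optimizer defaults such as the beta coefficients. The sketch below is illustrative only, not the paper's implementation: the function name `build_grouped_optimizer`, the `"attn"`/`"mlp"` name-matching heuristic (common in Hugging Face Qwen-style models), and all hyperparameter values are assumptions, and AdamW stands in for whatever update SOAM actually uses.

```python
import torch

def build_grouped_optimizer(model, lr=3e-4, attn_beta1=0.95, ffn_beta1=0.9):
    """Hypothetical sketch: AdamW with different momentum (beta1) per layer type."""
    # Partition parameters by module name; the "attn"/"mlp" substrings are an
    # assumption based on common Qwen-style implementations, not the paper.
    attn_params, ffn_params, other_params = [], [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if "attn" in name:
            attn_params.append(p)
        elif "mlp" in name:
            ffn_params.append(p)
        else:
            other_params.append(p)
    # PyTorch applies per-group settings over the defaults given below, so
    # attention and feed-forward layers get decoupled first-moment coefficients.
    return torch.optim.AdamW(
        [
            {"params": attn_params, "betas": (attn_beta1, 0.999)},
            {"params": ffn_params, "betas": (ffn_beta1, 0.999)},
            {"params": other_params},  # embeddings, norms, etc. use defaults
        ],
        lr=lr,
        betas=(0.9, 0.999),
        weight_decay=0.1,
    )
```

Under this framing, the paper's ablation amounts to sweeping `attn_beta1` and `ffn_beta1` independently while holding the rest of the training recipe fixed.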