[Submitted on 16 Oct 2025]
Subspace-Adaptive Momentum: Analyzing Memory-Performance Trade-offs in Language Model Optimization
Abstract: We present Subspace-Adaptive Momentum (SAM), a memory-efficient optimizer that reduces the memory overhead of adaptive optimization while maintaining reasonable convergence properties. SAM projects gradients into low-dimensional subspaces via truncated SVD and tracks momentum and variance estimates in this compressed space. Our implementation achieves a 120x reduction in memory usage relative to AdamW (32.7 MB vs. 3957.0 MB), while attaining a validation loss of 6.358 on a 134M-parameter language model, compared to 4.9266 for AdamW and 3.5369 for MuP. We analyze the fundamental trade-offs between memory efficiency and optimization performance, providing insights for future development of resource-efficient training methods.
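The abstract does not include the algorithm itself, but its description (gradients projected into a low-rank subspace via truncated SVD, with Adam-style momentum and variance estimates tracked in the compressed space) admits a short sketch. The class name, the basis-refresh schedule, and all hyperparameters below are illustrative assumptions, not the paper's implementation:

```python
import torch


class SubspaceAdaptiveMomentum:
    """Minimal sketch of subspace-adaptive momentum for one 2-D weight matrix.

    Adam-style first/second moments are kept in a rank-r subspace, so state
    memory is O(r * n) instead of O(m * n). The refresh schedule and all
    hyperparameter defaults are assumptions, not the paper's exact method.
    """

    def __init__(self, param, rank=8, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-8, refresh_every=200):
        self.param = param                       # m x n weight matrix
        self.rank, self.lr = rank, lr
        self.b1, self.b2, self.eps = betas[0], betas[1], eps
        self.refresh_every = refresh_every       # assumed: periodic basis refresh
        self.step_count = 0
        self.P = None                            # m x r projection basis
        # Moments live entirely in the compressed (r x n) space.
        self.m = torch.zeros(rank, param.shape[1])
        self.v = torch.zeros(rank, param.shape[1])

    def step(self, grad):
        self.step_count += 1
        # Refresh the basis from a truncated SVD of the current gradient.
        if self.P is None or self.step_count % self.refresh_every == 1:
            U, _, _ = torch.linalg.svd(grad, full_matrices=False)
            self.P = U[:, :self.rank]            # top-r left singular vectors
        g_low = self.P.T @ grad                  # project gradient: r x n
        # Standard Adam moment updates, but on the compressed gradient.
        self.m = self.b1 * self.m + (1 - self.b1) * g_low
        self.v = self.b2 * self.v + (1 - self.b2) * g_low ** 2
        m_hat = self.m / (1 - self.b1 ** self.step_count)
        v_hat = self.v / (1 - self.b2 ** self.step_count)
        # Map the adaptive update back to the full parameter space.
        update = self.P @ (m_hat / (v_hat.sqrt() + self.eps))
        self.param -= self.lr * update


# Usage: one matrix, one synthetic gradient step.
W = torch.randn(512, 256)
opt = SubspaceAdaptiveMomentum(W, rank=8)
opt.step(torch.randn_like(W))
```

Note the memory arithmetic this sketch implies: for an m x n matrix, AdamW stores two m x n moment buffers, while the compressed state here is two r x n buffers plus the m x r basis, which is where a large reduction at small r would come from.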
Submission history
[v1] Thu, 16 Oct 2025 20:55 UTC