Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron

Higher-order optimization algorithms such as Shampoo have been effectively applied in neural network training for at least a decade. These methods have achieved…

Higher-order optimization algorithms such as Shampoo have been effectively applied in neural network training for at least a decade. These methods have achieved significant success more recently when applied to leading LLMs. In particular, Muon (MomentUm Orthogonalized by Newton-Schulz) was used to train some of today’s best open source models, including Kimi K2 and GLM-5.

Source

Leave a Reply

Your email address will not be published.

Previous post Scaling the AI-Ready Data Center with NVIDIA RTX PRO 4500 Blackwell Server Edition and NVIDIA vGPU 20
Next post Older Call of Duty games are coming to Game Pass in 2026