Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

This post introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core for LLM post-training and DiT pre-training. It dynamically selects the CP size per microbatch to handle variable-length sequences efficiently, achieving up to a 1.48x speedup on real-world datasets. In large-scale model training, an often-overlooked bottleneck arises from the…
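To make the per-microbatch scheduling idea concrete, here is a minimal sketch, assuming a hypothetical helper `choose_cp_size` and an illustrative per-GPU token budget; this is not the Megatron Core API. It picks the smallest power-of-two CP size that fits a microbatch's longest sequence within the budget, so microbatches of short sequences run with CP=1 while those containing very long sequences are sharded across more GPUs.

```python
# Hypothetical sketch of per-microbatch CP-size selection.
# `choose_cp_size`, `tokens_per_gpu`, and `max_cp` are illustrative
# assumptions, not Megatron Core names or defaults.

def choose_cp_size(seq_lens, tokens_per_gpu=8192, max_cp=8):
    """Return the smallest power-of-two CP size such that the longest
    sequence in the microbatch fits the per-GPU token budget."""
    longest = max(seq_lens)
    cp = 1
    while cp < max_cp and longest // cp > tokens_per_gpu:
        cp *= 2
    return cp

# A microbatch of short sequences needs no context parallelism,
# while one containing a 60k-token sequence is split over 8 GPUs.
print(choose_cp_size([1024, 2048, 4096]))   # -> 1
print(choose_cp_size([60_000, 8_000]))      # -> 8
```

The point of selecting CP size dynamically rather than fixing it globally is that a global CP size large enough for the longest sequence in the dataset wastes communication and compute on the (typically much more common) short-sequence microbatches.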
