Blackwell Breaks the 1,000 TPS/User Barrier With Meta’s Llama 4 Maverick

NVIDIA has achieved a world-record large language model (LLM) inference speed. A single NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs can achieve over 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model, the largest and most powerful model available in the Llama 4 collection. This speed was independently measured by the AI benchmarking service…
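For context on what the headline number means in practice, per-user TPS is the inverse of the average inter-token latency a single user experiences while a response streams. A minimal sketch of that conversion (the helper name is ours, not NVIDIA's):

```python
def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average milliseconds between streamed tokens for one user."""
    return 1000.0 / tokens_per_second

# At the reported 1,000 TPS/user, a new token arrives roughly every millisecond.
print(per_token_latency_ms(1000.0))  # 1.0
```

At that rate, tokens arrive far faster than a person can read, which is why the figure matters for latency-sensitive, interactive workloads rather than just batch throughput.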
