Rome wasn’t built in a day, they say. Okay, but it still took Elon Musk just 122 of ’em to tool up what is claimed to be the most powerful AI training system on the planet. Everyone’s favourite billionaire misanthrope doesn’t hang about, then.
Musk’s new toy, dubbed Colossus and built by his new AI startup, xAI, has been created to train the latest version of the GROK language model, known as GROK-3. It’s powered by no fewer than 100,000 Nvidia H100 GPUs.
If that’s not enough for you, in an X post Musk says Colossus will double in power “in a few months” thanks to the addition of another 50,000 H200 Nvidia chips, which each pack roughly twice the AI acceleration performance of an H100 GPU.
It’s not clear how much this is all costing Musk and xAI. Estimates of pricing for Nvidia’s H100 GPUs vary from $20,000 to as much as $90,000 a pop. Presumably, Musk managed to get a comparatively decent deal buying 100,000 of the things in one go.
But even at the lower estimate, you’re looking at $2 billion for the Nvidia chips for phase one alone, to say nothing of building the datacenter, all the relevant infrastructure, staffing up, and doing all the work involved in setting up training for an advanced LLM. Oh, and whatever those other 50,000 H200s are costing on top as a little light frosting.
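If you want to sanity-check that figure, the back-of-the-envelope sums are simple enough. This little sketch just multiplies out the article's reported per-unit price range for phase one's GPU bill; the prices are the estimates quoted above, not confirmed figures.

```python
# Back-of-the-envelope GPU bill for Colossus phase one, using the
# reported (unconfirmed) per-unit price range for Nvidia's H100.
h100_count = 100_000
price_low, price_high = 20_000, 90_000  # USD per H100, estimated range

cost_low = h100_count * price_low    # lower bound: $2 billion
cost_high = h100_count * price_high  # upper bound: $9 billion

print(f"GPU bill alone: ${cost_low / 1e9:.0f}B to ${cost_high / 1e9:.0f}B")
```

Even the friendliest bulk discount leaves the chips alone costing billions before a single rack is powered on.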
"This weekend, the @xAI team brought our Colossus 100k H100 training cluster online. From start to finish, it was done in 122 days. Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200k (50k H200s) in a few months. Excellent…"
— Elon Musk on X, September 2, 2024
Indeed, it was only a few weeks ago that xAI launched GROK-2 as an exclusive-access thing for X subscribers. GROK-2 apparently made do with a piffling 15,000 H100 chips for training, the poor deluded little AI dear. And yet by some measures, GROK-2 ranks second in the LLM league tables, behind only OpenAI's GPT-4o.
So, even the first phase of Colossus will be six to seven times more powerful than the cluster that trained GROK-2, only to supposedly double in power a few months later. Clearly, Musk has his sights set on building the most powerful LLM out there.
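The "six to seven times" and "double in size to 200k" claims both fall out of the numbers already quoted, if you count each H200 as roughly two H100s' worth of AI grunt (the approximate ratio mentioned earlier; the real uplift depends heavily on workload). A quick sketch:

```python
# Rough compute comparison in H100-equivalents, per the article's figures.
grok2_h100s = 15_000    # GPUs reportedly used to train GROK-2
phase1_h100s = 100_000  # Colossus phase one
extra_h200s = 50_000    # planned phase-two addition
h200_vs_h100 = 2        # "roughly twice the AI acceleration performance"

phase1_ratio = phase1_h100s / grok2_h100s            # vs GROK-2's setup
phase2_equiv = phase1_h100s + extra_h200s * h200_vs_h100

print(f"Phase one vs GROK-2 training: {phase1_ratio:.1f}x")
print(f"Phase two, in H100-equivalents: {phase2_equiv:,}")
```

That also squares Musk's "200k" figure with the actual GPU count of 150,000: he's counting the 50,000 H200s as 100,000 H100-equivalents.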
As for when GROK-3 might be unleashed, Musk told conservative polemicist and latterly podcaster Jordan Peterson just last month that he hoped GROK-3 would go live by December.
Incidentally, such a machine doesn’t come without collateral consequences. The new cluster, located in Memphis, Tennessee, will chew through 150 megawatts of power and has been allocated up to one million gallons of water a day for cooling.
So, add environmental impact to the roster of reasons to be unnerved by Colossus, alongside wider concerns about the direct impact of AI and Musk’s ever-increasing volatility. That’s plenty to be getting on with.