xAI's Colossus: Most Powerful AI Cluster Online in 122 Days (twitter.com/elonmusk)
27 points by gfortaine 15 days ago | 21 comments



Is there a reputable source for these claims? The given source has repeatedly been shown to exaggerate scale, delivery timelines, and functionality, especially on topics that might boost share prices. https://elonmusk.today/


“The utility company said that by August, xAI would have access to around 50 megawatts of power, or only enough for around 50,000 chips, and that an upcoming electric substation on the site would give him another 150 megawatts—enough to power 100,000 chips or more. But that wouldn’t happen until 2025, the utility company said” [1].

[1] https://www.theinformation.com/articles/why-musks-ai-rivals-...
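The quoted figures imply a power budget of roughly 1 kW per chip. A back-of-the-envelope check, assuming (not from the article) the ~700 W TDP of an H100 SXM plus a ~40% overhead multiplier for cooling, networking, and host systems:

```python
# Sanity-check of the quoted power figures (assumptions, not reported data).
H100_TDP_W = 700   # assumed per-GPU draw
OVERHEAD = 1.4     # assumed PUE-style multiplier for cooling/networking/hosts

def chips_supported(megawatts: float) -> int:
    """How many chips a given power budget can feed under these assumptions."""
    watts_per_chip = H100_TDP_W * OVERHEAD  # ~980 W per chip all-in
    return int(megawatts * 1_000_000 / watts_per_chip)

print(chips_supported(50))   # ~51,000 chips on the initial 50 MW
print(chips_supported(150))  # ~153,000 more once the substation adds 150 MW
```

Both numbers line up with the utility company's "around 50,000 chips" and "100,000 chips or more" estimates.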


This looks like a fairly transparent PR move to downplay the other players. MS already had more than 100k H100s before the beginning of this year, and the same goes for Meta. Google runs its own TPU clusters, so an apples-to-apples comparison isn't even possible.


Yeah, the charts posted in comments on X didn't look right to me either.


Doesn't this literally not matter? We know that something like Llama is trained in about 3 months on Meta's current resources from their current datasets.

If they had less GPU power available, it would take longer, but not dramatically so. Building a "more powerful AI cluster" doesn't currently give you a more powerful AI. In fact, none of the Transformer architectures seem to, and their instantiations all seem to be targeted toward single-H100 execution for inference.

So this really does seem like nothing but hype: buying a lot of commercially available GPUs isn't accomplishing anything.


Right, but that means Meta can train 4 models a year. So, 4 opportunities for feedback. They do, however, need to release those models to the public in order to get feedback on their limitations, which might limit them to training and releasing 2 models a year.

If you had more GPUs you could iterate faster...


Apparently running on many gas turbines (enough to power 50,000 homes), without any permits for all that pollution [1]. Also, most likely, hoovering up water from the often-low Mississippi River.

Just your modern day robber baron...

1. https://www.reuters.com/business/environment/musks-xai-opera...


Is this really the most powerful “AI cluster” or is this an exaggeration? I would expect that hyperscalers like Google or Amazon would actually have the largest amount of compute power to throw at AI. Or is that no longer true?

EDIT: I see a comment with a graph of AI compute across different companies. I am not sure it is trustworthy, however, as the graph itself was just generated by an LLM (Claude):

https://x.com/AnthonyEveryWhr/status/1830680977103794177


I'd say this graph is absolute rubbish. The fact that the numbers are round, decrease in even steps, and come with no units, no comparison, incorrect scaling, and a layout that's wrong for this kind of graph suggests it is little more than fanfiction.

The truth is that the hyperscalers guard information like this very closely. Who has the biggest "cluster" is impossible to answer. Also, most hyperscalers will have many clusters, and which is the more useful measure, the SUM or the MAX of those clusters? At a small scale, probably the MAX; at a large scale, the SUM, because you can distribute across clusters without much loss to overhead and because you have a range of jobs to run.
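The SUM-vs-MAX tradeoff can be sketched with entirely made-up numbers: a single job confined to one cluster gets at most the biggest cluster, while a fleet of partitionable jobs can use the aggregate, minus some assumed cross-cluster overhead:

```python
# Toy model of the SUM-vs-MAX question (all numbers invented for illustration).
clusters = [100_000, 60_000, 40_000]  # hypothetical GPU counts per cluster

# A single tightly-coupled job stays in one cluster, so it gets at most MAX:
single_job_gpus = max(clusters)

# A large, partitionable workload can draw on the SUM, minus an assumed
# 10% lost to slower cross-cluster interconnect:
CROSS_CLUSTER_OVERHEAD = 0.10
fleet_gpus = sum(clusters) * (1 - CROSS_CLUSTER_OVERHEAD)

print(single_job_gpus)  # 100000
print(fleet_gpus)       # 180000.0 -- the SUM wins at scale despite overhead
```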


For these numbers to be correct Nvidia would have to be committing securities fraud on a scale never seen before.


No, the chart is wrong.


The data in that graph is just complete misinformation.

https://engineering.fb.com/2024/03/12/data-center-engineerin...


This sounds like a notable achievement but they really ought to have chosen a different name.

https://en.wikipedia.org/wiki/Colossus_computer


Does xAI do anything interesting or are they just trying to catch up?


Grok-2 is ranked second on the LLM arena; it's basically as good as the best Gemini model. They've already caught up. Only the latest ChatGPT model is a tiny bit better.


To be fair, catching up is quite a feat in this game.


That's undeniable, but it would be interesting to see something beyond "you can generate a picture" or "you can ask our model questions, though it may make things up".


Generating pictures isn’t even xAI’s model. It uses Flux.


(x) Doubt


Colossus ... didn't I watch a movie about this?


Is this notable due to the lack of HN comments?



