Tesla's wafer-sized Dojo processor is in production – 25 chips combined into one

vessenes · 2024-05-02T18:53:26 1714676006

This is interesting. Groq (chip co, not twitter’s ‘Grok’ LLM) has a similar silicon scale, I’m not sure about architecture, though. One very interesting thing about Groq that I failed to appreciate when they were originally raising is that the architecture is deterministic.

Why is determinism good for inference? If you are clever, you can run computations distributed without waiting for sync. I can’t tell from their marketing materials, but it’s also possible they went for the gold ring and built something latch-free on the silicon side.

Groq seems to have been able to use their architecture to deliver some insanely high token/s numbers; groqchat is by far the fastest inference API I’ve seen.

All this to say that I’m curious what a Dojo architecture designed around training could do. Presuming training was a key use case in the arch design. Knowing the long game thinking at Tesla, I imagine it was.

zer00eyz · 2024-05-02T19:29:07 1714678147

It's a great era to be a hardware nerd.

TSMC got so far out in front of everyone that their competitors had to get creative and solve other issues.

Why is this on 7mn? Because I dont think you could do this on 3nm. It is my understand that everything down at that scale is double shot/imaged to get the right sized components, and with that a higher defect rate.

Look at what intel is doing, and holding out for single shot processes. Their pushing of double sided chips (power on one side and data on the other) would be impossible with the 3nm double shot (I cant see flipping the die as being a good way to get reliability in alignment for 4 imagings..)

I suspect that were getting to the end of size (shrink) scaling and were going to get into process and design scaling. Going to be interesting to see what happens to cost and capacity if we're at that point. Process flexibility would be the new king!

wg0 · 2024-05-02T18:47:53 1714675673

Yield would be low. No?

geor9e · 2024-05-02T19:12:49 1714677169

Not necessarily. It's not like an image sensor where you throw it out if theres a single dead pixel. To increase yield of CPUs, they design them to have parallel redundancy and deactivate cores or memory chunks and sell it as a lower tier model. These AI chips are way more parallel than a CPU. They had decades of statistics of wafer flaw distributions before they began the design process, so they would design in just enough redundancy to get the desired yield for the process. I wouldn't be surprised if each processor has hundreds of things disabled (ALUs,memory units, whatever theyre using), and thousands to spare.

cityofdelusion · 2024-05-02T18:51:58 1714675918

Traditionally, that has been the issue with wafer scale processors. Cerebras is supposedly selling these things to production customers for obscene pricing, probably offsetting the low yields. Who knows if it’s profitable though or still burning cash.

TradingPlaces · 2024-05-02T18:57:32 1714676252

Cerebras has a way of bypassing bad transistors. They over provision the WSE by 1.5%, and have 100% yields that way. https://www.anandtech.com/show/16626/cerebras-unveils-wafer-...

MBCook · 2024-05-03T03:34:50 1714707290

So it sounded like the wafer was being used as an interconnect kind of thing that might be much easier to fab.

The actual processors sound like they’re being made on other wafers in a standard way and then cut out and attached to the interconnect wafer.

So it’s not like the entire part was built only on one wafer and a single mistake would cause the whole thing to have to be thrown out.

At least that’s how I understood the article + educated guessing.

vsskanth · 2024-05-02T18:33:09 1714674789

So is this used exclusively for training FSD ?

bradgranath · 2024-05-02T18:39:08 1714675148

The point of Tesla is apparently to promise a firehose of impossible deliverables in order to pressurize a separate hose connected to Elon's wallet.

MBCook · 2024-05-02T18:42:44 1714675364

It needs 18,000 amps per wafer and puts off 15 KW of heat?

This feels a little like a “you were so preoccupied with whether or not you could” thing.

Wow.

I can’t imagine how much one of those wafers must cost. I’d love to know.

Hope it accomplishes what they want because they’ve certainly had to spend a fortune to get to this point.

StressedDev · 2024-05-02T18:48:11 1714675691

We don’t know if this wafer is a good investment or not. It really depends on what its performance and cost are compared to the performance and cost of other solutions.

My guess is if they did this, they thought it would be a huge improvement over buying off the self hardware. The reason I say this is it’s expensive to design a wafer computer, and get it manufactured.

coffeebeqn · 2024-05-02T18:43:49 1714675429

18000 amps? What kind of a motherboard and power delivery is that

ein0p · 2024-05-02T18:48:43 1714675723

The chip operates without a traditional motherboard, and it’s basically permanently and directly attached to power delivery and a massive heatsink. It’s a very ambitious design, I never thought we’d see it so soon. Technical overview for this reads like porn for HW engineers

cityofdelusion · 2024-05-02T18:57:22 1714676242

This. Gene Amdahl himself (of Amdahl’s law fame) gave up on wafer scale, proclaiming it to be 100 years too soon. Now it’s looking like it was only 40-50!

ants_a · 2024-05-02T18:49:28 1714675768

A comparable product from Cerebras is rumored to cost around $2-3M.

dkjaudyeqooe · 2024-05-02T18:46:46 1714675606

Is this marketing nonsense? Or more Musk fantasist nonsense?

Because it doesn't seem to make much sense.