Pretty huge move. Google and their TPUs are looking infinitely more prescient as...

skeledrew · 2026-06-24T16:07:39 1782317259

Training is pretty much a one-time cost, and that cost is already coming down with architectural improvements. Inference, though, is an ongoing cost that over time takes orders of magnitude more resources, so focusing on making it far more efficient means way greater gains over time.

forrestthewoods · 2026-06-24T15:57:53 1782316673

Inference costs are higher than training now. I think.

Nvidia is king of general-purpose training chips. But inference can be specialized.

lugu · 2026-06-24T23:36:01 1782344161

What makes you think this? With wider adoption the ratio shall shift in favor of inference. And API price is becoming more important than SOTA capability.

forrestthewoods · 2026-06-25T00:01:08 1782345668

> With wider adoption the ratio shall shift in favor of inference

Yes? That’s why more money will be spent on inference than training?

I’m talking absolute cost. As the number of people using AI and burning tokens goes up, the amount spent on inference goes up.

I am fairly confident that Anthropic has way way more GPUs serving Claude Code to users than they have training models. They’ve got a lot of users!!

> API price is becoming more important than SOTA capability.

Also yes? This is why custom silicon for efficient inference makes sense!

I think we’re in total agreement here :)

cactusplant7374 · 2026-06-24T21:47:19 1782337639

Cerebras's Codex Spark 5.3 has been a huge flop. Small context window and old model. But hopefully they can improve so that we can benefit from 1000 tokens/second with GPT 5.5.

zer00eyz · 2026-06-24T16:09:35 1782317375

> early testing shows that Jalapeño will deliver performance per watt substantially better than current state-of-the-art

We're starting to see what really matters here, and though this is hand-wavy, the TPU makes similar claims.

I think Google's memo about having no moat still stands (see: https://newsletter.semianalysis.com/p/google-we-have-no-moat... if you are unaware). It kind of makes sense that all of this is looking more like the '60s to '90s era of IBM, DEC, Cray, and Sun and the hardware race that happened then. History doesn't repeat, but it often rhymes, and I suspect that these efforts will follow the same trajectory.

granzymes · 2026-06-24T17:42:50 1782322970

To be clear, that is not "Google's memo". It's a memo by a guy who happened to work at Google. There is a diversity of opinions at a company that employs 180,000 people.