Probably not. If the price of Nvidia is dropping, it's because investors see a world where Nvidia hardware is less valuable, probably because it will be used less.

You can't do the distill/magnify cycle like you do with AlphaGo. LLMs have basically stalled in their base capabilities, and pre-training is basically over at this point, so the new arms race will be over marginal capability gains and (mostly) making models cheaper and cheaper.
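
For anyone who hasn't seen it: the AlphaGo loop being referenced looks roughly like the toy sketch below, with a single number standing in for playing strength (nothing here is real training code).

```python
# Toy sketch of the AlphaGo "magnify then distill" ladder. Nothing here is a
# real API; "strength" is just a number standing in for Elo.

def search_amplify(strength: float) -> float:
    # Magnify: the raw policy wrapped in search (MCTS) plays stronger than
    # the raw policy alone.
    return strength + 1.0

def distill(strength: float, teacher: float) -> float:
    # Distill: train the raw policy to imitate the amplified play, recovering
    # most (not all) of the search gain.
    return strength + 0.8 * (teacher - strength)

strength = 0.0
for step in range(5):
    strength = distill(strength, search_amplify(strength))
    print(f"step {step}: strength ~ {strength:.2f}")
```

The ladder only climbs because search is a reliable amplifier to distill from; the claim here is that LLMs don't have an equivalent amplifier.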

But inference time scaling, right?

A weak model can pretend to be a stronger model if you let it cook for a long time. But right now it looks like models as strong as what we have aren't going to be very useful even if you let them run for a long, long time. Basic logic problems still tank o3 if they're not a kind that it's seen before.
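
To make "let it cook" concrete: the simplest form of inference-time scaling is just sampling the same model many times and majority-voting the answers. A toy sketch, where the model call is a stand-in that's right 40% of the time:

```python
# Toy self-consistency / majority-vote sketch; sample_answer() stands in for
# one stochastic model completion, not a real API call.
import random
from collections import Counter

def sample_answer(p_correct: float = 0.4) -> str:
    if random.random() < p_correct:
        return "42"                             # the right answer
    return random.choice(["41", "43", "7"])     # assorted wrong answers

def majority_vote(n_samples: int) -> str:
    votes = Counter(sample_answer() for _ in range(n_samples))
    return votes.most_common(1)[0][0]

for n in (1, 8, 64):
    hits = sum(majority_vote(n) == "42" for _ in range(1_000))
    print(f"{n:>2} samples per question -> ~{hits / 10:.1f}% correct")
```

Voting like this only helps when the model's errors are uncorrelated; on problems it has never seen a variant of, they aren't, which is why more inference time doesn't buy much.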

Basically, there doesn't seem to be a use case for big data centers that run small models for long periods of time; they're in a danger zone of both not doing anything interesting and taking way too long to do it.

The AI war is going to turn into a price war, by my estimation. The models will be around as strong as the ones we have now, perhaps with one more crank of quality. Then comes the empty, meaningless battle of providing that service for as close to free as possible.

If OpenAI's agents had panned out, we might be having another conversation. But they didn't, and it wasn't even close.

This is probably it. There's not much left in the AI game


Your implication is that we have unlimited compute and therefore know that LLMs are stalled.

Have you considered that compute might be the reason why LLMs are stalled at the moment?

What made LLMs possible in the first place? Right, compute! The Transformer is 8 years old; technically, GPT-4 could have been released 5 years ago. What stopped it? Simple: the compute was way too low.

Nvidia has improved compute by 1000x in the past 8 years, but what if training GPT-5 takes 6-12 months for one run, given what OpenAI is trying to do?

What we see right now is that pre-training has reached the limits of Hopper, and Big Tech is waiting for Blackwell. Blackwell will easily be 10x faster in cluster training (don't look at per-chip performance alone), and since Big Tech intends to build 10x larger GPU clusters, they will have 100x-compute systems.

Let's see then how it turns out.

The limit on training is time. If you want to make something new and improve it, then you should limit training time, because nobody will wait 5-6 months for results anymore.

It was fine for OpenAI years ago to take months to years for new frontier models. But today the expectations are higher.

There is a reason why Blackwell is fully sold out for the year. AI research is totally starved for compute.

The best part for Nvidia is that while the AI research companies compete with each other, they are all trying to get Nvidia AI hardware.


The age of pre-training is basically over; I think everyone has acknowledged this, and it's not because the clusters aren't big enough. The bull argument on AI is that inference-time scaling will pull us to the next step.

Except the o3 benchmarks are, seemingly, pretty solid evidence that leaving LLMs on for the better part of a day and spending a million dollars gets you... nothing. A pass on a basic logic test using brute-force methods, which falls apart on a marginally easier test that it just wasn't trained on.

The returns on compute and data seem to be diminishing, with exponential increases in inputs returning only incremental increases in quality, and we're out of quality training data, so that half is now much worse even if the scaling weren't plateauing.

All this, and the scale that got us this far seems to have done nothing to give us real intelligence: there's no planning or real reasoning, and that's demonstrated every time a model tries to do something out of distribution, or even in distribution but just complicated. Even if we got another crank or two out of this, we're still at the bottom of the mountain here. We haven't started and we're already out of gas.

Scale doesn't fix this any more than building a mile-tall fence stops the next break-in. If it was going to work, we would have seen it work already. LLMs don't have much juice left in the squeeze, imo.


We don't know, for example, what a larger model can do with the new techniques DeepSeek is using to improve/refine it. It's possible the new models on their own failed to show progress, but that a combination of techniques will enable that barrier to be crossed.

We also don't know what the next discovery/breakthrough will be like. The reward for getting smarter AI is still huge and so the investment will likely remain huge for some time. If anything DeepSeek is showing us that there is still progress to be made.


Maybe, once I get an understanding of what those advances actually were?

But making things smaller is different from making them more powerful; those are different categories of advancement.

As you may have noticed, models of varying sizes seem to converge on a narrow window of capabilities even when separated by years of supposed advancement. That should probably raise red flags.


> You can't do the distill/magnify cycle like you do with AlphaGo

are you sure? people are saying that there’s an analogous cycle where you use o1-style reasoning to produce better inputs to the next training round
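
The claimed cycle, sketched as a toy (placeholder logic, not any lab's actual pipeline): spend inference compute on long reasoning chains, keep only the chains that land on a verifiably correct answer, and fine-tune on those.

```python
# Toy bootstrap loop in the rejection-sampling spirit. "skill" is a single
# number standing in for model quality; nothing here is real training.
import random

def sample_chain(skill: float, answer: int) -> int:
    # Stand-in for one long reasoning attempt: higher skill means the chain is
    # more likely to end at the right answer.
    return answer if random.random() < skill else answer + 1

def bootstrap_round(skill: float, answers: list, k: int = 16) -> float:
    kept = 0
    for answer in answers:
        for _ in range(k):                        # "let it cook": k attempts each
            if sample_chain(skill, answer) == answer:
                kept += 1                         # keep only verified successes
                break
    # Stand-in for fine-tuning on the kept chains.
    return min(1.0, skill + 0.1 * kept / len(answers))

skill = 0.2
for r in range(4):
    skill = bootstrap_round(skill, answers=list(range(200)))
    print(f"round {r}: skill ~ {skill:.2f}")
```

Whether that ladder keeps climbing outside verifiable domains is exactly what the reply below disputes.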


KIND OF

If you've tried to get o1 to give you outputs in a specific format, it often just tells you to take a hike. It's a stubborn model, which implies a lot.

This is speculation, but it seems that the main benefit of reasoning models is that they provide a dimension along which RL can be applied to make them better at math and maybe coding, things with verifiable outputs.
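
A minimal sketch of what "verifiable outputs" buys you, with made-up helper names: the reward is just a programmatic check, and such checks only exist for domains like math and code.

```python
# Toy verifiable-reward functions of the kind RL can optimize against.
# These names and signatures are illustrative, not a real RLHF/RLVR library.
from typing import Callable, List

def math_reward(completion: str, reference: str) -> float:
    # Verifiable: exact match against a known final answer.
    return 1.0 if completion.strip() == reference.strip() else 0.0

def code_reward(completion: str, tests: List[Callable[[str], bool]]) -> float:
    # Verifiable: fraction of checks the generated code passes.
    return sum(t(completion) for t in tests) / len(tests)

print(math_reward("42", "42"))                                        # 1.0
print(code_reward("def f(x): return x", [lambda s: "return" in s]))   # 1.0
```

There's no equivalent checker for "was that plan sensible?" or "was that explanation honest?", which is why the gains stay confined to a narrow band.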

Reasoning models likely don't learn better reasoning from their hidden reasoning tokens. They're either 1) searching for a magic token which, when raised into the model's attention, makes it more effective (basically giving it room to say something that jogs its memory), or 2) searching for a series of steps which does a better job of solving a specific class of problem than a single pass does, making the model more flexible in some senses but more stubborn along others.

Reasoning data as training data is a poison pill, in all likelihood, and just makes a small window of RL-vulnerable problems easier to answer (when we already have systems that do those better). It doesn't really plan well, doesn't truly learn reasoning, etc.

Maybe seeing the actual output of o3 will change my mind, but I'm horrifically bearish on reasoning models.


So you're saying we're close to AGI? Because the game doesn't stop until we get there.

I don't think LLMs lead to AGI. It's a local maximum.

I think LLMs are getting us closer to AGI in the same way that Madame Tussauds wax museum got us closer to human cloning

This argument ignores scaling laws

It really doesn't, lol. Those laws are like Moore's law: an observation rather than something fundamental like the laws of physics.

The scaling has been plateauing, and half of that equation is quality training data, which has basically run out at this point.
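
For reference, the "equation" usually cited here is the Chinchilla-style scaling law (Hoffmann et al., 2022), with separate power-law terms for parameter count N and training tokens D:

```latex
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

The D term is the training-data half; because both terms are power laws, each further fixed drop in loss requires multiplying the inputs, which is the diminishing-returns pattern described upthread.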

Maybe reasoning models will help produce synthetic data but that's still to be seen. So far the only benefit reasoning seems to bring is fossilizing the models and improving outputs along a narrow band of verifiable answers that you can do RL on to get correct

Synthetic data maybe buys you time, but it's one turn of the crank and not much more
