While it's true that language models are fundamentally based on statistical patterns in language, characterizing them as mere "probabilistic syllable generators" significantly understates their capabilities and functional intelligence.
These models can engage in multistep logical reasoning, solve complex problems, and generate novel ideas - going far beyond simply predicting the next syllable. They can follow intricate chains of thought and arrive at non-obvious conclusions. And OpenAI has now shown us that fine-tuning a model specifically to plan step by step dramatically improves its ability to solve problems that were previously the domain of human experts.
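To make the "plan step by step" idea concrete, here's a minimal sketch using plain chain-of-thought prompting with the official OpenAI Python SDK - to be clear, this is just prompting an off-the-shelf chat model to reason in steps, not the reinforcement fine-tuning behind o1, and the model name is only a placeholder:

```python
# Toy illustration of "plan step by step" via prompting (ordinary
# chain-of-thought prompting, not OpenAI's o1 training procedure).
# Assumes the official `openai` package and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()

question = "A train leaves at 14:10 and arrives at 17:45. How long is the trip?"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model; this name is just an example
    messages=[
        {"role": "system",
         "content": "Work through the problem step by step before giving "
                    "a final answer on its own line."},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```

Even this crude prompted version surfaces intermediate reasoning; the point of o1 is training the model to do that planning well rather than relying on the prompt.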
Although there is no definitive evidence that state-of-the-art language models have a comprehensive "world model" in the way humans do, several studies and observations suggest that large language models (LLMs) may possess some elements or precursors of a world model.
For example, Gurnee and Tegmark [1] found that LLMs learn linear representations of space and time across multiple scales. These representations appear to be robust to prompting variations and unified across different entity types. This suggests that modern LLMs may learn rich spatiotemporal representations of the real world, which could be considered basic ingredients of a world model.
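For anyone who wants to see what "linear representation" means operationally, here is a minimal sketch of the probing setup in the spirit of that paper (not the authors' actual code) - the activation and coordinate arrays below are random placeholders standing in for real hidden states and real entity labels:

```python
# Minimal sketch of a linear probe in the spirit of Gurnee & Tegmark [1].
# `activations` stands in for hidden states (n_entities x d_model) taken
# from one transformer layer; `coords` stands in for known lat/lon labels
# of the same entities. Both arrays here are random placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 4096))    # placeholder hidden states
coords = rng.uniform(-90, 90, size=(1000, 2))  # placeholder lat/lon labels

X_tr, X_te, y_tr, y_te = train_test_split(activations, coords, random_state=0)

probe = Ridge(alpha=1.0).fit(X_tr, y_tr)  # a purely linear map W x + b
print("held-out R^2:", probe.score(X_te, y_te))
# On real activations, a high held-out R^2 from a *linear* probe is the
# evidence that the information sits in an (approximately) linear subspace
# of the model's residual stream, rather than being inferred by the probe.
```

The reason the probe is restricted to a linear map is precisely so that any predictive success can be attributed to the model's representation rather than to the probe doing the work itself.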
And even if we look at much smaller models like Stable Diffusion XL, it's clear that they encode a rich understanding of optics [2] within just a few billion parameters (3.5 billion to be precise). Generative video models like OpenAI's Sora clearly have a world model as they are able to simulate gravity, collisions between objects, and other concepts necessary to render a coherent scene.
As for AGI, the consensus on Metaculus is that it will arrive in the early 2030s. But consider that before GPT-4 arrived, the consensus was that full AGI would not come until 2041 [3]. The consensus arrival date for "weakly general" AGI (i.e., AGI without a robotic, physical-world component) is 2027 [4]. The best tool for achieving AGI is the transformer and its derivatives, and its scaling keeps going with no end in sight.
> Generative video models like OpenAI's Sora clearly have a world model as they are able to simulate gravity, collisions between objects, and other concepts necessary to render a coherent scene.
I won't expand on the rest, but this is simply nonsensical.
The fact that Sora generates output that matches its training data doesn't show that it has a concept of gravity, collisions between objects, or anything else. It has a "world model" the same way a photocopier has a "document model".
My suspicion is that you're leaving some important parts of your logic unstated, such as a belief in some magical property within humans called "understanding", which you don't define.
The ability of video models to generate novel video consistent with physical reality shows that they have extracted important invariants - physical law - out of the data.
It's probably better not to muddle the discussion with ill-defined terms such as "intelligence" or "understanding".
I have my own beef with the "AGI is nigh" crowd, but this criticism amounts to word play.
It feels like, if these image and video generation models were really recovering fundamental laws from the training data, they should at least be able to re-create an image from a different angle.
"Allegory of the cave" comes to mind, when trying to describe the understanding that's missing from diffusion models. I think a super-model with such qualifications would require a number of ControlNets in a non-visual domains to be able to encode understanding of the underlying physics. Diffusion models can render permutations of whatever they've seen fairly well without that, though.
I'm very familiar with the allegory of the cave, but I'm not sure I understand where you're going with the analogy here.
Are you saying that it is not possible to learn about dynamics in a higher dimensional space from a lower dimensional projection? This is clearly not true in general.
E.g., even though video models only ever see and output 2d data, they learn that objects have different sides, in a fashion that is consistent with our 3d reality.
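As a toy illustration that 2d observations can pin down higher-dimensional structure - my own example, not a claim about how video models are implemented - here is the classic rank argument behind Tomasi-Kanade structure-from-motion in a few lines of numpy:

```python
# Toy demonstration that 2-D projections of a rigid 3-D scene retain the
# scene's 3-D structure (rank-3 measurement matrix, as in Tomasi-Kanade).
# This illustrates the general point; it is not how video models work.
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(3, 50))          # 50 random 3-D points

def random_orthographic_view(pts, rng):
    # Random rotation via QR, then drop the depth axis (orthographic camera).
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return (q @ pts)[:2]                   # 2 x 50 image coordinates

# Stack 20 different 2-D views into a 40 x 50 measurement matrix.
views = [random_orthographic_view(points, rng) for _ in range(20)]
W = np.vstack(views)

singular_values = np.linalg.svd(W, compute_uv=False)
print(singular_values[:5].round(3))
# Only the first 3 singular values are non-negligible: the 2-D views jointly
# pin down a 3-D structure (up to an ambiguity), even though no single view
# contains depth.
```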
The distinction you (and others in this thread) are making is purely one of degree - how much generalization has been achieved, and how well - rather than one of category.
Not only are we within eyesight of the end, we're more or less there. o1 isn't just another 10x scale-up of parameter count to make GPT-5, because at this point on the scaling curve relating parameter count to model performance, that's no longer an effective approach.
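To put a rough number on the diminishing returns, here's a back-of-the-envelope sketch using the parameter-only power-law fit reported by Kaplan et al. (2020); the parameter counts below are placeholders, and frontier models certainly deviate from this simple fit:

```python
# Back-of-the-envelope: how much does another 10x in parameter count buy,
# under the parameter-only scaling fit from Kaplan et al. (2020)?
#   L(N) ~ (N_c / N) ** alpha_N,  alpha_N ~= 0.076, N_c ~= 8.8e13
# The parameter counts below are placeholders, not actual model sizes.
ALPHA_N = 0.076
N_C = 8.8e13

def loss(n_params: float) -> float:
    return (N_C / n_params) ** ALPHA_N

for n in (1.75e11, 1.75e12, 1.75e13):   # 175B, 1.75T, 17.5T parameters
    print(f"N = {n:.2e}  predicted loss ~ {loss(n):.3f}")
# Each 10x in parameters shaves only ~16% off this loss term
# (10 ** -0.076 ~= 0.84), which is why pure scale-ups get less attractive.
```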
I agree with the broader point: for all we know, it's consistent with current neuroscience that our brains are doing nothing more than predicting their next inputs in a broadly similar way, and any categorical distinction between AI and human intelligence seems quite hard to draw.
I disagree that we can draw a line from scaling current transformer models to AGI, however. A model that is great for communicating with people in natural language may not be the best for deep reasoning, abstraction, unified creative visions over long-form generations, motor control, planning, etc. The history of computer science is littered with simple extrapolations from existing technology that completely missed the need for a paradigm shift.
The fact that OpenAI created and released o1 doesn't mean they won't also keep scaling models up, or that they don't think scaling is their best hope. Plenty has been said implying that they do.
I definitely agree that AGI isn't just a matter of scaling transformers, and also, as you say, that they "may not be the best" for such tasks. (Vanilla transformers are extremely inefficient.) But the really important point is that transformers can abstract, reason, form world models and theories of mind, etc., to a significant degree - a much greater degree than virtually anyone would have predicted 5-10 years ago - and all of it learnt automatically. It shows these problems are actually tractable for connectionist machine learning, without the paradigm shift you and many others allege is needed. That is the part I disagree with. But more breakthroughs are needed.
To wit: OpenAI was until quite recently investigating having TSMC build a dedicated semiconductor fab to produce OpenAI chips [1]:
(Translated from Chinese)
> According to industry insiders, OpenAI originally negotiated actively with TSMC to build a dedicated wafer fab, but shelved that plan after evaluating the costs and benefits. Instead, OpenAI has strategically sought cooperation with American companies such as Broadcom and Marvell to develop its own ASIC chips, and it is expected to become one of Broadcom's top four customers.
Even if OpenAI doesn't build its own fab - a wise move, if you ask me - the investment required to develop an ASIC on the very latest node is eye-watering. Most people - even people in tech - just don't have a good understanding of how "out there" semiconductor manufacturing has become. It's basically a dark art at this point.
For instance, TSMC themselves [2] don't even know at this point whether the A16 node chosen by OpenAI will require the forthcoming High-NA lithography machines from ASML. The High-NA machines cost nearly twice as much as the already exceptionally expensive Extreme Ultraviolet (EUV) machines do - at close to $400M each, the price is simply staggering.
I'm sure some gurus here on HN have a more up-to-date picture of the situation around A16, but the fundamental point is this: if OpenAI doesn't think scaling will be needed to get to AGI, why would they be considering spending many billions on the latest semiconductor tech?
Citations:
[1] https://paperswithcode.com/paper/language-models-represent-s...
[2] https://www.reddit.com/r/StableDiffusion/comments/15he3f4/el...
[3] https://www.metaculus.com/questions/5121/date-of-artificial-...
[4] https://www.metaculus.com/questions/3479/date-weakly-general...