
"I see no reason that this technology couldn't smoothly scale into human-level intelligence, yet lots of people seem to think it'll require a step change or is impossible."

I am a big fan of LMs and am not in the 'don't really understand them' crowd, but here are a few reasons:

1. Large language models such as GPT-3 or Codex still have several major architectural limitations. They lack long-term memory, since they can only take a fixed, fairly small context window as input; GPT-3 is great at short stories but can't go beyond that, and it's hard to prime it with a lot of information as you would, e.g., a new employee (see the context-window sketch after this list). There is some work on this, but afaik not much, and it's very much unsolved.

2. Large language models have only gotten this good by ingesting massive amounts of data and scaling up compute, yet that growth comes with diminishing returns for every order of magnitude. So simply not being able to scale either the data or the compute far enough (with existing hardware architectures) is a very plausible blocker (see the scaling-law sketch after the list).

3. Large language models 'have it easy' because they only deal with one modality (text). Human intelligence, on the other hand, is multimodal - we process vision, sound, touch, etc. simultaneously and share concepts between these modalities, and we likewise produce multimodal outputs: motor commands, speech, text. So far it's not obvious how to achieve this - OpenAI took a step with DALL-E, but only by mining a massive number of image-text pairs (see the token-stream sketch below the list), and it's not clear the same trick works for other modalities, in particular motor control.

4. Human-level intelligence is often framed as combining system 1 (fast, reactive output) and system 2 (deliberate reasoning not tied to the immediate stimulus) - and the latter is not at all present in language models.

5. Related to the two points above: at least some of human intelligence is derived from reinforcement learning (optimizing a multi-step policy under delayed reward). This is much harder than the plain self-supervised learning of LMs (see the returns sketch below).
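
To make point 1 concrete, here's a minimal sketch of the fixed-context constraint, assuming GPT-3's 2048-token window and OpenAI's tiktoken tokenizer (fit_to_context is a name I made up for illustration):

    # Sketch of the fixed-context constraint: anything past the window is
    # simply invisible to the model. Assumes the tiktoken library and
    # GPT-3's 2048-token context; both are illustrative choices.
    import tiktoken

    CONTEXT_WINDOW = 2048  # GPT-3's context length, in tokens

    def fit_to_context(document: str, prompt: str) -> str:
        """Truncate document so document + prompt fits in the window."""
        enc = tiktoken.get_encoding("gpt2")  # BPE used by GPT-2/GPT-3
        budget = CONTEXT_WINDOW - len(enc.encode(prompt))
        doc_tokens = enc.encode(document)
        # Everything beyond budget tokens is dropped - the model never sees it.
        return enc.decode(doc_tokens[:budget]) + "\n" + prompt

However long your 'new employee handbook' is, only the slice that fits in the window can ever reach the model.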
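
For point 2, a back-of-the-envelope sketch of the diminishing returns, assuming the model-size power law L(N) ~ (N_c / N)^alpha from Kaplan et al.'s 'Scaling Laws for Neural Language Models' (the constants are their fitted values as I remember them; the per-decade framing is mine):

    # Under the power-law fit, each 10x in parameter count multiplies loss by
    # a constant factor, so the absolute improvement per decade keeps shrinking.
    N_C = 8.8e13   # fitted constant (parameters), per Kaplan et al. (2020)
    ALPHA = 0.076  # fitted exponent

    def loss(n_params: float) -> float:
        return (N_C / n_params) ** ALPHA

    prev = loss(1e6)
    for exp in range(7, 13):  # 10M up to 1T parameters
        cur = loss(10.0 ** exp)
        print(f"1e{exp} params: loss {cur:.3f} (gain {prev - cur:.3f})")
        prev = cur

Each extra order of magnitude buys a smaller absolute drop in loss while costing roughly 10x more compute and data.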
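
For point 3, a rough sketch of the trick DALL-E used to get two modalities into one model: reduce both to discrete tokens and train an ordinary autoregressive LM on the concatenation. The encoders below are stubs, and the vocab/token counts are the paper's as far as I recall:

    # Text and image both become discrete tokens in one sequence, so the
    # standard next-token objective covers both modalities at once.
    from typing import List

    TEXT_VOCAB = 16384   # BPE vocabulary for the text part
    IMAGE_VOCAB = 8192   # discrete VAE codebook for the image part

    def encode_text(caption: str) -> List[int]:
        # Stub: a real system would run BPE here.
        return [hash(w) % TEXT_VOCAB for w in caption.split()]

    def encode_image(pixels) -> List[int]:
        # Stub: a real system would run a discrete VAE (32x32 = 1024 codes).
        return [0] * 1024

    def make_training_sequence(caption: str, pixels) -> List[int]:
        # Offset image tokens so the two vocabularies don't collide; the
        # transformer then just predicts the next token as usual.
        return encode_text(caption) + [TEXT_VOCAB + t for t in encode_image(pixels)]

The catch is that this needs a huge corpus of paired examples, and nothing hands you 'motor command' tokens paired with text at that scale.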
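
And for point 5, a tiny sketch of why delayed reward is harder than next-token prediction: with a single reward at the end of a long episode, every earlier action only gets credit via the discounted return (gamma and the episode length here are made up for illustration):

    # In self-supervised LM training every token has an immediate target; in
    # RL a single delayed reward must be spread back over the whole sequence.
    from typing import List

    def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
        """Return-to-go at each step, computed backwards through the episode."""
        returns = [0.0] * len(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns

    # 100-step episode, one reward at the very end: the first action's only
    # learning signal is gamma^99 of that reward (~0.37 here).
    episode = [0.0] * 99 + [1.0]
    print(discounted_returns(episode)[:3])

Compare that to the LM case, where every single token comes with its own supervised target.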

And there are probably a bunch more reasons like these. So while I do think these sorts of models represent a lot of progress, there are many reasons to doubt that just 'scaling it up' will get us much further.


