I remain to be convinced that encoding statistical information about syntactical manipulation alone will somehow magically convert to semantic knowledge and agency if you just try really hard and do it a lot.
Your doubt about more data -> better semantic knowledge about the world is well-placed and I freely admit that at this stage it's mostly conjecture, although GPT-3 does provide some evidence. However, my point is that once this hurdle is overcome, building agency on top isn't that much of a leap.
I think that's overly dismissive of it. GPT-3 for me represents the very beginnings of breaking out of the paradigm of a specialized machine learning system for each task. GPT-3 is alternately a (surprisingly good and consistent) machine translation system, a (disappointingly mediocre) calculator, a (very uneven, with flashes of brilliance) chatbot, etc. All without special engineering to do any of those things. I can't think of any other system I have access to that does that.
Regardless of whether you think GPT-3 represents a track towards true AGI, this is a huge advance! Even OpenAI's API for it is astounding compared to other APIs. I can't think of any other API that amounts to "provide an English language description of your task" and returns an answer. Like I said, the results are still quite uneven, but the fact that it's not extraordinarily outlandish to provide such an API is absolutely mind-boggling to me.
I don't think I could have ever imagined such an API existing even just 5 years ago!
This is way, way beyond ELIZA; it's not even qualitatively the same kind of thing.
Interesting. I would love to see two people who disagree about this topic have this conversation without using the words "semantic", "syntactic", "symbolic", "know", "meaning", "understand" or "think(ing)". It boils down to predicting what the thing can and cannot, or might and might not, be able to do. If it can tell you that fire is hot, that hot things burn, that burning hurts, and that a person might want to avoid fire because it's hot and they might get burned ... I don't think I care if "anybody is home" or what have you.
I also think it's important to consider my favorite question: "What is the easiest accomplishment you are confident AI will not be able to achieve in the next two years?" At least then the goalposts are in place.
GPT-3 is the complete opposite of ELIZA. It's a statistical model that learns from data. ELIZA is just a clever parser with a bunch of hardcoded explicit rules.
No agency: that requires the model being able to manipulate itself, and no one else being able to. And if you want true agency, no kill switch. Though idk why we'd want to create anything close to true agency; seems like a pretty dumb idea imo.
Not agency; agency is a separate system that needs to be developed. But semantic knowledge, absolutely.
The word to explain this is emergence. This is indeed not quite intuitive, but neural networks exhibit many phenomena of emergence. It is tied to their ability to perform effective/efficient computation -- after all, the "goal" of nature with our brain design (from which abstractions and knowledge emerge) was also effective cognition. For example, when you feed a large convolutional neural network (classifier) diverse objects and human faces, you can verify experimentally that the convolutional filters resemble "concepts": subdivisions used to assemble a larger whole (nose, eyes, mouth, etc. are the components of a face). That's the divide-and-conquer strategy, a basic aspect of efficient cognition/computation. The network has enough neurons and a good enough prior structure[1] that this effective architecture emerges from gradient descent training. It really is wonderful. You can see it as a primitive/rough, but powerful, form of algorithm search (or algorithm optimization). The best algorithms tend to employ abstractions.
Natural internal representations emerge.
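To make the convolutional point concrete, here's a minimal sketch (assuming PyTorch; the layer stack and the edge/texture/part comments are my own illustration, not from any specific paper) of the structural prior described in [1]: spatial resolution shrinks while the channel count grows, as if images were being transformed into concepts. With a trained classifier you would inspect the learned filters or activations rather than this untrained stack.

```python
# Rough sketch (assuming PyTorch) of the convolutional prior from [1].
# The per-layer comments describe what typically emerges after training
# on objects/faces, not what this untrained stack computes.
import torch
import torch.nn as nn

layers = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),    # edges, colors
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),   # textures
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # parts (nose, eyes, mouth)
    nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(), # whole objects/faces
)

x = torch.randn(1, 3, 224, 224)  # dummy RGB image
for layer in layers:
    x = layer(x)
    if isinstance(layer, nn.Conv2d):
        print(tuple(x.shape))  # channels increase while height/width shrink
```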
Emotions probably don't fully emerge (in the whole breadth of emotions), although they may exist as internal representations when dealing with human bodies of work. That's because emotions are tied to our motivational system: they compose qualities (qualia) that propel us to do various activities, generally (but not always) tied to straightforwardly beneficial evolutionary goals: enjoying eating, craving sleep, having sex, engaging with the community (mammals rely heavily on the group for survival), etc.
Without agency, it's unlikely (but I can't say with certainty) that those emotional qualia would emerge with accurate fidelity, simply because a non-agent model wouldn't employ them to function and wouldn't optimize for the same functions. The extent of emergence is limited to understanding and reproducing the human production, not accurately replicating its exact (internal) quality, which is derived from its computational structure and relationship with motivation. It (in this case, GPT-3) only needs to understand those human emotions insofar as it must predict human behavior to a reasonable accuracy. As the corpus goes to infinity, with a sufficiently diverse and expressive[2] literature, you could conjecture emergence is guaranteed[3] (just how large a corpus would we need, though? Who knows). But I find it likely that in practice you really need to set up the network with agency (and train it adequately to exhibit effective, motivated behavior) before it starts reproducing those qualities well, i.e. deriving some of its understanding from practical situations that are too sparse in (or absent from) the training corpus.
[1] In fully connected networks, usually a funnel or hourglass shape; in convolutional networks, a decreasing size in the 2D image domain and an increasing size in the channel dimension, as if images were transforming into concepts; this structure is usually baked in (although you can do hyperparameter optimization, etc.).
Finally, it is essentially impossible for agency to emerge (unless your training code has serious bugs, and even then I can't find a plausible way) from a (purely) predictive/generative neural network. There is simply no concept of itself, much less of its own goals, anywhere in its structure or training objective (only concepts of other persons/characters/things, or even some understanding of the agency of other things). Worse, it never has the opportunity to exercise this goal-oriented behavior in a comprehensive setting (again, this depends on the training corpus).
[2] In terms of expressing internal states, and accurately describing our actions.
[3] Then there's the question of whether the representational structure we have is a unique solution -- i.e. whether there are other ways of feeling the same feelings while acting exactly the same.
Note: I intend to flesh this argument out into an article and post it here later -- I find it a quite recurrent doubt about neural network behavior.
Setting aside any objections one may raise regarding the term emergence[1], the objective of this post was to discuss how, while GPT-x alone cannot become an agent, its internal representations can be harnessed to create one. Note that human-like emotions and qualia are not in any way necessary for an artificial agent.
I have to disagree with the objections to the term; my usage here has a precise meaning: when you train large (and efficient) neural networks, abstract internal representations emerge. "Emerge" simply means they weren't programmed explicitly, but the training procedure and other properties are such that they arise.
I agree that's a plausible way to build an agent. In my line of questioning, I would wonder if it could really perform in the real world, due to the limitations of the corpus and the lack of training as an agent (rather than just a text predictor). This could be addressed using simulated (and possibly multi-agent) training environments and the like.
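For the flavor of what I mean by a simulated training environment, here's a toy sketch; the GridWorld class and the random policy are invented purely for illustration and stand in for a real environment and a model-driven agent.

```python
# Toy agent-environment loop; GridWorld and the random policy are illustrative only.
import random

class GridWorld:
    """1-D corridor: the agent starts at 0 and is rewarded for reaching `size`."""
    def __init__(self, size=5):
        self.size, self.pos = size, 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action is -1 (left) or +1 (right)
        self.pos = max(0, min(self.size, self.pos + action))
        done = self.pos == self.size
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

env = GridWorld()
for episode in range(3):
    obs, done, total = env.reset(), False, 0.0
    while not done:
        action = random.choice([-1, 1])  # a real agent would query its model here
        obs, reward, done = env.step(action)
        total += reward
    print(f"episode {episode}: return {total}")
```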
To me, the GPT-3 excitement is equivalent to people getting hyped about "defeating aging" after seeing some resveratrol trial or something.
Language is only part of it. And you can't get complete understanding without integrating spatial information. Take a look at Josh Tenenbaum's work for an explanation of why.
It talks about how you can go from a language model that simply generates text to an agent that is capable of performing actions in the real world.
Essentially, the missing pieces in the picture come down to input and output modules. "How do you formulate any given problem into a form that a language model can answer?".
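As a hedged illustration of what such an input module could look like (the prompt template and the query_language_model stub below are made up for the sake of example, not OpenAI's actual API), you reformulate the problem as text and let the model complete it:

```python
# Sketch of an "input module": reformulate a concrete problem as a text prompt.
# `query_language_model` is a placeholder, not a real API call.
def format_as_prompt(situation: str, question: str) -> str:
    return (
        "Situation: " + situation + "\n"
        "Question: " + question + "\n"
        "Answer:"
    )

def query_language_model(prompt: str) -> str:
    raise NotImplementedError("stand-in for a GPT-3-style completion call")

prompt = format_as_prompt(
    situation="The robot is holding a cup of hot coffee above a laptop.",
    question="What will happen if the robot tilts the cup?",
)
print(prompt)  # this string would be sent to the model for completion
```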
I don't doubt the input and output modules will need some work, but in the grand scheme of things probably not that much. The big missing piece imo is just models with better world models—and it doesn't look like the scaling wars will stop anytime soon, nor does it look like size will stop helping, at least not anytime soon. (https://www.gwern.net/newsletter/2020/05#baking-the-cake)
tl;dr Language models like GPT-3 incorporate some model of the world. That's why they can generate plausible-sounding text. Future language models will be larger, more powerful, and have more complete models of the world. So we will be able to ask the language model questions, like 'what will happen if we do X?'. By compiling the answers to many such questions, we can figure out the best[0] thing to do.
[0] Assuming you have some utility function you can maximize.
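A rough sketch of that tl;dr, with predict_outcome and utility as invented stand-ins (in practice predict_outcome would prompt the language model with "What will happen if we do X?" and return its completion):

```python
# Ask the (placeholder) model what happens for each candidate action,
# score the predicted outcome with a utility function, take the argmax.
def predict_outcome(action: str) -> str:
    # Stand-in for prompting the language model; canned answers for illustration.
    canned = {
        "water the plants": "the plants stay healthy",
        "ignore the plants": "the plants wilt",
    }
    return canned[action]

def utility(outcome: str) -> float:  # [0]: any scoring function over outcomes
    return 1.0 if "healthy" in outcome else 0.0

candidate_actions = ["water the plants", "ignore the plants"]
best = max(candidate_actions, key=lambda a: utility(predict_outcome(a)))
print("chosen action:", best)
```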