As an AI researcher myself I can definitely see why people used to have these sentiments, but I also have to point out how the big papers of the last ~12 months or so have changed the landscape, especially multi-modal models like Gato. To my surprise, a lot of people (even inside the community) still tend to put human intelligence on some sort of pedestal, but I believe once a system (or host of systems) achieves a sufficient level of multi-modality, these same people will no longer be able to tell human and machine intelligence apart - and in the end that is the only thing that counts, since we lack a rigorous definition of "understanding" or of consciousness as a whole. Transformers are certainly going to become even more general purpose in the future, and I'm sure we haven't seen anywhere close to what's possible with RL. Regarding the AGI timescale, I'm a bit more bullish: not Elon-type "possible end-all singularity in 5 years" but more Carmack-like "learning-disabled toddler running on 10k+ GPUs by the end of the decade." Whether RL + transformers will be able to go all the way there is impossible to tell right now, but if I were starting an AGI company today, that's definitely the path I would pursue.
I'm not sure Gato actually demonstrates intelligence. It definitely demonstrates the ability to perform a wide variety of tasks, but I don't think that by itself is intelligence. Gato has yet to demonstrate that it understands things. I do think it is easy to fall into the gullibility gap (and also to take too strong an opposing view, which I may be guilty of).
How are you defining "understands things"? Or more precisely, what are the observable consequences of understanding things that you are waiting to see?
The common current example? A text prompt of "A horse riding an astronaut", without prompt engineering. Though I don't think the successful production of this image would demonstrate intelligence/causal understanding either (but it is a good counterexample).
Causal understanding is going to be a bit difficult to prove tbh.
> The common current example? A text prompt of "A horse riding an astronaut", without prompt engineering. Though I don't think the successful production of this image would demonstrate intelligence/causal understanding either (but it is a good counterexample).
I'm not sure why you think this falsifies intelligence. There are plenty of puzzles and illusions that trick humans. The mere presence of conceptual error is no disproof of intelligence, any more than the fact that most humans get the Monty Hall problem wrong is.
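(If anyone wants to check the Monty Hall claim, here's a quick Python sketch that simulates the game; the point is that switching wins about 2/3 of the time, which is the part most people get wrong.)

    import random

    def play(switch: bool) -> bool:
        doors = [0, 1, 2]
        car = random.choice(doors)
        pick = random.choice(doors)
        # The host opens a door that is neither the player's pick nor the car.
        opened = random.choice([d for d in doors if d != pick and d != car])
        if switch:
            pick = next(d for d in doors if d != pick and d != opened)
        return pick == car

    n = 100_000
    print(sum(play(True) for _ in range(n)) / n)   # ~0.667 when switching
    print(sum(play(False) for _ in range(n)) / n)  # ~0.333 when staying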
Your argument is that there are adversarial cases? Sure... but that's not even what I'm arguing here. There's more nuance to the problem than you seem to appreciate. I'd suggest diving deep into the research to understand this rather than arrogantly making comments like this. If you have questions, that's a different thing. But this is inappropriate and demonstrates a lack of intimate understanding of the field.
I didn't make an argument that there are adversarial cases, you did. You brought up an adversarial example, and said the existence of that example proves these algorithms are not generally intelligent. If that follows, it follows that the existence of adversarial examples for humans proves the same thing about us.
And in general, if you're going to be condescending, you should actually make the counter-argument. You might make fewer reasoning errors that way.
Counter-argument: DALL-E is smart enough to understand that an astronaut riding a horse makes more sense than a horse riding an astronaut, and therefore assumes that you meant "a horse-riding astronaut" unless you go out of your way to specify that you definitely do, in fact, want to see a horse riding an astronaut.
Because intelligence is more than frequentism. Being able to predict that a die lands on a given side with probability 1/6 is not a demonstration of intelligence. It feels a bit naive to even suggest this, and I suspect you aren't an ML researcher (and if I'm right, maybe dial back the arrogance, since you don't have as much domain knowledge).
Gato is definitely cool, but it's worth remembering it was based on supervised learning with data from a bunch of tasks, so data availability is a strong bottleneck with that approach. Still, I'd agree the past year or two have made me more optimistic about HLAI (if not scary AGI).
Yeah, I'm definitely not saying this is already the ultimate solution. But if you look at their scaling experiments and extrapolate the results, they roughly show how a model with a number of parameters on the same order of magnitude as the human brain could more or less achieve overall human-level performance. Of course that doesn't mean the trend has to hold that far out, but it's definitely a path worth walking unless someone proves it's a dead end. In that case I'd replace "end of decade" with 10 to 20 years in my above statement.
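Just to be explicit about what I mean by extrapolating, here's a toy sketch with completely made-up numbers (I'm only illustrating the mechanics of reading off a trend, not quoting the paper's actual curves):

    import numpy as np

    # Hypothetical (parameter count, normalized score) points -- purely
    # illustrative, NOT Gato's published results.
    params = np.array([1e8, 1e9, 1e10])
    score = np.array([0.4, 0.5, 0.6])

    # Crude stand-in for a scaling law: score grows linearly in log10(params).
    slope, intercept = np.polyfit(np.log10(params), score, 1)

    # Naively extend the trend to ~1e14 parameters, roughly the order of
    # magnitude often quoted for synapses in the human brain.
    print(slope * np.log10(1e14) + intercept)  # 1.0 with these toy numbers

Whether the real curve keeps that shape anywhere near that far out is exactly the open question, which is why I said the trend doesn't have to hold.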
Except it is pretty naive to extrapolate like that. A high order polynomial looks flat if you zoom in enough. You can't extrapolate far beyond what you have empirical data for without potentially running into huge problems.
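To make that concrete, here's a toy sketch (not tied to any particular scaling law) of how badly a line fitted to a narrow window of a quartic extrapolates:

    import numpy as np

    # A "high order polynomial" that looks nearly flat close to the origin.
    def f(x):
        return x**4

    # Fit a straight line using only data from a narrow window.
    x_local = np.linspace(-0.5, 0.5, 50)
    slope, intercept = np.polyfit(x_local, f(x_local), 1)

    # Inside the window the linear fit looks fine (max error ~0.05)...
    print(np.abs(f(x_local) - (slope * x_local + intercept)).max())

    # ...but extrapolated to x = 10 it predicts ~0.01 while the truth is 10000.
    print(f(10), slope * 10 + intercept)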
How do you know what the function looks like? Unless someone shows that this scaling behaviour breaks down, it is definitely worth pursuing, since the potential benefits are obvious. Everyone who just says it will break down eventually is almost certainly right, but that is also a completely trivial and worthless prediction by itself. The important distinction is that no one knows where the breakdown happens. And from what we can tell today, it's not even close.
I think your viewpoint is the more naive one, and I think you have the burden of proof to argue why local trends are representative of global trends. This is because the non-local argument accepts a wider range of possibilities, while the local argument is the more specific one. Think of it this way: I'm saying that every function looks linear given the right perspective, and you're arguing that this particular function _is_ linear. I'm arguing that we don't know that. The burden of proof would be on you to show that the function stays linear beyond what we have data for. (The path towards HLI doesn't have to be linear, exponential, or logarithmic, but you do have to make a pretty strong argument to convince others that the local trend generalizes.)
You are arguing completely beside the original point. Gato was just meant as one surprising example out of many of how much more the architectures we already have today are capable of. Whether you believe the scaling curve for this particular model will hold in detail is totally irrelevant to me or anyone else, and I'm not trying to convince you of anything. I was just pointing out that there is a clear direction for future research, and if you think people like Carmack are on the wrong path - so be it. You don't have to pursue it as well. But I don't care, and he certainly doesn't either.
I understood your argument, in the context of the discussion we've been having, to be that Gato is a good example of us being on a promising path towards building intelligent machines. I do not think Gato demonstrates that.
I think it is pretty disingenuous to read my comments as saying I don't want to investigate this direction. I definitely do. I did say I'm an ML researcher, and I do want to see HLI in my lifetime. But I also think we should invest in other directions as well.
If you review the history of AGI research going back to the 1960s, there have been multiple breakthroughs which convinced some researchers that success was close. Yet those all turned out to be mirages. You're probably being overly optimistic, and the Gato approach will eventually run into a wall.
Gato is just one tiny glimpse out of many. The combined research effort from both academia and industry, and the level of success, have never throughout history come close to what we are seeing right now.