It's easy to be snarky about ill-informed and hyperbolic takes, but it's also pretty clear that large multi-modal models trained with the data we already have are eventually going to give us AGI.
IMO this will require not just much more expansive multi-modal training, but also novel architectures, specifically recurrent approaches, plus a well-known set of capabilities most systems don't currently have, e.g. the integration of short-term memory (the context window, if you like) into long-term memory, episodic or otherwise.
But these are, as we say, mere matters of engineering.
Oh, that was my intent: to support the grandparent's claim of "it's also pretty clear", as in, this is what people believe.
If I had evidence that it "is true" that AGI will be here in 5 years, I probably would be doing something else with my time than participating in these threads ;)