
> Or maybe it-is-only-about™ gathering and labeling large amounts of data for their training?

Well, GPT-3 isn’t a classifier and it isn’t using labeled data.

As an outsider, it definitely appears that GPT-3 is an engineering advancement, as opposed to a scientific breakthrough. The difference is important because we need a nonlinear breakthrough.

GPT-3 is a bigger GPT-2. As far as we know, there is no more magic. But I think it’s a near certainty that larger models will not get us to AGI alone.



As someone in the deep learning community, I disagree with your assessment that GPT-3 is not a scientific breakthrough. The GPT-3 paper won a best paper award at one of the most prestigious machine learning conferences, after all. GPT-3 didn't make any modeling advances, but it introduced a completely new paradigm with few-shot learning.


Perhaps we have to distinguish between GPT-3-the-model and the GPT-3 paper. IMHO GPT-3 as a model is straightforward engineering, putting a lot of resources into an oversized GPT-2. And while there's significant novelty in the "Language Models are Few-Shot Learners" paper about how exactly you apply these models, that is orthogonal to GPT-3-the-model; the scientific content of that paper applies to any other powerful language model and isn't intimately tied to the specifics of GPT-3.

In essence, I feel that the same people introduced two quite separate things: a completely new paradigm for obtaining few-shot learning from a language model in a way that competes with supervised learning on the same tasks; and the GPT-3 large model, which is used as "supplementary material" to illustrate that new paradigm but is also usable (and used) with the old paradigms, and by itself isn't a breakthrough. And IMHO when the public talks about GPT-3, they really do mean GPT-3-the-model and not the particular few-shot learning approach.
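
To make the distinction concrete, here's a rough sketch of what the few-shot paradigm looks like in practice: instead of fine-tuning on labeled examples, you put a handful of demonstrations directly into the prompt and let the model infer the task. (Purely illustrative; the commented-out `complete` call at the end is a stand-in for whatever language-model completion endpoint you have access to.)

    # Sketch of few-shot prompting: demonstrations go in the prompt;
    # no gradient updates and no labeled training set are involved.
    def build_few_shot_prompt(examples, query):
        lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in examples]
        lines.append(f"Review: {query}\nSentiment:")
        return "\n".join(lines)

    examples = [
        ("A delight from start to finish.", "positive"),
        ("I walked out halfway through.", "negative"),
    ]
    prompt = build_few_shot_prompt(examples, "Surprisingly moving, great cast.")
    print(prompt)
    # prediction = complete(prompt)  # hypothetical call to the model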


Those two are tied together because many of those few-shot capabilities only emerge at scale. If OpenAI had trained a large model and not analyzed it rigorously, it would have had very little scientific value. But it would have been impossible to get the scientific value without the engineering effort.


Agree, just because it took money and scale to do it doesn’t mean it isn’t a breakthrough.


Bigger models might get us to AGI alone. I say that because of the graphs in this paper: https://arxiv.org/pdf/2005.14165v4.pdf

Quality keeps increasing with parameter count. Even now, interfacing with Codex leads to unique and clever solutions to the problems I present it with.
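
For what it's worth, the curves in that paper look roughly like power laws: loss keeps dropping smoothly as parameter count grows, with no plateau in sight yet. A toy illustration of that shape (the constants below are made up for illustration and are NOT the paper's actual fits):

    # Toy power-law scaling curve, loss ~ a * N**(-b).
    # Constants are invented to show the shape, not fitted values.
    a, b = 10.0, 0.07
    for n_params in (1e8, 1e9, 1e10, 1e11, 1.75e11):
        loss = a * n_params ** (-b)
        print(f"{n_params:.2e} params -> loss ~ {loss:.2f}")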


Not in the field, so genuine question: what is the evidence/theory to support the notion that deep learning is at all a reasonable route towards AGI? As I understand it, this is nothing like how actual neurons work - and since they are the only "hardware" that has ever demonstrated general intelligence, hoping for AGI from current computational neural networks feels like a stretch, at best.


Why is AGI the goal instead of continuing to augment human intelligence with better tools?



