
Why GPT-3 Matters - teruakohatu
https://leogao.dev/2020/05/29/GPT-3-A-Brief-Summary/
======
AeP6cheo
I really hope this tech will be miniaturized at some point.

I hate to be that guy, but when I saw the article the other day that ended
with "plot twist, this article was autogenerated with GPT-3!", I was not
impressed, mainly because it looked like a content farm's output to me,
conveying no information. It basically looked like an incredibly costly spam
tool.

But then I thought of possible applications. Give humans a new tool, and they
will amaze you, I'm sure we'll see cool applications of this tech in the
future (and probably horrible ones too). One thing I could think of where it
would totally rock: games. If games were able to use such tech, their game
world could be filled with casual discussions rather than the developers
needing to write everything by hand. This means that NPCs in a game could have
ever-changing discussions, and even answer the player about the subject they're
discussing. Their discussion could even change based on what is happening in
the world or what the players are doing, even to things with the smallest
impact, or what is happening directly around them at the moment, without devs
needing to script it all. That would be awesome.

But yeah, this won't happen easily unless the model can be embedded and
shipped on local computers.

~~~
yyyk
Hm. GPT-3 is trained on internet data from the real world. Your NPCs are in
the gameworld. I guess you wouldn't want NPCs to ever reference anything
'real' for fear of breaking immersion (not to mention political backlash if
the model grabs the wrong thing!). However, if you limit your corpus to the
gameworld, that's nowhere near comparable in size to the real dataset, _and_
someone would have to retrain it.

There needs to be a way of absolutely limiting it; from what I've seen, the
prompt can't do that with 100% success. A blacklist would be useful in some
applications (people tolerate their search engine screwing up once in a
while), but not in games. Still, maybe we can accept limited performance by
limiting it to the gameworld plus some manual texts. It's not like NPC dialog
is literature-level anyway...

~~~
labelbias
The main thing about GPT-3 is that they wanted to demonstrate one-shot, in-
context task adaptation without fine-tuning, and they succeeded at it.

So the model can be prompted to output part-of-speech tags, dependency
grammar trees, or named entities in the input even if task-specific training
data is sparse. Similarly, you could fine-tune it to produce game lore and then see how it
works for that. The model easily switches to different modes of operation and
achieves state-of-the-art or close to state-of-the-art performance.

It's quite funny how NLP folks tried to solve low-level tasks (POS tagging,
NER, named-entity relation extraction, dependency parsing, sentiment
classification, etc.) to get to higher-level tasks (good summarization, machine
translation, text generation, question answering), and now a single model
captures all the low-level stuff for free and does the high-level stuff so well
that fine-tuning it to do the low-level stuff is unnecessary.
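
As a concrete (and entirely made-up) illustration of that mode switching, here
is a rough sketch of the kind of few-shot prompt people use to put a GPT-style
model into a part-of-speech-tagging "mode"; the sentences and tags are invented
for illustration:

    # Hypothetical few-shot prompt for POS tagging: the task is specified
    # entirely in the prompt text, with a couple of worked examples.
    pos_prompt = (
        "Tag each word with its part of speech.\n"
        "Sentence: The cat sat on the mat.\n"
        "Tags: The/DET cat/NOUN sat/VERB on/ADP the/DET mat/NOUN ./PUNCT\n"
        "Sentence: Dogs bark loudly.\n"
        "Tags: Dogs/NOUN bark/VERB loudly/ADV ./PUNCT\n"
        "Sentence: She quickly opened the door.\n"
        "Tags:"
    )
    # Whatever the model completes after the final "Tags:" is its tagging.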

~~~
fredliu
This difference, one-shot in-context adaptation for GPT-3 vs. fine-tuning for
GPT-2, is one of the major breakthroughs. Since GPT-3 has been so hot in the
past few days, people seem to forget or not realize that lots of the GPT-3
examples shown off today were possible with GPT-2, with the catch that you had
to fine-tune your own GPT-2 model to fit your problem domain (game plots,
poems, music, bots that chat like certain characters, etc.). GPT-3 makes that
fine-tuning process unnecessary (although practically you probably can't, or
can't afford to, fine-tune your own GPT-3 model).
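
For reference, the GPT-2-era workflow being described looked roughly like the
sketch below, using the Hugging Face transformers library; the training file,
output directory, and hyperparameters are placeholders rather than a tested
recipe:

    # Rough sketch: fine-tune the public GPT-2 weights on your own domain text.
    # "game_lore.txt" and the hyperparameters are placeholders.
    from transformers import (GPT2LMHeadModel, GPT2Tokenizer, TextDataset,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    dataset = TextDataset(tokenizer=tokenizer, file_path="game_lore.txt",
                          block_size=128)
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gpt2-lore", num_train_epochs=1,
                               per_device_train_batch_size=2),
        data_collator=collator,
        train_dataset=dataset,
    )
    trainer.train()  # afterwards, model.generate() produces domain-flavored text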

------
YeGoblynQueenne
Why does GPT-3 matter? Well, the article starts with a plot that shows what
looks like an exponential growth in the number of parameters, compared with
previous models. So it matters because it's bigger.

Further down there's a plot titled "Aggregate Performance Across Benchmarks"
where we can see that performance is on average about 50%. I don't know what
the baseline for this plot should be (what is the expected average if the
tasks are solved by a random classifier?) but comparing this plot with the
plot at the top of the article, it doesn't really look like there's a huge
improvement in accuracy with a huge increase in the number of parameters. In
fact, it's quite the contrary: there's a small increase and a very smooth,
almost linear curve. So that's an exponential increase in the use of resources
for an almost linear increase in performance? That's not that impressive.

So it appears that the big thing about GPT-3 is that it's big.

It should also be noted that public interest in GPT-3 is mostly focused
on its ability to generate text, for which there is no good metric. So
basically, GPT-3 is big, but it's not that good in tasks for which there are
formal benchmarks (such as they are, because Natural Language Understanding
benchmarks are often very poorly made and don't really measure what they say
they measure) and we can't really tell how good it is at the one task that
interests people the most.

~~~
quanticle
The significance of GPT-3 is that the scaling _isn't slowing down_. With
every increase in the number of parameters the doubters say, "Oh, you'll hit
diminishing returns," or "Oh, the curve will go sigmoid," but it hasn't
happened.

If OpenAI develops GPT-4, with 1T parameters, I wouldn't be surprised to see a
performance gain larger than the jump between GPT-2 and GPT-3.

What GPT-3 shows us is that we're going to have ML systems that can write at
the level of an average human pretty soon now.

~~~
faitswulff
I have a much lower opinion of the average human's writing capabilities. Much
of what we see online has been written by people who either love to write or
are journalists. I think GPT-3 is already at the average human's writing
level.

~~~
simias
I'm not sure what's being discussed here exactly. If we talk about vocabulary,
spelling and grammar I agree with you. On the other hand humans are able to
express opinions and ideas, come up with novel things to say, not merely
mimic an input.

If you give me a huge corpus of Chinese texts and a very long time, I might be
able to figure out which character goes with which other, find the various
structures in the text, and then be able to generate a somewhat convincing
made-up Chinese text while still not understanding a word of it.

These GPT-3 demos are impressive because they look like real text with proper
syntax and grammar, but they still express absolutely nothing. It reads like
one long ramble that goes nowhere. There's no intent behind it.

It reminds me of these videos of apes imitating humans by using their tools,
banging hammers ineffectively. They are able to copy the appearance of the
behavior, but not the reasoning behind it. They don't get _why_ we bang
hammers or what it achieves.

~~~
faitswulff
Have you read any business books? I used to read quite a few. For the most
part, they take a central thesis and then repeat variations on the theme over
and over again. Sometimes with anecdotes of questionable veracity. I venture
that many of them could be generated with GPT-3.

My point is, GPT-3 is operating at human levels for certain contexts. I think
it would get passing grades on essays in a lot of schools in the US, for
instance, just based on syntax and grammar.

~~~
gfodor
This stuff is so new that HN threads may be the first to mention realistic
potential applications - congratulations, I think you just found one. Having
GPT-3 render a first draft of books in the archetype you mention (one simple
idea stretched out over many pages) seems like a very profitable endeavor.

~~~
abernard1
> Having GPT-3 render a first draft of books in the archetype you mention (one
> simple idea stretched out over many pages).

Given what I've seen so far with GPT-3, that simple idea would have to have
already been discussed at length on forums on the internet and in the corpus.

Usually books have facts and studies that they use as supporting points. Many
of the connections they make between the subject material and their thesis are
unique, and this forms their supporting argument. GPT-3 is rearranging words
and sentences to resemble structures it's seen before, but it does not create
novel facts.

~~~
visarga
So ideally it could work like a meta-study. Meta-studies combine results from
multiple separate studies, making correlations and drawing more confident
conclusions. Most 'original' human ideas are just reinventions of older ideas,
too.

The interesting part is that GPT-3's leap in performance can be attributed to
scaling. That's easier to do than inventing completely new approaches. Scale
data, scale compute, scale money, then you have something you couldn't have
invented directly.

------
ForHackernews
> GPT-3 shows that it’s _possible_ for a model to someday reach human levels
> of generalization in NLP

This is a big, big claim tossed in as a throwaway line. GPT-3 shows that we
haven't yet reached the limit of the "just throw more resources at it" school
of AI development, but it doesn't automatically follow that it'll reach human
levels of NLP if you give it enough resources.

By analogy, this is claiming "New, larger steam locomotives are strictly
faster than older, smaller ones, so this shows with enough coal it's
_possible_ for steam-engines to someday drive interstellar transport at 0.5c"

~~~
jimmySixDOF
>strictly faster

The question is how fast is fast enough.

At what point do you stop being able to distinguish an agenda-driven
social-engineering bot from a non-native-speaking teenage 4chan troll?

The bar is not much higher - much less than the gain in capability already seen
from v2. So will it be v4? v5?

For a lot of disruptive applications, just a little bit "faster" is all you
need for the bad actors to act.

~~~
quadrifoliate
> At what point do you stop being able to distinguish an agenda-driven
> social-engineering bot from a non-native-speaking teenage 4chan troll?

I'm...not sure why you would want to? I can't see either being particularly
helpful regarding discourse on the Internet or in real life.

------
simonkafan
> that it’s possible for a model to someday reach human levels of
> generalization in NLP

Fully disagree. There is no evidence that we are now closer to human-level
text understanding than before GPT-3. Yes, GPT-3 produces grammatically
correct sentences, but it still can't form a coherent idea or meaning and
express it in sentences afterwards - that's what humans would do. GPT-3 is
just better at hiding the fact that the model has no clue what it's talking
about.

Nevertheless, compared with Eliza or other bots from 1960-2000, we've made
remarkable progress.

~~~
jhrmnn
I wonder if 10 years ago one would believe that we’d have a model capable of
generating an article that fools many given a complex prompt, yet is
essentially incapable of any reasoning.

~~~
newen
The Chinese room argument shows lots of people were thinking about it as a
possibility, but to have it happen so soon is something else.

------
s1t5
Hasn't GPT-3 been out for a while now? Why are there so many articles about it
on the front page over the past few days?

~~~
lukeplato
They recently gave certain devs early access, and a lot of demos are only now
being shared by people with followings.

------
barbegal
What can GPT-3 do that is useful? I can understand that it outputs text based
on a prompt and some input but I don't understand how that can be leveraged to
do useful things. I can ask it trivia questions but is it better than doing a
Wikipedia search? Is it possible to give it, say, a scientific paper as a
prompt and have it write an article about it? Or will it just generate
nonsense?

~~~
polytely
At the moment it feels like the only use-cases are:

\- toys/games: AI Dungeon, fun chatbots, generally playing around with
generating text in a certain style

\- humour: generating jokes/memes

\- deception: trolling/disinfo campaigns/spam/gaming the advertisement market
by creating garbage content.

I guess there will be use cases where a human operator can use it to simplify
their job (or lower the skill level required to do a job) by curating and
editing generated content instead of writing it themselves. I'm thinking
things like simple writing jobs where quality isn't that important: social
media posts, newsletters?

~~~
valine
I like the idea of using it as a tool to mitigate writer's block. Say you're
writing a school paper and you've done your research but you can't seem to
keep your train of thought flowing. With GPT-3 you could have it generate a
sentence prompt based on your previous writing, or maybe have it write a
conclusion for you that summarizes your thoughts from the paper. There
are lots of possibilities in that arena.

------
brunoluiz
As a programmer without much of a data science background, could someone
explain what the whole hype/breakthrough around GPT-3 is? I know it generates
content that makes sense, HTML+CSS, and some other text-based stuff.

But in layman terms, what is the huge deal with it?

~~~
vsskanth
The first two versions generated content based on a prompt that was coherent
for a sentence or a paragraph.

GPT-3 generates an entire article that's largely coherent (though still
lacking nuanced meaning).

Each version of GPT has an exponentially increasing parameter count. Raw
compute seems to be winning out here. Basically, for this application they
don't seem to be hitting diminishing returns from increasing parameters.

Now people are wondering what GPT-4 can do. If it can write human-level
coherent articles or passes the Turing test or something, it's gonna trigger an
arms race for governments to obtain this capability.

~~~
brunoluiz
Thanks for the explanation ;)

> Each version of GPT has an exponentially increasing parameter count

What are parameters in this context?

> Now people are wondering what GPT-4 can do. If it can write human-level
> coherent articles or passes the Turing test or something [...]

Well, when I first heard about "text generation" last year, what I first raised
to my friends was "surely this will make fake news even worse". But well, any
technology comes with a bright and a dark side. I try not to be pessimistic in
these moments hehe

~~~
vsskanth
Parameters here are just the weights of the model's neural net: each unit of
computation has its own parameters that need to be fit. Bigger model = more
layers or units of computation = more parameters.
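
As a toy illustration of what a "parameter" is (the layer sizes below are made
up, not GPT's): every weight and bias in every layer counts, so you can tally
them directly.

    # Count the parameters of a small, made-up feed-forward block.
    import torch.nn as nn

    block = nn.Sequential(
        nn.Linear(768, 3072),   # 768*3072 weights + 3072 biases
        nn.ReLU(),              # no parameters
        nn.Linear(3072, 768),   # 3072*768 weights + 768 biases
    )
    n_params = sum(p.numel() for p in block.parameters())
    print(n_params)  # ~4.7 million for this one block; GPT-3 has 175 billion total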

------
TrackerFF
Picture this grim future:

1\. Companies will use models like these to generate automatic job listings

2\. People will use models like these to generate automatic job applications

3\. Companies from 1) will use automated tools to parse and analyse said
applications

We're gonna end up with abysmal SNR, as the sheer number of applicants will
explode.

~~~
ajzinsbwbs
It doesn’t sound like the worst way to hire model builders.

------
qwerty456127
> GPT-3 shows that it’s possible for a model to someday reach human levels of
> generalization in NLP—and once the impossible becomes possible, it’s only a
> matter of time until it becomes practical.

The fact it can [occasionally] generalize and participate in a verbal
conversation on a human level means it can potentially organize and direct
productive (on a human level at least) action if assigned as a manager to a
team of humans. Doesn't it?

------
jungletime
Use it to post to Reddit and create a feedback loop. Help to hasten the AI
takeover.

What I find fascinating is the careful craft of framing things out of context
as a tool for politics.

[https://en.wikipedia.org/wiki/AI_takeover](https://en.wikipedia.org/wiki/AI_takeover)

Narrative wins. This will be a great tool to misinform.

~~~
runeb
There is already a pretty good one using GPT-2

[https://www.reddit.com/r/SubSimulatorGPT2/](https://www.reddit.com/r/SubSimulatorGPT2/)

------
zumachase
We're incredibly excited about GPT-3. I think there is a fair bit of hype
exhaustion, especially from the likes of OpenAI ("our AI is too dangerous to
release"), so this is completely understandable.

However, I think what's missing here is that our benchmarks (a la Turing test)
are about negation as opposed to affirmation. We tend to evaluate AI on whether
or
not we can discern the fact that it's AI. We seek to negate it as human, as
opposed to affirming it as human (or close to). And this is not the right
mindset when it comes to AGI because the gap between "obviously not human" and
"human-like" is enormous. These are all definitely steps in the right
direction, and the applications for even robotic process automation will be
huge. But we're not even close to having nets that can reason about even the
most basic things.

~~~
abernard1
> However, I think what's missing here is that our benchmarks (a la Turing
> test) are about negation as opposed to affirmation.

I would question the value of the Turing test, and maybe think that's not a
great example for AI.

There's always been this assumption that passing the Turing test would mean we
had AI, but I think that was always predicated on the machine generating the
outputs. With the GPT- models, it's not clear that this isn't a form of
compression over an immense data set, and we're sending pre-existing _human_
responses back to the user. It implies to me that we can pass the Turing test
with a large enough data set and no (or very little) intelligence.

All of this makes me believe "These are all definitely steps in the right
direction" is questionable.

------
supernova87a
Can someone answer a question for a layman:

Is the number of parameters to be read as the indicator of how "advanced" the
training has gotten, or the accuracy of the output? As in, this
dataset/training has gotten to the point that it understands the 160 billionth
small exception to the general rules of how language should be interpreted, or
constructed, to be considered believable?

Sometimes (as a layman) I look at this and think instead, wow, how slow these
ML algorithms must be that they need 160 billion parameters to predict
correctly.

Is it one of these statements?

~~~
killerstorm
> Is the number of parameters to be read as the indicator of how "advanced"
> the training has gotten, or the accuracy of the output?

Accuracy, of course.

> As in, this dataset/training has gotten to the point that it understands the
> 160 billionth small exception to the general rules of how language should be
> interpreted, or constructed, to be considered believable?

It memorized a lot of facts, but it is also better at figuring out rules than
its predecessor.

> Sometimes (as a layman) I look at this and think instead, wow, how slow
> these ML algorithms must be that they need 160 billion parameters to predict
> correctly.

There are more specialized models which are trained on much smaller datasets.
They are usually given a specific task, such as classification. GPT-3 is
trained on a very large dataset in an unsupervised way. As a result, it is
able to handle a very wide variety of tasks (without re-training). If you tell
it to do math, it will do math. If you tell it to translate between different
languages, it will do translation. If you tell it to write JS code, it will
write JS code. If you ask it to write a Harry Potter parody as if it was
written by Hemingway, it will do that.

So the whole point is that it can do pretty much any imaginable task involving
text given only a few examples, with no task-specific training.
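
As a sketch of what "a few examples, no specific training" looks like in
practice, here is roughly how the GPT-3 completion API of that era is called;
the engine name, prompt, and settings are illustrative, not a tested recipe:

    # Few-shot translation with the GPT-3 completion endpoint (illustrative only).
    import openai

    prompt = (
        "English: Where is the library?\n"
        "French: Où est la bibliothèque ?\n"
        "English: I would like a coffee, please.\n"
        "French: Je voudrais un café, s'il vous plaît.\n"
        "English: The train leaves at noon.\n"
        "French:"
    )

    resp = openai.Completion.create(
        engine="davinci",   # the largest GPT-3 model exposed at launch
        prompt=prompt,
        max_tokens=40,
        temperature=0,
        stop="\n",          # stop at the end of the translated line
    )
    print(resp.choices[0].text.strip())  # a French translation, with no fine-tuning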

~~~
supernova87a
Thanks!

------
kemonocode
GPT-3 won't matter until all of the work put into it can be replicated by
someone else, and as it stands right now it's just a toy for people with far
too many resources to spare.

------
k__
Can a model like this be realized as a chip?

I mean, 300GB of memory is crazy. But back in the day I ran full HD video on a
netbook, because it had special optimized chips.

~~~
justinmchase
Yeah but that chip is probably going to look a lot like a 300GB memory chip.

~~~
k__
Like a big external power bank? Doesn't sound too bad.

------
benlivengood
What I would really like to see is an analysis of the weight that individual
chunks of context contribute to the final probability for an output. That
might allow for better prompting; when GPT-3 gets the wrong answer it would be
fairly obvious what was lacking in the prompting given the context it thinks
has the most influence. Also, good prompts would presumably have a higher
weight.
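
GPT-3's weights aren't public, but here is a crude sketch of the idea with
GPT-2 run locally: estimate each context chunk's contribution by dropping it
and measuring how far the log-probability of the desired answer falls. The
chunks, the answer, and the helper are all made up for illustration.

    # Leave-one-out attribution of prompt chunks (rough sketch, GPT-2 as a stand-in).
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def answer_logprob(prompt: str, answer: str) -> float:
        """Summed log-probability of the answer tokens given the prompt."""
        ids = tokenizer.encode(prompt + answer, return_tensors="pt")
        n_answer = len(tokenizer.encode(answer))
        with torch.no_grad():
            logprobs = torch.log_softmax(model(ids).logits, dim=-1)
        # the logits at position i-1 predict token i; sum over the answer tokens
        return sum(logprobs[0, i - 1, ids[0, i]].item()
                   for i in range(ids.shape[1] - n_answer, ids.shape[1]))

    chunks = ["Q: What is the capital of France? A: Paris.\n",
              "Q: What is the capital of Spain? A: Madrid.\n",
              "Q: What is the capital of Italy? A:"]
    answer = " Rome"

    full = answer_logprob("".join(chunks), answer)
    for k in range(len(chunks) - 1):            # never drop the question itself
        without = answer_logprob("".join(c for j, c in enumerate(chunks) if j != k),
                                 answer)
        print(f"chunk {k} contributes roughly {full - without:.2f} nats")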

~~~
jadbox
Once that's figured out, you could have a kind of outline builder where you
provide all the right linking outline context and have it generate appropriate
paragraphs around it.

------
ricksharp
For me, its performance passes a usability threshold for human + machine
collaboration.

It is possible to use it to generate texts which can be quickly curated or
edited by a user.

Specifically this could be useful in authoring fiction (sci-fi novels, game
dialog, etc.).

Imagine the Star Trek holodeck characters. Its dialog quality is nearly good
enough to make that level of interaction feasible.

------
nojvek
I can see comment and review farms wanting to get their hands on GPT really
badly. Imagine being able to generate thousands of human-like reviews with
positive sentiment. Businesses pay real money for this.

Right now AI is only available to mega tech corps. Even OpenAI is a closed
research lab. So one can infer that AI will always be the divider.

------
nemoniac
Speaking of unconscious bias, this quote from the original article made me
raise my eyebrows: "We wanted to identify how good an average person on the
internet is at detecting language model outputs, so we focused on participants
drawn from the general US population."

~~~
anchpop
Clearly the average USian is not the average person, but given that
participants need to speak English to have a shot (unless GPT-3 works for other
languages?), it doesn't seem like a terrible approximation.

------
robtigo1
So can you feed a video to it and let it 'dream' of all possible outcomes? If
this is achievable, then the internet will eventually become a loop-back device
for our senses, just like in the Matrix movie.

------
justinmchase
I don't feel like this article answered the question in its headline. I don't
even know what GPT-3 is so maybe even the tiniest bit of background could have
helped.

------
lerchmo
It is also a pretty good case study in the "bitter lesson" and all but ensures
that the future of AI will be driven by the companies with the deepest
pockets.

~~~
devalgo
OpenAI is far from having the deepest pockets.

------
mcemilg
We saw that GPT-3 doesn't matter that much, right?

~~~
evanrich
For me, it has kind of broken HN’s comment sections. I find myself jumping to
the bottom of longer comments to look for “btw, this comment was written by
gpt3”. To me it seems like we are going to be entering a perpetual April fools
day where we never really know what’s real.

~~~
arkitaip
This has always been a problem with HN due to the robotic nature of its
audience.

~~~
ss2003
Ouch!

------
highfrequency
There’s been a lot of discussion on HN lately about the implications of GPT-3:
are we moving toward general AI or is this just a scaled-up party trick?

I have no idea whether scaling up transformers another 100x will lead to
something resembling real intelligence, but it certainly seems possible. In
particular, I find the arguments against this possibility to be fairly silly.
These are the three main arguments I have seen for why GPT type models will
never approach AGI, and the reasons I don’t think they are valid:

 _1\. GPT-3 requires vast amounts of training data (hundreds of billions of
words from the internet), whereas a human can become fluent in natural
language after “training on” much less data._

It's not an apples-to-apples comparison between the GPT-3 training corpus and
the education one human receives before becoming fluent in natural language. We
benefit from
millions of years of evolution across billions of organisms. A massive amount
of “training” is incorporated in the brain of an infant. This must be the case
because even if you could somehow read all of the text on the internet to your
dog, it would not approach intelligence.

 _2\. There was no intellectual breakthrough in the development of GPT-3, just
more "brute force" training on more data; therefore it or its successors can't
achieve a breakthrough in intelligence._

We must remember that there was no intellectual breakthrough required for the
development of human intelligence; it was just more of the same evolution. The
core pattern of evolution is extremely simple: take an organism, generate
random variants from it, see which ones do the best, and then create new
variants from the good ones. This is perhaps the most basic scheme you could
think of that might actually work. Evolution has produced amazing results in
spite of its simplicity and inefficiency (random variations!) because it
generalizes well to many environments and scales extremely well to millions of
generations. These are exactly the strengths of gradient descent. In fact,
gradient descent follows the same structure as evolution, except that at each
iteration we don’t generate random variations, but instead make an educated
guess about what a fruitful variation would be based on available gradient
information. This improves learning efficiency tremendously; imagine being
able to say: “this Neanderthal died because he stepped into a fire, let’s add
some fire-avoidance to the next one” instead of waiting for this trait to be
generated randomly. Speaking of brute force and amount of training, it would
take 355 years to train GPT-3 on a _single_ GPU. This strikes me as quite fast
relative to evolutionary time scales.
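(For a toy numerical version of this random-variation vs. gradient-descent
comparison, see the sketch after point 3 below.)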

 _3\. Machines lack capabilities fundamental to the human experience: in
particular feeling pleasure, pain, and an internal drive toward a goal._

Indeed, if you turn a computer off in the middle of a computation, there is no
evidence of suffering. And if the computer successfully writes a blog post of
human quality, it feels no joy in the human sense. My claim is that these
sensations are not core aspects of intelligence. In fact, pleasure and pain
are very primitive developments that even cockroaches can claim. The most
impressively human accomplishments (harnessing vast external energy sources,
breaking out of bare subsistence, landing on the moon, etc.) were made in
_spite_ of the fact that we are messy bags of emotion that unpredictably feel
anger, jealousy, despondence or elation. These emotional responses were
selected for because they were useful as proximate goalposts orienting us
toward reproduction—basically, to overcome forgetfulness in the pursuit of
long-term goals. If in the future we can simply direct a computer to write a
captivating novel without needing to program in lots of visceral intermediate
stimuli to keep it on track, so much the better.
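
To make the evolution-vs-gradient-descent comparison in point 2 concrete, here
is a toy sketch (made-up loss, dimensions, and step sizes; only the rough
difference in step counts matters):

    # Minimize the same simple loss by (a) random mutation + selection and
    # (b) gradient descent. Everything here is arbitrary except the punchline:
    # the "educated guess" of the gradient needs far fewer steps.
    import numpy as np

    rng = np.random.default_rng(0)
    target = 3.0

    def loss(w):
        return float(np.sum((w - target) ** 2))

    # (a) "evolution": propose a random variant, keep it only if it does better
    w, evo_steps = np.zeros(10), 0
    while loss(w) > 1e-2 and evo_steps < 200_000:   # safety cap on iterations
        candidate = w + 0.01 * rng.standard_normal(10)
        if loss(candidate) < loss(w):
            w = candidate
        evo_steps += 1

    # (b) gradient descent: the gradient picks the variation instead of chance
    w, gd_steps = np.zeros(10), 0
    while loss(w) > 1e-2:
        w -= 0.1 * 2 * (w - target)                 # gradient of the squared error
        gd_steps += 1

    print(f"random mutation: ~{evo_steps} steps, gradient descent: {gd_steps} steps")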

~~~
rst
A stronger contrast between human natural language learning and GPT-3 is that
the human is an active participant, continually trying things out and getting
feedback. GPT-3's training is entirely passive -- and when all humans have of
a language is a corpus of fragments with unknown referents (Minoan Linear A),
we don't do well.

------
anonimunos
it seems that this is the future

