
OpenAI releases larger GPT-2 model - p1esk
https://openai.com/blog/better-language-models/#update
======
minimaxir
Notably, the 345M model (1.5 GB on disk) is big enough that it's pushing the
limits of conventional GPUs, and an alternative method of finetuning the model
([https://github.com/nshepperd/gpt-2/commit/47df6da611716b4826...](https://github.com/nshepperd/gpt-2/commit/47df6da611716b4826e3397cd68d711c6951c8e5))
has to be used to prevent the GPU from going OOM.
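Roughly, the memory-saving trick in that commit is gradient checkpointing: don't keep every activation from the forward pass, keep only a few, and recompute the rest during backprop. A toy self-contained sketch of the idea (illustrative scalar layers, not the actual nshepperd/TensorFlow code):

```python
import numpy as np

# Toy sketch of gradient checkpointing: during the forward pass, keep
# activations only at segment boundaries; during the backward pass,
# recompute the missing activations from the nearest checkpoint.
# This trades extra compute for a much smaller activation footprint.

def layer_fwd(x, w):
    return np.tanh(w * x)

def layer_bwd(x, w, grad_out):
    # Gradients of y = tanh(w*x): dy/dx = w*(1-y^2), dy/dw = x*(1-y^2)
    y = np.tanh(w * x)
    return grad_out * w * (1 - y ** 2), grad_out * x * (1 - y ** 2)

def checkpointed_grads(x0, weights, checkpoint_every=2):
    # Forward: store activations only every `checkpoint_every` layers.
    checkpoints = {0: x0}
    x = x0
    for i, w in enumerate(weights):
        x = layer_fwd(x, w)
        if (i + 1) % checkpoint_every == 0:
            checkpoints[i + 1] = x
    # Backward for loss L = final activation, so dL/dy_last = 1.
    grad, grads = 1.0, [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        # Recompute this layer's input from the nearest checkpoint.
        start = max(k for k in checkpoints if k <= i)
        xi = checkpoints[start]
        for j in range(start, i):
            xi = layer_fwd(xi, weights[j])
        grad, grads[i] = layer_bwd(xi, weights[i], grad)
    return grads
```

The gradients come out identical to ordinary backprop that stores everything; each segment just gets recomputed once on the way back.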

I'm working on tools to streamline GPT-2 text generation: I'm currently
porting the code above to gpt-2-simple
([https://github.com/minimaxir/gpt-2-simple](https://github.com/minimaxir/gpt-2-simple))
to allow easy finetuning/generation, and am also working on a way to quickly
build an API/client for easily deploying GPT-2 to production and generating
text at scale, cost-effectively. Even with the 117M model, managing CPU and RAM
performance is tricky.

But given the incredible results of just the 117M model (e.g. Hacker News
titles from a retrained 117M model: [https://github.com/minimaxir/hacker-news-gpt-2](https://github.com/minimaxir/hacker-news-gpt-2)), I'm eager to put the
345M model through its paces.

~~~
jchw
>(e.g. Hacker News titles from a retrained 117M model:
[https://github.com/minimaxir/hacker-news-gpt-2](https://github.com/minimaxir/hacker-news-gpt-2))

Wow, that's great. “The Bullshit Bubble” “Fuck you, Bootstrap” “We should give
up on America” - they’re practically comedy, yet very believable too.

~~~
zrm
Has anybody tried feeding it comedy to begin with to see what it spits back
out?

~~~
minimaxir
There was an attempt on Reddit but it didn't turn out well, likely because
there wasn't enough input data:
[https://www.reddit.com/r/MachineLearning/comments/bgvzdu/d_j...](https://www.reddit.com/r/MachineLearning/comments/bgvzdu/d_jokes_generated_via_gpt2/)

(I have an idea for a more proper approach)

------
ehsankia
Worth noting that the new 345M model is still far from the full 1.5B model
they declined to release. The headline makes it seem like they finally
decided to release the full model, but it's just a slightly larger demo model.

~~~
krick
It strikes me as weird that they are not publishing it, by the way. According
to their rhetoric when they started, this was the whole purpose of OpenAI:
acknowledging that we are at a point where anybody with enough resources can
produce something, let's say, _interesting_ with ML, and striving to give
everyone more or less equal opportunities by serving as a more effective
academic organization for the world, before Facebook or Google takes over the
world completely.

Plus, it's not as if this thing is more "potentially harmful" than, well...
basically anything of use, like electricity, the internet, fire, or less perfect
language models. In fact, it isn't even anything new; it's just a (possibly)
less broken language model than what we already have.

Admittedly, it would be quite problematic to use the full model with today's
mainstream GPUs, so I'm not that saddened by them hoarding it. It just
seems curious to me.

------
imranq
While I appreciate that this is a large advance, I’m worried that these
releases will make the internet completely worthless after some time. If AI
can come up with fake news, fake text, fake videos, and pretty much anything
the user wants it to, then we will be flooded with biased content that’s
untrustworthy. There’s probably some critical percentage of AI-generated content
on the web that guarantees this happening. (I’m guessing it’s around 40%.)

~~~
TaylorAlexander
Their release strategy is to provide lower quality models to the public while
giving research partners access to the full models. The goal of this approach
is to let researchers devise methods of detecting and counteracting this new
technology. It’s kind of like “this technology is going to exist so we need to
prepare responsibly.”

------
Felz
Oh wow, I was literally just making a toy Discord bot for GPT-2. Guess I'll
update it with the bigger model.

EDIT: Done! It takes about 3x longer than it did before to generate a
response, so if you try it be very patient. Also, I made this in three hours
so I wouldn't be surprised if it goes up in flames at some point.

[https://github.com/ScottPeterJohnson/gpt2-discord](https://github.com/ScottPeterJohnson/gpt2-discord)

~~~
p1esk
How long does it take, and which hardware do you use?

~~~
Felz
About 30 seconds for a smaller response on an EC2 instance. Mind you, this is
without a GPU because I couldn't figure out how to set one up.

------
steve_g
Maybe a dumb question: how does a model that is trained to predict the
next word answer questions, as shown in the reading comprehension example? Do
you just feed it the question and watch it generate the answer, or is
something else going on?

~~~
dhairya
You add a linear classifier on top to predict the start and end positions of
the answer span. The augmented model is then trained on a QA dataset like
SQuAD to actually learn how to answer questions.

Hugging Face has a simple implementation that augments BERT in this manner, and
you can see the code there. Their BertQA model gets around an 84 F1 on SQuAD
1.1, which is really strong performance. You can augment their GPT-2
implementation similarly.
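The head itself is tiny. A minimal sketch of the idea (illustrative numpy with random stand-in weights, not Hugging Face's actual code): two learned vectors score each token's hidden state as a candidate start or end, and the best valid span wins.

```python
import numpy as np

# Hypothetical span-extraction head over a language model's per-token
# hidden states H. w_start and w_end stand in for the learned linear
# classifier; in practice they are trained on a QA dataset like SQuAD.
rng = np.random.default_rng(0)
seq_len, hidden = 8, 16
H = rng.normal(size=(seq_len, hidden))   # per-token hidden states
w_start = rng.normal(size=hidden)
w_end = rng.normal(size=hidden)

start_logits = H @ w_start               # score of each token as span start
end_logits = H @ w_end                   # score of each token as span end

# Decode: highest-scoring valid span with start <= end.
best_span = max(
    ((s, e) for s in range(seq_len) for e in range(s, seq_len)),
    key=lambda se: start_logits[se[0]] + end_logits[se[1]],
)
print(best_span)  # (start_index, end_index) of the predicted answer
```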

~~~
p1esk
I think they used GPT-2 for QA without any finetuning.
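It's pure conditioning: format the document and question into a prompt and read off the model's continuation as the answer. Roughly (the `generate` function below is a hypothetical stub, not the real model's decoder):

```python
# Zero-shot QA as pure prompting: no finetuning, no span classifier.
# `generate` is a stand-in stub; a real model would greedily decode
# next tokens conditioned on the prompt.

def generate(prompt, max_tokens=16):
    # Stub continuation; GPT-2 would produce this from the context.
    return " Tom"

passage = "Tom went to the store to buy milk."
question = "Who went to the store?"
prompt = passage + "\nQ: " + question + "\nA:"
answer = generate(prompt).strip()
print(answer)  # whatever is decoded after "A:" is taken as the answer
```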

------
zcw100
People should be equally astonished by what this model can create as by what
vacuous, insipid garbage passes for writing these days. Some of the worst
offenders are corporate CEOs spewing bullshit while people sift through the
verbal excrement, trying to figure out what they had for breakfast at 4:00,
hoping it will make them rich too.

------
burtonator
If you guys want some background check out this blog post:

[https://openai.com/blog/better-language-models/](https://openai.com/blog/better-language-models/)

It's fascinating to think an AI wrote that story...

~~~
mkl
That's actually the same page as the main link (which is scrolled down to the
update).

------
kyledrake
> Due to our concerns about malicious applications of the technology, we are
> not releasing the trained model.

Has anyone said when they intend to release the full model? There are
likely a lot of positive applications of this technology as well.

~~~
p1esk
They said within 6 months, however by that time there will likely be something
better.

~~~
speedplane
> They said within 6 months, however by that time there will likely be
> something better.

There already is: BERT has come out and is better.

~~~
gwern
BERT is bidirectional. How do you use that for language generation?

~~~
visarga
It will generate words for every [UNK] in its input sequence.

~~~
gwern
It'll generate one token because it's trained to predict one missing UNK, as I
understood it. What is the scaffolding? Do you generate random sentences and
iterate repeatedly? And how does that get you whole coherent paragraphs? (Has
anyone demonstrated that this actually works with BERT?)

~~~
speedplane
BERT can pretty easily be used to generate text. It's intended to be used as a
base model and fine-tuned with an additional model on top. The fine-tuning
model could then be trained to generate sentences with the underlying language
model powered by BERT.
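One commonly proposed scaffolding can be sketched with a stand-in predictor (not BERT itself; whether iterated filling yields coherent paragraphs is exactly gwern's open question): start from a fully masked sequence and repeatedly fill positions until no masks remain.

```python
import random

MASK = "[MASK]"

def stub_predictor(tokens, pos):
    # Stand-in for BERT's masked-token distribution at position `pos`;
    # a real implementation would condition on the whole sequence.
    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    rng = random.Random(pos)  # deterministic for the demo
    return rng.choice(vocab)

def fill_masks(length=6):
    # Start fully masked, then fill one position per step until done.
    tokens = [MASK] * length
    while MASK in tokens:
        pos = tokens.index(MASK)
        tokens[pos] = stub_predictor(tokens, pos)
    return tokens

print(" ".join(fill_masks()))
```

Variants re-mask and re-fill positions for several refinement passes, which is where the "iterate repeatedly" part comes in.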

------
snrji
Tangentially, is there any estimate of the number of words/sentences that a
kid may have heard by the time he or she learns to talk?

------
rand_r
Maybe I’m too naive here, but I’m not seeing the potential malicious usage of
this model. People will generate text, and then what?

~~~
nestorD
Especially: what kind of usage could not already be achieved by asking a
human to write the text?

~~~
cthalupa
It makes it cheaper, faster, and effectively infinitely scalable. You'll run
out of man-hours and people to generate this stuff by hand long before you run
out of resources to spin up in the cloud.

With the main concerns being troll army/fake news type stuff, I don't think
this makes a difference. We seem pretty sure there are state level actors
behind a lot of that stuff, and I think it would be silly to believe they
can't recreate something at the level of GPT-2, especially with the underlying
principles out there and understood, competitors like BERT available, etc.

I think their heart is in the right place, but they're also incredibly naive.

------
notsgnik
in other words: "we can't release our findings before retraining on a filtered
Reddit dataset, because we haven't released our first findings for a reason...
stay tuned for more trained models with filtered content" (what's the fuss
about the first fully trained model?)

------
septor
Here’s a summary: we are aware of the fact that this model will harm society
and we are releasing it anyway. We are fiddling around with the way it’s
released in an attempt to absolve ourselves of blame while simultaneously
collecting the profit in the form of a juicy acquisition.

The net result of these advanced forms of signal processing will be negative.
Nobody has come forward to prove that they will benefit society on the whole
or even that they are safe. But anyone who raises concern is shouted down and
called names like “alarmist” and “Luddite.”

These companies are playing with fire, and the whole world stands to be
burned. Wake the fuck up.

~~~
feanaro
_Someone_ is going to invent this model sooner or later, simply because it is
possible. There is not much sense in trying to stop it. We just have to adapt.

~~~
septor
That’s not correct. What you are saying is that there is no plausible
organized effort that could stop or slow the creation of signal-processing
models that will have pronounced negative impacts. The error is on two levels:
you are leaning too heavily on analogy with other technologies, and you are
writing off the possibility of stopping AI when it’s still not clear that it
can’t be stopped.

This isn’t something that can be built and tested in isolation like other
things we are familiar with. Training these models is not an exact science.
Nothing about ai is an exact science. Progress only comes with trial and
error. And each trial requires huge compute resources; at least for the most
capable and dangerous models. It can’t be done in your basement. Not without
significant effort and drawing attention to yourself. Could we sense whenever
someone was trying to do it? Could we form a global coalition to stop every
attempt? That brings us to the next thing.

What you are doing is the following: we are both in a car that is about to
roll off a cliff. I propose that we try pressing the brakes. You respond by
saying that, geez it looks like we probably wouldn’t stop in time — we are
going awfully fast and it probably wouldn’t work to press the brakes so why
even try? Let’s just brace our heads and hope the impact doesn’t kill us.

Obviously the better thing to do is to try and press the brakes. Even if you
aren’t sure if you can stop in time.

~~~
jcims
I think if you try to be less cynical about what OpenAI is doing you might
even find them an ally to your perspective. Facebook throwing pocket change at
an AI ethics org and Google's rather embarrassing failure at staffing an
ethics panel of its own is evidence that we've got bicycle brakes on a freight
train.

OpenAI suffered a ton of blowback for not just releasing the full model from
the start. You can read their initial blog post [1], looking particularly at
the sections on Policy Implications and Release Strategy. I would also highly
recommend listening to Lex Fridman's podcast with Greg Brockman [2] to hear
their rationale for the recent org changes at OpenAI.

Obviously you can posit that everything they say is bullshit and they are only
after almighty dollars. I can't prove it's not true and personally believe
there is at least a kernel of truth to it, but we live in a messy world and
finding imperfect allies is generally better than having none at all.

[1] - [https://openai.com/blog/better-language-models/](https://openai.com/blog/better-language-models/)

[2] -
[https://www.youtube.com/watch?v=bIrEM2FbOLU](https://www.youtube.com/watch?v=bIrEM2FbOLU)

