
OpenAI Releases Largest GPT-2 Text Generation Model - epoch_100
https://openai.com/blog/gpt-2-1-5b-release/
======
vicaya
Just like various chat bot incarnations, it's reasonably OK at generating
generic sentences without real understanding. I made several attempts with
this leading fragment: "Is borrow checking unique to Rust? Probably not. But".
The results are not pretty (the best one follows). Not impressed at all
with this giant model :|

"Is borrow checking unique to Rust? Probably not. But the Rust library is
written by and for Rust programmers, so if you're looking for a library that's
designed for the language you already know, this might be your first choice.

The library is still in a fairly early stage. It does not support the standard
libraries.

If you're interested in reading about how the library works, the docs are up
on GitHub for reference!

How can I get this for my project?

Get it here:
[https://github.com/paulp/rustbinder](https://github.com/paulp/rustbinder)

I've just read your blog post and want to use it, what can I do?

If you're looking for a project to use this library for, check out the docs
for the "RustBinder project".

What are the current limitations and how will future changes affect this
library?

This is still a work in progress. The library currently does not support the
standard Rust library. There are a few work in progress"

~~~
csomar
Am I the only one impressed by the text generated? Sure, it doesn't have any
understanding, but are you factoring in that 1. most people in the world do
not know that Rust is a programming language, and 2. a single person cannot
have that much general knowledge? Sure, they can know about the Rust borrow
checker but will not be able to expand that much on another subject.

~~~
thesz
This looks like it can copy large chunks of text. That's it.

It is a hard task in itself ("copy" is a standard test for memory-enhanced
neural systems like RNNs and such), but here it needs to "understand" things:
for example, what makes the borrow checker in Rust unique, which would be a
valid continuation.

~~~
nmca
Have you tried googling sentences generated?

~~~
thesz
Have you?

[https://www.google.ru/search?newwindow=1&ei=beXCXciEHqyyggem...](https://www.google.ru/search?newwindow=1&ei=beXCXciEHqyyggemoo6ACg&q=%22The+library+is+still+in+a+fairly+early+stage.%22&oq=%22The+library+is+still+in+a+fairly+early+stage.%22&gs_l=psy-
ab.3...5284.5743..6204...0.0..0.293.554.2-2......0....1..gws-
wiz.OY6LIS32OuM&ved=0ahUKEwiI76eK8tXlAhUsmeAKHSaRA6AQ4dUDCAo&uact=5)

------
rm_-rf_slash
At a credibility score of 6.91/10, many people will rightly judge that the
full GPT-2 model will remain insufficient for malicious use in creating fake
news.

However, even the smaller models are already good enough for
spamming/trolling/astroturfing. It doesn’t take a Shakespearean soliloquy to
convince people of a point. Just enough of a flood of short 1-3 sentence
pro/con comments on a forum can drastically affect the perceived public
opinion of an issue. Those comments can then spur real people to reply, which
could result in an ultimately organic but directed propaganda vector.
Propaganda directors will carefully craft something for people to look at, and
the GPT-2 bots will move people’s eyes in that direction.

You can see the same happen on r/subsimulatorgpt2, where the longer titles and
prompts and replies eventually sprawl into incoherence, but the shorter
sentences from the finetuned bots in the comments section are effectively
indistinguishable from the kinds of short comments you would find on their
respective subreddits.

Or in other words, the malicious uses for GPT-2 won’t be a tidal wave, but a
flash flood.

~~~
antpls
> Just enough of a flood of short 1-3 sentence pro/con comments on a forum can
> drastically affect the perceived public opinion of an issue.

Even more than public opinion, it can affect the results of sentiment analysis
algorithms about a topic. Those algorithms run on all the comments or tweets
and output an overall sentiment score, which is then used as "insight" to
make actual decisions by human decision makers (journalists, analysts,
marketers) and/or is used as input for other algorithms.
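
A minimal sketch of that risk, assuming a crude hypothetical word-list scorer in place of a real sentiment model: a flood of short generated comments flips the aggregate score without a single organic opinion changing.

```python
# Hypothetical lexicon scorer standing in for a real sentiment model.
POSITIVE = {"good", "great", "love", "support"}
NEGATIVE = {"bad", "terrible", "hate", "oppose"}

def sentiment(comment):
    """Crude score: +1 per positive word, -1 per negative word."""
    words = comment.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def aggregate(comments):
    """Mean sentiment over all comments: the 'insight' fed downstream."""
    return sum(sentiment(c) for c in comments) / len(comments)

organic = ["I hate this policy", "terrible idea", "I oppose it"]
bot_flood = ["great policy and I support it"] * 10  # short generated comments

print(aggregate(organic))              # clearly negative
print(aggregate(organic + bot_flood))  # flipped positive by volume alone
```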

~~~
rm_-rf_slash
Holy shit I hadn’t even considered that. Thank you, that gives me a lot to
think over.

------
krick
Wow, some samples are frighteningly good. I was impressed by previous models
and I don't know if I'm just lucky this time, but... wow. Can anybody who is
not into climbing even tell this is all fake?

 _Jain Kim is an experienced climber._

In 2006, she became the first woman from Korea to climb all five 8,000 meters
(24,064 ft) peaks in the Swiss alpine ski run Alps in 24 hours. In 2009, she
made history again by setting the record for the fastest time to climb an
8,000 meter peak with a team from China and South Korea.

She made the first ascent of 8,832-meter K2 in China, the second highest
mountain in the world, in 2009 and the third highest mountain in Europe. She
also is the first female Korean to summit a world-class peak.

During her two years as a mountaineering professor at Sogang University in
Korea, she established two new routes in the Yalu River area. The first of
these routes is a 3,547-meter peak named K2 on Mount Long in China. Her second
route is on the same mountain, called the Lomonosov Ridge, at 3,632 meters.

~~~
pure-awesome
> Can anybody who is not into climbing even tell this is all fake?

Yes, quite clearly from the following:

> She made the first ascent of 8,832-meter K2 in China, the second highest
> mountain in the world, in 2009 and the third highest mountain in Europe.

Firstly, this sentence scans poorly. I'm guessing it should be:

> In 2009, she made the first ascent of 8,832-meter K2 in China, the second
> highest mountain in the world, and the third highest mountain in Europe.

Second, how can a mountain in China be the third highest mountain in Europe?
How can the second highest mountain in the world be the third highest in
Europe?

If I came across this in the wild, then even if I didn't think it was fake,
I'd definitely think it was poorly proofread.

~~~
nojvek
Oh well, contrary to all the AI hype, computers don’t really get what
mountains and heights are.

We are still quite far away from really understanding language and making
inferences from it.

------
clmnt
We (Hugging Face) added it to Write With Transformers if you want to try the
text generation capabilities of the model:
[https://transformer.huggingface.co/doc/gpt2-xl](https://transformer.huggingface.co/doc/gpt2-xl)

~~~
toxik
Enjoyable! It’s really going to change the spam game, that’s for sure.
Hopefully we can also use these models to estimate how realistic a sentence
is.
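
One standard measure for this is perplexity: a language model assigns the text a probability, and implausible text scores high. A toy sketch of the idea, with a tiny hand-picked bigram table standing in for the model (in practice you would score with GPT-2's own token log-probabilities):

```python
import math

# Hypothetical bigram probabilities; a real system would use GPT-2's logits.
BIGRAM_PROB = {
    ("the", "cat"): 0.2, ("cat", "sat"): 0.3, ("sat", "down"): 0.4,
}
FLOOR = 1e-6  # smoothing probability for bigrams the model has not seen

def perplexity(tokens):
    """Inverse geometric-mean probability per token; lower = more fluent."""
    log_prob = sum(math.log(BIGRAM_PROB.get(pair, FLOOR))
                   for pair in zip(tokens, tokens[1:]))
    return math.exp(-log_prob / (len(tokens) - 1))

print(perplexity("the cat sat down".split()))  # low: looks like real text
print(perplexity("the sat cat down".split()))  # high: scrambled
```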

~~~
aglionby
Some work has been done in this direction!
[https://arxiv.org/abs/1905.12616](https://arxiv.org/abs/1905.12616)

------
epoch_100
Paper:
[https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf)

Code: [https://github.com/openai/gpt-2](https://github.com/openai/gpt-2)

~~~
sillysaurusx
If anyone wants to fine-tune the 1.5B model, I ported the gpt-2 code to TPUs.
You can fine-tune it in Colab. Snapshots are 5.8GB.

notebook:
[https://twitter.com/theshawwn/status/1191800180192010246](https://twitter.com/theshawwn/status/1191800180192010246)

code: [https://github.com/shawwn/gpt-2](https://github.com/shawwn/gpt-2)

It's a fork of nshepperd's gpt-2 codebase
([https://github.com/nshepperd/gpt-2](https://github.com/nshepperd/gpt-2))
which lets you fine-tune 117M and 345M on GPUs.

For a tutorial on how to fine-tune GPT-2, see
[http://gwern.net/GPT-2](http://gwern.net/GPT-2)

~~~
zitterbewegung
Cool, this is awesome!

I’m going to try to retrain this with a twitter dataset called sentiment140
(I have already processed it with GPT-2 345M).

~~~
MasterScrat
Is your fine-tuned model available somewhere?

~~~
zitterbewegung
I can provide it to you. I have only done 355M. I was trying this with 1.5B
but ran into memory issues.

~~~
MasterScrat
I would be very interested! My email is on my profile.

~~~
zitterbewegung
Email sent

------
rfhjt
Prompt: "Real things don't exist unconditionally and things that exist
unconditionally are not real. However the reality has an essense. It is"

Response: "an actual thing, and it is not the thing to which we attach
meaning. It is not real because it is not a thing. And therefore, it does not
possess the qualities that are inherent in all real things."

Just wow. Sure, there are a few logical mistakes here, but this response
serves as a good prompt for my bio-GPT. In other words, we usually need some
starting points or hints for analysis, and discovering these hints is
non-trivial because whatever we can think of is not very new to us. This GPT
just gave me an answer that smells like serious wisdom, and I'll surely dig in
that direction to see if this idea has any substance.

Edit: what's happening here is that while I can't ask this model to give me a
short and concise summary on a topic, I can still interrogate this model and
find out what it's seen in the training set. I can't possibly read all the
books in the training set, but now I can rapidly navigate the multidimensional
meaning space: I tell it where to start and it says what it sees in close
proximity to my prompt. This is a breakthrough.

~~~
DiogenesKynikos
Maybe you've just proven that philosophical tracts can be written without any
understanding. If the wording is sophisticated enough, everyone will assume
there's some deep, hidden meaning in it. All form and no content.

------
hint23
You can try it at: [http://textsynth.org](http://textsynth.org)

~~~
tptacek
The right place for a chill after work beer. Free peanuts, _but don’t order
“the other” in the name of science. Free peanuts, but don’t order “the other”
in the name of science. The last thing you’ll see after work is a sign that
says, “Sorry we are closed on Tuesday because we have peanuts.” The last thing
you’ll see after work is a sign that says, “Sorry we are closed on Tuesday
because we have peanuts.“_

~~~
endergen
This is some Escher-level shit. These words, just like his paintings, make
total sense within your window of vision as you move your eyes through them,
but make no sense as a whole.

~~~
Mirioron
I would not be surprised at all to find a paragraph like that in some fiction,
especially in poem form.

------
buboard
> (CTEC) found that extremist groups can use GPT-2 for misuse, specifically by
> fine-tuning GPT-2 models on four ideological positions: white supremacy,
> Marxism, jihadist Islamism, and anarchism. CTEC demonstrated that it’s
> possible to create models that can generate synthetic propaganda for these
> ideologies

I wonder how they tested that

~~~
TaylorAlexander
wow I’d love to read the Marxist and anarchist texts it has produced. I wonder
if they used good source material.

I would bet it gets the talking points but can’t convey the subtlety.

~~~
FillardMillmore
Imagine the hilarity of a robot preaching anarchism; it would make for quite a
laugh: "AI good, government bad. Government taxes you and makes you pay. AI
thinks for you and does what you say. Imagine the day, AI comes to stay, we'll
build the roads for you to play, and make the evil government go away, let's
rise together, perhaps today?"

~~~
htmk
I'd battle against the government with this AI.

------
chaz6
Surely we are not far off from models capable of submission-quality essays
that will enable a new generation of cheating.

~~~
JRKrause
From my observation, even the largest GPT-2 model has difficulty retaining any
long-range relationship information. In the "unicorn" writing example that was
published originally, the model 'forgets' where the researchers are (climbing
a mountain versus being beside a lake iirc) after just a few sentences.
Because of this, it's hard to imagine models of this type being able to write
long-form coherent papers. Now if we could somehow constrain the generated
text to conform to a predefined graph structure that isn't forgotten so
quickly...

~~~
jsinai
Maybe the problem is that most of these models rely on sequential information
(even the transformer needs this for forward generation of text) to encode
long-range information.

But I can’t remember the last time I relied on sequentially remembering the
ordering of tokens in order to complete an essay or, hell, even reply to an
email.

Structurally we retain some kind of hierarchical information (topic, places,
names, events) about text.

Is there any active research looking into text generation models which do
this? Maybe some kind of query that is made in a learned vector space and
which is not temporally dependent but rather “spatially”: as in, these are
the facts about the text being generated so far.

~~~
flancian
I'm interested in this as well.

I have been trying to fine-tune GPT-2 on genre fiction to work as a sort of
"fiction replicator". Stylistically it actually does quite reasonably well,
but it lacks narrative cohesion. This problem, as you point out, is
corpus-agnostic.

I thought of trying to keep track of characters and key interactions outside
of the model, but I haven't figured out how to make these two models interact
reliably -- outside of just having the first component generate prompts for
the second model in a kind of cooperative setting.

Is there a known way to set up a transformer to do infix generation? That is:
give it a start _and_ end prompt, plus an estimated number of tokens to fill
in between. That seems like it should be doable and could improve things, but
I haven't found any work on this problem and haven't had the time (and
potentially don't have the skills) to look deeply myself yet.

------
rfhjt
Prompt: "The coming global recession is a real possibility and"

Response: "The coming global recession is a real possibility and the Fed is
playing games, creating artificial market conditions to make a recovery seem
possible in the short-term. The Fed has an option to change its monetary
policies but it will not make the problem go away, so it is in their best
interest to pretend it won't happen."

Change "and" to "however" and you'll get another stereotyped opinion. It
really just composes pieces of text it's seen around the prompt, but it does
this really well.

Most of the news agencies can now fire most of their monkey typewriters: this
GPT will outperform them on every metric.

------
k8si
Omfg can we stop making these things bigger PLEASE

Like, who cares??

Edit: What I mean is, text gen models are big enough. We need controllable text generation, so it can talk about a specific THING sensibly rather than spew statistically plausible nonsense.

------
oaskmutboard
I think this could make a great Tinder feature to suggest chat lines.

~~~
YeGoblynQueenne
Oh, I can imagine a few:

<input> "If I told you your body is hot, would you hold it against me?"
<input>

<output> It was hot and the body of a young woman was lying in a bloody hell.
Hell was hot and was full of beautiful young women. The body was lying in the
entrace of the lobby and there was a small crowd gathering. it was a hot day
in hell <output>.

Could work on the right person though.

------
gerash
Sampling realistic text from large pretrained models is non-trivial. I came
across this paper in one of the ACL 2019 workshops:

[https://arxiv.org/pdf/1904.09751.pdf](https://arxiv.org/pdf/1904.09751.pdf)
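
That paper proposes nucleus (top-p) sampling: sample only from the smallest set of most-likely tokens whose cumulative probability reaches p, discarding the degenerate long tail of the distribution. A minimal sketch:

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample a token index from the top-p 'nucleus' of the distribution."""
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(probs)[::-1]              # token ids, most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # smallest set with mass >= p
    nucleus = order[:cutoff]
    weights = probs[nucleus] / probs[nucleus].sum()  # renormalize inside it
    return rng.choice(nucleus, p=weights)

probs = np.array([0.5, 0.3, 0.1, 0.05, 0.05])  # toy next-token distribution
token = nucleus_sample(probs, p=0.7)           # only tokens 0 and 1 qualify
```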

------
ionwake
Sorry for asking, but is there an example input and an example output?

~~~
bungula
You can interact with the full model here:
[https://talktotransformer.com/](https://talktotransformer.com/)

~~~
minimaxir
Huh, he updated that to the full model quickly.

------
490d0aff0ee8
Tangent rant.

I'm skimming over some of the code at
[https://github.com/openai/gpt-2/blob/master/src/model.py](https://github.com/openai/gpt-2/blob/master/src/model.py)
and I can't help but feel frustrated at how unreadable this stuff is.

1. Why is it acceptable to have single-letter variable names everywhere?

2. There's little to no documentation in the code itself. It's unclear what
the parameters of any given function mean.

3. There are magic constants everywhere.

4. Function names are so terse ("gelu", "attn").
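
For what it's worth, `gelu` in that file is the standard tanh approximation of the Gaussian Error Linear Unit activation. A hypothetical, more self-documenting version of the same math (an illustration of the complaint, not a patch to OpenAI's file):

```python
import numpy as np

def gaussian_error_linear_unit(x):
    """Smooth ReLU-like activation used throughout GPT-2 ('gelu').

    Tanh approximation of the Gaussian Error Linear Unit:
        gelu(x) ~ 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x**3)))
    Behaves like the identity for large positive x and like zero for
    large negative x, with a smooth transition around the origin.
    """
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (x + 0.044715 * x ** 3)))
```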

~~~
high_derivative
My professional observation (as ml researcher at big tech):

These companies hire a lot of engineers straight out of undergrad or master's
degrees. The interviews test leetcode knowledge, and today many degrees are
heavy on Python-scripted ML homework.

The result is companies with billion-dollar funding and world-changing goals
having a lot of their code look like complete spaghetti.

And these are the engineers who are meant to clean up research scientists'
code. Scientists generally don't feel it's their responsibility to write
strong code.

Systems-side teams/orgs have better code, but essentially as soon as you enter
the 'ml engineer/research engineer/research scientist' layer, it's doomed.

~~~
gdb
(I work at OpenAI. Before that, I worked at Stripe. I've spent most of my
software career thinking about how to build effective engineering cultures.)

I think this code is actually well-written and maintainable. This is proven in
practice because we've adopted it many places in OpenAI, and I've personally
found it very easy to adapt to other use-cases (certainly much more so than
the from-scratch Transformer implementations I've written!).

As
[https://news.ycombinator.com/item?id=21456605](https://news.ycombinator.com/item?id=21456605)
points out, the complexity of the code arises from the complexity of the
underlying algorithm. Complexity due to software engineering concerns, like
Tensorflow scopes, is elegantly handled. [edited for clarity:] Writing a
Transformer in 174 lines of code requires a lot of deep thinking about the
right underlying abstractions.

> but essentially as soon as you enter the 'ml engineer/research
> engineer/research scientist' layer, it's doomed.

We actually don't do this! Our only official technical title is "member of
technical staff". (People sometimes choose to self-identify as an engineer or
researcher, so you might see that on LinkedIn, but we don't have a distinction
internally.) Everyone is responsible for their own code, and people care quite
a bit about writing code that others can build on.

~~~
garmaine
I’m very sorry to see someone who obviously cares so much defending this
code. It does not follow best practices, and pointing to the complexity of
the underlying algorithm is just an excuse. Complex code can be beautiful and
well documented.

Writing a complex method in 174 lines is neither elegant nor beautiful.
Writing a well-documented file that can take an engineer from a different
specialty and bring them up to speed in 1,000 lines is.

~~~
gdb
We also have code like that. For example, that's the explicit goal of the
Spinning Up repo:
[https://github.com/openai/spinningup/blob/master/spinup/algo...](https://github.com/openai/spinningup/blob/master/spinup/algos/ddpg/ddpg.py)

In practice, it's much harder to use that code, and we tend not to consume
code like that internally. There's a real tradeoff!

~~~
falcor84
ddpg() takes 17 parameters and is over 200 lines long. I'm very far from being
a domain expert, but having worked in other complex domains, I'm pretty
confident this can be redesigned such that it's both more maintainable and
more pleasant to use.

~~~
jachiam
Hello! Spinning Up author here. I would love to hear your thoughts on this! So
far I have had a lot of success teaching people about DDPG using this code
example, but I'm grateful for every fresh perspective. :)

Feel free to reach out by email, jachiam at openai.

~~~
garmaine
There is no function in the world that should ever take 17 parameters. If the
algorithm permits such configuration, as I am sure it does, then it should
take a configuration object which has all these values. The object could then
be constructed using special purpose factories that take fewer parameters, and
then customized from there as needed.

It may be an indication that the whole thing needs refactoring though.
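
A sketch of that pattern with hypothetical names (not the real `ddpg()` signature): bundle the knobs into one immutable config object, build common baselines with small factories, and customize from there.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TrainConfig:
    """One object holding every knob the algorithm reads."""
    learning_rate: float = 1e-3
    batch_size: int = 100
    gamma: float = 0.99            # discount factor
    buffer_size: int = 1_000_000   # replay buffer capacity
    # ...the remaining knobs would live here too

def smoke_test_config():
    """Factory: tiny settings for a quick correctness check."""
    return TrainConfig(batch_size=4, buffer_size=1_000)

# Customize a factory's baseline instead of passing 17 positional arguments:
config = replace(smoke_test_config(), learning_rate=3e-4)
```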

~~~
matz1
You can refactor it that way, but then you make it unnecessarily more
complicated.

