
GPT2, Counting Consciousness and the Curious Hacker - exolymph
https://medium.com/@NPCollapse/gpt2-counting-consciousness-and-the-curious-hacker-323c6639a3a8
======
jcims
FTA:

 _OpenAI decided not to release the full scale version of their model (1.5B),
because they were afraid of its potential security implications, in particular
in generating fake news._

One thing I believe gets inadequate recognition, including in this article, is
OpenAI’s stated motivation for their release strategy. From the GPT2 release
announcement[1]:

 _This decision, as well as our discussion of it, is an experiment: while we
are not sure that it is the right decision today, we believe that the AI
community will eventually need to tackle the issue of publication norms in a
thoughtful way in certain research areas. Other disciplines such as
biotechnology and cybersecurity have long had active debates about responsible
publication in cases with clear misuse potential, and we hope that our
experiment will serve as a case study for more nuanced discussions of model
and code release decisions in the AI community._

It was, in part, due to direct safety concerns, but it was also to force the
conversation about how to handle the release of legitimately dangerous
capabilities in the future.

[1] - [https://openai.com/blog/better-language-
models/](https://openai.com/blog/better-language-models/)

~~~
Iv
I would have accepted that from a publicly funded research lab. From a group
funded by private interests with very personal views of the future and of
progress, I think it is fair to suspect ulterior motives.

------
ve55
Short summary of the GPT2-related points from this article as it is quite
long:

1\. GPT2 is the next inevitable step in the trend of AI-generated content, and
powerful organizations are already capable of these things

2\. Delegating your trust to algorithms or other groups is not a good solution
to fake news

3\. Humans are still smart and will be able to adapt

Actual GPT2-1.5B replication post:
[https://medium.com/@NPCollapse/replicating-
gpt2-1-5b-86454a7...](https://medium.com/@NPCollapse/replicating-
gpt2-1-5b-86454a7f26af)

On a side note, I'm still amazed that the NYT is able to uphold its curated
public reputation so well that it is still consistently used as an example of
an organization that would never dare publish something low-quality.

------
minimaxir
The general 1.5B model is not a threat to anything: it's the _finetuned_ GPT-2
models and conditionally generated text from GPT-2 models that have been
coming out lately which will take the next step.

Here are a few apropos outputs from a GPT-2 model I'm working on that
conditionally generates programming article titles from the keywords "Java",
"AI", and "death":

In the age of machine learning, is a human becoming too powerful? If so, what
impact does AI have on how we view death?

An AI death machine created in Python

The death of an AI: What the world is looking to AI for now

Why Is AI So Bad At Killing And Exploring For People In Java?

Java is a death sentence – and AI is coming for all of us

How to avoid death by AI in Java

The AI is using Java to make it's own deathmatch

Google's "AI for death" is now a death sentence for Java

Why is Java a death sentence for AI?

Java is not a death sentence, but a step towards AI

The AI is dying... And it's not even on Java.

Go to Java to see the death of AI

Google's AI will stop every death before it's too late
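The comment doesn't spell out how the keyword conditioning is encoded; a minimal sketch of one way to format such prompts (the `~` separator and the `make_prompt` helper are hypothetical illustrations, not minimaxir's actual scheme):

```python
def make_prompt(keywords, title=None):
    """Build a conditioning prompt: keywords on the first line, then the
    title. At training time the known title is included; at generation time
    it is omitted and the model completes it."""
    prompt = "~".join(keywords) + "\n"
    if title is not None:
        prompt += title + "\n<|endoftext|>\n"
    return prompt

# Training example (keywords plus a known title):
print(make_prompt(["Java", "AI", "death"], "How to avoid death by AI in Java"))
# Generation prompt (keywords only; the model fills in a title):
print(make_prompt(["Java", "AI", "death"]))
```

The idea is that after fine-tuning on many such pairs, sampling with only the keyword line as a prefix biases the model toward titles mentioning those keywords.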

~~~
zitterbewegung
I have trained a model based on Donald Trump's tweets. I used the GPT-2
345M model to generate the tweets below. It took around 12 hours to get loss
numbers <= 1.0. I used
[http://trumptwitterarchive.com/](http://trumptwitterarchive.com/) to get
every tweet, which was nicely indexed. A WIP of it is at
[https://colab.research.google.com/drive/1xRXqEzBO5x_62z7I9Eq...](https://colab.research.google.com/drive/1xRXqEzBO5x_62z7I9EqI5tCpmBpC2WFV)
Here are some samples from it.

Alpha male: @marthamaccallum is a huge supporter of @realDonaldTrump and an
amazing person. He is incredible on women he is great for business.

Join me on Wednesday May 17 in Albany New York at 6pm for a #TaxReform rally!
Tickets: [https://t.co/TN1QMA7bio](https://t.co/TN1QMA7bio).
[https://t.co/yIqBTUKrRC](https://t.co/yIqBTUKrRC)

A must watch: Attacking the "Dossier" is like attacking a sore thumb - bad for
the country!

Why isn’t Hillary Clinton being investigated for her dealings with Russia-
Russia? There is no reason why her corrupt ledger should still be closed. (and
of COURSE!)

Totally illegal Donna Brazile calls Barack Obama "The Drilling America" on her
Sunday morning television show.

Unbelievable evening in Albany New York last night! Disgraceful given the way
Crooked Hillary is treated. There is no way this should be happening. She is a
disaster!

So funny Bernie Sanders saying that the Democrats should go to a
constitutional amendment, like our present system. They will never win if we
continue toovertake!

Very sad that Crooked Hillary Clinton can't close the deal with Bernie
Sanders. I can't imagine she has $10 million on hand! A vote for her is a vote
for defeat.

Thank you Ed Klein! It is so great how you have done so much for the people of
Vermont. So proud of you as their Governor. Now need your support to
#MakeAmericaGreatAgain!

Wow Crooked Hillary Clinton has just indicated that she will not run in 2020
&amp; that Bernie Sanders is a fraud! What does she have to hide!

Will be doing @TheIssue with @Greta at 9pm tonight

Heading to Albany New York for an interview with @CNN. Enjoy!

Join me in Albany New York- tomorrow evening at 6pm! #NYPrimary #MAGATickets:
[https://t.co/NVTaAQ53Zo](https://t.co/NVTaAQ53Zo)
[https://t.co/Z0cKQhC8Up](https://t.co/Z0cKQhC8Up)

"@greta: Will @realDonaldTrump run? If he doesn't run and if we don't get a
real developer and if we don't get a businessman that can get us the answers
we seek

So many false and phony T.V. commercials being broadcast in New York State.
Federal Election Commission rules prohibit "buying or selling of political
ads."

I very much appreciate the many nice statements Mayor Stephanie Rawlings-Blake
has made regarding the need to rebuild and restore trust. AARP currently
representing VA

Unbelievable howarty Shore needed a rebuild just like the people of Virginia.
So much corruption - big money and all. RAISE YOUR HANDS!

.@AlexSalmond Can you believe-you spent $1000000 to take this issue to the
Scottish Parliament. You couldn't even get United Kingdom!

for Scotland! [https://t.co/HVruc6Up48](https://t.co/HVruc6Up48)

I agree! [https://t.co/R0rRYTJ7Xh](https://t.co/R0rRYTJ7Xh)

I agree! [https://t.co/d2tsQS0YTb](https://t.co/d2tsQS0YTb)

We are getting down to the wire on our Great American Infrastructure.
Presidential Memorandum on #Infrastructure was well received and easy to
understand.

Unbelievable howarty Shore needs a rebuild just like the people of Virginia.
So much corruption - big money and all. RAISE YOUR HANDS!

We must stop the crime and human trafficking taking place on our streets. It
is very sad that many more people are being robbed and killed by certain
groups. We must get smart!

"@XKidd92: @realDonaldTrump help me Donald Trump I am so tired of reading
about your so called "victories" !"

"@BeaumontAnthony: @realDonaldTrump you are the man Mr Trump!" Thanks.

"@BitcoinMoneyMan: Bitters night at the Trump Hotel in Chicago. Tin Roof is
better than the day I became a citizen. #nice!"

"@TiniQBoy: @realDonaldTrump don't let it be said you were a great guest of
Putin. You were a good friend. Let the truth be known!"

"@BitcoinMoneyMan: @realDonaldTrump I would call my daughter Ivanka that. She
is a great person. She has a lot of class."

"@TWIAMundo: @realDonaldTrump you were a great role model in life. Make us all
proud. We are all very proud of you!

"@JenaFeuers: @realDonaldTrump @MillionWit Yes! We are so happy you are coming
back to Vegas! It is such an amazing place."

"@ChrisFoley_: Just set my DVR to record Celebrity Apprentice starting next
Sunday night at 9 pm EST"

"@liamvanvorhis: @realDonaldTrump attending Canizal 2015 in Verona. En Charles

~~~
convivialdingo
That’s eerily good at replicating the style. Even the links are styled
properly.

~~~
Reelin
Larger blocks of text are often quite eerie or surreal now. Example:
[https://www.reddit.com/r/SubSimulatorGPT2/comments/bxe3rn/i_...](https://www.reddit.com/r/SubSimulatorGPT2/comments/bxe3rn/i_believe_that_a_lot_of_the_problems_in_your/)

------
namuol
Here are some samples generated from this version of the 1.5B parameter model:
[https://github.com/ConnorJL/GPT2/blob/master/samples/](https://github.com/ConnorJL/GPT2/blob/master/samples/)

I'm not particularly impressed with the results.

Here are some random samples from OpenAI's version of the model for
comparison:
[https://raw.githubusercontent.com/openai/gpt-2/b5ef71a922efc...](https://raw.githubusercontent.com/openai/gpt-2/b5ef71a922efc2357f2e668182a58d80414c7e03/gpt-2-samples/)

Of course, take these with a few grains of salt; these aren't apples-to-apples
comparisons, etc.

------
lovasoa
The published samples from the author's replicated GPT2-1.5B [1] look nothing
like the ones from OpenAI's original publication. I really would have loved to
see a student with no funding replicating a super large state of the art
model, but unfortunately he must either have collected lower-quality data for
training, or have failed to replicate every aspect of the model. Almost none
of the generated samples have anything to do with the input text, and many of
them don't even make any sense.

[1]
[https://github.com/ConnorJL/GPT2/tree/master/samples](https://github.com/ConnorJL/GPT2/tree/master/samples)

~~~
mcguire
From the Readme:

" _Unlike in OpenAI's blogpost, I did absolutely nothing to manually cherry
pick the quality here. Some examples are good, most are bad. Almost all
outputs decay in quality as the post gets longer. I wanted to keep the raw
outputs to give an accurate feel of the strengths and failure cases of the
model. The truth is that using a GPT2 type model to create text you want is
more an art than a science and can be very finicky._"

~~~
gwern
The problem is OA also specified the level of cherrypicking, including
sometimes picking the very first one. Do any of the conditional samples,
picking out of 1 or 5, look like the OA blogpost ones? I didn't read them all,
but they didn't look like it. Or much better than the 345M recently released,
for that matter.

~~~
YeGoblynQueenne
I don't know why it is assumed that eyeballing text generated by a neural net
can give an accurate measure of the network's quality. Evaluating this sort of
output is difficult and there are no good metrics for it.

Then of course there is the fact that the network can generate text a lot
faster than a human can read it. Think for a moment how many distinct passages
these systems can generate (possibly even infinitely many) and how few of them a
single person can realistically hope to read. It's very hard to know if the
few passages one ends up reading are representative of the output of the
network, or not.

It'd be even harder to compare two systems just by eyeballing their output
side-by-side.

~~~
gwern
You have to eyeball it because (a) 'what it looks like to a human eyeballing
it' is the most important metric for misuse - no one cares that much about a
model which has a slightly better log loss but a human can instantly spot is
robospam, that can't be abused or used for fun projects like generating
poetry; and (b) the log loss between GPT-2 and his will not be comparable due
to the different training corpuses (and possibly architectural differences as
well), so the actual metrics are not useful.

Given the large difference in quality, it only takes a few comparisons. Again,
just _look_ at the samples. Read a few. Do they _really_ look the same in
coherency and quality and realism?

~~~
YeGoblynQueenne
Do you think that if you were given a sample of text generated by one of the
two systems, you would be able to tell which system generated it?

Looking at two texts side-by-side while knowing which is which may well create
an illusion of striking differences, which however are not really there. This
is why concrete metrics are necessary, because human judgement is biased and
inconsistent.

But, like you say, there are no good metrics. In fact, the only thing we have
to go by when discussing OpenAI's model in the first place is opinions and
convictions. Not least the opinion of OpenAI that their model generates text
of unprecedented coherence, which itself is not based on anything other than
eyeballing.

~~~
gwern
I think I could. Few or none of the new samples even manage to maintain
coherency in a sample, and you can see all sorts of garbage prose being
generated.

~~~
YeGoblynQueenne
Perhaps you could. Personally, I think it's impossible to get an accurate
comparison in this way.

~~~
gwern
I think it is possible, because it turns out I am right: the quality is _much_
lower when compared directly on perplexity as well:
[https://medium.com/@NPCollapse/addendum-evaluation-of-my-
mod...](https://medium.com/@NPCollapse/addendum-evaluation-of-my-
model-e6734b51a830)
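For readers unfamiliar with the metric: perplexity is the exponential of the average negative log-likelihood the model assigns to each token, so lower is better. A minimal sketch (the function name and inputs are illustrative, not from the linked evaluation):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean per-token log-probability.
    A model that assigns every token probability p has perplexity 1/p."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model giving each token probability 0.25 has perplexity 1/0.25 = 4.
print(perplexity([math.log(0.25)] * 10))
```

Note gwern's earlier caveat still applies: perplexities are only directly comparable when the models share a tokenizer and evaluation corpus.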

------
solidasparagus
Not once in this blog did the author address why OpenAI has chosen to release
the model in the way they have and why he disagrees with them, other than the
belief that 'it's probably not possible to distinguish between GPT-2 and
humans so let's not bother giving people a chance to try to develop
techniques'. Blithely dismissing what OpenAI is trying to do is incredibly
arrogant.

\- "And I think no currently existing technique can even scratch the
capabilities humans have in this area [detecting truth], even with all of
their biases".

This is literally why OpenAI is releasing the model slowly.

\- "I really do think that if people just knew that something like GPT2 exists
and is out there in the wild, it would force them to improve their standards
for what information they trust".

I disagree. Look at how the knowledge that 'shills' and 'russian trolls' exist
has shaped online discussions. It becomes an easy way to dismiss things that
challenge your worldview, but does not significantly improve their ability to
determine veracity.

\- "I think we have reached a point where it is no longer, in general,
possible to determine whether a given text is human generated or not".

\- "It means that even if we had a system that can perfectly detect AI
generated babbling and deployed it, it would censor not just AI, but a good
chunk of real, human communication as well".

Based on what? This is a very thoughtless dismissal of a very important (and
very much unanswered) question. Deep learning often leaves identifiable
artifacts, such as deep fakes not blinking. GPT-2 is impressive, but to assume
that it is identical to human generated text is a lazy assumption.

\- "I’m as anti-war, pro-human as you could imagine, but if I was alive in the
40s, and the US offered me to work on the Bomb…I don’t know if I could have
said no. Not because I wanted to hurt people, this is the important thing to
understand about the curious hacker, but because it was just so damn cool.
Splitting the literal building blocks of matter itself to make a giant
explosion? That’s fucking awesome!"

\- "a digital 21st century Manhattan project (Which, again, sounds like heaven
to me)".

There are no words to describe my disgust at this sentiment. I'm not anti-war,
but I think prioritizing 'having fun' over human lives is evil.

~~~
NPCollapse
These are some of the best counterarguments I've heard yet. Would you be
interested in discussing this at more length over email or DM? I'm sorry I
have done and said things you think are wrong. I am genuinely trying to do my
best to do what is right.

~~~
solidasparagus
I'll follow up on this when I have time, but I should say that, while I do
stand by everything I said in my post, my tone is harsher than I would like
for reasons unrelated to you (I'm in a bad mood).

------
macawfish
Okay... I got hit with the shivers when the point came together at the end.
Too much coffee today? Who knows. I think and read about this stuff quite
seriously when I'm not burying my head in the sand about it. But this post got
me on a spiritual level in a way that a lot of tech writing usually doesn't.

Aside, it's a very beautiful example of text that isn't babbling, rambling as
it may be. Although now I wonder about a time when this level of elucidation
might be arbitrarily generated in bulk by some ultra finely tuned blob of
mathematical functions. Trained on human thoughts, of course. Maybe we're not
all that far off.

I became acutely aware at various points while reading this of the powers of
silence, listening, slang, poetry, signifying, innuendo, of knowing, of secret
languages, of stories, curiosity, of music. Amongst others, these are deep
gifts we have, and I take some comfort in remembering them.

 _My take: I'm fairly confident in saying I think this was written by an
actual human._

I'm usually a little bashful about showing my music, but it was kinda uncanny
how much I was reminded of an old song:

[https://soundcloud.com/micahphones/capsized](https://soundcloud.com/micahphones/capsized)

------
scarejunba
All the GPT2 output just seemed like spam text that you find on forums that
don't have Akismet or something installed. Honestly, I've never found it
either interesting or dangerous.

------
6gvONxR4sf7o
NPCollapse, do you have any metrics on the quality of your replication?
Without any mention of metrics in either post, I'm skeptical that you've
really replicated their results, which would render everything else rather
moot. As you said, trust is hard.

------
_Microft
With the growing power and capabilities of individuals, the only thing that
really stands between us and disasters of any kind (physical, social,
environmental, ...) is the utter incompetence and lack of creativity of bad
actors.

(This is not meant to say that this guy is bad, in fact neither he nor the
content of the article matter at all. The only necessary context is that a
single person decided to do what a group of top-notch researchers considered
as maybe a bad idea.)

------
ptest1
Another argument the author may not have considered is that by prematurely
releasing his implementation, he may be making it more (not less!) likely that
future discoveries are hidden away from the public.

It seems OpenAI wanted to release this project in phases, allowing people time
to adjust to its nature. If in the future an even more disruptive project is
created (by OpenAI or others), and the creator feels they cannot release it in
what they perceive to be a safe way, they may simply avoid publishing and
instead privately communicate with companies and powerful individuals. Which I
don't think is the outcome the author wants. So I hope he reconsiders here.

~~~
MegaButts
> It seems OpenAI wanted to release this project in phases, allowing people
> time to adjust to its nature.

Do you really think anyone besides AI enthusiasts are paying attention? It's
not like the general public is even aware of this, let alone following its
progress.

~~~
ptest1
I may have spoken too generally here. By “people” I meant e.g. engineers at
Google, Facebook, Reddit, news outlets, that kind of thing. I see it a bit
like a security disclosure.

~~~
MegaButts
I understand OpenAI is experimenting with their release of the GPT2 model, but
I still don't understand their reasoning. If it's too dangerous to release
today, what's going to change in the few months before they release it? They
don't say why it's too dangerous beyond hand-waving, so it's impossible to
protect against whatever the danger is.

Security disclosures are much simpler - we found a vulnerability and we will
provide time for the company/team/organization to patch it before announcing
it to the world so it won't be exploited by bad actors.

If OpenAI truly feels they have something akin to nuclear weaponry, and that
fewer actors having it is better, then they have to openly admit that they
consider themselves better gatekeepers of the technology than the public and
back away even further from their non-profit/limited profit ideals. "We are
creating this technology for the good of the world, but it's so good we are
afraid to let you use it, so only we will benefit from it."

I find them wildly inconsistent in their messaging, trying to have the best of
everything with none of the drawbacks.

------
mmastrac
There's a lot of odd hand-wringing around GPT2. Unless there's a ban on AI-
capable hardware, _someone_ is going to release a full model for this.

~~~
pdxww
The model needs to be retrained from scratch for different types of texts. One
can release a model trained to generate Trump tweets, but it's not much use
for generating fake news on a specific topic.

~~~
gwern
Not in the least. It's quite easy to retrain, even for very different domains.
Like my GPT-2 poetry:
[https://www.gwern.net/GPT-2](https://www.gwern.net/GPT-2) Or google around
and look at all the things people have been retraining GPT-2 on, like
[https://www.reddit.com/r/SubSimulatorGPT2/](https://www.reddit.com/r/SubSimulatorGPT2/)

~~~
p1esk
Can you please show us the best poetry example you generated? Does it rhyme?

~~~
gwern
Most of the examples don't rhyme. It's unclear to me if this is because most
of the original poetry doesn't rhyme so it's just faithfully replicating the
lack of rhyme, or if it only partially and accidentally grasps the idea of
rhyme.

As for the best one, I quote the ones that struck me during the training
process, and some are highlighted in
[https://www.gwern.net/GPT-2#unconditional-
samples](https://www.gwern.net/GPT-2#unconditional-samples)

Some of the ones I like are 'We never say "Thank you"', 'Thy soul, thy very
soul is burning!', '"It is morn!” said the clover-bush', 'And they have seen
the last light fail', 'There comes a murmur low and sweet'.

Probably the best IMO is 'The sun is gone, and the night is late', but of
course everyone will have a different favorite.

~~~
p1esk
Yes, "The sun is gone..." starts out amazingly well. But later fixates on
tides for some reason :)

Everything is generated by the 117M model, correct? If so, do you expect the
quality to improve for larger models, or is there not enough poetry to train
them on? I wonder how much of all existing poetry is contained in the
Gutenberg poetry corpus...

By the way, here's some poetry which has been generated by a Markov model:
[http://www.kurzweilcyberart.com/poetry/rkcp_poetry_samples.p...](http://www.kurzweilcyberart.com/poetry/rkcp_poetry_samples.php)
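For contrast with GPT-2's learned representations, the kind of word-level Markov model behind samples like those can be sketched in a few lines (the toy corpus and seed are illustrative):

```python
import random
from collections import defaultdict

def build_chain(words):
    """Bigram Markov chain: map each word to the words that follow it."""
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def generate(chain, start, length, seed=0):
    """Walk the chain, picking a random observed successor at each step."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length and chain[out[-1]]:
        out.append(rng.choice(chain[out[-1]]))
    return " ".join(out)

corpus = "the sun is gone and the night is late and the tide is high".split()
print(generate(build_chain(corpus), "the", 8))
```

Because each word depends only on its immediate predecessor, such models produce locally plausible but globally incoherent verse, which is exactly the gap GPT-2's long-range attention narrows.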

~~~
gwern
It's a mix of OA 117M and 345M at the moment. I haven't observed too much in
the way of overfitting yet, so there should still be benefits to going up
another 4.4x in model size to 1.5B. My guess is that at 1.5B, it'll start
being more important to improve the poetry corpus, since you can already start
to see problems with it - the Alexander Pope brokenness and the occasional
prose generation of footnotes/commentary are definitely undesirable, and I
suspect there would be less 'run on' effect in samples if the original corpus
actually properly marked '<|endoftext|>' for each poem...
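The `<|endoftext|>` fix described above is plain corpus preparation; a sketch, using two of the quoted lines as stand-in poems (file handling omitted):

```python
# Join individual poems into one training file with GPT-2's document
# separator, so the model learns where one poem ends and the next begins.
poems = [
    "The sun is gone, and the night is late",
    "There comes a murmur low and sweet",
]
corpus = "\n<|endoftext|>\n".join(p.strip() for p in poems)
print(corpus)
```

With the separator present, sampling tends to terminate poems cleanly instead of running one poem into the next.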

------
YeGoblynQueenne
>> Well, I replicated 1.5B.

Is there some way to test this claim, other than waiting for the model to be
released on the 1st July?

~~~
6gvONxR4sf7o
Seconded. From the author's linked blogpost about the model, it seems like
they've trained an analogous model, but I see no reference to metrics that
might suggest it's really of the same quality.

------
lostmsu
I want to warn anybody looking forward to playing with his 1.5B model that
there's no confirmation yet that this model actually beats the 345M one from
OpenAI. Connor had to come up with his own training procedure, which might
have led to a worse (or better) result.

Great, admirable work still.

------
lostmsu
An update:

\- upon verification, the model appeared to be weaker than the smallest one
released by OpenAI

\- the author ended up declining to release it

------
wyldfire
> But why do we trust the New York Times? Because the New York Times is
> composed of humans using their brains to do exactly what these detection
> algorithms try to do

This reasoning sounds circular and doesn't go deep enough. Couldn't NYT or
individual reporters at NYT abuse our trust on rare occasions if they felt the
stakes were high? Couldn't they unintentionally mislead us because they were
mislead by a source? No: we trust NYT because they have a financial interest
in being a reliable source of news. People wouldn't read NYT anymore if they
thought that they didn't take that responsibility seriously.

------
kodz4
What I like about this article (despite slightly too much babbling) is a CS
undergrad using lessons from Psychology and History to shape his thinking and
conclusions.

This was not happening 10 years back.

In my experience, the majority of 30- and 40-year-olds in tech today pushing
AI code out have no clue who Kahneman, Harari and Pinker are.

So expect subprime meltdowns/Trump/Brexit type unintended bullshit for a few
more years till the psychologist+historian+sociologist coders take over.

~~~
no_identd
Pinker seems like a horrible example, tho:

* [https://www.salon.com/2019/01/26/steven-pinkers-fake-enlight...](https://www.salon.com/2019/01/26/steven-pinkers-fake-enlightenment-his-book-is-full-of-misleading-claims-and-false-assertions/)

* [https://www.opendemocracy.net/en/transformation/steven-pinke...](https://www.opendemocracy.net/en/transformation/steven-pinker-s-ideas-are-fatally-flawed-these-eight-graphs-show-why/)

* [https://www.currentaffairs.org/2019/05/the-worlds-most-annoy...](https://www.currentaffairs.org/2019/05/the-worlds-most-annoying-man)

* [http://iainmcgilchrist.com/reply-to-steven-pinker/](http://iainmcgilchrist.com/reply-to-steven-pinker/)

~~~
kodz4
People have found issues with Harari and Kahneman too. That's okay. But who
else are you going to read to get a sense of these subjects?

People attacked Faraday for suggesting electromagnetism was a thing, hardly
paid attention to Maxwell for doing the Math because it was too complicated,
and attacked Oliver Heaviside his whole life for simplifying the math that
everyone uses today. Each of these characters made mistakes and had weaknesses
too, and it's easy to find fault.

That's the way things work. There are always going to be more people who
fixate/react to mistakes, than use them as stepping stones to new discoveries.

Without these writers I would have hardly any awareness about these subjects,
because these subjects were hardly discussed when I was in school. And it's
not even that long ago. They opened the door. History will show that was the
role they played.

------
wyldfire
> Didn’t expect to see them in this post, did you?

No one ever really does expect it.

------
teslow
This is obviously a teaser for Silicon Valley - Season 6.

------
shanxS
Waiting for informed comments on this one. =)

