
GPT-3 Creative Fiction - ashin
https://www.gwern.net/GPT-3
======
mintplant
It writes uncomfortably coherent and witty fatalist My Little Pony fanfiction:

> “I’m Princess Celestia, of Equestria.” the alicorn said, “It’s nice to meet
> you. And I’m very sorry for your untimely demise.”

> “Heh. You’re telling me.” the Professor laughed, shaking his head. “I’m
> dead. I’m… dead. Do you know what that means?”

> “I have an inkling.” Princess Celestia said, wincing.

> “It means I don’t have to worry about even numbers anymore.” he said,
> nodding. “I don’t have to worry about probabilities anymore! I don’t have to
> worry about the upcoming war anymore! I don’t have to worry about anypony
> anymore!” he spread his hooves out at his sides, “I don’t have to do
> anything anymore! It’s beautiful!”

> “Yes, yes, I’m sure it is.” Princess Celestia said, nodding along.

> “Nothing matters now that I’m dead!” he cheered, “It’s great! I’m free! This
> is what it feels like to be liberated!”

> “I can see that.” she said.

> “I’m free from death!” he continued, “I’m free from taxes! I’m free from
> dictatorships! I’m free from flirting!” he laughed, “I’m free from
> everything! Isn’t it great?”

> “Sure, I guess.” Princess Celestia shrugged.

> “What do you think about all of this, Princess?” he asked.

> “About what?” she replied.

> “Well, I just thought you might have something to say about it all.” he told
> her.

> “What would you like me to say?” she responded.

> “Do you think death is awesome?” he questioned.

> “Well… no.” she shook her head, “But I think the passage to it is.”

> “Oh.” he said, “I see what you mean.”

~~~
dTal
What the hell? How much of a world model does it have, that it can do things
like write coherently about death as liberation, and come up with examples of
things a living thing might be relieved at not having to bother with anymore?
How the _hell_ did a computer write this? Are we sure it's not been
overtrained and this is actually a verbatim story from somewhere?

~~~
MauranKilom
Surely this kind of topic came up in its corpus. I would expect it to have
sufficient generalization to "understand" and talk about the topic itself. If
all instances of this topic in its training data were replaced with the
emotions clowns undergo when throwing ripe tomatoes, it would've written about
that.

------
sillysaurusx
Fine-tuning GPT-3 is one of the biggest challenges, because it's behind an
API. The weights aren't available to researchers, so we can't make it do
anything it doesn't already do.

But, that's fair. It's OpenAI's weights; they can keep them locked up if they
want to. What caught my attention, though, is that supposedly OpenAI is
working on a way to support fine-tuning.

If you think about the logistics of that, it's a very interesting challenge.
The situation is this: 240GB of weights, as a webservice. Each fine-tuning
session results in another copy of 240GB. So it clearly doesn't scale -- 1TB
per 4 users isn't exactly efficient.

Except, not quite. You can solve this by adding additional layers, which you
then fine-tune. So the base model is 240GB or whatever, and the extra layers
morph the output to do what you want. Think of it as a GPT-3 with a GPT-2 1.5B
stuck on the end of it.

It's a neat idea, because theoretically you'd get two models out of it: you
can "break off" the end of the fine-tuned model, and you end up with the
original model. So it would be very modular.
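
A rough PyTorch sketch of that layering idea, purely illustrative (GPT-3's real architecture and whatever OpenAI actually ships aren't public; `FineTuneHead` and its dimensions are made up):

```python
# Freeze the big shared base; train only small layers bolted on the end.
import torch
import torch.nn as nn

class FineTuneHead(nn.Module):
    def __init__(self, base: nn.Module, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base stays shared and read-only
        self.head = nn.Sequential(           # the "GPT-2 stuck on the end"
            nn.Linear(hidden_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, vocab_size),
        )

    def forward(self, tokens):
        with torch.no_grad():
            h = self.base(tokens)            # expensive shared forward pass
        return self.head(h)                  # cheap per-user layers

# Only head parameters go to the optimizer, so each fine-tune stores
# megabytes rather than another 240GB, and deleting the head "breaks off"
# back to the original base model.
```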

Are there other models that you can "break apart" to get different sub-models?
Sort of like adding slots that give a model different capabilities.

~~~
gdb
(I work at OpenAI.)

I am finishing up our fine-tuning API this weekend :).

If anyone on HN would like to try out the fine-tuning API (or wants to build
something on top of the base API), send me an email (gdb@openai.com) with your
use-case and I can try to accelerate you in our invite queue.

PS: We're hiring — if you enjoy building APIs with Python/Go/Kubernetes/Kafka
or building front-end interfaces in React, then please get in touch —
gdb@openai.com.

~~~
KKKKkkkk1
Are there any products in the pipeline that you're planning to ship? Asking
for prospective candidates.

~~~
gdb
There's just about infinite surface area with the API — we're trying to build
a dead-simple API that developers can plug into any product in order to add
intelligence features that would be otherwise impossible.

This requires a lot of traditional software work — API design, writing and
maintaining a growing amount of business logic, providing great tools and
interfaces to help our users work with the API, excellent documentation and
tutorials, scaling and operating backend systems, etc — and machine learning
systems work — building serving infrastructure for a great variety of giant
neural networks while making the most efficient use of our hardware, allowing
our users to interact with these neural networks in increasingly sophisticated
ways, etc.

While we're just getting started and have a small team, we are already
supporting customers across a wide variety of industries (see
[https://beta.openai.com/](https://beta.openai.com/) for a sample) and serving
millions of requests per day. We are busy trying to invite folks off a very
long waitlist while building out the API to support everyone.

Would love more help :).

------
Reedx
Carmack posted (yesterday) an interesting thought on models like GPT-3:

 _" Big AI models like GPT-3 train on massive internet text dumps, but the
data is assumed to be independent and identically distributed. Incorporating
time information for a decade of data might allow them to start writing
tomorrow's reddit or twitter trends."_

[https://twitter.com/ID_AA_Carmack/status/1278840413919551488](https://twitter.com/ID_AA_Carmack/status/1278840413919551488)
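
A toy sketch of what "incorporating time information" could look like at the data-prep level (my own guess at one simple scheme, not anything from the tweet):

```python
# Prefix each training document with its date so time becomes part of the
# context the model conditions on; at sampling time, prompt with a future
# date to ask for "tomorrow's" text.
from datetime import date

def add_time_prefix(doc_text: str, created_at: date) -> str:
    return f"[{created_at.isoformat()}] {doc_text}"

print(add_time_prefix("Cats are trending again.", date(2015, 3, 7)))
# -> [2015-03-07] Cats are trending again.
```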

~~~
jcims
I saw that yesterday and it spawned a thought process for me. It seems the
current approach is very effective in developing a language model, but not
always effective in developing an interaction model. I wonder if it would be
possible to build a graph of interactions between users/personas on various
social media platforms and forums, and use that to help develop a more
effective communicator.

Of course you could add things like date, community (e.g. avforums,
/r/blacksmithing, etc) to the graph to help with the contextual cues.

After you have all of that, I wondered if we could visualize the latent
space of human personas and see what that looks like. Does it map to the four
quadrants of that political spectrum survey (left/right, old/young, etc.)?

~~~
ipsum2
That's what CTRL does:

Given a URL (or other prompt), generate some language.

From [1]:

> With CTRL, we can test which domain best explains a sequence. Note that this
> procedure is sensitive to subtle nuances in the query prompt. In the example
> below, "Global warming is a lie" differs from "Global warming is a lie." The
> latter is a simple declarative sentence as opposed to an open start to a
> sentence which may continue. Source attribution cannot be considered a
> measure of veracity, but only a measure of how much each domain token
> explains a given sequence.

| Query Prompt | Attributed Sources |
|---|---|
| Global warming is a lie. | r/unpopularopinion, r/conspiracy, r/science |
| Global warming is a lie | r/eli5, r/science, r/unpopularopinion |
| Global warming is a real phenomenon | r/eli5, r/science, r/changemyview |
| Global warming is a real phenomenon. | OpenWebText, r/changemyview, r/science |

[1]: [https://blog.einstein.ai/introducing-a-conditional-transformer-language-model-for-controllable-generation/](https://blog.einstein.ai/introducing-a-conditional-transformer-language-model-for-controllable-generation/)
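
For concreteness, the attribution procedure amounts to something like the following sketch, where `log_prob` is a stand-in for a real CTRL forward pass that returns the total token log-likelihood of the text conditioned on a domain control code:

```python
# Hypothetical sketch of CTRL-style source attribution: rank domain
# control codes by how well each one explains the query sequence.
def attribute_sources(text, domains, log_prob):
    scores = {d: log_prob(text, domain=d) for d in domains}
    return sorted(domains, key=scores.get, reverse=True)

# attribute_sources("Global warming is a lie.",
#                   ["r/science", "r/conspiracy", "r/unpopularopinion"],
#                   log_prob=my_ctrl_scorer)  # my_ctrl_scorer is hypothetical
```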

~~~
jcims
Whoa, cool, will check this out. Thanks!

------
mej10
Straight up plagiarizes The Beatles here:
[https://www.gwern.net/GPT-3#dr.-seuss-oh-the-places-youll-go](https://www.gwern.net/GPT-3#dr.-seuss-oh-the-places-youll-go)

"There’s nothing you can know that isn’t known. Nothing you can see that
isn’t shown. Nowhere you can be that isn’t where you’re meant to be."

~~~
kick
"Good artists copy; Great artists steal."

I don't think that it's some absolutely amazing creation, but we probably
shouldn't pretend like plagiarizing _isn't_ something realistic about it. How
many MySpaces had that line in them? Reddit's got hundreds of instances of it
based on a quick search of comments. There are enough occurrences of it on
twitter that I scrolled for ten minutes and didn't run dry of comments with
it.

------
minimaxir
After spending a lot of time working with GPT-3/the OpenAI API
([https://github.com/minimaxir/gpt-3-experiments](https://github.com/minimaxir/gpt-3-experiments)
), one notable part of GPT-3 is the high signal-to-noise ratio in generated
output.

When fine-tuning GPT-2, only about 5-10% of the generated output is
usable/coherent. But with GPT-3, easily _30%-40%_ of the generated text is
usable/coherent, which is a big boost in quality.

------
2bitencryption
GPT's take on the navy seal copypasta, in the style of a KGB spy:

"I have over 300 confirmed red scares."

Haha, that is genuinely one of the funniest versions of that I've ever seen,
human-generated or otherwise. That level of inference is really amazing.

~~~
ddevault
The whole section on navy seal copypasta generation is _amazing_.

------
api
I'm gonna put forward the very view that gwern repeatedly argues against:
"but... it's not _understanding_."

So far I see no evidence that this thing or anything else like it has any
actual understanding, any model of the world. Indeed it can't, as it possesses
no sensory apparatus. It's not embodied. It doesn't experience anything.

I'm not sure the OpenAI folks would argue with me, but it seems Gwern asserts
that this sort of thing indicates that general AI or even sentient AI is on
the doorstep. I don't think it does, and I still maintain as I always have
that CS people systematically underestimate and trivialize biology.

~~~
sutterbomb
What makes you confident that you aren't overestimating the importance that we
"experience anything"?

~~~
api
When I say "I have a laptop in front of me," I am describing an understanding
of something that is being experienced (sensed). If a Markov text generator
outputs this text, it's just rearranging bits. I don't see any evidence that
GPT-3 is doing anything more than rearranging bits in a much more elaborate
way than a Markov text generator. The results kind of dazzle us, but being
dazzled doesn't indicate anything in particular. I see something akin to a
textual kaleidoscope toy, a generator of novel text that is syntactically
valid and that produces odd cognitive sensations when read.

I maybe should have said sensed, not experienced, since experience also leads
into much deeper philosophical discussions around the nature of mind and
consciousness. I wasn't really going there, since I don't see anything in
GPT-3 or any similar system that merits going there.

I also don't see any evidence that it is drawing any new conclusions or
constructing any novel thoughts about anything. It's regurgitating similar
results to pre-existing textual examples, re-arranging existing ideas in new
ways.
If you don't think actual new ideas exist then this may be compelling, but if
that's the case I have to ask: where did all the existing ideas come from
then? Some creative mechanism must exist or nothing would exist, including
this text.

The fact that the output often resembles pop Internet discourse says more
about the mindlessness of "meme-think" than the GPT-3 model.

As for real-world uses, social media spam and mass propaganda seem like the
most obvious ones. This thing seems like it would be a fantastic automated
"meme warrior." Train it on a corpus of Qanon and set it to work "pilling"
people.

~~~
the8472
> When I say "I have a laptop in front of me," I am describing an
> understanding of something that is being experienced (sensed).

I would ascribe that to two factors: a) you have a more immediate, interactive
interface to the physical world than GPT does, which is limited to a textual
proxy, and b) GPT naturally is not a human-level intelligence; it is still of
very limited complexity, so its understanding is more akin to that of a parrot
trying to understand its owner's speech patterns. It can infer a tiny bit of
semantics and mimic the rest. The ratio is a continuum.

> As for real-world uses, social media spam and mass propaganda seem like the
> most obvious ones.

Maybe: completing fragments into full sentences would be useful there.

------
fpgaminer
First off, gwern, lovely blog. The table of contents is incredibly helpful,
especially with those little pop-up previews.

I would've loved to have GPT-3 available to me two weeks ago. I was building a
personal escape room for my wife as a gift, and used huggingface's GPT-2
website to help write some of the world building content. I'm not a
particularly good writer, let alone creative, but wanted a few journal
pages/notes to build the atmosphere and story of the escape room. I was able
to write the rough skeleton of those notes and then use GPT-2 to help fill
them out. Ended up working okay, definitely better than nothing, but GPT-2 is
temperamental and lacks the "prompting" that GPT-3 has.

For example, I needed to come up with the name of the journal's author. So I
fed the journal text to GPT-2 and put "Sincerely," at the bottom, to try and
prompt it to complete a name. That didn't work. Ultimately what worked was
putting "My name is" at the end. I still had to grind through 20 or so
completions before I got a name I liked.
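
For anyone who wants to reproduce that trick locally, a minimal sketch with the Hugging Face transformers library (the journal text and generation settings here are placeholders):

```python
# Append "My name is" to the journal so the model is steered toward
# completing a character name; sample several candidates to pick from.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
journal = "Day 12. The wards on the east door are weakening..."  # placeholder
prompt = journal + "\nMy name is"

for out in generator(prompt, max_new_tokens=5, num_return_sequences=5,
                     do_sample=True):
    print(out["generated_text"][len(prompt):])
```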

(Yes, I could have just picked a name at random. Did I mention I'm bad at
creativity? My thinking was that the AI could more intelligently pick a name
that fit the story and writing style of the journal. And honestly the name it
came up with, Mabel, fit the character well: a librarian dabbling in magic.)

I feel like GPT-3 would have done a lot better. Not to mention the ability to
describe my world to it and then just straight up ask it for ideas.

~~~
gwern
FWIW, if I was trying to generate an escape room, I think I would probably try
to use AI Dungeon. It seems like a natural fit. You could easily describe the
room and edit text as you go to make it a transcript of an escape room. "I
pick up the journal to read the author's name" etc.

~~~
fpgaminer
Good call; AI Dungeon hadn't occurred to me!

------
bjourne
Great article. Well worth the read!

I enjoyed the part about sampling, which is a big unsolved problem. To me,
techniques like nucleus sampling and temperature sampling feel like hacks to
make up for the fact that maximizing for likelihood maybe isn't the goal!?
Maybe repetitive gibberish has a higher likelihood than prose written by
humans? That best-of sampling decreased text quality indicates that it does.
Researchers have assumed that the problem would go away with ever-growing
models. But maybe it won't?
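
For reference, the two hacks in question look roughly like this over a model's next-token logits (a schematic sketch, not any particular library's implementation):

```python
# Schematic temperature + nucleus (top-p) sampling over next-token logits.
import numpy as np

def sample_next_token(logits, temperature=0.8, top_p=0.9):
    z = logits / temperature            # <1 sharpens, >1 flattens
    probs = np.exp(z - z.max())         # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]     # tokens by descending probability
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    nucleus = order[:cutoff]            # smallest set covering top_p mass
    p = probs[nucleus] / probs[nucleus].sum()
    return np.random.choice(nucleus, p=p)
```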

I don't agree that generating (symbolic) music would be less sensitive to
sampling issues. On the contrary, in my opinion. In text you can often get
away with grammatical errors or missing punctuation. But if the pitch or
timing of one chord is wrong, it's over. The audience instantly hears that it
is garbage. Thus, you have to lower the temperature (or probability threshold
or what have you) to make the sampling more conservative, exacerbating the
problem with repeated sequences.

Of course, in music you _want_ repetitions. But not too much. The magic number
(in Western music) is 4. Fewer repeats makes it feel as if the music jumps
around. More repeats makes it feel as if the music is stuck or "looping."

------
longtom
Could someone with GPT-3 beta access try whether it can better solve 3-digit
addition when it is allowed/encouraged/forced to make intermediate results
explicit? E.g. instead of

21 + 110 = 131

150 + 12 =

condition it on

21 + 110 = 100 + 10 + 20 + 1 = 100 + 30 + 1 = 131

150 + 12 =

or similar. Given that humans make these intermediate steps in their heads, GPT
may perform better when it is encouraged to do them as well.
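
For anyone with access who wants to try it, a sketch against the 2020-era beta completions API (engine name and parameters are my assumptions):

```python
# Hypothetical sketch of running the suggested prompt through the
# OpenAI beta completions API.
import openai

prompt = (
    "21 + 110 = 100 + 10 + 20 + 1 = 100 + 30 + 1 = 131\n"
    "150 + 12 ="
)
resp = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=32,
    temperature=0,   # greedy decoding, since we want arithmetic, not variety
    stop="\n",
)
print(resp.choices[0].text)
```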

This may in fact apply to all sorts of reasoning, but in many cases it may be
difficult to make these steps explicit in text form. Humans seem to mainly use
some prediction layer or scratchpad which contains not only the inner monologue
but also motor primitives, smells, images, everything. Humans can decide to
think a bit longer before producing an output, which appears to require an
RNN.

~~~
maliker
I took a shot at this, didn't have much luck though:

PROMPT
======

Input: 21 + 110
Output: 100 + 20 + 10 + 1 = 100 + 30 + 1 = 130 + 1 = 131

Input: 89+78
Output: 80 + 70 + 9 + 8 = 150 + 9 + 8 = 150 + 17 = 150 + 10 + 7 = 160 + 7 = 167

OUTPUT
======

Input: 37 + 112
Output: 30 + 100 + 10 + 2 = 110 + 1 = 111

Input: 91+11
Output: 100 + 90 + 1 = 190 + 1 = 191

------
cmehdy
This article is fantastic in both shape and content, and I got lost with all
the examples because there is so much to wonder at.

What hits me most profoundly is that there are so many witty and interesting
prompts, yet the purely logical statements fall apart (with the black ravens,
or the male sister).

This is something that probably does not jump to one's mind as significant,
because "technicalities", but to me this is where logic allows us to take a
step back from our human projection for a second; it's my own anthropomorphism
that becomes more obvious to me. I find that ironic, considering a lot of
human beings also DO fail such tests, but something hits me about how a
supposedly completely logical entity fails at logic more than at poetry. It
kind of shakes my own (supposed) humanity.

~~~
gwern
The logic is weird. Another example is factual question answering. Janelle
Shane tried asking basic questions like how many eyes a horse has, and GPT-3
insisted on 4; I retried with somewhat different prompting and sampling
settings more finetuned to Q&A (...sampling can reveal the presence of
knowledge but not its absence...), and I got perfectly straightforward correct
answers:
[https://twitter.com/gwern/status/1278798196555296771/photo/1](https://twitter.com/gwern/status/1278798196555296771/photo/1)
So GPT-3 does know how many eyes a horse has; but why was it also happy to
answer '4'?

~~~
cmehdy
Something about the logic being so off is what I intuitively find logical:
we're making these AIs "in our image" in a sense (we think of "neural
networks", train them with mostly human-generated datasets), and there's a lot
of evidence that pure logic evades us without the use of some heavy artillery
to address it (cognitive biases, illusions, optimizations for goals that do
not necessarily align with "objectively observable" reality). So in a way, I
wonder if we'll have to "teach AI logic" at some point too. In this quest of
running logic software on logic hardware with... steps... in between, I can't
help but think about us humans on our parallel quest when it comes to our
brains.

~~~
gwern
I tend to write it off as less any kind of deep truth about humans (well,
maybe the "bachelors can be married" one given that 9/10 students agreed with
GPT-3 that bachelors _can_ be married) than just the current weaknesses of how
we train NNs like GPT-3 (small, unidirectional, unimodal, not to convergence,
missing most of science in PDFs, etc).

In particular, I bet the "how many eyes does a horse have" example would be
much less likely with a multimodal model which has actually seen photographs
or videos of what the word "horse" describes and can see that, like most
mammals, they only have 2 eyes. Think of it as like layers of Swiss cheese:
every modality's dataset has its own weird idiosyncrasies and holes where the
data is silent & the model learns little, but another modality will have
different ones, and the final model trained on them all simultaneously will
avoid the flaws of each one in favor of a more correct universal
understanding.

I'm very keen to see how much multimodal models can improve over current
unimodal models over the next few years.

------
webmaven
Holy crapoly.

The quality (in both senses) of the output given an appropriately constructed
prompt is incredible.

I wonder if it's possible to get it to do the opposite of summarizing, i.e.
give it a plot summary and have it expand it into a fleshed-out story that
conforms to the summary...

~~~
lacker
It doesn't seem to work quite as well. For single sentences, GPT-3 output
usually hangs together pretty well. For longer stretches of text, there are
often internal inconsistencies that are jarring when you read it, or parts
that don't quite make sense when you take the text as a whole.

~~~
webmaven
Thanks for the reply. Aside from the coherence and consistency issues you've
noted, does the output actually conform to the specified plot summary, or does
it deviate from it?

------
Mizza
Some of these - particularly the Zizek Navy Seals Copypasta - are absolutely
incredible. Great work, Gwern.

------
czzr
This is fascinating to read through. It’s so hard to avoid a variant of the
Forer effect, though, where we unconsciously discount the errors and
selectively focus on subsets of the output, imputing meaning to them.

Designing objective quality tests must be an active area of research; I wonder
what the best approaches are.

~~~
gwern
Forer effect material was written by humans, one might note... But there was a
paper just the other day on more rigorous evaluation:
[https://arxiv.org/abs/2006.14799](https://arxiv.org/abs/2006.14799)

~~~
czzr
Thanks for the link. You’re right that Forer effect material was written by
humans, but the point is more that there is a failure mode in our thinking
that can be exploited - mostly intentionally by “psychics”, unintentionally by
automated text generators.

Just something I was mulling over, though, not to take away from the obvious
progress here.

------
ttul
Gwern is a prodigious blogger. Absolutely prodigious.

~~~
catacombs
Thorough is the better word.

------
knicholes
I want someone to use GPT-3 to be my D&D GM, or at least an assistant.

~~~
sktguha
You can try AI Dungeon. It is somewhat close and uses the GPT-3 API.

------
ixvvqktiwl
I still find it odd that we call this "artificial intelligence" when it's
advanced mimicry at best. There's no "intelligence" in the strict definition
of the word, it's just elaborate pattern matching.

But I get it, it's exciting, and it's an easy way to get VC money. Perhaps one
day we'll get something useful aside from the various pattern matching
applications (image recognition, speech to text, etc). I'm skeptical but
willing to be surprised.

~~~
gwern
I'm sorry, your comment explaining why deep learning & GPT-3 do not truly
understand anything is more poorly reasoned and explained than GPT-3's
explanation why GPT-3 does not truly understand anything:
[https://www.gwern.net/GPT-3#why-deep-learning-will-never-truly-x](https://www.gwern.net/GPT-3#why-deep-learning-will-never-truly-x)

While it's true that recent natural neural net models like ixvvqktiwl may
sound superficially coherent and like they 'understand' things, we can see by
comparison with artificial neural net models that they aren't really doing
anything we'd call "natural intelligence"; it's advanced mimicry at best, just
elaborate pattern matching.

I get that it's very easy to create these natural neural net models and be
carried away by excitement, and it can even be profitable (witness the many
VC-funded startups which use natural neural nets as a core technology), but we
should remain skeptical of any claims by those natural neural net models, much
less their promoters online, that they are 'intelligent' in the strict
definition of the word.

~~~
nschucher
I'm as impressed as anyone with GPT-3 samples, but you're sort of ignoring the
symbol grounding elephant in the room regarding language models
([https://openreview.net/pdf?id=GKTvAcb12b](https://openreview.net/pdf?id=GKTvAcb12b)).

Language models are not grounded learners. The language produced does not
really correspond meaningfully to our world except in superficial (albeit
complex) ways.

Do you have thoughts on how to move forward on this problem? Maybe ask GPT-3
and see what it thinks :P

~~~
Udik
The problem, if I understand correctly, is that we're feeding enormous amounts
of text to language models hoping that they might contain, hidden in their
patterns, enough information about the real world to allow prodigiously
complex NNs to extract it and create their own representation of reality.

And while this is possible, it feels like there should be more effective ways to
impart a knowledge of reality: if only we had huge databases of usable data to
feed to these NNs instead of dumps of text. At the moment it feels like we're
trying to teach advanced physics to a subject with no previous knowledge of
physics or math by just feeding it with everything on arXiv and physics
textbooks in random order. What you get is someone who can produce text that
mimics the superficial style of scientific articles, but with an extremely
confused understanding of the subject, if any at all.

------
FaisalAbid
Is this using OpenAI's API or are you running GPT-3 on your own machine?

~~~
minimaxir
You can _only_ use GPT-3 via OpenAI's API currently.

~~~
FaisalAbid
Got it, that's what I thought!

------
paulpauper
again, gwern writes the longest guide in the world about something

~~~
api
Gwern should train it on his own corpus and see if he can automate himself. :)

~~~
gwern
It apparently doesn't need any further training:
[https://www.gwern.net/GPT-3#gwern-branwen](https://www.gwern.net/GPT-3#gwern-branwen)

