
Better Language Models and Their Implications - yigitdemirag
https://blog.openai.com/better-language-models/
======
cs702
This kind of "blocking-and-tackling" work is important.

The authors take a well-known architecture, the Transformer[a], configure it
with a progressively larger number of parameters, train it to predict the next
word conditioned on previous text, using a large dataset consisting of 40GB of
text scraped from the Web, and test each trained model on a range of zero-shot
transfer-learning tasks.
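
The next-word objective described above can be sketched with a toy count-based model standing in for the Transformer (the corpus, smoothing, and function names here are illustrative, not from the paper):

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count-based estimate of p(next | previous): a toy stand-in for the
    Transformer's learned conditional distribution."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_nll(counts, tokens, vocab_size, alpha=1.0):
    """Average negative log-likelihood of each next token given the previous
    one, with add-alpha smoothing. This is the quantity training minimizes;
    the Transformer just conditions on far more context than one token."""
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        c = counts[prev]
        p = (c[nxt] + alpha) / (sum(c.values()) + alpha * vocab_size)
        total += -math.log(p)
    return total / (len(tokens) - 1)

corpus = "the cat sat on the mat and the cat ran".split()
vocab = sorted(set(corpus))
ids = [vocab.index(w) for w in corpus]
model = train_bigram(ids)
print(next_token_nll(model, ids, len(vocab)))
```

Scaling the model means making the conditional distribution richer, not changing this objective.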

Remarkably, the performance of a Transformer in the tested tasks improves
_log-linearly_ with the number of parameters, suggesting that even the largest
model tested, with 1.5B parameters, still _underfits_ 40GB of text.

This is _compelling evidence_ that we do NOT need new architectures, NOR new
kinds of training objectives, NOR new theories, for better language modeling!
We can get better language modeling simply by increasing model capacity (i.e.,
by adding more parameters to existing models), which becomes easier and
simpler to do as hardware continues to improve over time.

Great work.

PS. In case it's not clear: I'm not saying we should suddenly stop searching
for new, better ideas and architectures. That would be silly. Please don't
attack a straw-man :-)

[a] [https://arxiv.org/abs/1706.03762](https://arxiv.org/abs/1706.03762)

~~~
modeless
Agreed that "simply" scaling up with more compute will result in progress and
useful systems, and work in that direction is interesting and valuable. But,
while we may not need new architectures or training objectives to make
progress, we do need them to approach human level sample complexity. Humans
don't need to read through 40 GB of text multiple times to learn to write.

~~~
cs702
_> Agreed that "simply" scaling up with more compute will result in progress
and useful systems, and work in that direction is interesting and valuable.
But, while we may not need new architectures or training objectives to make
progress, we do need them to approach human level sample complexity._

Yes, agreed. Nothing I said above contradicts that! :-)

> Humans don't need to read through 40 GB of text multiple times to learn to
> write.

Yes, that's true... but to keep the comparison fair, note that we _do_ need
many years of schooling to learn to read, say, at a high-school or college
level. And before learning to read, we first must learn to speak, which surely
helps. And we also get to inhabit bodies that see, smell, touch, and interact
with the physical objects that we read and speak about during our formative
years, which also helps. The more one thinks about it, the more 40GB of data
looks like a _tiny_ figure in comparison to the amount of training data that
flows continuously to our brains from all our senses. I think I read once that our
brains process on the order of _10 to 100 GB of training data per second_.

~~~
Cybiote
Conscious processes are estimated to work on the order of 10^2 bits. Vision,
at the retina, is estimated at 10^7 bits/sec. It drops another order of
magnitude by V1. Also note that, as long as they're not isolated, deaf and
blind people have no trouble reaching full human reasoning ability despite
having vastly less data available than the average person.

A human will also be learning vision, hearing, walking, physics, causal
reasoning, and much more. This comparison just isn't well grounded. The
task-specific question is: how much training does a young brain require to
learn to produce language? If the brain comes with innate advantages, then
rather than resorting to inefficiency and excusing our models, we should try
to see whether they can be bettered.

~~~
EForEndeavour
I'm not well versed at all in signal theory, so I'm genuinely curious how
these bitrate estimates are made, and would love to see the source of these
specific numbers.

How do you estimate the effective (digital) bitrate of an inherently analogue
system?

------
resters
While censoring the full data set seems in some way to support the rationale
of the OpenAI charter, it also means that only state actors and very well-
funded entities will be able to use the work to create models of the size
necessary to do the impressive stuff in the write up.

Based on the concerns, it would seem that restricting the capabilities only to
state actors would have the opposite of the intended effect. Why not let
thousands of amateur researchers, undergrads, etc., use the model to detect
instances where the model was used to generate text, etc.?

~~~
skybrian
I would guess this reduces the risk. Why would you say it does the opposite?

My argument: state actors might misuse this tech, but letting any script
kiddie do whatever they want almost guarantees someone will misuse it.

~~~
mikeyouse
Which is exactly what happens whenever there's a leak of NSA or other foreign
government 'hacking' tools. As soon as they're public, ransomware authors and
other shitty actors all deploy them to steal as much as possible before
systems are patched.

Case-in-point:

Wannacry:
[https://en.wikipedia.org/wiki/WannaCry_ransomware_attack](https://en.wikipedia.org/wiki/WannaCry_ransomware_attack)

NotPetya:
[https://en.wikipedia.org/wiki/Petya_(malware)#2017_cyberatta...](https://en.wikipedia.org/wiki/Petya_\(malware\)#2017_cyberattack)

~~~
resters
I think it’s a bit different because these attacks are not obvious when they
happen. The power of many of the attacks comes from flying under the radar,
because nobody was expecting them.

So having lots of script kiddie attacks (which would be sloppier and easier to
notice) would lead to a more rapid adoption of safeguards.

------
rfdearborn
Sure, not releasing the full trained model probably delays it, but sooner or
later a bad actor will do their own scraping and train their own model and
share it around and the genie will be out of the bottle. Then what?

I think we need to be conducting AI research (and building software generally)
under the assumption that all of it will eventually be repurposed by bad
actors. How would our practices be different if we consistently and cautiously
did this?

Here's a thought experiment: how would the Manhattan project have been
different if it were carried out in the open and its products were
instantaneously and infinitely reproducible? What is the MAD equilibrium of AI
research? I think the impact potential is similar even before AGI.

~~~
a-dub
wasn't that the point of this whole openai thing? they didn't like the idea of
there being a club with just google in it that had access to resources and
funding to collect and train on massive datasets so they were going to be the
"bad actors" who would do their own scraping, train their own models and share
them around?

isn't it supposed to be called OPENai?

they don't want to share the data because they don't want to throw away the
edge they've gained by collecting it. :)

computer programs that generate human like text aren't dangerous, the internet
is full of human like text that is mostly bullshit anyway.

~~~
lc5G
I have had the same impression regarding their work on dota. They got a lot of
publicity with it but their work is not open at all. They have released
neither their code which runs the bots on dota2 nor their training code nor
the final model. All we have is video recordings of a few games against
humans.

------
yongjik
> ... some believe that perhaps the creatures were created when a human and a
> unicorn met each other in a time before human civilization. According to
> Pérez, “In South America, such incidents seem to be quite common.”

Man, the auto-generated text is hilarious. And uncannily good. Though I have
to wonder if it's a total random fluke or there's something among their 1.5
billion parameters that predicts "likelihood of mythical bestiality in South
America".

~~~
sanxiyn
Considering that it memorized the Gettysburg Address verbatim and knows that
Charles Darwin wrote the Origin of Species, it probably knows more about both
South America and unicorns than I do...

The unicorn story does demonstrate that it knows South America has to do with
Argentina, the Andes Mountains, and the University of La Paz.

------
legatus
I was honestly surprised by the quality of the generated text. While I can't
say I've been following the state of the art in the last months, this seems
like a pretty important step forward. Furthermore, at the end of the post they
note that the samples are somewhat representative of their results. Maybe they
should consider releasing a text file with some more (not hand-chosen)
samples? Whatever the case, fantastic work, my congratulations to the authors.

~~~
wuthefwasthat
Thank you! We've released 500 random unconditional samples from GPT-2 at
[https://github.com/openai/gpt-2/blob/master/gpt2-samples.txt](https://github.com/openai/gpt-2/blob/master/gpt2-samples.txt)

~~~
schoen
Wow, some of these really go off the rails but those that only kinda go off
the rails are absolutely hilarious and/or bizarre.

A few summaries of ones that I looked at which appeared to be more or less
staying on a single topic:

Sample 1: An Austin nonvegetarian vegetarian restaurant encounters a series of
difficulties in opening, as its nonexistent but extensive menu depicts a wide
range of food options and the restaurant opening is delayed by financial and
food-safety concerns. The nonvegetarian restaurant has also annoyed vegetarian
clientele with its plans to be a vegetarian restaurant. Food reviewers
nonetheless manage to eat at the new restaurant and post their reviews; the
establishment also becomes "the first Austin restaurant to ride a ride-sharing
service in Austin since the 'Bike-Share — Share the Ride' controversy
erupted".

Sample 3: Denise Schroeder encounters perhaps the most complex and confusing
legal trial in American history as she gets murdered, is accused of murder,
comes under investigation for liquor law violations, becomes an abuse victim,
prompts others around her to commit suicide, is arrested, and ultimately wins
the right to marry her same-sex partner.

Sample 6: Cooking rice and beans by steaming a roast in a wok is easy! Just
follow these 40 simple steps to update your XBox firmware, and you'll end up
with a nice fried soup.

Sample 8: A protest march against drought in South Asia attracts very broad
support, but its radical nationalist message is simultaneously endorsed and
feared by virtually everyone in the region.

Sample 13: The global bicycle industry, although very large, is perhaps
unsurprisingly extraordinarily unpopular and economically irrelevant following
a very complex cycling accident involving an area woman.

Sample 14: Indian restaurant owners in Canada have to contend with an
_amazing_ array of economic, technological, and environmental challenges as
the infrastructure of their society seems to collapse around them -- but they
do all right in the end.

Sample 26: The previously untold history of Blackwater USA, in which founder
Erik Prince is capable of meeting Bill Clinton on a day in January that was
actually in March, and results in Blackwater and Prince having shady dealings
with all sorts of celebrities -- though the organization "may not like to
admit what a true dick the injustice has wrought".

Sample 29: What does the KKK believe? Apparently, lots of complicated
conspiracy theories about black history. Also, if you find their theories
traumatic, you can find "several biblical [...] references that can be used to
up your level of moral competency in your longterm relationship with Mr.
Soros."

Sample 30: Wikileaks reports harshly on speculations of Linux adoption by
rural tribal mobile device users.

Sample 37: faint praise for soccer champion who apparently keeps winning games
through poor performance.

Sample 49 (following the end of the reviews section): world traveler and
masterful hotel architect Frederick Beckey remains unperturbed by racist
gatherings at his hotel.

Sample 50: comedians fear the looming resolution of a long-running comedian
feud. Also, Soviet spectators at the Munich Olympics cheer Yuri Gagarin, who,
although escorted by Russian soldiers, uses rockets and airplanes in his
Olympic performances to win multiple medals. The crowd of Soviet spectators,
"[l]argely composed of high school students in tight-fitting vacant uniforms
[...] walked away believing that Gagarin was the next North America's greatest
athlete".

~~~
xiphias2
I'm looking forward to computers generating short clips from these stories. I
wouldn't be surprised if some of them went viral (especially if A/B testing
were incorporated by looking at when users stop watching the videos).

~~~
schoen
Compare _Sunspring_ (2016).

[https://arstechnica.com/gaming/2016/06/an-ai-wrote-this-movi...](https://arstechnica.com/gaming/2016/06/an-ai-wrote-this-movie-and-its-strangely-moving/)

------
resters
Just as many pesticides mimic the hormonal and chemical signals of pests to
drive certain behaviors that lead to eradication, this work mimics the
linguistic signals of humans. I think viewing it metaphorically as the most
sophisticated humanicide discovered to date is probably appropriate.

Consider that conventional munitions make an effective pesticide but are not
used due to their side effects. Instead, chemicals are used to destroy or
mimic the perception and production of various signals so that populations of
unwanted critters effectively self-destruct.

Imagine a war fought with a weapon like this that left entire cities perfectly
intact!

# end hyperbole

~~~
honzzz
This is exactly the line of thinking that this article inspired in me... AI
scouring the internet, figuring out what makes us tick and generating
perfectly persuasive stories to convince us to do... what? I kept thinking
about the story "Sort by controversial" by Scott Alexander:
[https://slatestarcodex.com/2018/10/30/sort-by-controversial/](https://slatestarcodex.com/2018/10/30/sort-by-controversial/)

------
songeater
Is anyone else troubled by them not releasing the source
model/dataset/parameters here? Yes, the technology can be used for malicious
means - but would argue that "DeepFaking" language is FAR less of a problem
than "DeepFaking" video/photo/audio... which already occurs. Seems like they
went back on their charter to share AI developments broadly ("not concentrate
power") under the excuse of "safety."

(These results look fire btw)

Note: copied my comment from dupe thread

~~~
ipsum2
I agree with this. OpenAI isn't particularly 'open', compared to other AI
research organizations (Notable ones that open source almost all their work
are AllenAI and FAIR, but I'm sure there are others).

Wonder what their excuse is for not releasing the source model or code for
their DoTA bot. Surely there's no safety issues there?

------
dzink
This tech can easily be used to flood humanity’s shared brain with auto-
generated propaganda. Schizophrenia of the internet in a way. There is plenty
of incentive with Google algorithms favoring number of words and relevant
keywords in content for rankings - you could have NLP bots lifting junk sites
to top results.

To step ahead in that chess game, a detection tool for fakes would just be
training grounds for a better GAN. Instead we may see a certifying authority
that labels content as human-generated, certified fact maybe? Wikipedia and
reddit are not safe without fast automatic moderation either.

Do you have a brainstorm or idea/prototype submission site where people can
submit approaches to countering bad AI actors? A white/grey-hat AI bounty
program of sorts?

~~~
comboy
Web of trust. Your results and how much you trust given text is based on who
gave it to you. You assign trust to people you know, then it's a small world
effect and recursive trust calculation based on who they trust.
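
A minimal sketch of that recursive trust idea (the damping scheme, graph, and names are my own illustration, not a specification):

```python
def propagate_trust(edges, me, rounds=10, damping=0.5):
    """edges[a] = {b: direct trust a places in b, in 0..1}.
    Trust in someone = your direct trust, or damped trust inherited
    through people you already trust (the small-world recursion)."""
    trust = dict(edges.get(me, {}))
    for _ in range(rounds):
        nxt = dict(edges.get(me, {}))
        for mid, t_mid in trust.items():
            for target, t in edges.get(mid, {}).items():
                if target == me:
                    continue
                inherited = damping * t_mid * t   # trust decays per hop
                nxt[target] = max(nxt.get(target, 0.0), inherited)
        trust = nxt
    return trust

graph = {
    "me":    {"alice": 0.9},
    "alice": {"bob": 0.8},
    "bob":   {"carol": 0.5},
}
print(propagate_trust(graph, "me"))
```

Because every score is computed relative to "me", two people with different networks get different trust maps, which is exactly the subjectivity the comment argues for.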

Centralization won't work. Whether something is good or bad or fake must be
subjective and based on your personal network / your beliefs. Otherwise, long
term, I don't see how you could avoid dystopia.

~~~
pmohun
This has a dangerous implication that we've seen recently with the newsfeed
"bubble."

If dissenting opinions are never allowed into your sphere of influence then we
may continue to see an accelerating polarization of our society towards
extremes.

~~~
comboy
The bubble that's dangerous for society is when everybody believes the same
thing, not when everybody has a different set of beliefs.

I know the more popular bubble is information one and you make a fair point,
but I believe it's powered more by recommendation systems than the things that
you trust.

I got distracted lately and I wanted to clean it up before putting it out
there but if somebody is bored:
[http://comboy.pl/wot.html](http://comboy.pl/wot.html)

------
Cybiote
It's becoming ever more certain that the transformer architecture is one of
the largest contributions to AI (not merely machine learning, but AI), often
beating LSTMs despite LSTMs being expressive enough to capture Turing
Equivalence (at least in theory). Its main ideas are three: shorter paths
that help gradient flow, the training setup, and the final key aspect,
unhelpfully called self-attention. Self-attention is better thought of as a
form of similarity-gated key-value soft memory; learning operations on it
allows Transformers to learn non-trivial programs with contextual weight
look-ups.
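
That "similarity-gated key-value soft memory" view can be sketched as single-head, unmasked scaled dot-product attention in a few lines of numpy (the shapes and random weights here are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Each position issues a query against every position's key; values
    are read out weighted by softmaxed dot-product similarity -- a soft,
    content-addressed memory lookup."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise similarity gate
    return softmax(scores) @ V                # weighted value read-out

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                  # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # one output vector per token
```

The "shorter paths" point falls out of the math: every token attends to every other in one step, rather than through a recurrent chain.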

I also notice reported tries, suggesting some level of curation. While this
level of generation is undoubtedly impressive and a sign of non-trivial levels
of understanding, the ability to project along arbitrary dimensions of
similarity at a fine-grained level and to learn from text instruction is more
useful than text generation. Although the unicorn story was a really fun read,
better than many humans already, I doubt it could have gone on for much
longer. It maintains a theme but not coherently or fluently (see especially
the Kennedy nanotech and recycling examples; _comparing the disfluency there
with the excellence of the Civil War report suggests at least some
overfitting_). These relatively minor caveats aside, this is unambiguously an
outstanding result.

Winograd Schemas are the single metric to track if you're interested in how
language understanding is truly improving. OpenAI reports 71% and _wrongly_
reports the previous record as 63%. The current record here is at 65%
[https://gluebenchmark.com/leaderboard](https://gluebenchmark.com/leaderboard)
though not fully comparable. Will OpenAI be submitting? Note that you can get
to 60% using about 1-2 orders of magnitude less data and compute.

It concerns me that results here are so far dependent on such large data and
computation. However, based on several papers I've read, I do not believe this
to be inherent even in transformers. I plan to do some experiments on this
when I free up some bandwidth.

If everyone is pulled in by the glamour of working for a well funded,
prestigious operation then it should be no surprise that they do not consider
paths which operate on several orders of magnitude less data and computational
resources.

We all should consider bringing about a group of researchers who swear to an
austere computational life of a single GPU, no more than 4-8x average RAM and
CPUs that do not cross 90 Watts. _The Bicameral Order_ would be a good name
for such a group.

~~~
wuthefwasthat
Yeah, there are definitely still places the samples fall short! Keep in mind
we're still using very naive sampling techniques.

RE Winograd: WNLI is different, see
[https://arxiv.org/pdf/1804.07461.pdf](https://arxiv.org/pdf/1804.07461.pdf)

~~~
Cybiote
Amazing results, how excited are you? :)

You're right, I noted too that the comparison isn't direct but then, I wasn't
justified in calling out the gap claim as wrong, so sorry for that. I think
it'd be nice however, to have it undergo an external or more neutral test of
performance. I say this without at all doubting the quality of the results.

------
lucidrains
Started a Google colab with the interactive text generation script.
[https://colab.research.google.com/drive/1da54684tFMjPbR5idbv...](https://colab.research.google.com/drive/1da54684tFMjPbR5idbvoCyjOoEGwIVwV)

~~~
exit
to be clear, this is the "politically innocuous" open sourced model. the
results are not impressive.

~~~
marviel
Yeah, the results aren't amazing on their own, but if you treat them the way
they do over at Botnik -- with some human curation involved -- you can find
some interesting sentences.

------
gallerdude
These samples are _freaky_ good. We're approaching some threshold very, very
fast. I'm not sure what that threshold is, and whether or not crossing it is a
good thing, but soon we'll be there.

~~~
Matumio
I like how it effortlessly switches into "git diff" mode at the end of sample
112. Sadly it doesn't do whitespace.

> Showing 1 changed file with 4 additions and 19 deletions. +4 −19
> png_source/colors/pointer.py Show comments View 8
> png_source/colors/pointer.py @@ -35,6 +35,7 @@ def
> _draw_hull_class_level(self): repr(Shape[td_get_framepanel_pcs(dc) for dc in
> xrange(dc.cols)]), self.doublesize.values) \ } def _draw_hull_class_level
> [etc.]

It also inserted a helpful reference link to
[http://wiki.openarcade.com/wiki/List_of_Programmer_Constants](http://wiki.openarcade.com/wiki/List_of_Programmer_Constants)
(I've had to check: no, there is no openarcade wiki.)

Also, sample 217 is some mediocre Java, with comments and all. Impressive how
a single model can handle this all at once.

------
option
This is very impressive. The decision to not release the model is questionable
imho. There are labs, companies, and state actors which have way more compute
than OpenAI and therefore can do even better.

Perhaps we need some kind of competition for detecting machine generated vs
human generated content?

~~~
nprateem
That's a GAN.

------
minimaxir
The generated text sounds _too_ good; is it possible that the model overfit
the source material (especially since the n-previous-tokens value is infinite,
while other approaches like char-rnns/textgenrnn use a fixed window length)?
It's something I've encountered many times while working with text generation.
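
The fixed-window vs. unbounded-context distinction drawn above can be sketched like this (a toy illustration; in practice GPT-2's context is long but finite):

```python
def contexts(tokens, window=None):
    """Yield (context, next_token) training pairs. window=None mimics
    conditioning on everything seen so far; an integer mimics a fixed
    char-rnn/textgenrnn-style window."""
    for i in range(1, len(tokens)):
        start = 0 if window is None else max(0, i - window)
        yield tokens[start:i], tokens[i]

toks = list("abcdef")
full = [c for c, _ in contexts(toks)]            # growing contexts
fixed = [c for c, _ in contexts(toks, window=2)] # sliding 2-token window
print(full[-1], fixed[-1])
```

With the full context, verbatim recall of long training passages is much easier, which is one mechanism behind the overfitting worry.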

~~~
DenisM
It _is_ uncanny.

I would expect a language model to pull up a different person name each time
one was called for. For a person name to be used consistently through several
paragraphs, it is not enough to rely on word co-occurrence.

If I had to produce a text like this I would simply take an existing text and
replace randomly chosen words with other similar words (as hinted by amvalo),
where similar means words that tend to occur in similar contexts. So John->Bob
throughout the entire text. But that would not be the product of a language
model anymore, and where's the fun in that?

I should set aside some time to read this paper.

~~~
dannyw
That’s because the transformer model used has memory.

------
winterismute
Seems like in the near future the idea of the "Turing test" and that of "not
fake news" will eventually coincide...

------
lordnacho
This is so crazy good, someone needs to do a Turing test by sending it to some
unsuspecting publishers.

I get the feeling that the debatepocalypse is not far away. Every forum can now be
spammed with reasonable sounding gibberish that humans will have to slog
through.

~~~
Retra
That's not a Turing test. A Turing test is an ongoing dialog, not a one-shot
filter.

~~~
Isinlor
Add some bias to create dialogues when sampling based on movie subtitles and I
could see it passing a proper Turing test.

~~~
Retra
A proper Turing test allows you to ask questions whose answers must indicate
novel introspection, learning, and social awareness. It's absurd to think any
computer system today would pass such a test.

~~~
lordnacho
This thing does exactly that. You can see in the text it generates that it
sounds like someone who's thought about things and keeps a consistent tone on
the subject.

~~~
Retra
That is not what it needs to do to pass a Turing test. It needs to sustain
_ongoing_ dialog and thought, while maintaining a consistent awareness of
another human's perspective in real time. It needs to do _what a human does_
while having a conversation. And humans don't just spit out responses to
questions. They get bored. They get distracted. They display tonal
inconsistencies in response to emotions.

------
avivo
This was only a matter of time.

For the DEFCON AI Village in August I talked about the implications of this
sort of tech, and how that impacts how we release "exploit" code / think about
"cognitive vulnerabilities":
[https://medium.com/@aviv/what-does-a-world-with-automated-so...](https://medium.com/@aviv/what-does-a-world-with-automated-social-engineering-look-like-79cd09b5a7b1).

If you are doing work in this space, either in ML research or related
security, you _need_ to be thinking about implications (also see e.g.
[https://maliciousaireport.com](https://maliciousaireport.com)).

~~~
levesque
I mean, the ideas are there. The scope of the project is probably too big to
reproduce for now, but eventually it will be accessible to your average
spammer / scammer. We _will_ get there. We _won't_ be able to keep these tools
locked up, exclusive to a certain type of responsible AI specialist.
Someone will spill the beans, the models. People with bad intentions will
reproduce these results. To me, the real deal is how we will manage these
outbursts when they happen.

I assume discriminative models will solve the problem for a while, but as with
Generative adversarial networks, you will be able to train models that are
harder to discriminate. I posit we're in for a big societal change
(maybe more a content crisis) sometime in the next 10 years. Pretty sure we
won't be able to keep it from falling in bad hands.

~~~
ctoth
“Spam-filters, actually. Once they became self-modifying, spam-filters and
spam-bots got into a war to see which could act more human, and since their
failures invoked a human judgement about whether their material were
convincingly human, it was like a trillion Turing-tests from which they could
learn. From there came the first machine-intelligence algorithms, and then my
kind.”

_I, Row-Boat_, Cory Doctorow, 2005:
[https://craphound.com/overclocked/Cory_Doctorow_-_Overclocke...](https://craphound.com/overclocked/Cory_Doctorow_-_Overclocked_-_I_Row-Boat-A4.pdf)

~~~
mr_toad
[https://xkcd.com/810/](https://xkcd.com/810/)

------
plzHireMeImGood
Is this comparable to Google BERT (Bidirectional Encoder Representations from
Transformers)? The benchmarks are different. Can I use any of these models for
other tasks not mentioned in the papers, something beyond fine-tuning?

------
aaaaaaaaaaab
In 10 years, content written by actual humans will be a premium niche, like
tailored suits - reserved for the elites.

The rest of us will be force-fed with machine-generated garbage.

~~~
Franciscouzo
That doesn't make sense: text can be distributed at marginal cost, while
tailored suits are expensive because there's not a lot of supply.

~~~
aaaaaaaaaaab
Tailored suits used to be cheap too, but tailors went out of business due to
mass production.

------
zellyn
The generated unicorn story has about the writing quality (in both senses:
standard and feel) of fanfic.

------
roywiggins
I'd love to see what it would produce if you fed it the first sentence of your
average Nigerian Prince scam. Fully automated phishing: you could even
automatically toss in a couple details about the recipient and let the AI riff
on that for a bit.

------
paraschopra
Anyone who’s done large-scale model training like this, can you shed light on
the following questions?

What is the process like? Do you prototype locally? How do you gain confidence
that the only limitation to good results is compute power and not the model
architecture or the applicability of deep learning to the task? At what point
do you decide that shelling out many tens of thousands of dollars is OK? How
often do you do large-scale training only to find unimpressive results, with
the money wasted?

~~~
slashcom
There’s a natural way to parallelize these models so that using 128 GPUs is
the same as a 128x batch size. You can similarly simulate 128x batch size by
accumulating gradients before backpropping. So you can test on just one or a
few GPUs before you run the full thing.
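
The batch-size equivalence described above is easy to verify on a toy loss: appropriately scaled micro-batch gradients sum to the full-batch gradient exactly. A numpy sketch, with linear least-squares standing in for the real model:

```python
import numpy as np

def grad(w, X, y):
    """Gradient of the mean squared error 0.5*||Xw - y||^2 / n w.r.t. w."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(1)
X, y = rng.normal(size=(128, 4)), rng.normal(size=128)
w = rng.normal(size=4)

big = grad(w, X, y)                # one full 128-example batch

acc = np.zeros_like(w)             # accumulate 4 micro-batches of 32
for Xi, yi in zip(np.split(X, 4), np.split(y, 4)):
    acc += grad(w, Xi, yi) / 4     # scale so the sum equals the big mean
print(np.allclose(big, acc))
```

This is why a single GPU accumulating gradients can rehearse what 128 GPUs will do, just slower.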

By that point you know it’s going to work, it’s just a matter of how well and
whether you could’ve done nominally better with different tuning.

There’s been enough research leading up to this paper to suspect that just
scaling larger would play out.

~~~
paraschopra
Thanks.

>By that point you know it’s going to work, it’s just a matter of how well and
whether you could’ve done nominally better with different tuning.

This can't be true in all cases, right? I'm assuming that for many initially
promising results on less-compute when they scale it, the results aren't
impressive. I'm very curious to know what is the trials-to-success rate of
publishable results when big-compute is thrown in the mix.

~~~
slashcom
It’s indeed a very high trials-to-success ratio. Again though, there are
enough papers preceding this one that you could have good confidence in the
effort. Another thing that helps is that orgs like OpenAI have their own
servers, rather than renting ec2 instances.
Another thing that helps is orgs like OpenAI have their own servers, rather
than renting ec2 instances.

You also don’t just launch that many things and then ignore them. You monitor
them to make sure nothing is going terribly wrong.

But yeah, there’s also the fact that if you’re Google, throwing $2m worth of
compute at something becomes worth it for some reason (e.g. StarCraft).

------
fouc
I think this would be extremely useful when we can do the inverse. Basically -
can we detect if someone's writing is nonsensical or not? Can we detect if
someone that is producing many well written essays is adhering to reality or
not? Are they subtly re-defining terms, using flawed examples, etc?

The generated example of the biologists discovering a unicorn herd is too
convincing on its own. It's only because it's so outlandish that we get the
sense it's fictional.

~~~
levesque
Our models have reviewed your submission, and deemed it to be 87% incoherent,
52% redundant, and 24% fake. We therefore reject your submission. Sincerely,
the Chief Bot Editor.

------
antpls
It's beautiful that the reduced model itself is only 175 lines of Python code,
thanks to TensorFlow:
[https://github.com/openai/gpt-2/blob/master/src/model.py](https://github.com/openai/gpt-2/blob/master/src/model.py)

------
lotaezenwa
this will definitely be reverse engineered and open-sourced.[0]

[0]
[https://en.wikipedia.org/wiki/Streisand_effect](https://en.wikipedia.org/wiki/Streisand_effect)

------
zawerf
It's pretty interesting that their training set consists of "outbound links
from Reddit which received at least 3 karma". There are definitely large
subreddits which are flooded by highly voted fake news which you don't want to
emulate (unless that's the goal).

It also reminds me of a short fictional story which explores what would happen
if an AI learned how to maximize reddit's sort-by-controversial score instead:
[https://slatestarcodex.com/2018/10/30/sort-by-controversial/](https://slatestarcodex.com/2018/10/30/sort-by-controversial/)

Maybe that dystopian story is closer to reality than we thought?

------
teabee89
The best part is "Scroll down for video" from
[https://blog.openai.com/better-language-models/#sample3](https://blog.openai.com/better-language-models/#sample3) :)

------
beefman
I wonder how this would do on the Hutter Prize (I doubt it would beat the
current record but I'm curious what the result would be)

[http://prize.hutter1.net](http://prize.hutter1.net)

~~~
gwern
They include the enwik8 BPC estimates. It may not be strictly comparable - the
most obvious issue is that since the HP takes the compression paradigm of
intelligence, the compressor size is part of the total, and those 1.5b
parameters certainly are not cheap.

(This is one reason I think the HP is outdated. The corpus is not big enough
to allow the superior asymptotics of approaches like RNNs or Transformers to
compensate for their far larger binary size. HP is not measuring progress on
an intelligence metric we care about, it's sort of measuring a 'demo scene'
metric of intelligence.)

------
mrfusion
I’m most impressed by its ability to answer questions about the text. Why
can’t someone build something like this on top of Wikipedia? It would be
amazing to be able to ask Wikipedia any question you can think of.

~~~
piggyzach
Google has already built this, but it definitely isn't perfect.

[http://www.seobythesea.com/2014/10/google-fact-questions-ent...](http://www.seobythesea.com/2014/10/google-fact-questions-entity-references-unstructured-data/)

------
kaffee
Not releasing the model? These people aren't scientists.

edit: toned down a bit.

~~~
schoen
Somehow, some part of me really wants to see what this model would generate
with that as a prompt.

------
LHxB
This reminds me of Dürrenmatt's "Die Physiker"
([https://en.wikipedia.org/wiki/The_Physicists](https://en.wikipedia.org/wiki/The_Physicists)).

While this has indeed very scary implications, one should be aware that if
it's thinkable, eventually it will be thought (I'm paraphrasing here).

------
mark_l_watson
Wonderful results. I don’t think I will experiment with the smaller available
model, at least right now. I am still happy with BERT, especially for
basically solving anaphora resolution (coreference of pronouns, etc.).

------
braindead_in
Seems like magic! I wish I could do something similar with our chat support
questions and answers. It would be nice to have something like this built-in.

------
grok2
Nobody has asked about the animated text on the left at the top -- how is that
done? That is more interesting to me!

~~~
marvy
It's an mp4 video. But then the next question is: how did they make that
video? And the answer to that is... I have no idea.

------
dogcomplex
Plot twist: all the comments on this thread were auto-generated.

------
mrfusion
Can anyone do an eli5 of how this works?

