Better Language Models and Their Implications (openai.com)
426 points by yigitdemirag on Feb 14, 2019 | 130 comments

This kind of "blocking-and-tackling" work is important.

The authors take a well-known architecture, the Transformer[a], configure it with a progressively larger number of parameters, train it to predict the next word conditioned on previous text using a large dataset consisting of 40GB of text scraped from the Web, and test each trained model on a range of zero-shot transfer-learning tasks.
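For readers unfamiliar with the objective: next-word prediction can be sketched at toy scale with a simple bigram table. This is only the training objective, not the Transformer architecture itself, and the corpus below is made up for illustration:

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat and the cat slept".split()

# Count word -> next-word transitions.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word seen after `word` in the corpus."""
    followers = transitions[word]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once -> "cat"
```

The real model conditions on much longer context with learned representations rather than raw counts, but the objective is the same: maximize the probability of the next token given what came before.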

Remarkably, the performance of a Transformer in the tested tasks improves log-linearly with the number of parameters, suggesting that even the largest model tested, with 1.5B parameters, still underfits 40GB of text.
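The log-linear trend can be illustrated with a quick least-squares fit. The parameter counts below are the four GPT-2 model sizes, but the perplexity values are invented for illustration; the point is only the shape of the computation:

```python
import math

# (parameter count, perplexity); perplexities are made-up placeholder values.
points = [(117e6, 35.0), (345e6, 31.0), (762e6, 28.5), (1542e6, 26.5)]

# Least-squares slope of perplexity against log10(params).
xs = [math.log10(n) for n, _ in points]
ys = [p for _, p in points]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)

print(slope)  # negative: loss keeps falling as parameters grow
```

A negative slope that stays roughly constant across the tested sizes is exactly the "still underfitting" signal: there is no visible plateau yet.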

This is compelling evidence that we do NOT need new architectures, NOR new kinds of training objectives, NOR new theories, for better language modeling! We can get better language modeling simply by increasing model capacity (i.e., by adding more parameters to existing models), which becomes easier and simpler to do as hardware continues to improve over time.

Great work.

PS. In case it's not clear: I'm not saying we should suddenly stop searching for new, better ideas and architectures. That would be silly. Please don't attack a straw-man :-)

[a] https://arxiv.org/abs/1706.03762

> …we do NOT need new architectures…

You do realise the irony in stating this regarding transformers that arguably made their first appearance as decomposable attention in 2016 [1] and then as transformers in 2017 [2]? It is not as if this is a vanilla RNN straight out of the 90s sweeping the floor with decades of model innovations, rather it looks like we are seeing the rise of a new, simple model category that works remarkably well – akin to how Mikolov et al. reconsidered how to learn vector representations back in 2013 [3] to ingest magnitudes more data than previously possible.

[1]: http://www.aclweb.org/anthology/D/D16/D16-1244.pdf

[2]: http://papers.nips.cc/paper/7181-attention-is-all-you-need

[3]: https://arxiv.org/abs/1301.3781

I would not call myself a model-focused researcher so I would love to play down their importance, but to be intellectually honest one should call out hyperbole wherever one sees it.

Agreed, the transformer's approach to memory/attention is certainly novel and a non-trivial departure from RNNs. It's also questionable to what extent this is not over-fitting. Some are already finding phrases on the web that the model is clearly stitching together. This is not to underplay the impressiveness of stitching the correct phrases and n-grams together in a thematically coherent way for paragraphs at a time; it's still very impressive (this transformer should be nicknamed Bumblebee).

Rather than under-fitting, I'd consider this diminishing returns in the model's ability to make use of the available information. You can get within 10% of it on the challenging Winograd schema task using only 2%-5% of the data used. Many here are programmers and computer scientists; rather than stoop to hyperbole, we should try to understand. What class of automata can this model learn? Finite state machines, pushdown automata? The Transformer has memory but no loops, so I very strongly doubt it is more than that, and yet it performs so well on language tasks (but then again, not on simple program-learning tasks).

Do you have any recent links to program synthesis (with Transformers or other novel attempts)?

The point I was trying to make is that we can get better language models with more computation, not that we should stop researching new ideas and architectures. In hindsight, perhaps the language in my post wasn't sufficiently clear on this. See my PS. You and I are not in disagreement :-)

I do not get it. If what you are trying to say is: “more data/computation helps”, then why even bother throwing models, training objectives, and theory into your statement? We have known that more data helps for at least two decades, so saying this is stating the obvious. If you are instead trying to say: “Right at this very moment we can push a bit further with more data/computation given what we have for this specific task”, then again this is nothing particularly insightful to bring to the table, as surely this occurs at least once a month for a given task/model category.

This is not a matter of clarity, it is a matter of form – drop the hyperbole and speak specifics rather than trying to appease both sides of a fictitious divide.

Agreed that "simply" scaling up with more compute will result in progress and useful systems, and work in that direction is interesting and valuable. But, while we may not need new architectures or training objectives to make progress, we do need them to approach human level sample complexity. Humans don't need to read through 40 GB of text multiple times to learn to write.

> Agreed that "simply" scaling up with more compute will result in progress and useful systems, and work in that direction is interesting and valuable. But, while we may not need new architectures or training objectives to make progress, we do need them to approach human level sample complexity.

Yes, agreed. Nothing I said above contradicts that! :-)

> Humans don't need to read through 40 GB of text multiple times to learn to write.

Yes, that's true... but to keep the comparison fair, note that we do need many years of schooling to learn to read, say, at a high-school or college level. And before learning to read, we first must learn to speak, which surely helps. And we also get to inhabit bodies that see, smell, touch, and interact with the physical objects that we read and speak about during our formative years, which also helps. The more one thinks about it, 40GB of data is actually a tiny figure in comparison to the amount of training data that flows continuously to our brain from all senses. I think I read once that our brains process on the order of 10 to 100 GB of training data per second.

Conscious processes are estimated to work on the order of 10^2 bits. Vision, at the retina, is estimated at 10^7 bits/sec. It drops another order of magnitude by V1. Also note that, as long as they're not isolated, a deaf and blind person has no trouble reaching full human reasoning ability despite having vastly more impoverished data available compared to the average person.
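Putting the quoted estimates side by side (a rough sketch; every figure here is just an estimate quoted in this thread, including the "10 GB per second" claim from the parent comment):

```python
# Bandwidth estimates quoted in the thread, in bits per second.
retina_bits_per_sec = 10**7            # vision at the retina (estimate)
conscious_bits_per_sec = 10**2         # conscious processing (estimate)
claimed_low_bits_per_sec = 10 * 8 * 10**9  # "10 GB/s" claim, converted to bits

# How far the low end of the GB/s claim sits above the retinal estimate.
ratio = claimed_low_bits_per_sec / retina_bits_per_sec
print(ratio)  # 8000.0
```

Even the low end of the "10 to 100 GB per second" figure exceeds the retinal estimate by nearly four orders of magnitude, which is the point of the correction above.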

A human will also be learning vision, hearing, walking, physics, causal reasoning, and much more, so this comparison just isn't well grounded. The task-specific question is: how much training does a young brain require to learn to produce language? If the brain comes with innate advantages, then rather than resign ourselves to inefficiency and excuse our models, we should try to see if they can be bettered.

I'm not well versed at all in signal theory, so I'm genuinely curious how these bitrate estimates are made, and would love to see the source of these specific numbers.

How do you estimate the effective (digital) bitrate of an inherently analogue system?

After 40 GB of text the model doesn't know anything about how the world works, and it shows many times in the examples. Nobody would make some of those mistakes, not even young kids. Other mistakes are more subtle but still show a total lack of understanding.

Then again, it's enough to write text that nobody really cares about, and that could cover a lot of what we read.

It's because nobody dumps 40GB on a kid. Kids go through a long process of feedback and correction. I imagine that if there were some crowd-funded project to provide feedback about this model's mistakes, it would learn and produce better results fast.

40 GB is surprisingly close. I estimate I've already read at least 4 GB of text so far. That's just 10 times more samples. I probably write better than GPT-2, but certainly not faster.

4GB of English? 500 words per single-spaced page, 5 letters on average per English word, so 2500 bytes/page.

4 gigs then would be 1,600,000 pages.

That's 219 single-spaced pages per day every single day for 20 years straight. High, but I guess not outside the realm of possibility depending on the complexity of the text.
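The arithmetic above checks out under the stated assumptions (500 words per page, 5 letters per word, 20 years of daily reading):

```python
# Back-of-the-envelope check of the figures above.
bytes_per_page = 500 * 5                  # ~2,500 bytes per single-spaced page
pages = 4_000_000_000 / bytes_per_page    # 4 GB of text in pages
pages_per_day = pages / (20 * 365)        # spread over 20 years

print(int(pages))            # 1600000 pages
print(round(pages_per_day))  # 219 pages per day
```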

For perspective, 219 pages amounts to about a railway novel in a day, or about two ~800-page fantasy doorstoppers in a week. At a rule of thumb of about a page per minute for light reading, it's just under four hours: a hefty investment, but well within the realm of possibility. High compared to the general population, but table stakes for a book club or otherwise avid reader.

I'd've easily doubled or tripled that over the summer as a teenager.

I guess the data requirements act as a stand-in for our genetic evolution... Our brain models are good at learning the things we learn. Our computer models aren't yet.


I think your conclusion is too strong. Yes, we know bigger models and data generally lead to better performance (e.g. BigGAN results last year) but progress in architecture can still speed up progress in ML tasks. If we were still stuck using RNNs and LSTMs for language modeling we wouldn't be talking about this news today.

I feel like my human brain underfits a lot of data.

My human brain definitely zero-shot transfer-learns almost everything.

While censoring the full data set seems in some way to support the rationale of the OpenAI charter, it also means that only state actors and very well-funded entities will be able to use the work to create models of the size necessary to do the impressive stuff in the write up.

Based on the concerns, it would seem that restricting the capabilities only to state actors would have the opposite of the intended effect. Why not let thousands of amateur researchers, undergrads, etc., use the model to detect instances where the model was used to generate text, etc.?

They can always revisit the decision once other entities seem to have made this progress. In the meantime, they don't have to be so disruptive.

I would guess this reduces the risk. Why would you say it does the opposite?

My argument: state actors might misuse this tech, but letting any script kiddie do whatever they want almost guarantees someone will misuse it.

Which is exactly what happens whenever there's a leak of NSA or other foreign government 'hacking' tools. As soon as they're public, ransomware authors and other shitty actors all deploy them to steal as much as possible before systems are patched.


Wannacry: https://en.wikipedia.org/wiki/WannaCry_ransomware_attack

NotPetya: https://en.wikipedia.org/wiki/Petya_(malware)#2017_cyberatta...

I think it’s a bit different, because the attacks are not entirely clear yet. The power of many of these attacks would come from flying under the radar, because nobody was expecting them.

So having lots of script kiddie attacks (which would be sloppier and easier to notice) would lead to a more rapid adoption of safeguards.

Or, in 3 years, anyone. I suggest everyone start thinking on ramifications now.

Sure, not releasing the full trained model probably delays it, but sooner or later a bad actor will do their own scraping and train their own model and share it around and the genie will be out of the bottle. Then what?

I think we need to be conducting AI research (and building software generally) under the assumption that all of it will eventually be repurposed by bad actors. How would our practices be different if we consistently and cautiously did this?

Here's a thought experiment: how would the Manhattan project have been different if it were carried out in the open and its products were instantaneously and infinitely reproducible? What is the MAD equilibrium of AI research? I think the impact potential is similar even before AGI.

A lot of the advancements that made the Manhattan project were published by the Germans. On hearing about Hiroshima, Otto Hahn was "'shattered,' and went on to say that he felt 'personally responsible for the deaths of hundreds of thousands of people,' believing that his discovery had made the bomb possible." (https://www.washingtonpost.com/archive/opinions/1992/03/01/t...)

Wasn't that the point of this whole OpenAI thing? They didn't like the idea of there being a club with just Google in it that had access to the resources and funding to collect and train on massive datasets, so they were going to be the "bad actors" who would do their own scraping, train their own models, and share them around?

isn't it supposed to be called OPENai?

they don't want to share the data because they don't want to throw away the edge they've gained by collecting it. :)

Computer programs that generate human-like text aren't dangerous; the internet is full of human-like text that is mostly bullshit anyway.

I have had the same impression regarding their work on Dota. They got a lot of publicity with it, but their work is not open at all. They have released neither the code which runs the bots in Dota 2, nor their training code, nor the final model. All we have is video recordings of a few games against humans.

> faint praise for soccer champion who apparently keeps winning games through poor performance.

I can’t disagree.

CommonCrawl already has an open dataset of petabyte size ready on AWS. Even if it didn't exist, scraping 80GB of data on AWS is trivial. I am surprised the authors considered this such a big deal. Also notice that performance is not anywhere close to human. It sort of works, and it's astonishing that it does, but there is a long way to go before we have to fear weaponized text generation.

I think the big deal is the size of the model: BERT-large is ~340M params, and this one is 1.5B. BERT was trained on a pod with 64 TPUs, and this model requires an even larger GPU/TPU cluster. There is no way an underfunded indie researcher can train such a model.

> ... some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, “In South America, such incidents seem to be quite common.”

Man, the auto-generated text is hilarious. And uncannily good. Though I have to wonder if it's a total random fluke, or whether there's one among their 1.5 billion parameters that predicts the "likelihood of mythical bestiality in South America".

Considering that it memorized the Gettysburg Address verbatim and knows Charles Darwin wrote On the Origin of Species, it probably knows more about both South America and unicorns than I do...

The unicorn story does demonstrate that it associates South America with Argentina, the Andes Mountains, and the University of La Paz.

I was honestly surprised by the quality of the generated text. While I can't say I've been following the state of the art in the last months, this seems like a pretty important step forward. Furthermore, at the end of the post they note that the samples are somewhat representative of their results. Maybe they should consider releasing a text file with some more (not hand-chosen) samples? Whatever the case, fantastic work, my congratulations to the authors.

Thank you! We've released 500 random unconditional samples from GPT-2 at https://github.com/openai/gpt-2/blob/master/gpt2-samples.txt

Wow, some of these really go off the rails but those that only kinda go off the rails are absolutely hilarious and/or bizarre.

A few summaries of ones that I looked at which appeared to be more or less staying on a single topic:

Sample 1: An Austin nonvegetarian vegetarian restaurant encounters a series of difficulties in opening, as its nonexistent but extensive menu depicts a wide range of food options and the restaurant opening is delayed by financial and food-safety concerns. The nonvegetarian restaurant has also annoyed vegetarian clientele with its plans to be a vegetarian restaurant. Food reviewers nonetheless manage to eat at the new restaurant and post their reviews; the establishment also becomes "the first Austin restaurant to ride a ride-sharing service in Austin since the 'Bike-Share — Share the Ride' controversy erupted".

Sample 3: Denise Schroeder encounters perhaps the most complex and confusing legal trial in American history as she gets murdered, is accused of murder, comes under investigation for liquor law violations, becomes an abuse victim, prompts others around her to commit suicide, is arrested, and ultimately wins the right to marry her same-sex partner.

Sample 6: Cooking rice and beans by steaming a roast in a wok is easy! Just follow these 40 simple steps to update your XBox firmware, and you'll end up with a nice fried soup.

Sample 8: A protest march against drought in South Asia attracts very broad support, but its radical nationalist message is simultaneously endorsed and feared by virtually everyone in the region.

Sample 13: The global bicycle industry, although very large, is perhaps unsurprisingly extraordinarily unpopular and economically irrelevant following a very complex cycling accident involving an area woman.

Sample 14: Indian restaurant owners in Canada have to contend with an amazing array of economic, technological, and environmental challenges as the infrastructure of their society seems to collapse around them -- but they do all right in the end.

Sample 26: The previously untold history of Blackwater USA, in which founder Erik Prince is capable of meeting Bill Clinton on a day in January that was actually in March, and results in Blackwater and Prince having shady dealings with all sorts of celebrities -- though the organization "may not like to admit what a true dick the injustice has wrought".

Sample 29: What does the KKK believe? Apparently, lots of complicated conspiracy theories about black history. Also, if you find their theories traumatic, you can find "several biblical [...] references that can be used to up your level of moral competency in your longterm relationship with Mr. Soros."

Sample 30: Wikileaks reports harshly on speculations of Linux adoption by rural tribal mobile device users.

Sample 37: faint praise for soccer champion who apparently keeps winning games through poor performance.

Sample 49 (following the end of the reviews section): world traveler and masterful hotel architect Frederick Beckey remains unperturbed by racist gatherings at his hotel.

Sample 50: comedians fear the looming resolution of a long-running comedian feud. Also, Soviet spectators at the Munich Olympics cheer Yuri Gagarin, who, although escorted by Russian soldiers, uses rockets and airplanes in his Olympic performances to win multiple medals. The crowd of Soviet spectators, "[l]argely composed of high school students in tight-fitting vacant uniforms [...] walked away believing that Gagarin was the next North America's greatest athlete".

I'm looking forward to computers generating short clips from these stories. I wouldn't be surprised if some of them went viral (especially if A/B testing were incorporated by looking at when users stop watching the videos).

Blackwater USA was born January 21, 2003, when Blackwater founder Erik Prince met Bill Clinton in March of that year. In an email dated 08/28/99, Bill Clinton wrote to "Uncle Ray." Erik Prince was a small business owner and business leader. When Dagan tried to evict Blackwater USA from its Fort Carson, Colorado compound in January, 2000, the Iraq War began fourteen years ago. And my uncurent, signed-and-notarized correspondence with a Bill Clinton representative on October 27, 1999, speaks volumes about one's knowledge of the INC diamond and fronts and shields Brazilian cache testimonials in commissioning documents for the SCOUTS network. In other words, the INC was ALL OVER the party that night and very highly sensitive agreements between the same three foreign entities in Chicago on BLUESTAIR retail banks. Yes, I believe the whole "####BETTIN Powers intact." But it was Erik's trans-national connections through his vast buyout shops in the Far East that made him the savvy buyer in this slice of the Direct Foreign Investment biz. Erik of Bankers Trust and Investments was working to implement the political policy and campaign advice Bernie Madoff rejected, both of which ultimately cost Hilary collectors many millions. Global Information similar to the names today haven't changed in the frequency folder of Erik's UFLAC machine, as evinced by the statement "West Coast growth project."

First members moved to Las Vegas and a quiet retreat in Mesa, Arizona came into existence titled Predius Group LLC <http://www.predius.com/>

The massive Madison Avenue bank and hedge fund structure that devised illegal high rollers for such organizations as SS&XM 13/CCGB, Marc Rich and use of the old Puerto Rican nuclear smuggling routes from Santo Domingo continues, as deeply buried secrets leaked to me show that a master of legal portals exited from a long, multimillion compensation firm in Wisconsin for BP with a large bolillo-stuffed look like the federal prosecutor U.S. Grand Nomen Suisse makes to seem "human." One intelligence utility broke for their relative comfort in several layers of a "cheque factory/lobby/university."

Blackwater USA may not like to admit what a true dick the injustice has wrought. What the rape of Alexandria, a slap in the face to the American public seems to belong able to early final order technology, esp. Skyjack Special Forces Weapons, at the price of just 3 million followed by a back-door, fired check into Hillary's back pocket allows us to view this sub rosa of tax-proof lottery economics as messy FleetMicrosoft. investors that bloom with sedition, all hands seem tied at the start. We may never learn who the infamous Bradie Burglars were and for a directors' distance in Texas. obscurity that loafed Fletcher Marshall bridges Hab Saber guaranteed earned59,5/83/67 which for this two is two sacrificed, locks dread beneath the cave like bankers shielded. Noir in the wake of JFK stimulates a game of IDEAS over Hypothetically recruTCater John Waldincyv. The chairman of Sears Remedy Strategies and GSD Group Inc. New York had once taught a class called with the subject Logic and Order Manufactured Domination subject17 Excitement vacuum playoffs NASA policies war against Al Qaeda. A homeland security branch of the FBI using its HAWK MAN wooden stake farm & an AK assault rifle on Muslims , refuses to bend its political opinions in line with a Supreme Court accepting Hollywood records, magazines of political horrors, such post minority weekend get blown off. As a result of projecting it at God they sound and dissent singing. while boghoum badges still as nasty MO. Uncovered I act as the CERO, Scarlet Crusade presence. and begins to operate out of a converted American Air National Guard SOC air attache tube at the embassy port of entry. I adapt mode that accommodates the supervisor/chief and begins to adjust to alienation anomaly Lebanese charities making intercepts. ownsACK board of USCT to go from that I CAN READ AND DOB less than 09 or 2005 what is running on twitter, like We are to blame motherfuckersmobile. 
We BLACKWATERim audiences lesson stronger then in multiple appearances in 2008 or 2009, on their website is clearly for everyone to see.

Reading through these is amazing. It’s like an alternate reality...

There’s a whole article about whales attacking boats and posing a threat to shipping.

My days as a novelist are numbered.

Sorry about that, I shouldn't have missed it. Since you're here, I'll ask a quick question to satisfy my curiosity. This is a pretty big step, even just looking at the benchmarks. Where do you see text generation going in 5 years? What about 10 years? As I said, no need to give long explanations; I just want to know what you expect, as an insider, on an intuitive level.

Some of these are basically horror short stories:

======================================== SAMPLE 164 ========================================

I want to see a ton of households with kids with no food in their houses. Surviving famine requires storing up food in hopes a spoiled enclosure might provide the behaviors of food memories we crave. In near future, they will have to leave their cages.

We're spending all our time theorizing about animal minds, yet we have living, reacting animals. As if we can just no be in touch with our emotions at is practical.

My kale starts from an academic grown Himalayan. It hydrates in the garden, but it seems another species than we know is fermenting! My general lab is a recently renovated cave survival room with sun-filled bowls full of food for rats. The interior design is nearly utilitarian (no smart phones, blog, writing tools, or ovens there), except the cave ceiling. This is where we grow the food to feed animals. Which they will eventually eat.

My girl starts out with a kind of scaffolding green and wild, but soon she can't grow more than a few leaves. Who needs her monotub artistically scissors and wire? The boxy roots are brittle right away before growing into skin weeds. So that's not going to do it.

An unsteady start for another young carnivore's evening. I go in for some priming with baking soda and dry iron. A fun little experiment is once the kopi luwak seems she will grow some damp spikes. Consider this:chopsticks are soft under a microscope, webbies, and twining vines all apply more pressure from tangles of animal blood.

Taking over the cultural mythos of science, I peek over her purple skeleton spikes with an endoscope.

Once the higher organs take over, the kabuka unfurrows skin at a clever angle to reach past the body curse powered at this end of the lab. Curiously, without a hoe or a pot to turn into a tiled floor, she does not chew with her jaws.

I have undressed her to her base to fill the bowl. She is pushing her way to get her shelter to stretch out into, or if I've planned correctly, squish into. She does not move like a biological creature, pulling up with her haunches while excellent and facing directly downward. She prefers laying there moss-covered and mother-prey encouraged paying attention to her uphill path.

Lack of summer fur gives me ample room to poke through her skin, pull out black seeds, and taste the damp thing she'll roost under in future. Would the static reference very much reduce a virus being blocked within its first expenditure, nights? I'm not sure, but it's for sure a more effective inoculation. Hopefully, these experiments will be continued. Hopefully.

My guru and master who made us does not focus so much on running where we want as it does on empathy for the soil,, plants, and animals with which he touched in some perfect way. Delhi is often flooded my first morning in the shafts of windy sunshine. Let's not give success an easy choice.

The trees are in the underground and they change colors weekly. In simple terms, they're scattered flowers taking their cues from the transynthetic forest. While I'm pioneering, the one thing I sacrifice in the relationship is conducive to imagination. A full color-changing lets us enjoy cozy warmth while simultaneously losing whoever is formal and rational for the feel good activity. Like cosmic lap of the firmotsuites.

An American outcropping of stonemines blocks the sunlight. The unseen wave depose colonies to accommodate my habit; sandblasting. There are microbes from amoebas that change the composition of the water. In the habitat where the natural and mecopate are as thoroughly ever so mindful of the fence as the meesthe gardener has been, I'll shed this duality when I sit with her. Food grows from space; human needs feed them. Sandbringation turning our microbes and animals can save thousands in pandemic deaths. I value them all in metaphorical and real (if tough) tenacity.

Species survival deals good cop in these two every where storytelling machines.

Door is opened. My guru goes rescuing the kupi luwak elder from drought resistant drought. I alone have learned her humdrum bookish biogeography is yielding to light and fire. Cow has problems taking roots. Bring my love to her and make a mend. She needs a clean-air daughtersnail easy experience not in a month's time, but in just a few hours of thinking about reflexes, calendar dates, and natural spatial relations for a mind that has grown certain.

Is this the future of politics...


Photo: DEA / W.H. 038 WITHDRAWN: Rep. Allen West to run again in 2016

Former Rep. Allen West will run for the U.S. Senate in 2016 -- okay, maybe not quite against Florida GOP Sen. Marco Rubio.

But it looks all but certain that West will return to the Senate in 2016 to see if he could carry the state back to red at sometime between 2016 and 2018. Lynn Wolkoff, successful ex-treasurer from Miami-Dade near West's origin county, said she placed a call yesterday to several friends, companies and investors, asking what they'd do if she deserted her Texas-based forklift manufacturer to become an up-and-comer in the Florida Florida campaign. And in the process, she learned that West himself might want to run.

"Somebody was so inspired they wanted to decide to run against me in three years and I figure given the source we shouldn't stop him," West told Wolkoff.

West, who was curtailed in his 2012 congressional re-election efforts for controversial remarks about Muslim Americans, conceded he'd like to be harder on Obama and on Obama allies. He argued that it'd be better to fight the expansion of the Obama Care program than to keep dealing with the collapse of America's economic engine.

"Taxes should be increased for very high incomes. We should go after the highest-income earners. Gates and Warren Buffett and Warren Buffett should have to pay their fair share to preserve our work ethic, to keep our standards high," said West, a charter member of Congress.

Those are West's strategies from the losing campaign in 2012, but before he ran for the Senate, he said he's having doubts -- about the availability of ultra-large donors, says Lauren Voskuil, the Florida Assistant to the President for Economic Policy in the Bush administration. He pushed former Florida Gov. Jeb Bush against "self-funding" laws that ban him from using some super PACs - and West's strategy is strong, Voskuil said, calling Westbrook "a master strategist".

"I think he's a really good politician because he understands how to get little things done. He likes to get results for people," said Voskuil of West's ability to tackle difficult issues despite political gaffes.

Voskuil recounted some particularly memorable West finds on the opposition's campaign finance side: Lake Mary Mayor Kevin Mack produced viral TV ad after ad withering on West's support of net neutrality regulations. Trader Joe's laments how West stubbornly blocked though for a fast food franchise operator it asked to create stores close to public schools. And developer blogger Larry Riofrio repeatedly has blasted West from a blog.

"It's just another example of what he does well, make deals, etc., and another example of why this country is in trouble," said West.

Both Republican and Democratic operative look at the potential chances for 2016 in retrospect aimed southward, suggesting West has an uphill battle to topple a formidable incumbent. Terry Dalsia took aim at the stated motivation to run

"Mr. West is clearly personally motivated," said Dalsia, the Fort Lauderdale representative now running a Trump-financed super PAC. "He is running because he can do it in 2044. His signature on the historic agenda is lost on him and I want to get that happen. He will be leaving the Senate with every GOP vote in 2014 ... just not before. Long way off."

Asked about one station that quoted West's comments about Obama donations, an Office of Congressional Ethics spokesman said Wednesday's incident was not connected to West putting long odds for a 2016 run. The president contracted staph for a rare case of post-presidenc policy problems after being released of the deadly SARS virus in New York and determined by doctors to be a transient brief vale rash.

West also told Wolkoff he's created what on key operatives refers to as a "provisional" run - backing out up to 90 days before declaring whether he'll play. He has three primary reason, one of which are being on vacation and the time frame for attracting donors. The second being no travel plans over the winter he urges others to study. The last boils down to the fact he really, really thinks he could win. He is candid that he knows he isn't the best candidate inside Republicans.

Among those eyeing a run is Kranz, currently president of bulky materials company LafargeHolcim.lt. Briefly considered for the job of visit by Sen. John McCain, Goldwater's son said he opposes Texas Gov. Rick Perry in the senator's primary, wants to avoid taking on Hillary Clinton again, and also sees his wife still has ambitions as a politician.

"When I heard Sen. West said he's on the way out, I told myself -- that's great, he knows he's not

This sample suggests to me that the AI arguably has better-than-average-human writing and story-composition skills.

Just as many pesticides mimic the hormonal and chemical signals of pests to drive certain behaviors that lead to eradication, this work mimics the linguistic signals of humans. I think viewing it metaphorically as the most sophisticated humanicide discovered to date is probably appropriate.

Consider that conventional munitions make an effective pesticide but are not used due to their side effects. Instead, chemicals are used to destroy or mimic the perception and production of various signals so that populations of unwanted critters effectively self-destruct.

Imagine a war fought with a weapon like this that left entire cities perfectly intact!

# end hyperbole

This is exactly the line of thinking that this article inspired in me... AI scouring the internet, figuring out what makes us tick and generating perfectly persuasive stories to convince us to do... what? I kept thinking about the story "Sort by controversial" by Scott Alexander: https://slatestarcodex.com/2018/10/30/sort-by-controversial/

Is anyone else troubled by them not releasing the source model/dataset/parameters here? Yes, the technology can be used for malicious ends - but I would argue that "DeepFaking" language is FAR less of a problem than "DeepFaking" video/photo/audio... which already occurs. Seems like they went back on their charter to share AI developments broadly ("not concentrate power") under the excuse of "safety."

(These results look fire btw)

Note: copied my comment from dupe thread

I agree with this. OpenAI isn't particularly 'open', compared to other AI research organizations (Notable ones that open source almost all their work are AllenAI and FAIR, but I'm sure there are others).

Wonder what their excuse is for not releasing the source model or code for their Dota 2 bot. Surely there's no safety issue there?

Quote from their charter [0]: Today this includes publishing most of our AI research, but we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research.

They referenced that from the article you commented on in the section 'release strategy' [1]: [...] we see this current work as potentially representing the early beginnings of such concerns, which we expect may grow over time. This decision, as well as our discussion of it, is an experiment: while we are not sure that it is the right decision today, we believe that the AI community will eventually need to tackle the issue of publication norms in a thoughtful way in certain research areas. Other disciplines such as biotechnology and cybersecurity have long had active debates about responsible publication in cases with clear misuse potential, and we hope that our experiment will serve as a case study for more nuanced discussions of model and code release decisions in the AI community.

[0] https://blog.openai.com/openai-charter/ [1] https://blog.openai.com/better-language-models/#releasestrat...

This tech can easily be used to flood humanity’s shared brain with auto-generated propaganda. Schizophrenia of the internet in a way. There is plenty of incentive with Google algorithms favoring number of words and relevant keywords in content for rankings - you could have NLP bots lifting junk sites to top results.

To step ahead in that chess game, a detection tool for fakes would just be training grounds for a better GAN. Instead we may see a certifying authority that labels content as human-generated, certified fact maybe? Wikipedia and Reddit are not safe without fast automatic moderation either.

Do you have a brainstorm or idea/prototype submission site where people can submit approaches to countering bad AI actors? A white/grey-hat AI bounty program of sorts?

Web of trust. Your results and how much you trust given text is based on who gave it to you. You assign trust to people you know, then it's a small world effect and recursive trust calculation based on who they trust.

Centralization won't work. Whether something is good or bad or fake must be subjective and based on your personal network / your beliefs. Otherwise, long term, I don't see how you could avoid dystopia.
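A toy sketch of the recursive trust calculation described above (everything here is made up for illustration: the graph, the damping factor, and the personalized-PageRank-style update rule):

```python
def propagate_trust(direct, damping=0.5, iterations=20):
    """direct: {person: {friend: weight}}, weights in [0, 1].

    Trust in a stranger is your direct trust (if any) plus a damped sum of
    trust flowing through the people you already trust -- computed to a
    fixed point by simple iteration.
    """
    people = set(direct) | {f for edges in direct.values() for f in edges}
    # Seed with "me"'s direct trust assignments.
    trust = {p: direct.get("me", {}).get(p, 0.0) for p in people}
    for _ in range(iterations):
        nxt = {}
        for p in people:
            inherited = sum(trust[q] * direct.get(q, {}).get(p, 0.0)
                            for q in people)
            nxt[p] = direct.get("me", {}).get(p, 0.0) + damping * inherited
        trust = nxt
    return trust

# I trust alice directly; alice trusts bob; bob gets second-hand trust.
graph = {"me": {"alice": 0.9}, "alice": {"bob": 0.8}}
scores = propagate_trust(graph)
```

The small-world effect does the rest: a few hops of recursion reach most of the people whose content you'd ever see.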

This has a dangerous implication that we've seen recently with the newsfeed "bubble."

If dissenting opinions are never allowed into your sphere of influence then we may continue to see an accelerating polarization of our society towards extremes.

Bubble that's dangerous for society is when everybody believes the same thing, not when everybody has different set of beliefs.

I know the more popular bubble is information one and you make a fair point, but I believe it's powered more by recommendation systems than the things that you trust.

I got distracted lately and I wanted to clean it up before putting it out there but if somebody is bored: http://comboy.pl/wot.html

Great Idea...but better make sure it's not hosted on the internet.

It's becoming ever more certain that the transformer architecture is one of the largest contributions to AI (not merely machine learning, but AI), often beating LSTMs despite LSTMs being expressive enough to capture Turing equivalence (at least in theory). Its main ideas are three: shorter paths that help gradients flow, the training setup, and the final key aspect, unhelpfully named self-attention. Self-attention is better thought of as a form of similarity-gated key-value soft memory; learning operations over it lets Transformers learn non-trivial programs with contextual weight look-ups.
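A minimal NumPy sketch of that key-value-lookup view of self-attention (illustrative only: single head, no masking, no positional encoding, random made-up weights):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over X of shape (seq_len, d_model).

    Each position emits a query; the output at that position is a
    similarity-weighted blend of every position's value -- a soft
    key-value memory lookup with content-dependent weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise query-key similarity
    weights = softmax(scores, axis=-1)       # each row is a distribution
    return weights @ V                       # blended values per position

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                  # 5 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

The "shorter paths" point falls out of this: every position attends to every other in one step, rather than through a recurrent chain.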

I also notice the number of reported tries, suggesting some level of curation. While this level of generation is undoubtedly impressive and a sign of non-trivial levels of understanding, the ability to project along arbitrary dimensions of similarity at a fine-grained level and to learn from text instruction is more useful than text generation. Although the unicorn story was a really fun read, better than many humans already, I doubt it could have gone on for much longer. It maintains a theme but not coherently or fluently (see especially the Kennedy nanotech and recycling examples; comparing the dis-fluency there with the excellence of the Civil War report suggests at least some over-fitting). These relatively minor caveats aside, this is unambiguously an outstanding result.

Winograd Schemas are the single metric to track if you are interested in how language understanding is truly improving. OpenAI reports 71% and wrongly reports the previous record as 63%. The current record here is 65% (https://gluebenchmark.com/leaderboard), though not fully comparable. Will OpenAI be submitting? Note that you can get to 60% using about 1-2 orders of magnitude less data and compute.

It concerns me that results here are so far dependent on such large data and computation. However, based on several papers I've read, I do not believe this to be inherent even in transformers. I plan to do some experiments on this when I free up some bandwidth.

If everyone is pulled in by the glamour of working for a well funded, prestigious operation then it should be no surprise that they do not consider paths which operate on several orders of magnitude less data and computational resources.

We all should consider bringing about a group of researchers who swear to an austere computational life of a single GPU, no more than 4-8x average RAM and CPUs that do not cross 90 Watts. The Bicameral Order would be a good name for such a group.

Yeah, there are definitely still places the samples fall short! Keep in mind we're still using very naive sampling techniques.

RE Winograd: WNLI is different, see https://arxiv.org/pdf/1804.07461.pdf

Amazing results, how excited are you? :)

You're right, I noted too that the comparison isn't direct, but then I wasn't justified in calling out the gap claim as wrong, so sorry for that. I think it'd be nice, however, to have it undergo an external or more neutral test of performance. I say this without at all doubting the quality of the results.

Started a Google colab with the interactive text generation script. https://colab.research.google.com/drive/1da54684tFMjPbR5idbv...

to be clear, this is the "politically innocuous" open sourced model. the results are not impressive.

Yeah, the results aren't amazing on their own, but if you treat them similarly to how they do over at Botnik -- with some human curation involved -- you can find some interesting sentences.

You need to run it a few times; the results vary very wildly.

Yea, somewhere in between the released model and the model they presented, the deep net has learned narrative structure.

This is the approach I recommend; Colaboratory's GPU is helpful when the sequences are really long.

Super appreciate this!

Thank you.

These samples are freaky good. We're approaching some threshold very, very fast. I'm not sure what that threshold is, and whether or not crossing it is a good thing, but soon we'll be there.

I like how it effortlessly switches into "git diff" mode at the end of sample 112. Sadly it doesn't do whitespace.

> Showing 1 changed file with 4 additions and 19 deletions. +4 −19 png_source/colors/pointer.py Show comments View 8 png_source/colors/pointer.py @@ -35,6 +35,7 @@ def _draw_hull_class_level(self): repr(Shape[td_get_framepanel_pcs(dc) for dc in xrange(dc.cols)]), self.doublesize.values) \ } def _draw_hull_class_level [etc.]

It also inserted a helpful reference link to http://wiki.openarcade.com/wiki/List_of_Programmer_Constants (I've had to check: no, there is no openarcade wiki.)

Also, sample 217 is some mediocre Java, with comments and all. Impressive how a single model can handle this all at once.

Sample 271

"ORIGINAL ARTICLE Year : 2007 | Volume : 42 | Issue : 4 | Page : 421-429

Masturbation as a strategy of parenthood: incidence and socio-demographic characteristics in young people

Santosh S. Bhatt and Regina M. Wagner1

Department of Human Circulation and Heart Diseases, Neuromed, Coachwerk-Werke, Bremen, Germany

Date of Web Publication 28-Jan-2008

Correspondence Address:

Santosh S. Bhatt

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/0972-0285.10374


A review is carried out on the activities, influence, and consumption of males versus females in childhood and adolescence and what effects masturbation is associated with their socio-demographic characteristics, experience with pregnancy and childbearing, aggression and behaviour. According to general sociological theories, males and females have different interests in interaction and stimulation during childhood and adolescence but will adjust accordingly when making their real life decisions. The current results show that males are more often receptive to prepubescent and adolescent communication about sex, reproductive role in future life and skepticism about their own genitaliating potential while females differ in behavior and preference when it comes to erotically stimulating activities or psychological stress than male behavior."

Note that web publication date and offline volume is coherent, supposed-to-be superscript 1 to show affiliations, and that the corresponding author is the first author.

Sample 105

"In 2009, researchers at NASA from the Air Force Office of Scientific Research's Microwave Propulsion Laboratory held a series of contests, including one for "best brain-computer interfaces." In their prize match, teams from around the world competed to develop new brain mapping techniques for ways of collecting data from inside people and rewriting it over and over. One single original BrainBridge video colonized all channels of web-viewing in the world. I've watched many of these videos on YouTube.

Advertisement - Continue Reading Below

On one hand, if you're convinced that the next breakthrough in neuroprosthetics will be profoundly more powerful because better/faster or cooler/more reliable algorithms capture the wiring that connects our neurons, then it makes sense to equip humans with mind-reading implants. But when it comes to real brain-reading devices, there are a slew of caveats, arguments, and threats on the horizon that mean that these technologies cannot take off anytime soon. If you truly think that these technologies will be the basis for a global brain-reading surveillance state, it makes sense to act NOW. Since so much money is on the line (in computers and other components, research funding, patent and trademark rights, marketing, investor interest) making and maintaining a head-mounted-camera program is done with a certain level of speed by enterprises like Google and Facebook."

That "Advertisement - Continue Reading Below" is very realistic indeed? These technologies, like GPT-2, will be the basis for a global brain-reading surveillance state, so indeed it makes sense to act NOW, like not releasing training data. Kudos. But this will be replicated with a certain level of speed by enterprises like Google and Facebook. Well said.

This is very impressive. The decision to not release the model is questionable imho. There are labs, companies and state agents which have way more compute than OpenAI and therefore can do even better.

Perhaps we need some kind of competition for detecting machine generated vs human generated content?

That's a GAN.

The generated text sounds too good; is it possible that the model overfit the source material (especially since the n-previous-tokens value is infinite, while other approaches like char-rnns/textgenrnn use a fixed window length)? It's something I've encountered many times while working with text generation.

It is uncanny.

I would expect a language model to pull up different person names for each time that one was called for. For a person name to be consistently used through several paragraphs it is not enough to rely on word co-occurrence.

If I had to produce a text like this I would simply take an existing text and replace randomly chosen words with other similar words (as hinted by amvalo). Similar as in: words that tend to occur in similar contexts. So John->Bob throughout the entire text. But that would not be a language model's product anymore, and where is the fun in that?

I should set aside some time to read this paper.

That’s because the transformer model used has memory.

The unicorn story struck me as being structurally very similar to the standard wire stories about new species discoveries. That's probably why it's the best of the bunch.

Try googling sentences.

Hmm, I tested a few sentences and it didn't turn up any exact matches (aside from this article), so maybe I'm wrong.

With a temperature of 0.7/1.0, that's enough for sufficiently random text I suppose. (the raw, uncurated generated text using the smaller model is a bit more random: https://raw.githubusercontent.com/openai/gpt-2/master/gpt2-s...)
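For reference, a minimal sketch of what temperature sampling does (a hypothetical helper, not the repo's actual code): dividing the logits by a temperature below 1 sharpens the distribution toward the most likely tokens, while temperature 1 samples the model's raw distribution.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample one token id; lower temperature -> closer to argmax."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy vocabulary of 3 tokens with made-up logits.
token = sample_with_temperature([2.0, 1.0, 0.1], temperature=0.7,
                                rng=np.random.default_rng(0))
```

At very low temperature this collapses to greedy argmax decoding, which tends to loop; 0.7-1.0 is the usual compromise between coherence and variety.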

Those samples are from the large model (GPT-2)! Regarding memorization vs. generalization, see our paper for more analysis.

Have you tried to determine which parts of the training data contributed to the model generating a certain output? I wonder whether the model avoids reproducing exact matches of the training data by splicing several similar articles together.

For example, the generated text about the Civil War mentions that Thomas Jefferson Randolph [0] was named after his grandfather, the president. But is the wording mostly influenced by articles talking about that specific fact, or does it draw from more general examples of someone being named after their grandfather?

[0] https://en.wikipedia.org/wiki/Thomas_Jefferson_Randolph

Does it seem like there will be any way to go backwards from the sample to the prompt?

From a safety perspective, it would be useful to see what prompt a piece of text might have been generated with...

Seems like in the near future the idea of "turing test" and the one of "not fake news" will eventually coincide...

This is so crazy good, someone needs to do a Turing test by sending it to some unsuspecting publishers.

I get the feeling that the debatepocalypse is not far away. Every forum can now be spammed with reasonable-sounding gibberish that humans will have to slog through.

A few MIT students did that with their simple text generator, SCIgen. It is one way to expose conferences with low submission standards.


That's not a Turing test. A Turing test is an ongoing dialog, not a one-shot filter.

Bias it toward creating dialogue when sampling, based on movie subtitles, and I could see it passing a proper Turing test.

A proper Turing test allows you to ask questions whose answers must indicate novel introspection, learning, and social awareness. It's absurd to think any computer system today would pass such a test.

This thing does exactly that. You can see in the text it generates, it sounds like someone who's thought about things and keeps a consistent tone about the subject.

That is not what it needs to do to pass a Turing test. It needs to sustain ongoing dialog and thought, while maintaining a consistent awareness of another human's perspective in real time. It needs to do what a human does while having a conversation. And humans don't just spit out responses to questions. They get bored. They get distracted. They display tonal inconsistencies in response to emotions.

I expanded on my comment somewhat more here: https://www.metaculus.com/questions/73/will-the-silver-turin...

Feel free to predict and comment too!

We must run a bot on HN with upvotes/downvotes for comments acting as reinforcement.

This was only a matter of time.

For the DEFCON AI Village in August I talked about the implications of this sort of tech, and how that impacts how we release "exploit" code / think about "cognitive vulnerabilities": https://medium.com/@aviv/what-does-a-world-with-automated-so....

If you are doing work in this space, either in ML research or related security, you need to be thinking about implications (also see e.g. https://maliciousaireport.com).

I mean, the ideas are out there. The scope of the project is probably too big to reproduce for now, but eventually it will be accessible to your average spammer/scammer. We will get there. We won't be able to keep these tools locked up, exclusive to a certain type of responsible AI specialist. Someone will spill the beans, the models. People with bad intentions will reproduce these results. To me, the real deal is how we will manage these outbursts when they happen.

I assume discriminative models will solve the problem for a while, but as with generative adversarial networks, you will be able to train generators that are harder to discriminate. I posit we're in for a big societal change (maybe more a content crisis) sometime in the next 10 years. Pretty sure we won't be able to keep it from falling into bad hands.

“Spam-filters, actually. Once they became self-modifying, spam-filters and spam-bots got into a war to see which could act more human, and since their failures invoked a human judgement about whether their material were convincingly human, it was like a trillion Turing-tests from which they could learn. From there came the first machine-intelligence algorithms, and then my kind.”

I, Row-Boat, Cory Doctorow, 2005: https://craphound.com/overclocked/Cory_Doctorow_-_Overclocke...

On the flip side, this will give an edge to specific groups that can communicate in highly vernacular/non-written/non-standard languages.

> you need to be thinking about implications

Thinking about it won't put food on your table.

These are not necessarily similar to "exploit" code, but more like a box of innocuous tricks often needed for good purposes. Gather enough of those, throw enough compute at them, and you go from something that a human can perceive as obviously false to something human-like.

There are plenty of people out there, with the competence to replicate those results, yet only a few big companies reap most the rewards, monopolize user data and interactions, and open-source for free results which could have been monetized by a company to provide useful services to users.

Seriously, how can one expect that most of those PhDs who didn't get recruited won't use AI for nefarious purposes?

Is this comparable to Google BERT (Bidirectional Encoder Representations from Transformers)? The benchmarks are different. Can I use any of these models for other tasks not mentioned in the papers, something more than "fine-tuning"?

In 10 years, content written by actual humans will be a premium niche, like tailored suits - reserved for the elites.

The rest of us will be force-fed with machine-generated garbage.

That doesn't make sense; text can be distributed at marginal cost. Tailored suits are expensive because there's not a lot of supply.

Tailored suits used to be cheap too, but tailors went out of business due to mass production.

There was an old science fiction novel set in a world like that. I've forgotten the name, but they called the robot-written stuff "wordwooze". (It doesn't work as a search term because some publisher is using it now.)

Added: it was The Silver Eggheads by Fritz Leiber. An odd, forgettable book itself.

But will ML companies pay link tax? Assuming ML would write/fill in forms to incorporate itself, have a conversation or two with some tax office person, and then off to the wild...

Where can I study computational law to stay ahead of the curve?

The generated Unicorn story has about the writing quality (in both senses: standard and feel) of fanfic

I'd love to see what it would produce if you fed it the first sentence of your average Nigerian Prince scam. Fully-automated phishing- you could even automatically toss in a couple details about the recipient and let the AI riff on that for a bit.

Anyone who’s done large-scale model training like this, can you shed light on the following questions:

What is the process like? Do you prototype locally? How do you gain confidence that the only limitation to good results is compute power and NOT the model architecture, or the applicability of deep learning to the particular task? At what point do you decide that shelling out many tens of thousands of dollars is OK? How often do you do large-scale training only to find unimpressive results, and hence the money wasted?

There’s a natural way to parallelize these models so that using 128 GPUs is the same as a 128x batch size. You can similarly simulate a 128x batch size by accumulating gradients over several micro-batches before stepping the optimizer. So you can test on just one or a few GPUs before you run the full thing.
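The equivalence behind the accumulation trick can be checked on a toy problem (linear least-squares with made-up sizes): because the gradient is a sum over examples, summing micro-batch gradients reproduces the full-batch gradient exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(128, 4)), rng.normal(size=128)  # 128 examples
w = rng.normal(size=4)                                  # current weights

def grad(Xb, yb, w):
    """Un-averaged gradient of sum of squared errors on a batch."""
    return 2 * Xb.T @ (Xb @ w - yb)

# Full-batch gradient (mean over all 128 examples).
full = grad(X, y, w) / len(X)

# Same thing via 8 micro-batches of 16, accumulated then averaged --
# which is why k micro-batches simulate one step at k-times the batch size.
accum = np.zeros_like(w)
for i in range(0, len(X), 16):
    accum += grad(X[i:i+16], y[i:i+16], w)
accum /= len(X)
```

In a real framework you'd call backward() on each micro-batch without zeroing gradients, then step the optimizer once; the arithmetic is the same.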

By that point you know it’s going to work, it’s just a matter of how well and whether you could’ve done nominally better with different tuning.

There’s been enough research leading up to this paper to suspect that just scaling larger would play out.


>By that point you know it’s going to work, it’s just a matter of how well and whether you could’ve done nominally better with different tuning.

This can't be true in all cases, right? I'm assuming that for many initially promising results on less-compute when they scale it, the results aren't impressive. I'm very curious to know what is the trials-to-success rate of publishable results when big-compute is thrown in the mix.

It’s indeed a very high trials to success ratio. Again though, there’s enough papers preceding this one that you could have good confidence in the effort. Another thing that helps is orgs like OpenAI have their own servers, rather than renting ec2 instances.

You also don’t just launch that many things and then ignore them. You monitor the run to make sure nothing is going terribly wrong.

But yeah, there’s also the fact that if you’re Google, throwing $2m worth of compute at something becomes worth it for some reason (e.g. StarCraft).

I doubt 1.5B params will fit any single GPU. I think they spread parts of models between GPUs/TPUs similarly to mesh-tensorflow: https://arxiv.org/abs/1811.02084

I think this would be extremely useful when we can do the inverse. Basically - can we detect if someone's writing is nonsensical or not? Can we detect if someone that is producing many well written essays is adhering to reality or not? Are they subtly re-defining terms, using flawed examples, etc?

The generated example of the biologists discovering a unicorn herd is too convincing on its own. It's only because it's so outlandish that we get the sense it's fictional.

Our models have reviewed your submission, and deemed it to be 87% incoherent, 52% redundant, and 24% fake. We therefore reject your submission. Sincerely, the Chief Bot Editor.

It's beautiful that the reduced model itself is only 175 lines of Python code, thanks to TensorFlow : https://github.com/openai/gpt-2/blob/master/src/model.py

this will definitely be reverse engineered and open-sourced.[0]

[0] https://en.wikipedia.org/wiki/Streisand_effect

It's pretty interesting that their training set consists of "outbound links from Reddit which received at least 3 karma". There are definitely large subreddits which are flooded by highly voted fake news which you don't want to emulate (unless that's the goal).

It also reminds me of a short fictional story which explores what would happen if an AI learn how to maximize reddit's sort by controversial score instead: https://slatestarcodex.com/2018/10/30/sort-by-controversial/

Maybe that dystopian story is closer to reality than we thought?

The best part is "Scroll down for video" from https://blog.openai.com/better-language-models/#sample3 :)

I wonder how this would do on the Hutter Prize (I doubt it would beat the current record but I'm curious what the result would be)


They include the enwik8 BPC estimates. It may not be strictly comparable - the most obvious issue is that since the HP takes the compression paradigm of intelligence, the compressor size is part of the total, and those 1.5b parameters certainly are not cheap.

(This is one reason I think the HP is outdated. The corpus is not big enough to allow the superior asymptotics of approaches like RNNs or Transformers to compensate for their far larger binary size. HP is not measuring progress on an intelligence metric we care about, it's sort of measuring a 'demo scene' metric of intelligence.)

I’m most impressed by its ability to answer questions about the text. Why can’t someone build something like this on top of Wikipedia? It would be amazing to be able to ask Wikipedia any question you can think of.

Google has already built this, but it definitely isn't perfect.


Not releasing the model? These people aren't scientists.

edit: toned down a bit.

Somehow, some part of me really wants to see what this model would generate with that as a prompt.

Actually, this prompted me to look up the funders of OpenAI:

- Sam Altman

- Greg Brockman

- Reid Hoffman

- Jessica Livingston

- Elon Musk

- Peter Thiel

Useful information in interpreting the mission statement, I think.

This reminds me of Dürrenmatt's "Die Physiker" (https://en.wikipedia.org/wiki/The_Physicists).

While this has indeed very scary implications, one should be aware that if it's thinkable, eventually it will be thought (I'm paraphrasing here).

Wonderful results. I don’t think I will experiment with the smaller available model, at least right now. I am still happy with BERT, especially for basically solving anaphora resolution (coreference of pronouns, etc.)

Seems like magic! I wish I could do something similar with our chat support questions and answers. It would be nice to have something like this built-in.

Nobody has asked about the animated text on the left at the top -- how is that done? That is more interesting to me!

It's an mp4 video. But then the next question is: how did they make that video? And the answer to that is... I have no idea.

Plot twist: all the comments on this thread were auto-generated.

Can anyone do an eli5 of how this works?
