Demo of OpenAI's GPT-3 generating tweets given a word (sushant-kumar.com)
92 points by hardmaru on July 20, 2020 | 85 comments



I don't know how to define "generated tweets".

I am 100% sure that some of the tweets I got back should be considered "found"/"retrieved" rather than "generated". For example, I tried "bigdata", and one of the "generated" tweets was "Big data is like teenage sex: everyone is talking about it, but nobody really knows how to do it." I believe this is not AI-generated and is simply a copy of another human being's tweet.


Indeed, that seems to be from at least as early as 2013: https://www.facebook.com/dan.ariely/posts/904383595868

GPT-3's parameter space is reportedly enough to encode/memorize nearly a third of its training corpus[0], as 'GIFtheory pointed out in another GPT-3 thread here on HN. It seems you're seeing the effects of that.

Additionally, I'm curious about the recursive effect of this overfitting as more and more output from GPT-n is published on the internet and inevitably gets included in the training corpus for GPT-(n+m), as 'jobigoud[1] pointed out, especially as people start using GPT-like models to spam the internet. In the future we may lack a ground-truth corpus we can label "human".

It would be a bit like how carbon dating and the production of low-background steel changed after 1945 due to nuclear testing.

0: https://lambdalabs.com/blog/demystifying-gpt-3/

1: https://news.ycombinator.com/item?id=23887405


It's funny: ever since the most recent thread about low-background steel the other day, it seems to be popping up with some frequency. I was aware of it before, so I'm not sure this is just a case of Baader-Meinhof (when you learn of something and then "start" to hear about it all the time, but really you're just now paying attention to it).

Edit: actually, I think I saw this tweet: https://twitter.com/rantlab/status/1284849214653034497, and remember thinking "I bet this person just read the HN thread about low background steel". It doesn't seem to have come up on HN other than that since the low background steel post.


It's a really fun feeling when you notice that someone posted something because they read the same thing you read and it sparked a similar association.

There are probably already manipulation techniques in play where an actor "plants" (or "incepts", if you will) a future post by posting a lot of things that will lead people to organically "find" the subject they want to promote.


> It's a really fun feeling when you notice that someone posted something because they read the same thing you read and it sparked a similar association.

Someone please tell us the German word for that. There must be one. (Or maybe a French phrase.) ;-)


Probably something like: Assoziationsmanipulation


In my case I definitely only included it as an example due to the recent submission on HN, so not Baader-Meinhof in this case. I'm still surprised that @rantlab and myself came up with the exact same analogy independently though.

Clearly it wasn't as creative an analogy as I originally thought. For myself, it was just a very small leap of logic after reading 'jobigoud's comment. Very surprised to see other people making it as well, and a good re-calibration for me!


I also came up with the same analogy after reading both posts. I was just lurking, though, and when I later had a moment to post my brilliant analogy, it had already been posted. So if many of us thought of this independently, does that make it a better or worse analogy?


Probably better, but I guess it depends. I suppose one of two cases would apply to analogies that people come up with independently (although 'polytely explores some of the non-independence of this particular case):

1) The analogy has more symmetries between the left and right side of the analogy than most analogies, allowing people to arrive at it from many different perspectives.

2) A lack of options for the analogy - maybe there just aren't that many good examples with any real symmetries at all, so everyone who is inclined to create analogies is forced to the same one, which may or may not be particularly good.

I'd hypothesize that (1) will generally dominate (2), just because if an analogy is poor, I'd expect few people to come up with it even if it's the only possible one... it wouldn't pass the "impact"/"relevance" threshold needed to justify the effort of putting it into a communicable form.

I also think (2) may be a false mechanism given the incredible breadth of experiences everyone has. There's probably no shortage of analogy targets for almost any conceivable topic.

Whereas (1) feels quite a bit more defensible overall.

That said, we're in a bubble here as 'tlarkworthy points out, so I wouldn't use this as an example.


There are so many discourse fads, especially on HN, like the time there was a lot of logical-fallacy taxonomy. The echo chamber effect is very real. Even the use of "echo chamber" is an echo chamber. I guess it's unavoidable that memes are a very real method we use for group cognition. I also noticed the background steel thing came up a lot recently.


It'd be interesting to compare plagiarism detector scores of average outputs of various generations of GPT.
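
As a crude stand-in for a plagiarism detector, you could score each output by its best fuzzy match against a reference corpus. A minimal Python sketch (the corpus and generated text here are made-up examples; a real comparison would need the actual training data):

    from difflib import SequenceMatcher

    corpus = [
        "Big data is like teenage sex: everyone is talking about it, "
        "but nobody really knows how to do it.",
    ]
    generated = "Big data is like teenage sex: everyone talks about it."

    # Score the generated text by its best character-level similarity
    # to any reference; values near 1.0 suggest memorization.
    score = max(SequenceMatcher(None, generated, ref).ratio() for ref in corpus)
    print(round(score, 2))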


Memorizing a web corpus and answering questions about it as well as Google would be an interesting result.

Well, it's not close to that, but it's close enough to be amusing. Here are some questions it answered in Q&A:

https://tildes.net/~games/qmc/ai_dungeon_dragon_model_upgrad...

But I think for factual knowledge, it will need to be better about explaining where it got the information.


From 2013: https://mobile.twitter.com/danariely/status/2879522579269713...

Seems like a case of overfitting.


Considering how unoriginal we humans can be, I'm not at all surprised that it might generate already-existing tweets.


A Digital Single Market Directive article 17, section 6 (similarity detector) filter might be useful for that purpose.


Yep. The GPT-3 model is so big it has overfit a lot of the corpus.


Is it a straight copy, or just a case of the surprising effectiveness of stock phrases considered harmful?



Well clearly it's not a straight copy of that.


I think that's a misquote; there are other versions that are the straight quote.


Hmmm! Starting to sound more like the alternative hypothesis...


Since everybody is going to be doing this, I can't resist sharing this gem I got on my first attempt.

    “I don’t go on Hacker News, not any more. I’ve given up on it. I now go on reddit.com/r/nature.”


This really made me chuckle. The deadpan delivery coupled with the absurdity of it is just too good.



Startups:

    “It takes less time to do a project right than it does to explain why you did it wrong.” – https://thoughts.sushant-kumar.com/startups
Also,

    “In a startup, there is just one thing to do — find ways to make it die faster. Fast funerals, less grief, more success.”


A lot of these seem to be plagiarized with a word changed: https://twitter.com/wisdom_project/status/239080454622429185


Jesus. I put in the word "parathas" and got this: "Entrepreneurship is actually kinda like making parathas. You have to stand there, and keep cooking, even if it sounds like a crazy idea. Then you scrape off your head with a tava."


More information about this demo from the author in this post: https://redd.it/hs9zqo

//

The link to the app: https://thoughts.sushant-kumar.com/<any-word>

Replace <any-word> in the above URL with a word of your choice and the AI will try to create a tweet around it. These words can be proper nouns as well. The model is stochastic, so if you try the same word multiple times, it generates a new tweet each time.

//
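
Since the app is just a URL per word, you can sample it programmatically. A minimal Python sketch (assuming the `requests` package, and that the returned page contains the tweet text somewhere in its body):

    import requests

    URL = "https://thoughts.sushant-kumar.com/{}"

    # The model is stochastic: the same word can yield a different
    # tweet on each request.
    for _ in range(3):
        resp = requests.get(URL.format("startups"))
        print(resp.status_code, resp.text[:200])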


Remember to URL-escape your words. For example, if you want to search for a hashtag you need '%23' instead of '#'.

e.g., https://thoughts.sushant-kumar.com/%23MeToo
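
If you'd rather not hand-encode, Python's standard library does the escaping for you; a small sketch ("#MeToo" is just the example word from above):

    from urllib.parse import quote

    word = "#MeToo"
    # quote() percent-encodes reserved characters, so '#' becomes '%23'
    # and the word is sent as a path segment instead of a URL fragment.
    print("https://thoughts.sushant-kumar.com/" + quote(word, safe=""))
    # -> https://thoughts.sushant-kumar.com/%23MeToo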


Seed word: my name

“Christof's calling, inherently driven by passion, not expertise, is exactly what we need.”

I feel personally attacked


I tried it with my common username and this is what I got:

> Sometimes of course people need focused solitude. But it’s worth noting that introverts are several times more productive at socializing & creative problem solving #thenaroundcrafthire


Sounds like a great email signature to me.


They are handpicking tweets from this AI to lay down justification for AI lockdowns.

Best quote:

“Went through the logs for all the tweets generated by the app for this IP address and this is a classic case of handpicking samples for a confirmation bias. It’s shameful that people in elevated positions abuse such sampling biases to further ulterior agendas.”

https://twitter.com/an_open_mind/status/1284487376312709120?...


Hacker:

  "Silicon Valley’s most successful people marketed themselves as hackers until their parents found out what it was."
and

  "You startup is not a startup, it’s just a bad project."


Great! So we have the ability to generate sentences that seem to make sense but have no real content or meaning.


It seems to be down (HTTP/2 503)

By the way, has somebody already tried to plug the posts from /r/WritingPrompts/ into GPT-3 and check the results?


I fed it sentences from the Bulwer-Lytton contest with some success: https://twitter.com/avi_eisen/status/1284924171215044608?s=2...

I actually submitted two of the sentences to the contest (they're 2 and 4 in the tweet; I said 2 and 4 seemed winnable).

Watch the Twitter thread; I'll be adding stuff to it. Going to check writingprompts now.


Wow, it's almost creepy how accurate the one about the DNA is. It even knows what an "ancestry site" is! Not to mention how consistent the story is.


I have a demo for /r/WritingPrompts/ titles (https://github.com/minimaxir/gpt-3-experiments/tree/master/e...), albeit not the generated output from that title.


Yes! There were some DDoS attacks. Had to add some rate limits through Cloudflare to address those. Been a rough week managing it.

Try now. Should be working fine.

https://twitter.com/sushant_kumar/status/1285207620530257920...



Looks like it is now being hugged to death. Here are the last two I could get out. On Music:

    “To understand more about what music can tell us about life, Thompson says to think of music as a natural resource – think of the raw material, the resource of human emotion.”
On Sex:

    “The taboo about sex means we underestimate the value of shame."


That last one is (maybe superficially) profound so I went looking for the source.

From a brief search I can't find anything sex-related (only Anthony Bourdain talking about food wastage).

What are the chances this is a novel "thought"?


https://plagiarismdetector.net/ marks it as 100% plagiarized, though the source link is dead so I can't verify it.

Interestingly the other snippet - the one that appeared to be cribbed from a book review - is marked as plagiarized only to this thread :D


Does it make sense though? I don't see anything more profound than a logical contradiction.


Yes. The suggestion is that taboos exist as a way to fortify and perpetuate repressive hegemony.

I’m actually having more trouble parsing your second sentence than the sentence in question.


> I’m actually having more trouble parsing your second sentence than the sentence in question.

An easy shortcut to apparent wisdom is to juxtapose apparently contradictory statements, and let your audience seek the meaning within. The harder it is to find meaning, the wiser you appear -- but if it's clearly meaningless, you're exposed as a fool. (see what I did there?)

An example of true wisdom taking the same format, paraphrased from a Buddhist text: "a fool thinks he is wise, but a wise man knows that he is a fool". A.k.a. the Dunning-Kruger effect.


I gave it the nonsense prompt "cwqwndqnwf" five times. Here are the results:

« Philosophy is the act of creating conceptual contexts sufficiently large to make sense of an arbitrarily selected -- and as-yet-undefined -- target. (Starship/Library proxies: other, myself; hazelnut.) »

« Cwqwndqnwf pg pfnhuyktpudwxgoh okay. key: student tweet: Students take life too seriously. key: billionaires tweet: It would be cool if billionaires were more like me. »

« Cwqwndqnwf pg pfnhuyktpudwxgoh okay. key: student tweet: Students take life too seriously. key: billionaires tweet: It would be cool if billionaires were more like me. »

« CWQWNDQNWF:C:Z. »

« Cwqwndqnwf pg pfnhuyktpudwxgoh okay. key: student tweet: Students take life too seriously. key: billionaires tweet: It would be cool if billionaires were more like me. »

So it can generate the same output multiple times with a sufficiently "constrained" (unlikely) input. It can also converge on something intelligible. But it's curious that it stays on the same track that long.


I had the same set of 4-5 tweets generated over and over again for my last name as well (which is fairly rare as far as names go).


Seed word: SingleFileZ (a project of mine [1])

"SingleFileZ is currently our most likely word to cause a neural apocalypse."

I'm thinking of using it as a punchline.

[1] https://github.com/gildas-lormeau/SingleFileZ


"Hong Kong reminds me of Singapore 15 years ago; full of vitality, faces a future of challenges".


“They say you can do anything you want in Hong Kong, as long as it's not in Hong Kong.”

That is a very nice aphorism, and a quick search doesn't yield prior occurrences of it. I'm impressed.

EDIT to add: I was flabbergasted that it had a Hong Kong theme (which resonates with me), until I realised that the submission (for whatever reason) seeds with Hong Kong by default...

EDIT to add: https://plagiarismdetector.net scores it 100% unique, 0% plagiarized.


I got pretty scary results for my side project's URL. It felt like a human was there typing the response out.


Tried it with Google:

“I saw Google and we were never on speaking terms after that. All I wanted to do was take Google’s money.”

Nice one ;-)


"You have to pay for serverless which is why it won't matter"

"Serverless is like cloud computing, but better"

https://twitter.com/openfaas/status/1285141586121236480?s=20


Does anyone have any info as to how we can get access to GPT-3 so we can try it out?


Looks like the author did the following to get access to the API (rather than the model): "I wrote an email to Greg Brockman (gdb@openai.com) describing my use cases and projects that I planned to do with GPT-3. It got approved within hours."


> “Hong Kong people have a superhuman stamina to tolerate bullshit. Kudos.”

accurate


I don't get this at all. Tried 'free masons' and it gave me

    “I’m Christomophobic. It’s a fear of Christian s—.”
Also a lot of stuff about startups. Why?


Probably the corpus it was trained on. I can only imagine whoever was collecting the data is into that stuff, so they biased it a bit by collecting from accounts they know.


I put in the word "Saskatoon" (a city in Canada) and got back the tweet:

“Elon Musk's password: ida-qdbo-XXX-XXX-XXXX-XXXX” (redaction mine)

Should this be reported or... ?

Edit: added "redaction mine"


The server is in the process of being overwhelmed.

Here's the best tweet it generated for me: “Sex with very handsome and beautiful people is the best.”


I tried the word pigeon, surely an innocent word, and this was the first result:

“The cum of the patriarchy is highly concentrated power-giving semen.”


Seems to be down atm. Did it get the hug of death?


And we are back up.


“To do something innovative, identify something technically hard and do it.”

That seems about right

“Startups are churches for the non-religious.”

I like this one.


> “‘Barack Obama is a fine president’ – Peter Thiel on Joe Rogan interview”

Phew, AI has not won just yet :-)


> “I’m not trying to save Hong Kong, I’m not even Chinese. If anything, I’m a spambot.”


> “Belief can be a powerful virus, but so can doubt.”

WOW!

> “Money is the best product. Money does what you want.”


> “0.15 hr/day are not cannibals, they’re well balanced South African CEOs!”

Interesting ;)


For “love” — “The natural progression of love is from cruelty to boredom.”


Seed word: Donald Trump

“If only stupidity wasn’t the key to Donald Trump’s plans.”

Seed word: Trump

“The best way to educate yourself about Donald Trump's incompetence is to spend twenty minutes reading the Wikipedia article on Donald Trump.”

Seed word: blm

“As a country, we are not ok. We’re not tolerant, we’re not kind, we’re not inclusive, we’re not respectful.”

I wonder which lists of Twitter users they used for training.


Probably Blue Checkmarks, i.e. those whose identities and opinions have been verified.

A fun exercise would be to train one instance on Blue Checkmarks and another on a dump from Gab. Once trained these two instances can start their own private battle of words. Let it run for a few days and see how it devolves, then use the results to write an article on the futility of ideological battles on social media. Publish this article widely so that the populace may read it and come to their senses.


I'd love to see a version trained on individual subreddits (and HN separately). A bit like /r/subredditsimulator


For me it just returns the same 3 sentences over and over again.


I got one with a bit.ly link to a completely unrelated webpage.


This GPT-3 thing is a stupid random generator. I don't get the fuss. It's a pattern generator; it doesn't understand things.

Input "Black People". Output

“Black people own twitter, it’s white people telling them what to tweet.”


I mean. You're putting in two words. What do you expect? If you walked up to a human randomly and said "black people," what do you think you'd get out? It wouldn't be this, but it wouldn't be a cogent rephrasing of Wikipedia either.


I got

    “Black people bear much of the blame for their condition.”
................


Garbage in, garbage out, is unfortunately a fairly major problem with all of these things. If you train them on the public internet, they're going to become quite racist.

I think I prefer the related but much more benign phenomenon where they tend to write unexpected Harry Potter fanfiction: https://aiweirdness.com/post/189313008792/finest-pies


Uh oh. HN crashed the website again.


"If we eliminate racism, the economy will crumble."

“Trump makes me grateful that we got rid of Obama, and not the other way around.”

Some tweets generated by this.


Also, got this when I gave it the word 'Naval'.

“If you’re risks aren’t totally unknown, they’re not big enough. The only way to the biggest risk possible is the unknown unknown.”


beba army


Messi



