I am 100% sure that some of the tweets I got back should be considered "found"/"searched" rather than "generated". For example, I tried "bigdata", and one of the "generated" tweets was "Big data is like teenage sex: everyone is talking about it, but nobody really knows how to do it." I believe this is not AI-generated but simply a copy of another human's tweet.
GPT-3's parameter space is reportedly enough to encode/memorize nearly 1/3rd of its training corpus, as 'GIFtheory pointed out in another GPT-3 thread here on HN. It seems you're seeing the effects of that.
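A rough back-of-envelope makes the memorization claim at least plausible. Note the figures below are the commonly cited public estimates (175B parameters, ~570 GB of filtered training text), and the "1/3" number is 'GIFtheory's claim, not something derived here:

```python
# Back-of-envelope: is GPT-3's parameter storage in the same ballpark
# as its training corpus? (Both figures are public estimates, not exact.)

params = 175e9        # reported GPT-3 parameter count
bytes_per_param = 2   # assuming fp16 storage
corpus_gb = 570       # commonly cited filtered-corpus size, in GB

capacity_gb = params * bytes_per_param / 1e9
ratio = capacity_gb / corpus_gb

print(f"parameter storage: {capacity_gb:.0f} GB")   # 350 GB
print(f"capacity / corpus: {ratio:.2f}")            # ~0.61
```

Parameters aren't a literal copy of the text, of course, but the orders of magnitude alone make substantial memorization unsurprising.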
Additionally, I'm curious about the recursive effect of this over-fitting as more and more output from GPT-n is published on the internet and inevitably gets included in the training corpus for GPT-(n+m), as 'jobigoud pointed out, especially as people start using GPT-like models to spam the internet. In the future we may lack a ground-truth corpus that we can label "human".
It would be a bit like how carbon dating, or the production of low-background steel, changed after 1945 due to nuclear testing.
Edit: actually, I think I saw this tweet: https://twitter.com/rantlab/status/1284849214653034497, and remember thinking "I bet this person just read the HN thread about low-background steel". The topic doesn't seem to have come up on HN since the low-background steel post other than that.
There are probably already manipulation techniques in play where an actor 'plants' (or 'incepts', if you will) a future post by posting a lot of things that will lead people to organically 'find' the subject they want to promote.
Someone please tell us the German word for that. There must be one. (Or maybe a French phrase.) ;-)
Clearly it wasn't as creative an analogy as I originally thought. For me it was just a very small leap of logic after reading 'jobigoud's comment. Very surprised to see other people making it as well, and a good re-calibration for me!
1) The analogy has more symmetries between its left and right sides than most analogies do, allowing people to arrive at it from many different perspectives.
2) A lack of options for the analogy: maybe there just aren't that many good examples with any real symmetries at all, so everyone who is inclined to create analogies is forced to the same one, which may or may not be particularly good.
I'd hypothesize that overall (1) will generally dominate (2), simply because if an analogy is poor, I'd expect few people to come up with it even if it's the only possible analogy; it wouldn't pass the "impact"/"relevance" threshold needed to justify the effort of converting it into a tangible form for communication.
I also think (2) may be a false mechanism, given the incredible breadth of experiences everyone has. There is probably no shortage of analogy targets for almost any conceivable topic.
Whereas (1) feels quite a bit more defensible overall.
That said, we're in a bubble here as 'tlarkworthy points out, so I wouldn't use this as an example.
Well, it's not close to that, but it's close enough to be amusing. Here are some questions it answered in Q&A:
But I think for factual knowledge, it will need to be better about explaining where it got the information.
Seems like a case of overfitting.
“I don’t go on Hacker News, not any more. I’ve given up on it. I now go on reddit.com/r/nature.”
“It takes less time to do a project right than it does to explain why you did it wrong.” – https://thoughts.sushant-kumar.com/startups
“In a startup, there is just one thing to do — find ways to make it die faster. Fast funerals, less grief, more success.”
The link to app: https://thoughts.sushant-kumar.com/<any-word>
Replace <any-word> in the above URL with a word of your choice and the AI will try to create a tweet around it. These words can be proper nouns as well. The model is stochastic, so if you try the same word multiple times, it generates a new tweet each time.
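For scripting this, a minimal sketch. The only documented interface is the URL pattern itself; the helper name and everything else here are assumptions:

```python
from urllib.parse import quote

BASE = "https://thoughts.sushant-kumar.com"

def thought_url(word: str) -> str:
    """Build the app URL for a seed word, percent-encoding as needed."""
    return f"{BASE}/{quote(word)}"

# Repeated requests to the same URL may return different tweets,
# since the model samples stochastically.
print(thought_url("bigdata"))    # https://thoughts.sushant-kumar.com/bigdata
print(thought_url("big data"))   # https://thoughts.sushant-kumar.com/big%20data
```

The response is a rendered HTML page, so extracting the tweet text itself would take some scraping on top of this.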
“Christof's calling, inherently driven by passion, not expertise, is exactly what we need.”
I feel personally attacked
> Sometimes of course people need focused solitude. But it’s worth noting that introverts are several times more productive at socializing & creative problem solving #thenaroundcrafthire
“Went through the logs for all the tweets generated by the app for this IP address and this is a classic case of handpicking samples for a confirmation bias. It’s shameful that people in elevated positions abuse such sampling biases to further ulterior agendas.”
"Silicon Valley’s most successful people marketed themselves as hackers until their parents found out what it was."
"You startup is not a startup, it’s just a bad project."
By the way, has somebody already tried to plug the posts from /r/WritingPrompts/ into GPT-3 and check the results?
I actually submitted two of the sentences to the contest (they're 2 and 4 in the tweet; I said 2 and 4 seemed winnable).
Watch the Twitter thread; I'll be adding stuff to it. Going to check writingprompts now.
Try now. Should be working fine.
“To understand more about what music can tell us about life, Thompson says to think of music as a natural resource – think of the raw material, the resource of human emotion.”
“The taboo about sex means we underestimate the value of shame."
From a brief search I can't find anything sex-related (only Anthony Bourdain talking about food wastage).
What are the chances this is a novel "thought"?
Interestingly the other snippet - the one that appeared to be cribbed from a book review - is marked as plagiarized only to this thread :D
I’m actually having more trouble parsing your second sentence than the sentence in question.
An easy shortcut to apparent wisdom is to juxtapose apparently contradictory statements, and let your audience seek the meaning within. The harder it is to find meaning, the wiser you appear -- but if it's clearly meaningless, you're exposed as a fool. (see what I did there?)
An example of true wisdom in the same format, paraphrased from a Buddhist text: "a fool thinks he is wise, but a wise man knows that he is a fool". A.k.a. the Dunning-Kruger effect.
« Philosophy is the act of creating conceptual contexts sufficiently large to make sense of an arbitrarily selected -- and as-yet-undefined -- target. (Starship/Library proxies: other, myself; hazelnut.) »
« Cwqwndqnwf pg pfnhuyktpudwxgoh okay. key: student tweet: Students take life too seriously. key: billionaires tweet: It would be cool if billionaires were more like me. »
« CWQWNDQNWF:C:Z. »
So it can generate the same output multiple times given a sufficiently "constrained" (unlikely) input. It can also converge on something intelligible. But it's curious that it stays on the same track for that long.
"SingleFileZ is currently our most likely word to cause a neural apocalypse."
I'm thinking of using it as a punchline.
That is a very nice aphorism, and a quick search doesn't yield prior occurrences of it. I'm impressed.
EDIT to add: I was flabbergasted that it had a Hong Kong theme (which resonates with me), until I realised that the submission (for whatever reason) seeds with Hong Kong by default...
EDIT to add: https://plagiarismdetector.net rates it 100% unique, 0% plagiarized.
“I saw Google and we were never on speaking terms after that. All I wanted to do was take Google’s money.”
Nice one ;-)
"Serverless is like cloud computing, but better"
“I’m Christomophobic. It’s a fear of Christian s—.”
“Elon Musk's password: ida-qdbo-XXX-XXX-XXXX-XXXX” (redaction mine)
Should this be reported or... ?
Edit: added "redaction mine"
Here's the best tweet it generated for me: “Sex with very handsome and beautiful people is the best.”
“The cum of the patriarchy is highly concentrated power-giving semen.”
That seems about right
“Startups are churches for the non-religious.”
I like this one.
Phew, AI has not won just yet :-)
> “Money is the best product. Money does what you want.”
“If only stupidity wasn’t the key to Donald Trump’s plans.”
Seed word: Trump
“The best way to educate yourself about Donald Trump's incompetence is to spend twenty minutes reading the Wikipedia article on Donald Trump.”
Seed word: blm
“As a country, we are not ok. We’re not tolerant, we’re not kind, we’re not inclusive, we’re not respectful.”
I wonder which lists of twitter users they used for training.
A fun exercise would be to train one instance on Blue Checkmarks and another on a dump from Gab. Once trained these two instances can start their own private battle of words. Let it run for a few days and see how it devolves, then use the results to write an article on the futility of ideological battles on social media. Publish this article widely so that the populace may read it and come to their senses.
Input "Black People". Output
“Black people own twitter, it’s white people telling them what to tweet.”
“Black people bear much of the blame for their condition.”
I think I prefer the related but much more benign phenomenon where they tend to write unexpected Harry Potter fanfiction: https://aiweirdness.com/post/189313008792/finest-pies
“Trump makes me grateful that we got rid of Obama, and not the other way around.”
Some tweets generated by this.
“If you’re risks aren’t totally unknown, they’re not big enough. The only way to the biggest risk possible is the unknown unknown.”