
“AI-generated” content is sometimes just unaltered training data - weswpg
https://twitter.com/eevee/status/1298932940697550848
======
weswpg
> it appears this was "produced" by an AI, but it also exists verbatim
> elsewhere as written by a human, so it was probably picked up in the
> training data and then spat back out unaltered

> so if it decides to start with A, and it's only ever seen A next to B, and
> it's only ever seen B next to C, and it's only ever seen C next to D... then
> you might get out ABCDEFG and go "wow amazing it came up with the alphabet
> all by itself"

> but no, it didn't do that. it saw someone else do that and it couldn't
> figure out anything else to do. it's not a sign of intelligence; it's
> exactly the opposite. and the messier your training data is, the fewer
> similarities within it, the more likely this is to happen

> yeah you can get an essay out of that one text generator. cool. but do keep
> in mind 1. it had //millions// of pages of input from formal sources like
> newspapers and encyclopedias, all written to a similar style. 2. how much of
> its output is truly original? have you checked?
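
One rough way to act on the "have you checked?" question is to look for the longest run of words that a generated text shares verbatim with a known source. The sketch below is only an illustration: `longest_shared_run` and the sample strings are made up for this example, it compares whitespace-separated words exactly, and a real check would need the actual training corpus and far more efficient indexing.

```python
def longest_shared_run(generated: str, source: str) -> list[str]:
    """Return the longest run of consecutive words found in both texts.

    Hypothetical helper for illustration: a word-level longest common
    substring, computed with a simple dynamic-programming table.
    """
    g = generated.split()
    s = source.split()
    best_len, best_end = 0, 0
    prev = [0] * (len(s) + 1)  # run lengths ending at the previous word of g
    for i in range(1, len(g) + 1):
        cur = [0] * (len(s) + 1)
        for j in range(1, len(s) + 1):
            if g[i - 1] == s[j - 1]:
                # Extend the run that ended at the previous pair of words.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return g[best_end - best_len:best_end]


# Made-up sample texts, not taken from any real model or corpus.
source_text = "the quick brown fox jumps over the lazy dog"
generated_text = "a quick brown fox jumps high"
shared = longest_shared_run(generated_text, source_text)
print(len(shared), " ".join(shared))
```

A long shared run relative to the length of the output suggests the text was copied rather than composed, though short overlaps are expected in any natural language.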

> the funny thing is that we already saw a lot of this stuff happen 20+ years
> ago with markov chain bots in irc. sometimes the bot would say something
> remarkably poignant and, oh, no, it's basically repeating something a human
> said earlier because it started with a rare word
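
The "only ever seen A next to B" scenario can be reproduced with a toy first-order Markov chain of the kind those IRC bots used. This is a minimal sketch, not any particular bot's code: `build_chain`, `generate`, and the alphabet "training data" are assumptions chosen for illustration. When every token has exactly one observed successor, generation has no choices to make and replays the input verbatim.

```python
import random
from collections import defaultdict


def build_chain(tokens):
    """Map each token to the list of tokens observed right after it."""
    chain = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, max_len=20, rng=None):
    """Walk the chain from `start`, picking a random observed successor."""
    rng = rng or random.Random(0)
    out = [start]
    # Stop at max_len or when the last token was never seen with a successor.
    while len(out) < max_len and out[-1] in chain:
        out.append(rng.choice(chain[out[-1]]))
    return out


# Toy training data: each letter is only ever followed by one other letter.
training = list("ABCDEFG")
chain = build_chain(training)
result = "".join(generate(chain, "A"))
print(result)  # ABCDEFG: the training data, replayed unaltered
```

The same thing happens in a larger corpus whenever generation starts from a rare word: the rare word has one successor, which may itself have one successor, and so on until the chain reaches a common word.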

------
MrStonedOne
that one copyright case everybody in AI is depending on to protect them is not
likely to be as protective as they are assuming.

