
Bot Submissions to Comment Website Can’t Be Distinguished from Human Submissions - ilamont
https://techscience.org/a/2019121801/
======
buboard
This isn't exactly bot submissions, and the process is not really scalable:

> To quickly weed out inappropriate comments, I handpick from generated
> comments those that ensure a high coherence and high relevance sample for
> submission.

So basically it's a validation of GPT-2 making sense with small amounts of
text. Judging from the demo test page, they are pretty good texts, but he said
himself that larger texts betray the bot. So I'm not sure what he's trying
to prove by using MTurkers, since this does not attack the problem mentioned
in his introduction: the fake FCC comments were weeded out through text
analysis, not via human work.

In all, I'm not sure this is something that people didn't know about GPT-2.
The title is certainly not justified; perhaps "Curated bot comments can't be
distinguished by humans to be obviously fake" would be better, but also more
banal.

~~~
notahacker
I'm also wondering whether the 'handpick[ed]...to ensure a high coherence and
high relevance' GPT-2 comments actually outperform the comparatively trivial
sentence-spinning script in getting approved by MTurkers.

Think
https://www.reddit.com/r/SubSimulatorGPT2/
is more impressive than a study where half of GPT-2 comments handpicked for
being human-like by one human were accepted by another human. Particularly
given that some of the comments in question were three or four words long...

------
inciampati
Taking the test linked from the article, I got 75% correct. Most of my
failures were for very short (3-5 word) sentences which could simply be
memorized from existing texts. Human responses were largely more coherent,
keeping on topic through multiple sentences, while bot ones didn't.

It's a very good idea to make sure submissions come from humans, but it's also
slightly overstated and alarmist to state that model-generated text is
reliably passing the Turing test.

~~~
ses1984
There is no single Turing test. A bot might fool me but not you. You don't
have to fool everyone to have an impact. Also not everyone reads everything as
though they are in the middle of a Turing test.

~~~
sharemywin
Priming does make a difference.

~~~
ses1984
Are we at the stage where a comments section needs an anti-bot primer
plastered over the top?

------
romaaeterna
I scored 75%. I got 4 out of 7 wrong at the beginning, and then 1 out of 13
wrong for the rest. Once I understood the context, this got much easier. On
the other hand, adding just a little more crazy misspelled exclamation point
passion to the bot versions would make them much harder to tell apart from the
real thing.

------
OceanSunfish
This is tangential, but I wonder if the problem with "bot commenting" isn't
inherent to forum-style discussion. When our only channel for
analysis/criticism is ephemeral comment sections, we lose the ability to
compare related discussions of a topic over time, or to rectify disagreements
that occur across several threads, or even across different sub-trees of
discussion.

Compare this to a wiki-style website, where all angles of the argument can be
collected into one place to make a cohesive comparative overview. As forum
users, we are left stranded in noisy content, and we rely on making heuristic
judgements based on popularity of certain opinions and stubbornness of certain
commenters. Bots make easy work of exploiting these flawed heuristics.

~~~
OGWhales
I agree. Sites like Reddit, with visible comment voting, can quickly turn
certain communities into echo chambers. Disagreement with the "hive mind" can
be punishing, discouraging debate and encouraging the melding of opinions.
Secondly, seeing how others voted influences others to vote the same, causing
a snowball effect. I prefer hidden vote count and the inability to downvote
someone. I believe this makes an important difference when a reader is
determining their opinion about comments, even if the comments are still
organized by highest vote.

All media has its flaws, and I still prefer to check forums for the greatest
diversity of opinions. Strangely, I have noticed an unintuitive aspect of
forums: smaller forums appear to have a greater diversity in opinion than
larger ones.

~~~
forgetfulusr
I agree with the vote counting and I am curious, is there a forum which only
shows comments after you have made a comment on the article itself? As long as
empty comments and "this" aren't allowed, it can be a good filter for on
topic, more organic discussion.

------
thrower123
Depending on how wide the audience of the comment section is, there's no way
you could determine what is a bot from what is a barely literate real person.
If I go on Facebook, or onto the reply threads of the local newspaper, where I
know the people personally, it's such disjointed ranting and jumbled nonsense
that it really looks like it came out of a Markov generator. Probably the
biggest tells for a real person would be the bizarre lack of attention to
spelling and grammar and willy-nilly capitalization and punctuation - although
if I got the same message as an email, it'd be an immediate spam flag.
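
For anyone who hasn't seen one, here is a minimal sketch of the kind of Markov generator meant above, assuming a plain word-level chain. The function names, the `order` parameter and the toy corpus are illustrative only, not taken from the article or from any particular tool:

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=12, seed=None):
    """Walk the chain from a random starting state, emitting up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a fixed seed is reproducible
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: no word ever followed this state
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


# Toy corpus (made up for illustration); real comment text would go here.
corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus)
print(generate(chain, length=8, seed=1))
```

Every adjacent pair of words in the output occurs somewhere in the corpus, but nothing ties the sentence together beyond that, which is roughly why the result reads as locally plausible and globally jumbled.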

------
cmdshiftf4
Is everything automatically generated with models now going to be classified
as "Deepfake x"?

A lot of news is now generated by bots. Bloomberg itself has 30% of its
content almost entirely generated [0], so does that render said news "Deepfake
news"?

Or is it only when we're attempting to be alarmist?

[0] https://www.nytimes.com/2019/02/05/business/media/artificial-intelligence-journalism-robots.html

~~~
Retric
If Bloomberg marked their bot generated content as bot generated, more people
would skip it. As it is, their willingness to publish it unmarked is why I
don’t read their content.

~~~
cmdshiftf4
> If Bloomberg marked their bot generated content as bot generated more people
> would skip it.

What would make you believe that? If the content generated is of decent
quality, timely and accurate (as of the time), why would people skip it?

~~~
Retric
Assuming it’s of decent quality, timely and accurate is a huge stretch. The
fact it’s not going to be those things is the issue.

~~~
OGWhales
Seems like they just break down financial reports and the like, which seems
rather trivial. I don't read Bloomberg, so I may be mistaken, but that seems
like an alright use case for automated articles.

------
darepublic
The bot submission used as an example doesn't make a strong argument; it
basically uses the senseless political one-liners that usually get bandied
about, which shows just how vacuous and useless they always were.

------
teddyuk
I would love it if I could tell a bot the outline of an article and the main
facts and then it filled in the rest, that would be awesome.

It doesn’t have to be a human or bot but a human and bot together :)

