

An algorithm that can spot and kill Twitterbots before they start spamming - bbr
http://gigaom.com/2013/08/23/meet-the-algorithm-that-can-spot-and-kill-twitterbots-before-they-ever-start-spamming/

======
mistercow
The problem with looking at registration to identify bots is that it's easy to
counteract. Right now, bots use simple methods for generating names, and fill
out the form instantly, because there's no incentive not to. Neither of those
factors is difficult to change.
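
A minimal sketch of the form-timing signal described above. It assumes the server records when the signup form was served and when it was submitted. The 5-second cutoff and the names `Registration` and `filled_too_fast` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical cutoff chosen for illustration; the paper's real timing
# features and thresholds are not given in this thread.
MIN_HUMAN_FILL_SECONDS = 5.0


@dataclass
class Registration:
    form_loaded_at: float  # Unix timestamp when the signup form was served
    submitted_at: float    # Unix timestamp when the form was submitted


def filled_too_fast(reg: Registration, threshold: float = MIN_HUMAN_FILL_SECONDS) -> bool:
    """Return True if the form was completed faster than a human plausibly could."""
    return (reg.submitted_at - reg.form_loaded_at) < threshold


print(filled_too_fast(Registration(1000.0, 1000.4)))  # True: submitted in 0.4 s
print(filled_too_fast(Registration(1000.0, 1042.0)))  # False: submitted in 42 s
```

As the comment says, a bot defeats this simply by sleeping before it submits, so a check like this only works while bot authors have no reason to bother.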

This article also commits the cardinal sin of telling us the false positive
rate without also telling us the false negative rate, or the base rate. You
can't tell much from "falsely flagged about 6 out of 10,000" alone.
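
To see why the base rate matters, here is a sketch that computes precision (the share of flagged accounts that really are fake) from the base rate, the false positive rate and the false negative rate. Only the 6/10,000 false positive figure comes from the article; the 5% false negative rate and both base rates are made-up values for illustration.

```python
def precision(base_rate: float, false_positive_rate: float, false_negative_rate: float) -> float:
    """Fraction of flagged accounts that are actually fake."""
    # Share of all accounts that are fake AND get flagged.
    true_pos = base_rate * (1.0 - false_negative_rate)
    # Share of all accounts that are legitimate BUT get flagged anyway.
    false_pos = (1.0 - base_rate) * false_positive_rate
    return true_pos / (true_pos + false_pos)


FPR = 6 / 10_000  # from the article
FNR = 0.05        # hypothetical

print(f"{precision(0.1, FPR, FNR):.3f}")     # 0.994 when 1 in 10 signups is fake
print(f"{precision(0.0001, FPR, FNR):.3f}")  # 0.137 when 1 in 10,000 signups is fake
```

With the same false positive rate, most flagged accounts are genuinely fake in the first case but most are real users in the second, which is why the false positive rate on its own says little.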

~~~
solistice
I predict roughly a month till they've circumvented it, if not sooner. A lot
of these bots are simple contraptions, and simulating the way real people sign
up to services seems to boil down to putting in some timers and a new name
generating algorithm.

~~~
mistercow
Depends on what you're going by. I'd put my 90% confidence interval for the
first spam bot being modified to circumvent it at between 1 hour after the paper
was presented and 1 day after it hits any site on the level of HN.

But if we're talking about some higher degree of prevalence among spam bots,
it will be a bit longer. I expect that there are a lot of spam bots out there
that are running passively and haven't been touched by their creators in
years.

------
tbirdz
>falsely identifying accounts as fake only 0.0058 percent of the time (or
about 6 out of 10,000)

That may sound impressive, but if there are 554,750,000 Twitter users [1],
554,750,000 * (6/10,000) is still 332,850 people who would get a false positive.

[1]: [http://www.statisticbrain.com/twitter-statistics/](http://www.statisticbrain.com/twitter-statistics/)
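
The arithmetic above can be checked with exact fractions. Note that the quote states the rate two ways that are not equivalent: 0.0058 percent is 0.000058, about 6 in 100,000, while "6 out of 10,000" is 0.0006, roughly ten times larger. The sketch computes both. Like the comment, it treats every account as legitimate, which slightly overstates the count, since false positives can only occur among real users.

```python
from fractions import Fraction

TWITTER_USERS = 554_750_000  # figure from statisticbrain.com, as cited in the comment

# The two ways the article states the rate:
RATE_AS_FRACTION = Fraction(6, 10_000)        # "about 6 out of 10,000"
RATE_AS_PERCENT = Fraction(58, 10_000) / 100  # "0.0058 percent"


def expected_false_positives(population: int, rate: Fraction) -> Fraction:
    """Expected number of accounts wrongly flagged, assuming all are legitimate."""
    return population * rate


print(expected_false_positives(TWITTER_USERS, RATE_AS_FRACTION))        # 332850
print(float(expected_false_positives(TWITTER_USERS, RATE_AS_PERCENT)))  # 32175.5
```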

~~~
random42
That is the bane of machine learning algorithms: they cannot be 100% accurate.
That is why, when such algorithms drive destructive measures, you need to
establish human oversight.

~~~
_greim_
However, humans are also machine learning algorithms, and may or may not have
a better false positive rate than 0.0058%.

------
petercooper
I think Twitter's own system for doing this is a bit overeager. I registered a new
account the other day, followed the obligatory number of people they make you
follow on signup, put in some basic profile info, then logged out to finish
setting up the next day. Logged in the next day... account has been suspended for
being suspicious. Ha!

------
denzil_correa
For full details, please read the paper published at USENIX Security 2013 [0].
Some points I found interesting:

\+ 6% of the fraudulent accounts purchased are resold (Table 2)

\+ India is the most popular location to register fraudulent accounts (Table
3)

\+ Hotmail is the most popular e-mail service used to confirm fraudulent
Twitter accounts (Table 5)

\+ The algorithm to "retroactively predict" fraudulent accounts is not based on
any popular machine learning technique. Most of it looks like regex matching
patterns.

\+ The algorithm was evaluated on a random sample of 4,800 Twitter accounts
(200 from each merchant) out of the 121,027 accounts on which the longitudinal
study was performed.
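
For readers unfamiliar with the idea, regex-based naming signals look roughly like the sketch below. The two patterns and the function name `matches_bot_naming` are hypothetical illustrations, not the signatures used in the paper.

```python
import re

# Hypothetical patterns for illustration only; they are NOT the paper's rules.
SUSPICIOUS_NAME_PATTERNS = [
    re.compile(r"[A-Z][a-z]+[A-Z][a-z]+\d{4,}"),  # CapitalizedFirstLast followed by 4+ digits
    re.compile(r"[a-z]{8}\d{2}"),                 # exactly 8 lowercase letters then 2 digits
]


def matches_bot_naming(screen_name: str) -> bool:
    """Return True if the whole screen name matches any hypothetical bot pattern."""
    return any(p.fullmatch(screen_name) for p in SUSPICIOUS_NAME_PATTERNS)


for name in ["JohnSmith84721", "qwrtzpkl42", "jack"]:
    print(name, matches_bot_naming(name))  # True, True, False respectively
```

Patterns this crude will also catch some real users who happen to pick similar names, which is exactly the false positive concern raised elsewhere in the thread.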

[0]
[https://www.usenix.org/system/files/conference/usenixsecurit...](https://www.usenix.org/system/files/conference/usenixsecurity13/sec13-paper_thomas.pdf)

------
AznHisoka
I feel you can get 80% of the way there just by banning the major
cloud/dedicated server hosts (e.g., AWS, Rackspace, Linode).
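
The suggestion amounts to checking each signup IP against a list of hosting-provider networks, which Python's standard `ipaddress` module handles directly. The ranges below are RFC 5737 documentation blocks used as placeholders, not real provider ranges; a real list would be built from the ranges the providers publish (AWS, for instance, publishes its ranges as a JSON file).

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges), NOT real provider ranges.
HOSTING_PROVIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def from_hosting_provider(ip: str) -> bool:
    """Return True if the signup IP falls inside any listed hosting-provider network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in HOSTING_PROVIDER_NETWORKS)


print(from_hosting_provider("198.51.100.17"))  # True
print(from_hosting_provider("203.0.113.5"))    # False
```

As the reply below notes, the paper reports that most registration IPs come from botnets, which a hosting-provider blocklist like this would not catch.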

~~~
driverdan
Had you read the paper, you'd know you're wrong. Most of the IPs are from
botnets.

------
100k
I have a particular tweet that must've gotten into a Twitter bot database.
It's old, but it gets favorited on a regular basis by bots which are (I'm
assuming) trying to look like real users.

Not really related to the article, but I've found it interesting. Anyone else
see this behavior?

~~~
chewxy
Same here. It was a tweet about my typo of "python" as "pythong" [0]. Every
few weeks I would get a random favourite from a plausible-sounding name.
They're all bots.

[0]
[https://twitter.com/chewxy/status/333827059908497409](https://twitter.com/chewxy/status/333827059908497409)

