

Ask HN: Review My (Twitter) App: Follow Ham - dangrossman
http://www.followham.com

======
dangrossman
I learned a lot about machine learning in AI courses over the past year and a half
of college, and really wanted to apply it somewhere outside the classroom.
This was a one-day project, so it's functionally very simple. It looks at your
Twitter follower and following lists, computes the difference, decides whether
each of those people is "spam" or "ham", then recommends which ones you can
safely follow and which you should block. Since the API calls can take a while,
it does all the work in a background worker and e-mails you when the results are
ready, rather than doing it synchronously while you wait at a browser.
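The "computes the difference" step is essentially a set difference. Below is a minimal sketch. It assumes the follower and following lists have already been fetched as sets of numeric user IDs, and that the accounts of interest are the followers you don't follow back. The function name `accounts_to_classify` is hypothetical and does not come from Follow Ham's code.

```python
from __future__ import annotations


def accounts_to_classify(follower_ids: set[int], following_ids: set[int]) -> set[int]:
    """Return the IDs of accounts that follow you but that you don't follow back.

    Hypothetical helper: assumes both ID lists were already fetched from the
    Twitter API and loaded into sets.
    """
    # Set difference: members of follower_ids that are not in following_ids.
    return follower_ids - following_ids


# Usage: followers 1-4, following 2, 4 and 5 -> only 1 and 3 need a verdict.
print(sorted(accounts_to_classify({1, 2, 3, 4}, {2, 4, 5})))  # [1, 3]
```

Using sets keeps the comparison roughly linear in the size of the lists. That matters for accounts with thousands of followers.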

The hard part was the preparation. I decided to build a very basic spam
filter for Twitter, one that looks only at profile information and not at
the text of anyone's tweets, since that would require too many API calls without
special access beyond even a whitelisted API account. For a week and a half, I ran
scripts that used Twitter's search API to find tweets directed at @spam. It took
that long because looking at old tweets would only turn up accounts Twitter had
already suspended, meaning I couldn't get any info about them through the API. I
had to catch the reports while they were fresh and download the profile info of
known spammers to build a training corpus. For the "ham" examples, I used the
following lists of trusted friends with various interests.

I used that corpus to build and prune a decision tree, which tested at around
99% precision and recall on 2000 ham and 2000 spam examples. Then I manually
rewrote the tree as simple conditionals in PHP to create the classifier for the
website.
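The sketch below illustrates what "rewriting the tree as simple conditionals" looks like. It is written in Python rather than PHP. The feature names `followers_count`, `friends_count` and `statuses_count` follow Twitter's user-profile fields. `account_age_days` and `has_custom_image` are hypothetical derived features. Every threshold is a made-up placeholder, not the tree the author actually learned.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    followers_count: int
    friends_count: int      # number of accounts this user follows
    statuses_count: int     # total tweets sent
    account_age_days: int   # hypothetical derived feature
    has_custom_image: bool  # hypothetical derived feature


def classify(p: Profile) -> str:
    """Walk a hand-coded decision tree and return "spam" or "ham".

    Each nested if corresponds to one internal node of a learned tree;
    the thresholds below are placeholders for illustration only.
    """
    # Guard against division by zero for accounts that follow nobody.
    follow_ratio = p.followers_count / max(p.friends_count, 1)
    if follow_ratio < 0.1:
        # Follows far more accounts than follow it back.
        if p.account_age_days < 30:
            return "spam"
        return "ham" if p.has_custom_image else "spam"
    tweets_per_day = p.statuses_count / max(p.account_age_days, 1)
    if tweets_per_day > 100:
        # A burst of tweets in a short time.
        return "spam"
    return "ham"


print(classify(Profile(5, 2000, 10, 3, False)))      # spam
print(classify(Profile(300, 250, 4000, 900, True)))  # ham
```

Hard-coding the tree this way trades flexibility for speed. There is no model file to load, but every retraining means rewriting the conditionals by hand.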

Note: I'm about to hit my rate limit (on a whitelisted account, so 20k calls)
in only half an hour...

~~~
timdorr
If you authorize the requests, you'll get a boost of the individual user's
available requests after you run out of whitelist API requests for your IP
(20000 + (users * 150)). Also, make sure you're using the api.twitter.com/1/
URL format, as they're going to be upping the rate limit for users to 1000
"soon" (it was announced at Le Web 1.75 months ago...).

Also, that only applies to OAuth, so you should get on that boat soon. They're
deprecating basic auth this summer.
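The budget timdorr describes is simple arithmetic: a fixed whitelist allowance plus a per-user allowance for every user who has authorized the app. Here is a minimal sketch using the figures quoted in this thread. Those figures reflect Twitter's limits at the time and are treated as assumptions, not current values. The function name is hypothetical.

```python
WHITELIST_CALLS_PER_HOUR = 20_000  # whitelisted allowance quoted in this thread
DEFAULT_PER_USER_CALLS = 150       # per-user hourly limit quoted in this thread


def hourly_budget(authorized_users: int, per_user_calls: int = DEFAULT_PER_USER_CALLS) -> int:
    """Total API calls per hour: 20000 + users * per_user_calls."""
    return WHITELIST_CALLS_PER_HOUR + authorized_users * per_user_calls


print(hourly_budget(100))                       # 35000
print(hourly_budget(100, per_user_calls=1000))  # 120000, if the limit rises to 1000
```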

~~~
dangrossman
For many accounts with more followers than they follow, there are more than
150 profiles I need to download. Someone just passed me an account with 6000
followers he's not following back. That's 6k API calls. At first I thought of
using the followers timeline request to get 100 at a time, but the profile
information it gives for each follower is abbreviated. I need other
information, so I have to make a separate API call for each and every profile
I want.

Using up someone's entire API usage in a flash while they need it for the rest
of the hour to run their Twitter client isn't gonna fly.

The best I can do right now is make 20k authorized requests from my
whitelisted account, and 20k unauthorized requests from my whitelisted IP. I
just built in some status checking on the homepage so it won't let more people
add themselves to the queue when it's already overloaded for the hour.
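Each profile costs one API call, so a job's cost is roughly its number of profiles to download: 6000 followers means about 6000 calls. The homepage status check can then be sketched as a comparison against the hour's remaining allowance. This is a hypothetical illustration with invented names. The 40,000 default simply adds the 20k authorized and 20k unauthorized requests mentioned above.

```python
HOURLY_LIMIT = 20_000 + 20_000  # authorized account + unauthorized IP, per the comment above


def accepting_signups(calls_used_this_hour: int, calls_already_queued: int,
                      hourly_limit: int = HOURLY_LIMIT) -> bool:
    """Return True if the queued work still fits in this hour's API allowance.

    Assumes one API call per profile, so queued calls = profiles still to fetch.
    """
    return calls_used_this_hour + calls_already_queued < hourly_limit


print(accepting_signups(30_000, 6_000))   # True: 36k of 40k
print(accepting_signups(30_000, 12_000))  # False: 42k exceeds 40k
```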

~~~
arikfr
AFAIK, Twitter's API rate limit is _per IP_ + per user (i.e. you can do X API
requests per user on each machine). Also, on a whitelisted IP you get 20K
requests per _user_.

There was a long discussion about that a few months ago on the dev mailing
list, about whether this was on purpose or a mistake. Back then, the official
response was that it's on purpose, but verify that it's still true. If it is,
you have plenty of API calls you can make per user.

~~~
dangrossman
It does seem to be. In that case, I have even more reason to implement OAuth!

------
arikfr
Here's an idea - instead of emailing the report, post it on Twitter: "@arikfr
152 of your followers look spammy, while another 3334 look follow-worthy!
<http://www.followham.com/show.php?username=arikfr>"

This way you gain two things: 1. More users will try the service (giving
away a Twitter username is easier than giving away an email address). 2. More
chances that people will RT these messages and spread the word about the
service.

And as @dschobel mentioned - you should register @followham.

~~~
dangrossman
I implemented this and Twitter suspended my account 3 minutes later.

~~~
arikfr
WOW, sorry for that :/

------
timmorgan
I love the background. I think you do a decent job of explaining what's about
to happen.

My only suggestion: perhaps you could tell me how many are in the queue ahead
of me, and estimate the time until my results arrive?

~~~
dangrossman
I added the size of the queue to the message you get after submitting the
initial form. I didn't attempt to estimate time just yet as that's dependent
on how many followers the accounts in queue have, which isn't known until they
get processed.
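A rough wait estimate could follow from the same one-call-per-profile cost, once each queued job's call count is known. One profile lookup's `followers_count` field would give an upper bound before the full lists are fetched. The hypothetical sketch below treats the hourly limit as a steady rate. It therefore ignores the fact that the real allowance resets in hourly windows.

```python
import math


def estimated_wait_minutes(calls_ahead: list, calls_per_hour: int) -> int:
    """Estimate minutes until a new job starts, given the API calls each queued job needs.

    Simplification: treats the hourly limit as a steady rate, ignoring that the
    real allowance resets at the start of each hourly window.
    """
    total_calls = sum(calls_ahead)
    return math.ceil(total_calls * 60 / calls_per_hour)


# Three jobs ahead needing 6000, 1200 and 800 calls, at 40,000 calls per hour.
print(estimated_wait_minutes([6000, 1200, 800], 40_000))  # 12
```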

------
ujjwalg
I really like the idea. I think providing some interesting statistics as
examples could make it go viral. For example: how many spammy followers aplusk,
techcrunch, mashable, etc. have, and how many of them are common to all of them.
This might get you a follow-up article on TechCrunch or the like too. :)

~~~
dschobel
Not to mention that having a bird's-eye view of the spam situation will let him
blacklist accounts, because a spammer is a spammer for the first user he
follows as much as for the next hundred.
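dschobel's point is that a verdict about an account can be reused across every user it follows. Below is a minimal in-memory sketch that classifies each account at most once, so repeat spammers also cost no extra API calls. The class name is hypothetical, and a real service would need persistent storage.

```python
from typing import Callable, Dict


class VerdictCache:
    """Classify each account at most once and share the verdict across users."""

    def __init__(self, classify: Callable[[int], str]) -> None:
        self._classify = classify          # fetches a profile and returns "spam"/"ham"
        self._verdicts: Dict[int, str] = {}
        self.classifier_calls = 0          # how many times the classifier really ran

    def verdict(self, user_id: int) -> str:
        if user_id not in self._verdicts:
            self.classifier_calls += 1
            self._verdicts[user_id] = self._classify(user_id)
        return self._verdicts[user_id]

    def blacklist(self) -> set:
        """All accounts judged spam so far."""
        return {uid for uid, v in self._verdicts.items() if v == "spam"}


# A stand-in classifier: odd IDs are "spam".
cache = VerdictCache(lambda uid: "spam" if uid % 2 else "ham")
for uid in [3, 4, 3, 3]:  # account 3 follows three different users
    cache.verdict(uid)
print(cache.classifier_calls, cache.blacklist())  # 2 {3}
```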

------
Vindexus
I was able to submit the form without entering an email. You should add some
form validation.
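A minimal server-side check could reject the form before anything is queued. The field names are hypothetical. The pattern is deliberately loose: it only requires something@something.tld with no spaces, since fully validating e-mail syntax with a regex is impractical.

```python
import re

# Loose check: non-empty local part, "@", a domain containing a dot, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_signup(username: str, email: str) -> list:
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []
    if not username.strip():
        errors.append("Twitter username is required.")
    if not EMAIL_RE.match(email.strip()):
        errors.append("A valid e-mail address is required.")
    return errors


print(validate_signup("dangrossman", ""))         # ['A valid e-mail address is required.']
print(validate_signup("dangrossman", "a@b.com"))  # []
```

Client-side checks are only a convenience. The form can be submitted without them, so the server should still run something like this.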

------
jeff18
My review:

Don't ask for an email address. Just give the user a unique page that
auto-updates and tells them what the wait time is. If the user wants, let
them enter their email on this second page.

Asking for an email (even with a disclaimer that says trust me, I will never
ever spam you, etc.) is a __huge __barrier to entry.

~~~
jeff18
Ok, got the email, so I can review the second half.

Don't ask for my Twitter login / password. That's a terrible, terrible
practice which trains people to get phished. You don't even use SSL, so the
username and password are transmitted in the clear!

At the very least, you should warn people about the dangers of giving a third
party 100% access to their Twitter account and the dangers of transmitting
their login credentials openly. This is true for any site -- let alone a
project with 0 reputation. As it stands I have to say that your web app seems
really irresponsible.

------
jfornear
I like the concept, but I worry about the accuracy of a feat like this. For
my results (@jfornear), about 1 in 5 of the recommended follow-backs would be
considered spam by a human. There were about 26 accounts without pictures that
were recommended follow-backs, and all 26 were spam or dead accounts.

I'm actually working with the Twitter API right now myself and have been
meaning to filter out accounts that don't bother to upload their own picture.
That might be something you could think about.

All (and only) default images are stored on <http://s.twimg.com>, I think.
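jfornear's suggestion could be tested with a host check on the profile's `profile_image_url` field. This rests entirely on the hedged claim above that default images, and only default images, are served from s.twimg.com. The example URLs below are illustrative placeholders, not real avatar paths.

```python
from urllib.parse import urlparse

# Assumption from the comment above ("I think"): only default avatars live here.
DEFAULT_IMAGE_HOST = "s.twimg.com"


def has_default_image(profile_image_url: str) -> bool:
    """True if the avatar URL points at the host assumed to serve default images."""
    return urlparse(profile_image_url).hostname == DEFAULT_IMAGE_HOST


print(has_default_image("http://s.twimg.com/example/default.png"))    # True
print(has_default_image("http://a1.twimg.com/example/uploaded.jpg"))  # False
```

Such a flag would work best as one more feature for the tree, not as a verdict on its own.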

------
aditya
Just one question: why not use Twitter OAuth?

~~~
dangrossman
If anyone actually uses the site, I'll take the extra time to implement that.
Learning it seemed non-trivial on top of building something in a day.

~~~
arikfr
I would recommend: <http://github.com/jmathai/twitter-async/> - it implements
Twitter's OAuth API. Will save you some work.

~~~
badave
I can vouch for Twitter-Async.

------
maxaf
This is a really, really useful app. It helps me deflect my wife's questions
when strange "chicks" follow me on Twitter out of nowhere. Of course, they're
bots!

I wonder where you got the background.

------
wooster
Immediately left the site because the background was taking forever to load.
The other thing that made me leave was the requirement of an e-mail address.

------
teuobk
I like the idea, and I like the simplicity of the interface, but I was
disappointed by the results. Most of my results were classified as "not spam,"
but a cursory inspection suggests that at least 70% were actually spam.
Conversely, one of the three accounts marked as spam was definitely not spam.

As an example, here's one of the accounts that was marked as "not spam":
@PhatApples (NSFW!)

------
steveplace
It was very, very difficult to read what the site did because of the noisy
background. The tool is great, and the background would make a good wallpaper,
but it was very distracting.

It seems that the image adjusts to the size of the screen, so when I had FF
maxed out on my monitor (1680 x 1050) it blew up the image and made it much
louder.

~~~
dangrossman
The white background to the black text is only 5% transparent. Does it look
more transparent to you? Maybe it's a bug. I know I have some bugs on IE, but
only 3% of the visitors have used IE, so I let those go for the first day.

------
araneae
2 of the 5 "ham" users were ones I knew irl and really should be following.
Thanks Follow Ham! :D

------
RK
The results page made my Firefox (3.0.17 on Ubuntu 8.04) slow to a crawl.

Other than that the results were interesting.

------
josch
I just got the report. E.g., <http://twitter.com/1000free> is considered ham
by your app, but looks like spam to me. Maybe a feedback button would be
appropriate (mark as spam/ham).

~~~
dangrossman
Like I said, it doesn't look at the account's tweets, just its profile. That
account has a good follower/following balance, it's not brand new, it hasn't
sent a ton of tweets in a short time, it has a profile image... basically it
looks safe.

Yes, feedback is definitely needed to improve it! And to identify spam for
other users with the same followers.
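A feedback button like the one josch suggests might store per-account votes that override the classifier once users agree. Below is a hypothetical sketch with invented names. Letting the majority win, with ties falling back to the classifier, is just one possible policy.

```python
from collections import Counter
from typing import Dict


class FeedbackStore:
    """Collect "spam"/"ham" votes per account and let a clear majority win."""

    def __init__(self) -> None:
        self._votes: Dict[int, Counter] = {}

    def mark(self, user_id: int, label: str) -> None:
        if label not in ("spam", "ham"):
            raise ValueError(f"unknown label: {label!r}")
        self._votes.setdefault(user_id, Counter())[label] += 1

    def label(self, user_id: int, classifier_label: str) -> str:
        votes = self._votes.get(user_id, Counter())
        if votes["spam"] > votes["ham"]:
            return "spam"
        if votes["ham"] > votes["spam"]:
            return "ham"
        return classifier_label  # no votes, or a tie: trust the classifier


store = FeedbackStore()
store.mark(1000, "spam")
store.mark(1000, "spam")
store.mark(1000, "ham")
print(store.label(1000, "ham"), store.label(42, "ham"))  # spam ham
```

The same votes could later be added to the training corpus when the tree is rebuilt.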

~~~
arikfr
I guess that you use the users/show endpoint to get each user's profile. Instead
you can ask for each user's timeline - it is still 1 API call, you still get
their profile data, but you also get their tweets.

The only downside is that the request might take longer.
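arikfr's suggestion works because, in the timeline responses of that era, each returned tweet (status) object embeds the author's user object. One call therefore yields both the profile and recent tweets. The sketch below assumes that response shape. The sample JSON is a trimmed, made-up illustration rather than a real API response.

```python
import json


def profile_and_tweets(timeline_json: str):
    """Split a timeline response into (profile, list of tweet texts).

    Assumes a JSON array of status objects, each with "text" and an
    embedded "user" object. Returns (None, []) for an empty timeline,
    e.g. an account that has never tweeted.
    """
    statuses = json.loads(timeline_json)
    if not statuses:
        return None, []
    profile = statuses[0]["user"]  # every status carries the same author
    tweets = [status["text"] for status in statuses]
    return profile, tweets


sample = """[
  {"text": "hello", "user": {"screen_name": "example", "followers_count": 12}},
  {"text": "world", "user": {"screen_name": "example", "followers_count": 12}}
]"""
profile, tweets = profile_and_tweets(sample)
print(profile["followers_count"], tweets)  # 12 ['hello', 'world']
```

An account with no tweets returns nothing to read the profile from. A separate profile lookup would presumably still be needed for such accounts.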

Btw, beware that some spam accounts use other people's tweets to look legit. I
can send you some examples, if you're interested.

------
dschobel
Very cool. You need to register @followham, though, and add it to the "do you
want to tweet about this?" message rather than "follow ham" as it currently
is, so we can refer directly to you to spread the site.

~~~
dangrossman
Yeah, unfortunately someone already has it.

Edit: I just registered @spamorham for this

~~~
arikfr
I have the account @spamcop on Twitter. If you want, I will gladly transfer it
to you.

