Ask HN: how would you protect a URL redirection service from spam?
10 points by benhoyt on Mar 4, 2009 | 9 comments
I run DecentURL.com, and many people have found it pretty useful to date, but the problem is that now most of the URLs submitted are spam. Legitimate users don't see this, which is great, but as sysadmin I don't like the fact that 95% of the links in my growing database are spam (or that spammers are then using DecentURL as a kind of spam proxy).

Mostly it's some site like nycshorttermrental.com submitting, say, 200 times with a different random title each time, over the course of a day or two.
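Because the spam varies the title but keeps the target domain, one title-independent signal is simply counting submissions per target domain over a sliding window. The sketch below assumes each submission record has a timestamp and a URL. The `Submission` type, the function names, the 48-hour window and the threshold of 20 are all hypothetical choices for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from urllib.parse import urlsplit


@dataclass
class Submission:
    # Hypothetical record shape: when the link was created and where it points.
    created: datetime
    url: str


def target_domain(url: str) -> str:
    # Lowercase the host and drop a leading "www." so variants count together.
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def flag_bursty_domains(subs, now, window=timedelta(hours=48), threshold=20):
    """Return domains submitted at least `threshold` times within `window` of `now`."""
    recent = (s for s in subs if now - s.created <= window)
    counts = Counter(target_domain(s.url) for s in recent)
    return {domain for domain, n in counts.items() if n >= threshold}
```

Titles are ignored entirely, so randomising them does not help the spammer; the threshold would need tuning against how often legitimate users shorten links to the same site.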

Something like Akismet doesn't work, because there's not enough data. All you give DecentURL is one URL, and Akismet can't tell it's spam from a bare URL (you can try it at http://www.voidspace.org.uk/cgi-bin/akismet/test_akismet.py).

Any suggestions? How do other URL redirection services kill spam? (Or do they just ignore the problem, and let the spam come in?)

If you don't take a peek at the content on the other side of the URL then your options are somewhat limited. If I were faced with this problem I would probably start by doing broad "spam site" classification based on the domain name (which would probably eventually involve things like doing DNS/whois lookups and weighting by registrar). Once you have likely targets you can silently blackhole the known spam sites and maybe create a "this might be spam, press the button to continue at your own risk" landing page for sites in the classification grey zone.
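A rough sketch of that three-way outcome (blackhole, grey-zone warning page, normal redirect) driven purely by domain-level signals. The registrar weights, the 30-day age rule, and the 0.5/0.9 cutoffs are invented for illustration; in practice the inputs would come from the DNS/whois lookups mentioned above and the weights from your own data.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "redirect as normal"
    WARN = "show a 'this might be spam, continue at your own risk' page"
    BLACKHOLE = "silently drop the link"


# Hypothetical registrar weights; real values would be learned from your own spam data.
REGISTRAR_WEIGHT = {"cheap-registrar.example": 0.4}


def spam_score(domain: str, age_days: int, registrar: str, known_spam: set) -> float:
    """Crude 0..1 score built only from domain-level signals (no page content)."""
    if domain in known_spam:
        return 1.0
    score = REGISTRAR_WEIGHT.get(registrar, 0.0)
    if age_days < 30:  # assumption: very young domains are more suspicious
        score += 0.4
    return min(score, 1.0)


def classify(score: float, warn_at: float = 0.5, block_at: float = 0.9) -> Verdict:
    """Map a score onto blackhole / warning page / normal redirect."""
    if score >= block_at:
        return Verdict.BLACKHOLE
    if score >= warn_at:
        return Verdict.WARN
    return Verdict.ALLOW
```

Keeping the score and the cutoffs separate means the grey zone can be widened or narrowed without touching the signals themselves.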

For ri.ms and tinyarro.ws, we've actually seen significant reduction in spam just by banning abusive IP addresses. So far, that has had the biggest bang for the buck in terms of dropping spam considerably.

In other words, we periodically do human review and obvious spammers are IP banned. It's a neverending problem, but they don't use so many IPs that things are out of control.

If you do come up with a better approach that doesn't ruin usability, I'd love to hear it, too!
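A minimal sketch of the "ban specific addresses found during human review" approach, using Python's standard `ipaddress` module so that different spellings of the same address (notably IPv6) match. The `IPBanList` class is a hypothetical name; persistence and the review workflow itself are left out.

```python
import ipaddress


class IPBanList:
    """Ban individual addresses (not ranges) found during periodic human review."""

    def __init__(self):
        self._banned = set()

    def ban(self, ip: str) -> None:
        # Parsing normalises equivalent spellings, e.g. IPv6 "2001:db8::1"
        # and its fully written-out form become the same object.
        self._banned.add(ipaddress.ip_address(ip))

    def is_banned(self, ip: str) -> bool:
        try:
            return ipaddress.ip_address(ip) in self._banned
        except ValueError:
            return False  # unparseable address: leave it to other checks
```

Only exact addresses match, so a ban on one address in 173.x.y.z leaves its neighbours untouched.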

Thanks for the comment. Unfortunately what I'm seeing is that each time they submit it's with a (somewhat) different IP address. For a given link, they might be all within 173.x.y.z, but if I ban 173.*, won't there be legitimate IP addresses in that range?

Yeah, we ban specific IP addresses, not ranges. We thought at first they were all different, but they weren't. There were hundreds of different ones, but lots with multiple spam posts, some every day. Group them by IP address and you should see some that are bugging you all the time.

Hard to say, though: your spam could have a different "shape" than what we get.
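The "group them by IP address" step might look like this, assuming you have (ip, timestamp) pairs for links already judged to be spam. The function name and the `min_posts` default of 5 are hypothetical. Counting distinct days as well as posts surfaces the addresses that come back every day.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, List, Tuple


def repeat_offenders(
    entries: Iterable[Tuple[str, datetime]], min_posts: int = 5
) -> List[Tuple[str, int, int]]:
    """Return (ip, post_count, distinct_days) for IPs with >= min_posts spam posts.

    Rows are sorted busiest first, ready for a human to review and ban.
    """
    posts = defaultdict(int)
    days = defaultdict(set)
    for ip, ts in entries:
        posts[ip] += 1
        days[ip].add(ts.date())
    rows = [(ip, n, len(days[ip])) for ip, n in posts.items() if n >= min_posts]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```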

Add a captcha, if you aren't using one already.

Rate-limit link creation based on the user's IP.
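One way to rate-limit by IP is a sliding-window log: keep the timestamps of each IP's recent accepted requests and refuse once there are `limit` of them inside the window. The class name and the defaults (10 links per hour) are assumptions; the injectable `clock` exists only to make the sketch testable.

```python
import time
from collections import defaultdict, deque


class IPRateLimiter:
    """Allow at most `limit` link creations per IP within any `window` seconds."""

    def __init__(self, limit=10, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of recent accepted requests

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

This keeps state in memory for a single process; a multi-process deployment would need shared storage, and idle IPs' empty deques would need periodic pruning.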

Check submitted links against a URIBL. Prevent users from adding links which are on the blacklist, and remove links which appear on the blacklist.
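URIBLs are usually queried over DNS: look up `<domain>.<zone>`, and an answer means the domain is listed, while NXDOMAIN means it is not. The sketch assumes `multi.uribl.com` as the zone purely as an example. What each answer code means (including codes that signal refused queries) is list-specific, so it returns the raw answer rather than a yes/no. Stripping only `www.` is a simplification; a real check would reduce the host to its registered domain.

```python
import socket
from typing import Callable, Optional
from urllib.parse import urlsplit

# Assumption: the zone is your choice; each list has its own usage terms and limits.
URIBL_ZONE = "multi.uribl.com"


def uribl_query_name(url: str, zone: str = URIBL_ZONE) -> str:
    """Build the DNS name to look up: <domain>.<zone>."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if host.startswith("www."):
        host = host[4:]  # simplification: does not handle other subdomains
    return f"{host}.{zone}"


def uribl_lookup(
    url: str,
    zone: str = URIBL_ZONE,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Return the list's answer (e.g. '127.0.0.2'), or None if not listed.

    Any resolution failure (NXDOMAIN, timeout) is treated as 'not listed',
    so the check fails open.
    """
    try:
        return resolve(uribl_query_name(url, zone))
    except OSError:
        return None
```

Re-running the lookup over stored links periodically would cover the "remove links which appear on the blacklist" half of the suggestion.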

Reject links to newly registered domains outright.
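Python's standard library has no whois client, so this sketch takes the creation-date lookup as a caller-supplied function (`get_created` is hypothetical and would wrap whatever whois source you use). The 30-day cutoff and the choice to treat an unknown age as "too new" are policy assumptions.

```python
from datetime import date, timedelta
from typing import Callable, Optional


def too_new(
    domain: str,
    today: date,
    get_created: Callable[[str], Optional[date]],
    min_age_days: int = 30,
) -> bool:
    """True if `domain` was registered fewer than `min_age_days` days ago."""
    created = get_created(domain)
    if created is None:
        return True  # unknown registration date: reject conservatively (policy choice)
    return today - created < timedelta(days=min_age_days)
```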

Checking a publicly-available URL blacklist is a good idea, thanks!

I may well end up adding a captcha of some sort. However, that effectively means I can't have a public "create a DecentURL" API call like I do now -- they'd have to sign up and have an API key or something. That's a bit of a bummer.
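If the API does move behind sign-up, keys can be issued with the standard `secrets` module and stored only as SHA-256 digests, so a leaked database doesn't expose usable keys. `ApiKeyStore` is a hypothetical in-memory stand-in for a real table; per-key rate limits would then replace per-IP ones for API callers.

```python
import hashlib
import secrets
from typing import Dict, Optional


class ApiKeyStore:
    """Issue per-user API keys and look them up; only digests are stored."""

    def __init__(self):
        self._owners: Dict[str, str] = {}  # sha256 hex digest -> user id

    @staticmethod
    def _digest(key: str) -> str:
        return hashlib.sha256(key.encode()).hexdigest()

    def issue(self, user_id: str) -> str:
        key = secrets.token_urlsafe(32)  # random, URL-safe key shown to the user once
        self._owners[self._digest(key)] = user_id
        return key

    def user_for(self, key: str) -> Optional[str]:
        """Return the owning user id, or None for an unknown key."""
        return self._owners.get(self._digest(key))
```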

The utility of an API for this sort of service is honestly pretty limited. If you're doing things right, most of your links should be getting created by casual users (for pasting into chat/email/twitter) - it's not too much of a hassle for them to hit your site and fill out a form.

I'd let the spam come in at first but not publish the links, deciding which to hold back with rules (or with algorithms once you have more data). Later on you can block certain IP addresses. Also try creating a blacklist to get you started.
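A sketch of "accept everything, publish selectively": every submission gets stored, but links flagged by any rule are held and don't redirect until a human approves them. The `LinkStore` class and the rule signature `(url, submitter_ip) -> bool` are hypothetical; rules could be any of the checks discussed in this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A rule takes (url, submitter_ip) and returns True if the link looks suspicious.
Rule = Callable[[str, str], bool]


@dataclass
class LinkStore:
    rules: List[Rule]
    published: Dict[str, str] = field(default_factory=dict)
    held: Dict[str, str] = field(default_factory=dict)

    def submit(self, slug: str, url: str, ip: str) -> bool:
        """Store every submission, but publish only those no rule objects to."""
        if any(rule(url, ip) for rule in self.rules):
            self.held[slug] = url
            return False
        self.published[slug] = url
        return True

    def approve(self, slug: str) -> None:
        """Human review: move a held link into the published set."""
        self.published[slug] = self.held.pop(slug)

    def resolve(self, slug: str) -> Optional[str]:
        """Redirect target, or None (e.g. serve a 404) for held or unknown slugs."""
        return self.published.get(slug)
```

Since the spammer still receives a short URL, they get little feedback that the link was held.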

They are probably submitted almost entirely by bots. Check submitter IPs against sbl-xbl.spamhaus.org on your service, and rate-limit.
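IP-based DNSBLs such as the one named above are queried by reversing the IPv4 octets and appending the zone, so 173.10.20.30 becomes `30.20.10.173.<zone>`. An answer means "listed" and NXDOMAIN means "not listed". The zone name is taken from the comment; Spamhaus's current zone names, usage terms and error return codes may differ, so check their documentation before relying on it. The function names are hypothetical, and only IPv4 is handled.

```python
import ipaddress
import socket
from typing import Callable, Optional

DNSBL_ZONE = "sbl-xbl.spamhaus.org"  # zone named in the comment; verify it is still current


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Reverse the IPv4 octets and append the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_lookup(
    ip: str,
    zone: str = DNSBL_ZONE,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Return the list's answer (e.g. '127.0.0.2'), or None if not listed.

    Resolution failures are treated as 'not listed' (fails open).
    """
    try:
        return resolve(dnsbl_query_name(ip, zone))
    except OSError:
        return None
```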
