
Ask HN: how would you protect a URL redirection service from spam? - benhoyt
I run DecentURL.com, and many people have found it pretty useful to date, but the problem is that now most of the URLs submitted are spam. Legitimate users don't see this, which is great, but as sysadmin I don't like the fact that 95% of the links in my growing database are spam (or that spammers are then using DecentURL as a kind of spam proxy).

Mostly it's some site like nycshorttermrental.com submitting say 200 times with a different random title each time, over the course of a day or two.

Something like Akismet doesn't work, because there's not enough data. All you give DecentURL is one URL, and Akismet can't tell it's spam from a bare URL (you can try it at http://www.voidspace.org.uk/cgi-bin/akismet/test_akismet.py).

Any suggestions? How do other URL redirection services kill spam? (Or do they just ignore the problem, and let the spam come in?)
======
evgen
If you don't take a peek at the content on the other side of the URL then your
options are somewhat limited. If I were faced with this problem I would
probably start by doing broad "spam site" classification based on the domain
name (which would probably eventually involve things like doing DNS/whois
lookups and weighting by registrar.) Once you have likely targets you can
silently blackhole the known spam sites and maybe create a "this might be
spam, press the button to continue at your own risk" landing page for sites in
the classification grey zone.
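
A rough sketch of that three-way split, in Python. Everything here is hypothetical: `classify`, `domain_of`, and the two sets are made-up names, and the sets stand in for whatever the DNS/whois weighting would eventually produce.

```python
from urllib.parse import urlsplit

BLOCK, WARN, ALLOW = "block", "warn", "allow"


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify(url: str, blocked: set, suspect: set) -> str:
    """Sort a submitted URL into block / warn / allow by its domain.

    `blocked` and `suspect` are assumed to be sets of bare domain names
    maintained elsewhere (manual review, registrar weighting, and so on).
    """
    domain = domain_of(url)
    if not domain:
        return WARN  # no parsable host: treat it as grey zone
    parts = domain.split(".")
    # The domain itself plus its parent domains, excluding the bare TLD.
    candidates = {".".join(parts[i:]) for i in range(max(1, len(parts) - 1))}
    if candidates & blocked:
        return BLOCK  # silently blackhole
    if candidates & suspect:
        return WARN   # show the "continue at your own risk" page
    return ALLOW
```

With `blocked = {"nycshorttermrental.com"}`, a submission of `http://www.nycshorttermrental.com/listing` comes back as `"block"`, while a domain in neither set comes back as `"allow"`.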

------
thorax
For ri.ms and tinyarro.ws, we've actually seen significant reduction in spam
just by banning abusive IP addresses. So far, that has had the biggest bang
for the buck in terms of dropping spam considerably.

In other words, we periodically do a human review and IP-ban the obvious
spammers. It's a never-ending problem, but they don't use so many IPs that
things are out of control.

If you do come up with a better approach that doesn't ruin usability, I'd love
to hear it, too!

~~~
benhoyt
Thanks for the comment. Unfortunately what I'm seeing is that each time they
submit it's with a (somewhat) different IP address. For a given link, they
might be all within 173.x.y.z, but if I ban 173.*, won't there be legitimate
IP addresses in that range?

~~~
thorax
Yeah-- we ban specific IP addresses, not ranges. We thought at first they
were all different, but they weren't. There were hundreds of different
addresses, but lots of them had multiple spam posts, some every day. Group
them by IP address and you should see some that are bugging you all the time.
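
The grouping could look something like this in Python. It's only a sketch: `repeat_offenders` is a made-up name, and it assumes you can pull `(ip, url)` pairs for the links you've already judged to be spam.

```python
from collections import Counter


def repeat_offenders(submissions, min_count=3):
    """Count spam submissions per exact IP address.

    `submissions` is assumed to be an iterable of (ip, url) pairs.
    Returns (ip, count) pairs, most frequent first, for IPs seen at
    least `min_count` times. The threshold is an arbitrary default.
    """
    counts = Counter(ip for ip, _url in submissions)
    return [(ip, n) for ip, n in counts.most_common() if n >= min_count]
```

Anything it returns is a candidate for a single-address ban; IPs that only show up once or twice stay out of the list.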

Hard to say though-- could be a different "shape" to your spam than what we
get.

------
duskwuff
Add a captcha, if you aren't using one already.

Rate-limit link creation based on the user's IP.

Check submitted links against a URIBL. Prevent users from adding links which
are on the blacklist, and remove links which appear on the blacklist.

Reject links to newly registered domains outright.
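
For the rate-limiting part, a minimal in-memory sketch in Python, assuming a single process and a sliding window per IP. The class name, the limits, and the injectable `clock` are all illustrative choices, not anything DecentURL actually does.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `max_events` link creations per IP per time window."""

    def __init__(self, max_events, window_seconds, clock=time.monotonic):
        self.max_events = max_events
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._events = defaultdict(deque)  # ip -> timestamps of recent events

    def allow(self, ip):
        now = self.clock()
        recent = self._events[ip]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_events:
            return False
        recent.append(now)
        return True
```

Something like `RateLimiter(5, 3600).allow(ip)` would be called before each link is created. Note that this version never forgets an IP it has seen, so a long-running service would need to prune idle entries or keep the counters in a shared store.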

~~~
benhoyt
Checking a publicly-available URL blacklist is a good idea, thanks!

I may well end up adding a captcha of some sort. However, that effectively
means I can't have a public "create a DecentURL" API call like I do now --
they'd have to sign up and have an API key or something. That's a bit of a
bummer.

~~~
duskwuff
The utility of an API for this sort of service is honestly pretty limited. If
you're doing things right, most of your links should be getting created by
casual users (for pasting into chat/email/twitter) - it's not too much of a
hassle for them to hit your site and fill out a form.

------
vikram
I'd let the spam come in at first but not publish the link, then filter based
on rules, or on algorithms once you have more data. Later on you can block
certain IP addresses. Also try creating a blacklist to get you started.

------
sdgsfhg
They are probably submitted almost solely by bots. Check submitters against
sbl-xbl.spamhaus.org on your service, and rate limit.
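
A DNSBL check is just a DNS lookup on the submitter's IP with the octets reversed and the list's zone appended. Here is a hedged Python sketch: the function names are made up, the zone is simply the one named above (check Spamhaus's current documentation for which zones are live and what their usage terms are), and the `resolve` parameter exists only so the lookup can be swapped out in tests.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="sbl-xbl.spamhaus.org"):
    """Build the DNSBL query name: reversed IPv4 octets plus the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="sbl-xbl.spamhaus.org", resolve=socket.gethostbyname):
    """Return True if the query name resolves, i.e. the IP appears listed.

    Simplification: any answer counts as "listed" and any lookup failure
    counts as "not listed". Some lists use particular return codes for
    other purposes, so a real deployment should inspect the answer.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True
```

For example, `dnsbl_query_name("192.0.2.10")` gives `10.2.0.192.sbl-xbl.spamhaus.org`. The lookup goes through the system resolver and blocks, so it probably belongs off the request path or behind a cache.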

