
Implementing Google Safe Browsing server-side to sanitize untrusted input - kellanem
http://codeascraft.etsy.com/2012/03/04/google-safe-browsing/
======
obtu
Interesting idea. There are some extra challenges when the scanning is not
done live.

Does Etsy accept URL shorteners? Including some that allow editing? Is it
feasible to rewrite the content to use the final, redirected URLs?

~~~
indec
We "unroll" redirects as much as possible so that we can check the URL at the
end.

~~~
obtu
In the sense of following them, or rewriting them as well?

~~~
indec
We follow them on the server side and check the final URL we get to against
the GSB database. We don't modify the user-generated content.
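
A minimal sketch of that unrolling, not the actual Etsy code. The `resolve`
callable is a hypothetical stand-in for an HTTP request that returns the
`Location` header of a redirect (or `None` when there is none), and `max_hops`
is an assumed safety limit:

```python
from typing import Callable, List, Optional
from urllib.parse import urljoin

# Hypothetical resolver type: maps a URL to its redirect target, or None
# when the URL does not redirect. A real one would issue an HTTP request.
Resolver = Callable[[str], Optional[str]]


def unroll(url: str, resolve: Resolver, max_hops: int = 10) -> List[str]:
    """Follow redirects and return every URL visited, final URL last."""
    chain = [url]
    seen = {url}
    while len(chain) <= max_hops:        # follow at most max_hops redirects
        target = resolve(chain[-1])
        if target is None:               # no further redirect: we are done
            break
        nxt = urljoin(chain[-1], target)  # Location may be a relative URL
        if nxt in seen:                  # redirect loop: stop following
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain
```

The last element of the returned list is the URL that would be looked up in
the GSB data; the user-generated content itself is left untouched.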

~~~
underwater
I assume you check each step of the unrolling? Otherwise a malicious site
could easily do:

    
    
       // is_etsy_ip(): hypothetical check for requests coming from Etsy's servers
       if (is_etsy_ip()) { header('Location: http://www.google.com/'); die(); }
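
To make the "check each step" idea concrete, here is a rough sketch. It
assumes the redirect chain has already been collected as a list of URLs, and
it uses a plain set of host names (`bad_hosts`, a hypothetical stand-in for a
real GSB lookup) as the blocklist:

```python
from typing import Iterable, Optional, Set
from urllib.parse import urlsplit


def first_listed_hop(chain: Iterable[str], bad_hosts: Set[str]) -> Optional[str]:
    """Return the first URL in the chain whose host is listed, else None."""
    for url in chain:
        host = (urlsplit(url).hostname or "").lower()
        if host in bad_hosts:   # stand-in for a real blocklist lookup
            return url
    return None


# A cloaking site sends the checker to an innocent page, but the
# intermediate hop is still visible in the chain.
chain = ["http://bit.ly/x", "http://evil.example/land", "http://www.google.com/"]
flagged = first_listed_hop(chain, {"evil.example"})
```

Checking only `chain[-1]` would pass this chain; checking every hop flags the
intermediate host, provided that host is already listed.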

~~~
indec
Well, generally following the redirects is actually somewhat redundant. The
idea of GSB is that URLs that lead to bad things would all be identified and
added to the database.

Customising attacks for a given site specifically adds complexity and cost to
the attack, which is really the aim for all of this sort of work. Everything
you can do to drive up the cost of the attack makes you a less inviting
target.

It would be a mistake to think that gsb4ugc (or tools like it) would protect
everyone all the time. It's never a replacement for vigilance and education on
the user-side, just a useful extra line of defense.

~~~
underwater
It's not really redundant. Legitimate users use redirectors like bit.ly all
the time, so you can't blacklist them. If you're leaving such a big hole in
your system then spammers will work around it in next to no time.

Etsy are big enough that it is worth the spammers' time to do so. Once you
reach a certain size you can't just say "the user should be careful". Scammers
and spammers will hammer at you because they know the numbers make it worth
the effort. Users won't understand what's happening; they will have a bad
experience and they will blame your product.

------
nodata
Privacy implications?

~~~
indec
The privacy risk for GSB in general is that you are sharing URLs with a
(trusted) third party. That's most acute with the REST API, but most
implementations (including gsb4ugc) cache a local copy of the lookup tables
and so don't actually send URLs to Google. There is still a very occasional
need to send a link to Google for validation, but in the server-side case the
only context Google has for the request is the IP address of the server, which
minimizes privacy risks as much as possible.
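
A rough sketch of why the local tables keep most lookups off the wire. As far
as I understand the protocol, the tables hold short prefixes of SHA-256 hashes
rather than URLs, and only a prefix hit leads to a follow-up request. The
function names are hypothetical, and the URL canonicalization and host/path
expansion that a real client performs are left out:

```python
import hashlib
from typing import Set

PREFIX_LEN = 4  # bytes; assumed prefix length for this sketch


def prefix_of(expression: str) -> bytes:
    """Return the leading bytes of the SHA-256 hash of a URL expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:PREFIX_LEN]


def needs_remote_check(expression: str, local_prefixes: Set[bytes]) -> bool:
    """True only when the local table cannot rule the expression out."""
    return prefix_of(expression) in local_prefixes


# Hypothetical local table with a single listed expression.
local_prefixes = {prefix_of("evil.example/")}
```

A miss in `local_prefixes` is answered entirely on the server, so nothing
about that URL leaves the machine; only the rare hits need confirmation.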

