
De-Anonymizing Web Communities with Gravatar - wickedchicken
http://rgov.org/2010/11/27/gravatar/
======
extension
Release a new API that uses symmetric encryption instead of a one-way hash.
Use however many rounds are necessary to prevent brute force attacks. Gravatar
keeps a list of generated keys, but not who generated them. The API accepts a
key ID and encrypted email. That makes the keys convenient to generate and
change. Deprecate the old API and turn it off some day.

I _think_ that works, but I try not to think about crypto at three in the
morning, or at all really.

EDIT: Also, a consumer site could implement this themselves with a simple
proxy server. They could even open up the service to other sites.
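
A rough sketch of the key-ID-plus-encrypted-email idea, under loud assumptions: the `KEYS` registry, the `/avatar/<key_id>/<token>` path format, and both helper names are hypothetical, and AES-256-GCM is swapped in as one concrete choice of symmetric cipher. Nothing here is an actual Gravatar API.

```ruby
require 'openssl'

# Hypothetical key registry. In the proposal this would live on the avatar
# service, which hands out (key_id, key) pairs without recording owners.
KEYS = { 'k1' => OpenSSL::Random.random_bytes(32) }

# Site side: encrypt the normalized email and build a URL path.
# The path format is invented for this sketch.
def avatar_path(key_id, email)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.encrypt
  cipher.key = KEYS.fetch(key_id)
  iv = cipher.random_iv               # 12 random bytes for GCM
  cipher.auth_data = key_id           # bind the token to its key ID
  ciphertext = cipher.update(email.strip.downcase) + cipher.final
  token = (iv + cipher.auth_tag + ciphertext).unpack1('H*')
  "/avatar/#{key_id}/#{token}"
end

# Service side: look up the key by ID and recover the email.
def email_from_path(path)
  _, _, key_id, token = path.split('/')
  raw = [token].pack('H*')
  iv, tag, ciphertext = raw[0, 12], raw[12, 16], raw[28..]
  decipher = OpenSSL::Cipher.new('aes-256-gcm')
  decipher.decrypt
  decipher.key = KEYS.fetch(key_id)
  decipher.iv = iv
  decipher.auth_tag = tag
  decipher.auth_data = key_id
  decipher.update(ciphertext) + decipher.final
end

path = avatar_path('k1', 'Tom@Example.com')
puts email_from_path(path)
```

Because the IV is random, the same email produces a different token on every call. That keeps avatars unlinkable across pages, but it also defeats URL-based caching, so a real design would have to weigh that trade-off.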

~~~
steveklabnik
One of the best aspects of Gravatar is the fact that it takes one line of
code to implement:

    "http://www.gravatar.com/avatar/#{MD5::md5(email.downcase)}"

If it wasn't that easy, I'd probably consider just doing something else. Maybe
just have it use a stronger hashing algo?
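
For Rubies that no longer ship the old `md5` library, a minimal sketch of the same one-liner using the standard `Digest::MD5` might look like this. The helper name `gravatar_url` is made up for illustration; trimming and lowercasing the address before hashing follows Gravatar's documented normalization.

```ruby
require 'digest/md5'

# Build a Gravatar avatar URL from an email address.
# The address is trimmed and lowercased before hashing.
def gravatar_url(email)
  hash = Digest::MD5.hexdigest(email.strip.downcase)
  "http://www.gravatar.com/avatar/#{hash}"
end

puts gravatar_url('  Tom@Example.com ')
```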

~~~
Estragon

      ...have it use a stronger hashing algo?
    

The strength of the hashing algorithm is beside the point for this
vulnerability. The OP was able to resolve the emails because they had a
predictable format, so the candidates could simply be enumerated and hashed.
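
To make that concrete, here is a small sketch of why the hash function barely matters when the inputs are guessable. The usernames, domains, and helper names are all invented for illustration; swapping `Digest::MD5` for a longer hash would leave the loop unchanged.

```ruby
require 'digest/md5'

# Hash an address the way an avatar URL would (trimmed, lowercased).
def md5_email(email)
  Digest::MD5.hexdigest(email.strip.downcase)
end

# Try every username@domain combination and return the first address
# whose hash matches the target, or nil if none does.
def resolve(target_hash, usernames, domains)
  usernames.product(domains)
           .map { |user, domain| "#{user}@#{domain}" }
           .find { |candidate| md5_email(candidate) == target_hash }
end

# Stand-in for a hash observed in a page's avatar URL.
target = md5_email('alice@example.com')
puts resolve(target, %w[alice bob], %w[example.org example.com])
```

The cost is one hash per candidate, so the attack is bounded by how many plausible addresses there are, not by the size of the digest.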

~~~
steveklabnik
When I said 'stronger' what I really meant was 'one with a larger number of
possible hashes', so it'd be harder to resolve them. Not that it'd stop it
totally, because he still knows enough about the keyspace...

Then again, crypto is not my strong point, so I should probably stop talking.

------
il
There was a post here a while back by someone who did the same thing with the
Stack Overflow data dump. I think he was able to recover about 10% of users'
emails from a vastly larger search space.

Moral of the story: simply hashing emails is not enough if you're going to
display them publicly. It's trivial to use a salt and prevent such an attack.

~~~
duskwuff
I believe the mechanics of Gravatar make salting impossible, though. The whole
system works _because_ there's a single hash for any given author email
address.

~~~
teaspoon
Yep. The salt would have to be public, which would defeat the purpose.

However, a user can protect herself by "salting" her own email with an address
tag. For example, provide tom+ES85jFxz@rpi.edu as your address instead of
tom@rpi.edu.
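
A quick sketch of the tagging trick, assuming the mail provider delivers `local+tag@domain` to the untagged mailbox (not all do). `tag_address` is a hypothetical helper.

```ruby
require 'digest/md5'
require 'securerandom'

# Add a tag to the local part: tom@rpi.edu -> tom+<tag>@rpi.edu.
# A random tag is generated when none is given.
def tag_address(email, tag = SecureRandom.alphanumeric(8))
  local, domain = email.split('@', 2)
  "#{local}+#{tag}@#{domain}"
end

plain  = 'tom@rpi.edu'
tagged = tag_address(plain, 'ES85jFxz')

puts tagged
# The tagged address hashes differently, so guessing tom@rpi.edu no
# longer matches the published hash.
puts Digest::MD5.hexdigest(plain) == Digest::MD5.hexdigest(tagged)
```

The protection only holds if the tag is hard to guess; a predictable tag just makes the search space slightly larger.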

~~~
davidcuddeback
A few years ago, I tried using email tags with many online services so that I
could use the tags for email filters. Most form validations rejected my email
tags because they considered "+" to be an invalid character in email
addresses.

~~~
spindritf
Using your own domain with different local names lets you get around that, but
it requires owning a domain.

~~~
dotBen
I've been using this technique for 10+ years. Sadly it doesn't work with
Gravatar, because each tagged address hashes to a different value:

    md5("comment+sitename1@mydomain.com") != md5("comment+sitename2@mydomain.com")

However, it is really interesting to see the amount of spam I get sent to
certain 'snowflake' email addresses I have only ever used to submit a comment
to top, well-known blogs. Shows how much database hacking for email harvesting
goes on.

~~~
tomjen3
I ended up using that very same technique and ran into the exact same problem.
My solution to this has been to sign up for one gravatar, but after that I
simply stopped caring about the service.

If whatever service I am using can't work without gravatar, they will have to
see a generic picture.

------
abraham
A commenting system that still shows user specific avatars next to anonymous
comments? That sounds like a terrible implementation.

------
to
that's funny... i still have an email from probably 2006 telling gravatar that
a single md5 for each user (email address) is just idiotic for many reasons:
possible collisions, anonymity, and so on. they replied within an hour
that i'm wrong, that there is no risk of collision because every email is
unique because it's the primary key (duh?) and they will continue using it.

made me laugh. since then i've ignored the service as they are obviously just a
bunch of script kiddies.

to be honest - the only obstacle for such a service is server redundancy.
nothing special about it. besides that, it's now an obsolete service since every
major site offers oauth.

~~~
studer
Given your statement on MD5 collisions, it's not entirely clear to me who's
the script kiddie here.

~~~
to
how many users do they want to store in their db with a unique md5 string? it's
just a really bad bet. that's a script kiddie for me: assuming that a hash
never collides within a database and then choosing md5. there are better, even
very simple, ways to store hashes without forcing collisions.

~~~
ErrantX
There are an awful lot of MD5 digests (2^128 of them), so even with an insane
number of users it is highly unlikely they would have a collision. Hell, at
work we generate insane numbers of hashes and it took us quite a while, and a
large dataset, to find a collision :-)
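
A back-of-the-envelope check of the "awful lot of digests" point, using the standard birthday-bound approximation. It assumes the hashes behave like uniformly random 128-bit values, which says nothing about deliberately constructed MD5 collisions. `collision_probability` is a made-up helper name.

```ruby
# Birthday-bound approximation: with n values drawn uniformly from a
# space of 2**bits, P(at least one collision) is roughly
# n * (n - 1) / 2**(bits + 1). Only meaningful while the result is small.
def collision_probability(n, bits = 128)
  n.to_f * (n - 1) / 2.0**(bits + 1)
end

[1_000_000, 1_000_000_000, 1_000_000_000_000].each do |users|
  printf("%d users: ~%.3e\n", users, collision_probability(users))
end
```

Even at a billion addresses the estimate is on the order of 10^-21, so accidental collisions are not the practical problem here.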

~~~
to
but why md5 when there are better alternatives? not to mention the issues
the OP raises

~~~
philfreo
because it's very simple for anyone to build an app that uses Gravatar since
md5 is probably the most well-known hashing function.

