An analysis of the Adobe password dump (pv.tl)
50 points by xlfe on Nov 2, 2013 | hide | past | favorite | 34 comments



What a flawed, sensationalist headline. So, he looked at all people that used aliases for Gmail accounts (added a +something or . to their address) to register multiple times.

By comparing the hashes for these Gmail users, he then determined that 51% used the same password for duplicate Adobe accounts. He asserts they were "highly technically savvy users" and thus 51% is the lower bound for the population of all Adobe users, thus 51% of the users used the same password somewhere else. Therefore "64 million" users' accounts are compromised on other sites, because they must be using the same password elsewhere.

OK. Here is the problem. You, a savvy user, have multiple accounts, on the same site, why would you use different passwords for your alias accounts? Same password on same site != Same password on all sites.
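For reference, the alias-grouping step described above can be sketched roughly as follows. This is a minimal sketch, not the article author's actual code: the function names and the `(email, password_blob)` record shape are assumptions made for illustration. It collapses Gmail aliases (a `+tag` suffix or dots in the local part) to one key, then reports the share of multi-account groups whose accounts all carry the same stored password value.

```python
from collections import defaultdict

GMAIL_DOMAINS = ("gmail.com", "googlemail.com")

def normalize_gmail(address):
    """Collapse Gmail aliases: drop any '+tag' and all dots in the local part.
    Returns None for non-Gmail addresses."""
    local, _, domain = address.strip().lower().partition("@")
    if domain not in GMAIL_DOMAINS:
        return None
    local = local.split("+", 1)[0].replace(".", "")
    return local + "@gmail.com"

def same_password_share(records):
    """records: iterable of (email, password_blob) pairs (hypothetical shape).
    Returns the share of alias groups with 2+ accounts in which every
    account has the same password blob."""
    blobs = defaultdict(set)
    counts = defaultdict(int)
    for email, blob in records:
        key = normalize_gmail(email)
        if key is None:
            continue
        blobs[key].add(blob)
        counts[key] += 1
    multi = [key for key, n in counts.items() if n > 1]
    if not multi:
        return 0.0
    same = sum(1 for key in multi if len(blobs[key]) == 1)
    return same / len(multi)
```

Note that this only measures reuse across alias accounts on the same site, which is exactly the limitation raised in this comment.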


I don't use the same password on other sites, but do use the same email (not the one on my HN profile) in a lot of other places.

I'm still not "less angry" at Adobe. Probably irrationally so, since data leaks can happen to anyone, but just the idea that my info is in a downloadable archive somewhere for any Joe Shmuck to pick up is driving me up the wall.


Who knows? All I am doing is working from the data available.

I've tried to be clear about the methodology. My guess is that most users just reuse a small number of passwords. Anecdotal evidence supports this.

If you have a suggestion for how to derive a better _estimate_ from available data I'd be very interested to hear it.


Since your lower bound ignores a common case: I use a formula based on the site to create a [unique]+[common] combined password, your premise that it is a lower bound is invalidated.

A better lower bound would be: find password hashes that occur with high frequency. "password1" and "wordpass" are probably each in the data a few thousand times (or rather their hash is in there a few thousand times).

Then use the logic that if a person is using an extremely common, known insecure password, that they're probably using the same lazy password in a lot of places. Use this as a lower bound, as it is a lot more defensible.
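A rough sketch of that frequency-based bound, assuming the stored password values have already been extracted into a list (the function name and the `min_count` threshold are hypothetical, chosen only for illustration):

```python
from collections import Counter

def common_blob_share(blobs, min_count):
    """Share of accounts whose stored password value occurs at least
    min_count times in the dump. Identical values imply identical
    passwords only if no per-user salt was applied."""
    counts = Counter(blobs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    common = sum(c for c in counts.values() if c >= min_count)
    return common / total
```

The threshold is a judgment call: set it too low and accidental collisions between a handful of users get counted as "extremely common" passwords.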


You're looking under the flashlight.

A better analysis could be done if it were compared with another password leak from another site.

But even if there's no better analysis, the best analysis may still be severely broken.


Thanks for the feedback.

I think it's debatable actually - while intersecting the list with another does give you less bias, it also reduces the sample size immensely (as happened in the 2011 analysis I linked to), so there are trade-offs in both directions.

I'm not claiming that I know exactly how many accounts reuse passwords - I am suggesting that, based on my estimate, it is more than half.

The evidence (that I've linked to) supports this. There are other studies which show up to 60% password reuse: http://www.troyhunt.com/2012/07/what-do-sony-and-yahoo-have-...

If you can produce a better estimate please go ahead.


I don't doubt there is much password reuse. Like the original complaint, I don't think the data point has value because reusing passwords on same site implies nothing on different sites.


Do email providers ever look at these dumps to see how well they're doing compared to the competition?

A very quick sum of the top 100 providers gives me:

24476611 Google accounts

41177410 Microsoft / MSN / Live / Hotmail / Outlook accounts

24319157 Yahoo accounts

Which surprises me. Maybe I missed something, or I'm double counting. I'm a bit tired and I've probably messed it up.
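A tally like this could be reproduced along these lines. This is only a sketch: the domain-to-provider table is an assumption and is far from complete (country-specific domains such as yahoo.co.uk or hotmail.fr are not listed), which is one way such sums end up under- or double-counted.

```python
from collections import Counter

# Assumed, incomplete mapping of domains to providers.
PROVIDERS = {
    "gmail.com": "Google",
    "googlemail.com": "Google",
    "hotmail.com": "Microsoft",
    "live.com": "Microsoft",
    "msn.com": "Microsoft",
    "outlook.com": "Microsoft",
    "yahoo.com": "Yahoo",
}

def provider_totals(emails):
    """Count addresses per provider; unknown domains go under 'other'."""
    totals = Counter()
    for email in emails:
        domain = email.rsplit("@", 1)[-1].strip().lower()
        totals[PROVIDERS.get(domain, "other")] += 1
    return totals
```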


I wonder how far back these accounts actually go. "Adobe accounts" have been a thing for at least 4 years -- the first time I needed one was in 2009. I'm sure they probably extend back further than that.

If this is a database that's been growing over a long enough period, the distribution might be very different compared to a system that's only been around for, say, a year.


Email providers have enough address books on file to know how they're doing already :)


You aren't missing anything. Pretty typical distribution.


People do often use an alternative email for web signups though. For me: Gmail for personal use, Yahoo/Hotmail for unimportant (à la Adobe) sign-ups.


Is it illegal to possess the dump file? I ask this purely from a self-protection standpoint as, over the years, I'm pretty sure I've signed up for more than one account, and would want to check to see what data is registered in my name.


I don't think there is anything copyrighted in that file. And you don't crack/circumvent any protection to download that file. IANAL.


I don't think so. If I find it I'm going to download it just to check for my email address.


Where to find it, then?


In addition, the password length of an Adobe account is restricted to 12 characters:

http://www.jirasekonsecurity.com/2012/12/adobe-limiting-maxi...


That hasn't been my experience. I changed mine today to a 40+ character password just fine.

That said, I was still a victim of the hack. Good thing I had a unique password that I didn't share with any accounts.


I changed mine to a 20 character pw, but couldn't log in with it. It worked when I entered only the first 12 characters.

You could try logging in with the first 12 characters.
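To illustrate why that test is telling, here is a purely hypothetical sketch of a login check that silently truncates to 12 characters. Nothing here is Adobe's actual code; `MAX_LEN`, `store` and `check` are made-up names, and SHA-256 stands in for whatever the real system does with the password.

```python
import hashlib
import hmac

MAX_LEN = 12  # assumed truncation length, taken from the reports in this thread

def store(password):
    """Hypothetical: keep only a digest of the first MAX_LEN characters."""
    return hashlib.sha256(password[:MAX_LEN].encode()).digest()

def check(stored, attempt):
    """Hypothetical: truncate the attempt the same way before comparing."""
    candidate = hashlib.sha256(attempt[:MAX_LEN].encode()).digest()
    return hmac.compare_digest(stored, candidate)
```

Under such a scheme, the full password, its first 12 characters, and the first 12 characters followed by anything else would all be accepted. If the signup form truncates but the login form does not, you would instead see the behaviour described above, where only the 12-character prefix works.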


That is correct, 1Password tells me that for my two accounts created in 2011 and 2012. (At least mine are 12 chars, and I always max them out)


Just reset my passwords to 32 chars, which worked (both setting and logging in). (Now, whether or not all 32 chars are actually being used...)


Evidently the passwords weren't hashed with a unique salt or this analysis wouldn't have been possible.

Edit later: Ugh, now that I read up on this some more I realize they weren't even hashed, but encrypted instead.
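The general point about salts can be shown with a small sketch. It illustrates the principle only and says nothing about how Adobe actually stored the data; the function names and the PBKDF2 iteration count are arbitrary choices for the example.

```python
import hashlib
import os

def unsalted(password):
    """Same password always yields the same stored value."""
    return hashlib.sha256(password.encode()).hexdigest()

def salted(password, salt=None):
    """A fresh random salt per user makes equal passwords look different."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt.hex(), digest.hex()
```

With `unsalted`, two users who picked the same password end up with identical stored values, so duplicates can be counted across the dump. Deterministic encryption under one shared key has the same property. With `salted`, the stored values differ unless the salt is also the same.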


I did a similar analysis a few years back. http://rafekettler.com/2011/06/16/analyzing-the-lulzsec-pass...


Is there anyone out there that takes these reports and the reported files and sends an email out to all the people informing them that their password is potentially compromised?

I haven't yet received an email from Adobe, but I would imagine that my email address is likely among the accounts that were leaked in this file.


you'll need to still download and gzip -d the file... but this will help you search

https://gist.github.com/taf2/7285640


or just use grep...


I tried grep first - at 10-12 MB/s disk I/O it was taking too long... This scans the file in about 10-30 sec depending on whether your email is near the top or bottom... Obviously if the emails were sorted we could do better...
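For anyone who would rather not decompress to disk first, a streaming search is another option. This is a minimal sketch and not the code from the gist above; `find_lines` and `search_gzip` are made-up names, and it assumes nothing about the record layout beyond one record per line.

```python
import gzip

def find_lines(lines, needle):
    """Yield every line (bytes) that contains needle (bytes)."""
    for line in lines:
        if needle in line:
            yield line

def search_gzip(path, needle):
    """Stream a gzip file line by line and collect the matching lines."""
    with gzip.open(path, "rb") as fh:
        return list(find_lines(fh, needle))
```

Usage would look like `search_gzip("users.gz", b"me@example.com")`, where the file name is a placeholder. It is still a linear scan, so it will not beat a search over sorted or indexed data.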


Ironically, if Adobe is truncating their passwords to 12 characters that would be a major positive for me in the case of this leak, as they'd only have half of my password. At least one of my accounts from back when I memorized all of my 20+ character passwords was compromised. I'm sure I still have a few accounts with the same pass that need to be reset just to be safe, but at least there's something to chuckle about...


but what if adobe only checks the first 12 characters entered? if so, then your "whole" password needed to log in was in fact leaked.


When logging in to Adobe, yes, but that's a password reset away. They won't be able to get into any sites which allow more than 12 characters which, while it probably doesn't help me much in practice, may at least serve to frustrate a couple of people who try to mess with my accounts. It's still a major net negative.


So someone decided to host the only copy I can find of users.tar.gz on SourceForge, which has of course taken it down. There’s a torrent, but it seems to be stuck at 75%. Anyone else got something I could use to warn customers in the list to change their password?


Someone's done it already - http://adobe.cynic.al/


I have just seen this dump, 153 million email addresses just waiting to be spammed.


I see three of my email addresses in the list; two are no longer valid. I imagine that, because of the number of invalid email addresses in the list, any spammer attempting to use it would pretty quickly be shut down. But surely someone smarter about email delivery could comment.



