0. This is a file of SHA1 hashes of short strings (i.e. passwords).
1. There are 3,521,180 hashes that begin with 00000. I believe that these represent hashes that the hackers have already broken and they have marked them with 00000 to indicate that fact.
Evidence for this is that the SHA1 hash of 'password' does not appear in the list, but the same hash with the first five characters set to 0 is.
5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8 is not present
000001e4c9b93f3f0682250b6cf8331b7ee68fd8 is present
Same story for 'secret':
e5e9fa1ba31ecd1ae84f75caaa474f3a663f05f4 is not present
00000a1ba31ecd1ae84f75caaa474f3a663f05f4 is present
And for 'linkedin':
7728240c80b6bfd450849405e8500d6d207783b6 is not present
0000040c80b6bfd450849405e8500d6d207783b6 is present
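The check is easy to reproduce; a minimal Python sketch:

```python
import hashlib

def leaked_forms(password):
    """Return the full SHA1 of `password` plus the zeroed-out variant
    that marks already-cracked hashes in the leaked file."""
    full = hashlib.sha1(password.encode()).hexdigest()
    marked = "00000" + full[5:]  # first five hex chars replaced with 0
    return full, marked

full, marked = leaked_forms("password")
print(full)    # 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
print(marked)  # 000001e4c9b93f3f0682250b6cf8331b7ee68fd8
```

Checking yourself against the dump is then a matter of grepping for either form.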
2. There are 2,936,840 hashes that do not start with 00000 that can be attacked with JtR.
3. The implication of #1 is that if you are checking for your own password and it's a simple one, you also need to check for the zeroed-out version of the hash.
4. This may well actually be from LinkedIn. Using the partial hashes (above) I find the hashes for passwords linkedin, LinkedIn, L1nked1n, l1nked1n, L1nk3d1n, l1nk3d1n, linkedinsecret, linkedinpassword, ...
5. The file does not contain duplicates. LinkedIn claims a user base of 161m. This file contains 6.4m unique password hashes. That's 25 users per hash. Given the large amount of password reuse and poor password choices, it is not improbable that this is the complete password file. Evidence against that thesis is that the password of one person I've asked is not in the list.
I thought 16k entries might be reasonable but that doesn't even last 3 weeks for me. I think there might have been some issue with slow disk seeks so at some point I restricted it to that many.
I guess it would probably be better to regularly back up the history file to deal with possible accidental truncations and issues when running multiple shells concurrently, but the overall effort to set up such a system would probably outweigh the benefits.
If you want coverage, generate a few hundred thousand SHA1 hashes along with your password.
Actually, running a trickle query of random SHA1 hashes from your box might be a fun exercise, along with a trickle query of random word tuples (bonus points for using Markov chains to generate statistically probable tuples).
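That trickle of decoys could be sketched like this (the word list is a made-up stand-in; a real version would use a proper dictionary and, for bonus points, a Markov model for the tuples):

```python
import os
import random

# Stand-in word list; substitute a real dictionary file in practice.
WORDS = ["correct", "horse", "battery", "staple", "linked", "secret"]

def random_hash():
    """A random 40-char hex string, indistinguishable from a real SHA1."""
    return os.urandom(20).hex()

def random_tuple(n=2):
    """A random word tuple to mix in as a decoy dictionary query."""
    return " ".join(random.choice(WORDS) for _ in range(n))

# Example decoy queries to trickle out alongside any real lookup:
print("sha1", random_hash())
print("sha1", random_tuple())
```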
If you search for 'sha1 foo', that's being sent across the network to DDG's servers. And sure, if you're using SSL then it's not going across in plain text, but it's decrypted and handled on their servers in plain text; it'll probably even end up in logs and/or tracking databases somewhere. You're giving DDG your password.
At worst you're giving the attacker a hash target to try brute-forcing. He still has to brute-force it, and that takes time. Select your plaintext from a large enough keyspace and that time is astronomical.
I'll need to review their policy more closely, but DDG claim fairly minimal tracking. At best someone might be able to correlate a hash lookup with some IP space. That's a long way from handing over passwords. And as I already indicated, you could trickle the queries to make the search space much larger.
No, no, no. You're 100% completely misunderstanding this.
When you search for 'sha1 foo', that query ("sha1 foo") goes up to the server. They know your password is "foo" and that you're attempting to "sha1" it. They don't have a hash, they take that data and perform the hash, then send that down to you.
[xargs] allows you to pipe the output of one command as arguments to another command. By default they will show up at the tail end of the second command's arg list, but if you want to interleave them you can use the -I flag:
I'm surprised at the backlash to what I thought was fun code golfing. No one called me names after I posted a simple Python solution that didn't check the file. For what it's worth I've changed my LI password and I haven't bothered downloading the actual hash file.
node has a neat API for quickly knocking out stuff like this; it's a useful tool for more than just server code. Calling that comment fanboyism is just displaying the opposite of fanboyism, prejudice against hyped-up tools that nevertheless are good tools.
"Which brings us to the most important principle on HN: civility. Since long before the web, the anonymity of online conversation has lured people into being much ruder than they'd dare to be in person. So the principle here is not to say anything you wouldn't say face to face. This doesn't mean you can't disagree. But disagree without calling the other person names. If you're right, your argument will be more convincing without them."
Some people actually do call others names face to face.
Personally, while I don't, I do tend to get a little aggressive, and then I'm often surprised by the backlash, because I get that way when I'm genuinely enjoying the conversation, not when I'm irritated.
The first one ramps up memory use like crazy (which I was trying to avoid) and the second one is much better with memory, but you need to move the sha1_hex into the BEGIN block or you're recomputing the hash for every line parsed, thrashing your CPU. Interesting use of 'shift' though, I didn't know you could modify the file argument to -n like that.
dups is indeed a little helper of mine. Like uniq it only handles sorted input. Update: I see you edited your answer to include uniq -d. I wasn't aware of the option, thanks. Now I can simplify the implementation of dups. But I find the name valuable, and I think it's perverse to say uniq when you mean its opposite.
I'm not sure what you're suggesting. I'm supposed to echo |cut ...? But I have a whole file, not just one line. So I have to cat ... |cut ... -- which is what I did. So what's your point?
I could keep the file first by saying:
$ < combo_not.txt cut -c7-40 |sort |dups |wc -l
To which I reply, "Yuck!"
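For what it's worth, the same count is easy to get without the shell at all. A Python sketch of `< combo_not.txt cut -c7-40 |sort |dups |wc -l` (where `dups`, like `uniq -d`, reports values that occur more than once):

```python
from collections import Counter

def count_dup_suffixes(path):
    """Count hash suffixes (characters 7-40, i.e. the first six
    characters dropped) that appear more than once in the file.
    Equivalent to: cut -c7-40 | sort | uniq -d | wc -l"""
    with open(path) as f:
        counts = Counter(line.rstrip("\n")[6:40] for line in f)
    return sum(1 for c in counts.values() if c > 1)
```

No sorting needed; the Counter does the grouping.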
Perhaps we should stop here. You seem to have made this account just a few hours ago for the express purpose of poking at people's code fragments in this thread. You're making stylistic nitpicks (they don't affect correctness, do they?) and you're making them in a tone that I'm not sure I would take from Randal Schwartz himself (you actually edited http://news.ycombinator.com/item?id=4076556 to be ruder than the original). It's a drag, man.
I disagree with #5. I had a few of my coworkers check their SHA1s against the DB and most of them were not in the dump. I also checked for truncated hashes, none of which were found. I have the feeling this is a subset of the full database.
So I have a funny wild theory...remember back when the Gawker database was compromised? And LinkedIn forced a password reset for users who (according to what I read) used email addresses that matched the Gawker leak?
What if they also (or actually) compared password hashes from their database to the ones released in the Gawker breach? In that case, they likely wouldn't have pulled data straight from the database, but might have dumped passwords from the db to text files and cut those files up to parcel out for processing via Hadoop or something. And one of those text files somehow got loose...or someone MITMed the actual process (I'd vote for a floating text file, just because it's been so long; the Gawker breach was in December 2010).
My fairly complex alphanumeric+symbol password IS in the dump, though not truncated with prepended 0's, and the other one I found, which my coworker admitted was too short and alpha-only, was in the dump with prepended 0's.
This would support the theory that the truncated hashes are the ones that have already been cracked.
My password is in the dump. I use the Forget Passwords Chrome extension, which is based on pwdhash.com, to generate site-specific passwords based on a master password -- i.e. my password is only used on LinkedIn and it's unlikely that I share it with someone else.
I think I have changed to this password during the last year.
Hmm. My truncated password (for my now-deleted account) is not in the list of hashes -- so it's not just a uniq'd full DB. Also, the original forum thread where the file was first posted only managed to break 600,491 passwords before it went offline ... so 3,521,180 broken passwords could mean that the original hacker has had access to some LinkedIn accounts for more than just a few minutes today.
This yielded success on some known passwords and a bunch of obvious passwords. Not mine, but I assume this dump is a list of the passwords they've cracked so far (i.e., even if your password isn't on this list - change it).
A stock JtR 1.7.9-jumbo5, using the default rules, is finding quite a few of the non-zeroed ones pretty quickly. This surprises me; I would have expected them to have run the list through the JtR mill before passing it on to others.
Likewise, my password (MybXy836YCza), which wasn't used anywhere except my LinkedIn account created 29-Jan-2012, and has been stored securely at my end, wasn't on the list (either as a full SHA1 sum, or as part of the SHA1).
As you probably guessed from the fact that I posted my old password, I changed it just in case the list that was shared is only a partial list of what was obtained.
Do you remember when you first used this password at LinkedIn? It could help narrow the dates of the breach. Especially useful would be the presence of a strong password in the list that was subsequently changed. That might help determine its freshness, if the new password isn't present (although this may be an incomplete list from an ongoing breach).
I couldn't find my password on the list, and I've been using the same password for LinkedIn since I registered. I was trying to remember when that was. If someone knows how to find out the last time you changed your password, or when you registered for LinkedIn, please let me know. I'd guess I've used LinkedIn for at least 4 years.
A "member since" date is available on the "Account & Settings" page. Choose "settings" in the drop down that appears when you hover over your (account) name in the upper right corner of any LinkedIn page.
I'm not a math person either, but here's some fodder for someone who is.
Mark Burnett has an extensive password collection (which he acknowledges is skewed, because it's largely based on cracked passwords, he only harvests passwords between 3 and 30 chars, etc.). Here's how some of his stats shake out:
* Although my list contains about 6 million username/password combos, the list only contains about 1,300,000 unique passwords.
* Of those, approximately 300,000 are used by more than one person; about 1,000,000 appear only once (and a good portion of those are obviously generated by a computer).
* The list of the top 20 passwords rarely changes and 1 out of every 50 people uses one of these passwords.
So it's conceivable that 6M unique passwords could cover a very significant portion of a 120M user namespace.
"We were curious what would happen to our share price if our company did something incredibly stupid"
The above comment might seem incredibly harsh, but really, there's no good excuse for a site this prominent to not have a salted, secure password hashing system. Even if they started with an unsalted password system, users can be migrated to the newer more secure system on next login.
The only way I could regain respect for LinkedIn is if we find that these unsalted hashes were from users who never logged in to LinkedIn after the security upgrade. From the replies of other HN users who have found their password hashes in the leaked list, this doesn't seem to be the case though.
I can understand database leaks. Bad things happen. Not being prepared for such an event however is where I draw the line. These leaks impact users far beyond just the site at fault.
It's not enough to say users should use LastPass. They don't, and that's the world we live in, for better or worse.
If computer security doesn't take into account problematic users, then it's flawed computer security.
Surely just hashing username|password would massively reduce the effectiveness of leaks like this? Sure, a hacker would know what the "salt" is, but since it now varies between users, you would expend the same amount of effort breaking one person's login as you previously would have spent breaking everyone's (on average).
(Not recommending it, just wondering if my reasoning is correct.)
I hear this commonly, so it is a good idea to clear it up.
Usernames have lower entropy than a random salt and are predictable in many cases. People re-use usernames and some usernames are common. If your password system became common on the web, or if I knew the workings of your password system (i.e. open source / leaked codebase / Kerckhoffs's principle), I could generate a rainbow table for either common or targeted users. This means I could generate a rainbow table for "Jabbles", gain access to your password and compromise your account before the website is likely even aware of a breach or has time to warn you. Salts only act to slow down, not prevent, compromising leaked password hashes (as you can always brute force which is quite practical with MD5/SHA1). Thus, using a username defeats one of the stated purposes of salting.
It's also said ad nauseam (with good reason) but rolling your own in security is a bad idea, especially when libraries exist that do exactly what you'd intend to do just as easily. Algorithms such as bcrypt and scrypt exist and are well vetted. bcrypt is easy to integrate with many languages and provides a trivial interface and sane defaults for iterations/rounds [brute force] and salts [rainbow table]. bcrypt can also handle increasing the security of your system over time as the metadata is stored as part of the hash.
tl;dr Using a username for salting means a targeted attack against a single or small number of users would be damn near impossible to stop as the second they have the password hashes they also have the passwords.
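To make the precomputation point concrete, here's a minimal sketch (plain SHA1 and a `|` separator are assumptions for brevity; a real system would use bcrypt or similar):

```python
import hashlib
import os

def hash_with_username_salt(username, password):
    # The "salt" is predictable: anyone who knows the scheme and the
    # target's username can build a lookup table before the breach.
    return hashlib.sha1((username + "|" + password).encode()).hexdigest()

def hash_with_random_salt(password, salt=None):
    # The salt is unknowable in advance, so no table can be built ahead
    # of time; cracking can only start after the hashes actually leak.
    salt = salt if salt is not None else os.urandom(16).hex()
    return salt, hashlib.sha1((salt + password).encode()).hexdigest()

# Attacker precomputes a table for the target "Jabbles" offline:
candidates = {hash_with_username_salt("Jabbles", p): p
              for p in ["password", "secret", "linkedin"]}
# After the breach, a stolen hash is looked up instantly:
stolen = hash_with_username_salt("Jabbles", "secret")
print(candidates[stolen])  # secret
```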
Often people say "Don't roll your own security" but the reality is that developers aren't trying to roll their own. They are trying to solve a problem, and if a quick google doesn't turn up a good library then they'll try and figure it out. Googling for password security implementations is likely to be fraught with horrible horrible advice.
I guess what I'm saying is that it's not enough to say don't do it, instead the defaults need to be there (and very visible).
You need more than just bcrypt. You've hinted at other things, but a few random things popping into my mind:
* Preventing password logging (many web frameworks log parameters)
* Secure password recovery
* New alternative attack vectors (eg. Facebook, Twitter auth)
* XSS and CSRF
There are so, so many simple-to-make security errors, and worse, many of them are inter-related, so that forgetting one will make another vulnerable. This is why you need safe defaults and more security education.
True point and this is probably off topic, but out of curiosity, what is the recommended approach for his point about logging messages/requests?
On previous projects, we've gone through all sorts of machinations to detect a password in our SOAP logging. This usually involves XML parsing (slow, ineffective on malformed messages) and Regexes (ineffective on malformed or "unusual" messages).
I can't think of anything better, short of "you can't leak what you don't log" which is nice in theory but not always practical.
Having a password salted with the username fairly easily balloons the complexity of building and searching a rainbow table by a factor of the number of usernames you want it to be useful for. This factor is larger than you'd expect, given the sheer quantity and variety of usernames in various systems.
For a targeted attack it really doesn't matter, as the time complexity to produce the rainbow table is equivalent to that of simply brute forcing the hash; i.e., you can't say "well, assume the rainbow table contains only some small number of usernames"...
It also is entirely unlike the WPA2 rainbow tables, in that you don't have millions of users all sharing the same username (i.e. factory-default SSIDs).
Overall it's more secure than it seems at first glance, but you still have to ask yourself why you'd use that over a random salt.
The targeted attack does matter though, for the reason I pointed out above.
I can produce a rainbow table offline before I compromise the targeted system as I know the username of my target. This is not possible if the salt is random. This means I can crack a targeted user's password hash _instantly_ upon gaining access to the system.
With a random salt, you can only perform the brute force attack on that targeted user _after_ you've gained access to the system and likely alerted them to a compromise.
If the response time of the compromised system and team is a factor, this means using a username as a salt compromises your security greatly.
1) You know the hash function beforehand
2) You know that they are salting in exactly this way
3) You know how they are doing their salting (HMAC vs. plain concatenation, etc.)
4) You have enough time to create this new rainbow table
5) You have only just enough access to the system to dump the hashes (i.e. the easier routes are blocked off from you)
That would in fact, with some probability (based upon the complexity of your rainbow table and the complexity of the user's password), give you the passwords for a particular set of users.
I did say that it was more secure than it seems, not that it was perfectly secure :)
Remember salts don't need to be secret to do their job. The goal is to change the algorithm slightly (by adding additional input) for each user. That means you can't mass-precompute (rainbow tables), and just look up what matches, you have to break each user individually.
Your reasoning about how salts work is correct.
There's also something called a pepper which is another additional bit of input data, that is only stored in the app code (fixed for entire app). So an attacker who only manages to get a database dump would need to guess yet another chunk of data (making it near impossible). So a well-seasoned hash would be SLOW_HASH(pepper+salt+password).
Security is all about layers. Each layer protects a bit more, or prevents things from being easy for the attacker.
Edit: Don't do this yourself. Know it for the theory part - but then just use a well-vetted library to do it.
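For the theory part only (as the parent says, use a vetted library in practice), the seasoning order can be sketched with PBKDF2 from Python's standard library; the pepper value and round count here are made up:

```python
import hashlib
import os

# Hypothetical app-wide pepper: lives in code/config, never in the database.
PEPPER = b"app-wide-secret-kept-out-of-the-db"

def seasoned_hash(password, salt=None, rounds=100_000):
    """SLOW_HASH(pepper + salt + password), with PBKDF2-HMAC-SHA256
    standing in for the slow hash."""
    salt = salt if salt is not None else os.urandom(16)  # per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", PEPPER + password.encode(),
                                 salt, rounds)
    return salt, digest

def verify(password, salt, digest, rounds=100_000):
    return seasoned_hash(password, salt, rounds)[1] == digest
```

An attacker with only the database dump is missing the pepper, so even correct guesses won't verify against the stolen digests.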
Please refer to my comment above. You can precompute a rainbow table if you know the username (trivial) and the method of hashing. Whilst usernames as salts would increase security over no salt, it results in a potential exploit / vulnerability that would not exist if the salt was truly random. Hence, suggesting the use of usernames as salts is not wise.
I read cschneid's comment twice, and nowhere do I see where he or she specifically recommends using the username as a salt; he or she simply recapitulates the logic behind using a unique salt value for each stored hash, and describes using an additional non-unique value which is not stored with the passwords ("pepper"), which is a new and interesting idea, at least to me.
It would make it a lot easier for LinkedIn to identify whose hashes were leaked because with a salt, all passwords would be unique. It would also make rainbow tables useless.
But in this day and age, the bigger problem is how fast you can compute the hashes, salt or no. With GPUs you can calculate a few hundred million (depending on the hashing algorithm) per second, making the algorithm used the real vulnerability.
Best practice involves increasing the calculation time of your algorithm. Theoretically, you could just rehash a few thousand times in a loop, throwing in a salt here and there, but practically, you should just use bcrypt or scrypt.
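To see why raw hash speed, not the salt, is the real vulnerability, compare rough sweep times for an 8-char alphanumeric keyspace; both rates below are assumptions:

```python
# Rough time to sweep the full 8-char alphanumeric keyspace.
KEYSPACE = 62 ** 8            # ~2.2e14 candidates

GPU_SHA1_RATE = 300e6         # assumed: a few hundred million SHA1/s on one GPU
BCRYPT_RATE = 10.0            # assumed: ~10 attempts/s at a sane bcrypt cost

def days_to_sweep(rate):
    """Days to try every candidate at `rate` attempts per second."""
    return KEYSPACE / rate / 86400

print(f"plain SHA1 on one GPU: {days_to_sweep(GPU_SHA1_RATE):,.0f} days")
print(f"bcrypt on one CPU:     {days_to_sweep(BCRYPT_RATE):,.0f} days")
```

Roughly a week of GPU time versus hundreds of millions of CPU-days: the slow hash, not the salt, is what buys you time.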
In a password hashing scheme with a salt, you're supposed to consider everything except the cleartext to be public, for the purposes of analysis. The password should be unrecoverable even if the attacker knows the algorithm and any salts.
It's true that that would be an improvement, however we try to avoid discussing things like that seriously because of the risk that someone new to the game will actually try to do it. The easy answer is to use an out-of-the-box secure password strategy, anything else is adolescent.
Regarding requiring users to log in: wouldn't it be better to run their current hash through another password hashing scheme (bcrypt, scrypt, PBKDF2, etc.)? Then, the next time they log in, verify them by running their password through the old algorithm, and the result through the new one.
That could be a good transition strategy if you're worried about being compromised before all your users have logged in again, but you would still want to move them over to using just the new system when they do. It probably would be fine, but when it comes to crypto you don't take chances when you don't have to.
>> Even if they started with an unsalted password system, users can be migrated to the newer more secure system on next login.
In thinking about this, I wonder if in that scenario you'd even have to wait until the next login. You could just use the weak hash as the input to your salted hash function and keep a flag for whether or not you need to 'pre-hash' the password before applying your v2.0 salted hash. As users log in, you could slowly replace the double-hashed entries with single salted-hash versions and flip the flag.
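A sketch of that scheme, with SHA1 as the legacy hash and PBKDF2 standing in for the v2.0 salted hash (all names and the `legacy` flag are illustrative):

```python
import hashlib
import os

def old_hash(password):
    """The legacy unsalted hash (SHA1, as in the leak)."""
    return hashlib.sha1(password.encode()).hexdigest()

def new_hash(data, salt, rounds=100_000):
    """The v2.0 salted slow hash (PBKDF2 standing in for bcrypt/scrypt)."""
    return hashlib.pbkdf2_hmac("sha256", data.encode(), salt, rounds)

def migrate(stored_sha1):
    """One-off migration: wrap every stored legacy hash immediately,
    without waiting for the user to log in."""
    salt = os.urandom(16)
    return {"salt": salt, "hash": new_hash(stored_sha1, salt), "legacy": True}

def verify(password, record):
    """At login: pre-hash with SHA1 only while the legacy flag is set."""
    data = old_hash(password) if record["legacy"] else password
    return new_hash(data, record["salt"]) == record["hash"]

def upgrade_on_login(password, record):
    """After a successful login, replace the double hash with a direct one."""
    salt = os.urandom(16)
    return {"salt": salt, "hash": new_hash(password, salt), "legacy": False}
```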
What do you recommend users do instead? Unfortunately there will probably always be websites storing passwords in insecure ways. I mean, I'd certainly rather not have to deal with the hassle (however small) of using LastPass, but as you said, that's the world we live in. Hoping for competence from the writers/maintainers of websites is also flawed computer security, is it not?
Hoping for competence is indeed flawed from both sides.
I would hope users use distinct, random passwords for each site they visit and that developers store those passwords in a safe secure way. I also assume both sides won't listen to logic however :)
The reason I'm annoyed with this particularly is that larger sites are more likely targets due simply to their size. Larger sites generally have the developer resources to provide a good solution to the problem from their end but commonly don't.
This makes them look bad and means their users are left in more danger than before. No-one wins.
KeePass works well too - open source, offline solution that has an "Autotype" function.
I actually only run into passwords that are a pain on mobile devices. Now that my Android phone has no keyboard but tons of power, that's becoming more and more significant.
I use keepass too. I keep my database in dropbox and use the android dropbox and keepass clients on my android. Logging into an app or website involves opening dropbox, clicking on the database, entering my password, choosing the site, and clicking on "copy password to clipboard." It's a few extra steps, but it's not that much of a hassle.
I find this easier than opening KeePass and selecting the database from Dropbox, for some reason that might be as simple as Dropbox having an easier-to-spot icon.
You can also use the favorite feature on Dropbox to keep a fresh copy of the database on your phone and have KeePassDroid remember that location. Then your flow is 1) open KeePassDroid 2) enter password 3) select site 4) copy/paste
One more KeePass user here (actually KeePassX). But I'm using it only for passwords that aren't my own: ones provided by others, and so on.
For my personal ones I keep a few algorithms in my head. I use the resource type (website/some server/device) and name (e.g. domain/model) as variables, and after a few steps in my head I always have a different password for each kind of service.
I use open source tools such as "pwgen", "emacs" and "gpg". Open up the encrypted file in the editor, type your pass phrase if you haven't this session, cut and paste, close file. The built-in keyboard navigability makes this faster than everything but the in-browser form filling.
Nothing, really. However, I trust the LastPass guys to keep their shit secure as much as I trust myself to keep my own system secure.
After all, if my own system is compromised, I just get a lot of hassle. If LastPass ever gets hacked and leaks their passwords, they lose their business overnight. That's pretty good motivation for them to keep on top of their stuff.
I used to use 1Passwd, which stored the passwords in a local file, and that could be said to be marginally more secure, except that it generally uses something like iCloud or Dropbox to sync the passwords, so there's still a single point of failure... The main reason I moved away from 1Password was that they gave me a shitty response when I asked them if they were going to support Chrome. I decided at that point that I didn't want to give them my money anymore, and so I didn't upgrade to 1Password 3.
The big difference between "hosted service" and "encrypted file in the cloud" is that the hosted service has, by definition, to store the key next to the lock to be practical.
The key for your encrypted file stays in your head (and/or in your wallet), so even in a full-on total breach of Dropbox/iCloud your key is safe, and 8 million rounds of 256-bit AES and a good password (my current KeePass settings) is still unbreakable. [1]
1: Unless (perhaps) you have the attention of certain governments. And they always have the option of using a $5 wrench on you, anyway.
As far as I know, LastPass does not "store the key next to the lock." The browser extension encrypts/decrypts locally. If you use your password file through the web site you're still downloading your encrypted DB from them and encrypting/decrypting locally (whether with the extension, or I believe they also have a pure JS implementation).
Or so they say. I've never MITMed their SSL, and their software is not open source AFAIK. This is not to say someone couldn't e.g. distribute a trojaned version of their browser extensions. If you poke around, the developer(s) have at least revealed the encryption method for your DB so you can verify how it is encrypted yourself, which is a good sign if nothing else.
I use 1Password, rather than LastPass. On that system, your password file is stored locally by default, so there isn't a centralized password store to attack. If you do sync passwords between machines, you keep an encrypted password file in your Dropbox account.
LastPass encrypts your passwords using your master password as (at least part of) the key. This means that they do decryption of passwords client-side as well. The entire password file is not stored locally but they had an intrusion of some sort a number of months back which demonstrated that they have a pretty good system set up along with quite a bit of monitoring. Truecrypt in dropbox is obviously a good choice if you're super paranoid but after seeing LastPass respond to security really well and it having an overall pretty simple UX, I don't have any reason to not recommend it.
I use KeePass right now synced with Dropbox - what keeps me up at night is the fact that if the bad guys got my password file today, there could turn out to be a vulnerability in it discovered years from now that could allow them to get my password.
You're free to hit "delete" on linkedin, but there's a very high likelihood that it will only mean "hide my profile". Anyone who got your user/pass would probably be able to reinstantiate your account and do anything to it they wanted.
I took the step of markedly decreasing the information on my current legit profile. It includes my name and general title, but no job history. Public disclosures of connections, etc., are highly limited.
Having a fictional LinkedIn account can be amusing.
I'm worried that in a few years LastPass could become a target; instead of someone having a password that 'could' be shared among your multiple accounts, you have now handed over the complete keys to the city by listing all of your logons, great and small, in a central repository.
This central repository then becomes a very appealing target.
I say this as a LastPass user, as I think it is the best of the current offerings, but I'm uncertain how to shield this huge central list. I wish it had multiple logon passwords so that you could at least segment the risk and reduce the time the high-value password is in use to when you really need it.
It saddens me that every, single, time this topic comes up, HackerNews, of all places, displays an immense lack of knowledge of current password storage applications, how they work and what value they bring.
I think it's really humorous that people feel safe putting an encrypted file in something like Dropbox, but don't trust LastPass (who are doing the exact same thing, everything is local, client side encryption). Especially when you're missing out on all of the benefits of browser integration.
Please, take a whole 3 minutes and do a tiny bit of research. Your future self will thank you when people like swombat and myself get to laugh at LinkedIn, change our passwords and never think about it again.
I think the difference you're missing is that LastPass offers the OnlineVault option.
I much prefer the security of being in control of my file, and having its online option controlled by someone else (Dropbox); and logging into Dropbox to then see my passwords 'online' on the go.
If Lastpass.com is compromised, the attacker can MitM compromise my credentials.
If 1Password.com is compromised, that is not the case. (Yes, if Dropbox is compromised, they could capture my dropbox credentials, but it would be more difficult for them to then capture my 1password credentials)
>I much prefer the security of being in control of my file, and having its online option controlled by someone else (Dropbox); and logging into Dropbox to then see my passwords 'online' on the go.
You can't even do that. You have to install a local client. Download the file, open it in your new client, edit it, manually reupload it. If you don't want to use the on-web LastPass vault, then don't, but it's still doing local decryption, and you can still use the signed Chrome extensions to carry out ops if you don't trust LastPass.com proper.
>If Lastpass.com is compromised, the attacker can MitM compromise my credentials.
Which part of "local, client-side encryption" is confusing?
edit: 1PassAnywhere is the exact same thing as what LastPass is doing with its LastPass.com-served Vault.
edit2: There's even multifactor auth available for it and the Online Vault feature.
I apologise for my immense lack of knowledge of current password storage applications (I'm not a programmer and come here for the other stuff), but what is the benefit of these services (LastPass etc.)? This is a genuine question.
It seems to me that instead of having several passwords in my head (I can remember random long strings of characters pretty well, and have a hierarchy of randomness/length depending on what I care about), I only have to remember one. But if that one's compromised, aren't all the rest then available?
Reminds me of the bit in The Hitchhiker's Guide to the Galaxy (Life, the Universe and Everything, I think) where passwords and biometrics etc. had become really difficult and secure, so a datacube thing was created to store them all. Which was then found by a character before hilarity ensued.
If an attacker compromises all of the following:
1. Your physical machine, or the LastPass/Dropbox server.
2. Your master password
3. (optionally) a second-factor auth source
Then yes, they have access to all your passwords. But this is vastly superior to having one password that, once compromised, grants access to all of your accounts, right?
I mean, the most secure way imaginable would be perfect biometric signatures, or humans smart enough that they could perform asymmetric encryption in their heads to sign challenges in a verifiable manner. Outside of that, this is decentish.
You could use a text file in a Truecrypt volume with keys that are stored on separate jumpdrives (but what if someone compromises a machine that you plug those drives into), etc, etc.
Please stop stirring up drama about this issue. While you are technically correct (PBKDF2-SHA1 is faster than and thus inferior to bcrypt), it's irrelevant: all three of [scrypt, bcrypt, PBKDF2] are just fine, and you can safely pick one at random.
I'm not a cryptographer by any means, so please correct me if I'm wrong:
If you use any reasonable cost for bcrypt, you're talking hundreds of milliseconds per attempt on a modern CPU. For each 6-character password (since you can't generate a rainbow table) at 100ms per pop, you're talking about something on the order of 2+ years per password divided by the number of CPUs. With something like 900 CPUs running continuously, you could expect to recover one 6-char every day if the passwords were randomly distributed in the 6-char alphanumeric space. So, pretty feasible, assuming a 100ms cost. Short passwords do hurt you; I agree.
Now for 8-char alphanumeric passwords, you'd have to run ~1 million CPUs continuously to expect to recover one per day at a 100ms-per-pop cost. This is more of a stretch, assuming you're trying to do this with, e.g., botnets. It seems that someone asking for help cracking a password list on a forum would probably not be able to assemble this much computing power.
Or 1 billion CPUs continuously to expect to recover one 10-char alphanumeric password per day.
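The back-of-envelope figures above can be sanity-checked with a few lines of arithmetic. This sketch assumes (my assumptions, not necessarily the commenter's) the full 62-character alphanumeric alphabet and a flat 100 ms bcrypt cost per guess, so the numbers come out larger than the comment's, which imply a smaller effective character set, but the shape of the argument is the same:

```python
# Rough cost model for offline bcrypt cracking. Assumptions (mine):
# a full 62-character alphanumeric alphabet and 100 ms per guess per CPU.

COST_PER_GUESS_S = 0.1      # 100 ms per bcrypt attempt
SECONDS_PER_DAY = 86_400

def cpu_days_to_exhaust(alphabet_size: int, length: int) -> float:
    """CPU-days needed to try every password of the given length."""
    keyspace = alphabet_size ** length
    return keyspace * COST_PER_GUESS_S / SECONDS_PER_DAY

for n in (6, 8, 10):
    print(f"{n} chars: {cpu_days_to_exhaust(62, n):,.0f} CPU-days")
```

Whatever the exact alphabet, each extra character multiplies the cost by the alphabet size, which is why the 8- and 10-character estimates balloon so quickly.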
Of course, the assumption of random alphanumerics is wrong, both because many people will use common passwords and because others will use non-alphanumeric character substitution.
At any rate, it seems to me that leaking non-salted SHA1 hashes is virtually the worst case disaster scenario, short of plaintext passwords.
The time bcrypt takes is configurable, so in the future you can adjust the amount of work per password -- this is literally a one-character change in your code -- and be alright again. Ditto for the rest of the decent password hashing schemes.
I think you are propagating the myth that a scheme can be secure forever.
It's ok if WEP is breakable with cloud computing, because the whole point was to secure it for the next X years so that it takes more than Y dollars to break it. You only need to protect million-dollar data enough that it costs 10 million dollars to get it.
If the data is valuable enough and protected heavily enough with crypto, the cheapest way to get it is through a meatspace attack (break-in, abduction, etc).
> WEP was considered "good enough"
Not by security professionals once they saw the effective size of the key. It's the downgrading of what looked like a 64bit key into a 48bit key that was the biggest problem.
These hashes were posted on a forum as a plea for help: the guy did not have enough computational power to crack them all on his own. Had they been salted bcrypt hashes, it might have actually discouraged him to the point of not even trying.
So yeah, the weakest passwords will always fall, but good solutions will go to great lengths to protect even the most clueless of users.
I wonder, why do people saying "just use bcrypt" never, ever bother to elaborate on what benefits it has, and which of them are relevant to the subject of the conversation? Believing in some function without understanding implications of its use does very little for real security.
Bcrypt does not require your understanding. The most important thing is that you use a strong password hashing method -- of which bcrypt is the best-known, and an excellent choice. For a basic level of understanding, here's a slightly exasperated blog post that a lot of people link to:
It's not an in-depth answer. It does not say, for example, why bcrypt is more secure than nested SHA1. (I believe it has to do with the possibility to efficiently implement SHA algorithms in GPUs.)
People are using unsalted SHA1 because someone told them in the past "just use SHA1". Now someone else tells them "just use bcrypt". Without understanding why, it's nearly impossible to decide which security policy is sensible. There are many different types of advice competing for attention, and not all of them are good.
Somebody once said fire was composed of phlogistons. Later, different people said that fire was instead a process of decomposing fuel molecules and a release of visible light due to the energy of the chemical chain reactions taking place inside the flame.
The guy who said "phlogistons" was wrong. So was "just use SHA1" guy.
I wonder why people who make this complaint never ever bother to google: "why use bcrypt". It's like they somehow forget they have the best magical oracle to answer questions at their fingertips, which can answer the question better than most people who understand bcrypt could.
stef25, this is known as key stretching, as others have already explained elsewhere in this thread. Essentially the idea is to make computing the final hash of the password slower by iterating the hash function many times.
This additional slowdown is unlikely to be noticed by a user during an interactive login (hashing the password may take 1ms instead of 1us -- an imperceptible difference to a human) but it dramatically slows down the speed at which an attack can compute hashes to try and recover the password for a leaked hash. It also increases the amount of storage space required for (a naive implementation of) a rainbow table since the attacker would need to store the output for 1, 2, ..., n iterations of the hash function.
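A minimal sketch of the key-stretching idea above, using Python's standard library. The iteration count of 100,000 is an arbitrary illustrative value, not a recommendation:

```python
import hashlib

# Key stretching: make each password guess cost many hash evaluations.
# PBKDF2 (in the standard library) is the textbook construction; the
# iteration count (100_000 here, purely illustrative) is the cost knob.

def stretched(password: bytes, salt: bytes, iterations: int = 100_000) -> str:
    return hashlib.pbkdf2_hmac("sha1", password, salt, iterations).hex()

# A naive version of "iterate the hash function many times":
def naive_stretch(password: bytes, salt: bytes, iterations: int) -> str:
    digest = salt + password
    for _ in range(iterations):
        digest = hashlib.sha1(digest).digest()
    return digest.hex()
```

A legitimate login pays the iteration cost once per attempt; an attacker pays it once per guess, billions of times over.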
I'm not familiar with iterations, anybody care to clue me in? I would have thought salted sha-1 would be decent for password hashing, though not the most solid possible, but at least not laughable. Is that not the case?
Keep in mind that whoever leaked the hashes is probably keeping the usernames / emails for themselves. The forum in question doesn't allow posting of user-identifiable information according to the forum guidelines.
The leaked hashes seems to be SHA-1. I've also confirmed that the hash of my own (semi-complex) LinkedIn password is in the list.
Incidentally, this is the same password as I had for HN, which I've now changed (phew! THAT would have been bad! :-)
It would still take a moderate amount of time for a single password if it's long and complex -- you're essentially generating the rainbow table. You might as well just download a SHA1 rainbow table and perform an O(1) lookup. You could reverse all the 6.5M password hashes in mere seconds.
Actually, for a large enough list of unsalted password hashes, bruteforcing is faster than rainbow tables:
- a rainbow table may require a constant amount of time to reverse 1 hash, but it has to be repeated N times for N passwords.
- when bruteforcing, a password candidate can be checked against N hashes in a constant amount of time (look up the candidate hash in a hash table)
For example if it takes 10 minutes to look up a hash in a very large rainbow table (such as the A5/1 GSM tables published a few years ago), it would take 123 years to attempt to reverse these 6.5M hashes. On the other hand, millions of the leaked SHA1 hashes can be cracked in mere hours on a GPU with oclhashcat which tests billions of candidate hashes per second.
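The asymmetry described above is easy to see in code: each candidate is hashed once and checked against the entire dump with a single set lookup. A toy sketch, with passwords and hashes invented purely for illustration:

```python
import hashlib

# One guess, N hashes: hash the candidate once, then a single O(1) set
# lookup covers the entire dump. Toy data, invented for illustration.

leaked = {hashlib.sha1(p.encode()).hexdigest()
          for p in ("secret", "linkedin", "abc123")}   # stand-in for 6.5M hashes

cracked = {}
for candidate in ("password", "secret", "linkedin", "qwerty"):
    h = hashlib.sha1(candidate.encode()).hexdigest()
    if h in leaked:                 # one lookup per candidate, not per hash
        cracked[h] = candidate

print(cracked)
```

The cost is per candidate, not per (candidate x hash) pair, which is exactly why a 6.5M-hash dump is barely harder to attack than a single hash.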
True, for extremely large rainbow tables. SHA1 tables are around 20-60GB depending on how large your base character set is. If you shoved all this data into a giant database, query speed is still under a few milliseconds. In general, rainbow tables can be sharded fairly easily, so if your data set is a few hundred terabytes, just split it across a few machines and you'll retain the millisecond query times. Storing and querying easily partitioned data will usually be faster than a brute force calculation.
Calculating it is like saying you want to find the fibonacci number for any given N, and you have a really fast processor to calculate it to that N, but if you just persisted pre-calculated values up to C, you'd only need to calculate N-C hashes. So even if you are bruteforcing the password, it is still faster to have rainbow tables up to a certain length.
What I say is true for any size of rainbow table. It seems you forget that RT lookups require CPU resources in addition to mere I/O resources. There is always a number of hashes beyond which brute forcing them is faster than RTs. Sometimes this number is very high (billions of hashes), sometimes it is lower (thousands of hashes). It depends on many factors: RT chain length, speed of the H() and R() functions, speed of the brute forcing implementation, etc.
To take your example of a small SHA1 rainbow table of 20GB, assuming it has a chain length of 40k, looking up a hash in it will require on average 200M calls to the SHA1 compression function (assuming a successful lookup). A modern CPU core can do about 5M calls per second. Therefore looking up one hash will take at least 40 sec, and looking up these 6.5M LinkedIn hashes would take 8.2 years! (This is just counting CPU time, I assume the RT is loaded in RAM for a negligible I/O access time to its data.) A RT of this size would cover a password space of about 2^44. For comparison a decent GPU can brute force this many hashes concurrently at a speed of roughly 500M per second (see oclhashcat perf numbers on an HD 7970). Covering the same password space would take only 9.8 hours. Compare 8.2 years vs. 9.8 hours: obviously the LinkedIn hashes that have been cracked so far have been brute forced, not looked up in RTs!
And even if you leveraged GPUs to perform RT lookups, they would speed up the computations by roughly a factor 100x, reducing the 8.2 years down to 30 days, still unable to match the short 9.8-hour brute forcing session. (My friend Bitweasil is doing research on GPU-accelerated rainbow tables, see cryptohaze.com)
As a more general question: why is it not an industry standard to salt with the username/email in addition to the random key? (i.e. Sha1($salt + $email + $password)). Even if the random salt were excluded, I would think that this is much more secure. Existing rainbow tables would not be anywhere near as helpful, and attempts to generate a rainbow table for a specific salted database would be ineffective because the salt changes on a per-user basis.
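The construction proposed above can be sketched as follows. The function names are illustrative, and, as others note in this thread, a deliberately slow hash like bcrypt would still be preferable to SHA-1 underneath:

```python
import hashlib
import os

# Sketch of hash(salt + email + password). Names are illustrative.
# SHA-1 is kept only to match the discussion; in practice you'd want a
# slow password hash (bcrypt) instead.

def hash_password(email: str, password: str, salt: bytes = None):
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.sha1(salt + email.encode() + password.encode()).hexdigest()
    return salt, digest           # store both alongside the user record

def verify(email: str, password: str, salt: bytes, stored: str) -> bool:
    return hash_password(email, password, salt)[1] == stored
```

Even with the random salt stripped, the email keeps each user's input unique, so one precomputed table can't cover the whole database; the trade-off is that the password hash must be recomputed whenever the email changes.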
Then the password has to be updated whenever your email changes. I believe Amazon does it like that, literally "forking" whenever you change your password; at one point it was possible to simply log on with the old password and live in an "alternate reality" where all changes you'd made after changing the password had not been applied. Don't know if it's still the case today.
Why would you use the email? Usually when passwords/usernames are stolen, the email is there too. For my site I have a unique 128-bit token for every user. I also have a 128-bit site_key (which is in the application, not the db) and mix those with the password and then hash.
To get a sense of it, I downloaded it from a link here. Below is the structure of the first few lines. Caveat: it's garbage/useless data below -- I intentionally changed around the actual numbers to give a sense of the structure, only:
The pattern 000000a9 is just an artifact of the presentation - I counted the occurrences of different bytes in that position (I was also misled by the apparent pattern, where many lines in a row would have the same 4th byte), and each possible value is present more or less equally often.
It seems like it's just sha1.
EDIT: however, 3.5 million hashes start with 5 zeroes, which is way too many for just coincidence. Possibly they used multiple hash functions?
MD5 isn't the issue - it's the lack of salting. Without a salt, almost any hash can be cracked with a rainbow table. With a salt, you'd need to know the salt for each hash, and then generate a new rainbow table, in order to recover the original password.
This isn't really the issue. The real issue is that MD5 (though these hashes are SHA1, which has the same problem) is too easily computed; it is practically brute-forceable. I don't need a rainbow table to compute hashes when I can slam out millions in short order using a GPU. You have a good point about needing to know the salt, but getting the salt is generally easy because it's usually stored in the same place as the hashes (and this practice is fine, because hiding the salts doesn't improve security significantly on its own).
We're speaking about a very specific attack here: bruteforce. And I'm speaking about a very specific type of "salt" (which could probably be called something else, since it's not the same as normal unique-per-password salt): large, database-wide string of random bytes.
If every password is padded with such a string before hashing, computing the hash would be slower. Obviously, it would be slower because you would have to process more data. An interesting question is whether this would also make it less parallelizable by virtue of having more information than would fit into GPU cache.
None of this makes much sense to me, sorry. Brute-force password cracking has worked on salted passwords since Alec Muffett released Crack in the early '90s. The amount of extra computational power required to hash a password and a salt is negligible.
The only thing "salts" do is prevent rainbow table precomputation, but it's just a quirk of the late '90s and early '00s that "rainbow tables" ever became a mainstream attack method: one bad Microsoft password hash and a series of bad web applications. Long before the MD4 LANMAN hash was ever released, people were breaking salted Unix passwords with off-the-shelf tools, on much, much slower computers than we have now.
Computing a hash on 1MB of data is slower than computing a hash of 6-8 bytes of data. Brute-force attacks are based on trying different passwords and seeing that after being salted they generate the same hash as in the database. Therefore, adding a large string to the password before hashing would force the attacker to hash that string. The question is, can this be pre-computed once or efficiently parallelized?
First, I do not advocate anything here. I asked a question.
Second, working with a large string of bits is the same as recursive hashing only if you can pre-compute some small intermediate state of the hash function for that string independently from the password you're trying to guess. If you can't, you would have to work with the entire string for every new password tried.
128 bytes is not "large". I was thinking more along the lines of a megabyte or more. There is no question that it will slow down hash computations, because you would need to process more data. The question is, can you efficiently parallelize this on commodity hardware (GPUs)?
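On the "can it be pre-computed once" question raised above: for a Merkle-Damgard hash like SHA-1, yes, when the large string is a prefix. Python's hashlib demonstrates this directly via hash-object copying, so a prefix pepper adds essentially nothing to the attacker's per-guess cost (a suffix placement would behave differently):

```python
import hashlib
import os

# For a Merkle-Damgard hash like SHA-1, a large *prefix* can be hashed
# once and its internal state forked per guess, so a big database-wide
# string adds almost nothing to the attacker's per-guess cost.

pepper = os.urandom(1024 * 1024)    # 1 MB site-wide string

base = hashlib.sha1(pepper)         # pay the 1 MB hashing cost exactly once

def guess(password: bytes) -> str:
    h = base.copy()                 # cheap: duplicates a tiny internal state
    h.update(password)
    return h.hexdigest()

# Identical to hashing the whole thing from scratch:
assert guess(b"hunter2") == hashlib.sha1(pepper + b"hunter2").hexdigest()
```

This is why dedicated password hashes get their slowdown from iteration counts or memory-hardness rather than from feeding the hash more input data.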
To be clear, MD5 (or SHA1 as these apparently are) is a problem. Passwords should be stored using a cryptographic hash function that is designed to hash passwords (read: be slow), not a generic cryptographic hash function (which are designed to be fast). This is exactly the problem that bcrypt was created to solve (among others).
Still, it doesn't matter. As long as one can generate a rainbow table for the hash function, then password lookups will be an O(1) operation. The rainbow table for MD5 is moderately small, SHA1's is bigger, and I'm sure SHA2's is bigger still.
That's easy to do if you have the email addresses, but impossible if you only have the SHA-1 hash, as in this case (unless you're also using unsalted SHA-1 hashes, which is a much bigger issue by itself).
Easy, just convert all the hashes into passwords using a rainbow table. Should only take a few seconds to convert all 6.5M passwords -- an O(n) operation here. Then run all the passwords through each user's password algorithm; this is an O(n^2) operation. Essentially you're making 6.5M password attempts for each of your users. It could be slightly faster because I'm sure there are quite a few duplicates in 6.5M passwords.
A cross-reference is only feasible in very bad situations:
- no-salt or same-salt and same hashing
- trivial/common passwords (password1 etc)
- password(hashed/unhashed) and email are paired.
A cross-reference could be accomplished for all known cracked LinkedIn passwords, but this would be no different than running a dictionary attack of known passwords against your own users... This seems very bad. Enforcing strong but sane password strength rules should mitigate this need.
Cross reference only has value if both the hash and email pairs are leaked.
The bitcoin leak fell into one of these very bad situations:
- [<email>, <hash>] were leaked together
- poor hashing (just sha1, no salt if memory serves)
- unfortunate number of people reuse passwords
LinkedIn passwords are not salted. You can only make comparisons if your database also contains unsalted passwords. And if both databases used salted passwords, then you still can't compare unless you both happened to use the same salts.
you'd compare the hashes in your database with those from the file. The users with a hash contained in the file would be notified.
Because the passwords aren't salted (stupid), you might get multiple hits for the same hash (for example, for the good old "1234" password), meaning you might end up contacting more users than actually affected. Better safe than sorry.
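A sketch of the cross-check being discussed, assuming (hypothetically) that your own database also stores unsalted SHA-1 hashes; the default file name follows the combo_not.txt mentioned elsewhere in the thread, and all other names are illustrative:

```python
import hashlib

# Hypothetical sketch: find which of your users to notify, assuming your
# own DB (regrettably) stores unsalted SHA-1 hashes too. All names here
# are illustrative.

def users_to_notify(user_hashes: dict, leaked_file: str = "combo_not.txt"):
    with open(leaked_file) as f:
        leaked = {line.strip().lower() for line in f}
    # Unsalted hashing means several users can share one hash;
    # this collects every one of them.
    return [user for user, h in user_hashes.items() if h.lower() in leaked]
```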
I agree, but think about the backlash this would create amongst the userbase. The majority of users will probably never even realize / read that their passwords have been stolen, and thus LinkedIn probably does best in keeping a low profile about this (and starting to use better hashing from now on). This is obviously not in the interest of the users, but it is in the interest of LinkedIn.
Interesting, I tried this with a bunch of different passwords (though using php's sha1 function, which obviously gives the same output as ruby's), and found no matches. You're using the "combo_not.txt" file from the zip file in the ggp, right?
My password is missing too (if I've done the hash generation right, as illustrated above). It's strange that only hashes starting with "000000a9" are present; someone said here that it's just presentation, but my hashed password is 40 chars long like those leaked, including the 000000a9.
I was talking about hashes starting with 0000 (I just looked at the beginning and the end of the file). jgrahamc's post is useful: if I don't consider this 0000 prefix (which could be a sign of "ok, we've cracked it"), I can find my hash (the password was not very difficult)...
Whatever manager it was that tasked some junior programmer (particularly one that didn't know that unsalted SHA1 is a terrible idea) with implementing the password system at LinkedIn needs to be fired. Making the programming mistake means that you don't know much about web security, and while not a great thing, that's forgivable; putting someone that's utterly unqualified for code with security implications on such an important task is not. Nor is letting the code get deployed without having someone that knows what to look for review it. Nor is letting such a bad decision remain live for...what is it now, almost 10 years?
But let's not stop there. There are probably a dozen other people at the company whose job it is to avoid blunders like this, all the way up to the top technical staff. After all, LinkedIn is not, and has not been for some time now, some tiny underfunded startup. It's a goddamn public company, and even before that it was a super-team Silicon Valley darling that was getting money thrown at it since even before tech became cool to invest in again, and it's been valued at over a billion dollars for almost five years now. There is absolutely no excuse for this, they should have been doing regular security audits for years, and no audit worth its salt would miss something this simple. I absolutely refuse to believe that this problem was unknown, that nobody ever commented or filed a bug report about this code - no, this was deprioritized, because it wasn't considered a high enough value problem. And now it's bitten them in the ass and become a problem, probably because some other security vulnerability was similarly deprioritized instead of fixed.
I expect this from some shady Bitcoin market that a high school kid runs off of a server in his bedroom. I do not expect this type of amateurism from a 10 billion dollar company with hundreds of engineers, many of whom have specifically looked over that code, some of whom have probably complained about it, and all of whom should know better than to let it fester...
"I expect this from some shady Bitcoin market that a high school kid runs off of a server in his bedroom. I do not expect this type of amateurism from a 10 billion dollar company with hundreds of engineers.."
Think you might be expecting too much from large companies =/
I wonder, what if this list wasn't leaked from LinkedIn databases, but rather from some third-party service using the "enter your password" anti-pattern? A flaky service like that would likely not be very good at safely storing passwords.
Unfortunately, LinkedIn keeping mum on the subject makes it easy to speculate that it was actually coming from them. Otherwise it'd be easy to deny (and even spin: "How dare you! We never store unsalted hashes, we follow state-of-the-art practices here!!"). Also, their security track record is... embarrassing as it is.
I wonder how many LinkedIn users use the same passwords for all their accounts. The article talks about identity theft and "confidential contacts" but I think the real danger is that people tend to use the same password everywhere. It's their other accounts that might have real value.
EDIT - As I think about it, e-mail accounts would be especially valuable as most of your other sites could be compromised using the "recover my password via e-mail" feature if the hacker could read the resulting mail.
Me. Admittedly, it's stupid as hell, but has generally been too much of a pain to do anything else (for things outside of banking, email). I've started to get serious about KeePass lately, but I bet a significant percentage of users take the lazy approach.
I've developed a system (kept only in my head) where every password I use is based off on the name of the service. This means that with just one of my passwords, you're most likely not getting anywhere. With two, you have a bigger chance of figuring out the differences and thus the system, but it works fine for me at the moment.
I generally use the same password for what I feel are non-critical sites like LinkedIn, twitter and Facebook. Another password for testing new services/apps etc. As a rule any site that may contain my credit card data or sensitive information I use a separate password. I feel this is the best compromise to having complex passwords for each account.
I used this in the past as well. But then I started thinking about what non-critical means. As an "internet professional", even my Facebook account being compromised would have a negative impact on my image; on LinkedIn doubly so, due to its professional character. So I basically decided that I'm not going to distinguish at all (slippery slope) and just have randomly generated passwords for all sites (not for my Mac though; too much hassle, and the attack vectors are different).
Safe >> Sorry
EDIT: Just checked, and my randomly generated password is in the leaked list of hashed passwords. I'm not using that same password anywhere else, so the source MUST be LinkedIn through whatever means (or it's some Mac/PC based attack vector, and these folks only leaked LinkedIn accounts, which sounds very implausible).
What puzzles me, though, is: how come 6.5 million?
LinkedIn has what, 150M users?
Did they not post the entire load (and are in fact sitting on _all_ the hashes?)
Is the dump an old backup or breach from when they had fewer accounts?
Is it just one DB partition / file that's been lost, an archive?
Given that these hashes are not salted, running a 'uniq' on the list of all users' password hashes would probably already cut it by half, if not more. Then you eliminate all the easy ones from wordlists, and post the remains on the internet for people with excess computing power to bruteforce.
My old password was in the password file, and it was flagged as cracked.
If you're a Windows user and you want to check if your password is in the file:
(1) download the passwords file from http://www.mediafire.com/?n307hutksjstow3
(2) the download is a RAR file, so you'll need to have WinRAR installed to extract it.
(3) to get the sha1 version of your password, go to duckduckgo.com and type:
(4) copy the result, except for the first 6 or so characters
(5) open a DOS command prompt (WindowsKey+R and type CMD)
(6) type (quotes required where indicated): find "sha1hash" sha1.txt
(note: to paste to the command prompt is right-click)
The sha1 hash of the password 'password' is: 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
Remove first six characters: e4c9b93f3f0682250b6cf8331b7ee68fd8
enter at command prompt: find "e4c9b93f3f0682250b6cf8331b7ee68fd8" sha1.txt
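The manual steps above can be collapsed into a short script. Per the thread, cracked hashes in the dump have their leading hex characters zeroed out, so we search for the hash's tail rather than the full 40 characters ("sha1.txt" is the file name from the instructions above):

```python
import hashlib

# Scripted version of the manual check above. Cracked hashes in the dump
# have their leading hex chars zeroed, so we search for the hash's tail
# instead of the full 40 characters. "sha1.txt" is the extracted file.

def password_in_dump(password: str, dump_path: str = "sha1.txt") -> bool:
    tail = hashlib.sha1(password.encode()).hexdigest()[6:]
    with open(dump_path) as f:
        return any(tail in line for line in f)
```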
Obviously the list was filtered to eliminate duplicates. It contains only what the hackers wanted it to contain. So why does nobody mention that it is HIGHLY LIKELY that the user names associated with the passwords (which are actually mainly e-mail addresses for LinkedIn) are also in the possession of the hackers?
So, if I were the hacker: strip usernames, strip duplicate hashes, post the list of unique hashes to let others do the CPU-intensive cracking, retrieve the cracked passwords, match them with usernames (e-mail addresses), check the same password on other accounts (first on the e-mail account, then google the e-mail address on forums, or try the services that interest me and say "forgot password, send it again to this e-mail address - thank you for telling me that this e-mail indeed has an account with you..."), and somehow monetize the data.
As a user that implies: IMMEDIATELY change the password for the e-mail address used to log in at LinkedIn (if it was the same password); verify whether the settings of this e-mail account have changed (like an additional unknown address added to allow retrieval of the password, DUH);
try to remember where you use the same address either as a login or to recover credentials; try to remember where you used the same password; google your e-mail address to help you remember; change passwords; consider abandoning the e-mail address if it is not your primary one, ...
Also - did the amount of SPAM that you receive on the e-mail address used to log in at LinkedIn suddenly increase, while SPAM remained constant on a similar mail account not connected to LinkedIn? Maybe someone just sold your e-mail address, so the LinkedIn breach may affect you even if your password is not in the list.
Bottom line: LinkedIn's approach appears to be "We have no proof that this particular account was hacked, since the password hash is not in the list - let's not overreact and let's assume it is not hacked, even if we don't have a clue what was actually hacked." I'm not one to judge whether it is the best approach for the business, but sure as hell I don't like this approach as a user.
I can confirm that my password was in there. I have changed it. My password was "98mnja6z" which hashes to 6475590bc1407aa98c8b022230292cce3d8528b3. I used this for no other sites, so I'm not concerned about it leaking.
It is inexcusable that LinkedIn hasn't alerted their users yet.
I'm starting to think it might be wise, if you intend to reuse your password on multiple sites, to salt it yourself. By using a form like "<site name><user name><reused password>", you protect yourself from rainbow tables without making your password harder to remember.
And yes, yes, I know you shouldn't be reusing your password across different sites, or using a dictionary word anyway. And teenagers also shouldn't be drinking, doing drugs and having sex. It doesn't help anything to pretend that people are going to behave optimally.
Of course, the preposterous restrictions that websites put on passwords, like maximum password length, will make this idea harder to put into practice.
I've been doing this myself and it has worked out pretty well so far. My password is in the list of passwords released, but is uncracked and I can rest assured knowing that I did not use the same password on any other website.
A couple things to keep in mind:
1) The salt you generate should be put at the front in case the website is silently truncating the password to a certain length
2) The salt can be something more complicated than site name. I mentally calculate a fixed length salt based on the site name
3) You may want to still keep two separate "base" passwords, one for high value sites (banks, email) and one for low value sites (everything else).
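An illustrative, mechanical variant of the scheme above (the proposal itself is meant to be done in your head, so this is purely for illustration). Hashing the site name together with the base password means a single leaked site password reveals nothing about the base:

```python
import base64
import hashlib

# Illustrative, mechanical variant of the "salt it yourself" scheme (the
# proposal above is meant to be done in your head). Hashing site + base
# means one leaked site password reveals nothing about the base password.

def site_password(site: str, base_password: str, length: int = 16) -> str:
    digest = hashlib.sha256((site + base_password).encode()).digest()
    return base64.b64encode(digest).decode()[:length]
```

Unlike plain concatenation, this also defeats the "two leaked passwords reveal the system" weakness, at the cost of needing a computer to derive each password.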
I sincerely mean no offense but this statement came directly out of your butt. Read the table on page 14 of Colin Percival's Usenix paper "Stronger Key Derivation Via Sequential Memory-Hard Functions" (which you could have found by Googling [scrypt paper]); PBKDF2 is ~5x faster (ie: costs ~5x less to break) than bcrypt; PBKDF2 and scrypt aren't even in the same ballpark.
From exactly where did you derive the idea that PBKDF2 is "extremely good"?
The reality is that all three of PBKDF2, bcrypt, and scrypt are just fine. But PBKDF2 and scrypt have drastically poorer library support than bcrypt; nobody should delay using a strong password hash so that they can optimize which one they use.
Please cite one academic cryptography paper that presents an analysis of PBKDF2, other than Colin's paper which damns it.
There is virtually no "rigorous" research into KDFs of any sort, let alone password KDFs. Most academic crypto research simply presumes passwords are taken from cryptographically secure random number generators and stored securely.
And with that said I want to remind you that I just cited a source, accepted at Usenix, that measured PBKDF2, bcrypt, and scrypt and found PBKDF2 inferior to bcrypt. You seem to want to pretend otherwise.
Django has chosen a fine default and for the next several years it's probably unnecessary to second-guess it. Over time, GPU and (more importantly) FPGA-assisted hash cracking may or may not become more common, at which point you'd want to transition to something like scrypt.
You could literally flip a coin to decide between bcrypt and PBKDF2 and it wouldn't matter which side came up.
I'm not an authority on this, but django_bcrypt is generally considered a best-practice in the Django community. Scrypt may replace that in the future, once implementations are widely available and battle-tested.
Is it unique enough that you can be sure it's your password, and not someone else's? I ask because the cracked passwords seem to be the simple/obvious ones that are likely to be used by multiple people. If it is strong/unique though, it would effectively confine the hack time to the last 2 days.
I didn't see it in the post, but does anyone know if these were current passwords (as of this post)? I use a unique password for linked-in, but some number of months ago I used a password I shared with another site. Wondering if I need to change that one too. Guess I might as well.
Can we please start using BrowserID or some other standard so we can secure that one provider and do away with all this? I'd like it if we could authenticate with Google using 2-factor authentication and be less worried about my password getting hacked.
By centralizing authentication, you make that central provider an even bigger target, and you risk losing access to other services as you lose your main account (Google is known to sometimes terminate accounts with no recourse).
Finally, when that central provider gets hacked, all your dependent services are now also compromised.
And as we know from the CloudFlare story over the weekend, not even Google with their 2 factor authentication is devoid of issues.
No. Centralizing your login with one third party is as bad as the current practice of reusing your password for every service you have an account with. The only reasonably safe approach is to use different random credentials for every service and store these credentials somewhere under your (and only your) control (i.e. a password manager or a piece of paper).
A single point of failure sounds dangerous.
People should just avoid using the same password for different websites. (That's what KeePass is for.) Perhaps a clever extension / browser feature could enforce that. (e.g. "Warning: you are probably reusing your facebook.com password")
My rationale is that it's much easier to secure one provider (the attack surface is much smaller), and you can also run one yourself, making you responsible for all your authentication needs.
OpenID was great in that you could choose any provider you wanted, and nobody could attack them all (not that they'd have to). It just seems like a good solution to use someone whose only job is to provide secure authentication.
Guys, this all doesn't parse for me. My password on LinkedIn was 13 characters long, and included symbols (!@#$%^&*()), numbers, and letters. A 13-character password like this implies a search space of (26 + 26 + 10 + 20)^13 ≈ 7.6 × 10^24. If a GPU can check 11 billion passwords per second, exhausting that space would take roughly 2.7 × 10^8 GPU-months.
We're either looking at someone with a seriously ridiculous password cracking computer (i.e. ASIC-based -- not even FPGAs), a compromise for SHA-1 (very unlikely), or a keylogger/proxy/trojan/etc... I vote for keylogger.
If your password is in this database, I don't think it's because your password was brute-forced.
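The back-of-envelope arithmetic above is easy to check (the 82-character alphabet and 11 billion guesses/second are the figures from that comment):

```python
# Search space for a 13-character password over lower + upper + digits + ~20 symbols
charset = 26 + 26 + 10 + 20            # 82 characters
space = charset ** 13                  # total candidate passwords
guesses_per_second = 11e9              # one fast GPU
seconds_per_month = 30 * 24 * 3600

# Worst-case GPU-months to exhaust the whole space
gpu_months = space / (guesses_per_second * seconds_per_month)
print(f"{space:.2e} candidates, ~{gpu_months:.1e} GPU-months")
```

Even assuming the attacker finds the password halfway through on average, the number is absurd, which supports the keylogger/trojan theory for strong passwords that show up cracked.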
My old password is not on the list. However, it seems like somebody tried to log on to Windows Live with the e-mail address and password I was registered on LinkedIn with. This is one of my oldest passwords, from when I still only had one or two passwords.
I noticed this because Windows Live kept sending another of my e-mail accounts a code needed to log in from an unrecognised computer.
Now it could all be a coincidence, but I wouldn't be surprised if there was a connection, as the e-mail address and the password were identical to the ones used on Linkedin. If that's the case there would be a more complete list with my password/hash as well as the associated e-mail address.
It seems we will never get rid of bad programming like this. I hit the 'forgot my password' link on the T-Mobile website yesterday and the pop-up requested my T-Mobile phone number. Ten seconds later I received an SMS with my actual password in it.
Putting people's personal details on the open web, giving anyone access, including malicious hackers... This design used by LinkedIn, as well as Facebook, was a bad idea from the beginning. Don't think they are not aware of the risks. How much spam and other annoyances do people get as a result? These companies are killing privacy just to make a quick buck. Maybe they'll be sued.
How on earth were they not salting? There are so many open source auth systems now that get all the basics right. Someone who works at a big company like this and has any insight, please comment. How is this even possible these days?
There's still an unbelievable amount of ignorance out there about how to properly store hashed passwords. There are countless articles explaining that you need to hash the passwords, and telling you how to use md5("salt" + password), and then the blog comments are full of helpful people saying that you should use SHA256, or "no u also gots to add pepper", or exhorting the author to use a large unique salt from /dev/random (not /dev/urandom, it's not random enough) and then encrypt the salts in the database with 2048-bit RSA. I sometimes google around for these articles when I want some morbid fascination -- it's the intellectual equivalent of those YouTube videos where one car crashes into another, and then a third car crashes into the wreckage, and then another car tries to ramp over it and fails, and then everything explodes, and then the people staggering out of the destroyed cars start shouting bad advice about hash functions.
No sign of my password in there http://www.mediafire.com/?n307hutksjstow3, or my wife's. I checked both the full and the '00000' truncated hash for each. Neither of us had changed it for the last couple of years.
So I guess it is only a subset of all the LinkedIn passwords?
I have now changed my passwords anyway.
By the way, the press say both the username and password were hacked, has anyone seen the list of usernames? They also say 6.4m passwords were hacked but this file only has 6.14m.
According to jgrahamc's investigation, this will check whether your password is in the file and has already been cracked. To check whether the hash is present but still uncracked, you should remove the sed call from the pipeline.
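A small sketch of that check, using the 'password' hashes quoted earlier in the thread; it produces both strings you would then grep the dump for (if the marked form is present, the password is already cracked):

```python
import hashlib

def candidate_hashes(password):
    """Return both forms to search the dump for: the full SHA-1 hex digest,
    and the same digest with its first five characters zeroed (the marker
    the crackers appear to use for already-broken hashes)."""
    full = hashlib.sha1(password.encode()).hexdigest()
    marked = "00000" + full[5:]
    return full, marked
```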
A salt might not have been enough to protect the passwords: if it is not complex enough, the presence of common passwords like "password" or "123456" makes a brute-force attack on the salt itself possible in some cases. I benchmarked exactly this point and was able to recover a salt in five days, without heavy optimization. It's a bit long to give all the numbers and code here, so the ref is http://gouigoux.com/blog/?p=46
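The attack described is straightforward when the salt is weak. A toy sketch, assuming (purely for illustration) a two-lowercase-letter salt prepended to the password before SHA-1; a real salt would be far longer, which is the whole point:

```python
import hashlib
import itertools
import string

def recover_salt(target_hash, known_password, salt_len=2):
    """Brute-force a short salt, given the hash of a known common password."""
    for combo in itertools.product(string.ascii_lowercase, repeat=salt_len):
        salt = "".join(combo)
        if hashlib.sha1((salt + known_password).encode()).hexdigest() == target_hash:
            return salt
    return None
```

Once the (site-wide) salt is known, the attacker is back to an ordinary dictionary attack, which is why per-user random salts of adequate length matter.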
My password hash was in the file and it was cracked. It was a combination of 8 upper and lower case letters, digits and special characters. This is the case where size does matter and apparently passwords like my old one can be broken on GPU in minutes or hours nowadays.
Quick sample from persons I polled: 2 password hashes were not in the file, 1 was there and cracked, 1 was there and not cracked yet.
As bad as it is, this can be a great case to raise the awareness of good password management.
The good thing is: every time this happens to a high-profile site, storing sensitive data, more people get more acquainted with the concepts of "you really should not use a simple password" and "you really should not use the same password across all sites". I know it works for me: this was the last straw that forced me to abandon a good ol' password I've been using since 1998. From now on I'll just rely on password managers (currently DataVault, but I know people who swear by LastPass).
My password isn't in the file, and yes, I checked for a 0'd version as well. My password is 9 characters of lower case, upper case, numbers, and a symbol. I'm wondering if this is incomplete, or fake. Either way, if it is a vulnerability, I suppose LinkedIn hasn't fixed it yet, or at least I haven't heard mention of this; even changing your password won't help much if they can just re-download the database. So making a long, complex password is the best course of action.
My belief is that the hackers may well have the username/password combos, but deduplicated the list (sort -u?) so it contains only unique password hashes, which eases the process of dictionary-cracking them since there are no salts.
The 00000 prefix might be an indication of this. I bet there is an automated script running a dictionary attack, and the file was released mid-execution.
I've come to the conclusion that this list is genuine. While some people have said that they couldn't find their passwords in the list, I think this only points to the most probable explanation: that this is a partial list.
I deleted my account over 6 months ago but my password hash (strong unique password) is in the file. Either (a) the file retains passwords of deleted accounts, (b) the file was stolen over 6 months ago and LinkedIn didn't know about it, or (c) the file was stolen over 6 months ago and LinkedIn DID know about it and were hoping it wouldn't show up online.
How does this benefit someone who is trying to access an account? There are no account names tied to these hashes. So even if you managed to find the clear text of each of these you would still be in a position where you have a list of over 6,000,000 passwords to work through in order to brute force your way in.
http://pastebin.com/JmtNxcnB - a 20k+ sample of cracked passwords from the LinkedIn hash dump released on June 6, 2012. They do appear legit, and strong too. It's unfortunate that LinkedIn hashed them using unsalted SHA-1.
I'm not an expert in the field, but from what I know, SHA1 is a one-way function. When a hashed password is cracked, YES, the hackers know that specific password. They brute-forced it by guessing the password, running it through SHA1, and comparing the output to the hash. If they are the same, then they guessed the right password.
They do not know any other passwords, and if a salt had been used, they would have had to brute-force each password separately. I don't think a salt was used in this case, so once they crack someone's password, they know every other user who used the same password. So if you and I used the same password, and they already brute-forced yours, they know that I have the same password.
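That is exactly why unsalted hashes crack so cheaply: one dictionary pass covers every user at once. A toy sketch using the 'secret' hash quoted earlier in the thread:

```python
import hashlib

leaked = "e5e9fa1ba31ecd1ae84f75caaa474f3a663f05f4"  # sha1("secret"), quoted upthread
wordlist = ["password", "123456", "linkedin", "secret"]

cracked = None
for guess in wordlist:
    # Hash each candidate and compare; SHA-1 is fast, so real wordlists
    # run to billions of entries without trouble.
    if hashlib.sha1(guess.encode()).hexdigest() == leaked:
        cracked = guess
        break

# Without a salt, every user who chose this password has this exact hash,
# so one crack reveals them all.
```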
"Cracking" in this sense is brute forcing. SHA1 is fast, and people use bad passwords. The combination means that you can run through lots and lots of bad passwords very quickly. I checked my linked in password I have stored in 1password, and it is 20+ chars with special characters and numbers. That won't be "cracked" in any meaningful sense, so I don't even worry about it.
You are correct that there's currently no way to go from a hash to a value that hashes to it in SHA1 (AFAIK, IANYNSA [I am not your NSA]).
Those paranoid tinfoil-hat wearing lunatics that generate absurdly long unique random passwords for every site are wringing their hands with glee because they found the hash of their LinkedIn password in the file. You're welcome.
As much as I hate lawsuits, I'd love to see one or two major Internet companies sued in a class action lawsuit for negligence to serve as an example and a warning to the rest. This kind of behavior from a top tier internet presence is inexcusable!
A password I used many months ago (maybe almost a year now?) was in the list, but interestingly, the password I've used for the past several months was not. This list is possibly pretty old, which means the breach happened quite a while ago.
My LinkedIn password hash (at the time; I changed it once the news broke) was not listed. And it was a relatively weak password (8 characters, just lower case letters and numbers). I doubt this is LinkedIn's full password dump.
Even after securing our own passwords, we are all still vulnerable to attacks where the attackers simulate members of our networks to discover private information like our connections, job history, etc.
Of my own accord, and not my employer's, I'd like to invite developers to check out mojoLive as your career management tool. Our goals and vision are light years ahead of what LinkedIn has slowly become. Also, I dislike recruiters and spam.
Assuming LinkedIn stored passwords as unsalted SHA-1 and will continue to do so, and many of us do not want to delete our LinkedIn accounts, what should be the minimum number of characters we should use in our new password? 15? 20? 100? (I know, 100 is probably higher than they allow)
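One way to reason about it: take the 11 billion unsalted SHA-1 guesses per second quoted upthread and work out how long exhausting a given length takes. A rough sketch (the 62-character mixed-case alphanumeric charset is an assumption; add symbols and it only gets better):

```python
import math

def years_to_exhaust(length, charset=62, guesses_per_second=11e9):
    """Rough worst-case years to try every password of a given length."""
    space = charset ** length
    return space / guesses_per_second / (365 * 24 * 3600)

# 8 mixed-case alphanumeric chars fall in hours on a single GPU;
# 15 chars are already far beyond any plausible brute force.
```

By this arithmetic, somewhere around 15 random characters is already overkill against offline brute force; the bigger risk at that point is reuse, not length.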
In fairness to Twitter, it was never actually known if the accounts/passwords came from Twitter.com (proper) or (more likely) leaked from some 3rd-party Twitter-integrating app that had pre-OAuth integration.