6.5 Million LinkedIn Password Hashes Leaked (translate.google.com)
561 points by ssclafani 1751 days ago | 512 comments



Some observations on this file:

0. This is a file of SHA1 hashes of short strings (i.e. passwords).

1. There are 3,521,180 hashes that begin with 00000. I believe that these represent hashes that the hackers have already broken and they have marked them with 00000 to indicate that fact.

Evidence for this is that the SHA1 hash of 'password' does not appear in the list, but the same hash with the first five characters set to 0 does.

  5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8 is not present
  000001e4c9b93f3f0682250b6cf8331b7ee68fd8 is present
Same story for 'secret':

  e5e9fa1ba31ecd1ae84f75caaa474f3a663f05f4 is not present
  00000a1ba31ecd1ae84f75caaa474f3a663f05f4 is present
And for 'linkedin':

  7728240c80b6bfd450849405e8500d6d207783b6 is not present
  0000040c80b6bfd450849405e8500d6d207783b6 is present
2. There are 2,936,840 hashes that do not start with 00000 that can be attacked with JtR.

3. The implication of #1 is that if you are checking for your password and it is a simple one, you need to check for the truncated hash (the last 35 hex digits) as well as the full hash.

4. This may well actually be from LinkedIn. Using the partial hashes (above) I find the hashes for passwords linkedin, LinkedIn, L1nked1n, l1nked1n, L1nk3d1n, l1nk3d1n, linkedinsecret, linkedinpassword, ...

5. The file does not contain duplicates. LinkedIn claims a user base of 161m. This file contains 6.4m unique password hashes. That's 25 users per hash. Given the large amount of password reuse and poor password choices it is not improbable that this is the complete password file. Evidence against that thesis is that the password of one person I asked is not in the list.


For the security novices amongst us: I had no idea how to do this so I figured out a quick python script to test it:

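    >>> from hashlib import sha1
    >>> def check_pass(plaintext, offset=5):
    ...     # hashlib wants bytes on Python 3, so encode the string first
    ...     hashed = sha1(plaintext.encode("utf-8")).hexdigest()
    ...     # return the full hash and the variant with the first `offset` digits zeroed
    ...     return (hashed, '0' * offset + hashed[offset:])
    ...
    >>> check_pass("linkedin")
    ('7728240c80b6bfd450849405e8500d6d207783b6',
     '0000040c80b6bfd450849405e8500d6d207783b6')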
Edit: I'm pretty sure JtR refers to this: http://en.wikipedia.org/wiki/John_the_Ripper


Obligatory perl one-liner:

  perl -MDigest::SHA -le '$h = substr( Digest::SHA::sha1_hex($ARGV[0]) , 5 ); open F, "<combo_not.txt"; do { print "found $_" if grep(/$h/, $_) } while (<F>)' password
(for people without shells)


Obligatory shell one-liner:

  grep `echo -n password | shasum | cut -c6-40` hacked.txt


Prefix the whole command with a space to avoid dumping your password into your bash history: " grep `echo -n yourpassword | shasum | cut -c6-40` SHA1.txt"


Only if HISTCONTROL is assigned 'ignoreboth' or 'ignorespace'.


Or prompt for it:

   grep `read -sp "password: "; echo "$REPLY" | tr -d "\n" | shasum | cut -c6-40` hacked.txt


I couldn't really find a good reason to use a .bash_history. I linked mine to /dev/null and never looked back. (heh)


Ctrl+r history search? I tend to maintain a complete history log so that when I've forgotten the one-liner I used to rotate my videos 2 years ago I can easily recall it.


2 years? Just how big is your history file?

I thought 16k entries might be reasonable but that doesn't even last 3 weeks for me. I think there might have been some issue with slow disk seeks so at some point I restricted it to that many.

I guess it would probably be better to regularly back up the history file to deal with possible accidental truncations and issues when running multiple shells concurrently, but the overall effort to set up such a system would probably outweigh the benefits.


export HISTSIZE=0


Alternative and more dramatic method of preventing it being written to your bash history:

  kill -9 $$


kill -9 -1 is better than kill -9 $$


How so?


That post was a troll. -1 is a special PID: It indicates that all processes that you can kill should be.

Kill -9 -1 as root is a surefire way to make a system stop doing anything, fast.


Here's node.js:

    $ echo linkedin | xargs node -e "var x = require('crypto').createHash('sha1').update(process.argv[1]).digest('hex'); console.log([x, '00000' + x.substring(5)]);"

     7728240c80b6bfd450849405e8500d6d207783b6
     0000040c80b6bfd450849405e8500d6d207783b6


Or you could just feed "sha1 <password>" to the duckduckgo.com search box and it will give the result.


Some people have this thing against sending their private passwords in plaintext to third-party websites...


You're sending the hash, not the password.

DDG supports SSL: https://www.duckduckgo.com/

If you want coverage, generate a few hundred thousand SHA1 hashes along with your password.

Actually, running a trickle query of random SHA1 hashes from your box might be a fun exercise, along with a trickle query of random word tuples (bonus points for using Markov chains to generate statistically probable tuples).


If you search for 'sha1 foo', that's being sent across the network to DDG's servers. And sure, if you're using SSL then it's not going across in plain text, but it's decrypted and handled on their servers in plain text; it'll probably even end up in logs and/or tracking databases somewhere. You're giving DDG your password.


A hash is not a password.

At worst you're giving the attacker a hash target to try brute-forcing. He still has to brute it, and that takes time. Select your plaintext from a large enough keyspace and it's astronomical time.

I'll need to review their policy more closely, but DDG claim fairly minimal tracking. At best someone might be able to correlate hash lookup with some IP space. That's a long way from handing over passwords. And as I already indicated, you could pad the queries with decoys to make the search space much larger.


No, no, no. You're 100% completely misunderstanding this.

When you search for 'sha1 foo', that query ("sha1 foo") goes up to the server. They know your password is "foo" and that you're attempting to "sha1" it. They don't have a hash, they take that data and perform the hash, then send that down to you.


Boggle.

OK, gotchya.

I guess I'm just too damned used to using systems that, you know, have useful tools installed locally (or can get them there really damned fast). Including SHA1 and MD5 hash generators.

And I was all worked up to tell you how wrong you were still being.

All because I couldn't fathom the possibility, let alone the reason, that anyone would need a third-party site to compute their hashes for them.

Silly me, my error.


Well presumably you've already changed your LinkedIn password, so what's not to send?


Challenge accepted (although this is pretty crude)

curl -s -d q="sha1 password" http://duckduckgo.com | w3m -T text/html | grep '\w\+\{32\}'


Hi - what does

" xargs node -e "

do?

Thank you


[node -e] evaluates a line of node.js source from a command line argument:

    $ node -e "console.log('Hello, world.')"
     Hello, world.
[xargs] allows you to pipe the output of one command as an argument to another command. By default it will show up at the tail end of the second command's arg list, but if you want to interleave it you can use the -I flag:

    $ echo /usr/share/dict/words | xargs head -5

     A
     A's
     AOL
     AOL's

    $ echo petard | xargs -I {} grep {} /usr/share/dict/words
     petard
     petard's
     petards
[xargs node -e] therefore allows text from STDIN to be inserted into a script to be evaluated by the node interpreter, accessible via process.argv:

    $ echo is dog this yes | xargs node -e "console.log(process.argv.slice(1).sort().reverse().join(' ').toUpperCase())"
     YES THIS IS DOG


head -5 /usr/share/dict/words

same result as with xargs

grep petard /usr/share/dict/words

same result as with xargs

not sure what you are trying to demonstrate here

useless use of xargs?


No, he is trying to demonstrate how to use 'xargs node -e'.

Are you even reading this discussion properly or are you just searching for some shell snippets and ridiculing them as soon as you get a chance? This is what it looks like from your history: http://news.ycombinator.com/threads?id=uselessuseof

ionwake doesn't want to learn how to search a word. He wants to know how 'xargs node -e' works. Please read this again: http://news.ycombinator.com/item?id=4075293


The perl one liner was funny, the shell one liner was light hearted, but your node solution is just pure fanboyism and quite frankly not in line with the spirit of the two previous posts.


And.. the node.js solution doesn't do what either the Perl or shell one liners do. It doesn't tell you whether the password was found in the file. All it does is print out a SHA1 hash of a string.


That's a trivial modification:

    $ echo linkedin | xargs node -e "var x = require('crypto').createHash('sha1').update(process.argv[1]).digest('hex'); console.log(x.substring(5));" | xargs -I {} grep {} hashes.txt
I'm surprised at the backlash to what I thought was fun code golfing. No one called me names after I posted a simple Python solution that didn't check the file. For what it's worth I've changed my LI password and I haven't bothered downloading the actual hash file.


If I post a PHP solution maybe zxcvb will get a heart attack.


node has a neat API for quickly knocking out stuff like this; it's a useful tool for more than just server code. Calling that comment fanboyism is just displaying the opposite of fanboyism, prejudice against hyped-up tools that nevertheless are good tools.


My point still stands. There's funny and then there's blatant fanboyism. You're like a prepubescent teenager who doesn't understand the context of social situations so always says something stupid.


"Which brings us to the most important principle on HN: civility. Since long before the web, the anonymity of online conversation has lured people into being much ruder than they'd dare to be in person. So the principle here is not to say anything you wouldn't say face to face. This doesn't mean you can't disagree. But disagree without calling the other person names. If you're right, your argument will be more convincing without them."


Some people actually do call others names when face to face.

Personally, while I don't, I do tend to get a little aggressive and then I'm often surprised with the backlash, because I get that way when I'm genuinely enjoying the conversation, not when I'm irritated.


Tone doesn't carry on the Internet, so no one knows you're enjoying it. Hence, it generally degrades the quality of the conversation, which is the opposite of what we want at HN.


No, I'm saying I do that face-to-face, and people still can't tell I'm enjoying it. So the tip to say nothing that you wouldn't say IRL is useless to me; I just can't help it.


"You're like a prepubescent teenager who doesn't understand the context of social situations..."

The hypocrisy is so unabashed my brain might explode.


Pot, meet kettle.


obligatory comments

- not portable

- useless use of backticks

printf password|openssl sha1|cut -c6-40|grep -f - hacked.txt


Why are you extracting 35 characters with 'cut -c6-40'? SHA1 produces a 160-bit message digest. That's 20 bytes or 40 hex-digits.


typo.


Shorter and, IMHO, a bit simpler Perl one-liner:

    perl -MDigest::SHA=sha1_hex -le '$h = substr( sha1_hex(shift), 5 ); open F, "<combo_not.txt"; print "found $_" for grep /$h/, <F>' password
Or:

    perl -MDigest::SHA=sha1_hex -lne 'BEGIN {$pw = shift} $h = substr( sha1_hex($pw), 5 ); print "found $_" if /$h/' password combo_not.txt


The first one ramps up memory use like crazy (which I was trying to avoid) and the second one is much better with memory, but you need to move the sha1_hex into the BEGIN block or you're recomputing the hash for every line parsed, thrashing your CPU. Interesting use of 'shift' though, I didn't know you could modify the file argument to -n like that.
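
For comparison, here is a minimal Python sketch of the same idea: hash once, then stream the file line by line so memory stays flat. The function name `find_matches` and the in-memory `sample` list are made up for illustration; a real run would pass an open file object instead.

```python
import hashlib

def find_matches(password, lines, offset=5):
    """Yield every line that contains the hash suffix of `password`."""
    # Hash once, outside the loop. Only the digits after the first `offset`
    # are compared, so entries with a zeroed prefix match as well.
    suffix = hashlib.sha1(password.encode("utf-8")).hexdigest()[offset:]
    for line in lines:  # iterating a file object reads one line at a time
        if suffix in line:
            yield line.strip()

# Tiny stand-in for the dump (hypothetical contents apart from the
# zeroed 'password' hash quoted at the top of the thread).
sample = [
    "000001e4c9b93f3f0682250b6cf8331b7ee68fd8\n",
    "ffffffffffffffffffffffffffffffffffffffff\n",
]
print(list(find_matches("password", sample)))
```

Against the real file this would be something like `with open("combo_not.txt") as f: print(list(find_matches("password", f)))`.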


You might compare many words at once (say from a popular password list such as rockyou) like this:

while read line; do echo -n $line | sha1sum | cut -c6-40 | awk '{print "00000" $0}'; done < rockyou.txt

I haven't tested that, but I think it'll work.
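
A rough Python take on the same wordlist idea, for anyone who would rather do the lookup in one pass: load the last 35 hex digits of every dump line into a set, then test each candidate word against it. `load_suffixes` and `cracked_words` are names invented here, and the sample data is a stand-in, not the real file.

```python
import hashlib

def load_suffixes(lines, offset=5):
    """Build a set of hash suffixes so each lookup is a constant-time check."""
    return {line.strip()[offset:] for line in lines if line.strip()}

def cracked_words(words, suffixes, offset=5):
    """Yield each candidate word whose SHA1 suffix appears in the dump."""
    for word in words:
        digest = hashlib.sha1(word.encode("utf-8")).hexdigest()
        if digest[offset:] in suffixes:
            yield word

# Stand-in data: one zeroed entry (the 'password' hash from the thread).
dump = ["000001e4c9b93f3f0682250b6cf8331b7ee68fd8\n"]
suffixes = load_suffixes(dump)
print(list(cracked_words(["password", "zzz-not-here"], suffixes)))
```

Comparing suffixes rather than whole hashes catches both the zeroed and the untouched form of an entry.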


By sheer coincidence I had a chance to use Perl again today for a job interview.

I now have a good appreciation of why it's considered a "Write once, read never" language. :)


Amsterdam? ;)


If you're paranoid about shoulder-surfing you can use getpass to hide your password as you type it in.

    >>> import getpass
    >>> password = getpass.getpass('Password: ')
http://docs.python.org/library/getpass.html
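
Putting getpass together with the hash check might look roughly like this. It is only a sketch: `prompt_and_check` is a made-up name, and the `prompt` parameter exists purely so the function can be exercised without a terminal.

```python
import getpass
import hashlib

def prompt_and_check(lines, prompt=getpass.getpass, offset=5):
    """Ask for a password without echoing it, then look for its hash suffix."""
    password = prompt("Password: ")
    suffix = hashlib.sha1(password.encode("utf-8")).hexdigest()[offset:]
    # Substring match, so both the full hash and the zeroed form are found.
    return any(suffix in line for line in lines)
```

With the dump in the current directory, `with open("combo_not.txt") as f: print(prompt_and_check(f))` would prompt once and print True or False. The password never shows up on screen or in shell history.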


Command line utility I wrote which uses getpass: http://dpaste.com/hold/756011/


A complete python script assuming you have hashes.txt in the same directory.

http://dpaste.com/756007/


Just tried your code and it seems that my password has been cracked. Glad I changed it this morning.


Any password that I try works...


I have found one case where both types are present.

grep `echo -n l1nked0ut | shasum | cut -c6-40` combo_not.txt

    000000afef5f2ba94b104126d04db1837f423816 
    e7bf10afef5f2ba94b104126d04db1837f423816


How many hashes are present in both stripped and unstripped form?

  $ cat combo_not.txt |cut -c7-40 |sort |dups |wc -l
  670781
That's ~10% of the total.


another useless use of cat

cut -c7-40 combo_not.txt|sort|dups|wc -l

what the heck is dups?

cut -c7-40 combo_not.txt|sort|uniq -d|wc -l
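
The same count can be sketched in Python without sorting, assuming the file fits comfortably in memory. The slice `[6:40]` mirrors `cut -c7-40`; `count_dup_suffixes` is a name made up for this example.

```python
from collections import Counter

def count_dup_suffixes(lines, start=6, end=40):
    """Count distinct suffixes that occur on more than one line."""
    counts = Counter(line[start:end] for line in lines if line.strip())
    return sum(1 for n in counts.values() if n > 1)

# Stand-in data: the two l1nked0ut entries quoted earlier in the thread,
# plus one unrelated zeroed entry.
sample = [
    "000000afef5f2ba94b104126d04db1837f423816\n",
    "e7bf10afef5f2ba94b104126d04db1837f423816\n",
    "000001e4c9b93f3f0682250b6cf8331b7ee68fd8\n",
]
print(count_dup_suffixes(sample))
```

Like `sort | uniq -d | wc -l`, this counts each duplicated suffix once, however many times it repeats.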


Yeah I'm aware of http://partmaps.org/era/unix/award.html#cat and choose to continue writing my scripts this way. My commands look more symmetric at the prompt, and are easier to manipulate.

dups is indeed a little helper of mine. Like uniq it only handles sorted input. Update: I see you edited your answer to include uniq -d. I wasn't aware of the option, thanks. Now I can simplify the implementation of dups. But I find the name valuable, and I think it's perverse to say uniq when you mean its opposite.


symmetric?


Each pipe stage reads from the left and writes to the right. The eye goes left to see the input and right to see the output if it's redirected to file.

The input file is reliably the second word, so C-A M-f gets me to it if I want to operate on a different file. !!:1 gets me the file if I want to use it in a new command.


echo abc > file

1. cat file

2. cat < file

3. echo abc|cat

4. echo abc|cat - file

cat can take input from the left, the right, or both

same goes for cut


I'm not sure what you're suggesting. I'm supposed to echo |cut ...? But I have a whole file, not just one line. So I have to cat ... |cut ... -- which is what I did. So what's your point?

I could keep the file first by saying:

  $ < combo_not.txt cut -c7-40 |sort |dups |wc -l
To which I reply, "Yuck!"

Perhaps we should stop here. You seem to have made this account just a few hours ago for the express purpose of poking at people's code fragments in this thread. You're making stylistic nitpicks (they don't affect correctness, do they?) and you're making them in a tone that I'm not sure I would take from Randal Schwartz himself (you actually edited http://news.ycombinator.com/item?id=4076556 to be ruder than the original). It's a drag, man.


cut takes a file as an argument. there's no need to start the line with <

   cut -c4-70 combo_not.txt|...


But that's where this conversation started out. My response the last time around: http://news.ycombinator.com/item?id=4076674

BTW, HN has some formatting support: http://news.ycombinator.com/formatdoc


I disagree with #5. I had a few of my coworkers check their sha1 against the DB and most of them were not in the dump. I also checked for truncated hashes, none of which were found. I have the feeling this is a subset of the full database.


I don't really see a purpose in hiding my password. So, as a counterpoint, my password is in the list. This is my LinkedIn password:

AxEWS9rg5V

This is the sha1:

caf28fcc9c3e4d88b830b8e5cc52c5b65d3db5f4

It is found in Line 3612910 of combo_not.txt. I believe the file is authentic.


So I have a funny wild theory...remember back when the Gawker database was compromised? And LinkedIn forced a password reset for users who (according to what I read) used email addresses that matched the Gawker leak?

What if they also (or actually) compared password hashes from their database to the ones released in the Gawker breach? In that case, they likely wouldn't have pulled data straight from the database but actually might have pulled passes from the db, output to text files, cut the text files up to parcel out for processing via Hadoop or something? And one of those text files somehow got loose...or someone MiTMed the actual process (I'd vote for a floating text file just because it's been so long; the Gawker breach was in December 2010).


on another note,

my fairly complex alphanumeric+symbol password IS in the dump, though not in the truncated form with prepended 0's, and the other one I found, which my coworker admitted was too short and alpha only, was in the dump with prepended 0's.

This supports the theory that the truncated hashes are the ones already cracked.


Mine was 5 characters, alpha and numeric, but no special characters. It was in there, prepended with 0's.

Whoops.

At the very least, it should have been longer.


Same here - mine was all alpha characters, seven characters, and the hash with five 0's was in the file. Guess who just changed their LinkedIn password today? And included some numbers?


Another datum: the hash of my password (randomly generated 8 character mixed case alphanumeric) was in the file, without any overwritten 0's.


My password is in the dump. I use the Forget Passwords Chrome extension [1], which is based on pwdhash.com, and generate site-specific passwords based on a master password -- i.e. my password is only used on LinkedIn and it's unlikely that I share it with someone else.

I think I have changed to this password during the last year.


My linkedin password of at least 3 years was not in the dump. So it must be a partial...


Mine is there.

(email me if you need proof)


Another data point:

I changed my linkedin password about three weeks ago. The old one is in the list (already 00000-ed), the new one isn't.


My (very unique) password hash is in the list, although unbroken so far.


Sorry for the stupid question, but where did you guys find the list of hashes? I didn't see it linked in the article.

Edit: found it in the Slashdot comments, it's: http://www.mediafire.com/?n307hutksjstow3

For the record, my password's hash was not in the list.


I think they're getting removed. I posted a link from the original source, but it's since disappeared.


I don't know if I have the correct file: http://www.mediafire.com/?n307hutksjstow3

  mbf041:Downloads shephard$ wc -l SHA1.txt
  6143150 SHA1.txt

The hash of my password, which was last rotated July 5, 2011,

   .,7^R8Cl}g1}Ze6f
Was _not_ found in the file (with/without 00000). I have, of course, changed it today. Strangely enough, the previous password is also not in the list.


Don't know if this adds anything, but both my old password (created eight years ago) and current password (changed six months ago) were on the list. Both were very unique - 20 characters mixed.

Need to get better at changing my PWs every three months. It's really not that hard, just a matter of discipline.


My old password was in the list, but not my newer password. I changed it about 2 years ago I think.


Hmm. My truncated password (for my now-deleted account) is not in the list of hashes -- so it's not just a uniq'd full DB. Also, the original forum thread where the file was first posted only managed to break around 600,491 passwords before it went offline ... so 3,521,180 broken passwords could mean that the original hacker has had access to some LinkedIn accounts for more than just a few minutes today.


Same here. My password is not in the list and I've had a LinkedIn account since 2003. I probably changed my password about 18 months ago. Neither that nor the previous one are on the list.


My password is not in the list; it's not idiotic but not super-hard. I doubt this is the full list. I hadn't changed mine in years, so maybe this is from a certain period of time?


I've had the same password on linkedin for as long as I remember and neither the full hash nor the zero prefix edited was found in the dump.

Simple line used in OS X terminal:

grep -e "`echo -n "your pass" | openssl sha1`" combo_not.txt


May want to grab the last characters as the cracked passes have 00000 at the beginning:

i=$(echo -n 'mypass' | openssl sha1); grep "${i:14}" combo_not.txt

This yielded success on some known passwords and a bunch of obvious passwords. Not mine, but I assume this dump is a list of the passwords they've cracked so far (i.e., even if your password isn't on this list - change it).


If your password was 'linkedinsucks' then it sucks because they found it already!


Correct!

  527688fa9f32bb8dab32d30807ca5c57a0b203b8 is not present
  000008fa9f32bb8dab32d30807ca5c57a0b203b8 is present


Here's some they didn't find, from /usr/dict/words: Paraná, Zürich, attaché. Not sure of the encoding, but I'd guess UTF-8.
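
The encoding matters because SHA1 works on bytes, not characters. A small sketch of the effect (`sha1_variants` is a made-up helper, and which encoding LinkedIn actually used is unknown):

```python
import hashlib

def sha1_variants(word, encodings=("utf-8", "latin-1")):
    """Return the SHA1 hex digest of `word` under each encoding."""
    return {enc: hashlib.sha1(word.encode(enc)).hexdigest() for enc in encodings}

# "Z\u00fcrich" is Zürich; the u-umlaut is two bytes in UTF-8, one in Latin-1.
for enc, digest in sha1_variants("Z\u00fcrich").items():
    print(enc, digest)
```

For pure ASCII words both encodings produce the same bytes, so the digests agree; for anything accented you would need to try each plausible encoding before concluding a word is absent.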


My not so strong password, spacex12, is not in the list, and I've checked whether it was already cracked (prefix of 00000): nope.


Also if it was "linkedin"

7728240c80b6bfd450849405e8500d6d207783b6 not present

0000040c80b6bfd450849405e8500d6d207783b6 present

or "facebook"

cbe648909034c0624c205fe219d3fbd10052c715 not present

000008909034c0624c205fe219d3fbd10052c715 present

or google

759730a97e4373f3a0ee12805db065e3a4a649a5 not present

000000a97e4373f3a0ee12805db065e3a4a649a5 present


I have found hashes of linkedout, recruiter, recru1ter, googlerecruiter, toprecruiter, superrecruiter, humanresources and hiring.

If it is a hoax, it is a very elaborate hoax.


Perhaps it's a DDoS on MediaFire! /joke


Good post, upvoted. One clarification:

> That's 25 users per hash

Password choices are probably Zipf-distributed, so averages don't make a ton of sense.


It does if you're trying to estimate the size of the corpus based on the number of users.

The arithmetic mean is specifically the value you'd want. n users divided by m users/password == total passwords (unduplicated) in the LinkedIn database.

A Zipf distribution would suggest that the pattern of reuse among passwords isn't normal, and that the median and mode are probably lower than the arithmetic mean.
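
As a rough check of how the mean, median, and mode relate under a Zipf-like distribution, here is a toy sketch. The numbers are invented (1000 passwords, with the k-th most popular used by about 1000/k people) and say nothing about the real LinkedIn data.

```python
import statistics

def zipf_like_counts(n_passwords=1000, top_count=1000):
    """Users per password when the k-th most popular has about top_count / k users."""
    return [max(1, top_count // rank) for rank in range(1, n_passwords + 1)]

counts = zipf_like_counts()
mean = statistics.mean(counts)
median = statistics.median(counts)
mode = statistics.mode(counts)
print(f"mean={mean:.2f} median={median} mode={mode}")
```

In this toy model a handful of very popular passwords pull the mean well above the typical case, while most passwords are used by only one or two people. The mean is still the right figure for converting a user count into an estimate of unique passwords.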


My password also doesn't appear to be in the list, so I doubt it is the complete/current file. I used this python to check, in case anyone else wants to use it:

    from hashlib import sha1

    f = "combo_not.txt"
    # A set makes each membership test fast; [0:40] strips off the trailing \n
    with open(f) as fh:
        hashes = {x[0:40] for x in fh}

    # From another comment
    def check_pass(plaintext, offset=5):
        # hashlib wants bytes on Python 3
        hashed = sha1(plaintext.encode("utf-8")).hexdigest()
        return (hashed, '0' * offset + hashed[offset:])

    print(check_pass("linkedin")[0] in hashes) # -> False
    print(check_pass("linkedin")[1] in hashes) # -> True (sanity check)

    myHash, myHashBroken = check_pass("plaintextoflinkedinpassword")
    print(myHash in hashes) # -> False
    print(myHashBroken in hashes) # -> False


Mine was not in the list. It's also possible this isn't the entire file. I was also able to recover 225129 other passwords with a wordfile and some Python based on truncated and full hashes.


> Evidence against that thesis is that password of one person that I've asked is not in the list.

Mine isn't in it.


Neither is mine.


A stock JtR 1.7.9-jumbo5, using the default rules, is finding quite a few of the non-zeroed ones pretty quickly. This surprises me; I would have expected them to have run the list through the JtR mill before passing it on to others.


One can conclude from this fact that the list of cracked hashes is almost certainly not complete.


Got a link to the file? I haven't been able to dig one up


The hash of my password, set when I joined on October 10 2011, appears not to be in the list. Changed it anyway.


Likewise, my password (MybXy836YCza), which wasn't used anywhere except my LinkedIn account created 29-Jan-2012, and has been stored securely at my end, wasn't on the list (either as a full SHA1 sum, or as part of the SHA1).

As you probably guessed from the fact that I posted my old password, I changed it just in case the list that was shared is only a partial list of what was obtained.


Nice observation, dude. Can you please share the password file? I don't have it anywhere. Thanks


So where is the list? I'd like to see whether I'm on it.


fwiw, this could also be an elaborate hoax, given these facts.

E.g. a list of simple passwords plus combinations of those simple passwords with "linkedin" variations.


I have a very unique strong password on LinkedIn, and it is on the list. Given that, this is no hoax.


Same here. Sucks too, because I liked that password.


My complex unique password is also on this list (full hash no 5 0's). So nope, not a hoax. Unbelievable/insulting they didn't even bother to salt.


Yeah, even I, a newbie Rails programmer, going through the Agile Rails book learned how to salt. It isn't rocket science.


It shouldn't just be a salt. It should be bcrypt.


Do you remember when you first used this password at LinkedIn? It could help narrow the dates of the breach. Especially useful would be the presence of a strong password in the list that was subsequently changed. That might help determine its freshness, if the new password isn't present (although this may be an incomplete list from an ongoing breach).


I'm thinking this list is from closer to a year ago, I changed my password shortly after the MtGox hack last year and this hash is for my old password that was compromised during that time period.


My password is in the dump, and it was changed mid October 2010. I remember because I changed all my passwords when my laptop was stolen.

The MtGox hack was in June 2011.


It was about a year ago now. I checked the hashes for my previous password and it wasn't on the list... Mind you, as many have noticed, it seems to be very incomplete.


Unbelievable/insulting they used a general purpose, easily reversible hash like SHA1 in the first place. I would have thought everyone had seen the 'use bcrypt' page by now.

http://codahale.com/how-to-safely-store-a-password/


Since when is SHA1 easily reversible? Did I not get the memo?

Salting should have been fine.


I couldn't find my password on the list and I've been using the same password for LinkedIn since I registered. I was trying to remember when that was. If someone knows how to find out the last time you changed your password or when you registered for LinkedIn, please let me know. I'd guess I've used LinkedIn for over 4 years at least.


A "member since" date is available on the "Account & Settings" page. Choose "settings" in the drop down that appears when you hover over your (account) name in the upper right corner of any LinkedIn page.


I agree, I've tried several passwords and they match. If you're a Math person, please shed some light on the chances that this list covers the full space.


I'm not a math person either, but here's some fodder for someone who is.

Mark Burnett's extensive password collection (which he acknowledges is skewed, because it's largely based on cracked passwords, he only harvests passwords between 3 and 30 chars, etc.). Here's how some of his stats shake out:

* Although my list contains about 6 million username/password combos, the list only contains about 1,300,000 unique passwords.

* Of those, approximately 300,000 of those passwords are used by more than one person; about 1,000,000 only appear once (and a good portion of those are obviously generated by a computer).

* The list of the top 20 passwords rarely changes and 1 out of every 50 people uses one of these passwords.

So it's conceivable that 6M unique passwords could cover a very significant portion of a 120M user namespace.

Ref: http://xato.net/passwords/how-i-collect-passwords


It's neat that the hashes are unique enough to serve as their own key. Obvious in retrospect, but still neat.

Curious why some of the hashes have been obscured with 00000 but not all. It means more than one possible password could generate the remaining characters, but what does that help or protect?


6.5 million? Off the top of my head, assuming that passwords are only letters and 5 characters long this still wouldn't cover the possible space. [I think it's safe to ignore hash collisions]

Are you trying passwords you've used on other sites, or random ones? If it's the former, then LI might not be the only source for the file.
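
The keyspace estimate is quick to check. This sketch assumes "only letters" means the 26 lowercase letters; `keyspace` is just an illustrative helper.

```python
def keyspace(alphabet_size, length):
    """Number of distinct strings of exactly `length` symbols."""
    return alphabet_size ** length

five_lowercase = keyspace(26, 5)
print(five_lowercase)              # 11881376
print(five_lowercase > 6_500_000)  # True
```

So even the space of 5-letter lowercase passwords (about 11.9 million) is larger than the 6.5 million hashes in the file.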


0. There are known cases of peoples' passwords (including my own) not on the list.


"We were curious what would happen to our share price if our company did something incredibly stupid"

The above comment might seem incredibly harsh, but really, there's no good excuse for a site this prominent to not have a salted, secure password hashing system. Even if they started with an unsalted password system, users can be migrated to the newer more secure system on next login.

The only way I could regain respect for LinkedIn is if we find that these unsalted hashes were from users who never logged in to LinkedIn after the security upgrade. From the replies of other HN users who have found their password hashes in the leaked list, this doesn't seem to be the case though.

I can understand database leaks. Bad things happen. Not being prepared for such an event however is where I draw the line. These leaks impact users far beyond just the site at fault.

It's not enough to say users should use LastPass. They don't, and that's the world we live in, for better or worse. If computer security doesn't take into account problematic users, then it's flawed computer security.
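
The migrate-on-next-login idea can be sketched as follows. Everything here is an assumption for illustration: the record layout, the field names, and the use of PBKDF2 from Python's standard library as a stand-in for whatever stronger scheme a site would actually pick (the thread argues for bcrypt).

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def strong_hash(password, salt):
    """Salted, iterated hash (PBKDF2-HMAC-SHA256 as a stdlib stand-in)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def login(record, password):
    """Verify `password`; upgrade a legacy record in place on success."""
    if "legacy_sha1" in record:
        candidate = hashlib.sha1(password.encode("utf-8")).hexdigest()
        if not hmac.compare_digest(candidate, record["legacy_sha1"]):
            return False
        # The plaintext is only available at login, so re-hash it now
        # with a fresh random salt and drop the unsalted hash.
        record["salt"] = os.urandom(16)
        record["hash"] = strong_hash(password, record["salt"])
        del record["legacy_sha1"]
        return True
    return hmac.compare_digest(strong_hash(password, record["salt"]), record["hash"])
```

A record that starts out as `{"legacy_sha1": "..."}` is converted the first time its owner logs in successfully. Users who never log in again keep the weak hash, which is the gap the comment above points to.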


Surely just hashing the username|password would massively reduce the effectiveness of leaks like this? Sure, a hacker would know what the "salt" is, but since it now varies between users you would expend the same amount of effort breaking one person's login as you previously would spend breaking everyone's (on average).

(Not recommending it, just wondering if my reasoning is correct.)


I hear this commonly, so it is a good idea to clear it up.

Usernames have lower entropy than a random salt and are predictable in many cases. People re-use usernames and some usernames are common. If your password system became common on the web, or if I knew the workings of your password system (i.e. open source / leaked codebase / Kerckhoffs's principle[1]), I could generate a rainbow table for either common or targeted users. This means I could generate a rainbow table for "Jabbles", gain access to your password and compromise your account before the website is likely even aware of a breach or has time to warn you. Salts only act to slow down, not prevent, compromising leaked password hashes (as you can always brute force which is quite practical with MD5/SHA1). Thus, using a username defeats one of the stated purposes of salting.

It's also said ad nauseam (with good reason) but rolling your own in security is a bad idea, especially when libraries exist that do exactly what you'd intend to do just as easily. Algorithms such as bcrypt and scrypt exist and are well vetted. bcrypt is easy to integrate with many languages and provides a trivial interface and sane defaults for iterations/rounds [brute force] and salts [rainbow table]. bcrypt can also handle increasing the security of your system over time as the metadata is stored as part of the hash.

tl;dr Using a username for salting means a targeted attack against a single or small number of users would be damn near impossible to stop as the second they have the password hashes they also have the passwords.

[1]: http://en.wikipedia.org/wiki/Kerckhoffs%27s_principle


Bcrypt takes two lines of code to securely test passwords and two lines to create the hashed password, both of which come in the documentation.

There is every reason to use it and none not to.


Often people say "Don't roll your own security" but the reality is that developers aren't trying to roll their own. They are trying to solve a problem, and if a quick google doesn't turn up a good library then they'll try and figure it out. Googling for password security implementations is likely to be fraught with horrible horrible advice.

I guess what I'm saying is that it's not enough to say don't do it, instead the defaults need to be there (and very visible).


I think we've reached a point with bcrypt that a good secure password system is within reach and comes with sane defaults and ease of use as features for most programming languages.

If it's just an issue of getting the word out there, then I'm hopeful things can improve.


You need more than just bcrypt. You've hinted at other things, but a few random things popping in to my mind:

  * Preventing password logging (many web frameworks log parameters)
  * Secure password recovery
  * New alternative attack vectors (eg. Facebook, Twitter auth)
  * XSS and CSRF
There are so, so many simple to make security errors, and worse - many of them are inter-related so that forgetting one will make another vulnerable. This is why you need safe defaults and more Security education.


A strong password hash doesn't gate on any of those things, so, while you do indeed need to pay attention to them, you don't need to pay attention to them before you deploy a strong password hash.

You should deploy a strong password hash immediately.


True point and this is probably off topic, but out of curiosity, what is the recommended approach for his point about logging messages/requests?

On previous projects, we've gone through all sorts of machinations to detect a password in our SOAP logging. This usually involves XML parsing (slow, ineffective on malformed messages) and Regexes (ineffective on malformed or "unusual" messages).

I can't think of anything better, short of "you can't leak what you don't log" which is nice in theory but not always practical.


There are defaults: bcrypt and PBKDF2. There is no excuse for anyone to do anything less than salted hashes, even if they decide not to use bcrypt or PBKDF2.


Having a password salted with the username fairly easily balloons out the complexity of building and searching a rainbow table by a factor of the number of usernames you want to be useful for. This factor is larger than you'd expect, given the sheer quantity and variety of usernames in various systems.

For a targeted attack it really doesn't matter as the time complexity to produce the rainbow table is equivalent to that of simply brute forcing the hash, i.e., you can't say 'well, assume the rainbow table contains only some small number of usernames'...

It also is entirely unlike the WPA2 rainbow tables in that you don't have millions of users all sharing the same username (ie. factory default SSIDs).

Overall it's more secure than it seems at first glance but you still have to ask yourself why you'd use that over a random salt.


The targeted attack does matter though, for the reason I pointed out above.

I can produce a rainbow table offline before I compromise the targeted system as I know the username of my target. This is not possible if the salt is random. This means I can crack a targeted user's password hash _instantly_ upon gaining access to the system.

With a random salt, you can only perform the brute force attack on that targeted user _after_ you've gained access to the system and likely alerted them to a compromise.

If the response time of the compromised system and team is a factor, this means using a username as a salt compromises your security greatly.

tl;dr Using a username for salting means a targeted attack against a single or small number of users would be damn near impossible to stop as the second they have the password hashes they also have the passwords.
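
To make the precomputation point concrete, here is a minimal sketch. It assumes a hypothetical scheme of SHA1(username + "|" + password) and a tiny made-up candidate list; a real attacker would use a far larger dictionary or a proper rainbow table rather than a plain lookup dict.

```python
import hashlib
import os

def username_salted(username: str, password: str) -> str:
    # Hypothetical scheme: the username doubles as the "salt".
    return hashlib.sha1(f"{username}|{password}".encode()).hexdigest()

def random_salted(salt: bytes, password: str) -> str:
    # Same hash, but salted with random bytes stored next to the hash.
    return hashlib.sha1(salt + password.encode()).hexdigest()

# Made-up candidate passwords, purely for illustration.
CANDIDATES = ["password", "secret", "linkedin", "hunter2"]

# Built offline, before any breach: only the (public) username is needed.
precomputed = {username_salted("Jabbles", pw): pw for pw in CANDIDATES}

# Once the hashes leak, recovering the password is a single lookup.
leaked = username_salted("Jabbles", "hunter2")
recovered = precomputed.get(leaked)

# With a random per-user salt the table could not have been prepared,
# because the salt is unknown until the database itself leaks.
salt = os.urandom(16)
leaked_random = random_salted(salt, "hunter2")
still_hidden = precomputed.get(leaked_random)

print(recovered, still_hidden)
```

The random salt does not stop a brute-force attempt after the breach; it only removes the head start.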


Sure you can, assuming:

1) You know the hash function beforehand

2) You know that they are salting in exactly this way

3) You know how they are doing their salting (HMAC vs., vs.)

4) You have enough time to create this new rainbow table

5) You have only just enough access to the system to dump the hashes (ie. the easier routes are blocked off from you)

That would in fact, with some probability (based upon the complexity of your rainbow table and the complexity of the users password), give you the passwords for a particular set of users.

I did say that it was more secure than it seems, not that it was perfectly secure :)


While not entirely random, would a "date based" salt work as well? Say, the date that the entry was added? This would still negate rainbow tables as a specific user entry needs to be targeted.


It would probably work well enough, but... why not just add a proper random salt field that isn't tied to anything an attacker could guess? Is something like 8 bytes per user too expensive?


Perhaps I'm missing something but... wouldn't you still need to store the random salt field somewhere in the database?


Remember salts don't need to be secret to do their job. The goal is to change the algorithm slightly (by adding additional input) for each user. That means you can't mass-precompute (rainbow tables), and just look up what matches, you have to break each user individually.

Your reasoning about how salts work is correct.

There's also something called a pepper which is another additional bit of input data, that is only stored in the app code (fixed for entire app). So an attacker who only manages to get a database dump would need to guess yet another chunk of data (making it near impossible). So a well-seasoned hash would be SLOW_HASH(pepper+salt+password).

Security is all about layers. Each layer protects a bit more, or prevents things from being easy for the attacker.

Edit: Don't do this yourself. Know it for the theory part - but then just use a well-vetted library to do it.
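
Purely for the theory part, a rough sketch of SLOW_HASH(pepper+salt+password) using only the Python standard library. PBKDF2 stands in for the slow hash here, and the pepper value, iteration count and function names are all assumptions made up for illustration, not a recommended configuration.

```python
import hashlib
import hmac
import os

# Hypothetical app-wide pepper: lives in app code/config, never in the database.
PEPPER = b"app-wide-secret-kept-out-of-the-database"
ITERATIONS = 100_000  # illustrative work factor only

def hash_password(password, salt=None):
    """Return (salt, digest). The salt is stored with the digest; the pepper is not."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", PEPPER + password.encode(), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password, salt, expected):
    _, digest = hash_password(password, salt)
    # Constant-time comparison to avoid leaking where the mismatch is.
    return hmac.compare_digest(digest, expected)

# Two users with the same password end up with different stored hashes.
salt_a, hash_a = hash_password("linkedin")
salt_b, hash_b = hash_password("linkedin")
print(hash_a != hash_b, verify_password("linkedin", salt_a, hash_a))
```

Because the salts differ, identical passwords no longer produce identical rows, and a database dump alone is missing the pepper.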


Please refer to my comment above. You can precompute a rainbow table if you know the username (trivial) and the method of hashing[1]. Whilst usernames as salts would increase security over no salt, it results in a potential exploit / vulnerability that would not exist if the salt was truly random. Hence, suggesting the use of usernames as salts is not wise.

[1]: http://en.wikipedia.org/wiki/Kerckhoffs%27s_principle


I read cschneid's comment twice, and nowhere do I see where he or she specifically recommends using the username as a salt; he or she simply recapitulates the logic behind using a unique salt value for each stored hash, and describes using an additional non-unique value which is not stored with the passwords ("pepper"), which is a new and interesting idea, at least to me.


Re: pepper - The devise plugin for Rails uses it. The idea is that the attacker must now steal both the app code AND database, which are often on separate servers.

Just make their life harder.


It would make it a lot easier for LinkedIn to identify whose hashes were leaked because with a salt, all passwords would be unique. It would also make rainbow tables useless.

But in this day and age, the bigger problem is how fast you can compute the hashes, salt or no. With GPUs you can calculate a few hundred million (depending on the hashing algorithm) per second, making the algorithm used the real vulnerability.

Best practice involves increasing the calculation time of your algorithm. Theoretically, you could just rehash a few thousand times in a loop, throwing in a salt here and there, but practically, you should just use bcrypt or scrypt.


A few hundred million? Try in the billions. Like 33.1 Billion/s for md5. http://blog.zorinaq.com/?e=42

This is why you don't use really fast hashes for passwords and you iterate (key stretch). Bcrypt like you said.


Please don't downvote posts like the parent. It's a legitimate comment, asking a question, if you have something to say please reply.


In a password hashing scheme with a salt, you're supposed to consider everything except the cleartext to be public, for the purposes of analysis. The password should be unrecoverable even if the attacker knows the algorithm and any salts.


It's true that that would be an improvement, however we try to avoid discussing things like that seriously because of the risk that someone new to the game will actually try to do it. The easy answer is to use an out-of-the-box secure password strategy, anything else is adolescent.


We've just checked everyone's passwords around the office. One of them was in the list, and he has accessed the site the past month.


Could be that he shared a password with another account that hasn't? Wishful thinking most likely.


Regarding requiring users to log in; wouldn't it be better to run their current hash through another password hashing scheme (while we're at it bcrypt, scrypt, PBKDF, etc)? Then, the next time they log in, verify them by running their password through the old algorithm, and the result through the new one.


That could be a good transition strategy if you're worried about being compromised before all your users have logged in again, but you would still want to move them over to using just the new system when they do. It probably would be fine, but when it comes to crypto you don't take chances when you don't have to.


Yep. Here is a treatment of that: https://gist.github.com/1051238


>> Even if they started with an unsalted password system, users can be migrated to the newer more secure system on next login.

In thinking about this, I wonder if in that scenario you'd even have to wait until next login. You could just use the weak hash as the input to your salted hash function and keep a flag of whether or not you need to 'pre-hash' the password before using your v2.0 salted hash. As users log in you could slowly replace the double hashed entries with single salted hash versions and flip the flag.
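
A rough sketch of that flag-based migration, assuming the legacy store holds raw unsalted SHA1 digests and using PBKDF2 from the standard library as a stand-in for the "v2.0" hash. The record layout and function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor

def legacy_sha1(password: str) -> bytes:
    return hashlib.sha1(password.encode()).digest()

def slow_hash(data: bytes, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", data, salt, ITERATIONS)

def wrap_legacy(record):
    """Upgrade a stored SHA1 right away, without needing the user's plaintext."""
    salt = os.urandom(16)
    return {"salt": salt, "hash": slow_hash(record["sha1"], salt), "prehash": True}

def check_and_upgrade(record, password):
    """Verify a login; on success, drop the SHA1 layer if it is still there."""
    data = legacy_sha1(password) if record["prehash"] else password.encode()
    if not hmac.compare_digest(slow_hash(data, record["salt"]), record["hash"]):
        return False
    if record["prehash"]:
        # The plaintext is available at login, so re-hash it directly.
        record["salt"] = os.urandom(16)
        record["hash"] = slow_hash(password.encode(), record["salt"])
        record["prehash"] = False  # flip the flag
    return True

old_row = {"sha1": legacy_sha1("secret")}
user = wrap_legacy(old_row)
print(check_and_upgrade(user, "secret"), user["prehash"])
```

The batch step (`wrap_legacy`) can run over the whole table at once, so no unsalted SHA1 needs to stay on disk while waiting for users to come back.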


What do you recommend users do instead? Unfortunately there will probably always be websites storing passwords in unsecure ways. I mean I'd certainly rather not have to deal with the hassle (however small) of using LastPass, but as you said, that's the world we live in. Hoping for competence by the writers/maintainers of websites is also flawed computer security, is it not?


Hoping for competence is indeed flawed from both sides. I would hope users use distinct, random passwords for each site they visit and that developers store those passwords in a safe secure way. I also assume both sides won't listen to logic however :)

The reason I'm annoyed with this particularly is that larger sites are more likely targets due simply to their size. Larger sites generally have the developer resources to provide a good solution to the problem from their end but commonly don't.

This makes them look bad and means their users are left in more danger than before. No-one wins.


Perhaps the worst part is that they either didn't know about the breach (likely), or didn't tell anyone (hopefully not).


They just tweeted this: https://twitter.com/LinkedIn/status/210356987576324096 - "Our team is currently looking into reports of stolen passwords. Stay tuned for more."


And a follow-up: "Our team continues to investigate, but at this time, we're still unable to confirm that any security breach has occurred. Stay tuned here."

https://twitter.com/LinkedIn/status/210390233076875264


If people are finding their unique password's hashes in the database, that's pretty damning evidence that a security breach has occurred.


Am I glad that I use LastPass and have a different, 12-character password for every service?

Why, yes, yes, I am. I've now changed my LinkedIn password, too, just in case.


What's kept me away from such solutions are these questions: How can you trust one service with all your passwords? What if their configuration has a vulnerability?


KeePass works well too - open source, offline solution that has an "Autotype" function. I actually only run into passwords that are a pain on mobile devices. Now that my Android phone has no keyboard but tons of power, that's becoming more and more significant.


I use keepass too. I keep my database in dropbox and use the android dropbox and keepass clients on my android. Logging into an app or website involves opening dropbox, clicking on the database[1], entering my password, choosing the site, and clicking on "copy password to clipboard." It's a few extra steps, but it's not that much of a hassle.

[1] I find this easier than opening keepass and selecting the database from dropbox for some reason that might be as simple as dropbox having an easier to spot icon.


You can also use the favorite feature on Dropbox to keep a fresh copy of the database on your phone and have KeePassDroid remember that location. Then your flow is 1) open KeePassDroid 2) enter password 3) select site 4) copy/paste


You know there's a KeePass app for Android right? I sync my KeePass db between Windows, Linux, and my Android phone using DropBox. Works great.


The enter (long alphanumeric and symbols) password/copy/paste/switch window was a little clunky in Android 2.2. Little better in ICS, so need to get back to using this.


One more KeePass user here (actually KeePassX). But I'm using it only for not my own passwords, provided by others and so on.

For my personal ones I'm keeping few algorithms in my brains. I'm using resource type (website/some server/device) and name (e.g. domain/model) as variables and after few steps in my head I always have different password for each kind of service.


Use open-source tools such as SHA1-Pass. The passwords it generates can be recreated with openssl and any other standard crypto library.

Edit: I wrote SHA1-Pass, so I'm biased, but I know what you mean about having trust issues with closed-source password tools. That's one of the reasons I wrote it.


I use open source tools such as "pwgen", "emacs" and "gpg". Open up the encrypted file in the editor, type your pass phrase if you haven't this session, cut and paste, close file. The built-in keyboard navigability makes this faster than everything but the in-browser form filling.


You might consider renaming it. I've been looking for several minutes and can't find it via that name.

Is it this: http://manpages.ubuntu.com/manpages/natty/man1/sha1pass.1.ht... I don't see how you would use this the same way you'd use the other tools mentioned here. I can imagine a way, but it's nowhere near as convenient and still has its own major usability problems.


I've always wondered this about services like lastpass.

What stops you from being hacked or keylogged, and the attacker exfiltrating all your long, complex passwords?


Nothing, really. However, I trust the LastPass guys to keep their shit secure as much as I trust myself to keep my own system secure.

After all, if my own system is compromised, I just get a lot of hassle. If LastPass ever gets hacked and leaks their passwords, they lose their business overnight. That's pretty good motivation for them to keep on top of their stuff.

I used to use 1Passwd, which stored the passwords in a local file, and that could be said to be marginally more secure, except that it generally uses something like iCloud or Dropbox to sync the passwords, so there's still a single point of failure... The main reason I moved away from 1Password was that they gave me a shitty response when I asked them if they were going to support Chrome. I decided at that point that I didn't want to give them my money anymore, and so I didn't upgrade to 1Password 3.


The big difference between "hosted service" and "encrypted file in the cloud" is that the hosted service has, by definition, to store the key next to the lock to be practical.

The key for your encrypted file stays in your head (and/or in your wallet), so even with a full-on total breach of Dropbox/iCloud, your key is safe, and 8 million rounds of 256-bit AES and a good password (my current KeePass settings) is still unbreakable[1].

1: Unless (perhaps) you have the attention of certain governments. And they always have the option of using a $5 wrench on you, anyway.


As far as I know, LastPass does not "store the key next to the lock."[1] The browser extension encrypts/decrypts locally. If you use your password file through the web site you're still downloading your encrypted DB from them and encrypting/decrypting locally (whether with the extension, or I believe they also have a pure JS implementation).

[1] Or so they say. I've never MITMed their SSL, and their software is not open source AFAIK. This is not to say someone couldn't e.g. distribute a trojaned version of their browser extensions. If you poke around the developer(s) have at least revealed the encryption method for your DB so you can verify how it is encrypted for yourself, which is a good sign if nothing else.


Why can't the hosted service use an "encrypted file in the cloud" as its implementation? As long as it requires client-side code to do the decryption, the key stays in your head alone.


I believe this is exactly how LastPass is implemented.


Ugh, you're right. Well, then there's no discernible difference between LastPass and KeePass with the DB on Dropbox.


> except that it generally uses something like iCloud or Dropbox to sync the passwords, so there's still a single point of failure

No. This is the strength of two-factor authentication, something you know, and something you have. If someone gets your 1Password keyfile, it's useless without your decrypting password.


I use 1Password, rather than lastpass. On that system, your password file is stored locally by default, so there isn't a centralized password store to attack. If you do syncing of passwords between machines, you keep an encrypted password file in your dropbox account.


I think it's a risk with a solution like this, but much less of a risk of having to remember all these passwords myself (a practice which tends to devolve to re-using passwords).


Then use 1Password - it's much more nicely designed and you don't have to trust their servers too much


This is why I use 1password and not LastPass - the encrypted password file is stored locally - optionally in Dropbox, which is what enables mobile and remote (http online through Dropbox) to work.

Works excellently!


LastPass encrypts your passwords using your master password as (at least part of) the key. This means that they do decryption of passwords client-side as well. The entire password file is not stored locally but they had an intrusion of some sort a number of months back which demonstrated that they have a pretty good system set up along with quite a bit of monitoring. Truecrypt in dropbox is obviously a good choice if you're super paranoid but after seeing LastPass respond to security really well and it having an overall pretty simple UX, I don't have any reason to not recommend it.


The LastPass UX is anything but simple


I use KeePass right now synced with Dropbox - what keeps me up at night is the fact that if the bad guys got my password file today, there could turn out to be a vulnerability in it discovered years from now that could allow them to get my password.


At least you're going to have years to go through your database and change all your passwords.


I'm even happier I don't have a LinkedIn account.


I've been tempted to delete mine several times recently. Looks like now is the moment.


You're free to hit "delete" on linkedin, but there's a very high likelihood that it will only mean "hide my profile". Anyone who got your user/pass would probably be able to reinstantiate your account and do anything to it they wanted.


agree; don't just delete, suicidemachine.org


I took the step of markedly decreasing the information on my current legit profile. It includes my name and general title, but no job history. Public disclosures of connections, etc., are highly limited.

Having a fictional LinkedIn account can be amusing.


I'm worried in a few years LastPass could become a target, and now instead of someone having a password that 'could' be shared among your multiple accounts, you have now given the complete keys to the city by listing all of your logons great and small in a central repository.

This central repository then becomes a very appealing target.

I say this as a LastPass user, as I think it is the best of the current offerings, but I'm uncertain how to shield this huge central list. I wish it had multiple logon PW so that you could at least segment the risk and reduce the time the high PW is used to when you really need it.


It saddens me that every, single, time this topic comes up, HackerNews, of all places, displays an immense lack of knowledge of current password storage applications, how they work and what value they bring.

I think it's really humorous that people feel safe putting an encrypted file in something like Dropbox, but don't trust LastPass (who are doing the exact same thing, everything is local, client side encryption). Especially when you're missing out on all of the benefits of browser integration.

Please, take a whole 3 minutes and do a tiny bit of research. Your future self will thank you when people like swombat and myself get to laugh at LinkedIn, change our passwords and never think about it again.


I think the difference you're missing is that LastPass offers the OnlineVault option.

I much prefer the security of being in control of my file, and having its online option controlled by someone else (Dropbox); and logging into Dropbox to then see my passwords 'online' on the go.

If Lastpass.com is compromised, the attacker can MitM compromise my credentials. If 1Password.com is compromised, that is not the case. (Yes, if Dropbox is compromised, they could capture my dropbox credentials, but it would be more difficult for them to then capture my 1password credentials)

Ref: LastPass Online Vault: http://helpdesk.lastpass.com/full.php 1Password Anywhere: http://help.agile.ws/1Password3/1passwordanywhere.html Services I use, and why: http://www.mikeschroll.com/blog/2011/12/07/services-i-use-an...


>I much prefer the security of being in control of my file, and having its online option controlled by someone else (Dropbox); and logging into Dropbox to then see my passwords 'online' on the go.

You can't even do that. You have to install a local client. Download the file, open it in your new client, edit it, manually reupload it. If you don't want to use the on-web LastPass vault, then don't, but it's still doing local decryption and you can still use the signed Chrome extensions to carry out ops if you don't trust LastPass.com proper.

>If Lastpass.com is compromised, the attacker can MitM compromise my credentials.

Which part of "local, client-side encryption" is confusing?

edit: 1PassAnywhere is the exact same thing as what LastPass is doing with its LastPass.com-served Vault.

edit2: There's even multifactor auth available for it and the Online Vault feature.


I apologise for my immense lack of knowledge of current password storage applications (i'm not a programmer and come here for the other stuff), but what is the benefit of these services (lastpass etc)? This is a genuine question.

It seems to me that instead of having several passwords in my head (i can remember random long strings of characters pretty well, and have a hierarchy of randomness/longness depending on what I care about), I only have to remember one. But if that one's compromised, aren't all the rest then available?

Reminds me of the bit in hitchhikers guide to the galaxy (life the universe and everything i think) where passwords and biometrics etc had become really difficult and secure, so a datacube thing was created to store them all. Which was then found by a character before hilarity ensued.

thanks


If someone has access to:

1. Your physical machine, or the LastPass/Dropbox server.

2. Your master password

3. (optionally) a second-factor auth source

Then yes, they have access to all your passwords. But this is vastly superior to having one password that alone compromised grants access to all of your accounts, right?

I mean, the most secure way imaginable would be perfect biometric signatures, or humans smart enough that they could perform asymmetric encryption in their heads to sign challenges in a verifiable manner. Outside of that, this is decentish.

You could use a text file in a Truecrypt volume with keys that are stored on separate jumpdrives (but what if someone compromises a machine that you plug those drives into), etc, etc.


> - With a computer of 8,000 NOK (~ 1400 USD), you can do a few hundred million attempts per second.

Are you kidding me? LinkedIn stored their passwords using (salted) SHA1 using no iterations? Jesus.


To expand on that, to store passwords don't just use salt+sha1, or try to do your own nested sha1, just use bcrypt: http://en.wikipedia.org/wiki/Bcrypt


Better still, use scrypt. HN's very own @cperciva wrote it.

http://www.tarsnap.com/scrypt.html

It requires a lot more memory to brute-force it, thereby defeating any speed gains from parallelism.


The reason people are going to use bcrypt is that there are more likely going to be bcrypt implementations in their given language: http://stackoverflow.com/questions/10149554/are-there-any-ph...

I just had to make this choice a few days ago and bcrypt seemed like the best option with working PHP implementations. And I sure as hell am not going to try to roll my own.


The responses to that stack overflow question make me want to punch somebody in the face!

The #1 google response for info about Scrypt for php now points to an article arguing about the semantics of the question with no answer. Classic!


THANK YOU

"use bcrypt" has become an HN meme, with all the bad implications of it

As if scrypt, pbkdf2 didn't exist. Or as if bcrypt has always existed and doesn't have any weakness


Please stop stirring up drama about this issue. While you are technically incorrect (PBKDF2-SHA1 is faster than and thus inferior to bcrypt), it's irrelevant: all three of [scrypt, bcrypt, PBKDF2] are just fine, and you can safely pick one at random.


If a database of bcrypted passwords from LNKD had been leaked, we'd be having a totally different conversation right now. (Same, of course, with scrypt etc.)


Like you can't break bcrypt...

The weakest link, either in bcrypt or MD5 is the password quality.

Of course, in pure MD5 today you're a google search away and modern computers can eat salted MD5 for breakfast

But the easiest passwords are going to be broken first


Am not a cryptographer by any means, so please correct me if I'm wrong:

If you use any reasonable cost for bcrypt, you're talking hundreds of milliseconds per attempt on a modern CPU. For each 6-character password (since you can't generate a rainbow table) at 100ms per pop, you're talking about something on the order of 2+ years per password divided by the number of CPUs. With something like 900 CPUs running continuously, you could expect to recover one 6-char every day if the passwords were randomly distributed in the 6-char alphanumeric space. So, pretty feasible, assuming a 100ms cost. Short passwords do hurt you; I agree.

Now for 8-char alphanumeric passwords, you'd have to run ~1 million CPUs continuously to expect to recover one per day at a 100ms-per-pop cost. This is more of a stretch, assuming you're trying to do this with, e.g., botnets. It seems that someone asking for help cracking a password list on a forum would probably not be able to assemble this much computing power.

Or 1 billion CPUs continuously to expect to recover one 10-char alphanumeric password per day.

Of course, the assumption of random alphanumerics is wrong, both because many people will use common passwords and because others will use non-alphanumeric character substitution.

At any rate, it seems to me that leaking non-salted SHA1 hashes is virtually the worst case disaster scenario, short of plaintext passwords.
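
A quick way to sanity-check those estimates. This is only a back-of-the-envelope sketch: the 36-symbol alphabet (lowercase letters plus digits) and the "half the keyspace on average" rule are assumptions, picked because they land in the same ballpark as the figures above.

```python
SECONDS_PER_DAY = 60 * 60 * 24
COST = 0.1      # seconds per bcrypt guess, the 100ms figure used above
ALPHABET = 36   # assumption: lowercase letters plus digits

def cpus_for_one_per_day(length, alphabet=ALPHABET, cost=COST):
    """CPUs that must run continuously to expect one recovered password per day."""
    expected_guesses = alphabet ** length / 2  # on average, half the keyspace
    cpu_seconds = expected_guesses * cost
    return cpu_seconds / SECONDS_PER_DAY

for n in (6, 8, 10):
    print(f"{n} chars: about {cpus_for_one_per_day(n):,.0f} CPUs")
```

Under these assumptions the results come out at the thousands, millions and billions scale for 6, 8 and 10 characters respectively. A 62-symbol mixed-case alphabet pushes every figure up considerably.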


I didn't do the math but it sounds right.

But suppose tomorrow it takes 10ms. Also, tomorrow, available spaces will increase, so the likelihood of a space vs time tradeoff (even partial) increases

WEP was considered "good enough" at first (even though it had obvious problems at first like key size), WPA was considered unbreakable at first, today it's feasible with cloud computing or GPUs.

And then we'll be complaining on HN that they didn't use xyzcrypt or something instead of bcrypt.

" it seems to me that leaking non-salted SHA1 hashes is virtually the worst case disaster scenario"

Yes. Salt is password storage 101!


The time bcrypt takes is configurable, so in the future you can adjust the amount of work per password -- this is literally a one-character change in your code -- and be alright again. Ditto for the rest of the decent password hashing schemes.
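
To show the shape of that idea without pulling in a bcrypt library, here is a sketch that uses PBKDF2 from the Python standard library instead. The "iterations$salt$hash" storage format and the helper names are made up for illustration; bcrypt keeps its cost factor inside the hash string in a similar spirit.

```python
import hashlib
import hmac
import os

def make_hash(password, iterations):
    """Store the work factor and salt alongside the digest (hypothetical format)."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"

def verify(password, stored):
    # The stored record says how much work to redo, so old hashes keep working.
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)

def needs_rehash(stored, current_iterations):
    """True if the record was created with a lower work factor than today's."""
    return int(stored.split("$")[0]) < current_iterations

old = make_hash("secret", 10_000)    # created years ago
new = make_hash("secret", 100_000)   # created after raising the setting
print(verify("secret", old), needs_rehash(old, 100_000), needs_rehash(new, 100_000))
```

Raising the work factor is then a change to one constant, and records flagged by `needs_rehash` can be upgraded the next time the user logs in successfully.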


True, but you can't really bet on that

Sure, you can increase the work, but you'll still be limited by bcrypt size

Otherwise, you could just MD5 hash stuff X times and be done with it

Sure, bcrypt today is very safe, but I wouldn't be surprised if attacks are found today (even if they rely only on bruteforce)

And let's not forget implementation issues that may happen in specific bcrypt libraries


I think you are propagating the myth that a scheme can be secure forever.

It's ok if WPA is breakable with cloud computing, because the whole point was to secure it for the next X years so that it takes more than Y dollars to break it. You only need to protect million dollar data enough that it costs 10 million dollars to get it.

If the data is valuable enough and protected heavily enough with crypto, the cheapest way to get it is through a meatspace attack (break-in, abduction, etc).

> WEP was considered "good enough"

Not by security professionals once they saw the effective size of the key. It's the downgrading of what looked like a 64bit key into a 48bit key that was the biggest problem.


You are not wrong at all.


The math doesn't sound right. Google allows any ASCII character for their passwords, which is 95 chars. I calculate 2330 years to crack each password. Did I get something wrong?

(95^6 * 0.1 sec per hash) / (60 sec * 60 min * 24 hrs * 365 days)

The key difference is bcrypt does ~10 hash/sec. A GPU-enabled password cracking machine can do over 500 million hashes per second. That generates a rainbow table in ~30 minutes.
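
The same arithmetic as a few lines of Python, using only the figures quoted above (95 printable ASCII characters, 6-character passwords, 0.1 s per bcrypt hash, 500 million hashes per second on a GPU rig). It covers the full keyspace on a single CPU, not the average case.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

keyspace = 95 ** 6  # every 6-character string over printable ASCII

# bcrypt at roughly 10 hashes per second on one CPU
bcrypt_years = keyspace * 0.1 / SECONDS_PER_YEAR

# fast unsalted hash on a GPU rig at 500 million hashes per second
gpu_minutes = keyspace / 500e6 / 60

print(f"bcrypt: about {bcrypt_years:,.0f} years")
print(f"GPU:    about {gpu_minutes:.1f} minutes")
```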


These hashes were posted on a forum as a plea for help: the guy did not have enough computational power to crack them all on his own. Had they been salted bcrypt hashes, it might have actually discouraged him to the point of not even trying.

So yeah, the weakest passwords will always fall, but good solutions will go to great length to protect even the most clueless of users.


What weakness does bcrypt have?


I wonder, why do people saying "just use bcrypt" never, ever bother to elaborate on what benefits it has, and which of them are relevant to the subject of the conversation? Believing in some function without understanding implications of its use does very little for real security.


Bcrypt does not require your understanding. The most important thing is that you use a strong password hashing method -- of which bcrypt is the best-known, and an excellent choice. For a basic level of understanding, here's a slightly exasperated blog post that a lot of people link to:

http://codahale.com/how-to-safely-store-a-password/


Because the answer to your question is one Google search away. HN people are tired of explaining it every single time bcrypt comes up.


Also, there's already an answer in the thread http://news.ycombinator.com/item?id=4073839


It's not an in-depth answer. It does not say, for example, why bcrypt is more secure than nested SHA1. (I believe it has to do with the possibility to efficiently implement SHA algorithms in GPUs.)

People are using unsalted SHA1, because someone told them in the past "just use sha1". Now someone else tells them "just use BCrypt". Without understanding why, it's nearly impossible to decide which security policy is sensible. There are many different types of advice competing for attention, and not all of them are good.


Somebody once said fire was composed of phlogistons. Later, different people said that fire was instead a process of decomposing fuel molecules and a release of visible light due to the energy of the chemical chain reactions taking place inside the flame.

The guy who said "phlogistons" was wrong. So was "just use SHA1" guy.



I wonder why people who make this complaint never ever bother to google: "why use bcrypt". It's like they somehow forget they have the best magical oracle to answer questions at their fingertips, which can answer the question better than most people who understand bcrypt could.


Could you explain what you mean with "nested sha1" - hashing it twice? How is this safer than sha1 + a good salt?


stef25, this is known as key stretching, as others have already explained elsewhere in this thread. Essentially the idea is to make computing the final hash of the password slower by iterating the hash function many times.

This additional slowdown is unlikely to be noticed by a user during an interactive login (hashing the password may take 1ms instead of 1us -- an imperceptible difference to a human) but it dramatically slows down the speed at which an attacker can compute hashes to try and recover the password for a leaked hash. It also increases the amount of storage space required for (a naive implementation of) a rainbow table since the attacker would need to store the output for 1, 2, ..., n iterations of the hash function.
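A minimal sketch of the iteration idea, for anyone who wants to see it in code. `stretched_sha1` is a made-up helper that shows the naive loop only; the iteration count of 1000 and the sample password are arbitrary assumptions. In practice you would reach for a vetted construction such as PBKDF2, which Python's standard library exposes as `hashlib.pbkdf2_hmac`, or bcrypt/scrypt as recommended elsewhere in the thread.

```python
import hashlib
import os

def stretched_sha1(password: bytes, salt: bytes, iterations: int) -> bytes:
    """Naive key stretching: feed the digest back into SHA-1 repeatedly."""
    digest = hashlib.sha1(salt + password).digest()
    for _ in range(iterations - 1):
        digest = hashlib.sha1(digest + salt + password).digest()
    return digest

salt = os.urandom(16)                      # random per-user salt
naive = stretched_sha1(b"hunter2", salt, 1000)

# The standard library's vetted version of the same idea (PBKDF2-HMAC-SHA1).
vetted = hashlib.pbkdf2_hmac("sha1", b"hunter2", salt, 1000)
```

Every guess now costs the attacker 1000 hash computations instead of one, while a single login still finishes far faster than a user can notice.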


I'm not familiar with iterations, anybody care to clue me in? I would have thought salted sha-1 would be decent for password hashing, though not the most solid possible, but at least not laughable. Is that not the case?


It is not. Sha1 is designed to be fast. You want your password hash function to be slow, so that an attacker has to spend as much resources as possible to brute force it.

Of course, it does not mean you should take a slow implementation of a fast hash. You need a hash that, when implemented to be as fast as possible, still is pretty slow.


good to know, thanks


If you use the ASP.NET Membership that's what you get, unless you do something custom.

Edit: they are salted though.


No they didn't salt them either haha!


I've just downloaded the database linked and it only contains the hashed passwords, not the account usernames / e-mail addresses.

I wonder if someone has the account details to match up otherwise you've no idea which password belongs to who, and you'd hope that LinkedIn would have lockout functionality.


Keep in mind that whoever leaked the hashes is probably keeping the usernames / emails for themselves. The forum in question doesn't allow posting of user-identifiable information according to the forum guidelines.

The leaked hashes seem to be SHA-1. I've also confirmed that the hash of my own (semi-complex) LinkedIn password is in the list. Coincidentally this is the same password as I had for HN and that I've now changed (phew! THAT'd been bad! :-)


Doesn't this imply that LinkedIn doesn't salt the password prior to storing it? If so, a good chunk of those passwords will be in a rainbow table.


Yes. The hash I calculated was without a salt (the same way you generate a hash on sites like http://darrenfauth.com/generators/sha1)


You can get reflected XSS in that field. Paste "<script>alert('XSS')</script>" in the "Value to sha1" input box.

Darren, you should check out output encoding.


With these sorts of simple hashes, you don't need rainbow tables when you have a few GPUs and OCLHashcat.


It would still take a moderate amount of time for a single password if it's long and complex -- you're essentially generating the rainbow table. You might as well just download a sha1 rainbow table and perform an O(1) lookup. You could reverse all the 6.5M password hashes in mere seconds.


Actually, for a large enough list of unsalted password hashes, bruteforcing is faster than rainbow tables:

- a rainbow table may require a constant amount of time to reverse 1 hash, but it has to be repeated N times for N passwords.

- when bruteforcing, a password candidate can be checked against N hashes in a constant amount of time (look up the candidate hash in a hash table)

For example if it takes 10 minutes to look up a hash in a very large rainbow table (such as the A5/1 GSM tables published a few years ago), it would take 123 years to attempt to reverse these 6.5M hashes. On the other hand, millions of the leaked SHA1 hashes can be cracked in mere hours on a GPU with oclhashcat which tests billions of candidate hashes per second.
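The second bullet can be sketched in a few lines: each candidate is hashed once and then tested against the whole leak with a single set lookup, so the cost per candidate does not grow with the number of leaked hashes. The function name and the toy word list are made up for illustration, and a real cracker would do this on a GPU rather than in Python.

```python
import hashlib

def crack(leaked_hashes, candidates):
    """Hash each candidate once and test it against every leaked hash at once."""
    leaked = set(leaked_hashes)          # average O(1) membership test
    found = {}
    for word in candidates:
        digest = hashlib.sha1(word.encode()).hexdigest()
        if digest in leaked:
            found[digest] = word
    return found

# Toy "leak": unsalted SHA-1 of two words.
leak = [hashlib.sha1(w).hexdigest() for w in (b"password", b"linkedin")]
result = crack(leak, ["secret", "linkedin", "password", "qwerty"])
print(result)
```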


true, for extremely large rainbow tables. SHA1 tables are around 20-60GB depending on how large your base character set is. If you shoved all this data into a giant database, query speed is still under a few milliseconds. In general, rainbow tables can be sharded fairly easily, so if your data set is a few hundred terabytes, just split it across a few machines and you'll retain the millisecond query times. Storing and querying easily partitioned data will usually be faster than a brute force calculation.

Calculating it is like saying you want to find the fibonacci number for any given N, and you have a really fast processor to calculate it to that N, but if you just persisted pre-calculated values up to C, you'd only need to calculate N-C hashes. So even if you are bruteforcing the password, it is still faster to have rainbow tables up to a certain length.


What I say is true for any size of rainbow table. It seems you forget that RT lookups require CPU resources in addition to mere I/O resources. There is always a number of hashes beyond which brute forcing them is faster than RTs. Sometimes this number is very high (billions of hashes), sometimes it is lower (thousands of hashes). It depends on many factors: RT chain length, speed of the H() and R() functions, speed of the brute forcing implementation, etc.

To take your example of a small SHA1 rainbow table of 20GB, assuming it has a chain length of 40k, looking up a hash in it will require on average 200M calls to the SHA1 compression function (assuming a successful lookup). A modern CPU core can do about 5M calls per second. Therefore looking up one hash will take at least 40 sec, and looking up these 6.5M LinkedIn hashes would take 8.2 years! (This is just counting CPU time, I assume the RT is loaded in RAM for a negligible I/O access time to its data.) A RT of this size would cover a password space of about 2^44. For comparison a decent GPU can brute force this many hashes concurrently at a speed of roughly 500M per second (see oclhashcat perf numbers on an HD 7970). Covering the same password space would take only 9.8 hours. Compare 8.2 years vs. 9.8 hours: obviously the LinkedIn hashes that have been cracked so far have been brute forced, not looked up in RTs!

And even if you leveraged GPUs to perform RT lookups, they would speed up the computations by roughly a factor 100x, reducing the 8.2 years down to 30 days, still unable to match the short 9.8-hour brute forcing session. (My friend Bitweasil is doing research on GPU-accelerated rainbow tables, see cryptohaze.com)
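The numbers in this comment can be reproduced with a short sketch. Every constant below is a figure quoted in the comment itself (chain cost, CPU rate, GPU rate, the 100x speed-up); none of them are measurements of mine.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

hashes_to_reverse = 6_500_000
sha1_calls_per_lookup = 200_000_000   # quoted average for a 40k chain length
cpu_calls_per_second = 5_000_000      # one CPU core, per the comment
gpu_hashes_per_second = 500_000_000   # brute-force rate, per the comment
password_space = 2 ** 44              # coverage of the 20GB table, per the comment

seconds_per_lookup = sha1_calls_per_lookup / cpu_calls_per_second
rt_years = hashes_to_reverse * seconds_per_lookup / SECONDS_PER_YEAR
brute_hours = password_space / gpu_hashes_per_second / 3600
rt_gpu_days = rt_years * 365 / 100    # assumed 100x GPU speed-up of the lookups

print(f"{seconds_per_lookup:.0f} s per lookup, {rt_years:.1f} years for all hashes")
print(f"{brute_hours:.1f} hours to brute force, {rt_gpu_days:.0f} days for GPU lookups")
```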


As a more general question: why is it not an industry standard to salt with the username/email in addition to the random key? (i.e. Sha1($salt + $email + $password)). Even if the random salt were excluded, I would think that this is much more secure. Existing rainbow tables would not be anywhere near as helpful, and attempts to generate a rainbow table for a specific salted database would be ineffective because the salt changes on a per-user basis.


The solution is to use a better method of storing passwords. Hashes like SHA1 are designed to be really fast (great for hashing data but also great if you want to brute force).

I think this is a pretty good overview: http://codahale.com/how-to-safely-store-a-password/


Then the password has to be updated whenever your email changes. I believe Amazon does it like that, literally "forking" whenever you change password; at one point it was possible to simply log on with the old password and live an "alternate reality" where all changes you'd done after changing pwd had not been applied. Don't know if it's still the case today.


Why would you use the email? Mostly when passwords/usernames are stolen the email is there too. For my site I have a unique 128-bit token for every user. I also have a 128-bit site_key (which is in the application, not db) and mix those with the password and then hash.


The economics of password crackers changed and rainbow tables are pretty much obsolete nowadays. See http://www.codinghorror.com/blog/2012/04/speed-hashing.html section "What about rainbow tables?".


Interesting - I wasn't able to find the hashes of any passwords in the list. What list were you using?


The rar with ~100k cracked passwords in it. If you tried to find your own, perhaps you're one of the ~144 million accounts that wasn't published?

Edit: I'm not sure I understand what you mean - there was 100k passwords in one file, already cracked, and another with all 6.5M hashes. I found my hash in the hashes file.


Ah, I have the 6.5M file. Not sure why I'm not finding stuff from my wordlist in it, but I do see things from e.g. https://twitter.com/mikko/status/210341669944573955. Sorry for the confusion!


Oddly, mine isn't in the leak despite the fact that I just logged in with it.


LinkedIn could easily match each hash to a user. Then they should lock each of those accounts and force them to change their password.


Which should be done, but which doesn't help those users where it matters most; the real value of this database is that some people (~everyone) reuse passwords across sites.


And send them a note too, sure. They've got their e-mail addresses as well so a note of apology and warning is certainly in order.


From the looks of it, the data dump may be all accounts - since there seems to be no salt, and many people use same passwords...


You can use it for checking whether your password was leaked. You don't need usernames for that.


Are the hashed passwords not salted?


You can perform this check even if they were salted.

Otherwise how could linkedin check if you correctly entered your password?

The salt is stored in cleartext as part of the hashed password, so that you can repeat the hashing of the secret and match the two hashes.

The salt improves the security because:

1. even if two users use the same password, you cannot tell that by simply comparing the hashes

2. makes brute force checks much slower because you have to recompute the hash for every hashed password entry rather than once for every dictionary entry

3. Prevents building rainbow tables

(probably other reasons, I'm not a crypto expert)
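A minimal sketch of what "the salt is stored in cleartext next to the hash" looks like. The `salt$hash` record format and the function names are assumptions made for illustration, and salted SHA-1 is used only to keep the example short; the thread's advice to use bcrypt, scrypt or PBKDF2 still applies.

```python
import hashlib
import hmac
import os

def make_record(password: str) -> str:
    """Return 'salt$hash'; the salt is stored in cleartext next to the hash."""
    salt = os.urandom(16)
    digest = hashlib.sha1(salt + password.encode()).hexdigest()
    return f"{salt.hex()}${digest}"

def verify(password: str, record: str) -> bool:
    """Re-hash the attempt with the stored salt and compare the digests."""
    salt_hex, stored = record.split("$")
    digest = hashlib.sha1(bytes.fromhex(salt_hex) + password.encode()).hexdigest()
    return hmac.compare_digest(digest, stored)

a = make_record("password")
b = make_record("password")   # same password, different salt, different record
```

Because `a` and `b` differ even though the password is the same, comparing stored hashes no longer reveals shared passwords (point 1), and an attacker has to redo the hashing per record (point 2).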


The salt may have been stored in a separate database table and not distributed with this list (if they were salted, which apparently they aren't).


No. I was just confirming that myself when I saw madsr's comment: http://news.ycombinator.com/item?id=4073454


To get a sense of it, I downloaded it from a link here. Below is the structure of the first few lines. Caveat: it's garbage/useless data below -- I intentionally changed around the actual numbers to give a sense of the structure, only:

  000000a94d47b9cb82ca8a3b492a51263b40a66e
  000000a98a624314892af97c6f1a0635472eae38
  000000a9ba60e7f13fcac444a5a791af7807a3a3
  000000a97ea34e74a97a6d1ce08ebc68d3e9aab2
  000000a9b4b2a3497aaa51e212ac9efdb00aaf4e


The pattern 000000a9 is just in presentation - I counted the occurrences of different bytes in that position (also misled by the apparent pattern, where many lines in a row would have the same 4th byte), and each possible value is present more or less equally often.

It seems like it's just sha1.

EDIT: however, 3.5 million hashes start with 5 zeroes, which is way too many for just coincidence. Possibly they used multiple hash functions?


It appears that the publisher just zeroed out the first 5 hex digits in roughly half of the hashes. The rest of the string still matches known hashes.
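For anyone checking their own password, a small sketch that tests both forms: the plain SHA-1 and the same digest with its first five hex digits replaced by zeroes, as reported elsewhere in this thread. The helper names and the file name in the comment are hypothetical.

```python
import hashlib

def leak_forms(password: str):
    """Return the full SHA-1 hex digest and the variant with 5 leading zeroes."""
    full = hashlib.sha1(password.encode()).hexdigest()
    return full, "00000" + full[5:]

def in_leak(password: str, leaked: set) -> bool:
    full, zeroed = leak_forms(password)
    return full in leaked or zeroed in leaked

# `leaked` would normally be built from the dump, e.g.
#   leaked = {line.strip() for line in open("hashes.txt")}   # hypothetical filename
leaked = {"000001e4c9b93f3f0682250b6cf8331b7ee68fd8"}
print(in_leak("password", leaked))
```

Run it locally rather than pasting a real password into a web form.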


Found this on reddit: http://www.reddit.com/r/netsec/comments/unubl/if_it_turns_ou...

My password is not in there, but some people have already reported finding theirs.


Agreed. That seems rather useless. How would that happen anyway? The usernames stored in a different database/table from the hashes?


They might need help cracking the hashes, keeping the usernames behind for their own exploits.


Or they may use this as an advertisement for selling the actual dataset.


LinkedIn allows you to sign in using any of your verified email addresses, so it seems likely that the usernames are at least stored in a different table.


... but still, a head wag at LinkedIn for using weak hashing, which I'm guessing means MD5.


MD5 isn't the issue - it's the lack of salting. Without a salt, almost any hash can be cracked with a rainbow table. With a salt, you'd need to know the salt for each hash, and then generate a new rainbow table, in order to recover the original password.


This isn't really the issue. The real issue is that MD5 (though these hashes are SHA1, which has the same problem) is too easily computed; it is practically brute-forceable. I don't need a rainbow table to compute hashes when I can slam out millions in short order using a GPU. You have a good point about needing to know the salt, but getting the salt is generally easy because it's usually stored in the same place as the hashes (and this practice is fine, because hiding the salts doesn't improve security significantly on its own).

This is a major reason to use bcrypt.


The difference is that if it's salted you need to work to get a specific password. Without salting you can test a generated hash (rainbow table) against all 6.5 million hashes at the same time.

Not defending the choice - bcrypt is obviously a much better way to go.


The thing is, though, that it's trivial to slam through that set of salted passwords. It's like unsecured Wi-Fi versus WEP: "door unlocked" versus "'No Trespassing' sign."


Let's forget about bcrypt for a second.

What prevents developers from adding a large DB-wide salt (in addition to normal salt) to every password? Wouldn't that prevent bruteforce attacks regardless of the hashing algorithm?


Random nonces have very little to do with what makes SHA1 insecure and bcrypt secure. Developers have a very weird and totally misplaced faith in the ability of random "salts" to secure passwords.


We're speaking about a very specific attack here: bruteforce. And I'm speaking about a very specific type of "salt" (which could probably be called something else, since it's not the same as normal unique-per-password salt): large, database-wide string of random bytes.

If every password is padded with such a string before hashing, computing the hash would be slower. Obviously, it would be slower because you would have to process more data. An interesting question is whether this would also make it less parallelizable by the virtue of having more information than would fit into GPU cache.


None of this makes much sense to me, sorry. Brute-force password cracking has worked on salted passwords since Alec Muffett released Crack in the early '90s. The amount of extra computational power required to hash a password and a salt is negligible.

The only thing "salts" do is prevent rainbow table precomputation, but it's just a quirk of the late '90s and early '00s that "rainbow tables" ever became a mainstream attack method: one bad Microsoft password hash and a series of bad web applications. Long before the MD4 LANMAN hash was ever released, people were breaking salted Unix passwords with off-the-shelf tools, on much, much slower computers than we have now.


Computing a hash on 1MB of data is slower than computing a hash of 6-8 bytes of data. Brute-force attacks are based on trying different passwords and seeing that after being salted they generate the same hash as in the database. Therefore, adding a large string to the password before hashing would force the attacker to hash that string. The question is, can this be pre-computed once or efficiently parallelized?


You're advocating creating a 1MB "salt" string to slow down hashes? That's the same as simply iterating your hash function enough times to invoke the block function repeatedly.

Just use bcrypt, scrypt, or PBKDF2. People have already figured this problem out.


First, I do not advocate anything here. I asked a question.

Second, working with a large string of bits is the same as recursive hashing only if you can pre-compute some small intermediate state of the hash function for that string independently from the password you're trying to guess. If you can't, you would have to work with the entire string for every new password tried.
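Whether that intermediate state can be precomputed depends on where the big string goes. A sketch under the assumption that it is prepended to the password: Python's `hashlib` objects have a `copy()` method, so the shared prefix is hashed once and each guess only pays for its own few bytes. The 1 MB size and the names are hypothetical.

```python
import hashlib
import os

big_salt = os.urandom(1024 * 1024)   # hypothetical 1 MB database-wide string

# Hash the shared prefix once and keep the intermediate state.
prefix_state = hashlib.sha1(big_salt)

def guess_hash(password: bytes) -> str:
    """Hash big_salt + password, reusing the precomputed prefix state."""
    h = prefix_state.copy()
    h.update(password)
    return h.hexdigest()

print(guess_hash(b"password"))
```

If the password were placed before the big string instead, the state after the password would differ for every guess and the whole string would have to be processed each time, which is the case the comment is asking about.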


I answered your question: using a very large "salt" to force a password hash to run more block functions is a bad idea.

Modern password crackers are extremely fast without precomputing anything.


How much slower would you estimate it being?


1MB of data will have 16384 SHA 256 blocks. So that's roughly the slowdown I would expect, minus the time it takes to initialize the algorithm for a particular message.

That's not that interesting by itself, but it is interesting to think about how this would affect computing the hashes on GPUs.
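The block count is easy to verify, taking "1MB" to mean 2**20 bytes (an assumption). `block_size` is a real attribute of `hashlib` hash objects; the padding arithmetic follows the SHA-256 rule of appending one 0x80 byte plus an 8-byte length.

```python
import hashlib

block_size = hashlib.sha256().block_size   # 64 bytes for SHA-256
data_len = 1024 * 1024                     # 1 MB taken as 2**20 bytes
full_blocks = data_len // block_size

# Padding adds at least 9 bytes, which forces one extra block when the
# message length is already a multiple of the block size.
padded_blocks = (data_len + 1 + 8 + block_size - 1) // block_size

print(full_blocks, padded_blocks)
```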


And how high can you crank the work factor for, say, bcrypt?


Is there a significant time difference in computing the SHA1 hash of 40 bytes versus say, 128 bytes?


128 bytes is not "large". I was thinking more along the lines of megabyte+. There is no question that it will slow down hash computations, because you would need to process more data. The question is, can you efficiently parallelize this on commodity hardware (GPUs)?


To be clear, MD5 (or SHA1 as these apparently are) is a problem. Passwords should be stored using a cryptographic hash function that is designed to hash passwords (read: be slow), not a generic cryptographic hash function (which are designed to be fast). This is exactly the problem that bcrypt was created to solve (among others).


I think people are missing the point that SHA2 is light years ahead of MD5. MD5 has had known security flaws for years.

>Do not use the MD5 algorithm: Software developers, Certification Authorities, website owners, and users should avoid using the MD5 algorithm in any capacity.

http://www.kb.cert.org/vuls/id/836068

This is from over 3 years ago.


The security differences between SHA2 and MD5 are irrelevant to the matter at hand. If they were MD5 hashes they'd be broken approximately as quickly and in exactly the same way.


The primary problem with using either as a password hash is their speed.


I agree, but my point is that the "use bcrypt" drum has only been beating for a couple years to my knowledge: http://codahale.com/how-to-safely-store-a-password/

Wind the clock back 3-5 years and it's still stupid to use MD5. I could kind of understand some old code lying around that was less secure.


Still, it doesn't matter. As long as one can generate a rainbow table for the hash function, then password lookups will be a O(1) operation. The rainbow table for md5 is moderately small, sha1 is bigger, and I'm sure sha2 is even bigger than the sha1 table.


I'm discussing SHA-2 vs MD5. I wouldn't use any hash function without a salt.... which makes the discussion of rainbow tables irrelevant.



Good Guy Startup Founder would cross reference this password list with their own password system and force those that match to reauthenticate and change their passwords.

This wouldn't be difficult to do and your users would appreciate it.


Better Guy Startup Founder would be using salted hashes anyway and wouldn't even be able to run a cross-reference.


It's possible to test this when your user re-authenticates, assuming you're not using a challenge-response authentication mechanism (as sadly most sites do not).


Google once forced a password reset for emails/passwords that leaked from a bitcoin forum.


That's easy to do if you have the email addresses, but impossible to do if you only have the SHA-1 hash, as in this case (unless you're also using unsalted SHA-1 hashes, which is a much bigger issue by itself).


Yes, it's technically easy, but it shows everyone how much Google cares.


How would one cross-reference this list unless you're storing the plain text passwords?


You'd do it at login time. User enters user/pwd -> hash with unsalted sha-1, check if in list -> if yes, alert to change / if no, proceed with normal hashing.
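A rough sketch of that login-time flow. `verify_password` stands in for whatever salted, slow check the site already uses, and all names and return values here are hypothetical. One small change from the description above: the sketch runs the normal check first, so a wrong guess never learns whether it appears in the leak.

```python
import hashlib

def is_leaked(cleartext: str, leaked_hashes: set) -> bool:
    """Check the plain SHA-1 and the variant with 5 leading zeroes."""
    full = hashlib.sha1(cleartext.encode()).hexdigest()
    return full in leaked_hashes or ("00000" + full[5:]) in leaked_hashes

def login(username, cleartext, verify_password, leaked_hashes):
    """verify_password is a stand-in for the site's normal password check."""
    if not verify_password(username, cleartext):
        return "denied"
    if is_leaked(cleartext, leaked_hashes):
        return "force_password_change"
    return "ok"

# Toy usage with a fake user store and a one-entry "leak".
users = {"alice": "password", "bob": "correct horse"}
check = lambda user, pw: users.get(user) == pw
leak = {"000001e4c9b93f3f0682250b6cf8331b7ee68fd8"}
print(login("alice", "password", check, leak))
```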


Easy, just convert all the hashes into passwords using a rainbow table. Should only take a few seconds to convert all 6.5M passwords -- O(n) operation here. Then run all the passwords through each user's password algorithm, this is a O(n^2) operation. Essentially you're making 6.5M password attempts for each of your users. It could be slightly faster because I'm sure there are quite a few duplicates in 6.5M passwords.


A SHA-1 rainbow table?


What's wrong? They exist... they're bigger than md5 tables, but not significantly larger. If you don't have 50GB of free disk space, you could get a table with lower complexity for around 20GB or so.


A cross-reference is only feasible in very bad situations:

- no salt, or the same salt, and the same hashing

- trivial/common passwords (password1 etc.)

- password (hashed/unhashed) and email are paired

A cross-reference could be accomplished for all known cracked linkedin passwords, but this would be no different than you running a dictionary attack of known passwords against your own users... This seems very bad. Enforcing strong but sane password strength rules should mitigate this need.

Cross reference only has value if both the hash and email pairs are leaked.

The bitcoin leak fell into one of these very bad situations:

- [<email>, <hash>] were leaked together

- poor hashing (just sha1, no salt if memory serves)

- an unfortunate number of people reuse passwords


The released passwords are hashed with SHA1. Assuming you use the same algorithm and linkedin does not use a salt (they probably do), then you could just compare the hashes.


LinkedIn passwords are not salted. You can only make comparisons if your database contains unsalted passwords. And if both databases used salted-passwords, then you still can't compare unless you all shared the same salting key.


You can't compare the hashes unless you have access to the clear passwords of your users. Unless you mean to do the comparison just as they log in. Seems like a lot of hassle for not much though.


Or do it the next time they log in, when you temporarily have their cleartext password.


Maybe he was implying that they and Good Guys Startupers use hashes from raw passwords. I hope that is not true.

edit: From reading comments below I learned that LinkedIn indeed didn't salt.


you'd compare the hashes in your database with those from the file. The users with a hash contained in the file would be notified.

Because the passwords aren't salted (stupid), you might get multiple hits for the same hash (for example, for the good old "1234" password), meaning you might end up contacting more users than actually affected. Better safe than sorry.


You can do this if you, like LinkedIn, store SHA1 unsalted passwords. You just look for matches.


i agree, but think about the backlash this would create amongst the userbase. the majority of the users will probably never even realize / read that their passwords have been stolen and thus linkedin probably does best in keeping a low profile about this (and start from now on using a better encryption). this is obviously not in the interest of the users, but it is in the interest of linkedin.


Funny, LinkedIn was one of the few services that made me do this after the Gawker fiasco


Interestingly, LinkedIn did just that when the Gawker list was leaked (iirc).


Or, they could take the Zappos route and just force everybody to reset their passwords. This route would make adopting a different (e.g. salted) password system quite straightforward.


[deleted]


Possible that they only uploaded the "hard" ones. Looks from other comments here that people have found their own passwords, unsalted.


I've found '1234678', 'password', 'qwerty', 'linkedin' and a few other common phrases (already 00000'd, obviously), so it doesn't look like a list of just the hard ones.
