6.5 Million LinkedIn Password Hashes Leaked (translate.google.com)
561 points by ssclafani 1054 days ago | 512 comments



Some observations on this file:

0. This is a file of SHA1 hashes of short strings (i.e. passwords).

1. There are 3,521,180 hashes that begin with 00000. I believe that these represent hashes that the hackers have already broken and they have marked them with 00000 to indicate that fact.

Evidence for this is that the SHA1 hash of 'password' does not appear in the list, but the same hash with the first five characters set to 0 is.

  5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8 is not present
  000001e4c9b93f3f0682250b6cf8331b7ee68fd8 is present
Same story for 'secret':

  e5e9fa1ba31ecd1ae84f75caaa474f3a663f05f4 is not present
  00000a1ba31ecd1ae84f75caaa474f3a663f05f4 is present
And for 'linkedin':

  7728240c80b6bfd450849405e8500d6d207783b6 is not present
  0000040c80b6bfd450849405e8500d6d207783b6 is present
2. There are 2,936,840 hashes that do not start with 00000 that can be attacked with JtR.

3. The implication of #1 is that if you are checking for your password and it is a simple one, you also need to check for the zeroed (truncated) hash.

4. This may well actually be from LinkedIn. Using the partial hashes (above) I find the hashes for passwords linkedin, LinkedIn, L1nked1n, l1nked1n, L1nk3d1n, l1nk3d1n, linkedinsecret, linkedinpassword, ...

5. The file does not contain duplicates. LinkedIn claims a user base of 161m. This file contains 6.4m unique password hashes. That's 25 users per hash. Given the large amount of password reuse and poor password choices it is not improbable that this is the complete password file. Evidence against that thesis is that password of one person that I've asked is not in the list.
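A minimal Python sketch of observations 0, 1 and 5, assuming the dump holds one 40-character hex SHA1 digest per line. It counts zero-masked versus full hashes, builds the masked form of a given password, and redoes the users-per-hash arithmetic. A genuine digest begins with 00000 only about once in 16^5 = 1,048,576 hashes, so the split is very nearly exact.

```python
from hashlib import sha1

MASK = "00000"  # prefix the dump appears to use for already-cracked hashes


def classify(lines):
    """Count (zero_masked, full) among lines holding 40-char hex SHA1 digests."""
    masked = full = 0
    for line in lines:
        digest = line.strip().lower()
        if len(digest) != 40:
            continue  # skip blank or malformed lines
        if digest.startswith(MASK):
            masked += 1
        else:
            full += 1
    return masked, full


def masked_form(password):
    """SHA1 of the password with its first five hex digits overwritten by zeros."""
    digest = sha1(password.encode("utf-8")).hexdigest()
    return MASK + digest[len(MASK):]


# Observation 5: 161m claimed users spread over 6.4m unique hashes.
print(round(161_000_000 / 6_400_000, 1))  # 25.2
```

To run it against the file, something like `with open("combo_not.txt") as f: print(classify(f))` should do; the counts reported above were 3,521,180 masked and 2,936,840 full.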

-----


For the security novices amongst us: I had no idea how to do this so I figured out a quick python script to test it:

    >>> from hashlib import sha1
    >>> def check_pass(plaintext, offset=5):
    ...     # Python 3's sha1() needs bytes, so encode the string first
    ...     hashed = sha1(plaintext.encode('utf-8')).hexdigest()
    ...     return (hashed, '0' * offset + hashed[offset:])
    ...
    >>> check_pass("linkedin")
    ('7728240c80b6bfd450849405e8500d6d207783b6',
     '0000040c80b6bfd450849405e8500d6d207783b6')
Edit: I'm pretty sure JtR refers to this: http://en.wikipedia.org/wiki/John_the_Ripper

-----


If you're paranoid about shoulder-surfing you can use getpass to hide your password as you type it in.

    >>> import getpass
    >>> password = getpass.getpass('Password: ')
http://docs.python.org/library/getpass.html

-----


Command line utility I wrote which uses getpass: http://dpaste.com/hold/756011/

-----


A complete python script assuming you have hashes.txt in the same directory.

http://dpaste.com/756007/
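The dpaste links above may no longer resolve, so here is a stand-in sketch (not the linked script), assuming `hashes.txt` holds one 40-character hex SHA1 per line. It prompts with `getpass` so the password is neither echoed nor saved in shell history, then streams the file looking for either the full digest or the zero-masked form.

```python
import getpass
from hashlib import sha1


def candidate_hashes(password, offset=5):
    """Full SHA1 hex digest and the zero-masked variant seen in the dump."""
    digest = sha1(password.encode("utf-8")).hexdigest()
    return digest, "0" * offset + digest[offset:]


def scan(password, lines):
    """Return which forms of the password's hash ('full', 'masked') occur in lines."""
    full, masked = candidate_hashes(password)
    found = set()
    for line in lines:
        digest = line.strip().lower()
        if digest == full:
            found.add("full")
        elif digest == masked:
            found.add("masked")
    return found


def main(path="hashes.txt"):
    # getpass neither echoes the password nor leaves it in shell history
    password = getpass.getpass("Password: ")
    with open(path) as dump:  # iterating a file object streams it line by line
        found = scan(password, dump)
    if not found:
        print("Not found")
    for form in sorted(found):
        print(f"Found ({form} hash)")
```

Call `main()`, or `main("combo_not.txt")` if your copy has the other file name. Returning a set rather than stopping at the first hit matters because at least one password turned up in both forms.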

-----


Just tried your code and it seems that my password has been cracked. Glad i changed it this morning now

-----


Obligatory perl one-liner:

  perl -MDigest::SHA -le '$h = substr( Digest::SHA::sha1_hex($ARGV[0]) , 5 ); open F, "<combo_not.txt"; do { print "found $_" if grep(/$h/, $_) } while (<F>)' password
(for people without shells)

-----


Obligatory shell one-liner:

  grep `echo -n password | shasum | cut -c6-40` hacked.txt

-----


Prefix the whole command with a space to avoid dumping your password into your bash history: " grep `echo -n yourpassword | shasum | cut -c6-40` SHA1.txt"

-----


Only if HISTCONTROL is assigned 'ignoreboth' or 'ignorespace'.

-----


Or prompt for it:

   grep `read -sp "password: "; echo "$REPLY" | tr -d "\n" | shasum | cut -c6-40` hacked.txt

-----


I couldn't really find a good reason to use a .bash_history. I linked mine to /dev/null and never looked back. (heh)

-----


Ctrl+r history search? I tend to maintain a complete history log so that when I've forgotten the one-liner I used to rotate my videos 2 years ago, I can easily recall it.

-----


2 years? Just how big is your history file?

I thought 16k entries might be reasonable but that doesn't even last 3 weeks for me. I think there might have been some issue with slow disk seeks so at some point I restricted it to that many.

I guess it would probably be better to regularly back up the history file to guard against accidental truncation and the problems of running multiple shells concurrently, but the overall effort to set up such a system would probably outweigh the benefits.

-----


export HISTSIZE=0

-----


Alternative and more dramatic method of preventing it being written to your bash history:

  kill -9 $$

-----


kill -9 -1 is better than kill -9 $$

-----


How so?

-----


That post was a troll. -1 is a special PID: It indicates that all processes that you can kill should be.

Kill -9 -1 as root is a surefire way to make a system stop doing anything, fast.

-----


Here's node.js:

    $ echo linkedin | xargs node -e "var x = require('crypto').createHash('sha1').update(process.argv[1]).digest('hex'); console.log([x, '00000' + x.substring(5)]);"

     7728240c80b6bfd450849405e8500d6d207783b6
     0000040c80b6bfd450849405e8500d6d207783b6

-----


Or you could just feed "sha1 <password>" to the duckduckgo.com search box and it will give the result.

-----


Some people have this thing against sending their private passwords in plaintext to third-party websites...

-----


You're sending the hash, not the password.

DDG supports SSL: https://www.duckduckgo.com/

If you want coverage, generate a few hundred thousand SHA1 hashes along with your password.

Actually, running a trickle query of random SHA1 hashes from your box might be a fun exercise, along with a trickle query of random word tuples (bonus points for using Markov chains to generate statistically probable tuples).

-----


If you search for 'sha1 foo', that's being sent across the network to DDG's servers. And sure, if you're using SSL then it's not going across in plain text, but it's decrypted and handled on their servers in plain text; it'll probably even end up in logs and/or tracking databases somewhere. You're giving DDG your password.

-----


A hash is not a password.

At worst you're giving the attacker a hash target to try brute-forcing. He still has to brute-force it, and that takes time. Select your plaintext from a large enough keyspace and it's astronomical time.

I'll need to review their policy more closely, but DDG claims fairly minimal tracking. At best someone might be able to correlate a hash lookup with some IP space. That's a long way from handing over passwords. And as I already indicated, you could pad the queries with decoys to make the search space much larger.

-----


No, no, no. You're 100% completely misunderstanding this.

When you search for 'sha1 foo', that query ("sha1 foo") goes up to the server. They know your password is "foo" and that you're attempting to "sha1" it. They don't have a hash, they take that data and perform the hash, then send that down to you.

-----


Boggle.

OK, gotcha.

I guess I'm just too damned used to using systems that, you know, have useful tools installed locally (or can get them there really damned fast). Including SHA1 and MD5 hash generators.

And I was all worked up to tell you how wrong you were still being.

All because I couldn't fathom the possibility let alone reason anyone would need a third-party site to compute their hashes for them.

Silly me, my error.

-----


Well presumably you've already changed your LinkedIn password, so what's not to send?

-----


Challenge accepted (although this is pretty crude)

curl -s -d q="sha1 password" http://duckduckgo.com | w3m -T text/html | grep '\w\+\{32\}'

-----


Hi - what does

" xargs node -e "

do?

Thank you

-----


[node -e] evaluates a line of node.js source from a command line argument:

    $ node -e "console.log('Hello, world.')"
     Hello, world.
[xargs] allows you to pipe the output of one command as an argument to another command. By default it will show up at the tail end of the second command's arg list, but if you want to interleave it you can use -I flag:

    $ echo /usr/share/dict/words | xargs head -5

     A
     A's
     AOL
     AOL's

    $ echo petard | xargs -I {} grep {} /usr/share/dict/words
     petard
     petard's
     petards
[xargs node -e] therefore allows text from STDIN to be inserted into a script to be evaluated by the node interpreter, accessible via process.argv:

    $ echo is dog this yes | xargs node -e "console.log(process.argv.slice(1).sort().reverse().join(' ').toUpperCase())"
     YES THIS IS DOG

-----


head -5 /usr/share/dict/words

same result as with xargs

grep petard /usr/share/dict/words

same result as with xargs

not sure what you are trying to demonstrate here

useless use of xargs?

-----


No, he is trying to demonstrate how to use 'xargs node -e'.

Are you even reading this discussion properly or are you just searching for some shell snippets and ridicule them as soon as you get a chance? This is what it looks like from your history: http://news.ycombinator.com/threads?id=uselessuseof

ionwake doesn't want to learn how to search a word. He wants to know how 'xargs node -e' works. Please read this again: http://news.ycombinator.com/item?id=4075293

-----


The perl one liner was funny, the shell one liner was light hearted, but your node solution is just pure fanboyism and quite frankly not in line with the spirit of the two previous posts.

-----


And.. the node.js solution doesn't do what either the Perl or shell one liners do. It doesn't tell you whether the password was found in the file. All it does is print out a SHA1 hash of a string.

-----


That's a trivial modification:

    $ echo linkedin | xargs node -e "var x = require('crypto').createHash('sha1').update(process.argv[1]).digest('hex'); console.log(x.substring(5));" | xargs -I {} grep {} hashes.txt
I'm surprised at the backlash to what I thought was fun code golfing. No one called me names after I posted a simple Python solution that didn't check the file. For what it's worth I've changed my LI password and I haven't bothered downloading the actual hash file.

-----


If I post a PHP solution maybe zxcvb will get a heart attack.

-----


node has a neat API for quickly knocking out stuff like this; it's a useful tool for more than just server code. Calling that comment fanboyism is just displaying the opposite of fanboyism, prejudice against hyped-up tools that nevertheless are good tools.

-----


My point still stands. There's funny and then there's blatant fanboyism. You're like a prepubescent teenager who doesn't understand the context of social situations so always says something stupid.

-----


"Which brings us to the most important principle on HN: civility. Since long before the web, the anonymity of online conversation has lured people into being much ruder than they'd dare to be in person. So the principle here is not to say anything you wouldn't say face to face. This doesn't mean you can't disagree. But disagree without calling the other person names. If you're right, your argument will be more convincing without them."

-----


Some people actually do call names to others when face to face.

Personally, while I don't, I do tend to get a little aggressive and then I'm often surprised with the backlash, because I get that way when I'm genuinely enjoying the conversation, not when I'm irritated.

-----


Tone doesn't carry on the Internet, so no one knows you're enjoying it. Hence, it generally degrades the quality of the conversation, which is the opposite of what we want at HN.

-----


No, I'm saying I do that face-to-face, and people still can't tell I'm enjoying it. So the tip to say nothing that you wouldn't say IRL is useless to me; I just can't help it.

-----


"You're like a prepubescent teenager who doesn't understand the context of social situations..."

The hypocrisy is so unabashed my brain might explode.

-----


Pot, meet kettle.

-----


obligatory comments

- not portable

- useless use of backticks

printf password|openssl sha1|awk '{print $NF}'|cut -c6-40|grep -f - hacked.txt

-----


Why are you extracting 35 characters with 'cut -c6-40'? SHA1 produces a 160-bit message digest. That's 20 bytes or 40 hex-digits.

-----


typo.

-----


Shorter and, IMHO, a bit simpler Perl one-liner:

    perl -MDigest::SHA=sha1_hex -le '$h = substr( sha1_hex(shift), 5 ); open F, "<combo_not.txt"; print "found $_" for grep /$h/, <F>' password
Or:

    perl -MDigest::SHA=sha1_hex -lne 'BEGIN {$pw = shift} $h = substr( sha1_hex($pw), 5 ); print "found $_" if /$h/' password combo_not.txt

-----


The first one ramps up memory use like crazy (which I was trying to avoid) and the second one is much better with memory, but you need to move the sha1_hex into the BEGIN block or you're recomputing the hash for every line parsed, thrashing your CPU. Interesting use of 'shift' though, I didn't know you could modify the file argument to -n like that.

-----


You might compare many words at once (say from a popular password list such as rockyou) like this:

while IFS= read -r line; do printf '%s' "$line" | sha1sum | cut -c6-40 | awk '{print "00000" $0}'; done < rockyou.txt

I haven't tested that, but I think it'll work.
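Rather than spawning `sha1sum` once per word, a Python sketch can load the dump once into a set keyed on the last 35 hex digits, so each candidate costs one SHA1 and one set lookup. The file names follow the thread; the function names are illustrative. Matching on 35 of 40 digits is the same trade-off the `cut -c6-40` one-liners make, and it covers full and zero-masked entries at once.

```python
from hashlib import sha1


def load_suffixes(lines, offset=5):
    """Index dump lines by the hex digits after the first `offset`.

    Full and zero-masked hashes of the same password share this suffix,
    so one set covers both forms.
    """
    return {line.strip().lower()[offset:] for line in lines if line.strip()}


def cracked_words(words, suffixes, offset=5):
    """Yield each candidate (as bytes) whose SHA1 suffix appears in the dump."""
    for word in words:
        word = word.rstrip(b"\r\n")
        if word and sha1(word).hexdigest()[offset:] in suffixes:
            yield word


def main(dump_path="combo_not.txt", wordlist_path="rockyou.txt"):
    with open(dump_path) as dump:
        suffixes = load_suffixes(dump)
    # Binary mode: wordlists such as rockyou.txt are not all valid UTF-8,
    # and SHA1 must see the exact original bytes.
    with open(wordlist_path, "rb") as words:
        for word in cracked_words(words, suffixes):
            print(word.decode("utf-8", "replace"))
```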

-----


By sheer coincidence I had a chance to use Perl again today for a job interview.

I now have a good appreciation of why it's considered a "Write once, read never" language. :)

-----


Amsterdam? ;)

-----


Any password that I try works...

-----


I have found one case where both types are present.

grep `echo -n l1nked0ut | shasum | cut -c6-40` combo_not.txt

    000000afef5f2ba94b104126d04db1837f423816 
    e7bf10afef5f2ba94b104126d04db1837f423816

-----


How many hashes are present in both stripped and unstripped form?

  $ cat combo_not.txt |cut -c7-40 |sort |dups |wc -l
  670781
That's ~10% of the total.
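The same count as a Python sketch, assuming one hash per line: a `Counter` over the sliced suffixes stands in for `sort | uniq -d` (or the `dups` helper), trading the sort for holding every distinct suffix in memory.

```python
from collections import Counter


def duplicated_suffixes(lines, start=6, end=40):
    """Count hash suffixes that occur more than once in the dump.

    Mirrors `cut -c7-40 | sort | uniq -d | wc -l`: cut's 1-based columns
    7..40 are Python's 0-based slice [6:40].
    """
    counts = Counter(line.strip().lower()[start:end] for line in lines if line.strip())
    return sum(1 for n in counts.values() if n > 1)
```

For example, `with open("combo_not.txt") as f: print(duplicated_suffixes(f))`.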

-----


another useless use of cat

cut -c7-40 combo_not.txt|sort|dups|wc -l

what the heck is dups?

cut -c7-40 combo_not.txt|sort|uniq -d|wc -l

-----


Yeah I'm aware of http://partmaps.org/era/unix/award.html#cat and choose to continue writing my scripts this way. My commands look more symmetric at the prompt, and are easier to manipulate.

dups is indeed a little helper of mine. Like uniq it only handles sorted input. Update: I see you edited your answer to include uniq -d. I wasn't aware of the option, thanks. Now I can simplify the implementation of dups. But I find the name valuable, and I think it's perverse to say uniq when you mean its opposite.

-----


symmetric?

-----


Each pipe stage reads from the left and writes to the right. The eye goes left to see the input and right to see the output if it's redirected to file.

The input file is reliably the second word, so C-A M-f gets me to it if I want to operate on a different file. !!:1 gets me the file if I want to use it in a new command.

-----


echo abc > file

1. cat file

2. cat < file

3. echo abc|cat

4. echo abc|cat - file

cat can take input from the left, the right, or both

same goes for cut

-----


I'm not sure what you're suggesting. I'm supposed to echo |cut ...? But I have a whole file, not just one line. So I have to cat ... |cut ... -- which is what I did. So what's your point?

I could keep the file first by saying:

  $ < combo_not.txt cut -c7-40 |sort |dups |wc -l
To which I reply, "Yuck!"

Perhaps we should stop here. You seem to have made this account just a few hours ago for the express purpose of poking at people's code fragments in this thread. You're making stylistic nitpicks (they don't affect correctness, do they?) and you're making them in a tone that I'm not sure I would take from Randal Schwartz himself (you actually edited http://news.ycombinator.com/item?id=4076556 to be ruder than the original). It's a drag, man.

-----


cut takes a file as an argument. there's no need to start the line with <

   cut -c4-70 combo_not.txt|...

-----


But that's where this conversation started out. My response the last time around: http://news.ycombinator.com/item?id=4076674

BTW, HN has some formatting support: http://news.ycombinator.com/formatdoc

-----


I disagree with #5. I had a few of my coworkers check their SHA1 against the DB and most of them were not in the dump. I also checked for truncated hashes, none of which were found. I have the feeling this is a subset of the full database.

-----


I don't really see a purpose in hiding my password. So, as a counterpoint, my password is in the list. This is my LinkedIn password:

AxEWS9rg5V

This is the sha1:

caf28fcc9c3e4d88b830b8e5cc52c5b65d3db5f4

It is found in Line 3612910 of combo_not.txt. I believe the file is authentic.

-----


So I have a funny wild theory...remember back when the Gawker database was compromised? And LinkedIn forced a password reset for users who (according to what I read) used email addresses that matched the Gawker leak?

What if they also (or actually) compared password hashes from their database to the ones released in the Gawker breach? In that case, they likely wouldn't have pulled data straight from the database but might have exported the hashes to text files and cut those files up to parcel out for processing via Hadoop or something. And one of those text files somehow got loose... or someone MiTMed the actual process (I'd vote for a floating text file just because it's been so long; the Gawker breach was in December 2010).

-----


on another note,

My fairly complex alphanumeric+symbol password IS in the dump, though not truncated/prepended with 0's, while the other one I found, which my coworker admitted was too short and alpha only, was in the dump with prepended 0's.

This could validate the fact that the truncated hashes are actually already cracked.

-----


Mine was 5 characters, alpha and numeric, but no special characters. It was in there, prepended with 0's.

Whoops.

At the very least, it should have been longer.

-----


Same here - mine was all alpha characters, seven characters, and the hash with five 0's was in the file. Guess who just changed their LinkedIn password today? And included some numbers?

-----


Another datum: the hash of my password (randomly generated 8 character mixed case alphanumeric) was in the file, without any overwritten 0's.

-----


My password is in the dump. I use the Forget Passwords Chrome extension [1], which is based on pwdhash.com, and generate site-specific passwords based on a master password -- i.e. my password is only used on LinkedIn and it's unlikely that I share it with someone else.

I think I have changed to this password during the last year.

-----


Mine is there.

(email me if you need proof)

-----


My linkedin password of at least 3 years was not in the dump. So it must be a partial...

-----


Another data point:

I changed my linkedin password about three weeks ago. The old one is in the list (already 00000-ed), the new one isn't.

-----


My (very unique) password hash is in the list, although unbroken so far.

-----


Sorry for the stupid question, but where did you guys find the list of hashes? I didn't see it linked in the article.

Edit: found it in the Slashdot comments, it's: http://www.mediafire.com/?n307hutksjstow3

For the record, my password's hash was not in the list.

-----


I think they're getting removed. I posted a link from the original source, but it's since disappeared.

-----


I don't know if I have the correct file: http://www.mediafire.com/?n307hutksjstow3

  mbf041:Downloads shephard$ wc -l SHA1.txt
  6143150 SHA1.txt

My password, which was last rotated July 5, 2011:

   .,7^R8Cl}g1}Ze6f
Was _not_ found in the file (with/without 00000). I have, of course, changed it today. Strangely enough, the previous password is also not in the list.

-----


Don't know if this adds anything, but both my old password (created eight years ago) and current password (changed six months ago) were on the list. Both were very unique - 20 characters mixed.

Need to get better at changing my PWs every three months. It's really not that hard, just a matter of discipline.

-----


My old password was in the list, but not my newer password. I changed it about 2 years ago I think.

-----


Hmm. My truncated password (for my now-deleted account) is not in the list of hashes -- so it's not just a uniq'd full DB. Also, the original forum thread where the file was first posted only managed to break around 600,491 passwords before it went offline ... so 3,521,180 broken passwords could mean that the original hacker has had access to some LinkedIn accounts for more than just a few minutes today.

-----


Same here. My password is not in the list and I've had a LinkedIn account since 2003. I probably changed my password about 18 months ago. Neither that nor the previous one are on the list.

-----


My password is not in the list; it's not idiotic, but not super-hard either. I doubt this is the full list. I hadn't changed mine in years, so maybe this is from a certain period of time?

-----


I've had the same password on LinkedIn for as long as I remember, and neither the full hash nor the zero-prefixed version was found in the dump.

Simple line used in OS X terminal:

grep -e "`echo -n "your pass" | openssl sha1`" combo_not.txt

-----


May want to grab the last characters as the cracked passes have 00000 at the beginning:

i=$(echo -n 'mypass' | openssl sha1); i=${i##* }; grep "${i:5}" combo_not.txt

This yielded success on some known passwords and a bunch of obvious passwords. Not mine, but I assume this dump is a list of the passwords they've cracked so far (i.e., even if your password isn't on this list - change it).

-----


I have found hashes of linkedout, recruiter, recru1ter, googlerecruiter, toprecruiter, superrecruiter, humanresources and hiring.

If it is a hoax, it is a very elaborate hoax.

-----


Perhaps it's a DDoS on MediaFire! /joke

-----


Good post, upvoted. One clarification:

> That's 25 users per hash

Password choices are probably Zipf-distributed, so averages don't make a ton of sense.

-----


It does if you're trying to estimate the size of the corpus based on the number of users.

The arithmetic mean is specifically the value you'd want: n users divided by m users/password == total unique passwords in the LinkedIn database.

Zipf distribution would suggest that the pattern of reuse among passwords isn't normal, and that the median and mode are probably higher than the arithmetic mean.
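Whether the median sits above or below the mean depends on what you count. A small simulation sketch, assuming Zipf(1) popularity over a hypothetical pool of 10,000 passwords and 50,000 users (illustrative numbers, not fitted to LinkedIn), shows both views: per distinct password, the long tail of single-use passwords drags the median below the mean, while per user, the median person shares a password with far more people than the mean suggests. The mean itself is still exactly users divided by distinct passwords, which is what the corpus-size estimate needs.

```python
import random
from collections import Counter
from statistics import mean, median


def users_per_password(n_users=50_000, pool=10_000, seed=1):
    """Simulate Zipf(1) password choices; return how many users picked each distinct password."""
    rng = random.Random(seed)
    ranks = range(1, pool + 1)
    weights = [1 / r for r in ranks]  # rank-r password is picked with probability proportional to 1/r
    picks = rng.choices(ranks, weights=weights, k=n_users)
    return list(Counter(picks).values())


counts = users_per_password()
# One entry per user: the size of the group sharing that user's password.
per_user = [c for c in counts for _ in range(c)]

print("distinct passwords:", len(counts))
print("mean users per password:", round(mean(counts), 2))  # equals 50_000 / len(counts)
print("median over passwords:", median(counts))
print("median over users:", median(per_user))
```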

-----


If your password was 'linkedinsucks' then it sucks, because they found it already!

-----


Correct!

  527688fa9f32bb8dab32d30807ca5c57a0b203b8 is not present
  000008fa9f32bb8dab32d30807ca5c57a0b203b8 is present

-----


Here's some they didn't find, from /usr/dict/words: Paraná, Zürich, attaché. Not sure of the encoding, but I'd guess UTF-8.

-----


My not-so-strong password, spacex12, is not in the list, and I've checked whether it was already cracked (the 00000 prefix): nope.

-----


Also if it was "linkedin"

7728240c80b6bfd450849405e8500d6d207783b6 not present

0000040c80b6bfd450849405e8500d6d207783b6 present

or "facebook"

cbe648909034c0624c205fe219d3fbd10052c715 not present

000008909034c0624c205fe219d3fbd10052c715 present

or google

759730a97e4373f3a0ee12805db065e3a4a649a5 not present

000000a97e4373f3a0ee12805db065e3a4a649a5 present

-----


My password also doesn't appear to be in the list, so I doubt it is the complete/current file. I used this python to check, in case anyone else wants to use it:

    from hashlib import sha1
    f = "combo_not.txt"
    # A set gives O(1) membership tests; [0:40] strips off the trailing \n
    with open(f) as fh:
        hashes = {line[0:40] for line in fh}

    # From another comment (encode: Python 3's sha1() needs bytes)
    def check_pass(plaintext, offset=5):
        hashed = sha1(plaintext.encode("utf-8")).hexdigest()
        return (hashed, '0' * offset + hashed[offset:])

    print(check_pass("linkedin")[0] in hashes)  # -> False
    print(check_pass("linkedin")[1] in hashes)  # -> True (sanity check)

    myHash, myHashBroken = check_pass("plaintextoflinkedinpassword")
    print(myHash in hashes)  # -> False
    print(myHashBroken in hashes)  # -> False

-----


> Evidence against that thesis is that password of one person that I've asked is not in the list.

Mine isn't in it.

-----


Neither is mine.

-----


Mine was not in the list. It's also possible this isn't the entire file. I was also able to recover 225129 other passwords with a wordfile and some Python based on truncated and full hashes.

-----


A stock JtR 1.7.9-jumbo5, using the default rules, is finding quite a few of the non-zeroed ones pretty quickly. This surprises me; I would have expected them to have run the list through the JtR mill before passing it on to others.

-----


One can conclude from this that the list of cracked hashes is almost certainly not complete.

-----


Got a link to the file? I haven't been able to dig one up

-----


The hash of my password, set when I joined on October 10 2011, appears not to be in the list. Changed it anyway.

-----


Likewise, my password (MybXy836YCza), which wasn't used anywhere except my LinkedIn account created 29-Jan-2012, and has been stored securely at my end, wasn't on the list (either as a full SHA1 sum, or as part of the SHA1).

As you probably guessed from the fact that I posted my old password, I changed it just in case the list that was shared is only a partial list of what was obtained.

-----


Nice observation, dude. Can you please share the password file? I don't have it anywhere. Thanks

-----


So where is the list? I'd like to see whether I'm on it.

-----


fwiw, this could also be an elaborate hoax, given these facts.

E.g. a list of simple password + combinations of the above simple password+"linkedin" variations.

-----


I have a very unique strong password on LinkedIn, and it is on the list. Given that, this is no hoax.

-----


Same here. Sucks too, because I liked that password.

-----


My complex unique password is also on this list (full hash no 5 0's). So nope, not a hoax. Unbelievable/insulting they didn't even bother to salt.

-----


Yeah, even I, a newbie Rails programmer, going through the Agile Rails book learned how to salt. It isn't rocket science.

-----


It shouldn't just be a salt. It should be bcrypt.

-----


Do you remember when you first used this password at LinkedIn? It could help narrow the dates of the breach. Especially useful would be the presence of a strong password in the list that was subsequently changed. That might help determine its freshness, if the new password isn't present (although this may be an incomplete list from an ongoing breach).

-----


I'm thinking this list is from closer to a year ago, I changed my password shortly after the MtGox hack last year and this hash is for my old password that was compromised during that time period.

-----


My password is in the dump, and it was changed mid October 2010. I remember because I changed all my passwords when my laptop was stolen.

The MtGox hack was in June 2011.

-----


It was about a year ago now. I checked the hashes for my previous password and it wasn't on the list... Mind you, as many have noticed, it seems to be very incomplete.

-----


Unbelievable/insulting they used a general purpose, easily reversible hash like SHA1 in the first place. I would have thought everyone had seen the 'use bcrypt' page by now.

http://codahale.com/how-to-safely-store-a-password/

-----


Since when is SHA1 easily reversible? Did I not get the memo?

Salting should have been fine.

-----


I couldn't find my password on the list, and I've been using the same password for LinkedIn since I registered. I was trying to remember when that was. If someone knows how to find out the last time you changed your password or when you registered for LinkedIn, please let me know. I'd guess I've used LinkedIn for at least 4 years.

-----


A "member since" date is available on the "Account & Settings" page. Choose "settings" in the drop down that appears when you hover over your (account) name in the upper right corner of any LinkedIn page.

-----


I agree, I've tried several passwords and they match. If you're a Math person, please shed some light on the chances that this list covers the full space.

-----


I'm not a math person either, but here's some fodder for someone who is.

Mark Burnett keeps an extensive password collection (which he acknowledges is skewed, because it's largely based on cracked passwords, he only harvests passwords between 3 and 30 chars, etc.). Here's how some of his stats shake out:

* Although my list contains about 6 million username/password combos, the list only contains about 1,300,000 unique passwords.

* Of those, approximately 300,000 of those passwords are used by more than one person; about 1,000,000 only appear once (and a good portion of those are obviously generated by a computer).

* The list of the top 20 passwords rarely changes and 1 out of every 50 people uses one of these passwords.

So it's conceivable that 6M unique passwords could cover a very significant portion of a 120M user namespace.

Ref: http://xato.net/passwords/how-i-collect-passwords

-----


It's neat that the hashes are unique enough to serve as their own key. Obvious in retrospect, but still neat.

Curious why some of the hashes have been obscured with 00000 but not all. It means more than one possible password could generate the remaining characters, but what does that help or protect?

-----


6.5 million? Off the top of my head, assuming that passwords are only letters and 5 characters long this still wouldn't cover the possible space. [I think it's safe to ignore hash collisions]
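The comparison is simple arithmetic (alphabet size raised to the password length). A quick Python check for five-character, letters-only passwords, against the 6.5 million figure from the title:

```python
def keyspace(alphabet_size, length):
    """Number of distinct passwords of exactly `length` symbols."""
    return alphabet_size ** length


for label, size in [("lowercase only", 26), ("upper + lowercase", 52)]:
    print(f"{label}: {keyspace(size, 5):,}")
# lowercase only: 11,881,376
# upper + lowercase: 380,204,032
```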

Are you trying passwords you've used on other sites, or random ones? If it's the former, then LI might not be the only source for the file.

-----


0. There are known cases of peoples' passwords (including my own) not on the list.

-----


"We were curious what would happen to our share price if our company did something incredibly stupid"

The above comment might seem incredibly harsh, but really, there's no good excuse for a site this prominent to not have a salted, secure password hashing system. Even if they started with an unsalted password system, users can be migrated to the newer more secure system on next login.

The only way I could regain respect for LinkedIn is if we find that these unsalted hashes were from users who never logged in to LinkedIn after the security upgrade. From the replies of other HN users who have found their password hashes in the leaked list, this doesn't seem to be the case though.

I can understand database leaks. Bad things happen. Not being prepared for such an event however is where I draw the line. These leaks impact users far beyond just the site at fault.

It's not enough to say users should use LastPass. They don't, and that's the world we live in, for better or worse. If computer security doesn't take into account problematic users, then it's flawed computer security.
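The migrate-on-next-login idea can be sketched in Python under stated assumptions: a hypothetical in-memory `users` dict stands in for the database, legacy rows hold an unsalted SHA1 hex digest, and upgraded rows hold a salted PBKDF2-HMAC-SHA256 digest from the standard library (a stand-in for bcrypt or scrypt). The record layout and iteration count are illustrative only.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor


def slow_hash(password, salt):
    """Salted, deliberately slow hash (PBKDF2-HMAC-SHA256) for upgraded rows."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def login(users, username, password):
    """Check a password; on success, upgrade a legacy unsalted SHA1 row in place."""
    row = users.get(username)
    if row is None:
        return False
    if row["scheme"] == "sha1":
        legacy = hashlib.sha1(password.encode("utf-8")).hexdigest()
        if not hmac.compare_digest(legacy, row["hash"]):
            return False
        # The plaintext is only available right now, so rehash it with a fresh salt.
        salt = os.urandom(16)
        users[username] = {"scheme": "pbkdf2", "salt": salt,
                           "hash": slow_hash(password, salt)}
        return True
    return hmac.compare_digest(slow_hash(password, row["salt"]), row["hash"])
```

Rows for users who never log in again keep their legacy SHA1 hash, which is exactly the case discussed above.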

-----


Surely just hashing the username|password would massively reduce the effectiveness of leaks like this? Sure, a hacker would know what the "salt" is, but since it now varies between users you would expend the same amount of effort breaking one person's login as you previously would spend breaking everyone's (on average).

(Not recommending it, just wondering if my reasoning is correct.)

-----


I hear this commonly, so it is a good idea to clear it up.

Usernames have lower entropy than a random salt and are predictable in many cases. People re-use usernames and some usernames are common. If your password system became common on the web, or if I knew the workings of your password system (i.e. open source / leaked codebase / Kerckhoffs's principle[1]), I could generate a rainbow table for either common or targeted users. This means I could generate a rainbow table for "Jabbles", gain access to your password and compromise your account before the website is likely even aware of a breach or has time to warn you. Salts only act to slow down, not prevent, compromising leaked password hashes (as you can always brute force which is quite practical with MD5/SHA1). Thus, using a username defeats one of the stated purposes of salting.

It's also said ad nauseam (with good reason) but rolling your own in security is a bad idea, especially when libraries exist that do exactly what you'd intend to do just as easily. Algorithms such as bcrypt and scrypt exist and are well vetted. bcrypt is easy to integrate with many languages and provides a trivial interface and sane defaults for iterations/rounds [brute force] and salts [rainbow table]. bcrypt can also handle increasing the security of your system over time as the metadata is stored as part of the hash.

tl;dr Using a username for salting means a targeted attack against a single or small number of users would be damn near impossible to stop as the second they have the password hashes they also have the passwords.

[1]: http://en.wikipedia.org/wiki/Kerckhoffs%27s_principle

-----


Bcrypt takes two lines of code to securely test passwords and two lines to create the hashed password, both of which come in the documentation.
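For reference, with the third-party Python bcrypt package the two calls are bcrypt.hashpw(pw, bcrypt.gensalt()) and bcrypt.checkpw(pw, hashed). Since that package isn't in the standard library, here is a stdlib-only sketch of the same hash/verify shape using PBKDF2-HMAC-SHA256 instead (swapped in, not bcrypt). The function names and the iteration count are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune so one hash takes ~100 ms on your hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both."""
    salt = os.urandom(16)  # fresh random salt for every user
    return salt, hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison

salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("linkedin", salt, stored))  # False
```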

There is every reason to use it and none not to.

-----


Often people say "Don't roll your own security" but the reality is that developers aren't trying to roll their own. They are trying to solve a problem, and if a quick google doesn't turn up a good library then they'll try and figure it out. Googling for password security implementations is likely to be fraught with horrible horrible advice.

I guess what I'm saying is that it's not enough to say don't do it, instead the defaults need to be there (and very visible).

-----


I think we've reached a point with bcrypt that a good secure password system is within reach and comes with sane defaults and ease of use as features for most programming languages.

If it's just an issue of getting the word out there, then I'm hopeful things can improve.

-----


You need more than just bcrypt. You've hinted at other things, but a few random things popping in to my mind:

  * Preventing password logging (many web frameworks log parameters)
  * Secure password recovery
  * New alternative attack vectors (eg. Facebook, Twitter auth)
  * XSS and CSRF
There are so, so many simple-to-make security errors, and worse, many of them are inter-related, so forgetting one will make another vulnerable. This is why you need safe defaults and more security education.

-----


A strong password hash doesn't gate on any of those things, so, while you do indeed need to pay attention to them, you don't need to pay attention to them before you deploy a strong password hash.

You should deploy a strong password hash immediately.

-----


True point and this is probably off topic, but out of curiosity, what is the recommended approach for his point about logging messages/requests?

On previous projects, we've gone through all sorts of machinations to detect a password in our SOAP logging. This usually involves XML parsing (slow, ineffective on malformed messages) and Regexes (ineffective on malformed or "unusual" messages).

I can't think of anything better, short of "you can't leak what you don't log" which is nice in theory but not always practical.
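One hedged option in Python is to do the scrubbing in one place, a logging.Filter on the handler, so nothing reaches disk unredacted. It is still regex-based, so it shares the weaknesses described above with malformed messages. The field names and the [REDACTED] marker are assumptions.

```python
import io
import logging
import re

# Hypothetical field names; extend to whatever your forms and SOAP bodies use.
# key=value, key: value and "key": "value" (query strings, form posts, JSON)
_KEY_VALUE = re.compile(r'(?i)\b(password|passwd|pwd)("?\s*[=:]\s*"?)[^&\s,;"]+')
# <password>...</password>, with or without a namespace prefix (XML/SOAP)
_XML_ELEMENT = re.compile(r"(?is)(<(?:\w+:)?password\b[^>]*>).*?(</(?:\w+:)?password>)")

def redact(text: str) -> str:
    text = _KEY_VALUE.sub(r"\1\2[REDACTED]", text)
    return _XML_ELEMENT.sub(r"\1[REDACTED]\2", text)

class RedactPasswordsFilter(logging.Filter):
    """Scrubs the fully formatted message before the handler writes it."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # msg is already formatted; don't %-format it again
        return True

# Usage: attach to the handler so every record passing through it is scrubbed.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactPasswordsFilter())
log = logging.getLogger("soap")
log.addHandler(handler)
log.warning("login: user=%s password=%s", "bob", "hunter2")
print(stream.getvalue().strip())  # login: user=bob password=[REDACTED]
```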

-----


There are defaults: bcrypt and PBKDF2. There is no excuse for anyone to do anything less than salted hashes, even if they decide not to follow bcrypt or PBKDF2.

-----


Having a password salted with the username fairly easily balloons out the complexity of building and searching a rainbow table by a factor of the number of usernames you want it to be useful for. This factor is larger than you'd expect, given the sheer quantity and variety of usernames in various systems.

For a targeted attack it really doesn't matter, as the time complexity to produce the rainbow table is equivalent to that of simply brute-forcing the hash, i.e., you can't say 'well, assume the rainbow table contains only some small number of usernames'...

It also is entirely unlike the WPA2 rainbow tables in that you don't have millions of users all sharing the same username (ie. factory default SSIDs).

Overall it's more secure than it seems at first glance, but you still have to ask yourself why you'd use that over a random salt.

-----


The targeted attack does matter though, for the reason I pointed out above.

I can produce a rainbow table offline before I compromise the targeted system as I know the username of my target. This is not possible if the salt is random. This means I can crack a targeted user's password hash _instantly_ upon gaining access to the system.

With a random salt, you can only perform the brute force attack on that targeted user _after_ you've gained access to the system and likely alerted them to a compromise.

If the response time of the compromised system and team is a factor, this means using a username as a salt compromises your security greatly.

tl;dr Using a username for salting means a targeted attack against a single or small number of users would be damn near impossible to stop as the second they have the password hashes they also have the passwords.

-----


Sure you can, assuming:

1) You know the hash function beforehand
2) You know that they are salting in exactly this way
3) You know how they are doing their salting (HMAC vs., vs.)
4) You have enough time to create this new rainbow table
5) You have only just enough access to the system to dump the hashes (i.e. the easier routes are blocked off from you)

That would in fact, with some probability (based upon the complexity of your rainbow table and the complexity of the users password), give you the passwords for a particular set of users.

I did say that it was more secure than it seems, not that it was perfectly secure :)

-----


While not entirely random, would a "date based" salt work as well? Say, the date that the entry was added? This would still negate rainbow tables as a specific user entry needs to be targeted.

-----


It would probably work well enough, but... why not just add a proper random salt field that isn't tied to anything an attacker could guess? Is something like 8 bytes per user too expensive?

-----


Perhaps I'm missing something but... wouldn't you still need to store the random salt field somewhere in the database?

-----


Remember, salts don't need to be secret to do their job. The goal is to change the algorithm slightly (by adding additional input) for each user. That means you can't mass-precompute (rainbow tables) and just look up what matches; you have to break each user individually.
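A toy sketch of that point, assuming made-up users and a three-word wordlist. The salt sits in plain sight next to the hash, yet it still forces per-user work. Note that salted SHA1 is still fast to brute-force; the salt only kills precomputation.

```python
import hashlib
import os
from typing import Optional

users = {"alice": "linkedin", "bob": "password", "carol": "linkedin"}  # toy data
wordlist = ["password", "secret", "linkedin"]

# Unsalted: one precomputed table cracks every user at once, and users who
# share a password share a hash.
table = {hashlib.sha1(w.encode()).hexdigest(): w for w in wordlist}
unsalted = {u: hashlib.sha1(p.encode()).hexdigest() for u, p in users.items()}
cracked = {u: table.get(h) for u, h in unsalted.items()}

# Salted: the salt is stored right next to the hash (it is not secret), but it
# makes every user's hash different, so the shared table is useless.
salted = {}
for u, p in users.items():
    salt = os.urandom(16)
    salted[u] = (salt, hashlib.sha1(salt + p.encode()).hexdigest())

def crack_one(salt: bytes, digest: str) -> Optional[str]:
    # The attacker has to rerun the whole wordlist for each user's salt.
    for w in wordlist:
        if hashlib.sha1(salt + w.encode()).hexdigest() == digest:
            return w
    return None
```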

Your reasoning about how salts work is correct.

There's also something called a pepper which is another additional bit of input data, that is only stored in the app code (fixed for entire app). So an attacker who only manages to get a database dump would need to guess yet another chunk of data (making it near impossible). So a well-seasoned hash would be SLOW_HASH(pepper+salt+password).
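A minimal sketch of SLOW_HASH(pepper+salt+password), assuming PBKDF2-HMAC-SHA256 as the slow hash. The pepper is prepended to the password and the per-user salt goes in as PBKDF2's salt, which is one reasonable mapping, not the only one. PEPPER and the iteration count are hypothetical, and per the edit above, prefer a vetted library over hand-rolling this.

```python
import hashlib
import hmac
import os

# Hypothetical app-wide secret: lives in code/config, never in the database.
PEPPER = b"app-wide-secret-kept-out-of-the-database"
ITERATIONS = 200_000  # assumed work factor

def peppered_hash(password: str, salt: bytes) -> bytes:
    # Pepper + password form PBKDF2's password input; salt is per user.
    return hashlib.pbkdf2_hmac("sha256", PEPPER + password.encode(), salt, ITERATIONS)

def verify(password: str, salt: bytes, stored: bytes) -> bool:
    return hmac.compare_digest(peppered_hash(password, salt), stored)

salt = os.urandom(16)             # stored in the DB next to the hash
stored = peppered_hash("hunter2", salt)
print(verify("hunter2", salt, stored))  # True
```

Someone holding only the DB dump (salt and hash) who guesses "hunter2" without the pepper gets a different value, so they also need the app code.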

Security is all about layers. Each layer protects a bit more, or prevents things from being easy for the attacker.

Edit: Don't do this yourself. Know it for the theory part - but then just use a well-vetted library to do it.

-----


Please refer to my comment above. You can precompute a rainbow table if you know the username (trivial) and the method of hashing[1]. Whilst usernames as salts would increase security over no salt, it results in a potential exploit / vulnerability that would not exist if the salt was truly random. Hence, suggesting the use of usernames as salts is not wise.

[1]: http://en.wikipedia.org/wiki/Kerckhoffs%27s_principle

-----


I read cschneid's comment twice, and nowhere do I see him or her specifically recommend using the username as a salt; he or she simply recapitulates the logic behind using a unique salt value for each stored hash, and describes using an additional non-unique value which is not stored with the passwords ("pepper"), which is a new and interesting idea, at least to me.

-----


Re: pepper. The Devise plugin for Rails uses it. The idea is that the attacker must now steal both the app code AND the database, which are often on separate servers.

Just make their life harder.

-----


It would make it a lot easier for LinkedIn to identify whose hashes were leaked, because with a salt every stored hash would be unique, even for users who share a password. It would also make rainbow tables useless.

But in this day and age, the bigger problem is how fast you can compute the hashes, salt or no. With GPUs you can calculate a few hundred million (depending on the hashing algorithm) per second, making the choice of algorithm the real vulnerability.

Best practice involves increasing the calculation time of your algorithm. Theoretically, you could just rehash a few thousand times in a loop, throwing in a salt here and there, but practically, you should just use bcrypt or scrypt.

-----


A few hundred million? Try in the billions. Like 33.1 Billion/s for md5. http://blog.zorinaq.com/?e=42
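Back-of-envelope arithmetic at the 33.1 billion MD5/s figure from the linked post. The alphabets and the 8-character length are assumptions for illustration, and this is worst case (every candidate tried).

```python
MD5_PER_SECOND = 33.1e9  # figure quoted from the linked blog post

def seconds_to_exhaust(alphabet_size: int, length: int, rate: float = MD5_PER_SECOND) -> float:
    # Worst case: try every string of the given length over the alphabet.
    return alphabet_size ** length / rate

lower_digits = seconds_to_exhaust(36, 8)        # a-z plus 0-9, in seconds
printable = seconds_to_exhaust(95, 8) / 86_400  # all printable ASCII, in days
print(f"8 chars, a-z0-9: {lower_digits:.0f} s")            # 8 chars, a-z0-9: 85 s
print(f"8 chars, printable ASCII: {printable:.1f} days")   # 8 chars, printable ASCII: 2.3 days
```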

This is why you don't use really fast hashes for passwords and you iterate (key stretch). Bcrypt like you said.

-----


Please don't downvote posts like the parent. It's a legitimate comment, asking a question, if you have something to say please reply.

-----


In a password hashing scheme with a salt, you're supposed to consider everything except the cleartext to be public, for the purposes of analysis. The password should be unrecoverable even if the attacker knows the algorithm and any salts.

-----


It's true that that would be an improvement, however we try to avoid discussing things like that seriously because of the risk that someone new to the game will actually try to do it. The easy answer is to use an out-of-the-box secure password strategy, anything else is adolescent.

-----


We've just checked everyone's passwords around the office. One of them was in the list, and he has accessed the site in the past month.

-----


Could be that he shared a password with another account that hasn't? Wishful thinking most likely.

-----


Regarding requiring users to log in; wouldn't it be better to run their current hash through another password hashing scheme (while we're at it bcrypt, scrypt, PBKDF, etc)? Then, the next time they log in, verify them by running their password through the old algorithm, and the result through the new one.

-----


That could be a good transition strategy if you're worried about being compromised before all your users have logged in again, but you would still want to move them over to using just the new system when they do. It probably would be fine, but when it comes to crypto you don't take chances when you don't have to.

-----


Yep. Here is a treatment of that: https://gist.github.com/1051238

-----


>> Even if they started with an unsalted password system, users can be migrated to the newer more secure system on next login.

In thinking about this, I wonder if in that scenario you'd even have to wait until next login. You could just use the weak hash as the input to your salted hash function and keep a flag of whether or not you need to 'pre-hash' the password before using your v2.0 salted hash. As users log in you could slowly replace the double-hashed entries with single salted-hash versions and flip the flag.
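A sketch of that flag-based migration, assuming PBKDF2-HMAC-SHA256 as the new slow hash and a dict standing in for the user row. The field names, helpers, and iteration count are all hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor

def slow_hash(secret: bytes, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", secret, salt, ITERATIONS)

def sha1_hex(password: str) -> bytes:
    # The legacy, unsalted hash as it sits in the old table.
    return hashlib.sha1(password.encode()).hexdigest().encode()

def wrap_legacy(old_sha1_hex: bytes) -> dict:
    # Upgrade every row right away, no password needed: slow_hash(sha1(pw)).
    salt = os.urandom(16)
    return {"salt": salt, "hash": slow_hash(old_sha1_hex, salt), "prehash": True}

def login(record: dict, password: str) -> bool:
    secret = sha1_hex(password) if record["prehash"] else password.encode()
    if not hmac.compare_digest(slow_hash(secret, record["salt"]), record["hash"]):
        return False
    if record["prehash"]:
        # Now that we have the plaintext, store a direct salted hash and flip the flag.
        record["salt"] = os.urandom(16)
        record["hash"] = slow_hash(password.encode(), record["salt"])
        record["prehash"] = False
    return True

row = wrap_legacy(sha1_hex("linkedin"))  # migrated offline, user never logged in
print(login(row, "linkedin"), row["prehash"])  # True False
```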

-----


What do you recommend users do instead? Unfortunately there will probably always be websites storing passwords in insecure ways. I mean I'd certainly rather not have to deal with the hassle (however small) of using LastPass, but as you said, that's the world we live in. Hoping for competence by the writers/maintainers of websites is also flawed computer security, is it not?

-----


Hoping for competence is indeed flawed from both sides. I would hope users use distinct, random passwords for each site they visit and that developers store those passwords in a safe secure way. I also assume both sides won't listen to logic however :)

The reason I'm annoyed with this particularly is that larger sites are more likely targets due simply to their size. Larger sites generally have the developer resources to provide a good solution to the problem from their end but commonly don't.

This makes them look bad and means their users are left in more danger than before. No-one wins.

-----


Perhaps the worst part is that they either didn't know about the breach (likely), or didn't tell anyone (hopefully not).

-----


They just tweeted this: https://twitter.com/LinkedIn/status/210356987576324096 - "Our team is currently looking into reports of stolen passwords. Stay tuned for more."

-----


And a follow-up: "Our team continues to investigate, but at this time, we're still unable to confirm that any security breach has occurred. Stay tuned here."

https://twitter.com/LinkedIn/status/210390233076875264

-----


If people are finding their unique password's hashes in the database, that's pretty damning evidence that a security breach has occurred.

-----


Am I glad that I use LastPass and have a different, 12-character password for every service?

Why, yes, yes, I am. I've now changed my LinkedIn password, too, just in case.

-----


What's kept me away from such solutions are these questions: How can you trust one service with all your passwords? What if their configuration has a vulnerability?

-----


KeePass works well too - open source, offline solution that has an "Autotype" function. I actually only run into passwords that are a pain on mobile devices. Now that my Android phone has no keyboard but tons of power, that's becoming more and more significant.

-----


I use keepass too. I keep my database in dropbox and use the android dropbox and keepass clients on my android. Logging into an app or website involves opening dropbox, clicking on the database[1], entering my password, choosing the site, and clicking on "copy password to clipboard." It's a few extra steps, but it's not that much of a hassle.

[1] I find this easier than opening keepass and selecting the database from dropbox for some reason that might be as simple as dropbox having an easier to spot icon.

-----


You can also use the favorite feature on Dropbox to keep a fresh copy of the database on your phone and have KeePassDroid remember that location. Then your flow is 1) open KeePassDroid 2) enter password 3) select site 4) copy/paste

-----


You know there's a KeePass app for Android right? I sync my KeePass db between Windows, Linux, and my Android phone using DropBox. Works great.

-----


The enter (long alphanumeric and symbols) password/copy/paste/switch window was a little clunky in Android 2.2. Little better in ICS, so need to get back to using this.

-----


One more KeePass user here (actually KeePassX). But I'm only using it for passwords that aren't my own: ones provided by others and so on.

For my personal ones I keep a few algorithms in my head. I use the resource type (website/some server/device) and name (e.g. domain/model) as variables, and after a few steps in my head I always have a different password for each kind of service.

-----


Use open-source tools such as SHA1-Pass. The passwords it generates can be recreated with openssl and any other standard crypto library.
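To illustrate the general idea of a password you can regenerate from standard primitives, here is a hypothetical scheme: HMAC-SHA1 keyed with a master phrase over the site name, base64-encoded and truncated. This is NOT SHA1-Pass's actual algorithm; the function name and length are assumptions.

```python
import base64
import hashlib
import hmac

def site_password(master: str, site: str, length: int = 16) -> str:
    # Hypothetical scheme, not SHA1-Pass's: HMAC-SHA1(master, site), base64, truncated.
    digest = hmac.new(master.encode(), site.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()[:length]

print(site_password("my long master phrase", "linkedin.com"))
```

The same value should come out of printf '%s' linkedin.com | openssl dgst -sha1 -hmac 'my long master phrase' -binary | base64 | cut -c1-16. Because HMAC-SHA1 is fast, anyone who sees one generated password can brute-force the master phrase offline, so it needs to be long.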

Edit: I wrote SHA1-Pass, so I'm biased, but I know what you mean about having trust issues with closed-source password tools. That's one of the reasons I wrote it.

-----


I use open source tools such as "pwgen", "emacs" and "gpg". Open up the encrypted file in the editor, type your pass phrase if you haven't this session, cut and paste, close file. The built-in keyboard navigability makes this faster than everything but the in-browser form filling.

-----


You might consider renaming it. I've been looking for several minutes and can't find it via that name.

Is it this: http://manpages.ubuntu.com/manpages/natty/man1/sha1pass.1.ht... I don't see how you would use this the same way you'd use the other tools mentioned here. I can imagine a way, but it's nowhere near as convenient and still has its own major usability problems.

-----


I've always wondered this about services like lastpass.

What stops them (or you) from being hacked / keylogged, and the attacker exfiltrating all your long, complex passwords?

-----


Nothing, really. However, I trust the LastPass guys to keep their shit secure as much as I trust myself to keep my own system secure.

After all, if my own system is compromised, I just get a lot of hassle. If LastPass ever gets hacked and leaks their passwords, they lose their business overnight. That's pretty good motivation for them to keep on top of their stuff.

I used to use 1Passwd, which stored the passwords in a local file, and that could be said to be marginally more secure, except that it generally uses something like iCloud or Dropbox to sync the passwords, so there's still a single point of failure... The main reason I moved away from 1Password was that they gave me a shitty response when I asked them if they were going to support Chrome. I decided at that point that I didn't want to give them my money anymore, and so I didn't upgrade to 1Password 3.

-----


The big difference between "hosted service" and "encrypted file in the cloud" is that the hosted service has, by definition, to store the key next to the lock to be practical.

The key for your encrypted file stays in your head (and/or in your wallet), so even in a full-on breach of Dropbox/iCloud your key is safe, and 8 million rounds of 256-bit AES plus a good password (my current KeePass settings) is still unbreakable[1].

1: Unless (perhaps) you have the attention of certain governments. And they always have the option of using a $5 wrench on you, anyway.

-----


As far as I know, LastPass does not "store the key next to the lock."[1] The browser extension encrypts/decrypts locally. If you use your password file through the web site you're still downloading your encrypted DB from them and encrypting/decrypting locally (whether with the extension, or I believe they also have a pure JS implementation).

[1] Or so they say. I've never MITMed their SSL, and their software is not open source AFAIK. This is not to say someone couldn't e.g. distribute a trojaned version of their browser extensions. If you poke around, the developer(s) have at least revealed the encryption method for your DB, so you can verify how it is encrypted for yourself, which is a good sign if nothing else.

-----


Why can't the hosted service use an "encrypted file in the cloud" as its implementation? As long as it requires client-side code to do the decryption, the key stays in your head alone.

-----


I believe this is exactly how LastPass is implemented.

-----


Ugh, you're right. Well, then there's no discernible difference between LastPass and KeePass with the DB on Dropbox.

-----


> except that it generally uses something like iCloud or Dropbox to sync the passwords, so there's still a single point of failure

No. This is the strength of two-factor authentication, something you know, and something you have. If someone gets your 1Password keyfile, it's useless without your decrypting password.

-----


I use 1Password rather than LastPass. On that system, your password file is stored locally by default, so there isn't a centralized password store to attack. If you sync passwords between machines, you keep an encrypted password file in your Dropbox account.

-----


I think it's a risk with a solution like this, but much less of a risk than having to remember all these passwords myself (a practice which tends to devolve into re-using passwords).

-----


Then use 1Password. It's much more nicely designed, and you don't have to trust their servers as much.

-----


This is why I use 1Password and not LastPass: the encrypted password file is stored locally, optionally in Dropbox, which is what enables mobile and remote access (HTTP online through Dropbox) to work.

Works excellently!

-----


LastPass encrypts your passwords using your master password as (at least part of) the key. This means that they do decryption of passwords client-side as well. The entire password file is not stored locally but they had an intrusion of some sort a number of months back which demonstrated that they have a pretty good system set up along with quite a bit of monitoring. Truecrypt in dropbox is obviously a good choice if you're super paranoid but after seeing LastPass respond to security really well and it having an overall pretty simple UX, I don't have any reason to not recommend it.

-----


The LastPass UX is anything but simple

-----


I use KeePass right now synced with Dropbox - what keeps me up at night is the fact that if the bad guys got my password file today, there could turn out to be a vulnerability in it discovered years from now that could allow them to get my password.

-----


At least you're going to have years to go through your database and change all your passwords.

-----


I'm even happier I don't have a LinkedIn account.

-----


I've been tempted to delete mine several times recently. Looks like now is the moment.

-----


You're free to hit "delete" on linkedin, but there's a very high likelihood that it will only mean "hide my profile". Anyone who got your user/pass would probably be able to reinstantiate your account and do anything to it they wanted.

-----


agree; don't just delete, suicidemachine.org

-----


I took the step of markedly decreasing the information on my current legit profile. It includes my name and general title, but no job history. Public disclosures of connections, etc., are highly limited.

Having a fictional LinkedIn account can be amusing.

-----


I'm worried in a few years LastPass could become a target, and now instead of someone having a password that 'could' be shared among your multiple accounts, you have now given the complete keys to the city by listing all of your logons great and small in a central repository.

This central repository then becomes a very appealing target.

I say this as a LastPass user, as I think it is the best of the current offerings, but I'm uncertain how to shield this huge central list. I wish it had multiple logon PW so that you could at least segment the risk and reduce the time the high PW is used to when you really need it.

-----


It saddens me that every, single, time this topic comes up, HackerNews, of all places, displays an immense lack of knowledge of current password storage applications, how they work and what value they bring.

I think it's really humorous that people feel safe putting an encrypted file in something like Dropbox, but don't trust LastPass (who are doing the exact same thing, everything is local, client side encryption). Especially when you're missing out on all of the benefits of browser integration.

Please, take a whole 3 minutes and do a tiny bit of research. Your future self will thank you when people like swombat and myself get to laugh at LinkedIn, change our passwords and never think about it again.

-----


I think the difference you're missing is that LastPass offers the OnlineVault option.

I much prefer the security of being in control of my file, and having its online option controlled by someone else (Dropbox); and logging into Dropbox to then see my passwords 'online' on the go.

If Lastpass.com is compromised, the attacker can MitM compromise my credentials. If 1Password.com is compromised, that is not the case. (Yes, if Dropbox is compromised, they could capture my dropbox credentials, but it would be more difficult for them to then capture my 1password credentials)

Ref: LastPass Online Vault: http://helpdesk.lastpass.com/full.php 1Password Anywhere: http://help.agile.ws/1Password3/1passwordanywhere.html Services I use, and why: http://www.mikeschroll.com/blog/2011/12/07/services-i-use-an...

-----


>I much prefer the security of being in control of my file, and having its online option controlled by someone else (Dropbox); and logging into Dropbox to then see my passwords 'online' on the go.

You can't even do that. You have to install a local client: download the file, open it in your new client, edit it, manually re-upload it. If you don't want to use the on-web LastPass vault, then don't, but it's still doing local decryption, and you can still use the signed Chrome extensions to carry out ops if you don't trust LastPass.com proper.

>If Lastpass.com is compromised, the attacker can MitM compromise my credentials.

Which part of "local, client-side encryption" is confusing?

edit: 1PassAnywhere is the exact same thing as what LastPass is doing with its LastPass.com-served Vault.

edit2: There's even multifactor auth available for it and the Online Vault feature.

-----


I apologise for my immense lack of knowledge of current password storage applications (i'm not a programmer and come here for the other stuff), but what is the benefit of these services (lastpass etc)? This is a genuine question.

It seems to me that instead of having several passwords in my head (I can remember random long strings of characters pretty well, and have a hierarchy of randomness/longness depending on what I care about), I only have to remember one. But if that one's compromised, aren't all the rest then available?

Reminds me of the bit in The Hitchhiker's Guide to the Galaxy (Life, the Universe and Everything, I think) where passwords and biometrics etc. had become really difficult and secure, so a datacube thing was created to store them all. Which was then found by a character before hilarity ensued.

thanks

-----


If someone has access to:

1. Your physical machine, or the LastPass/Dropbox server.

2. Your master password

3. (optionally) a second-factor auth source

Then yes, they have access to all your passwords. But this is vastly superior to having one password that alone compromised grants access to all of your accounts, right?

I mean, the most secure way imaginable would be perfect biometric signatures, or humans smart enough that they could perform asymmetric encryption in their heads to sign challenges in a verifiable manner. Outside of that, this is decentish.

You could use a text file in a Truecrypt volume with keys that are stored on separate jumpdrives (but what if someone compromises a machine that you plug those drives into), etc, etc.

-----


> - With a computer of 8,000 NOK (~ 1400 USD), you can do a few hundred million attempts per second.

Are you kidding me? LinkedIn stored their passwords using (salted) SHA1 using no iterations? Jesus.

-----


To expand on that, to store passwords don't just use salt+sha1, or try to do your own nested sha1, just use bcrypt: http://en.wikipedia.org/wiki/Bcrypt

-----


Better still, use scrypt. HN's very own @cperciva wrote it.

http://www.tarsnap.com/scrypt.html

It requires a lot more memory to brute-force it, thereby defeating any speed gains from parallelism.
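A minimal sketch using Python's hashlib.scrypt (needs Python 3.6+ built against OpenSSL 1.1+). The N=2^14, r=8, p=1 parameters are commonly cited interactive-login values, assumed here rather than taken from the thread. The memory figure is the roughly 128 * r * N bytes each guess needs.

```python
import hashlib
import os

N, R, P = 2**14, 8, 1  # assumed cost parameters

def scrypt_hash(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)

# Every guess needs roughly 128 * r * N bytes of RAM, which is what
# limits how many guesses a GPU or ASIC can run side by side.
memory_bytes = 128 * R * N
print(memory_bytes // 2**20, "MiB per guess")  # 16 MiB per guess

salt = os.urandom(16)
digest = scrypt_hash("hunter2", salt)
```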

-----


The reason people are going to use bcrypt is that there are more likely going to be bcrypt implementations in their given language: http://stackoverflow.com/questions/10149554/are-there-any-ph...

I just had to make this choice a few days ago and bcrypt seemed like the best option with working PHP implementations. And I sure as hell am not going to try to roll my own.

-----


The responses to that Stack Overflow question make me want to punch somebody in the face!

The #1 google response for info about Scrypt for php now points to an article arguing about the semantics of the question with no answer. Classic!

-----


THANK YOU

"use bcrypt" has become an HN meme, with all the bad implications of it

As if scrypt, pbkdf2 didn't exist. Or as if bcrypt has always existed and doesn't have any weakness

-----


Please stop stirring up drama about this issue. While you are technically incorrect (PBKDF2-SHA1 is faster than and thus inferior to bcrypt), it's irrelevant: all three of [scrypt, bcrypt, PBKDF2] are just fine, and you can safely pick one at random.

-----


If a database of bcrypted passwords from LNKD had been leaked, we'd be having a totally different conversation right now. (Same, of course, with scrypt etc.)

-----


Like you can't break bcrypt...

The weakest link, either in bcrypt or MD5 is the password quality.

Of course, in pure MD5 today you're a google search away and modern computers can eat salted MD5 for breakfast

But the easiest passwords are going to be broken first

-----


Am not a cryptographer by any means, so please correct me if I'm wrong:

If you use any reasonable cost for bcrypt, you're talking hundreds of milliseconds per attempt on a modern CPU. For each 6-character password (since you can't generate a rainbow table) at 100ms per pop, you're talking about something on the order of 2+ years per password divided by the number of CPUs. With something like 900 CPUs running continuously, you could expect to recover one 6-char every day if the passwords were randomly distributed in the 6-char alphanumeric space. So, pretty feasible, assuming a 100ms cost. Short passwords do hurt you; I agree.
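The arithmetic above can be sketched like this, assuming a 36-character alphabet (lowercase letters plus digits), 100 ms per bcrypt guess, and that on average the password turns up halfway through the keyspace. It lands in the same order of magnitude as the figures above.

```python
ALPHABET_SIZE = 36        # lowercase letters + digits (assumed)
SECONDS_PER_GUESS = 0.1   # one bcrypt evaluation at a ~100 ms cost
SECONDS_PER_DAY = 86_400

def cpus_for_one_crack_per_day(length: int) -> float:
    # On average the right password turns up halfway through the keyspace.
    expected_guesses = ALPHABET_SIZE ** length / 2
    return expected_guesses * SECONDS_PER_GUESS / SECONDS_PER_DAY

for n in (6, 8, 10):
    print(f"{n} chars: ~{cpus_for_one_crack_per_day(n):,.0f} CPUs")
# 6 chars: ~1,260 CPUs; 8 chars: ~1.6 million; 10 chars: ~2.1 billion
```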

Now for 8-char alphanumeric passwords, you'd have to run ~1 million CPUs continuously to expect to recover one per day at a 100ms-per-pop cost. This is more of a stretch, assuming you're trying to do this with, e.g., botnets. It seems that someone asking for help cracking a password list on a forum would probably not be able to assemble this much computing power.

Or 1 billion CPUs continuously to expect to recover one 10-char alphanumeric password per day.

Of course, the assumption of random alphanumerics is wrong, both because many people will use common passwords and because others will use non-alphanumeric character substitution.

At any rate, it seems to me that leaking non-salted SHA1 hashes is virtually the worst case disaster scenario, short of plaintext passwords.

-----


I didn't do the math but it sounds right.

But suppose tomorrow it takes 10ms. Also, tomorrow, available storage will increase, so the likelihood of a space-vs-time tradeoff (even partial) increases.

WEP was considered "good enough" at first (even though it had obvious problems from the start, like key size), and WPA was considered unbreakable at first; today cracking it is feasible with cloud computing or GPUs.

And then we'll be complaining on HN that they didn't use xyzcrypt or something instead of bcrypt.

" it seems to me that leaking non-salted SHA1 hashes is virtually the worst case disaster scenario"

Yes. Salt is password storage 101!

-----


The time bcrypt takes is configurable, so in the future you can adjust the amount of work per password -- this is literally a one-character change in your code -- and be alright again. Ditto for the rest of the decent password hashing schemes.
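bcrypt stores its cost inside the hash string itself (the 10 in a $2a$10$... hash is the log2 of the rounds), which is why raising it is a one-character change and old hashes keep verifying. A stdlib sketch of the same idea with PBKDF2, where the storage format, names, and iteration counts are hypothetical:

```python
import hashlib
import hmac
import os

CURRENT_ITERATIONS = 250_000  # raise this over time; old hashes keep working

def make_hash(password: str, iterations: int = CURRENT_ITERATIONS) -> str:
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Hypothetical format: algorithm$iterations$salt$hash (cost travels with the hash)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${dk.hex()}"

def verify(password: str, stored: str) -> bool:
    _, iters, salt_hex, dk_hex = stored.split("$")
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), bytes.fromhex(salt_hex), int(iters))
    return hmac.compare_digest(dk.hex(), dk_hex)

def needs_rehash(stored: str) -> bool:
    # After a successful login, rehash with the current cost if this is lower.
    return int(stored.split("$")[1]) < CURRENT_ITERATIONS

old = make_hash("hunter2", iterations=1_000)  # stored years ago at a lower cost
print(verify("hunter2", old), needs_rehash(old))  # True True
```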

-----


True, but you can't really bet on that

Sure, you can increase the work, but you'll still be limited by bcrypt size

Otherwise, you could just MD5 hash stuff X times and be done with it

Sure, bcrypt today is very safe, but I wouldn't be surprised if attacks are found today (even if they rely only on bruteforce)

And let's not forget implementation issues that may happen in specific bcrypt libraries

-----


I think you are propagating the myth that a scheme can be secure forever.

It's ok if WPA is breakable with cloud computing, because the whole point was to secure it for the next X years so that it takes more than Y dollars to break it. You only need to protect million dollar data enough that it costs 10 million dollars to get it.

If the data is valuable enough and protected heavily enough with crypto, the cheapest way to get it is through a meatspace attack (break-in, abduction, etc).

> WEP was considered "good enough"

Not by security professionals once they saw the effective size of the key. It's the downgrading of what looked like a 64-bit key into a 40-bit key (the other 24 bits are the IV, sent in the clear) that was the biggest problem.

-----


You are not wrong at all.

-----


The math doesn't sound right. Google allows any ASCII character for their passwords, which is 95 chars. I calculate 2330 years to crack each password. Did I get something wrong?

(95^6 * 0.1 sec per hash) / (60 sec * 60 min * 24 hrs * 365 days)

The key difference is bcrypt does ~10 hash/sec. A GPU-enabled password cracking machine can do over 500 million hashes per second. That generates a rainbow table in ~30 minutes.
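The same comparison worked through, assuming a full search of 6-character printable-ASCII passwords on one machine, ~10 bcrypt hashes/s, and 500 million fast (e.g. SHA1) hashes/s on a GPU rig, as quoted above:

```python
KEYSPACE = 95 ** 6                  # 6 characters of printable ASCII
BCRYPT_PER_SECOND = 10              # ~100 ms per bcrypt hash, one core
FAST_HASH_PER_SECOND = 500e6        # GPU rig on a fast hash like SHA1
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

bcrypt_years = KEYSPACE / BCRYPT_PER_SECOND / SECONDS_PER_YEAR
fast_minutes = KEYSPACE / FAST_HASH_PER_SECOND / 60
print(f"bcrypt: {bcrypt_years:,.0f} years, fast hash: {fast_minutes:.1f} minutes")
# bcrypt: 2,331 years, fast hash: 24.5 minutes
```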

-----


These hashes were posted on a forum as a plea for help: the guy did not have enough computational power to crack them all on his own. Had they been salted bcrypt hashes, it might have actually discouraged him to the point of not even trying.

So yeah, the weakest passwords will always fall, but good solutions will go to great lengths to protect even the most clueless of users.

-----


What weakness does bcrypt have?

-----


I wonder, why do people saying "just use bcrypt" never, ever bother to elaborate on what benefits it has, and which of them are relevant to the subject of the conversation? Believing in some function without understanding implications of its use does very little for real security.

-----


Bcrypt does not require your understanding. The most important thing is that you use a strong password hashing method -- of which bcrypt is the best-known, and an excellent choice. For a basic level of understanding, here's a slightly exasperated blog post that a lot of people link to:

http://codahale.com/how-to-safely-store-a-password/

-----


Because the answer to your question is one Google search away. HN people are tired of explaining it every single time bcrypt comes up.

-----


Also, there's already an answer in the thread http://news.ycombinator.com/item?id=4073839

-----


It's not an in-depth answer. It does not say, for example, why bcrypt is more secure than nested SHA1. (I believe it has to do with the possibility to efficiently implement SHA algorithms in GPUs.)

People are using unsalted SHA1, because someone told them in the past "just use sha1". Now someone else tells them "just use BCrypt". Without understanding why, it's nearly impossible to decide which security policy is sensible. There are many different types of advice competing for attention, and not all of them are good.

-----


Somebody once said fire was composed of phlogistons. Later, different people said that fire was instead a process of decomposing fuel molecules and a release of visible light due to the energy of the chemical chain reactions taking place inside the flame.

The guy who said "phlogistons" was wrong. So was "just use SHA1" guy.

-----


http://lmgtfy.com/?q=%22use+bcrypt%22

-----


I wonder why people who make this complaint never ever bother to google: "why use bcrypt". It's like they somehow forget they have the best magical oracle to answer questions at their fingertips, which can answer the question better than most people who understand bcrypt could.

-----


Could you explain what you mean by "nested sha1" - hashing it twice? How is this safer than sha1 + a good salt?

-----


stef25, this is known as key stretching, as others have already explained elsewhere in this thread. Essentially the idea is to make computing the final hash of the password slower by iterating the hash function many times.

This additional slowdown is unlikely to be noticed by a user during an interactive login (hashing the password may take 1ms instead of 1us -- an imperceptible difference to a human) but it dramatically slows down the speed at which an attacker can compute hashes to try and recover the password for a leaked hash. It also increases the amount of storage space required for (a naive implementation of) a rainbow table since the attacker would need to store the output for 1, 2, ..., n iterations of the hash function.
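Here is a minimal sketch of the idea, assuming a random per-user salt. It hashes once, then feeds the digest back through SHA1 many times. This hand-rolled loop is only for illustration. The standardized form of the same idea is PBKDF2 (`hashlib.pbkdf2_hmac` in Python), which is what you would actually want to use.

```python
import hashlib
import os

def stretched_sha1(password: bytes, salt: bytes, iterations: int) -> bytes:
    """Naive key stretching: every guess costs `iterations` SHA1 calls."""
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    digest = hashlib.sha1(salt + password).digest()
    for _ in range(iterations - 1):
        # Each round needs the previous round's output, so rounds can't be skipped.
        digest = hashlib.sha1(digest).digest()
    return digest

salt = os.urandom(16)
stored = stretched_sha1(b"hunter2", salt, iterations=100_000)
```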

-----


I'm not familiar with iterations, anybody care to clue me in? I would have thought salted sha-1 would be decent for password hashing, though not the most solid possible, but at least not laughable. Is that not the case?

-----


It is not. Sha1 is designed to be fast. You want your password hash function to be slow, so that an attacker has to spend as much resources as possible to brute force it.

Of course, it does not mean you should take a slow implementation of a fast hash. You need a hash that, when implemented to be as fast as possible, still is pretty slow.

-----


good to know, thanks

-----


If you use the ASP.NET Membership that's what you get, unless you do something custom.

Edit: they are salted though.

-----


No they didn't salt them either haha!

-----


I've just downloaded the database linked and it only contains the hashed passwords, not the account usernames / e-mail addresses.

I wonder if someone has the account details to match up otherwise you've no idea which password belongs to who, and you'd hope that LinkedIn would have lockout functionality.

-----


Keep in mind that whoever leaked the hashes is probably keeping the usernames / emails for themselves. The forum in question doesn't allow posting of user-identifiable information according to the forum guidelines.

The leaked hashes seem to be SHA-1. I've also confirmed that the hash of my own (semi-complex) LinkedIn password is in the list. Incidentally, this is the same password I had for HN, which I've now changed (phew! THAT would have been bad! :-)

-----


Doesn't this imply that LinkedIn doesn't salt the password prior to storing it? If so, a good chunk of those passwords will be in a rainbow table.

-----


Yes. The hash I calculated was without a salt (the same way you generate a hash on sites like http://darrenfauth.com/generators/sha1)
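Here is why the missing salt matters, in miniature. An unsalted SHA1 of a password is identical on every site and for every user. A table computed once can therefore be reused against any leak. The toy dictionary below is the simplest form of precomputation. A real rainbow table stores hash chains to trade lookup time for disk space, but the reuse argument is the same. The word list is made up for illustration.

```python
import hashlib

def sha1_hex(password: str) -> str:
    """Unsalted SHA1, the same value any online generator would show."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

# Computed once, then reusable against any unsalted SHA1 leak from any site.
common_passwords = ["password", "123456", "qwerty", "linkedin"]
lookup = {sha1_hex(p): p for p in common_passwords}

leaked_hash = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"
print(lookup.get(leaked_hash))  # password
```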

-----


You can get reflected XSS in that field. Paste "<script>alert('XSS')</script>" in the "Value to sha1" input box.

Darren, you should check out output encoding.
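Output encoding means escaping user-supplied text before echoing it back into HTML. We don't know what that generator site is written in, so this is a Python sketch using the standard library's `html.escape`. Most web stacks have an equivalent.

```python
import html

user_input = "<script>alert('XSS')</script>"
# Escapes &, < and >, and (because quote=True by default) both " and '.
safe = html.escape(user_input)
print(safe)  # &lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;
```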

-----


With these sorts of simple hashes, you don't need rainbow tables when you have a few GPUs and OCLHashcat.

-----


It would still take a moderate amount of time for a single password if it's long and complex -- you're essentially generating the rainbow table. You might as well download a sha1 rainbow table and perform an O(1) lookup. You could reverse all the 6.5M password hashes in mere seconds.

-----


Actually, for a large enough list of unsalted password hashes, bruteforcing is faster than rainbow tables:

- a rainbow table may require a constant amount of time to reverse 1 hash, but it has to be repeated N times for N passwords.

- when bruteforcing, a password candidate can be checked against N hashes in a constant amount of time (look up the candidate hash in a hash table)

For example if it takes 10 minutes to look up a hash in a very large rainbow table (such as the A5/1 GSM tables published a few years ago), it would take 123 years to attempt to reverse these 6.5M hashes. On the other hand, millions of the leaked SHA1 hashes can be cracked in mere hours on a GPU with oclhashcat which tests billions of candidate hashes per second.
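The second point above is why bulk brute forcing wins. Put every leaked hash in a hash set once. Each candidate then costs one SHA1 plus one average-O(1) membership test, whether the leak has ten hashes or 6.5 million. Below is a minimal CPU sketch; oclhashcat does the same thing massively in parallel on a GPU. The function name and the tiny alphabet are illustrative.

```python
from __future__ import annotations

import hashlib
from itertools import product

def crack_all(leaked: set[str], alphabet: str, length: int) -> dict[str, str]:
    """Try every `length`-char string over `alphabet` against all leaked hashes at once."""
    found: dict[str, str] = {}
    for chars in product(alphabet, repeat=length):
        candidate = "".join(chars)
        digest = hashlib.sha1(candidate.encode("utf-8")).hexdigest()
        if digest in leaked:  # one set lookup checks this guess against every hash
            found[digest] = candidate
    return found

leaked = {hashlib.sha1(p.encode("utf-8")).hexdigest() for p in ("abc", "cab", "zzz")}
result = crack_all(leaked, "abc", 3)
print(result)
```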

-----


true, for extremely large rainbow tables. SHA1 tables are around 20-60GB depending on how large your base character set is. If you shoved all this data into a giant database, query speed is still under a few milliseconds. In general, rainbow tables can be sharded fairly easily, so if your data set is a few hundred terabytes, just split it across a few machines and you'll retain the millisecond query times. Storing and querying easily partitioned data will usually be faster than a brute force calculation.

Calculating it is like saying you want to find the fibonacci number for any given N, and you have a really fast processor to calculate it to that N, but if you just persisted pre-calculated values up to C, you'd only need to calculate N-C hashes. So even if you are bruteforcing the password, it is still faster to have rainbow tables up to a certain length.

-----


What I say is true for any size of rainbow table. It seems you forget that RT lookups require CPU resources in addition to mere I/O resources. There is always a number of hashes beyond which brute forcing them is faster than RTs. Sometimes this number is very high (billions of hashes), sometimes it is lower (thousands of hashes). It depends on many factors: RT chain length, speed of the H() and R() functions, speed of the brute forcing implementation, etc.

To take your example of a small SHA1 rainbow table of 20GB, assuming it has a chain length of 40k, looking up a hash in it will require on average 200M calls to the SHA1 compression function (assuming a successful lookup). A modern CPU core can do about 5M calls per second. Therefore looking up one hash will take at least 40 sec, and looking up these 6.5M LinkedIn hashes would take 8.2 years! (This is just counting CPU time, I assume the RT is loaded in RAM for a negligible I/O access time to its data.) A RT of this size would cover a password space of about 2^44. For comparison a decent GPU can brute force this many hashes concurrently at a speed of roughly 500M per second (see oclhashcat perf numbers on an HD 7970). Covering the same password space would take only 9.8 hours. Compare 8.2 years vs. 9.8 hours: obviously the LinkedIn hashes that have been cracked so far have been brute forced, not looked up in RTs!

And even if you leveraged GPUs to perform RT lookups, they would speed up the computations by roughly a factor 100x, reducing the 8.2 years down to 30 days, still unable to match the short 9.8-hour brute forcing session. (My friend Bitweasil is doing research on GPU-accelerated rainbow tables, see cryptohaze.com)
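The comparison can be reproduced from the figures given in this comment. All of these inputs are the commenter's estimates, not new measurements: 200M SHA1 calls per lookup, 5M calls/s per CPU core, 500M hashes/s on a GPU, a 2^44 keyspace, and a roughly 100x GPU speedup for lookups.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

n_hashes = 6.5e6                # leaked LinkedIn hashes
sha1_calls_per_lookup = 200e6   # average for one successful RT lookup (chain length 40k)
cpu_sha1_per_sec = 5e6          # one modern CPU core
gpu_sha1_per_sec = 500e6        # brute force on one GPU (HD 7970 class)
keyspace = 2 ** 44              # password space covered by the ~20GB table

rt_seconds = n_hashes * sha1_calls_per_lookup / cpu_sha1_per_sec
rt_years = rt_seconds / SECONDS_PER_YEAR
gpu_rt_days = rt_seconds / 100 / 86400    # RT lookups sped up ~100x on a GPU
bf_hours = keyspace / gpu_sha1_per_sec / 3600

print(f"RT lookups on CPU: {rt_years:.1f} years")    # RT lookups on CPU: 8.2 years
print(f"RT lookups on GPU: {gpu_rt_days:.0f} days")  # RT lookups on GPU: 30 days
print(f"GPU brute force:   {bf_hours:.1f} hours")    # GPU brute force:   9.8 hours
```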

-----


As a more general question: why is it not an industry standard to salt with the username/email in addition to the random key? (i.e. Sha1($salt + $email + $password)). Even if the random salt were excluded, I would think that this is much more secure. Existing rainbow tables would not be anywhere near as helpful, and attempts to generate a rainbow table for a specific salted database would be ineffective because the salt changes on a per-user basis.

-----


The solution is to use a better method of storing passwords. Hashes like SHA1 are designed to be really fast (great for hashing data but also great if you want to brute force).

I think this is a pretty good overview: http://codahale.com/how-to-safely-store-a-password/

-----


Then the password has to be updated whenever your email changes. I believe Amazon does it like that, literally "forking" whenever you change password; at one point it was possible to simply log on with the old password and live an "alternate reality" where all changes you'd done after changing pwd had not been applied. Don't know if it's still the case today.

-----


Why would you use the email? Mostly when passwords/usernames are stolen the email is there too. For my site I have a unique 128-bit token for every user. I also have a 128-bit site_key (which is in the application, not the db) and mix those with the password and then hash.
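One way to sketch that scheme: mix in the 128-bit site key with HMAC, then use the per-user token as the salt for a deliberately slow PBKDF2 step. The commenter doesn't say which hash they use, so the choice of HMAC-SHA256 plus PBKDF2 here is an assumption. In practice, `SITE_KEY` would be loaded from application config rather than generated at startup.

```python
import hashlib
import hmac
import secrets

# Hypothetical site-wide key; generated here only so the demo runs.
# A real app would load a fixed value from config kept outside the DB.
SITE_KEY = secrets.token_bytes(16)

def new_user_token() -> bytes:
    """128-bit per-user salt, stored alongside the user's hash."""
    return secrets.token_bytes(16)

def hash_password(password: str, user_token: bytes, iterations: int = 100_000) -> bytes:
    # Mix in the site key with HMAC, then stretch with the per-user token as the salt.
    peppered = hmac.new(SITE_KEY, password.encode("utf-8"), hashlib.sha256).digest()
    return hashlib.pbkdf2_hmac("sha256", peppered, user_token, iterations)
```

With this design, a stolen database alone is not enough to test guesses; the attacker also needs the site key from the application.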

-----


The economics of password crackers changed and rainbow tables are pretty much obsolete nowadays. See http://www.codinghorror.com/blog/2012/04/speed-hashing.html section "What about rainbow tables?".

-----


Interesting - I wasn't able to find the hashes of any passwords in the list. What list were you using?

-----


The rar with ~100k cracked passwords in it. If you tried to find your own, perhaps you're one of the ~144 million accounts that wasn't published?

Edit: I'm not sure I understand what you mean - there were 100k passwords in one file, already cracked, and another with all 6.5M hashes. I found my hash in the hashes file.

-----


Ah, I have the 6.5M file. Not sure why I'm not finding stuff from my wordlist in it, but I do see things from e.g. https://twitter.com/mikko/status/210341669944573955. Sorry for the confusion!

-----


Oddly, mine isn't in the leak despite the fact that I just logged in with it.

-----


LinkedIn could easily match each hash to a user. Then they should lock each of those accounts and force them to change their password.

-----


Which should be done, but which doesn't help those users where it matters most; the real value of this database is that some people (~everyone) reuse passwords across sites.

-----


And send them a note too, sure. They've got their e-mail addresses as well so a note of apology and warning is certainly in order.

-----


From the looks of it, the data dump may be all accounts, since there seems to be no salt and many people use the same passwords...

-----


You can use it for checking whether your password was leaked. You don't need usernames for that.

-----


Are the hashed passwords not salted?

-----


You can perform this check even if they were salted.

Otherwise how could linkedin check if you correctly entered your password?

The salt is contained in cleartext as part of the hashed password, so that you can repeat the hashing of the secret and compare the two hashes.

The salt improves the security because:

1. even if two users use the same password, you cannot tell that by simply comparing the hashes

2. makes brute force checks much slower because you have to recompute the hash for every hashed password entry rather than once for every dictionary entry

3. Prevents building rainbow tables

(probably other reasons, I'm not a crypto expert)
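Here is a minimal sketch of points 1 and 2, with the salt stored in cleartext beside the hash as described above. A single SHA-256 call keeps the example short. As other comments in this thread point out, a salted fast hash is still cheap to brute force, so a real system should use bcrypt, scrypt or PBKDF2. The `salt:hash` record format is made up for illustration.

```python
import hashlib
import hmac
import os

def make_record(password: str) -> str:
    """Store 'salt_hex:digest_hex'; the salt is kept in cleartext next to the hash."""
    salt = os.urandom(8)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt.hex() + ":" + digest

def check(password: str, record: str) -> bool:
    """Repeat the hashing with the stored salt and compare the two hashes."""
    salt_hex, digest = record.split(":")
    candidate = hashlib.sha256(bytes.fromhex(salt_hex) + password.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, digest)
```

Two users who both pick "hunter2" get different records, so neither matching hashes nor a shared precomputed table gives them away.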

-----


The salt may have been stored in a separate database table and not distributed with this list (if they were salted, which apparently they aren't).

-----


No. I was just confirming that myself when I saw madsr's comment: http://news.ycombinator.com/item?id=4073454

-----


To get a sense of it, I downloaded it from a link here. Below is the structure of the first few lines. Caveat: it's garbage/useless data below -- I intentionally changed around the actual numbers to give a sense of the structure, only:

  000000a94d47b9cb82ca8a3b492a51263b40a66e
  000000a98a624314892af97c6f1a0635472eae38
  000000a9ba60e7f13fcac444a5a791af7807a3a3
  000000a97ea34e74a97a6d1ce08ebc68d3e9aab2
  000000a9b4b2a3497aaa51e212ac9efdb00aaf4e

-----


The pattern 000000a9 is just in presentation - I counted the occurrences of different bytes in that position (also misled by the apparent pattern, where many lines in a row would have the same 4th byte), and each possible value is present more or less equally often.

It seems like it's just sha1.

EDIT: however, 3.5 million hashes start with 5 zeroes, which is way too many for just coincidence. Possibly they used multiple hash functions?
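Here is a quick sanity check on "way too many for coincidence". Each hex digit of a SHA1 output is effectively uniform, so a random hash starts with "00000" with probability 16^-5, about one in a million. The counts below are the ones reported at the top of the thread.

```python
total_hashes = 3_521_180 + 2_936_840   # counts reported at the top of the thread
p_five_zeros = 16 ** -5                 # 1 / 1,048,576
expected = total_hashes * p_five_zeros
print(f"expected by chance: {expected:.1f}, observed: 3,521,180")
# expected by chance: 6.2, observed: 3,521,180
```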

-----


It appears that the publisher just zeroed out the first 5 hex digits of about 3.5 million of the hashes. The rest of each string still matches known hashes.

-----


Found this on reddit: http://www.reddit.com/r/netsec/comments/unubl/if_it_turns_ou...

My password is not in there, but some people have already reported finding theirs.

-----


Agreed. That seems rather useless. How would that happen anyway? The usernames stored in a different database/table from the hashes?

-----


They might need help cracking the hashes, keeping the usernames behind for their own exploits.

-----


Or they may use this as an advertisement for selling the actual dataset.

-----


LinkedIn allows you to sign in using any of your verified email addresses, so it seems likely that the usernames are at least stored in a different table.

-----


... but still, a head wag at LinkedIn for using weak hashing, which I'm guessing means MD5.

-----


MD5 isn't the issue - it's the lack of salting. Without a salt, almost any hash can be cracked with a rainbow table. With a salt, you'd need to know the salt for each hash, and then generate a new rainbow table, in order to recover the original password.

-----


This isn't really the issue. The real issue is that MD5 (though these hashes are SHA1, which has the same problem) is too easily computed; it is practically brute-forceable. I don't need a rainbow table when I can slam out millions of hashes in short order using a GPU. You have a good point about needing to know the salt, but getting the salt is generally easy because it's usually stored in the same place as the hashes (and this practice is fine, because hiding the salts doesn't improve security significantly on its own).

This is a major reason to use bcrypt.

-----


The difference is that if it's salted you need to work to get a specific password. Without salting you can test a generated hash (rainbow table) against all 6.5 million hashes at the same time.

Not defending the choice - bcrypt is obviously a much better way to go.

-----


The thing is, though, that it's trivial to slam through that set of salted passwords. It's like unsecured Wi-Fi versus WEP: "door unlocked" versus "'No Trespassing' sign."

-----


Let's forget about bcrypt for a second.

What prevents developers from adding a large DB-wide salt (in addition to normal salt) to every password? Wouldn't that prevent bruteforce attacks regardless of the hashing algorithm?

-----


Random nonces have very little to do with what makes SHA1 insecure and bcrypt secure. Developers have a very weird and totally misplaced faith in the ability of random "salts" to secure passwords.

-----


We're speaking about a very specific attack here: bruteforce. And I'm speaking about a very specific type of "salt" (which could probably be called something else, since it's not the same as normal unique-per-password salt): large, database-wide string of random bytes.

If every password is padded with such a string before hashing, computing the hash would be slower. Obviously, it would be slower because you would have to process more data. An interesting question is whether this would also make it less parallelizable by the virtue of having more information than would fit into GPU cache.

-----


None of this makes much sense to me, sorry. Brute-force password cracking has worked on salted passwords since Alec Muffett released Crack in the early '90s. The amount of extra computational power required to hash a password and a salt is negligible.

The only thing "salts" do is prevent rainbow table precomputation, but it's just a quirk of the late '90s and early '00s that "rainbow tables" ever became a mainstream attack method: one bad Microsoft password hash and a series of bad web applications. Long before the LANMAN hash was ever released, people were breaking salted Unix passwords with off-the-shelf tools, on much, much slower computers than we have now.

-----


Computing a hash on 1MB of data is slower than computing a hash of 6-8 bytes of data. Brute-force attacks are based on trying different passwords and seeing that after being salted they generate the same hash as in the database. Therefore, adding a large string to the password before hashing would force the attacker to hash that string. The question is, can this be pre-computed once or efficiently parallelized?

-----


You're advocating creating a 1MB "salt" string to slow down hashes? That's the same as simply iterating your hash function enough times to invoke the block function repeatedly.

Just use bcrypt, scrypt, or PBKDF2. People have already figured this problem out.

-----


First, I do not advocate anything here. I asked a question.

Second, working with a large string of bits is the same as recursive hashing only if you can pre-compute some small intermediate state of the hash function for that string independently from the password you're trying to guess. If you can't, you would have to work with the entire string for every new password tried.
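Python's `hashlib` shows the prefix case directly, because its hash objects expose `copy()`. SHA-256 processes its input block by block. An attacker can therefore run a database-wide prefix through once and clone the resulting internal state for every guess. If the salt is appended after the password instead, the chaining state differs per guess once the password has been absorbed, so the whole salt has to be reprocessed each time. `big_salt` is a hypothetical 1 MB value.

```python
import hashlib
import os

big_salt = os.urandom(1 << 20)  # hypothetical 1 MB database-wide salt

# Salt as a prefix: hash the megabyte once and keep the internal state...
prefix_state = hashlib.sha256(big_salt)

def hash_prefixed(candidate: bytes) -> str:
    h = prefix_state.copy()  # ...then clone it, so each guess only absorbs the password
    h.update(candidate)
    return h.hexdigest()

# Salt as a suffix: the state after the password differs per guess,
# so every guess pushes the full megabyte through SHA-256 again.
def hash_suffixed(candidate: bytes) -> str:
    return hashlib.sha256(candidate + big_salt).hexdigest()
```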

-----


I answered your question: using a very large "salt" to force a password hash to run more block functions is a bad idea.

Modern password crackers are extremely fast without precomputing anything.

-----


How much slower would you estimate it being?

-----


1MB of data will have 16384 SHA-256 blocks. So that's roughly the slowdown I would expect, minus the time it takes to initialize the algorithm for a particular message.

That's not that interesting by itself, but it is interesting to think about how this would affect computing the hashes on GPUs.

-----


And how high can you crank the work factor for, say, bcrypt?

-----


Is there a significant time difference in computing the SHA1 hash of 40 bytes versus say, 128 bytes?

-----


128 bytes is not "large". I was thinking more along the lines of a megabyte or more. There is no question that it will slow down hash computations, because you would need to process more data. The question is, can you efficiently parallelize this on commodity hardware (GPUs)?

-----


To be clear, MD5 (or SHA1 as these apparently are) is a problem. Passwords should be stored using a cryptographic hash function that is designed to hash passwords (read: be slow), not a generic cryptographic hash function (which are designed to be fast). This is exactly the problem that bcrypt was created to solve (among others).

-----


I think people are missing the point that SHA2 is light years ahead of MD5. MD5 has had known security flaws for years.

> Do not use the MD5 algorithm. Software developers, Certification Authorities, website owners, and users should avoid using the MD5 algorithm in any capacity.

http://www.kb.cert.org/vuls/id/836068

This is from over 3 years ago.

-----


The security differences between SHA2 and MD5 are irrelevant to the matter at hand. If they were MD5 hashes they'd be broken approximately as quickly and in exactly the same way.

-----


The primary problem with using either as a password hash is their speed.

-----


I agree, but my point is that the "use bcrypt" drum has only been beating for a couple years to my knowledge: http://codahale.com/how-to-safely-store-a-password/

Wind the clock back 3-5 years and it's still stupid to use MD5. I could kind of understand some old code lying around that was less secure.

-----


Still, it doesn't matter. As long as one can generate a rainbow table for the hash function, then password lookups will be an O(1) operation. The rainbow table for md5 is moderately small, sha1 is bigger, and I'm sure sha2 is even bigger than the sha1 table.

-----


I'm discussing SHA-2 vs MD5. I wouldn't use any hash function without a salt.... which makes the discussion of rainbow tables irrelevant.

-----


Good Guy Startup Founder would cross reference this password list with their own password system and force those that match to reauthenticate and change their passwords.

This wouldn't be difficult to do and your users would appreciate it.

-----


Better Guy Startup Founder would be using salted hashes anyway and wouldn't even be able to run a cross-reference.

-----


It's possible to test this when your user re-authenticates, assuming you're not using a challenge-response authentication mechanism (as sadly most sites do not).

-----


Google once forced a password reset for emails/passwords that leaked from a bitcoin forum.

-----


That's easy to do if you have the email addresses, but impossible to do if you only have the SHA-1 hash, as in this case (unless you're also using unsalted SHA-1 hashes, which is a much bigger issue by itself).

-----


Yes, it's technically easy, but it shows everyone how much Google cares.

-----


How would one cross-reference this list unless you're storing the plain text passwords?

-----


You'd do it at login time. User enters user/pwd -> hash with unsalted sha-1, check if in list -> if yes, alert to change / if no, proceed with normal hashing.
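Here is a minimal sketch of that login-time check, assuming the leaked dump has been loaded into a Python set of hex strings. Many hashes in the dump have their first five hex digits replaced by zeros, so the check looks for both forms. `password_was_leaked` and `leaked_forms` are hypothetical helper names.

```python
from __future__ import annotations

import hashlib

def leaked_forms(password: str) -> tuple[str, str]:
    """Unsalted SHA1 of the password, plus the variant with the first 5 hex digits zeroed."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest()
    return digest, "00000" + digest[5:]

def password_was_leaked(password: str, leaked_hashes: set[str]) -> bool:
    return any(form in leaked_hashes for form in leaked_forms(password))

# At login, while the plaintext is briefly in memory and before normal verification:
leaked_hashes = {"000001e4c9b93f3f0682250b6cf8331b7ee68fd8"}  # entry reported in the dump
if password_was_leaked("password", leaked_hashes):
    print("Ask the user to choose a new password")
```

The plaintext is only used transiently during the login request. Nothing about it is stored beyond your normal (salted, slow) hash.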

-----


Easy, just convert all the hashes into passwords using a rainbow table. Should only take a few seconds to convert all 6.5M passwords -- O(n) operation here. Then run all the passwords through each user's password algorithm; this is an O(n^2) operation. Essentially you're making 6.5M password attempts for each of your users. It could be slightly faster because I'm sure there are quite a few duplicates in 6.5M passwords.

-----


A SHA-1 rainbow table?

-----


What's wrong? They exist... they're bigger than md5 tables, but not significantly larger. If you don't have 50GB of free disk space, you could get a table with lower complexity for around 20GB or so.

-----


A cross-reference is only feasible in very bad situations:

- no salt, or the same salt and the same hashing
- trivial/common passwords (password1 etc.)
- password (hashed/unhashed) and email are paired

A cross-reference could be accomplished for all known cracked LinkedIn passwords, but this would be no different than you running a dictionary attack of known passwords against your own users... This seems very bad. Enforcing strong but sane password strength rules should mitigate this need.

Cross-reference only has value if both the hash and email pairs are leaked.

The bitcoin leak fell into one of these very bad situations:

- [<email>, <hash>] were leaked together
- poor hashing (just sha1, no salt if memory serves)
- an unfortunate number of people reuse passwords

-----


The released passwords are hashed with SHA1. Assuming you use the same algorithm and linkedin does not use a salt (they probably do), then you could just compare the hashes.

-----


LinkedIn passwords are not salted. You can only make comparisons if your database contains unsalted passwords. And if both databases used salted-passwords, then you still can't compare unless you all shared the same salting key.

-----


You can't compare the hashes unless you have access to the clear passwords of your users. Unless you mean to do the comparison just as they log in. Seems like a lot of hassle for not much though.

-----


Or do it the next time they log in, when you temporarily have their cleartext password.

-----


Maybe he was implying that they and Good Guys Startupers use hashes from raw passwords. I hope that is not true.

edit: From reading comments below I learned that LinkedIn indeed didn't salt.

-----


you'd compare the hashes in your database with those from the file. The users with a hash contained in the file would be notified.

Because the passwords aren't salted (stupid), you might get multiple hits for the same hash (for example, for the good old "1234" password), meaning you might end up contacting more users than were actually affected. Better safe than sorry.

-----


You can do this if you, like LinkedIn, store SHA1 unsalted passwords. You just look for matches.

-----


i agree, but think about the backlash this would create amongst the userbase. the majority of the users will probably never even realize / read that their passwords have been stolen and thus linkedin probably does best in keeping a low profile about this (and start from now on using a better encryption). this is obviously not in the interest of the users, but it is in the interest of linkedin.

-----


Funny, LinkedIn was one of the few services that made me do this after the Gawker fiasco

-----


Interestingly, LinkedIn did just that when the Gawker list was leaked (iirc).

-----


Or, they could take the Zappos route and just force everybody to reset their passwords. This route would make adopting a different (e.g. salted) password system quite straightforward.

-----


[deleted]

Possible that they only uploaded the "hard" ones. Looks from other comments here that people have found their own passwords, unsalted.

-----


I've found '1234678', 'password', 'qwerty', 'linkedin' and few other common phrases (already 00000'd, obviously), so it doesn't look like a list of just the hard ones.

-----


Other sources are starting to report this as well:

http://blogs.computerworlduk.com/unscrewing-security/2012/06...

http://thenextweb.com/socialmedia/2012/06/06/bad-day-for-lin...

-----


Database is available here https://disk.yandex.net/disk/public/?hash=pCAcIfV7wxXCL/YPhO...

(Source: twitter, haven't looked at it myself)

-----


I miss the days where wget/curl worked to download files from the web.

-----


Those just look like hashes - are there usernames / salts somewhere? They do indeed seem to be salted.

-----


No they're not. I tried the following:

    $ irb
    > require 'digest/sha1'
    > Digest::SHA1.hexdigest 'my_password'
    => hash_string

Then I searched the file with the hash string and found my password. I really hope they don't also have the usernames somewhere.

-----


Interesting, I tried this with a bunch of different passwords (though using php's sha1 function, which obviously gives the same output as ruby's), and found no matches. You're using the "combo_not.txt" file from the zip file in the ggp, right?

-----


The dump is not complete -- my password is also missing. As other people said, that file contains about 6.5 million hashes, while LinkedIn has 30 times more users.

Considering how usernames weren't leaked, there's a big chance that the intruder is just sitting on them and the other passwords.

-----


My password is missing too (if I've done the hash generation right, as illustrated above). It's strange that only hashes starting with "000000a9" are present. Someone here said that's just presentation, but my hashed password is 40 characters long, the same as the leaked ones including the 000000a9.

-----


Either you don't have a complete file or you haven't scrolled through it. Only the first 277 hashes start with that string (and some others scattered throughout).

-----


I was talking about hashes starting with 0000 (I just looked at the beginning and the end of the file). jgrahamc's post is useful: if I ignore the 0000 prefix (which could be a sign of "ok, we've cracked it"), I can find my hash (the password was not very difficult)...

-----


Thank you!

-----


How can you tell?

-----


Not finding 'password', 'foobar', '1234' suggests salted passwords.

-----


Very good point. Although some sites have password rules that would prevent those.

-----


The hash list posted might be incomplete.

-----


check jgrahamc's post at the top

-----


'password' and 'foobar' are both in there. '1234' is not, but that's probably because of a minimum length requirement.

Edit: '12345678' is in there, further bolstering the length requirement theory.

-----


Correct.

From LinkedIn: "Passwords are case-sensitive and must be at least 6 characters."

-----


Whatever manager it was that tasked some junior programmer (particularly one that didn't know that unsalted SHA1 is a terrible idea) with implementing the password system at LinkedIn needs to be fired. Making the programming mistake means that you don't know much about web security, and while not a great thing, that's forgivable; putting someone that's utterly unqualified for code with security implications on such an important task is not. Nor is letting the code get deployed without having someone that knows what to look for review it. Nor is letting such a bad decision remain live for...what is it now, almost 10 years?

But let's not stop there. There are probably a dozen other people at the company whose job it is to avoid blunders like this, all the way up to the top technical staff. After all, LinkedIn is not, and has not been for some time now, some tiny underfunded startup. It's a goddamn public company, and even before that it was a super-team Silicon Valley darling that was getting money thrown at it since even before tech became cool to invest in again, and it's been valued at over a billion dollars for almost five years now. There is absolutely no excuse for this, they should have been doing regular security audits for years, and no audit worth its salt would miss something this simple. I absolutely refuse to believe that this problem was unknown, that nobody ever commented or filed a bug report about this code - no, this was deprioritized, because it wasn't considered a high enough value problem. And now it's bitten them in the ass and become a problem, probably because some other security vulnerability was similarly deprioritized instead of fixed.

I expect this from some shady Bitcoin market that a high school kid runs off of a server in his bedroom. I do not expect this type of amateurism from a 10 billion dollar company with hundreds of engineers, many of whom have specifically looked over that code, some of whom have probably complained about it, and all of whom should know better than to let it fester...

-----


"I expect this from some shady Bitcoin market that a high school kid runs off of a server in his bedroom. I do not expect this type of amateurism from a 10 billion dollar company with hundreds of engineers.."

Think you might be expecting too much from large companies =/

-----


I wonder, what if this list wasn't leaked from LinkedIn databases, but rather from some third-party service using the "enter your password" anti-pattern? A flaky service like that would likely not be very good at safely storing passwords.

Unfortunately, LinkedIn keeping mum on the subject makes it easy to speculate that it actually came from them. Otherwise it'd be easy to deny (and even spin: "How dare you! We never store unsalted hashes, we follow state-of-the-art practices here!!"). Also, their security track record is... embarrassing enough as it is.

-----


I wonder how many LinkedIn users use the same passwords for all their accounts. The article talks about identity theft and "confidential contacts" but I think the real danger is that people tend to use the same password everywhere. It's their other accounts that might have real value.

EDIT - As I think about it, e-mail accounts would be especially valuable as most of your other sites could be compromised using the "recover my password via e-mail" feature if the hacker could read the resulting mail.

-----


Me. Admittedly, it's stupid as hell, but has generally been too much of a pain to do anything else (for things outside of banking, email). I've started to get serious about KeePass lately, but I bet a significant percentage of users take the lazy approach.

-----


Having to type in my Apple password on iOS once every few hours inevitably means I have to use something memorizable and quick to type. There are certain trade-offs with different passwords.

-----


What's causing you to have to retype passwords every few hours?

If you're doing something that makes that normal procedure, consider using the browser inside 1Password for iOS.

-----


Installing and updating apps, I'm guessing.

-----


That, and the app sync prompts between iOS devices.

-----


My linkedin password is an easy one. Then I checked 1password to see how many other sites I use that password on.

74 sites... including gmail, openid, facebook, skype, amazon, dropbox, reddit and this site.

-----


Out of curiosity: if you use 1Password, why are you reusing a password across critical sites?

-----


Most of them are from the era before 1password - also, I didn't realize until now how bad it was.

-----


Full disclosure: I reuse passwords on "low-value" accounts too ... I am NOT innocent either.

-----


I've developed a system (kept only in my head) where every password I use is based on the name of the service. This means that with just one of my passwords, you're most likely not getting anywhere. With two, you have a bigger chance of figuring out the differences and thus the system, but it works fine for me at the moment.

-----


Other than the simple top 100 password list, a password based on the name of the service is the most likely password that everyone has.

Usually something like "domainname"+"common password for all sites"

-----


Don't underestimate me. It contains many numbers extracted from the letters according to various rules (order in alphabet, backwards, etc), along with special characters.

-----


aha! same for me ;)

-----


just use a couple of shitty passwords for sites you don't care about, and remember the other ones.

E.g. my hacker news account would probably be relatively unproblematic to compromise. If that were to happen, I'd just make a new one though.

-----


I take things a step further -- I have no idea what my password is on sites like HN or reddit. If the cookie is ever gone, my account is gone.

I don't like the idea of identity permanence.

Instead of shitty passwords though, why not use something like 1Password to store the logins? I use that (or an old fashioned piece of paper in a secure location) for meaningful security tokens.

-----


Yeah, I actually deliberately make new accounts on reddit and HN every couple of months

-----


But what about your kar- nevermind.

-----


Ha. I'm in the same boat. This is my second account after the first one got ghost banned (for a single comment and the followups attempting to explain).

-----


I generally use the same password for what I feel are non-critical sites like LinkedIn, twitter and Facebook. Another password for testing new services/apps etc. As a rule any site that may contain my credit card data or sensitive information I use a separate password. I feel this is the best compromise to having complex passwords for each account.

-----


I used this approach in the past as well. But then I started thinking about what "non-critical" means. As an "internet professional", even my Facebook account being compromised would have a negative impact on my image; on LinkedIn doubly so, due to its professional character. So I decided not to distinguish at all (slippery slope) and just use randomly generated passwords for all sites (not for my Mac though; too much hassle, and the attack vectors are different).

Safe >> Sorry

EDIT: Just checked, and my randomly generated password is in the leaked list of hashed passwords. I'm not using that same password anywhere else, so the source MUST be LinkedIN through whatever means (or it's some Mac/PC based attack vector, and these folks only leaked LinkedIN accounts which sounds very implausible).

-----


Cracking the passwords from the hashes is not just fast, it's ridiculously fast. I can't believe a site like LinkedIn stores their passwords this way in 2012.

  guesses: 11516  time: 0:00:21:36 0.00% (3)  c/s: 27126G  trying: aptewwod - aptewws1
That's plain old john the ripper running on the cheapest 13" 2010 mbp. John is not even using the GPU, and non-trivial 8-character passwords are scrolling by in my terminal, too fast to read.

-----


What puzzles me, though, is how come only 6.5 million? LinkedIn has what, 150M users?

Did they not post the entire load (and are in fact sitting on _all_ the hashes?) Is the dump an old backup or breach from when they had fewer accounts? Is it just one DB partition / file that's been lost, an archive?

-----


Given that these hashes are not salted, running a 'uniq' on the list of all users' password hashes would probably already cut it by half, if not more. Then you eliminate all the easy ones from wordlists, and post the remains on the internet for people with excess computing power to bruteforce.
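
To make the point about unsalted hashes concrete, here's a minimal Python sketch (the user list and the 16-byte random salt are made-up assumptions, not LinkedIn's data or scheme): with plain SHA-1, every user who picked the same password gets the same hash, so deduplicating shrinks the list; with a per-user random salt, the entries no longer collide.

```python
import hashlib
import os

# Hypothetical users; three of them picked the same weak password.
passwords = ["password", "password", "linkedin", "password", "s3cr3t!"]

def unsalted(pw: str) -> str:
    # Plain SHA-1 of the password, as in the leaked file.
    return hashlib.sha1(pw.encode("utf-8")).hexdigest()

def salted(pw: str) -> str:
    # SHA-1 over a fresh random per-user salt plus the password, with the salt
    # stored alongside. Still far too fast for real use; it only shows why
    # duplicates disappear once every user has a different salt.
    salt = os.urandom(16)
    return salt.hex() + ":" + hashlib.sha1(salt + pw.encode("utf-8")).hexdigest()

unsalted_hashes = [unsalted(p) for p in passwords]
salted_hashes = [salted(p) for p in passwords]

print(len(set(unsalted_hashes)))  # 3 distinct hashes for 5 users
print(len(set(salted_hashes)))    # 5: every entry differs
```

Deduplicating the first list is the 'uniq' step; it's also why cracking one unsalted hash reveals the password of every user who shares it.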

-----


They are already unique and sorted.

  sort -u combo_not.txt | wc -l
  6458020

  wc -l combo_not.txt
  6458020

-----


I assume in the first line you meant to pipe it through uniq after the sort? Otherwise the only thing you've demonstrated is that sorting a file doesn't change its line count. :)

-----


"sort -u" means "sort and uniq".

-----


Wow, I can't believe I was never aware of this. Thank you!
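
For anyone without a Unix shell handy, the same check can be sketched in Python. This assumes one hash per line; the function name is made up for the example.

```python
from typing import Iterable, Tuple

def uniqueness_report(lines: Iterable[str]) -> Tuple[int, int, bool]:
    """Return (total, distinct, already_sorted) for one-hash-per-line input.

    total vs. distinct mirrors `wc -l` vs. `sort -u | wc -l`. Sortedness uses
    Python's code-point order, which for lowercase hex matches `LC_ALL=C sort`.
    """
    hashes = [line.strip() for line in lines if line.strip()]
    return len(hashes), len(set(hashes)), hashes == sorted(hashes)
```

Usage: `with open("combo_not.txt", encoding="ascii") as f: print(uniqueness_report(f))`. If the first two numbers are equal, the file has no duplicates.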

-----


My password shouldn't be easy, and wasn't in the list.

-----


Ways I've seen this play out before:

• Someone got in to one user database, but not all of them.

• Someone got into the complete user database, but were found out during the intrusion and cut off.

• Someone found a sharded DB dump or backup.

• Someone found/stole/virus'd a dev laptop with DB dumps.

• Someone sat on the network for a while and grabbed app server -> DB traffic.

Replace "Someone" with "russians," "brazilians," or "something behind tor" for more accurate portrayals.

-----


Here are a couple links to the file:

http://www.mediafire.com/?n307hutksjstow3 (RAR)

https://disk.yandex.net/disk/public/?hash=pCAcIfV7wxXCL/YPhO... (ZIP)

-----


There should be a huge banner on LinkedIn urging users to change their passwords, in my opinion.

-----


It's even more important to change your password on other sites if you use the same email/password combo there.

-----


Um, pardon the obvious question, but does someone have a direct link to the hash file?

-----


From a slashdot comment:

https://disk.yandex.net/disk/public/?hash=pCAcIfV7wxXCL/YPhO...

-----


I can confirm that my long-lived randomly generated single-use 12-character password hash is in the file, but not 00000-prefixed (apparently not broken).

A more recent 20 character single-use randomly generated password was not, but the file doesn't comprise the full 6.5 million hashes noted in stories.

I've since changed both for rather longer randomly generated single-use passwords.

-----


Thanks.

For anyone trying: it's not a direct link, but a download page (JS required) which lets you d/l "combo_not.zip". It has 6458020 lines of hashes, many of them "00000"-prefixed, apparently sorted.

-----


Maybe they read this SO thread?

http://stackoverflow.com/questions/2019279/what-is-the-recom...

-----


If we can't trust SO for security questions, where do we go to get them answered?

-----


My old password was in the password file, and it was flagged as cracked.

If you're a Windows user and want to check whether your password is in the file, here's how:

  (1) download the passwords file from http://www.mediafire.com/?n307hutksjstow3
  (2) the download is a RAR file, so you'll need to have WinRAR installed to extract it.
  (3) to get the sha1 version of your password, go to duckduckgo.com and type:
    sha1 yourpassword
  (4) copy the result, except for the first 6 or so characters
  (5) open a DOS command prompt (WindowsKey+R and type CMD)
  (6) type (quotes required where indicated): find "sha1hash" sha1.txt
    (note: to paste into the command prompt, right-click)
Example:

  The sha1 hash of the password 'password' is: 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
  Remove first six characters: e4c9b93f3f0682250b6cf8331b7ee68fd8
  enter at command prompt: find "e4c9b93f3f0682250b6cf8331b7ee68fd8" sha1.txt
    result:
    ---------- SHA1.TXT
    000001e4c9b93f3f0682250b6cf8331b7ee68fd8
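
The same lookup can be sketched in Python. Instead of a substring search, it checks both the original hash and the "00000"-masked version that, per the analysis at the top of the thread, appears to mark already-cracked entries. The file name SHA1.txt comes from the steps above; the helper names are made up.

```python
import hashlib
from typing import Set

def sha1_hex(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def masked(hash_hex: str) -> str:
    # Overwrite the first five hex digits with zeros, the apparent
    # "already cracked" marker in the leaked file.
    return "00000" + hash_hex[5:]

def lookup(password: str, leaked: Set[str]) -> str:
    """Classify a password as 'cracked', 'present' or 'absent' in the leak."""
    h = sha1_hex(password)
    if masked(h) in leaked:
        return "cracked"
    if h in leaked:
        return "present"
    return "absent"

# Usage (assumed path): load the file once, then query it.
# with open("SHA1.txt", encoding="ascii") as f:
#     leaked = {line.strip().lower() for line in f}
# print(lookup("password", leaked))
```

A hash that naturally starts with 00000 (one commenter's does) will always come back as "cracked", because the marking can't tell the two cases apart.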

-----


The SHA1 of my LinkedIn password naturally starts with "00000". I wonder if that would have thwarted the original cracking attempt.

I'm almost disappointed that mine was not in the list.

-----


Obviously the list was filtered to eliminate duplicates; it contains only what the hackers wanted it to contain. So why does nobody mention that it is HIGHLY LIKELY the hackers also have the user names associated with the passwords (which for LinkedIn are mostly e-mail addresses)?

If I were the hacker: strip the usernames, strip duplicate hashes, post the list of unique hashes so others do the CPU-intensive cracking, retrieve the cracked passwords, match them with usernames (e-mail addresses), and try the same password on other accounts (first on the e-mail account itself; then google the e-mail address to find forums, or try the services that interest me and say "forgot password, send it again to this e-mail address - thank you for telling me that this e-mail does indeed have an account with you..."). Then somehow monetize the data.

As a user, that implies: IMMEDIATELY change the password of the e-mail account you use to log in to LinkedIn (if it was the same password); check whether that e-mail account's settings have changed (like an unknown address added for password recovery, DUH); try to remember where you use the same address, either as a login or to recover credentials, and where you used the same password (googling your e-mail address can help you remember); change those passwords; and consider abandoning the e-mail address if it is not your primary one.

Also: has the amount of SPAM arriving at the address you use for LinkedIn suddenly increased, while SPAM stayed constant on a similar account not connected to LinkedIn? Maybe someone just sold your e-mail address, so the LinkedIn breach may affect you even if your password is not in the list.

Bottom line: LinkedIn's approach appears to be "We have no proof that this particular account was hacked, since its password hash is not in the list; let's not overreact and let's assume it is not hacked", even if they don't have a clue what was actually taken.
It's not for me to judge whether that is the best approach for the business, but I sure as hell don't like it as a user.

-----


I can confirm that my password was in there. I have changed it. My password was "98mnja6z" which hashes to 6475590bc1407aa98c8b022230292cce3d8528b3. I used this for no other sites, so I'm not concerned about it leaking.

It is inexcusable that LinkedIn hasn't alerted their users yet.

-----


Mine is also in there. It was also a randomly generated password used only on LinkedIn.

-----


I'm starting to think it might be wise, if you intend to reuse your password on multiple sites, to salt it yourself. By using a form like "<site name><user name><reused password>", you protect yourself from rainbow tables without making your password much harder to remember.

And yes, yes, I know you shouldn't be reusing your password across different sites, or using a dictionary word anyway. And teenagers also shouldn't be drinking, doing drugs and having sex. It doesn't help anything to pretend that people are going to behave optimally.

Of course, the preposterous restrictions that websites put on passwords, like maximum password length, will make this idea harder to put into practice.

-----


I've been doing this myself and it has worked out pretty well so far. My password is in the list of passwords released, but is uncracked and I can rest assured knowing that I did not use the same password on any other website.

A couple things to keep in mind:

1) The salt you generate should be put at the front in case the website is silently truncating the password to a certain length

2) The salt can be something more complicated than site name. I mentally calculate a fixed length salt based on the site name

3) You may want to still keep two separate "base" passwords, one for high value sites (banks, email) and one for low value sites (everything else).
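
Point 1 is easy to see with a toy example. The sketch assumes a hypothetical site that silently keeps only the first 8 characters of a password; the limit, the base password and the 4-letter site prefix are all made up. A site-specific prefix survives the truncation. A suffix gets cut off, so every site ends up with the same effective password.

```python
def effective(password: str, limit: int = 8) -> str:
    # What a (hypothetical) site that silently truncates actually stores/checks.
    return password[:limit]

base = "correcthorse"  # hypothetical reused base password (12 chars)
sites = ["linkedin", "twitter"]

suffixed = {s: effective(base + s) for s in sites}       # salt at the end
prefixed = {s: effective(s[:4] + base) for s in sites}   # salt at the front

print(suffixed)  # both sites: 'correcth'
print(prefixed)  # 'linkcorr' vs 'twitcorr'
```

A prefix taken straight from the site name is easy to guess once someone sees two of your passwords, which is why point 2 suggests something less obvious.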

-----


Given they haven't confirmed they've found and closed the leak, is it wise for everyone to be changing their passwords already?

-----


Given many people have confirmed that their uniquely generated password is in the list, is it wise to wait any longer before changing your password ?

-----


I guess you need to do both: change your password, but to something throwaway. Then when the hole is closed, change it again.

-----


I'm not sure what you're implying. How have they 'closed the leak'?

If you find your hash in the list, you should change your password. If you don't, you should change your password.

I use LastPass to manage my passwords so I just generated another random 20+ char password and forgot about it.

-----


The point is that LinkedIn haven't even confirmed they know how the passwords were stolen (they haven't even confirmed they were stolen, yet).

In that case, when you change your password and feel all secure again, what's to say the hackers haven't just lifted your new hash as well?

-----


That's a fair point.

-----


You can generate your own SHA1 for your password and check if it's existing in the txt file.

http://jssha.sourceforge.net/

-----


Or, if you don't trust random web sites, we can turn this into the next fizzbuzz:

    $ python -c 'import hashlib; print(hashlib.sha1(b"hunter2").hexdigest())'
This will print the SHA1 hash of "hunter2".

-----


This makes me wonder. I've been relying on Django's built in user authentication lately. Does anyone know if that's pretty safe? Is it doing the right thing for hashing passwords?

-----


Disclaimer: I am not a cryptographic expert.

https://docs.djangoproject.com/en/dev/topics/auth/

Django by default uses the PBKDF2 algorithm, which is better than nothing/md5/no salt sha1.

I'd use bcrypt or scrypt by default; better safe than sorry.

-----


Pbkdf2 is extremely good. Without deeper analysis (or a feature comparison) I'd be hesitant to say that bcrypt or scrypt are better.

-----


I sincerely mean no offense but this statement came directly out of your butt. Read the table on page 14 of Colin Percival's Usenix paper "Stronger Key Derivation Via Sequential Memory-Hard Functions" (which you could have found by Googling [scrypt paper]); PBKDF2 is ~5x faster (ie: costs ~5x less to break) than bcrypt; PBKDF2 and scrypt aren't even in the same ballpark.

From exactly where did you derive the idea that PBKDF2 is "extremely good"?

The reality is that all three of PBKDF2, bcrypt, and scrypt are just fine. But PBKDF2 and scrypt have drastically poorer library support than bcrypt; nobody should delay using a strong password hash so that they can optimize which one they use.

-----


All three are extremely good for this use case, when the competition is SHA-1. Beyond that, I don't know enough to compare the three. So yeah, it came out of my butt.

If Colin has a paper on it then I trust his comparison. What I really meant to say is what you said: all three are just fine.

Also, I thought I remembered my comment's parent saying something stronger, either it was edited later, or I was drunk when I decided it was worth commenting on.

-----


I just wanted to write the word "butt". Thanks for being cool about it. :)

-----


hahahaha, well, what you wrote was worthwhile and (as far as I can tell) correct. Thanks.

-----


Eh? PBKDF2 has configurable complexity and has found many more applications than bcrypt, from WPA2 to disk encryption. The crypto research behind PBKDF2 is much more rigorous.

-----


Please cite one academic cryptography paper that presents an analysis of PBKDF2, other than Colin's paper which damns it.

There is virtually no "rigorous" research into KDFs of any sort, let alone password KDFs. Most academic crypto research simply presumes passwords are taken from cryptographically secure random number generators and stored securely.

And with that said I want to remind you that I just cited a source, accepted at Usenix, that measured PBKDF2, bcrypt, and scrypt and found PBKDF2 inferior to bcrypt. You seem to want to pretend otherwise.

-----


PBKDF2 is way better than salted hashes. It's right there with bcrypt and scrypt on the "good choices" list.

-----


Django has chosen a fine default and for the next several years it's probably unnecessary to second-guess it. Over time, GPU and (more importantly) FPGA-assisted hash cracking may or may not become more common, at which point you'd want to transition to something like scrypt.

You could literally flip a coin to decide between bcrypt and PBKDF2 and it wouldn't matter which side came up.
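
For reference, Python's standard library exposes scrypt as `hashlib.scrypt` (3.6+, when built against OpenSSL 1.1 or later). Here's a minimal sketch. The cost parameters n=2**14, r=8, p=1 are a commonly cited setting for interactive logins, not a recommendation made in this thread, and the helper names are made up.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative cost parameters; scrypt needs roughly 128 * r * n bytes of
# memory (16 MiB here), which is what makes GPU/FPGA attacks expensive.
N, R, P = 2**14, 8, 1

def scrypt_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); a fresh 16-byte salt is drawn if none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P,
                         maxmem=64 * 1024 * 1024, dklen=32)
    return salt, key

def scrypt_verify(password: str, salt: bytes, key: bytes) -> bool:
    # Recompute with the stored salt and compare in constant time.
    _, candidate = scrypt_hash(password, salt)
    return hmac.compare_digest(candidate, key)
```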

-----


> which is better than nothing/md5/no salt sha1.

It's also better than salted sha1 since it performs multiple iteration rounds leading to (configurable) higher computational complexity.
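
Here's a minimal PBKDF2 sketch showing the configurable iteration count, using `hashlib.pbkdf2_hmac` from Python's standard library (3.4+). The `algo$iterations$salt$hash` layout is loosely modeled on, but not identical to, Django's stored format. The 16-byte salt and 200,000-iteration default are assumptions for the example, not Django's settings.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; raise it as hardware gets faster

def make_password(password: str, iterations: int = ITERATIONS) -> str:
    salt = os.urandom(16)  # unique random salt per user
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Store algorithm, iteration count and salt next to the derived key so the
    # work factor can be increased later without breaking old entries.
    return f"pbkdf2_sha256${iterations}${salt.hex()}${dk.hex()}"

def check_password(password: str, stored: str) -> bool:
    algo, iterations, salt_hex, dk_hex = stored.split("$")
    if algo != "pbkdf2_sha256":
        raise ValueError("unsupported hash format")
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                             bytes.fromhex(salt_hex), int(iterations))
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(dk.hex(), dk_hex)
```

Each guess now costs an attacker `iterations` HMAC-SHA256 computations instead of one SHA-1.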

-----


I'm not an authority on this, but django_bcrypt is generally considered a best-practice in the Django community. Scrypt may replace that in the future, once implementations are widely available and battle-tested.

-----


Yes, and to make Django even more safe, a number of improvements have been made over the last year (e.g. https://code.djangoproject.com/ticket/15367).

-----


I literally created my LinkedIn account only 2 days ago. And my password was in the list and cracked. Thankfully I don't use it for anything else.

-----


Is it unique enough that you can be sure it's your password, and not someone else's? I ask because the cracked passwords seem to be the simple/obvious ones that are likely to be used by multiple people. If it is strong/unique though, it would effectively confine the hack time to the last 2 days.

-----


Could be a dupe password? Is your password quite strong?

-----


I wrote a short article about this kind of stuff - if it's really my data, then let me use my lock on it : http://ragmondocom.appspot.com/2012/03/My-Stuff-My-Lock

Then I get to choose what strength lock I put on it.

-----


First rule of software design: users are lazy. Second rule of software design: users are stupid.

"Use your own lock" is fine for us Übergeeks, but for the vast majority of the populace, they just want the provider to put a system in place so they don't have to worry about it.

-----


I know a lot of companies just keep your account, including your password, in their database even after you've removed your account.

Can I be sure my account was totally removed when I removed my LinkedIn account? Because the "please change your password as soon as possible" won't help me much.

-----


As long as:

1. They do not have a mechanism to resurrect your account and 2. You do not use the same password elsewhere

then it should not matter.

-----


I didn't see it in the post, but does anyone know if these were current passwords (as of this post)? I use a unique password for linked-in, but some number of months ago I used a password I shared with another site. Wondering if I need to change that one too. Guess I might as well.

-----


Can we please start using BrowserID or some other standard so we can secure that one provider and do away with all this? I'd like it if we could authenticate with Google using 2-factor authentication and be less worried about my password getting hacked.

-----


By centralizing authentication, you make that central provider an even bigger target and you risk losing access to other services if you lose your main account (Google is known to sometimes terminate accounts with no way of recourse).

Finally, when that central provider gets hacked, all your dependent services are now also compromised.

And as we know from the CloudFlare story over the weekend, not even Google with their 2 factor authentication is devoid of issues.

No. Centralizing your login with one third party is as bad as the current practice of reusing your password for every service you have an account with. The only reasonably safe approach is to use different random credentials for every service and store them somewhere under your (and only your) control (i.e. a password manager or a piece of paper).

-----


Browserid is not a centralized authentication protocol. Although currently all implementations I know of rely on browserid.org, this is not required by its design.

There's also the fully decentralized openid, you know. I'd 100% rather be able to use openid for sites like Linkedin and this one than rely on every site implementing sane password management.

-----


Why can't we just use good old PGP?

There is no reason why we should centralize password management and put the world's authentication into one giant pinata for black hats to take a swing at.

-----


A single point of failure sounds dangerous. People should just avoid using the same password for different websites. (That's what KeePass is for..) Perhaps a clever extension / browser feature could ensure that. (e.g. "Warning: You are probably using the same password for facebook.com")

-----


Wow. Not only is every single reply to StavrosK completely wrong about how BrowserID works, they're actually doubly wrong. Not only is it NOT centralized, it also can be used with:

- 2 factor auth

- asymmetric encryption (aka, a challenge/response ala PGP)

- whatever security mechanism you want, frankly. It's up to the browserid provider.

-----


My rationale is that it's much easier to secure one provider (the attack surface is much smaller), and you can also run one yourself, making you responsible for all your authentication needs.

OpenID was great in that you could choose any provider you wanted, and nobody could attack them all (not that they'd have to). It just seems like a good solution to use someone whose only job is to provide secure authentication.

-----


Guys, this all doesn't parse for me. My password on LinkedIn was 13 characters long, and included symbols (!@#$%^&&*()), numbers, and alphabet characters. A 13-character password like this would imply a search space of (26 + 26 + 10 + 20) ^ 13 = BIG (about 7.6 x 10^24). If a GPU can check 11 billion passwords per second, exhausting that space would take roughly 2.7 x 10^8 GPUs running for a month (about half that on average).

We're either looking at someone with a seriously ridiculous password cracking computer (i.e. ASIC-based -- not even FPGAs), a compromise for SHA-1 (very unlikely), or a keylogger/proxy/trojan/etc... I vote for keylogger.

If your password is in this database, I don't think it's because your password was brute-forced.
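
The back-of-envelope arithmetic can be checked in a few lines of Python. The 82-symbol alphabet and the 11 billion guesses per second come from the comment. The 30-day month, and the comparison with an 8-character password drawn from the 95 printable ASCII characters, are assumptions added here.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600   # assume a 30-day month
RATE = 11e9                           # guesses per second per GPU (from the comment)

def gpu_months(alphabet: int, length: int, rate: float = RATE) -> float:
    """GPU-months needed to try every password of the given length."""
    return alphabet ** length / rate / SECONDS_PER_MONTH

thirteen = gpu_months(26 + 26 + 10 + 20, 13)
eight = gpu_months(95, 8)

print(f"13 chars, 82 symbols: {thirteen:.1e} GPU-months")  # ~2.7e+08
print(f" 8 chars, 95 symbols: {eight * 30:.1f} GPU-days")   # ~7.0
```

On these figures, the full 13-character space needs on the order of 10^8 GPU-months, so brute force is indeed out of reach. An 8-character password from the full printable set falls to a single such GPU in about a week.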

-----


My old password is not on the list. However, it seems like somebody tried to log on to Windows Live with the e-mail address and password I was registered with on LinkedIn. This is one of my oldest passwords, from when I still only had one or two passwords.

I noticed this because Windows Live kept sending another of my e-mail accounts a code needed to log in from an unrecognised computer.

Now it could all be a coincidence, but I wouldn't be surprised if there was a connection, as the e-mail address and the password were identical to the ones used on LinkedIn. If that's the case, there would be a more complete list with my password/hash as well as the associated e-mail address.

-----


It seems we will never get rid of bad programming like this. I hit the 'forgot my password' link on the T-Mobile website yesterday and the pop-up requested my T-Mobile phone number. Ten seconds later I received an SMS with my actual password in it.

-----


Putting people's personal details on the open web, giving anyone access, including malicious hackers... This design used by LinkedIn, as well as Facebook, was a bad idea from the beginning. Don't think they are not aware of the risks. How much spam and other annoyances do people get as a result? These companies are killing privacy just to make a quick buck. Maybe they'll be sued.

Direct link to SHA1 file on mediafire (117MB) to avoid javascript, captchas, popups, etc.

http://205.196.120.123/c2o80hrlhteg/n307hutksjstow3/SHA1.txt...

-----


How on earth were they not salting? There are so many open source auth systems now that get all the basics right. Someone who works at a big company like this and has any insight, please comment. How is this even possible in these days?

-----


There's still an unbelievable amount of ignorance out there about how to properly store hashed passwords. There are countless articles explaining that you need to hash the passwords, and telling you how to use md5("salt" + password), and then the blog comments are full of helpful people saying that you should use SHA256, or "no u also gots to add pepper", or exhorting the author to use a large unique salt from /dev/random (not /dev/urandom, it's not random enough) and then encrypt the salts in the database with 2048-bit RSA. I sometimes google around for these articles when I want some morbid fascination -- it's the intellectual equivalent of those YouTube videos where one car crashes into another, and then a third car crashes into the wreckage, and then another car tries to ramp over it and fails, and then everything explodes, and then the people staggering out of the destroyed cars start shouting bad advice about hash functions.

-----


No sign of my password in there http://www.mediafire.com/?n307hutksjstow3, or my wife's. I checked both the full and the '00000' truncated hash for each. Neither of us had changed it for the last couple of years.

So I guess it is only a subset of all the linkedin passwords?

I have now changed my passwords anyway.

By the way, the press say both the username and password were hacked, has anyone seen the list of usernames? They also say 6.4m passwords were hacked but this file only has 6.14m.

-----


If you want to check if your password is in there:

    read -s a
    zgrep $(echo -n "$a" | sha1sum | cut -d' ' -f1 | \
      sed -e 's/^...../00000/') combo_not.zip
    unset a

-----


According to jgrahamc's investigation, this will probably check if your password is there and is cracked already. To check if the hash is there, although uncracked yet, you should probably remove the sed call from pipeline.

-----


Oh. You're right, I missed that. New command line:

    read -s a; zgrep --color -E $(hash=$(echo -n "$a" | sha1sum | cut -d' ' -f1); echo -n "$hash|$(echo $hash | sed -e 's/^...../00000/')" ) combo_not.zip ; unset a

-----


I cross-referenced the leaked hashes against hashes of the 10,000 most common passwords and found that 93% of the passwords at least 6 characters long appear in the leak.

http://www.johnvey.com/blog/2012/06/93-of-top-passwords-appe...

It's always surprising that people are so lackadaisical about their passwords. I've had people tell me their passwords in casual conversation multiple times, just for the sake of discussion.
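
That kind of cross-reference can be sketched like this. It assumes you already have a plain-text list of common passwords and a set of the leaked hashes. The 6-character cutoff mirrors the description; the function is made up. Each candidate is checked in both its plain and "00000"-masked forms.

```python
import hashlib
from typing import Iterable, Set

def fraction_in_leak(candidates: Iterable[str], leaked: Set[str],
                     min_len: int = 6) -> float:
    """Share of candidate passwords (>= min_len chars) whose SHA-1 is in the leak."""
    eligible = [c for c in candidates if len(c) >= min_len]
    if not eligible:
        return 0.0
    hits = 0
    for pw in eligible:
        h = hashlib.sha1(pw.encode("utf-8")).hexdigest()
        # Count both untouched hashes and ones already marked with 00000.
        if h in leaked or "00000" + h[5:] in leaked:
            hits += 1
    return hits / len(eligible)
```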

-----


A salt may not have been enough to protect the passwords: if it is not complex enough, the presence of common passwords like "password" or "123456" makes a brute-force attack on the salt itself possible in some cases. I have benchmarked this specific point and was able to retrieve a salt in five days, without much optimization. It's a bit long to give all the numbers and code here, so the reference is http://gouigoux.com/blog/?p=46
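
Here's a toy version of that attack, under some loudly stated assumptions: a single site-wide salt (not per-user), the scheme sha1(salt + password), and a salt of four decimal digits. All three are hypothetical. The point is that one very common password lets an attacker search the salt space directly.

```python
import hashlib
from itertools import product
from typing import Iterable, Optional, Set

def sha1_salted(salt: str, password: str) -> str:
    # Hypothetical scheme: one global salt prepended to every password.
    return hashlib.sha1((salt + password).encode("utf-8")).hexdigest()

def recover_salt(leaked: Set[str], common: Iterable[str] = ("123456", "password"),
                 digits: int = 4) -> Optional[str]:
    """Try every short numeric salt against a few very common passwords."""
    common = list(common)
    for combo in product("0123456789", repeat=digits):
        salt = "".join(combo)
        if any(sha1_salted(salt, pw) in leaked for pw in common):
            return salt
    return None
```

Once the global salt is known, the rest of the list is back to unsalted cracking. With a random salt per user, this shortcut doesn't exist, because each salt must be attacked separately.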

-----


There's an English-language article (rather than a translation) at http://www.bbc.co.uk/news/technology-18338956

-----


My password hash was in the file and it was cracked. It was a combination of 8 upper and lower case letters, digits and special characters. This is the case where size does matter and apparently passwords like my old one can be broken on GPU in minutes or hours nowadays.

Quick sample from persons I polled: 2 password hashes were not in the file, 1 was there and cracked, 1 was there and not cracked yet.

As bad as it is, this can be a great case to raise the awareness of good password management.

-----


The good thing is: every time this happens to a high-profile site, storing sensitive data, more people get more acquainted with the concepts of "you really should not use a simple password" and "you really should not use the same password across all sites". I know it works for me: this was the last straw that forced me to abandon a good ol' password I've been using since 1998. From now on I'll just rely on password managers (currently DataVault, but I know people who swear by LastPass).

-----


So I changed my password on my PC yesterday, went to my Android client and, to my surprise, I was still logged in on the phone!

It's been more than 12 hours, and the access token for the mobile client is still connected to my account, despite changing my password.

I would expect all tokens to be revoked on-password-change. Really disappointing.

I'll have to set up an SSL proxy later to dump the traffic from Android and see what is happening. Has anyone compiled SSLDump for Android?
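
A common way to get the behavior the commenter expected is to stamp each session or API token with a per-user credential version, and bump that version on every password change. Here's a minimal in-memory sketch. The class, its fields and its storage are made up for illustration; this is not how LinkedIn's API works.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class AuthStore:
    # user -> current credential version, bumped on each password change
    versions: Dict[str, int] = field(default_factory=dict)
    # token -> (user, credential version at the time the token was issued)
    tokens: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def issue_token(self, user: str) -> str:
        token = secrets.token_urlsafe(32)
        self.tokens[token] = (user, self.versions.get(user, 0))
        return token

    def change_password(self, user: str) -> None:
        # Storing the new password hash is omitted; only revocation is shown.
        self.versions[user] = self.versions.get(user, 0) + 1

    def is_valid(self, token: str) -> bool:
        entry = self.tokens.get(token)
        if entry is None:
            return False
        user, version = entry
        # Tokens issued before the latest password change no longer match.
        return version == self.versions.get(user, 0)
```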

-----


My password isn't in the file, and yes I checked for a 0'd version as well. My password is 9 characters of lower case, upper case, numbers, and a symbol. I'm wondering if this is incomplete, or fake. Either way...if it is a vulnerability I suppose LinkedIn hasn't fixed it yet, or at least I haven't heard mention of this - thus even changing your password won't help much if they can just re-download the database. Thus making a long, complex password is the best course of action.

-----


Torrent of database: http://www.seedpeer.me/details/4368981/linkedin-hashes.html

Magnet link: magnet:?xt=urn:btih:VUPJHINO4KAWLWVLEKKFWKJVF3DVDDDR

Torrent download: http://www.seedpeer.me/download/linkedin_hashes/ad1e93a1aee2...

-----


  00000fac2ec84586f9f5221a05c0e9acc3d2e670
  0000022c7caab3ac515777b611af73afc3d2ee50
  deb46f052152cfed79e3b96f51e52b82c3d2ee8e
  00000dc7cc04ea056cc8162a4cbd65aec3d2f0eb
  00000a2c4f4b579fc778e4910518a48ec3d2f111
  b3344eaec4585720ca23b338e58449e4c3d2f628
  674db9e37ace89b77401fa2bfe456144c3d2f708
  37b5b1edf4f84a85d79d04d75fd8f8a1c3d2fbde
  00000e56fae33ab04c81e727bf24bedbc3d2fc5a
  0000058918701830b2cca174758f7af4c3d30432

-----


My belief is that the hackers might get the username password combos, but they grouped the hashes to only have unique (sort -u ?) passwords hashes and therefore ease the process of dictionary cracking them as they do not have salts. The 00000 prefix might be an indication of this. I bet there is an automate script taking care of a dict attack and the file was released during execution.

-----


I've come to the conclusion that this list is genuine. While some people have said that they couldn't find their passwords in the list, I think the most probable explanation is that this is a partial list.

Password: "needajob"

    Hash:    e41b635974babd5d6e7d6dc68e8b3d2fc39938b2
    Cracked: 0000035974babd5d6e7d6dc68e8b3d2fc39938b2
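The relationship between the two lines above is just the first five hex digits replaced with zeros. A small sketch that reproduces the "Cracked" form from the "Hash" form (`zero_prefix` is an illustrative name):

```python
def zero_prefix(sha1_hex, n=5):
    """Replace the first n hex digits with zeros (the apparent 'cracked' marker)."""
    if len(sha1_hex) != 40:
        raise ValueError("expected a 40-character SHA-1 hex digest")
    return "0" * n + sha1_hex[n:]


original = "e41b635974babd5d6e7d6dc68e8b3d2fc39938b2"
print(zero_prefix(original))  # 0000035974babd5d6e7d6dc68e8b3d2fc39938b2
```

When searching the file for your own password, look up both `original` and `zero_prefix(original)`.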

-----


Looks like they were too busy tuning their Spam All Your Friends (TM) "invitation" feature to bother with proper ways to store passwords.

-----


They have added a blog post with an update http://blog.linkedin.com/2012/06/06/linkedin-member-password...

Unfortunately, they don't say when they started salting the passwords, or whether they've found (and fixed) the hole that permitted the database leak!

-----


I deleted my account over 6 months ago but my password hash (strong unique password) is in the file. Either (a) the file retains passwords of deleted accounts, (b) the file was stolen over 6 months ago and LinkedIn didn't know about it, or (c) the file was stolen over 6 months ago and LinkedIn DID know about it and were hoping it wouldn't show up online.

-----


Why is there no notification on their website?

-----


Have any legitimate sources confirmed that the usernames were also stolen along with these hashes? Or were only the hashes stolen?

Could this just be an elaborate hoax where someone generated 6.5M SHA1 hashes and said that they hacked linkedin? Maybe someone shorted LNKD and then leaked this, hoping for a monetary gain?

-----


Someone in this thread stated that the hash of his uncommon password is on the list, so a hoax seems very unlikely.

-----


How does this benefit someone who is trying to access an account? There are no account names tied to these hashes. So even if you managed to find the clear text of each of these you would still be in a position where you have a list of over 6,000,000 passwords to work through in order to brute force your way in.

-----


Quickly check if your password has been cracked:

http://crackedin.s3-website-us-east-1.amazonaws.com/

It will not send your password over the wire. It won't even send the SHA1 hash over the wire.

-----


If your hashed password is not in this list (truncated or not), do not assume for a second that whoever leaked it has leaked everything they have.

Also note that, according to other users, some of these hashes are at least 3 weeks old; they've had this list for some time.

-----


http://pastebin.com/JmtNxcnB - a sample of 20k+ cracked passwords from the LinkedIn hash dump released on June 6, 2012. They do appear legit, and strong too. It's unfortunate that LinkedIn hashed them using unsalted SHA-1.

-----


Am I the only one who is too paranoid to decompress a file that I know was created by a black hat...

http://www.symantec.com/avcenter/security/Content/2005.12.21...

-----


I have an ignorant question... when a SHA1-hashed password is cracked, can the hackers actually identify what the original password is?

I'm guessing no since SHA1 uses a hashing algorithm and only a brute force approach would potentially work...

-----


I'm not an expert in the field, but from what I know, SHA1 is a one-way function. When a hashed password is cracked, YES, the hackers know that specific password. They brute-forced it by guessing a password, running it through SHA1, and comparing the output to the hash. If the two match, they guessed the right password.

They do not learn any other passwords, and if a salt had been used, they would have to brute-force each password separately. I think no salt was used in this case, so once they crack someone's password, they know every other user who used the same password. So if you and I used the same password and they had already brute-forced yours, they would know that I have the same password.
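The effect of a salt described above can be shown in a few lines. This is an illustrative sketch (the function names are made up); note that salted SHA-1 still hashes fast, so a salt stops attackers from sharing work across users but does not slow down guessing against any single user.

```python
import hashlib
import os


def unsalted_sha1(password):
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def salted_sha1(password, salt):
    # The random per-user salt is stored alongside the digest.
    return hashlib.sha1(salt + password.encode("utf-8")).hexdigest()


# Two users who both chose the same password:
print(unsalted_sha1("hunter2") == unsalted_sha1("hunter2"))  # True: crack one, know both
salt_you, salt_me = os.urandom(16), os.urandom(16)
print(salted_sha1("hunter2", salt_you) == salted_sha1("hunter2", salt_me))  # almost certainly False
```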

-----


"Cracking" in this sense is brute forcing. SHA1 is fast, and people use bad passwords. The combination means that you can run through lots and lots of bad passwords very quickly. I checked my LinkedIn password, which I have stored in 1Password, and it is 20+ characters with special characters and numbers. That won't be "cracked" in any meaningful sense, so I don't even worry about it.

You are correct that there's currently no way to go from a hash to a value that hashes to it in SHA1 (AFAIK, IANYNSA [I am not your NSA]).

-----


Is it scary to anyone else that LinkedIn can't confirm yet if their security was compromised?

-----


Apparently this is the forum they are talking about (found using Google):

http://forum.insidepro.com/viewtopic.php?p=96122&sid=133...

-----


The Hacker News effect seems to have taken the forums off-line. If you need a quick DDOS and don't have a botnet handy, just post the link here!

-----


https://www.dropbox.com/s/dsiuavbbzt8cy7g/forum.insidepro.co...

A saved copy of my local cache...

-----


Might also be the fact it's all over twitter.

-----


I just looked into the file (combo_not.txt). There are only hashes. Who decided that the hashes posted in the forum are related to LinkedIn in any way? Thank you.

-----


Those paranoid tinfoil-hat wearing lunatics that generate absurdly long unique random passwords for every site are wringing their hands with glee because they found the hash of their LinkedIn password in the file. You're welcome.

-----


John the Ripper released a patch for this modified sha1 hash type today. So no need for any manual hacks. You may download the patch here:

http://openwall.info/wiki/john/patches

-----


As much as I hate lawsuits, I'd love to see one or two major Internet companies sued in a class action lawsuit for negligence to serve as an example and a warning to the rest. This kind of behavior from a top tier internet presence is inexcusable!

-----


It seems there are some duplicates, considering the cracked hashes beginning with 00000. For instance, I found:

  passforlinkedin
  00000610754c30b38d0c70b72b7e8210268cd9b7
  b3534610754c30b38d0c70b72b7e8210268cd9b7
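Duplicates of this kind (an original hash plus its 00000-marked copy) can be found by grouping on everything after the first five hex digits. A sketch with an illustrative function name; an accidental match between unrelated SHA-1 hashes on the last 35 hex digits is astronomically unlikely.

```python
from collections import defaultdict


def find_marked_duplicates(hashes):
    """Group hashes that differ only in their first five hex digits."""
    groups = defaultdict(set)
    for h in hashes:
        groups[h[5:]].add(h)
    return [sorted(g) for g in groups.values() if len(g) > 1]


dump = [
    "00000610754c30b38d0c70b72b7e8210268cd9b7",
    "b3534610754c30b38d0c70b72b7e8210268cd9b7",
    "deb46f052152cfed79e3b96f51e52b82c3d2ee8e",
]
print(find_marked_duplicates(dump))
# [['00000610754c30b38d0c70b72b7e8210268cd9b7', 'b3534610754c30b38d0c70b72b7e8210268cd9b7']]
```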

-----


A password I used many months ago (maybe almost a year now?) was in the list, but, interestingly enough, the password I have been using for the past several months was not. This list is possibly pretty old, which means the breach happened quite a while ago.

-----


My LinkedIn password hash (at the time; I changed it once the news broke) was not listed. And it was a relatively weak password (8 characters, just lower case characters and numbers). I doubt this is LinkedIn's password dump.

-----


Even after securing our own passwords, we are all still vulnerable to attacks where the attackers simulate members of our networks to discover private information like our connections, job history, etc.

-----


So LinkedIn... why are you not using bcrypt to hash your passwords?

http://codahale.com/how-to-safely-store-a-password/
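bcrypt is not in Python's standard library (the third-party `bcrypt` package provides it), so this sketch swaps in PBKDF2 from `hashlib`, which applies the same idea the linked article argues for: a per-user salt plus a deliberately slow, tunable hash. The iteration count is an assumption; tune it so one hash takes a noticeable fraction of a second on your hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption: raise this as hardware gets faster


def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest  # store both; the salt is not secret


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("linkedin", salt, digest))                      # False
```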

-----


And here is the (partially) cracked version with 163267 passwords: http://www.mediafire.com/?bq8bd5iojp50zci

-----


I checked a few 6-character (alphanumeric) passwords, and some were not present. So they seem not to be brute-forcing the whole space exhaustively; maybe they are just checking against other known tables.
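For comparison, exhaustive brute force over short alphanumeric passwords is simple to write; the cost is the size of the space, 62^6 (about 5.7 × 10^10) candidates at length 6 alone. A pure-Python loop like this one is far too slow for that in practice (dedicated crackers run on GPUs), so treat it as an illustration of the technique only.

```python
import hashlib
import itertools
import string

ALNUM = string.ascii_letters + string.digits  # 62 symbols


def brute_force(target_hex, alphabet=ALNUM, max_len=6):
    """Try every string over `alphabet` of length 1..max_len, shortest first."""
    for length in range(1, max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            guess = "".join(combo)
            if hashlib.sha1(guess.encode("utf-8")).hexdigest() == target_hex:
                return guess
    return None


print(len(ALNUM) ** 6)  # 56800235584 six-character candidates
# Tiny demo with a 3-symbol alphabet so it finishes instantly:
print(brute_force(hashlib.sha1(b"ab1").hexdigest(), alphabet="ab1", max_len=3))  # ab1
```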

-----


I just made an online tool to check if your password is in the list: http://billsnitzer.com/linkedin/

-----


Here is the original file... http://filevelocity.com/ixhk76jz07m5/SHA1.txt_1.rar

-----


The worst part is that I can't remember what password I used, but I don't want to change it yet, because I want to know whether it's one I've used somewhere else rather than just resetting it.

-----


Anyone looking for the file to run these tests yourself -> http://clck.ru/d/jE6Mg-5X1ARpN

-----


Just adding my two cents' worth. This file looks legit, as I have a long, complex password on LinkedIn and the SHA1 hash for it is present in the dump.

-----


Does anyone know what encryption scheme LinkedIn uses?

-----


You don't mean encryption. Passwords should be hashed.

-----


Hashing scheme, not encryption scheme.

Anyway, they use unsalted SHA-1, a really weak option.

-----


Indeed yeah, I meant hashing.

I only build tiny websites compared to linkedin and even I take the time to use a proper hashing scheme with a salt. Shame on you, LinkedIn.

-----


A question from the cryptographically uninitiated -- how (and how easily) can these hashes be linked to a user's account name/email?

-----


It's nice to see a page translated through Google on the front page. It's an indication that technology is breaking some barriers.

-----


Did not find my password in the list in truncated or original form. Member since April 2010, did not change my password even once.

-----


Searchable DB is available at http://dazzlepod.com/linkedin/

-----


Use http://leakedin.org to find out if your password was hacked.

-----


`pass` is uncracked?

-----


Shocking that they didn't even use the basic technique of using salts before hashing!

-----


Speaking on my own behalf and not my employer's, I'd like to invite developers to check out mojoLive as your career management tool. Our goals and vision are light years ahead of what LinkedIn has slowly become. Also, I dislike recruiters and spam.

http://mojolive.com

-----


If this is legit, shouldn't LinkedIn be notifying users of the security breach?

-----


Can anyone tell me where I can change my password on LinkedIn? I must be dense.

-----


https://www.linkedin.com/uas/change-password

It's buried under "your name" (top right) > settings > password > change (just below your e-mail, left side)

-----


Assuming LinkedIn used SHA1 unsalted passwords and will continue to do so, and many of us do not want to delete our LinkedIn accounts, what should be the minimum number of characters we should use in our new password? 15? 20? 100? (I know, 100 is probably higher than they allow)
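One way to reason about the length question, assuming the password is generated uniformly at random (e.g. by a password manager) from the 95 printable ASCII characters: each character adds log2(95) ≈ 6.57 bits. The 128-bit target below is an assumption, a common "out of reach for brute force" benchmark; human-chosen passwords carry far less entropy than these figures.

```python
import math

PRINTABLE_ASCII = 95  # space through tilde


def entropy_bits(length, alphabet_size=PRINTABLE_ASCII):
    """Entropy of a password drawn uniformly at random from the alphabet."""
    return length * math.log2(alphabet_size)


def min_length(target_bits, alphabet_size=PRINTABLE_ASCII):
    """Shortest random password that reaches target_bits of entropy."""
    return math.ceil(target_bits / math.log2(alphabet_size))


for n in (8, 15, 20, 22):
    print(n, round(entropy_bits(n), 1))

print(min_length(128), min_length(128, 62))  # 20 22
```

So with unsalted SHA-1, a random 20-character printable password (or 22 alphanumeric characters) already reaches the assumed 128-bit target; going to 100 buys nothing further against brute force.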

-----


I just reset to a 22 character password, so I know that number of characters is allowable at least.

-----


A company as large and popular as LinkedIn uses no salt? Surprise.

-----


Terribly sorry to ask this, but where did you guys get the file?

-----


Of course, now their password change script is broken/overloaded.

-----


Hmm, my (short alpha) password isn't in the list.

-----


Could some other folks who have this database repost it? The existing sources (yandex, the original .ru site) seem to be clobbered.

-----


I confirmed mine is in the dump as well

-----


Mine as well, unbroken (not prefixed with the 00000) in the original file.

-----


Sorry where is the list?

-----


Can somebody recommend good reading material (or a book) on how to handle passwords and encryption in practical everyday applications?

-----


http://codahale.com/how-to-safely-store-a-password/

-----


Worth noting that some passwords belong to the dating website eHarmony. The dating site had a password breach last year.

http://arstechnica.com/security/2012/06/8-million-leaked-pas...

http://www.theregister.co.uk/2011/02/11/eharmony_data_breach...

-----


When Twitter recently had accounts and passwords leaked, many were attached to spam accounts or duplicate records. Most had obvious passwords (like 1234).

Are these legitimate active accounts? Can you do anything with the hashed passwords alone?

-----


In fairness to Twitter, it was never actually known if the accounts/passwords came from Twitter.com (proper) or (more likely) leaked from some 3rd-party Twitter-integrating app that had pre-OAuth integration.

-----


I just changed my password. To test, I entered only letters and numbers, all in lowercase. LinkedIn accepted it even though the site says it "should have upper case etc.".

-----


Turkish post: http://www.halitalptekin.com/linkedin-sizintisi.html

-----


Is this confirmed to be real, or a hoax?

-----



