Some observations on this file:

0. This is a file of SHA1 hashes of short strings (i.e. passwords).

1. There are 3,521,180 hashes that begin with 00000. I believe that these represent hashes that the hackers have already broken and they have marked them with 00000 to indicate that fact.

Evidence for this is that the SHA1 hash of 'password' does not appear in the list, but the same hash with the first five characters set to 0 is.

  5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8 is not present
  000001e4c9b93f3f0682250b6cf8331b7ee68fd8 is present
Same story for 'secret':

  e5e9fa1ba31ecd1ae84f75caaa474f3a663f05f4 is not present
  00000a1ba31ecd1ae84f75caaa474f3a663f05f4 is present
And for 'linkedin':

  7728240c80b6bfd450849405e8500d6d207783b6 is not present
  0000040c80b6bfd450849405e8500d6d207783b6 is present
2. There are 2,936,840 hashes that do not start with 00000 that can be attacked with JtR.

3. The implication of #1 is that if you are checking for your password and it is a simple one, you also need to check for the zero-prefixed (truncated) form of the hash.

4. This may well actually be from LinkedIn. Using the partial hashes (above) I find the hashes for passwords linkedin, LinkedIn, L1nked1n, l1nked1n, L1nk3d1n, l1nk3d1n, linkedinsecret, linkedinpassword, ...

5. The file does not contain duplicates. LinkedIn claims a user base of 161m. This file contains 6.4m unique password hashes. That's 25 users per hash. Given the large amount of password reuse and poor password choices it is not improbable that this is the complete password file. Evidence against that thesis: the password of one person I asked is not in the list.

For the security novices amongst us: I had no idea how to do this so I figured out a quick python script to test it:

    >>> from hashlib import sha1
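    >>> def check_pass(plaintext, offset=5):
    ...     # Returns (full SHA1 hex digest, digest with the first `offset` digits zeroed)
    ...     hashed = sha1(plaintext.encode('utf-8')).hexdigest()
    ...     return (hashed, '0' * offset + hashed[offset:])
    ...
    >>> check_pass("linkedin")
    ('7728240c80b6bfd450849405e8500d6d207783b6', '0000040c80b6bfd450849405e8500d6d207783b6')

Edit: I'm pretty sure JtR refers to this: http://en.wikipedia.org/wiki/John_the_Ripper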

Obligatory perl one-liner:

  perl -MDigest::SHA -le '$h = substr( Digest::SHA::sha1_hex($ARGV[0]) , 5 ); open F, "<combo_not.txt"; do { print "found $_" if grep(/$h/, $_) } while (<F>)' password
(for people without shells)

Obligatory shell one-liner:

  grep `echo -n password | shasum | cut -c6-40` hacked.txt

Prefix the whole command with a space to avoid dumping your password into your bash history: " grep `echo -n yourpassword | shasum | cut -c6-40` SHA1.txt"

Only if HISTCONTROL is assigned 'ignoreboth' or 'ignorespace'.

Or prompt for it:

   grep `read -sp "password: "; echo "$REPLY" | tr -d "\n" | shasum | cut -c6-40` hacked.txt

I couldn't really find a good reason to use a .bash_history. I linked mine to /dev/null and never looked back. (heh)

Ctrl+r history search? I tend to maintain a complete history log so that when I've forgotten the one-liner I used to rotate my videos 2 years ago I can easily recall it.

2 years? Just how big is your history file?

I thought 16k entries might be reasonable but that doesn't even last 3 weeks for me. I think there might have been some issue with slow disk seeks so at some point I restricted it to that many.

I guess it would probably be better to back up the history file regularly, to deal with accidental truncations and issues when running multiple shells concurrently, but the overall effort to set up such a system would probably outweigh the benefits.

export HISTSIZE=0

Alternative and more dramatic method of preventing it being written to your bash history:

  kill -9 $$

kill -9 -1 is better than kill -9 $$

How so?

That post was a troll. -1 is a special PID: It indicates that all processes that you can kill should be.

Kill -9 -1 as root is a surefire way to make a system stop doing anything, fast.

Here's node.js:

    $ echo linkedin | xargs node -e "var x = require('crypto').createHash('sha1').update(process.argv[1]).digest('hex'); console.log([x, '00000' + x.substring(5)]);"


Or you could just feed "sha1 <password>" to the duckduckgo.com search box and it will give the result.

Some people have this thing against sending their private passwords in plaintext to third-party websites...

You're sending the hash, not the password.

DDG supports SSL: https://www.duckduckgo.com/

If you want coverage, generate a few hundred thousand SHA1 hashes along with your password.

Actually, running a trickle query of random SHA1 hashes from your box might be a fun exercise, along with a trickle query of random word tuples (bonus points for using Markov chains to generate statistically probable tuples).

If you search for 'sha1 foo', that's being sent across the network to DDG's servers. And sure, if you're using SSL then it's not going across in plain text, but it's decrypted and handled on their servers in plain text; it'll probably even end up in logs and/or tracking databases somewhere. You're giving DDG your password.

A hash is not a password.

At worst you're giving the attacker a hash target to try brute-forcing. He still has to brute-force it, and that takes time. Select your plaintext from a large enough keyspace and it's astronomical time.

I'll need to review their policy more closely, but DDG claims fairly minimal tracking. At best someone might be able to correlate a hash lookup with some IP space. That's a long way from handing over passwords. And as I already indicated, you could pad the queries with decoy hashes to make the search space much larger.

No, no, no. You're 100% completely misunderstanding this.

When you search for 'sha1 foo', that query ("sha1 foo") goes up to the server. They know your password is "foo" and that you're attempting to "sha1" it. They don't have a hash, they take that data and perform the hash, then send that down to you.


OK, gotchya.

I guess I'm just too damned used to using systems that, you know, have useful tools installed locally (or can get them there really damned fast). Including SHA1 and MD5 hash generators.

And I was all worked up to tell you how wrong you were still being.

All because I couldn't fathom the possibility, let alone the reason, that anyone would need a third-party site to compute their hashes for them.

Silly me, my error.

Well presumably you've already changed your LinkedIn password, so what's not to send?

Challenge accepted (although this is pretty crude)

curl -s -d q="sha1 password" http://duckduckgo.com | w3m -T text/html | grep '\w\+\{32\}'

Hi - what does

" xargs node -e "

do?

Thank you

[node -e] evaluates a line of node.js source from a command line argument:

    $ node -e "console.log('Hello, world.')"
     Hello, world.
[xargs] allows you to pipe the output of one command as an argument to another command. By default it will show up at the tail end of the second command's arg list, but if you want to interleave it you can use -I flag:

    $ echo /usr/share/dict/words | xargs head -5


    $ echo petard | xargs -I {} grep {} /usr/share/dict/words
[xargs node -e] therefore allows text from STDIN to be inserted into a script that is evaluated by the node interpreter, where it is accessible via process.argv:

    $ echo is dog this yes | xargs node -e "console.log(process.argv.slice(1).sort().reverse().join(' ').toUpperCase())"

head -5 /usr/share/dict/words

same result as with xargs

grep petard /usr/share/dict/words

same result as with xargs

not sure what you are trying to demonstrate here

useless use of xargs?

No, he is trying to demonstrate how to use 'xargs node -e'.

Are you even reading this discussion properly, or are you just searching for shell snippets to ridicule as soon as you get a chance? This is what it looks like from your history: http://news.ycombinator.com/threads?id=uselessuseof

ionwake doesn't want to learn how to search a word. He wants to know how 'xargs node -e' works. Please read this again: http://news.ycombinator.com/item?id=4075293

The perl one liner was funny, the shell one liner was light hearted, but your node solution is just pure fanboyism and quite frankly not in line with the spirit of the two previous posts.

And.. the node.js solution doesn't do what either the Perl or shell one liners do. It doesn't tell you whether the password was found in the file. All it does is print out a SHA1 hash of a string.

That's a trivial modification:

    $ echo linkedin | xargs node -e "var x = require('crypto').createHash('sha1').update(process.argv[1]).digest('hex'); console.log(x.substring(5));" | xargs -I {} grep {} hashes.txt
I'm surprised at the backlash to what I thought was fun code golfing. No one called me names after I posted a simple Python solution that didn't check the file. For what it's worth I've changed my LI password and I haven't bothered downloading the actual hash file.

If I post a PHP solution maybe zxcvb will get a heart attack.

node has a neat API for quickly knocking out stuff like this; it's a useful tool for more than just server code. Calling that comment fanboyism is just displaying the opposite of fanboyism, prejudice against hyped-up tools that nevertheless are good tools.

My point still stands. There's funny and then there's blatant fanboyism. You're like a prepubescent teenager who doesn't understand the context of social situations so always says something stupid.

"Which brings us to the most important principle on HN: civility. Since long before the web, the anonymity of online conversation has lured people into being much ruder than they'd dare to be in person. So the principle here is not to say anything you wouldn't say face to face. This doesn't mean you can't disagree. But disagree without calling the other person names. If you're right, your argument will be more convincing without them."

Some people actually do call others names when face to face.

Personally, while I don't, I do tend to get a little aggressive and then I'm often surprised with the backlash, because I get that way when I'm genuinely enjoying the conversation, not when I'm irritated.

Tone doesn't carry on the Internet, so no one knows you're enjoying it. Hence, it generally degrades the quality of the conversation, which is the opposite of what we want at HN.

No, I'm saying I do that face-to-face, and people still can't tell I'm enjoying it. So the tip to say nothing that you wouldn't say IRL is useless to me; I just can't help it.

"You're like a prepubescent teenager who doesn't understand the context of social situations..."

The hypocrisy is so unabashed my brain might explode.

Pot, meet kettle.

obligatory comments

- not portable

- useless use of backticks

printf password|openssl sha1|cut -c6-40|grep -f - hacked.txt

Why are you extracting 35 characters with 'cut -c6-40'? SHA1 produces a 160-bit message digest. That's 20 bytes or 40 hex-digits.
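
One possible answer, going by observation #1 at the top of the thread: cracked entries have their first five hex digits overwritten with 00000, so only characters 6-40 (the remaining 35) are the same in both forms of a hash. A minimal Python sketch of that idea; the helper name `suffix35` is made up for illustration:

```python
from hashlib import sha1

def suffix35(plaintext):
    """Return hex digits 6-40 of the SHA1 digest (what `cut -c6-40` keeps)."""
    return sha1(plaintext.encode("utf-8")).hexdigest()[5:]

full = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"    # SHA1 of 'password', as quoted in the thread
zeroed = "000001e4c9b93f3f0682250b6cf8331b7ee68fd8"  # the form reported present in the dump

tail = suffix35("password")
print(len(tail))              # 35
print(full.endswith(tail))    # True
print(zeroed.endswith(tail))  # True
```

Grepping for the 35-character tail therefore matches an entry whether or not it has been zeroed.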


Shorter and, IMHO, a bit simpler Perl one-liner:

    perl -MDigest::SHA=sha1_hex -le '$h = substr( sha1_hex(shift), 5 ); open F, "<combo_not.txt"; print "found $_" for grep /$h/, <F>' password

    perl -MDigest::SHA=sha1_hex -lne 'BEGIN {$pw = shift} $h = substr( sha1_hex($pw), 5 ); print "found $_" if /$h/' password combo_not.txt

The first one ramps up memory use like crazy (which I was trying to avoid) and the second one is much better with memory, but you need to move the sha1_hex into the BEGIN block or you're recomputing the hash for every line parsed, thrashing your CPU. Interesting use of 'shift' though, I didn't know you could modify the file argument to -n like that.
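
For comparison, here is a rough Python sketch of the same two fixes: hash once up front and read the file lazily, so neither memory nor CPU blows up. The function name `find_matches` and the in-memory sample are assumptions for illustration, not anything from the dump itself:

```python
from hashlib import sha1

def find_matches(plaintext, lines, offset=5):
    """Yield every line containing the digest with its first `offset` hex digits dropped."""
    # Hash once, up front (the equivalent of the Perl BEGIN block), not once per line.
    tail = sha1(plaintext.encode("utf-8")).hexdigest()[offset:]
    for line in lines:  # any iterable of strings; an open file is read one line at a time
        if tail in line:
            yield line.rstrip("\n")

if __name__ == "__main__":
    # Tiny in-memory stand-in for the dump; with the real file, pass open("combo_not.txt").
    sample = [
        "000001e4c9b93f3f0682250b6cf8331b7ee68fd8\n",
        "ffffffffffffffffffffffffffffffffffffffff\n",
    ]
    for hit in find_matches("password", sample):
        print("found", hit)
```

Because the generator never holds more than one line, memory use should stay flat regardless of the file size.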

You might compare many words at once (say from a popular password list such as rockyou) like this:

while IFS= read -r line; do printf '%s' "$line" | sha1sum | cut -c6-40 | awk '{print "00000" $0}'; done < rockyou.txt

I haven't tested that, but I think it'll work.
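
If spawning a process per word turns out to be slow, the same comparison could be sketched in Python with the dump held in a set. This is only an illustration: `crack_with_wordlist` is a made-up name, and encoding each word as UTF-8 is a guess about how the original passwords were hashed:

```python
from hashlib import sha1

def crack_with_wordlist(words, dump_hashes, offset=5):
    """Map each matching word to the dump entry it matched (full or zero-prefixed)."""
    dump = set(dump_hashes)  # constant-time membership tests
    found = {}
    for word in words:
        full = sha1(word.encode("utf-8")).hexdigest()
        zeroed = "0" * offset + full[offset:]
        for candidate in (full, zeroed):
            if candidate in dump:
                found[word] = candidate
                break
    return found

if __name__ == "__main__":
    # Zeroed SHA1 of 'password', as reported present earlier in the thread.
    sample_dump = ["000001e4c9b93f3f0682250b6cf8331b7ee68fd8"]
    print(crack_with_wordlist(["letmein", "password"], sample_dump))
    # {'password': '000001e4c9b93f3f0682250b6cf8331b7ee68fd8'}
```

With the real files you would pass stripped lines from rockyou.txt as `words` and stripped lines from the dump as `dump_hashes`.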

By sheer coincidence I had a chance to use Perl again today for a job interview.

I now have a good appreciation of why it's considered a "Write once, read never" language. :)

Amsterdam? ;)

If you're paranoid about shoulder-surfing you can use getpass to hide your password as you type it in.

    >>> import getpass
    >>> password = getpass.getpass('Password: ')

Command line utility I wrote which uses getpass: http://dpaste.com/hold/756011/

A complete python script assuming you have hashes.txt in the same directory.


Just tried your code and it seems that my password has been cracked. Glad I changed it this morning.

Any password that I try works...

I have found one case where both types are present.

grep `echo -n l1nked0ut | shasum | cut -c6-40` combo_not.txt


How many hashes are present in both stripped and unstripped form?

  $ cat combo_not.txt |cut -c7-40 |sort |dups |wc -l
That's ~10% of the total.

another useless use of cat

cut -c7-40 combo_not.txt|sort|dups|wc -l

what the heck is dups?

cut -c7-40 combo_not.txt|sort|uniq -d|wc -l
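
For anyone without sort/uniq handy, a rough Python equivalent of that pipeline might look like the following. The function name `count_shared_tails` is invented here, and the slice `[6:40]` mirrors `cut -c7-40`:

```python
from collections import Counter

def count_shared_tails(lines, start=6, end=40):
    """Count distinct tails (characters start+1..end) that occur on more than one line."""
    tails = Counter(line.strip()[start:end] for line in lines if line.strip())
    # Like `uniq -d`, each duplicated tail is counted once, however often it repeats.
    return sum(1 for n in tails.values() if n > 1)

if __name__ == "__main__":
    sample = [
        "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8\n",  # full form
        "000001e4c9b93f3f0682250b6cf8331b7ee68fd8\n",  # zeroed form of the same hash
        "ffffffffffffffffffffffffffffffffffffffff\n",  # unrelated entry
    ]
    print(count_shared_tails(sample))  # 1
```

On the real file you would pass `open("combo_not.txt")`; the counter holds one entry per distinct tail, so expect it to use a fair amount of memory.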

Yeah I'm aware of http://partmaps.org/era/unix/award.html#cat and choose to continue writing my scripts this way. My commands look more symmetric at the prompt, and are easier to manipulate.

dups is indeed a little helper of mine. Like uniq it only handles sorted input. Update: I see you edited your answer to include uniq -d. I wasn't aware of the option, thanks. Now I can simplify the implementation of dups. But I find the name valuable, and I think it's perverse to say uniq when you mean its opposite.


Each pipe stage reads from the left and writes to the right. The eye goes left to see the input and right to see the output if it's redirected to file.

The input file is reliably the second word, so C-A M-f gets me to it if I want to operate on a different file. !!:1 gets me the file if I want to use it in a new command.

echo abc > file

1. cat file

2. cat < file

3. echo abc|cat

4. echo abc|cat - file

cat can take input from the left, the right, or both

same goes for cut

I'm not sure what you're suggesting. I'm supposed to echo |cut ...? But I have a whole file, not just one line. So I have to cat ... |cut ... -- which is what I did. So what's your point?

I could keep the file first by saying:

  $ < combo_not.txt cut -c7-40 |sort |dups |wc -l
To which I reply, "Yuck!"

Perhaps we should stop here. You seem to have made this account just a few hours ago for the express purpose of poking at people's code fragments in this thread. You're making stylistic nitpicks (they don't affect correctness, do they?) and you're making them in a tone that I'm not sure I would take from Randal Schwartz himself (you actually edited http://news.ycombinator.com/item?id=4076556 to be ruder than the original). It's a drag, man.

cut takes a file as an argument. there's no need to start the line with <

   cut -c4-70 combo_not.txt|...

But that's where this conversation started out. My response the last time around: http://news.ycombinator.com/item?id=4076674

BTW, HN has some formatting support: http://news.ycombinator.com/formatdoc

I disagree with #5. I had a few of my coworkers check their sha1 against the DB and most of them were not in the dump. I also checked for truncated hashes, none of which were found. I have the feeling this is a subset of the full database.

I don't really see a purpose in hiding my password. So, as a counterpoint, my password is in the list. This is my LinkedIn password:


This is the sha1:


It is found in Line 3612910 of combo_not.txt. I believe the file is authentic.

So I have a funny wild theory...remember back when the Gawker database was compromised? And LinkedIn forced a password reset for users who (according to what I read) used email addresses that matched the Gawker leak?

What if they also (or actually) compared password hashes from their database to the ones released in the Gawker breach? In that case, they likely wouldn't have pulled data straight from the database, but might have dumped the passes from the db to text files and cut those files up to parcel them out for processing via Hadoop or something. And somehow one of those text files got loose... or someone MiTMed the actual process (I'd vote for a floating text file just because it's been so long; the Gawker breach was in December 2010).

on another note,

my fairly complex alphanumeric+symbol password IS in the dump, though not in the form with prepended 0's, and the other one I found, which my coworker admitted was too short and alpha only, was in the dump with prepended 0's.

This supports the theory that the zero-prefixed hashes are the ones that have already been cracked.

Mine was 5 characters, alpha and numeric, but no special characters. It was in there, prepended with 0's.


At the very least, it should have been longer.

Same here - mine was all alpha characters, seven characters, and the hash with five 0's was in the file. Guess who just changed their LinkedIn password today? And included some numbers?

Another datum: the hash of my password (randomly generated 8 character mixed case alphanumeric) was in the file, without any overwritten 0's.

My password is in the dump. I use the Forget Passwords Chrome extension [1], which is based on pwdhash.com, and generate site-specific passwords based on a master password -- i.e. my password is only used on LinkedIn and it's unlikely that I share it with someone else.

I think I have changed to this password during the last year.

My linkedin password of at least 3 years was not in the dump. So it must be a partial...

Mine is there.

(email me if you need proof)

Another data point:

I changed my linkedin password about three weeks ago. The old one is in the list (already 00000-ed), the new one isn't.

My (very unique) password hash is in the list, although unbroken so far.

Sorry for the stupid question, but where did you guys find the list of hashes? I didn't see it linked in the article.

Edit: found it in the Slashdot comments, it's: http://www.mediafire.com/?n307hutksjstow3

For the record, my password's hash was not in the list.

I think they're getting removed. I posted a link from the original source, but it's since disappeared.

I don't know if I have the correct file: http://www.mediafire.com/?n307hutksjstow3

  mbf041:Downloads shephard$ wc -l SHA1.txt
  6143150 SHA1.txt

My password hash which was last rotated July 5, 2011

Was _not_ found in the file (with/without 00000). I have, of course, changed it today. Strangely enough, the previous password is also not in the list.

Don't know if this adds anything, but both my old password (created eight years ago) and current password (changed six months ago) were on the list. Both were very unique - 20 characters mixed.

Need to get better at changing my PWs every three months. It's really not that hard, just a matter of discipline.

My old password was in the list, but not my newer password. I changed it about 2 years ago I think.

Hmm. My truncated password (for my now-deleted account) is not in the list of hashes -- so it's not just a uniq'd full DB. Also, the original forum thread where the file was first posted only managed to break around 600,491 passwords before it went offline ... so 3,521,180 broken passwords could mean that the original hacker has had access to some LinkedIn accounts for more than just a few minutes today.

Same here. My password is not in the list and I've had a LinkedIn account since 2003. I probably changed my password about 18 months ago. Neither that nor the previous one are on the list.

My password is not in the list. It's not idiotic, but not super-hard either. I doubt this is the full list. I hadn't changed mine in years, so maybe this is from a certain period of time?

I've had the same password on linkedin for as long as I remember and neither the full hash nor the zero prefix edited was found in the dump.

Simple line used in OS X terminal:

grep -e "`echo -n "your pass" | openssl sha1`" combo_not.txt

May want to grab the last characters as the cracked passes have 00000 at the beginning:

i=$(echo -n 'mypass' | openssl sha1); grep "${i:14}" combo_not.txt

This yielded success on some known passwords and a bunch of obvious passwords. Not mine, but I assume this dump is a list of the passwords they've cracked so far (i.e., even if your password isn't on this list - change it).

If your password was 'linkedinsucks' then it sucks, because they found it already!


  527688fa9f32bb8dab32d30807ca5c57a0b203b8 is not present
  000008fa9f32bb8dab32d30807ca5c57a0b203b8 is present

Here's some they didn't find, from /usr/dict/words: Paraná, Zürich, attaché. Not sure of the encoding, but I'd guess UTF-8.

My not-so-strong password, spacex12, is not in the list, and I've checked whether it was already cracked (prefixed with 00000): nope.

Also if it was "linkedin"

7728240c80b6bfd450849405e8500d6d207783b6 not present

0000040c80b6bfd450849405e8500d6d207783b6 present

or "facebook"

cbe648909034c0624c205fe219d3fbd10052c715 not present

000008909034c0624c205fe219d3fbd10052c715 present

or google

759730a97e4373f3a0ee12805db065e3a4a649a5 not present

000000a97e4373f3a0ee12805db065e3a4a649a5 present

I have found hashes of linkedout, recruiter, recru1ter, googlerecruiter, toprecruiter, superrecruiter, humanresources and hiring.

If it is a hoax, it is a very elaborate hoax.

Perhaps it's a DDoS on MediaFire! /joke

Good post, upvoted. One clarification:

> That's 25 users per hash

Password choices are probably Zipf-distributed, so averages don't make a ton of sense.

It does if you're trying to estimate the size of the corpus based on the number of users.

The arithmetic mean is specifically the value you'd want. n users divided by m users/password == total passwords (deduplicated) in the LinkedIn database.

A Zipf distribution would suggest that the pattern of reuse among passwords isn't normal, and that the median and mode are probably lower than the arithmetic mean.

My password also doesn't appear to be in the list, so I doubt it is the complete/current file. I used this python to check, in case anyone else wants to use it:

    from hashlib import sha1
    f = "combo_not.txt"
    # A set gives constant-time lookups; [0:40] strips off the trailing \n
    with open(f) as fh:
        hashes = set(x[0:40] for x in fh)

    # From another comment
    def check_pass(plaintext, offset=5):
        hashed = sha1(plaintext.encode("utf-8")).hexdigest()
        return (hashed, '0' * offset + hashed[offset:])

    print(check_pass("linkedin")[0] in hashes) # -> False
    print(check_pass("linkedin")[1] in hashes) # -> True (sanity check)

    myHash, myHashBroken = check_pass("plaintextoflinkedinpassword")
    print(myHash in hashes) # -> False
    print(myHashBroken in hashes) # -> False

Mine was not in the list. It's also possible this isn't the entire file. I was also able to recover 225129 other passwords with a wordfile and some Python based on truncated and full hashes.

> Evidence against that thesis is that password of one person that I've asked is not in the list.

Mine isn't in it.

Neither is mine.

A stock JtR 1.7.9-jumbo5, using the default rules, is finding quite a few of the non-zeroed ones pretty quickly. This surprises me; I would have expected them to have run the list through the JtR mill before passing it on to others.

One can conclude from this that the list of cracked hashes is almost certainly not complete.

Got a link to the file? I haven't been able to dig one up

The hash of my password, set when I joined on October 10 2011, appears not to be in the list. Changed it anyway.

Likewise, my password (MybXy836YCza), which wasn't used anywhere except my LinkedIn account created 29-Jan-2012, and has been stored securely at my end, wasn't on the list (either as a full SHA1 sum, or as part of the SHA1).

As you probably guessed from the fact that I posted my old password, I changed it just in case the list that was shared is only a partial list of what was obtained.

Nice observation, dude. Can you please share the password file? I don't have it anywhere. Thanks

So where is the list? I'd like to see whether I'm on it.

fwiw, this could also be an elaborate hoax, given these facts.

E.g. a list of simple password + combinations of the above simple password+"linkedin" variations.

I have a very unique strong password on LinkedIn, and it is on the list. Given that, this is no hoax.

Same here. Sucks too, because I liked that password.

My complex unique password is also on this list (full hash no 5 0's). So nope, not a hoax. Unbelievable/insulting they didn't even bother to salt.

Yeah, even I, a newbie Rails programmer, going through the Agile Rails book learned how to salt. It isn't rocket science.

It shouldn't just be a salt. It should be bcrypt.
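
To make the salt-versus-slow-hash point concrete, here is a minimal Python sketch. Note the swap: bcrypt is not in the Python standard library, so this uses PBKDF2-HMAC-SHA256 from hashlib as a stand-in for a deliberately slow, salted scheme. The function names and the iteration count are assumptions for illustration, not anything LinkedIn did or should have used verbatim:

```python
import hashlib
import hmac
import os

ITERATIONS = 200000  # illustrative work factor only; tune for your own hardware

def hash_password(password, salt=None):
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256 with a per-user random salt."""
    if salt is None:
        salt = os.urandom(16)  # a fresh salt per password defeats precomputed tables
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password, salt, expected_key):
    """Recompute with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)

if __name__ == "__main__":
    salt, key = hash_password("linkedin")
    print(verify_password("linkedin", salt, key))  # True
    print(verify_password("LinkedIn", salt, key))  # False
```

Two users with the same password end up with different stored values because their salts differ, and the iteration count makes each guess far more expensive than a single unsalted SHA1.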

Do you remember when you first used this password at LinkedIn? It could help narrow the dates of the breach. Especially useful would be the presence of a strong password in the list that was subsequently changed. That might help determine its freshness, if the new password isn't present (although this may be an incomplete list from an ongoing breach).

I'm thinking this list is from closer to a year ago, I changed my password shortly after the MtGox hack last year and this hash is for my old password that was compromised during that time period.

My password is in the dump, and it was changed mid October 2010. I remember because I changed all my passwords when my laptop was stolen.

The MtGox hack was in June 2011.

It was about a year ago now. I checked the hashes for my previous password and it wasn't on the list... Mind you, as many have noticed, it seems to be very incomplete.

Unbelievable/insulting they used a general purpose, easily reversible hash like SHA1 in the first place. I would have thought everyone had seen the 'use bcrypt' page by now.


Since when is SHA1 easily reversible? Did I not get the memo?

Salting should have been fine.

I couldn't find my password on the list and I've been using the same password for LinkedIn since I registered. I was trying to remember when that was. If someone knows how to find out the last time you changed your password or when you registered for LinkedIn, please let me know. I'd guess I've used LinkedIn for over 4 years at least.

A "member since" date is available on the "Account & Settings" page. Choose "settings" in the drop down that appears when you hover over your (account) name in the upper right corner of any LinkedIn page.

I agree, I've tried several passwords and they match. If you're a Math person, please shed some light on the chances that this list covers the full space.

I'm not a math person either, but here's some fodder for someone who is.

Mark Burnett's extensive password collection (which he acknowledges is skewed, because it's largely based on cracked passwords, he only harvests passwords between 3 and 30 chars, etc.). Here's how some of his stats shake out:

* Although my list contains about 6 million username/password combos, the list only contains about 1,300,000 unique passwords.

* Of those, approximately 300,000 of those passwords are used by more than one person; about 1,000,000 only appear once (and a good portion of those are obviously generated by a computer).

* The list of the top 20 passwords rarely changes and 1 out of every 50 people uses one of these passwords.

So it's conceivable that 6M unique passwords could cover a very significant portion of a 120M user namespace.

Ref: http://xato.net/passwords/how-i-collect-passwords

It's neat that the hashes are unique enough to serve as their own key. Obvious in retrospect, but still neat.

Curious why some of the hashes have been obscured with 00000 but not all. It means more than one possible password could generate the remaining characters, but what does that help or protect?

6.5 million? Off the top of my head, assuming that passwords are only letters and 5 characters long this still wouldn't cover the possible space. [I think it's safe to ignore hash collisions]
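
The back-of-the-envelope arithmetic checks out. A quick sketch, assuming "only letters" means the 26 lowercase letters and exactly five characters:

```python
letters = 26
length = 5
keyspace = letters ** length  # number of five-letter lowercase strings
dump_size = 6500000           # the "6.5 million" figure quoted above

print(keyspace)               # 11881376
print(dump_size < keyspace)   # True
```

So even that tiny keyspace is nearly twice the size of the file, before counting digits, capitals, or longer passwords.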

Are you trying passwords you've used on other sites, or random ones? If it's the former, then LI might not be the only source for the file.

0. There are known cases of people's passwords (including my own) not on the list.