
I wrote BozoCrack to show why plain MD5 is a horrible way to hash passwords. - aparadja
https://github.com/juuso/BozoCrack
======
bad_user
I don't think it's demonstrating anything other than:

    1) many developers don't use salts / HMAC
    2) MD5 is popular
    3) hashed passwords end up on Google

From these 3 points I don't think it follows that MD5 is horrible.

Any hashing function would have the same issues, simply because a hashing
function is a mathematical function, so for any X from the domain of
definition, H(X) will always have the same value, on every call. Therefore, if
a hashing method is popular enough and developers don't salt their hashes,
then hashes for common passwords will inevitably end up on Google. However, if
you salt your hash calls with your own key, the hashes produced will be
different from everybody else's.
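
A minimal sketch of that argument, using only Python's standard library. The password and the site key are made-up values for illustration; HMAC is just one way of mixing in "your own key".

```python
import hashlib
import hmac

password = b"letmein"  # hypothetical example password

# A hash is a pure function: the same input always gives the same digest,
# so every site storing unsalted MD5 stores this exact string.
site_a = hashlib.md5(password).hexdigest()
site_b = hashlib.md5(password).hexdigest()

# Keying the hash with a site-specific secret (HMAC) yields a digest that
# will not match anyone else's published hashes. The key is an assumption.
site_key = b"example-site-secret"
keyed = hmac.new(site_key, password, hashlib.md5).hexdigest()

print(site_a == site_b)  # identical on every site
print(site_a == keyed)   # differs once a key is mixed in
```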

~~~
aparadja
Perhaps I should have written it as "unsalted MD5" instead of "plain MD5" to
avoid confusion. Unsalted MD5, in my opinion, is horrible. MD5 plays its part
in the mess: it's quick to calculate, which means that anybody can churn out
huge lookup databases. Missing salts make those databases universally usable.
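
Roughly what such a lookup database amounts to, as a sketch in Python. The candidate list is a tiny hypothetical stand-in, and the 16-byte random salt is an assumption chosen for illustration.

```python
import hashlib
import os

# Hypothetical candidate list; real tables hold vastly more entries.
candidates = ["123456", "password", "superman", "nicetry"]

# Built once, reusable against every unsalted MD5 dump.
table = {hashlib.md5(p.encode()).hexdigest(): p for p in candidates}

def lookup(digest):
    """Return the plaintext for a digest if it was precomputed, else None."""
    return table.get(digest)

leaked_unsalted = hashlib.md5(b"superman").hexdigest()

# With a random per-user salt, the stored digest was computed over
# different bytes, so a table built without that salt does not contain it.
salt = os.urandom(16)
leaked_salted = hashlib.md5(salt + b"superman").hexdigest()

print(lookup(leaked_unsalted))  # found in the table
print(lookup(leaked_salted))    # not in the table
```

The salt does nothing about MD5's speed; it only stops one precomputed table from working everywhere.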

~~~
TillE
> huge lookup databases

"Huge" being the key word here.

Try searching for the md5sums of arbitrary 8-character alphanumeric passwords.
You won't find many results. 62^8 is a big number.
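
To put a number on it, a quick Python calculation. The storage estimate assumes one raw 16-byte digest per candidate and ignores the plaintexts and any index.

```python
import string

alphabet = string.ascii_letters + string.digits  # 62 symbols
length = 8

keyspace = len(alphabet) ** length

# One raw 16-byte MD5 digest per candidate (a simplifying assumption).
table_bytes = keyspace * 16

print(keyspace)            # 218340105584896
print(table_bytes / 1e15)  # table size in petabytes
```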

~~~
nobody3141592
I believe that with GPUs it's now faster to calculate possible MD5s on-the-fly
than use rainbow tables.

Hence the current advice is to use "alongpassphraseasyourpassword" rather than
"L33$Pa55wd"

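A rough way to compare the two example passwords, sketched in Python. It counts only the search space of a blind brute force; the alphabet sizes (26 lowercase letters, 95 printable ASCII characters) are assumptions, and a passphrase built from dictionary words is weaker than this suggests against word-based guessing.

```python
import math

def brute_force_bits(alphabet_size, length):
    """Bits of search space for an exhaustive search over every string."""
    return length * math.log2(alphabet_size)

# Assumed alphabets: lowercase letters only vs. all printable ASCII.
passphrase_bits = brute_force_bits(26, len("alongpassphraseasyourpassword"))
complex_bits = brute_force_bits(95, len("L33$Pa55wd"))

print(round(passphrase_bits, 1))
print(round(complex_bits, 1))
```
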
~~~
FuzzyDunlop
This is where password management gets ridiculous, because you'll find a lot
of registration forms limit the length of your password input to, say, 12 or
16 characters.

Why? Is it not being hashed? I have a (possibly very wrong) inkling that a
longer phrase might increase the chance of collision but even so, so many
places enforce a strong password but force you to keep it short.

~~~
nobody3141592
The worst offender is NVidia. They have half-a-dozen different developer
logins for different bits of their site - and they all have different rules:
your CUDA one must have a symbol but the parallel Nsight one must not, etc.

------
gaving
Obligatory <http://codahale.com/how-to-safely-store-a-password/> link which
taught me a ton.

~~~
sdkmvx
I see that link referenced a lot and don't think that's a good thing. He's
right, but he doesn't explain why we should use bcrypt (or any other adaptive
password hashing function). Picking bcrypt without knowing why is just as bad
as picking MD5 without knowing why.

~~~
mentat
No, it's really not, as long as you follow current guidance on iteration
counts. There are people smarter than <x> random person and sometimes
(often with crypto) it's better to follow their advice. Anyone can make a
system that they themselves cannot break. Don't be that person.

~~~
sdkmvx
I strongly disagree. If you are the person tasked with implementing password
verification, then you really should learn enough about the field to give a
rough explanation of why you've used bcrypt (or whatever you use). If you know
what the deficiency of MD5 (and other simple hashes) is (and Hale explains it),
then you should know how bcrypt, etc. solve that problem. One should follow
expert advice but not blindly.

------
aparadja
For those who don't want to read the source code, BozoCrack has a simple
algorithm. It googles the MD5 hash and hopes the first result page contains
the plaintext password. It usually does.
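
The matching step can be sketched like this in Python. It is an illustration of the description above, not the actual BozoCrack source (which is Ruby); the function name and the sample page text are hypothetical, and no network request is made.

```python
import hashlib
import re

def find_plaintext(target_md5, page_text):
    """Return the first whitespace-separated word whose MD5 equals the target."""
    for word in re.split(r"\s+", page_text):
        if hashlib.md5(word.encode("utf-8")).hexdigest() == target_md5:
            return word
    return None

# Stand-in for the text of a fetched results page.
target = hashlib.md5(b"nicetry").hexdigest()
page = "md5 reverse lookup: " + target + " = nicetry (found)"

print(find_plaintext(target, page))  # nicetry
```

Because every candidate word is re-hashed and compared, a wrong guess from the page is never reported as a match.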

~~~
themouth
"It usually does" isn't quite accurate here. Common or weak plaintexts might
work, but for the vast majority of input you're SOL. Sure "nicetry" comes
back, but "nicetry99" produces 0 results and for every "nicetry" there are an
infinite number of "nicetry"+i hashes.

~~~
TheEskimo
Ah, you are technically incorrect in saying there are an infinite number of
"nicetry"+i hashes. There are an infinite number of "nicetry"+i passwords, but
eventually there will be collisions as the hash set stays a constant size and
the password set grows without bound. "Infinite" isn't a term to throw around
too lightly.

~~~
tedunangst
There are 2^128 possible MD5 hashes. When it becomes impossible to increment a
counter to a number, that's as good as infinite.

~~~
stan_rogers
The point is that it's unnecessary to find _the_ plaintext; all you need is
_some_ plaintext that produces the same hash value. It doesn't matter if your
actual password is "zipobibrok5x10^8" when "fordprefect" also gets you into
the system. (That, of course, only applies to a single system -- or to a
cluster of systems all using something like an unsalted MD5. It _would_ matter
if you're trying to leverage a password found on a cat fanciers' site to empty
someone's bank account.)

~~~
Dylan16807
I actually think you're missing the point here. While it is true that an
infinite number of strings correspond with each md5 hash, the question was
about trying to _actually find_ a match. With a suitably large hash, say 256
bits, it becomes physically impossible to even _count_ that high, let alone
compute that many test hashes. A problem that is too large to evaluate is
effectively infinite.

(Yes, md5 is 128 bits and might be possible if an entire country dedicated
itself to the effort. Or an attack on its flaws could be used. But both these
points are tangential to themouth's use of infinite.)
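
A back-of-the-envelope version of the "can't even count that high" argument, in Python. The rate of ten billion hashes per second is an assumed, illustrative figure, not a benchmark of any real hardware.

```python
# Assumed rate: ten billion hash evaluations per second (illustrative only).
HASHES_PER_SECOND = 10**10
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def years_to_enumerate(bits):
    """Years needed to try every value of a hash with the given bit width."""
    return (2**bits) / HASHES_PER_SECOND / SECONDS_PER_YEAR

print(f"{years_to_enumerate(128):.2e}")
print(f"{years_to_enumerate(256):.2e}")
```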

~~~
stan_rogers
With the technique under discussion (using Google to search for the MD5 hash),
it doesn't really matter what the computational cost of finding a plaintext
for every possible hash value is -- you're not brute forcing a collision,
you're doing a search using someone else's enormous resources. That's always
going to be O(1) from your point of view (with a lot of overhead, of course).

------
tomelders
Tried it with a bunch of passwords that I know my friends use (yes, they treat
me like tech support) and it failed miserably with all of them.

It did however get a password that I use regularly (though not anymore) which
I thought was pretty complex.

So, friends with "idiot" passwords 1. Their so-called computer-expert mate
(me) 0.

I'm not so smug anymore.

~~~
pavel_lishin
I've used Google before at work.

We inherited a system, without inheriting the administrator passwords, that we
had to work on. It was a spaghetti mess, so creating a new account didn't seem
obvious, and I wasn't sure how the passwords were hashed, but they did seem
md5-like to me, so I googled one.

Turns out, 90% of them, including most of the admin passwords, were just four
numeric characters, like 9678.
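
For scale: with only 10,000 candidates, Google isn't even needed. A sketch in Python, assuming the hashes really were plain unsalted MD5 (the stored value below is computed locally as a stand-in for a leaked hash).

```python
import hashlib

def crack_four_digit_pin(target_md5):
    """Try all 10,000 four-digit strings and return the one that matches."""
    for n in range(10_000):
        pin = f"{n:04d}"
        if hashlib.md5(pin.encode()).hexdigest() == target_md5:
            return pin
    return None

stored = hashlib.md5(b"9678").hexdigest()  # stands in for a leaked hash
print(crack_four_digit_pin(stored))  # 9678
```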

------
storborg
Whenever one of these posts comes up it seems like there's a lot of comments
rushing to defend salted MD5 or SHA1.

What's actually wrong with bcrypt that prevents people from using it? Is it
not available on all platforms? Too computationally expensive?

~~~
16s
Integrate your email systems with Google mail or MS mail. You'll quickly find
that they do not accept bcrypt. Plain md5 or plain sha1 is all they support
(at least that was the case two years ago). When you are forced to
interoperate with the big guys, you'll find not many actually use bcrypt.

~~~
storborg
If you need to authenticate from another party's hashes, why not run bcrypt on
those?

------
mcritz
I dub this “shameware” and declare it awesome.

------
georgefox

    wordlist = response.split(/\s+/)

Thank God I use spaces liberally in my passwords.

~~~
chimeracoder
Unfortunately, the same sites that are naive enough to use MD5 for
cryptographic hashing are also likely the same sites naive enough to use
oversimplified regexes that fail to validate all possible inputs.

(If I had a dollar for every time the 'emailaddress+foo@gmail.com' failed to
validate....)

------
Cushman
Seriously? Who is still using md5? There are strong hashing libraries for like
every language. Anyone reading this uses md5?

Can we find these people and just let them know?

~~~
maratd
> Anyone reading this uses md5?

I use MD5 all the time, just not for security.

~~~
throwaway64
Ultimately, data integrity is the same thing as data security. If you cannot
trust your data not to be detectably corrupt in the face of a malicious collision
attack, you do not have data integrity. Collided data can be used to cause a
DoS, to overrun buffers, any number of nasty things that arbitrary user data
can cause when trusted implicitly.

~~~
maratd
You are making assumptions that are unwarranted. There are other uses for MD5
besides data integrity and data security. For instance, I generate an MD5 hash
of a user's email address and use that hash for Gravatar. While someone who
knows quite a bit about MD5 and the other person's email address, can force a
collision ... all that would get him would be the other person's avatar image
... which is public anyway. In other words, using MD5 is perfectly fine in
situations where collisions don't pose a serious problem.
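
For reference, that Gravatar use looks roughly like this in Python. It is a sketch of the MD5-based scheme as I understand it (trim, lowercase, hash); the email address is a hypothetical example.

```python
import hashlib

def gravatar_url(email):
    """Build an avatar URL from the MD5 of the trimmed, lowercased address."""
    normalized = email.strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return "https://www.gravatar.com/avatar/" + digest

# Hypothetical address for illustration.
print(gravatar_url("  Someone@Example.com "))
```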

------
MrEnigma
A simple salt would fix the issue of it being found on Google (unless your
salt is incredibly common).

~~~
aparadja
Yep. I find it fascinating why plain unsalted md5 hashes are as common as they
are. Developers go through the trouble of hashing, but don't go the single
necessary step further.

~~~
Torn
Salted md5 is still surely laughably weak in an age of GPU cracking?

~~~
aparadja
Sure. I guess there's not a whole lot of excuses to avoid bcrypt these days.

Bozo's idea was to show that unsalted MD5 is, for most passwords, as bad as no
encryption at all. An attack doesn't get much easier than a lookup table.

------
phzbOx
To the other posts saying how MD5 is not bad and/or it's stupid to use MD5
without salt (or whatever):

See this script more as a fun little hack rather than a "Formal proof". Thanks
aparadja for sharing. That being said, using MD5 without salt is asking for
trouble. I mean, I know security is usually just a time vs $ vs quality
problem, but it costs almost nothing more to add a salt in front of the
password. Why not do it?

~~~
tptacek
Using MD5 _with_ a salt is asking for trouble.

Using SHA256 _with_ a salt is asking for trouble.

Use bcrypt, scrypt, or PBKDF2. Do not DIY your password hash.
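
As one hedged illustration, PBKDF2 is available in Python's standard library. The iteration count and salt length below are assumptions picked for the example, not a recommendation; check current guidance before choosing real values.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for illustration only

def hash_password(password, salt=None):
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key

def verify_password(password, salt, expected_key):
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)

salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))
print(verify_password("wrong guess", salt, key))
```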

~~~
peterwwillis
A question on the practicality of expensive compute time for password hashes:

If somebody got read-level access to your password hashes, it follows (based
purely on the assumption that any app with the rights to read the hash will
probably have the right to change it when applicable) that one could simply
overwrite the password hash with a new one that is already known, gain
unauthorized access, and change the hash back to prevent the user from finding
out. Unless you really need that password to crack multiple accounts that
might be reusing it, it seems unnecessary. (And honestly I wouldn't care to
implement these special password hashes just to protect extraneous accounts of
my customers which I don't control)

 _edit_ I should clarify that while I agree that expensive compute time for
password hashes helps prevent the ultimate compromise of a user's password, I
find it a much more worrisome prospect that somebody got access to the
database in the first place. To me, a person's password strength is almost
irrelevant compared to the importance of preventing brute-force attacks on a
login API or ensuring the integrity of the password database and db apps.

That being said, for those wishing to implement a bcrypt-type hash:

    perl -le'print crypt("something", "\$2a\$random")'

should give you blowfish-encrypted password hashes on systems with it patched
into glibc. ("6" instead of "2a" for SHA-512)

~~~
throwaway64
security is about defense in depth, and doing as much as is feasible to
mitigate damage. Proper password hashing won't protect you from a break-in on a
primary system, but it could very easily prevent a break-in on secondary ones.

Also, a note to anyone reading the above post, that _is not how bcrypt works_
and is incredibly insecure.

~~~
peterwwillis
What is not how bcrypt works? The example that I gave? That comes from the
'crypt' man page explaining how to use blowfish encryption patched into
libcrypt?

------
alpb
Great. I think `bcrypt` is another way of hashing passwords. Read this: How to
safely store a password <http://codahale.com/how-to-safely-store-a-password/>

------
spacehaven
Note to self: poison search results for md5 hashes of my passwords.

~~~
feral
Because that way, anyone sniffing or monitoring your traffic doesn't even need
to crack the systems to steal the hash, and so you'll be saving everyone some
time?

Edit: I'm trying to illustrate the general principle, that you shouldn't take
any action that's visible outside your secure perimeter, that depends on
knowledge of your password.

What you define as 'outside the perimeter' depends. In the case of your
corporate systems, it's probably everything outside the corporate network. In
the case of your gmail password, it's everything outside of [your computer, the
SSL connection to google's auth servers, and those servers].

You shouldn't ever leak any information outside that perimeter, that reveals
knowledge of your password.

It's generally pretty hard to steal the password hash; if you start revealing
what your password hash is to someone doing passive analysis, you compromise a
lot.

If it's worth thinking about poisoning hashes to protect, then don't try and
poison the hashes!

~~~
watty
If someone is sniffing his traffic he has much bigger things to worry about...

------
zaken
I don't understand. This is pretty much the definition of a dictionary attack.
The dictionary is just stored and accessed through Google rather than packaged
with the program.

------
crtv
#1 rule - don't use unsalted md5. I think md5(md5($pass) + $seperator + $pass)
works great.
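
A sketch of that construction in Python, to show what it does and doesn't change. The separator value is an assumption, since the comment doesn't give one. Nothing per-user or random enters the computation, so the result is still deterministic and still fast to compute.

```python
import hashlib

def md5_hex(text):
    """Hex MD5 digest of a string."""
    return hashlib.md5(text.encode()).hexdigest()

def double_md5(password, separator=":"):
    """md5(md5(pass) + separator + pass); the separator value is assumed."""
    return md5_hex(md5_hex(password) + separator + password)

# Two different users who pick the same password get the same stored
# value, so one table built for this scheme covers every account.
alice = double_md5("hunter2")
bob = double_md5("hunter2")
print(alice == bob)
```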

------
snorkel
You need an app for this?

~~~
icebraining
An app proves automated cracking is possible. Doing it manually is often
unfeasible, therefore many don't worry about such security issues.

------
teflonhook
There's a bunch of MD5 search engines with rainbow tables etc. plugged in, it
could tap into that easily as well.

~~~
bluehex
I think this is exactly why this method is so effective. Google indexes all of
those sites.

The first result when I searched the hash for superman
(84d961568a65073a3bcf0eb216b2a576) was a link to a page titled literally
"Google Hash: md5(superman) = 84d961568a65073a3bcf0eb216b2a576". The page is
hosted at <http://www.nth-dimension.org.uk/utils/ghash.php>. Basically,
someone's gone through the trouble of making a rainbow table that's easily
crawlable, which makes this method of lookup via Google even easier.

The page has a description:

    > Google Hash is a PoC implementation of an hash search engine using Google.
    > Unlike other implementations, the aim here is to get Google to store the
    > word and associated hash. We do this by putting them into the title where it
    > will always be stored by Google's spider....

The next top hit is md5rainbow.com etc. etc. I would guess that most of the
positive results from BozoCrack.rb are thanks to these sites.

------
drivebyacct2
The exact same thing could occur if another (unsalted) hash becomes as
popular?

~~~
elisee
Yes, as long as it's designed for speed, it's not a good fit for storing
passwords. That's why people should use bcrypt or an equivalent. It has both
built-in support for salting & a work factor.
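
The "built-in" part is visible in the stored string itself: in the usual `$2a$` layout, the work factor and the salt travel with the digest. A small Python sketch that only parses such a string (it does not compute bcrypt); the example string is illustrative.

```python
def parse_bcrypt(hash_string):
    """Split a modular-crypt bcrypt string into version, cost, salt, digest."""
    _, version, cost, rest = hash_string.split("$")
    return {
        "version": version,
        "cost": int(cost),    # work factor: 2**cost key-expansion rounds
        "salt": rest[:22],    # 22 characters of encoded salt
        "digest": rest[22:],  # remaining 31 characters
    }

# Illustrative string in the $2a$ layout.
example = "$2a$12$R9h/cIPz0gi.URNNX3kh2OPST9/PgBkqquzi.Ss7KIUgO2t0jWMUW"
fields = parse_bcrypt(example)
print(fields["cost"], len(fields["salt"]), len(fields["digest"]))  # 12 22 31
```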

(Source: <http://codahale.com/how-to-safely-store-a-password/>)

