
Storing Passwords Securely - pw
http://throwingfire.com/storing-passwords-securely/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+throwingfire+%28Throwing+Fire%29#notpasswordhashes
======
sillysaurus
I know I won't be able to easily convince anyone of this, but I thought I'd
mention...

Colin Percival's "scrypt" password hash is:

1) production-ready (and has been for a long time)

2) superior to bcrypt, as it is designed to be expensive in both CPU and
memory (hence, scrypt is "memory-hard", whereas bcrypt is not)

I don't have time to go into further detail. I encourage you to check it out.
It's quite simply "the future of password hashing". (Bcrypt will be defeated
by natural advances in multi-core hardware; scrypt won't ever be.)

Passwords hashed with scrypt with sufficiently-high strength values (there are
3 tweakable input numbers) are fundamentally impervious to being cracked. I
use the word "fundamental" in the literal sense, here; even if you had the
resources of a large country, you would not be able to design any hardware
(whether it be GPU hardware, custom-designed hardware, or otherwise) which
could crack these hashes. Ever. (For sufficiently-small definitions of "ever".
At the very least "within your lifetime"; probably far longer.)
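
For reference, a minimal sketch of the three tweakable numbers, using the `hashlib.scrypt` function from Python's standard library (available when Python is built against an OpenSSL with scrypt support). The helper name `hash_password` and the cost values are illustrative assumptions, not settings recommended in this comment.

```python
import hashlib
import os

# Illustrative cost settings (assumptions; tune for your own hardware).
N = 2**14   # CPU/memory cost, must be a power of two
R = 8       # block size
P = 1       # parallelization factor

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from the password with scrypt."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=N, r=R, p=P,
        maxmem=64 * 1024 * 1024,  # allow up to 64 MiB for the computation
        dklen=32,
    )

salt = os.urandom(16)                    # fresh random salt per password
digest = hash_password("hunter2", salt)  # "hunter2" is a placeholder password
```

Raising `N` increases both the time and the memory needed per guess, which is the property the comment is pointing at.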

~~~
AgentConundrum
I think I answered my own question, but I'll ask anyway - why does memory
intensiveness matter?

~~~
tedunangst
GPUs (and similar hardware) have a very limited amount of memory per core. If
computing a hash requires 100MB of memory, then even a 1GB graphics card can
only execute 10 cracks in parallel, not hundreds or thousands.
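
The arithmetic above can be sketched as follows. The function names are hypothetical, and the model ignores every overhead except the per-hash buffer. The `128 * N * r` figure is the approximate size of scrypt's main memory buffer for a single computation.

```python
def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate working memory for one scrypt computation: 128 * N * r bytes."""
    return 128 * n * r

def max_parallel_guesses(device_memory: int, memory_per_hash: int) -> int:
    """How many hash computations fit in device memory at the same time."""
    return device_memory // memory_per_hash

MB = 1000 * 1000

# A 1 GB card with 100 MB needed per hash:
print(max_parallel_guesses(1000 * MB, 100 * MB))       # -> 10

# Memory (in MiB) for one scrypt hash with N = 2**14, r = 8:
print(scrypt_memory_bytes(2**14, 8) // (1024 * 1024))  # -> 16
```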

~~~
xxpor
Or the hundreds of millions.

------
pud
The article says to give each password its own unique salt and then _store the
salt_ (as well as the salted hash) in the database.

This seems like a bad idea to me. If a hacker gets access to the salted
passwords, in this case he'll probably figure out how to get access to the
salts too.

I figure if the salt is stored in the code (or a config file...) rather than
the database itself, at least it's two different hacks to get (1) the salted
hashes, and (2) the salt.

Am I misunderstanding?

~~~
ashconnor
No it's not a bad idea and essentially it's what BCrypt does.

Think about it. If you use the same salt for all passwords then I can easily
create a rainbow table consisting of "keyword" + salt hashes and doing so I
can crack multiple passwords.

However if there is a different salt for each password then this type of
attack becomes more difficult as I have to essentially do the same amount of
work for only a single password.

Finally, we need to store the salt in the database because it's needed in
order to recreate the hash using the user's password for authentication.
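
A minimal sketch of that flow, assuming PBKDF2 from Python's standard library as the hash and a plain dict standing in for a database row. The names `make_record` and `check_password` and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def make_record(password: str) -> dict:
    """Create the fields stored for one user: a random salt and the salted digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return {"salt": salt, "digest": digest}

def check_password(password: str, record: dict) -> bool:
    """Recompute the digest with the stored salt and compare it to the stored digest."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), record["salt"], ITERATIONS
    )
    return hmac.compare_digest(candidate, record["digest"])

alice = make_record("correct horse")
bob = make_record("correct horse")  # same password, different salt
```

Because the two salts differ, `alice["digest"]` and `bob["digest"]` differ even though the passwords match, so work spent guessing one user's password does not carry over to the other.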

~~~
masklinn
> If you use the same salt for all passwords then I can easily create a
> rainbow table consisting of "keyword" + salt hashes and doing so I can crack
> multiple passwords.

That's not a rainbow table; a rainbow table is a list of precomputed hashes
(which you'd compute once on a cluster you pay for, or buy from a guy who did,
and then store/stash).

With a shared salt, though, you're brute-forcing the whole table at once (you
can look for matches after each computation) instead of having to brute-force
each password individually. You wouldn't keep the forced salted hashes around,
since there's no need for them once you've tested the leaked hashes.

~~~
ashconnor
Yeah I should have stuck that in quotes. I was trying to keep the explanation
as simple as possible.

You are right of course there is no need to keep hashes about once they have
been compared against the db.

------
DanielRibeiro
SRP note was very nice:
<http://en.wikipedia.org/wiki/Secure_Remote_Password_protocol>

On a more esoteric note: If you are looking to resist attacks by quantum
algorithms, there are post-quantum algorithms for that[1] (they are computed on
normal machines, but the problems behind the cryptosystems are _hard_ to
solve even for quantum computers).

[1] <http://crypto.stackexchange.com/questions/494/what-is-the-post-quantum-cryptography-alternative-to-diffie-hellman>

~~~
tptacek
SRP also has tunable work-factor knobs, although they aren't as explicit as
the ones in bcrypt, scrypt, or PBKDF2. But I'd strongly recommend against re-
implementing SRP, since it's treacherous to get right.

~~~
shabble
Of the implementations listed in the WP article, which, if any, are suitable
for general purpose use? And what manner of implementation screwups are most
likely|hazardous?

[With the obvious caveat: Advice not for production use, If you have to
ask...]

~~~
tptacek
If you're going to use an SRP library, there's an open-source Stanford SRP
library (most commercial systems derive from it) which deals with the obvious
attacks.

------
jharding
If you're storing passwords as MD5/SHA hashes, how difficult would it be to
switch over to bcrypt? I've never had to do this, but I would imagine it would
be somewhat trivial. With all of the password leaks that have happened over
the past few years, I'd imagine a good amount of developers are aware that
storing passwords as MD5/SHA hashes is somewhat risky, so I can't understand
why big websites (LinkedIn) are still doing it.

~~~
kmfrk
In Django, as I recall, you just check for a hashing indicator that's prefixed
to the hashed password, and do something like this on a user's log-in:

    
    
        if hashed_password.startswith("sha$"):
            hashed_password = bcrypt(hashed_password)
    

(or `... = "bc$" + bcrypt(hashed_password)`. However it's done.)

Here is the relevant code for django-bcrypt:
<https://github.com/dwaiter/django-bcrypt/blob/master/django_bcrypt/models.py#L55>.

In your case, you could probably do this:

    
    
        if not hashed_password.startswith("bc$")\
           and sha(entered_password) == hashed_password:
            hashed_password = "bc$" + bcrypt(entered_password)
    

You don't have the prefix identifier, but that's okay; you just roll out an
equivalent now instead, so you only have to check the start of the hash string
and do the conversion, if it hasn't already been performed.

Of course, you have to account for the prefix identifier when validating an
entered password against the stored hash.

YMMV.

~~~
kmfrk
On a quick revision, the first code should read

    
    
       hashed_password = bcrypt(entered_password)
    

Not `bcrypt(hashed_password)`.
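
Putting the suggestion and the correction together, a runnable sketch of upgrade-on-login might look like the following. It assumes the legacy scheme is unsalted, hex-encoded SHA-1 with no prefix, and it swaps in `hashlib.scrypt` from the standard library for bcrypt so that it needs no third-party package. The `scrypt$` prefix, the storage format and all function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

PREFIX = "scrypt$"  # hypothetical marker for rows already upgraded

def legacy_sha1(password: str) -> str:
    """The old scheme assumed here: unsalted SHA-1, hex encoded."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def new_hash(password: str, salt: Optional[bytes] = None) -> str:
    """Hash with scrypt and encode as 'scrypt$<salt hex>$<digest hex>'."""
    salt = salt or os.urandom(16)
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt,
        n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32,
    )
    return PREFIX + salt.hex() + "$" + digest.hex()

def verify_and_upgrade(entered: str, stored: str) -> Tuple[bool, str]:
    """Return (login_ok, value_to_store_afterwards)."""
    if stored.startswith(PREFIX):
        # Already upgraded: recompute with the stored salt and compare.
        _, salt_hex, _digest_hex = stored.split("$")
        ok = hmac.compare_digest(new_hash(entered, bytes.fromhex(salt_hex)), stored)
        return ok, stored
    # Legacy row: check the old hash, then re-hash the *entered* password.
    if hmac.compare_digest(legacy_sha1(entered), stored):
        return True, new_hash(entered)
    return False, stored
```

The caller would write the second element of the result back to the user's row after a successful login.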

------
fein
I liked the article and it was a nice little afternoon read, but the whole
thing could have been condensed to "use bcrypt for passwords".

I'm not really sure where to stand on this. On one hand, we have PLENTY of
security articles stating the same thing (bcrypt, bcrypt, and just in case
you've forgotten... bcrypt), which leads to an observed oversaturation of the
same subject matter. On the other hand, we have a huge company like LinkedIn
that doesn't have the presence of mind to use something other than vanilla
SHA-1. Maybe there's just enough laziness/stupidity in the world to require
a constant barrage of the same security articles every week.

~~~
pmylund
I agree there are a ton of articles saying "Use bcrypt." After Coda's post
(<http://codahale.com/how-to-safely-store-a-password/>) it's almost become a
meme. I don't, however, think that the people who say "Use bcrypt!" tend to
explain why they say that.

I think the reason that this happens so often is that regular developers just
don't care. But that's because they don't know why they should care. Given a
proper explanation (and an attention span longer than "Squirrel!"), any
reasonable developer would (at least, should) care.

~~~
ticks
Indeed. There's a developer bubble around Hacker News and websites like Stack
Overflow, where topics like bcrypt have become second nature. Standard
developers, i.e. those who program solely as a job, don't inhabit these
places.

------
stickfigure
How about this: Don't store passwords at all.

There are a multitude of sophisticated third-party solutions to
authentication. Facebook, Twitter, and Google all offer competent solutions.
Don't like those? Use BrowserID.

Integrating any of these is actually quite a bit easier than rolling your own
solution. It reduces hack risk, provides a better experience for your
customers (what was my password again?), and almost certainly will be more
reliable than your website.

~~~
pmylund
OpenID and OAuth really did a lot, but there's just nothing called "don't use
passwords." Fingerprint readers suck. Anything biometric that doesn't suck
costs too much, and 99% of people don't have them. A good KDF is not bad in
comparison to a centralized authentication server considering other factors.

Someone, somewhere will be storing user passwords/digests for the foreseeable
future. And they will do it incorrectly.

~~~
stickfigure
Sure, but the number of those people should become vanishingly small over
time.

HN is full of web developers rolling unnecessary username/password solutions.
The fact that this is such a hot issue - as opposed to esoterica like TCP
frame size - shows that _far_ too many developers are homebrewing solutions
rather than outsourcing.

~~~
pmylund
I agree, but "outsourcing" includes using libraries written by people who know
what they're doing. (And not using libraries written by people who know what
they're doing, but which are the wrong tools for the job.)

------
peeters
I used to hear some controversy with regards to "stretching." The argument
back in the day was, "it's partially security through obscurity, but the
danger is that there isn't research to prove that a hash of a hash is
cryptographically strong."

So is there research that proves that hashing a hash of a hash of a hash
(x100000) doesn't result in a smaller range of values than a single hash for
SHA algorithms? Is there no such convergence?

~~~
tptacek
Stretching isn't "security through obscurity". It's "security through
increasing the attacker's cost by a huge amount while increasing your own cost
by a minimal amount".

But don't use stretched SHA1. Use bcrypt or scrypt or PBKDF2, all of which
explicitly address this particular concern.

------
mparlane
An even better way of securely storing your passwords would be to mix them
around on entry to your bcrypt hash function in a unique way that makes it
impossible to brute force your leaked password hashes without having access to
the code that did them.

~~~
pmylund
So something like a HMAC digest generated using a pepper stored in the source
code/binary or on disk before passing it to bcrypt/scrypt? :)

This only really protects against SQL injection attacks, though/when there is
actually a separation between where you store the bcrypt digests and where you
store the pepper. (Granted, there are a lot of SQL injection attacks.)
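
A minimal sketch of that idea, assuming HMAC-SHA256 for the pepper step and `hashlib.scrypt` as the password hash. The pepper value and the function name `peppered_hash` are hypothetical. In a real deployment the pepper would live in configuration or code, separate from the database.

```python
import hashlib
import hmac
import os

# Hypothetical pepper; kept outside the database in this scheme.
PEPPER = b"example-pepper-not-a-real-secret"

def peppered_hash(password: str, salt: bytes, pepper: bytes = PEPPER) -> bytes:
    """HMAC the password with the pepper, then run the result through scrypt."""
    mac = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    return hashlib.scrypt(
        mac, salt=salt, n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32
    )

salt = os.urandom(16)                  # per-user salt, stored next to the digest
stored = peppered_hash("hunter2", salt)
```

An attacker holding only `salt` and `stored` cannot test guesses without the pepper, which is the separation the comment describes.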

~~~
mparlane
Exactly. Most commonly with these things the db was dumped, which does not
imply that the source code was accessed. If the source code was accessed,
they normally don't need a db dump (unless the access was read-only).

The first section of the article IMHO was not needed in regards to a simple
hash. Forums have been hashing their passwords with salts for how long now?

~~~
pmylund
Sure, but I try to not make any assumptions without being boring. I think the
goto-link at the beginning works fairly well.

------
teyc
If you happen to have a web app that stores passwords in clear text or SHA-1
hashed, all is not lost. You can apply further secure hashes to the existing
value stored on db and update your authentication validator.
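
As a rough sketch of this approach, assuming the existing column holds hex-encoded, unsalted SHA-1 digests and using PBKDF2 from Python's standard library as the additional hash. The function names and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor

def sha1_hex(password: str) -> str:
    """The legacy hash assumed to be in the database already."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def wrap_legacy(legacy_hex: str, salt: bytes) -> bytes:
    """Apply a slow KDF on top of an existing SHA-1 digest (no password needed)."""
    return hashlib.pbkdf2_hmac("sha256", legacy_hex.encode("ascii"), salt, ITERATIONS)

def verify(entered: str, salt: bytes, wrapped: bytes) -> bool:
    """Updated validator: SHA-1 the entered password first, then apply the same KDF."""
    return hmac.compare_digest(wrap_legacy(sha1_hex(entered), salt), wrapped)

# One-off migration of an existing row, done without the user logging in.
legacy_row = sha1_hex("s3cret")
salt = os.urandom(16)
wrapped_row = wrap_legacy(legacy_row, salt)
```

The migration needs only the values already in the database, so every row can be converted in one pass.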

~~~
heretohelp
Or do the easy thing and perform a chinese fire-drill whenever the user logs
in and bump them to the new encryption schema.

Or invalidate and send an email.

~~~
fl3tch
Interestingly, when reddit upgraded to bcrypt, they refused to do this because
so many users don't know their passwords and it would lock lots of people out
of their accounts forever (remember, reddit doesn't require email addresses to
register).

<http://reddit.com/r/changelog/comments/lj0cb/reddit_change_passwords_are_now_hashed_with_bcrypt/c2t7hoa>

~~~
heretohelp
Good on them for knowing their users?

------
uzero
Excellent reading - I spent countless hours researching all this stuff for a
project a few years back. After my research I came up with a protocol very
similar to SRP, but I find that SRP is a nice POC for both protocols.

One thing I didn't find a solution for is keyloggers and other similar
attacks. If you look at securing your service as a whole, you have to
acknowledge the risk of keyloggers too. Now, with Flame, Stuxnet and all the
other nice things still in the shadows, keyloggers can suddenly become a risk
on a large scale as well.

------
nshankar
I was just checking www.Leakedin.com for a possible leak of my LinkedIn
password. I entered my password and it said it was leaked. But when I entered
a random value, like dsfsfgfdsgsd, it said hoorah!, not leaked. So, the onus
is on the users who don't want their passwords to be leaked, rather than
depending upon someone else to keep them safe.

------
trjordan
OK, so I get the message. Use bcrypt. Don't worry, that's what I'll do in
production.

On the other hand, if it's so hard to roll your own, can somebody point out
the security flaws in the given Python function? Seems pretty straightforward
to my untrained eye.

~~~
rmc
The check of hash to input uses == which will shortcircuit and return quicker
as you guess the leading digits allowing you to figure out what the hash is.
(i.e. a timing attack)

~~~
tptacek
Try to explain how you would actually conduct that timing attack to see why it
isn't one.

~~~
indygreg2
If you are going to go through all the effort to do it properly, you might as
well use a proper comparison function. If nothing else, it reinforces the
knowledge that string comparisons can be part of security (which goes
overlooked by many).
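
For illustration, here is a sketch of the difference between an early-exit comparison and a constant-time one, using `hmac.compare_digest` from Python's standard library. The function names are hypothetical. Whether the distinction matters for password hash comparisons is disputed elsewhere in this thread.

```python
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    """Byte-by-byte comparison that returns at the first mismatch (early exit)."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # running time depends on where the mismatch is
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison designed so timing does not reveal where the inputs differ."""
    return hmac.compare_digest(a, b)
```

Both functions return the same answers; only their timing behaviour differs.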

~~~
pmylund
I'm not sure I agree that it would matter, but, either way, using a constant-
time equality function might have given readers the impression that my code
was safe to use. It isn't. That was never the intention. One of my main points
was that it's extremely hard to do properly. My (pseudo-code) examples were
intended to explain the concepts of salting and stretching. Perhaps it's
unfortunate that they're actually valid Python.

People should use proven KDFs for password authentication, not implement their
own (including using my SHA-salting/iteration examples.)

Edit: removed "in web apps"

~~~
tptacek
Many timing attacks are viable in web applications. But there aren't timing
attacks against password hash comparisons, for obvious reasons.

~~~
pmylund
You're right. That was an unfortunate choice of words.

------
rokhayakebe
Why does everyone need to be an expert in password generation and storage?
Would password as a service be feasible, or even authentication as a
service? (And I do not mean FB or Twitter auth)

~~~
apendleton
Everyone doesn't, and shouldn't. That's why the best practice is not to
implement your own hash function, and instead use a third-party library
written by an expert.

I'm not sure that password-as-a-service would be worth the overhead involved,
but password-as-a-library is functionally equivalent from a developer's
perspective and is already the norm. The only question, then, is "which
library?" which is what this article attempts to address.

------
SonicSoul
i've had good experience using KeePass <http://keepass.info/>

it is open source, and supported by all platforms i use
(windows,osx,android,ios) and the interface is pretty well designed. Once
local database is open, simply doing ctrl+c on any of the sites copies the
password to clipboard for a very limited time.

this is still a major pain, especially since you need to protect the safe with
a long password and this is particularly painful to type on mobile devices.

------
billymillions
bug in the code.

return getDigest(password, salt) == digest

getDigest returns a tuple
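
To show why that comparison fails, here is a sketch with a hypothetical stand-in for the article's function. The `get_digest` below is an assumption made for illustration (its body and the `(salt, digest)` return shape are guesses), and like the article's examples it is not meant for real use.

```python
import hashlib
import os
from typing import Optional, Tuple

def get_digest(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Hypothetical stand-in: returns a (salt, digest) tuple."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest

def is_password_buggy(password: str, salt: bytes, digest: str) -> bool:
    # Compares a tuple with a string, so this is always False.
    return get_digest(password, salt) == digest

def is_password_fixed(password: str, salt: bytes, digest: str) -> bool:
    # Compare only the digest element of the returned tuple.
    return get_digest(password, salt)[1] == digest

salt, digest = get_digest("hunter2")
```

With the buggy version, even the correct password is rejected, because a tuple never equals a string.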

~~~
pmylund
So much for white-boarding it. Fixed, thanks.

------
pnayak
great article!

------
rorrr
> _MD5, SHA-1, SHA-256, SHA-512, et al, are not "password hashes." By all
> means use them for message authentication and integrity checking, but not
> for password authentication._

Bullshit. MD5 is just fine, as long as you use the salt.

Here, hack this:

    
    
        MD5(password + salt) = "b520542710812f347432232b2a1fba83"
    
        salt = "MD5 rules"

~~~
ghshephard
I mean this in the most polite and professional manner possible, please take
five minutes and read <http://codahale.com/how-to-safely-store-a-password/> .

It will explain why "salts are useless for preventing dictionary attacks or
brute force attacks."

The entire article is excellent - and every colleague who I've ever pointed at
it, has come away nodding their head and seeing the light.

The key takeaway (but please, read the entire article) is: "It doesn’t affect
how fast an attacker can try a candidate password, given the hash and the salt
from your database."

Salts only help you from precomputed dictionary attacks ("Rainbow Tables") -
but, if someone is brute forcing you, the value of a salt just disappeared.
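
A small sketch of that point, assuming the `MD5(password + salt)` construction from the parent comment with UTF-8 encoding and plain concatenation. The victim password and the word list are made up for the example.

```python
import hashlib
from typing import Iterable, Optional

def md5_salted(password: str, salt: str) -> str:
    """MD5(password + salt), hex encoded."""
    return hashlib.md5((password + salt).encode("utf-8")).hexdigest()

def dictionary_attack(target_hex: str, salt: str,
                      candidates: Iterable[str]) -> Optional[str]:
    """Try each candidate with the known salt; one fast hash per guess."""
    for guess in candidates:
        if md5_salted(guess, salt) == target_hex:
            return guess
    return None

salt = "MD5 rules"
leaked = md5_salted("letmein", salt)  # hypothetical victim password
wordlist = ["123456", "password", "letmein", "qwerty"]
print(dictionary_attack(leaked, salt, wordlist))  # -> letmein
```

Knowing the salt, the attacker pays exactly one MD5 computation per guess, the same cost as with no salt at all.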

~~~
romaniv
So here is a question that I've asked without receiving a good answer: why
doesn't the size of the salt matter? Wouldn't the salt be used in every hash
computation? Wouldn't a large (megabytes) salt slow down this computation and
require more memory to perform it? I'm not advocating the use of a large salt
as opposed to specialized functions, I'm just curious as to why it doesn't
work. The article doesn't explain that.

~~~
pmylund
It's not that the size doesn't matter; it's just that it's not as significant.
It becomes very, very hard to compute rainbow tables after just a few random
bytes. No matter how long the salt is, it doesn't do anything to prevent
somebody from trying to guess the original input for a given digest using a
brute force approach, though. So usually the salt has less than e.g. 256 bits
of entropy just because it takes up less space.

Sure, a very large salt might slow down the first iteration a little (but not
necessarily subsequent ones, and it wouldn't require more memory, at least
with most hash functions), so you're almost always better off just stretching
the key--then you save the storage costs too.
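
One way to see why a huge salt buys little, sketched under the assumption that the salt is prepended to the password and the hash is SHA-256: the hash processes its input sequentially, so the state after the salt can be computed once and reused for every guess. The 1 MiB salt and the function name are illustrative.

```python
import hashlib
import os

big_salt = os.urandom(1024 * 1024)  # a 1 MiB salt, purely for illustration

# Hash the salt once and keep the intermediate state.
salt_state = hashlib.sha256(big_salt)

def digest_with_cached_salt(candidate: bytes) -> str:
    """Hash salt + candidate by resuming from the saved salt state."""
    h = salt_state.copy()  # copy of the internal state, independent of salt size
    h.update(candidate)
    return h.hexdigest()
```

Each guess then costs about the same as hashing the candidate alone, and the working memory stays constant, whereas stretching forces the full cost on every guess.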

------
hobbsq
HAHA...

Maybe Linkedin should've used this ;-)

