
Instapaper stores only salted SHA-1 hashes of passwords, so those are relatively safe.

--

Obligatory statement on NEVER USING SHA-1 HASHES to make passwords "safe".

Any normal person can brute force millions of SHA-1 hashes (salted however much you want) per second on a GPU.

If the FBI so wanted (although I don't believe they do), I'm sure they could brute force almost every single password in that database. Granted, it's the government and they have better ways of obtaining such information, but if there is someone the FBI is watching in Instapaper's database and they so wanted, storing SHA-1 hashes of the passwords has all but handed those passwords over to the FBI.

I am now glad my Instapaper password was generated randomly, 16 characters long, and I will now change it just to be safe.

For anyone running a database which stores usernames/passwords, take a look at bcrypt or scrypt. They're millions (no, I am not exaggerating) of times better than SHA-1.
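For the curious, a minimal sketch of the difference in Python. The bcrypt calls assume a py-bcrypt-style module exposing gensalt() and hashpw(); treat the exact names, the cost factor, and the example salt/password as illustrative rather than gospel.

    import hashlib
    import bcrypt  # assumes a py-bcrypt-style binding is installed

    password = b"hunter2"
    salt = b"3a1f9c"

    # Fast and GPU-friendly: billions of these can be tried per second.
    weak = hashlib.sha1(salt + password).hexdigest()
    print(weak)

    # Slow by design: the cost factor makes every guess expensive.
    strong = bcrypt.hashpw(password, bcrypt.gensalt(12))
    print(strong)

    # Verification re-runs the hash using the salt embedded in `strong`.
    assert bcrypt.hashpw(password, strong) == strong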

(Edit: Grammar)




This is more "crypto nerd imagination", a la the XKCD comic. The FBI doesn't care about the encrypted passwords because it has access to all the content in plaintext. And what else would they need the passwords for? Other accounts on other services? They can just confiscate those servers too, where the content is most likely also in plaintext.

So in this case, where the FBI is involved, using a SHA-1 hash poses no extra security vulnerability.


They can just confiscate those servers too, where the content is most likely also in plaintext.

I imagine that many companies are better prepared to deal with the FBI than this data center was. I have a hard time imagining the FBI going into a Google data center and easily walking out with a few racks. But even if that's too optimistic, I doubt the FBI could go about seizing servers for very long. If nothing else, this would eventually piss off big companies who will lobby Congress to curtail the FBI.


In this case, since the warrant probably didn't allow for the seizure of Instapaper's servers / data, you run a serious Fourth Amendment risk of any evidence within being inadmissible. That said, even if it is inadmissible, the FBI now knows things it might not have known before. There is the obvious point that there is very little likelihood of any direct evidence of a crime in Instapaper's data; there may be indirect or circumstantial evidence, though.


"So in this case, where the FBI is involve, using a SHA-1 hash poses no extra security vulnerability."

meeeh...

remember the fbi is not a person, it's an organization. the org can have bad actors in it who might be able to access the encrypted passwords but not be able to confiscate servers.

also, confiscating a server(s) is much more visible / detectable...


> can brute force millions

Modern consumer video cards can do billions per second now. You might as well just store them in plaintext instead of using SHA1/MD5 with or without salting. :/


I don't understand. If you can use mixed-case letters, digits, and symbols, you have 26 * 2 + 20 = 72 possible characters.

72^8 >> 1e9

It would still take more than 8 days to brute force at 1 billion/sec. And using a longer password (16 chars?) would make this a very long time.

Or is there some other trick that makes this fast? Or is it simply that people don't choose random, long passwords?
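(For anyone checking the arithmetic, here it is in plain Python, using the 1 billion guesses/second figure from upthread:)

    keyspace_8 = 72 ** 8                      # ~7.2e14 candidate passwords
    keyspace_16 = 72 ** 16                    # ~5.2e29 candidate passwords
    rate = 10 ** 9                            # guesses per second
    print(keyspace_8 / rate / 86400)          # ~8.4 days to exhaust 8 characters
    print(keyspace_16 / rate / 86400 / 365)   # ~1.7e13 years for 16 characters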


Secure password hashes don't protect users, and particularly not users who use one-time effectively-random passwords.

Secure password hashes protect application developers from the disclosure of hundreds or thousands of user passwords from their database. It allows them to attest to their userbase "your password is cryptographically stored in a manner that makes it hard to break even with dedicated hardware; you should consider changing your password if it's weak and shared", instead of, "expect to see your password on Pastebin any day now".


To clarify, I assume you mean that using secure password hashes instead of insecure ones does not help users who use one-time effectively-random passwords, because those users are already safe?

That is true.

However, it seems to me that the combination of an effectively-random password and password hashing does protect users, because their password is not effectively crackable in a situation like this. Additionally, there's a tradeoff between how secure your password hashing is and how much randomness users need to put into their password: every additional factor of 1000 in the iterations of the hash saves you a random character or two.
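To put a rough number on that trade-off (assuming a 62-character alphabet for the random part of the password):

    import math

    bits_per_1000x = math.log2(1000)        # ~10 bits of extra attacker work per 1000x iterations
    bits_per_char = math.log2(62)           # ~5.95 bits per random alphanumeric character
    print(bits_per_1000x / bits_per_char)   # ~1.7 characters of randomness saved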


I wish everyone could use complex, unique, strong passwords all the time, but some use cases just don't support it. For example, I have to type my Apple ID into my iPhone/iPad what seems like every 5 minutes in iOS. Without access to 1password or a similar tool, I just can't use a strong password. Even if I did, I couldn't change it as often as I'd like to. FWIW, I wish I could.


security vs. convenience


My point was that the choice is sometimes not left in the user's direct control. If I thought I could choose an absurdly strong password (e.g., to overcome the shortcomings of the developer's choice of SHA1), I would always do that – except if I'm going to need to enter that password from memory a bunch of times per day.


You're looking at more like 5+ billion/sec today. There was a listing of modern consumer video cards + their hashing capabilities posted on another HN story recently, but I can't find it.


https://en.bitcoin.it/wiki/Mining_hardware_comparison

Not sure if this is what you were looking for or the particular hash bitcoin uses, but a $110 Radeon 5830 can get you around 250Mhash/s


"Hashes" aren't interchangeable, they're an abstract concept. In this case, the thread is talking about the SHA-1 hash algorithm. AFAIK Bitcoin uses SHA-2 w/256-bit digests.

So, actually, in this case the two happen to be related but different - SHA-1 is considered potentially flawed, while the stronger SHA-2 is not (yet).

Also, the Bitcoin block headers that are hashed are (I think) 80 bytes (640 bits) long, salted passwords probably a bit shorter.

Wikipedia says SHA-256 runs at about 2/3 the raw throughput of SHA-1. You can check the comparative rate yourself on any *nix computer:

    $ openssl speed sha1 sha256
    -snip-
    Doing sha1 for 3s on 16 size blocks: 7760096 sha1's in 2.99s
    Doing sha1 for 3s on 64 size blocks: 5502820 sha1's in 3.00s
    -snip-
    Doing sha256 for 3s on 16 size blocks: 5460366 sha256's in 3.00s
    Doing sha256 for 3s on 64 size blocks: 3169031 sha256's in 3.00s
    -snip-

So... SHA-256 runs at roughly 2/3 of SHA-1's speed, as expected. I don't know enough about crypto implementation, but I'd guess this ratio would scale roughly to the much faster GPU implementations as well.
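(Deriving the ratio from the 16-byte-block numbers above:)

    sha1_per_sec = 7760096 / 2.99          # ~2.60M hashes/s
    sha256_per_sec = 5460366 / 3.00        # ~1.82M hashes/s
    print(sha256_per_sec / sha1_per_sec)   # ~0.70, i.e. roughly 2/3 of SHA-1's speed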


Yes, I have 2 6950s, which together average to about 660MH/s on hashkill mining bitcoin, but when I tried MD5, IIRC they averaged to about 3200MH/s. I'd assume SHA1 to be slightly slower, but not much.


> You're looking at more like 5+ billion/sec today.

Sounds about right http://blog.zorinaq.com/?e=43 And a few times more if it's md5.


> Or, is it simply that people don't choose random, long passwords?

That's always part of the problem.


If my understanding is correct, that's not the issue here. Hashes are meant to be one-way functions: the developer can easily check if a user's password matches the hash, but it should be practically impossible to deduce the password from the hash.

What the user chose as their password should be irrelevant if using a good hash.

[Edit: I stand corrected on the effect of password length.]


SHA-1 is a reasonably good hashing algorithm, but for the sake of argument, I'll talk about an imaginary SHA-4 which is perfect in every respect. It will be a 4096 bit hash function which has no faster-than-bruteforce collisions or preimage attacks or second preimage attacks.

Let's also assume that this perfect SHA-4 function is freakishly fast, say, a million times faster than SHA-1.

Now, even though my imaginary SHA-4 function is perfect in every way, it would be strictly worse to use this for password hashing than SHA-1. Why? Because cryptographic attacks aren't the problem here. The problem is that the entropy of a user's password is very VERY small. So small, in fact, that attacks on passwords aren't done through cryptographic weaknesses; they are done by simply hashing everything someone might pick as a password and asking "did I get it right?". An attacker will repeat this process for a little while, and eventually they'll get the answer "YES, this user chose abc123 as their password!".
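A toy version of that guess-and-check loop against a salted SHA-1 record (the salt, digest, and wordlist here are all made up for illustration):

    import hashlib

    salt = b"8f41c2"                                     # stored in the clear next to the hash
    leaked = hashlib.sha1(salt + b"abc123").hexdigest()  # the stolen database entry

    wordlist = [b"password", b"123456", b"letmein", b"qwerty", b"abc123"]
    for guess in wordlist:
        if hashlib.sha1(salt + guess).hexdigest() == leaked:
            print("cracked:", guess.decode())
            break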


Right, which is the reason why "perfect in every respect" and "freakishly fast" are mutually exclusive in a hashing algorithm.

A "perfect in every respect" hash then would be one that takes a consistent, acceptably-long time. Some large fraction of a second perhaps.

Of course, this fictional hash wouldn't be the right choice for everything. But for password hashing, it's a good start.


... Which is pretty much what bcrypt is.

http://codahale.com/how-to-safely-store-a-password/


(Edit: see child comment -- I was responding to something other than what was intended. I'm leaving this here for clarity, but you can ignore it.)

No, not really. Hashing functions aren't designed for passwords, they're mainly used for integrity checks and other uses which need to be fast: why do you think one of the axes the SHA-3 hashes are competing on is speed?

You have your 10gb file and want to send it to your coworker and let him know it's really yours and no one has messed with it. So you run an HMAC over it and then sign it with your private key.

You want it to be as fast as possible. It would be optimal if there was a single x86 instruction called sha4 which did this in the time it takes to do an add.

Hashing is really, really not meant for passwords.


Woah, slow down, I think you whipped up a 4 paragraph reply before you ever got to my last sentence.

Or, go on and tell me more about all the things hashes are used for as if I just fell off the turnip truck.

This discussion is not about checksums on files. It's about passwords. And your "perfect hash" in your example about passwords is "freakishly fast." In fact, like the other guy that replied to me mentioned, this is the entire point of the work factor in bcrypt, right?

Of course, bcrypt is really, really not meant for checksums. Good thing nobody was talking about checksums then.


When you wrote "this fictional hash" I read that as talking about the SHA-4 I made up, not the one you did.

I then didn't respond to the rest of the post because when you said "for password hashing, it's a good start" I again assumed you were talking about my hash function, which is not good for password hashing. Yours would be perfectly fine.

I apologize.


Apology accepted, and thank you for adding a lot to the discussion further down the page (the interesting maths related to the probability of collisions on a hashed-hash)


>Let's also assume that this perfect SHA-4 function is freakishly fast, say, a million times faster than SHA-1

This is why they invented key stretching: http://en.wikipedia.org/wiki/PBKDF2

Just set number of rounds to 10000000000 instead of, like, 10000 for SHA1.
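In recent Pythons the standard library exposes this directly; a minimal sketch (the iteration count and password are illustrative, not a recommendation):

    import hashlib, os

    salt = os.urandom(16)   # random per-user salt, stored alongside the hash
    derived = hashlib.pbkdf2_hmac("sha1", b"correct horse battery staple", salt, 100000)
    print(derived.hex())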


Sure, but if you can run through billions of possible hashes per second then it's possible to brute force the "one way" hash by exploring hashes for short passwords.

1 billion passwords per second is 86 trillion passwords per day. If a hacker wanted to gain access to a particular password then it can become trivial to crack with fairly modest resources. The only thing that protects you is a sufficiently complex and long password.


But there are rainbow tables, or tables of all of the MD5s/SHA1s/<insert favorite hash algorithm> for arbitrary strings. So the time's already sunk in.

8 days for one password is a very short amount of time comparatively (tiny for a botnet). If you use bcrypt, which you can force a certain complexity on, you can get that amount of time up much higher.


Rainbow tables don't work even against amateurish salted hash schemes.


As they have the Web code base, you must assume that they have the salt used in the hashing... If they actually want to get these passwords, all they have to do is generate rainbow tables using SHA1 and the appropriate salt. We're back to relying on the length and bit depth (range of characters) of the passwords you are trying to find.


Salts, done properly, vary per user not per server.


But I'm assuming that if they have the code base, the plaintext user names (email addresses), and the salted passwords, then they would have whatever the per-user salt is.


Right, but if the salt varies per user, then you end up doing a bruteforce on each user's password; it's no longer a precomputation attack. There are no "Rainbow tables" in this case.

However, if you find Hale's bcrypt page (http://codahale.com/how-to-safely-store-a-password/) convincing, and I do, salting really doesn't matter because with modern GPUs you can bruteforce a reasonably-sized alphanumeric password, if the hash algorithm is a general-purpose (read: fast) one.

The solution is not salt, the solution is to use a purposely slow hash function.


I have been thinking about switching everything to bcrypt, but there is definitely way too much confusion about bcrypt vs scrypt, how many rounds to set for bcrypt, etc. What is the definitive source for figuring out what the new standard should be? Does anyone have any links to something that's peer-reviewed and approved for use by someone with enough authority to do so?


No there isn't. You only think that because when geeks discuss anything that involves one or more knobs, a huge debate must necessarily ensue about the proper values of those knobs.

Just use the bcrypt defaults. You will be fine. You will in particular be so much better off than salted SHA-1 that this topic will be mooted. Later on, maybe in 5-10 years, you can re-engage with the debate about what a good cost factor for bcrypt will be in 2020.


In the defense of geeks, this probably follows from the mantra that you should never ever run any command on your system ever without completely and fully understanding what it does and all of its options and etc. etc.

So now people are twitchy about "just use the defaults", especially when it comes to something they don't really understand, like cryptography.


As a geek let me just say that it is all love with me and the geeks. Just: in this case, you can just take the defaults and be better off.


OK. By the way, in case you haven't heard it lately, thanks for hanging around and demystifying this stuff for so many people. It's a huge help.


as is your comment.


What's a ready to go bcrypt library for C/C++? I mean include headers, link lib / so, and call a function.

I've been looking into this over the past few days, and I've decided to just extract the relevant files from py-bcrypt, and get rid of the compatibility layer.


The C reference implementation is here:

http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/crypt/

It's from OpenBSD and implemented by the developers of the algorithm. It is what the Python/Ruby/Lisp/PHP etc. versions are derived from or wrap.


Thanks!


Thanks. I am just trying to navigate the sea of misinformation that spews forth everywhere about salted hashes vs bcrypt vs scrypt. I can see that lots of people claim that bcrypt is better, but I am not aware of anything about it other than the original paper. Basically, I want to know what the chances are that two months after I implement bcrypt a huge issue with it will be discovered and I'll have to move everything to some new scheme.


There is virtually no chance that, after selecting bcrypt, you will be forced to scramble to replace it in 2 months.

There is no chance that, after selecting bcrypt, you will be forced to scramble to replace it with salted SHA-1 hashes. bcrypt is strictly better than what you're doing now.


Fantastic. One more question: does increasing the work factor automagically upgrade existing passwords in some way? As in, will bcrypt passwords created today be strong enough in 2020?


The Ruby on Rails auth system I use, Devise [1], will automatically update a user's password to use your new work factor on their next login.

You could do something similar.

[1] https://github.com/plataformatec/devise


No; you'd upgrade them incrementally.


getsat and tptacek have already answered your question, so I won't rehash that (pun wholly intended), but I should point out that one interesting property of PBKDF2 is that you can increment the work factor (number of iterations).

PBKDF2(password, iterations=10) == PBKDF2(PBKDF2(password, iterations=5), iterations=5)

Thus you could, say, increase the number of iterations every month. All that said, you should still use bcrypt; this is just an interesting property IMO.


That does introduce a security concern, though. While it might be hard in practice, if you have a copy of a hashed password iterated 200 times, then a copy of the same hashed password iterated 300 times, and you have cracked the 200-iteration hash, you could verify the other hash is the same by applying 100 more iterations to it. To solve this you would want to change the salt whenever you change the password, which involves doing all the iterations again. Then you are no better off than using a non-incrementing solution like bcrypt. The only situation where you wouldn't be able to make a new salt, however, is if the user hasn't logged in for a while (which is quite possible for single-use accounts on websites).


I got into a debate on StackOverflow over bcrypt vs salted SHA1:

http://stackoverflow.com/questions/3722780/do-any-security-e...

I think I'm right in choosing bcrypt, but one interesting argument against it was that, being slower, it would facilitate DoS attacks. You want the password hashing to be slow to prevent brute-forcing, but, if it's too slow, attackers could supposedly DoS your login system by trying tons of passwords.

I'm not a security expert, and I didn't know what to respond to that. How would one mitigate this problem? Is it even really a problem?


Just like you don't want over 60 requests per second from a client, you don't want to allow that many login attempts. Look at Gmail's failed-login process: requiring a captcha after 3 failed attempts is preferable to the "you can't log in after this many attempts" lockout that I remember getting on a forum.
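A toy sketch of that kind of throttle (the names and in-memory storage are made up; a real system would persist the counters and expire them):

    from collections import defaultdict

    failures = defaultdict(int)
    FREE_ATTEMPTS = 3

    def attempt_login(user, password, check_hash, captcha_passed):
        if failures[user] >= FREE_ATTEMPTS and not captcha_passed:
            return False                 # don't even run the (deliberately slow) hash
        if check_hash(user, password):
            failures[user] = 0
            return True
        failures[user] += 1
        return False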


One of the fundamental ideas of cryptology is using the right algorithm for the type of data and the length of its required security. If the cost required to break an algorithm is greater than the value of the encrypted data, you're probably safe.

You can always store the users' passwords again later and update the crypto used (more iterations, a different digest algorithm). It's never a question of if it will be broken, but when.

Choosing iterations for a PBKDF takes a bit of common sense, yes if you're going to roflscale and think 100000000 iterations is a good idea currently, then you may run into performance issues.

The correct balance is between performance and security: you want to authenticate the user as fast as possible while also making it unfeasible to recover the data. As with everything, a little common sense and knowledge goes a long way.


scrypt slides: http://www.tarsnap.com/scrypt/scrypt-slides.pdf

Takeaway: Cost to crack one MD5 password: $1. Cost to crack one scrypt password: $50M to $200B.

You want your login to be slow compared to the rest of your application. It's okay to take half a second to verify a login.


scrypt is better than bcrypt, but not by the same margin that bcrypt is better than salted hashes. Salted hashes are a straight-up vulnerability. bcrypt is a best practice.

Note that almost nobody uses scrypt. We don't recommend it, not because it's insecure, but because it's painful to implement for most companies.

But use either. Or just use PBKDF2. All of the adaptive hashes are fine.


    > All of the adaptive hashes are fine.
I am so glad you say this.

I can't count the arguments I've heard centered around what is The One True Way to store passwords... this topic turns every programmer on the planet into an instant Crypto Expert (TM).

STFU and use one. Hell, glibc's crypt() lets you pick any of three computationally expensive schemes, so use one of those.


> Salted hashes are a straight-up vulnerability.

I find this a bit of a misnomer. I understand what you mean in context, of course, but, strictly speaking, bcrypt is a "hash", and "salted" is always good.


What was your goal with this comment?


Clarification. Right now we have

> Salted hashes are a straight-up vulnerability. -- tptacek


"Salted or unsalted versions of common hash functions (MD5, SHA-1, SHA-2, SHA-3) are not to be used to store passwords."


I'm just wondering: what if I used SHA-1 a million times on the password, i.e. hashing the hash over and over? Wouldn't that make it much more time-consuming for an attacker? Or am I missing something? The input every time but the first would be a random-looking 160-bit number, so it would be hard to guess. And if the attacker wants to look for common passwords in a dictionary, the attacker must hash them a million times, no?


Absolutely. That's essentially PBKDF2 (http://en.wikipedia.org/wiki/PBKDF2).

You usually add a salt (an additional string which is stored in the clear, but which makes your local instance globally unique), so the attacker can't precompute value-to-hash mappings ("rainbow tables" [which are faster to make if you have alien technology, from what I've heard]) for all sites.

I'd still suggest using bcrypt or scrypt.


Bcrypt typically generates and stores the salt with the rest of the hash, all on its own, which reduces the chance for developer error. It's idiot-proof basically.


Of course, the other key thing is to avoid giving attackers offline access to the hash database if possible. Even with scrypt, an attacker who can work offline will get good results with just 100 password attempts per account. Users often choose such weak passwords that being an online-only oracle, able to shut down after a number of tries on an account, or at least to do app-level rate limiting, is still useful.


So if I'm using SHA-1 already to store passwords, what are my options for moving to a different system? I assume there's no way to rehash the passwords?


You have two choices that I see:

1. The next time a user logs in to your system and you verify against the SHA-1 hash that they are who they say they are, recompute the correct hash for bcrypt. Then, delete the SHA-1 hash. It does you no good to have a bcrypt version if you keep the SHA-1 version around.

2. Generate the bcrypt hash from the SHA-1 hash. That is, pretend that the SHA-1 hash is the user's password. This isn't as clean (your password authentication software will then have to do SHA-1 followed by bcrypt) but it means you'll be able to migrate your entire database all at once if you so choose. This also causes a very (very, very) slightly higher chance of password collisions, although there's not much to worry about from that.
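A rough sketch of option 2, assuming a py-bcrypt-style binding and assuming the legacy scheme was salt-prepended SHA-1 hex (adjust the inner step to match whatever you actually did):

    import hashlib
    import bcrypt  # assumes a py-bcrypt-style binding

    def migrate_row(old_sha1_hex):
        # one-off offline pass: bcrypt the existing hex digest as if it were the password
        return bcrypt.hashpw(old_sha1_hex.encode(), bcrypt.gensalt(12))

    def verify(password, old_salt, stored):
        # login path: redo the legacy salted-SHA-1 step, then check it against bcrypt
        inner = hashlib.sha1(old_salt + password.encode()).hexdigest()
        return bcrypt.hashpw(inner.encode(), stored) == stored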


Or you can do both, migrate everything to bcrypted SHA for now and then replace these with straight up bcrypt the next time the user logs in.

You ostensibly have a hash method identifier per hash, so just create "sha+bc" and "bc" along with your current "sha".

Also, why is the risk of a collision greater? It seems to me that, if anything, it should be lower, as SHA hashes always consist of a fixed number of bits, and thus aren't as likely to collide when hashed to bcrypt, assuming the length of a bcrypt hash is the same (or larger) number than a SHA one.

Basically, it seems to me that, if they're going to collide, they're more likely to collide at the SHA level, which is a problem either way.


Imagine you have two hash functions F and G, both mapping from the domain of integers to integers mod 2^128. Imagine they are perfect in that if you hash all the integers up to some large N, each hash is expected to recorded exactly the same number of times (probabilistically).

Now clearly if I hash a password F(P) and another password F(Q) there is a 1 in 2^128 chance they collide.

Now imagine we do G(F(P)) and G(F(Q)). We first have the chance of 1 in 2^128 that F(P) == F(Q) which implies G(F(P)) == G(F(Q)). However, that is not all!

We now have a new 1 in 2^128 chance that G(F(P)) == G(F(Q)). So we have (about) a 1 in 2^127 chance that the passwords will collide.

But, none of this really matters. Collisions aren't what you worry about with password hashes.


I see what you mean, and I agree that it doesn't matter, but it's an interesting exercise anyway.

I disagree that there's a 1 in 2^128 chance that they will collide. Trivially, I can show you a hash that will never collide for up to some N, and that is F(P) = P mod 2^128. This will never collide unless P is more than 128 bits long.

My rationale, above, was that SHA constrains the space to 128 bits. Therefore, for different SHA hashes (ones that haven't already collided), the probability that bcrypt will collide might be smaller (or zero, as in my example above).

In reality it doesn't work like that, I know, but in theory you can't be sure that the probabilities of collision will add (or, well, multiply) up.


Point taken, it's probably true that the probability that G(F(P)) == G(F(Q)) given F(P) != F(Q) is less than 1 in 2^128. But it's probably also true that it's greater than 0.

Clearly it's impossible for it to be less than zero. So no matter what you do, defining H(X) to be G(F(X)) will have strictly more collisions than F(X).

The reason I would argue it's greater than zero is that if a function H existed such that H(X) will never collide for X less than 2^128, it would probably have some cryptographic weakness.


Hmm, you're right indeed, since they're additive. I should have said that the second layer is less likely to collide than the first, not than both combined.


I've had the "joy" of doing password migrations in the past when moving authentication systems. We made our auth chain support both methods and then when someone logged in with an old method it stored the password in the new method and deleted the old record. This was on a system with only a few hundred users though, so YMMV.


Convert as sessions expire and people sign in again. You'll obviously need a db column to keep track of which passwords are converted/unconverted, e.g. password_algo.


You can bcrypt your current sha1 hashes.


I currently use a SHA hash (with salt) but rehash it x times. I have increased x over the years to keep an acceptable trade-off in computation time. Why is bcrypt much better than this? Is it because the algorithm is less GPU-friendly?
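Roughly what's being described, as a sketch (the digest choice and exact construction are assumptions; the point is only the repeated rehashing):

    import hashlib

    def stretched(password, salt, x):
        digest = salt + password.encode()
        for _ in range(x):              # raise x over the years as hardware gets faster
            digest = hashlib.sha256(digest).digest()
        return digest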


What you describe is basically PBKDF1. If you wanted to make it slightly better, you could go with PBKDF2. It's true that bcrypt is better in some ways, but you're fine with what you're doing now. If you really wanted to improve on things you could go with scrypt which eats memory also, but it's more difficult to get things to work right.


That was my first thought too. Not only does the FBI have the salted hashes, but they also have a copy of the code for the website. So they know what the salt values are. This makes it even easier to brute force the hashes.


You've been downmodded because password hash salts are public nonce values; usually, schemes that depend on "secret salts" are crackpot alternatives to secure password hashes.

(I didn't downmod you).


thomas, another programmer in the office just asked this question which I haven't a good answer for: Instead of using a known salt stored in, say, a config file, or prepended to the stored hash, why not derive the salt from some substr of the supplied password? His example was, salt would be concat(left3chars, right3chars). And so when the user inputs his password into the system, you just derive the salt using that same consistent algorithm, and supply both that derived salt and the password into the algorithm.

My only answer was "It's always a bad thing to be clever with crypto, just do it by the book" but he asked for more and I couldn't give him a sound debunking (or an authoritative endorsement).

I build systems -- and do it very well -- and all I know about crypto is what I've had to learn to implement other people's crypto systems.


The purpose of a salt is to randomize the password hashes so you can't easily precompute them. A "salt" derived from the password itself isn't random; it's deterministic. Salts don't add much security, but they do defeat precomputation. The scheme your coworker proposed doesn't do that.


I'm not going to lie and say I was already thinking that, but I did have a notion that, in such a scheme, if somebody's password was "1111111" then your salt + password would be the unimpressive 1111111111111.

But if you don't mind a follow-up, wouldn't it still defeat rainbow tables? Why not?


In your scheme, if your password is "apple scrapple", the hash value is always going to be (say) "f1d2d2f924e986ac86fdf7b36c94bcdf32beec15". An attacker can precompute that and just use text search to find everyone with the password "apple scrapple".


They would still need a rainbow table suited to the algorithm used to create the salt, or one large enough to contain the password+salt value within it. It still means the entire database can be attacked with the same rainbow table, however.


Generating the salt from the password completely defeats its purpose. Users with the same password will have the same salt, and therefore the same hash. In your example, an attacker could find which users chose the password "password" by running your hashing algorithm with a salt value of "pasord". Your database would be wide open to rainbow table attacks.
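A tiny demonstration of why the derived salt buys nothing against precomputation (the concatenation order is an arbitrary assumption):

    import hashlib

    def derived_salt(pw):               # the proposed scheme: first 3 + last 3 characters
        return pw[:3] + pw[-3:]

    def hash_pw(pw):
        return hashlib.sha1((derived_salt(pw) + pw).encode()).hexdigest()

    # An attacker can build this table once and reuse it against every database
    # that uses the scheme, exactly as if there were no salt at all.
    table = {hash_pw(p): p for p in ["password", "123456", "letmein"]}
    print(table)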


Right, so the idea is that if they get your DB dump and see this salt scheme in your code, they can compute a rainbow table using it, and now they have cracked all your passwords in the time it would take to brute force one (well, not really, because it's not as if they'd have to brute force the entire keyspace before they got to the one password they're trying to break, but I think I'm onto the right idea about shrinking down the magnitude of the problem).

But what it WOULD do -- which is what, to be honest, tricked me about the concept -- is still offer protection from a precomputed rainbow table that knew nothing of your scheme for deriving the salt from the password (e.g., the rainbow tables that are publicly searchable right now).


Effectively you are still using a hash without a salt, it's just that you've created a new, non-standard, hash function.


I like that explanation.


I might have read your question wrong, but the whole point of this thread is that passwords stored by hashing with a general purpose hashing algorithm can be easily brute-forced nowadays. Salting just turns your password into a different string and has no substantial effect on the brute force attack. (The attacker is already trying all possible strings.)


A variation on this is to use a random salt with each password that you store. Your auth process then becomes: 1. check the username; 2. if the user exists, prepend the salt stored in the user's row to the supplied password and check against the stored hash.

At the very least, you will make it harder for someone to crack all of your passwords by computing one table with a single salt.
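Mechanically, something like the sketch below, though per the rest of the thread you'd want a slow hash (bcrypt/scrypt/PBKDF2) rather than plain SHA-1 as the primitive:

    import hashlib, os

    def create_user(db, username, password):
        salt = os.urandom(16)            # random salt, stored in the user's row
        db[username] = (salt, hashlib.sha1(salt + password.encode()).digest())

    def check_login(db, username, password):
        if username not in db:
            return False
        salt, stored = db[username]
        return hashlib.sha1(salt + password.encode()).digest() == stored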


If you are using "salts", you have to make them random for each stored password. But if you are doing this securely, you don't care about this detail, because the bcrypt library took care of it for you.


I was thinking more along the lines that knowing how the nonce(s) were added to the password would make it easier to scan for the password in a brute-force manner. For example, if you know the hashed input is of the form 'nonce:username:password', it would be easier to know when you found the correct password, regardless of what the nonce is.


Can you describe why it's better?


http://codahale.com/how-to-safely-store-a-password/

(Be prepared for your comment score to visit the grey depths if you attempt to relitigate Coda's blog post here and don't know exactly what you're talking about.)


Why is bcrypt better than simply recursively hashing SHA512 ~2^11 times to produce an equivalent work factor?

Assume wall time is held constant at 1 second per password using both methods: is there an entropy loss or weakness associated with recursive hashing that bcrypt avoids?


What you are referring to is called "stretching". It's a lot better than a simple salted hash, but bcrypt would still be better.

I'm no crypto expert, but I think this is due to the way bcrypt was designed, and their use of a pessimized Blowfish cypher. SHA512 was designed for speed, which is the opposite of what you want with a password hashing scheme.

tptacek talks a little about this in this blog post:

http://chargen.matasano.com/chargen/2007/9/7/enough-with-the...

"Bcrypt uses Blowfish instead of MD5. Blowfish is a block cipher with a notoriously expensive setup time. To optimize Blowfish to run much faster, you’d have to contribute a major advance to cryptography. We security practioners are all “betting people”, and we usually like to place our bets on the side that “demands major advances in cryptography”."

Other interesting links: http://stackoverflow.com/questions/3722780/do-any-security-e... http://en.wikipedia.org/wiki/Bcrypt


Something doesn't make sense in that blog entry you cite. He says:

   Now let’s re-explain rainbow tables:

   1. take a “dictionary” —- say, of all combinations of alphanumerics
   less than 15 characters

   2. hash all of them

   3. burn the results onto a DVD.

   You now have several hundred billion hash values that you
   can reverse back to text —- a “rainbow table”.
Alphanumeric usually means either 36 or 62 possible characters. Let's take 36. Then there are 36^14 possible 14 character alphanumeric passwords. (He said less than 15, so we should also consider 13 characters, 12 characters, and so on, so this is going to come out a little low since I'm just doing 14 exactly). That's 6.14 x 10^21 possible passwords.

If you could compute 10 billion hashes/second, that would take 20000 years. (41 million years if mixed case alphanumeric is allowed). Could anyone REALLY make a table covering all 14 character or less alphanumerics in 2007, and fit it on DVD?
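(The arithmetic behind that, for anyone following along:)

    space_36 = 36 ** 14                    # exactly-14-character lowercase alphanumerics
    space_62 = 62 ** 14                    # mixed case
    rate = 10 ** 10                        # 10 billion hashes/second
    print(space_36 / rate / 86400 / 365)   # ~19,500 years
    print(space_62 / rate / 86400 / 365)   # ~39 million years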

I believe there were tables for 14 character Windows passwords then, but due to poor design Windows passwords were in effect treated as two 7 character passwords. You just needed tables that covered the hashes of all 7 character passwords, which is a lot more tractable. Could that be what the author was thinking of?


I think Thomas was simplifying rainbow tables in his post, to make it more understandable. In practice, you wouldn't use all combination of alphanumerics. You would use a dictionary. This greatly restricts the search space.

http://en.wikipedia.org/wiki/Rainbow_table http://en.wikipedia.org/wiki/Dictionary_attack



That might be the most awesome page I've ever read.

Bcrypt it is :-)


He already did:

"Any normal person can brute force millions of SHA-1 hashes (salted however much you want) per second on a GPU."

This is not true of bcrypt.


It takes much longer to compute the hash of a given password, which essentially makes it as if everyone chose passwords with a couple extra bytes of entropy in them.


Was far happier when he didn't store passwords at all, tbh.


Are you joking?


He probably isn't. I wrote a login system for an ecommerce and b2b site a while ago, and got heavily into the salting/hashing side of things back then. Based on that... I think that most of the people pooh-poohing salts in this thread don't know what they're talking about.

This security layer is the only code I've ever written that years later would still cause me to wake up in the middle of the night thinking "oh no! What if an attacker did X, Y and Z??!!"

Note: as far as I know, the security I put on it has never been broken. But it still caused nightmares even so.


No, not at all. Until a few months ago, Instapaper didn't require users to set a password -- you could (and originally were encouraged to) use it without a password at all.

This makes a lot of sense. If more sites storing non-critical data did this, we'd have far less password fatigue, and people would be more wary about what they trust to such sites. Right now they see their "password1" as impenetrable security when they might as well have no password at all.

Not having a password has a similar psychological effect as showing the password field in plain text, I think.


So how did you authenticate?


Just store in plaintext, because I am already assuming you are. All this talk about SHA-1 vs bcrypt vs scrypt is nice and all, but I have little faith that most companies care about this as much as HN does. I believe that most people are using the default password storage mechanisms for their framework, which are already known to be easy to break if the database is compromised. But all of that is moot anyway. Unless you have access to the site's source, how would you know if they are hashing at all, much less which one they are using? The best practice is to use a random password for each site you use. I just don't see any point in having a memorable password for websites, and hashing just leaves a false sense of security, as illustrated by MD5.


... what?

> Just store in plaintext because I am already assuming you are.

No, actually, I don't think I will store plaintext passwords.

> All this talk about SHA-1 vs bcrypt vs scrypt is nice and all, but I have little faith that most companies care about this as much as HN does.

So what? Just because other people don't do it doesn't mean you don't have to also. Fortunately for us, there are a lot of startup founders here who might read this and learn something.

> I believe that most people are using the default password storage mechanisms for their framework, which are already known to be easy to break if the database is compromised.

I disagree. I think most people use SHA-1 because they know better than to store plaintext passwords. What they don't know is that it's terribly broken.

> But all of that is moot anyway.

No, it's really not.

> Unless you have access to the site's source, how would you know if they are hashing at all, much less which one they are using?

There are two problems here. (1) If you have access to the site's password database, there's a really good chance you have access to the entire database, and can look up how they're doing it. (2) Even if you can't look up how they're doing it, you just try them all and find which one it is. I'd bet you money that if someone's hashing passwords, they're using one of {MD4, MD5, SHA0, SHA1, SHA2, DES}. If, god forbid, they're not using one of those and actually wrote their own hashing algorithm, you have even more to worry about.

> The best practice is to use a random password for each site you use.

For sure, no doubt about it. But what we're talking about here is the best practice for application developers, not the users. The users can't do anything about how their password is stored.

> I just don't see any point in having a memorable password for websites, and hashing just leaves a false sense of security, as illustrated by MD5.

Or, you know, you could use bcrypt and be secure about it.


> So what? Just because other people don't do it doesn't mean you don't have to also. Fortunately for us, there are a lot of startup founders here who might read this and learn something.

Yes I hope there are.

> I disagree. I think most people use SHA-1 because they know better than to store plaintext passwords. What they don't know is that it's terribly broken.

SHA-1 is the default on Django, and it's easy to break; that's my entire point. It leaves a false sense of security.

> There are two problems here. (1) If you have access to the site's password database, there's a really good chance you have access to the entire database, and can look up how they're doing it. (2) Even if you can't lookup how they're doing it, you just try them all and find which one it is. I'd bet you money that if someone's hashing passwords, they're using one of {MD4, MD5, SHA0, SHA1, SHA2, DES}. If, god forbid, they're not using one of those and actually wrote their own hashing algorithm, you have even more to worry about.

They might as well be using ROT-13 if they are using any of those. With today's GPUs and rainbow tables, the passwords might as well be in plaintext. The real solution is site security, not password security.

> Or, you know, you could use bcrypt and be secure about it.

For how long? 4-5 years? Who will be maintaining your site then?


> The real solution is site security, not password security.

That does not imply you don't worry about it though -- it's defense in depth. In the same way sometimes you'll need to go through two sets of doors locked with different keys to access a secured server room (or anything else, for that matter), it's worthwhile to protect everything you can as best you can.

> They might as well be using ROT-13 if they are using any of those. With today's GPUs and rainbow tables, the passwords might as well be in plaintext. The real solution is site security, not password security.

Exactly my point. That's why you use bcrypt.

> For how long? 4-5 years? Who will be maintaining your site then?

First, the premise of that question is that bcrypt is going to be secure for only 4-5 years, which is entirely wrong. You can modify the work factor on bcrypt as time goes by. I could, for example, make it take twice as long to generate a hash every year. I could have the program do this automatically. As for your actual question, which isn't terribly relevant either: it will be (1) me or (2) the next guy, who I hope will have knowledge about security as well, but if he doesn't, then I just have to hope he keeps the work-factor increases in the code.


That is fair enough, and all your points were very good. However, despite the downvotes and everything else, I stand by what I said. As a user, I doubt most devs are as competent as the ones here, and I would never trust a site with a password that I depend on. For that, there would need to be third-party auditing to make sure that they adhere to the standards you described.

I guess what I am really trying to say is that the state of web/Internet security is very poor right now and I don't believe bcrypt is a worthy pursuit (sorry, I am really not trying to troll). Since Mt. Gox was just hacked, my salted password is on Pastebin, and someone then attempted to break into my Gmail account; but that will never work, because I use two-factor authentication.

Maybe that is the best solution to all of this.


While it is certainly a good idea, as a user, to assume that the site developers have done things wrong (and therefore choose a strong, random, unique password), it is also a good idea, as a site developer, to assume that your users are doing things wrong (and therefore choose a strong password hashing method).


Security through obscurity is never a good idea because it leaves a false sense of security.

I know I am getting totally destroyed here by the downvoting, and I'll probably end up in negative karma for this, but I stand by all of it.


I am not advocating security through obscurity.

I am saying that your advice is appropriate for users (who cannot control what the server does) but inappropriate for servers (who cannot control what the user does).



