
A Better Way to Store Password Hashes? - zaroth
http://www.opine.me/a-better-way-to-store-password-hashes/
======
michaelfairley
Or just use bcrypt/scrypt/PBKDF2 in the typical manner. You're much more
likely to screw yourself by mis-implementing this than you are to actually
gain any real security benefit from it.

------
SoftwareMaven
This won't make an appreciable difference if your entire db is captured by the
attacker (à la LinkedIn). You are just asking them to buy a box (rent a virtual
server) with more RAM because the equals operation just became memory
intensive. A machine with 128GB of RAM will still give a near-constant result
to "does my test hash equal the user's password?"

"Just add more fake hashes", you say. To get around the above, you have to use
billions upon billions of fake hashes. The database abuse will eventually
catch up to you.

This is trying to accomplish what bcrypt and scrypt do using tunable
parameters, which is to force the attacker to spend significantly more to break
passwords than it costs you to test passwords (because they are doing it
billions of times, you are doing it once). Personally, I trust the math more.

~~~
zaroth
This is absolutely only to be used with bcrypt or scrypt, not instead of.

Facebook has 100PB of data, but all the password salts and hashes for their
800m active users fit on a single _USB stick_. One malicious employee and that
data can walk right out the door.

The theft of all your users' passwords should make one heck of a racket on its
way out of your data center. A thief should need the equivalent of a 747 to
fly them out of there.

With this technique, Facebook could fairly easily make their salts and hashes
fill 1PB of drives. You're not sneaking out of the data center with that
tucked in your sock!
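To put rough numbers on the USB-stick claim, here is back-of-the-envelope arithmetic. The per-user sizes are assumptions (a 20-byte salt and a 20-byte SHA-1-sized hash); Facebook's actual storage format is not stated in the thread.

```python
# Rough storage arithmetic for "fits on a USB stick" vs. "fill 1PB of drives".
# Assumed sizes (not from the thread): 20-byte salt, 20-byte hash.
USERS = 800_000_000        # "800m active users"
SALT_BYTES = 20            # assumption
HASH_BYTES = 20            # assumption: SHA-1-sized digest

def table_bytes(users: int, salt_bytes: int, hash_bytes: int) -> int:
    """Bytes for one salt and one hash per user, ignoring index overhead."""
    return users * (salt_bytes + hash_bytes)

def decoys_to_fill(target_bytes: int, hash_bytes: int) -> int:
    """How many fake hashes of the given size it takes to fill target_bytes."""
    return target_bytes // hash_bytes

real_size = table_bytes(USERS, SALT_BYTES, HASH_BYTES)
decoys = decoys_to_fill(10**15, HASH_BYTES)   # 1 PB in decimal units

print(f"real salts + hashes: {real_size / 1e9:.0f} GB")
print(f"decoy hashes for 1 PB: {decoys:,}")
```

Under these assumptions the real data is about 32 GB, and reaching 1 PB takes on the order of 50 trillion decoy hashes.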

------
sillysaurus
_The problem is, even with the salted hash, it’s still too easy for an
attacker to run dictionary attacks if they are able to retrieve a single
user’s salt and hash. When your attacker can do 1 billion SHA1 hashes/second
with cheap hardware, even slowing it down 10,000 times and they’re still
cracking a lot of your users’ passwords._

False. If you use Colin Percival's _scrypt_ to hash your passwords, then it's
impossible for an attacker to crack them via brute force.

See previous discussion: <http://news.ycombinator.com/item?id=4076197>

~~~
dfox
It's certainly not impossible. scrypt provides reasonable defense against
using cheap, highly parallel approaches to brute force, but it can still,
obviously, be cracked by brute force.

For passwords, the typical attacker today does not care about cracking all of
them, but about finding some (mostly weak) passwords that he can try
with other services used by the same user.

~~~
michaelfairley
Ultimately, OP's strategy is just increasing the cost of brute forcing. Why
not use the cost increasing features built into the hashing algorithms
themselves?

~~~
zaroth
Current algorithms allow you to increase cost in terms of CPU and RAM only. I
want to increase cost on as many axes as possible, in this case, by requiring
you to steal 1TB of mostly meaningless data, and not just 100 odd bytes.

[Edit] sillysaurus - It's a great image, but a little hard to parse since it
doesn't actually show you the bits of entropy for each choice. If they did
show a column for 40.5 bits (average strength of a user's password) you would
see the result is a lot closer to $150 for 'scrypt 64ms' than it is to the
very sexy looking $4.8m.

Of course the technique could work even better with scrypt behind it
(find/replace 20 bytes with 32 bytes).

~~~
repsilat
It's worse than the average strength would suggest, too. You can't do an
arithmetic average and get a meaningful figure because cracking difficulty
doubles with every bit.

Say passwords were distributed evenly at 35, 40.5 and 46 bits. You may have a
lot of difficulty cracking the 46-bit passwords, but the 35-bit ones will be
easy. Now, you can't tell which users have easy passwords, but it's easy
enough to try the "top 10,000 passwords" (from previous dumps) on all of the
hashes you get and see if any match.

Without the scheme outlined in TFA _many_ users' passwords can be compromised
this way. At 64ms each you can run those top 10,000 passwords on each user in
less than 11 minutes, and you're almost guaranteed to get some hits. With this
new technique, though, you _won't_ , because you'll be lucky to even pick a
"real user" in the first place.
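The arithmetic behind repsilat's point can be checked directly. The three entropy levels and the 64 ms per-guess cost come from the comment; treating each level as a worst-case exhaustive search of 2^bits guesses is a simplifying assumption.

```python
# Why averaging entropy bits is misleading, and how long a top-10,000 list takes.
levels = [35, 40.5, 46]                 # bits, from the example in the comment

def worst_case_guesses(bits: float) -> float:
    """Guesses needed to exhaust a password space with this many bits of entropy."""
    return 2.0 ** bits

mean_bits = sum(levels) / len(levels)   # 40.5, the "average" strength
# How much cheaper the weakest third is than the "average" password suggests.
weak_vs_mean = worst_case_guesses(mean_bits) / worst_case_guesses(min(levels))
# Spread between the strongest and weakest passwords: 2**11.
strong_vs_weak = worst_case_guesses(max(levels)) / worst_case_guesses(min(levels))

# Top-10,000 dictionary against one user at 64 ms per scrypt evaluation.
MS_PER_GUESS = 64
seconds = 10_000 * MS_PER_GUESS / 1000
minutes = seconds / 60

print(f"weak vs mean: {weak_vs_mean:.1f}x, strong vs weak: {strong_vs_weak:.0f}x")
print(f"top-10,000 list per user: {minutes:.1f} minutes")
```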

EDIT: Of course, if your system is going to remain at all practical to use you
can probably filter out the fake passwords with a simple join. You won't have
user-data for all of your fake users, and you don't want the username space
taken up by trillions of realistic-sounding dummy names preventing bona fide
users from registering.

~~~
foxylad
You can't do a join because the hash table contains only hashes - nothing to
relate them back to the user record.

This idea uncouples the password from the username and salt, which seems a
good idea. But assuming you have access to the database, the additional work
required is an indexed lookup instead of a simple equality - not actually a
huge deal.

Having said that, when it comes to security I'll defer every time to someone
with real chops in this area. Wake me up when Bruce Schneier comments on this.

------
ErikD
Does it really matter that the hacker doesn't know which hash belongs to the
user? He will still be able to do a dictionary attack using the same method
you use to login.

Wouldn't this just make dictionary attacks easier? Now the hacker doesn't have
to find one exact password but has the option to match any of his dictionary
passwords to any of the password hashes.

I know that there are hardly any collisions and that in practise this wouldn't
really change a thing. But in theory the dictionary attack would be faster
this way.

~~~
masklinn
1\. The point of hashing passwords is to protect the password itself (the
plaintext), so that users who use the same password over and over again (which
is most of them) don't see all their accounts opened if one of the services
they use has a security breach.

2\. Collisions are not actually very likely (understatemeeeent)

3\. > He will still be able to do a dictionary attack using the same method
you use to login.

Sure, but that's not the point. The point is that validating a hash now
requires a lookup into terabytes of data, meaning it's much harder to use
ASICs or GPUs to brute-force the site, and the validation may even require
hitting disk, which is _extremely_ expensive compared to even an expensive
hash computation.

4\. It also makes _retrieving_ the data that much harder: a users table is not
usually big and noticeable (especially just 3 columns thereof), but a terabyte+ of
data going out might show up on the network stats.

Note that I'm no cryptographer and do not recommend TFA's scheme as I can't
judge one way or the other, but your objections don't hold as far as I can
see.

Side-note (and weakness) for 4: on the other hand the retrieval is trivially
shardable and parallelizable, so at the end of the day you probably don't gain
much: the data from the GPU/ASIC hash-computer is fed into a sharded db server
for matching against the hash data. That has a cost impact, but depending
on the cost of the hashing function itself it may not even increase the
overall operation time.
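masklinn's side-note that the lookup is trivially shardable can be sketched like this: route each hash to a shard by its leading bytes, so each machine only holds and searches its own slice. The shard count and routing rule are illustrative assumptions, not anything from the article.

```python
import os

def shard_for(h: bytes, n_shards: int) -> int:
    """Route a hash to a shard using its first 4 bytes (hash output is ~uniform)."""
    return int.from_bytes(h[:4], "big") % n_shards

def build_shards(hashes, n_shards: int) -> list[set[bytes]]:
    """Split one big hash pool into n_shards independent sets."""
    shards: list[set[bytes]] = [set() for _ in range(n_shards)]
    for h in hashes:
        shards[shard_for(h, n_shards)].add(h)
    return shards

def pool_contains(shards: list[set[bytes]], h: bytes) -> bool:
    """Only one shard ever needs to be consulted for a given hash."""
    return h in shards[shard_for(h, len(shards))]

# Usage: 10,000 random 20-byte "hashes" spread over 8 shards.
pool = [os.urandom(20) for _ in range(10_000)]
shards = build_shards(pool, 8)
```

An attacker's cracking rig can fan its candidate hashes out the same way, so the pool's size mostly adds hardware rather than time, which is the weakness masklinn describes.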

------
yuliyp
Doesn't this add a new vulnerability? If I can see the salt for a user, I can
easily add a new password for that user _without anyone possibly being aware
of it until it is used_ by just adding the hash of the desired password with
the existing salt to the database.
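A minimal sketch of the attack yuliyp describes, using in-memory stand-ins for the article's two tables. PBKDF2 with a low iteration count stands in for scrypt/bcrypt purely so the demo runs fast; the table stand-ins and helper names are hypothetical.

```python
import hashlib
import os

users: dict[str, bytes] = {}    # username -> salt (hypothetical Users table)
hash_pool: set[bytes] = set()   # real and fake hashes, no link to any user

def derive(password: str, salt: bytes) -> bytes:
    # Low-cost PBKDF2 stand-in for scrypt/bcrypt; far too cheap for production.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 1_000)

def login(username: str, password: str) -> bool:
    salt = users.get(username)
    return salt is not None and derive(password, salt) in hash_pool

# Legitimate registration.
users["alice"] = os.urandom(32)
hash_pool.add(derive("alice-secret", users["alice"]))

# An attacker who can read alice's salt and append to the pool plants a second
# password. Nothing is overwritten, so alice's own login keeps working.
hash_pool.add(derive("attacker-chosen", users["alice"]))
```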

~~~
olivier1664
That's true: in a classic system, the original user will notice their password
had been overridden. With 'security through obesity' the original user AND the
hacker will be able to login (each using their own password).

------
lparry
Seems like it'd paint you into a bit of a corner if you ever needed to change
anything about your password hashing.

Say some mythical unbreakable hash algorithm comes along that's O(1) and you
decide you want to move to it. You're forever going to have to drag along
these terabytes of junk hashes, because not all of your users will log in
(giving you the opportunity to change how their password is stored), and since
you can't tell the noise from the signal you'll forever have that enormous
table of hashes from that time you tried to be extra clever.

I'll leave crypto to the experts :)

------
Jarred
This sounds interesting. But, in implementing this, what's the purpose of
storing password hashes in your own database? Wouldn't it be safer (and maybe
even faster) for a centralized service to handle checking the existence of
password hashes and storing them?

Compromising this central database would be meaningless because it would only
store password hashes. If an intruder accessed it, they'd just have billions
of long strings of random characters -- they wouldn't know where it came
from/who uses it.

I think the biggest downside would be that the larger the number of password
hashes, the more likely duplicates are. The answer to that, I think, is more
randomness. But, I don't know how soon/if that would be a problem.

Another potential downside is performance. Would making a request to this
other server make logging in slower? I think the answer to this question
depends on the implementation, but there's a guaranteed latency of contacting
another server before logging in/registering. Beyond that, I think we'd be
getting too far into the implementation details.

Note -- I know very little about cryptography, but I'd love to learn more.

------
gibybo
I am skeptical that this method adds any protection to a brute force attack
relative to a standard implementation of PBKDF2 hashes stored with the users.

With proper salts, we have no advantage cracking a password in a database of
10 users or 10 million users. LinkedIn didn't have any salts.

The effectiveness of brute forcing a password is determined entirely by how
many passwords they can test for a particular user (because they are salted)
in a given period of time. Sure you can add tens of gigabytes of extra data
that they have to search through, but your system has to search through it too
so it's no different than adding extra iterations to PBKDF2.
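For comparison, the cost knob gibybo refers to is PBKDF2's iteration count. A minimal sketch with Python's standard `hashlib.pbkdf2_hmac`; the 600,000 default is only an illustrative figure, and storing the iteration count next to the salt and digest is an assumed format.

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 600_000) -> tuple[bytes, int, bytes]:
    """Return (salt, iterations, digest); all three are stored with the user."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest

def verify(password: str, salt: bytes, iterations: int, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Constant-time comparison against the one stored digest.
    return hmac.compare_digest(candidate, digest)
```

Doubling `iterations` doubles the work for the server and for an attacker testing guesses against a stolen salt and digest alike, which is the trade gibybo says the extra search space also makes.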

~~~
zaroth
It's a question of sizing up what resources the attacker has, versus what they
will need to succeed in an attack.

We want to make password cracking extremely difficult. It's a problem a big
site should want to throw some money at.

Today, we make it hard by requiring lots of CPU. Another commenter said
'scrypt' -- yes, we want to make it hard by requiring lots of RAM too.
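The RAM axis corresponds to scrypt's `n` and `r` parameters: its main working buffer is about 128 × r × n bytes. A sketch using Python's `hashlib.scrypt` (available when Python is built against OpenSSL 1.1+); the parameter values are illustrative, not a recommendation.

```python
import hashlib
import os

def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of scrypt's large vector V: n blocks of 128*r bytes."""
    return 128 * r * n

def scrypt_hash(password: str, salt: bytes, n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    # maxmem is set to twice the main buffer to leave headroom above
    # OpenSSL's default 32 MiB cap for moderately larger n/r values.
    return hashlib.scrypt(password.encode(), salt=salt, n=n, r=r, p=p,
                          maxmem=2 * scrypt_memory_bytes(n, r), dklen=32)

salt = os.urandom(16)
digest = scrypt_hash("correct horse battery staple", salt)
print(scrypt_memory_bytes(2**14, 8) // (1024 * 1024), "MiB per evaluation")
```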

Here's another way to make it hard -- now you'll need to make a clean getaway
with multiple TBs of data off my server too, just to hopefully steal ONE
user's password.

There is a benefit here, I think.

~~~
gibybo
Ok, requiring the attacker to download several terabytes could be an
interesting advantage. Could we also achieve that by inserting billions of
fake users in a traditional [user, salt, hash] setup?

~~~
px1999
You could, but I don't really see any difference between this and just
slapping many tb of unrelated data in the database - the smart attacker will
realise that a bunch of accounts are fake and just join to a table that holds
only data for real users before extracting the info.

That's part of the elegance (well, it's something other than elegant because
it uses a whole mass of data) of the original solution - all of the data could
be significant in checking a single password so none can be discarded.
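px1999's objection to plain fake users can be shown with a hypothetical schema: if only real accounts have rows in some other table (profiles here), one join strips the padding before anything is copied out. Table and column names are invented for illustration, using Python's built-in `sqlite3`.

```python
import os
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE credentials (user_id INTEGER PRIMARY KEY, salt BLOB, hash BLOB);
    CREATE TABLE profiles    (user_id INTEGER PRIMARY KEY, email TEXT);
""")

# Two real users with profiles, plus 1,000 fake [user, salt, hash] rows without any.
for uid, email in [(1, "alice@example.com"), (2, "bob@example.com")]:
    con.execute("INSERT INTO credentials VALUES (?, ?, ?)", (uid, os.urandom(16), os.urandom(20)))
    con.execute("INSERT INTO profiles VALUES (?, ?)", (uid, email))
for uid in range(3, 1003):
    con.execute("INSERT INTO credentials VALUES (?, ?, ?)", (uid, os.urandom(16), os.urandom(20)))

# The attacker exfiltrates only the credentials that join to a real profile.
rows = con.execute("""
    SELECT c.user_id, c.salt, c.hash
    FROM credentials AS c JOIN profiles AS p ON p.user_id = c.user_id
""").fetchall()
```

In the article's design the hash pool has no user column at all, so there is nothing to join on.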

~~~
zaroth
As the OP, I'd say it's as elegant as securing your bike by locking a cinder
block to it! ;-)

------
somejan
Implemented correctly (using e.g. scrypt as the hashing component, and
making sure the hashes are large enough that the chance is negligible of
an attacker finding a match to a _different_ hash than the one originally
generated from the user's password), this scheme would be no less secure than
the traditional way of storing one scrypt hash per user.

The only effective difference would be that the entire database would become a
single unit instead of a collection of separate hashes. Both an attacker and
your webapp need to carry this extra weight of a monolithic blob of
indivisible data. It probably won't really slow down an attacker trying to brute
force it if he has the data, but it may be more difficult to get the data in
the first place.

But if an attacker has access to, say 10% of the hashes, he'll still be able
to brute force 10% of the user accounts with weak passwords.

A different way to get a similar result of requiring a huge amount of data to
be able to start cracking, would be to treat the database like a huge bloom
filter: treat the database as a huge bit array of (say) a petabyte, hash the
user's password with a hundred different hash functions (but with scrypt-like
slowness), and use those 100 hashes as 100 indexes into the array to set the
corresponding 100 bits. To verify a password, create those same 100 hashes and
check if all 100 bits are set. Now, if an attacker has access to a part of the
database, he won't be able to determine with certainty whether any of his guesses
at the user's password are correct.
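A small sketch of somejan's Bloom-filter variant. Two deliberate simplifications, both assumptions: one slow PBKDF2 call (standing in for scrypt) is stretched into k indexes with SHA-256, rather than running k separate slow hash functions, and the bit array is 1 Mbit instead of a petabyte.

```python
import hashlib

class PasswordBloom:
    """One shared bit array; each password sets k bits chosen from its slow hash."""

    def __init__(self, size_bits: int, k: int) -> None:
        self.size = size_bits
        self.k = k
        self.bits = bytearray((size_bits + 7) // 8)

    def _indexes(self, password: str, salt: bytes) -> list[int]:
        # Slow, salted step (PBKDF2 as a stand-in for scrypt).
        seed = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000)
        # Cheap expansion of the seed into k bit positions.
        return [
            int.from_bytes(hashlib.sha256(seed + i.to_bytes(4, "big")).digest()[:8], "big") % self.size
            for i in range(self.k)
        ]

    def add(self, password: str, salt: bytes) -> None:
        for idx in self._indexes(password, salt):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_match(self, password: str, salt: bytes) -> bool:
        # True only if all k bits are set: false positives possible, false negatives not.
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(password, salt))
```

With only part of the array, an attacker can check some of a guess's k bits but not the rest, which is the property somejan is after.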

Yet a third way to accomplish the same goal: pre-generate a petabyte of random
data. To hash a user's password, apply a standard scrypt, then based on the
resulting hash, generate 100 pseudorandom offsets into the petabyte of data.
At each of those 100 offsets, read a few (say, 16) bytes from our petabyte of
random data, and finally store a hash of (scrypt_result + huge_data[offset1] +
huge_data[offset2] + ... + huge_data[offset100]). You'd still have one hash
per user, but to check a hash you also need access to a huge block of random
data. The block of data functions in a way as an additional system-wide salt.
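somejan's third variant, sketched with a 1 MiB random block standing in for the petabyte. The offset derivation (SHA-256 of the scrypt output plus a counter), the 16-byte read size and the final SHA-256 are assumptions chosen to match the description, not a specified construction.

```python
import hashlib
import os

READ_BYTES = 16     # bytes read at each offset
NUM_OFFSETS = 100   # number of offsets per password

def bound_hash(password: str, salt: bytes, block: bytes) -> bytes:
    """scrypt the password, then mix in NUM_OFFSETS slices of the big random block."""
    base = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)
    h = hashlib.sha256(base)                 # hash of scrypt_result + slice1 + ... + slice100
    limit = len(block) - READ_BYTES + 1      # keep every read inside the block
    for i in range(NUM_OFFSETS):
        # Pseudorandom offset derived from the scrypt result and the counter.
        seed = hashlib.sha256(base + i.to_bytes(4, "big")).digest()
        offset = int.from_bytes(seed[:8], "big") % limit
        h.update(block[offset:offset + READ_BYTES])
    return h.digest()

# Generated once per site and kept alongside the user table (a hypothetical setup).
BLOCK = os.urandom(1 << 20)
```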

Anyway, there are more ways to get to a similar result as the OP's proposal.
I'm not sure if it buys any additional security or if it's just more of a
hassle for the webapp implementing this, but at least it's fun to think about.

------
KayEss
I think you need to make sure that each user has a unique salt.

If you happen to assign me the same salt as another user then either my or
their password will unlock either account and the attacker only needs to guess
the weaker of the passwords.
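KayEss's concern is easy to demonstrate: under the pooled-hash design, if two accounts share a salt, each account's password unlocks the other. A minimal sketch with in-memory stand-ins and a low-cost PBKDF2 in place of scrypt; all names are hypothetical.

```python
import hashlib
import os

def derive(password: str, salt: bytes) -> bytes:
    # Low-cost PBKDF2 stand-in for scrypt; demo only.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 1_000)

def make_site(salts: dict[str, bytes], passwords: dict[str, str]):
    """Build the pooled design for the given users and return its login check."""
    pool = {derive(passwords[u], salts[u]) for u in salts}
    def login(user: str, password: str) -> bool:
        return derive(password, salts[user]) in pool
    return login

passwords = {"alice": "alice-pw", "bob": "bob-pw"}
shared = os.urandom(32)
login_shared = make_site({"alice": shared, "bob": shared}, passwords)
login_unique = make_site({"alice": os.urandom(32), "bob": os.urandom(32)}, passwords)
```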

~~~
zaroth
Absolutely correct. A 'salt', by definition, is always random.

For example, you could use a 32-byte salt with scrypt and you would get back a
32-byte hash. Both are equally unlikely to ever collide (see numbers in the
article) even with trillions of entries in the table.

~~~
KayEss
With this scheme random isn't quite enough though, it needs to be unique as
well doesn't it?

~~~
dchest
Chances of generating two random 32-byte strings that collide are so tiny that
you may as well say that they are unique.
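To put a number on this, the expected number of colliding pairs among n random b-bit values is C(n, 2) / 2^b. The trillion-salt count below is an assumption echoing the thread's "trillions of entries".

```python
import math

def expected_colliding_pairs(n: int, bits: int) -> float:
    """Expected number of equal pairs among n uniformly random values of `bits` bits."""
    return math.comb(n, 2) / 2**bits

# A trillion random 32-byte (256-bit) salts.
pairs = expected_colliding_pairs(10**12, 256)
print(f"{pairs:.1e}")
```

The result is on the order of 10^-54 expected collisions.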

~~~
brunnsbe
Remember that assumption is the mother of all fuck-ups, although in this case
the risk as you say is minimal.

~~~
yuliyp
Making an assumption that will fail approximately one time in the entire
history of the universe is safe.

------
jayfid
One thing that stands out is, it seems like you can never purge this table. If
it ever grows so large that it becomes a concern, you'd need to force your
users to validate in another way and set a new pw using a fresh db.

------
Jarshwah
And what about using a pepper in addition to a salt? Then a potential attacker
requires access to application code as well as the database tables.
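One common way to add a pepper is to HMAC the password with a secret key held outside the database before the salted slow hash. This arrangement and the names below are a sketch under that assumption; other orderings exist.

```python
import hashlib
import hmac
import os

def peppered_hash(password: str, salt: bytes, pepper: bytes) -> bytes:
    # Pepper: secret key kept in application config or an HSM, never in the DB.
    pre = hmac.new(pepper, password.encode(), hashlib.sha256).digest()
    # The per-user salt and the slow, memory-hard step still apply afterwards.
    return hashlib.scrypt(pre, salt=salt, n=2**14, r=8, p=1, dklen=32)

PEPPER = os.urandom(32)   # hypothetical: would be loaded from config, not generated per run
salt = os.urandom(16)
stored = peppered_hash("hunter2", salt, PEPPER)
```

A stolen database alone then yields salts and digests that cannot be tested against guesses without also obtaining `PEPPER`.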

I can also imagine some kind of DDoS against the database using the linked
system.

------
jwildeboer
Wouldn't this setup mean that I can hack into your system with almost any
password? As there is no direct connection AND you fill up with dummy hashes -
doesn't that effectively mean you drive up the chances of a hack? The more
fake hashes, the higher the probability that any random password I brute force
may result in a match against the set of hashes. And if that happens - boom, I
am in. Or am I missing something?

~~~
zaroth
Definitely not. The back-end takes a user's password and hashes it with that
user's specific salt. Then the result must be in the table. Another user's
password will not work.
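A minimal sketch of the login flow described here, with an in-memory dict and set standing in for the Users and Hashes tables. The table stand-ins, function names, decoy generation and the scrypt parameters are all illustrative assumptions, not the article's code.

```python
import hashlib
import os

# Hypothetical stand-ins for the article's two tables.
users: dict[str, bytes] = {}     # Users table: username -> salt
hash_pool: set[bytes] = set()    # Hashes table: no column linking back to a user

def derive(password: str, salt: bytes) -> bytes:
    # scrypt as the slow, memory-hard step; parameters are illustrative only.
    return hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)

def register(username: str, password: str) -> None:
    salt = os.urandom(32)                 # random per-user salt
    users[username] = salt
    hash_pool.add(derive(password, salt))

def add_decoys(count: int) -> None:
    # Decoys are random bytes the same length as a real scrypt output.
    for _ in range(count):
        hash_pool.add(os.urandom(32))

def login(username: str, password: str) -> bool:
    salt = users.get(username)
    if salt is None:
        return False
    # Success means the recomputed hash EXISTS somewhere in the pool.
    return derive(password, salt) in hash_pool
```

Another user's password fails because it is hashed under this user's salt and so lands on a value that is not in the pool.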

What you are worried about is a result existing in the table, even with the
wrong password. That can only happen if you have a hash collision. The article
on collisions I cited (<http://preshing.com/20110504/hash-collision-probabilities>)
says that even with 171 trillion hashes already IN the
database, the likelihood of a collision is still only 1 in 10^18. And that's
using SHA1 -- SHA-256 would be even better.

[Edit] Oops, I forgot a zero. It's actually 1,710 trillion hashes in the
database to make the risk of collision 1 in 10^18. OK, that's 1.71
quadrillion.
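The corrected figure can be checked with the small-probability birthday approximation p ≈ k² / (2N), where k is the number of stored hashes and N = 2^160 for SHA-1.

```python
def collision_probability(k: float, bits: int) -> float:
    """Approximate chance of any collision among k random `bits`-bit hashes (valid for small p)."""
    return k * k / (2 * 2.0**bits)

corrected = collision_probability(1.71e15, 160)  # "1.71 quadrillion"
original = collision_probability(171e12, 160)    # the first, mistaken figure
print(f"{corrected:.1e} {original:.1e}")
```

The corrected count gives roughly 1 in 10^18, matching the edit; 171 trillion would give roughly 1 in 10^20.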

~~~
jwildeboer
Ah OK. So effectively the salt becomes the foreign key ;-)

~~~
databyte
Actually not a bad idea. You could also hash the salt and use it everywhere
instead of the user's ID - anonymize the user from their own data. (If you
used the password hash, you would just have to remember to update it
everywhere on password changes - and of course use a hash of the hash so you
didn't give away which hash was used.)
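Using a hash of the salt as the key for a user's other data might look like the following. The domain-separation prefix, the hex encoding and the `bookmarks` dict are assumptions added for illustration.

```python
import hashlib
import os

def pseudonymous_id(salt: bytes) -> str:
    """Stable, opaque key for a user's other rows, derived from their random salt."""
    # The prefix keeps this digest distinct from any other hash computed over the salt.
    return hashlib.sha256(b"bookmark-owner:" + salt).hexdigest()

salt = os.urandom(32)
bookmarks = {pseudonymous_id(salt): ["https://example.com/"]}
```

Anyone who can read the salt can recompute the ID, so this mainly keeps the tables from being linked at a glance rather than hiding the link from someone holding both.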

For instance, let's say I was UserID 123 which had a FK to a table of
bookmarks or history, and normally it would be easy to link that user to
personal data such as cancer, job searches, pr0n, etc. Now instead, you have
a lot of these bookmarks pointing to a hash that was used in the initial user
login and not directly linked in the database.

Typically in highly sensitive databases you hash out a new "ID" entirely and
reference that. Then you provide a different service and database entirely
that correlates two different identities when you need identification. This is
similar to how, for PCI compliance, the actual credit card numbers are stored
elsewhere and the system works with a token instead.

------
bad-juice
NO! This is a bad idea:

"When a user logs in, you retrieve the salt for the given user, re-compute the
hash as you normally would, and simply check if the resulting value EXISTS in
the Hashes table. If it does, you consider the login as successful."

This will cause false positives. Let's say that of the trillions of hashes,
one is the one we want... But what if one brute forces the system? What if one
of the brute attempts matches ANY of those trillions? Bam. Access.

This is a very very bad idea.

Do not roll your own security unless you know what the F*&#@ you're doing.
You are trading a compromised security issue (your attacker already has
enough access to get the hashes/users) for an uncompromised security issue
(your attacker does not have access to the hashes/users, and is just trying
to get in via brute force).

You are making it easier for the attacker who has no access to your data by
making it harder for the attacker who already has it. You are making the most
common threat bigger to make the least common threat smaller.

Let me explain it using the pigeonhole principle.

Say you have 2 pigeons (passwords) and 1 million pigeon-holes (hashes aka
possible passwords). Assuming all the pigeons are in a hole, and you reach
into a random hole, what are the chances of pulling out a pigeon (password/hash
collision)?

Now, let's say you have one hundred thousand pigeons (What OP is suggesting)
and the same million holes... What are your chances of pulling a pigeon out of
a random hole this time?

I wish I could use font-size 1000px right now.

Do not do this. Do not do this. This is a compromised system, right off the
bat. If you are securing people's private data or possessions in this manner,
you are doing them harm.

As more users sign up, the chances of brute forcing actually increase (more
hashes to possibly match!). Let me repeat that in a different way: EACH USER
MAKES THE SYSTEM LESS SECURE.

Besides all that, let's say you have to do an EXISTS check on a million
records. That means you will check each one until you find a match. That is
linear complexity, O(n), unless you index the column, which is expensive. It
could take a while, depending, but it will most certainly take longer than
searching for something with a known key that is indexed efficiently (an index
on a primary key), which should give you something like O(log n), which is
much faster.

I don't mean to be an ass, but please downvote this so that there is less
chance of people implementing this.

OP; you are suggesting a very harmful idea. Remove it if you have any amount
of goodwill towards the community.

I work in web security, and used to work for a credit card merchant processor.
At one point, I had access to and had to secure millions of accounts with
authorization information for their credit cards; this being information one
could use to empty a bank, no questions asked.

I Really Know what I'm talking about. THIS IS A BAD IDEA.

~~~
kmm
Such a long reply repeating the same thing is very obnoxious. Don't do it. If
I could downvote you I would.

Do you really know what you're talking about, huh? Your pigeon-hole example is
pretty wrong because you can't use intuition (with completely wrong numbers!)
to judge, you use math. Let's say there are a trillion pigeons and the hash
we're using is 256 bits. That's 2^256 ~ 10^77. Divide that by a trillion and you
still have 10^65. Vastly more than enough to be secure, all the while solving
two problems. First, that with a specific user's password one can often access
other services. Second, and most importantly, that the password database blows
up and isn't feasible to steal any more, either over the internet or in real life.
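These figures can be checked with exact integer arithmetic; the trillion stored hashes is the comment's own example.

```python
import math

SPACE = 2**256      # possible 256-bit hash values ("holes")
STORED = 10**12     # a trillion stored hashes ("pigeons")

digits_space = math.log10(SPACE)                # exponent of 2^256 in base 10
digits_per_hash = math.log10(SPACE // STORED)   # holes per stored hash, in base 10
# Chance that a single wrong guess lands on *any* stored hash.
p_false_match = STORED / SPACE
print(f"2^256 ~ 10^{digits_space:.1f}; holes per hash ~ 10^{digits_per_hash:.1f}; p ~ {p_false_match:.1e}")
```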

------
rorrr
This would increase the complexity of the hash verification by log2(N) times,
where N is the number of user records. So even if you do have a billion
hashes, you have only slowed the attackers down about 30 times. You might as
well increase the bcrypt cost factor and skip this scheme.
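The figure of roughly 30 is log2 of a billion: a binary search over a sorted hash pool needs about that many comparisons. A sketch with the standard `bisect` module; the pool here is a small random stand-in.

```python
import bisect
import math
import os

def pool_contains(sorted_pool: list[bytes], h: bytes) -> bool:
    """Binary search: about log2(N) comparisons instead of one equality check."""
    i = bisect.bisect_left(sorted_pool, h)
    return i < len(sorted_pool) and sorted_pool[i] == h

comparisons_for_a_billion = math.ceil(math.log2(10**9))
pool = sorted(os.urandom(20) for _ in range(1_000))
```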

~~~
zaroth
This should be used with scrypt or bcrypt set to the highest difficulty your
application can support.

The point is to increase difficulty of targeting an individual user, and
increase difficulty of stealing the data in the first place. This is separate
and distinct protection from what scrypt/bcrypt gives you.

[Edit] In a typical GPU accelerated brute force attack, the target hash and
salt are both known as fixed 20- or 32-byte values, and you iterate through
candidate passwords until you find a match.

In the proposed technique, the target hash is NOT known! Instead, what you do
know is a salt and a list of a few billion hashes, ONE of which can be
obtained by combining the salt with some unknown password. This is a very
different equation, and I believe, much harder to solve, since your target
value can't just be kept in a register or L1 cache for comparison purposes.

~~~
rorrr
Actually, I was wrong about O(log(N)), the lookup time is O(1), you just have
to put all known hashes in a map structure, or even better, a rainbow hash.

So your technique only adds some constant lookup time.

Instead of

    if (hash == target_hash) ...

you will do

    if (map[hash]) ...

Sorry, I think you are wrong.
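rorrr's two snippets correspond to the attacker's inner loop. A runnable sketch, assuming the attacker holds one target salt and the whole pool loaded into a Python `set` (a hash table with average O(1) membership tests); the cheap PBKDF2 stand-in and the candidate list are hypothetical.

```python
import hashlib
import os
from typing import Optional

def derive(password: str, salt: bytes) -> bytes:
    # Low-cost stand-in for the site's scrypt/bcrypt step.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 1_000)

def crack(salt: bytes, pool: set[bytes], candidates: list[str]) -> Optional[str]:
    """Return the first candidate whose salted hash appears anywhere in the pool."""
    for guess in candidates:
        if derive(guess, salt) in pool:   # replaces `hash == target_hash`
            return guess
    return None

salt = os.urandom(32)
pool = {os.urandom(32) for _ in range(10_000)}   # decoys
pool.add(derive("letmein", salt))                # the one real entry
```

The decoys change where the comparison happens (a table in RAM rather than a register), not how many slow hashes each guess requires.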

~~~
xaway12345
eh, this is amortized O(1) and you are almost guaranteed slow lookups for a
large number of requests.

don't confuse true O(1) with amortized ;)

