
The scrypt parameters - FiloSottile
https://blog.filippo.io/the-scrypt-parameters
======
jedisct1
Password hashing functions are very useful, but can cause operational issues.

Common advice is to allocate as much memory and CPU as possible, to make
brute-force attacks as expensive as possible.

Unfortunately, that only makes sense if each user is assigned a dedicated
server, which is almost never the case in APIs and web applications.

Doing all the stretching work on a server scales poorly. And predicting how
many users will try to log in is impossible, unless you set up drastic limits.
One way or the other, this makes the whole service vulnerable to very cheap
denial of service attacks.

There are a lot of discussions about password hashing functions and their
parameters, but almost none about server relief, which is far more important
IMHO.

The idea is to delegate most of the work to the client. The server just
provides the salt. It's acceptable to use as much resources as possible for a
short time on the client, since it's unlikely to do anything else important at
the same time.

The stretched password is sent to the server instead of the actual password,
so that it's acceptable for the server to keep the stretching work down to a
minimum before storing the password. This also has the advantage that the
server doesn't see the actual password.

This solves the scalability issues, and drastically reduces the ability to
perform cheap resource-exhaustion attacks on the servers.
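A minimal sketch of this split in Python (the parameter choices and function
names here are illustrative assumptions, not a recommendation):

```python
import hashlib


def client_stretch(password: bytes, salt: bytes) -> bytes:
    # The expensive work happens on the client: scrypt with
    # interactive-login-strength parameters (~16 MiB of memory).
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)


def server_store(stretched: bytes) -> bytes:
    # The server only applies one cheap hash before storing, so a stolen
    # database entry is not directly usable as the login credential.
    return hashlib.sha256(stretched).digest()
```

The client sends `client_stretch(password, salt)` over the wire; the server
stores `server_store(...)` of whatever it receives.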

In a webapp, this can be efficiently achieved using WebAssembly, or with
WebCrypto's PBKDF2 implementation. Obviously not an issue in native apps
either.

Legacy protocols such as FTP and IMAP don't have that luxury, though. There's
no way to run a password stretching function on the client. So, while it's
nice to see software like Dovecot adopt functions like Argon2 to store
passwords, this can actually do more harm than good.

Finally, while password hashing functions are typically described as suitable
for storing passwords, they have other usages as well.

Password-authenticated key exchange mechanisms such as SPAKE2 don't get the
attention they deserve. They can be extremely useful to add a strong security
layer on top of another authentication mechanism, using a simple password. And
they also usually leverage password stretching functions.

~~~
ricardobeat
Doesn't sending the salt to the client void a whole layer of security?

~~~
eropple
I don't see how that'd be the case. An attacker who has visibility into the
client that has the salt can already see the plaintext password. And an
attacker on the service side of things can see the salt anyway, it's in the
database.

~~~
jedisct1
An attacker that would have the ability to impersonate the server could
precompute a rainbow table for a given salt, and send that salt to the client.

OTOH, without client-side prehashing, an attacker that has the ability to
impersonate the server doesn't have to precompute anything in order to collect
all passwords.

~~~
Franciscouzo
If an attacker is able to impersonate the server, then it is also able to ask
the client for the password by modifying the Javascript sent.

~~~
jedisct1
Not all clients are web browsers, but sure.

------
Ruud-v-A
“Cache line sizes have not significantly increased since 2009, so 8 should
still be optimal for 𝑟.”

Cache line sizes don’t increase in insignificant steps: they change in powers
of 2. And cache line sizes for Intel x86_64 CPUs have not changed at all since
2009; they are still 64 bytes.

------
tzs
Assume that all clients for a site use password managers with good random
password generators, which they use to generate their site password. They are
all, say, 40 characters chosen from [a-z][A-Z][0-9], so about 238 bits.

Would it be terrible in that case to just say "screw it", and just go with the
old classic salt + single hash (SHA256, probably) instead of scrypt or bcrypt
or PBKDF2 or similar?
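The 238-bit figure checks out:

```python
import math

# Entropy of a 40-character password drawn uniformly from
# [a-z][A-Z][0-9], i.e. 62 symbols per character.
bits = 40 * math.log2(62)  # ~238 bits
```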

~~~
sbierwagen
Well, sure, if all users were guaranteed to use a strong password that wasn't
reused for any other site, we wouldn't need a lot of password safety rules.
Heck, you could just store them in plaintext.

In practice though, users don't.

~~~
joveian
A single hash is still a good idea since otherwise anyone who steals the
password database can log in as any user. Sure if someone can steal the
password database they may be able to steal passwords as users log in, but
that is at least harder.

To make sure users have secure passwords, the server can just generate them
and tell the client what their password is. It is, as far as I can tell,
entirely a historical accident that this is not how it is generally done.

For web sites, I'm fairly sure this could be done via browser password storage
without the user needing to ever see the password if they only use one system
to log in (or browser sync between all systems they use). IMO, 21-character
random base-64 passwords are good enough security and still possible to
memorize (three groups of seven) over a couple of weeks if necessary.
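A sketch of such server-side generation with Python's `secrets` module (the
alphabet choice is an assumption; 21 characters from 64 symbols gives
21 * 6 = 126 bits):

```python
import secrets
import string

# A 64-symbol, URL-safe alphabet.
alphabet = string.ascii_letters + string.digits + "-_"

# 21 characters, each chosen by a CSPRNG: ~126 bits of entropy.
password = "".join(secrets.choice(alphabet) for _ in range(21))
```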

------
user5994461
To illustrate the level of idiocy we've reached with slow hash functions: the
official NIST recommendation (SP 800-132) is to use 10k iterations for PBKDF2,
or 10M iterations for important keys.

10k iterations was half a second when we tried it on our server.

10M iterations took 8 minutes for a single hash.
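If you want to reproduce the measurement, a sketch (password and salt values
are placeholders):

```python
import hashlib
import time


def time_pbkdf2(iterations: int) -> float:
    # Time a single PBKDF2-HMAC-SHA256 derivation at the given count.
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"password", b"0123456789abcdef", iterations)
    return time.perf_counter() - start
```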

~~~
chowells
Were you using it to hash data for integrity? Or even worse, were you abusing
it as a MAC [1]? That's the only way I can see it being so slow.

PBKDF2 is a password-based key derivation function (it says so, right in the
name), not a data integrity verification function. You should only ever be
feeding a password and high-entropy salt into it.
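A minimal example of the intended use (iteration count and key length here are
assumptions for illustration):

```python
import hashlib
import os

# Correct inputs: a password plus a fresh high-entropy salt. The salt is
# stored next to the derived key; it is not a secret.
salt = os.urandom(16)
key = hashlib.pbkdf2_hmac(
    "sha256", b"correct horse battery staple", salt, 100_000, dklen=32
)
```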

It's something of a disservice to call all these things "hash functions". It
makes people think they're interchangeable, but that's very far from true.

[1] The reason I say using it as a MAC is even worse than using it for data
integrity is that it was never intended to work as a MAC. A KDF doesn't care (in
theory) if the salt input can be derived by looking at its output. For the
intended use case of a KDF, the salt is stored with the output anyway. So a
cryptanalysis of a KDF wouldn't see the revelation of information about the
key by the output as a theoretical flaw. But that would be a _huge_ flaw in a
MAC, where the output is often conveyed by an untrusted party between two
parties who keep the key secret. From a practical standpoint, a KDF that
leaked details of its key in its output would be inefficient (redundant use of
storage space), but that isn't sufficient by itself to break its security
properties.

~~~
user5994461
It's hashing a password. It's used exactly the way it's supposed to be.

------
zaroth
Iterative password hashing functions need to be slow to be secure. Their cost
factor is borne through consumption of real-time resources such as CPU cycles
and memory bandwidth, and therefore cost is roughly synonymous with _latency_.

BlindHash is a completely different approach. I used data as a cost factor: a
massive array of random data, which can grow over time, but once a bit becomes
part of the pool, it never changes.

Here’s how it works:

First, a fast hash (SHA512) and a secure salt (64 bytes from a CSPRNG) turn
your password into a random number; call it Hash1. That 64-byte value is sent
to a BlindHash server.

On the BlindHash server, Hash1 is used to generate 64 uniformly distributed
i.i.d. indices into the data pool (imagine ~100TB of data), and 64 bytes are
read from each location. Accumulate those 64 reads into a 4KB buffer, hash
that, and return the result.

Back on the authentication machine, use the result from BlindHash to HMAC your
Hash1 to produce a Hash2. Store/Verify the Hash2 value. Hash1 is never stored.

What this accomplishes is basically entangling the password hash with an
arbitrarily large pool of data, where an attacker needs to steal >90% of the
data pool in order to even start attempting an offline attack.
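A rough sketch of the flow in Python. The index-derivation step is my own
guess, not the patented scheme, and a tiny random buffer stands in for the
real ~100TB pool:

```python
import hashlib
import hmac
import os

# Stand-in for the ~100TB data pool; 1 MiB just to make the sketch runnable.
POOL = os.urandom(1 << 20)


def client_hash1(password: bytes, salt: bytes) -> bytes:
    # Fast hash of salt + password: the 64-byte Hash1 sent to the server.
    return hashlib.sha512(salt + password).digest()


def blindhash_lookup(hash1: bytes) -> bytes:
    # Derive 64 offsets into the pool from Hash1 (derivation is assumed),
    # read 64 bytes at each, and hash the accumulated 4KB buffer.
    buf = bytearray()
    for i in range(64):
        digest = hashlib.sha512(hash1 + i.to_bytes(2, "big")).digest()
        offset = int.from_bytes(digest[:8], "big") % (len(POOL) - 64)
        buf += POOL[offset:offset + 64]
    return hashlib.sha512(bytes(buf)).digest()


def hash2(hash1: bytes, pool_result: bytes) -> bytes:
    # Entangle the password hash with the pool; only Hash2 is ever stored.
    return hmac.new(pool_result, hash1, hashlib.sha512).digest()
```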

The bigger you make the data pool, the more data an attacker would have to
steal, but the _faster_ the system runs — because more data spread across more
SSDs increases read bandwidth since the read pattern is perfectly uniform.

The network carries 64 bytes in and 64 bytes out with each BlindHash request.
Size the network pipe to handle your desired authentication load, e.g. 64 *
100,000 = 6.4MB/sec.

Divide the data pool by the available bandwidth, e.g. 100TB / 6.4MB = ~180
days. That’s how long it takes to steal the data pool over the network at full
line rate. And the attacker needs virtually all of it to even try to start an
offline attack.

For large scale secure password storage, BlindHash provides the means to
eliminate the possibility of an offline attack. It’s faster and cheaper than
burning expensive CPU cycles, and most importantly security (growing the data
pool) and performance (logins/sec) are positively correlated, not negatively.
It Scales.

Disclaimer: I’m the inventor, patent holder, and CTO of BlindHash, and I’ve
been working on productizing this technology for several years now. As a
longtime HN member/contributor I do feel like it’s OK to occasionally pitch my
company when relevant topics trend on the front page, as long as there’s an
interesting technical component to go along with it!

If you’re responsible for securely storing customer passwords at a large
corporation, scrypt/bcrypt/argon2 is not the right approach for you. You’re
better off spending those cycles mining Monero.

Let me help you eliminate the possibility of a breach and actually save you
money in the process. You can have your own on-prem data pool, or share a
massive geo-replicated pool with other large corporations at an even lower
cost. </pitch>

~~~
deckar01
Hashing as a service seems like a bad idea. I like the core concept though.
The patent description seems pretty vague [0]. Do you think I can whip up my
own project (maybe called FatHash) that does this without your remote server
and without infringing on your patent?

[0]:
[https://www.google.com/patents/US9225729](https://www.google.com/patents/US9225729)

~~~
zaroth
That’s not the right patent - that’s something else entirely! Goes to show how
poorly written most patents are.

Hopefully you’ll find this one more informative:

[https://www.google.com/patents/US9021269](https://www.google.com/patents/US9021269)

Part of the reason I’ve been able to raise money and spend several years of my
life trying to build and evangelize this new/better way of protecting
passwords is because the USPTO has granted us the exclusive right to license
it in the US through ~2033.

~~~
deckar01
That is a much better patent. Sorry for the confusion. The "index into a
random data block" is already a primitive cryptographic operation in key
expansion and hashing algorithms, but the block is deterministically generated
pseudo randomly from a seed. It would be interesting to replace the seeding
operation with an offset into a large static pool generated from a source of
true randomness.

------
londons_explore
An "expensive" hash function like scrypt provides both the user and the
attacker the same slowdown/cost multiplier.

A longer secret key makes the user's cost multiplier increase linearly, while
the attacker's increases exponentially.

Most people dramatically underestimate the scaling implied by the word
"exponential". It's the kind of "if I have 2048 bits then trying every key
takes longer than the age of the universe" slowdown.

Hence:

* Use long secrets (i.e. not human-rememberable passwords)

* Use cheap hash functions.

* Don't use scrypt. It's made for passwords (short keys), which really shouldn't exist in 2018.
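The age-of-the-universe claim holds with room to spare, even at a much smaller
key size and a generously fast attacker (assumed guess rate is illustrative):

```python
# Assume an attacker testing 10^12 keys per second.
guesses_per_second = 1e12
years = 2**128 / guesses_per_second / (365.25 * 24 * 3600)
# Even a 128-bit key needs ~1e19 years; the universe is ~1.4e10 years old.
```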

~~~
DiThi
> An "expensive" hash function like scrypt provides both the user and the
> attacker the same slowdown/cost multiplier.

Not really. The user only has to wait for the computation once, or a few times
at most, and it can easily take less time than typing the password.

> * Use long secrets (ie. not human-rememberable passwords)

That's very often not the case. Some people, like me, use truly random
passwords for every service. But 99.9% don't.

Also, you can perfectly combine _both_ long keys and expensive hashes.

~~~
londons_explore
Passwords shouldn't exist at all. Web services should use OAuth. OAuth
providers should use two factor where one factor is a 2048 bit key
(Yubikey/Client Cert/etc.). Point to point connections (ssh, VNC, network
shares) should use client certificates or centralized auth (Kerberos or active
directory).

Scrypt is a bandage which doesn't resolve the core problem - there is
insufficient entropy in typical users passwords. If there was enough entropy,
the speed of the hash function wouldn't matter.

~~~
akerl_
This is a bit like saying "human-driven cars shouldn't exist: the roads would
be much safer".

True, certainly. You're unlikely to get much debate about hardware tokens /
certificates / etc being more securable; the reason we still have passwords
isn't because they're more securable, it's because their UX for the vast
majority of users is much better.

If you're suggesting there's a reasonable path to get from $the_world_today to
$the_world_you're_proposing, I suspect that's a more interesting point to
make.

~~~
always_good
I looked into Yubikey but it seems absolutely impractical for day-to-day
authentication unless you only use it for a small number of sites.

For example, you'd want at least one backup. Maybe buried somewhere or in a
safe deposit box. But the workflow for keeping it synced would be hilarious.

Ever since then, I can't take anyone seriously on HN who says "we shouldn't
have passwords, just hardware key fobs". Not to mention your authentication
security doesn't matter when there's a customer support line an attacker can
socially engineer.

