Hacker News new | past | comments | ask | show | jobs | submit login
The password hash Argon2, winner of PHC (github.com/p-h-c)
270 points by kevindeasis on Jan 8, 2016 | hide | past | favorite | 47 comments

Argon2 is an improvement over bcrypt, scrypt, etc.

For typical needs, such as passwords and keys, you'll want to use the Argon2i variant, which is slower and provides more protections.

For atypical needs, when you're certain there are no side-channel risks, you may want to use the Argon2d variant, which is faster.

Argon2 has introduction slides here: https://www.cryptolux.org/images/c/c5/Rwc-slides.pdf

Toward the end of that slide deck, in the "Slowing Brute Force" section, there the following slide:

> Argon2 ensures that both defenders and attackers hash passwords on the same platform (x86)

Do they simply mean that currently the most cost effective platform to deliver the necessary computation and memory is x86?

I'm having trouble believing that it is either achievable or desirable to design a password hash that's only efficient on one architecture.

  I'm having trouble believing that it is [...] desirable
To make it less useful to build a NSA-style ASIC hash cracking cluster, like https://en.wikipedia.org/wiki/EFF_DES_cracker

  I'm having trouble believing that it is [...] achievable
The point is not to write an algorithm that is only fast on x86, rather than ARM, POWER, whatever; it's to write an algorithm that uses so much memory that it's not orders-of-magnitude faster to use an ASIC that's just a bunch of compute cores connected to fast buses, rather than a general-purpose processor with megabytes of on-chip cache.

The entire point of "memory-hardness" and "anti-parallelisation" and other such properties is to make it such that the CPU is the most desireable platform to for the password stretching function to run.

Why is this desireable?

This is desireable because your server has a CPU. Your attacker has a CPU, a GPU, perhaps an FPGA, and could possibly manufacture a custom ASIC. You do not want the attacker to be able to cost-effectively bruteforce on any of these platforms any more efficiently than a single CPU core can.

This makes password cracking difficult, and it also means we do not need to have absurd CPU runtimes for our hash, a handful of milliseconds is sufficient. (More is of course better, and for a single user system a delay of half a second isn't noticeable.)

What would occur if we did not design password stretching functions that had this property?

We would end up in the situation we have now, where nonstandard hardware computes hashes faster than we can, and thus instead of taking (a few miliseconds * number of possibilities) it takes (smaller amount of time * number of possibilities). This is undesireable.

This is also why you should never use a typical fast cryptographic hash function like SHA256, SHA3 or RIPEMD160 to hash passwords. This is also why PBKDF2 is considered less secure than say scrypt.

> There are two main versions of Argon2, Argon2i and Argon2d. Argon2i is the safest against side-channel attacks, while Argon2d provides the highest resistance against GPU cracking attacks.

Does this mean that one has to choose between protection against side-channel attacks and protection against GPU cracking? While not the "safest", how safe is Argon2d against side-channel attacks, and vice-versa? Any advice on making this choice?

(It's late here so I haven't read the paper yet. Apologies if this is all covered in there.)

Quick answer: use Argon2i, unless you have very special needs.

"Argon2i uses data-independent memory access, which is preferred for password hashing and password-based key derivation. Argon2i is slower as it makes more passes over the memory to protect from tradeoff attacks."

"Argon2d is faster and uses data-depending memory access, which makes it suitable for cryptocurrencies and applications with no threats from side-channel timing attacks."

The IETF draft is here: https://tools.ietf.org/html/draft-josefsson-argon2-00

> Does this mean that one has to choose between protection against side-channel attacks and protection against GPU cracking?

Yes. Argon2i is still resistant against GPU cracking, however with advanced techniques some speedup might be acquired. Argon2d is really GPU unfriendly. The paper explains:

> Should the memory addressing (indexing functions) be input-independent or input-dependent, or hybrid? The first type of schemes, where the memory read location are known in advance, is immediately vulnerable to time-space tradeoff attacks, since an adversary can precompute the missing block by the time it is needed [5]. In turn, the input-dependent schemes are vulnerable to side-channel attacks [14], as the timing information allows for much faster password search.

And the recommendation from the paper:

> Argon2 has two variants: Argon2d and Argon2i. Argon2d is faster and uses data-depending memory access, which makes it suitable for cryptocurrencies and applications with no threats from side-channel timing attacks. Argon2i uses data-independent memory access, which is preferred for password hashing and password-based key derivation.

Basically, use Argon2d for proof-of-work, and Argon2i for everything else.

A memory intensive proof-of-work should be much faster to verify than to attempt a proof.

It is always safe to use Argon2i with smallest possible t=1 unless ASIC-equipped attackers is a viable threat to you.

Argon2i was recently merged into libsodium, and will be part of the next release, as the default password hashing function.

Awesome, good work. I was just going to ask you if you had any roadmap for adding Argon2i.

I just saw the talk by Dmitry today on Argon2 at RealWorldCrypto. I was impressed by the ideas there; in particular, the ideas in Argon2 can be extended to transform any protocol into one that includes memory-hard steps. This re-balances the potential adversary advantage. Very neat concept.

Why was scrypt/bcrypt not part of the competition?

The goal was to get improvement over scrypt. Scrypt is good but it can be improved. For example robustness against side-channel attacks. There are also some seemingly minor issues like the need to keep password in memory until hashing is done, complexity of the scheme, parameter selection and user-friendliness that can be important if the algorithm is widely adopted.

btw. Colin Percival (creator of scrypt) was part of the committee.

Scrypt was, as an improved version called yescrypt. It held up very well, and IIRC the biggest reason argon2 won is that it is much easier to implement.

bcrypt is just obsolete – this was to find a successor to it.

yescrypt, one of the recommended finalists, is an improved/fixed version of scrypt.

"Obsolete" is a very strong word for bcrypt. MD5 is an obsolete hash function and needs to be avoided because it's vulnerable to practical attacks. That's not the case with bcrypt; rather, in the 15+ years since Mazieres and Provos introduced bcrypt, people have refined the idea.

You can throw a dart to pick any of the password hashes, including PBKDF2 (the worst of them), and you'll be just fine.

What's the best way to move a huge database from one scheme to the next? It's hard to keep up...

On your `User` table you have a column such as `hashing_scheme`. If you don't have one of these columns already, create it with the default value of your current scheme (e.g., `bcrypt`)

When a user attempts to log in, you hash the password they submitted with whatever's in that user's `hashing_scheme`. If the password was incorrect, fail early.

If the password was correct, and this user's `hashing_scheme` value is different to the application's `current_hashing_scheme`, you rehash the user's password (that you got via the POST request or otherwise) with the new scheme, update the `password_hash` and `hashing_scheme` value in the database.

This is just one way to do it, which is how Django performs hashing scheme migrations IIRC.

Alternatively, you can update the entire user db at once by hashing their existing password hashes. You'll end up using a double-hash:

Not as elegant, but the benefit is you can immediately move everyone to a new scheme if you need to for some reason.

If you were already using bcrypt (or any "slow hash algorithm"), double-hashing may be very slow.

You only have to do it when you in-place upgrade the entire table, and once when the user initially logs in after that.

Once the user has logged in successfully, you can replace new_hash(old_hash(password)) with simply new_hash(password).

This is less useful for optimistic upgrades like bcrypt->argon2i, but is absolutely critical if you find yourself taking over responsibility for a database that has a password column full of, say, MD5s.

Waiting for each and every user to log in to upgrade those vulnerable hashes is suicidal.

My favorite way would be to store password information as a hash chain, so old users can be immediately upgraded to the latest method while new users just use the latest.

For example, Django currently stores its passwords in a single database column as

So suppose we generalize that to a chain of one or more hash algorithms, with each output fed in as the password to the next, resulting in a final hash:

For example if you had a user who logged in way back in unsalted md5 days and never logged in again, you might end up with a chain like this:

... and then to upgrade to argon you would grab that string, shove the final hash through argon(), and write back:

For new users, or if the old user finally came back and logged in with their correct password, you could then take the opportunity to write back a simplified version:

Iterations could be calibrated differently depending on the chain, if the earlier steps take enough time to matter.

As far as I can tell this is the best possible service you can offer to the users whose long-ago passwords you're storing, short of deleting their passwords entirely. Throwing away the earlier version and replacing it with the argon-wrapped version can't hurt them, and will almost certainly help.

(Usual caveats apply about not doing things that add complexity if you don't need it.)

That's also how PHP handles it for its password_needs_rehash function:

http://php.net/manual/en/function.password-needs-rehash.php (PHP >= 5.5)

> hrowing away the earlier version and replacing it with the argon-wrapped version can't hurt them

Are you sure? And if so, can you convince me as to why you are so sure?

To be specific about my claim: throwing away the earlier version and replacing it with the argon-wrapped version cannot make the user's original password easier to recover, because it adds no new information about the password. To convince yourself of that, imagine that there was some function `argon(hash)` that made the password easier to recover. Why wouldn't an attacker run that function themselves? In the very worst case a bad hash function that takes one second to run could only speed up the attacker's job by one second.

Wrapped hashes could have other side effects for probing non-hacked sites -- for example, attackers could probably figure out which chain a user has based on timing, which would let them narrow down a bit when the user last logged in. Hash chains could also theoretically reduce the total output space for the final hash, making it easier to brute force a password through the login form of a non-hacked site. I don't consider either of those likely to matter, but you might.

Consider something like https://pythonhosted.org/passlib/modular_crypt_format.html and support all your old hash-functions. If you really want to migrate people to the new hash functions, you can:

* Fully expire the accounts (and send emails asking them to reset their password)

* If they successfully login to their account and you have the plain text password, you can re-hash that using argon2i.

* Have them reset their password on next sign in before allowing them to do anything.

* Let the user know their account has less security than it could, and they can upgrade the security by changing their password.

Afaik, there is nothing you can do to upgrade their passwords, modulo argon2'ing the result of another hash function - I am simply too unclear on the safety of this approach so I don't do it.

I've always wondered if there's a security issue by applying the newer hashing algorithm to the old hashes.

I.e. store the result of `argon(bcrypt(md5(password)))`

Any security drawbacks to this?

Here's a discussion on that[0]. Depending on how you do it you either weaken it to the strength of the weakest algorithm or alternatively gain nothing but may introduce a weakness. It's basically a bad idea.

https://security.stackexchange.com/questions/93730/why-not-m... [0]

The arguments in that Q&A look suspect to me - they don't seem to be explicit in what a hash needs to defend against, and as a result, any downside is an argument to avoid a strategy, even if that downside is much less relevant than some other upside.

The top voted answer has a comment that's spot on:

I would offer the opposite argument: if one uses a single hashing function which has a 0.1% chance of having a discoverable weakness that would allow an attacker to speed it up by a factor of a million, there will be a one-in-a-thousand chance that an attacker will be able to gain a million-fold speedup. If one used three independent functions, each of which had a 0.1% chance of allowing such a breakthrough, there would be a 0.3% chance of an attacker being able to achieve a 33% speedup, a 0.0003% chance of an attacker getting a 66% speedup, and only a 0.0000001% chance of an attacker... – supercat Jul 13 '15 at 17:21

...getting a million-fold speedup. I would consider the possibility of an attacker getting a 33% speedup as inconsequential compared to the reduction in the probability of the attacker getting a 70%-or-better speedup. – supercat Jul 13 '15 at 17:22

In short: You lose on average (but who care?) but you reduce the risk of catastrophic failure.

Some discussion here: https://news.ycombinator.com/item?id=10857771 (less than a day ago)

Is there a pure Python version of Argon2?

I am writing one. You should use the C one, if you can.


I'm guessing it would be hard to eliminate timing side-channels in pure python, so no.

I thought this looked interesting until I read the page :-

"A memory cost, which defines the memory usage, given in kibibytes "

Kibibytes? Really?

Can't read any more. I'll find something else.

I am quite happy that something as important as password hashing uses precisely defined technical language to describe its operation.

The evaluation of which password-hashing function to use probably should not be based on the simplicity of language used in their description.

You're angry because they properly measure base 2 data?

Even if you disagree on how proper it is, that seems like an extremely small and unimportant qualm.

> Even if you disagree on how proper it is, that seems like an extremely small and unimportant qualm.

Use of 'kibibytes' and 'KiB' reflects a fundamentally wrongheaded view of the world, either unserious or too-serious, which shouldn't be taken seriously.

Wrongheaded view of the world?

When doing science, is it inappropriate to distinguish between the numbers 1000 and 1024?

> When doing science, is it inappropriate to distinguish between the numbers 1000 and 1024?

Of course not! In computers, 'kilo-' means 1,024; everywhere else it means 1,000. This is a simple rule; network-device and hard-drive manufacturers are just wrong when they violate it.

This is actually incorrect. "kilo-" always means 1000, and computers are not given some exception. 1024 is a KiB, 1000 is a KB.

What's your objection? It should be mibibytes?


Don't blame argon2, blame computer storage makers. They were the ones to go metric.

The problem is that people feel ridiculous trying to pronounce "kibibyte", "mebibyte", etc. Probably the same reason "jiggabyte" never took hold except among grey old engineers. But no, I don't have a better replacement.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact