I work on an enterprise infosec tool that just demonstrated 48 trillion MD5s per...

cyphar · on May 17, 2019

Well, nobody should be using MD5 (nor should they have been using it 20 years ago with the introduction of bcrypt). In fact, nobody should be using any hash function that was designed for speed (such as the SHA family) because you don't want fast hashing of passwords.

Modern cryptographic hash functions that are tailored for password hashing (such as scrypt or Argon2) are much harder to brute-force and have tunable knobs to allow you to increase the memory or CPU hardness. Obviously you cannot be safe forever but if you have a database dump of Argon2id-hashed passphrases with very strong parameters you aren't going to break it any time soon.

bredren · on May 17, 2019

There are plenty of publicly leaked hash tables running MD5 and the like. Just because modern hash functions exist does not mean they are in use. [1]

Also you do not need the hash table of a hardened system to get useful passwords. You need a reused password from a weak one.

[1] https://hashes.org/leaks.php

pathseeker · on May 17, 2019

>There are plenty of publicly leaked hash tables running MD5 and the like.

Not related to stackoverflow though.

>You need a reused password from a weak one.

If the weak password is already public, what is gained by finding out that it's a weak password in a strong DB? You've just described a dictionary attack.

bredren · on May 17, 2019

My mention of MD5 was just a benchmark reference.

I can see from the downvotes the very idea that it is in regular use as triggering for some folks—-but md5 and other weak hashing algos are not just in obscure anime forums but in systems everywhere.

“They don’t like to think it be like it is, but it do.”

And it isn’t about md5 hash rate it’s about the ease of cracking in general due to low cost of compute.

If the SO password hash has leaked even in bcrypt its going to be attacked and many strong passwords will be broken. If they are reused elsewhere, important email addresses will be attempted elsewhere.

Don’t reuse passwords.

cyphar · on May 17, 2019

> Don’t reuse passwords.

Nobody here disagrees with this premise. I just disagree that "low cost of compute" changes the fact that functions like Argon2 can be tuned to become more expensive to crack based on changes in computation cost. If you're worried about someone spinning up something on AWS to crack hashes, bump up the memory and CPU hardness and now they'll have to spend much more money to crack your passwords. In addition, the design of most modern password hashing functions is such that you get poor parallelism on GPUs.

ubercow13 · on May 17, 2019

Isn't the whole point of modern password hashes that the don't scale with GPU compute in the same way as MD5?

Anthony-G · on May 17, 2019

They′re not saying nobody is using MD5; they’re saying “nobody should be using MD5” (emphasis mine).

stephenr · on May 17, 2019

And how does it do on hashes that are not known to be useless, eg bcrypt or argon2?

bredren · on May 17, 2019

It is running on top of hashcat, and at the above-mentioned compute it was benchmarking 45 million bcrypts per second. At that point it is more about the attack plan than the compute.

https://imgur.com/a/DXQMsM1

edit: here is the demo video: https://www.youtube.com/watch?v=KnD4f8N1_OE

lucb1e · on May 17, 2019

"X bcrypts per second" is completely meaningless. What was the bcrypt setting? With the right setting, it would not be more than 1 bcrypt per century, or with the wrong setting, an almost equivalent rate to md5. It depends.

More meaningful would be the speedup compared to a single CPU core, which is what the developers (should) benchmark against. They should make it as slow as possible, so if their system can do bcrypt with a cost of 15 in 0.1 seconds, they should set either that or cost 16. (Much more than 0.2s might be annoying to users or be a DOS vector.)

lazyjones · on May 17, 2019

> With the right setting, it would not be more than 1 bcrypt per century

You can't really call that a "right" setting when it takes at least as long to log in...

lucb1e · on May 17, 2019

Right was meant as necessary to achieve that effect (sorry, English is not my native language). Obviously this is not a recommendation but just to point out that its configuration ranges from negligible to (on today's computers) forever.

umanwizard · on May 17, 2019

English is my native language, and I think the way you used "right" was fine. At any rate, I understood what you meant.

thaumasiotes · on May 17, 2019

> It is a question of when not if a hash table is fully cracked.

...? At 48 trillion hashes / second, you could get the entire hash space in as little as 224 quadrillion years.

Of course, if you had any collisions, it would take longer. A lot longer.

This suggests that strong passwords are still just as strong under md5 as under a more modern hash. No? Use of md5 is a problem because people use passwords that are easy to guess, not because you can enumerate the hash space. The one-way-ness is as secure as ever.

AstralStorm · on May 17, 2019

No, MD5 has about 18 bit collision strength. Combined with predictable salting practice, this is crackable with a calculator. Welcome to Merkle-Damgard construction allowing any prefix or suffix.

Given random salt (random placed or mixed) or HMAC, you have to use the more complex preimage attack at 123 bits.

This is crackable with a medium sized botnet or a supercomputer.

48 THash is an underestimate. Specialized hardware easily surpasses this, even FPGA does.

SHA1 salted is a tougher customer with 60-64 bit collision resistance meaning you probably cannot crack it with your calculator. However, it is still prone to length extension meaning predictable salting has this much strength.

thaumasiotes · on May 17, 2019

At 123 bits, you're five bits short of the 128 bits I calculated with. I don't think a medium sized botnet can rise to the level of doing 8 quadrillion years of work within your lifetime.

What problem are you trying to solve? As I understand it, we're discussing enumerating the hash space, such that:

1. You are given a hashed value, such as 2b0f4e60b80da7ef1e84573d764f1bf4 .

2. The value is someone's hashed password. You need to find any string which hashes to this particular fixed value, but you don't know of any such string to start with.

You can do this by brute force, but it will take you a long, long time.

The problem is NOT:

1. You have a string which hashes to a particular value.

2. You want other strings which hash to the same value.

And it also isn't:

1. You have a string which, with an unknown prefix, hashes to a particular known value.

2. You want to identify hashes which represent the same string with other prefixes applied.

I don't see where salting is relevant to the question. It's a defense against the phenomenon that cracking one user's password automatically also cracks everyone else who uses the same password (since, without salting, they all have the same hash), but it isn't a defense against having your password cracked by a targeted attack (since, in a targeted attack, there are no other hashes to be collateral damage). Why did you bring it up? What attack are you thinking of?

tialaramex · on May 17, 2019

I agree with your thrust that your parent poster (AstralStorm) is barking in a different forest from the tree we care about, but for salting it is also a defence against pre-computation to trade space for time.

With an unsalted hash an adversary can do as much work as they want in advance, store output and then trade that in once they have your hashes to get all or most of the same rewards as if they'd done the work after getting your hashes. Rainbow tables are the most famous example, but they're part of a family of similar attacks.

Salt lets you arbitrarily discount this advance work because the attacker must do it for all possible salt values and you get to choose how many there are - the early Unix crypt() salted pessimised password hash discounts it by a factor of 4096, modern schemes often use many orders of magnitude more salt. An attacker who has $4M to attack my password scheme probably doesn't want to spend $4M now to have a $1000 advantage once they get the hashes, and they certainly won't for a 1¢ advantage.

thaumasiotes · on May 17, 2019

A rainbow table is the exact attack I'm saying is still infeasible. ("Strong passwords are still just as strong.") It only works by assuming the victim uses one of a known set of weak, easy-to-guess passwords. If they don't, their hashed password is very unlikely to be in the rainbow table at all, because there's just too much hash space. The calculation in my original comment gives the approximate number of hashes necessary to fill a complete md5 rainbow table, on the assumption that you get zero collisions in the process of filling it in. (That is, every time you hash something, you get a hash you've never seen before, allowing you to add new information to the table.) That assumption is not at all realistic. By the time you've filled in half of the space, unless you can choose them cleverly to avoid collisions, half of the strings you hash should be wasted effort.

In a single-target attack, I don't really see the concept of "pre-computation to trade space for time". That hurts you by taking a lot of space, but it doesn't gain you any time, because you spent at least the required amount of time, but almost certainly more, doing the pre-computing. If you can buy someone else's pre-computed rainbow table, then sure, that's an advantage for you. But the adversary actually doing the pre-computing is doing it in order to crack many people's passwords all at once ("this table will let you identify _everyone_ whose password is qwe123"), which is the scenario I described earlier.

(At this point I feel I should clarify that "some people use the same passwords" is a real threat and a real reason to avoid md5. I just don't think the comment I responded to, "It is a question of when not if a hash table is fully cracked", was made in good faith or informed by... anything. To fully crack md5 in that way, you'd need an easily-computed function that inverts it. No amount of hashing speed is ever going to get you there.)

tialaramex · on May 17, 2019

> In a single-target attack, I don't really see the concept of "pre-computation to trade space for time"

In a single-target attack the reason you'd do this is because you expect your target to react in a timely fashion to discovery of some other part of your attack by changing passwords.

e.g. maybe you're sure you can break in to get hashes, but you will trigger a reactive IDS. You figure you have some period of time after that trigger before your target is alerted and changes their password.

Time-space tradeoff lets you avoid doing all the work against the clock _after_ the IDS triggers, instead you can do it all _before_ you have the hashes, and only pull the trigger and set off the alarms when you're ready to quickly break the hash, get in and do whatever your actual attack requires.

It's not a _common_ scenario, but it's important to remember it exists in designing general purpose components like password hashes.

AstralStorm · on May 17, 2019

You do not do this by brute force.

You can assume the system uses a certain salt pattern, e.g. 4 byte prefix or 8 byte prefix or suffix. This can reduce work from full crack to some 40 bit crack. (Guess salt then presume stupid concat scheme, use collision attack to get matches.) That one is doable on a modern PC on a GPU. It is a targetted attack. The mass variant are salted rainbow tables.

You usually do not even have to recover actual password to use credentials associated with the hash.

thaumasiotes · on May 17, 2019

Please try to describe the actual attack you're talking about. What do you have, what do you want, how do you get it.

dhritzkiv · on May 17, 2019

MD5 is one thing as a password can be retrieved from a hash table. But pulling out passwords from a hashed + salted value (e.g. via bcrypt) is many orders of magnitude more infeasible, no?

dagenix · on May 17, 2019

Salting was originally important as a defense against rainbow tables - which are more or less obsolete with GPUs that can crank through trillions of hashes per second. The real reason that bcrypt is a better way to store a password isn't just because it uses a salt - it's because its designed to be slow and to use a bunch of memory which makes it much harder to brute force.

bredren · on May 17, 2019

I would be impressed if SO's user password table is in bcrypt.

dhritzkiv · on May 17, 2019

What makes you say that? bcrypt's been the defacto best practice for user password "storage" for probably 10 years now. MD5's been known to be inadequate for much longer.

Even if they had a legacy implementation in MD5, gradually migrating from storing MD5 hashes to storing bcrypt hashes is trivial to do.

bredren · on May 17, 2019

From what I understand, many systems do not choose to implement strong hashing algos.

cyphar · on May 17, 2019

Even PHP's hash_function uses scrypt. Yes, some people explicitly decide to hash everything with sha1 but nothing you or I do will ever be able to stop them.

gpvos · on May 17, 2019

I would be disappointed if such a high-profile and technically savvy site would be using anything less.