Hacker News new | past | comments | ask | show | jobs | submit login

The iteration count used by TrueCrypt [in its PBKDF2 key derivation] is either 1000 or 2000, depending on the hash function and use case. In both cases, this iteration count is too small to prevent password guessing attacks for even moderately complex passwords.

Until TrueCrypt gets patched to use scrypt for key derivation, roughly how long should a volume password be to put it out of reach?

Edit: There's a table in the scrypt paper from 2002 [1] that estimates the cost of various brute force attacks. Back then, a PBKDF2 iteration count of 86,000 and a password of length 40 would cost $200K to crack. TrueCrypt's choice of 1000-2000 iterations look staggeringly low in comparison. And that's not even accounting for hardware advances in the last 12 years.

[1] page 14, http://www.tarsnap.com/scrypt/scrypt.pdf

I do not think this is significantly new as it has been documented in the official TC source code and various 3rd-party source code (TCHead, tcplay, etc.) for many years.

I was the first to crack the majority of the TrueCrypt volumes in the 2012 Defcon - Crack Me if You Can password cracking contest using TCHead running on an old Intel Celeron processor.

Write-up: http://16s.us/software/wm/Defcon/cmiyc_2012.txt

Contest Page (3rd table down): http://contest-2012.korelogic.com/stats.html

TCHead: http://16s.us/software/TCHead/

> PBKDF2 iteration count of 86,000 and a password of length 40 would cost $200K to crack

Bear in mind that's a 40 character english text password which they estimated at 56bits of entropy.

40 completely random characters is ~1x10^79 (95^40 - assuming a character space of 95), i.e. 100% completely infeasible to crack forever, even if you could do a hundred trillion guesses every microsecond

100bit entropy is probably fine to avoid brute force attacks for the foreseeable future, which is around 16 completely random ascii characters (95 character space)

> a password of length 40 would cost $200K to crack

I'd love to see someone try. Here, an md5, those are super vulnerable right? This is a 9 character password. Piece of cake if I should believe the news.


Some numbers for you.

Using non-optimized, managed code on my laptop, I just went through 250 million iterations of plain text in ten minutes. You didn't specify what counts as a character, but let's assume it's a printable ASCII character. 95^9 = 630249409724609375 possibilities.

Divide by 250 million, and we get about 2520997639 times ten minutes for the time it would take me to enumerate all possible plain text passwords. That's a long time, about 47000 years. Not going to happen today on my laptop. But if I were a government, maybe I'd get a few thousand laptops together and set about calculating this over the next five years or so. In 1997.

But if I were to throw some purpose built hardware at this problem or use the video card to speed this up and not just laptops with general purpose CPU's, it's very likely I could decrease this time by thousands of times.

At the end I's have all MD5's for all passwords, and now the problem becomes a rainbow table lookup. And I only have to do this once and I forever gain the capability to break any 9 character (and below) password with just a lookup.

And that's assuming that MD5 is a perfectly secure hash with no shortcuts allowing you to narrow down the input set.

Did some more calculations just for the fun of it. It looks like you'd need nearly 16 exabytes of storage to hold the MD5 table for every 9-character hash [1]. Not accounting for any database overhead. In high density, you can fit a petabyte in what now a days? Half a rack? So around 8,500 racks. Certainly in the realm of possibility for a government but it would be a lot for storing just one type of hash list.

[1] https://www.wolframalpha.com/input/?i=%28630249409724609375+...

I feel like there should be a more efficient way to store these things. You have a complete enumeration of possible inputs (passwords), but unfortunately we need to go in the opposite direction, so we couldn't just index into a n*9 bytes array. Depending on how well distributed the md5 hash space is, it's possible a prefix tree might save you a bit of storage (you'd only have to save a couple of bytes per entry to cancel out the overhead) but I couldn't tell you the numbers there. Otherwise I think you're right, and we'd just have to store a map of md5 -> password.

(Edit: Also being limited to alphanumeric for hashes and printable ascii for input gives you pretty good compression potential.)

They are actually much more complicated than just a big table of 'password | hash'. Check out https://www.freerainbowtables.com/en/articles/ for in-depth explanations.

That isn't how a rainbow table works, it doesn't need to store a hash, plaintext pair for every possibility.

Precomputed hash chains are a simplistic way of explaining how rainbow tables work. http://en.wikipedia.org/wiki/Rainbow_table#Precomputed_hash_...

> I just went through 250 million iterations of plain text in ten minutes

Oh I thought you were going to say per second. I can recall reaching 500 MHash on some 2008 nvidia gpu with barswf.

Anyway, you're right of course. A nine character password is not really military-grade safe for countries with a security budget like the NSA's. But 9 characters ought to be many, orders of magnitude easier than 40 characters, and md5 is many orders of magnitude easier than pbkdf2 even at a weak security setting. This md5 might just be within reach of governments, but 40 characters certainly isn't.

To continue a slight derail about MD5:

For some fun performance figures, check http://golubev.com/gpuest.htm

A new AMD Radeon™ R9 295X2 is pretty speedy, clocking in at 23GHash for single MD5 (there are cheaper per $, but this is extremely impressive imo). Dividing 630249409724609375H by 23e9H/s leaves us with 27e6s, or 0.8 years. That means we'd need 42 cards to crack it in a week, costing somewhere in the region of $80k (1.5k/card + extras). That puts us easily within the range of governments, companies and just generally rich geeks. You can get ~ twice the hashes/$ so maybe $40k, and if you're cheap about the computers you connect with then probably lower.

But that's consumer kit. There are also speedy MD5 FPGA cores, I've not got a great reference but found one from 2007 which did 44MHash / core using a Cyclone II which I can only find described as "built for low cost". Assuming a 5x cost/hash reduction would grab us at $8k which would be in the range of hobbyists.

Beyond this, several private companies have been able to build double SHA-256 ASICs (for bitcoin mining). You could buy a while ago kit that did 7GHash for $300ish, and there's supposedly some 1THash ones in development. Granted, double SHA-256 is not equivalent to MD5, but I'd be surprised if MD5 was significantly harder, and expect it to be a lot simpler. Now the development cost of these is the expensive bit, but then your hashing rate would be insane.

Ramble over, if anyone has any figures for FPGAs or ASICs then I'd be really interested.

You say "9 characters" -- is that "printable characters", or alphanumeric, or what?

If you have a decent GPU, you can try several billion MD5 hashes per second: http://www.golubev.com/hashgpu.htm -- so if you stick with, um, lowercase alpha only should be crackable in under 10 minutes. If you choose from the 95 or so printable ASCII chars, you can expect a bit more of a wait (possibly a few years)... though of course if you don't restrict yourself to a single GPU you can speed things up.

Honestly, using an actually-random password that's more than 8 chars and not just alphanumeric offers decent security even when it's stored with a fast hash algo. Unfortunately, very few people do this in practice, and instead choose (for example) one of the 43 billion passwords in the md5 database here: http://www.md5decrypter.co.uk/

Well, Google search reveals nothing. I'm done.

Lol yeah that's how I usually check md5 hashes too. But that doesn't work: this is from /dev/urandom while filtering out only /[ -~]/. It seems people don't like that I'm proving the point by a super simple example though. And yes, my passwords are as strong as this, I don't use text passwords. Cracking 40 character passwords is absolute bullshit.

Misinterpretation, not bullshit. Per another reply, the original document was about 40 characters of English text.

> Misinterpretation, not bullshit.

Fair enough. I'm still curious about the generation method though: I can imagine that testing all texts ever published takes $200k if you go up to 40 characters, but if you use a unique text it might be different. There are a lot of variables here, including honest misspellings, punctuation use, spacing (non-tech people seem to find spaces weird in passwords), capitalization, usage of names, etc.

Maybe if we start changing the label to "Passwords" or "Pass phrase" to indicate that you can do something more expressive.

It bugs me that even today, more systems (hello banks) don't support more open passwords. Even what's on a US Ascii keyboard would be nice. Two of my banks only support letters and numbers, no special characters and case insensitive.

Your passwords are as strong as the protection of the place where you store them.

I think > Your passwords are as most as strong as the protection of the place where you store them.

would be more appropriate


I think while typing them in is when they are most susceptible to being stolen.

"Just to prove some guy on HN wrong" is not much of an incentive to spend the required resources. It says nothing about the ability to crack MD5 when there are significant incentives. Why not md5 your credit card info and post it here if you are so confident?

Somebody more knowledgeable please correct me if I'm wrong, but as I understand it:

If you're using AES-128, if you have a random password with a 128 bits of entropy it shouldn't matter what key derivation function you use. That means, if your password can contain any printable ASCII character, a password length of 20 would be sufficient [1].

The huge caveat here is that the password has to be random, and most people are either incapable of remembering such a password or have no way of securely storing it.

But I have no qualifications in this area, so don't take my word for it...

[1] http://en.wikipedia.org/wiki/Password_strength#Entropy_as_a_...

The use of AES is orthogonal to the issue of key derivation. In short, when we talk about key derivation, we're speaking of keys in a general sense - irrespective of purpose, they just need to be secure.

With that said, if your "key" actually consists of 128 bits (16 bytes) of secure random data (with all the conditions associated with that), then you're right: the choice of KDF is mostly immaterial, so long as it preserves entropy. That's why, for example, if you're just generating a symmetric key for bulk encryption, you just grab (say) 128 bits of random data and use that as the key directly with no KDF involved.

KDFs really only enter the scene when: (1) you want to generate multiple keys from one "master key" or (2) you want to generate "secure" keying material from "insecure" keying material (say, a user's password).

The choice of KDF, however, mostly depends on the strength of the original keying material. For instance, if you have 256 bits of secure random information as your keying material, using something like scrypt is unnecessary; something like HKDF would be more appropriate there. (No one will ever brute-force your 256-bit value. It's just not happening: ever.) On the other hand, where the original keying material is weak, like in the case of passwords, you want to use something that prohibits rapid guessing, e.g. scrypt.

According to the chart in that article, case-sensitive alphanumerics give at most 5.17 bits of entropy per character. So that's 25 random characters. (edit: sorry, this bit is redundant because I misread your comment.)

You are wrong about the key derivation though, and here's why: A key-derivation function that is very cheap and fast to calculate means it is easy for an attacker to brute-force lots of passwords to find one that matches your key. Using a slow, expensive KDF makes an attack much less feasible, by a factor of a thousand or more.

Thanks for the explanation regarding key-derivation functions. Regarding the calculation, I was using the value for printable ASCII characters which is 6.57 bits of entropy per character.

Disclaimer: I have not tested this extensively. This is merely an outline of the ratios involved, not a recommendation. Don't come after me if you get your password broken after reading this.

The goal is to make it at least as difficult to break the master password as it is to simply brute-force the resultant symmetric key. If the key is 256 bits long, this is true even for a one-round SHA256 KDF if the password has 256 bits of entropy. Of course, that's at least a fifty-character long alphanumeric password from /dev/random. A password structured for a typical human to remember it should probably be 5-10x as long or more to meet a similar standard.

Fortunately TrueCrypt does not use one-round SHA256, it seems to use 1000 rounds in a PBKDF2 construction. That doesn't mean we can use a password 1/1000th as long, because SHA256 is still very fast. Someone could use a Bitcoin ASIC to get a cost structure where hashing 1000 times is about as costly as verifying the aes256 key.

Speculation: You can maybe go down to 25 random or 150-200 human-memorable characters.

>doesn't mean we can use a password 1/1000th as long

It'd never work that way, right? You get to use a password with log2(1000) bits less. So e.g. if you use 2^24 iterations, and figure 80 bits is strong enough, then you can use a 56-bit password.

You're right; it would only be a fixed (and not especially large) length reduction.

Unfortunately TrueCrypt max password length is 64 chars so in your speculation you have to use random chars.

From the same paper:

> By using a key derivation function which requires 2s cryptographic operations to compute, the cost of performing a brute-force attack against passwords with t bits of entropy is raised from 2t to 2s+t operations [19].

1,000 iterations is just shy of 2^10.

Last i saw, django uses 1000 as the iteration count for PBKDF2. So maybe that's common nowadays and needs to be reviewed. On another point, can anyone point me to an article explaining the state of the art for user passwords storage?

OWASP do a decent job of aggregating currently accepted practice: https://www.owasp.org/index.php/Password_Storage_Cheat_Sheet

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact