Until TrueCrypt gets patched to use scrypt for key derivation, roughly how long should a volume password be to put it out of reach?
Edit: There's a table in the scrypt paper from 2002  that estimates the cost of various brute force attacks. Back then, a PBKDF2 iteration count of 86,000 and a password of length 40 would cost $200K to crack. TrueCrypt's choice of 1000-2000 iterations look staggeringly low in comparison. And that's not even accounting for hardware advances in the last 12 years.
 page 14, http://www.tarsnap.com/scrypt/scrypt.pdf
I was the first to crack the majority of the TrueCrypt volumes in the 2012 Defcon - Crack Me if You Can password cracking contest using TCHead running on an old Intel Celeron processor.
Contest Page (3rd table down): http://contest-2012.korelogic.com/stats.html
Bear in mind that's a 40 character english text password which they estimated at 56bits of entropy.
40 completely random characters is ~1x10^79 (95^40 - assuming a character space of 95), i.e. 100% completely infeasible to crack forever, even if you could do a hundred trillion guesses every microsecond
100bit entropy is probably fine to avoid brute force attacks for the foreseeable future, which is around 16 completely random ascii characters (95 character space)
I'd love to see someone try. Here, an md5, those are super vulnerable right? This is a 9 character password. Piece of cake if I should believe the news.
Using non-optimized, managed code on my laptop, I just went through 250 million iterations of plain text in ten minutes. You didn't specify what counts as a character, but let's assume it's a printable ASCII character. 95^9 = 630249409724609375 possibilities.
Divide by 250 million, and we get about 2520997639 times ten minutes for the time it would take me to enumerate all possible plain text passwords. That's a long time, about 47000 years. Not going to happen today on my laptop. But if I were a government, maybe I'd get a few thousand laptops together and set about calculating this over the next five years or so. In 1997.
But if I were to throw some purpose built hardware at this problem or use the video card to speed this up and not just laptops with general purpose CPU's, it's very likely I could decrease this time by thousands of times.
At the end I's have all MD5's for all passwords, and now the problem becomes a rainbow table lookup. And I only have to do this once and I forever gain the capability to break any 9 character (and below) password with just a lookup.
And that's assuming that MD5 is a perfectly secure hash with no shortcuts allowing you to narrow down the input set.
(Edit: Also being limited to alphanumeric for hashes and printable ascii for input gives you pretty good compression potential.)
Precomputed hash chains are a simplistic way of explaining how rainbow tables work. http://en.wikipedia.org/wiki/Rainbow_table#Precomputed_hash_...
Oh I thought you were going to say per second. I can recall reaching 500 MHash on some 2008 nvidia gpu with barswf.
Anyway, you're right of course. A nine character password is not really military-grade safe for countries with a security budget like the NSA's. But 9 characters ought to be many, orders of magnitude easier than 40 characters, and md5 is many orders of magnitude easier than pbkdf2 even at a weak security setting. This md5 might just be within reach of governments, but 40 characters certainly isn't.
For some fun performance figures, check http://golubev.com/gpuest.htm
A new AMD Radeon™ R9 295X2 is pretty speedy, clocking in at 23GHash for single MD5 (there are cheaper per $, but this is extremely impressive imo). Dividing 630249409724609375H by 23e9H/s leaves us with 27e6s, or 0.8 years. That means we'd need 42 cards to crack it in a week, costing somewhere in the region of $80k (1.5k/card + extras). That puts us easily within the range of governments, companies and just generally rich geeks. You can get ~ twice the hashes/$ so maybe $40k, and if you're cheap about the computers you connect with then probably lower.
But that's consumer kit. There are also speedy MD5 FPGA cores, I've not got a great reference but found one from 2007 which did 44MHash / core using a Cyclone II which I can only find described as "built for low cost". Assuming a 5x cost/hash reduction would grab us at $8k which would be in the range of hobbyists.
Beyond this, several private companies have been able to build double SHA-256 ASICs (for bitcoin mining). You could buy a while ago kit that did 7GHash for $300ish, and there's supposedly some 1THash ones in development. Granted, double SHA-256 is not equivalent to MD5, but I'd be surprised if MD5 was significantly harder, and expect it to be a lot simpler. Now the development cost of these is the expensive bit, but then your hashing rate would be insane.
Ramble over, if anyone has any figures for FPGAs or ASICs then I'd be really interested.
If you have a decent GPU, you can try several billion MD5 hashes per second: http://www.golubev.com/hashgpu.htm -- so if you stick with, um, lowercase alpha only should be crackable in under 10 minutes. If you choose from the 95 or so printable ASCII chars, you can expect a bit more of a wait (possibly a few years)... though of course if you don't restrict yourself to a single GPU you can speed things up.
Honestly, using an actually-random password that's more than 8 chars and not just alphanumeric offers decent security even when it's stored with a fast hash algo. Unfortunately, very few people do this in practice, and instead choose (for example) one of the 43 billion passwords in the md5 database here: http://www.md5decrypter.co.uk/
Fair enough. I'm still curious about the generation method though: I can imagine that testing all texts ever published takes $200k if you go up to 40 characters, but if you use a unique text it might be different. There are a lot of variables here, including honest misspellings, punctuation use, spacing (non-tech people seem to find spaces weird in passwords), capitalization, usage of names, etc.
It bugs me that even today, more systems (hello banks) don't support more open passwords. Even what's on a US Ascii keyboard would be nice. Two of my banks only support letters and numbers, no special characters and case insensitive.
would be more appropriate
I think while typing them in is when they are most susceptible to being stolen.
If you're using AES-128, if you have a random password with a 128 bits of entropy it shouldn't matter what key derivation function you use. That means, if your password can contain any printable ASCII character, a password length of 20 would be sufficient .
The huge caveat here is that the password has to be random, and most people are either incapable of remembering such a password or have no way of securely storing it.
But I have no qualifications in this area, so don't take my word for it...
With that said, if your "key" actually consists of 128 bits (16 bytes) of secure random data (with all the conditions associated with that), then you're right: the choice of KDF is mostly immaterial, so long as it preserves entropy. That's why, for example, if you're just generating a symmetric key for bulk encryption, you just grab (say) 128 bits of random data and use that as the key directly with no KDF involved.
KDFs really only enter the scene when: (1) you want to generate multiple keys from one "master key" or (2) you want to generate "secure" keying material from "insecure" keying material (say, a user's password).
The choice of KDF, however, mostly depends on the strength of the original keying material. For instance, if you have 256 bits of secure random information as your keying material, using something like scrypt is unnecessary; something like HKDF would be more appropriate there. (No one will ever brute-force your 256-bit value. It's just not happening: ever.) On the other hand, where the original keying material is weak, like in the case of passwords, you want to use something that prohibits rapid guessing, e.g. scrypt.
You are wrong about the key derivation though, and here's why: A key-derivation function that is very cheap and fast to calculate means it is easy for an attacker to brute-force lots of passwords to find one that matches your key. Using a slow, expensive KDF makes an attack much less feasible, by a factor of a thousand or more.
The goal is to make it at least as difficult to break the master password as it is to simply brute-force the resultant symmetric key. If the key is 256 bits long, this is true even for a one-round SHA256 KDF if the password has 256 bits of entropy. Of course, that's at least a fifty-character long alphanumeric password from /dev/random. A password structured for a typical human to remember it should probably be 5-10x as long or more to meet a similar standard.
Fortunately TrueCrypt does not use one-round SHA256, it seems to use 1000 rounds in a PBKDF2 construction. That doesn't mean we can use a password 1/1000th as long, because SHA256 is still very fast. Someone could use a Bitcoin ASIC to get a cost structure where hashing 1000 times is about as costly as verifying the aes256 key.
Speculation: You can maybe go down to 25 random or 150-200 human-memorable characters.
It'd never work that way, right? You get to use a password with log2(1000) bits less. So e.g. if you use 2^24 iterations, and figure 80 bits is strong enough, then you can use a 56-bit password.
> By using a key derivation function which requires 2s cryptographic operations to compute, the cost of performing a brute-force attack against passwords with t bits of entropy is raised from 2t to 2s+t operations .
1,000 iterations is just shy of 2^10.