Hacker News new | comments | show | ask | jobs | submit login
Typing The Letters A-E-S Into Your Code? You’re Doing It Wrong (matasano.com)
258 points by cscott on June 3, 2009 | hide | past | web | favorite | 74 comments



It's probably worth pointing out: you have the same problems if you're using CFB, OFB, or CTR mode (these are the "stream cipher" modes for DES/AES/whatever that encrypt one byte at a time). There's apocrypha about these modes not being vulnerable to the attack. Bad apocrypha:

Set up an encryptor:

  irb> e = OpenSSL::Cipher::Cipher.new('aes-256-ofb')
  => #<OpenSSL::Cipher::Cipher:0x647100>
  irb> e.key = "\x11" * 32
  irb> e.iv = "\x00" * 16
A decryptor:

  irb> d = OpenSSL::Cipher::Cipher.new('aes-256-ofb')
  => #<OpenSSL::Cipher::Cipher:0x647120>
  irb> d.decrypt
  irb> d.key = "\x11" * 32
  irb> d.iv = "\x00" * 16
 
Encrypt something:

  irb> ciphertext = (e << "A * 40")
  => "a\255N\211XEn\001\347$\275)\311%Ht\2356\254m\b\234z\375\311\006\335\305F\231~\201\243\236\3628w\267\3454"
Make an XOR mask:

  irb> mask = ("187 she wrote".to_bignum ^ ("A" * 13).to_bignum).to_rawstring
  => "pyva2)$a63.5$"
XOR it into the ciphertext:

  irb> new_ciphertext = (ciphertext.to_bignum ^ mask.to_bignum).to_rawstring
NOW decrypt it:

  irb> d << nct
  => "AAAAAAAAAAAAAAAAAAAAAAAAAAA187 she wrote"


Hm. I wrote this for our normal blog readers, who live and breathe security stuff, so I don't know how well it'll carry here.


It's a fun read. Makes me realize there's a HELLUVA lot more to security than meets the eye, even the eye of someone who's pored over some authentication code. Good submission for HN.


You blogged the shit outta that post!

Definitely an interesting read even for crypto dilettantes. Perhaps, especially for crypto dilettantes.

I took Rivest's Computer and Network Security class in college and the most important takeaway for me, far outstripping all of the interesting technical content, was "Don't implement crypto."


Man, you had "don't implement crypto" and I was all, that kicks ass. Then you edited it to say "in general", and now I'm sad.


Ha! Good point. Edit undone. :)


"Perhaps, especially for crypto dilettantes." Yes, that's what I thought in the middle of it: he wrote it to educate his clients in an entertaining way to not have to explain this tricky thing for a millionths time. Good read, BTW.


The chorus of "You just made me aware of how vastly unaware I was on the subject!" (which I'll join in on right now) indicates the value found here, I think. :)


I enjoyed it, even if it makes my head spin. I'm several years from my last reading of 'Applied Cryptography' and have never really gone into the deep end of that pool. So what, it [edit: meaning the blog article] is a fun read of hacking away in the security code world.

[edit: Thomas, thanks for the book recommendation below, I'll definitely grab a copy]


Throw that book away, and buy Ferguson and Schneier's "Practical Cryptography", which Schneier contributed to in penance for writing "Applied Cryptography". Portions of the proceeds of "Practical Cryptography" are donated to a fund that helps the people who wrote crypto based on "Applied Cryptography".


What in particular is wrong with "Applied Cryptography" ?


Lots of random facts about crypto trivia. Not a lot of context. Even less information about how to actually safely use crypto primitives. You'll come out of it knowing how to get CAST or IDEA into your code --- two ciphers nobody uses anymore --- but not how to properly choose an IV for CBC mode.

Everything that is wrong with "Applied" is right with "Practical". "Here are 4 modern block ciphers. We wrote one of them. Don't use it. You should use AES, but if you're a paranoid, use Serpent. But really use AES." It's great stuff, especially because if you really read it, you're going to end up not implementing crypto directly at all.


Nothing, if you know exactly what you're doing. But it's not very good about explaining why e.g. throwing error messages is bad. It's sort of like a toolbox full of really sharp, pointy things with the implicit understanding that sticking your hand in blindly will hurt, and then being surprised when there's a rash of hand injuries.


So, in the context of the SSO cookie, could tampering with the encrypted data be prevented by signing the ciphertext (please excuse the terminology if it's not correct)? What I mean by that is, encrypt the cookie's plaintext (e.g. "user=username, role=admin, etc."), and then sign it, so the value stored in cookie is something like <ciphertext>:<signature>?


In the context of SSO, you could probably get away with not even encrypting it. Just take HMAC-SHA256 of the cookie contents and tack it on.


I didn't buy the GPG/TLS thing before. After reading this I do. I just realized how completely out of my depth I am for crypto stuff. At least now I know ...


Great post! I'd love to see more of this kind of thing; approachable explanations of the right way to do security and how easy it is to do it wrong.

Unfortunately, there's seems to be a huge gap between the average working programmer, who often has little idea how to do security, and the security guru's who are often barely intelligible to the rest of us. :-)

Sadly, the result of most programmer's poor understanding of security is even worse than the bad crypto implementation in the article. It's often something as basic as thinking it's O.K. to store user passwords un-hashed. (http://news.ycombinator.com/item?id=628680)


Is there a shorthand name for "the industry standard answer; the cookie both apps honor to let you in, encrypted so users can’t change their account to someone else’s" pattern, especially that pattern 'done right'?

What's the best battle-tested library (and call) for implementing exactly that, without making any of the common mistakes?


yeah I think the standard answer is "don't store credentials i the cookie". session key only and central session storage.


That works great when apps can find each other, talk directly to each other, or share storage.


I would just like to point out that in 13 hours nobody has posted a comment here that even bothered me a little, whereas it took less than 3 hours for Reddit to bust out the one-time pads.


Thanks, Thomas. I just finished implementing my own crypto in a webapp I am working on. (AES, with Diffie Hellman for a shared secret we needed)

You've made me so nervous about everything I thought was true that I did a hg revert and am looking at gpgme bindings.

You've done a good deed, I think.


Diffie Hellman is also remarkably easy to screw up. Here's an old post of mine I don't think ever made it on to Hacker News:

http://www.matasano.com/log/962/adam-bozanovich-did-not-unco...


Off topic of the main thread, but isn't the attack mentioned in that post still problematic if you can't reliably act as a MITM for a whole session, but you can disrupt the session long enough to confuse both sides into agreeing on an insecure session key?


If you can manipulate a DH exchange, you definitely have bigger problems than forgetting to check DH parameters. It's worth noting that DH is one of those crypto building blocks that by itself provides basically no security (for instance, DH in SSL/TLS is secure because it's backed by an RSA trust anchor). It's just a tool for making other crypto primitives more flexible.


Agreed. Still, I can't decipher the IKE spec, but does it really prevent this - i.e. do any of the other building blocks actually prevent the conversation from continuing using a compromised shared key?


Good read. +1 for Louis CK reference.

A question to start up founders: How do you guys handle security? Do you hire consultants/pen-testers? Do you just copy+paste code from the internet? Do you use whatever framework's components for it?


http://news.ycombinator.com/item?id=638531

Gotta love putting a lot of effort into an HN comment thread on a submission that ends up flagged. =)


A “young, cool-people’s” coffee shop on the first floor of an old office building in downtown Chicago.

A place like this exists? All I can think of is Intelligensia.


It's a Louis CK bit, and an inside joke because the bit does match up with Intelligentsia (there's actually one of those in LA, where LCK wrote the bit, so I've always wondered).

Our office is in fact a few floors up from the Intelligentsia in the Monadnock building.


Very cool. I have always wanted to start a Haskell consultancy and run it out of that building ;)


Joking about names aside: highly recommend this building. Lots of cheap offices --- incl. some very small ones --- and the building itself is gorgeous inside and out.


For variety you should head up a block to the Argo in the Marquette building. I work above the Argo, but sometimes visit your Intelligentsia for variety :)


You should drop me a line sometime and introduce yourself; I'll buy. =)


Right you are, sir.


It seems kind of silly to write one's own code for anything, if you have access to a well-tested library that does what you want. The only reason to do otherwise is where there is no such library. Crypto is just an example of an area that's particularly hard to get right: the solution is no different than for other things.


The problem is everyone thinks they have a library that does crypto for them --- OpenSSL or CryptoAPI. But that's not what they have. They have the moral equivalent of a small pebble bed reactor, and they're strapping it to the tops of their electric cars and hoping to go for a drive.

Keyczar and cryptlib are two libraries that offer a high-level interface that is deliberately hard to screw up. But Google Keyczar is very new (it recently had a really horrible flaw) and cryptlib is not free unless you GPL your code.


Yes; I should've specified that you should use the highest-level well-tested library that does what you want. The libs you listed (in the blog post) are definitely best-in-class for that purpose.


It was a nice read, but the screenplay delivery added a lot of confusion that I felt detracted from the point of the actual message. Maybe if it hadn't been so fragmented. Still, educational, though moderately confusing.


So I wrote out a straight prose version of this post, and even after a second draft it came out 40% longer and as dry and dense as melba toast.

But I actually kind of agree (at the very least, it's visually noisier than the prose wall of text). There's a backstory to this; I haven't blogged in almost a year, and my most infamous blog post from before that was also a screenplay, so doing this post in this style was also an in-joke:

http://www.matasano.com/log/248/metafuzzing/

(This post is funnier than today's post).

I'm sure you're glad you know this now.


great post. on the readability - i think the screenplay makes it more approachable but the visual formatting makes it rather hard to read. It seems like the slightly shortened, indented lines that run into the hard vertical line on the right, combined with the center aligned names, disrupts the eye flow. my eyes are drawn immediately to the center/end of line because that's where the most tension is but that's not where the line begins. i found the hex dumps to be the easiest part for my eyes to latch on to because it felt consistent.

all that is a long way of saying, in spite of the visual formatting, it's a very interesting article. and personally, i think the the screenplay itself works.


Thanks for the feedback! I thought the same thing; I think I'll mod my hacked-up version of Markdown to dim the center lines.


Isn't the whole problem in this situation that you are trusting the client with critical data? He takes possession of it, has unlimited time and opportunity to work on it, and successful falsification will be obvious for him? Why on earth trust the client with the data in the first place? I have never liked the "encrypted cookie" way of handling session storage.

Just have a server side session store and all of this cookie encryption crap just vanishes. Of course that won't help you with session fixation etc, but the post doesn't address that stuff either. Do it over TLS and you're pretty safe though.

And let's not forget the author's suggestion to pad out the cookie with 1000 bytes to make it harder to falsify. That cookie gets send with every single request. 20 images on the page, you're sending 20KB of junk up just to load the page. On a connection with slow upstream, like say ADSL, you can easily add a second or two of request latency. You might not care but some people sure do.

And come on, I read until the very end expecting to hear what "processes or threads" have to do with security, so tell us already!


I don't think you read this post carefully. The "padding out the cookie with 1000 bytes" thing is what an attacker does to break your cookie. The "processes or threads" thing is a very cryptic in-joke. Sorry. Like I said, there are aspects of this post I didn't think would carry over to HN (actually, I didn't think any of it would).


Not to feed this rant too much, but I believe the cookie would be used for verifying a cross authentication attempt. Not exactly sure how it would be used but I'm pretty sure once you've got a session going on both servers the original token would be removed.

Also processes vs threads was just another interview question, it was only slightly more relevant to the story than unicorns in space.


So, is there much to be gained from encryption anyway? If, as the candidate, I suggested sending a cookie as 'userId=39493&role=user&timestamp=1414919&hash=<sha256-of-key-with-data>' then would I lose brownie points?

I assume my hash comparison function is constant time (eg. XOR(a[x],b[x])==0), rather than comparing of char-by-char.


No. Like I said, the candidate aced the interview. Holy, he knew what CBC was! Amazing!

Often you don't need confidentiality, and in those cases, HMAC can be a safer bet than a secure encrypted message format. You want to use HMAC though, not a simple hash with a secret key in it.


> hash=<sha256-of-key-with-data>

Oops, you invented your own keyed MAC and now I can append any data I want.


Sorry, I would really be using a proper RFC 2104 HMAC implementation (eg. python's hmac module).


I always try to espouse the benefits of triple-rot13 as the cipher in interview questions about security. Key management is a breeze, for example, and it's VERY fast.

But it allows everyone to concentrate on how you're managing the secure context instead of what algorithm has been shown to cause cancer in rats.


I don't understand what's so hard about encryption. There are simple, well-known rules (except the timing one that is sort of news), and if you follow them you should be safe, no?

1. Use a blocks-size unique prefix (IV) for each message (random will do as well)

2. SHA-256 your entire message before encryption and add the hash value at the end to prevent tampering

3. Use AES-256 with chaining to encrypt

4. Use SHA-256 to turn a password into a key. If your key space is small harden it with hashing 1000 times or so.

5. The time your algorithm takes should be always the same (or at worst one time for success and one for failure)

6. Don't use any other symmteric crypto algorithm.


Almost everything you wrote just now has problems.

1. Has a well-known problem, which is why "Practical" suggests using a nonce.

2. SHA-256'ing a known plaintext doesn't authenticate a messge. In fact, even simply taking a secret key and appending it to your message before you SHA-256 the message isn't secure; there's a reason HMAC is as complicated as it is.

3. This whole blog post was about things that go wrong with CBC mode. For instance, nothing you wrote addressed padding checks --- btw, not strictly a "timing" attack.

4. Using a password as a crypto key is bad for reasons illustrated in the post, which is why secure keystores use random bytes. Hashing 1000 times has nothing to do with your key space.

5. This is like saying "your algorithm should be secure". Easy to say.

6. Everything we're talking about going wrong goes wrong even when you're using AES.

So, yes, I believe you don't understand what's so hard about encryption. You're obviously smart and you've taken some time with this material, and I still don't believe you'd get this right in your first fielded version.


I am not advocating usage of passwords for encryption, I am saying that if you have to use passwords, this is how you use them. Obviously a strong random bytestring is the best key one could possibly have and should be used when possible.

Anyways, re padding - what if I hash the padding as well? surely an attacker would not get anything of value by playing with it?

Thanks



Great post, not that I can claim to understand all of it. One question:

Just put a “for” loop around SHA-1 and run it 1000 times to generate the key; that’ll at least slow down a brute force attack. SHA-1 is lightning fast. By itself, it’s a crappy way to generate a key.

I see the purpose of this, but doesn't it also provide 999 more opportunities for a hash collision to happen? Are you so confident in the collision resistance of SHA-1 that this isn't an issue?

My (probably naive) proposal would be to keep the result of the first hash and XOR it with the result of each subsequent hash. Bad idea?


As far as I can tell, you're confusing collisions with cycles. I don't think any of the main cryptographic hashes have problems with short cycles.


I could well be confusing the two, because I'm not sure if I even know what cycles are :-). I'm guessing from the name that cycles are when h(h(h(... h(a)))) = a? That's not quite what I'm getting at.

I'll try explaining my concern a little better. Suppose you have a hash function h. Let p be the probability that h(x) == h(y) for any unique x and y.

Now suppose you have some arbitrary inputs a and b.

  h(a) = h(b) with probability p
  h(h(a)) = h(h(b)) with probability 2p
  h(h(h(a))) = h(h(h(b))) with probability 3p
  etc.
The probability at each step increases because a collision is possible at each step, but a collision in a previous step guarantees a collision in the current one.

With a good hash function, p should be astronomically low, so I guess that's why it isn't a problem in this case? Or am I completely off-base in my assumptions?


I see what you mean now. And your intuition on cycles was spot-on.

And to answer your question (I'm not a crypto expert, so take with a pinch of salt):

With SHA-1, that probability is generally accepted to be 2^(-160). 1000 is roughly 2^10, so the final probability is about 2^(-150). I guess that's still improbable enough...

But I think you do make a very valid and interesting point.


The loop has nothing to do with the probability of a collision. If your key comes out of /usr/share/dict/words, you have a 2^18 search space, not a 2^160 space. If you have a 2^18 search space, you better hope your constant factors are very high. Hence the loop.


Yeah, I get why the loop is there. But I still think it increases the probability of a collision by nearly 1000x over the probability of a collision without the loop.

Maybe a better way to explain my concern is to consider hashing something once versus twice, rather than 1000 times.

Let h be a hash function and p be the probability P(h(x) = h(y)) for randomly chosen x and y.

Say you have unique inputs a and b.

Obviously, P(h(a) = h(b)) = p.

Now consider P(h(h(a)) = h(h(b))). h(h(a)) = h(h(b)) if:

  h(a) = h(b) [probability p]
     OR
  h(a) != h(b) [probability 1-p], and
  h(h(a)) = h(h(b)) [probability p].
So the total probability when we repeat the hash is p + (1-p) * p

Since p is small, the probability of a collision has nearly doubled just by applying the hash function again.

[note: Higher up in this thread I mistakenly implied that the probability increased linearly. With a low number of iterations and realistic values for p it should be close to linear, but not exact.]

[edit:formatting]


H(m) -> SHA1(SHA1(m)) is not the same hash function as H(m) -> SHA1(m).


Right, but if SHA1(m) = SHA1(n), then SHA1(SHA1(m)) = SHA1(SHA1(n)) also.

The converse is not true. There will be cases where SHA1(SHA1(m)) = SHA1(SHA1(n)), but SHA1(m) != SHA1(n). (Or does SHA1 somehow guarantee that this will not happen?)

So it seems to follow that the chances that H(a) = H(b) when H(m) -> SHA1(SHA1(m)) must be higher than when H(m) -> SHA1(m).

Am I missing something?


I think you're going to need to better explain the logic behind graf 2 before I can answer you.

But on the off chance that we can end this thread gracefully, I'll point out that SHA256(SHA(256(m), m) is SHAd256(m), and considered by Ferguson to address security concerns in straight SHA256.


It boils down to this:

If SHA1(n) = SHA1(m), we know SHA1(SHA1(n)) = SHA1(SHA1(m)). (this is a consequence of SHA1 being deterministic)

Also, even if SHA1(n) != SHA1(m), there is still a chance that SHA1(SHA1(n)) = SHA1(SHA1(m)). (this is a consequence of collisions: SHA1(a) may equal SHA1(b) even if a != b.)

So the probability that SHA1(SHA1(n)) = SHA1(SHA1(m)) must be more than the probability that SHA1(n) = SHA1(m).


You realize that despite the fact that I still don't follow the logic here, SHA collisions have no meaningful impact on the security of PBKDF, right?


Actually I didn't, but that more or less answers my original question. Now I realize why you (or rather, your character) said that it didn't matter that md5 is broken in this case.

In general though, I'm still convinced that putting a hash function in a loop will increase the probability of a collision.

Or maybe I misunderstood your point about the loop? I assume by putting a loop around the hash you mean something like this:

  key = SHA1(password)
  1000 times do
    key = SHA1(key)
  return key


Look I don't want to be dense here --- and there's a good chance that's exactly what's happening --- but it sounds to me like you're talking about collisions between SHA1, SHA1(SHA1), SHA1(SHA1(SHA1)), etc.

Rename the functions. SHA1(SHA1) is now SHA1', SHA1(SHA1(SHA1)) is now SHA1''.

Explain why I care if there's an m and an n such that SHA1'(m) = SHA1''(n)?

They're two different hash functions.


No, you're still missing the point :)

He's saying that SHA1'(m) = SHA1'(n) implies that SHA1"(m) = SHA1"(n).

Let's do this with a very simple, silly hash function.

Let h(x) = (x/2)%10 (in integer arithmetic, e.g, 5/2 = 2, so h(5) = (5/2)%10 = 2%10 = 2.) So then let h2(x) = h(h(x)) = (h(x)/2)%10, and h3(x) = h(h(h(x))) = (h2(x)/2) % 10.

Here's a table:

  x:  h  h2  h3
  0   0  0   0
  1   0  0   0
  2   1  0   0
  3   1  0   0
  4   2  1   0
  5   2  1   0
  6   3  1   0
  7   3  1   0
  8   4  2   1
  9   4  2   1
  10  5  2   1
  11  5  2   1
  12  6  3   1
  13  6  3   1
  14  7  3   1
  15  7  3   1
  16  8  4   2
  17  8  4   2
  18  9  4   2
  19  9  4   2
  20  0  0   0
  21  0  0   0
  ...
Hopefully this table shows that collisions are much more likely for h2 than for h, and more likely for h3 than for h2. The range of h3 is smaller than the range of h2 which is smaller than the range of h.


Right, that's what I had in mind.

Of course, it's a bit of a jump to assume that the result is the same when you use a cryptographic hash function instead of a trivial one.

But regardless of the hash function, the same problem will occur unless there are no collisions for input between 0 and 2^160 - 1 (inclusive). From what I can tell, SHA1 can't guarantee this.



Perhaps... A comment on that article: http://news.ycombinator.com/item?id=615209

Is the "AES" line a tptacek catchphrase? Or is he quoting someone else?


It's a me catch-phrase. I am like Poochie that way.


As far as doing it wrong can we go back to "app ‘A’ set a cookie with your account ID in it" and discuss why on Earth the app works like that???


Because app A and app B can't share state directly, and can't talk directly to each other? You're basically asking why federation and single sign-on exist.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: