
Typing The Letters A-E-S Into Your Code? You’re Doing It Wrong - cscott
http://www.matasano.com/log/1749/typing-the-letters-a-e-s-into-your-code-youre-doing-it-wrong/
======
tptacek
It's probably worth pointing out: you have the same problems if you're using
CFB, OFB, or CTR mode (these are the "stream cipher" modes for
DES/AES/whatever that encrypt one byte at a time). There's apocrypha about
these modes not being vulnerable to the attack. Bad apocrypha:

Set up an encryptor:

    
    
      irb> e = OpenSSL::Cipher::Cipher.new('aes-256-ofb')
      => #<OpenSSL::Cipher::Cipher:0x647100>
      irb> e.key = "\x11" * 32
      irb> e.iv = "\x00" * 16
    

A decryptor:

    
    
      irb> d = OpenSSL::Cipher::Cipher.new('aes-256-ofb')
      => #<OpenSSL::Cipher::Cipher:0x647120>
      irb> d.decrypt
      irb> d.key = "\x11" * 32
      irb> d.iv = "\x00" * 16
     

Encrypt something:

    
    
      irb> ciphertext = (e << "A * 40")
      => "a\255N\211XEn\001\347$\275)\311%Ht\2356\254m\b\234z\375\311\006\335\305F\231~\201\243\236\3628w\267\3454"
    

Make an XOR mask:

    
    
      irb> mask = ("187 she wrote".to_bignum ^ ("A" * 13).to_bignum).to_rawstring
      => "pyva2)$a63.5$"
    

XOR it into the ciphertext:

    
    
      irb> new_ciphertext = (ciphertext.to_bignum ^ mask.to_bignum).to_rawstring
    

NOW decrypt it:

    
    
      irb> d << nct
      => "AAAAAAAAAAAAAAAAAAAAAAAAAAA187 she wrote"

------
tptacek
Hm. I wrote this for our normal blog readers, who live and breathe security
stuff, so I don't know how well it'll carry here.

~~~
_pius
You blogged the shit outta that post!

Definitely an interesting read even for crypto dilettantes. Perhaps,
_especially_ for crypto dilettantes.

I took Rivest's Computer and Network Security class in college and the most
important takeaway for me, far outstripping all of the interesting technical
content, was "Don't implement crypto."

~~~
tptacek
Man, you had "don't implement crypto" and I was all, that kicks ass. Then you
edited it to say "in general", and now I'm sad.

~~~
_pius
Ha! Good point. Edit undone. :)

------
smanek
I didn't buy the GPG/TLS thing before. After reading this I do. I just
realized how completely out of my depth I am for crypto stuff. At least now I
know ...

------
theBobMcCormick
Great post! I'd love to see more of this kind of thing; approachable
explanations of the right way to do security and how easy it is to do it
wrong.

Unfortunately, there's seems to be a _huge_ gap between the average working
programmer, who often has little idea how to do security, and the security
guru's who are often barely intelligible to the rest of us. :-)

Sadly, the result of most programmer's poor understanding of security is even
worse than the bad crypto implementation in the article. It's often something
as basic as thinking it's O.K. to store user passwords un-hashed.
(<http://news.ycombinator.com/item?id=628680>)

------
gojomo
Is there a shorthand name for "the industry standard answer; the cookie both
apps honor to let you in, encrypted so users can’t change their account to
someone else’s" pattern, especially that pattern 'done right'?

What's the best battle-tested library (and call) for implementing exactly
that, without making any of the common mistakes?

~~~
piramida
yeah I think the standard answer is "don't store credentials i the cookie".
session key only and central session storage.

~~~
tptacek
That works great when apps can find each other, talk directly to each other,
or share storage.

------
tptacek
I would just like to point out that in 13 hours nobody has posted a comment
here that even bothered me a little, whereas it took less than 3 hours for
Reddit to bust out the one-time pads.

------
jameskpolk
Thanks, Thomas. I just finished implementing my own crypto in a webapp I am
working on. (AES, with Diffie Hellman for a shared secret we needed)

You've made me so nervous about everything I thought was true that I did a hg
revert and am looking at gpgme bindings.

You've done a good deed, I think.

~~~
tptacek
Diffie Hellman is also remarkably easy to screw up. Here's an old post of mine
I don't think ever made it on to Hacker News:

[http://www.matasano.com/log/962/adam-bozanovich-did-not-
unco...](http://www.matasano.com/log/962/adam-bozanovich-did-not-uncover-an-
nsa-ipsec-conspiracy-diffie-hellman-parameter-validation-explained/)

~~~
three14
Off topic of the main thread, but isn't the attack mentioned in that post
still problematic if you can't reliably act as a MITM for a whole session, but
you can disrupt the session long enough to confuse both sides into agreeing on
an insecure session key?

~~~
tptacek
If you can manipulate a DH exchange, you definitely have bigger problems than
forgetting to check DH parameters. It's worth noting that DH is one of those
crypto building blocks that by itself provides basically no security (for
instance, DH in SSL/TLS is secure because it's backed by an RSA trust anchor).
It's just a tool for making other crypto primitives more flexible.

~~~
three14
Agreed. Still, I can't decipher the IKE spec, but does it really prevent this
- i.e. do any of the other building blocks actually prevent the conversation
from continuing using a compromised shared key?

------
yan
Good read. +1 for Louis CK reference.

A question to start up founders: How do you guys handle security? Do you hire
consultants/pen-testers? Do you just copy+paste code from the internet? Do you
use whatever framework's components for it?

~~~
tptacek
<http://news.ycombinator.com/item?id=638531>

Gotta love putting a lot of effort into an HN comment thread on a submission
that ends up flagged. =)

------
jrockway
_A “young, cool-people’s” coffee shop on the first floor of an old office
building in downtown Chicago._

A place like this exists? All I can think of is Intelligensia.

~~~
tptacek
It's a Louis CK bit, and an inside joke because the bit does match up with
Intelligentsia (there's actually one of those in LA, where LCK wrote the bit,
so I've always wondered).

Our office is in fact a few floors up from the Intelligentsia in the Monadnock
building.

~~~
jrockway
Very cool. I have always wanted to start a Haskell consultancy and run it out
of that building ;)

~~~
tptacek
Joking about names aside: highly recommend this building. Lots of cheap
offices --- incl. some very small ones --- and the building itself is gorgeous
inside and out.

------
amalcon
It seems kind of silly to write one's own code for anything, if you have
access to a well-tested library that does what you want. The only reason to do
otherwise is where there is no such library. Crypto is just an example of an
area that's particularly hard to get right: the solution is no different than
for other things.

~~~
tptacek
The problem is everyone thinks they have a library that does crypto for them
--- OpenSSL or CryptoAPI. But that's not what they have. They have the moral
equivalent of a small pebble bed reactor, and they're strapping it to the tops
of their electric cars and hoping to go for a drive.

Keyczar and cryptlib are two libraries that offer a high-level interface that
is deliberately hard to screw up. But Google Keyczar is very new (it recently
had a really horrible flaw) and cryptlib is not free unless you GPL your code.

~~~
amalcon
Yes; I should've specified that you should use the highest-level well-tested
library that does what you want. The libs you listed (in the blog post) are
definitely best-in-class for that purpose.

------
CGamesPlay
It was a nice read, but the screenplay delivery added a lot of confusion that
I felt detracted from the point of the actual message. Maybe if it hadn't been
so fragmented. Still, educational, though moderately confusing.

~~~
tptacek
So I wrote out a straight prose version of this post, and even after a second
draft it came out 40% longer and as dry and dense as melba toast.

But I actually kind of agree (at the very least, it's visually noisier than
the prose wall of text). There's a backstory to this; I haven't blogged in
almost a year, and my most infamous blog post from before that was also a
screenplay, so doing this post in this style was also an in-joke:

<http://www.matasano.com/log/248/metafuzzing/>

(This post is funnier than today's post).

I'm sure you're glad you know this now.

~~~
tdonia
great post. on the readability - i think the screenplay makes it more
approachable but the visual formatting makes it rather hard to read. It seems
like the slightly shortened, indented lines that run into the hard vertical
line on the right, combined with the center aligned names, disrupts the eye
flow. my eyes are drawn immediately to the center/end of line because that's
where the most tension is but that's not where the line begins. i found the
hex dumps to be the easiest part for my eyes to latch on to because it felt
consistent.

all that is a long way of saying, in spite of the visual formatting, it's a
very interesting article. and personally, i think the the screenplay itself
works.

~~~
tptacek
Thanks for the feedback! I thought the same thing; I think I'll mod my hacked-
up version of Markdown to dim the center lines.

------
sho
Isn't the whole problem in this situation that you are trusting the client
with critical data? He takes possession of it, has unlimited time and
opportunity to work on it, and successful falsification will be obvious for
him? Why on earth trust the client with the data in the first place? I have
never liked the "encrypted cookie" way of handling session storage.

Just have a server side session store and all of this cookie encryption crap
just vanishes. Of course that won't help you with session fixation etc, but
the post doesn't address that stuff either. Do it over TLS and you're pretty
safe though.

And let's not forget the author's suggestion to pad out the cookie with 1000
bytes to make it harder to falsify. That cookie gets send with every single
request. 20 images on the page, you're sending 20KB of junk up just to load
the page. On a connection with slow upstream, like say ADSL, you can easily
add a second or two of request latency. You might not care but some people
sure do.

And come on, I read until the very end expecting to hear what "processes or
threads" have to do with security, so tell us already!

~~~
tptacek
I don't think you read this post carefully. The "padding out the cookie with
1000 bytes" thing is what an attacker does to break your cookie. The
"processes or threads" thing is a very cryptic in-joke. Sorry. Like I said,
there are aspects of this post I didn't think would carry over to HN
(actually, I didn't think any of it would).

------
jermy
So, is there much to be gained from encryption anyway? If, as the candidate, I
suggested sending a cookie as
'userId=39493&role=user&timestamp=1414919&hash=<sha256-of-key-with-data>' then
would I lose brownie points?

I assume my hash comparison function is constant time (eg. XOR(a[x],b[x])==0),
rather than comparing of char-by-char.

~~~
brl
> hash=<sha256-of-key-with-data>

Oops, you invented your own keyed MAC and now I can append any data I want.

~~~
jermy
Sorry, I would really be using a proper RFC 2104 HMAC implementation (eg.
python's hmac module).

------
jimfl
I always try to espouse the benefits of triple-rot13 as the cipher in
interview questions about security. Key management is a breeze, for example,
and it's VERY fast.

But it allows everyone to concentrate on how you're managing the secure
context instead of what algorithm has been shown to cause cancer in rats.

------
DenisM
I don't understand what's so hard about encryption. There are simple, well-
known rules (except the timing one that is sort of news), and if you follow
them you should be safe, no?

1\. Use a blocks-size unique prefix (IV) for each message (random will do as
well)

2\. SHA-256 your entire message before encryption and add the hash value at
the end to prevent tampering

3\. Use AES-256 with chaining to encrypt

4\. Use SHA-256 to turn a password into a key. If your key space is small
harden it with hashing 1000 times or so.

5\. The time your algorithm takes should be always the same (or at worst one
time for success and one for failure)

6\. Don't use any other symmteric crypto algorithm.

~~~
tptacek
Almost everything you wrote just now has problems.

1\. Has a well-known problem, which is why "Practical" suggests using a nonce.

2\. SHA-256'ing a known plaintext doesn't authenticate a messge. In fact, even
simply taking a secret key and appending it to your message before you SHA-256
the message isn't secure; there's a reason HMAC is as complicated as it is.

3\. This whole blog post was about things that go wrong with CBC mode. For
instance, nothing you wrote addressed padding checks --- btw, not strictly a
"timing" attack.

4\. Using a password as a crypto key is bad for reasons illustrated in the
post, which is why secure keystores use random bytes. Hashing 1000 times has
nothing to do with your key space.

5\. This is like saying "your algorithm should be secure". Easy to say.

6\. Everything we're talking about going wrong goes wrong even when you're
using AES.

So, yes, I believe you don't understand what's so hard about encryption.
You're obviously smart and you've taken some time with this material, and I
still don't believe you'd get this right in your first fielded version.

~~~
DenisM
I am not advocating usage of passwords for encryption, I am saying that if you
have to use passwords, this is how you use them. Obviously a strong random
bytestring is the best key one could possibly have and should be used when
possible.

Anyways, re padding - what if I hash the padding as well? surely an attacker
would not get anything of value by playing with it?

Thanks

~~~
tptacek
<http://www.isg.rhul.ac.uk/~kp/padding.pdf>

------
paulgb
Great post, not that I can claim to understand all of it. One question:

 _Just put a “for” loop around SHA-1 and run it 1000 times to generate the
key; that’ll at least slow down a brute force attack. SHA-1 is lightning fast.
By itself, it’s a crappy way to generate a key._

I see the purpose of this, but doesn't it also provide 999 more opportunities
for a hash collision to happen? Are you so confident in the collision
resistance of SHA-1 that this isn't an issue?

My (probably naive) proposal would be to keep the result of the first hash and
XOR it with the result of each subsequent hash. Bad idea?

~~~
pmjordan
As far as I can tell, you're confusing collisions with cycles. I don't think
any of the main cryptographic hashes have problems with short cycles.

~~~
paulgb
I could well be confusing the two, because I'm not sure if I even know what
cycles are :-). I'm guessing from the name that cycles are when h(h(h(...
h(a)))) = a? That's not quite what I'm getting at.

I'll try explaining my concern a little better. Suppose you have a hash
function h. Let p be the probability that h(x) == h(y) for any unique x and y.

Now suppose you have some arbitrary inputs a and b.

    
    
      h(a) = h(b) with probability p
      h(h(a)) = h(h(b)) with probability 2p
      h(h(h(a))) = h(h(h(b))) with probability 3p
      etc.
    

The probability at each step increases because a collision is possible at each
step, but a collision in a previous step _guarantees_ a collision in the
current one.

With a good hash function, p should be astronomically low, so I guess that's
why it isn't a problem in this case? Or am I completely off-base in my
assumptions?

~~~
pmjordan
I see what you mean now. And your intuition on cycles was spot-on.

And to answer your question (I'm not a crypto expert, so take with a pinch of
salt):

With SHA-1, that probability is generally accepted to be 2^(-160). 1000 is
roughly 2^10, so the final probability is about 2^(-150). I guess that's still
improbable enough...

But I think you do make a very valid and interesting point.

~~~
tptacek
The loop has nothing to do with the probability of a collision. If your key
comes out of /usr/share/dict/words, you have a 2^18 search space, not a 2^160
space. If you have a 2^18 search space, you better hope your constant factors
are very high. Hence the loop.

~~~
paulgb
Yeah, I get why the loop is there. But I still think it increases the
probability of a collision by nearly 1000x over the probability of a collision
without the loop.

Maybe a better way to explain my concern is to consider hashing something once
versus twice, rather than 1000 times.

Let h be a hash function and p be the probability P(h(x) = h(y)) for randomly
chosen x and y.

Say you have unique inputs a and b.

Obviously, P(h(a) = h(b)) = p.

Now consider P(h(h(a)) = h(h(b))). h(h(a)) = h(h(b)) if:

    
    
      h(a) = h(b) [probability p]
         OR
      h(a) != h(b) [probability 1-p], and
      h(h(a)) = h(h(b)) [probability p].
    

So the total probability when we repeat the hash is p + (1-p) * p

Since p is small, the probability of a collision has nearly doubled just by
applying the hash function again.

[note: Higher up in this thread I mistakenly implied that the probability
increased linearly. With a low number of iterations and realistic values for p
it should be close to linear, but not exact.]

[edit:formatting]

~~~
tptacek
H(m) -> SHA1(SHA1(m)) is not the same hash function as H(m) -> SHA1(m).

~~~
paulgb
Right, but if SHA1(m) = SHA1(n), then SHA1(SHA1(m)) = SHA1(SHA1(n)) also.

The converse is not true. There will be cases where SHA1(SHA1(m)) =
SHA1(SHA1(n)), but SHA1(m) != SHA1(n). (Or does SHA1 somehow guarantee that
this will not happen?)

So it seems to follow that the chances that H(a) = H(b) when H(m) ->
SHA1(SHA1(m)) must be higher than when H(m) -> SHA1(m).

Am I missing something?

~~~
tptacek
I think you're going to need to better explain the logic behind graf 2 before
I can answer you.

But on the off chance that we can end this thread gracefully, I'll point out
that SHA256(SHA(256(m), m) is SHAd256(m), and considered by Ferguson to
address security concerns in straight SHA256.

~~~
paulgb
It boils down to this:

If SHA1(n) = SHA1(m), _we know_ SHA1(SHA1(n)) = SHA1(SHA1(m)). (this is a
consequence of SHA1 being deterministic)

Also, even if SHA1(n) != SHA1(m), there is _still a chance_ that SHA1(SHA1(n))
= SHA1(SHA1(m)). (this is a consequence of collisions: SHA1(a) may equal
SHA1(b) even if a != b.)

So the probability that SHA1(SHA1(n)) = SHA1(SHA1(m)) must be more than the
probability that SHA1(n) = SHA1(m).

~~~
tptacek
You realize that despite the fact that I still don't follow the logic here,
SHA collisions have no meaningful impact on the security of PBKDF, right?

~~~
paulgb
Actually I didn't, but that more or less answers my original question. Now I
realize why you (or rather, your character) said that it didn't matter that
md5 is broken in this case.

In general though, I'm still convinced that putting a hash function in a loop
will increase the probability of a collision.

Or maybe I misunderstood your point about the loop? I assume by putting a loop
around the hash you mean something like this:

    
    
      key = SHA1(password)
      1000 times do
        key = SHA1(key)
      return key

~~~
tptacek
Look I don't want to be dense here --- and there's a good chance that's
exactly what's happening --- but it sounds to me like you're talking about
collisions between SHA1, SHA1(SHA1), SHA1(SHA1(SHA1)), etc.

Rename the functions. SHA1(SHA1) is now SHA1', SHA1(SHA1(SHA1)) is now SHA1''.

Explain why I care if there's an m and an n such that SHA1'(m) = SHA1''(n)?

They're two different hash functions.

~~~
jibiki
No, you're still missing the point :)

He's saying that SHA1'(m) = SHA1'(n) implies that SHA1"(m) = SHA1"(n).

Let's do this with a very simple, silly hash function.

Let h(x) = (x/2)%10 (in integer arithmetic, e.g, 5/2 = 2, so h(5) = (5/2)%10 =
2%10 = 2.) So then let h2(x) = h(h(x)) = (h(x)/2)%10, and h3(x) = h(h(h(x))) =
(h2(x)/2) % 10.

Here's a table:

    
    
      x:  h  h2  h3
      0   0  0   0
      1   0  0   0
      2   1  0   0
      3   1  0   0
      4   2  1   0
      5   2  1   0
      6   3  1   0
      7   3  1   0
      8   4  2   1
      9   4  2   1
      10  5  2   1
      11  5  2   1
      12  6  3   1
      13  6  3   1
      14  7  3   1
      15  7  3   1
      16  8  4   2
      17  8  4   2
      18  9  4   2
      19  9  4   2
      20  0  0   0
      21  0  0   0
      ...
    

Hopefully this table shows that collisions are much more likely for h2 than
for h, and more likely for h3 than for h2. The range of h3 is smaller than the
range of h2 which is smaller than the range of h.

~~~
paulgb
Right, that's what I had in mind.

Of course, it's a bit of a jump to assume that the result is the same when you
use a cryptographic hash function instead of a trivial one.

But regardless of the hash function, the same problem will occur unless there
are no collisions for input between 0 and 2^160 - 1 (inclusive). From what I
can tell, SHA1 can't guarantee this.

------
ars
Related to this: <http://www.codinghorror.com/blog/archives/001267.html> I
assume.

~~~
jcl
Perhaps... A comment on that article:
<http://news.ycombinator.com/item?id=615209>

Is the "AES" line a tptacek catchphrase? Or is he quoting someone else?

~~~
tptacek
It's a me catch-phrase. I am like Poochie that way.

------
snorkel
As far as doing it wrong can we go back to "app ‘A’ set a cookie with your
account ID in it" and discuss why on Earth the app works like that???

~~~
tptacek
Because app A and app B can't share state directly, and can't talk directly to
each other? You're basically asking why federation and single sign-on exist.

