Set up an encryptor:
irb> e = OpenSSL::Cipher::Cipher.new('aes-256-ofb')
irb> e.key = "\x11" * 32
irb> e.iv = "\x00" * 16
irb> d = OpenSSL::Cipher::Cipher.new('aes-256-ofb')
irb> d.key = "\x11" * 32
irb> d.iv = "\x00" * 16
irb> ciphertext = (e << "A * 40")
irb> mask = ("187 she wrote".to_bignum ^ ("A" * 13).to_bignum).to_rawstring
irb> new_ciphertext = (ciphertext.to_bignum ^ mask.to_bignum).to_rawstring
irb> d << nct
=> "AAAAAAAAAAAAAAAAAAAAAAAAAAA187 she wrote"
Definitely an interesting read even for crypto dilettantes. Perhaps, especially for crypto dilettantes.
I took Rivest's Computer and Network Security class in college and the most important takeaway for me, far outstripping all of the interesting technical content, was "Don't implement crypto."
[edit: Thomas, thanks for the book recommendation below, I'll definitely grab a copy]
Everything that is wrong with "Applied" is right with "Practical". "Here are 4 modern block ciphers. We wrote one of them. Don't use it. You should use AES, but if you're a paranoid, use Serpent. But really use AES." It's great stuff, especially because if you really read it, you're going to end up not implementing crypto directly at all.
Unfortunately, there's seems to be a huge gap between the average working programmer, who often has little idea how to do security, and the security guru's who are often barely intelligible to the rest of us. :-)
Sadly, the result of most programmer's poor understanding of security is even worse than the bad crypto implementation in the article. It's often something as basic as thinking it's O.K. to store user passwords un-hashed. (http://news.ycombinator.com/item?id=628680)
What's the best battle-tested library (and call) for implementing exactly that, without making any of the common mistakes?
You've made me so nervous about everything I thought was true that I did a hg revert and am looking at gpgme bindings.
You've done a good deed, I think.
A question to start up founders: How do you guys handle security? Do you hire consultants/pen-testers? Do you just copy+paste code from the internet? Do you use whatever framework's components for it?
Gotta love putting a lot of effort into an HN comment thread on a submission that ends up flagged. =)
A place like this exists? All I can think of is Intelligensia.
Our office is in fact a few floors up from the Intelligentsia in the Monadnock building.
Keyczar and cryptlib are two libraries that offer a high-level interface that is deliberately hard to screw up. But Google Keyczar is very new (it recently had a really horrible flaw) and cryptlib is not free unless you GPL your code.
But I actually kind of agree (at the very least, it's visually noisier than the prose wall of text). There's a backstory to this; I haven't blogged in almost a year, and my most infamous blog post from before that was also a screenplay, so doing this post in this style was also an in-joke:
(This post is funnier than today's post).
I'm sure you're glad you know this now.
all that is a long way of saying, in spite of the visual formatting, it's a very interesting article. and personally, i think the the screenplay itself works.
Just have a server side session store and all of this cookie encryption crap just vanishes. Of course that won't help you with session fixation etc, but the post doesn't address that stuff either. Do it over TLS and you're pretty safe though.
And let's not forget the author's suggestion to pad out the cookie with 1000 bytes to make it harder to falsify. That cookie gets send with every single request. 20 images on the page, you're sending 20KB of junk up just to load the page. On a connection with slow upstream, like say ADSL, you can easily add a second or two of request latency. You might not care but some people sure do.
And come on, I read until the very end expecting to hear what "processes or threads" have to do with security, so tell us already!
Also processes vs threads was just another interview question, it was only slightly more relevant to the story than unicorns in space.
I assume my hash comparison function is constant time (eg. XOR(a[x],b[x])==0), rather than comparing of char-by-char.
Often you don't need confidentiality, and in those cases, HMAC can be a safer bet than a secure encrypted message format. You want to use HMAC though, not a simple hash with a secret key in it.
Oops, you invented your own keyed MAC and now I can append any data I want.
But it allows everyone to concentrate on how you're managing the secure context instead of what algorithm has been shown to cause cancer in rats.
1. Use a blocks-size unique prefix (IV) for each message (random will do as well)
2. SHA-256 your entire message before encryption and add the hash value at the end to prevent tampering
3. Use AES-256 with chaining to encrypt
4. Use SHA-256 to turn a password into a key.
If your key space is small harden it with hashing 1000 times or so.
5. The time your algorithm takes should be always the same (or at worst one time for success and one for failure)
6. Don't use any other symmteric crypto algorithm.
1. Has a well-known problem, which is why "Practical" suggests using a nonce.
2. SHA-256'ing a known plaintext doesn't authenticate a messge. In fact, even simply taking a secret key and appending it to your message before you SHA-256 the message isn't secure; there's a reason HMAC is as complicated as it is.
3. This whole blog post was about things that go wrong with CBC mode. For instance, nothing you wrote addressed padding checks --- btw, not strictly a "timing" attack.
4. Using a password as a crypto key is bad for reasons illustrated in the post, which is why secure keystores use random bytes. Hashing 1000 times has nothing to do with your key space.
5. This is like saying "your algorithm should be secure". Easy to say.
6. Everything we're talking about going wrong goes wrong even when you're using AES.
So, yes, I believe you don't understand what's so hard about encryption. You're obviously smart and you've taken some time with this material, and I still don't believe you'd get this right in your first fielded version.
Anyways, re padding - what if I hash the padding as well? surely an attacker would not get anything of value by playing with it?
Just put a “for” loop around SHA-1 and run it 1000 times to generate the key; that’ll at least slow down a brute force attack. SHA-1 is lightning fast. By itself, it’s a crappy way to generate a key.
I see the purpose of this, but doesn't it also provide 999 more opportunities for a hash collision to happen? Are you so confident in the collision resistance of SHA-1 that this isn't an issue?
My (probably naive) proposal would be to keep the result of the first hash and XOR it with the result of each subsequent hash. Bad idea?
I'll try explaining my concern a little better. Suppose you have a hash function h. Let p be the probability that h(x) == h(y) for any unique x and y.
Now suppose you have some arbitrary inputs a and b.
h(a) = h(b) with probability p
h(h(a)) = h(h(b)) with probability 2p
h(h(h(a))) = h(h(h(b))) with probability 3p
With a good hash function, p should be astronomically low, so I guess that's why it isn't a problem in this case? Or am I completely off-base in my assumptions?
And to answer your question (I'm not a crypto expert, so take with a pinch of salt):
With SHA-1, that probability is generally accepted to be 2^(-160). 1000 is roughly 2^10, so the final probability is about 2^(-150). I guess that's still improbable enough...
But I think you do make a very valid and interesting point.
Maybe a better way to explain my concern is to consider hashing something once versus twice, rather than 1000 times.
Let h be a hash function and p be the probability P(h(x) = h(y)) for randomly chosen x and y.
Say you have unique inputs a and b.
Obviously, P(h(a) = h(b)) = p.
Now consider P(h(h(a)) = h(h(b))). h(h(a)) = h(h(b)) if:
h(a) = h(b) [probability p]
h(a) != h(b) [probability 1-p], and
h(h(a)) = h(h(b)) [probability p].
Since p is small, the probability of a collision has nearly doubled just by applying the hash function again.
[note: Higher up in this thread I mistakenly implied that the probability increased linearly. With a low number of iterations and realistic values for p it should be close to linear, but not exact.]
The converse is not true. There will be cases where SHA1(SHA1(m)) = SHA1(SHA1(n)), but SHA1(m) != SHA1(n). (Or does SHA1 somehow guarantee that this will not happen?)
So it seems to follow that the chances that H(a) = H(b) when H(m) -> SHA1(SHA1(m)) must be higher than when H(m) -> SHA1(m).
Am I missing something?
But on the off chance that we can end this thread gracefully, I'll point out that SHA256(SHA(256(m), m) is SHAd256(m), and considered by Ferguson to address security concerns in straight SHA256.
If SHA1(n) = SHA1(m), we know SHA1(SHA1(n)) = SHA1(SHA1(m)). (this is a consequence of SHA1 being deterministic)
Also, even if SHA1(n) != SHA1(m), there is still a chance that SHA1(SHA1(n)) = SHA1(SHA1(m)). (this is a consequence of collisions: SHA1(a) may equal SHA1(b) even if a != b.)
So the probability that SHA1(SHA1(n)) = SHA1(SHA1(m)) must be more than the probability that SHA1(n) = SHA1(m).
In general though, I'm still convinced that putting a hash function in a loop will increase the probability of a collision.
Or maybe I misunderstood your point about the loop? I assume by putting a loop around the hash you mean something like this:
key = SHA1(password)
1000 times do
key = SHA1(key)
Rename the functions. SHA1(SHA1) is now SHA1', SHA1(SHA1(SHA1)) is now SHA1''.
Explain why I care if there's an m and an n such that SHA1'(m) = SHA1''(n)?
They're two different hash functions.
He's saying that SHA1'(m) = SHA1'(n) implies that SHA1"(m) = SHA1"(n).
Let's do this with a very simple, silly hash function.
Let h(x) = (x/2)%10 (in integer arithmetic, e.g, 5/2 = 2, so h(5) = (5/2)%10 = 2%10 = 2.) So then let h2(x) = h(h(x)) = (h(x)/2)%10, and h3(x) = h(h(h(x))) = (h2(x)/2) % 10.
Here's a table:
x: h h2 h3
0 0 0 0
1 0 0 0
2 1 0 0
3 1 0 0
4 2 1 0
5 2 1 0
6 3 1 0
7 3 1 0
8 4 2 1
9 4 2 1
10 5 2 1
11 5 2 1
12 6 3 1
13 6 3 1
14 7 3 1
15 7 3 1
16 8 4 2
17 8 4 2
18 9 4 2
19 9 4 2
20 0 0 0
21 0 0 0
Of course, it's a bit of a jump to assume that the result is the same when you use a cryptographic hash function instead of a trivial one.
But regardless of the hash function, the same problem will occur unless there are no collisions for input between 0 and 2^160 - 1 (inclusive). From what I can tell, SHA1 can't guarantee this.
Is the "AES" line a tptacek catchphrase? Or is he quoting someone else?