You're right, of course. To be clear, though, the "doom principle" simply says "encrypt then MAC or die". TLS reverses the order (it was designed before this problem was seriously studied). TLS's CBC mode is flawed, but CBC mode in general --- I still wouldn't use it! --- is basically OK.
I know that I'm going to be thought a heretical fool for saying this, but overall I would consider CBC mode "more secure" than GCM. Of course encrypt-then-MAC would be preferable (which TLS now supports, though implementations are scarce) ... but even with the broken MAC-then-encrypt variant that we have today ... still more secure.
It's a case of choose your poison. With the MtE CBC vulns, we've seen some pretty hard-to-exploit padding oracles; I don't think that these attacks are in general practical or a big risk for real-world users and workloads. It's awful, ugly stuff, but it's also something that the internet has been able to stay on top of with patching.
On the other hand, with the mass deployment of AES-GCM, we've made a length-preserving cipher the world's default for network traffic. An exactly length-preserving mode makes realistic mass content-fingerprinting attacks far cheaper and more feasible. These are very practical passive attacks that happen mostly out of band ... so I think it's a mistake to make them easier.
In real terms, AES-GCM reduces by an exponential factor the amount of state that a FVEY-style attacker must maintain to mostly-successfully identify what webpages you are browsing. It makes the attack so much more feasible, in the realm of an undergrad term project, that I would be surprised if it is not occurring. Even a modestly resourced security agency with relatively sophomoric skills can do it.
To my mind, making a practical passive attack much cheaper, to the point that it may be occurring (though that's a guess; I have no information), just so we can turn our noses up at a cryptographic construct that offends us but is less practical to attack ... seems like the wrong trade-off.
At least with TLS1.3 and QUIC we can pad encrypted data and get a measure of length-hiding back; enough to at least raise costs for passive fingerprinting attacks (and raise those costs more than CBC ever could on its own).
Yes! I've experimented with server-side changes, but unfortunately the request length leaks information too, so the client needs to do something similar.
In real terms, I think the consequence of the rapid mass-migration away from CBC is that passive attacks got far easier; instead of building graphs and de-fuzzers, analysts can identify content using combined request- and response-length signatures. All of that happened purely at the cryptographic layer, and it's unfortunate.
In general, it's hard to ask HTTP servers and applications higher up the stack to take it on; the folks writing those don't tend to be cryptography or SIGINT experts. It's still easy enough to find subtle CRIME-like compression attacks against servers and applications that mix compression and untrusted input, which is some sign of how effective we all are at chasing down similar kinds of issues at those layers.
Certainly this logic doesn't suggest using CBC instead of an AEAD for a new, non-TLS system, where, if traffic analytic attacks were a concern, you'd just bake that into the protocol itself while still claiming the benefits of modern authenticated encryption.
That's a tough call. We've thoroughly learned that not only should developers not build their own cryptographic primitives, they shouldn't even trivially combine them on their own. "Use AEAD" (or better yet: use Noise) is good advice for application and protocol designers who on their own probably wouldn't even think about the problems ... but it's not sufficient.
We haven't seemed to learn that length hiding, traffic analysis, and shared compression dictionaries are all so firmly in the realm of what encryption is supposed to do ... make content secret ... that developers also can't be trusted to get these right on their own.
So if you're in a position to tell someone to use AEAD instead of CBC, I'd make sure to also tell them to think hard about length hiding, message timing, and shared compression dictionaries, and whether those are important concerns in their environment. Oh, and they still have to watch out for IV problems with plenty of the AEAD implementations.
Personally I long for some better crypto boxes than "AEAD", and I should really get off my butt and define one; at the level of total hand-waving, I trust cryptographers to review this stuff much better than protocol designers can.
Like here I am claiming that TLS, the most pervasive network encryption protocol, has actually gotten measurably less secure at encryption, by at least one important measure, in the last 5 years.
Measurably? I think you have a point about length hiding, although many models seem to just dismiss it, but I'm not sure the accidental padding you get from CBC is measurably better.
Scenario 1: https, downloading a file. Plus or minus a few bytes is still accurate enough for identification, no?
Scenario 2: ssh, where one keystroke equals one packet, regardless of length. I can still hear your password.
Did you have a scenario in mind where CBC padding provides a meaningful degree of length obfuscation?
The case I think about the most is web fingerprinting. For example if the authorities in a country that funnels all external traffic through a point of mediation want to identify who is reading certain specific pages on wikipedia - they likely can.
The most effective forms of web fingerprinting focus on static resources; the two most prominent papers focus on Google Maps tiles and Netflix videos, both collections of static resources that were compressed offline. Static resources are easier to target because the size of the response is fixed, and the domains they're hosted on tend to be relatively free from cookies, auth tokens, query strings, and other noise.
In practical terms for attackers, it's easiest to focus on the images, CSS, and JavaScript that your browser is downloading, since these are usually static, pre-compressed, and on CDN-hosted domains. This is pretty feasible against Wikipedia, but also Facebook shares, tweets, etc ... anywhere you're viewing content that includes images or video.
O.k., so far nothing controversial; nobody would even get much of a paper published about this. We bucket it under "Traffic Analysis", and while it might shock the average internet user, cryptographers aren't surprised.
Where CBC vs GCM comes in is that GCM reduces the "cost" of these fingerprints and increases the effectiveness of the whole approach. If a webpage contains, say, 4 images and an external CSS reference, that gives you 10 points of information to fingerprint: 5 request lengths and 5 response lengths. Given the spread of URL lengths and file sizes, it's pretty easy for that combination to be unique, or at least to narrow things down to a small set of possibilities. With CBC, each request length and response length is rounded up to the next 16-byte boundary. That's not a huge amount of padding, more would be better, but it still reduces the uniqueness of the fingerprints by an exponential factor. It's particularly effective on the request lengths, which are small to begin with.
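As a toy illustration of how much that rounding coarsens a single observation (a minimal sketch; the size range is an assumption, real traffic varies more):

    # How many distinguishable values does one length observation carry?
    def pad16(n):
        # CBC/PKCS7 always pads, rounding up to the next 16-byte block.
        return (n // 16 + 1) * 16

    request_lens = range(40, 201)                     # plausible request sizes
    print(len(set(request_lens)))                     # 161 exact lengths
    print(len(set(pad16(n) for n in request_lens)))   # 11 padded buckets

And since a fingerprint is the combination of all ten measurements, that per-observation coarsening compounds across every request and response.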
Now with more requests, and a bigger graph, it's still probably possible to de-fuzz this; but my point is that in practical terms, when you have exact lengths, you just don't need to do any of that. With exact lengths, you can ask a junior SDE to code this up for you in a few weeks. Someone who barely understands tcpdump and mysql can build this. With blocks and /some/ length hiding, the same person finds that they get less signal and have to correlate between more requests and pages; it might not even be practical for many sites. On principle, I think it's dumb to lower costs for practical attacks like this.
Okay, so I'm assuming you can help the protocol designers make the appropriate improvements in TLS 1.4, or whatever the next major version gets called.
In the meanwhile, what about those of us actively trying to deprecate TLS 1.0 and 1.1 and support exclusively TLS 1.2+, with certain preferred cipher suites?
Maybe? I believe with enough traffic you can determine the actual content size by calculating the average apparent padding length. (Compare JavaScript timing side channel attacks that can remove random clock jitter, or Onion Routing traffic analysis.)
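For instance (a minimal sketch, assuming a hypothetical scheme that adds 0-15 random bytes of padding per observation):

    import random

    random.seed(7)
    TRUE_LEN = 1337                    # hypothetical real content size
    # Observe the same content many times, each with fresh random padding.
    observed = [TRUE_LEN + random.randint(0, 15) for _ in range(10000)]
    # The random padding averages out to its mean (7.5), exposing TRUE_LEN.
    print(round(sum(observed) / len(observed) - 7.5))   # -> 1337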
What you really want is padding to a fixed-length block. That requires restricting the maximum allowed length of confidential fields. (Could vary per field.) Yet such "arbitrary" limits have been frowned upon for ages as poor design.
That said, there's usually nothing stopping people from imposing their own limits and padding accordingly, server- and client-side. But without enforcement across the entire software stack it's difficult to maintain the invariant.
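For example, a minimal sketch of bucket padding (the bucket sizes and the 0x80-delimiter scheme are illustrative assumptions, not any standard):

    BUCKETS = (256, 1024, 4096, 16384)   # every message becomes one of these

    def bucket_pad(msg: bytes) -> bytes:
        for size in BUCKETS:
            if len(msg) < size:
                # One 0x80 delimiter byte, then zeros, so padding is removable.
                return msg + b"\x80" + b"\x00" * (size - len(msg) - 1)
        raise ValueError("message exceeds the maximum allowed length")

    def bucket_unpad(padded: bytes) -> bytes:
        return padded[:padded.rindex(b"\x80")]

An observer now sees only four possible lengths, at the cost of enforcing that hard maximum.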
> At least with TLS1.3 and QUIC we can pad encrypted data and get a measure of length-hiding back; enough to at least raise costs for passive fingerprinting attacks (and raise those costs more than CBC ever could on its own).
I have to say this feels like both the more actionable and more insightful output of your... heresy.
We are not going back to CBC. You say that some CBC attacks don't seem practical. The trouble with the Web is it's like some lunatic deliberately designed a system to be attacked. Oh you have a chosen plaintext attack that needs 1 million repetitions? Impractical. Wait, Javascript exists. OK, but even with Javascript who'd ever have a system that terminates HTTPS for completely unrelated parties on the same physical hardware - there's no way we'll get to exploit that? Oh, bulk hosting exists. OK. Still nobody would build a crypto system in which one party is just completely anonymous all the... oh wait, that's exactly how Tim's system was designed. Eric Rescorla has some great slides about this.
Whereas "Hey, let's pad every message to hit the path MTU size for packets" totally feels like something vendors could do next week and make a huge impact with no real cost. A lot of network devices can't hit their target speeds for small packets anyway, so the performance cost may be not merely negligible but literally non-existent.
I have to say though, I'm not actually clear on why we couldn't pad TLS 1.2's AEAD modes to the same effect?
The TLS 1.3 padding is _clever_ but what does it do here that makes it more important for your passive fingerprinting attack than the older padding scheme (with an AEAD cipher)?
AES is a block cipher that operates on 128-bit blocks exclusively. If you look at the implementation of AES-GCM, it also operates on 128-bit blocks. AES-GCM is basically just counter mode with built in correct MAC handling.
Your claim that AES-GCM is length-preserving is completely false, other than the obvious rounding to multiples of the block size, which is the same as CBC mode or any other mode.
Your statement isn't heretical, it's just based on a false statement. If the premise of your statement were true, your extra analysis would be correct.
I would read up on the details of the mode itself; this graph is a good place to start, then move on to the mathematical section which talks about the 128-bit blocks:
First, you might want to look closer at who you're talking to.
Second, you might want to read a little more about how counter mode works.
Or, why not just try it yourself? "pip install cryptography", pop open a Python shell, and encrypt some stuff. There's only so much you can get from Wikipedia. Because: counter mode doesn't create ciphertexts that are multiples of the block size; not doing that is the point of counter mode.
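Something along these lines (a sketch; it uses the cryptography package's AES/CTR and AESGCM APIs, with arbitrary test strings):

    import os

    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = os.urandom(32)
    for pt in (b"hi", b"hello there", b"x" * 33):
        # CTR is a stream mode: ciphertext length == plaintext length.
        enc = Cipher(algorithms.AES(key), modes.CTR(os.urandom(16))).encryptor()
        ctr_ct = enc.update(pt) + enc.finalize()
        # GCM runs CTR underneath: plaintext length plus a fixed 16-byte tag.
        gcm_ct = AESGCM(key).encrypt(os.urandom(12), pt, None)
        print(len(pt), len(ctr_ct), len(gcm_ct))
    # -> 2 2 18 / 11 11 27 / 33 33 49: nothing rounds up to a block multiple.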
I opened up my cryptography book by Jonathan Katz, and found this regarding CTR mode:
> As with OFB mode, another "stream-cipher" mode, the generated stream can be truncated to exactly the plaintext length
This is contrary to what Jonathan Katz told a class I took at the University of Maryland, which is odd. I've used CTR mode before and exploited CBC padding oracles, but I don't recall being able to use CTR mode this way; then again I rarely used it, since we focused on CBC exploitation and a bit too much on CPA/CCA proofs. After looking at the diagrams, it's clear that since the message is XOR'd with the pseudorandom keystream output of the encryption function, CTR mode can indeed be truncated: the leftover keystream is simply discarded. So I had a misunderstanding in my head about CTR mode.
Now on to GCM. Unfortunately my edition of Katz' book doesn't include GCM so I have to default to Wikipedia. The last XOR is more than likely where a truncation can occur, so I was wrong about GCM mode as well.
As for Python tests (package already installed; no need to use pip when it's in the various Linux repos):
While this proves your point, I want to make clear that ignoring the reasoning and just trusting the output of an implementation isn't really a good way of learning things (although it's complementary). I was using the Wikipedia and text references so I could understand why it allows variable lengths, and at first look the construction didn't appear to allow truncation.
Despite all of this, the CTR-mode section of the book includes a CPA-security proof, while the CBC section says it is vulnerable to CPA. I'm going to try to dig through that to see why. If they are cognizant of the fact that same-length attacks are something that makes you vulnerable, there must be a reason why they believe CTR/GCM are not.
You did not exploit a CBC padding oracle in CTR mode, because CTR mode isn't padded. Again: that is the point of CTR mode: to transform a block cipher (or, for that matter, a keyed hash function) into a stream cipher.
GCM is built on top of CTR; CTR is literally in the name. Like CTR, GCM is a stream cipher mode. There is no GCM padding.
The output of the code you pasted should be 26, 27, and 27, not 28 (as it is on my system). "test12" and "test13" have the same length, and consume the same amount of CTR keystream.
You seem hell-bent on making me look bad based on a statement I made which I've already conceded was incorrect. This account has no ties to my actual identity and I have no intention of creating any, so I have no idea why you'd care about anonymous dorkusmcgavin's reputation, which will never take off by definition anyway.
This is sort of why I don't like tying my identity to any accounts on the internet and don't engage in academia. When you do, you have to dig in deep on your statements and then end up with meaningless ad-hominem attacks on identity rather than seeking truth to the discussion. This is increasingly happening all over the internet, too, so I might just drop off of it entirely and just go back to reading for my own sake.
> You did not exploit a CBC padding oracle in CTR mode, because CTR mode isn't padded.
Correct, I have never done that nor have I claimed to. I said the class focused on CBC padding oracles and I exploited CBC mode in that way. I also stated that I've used CTR mode before. These were two different statements. This is so far off-topic that I don't understand why you care.
> GCM is built on top of CTR; CTR is literally in the name. Like CTR, GCM is a stream cipher mode. There is no GCM padding.
You are literally responding to a comment where I show there is no padding and concede you are correct here.
> The output of the code you pasted should be 26, 27, and 27, not 28 (as it is on my system). "test12" and "test13" have the same length, and consume the same amount of CTR keystream.
It was supposed to be testbytes123, and I copied from an IPython window where I had typed both out. I apparently not only typed it incorrectly the first time and corrected it, but then copied and pasted the wrong line when coming back to this window.
Look, I'm trying to get to the bottom of the original debate here; I don't really care too much about my copy-paste errors, or about stating for a ninth time that my original comment was incorrect because it's been a while since I went through these courses. I want to learn things, not have an ego fight. Let's move on.
The original topic being, why exactly is AES-GCM considered more secure than CBC by expert cryptographers?
I've been re-reading parts of Katz' book and I think I can come up with my own interpretation, but I'm trying to find a section where he mentions this (he comes very close in a few sections, when discussing the classes of proofs themselves and in his old-timey stories). I also think he could word it better, and his reasoning might differ from my theory.
I think the point is that if you can control the length of the plaintext, it doesn't matter whether the output is grouped into clusters of X bits or not, as long as the message is at least of a certain length (obviously if you encrypt the strings "true" and "false" you'll be able to tell the difference if you know the context). If you can control the plaintext length, then you'll most likely always be able to determine the length of other encrypted information in the resulting ciphertext; it just becomes slightly more work.
So say I control X bytes of a plaintext that also includes other text before being encrypted (HTTP being the example here). If it's padded, it might output Y bytes of ciphertext. If you then supply (X+1) bytes, you could again get Y bytes of ciphertext. This can be perceived as an advantage, but if the attacker grows their plaintext incrementally and passes it to the server, somewhere between (X+2) and (X+Z) bytes the output ticks over to the next block, and you get (Y+blocksize) bytes of ciphertext. At that point, they can hang there at that specific plaintext length and use it to perform the same length-based attack as without padding.
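A toy sketch of that tick-over (pure length arithmetic, no actual crypto; the secret string is a made-up stand-in):

    def padded_len(n, block=16):
        # PKCS7 always adds at least one byte, then fills out the block.
        return (n // block + 1) * block

    SECRET = b"secret-cookie-value"       # unknown-to-attacker suffix
    base = padded_len(len(SECRET))        # observed length with no input

    extra = 0
    while padded_len(extra + len(SECRET)) == base:
        extra += 1                        # grow the controlled plaintext
    # The length jumps exactly when extra + len(SECRET) reaches base.
    print("secret length =", base - extra)   # -> 19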
I'm going to keep digging because this question is really fascinating to me right now. Katz does have sections on fixed-length encryption mechanisms, where the ciphertext is defined to be a certain length, so I'm looking around there right now; I have a feeling he might state somewhere that any stream cipher (and he treats CBC mode alongside the stream-cipher modes) is by default less secure, period. He might even have a proof somewhere on that.
The padding in CBC mode is there simply because it's necessary, given the way the plaintext moves into the encryption function. Padding oracles are dangerous because they exploit the type of padding, not because they exploit the length. They also do so in a way that allows you to decrypt entire blocks. IV reuse can bump into a similar block-border issue as I was describing above, too.
This is a lot of text responding to some pretty simple facts. The only value judgement I've made here is that you might want to look up who people are before you decide to tell them they're spreading false statements (that's with respect to Colm, not me; you can say whatever you'd like about me.)
The rest of it is just: here are things I think you might want to know. Read them again: none of them say you're a bad person. They're just statements of fact.
GCM is "more secure" than CBC because it's authenticated (technically, it's authenticated encryption with associated data). CBC is not authenticated; it's malleable, so you can violate integrity and, with CCAs like Vaudenay's oracle, confidentiality as well.
You can separately authenticate CBC with a MAC (usually HMAC); that's the "generic composition" of CBC and HMAC. Generic composition is generally believed to be secure in the encrypt-then-MAC construction, but, as we're seeing with TLS here, not the other way around.
In neither generic EtM CBC+HMAC nor GCM should control of any plaintext anywhere in the message allow you to violate cryptographic integrity or confidentiality. But CBC+HMAC != CBC, bringing us back around to our point here.
Padding oracles don't really exploit PKCS7 padding itself so much as they exploit the decryption process of CBC, coupled with the ability to choose ciphertexts (again, a capability GCM takes away from you).
> This is a lot of text responding to some pretty simple facts. The only value judgement I've made here is that you might want to look up who people are before you decide to tell them they're spreading false statements (that's with respect to Colm, not me; you can say whatever you'd like about me.)
You are correct that my statements were wrong, but I don't care if it's Jonathan Katz who's spreading misinformation: if I believe someone is spreading misinformation, I'm going to call them out on it. I was WRONG this time (again, do I have to admit that?), but I am trying to address the original question. Half of that "large text" was new information that you've steered the topic away from.
And you've done it yet again, changed the topic.
My offtopic sub-point here is that you are taking large quantities of information out of context for no reason other than to change the subject away from the actual search for information.
> But CBC+HMAC != CBC
Here's what Colm said:
> I know that I'm going to be thought a heretical fool for saying this, but overall I would consider CBC mode "more secure" than GCM.
OMG he said CBC mode (NOT CBC-HMAC!) is "more secure" than GCM! Go call him out on that one sentence taken out of context!
Obviously we are talking about CBC-HMAC, because that's the topic. Colm started on that topic, and if you look at his entire post, the context makes that clear. If you instead granted that I'm talking in the same context, you wouldn't get off on a tangent where I supposedly think Colm believes CBC mode without an HMAC is more secure (he doesn't, obviously). You are right, you aren't attacking me; you've created a strawman and just left it there. It's weird. It changes the topic. It serves no purpose except to derail the conversation.
Colm is arguing that CBC with a MAC is more secure than GCM, but if you applied the strict way you are interpreting my statements to him, you would not believe that; you are applying a double standard to me. Colm apparently is allowed to make contextual statements but I am not.
Again, back to the original topic: I'm trying to figure out why the encryption experts believe the opposite (which he also says is the overwhelming majority opinion), and why we are even conversing in the first place. I doubt I can keep this on topic anymore because you keep drifting it off for quite literally no reason.
Screw it. I'm going back to reading the book. I will not share my findings with you because you have already ignored my other findings and theories and it will just fall on deaf ears.
[...]spreading misinformation, if I believe someone is spreading misinformation I'm going to call them out on it. I was WRONG this time (again have to admit that?)
Maybe you're bringing something to the conversation that isn't inherently there? Someone could be just wrong without trying to 'spread misinformation' and cowering in fear of your implacable fist of callout justice. Similarly, someone could just be pointing out you're wrong without a diabolical plan to force you into some embarrassing public recanting and ruin your reputation.
I am lost. What is your complaint at this point? You went on HN, participated in a thread, and now know more about cryptography than you did this morning. You're welcome? :)
But when CBC is used correctly (with authentication), then one could just as well use CTR and get a nice performance improvement. The differences between these two (IV reuse, malleability) only become relevant when they're used incorrectly in the first place.
I would probably generally use CTR? But you can chop this up in different ways; CTR is faster than CBC, and slightly less code to implement, but CBC (assuming EtM) fails less catastrophically than CTR if initialized improperly. CBC+HMAC-SHA2 is probably the safest thing you can implement from scratch.
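For what it's worth, a minimal sketch of that EtM CBC+HMAC construction (using the cryptography package; the key split and message layout are illustrative assumptions, not a vetted design):

    import hashlib
    import hmac
    import os

    from cryptography.hazmat.primitives import padding
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def seal(enc_key, mac_key, plaintext):
        iv = os.urandom(16)                      # fresh random IV every message
        padder = padding.PKCS7(128).padder()
        padded = padder.update(plaintext) + padder.finalize()
        enc = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
        ct = iv + enc.update(padded) + enc.finalize()
        tag = hmac.new(mac_key, ct, hashlib.sha256).digest()  # MAC covers IV+ct
        return ct + tag

    def open_sealed(enc_key, mac_key, blob):
        ct, tag = blob[:-32], blob[-32:]
        # Encrypt-then-MAC: verify before touching the ciphertext at all.
        want = hmac.new(mac_key, ct, hashlib.sha256).digest()
        if not hmac.compare_digest(want, tag):
            raise ValueError("bad MAC")
        iv, body = ct[:16], ct[16:]
        dec = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).decryptor()
        padded = dec.update(body) + dec.finalize()
        unpadder = padding.PKCS7(128).unpadder()
        return unpadder.update(padded) + unpadder.finalize()

Separate keys for encryption and MAC, a fresh IV per message, and the constant-time tag comparison before any decryption are the load-bearing details.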
There's a downside to this, if you're trying to serve people running Safari on OSX Yosemite or older. Nobody cares about Windows Phone 8.1 though so you're probably safe there. [0]
Are these actually novel vulnerabilities, or just unpatched CBC vulnerabilities? As far as I am aware CBC has been discouraged for a long time precisely because of the padding oracle attacks.
They're novel instances of a bug class, the same way a new use-after-free in Chrome is a novel bug. This is what makes crypto vuln research so great: the published work deals mostly in bug classes, which can be leveraged against systems ad infinitum.
>As far as I am aware CBC has been discouraged for a long time precisely because of the padding oracle attacks
My understanding of these padding oracle attacks was that they were the result of an implementation vulnerability: error messages leaked whether bad padding was the cause of the failure. Not that the problem was an inherent flaw in CBC itself.
If the server simply used generic error messages when the padding was invalid, there would be no 'oracle' AIUI.
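The oracle itself is easy to demonstrate. A self-contained toy (recovering just one plaintext byte, with made-up key and data; a real attack walks every byte of every block):

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    KEY = os.urandom(16)

    def oracle(iv, ct):
        # The "server": answers only whether PKCS7 padding was valid.
        dec = Cipher(algorithms.AES(KEY), modes.CBC(iv)).decryptor()
        pt = dec.update(ct) + dec.finalize()
        n = pt[-1]
        return 1 <= n <= 16 and pt.endswith(bytes([n]) * n)

    iv = os.urandom(16)
    secret = b"secret block 16B"     # exactly one 16-byte block
    enc = Cipher(algorithms.AES(KEY), modes.CBC(iv)).encryptor()
    ct = enc.update(secret) + enc.finalize()

    # Attacker tampers with the IV until the oracle sees valid \x01 padding;
    # the successful guess reveals the last plaintext byte, no key needed.
    for guess in range(256):
        forged = iv[:-1] + bytes([iv[-1] ^ guess ^ 0x01])
        if oracle(forged, ct):
            print("last plaintext byte:", bytes([guess]))   # -> b'B'
            break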
> Not that there was an inherent flaw in CBC itself.
But if history shows it's difficult to implement something without creating vulnerabilities, and there are alternatives which are less prone to mistakes, it should be retired and its use discouraged.
Also, CBC itself is not an actual problem, but CBC in TLS is authenticate-then-encrypt, which is how the vast majority of problems are created.
>But if history shows it's difficult to implement something without creating vulnerabilities, and there are alternatives which are less prone to mistakes, it should be retired and its use discouraged.
>Also, CBC itself is not an actual problem, but CBC in TLS is authenticate-then-encrypt, which is how the vast majority of problems are created.
Oh definitely, GCM (or another AEAD cipher) is a much better choice - I wouldn't argue in favour of CBC in production TLS cipher suites. I just wanted to comment on the technical details of where padding oracle vulnerabilities actually arise.
> Oh definitely, GCM (or another AEAD cipher) is a much better choice
GCM is only a better choice if you have hardware support for it. I'd recommend ChaCha20-Poly1305 for anything that might have to cope without hardware support.
CBC has a fundamental weakness in that you can toggle one bit at a time and get a predictable change. Encrypt-then-authenticate would obviously resolve that, but that's trivially broken by breaking SHA1/2 ;)
Of course a problem with encrypt-then-authenticate is that you can verify the identity of the sender without decrypting the content, which is why A-then-E happened in the first place: the threat model turned out to be incorrect. For messaging platforms I would still lean towards authenticate-then-encrypt, as I think the attack model is different.
No, this isn't a fundamental weakness of CBC. In fact, CTR is even more susceptible to this problem (a CBC bitflip totally corrupts the block to which it's applied, and creates a predictable change in the subsequent block; a CTR bitflip simply flips the matching decrypted plaintext bit; in that sense CTR is totally and perfectly malleable).
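To make that concrete, a quick sketch of both bitflips (cryptography package; the strings and offsets are arbitrary):

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, iv = os.urandom(16), os.urandom(16)
    pt = b"sixteen byte blk" + b"pay mallory $001"   # two exact blocks

    # CBC: flipping a bit in ciphertext block 1 trashes plaintext block 1,
    # but flips the same bit in plaintext block 2 predictably.
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    ct = bytearray(enc.update(pt) + enc.finalize())
    ct[15] ^= ord('1') ^ ord('9')
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    print(dec.update(bytes(ct)) + dec.finalize())
    # -> <16 garbled bytes> + b'pay mallory $009'

    # CTR: flipping a bit anywhere flips exactly that plaintext bit and
    # disturbs nothing else. Perfectly malleable.
    enc = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor()
    ct = bytearray(enc.update(pt) + enc.finalize())
    ct[31] ^= ord('1') ^ ord('9')
    dec = Cipher(algorithms.AES(key), modes.CTR(iv)).decryptor()
    print(dec.update(bytes(ct)) + dec.finalize())
    # -> b'sixteen byte blkpay mallory $009'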
Also: nobody uses SHA1 anymore, SHA1 and SHA2 aren't MACs (you're thinking of HMAC), and even MD5 is still unbroken in HMAC.
One issue (the OpenSSL one, I think) wasn't a different error message, but rather a different delay before closing the connection. The end result is obviously the same, but it's important to remember that it's not just a matter of returning the same error code, but of having identical termination characteristics.
That was my exact thought - CBC mode is not secure against a vast array of attacks, and I'm not sure if I want to read an entire paper to determine if there's anything new/novel in attacking a known weak cipher mode.
No, this isn't true. CBC mode is generally fine. The problem is improperly authenticated CBC. But improperly authenticated CTR is also insecure, and CTR is an essential building block of a whole bunch of modern AEAD schemes, including GCM.
CBC mode is also susceptible to attacks based on predictable IVs. But again, CTR is susceptible to a sort of isomorphic class of bugs related to repeated nonces --- as, of course, is GCM, which you mention before; GCM not only inherits CTR's brittleness with respect to nonces but also fails in spectacular new ways with them.
Arguably, if you're going to have to build your own cryptosystem from primitives (don't do this), you have the best chance of success with random-IV CBC mode and HMAC-SHA2 in an EtM configuration. You're less likely to blow that up than GCM.
The research here is in detecting instances of the well-understood CBC padding oracle flaw. A new measurement technique that surfaced a whole bunch of new instances of heap overflows in server software would be an important contribution, in the same way this is.
I don't know the best summary, but the broad strokes are that nonce reuse both destroys confidentiality and allows recovery of the authentication key (an extraordinarily common problem, and one attackers can sometimes induce), as does truncating the GCM MAC; and, if you don't have hardware support for GCM (i.e., x64 CLMUL), the table-based implementation is vulnerable to timing attacks.
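The confidentiality half is easy to see (a minimal sketch with the cryptography package; the messages are arbitrary):

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key, nonce = AESGCM.generate_key(bit_length=128), os.urandom(12)

    # Two messages sealed under the SAME key and nonce share a keystream.
    a = AESGCM(key).encrypt(nonce, b"attack at dawn!", None)
    b = AESGCM(key).encrypt(nonce, b"retreat at 9pm!", None)

    # XOR of the ciphertexts == XOR of the plaintexts; knowing (or
    # guessing) one message immediately reveals the other.
    stream_xor = bytes(x ^ y for x, y in zip(a, b))[:15]   # drop the tags
    print(bytes(x ^ y for x, y in zip(stream_xor, b"attack at dawn!")))
    # -> b'retreat at 9pm!'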
Essentially these are implementation specific bugs resulting in oracles, rather than being attacks on CBC itself - basically openssl and some embedded (?) TLS stacks produce different results (delayed hang up and/or different error codes) for padding vs. MAC failures in CBC mode.
I assume that the joy of CBC mode is just that it makes those oracles easier to directly exploit. The exploits are still harder to use than GCM with a reused nonce (reusing a nonce in GCM mode essentially breaks it; GCM is hard).
Repost Moxie's The Cryptographic Doom Principle, again...
https://moxie.org/blog/the-cryptographic-doom-principle/