Deep breath. I know that I'm going to be thought a heretical fool for saying thi...

tptacek · on Feb 26, 2019

Couldn't length-hiding be done more effectively (than CBC) at the HTTP layer by injecting random headers?

colmmacc · on Feb 26, 2019

Yes! I've experimented with server-side changes, but unfortunately the request length leaks information too, so the client needs to do something similar.

In real terms, I think the consequence of the rapid mass-migration away from CBC made the passive attacks far easier; instead of building graphs and de-fuzzers, analysts can identify content using combined request and response length signatures. All of that happened purely at the cryptographic layer, and it's unfortunate.

In general, it's hard to ask HTTP servers and applications higher up the stack to take it on; the folks writing those don't tend to be cryptography or SIGINT experts. It's still easy enough to find subtle CRIME-like compression attacks against servers and applications that mix compression and untrusted input, which is some sign of how effective we all are at chasing down similar kinds of issues at those layers.

tptacek · on Feb 26, 2019

Certainly this logic doesn't suggest using CBC instead of an AEAD for a new, non-TLS system, where, if traffic analytic attacks were a concern, you'd just bake that into the protocol itself while still claiming the benefits of modern authenticated encryption.

colmmacc · on Feb 26, 2019

That's a tough call. We've thoroughly learned that not only should developers not build their own cryptographic primitives, but that they shouldn't even trivially combine them on their own. "Use AEAD" (or better yet: use noise) is good advice for application and protocol designers who on their own probably wouldn't even think about the problems ... but it's not sufficient.

We haven't seemed to learn that length hiding, traffic analysis, and shared compression dictionaries are all so firmly in the realm of what encryption is supposed to do ... make content secret ... that developers also can't be trusted to get these right on their own.

So if you're in the position of telling someone to use AEAD instead of CBC, I'd make sure to also tell them that they should think hard about length hiding, message timing, and shared compression dictionaries, and whether those are important concerns in that environment. Oh and they still have to watch out for IV problems with plenty of the AEAD implementations.

Personally I long for some better crypto boxes than "AEAD" and I should really get off my butt and define one; at the level of total hand-waving, I trust cryptographers to review this stuff much better than protocol designers.

Like here I am claiming that TLS, the most pervasive network encryption protocol, has actually gotten measurably less secure at encryption, by at least one important measure, in the last 5 years.

tedunangst · on Feb 26, 2019

Measurably? I think you have a point about length hiding, although many models seem to just dismiss it, but I'm not sure the accidental padding you get from CBC is measurably better.

Scenario 1: https, downloading a file. Plus or minus a few bytes is still accurate enough for identification, no?

Scenario 2: ssh, where one keystroke equals one packet, regardless of length. I can still hear your password.

Did you have a scenario in mind where CBC padding provides a meaningful degree of length obfuscation?

colmmacc · on Feb 26, 2019

The case I think about the most is web fingerprinting. For example if the authorities in a country that funnels all external traffic through a point of mediation want to identify who is reading certain specific pages on wikipedia - they likely can.

The most effective forms of web fingerprinting focus on static resources; the two most prominent papers focus on Google Maps tiles, and Netflix videos, which were both collections of static resources that were compressed offline. Static resources are easier to target because the size of response is fixed, and the domains they're hosted on tend to be relatively free from cookies, auth tokens, query strings, and other noise.

In practical terms for attackers, it's easiest to focus on the images, CSS and javascript that your browser is downloading; since these are usually static, pre-compressed, and on CDN-hosted domains. This is pretty feasible against Wikipedia, but also Facebook shares, tweets, etc ... anywhere where you're viewing content that includes images or video.

O.k. so far nothing controversial, nobody would even get much of a paper published about this, we bucket it under "Traffic Analysis" and while it might shock the average internet user, cryptographers aren't surprised.

Where CBC vs GCM comes in is that GCM reduces the "cost" of these fingerprints and increases the effectiveness of the whole approach. If a webpage contains say 4 images, and an external CSS reference, that gives you 10 points of information to fingerprint. You get 5 request lengths and 5 response lengths. Given the spread of URL lengths and file sizes, it's pretty easy for that combination to be unique, or at least narrow things down to a small set of possibilities. With CBC enabled, each request length and response length is rounded up to a multiple of 16 bytes. That's not a huge amount of padding, more would be better, but it still reduces the uniqueness of the fingerprinting by an exponential factor. It's particularly effective on the request lengths, which are small to begin with.
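To make the rounding effect concrete, here is a minimal sketch. The page names and lengths are made up for illustration (nothing here comes from real traffic), and it ignores the MAC and record header, which only add constant offsets. It counts how many pages stay distinguishable when lengths are exact versus rounded up to the 16-byte block size.

```python
BLOCK = 16  # AES block size in bytes


def cbc_padded_len(n: int) -> int:
    """Length after CBC-style padding, which always adds 1..16 bytes."""
    return (n // BLOCK + 1) * BLOCK


def fingerprint(lengths, padded: bool):
    """A page fingerprint is the tuple of its observed message lengths."""
    return tuple(cbc_padded_len(n) if padded else n for n in lengths)


# Hypothetical pages: two request lengths followed by two response lengths.
pages = {
    "page_a": (212, 215, 5120, 7301),
    "page_b": (214, 219, 5125, 7299),
    "page_c": (230, 221, 5130, 7310),
}


def distinct(padded: bool) -> int:
    """Number of distinct fingerprints an observer sees across all pages."""
    return len({fingerprint(lengths, padded) for lengths in pages.values()})


print("exact lengths: ", distinct(padded=False), "distinct fingerprints")
print("16-byte blocks:", distinct(padded=True), "distinct fingerprints")
```

With these invented numbers, page_a and page_b collapse into the same fingerprint once lengths are rounded, while exact lengths keep all three apart.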

Now with more requests, and a bigger graph, it's still probably possible to de-fuzz this; but my point is that in practical terms - when you have exact lengths, you just don't need to do this. With exact lengths, you can ask a junior SDE to code this up for you in a few weeks. Someone who barely understands tcpdump and mysql can build this. With blocks and /some/ length hiding, the same person finds that they get less signal and have to correlate between more requests and pages, it might not even be practical for many sites. On principle, I think it's dumb to lower costs for practical attacks like this.

tptacek · on Feb 26, 2019

Whatever this take is, it's not boring. :)

bradknowles · on Feb 26, 2019

Okay, so I'm assuming you can help the protocol designers make the appropriate improvements in TLS 1.4, or whatever the next major version gets called.

In the meanwhile, what about those of us actively trying to deprecate TLS 1.0 and 1.1 and support exclusively TLS 1.2+, and with certain preferred cipher suites?

wahern · on Feb 26, 2019

Maybe? I believe with enough traffic you can determine the actual content size by calculating the average apparent padding length. (Compare JavaScript timing side channel attacks that can remove random clock jitter, or Onion Routing traffic analysis.)
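A minimal simulation of that averaging idea, assuming the padding is drawn uniformly from 0..MAX_PAD bytes on every transfer and that the observer sees the same resource many times. The numbers and function names are illustrative assumptions, not measurements.

```python
import random
from statistics import mean

TRUE_LEN = 1337   # hypothetical secret content length
MAX_PAD = 255     # hypothetical maximum random padding per transfer


def observed_length(true_len: int, max_pad: int, rng: random.Random) -> int:
    """What a passive observer sees: content plus uniform random padding."""
    return true_len + rng.randint(0, max_pad)


def estimate_true_length(samples, max_pad: int) -> float:
    """Subtract the expected padding (max_pad / 2) from the average observation."""
    return mean(samples) - max_pad / 2


rng = random.Random(1234)  # fixed seed so the run is repeatable
samples = [observed_length(TRUE_LEN, MAX_PAD, rng) for _ in range(20000)]

print("estimate from mean:", round(estimate_true_length(samples, MAX_PAD), 1))
print("smallest observation:", min(samples))
```

Under these assumptions the smallest observation is an even simpler estimator than the mean, since zero padding eventually shows up.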

What you really want is padding to a fixed-length block. That requires restricting the maximum allowed length of confidential fields. (Could vary per field.) Yet such "arbitrary" limits have been frowned upon for ages as poor design.

That said, there's usually nothing stopping people from imposing their own limits and padding accordingly, server- and client-side. But without enforcement across the entire software stack it's difficult to maintain the invariant.
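A minimal sketch of padding a confidential field to a fixed size, assuming a 2-byte length prefix followed by zero fill. The encoding and the function names are my own illustration, not an existing standard.

```python
def pad_field(value: bytes, size: int) -> bytes:
    """Encode value into exactly `size` bytes: length prefix, value, zero fill."""
    if size < 2 or len(value) > size - 2:
        raise ValueError("value exceeds the maximum allowed length for this field")
    fill = b"\x00" * (size - 2 - len(value))
    return len(value).to_bytes(2, "big") + value + fill


def unpad_field(padded: bytes) -> bytes:
    """Recover the original value from a fixed-size field."""
    n = int.from_bytes(padded[:2], "big")
    if n > len(padded) - 2:
        raise ValueError("corrupt length prefix")
    return padded[2:2 + n]


short = pad_field(b"hunter2", 64)
longer = pad_field(b"correct horse battery staple", 64)
print(len(short), len(longer))  # both fields occupy the same number of bytes
```

The explicit maximum is the "arbitrary" limit mentioned above: anything longer is rejected rather than leaking a longer ciphertext.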

tialaramex · on Feb 27, 2019

> At least with TLS1.3 and QUIC we can pad encrypted data and get a measure of length-hiding back; enough to at least raise costs for passive fingerprinting attacks (and raise those costs more than CBC ever could on its own).

I have to say this feels like both the more actionable and more insightful output of your... heresy.

We are not going back to CBC. You say that some CBC attacks don't seem practical. The trouble with the Web is it's like some lunatic deliberately designed a system to be attacked. Oh you have a chosen plaintext attack that needs 1 million repetitions? Impractical. Wait, Javascript exists. OK, but even with Javascript who'd ever have a system that terminates HTTPS for completely unrelated parties on the same physical hardware - there's no way we'll get to exploit that? Oh, bulk hosting exists. OK. Still nobody would build a crypto system in which one party is just completely anonymous all the... oh wait, that's exactly how Tim's system was designed. Eric Rescorla has some great slides about this.

Whereas "Hey, let's pad every message to hit the path MTU size for packets" totally feels like something vendors could do next week and make a huge impact with no real cost. A lot of network devices can't hit their target speeds for small packets anyway, so the impact may be not merely negligible but literally non-existent.

I have to say though, I'm not actually clear on why we couldn't pad TLS 1.2's AEAD modes to the same effect?

The TLS 1.3 padding is _clever_ but what does it do here that makes it more important for your passive fingerprinting attack than the older padding scheme (with an AEAD cipher)?
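For reference, TLS 1.3 record padding works by appending the content type byte and then zero bytes to the plaintext before encryption (RFC 8446, section 5.4). A minimal sketch of just that framing step, with no encryption; the function names and the 1400-byte target are illustrative assumptions standing in for "roughly the path MTU".

```python
APPLICATION_DATA = 23  # TLS ContentType value for application data


def pad_inner_plaintext(content: bytes, target_len: int,
                        content_type: int = APPLICATION_DATA) -> bytes:
    """Build content || type || zeros, padded out to target_len bytes."""
    inner = content + bytes([content_type])
    if len(inner) > target_len:
        raise ValueError("content does not fit in the target length")
    return inner + b"\x00" * (target_len - len(inner))


def unpad_inner_plaintext(inner: bytes):
    """Strip trailing zeros; the last non-zero byte is the content type."""
    stripped = inner.rstrip(b"\x00")
    if not stripped:
        raise ValueError("no content type found")
    return stripped[:-1], stripped[-1]


request = b"GET / HTTP/1.1\r\n\x00\x00"  # trailing zeros in content survive
padded = pad_inner_plaintext(request, 1400)
print(len(padded), unpad_inner_plaintext(padded) == (request, APPLICATION_DATA))
```

Because the content type is never zero, the receiver can remove the padding without any length field.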

dorkusmcgavin · on Feb 26, 2019

AES is a block cipher that operates on 128-bit blocks exclusively. If you look at the implementation of AES-GCM, it also operates on 128-bit blocks. AES-GCM is basically just counter mode with built in correct MAC handling.

Your claim that AES-GCM is length preserving is completely false, other than the obvious multiples of the block size which is the same as CBC mode or any other mode.

Your statement isn't heretical, it's just based on a false statement. If the premise of your statement were true, your extra analysis would be correct.

I would read up on the details of the mode itself, this graph is a good place to start then move on to the mathematical section which talks about the 128-bit blocks:

https://en.wikipedia.org/wiki/Galois/Counter_Mode

tptacek · on Feb 26, 2019

First, you might want to look closer at who you're talking to.

Second, you might want to read a little more about how counter mode works.

Or, why not just try it yourself? "pip install cryptography", pop open a Python shell, and encrypt some stuff. There's only so much you can get from Wikipedia. Because: counter mode doesn't create ciphertexts that are multiples of the block size; not doing that is the point of counter mode.

    >>> from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    >>> gcm = AESGCM(b"\x00" * 16)
    >>> len(gcm.encrypt(b"\x00" * 12, b"YELLOW SUBMARINE", b""))
    32
    >>> len(gcm.encrypt(b"\x00" * 12, b"YELLOW SUBMARINES", b""))
    33

dorkusmcgavin · on Feb 26, 2019

I opened up my book of cryptography from Jonathan Katz, and found this in regards to CTR mode:

> As with OFB mode, another "stream-cipher" mode, the generated stream can be truncated to exactly the plaintext length

This is contrary to what Jonathan Katz told a class taught from the University of Maryland which I took, which is odd. I've used CTR mode before and exploited CBC padding oracles, I don't recall being able to use CTR mode this way but I rarely used it since we focused on CBC exploitation and a bit too much on CPA/CCA proofs. After looking at the diagrams shown, it's clear that since the message is XOR'd on the output of the generative random of the encryption function, CTR mode can indeed be truncated since the remaining generative output can be ignored. So I had a misunderstanding in my head on CTR mode.

Now on to GCM. Unfortunately my edition of Katz' book doesn't include GCM so I have to default to Wikipedia. The last XOR is more than likely where a truncation can occur, so I was wrong about GCM mode as well.

As for Python tests (package already installed, no need to use PIP when it's in the various Linux repos):

    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    import os
    key = AESGCM.generate_key(256)
    gcmtest = AESGCM(key)
    nonce = os.urandom(16)
    len(gcmtest.encrypt(nonce, b"testbytes1", b"ASD"))
    len(gcmtest.encrypt(nonce, b"testbytes12", b"ASD"))
    len(gcmtest.encrypt(nonce, b"testbytes13", b"ASD"))

This outputs 26, 27, 28 respectively.

While this proves your point, I want to make clear that ignoring the reason and just trusting the output of an implementation isn't really a good way of learning things (although complementary). I was using the Wikipedia and text references so I could understand why it allowed variable length, and at my first look the construction didn't appear like you could truncate.

Despite all of this, the CTR-mode section of the book includes a CPA-security proof and the CBC section says it is vulnerable to CPA. I'm going to try to dig through that to see why. If they are cognizant of the fact that same length attacks are something that makes you vulnerable, there must be a reason why they believe CTR/GCM are not.

tptacek · on Feb 26, 2019

You did not exploit a CBC padding oracle in CTR mode, because CTR mode isn't padded. Again: that is the point of CTR mode: to transform a block cipher (or, for that matter, a keyed hash function) into a stream cipher.

GCM is built on top of CTR; CTR is literally in the name. Like CTR, GCM is a stream cipher mode. There is no GCM padding.
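A toy illustration of why counter mode needs no padding, using only the standard library. It assumes HMAC-SHA256 as the keyed function in place of AES; it is a sketch for showing lengths, not something to encrypt real data with.

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Generate n keystream bytes by applying a keyed function to a counter."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:n])  # truncate: unused keystream is simply discarded


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing data with the keystream."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))


key, nonce = b"\x00" * 16, b"\x00" * 12
for msg in (b"YELLOW SUBMARINE", b"YELLOW SUBMARINES"):
    ct = ctr_xor(key, nonce, msg)
    print(len(msg), len(ct), ctr_xor(key, nonce, ct) == msg)
```

The ciphertext is always exactly as long as the plaintext, which is the property being discussed; GCM adds a fixed-size tag on top of that.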

The output of the code you pasted should be 26, 27, and 27, not 28 (as it is on my system). "test12" and "test13" have the same length, and consume the same amount of CTR keystream.

dorkusmcgavin · on Feb 26, 2019

You seem hell bent on making me look bad based on a statement I made which I've already conceded was incorrect. I have no ties from this account to my actual identity and no intention of creating any, so I have no idea why you'd care about anonymous dorkusmcgavin's reputation which will never take off by definition anyway.

This is sort of why I don't like tying my identity to any accounts on the internet and don't engage in academia. When you do, you have to dig in deep on your statements and then end up with meaningless ad-hominem attacks on identity rather than seeking truth to the discussion. This is increasingly happening all over the internet, too, so I might just drop off of it entirely and just go back to reading for my own sake.

> You did not exploit a CBC padding oracle in CTR mode, because CTR mode isn't padded.

Correct, I have never done that nor have I claimed to. I said the class focused on CBC padding oracles and I exploited CBC mode in that way. I also stated that I've used CTR mode before. These were two different statements. This is so far off-topic that I don't understand why you care.

> GCM is built on top of CTR; CTR is literally in the name. Like CTR, GCM is a stream cipher mode. There is no GCM padding.

You are literally responding to a comment where I show there is no padding and concede you are correct here.

> The output of the code you pasted should be 26, 27, and 27, not 28 (as it is on my system). "test12" and "test13" have the same length, and consume the same amount of CTR keystream.

It was supposed to be testbytes123, and I copied from an IPython window where I had typed both out. I apparently not only typed it in incorrectly the first time when entering it in and corrected it, but then copied and pasted the wrong line when going back to this window.

Look, I'm trying to get to the bottom of the original debate here, I don't really care too much about my copy paste errors or stating for a ninth time that my original comment was incorrect because it's been awhile since I went through these courses. I want to learn things, not have an ego fight. Let's move on.

The original topic being, why exactly is AES-GCM considered more secure than CBC by expert cryptographers?

I've been re-reading parts of Katz' book and I think I can come up with my interpretation, but I'm trying to find a section where he mentions this (because he comes very close in a few sections when discussing the classes of proofs themselves and also of his old-timey stories). I also think he could word it better and his reasoning might be different than my theory.

I think the point is that if you can choose the length of the attack, it doesn't matter if it's grouped into clusters of X-bits or not as long as the message is at least of a certain length (obviously encrypting the string "true" and "false" you'll be able to tell the difference if you know the context). If you can choose the length of the attack, then you'll most likely always be able to determine the length of various other encrypted information in the resulting ciphertext, it just becomes slightly more work.

So say I have X-bits of plaintext that I control of a plaintext that includes other text before being encrypted (HTTP as the example here). If it's padded, it might output Y-bits of ciphertext. If you then do (X+1)-bits of plaintext, you could again get Y-bits of ciphertext again. This can be perceived as an advantage, but if the attacker then pads the plaintext with zeroes at the beginning and incrementally passes it to the server, the (X+2)-bit, up until (X+Z)-bit output will eventually tick over to the next block, and then you have (Y+blocksize)-bit ciphertext. At that point, they can hang there at that specific plaintext length and then use that to perform the same length-based attack as without padding.
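The boundary-probing idea in that paragraph can be sketched as follows, working in bytes rather than bits and assuming a 16-byte block with padding that always adds 1 to 16 bytes. The observation function is simulated; in a real setting it would be the ciphertext length seen on the wire.

```python
BLOCK = 16  # block size in bytes


def ciphertext_len(attacker_bytes: int, secret_len: int) -> int:
    """Simulated observation: padded length of attacker data plus the secret."""
    total = attacker_bytes + secret_len
    return (total // BLOCK + 1) * BLOCK


def recover_secret_len(observe) -> int:
    """Add attacker-controlled bytes until the ciphertext grows by one block."""
    base = observe(0)
    for extra in range(1, BLOCK + 1):
        if observe(extra) > base:
            # At this point extra + secret_len lands exactly on a block
            # boundary, and that boundary equals the original padded length.
            return base - extra
    raise RuntimeError("no block boundary found; padding model does not match")


secret_len = 37  # hypothetical length the attacker wants to learn
guess = recover_secret_len(lambda extra: ciphertext_len(extra, secret_len))
print(guess)
```

At most 16 probes recover the exact length, so block padding only helps against an observer who cannot inject plaintext.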

I'm going to keep digging because this question is really fascinating to me right now. Katz does have sections where he mentions fixed-length encryption mechanisms where the ciphertext is defined at a certain length, so I'm looking around there right now because I have a feeling he might state somewhere that any stream cipher (and he counts CBC mode as a stream cipher) is by default less secure, period. He might even have a proof somewhere on that.

The point of the padding in CBC mode is simply because it's necessary based on the way the plaintext moves into the encryption function. Padding oracles are dangerous because they exploit the type of padding, not because they exploit the length. They also do so in a way that allows you to decrypt entire blocks. IV-reuse can bump into a similar block border issue as I was describing above, too.

I will keep reading.

tptacek · on Feb 27, 2019

This is a lot of text responding to some pretty simple facts. The only value judgement I've made here is that you might want to look up who people are before you decide to tell them they're spreading false statements (that's with respect to Colm, not me; you can say whatever you'd like about me.)

The rest of it is just: here are things I think you might want to know. Read them again: none of them say you're a bad person. They're just statements of fact.

GCM is "more secure" than CBC because it's authenticated (technically, authenticated with additional authenticated data). CBC is not authenticated; it's malleable, so you can violate integrity and, with CCAs like Vaudenay's oracle, confidentiality as well.

You can separately authenticate CBC with a MAC (usually HMAC); that's the "generic composition" of CBC and HMAC. Generic composition is generally believed to be secure in the encrypt-then-MAC construction, but, as we're seeing with TLS here, not the other way around.
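A minimal sketch of the encrypt-then-MAC ordering, showing only the MAC half. It assumes the CBC ciphertext is already produced elsewhere and treats it as opaque bytes; the function names, the 16-byte IV, and the choice of HMAC-SHA256 are illustrative assumptions.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC: the tag covers the IV and the ciphertext."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag


def open_checked(mac_key: bytes, message: bytes, iv_len: int = 16):
    """Verify the tag first; only then hand IV and ciphertext to decryption."""
    if len(message) < iv_len + TAG_LEN:
        raise ValueError("message too short")
    body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")  # never reach the padding check
    return body[:iv_len], body[iv_len:]


mac_key = b"k" * 32
msg = seal(mac_key, b"\x00" * 16, b"opaque ciphertext")
print(open_checked(mac_key, msg))
```

Because a modified ciphertext is rejected before any decryption happens, the receiver never evaluates padding on attacker-chosen input.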

In neither a generic ETM CBC+HMAC nor in GCM should control of any plaintext anywhere in the message allow you to violate cryptographic integrity or confidentiality. But CBC+HMAC != CBC, bringing us back around to our point here.

Padding oracles don't really exploit PKCS7 padding itself so much as they exploit the decryption process of CBC, coupled with the ability to choose ciphertexts (again, a capability GCM takes away from you).

dorkusmcgavin · on Feb 27, 2019

> This is a lot of text responding to some pretty simple facts. The only value judgement I've made here is that you might want to look up who people are before you decide to tell them they're spreading false statements (that's with respect to Colm, not me; you can say whatever you'd like about me.)

You are correct that I was wrong with what my statements were, but I don't care if it's Jonathan Katz who's spreading misinformation, if I believe someone is spreading misinformation I'm going to call them out on it. I was WRONG this time (again have to admit that?), but am trying to address the original question. Half of that "large text" was new information that you've distracted the topic away from.

And you've done it yet again, changed the topic.

My offtopic sub-point here is that you are taking large quantities of information out of context for no reason other than to change the subject away from actual information search.

> But CBC+HMAC != CBC

Here's what Colm said:

> I know that I'm going to be thought a heretical fool for saying this, but overall I would consider CBC mode "more secure" than GCM.

OMG he said CBC mode (NOT CBC-HMAC!) is "more secure" than GCM! Go call him out on that one sentence taken out of context!

Obviously we are talking about CBC-HMAC because that's the topic. Colm started it on that topic, and if you look at his entire post, the context makes that clear. If you instead focused on the fact that I'm talking in the same context, you wouldn't get off on a tangent that somehow I think that Colm believes CBC mode without an HMAC is somehow more secure (he doesn't, obviously). You are creating a strawman for no reason. You are right, you aren't attacking me. You are conducting half of a strawman argument. You've created a strawman, and just left it there. It's weird. It changes the topic. It serves no purpose except to derail the conversation.

Colm is arguing that CBC with a MAC is more secure than GCM, but if you take the strict way you are interpreting my statements, you would not believe that because you are applying a double standard to me. Colm apparently is allowed to make contextual statements but I am not.

Again, back to the original topic I'm trying to figure out why the encryption experts believe the opposite, which he also says is the overwhelming majority of opinion, and why we are even conversing in the first place. I doubt I can keep this on topic anymore because you keep drifting it off for quite literally no reason.

Screw it. I'm going back to reading the book. I will not share my findings with you because you have already ignored my other findings and theories and it will just fall on deaf ears.

pvg · on Feb 27, 2019

> [...] spreading misinformation, if I believe someone is spreading misinformation I'm going to call them out on it. I was WRONG this time (again have to admit that?)

Maybe you're bringing something to the conversation that isn't inherently there? Someone could be just wrong without trying to 'spread misinformation' and cowering in fear of your implacable fist of callout justice. Similarly, someone could just be pointing out you're wrong without a diabolical plan to force you into some embarrassing public recanting and ruin your reputation.

dorkusmcgavin · on Feb 27, 2019

Diabolical plot? You are insane to believe I think that.

Spreading misinformation isn't a conspiracy, everyone does it. Hell I just did it. That doesn't mean it should be left unchecked.

There's nothing embarrassing about what I posted. I also have no reputation on an anonymous account.

tptacek · on Feb 27, 2019

I am lost. What is your complaint at this point? You went on HN, participated in a thread, and now know more about cryptography than you did this morning. You're welcome? :)

tedunangst · on Feb 26, 2019

What?