
Certainly this logic doesn't suggest using CBC instead of an AEAD for a new, non-TLS system, where, if traffic-analysis attacks were a concern, you'd just bake defenses against them into the protocol itself while still claiming the benefits of modern authenticated encryption.



That's a tough call. We've thoroughly learned not only that developers shouldn't build their own cryptographic primitives, but that they shouldn't even trivially combine them on their own. "Use AEAD" (or better yet: use Noise) is good advice for application and protocol designers who on their own probably wouldn't even think about the problems ... but it's not sufficient.

What we don't seem to have learned is that length hiding, traffic analysis, and shared compression dictionaries are just as firmly in the realm of what encryption is supposed to do ... keep content secret ... and that developers can't be trusted to get those right on their own either.

So if you're in a position to tell someone to use AEAD instead of CBC, I'd make sure to also tell them that they should think hard about length hiding, message timing, and shared compression dictionaries, and whether those are important concerns in that environment. Oh, and they still have to watch out for IV problems with plenty of the AEAD implementations.
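
To make that concrete, here's a rough sketch of the kind of thing I mean, in Python with the cryptography package: pad every message up to a fixed bucket size before sealing it with AES-GCM, and generate a fresh random nonce per message so callers never get to pick the IV. The bucket size and the length-prefix framing are made up for illustration, not taken from any real protocol.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    BUCKET = 256  # made-up bucket size; tune to your traffic, bigger hides more

    def pad(plaintext: bytes) -> bytes:
        # Length-prefix, then pad with zeros up to the next bucket boundary,
        # so the ciphertext only reveals which bucket the message falls into.
        framed = len(plaintext).to_bytes(4, "big") + plaintext
        target = -(-len(framed) // BUCKET) * BUCKET  # ceil to a bucket multiple
        return framed + b"\x00" * (target - len(framed))

    def seal(key: bytes, plaintext: bytes, aad: bytes = b"") -> bytes:
        # Fresh random 96-bit nonce per message; with GCM, nonce reuse under
        # the same key is catastrophic, so don't let callers supply it.
        nonce = os.urandom(12)
        return nonce + AESGCM(key).encrypt(nonce, pad(plaintext), aad)

    def unseal(key: bytes, blob: bytes, aad: bytes = b"") -> bytes:
        nonce, ct = blob[:12], blob[12:]
        framed = AESGCM(key).decrypt(nonce, ct, aad)
        return framed[4:4 + int.from_bytes(framed[:4], "big")]

    key = AESGCM.generate_key(bit_length=256)
    assert unseal(key, seal(key, b"GET /wiki/Some_Page")) == b"GET /wiki/Some_Page"

That's maybe twenty lines, but it's exactly the twenty lines that "use AEAD" alone never tells anyone to write.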

Personally I long for some better crypto boxes than "AEAD" and I should really get off my butt and define one; at the level of total hand-waving, I trust cryptographers to review this stuff far better than protocol designers would.

Like here I am claiming that TLS, the most pervasive network encryption protocol, has actually gotten measurably less secure at encryption, by at least one important measure, in the last 5 years.


Measurably? I think you have a point about length hiding (though many threat models just dismiss it), but I'm not sure the accidental padding you get from CBC is measurably better.

Scenario 1: https, downloading a file. Plus or minus a few bytes is still accurate enough for identification, no?

Scenario 2: ssh, where one keystroke equals one packet, regardless of length. I can still hear your password.

Did you have a scenario in mind where CBC padding provides a meaningful degree of length obfuscation?


The case I think about the most is web fingerprinting. For example, if the authorities in a country that funnels all external traffic through a point of mediation want to identify who is reading certain specific pages on Wikipedia, they likely can.

The most effective forms of web fingerprinting target static resources; the two most prominent papers looked at Google Maps tiles and Netflix videos, both collections of static resources compressed offline. Static resources are easier to target because the size of each response is fixed, and the domains they're hosted on tend to be relatively free of cookies, auth tokens, query strings, and other noise.

In practical terms for attackers, it's easiest to focus on the images, CSS, and JavaScript that your browser is downloading, since these are usually static, pre-compressed, and served from CDN-hosted domains. This is pretty feasible against Wikipedia, but also against Facebook shares, tweets, etc. ... anywhere you're viewing content that includes images or video.

O.k., so far nothing controversial; nobody would even get much of a paper published about this. We bucket it under "traffic analysis", and while it might shock the average internet user, cryptographers aren't surprised.

Where CBC vs GCM comes in is that GCM reduces the "cost" of these fingerprints and increases the effectiveness of the whole approach. If a webpage contains, say, 4 images and an external CSS reference, that gives you 10 points of information to fingerprint: 5 request lengths and 5 response lengths. Given the spread of URL lengths and file sizes, it's pretty easy for that combination to be unique, or at least to narrow things down to a small set of possibilities. With CBC, each request length and response length is rounded up to a 16-byte boundary. That's not a huge amount of padding, more would be better, but it still shrinks the space of distinguishable fingerprints exponentially: each rounded length carries roughly four fewer bits of information, and that loss compounds across all ten observations. It's particularly effective on the request lengths, which are small to begin with.
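
Back-of-the-envelope, with made-up length ranges and treating lengths as roughly uniform (real traffic isn't, so read these as upper bounds), the block rounding looks like this:

    import math

    def distinct_lengths(lo, hi, block=None):
        # How many different on-the-wire lengths an observer can see for a
        # payload in [lo, hi]: every byte count when lengths are exact, only
        # block multiples when padding rounds up to the next block boundary.
        if block is None:
            return hi - lo + 1
        return hi // block - lo // block + 1

    req_lo, req_hi = 40, 200        # hypothetical request sizes
    rsp_lo, rsp_hi = 2_000, 60_000  # hypothetical static-resource sizes

    for label, block in (("exact lengths (GCM)", None), ("16-byte blocks (CBC)", 16)):
        bits = (math.log2(distinct_lengths(req_lo, req_hi, block))
                + math.log2(distinct_lengths(rsp_lo, rsp_hi, block)))
        print(f"{label}: ~{bits:.1f} bits per resource, ~{5 * bits:.0f} bits over 5 resources")

With those numbers it comes out to roughly 8 fewer bits per resource, or about 40 fewer bits over the 5-resource page: the space of fingerprints an observer can tell apart shrinks by a factor of around 2^40.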

Now, with more requests and a bigger graph, it's still probably possible to de-fuzz this; but my point is that in practical terms, when you have exact lengths, you just don't need to. With exact lengths, you can ask a junior SDE to code this up for you in a few weeks; someone who barely understands tcpdump and MySQL can build it. With blocks and /some/ length hiding, the same person finds that they get less signal and have to correlate across more requests and pages; it might not even be practical for many sites. On principle, I think it's dumb to lower the cost of practical attacks like this.
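
The exact-length version really is dictionary-lookup simple. A hypothetical sketch, with made-up URLs and sizes; the fingerprint table is the part you'd build yourself by crawling the target site:

    from collections import defaultdict

    # Hypothetical pre-crawled table: page -> (request length, response length)
    # pairs for its static resources, as they'd appear on the wire.
    fingerprints = {
        "wikipedia.org/wiki/Page_A": ((73, 48211), (81, 5120), (64, 17342)),
        "wikipedia.org/wiki/Page_B": ((73, 48211), (88, 6034), (64, 17342)),
        # ... thousands more ...
    }

    index = defaultdict(list)
    for url, pairs in fingerprints.items():
        index[tuple(sorted(pairs))].append(url)

    def identify(observed_pairs):
        # With exact lengths this is a single dict lookup; with block-rounded
        # lengths you'd have to match ranges and rank multiple candidates.
        return index.get(tuple(sorted(observed_pairs)), [])

    print(identify([(64, 17342), (81, 5120), (73, 48211)]))
    # -> ['wikipedia.org/wiki/Page_A']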


Whatever this take is, it's not boring. :)



