
Designing Headers for HTTP Compression - fanf2
https://www.mnot.net/blog/2018/11/27/header_compression
======
Bucephalus355
HTTP headers can do amazing things; it's unfortunate they are so overlooked. In
security, where I work, I would say good headers are the backbone of good web
application security.

They are:

X-XSS-Protection

X-Frame-Options

Content-Security-Policy

Strict-Transport-Security (HSTS)

Expect-CT

Feature-Policy

All of these can be set very quickly in your Apache2.conf.
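
For example, a minimal sketch of what that can look like with mod_headers
enabled (a2enmod headers); the header values here are illustrative
placeholders, not a recommended policy:

    # Values below are examples only - tune them to your site.
    Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
    Header always set Content-Security-Policy "default-src 'self'"
    Header always set X-Frame-Options "DENY"
    Header always set X-XSS-Protection "1; mode=block"
    Header always set Expect-CT "max-age=86400"
    Header always set Feature-Policy "geolocation 'none'"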

------
userbinator
As much as I like using netcat and friends to interact with HTTP servers, I've
always found the text-based protocols to be needlessly verbose and oriented
towards a minority use case. Something like ASN.1, which is very widely used
in the telecommunications industry for protocols such as GSM, does not have
the same bloat problem yet remains extensible.
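
For a sense of what that text orientation looks like on the wire, here is a
minimal sketch in Python of speaking raw HTTP/1.1 over a plain socket, the
same thing netcat does interactively (host and headers are just examples):

    import socket

    # Every header is spelled out as ASCII text, on every single request.
    request = (
        "GET / HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "Accept: text/html\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    with socket.create_connection(("example.com", 80)) as sock:
        sock.sendall(request.encode("ascii"))
        response = sock.recv(4096)
    print(response.decode("ascii", errors="replace"))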

~~~
jimktrains2
Isn't it notoriously difficult to write a sane, non-exploitable ASN.1 parser?
The OpenSSH developers didn't want to use X.509 certs because, IIRC, they
considered ASN.1 too risky and difficult.

~~~
tialaramex
It is notoriously difficult, but to be fair a big part of that notoriety dates
from the era when you'd have written such a parser in C, as indeed OpenSSH is
written in C. You obviously should not write such software in a language as
unsafe as C today.

The Distinguished Encoding Rules (DER) used for X.509 force everything to be
unambiguous. So, in principle, in a language where you can translate that lack
of ambiguity into code without inadvertently executing the raw data when you
have an off-by-one error, this seems no more dangerous than the present use of
ASCII. It's just that parsing ASCII in C has fewer obvious sharp corners, and
programmers who work mostly in C know where they are.
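
To make the "unambiguous" part concrete, here is a minimal sketch in Python of
reading a single DER tag-length-value header with explicit bounds checks
(single-byte tags and definite lengths only; a full parser needs more, but DER
itself forbids the indefinite lengths BER allows):

    def read_der_tlv(data: bytes):
        """Parse one DER TLV; return (tag, value, remaining bytes)."""
        if len(data) < 2:
            raise ValueError("truncated TLV")
        tag = data[0]                # single-byte tags only in this sketch
        first = data[1]
        if first < 0x80:             # short form: length fits in one byte
            length, offset = first, 2
        else:                        # long form: next n bytes hold the length
            n = first & 0x7F
            if n == 0:
                raise ValueError("indefinite length is BER, not DER")
            if len(data) < 2 + n:
                raise ValueError("truncated length")
            if data[2] == 0:
                raise ValueError("non-minimal length is forbidden in DER")
            length, offset = int.from_bytes(data[2:2 + n], "big"), 2 + n
            if length < 0x80:
                raise ValueError("short form required for lengths under 128")
        end = offset + length
        if end > len(data):
            raise ValueError("value runs past the end of the buffer")
        return tag, data[offset:end], data[end:]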

On the other hand, X.509 is also a scary choice because it's a thicket of
options. One of the biggest pieces of work done in the Web PKI has been
telling Certificate Authorities not to issue certificates with random options
filled out, because that's yet more code surface area to cause either
interoperability issues or, worse, security problems. For the former, an
example is Let's Encrypt. You may notice that today your Let's Encrypt
intermediate is named X3. What happened to X1 and X2? Well, they are created
in pairs, which is why there were more, but that doesn't explain why we're not
still using X1. The answer is that X1 contains a feature called a negative
(prohibitive) Name Constraint. This is a way to say in X.509 that the issuer
(in this case "DST Root CA X3", operated by IdenTrust) does not permit this
intermediate CA to issue for specific names. Let's Encrypt had a constraint
forbidding the US military TLD .mil.

But Windows XP doesn't understand this optional X.509 feature: if it sees any
name constraint it doesn't understand, it concludes that all names are
forbidden. So having this constraint meant Let's Encrypt was useless for Win
XP clients. IdenTrust agreed to have Let's Encrypt simply obey the restriction
as a policy instead (a policy that was subsequently abandoned) to enable XP
compatibility.

So, avoiding X.509 might have been a sensible choice in OpenSSH, but that
doesn't necessarily mean using ASN.1 DER would be a bad choice in new systems
today. I don't see any reason to use BER or other encodings of ASN.1 in new
systems, though.

~~~
blattimwind
ASN.1 also has the relatively new OER (Octet Encoding Rules), whose encoding
rules seem significantly simpler than the previous ones (though they may
require transmitting a few more bits here and there).

W.r.t. parser security, I don't think that has been a success, historically,
regardless of format. Few if any parsers for moderately complex formats have
had zero vulnerabilities. If you think of a web server, or an XML library, or
something similar, chances are pretty good it had at the very least one
critical vulnerability related to parsing.

------
danesparza
Am I the only one that thinks 1k of HTTP headers per request is absolutely
bananas? As somebody who traces requests regularly (as part of AJAX or REST
API debugging) I would be hard pressed to see even half that on a request.

Please let me know if there's some spec or framework that I'm not thinking of
that passes that much data in the HTTP headers.

~~~
detaro
Open dev tools, browse around a bit, and look at the request sizes. For me, on
the request side, the User-Agent and Accept-* headers alone are already ~200
bytes. Add a long referrer, that's another 100 bytes. Cache-Control/ETag,
cookies, ... get it up to around 1k fairly easily.
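
You can tally it yourself; a quick sketch in Python with made-up but
typically-sized values:

    # Each header travels as "Name: value\r\n" in uncompressed HTTP/1.1.
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://example.com/some/moderately/long/previous/page?q=1",
        "Cookie": "session=" + "x" * 400,  # a modest 400-byte cookie
    }
    total = sum(len(name) + 2 + len(value) + 2 for name, value in headers.items())
    print(total, "bytes of request headers")  # well past 700 bytes already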

~~~
lmkg
> cookies

This is the real culprit right here. On this very HN page that I'm reading
right now, the Cookie field is 415 bytes, and there's really not much there.
The New York Times homepage has 988 bytes, and I don't even visit that site.
Reddit? 2055. Boom, that's twice your 1k right there in one field.
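
You can check a site's share yourself; a sketch in Python that counts the
cookie bytes a server hands out on first contact (the URL is just an example,
and some sites vary what they send per client):

    import urllib.request

    req = urllib.request.Request(
        "https://www.nytimes.com/",
        headers={"User-Agent": "Mozilla/5.0"},  # some sites reject bare clients
    )
    with urllib.request.urlopen(req) as resp:
        # Each Set-Cookie in the response comes back as a Cookie header
        # on every subsequent request to the site.
        cookie_bytes = sum(
            len(value)
            for name, value in resp.headers.items()
            if name.lower() == "set-cookie"
        )
    print(cookie_bytes, "bytes of Set-Cookie on the first response")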

------
thresh
It's all fun and games until we realize our web designers put a four megabyte
jpeg on the index page.

~~~
benhoyt
I think this is a really good point. While I like the article's focus on
depth, and thinking about the small stuff is important, saving a few bytes on
headers is going to be blown out of the water by all the images, JS, CSS, and
3rd-party pixels being loaded. "Profile before optimizing" applies here too.

~~~
mcguire
1. Images are transferred once, not with every request.

2. Most requests and many responses are all or mostly headers.

3. Intermediaries.

4. Header compression is at the protocol level; if it's done wrong, the best
js/css/whatever policy can't do anything to improve it.

------
londons_explore
I think it's really sad that we have to use sub-par compression because we
can't trust TLS to keep our data secure when we use good compression: TLS
leaks the compressed size to attackers, and with compression the size can
depend on the content.

I don't have a solution to that issue, but it seems really fundamental, and
something we should all be looking for ways to solve.

~~~
tialaramex
> I don't have a solution to that issue

You don't have a solution because there isn't a solution.

It's your definition of "good compression" that is leaking the data, not TLS.
If you're willing to let Bill guess what your phone number is, while you agree
to just tell him which digits he gets wrong, Bill can recover your phone
number in no more than ten tries: he guesses all zeros, then all ones, and so
on, and for each position the tenth guess is correct if none of the others
were.
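
This is the same trick CRIME- and BREACH-style attacks play on
compressed-then-encrypted HTTP: the attacker injects guesses next to the
secret and watches the size. A minimal sketch in Python, with zlib standing in
for the negotiated compression and a made-up secret:

    import zlib

    SECRET = b"session=7f3a9c2d41e8"  # hypothetical value the attacker wants

    def ciphertext_length(attacker_bytes: bytes) -> int:
        # Attacker-controlled data (e.g. a reflected URL parameter) shares a
        # compressed-then-encrypted stream with the secret; encryption hides
        # the bytes but not the length.
        return len(zlib.compress(attacker_bytes + b"; " + SECRET))

    right = ciphertext_length(b"session=7f3a")  # shares a 12-byte prefix
    wrong = ciphertext_length(b"session=xq9z")  # shares only "session="
    print(right, wrong)  # the better-matching guess compresses smaller

Extending the comparison one character at a time reads out the whole secret,
which is why HPACK in HTTP/2 deliberately avoids DEFLATE-style compression
across header values.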

TLS doesn't actually leak the compressed size; it's just that in practice you
will stop the TLS session after transmitting the compressed data, because to
do otherwise is wasteful, and if you didn't care about waste you would not use
compression. If you want, you can run TLS with padding to always hit, say, a
multiple of 4 kilobytes per transaction. Now say your "compression" took you
from 3.84kB to 2.16kB, and then you padded it to 4kB anyway. Oh wait, this was
worse; why did we bother with compression?

If you have a system with an explicit range of sizes and can tolerate always
transmitting the maximum size, TLS absolutely will mask out the actual size
with padding.
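
That bucketing is easy to express; a sketch of the arithmetic, using the 4 kB
figure from above (TLS 1.3's record padding can be used to the same effect):

    BUCKET = 4096  # pad every transaction up to a multiple of 4 kB

    def padded_length(payload_len: int) -> int:
        # Round up to the next bucket boundary; an observer only learns
        # which bucket a message fell into, never its true size.
        return -(-payload_len // BUCKET) * BUCKET  # ceiling division

    print(padded_length(3840))  # 4096: the uncompressed message
    print(padded_length(2160))  # 4096: the compressed one, identical on the wire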

