
HTTP headers we don't want - kawera
https://www.fastly.com/blog/headers-we-dont-want
======
buro9
Via is not safe to remove, and Fastly knows this as well as Akamai, Cloudflare,
and others do.

A very cheap attack is to chain CDNs into a nice circle. This is what Via
protects against: https://blog.cloudflare.com/preventing-malicious-request-loops/

Just because a browser doesn't use a header does not make the header
superfluous.
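
To make that concrete, here's a minimal sketch (Python; the pseudonym and
function name are illustrative, not any particular CDN's implementation) of the
loop check Via enables: each proxy appends its own pseudonym, and refuses to
forward a request that already carries it.

    PSEUDONYM = "my-proxy"  # hypothetical identifier for this hop

    def add_via_or_reject(headers):
        # Parse the comma-separated list of hops already recorded in Via.
        hops = [h.strip() for h in headers.get("Via", "").split(",") if h.strip()]
        if any(PSEUDONYM in hop for hop in hops):
            # Our pseudonym is already listed: the request looped back to us.
            raise RuntimeError("request loop detected; refusing to forward")
        headers["Via"] = ", ".join(hops + ["1.1 " + PSEUDONYM])
        return headers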

~~~
randomdrake
What a terrible stance for a company like Fastly to take:

 _More debatable perhaps is Via, which is required (by RFC7230) to be added to
the response by any proxy through which it passes to identify the proxy. This
can be something useful like the proxy’s hostname, but is more likely to be a
generic identifier like “vegur”, “varnish”, or “squid”. Removing (or not
setting) this header is technically a spec violation, but no browsers do
anything with it, so it’s reasonably safe to get rid of it if you want to._

Actually, it isn’t “debatable,” since the debate occurred, and a decision was
made, and published. That’s what RFCs are for.

To ignore them with such wanton disregard speaks volumes.

Edit: to clarify, I didn't mean that RFCs should not be debated at all, only
that disregarding this because "no browsers do anything with it" didn't seem
like a good justification or stance.

~~~
tcd
Not really. Standards are nice, but as time goes on things change, and we
should NEVER wait to change things until 'a standard says so'. The web is an
ever-evolving platform, and standards are loosely respected these days anyway.
Heck, browsers aren't a standard themselves!

~~~
SahAssar
By that logic there isn't any point to standards at all. If we're all supposed
to ignore them whenever we feel like it, then what's the point of having them?

If there is a standard published for something, follow it or publish your own
RFC. Don't just nitpick the bits you want and break clients in the process.

~~~
tptacek
They're provided as guidance. They aren't some kind of internet law. Sometimes
contravening standards is harmful; sometimes it's helpful. It's not productive
to point at them as if they were dispositive in debates.

~~~
quickben
Not really. https://tools.ietf.org/html/rfc791 was burned into hardware all
over the planet.

~~~
tptacek
People play games with IP all the time.

~~~
scrollaway
Sure, but there isn't a big company with clout saying "Hey, you should deviate
from RFC791".

I agree that standards can and should be replaced/amended over time, but I
kinda see the point GP is getting at.

~~~
pvg
Dropping Via from a response is roughly in the same category as ignoring the
TCP Urgent flag. Most widely used standards have vestigial bits.

~~~
AstralStorm
A lot of hardware silently edits TOS flags, causing trouble with ECN and
DiffServ...

Ignoring is usually fine. Dropping is not.

------
phyzome
Saying that a header is useless because it has been deprecated and displaced
by a newer header is... misleading at best.

If all you ever code for is the latest version of Firefox and Chrome, you
might not understand this, but there's a _whole world_ out there with an
astonishing diversity of browsers. (Also, your site is bad and you should feel
bad.) Removing X-Frame-Options without first checking if 99.99% of your users'
browsers support Content-Security-Policy is just asking for increased risk.
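
If you do migrate, the safe transitional move is to send both headers, since
browsers that understand Content-Security-Policy's frame-ancestors directive
are expected to ignore X-Frame-Options anyway (values illustrative):

    X-Frame-Options: SAMEORIGIN
    Content-Security-Policy: frame-ancestors 'self'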

------
ShaneWilton
Most of the suggestions in this post are great, but as always, especially when
security is involved, you need to assess your business needs yourself.

The suggestion to use Content-Security-Policy over X-Frame-Options is great --
if you don't expect many of your users to be using IE-based browsers. If
you're primarily serving large enterprises or government customers though,
it's likely that most of your users will still be coming from a browser that
doesn't support Content-Security-Policy.

~~~
Ajedi32
But interestingly, they deem `x-ua-compatible` "useful" even though AFAIK
that's also only needed for backwards compatibility with IE.
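
For reference, it's typically sent as below, and only IE-family browsers ever
did anything with it:

    X-UA-Compatible: IE=edge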

------
Hamuko
P3P is unnecessary until you have clients complaining that Internet Explorer
users cannot use the site and that it's hurting their business. I speak from
experience.

Curiously enough, P3P enforcement depends on the operating system and not on
the browser. Internet Explorer 11 may or may not care about P3P depending on
whether you're on Windows 7 or Windows 10.

~~~
pfarrell
Came here to say the exact same thing. P3P may be "officially" obsolete, but
if your business wants older browsers to be able to handle your code, you're
going to have to deal with it.

If you have the misfortune of encountering it, you can get really
hard-to-detect bugs with ajax calls or script files not getting loaded in IE
when you don't have P3P set up correctly. (For instance:
https://www.techrepublic.com/blog/software-engineer/craft-a-p3p-policy-to-make-ie-behave/)
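
The usual fix is just to emit a compact-policy header; the tokens below are
illustrative placeholders, not a vetted policy:

    P3P: CP="CAO PSA OUR"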

------
justinsaccount
cache-control doesn't completely replace Expires for some use cases.

If you have a scheduled task that generates data every hour, you can set
Expires accordingly so all clients will refresh the data as soon as the hour
rolls over.

You can do this using max-age, but then you have to dynamically calculate the
header per request, which means you can't do things like upload your data to S3
and set the cache-control header on it.

With expires, I can upload a file to s3 and set

    Expires: ... 17:00

and then not have to touch it again for an hour.

You can work around this client-side with per-hour filenames or the other
usual cache-busting tricks, but that's annoying.
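
For contrast, a rough sketch (Python, assumed per-request handler) of the
dynamic calculation that max-age forces on you, since the value has to count
down to the hour boundary on every request:

    import datetime

    def seconds_until_next_hour():
        # A static "Expires: ... 17:00" never changes, but max-age must be
        # recomputed per request to expire everyone at the top of the hour.
        now = datetime.datetime.now(datetime.timezone.utc)
        next_hour = (now + datetime.timedelta(hours=1)).replace(
            minute=0, second=0, microsecond=0)
        return int((next_hour - now).total_seconds())

    # e.g. respond with: Cache-Control: max-age=<seconds_until_next_hour()>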

~~~
AbacusAvenger
It seems like kind of an unlikely scenario that you'd want to expire content
at a specific time. I mean, if someone chooses to do that, they better know
what the impact could be.

With the Expires header, all clients that retrieved that content would expire
at the exact same time, which could cause some disproportionately high load in
the few seconds after that (the "thundering herd" problem). The Cache-Control
solution will stagger the expirations (relative to when the client last
retrieved it) so the server doesn't get trampled.

~~~
AstralStorm
Congratulations, your infrastructure's stability now depends on particular web
browsers and caches implementing their caching and headers correctly...

It takes just one big bad actor to break it. Reminds me of certain routers
damaging NTP traffic.

~~~
AbacusAvenger
That's a cynical view, and I don't think I said you should _depend_ on Cache-
Control working. Yes, there will be bad actors, but the majority of clients
are good actors. It's just one of _several_ measures you should take to even
out the load.

Of course you'd want a caching layer in front of the server doing the actual
work, but it's still possible to "thundering herd" the cache server if you use
an Expires header. Even if the herd doesn't hurt your backend server, it can
still make the load on your caching frontend servers spike at specific time
periods with every good actor refreshing the content at the same time. So it's
still ideal to _try_ and even out that load with Cache-Control.

~~~
ponyfleisch
The use case of having hourly updated data (e.g. weather data) on an S3 bucket
behind a CloudFront distribution is not that niche.

Thundering herd may or may not be an issue depending on the amount of traffic
you normally get, the architecture of your backend (e.g. AWS Lambda or S3
which can most likely deal with this easily) and the primary purpose of your
CDN usage (e.g. caching data closer to the users for faster delivery world
wide rather than reducing back end load).

------
daxterspeed
I really wish the browser vendors would come together to establish a plan to
clean up User-Agent. It's one of the worst offenders in header legacy[1] and
fingerprinting. Exposing what browser I am using and its major version is fine,
but I don't think every website I visit deserves to know what OS I am using,
nor the details of my CPU.

[1] https://www.nczonline.net/blog/2010/01/12/history-of-the-user-agent-string/
(2010, though little has changed since then).

~~~
gcp
Browser vendors can't clean up User-Agent because the websites sniff it and
break if it's "wrong" (for any random value of wrong).

I'm sure there's a Bugzilla bug about the "X11; Linux x86_64" in the headers,
and I'd be terrified to open it.

------
dewiz
Client HTTP headers I don't want:

    * referer
    * user-agent

Happy to be wrong, but these shouldn't be mandatory to browse the web, which
they _kind of_ are.

~~~
gtirloni
I too would like a world where all browsers fully implemented the same
standard, but that's not how it worked out (or how it ever works).

This is an amusing (scary?) article about the history of the user-agent:

https://webaim.org/blog/user-agent-string-history/

~~~
maxk42
Yes, but you should be testing browser capabilities, not user agents.

~~~
mattmanser
That requires more steps and a slower process. User-agent is a one-step
process. Testing browser capabilities means returning something to the browser
and potentially coming back to the server.

While it has obviously been abused, neither way is ideal. There's no way for a
server to say "tell me the browser capabilities before I serve you the
request".

------
_ZeD_
>>> Vanity (server, x-powered-by, via)

gosh, no.

server is no vanity; server is needed to know WHO THE HELL responded to you (we
are in a very messy world of CDN selector(s) + CDN(s) + application layer(s),
depending on non-obvious rules on (sub)domain and cookies).

~~~
AstralStorm
That is supposed to be handled by the Host header. Server etc. provides at most
redundant debugging info.

~~~
speleding
The Host header is in the request; it should not occur in the reply (as the
article states).
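
That is, roughly (an illustrative exchange):

    GET / HTTP/1.1
    Host: www.example.com

    HTTP/1.1 200 OK
    Server: nginx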

------
dijit
Speaking of HTTP headers, one I wish more people would use is Accept-Language
instead of region/geoip-based localization. Practically every site I've come
across ignores this header in favour of geoip, with the weird and notable
exceptions of Microsoft Exchange webmail and Grafana.
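
Honouring it isn't much work either. A rough sketch (Python; naive parsing that
ignores whitespace edge cases) of ranking languages by quality value:

    def preferred_languages(accept_language):
        # "sv-SE,sv;q=0.9,en;q=0.5" -> ["sv-SE", "sv", "en"]
        ranked = []
        for part in accept_language.split(","):
            piece = part.strip().split(";q=")
            q = float(piece[1]) if len(piece) > 1 else 1.0
            ranked.append((q, piece[0]))
        return [lang for q, lang in sorted(ranked, key=lambda t: -t[0])]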

~~~
Avamander
Yes, please! Is there some catch I don't know about that keeps people from
relying on the header to determine the language served? Because if not, I don't
get why geoIP/region is used so widely.

------
ggg9990
I get that this is data that Fastly has to send but doesn’t get to bill
directly to customers, but don’t expect ME to care about this until the
average news article stops sending me 10 MB.

~~~
manigandham
Fastly is a CDN that charges by requests + bandwidth, so it absolutely makes
money from extra headers on responses no matter how small.

~~~
ggg9990
I don’t know how they bill. If it is just the Content-Length then they eat the
cost of the header.

~~~
ggg9990
Also, I can’t see any other reason they’d care. Who does the header harm?

~~~
manigandham
You seem to be taking this way too critically. It's a simple article that's
looking at the typical headers in responses and showing which ones probably
are outdated or unnecessary. If you have 10 hits per day, it doesn't matter.
For others that send billions of requests, it might just make a material
difference.

------
Steeeve
I wouldn't trust this entry at all. The author did not do proper research to
understand the whys behind the headers that he didn't understand or didn't know
well enough.

------
LinuxBender
They list "date" as being required by the protocol. This is not true. The term
used in the RFC is "should". It is a nice-to-have, for additional validation by
proxies.

In haproxy, you can discard it with:

    http-response del-header Date

~~~
Rjevski
Just curious, what would a proxy do with such a header?

~~~
LinuxBender
Proxies used to (and some still do) compare Last-Modified and Date, if the Date
header is present. [0] They are not required to trust this header as accurate.

For reference and clarification around the Date header [1], the "should" comes
from the loophole that nobody is required to have a time source. The previous
RFCs made that harder to understand, as the loophole was in another section.

[0] http://devel.squid-cache.org/rproxy/dateheader.html

[1] https://tools.ietf.org/html/rfc7231#section-7.1.1.2
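
A rough sketch of the kind of comparison meant (inferred from the squid notes
above, not their exact algorithm):

    from email.utils import parsedate_to_datetime

    def last_modified_plausible(headers):
        # A Last-Modified later than the origin's own Date header suggests
        # a broken clock; a proxy is free to distrust both.
        date = parsedate_to_datetime(headers["Date"])
        last_modified = parsedate_to_datetime(headers["Last-Modified"])
        return last_modified <= date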

~~~
rkeene2
I believe you have misinterpreted Section 7.1.1.2 of RFC 7231; specifically, it
is identical to RFC 2616 Section 14.18 in that a Date header MUST be included
except for 3 exceptions. They list the 3 exceptions first, along with the
wording that includes "SHOULD" (which defines when the origin server should
compute the date), but notwithstanding those notes it still says that the Date
header is mandatory for an origin server: "An origin server MUST send a Date
header field in all other cases." (where "other" refers to the 3 exceptions:
HTTP 500-class errors, HTTP 100-class message-less responses, and no-clock
systems)

~~~
LinuxBender
I understand what you are saying, but I am not required to have a time source,
which makes the entire requirement optional.

~~~
rkeene2
A time source isn't required, a clock is. Further, if the origin server does
not have a clock, any proxy (such as HAProxy) is still required to add the
Date header if it has a clock, as if it were the origin server. In practice,
there are very few functional systems without clocks.

------
torstenvl
Oh God. No. Expires and Pragma are absolutely essential if you're writing a
web app to be used by folks stuck behind a walled-garden proxy implemented in
the dumbest way possible.
------
sqldba
Step 1: Complain, "Nobody follows the standard."

Step 2: Advise, "This is part of the standard but ignore it because it's
pointless."

------
prashnts
Interesting that their blog itself has the headers they deem unnecessary...

    Server: Artisanal bits
    Via: 1.1 varnish,1.1 varnish
    X-Served-By: cache-sjc3150-SJC, cache-cdg8748-CDG

------
yeukhon
First, we should fix user agent. Time to dump that historical baggage.

------
brobinson
>P3P is a curious animal.

This was a requirement to have IE6 accept third party cookies from your site.

~~~
synhare
This part of the article really threw me off. Someone writing an article on
HTTP headers for a major CDN has never had to deal with IE6?

~~~
barneygale
Are people really still dealing with IE6? I gave up web dev almost a decade
ago and it's disturbing to hear that IE6 is still an issue!

~~~
zbentley
Healthcare and government (US). So, so very many systems are on IE6. So, so
very many websites only work correctly/fully when end users are on that
platform. Until you've had to support code distributed by the US federal gov't
and watch the percentages of users hitting your site from XP ( _or earlier_ )
UAs rise to the double digits, you have not known sadness.

~~~
gboudrias
Also all of China or so I was told about a year ago.

------
Theodores
It would be helpful to have a guide to this for people running a 'low audience
website' where there is no CDN or Varnish, just some Apache or Nginx server on
a slow-ish but cheap VPS.

For a local business or community, e.g. an arts group with a WordPress-style
site, there are many common problems. They might not need a full CDN; just
serving media files from a cookieless subdomain gets their site up to
acceptable speed while cutting the header overhead considerably.

Purging the useless headers might also include getting rid of pointless 'meta
keywords' and whatnot.

The tips given here are really suited to this type of simple work to get a
site vaguely performant. A guide to doing it with common little-guy server
setups could really help, as could something like the sketch below.
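
As a starting point, a hypothetical Apache example (mod_headers assumed
enabled; the directives are real, the choice of headers illustrative):

    # Strip response headers the article calls unnecessary
    Header always unset X-Powered-By
    Header always unset X-AspNet-Version
    # Shorten (Apache cannot fully remove) the Server header
    ServerTokens Prod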

~~~
CapacitorSet
Realistically, how much traffic is saved by cutting headers? A simple article
like [this](https://tp69.wordpress.com/2018/04/17/completely-silent-computer/)
(currently on the HN frontpage) weighs 178 KB, and that's without external
resources. Unused headers account for at best 0.1% of the total traffic.

~~~
ovao
One could argue that the headers comprise a very important 0.1%, but any
wasted time the client spends waiting for and parsing headers will almost
always be utterly dominated by the unavoidable wait for HTML parsing,
JavaScript parsing, painting and so on.

I could see the argument for pruning useless headers if, say, the method for
generating them relied on some high-latency database call or filesystem
access, but that would rarely be the case.

------
nebulous1
The details are interesting but "adds overhead at a critical time in the
loading of your page" ... this seems pretty unlikely to have any noticeable
processing overhead. Doing things better is generally good, but this all seems
very low impact.

~~~
__jal
Depends on where you measure it. A client on a decent connection will never
notice. If you're serving billions of hits, 20 bytes in a header is something
you will definitely notice on your bandwidth bill.
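
(For scale: 20 bytes on each of a billion responses is roughly 20 GB of extra
transfer.)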

~~~
tgsovlerkhgsel
More importantly, an extra header may push you into needing another packet,
which creates extra potential for packet loss.

~~~
dijit
That's a very weak argument to make regarding the modern web, though; sites are
routinely sending me multiple megabytes of content, so optimising 20 bytes
isn't going to make even the smallest dent if you're trying to pack your site
down.

~~~
__jal
I'm sorry, but that's simply not reflective of reality. There's a reason Yahoo
wrote YSlow.

20 bytes times billions of requests is absolutely an optimization target; in
fact, this is a really easy and low-hanging one at that.

Sure, there's also a ton of garbage sites shoving piles of garbage around, but
the existence of fast food doesn't mean nice restaurants don't exist.

------
lopmotr
I got stuck with a website once that was using one of the compression headers
(maybe Content-Encoding) to indicate that its .gz files were gzipped, even if
the client didn't indicate that it supported compression. Some browsers would
ignore it and just download the file, but others would unzip it. So you got a
different file depending on what browser you used! I think wget and Chrome
behaved differently from each other. I wrote to the site operator, who
corrected it.
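
Presumably the server was sending something like this for the literal .gz
bytes (illustrative), so conforming clients transparently decompressed the
payload while others saved the raw archive:

    Content-Type: application/x-gzip
    Content-Encoding: gzip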

------
dorfsmay
"no-cache" doesn't prevent caching. "no-store" does.

Cache-Control: no-cache, no-store, must-revalidate

Mozilla recommends the following to prevent cachine:

Cache-Control: no-cache, no-store, must-revalidate"

[https://developer.mozilla.org/en-
US/docs/Web/HTTP/Headers/Ca...](https://developer.mozilla.org/en-
US/docs/Web/HTTP/Headers/Cache-Control)

------
dandare
Please use a monospaced font when displaying numbers in a table. Otherwise, it
is hard to tell which number is 10x bigger than its neighbor.

~~~
ygra
No need for a monospace font; tabular numerals should suffice.

------
chrisweekly
Bit of a tangent, but Fastly's CTO gave a terrific talk I attended about a
year ago, titled something like "Why load balancing is impossible". My career
in consulting has led to a gradual diffusion from my earlier focus on
front-end performance optimization, but Fastly retains credibility in my book
on a number of fronts.

------
avip
In short
[https://caniuse.com/#feat=contentsecuritypolicy](https://caniuse.com/#feat=contentsecuritypolicy)

------
itajooba02
Good article to read, Learned something new today regarding HTTP headers

------
jiveturkey
Fairly poor compared to Cloudflare blogs.

------
n00bi3s
This is a great article!

------
JeremyBanks
This grey-on-white is hard to read. I gave up on the article.

~~~
manigandham
It's more the skinny font weight, which is somewhat of an unfortunate hipster
design trend that only looks good on high-res Retina MacBooks.

