A very cheap attack is to chain CDNs into a nice circle. This is what Via protects against: https://blog.cloudflare.com/preventing-malicious-request-loo...
Just because a browser doesn't use a header does not make the header superfluous.
Disclosure: I work at Cloudflare.
Cache-Control: max-age=0, s-maxage=3600
I mean, Via might be useful as a safety check against mistakes, but I'm not getting the security angle.
It's like relying on the From: header in an e-mail.
If CDN A proxies requests to CDN B and CDN B proxies requests to CDN A, then those two will DoS each other fairly quickly.
There is no attacker in between to strip the Via; that would be counterproductive to the attack.
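To illustrate (hypothetical proxy names): each hop appends its own pseudonym to the chain, so a proxy can refuse any request whose Via already names it:
Via: 1.1 cdn-a, 1.1 cdn-b, 1.1 cdn-a   <- cdn-a sees itself and can break the loop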
Email has this too: the Received: header. If you manage to get a loop between two MTAs going, they will detect it by seeing themselves in the Received: header list.
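For example (hypothetical hostnames), a looping message accumulates something like:
Received: from mta2.example.com by mta1.example.com; ...
Received: from mta1.example.com by mta2.example.com; ...
Received: from mta2.example.com by mta1.example.com; ...   <- mta1 sees itself in the list and can reject the message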
If. Much, if not most, server software is written under the implicit assumption that it will not be under attack.
More debatable perhaps is Via, which is required (by RFC7230) to be added to the response by any proxy through which it passes to identify the proxy. This can be something useful like the proxy’s hostname, but is more likely to be a generic identifier like “vegur”, “varnish”, or “squid”. Removing (or not setting) this header is technically a spec violation, but no browsers do anything with it, so it’s reasonably safe to get rid of it if you want to.
Actually, it isn’t “debatable,” since the debate occurred, and a decision was made, and published. That’s what RFCs are for.
To ignore them with such wanton disregard speaks volumes.
Edit: to clarify, I didn't mean that RFCs should not be debated at all, only that disregarding this because "no browsers do anything with it" didn't seem like a good justification or stance.
If there is a standard published for something, follow it or publish your own RFC. Don't just nitpick the bits you want and break clients in the process.
Of course that isn't what is happening at all. Instead we're having the usual heap of politics and ever-faster update cycles. So I'd agree that web standards failed - but not that they were meant as guidance in the first place.
I agree that standards can and should be replaced/amended over time, but I kinda see what the GP is getting at.
Ignoring is usually fine. Dropping is not.
Only people who like lice eggs nitpick the bits they want. Others pick and choose.
Thus it may not be useful to the browser. But the article saying that its usage is debatable in this context is very wrong.
A MUST is a MUST, in my opinion - and too often there are serious issues in (web) communication because people ignore them as they see fit.
In either event - no one is saying that you should wait to change an RFC (or wording in an RFC) until it's fully deprecated and completely out of use - but a lot of people use RFCs for researching issues, especially in areas they are not 100% familiar with. Coming across a MUST and then discovering that some software or hardware vendor doesn't follow it is all too common. This delays certain projects by weeks, sometimes longer, and costs the involved companies lots of money. RFCs exist for a reason: the approved standards are simply expected to be followed when creating new things.
At any rate, there are lots of things you just have to ignore, drop, reject, and otherwise muck about with in order to run a sane network. These standards are not always written with practical software experience; some are written largely in a vacuum and out of touch. This varies widely across RFCs, so it might not apply to the RFCs you like.
Example of a MUST for SIP and HTTP: line folding and comments in headers. Apart from being terrible for performance (so much for being able to treat a header value as just a zero-copy pointer+length), there's zero legitimate use for these "features" of the syntax. Simply rejecting such messages is in your best interest as a network operator.
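For reference, line folding ("obs-fold" in RFC 7230) lets a header value continue onto the next line when that line starts with whitespace - which is exactly why the value can't be a single contiguous slice of the receive buffer:
X-Example: first part of the value
 second part, still the same header value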
Or you can debate them as much as you want, or even publish a new and improved version and perhaps people will decide to follow that instead. Or perhaps they’ll just do whatever they feel like.
There's a really important reason for not sending the Via header in the request: it disables compression, by default, in most major web servers!
· nginx defaults gzip_proxied to "off", where 'proxied' is determined by the presence of the Via header (see the config sketch after these links): http://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip...
· The same goes for IIS 7 & IIS 8 via noCompressionForProxies defaulting to "true" - https://docs.microsoft.com/en-us/iis/configuration/system.we...
· Apache's mod_deflate doesn't do this (thankfully).
This has an immediately negative impact on performance and, in many cases, cost: the origin server is sending more bytes over the wire, and network transit is often a non-trivial cost for those on AWS, Azure, et al.
Akamai also has a post on this: https://community.akamai.com/community/web-performance/blog/...
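If you do keep Via and still want compressed responses for proxied requests, a minimal nginx sketch (goes in an http or server block):
gzip on;
gzip_proxied any;  # default is "off", which skips compression whenever the request carries a Via header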
Note: I used to work at Cloudflare, and believe they (we!) made the right decision here. There are other mechanisms that can be used to detect proxy loops, and there are also cases where customers may "stack" edge network vendors (migration, specific feature needs, application complexity).
PS: Hi David! :)
So is this Fastly article suggesting a different point of view?
Via is a MUST https://tools.ietf.org/html/rfc7230#section-5.7.1
Forwarded is OPTIONAL https://tools.ietf.org/html/rfc7239#section-4
Protecting against loops only works with MUST headers.
Via => proxy/organization name chain where the response came from
X-Forwarded-For => IP address chain where the request came from
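For example (hypothetical proxy names, documentation IPs):
Via: 1.1 proxy-a, 1.1 proxy-b
X-Forwarded-For: 203.0.113.7, 198.51.100.2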
If all you ever code for is the latest version of Firefox and Chrome, you might not understand this, but there's a whole world out there with an astonishing diversity of browsers. (Also, your site is bad and you should feel bad.) Removing X-Frame-Options without first checking if 99.99% of your users' browsers support Content-Security-Policy is just asking for increased risk.
Cache-Control (proxies usually ignoring "private")
The suggestion to use Content-Security-Policy over X-Frame-Options is great -- if you don't expect many of your users to be using IE-based browsers. If you're primarily serving large enterprises or government customers though, it's likely that most of your users will still be coming from a browser that doesn't support Content-Security-Policy.
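A common hedge (my suggestion, not the article's) is to send both, since browsers that understand CSP's frame-ancestors directive are specified to prefer it over X-Frame-Options, while older IE falls back to the latter:
X-Frame-Options: DENY
Content-Security-Policy: frame-ancestors 'none'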
Curiously enough, P3P enforcement depends on the operating system and not on the browser. Internet Explorer 11 may or may not care about P3P depending if you're on Windows 7 or Windows 10.
If you have the misfortune of encountering it, you can get really hard-to-detect bugs with AJAX calls or script files not getting loaded in IE when you don't have P3P set up correctly. (for instance: https://www.techrepublic.com/blog/software-engineer/craft-a-...)
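For what it's worth, the classic workaround was to send a compact policy header; IE checked for its presence rather than validating it, so even a token that is obviously not a policy satisfied it (an illustration, not an endorsement):
P3P: CP="This is not a P3P policy"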
If you have a scheduled task that generates data every hour, you can set Expires accordingly so all clients will refresh the data as soon as the hour rolls over.
You can do this using max-age, but then you have to dynamically calculate this header per request, which means you can't do things like upload your data to S3 and set the cache-control header on it.
With Expires, I can upload a file to S3 and set
Expires: ... 17:00
You can work around this client-side with per-hour filenames or the other usual cache-busting tricks, but that's annoying.
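For contrast, a minimal sketch (hypothetical Python handler, names my own) of the per-request arithmetic that a static Expires header avoids:
from datetime import datetime, timedelta, timezone

def cache_control_value():
    # seconds remaining until the top of the next hour
    now = datetime.now(timezone.utc)
    next_hour = (now + timedelta(hours=1)).replace(minute=0, second=0, microsecond=0)
    return "max-age=%d" % int((next_hour - now).total_seconds())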
On paper our use case should be precisely what you described, but even we found Expires to be unnecessary.
With the Expires header, all clients that retrieved that content would expire at the exact same time, which could cause some disproportionately high load in the few seconds after that (the "thundering herd" problem). The Cache-Control solution will stagger the expirations (relative to when the client last retrieved it) so the server doesn't get trampled.
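To illustrate with hypothetical values: Expires pins every cached copy to the same instant, while max-age staggers expiry relative to each client's fetch:
Expires: Thu, 01 Jan 2026 17:00:00 GMT   (every cache expires at 17:00:00 sharp)
Cache-Control: max-age=3600              (a client that fetched at 16:23 revalidates at 17:23)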
It takes just one big bad actor to break. Reminds me of certain routers damaging NTP traffic.
Of course you'd want a caching layer in front of the server doing the actual work, but it's still possible to "thundering herd" the cache server if you use an Expires header. Even if the herd doesn't hurt your backend server, it can still make the load on your caching frontend servers spike at specific time periods with every good actor refreshing the content at the same time. So it's still ideal to try and even out that load with Cache-Control.
Thundering herd may or may not be an issue depending on the amount of traffic you normally get, the architecture of your backend (e.g. AWS Lambda or S3, which can most likely deal with this easily) and the primary purpose of your CDN usage (e.g. caching data closer to users for faster worldwide delivery rather than reducing backend load).
This is an amusing (scary?) article about the history of the user-agent:
https://www.nczonline.net/blog/2010/01/12/history-of-the-use... (2010, though little has changed since then).
I'm sure there's a Bugzilla bug about the "X11; Linux x86_64" in the headers, and I'd be terrified to open it.
While it has obviously been abused, neither way is ideal. There's no way for a server to say "tell me the browser capabilities before I serve you the request".
These days the referrer header rarely makes it through, for two main reasons:
1. Requests crossing from an HTTPS page to an HTTP destination do not include the referrer header (browsers strip it on the downgrade).
2. The referrer header is frequently disabled by sites (especially search engines and high-traffic sites) through the use of a referrer policy meta tag:
<meta name="referrer" content="no-referrer" />
Every time someone asks me to add GA to a website I get a little bit more bitter.
Perhaps your complaint is of a higher order though? Recently I've been spending most of my time wrestling with CSS so my perspective is a bit skewed...
Server is not vanity; Server is needed to know WHO THE HELL responded to you (we are in a very messy world of CDN selectors + CDNs + application layers that depend on non-obvious rules on (sub)domains and cookies).
So beware of unexpected side-effects!
In haproxy, you can discard it with:
http-response del-header Date
For reference and clarification around the Date header: the "should" comes from the loophole that nobody is required to have a time source. The previous RFCs made that harder to understand, as the loophole was in another section.
Server: Artisanal bits
Via: 1.1 varnish,1.1 varnish
X-Served-By: cache-sjc3150-SJC, cache-cdg8748-CDG
Step 2: Advise, "This is part of the standard but ignore it because it's pointless."
This was a requirement to have IE6 accept third-party cookies from your site.
Most emulations don't work.
You need the old stuff (IE) for the old applications to work, and as long as they can force it to keep working, they won't update said old application.
I've even had to touch systems which required IE 4 in the last few years, from before Trident became the rendering engine.
For a local business or community, e.g. an arts group with a WordPress-style site, there are many common problems. They might not need a full CDN; just serving media files from a cookieless subdomain gets their site up to acceptable speed and cuts the header overhead considerably.
Purging the useless headers might also include getting rid of pointless 'meta keywords' and whatnot.
The tips given here are really suited to this type of simple work to get a site vaguely performant. A guide to doing it with common little-guy server setups could really help.
I could see the argument for pruning useless headers if, say, the method for generating them relied on some high-latency database call or filesystem access, but that would rarely be the case.
20 bytes times billions of requests is absolutely an optimization target; in fact, this is a really easy and low-hanging one at that.
Sure, there's also a ton of garbage sites shoving piles of garbage around, but the existence of fast food doesn't mean nice restaurants don't exist.
Mozilla recommends the following to prevent caching:
Cache-Control: no-cache, no-store, must-revalidate
You seem to be making assumptions about the motivation for the article and then reacting strongly against it, but that's also dubious.
For what it's worth, the author is listed as a "Developer Advocate".
So it seems they are starting to fall into the propaganda mode to paint over the issue rather than admit that it is time for them to start innovating again.
Surrogate keys and quick cache busting used to be Fastly's special sauce, but since 2014 they're rather standard.
This is literally nothing more than a minor blog post that points out that some of us are still using headers we might not need to. Finding anything else in that is utterly baffling.
However, the HTML is well-formed enough that Reader view, whichever your browser supports, should be able to display it.