
A small website mystery - kumaranvpl
https://jvns.ca/blog/2017/05/10/a-small-website-bug-story/
======
yes_or_gnome
> What happened to the Content-Encoding: gzip header though?

> I still don’t know where the Content-Encoding: gzip header went! Did
> Cloudflare remove it? Did my webhost stop serving it for some reason? I have
> no idea! Anyway, my site seems to work again (I think/hope?) and I thought
> this was kind of a fun excursion into HTTP headers.

I would presume that Cloudflare's HTTP client doesn't send an 'Accept-
Encoding: gzip' header, and that they likely don't cache the 'Content-
Encoding' header. I'd expect this because Cloudflare's purpose is to
optimize your traffic as much as possible, so to do that, they want to
negotiate the least optimized content from your upstream server. Requesting
uncompressed content from upstream minimizes your server's CPU
utilization at the cost of increased traffic, and that small
increase in traffic is offset by Cloudflare's content caching.
Also, when Cloudflare makes that initial request due to a cache miss, there's
a downstream user waiting for that content. Cloudflare wants to minimize the
time of that request, as it does for all requests, and if it receives
gzipped content, it has to decompress it, store the content in cache,
and, if 'Accept-Encoding: gzip' is present on the downstream request, re-gzip
the content. They're trying to minimize that turnaround by not having to
decompress content at all, and the author's rewrite rules essentially broke
that part of the HTTP contract.

I'm sure it's incredibly rare to encounter one, but this very issue is
expected behavior for HTTP/1.0 clients, which don't support gzip.
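
The negotiation being described can be sketched in a few lines of Python (a minimal sketch of the general Accept-Encoding rule, not Cloudflare's actual implementation; the function name is mine):

```python
import gzip

def negotiate_encoding(accept_encoding, body):
    """Pick a response encoding from the request's Accept-Encoding header.

    If the client (or an origin-fetching proxy) doesn't advertise gzip,
    the server must fall back to an identity (uncompressed) body.
    """
    # Split "gzip, deflate;q=0.5" into bare tokens: ["gzip", "deflate"]
    tokens = [t.strip().split(";")[0] for t in (accept_encoding or "").split(",")]
    if "gzip" in tokens:
        return "gzip", gzip.compress(body)
    return "identity", body

# A proxy that omits Accept-Encoding gets the plain bytes back:
enc, payload = negotiate_encoding(None, b"<html>hi</html>")
# enc == "identity", payload is the original body
```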

------
i336_
I think
[https://api.stackexchange.com/docs/compression](https://api.stackexchange.com/docs/compression)
is interesting and relevant here:

> _While our API is HTTP based, we chose to diverge from standard HTTP in one
> particular area. During normal operation, we guarantee that all responses
> are compressed, either with GZIP or DEFLATE._

Read this line carefully:

> _If Content-Encoding is set on the response, use the specified algorithm. If
> it is missing, assume GZIP._

So SO's API (but ONLY the API endpoint) does precisely what this website was
doing. Just thought that was interesting to share.

Their rationale:

> _There is a way to remain in compliance with the HTTP spec, which is to
> reject all requests that do not list "gzip" or "deflate" in their Accept-
> Encoding header. Unfortunately, this does not work in practice as far too
> many proxies (affecting ~1% of users in our experience) will strip out this
> header._

> _The motivation for this is simple, serving uncompressed content is a loss
> for all parties. Bandwidth is, in comparison to CPU time, exceptional[ly]
> expensive and severely limited on many devices. Its really a no-brainer to
> require compression accordingly._

And then there's this hilarious tidbit at the end:

> _If response is not compressed this suggests a proxy between the user and us
> is intentionally decompressing content, or errors are occuring very early in
> processing requests. You can detect uncompressed content by checking for the
> appropriate magic numbers, assuming your library cannot detect this error
> for you._
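
The magic-number check they mention is easy to do by hand. A gzip stream always begins with the two bytes 0x1f 0x8b, and a zlib-wrapped DEFLATE stream typically begins with 0x78; raw DEFLATE has no magic bytes at all, so this is only a heuristic (the function name is mine):

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def looks_compressed(body):
    """Heuristic check along the lines the SE docs suggest: gzip has a
    fixed magic number; zlib-wrapped DEFLATE usually starts with 0x78."""
    return body.startswith(GZIP_MAGIC) or body[:1] == b"\x78"

looks_compressed(gzip.compress(b"hello"))  # True
looks_compressed(b"<html>oops, a proxy decompressed me</html>")  # False
```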

------
js2
_I have a static site, so I gzipped every page on my site, and set up this
Apache configuration:_

    
    RewriteEngine on
    RewriteCond %{HTTP:Accept-Encoding} gzip
    RewriteCond %{REQUEST_FILENAME}.gz -f
    RewriteRule ^(.*)$ $1.gz [L]
    

_This tells Apache “hey, always send gzipped replies no matter what!!”._

I don't think so. The first RewriteCond checks for the presence of an
"Accept-Encoding: gzip" request header (and the second that the file exists
in gzipped form).
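
In other words, the rules rewrite to the precompressed file only when both conditions hold. A rough Python analogue of that logic (my reading of the rules, not a literal translation of Apache's behavior):

```python
import os

def resolve_request(path, accept_encoding):
    """Only serve the precompressed .gz variant when the client
    advertises gzip AND the .gz file actually exists on disk;
    otherwise fall through to the original file unchanged."""
    if "gzip" in accept_encoding and os.path.isfile(path + ".gz"):
        return path + ".gz", "gzip"
    return path, None
```

Note this only mirrors the rewrite; the article's actual bug was on the response side, where the rewritten `.gz` body went out without a matching `Content-Encoding: gzip` header.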

------
replete
Most CDNs strip Accept-Encoding headers

~~~
executesorder66
Why do they do that?

------
rograndom
I ran into a similar problem when using Sucuri's Cloud Proxy. My nginx was
configured to serve up Brotli compressed pages, and would work great as long
as the first visitor to the site after a cache flush wasn't using chrome. If
that happened, then the proxy would cache the Brotli encoded pages and serve
them up to browsers that didn't support it and just show a bunch of junk.

Ended up ditching the Brotli compression.
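
This is the classic `Vary: Accept-Encoding` failure: the proxy cached one encoding under a key that ignored the request's Accept-Encoding. A sketch of the cache key a Vary-honoring proxy would need (illustrative names, not Sucuri's actual API):

```python
def cache_key(url, request_headers, vary="Accept-Encoding"):
    """Build a cache key that includes every header listed in Vary.

    Without this, a Brotli response cached for a Chrome visitor gets
    replayed to clients that never sent 'br' in Accept-Encoding.
    """
    varied = tuple(
        (name, request_headers.get(name, ""))
        for name in (v.strip() for v in vary.split(","))
    )
    return (url, varied)

# Chrome's request and an older browser's request now cache separately:
cache_key("/index.html", {"Accept-Encoding": "gzip, deflate, br"})
cache_key("/index.html", {"Accept-Encoding": "gzip"})
```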

------
undersuit
I saw the first dump of the headers, thought back to the article someone
linked in the HN comments ([https://www.fastly.com/blog/best-practices-for-using-the-vary-header](https://www.fastly.com/blog/best-practices-for-using-the-vary-header)),
and I was on the right path.

------
Safety1stClyde
Looking at this alexa rank:

[http://www.alexa.com/siteinfo/jvns.ca](http://www.alexa.com/siteinfo/jvns.ca)

the mystery is why this site needs "Cloudflare" or a "Content Distribution
Network" at all.

------
korzun
Why is this on the front page with over 30 points?

This is not an issue with CloudFlare; their service will pass whatever weak
ETags you want through to the external request. They are just proxying the
misconfiguration.

The title should be 'My host turned on compression and one of the rules I
wrote two years ago suddenly kicked in and broke my website'.

~~~
Matachines
Because Julia Evans wrote it.

~~~
goldenkey
Hacker News should care more about the content than the author. But that's a
lot to ask.

------
Sarkie
How did you clear the Cloudflare cache?

