
Editing my blog's HTTP headers with Cloudflare Workers - jgrahamc
https://jvns.ca/blog/2018/09/03/editing-my-blog-s-http-headers-with-cloudflare-workers/
======
newscracker
Under "things I tried", she lists:

> clearing my Cloudflare cache lots of times (this would temporarily fix the
> problem, but it would just crop up again later)

> upgrading to a new ‘realm’ on my webhost, in the hopes that there was a bad
> Apache server or something that I could move away from

> Making sure <!DOCTYPE html> was at the beginning of all my HTML in case that
> helped browsers figure it out it was HTML (it didn’t)

> Switching away from nearlyfreespeech’s “free beta bandwidth” program

> emailing Cloudflare’s support to see if they knew anything about this

> making a lot of curl requests to my webhost directly to see if I could
> reproduce it (I couldn’t)

The last point in this list suggests the issue is probably with Cloudflare.
Considering that she now spends an additional $5 per month on the Cloudflare
Workers solution, what I would've tried is to turn off Cloudflare DNS/caching
on the site for a few weeks (or longer) and observe. This can be done by
turning off caching in the Cloudflare console, or by changing the DNS servers
on NearlyFreeSpeech.net from Cloudflare's to the ones NFSN provides. This
would add a minor cost every month, though a lot less than $5, IMO. If the
problem shows up, then it's certainly something on NFSN. If not, then it's
certainly Cloudflare.
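A minimal sketch of how that experiment could be measured: request the same URL many times and count responses that arrive without a Content-Type, once through the Cloudflare hostname and once against the origin directly. The function names, the attempt count and the optional delay are illustrative assumptions, not anything from the post.

```python
import time
import urllib.request
from typing import Callable, Dict

def fetch_headers(url: str) -> Dict[str, str]:
    """GET url and return its response headers with lower-cased names."""
    with urllib.request.urlopen(url) as resp:
        return {name.lower(): value for name, value in resp.headers.items()}

def count_missing_content_type(url: str, attempts: int,
                               fetch: Callable[[str], Dict[str, str]] = fetch_headers,
                               delay_s: float = 0.0) -> int:
    """Request url `attempts` times and count responses with no Content-Type."""
    missing = 0
    for _ in range(attempts):
        if not fetch(url).get("content-type"):
            missing += 1
        time.sleep(delay_s)  # optional spacing between probes
    return missing
```

Through Cloudflare most probes will be cache hits, so one bad cached entry tends to show up as a run of failures rather than isolated ones.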

~~~
kentonv
It could be that some particular request header (maybe an unusual user-agent)
causes the web server to fail to send a content-type, for some reason. But if
the response was nevertheless marked cacheable, then Cloudflare may have
cached it and served it to other users who didn't send the unusual header.
This problem would seem to go away if you disabled Cloudflare caching, but
would still technically be a problem with the web server, not Cloudflare...
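The scenario described above can be shown with a toy shared cache that keys only on the URL. The "odd-client/0.1" User-Agent and the buggy origin are hypothetical stand-ins for whatever request actually triggers the missing header; this is a model, not Cloudflare's cache.

```python
from typing import Callable, Dict, Tuple

Headers = Dict[str, str]
Response = Tuple[Headers, str]  # (response headers, body)

class UrlKeyedCache:
    """Toy shared cache keyed only on the URL (request headers are ignored)."""

    def __init__(self, origin: Callable[[str, Headers], Response]) -> None:
        self.origin = origin
        self.entries: Dict[str, Response] = {}

    def get(self, url: str, request: Headers) -> Response:
        if url not in self.entries:
            # Miss: whatever the origin returns for *this* request is stored
            # and later served to everyone who asks for the same URL.
            self.entries[url] = self.origin(url, request)
        return self.entries[url]

    def purge(self) -> None:
        self.entries.clear()

def buggy_origin(url: str, request: Headers) -> Response:
    body = "<!DOCTYPE html><p>post</p>"
    # Hypothetical bug: one unusual User-Agent makes the server omit Content-Type.
    if request.get("user-agent") == "odd-client/0.1":
        return {}, body
    return {"content-type": "text/html"}, body
```

Purging the cache "fixes" things until the odd request happens to be the next miss, which matches the "clearing the cache helped temporarily" symptom.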

When thinking about cost, don't forget to factor in developer time. As a rule
of thumb, your time as a developer is worth $1 per minute.
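Put in numbers with that rule of thumb: the $5/month Workers minimum is $60 a year, or about an hour of developer time. A trivial sketch of the conversion; the rate is kentonv's rule of thumb, not a measured figure.

```python
def dollars_in_dev_minutes(dollars: float, dollars_per_minute: float = 1.0) -> float:
    """Convert a cost into minutes of developer time at a given rate."""
    return dollars / dollars_per_minute

yearly_workers_cost = 5 * 12  # $5/month Workers minimum, for a year
minutes = dollars_in_dev_minutes(yearly_workers_cost)
```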

(Disclosure: I'm the tech lead of Cloudflare Workers.)

~~~
user5994461
You need to set "Vary: User-Agent" when the response varies. It's very
important.

~~~
manigandham
Cloudflare CDN does not use the `Vary` header and only caches by the URL,
unless you have an enterprise plan.

~~~
user5994461
Good to know. That breaks a lot of stuff. Good thing we always had the
enterprise plan at work.

~~~
kentonv
`Vary` is tricky because it potentially forces the cache to do many more disk
reads per lookup, hurting performance. Without `Vary`, there's at most one
cache entry per cache key, and the logic is simple: load that entry, verify it
hasn't expired, and then return. With `Vary`, there could be many entries
matching any cache key, and you can't even compute a hash bucket upfront
because you have to examine the `Vary` header on the cached response to find
out which headers matter. You either have to do a linear scan of all entries
for that URL, or you need to maintain some sort of fancy index, and even then
it's going to be slower than without `Vary`.
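A toy model of the naive approach makes the cost visible: because the list of relevant headers is stored on each cached response, the cache has to read entries one by one before it knows whether the request matches. This is a simplified illustration (lower-case header names assumed), not how any particular CDN implements Vary.

```python
from typing import Dict, List, Optional, Tuple

Headers = Dict[str, str]

class VaryCache:
    """Toy shared cache that honors Vary; header names are assumed lower-case."""

    def __init__(self) -> None:
        # url -> list of (vary header names, their values at store time, body)
        self.by_url: Dict[str, List[Tuple[List[str], Tuple[str, ...], str]]] = {}

    def put(self, url: str, vary: List[str], request: Headers, body: str) -> None:
        values = tuple(request.get(name, "") for name in vary)
        self.by_url.setdefault(url, []).append((vary, values, body))

    def lookup(self, url: str, request: Headers) -> Tuple[Optional[str], int]:
        """Return (body or None, number of entries examined)."""
        examined = 0
        for vary, values, body in self.by_url.get(url, []):
            examined += 1
            # The relevant header list lives on the stored response, so the
            # request can only be matched after reading each candidate entry.
            if tuple(request.get(name, "") for name in vary) == values:
                return body, examined
        return None, examined
```

With `Vary: User-Agent`, every distinct browser string adds another entry to scan for the same URL.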

FWIW, with Cloudflare Workers you can implement logic that is (almost)
strictly superior to Vary, and doesn't have these performance challenges.
Instead of varying on a header, you can write code to vary on arbitrary
properties of the request. For example, `Vary: User-Agent` would normally
partition the cache not only for every browser, but for every minor version of
every browser and every OS version, ruining your cache hit rate. With a
Worker, you can parse the User-Agent and decide the buckets in your own way in
code, then compute a custom cache key (or just add a query parameter to the
URL) based on that. (Workers run "in front of" cache, so you get to modify the
request before cache lookup occurs.)

The only downside compared to the header is that your code has to know what
to vary on at request time, whereas the `Vary` header can be determined at
response time. But if you're writing code specific to some application or web
site, this is usually not that hard to determine from the URL alone.
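For a site you control, "decide from the URL alone" can be as small as a prefix table consulted before the cache lookup. A sketch with hypothetical prefixes and rules:

```python
from typing import Tuple

# Hypothetical rules for a site you control: URL prefix -> request headers
# that should go into the cache key for matching URLs.
VARY_RULES: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
    ("/api/", ("origin",)),      # e.g. CORS responses that depend on Origin
    ("/m/", ("user-agent",)),    # e.g. device-specific templates
)

def vary_dimensions(path: str) -> Tuple[str, ...]:
    """Decide at request time, from the URL alone, what the cache key varies on."""
    for prefix, headers in VARY_RULES:
        if path.startswith(prefix):
            return headers
    return ()  # plain static pages: the URL alone is enough
```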

(Disclosure: I'm the tech lead of Cloudflare Workers.)

~~~
user5994461
It doesn't affect performance of the browser or the web server, but I
understand that it takes some work to support for a CDN. Too bad since you run
a CDN ;)

I had to debug some pretty nasty issues caused by missing Vary. I would
certainly do without it if it were possible, but it's not an option.

Vary is strictly required for anything that renders content per browser
(User-Agent), compression (Accept-Encoding) or CORS policies (Origin).
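`Vary` lists the *request* headers the server consulted when choosing the response, so compression maps to `Accept-Encoding` (the response's `Content-Encoding` is the result of that choice). A small sketch of assembling the header from which of the three behaviours a response uses; the boolean flags are illustrative.

```python
def vary_header(per_browser: bool, compressed: bool, cors: bool) -> str:
    """Build a Vary value naming the request headers the response depended on."""
    fields = []
    if per_browser:
        fields.append("User-Agent")
    if compressed:
        fields.append("Accept-Encoding")
    if cors:
        fields.append("Origin")
    return ", ".join(fields)  # empty string means: send no Vary header
```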

~~~
manigandham
Cloudflare is/was built on nginx so Vary is already supported and available
for enterprise plans as stated. They just avoid it since they have such a
large free tier of users and vary headers can easily eat up cache space for
most scenarios.

It seems Workers is their standard answer for more flexibility now and it
works well (if you're ok with the pricing) since you can create your own cache
key easily by combining and hashing the different headers you're interested in
and just turning them into a querystring param in the origin request.
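A sketch of that approach in plain Python: hash the chosen request headers into a short digest and append it as a query parameter, so requests that differ in those headers get different cache keys. The `__vary` name and the 16-character truncation are arbitrary choices, and this assumes the origin ignores unknown query parameters.

```python
import hashlib
from typing import Dict, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def hashed_cache_key(url: str, headers: Dict[str, str], names: List[str]) -> str:
    """Append a digest of the selected request headers as a query parameter."""
    # Header names are case-insensitive, so normalize before hashing.
    lowered = {k.lower(): v for k, v in headers.items()}
    material = "\n".join(f"{n.lower()}={lowered.get(n.lower(), '')}" for n in names)
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("__vary", digest))
    return urlunsplit(parts._replace(query=urlencode(query)))
```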

------
toast0
I totally understand fixing it this way, but it's of course more satisfying to
fix the underlying issue.

I couldn't (easily) find documentation of the webhost's web server setup, but I
did find that Apache 2.4 deprecated DefaultType and will return pages without
a Content-Type header if no rule matches. It seems possible that some portion
of the original host's servers is not configured the same as everything else,
leading to this problem.
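Whichever layer ends up supplying the missing header (an `AddType` rule on the origin, or a Worker in front of it) needs a mapping from path to media type. A minimal sketch using Python's standard `mimetypes` module; treating a trailing slash as an `index.html` page is an assumption about the blog's layout.

```python
import mimetypes

def fallback_content_type(path: str, default: str = "application/octet-stream") -> str:
    """Guess a Content-Type from the URL path, for responses that lack one."""
    if path.endswith("/"):
        # Assumption: directory URLs are served from an index.html file.
        return "text/html"
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or default
```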

Detailed access logs (if available) might help show where the problem request
hit and help track it down?

It's probably worth poking at the host's customer support too. They seem
pretty competent, and worst case, they say they're not going to look at it,
because there's not enough information.

------
erdaniels
This feels like an expensive fix to a problem that seems worth digging into.
Is the hosting provider the one not sending the header? Or is it Cloudflare or
the web server? Saving 60 dollars a year to find that out seems worth it to
me.

~~~
user5994461
$5 a month is rather cheap for web hosting.

~~~
erdaniels
It's $5/month just for the HTTP header interception/injection. The hosting
cost isn't listed.

------
distantsounds
That's the most expensive HTTP header I've ever seen.

------
manigandham
Why not also use the Workers to now inspect the requests and log the ones that
are missing the content-type header to see what's happening?
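A sketch of what such a check might record, in plain Python rather than the Workers API: emit a log line only when the response has no Content-Type, capturing the request headers most likely to explain the difference. The chosen header list is an assumption.

```python
import json
from typing import Dict, Optional

# Request headers to capture; this selection is an assumption, chosen because
# an unusual value in any of them could plausibly change the origin's response.
LOGGED_REQUEST_HEADERS = ("user-agent", "accept", "accept-encoding", "cache-control")

def missing_type_record(url: str, request_headers: Dict[str, str],
                        response_headers: Dict[str, str]) -> Optional[str]:
    """Return a JSON log line if the response lacks Content-Type, else None."""
    response = {k.lower(): v for k, v in response_headers.items()}
    if response.get("content-type"):
        return None
    request = {k.lower(): v for k, v in request_headers.items()}
    record = {"url": url}
    record.update({name: request.get(name) for name in LOGGED_REQUEST_HEADERS})
    return json.dumps(record, sort_keys=True)
```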

~~~
bewuethr
She does that, according to the second-to-last paragraph.

------
creeble
Seems like she's spending $60/yr to fix a broken hosting company's problem?

------
illumin8
You can also customize request or response headers with AWS Lambda@Edge, and
it only costs $0.60 per million requests:
[https://nvisium.com/resources/blog/2017/08/10/lambda-edge-cl...](https://nvisium.com/resources/blog/2017/08/10/lambda-edge-cloudfront-custom-headers.html)

~~~
kentonv
I try to avoid commenting on competitors... but I really have to point out
that Amazon's pricing is misleading here.

To use Lambda@Edge, as far as I can tell, you need to pay for:

- Lambda@Edge base cost: $0.60/M

- Lambda@Edge CPU/RAM cost: minimum $0.31/M (every request is rounded up to
at least 128MB+50ms)

- CloudFront requests: minimum $1.00/M (assuming HTTPS in US+Canada)

- CloudFront egress bandwidth: minimum $1.70/M (assuming average 20KB
responses in US+Canada)

- Probably other things, too?

So... $3.61 per million... and probably more due to bandwidth and region.
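The per-million figures above can be reproduced from unit prices. This sketch assumes the 2018-era US list prices that yield kentonv's numbers ($0.00005001 per GB-second, $0.0100 per 10,000 HTTPS requests, $0.085 per GB of egress) and, like the estimate above, counts 1 GB as 1,000,000 KB; current AWS pricing may differ.

```python
REQUESTS = 1_000_000  # price everything per million requests

# Assumed 2018-era US list prices (not stated in the thread; verify before use).
LAMBDA_PER_REQUEST = 0.60 / 1_000_000
LAMBDA_PER_GB_SECOND = 0.00005001
CLOUDFRONT_PER_HTTPS_REQUEST = 0.0100 / 10_000
CLOUDFRONT_PER_GB_OUT = 0.085

lambda_requests = LAMBDA_PER_REQUEST * REQUESTS
# Each invocation is billed at no less than 128 MB for 50 ms.
lambda_compute = (128 / 1024) * 0.050 * LAMBDA_PER_GB_SECOND * REQUESTS
cloudfront_requests = CLOUDFRONT_PER_HTTPS_REQUEST * REQUESTS
# 20 KB average response, with 1 GB taken as 1,000,000 KB.
cloudfront_egress = 20 * REQUESTS / 1_000_000 * CLOUDFRONT_PER_GB_OUT

total = lambda_requests + lambda_compute + cloudfront_requests + cloudfront_egress
print(f"${total:.2f} per million requests")
```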

Cloudflare Workers charges $0.50 per million requests, with a $5 monthly
minimum (covering your first 10M requests). There are no other costs: You can
use it on top of Cloudflare's free plan, which gives you unlimited bandwidth.
And... it performs better: [https://blog.cloudflare.com/serverless-performance-compariso...](https://blog.cloudflare.com/serverless-performance-comparison-workers-lambda/)
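The Workers pricing described above as a function, next to a linear estimate using kentonv's $3.61/M lower bound for Lambda@Edge plus CloudFront. Both are sketches of the figures quoted in this comment, not billing code.

```python
def workers_monthly_cost(requests: int) -> float:
    """$5/month minimum covers the first 10M requests; $0.50 per million after that."""
    included = 10_000_000
    extra = max(0, requests - included)
    return 5.00 + 0.50 * extra / 1_000_000

def lambda_edge_estimate(requests: int, per_million: float = 3.61) -> float:
    """Rough monthly lower bound for Lambda@Edge + CloudFront at $3.61 per million."""
    return per_million * requests / 1_000_000
```

With these figures the fixed $5 minimum dominates below roughly 1.4 million requests a month, where the Lambda@Edge lower bound comes out smaller; above that, the per-request difference dominates.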

(Disclosure: Again, I'm the tech lead of Cloudflare Workers, so take my
comments on competitors with appropriate grains of salt...)

------
juanbyrge
Wow, this is a bug in Cloudflare that you are paying to work around?

~~~
mehrdadn
As I understand it, she doesn't know for sure; that's just her best guess.

------
petercooper
I wanted to test whether an old-school META "http-equiv" directive would be a
temporary fix for this issue, so I tried to recreate the problem by making a
one-shot HTTP server with netcat that only returned Content-Length and no
Content-Type header:

    { printf 'HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n' "$(wc -c < a.html)"; cat a.html; } | nc -l 8080

Yet Chrome, Firefox and Safari automatically detect that the payload is HTML
and render it properly _without_ any Content-Type header, unlike with her
pages. What am I missing?

~~~
treve
Perhaps an X-Content-Type-Options: nosniff header?

~~~
petercooper
Aha, that was indeed the one - thanks :) And... I can confirm meta http-equiv
does not help in this scenario at all.
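That result matches the WHATWG MIME Sniffing Standard: with no Content-Type a browser guesses, but with `nosniff` it is not allowed to guess HTML, so a text body falls back to `text/plain`. Below is a heavily simplified model of that rule; it is not browser code, and real browsers differ in details such as whether the page is shown as text or downloaded.

```python
from typing import Optional

# Simplified subset of the HTML signatures in the WHATWG MIME Sniffing Standard
# (the spec also requires a space or ">" after each tag and lists more tags).
HTML_SIGNATURES = (b"<!doctype html", b"<html", b"<head", b"<body", b"<script")
WHITESPACE = b" \t\n\r\x0c"
NON_BINARY_CONTROL = b"\t\n\r\x0c\x1b"

def computed_type(content_type: Optional[str], body: bytes, nosniff: bool) -> str:
    if content_type:
        return content_type  # an explicit type is used as-is
    header = body[:1445]     # the spec sniffs at most this many bytes
    # HTML detection only runs when sniffing "scriptable" types is allowed.
    if not nosniff and header.lstrip(WHITESPACE).lower().startswith(HTML_SIGNATURES):
        return "text/html"
    # Image/audio/archive signatures are skipped in this sketch.
    if all(b >= 0x20 or b in NON_BINARY_CONTROL for b in header):
        return "text/plain"
    return "application/octet-stream"
```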

------
Buge
If you're serving any user-created content (such as comments or user uploaded
images) I would think this would be a bad idea security-wise, because it might
lead to XSS.

But if you're just doing author-created content, it should be ok security-
wise.
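One way to limit that risk is to apply a `text/html` fallback only where author-written pages live, and never override a type the origin did send. A sketch under that assumption; the path prefixes and the `charset=utf-8` choice are hypothetical.

```python
from typing import Optional

# Hypothetical locations of author-written HTML; uploads live elsewhere.
AUTHOR_PAGE_PREFIXES = ("/blog/", "/about/")

def fallback_html_type(path: str, existing: Optional[str]) -> Optional[str]:
    """Content-Type to send, or None to leave the response untouched."""
    if existing:
        return existing  # never override what the origin sent
    is_page = path.endswith("/") or path.endswith(".html")
    if path.startswith(AUTHOR_PAGE_PREFIXES) and is_page:
        return "text/html; charset=utf-8"
    return None  # e.g. user uploads: don't promote unknown bytes to HTML
```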

------
Yokohiii
I think my light bulb was flickering when she published this.

------
markdown
Beware, anyone dealing with Cloudflare: if you're locked out of your account
for any reason (you lose your mobile phone/2FA, or anything else), you lose
your account forever.

There is no way to contact Cloudflare _without_ the account you just lost
access to.

~~~
abtinf
That’s a really good thing, assuming they are upfront with warnings and
advice. I wish more service providers behaved similarly, especially Amazon,
Google, and others.

~~~
markdown
It's a good thing not to provide a support contact without a login? What
planet are you living on? And you want other tech providers to follow suit
and hide their support channels?

~~~
Nadya
I chose my email host specifically because there is no way to recover a
lost account. If you lose the password - too bad! The owner does not process
password resets and there is no way to recover your account. The host makes
it very clear that you are responsible for never misplacing or forgetting your
password. It removes an entire attack vector: social engineering a password
reset.

It is very much something I wish more companies would do, or at least let
users opt into. "Never under any circumstances reset my password no matter how
much I might beg you to." Only tangentially related to _support_ as a whole
but more directly related to "being able to recover an account".

~~~
markdown
You're missing the point here. Whether or not I can recover my account is
tangential to the real problem of them not allowing me to contact Support.

You might be happy with that, but most people aren't. The "contract" is: so
long as I can prove who I am to a reasonable degree, give me access to my
account.

------
CSEThrowaway
As others have pointed out, this is an odd issue that really shouldn’t be
happening. Hopefully, since posting, the author has investigated alternative
solutions for website hosting, as there are many (which are very nice and very
free).

I sort of got the impression that this was just a paid promotion for
Cloudflare Workers. If it wasn’t, maybe they could do you a solid and help you
identify the actual issue. :)

~~~
jgrahamc
It's not a paid promotion. [https://blog.cloudflare.com/no-payola-here/](https://blog.cloudflare.com/no-payola-here/)

