
Cache Poisoned DoS Attack: Shutdown any CDN Website with One HTTP Request - ldmail
https://cpdos.org/
======
0vermorrow
So after the authors disclosed this issue to AWS it was fixed: CloudFront no
longer caches 400 Bad Request responses by default. Also, from the paper
linked on the website [0]:

""" Amazon Web Services (AWS). We reported this issue to the AWSSecurity team.
They confirmed the vulnerabilities on CloudFront. The AWS-Security team
stopped caching error pages with the status code 400 Bad Request by default.
However, they took over three months to fix our CPDoS reportings.
Unfortunately, the overall disclosure process was characterized by a one-way
communication. We periodically asked for the current state, without getting
much information back from the AWS-Security team. They never contacted us to
keep us up to date with the current process.

"""

[0] -
[https://cpdos.org/paper/Your_Cache_Has_Fallen__Cache_Poisoned_Denial_of_Service_Attack__Preprint_.pdf](https://cpdos.org/paper/Your_Cache_Has_Fallen__Cache_Poisoned_Denial_of_Service_Attack__Preprint_.pdf)

------
nkurz
It seems like the unstated larger problem here (in the blog article at least,
I haven't read the paper) is that cached pages are served that do not match
the full HTTP request. The problem isn't just that error pages are being
cached, but that certain HTTP headers that should change the resulting page
from the server are being ignored when it comes to determining identical
requests. Even if no one is maliciously using this behavior to cause error
pages to be cached, the cache is still breaking the website if it causes a
different page to appear than would have been served directly by the server!

~~~
pquerna
Actually, the origin is supposed to send a `Vary` header if it changes
behavior based on any header.

So, if a client sends a 20kb `X-Oversized-Header` and the server responds
with a 400, it is conceivable that the response should include `Vary:
X-Oversized-Header`.

Is that "really" the right fix? Probably not. But the HTTP RFCs provide `Vary`
for exact this kind of reason within HTTP caching: the origin is varying its
response based on a subset of headers.
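In cache terms, honoring `Vary` looks roughly like the following. This is a
minimal Python sketch of the reuse check from RFC 7231 section 7.1.4; it
ignores real-world details such as case-insensitive header names.

```python
def can_reuse(cached_resp_headers, cached_req_headers, new_req_headers):
    """May a cached response satisfy a new request, per the Vary mechanism?
    Simplified: header names are treated case-sensitively and values are
    compared byte-for-byte."""
    vary = cached_resp_headers.get("Vary", "")
    if vary.strip() == "*":
        return False  # "Vary: *": never reuse without contacting the origin
    for field in (f.strip() for f in vary.split(",") if f.strip()):
        # every listed request header must match what the cache originally saw
        if cached_req_headers.get(field) != new_req_headers.get(field):
            return False
    return True
```

Under this rule, a 400 stored with `Vary: X-Oversized-Header` would not be
served to clients who never sent that header.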

~~~
mjw1007
In RFC-world (which may not be the same as the real world), there's no need
for `Vary` on a 400 response, because a 400 response isn't cacheable (unless
it has a `Cache-Control` or `Expires` header).
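Concretely, RFC 7231 section 6.1 lists the status codes that are cacheable by
default, and 400 is not among them. A simplified storability check, sketched
in Python (real rules also cover directives like `private` and `s-maxage`):

```python
# Status codes RFC 7231 section 6.1 marks as cacheable by default, i.e.
# storable even without explicit freshness info (Cache-Control / Expires).
CACHEABLE_BY_DEFAULT = {200, 203, 204, 206, 300, 301, 404, 405, 410, 414, 501}

def may_store(status, resp_headers):
    """Simplified shared-cache storability check."""
    cc = resp_headers.get("Cache-Control", "")
    if "no-store" in cc:
        return False
    if cc or "Expires" in resp_headers:
        return True  # explicit freshness info makes even a 400 storable
    return status in CACHEABLE_BY_DEFAULT
```

By this logic a bare 400 must not be stored, while a 404 may be, which matches
the RFC-world claim above.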

~~~
derefr
I feel like HTTP response codes are an attempt to squish several layers of
Result monads together into one single many-valued one, where each layer of
the original nested set of Results has different caching semantics.

Like, in this case, a 400 is really the origin saying that it's not even
_sending_ you a representation of the resource you requested, because you
didn't make a request that can be parsed as any particular resource. It's a
Left RequestFormatError instead of a Right CachableResourceRepresentation.

And, annoyingly, the codes don't line up with which layer of the result went
bad. 4XX is "client error", sure; but 404 isn't really an "error" at all --
it's an eminently cacheable representation of the non-existence of a resource.

It'd be neat to see the HTTP codes rearranged into layers by what caching
semantics they require of the implementing UA, such that UAs could just attach
behavior to status-code _ranges_. Maybe in HTTP/4?

~~~
ktpsns
Well, HTTP was probably not designed to do well with caches. You certainly
know that the first digit of an HTTP response code indicates a grouping:

    4xx (Client Error): The request contains bad syntax or cannot be fulfilled

I would argue that, since this is client-specific, no 4xx response should be
cached by a proxy/CDN, since it is not the client. And even the client should
not cache a 404: a resource could be created the very next moment.

~~~
brians
Could be, but there’s a reason to want cachable, proxyable errors here:
they’re often very expensive for the origin.

~~~
ktpsns
Actually, handling an error should be cheaper than handling a proper request,
simply because an error most likely means an early exit in the handling server
-- which means less time-to-answer, i.e. cheaper. (This does not cover any
kind of DoS attack, which is _always_ difficult to handle, regardless of
whether the answer is an error or not.)

However, effectively we agree with _derefr_: the design of HTTP status codes
did not have this peculiarity of cacheable vs. non-cacheable errors in mind.
This is definitely a shortcoming.

------
ge0rg
_Many intermediate systems such as proxies, load balancers, caches, and
firewalls, however, do only support GET and POST. This means that HTTP
requests with DELETE and PUT are simply blocked. To circumvent this
restriction many REST-based APIs or web frameworks such as the Play Framework
1, provide headers such as X-HTTP-Method-Override, X-HTTP-Method or
X-Method-Override to tunnel blocked HTTP methods_

This is so f*ing scary. Who in their right mind invents such crazy tricks
that absolutely circumvent all we know about web API security, implements them
in frameworks, and leaves them enabled by default?
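What such a framework effectively does can be sketched like this. This is a
hypothetical simplification; which header names are honored, and whether they
apply to all requests or only to POST, varies by framework.

```python
def effective_method(method, headers):
    """Roughly what a framework honoring method-override headers does.
    The header names are the ones quoted above; behavior varies by framework."""
    for name in ("X-HTTP-Method-Override", "X-HTTP-Method", "X-Method-Override"):
        if name in headers:
            return headers[name].upper()
    return method
```

The danger for caching: a shared cache keys on the outer request (e.g. GET
plus URL), while the origin acts on the overridden method, so a GET carrying
`X-HTTP-Method-Override: DELETE` can get the origin's error response stored
under the plain GET cache entry.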

~~~
oauea
People who have to deal with end users behind overly restrictive corporate
firewalls. After you get a few hundred bad reviews telling you your website is
broken, it looks very tempting to just fix it on your end.

~~~
zelon88
Maybe don't fix it, and just accept that organization X doesn't want its users
to access your resource?

Seems more reasonable than trying to circumvent a legitimate restriction. Not
every block is an adversary that needs defeating.

~~~
jessaustin
What is the legitimate restriction? You'll let "users" POST but not PUT? In
what Mordacian fever-dream is that reasonable?

~~~
matheusmoreira
The PUT and DELETE methods can't be used with HTML forms.

[https://softwareengineering.stackexchange.com/q/114156](https://softwareengineering.stackexchange.com/q/114156)

~~~
jessaustin
Forms aren't the only sources of HTTP requests.

------
dpedu
It's pretty surprising to see an issue like this. I've spent some time in the
past tuning http caches/cdns. One key takeaway I recall is the importance of
your "cache key". That is, which fields matter when deciding if requests
match. If the cache key of two requests match, the requests are considered
identical and thus the desired behavior would be to serve from cache.

Obviously, things like the request method, path and Host header matter a lot.
Perhaps if you're A/B testing, the A/B cookie would make sense as part of the
cache key too.
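A toy illustration of such a cache key, in Python (the `X-AB-Bucket` header is
a hypothetical whitelisted A/B-test field, not anything from the article):

```python
def cache_key(request):
    """Toy cache key: method, Host, path, plus an explicit whitelist of
    extra fields. Any header omitted here must not change the origin's
    response, or cache poisoning becomes possible."""
    headers = request["headers"]
    parts = [request["method"], headers.get("Host", ""), request["path"]]
    for field in ("X-AB-Bucket",):  # hypothetical whitelisted header
        parts.append(headers.get(field, ""))
    return "|".join(parts)
```

Note that two requests differing only in a non-whitelisted header -- say, a
20kb `X-Oversized-Header` -- map to the same cache entry, which is exactly
where the poisoning risk described in the article comes from.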

This seems like a simple misconfiguration at a very critical location. But an
'exploit' that warrants its own domain? Hardly. This is promotion for the
authors and their upcoming presentation. It's a very nice gotcha.

~~~
jefftk
This is a very easy mistake to make: you configure your cache keys in one
place, and process requests in another. With nothing linking them it's not at
all surprising for them to be out of sync.

As for whether it requires its own domain: domains are cheap, and publicity
gets people to pay attention to problems and fix them.

------
parliament32
A sane CDN doesn't cache error responses. There is no legitimate reason to
cache a non-2xx/3xx response, unless you or your CDN is really pinching
pennies.

Caching a 5xx kinda makes sense I guess, but a 4xx client error? That's nuts.

~~~
mdavidn
Caching a 404 can make sense, however.

For example, every browser requests /favicon.ico by convention. If you don't
have one, you wouldn't want every single request reaching your backend.

~~~
parliament32
Meh, it's subjective but I'd argue that's a bad idea. A 404 isn't exactly a
heavy request; your origin should be able to handle plenty of those. The whole
point of CDNs is to serve fat assets (images, videos, js libraries sometimes)
not patch holes in your site or bandaid your raspberry-pi origin. Caching
error responses, including 404s, can lead to all sorts of trouble and isn't
worth it IMO.

~~~
justincormack
Actually, generating heavy to process 404s is a fairly common attack, for some
kinds of site.

~~~
elmo2you
Could you maybe provide a practical example of such a "heavy to process" 404?
I'm having difficulty imagining how a "Not Found" could (or ever should)
involve heavy processing. AFAIK, a 404 should only be given for a non-existing
resource (e.g. a file), which should be straightforward enough. Granted, I've
seen some horrible semantic resource pointers/paths in URIs over the years,
some of which required processing, and some of which generated 404s. However,
such contortions are mostly just a testament to horribly bad design. If the
request involves processing on the server side (higher up than the web server
or cache itself), resulting in the conclusion that a resource is unavailable,
should that not return a 5xx response?

~~~
Dylan16807
The origin server might be set up to serve everything through cgi. That would
mean loading the entire framework just to spit out a 404 for the favicon. If
all the other pages cache just fine, you could hit 100% load on favicons when
you'd otherwise be at 10%.

It's not an ideal design but I wouldn't call it 'horrible'.

~~~
iliketoworkhard
What is CGI in this context? I’ve only heard of it in a movie special effects
context.

~~~
Dylan16807
The web server having to run an external program to handle the request,
typically because it's written in a scripting language.

Technically it stands for "common gateway interface" but that doesn't come up
very much.

------
robocat
CDNs could whitelist HTTP header keys, constrain HTTP header values, and
regenerate a full clean header to the web server as part of their security
services.

CDNs already parse headers, and CDN developers have the knowledge and know-how
to correctly constrain header values.

There are hundreds of different HTTP header attacks, e.g. I like this one,
which returns part of a different person's HTTP response inside your own:
[https://portswigger.net/blog/http-desync-attacks-request-smuggling-reborn](https://portswigger.net/blog/http-desync-attacks-request-smuggling-reborn)

~~~
ris
Well, this is the thing I'm confused about. Having worked on some CloudFront
stuff last month, documents like
[https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/header-caching.html](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/header-caching.html)
gave me the clear impression that every header CloudFront passes to the origin
is considered part of the cache key. In particular, if you configure it to
"forward all headers" -- which is surely what would be needed to pass this
custom "X-Oversized-Header" onwards -- that would effectively disable all
caching.

I'm clearly going to have to spend another day staring at the cloudfront
documentation.

~~~
iliketoworkhard
And why weren’t all headers considered in the cache key to begin with?
Computing the key is cheap either way.

The only thing I can think of is that it would create many cache entries
holding the same cached value, in the case where requests with different
headers produce the same API response -- which could lead to a large cache.

------
mattacular
Why would you have a CDN configured to cache error responses from the origin?
It shouldn't serve those at all.

It's also a good idea to whitelist headers and query params your app uses.

Better put: this sounds like an attack that only works on very poorly
configured CDNs.

~~~
dpedu
A main purpose of a cdn is to protect your resources by using caching. Error
pages still require server-side resources to create. Not caching error pages
gives attackers an easy way to consume your server's resources a la "DoS".

~~~
mattacular
That is, if an attacker can find an endpoint that is erroring -- which they
shouldn't be able to if you've configured your CDN not to serve errors from
the origin in the first place. There are obviously other good reasons not to
serve error messages to the public.

Anyway, if you're an attacker it's easier to DDoS URLs that will 404, as those
typically must go back to the origin anyway and likely won't be safe to cache
for long even if the CDN is configured to do so for 4xx responses. To protect
against that, though, your CDN provider probably has some sort of DDoS
protection feature as well.

~~~
wbl
If the CDN doesn't serve an error, what, pray tell, should it return when the
request triggers an error on the origin?

~~~
mattacular
Depends what kind of error you're referring to, but if it's a 5xx, CDNs can be
set up to serve a static HTML page. That page can contain as much or as little
info about the error as you like -- including no info at all; if you think the
endpoint is likely to come under attack, the page could even be disguised as
something legit.

Of course, you should be monitoring for and logging these error responses from
the origin and fixing them as soon as possible; the CDN response is just to
provide cover. Again, that is if you need or want it. If you want to expose
errors to the public, go right ahead -- nobody is going to stop you.

------
notacoward
The text points out that adhering to the standard regarding what may be
cached -- i.e. _not_ most kinds of error responses -- is a key mitigation, so
it's pretty noteworthy that Amazon CloudFront™ seems to be by far the most
afflicted CDN. Also nice to see that good old Squid fares exceptionally well.

------
Elte
Maybe I'm missing something, but doesn't the exploit require generating a
cache miss? Otherwise everything that's already in the cache isn't vulnerable
to this, right? Not that this makes it any less scary, but slightly more
complicated at least...

~~~
paulhodge
Yeah, that’s part of it. The attacker wants to hit a URL that isn't already
cached by the CDN. That might be easy or hard depending on the site. For
example, if the CDN just has a 30-minute TTL, the attacker will need to make
the first request right after the 30 minutes expire.

------
juancampa
Why would a CDN cache a 400?

~~~
almost_usual
A CDN that doesn’t follow HTTP standards

“One of the main reasons for HHO and HMC CPDoS attacks lies in the fact that a
vulnerable cache illicitly stores responses containing error codes such as 400
Bad Request by default. This is not allowed according to the HTTP standard.”

~~~
notacoward
It seems that it should be feasible to cache more kinds of errors if the
request that populated the cache and the subsequent request are identical.
These attacks all rely on that not being the case. However, "identity" is a
more slippery concept than most might think. Generally it requires putting
requests into some canonical form, but defining that canonical form
(especially what it excludes) requires making exactly the same kinds of
distinctions that were missed to make these attacks possible. It just shifts
the problem around, and introduces new potential for breakage. In the end it's
no better than just following the darn standard, whose authors probably
defined what was cacheable with exactly these concerns in mind.

------
hartator
Is that really a legitimate attack vector?

I would expect web servers to just discard weird headers and serve the regular
page. And, at the same time, I would also expect CDN to not cache 400 bad
requests or 500 server errors.

~~~
stebann
Yes, it is. Many server/proxy/cache/ad-hoc-header-filter configurations, if
not handled properly, don't expect that some clients will tamper with their
headers -- or the people who configure them don't take that into account -- so
the tampered headers aren't discarded (a better approach is to use whitelists,
but that's no panacea). CDNs may cache errors or not, but I have seen a
variety of behaviors even for the same CDN, so again I suspect this is often
caused by misconfiguration.

------
crawdog
I can think of only rare occasions where any status other than 200 is cached.
Usually this is an edge case that has to be configured in the caching tier.

~~~
thexa4
Not caching your error pages is a good way to bring down your servers once
something goes wrong. We usually set a short cache TTL for errors to make sure
there is an upper limit on the number of requests coming through the cache.
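In nginx, for example, such a policy might look like the following fragment
(illustrative values, assuming proxy caching is already configured):

```nginx
# Cache successful responses for a while, but errors only briefly, so a
# burst of failing requests is absorbed without pinning a bad response
# in the cache for long.
proxy_cache_valid 200 301 302 10m;
proxy_cache_valid 404 500 502 503 5s;
```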

------
rkagerer
One more mitigation idea: set non-trivial expiration times and "prime" your
caches. When you update a resource, hit all your CDN endpoints (at least ones
which don't follow spec and are known to be exploitable) to force them to
download and cache the good copy. Bake it into your CI deployment process.
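A deploy-time priming step might be sketched like this. Hypothetical Python;
the URL list, retry policy, and how it hits each CDN edge are up to your
pipeline.

```python
from urllib.request import urlopen

def prime(urls, fetch=None):
    """Fetch every URL once so the CDN caches a fresh copy; return the
    URLs whose fetch failed so the deploy script can retry or abort.
    `fetch` is injectable for testing; by default it issues a real GET."""
    fetch = fetch or (lambda url: urlopen(url).status)
    failed = []
    for url in urls:
        try:
            if fetch(url) >= 400:
                failed.append(url)
        except OSError:  # DNS/connection errors count as failures too
            failed.append(url)
    return failed
```

Run against each CDN endpoint right after deployment, so the good copy is in
the cache before real users (or an attacker) can populate the entry.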

------
StreamBright
I usually do not let any headers through the CDN. I need to try this attack
out on some of our infra; not sure whether any of it is affected if you have a
tight header policy.

------
edoceo
The title is hyperbolic but the article is clear and well researched. The
issue seems to affect CloudFront more than other CDNs, going by the matrix
provided.

