
Security advisory: Breach and Django - Lightning
https://www.djangoproject.com/weblog/2013/aug/06/breach-and-django/
======
brokentone
Correct me if I'm wrong, but it appears as though Django isn't the only
framework/technology that is vulnerable to such an attack, they're just one of
the first to provide a mitigation strategy (resulting in this post).

~~~
steveklabnik
Any website that:

    
    
      * Is served from a server that uses HTTP-level compression
      * Reflects user input in HTTP response bodies
      * Reflects a secret (such as a CSRF token) in HTTP response bodies
    

is vulnerable, regardless of technology.

The mitigation strategies were given in the original paper[1]; this
announcement is just a repeat of what's in there. That said, it's exactly the
right thing to do, and that's not a knock on Django.

1:
[http://breachattack.com/#mitigations](http://breachattack.com/#mitigations)

~~~
homakov
> * Be served from a server that uses HTTP-level compression

this is the only must-have. The last two exist on almost every website
(everyone needs a CSRF token, and everyone reflects something somewhere)

~~~
jamoes
From my understanding (which may be wrong...), the requirement of "Reflect
user-input in HTTP response bodies" is actually pretty important. If the
application only does this on POST requests, then it should probably be fine.
Since an attacker cannot formulate a valid POST without the CSRF token
(assuming the app is using CSRF tokens correctly), there is no way for an
attacker to get this attack bootstrapped.

If the application reflects GET request input in the response (e.g.
[https://domain.tld?q=ASDF](https://domain.tld?q=ASDF) results in some
`value="ASDF"` being included somewhere in the response), then it is indeed
likely vulnerable. This allows the attacker to simply continually change the
value of `ASDF` as they guess and check for some secret on the page.

Of course, if your application is allowing untrusted POSTs to be made, then
you will still have to worry about POST requests...

~~~
homakov
you're right. Also, to make compression work, ASDF alone is not enough; the
attacker needs [https://domain.tld?q=value="ASDF](https://domain.tld?q=value="ASDF)

------
gojomo
Currently, the Django templating tag:

    
    
      {% csrf_token %}
    

...results in an insert like...

    
    
      <input type="hidden" name="csrfmiddlewaretoken" 
        value="566e4606b2094c7c48e5d04b58236f51">
    

I suspect that the particular mitigation strategy the BREACH authors describe
as "Randomizing secrets per request" could be implemented by having {%
csrf_token %} instead emit:

    
    
      <input type="hidden" name="random_data" 
        value="91178a84e0bc6e08a2fda853eef2d2c8">
      <input type="hidden" name="csrfmiddlewaretoken_xor" 
        value="e0b594e902c7fe6b1748d13aefaf63aa">
    

...where the random_data changes every response, the emitted
csrfmiddlewaretoken_xor is the real token XORed with the random_data, and upon
submission the server will again XOR the two values together to get the real
CSRF token.

There may be other secrets that need protection in other ways, and maybe this
would make any random-source issues more exploitable... but this would seem to
protect the CSRF token, in a cheap and minimal way.
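That XOR round-trip is only a few lines; here's a minimal sketch in Python (illustrative only -- not Django's actual implementation, and the helper names are made up):

```python
import os

def mask_token(csrf_token: bytes) -> tuple[bytes, bytes]:
    """For one response: return (random_data, token XOR random_data)."""
    random_data = os.urandom(len(csrf_token))
    masked = bytes(a ^ b for a, b in zip(csrf_token, random_data))
    return random_data, masked

def unmask_token(random_data: bytes, masked: bytes) -> bytes:
    """On form submission, XOR the two fields to recover the real token."""
    return bytes(a ^ b for a, b in zip(masked, random_data))
```

Each response emits a fresh (random_data, masked) pair, so the byte string that appears in the compressed body never repeats across responses, even though the underlying token is stable.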

UPDATE: Thinking further, though, maybe the attacker can probe for both values
at the same time, and thus determine the probability of certain pairs, and
thus this only slows the attack? I'd appreciate an expert opinion, as this was
the first mitigation that came to mind, and if it's wrong-headed I'd like to
bash my intuition into better shape with a clue-hammer.

~~~
homakov
Your UPDATE is right: the attacker can probe a..z two times and just choose
the letter that was compressed in both of them, ignoring random compressions

~~~
gojomo
Thanks, but can you clarify... does that mean probing (a..z)×(a..z) (one pair
per probe), so there's at least a giant increase in probing required per
character? And perhaps even more each character in, since probing for the Nth
character now requires (a..z)^(N-1) × (a..z)×(a..z) ?

(I'm guessing also, though, it may be possible to probabilistically probe
multiple ranges of the secret at once... in a process that seems vaguely
similar to forward-error-correction coding.)

------
homakov
Offtopic: this is a very simple mitigation for any website; it requires JS:
[https://gist.github.com/homakov/6147227](https://gist.github.com/homakov/6147227)

------
Erwin
So to be clear:

1) The attacker must be on the same network as you, or at least be able to
detect how large the compressed and encrypted replies are.

If you are on the same network, it seems to me there are far more MITM and
whatnot attacks that are more likely to succeed if you do not use HSTS (or
secure DNS, if that helps).

2) The attacker must be able to get your browser to rapidly generate many (how
many?) requests to the site. It takes "30 seconds" they claim, but is that at
a rate of 100 requests per second?

3) Each request must carry something that will be reflected by the body of
that particular page when it's rendered. I suppose it could be an error
message or search string that's echoed.

It seems to me that unless you generate a CSRF token unconditionally on every
page, the subset of pages that both reflect something with no protection (e.g.
search results) and have a protected form (e.g. change my email address to
XYZ) might be small.

4) The secret that can be extracted is what's in the reply body and not the
headers -- headers are not compressed, since the TLS compression is now
universally disabled post-CRIME.

Personally I use Referer header checking as well. IME all the browsers of my
users do send them. So if you extract the CSRF token, it's useless by itself
unless you also can make the browser send the right Referer header (and AFAIK,
all the holes such as Flash have been plugged).

Other than that -- it seems that if you are normally generating e.g. a 32 byte
CSRF key, you could interleave it with 32 bytes of good randomness per
request?

~~~
homakov
>Personally I use Referer header checking as well. IME all the browsers of my
users do send them. So if you extract the CSRF token, it's useless by itself
unless you also can make the browser send the right Referer header (and AFAIK,
all the holes such as Flash have been plugged).

do-not-use-referrer-as-csrf-protection.com

~~~
tptacek
I'm sad that doesn't appear to exist.

~~~
homakov
The parent said "as well", but I'm still insisting that the referrer must not
be either the whole or a part of CSRF protection.

~~~
tptacek
Sorry, I meant the specific domain you mentioned. :)

------
RyanZAG
Switch off all GZIP..? That feels very extreme; I'm sure there are better
workarounds than that one.

EDIT: The following workarounds should be very simple to implement and seem
like more viable alternatives for production:

    
    
      Length hiding (by adding random amount of bytes 
        to the responses)
      Rate-limiting the requests
    

Mitigations 6 and 7 taken from
[http://breachattack.com/](http://breachattack.com/)

~~~
tptacek
I'm sure everyone is going to come up with workarounds that re-enable
compression, but they'll be context-dependent and will involve code; in the
meantime, the attack is straightforward and viable. Think of disabling
compression as a stopgap.

~~~
illumen
Definitely think about it before just doing it though...

Disabling compression can break some apps, especially when they rely on huge
compression ratios for text (a 5-10x ratio is common for text-heavy responses
such as JSON). So it is not an app-agnostic workaround. For example, a 100 KB
JSON response can turn into a 1 MB response. The more data you have to send,
the more chance of error - especially on 3G/2G networks.

For many high end projects, just disabling compression without regard to
testing or having an idea of what the application is doing would get you fired
or taken to court.

Not only would this break apps, but it would also lose business in that there
is evidence from Amazon and others that every 100ms extra latency can cost 1%
in sales.

From SPDY whitepaper: "45 - 1142 ms in page load time simply due to header
compression". Remember that headers use the upload part of the link... which
means too many headers and you can saturate the upload, therefore making the
whole internet connection stall for everyone using it. Common upload limits
are only 5-10K/second, so excessive headers combined with many requests can
easily DOS many internet connections.

I spend a lot of time optimising websites for these reasons, and disabling
compression could add 20 seconds of load time for a good percentage of users.

So, for many apps, turning off compression is no solution at all. You might as
well just disconnect your app from the internet - that will also give you a
secure and broken app.

A proper risk, and impact analysis should be done first. Too often quick hot
fixes to security issues just break things or even make things less secure.

------
danso
A few days ago, Meldium's announcement of a Ruby gem that provides an
inexpensive partial protection (i.e. not disabling gzip) made it to the HN
front page:

[http://blog.meldium.com/home/2013/8/2/running-rails-
defend-y...](http://blog.meldium.com/home/2013/8/2/running-rails-defend-
yourself-against-breach)

The two protective measures are masking the Rails CSRF token and appending an
HTML comment to every HTML doc to slow down plaintext recovery. How easy is
this to include in a Django plugin?

~~~
mhurron
Is a partial workaround really better than a guaranteed workaround?

------
level09
This would cause a big problem for us. Our mobile web service serves around
3-4k concurrent requests on average; without compression our API would see a
300%-900% increase in delay.

Are there any alternatives? I'd like to know what CloudFlare would do, as
their CDN is based on compressed nginx responses.

~~~
pudquick
As gzip compression only applies to the content of the page, not the headers,
I would assume that prefixing your page with content that is variably
compressible and of varying lengths would throw a monkey wrench in the
attacks.

The compressed content of any part of a page very much depends on what came
before it. Altering the content to include a script comment block full of
random text and various common HTML and JavaScript elements (Markov chains
anyone?) would definitely change how a page is compressed.

If the compressed length of the replies varies significantly with every
request - even if the request content is identical - attacks like this can no
longer reveal hidden information.

Edit:

You could improve this significantly by including false-positive matches as
well. If your HTML content has csrf="45a7..." in it, you could hash that
content into enough material to generate 19 or so identical-looking code
blocks embedded in a script comment. You've now provided a 95% chance they
attack the wrong one / increased the number of attacks they'll need to try by
20x.

This method (minus the above part) would actually be cacheable by smart CDNs
like Cloudflare.
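A hedged sketch of that decoy idea (the function and parameter names are hypothetical; deriving the decoys from a hash keeps them deterministic, which is what makes the page cacheable):

```python
import hashlib

def decoy_tokens(real_token: str, page_secret: bytes, n: int = 19) -> list[str]:
    """Derive n decoy tokens with the same format as the real one.

    Deterministic (hash-based), so identical input content always yields
    identical decoys and the response stays cacheable.
    """
    width = len(real_token)
    decoys = []
    for i in range(n):
        digest = hashlib.sha256(page_secret + i.to_bytes(4, "big")).hexdigest()
        decoys.append(digest[:width])
    return decoys
```

Embedding the real csrf="..." among 19 identical-looking decoys means an attacker probing by prefix has 20 candidate matches rather than one.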

~~~
Daniel_Newby
Random padding can be averaged out. It increases the work factor of the
attack, but not by much.

------
z-factor
The attacker has to be able to issue requests on behalf of the user with
injected "canary" strings. I fail to see a practical exploit where one can do
this and wouldn't have access to the secret in the response anyway. What am I
missing?

~~~
tptacek
Does any GET or POST URI endpoint in your application accept parameters? Do
none of those parameters impact the output of the application? That set of
circumstances is extraordinarily common.

~~~
z-factor
The request has to be issued by the attacker from the victim's browser. If the
attacker can do that, why is he unable to read the response to that request?

Edit: I think I can see a scenario where a third-party website does these
requests via an <iframe> or an <img>. I'm not sure there's a way to do a POST
quite as easily.

~~~
tptacek
Do you understand how CSRF works? Just think of it in terms of CSRF. Since the
attacker is trying to infer page content, they don't care that the server
rejects all the probing requests, so CSRF protection doesn't help you as the
attacker carries out the BREACH/CRIME stuff. If the result of the attack is an
inferred CSRF token, they then cap the whole exploit off with a (now working)
actual CSRF attack.

~~~
z-factor
I understand how the attack works, the question was about how a practical
exploit would actually be carried out. I've figured out how one would issue
GET requests from the right environment, but I don't know if the same is
possible for POST.

~~~
wglb
It is just as possible. POST CSRF exploits add between two and three minutes
of work for an attacker to craft the request differently.

------
sehrope
How about having the CSRF token change with each request? If it's
encrypted/signed by the server for each request with a random IV then it would
be different in each request. It would be a bit more processing on the server
(decrypt vs just HMAC verify) but it would be completely different each time.
It seems kind of belt and suspenders as you're encrypting data within an
encrypted channel but I think it gets around this issue.

~~~
dvogel
If the CSRF token changes with each page view then opening a second page
(perhaps an explanation for a form field) in a new tab/window would invalidate
the form in the original tab/window.

~~~
sehrope
Not necessarily. The token can be used simply to verify that the request came
from a legit page and not a cross-site request. The encrypted CSRF token need
only be verified by the server to see that it's not expired. The server can
store the expiration in the CSRF token itself (encrypted and signed); it does
not need to maintain a list of issued CSRF tokens.

I wrote about this a little while back. Comments are here:
[https://news.ycombinator.com/item?id=5971464](https://news.ycombinator.com/item?id=5971464)

~~~
dvogel
Wouldn't the lifetime of the token have to be <30 seconds, according to the
claims made in the paper?

~~~
sehrope
I don't think so. Here's the snippet from the linked PDF[1]:

> DEFLATE [2] (the basis for gzip) takes advantage of repeated strings to
> shrink the compressed payload, an attacker can use the reflected URL
> parameter to guess the secret one character at a time.

By encrypting the CSRF token (or any other "secret" data you want to roundtrip
from server to client and back) with a random IV per request this wouldn't
work. The value sent by the client would not be the same as the new token
generated by the server (since each has a random IV). Even though the
decrypted value of each token is the same, the values presented to the client
in the response body are each different and not predictable (to the client).

[1]:
[http://breachattack.com/resources/BREACH%20-%20SSL,%20gone%2...](http://breachattack.com/resources/BREACH%20-%20SSL,%20gone%20in%2030%20seconds.pdf)
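As an illustration of the property sehrope describes -- the same plaintext encrypting to a different ciphertext on every request -- here is a standard-library-only sketch, with a hash-derived keystream and an HMAC tag standing in for a real cipher such as AES-GCM (a toy for demonstration, not production crypto):

```python
import hashlib
import hmac
import os

def encrypt_token(key: bytes, token: bytes) -> bytes:
    """Randomized encryption: a fresh IV makes every ciphertext unique.

    Tokens up to 32 bytes only (one SHA-256 block of keystream).
    """
    iv = os.urandom(16)
    keystream = hashlib.sha256(key + iv).digest()
    ct = bytes(a ^ b for a, b in zip(token, keystream))
    tag = hmac.new(key, iv + ct, hashlib.sha256).digest()
    return iv + ct + tag

def decrypt_token(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then recover the original token."""
    iv, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, iv + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("invalid or tampered token")
    keystream = hashlib.sha256(key + iv).digest()
    return bytes(a ^ b for a, b in zip(ct, keystream))
```

Because the IV is random per request, the bytes an attacker sees in the response body never repeat, even though every blob decrypts to the same underlying token.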

------
pquerna
Has anyone looked at mitigating the attack by changing the behavior of chunked
transfer encoding?

Chunked Transfer encoding is basically padding that a server can easily
control, without having to change content or behavior of a backend
application. A web server could easily insert an order of magnitude more
chunks, and randomly place them in the response stream.

~~~
donaldstufft
I'm not sure I fully understand the proposed fix here, how does it differ from
the application simply including random chunks of data inside the response?

This area of things isn't my strong suit, but assuming that this is analogous
to just adding random data to the response, I _believe_ that simply adding
random data can be worked around by making more requests and using statistics
to factor out the noise introduced.

If my understanding is wrong then excuse me :)

~~~
lnanek2
I have seen random workarounds at the app level as well, where the app adds a
random-length HTML comment at the end of the page.

But if randomness can be statistically removed, then they shouldn't add a
random amount. Maybe just track the max size of the returned response and
always add enough to reach that max size; then the lengths of all the pages
will always be the same. This is still better than turning off compression
completely. A typical max for a detail page in an app might just be the size
of the page plus 256 bytes per app output field.

~~~
pquerna
You can basically add as many or as few bytes as you like, by abusing the
chunk-extension in chunked encoding:

[http://tools.ietf.org/html/rfc2616#section-3.6.1](http://tools.ietf.org/html/rfc2616#section-3.6.1)

So you could make all HTTP responses round up to 128-byte chunks by appending
1 to 128 bytes at the end of every response.

Effectively it gives you length hiding at the HTTP layer; still attackable.
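The rounding arithmetic is trivial to sketch (a toy helper, not taken from any real server):

```python
def padding_needed(body_len: int, bucket: int = 128) -> int:
    """Bytes of chunk-extension padding to round a response length up to
    the next multiple of `bucket`.

    Always returns 1..bucket, so even exact multiples get padded and the
    observed length never equals the true length.
    """
    return bucket - (body_len % bucket)
```

With this scheme an observer only ever learns the response length to the nearest 128-byte bucket, which raises the cost of the attack without touching the application or its compression settings.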

------
tomp
So, it seems that even if I encrypt everything, a lot of information is still
present in the _size_ of the encrypted message; in the case of VOIP, it's
possible to guess speech that is being transferred over an encrypted
transport, and in the case of text, it's possible to figure out secrets if
the attacker can modify an equally-sized part of the message.

Is there any general way of preventing this kind of attack? Inserting random
data could work, but its distribution would have to be exactly right for the
attack to be impossible over longer periods of time. For the BREACH case, we
could solve it by not compressing user input, but what about the VOIP case?

Also, why does the site [http://breachattack.com/](http://breachattack.com/)
say that "Randomizing secrets per request" is less effective than disabling
compression?

~~~
sdevlin
> Is there any general way of preventing this kind of attacks?

Disabling compression is a 100%-effective countermeasure for compression
oracle attacks.

> Also, why does the site [http://breachattack.com/](http://breachattack.com/)
> says that "Randomizing secrets per request" is less effective than disabling
> compression?

Putting random data in the server response will only slow down the attack.
With enough requests, the noise from that random data will wash out.

Disabling compression will stop the attack cold. The whole thing is predicated
on analyzing the size of the compressed text. No compression, no compression
oracle.

~~~
STRML
How is it that random data would only slow it down? If I add a random field to
my response of variable length, say, from 0 to 50, with completely random
characters, it should completely throw off this attack. The length of the
output will change from request to request.

I suppose, given infinite time, you could send the same request over & over
and map the variance of content lengths, and get an idea of what the actual
content length was before random padding? But the compression seems to throw
that off even more AFAIK - because the data we pad with is random, it could
very well accidentally compress well because of the rest of the data in the
response, further throwing off any guesses.

Edit: From the pdf on breachattack.com:

    
    
      While this measure does make the attack take longer, it does so only slightly.
      The countermeasure requires the attacker to issue more requests, and measure the
      sizes of more responses, but not enough to make the attack infeasible. By repeating
      requests and averaging the sizes of the corresponding responses, the attacker can
      quickly learn the true length of the cipher text. This essentially boils down to the
      fact that the standard error of the mean in this case is inversely proportional to
      √N, where N is the number of repeat requests the attacker makes for each guess.

~~~
thatthatis
I think it is still discoverable, because adding random data leads to the
following two distributions:

Attacker gets the secret wrong: page size = original page + (zero to fifty) +
length of the incorrect secret.

Attacker gets the secret right: page size = original page + (zero to fifty).

With a sufficiently high number of observations, the attack with the right
secret has a mean value that is lower than the attack with the incorrect
secret.

At least that's how it appears to me. I could be wrong.
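That intuition is easy to check with a toy simulation (all the numbers here -- base size, padding range, penalty -- are made up for illustration):

```python
import random
from statistics import mean

random.seed(42)  # deterministic run for the example

BASE = 500     # hypothetical page size in bytes
PENALTY = 6    # extra bytes when a wrong guess fails to compress away
N = 5000       # observations per guess

def observe(correct_guess: bool) -> int:
    """Observed response size: uniform 0-50 byte padding noise, plus a
    small penalty when the guessed secret doesn't match."""
    pad = random.randint(0, 50)
    return BASE + pad + (0 if correct_guess else PENALTY)

right = [observe(True) for _ in range(N)]
wrong = [observe(False) for _ in range(N)]
# Averaging washes out the 0-50 byte noise: the two means differ by
# roughly PENALTY bytes, exposing which guess was correct.
```

This is exactly the √N averaging the paper describes: the noise shrinks as observations accumulate while the signal (the penalty) stays fixed.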

------
cschmidt
I'm sure it will come, but I'd appreciate a layman's terms explanation of
this. What is the threat, and how do you go about fixing things in Django?

~~~
IvyMike
Imagine you're going to send a compressed and encrypted message to a friend,
and I (the attacker) can do two things:

1) Append a bit to the message before it is compressed and encrypted.

2) See the size of the final message.

So I start by appending the string "4179174b19e0cdc91bf4" to your plaintext
message. I see the final encrypted message size is 500 bytes.

Then, I redo the experiment, but this time, I append the string
"cschmidt@example.com" to the message. The final encrypted message size is now
480 bytes. The string I injected was the same size, but the compression worked
better this time, and I can guess it's because the string I picked is
redundant with something in your plaintext.

Mix in a bunch of complicated math and a bit of javascript, and you've got an
exploit.

This threat isn't specific to Django: it's being billed as a TLS attack, but
any encryption system that uses compression the same way is vulnerable.
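The thought experiment above is easy to reproduce with zlib (DEFLATE, the algorithm behind gzip); the page content and guesses below are invented:

```python
import zlib

# A page that contains a secret and also reflects attacker-supplied input.
secret = b'token="4179174b19e0cdc91bf4"'
page = (b"<html><body>Welcome back! " + secret +
        b" <p>Results for: %s</p></body></html>")

def response_size(reflected: bytes) -> int:
    """Compressed size of the page with the attacker's input reflected."""
    return len(zlib.compress(page % reflected))

right = response_size(b'token="4179174b19e0cdc91bf4"')  # matches the secret
wrong = response_size(b'token="cc01d9b2e84fa73b5d06"')  # same length, no match
# The matching guess is encoded as a back-reference to the earlier secret,
# so the compressed response is measurably smaller.
```

Both injected strings are the same length; only compression makes their response sizes differ, and that size difference is exactly what leaks through TLS.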

------
softbuilder
This attack works very much like the game Mastermind.
[http://en.wikipedia.org/wiki/Mastermind_(board_game)](http://en.wikipedia.org/wiki/Mastermind_\(board_game\))

~~~
chopin
That's true for almost all side channel attacks using user controlled input
(eg. padding oracle attack).

------
STRML
Could somebody help me understand how this attack would be viable?

It seems like the attack has the following requirements:

    
    
      1. You want a secret that appears in the response body, like a 
         CSRF token.
    
      2. The web server always responds with the exact same response 
         for a request.
    
      3. The response body contains data that you send to the server, 
         e.g. url params.
    
      4. The attacker has access to an environment where he can send requests 
         under your browser session (otherwise, the user would be
         unauthenticated and there would be no secrets to steal).
    

Given (4.), how is this a real concern? If I, an attacker, am able to make
3000+ requests while logged in under your session and modify the request
character by character pre-encryption, doesn't it logically follow that I have
your cookies anyway?

~~~
Erwin
The #4 is not that difficult without compromising the user's browser -- as
long as the user can visit a site under your command, or see some HTML under
your command, you can make the browser do a HTTPS request to anywhere at all.

Maybe you buy some targeted ads served in an iframe. Maybe you send the user
an email where his email client either always shows images, or you trick the
user into clicking 'display images' with the promise of kittens.

You won't be able to see the results directly, but if you can observe how long
the encrypted responses will be, you'll know whether your reflected input
could make use of the compression dictionary (meaning your reflected input
matches the secret) or not.

I wonder if there is any way to even do this without the passive network
snooping -- like some kind of internal browser stats API call that tells you #
of HTTPS bytes transferred. It could be innocent enough so it's not protected.

------
e12e
Looking at
[https://github.com/django/django/blob/ffcf24c9ce781a7c194ed8...](https://github.com/django/django/blob/ffcf24c9ce781a7c194ed8722b850e7873922f6b/django/middleware/csrf.py)
I'm a little confused about how the csrf-token is generally used in Django --
but if I understand the code correctly, it looks for a cookie with the
csrf_token, and compares that to a POSTed value (or x-header in case of an
Ajax request).

If the system has a decent random implementation there is no secret involved,
just a (pseudo)random string -- essentially a csrf cookie is given to the
client on one request, and compared on the next request(s).

Is there any reason one couldn't simply use the rotate_token()-function on
every (n) request(s)?

------
sbov
Just to make sure I understand this correctly: is this only a security issue
if you include sensitive information on a page by default?

For instance, if you had a search field, the contents of what a user puts in
that search field will not be compromised. However, if you include a csrf
token with the search field form, that can be compromised, since it will be
there every time the attacker gets the victim to make a request.

------
lpomfrey
I've knocked up a package that provides CSRF token masking and length
modification that may help mitigate this. If anyone wants to vet it and submit
pull requests, you're more than welcome. [https://github.com/lpomfrey/django-
debreach](https://github.com/lpomfrey/django-debreach)

------
homakov
im a rabbit

~~~
jacobian
To your second point, security@djangoproject.com. It's documented in a bunch
of places; where'd you look for it? I'll add it there, too :)

To the first point, we believe that Django's CSRF protection is as strong as
session-linked CSRF protection, and adds CSRF protection to anonymous users
(users without a session as well). In other words, it's a design decision, one
that we believe doesn't compromise CSRF protection. If you believe otherwise,
please get in touch (see above).

~~~
homakov
1) A session and a logged in/out user are two different things. A session is
the way you store information about the current user, no matter whether they
are anonymous or logged in.

2) I checked again, for instance on
[https://bitbucket.org/](https://bitbucket.org/) \- edit the csrftoken cookie
to anything, 123123 for example. Reload the page, and if the site keeps
working, then Cookie Forcing with MITM can do the same thing, using an http:
page to inject Set-Cookie.

And not only MITM: subdomains can do precisely the same thing. Either
Bitbucket uses an old Django or Django is vulnerable to it (which is, well, a
severe vulnerability imo)

~~~
jacobian
1) I know the difference between sessions and logging in. I didn't say
anything about logging in; I said that our CSRF protection protects users
_without sessions_. Not all sites use sessions (some for performance reasons,
others for privacy reasons); must those sites be vulnerable to CSRF?

2) First, you should report this to Bitbucket:
[https://www.atlassian.com/security](https://www.atlassian.com/security). And
c'mon, disclosing a possible CSRF vulnerability on a public board is kinda
irresponsible. Is responsible disclosure not something you practice?

Second, I don't know what Bitbucket is running, exactly, and extrapolating
from Bitbucket to Django is pretty lazy. Frameworks != sites. Once again,
we've spent quite a bit of time validating the design and implementation of
Django's CSRF protection, and we believe it works. If you find proof
otherwise, can you _please_ send it to security@djangoproject.com, and not
post it to Hacker News?

~~~
homakov
1) Only to make sure we are on the same page. Now I see -- we have different
understandings of "session".

>Not all sites use sessions (some for performance reasons, others for privacy
reasons);

what kind of site doesn't use sessions? To track a user you need a cookie
right?

2) Frameworks != sites. As I used to think, only the framework is responsible
for CSRF protection, hence I extrapolated. I sent it to security@ as soon as
I found the email. I am trying not to proclaim anything, but some websites
from [http://www.djangosites.org/](http://www.djangosites.org/) are vulnerable.

------
dangayle
Disable compression altogether? That's craptastic.

