
New attack plucks secrets from HTTPS-protected pages - jamescun
http://arstechnica.com/security/2013/08/gone-in-30-seconds-new-attack-plucks-secrets-from-https-protected-pages/
======
AnIrishDuck
So if I understand correctly this attacks HTTP compression instead of TLS
compression - which is what makes it different from the previous CRIME attack.

This means there should be several important practical differences:

1) Things like cookies and other authorization headers are safe. They are not
compressed according to the standard and cannot be sniffed in this attack.

2) Components of the URI are also safe.

However, it seems like anyone that uses form-encoded CSRF tokens might be in
for a bad time. In fact, any guessable data in a request body is not safe
(phone numbers, credit card numbers, SSNs, email addresses, etc.). Also,
I'm not sure what the implications are for SPDY/HTTP 2.0, which IIRC compress
headers by default as well.

~~~
cryptbe
Disclaimer: CRIME co-author.

It's the same attack. CRIME works for TLS compression, SPDY header compression
(yes, it broke HTTP/2.0 even before it was called HTTP/2.0), and HTTP gzip
responses. BREACH was described on slide 39 of our presentation. We didn't test
it because it's an application-dependent attack, and we couldn't find a nice
target. We're happy that BREACH authors have found a very good target, and
proved that the attack works.

See:
[https://docs.google.com/presentation/d/11eBmGiHbYcHR9gL5nDyZ...](https://docs.google.com/presentation/d/11eBmGiHbYcHR9gL5nDyZChu_-lCa2GizeuOfaLU2HOU/edit)

~~~
tptacek
Using that comment as a springboard:

CRIME is one of The Great crypto bug classes to be found in the last 10 years
or so; it's a side channel leak based not on timing, error handling, or power
consumption, but on pure traffic analysis. Seriously excellent work.

Importantly, and I think this is a point Juliano and Thai undersold, the big
issue with CRIME wasn't simply that it impacted TLS, but that it implicated an
entire cryptographic implementation technique (one that I'd add Applied
Cryptography _recommends_). The immediate feeling I got when I learned about
CRIME was "this is going to take out a lot of systems"; it's a serious threat
anywhere you have chosen plaintext and compression.

It's not surprising that there are other scenarios in TLS where CRIME works,
but the big thing to learn from today's talk is that you should go out and
look for other places that compress before encrypting.

~~~
AnIrishDuck
Are you aware of any research into compression schemes that address these kinds
of length-leak attacks?

Intuitively, I'm not even sure such a thing is possible.

~~~
Scaevolus
It's possible, but it would require the compressor to be aware of locations of
'secret' data in its input.

More details:

The very common compression algorithm used by TLS and gzip is DEFLATE: an LZ77
transformation combined with Huffman coding. LZ77 turns a sequence of text
into a sequence of literal instructions (output "abcd"), and copy instructions
(output 5 bytes from 10 bytes back). Huffman coding lets you encode an
alphabet with varying token probabilities with different length strings of
bits-- think Morse code, where a common letter like E is one dot, but an
uncommon letter like Z is dash-dash-dot-dot.

CRIME detects the different ciphertext sizes that result when encoding a
secret as literal data versus as a copy instruction.
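
A minimal sketch of that size difference, using Python's standard `zlib` module (which implements DEFLATE). The page template, the secret token and both guesses are made up for illustration; a real attacker would only see ciphertext lengths, which track the compressed lengths.

```python
import zlib

SECRET = "4f9a1c7e2b8d3056"  # hypothetical token embedded in the page


def compressed_len(reflected: str) -> int:
    """Length of the DEFLATE-compressed page when `reflected` is echoed back."""
    page = f"<p>You searched for {reflected}</p><input name=csrf value={SECRET}>"
    return len(zlib.compress(page.encode()))


# Equal to SECRET: the second occurrence can become a copy instruction.
right = compressed_len("4f9a1c7e2b8d3056")
# Same length, nothing in common with SECRET: both strings stay literal data.
wrong = compressed_len("b83e60d5a7f2c149")

print(right, wrong)
```

Both pages have the same uncompressed length, so any difference in the printed sizes comes from the compressor alone.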

There are a few tweaks that could be effective. Assuming the secret data is
identified, the compressor can always emit it as part of a literal: never
replaced by a copy instruction, and never used as reference data for any
future copy instructions.

This could still leak some data, since a secret with more common letters might
be Huffman coded to a shorter bitstring and detected. Luckily, DEFLATE has a
way to indicate a "raw" block, which is just the literal data: 8 bytes of
secret data would always take 8 + 5 (block header length) bytes to encode, and
wouldn't be referenced by any future copy instructions.
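
A rough illustration of the "raw" block idea with Python's `zlib`: level 0 makes the encoder emit stored blocks, and `wbits=-15` selects a raw DEFLATE stream with no zlib header or checksum. The helper name `store_raw` and the sample secrets are assumptions for this sketch; it only shows the constant overhead, not how a web server would mark secret ranges.

```python
import zlib


def store_raw(secret: bytes) -> bytes:
    """Encode `secret` as a stored (uncompressed) raw-DEFLATE stream."""
    # level=0 disables compression; wbits=-15 means raw DEFLATE,
    # without the zlib header and Adler-32 trailer.
    enc = zlib.compressobj(0, zlib.DEFLATED, -15)
    return enc.compress(secret) + enc.flush()


a = store_raw(b"aaaaaaaa")  # highly repetitive secret
b = store_raw(b"q7#Zk0!x")  # random-looking secret

print(len(a), len(b))
print(zlib.decompress(a, -15), zlib.decompress(b, -15))
```

Both lengths should equal the input length plus the 5-byte stored-block overhead, regardless of how compressible the secret is.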

How should a web app indicate to potential upstream compressors that some
portion of its output is secret? A special HTML tag? A new HTTP Secret-Ranges
header?

~~~
AnIrishDuck
Sure, you can manually isolate secret information from attacker information.
That wasn't really what I was looking for.

I was referring to a general compression scheme that could be applied pre-
encryption and not leak info when combined with partial plaintext oracle
attacks. I don't think such a thing is possible, but it would be awesome if
some really smart researchers could prove me wrong.

~~~
Scaevolus
Most compressors are adaptive: they adjust their state depending on previous
inputs. A static compressor wouldn't leak anything-- for HTML, it might have
preset dictionaries for common tags and attributes (instead of <div>, output a
single token), and have a static Huffman encoding tuned to "average" HTML
pages. The compression wouldn't be nearly as good as DEFLATE since it has to
compress each chunk in isolation, but it would still beat plaintext.
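
A toy version of such a static scheme, as a sketch only: the token table below is invented, the static Huffman stage is left out, and `static_compress` / `compress_page` are hypothetical names. The point is that each chunk's output length depends only on that chunk, so an attacker-controlled chunk cannot change how the secret chunk is encoded.

```python
# Hypothetical fixed token table, agreed on in advance and never updated.
STATIC_TOKENS = {
    "<div>": "\x01",
    "</div>": "\x02",
    "<span>": "\x03",
    "</span>": "\x04",
}


def static_compress(chunk: str) -> str:
    """Replace known tags with one-byte tokens; no state survives the call."""
    for tag, token in STATIC_TOKENS.items():
        chunk = chunk.replace(tag, token)
    return chunk


def compress_page(chunks: list[str]) -> str:
    """Compress each chunk in isolation and concatenate the results."""
    return "".join(static_compress(c) for c in chunks)


secret_chunk = "<span>token=4f9a1c7e</span>"
sizes = []
for guess in ("token=4f9a1c7e", "token=zzzzzzzz"):
    page = compress_page([f"<div>{guess}</div>", secret_chunk])
    sizes.append(len(page))

print(sizes)
```

A right guess and a wrong guess of the same length give the same total size here, because nothing in the output is a reference to earlier input.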

------
tomfitz
My understanding of the attack:

Suppose the target web server has an endpoint /foo?probeMe=bar such that the
HTTPS response will include 'bar' in the HTML. (Quite an assumption, sure.)

Suppose the target web server compresses its responses.

Suppose the attacker can make requests to the target web server, on behalf of
the target user (e.g. when the target user is on an attacker-controlled
webpage, and the attacker can make AJAX requests to the target web server).

In the case that the HTTP response already contains 'bar', and doesn't contain
'cbs', then an HTTP response to /foo?probeMe=bar will be shorter than an HTTP
response to /foo?probeMe=cbs , since compression will mean 'bar' is
deduplicated.

Using this, the attacker is able to mount an oracle attack. That is, if they
know something of the form *@gmail.com , and they want to know the whole email
address, they can make 26 probes, with probeMe set to: a@gmail.com,
b@gmail.com, ..., z@gmail.com

and whichever probe produces the shortest response is a string that already
appears in the response.

Suppose the shortest is the probe for probeMe=y@gmail.com . They try another
letter: ay@gmail.com, by@gmail.com, ..., zy@gmail.com . Again, one probe will
have a shorter response than the rest.

They continue, until they find larry@gmail.com .

Now they know larry@gmail.com appears in the response. Success!
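
The probing loop above can be simulated in a few lines. This is a sketch under loud assumptions: the page template is invented, and `toy_lz77_cost` is a simplified LZ77 cost model (1 unit per literal, 2 per back-reference), not real DEFLATE. Real compressed sizes are noisier, since Huffman coding and byte rounding can hide a one-character difference.

```python
import string

SECRET = "larry@gmail.com"  # what the attacker wants; used only to build the page


def page(probe: str) -> str:
    """Hypothetical response that echoes the probe and also contains the secret."""
    return f"<p>Results for {probe}</p><p>Signed in as {SECRET}</p>"


def toy_lz77_cost(data: str, min_match: int = 3) -> int:
    """Toy size model: 1 unit per literal, 2 units per back-reference."""
    cost, i = 0, 0
    while i < len(data):
        best = 0
        for j in range(i):  # longest match against any earlier position
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            best = max(best, k)
        if best >= min_match:
            cost, i = cost + 2, i + best
        else:
            cost, i = cost + 1, i + 1
    return cost


def recover(known: str) -> str:
    """Extend `known` one character to the left while one guess stands out."""
    while True:
        costs = {c: toy_lz77_cost(page(c + known)) for c in string.ascii_lowercase}
        shortest = min(costs.values())
        winners = [c for c, cost in costs.items() if cost == shortest]
        if len(winners) != 1:  # every guess ties: nothing left to learn
            return known
        known = winners[0] + known


print(recover("@gmail.com"))
```

The attacker code never reads `SECRET` directly; it only compares sizes, which is all a network observer gets from the encrypted responses.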

~~~
tptacek
For what it's worth, it's not "quite an assumption"; it's an almost universal
property of web applications of any significant size.

~~~
tomfitz
You're right. My mistake. Search functionality on sites will often exhibit
this. Maybe even stylised 404 pages would too.

------
pygy_
I don't know if the browsers would tolerate this, but it should be possible to
pad the compressed payload, rather than the source document.

A DEFLATE stream [0] is made of blocks whose first bit determines whether it
is the last one.

One could add a number of random bytes after the last block such that the
payload always has the same length, or is a multiple of a reasonably large
number of bytes.

This assumes that the decoder will ignore whatever follows the last block.

Another, cleaner option: Use a Trailer HTTP header, using Chunked transfer
encoding[1].

The technique should also work with gzip.

--

[0]
[http://en.wikipedia.org/wiki/DEFLATE#Stream_format](http://en.wikipedia.org/wiki/DEFLATE#Stream_format),
[http://www.ietf.org/rfc/rfc1951.txt](http://www.ietf.org/rfc/rfc1951.txt)

[1]
[https://en.wikipedia.org/wiki/Chunked_transfer_encoding](https://en.wikipedia.org/wiki/Chunked_transfer_encoding)

~~~
FZeroX
It's not a sure fix, though: there will be occasions where you can change the
page length enough to hit the padding boundary.
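
A small sketch of the boundary problem, assuming padding to a multiple of 256 bytes (the block size and the helper name `padded_len` are arbitrary choices for illustration):

```python
def padded_len(compressed_len: int, block: int = 256) -> int:
    """Round a length up to the next multiple of `block`."""
    return -(-compressed_len // block) * block


# Lengths inside one block are indistinguishable after padding...
print(padded_len(200), padded_len(201))  # 256 256
# ...but an attacker who inflates the page until it sits on the boundary
# turns a one-byte difference into a whole block.
print(padded_len(256), padded_len(257))  # 256 512
```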

~~~
pygy_
Indeed. You could do what CRIME does to TLS (I just discovered these attacks).

You could also add a random amount of random padding. It would slow down the
attack linearly if the random amount is taken from a uniform distribution.

I wonder if it would be possible to make it slower by taking another
distribution.

------
peterwwillis
This isn't going to be useful to malware writers. People who can _not_ install
malware on a machine may be able to use this to extract sensitive information.

The hackiest fix for this is to randomize the output of your content, or
employ random padding. If the compression payload size changes every time
(_and_ if you can no longer assume the structure of the payload) you can't
effectively determine if a guess was right or wrong.

~~~
JshWright
Random padding increases the number of requests necessary by a couple orders
of magnitude. It would be easy enough to filter out the noise if you tried
each guess enough times.

It would certainly increase the number of requests necessary to a quantity
that (for the vast majority of cases) would be well above any sane usage, and
could easily be rate limited.

Credit where it's due, StavrosK and I were discussing this earlier, and
averaging out the noise was his idea.
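
A quick simulation of the averaging idea. The numbers are assumptions: responses of 1000 vs. 1001 bytes, up to 31 bytes of uniform random padding, and 20,000 requests per guess.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def observed_len(true_len: int, max_pad: int = 31) -> int:
    """Response length with 0..max_pad bytes of uniform random padding."""
    return true_len + rng.randint(0, max_pad)


def average_len(true_len: int, trials: int) -> float:
    """Mean observed length over `trials` repeated requests."""
    return sum(observed_len(true_len) for _ in range(trials)) / trials


# Hypothetical lengths: the correct guess compresses one byte better.
correct = average_len(1000, trials=20_000)
wrong = average_len(1001, trials=20_000)
print(round(correct, 2), round(wrong, 2))
```

With enough samples the two averages settle about one byte apart, even though any single response is off by up to 31 bytes. That is why padding raises the request count rather than stopping the attack.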

~~~
phlo
What about deterministic padding? As a (badly thought through) example: Hash
the plaintext, then use the first couple of bits of that result as your
padding size.

This should counteract averaging of requests. On the other hand, an attacker
might work around this by adding a unique token in addition to their guess...
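
A sketch of that idea and of the workaround, assuming SHA-256 and a padding range of 0..31 bytes (both arbitrary choices; `pad_len` is a hypothetical helper):

```python
import hashlib


def pad_len(plaintext: bytes, max_pad: int = 32) -> int:
    """Derive a padding size from a hash of the plaintext itself."""
    return hashlib.sha256(plaintext).digest()[0] % max_pad


page = b"<p>Results for y@gmail.com</p>"

# The same response always gets the same padding, so repeating a request
# gives the attacker no new samples to average.
print(pad_len(page) == pad_len(page))

# Appending a throwaway nonce changes the plaintext and therefore the hash,
# which hands the attacker fresh padding sizes to average over again.
pads = {pad_len(page + str(n).encode()) for n in range(100)}
print(sorted(pads))
```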

------
StavrosK
Did anyone catch how they can insert things in your plaintext so it passes
through the compression algorithm? It sounds like it can only guess things
that are in the POST request to the server, only if they can write to it but
cannot read it.

~~~
seldo
I was confused about that too. They don't make the request, they force you to
make the request (by putting a bunch of them as pixels in an HTML email, or by
getting you to trigger some JS). I'm still not clear on how they see the
traffic -- I guess the attack relies on having a packet sniffer that can see
the encrypted packets go by?

~~~
StavrosK
No, I know that part, but I mean that this method can only sniff headers,
pretty much, as long as they're being compressed. They own the body, since
they're injecting data into it, the only thing they don't own is the headers,
which is the only thing they can guess. If HTTPS compresses headers
separately, or not at all, it's useless.

~~~
AnIrishDuck
This attack is on HTTP compression, which covers only the body of the request,
not the headers [1].

1.
[http://en.wikipedia.org/wiki/Http_compression](http://en.wikipedia.org/wiki/Http_compression)

~~~
StavrosK
Yeah, so you'll have to be able to write to, but not read, the body of the
request, observe the ciphertext on the wire and guess things in the body only.
So this is only good for CSRF tokens and the like, and only if you can write
to the plaintext you want to guess but not just outright read it.

------
eli
Cute attack -- takes advantage of the deflate compression shrinking the size
of repetitive strings as a way to test if a given set of bytes is in the
encrypted data.

Has nothing to do with email addresses. That was just an illustration.

~~~
tracker1
I think it is definitely interesting, if you can test for a string for encoded
data, either form encoded `&key={GOLD}&nextkey=` or json encoded
`"key":"{GOLD}","nextkey"` you can get to almost anything.

------
luizg
There's a copy of the PDF for the BREACH attack at
[https://kyhwana.org/US-13-Prado-SSL-Gone-in-30-seconds-A-BRE...](https://kyhwana.org/US-13-Prado-SSL-Gone-in-30-seconds-A-BREACH-beyond-CRIME-WP.pdf)

