
Bypassing CSP using polyglot JPEGs - inian
http://blog.portswigger.net/2016/12/bypassing-csp-using-polyglot-jpegs.html
======
ubernostrum
While this is a neat technical trick, it does appear to require that an
attacker also be able to inject a script tag with a specific source of the
attacker's choice. Which feels like cheating, kind of; for a web application
that's pretty much the equivalent of "well, assume you already have root on
the victim's machine, once you have that you can do THIS".

~~~
xnyhps
That's the point, it's about bypassing Content-Security-Policy, so you must
already have something like XSS. It's more like "well, assume you have code
execution inside a sandbox, then you can break out using THIS".

~~~
ubernostrum
It still doesn't work, though. Taking the full scenario into account, it's:

1. If the site has a CSP allowing JavaScript from "self", and

2. If the site has an upload feature hosted on the domain of "self", and

3. If the site gives someone the ability to inject a script tag pointing to
an arbitrary target,

4. Then it's exploitable.

But conditions 1-3 effectively _are_ the "you've already been rooted" case.
This is a technically-interesting way to exploit the fact, but in itself is
not the vulnerability; it's simply exploiting a vulnerability that was already
there and wide open.

~~~
inian
CSP is supposed to protect the website as a defence-in-depth measure when
there is an XSS in it, so a CSP bypass is when it fails to do so. #1 - this is
true for any site using CSP with JavaScript. #3 - the site is not giving
anyone the ability to inject the script tag; it is just an XSS vulnerability
which should have been caught by CSP, but in this case isn't.

------
chias
Serious question: do any "real" web applications which allow you to upload
images not re-encode the image before saving it to disk? I thought this was
industry standard.

There's a whole host of issues associated with not doing this, including
potentially unwanted exif data, and e.g. just cat'ing a jpeg with a rar file
and using the image host as an arbitrary file host, etc.
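The concatenation trick is easy to demonstrate with Python's standard library
(the "JPEG" below is just a stand-in made of the SOI/EOI markers; a ZIP is
used in place of a RAR because the stdlib can read it, but the principle is
the same):

```python
import io
import zipfile

# Minimal stand-in for a real JPEG: SOI marker + EOI marker.
fake_jpeg = bytes([0xFF, 0xD8, 0xFF, 0xD9])

# Build a small archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("payload.txt", "smuggled data")

# "cat" the two files together, exactly as described above.
polyglot = fake_jpeg + buf.getvalue()

# The result still starts with the JPEG magic bytes...
assert polyglot[:2] == b"\xff\xd8"

# ...yet an archive reader, which locates the ZIP directory by scanning
# from the *end* of the file, finds the payload just fine.
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    assert zf.read("payload.txt") == b"smuggled data"
```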

~~~
xnyhps
Re-encoding is not enough to prevent attackers from constructing polyglots,
see [https://www.idontplaydarts.com/2012/06/encoding-web-shells-in-png-idat-chunks/](https://www.idontplaydarts.com/2012/06/encoding-web-shells-in-png-idat-chunks/)

> Placing shells in IDAT chunks has some big advantages and should bypass most
> data validation techniques where applications resize or re-encode uploaded
> images. You can even upload the above payloads as GIFs or JPEGs etc. as long
> as the final image is saved as a PNG.

An attacker will likely be able to figure out the exact re-encoding you apply,
so unless you add some form of randomness, the attacker can work backwards to
get the payload they want included.
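The linked IDAT technique leans on DEFLATE being fully deterministic (and, in
the degenerate case, transparent). A toy illustration with Python's zlib -
not the actual attack, which has to craft pixel values so the payload
survives PNG filtering as well:

```python
import zlib

# A payload the attacker wants to survive "re-encoding".
payload = b"<script>alert(1)</script>"

# At compression level 0, DEFLATE emits "stored" blocks: the input bytes
# appear verbatim inside the compressed stream.
compressed = zlib.compress(payload, 0)
assert payload in compressed

# Compression is also deterministic for a given level/strategy, so an
# attacker who knows the re-encoder's settings can work backwards from
# the desired output bytes to the input that produces them.
assert zlib.compress(payload, 0) == compressed
```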

------
woliveirajr
Well, nowadays you just can't trust a file by its extension, and you
shouldn't trust its magic header either.
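For example, a naive magic-byte check (hypothetical checker, sketched here
for illustration) happily accepts a file that is simultaneously something
else entirely:

```python
def looks_like_jpeg(data: bytes) -> bool:
    # The kind of "magic header" check being warned about: it only
    # inspects the first few bytes of the file.
    return data[:3] == b"\xff\xd8\xff"

# Any bytes at all may follow a valid JPEG SOI/APP0 header, so a file
# that is also valid JavaScript (or HTML, or an archive) sails through.
suspicious = b"\xff\xd8\xff\xe0" + b"/* not really just an image */"
assert looks_like_jpeg(suspicious)
```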

~~~
AstralStorm
How do you verify it then? Trust the unfuzzed library to decode it in a VM
that will get compromised whenever someone finds a bug in libjpeg or JS
engine?

~~~
pdkl95
You verify it with a formal recognizer that was generated from the official
grammar. Traditionally this would have been e.g. a yacc/bison parser written
from the specification's BNF. Today, parser combinators such as hammer[1] are
probably easier to use (it has nicer bit-level support).

This puts all of the recognizing/parsing code in the same location. It also
verifies the _entire_ input at the same time, _before_ the results are passed
back to the main program. You get clean valid/invalid check of the entire
input.

For a very good discussion of why formal recognizers are important (and why,
if possible, it's important to design transport formats and protocols that are
deterministic context-free or simpler[2]), see Meredith and Sergey's talk[3]
at 28c3.

[1]
[https://github.com/abiggerhammer/hammer/](https://github.com/abiggerhammer/hammer/)

[2] [http://www.langsec.org/occupy/](http://www.langsec.org/occupy/)

[3] [https://media.ccc.de/v/28c3-4763-en-the_science_of_insecurity](https://media.ccc.de/v/28c3-4763-en-the_science_of_insecurity)
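As a rough sketch of the idea (in Python rather than hammer, and checking
only PNG's container grammar - a full recognizer would also enforce chunk
ordering and per-chunk field grammars):

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def recognize_png(data: bytes) -> bool:
    """Accept only inputs matching the PNG container grammar: signature,
    then a sequence of length/type/data/CRC chunks ending in IEND.
    The ENTIRE input is validated before anything is handed back."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    seen_iend = False
    while pos < len(data):
        if seen_iend:          # trailing bytes after IEND: reject
            return False
        if pos + 8 > len(data):
            return False
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        crc = data[pos + 8 + length:pos + 12 + length]
        if len(body) != length or len(crc) != 4:
            return False
        if zlib.crc32(ctype + body) != struct.unpack(">I", crc)[0]:
            return False
        if ctype == b"IEND":
            seen_iend = True
        pos += 12 + length
    return seen_iend

def chunk(ctype: bytes, body: bytes = b"") -> bytes:
    # Helper to build a well-formed chunk for the demo below.
    return (struct.pack(">I", len(body)) + ctype + body
            + struct.pack(">I", zlib.crc32(ctype + body)))

minimal = PNG_SIGNATURE + chunk(b"IHDR", b"\x00" * 13) + chunk(b"IEND")
assert recognize_png(minimal)
assert not recognize_png(minimal + b"appended payload")  # polyglot tail: rejected
```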

~~~
ben0x539
Wouldn't a proper polyglot pass a formal recognizer?

------
rebelwebmaster
Funny enough, this is fixed in Firefox 51 already (shipping in January).
[https://bugzilla.mozilla.org/show_bug.cgi?id=1288361](https://bugzilla.mozilla.org/show_bug.cgi?id=1288361)

------
Qwertystop
Huh. Why do JPEG files support comments?

(I mean, that's what this comes down to, right? Both formats support comments,
and starting a comment in one is a valid start for the other, so you can
interleave them and do whatever you like.)
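Pretty much. In the article's polyglot, the two bytes of the JFIF APP0
segment's length field are chosen as 0x2F 0x2A, which read as `/*` when the
same bytes are interpreted as ISO-8859-1 script - so the image body lands
inside a JS comment, and the leading marker bytes happen to be valid JS
identifier characters:

```python
# Byte values are the ones published in the article; the rest of a
# real polyglot (the image body and the closing */) is omitted here.
soi_app0 = bytes([0xFF, 0xD8, 0xFF, 0xE0])  # JPEG SOI + APP0 marker
length   = bytes([0x2F, 0x2A])              # APP0 length field = 12074

# Read as ISO-8859-1 text, the marker bytes form a valid JS identifier
# and the length field opens a JS block comment.
assert soi_app0.decode("latin-1") == "\xff\xd8\xff\xe0".encode("latin-1").decode("latin-1")
assert soi_app0.decode("latin-1") == "ÿØÿà"
assert length.decode("latin-1") == "/*"
```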

~~~
tfm
Inasmuch as there is blame to be apportioned in this case, it's due to
JavaScript / ECMAScript having broad definitions of acceptable variable names,
and (arguably[0]) the fact that browser JS implementations will generally
accept arbitrary 8-bit data within multiline comments, rather than the strict
Unicode code units specified by ECMAScript.

JPEG comments exist for the same reason that EXIF tags exist – it's handy to
store metadata alongside the image data, it gets copied around when the file
gets copied, the tags can be transferred if the image gets re-encoded. There
are enough error recovery mechanisms built into browsers that one could likely
make a polyglot by just abusing the data segment, maybe even while crafting a
legitimate standards-compliant JPEG.

Ultimately, bytes are bytes! Interpreting them with a variety of content types
can give a variety of results, so keep it in mind.

[0] Resynchronisation / recovery from bit errors is one of the explicit
motivations behind the design of Unicode encodings, so the browsers get a pass
from me on this one. It's almost certainly possible to craft a suitable JPEG
using legitimate code points anyway.

