Hacker News
Break Google (mahdiyusuf.com)
383 points by volksman on Sept 16, 2011 | 87 comments

When you search for "${", the page is missing 26 lines of minified JavaScript (lines 9-35 of a non-broken page, at least for me), almost certainly because of a templating bug. These lines, among other things, are responsible for adding the top toolbar to the page. (The missing JS is here: http://pastebin.com/B9cy3T2c)

I think google search uses this templating language:


It makes sense that the ${ could cause problems.

In my experience it's pretty rare for a template language to break just because user-entered content includes one of its special characters. The template would have to be evaluated twice for any problem to occur: once to insert the user's template code into placeholders within the original template, and then once again to execute the resulting combination.
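
A minimal sketch of that double-evaluation failure, using Python's string.Template only because it happens to use ${name} placeholders. The PAGE template and both function names are made up for illustration; nothing here is claimed to be how Google's stack works.

```python
from string import Template

# Hypothetical page template with one placeholder for the user's query.
PAGE = Template("<title>${query} - Search</title>")


def render_once(query: str) -> str:
    # Single pass: the user's text is inserted as data and never re-parsed.
    return PAGE.substitute(query=query)


def render_twice(query: str, **context: str) -> str:
    # Double pass: the output of the first pass is parsed as a template again,
    # so placeholder syntax typed by the user is now treated as template code.
    first = PAGE.substitute(query=query)
    return Template(first).substitute(**context)


print(render_once("${secret}"))                      # user text survives untouched
print(render_twice("${secret}", secret="hunter2"))   # user text gets evaluated
```

With the double pass, a query of just ${ is no longer harmless data either: the second parse sees an unterminated placeholder and string.Template raises ValueError.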

Sad to say, Mustache.js had exactly this bug the last time I checked, but only under some circumstances.

Minimal reproduction:

        Mustache.to_html('{{b}}', {b: '{{c}x}' }) -> '{{c}x}'
        Mustache.to_html('{{#a}}{{b}}{{/a}}', {a: [{b: '{{c}x}' }]}) -> '{{c}x}'
        Mustache.to_html('{{b}}', {b: '{{c}}' }) -> '{{c}}'
        Mustache.to_html('{{#a}}{{b}}{{/a}}', {a: [{b: '{{c}}' }]}) -> '' (wrong)

Hrm, but is $ a metacharacter in that language?

Hrm, did you click the link?

Most likely he did. Did you?

If you clicked and skimmed the page at 1000000000 words/sec: those `$` over there are for USD and are not part of a templating system.

I disagree! It does not make sense that submitting any text via a form input should in any way interfere with a templating engine, in the same way we don't expect to be able to affect a database by entering SQL into a form field.

The fact that Google brings back an empty result set to me indicates the problem is a bit deeper...

Can you tell us what tools you used to find all this info and to un-minify their js? thanks!

I did a diff of the source code for a SERP for "${" and a SERP for "$$", ignored any lines that were the same except for s/${/$$/, and then un-minified with http://jsbeautifier.org/
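
Roughly the same procedure as a Python sketch, for anyone who wants to repeat it. It assumes both result pages have already been saved as strings; the function name and parameters are invented here, and the blanket replace will also rewrite any unrelated "$$" in the working page, so treat the output as a starting point.

```python
import difflib
from typing import List


def meaningful_diff(working_src: str, broken_src: str,
                    working_query: str = "$$",
                    broken_query: str = "${") -> List[str]:
    # Rewrite the working page's query text so that lines differing only
    # by the query itself compare equal and drop out of the diff.
    normalised = working_src.replace(working_query, broken_query)
    delta = difflib.ndiff(normalised.splitlines(), broken_src.splitlines())
    # Keep only lines present on one side; skip common lines and "? " hints.
    return [line for line in delta if line.startswith(("- ", "+ "))]
```

Lines prefixed with "- " exist only in the working page, which is where the missing JavaScript would show up.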

Tip to the poster, and to anyone: Google (and Facebook, and others) have bug bounty programs. You can get paid tens to thousands of dollars if you report vulns to the vendor first.

Pretty low bounty.

The base reward for qualifying bugs is $500. If the rewards panel finds a particular bug to be severe or unusually clever, rewards of up to $3,133.7 may be issued.


For random XSS-style things on web pages, that's a high bounty.

The crazy lucrative bugs you may have heard of tend to be drive-by remote code execution in popular clientsides (like IE or Flash), and the stories about valuations tend to be apocryphal.

Even those bounties are only for security-related bugs, one of which this doesn't appear to be.

So, that gives you your BATNA for negotiations with the dark side...

Either google is very confident that they don't have serious bugs or they are setting themselves up for a problem. Imagine the value of finding a serious bug in adsense or adwords.

For me just typing ${ breaks the layout. I agree it probably has something to do with a template engine. I know Java EL uses the syntax ${variable_name} and so does Velocity Templates.

The bug doesn't exist on https://encrypted.google.com/

and it doesn't exist if you search it from the url bar or the searchbar (with google as search engine) of firefox.

Oh, cute. Yet another injection flaw in Google.

Guys, (and I don't mean Google, I mean all of us), don't fix injection by plugging injection bugs; put together some framework that actually avoids all of these problems (or at least doesn't let you add bugs).

This is actually a hard problem in the general case, and it is an active area of research. One promising approach is static taint analysis, wherein the source code of a web app is analyzed to detect whether "tainted" output is given to a sensitive "sink" without being properly sanitized. See, e.g., Omer Tripp et al., "TAJ: Effective Taint Analysis of Web Applications" (PLDI 2009) (http://www.cs.tau.ac.il/~omertrip/pldi09/paper.pdf).

As an example of a difficult case, consider the following pseudocode snippet:

  if request['raw']:
      print("Content-Type:text/plain; charset=utf-8\r\n\r\n")
      print(doc)
  else:
      print("Content-Type:text/html; charset=utf-8\r\n\r\n")
      print(html_sanitize(doc))

In one branch, doc must be HTML-sanitized; in the other branch, it must not.

That's a poor example. I would never send a document as HTML without tags. html_sanitize() should really be generate_html(), which adds structure to the document.

What the GP is saying (and I agree with) is that generate_html() should use a library which understands HTML structure and only allows content to be generated using a strict API (no doc+="<foo>bar</foo>" garbage).

Such a discipline greatly reduces the chance of injections, to the point where you have to actively write code to create injection points. And it's simple to follow: any time you write HTML, use the library.
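
A small sketch of that discipline, using Python's xml.etree.ElementTree as a stand-in for a structure-aware HTML library (it is an XML library, so real HTML generation would want something purpose-built). The render_comment function and its markup are invented for illustration.

```python
import xml.etree.ElementTree as ET


def render_comment(author: str, body: str) -> str:
    # Build the fragment as a tree; user text is only ever assigned as data,
    # never concatenated into markup.
    div = ET.Element("div", {"class": "comment", "data-author": author})
    paragraph = ET.SubElement(div, "p")
    paragraph.text = body
    # Serialisation escapes text nodes and attribute values.
    return ET.tostring(div, encoding="unicode")


print(render_comment('"><script>', "<b>hi</b> & bye"))
```

There is no place in this API to write a raw tag by hand, so creating an injection point takes deliberate effort.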

Taint analysis sounds nice in theory, but you can get the same effect by writing code modularly (i.e. only one small module can actually access the raw output stream) and using libraries to create structured data.

You don't know where "doc" is coming from, from a snippet like that. I use logic moderately similar to that for my blog software. If I'm writing the blog post, pretty much pass through what I wrote. If it's a comment (back when I had them), process the heck out of it. The blog post itself is a blob of HTML, basically.

I do not currently write my blog posts in a templating language (any more than anyone else does), though Hamlet [1] has me sort of tempted as it is so close to what I write anyhow.

[1]: http://www.yesodweb.com/book/templates

It's not a poor example, and it's not about sending the document without tags. It's about whether special characters should be escaped, and the answer depends on the Content-Type that the client requested.

A framework could use static types to tag whether it is escaped or not.

Then, the framework could map different kinds of requests (e.g: raw content vs. html content) to different types.

Then, the only way to convert between the types are functions that do proper escaping.
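
One way this could look, sketched in Python with invented names (Html, PlainText, respond_html, respond_raw). Python does not enforce annotations at run time, so a static checker such as mypy would be what catches misuse early; the isinstance checks are only a fallback.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class PlainText:
    """Untrusted text with no escaping applied."""
    text: str


@dataclass(frozen=True)
class Html:
    """Markup that is already safe to send as text/html."""
    markup: str


def escape(value: PlainText) -> Html:
    # The only sanctioned route from PlainText to Html.
    return Html(html.escape(value.text, quote=True))


def respond_html(body: Html) -> str:
    if not isinstance(body, Html):
        raise TypeError("HTML responses require an Html value")
    return "Content-Type: text/html; charset=utf-8\r\n\r\n" + body.markup


def respond_raw(body: PlainText) -> str:
    if not isinstance(body, PlainText):
        raise TypeError("raw responses require a PlainText value")
    return "Content-Type: text/plain; charset=utf-8\r\n\r\n" + body.text
```

Passing a PlainText straight to respond_html is then a type error rather than a silent injection.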

twisted.web.template is a great example: http://twistedmatrix.com/documents/current/web/howto/twisted...

You're making the same mistake as people who conflate good static type systems with type inference. Yes, it would be nice to infer the correct escaping function; but forcing people to escape somehow would be sufficient.

Django does this.

No. Django solves the 90% problem, which is usually a fine approach but will likely lead to security vulnerabilities down the line.

I'll refer to something I wrote last time I had this argument: http://pavpanchekha.com/programming/injection.html.

Nice article. In response to the last part, I can think of a way to achieve the sort of smart escaping via template you talk about using Haskell and Hamlet (among other templating systems used by Yesod). I believe, although I can't absolutely confirm, that Hamlet already performs context-appropriate escaping, based mostly on the type signatures and the names of a few of the functions.

Yes, Hamlet does context specific escaping. It will handle all the examples given, except you can't mix your javascript in with your html (which is generally good advice anyways).

I disagree with the article's premise that injection is always a display issue. In the [Yesod web framework](http://www.yesodweb.com) which uses Hamlet, we sanitize, not strip, html by default before it is ever put in the database. The more you can make injection not a display issue, the better; you just have to know your options.

You should replace your '<' with '&lt;' if you're going to claim your page is xhtml.

Thanks. Pages are compiled with org-mode, I'll report a bug.

I love org-mode.

I don't know what you mean by "90% problem", but unlike what your blog article suggests, Django's template engine escapes everything by default. You have to explicitly pass content through a filter to request that it not be escaped.

Based on the fact that the suggestions in your blog article could easily support someone forgetting the "|escape" on a variable, I would accuse your methodology of only solving the "90% problem".

How exactly?

Due to django's "we want the templating system to be general, to be usable for stuff other than html", it can't provide support for such 'guarantee that the output is well formed / valid / has no injection attack entry points' features.

> How exactly?

Everything is escaped by default, and you have to explicitly request for your content to be unescaped.
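
The general idea of escape-by-default with an explicit opt-out, as a toy Python renderer. This is not Django's implementation; the Safe marker class, the render function, and the {{ name }} placeholder regex are all invented for the sketch.

```python
import html
import re


class Safe(str):
    """Marker type: the caller explicitly vouches that this string is markup."""


_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, context: dict) -> str:
    def substitute(match):
        value = context[match.group(1)]
        if isinstance(value, Safe):
            # Explicit opt-out: emitted verbatim.
            return str(value)
        # Default path: everything else is escaped.
        return html.escape(str(value))

    # Single pass: substituted values are never re-scanned for placeholders.
    return _PLACEHOLDER.sub(substitute, template)


print(render("<p>{{ comment }}</p>", {"comment": "<script>alert(1)</script>"}))
print(render("<p>{{ comment }}</p>", {"comment": Safe("<em>trusted</em>")}))
```

Forgetting to do anything leaves the value escaped, so the unsafe path is the one that needs extra typing.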

OK, you escape stuff. Great.

How do you handle user-submitted image tags? Do you allow rich formatting, and if so, how do you sanitize it? (see http://www.codinghorror.com/blog/2008/08/protecting-your-coo...)

I think that Django uses sha for password hashes. They should use bcrypt, right? Did you turn on XSRF protection (which I think is off by default)? Are cookies secure?

Web security is not as simple as 's.replace("<","&lt;")' (escaping by default).
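
To make that concrete, here is a small Python sketch of why replacing "<" alone falls short once user data lands somewhere other than element text. The payload and the three helper names are made up; the standard-library calls are html.escape and urllib.parse.quote.

```python
import html
from urllib.parse import quote

# Illustrative attacker input that contains no "<" at all.
payload = 'x" onerror="alert(1)'


def naive_escape(value: str) -> str:
    # The one-liner from the comment above.
    return value.replace("<", "&lt;")


def escape_for_attribute(value: str) -> str:
    # Inside a quoted attribute the quote characters must be escaped too.
    return html.escape(value, quote=True)


def escape_for_url_component(value: str) -> str:
    # Inside a URL the right tool is percent-encoding, not HTML escaping.
    return quote(value, safe="")


print('<img src="{}">'.format(naive_escape(payload)))          # attribute breaks out
print('<img src="{}">'.format(escape_for_attribute(payload)))  # stays one attribute
print("/search?q=" + escape_for_url_component(payload))
```

The naive version leaves the double quote intact, so the payload closes the src attribute and adds one of its own.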

How do you generate slugs? Could someone put something nasty in a pathname or URL?

Django is not secure. You can secure it, with minimal effort, if you keep things radically simple. But you do need to know what can go wrong, so you don't introduce any "features" that are actually "gaping security holes".

Even if you do everything right, it doesn't mean that Django is magically secure no matter what people use it for (obvious, yes, but this is HN and sometimes failing to point out the obvious can get you downvoted). That's why people are objecting.

Handling user-submitted image tags is (in my opinion) way outside the scope of the framework. Which tags and attributes to whitelist, or whether to use html markup at all compared to a different language like markdown, is very project dependent. If you have to, just install BeautifulSoup or any of the other great libraries that have cropped up in the last year or so to handle the sanitizing.

Django uses sha for password hashes because until recently there hasn't been a better library to ship with natively across all the platforms that Django supports. If you know you'll only be working on *nix, django-bcrypt can enhance the default password hashing behavior. As other commenters have noted, they're moving to PBKDF2 in the near future as a better included hashing library.
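
For reference, salted PBKDF2 hashing is available in Python's standard library via hashlib.pbkdf2_hmac. The sketch below is not Django's hasher; the function names, the 16-byte salt, and the default iteration count are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = 100_000) -> Tuple[bytes, bytes]:
    # A fresh random salt per password defeats precomputed tables.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = 100_000) -> bool:
    _, digest = hash_password(password, salt, iterations)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected)
```

The iteration count is the tunable work factor; the salt and count have to be stored alongside the digest so verification can repeat the same computation.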

CSRF is on by default. If you need secure cookies and HSTS headers, there's a package that provides them called django-secure, which last I heard is being rolled into Django proper in the near future.

Django prevents path traversal and anything else you can imagine that might be nasty in a URL. The auto slug generation included.

So how exactly is Django not secure again? Where are the "gaping security holes"? Or do you have no idea what you're talking about?

CSRF is on by default. Cookies could be more secure, and it's being worked on. Django is moving to PBKDF2 (there's no pure python bcrypt lib). There's not really opportunity to do anything interesting with slugs.

Like any framework, there will always be room to improve security, but it does do very well out of the box. At least it makes you work to expose anything obvious.

Seems likely that it's due to a lack of escaping in a custom templating layer. I wonder if it could be used to perform an XSS attack?

Unlikely. You'd have to get someone else to run the same query.

...which you can do simply by posting a link anywhere.

Edit: I guess it would be more helpful to explain why for those not familiar with XSS. If all it takes is a specially crafted URL to your site to exploit it, your site is toast. The security model of the web assumes that people can open even the shadiest of links without negative consequences. I could have obscured the URL with a shortener and named the link "Cutest cat pic ever!" I could have hosted a page on a totally separate domain and put the crafted URL in a hidden iframe. All I have to do is send document.cookie over to my server and now I control your account.

My mistake. I thought it didn't work if you linked to it directly. It turns out the bug just manifests itself differently if you do that.

Or iframe it.

If this is a templating engine type thing, you should be able to do something like


If you can figure out what "KEYWORD" is for a given template tag as well. I tried links and a few others, but none that I can identify: it does still reproduce the bug though.

And I thought this was one of the most sanitized input fields on the internet. Let's see how long it takes Google to deploy a fix to their search.

Also works if you use the html code: "&#36;&#123;" It returns the symbol instead of the search query.

Interestingly enough, the https version of Google doesn't have this bug.


looks fine, but



The first one is broken for me too (in a different way, though).

They fixed it .. http://google.com/#q=${

funny tho that at this point such a query would not return the discussion about this flaw :)

Anyone have an explanation for this? Looks like it's messing up the CSS for the top link nav.

I'm not seeing it. Can you post a screenshot?

Looking at those screenshots, I see that you all tried:


However, I tried the literal


Which also breaks it. In fact, it looks like anything that has ${ in it will break it, anywhere at all in the search string.

Also, if you close the parentheses, e.g. ${}, it fixes it. This works with any number of leading ${, e.g. ${${${${${}

you are leaking your gmail address with that screenshot.

not sure if that was intended or not.

Hmm, anybody have an idea as to why mine's different?


You loaded the page directly. It only seems to happen if autosearch was involved.

Loading http://www.google.com/search?q=${ does what you see, but entering ${ into the search box and pressing enter does what everyone else sees.

edit: compare your clean human generated address bar to the other screenshot's messy software generated one.

It's still broken, even if it looks different

Ahh, makes sense. Thanks for the explanation.

It looks that way when you search using Chrome's omnibox. Go to google.com then try.

Maybe it's a specific theme? The old default theme?

Note: this also works: http://www.google.com/#q=$%7B
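
The encoded variants reported in this thread all decode to the same two characters, which can be checked with Python's standard library. The normalise helper is just a name chosen for this sketch.

```python
import html
from urllib.parse import unquote


def normalise(raw: str) -> str:
    # Undo percent-encoding first, then HTML character references.
    return html.unescape(unquote(raw))


for variant in ("${", "$%7B", "&#36;&#123;"):
    print(repr(variant), "->", repr(normalise(variant)))
```

So any filter that only looks for the literal characters before decoding would miss the other spellings.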

The Google Search Appliance, with roots in Google proper, generates XML results, which are then (normally) transformed via XSLT. (I'm looking at you, XPath.)


I s'pose attributes don't enter into it, but still, I wonder if the XSLT pass (if any) has anything to do with this?

I bet Google knows about this, and when they did bug triage it was such a low priority that it just hasn't been fixed yet.

This looks like a javascript issue but I have seen a server error from google - http://www.rajeeshcv.com/2010/07/have-you-seen-google-search...

On a related note, searches for many symbols do not produce results. Searching for '&' will bring up results for ampersand, but most others that map well to words do not, e.g. $ => dollar, % => percent, etc.

Looks like the syntax of Google's CTemplate someone posted today: http://code.google.com/p/google-ctemplate/

> Shortest way to produce issue is here — http://google.com/#q=${

Making it an actual hyperlink would've been a bit shorter.

I would have been suspicious of a hyperlink in this context...since this is potentially discussing a XSS vulnerability on Google (not necessarily, but maybe).

I tried it in my Search box at WordPress.com blog and it has no effect.

Try searching for this to see a real break: 9999999..99999999999999999999999

No, that's designed to break - within that range is a large number of credit card numbers, and Google blocks that. The OP however is a genuine bug.

thanks for info

That's 'cause the money ($) is the key (}). Oh, you're welcome.

The bug exists only when searching from the Google homepage; it doesn't happen when searching from the browser's search bar.

Breaks for me in either case, just in different ways (an auto-search link breaks the toolbar, while the other one doesn't even display the toolbar).

it doesn't break on google.co.ma

Everyone seems to avoid mentioning the obvious. This breaks the page layout only, not really a critical issue concerning google's integrity/security.

Still, this obviously doesn't look good. Above anything else google has excelled at being simple and reliable. All this javascript goodness added recently might be a step in the wrong direction. If stuff like this starts to happen every now and then, google's reputation might be at stake.

If your templating language is going to use a magic character it would seem useful to pick something less common than $. There are several odd characters on my keyboard (§`~±|¤) and if you are willing to use the ALT key there are really obscure characters that you can safely filter from the input instead of going through the trouble of escaping them. Filtering is so much more efficient/easier/safer than escaping.

Imagine how much easier life would be if in HTML we only had to filter for § instead of escape every <, > and ".

Why the down vote?

~ is not unusual on the internet, it is used for home directory webspace

http://www.proweb.co.uk/~matt for instance

That's my homepage. It's been a long time since I read it.

Reading it now as a treatise from my younger self, though I didn't write it, I realise that spirit is lost. For a while it was "our" place but now we have to return to the underground.
