Why does Google prepend while(1); to their JSON responses? (2010) (stackoverflow.com)
212 points by chupa-chups 44 days ago | 52 comments

> Contrived example: say Google has a URL like mail.google.com/json?action=inbox which returns the first 50 messages of your inbox in JSON format. Evil websites on other domains can't make AJAX requests to get this data due to the same-origin policy, but they can include the URL via a <script> tag. The URL is visited with your cookies, and by overriding the global array constructor or accessor methods they can have a method called whenever an object (array or hash) attribute is set, allowing them to read the JSON content.

What an absolutely ridiculous language and platform we decided to base the whole web on.

This isn't really a language issue. It sits strictly at the level of browsers, which have a very complicated and ad-hoc concept of permissions for deciding when a page may make requests, and to which servers. We could have a richer permission model API exposed to whatever language we were using (so only certain whitelisted script tags would have the ability to make requests, for example), but we don't. In any case, this is going to be at the DOM level, which isn't really part of JavaScript.

Pages could have more control (via a header manifest or something) over tracking and permissioning based on where particular scripts come from; but that's not the model we went with. All scripts get put into a single executor and namespace. Now, this isn't an irrevocable choice, but a browser still has to support existing pages that would still be vulnerable to fun XSS attacks.

It is a language issue, because the security hole is the fact that you can redefine the Array constructor or accessor methods.

On a similar note, it's also crazy that you can't just use `foo instanceof Array` to figure out if something is an Array because it could be an Array from a different namespace, instead you need `Array.isArray(foo)`.
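For anyone who wants to see the cross-realm effect outside a browser, here is a minimal sketch. It assumes Node.js and uses its `vm` module to stand in for a second frame; the variable names are illustrative only.

```javascript
// Assumption: running under Node.js, where a fresh vm context plays the
// role of another frame with its own separate Array constructor.
const vm = require("vm");

// An array created inside the other context.
const foreignArray = vm.runInNewContext("[1, 2, 3]");

// instanceof compares against *this* context's Array.prototype, so it fails.
const instanceofResult = foreignArray instanceof Array;

// Array.isArray inspects the object itself, so it works across contexts.
const isArrayResult = Array.isArray(foreignArray);

console.log(instanceofResult, isArrayResult);
```

In a browser the same thing happens with arrays handed over from an iframe.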

That's not really a security hole except in the weird browser context of having to execute content-controlled code --- and then only due to the same-origin policy. Being able to redefine anything in the system is normally a virtue for a language.

Being able to redefine everything is never a virtue as it introduces non-locality. You can't reason about what a particular piece of code is doing without understanding the full data flow that got you to this point and any re-definitions that might have happened along the way (with one re-definition overwriting the previous one).

For example, look at the following code:

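    function test(a) {
        if (a > 1) {
        } else if (a <= 1) {
        } else {
            console.log("Why is this happening to me?!");
        }
    }
    // valueOf runs on every comparison and returns a different number each time:
    // first 1 (so a > 1 is false), then 2 (so a <= 1 is false too).
    var a = { x      : 1,
              valueOf: function() {
                           this.x -= 1;
                           return this.x == 0 ? 1 : 2;
                       }
            };
    test(a);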

I'm not looking to recapitulate the debate about whether monkey patching is good or not. It's probably bad most of the time. That doesn't make the possibility of doing it at all, ever, bad. If it helps you, just substitute "mainstream feature" for "virtue".

There is a difference between being able to monkey patch

  * objects created in the same scope
  * objects created outside the current scope but global
  * global objects
  * core language features

For anyone who has to read code someone else wrote, as you go down this list life becomes much more painful. Even something as simple as forcing all the patches to be in one place so there is an unambiguous place to look for them would make a huge difference.

Well, you can do those things in Ruby, you can do them in Javascript, you can even do them in Python. Is there a high-level "interpretable" language that takes a hard line on this, or is it just a standard we're retrofitting onto Javascript?

No, you cannot.

In python you can certainly define your own object which might be a list and override the list accessor for your object, but you cannot override the general list accessor for all lists or those lists that you don't have a reference to.

This is a big difference between a language like python and javascript.

In javascript, buried deep in a function scope of some library might be code that overrides Function.prototype.apply, and then everywhere else in your code that a function is invoked, the new behavior will take effect. Nothing like that type of interference is possible in Python. In Python, you can only monkey patch those things that you have a reference for.
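A minimal sketch of that kind of interference, with made-up names (`calls`, `sum`). One caveat on the claim above: the patch intercepts explicit `.apply(...)` calls, not every ordinary function call, but it does so for every function in the runtime, including ones the patching code never saw.

```javascript
// Keep the real implementation so it can be delegated to and restored.
const originalApply = Function.prototype.apply;
const calls = [];

// Somewhere deep inside a (hypothetical) library:
Function.prototype.apply = function (thisArg, args) {
  calls.push(this.name);                       // spy on whoever is being applied
  return originalApply.call(this, thisArg, args);
};

// Unrelated code elsewhere, unaware of the patch:
function sum(a, b) {
  return a + b;
}
const result = sum.apply(null, [2, 3]);

// Undo the patch so the rest of the program is unaffected.
Function.prototype.apply = originalApply;
```

After this runs, `result` is still the correct sum, and `calls` has recorded the name of a function the patching code never had a reference to.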

Eh, this works fine (on Python 2.7):

    class A(object):
        def foo(self):
            return 'foo'

    a = A()
    a.foo()    # returns 'foo'
    a.__class__.foo = lambda self: 'bar'
    A().foo()  # new instance (!) returns 'bar'

You're correct for list ("TypeError: can't set attributes of built-in/extension type 'list'"), but that's more an implementation limitation than a principled stance.

You just looked up the class of a which is A and then you patched the foo method in the class definition of A. This is no different than you saying A.foo = lambda ...

But what you can't do in Python is monkey patch a method to a class for which you don't have a reference. Try it.

This is not an "implementation limitation", it is because Python is not a prototype language, so you can't monkey patch what it means to "call a function" for every method in every class because there is no "Function" object that is a prototype of all method calls in Python that plays a similar role to Function in Javascript. Similarly you can't patch "Object" de-reference and change the behavior of every single object de-reference. Those prototype chains are not exposed to you.

In javascript you can override methods of classes which you can't reference. That's huge. Imagine one person creates an object with a foo method, and someone else changes what it means for all objects of any class to call any method. Think about what you have to audit to determine what the behavior of foo is -- in Python, you just need to audit anything that has a reference to the class where foo is defined. In javascript, you need to know what all the code is doing. That's a big difference.

Python did have some scheme to do something like this. If you were using gevent, you needed to call gevent.monkey.patch_all() to override all the network calls that you were making under the hood, so that they would happen on the event loop.

Yes, gevent did have a reference to the classes it was patching, but it still made pretty deep changes to a program.

It was janky and has been replaced several times by other async frameworks. It doesn't go as far as javascript to be sure, but it is sort of what you're suggesting.

Yeah, the issue isn't monkey patching per se, but lack of isolation. Javascript just doesn't support the concept of a "module" that cannot interfere with globals via side effects or is somehow restricted to interacting with the rest of the system via a clearly defined API that can be checked without reading all the code in the module.

That, combined with the fact that people load thousands of modules just to do simple short programs, makes it very hard to reason about javascript code efficiently.

You can try to hide global references from a javascript module, for example by using "with", but you will always have access to things like constructor() which will give you the raw window, so you can't hide a reference to Window even if you shadow Window via "with".
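A rough sketch of that escape, assuming a sloppy-mode function (built here with `new Function`, since `with` is a syntax error in strict code). The names `probe`, `shadowedType` and `leakedGlobal` are illustrative.

```javascript
// The scope object shadows the obvious global names...
const scope = { globalThis: undefined, window: undefined };

// ...but inside `with (scope)`, the bare identifier `constructor` resolves to
// scope.constructor, which is inherited from Object.prototype (i.e. Object).
// Object.constructor is Function, and Function("return this")() hands back
// the global object.
const probe = new Function(
  "scope",
  "with (scope) {" +
  "  return [typeof globalThis, constructor.constructor('return this')()];" +
  "}"
);

const [shadowedType, leakedGlobal] = probe(scope);
console.log(shadowedType, leakedGlobal === globalThis);
```

The shadowing itself works (`typeof globalThis` sees the shadowed value), yet the real global object still leaks out through the prototype chain.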

Even ignoring that, you still can overwrite builtins from anywhere in the code, thus effectively changing the shared runtime. There is no notion of scoped builtins -- e.g. overwrite builtins all you want, but just in your own scope.

That lack of isolation is the problem, not monkey patching your own objects.

The easy and obvious solution to info leakage and XSRF is for the web standards committee to ban third party cookies from being sent in cross-origin requests. But that would break Facebook Like buttons from tracking your browsing activity...

It's probably not reasonable to assert that tracking is the reason we have cross-origin requests; that feature was avidly requested by developers across the spectrum for many years, and the fact of JSONP is a pretty good illustration of just how badly people wanted it.

Well, JS was changed (backwards incompatibly!) in order to eliminate the problem. So yeah, it was bad, but it was arguably just a bug that got fixed.
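As far as I understand it, the change was that object and array literals (since ES5) define their properties directly on the new object instead of doing an ordinary assignment, so setters planted on a prototype are never consulted. A small sketch, where `captured` and the property name `secret` are made up for illustration:

```javascript
// What a hostile page might plant before loading someone else's JSON.
const captured = [];
Object.defineProperty(Object.prototype, "secret", {
  set(value) { captured.push(value); },
  configurable: true,
});

// Ordinary assignment walks the prototype chain and triggers the setter.
const viaAssignment = {};
viaAssignment.secret = "assigned";

// A literal defines an own property directly and bypasses the setter.
const viaLiteral = { secret: "literal" };

// Clean up the prototype again.
delete Object.prototype.secret;

console.log(captured, viaLiteral.secret);
```

Only the assigned value ends up in `captured`; the value written in the literal stays private to the object.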

If they are that motivated wouldn't they catch the trick soon enough and remove it from the response?

The idea is that your browser is running their site, and their site is trying to be malicious and steal your data for a remote site. Your data there can only be accessed with your cookies/auth, so they can't just do it by themselves, they need your computer/browser. If they try to request the JSON via XHR, the same-origin policy will deny it. If they try to read it using a <script> tag, they can't access the response before it runs as a script and hangs at the infinite loop.
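The legitimate same-origin page has no such problem, because it receives the response as text and can drop the prefix before parsing. A minimal sketch; `GUARD_PREFIX`, `parseGuardedJson` and the sample body are hypothetical, not Google's actual client code:

```javascript
// Assumed guard string; real services use various prefixes.
const GUARD_PREFIX = "while(1);";

// Strip the guard and parse the remainder as plain JSON.
function parseGuardedJson(text) {
  if (!text.startsWith(GUARD_PREFIX)) {
    throw new SyntaxError("response is missing the expected guard prefix");
  }
  return JSON.parse(text.slice(GUARD_PREFIX.length));
}

// Stand-in for a response body fetched via XHR from the same origin.
const body = 'while(1);[["inbox", 50]]';
const data = parseGuardedJson(body);
console.log(data);
```

A page that pulls the same URL in through a `<script>` tag never gets the text, only the side effects of executing it, which here is an endless loop.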

The catch is that attackers can redefine accessor methods globally, because those are part of the object system, and do run modified on load, but they cannot redefine "while", which is a built-in, one of the few reserved language keywords. At least I hope they can't. Of course that does depend on implementation details.

Please don't post low effort comments like this on HN.

Just because you don't understand sarcasm and its implications doesn't mean that a comment is "low effort".

There's no need to insult my intelligence.

> What an absolutely ridiculous language and platform we decided to base the whole web on.

JavaScript is used for its virtues, and its flaws can be mitigated.

Other languages might not have those flaws, but they have other downsides.

It's not like English is the best choice for language either!

JavaScript (and the way browsers use JavaScript poorly) is used because Brendan Eich went on a bender one weekend 20 years ago, and then path dependence. The virtues do not outweigh the vices, except for the sunk cost of all the infrastructure already built.

It's not that different from English, in a sense.

A ten day bender!

I created a UTF-7 JSON hijacking method back in 2010 that enabled full hijack of entire JSON data streams in Firefox and Safari: https://www.reddit.com/r/programming/comments/b7ebd/json_sni...

Originally at https://code.eligrey.com/sec/json-hijacking (archive: https://web.archive.org/web/20100304213300/http://code.eligr...)


Mentions how inconsistent usage of the XSSI prevention caused an information leak that could detect which user was logged into Facebook - received a bug bounty of $1000 from Facebook.

Previous discussion: https://news.ycombinator.com/item?id=19306309

Sorry, didn't notice :( This time I relied on HN's mechanism to automatically redirect one to old posts with the same URL.

Usually I check.

No worries. I only mentioned that because there's more history in those other threads that people might want to check. I didn't mean to say you shouldn't have posted, quite the contrary, thanks for posting it :)

After a year or so we no longer count it as a dupe. This is in the FAQ: https://news.ycombinator.com/newsfaq.html.

Well I'm one of today's 10,000[0] so thanks for posting anyway!

[0] https://www.xkcd.com/1053/

I’ve never seen that xkcd before.

Very meta.

There is an XKCD for almost every situation.

It protected against a weird javascript edge case in the distant past and against flash injection in the past.

See also https://stackoverflow.com/questions/15306636/why-do-facebook...

I made a post a few months ago regarding this exact vulnerability: https://dev.to/antogarand/why-facebooks-api-starts-with-a-fo...

HN discussion: https://news.ycombinator.com/item?id=18443125


While this vulnerability is getting old, it's very interesting to see its prevention still in effect on major websites. Even if the original vulnerability is patched, we never know when a modern variant might pop out, such as using UTF16BE as charset to extract array data!

Something I just learned about JavaScript: the semicolon is an empty statement, so this is an infinite loop that does nothing.

JavaScript shares this trait with a lot of languages that borrow syntax from C.

A good example of this is a simple strcpy in C:

while (a++ = b++);

I was gonna say you need a dereference operator in there somewhere, but based on that italicized 'a', I think HN's markdown formatting turned the asterisks into italics

Correct version (not broken by formatting):

  while (*a++ = *b++);
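
The same empty-statement trick carries over to JavaScript. A small sketch, with illustrative names (`src`, `dst`, `k`), where all the work happens in the loop header and the lone `;` is the body:

```javascript
// A counting loop whose body is the empty statement.
let i;
for (i = 0; i < 5; i++);

// Rough analogue of the C copy idiom: copy bytes up to and including
// a 0 terminator. The left-hand side dst[k] is resolved before k++ runs.
const src = Uint8Array.from([104, 105, 0, 99]);
const dst = new Uint8Array(src.length);
let k = 0;
while ((dst[k] = src[k++]) !== 0);

console.log(i, k, Array.from(dst));
```

The byte after the terminator is never copied, just as with the C version.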

How did you manage to enter it in HN? Is there an escape sequence for asterisk? Or are you using special Unicode codepoints?

Double space at the beginning of a line makes that line verbatim (and monospaced).


This reminds me of the arguments that the folks at n-gate make over why they won’t use https. Basically, “it’s not me, your client is broken”.

There are too many conflicts of interest to fix the broken client.

The n-gate that is a web log of Hacker News comments they don't like?

Yes, but specifically this article that is unrelated to HN comments: http://n-gate.com/software/2017/07/12/0/

I thought this kind of protection was only necessary when using arrays on top-level json. Can nested arrays still be obtained somehow when not using this (or other) prefix?
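One reason usually given for the top-level distinction is that a response starting with `{` does not parse as a script at all (the brace opens a block statement), while one starting with `[` is a valid expression statement. A quick way to check this, using a hypothetical helper `parsesAsScript` that only parses and never runs the text:

```javascript
// Returns true if `source` is syntactically valid as a sequence of statements.
// new Function only compiles the body here; it is never called.
function parsesAsScript(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Made-up sample responses.
const topLevelArray = parsesAsScript('[{"subject": "hi"}]');
const topLevelObject = parsesAsScript('{"messages": [{"subject": "hi"}]}');

console.log(topLevelArray, topLevelObject);
```

This says nothing about the charset-based variants mentioned elsewhere in the thread; it only covers the plain `<script>` inclusion case.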

Is this still needed with CORS?

It is for older browsers which do not support CORS, as well as different variants such as using a script with the UTF16BE charset:


This isn’t needed for browsers newer than 2011: https://stackoverflow.com/questions/16289894/is-json-hijacki...

From a link on that page, various variations that worked in newer browsers: https://portswigger.net/blog/json-hijacking-for-the-modern-w...
