Microsoft's Skydrive sends two million NULL characters (opera.com)
201 points by franze 1594 days ago | 115 comments

While sending 2 MB of NULL bytes is certainly not good network citizenship, I think Opera shouldn't be crashing here. The fact that it does points to a bug in the code that might be exploitable beyond the simple denial of service it already is.

AFAIK Opera still runs all tabs in one shared process, so spamming one tab with 2MB of NULL will cause the whole browser (and with it all other tabs) to crash.

They said so in the post: “We should fix that”.

Other browsers have the same problem with tabs, notably Firefox. A misbehaving tab causes the whole browser to stall, and you can't even kill that single tab. So Opera isn't alone in that regard.

I thought Firefox had already isolated tab processes a couple releases ago?

They isolated plugins (like Flash), but the project for per-tab processes, called Electrolysis, was put on hold last year.


Nope. The major fix that they did was to eliminate memory leaks in extensions when the user closes a tab.

no, there is still no meaningful isolation in that regard.

Right now I have 101 tabs running, and only 6 cores in my CPU. Do you know how much context switching costs in terms of performance? Do you know what the overhead would be if I were using Chrome?

So spare me, please. When general-purpose computing moves to GPUs, we'll revisit the idea of "one tab per process". Until then, out of "Chrome, many tabs, performance", pick any two and drop the other.

The context switch itself is almost trivial at that scale -- a couple hundred ns per switch, with the occurrence interval in the 10s-100s of milliseconds, depending on the CPU scheduler. The part that hurts is all the cache invalidation. Unless you're actually swapping. Then you're really hosed.
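The arithmetic behind "almost trivial", as a quick sketch. The 200 ns switch cost and the 10 ms to 100 ms intervals are the figures from the comment above; the function name is made up, and the model deliberately leaves out cache effects, which are the expensive part. At a 10 ms interval it works out to about 0.002% of CPU time.

```python
def switch_overhead_fraction(switch_cost_ns: float, interval_ms: float) -> float:
    """Fraction of CPU time spent on the switch itself (cache effects ignored)."""
    interval_ns = interval_ms * 1_000_000  # 1 ms = 1,000,000 ns
    return switch_cost_ns / interval_ns


for interval_ms in (10, 100):
    frac = switch_overhead_fraction(200, interval_ms)
    print(f"{interval_ms} ms between switches: {frac:.4%} of CPU time")
```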

Well, that and any particularly nasty plugins (cough flash cough) that don't like many instances at once.

Which OS are you running? I have yet to see Chrome on Linux have any hiccups with as many tabs as I throw at it. The only times I've seen hiccups are when the X server is busy with something else or I'm short on memory.

If you're on another OS, perhaps Chrome is suffering because it can't share GDI resources (or the Mac equivalent) across tabs? I'm assuming it can't, since each tab has a different process ID.

Context switching isn't that big a deal; it's actually pretty good for responsiveness, since everything isn't being serialised through a single thread. Compare animated GIFs on a busy Opera vs a busy Chrome: Opera will regularly freeze momentarily, presumably because of stop-the-world GC.

The bigger problem is having that many tabs open without burning through enormous amounts of memory. Chrome's at least 10x less memory-efficient than Opera in this regard, which isn't great when my Opera sessions regularly run to well over 4GB :)

You are right, your scenario is much more visible nowadays. It's just that I have in mind my old single core CPU and how Chrome managed to grind my system to a halt with only a dozen tabs.

Chrome could do that with one process as well as with 100.

> Do you know how much context switching costs in terms of performance?

How much do you think? Most of your 101 tabs should be idle, not active, so my guess would be "not much".

> Most of your 101 tabs should be idle, not active, so my guess would be "not much".

Keep open in the background a few js-intensive websites (e.g. gmail, twitter) and your tabs will be flickering like the proverbial Christmas tree. Depending on your hardware, the paradigm difference can be very noticeable (also, have fun on a laptop's battery).

But Chrome, and any sensible browser really, will scale down JS execution on pages not in the foreground; you can see this in, for example, Piecon (http://lipka.github.com/piecon/), which updates its favicon at only half the speed (if that) when its tab is in the background.
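A rough model of that throttling, as a sketch. The 1000 ms background minimum is an assumption (it is the clamp Chrome has historically applied to timers in background tabs, but browsers differ and change this over time), and both function names are hypothetical.

```python
def effective_delay_ms(requested_ms: float, in_background: bool,
                       background_min_ms: float = 1000.0) -> float:
    """Delay a throttling browser might actually use for a repeating timer."""
    if in_background:
        # Background tabs get their timers clamped to a minimum interval.
        return max(requested_ms, background_min_ms)
    return requested_ms


def callbacks_per_minute(requested_ms: float, in_background: bool) -> float:
    return 60_000 / effective_delay_ms(requested_ms, in_background)


# A favicon animation that asks for a 100 ms interval:
print(callbacks_per_minute(100, in_background=False))  # 600 updates a minute
print(callbacks_per_minute(100, in_background=True))   # 60 updates a minute
```

Under this model, a handful of background Gmail or Twitter tabs cost far less than the same tabs in the foreground, which is the point being made above.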

I believe that Chrome caps the number of processes that it uses, so you wouldn't actually be using 101 processes. I think it caps at something like 8, or did when Chrome first came out.

"pgrep Chrome|wc -l" gives me 54 right now with 2 windows open (29 tabs + 2 tabs)

yep, one master, one GPU process.

plus, one per: tab, extension, and plugin.

> Right now I have 101 tabs running

Dude, I get claustrophobic if I have more than 4 or 5 tabs open. Dare I ask why you have 100 tabs open?

I answered the "why" somewhere below; I think you're more concerned with the "how".

Place your tab bar on the left (vertically) and make it wrap to multiple lines (or columns, in this case). I have two columns at the moment, the second one with room for more tabs. Combine this with tab stacking and you have a winner.

You can have your tab bar wrap to multiple horizontal lines, but it's not as efficient given today's widescreen monitors. On lower resolutions (e.g. laptops) it's harder to fit as many tabs, but the system is the same and you can still keep a decent number open.

Give it a try, 4-5 tabs is nothing.

That sounds like a personal problem. Your brain can't possibly handle anywhere near the amount of information 101 tabs provides at once, there's no reason to do that.

I use tabs like bookmarks: sites that I might need now or a bit later. I GC them from time to time. Actual bookmarks are for things like sending a link to myself at home using Firefox Sync, or for sites I need to visit again and again, like docs sites, the internal bugtracker, etc. I don't open those from the bookmarks menu, though; I just type in the awesome bar, and Firefox always shows bookmarks first. In other words, I use bookmarks as a manual way to bubble up search results in the awesome bar.

I use tabs like this too and I'm not really happy about that, I just haven't found a better alternative yet.

What usually happens is I'll be browsing and find an article/code library/inspiring web design/useful tool that I know I can make use of later in some way, but if I just bookmark it then I'll probably forget it's in there.

I have very organised bookmarks but these only work for sites I visit often and I do use tools like Evernote and Gimmebar but it still doesn't feel right.

Perhaps Mozilla have the answer: https://blog.mozilla.org/ux/2012/10/save-for-later/

I too browse this way, and it's not about processing all the content at once. It's about utilizing the browser's ability to remember things you can't, and establishing a workflow as you move through the tabs.

It's a bad habit (I can't think of a situation where 100 tabs would be necessary), but it's still a pain if the browser can't handle it.

Often it's just mis-used as a “read later” list.

I have forum threads, Stack Overflow tags, manuals, ... open. Not exactly a read-later list, although I do use tabs for that as well. I use bookmarks for things I want to save for later, usually something I've already read (at least partially) so I know it's worth saving. (I know a bookmark folder/tag would be enough.)

I always hold at least two digits' worth of tabs open, usually grouped into stacks. For instance, I have a stack of music/radio related tabs (e.g. youtube), several stacks with documentation that I infrequently need, stacks with tabs that I will read sometime later in the week/month/year, stacks related to various issues in the code I am working on, and so on.

I can't be bothered to use bookmarks for these things; having tabs in stacks is much more convenient. If the browser lets me keep a lot of tabs open, then I'll abuse this possibility for my personal comfort. If we were to use your philosophy, then Chrome should be limited to a certain number of tabs by design. How about 6, the number of cores in my CPU?

But thank you for the diagnosis, Dr. Jobs; surely I'm holding the phone the wrong way too. Keep blaming the user.

It's not blaming the user. The user in this case is using the browser in an unintended way. It is therefore the user's own fault if the browser doesn't hold up.

If I want to ride my bike facing the wrong direction, it's not the fault of the designer that I can no longer steer. I could potentially make it work, but would it ever be reasonable for the manufacturer to add another set of handlebars on the back?

As for your comment about Apple claiming the user was holding their phone incorrectly: they did ultimately provide cases, right? That amounts to admitting that enough users were having trouble to merit a change or fix.

All said, you are using your browser wrong. Sorry to be the one who has to tell you.

That's a poor analogy. Using so many tabs is about making the browser work harder, not using it incorrectly. If I were to compare that to riding a bike, it would be like pedaling as fast as I could. If the bike couldn't handle it, then yes, I would say it was the manufacturer's fault.

I was thinking of a fixed gear bike. It can operate in both directions. One direction has handlebars.

Designers shouldn't be expected to support efficiency on the side they didn't put the bars.

Hold on to this mentality, and you will never develop great software.

The fact is that Chrome runs many tabs efficiently. This guy wants an extreme use case to be a primary focus, or else the browser must suck.

Supporting it would be a waste of developer time. If they run out of ways to make Chrome faster and have nothing better to do then sure, let's deal with this niche problem.

I do something similar. If I'm going to open a lot of tabs and don't really need the dev tools, i.e. when doing some casual browsing, I go for Opera or Firefox instead of Chrome. I also go for them when travelling and battery is scarce.

Context switching costs far less than the time it takes you to press the key to switch between tabs, and is not appreciably more expensive for a process than for a thread.

Cache invalidation, on the other hand, can be quite expensive. The context switch itself, of course, is less so.

Your point is actually acknowledged by the author of the post.

aside from Chrome, do ANY other browsers handle tabs separately?

Even Chrome reuses processes for child tabs. As far as I remember, this is done so that JavaScript in a parent window can access its child window.

Which, to me, is one of its biggest annoyances. Opening a slew of tabs from a Google Reader page results in very clear resource contention (one of 8 CPUs pegged at 100% and nothing moving, etc).

Internet Explorer has some degree of tab separation. I'm not sure if it's processes or threads, but it can recover a crashed tab.

Internet Explorer uses processes but may opt to pack more than one tab into a single process to reduce system load by too many processes.

Funnily enough it publicly supported isolated tabs before Chrome did (in a beta release of IE8) :-)

I think Chrome does a similar thing, it groups together tabs from the same origin.

Chrome also has a cap on the number of processes. The process grouping per origin is not really intended as an optimization; it's required by JavaScript semantics.
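A toy model of how grouping and a cap interact, as a sketch. This is not Chrome's actual policy: the class name, the "same site always shares a process" rule (a simplification of the JavaScript-driven grouping described above), and the "least-loaded process once the cap is hit" fallback are all assumptions for illustration.

```python
from collections import defaultdict


class RendererPool:
    """Hypothetical tab-to-process assignment with a process cap."""

    def __init__(self, max_processes):
        self.max_processes = max_processes
        self.site_to_pid = {}
        self.tabs_per_pid = defaultdict(int)
        self.next_pid = 1

    def assign(self, site):
        if site in self.site_to_pid:
            # Simplification: every tab from the same site shares one process.
            pid = self.site_to_pid[site]
        elif len(self.tabs_per_pid) < self.max_processes:
            # Under the cap: give the new site its own process.
            pid = self.next_pid
            self.next_pid += 1
            self.site_to_pid[site] = pid
        else:
            # Cap reached: pack the site into the least-loaded process.
            pid = min(self.tabs_per_pid, key=self.tabs_per_pid.get)
            self.site_to_pid[site] = pid
        self.tabs_per_pid[pid] += 1
        return pid


pool = RendererPool(max_processes=2)
for site in ["a.example", "a.example", "b.example", "c.example"]:
    print(site, "-> process", pool.assign(site))
```

With a cap of 2, the third site lands in an already-used process, which is how a capped browser ends up with fewer processes than tabs.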

> Internet Explorer uses processes but may opt to pack more than one tab into a single process to reduce system load by too many processes.

What is it about having too many processes that causes extra system load?

Every process consumes additional kernel resources for things like memory mappings, data structures for parent/child relationships, etc. There is also likely to be per-process overhead for libraries that allocate and manage global data structures. For example, the HTML parser likely has some global data structures that can be reused between threads but don't get shared between processes.

> Every process consumes additional kernel resources for things like memory mappings, data structures for parent/child relationships, etc.

True, but we should be able to assume that that is negligible.

> There is also likely to be per-process overhead for libraries that allocate and manage global data structures. For example, the HTML parser likely has some global data structures that can be reused between threads but don't get shared between processes.

That's what somebody else said, too. I'm surprised that's enough to matter, but I guess it must be.

Windows internals are very different from Linux, and 32-bit DLL files are not .so files. 32-bit DLLs are typically relocatable (as opposed to .so files, which are position independent), which means they are linked for a preferred load address. If that address is not available at load time, the loader has to move the code to a different address and fix up the absolute addresses in it to compensate. Because of that, in practice, a library that had to be rebased often can't have its pages shared between processes in memory. The reason for this design is that position-independent code on 32-bit x86 requires indirect access through an offset table, which adds extra processing overhead. 64-bit Windows works closer to Linux-style .so files thanks to the instruction-pointer-relative addressing added in x86-64.
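What a rebase fix-up amounts to, as a toy sketch. The image is a flat byte buffer and the list of offsets holding absolute 32-bit addresses is hypothetical (roughly what a PE file's base relocation entries describe); this is not a PE parser. The key point is the copy: once a page is patched, it differs from the on-disk version and from other processes that loaded the DLL at its preferred base.

```python
import struct


def rebase_image(image: bytes, reloc_offsets, preferred_base: int,
                 actual_base: int) -> bytes:
    """Patch absolute 32-bit addresses for a load at actual_base."""
    delta = actual_base - preferred_base
    if delta == 0:
        # Loaded where the linker expected: nothing to patch, pages stay shareable.
        return image
    patched = bytearray(image)  # private copy: these bytes can no longer be shared
    for offset in reloc_offsets:
        (address,) = struct.unpack_from("<I", patched, offset)
        struct.pack_into("<I", patched, offset, (address + delta) & 0xFFFFFFFF)
    return bytes(patched)


# 12-byte "image" with one absolute pointer stored at offset 4.
image = bytes(4) + struct.pack("<I", 0x10001000) + bytes(4)
moved = rebase_image(image, [4], preferred_base=0x10000000, actual_base=0x20000000)
print(hex(struct.unpack_from("<I", moved, 4)[0]))
```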

Thanks for the info! Today I learned.

I don't see why we should be able to assume that the kernel's per-process data structures are negligible. Maybe they are, but you'd have to measure it, you can't just assume it.

I've done enough kernel work to feel that it's a safe assumption in general, but that there may be some exceptions.

For example, I wouldn't be completely shocked if somebody said, "We really need to support a particular version of an old OS that had unusually high per-process overhead in some particular corner case."

If anybody knows how much kernel memory a basic process needs in, say, modern-day Linux, please chime in. I tried looking it up, but didn't find it. Probably it's just sizeof(task_struct), which I can't be bothered to check right now, plus a few KB for stack stuff.

If you assume something like that is trivial, wait until you accidentally spawn 400,000 or more Python processes in a spawning loop, like I did (guess why I couldn't count them?). You're going to get a big shock.

It's a rendering engine and JS engine per process, I guess. Those tend to consume memory (as overhead apart from the actual stuff needed for the page), for example.

I'm surprised that's enough to make a meaningful difference.

Keep in mind that IE8 goes back to the Vista days, when there were still computers with just 512 MiB of memory (which might be fine with Windows 8, but Vista would chomp on it hungrily). That could be at least one reason.

Internet Explorer, since version 8, has one process per tab.

AFAIK, IE8 still had one process per Window, not per tab. It changed in IE9.

IE has tab processes and a UI process – the tab processes can handle more than one tab and do so on slower systems.

It's not very hard to crash a browser or browser tab. At a certain point the browser can't reasonably handle every situation, and it makes more sense to crash when something unexpected happens.

Here's an example I posted a while back of how to crash most major browsers with a link: http://blog.doteight.com/blog/2011/08/24/this-link-will-cras...

* Just checked, it only seems to work with Chrome these days. I guess the other browsers have figured that one out.

If I were Opera, I would quietly suck it up. This is more their own fault. Even if someone sends you 20M of NUL, you should not crash.

is it actually sending 2M characters over the wire (i.e. uncompressed)? you could imagine seeing something like this via a glitch in compression (2 million of anything, run-length encoded, doesn't take much space).

(i guess it's also possible that this really is 2M nulls at a lower level and compression just happens to save you from an embarrassing waste of bandwidth).

it's not clear to me what the URL is to check myself.
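Why "2 million of anything" is cheap once run-length encoded, as a minimal sketch. The function names are hypothetical, and HTTP compression doesn't actually use RLE: gzip's DEFLATE gets a similar effect for long runs through LZ77 back-references.

```python
from itertools import groupby


def rle_encode(data):
    """Collapse each run of identical bytes into (byte_value, run_length)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(data)]


def rle_decode(runs):
    return b"".join(bytes([value]) * length for value, length in runs)


payload = b"\x00" * 2_000_000
runs = rle_encode(payload)
print(runs)  # [(0, 2000000)]
```

A single (value, count) pair stands in for two megabytes, so a glitch that corrupts just the count could plausibly produce output like this.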

I was thinking the same thing. Is the content gzip'd on the server? This could be a nice (i.e. evil) way to send a very small response to the browser that would pretty much kill it.

a slightly more evil version would be to perhaps do:

  var x="...2 billion of the same character...";
This should compress very well with gzip, but will likely exhaust the browser's memory (??)

You are describing a zip/decompression bomb. Most zip and compression libs protect against them.
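Where the library doesn't protect you, a common defense is to cap how much output you are willing to inflate. A minimal sketch with Python's zlib; the function name and the 1,000,000-byte limit in the example are arbitrary choices, not something a browser is known to use.

```python
import gzip
import zlib


def bounded_gunzip(data: bytes, limit: int) -> bytes:
    """Decompress gzip data, refusing to produce more than `limit` bytes."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    # Ask for at most one byte more than allowed: getting it means "too big",
    # without ever inflating the whole payload into memory.
    out = decomp.decompress(data, limit + 1)
    if len(out) > limit or decomp.unconsumed_tail:
        raise ValueError(f"decompressed size exceeds {limit} bytes")
    out += decomp.flush()
    if len(out) > limit:
        raise ValueError(f"decompressed size exceeds {limit} bytes")
    return out


bomb = gzip.compress(b"\x00" * 2_000_000)  # about 2 KB compressed
try:
    bounded_gunzip(bomb, limit=1_000_000)
except ValueError as err:
    print("rejected:", err)
```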


Interesting, but the wikipedia article mentions that antivirus libraries protect against those. Couldn't find much on wikipedia or otherwise talking about browsers or zip libraries that have built-in protection for this...

This is very similar to the ping of death, which sends a ping packet too big for the receiver to handle.

Ping of death was +++ath0 which causes a dialup modem to hang up.

I have no idea why most modems did this. But by god it was a lot of fun on IRC when I was 14 or so. I still remember the hex string for +++ath0 you'd need to pass into 'ping -p', haha..
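Reconstructing that hex string, as a sketch. `ping -p` (in iputils and BSD ping) takes the payload's pad pattern as hex bytes, up to 16 of them. The trick worked because the target echoes the payload back, so its own modem sees the sequence in its outgoing data. Terminating the command with a carriage return is an assumption; the commenter's exact string isn't given.

```python
# Assumption: the command ends with a carriage return, as Hayes commands
# normally do; the exact string used back then isn't given in the thread.
command = "+++ATH0\r"
pattern = command.encode("ascii").hex()
print(pattern)  # 2b2b2b415448300d
print(len(command), "bytes; ping -p accepts up to 16")
```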

The "proper" Hayes AT drop-back-to-command-mode sequence was +++ followed by a long pause (1 second or so). Any characters sent during that pause time would invalidate the attempt to switch to command mode. So theoretically this shouldn't happen. However, lots of cheaper modems seemed to not check for the pause, so sending +++ATH0 would make them switch to command mode and immediately process a Hangup command ... not pretty :-).

Specifically, that pause was the part of the Hayes AT command set that was patented. Therefore lots of manufacturers who didn't want to license the patent, but wanted to have 'Hayes AT compatible' on the box, just didn't bother implementing the timeout-cancellation bit.
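A toy model of the difference between a compliant modem and one of the cheap ones, as a sketch. The function name, the (time, character) event format, and the 1-second guard time (the "second or so" mentioned above) are assumptions. It only checks the pause after the +++; the Hayes sequence as usually documented also expects a guard pause before it, which this sketch leaves out.

```python
def escape_detected(events, guard_s=1.0, enforce_guard=True):
    """Return True if the modem would drop into command mode.

    events: (time_in_seconds, char) pairs for the data the computer sends
    through the modem, in order. Silence after the last event is assumed.
    """
    for i in range(len(events) - 2):
        if all(ch == "+" for _, ch in events[i:i + 3]):
            if not enforce_guard:
                # Cheap modem: "+++" alone is enough.
                return True
            last_plus = events[i + 2][0]
            following = events[i + 3][0] if i + 3 < len(events) else None
            # Compliant modem: the escape only counts if the line then
            # stays quiet for the guard time.
            if following is None or following - last_plus >= guard_s:
                return True
    return False


# "+++ATH0" sent as one burst, 1 ms per character, e.g. inside a ping reply.
burst = [(i * 0.001, ch) for i, ch in enumerate("+++ATH0")]
print(escape_detected(burst))                       # False: no pause after "+++"
print(escape_detected(burst, enforce_guard=False))  # True: cheap modem hangs up
```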

Wow, I didn't know that, it explains a lot.

Indeed, there was a 'crash your browser' webpage a few months ago which used this technique, I think. Unfortunately I can't find it right now.

Oh, I do remember seeing that. Ahh, just fou

The easier version would be to just run a crashing script

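    // Each new element is "a" plus the comma-joined array so far, so the
    // string length roughly doubles per iteration until the engine runs out
    // of memory or hits its maximum string length.
    var i = 1, a = []; while(a[i++] = "a" + a);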

The author's choice of title somewhat implies malfeasance on Microsoft's part. In general, the article seems intended to spread FUD regarding Skydrive.

I would point out that if Opera is just now debugging their interaction with Skydrive, it is indicative of how few people use it. While I appreciate the motivations behind it and its rich feature set, I gave up on Opera for Firefox after a year. In part this was fatigue from experiencing broken websites, and in part it was because the plugins comparable to NoScript were low quality.

Opera would be great if it handled nonstandard html gracefully for me instead of telling me about its high horse and the shoddiness of the websites I use.

> Opera would be great if it handled nonstandard html gracefully for me

We put a huge amount of effort into handling "nonstandard" html, including fixing the standards so that they represent the reality of what browsers have to handle.

In this case our failure is clearly a bug that we should fix. I don't think anyone disputes that.

In general, site-compatibility is a hard problem. People often depend – intentionally or not – on the specific behaviour of particular browsers. When this causes sites to break in Opera we try to analyse the breakage and adopt some strategy to fix it. Hallvord is one of the people responsible for this and, if you read his blog you'll see that it's not a trivial undertaking; pinpointing an error in tens of thousands of lines of closure-compiled, minified JS makes for an interesting challenge, for example.

If we find the problem's a bug in Opera we obviously try to fix that bug. If it is the site doing something broken we typically try to contact the site owner to get it fixed. In either case we may also try using our browser.js system to make a short-term "sitepatch"; a hack using a combination of UA string manipulation, javascript and CSS that changes the behaviour for a specific site and which can be rapidly deployed. In this instance sitepatching likely isn't reasonable, so we will have to work to find the solution that will unbreak the site as quickly as practical.

I gave up on Opera because it was unusable on a small site that I use regularly. The site's owner licenses vBulletin and hires some local web guy to customize it (local being rural Pennsylvania).

I was the only person using Opera on the site. It wouldn't be a surprise if I was the only Opera user the web guy encountered all year.

IE, Firefox, Chrome all worked properly. Using Opera was turning me into a whiner as far as the site's owner was concerned. And he was justified, using Opera was just a luxury that was creating a PITA for him...on a site he was providing gratis.

I suspect his site was broken in terms of standards. But from my standpoint, it was Opera that was broken.

Don't misunderstand me, I chose Opera originally because I admired the project's goals. I still admire the feature set Opera provides. For me, its tradeoffs became increasingly impractical. I want my browser to deal with the problem of broken sites in real time, not contact the site's owner for a long-term solution.

To be clear, we aren't running some idealistic enterprise where we value standards compliance over real world usefulness. Indeed Opera has historically been more willing to implement IE quirks than other browsers, a policy that has recently come to cause problems because some sites punt us down a legacy-IE-only codepath that we don't work with, rather than a standards codepath that would work fine.

Our sitepatching is specifically designed to solve problems in real time rather than needing the site owner to make any changes, or for the user to upgrade their Opera.

Did you report a bug on Opera for the site that you were having difficulty with? Often half the battle is knowing that there is a problem, and having good steps to reproduce. With those things we likely could investigate whether we have a bug that needs to be fixed or whether the site is treating us differently to Gecko/WebKit for some reason, and come up with a solution.

Where's the FUD? The tone of the article is fairly strictly factual. The only deviation from facts is where he says that it's bizarre to encounter this, which is an opinion, but still completely reasonable. IMO it is bizarre.

The meaning of "FUD" slowly morphed to "saying bad things about software I like" a few years back.

Skydrive's main page worked fine in Opera last week, I assume an update caused this. I've made many stupid mistakes myself (sometimes very public ones - http://my.opera.com/hallvors/blog/2012/08/04/bumpy-road-to-o... ), so I have no intention of implying malfeasance or spreading FUD about Skydrive. It is however a surprising and amusing error, hopefully soon fixed.

> Opera would be great if it handled nonstandard html gracefully for me instead of telling me about its high horse and the shoddiness of the websites I use.

I find it hard to see how you think this is Opera's fault. MS sends Opera different, non-standards-compliant content that includes 2M of null characters. Yes, Opera can change its handling to cope with MS's disastrous abuse of the web, but you can't exactly blame Opera.

> if Opera is just now debugging their interaction with Skydrive

I'm sure MS won't amend it in any non-standards compliant way without notifying the other major browser makers and giving them a suitable implementation period; they'll surely keep all interactions well documented too. /s

> I gave up on Opera for Firefox after a year

And went to use what? IE? Or did you try Opera/FF just recently when Chrome became usable?

Chrome's been usable for something like 4 years now.

Not really. It wasn't really usable after the release (which was 4 years ago). Constantly crashing, featureless (which it still is, but now you have extensions). I tried it several times in the course of the first year after release and it was really (really) bad. But even if it was 3 years, it's still a short time for a browser to be around.

But as leddt pointed out, I misunderstood the comment I replied to.

> featureless (which it still is)

It can render web pages.

Obviously. And "feature" might not be the best term as I meant something like "can't configure it or do anything else than render a web page really".

So can wkhtmltopdf, but I wouldn't use it as a browser.

That's not an exhaustive list of features. It can also execute javascript, which seems to be lacking in wkhtmltopdf. That's a must-have for me.

I think he meant he gave up on Opera and went to use Firefox.

Right, I somehow misread it and understood he gave up on both. Thanks for pointing that out.

Exactly what I was thinking. My best guess is that their data is triggering a latent bug in their gzip compressor which causes it to output something that decompresses to 2MB of zeroes.

Just to save anyone else bothering to check, 2M of nulls gzips to almost exactly 2k. Easy to imagine with compression how you could miss that, although it is still a bit embarrassing.
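The check is easy to reproduce with Python's standard gzip module, as a sketch. DEFLATE's maximum compression ratio is roughly 1032:1, which is why about 2 MB of a single repeated byte lands near 2 KB.

```python
import gzip

payload = b"\x00" * 2_000_000        # 2 million NUL bytes
compressed = gzip.compress(payload)  # default compresslevel=9
print(len(payload), "->", len(compressed), "bytes")
assert gzip.decompress(compressed) == payload
```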

I think most HTTP these days uses transparent compression, but I can't find a source for that just now.

A friend of mine who uses SharePoint at work has mentioned its peculiar habit of occasionally inserting “millions of whitespace characters” in source files. Perhaps this is related.

A bit of noodling shows people having this (or something similar) in 2009.


195,000 blank lines? Oh you kidder, Microsoft! (https://padavis.wordpress.com/2009/05/07/sharepoint-designer...)

Maybe they are just trying to locate a faulty router that always sets certain bits to 1 above a certain MTU size.

Well played....

(or for those who missed and like epic networking related fun http://news.ycombinator.com/item?id=4709438 )

A post-mortem after this bug is fixed would be interesting, I don't see how 2MB of NULL characters should be different from a random <img>.

A random embedded image wouldn't be passed through the HTML parser (at least not if the web server has its MIME types set up correctly, and if it doesn't, the site would be considerably less usable).

It will be interesting to see the results though - to see if the null characters even reach that far or if this is effectively hitting Opera with a zip-bomb

this reminded me of `A file that's both an acceptable HTML page and a JPEG (view source on it)` : http://news.ycombinator.com/item?id=4209052

data: URLs?

Well yeah, I nearly added that disclaimer myself but then thought it was pretty obvious the previous poster was referring to referenced URIs rather than data URIs. Though I may have misinterpreted his post.

Does the mobile version of Skydrive do this too?

"I don't understand how I could have hit my wireless cap again?!?"

Somebody fell asleep on the keyboard right after the if(userAgent == Opera) :)

It shows up in Firefox too.

The most interesting part of the article is that the author uses Comic Sans as the font in his text editor.

Comic relief :)

2MB of null could be a bug somewhere with compression or decompression

Just to test this theory, I used Tamper Data in Firefox to remove the Accept-Encoding header, and I received a ~100 KB page back; with the Accept-Encoding header present I received a ~30 KB page, but the 30 KB page also had the 2 million null characters. I assume Microsoft has a bug in their gzip library, or possibly Firefox/Opera have a bug in theirs.
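One way to compare the two responses is to count NULs after undoing the transfer coding, so both variants are measured on the same footing. A sketch: the function name is hypothetical, fetching and saving the bodies is left out, and only gzip is handled (other codings such as deflate are not).

```python
import gzip
from typing import Optional


def count_nul_bytes(body: bytes, content_encoding: Optional[str]) -> int:
    """Count NUL bytes in an HTTP response body after undoing gzip coding."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        body = gzip.decompress(body)
    return body.count(b"\x00")


page = b"<html>" + b"\x00" * 2_000_000 + b"</html>"
print(count_nul_bytes(page, None))                   # served uncompressed
print(count_nul_bytes(gzip.compress(page), "gzip"))  # served with gzip
```

If both counts match, the NULs are in the page itself rather than being an artifact of the compression step.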

are you the original author? what URL are you using that shows the null characters?

just go here: https://skydrive.live.com/ and login. it's the first page after you login.

ok, thanks. so i had tried that. at least on chrome/linux, for me, it's not showing the nulls. at least, i can't see them in view source and saving to a file gives:

    > wc /tmp/skydrive.html 
    310   4068 128404 /tmp/skydrive.html
similarly with firefox.

Looks like it is fixed now for me too.

People still use Opera?

Is it the Skydrive website sending 2Meg of 0's? Or the Skydrive software itself? That's a big difference. It sounds like the website is sending 2Meg of 0's in a webpage. Webpages are usually compressed when sent across the wire, so the 2Meg of 0's is really nothing.

2M null characters is just a couple of kilobytes when zipped... this smells like a screwed-up server config...
