Hacker News
Remotely send Chrome and Node.js into infinite loops via OS X kernel bug (sandstorm.io)
334 points by kentonv on Apr 9, 2015 | 74 comments

Anybody remember this gem? http://en.wikipedia.org/wiki/WinNuke

I remember patching it and then scripting mIRC to seek out the handle of the person who sent it via the source IP.

I scripted mIRC to WinNuke everyone on join in all channels. Something like this:

    ON *:join:#: {
        ; $address is the joiner's user@host; token 2 split on "@" (ASCII 64) is the host
        /run c:\winnuke.exe $gettok($address,2,64)
    }

I think my favorite is the bug where you could delete any folder from a user's computer just by linking them to a site with a few lines of code inserted in its source. I think it was on XP, and it would open Windows Help at the same time.

Yeah, you'd trigger a very specific help:// URL that happened to accept the path to a folder as a query parameter. That folder was created by the help page, which would put some temporary stuff in there and, when the user closed the help window, remove the folder and its contents (because you have to clean up after yourself!). The obvious thought is... "what if I point it to a folder that already exists?"


It was definitely there before XP, at least since 2002.

XP was RTMed on August 24, 2001 and GA was October 25, 2001.

But I can understand why you wrote what you wrote. For many, the early XP experience remains a repressed memory because it was quite buggy (whether the OS itself or the drivers delivered with it; BSODs were the norm), far from the stable Windows 2000 SP4, which I kept using for a long time. XP only got usable around SP2 (August 25, 2004).

AFAIK it was fixed in XP SP1 and after user backlash they also released a patch for RTM.

In high school, we used it (and variations) on each other all the time during CS lessons, so teachers ended up moving us to DOS.

The first thing I did when I saw the mention of OOBD was Ctrl+F for "nuke" and was disappointed to not see it :-(

The last statement reminds me of Linus saying:

> we do not break user space

If an application is utilizing a bug, it is not a bug but a feature

Apple's general approach is "if it isn't documented, we can break it", a bit different to Linux.

I think Apple's general approach is: "We can break it," documented or not.

Seems like they change stuff all the time. Especially on iOS.

Can you provide examples of non-deprecated APIs that Apple broke?

The recent Swift 1.2 release brought a whole load of code breaking changes. None of Swift 1.0 was explicitly deprecated.

That's a bit different, they explicitly said Swift is a work in progress and code compatibility is not guaranteed between versions...

After 1.0, API changes really should increment the major version.

Unless you say "We're probably going to break things." when you announce it, which they did.

Perhaps they should have chosen a more appropriate version number then.

Apple doesn't claim to do semver

Regardless, that is what developers have come to expect. Not that semver is adopted wholesale, but at least don't break the API without incrementing the major version.

I agree. I think there was never a major iOS release that didn't break things in my app. Really basic stuff like a text input for example.

I actually think it's much the same, judging from the quote from Linus in the parent.

Yeah I was kind of wondering what his take on something like this would be.

On the one hand, you're changing an API, which is a promise to userspace. On the other hand, no one was using it right, leading to most networking apps being vulnerable to a DoS.

Given the proven risk and how little the feature is used, changing the API to match what people THINK it does seems sane.

I have a vague memory of Linus making a similar 'change it' decision once.

Also, according to this article it was either undocumented or extremely poorly documented, depending on which docs you look at. So maybe they're removing a feature that a lot of developers either didn't know about or didn't know they COULD utilise.

If no examples could be found of someone relying on the behavior, then a change would be okay. Otherwise the solution would be to implement a modified API.

Alternatively glibc could change it, as glibc almost seems to want to break existing programs.

There was an issue when people were using memcpy with overlapping addresses (ahem Flash), but I guess that was on stdlibc


I think this attitude is perfect for Linux, where many developers will not expend effort to maintain their software.

On the other hand, developers for OS X and iOS are often willing to expend enormous amounts of effort to keep their apps working.

Interesting, but it raises my eyebrows a bit because it sounds too black and white. Do you have time to explain more, or do you have some data to back up such a claim?

I think the conclusion warrants repeating:

"The moral of the story? Confusing APIs are a security problem. If many users of your API get it wrong in a way that introduces a security bug, that’s a bug in your API, not their code."

> Linux has epoll. BSD has kqueue. Windows has... well, about five different mechanisms that cover differing subsets of usecases and you can only choose one.

Aren't I/O completion ports fairly analogous to kqueue?

AFAIK (and I'm not a Windows person, so feel free to correct me) I/O completion ports are analogous to something like Linux's kernel AIO (async I/O). The model is to start an I/O operation on an fd (providing a buffer) and then be informed of its completion (with the data already copied into your process space), rather than the traditional Unix polling model of being informed when there is data waiting on an fd and then doing a (non-blocking) read on it. If you're used to writing traditional Unixy event-based network daemons, this seems pretty backwards.
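For anyone who hasn't written one, the readiness model described above looks roughly like this. It is a minimal sketch assuming a Unix-like OS: Python's standard `selectors` module picks epoll on Linux or kqueue on BSD/macOS, a local `socketpair()` stands in for a real network connection, and `readiness_demo` is just an illustrative name. With a completion-based API such as IOCP you would instead hand the kernel the buffer up front and be told once it had been filled.

```python
import selectors
import socket


def readiness_demo() -> bytes:
    """Readiness (epoll/kqueue-style) model: wait until the fd is readable,
    then perform a non-blocking read ourselves."""
    sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS
    a, b = socket.socketpair()          # stand-in for a network connection
    b.setblocking(False)
    sel.register(b, selectors.EVENT_READ)
    a.sendall(b"hello")

    received = b""
    try:
        # Step 1: the kernel reports that data is *waiting*; nothing is copied yet.
        for key, _events in sel.select(timeout=1.0):
            # Step 2: we copy the data out with a read that must not block.
            received += key.fileobj.recv(4096)
    finally:
        sel.unregister(b)
        sel.close()
        a.close()
        b.close()
    return received


print(readiness_demo())
```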

One advantage of the AIO model is that it at least theoretically allows "non-blocking" I/O on a normal disk file, whereas none of the polling-based systems I am aware of work with regular files. However, at least on Linux, kernel AIO is notoriously fidgety (it may still block the process on certain kinds of page cache misses; the apparent solution is to use O_DIRECT to bypass the page cache, which has its own bag of problems), and most people fall back to using blocking reads with thread pools, or something like libeio or libuv that handles all that crap for you. (There's also the POSIX AIO that's built into glibc, but that's done entirely in userspace [presumably with thread pools] and uses signal-based completion notification, which is kind of gross and probably pretty slow if you have a ton of I/O events happening.)
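The "blocking reads on a thread pool" fallback can be sketched like this. Assumptions: Python's `concurrent.futures.ThreadPoolExecutor` plays the role of the pool (libuv does something similar internally for its filesystem operations), and the function names are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path: str) -> bytes:
    # An ordinary blocking read. It may stall on a page-cache miss, which is
    # why it runs on a worker thread rather than on the event-loop thread.
    with open(path, "rb") as f:
        return f.read()


def read_files_in_pool(paths, workers: int = 4):
    """Offload blocking regular-file reads to a thread pool and collect the
    results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(read_file, p) for p in paths]
        # An event loop would keep servicing sockets here and pick up each
        # result when its future completes; this sketch just waits.
        return [fut.result() for fut in futures]


def pool_demo():
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(3):
            p = os.path.join(d, f"file{i}.txt")
            with open(p, "wb") as f:
                f.write(f"contents {i}".encode())
            paths.append(p)
        return read_files_in_pool(paths)


print(pool_demo())
```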

I thought all events were handled through libuv.

Only if the only event type you care about is stream I/O. If you also need to handle GUI events or wait on mutex handles or whatnot, you get to use MsgWaitForMultipleObjectsEx, which is as ugly as it sounds.

You said that this was in the context of async I/O.

UI events are I/O...

You are not talking about UI events in the blog post. You are saying that there isn't a unified Windows API similar to kqueue or epoll that can be used for async file/stream I/O, which I disagree with. AFAIK you can't use epoll to do UI events either.

X events are delivered via a socket. You can absolutely epoll that, and people in fact do so, and this is considered an essential feature.

One of the best features of all of the epoll/poll/select functions is that they let you listen on arbitrary file descriptors uniformly. You can send events over a unix domain socket or a pipe, and just treat it as if it were a regular file. It's especially useful with stuff like signalfd or timerfd, where you can listen on sockets, handle signals, and have a timeout, with barely any additional code.
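A small sketch of that uniformity, assuming a Unix-like OS and Python's `selectors` module: a pipe and a socket are watched by the same selector and reported by the same wait call. signalfd and timerfd are Linux-specific and not used here; a plain timeout stands in for the timer. The function name is illustrative.

```python
import os
import selectors
import socket


def wait_on_mixed_fds(timeout: float = 0.5):
    """One selector watches a pipe and a socket the same way; a timeout
    stands in for the timer (timerfd would play that role on Linux)."""
    sel = selectors.DefaultSelector()
    r, w = os.pipe()
    s1, s2 = socket.socketpair()
    sel.register(r, selectors.EVENT_READ, data="pipe")
    sel.register(s2, selectors.EVENT_READ, data="socket")

    os.write(w, b"p")
    s1.sendall(b"s")
    ready = set()
    try:
        # Both descriptors come back from the same wait call...
        for key, _events in sel.select(timeout=timeout):
            ready.add(key.data)
        # ...and once both are drained, the same call simply times out.
        os.read(r, 1)
        s2.recv(1)
        timed_out = sel.select(timeout=timeout) == []
    finally:
        sel.close()
        os.close(r)
        os.close(w)
        s1.close()
        s2.close()
    return ready, timed_out


print(wait_on_mixed_fds())
```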

This is unrelated to the topic at hand.

I've made a (hopefully) complete list of methods in this answer: http://stackoverflow.com/questions/11830839/when-using-iocp-...

As an FYI, the Chromium bug has now been unrestricted for public viewing - https://code.google.com/p/chromium/issues/detail?id=437642

I wonder if OOB data is ever going to be fixed to allow more than one byte.

I can't quite comprehend how the implementation got so screwed up in the first place. A two byte field in every header, supposed to be a pointer to designate a range of data, OBVIOUSLY when it's set it should transmit exactly one byte of information.
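The one-byte behaviour is easy to observe from userspace. A minimal sketch using the BSD sockets API through Python, assuming a Linux or BSD-derived TCP stack with default settings (SO_OOBINLINE off) over loopback; `urgent_demo` is a hypothetical name. All three bytes go out in a single MSG_OOB send, yet only the last one is treated as urgent.

```python
import socket


def urgent_demo():
    """Send three bytes with MSG_OOB over loopback TCP and see which of
    them the receiver treats as out-of-band."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        # MSG_OOB sets URG and points the urgent pointer at the *last*
        # byte of this send; the earlier bytes are ordinary stream data.
        client.send(b"abc", socket.MSG_OOB)
        # A normal read stops at the urgent mark and does not return the
        # urgent byte (SO_OOBINLINE is off by default).
        inline = server.recv(16)
        # The single urgent byte has to be fetched separately.
        oob = server.recv(1, socket.MSG_OOB)
    finally:
        for s in (client, server, listener):
            s.close()
    return inline, oob


print(urgent_demo())
```

On such a stack this should return `(b"ab", b"c")`: only `c` comes back through MSG_OOB, while `a` and `b` are delivered as ordinary stream data.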

Unlikely. Almost nothing uses OOB data (and it's unclear what it's even useful for!), so there's no demand to fix it.

More likely, I suspect, would be a gradual move to deprecate OOB data. It's a poorly designed wart on TCP.

I thought that SSH used it to pass ^C up faster than the rest of the stream so that you can quickly kill programs that flood the terminal. I've never verified this though.

The article says Telnet works that way, it seems reasonable to think SSH might use it for the same purpose.

But the ^C should be encrypted. If SSH sets the URG flag on a packet containing an encrypted ^C, it would be leaking plain-text - even for just one byte :)

Not just encrypted, authenticated: it would be bad if a MITM could send ^Cs out to fuck with you at any time.

I don't know how it actually works, but I can't see how this would be needed in that situation. The ^C signal goes from client to server, while the program is flooding in the opposite direction. So the flood of data shouldn't delay the ^C transmission.

The real problem is not with programs flooding the terminal, but rather that if the foreground program isn't accepting input (perfectly normal for many programs) then by design the socket buffer will fill up, TCP flow control will kick in, and the client machine will stop even sending the bytes to the server. In order for a ctrl+C to get around the backed up buffer, it has to bypass the regular queue.
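The backed-up buffer is easy to reproduce. A minimal sketch, assuming loopback TCP and Python's socket module (the function name and buffer sizes are arbitrary choices for the demo): the receiving end accepts the connection but never reads, so eventually a non-blocking sender gets `BlockingIOError` (EAGAIN), and anything written after that point, a ^C included, would have to wait behind the queued bytes.

```python
import socket


def fill_until_blocked(chunk: int = 4096) -> int:
    """Write to a TCP connection whose peer never reads, until the kernel
    refuses more data; returns how many bytes were accepted."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Small buffers so the demo hits the limit quickly; the accepted
    # socket inherits the listener's receive buffer size.
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 8192)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 8192)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()     # accepted, but never read from
    sender.setblocking(False)
    total = 0
    try:
        while True:
            try:
                total += sender.send(b"x" * chunk)
            except BlockingIOError:
                # The send buffer is full because the peer isn't draining
                # the connection; new data can only queue behind it.
                break
    finally:
        for s in (sender, receiver, listener):
            s.close()
    return total


print(fill_until_blocked())
```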

SSH uses the URG flag, IIRC.

The TCP URG flag and the OOB data described here are the same thing.

While I agree it is likely to be deprecated (or already is), I think being able to signal that side data which was transmitted later should be processed earlier has value in situations where processing delay is more significant than transmission delay. You could do this with a single stream, but the receiver would have to read ahead and check for flags etc. Had early implementations got this consistently usable, it might see more use.
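For what it's worth, the BSD sockets API already spares the receiver from reading ahead: pending urgent data is reported as an "exceptional condition" by select() (POLLPRI with poll()), so the receiver can fetch the urgent byte before draining the stream. A minimal sketch, assuming loopback TCP on a Linux or BSD-derived stack; `urgent_first` is an illustrative name, and, as noted upthread, only a single byte can be flagged this way.

```python
import select
import socket


def urgent_first(timeout: float = 2.0):
    """Receiver that checks select()'s exceptional-condition set, where
    pending TCP urgent data is reported, and handles it before the stream."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    server, _ = listener.accept()
    try:
        client.sendall(b"bulk data")          # ordinary, queued stream data
        client.send(b"!", socket.MSG_OOB)     # sent later, but flagged urgent
        _, _, exceptional = select.select([], [], [server], timeout)
        # Pick up the urgent byte first, then read the normal stream.
        urgent = server.recv(1, socket.MSG_OOB) if exceptional else None
        normal = server.recv(64)
    finally:
        for s in (client, server, listener):
            s.close()
    return urgent, normal


print(urgent_first())
```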

I use OOB to force an early ACK to speed up communication with a request-reply style protocol.

It's necessary to share file handles between processes using TCP over UNIX Domain Sockets. I'm sure some other mechanism could be devised for that though.

Minor nit: Unix domain sockets (AF_UNIX) can send open file descriptors between processes, using the msg_control field of sendmsg's struct msghdr ("ancillary data"), but that's not the same as TCP's OOB data; in fact TCP is not involved at all in Unix sockets, even when used as SOCK_STREAM. sendmsg() even has a MSG_OOB flag that triggers TCP's OOB mechanism, and (as far as I can tell) it doesn't use the msg_control field at all, since the same option is available for plain send() and sendto(); the OOB data should be sent through the normal msg_iov field.

Some places where msg_control is used for TCP/IP sockets (at least in Linux) include kernel timestamping of messages. This is "out of band" data, but it is out of band to the kernel, not the network peer.
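Since this trips people up, here is roughly what descriptor passing over a Unix domain socket looks like. It is a minimal sketch modelled on the SCM_RIGHTS example in Python's `socket` documentation: the descriptors ride in the ancillary data (`msg_control`) of sendmsg()/recvmsg(), and no TCP or MSG_OOB is involved. Both ends live in one process only to keep it self-contained; in practice the receiver would be a different process. `pass_fd_demo` is a hypothetical name.

```python
import array
import os
import socket


def send_fds(sock: socket.socket, msg: bytes, fds) -> int:
    # The descriptors travel as SCM_RIGHTS ancillary data (msg_control),
    # not in the byte stream.
    return sock.sendmsg([msg], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                                 array.array("i", fds))])


def recv_fds(sock: socket.socket, msglen: int, maxfds: int):
    fds = array.array("i")
    msg, ancdata, _flags, _addr = sock.recvmsg(
        msglen, socket.CMSG_SPACE(maxfds * fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial int in the control data.
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    return msg, list(fds)


def pass_fd_demo() -> bytes:
    parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    try:
        send_fds(parent, b"fd", [w])             # hand over the pipe's write end
        _msg, (received_w,) = recv_fds(child, 16, 1)
        os.write(received_w, b"via passed fd")   # new descriptor, same pipe
        os.close(received_w)
        return os.read(r, 64)
    finally:
        os.close(r)
        os.close(w)
        parent.close()
        child.close()


print(pass_fd_demo())
```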

It's pretty confusing.

Weird. I guess I built a mental model that felt like it made sense. Turns out computers are crazier than common sense. Thanks for the detailed reply :)

Well, there's no reason to fix it unless someone has a compelling use case. And there's never going to be a compelling use case built around it if it's not fixed.

And regarding the fix, which do you propose, change the spec or change the majority of implementations?

It may offend the sensibilities but we're stuck with the current state of affairs for the life of TCP I think.

I was a bit confused about which layer this OOB byte would be sent at; I think this article cleared it up for me [0]. TCP has a feature called TCP Urgent Data, which can be enabled by flipping a bit in the TCP header. BSD maps this to the OOB feature as described.

This means that if your Node.js app was behind an HTTP proxy that's not vulnerable, like nginx+passenger, you would've been safe. A plain TCP pass-through would still leave you vulnerable, though.

All theory, of course, since no one is really running a Node.js server on OS X... right?

[0] http://www.serverframework.com/asynchronousevents/2011/10/ou...

It's a bigger problem for Chrome. Imagine if you ran an ad on a major ad network that loaded a banner from your server that sent a byte as TCP urgent. It would lock up any Mac that visited a significant fraction of the Internet.

(Incidentally, a bug like this was what got me to switch from Netscape to Internet Explorer back in 2000. Doubleclick ran an ad that included some Javascript which locked up Netscape and hung the browser entirely, blocking roughly 40% of the Internet for me. At that point I was like "This is ridiculous, fix your damn software Netscape" and switched.)

I'm kind of disappointed that this bug doesn't have branding. :-)

"oob-killer?" "oob-o'-death?"

The other two brands were both clever and played on what was being exploited. (Heartbleed is bleeding data through heartbeat, for example.)

How about "reURGitate"? It plays on URG, and regurgitate is something you can do forever.

They seem to have only fixed this on Yosemite.

I can reproduce the bug on Mavericks and Chrome 41.0.2272.118 after the 2014-004 update for Mavericks.

If true that strikes me as kind of shitty. I know Apple doesn't care much for maintaining old versions and backwards compatibility but Mavericks isn't all that old.

Ehhh. Maybe. Keep in mind that this bug is probably one of the least serious of the dozens of bugs patched today, as there's very little damage you can really do with it. Hardly anyone runs OS X servers, so you're not going to take down any major web sites with this. You can crash someone's browser, but that's actually already pretty easy to do with some JavaScript that eats a bunch of resources. Chrome doesn't even consider DoS to be a security issue because there's just nothing they'll ever be able to do about it anyhow.

If the bug in any way threatened data integrity or confidentiality, then yeah, they should backport it. But for a DoS, I can see the case for not really caring.

FWIW, for many of the bugs patched today, Apple did in fact backport to Mavericks and even Mountain Lion, so it seems like they haven't completely abandoned old versions.

> You can crash someone's browser, but that's actually already pretty easy to do with some javascript that eats a bunch of resources. Chrome doesn't even consider DoS to be a security issue because there's just nothing they'll ever be able to do about it anyhow

They could implement resource limits and stop consuming more resources once those limits are reached.

It would be challenging to tell the difference between a legitimate but resource-hungry site and a malicious one. In the former case, the user may in fact want the site to page everything else out to get the job done.

In any case, DoS attacks exist yet don't seem to be a big problem in practice, probably because there's not much in it for the attacker.

Judging from the many tabs I have open, it seems that you don't need an exploit to eat resources in Chrome! That thing eats my battery and loves CPU feasts without any prompting.

(Disabling Flash helped)

Yosemite is a free upgrade though, so I think that's okay.

As biglain pointed out, there are people that can't upgrade. Someone who bought a Mac in 2008 is stuck with a release that doesn't get security updates backported. When you're an engineer it's easy to say "Well, that's a seven year old computer anyway. Just buy a new one." There are however people that either can't for financial reasons or see no point when it still works very well for them.

I know companies can't support old software forever, though. I also know Apple has their hardware and software coupled fairly tightly, so it may not be as simple as basing it on resource constraints the way Windows does. But client security is more important than ever, and last I checked Apple won't commit to publishing support timelines, which doesn't strike me as very fair if you're trying to make an informed decision when purchasing something. It would be nice if they were able to strike a balance between how they do things and Microsoft's noble but certainly expensive and painful commitment to backwards compatibility.

All that said, I have multiple Macs and I love them.

Not if your hardware is "too old"...

(My single-core Mac Mini works fine as a media/file server, but can't upgrade past 10.6.something...)

I wonder if this is related to a wacky clipboard bug I hit today while VMWare Fusion was open. Basically I copied & pasted a file within a VM, then copied some text from Chrome on OSX and Chrome froze instantly.

I think this would apply to a few common ircds running on affected versions of OS X too.

Such a twisted forewarning about the importance of net neutrality...
