Hacker News
I think I found a Mac kernel bug? (jvns.ca)
349 points by ingve 9 months ago | 117 comments

Back in Leopard days, I was playing with the x86-64 ABI on Mac OS X (there was no real documentation whatsoever, or at least none that I could find, apart from the source code). Very soon I accidentally ran into a kernel panic that could be reduced to a three-instruction program:

mov rax, 1

mov rdi, 1


and it would bring down the entire OS (kernel panic) when run as any user-mode application under a standard non-root user. It took a few releases for them to fix that. Since then, I have stopped trusting them from a low-level reliability perspective.

Fuzzing their system call interface may not have been a bad idea.

P.S. XNU is interesting as it lets userspace use Mach syscalls as well as a bunch of BSD system calls directly. The esoteric interactions between them are probably not very well thought out. (I hope I am not offending Avie Tevanian here :))
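To make the "Mach syscalls as well as BSD system calls" point a bit more concrete: as far as I understand XNU's x86-64 convention (from reading osfmk/mach/i386/syscall_sw.h), the value in rax carries a syscall class in its top byte and the call number in the low 24 bits, so BSD calls are 0x2000000 + n and Mach traps are 0x1000000 + n. The Python sketch below only does that bit arithmetic; the constants, the class names, and the helpers `construct` and `decode` are my own assumptions for illustration, and it makes no claim about what caused the panic above.

```python
# Sketch of how XNU (as I read osfmk/mach/i386/syscall_sw.h) packs an x86-64
# syscall number into rax: class in bits 24-31, call number in bits 0-23.
# The constants and class names below are assumptions, not a verified API.
SYSCALL_CLASS_SHIFT = 24
SYSCALL_CLASS_MASK = 0xFF << SYSCALL_CLASS_SHIFT
SYSCALL_NUMBER_MASK = (1 << SYSCALL_CLASS_SHIFT) - 1  # low 24 bits

CLASS_NAMES = {
    0: "NONE",  # no class
    1: "MACH",  # Mach traps
    2: "UNIX",  # BSD system calls
    3: "MDEP",  # machine-dependent calls
    4: "DIAG",  # diagnostics
}


def construct(cls: int, number: int) -> int:
    """Build the value a program would load into rax for (class, number)."""
    return (cls << SYSCALL_CLASS_SHIFT) | (number & SYSCALL_NUMBER_MASK)


def decode(rax: int) -> tuple[str, int]:
    """Split an rax value into (class name, call number)."""
    cls = (rax & SYSCALL_CLASS_MASK) >> SYSCALL_CLASS_SHIFT
    return CLASS_NAMES.get(cls, f"UNKNOWN({cls})"), rax & SYSCALL_NUMBER_MASK


if __name__ == "__main__":
    print(hex(construct(2, 1)))  # BSD exit() is number 1: prints 0x2000001
    print(decode(1))             # the "mov rax, 1" value: prints ('NONE', 1)
```

Under this reading, a bare `mov rax, 1` selects class 0 rather than the BSD `exit()` call, which is one reason hand-written syscalls on macOS look different from their Linux counterparts.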

I immediately thought of the htop bug and then she references that in the article. How is this not fixed yet? Like, this is a security bug, right? You can DoS from a simple usermode app with this bug.

Nothing official from Apple on this yet?

I don’t think the attack vector (presumably asking for admin credentials to install a startup item) would be any different than an app that wanted to fork bomb, allocate too much memory, or spin wait on all cores. In that way I don’t think it’s any more security critical than any other bug that hangs the system.

It’s definitely something that should be fixed of course.

@cliffordheath on twitter says [0]

> This will be a kernel data structure protected by a mutex or semaphore. task_for_pid waits at pri>=0 for a wakeup that won't happen because race. ps queues behind it at pri<0 (disabling ^C). At least two bugs there.

Seems like a reasonable explanation as to the underlying cause of the behaviour.

[0] https://twitter.com/cliffordheath/status/957505667568353280
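The "wakeup that won't happen because race" in the quoted tweet describes a lost wakeup: the notification fires before the waiter starts waiting, and the waiter then sleeps forever. The Python sketch below is only an analogy (XNU uses its own kernel wait/wakeup primitives, not Python's `threading`, and nothing here is taken from the kernel). The hypothetical `naive_wait` loses an early notification, while the hypothetical `Flag` records the event as state and re-checks it under the lock, so the same ordering is harmless.

```python
import threading
from typing import Optional


def naive_wait(cond: threading.Condition, timeout: float) -> bool:
    """Wait on a bare notification; returns False if the timeout expired.

    A notify that happened before this call is simply gone, which is the
    lost-wakeup race: without a timeout, this waiter would sleep forever.
    """
    with cond:
        return cond.wait(timeout)


class Flag:
    """A one-shot 'done' flag that cannot lose its wakeup.

    set() records the event as state under the lock before notifying, and
    wait() re-checks that state under the same lock, so an early set() is seen.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._done = False

    def set(self) -> None:
        with self._cond:
            self._done = True        # record the state change first...
            self._cond.notify_all()  # ...then wake anyone already waiting

    def wait(self, timeout: Optional[float] = None) -> bool:
        with self._cond:
            # wait_for checks the predicate before sleeping and after each wakeup
            return self._cond.wait_for(lambda: self._done, timeout)


if __name__ == "__main__":
    cond = threading.Condition()
    with cond:
        cond.notify_all()                 # notifier runs first: nobody is waiting yet
    print(naive_wait(cond, timeout=0.1))  # False: the wakeup was lost

    flag = Flag()
    flag.set()                            # same ordering: set before wait
    print(flag.wait(timeout=0.1))         # True: the state survived
```

The fix in the sketch is the standard one: the waiter never relies on catching the notification itself, only on a predicate protected by the same lock as the notifier.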

Very true; she acknowledges it in the article as well.

Where? The only obvious link to github links to a tracking rbspy issue.

Closer to the bottom she has a "This appears to be affecting htop" section ... she might have edited that in later since you read it?

She does? Where?

Article has been edited - wasn’t there originally.

I'm still not seeing it. Could you quote the relevant part?

I had to add ?33333 to the end of the URL, and reload, to bust the browser cache.

Might not just be the browser cache -- I use Cloudflare on my blog and have a script that uses the Cloudflare API to clear the CDN cache when I update the site, but the Cloudflare API doesn't appear to work 100% reliably and I'm not sure why. sorry about that!

Hey Julia - feel free to email me (msilverlock @ cloudflare) so I can look into this. If you can share the API calls you’re making it’d be useful too!

Thank you; I thought I was going crazy. Ctrl-F: htop

Ah! That was confusing.

Wanted to point out the same. It’s time for Apple to fix something here. I'm not sure I like the idea of working with timeouts in code just to prevent the bug from happening.

Timers in general are a code smell to me. It’s one thing if you’re waiting for something to complete, but having arbitrary sleeps in your program is undesirable IMO. It makes reasoning about your program more difficult.
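The distinction drawn above, between waiting for something to complete and sleeping for an arbitrary interval, can be sketched in Python. `run_with_deadline` is a hypothetical helper, not rbspy's or htop's actual workaround: it returns as soon as the work signals completion and only falls back to the timeout if the work hangs.

```python
import threading
import time
from typing import Callable


def run_with_deadline(work: Callable[[], None], timeout: float) -> bool:
    """Run `work` in a worker thread; wait for it to finish or for `timeout`.

    Unlike a fixed time.sleep(), this returns as soon as the work signals
    completion; the timeout only bounds the worst case. Returns True if the
    work finished (or raised) in time, False if the deadline passed first.
    """
    done = threading.Event()

    def wrapper() -> None:
        try:
            work()
        finally:
            done.set()  # signal completion even if work() raises

    # daemon=True so an abandoned, hung worker does not block interpreter exit
    threading.Thread(target=wrapper, daemon=True).start()
    return done.wait(timeout)


if __name__ == "__main__":
    print(run_with_deadline(lambda: None, timeout=5.0))            # True, almost immediately
    print(run_with_deadline(lambda: time.sleep(10), timeout=0.2))  # False after ~0.2 s
```

Note that a timeout does not cancel the stuck call. If the work is blocked in the kernel (as `task_for_pid` apparently is in this bug), the worker thread stays blocked and is merely abandoned.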

It seems Apple just can't get a break these days. Has anyone checked if this bug is exclusive to High Sierra?

The htop report (in TFA and another comment) seems High Sierra-specific, as in users report the issue starting right when they upgraded to High Sierra. Running the C snippet from TFA on my system (a 2010 MBP running El Cap'), I can run through 15000 iterations without any freeze, whereas according to TFA "sometimes it needs to try 10 times before it’ll freeze."

Think you make your own breaks.

Htop also had segfaults after a while in OpenBSD 6.2 i386, when I used OpenBSD exclusively for a couple of weeks. It could also be present in other BSD-like kernels.

Htop segfaulting on OpenBSD doesn't show that the OpenBSD kernel has the same bug, it shows that htop has some (unrelated) memory management issues (OpenBSD is pretty good at exposing those).

At first I remembered this and wondered if it was related:

"The rules for using Objective-C between fork() and exec() have changed in macOS 10.13. Incorrect code that happened to work most of the time in the past may now fail. Some workarounds are available."


But seems like no, actually, at least not obviously.

Question from someone who knows very little about the Mach/XNU APIs: Does this code leak Mach ports? If you call task_for_pid, you get back a Mach task port. Do you have to close the port with mach_port_deallocate? Could a resource leak be contributing to the system freeze?

Additionally, valgrind is incompatible with Maverick.

Which is a huge pain, because it means that if I ever need to use it, I have to debug on a server.

You mean Sierra / High Sierra. This is because of compatibility-breaking changes to some low-level kernel system calls. Since valgrind is essentially a CPU emulator, it is tightly integrated with the OS kernel, and has to be updated accordingly. The macOS contributors to valgrind seem to be relatively few, probably because most macOS developers primarily use the various sanitisers in clang (they also have UI integration in Xcode).

Have you tried the clang or gcc asan/tsan/ubsan sanitisers as a replacement? There are pros and cons of valgrind vs compile-time instrumentation. The sanitisers increase the memory footprint, but run with less overhead. valgrind can detect some errors that the sanitizers cannot, etc.

I haven't, mostly just because it happens so rarely and I just want a quick fix, but some of my colleagues have started building the sanitizers into their production process. I probably should just for the sake of good practice.

Are you sure? IIRC valgrind supports everything up to macOS Sierra.

Oh, my mistake. It's "High Sierra" on which it can't work. I can't keep track of the OS names.

Local kernel-level DoS sadly isn't uncommon on the Mac. I remember finding some trivial ways to panic the kernel using ptrace() back in 2006 or so.

What version of macOS are you running with?

I'm the initial bug reporter. This machine is on 10.13.3.

I don't even use a mac but I've heard terrible things about High Sierra, is it really that bad?

Seems like they pushed a bunch of substantial changes to the kernel.

It's not bad, nothing's a show-stopper, there are just several little annoyances and embarrassing security issues.

My example is that there's an empty blank line at the bottom of about half of my terminal sessions. I haven't looked into it, and I assume it'll disappear in a future point release.

I waited for the dust to settle before upgrading, then upgraded one machine after another, and I didn't run into anything problematic; it has been a smooth upgrade.

I've waited a bit after the "root" security issue, and now I prefer to be on the latest version.

I haven't had an issue with it. Other people have.

You might want to run it again in 10.13.4 beta :)

My choice is to avoid OS beta versions ^_^

on twitter she noted it was 10.13.x

I found the style of the article to be quite refreshing somehow. The OP is not trying to look like a smartass about the discovery (a trait very common in the IT industry), and she acknowledges that she doesn't really understand what the underlying cause is. She is just happy that she discovered something and is keen on sharing it with the world.

Yeap! Julia Evans is generally awesome.

You should read the rest of her writing, her work is always a treat to read.

I wish there was more of this. Exploring tech should be a delight.

I've been writing a devlog on implementing a Lua VM in handwritten WebAssembly (https://www.patreon.com/serprex). It pieces together the commit log with a stream-of-consciousness aspect.

Very much admits an 'I have no idea what I'm doing' experience

Looks like quite an interesting project, good luck.

On the one hand yes they're quite enthusiastic, and written with a clear love of learning.

On the other hand, I find that if every other sentence ends with an exclamation mark or a question mark, it gets quite exhausting.

I think High Sierra is the worst Mac OS release yet.

I’m sure there’s a cognitive bias partially to blame (since it’s the most recent) but it looks like we are way past that.

As long as we are talking about anecdotal evidence: my MBP with High Sierra has been running fine for the last few months. I haven't encountered any issues in my day-to-day work as an iOS app developer, nor in my home use. I think it's a pretty decent release, though it didn't add any new features that I feel I really need.

Same here. My only reboots have been for system & security updates. No kernel panics. Stable as a rock.

Are you using anecdote to mean 'not factual/confirmed' or simply to mean 'single data point'? I understand that it's common to switch between the two meanings. I can usually guess from the context, but it's unclear from your comment.

In any case, I would say that even a single data point of a kernel crashing bug is cause for concern.

Actually, my mid-2014 13" rMBP randomly reboots on High Sierra, even on a clean install. It leaves no trace of anything in any logs. I had to downgrade to Sierra.

My 2017 MBP hard locked once shortly after upgrading to High Sierra but it hasn't done it since.

I hit this bug when using htop

10.0 was pretty bad - as in unusably bad. Like everyone I really miss snow leopard.

"Snow leopard" was 10.6, was it not? 10.6.8 was the best OS X ever. I wish I could have stuck with it forever, but that machine eventually died and the replacement wouldn't boot under anything older than 10.9. Someday I'm sure I'll be forced to "upgrade" again, but in the meantime I'm leaving things well enough alone.

Yes Snow Leopard is 10.6. In retrospect, 10.6.8 was the high point for MacOS X.

> 10.0 was pretty bad - as in unusably bad

That's why they gave everyone free CDs with 10.1.

The Mach kernels in general are buggy, regardless of the release. For example, I'm pretty sure they end up delivering SIGPIPE to the wrong thread in some circumstances on Sierra. There is also a problem with recvmsg sometimes not returning control messages.
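A common defensive pattern for the SIGPIPE complaint, sketched here in Python for POSIX systems (my suggestion, not something from the thread, and not a fix for the kernel behaviour itself): ignore SIGPIPE for the whole process, so it no longer matters which thread the signal would be delivered to, and handle the EPIPE error returned by the failing write in the thread that made the call. `write_or_epipe` is a hypothetical helper. On macOS, per-socket alternatives such as the `SO_NOSIGPIPE` socket option also exist; the sketch uses a pipe to stay portable.

```python
import errno
import os
import signal


def write_or_epipe(fd: int, data: bytes) -> int:
    """Write to fd; return bytes written, or -errno.EPIPE if the reader is gone.

    With SIGPIPE ignored, a write to a pipe or socket whose peer has closed
    fails with EPIPE in the calling thread (Python raises BrokenPipeError)
    instead of killing the process via a signal.
    """
    try:
        return os.write(fd, data)
    except BrokenPipeError:
        return -errno.EPIPE


if __name__ == "__main__":
    # Ignore SIGPIPE process-wide (CPython already does this at startup; it is
    # explicit here). Once ignored, it no longer matters which thread the
    # kernel would have picked to receive it.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)

    r, w = os.pipe()
    os.close(r)                                         # no reader left
    print(write_or_epipe(w, b"hello") == -errno.EPIPE)  # True
    os.close(w)
```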

Another reason they killed OS X Server maybe.

They killed server because that was a losing battle: Apple was never going to "win" at servers. Despite all their effort they managed to score only a few modest wins, and the intense competition in that space made it a huge distraction.

Remember Apple started down the "server" road decades ago with their A/UX UNIX server (https://en.wikipedia.org/wiki/A/UX). Throughout the 1990s you could get a high-end Mac spec'd as a UNIX machine if you wanted, though I've only ever seen a handful in the wild.

The introduction of a special-purpose server was an anomaly, the Xserve had no specific predecessor, perhaps some kind of "go big or go home" effort on Apple's part.

While that system wasn't bad by the standards of the time, it couldn't compete with the likes of Dell, HP and others who offered way more flexibility on configuration and who would seemingly sell super budget low-end servers at a loss to lock people into their product line so they can squeeze them on support costs, a model Apple's allergic to.

Still, it did fill a market niche: servers for Apple shops.

Now there are only Mac minis for it.

Even then they realized it was a losing battle. Apple is an Apple shop and even they couldn't use their own gear at scale. I'm sure when the Xserve team saw racks and racks and racks of generic kit in Apple's own datacentres they realized they weren't going to win.

You're right that the Mini is a very capable workgroup server, and providers like http://macminicolo.net/ are hosting Mac "servers" by the thousands, so it's not like they've completely given up. Intel's the only other player in this space with their NUC machines and those tend to cost as much or more.

Hopefully they'll kit out the Mini better in the 2018 iterations. That 2012 quad-core i7 variant was an exceptional unit.

To be fair, the whole story about signals and threads on UNIX OSes is not that clean.

On the other hand, the Darwin kernel also powers iOS, and Apple seems to be eager to keep the system secure so that the App Store appears trustworthy. While a local privilege escalation might seem dull on macOS, the fact that the same flaw might affect iOS where untrustworthy local apps need to be sandboxed is probably the saving grace for Macs.

Qualitatively, it feels like apple’s software quality has been on a slide for several years.

What attracted me to move to Mac OS in the first place some 15 years ago was the sheer quality. It was thrilling to use a computer that Just Worked, with no BSODs or the endless dependency hell that was Linux at the time.

It doesn’t feel like that any more, across either OSX or iOS - it feels fragile. Things crash, behaviours are inconsistent, and it feels like more emphasis has been placed on immediate commerciality than long term retention through quality.

For what it’s worth, I’m in the process of moving to Linux on my MBP. The pros of OSX just aren’t as strong any more.

Thank goodness that Linux is consistent and has no bugs :-)

Ubuntu still doesn't let me change my IP address to a static one via the UI, and video drivers still crash or result in hours of googling and reading contradictory articles on how to change xorg.conf

Software has bugs. Operating systems are hard. Wherever you go, you will most likely deal with similar issues, whether it be Linux, Windows or macOS.

The difference is you pay a huge premium for Apple products, Linux distros are generally free (unless you want enterprise-style support).

The same criticism applies to Windows - give it away for free and I might care less about crashes and the huge slide in quality over the last ~5 years.

You’re not paying for the OS; you’re paying for the hardware. If Macintosh hardware was buggy when running Linux or Windows, for reasons to do with, say, badly-written ACPI tables, then you’d have an argument.

As it is, it’s the opposite: Macs run both Windows and Linux “easily”, while other vendors’ buggy hardware has to be patched over with heuristics in these OSes because it has non-conforming device names, responses to introspection queries, etc. Apple hardware is uniquely well-engineered. If you’re writing your own hobbyist OS, it’s a breath of fresh air to run it on a Mac, compared to other kinds of PCs.

This is, coincidentally, also the reason it’s so hard to run macOS on other vendors’ machines. Windows and Linux paper over all the brokenness in hardware land, and so machines built for Windows/Linux rely on these heuristics with sloppy integration work. Apple, meanwhile, just does the integration to the standard in their hardware, and then relies on said standards-conformance in their OS. If everyone conformed to hardware ABI standards as well as Apple does, Hackintoshing would be a matter of just dropping a patched dont-steal-osx.kext into your install disc and calling it good. Everything else you have to do is the hardware vendors’ fault.

You will not find significant fundamental hardware differences between the latest high-end Apple and Dell laptops.

> As it is, it’s the opposite: Macs run both Windows and Linux “easily”, while other vendors’ buggy hardware has to be patched over with heuristics in these OSes because it has non-conforming device names

> This is, coincidentally, also the reason it’s so hard to run macOS on other vendors’ machines

Are you not aware that OS X does various explicit checks to make sure it is running on approved hardware, regardless of compatibility and capability attestation? Apple doesn't want OS X running on non-Apple hardware.

> various explicit checks to make sure it is running on approved hardware

Er, yes, that's the "dont-steal-osx.kext" that I mentioned. My point was that thwarting those checks is a very small part of getting macOS to run on a system (and is a solved problem—if it was all that was required, Hackintoshing would be a one-and-done thing, rather than something that breaks on every system update.)

The majority of the (continuing) effort of getting macOS to run on arbitrary hardware—hardware that, by its components, should be compatible with macOS's drivers—is dealing with vendors' whack-ass "doesn't even pass static analysis using Intel's own provided AML compiler" DSDTs (which macOS rightfully tosses its hands up at, but which Windows and Linux heuristically munge into something barely passable and then use it.)

> latest high-end Apple and Dell laptops

Dell (along with HP and Lenovo) are the better vendors as far as spec-compliance goes. Really, any of the PC makers who have an "enterprise workstation" arm, have the in-house expertise for things like ACPI compliance, or UEFI compliance, or PXE compliance, etc. But other vendors? Acer? LG? Xiaomi? Razer? Better to not even try.

I do miss the trackpad from the Mac; the newer ones are gigantic. It's the only part of my budget laptop that is laughably bad.

I'm a bit skeptical about your claim that apple hardware is well engineered. I think they pick aesthetics over function, to the point that functionality is compromised - leading to computers that overheat.

The result is they have a lot of problems with tin whiskers. Which is why you'll find about a billion forum threads about people baking their macs in ovens, and similar hijinks.

Granted, if you're not using a mac for anything that intensively uses the hardware (for instance, just text editing) you'll never run into this problem.

I'm happy to pay a premium to never again have to deal with xorg.conf. All platforms come with a cost.

Yeah, except 15 years ago it was xf86config with xinerama if you wanted multiple heads, and lcds are a walk in the park compared to CRTs - I actually killed a nice 21” monitor by screwing up the frequencies. I even recall the delightful process of having to modify graphics drivers to work with the bizarre Chips&Technologies integrated graphics I had. At that point, OSX was such a breath of fresh air.

Xorg.conf is a delightful stroll through a meadow compared to the nightmare fuel that came before, and in setting up Linux (Kali, Ubuntu) on several machines of various specs over the last few weeks, it has Just Worked.

FWIW, X configures itself nowadays and Wayland also just works. But I know where you are coming from. Everything audio related for example is still in very bad shape when you want more than Intel HDA.

Canonical is not of the same opinion; they just gave up on Wayland for the next LTS release.

Wayland may be ready for default in Ubuntu and Fedora, but they clearly didn't consider it ready for LTS. They have already said the next non-LTS version will still have Wayland as default.

Now try hotplugging an external display.

My Ubuntu 16.04 LTS X-Windows just froze today multiple times, not even changing to virtual terminals was possible, only hard reboot.

I spent hours last night trying to get Bumblebee to work with my Optimus laptop. Ubuntu is a great plug-and-play distribution, except when it isn’t.

Optimus is a piece of shit even on windows.

Facing that right now :-(

Optimus is one of those ideas that’s cool on paper, but horribly implemented.

Re Ubuntu and docs, this is something where I find CentOS valuable: it moves slowly but is very stable. I’ve never seen network issues on CentOS on the CLI or GUI. Red Hat also puts out great, comprehensive docs so you don’t have to dredge the forums.

Yes, all software has bugs. And pretty much all modern operating systems are good. But some are better than others in much the same way as all Olympic 100m runners are fast but some are faster than others.

>Software has bugs. Operating systems are hard.

Sure, but it doesn't help when you are rewriting fundamental parts of the stack like the window server, for instance.

And I'm very glad that they did, the move to Metal shows they are committed to the Mac and not standing still. Just don't treat us as involuntary beta testers.

Except their window system rewrite actually slowed everything down. Every single os release has added bugs to the graphics layer. I've been maintaining a simple little screensaver since Mac os beta (almost 20 years old now) and the last 3 or 4 releases there's always something that breaks it. Bug reports go unheeded.

May I ask which screen saver?

> Ubuntu still doesn't let me change my IP address to a static one via the UI

What do you mean? It doesn't get applied?

That is right. I fill in the fields, then submit the change and nothing happens and it jumps back to DHCP. Always have to just edit the network settings manually.

Sorry, but Ubuntu and Gnome NetworkManager issues do not equate to issues with Linux. They are very distro and window manager specific.

Regular users see a full distro experience as Linux, they don't dissociate kernel and userland.

Not sure why your comment is grayed out. This is very much true, and part of the reason I run CentOS when I want infrastructure to work 24/7/365, but eg Fedora for testing new software.

I found OS X to be incredibly buggy at release. I think 10.2 was fairly stable, but I had a lot of issues before that.

Agreed. 10.2 was the first OS X release that felt usable day-to-day without running into too many bugs, annoyingly missing features, or frustrating UI decisions.

10.4 felt like the first version that was a pleasure to use. It felt like the polish had finally caught up with OS 9.

IMO, 10.6 was still the best release. Wish they would do another “no new features” release one of these years.

I've edited 'he' to 'she' in the two otherwise fine comments that made this mistake (https://news.ycombinator.com/item?id=16251566 and https://news.ycombinator.com/item?id=16251562) and grouped several empty replies and one lame off-topic subthread under this one. It's rare that we do something like this (and I've emailed the author), but it seems fairer than to penalize their original posts, which were otherwise informative and on topic.



It's a she.

The gender of the author is one of the least interesting things about this blog post, yet it's brought up in (at the time of writing this) three separate comments here.

Comments that just say "It's a she" or "she" are basically spam and add nothing to the conversation unless the gender of the author is really that important. Since the blog post doesn't mention anything gender specific, I think it's safe to assume these comments are just spam.

No doubt, but by getting further into it like this, you blew it up 100x and produced by far the least interesting thing about this thread.

The internet is replete with opportunities for getting triggered and starting flamewars. For HN not to sink into a deeper circle of hell, we all need to resist these temptations. So could you and everyone else please not take HN threads on generic, divisive tangents in the future?


I apologize for how this blew up, I did not anticipate that. I was trying to channel the guidelines similar to "it never does any good, and it makes boring reading", hoping HN could focus on the substance of the article, rather than the politics of whether or not it's okay to use "he" as a generic gender term.

It does get extremely boring to read, on every article written by or about a woman, someone correcting every comment that uses "he" as a generic antecedent, but in the future I will just ignore it instead of inciting a potential flamewar.

It's true, and a bit weird, that this happens even when one's sincere intention is just the opposite. It takes a bit of forethought to realize this in advance and avoid it. That's the skill we're most hoping to see become more widespread here.

Look, I don't particularly want this to degrade into a stupid pointless flame war, but getting this sort of thing right does matter.

Given the relative gender disparity in the technology field, it's easy to make the assumption that any given developer is a man – we all do it sometimes! It's useful to have that pointed out when we're wrong, so that we can be reminded that we sometimes make incorrect assumptions, and eventually it hopefully won't happen as much.

None of the comments along those lines have been needlessly faux-offended, or attacking anyone – just pointing out a mistake that it would be good to rectify. It's really not harming anyone :)

I am not sure if you have ever been on the other side of this, but having people constantly assume you are a man because you know how to program and you're on the internet gets very old, very quickly. You can either accept being constantly referred to as a guy, which becomes flat-out alienating, or you can speak up for yourself, in which case you get responses like the one you just wrote.

I get that the gender of the author doesn't matter to you, and it may not even matter to the author of the article, but it does matter to a lot of people. No one is suggesting mistakes are a grave sin—even if her blog does say "Julia Evans" in giant letters right at the top. If someone makes one though, the correct response to people pointing it out is to accept the correction and keep it in mind for next time, not to accuse them of spam.

Ironically, we're not supposed to assume gender based on someone's name either.

You're perfectly allowed to use "they" since you weren't sure.

It's not nice to assume "he" to always be the default - that is what was being corrected. One did not have to know that "she" was correct, but it was easy to find out that "he" was probably wrong. Let's keep discussion on topic, rather than venturing into talking about what you're "allowed" to do.

The parent comment did not bring up their gender. They simply corrected the poster who erroneously thought they were a man. This was an error, and we should hold ourselves to a standard that corrects mistakes. If anyone "brought up their gender" it was the poster that used masculine pronouns as a default for some reason.

Referring to her with the wrong gender is no different than using the wrong name of an author. She is a woman. Anyone who glanced at the blog would see a giant, honking "JULIA" on top of the blog.

Getting the obvious gender right is a matter of courtesy.

The parent has a point. She knows gender doesn’t matter. And the author of the article wouldn’t mind either.


This comment made me laugh. There's a bunch of charitable explanations as to why it was written, most notably "honest mistake".

But there's also very uncharitable explanations! For instance, I'm sure I'm not the only one who's noticed that hard technical IT blogs written by women are disproportionately often written by women who were born men (disproportionately relative to the general population of transgender women). Just like the one starting this thread might have assumed that "it's deep tech, so it must be a guy", I can imagine that stevenh assumed that "it's deep tech, so she must be trans". This is beautifully similar to the sentiment of his comment in general, to not under any circumstance let anyone get away with getting gender stuff wrong.

So it looks like even the people who have a strong opinion about being hard on each other about gender stuff, can do it wrong. Maybe the lesson is that we should be slightly less hard on each other about it?

I propose we politely point out mistakes and assume the best intentions (even though of course sometimes that assumption will be wrong).

> I propose we politely point out mistakes and assume the best intentions (even though of course sometimes that assumption will be wrong).

It's possible to politely point out peoples' mistakes while giving them no room to purposely misgender, which is exactly what this thread does. Let's examine the course of this conversation:

"He said..."

"It's a she" (In my opinion a polite response. This is where any discussion should have ended.)

"Why does that matter?"

"You can't let people get away with misgendering a trans person"

If you only respond to the final comment, then your point makes sense. But the act of politely pointing out the mistake already happened - now we must justify why it matters.

I only responded to the final comment. Sorry, I thought that was clear.

I agree with you for the rest. I did think stevenh's comment had a needlessly militant vibe to it, which is what I responded to. The rest of the thread lacks that vibe, which is good.

Gotcha, no hard feelings.

She does.

I keep forgetting that the great coders like Avie Tevanian are long gone from Apple.
