Meltdown patch reduces mkfile(8) throughput to less than 1/3 on OS X (metaobject.com)
253 points by mpweiher on Jan 13, 2018 | hide | past | web | favorite | 57 comments

I have a suspicion that APFS is partly to blame. When I saw benchmarks of APFS vs. HFS+ [0, 1], it appeared that Apple had managed to transition to a filesystem that is markedly slower at reads and writes than the 20 year old one it was replacing.

[0] https://malcont.net/wp-content/uploads/2017/07/apfs_hfsplus_...

[1] https://malcont.net/wp-content/uploads/2017/07/apfs_hfsplus_...

As I wrote in the article, I already accounted for this. The raw combined performance degradation was 4x; APFS itself seems to account for around 20% (4 ÷ 1.25 ≈ 3.2, hence "less than 1/3"). Now that is somewhat rough, as the APFS degradation may itself be non-linear. However, others have measured the pure APFS costs and they are roughly what I saw.


Could you please disable that annoying behavior where scrolling slightly left or right swaps to a different article? It’s hard to imagine anyone actually uses that, and it is extremely annoying.

My favorite wrinkle of that "feature" (on mobile) is when there's an embedded video in a post that's wider than the viewport. Swiping very slightly left or right scrolls the page, but a tiny bit moar and it switches articles. (ノಠ益ಠ)ノ彡┻━┻

That's a problem with the Blogger platform, it might not be possible to disable that behaviour without switching to something else entirely.

Have you ever heard of the expression "How do you know someone disables javascript on the internet through an add-on? They'll tell you"?

No? Well, anyway, one draconian way to fix this is installing something like uMatrix or NoScript. This particular website works fine without JS.

Likewise you can just turn on reader mode or whatever the mobile browsers call it. Should pull just the text out.

I wonder if we couldn't just make an "open in reader mode" add-on

The web is better as a whole without js

It would suck if you had to refresh the page here on HN for every upvote.

Kind of a cool hack: I believe HN used to record upvotes through a CSS rule that (when clicked) caused an empty image to load. No JS required.

Nah, we'd just have check boxes and a submit at the bottom of the page and we'd call it UX.

Upvoting adds nothing but the capacity for groupthink. See what voting has done to the level of discourse on reddit. The larger the community, the worse the problem.

If we couldn't vote on comments asynchronously (or, IMHO, at all), nothing of value would be lost.

Hmm...I can't even reproduce it. Now on Safari, I do have a JavaScript blocker on, but I also tried on Chrome and still nothing.

Probably a mobile-only behavior.

That’s weird because the article states it works better on spinning media but it is only offered automatically on SSD.

Yet it's also a transition to a filesystem with actual integrity checks and many features which improve performance for, e.g., Time Machine.

Even if that is the case, you also have to account for situations where APFS requires far fewer writes (such as, apparently, file duplication).

In a typical aggregate workflow it’s entirely possible that it’s still a better design.

This is an utterly pathological test case. mkfile makes a lot of syscalls which do very little work. Heck you could have made the batch size 1 byte; that would really show a slowdown!
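The mismatch can be sketched like this (my own illustration with arbitrary sizes, not the article's actual numbers): the same amount of data, written in tiny batches vs. large ones, issues wildly different syscall counts, and each syscall now carries a fixed KPTI toll.

```python
import os
import tempfile

# Write the same 16 MiB twice: once with 512-byte write() calls and
# once with 1 MiB calls. The work done per byte is identical; only the
# number of kernel crossings differs.
total = 16 * 1024 * 1024

def fill(path, bufsize):
    buf = b"\0" * bufsize
    calls = 0
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
    try:
        for _ in range(total // bufsize):
            os.write(fd, buf)  # one syscall per iteration
            calls += 1
    finally:
        os.close(fd)
    return calls

d = tempfile.mkdtemp()
small = fill(os.path.join(d, "a"), 512)
large = fill(os.path.join(d, "b"), 1024 * 1024)
print(small, large)  # 32768 vs. 16 write() syscalls for the same data
```

A post-Meltdown kernel pays the stub-kernel/%cr3 overhead once per crossing, so the 512-byte run pays it roughly 2000x as often.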

Can someone who is more familiar with Meltdown speculate on this? Is this a quick fix and is it likely to be optimised over time and become less slow? Or is this the raw trade-off and current hardware will always suffer slow-downs to this degree. Can an OS be rearchitected in ways that would mitigate the performance loss?

The "fix" Intel pushed out this week is a microcode update that in my experience doesn't fix or address Meltdown at all. The update does however make Spectre slightly less reliable, so I'm going to assume that the microcode update has something to do with fixing, updating, or adding new controls to the branch predictor buffer.

So absent a microcode update that outright fixes Meltdown, there will always be some level of slow-down for vulnerable devices. System calls now jump from user mode code to a stub kernel in "supervisor memory". The stub kernel then does a full context switch (touching the %cr3 paging register and wiping a good portion of the TLB), and once the real kernel finishes, it does a full context switch back to the stub kernel. It's all terribly inefficient, and realistically the performance impact will not be negligible. It should also be noted that this "work-around" doesn't fix the processor; it just ensures that there's nothing juicy in the supervisor memory.

You may have to learn to live with this for a while. Even if it takes Intel a month to design and validate a fix for Meltdown, prototype and mass production turn around times mean that no customer will have a processor that isn't vulnerable to Meltdown until April-June 2019.

> Can an OS be rearchitected in ways that would mitigate the performance loss?

The performance loss comes from extra overhead on syscalls, so it could be sidestepped by allowing programs to do more work per syscall.

At its simplest that could mean adding more syscalls that perform the same operation over an arbitrarily long list of inputs (like linux's sendmmsg) but I would like to see kernels take inspiration from modern graphics APIs that allow arbitrarily long lists of arbitrary operations to be batched and executed with a single syscall. GPUs had this stuff figured out years ago.
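The simple end of that spectrum already exists for file I/O too: vectored writes. A minimal sketch (my own example, not from the thread) using Python's `os.writev`, which submits a whole list of buffers with a single syscall, the same idea as Linux's sendmmsg for sockets:

```python
import os
import tempfile

# 100 small buffers that would naively cost 100 write() syscalls.
bufs = [b"block-%03d\n" % i for i in range(100)]

fd, path = tempfile.mkstemp()
try:
    # Batched version: the kernel walks the iovec list internally,
    # so we cross the user/kernel boundary exactly once.
    written = os.writev(fd, bufs)
finally:
    os.close(fd)
os.unlink(path)

print(written)  # total bytes written by the single syscall
```

A fully general mechanism, where the batch can mix arbitrary operations the way a GPU command buffer does, would go well beyond this, but the per-crossing saving is the same.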

I agree with your basic point - a generic syscall batching mechanism would be a good thing - but it's worth noting that GPU workloads are usually much better suited for ginormous "fire and forget" command/command+data sequences. syscalls tend to be more interactive/round-trip-dependent, where syscall N+1 is at least partially determined by a userspace process doing something fairly complicated with the result of syscall N. This is a natural enemy of massive batching.

Note that Red Hat patented batching syscalls... https://www.google.com/patents/US9038075

1) this is exciting, if not also brilliant

2) SQL transactions as prior art? If they’re successful with their patent, I sure hope there’s a way for BSDs to try it on.

RedHat has a patent promise that they will not enforce their patents against any free software that makes use of their patents. https://www.redhat.com/en/about/patent-promise

It's a hardware issue; IMHO it'll only be reversed once we get structurally re-engineered CPUs (i.e. a new generation, which takes a lot of time to get through the whole pipeline from design to actual manufacturing). Everything Intel manufactures in 2018, and possibly even 2019, will likely still have those issues.

Just as a reference point, I didn't see any measurable slowdown on mkfile (New-Item) throughput on Windows 10 after applying the patch. I suspect this may be more of a filesystem issue.

I also tried this:

    $f = New-Object System.IO.FileStream 'c:\temp\test.dat', 'Create', 'ReadWrite'
with no speed difference between a patched and unpatched system.

So I'm not a Windows expert but I've done file system work. If windows is reasonably smart it will support files with unallocated blocks. Setting the length to 8GB is not the same as writing 8GB in any reasonable file system.

I'd retry with a script that actually writes the data.
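The distinction is easy to see from user space. A minimal sketch (my own example, assuming a filesystem with sparse-file support, and a much smaller size than the article's 8 GB test):

```python
import os
import tempfile

size = 8 * 1024 * 1024  # 8 MiB stand-in for the 8 GB test file

d = tempfile.mkdtemp()
sparse = os.path.join(d, "sparse.dat")
dense = os.path.join(d, "dense.dat")

# "Set the length": one truncate; no data blocks need to be allocated.
with open(sparse, "wb") as f:
    f.truncate(size)

# Actually write the data: every block hits the filesystem.
with open(dense, "wb") as f:
    f.write(b"\0" * size)

# st_blocks counts 512-byte units actually allocated on disk; the
# sparse file reports the full st_size but almost no blocks.
print(os.stat(sparse).st_size, os.stat(sparse).st_blocks)
print(os.stat(dense).st_size, os.stat(dense).st_blocks)
```

Both files claim the same length, but only the second one did the write work being benchmarked.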

NTFS and APFS support sparse files, HFS+ does not.

As stated elsewhere, this may be because of architectural choices in MacOS that are affected by Meltdown. The file system drivers run in their own process.

The headline is completely bogus. The author switched both the filesystem from HFS+ to APFS and the Meltdown patch was applied. This effect cannot be attributed to just one of them without further testing.

I applied the patch on both my systems (an iMac 2013 Core i7 on Sierra under HFS+ that was upgraded to High Sierra for the occasion, and a MacBook Pro 2017 already on High Sierra under APFS), and I can confirm that both my systems are seriously and objectively slowed down, 4 days after the patch now.

From what I could observe, applications taking the most serious hit are electron based (like VSCode, that went from smooth as silk to mildly sluggish) and Safari.

Docker on mac has also taken a hit, although I couldn't quantify it objectively.

Interesting. I had thought about game load times. Most use large agglomerations and shouldn't be too affected. Some games do load small loose files and would be slowed significantly.

I suppose the same applies to desktop applications. Most use a large executable with maybe a few dlls. Electron and Python applications that ship as source or bytecode will be much more affected.

Ok so I'm not completely insane, everything felt like molasses this week and that's why.

You did read the bit where I accounted for APFS, right?

And also that the raw, combined difference was 4x, not 3x?

As I read it, the article assumes that the 20% performance penalty measured when writing large blocks is almost entirely due to APFS performance. This seems reasonable, since the number of syscalls will be low.

However, it is then also assumed that this 20% performance penalty between HFS+ and APFS is going to be flat no matter which block size is used. I don't think this is a reasonable assumption.

You are comparing the maximum write data rate of HFS+ and APFS with the write rate of mkfile with small blocks. I do not think you can assume that the write rate in the filesystem stays the same if you decrease the block size. The expensive operation might be the block allocation that is easy for larger blocks, but for small blocks it might try to squeeze them in somewhere to avoid fragmentation.

Yes, syscalls are expensive and most likely became very expensive with the Meltdown patch. Even more so on OS X, where filesystem drivers are running in their own process.

I am not arguing against performance drops due to either the Meltdown patch or APFS, but I do not think your data shows the conclusion you are using in the headline.

> Yes, syscalls are expensive and most likely became very expensive with the Meltdown patch

> The headline is completely bogus.

One of your statements is therefore bogus.

You didn't read the headline.

Or the article.

Or the referenced (linked) article.

Seriously: a rebuttal of what you wrote is easy, but amounts to rephrasing the article.

Was there something that prevented testing HFS+ vs. HFS+?


> As part of the upgrade process, the macOS High Sierra installer will automatically convert an SSD to the new APFS


I wonder if the inefficiency of the meltdown patches will incentivize cloud providers to lower the price of large instances that have a high CPU core count (relative to small instances with low core count).

At the moment, the price of instances goes up linearly relative to CPU core count, maybe in part because the performance overhead of virtualizing a single 32-core machine into 16 2-core machines had been minimal. But now that the performance overhead of virtualization is higher (due to isolating the CPU cores being more expensive), maybe it's more efficient to lease entire CPUs (with all of their cores) without the patch (and associated overheads).

If performance is better, why would they charge less instead of more? Why apply the discount to the more desirable product?

My initial guess would be to align pricing with expected performance, which has now degraded, right? I don’t expect it to happen, but I can see customers like myself being unhappy with paying—just for example—for an 8-core VPS whose performance now matches what previously was 4-core performance. So, I don’t think anyone would expect them to charge less for a higher core count than lower core count, but adjusting prices downward to match performance wouldn’t be upsetting.

Yes exactly. To clarify my point; this vulnerability only affects you if you're sharing a physical machine/CPU with other users (isolated by a virtualization layer)... So if you choose not to share your physical machines/CPUs with other users then you are not exposed to that vulnerability and ideally you also shouldn't need to get these patches and their associated performance overheads.

Virtualization already has some known minor performance overheads but now these patches will add even more overheads.

Can cloud providers keep pretending that 4x 2-core virtual instances are as performant as a single physical 8-core instance?

I find it really unhelpful that these 'threats' are being hyped with little reference to any sort of threat model, reasonable attack vector, or probability. I do not want the kernel on my Arch laptop patched and slowed down to mitigate issues that do not reasonably exist in the context of a laptop user.

By all means patch my browser that runs random js. I'm not in the habit of downloading and running untrusted binaries - those that do have far greater problems than Meltdown or spectre already.

Of course I want it patched on AWS, and my bank's backend machines, but you are now forcing me to continually pay, hour after hour, for insurance against a threat that I'm happy to not worry about.

Do not steal my cycles to pay for 'security' I do not require.

Not sure why you're downvoted. It's a perfectly reasonable trade off.

I'll be searching for ways to disable most of these mitigations too after the dust settles. I rarely run untrusted code, and for that I can probably find a way to run it securely depending on the perceived threat. A lot of what I do on my main computer is syscall heavy, and I don't like 25-50% performance hit.

You could switch to TempleOS. :-)

Meltdown is a much smaller (but not zero [1]) security risk on TempleOS than it is on Windows or Unix and Unix-like systems.

[1] At first one might think there is no risk, because TempleOS runs everything at ring 0 in a single address space, so anything that might be exposed via Meltdown is already wide open. That would be true in the case of running untrusted binaries. Where Meltdown would still affect TempleOS is in the case of trusted binaries being exploited via techniques such as return-oriented programming. Meltdown could make more "gadgets" available in the binary, increasing the chances that someone could make it read something it otherwise would not have read.

What can one actually gain access to with such an attack? And what if you have a firewall that only allows specific programs/services through? It would have to use the browser, but how, through an add-on that the user would have to install? Seems pretty complicated for a random machine, more likely to target government/company computers... I don't see why everyone is forced to install the updates.

I have to wonder if there is a sound business decision in all of this to force upgrades on users.

I'm perfectly content with a several year old laptop for what I do, but I realize that the performance hit from this patch is going to force my hand on an upgrade.

Let's face facts, Intel hasn't seen a lot of real gains for real world usage in a few years; this change adds bloat to software where it simply wasn't being generated before.

Note: I'm not suggesting the issue was intentional, but this unintended consequence has an upside for someone; the question is who.

It's probably a trivial modification to disable KPTI for root owned processes.

Or signed binaries that don’t execute user provided code, although there is still the potential to turn a local buffer overflow into a VM breakout... don’t think VPS providers will take that risk

Is this fix the KAISER fix that is mentioned in the Meltdown paper? That sounded like it removed the kernel from the process's address space; where did it go?

Such a blatantly non-real-world and synthetic benchmark.

One of the reasons Node developers hate Windows is because of how stupidly slow it is to open lots of files... This will teach them

