Time protection: the missing OS abstraction (acolyer.org)
168 points by walterbell on Apr 15, 2019 | 51 comments

This post lifts out the key parts of the paper [1] and is a good summary. I think the paper is an accessible read as well.

There was not much discussion two weeks ago when the paper was posted on HN [2], so I will raise a point that I've made before [3][4][5] and that is consistent with the recommendations of the paper (and another post in this thread [6]): this is an opportunity to improve the terminology, mental models, and formalisms of observable state, and its implications for information hiding, privilege separation, and computer system design.

This conversation needs not only to occur among (e.g.) computer chip designers and cryptography experts, but also among higher-stack users of that technology, so that the information leakage aspects and trade-offs can be analyzed together with other performance indicators of the system.

It seems as if the haphazard, ad hoc way that chipmakers and system architects dealt with this issue has contributed to an environment where Spectre could occur. Such timing attacks were never a secret, but resistance to them at various levels of mainstream computing appears to have been fitted in as a patchwork of hasty fixes and well-meaning but informal caution. The conversation around this topic could use an upgrade, and the paper's authors agree.

[1] https://ts.data61.csiro.au/publications/csiroabstracts/Ge_YC...
[2] https://news.ycombinator.com/item?id=19547293
[3] https://news.ycombinator.com/item?id=17308014
[4] https://news.ycombinator.com/item?id=16165942
[5] https://news.ycombinator.com/item?id=19644997
[6] https://news.ycombinator.com/item?id=19670296

You might even say there's room for a startup to reinvent computers (and operating systems) from the ground up.

Would you?

Not until the people who are building existing systems become (financially / legally) liable for their mistakes. Homogeneity is fragile, but it's also extremely cheap, which is how we got here.

Do you think that people in the software field need something like the AMA to protect us so that we can have liability?

With about a hundred billion dollars of funding and a ten years to market schedule, maybe?

4 more years till that 2013 plan reaches the 10 year threshold.

I don't know if I'd look at it with that lens. Clearly, knowledge and analyses of this topic are resident in academia and have been long present at various levels of industry, but other factors like performance and power consumption have received generous affordances by hardware designers, and makers of system software and application software had to step in where the hardware provided little help for leak-proof sharing. In retrospect, the market may have rewarded hardware makers' direction of pursuit -- until now, but the survivors of those past pursuits remain.

Leak-proof sharing of resources may be a more marketable pursuit now, with cloud providers hosting code from mutually untrusted parties, and consumers running delivered applications and untrusted Javascript on the same machine, but I'd rather see the conversation about this topic improve across the board, independent from commercial forces.

Well let's think it through. It would seemingly have to follow the Disruptive Technology arc. That means it would start off as a crappy computer that had really great information leakage control. And it would be marketed to a small but enthusiastic market who really values that, and doesn't mind that most everything else is a PITA.

Who might that market be?

Cryptocurrency users?

(Although they tend to be very bad at threat modelling)

Yeah, that's a good starting point. Maybe not just cryptocurrency users, but "people who have expensive digital assets on a computer that runs untrusted code".

Although, it kind of seems like the more expensive your digital assets are, the more you just need to a) hire a full-time security person and b) insure your assets.

I'm not sure how a new OS would fit into that.

It seems like it would have to be people WITHOUT a lot of assets who would be the initial customers. People who can't just pay a security person to go over their images.

So maybe casual crypto users.

But wouldn't they be better off just with a custom dongle? A dongle isn't running untrusted software.

Also, isn't part of this technology that it squeezes a little more performance out of a machine without leaking information?

That takes me to the comment below... cloud hosting platforms.

Major cloud companies and VM providers?

Most of these vulnerabilities come from speculative execution impacting a shared cache.

Yeah, this is sounding like a good angle to start with. Target users are people who need to squeeze as much performance out of hardware as they can without leaking information while running untrusted code. So, cloud providers seem like exactly where to start.

Cloud providers who target security-sensitive users who also don't have a lot of money (to just pad the hardware budget) would be ideal.

Nah, cloud providers are under enormous pressure to optimize resource usage (real estate, electricity, labor). A hardware design without competitive performance has a snowball's chance in hell of being adopted in that space. The economics of cloud datacenters mean that security only needs to be good enough, not perfect.

In that case, the perception that current hardware security is good enough needs to change.

Or the cloud vendors could start offering compute running on new, more secure hardware - e.g. special VM types marketed accordingly.

I think that this is wishful thinking on your part unless the drastic performance per power drops are negated.

I personally think that these performance drops lead to a horrific environmental impact, because every performance drop means that more hardware needs to be provisioned and powered to counter it. So this directly results in more toxic waste from these electronics and a higher carbon dioxide output into the atmosphere (cloud data centers are a major consumer of electricity). Compared to the long-term impact of that, a few extra security breaches sound like the lesser evil to me personally.

Major cloud companies == energy consumption. Most of the time cloud users would be able to 'just' rent entire machines and not have to bother with 'untrusted' code.

It saddens me that we're collectively going to spend a lot of effort trying to patch out a problem that we've imposed upon ourselves. We were making such great progress in terms of processing speed until someone came along and decided that we need to have multiple tenants share the same hardware, and they should have no way of knowing anything about each other. The vast majority of consumer hardware will _never_[0] be exposed to this category of attack, but will pay the performance penalty regardless.

Fundamentally, the need is for a completely different model of computation to abstract away time-channel leaks. This cannot be fixed by patching existing software and hardware, and we're going to go through a lot of pain and anguish trying. As another comment points out, the well of possible timing attacks is infinitely deep (attached hardware, network performance measurements, etc.).

The two options are performance or security, pick one. It seems the industry is trying to pick both, and it's going to take us a long time to realize that we're going to get neither.

For clarity - my proposal is segmenting hardware and software products between the two categories of "general purpose, trusted computing" and "safe for shared hosting." The 2nd category is so small compared to the first, it seems unfair that its domain-specific problems should hamper the rest of us.

[0] Thanks to a combination of reasonable software mitigations (unprivileged lower-resolution timers) and that most of these attacks require arbitrary code execution in the first place

> my proposal is segmenting hardware and software products between the two categories of "general purpose, trusted computing" and "safe for shared hosting."

If a normal user is using a web browser and one tab has their bank information, and the other has a suspect website, then you have to be concerned about sharing resources and security.

I'd prefer to see only 'trusted' sites being able to use JavaScript. Heck, even then only after log in.

Alternatively, run them in an isolated context with no websockets, limited access to timing (a second or two of precision, so a lot of sampling is needed), limited CPU and memory utilization, no sound, no GPU acceleration [likely another large side-channel surface], etc. Ah yeah, and delete all their cookies while we are at it.

JavaScript has turned into total bloat.

Running somebody else's code is putting a ton of trust in them. Maybe someday we'll have great sandboxing, but for now, it makes no sense to let random sketchy origins just run whatever code they please on your device.

Make these web pages run on a crappy VM. Yank most of the JS bloat and the JIT compilers out and you're good. Maybe insert arbitrary nondeterministic delays into the interpreter as well for good measure. Compared to sacrificing total desktop performance, this is a perfect tradeoff.

This is on the browser to handle, though, and is confined to the browser's process(es); it doesn't need to affect the entire OS.

But things like scheduling and resource sharing do affect the entire OS.

Yes, so? Unless I'm misunderstanding what you are talking about, these are exposed to (untrusted) web pages through the web browser; the pages do not have a means of accessing them directly. So the browser can take measures to avoid issues (e.g. decreasing the exposed timing resolution, like browsers now do).

Then you can give your (trusted) native applications all the available resources without artificial constraints.

(Of course this assumes you see most native applications as trusted, but personally I see any alternative as too Orwellian to get behind.)
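To make "decreasing the exposed timing resolution" concrete, here is a minimal sketch of clamping a timestamp to a coarse grid before handing it to untrusted code. The function name `coarsen` and the 100-microsecond quantum are assumptions for illustration; they are not taken from any browser's actual implementation.

```python
def coarsen(timestamp_us: int, quantum_us: int = 100) -> int:
    """Round a microsecond timestamp down to a multiple of quantum_us.

    Hypothetical helper: untrusted code only ever sees the rounded value,
    so any two events inside the same quantum look simultaneous to it.
    """
    if quantum_us <= 0:
        raise ValueError("quantum_us must be positive")
    return (timestamp_us // quantum_us) * quantum_us


# Two events 99 microseconds apart become indistinguishable.
print(coarsen(1200), coarsen(1299))
```

Coarsening alone only raises the cost of a measurement; it does not remove the underlying channel, which is why it is usually combined with other measures.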

> The vast majority of consumer hardware will _never_[0] be exposed to this category of attack

Arbitrary code execution is common in the browser (JavaScript and WebAssembly). Really, this is any case where you don't entirely trust every program running on the device (e.g. smartphones).

> Thanks to a combination of reasonable software mitigations

That is time protection. Restricting access to system timers wasn't enough here; mitigations also need to prevent user-created high resolution timers, so useful features like SharedArrayBuffer had to be disabled, to prevent the creation of synthetic timers.
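As a rough illustration of what a "synthetic timer" means here: if one thread can increment a shared value while another reads it, the value itself becomes a clock, with no system timer involved. The sketch below uses a Python thread purely to show the idea; the class `CounterClock` and the helper `measure` are hypothetical names, and a real attack would rely on shared memory between web workers, not on Python.

```python
import threading
import time


class CounterClock:
    """A 'clock' that is just a number a background thread keeps incrementing."""

    def __init__(self) -> None:
        self.ticks = 0
        self._running = False
        self._thread = threading.Thread(target=self._spin, daemon=True)

    def _spin(self) -> None:
        # Busy loop: every iteration is one tick of the synthetic clock.
        while self._running:
            self.ticks += 1

    def start(self) -> None:
        self._running = True
        self._thread.start()

    def stop(self) -> None:
        self._running = False
        self._thread.join()


def measure(clock: CounterClock, work) -> int:
    """Return how many ticks elapsed while work() ran."""
    before = clock.ticks
    work()
    return clock.ticks - before


clock = CounterClock()
clock.start()
elapsed = measure(clock, lambda: time.sleep(0.05))
clock.stop()
print("ticks during 50 ms sleep:", elapsed)
```

The tick rate is uncalibrated and noisy, but relative comparisons ("did A take longer than B?") are all a timing attack needs, which is why removing the system timer alone was not sufficient.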

> The two options are performance or security, pick one.

That is not what the paper found: "Across a set of IPC microbenchmarks, the overhead of time protection is remarkably small on x86 [1%], and within 15% on Arm."

I think if there's a segmentation to be made, it's "general purpose, untrusted computing" and "trusted high performance computing". The second category would be the reserve of such projects as physics simulations and render farms.

> The second category would be the reserve of such projects as physics simulations and render farms.

or smoothly scrolling, 60fps rendered canvas apps in browsers. I think there can't be an apartheid between untrusted and trusted, because developers would push the user to make their software trusted to get the max performance (and the users would just agree).

> developers would push the user to make their software trusted

I don't see it going this way. This is comparable to virtual memory and MMUs. When there is support in hardware, the speed benefit of not using it is negligible (as shown by the 1% difference demonstrated in the research).

When it is not needed, there is a benefit of not implementing it in hardware, and saving power and die area. For example, GPUs (traditionally) and crypto mining hardware do not employ MMUs.

Every phone runs JavaScript in the browser.

> someone came along and decided that we need to have multiple tenants share the same hardware, and they should have no way of knowing anything about each other

Isolation between apps on mobile phones is really important. That's a huge part of the computing landscape in terms of number of devices deployed, and falls within the 2nd category. I don't think it's realistic to dismiss so easily.

Not the GP, but I think this complaint is about workstation and on-premises server performance. There, the software fixes intended for cloud servers are already throwing the baby out with the bathwater for little to no security benefit. All these changes are accomplishing in that space is wasting time and energy by ruining hardware performance.

Very cool ideas. Time protection is indeed a must going forward, particularly for cloud hosting. Not surprised that seL4 got there first either.

Getting this to work in raw Linux may be hopeless given the breadth of the kernel data, but they have smart devs, so maybe they'll figure something out.

And Rob Pike said systems research is dead. Ha!

One issue I see here is that time protection would need to extend to anything shared, not just CPU micro-architecture. For instance, if a hard drive has a DRAM-based cache, that could be used as a timing channel, and the complexity of flash file systems opens up all kinds of potential leaks. In the case of two processes sharing network access, one process could conceivably estimate another's network access patterns implicitly by measuring latency through shared switches or drops due to buffers being filled. Mitigating this would require some kind of coloring support that goes as far as your ISP's switches, which seems impractical.

> One issue I see here is that time protection would need to extend to anything shared,

We might see this issue as an opportunity. That is, by thinking about a concept called "time protection" we expose all these things subsystems are doing and make them easy to argue about. We can now say "Oh good, XYZ improves best-case speed, but sadly it also compromises time protection".

Having such a language means the industry can slowly start improving these things rather than sweeping them under the rug. It will not stop the improvement from being slow and difficult.

I think there needs to be a little bit of contribution from both the hardware and software sides. A "sufficiently stupid" application is not possible to protect. We need a set of best practices and guarantees that you won't leak if you follow them.

And there may even need to be a third part. An understanding that nothing can be fully protected.

As an alternative to this approach, I wonder if it's possible to push all sensitive computations into a few small components, and rewrite those components carefully to obscure any information that could be obtained from timing?

Of course. Branchless code equivalents can be written for practically anything, and you can force-prefetch memory regardless of the branch (it's an intrinsic in most C compilers), though this loses performance and branch prediction benefits.
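A minimal sketch of what a "branchless equivalent" looks like: a conditional select written with bit masks instead of an if/else. Python is used here only to show the arithmetic; it gives no timing guarantees itself, and real constant-time code would be written in C or assembly and checked at the machine-code level. The name `ct_select` and the 32-bit width are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def ct_select(cond: int, a: int, b: int) -> int:
    """Return a if cond == 1, else b (cond must be 0 or 1), with no branch.

    Both inputs are always read and combined, so the sequence of operations
    does not depend on cond. Values are treated as 32-bit unsigned integers.
    """
    mask = (-cond) & MASK32  # cond=1 -> 0xFFFFFFFF, cond=0 -> 0x00000000
    return (a & mask) | (b & ~mask & MASK32)


print(ct_select(1, 7, 9), ct_select(0, 7, 9))
```

The cost mentioned above is visible even in the sketch: both `a` and `b` must be computed before selecting, whereas a branch would compute only one of them.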

Wouldn't this approach be cheaper, both in performance and human effort, than fixing the entire OS to hide the timing information?

It would require doing this for any new or existing code that exposes timing information; the current timing fixes and isolation patches are much smaller. It would make more sense in user code, like browsers, media players, etc., to remove the influence of branch prediction regardless of the host OS, just as GCC does with retpoline insertion.

Very dramatic graphs.

I wonder when programming languages will start factoring out the ability to check system timers. Some capability-aware languages have already done so.

I wonder whether, if high-resolution timers were privileged, we could get by with lower-resolution timers. I'm not sure any timing attacks would work with second or even millisecond resolution timers.

I don't see how handling this at the programming-language level could help, and I think that whether timing is privileged or not is built into CPUs, so there's nothing we can do about it, but this seems plausibly acceptable as a way to deal with speculation. Permit speculation, but make it privileged to detect whether speculation occurred.

Apparently low resolution timers do work, but you need more data.
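A small simulation of why "you need more data" is the whole cost: if an operation starts at a random phase relative to the clock tick, a coarse clock reports either zero or one full tick, and the average over many runs converges on the true duration. This is a sketch under idealized assumptions (uniformly random start phase, no noise, no jitter added by the platform); the function names are hypothetical.

```python
import math
import random


def coarse_duration(start: float, true_duration: float, quantum: float) -> float:
    """Duration as seen through a clock that only advances every `quantum`."""
    t0 = math.floor(start / quantum) * quantum
    t1 = math.floor((start + true_duration) / quantum) * quantum
    return t1 - t0


def estimate(true_duration: float, quantum: float, samples: int, seed: int = 0) -> float:
    """Average many coarse measurements taken at random phases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        start = rng.uniform(0.0, quantum)  # random phase relative to the tick
        total += coarse_duration(start, true_duration, quantum)
    return total / samples


# With a 1 ms clock, a 0.3 ms and a 0.1 ms operation are each reported
# as 0 or 1 ms per run, yet their averages separate cleanly.
print(estimate(0.3, 1.0, 20000), estimate(0.1, 1.0, 20000))
```

Adding random jitter to the reported time raises the number of samples needed but, under these assumptions, does not change the conclusion.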


I don't think so. I think the whole point is that the OS provides time protection in the same manner it provides memory protection: completely transparent to the application. Just like your typical user-mode application does not need to worry about virtual vs physical addresses, I'd say the typical user-mode application would not need to worry about the effect of time as well.

Just a thought, but can't the OS prevent applications from knowing anything about other applications? Rather than isolating apps by flushing/coloring everything, couldn't the apps not know what else is running? Two apps can't communicate, or one app spy on another, if an app doesn't know what else was/is running.

(Not sure why this is wrong, but confident it must be.)

These exploits work on a hardware level, and do not require the malicious app to know what else is running. For example, a VM does not know what other VMs are running on the same host in AWS, but Spectre/Meltdown still affected AWS hosts. They are reading data other apps have written to memory.

Or, don't let timing reveal privileged information? This seems like a sledgehammer for a fly problem.

How is what you're talking about different from what the article describes?

Application vs system level. I think the parent is saying that since the application is in a unique position to know what information is privileged, it would be better to make available a library of constant-time functions that are resistant to timing attacks than to constantly pay the performance cost of blunter system-level boundary enforcement.

However, I'm not sure how much merit this argument has, since very few applications even bother with this level of protection even though they need it.
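For a sense of what one entry in such a library of constant-time functions looks like, here is a sketch of a comparison that does not return at the first mismatching byte. The hand-written `ct_equal` is illustrative only (Python does not guarantee constant-time execution); in practice the standard library's `hmac.compare_digest` is the function intended for this purpose.

```python
import hmac


def ct_equal(a: bytes, b: bytes) -> bool:
    """Compare two byte strings without exiting at the first difference.

    Illustrative sketch: the length is treated as public information, and
    every byte pair is examined regardless of where a mismatch occurs.
    """
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # accumulate differences; no early return
    return diff == 0


secret = b"correct horse"
guess = b"correct hOrse"
print(ct_equal(secret, guess), hmac.compare_digest(secret, guess))
```

A naive `==` that stops at the first differing byte lets an attacker recover a secret one byte at a time by timing guesses; the accumulate-then-test pattern removes that data-dependent exit.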

That works fine when an application just wants to protect a private key in memory. But if you want to build, say, an application where a user enters via keyboard information that you want to protect from another application, you have to worry about keystroke timing attacks. That means the application needs to hide whether it did anything at all in a given time slice, which can be inferred from the micro-architectural information discussed in the article.

In the famous Intel case, the ability to provide kernel addresses in user-space instructions opened the door to exposing kernel data. So, don't permit kernel addresses in user-space instructions? Trap that instead of getting in the way of efficient code execution everywhere.

I would like to see how Boeing and the FAA decided to pretend that all was fine after the first crash. There were enough clues that MCAS had issues, and I would like to see how it was decided that it was safe to fly while the software fix had not been deployed.
