Hacker News
CacheOut: Leaking Data on Intel CPUs via Cache Evictions (cacheoutattack.com)
607 points by beefhash 26 days ago | 130 comments



Before people get all nutty as usual:

This is another TSX (transactional memory) issue, and you can disable TSX without much of a problem.

The attacker basically needs to be running a binary on the machine (not JS in a browser or anything).

The leakage is extremely slow, about 225 bytes/minute for ASCII text (a 4K page in 18 minutes). I'm not sure if that was an exact recovery either, just a probabilistic one. With noise reduction to enhance recovery, they said it took twice as long, so about 113 bytes/minute.
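A quick back-of-envelope check of those rates (using only the figures quoted above, not re-measured):

```python
# Sanity check of the reported leak rate.
PAGE_BYTES = 4096          # one 4K page
RAW_RATE = 225             # bytes/minute, raw ASCII recovery
DENOISED_RATE = 113        # bytes/minute, with noise reduction

minutes_per_page = PAGE_BYTES / RAW_RATE
print(round(minutes_per_page, 1))          # ~18.2 minutes per 4K page
print(round(PAGE_BYTES / DENOISED_RATE))   # ~36 minutes with denoising
```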

It seems to only be able to control the bottom 12 bits of the address to recover (though I didn't fully get why), and it requires the process to be actively reading or writing the data to get it into the L1 cache somehow, so data just sitting in RAM isn't good enough.

The attacker still needs to figure out an address (even with ASLR there is still a lot of guessing, and if you have really sensitive data, just move it every second until you wipe it).

Interesting, but kind of a non-issue.


That's a lot of work, but the instant it's in a rootkit everybody will be able to do it. 113 bytes/minute is extracting an AES key from memory in about a minute.

It's only really hard and messy for the first guy who implements it, after that it is much easier, albeit still fairly messy. I'm not saying we need to panic, but it's more than a "non-issue".


113 bytes/minute is extracting an AES key from memory in about a minute.

How do you know where the key is, and how are you guaranteed to be able to read enough of it before the "shifting sands" that is timing unpredictability and general noise in the system make you read something else?

That's what really irritates me about all these side channels that have been found ever since the first Spectre/Meltdown --- they all demonstrate something flashy like "we read a key/password/secret/something important from memory in a few minutes" while conveniently neglecting to mention the countless hours spent aligning everything just right so they could show off the one "magic trick". In a lot of the cases there's barely even any control over where it reads.

Every time something like this comes up, I feel compelled to post some bytes from somewhere random in the memory of a random process on my machine to show just how much I care; here you go:

    E8 7F 00 00 00 A1 64 30 40 00 89 45 D8 8D 45 D8
    C4 20 00 00-CE 20 00 00-D8 20 00 00-E2 20 00 00
    F8 20 00 00-00 21 00 00-0E 21 00 00-16 21 00 00
    26 21 00 00-36 21 00 00-42 21 00 00-56 21 00 00
    66 21 00 00-76 21 00 00-84 21 00 00-96 21 00 00
    AA 21 00 00-00 00 00 00-FF FF FF FF-B4 16 40 00
    C8 16 40 00-7C 20 00 00-00 00 00 00-00 00 00 00
    EC 20 00 00-00 20 00 00-00 00 00 00-00 00 00 00
Of course if you are being targeted then the concern may go up, but I still think that attackers would have lower-hanging fruit than this; setting up one of these reads to get one secret would itself require such intimate knowledge of your machine's configuration and state that they probably already know what they want.


> How do you know where the key is, and how are you guaranteed to be able to read enough of it before the "shifting sands" that is timing unpredictability and general noise in the system make you read something else?

I don't mean to understate the difficulty of the task, but I've also seen probably tens of repros by now that use intimate knowledge of e.g. the kernel page allocator and known post-boot state of e.g. a firmware image flashed on millions of devices to drastically cut down the search space.


What is it that programmers do? They automate complex tasks that take humans considerable effort.

If a university can accomplish this task with their funding and no urgent incentive, what is the nation state actor doing with their enormous budget?


The conventional wisdom is that if a nation state wants your data badly enough it is game over. Even an air gap can apparently be bridged if you have the budget and you're patient and smart enough. It's root kits, booters, ransomware, exfiltration and script kiddies that you have to worry about in practice.


It's not what a nation does to its own supply chain that a nation's supply chain has to worry about.


Yes, and the point is how do we define "badly enough".

Badly enough as in they can only do it a few times a year against a few high-value targets, or badly enough that they can do it to "only" a few tens of millions a year?

Even end-to-end encrypted communications can be bypassed if the government wants them "badly enough". But the point is that with E2E encryption, they can no longer tap into and data mine the conversations of billions in real-time to fish for crimes.

Just like when fighting malware creators, the point of security is to keep raising the standards and the difficulty for the malicious actors.


> while conveniently ignoring to mention the countless hours spent aligning everything just right

It's really not that hard. E.g. people working on browser exploits have been working on exactly this for years and years, back when just having an out-of-bounds read due to a regular browser bug was the mechanism. Turns out that programs are absolutely chock full of pointers and it doesn't take long to run across one that points to what you are looking for. Especially because programs tend to have lots of data structures that end up pointing to more and more important data structures, funneling you into the guts of the program.

Sure, reverse engineering takes work, but blindly hunting in memory with no clue definitely is not hard.

Side-channel attacks are basically a persistent out-of-bounds read mechanism. That is a very bad thing (TM).

There are literally thousands of people who work on this day in and day out, and millions in bug bounty programs out there.


What's your opinion on package management communications? Couldn't an eavesdropper on the network deduce quite a bit about your system's configuration? The distribution of dates and sizes of packages downloaded seems like quite a leak.

Perhaps package servers and package management software should round up the package size to hide the identity of the package? Use oblivious transfer to hide the identity of the package from the package server itself?
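The size-rounding idea could look something like this (a hypothetical illustration, not any real package manager's behavior): pad every download to the next power of two, so an eavesdropper sees far fewer distinct sizes.

```python
def padded_size(nbytes: int) -> int:
    """Round a transfer size up to the next power of two, collapsing
    many distinct package sizes into a handful of buckets."""
    size = 1
    while size < nbytes:
        size *= 2
    return size
```

With byte-granular sizes, a 1500-byte and a 1800-byte package are distinguishable on the wire; after padding, both transfer as 2048 bytes.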


History has repeatedly shown that "infeasible" attacks are really just difficult.


>but the instant it's in a rootkit everybody will be able to do it.

This makes no sense. If you have the privileges to install a rootkit, there is no need to use any speculative execution exploit.


The first feature of a rootkit is to get root access. A userspace kernel-information leak utility would be very useful for this step.


Okay, it seems we have differing definitions of what a rootkit is.


They mean in the implementation of a rootkit.


We have yet to even find Spectre in a kit; this will never see the light of day, especially since it is so easy to avoid (turn TSX off).

You also don't get to extract constantly. You only get a shot when the data is in the LFB, so the program needs to be actively reading or writing it to keep it moving back and forth between L1 and L2; at least that is the way I read the paper.


The "nutty"ness comes from people getting frustrated with intel fucking up the security of yet another feature they promised, and intel's overall poor security record.

Luckily there are other vendors who seem to know what they're doing: We just received our first batch of non-intel servers at work and have no intention of returning to intel for the foreseeable future. What they offer is simply not worth the risk. I expect more companies to follow suit.


At the level where large, security-centric organizations make decisions, these instances argue for getting AMD in the door like nothing else that I can recall. So, it's a good time for AMD to have such performant hardware and a process advantage. If it stays this way for a decade, I'll be able to put in a PO for AMD hardware...


The problem is, these leaky cache issues come from 'performant' hardware. It is the pipeline optimization that leads to the side channel.

There are cache architectures (such as pseudo random replacement) that are much more difficult to perform side channel attacks on, but have slightly lower performance.

Intel's problem has been its relentless pursuit of per clock performance, and that has led to some of these side channel attacks. Cache coherency on multi-core is hard.

My guess is that if the market share of Intel and AMD were swapped, you would see similar cache attacks on AMD.


When the market rewards secure design, you can make that complaint. Since the market does not reward secure design, expect investments in security to be commensurate. You cannot have it both ways.


The market is noticing that we're losing a whole lot of performance on Intel to these security-related mitigations, though...


Intel reports record profits again.


This doesn't mean that Intel hasn't been affected negatively by the resurgence of AMD in the processor business. AMD is taking several billion dollars of revenue, and every dollar of that costs Intel much more than $1. A noticeable part of AMD's newfound competitiveness comes from the various performance taxes that Intel has had to pay for speculative execution mitigations.

It's worth noting, too, that Intel has diversified: the processor business is only one of several important businesses to Intel.


Profit in infrastructure is a delayed indicator; it tells you how the company was doing, not how it is doing now.


Thank you. Apparently, the stockholders agree with my assessment.



I work for Intel (not speaking in my official capacity) and the AMD scare here is real.


Not a complaint but merely stating the obvious.


What do you do? The vulnerabilities don't apply to the vast majority of situations (Meltdown was about the only one that did, and it has been patched).

Unless you are running untrusted VMs from others, you're not making a rational, knowledgeable decision.

Most people are just using this to falsely justify their Intel hate, similar to Microsoft in the 2000s.


By your own numbers, this could translate to a 30-second AES key exfiltration in the cloud. This isn't a non-issue, even if you personally aren't affected.


You have to find the address first, which is a lot of rummaging around; you aren't just handed it. That is going to take quite a bit of effort, especially if the program builds any security measures in (e.g., allocating at a random address). Good luck with that.


It's easy to say vulnerabilities don't matter... when responsible disclosure results in fixes for those specific relevant systems for which actual executable proofs of concept were designed. The authors have the skill and know-how to do this for new systems, and the surrounding know-how and skills are not gained by merely reading the whitepaper that focuses on the exact nature of the side channel. Similarly, a paper from the field of, say, physics will not hold your hand by having a huge appendix filled with the physics courses and exercises the authors enjoyed as students...


Spread the key over a whole page with FEC, and implement the key recovery using non-saved registers. It would be nice to avoid pre-emption and avoid leaking registers to RAM. Perhaps the last step could use AVX to run a permutation.


> It seems to be able to only be able to control the bottom 12-bits of the address to recover (but I didn't fully get why)

2^12 = 4096, or the x86 page size in bytes.


And as for why the page size is relevant for the L1 cache: Intel (and AMD, and most other modern CPUs) use a VIPT (Virtual Indexed Physical Tagged) L1 cache, where the virtual address (before the page table translation) is used to index the cache (this is faster since it can be done in parallel with the TLB lookup to get the physical address). To prevent the confusing situation where the same physical address has more than one index in the cache, only the bits of the address which do not change between the virtual address and the physical address can be used; these are the bits which correspond to the offset within the page.

(As an aside, this also limits the size of the L1 cache, which is why it hasn't grown much despite the L2 and L3 cache growing a lot; an 8-way set-associative VIPT cache with a 4KiB page size is limited to 32KiB, absent tricks like page coloring. Perhaps this will change if 64-bit ARM servers become popular, since they can only address the largest amount of memory when using a 64KiB page size, and this would make enterprise distributions default to that page size.)
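That capacity rule of thumb is easy to spell out: with only the page-offset bits available for indexing, a VIPT cache can hold at most (associativity × page size) bytes without aliasing tricks.

```python
def max_vipt_size(ways: int, page_size: int) -> int:
    """Upper bound on a VIPT cache indexed only by page-offset bits
    (absent page coloring): associativity times page size."""
    return ways * page_size

# 8-way set-associative with 4 KiB pages: the familiar 32 KiB L1.
print(max_vipt_size(8, 4096) // 1024)        # 32 (KiB)
# The same 8 ways with 64 KiB pages would allow a 512 KiB L1.
print(max_vipt_size(8, 64 * 1024) // 1024)   # 512 (KiB)
```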


Thanks for the explanation, but I'm not sure I got why this can happen:

> To prevent the confusing situation where the same physical address has more than one index in the cache


You can have several virtual addresses mapped to the same physical address. However, if you use bits that are not 1:1 mapped between those two to index into your cache, then you might end up with one virtual address getting cached in one place in L1, and the other getting cached in another place in L1. That would be bad, because you'd have two potentially inconsistent caches of the same memory.


> You can have several virtual addresses mapped to the same physical address.

Ah yes, of course. Is this something that actually happens (e.g. because of processes sharing memory, OS hands out different virtual addresses pointing to the same physical memory) or theoretical in nature?


It does happen, and not just for two different processes.

For example, a common trick to implement an efficient circular list for objects of varying size is to map the same physical addresses twice in adjacent locations. Then you can just use a pointer to the start of each object and have no special handling for the case where an object straddles the end of the allocated area.


Does this mean we could see 16x larger L1?


All of hardware is tradeoffs.

A huge L1 would be slower to access. You might end up slowing down all memory accesses by a cycle (or more) just for accessing the large cache. You also have to find some place to put the thing in your floorplan, and route everything else around it. This may result in timing challenges in other parts of the chip, which might require additional delay to resolve.

A huge L1 would take up more space. This would increase area, reduce yield, and increase cost. Since in a multi-core chip each core has its own L1, you will have to pay this cost multiple times. Also, L2 caches are typically inclusive, so you would potentially need a much larger L2 to be able to accommodate all the extra information in these L1s.

These tradeoffs have to be studied with simulated experiments to make the right call. For programs with huge working sets, maybe the added latency pays for itself because you have fewer cache misses. For programs with good locality, maybe you end up losing performance. Maybe you save power by reducing misses, or maybe you waste more power because the cache is using more power. Maybe it's an insignificant area increase, or maybe it completely blows up your budget.


That first line is key to so much. All of hardware is tradeoffs. There is no magic.


"It seems to be able to only be able to control the bottom 12-bits of the address to recover (but I didn't fully get why)"

That looks conspicuously like the load/store port address aliasing size (look up "4K Aliasing"), which can be used to stall data availability while the conflict is resolved. I'll read up on this particular one, but there's a growing family of vulnerabilities with 12-bit address aliasing in their toolbox.
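The aliasing condition itself is trivial to state: two addresses "4K alias" when their bottom 12 bits (the offset within a 4 KiB page) match, even though the full addresses differ.

```python
def aliases_4k(a: int, b: int) -> bool:
    """True if two addresses collide in a structure indexed only by
    the low 12 bits (the 4 KiB page offset)."""
    return (a & 0xFFF) == (b & 0xFFF)

print(aliases_4k(0x7F001234, 0xDEADB234))  # True: both end in 0x234
print(aliases_4k(0x1000, 0x1001))          # False: offsets differ
```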


It reads data (indirectly) from the cache - you can select the cache slot.

It is on you to make sure something interesting is in the cache, but if you can make your target execute, 12-bits is fairly good selectivity.


The paper talks about how in a VM environment, this attack allows reading the hypervisor’s memory, and that public clouds put in mitigations based on the early disclosure to them.

That feels like a big issue.


How do you disable TSX? I don't see a control register flag or an MSR.


MSR IA32_TSX_CTRL, introduced with a microcode update a while back.
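For reference, a sketch of what poking that MSR looks like from userspace on Linux. This assumes the `msr` kernel module is loaded and root privileges; bit 0 forces RTM transactions to abort and bit 1 hides TSX from CPUID, per Intel's TSX disable guidance.

```python
import os
import struct

MSR_IA32_TSX_CTRL = 0x122
RTM_DISABLE = 1 << 0       # make XBEGIN always abort
TSX_CPUID_CLEAR = 1 << 1   # stop advertising TSX via CPUID

def tsx_off_value() -> int:
    """Value to write into IA32_TSX_CTRL to fully disable TSX."""
    return RTM_DISABLE | TSX_CPUID_CLEAR

def disable_tsx(cpu: int = 0) -> None:
    """Write the MSR via the msr driver (needs root + msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", tsx_off_value()), MSR_IA32_TSX_CTRL)
    finally:
        os.close(fd)
```

On recent kernels the simpler route is the `tsx=off` kernel command-line parameter, which does the equivalent at boot.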


Wow, it's not even in the SDM. Only docs I can find are: https://software.intel.com/security-software-guidance/insigh...

Glad it's possible to do this in virt now: used to be impossible to disable TSX (you could avoid advertising it via cpuid, but you couldn't actually prevent usage of the instructions).


>CacheOut violates the operating system's privacy by extracting information from it that facilitates other attacks, such as buffer overflow attacks.

>More specifically, modern operating systems employ Kernel Address Space Layout Randomization (KASLR) and stack canaries. KASLR randomizes the location of the data structures and code used by the operating system, such that the location is unknown to an attacker. Stack canaries put secret values on the stack to detect whether an attacker has tampered with the stack. CacheOut extracts this information from the operating system, essentially enabling full exploitation via other software attacks, such as buffer overflow attacks.

Can anyone explain this scenario? Is it really realistic? Do they mean that if you have code execution on a system and want to escalate privileges, you would find another network/socket service running on the same system, find an exploit in that service, and then leak the stack canary to allow corrupting the stack? There are often easier ways to defeat the stack canary.


Yes, it's a local privilege escalation issue, i.e. you must have local code execution first. Leaking KASLR/stack canaries just means you get 90s-level triviality of attacking any stack-overflowing API you can find. Without this bug, if you found a stack overflow you could trigger from your local unprivileged code, the target would likely detect the overflow due to the canary. If you defeated that mechanism, constructing shellcode might be more difficult due to (K)ASLR. With this bug, neither stack canaries nor (K)ASLR are effective defenses against unprivileged programs.


Sure, I was just kinda confused that this was the example they presented. I guess for highly targeted attacks, it might be somewhat useful.

>Leaking KASLR/stack canary just mean you get 90s level triviality of attacking any stack-overflowing API you can find.

It does not seem trivial with this exploit, but maybe I'm just not getting it. With the low accuracy and transfer rate, it seems like a lot of stars need to line up with regard to how the service you attack functions.


Did AMD design their processors with these side channel attacks in mind, or is it a matter of where the security research is focused?


Mostly, it seems to be a difference in design philosophy - AMD processors prevent speculative reads of data that shouldn't be accessible almost everywhere, whereas Intel ones allow them pretty much everywhere. This particular attack requires TSX, which AMD processors don't have, but I don't think it'd work on AMD processors anyway, because they're not missing the security check that Intel ones are. If I remember rightly, there were other now-mitigated exploits for this line fill buffer leak that didn't require TSX.

(The one exception is also interesting. AMD processors allow speculative reads past the end of x86 segments and past BOUND instructions, which of course no-one uses these days. This suggests there may have been a deliberate decision to block them in the more important cases.)


It has been known since... forever? that one should not speculate across security boundaries. That was not enough to avoid Spectre, but it was largely enough to avoid the TONS of security vulns Intel is also susceptible to.

Somebody messed up big time. Or, from a business point of view, did they? Intel's current problems are manufacturing, and the continuously lower power of their "legacy" processors (except that, due to manufacturing problems, this "legacy" is still mostly the current one) means people are buying more. Of course there is AMD back in the game, but market demand is large enough; plus AMD would have been there anyway, and, in the fiction where Intel took a good part of the perf hit upfront instead of shipping the security vulns, it would be about as competitive as in the current situation.

The people most annoyed are the users. Intel got away with it by pretending these were not really defects in their product but only new SW tricks that they will help defend against, and their clients just let them say that without much complaint (well, I guess the big ones got some rebates...), but security researchers and/or processor designers know very well this is bullshit (see the vuln papers and FAQs) and that they simply fucked up big time on Meltdown, MDS, etc. I don't care that a few other vendors made some of the same mistakes: they are still mistakes and design flaws, and not even something new.

Pretty much the only shiny new thing in this stream of vulns was Spectre and the few variants that appeared quite early on (but NOT Meltdown & co.). The rest are design flaws that come from the "oh, not a big deal to leak that potentially privileged data, we will drop everything and trap before any derivative can go out anyway" mentality. Yeah, no, I'm sorry, but the founding paper on speculation already said not to do that :/ Either they did not do their homework, or they voluntarily chose to violate the rule.


>It is known since... forever?, that one should not speculate across security boundaries.

... if you value security over raw performance. Clearly Intel has decided at some point that it was worth playing with fire in order to get ahead in benchmarks. In their defence it seems to have worked reasonably well for them for quite a while.

>The people most annoyed are the users.

I wish, but I wonder how much of that is true. Are most users even aware of these problems? They get patched automatically by OS vendors and then most of the time they won't hear about them anymore. I think the "nobody gets fired for choosing Intel" will probably still prevail for quite some time.


"In their defence"? The fact that things worked well for the perpetrator (until they were caught) at the expense of the victim isn't an argument usually brought up by the defence.


It is when the only "court" is how much money you can make.


And you are accused of being stupid.


For most of Intel’s existence people ran fairly trusted workloads side-by-side. It wasn’t until “the cloud” that things really changed.


I wonder how much of Intel's long-term perceived lead over AMD in performance is explained by compromising on these security issues that are only in the past few years being (known to be) exploited.


This might be true in recent history, but Intel's Core architecture had AMD on its heels for several years.


You have to distinguish between mitigating an issue in software/firmware, and fixing it in hardware. You can't unfab a chip, so the software/firmware mitigation is necessarily crude. The actual performance advantage Intel was getting from "cheating" was most likely small.


"Faster than possible" seems really appropriate here.

They built a bunch of tech debt into their processors to boost their numbers, and now the hens are coming home to roost.

What I'm wondering is how many changes this will make to their product roadmap, and to what degree it will make next generation chips look lackluster compared to what people (think they) have now.


In this case AMD just doesn't support TSX instructions. For previous vulnerabilities like Meltdown, I'd guess that AMD just didn't feel they had the engineering resources to do after-the-fact validation of memory access permissions without architecturally leaking data in some weird corner case. But then along comes Meltdown, and it turns out that non-architectural leaks are a thing that can happen, and if you try to do after-the-fact validation you can't prevent them.


> AMD is not affected by CacheOut, as AMD does not offer any feature akin to Intel TSX on their current offering of CPUs.

It seems to be down to the notoriously buggy TSX (hardware transactional memory) in Intel CPUs.


It's interesting that TSX seems to be one of those holy grails that causes more problems than it solves - trying to implement TSX caused, I believe, huge problems for Sunacle's Rock processor team.


It is specific to Intel TSX, which should already be disabled due to earlier published MDS vulnerabilities in the same space.


Only the implicit TSX has been disabled. You can still use xbegin/xend/xabort.

They had an additional mode that would transparently convert many spinlocks into transactions without code changes - that is now gone.


"Should" as in, I think the earlier vulnerability was damning enough that people /should/ have disabled TSX entirely. If they have not already, they /should/ now.


TSX can provide very important speedups where there aren't adversarial workloads sharing memory. Your web browser shouldn't be using it, but I think turning it off globally doesn't make much sense. Unlike hyperthreading use of TSX can be decided on an app to app basis.

As core counts increase spinlocks and other synchronization primitives simply become too expensive. We'll need transactional hardware support eventually.


It is dangerous to treat the security domain as app-to-app instead of whole-machine given the scope of vulnerability. If your TSX workload runs on dedicated machines and/or you're ok with the reduction in kernel defenses, sure, enable it. But the default should be "off."

Scaling workloads does not require transactional memory and certainly doesn't require a vulnerable implementation of it. HTM might be the easiest way to scale a relatively naive algorithm, but the most scalable synchronization is none at all (or vanishingly infrequent) — and that works just fine with conventional locks and atomics (both locked instructions and memory model "atomics" such as release/acquire/seq_cst semantics).


The problem TSX is attempting to address is bus traffic from those exact lock instructions you mention. These work OK for the relatively small core counts of today but won't scale to hundreds of cores.

TSX brings hardware-supported optimistic locking and breaks the latency imposed by MESI and related protocols in use today. Of course it's great if you can get away with no synchronization at all - but then you might as well just use a GPU. TSX helps with those non-trivially parallelized problems that are still best performed on a CPU.


I explicitly mentioned memory model atomics in addition to locked instructions in an attempt to prevent getting hung up on locked atomics. I guess that didn't work.

Obviously many workloads require some coordination, but often something as trivial as allocating one of a given resource per CPU is sufficient to avoid most contention even on 100s of CPU core machines. Profile; improve. The same is required with HTM.
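The "one resource per CPU" idea can be sketched in a few lines (a hypothetical illustration; the class name and shape are invented): each worker updates only its own slot, so the hot path shares no state, and a total is computed by summing the slots only when needed.

```python
import threading

class ShardedCounter:
    """Per-shard counters: no cross-thread contention on the hot path."""
    def __init__(self, nshards: int):
        self.slots = [0] * nshards

    def add(self, shard: int, n: int = 1) -> None:
        self.slots[shard] += n   # each thread owns exactly one shard

    def total(self) -> int:
        return sum(self.slots)   # rare, read-side aggregation

counter = ShardedCounter(4)
threads = [
    threading.Thread(target=lambda s=s: [counter.add(s) for _ in range(1000)])
    for s in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.total())  # 4000
```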

Regardless of your thoughts on HTM and scaling technology, TSX is broken from a security standpoint, which is the primary subject of the fine article. HTM != TSX.


I agree with your conclusion that TSX needs to be disabled. I mean, I don’t know why anyone is arguing against that, since it follows directly from this attack; the alternative is to clear the L1 cache across security boundaries, but that’s probably slower than just coping without TSX. The case for disabling TSX was somewhat weaker before this latest finding, but not that much, and the point is moot now. Though I do wonder if someone will find a different way to leak data from the same buffer that doesn’t rely on TSX. (Didn’t the original ZombieLoad paper already find some, but they were amenable to kernel-side workarounds?)

I’m going to nitpick, though.

> I explicitly mentioned memory model atomics in addition to locked instructions in an attempt to prevent getting hung up on locked atomics. I guess that didn't work.

By “memory model atomics” do you mean atomic loads and stores rather than, say, compare-and-swap or atomic-increment? Because C++11 compare-and-swap and atomic-increment operations take a memory ordering parameter, yet they still generate the same `lock cmpxchg` and `lock inc` instructions.

But the issue isn’t locks; none of those instructions actually lock the entire memory bus like in the old days (...unless you pass an unaligned address!). They lock the cache line, which causes cache line thrashing if many processors do it at the same time, and that’s the biggest source of overhead. But plain stores also lock the cache line. A compare-and-swap is more expensive than a plain store, but not that much more.


That's putting a lot of faith in the OS and its ability to sandbox correctly.


Especially when this publication shows that sandboxing correctly isn't possible with TSX.


AMD still has almost no server market share. This attack specifically leverages TSX, an Intel-specific set of transactional memory extensions to the CPU. Intel has since published microcode updates that let you disable TSX.


> Amd still has almost no server marketshare.

Yet another Intel exclusive side channel vulnerability might help change that. This side channel stuff is terrible for cloud operators. Every time they have to adopt another layer of mitigation some fraction of their capacity disappears in a puff of shame and excuses.


Not sure you've been keeping up, but in 2020 most major server makers have switched, or at least offer an EPYC variant, because the price/performance is better even without Intel's constant security woes.


Yes, but most enterprise customers have reacted by buying more Intel chips, because the cost of migrating infrastructure is much greater. Once you're invested deep into the Omni-Path level interconnects and spend 100k+ on cables alone, buying a few extra blades isn't a huge deal. Speaking as someone who has sat in on meetings where we negotiate with vendors for 1ku pricing.

The big migration will happen in about 2-4 years. Typical enterprise schedule is 3-5 year cycles, zen2 launch was 2019.


This appears to be one of the four ZombieLoad/RIDL variations, rather than a new attack: https://zombieloadattack.com/


Yes, this is a RIDL variant called "L1D Eviction Sampling". Both the RIDL paper (Addendum 2 B [1]) and the CacheOut paper refer to CVE-2020-0549: "L1D Eviction Sampling (L1Des) Leakage".

But the CacheOut paper describes how to use it in practice and why the Intel fix is not sufficient.

[1]: https://mdsattacks.com/files/ridl.pdf#page=20


I mean, you gotta appreciate their efforts to make something so technical approachable by the general population, including gems like this in their Q&A :-D:

What is an operating system?

An operating system (OS) is system software responsible for managing your computer hardware by abstracting it through a common interface, which can be used by the software running on top of it. Furthermore, the operating system decides how this hardware is shared by your software, and as such has access to all the data stored in your computer's memory. Thus, it is essential to isolate the operating system from the other programs running on the machine.


> An operating system (OS) is system software responsible for managing your computer hardware by abstracting it through a common interface

So uh... that's a reference to the IME right? Not the user's installed OS.


No they're speaking of the (user installed) OS.


I was kind of joking, and it's also not explicitly stated; that's just the assumption the reader makes.


Tl;dr: Another Intel TSX async abort side channel.

See also Intel's advisory: https://software.intel.com/security-software-guidance/softwa... (cite 23 in the CacheOut PDF).


As someone unfamiliar with creating scientific papers I'm curious if anyone knows what format the citation modal is using?

https://imgur.com/a/bem4e8u

Is it for TeX or a common syntax for some publishing platforms to pick up?


Bibtex. Originally for the bibtex tool (which generates bibliographies for (La)TeX documents), but now common with other citation tools too.
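To make the format concrete, a minimal BibTeX entry looks like this (the key and all field values here are purely illustrative, not from any real paper):

```bibtex
@inproceedings{doe2020example,
  author    = {Doe, Jane and Roe, Richard},
  title     = {An Example Paper Title},
  booktitle = {Proceedings of Some Conference},
  year      = {2020},
}
```

A `\cite{doe2020example}` in the LaTeX source then pulls this entry into the bibliography.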


That's bibtex, for LaTeX.


it's like all the security researchers are cheering for AMD


Dumb question: authors mention this not being exploitable via JS, but what about WASM?


No, WASM doesn’t expose the ability to generate TSX instructions so it can’t be used to exploit this.


Is this coordinated with OS providers? It only mentioned that Intel released microcode.


Is disabling Hyper-threading a viable work-around in this case?


"CacheOut is effective even in the non HyperThreaded scenario, where the victim always runs sequentially to the attacker."


Intel got cash in, but we've got cacheout.


I pointed this problem out to libsodium early last year, but they chose to ignore it.

My memset_s does a full memory barrier, but no others do. Especially the so-called "secure" libs.
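For context, a minimal sketch of a zeroing routine with a full memory barrier might look like this (the function name and approach are illustrative, not libsodium's or the commenter's actual code; assumes GCC/Clang builtins):

```c
#include <stddef.h>
#include <string.h>

/* Calling memset through a volatile function pointer prevents the
 * compiler from proving the stores are dead and eliding them. */
static void *(*volatile memset_ptr)(void *, int, size_t) = memset;

void secure_memzero(void *p, size_t n) {
    memset_ptr(p, 0, n);     /* zeroing the compiler cannot optimize away */
    __sync_synchronize();    /* full memory barrier (mfence on x86):
                                the stores are globally visible before
                                the function returns */
}
```

The barrier costs a few dozen cycles, which is the performance tradeoff being argued about in this subthread.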


> I've pointed this problem out early last year to libsodium, but they chose to ignore it.

I'm guessing you're referencing this [0]. libsodium's response that your suggestion wasn't enough seems reasonable, as does their response:

> `sodium_memzero` should be considered as best-effort, with reasonable tradeoffs.

The documentation for memzero [1] doesn't read like it works 100% of the time, no matter what. It reads like there are tradeoffs involved.

> My memset_s does a full memory barrier, but no others do. Esp. the so-called "secure" libs.

Your memset_s also doesn't work on big endian, whilst "the so-called 'secure' libs" often do.

I'm fine with self-promotion, but don't disparage efforts that have a much larger surface that they deal with, simply because you disagree with the tradeoff they find acceptable. A one-liner isn't enough for that conversation.

[0] https://github.com/jedisct1/libsodium/issues/802

[1] https://download.libsodium.org/doc/memory_management#zeroing...


The tradeoff they are talking about is leaving secrets in the cache, because a full memory barrier (mb) is slow. This is not acceptable for security, only for performance. At the time they had no idea about the upcoming cache exploits; they were only apparent to me. So who was right?


> That time they had no idea about the upcoming cache exploits, they were only apparent to me. So who was right?

I don't see anything that suggests you did have an idea about "upcoming cache exploits", so I'm afraid I can't say that you were.

I saw you mention something that would have helped, but was nowhere near enough, to deal with Spectre, which was then addressed by microcode, kernels, and every other part of the software stack, making a Spectre attack on libsodium something that might not even be possible.

> The tradeoff they are talking about is leaving secrets in the cache, because mb(lb) is slow. This is not acceptable for security, only for performance.

Not entirely. The main reason is this one:

> SPECTRE-like attacks can be conducted during all the time the secret is present, prior to zeroing. The required preconditions and the time window during which a full memory barrier could help, seem to be negligible compared to the actual lifetime of the secret.


Which would be more difficult, reading the secret at some point before the memory was cleared, or reading it _after_ the memory was cleared, but _before_ the cache was overwritten?


The secrets stay in the cache much longer, and are rarely evicted to the heap at all.


Cryptographic operations can take hundreds of thousands of cycles, during which the values can be freely read. Will the cache stay stale, after the values have been cleared, for _longer_ than that?


At least one of the 3 caches will stay stale much longer, yes. Especially the L3. Throwing a tantrum over 50 cycles for the memory barrier is laughable. They should call it fast crypto, not secure crypto.
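A rough x86-only sketch of what evicting a zeroed secret from all cache levels could look like, using the `clflush` instruction (function name illustrative; this is an assumption about one possible mitigation, not code from either library being discussed):

```c
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2) */
#include <stdint.h>
#include <stddef.h>

void zero_and_flush(void *p, size_t n) {
    volatile unsigned char *b = (volatile unsigned char *)p;
    for (size_t i = 0; i < n; i++)
        b[i] = 0;                        /* volatile stores: not elided */

    /* Flush every 64-byte line the buffer touched; clflush
     * invalidates the line from L1, L2, and L3, so the stale
     * secret does not linger until natural eviction. */
    for (uintptr_t a = (uintptr_t)p & ~(uintptr_t)63;
         a < (uintptr_t)p + n; a += 64)
        _mm_clflush((const void *)a);

    _mm_mfence();                        /* flushes complete before return */
}
```

Note this only removes the zeroed-over copy from the caches; it does nothing about the window while the secret was live, which is the other side of the argument above.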


Again, I reiterate: it's not humanly possible to make performant, general purpose CPU that is 100% safe from instruction level attacks.

Untrusted code execution must go, that's the only way.

The genie is out of the bottle now. There will be more and more practical instruction level attacks with each year now.


> Untrusted code execution must go, that's the only way.

This is how you get corporate rule over what can and can't run on machines you own.

Safer architectures can be developed, without handing over control to a third party.


I don't believe that even a complete comparch-101 in-order, non-pipelined architecture without branch prediction or register renaming can be made safe enough.

But yes, the industry made an extremely risky bet a decade ago with both virtualisation, and running unsafe Javascript with JIT.

It will take many billions for the industry to do a U-turn: switch back to dedicated servers as the gold standard, and put a leash on the ambitions of browser makers.


> I don't believe that even a complete comparch-101 in-order, non-pipelined, architecture without branch prediction, and register renaming can be made safe enough.

Disagree. You do have to put a lot of work into the process. Formal spec as well as a formal method/simulation. There are certainly a lot of fun things to consider in that, but I don't think it's completely unfeasible.

As an aside, one of the things that intrigues me about RISC-V is things such as a formal instruction set spec being openly published [1]. You could apply all sorts of tooling to that before actually creating silicon.

> But yes, the industry made an extremely risky bet a decade ago with both virtualisation, and running unsafe Javascript with JIT.

Focusing so much on JITing JS was as bad an idea then as it is now. The entire world of the web is artificially propped up and will probably die the way it should have died when it started: with people frustrated by constant security vulnerabilities, enabled by a group of ad companies who don't care about anything but money.

> It will take many billions for the industry to do a U-turn, and switch back to dedicated servers as a golden standard, and putting a leash on ambitions of browser makers.

WRT dedicated servers, if anything the industry needs that push now anyway. I've yet to see a 'real' cost projection for a SaaS 'rewrite'; i.e., if the current thing has been in use 5 years longer than it should have, your cost models should go out 5 years longer than you intend your new solution to exist.

Which would be ironic; my pain is on the beancounting side (usually what an org really cares about), yet security will be the more likely scenario.

I think the cat is out of the bag for the browser market though. Users have on the whole gotten 're-enclosed' thanks to modern laptops tending towards small eMMC or SSD sizes.

But FFS, this sort of thing is the reason WASM scares the daylights out of me.

[1] https://github.com/mrLSD/riscv-fs


Can't edit, but obviously the web won't die. But it is going to change a bit over the next few years still; the privacy changes are likely to be a big catalyst for a lot of change...


Any idea if the fundamental incompatibility between GDPR and "cloud" is ever going to come home to roost? (You know, how personal data is supposed to be deleted on request, but the whole idea of "deletion" is pretty hard to properly execute in practice without physically destroying the storage, especially for transistor-based storage like SSDs!)


> especially for transistor-based storage like SSDs!

It's actually much easier to render deleted data unrecoverable on a flash based SSD than a hard drive, if you're willing to use only SSDs that eschew a few very common performance optimizations. So that fits right in with the theme.


Is simply unlinking a file in the FS not sufficient for GDPR compliance?


It's supposed to be erased, not put in a pile of "things to write over at a later date".
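A best-effort POSIX sketch of the difference (function name illustrative; this is an assumption about one approach, not a claim of GDPR compliance): overwrite the file's contents and fsync before dropping the directory entry, rather than just unlinking.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <string.h>

/* Overwrite a file with zeros, flush to stable storage, then unlink.
 * Caveat: on SSDs, wear-leveling may still retain stale copies of the
 * old blocks, which is exactly the problem raised in this thread. */
int erase_and_unlink(const char *path) {
    int fd = open(path, O_WRONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }

    char zeros[4096];
    memset(zeros, 0, sizeof zeros);
    off_t left = st.st_size;
    while (left > 0) {
        ssize_t w = write(fd, zeros,
                          left > (off_t)sizeof zeros ? sizeof zeros
                                                     : (size_t)left);
        if (w <= 0) { close(fd); return -1; }
        left -= w;
    }
    fsync(fd);               /* force the overwrite to disk */
    close(fd);
    return unlink(path);     /* only now drop the directory entry */
}
```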


> This is how you get corporate rule over what can and can't run on machines you own.

He isn't necessarily talking about the "untrusted by corporates" situation, maybe just "untrusted by the user".

(Personal pet peeve about "trusted" and "untrusted" --- never neglecting to mention the by who!?)


Every consumer-facing trusted computing platform only allows keys from corporate entities, and usually only software signed with approved keys from said entities are allowed to run.

I can't get behind trusted computing until there is a strong movement and culture behind allowing user freedom on trusted computing platforms. As it stands, that isn't the case at all. Just look at the iPhone and Notarization on the Mac.


>Untrusted code execution must go, that's the only way.

Ironically, you have to enable third party javascript to even view the page.


The page is perfectly readable with no JS at all.


Not "perfectly" at all. The abstract is readable as is the click to the PDF, which is readable. The FAQ list below doesn't open, and the icons which drive the open/close actions appear as "tofu" characters.

However, the data is in all in the document source; the JS just plays with the visibility. I.e. there are no dynamic tricks to populate that.


I don't think so. Side-channel attacks have recently become more mainstream, and people are starting to design stuff with them in mind.

What you see here is yet another attack on a relatively unused Intel-only x86 extension.

The bigger problem is that Unixes and Windows are pretty bad at sandboxing syscalls by default.


I do remember there 100% were academic papers throwing barbs at Pentium 3 branch-predictor security back in the nineties. They just went unnoticed for there being no Google logo on the paper.

People in hardware engineering community knew of their existence long ago. It was just them not being practically exploitable that kept them from headlines.

But now we have a multibillion-buck virtualized hosting industry, and JS with JIT in every browser: a million-buck incentive for black hats to poke into architectural vulnerabilities.


> back in nineties. They just went unnoticed for there being no Google logo on the paper.

Google wasn't nearly as big a deal back in the 90's (hell, they were only really around the last couple years of the decade) so that's a weird association to make.


You can avoid instruction-level attacks by putting untrusted code on an isolated core with an isolated memory module, right? That seems more likely to me than everyone disabling javascript.


> You can avoid instruction-level attacks by putting untrusted code on an isolated core with an isolated memory module, right?

No, unfortunately. Side channels will work even across numa domains, and cores with own memory.

Side-channel-free hardware is extremely hard to do even in the simplest ISAs specially made for that. Look at how the credit card industry keeps struggling to make safe smartcards.


Any sufficiently powerful side channel that works across a network goes beyond "running untrusted code", though. It's a whole different level of problem to solve.

And we should be able to eliminate anything weaker.


Maybe with the recent influx of larger core count systems, this isn't entirely unreasonable in the future.


These CPU vulns have all been related to performance optimizations. Do you have any reason behind your claim that a CPU without such optimizations would be vulnerable?


It's harder to believe that there is _no way to leak information_ than the opposite. There is always a side channel, right? Timing is the most trivial one.


When every instruction takes the same amount of time (no speculative execution or cache sharing), then no, there is no timing attack. There isn't any reason to support the claim that all CPUs are fundamentally vulnerable to these types of attacks.
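The same principle applies at the software level: a routine whose instruction sequence is independent of the data cannot leak through timing. A standard illustrative example is a constant-time comparison (function name illustrative):

```c
#include <stddef.h>

/* Always touches every byte and performs identical operations
 * regardless of where (or whether) the inputs differ, so the
 * loop's timing reveals nothing about the data. Returns 0 iff
 * the two buffers are equal. */
int ct_compare(const unsigned char *a, const unsigned char *b, size_t n) {
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= a[i] ^ b[i];   /* accumulate differences, no early exit */
    return diff;
}
```

Contrast with a naive `memcmp`, which returns at the first mismatch and thereby leaks the position of the first differing byte through its running time.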


Then it should be written on the product packaging. Big letters.



