
The timing attacks result in data leakage between processes, even if the processes are in different VMs.

It's a huge problem for cloud providers, but most customers don't fully understand the issue, and cloud providers aren't forthcoming about the fact that the "mitigations" are partial at best, so it's really just a huge problem for cloud customers.




Yes, but how are the timing attacks an inherent vulnerability rather than the result of an accounting error? If the bookkeeping is done properly and the correct amount of cleanup is done on a speculated branch, I don't see how adjacent processes would leak data.
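For concreteness, a minimal C sketch (illustrative names, not taken from any particular codebase) of the kind of speculated branch being discussed: the register state is rolled back once the misprediction is discovered, but the cache line pulled in during the transient window is not, and that is what a timing attack observes.

    /* Sketch of the classic Spectre v1 pattern; array1, array1_size and
     * probe are illustrative names. */
    #include <stddef.h>
    #include <stdint.h>

    extern uint8_t array1[];             /* victim data                      */
    extern size_t  array1_size;
    extern uint8_t probe[256 * 4096];    /* attacker-observable probe buffer */

    void victim(size_t x)
    {
        volatile uint8_t sink = 0;
        if (x < array1_size) {            /* branch trained to predict "in bounds"         */
            uint8_t secret = array1[x];   /* transient out-of-bounds read                  */
            sink = probe[secret * 4096];  /* secret-dependent cache fill survives rollback */
        }
    }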


This is an idea many have had before, but it doesn't quite work. When you do this, you tend to lose all the performance gained from speculative execution. It's essentially data-independent timing applied to loads and stores: you have to treat all cache hits as if they were misses to DRAM, which is not particularly appealing from a performance standpoint.

This is not to mention the fact that you can use transient execution itself (without any side channels) to amplify whether a single cache line is present or not into >100ms of latency difference. Unless your plan is to burn 100ms of compute time to hide such an issue (nobody is going to buy your core in that case), you can't solve the problem this way.


Why hits to DRAM? Just use the cache for speculated branches. The performance gain from the difference between the length of the speculated branch and the length of the bookkeeping is still there. There are workloads with short branches that would take a performance penalty; in those cases it would be helpful to have a flag in the instruction encoding to disable speculative execution.
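For what it's worth, current ISAs approximate such a flag with explicit barrier instructions rather than a per-instruction bit. A minimal sketch in C with inline assembly (the instructions are real, the usage is simplified): placed right after a branch, it keeps younger instructions from executing until the branch has resolved, at the cost of stalling every time.

    /* Speculation barrier: younger instructions are held back until
     * everything older (including the branch condition) has resolved. */
    static inline void speculation_barrier(void)
    {
    #if defined(__aarch64__)
        __asm__ __volatile__("dsb sy\n\tisb" ::: "memory");  /* AArch64 barrier sequence */
    #elif defined(__x86_64__) || defined(__i386__)
        __asm__ __volatile__("lfence" ::: "memory");         /* serializes earlier loads */
    #endif
    }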


It's not that simple. The problem is not just branches but often the intersection of memory and branches. For example, a really powerful technique for amplification is this:

    ldr x2, [x2]
    cbnz x2, skip
    /* bunch of slow operations */
    ldr x1, [x1]
    add x1, x1, CACHE_STRIDE
    ldr x1, [x1]
    add x1, x1, CACHE_STRIDE
    ldr x1, [x1]
    add x1, x1, CACHE_STRIDE
    ldr x1, [x1]
    add x1, x1, CACHE_STRIDE
  skip:

Here, if the branch condition is predicted not taken and ldr x2 misses in the cache, the CPU will speculatively execute long enough to launch the four other loads. If x2 is in the cache, the branch condition will resolve before we execute the loads. This gives us a 4x signal amplification using absolutely no external timing, just exploiting the fact that misses lead to longer speculative windows.

After repeating this procedure enough times and amplifying your signal, you can then directly measure how long it takes to load all these amplified lines (no mispredicted branches required!). Simply start the clock, load each line one by one in a for loop, and then stop the clock.
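A minimal C sketch of that measurement step, with illustrative names (probe, NLINES, CACHE_STRIDE); at this amplified scale a coarse timer such as clock_gettime() is plenty, which is the point of the amplification.

    #include <stdint.h>
    #include <time.h>

    #define NLINES       4
    #define CACHE_STRIDE 4096                   /* assumed: one target line per page */

    extern uint8_t probe[NLINES * CACHE_STRIDE];

    /* Time one ordinary pass over the amplified lines: fast if the
     * transient gadget brought them into the cache, slow otherwise. */
    uint64_t time_probe(void)
    {
        struct timespec t0, t1;
        volatile uint8_t sink = 0;

        clock_gettime(CLOCK_MONOTONIC, &t0);    /* start the clock          */
        for (int i = 0; i < NLINES; i++)
            sink = probe[i * CACHE_STRIDE];     /* load each amplified line */
        clock_gettime(CLOCK_MONOTONIC, &t1);    /* stop the clock           */

        (void)sink;
        return (uint64_t)(t1.tv_sec - t0.tv_sec) * 1000000000ull
             + (uint64_t)(t1.tv_nsec - t0.tv_nsec);
    }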

As I mentioned earlier, unless your plan is to treat every hit as a miss to DRAM, you can't hide this information.

The current sentiment on Spectre mitigations is that once information has leaked into a side channel, you can't do anything to stop attackers from extracting it. There are simply too many ways to expose uarch state (and caches are not the only side channel!). Instead, your best and only bet is to prevent important information from leaking in the first place.
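A minimal C sketch of that "stop it at the source" approach, modeled on the index-masking idea used in practice (e.g. Linux's array_index_nospec); the helper and function names here are illustrative.

    #include <stddef.h>
    #include <stdint.h>

    /* All ones when idx < size, zero otherwise, computed without a branch
     * the predictor could get wrong. Relies on arithmetic right shift of
     * a negative value, as the kernel's generic version does. */
    static inline size_t index_mask_nospec(size_t idx, size_t size)
    {
        return (size_t)(~(int64_t)(idx | (size - 1 - idx)) >> 63);
    }

    uint8_t read_element(const uint8_t *arr, size_t size, size_t idx)
    {
        if (idx < size) {
            idx &= index_mask_nospec(idx, size);  /* clamps idx to 0 even if the
                                                     bounds check was mispredicted */
            return arr[idx];
        }
        return 0;
    }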


Some timing differences are inherent, but whether they are exploitable is the real question. There are papers and tools that can give you high confidence that you are not leaking.


Much of transient-execution research over the years has been invalidated or was completely bogus to begin with. It was extremely easy to get a paper into a conference for a while (and frankly still is) just by throwing in the right words, because most people don't understand the issue well enough to tell which techniques are real and practical and which are totally non-functional.

You have to stop the leak into side channels in the first place; it's simply not practical to prevent secrets from escaping once they are in a side channel. This is, unfortunately, the much harder problem, with much worse performance implications (and indeed the reason why Spectre v1 is still almost entirely unmitigated).



