'Remarkably, the patent explicitly states that: "if there is a hit at operation 302 [partial match using page offsets] and the physical address of the load or the store operations is not valid, the physical address check at operation 310 [full physical address match] may be considered as a hit"'

Wait, Intel thought the insecure optimization that led to this vulnerability was such a great idea that they actually patented it? Wow.


The load and store buffers are an incredibly performance-sensitive part of the processor. To be maximally conservative, you'd fully resolve the physical addresses of the load and the store (including the access check) before allowing the load to bypass a non-aliasing store, or before forwarding store data to a load. But you could be waiting a while (for example, many processors have multiple levels of TLBs), during which time you've held up the load for no good reason.

The thing to understand about the quoted portion (here is the patent: https://patents.google.com/patent/US7603527B2/en?oq=7%2c603%...) is that "the physical address of the load or the store operations" may not be valid (i.e., resolved) until long after you get to operation 302. The quoted paragraph is about how you still get the correct result even if you optimistically forward the store data to the load.

The gist of the optimization is as follows. Operation 302 checks whether the page offset of a load matches that of a store. Address translation won't affect the page offset (just the page number), so if there is a match, there is a good chance that the operations alias. Now, say you do a store followed by a load. At operation 302, you may not have the actual physical address of the load and/or the store. But if there is a match at operation 302 (in the page offset), there is a good chance you're just loading from a location you recently stored to. Optimistically forwarding the data from the store buffer to the load allows the load to continue to make progress. Otherwise, you'd have to wait for both physical addresses to be resolved at operation 310 before the load could continue.
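
To make that concrete, here is a minimal software sketch (C, not Intel's actual hardware logic; the struct fields, function names, and 4 KiB page size are my assumptions) of the conservative full-address check versus the optimistic page-offset heuristic described in the patent:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_OFFSET_MASK 0xFFFull   /* low 12 bits of a 4 KiB page */

    /* Hypothetical view of an in-flight load or store in the buffers. */
    struct mem_op {
        uint64_t virt_addr;
        uint64_t phys_addr;
        bool     phys_valid;            /* has address translation finished? */
    };

    /* "Operation 302": partial match on page offsets only. Translation
     * changes the page number but never the offset within the page, so a
     * matching offset is a cheap, early hint that the operations alias. */
    static bool offsets_match(const struct mem_op *ld, const struct mem_op *st)
    {
        return (ld->virt_addr & PAGE_OFFSET_MASK) ==
               (st->virt_addr & PAGE_OFFSET_MASK);
    }

    /* Conservative rule: forward store data to the load only once both
     * physical addresses are resolved and actually equal. The load stalls
     * while either translation is still in flight. */
    static bool conservative_forward(const struct mem_op *ld, const struct mem_op *st)
    {
        return ld->phys_valid && st->phys_valid &&
               ld->phys_addr == st->phys_addr;
    }

    /* Optimistic rule from the quoted passage: if the offsets match ("a hit
     * at operation 302") and a physical address is not yet valid, treat the
     * full check ("operation 310") as a hit and forward now; the result is
     * re-checked once translation completes. */
    static bool optimistic_forward(const struct mem_op *ld, const struct mem_op *st)
    {
        if (!offsets_match(ld, st))
            return false;
        if (!ld->phys_valid || !st->phys_valid)
            return true;                /* speculate: probably the same location */
        return ld->phys_addr == st->phys_addr;
    }

    int main(void)
    {
        struct mem_op st = { .virt_addr = 0x7000123, .phys_addr = 0, .phys_valid = false };
        struct mem_op ld = { .virt_addr = 0x9000123, .phys_addr = 0, .phys_valid = false };

        /* Same page offset (0x123), translations still pending: the
         * conservative rule stalls, the optimistic rule forwards. */
        printf("conservative: %d, optimistic: %d\n",
               conservative_forward(&ld, &st), optimistic_forward(&ld, &st));
        return 0;
    }

In hardware the optimistic path also has to re-check and replay the load once the real physical addresses arrive; that recovery is what keeps the optimization architecturally correct (though, as we now know, not microarchitecturally invisible).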

As an aside, I think de-tuning this at the microcode or architectural level is probably a fool's errand. What you need is an architectural mode that basically says "this code needs to be protected from information leakage due to timing attacks." Then you can turn off speculation or whatever in such code.
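
For what it's worth, Linux now exposes a per-thread knob in roughly this spirit for the store-buffer case: a thread that handles secrets can ask the kernel to disable the speculative store bypass optimization via prctl(2). It's narrower than a general "no timing leaks" mode, but it's a real example of the opt-in approach. A minimal sketch (Linux-specific; the constants are fallbacks for older libc headers):

    #include <stdio.h>
    #include <sys/prctl.h>

    #ifndef PR_SET_SPECULATION_CTRL
    #define PR_SET_SPECULATION_CTRL 53
    #endif
    #ifndef PR_SPEC_STORE_BYPASS
    #define PR_SPEC_STORE_BYPASS 0
    #endif
    #ifndef PR_SPEC_DISABLE
    #define PR_SPEC_DISABLE (1 << 2)
    #endif

    int main(void)
    {
        /* Ask the kernel to disable speculative store bypass for this thread.
         * Fails on kernels or CPUs that don't offer the control. */
        if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
                  PR_SPEC_DISABLE, 0, 0) != 0) {
            perror("prctl(PR_SET_SPECULATION_CTRL)");
            return 1;
        }
        puts("speculative store bypass disabled for this thread");
        return 0;
    }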


Having a mode switch isn't enough; potentially any part of the program could contain a Spectre gadget, so you'd need to run the whole program in that mode.

Perhaps a better approach would be to ensure that speculation aborts actually clean up the entire microarchitectural state, so that the cache state (and other persistent state) isn't affected by aborted speculative execution.


Specifically, the question is whether the current process (user or kernel) has mapped, anywhere in its address space, data that the attacker[0] should not be able to access.

0: i.e., there exists an entity (other than the end user) that should not have access.


Also, good luck working the aforementioned mode switch into every darn programming language, and finding all the places in existing code bases where it has to be applied, and then doing the work.


It also means that this was laid out in writing for security researchers, but research came there none until the last couple of years. As I've said before, the speculative-execution CPU bugs really seem to be the financial crisis or replication crisis of computing: https://news.ycombinator.com/item?id=16105385
