
Intel disables Hardware Lock Elision on all current CPUs - my123
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2955f270a84762343000f103e0640d29c7a96f3
======
wahern
The title "Intel disables hardware lock elision on all current CPUs" seems too
broad. Intel is disabling the backward ISA compatible implicit HLE capability
--I can't remember exactly how it worked without looking it up, but IIRC it
was a hack that leveraged existing cache coherency and ISA semantics to permit
optimized spin-lock implementations that _without_ _feature_ _detection_
_still_ _worked_ _correctly_ on older chips.

Explicit TSX instructions are now merely opt-out, where previously they
couldn't be disabled by the kernel. The posted patch doesn't seem to disable
it but simply adds some of the bits needed to do so.

EDIT: Okay, I guess it _is_ disabling Hardware Lock Elision (HLE), proper. But
"hardware lock elision", lower case, implies all TSX-based lock elision code
will stop being optimized. AFAIU RTM (explicit TSX) is usually only used for
lock elision, anyhow; it's more of a microcode and speculation hack than a
generic transactional memory feature, so the dividends quickly become elusive
for anything beyond lock elision.

~~~
my123
> The other TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally
> disabled by the new microcode but still enumerated as present by
> CPUID(EAX=7).EBX{bit4}, unless disabled by IA32_TSX_CTRL_MSR[1] -
> TSX_CTRL_CPUID_CLEAR.

With HLE disabled, the code still works because of the backward ISA
compatibility... but you lose the performance benefits of HLE.

Also, disabling it but still keeping it shown in CPUID...
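
For reference, the enumeration bit in the quote can be decoded in a few lines. This is a minimal sketch, assuming you already have the raw EBX value returned by CPUID leaf 7 (subleaf 0). HLE at bit 4 comes from the quote; RTM at bit 11 comes from Intel's documentation rather than this thread, and the function name `tsx_enumerated` is made up for illustration.

```python
HLE_BIT = 4   # CPUID.(EAX=7,ECX=0):EBX bit 4, as in the quote above
RTM_BIT = 11  # CPUID.(EAX=7,ECX=0):EBX bit 11 (assumption: taken from Intel's docs)


def tsx_enumerated(ebx: int) -> dict:
    """Report which TSX sub-features a raw EBX value enumerates."""
    return {
        "hle": bool((ebx >> HLE_BIT) & 1),
        "rtm": bool((ebx >> RTM_BIT) & 1),
    }
```

This only reports what CPUID claims. Per the quote, HLE can still be enumerated after the microcode update even though it no longer elides anything.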

~~~
the-dude
Is anything known about how much performance will be lost?

~~~
NicoJuicy
[https://twitter.com/damageboy/status/1194751035136450560?s=1...](https://twitter.com/damageboy/status/1194751035136450560?s=19)

Up to 20% for Array.Sort

~~~
asveikau
Why would Array.Sort have locking in need of elision?

[I'm reminded of the old Java "Vector" or "StringBuffer" that put a lock
around every operation "because thread safety is good" which had high cost for
single threaded usage. But this is likely not what this is about.]

~~~
saagarjha
It’s a different bug that requires extra padding for branching instructions.

------
comex
Shame. When TSX was announced in 2012, I started looking forward to a day,
years down the line, when it would be ubiquitous enough that I could write
code that depended on it. (At least on Macs, which don't run AMD processors.)
The first processors supporting it shipped in 2013... but then in 2014 Intel
abruptly disabled it in all then-shipping processors via microcode update, due
to an erratum. Eventually they started shipping processors without the bug,
and 4-5 years passed, a decent chunk of the time needed for ubiquity. But now
the clock's reset to zero once again.

Though, apparently Intel always disabled TSX on i3 processors for product
segmentation, so maybe universal support was never in the cards...

I wish more CPUs would support at least a limited version of this. Basically,
I want the equivalent of atomic<shared_ptr<T>>, but lock-free. That requires
reading a pointer value from memory and then incrementing a reference count
stored at that pointer, as a single atomic operation. I'm pretty sure that's
doable with TSX, although I haven't actually tried it.

~~~
roblabla
FWIW, my understanding is that XBEGIN/XEND/XABORT are still available on the
affected CPUs. Only XACQUIRE/XRELEASE are disabled. So the clock isn't really
reset.

I, too, wish TSX was more ubiquitous. I'm working on a kernel, and was hoping
to use TSX to greatly simplify the logic of safely reading user memory from
kernelspace - catching various exception cases without going through the real
exception handler.

Turns out, none of the machines I have at my disposal have TSX - not my
Desktop PC nor my server machine. So, RIP that plan, I guess.

What you want is absolutely doable with TSX. The overhead might be significant
though. I wouldn't be surprised if locking a mutex was faster.

~~~
jdsully
You can’t simplify the logic because TSX transactions may abort for any reason
and make no promises of forward progress. You must implement a fallback
codepath.

If you meant having an optimized codepath then that is doable. But given you're
writing a kernel there may be microarchitectural hazards that trigger
excessive aborts.
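
The "fast path plus mandatory fallback" shape described above might look roughly like this. It is only a sketch of the control flow: `try_transaction` is a hypothetical stand-in for an XBEGIN/XEND region (Python cannot issue those instructions), and `max_attempts=3` is an arbitrary choice, not a recommended value.

```python
import threading


def run_with_fallback(try_transaction, critical_section, lock, max_attempts=3):
    """Try an optimistic fast path a bounded number of times, then take the lock.

    try_transaction: hypothetical callable standing in for a hardware
        transaction; it performs the work itself and returns True if it
        committed, False if it aborted.
    critical_section: the same work, done under the ordinary lock.
    """
    for _ in range(max_attempts):
        if try_transaction():
            return "fast"
    # The transaction gave no forward-progress guarantee, so fall back.
    with lock:
        critical_section()
    return "fallback"
```

A real implementation would also have to read the lock word inside the transaction so that a thread on the fallback path aborts concurrent transactions; that detail is omitted here.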

~~~
roblabla
Yeah, at first I wanted to use TSX exclusively, but I came to the same
conclusion you did while digging into it more deeply.

I still think having a fast path using TSX could be useful, but since I never
had a CPU with it, I never had a chance to benchmark it.

~~~
jdsully
In some cases it can have a huge speedup. But it's rather tricky to get right.
So far for almost all of my use cases the transaction sizes have been too
large, and it almost always aborted.

The RPCS3 PS3 emulator saw a 40% perf boost, so some amazing gains are
possible.

[https://www.phoronix.com/scan.php?page=news_item&px=RPCS3-In...](https://www.phoronix.com/scan.php?page=news_item&px=RPCS3-Intel-TSX-Support)

------
jakeogh
Is [https://github.com/speed47/spectre-meltdown-checker](https://github.com/speed47/spectre-meltdown-checker)
the best tool? Impressive chunk of sh.

~~~
izietto
What do you have against that script? It's a long script but it's pretty clean
and readable

~~~
jakeogh
Oh, nothing! I think it's beyond excellent.

I ran shellcheck to see if it was bash or sh and it didn't make a peep :). My
question was just whether there is another tool of similar quality. I see how
you could interpret it the other way, thanks for asking.

~~~
izietto
Oh I see now, I misinterpreted the sense of the question :)

------
shifto
Is there a place where I can find how much performance my Haswell CPU has lost
due to all of these 'fixes'?

~~~
jotm
Haswell doesn't have this feature

~~~
saagarjha
I believe Haswell _does_ have Hardware Lock Elision, but it was disabled by a
microcode update.

~~~
jdsully
It was disabled because it didn’t work. So I guess it’s in the eye of the
beholder.

------
mehrdadn
Uh, so there's absolutely no way to avoid disabling HLE (unless you block
microcode updates altogether)? Can people who paid extra for a computer whose
CPU has HLE get a refund?

~~~
age_bronze
You can avoid it. They didn't disable it, but implemented an MSR that disables
it. Don't configure this MSR to disable HLE and you're good.

~~~
mehrdadn
But it says _"Hardware Lock Elision (HLE) is unconditionally disabled by the
new microcode"_?

------
zatertip
[https://ieeexplore.ieee.org/document/6877452](https://ieeexplore.ieee.org/document/6877452)
describes a speedup of 41% when using TSX under a specific HPC workload.

[https://www.phoronix.com/scan.php?page=news_item&px=RPCS3-In...](https://www.phoronix.com/scan.php?page=news_item&px=RPCS3-Intel-TSX-Support)
describes a 40% speedup for an emulation workload with TSX.

Do we know what the slowdown will be for current gen CPUs for various
workloads?

~~~
bonzini
As far as I know the only software that really wants it is SAP HANA. The
slowdown should be small: that 40% speedup on emulation is specifically on
load-locked/store-conditional instructions, and the linked blog post says
"non-TSX CPUs such as Ryzen [had] a noticeable improvement in performance,
although not to the same extent".

------
topbanana
How do microcode changes actually work? My mental model of a CPU is hard baked
logic paths.

~~~
iforgotpassword
Modern CPUs are more like VMs. The actual architecture is totally different
from x86 and just pretending to be x86 to the outside world. X86 instructions
get translated to the native instruction set of the CPU which is more or less
"secret". This makes it very easy to patch issues with the CPU through
microcode, as seen in this case.

~~~
nwallin
I would go so far as to say that modern CPUs are emulators, not even VMs.
Something like 20% of the die area is devoted to instruction decoding.

It's the only reason I'm interested in RISC-V or advanced ARM stuff. Even
though there's a metric fuckton more effort going into the latest Intel x86_64
chip, there's a lot of silicon left on the table.

------
jstanley
How do the microcode upgrades get delivered? Do people have to manually
install them, or do Intel have some way to force a microcode update over the
Internet?

~~~
microcolonel
On Windows it's handled in Windows Update, and maybe there's a way to disable
loading the new microcode. On Linux, it's explicitly provided to the kernel by
whatever userspace you have. On Arch, for example, it's in a separate package
called intel-ucode.

Some board firmware loads updated microcode when it's updated. It has to be
loaded at each boot by software in order to change.
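
To see which revision actually ended up loaded on a Linux box, one option is to parse the `microcode` field that x86 kernels print in `/proc/cpuinfo`. A minimal sketch, assuming that field is present in the usual `microcode : 0x...` form; the function name `microcode_revisions` is made up for illustration.

```python
import re


def microcode_revisions(cpuinfo_text: str) -> set:
    """Return the distinct microcode revisions found in /proc/cpuinfo-style text."""
    matches = re.findall(
        r"^microcode\s*:\s*(0x[0-9a-fA-F]+)", cpuinfo_text, flags=re.MULTILINE
    )
    # int(..., 16) accepts the leading "0x" prefix.
    return {int(rev, 16) for rev in matches}
```

Usage would be something like `microcode_revisions(open("/proc/cpuinfo").read())`. Comparing the result before and after installing the distribution's microcode package shows whether the update was applied at boot.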

~~~
Darkstack
On Windows you can disable / enable some of the mitigations via the registry.
[https://support.microsoft.com/en-us/help/4072698/windows-ser...](https://support.microsoft.com/en-us/help/4072698/windows-server-speculative-execution-side-channel-vulnerabilities-prot)

------
Too
Read the whole thing and thought it was an article, reached the Signed-off-by
at the end and realized it was a commit message.

This is what a good commit message should look like! Telling what, why and how
something was fixed, instead of just "fixed X".

------
Forge36
Is this related to:
[https://news.ycombinator.com/item?id=21534232](https://news.ycombinator.com/item?id=21534232)

Or do we have two bugs being fixed at the same time?

~~~
saagarjha
It’s a somewhat unrelated bug that also had a workaround released for it
recently.

------
ddtaylor
Does this have any effect on AMD systems?

~~~
_-___________-_
While the link is to a Linux commit message, the described change (disabling
HLE) is implemented in an Intel microcode update, which of course will not
have any effect on an AMD processor.

------
krzyk
Is there a way to NOT get that "fix"? I don't care for security, I prefer
performance. A magic kernel switch to not load new microcode? Or a switch in
my distro (Debian)? Something similar to `mitigations=off` in current kernels.

------
jSully24
Each time I’ve seen this story today as I scrolled down the list my first
thought has been “Why and how did Intel disable Larry Ellison on CPUs? And why
didn’t they do it years ago?”

Maybe it’s just been a long week...

------
age_bronze
They do not disable it. They only implement an MSR that allows you to disable
it. If you want to stay with updated microcode and with HLE, just don't
configure it as disabled.

~~~
jdsully
Yes they have. The only conditionally disabled feature is xbegin/xend/xabort.

From the article:

> The other TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally
> disabled by the new microcode...
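
To make the distinction concrete, here is a small sketch that decodes an IA32_TSX_CTRL value. Bit 1 (TSX_CTRL_CPUID_CLEAR) is from the quoted commit message; the MSR address 0x122 and bit 0 (RTM disable) are assumptions taken from Intel's documentation, not from this thread, and `describe_tsx_ctrl` is a made-up name.

```python
MSR_IA32_TSX_CTRL = 0x122      # assumption: address per Intel's documentation
TSX_CTRL_RTM_DISABLE = 1 << 0  # assumption: bit 0 disables RTM (XBEGIN always aborts)
TSX_CTRL_CPUID_CLEAR = 1 << 1  # bit 1, as quoted: hides the TSX bits in CPUID


def describe_tsx_ctrl(value: int) -> dict:
    """Decode the two control bits of a raw IA32_TSX_CTRL value."""
    return {
        "rtm_disabled": bool(value & TSX_CTRL_RTM_DISABLE),
        "cpuid_cleared": bool(value & TSX_CTRL_CPUID_CLEAR),
    }
```

Neither bit turns HLE back on, which matches the quote: the MSR controls RTM and the CPUID enumeration, while HLE is disabled by the microcode regardless.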

------
kd3
What percentage of functionality has Intel already disabled on their CPUs?
25%? Soon they'll have to disable the entire CPU. It's fucking hilarious.

------
Yuioup
... in the Linux kernel.

~~~
mehrdadn
... in the microcode updates. Which any of your BIOS, UEFI, or OS may and
likely will apply.

~~~
maddyboo
If you're on Linux, you may need to install a package to get µcode updates.
The Arch Wiki has a particularly informative article on the subject:

[https://wiki.archlinux.org/index.php/Microcode](https://wiki.archlinux.org/index.php/Microcode)

------
baybal2
I think speculative execution is in principle incompatible with untrusted code
execution. Even if CPU makers place memory protection in front of
speculative execution, and not behind as it is now, any untrusted
code/bytecode can still pwn the process running it, and there is no way to
work around that as such.

~~~
ben-schaaf
If the only thing in the process is the untrusted code (+ interpreter) then
there's nothing to pwn.

~~~
baybal2
Indeed, and that's the reason why a lot of current "sandboxing" efforts are
rather misguided.

There is no reason to filter syscalls from some kind of bytecode with an
interpreter run with full privileges if you simply run all of that
unprivileged and you already have all syscalls hardened and ACLed.

But for as long as there is the remotest possibility of a process being able to
get around the MMU, there is no reason to do that either.

