
Why Skylake CPUs Are Sometimes 50% Slower (2018) - wheresvic4
https://aloiskraus.wordpress.com/2018/06/16/why-skylakex-cpus-are-sometimes-50-slower-how-intel-has-broken-existing-code/
======
ajross
This is from 2018. It was extensively discussed at the time.

The broad consensus was that it was the .NET runtime's spin implementation at
fault, both because of the suboptimal use of the PAUSE instruction itself and
because of the overall design that had excessive spinning in a few spots.
Effectively, IIRC, they were putting the "pause" instruction on the hot path
where they needed to exit the loop synchronously, and that's just wrong.
Basically no one sane would have looked at a change log that said "exit latency
from pause will now be similar to a pipeline stall" and done anything but shrug.

It's since been addressed in .NET, as far as I know. And I'm not aware of any
other affected code.

~~~
dvxvd
So.. since 2018.. it was 2.5 years ago.. and nothing changed? Obviously, who
cares about ex-customers.. fresh ones are generating bigger bonuses...

~~~
tecleandor
It was fixed.

[https://github.com/dotnet/runtime/issues/8744](https://github.com/dotnet/runtime/issues/8744)

~~~
m0xte
Not completely. I've seen hard deadlocks on ReaderWriterLockSlim on high
thread counts on 8+ core nodes on .Net core (2.1). Literally all threads in a
process jammed on spin waits. This is especially frustrating as the concurrent
collections sit on top of that steaming pile of crap.

I've learned to walk in the opposite direction of .Net now.

~~~
m0xte
Replying to self as I can't edit this now. Whenever I am critical of the .Net
framework or CLR there is a flurry of downvotes suddenly. Goes from 2 to -1.
It's almost as if there's someone at MSFT saying "look at this shit" to their
colleagues in the office.

Sorry but this was a monumental piece of shit that took us out in production
numerous times on classic .Net framework/CLR and I couldn't even get through
to someone who gave a shit despite being a gold partner at the time and
raising it on Connect. On Core, standard practice is to work around it because
any time you open a ticket it gets autoclosed or ignored or just steamrolled
(like the telemetry tickets)

I have no choice but to look elsewhere because I know any concerns I raise are
ignored. MSFT dug this grave. The company is still the sick old dog it always
was. Just a new shiny gown.

------
londons_explore
I think this article misses a _major_ point...

PAUSE used to have a fixed quite short latency.

It now has a variable latency. I would hazard a guess that the latency
strongly depends on what the _other_ cores on the machine are doing, and on the
other thread on the same core.

Specifically, I would guess Intel saw pause being used in spinlocks, and
saw a lot of power and time on functional units being wasted on locks that
spin a lot. They decided to deliberately slow down a spinlock's spinning to
use less power, and less of other functional units, and in doing so will free
up functional units for the computations on the other hyper-thread and
power/thermal headroom for other cores in the same CPU.

By doing so, the thought is that overall, the computation should complete
sooner.

If both hyperthreads on a core execute pause, I'd expect the latency to drop
back to very low.

Yet that reasoning didn't work in this case... The only explanation I can
think of is that while some threads are executing pause instructions, other
threads are executing other kinds of spinlocks without pause instructions.
Those other spinlocks end up spinning faster now they have more functional
units, but still making no forward progress.

The overall result is the computation isn't sped up, and you still pay a
latency penalty to notice the computation is done, hence slower overall.

~~~
saagarjha
I don't actually think that was the point. Here the code was literally
spinning for dozens of milliseconds, running multiple pause instructions
sequentially. I'm no expert in this kind of thing, but spinning for that long
without checking the condition is really strange, and caused them to actually
be hit by the length of the pause instruction very severely since they were
essentially spinning for something directly proportional to the length of a
pause.
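
A minimal sketch of the alternative, assuming a plain flag as the wait condition: re-check the condition between short, capped bursts of pause hints, so that the gap between two checks is bounded no matter how long a single pause takes. This is not the .NET runtime's code. `spin_until_set`, `MAX_PAUSES_PER_CHECK`, and the doubling backoff are hypothetical names and choices made up for illustration. `std::hint::spin_loop()` is the Rust standard library call that compiles to PAUSE on x86 targets.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical cap on pause hints issued between two checks of the flag.
const MAX_PAUSES_PER_CHECK: u32 = 64;

/// Spin until `flag` is set, checking it at most `max_checks` times.
/// Returns Some(total pause hints issued) if the flag was seen set,
/// or None if every check failed and the caller should block instead.
fn spin_until_set(flag: &AtomicBool, max_checks: u32) -> Option<u64> {
    let mut pauses_per_check = 1u32;
    let mut total_pauses = 0u64;
    for _ in 0..max_checks {
        // Check the condition first, so an already-set flag costs no pause.
        if flag.load(Ordering::Acquire) {
            return Some(total_pauses);
        }
        // Short burst of pause hints before the next check.
        for _ in 0..pauses_per_check {
            hint::spin_loop();
        }
        total_pauses += u64::from(pauses_per_check);
        // Back off exponentially, but never beyond the cap.
        pauses_per_check = (pauses_per_check * 2).min(MAX_PAUSES_PER_CHECK);
    }
    None
}
```

With this shape, a longer pause stretches each gap between checks, but the wait stays bounded by the cap times the pause latency rather than by one long uninterrupted run of pauses.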

~~~
TwoBit
Possibly the original authors benchmarked against CPUs of the time and found
that multiple pauses worked best.

------
Animats
Why is .NET using spinlocks in user space, anyway? That's usually a kernel
technique, where you can't yield to another thread.

~~~
lallysingh
Userland has spinlocks quite often. The thing you're locking is often soon
unlocked. It's not worth the time to reschedule.

Futex-based spinlocks do what, 1000 spin cycles before actually futex(2)ing?

~~~
ahupp
> The thing you're locking is often soon unlocked.

If the holder of the lock got preempted you could be spinning for a very long
time.

~~~
titzer
This is why spinlocks don't spin forever; they will break out of the spinloop
and do a full-fledged OS-level lock acquisition, which will deschedule the
thread.
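
A rough sketch of that spin-then-block pattern, using only the Rust standard library. The helper name `lock_hybrid` and the budget `SPIN_LIMIT` are hypothetical; real implementations tune the budget empirically and usually sit directly on top of a futex or similar primitive rather than on `Mutex::try_lock`.

```rust
use std::hint;
use std::sync::{Mutex, MutexGuard};

/// Hypothetical spin budget before giving up and blocking.
const SPIN_LIMIT: u32 = 100;

/// Try to take the lock by spinning briefly, then fall back to a blocking
/// acquire. Returns the guard and the number of failed spin attempts.
fn lock_hybrid<T>(m: &Mutex<T>) -> (MutexGuard<'_, T>, u32) {
    for attempt in 0..SPIN_LIMIT {
        // Non-blocking attempt: succeeds immediately if the lock is free.
        if let Ok(guard) = m.try_lock() {
            return (guard, attempt);
        }
        // Hint to the CPU that we are in a spin loop (PAUSE on x86).
        hint::spin_loop();
    }
    // Budget exhausted: block in the OS so the thread can be descheduled.
    // unwrap() panics on a poisoned mutex, which is acceptable for a sketch.
    (m.lock().unwrap(), SPIN_LIMIT)
}
```

An uncontended call returns on the first `try_lock` with zero spins; only a lock that stays held past the whole budget reaches the blocking `lock()` call.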

~~~
lokedhs
You still have the problem if your spinlock gets preempted while spinning. You
really should only use a spinlock if you can disable preemption, which is
usually limited to the kernel or realtime processes.

~~~
titzer
It's no worse than being preempted anywhere else, as long as your spin
sequence doesn't preclude other threads from acquiring that lock when it is
released; in fact it is often better. If you are preempted while spinning to
acquire a lock then that is working as intended (WAI), because you yield the
CPU to runnable work, instead of burning it.

> You really should only use a spinlock if you can disable preemption, which
> is usually limited to the kernel or realtime processes.

Oh god, no, don't do that. Then you could livelock if you don't do a hybrid
strategy, because you are totally hosed if all your cores enter the
unpreemptable spin sequence.

Maybe you are thinking of a different scenario, where the thread is preempted
in the spinloop, but the kernel thinks it is runnable and at some future time
just reschedules it, and it then continues to spin. That can happen, but the
thread should break out of its spinloop and block anyway if the spinlock is
tuned properly, so at most you just broke up the spinning across reschedules,
and haven't fundamentally made anything worse. In an ideal world, if the
kernel preempts a thread in a spinloop, it should transition it directly into
the waiting state, and not reschedule it until it can acquire the lock. But
since spinloops are in userspace, the kernel usually has no idea what is going
on.

------
Twirrim
Article from 2018. I imagine most compilers etc. have already accounted for
this?

~~~
secondcoming
I'd be surprised. In order to get the pause instruction you'd have to use
inline assembly or intrinsics, at which point the compiler would just blindly
generate what you wrote. Maybe a JIT compiler would do the right thing.

~~~
raxxorrax
Why do you think that? Because compilers tend not to use the pause
instruction? And why would a JIT behave differently here?

~~~
BeeOnRope
Yes, a compiler wouldn't have a reason to insert a pause instruction, except
to implement a pause intrinsic or some other canned sequence.

------
noway421
Interesting strategy to deprecate x86 features, just make them slower and wait
for compiler and language developers to avoid slow instructions like fire.

~~~
imtringued
A slower pause instruction is still useful and you can even argue that it's
better than the previous implementation because the pause instruction frees up
time for hyper threading.

------
chowyuncat
This is funnily related to the Win9x crash on fast CPUs.

~~~
titzer
Not really. This isn't a delay calibration loop. The "pause" instruction is a
hint to the CPU to yield core resources to the sibling hyperthread because it
is (or would be) blocked due to lock contention. In no way does the
performance adjustment that Intel did affect software's semantic correctness.

------
cwzwarich
TLDR: PAUSE instruction pauses longer.

------
jeffbee
Imagine taking performance advice from a blog that logs 5000 exceptions and
then crashes the tab.

~~~
saagarjha
Sounds like you might want to file a bug against your browser, or pick another
one, because in a somewhat fortunate twist the things that don't work are the
social media scripts ;) But on a more serious note, surely you can make a
comment that's better than that…

~~~
DaiPlusPlus
Social media scripts need to die.

I fixed a friend’s site recently this month that still had a Google Plus link
on it.

------
late2part
Is there a significant # of people on HN that use Windows in production?
Perhaps I'm misguided, but if this was about Linux stuff, it'd seem
appropriate.

~~~
archi42
People are upvoting it, so I guess it's appropriate enough ;-)

But I feel your first part is a legit question. As a Linux user I can't answer
for myself (only reason to use Windows are games and SigmaStudio).

However, a lot of our customers in the embedded world seem to prefer Windows.
Not sure that's because their IDEs are Windows only or just because they're
used to it or because management forces them or because other reasons...

~~~
DaiPlusPlus
Depends if you’re talking about CE or NT.

Until very recently (2016+), a lot of us were assuming Microsoft was still
committed to maintaining CE as an OS for things like handheld RFID scanners,
train conductor ticket machines, etc - because all the companies that had CE-
based applications didn’t want to have to rewrite them for Android or iOS. And
the reason they were using CE in the first place was because Linux was not a
viable option for SH3/ARM/MIPS-based low-power GUI/screen-driven devices -
back in 2001-2010. Microsoft made CE integration and deployment easy with
their Platform Builder tools - building your own Linux distribution for an
embedded application is a job in itself.

Now that NT runs well on ARM, and that SH3 and MIPS are dead in the handheld
space, Microsoft is pitching NT as a replacement for CE in certain use-cases -
but no-one is pretending Windows 10 “IoT” or LTSB is suitable for
battery-powered devices. And the mismanagement of the Windows Phone 10 platform
especially leaves MS in an awkward situation where they can’t serve the best
customers you can get: large enterprises looking for a long-term supported
platform - who have no problem with MS’ relative lock-in; who now have to
switch to non-phone Android handhelds and switch from .NET on CE to JVM on
Linux - and they won’t want to do this again for a long time (decades?) -
Microsoft did untold long-term damage to their viability as an enterprise
solutions provider by abandoning CE. Smh.

(Okay, Android is one option - another is QNX (other platforms like VxWorks
aren’t really used for LoB applications). I wager most devs would prefer
Android over QNX out of familiarity though - especially as I imagine most devs
under 30 (35?) today have dabbled with writing their own smartphone app.)

~~~
archi42
That's genuinely interesting, thanks :)

I actually meant most of our customers are running Windows on their
workstations, on which they develop embedded systems. Due to selection bias
(our target audience are "safety critical embedded" devs), they can not use
Windows CE (or for that matter Linux) as a product platform. I'd literally
expect people to die if they did ;-)

And yeah, it's more like 35. Android/iOS isn't that new anymore ;)

