
V8: Workaround for Intel Gemini Lake Processor Bug - tambre
https://chromium-review.googlesource.com/c/v8/v8/+/1803775
======
markdog12
Bruce Dawson is one of the programmers involved. I'd highly recommend some of
his fantastic blog posts, usually Windows-oriented perf pitfalls.

This is a great one to start with, "24-core CPU and I can’t type an email":
[https://randomascii.wordpress.com/2018/08/16/24-core-cpu-
and...](https://randomascii.wordpress.com/2018/08/16/24-core-cpu-and-i-cant-
type-an-email-part-one/)

~~~
snagglegaggle
These are indeed very good articles. Even if you are all-in on *nix/BSD,
reading them can be quite eye-opening. While at the lowest level the Windows
kernel is _basically_ a black box over a Unix kernel, there are in some places
unique design choices that lead to unique problems. Sadly, we do not usually
get to see their resolution or the impact they had on MS's customers.

~~~
bogomipz
>"While at the lowest level the Windows kernel is basically a black box over a
Unix kernel,"

Can you elaborate on this? I've never heard this before.

~~~
roelschroeven
Neither have I. What I have heard before is that the Windows kernel is VMS-
like rather than Unix-like.

~~~
bogomipz
Interesting, might you or anyone else have any links or literature on the
Windows kernel and its VMS likeness?

~~~
roelschroeven
There is
[https://everything2.com/title/The+similarities+between+VMS+a...](https://everything2.com/title/The+similarities+between+VMS+and+Windows+NT)
which looks quite detailed. But this one seems more reliable, since it's
written by Mark Russinovich who is more or less _the_ authority on Windows
internals: [https://www.itprotoday.com/compute-engines/windows-nt-and-
vm...](https://www.itprotoday.com/compute-engines/windows-nt-and-vms-rest-
story)

------
baybal2
Gemini Lake has quite a large errata list.

Intel is a year late on delivering Jasper Lake (the Gemini Lake refresh), and
Tremont is also way too late. Elkhart Lake (Xeon-branded Atom) is delayed
indefinitely. Skyhawk Lake is said to arrive 2Q-3Q late next year.

Mercury Lake seems almost certainly cancelled, as early buyers are getting
refunds after 2 years of repeated shipping delays.

And yes, the "Intel's Saviour" Lakefield is just coming out of fabs as we
speak.

Chinese laptop OEMs are scratching their heads very hard now. All want to jump
ship to AMD, but Intel holds them hostage through completely
anticompetitive "preferred partner" agreements.

Things are so bad among OEMs in China that people are now making laptops with
desktop chips.

------
icefo
This makes me worried about the future of processors. I know bugs have always
existed and sometimes they were very serious but they seem to have increased
in frequency in the last few years.

Is it just me, or is it an actual trend?

~~~
dooglius
See [https://danluu.com/cpu-bugs/](https://danluu.com/cpu-bugs/) which cites
anonymous engineers blaming a push by upper management for less validation and
quicker products.

~~~
fpgaminer
I've met with Intel execs; one of them told me, not in as much detail as that
anonymous engineer expounds, but simply that they reduced their emphasis on
validation. So there's no need to speculate here (pun intended).

~~~
dboreham
This is a general pattern: see stories of bridges falling down for the
canonical example.

~~~
selestify
Why does this general pattern exist across such wide swathes of society?

~~~
SkyMarshal
Seems to be related to the financialization of everything, where financial
engineering supplants actual engineering and financial culture replaces
engineering culture at the C-level. Here’s a good article on Boeing about
that:

[https://mattstoller.substack.com/p/the-coming-boeing-
bailout](https://mattstoller.substack.com/p/the-coming-boeing-bailout)

~~~
bcaa7f3a8bbc
I have an essentially identical question: under profit-seeking capitalism,
factory owners do not want their factories to explode and lose ten million
dollars. Yet being profit-oriented to the point of neglecting basic safety and
preventative maintenance is a very common phenomenon. It happens all the time,
it is seen as the evil of "business people", and it happens even in the most
powerful companies. I don't understand it.

Is it simply cognitive bias at work, like LessWrong often says? But I think
there must be deeper reasons than that. Did anyone write good books on this
subject? A sociology, psychology, management & decision-making, or
economics perspective is welcome.

~~~
dooglius
Does the difference in expected value (from preventing rare, expensive issues)
actually exceed the cost of adding preventative maintenance? It is not obvious
that this is the case.

~~~
kristianp
That's why it's not done. However, that means something disastrous happens
every few years. On average the company is better off: the increased profit in
good years outweighs the losses in the occasional bad year. This is where
black swan theory comes in. How do you calculate the probability of a rare
event?
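
The trade-off discussed in this subthread can be written as simple arithmetic.
A minimal sketch, assuming a single failure type with a known yearly
probability; the function names, the maintenance cost, and the probabilities
are hypothetical, and only the ten-million-dollar loss echoes the figure
mentioned upthread:

```cpp
// Expected yearly loss from a rare failure: probability times cost.
constexpr double expected_loss(double yearly_probability, double loss) {
    return yearly_probability * loss;
}

// Yearly failure probability above which maintenance pays for itself
// on average.
constexpr double break_even_probability(double maintenance_cost, double loss) {
    return maintenance_cost / loss;
}

// True when the expected loss avoided exceeds the cost of maintenance.
constexpr bool maintenance_pays_off(double yearly_probability, double loss,
                                    double maintenance_cost) {
    return expected_loss(yearly_probability, loss) > maintenance_cost;
}

// Hypothetical inputs, for illustration only.
constexpr double kLoss = 10000000.0;        // loss figure mentioned upthread
constexpr double kMaintenance = 200000.0;   // assumed yearly maintenance cost
```

With these assumed numbers the break-even probability is about 2% per year.
The hard part is the one raised above: the result is only as good as the
probability estimate, which is exactly what is difficult to pin down for rare
events.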

------
bla3
If this is really a CPU bug, why do they make this change Windows-only? It's
probably the only platform where they have enough users to be able to measure
the crash, but shouldn't the fix be applied when building for x86, independent
of OS? And the patch currently has an effect on Windows/ARM, where this CPU
bug won't exist.

~~~
josefx
As far as I can tell, it isn't a generic fix. They had two functions that
always crashed on a misaligned read from __security_cookie, so they added a
patch that forces the alignment for these functions. Since __security_cookie
seems to be a Windows-specific stack protection mechanism, it makes no sense to
apply the workaround on all systems. Someone correct me if I got that wrong.
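
For readers unfamiliar with what "aligned" means for an address, here is a
minimal sketch. It is not the V8 patch and does not touch the real
__security_cookie; `is_aligned`, `fake_cookie_storage`, and
`check_fake_cookie_alignment` are hypothetical names used only for
illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when `address` is a multiple of `alignment`, which must be a
// power of two.
inline bool is_aligned(const void* address, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(address) & (alignment - 1)) == 0;
}

// Hypothetical stand-in for a cookie-like global. alignas(16) asks the
// compiler to place it on a 16-byte boundary.
alignas(16) static unsigned char fake_cookie_storage[32] = {};

// Checks the aligned start of the buffer and an address deliberately
// offset by one byte, which cannot be 16-byte aligned.
inline void check_fake_cookie_alignment() {
    assert(is_aligned(fake_cookie_storage, 16));
    assert(!is_aligned(fake_cookie_storage + 1, 16));
}
```

The same modulo test applies to code addresses, which is the kind of
alignment the workaround is described as forcing for the two affected
functions.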

------
geekrax
Nitpick: [https://chromium-
review.googlesource.com/c/v8/v8/+/1803775/2...](https://chromium-
review.googlesource.com/c/v8/v8/+/1803775/2/src/objects/lookup.cc#986) seems
unintentional.

------
stefan_
Presumably the CPU bug doesn't care for the exact functions involved, so
wouldn't you have to do this for _all of them_?

~~~
tambre
Whether it is triggered certainly depends on very specific microarchitectural
conditions that happen to be created by the instructions generated for those
two functions, and the 16-byte alignment also happens to be one of the
required triggers.

Applying this fix to all other functions would certainly bloat the binary due
to unnecessary padding and likely reduce performance for all, since the
shipped binary is the same for everyone.
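
The padding cost can be estimated with a little arithmetic. A rough sketch,
assuming functions are laid out back to back and each one is padded up to the
next alignment boundary; the helper names and the sample function sizes are
hypothetical, not measurements from V8:

```cpp
#include <cstddef>

// Padding bytes needed so that a function placed at `offset` starts on an
// `alignment` boundary.
constexpr std::size_t padding_for(std::size_t offset, std::size_t alignment) {
    return (alignment - offset % alignment) % alignment;
}

// Total padding when `count` functions with the given sizes are laid out one
// after another, each aligned to `alignment`, starting at `offset`.
constexpr std::size_t total_padding(const std::size_t* sizes, std::size_t count,
                                    std::size_t alignment,
                                    std::size_t offset = 0) {
    return count == 0
        ? 0
        : padding_for(offset, alignment) +
          total_padding(sizes + 1, count - 1, alignment,
                        offset + padding_for(offset, alignment) + sizes[0]);
}

// Hypothetical function sizes in bytes, for illustration only.
constexpr std::size_t kSampleSizes[] = {10, 20, 5};
```

For these three made-up functions (35 bytes of code), `total_padding` gives 18
bytes of padding at 16-byte alignment and 34 bytes at 32-byte alignment. Small
functions pay proportionally the most, which is why forcing alignment
everywhere adds up across a large binary.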

