
AVX register corruption from signal delivery - est31
https://bugzilla.kernel.org/show_bug.cgi?id=205663
======
KindOne
Reminds me of this post from 2017:

"Debugging an evil Go runtime bug" -
[https://news.ycombinator.com/item?id=15845118](https://news.ycombinator.com/item?id=15845118)

[https://github.com/golang/go/issues/20427](https://github.com/golang/go/issues/20427)

[https://bugs.gentoo.org/637152](https://bugs.gentoo.org/637152)

[https://lkml.org/lkml/2017/11/10/188](https://lkml.org/lkml/2017/11/10/188)

------
jameskilton
The issue that led to this bug report:
[https://github.com/golang/go/issues/35326](https://github.com/golang/go/issues/35326)

~~~
ronsor
It's awful strange. Usually if your program crashes, it's your fault, not the
kernel's.

~~~
gumby
When you support a compiler you get a massive stream of users complaining that
the compiler has a bug, though in most cases it’s the user’s bug.

Of course there are also good reports where the user actually has found a bug
in the compiler.

Though of course most are weird corner cases or bugs in new features, there are
a surprising number that make you think “wow, how could _any_ program be
successfully compiled with this bug in the tree?”

~~~
cblum
I found a bug in the .NET compiler back when I worked at Microsoft, and at
first no one wanted to believe me :)

It manifested when you had a single static member of a specific generic type
in a class. The program crashed complaining about invalid CLR instructions. If
you added a second static member of the same type to the class, or changed the
type of the generic parameter, it didn't reproduce.

Turns out it was related to how the compiler used AVX intrinsics on CPUs that
supported those instructions.

Pretty fun but took some convincing for people to believe it was a compiler
bug.

~~~
chrisseaton
> and at first no one wanted to believe me :)

I don't know why people think compilers are so infallible, or are more likely
to be better written than your application. If people write bugs in
applications, guess what: the same people write similar bugs in compilers too.

~~~
patrec
How many bugs did you find in (production versions of) your applications? How
many in compilers?

~~~
bregma
I maintain compilers for a living for a safety-critical embedded OS. I've
found dozens of bugs in the compiler, dozens of bugs in the OS kernel, and
dozens of bugs in the third-party validation test suites we use to qualify the
compiler.

I also live in a log cabin in the back woods and can go off-grid. I've seen
shit you people would not believe. It's just a matter of time now. Dominoes.

~~~
SomeHacker44
Attack ships on fire off the shoulder of Orion yet?

------
dpc_pw
Reminds me of the time I spent a week debugging random but quite consistent
kernel crashes, which turned out to be gcc miscompiling kernel driver code so
that it moved the stack pointer before it had finished using some values in
that stack area. There was a window of one or two instructions where, if a
re-entrant irq happened, it would reuse that part of the stack and corrupt the
data there.

~~~
Vogtinator
Sounds like the AMD64 red zone, which can't be used in the kernel context.

~~~
dpc_pw
Aarch64. It was just a minor gcc bug.

------
saagarjha
> To reproduce, build the attached program with "gcc -pthread test.c" and run
> it on a 5.2 or later kernel compiled with GCC 9 (if the kernel is compiled
> with GCC 8, it does _not_ reproduce).

I wonder if this is a compiler bug or a new optimization that broke the code.

~~~
asveikau
From the link, it seems like GCC 8 does not cache a read from a variable and
goes back to memory each time it needs it, while GCC 9 reads that variable from
a register every time. (Maybe from a corrupted register?)

~~~
zaarn
From what I can tell, the issue is that GCC 9 stores the result of a pointer
dereference in a register to reuse on each loop iteration, but the loop needs
to dereference the pointer each time to work correctly.
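
To make the "reload it every time" part concrete, here is a minimal user-space
sketch in C. It is not the kernel code from the bug report: the names
`owner_changed`, `on_signal` and `spin_until_changed` are made up for
illustration, and a signal raised inside the loop stands in for the interrupt
or preemption that changes the value behind the loop's back.

```c
#include <signal.h>

/* Hypothetical stand-in for state that an asynchronous event can change.
 * "volatile" forces a fresh load on every access; without it the compiler
 * is allowed to load the value once and keep it in a register. */
static volatile sig_atomic_t owner_changed = 0;

/* Signal handler: plays the role of the code that runs "in between". */
void on_signal(int sig)
{
    (void)sig;
    owner_changed = 1;
}

/* Loops until the flag flips; returns how many iterations completed. */
int spin_until_changed(int max_iters)
{
    int i;
    for (i = 0; i < max_iters; i++) {
        if (owner_changed)      /* re-read from memory on each pass */
            break;
        if (i == 2)
            raise(SIGUSR1);     /* simulate the asynchronous event */
    }
    return i;
}
```

After `signal(SIGUSR1, on_signal)`, a call to `spin_until_changed(1000)`
returns 3: the handler runs during the third pass and the next check sees the
new value. The sketch only shows what a forced reload looks like; it does not
reproduce the register corruption itself.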

~~~
Asooka
Sounds like it needs to be volatile then, or rather whatever atomic memory
read the kernel has.

~~~
asveikau
No, it sounds like it needs the kernel not to corrupt the value. There is
nothing wrong with leaving the value in the register.

~~~
wahern
I believe the aforementioned pointer caching _is_ in the kernel--it's the
pointer to the FP register state which is cached across preemption points _in_
the kernel.

~~~
asveikau
Ok I see, that's a good point. It was not spelled out this well in the bug
tracker comment. (If it is now, it wasn't when I read it.)

------
kakkoko
How about other registers? (XMM, FPU, etc.)

