
Linux futex_wait() bug – update to latest patches now - quicksilver03
https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64
======
btrask
I'm not part of the Java world at all, but I'm starting to think that Azul
Systems is one of the few groups of people who know what they're doing with
regard to Linux performance and the user/kernel boundary. I recently watched
some talks by Cliff Click[1] that were extremely informative, and from what I
understand about their proposed kernel patches, they seem like important
improvements.

This report by Gil Tene (their CTO according to Wikipedia) lends more support
to that theory.

[1]
[https://www.youtube.com/watch?v=uL2D3qzHtqY](https://www.youtube.com/watch?v=uL2D3qzHtqY)

~~~
hga
Unfortunately, per Gil Tene as of a couple of years ago, they've given up on
that approach (getting kernel patches in):
[https://groups.google.com/d/msg/mechanical-sympathy/UnsLM6wh...](https://groups.google.com/d/msg/mechanical-sympathy/UnsLM6whXcw/bC0-IK2RMWoJ)

A shame, since those would allow some really sweet things for garbage
collectors, which evidently isn't a goal.

~~~
click170
I'm skeptical of the way the author framed the discussion around his patches.
I believe the majority of people out there are genuinely not evil, and people
don't often reject contributions with clear benefits without a reason. I find
it's usually best to get that reason straight from the horse's mouth.

Does anyone have a link to that discussion? I get the sense there is more to
that discussion than the author let on in his brief response.

~~~
Jach
I got the sense too. The message sharing the link to the source describes it
as "incomplete and extremely buggy". Was it that way when it was first
proposed to the kernel group for integration? If so, that seems like a
reasonable ground for rejecting it...

I googled for "kernel mailing list managed runtime initiative" and found this:
[https://lwn.net/Articles/392307/](https://lwn.net/Articles/392307/)
Apparently it was never even proposed on the kernel mailing list! Then again,
the code as written apparently was never meant to be merged upstream as-is; it
was a sort of PoC that might be a better starting point for eventual
integration than starting fresh. There are lots of interesting details, and as
I read through the comments things are coming back: now I remember reading
about this in 2010/2011...

------
matheweis
I've been trying to track down random latency spikes (> 1000ms) in the network
stack between the socket buffer and the client for about 3 months now... All
of the stack traces showed the app was stuck in futex_wait, but since that
looked identical to an idle server, I'd convinced myself epoll_wait was at
fault... All of a sudden I'm wondering otherwise. We're not on Haswell, though,
and it's not clear to me whether the bug affects other processors. Can it?

~~~
paulmd
The guy in the article has only experienced it on Haswell. However, that
doesn't mean that it doesn't exist elsewhere.

~~~
matheweis
I noticed later that the actual kernel patch references arm64, so it's
definitely not limited just to Haswell:

[https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85be...](https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85beb03c75121419a7dec52f0)

------
zobzu
"We certainly haven't seen much "INSTALL PATCHES NOW" fear mongering. And we
really need it, so I'm hoping this posting will start a panic."

Yeah, well, it's a DoS triggered by pretty particular conditions. While it's
bad, starting a panic over it reminds me of the boy who cried wolf.

But then again, feel free to slap a fancy site and a codename on it; all the
cool kids do it anyway :)

------
rincebrain
Mostly, I'm confused as to how this has only bitten people on Haswell. Did
pre-Haswell CPUs just enforce an MB invisibly there for some reason, or did
Haswell explicitly change some semantics?

Also, an interesting note is that the commit references this deadlocking on
ARM64, so I'm guessing this probably broke on non-x86 architectures in strange
ways unless I'm really missing something...

~~~
the8472
You have to remember that the absence of a barrier does not automatically
cause concurrency failures in a fail-fast manner. Barriers just provide
guarantees. Your code might end up working (either always or in 99.999999% of
all observed operations) just by chance. So the bug might simply be more
visible on Haswell than on other systems because Haswell behaves differently or
is more aggressive about exploiting non-barriered operations.

From the mailing list:

> _In our case it's reproducing on 10 core haswells only which are different
> than 8 cores (dual vs single ring bus and more cache coherency options).
> It's probably a probability matter. [...]_

> _Pinning the JVM to a single cpu reduces the probability of occurrence
> drastically (from a few times a day to weeks) so I'm guessing latency
> distributions may have an effect._

------
whoopdedo
The fix HAS been applied in Debian stable, as of 04 Nov 2014.

~~~
nickysielicki
You should probably refer to the codename in this case, because Debian stable
became Jessie a couple of weeks ago.

~~~
0942v8653
Yes, and Jessie ships with 3.16 (and that is probably what I have). Of course,
it's a very simple fix and should be easy to backport.

~~~
whoopdedo
I meant Jessie. And Wheezy never had 3.14+ so of course it wouldn't be
affected. I was saying the fix was applied six months ago.

------
kasabali
> For some reason, people seem to not have noticed this or raised the alarm.
> We certainly haven't seen much "INSTALL PATCHES NOW" fear mongering. And we
> really need it, so I'm hoping this posting will start a panic.

Should we also alert the President? Maybe the OP is only talking about the
mailing list he posted to, and we're missing that context here on HN? The only
affected production systems seem to be RHEL 6.6 on Haswell.

Ubuntu 14.04/Debian 8: have the fix for a long time [0] [1]

Ubuntu 12.04/Debian 7: were never affected [3] [2]. Newer enablement stack
kernels for Ubuntu have the same fix as [1].

RHEL 7: the OP only talks about 6.6, so I assume it either doesn't have the
regression backported or already has the fix.

SLES: don't know, don't care.

[0] [http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?showmsg=1...](http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?showmsg=1&qt=grep&q=Avoid+taking+the+hb-%3Elock&h=linux-3.13.y)

[1] [http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?showmsg=1...](http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?showmsg=1&qt=grep&q=Avoid+taking+the+hb-%3Elock&h=linux-3.16.y)

[2] [https://git.kernel.org/cgit/linux/kernel/git/stable/linux-st...](https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/kernel/futex.c?h=linux-3.2.y&id=refs/tags/v3.2.69)

[3] [http://kernel.ubuntu.com/git/ubuntu/ubuntu-precise.git/tree/...](http://kernel.ubuntu.com/git/ubuntu/ubuntu-precise.git/tree/kernel/futex.c#n186)

~~~
buster
Not true, it is architecture independent (commit also mentions Android) and I
can find this bug backported to at least RHEL 5.11 with 2.6.18-404.

Also, I don't believe your queries are a sufficient check at all.

You can clearly find the missing default case in
[http://kernel.ubuntu.com/git/ubuntu/ubuntu-precise.git/tree/...](http://kernel.ubuntu.com/git/ubuntu/ubuntu-precise.git/tree/kernel/futex.c#n186).
So I guess that means Ubuntu Precise is/was also affected?

Same in your other link
[https://git.kernel.org/cgit/linux/kernel/git/stable/linux-st...](https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/kernel/futex.c?id=refs/tags/v3.2.69#n186).

Please check your facts next time.

~~~
kasabali
Thanks, I've corrected the links; please check them again to confirm they are
pre-regression versions [1].

> Not true, it is architecture independent (commit also mentions Android)

I saw it (and the arm64 comment in this thread), but I didn't include them
because I don't think it would be a serious production issue there. Thanks for
the clarification.

> Please check your facts next time.

Thanks to your post I've corrected the links, but there's no need for such an
aggressive tone, huh? Maybe you could try to be more polite in your next
rebuttal?

[1]
[https://github.com/torvalds/linux/commit/b0c29f79ecea0b6fbce...](https://github.com/torvalds/linux/commit/b0c29f79ecea0b6fbcefc999e70f2843ae8306db)

~~~
buster
Please note that I was just looking for the missing default case. It might
still be the case that the bug is not in Precise.

The patch also replaces atomic_inc() with futex_get_mm():

    
    
      -		atomic_inc(&key->private.mm->mm_count);
      +		futex_get_mm(key); /* implies MB (B) */
    

And your links are using atomic_inc(). What does that mean with regard to this
bug? I don't know.

~~~
kasabali
Both the fix commit and the OP say the bug was introduced in [1], which was
never backported to 3.2.x, so I assume the missing default case was not a
problem before that commit.

> And your links are using atomic_inc(). What does that mean with regard to this
> bug? I don't know.

It means they are simply old versions that were never affected by the bug, if
we believe the fix commit's message and the OP.

[1]
[https://github.com/torvalds/linux/commit/b0c29f79ecea0b6fbce...](https://github.com/torvalds/linux/commit/b0c29f79ecea0b6fbcefc999e70f2843ae8306db)

~~~
buster
Or it was there but never recognized. But you are right; I suppose atomic_inc()
doesn't need a memory barrier because it's atomic, while the new function may
need one. Just guessing.

------
planckscnst
My eyes jumped straight to "%$^!" and I wondered what a shell expansion had to
do with a futex_wait bug. Then I briefly tried to parse it and only then read
the sentence. I wondered how many others this happened to.

~~~
AceJohnny2
You know you've been writing too much Perl or shell scripts when...

------
olalonde
[https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85be...](https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85beb03c75121419a7dec52f0)

A 30-line commit message for 2 lines of code. Kernel developers sure are
disciplined.

------
random3
I've been hunting this for weeks now. It locks up JVMs, and everything seems
to indicate it's either glibc or the kernel.

------
buster
It would be nice to know which distros are affected by this bug. In particular,
I had unexplainable JVM lockups on RHEL 5.11 this week after it was upgraded.
Seeing that 6.6 and 5.11 were both released in 2014 with a 2.6.x kernel, I can
imagine this bug also applying to RHEL 5.

Also, does anyone know if there is an RHEL erratum for this bug?

edit: I just looked at the patches Red Hat applied to the RHEL 5.11 kernel
2.6.18-398, and this bug was also introduced in the RHEL 5.11 series (not sure
if a subsequent kernel version fixes it).

------
minaguib
Can anyone recommend a way to check whether a Linux server is running on a
Haswell CPU?

I'm guessing that checking /proc/cpuinfo for the Xeon version v3, or looking
for the flags 'hle|rtm|tsx', would work, but something more definitive would
help with mass-auditing.

~~~
random3
v3 in cpuinfo should do it

~~~
minaguib
Do you know if this also holds for non-Xeons?

------
azinman2
Any word on affected Ubuntu distros and if/when they're going to patch?

~~~
scott_karana
Based on the affected Linux kernel range of 3.14 to 3.18 inclusive, and this [1]
list of Ubuntu kernel versions, I believe only 14.10 (Utopic Unicorn) would
even be affected.

[1] [http://askubuntu.com/questions/517136/list-of-ubuntu-version...](http://askubuntu.com/questions/517136/list-of-ubuntu-versions-with-corresponding-linux-kernel-version)

EDIT: It turns out 14.04.2 LTS can also optionally use 3.16:
[http://www.omgubuntu.co.uk/2015/02/ubuntu-14-04-2-lts-releas...](http://www.omgubuntu.co.uk/2015/02/ubuntu-14-04-2-lts-released-includes-3-16-kernel)

~~~
buster
You can find this bug backported to many 2.6.x kernels, so don't rely on the
3.14 version number. See my other comment regarding Ubuntu and RHEL
[https://news.ycombinator.com/item?id=9544272](https://news.ycombinator.com/item?id=9544272)

But I didn't check whether there are updated kernels for those versions in
Ubuntu. At least for RHEL 5.11, it looks to me like the -404 kernel is the
latest...

------
yshalabi
This is why programmability is important, and why achieving the performance of
relaxed memory models with a more intuitive sequentially consistent (SC) memory
model should be a top objective for Intel and architecture researchers.

------
ape4
The Linux kernel has unit tests, right?

~~~
fragmede
In case you're not trolling: not really, not officially.

Unpopular features on less common architectures are frequently broken for
large stretches of time, and go unnoticed until someone complains. Open source
really exemplifies the squeaky wheel getting the grease, which is kind of sad.

Organizations where Linux is popular undoubtedly have their own private test
suites, especially for features that are less popular on bleeding-edge kernels
(e.g. S390 arch support or InfiniBand).

It would be hard to get any sort of good coverage with unit tests, too, but
that shouldn't be a reason to avoid trying.

~~~
dasil003
> _It would be hard to get any sort of good coverage with unit tests, too, but
> that shouldn't be a reason to avoid trying._

Could a large but spotty unit test suite inspire false confidence that leads to
being less careful about signing off on changes, and thus decrease overall
quality?

~~~
mburns
Could it? Sure.

Of course, kernel devs were already confident enough to merge breaking code
_without_ the added confidence of a partial unit test suite in place.

------
userbinator
They mention seeing this bug appear on Haswell CPUs, and nothing about any
other x86 - is it a Haswell-specific bug?

~~~
matheweis
Looks to be more than just Haswell. I was wondering this too, and just noticed
this comment on the (fix) patch: "the problem (user space deadlocks) can be
seen with Android bionic's mutex implementation on an _arm64_ multi-cluster
system."

------
mikerichards
Judging from the patch, isn't the problem that C doesn't enforce a default case?

~~~
danieltillett
The problem is not that a default case isn't enforced, but that requiring one
wasn't in the coding standards. I always write a default case, but that's
because it's in my coding standards.

~~~
mirashii
Neither of these things actually fixes the problem, which lies specifically in
the content of the default case. default: ; break; would be just as buggy.

~~~
gpvos
The commit message says the code was reviewed quite thoroughly. So I think a
break-only default case where the other two cases had an /* implied MB */
comment would likely have been noticed. So in this particular case, there is a
fair chance that such a warning would have helped.

