

Critical Linux bug that leads 100% CPU (leap second) - yekmer
http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/

======
pilif
I would love to see what's _really_ causing this bug. We read so many times
over the weekend to either reboot or just run that date command - but nobody
is telling us what's causing the problem.

Also, seeing that other threaded applications had similar problems, I doubt
this is a Java issue - more likely a pthread, glibc or even kernel issue.

~~~
gaius
There is a good explanation here: <http://serverfault.com/q/403732/58037>

~~~
agwa
That's predominantly about the kernel crash, not the high-CPU futex issue. One
of the most maddening things about this is that there have been several
different issues related to leap seconds on Linux, making it all the harder to
get information.

------
ecopoesis
Hard to call this a Java bug when many other, non-Java things are affected.
It's a critical Linux bug that causes futex waits to time out, and anything
that uses them to behave incorrectly.

<https://lkml.org/lkml/2012/7/1/11>
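
To make the "times out and behaves incorrectly" part concrete, here is a minimal sketch of why a timed wait that returns too early turns into 100% CPU. It does not touch futexes or the kernel at all; `healthy_wait` and `premature_wait` are hypothetical stand-ins, and the assumption is that affected programs sit in an ordinary "wait with timeout, then check for work" loop.

```python
import threading
import time


def count_wakeups(timed_wait, timeout_s, run_for_s):
    """Run a 'wait with timeout, then check for work' loop.

    Returns how many times the loop body ran in roughly run_for_s seconds.
    """
    wakeups = 0
    deadline = time.monotonic() + run_for_s
    while time.monotonic() < deadline:
        timed_wait(timeout_s)  # expected to block for about timeout_s
        wakeups += 1
    return wakeups


def healthy_wait(timeout_s):
    # Stand-in for a well-behaved timed wait: blocks until the timeout.
    threading.Event().wait(timeout_s)


def premature_wait(timeout_s):
    # Stand-in for the misbehaving case: the timeout "fires" immediately.
    return None


healthy = count_wakeups(healthy_wait, timeout_s=0.01, run_for_s=0.1)
spinning = count_wakeups(premature_wait, timeout_s=0.01, run_for_s=0.1)
print(f"healthy loop iterations:  {healthy}")
print(f"spinning loop iterations: {spinning}")
```

The healthy loop wakes up a handful of times; the other one never sleeps, so it burns a full core doing nothing, which would match what people are seeing in top.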

~~~
tommi
ecopoesis, you are not the only one saying that it's a Linux bug instead of a
Java bug, even though the link title says "Critical Linux bug that leads 100%
CPU (leap second)".

Did the link title change from a Java title, like the article, to a Linux
title to match the actual root cause?

~~~
davidw
> Did the link title change from a Java title, like the article, to a Linux
> title to match the actual root cause?

Yes, it did.

------
yekmer
Our company uses HBase, Elastic Search, GitBlit, SmartFox Server and Jetty, all
of which have been hit by this bug. MySQL is said to be affected too:
<http://blog.mozilla.org/it/2012/06/30/mysql-and-the-leap-second-high-cpu-and-the-fix/>

~~~
davidw
Thank you for that link! I had been scratching my head about that server even
though it wasn't mine to take care of (the other service I'm involved with
here, that I helped plan, uses Postgres, which does not seem to have
problems).

------
pjmlp
This is a Linux kernel bug, not a JVM bug.

~~~
mcescalante
Yeah, NTP is Linux kernel, but the JVM is what's eating the CPU after the
clock leap.

~~~
jbellis
No, it's the kernel livelocking in response to a call made by the JVM.

------
jhund
I saw what is likely a related issue on one of our AWS EC2 instances, where
exactly at midnight UTC there was a high percentage of 'steal' CPU time in our
server monitoring charts.

I wonder if this was caused by another VM on the same physical box being hit
by the bug and, as a result, stealing CPU time from our VM.

I resolved the issue by moving to a different VM (rebooting didn't help), to
get away from my greedy neighbor.

More info here: <http://blog.thinrhino.net.in/cpu-steal-time>
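
For anyone who wants to check steal time without a monitoring chart, here is a rough sketch that computes it from two samples of the aggregate `cpu` line in `/proc/stat`. The function name and the two sample lines are made up for illustration; the field order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice) is the one documented in proc(5).

```python
def steal_fraction(before, after):
    """Fraction of CPU time counted as 'steal' between two samples of
    the aggregate 'cpu' line of /proc/stat."""

    def fields(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(value) for value in parts[1:]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    # Sum only the first eight counters: guest time is already
    # included in user/nice, so adding it again would double count.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return deltas[7] / total  # index 7 is the 'steal' counter


# Illustrative samples only, not taken from a real machine.
sample_before = "cpu  1000 0 500 8000 100 0 50 350 0 0"
sample_after = "cpu  1100 0 550 8400 110 0 55 785 0 0"
print(f"steal: {steal_fraction(sample_before, sample_after):.1%}")
```

On a real box you would read the first line of `/proc/stat` twice, a few seconds apart, and pass both lines in.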

------
j_col
So that explains why the 12 cores on my Fedora workstation were maxed-out when
I came to work this morning!

------
freestyler
There is a list of applications affected by this kernel bug:
<http://blog.windfluechter.net/content/blog/2012/07/01/1481-100-cpu-load-due-leap-second>

------
JVIDEL
Oh man so that was causing it!

My rig crashed all weekend because of this POS bug, I had to boot back to
Windows to get anything done (oh cmd, I really didn't miss you at all you
insufferable bitch...)

Any fixes?

~~~
gcr
There's a fix in the article.

------
regularfry
The easier-to-type 'sudo date -s "`date`"' seemed to work for me.
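
If you would rather script it, the same idea can be sketched in Python: set the wall clock to the value it already has. This assumes the workaround helps because any explicit clock set makes the kernel resynchronise its timers, which I have not verified beyond the LKML discussion. `reset_wall_clock` is a hypothetical helper and it defaults to a dry run.

```python
import time


def reset_wall_clock(dry_run=True):
    """Set CLOCK_REALTIME to (roughly) the value it already has.

    Returns the timestamp that was, or would have been, written.
    """
    now = time.time()
    if not dry_run:
        # Needs root (CAP_SYS_TIME); raises PermissionError otherwise.
        time.clock_settime(time.CLOCK_REALTIME, now)
    return now


print(f"would set the clock to {reset_wall_clock(dry_run=True):.3f}")
```

Run it as root with `dry_run=False` to actually touch the clock.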

------
kzrdude
So if the leap second was handled in userspace instead of the kernel, just
like a normal ntp time update, all would have been fine. Why not just do that?

------
e40
On Sunday I noticed that Gerrit (code review, written in Java) was chewing
through CPU on one of our servers. Just applied this and it appears to have
settled down.

------
coldskull
Well, our Hadoop cluster went bonkers because of this bug... luckily it was on
stage, not production!

~~~
danielhlockard
Yeah, I ended up rebooting our production hadoop cluster, it all came back up
fine, and we don't have too many people using it yet.

------
geetee
Hey, remember that time I spent a couple hours frantically checking logs and
restarting services?

------
abc_lisper
Does this happen on Android too?

------
agentgt
What a PITA

