OOM relation to vm.swappiness=0 in new kernel

otterley · on April 28, 2014

Two observations:

First, setting vm.swappiness to 0 does not have the beneficial impact that people say it does, and yet it's religious dogma to set it that way on MySQL servers. To my knowledge there has never been a single article published that clearly demonstrates its benefit under a controlled test. The evidence of any benefit is anecdotal at best.

Second, you can easily exclude any process from being killed by the kernel's OOM killer: "echo -17 > /proc/PID/oom_adj"
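For what it's worth, oom_adj has been deprecated since 2.6.36 in favor of oom_score_adj, which runs from -1000 to 1000, with -1000 being the equivalent of the old -17. Below is a minimal Python sketch of the same exemption. The function names and the proc_root parameter are made up for illustration, the linear scaling is my understanding of how the kernel maps old values rather than something I'd swear to, and writing a negative value to the real /proc needs root.

```python
import os

OOM_DISABLE = -17            # legacy oom_adj value meaning "never kill"
OOM_ADJUST_MAX = 15          # legacy oom_adj upper bound
OOM_SCORE_ADJ_MIN = -1000    # oom_score_adj value meaning "never kill"
OOM_SCORE_ADJ_MAX = 1000     # oom_score_adj upper bound


def oom_adj_to_score_adj(oom_adj):
    """Scale a legacy oom_adj value (-17..15) onto the oom_score_adj range."""
    if not OOM_DISABLE <= oom_adj <= OOM_ADJUST_MAX:
        raise ValueError("oom_adj must be between -17 and 15")
    if oom_adj == OOM_ADJUST_MAX:
        return OOM_SCORE_ADJ_MAX
    # Assumed linear scaling, truncated toward zero.
    return int(oom_adj * OOM_SCORE_ADJ_MAX / -OOM_DISABLE)


def exempt_from_oom_killer(pid, proc_root="/proc"):
    """Write -1000 to <proc_root>/<pid>/oom_score_adj and return the path."""
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(str(OOM_SCORE_ADJ_MIN))
    return path
```

The proc_root argument is only there so the function can be pointed at a scratch directory for a dry run instead of the live /proc.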

falcolas · on April 28, 2014

> Second, you can easily exclude any process from being killed by the kernel's OOM killer: "echo -17 > /proc/PID/oom_adj"

Doing so with a large-memory process is a sure way to cause a server to shut down and reboot when it runs out of memory. The OOM killer is hard on large-memory processes, but the alternative is worse.

Ideally you don't want to swap, you definitely don't want to preemptively swap, but even swapping is better than rebooting a server.

otterley · on April 28, 2014

I don't know about "definitely." Swap death is often worse than a reboot because (a) response latency goes through the roof, effectively making the service down anyway, and (b) you probably can't log in either because of I/O saturation (ssh, shell etc. could well be swapped out).

leif · on April 28, 2014

(c) a load balancer can usually detect a server that's down; a server that ends up doing I/O while holding locks probably won't be detected.

rodgerd · on April 28, 2014

Indeed. I've had zLinux boxes using VDISKs for swap that get into the terrible state where the VDISK can sustain swapping at a high enough rate that Oracle will be able to accept and maintain connections, but not do any work. The application attached to the database is unable to see this as a failure and fail over to the other node in the RAC.
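One way to make this kind of failure visible is a health check that demands a small piece of real work inside a deadline, rather than just a successful connection. A rough Python sketch, where probe stands in for whatever query or request you would actually run (the names and the deadline are placeholders, not anything from Oracle or a particular load balancer):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def healthy_within(probe, deadline_s):
    """Return True only if probe() returns a truthy value before the deadline."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(probe)
    try:
        return bool(future.result(timeout=deadline_s))
    except FutureTimeout:
        # Still "up", but too slow to be useful: report it as down.
        return False
    except Exception:
        # The probe itself failed outright.
        return False
    finally:
        # Do not block here waiting for a slow probe to finish.
        pool.shutdown(wait=False)
```

A node that accepts the connection but never finishes the probe then counts as down, which is the case a plain connect check misses.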

sitkack · on April 28, 2014

Sounds like underspecified failure conditions. Rarely are they as clean as, "I am dying, poof!"

sitkack · on April 28, 2014

Agreed. Too many people conflate any kind of paging with server death. Yes, servers get slow when they start paging too much, but most of the time I can still log in and check out what is going on.

So much time is wasted on machines that reboot or hang because they don't have enough swap space. Given the way Linux memory allocation works, it is really not a good idea to constrain swap. It is really easy to trigger the OOM killer by using up too much virtual address space. Doing a `fork/exec` from a process near your physical memory limit can easily push you over the virtual memory limit if there is not enough swap, even though no paging would actually occur under this condition.
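The arithmetic behind this, as a sketch that assumes strict overcommit accounting (vm.overcommit_memory=2), where the kernel's commit limit is swap plus overcommit_ratio percent of RAM (hugepages ignored). The function names and sample numbers are invented for illustration; on a real box the inputs would come from /proc/meminfo (MemTotal, SwapTotal, Committed_AS).

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50):
    """Commit limit under vm.overcommit_memory=2, ignoring hugepages."""
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


def fork_fits(committed_kb, child_kb, mem_total_kb, swap_total_kb,
              overcommit_ratio=50):
    """Would committing another child_kb stay under the commit limit?"""
    limit = commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio)
    return committed_kb + child_kb <= limit


GIB = 1024 * 1024  # KiB per GiB

# Hypothetical 64 GiB box running one 40 GiB process that wants to fork.
# overcommit_ratio is set to 100 so that all of RAM counts toward the limit.
no_swap = fork_fits(40 * GIB, 40 * GIB, 64 * GIB, 0, overcommit_ratio=100)
with_swap = fork_fits(40 * GIB, 40 * GIB, 64 * GIB, 64 * GIB,
                      overcommit_ratio=100)
print(no_swap, with_swap)
```

With these numbers the fork does not fit without swap and does fit with 64 GiB of swap, even though the child would most likely exec right away and never touch the extra memory.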

Swap on and have a great week.

ambrop7 · on April 28, 2014

This looks like a bug. The kernel shouldn't raise OOM if there's swap available; swappiness=0 should just make it not swap unless absolutely necessary. From my reading of the commit, it's supposed to improve the implementation in this respect. Quote:

"avoid swapping out pages of important process or process groups while there is a reasonable amount of pagecache on RAM so that we can satisfy our customers' requirements."

But apparently it's deciding to not swap even when there isn't RAM available, failing to allocate, and not falling back to swap.

It's also possible the bug is only in the Red Hat kernel, if the correctness of this commit depends on some other commit that was not backported.

on April 28, 2014

[deleted]

ambrop7 · on April 28, 2014

I've just said how it's a bug, read my comment again. The commit message says it should only decide not to swap when there's enough RAM. The evidence in the article says otherwise.

I'm not saying it's not a valid feature to avoid swapping at all and just use it for hibernation, just that this does not appear to be the intent of this commit.

stingraycharles · on April 28, 2014

I think that's a matter of taste. I'm not saying I disagree with you, but I can see the reasoning behind not swapping when you set swappiness to 0.

Looking at the patch, it looks like swappiness=1 now does what the old 0 did. Would that not cover your use case?
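If that reading of the patch is right, the practical fix is mechanical: wherever swappiness is 0 on a kernel with the new behavior, bump it to 1. A hedged Python sketch follows. The 3.5 cutoff is the mainline version mentioned elsewhere in this thread, the backported flag is something you would have to set yourself for vendor kernels such as RHEL's 2.6.32 builds, and the function names are made up.

```python
import re


def kernel_major_minor(release):
    """Extract (major, minor) from a release string such as '2.6.32-431.el6'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def recommended_swappiness(current, release, backported=False):
    """Return 1 where swappiness=0 would hit the new 'never swap' behavior.

    Assumes the change applies to mainline 3.5 and later, or to any kernel
    the caller flags as carrying the backport. Other values pass through.
    """
    new_behavior = backported or kernel_major_minor(release) >= (3, 5)
    if current == 0 and new_behavior:
        return 1
    return current
```

On a live machine the inputs could be read with int(open("/proc/sys/vm/swappiness").read()) and platform.release().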

ambrop7 · on April 28, 2014

"Looking at the patch, it looks like swappiness=1 now does what the old 0 did."

Yeah, it seems so, but the commit message, in my incomplete understanding, contradicts what you're saying. Also, I wouldn't feel so confident making claims about what the changes do without looking at the context and having at least some understanding of that code.

ambrop7 · on April 28, 2014

What is the page doing with that code display? As soon as you click into it or select text inside (just like that, because you tend to do that while reading), it unexpectedly changes into some kind of plain text view and it seems you need to reload to get it back.

molecule · on April 28, 2014

looks like it's switching to copy-friendly presentation, so that code can be copied without line numbers (?).

I'm able to toggle between plain text and syntax highlighting via the '<>' icon @ the top of the code blocks.

zimbatm · on April 28, 2014

Talking about OOM: why is it that the only negotiation protocol between the kernel and userland is to hard-kill userland processes when out of memory? I wish there was some sort of soft OOM where the application would get notice and be given a bit of time to free memory.

AndyNemmity · on April 28, 2014

Important news, thank you. I don't expect to see things that directly impact my servers here, but it's a welcome surprise.

0xbadcafebee · on April 28, 2014

IMHO, RHEL kernels should never be considered stable or production-ready. Sadly, support contracts sometimes require their use.

otterley · on April 28, 2014

On what basis is your opinion founded? Most of the top websites in the world use RHEL or one of its brethren (aside from Google).

0xbadcafebee · on April 28, 2014

Yeah, and they use it mainly due to support contracts and industry certifications. Doesn't make it a good idea.

They used to release RHEL kernels with hundreds of patches that could be applied in line, or removed/fixed individually. Now they keep one tree that they manage privately and release only the fully-patched source, without any documentation about what's changed or why. Your only hope of figuring out what has been changed or getting help in fixing it is to pay for support, or browse the commits at Oracle's RedPatch (https://oss.oracle.com/git/?p=redpatch.git). Basically, a RHEL kernel is an obscure product designed to make them money.

The primary reason RHEL kernels aren't stable is the bug this thread is about. There is no valid reason to backport a change to swappiness behavior into a 'stable' kernel. The whole idea behind 'stable' is it works, so don't change it. But RHEL has other ideas.

RHEL kernels are more likely to contain security holes than a vanilla kernel of the same version, partially due to the backporting of new or changing functionality. They also don't get as much testing and scrutiny as the vanilla tree. Besides internal testing, changes that aren't included in stock linux kernels are introduced into Fedora for their users to beta-test. Over time we've seen plenty of security holes make it into otherwise 'stable' RHEL kernels and tools by RHEL backports. It's usually a feature they're importing that brings along with it additional attack surface, or introduces a hole in pre-existing functionality.

It's also a big pain in the ass to get a RHEL kernel to support something a normal kernel would support, due to the major changes. You also have to modify whatever patches you want to apply on top of RHEL since most people create patches targeting the vanilla tree and not RHEL. In general if you have a problem with the RHEL kernel you have a lot better chance of success by throwing it out and downloading a vanilla kernel.

At this point RHEL's kernel is a nightmare hybrid of changes going into a tree that nobody else uses because they've moved to 3.x trees for new enterprise releases. Between vanilla 2.6.32 and RHEL's 2.6.32-431 there are 9144 files changed, 3,563,927 insertions, 699,721 deletions. The diff is 45MB compressed. The RHEL sources are 102 megabytes larger than the vanilla source. And in case anyone asks, no, this swappiness change was not introduced into the vanilla kernel's long-term support tree (which is still maintained for this same kernel version). (http://www.kroah.com/log/linux/2.6.32-stable.html)

otterley · on April 28, 2014

I get what you're saying, but none of this (except for your contention about introducing security issues via backported patches) has anything to do with its _reliability_, which is what most people would interpret "stability" or "production readiness" to mean.

As for your specific contention about the stability or security of backported patches -- can you provide a citation that RHEL kernels have any more fixes per year than, say, Ubuntu kernels? (Which BTW have their own backported patches.)

0xbadcafebee · on April 28, 2014

The posted article is about a backported feature that changed expected behavior and killed mysql processes. That is literally instability introduced by a backported patch. The security/stability problems in backported features speak for themselves; you don't get bugs in features that don't exist.

I'm not making an argument about who has more changes. RHEL just has bad changes. Bad because they're changing expected functionality. Bad because they're introducing unnecessary features. Bad because they increase exposure to security and stability bugs. Bad because their changes aren't documented publicly. Bad because they keep trying to support an antiquated platform that every other enterprise has moved on from. Bad because it's much more difficult to modify, and includes less support for modern hardware (unless it's patched in, again increasing problems). Bad because they have less testing and review. Bad because it creates vendor lock-in. I can't actually think of a single good reason to use a RHEL kernel, other than the vendor-locked-in support contracts and industry certifications that use RHEL as a standard.

otterley · on April 28, 2014

I respectfully disagree that you can come to such a conclusion from a single example (especially since this change was also introduced into the mainline). I think the benefits of RHEL, including its excellent record of long-term support, outweigh minor occasional trip-ups like this.

bri3d · on April 28, 2014

I don't think this issue has much to do with Red Hat kernel specifically - that commit has been in the mainline kernel since 3.5.

mmmooo · on April 28, 2014

Well, Red Hat backported it to their 2.6.x kernels, which one could consider a giant regression caused by a minor kernel patch (assuming you are using swappiness=0). As others have said, swappiness=0 isn't as great an idea as some people claim in the first place.

josho · on April 28, 2014

I don't follow Red Hat all that closely anymore, but they have had a long-standing policy of backporting changes to kernels. In fact, this is part of why they've been successful in enterprise: RH provides long-term support on a stable kernel that receives key fixes from upstream kernel versions.

Sanddancer · on April 28, 2014

The problem is that this wasn't a bug fix, but a behavior change. Worse, it's an undocumented behavior change. This is the sort of thing that can and should be documented.

mmmooo · on April 28, 2014

Yep, and/or not back-ported.