
In emergency cases a passenger was selected and thrown out of the plane (2004) - nkurz
http://lwn.net/Articles/104185/
======
derefr
I guess I'm one of the few people(?) who like the OOM killer. If all your
deployed software is written to be crash-only[1], and every process is
supervised by some other process which will restart it on failure, then OOM is
basically the trigger for a rather harsh Garbage Collection pass, where
software that was leaking memory has its clock wound back by being forcefully
restarted.

Of course, this works better when you have many small processes rather than
few monolithic ones. But now you're designing an Erlang system :)
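
A minimal sketch of what "supervised" can mean here, assuming a
hypothetical crash-only service called my-worker (a real deployment
would use upstart, runit, or similar rather than a bare loop):

    #!/bin/sh
    # Restart the worker whenever it exits, including when the
    # OOM killer takes it down with an uncatchable SIGKILL.
    while true; do
        my-worker
        echo "worker exited with status $?, restarting" >&2
        sleep 1  # brief back-off so a crash loop doesn't spin the CPU
    done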

---

[1] [http://lwn.net/Articles/191059/](http://lwn.net/Articles/191059/)

~~~
HerrMonnezza
> every process is supervised by some other process which will restart it on
> failure

I'm curious whether this works in practice for you. The current OOM algorithm
in Linux sums up the memory usage of a process and all its children, so there
is a good chance that the restarter process is killed first, and _then_ the
main software is killed too (when the OOM killer realizes the last kill didn't
free enough memory).

This is exactly the problem we're facing at work: on a computational cluster,
users sometimes start wild code that consumes all the memory, but the OOM
killer decides to kill the batch-queue daemon first, because it's the root of
all the misbehaving processes. We have to explicitly set `oom_adj` on the
important daemons to prevent the machines from becoming unresponsive because
of a bad OOM decision.
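
For reference, the knob in question; a sketch assuming a single instance of a
hypothetical batch daemon called batchd:

    # -17 (OOM_DISABLE) exempts the process from the OOM killer entirely
    echo -17 > /proc/$(pidof batchd)/oom_adj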

~~~
derefr
My "restarter process" is upstart. It's convenient, since the OOM-killer tries
to not kill init (for bad things happen when you kill init), so it's a
somewhat-safe place to put supervisory logic. One of the better calls
Canonical has made, I think. :)

Still, in your use-case, I'd definitely recommend only letting users run their
"wild code" inside a memory cgroup+process namespace (e.g. an LXC container.)

Crash-only systems only work when a faulty component crashes itself before it
crashes you. Processes modellable as mutually-untrustworthy agents should
always have a failure boundary drawn between them. (User A shouldn't be able
to bring down the cluster-agent; but they shouldn't be able to snipe user B's
job by OOMing their job on the same cluster node, either.) And on a Unix box,
the only true failure boundaries are jails/zones/containers; nothing else
really stops a user from using up any number of not-oft-considered resources
(file descriptors, PIDs, etc.)
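
A minimal sketch of the cgroup half of that, assuming the v1 memory controller
is mounted at the usual path (names and the 2G cap are illustrative):

    # Cap the group at 2 GB; an OOM inside the group then only kills
    # tasks within the group, not system daemons.
    mkdir /sys/fs/cgroup/memory/wildcode
    echo 2G > /sys/fs/cgroup/memory/wildcode/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/memory/wildcode/tasks  # move this shell in
    ./users-wild-code                               # children inherit the group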

~~~
cbhl
Do you have any good resources on where to get started going about setting up
failure boundaries/jails/zones/containers like this properly?

I think it's surprisingly easy to get yourself in the situation where this is
a concern for you[0] but you don't know how to solve it.

[0] Just run "adduser" and have SSH running, or just create an upstart job, or
write a custom daemon that accepts and executes jobs from not-quite-
trustworthy-undergrads, or...

~~~
timClicks
If you are running Ubuntu, docker.io makes it pretty easy to create and
maintain LXC containers.
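
For example (flag per the current Docker CLI; the image and command are
placeholders):

    # Run untrusted code with a hard memory cap, so memory pressure
    # stays inside the container instead of taking down the host.
    docker run --memory=512m ubuntu /path/to/wild-code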

------
cbsmith
The real irony here is that airlines actually do something very much like
overcommit & OOM killer when it comes to reservations, and for precisely the
same reasons: they know that not all the reservations will be used at the same
time, but sometimes they do end up double booked, so then someone has to be
kicked off the flight.

~~~
bruceboughton
Except they do this before the flight, not once you've taken off.

Though Ryanair could well be heading down this route [1]

[1] [http://www.independent.ie/irish-news/three-ryanair-mayday-
ca...](http://www.independent.ie/irish-news/three-ryanair-mayday-calls-go-out-
on-same-day-26886838.html)

~~~
Kliment
Very clever, that, once you think about it. "The on-time airline" running
late, stuck in holding pattern, claim you're out of fuel, get landing
priority, no more delay. About as classy as the other Ryanair tactics.

~~~
taloft
Actually, they were low on fuel, because the company was forcing captains to
carry only the company-approved minimum under threat of discipline.

~~~
gnaffle
It's not company-approved, it's regulator-approved, which is designed to be
sufficient. Airlines have no reason to go above that, except for planning
reasons (e.g. you might want to add some extra fuel to have the option to fly
faster on some legs to make up for delays). So I wouldn't expect Ryanair to
carry less fuel than other airlines.

The only exception that airlines will sometimes do is to juggle around
destination and alternate airports so that they can save some fuel on that
(you have to have enough fuel to reach an alternate airport, and then 45
minutes in addition to that).

~~~
bruceboughton
It may be designed to be sufficient, but you still have to declare a fuel
emergency if you fall below it, so it is not a normal event.

------
jballanc
I wonder why Linux hasn't adopted something like OS X's "Sudden Termination"
mechanism:
[https://developer.apple.com/library/mac/documentation/Cocoa/...](https://developer.apple.com/library/mac/documentation/Cocoa/Reference/Foundation/Classes/NSProcessInfo_Class/Reference/Reference.html#//apple_ref/doc/uid/20000316-SW3)

~~~
calinet6
Well FFS that makes a lot of sense.

OSX is just like, "Hey guys, if any of you happen to not need your memory
sometimes, would you mind kindly letting me know and I'll go ahead and let you
go at a convenient time?" Meanwhile Linux goes on a murderous rampage with
unpredictable effects.

~~~
erichurkman
... And Windows is sitting in a corner, fans kicked on high, trying valiantly
to manage with swapping to disk until a sysadmin gives up trying to connect
via RDC and yields to the age-old "have you tried turning it off and back on
again?"

~~~
saurik
The part of this I hate most is that if you have physical access to the
machine and can hit ctrl-alt-delete you are given an out-of-band dialog that
functions flawlessly; this dialog happens to have a "task manager" button, but
as the task manager functionality is part of a normal user application and not
that magical dialog, you get thrown back into the swap storm, only now with
yet another process (taskmgr) competing for memory :(.

------
MattJ100
Depending on its use, the first thing I generally do on a new server is
disable the OOM killer.

At runtime: _sysctl vm.overcommit_memory=2_

To make it persist, just add _vm.overcommit_memory=2_ to /etc/sysctl.conf
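
Note that in mode 2 the commit limit is swap plus vm.overcommit_ratio percent
of RAM (50% by default), so depending on your swap size you may also want
something like:

    sysctl vm.overcommit_ratio=100  # allow commits up to swap + 100% of RAM
    grep Commit /proc/meminfo       # compare CommitLimit vs. Committed_AS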

~~~
HerrMonnezza
Note, however, that turning off overcommit might yield unexpected results upon
fork()/exec(): [http://www.quora.com/What-are-the-disadvantages-of-
disabling...](http://www.quora.com/What-are-the-disadvantages-of-disabling-
memory-overcommit-in-Linux#)

For instance, if you run a Redis server and turn off memory overcommit, you
might not be able to background save.

~~~
alextingle
Warning - that site requires a login.

~~~
0x006A
As mentioned in
[https://news.ycombinator.com/item?id=6300856](https://news.ycombinator.com/item?id=6300856)
you can add ?share=1: [http://www.quora.com/What-are-the-disadvantages-of-
disabling...](http://www.quora.com/What-are-the-disadvantages-of-disabling-
memory-overcommit-in-Linux?share=1)

~~~
andreasvc
That allows me to see the first answer but not the rest.

------
IvyMike
This reminds me of my one and only question on Stackoverflow: "Throwing the
fattest people off of an overloaded airplane."
[http://stackoverflow.com/q/7746648/67591](http://stackoverflow.com/q/7746648/67591)

~~~
AmVess
Rats. I read this post, and my Windows 8 install disc vanished.

~~~
eliben
Sincere congrats. Losing your Windows install disc is one of the best things
that can happen to a person.

------
ajdecon
It is, in fact, possible to make a process immune to the OOM killer:

echo -17 > /proc/$PID/oom_adj

where $PID is the process ID you want to protect.

oom_adj can be tuned with other values to make a process more or less likely
to be killed.

[http://www.oracle.com/technetwork/articles/servers-
storage-d...](http://www.oracle.com/technetwork/articles/servers-storage-
dev/oom-killer-1911807.html)

~~~
axylone
oom_adj is deprecated; use oom_score_adj instead.

[https://www.kernel.org/doc/Documentation/filesystems/proc.tx...](https://www.kernel.org/doc/Documentation/filesystems/proc.txt)
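
The substitution is straightforward; the new file just uses a -1000..1000
scale instead of -17..15:

    # Equivalent of "echo -17 > /proc/$PID/oom_adj" above:
    echo -1000 > /proc/$PID/oom_score_adj  # -1000 = never OOM-kill this process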

------
zw123456
Some years back I was flying on a small commuter airline that used small prop
planes (I call them pterodactyl air). Part way through the flight, I noticed
one prop seemed like it was not working, so I leaned forward to alert the
co-pilot (the plane was that small). He told me that they would turn off one
engine and "feather the prop" to save fuel. I told him that I would be happy
to take up a collection back in the cabin from the other passengers to pay for
the extra fuel to power both engines. He chuckled, but I was serious. I never
flew with them again.

Maybe there is a way to suspend a process (feather the prop) rather than kill
it outright.
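
For manual use at least, there is: SIGSTOP suspends a process without killing
it, and SIGCONT spins it back up. (Suspension alone doesn't free memory, but
the stopped pages become easy candidates for swap.)

    kill -STOP $PID  # "feather the prop": suspend; cannot be caught or ignored
    kill -CONT $PID  # resume later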

~~~
yannyu
Just FYI, pretty much every plane with multiple engines is able to safely fly
and land with only one engine working, and all commercial pilots are trained
to do so. If you were halfway through the flight already, it's entirely
possible you were already in descent and didn't need the other prop at all to
complete the flight.

------
Argorak
If you want to separate your passengers into first and economy class, this is
the relevant guide:

[http://lwn.net/Articles/317814/](http://lwn.net/Articles/317814/)

Usually, I recommend that database and queue servers run the database/queue
process with a priority that makes it unlikely to be killed.

I had a case where a colleague running a script on a server under high memory
pressure killed the queue, which is inadvisable even if the queue is
crash-safe. Before that, the queue had been running for 1.5 years straight.

------
mikegagnon
That post is a great, poetic allegory. But ultimately, I think the analogy
presents a bad idea. The allegory makes the point that we could entirely avoid
OOM errors by engineering a system such that resources are never
overcommitted. This is true; we could do that.

However it would be bad.

Under-committing resources (thus removing the need for an OOM killer) will NOT
lead to a net gain compared to over-committing resources (and thus requiring
an OOM killer of some sort).

If we are unwilling to overcommit resources, then it would be woefully
uneconomical to run algorithms that have bad worst-case performance (because
to avoid overcommitting you would necessarily need to assume the worst case is
encountered every time).

It's just not feasible to avoid algorithms that have bad worst-case
performance. Rather, we need to develop better abstractions for dealing with
components (e.g. computations, programs, processes, threads, actors, functions
etc.) that go over budget. Here's my attempt at developing a better
abstraction for web servers: mikegagnon.com/beergarden

Ultimately, we need to treat every system like a soft real-time system,
because at the end of the day every program has timeliness requirements and
has resource constraints. The current POSIX model does not provide such
abstractions and I think that's why we have these debates about OOM killers.

~~~
AsymetricCom
I love this blog's UI. Did you make it yourself?

I like the idea of the doorman, but what if you could somehow pass back useful
math to the client? Of course, then you'd have to disregard that useful work
yourself, or double-check it, negating the energy savings. Or perhaps return a
map of a traveling-salesman-type problem (maybe a map of metadata and its
traversal cost), and clients could navigate that map depending on exactly what
kind of data they really want, thus reducing your load for valid, heavy
requests; and if they return a path with lots of nodes, you know to
de-prioritize or drop the request.

~~~
mikegagnon
> I love this blog's UI. Did you make it yourself?

Thank you! Yes, I made the UI. It's open source: sidenote.io

> I like the idea of the doorman but what if you could somehow pass back
> useful math to the client?

I think it would be great to have clients perform useful computations instead
of just burning cycles. But that's not MVP, even for a research project.

------
kalleboo
Here's a novel way to deal with an out of memory situation caused by slow
memory leaks in a long-running server process: start swapping memory that
hasn't been touched in literally days or weeks to /dev/null, and pray the
process doesn't ever need it again.

~~~
bonzoesc
That's so indescribably worse than just killing the process that the mind
boggles. Breaking in a simple, predictable, and detectable way vs. corrupting
data and hoping (excuse me, "praying") nobody notices.

~~~
kalleboo
> vs. corrupting data and hoping (excuse me, "praying") nobody notices.

Well it seems to work for MySQL

Anyway, you wouldn't ever return the nulled data. If the process tries to
access the data, THEN you crash it.

------
fusiongyro
Or, here's a crazy idea: how about we actually allocate the memory when you
call malloc(), and if there isn't any, give you an error instead? Programs
could check the return code and decide what to do when they run out of memory
themselves. Crazy, I know.

~~~
wmf
If you disable overcommit then large processes can't fork()/exec(). In theory
we should all switch to spawn() but that's not going to happen.

------
cmbaus
My memory is a bit hazy in this area, but I think memory is overcommitted by
default in Linux. What that means is that malloc() can return an address that
doesn't have physical memory assigned in the page table. Memory isn't
committed until it is written to.

This isn't the case with the default MSVC implementation of malloc() in
Windows. In Windows address space is reserved and committed with
VirtualAlloc(), and typically that is done in one step.

I think memory is overcommitted because Linus wanted to keep the memory
footprint lower than NT's early in the development of the kernel. The drawback
is that applications may segfault when writing to memory that was successfully
returned by malloc().

~~~
kyberias
You seem to have some terminology problems here. Windows VirtualAlloc may
"commit" memory but that does not mean it actually reserves physical pages
[1]. That always happens only when the memory is accessed. On the other hand,
MSVC's malloc() probably uses HeapAlloc(), which in turn uses VirtualAlloc().
I don't think there are any fundamental differences between Linux and Windows
here.

[1] "Actual physical pages are not allocated unless/until the virtual
addresses are actually accessed." [http://msdn.microsoft.com/en-
us/library/windows/desktop/aa36...](http://msdn.microsoft.com/en-
us/library/windows/desktop/aa366887\(v=vs.85\).aspx)

~~~
MichaelGG
That link says: "Allocates memory charges (from the overall size of memory and
the paging files on disk) for the specified reserved memory pages."

It does count against the total memory allowed. My laptop has 8GB of RAM and a
1GB page file, giving me a 9GB overall commit limit. If I spawn a process that
eats up 1GB at a time, even Task Manager can clearly show me going up and
hitting 8/9GB, and then I'll get OOM in my process.

Windows won't commit memory that a process can't use. You can't overcommit,
although you might end up in the pagefile. Without the odd concept of fork,
you don't end up with processes having huge "committed" address spaces that
aren't ever going to be used.

The note about physical pages is just saying it's not mapping it, not that
it's not guaranteeing it.

~~~
kyberias
Right. I stand corrected.

------
joelthelion
I wonder if someone is suddenly going to come up with a magical solution for
the OOM problem and put an end to all these pointless discussions.

------
Systemic33
Has anyone forwarded this to Ryanair yet?

------
jameswilsterman
At least offer parachutes?

~~~
twistedpair
$150 each, correct change only.

~~~
mistercow
Also, they count as your carry-on bag, and they only bring one parachute on
board, so if multiple people have to be thrown off and more than one of them
has paid for a parachute, they have to draw straws to decide who gets it.

Edit: And remember, OOF goes off on less than .1% of flights, so they have
hundreds of times as many parachutes per flight as they have people who need
them. Rumors that parachutes are oversubscribed are therefore wildly
inaccurate.

------
antocv
The few times I've seen the OOM killer invoked, it took a couple of minutes to
kill chromium after flash (of course) messed up; during that time the system
was unresponsive, and it killed a few random smaller processes until it hit
the correct one, flash or chromium in some weird interdependent bug. Either
way, I wasn't too happy.

After a while I learned to notice when the bug triggered and the system
started becoming unresponsive, and I had a terminal with `killall -9 chromium
& killall -9 flash-plugin` ready to go, so I could preempt it myself and the
OOM killer wouldn't get involved. There has to be a better mechanism than OOM.

~~~
Piskvorrr
Too slow for the command to pass through the usual "X - WM - TE - shell -
killall" chain? Try Alt+SysRq+f; that, in my experience, is waaaaay faster for
invoking the oom_killer.
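
(Note that magic SysRq has to be enabled for this to work, and with a root
shell you can trigger the same kill without a keyboard at all:)

    sysctl kernel.sysrq=1         # or: echo 1 > /proc/sys/kernel/sysrq
    echo f > /proc/sysrq-trigger  # invoke the OOM killer on demand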

~~~
pritambaral
His issue is that oom_killer doesn't get it right straight away: Alt+SysRq+f
would still need multiple invocations before it gets flash. Personally, I do
use Alt+SysRq+f, since it predictably targets GMail-on-Chrome every time on my
system. That is usually enough on my desktop for me to jump in and manually
kill the offender. I can then just F5 GMail.

~~~
Piskvorrr
Even so, in OOM situations, sysrq invocation is still an order of magnitude
faster than killall invoked from a graphical terminal emulator.

As for hinting to oom_killer: I have a script which searches for chrome and
flash processes every minute, and sets their oom_score_adj in the high
hundreds. This makes _reasonably_ sure that oom killer will go after these
processes first.
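
A sketch of what such a script can look like (the process-name pattern is
illustrative):

    #!/bin/sh
    # Run from cron every minute: make browser and flash processes the
    # preferred OOM victims by raising their oom_score_adj.
    for pid in $(pgrep -f 'chrome|flash'); do
        echo 800 > /proc/$pid/oom_score_adj
    done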

