I guess I'm one of the few people(?) who like the OOM killer. If all your deployed software is written to be crash-only[1], and every process is supervised by some other process which will restart it on failure, then OOM is basically the trigger for a rather harsh Garbage Collection pass, where software that was leaking memory has its clock wound back by being forcefully restarted.
Of course, this works better when you have many small processes rather than few monolithic ones. But now you're designing an Erlang system :)
> every process is supervised by some other process which will restart it on failure
I'm curious if this works in practice for you. The current OOM algorithm in Linux sums up the memory usage of a process and all its children. So there is a good chance that the restarter process is killed first, and then the main software is killed too (once the OOM killer realizes the last kill didn't free enough memory).
This is exactly the problem we're facing at work here: on a computational cluster, users sometimes start wild code that consumes all the memory, but the OOM decides to kill the batch-queue daemon first, because it's the root of all misbehaving processes. We have to explicitly set `oom_adj` on the important daemons to prevent the machines from becoming unresponsive because of a bad OOM decision.
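For anyone curious, the adjustment itself is tiny. A minimal sketch in C of what I mean, assuming the modern /proc/<pid>/oom_score_adj interface (range -1000..1000; older kernels only expose oom_adj, with a -17..15 range), run as root against the daemon's PID:

    /* Sketch: protect a PID from the OOM killer by lowering its score.
     * Writing -1000 makes it effectively unkillable by the OOM killer. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    static int set_oom_score_adj(pid_t pid, int adj)
    {
        char path[64];
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", (int)pid);
        f = fopen(path, "w");
        if (!f)
            return -1;            /* lowering the score needs root/CAP_SYS_RESOURCE */
        fprintf(f, "%d\n", adj);
        return fclose(f) == 0 ? 0 : -1;
    }

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <pid> <adj, -1000..1000>\n", argv[0]);
            return 1;
        }
        return set_oom_score_adj(atoi(argv[1]), atoi(argv[2])) ? 1 : 0;
    }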
My "restarter process" is upstart. It's convenient, since the OOM-killer tries to not kill init (for bad things happen when you kill init), so it's a somewhat-safe place to put supervisory logic. One of the better calls Canonical has made, I think. :)
Still, in your use-case, I'd definitely recommend only letting users run their "wild code" inside a memory cgroup+process namespace (e.g. an LXC container.)
Crash-only systems only work when a faulty component crashes itself before it crashes you. Processes modellable as mutually-untrustworthy agents should always have a failure boundary drawn between them. (User A shouldn't be able to bring down the cluster-agent; but they shouldn't be able to snipe user B's job by OOMing their job on the same cluster node, either.) And on a Unix box, the only true failure boundaries are jails/zones/containers; nothing else really stops a user from using up any number of not-oft-considered resources (file descriptors, PIDs, etc.)
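The memory-cgroup half of that is just a directory and a couple of file writes. A rough sketch in C, assuming a cgroup-v1 memory controller mounted at /sys/fs/cgroup/memory (file names differ under cgroup v2, and LXC will normally set all of this up for you); the group name and 512 MB cap are made up:

    /* Sketch: create a memory cgroup, cap it at 512 MB, move ourselves
     * into it, then exec the untrusted job. Anything the job forks
     * inherits the cap, and only that cgroup gets OOM-handled. Run as root. */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define CG "/sys/fs/cgroup/memory/wildcode"

    static int write_file(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");
        if (!f)
            return -1;
        fprintf(f, "%s\n", val);
        return fclose(f) == 0 ? 0 : -1;
    }

    int main(void)
    {
        char pid[32];

        mkdir(CG, 0755);                                       /* create the cgroup */
        write_file(CG "/memory.limit_in_bytes", "536870912");  /* 512 MB cap */

        snprintf(pid, sizeof(pid), "%d", (int)getpid());
        write_file(CG "/cgroup.procs", pid);                   /* move ourselves in */

        /* placeholder for the real job */
        execlp("sh", "sh", "-c", "echo running under a 512MB cap", (char *)NULL);
        return 1;
    }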
Do you have any good resources on where to get started going about setting up failure boundaries/jails/zones/containers like this properly?
I think it's surprisingly easy to get yourself in the situation where this is a concern for you[0] but you don't know how to solve it.
[0] Just run "adduser" and have SSH running, or just create an upstart job, or write a custom daemon that accepts and executes jobs from not-quite-trustworthy-undergrads, or...
>I guess I'm one of the few people(?) who like the OOM killer.
Diff'rent strokes. I also like the OOM killer; it's a dastardly wonderful thing to tie lots of safety-critical things to .. and in the SIL-4 OS business (my domain), it is indeed imperative to understand how to use the OOM killer properly. Or: not.
So in this light .. I know Erlang is "the thing" right now, but I feel I must just mention that:
>Of course, this works better when you have many small processes rather than few monolithic ones. But now you're designing an Erlang system :)
.. one could also be designing a Lua-based distribution, or JVM, or whatever you like, essentially, and integrating with oom_killer. There's nothing Erlang'y about it. Because if you're playing with the oom_killer, you're really making a distribution choice, in the topology.
If your app cares about oom, well kiddo .. you better not be doing anything less than exercising complete control over your distro, its launch policies, its use of the TextSegment as installed, and so on. Absolutely you're making Distribution decisions about the functionality of the combined system. oom_killer isn't useful by itself.
My point being, worrying about oom_killer isn't just something Erlang users need think about, nor are they the only ones who really 'get' why an oom_killer can be used nicely .. if you're building a distro, either for use as an embedded machine, a tight secure server image, or indeed even as a desktop user, well .. careful memory integration is, as you say, a harsh pass.
Incidentally, I use the oom_killer exactly as you mention, in a few embedded distro applications, specifically to kill whatever 'lua' is hogging resources. It's an extraordinarily functional mechanism for recovery ..
Those environments don't provide the supervision hierarchy which is being discussed in this thread. In Erlang, every actor you create has a supervisor, which will restart the actor (the behaviour is tunable) when it crashes.
That is, until you get a process that triggers OOM due to a bug on startup (pissing all over RAM) and ends up stuck in a restart loop which cripples your machine so you can't get in and fix it. Either that, or it fails and neutralises supervisord or something, resulting in the site being down anyway.
ulimit is a better solution, i.e. set reasonable constraints based on available resources.
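For completeness, a launcher can also impose the constraint on itself right before exec'ing anything risky. A minimal sketch using setrlimit(), which is what ulimit maps to; the 1 GB figure is only an example:

    /* Sketch: cap our own address space at 1 GB, so malloc/mmap fail with
     * ENOMEM instead of dragging the whole box into OOM-killer territory.
     * Equivalent to `ulimit -v 1048576` in the shell. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit lim = {
            .rlim_cur = 1UL << 30,   /* soft limit: 1 GB */
            .rlim_max = 1UL << 30,   /* hard limit: 1 GB */
        };

        if (setrlimit(RLIMIT_AS, &lim) != 0) {
            fprintf(stderr, "setrlimit: %s\n", strerror(errno));
            return 1;
        }

        /* From here on (and in anything we exec), allocations beyond the
         * cap fail and a well-behaved program can handle that itself. */
        return 0;
    }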
I've never seen a process supervisor daemon that didn't have an "if process FOO exceeds X restarts in Y milliseconds, stop trying to start FOO" clause.
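The clause itself is only a handful of lines. A toy sketch, not any particular supervisor's actual code (the 5-restarts-in-10-seconds window and the /usr/bin/foo path are made up):

    /* Toy sketch of a flapping guard: restart FOO when it dies, but give
     * up if it exceeds MAX_RESTARTS within WINDOW_SECS. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    #define MAX_RESTARTS 5
    #define WINDOW_SECS  10

    int main(void)
    {
        int restarts = 0;
        time_t window_start = time(NULL);

        for (;;) {
            pid_t pid = fork();
            if (pid == 0) {
                execlp("/usr/bin/foo", "foo", (char *)NULL);  /* hypothetical daemon */
                _exit(127);
            }
            waitpid(pid, NULL, 0);      /* child died; decide whether to retry */

            time_t now = time(NULL);
            if (now - window_start > WINDOW_SECS) {
                window_start = now;     /* fresh window, reset the counter */
                restarts = 0;
            }
            if (++restarts > MAX_RESTARTS) {
                fprintf(stderr, "foo is flapping, giving up\n");
                return 1;
            }
            sleep(1);                   /* daemontools-style pause between tries */
        }
    }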
You've never seen an "enterprise" software init script then. 5-nines? Stick it in an endless loop and no one will notice (Atlassian I'm looking at you).
Yeah, but looking at Atlassian is easy, for pete's sake they gave us shitty software like JIRA, Confluence and others... and I say that with a HUGE dislike of those products due to how obtuse they can be.
> daemontools / supervise doesn't have it and it's a fairly popular solution.
Ok, it doesn't exactly support the quoted feature ("stop restarting if it exceeds X attempts in Y seconds"), but it does sleep for a second between restart attempts to mitigate the same problem: http://cr.yp.to/daemontools/supervise.html
I'm curious why you wouldn't recommend it. I've never used it myself, but I've been reading about it the past few weeks. The design seems really well done compared to init.d scripts in the sense that every init script must reimplement daemonization, pid files, etc. (which is helped by examples, start-stop-daemon, etc. but is still a huge and delicate chore in my experience).
Is your objection that daemontools is largely unused / unmaintained and lacks features (such as flapping-avoidance)? Or something else?
It's mostly about the lack of features. No support for cgroups, user switching, adjustable backoff, syslog, etc. Sure - it works, but it's the same thing as with qmail - it's the very minimum that you can call a useful SMTP daemon. The moment you want any feature on top, you have to implement it yourself.
Any one of upstart, systemd, or supervisord will give you an equivalent solution with more practical features. Daemontools was good enough a couple of years ago. (I used it happily back then.)
In short - there's nothing wrong with it really, but many alternatives are much better.
A particularly bad case of the OOM killer making a mistake is in-memory databases. If some other process is using lots of memory, but still less than the DB, it'll get killed for no good reason anyway.
One would hope that any machine hosting an in-memory database has gobloads of RAM and not much else to do. :) [Still, this is what replication is for! "In-memory database" is only a scary concept if you don't have any hot slaves.]
One might have 2/3 of memory used, with well known database size growth. Then a runaway process could use 1/3 of memory, cause trouble, but it's still the DB that gets killed.
This isn't disastrous of course (replication + snapshots + redis aof), but still annoying.
Another thing you can do, in this case, is to enable swap system-wide, then put just the database process into a cgroup with memory.swappiness = 0.
Thus, the database itself will never degrade due to spilling to disk, but all other processes might. But if it's a DB box, that won't precisely matter.
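Concretely, once the database is in its own cgroup that part is a single write. A sketch assuming a cgroup-v1 memory controller and a made-up group name ("dbgroup"); cgroup v2's memory controller doesn't expose memory.swappiness:

    /* Sketch: keep the DB cgroup's pages in RAM by zeroing its per-cgroup
     * swappiness; the rest of the system keeps swapping normally. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/sys/fs/cgroup/memory/dbgroup/memory.swappiness", "w");
        if (!f) {
            perror("memory.swappiness");
            return 1;
        }
        fputs("0\n", f);
        return fclose(f) == 0 ? 0 : 1;
    }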
The OOM killer has almost never worked for me. I think it has worked once or twice only; the rest of the time, high memory pressure combined with high storage pressure effectively locks up the system. The easiest way to trigger it is to start Eclipse and VirtualBox and then kick off a system update (just the package download, not the install), and I can't move my mouse any more.
The real irony here is that airlines actually do something very much like overcommit & OOM killer when it comes to reservations, and for precisely the same reasons: they know that not all the reservations will be used at the same time, but sometimes they do end up double booked, so then someone has to be kicked off the flight.
Very clever, that, once you think about it. "The on-time airline" running late, stuck in holding pattern, claim you're out of fuel, get landing priority, no more delay. About as classy as the other Ryanair tactics.
I would agree except that it seems to go deeper than that. Pilots are ranked by how little spare fuel they take. If they dare to take too much spare fuel they have to write a memo to management explaining why.
It's not company-approved, it's regulator-approved, which is designed to be sufficient. Airlines have no reason to go above that, except for planning reasons (eg you might want to add some extra fuel to have the option to fly faster on some legs to make up for delays). So I wouldn't expect Ryanair to carry less fuel than other airlines.
The only exception that airlines will sometimes do is to juggle around destination and alternate airports so that they can save some fuel on that (you have to have enough fuel to reach an alternate airport, and then 45 minutes in addition to that).
While I don't like Ryanair, this scenario isn't very likely. There will always be lots of paperwork etc. after declaring an emergency, which ultimately could end up with the pilots losing their license if it was discovered that they were repeatedly doing this.
I'm not an airline transport pilot, but in my personal flying, I've declared emergencies twice and asked for priority handling due to minimum fuel concerns once. No paperwork on any of them.
I also spend significant time on aviation/pilot forums, and I've literally never heard of "lots of paperwork" for any emergency that resulted in a safe, no injuries landing. The topic comes up regularly in the "should I have declared an emergency or not?" discussions, the common argument against is "I don't want the paperwork" (a strange trade off against a life-safety question in any case), and the most I've heard as routine followup is a phone call from an inspector.
My experience is exclusively US, so perhaps it's different in other areas of the world.
Maybe there is a difference between parts of the world, but I would be surprised if airlines don't have some kind of routine to follow when an emergency is declared. And I would be even more surprised if airline pilots on Ryanair routinely used emergencies as an excuse without it having any consequences.
The only issue I've ever caused was triggering a TCAS warning for a landing airplane because I was slow on starting a turn. Even though there were VFR conditions and we were at a safe distance, the other pilots said they had to report it as a matter of company policy, and my instructor had to make a few phone calls afterwards.
> At least two memorandums were sent to Ryanair pilots detailing the company's concern about what was described as "excess fuel explanations" -- a description of the reasons flight commanders have to give if they take on extra fuel over the recommended minimum fuel load.
Is "because it's only the minimum!" not a valid answer?
If you read the article to the end, they had 90 minutes extra worth of fuel, and landed with 30 minutes left after diverting due to a thunderstorm.
It was a "mayday" because they didn't have enough fuel to return to the original airport they were going to land at and/or to continue waiting in a holding pattern for another hour.
It's not the best analogy, as when an airplane is low on fuel, dropping a passenger wouldn't make a big difference (I'm assuming).
But think of a hypothetical situation where low fuel situations somehow were common, unforeseeable and unavoidable. And if dropping a passenger could indeed make the difference between a full crash and a safe landing. It's a shitty situation, but it makes sense to give somebody a chute and kick them out, instead of letting everyone die.
Right, so they do it at pretty much the precise time that everyone grabs the seats. They don't somehow get everyone on the plane, take off without everyone in their seats, and THEN kick out passengers.
Sometimes is pretty much all of the time. My SO works as cabin crew for one of the large airlines you will have heard of, and they typically always overcommit. It isn't just because they are greedy and want lots of money though (well I guess it kind of is). As you said, they know some people won't turn up, but also they may decide to change the class configuration of the aircraft (economy and business -> economy, business and first) depending on how people book or what aircraft are available on the day.
> but also they may decide to change the class configuration of the aircraft (economy and business -> economy, business and first) depending on how people book or what aircraft are available on the day.
United is infamous for this, prompting a sound piece of advice: if for some reason you're in first class on a flight scheduled for an A320, don't take a seat in row 3, since odds are high it'll be swapped to an A319 (which only has two rows in F) day of the flight. In theory there's a pecking order for who gets downgraded in that case (first, anyone who moved up to F on a status upgrade, then the paid F fares starting with whoever was latest to check in), but in practice the gate agent just bumps row 3 and calls it a day.
I feel like aircraft substitutions are more likely to happen for operational reasons (the A320 they wanted to commit is unavailable because of mx/wx, most likely, and instead of taking a delay, they pull an A319). Some carriers do seem to use last-minute equipment swaps for yield management, but I feel like for UA, doing this is not a good idea — they burn a lot of time re-assigning seats and refunding E+ fees, etc. after a 320->319 downgrade, so I think it's something they try to avoid because it's costly to deviate from the planned equipment.
I think the parent poster was in part referring to operational upgrades (very common at British Airways — they will dramatically oversell the Y cabin and op-up people into J/F as needed), which don't necessarily require an equipment swap.
Another option uniquely available to European carriers that you might not have seen in the U.S. is to adjust the number of J/Y seats in an aircraft: on most intra-Europe flights, the "business-class cabin" is the same seats as the coach cabin, but with a blocked middle seat and a movable divider so you know where Y starts.
(Come to think of it, I guess BA will sometimes swap equipment to a flight with fewer classes of service — for example, the LHR-DME route can lose its lie-flat Club World seats overnight sometimes after unplanned equipment swaps — but I don't think BA does this for yield management, because it's costly to them to have to pay refunds to people they've inconvenienced.)
I don't know why United does it, just that the A320/A319 swap is so common with them that on frequent-flyer forums it often gets a stickied thread with tips on what to do, warning signs, etc. Hence the advice about never taking a seat in row 3 (and for economy passengers, never booking into the last couple rows of the aircraft, since those seats don't exist on the 319).
There's also a counterintuitive "always choose row 3" option, where you make yourself more likely to be downgraded but also more likely to be offered a travel voucher under United's fairly generous distance-based compensation policy (internal documentation "GG OVS DOWNGRADE").
> It isn't just because they are greedy and want lots of money
Certainly not in the current airline environment. Even with a perfectly loaded plane, they only make a little money, and you can't plan for a perfectly loaded plane with any kind of confidence without double-booking.
> My SO works as cabin crew for one of the large airlines you will have heard of, and they typically always overcommit.
I phrased what I said poorly. They always allow overcommit, but they try to work it out so that on average they efficiently make use of their seats without constantly having to give people giveaways and bad customer experiences.
That's more analogous to the strategy of pausing processes which require memory when there isn't any available, and then resuming them once memory becomes available. The airline still lets you complete your trip when they bump you from a flight, you just have to wait until capacity becomes available for you.
The problem with translating that approach to OSes is that you can easily deadlock the entire system.
It depends on circumstance. For a lot of people, having their flight canceled and having to switch to another is all but catastrophic. You basically have to restart and build a new travel plan, and much of what you set out to accomplish gets lost.
Precisely. The OOM killer has its uses, and while it's debatable if it is suited for this workload or that workload, it is easily tunable for specific use cases, and you can disable it altogether if you'd like the classic behavior.
Yes, but that's more like fork or malloc failing in the first place than letting you run and then randomly killing something when the system gets overcommitted.
OSX is just like, "Hey guys, if any of you happen to not need your memory sometimes, would you mind kindly letting me know and I'll go ahead and let you go at a convenient time?" Meanwhile Linux goes on a murderous rampage with unpredictable effects.
... And Windows is sitting in a corner, fans kicked on high, trying valiantly to manage with swapping to disk until a sysadmin gives up trying to connect via RDC and yields to the age-old "have you tried turning it off and back on again?"
The part of this I hate most is that if you have physical access to the machine and can hit ctrl-alt-delete you are given an out-of-band dialog that functions flawlessly; this dialog happens to have a "task manager" button, but as the task manager functionality is part of a normal user application and not that magical dialog, you get thrown back into the swap storm, only now with yet another process (taskmgr) competing for memory :(.
That's a few layers up the stack. Sudden Termination is a message to the OS management layer that an app can recover robustly from an unexpected death. That's not a job for the kernel, and presumably Darwin has some handling analogous to the OOM killer for the situation where memory truly is exhausted.
FWIW: Android baked that idea into the framework from the start. All apps are essentially required by the application lifecycle to be robust against sudden failure, and the system is designed to give apps fair warning to save themselves at various times (e.g. on backgrounding). If you think about it, on a battery-powered embedded system that's pretty much a firm requirement anyway.
Android accomplishes this using a slightly modified OOM killer, which it can control directly through /proc. In addition to the "out of memory condition" which the traditional OOM killer supports, it has the "low memory killer" (LMK). Processes on Android that are not supposed to be randomly terminated are protected by adjusting the OOM/LMK settings, which the Android system does when launching the Dalvik VM for that activity/service.
This may have changed in recent versions of Android, as I haven't kept up with internals that much.
Actually it's much more than slightly modified. The lowmemkiller (which is still present, and mostly unmodified from what you saw) code runs out of the cache shrinker framework, not on an OOM event. And it chooses processes to kill based on categories that correspond to UI state (e.g. preferentially killing idle apps before background processes before foreground apps, etc...).
But that's not quite what I was saying. The point was that the promise in OS X (that a process "could be killed if necessary") is baked into Android: it's true by default of all processes, and the framework guarantees a set of lifecycle callbacks to allow processes to persist their state robustly. This isn't part of kernel behavior at all.
The memory alert stuff on iOS is similar; when the phone runs low on free memory it tells the app to free up as much memory as possible, and then only starts killing processes if there still isn't enough available.
It can be a bit finicky to work with, but it does seem to be pretty much strictly better than just killing random processes without giving them a chance to be better behaved.
This reminds me of my one and only question on Stackoverflow: "Throwing the fattest people off of an overloaded airplane." http://stackoverflow.com/q/7746648/67591
Some years back I was flying a small commuter airline that used small prop-type airplanes (I call them pterodactyl air). Partway through the flight, I noticed one prop seemed like it was not working, so I leaned forward to alert the co-pilot (the plane was that small). He told me that they would turn off one engine and "feather the prop" to save fuel. I told him that I would be happy to take up a collection back in the cabin from the other passengers to pay for the extra fuel to power both engines. He chuckled, but I was serious. I never flew with them again.
Maybe there is a way to suspend a process (feather the prop) rather than completely kill processes.
Just FYI, pretty much every plane with multiple engines is able to safely fly and land with only one engine working, and all commercial pilots are trained to do so. If you were halfway through the flight already, it's entirely possible you were already in descent and didn't need the other prop at all to complete the flight.
Wow, that was ridiculous. So, because you don't understand how planes work this is the pilots' fault. Even if you were serious about the fuel exactly where do you propose they put it? In-flight refuelling of commercial planes doesn't exist.
Usually, I recommend that database and queue servers run the database/queue with a priority that makes it unlikely for them to be killed.
I had a case where a colleague running a script on a server under high memory pressure got the queue killed, which is inadvisable even if the queue is crash-safe. Before that, the queue had been running for 1.5 years straight.
That post is a great, poetic allegory. But ultimately, I think the analogy presents a bad idea. The allegory makes the point that we could entirely avoid OOM errors by engineering a system such that resources are never overcommitted. This is true; we could do that.
However it would be bad.
Under-committing resources (thus removing the need for an OOM killer) will NOT lead to a net gain compared to over-committing resources (and thus requiring an OOM killer of some sort).
If we are unwilling to overcommit resources, then it would be woefully uneconomical to run algorithms that have bad worst-case performance (because to avoid overcommitting you would necessarily need to assume the worst case is encountered every time).
It's just not feasible to avoid algorithms that have bad worst-case performance. Rather, we need to develop better abstractions for dealing with components (e.g. computations, programs, processes, threads, actors, functions etc.) that go over budget. Here's my attempt at developing a better abstraction for web servers: mikegagnon.com/beergarden
Ultimately, we need to treat every system like a soft real-time system, because at the end of the day every program has timeliness requirements and has resource constraints. The current POSIX model does not provide such abstractions and I think that's why we have these debates about OOM killers.
I like the idea of the doorman, but what if you could somehow pass back useful math to the client? Of course, then you'd have to disregard that useful work yourself or double-check it, negating the energy savings. Or perhaps return a map of a traveling-salesman-type problem (maybe a map of metadata and their traversal costs), and they could navigate that map depending on exactly what kind of data they really want, thus reducing your load for valid, heavy requests; and if they return a path with lots of nodes, you know to de-prioritize it or drop the request.
Here's a novel way to deal with an out of memory situation caused by slow memory leaks in a long-running server process: start swapping memory that hasn't been touched in literally days or weeks to /dev/null, and pray the process doesn't ever need it again.
That's so indescribably worse than just killing that process the mind boggles. Breaking in a simple, predictable, and detectable way vs. corrupting data and hoping (excuse me, "praying") nobody notices.
Better yet instead of sending it to /dev/null save it on disk and reload it whenever the process needs it again.
Oh wait that is how virtual memory works.
Compressing the memory area used by the leaky process in question would be a gentler and more robust "solution" here. There already exists a Linux kernel module called "zram" [1] with which you can accomplish just that (though you might have to tune your swappiness a bit first).
I'm pretty sure there's a Linux kernel module. Obviously it'd be required for my method to work. Might not be doable in FreeBSD (incompatible with the branding).
Or, here's a crazy idea: how about we actually allocate the memory when you call malloc(), and if there isn't any, give you an error instead? Programs could check the return code and decide what to do when they run out of memory themselves. Crazy, I know.
You can do that if you like. Use vm.overcommit_memory - setting it to 2 will still allow overcommit if there is room in swap for it. (So a process doesn't have to use the memory, it's just assured of a place in swap should it need it.)
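To make that concrete: under overcommit_memory=2 the boring textbook NULL check actually fires at allocation time instead of showing up later as an OOM kill. A minimal sketch (the 64 GB probe is arbitrary and assumes a 64-bit build):

    /* Sketch: with vm.overcommit_memory=2 the kernel refuses commitments it
     * can't back with RAM+swap, so this malloc() really returns NULL.
     * With the default heuristic overcommit it will usually "succeed",
     * and the trouble only starts when the pages are touched below. */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t want = (size_t)64 << 30;          /* 64 GB probe */

        char *p = malloc(want);
        if (p == NULL) {
            fprintf(stderr, "malloc failed cleanly: %s\n", strerror(errno));
            return 1;                            /* degrade gracefully, don't die */
        }
        memset(p, 1, want);                      /* under overcommit, this is where it hurts */
        free(p);
        return 0;
    }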
My memory is a bit hazy in this area, but I think memory is overcommitted by default in Linux. What that means is that malloc() can return an address that doesn't have physical memory assigned in the page table. Memory isn't committed until it is written to.
This isn't the case with the default MSVC implementation of malloc() in Windows. In Windows address space is reserved and committed with VirtualAlloc(), and typically that is done in one step.
I think memory is overcommitted because Linus wanted to keep the memory footprint lower than NT's early in the development of the kernel. The drawback is that applications may be OOM-killed when writing to memory that was successfully returned by malloc().
You seem to have some terminology problems here. Windows VirtualAlloc may "commit" memory but that does not mean it actually reserves physical pages [1]. That always happens only when the memory is accessed. On the other hand, MSVC's malloc() probably uses HeapAlloc(), which in turn uses VirtualAlloc(). I don't think there are any fundamental differences between Linux and Windows here.
That link says:
"Allocates memory charges (from the overall size of memory and the paging files on disk) for the specified reserved memory pages."
It does count against the total memory allowed. My laptop has 8GB of RAM and a 1GB page file, giving me a 9GB overall commit limit. If I spawn a process that eats up 1GB at a time, even Task Manager can clearly show me going up and hitting 8/9GB, and then I get an OOM error in my process.
Windows won't commit memory that a process can't use. You can't overcommit, although you might end up in the pagefile. Without the odd concept of fork, you don't end up with processes having huge "committed" address spaces that aren't ever going to be used.
The note about physical pages is just saying it's not mapping it, not that it's not guaranteeing it.
VirtualAlloc can reserve address space and commit pages for it depending on the flag provided. Address space is a resource of the process, and pages are a resource for the entire system.
Also, they count as your carry on bag, and they only bring one parachute on board, so if multiple people have to be thrown off and they've both paid for a parachute, they have to draw straws to decide who gets it.
Edit: And remember, OOF goes off on less than .1% of flights, so they have hundreds of times as many parachutes per flight as they have people who need them. Rumors that parachutes are oversubscribed are therefore wildly inaccurate.
The few cases when I've seen OOM invoked, it took a couple of minutes to kill chromium after flash (of course) messed up; during that time the system was unresponsive and it killed a few random smaller processes until it hit the correct one, flash or chromium, in some weird interdependent bug. Either way, I wasn't too happy.
After a while I noticed when the bug triggered/the system started becoming unresponsive, and I had a terminal with killall -9 chromium & killall -9 flash-plugin ready to go, so I could preempt it myself and the OOM killer wouldn't get involved. There has to be a better mechanism than OOM.
Too slow for the command to pass through the usual "X - WM - TE - shell - killall" chain? Try Alt+SysRq+f; that, in my experience, is waaaaay faster for invoking oom_killer.
His issue is that oom_killer doesn't get it right straightaway. Alt+SysRq+f would still need multiple invocations before it gets flash.
Personally, I do use Alt+SysRq+f since it predictably targets GMail-on-Chrome every time on my system. That is usually enough on my desktop OS for me to jump in and manually kill the offender. I can then just F5 GMail.
Even so, in OOM situations, sysrq invocation is still an order of magnitude faster than killall invoked from a graphical terminal emulator.
As for hinting to oom_killer: I have a script which searches for chrome and flash processes every minute, and sets their oom_score_adj in the high hundreds. This makes reasonably sure that oom killer will go after these processes first.
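In case it's useful to anyone, the whole thing fits on one screen. Roughly this, as a sketch: it matches on /proc/<pid>/comm, the score of 800 is arbitrary, and you'd run it from cron with enough privilege to write other users' oom_score_adj:

    /* Sketch: walk /proc, find processes whose comm contains "chrome" or
     * "flash", and raise their oom_score_adj so the OOM killer picks on
     * them first. */
    #include <ctype.h>
    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>

    static void bump(const char *pid)
    {
        char path[64], comm[64] = "";
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%s/comm", pid);
        f = fopen(path, "r");
        if (!f)
            return;
        fgets(comm, sizeof(comm), f);
        fclose(f);

        if (!strstr(comm, "chrome") && !strstr(comm, "flash"))
            return;

        snprintf(path, sizeof(path), "/proc/%s/oom_score_adj", pid);
        f = fopen(path, "w");
        if (f) {
            fputs("800\n", f);      /* "please kill this one first" */
            fclose(f);
        }
    }

    int main(void)
    {
        DIR *proc = opendir("/proc");
        struct dirent *de;

        if (!proc)
            return 1;
        while ((de = readdir(proc)) != NULL)
            if (isdigit((unsigned char)de->d_name[0]))  /* numeric entries are PIDs */
                bump(de->d_name);
        closedir(proc);
        return 0;
    }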
---
[1] http://lwn.net/Articles/191059/