
The Linux kernel's inability to gracefully handle low memory pressure - emkemp
https://lkml.org/lkml/2019/8/4/15
======
bArray
Similarly, there are many annoying Linux bugs:

`pthread_create` can sometimes return a garbage thread value or crash your
program entirely, with no way to catch or detect it [1]. High-speed threading
is hard enough as it is without the kernel acting non-deterministically.

Un-killable processes after copy failure (D or S state) [2]. If the kernel is
completely unable to recover from this failure, is it really best to make the
process hang forever, where your only available option is to restart the
machine? I ran into this with a copy onto a network drive with a spotty
connection; the actual file itself really didn't matter - but there was no
way to tell the kernel this.

Out Of Memory (OOM) "randomly" kills off processes without warning [3]. There
doesn't appear to be a way to mark something as low-priority or high-priority,
and if you have a few things running, it's just "random" what you end up
losing. From a software-writing standpoint this is frustrating to say the
least and makes recovery very difficult - who restarts whom, and how do you
tell why the other process is down?

[1]
[https://linux.die.net/man/3/pthread_create](https://linux.die.net/man/3/pthread_create)

[2] [https://superuser.com/questions/539920/cant-kill-a-
sleeping-...](https://superuser.com/questions/539920/cant-kill-a-sleeping-
process/541493#541493)

[3] [https://serverfault.com/questions/84766/how-to-know-the-
caus...](https://serverfault.com/questions/84766/how-to-know-the-cause-of-a-
oom-error-on-linux)

~~~
idoubtit
Point 3 is wrong. OOM killing is not random. Each process is given a score
according to its memory usage, and the highest score is chosen by the kernel.
The way to mark priority in killing is to adjust this score through /proc. All
of this is documented in `man 5 proc` from `/proc/[pid]/oom_adj` to
`/proc/[pid]/oom_score_adj`.

[http://man7.org/linux/man-pages/man5/proc.5.html](http://man7.org/linux/man-
pages/man5/proc.5.html)
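
For reference, a minimal sketch of doing that adjustment from C - the same
thing `echo -500 > /proc/<pid>/oom_score_adj` does from a shell; the -500
value and the use of the current process are just illustrative:

```c
/* Minimal sketch: adjust a process's OOM-kill priority by writing
 * /proc/<pid>/oom_score_adj (range -1000..1000; -1000 exempts it entirely).
 * Needs permission to write the target's /proc entry. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

static int set_oom_score_adj(pid_t pid, int adj)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", (int)pid);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int rc = (fprintf(f, "%d\n", adj) < 0) ? -1 : 0;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

int main(void)
{
    /* Illustrative: make the current process much less likely to be killed. */
    if (set_oom_score_adj(getpid(), -500) != 0) {
        perror("set_oom_score_adj");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```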

~~~
simias
I haven't toyed with that in a long time (probably about a decade really) but
back when I did it was still very difficult to get the OOM killer to behave
the way you wanted. IIRC the scoring is fairly complex and takes child
processes into account, so while it's technically completely deterministic it
can still be fairly tricky to anticipate how it's going to work out in
practice. And I did know about `oom_adjust`. Often it worked. Sometimes it
didn't. Sometimes it would work too well and not kill a process that was
clearly using an abnormal amount of memory. Finding the right `oom_adjust` is
an art more than a science.

Overall I ended up in the camp "you shouldn't throw passengers out of the
plane"[1]. The best way to have the OOM killer behave well is not to have it
run at all. If I don't have enough RAM just panic and let me figure out what
needs to be done.

[1] [https://lwn.net/Articles/104185/](https://lwn.net/Articles/104185/)

~~~
SEJeff
And the OOM killer has been entirely rewritten about twice in the past decade.
Christoph Lameter (one of my coworkers, who also wrote the SLUB memory
allocator) wrote the very first one. Very little, if any, of his original code
is in the current Linux OOM killer.

The current approach does indeed work much better. You can entirely disable
the OOM killer for a given workload with those procfs handles.

------
Animats
Ah, yes, that bug.

Few programs can handle a failure return from "malloc", and Linux perhaps
tries too hard to avoid forcing one. Most programs just aren't very good at
getting a "no" to "give me more memory". Browsers should be better at this,
since they started using vast amounts of memory for each tab.

I used to hit a worse bug on servers. If you did lots of MySQL activity, so
that many blocks of open files were in memory, and then started creating
processes, you'd often hit a situation where the Linux kernel needed a page of
memory but couldn't evict a file block due to some lock being set. Crash. That
was years ago; I hope it's been fixed by now.

~~~
JJMcJ
> vast amounts of memory for each tab

What underlies this? I am astounded to see 1GB of memory returned when I close
a couple of tabs.

Chrome and Firefox both seem like this.

~~~
jamienicol
It's spread across all parts of the browser, but speaking as a Firefox
graphics engineer, we use quite a lot of memory. Painting web pages can be
slow, so we try to cache as much as possible. When elements scroll separately,
or can be animated, we need to cache them in separate buffers. If we get the
heuristics wrong (and it's hard to get them right for every web page out there)
this can be explosive. It's not helped by the fact that graphics drivers can
frequently bring down the whole process when they run out of memory. It's a
hard problem, but webrender will help as it needs to cache less.

~~~
heavenlyblue
So let's say I'd like to write a memory-efficient web page, what should I
avoid then?

~~~
gnode
Rather than guess at what to avoid, you should make use of the memory
profilers which Firefox and Chromium developer tools provide. Apparently
Firefox's memory profiler is an add-on: [https://developer.mozilla.org/en-
US/docs/Mozilla/Performance...](https://developer.mozilla.org/en-
US/docs/Mozilla/Performance/Memory_Profiler)

~~~
heavenlyblue
Yeah, but this is a pigeonholing principle.

I don’t want to spend my time developing something only to discover it doesn’t
perform well for some reason.

I would prefer to not use any of the performance killers in the first place.

~~~
gnode
Firstly, avoid leaking memory (including objects like images and DOM nodes) in
JavaScript. Leaking memory here means retaining a reference beyond the end of
the object's use. The garbage collector only collects memory which is no
longer referenced; it does not attempt to analyse when a reference is no
longer used.

Secondly, avoid including unnecessary resources. Many web pages include many
libraries which are then mostly unused. Some packaging tools can help
eliminate such unused code.

A memory profiler helps in both cases: it detects leaks, and it measures the
cost of resources, allowing you to make educated decisions about their
inclusion.

------
sinsterizme
Glad to see this issue raised! My system sometimes hangs for minutes, which is
very frustrating compared to Windows and OSX, which seem to handle out of
memory in a much more user-friendly way - namely, suspending the offending
program and letting the user decide what to do from there. I'm sure there's a
reason the Linux kernel doesn't do something similar, but can anyone enlighten
me? :)

~~~
yjftsjthsd-h
Probably lack of integration; if NT hits a memory issue, it can just pass
notice to the tightly-coupled userland and GUI. If Linux runs out of memory,
even if it internally knows what process to blame... What would it do that
makes sense for a headless server, TiVo, and Android phone? Keeping in mind
that the kernel folks don't even work that closely with many userspace
vendors.

~~~
wiml
OSX handles this with a kqueue event that can notify userland when the system
moves between various memory pressure states; this is hooked into by
libdispatch and other userland libraries which will discard caches and so on.

I don't see why Linux couldn't do the same; open /sys/kernel/something and
epoll on it.

~~~
antientropic
This already exists: applications can receive memory pressure events (such as
the system reaching "medium" level, where you may want to start freeing some
caches) via /sys/fs/cgroup/memory/.../memory.pressure_level. See
[https://www.kernel.org/doc/Documentation/cgroup-v1/memory.tx...](https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt).
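
A rough sketch of registering for those events in C, following the
eventfd-based interface described in that document (the cgroup path and the
"medium" level here are just illustrative assumptions):

```c
/* Rough sketch (cgroup v1): get notified at "medium" memory pressure via an
 * eventfd, per Documentation/cgroup-v1/memory.txt.
 * The cgroup path below is an assumption; point it at your own group. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
    const char *cg = "/sys/fs/cgroup/memory/mygroup";  /* hypothetical group */
    char path[256], cmd[64];

    int efd = eventfd(0, 0);                           /* notification fd    */

    snprintf(path, sizeof(path), "%s/memory.pressure_level", cg);
    int pfd = open(path, O_RDONLY);

    snprintf(path, sizeof(path), "%s/cgroup.event_control", cg);
    int cfd = open(path, O_WRONLY);

    if (efd < 0 || pfd < 0 || cfd < 0) {
        perror("setup");
        return 1;
    }

    /* Registration format: "<event_fd> <pressure_level_fd> <level>". */
    snprintf(cmd, sizeof(cmd), "%d %d medium", efd, pfd);
    if (write(cfd, cmd, strlen(cmd)) < 0) {
        perror("cgroup.event_control");
        return 1;
    }

    for (;;) {
        uint64_t count;
        if (read(efd, &count, sizeof(count)) != sizeof(count))  /* blocks */
            break;
        fprintf(stderr, "memory pressure: time to drop application caches\n");
        /* ...free caches, unmap buffers, etc. ... */
    }
    return 0;
}
```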

~~~
packetized
The first two nota benes explicitly describe this document as outdated and
not what most people expect when it comes to “memory controller”. I am not
certain that citing it is a great example.

~~~
Sean1708
What about this? Seems to do what they want.

[https://serverfault.com/a/949045](https://serverfault.com/a/949045)

------
cperciva
Further to the comments about the pager hammering the disk to read clean pages
(mainly but not exclusively binaries) even if swapping is disabled: In many
cases adding swap space will _reduce_ the amount of paging which occurs.

Many long-lived processes are completely idle (when was the last time that
`getty ttyv6` woke up?) or at a minimum have pages of memory which are never
used (e.g. the bottom page of main's stack). Evicting these "theoretically
accessible but in practice never accessed" pages to swap frees up more
memory for the things which matter.

~~~
quazeekotl
Unfortunately enabling swap in Linux has a very annoying side effect: Linux
will preferentially push out pages of running programs that have been
untouched for _X_ time in favor of more disk cache, pretty much no matter how
much RAM you have.

This comes into play when you copy or access huge files that are going to be
read exactly once: they will start pushing untouched program pages out to
disk, in exchange for disk cache that is completely, 100% useless, even to the
tune of hundreds of gigabytes of it.

Programs can reduce the problem with madvise(MADV_DONTNEED), but that only
applies to files you are mmap()ing, and every single program under the sun
needs to be patched to issue these calls.
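
For plain read()-based copies there is also posix_fadvise(POSIX_FADV_DONTNEED),
which asks the kernel to drop already-consumed pages from the page cache. A
minimal sketch of that per-program workaround (the buffer size and the
sequential hint are just illustrative):

```c
/* Sketch of the read()-side workaround for read-once files: tell the kernel
 * the data is sequential and, once consumed, won't be needed again, so its
 * pages don't displace program pages in the page cache. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);  /* one pass, in order */

    char buf[1 << 20];
    off_t done = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        /* ...process buf... */
        done += n;
        /* Drop everything read so far from the page cache. */
        posix_fadvise(fd, 0, done, POSIX_FADV_DONTNEED);
    }

    close(fd);
    return 0;
}
```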

You can adjust the vm.swappiness sysctl to make _X_ larger, but no matter
what, programs will start to get pushed out to disk eventually and cause
unresponsiveness when activated. You can reduce vm.swappiness to 1, but if you
do, the system only starts swapping in an absolutely critical low-RAM
situation and you encounter anywhere from 5 minutes to 1+ hour of total,
complete unresponsiveness in a low-RAM situation.

There _NEEDS_ to be a setting where program pages don't get pushed out for
disk cache, period, unless approaching a low-RAM situation, but BEFORE it
causes long periods of total crushing unresponsiveness.

~~~
caf
_There NEEDS to be a setting where program pages don't get pushed out for
disk cache, period, unless approaching a low-RAM situation, but BEFORE it
causes long periods of total crushing unresponsiveness._

Here's the thing: a mapped program page is just another page in the page
cache. Now, you _could_ maybe say that _"any page cache page that is mapped
into at least one process will be pinned"_, but the problem there is that it
means any unprivileged process can then pin an unlimited amount of memory,
which is an obvious non-starter.

A workable alternative might be to add an extended file attribute like
'privileged.pinned_mapping', which if set indicates that any pages of the file
that have active shared mappings are pinned. That means the superuser can go
along and mark all the normal executables in this way, and the worst-case
memory consumption a user can cause is limited by the total size of all the
executables marked in this way that the user has access to.

~~~
jdsully
There's no reason extra data cannot be added to entries in the page cache to
make smarter decisions. That's how Windows and OS X do it in their equivalent
subsystems.

Nobody is suggesting these pages be pinned, which is an extreme measure.

~~~
caf
The problem I'm trying to point out here is that if the extra metadata in the
page cache is entirely under user control (like for example _"is mapped
shared"_ and/or _"is mapped executable"_) then it amounts to a user-specified
QOS flag.

That might be OK on a single-user system but it doesn't fly on a multi-user
one. That's why I suggested you could gate that kind of thing behind some kind
of superuser control.

~~~
jdsully
Why can’t a user make QoS decisions for their own pages? Root controlled pages
should obviously have higher priority.

The kernel could still “fairly” evict pages across users - just letting them
choose which N pages they prefer to go first.

~~~
caf
_Why can’t a user make QoS decisions for their own pages?_

Because then you just get everyone asking for maximum QOS / don't-page-me-out
on everything they can.

The pages in the page cache are not owned by a particular user; they're
shared. If there are three users running /usr/bin/firefox, they'll all have
shared read-only executable mappings of the same page cache pages. If you do a
buffered read of a file immediately after I do the same, we both get our data
copied from the same page cache page. So it's not at all clear how you'd do
the accounting on this to implement that user-based fairness criterion.

------
zwaps
This exact bug has been a huge issue for me when developing with Matlab,
running large simulations.

Things get swapped around and memory is often close to the limit. Linux then
becomes unresponsive, and basically stalls. Theoretically it recovers, but
that process is so slow that the next stall is already happening.

It is therefore impossible to run large scale Matlab simulations on my Linux
machine, while it is no issue in Windows.

As far as I can see, Linux is only usable with enough RAM that you are
guaranteed never to run out. I don't know why this has never been an issue; I
guess because it is a server OS and RAM is plannable, or running out is very
infrequent?

~~~
ri0t
Add swap (Windows does that, too) and never use more memory than you have RAM
(edit: ...in one process). Storing swap on quick storage adds to the fun and
the price.

The one trick SSDs don't like.

~~~
Dayshine
Swap doesn't resolve this on a HDD though. The UI/terminal still locks up, and
you still can't recover once you hit the point of thrashing.

What really confuses me is that this kernel was developed when SSDs didn't
exist, so how on earth did "The system becomes irrecoverably unresponsive if a
single application uses too much RAM" get missed?

~~~
wott
> _so how on earth did "The system becomes irrecoverably unresponsive if a
> single application uses too much RAM" get missed?_

I don't know, there are/were several similar issues (very basic situations,
frequently encountered by everyone or at least many people) which are/were not
fixed for years (we might say decade(s)): that one dealing with memory
exhaustion; then right after that, the problem which follows when memory is
freed but the system is still unresponsive for several minutes(!); freezing
when writing to a USB disk; freezing when something goes wrong on a NFS
mount...

I never understood why those common and really important issues were not
tackled (or not tackled for many, many years). IMO they were such basic
functionality, which a proper OS is expected to perform reliably as a basis
for and before all the rest, that they should have been dealt with and granted
the highest priority.

------
waingake
I'm so happy someone has made a clear bug report here. Because damn, this is a
thing.

~~~
throwaway2048
Yep, even with no swap whatsoever, performance is completely trashed (even the
mouse lags for 30+ seconds at a time) for a solid 5+ minutes before the OOM
killer triggers. With swap, you might as well just reboot, because the system
will take perhaps an hour to start responding.

Linux is completely useless with ram that is almost full in a way that OSX and
windows absolutely are not.

~~~
brendangregg
I don't think that's a fair comparison: Do you normally run OSX with the swap
files completely disabled, or Windows with the pagefile completely disabled?
That's what this bug is describing. I'd bet things get pretty nasty on OSX and
Windows too, if you tried that.

Perhaps the real bug is that Linux distros make it easy to run swapless.

~~~
Aaargh20318
Isn’t iOS basically a flavor of macOS that runs without swap?

~~~
earenndil
Yes, but it's also a heavily integrated environment that aggressively quits
background programs on memory pressure.

~~~
Aaargh20318
But isn't that exactly what the linked article advocates Linux should also do
?

Before quitting background applications it first sends them a request to free
memory; in a well-behaved iOS program you use this to clean up your caches and
ensure you don't use more RAM than you absolutely need. You should also save
your state to disk when your app is backgrounded so you can just continue
where you left off if your app is killed.

Many macOS apps also do this, you can forcefully restart a Mac and after a
reboot it'll restore your session to pretty much the exact state you left it
in, including any open 'unsaved' files.

Linux could implement a similar mechanism to signal apps to clean themselves
up and maybe a 'save your state, you're about to get killed' signal.

~~~
ken
> Linux could implement a similar mechanism to signal apps to clean themselves
> up and maybe a 'save your state, you're about to get killed' signal.

Isn't that pretty much what "memory.pressure_level" [1] is?

[1]:
[https://www.kernel.org/doc/Documentation/cgroup-v1/memory.tx...](https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt)

------
nielsole
Also, when you run low on memory, iptables can crash on you:
[https://bugzilla.kernel.org/show_bug.cgi?id=200651](https://bugzilla.kernel.org/show_bug.cgi?id=200651)

Poof, no more networking

~~~
ungamed
No, just not 'that operation' afaics.

------
blattimwind
> Your disk LED will be flashing incessantly (I'm not entirely sure why).

The VM is basically paging all clean pages in and out constantly as their
tasks become runnable. A pretty standard case of thrashing.

~~~
throwaway2048
This is with swap disabled.

~~~
ploxiln
It should have been clarified, because it is not obvious:

The kernel can evict memory-mappings of executable files which are currently
running. When they jump to a part of the executable that is no longer in
memory, it can page that part back in from the executable file on disk.

This is pretty cool. But when memory is very low, the kernel will evict
practically all user-space executable mappings from memory, and will be
reading back in and evicting back out executable file contents on practically
every single context switch. It's just trying _so hard_ to squeeze out some
space to make its tasks fit in memory and complete successfully.

I think this was the desired behavior for big-iron batch processing in the
90s? Not sure why it has persisted so long. I'm a big fan of Linux and this is
my biggest pet peeve.

~~~
ajross
What's the other option, though? It gets a fault for a page somewhere, it
needs memory, and it has none. You have to evict _something_. What's your
choice if not swap or filesystem-backed pages?

Systems have never operated well under true VM pressure. Not in the 90s, not
now. When the working set goes beyond available memory, performance falls off
a cliff.

And the report in the article doesn't seem to have a good comparison anyway. I
mean, do we seriously believe that Windows or OS X can handle this condition
(4G, no swap) better? I mean, swap is default for a reason.

~~~
ploxiln
It just has to kill something. The choice is between killing some things, or
freezing everything, for way too long (many minutes).

I'll agree that Windows and macOS aren't perfect here. But I expect better,
that's why I prefer Linux. (Better networking, better filesystems, better
configuration and debugging of core system services, easier installation of
stable C libraries for development or scripting ...)

~~~
slacka
I can't speak for OS X, but I dual booted Win 7 with swap disabled on my first
machine with an SSD. Windows would just kill the Chrome tab that was eating
all my RAM and I'd move right along. Linux locks up and is totally
unresponsive even if you manually trigger the OOM killer. Windows' behavior is
much, much better for desktop users.

------
deepbreath
I committed the grave mistake of purchasing a laptop with only 8GB of RAM and
I constantly run out of memory as a result. When it happens, I just repeatedly
mash alt+sysrq+f until it kills off some Chromium tabs and unfreezes my
machine. It essentially behaves like one of those extensions that let you
unload tabs. If needed, you can get the tab back by just reloading the page.
The machine slows down to a crawl at 96% usage, and freezes at 97% usage
(according to my i3 bar).

~~~
wolfgang42
A few months ago I _upgraded_ my system to 8GB RAM and I don't see the
behavior you describe. It does require a little more care when choosing which
programs to use[1], but not significantly so. However, you do need to make
sure you have swap enabled to let the kernel efficiently manage its resources.
It's not unusual for me to have 1-2 GB of stuff in swap; this doesn't affect
performance significantly since it's parts of the system that don't need to
run, but if you insisted that they all stay resident then it would put a
considerable strain on the system in low memory conditions.

[1] The big one for me is that I can't run the Atom text editor, Firefox, and
a virtual machine all at the same time.

~~~
foldr
>[1] The big one for me is that I can't run the Atom text editor, Firefox, and
a virtual machine all at the same time.

Is this on Linux? I have no problem with that workload on an 8GB Macbook Air
(2019). I was apprehensive about going with the 8GB model, but I've not really
had any issues.

I wonder if memory compression on OS X helps here.

~~~
snazz
Are you swapping to the “VM” section of your APFS container on an insanely
fast PCIe (NVMe?) SSD? That’s my guess. Although macOS does a great job of
handling poorly behaved programs that use tons of RAM just to begin with.

~~~
foldr
I'm sure the fast SSD helps to mitigate the relatively small amount of RAM,
yes. I can't say to what extent.

------
quotemstr
I've never liked the approach Linux kernel and userland take to memory
exhaustion. Many people confidently assert that it never happens. The
somewhat better-informed suggest that it's unreasonable to write programs that
recover from memory exhaustion because unwinding requires allocation --- a
curious belief, because there are many existence proofs to the contrary. Then
we get a feedback loop where everyone uses overcommit because everyone
believes that programs can't recover from OOM, and people avoid writing OOM
recovery code because they believe that everyone is using overcommit and
allocation failure is unavoidable. And then they write kernel code and bring
this attitude there.

Memory is just a resource. If you can recover from disk space exhaustion, you
can recover from memory exhaustion. I think the current standard of memory
discipline in the free software world is inadequate and disappointing.

~~~
tsimionescu
It's not just memory discipline that is pretty bad (and not just in the OSS
world). I've recently seen several newer languages refuse to deal elegantly
with low-level errors.

For example, in a Java server application, if one request encounters some
buggy code that tries to read past the end of an array, that request will
fail, but all others will succeed - this gives a good chance for the system to
remain usable, and for you to get a good bug report, with system-generated
diagnostics, for the buggy requests.

However, in Go or Rust, the same scenario panics and kills the entire process
by default - turning a potentially minor bug in some obscure part of the
system into a system-wide crash.

OOM is obviously harder to deal with (e.g. if one request is using too much
memory, there's no guarantee that it won't be other requests actually seeing
the OOM errors first), so if we don't even want to deal with the easy stuff,
how can we hope to deal gracefully with the hard ones?

~~~
angelsl
That's because you are supposed to do bounds-checking yourself in Rust, not
because they don't want to handle it gracefully.

It's like what happens when you read out of bounds in C, except it fails more
reliably.

~~~
tsimionescu
You're supposed to write correct code in Java as well. The reality is that it
doesn't always happen. C doesn't claim to handle the issue at all, and doesn't
verify it, which is at least a performance gain. Rust does verify it, but
issues an error type that is not guaranteed to be recoverable at all.

------
rwallace
How are all the people talking about Windows here getting it to behave better?
In my experience, when you run out of memory on Windows, the whole machine
locks up hard for ten or fifteen minutes while it thrashes the disk before
finally killing the offending process. (Admittedly that's on spinning metal;
SSD would probably do better.)

~~~
davidparks21
On Windows I could consistently open a 32GB matrix in Matlab with 16GB of RAM
on my laptop and perform operations on the matrix. The disk would spin, and it
would take 20 minutes to do a simple operation because of the swapping, but I
could open it, perform the operation, save, and exit successfully. I could
easily background Matlab and do email or other common tasks such as browsing
with very little impact on those applications. On Linux Mint that same task
locks the mouse and brings the system to its knees; I can't even kill Matlab
and would typically resort to a hard reboot. I learned quickly that I can't do
the same things on Linux Mint that I used to do pretty easily on Windows.

~~~
rwallace
For me, Windows (7, 64 bit) behaves exactly as you report for Linux. I would
love to be able to get Windows to behave like it does for you. What version
were you using? Did you tweak any settings?

~~~
davidparks21
I haven't been running Windows for a year or two now, so this was a while
back, but I think I was on Windows 10 (possibly it was 8 back then), no
tweaks. But I was quite successful at this in Matlab specifically. If you
overload memory across many processes perhaps you can get into a bad place,
but when it was just one process abusing swap, Windows was quite good about
making sure other processes weren't dramatically affected, in my experience
(there was some lag, but it was usable).

------
linsomniac
The most annoying thing about OOM is when a process goes crazy and starts
using a lot of memory: the OOM killer looks at the system, sees that that
process is really active, and kills mysql/ssh/apache/postgres to make room
for the runaway.

I've set up monitoring that pages me when "dmesg" includes "OOM".

~~~
isodude
You can actually adjust the oom_score on those long-lived, important
processes to keep the OOM killer from killing them.

~~~
linsomniac
And I've always meant to go in and set SSH to be immune to OOM, and
deprioritizing others like databases, but I've just never gotten around to it.
Looks like oomd could be useful, once we get to kernels that support it in
production (looks like 4.20, Ubuntu 18.04 has 4.15).

~~~
isodude
Another interesting approach is to set SSH to realtime priority. When you log
in, you set it back to non-realtime if your work is not important, to avoid
sinking the server into oblivion with a simple command.

It is possible to do this on shared servers as well, by letting root log in on
another port and setting that process to realtime.

Thoughts in my head, but I never got around to doing it.

------
GuB-42
I wonder how they did it with Android? Especially in the early days, not with
today's 8GB+ monstrosities.

My first Android device was a Nexus One: 512MB of RAM for what is essentially
a full Linux system, able to run a browser and multiple Java apps, all
isolated and each running their own VM. Task managers often reported near 100%
RAM use and things still worked fine.

And my understanding is that they optimized things further since then, but
given how overpowered phones are today and how bloated apps are, it is hard to
tell.

~~~
sshb
Android is described further in this thread
[https://lkml.org/lkml/2019/8/5/1121](https://lkml.org/lkml/2019/8/5/1121)

------
mdellavo
facebook's solution
[https://facebookincubator.github.io/oomd/](https://facebookincubator.github.io/oomd/)

~~~
meruru
[https://github.com/rfjakob/earlyoom](https://github.com/rfjakob/earlyoom) has
worked well for me in the past.

~~~
htns
It's even in debian/ubuntu repos now. I hadn't realized that.

------
alexozer
A couple weeks ago, one of my physical sticks of RAM completely stopped
working after yet another Linux out-of-memory-force-poweroff situation. No
idea if that was the actual cause, but I do find it a little funny.

I just arrived at this thread after my entire system stalled completely in yet
another low memory situation.

Let's just say I'm extremely grateful to discover some of these userspace
early OOM solutions in this thread.

------
alexghr
I hit this bug yesterday on my laptop (16GB of RAM / 1GB of swap) with 2
instances of Firefox (about 60 tabs), Slack, Insomnia (Electron-based Postman
clone) and a couple of `node` processes watching and transpiling.. stuff.
`kswapd0` was running at 100% CPU, I guess trying to free up some RAM by
moving things to swap (the swap partition was full by this point). Luckily I
managed to recover the system by switching to another tty and killing kswapd0
and the node instances.

Sometimes instructing the kernel to clear its caches helps: `echo 1 | sudo tee
/proc/sys/vm/drop_caches` [1]

[1]: [https://serverfault.com/questions/696156/kswapd-often-
uses-1...](https://serverfault.com/questions/696156/kswapd-often-uses-100-cpu-
when-swap-is-in-use/696185#696185)

~~~
IshKebab
I wouldn't enable swap on a desktop Linux system. When you run out of memory
with swap enabled, the system grinds to a halt and you pretty much can't do
anything to save it, or at least it is a battle.

Without swap it just kills processes until there is enough memory, which is
what you would have done anyway!

I think the main annoyance with Linux here is that in Windows you get to
_choose_ what to kill, whereas in Linux it can't really communicate with you
(because the kernel doesn't know about such modern things as GUIs) so it has
to pick more or less randomly.

~~~
michaelmrose
Having some swap with low swappiness allows the system to page out pages that
are unlikely to be used or haven't been used in a long time but aren't backed
by a file.

In normal conditions this frees up memory for more useful data and helps you
avoid getting to perverse conditions.

~~~
IshKebab
That's how it _should_ work yes. Unfortunately it doesn't _actually_ happen
like that, hence this entire discussion.

~~~
michaelmrose
It actually DOES happen like this. When the entire working set for actively
used apps fits in memory swap lets the system page out things that are little
used. This works perfectly fine.

This is to say that swapping out little used stuff delays the point where you
are actually out of memory and performance goes straight to hell.

This means the optimal arrangement for desktop use is some swap and low
swappiness.

One could imagine that perhaps something like

[https://github.com/rfjakob/earlyoom](https://github.com/rfjakob/earlyoom)

Might be an easier route to better behavior especially as you can more easily
tell it what it ought to kill.

The behavior of the kernel could probably be improved, but it inherently lacks
the data required to make a truly optimal choice, along with a GUI to
communicate with users. Going forward, desktop-oriented distros should
probably come with some out-of-the-box GUI to handle this situation, built
into their graphical environments, before it gets to the point of dysfunction.

------
userbinator
I'm not sure how he's getting swap even with swap off, but this seems to be
the big disadvantage to having overcommit --- the memory allocator won't ever
say NO, so an application can keep allocating memory even if that memory
becomes uselessly slow to actually access.

Then again, this "allocation will never fail" mentality has also led to
applications being written with such an assumption, and when allocations do
fail, they crash. (Arguably, that's better than thrashing the rest of the
system.) I don't know if modern browsers will actually stop letting you open
new tabs and just give an "out of memory" error instead of crashing, but
that's how most Windows programs are usually written --- without the
assumption that allocations can never fail, because on Windows, they can.

~~~
ars
> I'm not sure how he's getting swap even with swap off

It's not data being swapped, it's executables. Linux knows it can reread the
executable from disk if needed, so it uses those memory pages for other
things, and reads them back in when needed.

~~~
mktmkr
This is why one uses mlockall.
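
For the unfamiliar, a minimal sketch of what that looks like: locking every
current and future mapping of the calling process into RAM (needs
CAP_IPC_LOCK or a generous RLIMIT_MEMLOCK):

```c
/* Minimal sketch: pin all current and future mappings of this process
 * (including its executable text) in RAM so the kernel can't evict them. */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }
    /* ...work that must never page-fault from disk... */
    return 0;
}
```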

------
makz
So... running Linux swapless is a thing? How popular is it?

~~~
yongjik
Kubernetes, for example, doesn't even support swap. Some bug reports say it
won't even run with swap enabled, though I didn't test myself. ¯\\_(ツ)_/¯

~~~
jillesvangurp
I don't know about that but it is very common to specify CPU and memory limits
for docker containers. Exceeding those automatically leads to the process
being killed. The reasoning is very simple: any form of swapping is completely
unacceptable on a production server because it randomly and massively degrades
server performance. If you have a cluster of stuff and one node is misbehaving
like that, you kill it because that is completely unacceptable. If that is a
regular thing, your servers are obviously mis-configured in some way and you
fix it by provisioning more hardware or tweaking the limits.

12 years ago, before I got a Mac, I had a Windows XP laptop with enough memory
(8GB) to disable the swap file (which world+dog will insist is a very stupid
thing to do). This was great and vastly extended the useful life of my laptop.
Alt+tabs were instant and I could run e.g. JVM applications with sane heap
settings as well as a browser, office stuff and a few small things I needed
with zero issues. On the rare occasion that something did run out of memory,
it died or I killed it. Laptop disks were stupendously slow at the time; any
form of swapping on a slow laptop disk is extremely disruptive. SSDs are much
better but there too it tends to be mostly redundant.

IMHO most forms of swapping are highly undesirable on both servers and end
user hardware. Swapping to free up memory for file caching is simply
unacceptable when you can instead just evict cache pages. If you don't have
enough memory left to cache effectively, that just means things like memory
mapped files will get a lot slower. If something allocates more memory than
you have just kill it.

~~~
toast0
> The reasoning is very simple: any form of swapping is completely
> unacceptable on a production server because it randomly and massively
> degrades server performance. If you have a cluster of stuff and one node is
> misbehaving like that, you kill it because that is completely unacceptable.
> If that is a regular thing, your servers are obviously mis-configured in
> some way and you fix it by provisioning more hardware or tweaking the
> limits.

A small swap space (~ 1G on a 64+G ram server) is a reasonable backstop
against a slow memory leak. Assuming you don't have filesystem pages evicting
anonymous pages, swap use is a clear indicator of too much memory use and
points you in the direction of something to fix; and gives you a little bit of
time to fix it on the running system. As long as swap is very small relative
to ram, it's not going to enable thrashing -- a big leak or burst in use isn't
going to fit in swap and you're going to be dead anyway.

------
jhallenworld
This is a very old problem; I used to see it decades ago when making tape
backups. Tar would move the entire disk through the buffer cache, so that
eventually everything in it was paged out. The classic solution was to use
unbuffered versions of the disk device for backups.

What I've always thought is that there should be a working set size limit on a
process which includes the buffer cache somehow. The idea is that the process
may not use more RAM than this size- if it exceeds it, it must either fail or
swap out its own pages, not those from any other process. This would fix the
problem for tar- it only needs a tiny amount of memory.

I think the situation is very similar with the web-browser example. The
browser should not be allowed to force all unrelated data to be paged out.

~~~
throwaway8941
>working set size limit on a process which includes the buffer cache

You can control this with cgroups. Plug a process into a separate cgroup and
set the `memory.limit_in_bytes` knob to whatever your heart desires.

I use it to limit qBittorrent's memory usage on my machine. `firejail` is
very convenient for doing this. If I don't set a limit (30% of RAM in my
case), it eats up all the memory with a uselessly large file cache, which does
not improve upload speeds at all.

[https://access.redhat.com/documentation/en-
us/red_hat_enterp...](https://access.redhat.com/documentation/en-
us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-memory)
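
A rough sketch of that cgroup-v1 setup done programmatically (the mount point,
the group name "capped", the 512 MiB limit, and the minimal error handling are
all just illustrative; a shell with mkdir/echo works the same way):

```c
/* Rough sketch (cgroup v1, run as root): create a memory cgroup, cap it at
 * 512 MiB, and move the current process into it so everything it runs
 * inherits the limit. The group name "capped" is made up. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static int write_value(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int rc = (fprintf(f, "%ld\n", value) < 0) ? -1 : 0;
    fclose(f);
    return rc;
}

int main(void)
{
    const char *base = "/sys/fs/cgroup/memory/capped";
    mkdir(base, 0755);  /* creating the directory creates the group */

    if (write_value("/sys/fs/cgroup/memory/capped/memory.limit_in_bytes",
                    512L * 1024 * 1024) != 0 ||
        write_value("/sys/fs/cgroup/memory/capped/cgroup.procs",
                    (long)getpid()) != 0) {
        perror("cgroup setup");
        return 1;
    }
    /* From here on, this process (and its children) live under the cap. */
    return 0;
}
```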

------
w-m
When working on Ubuntu 16.04 LTS, this is such a productivity killer. I'm
quite annoyed at the time lost to this behavior after coming from a Mac. In
shells where I run a program that may load a larger data set (e.g. before
ipython), I now regularly run `ulimit -v 50000000` to limit the shell's
virtual memory to ~50 GB of the available 64 GB on this machine.

If the program tries to use more RAM it'll then just die, and not drag down
the whole system with it. Works fine, but I really shouldn't have to do this.
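
The same cap can be applied from inside a program with setrlimit(RLIMIT_AS); a
minimal sketch (the 50000000 KiB figure simply mirrors the ulimit example
above):

```c
/* Sketch: limit the virtual address space so a runaway allocation fails
 * (malloc/mmap return an error) instead of dragging the machine into
 * thrashing. ulimit -v takes KiB, so 50000000 KiB = 50000000 * 1024 bytes. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    rl.rlim_cur = 50000000ULL * 1024;  /* ~50 GB */
    rl.rlim_max = 50000000ULL * 1024;

    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    /* ...exec or run the memory-hungry workload here... */
    return 0;
}
```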

------
codedokode
It is interesting how Android, which typically has less memory than desktop
systems, solves such problems. It kills inactive applications and background
browser pages. A program that can save its state is more complicated, but it
works better with a limited amount of memory. Today many applications are
written using technologies like HTML and JS, or garbage-collected languages,
and unless you can unload them from memory there will never be enough of it.

~~~
aitchnyu
Wish desktop browsers did this by default, except for pinned tabs or the last
20 pages. But then frontend guys make heavy apps with animated meme loading
screens.

------
cyborgx7
I've been living in denial about this being a Linux-specific issue until I saw
this post. Even though I encounter this problem frequently on Linux and it has
almost never happened to me on OSX or Windows, I've just been telling myself
that it was because of the hardware I was using in each case. If they found a
way to fix this, where just the browser froze and not the entire OS, it would
be a huge improvement.

~~~
dtf
Do you run Windows without Pagefile/Swapfile?

~~~
cyborgx7
I don't know. But I've had the problem described above on Linux with swap
enabled.

------
senozhatsky
lkml.org has been unstable for the past few... umm years, so The Linux
Foundation runs its own lkml archive - lore.kernel.org/lkml/

Alternative link, just in case:
[https://lore.kernel.org/lkml/d9802b6a-949b-b327-c4a6-3dbca48...](https://lore.kernel.org/lkml/d9802b6a-949b-b327-c4a6-3dbca485ec20@gmx.com/)

~~~
zfgnu
The HTML code on lore.kernel.org is weird; I wonder how it's generated.

~~~
mort96
It's a somewhat common trick, I believe. The idea is this: you want newlines
in between your tags, but if you have HTML code like
`<div>foo</div>\n<div>bar</div>`, you end up with an unwanted text node with a
space in between the divs, which changes how the page looks. By putting the
newline inside the tags instead of between them, you don't have any unwanted
text nodes.

------
KingMachiavelli
Does the Raspberry Pi suffer from this? All but the latest models have less
than 4GB of RAM and their storage is often slow SD flash (technically it could
be fast, but most people have cheap SD cards), so it fits this scenario
perfectly. I guess most users aren't pushing a lot into memory like GUI
browsers do.

~~~
rocky1138
It has nothing to do with the hardware. This will happen on any system with a
default Linux install.

~~~
KingMachiavelli
I didn't mean to imply it had anything to do with hardware; rather, I intended
to point out that there should be a large base of users (owners of RPis)
experiencing the issue, since the hardware is exactly what you need to
reproduce it (limited RAM & slowish disk speed).

------
davidparks21
In Windows on a 16GB RAM laptop, I've often fired up Matlab, opened a 32GB
matrix, and performed a few simple operations on it. In Windows, Matlab
dutifully chugs away at the problem, the disk spins like mad, and I put Matlab
in the background and do email for 20 minutes. This identical use case
completely cripples my Linux Mint OS: the mouse hangs, nothing functions, and
I've never gotten it to even complete the operation. I just can't operate on a
32GB matrix with 16GB of RAM in Linux, but I can in Windows with relative
ease.

To me, this is the Linux kernel's biggest weakness against Windows. Most other
gripes about Linux (poor power management, poor driver support, etc) belong
outside the kernel's domain, but this one is a glaring win for Windows over
Linux.

------
hyperion2010
I use a swap file these days because in the 4 years since I purchased my
computer I went from never hitting 32 GB of memory used at the same time to
hitting it once a week. The worst offenders are browsers and the JVM. The swap
file saves me from those 20 seconds of distraction when running a variable
memory workload that suddenly jumps over the limit and hardlocks the computer
for hours on end. If I'm doing something important I will wait for the OOM
killer to maybe reap the evil children, but otherwise I just power cycle the
system and add a note to put the swap file in fstab.

------
regularfry
This is an obvious idea, so I presume there's a reason why it wouldn't work,
but what would happen if you had different rules for uid=0 pids and for
everything else? If processes running as root were never eligible for oom-
killing, and could force mallocs by triggering an oom-kill of user processes
as necessary, wouldn't you always be able to recover a thrashing system from a
root console? Or is it too hard to isolate console IO from the rest of the
system in that situation?

~~~
mort96
The issue isn't that the OOM killer is too aggressive and killing the consoles
or shells you're trying to use to rescue your system. The issue is that when
you're thrashing, the system becomes unresponsive; it's hard to recover the
system because it takes many minutes to even just switch to a TTY.

~~~
regularfry
That's kind of where I'm trying to get to, I think: how can you segregate the
system such that root processes get a priority such that they don't get
affected by the thrashing?

------
C4stor
I'm seriously impressed by the quality of the discussion generated from this
on the mailing list, a great example of online collaboration imho!

------
WalterBright
I've noticed a similar problem with low free disk space with every OS I've
tried it on. All kinds of erratic behavior, hangs, etc.

------
not2b
The program using all the memory here is Chrome (or Firefox). It has more
information about what is going on than the kernel does. It should be smarter
about memory use when it is trying to consume more memory than is available.
Perhaps it could page out background tabs to disk or something similar if
memory is low.

~~~
Someone1234
Those were just examples. This issue can be reproduced using any process.

------
ajyotirmay
Yes, it has been bugging me a lot. My swap space remains empty while my RAM
runs out of space. And that situation is frustrating because I can't close my
applications or even restart the display server to clean up memory.

Something needs to be done here for real - otherwise, Linux is a nice piece of
software.

------
known
I read somewhere that Linus Torvalds recommends setting
[https://en.wikipedia.org/wiki/Paging#Swappiness](https://en.wikipedia.org/wiki/Paging#Swappiness)
to 90 and letting the kernel decide what/when to swap.
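
If it helps, the same knob can be set with `sysctl vm.swappiness=90`, a line
in /etc/sysctl.conf, or (as a sketch) by writing the procfs file directly,
which needs root:

```c
/* Sketch: set vm.swappiness to 90 by writing the procfs file directly.
 * Equivalent to `sysctl vm.swappiness=90`. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/vm/swappiness", "w");
    if (!f) {
        perror("/proc/sys/vm/swappiness");
        return 1;
    }
    fprintf(f, "90\n");
    fclose(f);
    return 0;
}
```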

------
igneo676
What are these settings he's referring to? Genuinely asking - I used to run
into this issue _all the time_, and even if it's not a default I'd love to
toggle some flags and get a responsive system even under low-memory
situations.
------
hagreet
After reading through the comments I would like to know the following: Why are
there no priorities?

I can't figure it out from the answers. I think with root privileges it should
just be possible to say "the GUI has higher priority", etc. Then when there is
a memory issue you kill some low-priority processes to get the memory back.

But whatever any sane person considers part of the operating system, because
it is the bare minimum of what is required to do stuff (filesystem, GUI, ...),
needs to have priority and always be fast. This can be defined by the
distribution using the startup privileges.

So, why is this so difficult?

~~~
draugadrotten
Read the comment by idoubtit in the thread below to learn how to prioritize:

Point 3 is wrong. OOM killing is not random. Each process is given a score
according to its memory usage, and the highest score is chosen by the kernel.
The way to mark priority in killing is to adjust this score through /proc. All
of this is documented in `man 5 proc` from `/proc/[pid]/oom_adj` to
`/proc/[pid]/oom_score_adj`. [http://man7.org/linux/man-
pages/man5/proc.5.html](http://man7.org/linux/man-pages/man5/proc.5.html)

------
11235813213455
I had exactly this issue 3 years ago, when I was still using 4G RAM and
working with heavy frontend stacks (gulp, webpack1, ..)

------
myrandomcomment
My biggest issue is that everything gets swapped out eventually in favor of
disk cache. I know there are settings, but... it is just wrong.

~~~
nisa
What is the actual setting for exactly this behavior? At least I'd like to
disable it.

~~~
kasabali
vm.vfs_cache_pressure

I'm not sure if it'll exactly accomplish what you want though.

~~~
nisa
thanks, I'll take a look. I remember playing with it for 32MB flash devices.

------
parentheses
What is the solution here?

Are you recommending a swap space be created automatically on behalf of the
user?

One could also use compressed memory (`zram`).

~~~
zzzcpan
swap on zram (RAMSIZE - 1G) and earlyoom

------
liopleurodon
I used to have a cgroup just for Chrome to limit how much total RAM it could
use, because of this exact thing.

------
Tonitrus
The elephants in the room are Chromium and Firefox. They turned the job of
displaying an HTML page into a CPU- and memory-inefficient nightmare. The
display of 4KB of information, which is the informational weight of a typical
webpage, must not take up 400.000MB of memory and millions of CPU
instructions. Look out for simpler HTML plumbers, e.g. Dillo and w3m, and help
refine their table rendering.

------
jangid
At one point we used to brag about using less memory. Common discussions were
"see my memory usage - while hitting free -m on the CLI", "see, my kernel size
is just 500kb, I chose just the right modules", and so on…

------
sunseb
I fixed this bug myself... I bought extra RAM. :)

------
avodonosov
I increased swap to avoid that on my laptop.

------
garbre
I've tried OOM killers and thrash-protect. I've tried numerous tweaks to the
VM and swap settings. Nothing works. Memory use gets into the 90%s and the
system freezes, hard.

Nonetheless, I'm surprised someone is calling this a bug. Let's face it, Linux
is just not a desktop operating system. It's a server operating system, and it
expects to be professionally administered and tightly controlled to prevent
OOM situations. That OOM situations occur on servers too is beside the point.
There are reasons for the Linux memory system to work as it does, reasons
Linus will yell at you about if you complain.

