
What has to happen with Unix virtual memory when you have no swap space - stargrave
https://utcc.utoronto.ca/~cks/space/blog/unix/NoSwapConsequence
======
wahern
> If you can't on your particular Unix, I'd actually say that your Unix is
> probably not letting you get full use out of your RAM.

What use is "full use" if your system livelocks!? This is the same logic as
overcommit--more "efficient" use of RAM in most cases, with the tiny,
insignificant cost of your processes randomly[1] being killed?

What happens in practice is that people resort to overprovisioning RAM
anyhow. But even then that's no guarantee that your processes won't be OOM-
killed. We're seeing precisely that in serious production environments--the
OOM killer shooting down processes even though there's plenty (e.g. >10%, tens
of gigabytes) of unallocated memory, because it can't evict the buffer cache
quickly enough--where "quickly enough" is defined by some magic heuristics
deep in the eviction code.

[1] And before you say that it's not random but can be controlled... you
really have no idea of the depth of the problem. Non-strict, overcommit-
dependent logic is baked deep into the Linux kernel. There are plenty of ways
to wedge the kernel where it will end up shooting down the processes that are
supposed to be the most protected, sometimes taking down much of the system.
In many cases people simply reboot the server entirely rather than wait
around to see what the wreckage looks like. This is 1990s Windows revisited--
reboot and move along, unreliability is simply the nature of computers....

~~~
toast0
> This is 1990s Windows revisited--reboot and move along, unreliability is
> simply the nature of computers....

Ugh, so much this. Where I work (until the 16th) we've moved from an
environment of stability, where problems are investigated and fixed and stay
fixed, to our acquirer's environment where things break randomly at all times,
and it's not worth investigating system problems because nothing will stay
fixed anyway. #totallynotbitter

~~~
dylan604
This is the part of being professional where you either just keep working
until your last day, or come in with your feet kicked up and do nothing, with
the "what are they going to do, fire me?" attitude. If there's severance, I'd
go with the former. If everything is settled and it's just waiting out your
time, try the latter?

~~~
halfmatthalfcat
If you’re trying to be professional, the latter is never an option.

------
lmilcin
Well, the issue really is that applications are not being designed correctly.

To create reliable applications you need to design them to work within the
limits of the memory allocated to them. Unattended applications (like an
RDBMS or an application container) should not allocate memory dynamically
based on user input.

By definition, a reliable application will not fail because of external
input. Allocating memory from the OS cannot be done reliably unless that
memory was set aside somehow. If the memory was not set aside, then we are in
an overcommit situation, and this means we accept that a process wanting to
allocate memory may fail to get it.

So the solution (the simplest but not the only one) is to allocate the memory
ahead of time for the maximum load the app can experience and to make a limit
on the load (number of concurrent connections, etc.) so that this limit is
never exceeded.

Some systems do it halfway decently. For example Oracle will explicitly
allocate all its memory spaces and then work within those.

The OS really is a big heap of compromises. It does not know anything about
the processes it runs, yet it is somehow expected to do the right thing. We
see that it does not behave gracefully when memory runs out, but the truth
is, the OS is built for the situation where it is being used mostly
correctly. If memory is running out, it means the user already made a mistake
in designing their use of resources (or didn't think about it at all), and
there is not really much the OS can do to help.

~~~
imtringued
The downside to this is that if your application ever needs less than the
configured amount of memory, then that memory cannot be used by other
applications. This is a big reason why Java is such a huge memory hog even
when the application itself isn't demanding nearly as much memory.

~~~
majewsky
I believe that Linux has all the tools to solve this in practice:

1\. When you allocate memory but do not write into it yet, those pages are
not mapped in RAM and thus don't occupy actual space.

2\. When you're done with a specific page of memory, you can
madvise(MADV_FREE) it, which means that the kernel can discard those pages
from RAM and use them for caches and buffers. _But_ you still hold on to the
virtual memory allocation, so you can just start writing into the page again
when you need more memory and the kernel will map it again.

If I understand all that correctly, you can have your allocator work in such a
way that it keeps a large reserve of preallocated memory pages, but the
corresponding amount of RAM can be used for caches and buffers when it doesn't
need everything. An interesting question would be how that scenario appears in
ps(1) and top(1), i.e. whether those MADV_FREE'd pages would count towards
RSS.

~~~
AstralStorm
And 3. When the kernel cannot supply you with an allocated page, what
happens? (Mapped but inaccessible.)

Currently, OOM killing.

------
popeye77
I'd like to add that RHEL5 was more than tolerable in this respect. Whatever
kernel series it lived through, it seemed logical. When RHEL5 systems started
to swap, we knew that was our sign that A) long-term usage growth was
beginning to bump some threshold somewhere, B) someone had turned a knob they
shouldn't have, or C) a recent code change had caused a problem in the
application stack.

Then RHEL6 came along, and it swapped all the time. Gone was our warning. The
stats showed tens of gigabytes of cache engaged. WUT? How do we have tens of
gigabytes of memory doing nothing? Before you could finish that thought, the
OOM killer was killing off programs due to memory pressure. WTF? The system
was swallowing RAM to cache I/O, but couldn't spare a drop for programs? ...I
could go on, but simply put, RHEL6 was garbage. And really I mean the RHEL6
kernels and whatever Red Hat tweaked in them.

RHEL7 was a little better, but still seeing echoes of the ugliness of RHEL6.
RHEL5 was just a faded pleasant dream.

With the last three Fedoras, on the other hand, it seems like memory
management is finally digging itself out of the nightmare. That nightmare
lasted almost a full decade.... sheesh

------
diamondo25
Although I am not a fan of Apple, their memory pressure messages on iOS are
really useful for preventing this. As a programmer, you get warned to free up
memory if possible; otherwise iOS shuts you down.

~~~
macdice
Also, AIX has SIGDANGER (best signal name ever). There is also "oomd" for
Linux, in user space--I think it's like the iOS thing. Not sure how useful
any of these things really are, but wondering if FreeBSD should get one...

------
herpderperator
What am I missing here? Why can't, as others have mentioned in other HN
comments elsewhere, the OOM killer just get invoked when there's less than X
amount of RAM left, and kill the highest-offending process? In my case, I
would prefer that to anything else. Why does this page or that page matter?

~~~
the8472
Because you're not truly out of memory at that point; there are still pages
that can be evicted. To invoke the OOM killer sooner you have to use
heuristics to determine when a request could be satisfied in theory but not
practically, due to page thrashing.

The oom killer only kicks in sometimes, e.g. when programs make truly
egregious allocation requests.

The benefit of having swap is that it turns things into a soft degradation
since it's much easier for the system to start with swapping out rarely used
pages. The gradual loss of performance makes it easier for the human to
intervene compared to the cliff you encounter when it starts dropping shared
code pages.

~~~
wahern
> The oom killer only kicks in sometimes, e.g. when programs make truly
> egregious allocation requests.

This is a myth. Allocation failures happen at least as much on small
allocations as on big ones. In fact, I see OOMs every day and the vast
majority of the time the trigger was a small allocation. For example, the
kernel trying and failing to allocate a socket buffer.

And that's really the root of the issue. You have a giant application with a
200GB _committed_ working memory set doing important, critical work; and it
gets shot down because some other process just tried to initiate an HTTP
request. It's a ludicrous situation. And people defending Linux here by saying
the same problem exists everywhere else are wishful apologists--the situation
is absolutely not the same everywhere else.

Even setting aside the issue of strict memory accounting--which, BTW, both
Windows and Solaris are perfectly capable of doing, and do by default--Linux
could still do dramatically better. Clearly there's some level of
unreliability people are willing to put up with for the benefits of
efficiency, but Linux blew past that equilibrium long ago.

~~~
the8472
"E.g." means "for example"--other cases are permitted. What I am saying is
that it only kicks in under some circumstances, not necessarily when one
wants it to.

------
trhway
I routinely see systems these days with no or very little swap. It is as if
swap has become a faux pas. That is especially strange given the SSDs
available on these machines. Gradual degradation of service vs. the service's
sudden disappearance and/or stall, or heavy overprovisioning, and still...

Also, a kind of edge case comes to mind--while not generic swap, it is a
modern version of it, i.e. extending the virtual memory space onto flash
storage: Facebook's replacement of some RAM with NVM [https://research.fb.com/wp-
content/uploads/2018/03/reducing-...](https://research.fb.com/wp-
content/uploads/2018/03/reducing-dram-footprint-with-nvm-in-facebook.pdf)

~~~
IshKebab
Yeah, because swap on Linux doesn't really work. When it is needed, your
system grinds to a halt.

~~~
imtringued
I didn't enable swap for the first six months, and every time I ran out of
memory the system froze. Now I use swap, and only the offending applications
actually freeze. There is a small problem, though: one application in
particular permanently captures the damn mouse in fullscreen mode, so if it
freezes I can't interact with the rest of the system with the mouse. But this
is purely a bug in the application, not a problem with the operating system.

~~~
majewsky
Well, it's more of a design problem of X11. Under Wayland, the compositor
could offer a Secure Attention Key like Ctrl-Alt-Del that reclaims input focus
from a fullscreen application. As far as I'm aware, such facilities do not
exist in X11. When an application grabs the keyboard, it's grabbing the
_entire_ keyboard.

~~~
ratmice
Yes, the problem is that X11's login/lock screen is secured by grabbing the
keyboard and mouse; taking input focus away from the lock screen will in many
cases allow you to bypass it entirely.

------
anticristi
I feel that the problem is ill-posed: "At what amount of free RAM should we
start freeing RAM by killing processes?"

Maybe one should solve the problem: "At what eviction-induced latency should
the OOM killer be invoked." Thanks to the infrastructure around latencytop,
the latency might be available already.

Of course, the never-ending dilemma of what process to kill is still there.

~~~
AstralStorm
Preferably none. Saner APIs like Android's deliver low-memory signals and
start swapping out applications both ahead of time and as needed. It also has
a relatively smart memory manager that detects unexpected vs. expected spikes
as well as slow leaks.

POSIX has no such API. It was designed in a simpler time.

~~~
anticristi
Can you point to documentation on how Android deals with low-memory
situations? My understanding is that: (A) They moved the problem to user-
space. (B) Apps are supposed to constantly save the state they care about and
expect to be killed whenever Android sees fit (or even at every app switch for
developers).

------
not2b
The particular issue of locking up a system because there are too many tabs
open in Firefox or Chrome should be fixed in the browser, because the browser
is the program that is taking up all of the memory and is in the best position
to know how to recover a lot of memory quickly while minimizing the loss of
work for the user. One of the comments suggested having the kernel send a
'memory pressure' signal to all processes, and a process could catch the
signal and do garbage collection or drop other storage. But even without such
a feature, a program could do polling to determine that there is memory
pressure, or do a check when a user asks to open a new tab, etc.

------
amyjess
My experience for a long time has been that Linux is only usable with
earlyoom. Running without it is a guaranteed way to end up having to power-
cycle my system, often multiple times per day.

------
known
Unlike traditional swap, Android's Memory Manager kills inactive processes to
free up memory.

~~~
AstralStorm
That's because the whole Android API and ecosystem was designed to make it
possible.

You cannot get away with that with POSIX applications. They have zero or bad
session support. There's no global, always-available database to help you
(Android has four), nor an event-scheduling bus. (D-Bus is a joke; it does
not allow sending an event to a non-existent endpoint to be delivered later.)

On Android, every application using the standard mechanisms will get
relaunched and activated as needed if it got shut down. It will receive a
Bundle with the state it managed to save, along with the original launch
Intent. It can access an SQLite database made available via a restartable
ContentProvider. SharedPreferences are also stored. Etc.

In the Linux world, there are no such de facto standards, and what is most
common is utterly broken. Same on Windows (only the registry is persistent,
and it's not meant for data storage) and on OS X.

------
mighty_bander
I managed to achieve this once, fresh from the world of Windows--I missed
breaking out of a loop while exploring Python(?) on Ubuntu 10 or so. Had to
power cycle my damn machine.

