
In defence of swap: common misconceptions - rwmj
https://chrisdown.name/2018/01/02/in-defence-of-swap.html
======
gok
> On SSDs, swapping out anonymous pages and reclaiming file pages are
> essentially equivalent in terms of performance/latency

…but very much not equivalent in terms of how much you are wearing out your
SSD.

> Swap can make a system slower to OOM kill

This is almost always actually a bad thing. Turning on swap means your system
goes from randomly killing processes to suddenly running 100x slower than it
used to and _then_ randomly killing processes.

~~~
pmoriarty
Yeah, but in the meantime you might actually notice that your system is
slowing down, investigate and possibly fix the issue or intelligently kill the
right process.

~~~
mrob
In practice, everything slows to an unusable crawl, and you hit the reset
button because it's the fastest way of regaining control. IMO, earlyoom or
similar is essential for general desktop use where memory load is
unpredictable. Better to lose one process than lose all of them.

"The oom-killer generally has a bad reputation among Linux users. This may be
part of the reason Linux invokes it only when it has absolutely no other
choice. It will swap out the desktop environment, drop the whole page cache
and empty every buffer before it will ultimately kill a process. At least
that's what I think that it will do. I have yet to be patient enough to wait
for it, sitting in front of an unresponsive system."

[https://github.com/rfjakob/earlyoom](https://github.com/rfjakob/earlyoom)
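
For illustration, the core idea behind earlyoom is only a few lines of shell (a minimal sketch; the 512 MiB threshold and the RSS-based victim selection here are assumptions for the example, not earlyoom's actual defaults — the real tool handles polling, SIGTERM/SIGKILL escalation, and more):

```shell
#!/bin/bash
# Minimal sketch of the earlyoom idea: check MemAvailable and act before
# the kernel's last-resort OOM killer would.
threshold_kib=$((512 * 1024))   # act below ~512 MiB available (assumed value)
avail_kib=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
if [ "$avail_kib" -lt "$threshold_kib" ]; then
    # Pick the process with the largest resident set, roughly in the
    # spirit of earlyoom's oom_score-based selection.
    victim=$(ps -eo pid= --sort=-rss | head -n 1)
    echo "low memory: would kill PID $victim"
else
    echo "ok: ${avail_kib} KiB available"
fi
```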

~~~
kalleboo
I guess Linux is different from macOS since I just had this situation last
week. I wrote a quick app to generate a graphic and let it run in the
background, noticed my machine was feeling a bit sluggish (but still totally
usable for web browsing etc) and noticed my app had a memory leak and was
using 100 GB of RAM and climbing on a 32 GB system. I could easily quit the
app using the regular GUI.

NVMe SSDs are fantastic.

~~~
gok
macOS also has Jetsam (like the Linux OOM killer, but much more aggressive)
and compressed memory, so you're not swapping to disk as often as you'd
think.

~~~
kalleboo
My app was leaking uncompressed 12 MP photo buffers and the swap file was over
70 GB, so in this case it was definitely swapping to hell

------
clarry
I say pfft to this. The article makes a poor case. None of this matters if you
have enough memory. Hence, calling swap "emergency memory" (or poor man's ram)
is perfectly justified.

If you can honestly answer the question "How much swap do I need, then?", and
the answer is not infinite, then you can just add that amount of RAM instead
and be done with it; no need for swap, unless you want to prepare for an
emergency.

"Ideally you should have enough swap to make your system operate optimally at
normal and peak (memory) load." Ideally you should have enough memory to make
your system operate optimally at normal and peak memory load. In a perfect
world, everything would fit in the CPU's SRAM and you wouldn't even need DRAM.

~~~
cesaref
Agreed: if you can size it, you can specify the right memory for a host for an
app. There are plenty of situations where you've got dedicated hardware and a
suitable budget, and you can size the hardware correctly.

If however you have something like Kubernetes, your apps move across hosts,
you have memory scaling with use (e.g. connections), and you have no idea what
the load on the app will be, it becomes basically impossible to right-size the
memory, so other strategies come into play. Monitoring becomes king at that
point.

~~~
throwaway2048
If your workload can't fit in memory then, without special measures such as
Optane swap storage, you will have such crushing unresponsiveness (even with
SSDs) that it is simply not worth the hassle in 99% of cases, no matter what
your budget is.

------
pmoriarty
I have a pretty old, slow laptop on which I've been running Gentoo for a long
time, so I'm frequently recompiling a lot of software. Over the years, as
software has gone from being bloated to being insanely bloated (qt-webkit,
firefox, and rust: I'm looking at you!), the "only" 8 gigs of memory that my
system has just sometimes isn't enough, so adding swap has let me stretch the
life of my system.

Sure, huge compiles are much slower when they hit swap, but at least it's
possible to run them with swap, while it would be impossible for me to do so
without it... at least until I can afford to get a new computer which can fit
more memory.

~~~
gen220
Surely, the solution is to acquire a _whole bunch_ of old, slow laptops and
hook them together as a build cluster! :)

Old thinkpads (2012ish) run for a hundred bucks and some change, and some of
them support up to 32 GB of ram, IIRC. The battery is a generic and
replaceable component, too. They can last a lifetime! Or at least until FF et
al renders 32 GB insufficient.

~~~
capableweb
Now I don't know pmoriarty's situation at all and don't want to assume
anything either, but, if someone has a "pretty old, slow laptop on which I've
been running Gentoo for a long time" without upgrading, there is surely a
reason for that. Suggesting that "hundred bucks" is something you can spend
just to get faster build times might work for a well paid developer in the 1st
world but many developers are not well paid so cannot afford the luxury of
upgrading their machine willy nilly like that.

While spending money on upgrades works for some, less bloated software would
work for everyone: no need to spend money to get acceptable build times, and
even if you have a supercomputer, your compilation times get a lot faster. So
aiming for less bloated software feels like a more noble goal than trying to
convince people to upgrade their hardware.

~~~
gen220
Hey, I'm sorry, I didn't mean to come off making assumptions or prescribing
advice. My intent was to illustrate that "new computer" doesn't have to mean
"shiny, expensive, newly-manufactured computer". On reflection, my words do
reflect my biases that investing one-to-two hundred dollars in such a setup
wouldn't be a cause of much concern for me, and I regret that I assumed
likewise for the readers.

I totally agree with you that less bloated software should be our goal.
Although, realistically, if you're compiling a modern web browser you don't
have much of a choice in the matter (unfortunately). I don't think that
reducing bloat by an order of magnitude (i.e. a difference that would impact
compile times) is on the radar of Chromium or FF.

------
altmind
While I find the question puzzling, I find the description lacking. The whole
explanation of anonymous maps is redundant, since their behavior seems to be
the same as that of plain memory: they're evicted under low-memory conditions
with LRU behavior, and no anon-map specifics are mentioned.

To recap: swap makes anonymous memory pages reclaimable by offloading them to
disk. This does not explain anything. It's basically "swap allows anon maps to
be swapped"; so what?

> You need to opportunistically handle the situation yourself before ever
thinking about the OOM killer.

Or, just rely on the OOM killer, minimal swap (so the application doesn't
agonize in low-memory conditions), and restart policies. There is no reason to
do the kernel's work of handling memory overcommit.

~~~
lostdog
As soon as I figure out how to get the OOM killer to just kill Chrome every
time, I might never have to hard reset my system again!

~~~
hinoki
In case you are serious, replace chrome with:

    #!/bin/bash
    # Make this process the OOM killer's preferred victim; the adjustment
    # is inherited by the real chrome started below.
    echo 1000 > /proc/$$/oom_score_adj
    exec path_to_real_chrome "$@"

~~~
mappu
What's the maintainable way of doing this, that can survive chrome package
updates and also handle http URIs opened from other applications?

~~~
steerablesafe
How do you typically start chrome? I guess you could just change the .desktop
file somewhere in $HOME, system updates don't touch that.
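
Something along these lines should survive updates (a sketch; the .desktop
path and chrome binary name are assumptions that vary by distro and package):

```shell
#!/bin/bash
# Sketch: shadow the system-wide chrome .desktop entry with a per-user copy
# whose Exec= line points at an oom_score_adj-raising wrapper. Per-user
# entries in ~/.local/share/applications take precedence, survive package
# updates, and are used when other apps open http(s) URIs.
mkdir -p ~/.local/bin ~/.local/share/applications

cat > ~/.local/bin/chrome-oom-wrapper <<'EOF'
#!/bin/bash
# Children inherit oom_score_adj, so chrome becomes the preferred OOM victim.
echo 1000 > /proc/$$/oom_score_adj
exec /usr/bin/google-chrome-stable "$@"
EOF
chmod +x ~/.local/bin/chrome-oom-wrapper

# Assumed path; check your distro's actual .desktop file name.
src=/usr/share/applications/google-chrome.desktop
if [ -f "$src" ]; then
    sed "s|^Exec=[^ ]*|Exec=$HOME/.local/bin/chrome-oom-wrapper|" "$src" \
        > ~/.local/share/applications/google-chrome.desktop
fi
```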

------
kevin_nisbet
Another aspect that plays into this can be interactions with GC'ed languages,
and particularly longer-than-expected pauses that appear to be caused by page
faults and thrashing during a critical GC mark phase. IIRC this can even
become apparent in non-blocking sections of the GC, because the GC runs out of
available segments while still running the mark phase and thrashing paged-out
memory. I do believe all of the systems I've personally observed this
behaviour on predate kernel 4.0, though, and didn't have SSDs, so I'm not
aware of whether it's less pathological with GC'ed runtimes post kernel 4.0
and whether SSDs would be fast enough to avoid the blocking. It probably also
depends on the allocation rate of the application.

So I have recommended disabling swap on systems in the past, but in line with
the article, these were systems that are largely dependent on application
memory and don't benefit from the IO cache prioritizing any application
pages. As the article points out, this isn't a hard and fast rule, but a
tuning that needs to be done depending on the application, and using cgroups
to fine-tune this may be a better approach depending on the use case.

------
DecoPerson
That was a lot of words to say “Swap allows us to push infrequently used pages
to disk during memory usage spikes to avoid triggering the dropping of useful
file pages, or worse, the OOM killer.”

~~~
jpitz
TFA even bolds the initial takeaway: "Swap is primarily a mechanism for
equality of reclamation, not for emergency "extra memory". Swap is not what
makes your application slow – entering overall memory contention is what makes
your application slow."

It is a mechanism for making more page types reclaimable.

------
colejohnson66
I’m still confused on swap. What’s the practical difference (besides speed)
between 16 GB or RAM and 16 GB of swap versus 32 GB of RAM and 0 swap?

~~~
NikolaeVarius
16 GB of SSD space is cheaper

~~~
colejohnson66
But if I upgrade to 32 GB of RAM, would I even need the swap? I guess my
question is: why is swap not dynamic like pagefile.sys on Windows? The
pagefile grows and shrinks as needed, but a swap partition doesn't; its size
is set at creation.

~~~
SomeoneFromCA
You do not need a partition; you can use a file.

~~~
dTal
True. It's still a fixed size.

(It's also slower. The reason we use partitions is to avoid invoking a whole
heap of filesystem code every time we want to access some memory.)
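
For reference, setting one up is only a few commands (a sketch, demonstrated
on a throwaway temp file so it's safe to run; for real use you'd target
something like /swapfile and run swapon as root, and note that some
filesystems, e.g. btrfs, need extra care with swap files):

```shell
#!/bin/bash
# Demonstrate creating a swap file. A real one would be multiple GiB and
# live at a fixed path such as /swapfile.
swapfile=$(mktemp)
dd if=/dev/zero of="$swapfile" bs=1M count=64   # preallocate the space
chmod 600 "$swapfile"   # swap contents must not be readable by other users
mkswap "$swapfile"      # write the swap signature
# As root, for a real file:  swapon /swapfile
# (and add an /etc/fstab entry to persist across reboots)
rm -f "$swapfile"
```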

------
westmeal
One of the scenarios not mentioned is if you have a hilarious amount of RAM
and little disk space. I run with 1 GB of swap, vm.swappiness set to 0, and
16 GB of RAM. I don't think I've ever gotten close to exhausting memory or
even invoking the OOM killer. What real difference is made here? Is it just
something I don't understand?

~~~
kevin_nisbet
It really depends on the application and use case, but with lots of headroom
there likely isn't a material difference (i.e. if the application only uses
128 MB of RAM, you're just not going to see it).

Where it would come into play in your use case is, say, you have an
application that can also serve files from disk, like a web app. Say the
running application uses 8 GB of RAM and the files on disk are 10 GB. On your
system with 16 GB of RAM, the IO subsystem in the kernel will use leftover RAM
as a cache for disk access. So even though your web server sends files from
disk, in most cases it'll leave a copy in RAM, and can then serve the file
from RAM without waiting on the relatively slow disks. Our total dataset is
18 GB though, so everything we're doing doesn't quite fit in RAM with
application + file usage.

This is where swap comes into play. Even though the application uses 8 GB of
RAM, maybe some of it doesn't get used very often, whereas the files on disk
(the whole 10 GB) get used almost constantly. What the kernel can do is say:
2 GB of the application's memory, even though it's allocated, never seems to
be touched. Maybe it's just some buffer or housekeeping data that's used
infrequently. We'll swap that out to disk, and our IO cache is now a little
bit bigger, speeding up access to the 10 GB of files always being read from
the disk.

When the application does need that memory, the kernel will bring it back into
main RAM, using some free memory that's maintained, or reclaiming some memory,
which is slow. Instead of the RAM being instantly available, the application
has to wait for the kernel to do its work, which leads to engineers
discovering that in certain limited cases, if they disable swap, their
applications get faster, since they don't block waiting on the kernel in those
cases.

If you don't have resource contention on the size of the application in RAM,
and the amount of frequently accessed files / IO rate, you're not going to see
any performance difference.

* The default swappiness setting on many distros can also be a little aggressive, which I suspect has led to a lot of this "disable swap" advice. Although, as the article points out, this changed in Linux 4.0+, I'm not familiar with the changes being referred to here.
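
For anyone who wants to experiment, a sketch of inspecting and tuning it (the
value 10 below is illustrative, not a recommendation):

```shell
# Current swappiness: historically 0-100 (default usually 60); higher means
# the kernel more readily swaps anonymous pages to preserve file cache.
cat /proc/sys/vm/swappiness

# To try a less aggressive value at runtime (as root):
#   sysctl vm.swappiness=10
# To persist across reboots:
#   echo 'vm.swappiness = 10' > /etc/sysctl.d/99-swappiness.conf
```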

~~~
westmeal
Thanks so much, that helped me.

------
raincom
I had a production server with 512G memory and 0 swap. It was running some
LDAP server. Every week, this LDAP server needed a restart due to some
slowness. Adding swap cured it.

~~~
nineteen999
512G for an LDAP server?? Surely you mean 512M.

~~~
raincom
Yes

------
nwmcsween
IMO swap size should be found using fio with a read/write test at 4-16k block
sizes (Linux does 16k reads on swap requests due to vm.page-cluster), threaded
to the number of processors on the system, multiplied by the amount of time
you're willing to let the OOM killer thrash around. The swap should also use
zswap to compress pages in RAM and shunt incompressible ones to swap.
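
A sketch of the kind of fio run described (flags per fio's manual; the file
path, size, and 16k block size are placeholder assumptions to adjust for your
setup):

```shell
#!/bin/bash
# Random reads at a swap-sized block size, one job per CPU, to estimate
# how fast the swap device could page things back in under pressure.
if command -v fio >/dev/null; then
    fio --name=swapsim --rw=randread --bs=16k --size=64M \
        --filename=/tmp/fio-swapsim --numjobs="$(nproc)" --group_reporting
    rm -f /tmp/fio-swapsim
fi

# vm.page-cluster is the log2 of pages read per swap-in (e.g. 3 => 8 pages).
cat /proc/sys/vm/page-cluster
```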

------
SomeoneFromCA
No one has mentioned zswap. It is awesome if you have at least a semi-modern
CPU. It feels like you have 15% more RAM than you actually do.
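
A sketch of checking whether zswap is available and turning it on (sysfs paths
as documented in the kernel's zswap docs; requires a kernel built with zswap):

```shell
# List zswap parameters if the module is present (enabled, compressor,
# max_pool_percent, ...); prints a fallback message otherwise.
grep -rs . /sys/module/zswap/parameters/ || echo "zswap not available"

# To enable at runtime (as root):
#   echo 1 > /sys/module/zswap/parameters/enabled
# Or at boot, on the kernel command line: zswap.enabled=1
```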

~~~
dTal
Not just on a semi-modern CPU. I used it on a Zipit Z2 (a truly ancient PXA270
ARM SoC with 32 MB of RAM) that I was using as a Debian PDA, and it was a
godsend. Probably a little slower, but when you only have 32 megabytes of RAM,
that's really the bottleneck you care about.

(I don't think Debian will run with so little memory anymore.)

------
bsder
My biggest gripe is that nobody designs for _no swap_.

With flash file systems, swap is a Bad Idea(tm).

However, whenever I disable swap, all manner of Linux systems freak out
because they are used to never having to actually deal with _out of memory_
(having swap on Linux means your entire system becomes totally unresponsive
long before anything actually flags out of memory). Linux doesn't help this by
overcommitting memory that it doesn't actually have.

~~~
mappu
_> Linux doesn't help this by overcommitting memory that it doesn't actually
have._

As long as the POSIX fork/exec pattern is implemented with a CoW address
space, you can just write all over memory to cause new physical allocations.
There's no malloc ENOMEM return to check in this case.

(You can do this on Windows too via RtlCloneUserProcess and friends.)

You could "solve" this problem by implementing fork/exec to use copying
instead of CoW (like Cygwin and PDP-11 Unix), but I don't think anyone wants
that, especially because you'd usually throw away the work with exec() anyway.
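
The overcommit behaviour in question is tunable, for what it's worth; a sketch
of inspecting it (mode meanings per the kernel's overcommit-accounting
documentation):

```shell
# 0 = heuristic overcommit (the default), 1 = always overcommit,
# 2 = strict accounting ("never" overcommit).
cat /proc/sys/vm/overcommit_memory
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo

# Mode 2 makes allocations (and CoW-backed fork) fail up front rather than
# risking the OOM killer later, at the cost of refusing allocations that
# would in practice have been fine. As root:
#   sysctl vm.overcommit_memory=2
```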

~~~
wtallis
I think he's suggesting that the fork itself should fail if the subsequent
overwrite pass to defeat CoW would cause problems. So you would have trouble
actually directly using all of your RAM if you had some processes that don't
follow a fork with an exec—but that leftover physical RAM should still be able
to be used for the disk cache.

~~~
duskwuff
> I think he's suggesting that the fork itself should fail if the subsequent
> overwrite pass to defeat CoW would cause problems.

That'd put you in the unusual position of forbidding any sufficiently large
process from ever creating children. The kernel has no way of knowing whether
a fork() will be followed by exec().

~~~
hyperman1
If posix_spawn were a syscall, it would be a decent replacement.

