My hunch is that the OS is swapping stuff back in stupidly. Once memory is available, I'd like it to page everything back proactively, preferring stuff from swap and then from file-backed mmaps. But instead it seems to be purely reactive, each major page fault requiring a disk seek to page in what's needed with little if any readahead. Basically the whole VM space remains a minefield until you stumble over and detonate each mine in your normal operation. Much better to reboot and have a usable system again.
On my Linux systems, I've turned off swap.
On OS X...last I checked, I wasn't able to find a way to do this. I'd like to turn off swap entirely, or failing that, have some equivalent way to force all of swap to be paged in now so I don't have to reboot when I hit swap. Anyone know of a way?
20 years ago on Windows 98 it just started swapping, but it was no big deal. If something became too slow to be usable, you could just press ctrl+alt+del and kill that swapped program and everything worked fine afterwards.
On the other hand, on my modern Linux laptop, once it starts swapping, it swaps and swaps and you can do nothing, not even move the mouse, until something crashes 30 minutes later.
Default settings for dirty ratio and dirty background ratio exacerbate the issue: more data is held in memory before it is written out, and once the dirty ratio (not just the background ratio) is hit, any application writing to disk will block until writeback catches up.
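To put rough numbers on those two knobs, here is a minimal sketch. It assumes the commonly cited defaults of vm.dirty_background_ratio=10 and vm.dirty_ratio=20, and it treats the percentages as applying to a single "dirtyable memory" figure that you pass in. The kernel's own accounting is more involved, so take the output as an order-of-magnitude estimate only. The function name is made up for illustration.

```python
GIB = 1024 ** 3


def dirty_thresholds(dirtyable_bytes, background_ratio=10, dirty_ratio=20):
    """Return (background_bytes, blocking_bytes) for the given percentages.

    background_bytes: roughly where background writeback kicks in.
    blocking_bytes:   roughly where writing processes start to stall.
    Both ratios default to assumed values, not values read from the system.
    """
    background = dirtyable_bytes * background_ratio // 100
    blocking = dirtyable_bytes * dirty_ratio // 100
    return background, blocking


# Example: a machine with about 16 GiB of dirtyable memory (assumed figure).
bg, hard = dirty_thresholds(16 * GIB)
print(f"background writeback starts near {bg / GIB:.1f} GiB of dirty pages")
print(f"writers start blocking near {hard / GIB:.1f} GiB of dirty pages")
```

On a box with a lot of RAM and a slow disk, several GiB of dirty pages can take a long time to flush, which is why people often lower these values or switch to the byte-based equivalents.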
>A typical reference to RAM is in the area of 100ns, accessing data on a SSD 150μs (so 1500 times of the RAM) and accessing data on a rotating disk 10ms (so 100.000 times the RAM.
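The ratios in that quote check out if you take its three latency figures at face value. A quick sketch of the arithmetic follows; the constants are the quote's numbers, not measurements, and the helper name is only for illustration.

```python
# Latencies from the quoted passage, converted to nanoseconds.
RAM_NS = 100             # 100 ns
SSD_NS = 150_000         # 150 microseconds
HDD_NS = 10_000_000      # 10 ms


def slowdown(access_ns, baseline_ns=RAM_NS):
    """How many times slower an access is than the baseline."""
    return access_ns / baseline_ns


print(slowdown(SSD_NS))  # 1500.0
print(slowdown(HDD_NS))  # 100000.0
```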
Personally, I've never given VMs swap. I'd rather have memory pressure trigger horizontal scaling (or perhaps vertical rescaling, for things like DBMS nodes) than let individual VMs struggle along under overloaded and degraded conditions.
There can be no consensus because there is no one answer.
On Windows without swap, once you get even remotely low on RAM, things start going really poorly for some reason: random latency. Even with 16 GB of RAM I couldn't disable swap on Windows without some really strange performance characteristics. I run SSDs, so I really wanted it off, and I just stuffed more RAM in my box; with 32 GB it isn't a problem.
On Linux, however, you can pretty much turn it off and everything will run smoothly until you're actually out of memory. Then you lag badly for a moment, Linux's oom-killer does its thing, and all is good again within a few seconds.
Not sure what you're referring to here. This story doesn't recommend eliminating swap...
So, it doesn't exclusively recommend it, but it concedes that there are use cases where it makes sense.
* On a laptop to hibernate, which results in zero power consumption vs suspend which will drain the battery in a day or so
* I use tmpfs for /tmp and using swap as the backing is far more performant than regular filesystems
This seems absurd. You're running an in-memory filesystem backed by memory-on-disk? You weren't comparing to a journalled filesystem or something like that?
Once a server hits swap, it's dead. There is no recovering it other than for exceptional cases. If you are swapping out, you've already lost the battle.
I tend to configure servers with 512MB to 1GB swap simply so the kernel can swap out a couple hundred MB of pages it never uses - but that's really more to make people feel better than it really being useful at all.
A system backed by an SSD does degrade more nicely, though. The system visibly slows down but doesn't go outright unresponsive like it does on a hard drive. You can make a case for letting that happen and having a human select the processes to kill, rather than letting the kernel do it. So, even though it still isn't really useful as an extension of RAM, it can still be useful for recovering systems that you've run out of memory on. Since putting an SSD in my systems I've actually gone back to running with some swap space. Though the fact that I like hibernation sometimes is also a reason I run with swap in Linux on my laptop.
[1]: Swap will almost certainly completely blow out the buffers on those things and you'll be stuck with the raw hardware write speeds pretty quickly.
Is there any way to tell the OOM killer which program to kill first?
The fun OOM analogy [1] that comes up when people propose different OOM killer designs:
> An aircraft company discovered that it was cheaper to fly its planes
with less fuel on board. The planes would be lighter and use less fuel
and money was saved. On rare occasions however the amount of fuel was
insufficient, and the plane would crash. This problem was solved by
the engineers of the company by the development of a special OOF
(out-of-fuel) mechanism. In emergency cases a passenger was selected
and thrown out of the plane. (When necessary, the procedure was
repeated.) A large body of theory was developed and many publications
were devoted to the problem of properly selecting the victim to be
ejected. Should the victim be chosen at random? Or should one choose
the heaviest person? Or the oldest? Should passengers pay in order not
to be ejected, so that the victim would be the poorest on board? And
if for example the heaviest person was chosen, should there be a
special exception in case that was the pilot? Should first class
passengers be exempted? Now that the OOF mechanism existed, it would
be activated every now and then, and eject passengers even when there
was no fuel shortage. The engineers are still studying precisely how
this malfunction is caused.
[1] https://lwn.net/Articles/104185/
From TFA:
>Without swap, the system will call the OOM when the memory is exhausted. You can prioritize which processes get killed first in configuring oom_adj_score.
The linked solution document is only available to registered RH users, though, and the name is actually oom_score_adj and not oom_adj_score.
`man 5 proc` has details, but tl;dr is set /proc/<pid>/oom_score_adj to -1000 to make a process OOM-killer-invincible.
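For anyone who wants to script that, here is a minimal sketch that writes the value through procfs. The function name and the `proc_root` parameter are my own (the latter only exists so the code can be tried against a scratch directory), and it assumes a Linux /proc layout. Note that lowering the value, as opposed to raising it, generally needs extra privileges.

```python
import os


def set_oom_score_adj(value, pid="self", proc_root="/proc"):
    """Write an OOM score adjustment for a process and return the path used.

    value must be in the range -1000..1000: -1000 exempts the process from
    the OOM killer, while positive values make it a more likely victim.
    pid="self" targets the calling process.
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(str(value))
    return path
```

Calling `set_oom_score_adj(500)` from a memory-hungry batch job volunteers it as an early victim. `set_oom_score_adj(-1000, pid=1234)` (hypothetical pid) protects another process and will normally fail with a permission error unless run as root.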
By default, earlyoom will start killing processes when free memory drops below 10%, though you can configure the threshold. I had the same problem for years, and then I started using earlyoom and I don't have to deal with it anymore.
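The threshold idea is simple enough to sketch. The following is not earlyoom's code, just an illustration of a percentage check against /proc/meminfo-style text. Using the MemAvailable field, the function names, and the sample figures are all my assumptions; a real tool would also look at swap and decide which process to kill.

```python
def available_percent(meminfo_text):
    """Return MemAvailable as a percentage of MemTotal.

    Expects text in the /proc/meminfo format, e.g. "MemTotal:  16000000 kB".
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # value in kB
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


def should_kill(meminfo_text, threshold_percent=10.0):
    """True once available memory drops below the threshold."""
    return available_percent(meminfo_text) < threshold_percent


# Made-up sample: 1.2 GB available out of 16 GB is 7.5%.
sample = (
    "MemTotal:       16000000 kB\n"
    "MemFree:          500000 kB\n"
    "MemAvailable:    1200000 kB\n"
)
print(should_kill(sample))  # True
```

On a live system you would feed it `open("/proc/meminfo").read()` in a polling loop.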
Use earlyoom instead of relying on oom-killer.
https://github.com/rfjakob/earlyoom
To quote from the description:
> The oom-killer generally has a bad reputation among Linux users. This may be part of the reason Linux invokes it only when it has absolutely no other choice. It will swap out the desktop environment, drop the whole page cache and empty every buffer before it will ultimately kill a process. At least that's what I think what it will do. I have yet to be patient enough to wait for it.
[...]
> This made people wonder if the oom-killer could be configured to step in earlier: superuser.com , unix.stackexchange.com.
> As it turns out, no, it can't. At least using the in-kernel oom killer.
And earlyoom exists to provide a better alternative to oom-killer in userspace, one that's much more aggressive about maintaining responsiveness.
If RAM and disk are the same, then writing a file system is just writing an in-memory tree. No need to pull data from the disk, just navigate the tree in your program's memory and pull the blob data out. Want to persist across reboots, protect against power outages, or save user settings? Just set a variable and it'll be there.
The benefits are much greater than the costs.
[0] - https://web.archive.org/web/20031029002231/http://www.eros-o...
Frank Soltis' book is recommended reading: https://www.amazon.com/dp/1882419669/
Note that EROS is not providing a write-through cache. It's providing a write-back cache using checkpointing coupled with a journalling capability and ability to explicitly sync data.
So it's leaky: your application needs to know that it must structure its writes to memory so that they will make sense if the system comes back up with some of the data missing, and it needs to know how to use the journalling functionality.
It can't just act as if it's running in RAM forever.
https://en.wikipedia.org/wiki/MUMPS
Setting data in memory is the same as setting data on disk, the only difference is the name of the variable:
s X=1 ; store 1 in variable named X, in memory.
s ^X=X ; store 1 in variable named X, on disk.
s X=^X ; load disk to memory
Now, desktops can have 32 GB of RAM, but everyone just uses it to run Chrome.
That was different in the early days, but that was because people accepted worse performance (GC that stops the world for seconds can be better than no GC, even when running a GUI).
Certainly nowadays, if you take out half the RAM, you will want to take out half the processes, too.
E.g., several large processes sleeping in memory on a desktop would be fine if only one or two are used at the same time. OTOH, clustered nodes well tuned for a single task may not need swap.
In any case, it is a metric for thrashing that should be used to initiate culling.
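One candidate for such a metric (my suggestion, not something the comment above names) is the rate of major page faults, which Linux exposes as the cumulative `pgmajfault` counter in /proc/vmstat. A rough sketch follows; the function names and the 500 faults/second limit are arbitrary placeholders, and a sensible limit depends heavily on the storage device.

```python
def major_faults(vmstat_text):
    """Extract the cumulative pgmajfault counter from /proc/vmstat-style text."""
    for line in vmstat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "pgmajfault":
            return int(value)
    raise KeyError("pgmajfault not found")


def major_fault_rate(before_text, after_text, interval_seconds):
    """Major faults per second between two snapshots taken interval_seconds apart."""
    delta = major_faults(after_text) - major_faults(before_text)
    return delta / interval_seconds


def is_thrashing(rate, limit_per_second=500.0):
    """Crude test: treat a sustained rate above the (assumed) limit as thrashing."""
    return rate > limit_per_second
```

Take two snapshots of `open("/proc/vmstat").read()` a few seconds apart, pass them to `major_fault_rate`, and start culling only if `is_thrashing` stays true across several consecutive samples.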
it just slows my system down to a crawl, requiring me to force a reboot
it probably depends on your hardware
and if i disable the pagefile, windows update stops working and at 75% memory usage it starts panicking and closing programs
I missed the mention of zram. Zram can create ramdisks, and compress them. It can create a compressed swapdisk in ram, basically making your ram last longer in case you really run out of memory. In my experience that is a good alternative to having a bit of swapspace as reserve, as the article recommends.
1. Swap is slow
2. If using swap, your system starts to thrash
3. If thrashing, you can't close programs to free memory
4. If you can't close programs, you have to wait until the task is killed by the OS
5. If you have no swap (or very little), you don't have to wait.
Except with an SSD, swap isn't slow enough to cause that issue.
So really this article only seems to apply to servers, not desktops.
Though it tends to mean you're boned, or going to be waiting a while while all i/o is dedicated to swapping for minutes at a time.
I've found myself wanting to upgrade it to 32G ram, but honestly that's about the only use case (besides production servers) where I would ever consider swap, and at that point I consider it a problem of not enough memory rather than swap being necessary.
Both changes have made my computers much more usable. Systems should be designed to fail fast when memory is low instead of slowing down.
[0] https://github.com/rfjakob/earlyoom
