Swap is the only mechanism through which an anonymous (non-file-backed) page in memory can be evicted. Therefore, if you run without swap and there exists any RAM that is accessed less often than the next-most-frequently-accessed area of disk not currently in cache, memory utilization is suboptimal.
I've seen a lot of tools get this wrong, even earlyoom, which AFAIK only measures memory and swap usage, not the sustained rate of swapping. My computer can run just fine with all of its swap used while RAM is half empty.
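For what it's worth, the rate is easy to watch via the kernel's cumulative counters (a sketch; sample twice and diff to get pages/sec):

    grep -E '^pswp(in|out)' /proc/vmstat   # cumulative pages swapped in/out since boot
    sleep 5
    grep -E '^pswp(in|out)' /proc/vmstat   # diff against the first sample for the rate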
Any implementation of overcommit is inherently unstable and susceptible to a "bank-run" situation; adding or removing swap will not change that. If you want a robust system, preallocate memory for everything and set limits via cgroups.
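A minimal sketch of the cgroup side (assumes systemd and cgroup v2; the binary name is a placeholder):

    # hard-cap a process at 2G of RAM and forbid any swap use
    systemd-run --scope -p MemoryMax=2G -p MemorySwapMax=0 ./my-server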
These are the esoteric and technical arguments that scare me into still using a swapfile and which maybe answer why I see my swap space being used when I have like 60 GB of RAM free (neither of which make any sense otherwise). :p
Simpler answer is: at one point stuff got put into swap to make room for a memory-hungry task, and it hasn't been referenced since, so no need to pull it back into RAM.
> if you run without swap and there exists any RAM that is accessed less often than the next-most-frequently-accessed area of disk not currently in cache, memory utilization is suboptimal.
Swapping memory out is a write operation, which is generally much more resource- and wear-intensive than a read; a program memory page and a disk cache page are not equivalent.
Additionally, the swapped-out program memory may be required again and cause an unpredictable delay in program operation; when a user has to wait for a menu to open while it is swapped back in, that is suboptimal use of memory.
A modern operating system should have compressed memory rather than swap. Take the pages that would be swapped out for being rarely accessed; if they compress well, free each page and store its contents in an area for compressed pages. This will get most of the expanded cache benefit from swap without delays, wear, or the possibility of the system grinding to a halt.
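Linux's zram is one implementation of this idea; a setup sketch (algorithm availability and sizing depend on your kernel):

    modprobe zram
    echo zstd > /sys/block/zram0/comp_algorithm   # must be set before disksize
    echo 4G > /sys/block/zram0/disksize
    mkswap /dev/zram0
    swapon -p 100 /dev/zram0   # higher priority than any disk-backed swap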
The alternative is what Android and iOS do - applications keep running on a best-effort basis and are summarily executed if physical RAM runs out.
SSD space is dirt cheap (unless you get it from Apple), I'm not losing any sleep over losing a few gigs to a swap file to keep that from ever happening.
Fun thing I've had to do on Android. When the app did a really memory-intensive task (client-side video encoding in this case), we forked that process, trapped the SIGSEGV from the OOM in the parent process, and simply restarted after a sleep. It fixed a huge percentage of app deaths for us.
The reason this worked is that Android starts killing processes only after you've pretty much run out of RAM. So if you have a task that needs a lot of RAM, even though the phone could free it by closing a few apps, you can't use that RAM until you've hit the allocation fault and let a bunch of apps die.
Which is incredibly frustrating if you need to switch between your browser and another app and the browser keeps getting killed and has to reload every time you come back
This used to happen back when phones could only keep a few apps backgrounded at a time, but I haven't seen iOS kill the most recently used background app since the first couple years backgrounding was supported at all.
But then, apps in that environment were made for no swap. If you take away swap on a desktop system it could be another story.
I'd blame bloatware rather than the OS. As globular-toast's comment points out, using an SSD for swap could cause significant wear, i.e. shortening of the device's usable lifetime.
If applications made more efficient use of memory, the OS wouldn't need to terminate processes.
> SSD space is dirt cheap (unless you get it from Apple), I'm not losing any sleep over losing a few gigs to a swap file to keep that from ever happening.
I don't lose sleep over space on my SSD either, but I don't want thrashing either. As a user I get no real feedback on the memory usage of my applications until everything gets slow. Fortunately it's easy to configure Firefox to unload idle tabs, but it might be better if, at least for GUI apps, I could confidently configure no swap.
With swap it practically never happens unless you have a catastrophic memory leak somewhere. Write endurance isn't that much of a concern with modern SSDs, I just retired an SSD which had been used as my main system and swap drive for about 5 years, and it had clocked up about 80TB of writes in that time. Its rated write endurance was 600TB, and now it's not uncommon for drives to be rated for double or quadruple that amount.
I strongly dislike swap on servers. I can understand some use cases on laptops and in one-off situations.
I would much rather have an application get killed by the OOM killer than have it swapping. Swapping absolutely kills performance. Not having enough RAM is a faulty state, and swapping hides that from admins, resulting in hard-to-debug issues. The OOM killer leaves handy logs; swapping just degrades your service and needs to be correlated with RAM usage metrics.
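Those logs are easy to pull up after the fact (a sketch; exact message wording varies by kernel version):

    dmesg -T | grep -i 'out of memory'   # e.g. "Out of memory: Killed process 1234 ..."
    journalctl -k | grep -i oom          # same, via the journal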
My experience is also that swap will be used no matter how low (or was it high?) you set the swappiness number if the memory throughput is high enough, even if there is enough RAM available.
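(For the record, lower means less eager to swap out anonymous pages. A sketch:)

    sysctl vm.swappiness=10        # historically 0-100; 0-200 on kernels >= 5.8
    cat /proc/sys/vm/swappiness    # check the current value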
We live in a world where you are charged per megabyte of RAM you allocate to your VMs. Sure, they have the occasional peak that lasts for a few seconds, but if you provision RAM for that peak, it's costing you money. The cheap way out is to give them swap.
My rule of thumb is that under average load there should be no swapping, meaning that vmstat or whatever should be showing mostly 0's in the swap columns. That doesn't mean there are 0 bytes in swap; in fact it probably is using some swap. It means nothing in swap is in the working set. For example, the server I'm looking at now is showing 0 swap activity, has 2GB of RAM, and is using 1.3GB of swap. When a peak hits you will get some small delays of course, but it's likely no one will notice.
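A sketch of that check (si/so are swap-ins/swap-outs per second):

    vmstat 5   # sustained non-zero si/so means the working set doesn't fit; occasional blips are fine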
Swap (paging) can also help performance. It exists for a reason. Having metrics on paging helps you tune your application, so it is also good for observability. It's a feature that can be misused, but it is not a good recommendation to turn it off without knowledge of the specific situation.
None of my laptops, desktops, gaming rigs, or servers have swap. Swapping combined with a managed language and garbage collection is next to indistinguishable from an application crash.
It is; try running a stop-the-world type of garbage collection along with page-in/page-out, etc.
> cache impacted by those applications’ file I/O
Which cache? The disk one, which depends on the available memory, with pretty much all free memory being used as disk cache. In the case of swapping, there is effectively no disk cache left.
That is not how paging works. The swap area is also a cache in every sense of the word. And the kernel will swap out pages that are clearly accessed less frequently than another page, even if the competing page is buffer cache. Used like that, swapping is a way of getting more disk cache.
Generally speaking, systems do not swap out pages only under memory pressure. That design would be ineffective. When memory pressure is high enough, you've already lost.
I can see the benefit of not having swap in a server scenario, but to offer a counterpoint: it seems like IT likes to keep something like 25% headroom on servers, so if you have a server with 256GB of RAM, by design it should never use more than 192GB. That’s a lot of RAM going to waste for the off-chance usage jumps above 75-80%.
I think I would rather have the server’s SSD be an Optane drive (or some other high-endurance flash memory) with a swap partition, and use some other means of monitoring and being alerted to high memory usage.
A lot of swap on servers is bad. A little bit is fine, though, and can actually be helpful in certain scenarios. I've seen servers with 256GB of RAM configured to have <1GB of swap, so in the case of a runaway process the swap fills up very quickly and doesn't delay the OOM killer, but it still helps the Linux memory manager run more optimally.
A process randomly dying is a faulty state. Making malloc fail and having programs respond appropriately is far preferable to just randomly trashing processes.
One thing missing from this discussion is the cost of DRAM. It hit $4/GB in 2011, and hasn’t changed all that much since then. In contrast if you go back another 13 years, memory cost $1/MB in 1998.
Laptops have adapted to constant-price DRAM - 8GB was a decent memory size 10 years ago, and 16 is probably the equivalent today, with lots of swapping to compressed RAM using CPU cycles that get cheaper every year, and some swapping to flash that is 10-20x faster than the hard drives of 10-25 years ago. Servers not so much for various reasons, not all rational.
The lessons we learned in the decades when RAM was always getting bigger and cheaper every year don’t always apply nowadays.
If you're on a laptop, swap is where your hibernation contents go. (This is probably a bad idea, but that's what happens.) So you need slightly more than RAM, at a minimum, or you don't do hibernation.
I recommend not doing hibernation unless you absolutely need it.
I used to hibernate religiously in the Windows XP days. Now I don’t even have that option because my Linux distro of choice (and many other distros now) doesn’t enable it by default. Some popular Linux distros don’t even create swap by default nowadays. I think Ubuntu Server is one of them.
I don’t recall seeing an option to hibernate in Windows 11 either, but I may be mistaken.
Hibernation seems less essential on a laptop than a desktop, but I don’t miss it in either case. It’s very rare that a power outage makes me reboot normally on a desktop.
I also ensure hibernation is set up on my laptops, and systemd has this hybrid sleep-first-then-hibernate thing where I've got hibernation set to kick in after 4 hours of sleep. As my Dell 9320 doesn't have "real" S3 sleep, just the s0ix "connected standby" thing (which uses a lot more power when idle, despite my best efforts), it means I still have battery left if I don't use my laptop for more than 3 days.
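That's systemd's suspend-then-hibernate; a sketch of the wiring (the delay is my choice, not a default):

    # /etc/systemd/sleep.conf
    [Sleep]
    HibernateDelaySec=4h

    # then use it in place of plain suspend:
    systemctl suspend-then-hibernate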
I disable sleep on Linux because sometimes the laptop would wake up and either overheat in a bag or just make the battery dead-flat. Both those outcomes are highly undesirable. Old Dell XPS so sleep and hibernate are supported.
I find boot times quick enough on modern laptops that shutting down and restart is fine.
If I want to leave session alive during the day I just lock it and leave it running. Hibernate is enabled for if battery gets low but I hardly ever manually hibernate.
> I find boot times quick enough on modern laptops that shutting down and restart is fine.
How mobile are you with your laptop? If I had to restart (i.e., lose all my xterms' state, etc.) every time I'd stopped using my laptop for a day or so, it would totally kill my workflow.
Hibernation success also depends on device driver support, and many devices don't correctly support hibernation, for example by failing to come back to life after the machine is resumed, or by certain functions becoming broken due to software/hardware bugs.
This is one of the best parts about hibernation. Shutting down is saving state. Booting? The kernel starts booting as usual, entirely the same as a normal boot. It gets quite far along, has already loaded a bunch of drivers, and then it finds the hibernation state & loads that.
Whereas with most S3 and s2idle suspends, there really is a lot of system-level BIOS support required to make things happen. Many systems just do it wrong, not to spec, or on desktops often not at all. These are non-issues with hibernation.
To your point though, I did have a laptop on which the wifi would disappear every other hibernation. I'd tend to hibernate it, immediately wake it up, and hibernate again, so when I really turned it on it would be good to go. For a while I had been packing a USB wifi card because I hadn't put it together. So yeah I've seen issues. A kernel upgrade half a decade ago seemed to have fixed that, but yeah I guess it's some evidence of problems. Still, the amount of time I've spent trying to get systems to suspend has been long & sad & difficult, with little evidence of what's happening. Hibernation has been a pretty reliable & consistent tool that I feel like I can almost always rely on.
Which... seems quite wrong given I can put my laptop to sleep lol. But thanks for the pointer, even though I was aware LTT did the video on sleep it'd fallen out of my memory (pun not intended).
Edit: I think I partially solved it. Running powercfg /a reveals...
    The following sleep states are available on this system:
        Standby (S0 Low Power Idle) Network Connected
        Hibernate
        Fast Startup

    The following sleep states are not available on this system:
        Standby (S1)
            The system firmware does not support this standby state.
            This standby state is disabled when S0 low power idle is supported.
        Standby (S2)
            The system firmware does not support this standby state.
            This standby state is disabled when S0 low power idle is supported.
        Standby (S3)
            The system firmware does not support this standby state.
            This standby state is disabled when S0 low power idle is supported.
        Hybrid Sleep
            Standby (S3) is not available.
            The hypervisor does not support this standby state.
(I have hyper V enabled.) So it does look like S0 sleep is the culprit.
Hibernation is great! It compresses with LZO (so the image should be a good bit smaller than memory size), and soon one will be able to use whatever they want from the kernel crypto APIs to compress (LZ4 seems to be a focus, but maybe Zstd too?).
S3 and s2idle both require good BIOS support. Many desktops flat out won't suspend. But hibernate? To hibernate requires nothing. The kernel writes its state as it shuts down, and a kernel, when loading, looks for hibernation state & loads it if found. The boot path looks normal and works normally until a good way through the kernel initializing itself. This makes it so much more reliable & available, being not dependent on BIOS support and not requiring special handling.
Not losing any battery when hibernating is excellent. Just yesterday I turned on a laptop for the first time in a month, and it has a full charge & my full previous state. I love that so much.
I am curious how the kernel handles making space for the hibernation state. Having to dump main memory into some space feels like it has to create a lot of pressure. If swap is already being used, that's got to be quite the effort to hibernate. It seems like it should be failing to hibernate sometimes! Somehow, though, I've never hit any issues, never heard of any issues.
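There is at least a knob for it: the kernel targets a maximum image size and tries to shrink memory down to it before writing (a sketch; I can't say how well it behaves under real pressure):

    cat /sys/power/image_size        # target upper bound for the hibernation image, in bytes
    echo 0 > /sys/power/image_size   # ask for the smallest possible image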
On many more efficient Linux distros you can get away with a reduced or eliminated swap file. When SSDs first became popular, capacities were relatively small, and I remember not wanting to give up 8GB of a 120GB drive to swap that would go unused on a system with 8GB of memory. I set up a few systems that way and never had an issue. That said, you are setting yourself up for instability, so I wouldn't ever do this on anything mission critical. Also, the balance of storage to memory is not the same in 2024 as it was 12 years ago.
Imho: if you 'need' swap, you're running the wrong software.
If an OS+apps' working set exceeds RAM, performance will slow to a crawl in any case.
Best (and only?) argument I've seen for having swap is that in a gradually building out-of-memory situation (say, a memory leak), swap buys the admin more time to respond / diagnose the problem.
That is a valid reason. Use swap where it applies.
But for eg. desktop systems: pointless. Some app eating all the RAM? Crash right now, please! Then it's usually obvious which app was the culprit. As opposed to slowing everything to a crawl slowly over time & leaving the user to wonder what's going on.
There are plenty of one-off tasks that can cause a short, transient memory need. Killing anything due to this is generally a bad idea.
As an example: I run a small Hetzner cloud instance (for about $4.50/month) for quick prototyping when I am away from home. It has a nowadays-considered-wimpy 2GB of RAM. It also exposes a Jupyter server that I (or friends) can use for quick experiments from anywhere in the world.
Some time ago I added the torch and torchvision modules for a quick size comparison on a couple of models (just the number of trainable parameters), and "pip install torch" got OOM-killed before I realized I had zero swap. I then added some (and could have deleted it after the install).
This is not a common example, but a general observation: while using swap on production servers that run one thing in a highly predictable fashion (uhm, sure) might mask a problem you want exposed, swap is extremely useful for general-purpose computers that occasionally run things that cause memory-use spikes and are not particularly sensitive to whether they take a few milliseconds or a few seconds. My 2c.
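The add-then-delete dance is cheap (a sketch; size and path are arbitrary):

    dd if=/dev/zero of=/tmp.swap bs=1M count=2048 && chmod 600 /tmp.swap
    mkswap /tmp.swap && swapon /tmp.swap
    pip install torch                   # the spike that needed the headroom
    swapoff /tmp.swap && rm /tmp.swap   # reclaim the disk afterwards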
Swapping to an HDD used to be slow. Swapping to an NVMe device is incomparably faster. Swapping out data which takes longer to compute than to read back from swap when needed does make things faster.
But for the swap to work efficiently, you still need to have enough RAM to run the things you want to run right now. The slowing to a crawl occurs when the running apps try to touch pages all over the place, requiring very frequent swapping in/out. Then yes, it's time to terminate something and free up some space.
> If an OS+apps' working set exceeds RAM, performance will slow to a crawl in any case
That sounds like a misunderstanding of how memory is allocated, and this may be the root of it. There are many instances where this would be completely expected, the most trivial perhaps being an application that mmap()s a file much larger than memory. That is a perfectly valid thing to do.
"You're running the wrong software" is exactly the kind of attitude that's frustrating about a lot of current developers. Instead of compatibility you'd rather just act like users are complete idiots.
I don't think you are right. I used GIS software with an SSD, and it handled huge files with the SSD as a RAM alternative (swapped). Maybe it is better to do this in a different way, but it definitely works.
Modern MacOS relies heavily on swap. It's not uncommon to see 2-3GB of swap in use at a time, even for a while after memory becomes available.
> Some app eating all the RAM? Crash right now, please!
It's ambiguous which is using "all the RAM." If you have game in one window and browser in another, and you start seeing high memory pressure, which should crash? Or if you're building software and you've fanned out to dozens of processes building dependencies concurrently but all using very little memory independently, which ones should be OOM-killed?
Alternatively you can see processes that haven't been woken or taken focus in a while and page their memory out to swap to release main memory for applications that need it, and avoid crashing or losing data.
I'm sorry, but you don't understand swap. It's useful to retain all kinds of memory in different scenarios, like cache, or windows that haven't been accessed in a while. It can free up fast memory, reducing contention and speeding up the system. And you can tune swappiness to decide what gets swapped and when.
Swap is also used by the system's hibernate function to store a copy of RAM to disk.
If you have 64GB of RAM and no combination of your apps, cache, etc will ever approach that, then probably nothing will ever swap. But for more limited systems or when running apps with higher memory pressure, swap does increase system performance.
Even in the 64GB case you're still losing out, since the OS will use free memory as file cache.
No swap just means you think everything ever allocated should always stay in RAM. In pretty much every system there are going to be a few GB that were allocated but never really needed. I'd rather that not be true, but since it is, I'd like that stuff paged out to give the OS more file cache.
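You can see the trade-off directly (a sketch):

    free -h   # 'buff/cache' is reclaimable file cache; 'available' counts it as usable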
In HPC systems, especially in bioinformatics, astrodynamics, or fluid dynamics, if you don't have swap you are asking for trouble, mostly because the code has been written by graduates of each field who can't program.
At least in my experience, 99% of the time that I run out of physical RAM, it's because of some runaway process that's gobbling up as much as it can for no real gain, or otherwise attempting to perform a computation that could only complete with an astronomical amount of memory.
After the offending process is killed, every other useful process has had all its memory swapped out, since the offending process was so urgent in its requests for memory. Then, since the system is so lazy at swapping memory back in, each process has degraded performance for the next several minutes of usage. So as an ultimate fix, I have a shell script on each of my machines that does "swapoff -a; swapon -a" to swap everything back within a minute.
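A slightly safer version of that script only runs when the swapped-out data actually fits in free RAM (a sketch):

    #!/bin/sh
    # force everything back into RAM, but only if it will fit
    avail=$(awk '/^MemAvailable/ {print $2}' /proc/meminfo)
    used=$(awk '/^SwapTotal/ {t=$2} /^SwapFree/ {f=$2} END {print t-f}' /proc/meminfo)
    [ "$used" -lt "$avail" ] && swapoff -a && swapon -a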
My own experience makes me wonder, what RAM configurations are people running where they reportedly find swap so useful? Is everyone else running so many useful applications simultaneously that they use almost all their RAM?
On one hand, it’s inefficient to have more RAM than you’ll ever use. So many people buy MacBooks with the largest RAM capacity, and while that worked well in the days of 16GB being the max, I’m not sure 64GB of RAM is going to come in handy even 5 years from now. RAM doubles because “marketing”, but RAM requirements for programs haven’t really increased, certainly not proportionally.
In cases where going from 32GB to 64GB adds $400+ to the price of my computer (cough Apple cough) I think it’s worthwhile to have swap and deal with occasional (if any) performance penalties.
I think there are two related questions that aren't strictly "swap". To discuss the question, you need to generalize beyond "swap" to all file-backed pages. Under pressure to reclaim space the kernel might evict any page that is backed by a file, but people who want to "disable swap" don't want this either. So if you are against swap you need to make sure that anything that came from a file including your programs and libraries are locked into memory and can't be evicted for reclaim.
The other related issue is whether mmap is obsolete. It is, but most application authors haven't gotten the message yet. Extending your address space over a bunch of files and making the kernel figure it out doesn't work optimally on modern hardware.
If you are just trying to get along and make your program work, there's nothing morally wrong with mmap, but it is far from optimal and optimality is where the interesting discussions happen. Making the kernel deal with your magic page tables, and asking it to infer what to evict, what to read ahead of time are not the best approach. The page fault cost did not matter much when they invented mmap, back when disk accesses took tens of milliseconds. Today it matters a great deal. Using `fio` on my system, mmap is the slowest way to access the SSD.
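Easy to reproduce (a sketch; the engine names are standard fio ioengines, the test file is a placeholder):

    for eng in mmap psync io_uring; do
        fio --name=$eng --ioengine=$eng --rw=randread --bs=4k \
            --size=1G --filename=/tmp/fiotest --runtime=15 --time_based
    done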
That's interesting. Why do you think they crashed? I wouldn't think a few gigs of swap would make that much difference unless there's something up with the OS code paths for a system with no swap.
Iirc OpenBSD will dump all its RAM to swap in the event of a crash, or something, so that's a strong argument in favor of it from a practical perspective. If you're the kind of person who likes to investigate those things in depth.
Okay, but the question is more interesting if we consider machines with >= 32gb, rather than models which have strangely small amounts of memory.
Edit: Also, TFA is clearly written with servers in mind, although I think the implications are interesting for e.g. your average laptop user installing Linux and wondering how much hard drive space to give up.
I used to have swap on a 128GB MLC SSD bought for $100, though I don't remember the year. I used it on a laptop with really heavy swap use for web browsing. The SSD stopped being functional after a few years of such usage. It refuses to write anything, but somehow I can still read my files at a few KB/s. I never watched SMART on this drive, but I believe swapping killed it.
Do people still have a use for swap in modern server workloads? With everything running in containers on multi-tenant machines, it seems like it would be a nightmare to oversubscribe the available RAM. Every misbehaving process will get killed by the OOM killer when it hits its cgroup limit.
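Container runtimes expose those cgroup limits directly, e.g. (a sketch; the image name is a placeholder):

    docker run --memory=512m --memory-swap=512m myimage   # equal values = no swap for this container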
I just wrote some server provisioning automation for a datacenter. I originally didn’t include special handling for swap, as I didn’t think anyone would be using it, but I got some user requests. So now if they add swap, I set it up accordingly. It wasn’t common, and I don’t know if their app strictly required it or they were leaning on old habits, but some are still doing it.
We need virtual memory to avoid the consequences of memory fragmentation. If every process shares the same address space, you might be unable to start a process despite having plenty of free ram, because that ram is scattered all over the address space. Virtual memory allows the process to see a contiguous range of free memory even when the physical memory is fragmented.
You make assumptions that programs start and stop often and allocate memory multiple times at runtime. Neither assumption is true for server software or for programs that allocate one big chunk and manage memory on their own, like the JVM.