Linux: Minimizing Memory Usage (jjinux.blogspot.com)
17 points by r11t on Nov 16, 2009 | 13 comments



Disclaimer: This is just my opinion, vaguely based on my experiences, and there are probably counterexamples that prove me wrong in certain circumstances.

Memory footprint is very much a concern, not because we don't have memory -- we do -- but because we still have disks that may be fast by 1990s standards. All that bloat gets partially loaded into memory at some point or another, offloaded to swap or discarded, and then loaded again when needed. Disks kill performance.

And if you do anything complicated with your computer, you'll end up hitting the disks eventually, if for no other reason than that the Linux kernel favors I/O buffers by default, so a big file copy can evict much of your running image from memory.

That's why it can take seconds to start Firefox, or half a minute to boot Ubuntu. You can speed that up with hacks, but that doesn't cure the underlying problem.

I think the magnitude of the problem is pretty much revealed in the blog post where RES-SHR for gnome-terminal is a whopping 9 megabytes. How the FSCK can a simple terminal possibly waste 9 megabytes of actual memory? Why the hell does my Nautilus process that is effectively doing nothing but display a few fancy icons on my desktop consume 12 megabytes of memory?

I'm not kidding. There's no reason we couldn't have a system that boots to full desktop in less than a second. There's no reason we couldn't have these programs take tens of kilobytes in size, eating at most hundreds of kilobytes of memory -- with the same user experience. We just need to scoop up all program data in one big disk read and then unleash our horde of 3GHz CPU cores at them.

I sometimes wonder how much faster a Linux desktop would be if it were compiled with -Os rather than -O2/-O3. CPU speed isn't the bottleneck; a typical Ubuntu desktop does nothing CPU intensive. I suspect that even a high-level bytecode environment would probably increase the responsiveness of the system, in spite of being interpreted, because the program code would take less space and fit better in caches.

It's NOT all right to dismiss this as a non-problem just because we have loads of RAM.


As I mentioned above, do not ignore the shared memory that gets reported as part of the resident size: yes, gnome-terminal uses 9MB of RAM, but 7MB of that is shared libraries like libc, glib, GTK, cairo and freetype -- they are used by the rest of the system, only a single copy of each is loaded, yet that copy is counted in the shared size of every process that uses it.
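For reference, RES and SHR come from /proc/&lt;pid&gt;/statm (in pages), which is where top reads them. A minimal Python sketch of the RES-SHR arithmetic, assuming Linux's /proc layout (pass a PID as the argument):

    # Minimal sketch: read RES and SHR for a PID from /proc/<pid>/statm.
    # statm reports page counts: size resident shared text lib data dt.
    import os, sys

    def res_shr_kb(pid):
        page_kb = os.sysconf("SC_PAGE_SIZE") / 1024
        with open("/proc/%d/statm" % pid) as f:
            fields = f.read().split()
        res = int(fields[1]) * page_kb   # what top shows as RES
        shr = int(fields[2]) * page_kb   # what top shows as SHR
        return res, shr

    if __name__ == "__main__":
        res, shr = res_shr_kb(int(sys.argv[1]))
        print("RES=%d kB  SHR=%d kB  RES-SHR=%d kB" % (res, shr, res - shr))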

However, I do agree with your argument: pushing data around is mostly what computers do -- from one cache to another, from the hard drive to the controller cache, then to disk buffers, to application memory, to CPU cache, and back again. Fewer bytes == faster.

Unfortunately for us, progress here is going backwards, and most of the languages people like to use are not capable of integrating with OS memory management or playing nicely with the OS binary loader. VM-based languages like Java and Python clone their entire VM for each started process, i.e. a gnome-terminal rewritten in Java or Ruby would probably eat 20-40MB as opposed to the 2-3MB it does now. And Ubuntu is encouraging this: they already ship the Mono-based Tomboy, which launches an entire Mono stack all for itself, and a couple of tiny Python-based applets to monitor print jobs, each wasting about 10MB.


First, he forgets about the shared memory, which needs to be subtracted from the resident number since it is shared by multiple applications, so gvim eats not 27MB but only 14MB, which is a very respectable number in my opinion.

Second, I have just gone through a long-awaited "Arch weekend" and finished building a fully functional ArchLinux machine.

It was definitely not "optimizing for the wrong thing". Arch allows you to pick the system components you actually want, as opposed to Ubuntu's approach of installing everything you might need. It speeds up your boot/reboot/resume times, and it runs about 30% leaner on RAM at feature parity with my Ubuntu installation.


Sharing doesn't equate with resident memory; it equates with virtual memory.

You can have a process with 1GB of vm, of which 100MB is shared with other processes but which has only 10MB resident (hmm...am I wrong? are shared pages never evicted from RAM?)

You can use 'exmap' (if it still compiles) to pull per-page statistics from the kernel, apportion the memory cost of each shared page pro rata across all the processes that map it, see what is resident and what is not, and get an "effective resident" figure for each process. (You can then break each process down by shared lib, and each shared lib by ELF symbol.)


jbert, you are not wrong; vmem includes everything. But shared libs have a much higher chance of being in the working set simply because multiple processes are using them. I doubt that libc or any of GTK ever gets swapped out, so it is safe to assume that on most non-memory-starved systems "real" memory consumption = RES-SHR.


I think I am wrong. I now think top's 'SHR' isn't the "total amount of virtual memory shared"; it's the "amount of RES which is shared". (Hmm... but my test could be conflating "RES" memory with "allocated" VM, i.e. memory which is backed by something, be it swap or RAM.)

So you can indeed subtract SHR from RES to get a feel for per-process RES.

(But that 'feel' doesn't tell you how many processes that SHR is shared with, or, worse, which ones. If firefox fork()ed, then all its firefox-specific code pages would contribute to SHR, but you'd really want to account for them under the "firefox application".)

(I'd also definitely expect chunks of libc and the GTK libs not to be resident, primarily because they contain code paths that have never been exercised on this box. And if only one app has used those pages, do you really want to call them 'shared'?)
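That pro-rata idea is exactly what the kernel's PSS ("proportional set size") figure gives you: each resident page is charged to a process divided by the number of processes mapping it. A rough sketch, assuming a kernel new enough to expose Pss lines in /proc/&lt;pid&gt;/smaps:

    # Rough sketch: sum the Pss lines in /proc/<pid>/smaps (values are in kB).
    # PSS charges each resident page 1/N-th to each of the N processes mapping it.
    import sys

    def pss_kb(pid):
        total = 0
        with open("/proc/%d/smaps" % pid) as f:
            for line in f:
                if line.startswith("Pss:"):
                    total += int(line.split()[1])
        return total

    if __name__ == "__main__":
        print("PSS: %d kB" % pss_kb(int(sys.argv[1])))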


Subtracting shared from resident gives a best case, I think, because the "shared" memory isn't necessarily shared.

The thing that drives me nuts is the people who freak out over Apache's memory usage because they add up the resident size of each worker. Then they go to the trouble of getting FastCGI working with nginx or lighttpd, all to save, at best, a dozen MB or so -- and at worst, to use more memory, because they start the FCGI workers in such a way that they don't share memory for things like the PHP opcode cache. Then they evangelize their ignorance.


Third, he applies a quote from Linus Torvalds about the Linux kernel to distributions (or programs).


64-bit made things worse, especially with interpreted languages, which often use convenient but heavily indirect data structures.

E.g. creating this:

       d = dict((x,x * 42) for x in xrange(100000))
a dictionary with 100k items in Python, takes up 10996 kB of RSS on my 64-bit system, but 5200 kB of RSS on a 32-bit Python (same Python version). You've got the dictionary's hash buckets, the integer objects, etc. -- all separately allocated objects. The integers themselves are also 64-bit.

Of course, if I really cared about memory in the above case, I'd create an array, or even a list, that fit the key/value pattern; the array array.array('i', (x*42 for x in xrange(100000))) stores the same information in the most compact form, increasing RSS by only about 500 kB.
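A rough way to reproduce the comparison (numbers will vary with platform, allocator and Python build) is to watch VmRSS in /proc/self/status before and after building each structure:

    # Rough, Linux-only reproduction of the dict vs. array comparison above
    # (Python 2, to match the snippets; xrange is Python 2 only).
    import array

    def rss_kb():
        # The VmRSS line in /proc/self/status is already reported in kB.
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])

    before = rss_kb()
    d = dict((x, x * 42) for x in xrange(100000))
    print("dict:  +%d kB RSS" % (rss_kb() - before))

    before = rss_kb()
    a = array.array('i', (x * 42 for x in xrange(100000)))
    print("array: +%d kB RSS" % (rss_kb() - before))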

But on a production system with 32 GB of memory, this kind of memory optimisation is rarely done.


I ran some informal benchmarks in Lua to see how switching from 32 to 64 bit affected memory usage, and it seemed to take very roughly 1.3-1.4 times as much. There's other noise in the totals besides just the doubled pointer size, though -- padding in C structs due to pointer alignment, for example.


You could still use the 32-bit Python on the 64-bit system if you were very concerned.


BTW, check out smem ( http://www.selenic.com/smem/ ) for more accurate reporting of memory usage -- it saves you from having to make the RSS correction, etc. It's by Matt Mackall, kernel hacker and the main author of Mercurial.


Thank you very much for this link. That is great! Can't believe I never knew about it.



