
Linux: Minimizing Memory Usage - r11t
http://jjinux.blogspot.com/2009/11/linux-minimizing-memory-usage.html
======
yason
Disclaimer: This is just my opinion, vaguely based on my experiences, and
there are probably counter-examples that prove me wrong in certain
circumstances.

Memory footprint is very much a concern, not because we don't have memory --
we do -- but because we still have disks that are fast only by 90's standards.
And all that bloat gets partially loaded into memory at some point or another,
then offloaded to swap or discarded, and then loaded again when needed. Disks
kill performance.

And if you do anything complicated with your computer you'll end up hitting
the disks eventually, if for no other reason than that the Linux kernel favors
I/O buffers by default, so a big file copy can push much of your running image
out of RAM.

That's why it can take seconds to start up Firefox or half a minute to start
Ubuntu. You can speed that up with hacks, but they don't cure the underlying
problem.

I think the magnitude of the problem is pretty much revealed in the blog post
where RES-SHR for gnome-terminal is a whopping 9 megabytes. How the FSCK can a
simple terminal possibly waste 9 megabytes of actual memory? Why the hell does
my Nautilus process, which is effectively doing nothing but displaying a few
fancy icons on my desktop, consume 12 megabytes of memory?

I'm not kidding. There's no reason we couldn't have a system that boots to
full desktop in less than a second. There's no reason we couldn't have these
programs take tens of kilobytes in size, eating at most hundreds of kilobytes
of memory -- with the same user experience. We just need to scoop up all
program data in one big disk read and then unleash our horde of 3GHz CPU cores
at them.

I sometimes wonder how much faster a Linux desktop would be if it were compiled
with -Os rather than -O2/-O3. CPU speed isn't the bottleneck; a typical
Ubuntu desktop does nothing CPU-intensive. I suspect that even a high-level
bytecode environment would probably increase the responsiveness of the system
in spite of being interpreted because the program code would take less space
and fit better in caches.

It's NOT all right to dismiss this as a non-problem just because we have loads
of RAM.

~~~
old-gregg
As I mentioned above, do not ignore the shared memory which gets reported as
part of the resident size: yes, gnome-terminal uses 9MB of RAM, but 7MB of
that is shared libraries like libc, glib, gtk, cairo, freetype - they are
all used by the rest of the system and only a single copy of each is loaded,
yet that copy is counted in the resident size of every process that uses it.
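
For concreteness, a minimal sketch of that RES-SHR arithmetic in Python,
assuming a Linux /proc layout (/proc/<pid>/statm is roughly where top's RES
and SHR columns come from):

    import os, sys

    def private_kb(pid):
        # /proc/<pid>/statm fields: size resident shared text lib data dt,
        # all counted in pages.
        with open("/proc/%s/statm" % pid) as f:
            fields = f.read().split()
        resident, shared = int(fields[1]), int(fields[2])
        page_kb = os.sysconf("SC_PAGE_SIZE") // 1024
        return (resident - shared) * page_kb

    if __name__ == "__main__":
        pid = sys.argv[1] if len(sys.argv) > 1 else "self"
        print("RES-SHR for pid %s: %d kB" % (pid, private_kb(pid)))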

However, I do agree with your arguments: pushing data around is mostly all
computers do -- from one cache to another, from a hard drive to the controller
cache, then to disk buffers, to application memory, then to CPU cache, and
back again. Fewer bytes == faster.

Unfortunately for us, progress here is going backwards, and most languages
people like to use are not capable of integrating with the OS memory
management and playing nicely with the OS binary loader. VM-based languages
like Java and Python will _clone_ their entire VM for each started process,
i.e. gnome-terminal rewritten in Java or Ruby would probably eat 20-40MB as
opposed to the 2-3MB it does now. And Ubuntu is encouraging this - they
already have the Mono-based Tomboy, which launches an entire Mono stack all
for itself, and a couple of tiny Python-based applets to monitor print jobs,
wasting about 10MB each.

------
old-gregg
First, he forgets about the shared memory, which needs to be subtracted from
the resident number since that memory is shared by multiple applications; so
gvim eats not 27MB but only 14MB, which is a very respectable number in my
opinion.

Second, I have just gone through a long-awaited "Arch weekend" and finished
building a fully functional ArchLinux machine.

It was definitely not "optimizing for the wrong thing". Arch allows you to
pick the system components you actually want, as opposed to Ubuntu's approach
of installing everything you might need. It speeds up your boot/reboot/resume time
and it does run about 30% leaner on RAM when it reaches feature parity with my
Ubuntu installation.

~~~
jbert
Sharing doesn't equate with resident memory; it equates with virtual memory.

You can have a process with 1GB of vm, of which 100MB is shared with other
processes but which has only 10MB resident (hmm...am I wrong? are shared pages
never evicted from RAM?)

You can use 'exmap' (if it still compiles) to pull per-page statistics from
your kernel and apportion the memory cost of a shared page pro-rata to all the
processes, look at what is resident and what is not, and get an "Effective
Resident" figure for each process. (You can then look at the process breakdown
by shared lib, and the shared libs by ELF symbol).
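
Recent kernels also do that pro-rata apportionment themselves:
/proc/<pid>/smaps reports a per-mapping Pss ("proportional set size") value,
where each shared resident page is charged 1/N to each of the N processes
mapping it. A small sketch that sums it into an "effective resident" figure,
assuming a kernel new enough to have Pss and permission to read the target
process:

    import sys

    def pss_kb(pid):
        # Each "Pss:" line is one mapping's proportional set size in kB.
        total = 0
        with open("/proc/%s/smaps" % pid) as f:
            for line in f:
                if line.startswith("Pss:"):
                    total += int(line.split()[1])
        return total

    if __name__ == "__main__":
        pid = sys.argv[1] if len(sys.argv) > 1 else "self"
        print("PSS for pid %s: %d kB" % (pid, pss_kb(pid)))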

~~~
old-gregg
jbert, you are not wrong, vmem includes everything. But shared libs have a
much higher chance of being in the working set simply because multiple
processes are using them. I doubt that libc or any of the GTK libraries ever
get swapped out, so it is safe to assume that on most non-starved systems
"real" memory consumption = RES-SHR.

~~~
jbert
I think I am wrong. I now think top's 'SHR' isn't the "total amount of virtual
memory shared", it's the "amount of RES which is shared". (Hmm... but my test
could be conflating "RES" memory with "allocated" VM, i.e. memory which is
backed by _something_, be it swap or RAM.)

So you can indeed subtract SHR from RES to get a feel for per-proc RES.

(But that 'feel' doesn't tell you how many procs that SHR is shared with, and
worse, which ones. If firefox fork()d then all its firefox-specific code pages
would contribute to SHR, but you'd really want to account for them in the
"firefox application").

(I'd also definitely expect chunks of libc and gtk libs to not be resident,
primarily because they're code pages containing only code paths not yet chased
on this box. And if only one app has used those pages, do you really want to call
that 'shared'?)

------
Erwin
64-bit made things worse, especially if you're using interpreted languages,
which often have convenient but heavily indirect data structures.

E.g. creating this:

    d = dict((x, x * 42) for x in xrange(100000))

a dictionary with 100k items in Python takes up 10996 kB of RSS on my 64-bit
system, but 5200 kB on a 32-bit Python (same Python version). You've got your
dictionary hash buckets, the integer objects, etc., all separately allocated
objects. The integers themselves are also 64-bit.

Of course, if I really cared about memory in the above case I'd create an
array or even a list that fits the key/value pattern; the array
array.array('i', (x*42 for x in xrange(100000))) stores the same information
in maximally compact form, increasing RSS by about 500 kB.
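
A rough way to reproduce that kind of measurement (Python 2 to match the
xrange above, and assuming a Linux /proc) is to read VmRSS from
/proc/self/status before and after building the structure; exact numbers will
vary with Python version and allocator behaviour:

    import array

    def vmrss_kb():
        # Current resident set size of this process, in kB.
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])

    before = vmrss_kb()
    d = dict((x, x * 42) for x in xrange(100000))
    print("dict:  +%d kB" % (vmrss_kb() - before))

    before = vmrss_kb()
    a = array.array('i', (x * 42 for x in xrange(100000)))
    print("array: +%d kB" % (vmrss_kb() - before))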

But given a production system with 32GB of memory, this kind of memory
optimisation is rarely done.

~~~
silentbicycle
I ran some informal benchmarks in Lua to see how switching from 32 to 64 bit
affected memory usage, and it seemed to take very roughly 1.3-1.4 times as
much. There's other noise in the totals besides just doubling pointer size,
though - padding in C structs due to pointer alignment, for example.
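
To make the padding point concrete, here's a tiny sketch using Python's ctypes
(rather than Lua, just to stay consistent with the other examples in this
thread): a struct holding a char followed by a pointer is typically 8 bytes on
a 32-bit build but 16 on a 64-bit one, because the pointer doubles in size and
the char is padded out to the pointer's alignment.

    import ctypes

    class Node(ctypes.Structure):
        _fields_ = [("tag", ctypes.c_char),     # 1 byte
                    ("next", ctypes.c_void_p)]  # 4 or 8 bytes, pointer-aligned

    print(ctypes.sizeof(Node))  # typically 8 on 32-bit, 16 on 64-bit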

------
rams
BTW, check out smem ( <http://www.selenic.com/smem/> ) for more accurate
reporting of memory usage - it saves you from having to make the RSS
correction, etc. It's from Matt Mackall, kernel hacker and the main author of
Mercurial.

~~~
aw3c2
Thank you very much for this link. That is great! Can't believe I never knew
about it.

