Is there any particular reason why Oracle DBAs are less likely to believe this? Perhaps it's because most of them grew up in legacy UNIX environments rather than Linux.
2nd is explaining virtual/resident set size.
EDIT: Thank you for all the helpful responses!
This is overly simple and there are a lot of nuances such as shared memory segments, etc.
Can someone explain to me why so many distros, including "enterprise" server stuff, ship with /proc/sys/vm/overcommit_memory set to 1?
- Virtual memory is basically "abstract memory" that is linked (mapped) to RAM or HDD; this means that by accessing this "memory" you might actually be accessing the HDD, and because of this it can be larger than physical RAM.
- Virtual set size is the amount of virtual memory (the above) allocated to the process.
- Resident set size is the amount of physical memory (RAM) allocated to the process.
- Shared memory is memory that is shared among multiple processes, meaning that if you have 10 processes each using 10 MB of resident memory, of which 2 MB is shared, the total resident memory used is not 100 MB but 82 MB (see the sketch below).
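To make "shared among multiple processes" concrete, here is a minimal sketch of my own (assuming Linux/glibc; it is not from the list above): after fork() the parent and child reference the very same physical pages, which is why accounting tools that understand sharing (e.g. the Pss figures in smaps) charge each process only its fraction of them.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        /* 2 MB anonymous shared mapping: after fork() the parent and the
           child use the same physical pages, so the 2 MB should only be
           counted once when adding up what the two processes use. */
        size_t len = 2 * 1024 * 1024;
        char *shared = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (shared == MAP_FAILED) { perror("mmap"); return 1; }

        if (fork() == 0) {                /* child writes into the mapping */
            strcpy(shared, "hello from the child");
            return 0;
        }
        wait(NULL);                       /* parent reads the same page back */
        printf("%s\n", shared);
        return 0;
    }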
At a mechanical level, virtual memory is permission from the operating system to use addresses in your address space. It is so called because, as you point out, it allows us to separate the concept of "memory for a process" from "physical memory on a chip." The reason I further refine the concept is that allocating virtual memory does not allocate actual memory. Let's look at an example:
    /* Ask the kernel for 10 MiB of address space; no physical pages are
       allocated until the memory is actually touched. */
    void* addr = mmap(NULL, 10 * 1024 * 1024, PROT_READ | PROT_WRITE,
                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
The RSS thus usually indicates the amount of heap and stack a process is using that is unique to it.
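Here is a rough, self-contained sketch of that distinction (my own illustration, assuming a Linux /proc filesystem): VSZ jumps as soon as mmap succeeds, but RSS only grows once the pages are actually touched.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Print this process's VSZ and RSS as reported by /proc/self/statm. */
    static void show(const char *label) {
        unsigned long pages_total, pages_resident;
        long pagesz = sysconf(_SC_PAGESIZE);
        FILE *f = fopen("/proc/self/statm", "r");
        if (!f) { perror("statm"); return; }
        if (fscanf(f, "%lu %lu", &pages_total, &pages_resident) != 2) {
            fclose(f);
            return;
        }
        fclose(f);
        printf("%-24s VSZ=%lu kB  RSS=%lu kB\n", label,
               pages_total * pagesz / 1024, pages_resident * pagesz / 1024);
    }

    int main(void) {
        show("before mmap");
        char *p = mmap(NULL, 10 * 1024 * 1024, PROT_READ | PROT_WRITE,
                       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        show("after mmap (untouched)");   /* VSZ up ~10 MB, RSS barely moves */
        memset(p, 1, 10 * 1024 * 1024);   /* fault every page in */
        show("after touching pages");     /* now RSS is up ~10 MB as well */
        return 0;
    }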
(the article doesn't have an anchor there... but you know what I mean)
The rest of that document shows how to use atop to monitor per-process memory usage and identify a memory leak.
> you must read a book if you want to understand it.
"Go read a book" is not a very constructive or helpful answer to a question that - as it turns out - could be answered with a brief comment.
EX: How does virtual memory get mapped to L1 cache?
Would they be better at their job if they did? Probably. But do they have to? I'm not so sure anymore.
Moreover, I think it comes with modern languages that don't force you to manage memory anymore. When you don't need to alloc/free everything you're using, it is easy to get lazy. Factor in a newer generation who've never worked outside a managed-memory language. There are minute details about retaining references, having multiple copies of the same data, or other leaks (file descriptors?).
Some things about htop that I could see as friendlier:
* Htop scrolls with the arrow keys, while atop uses ^F and ^B.
* Htop displays a reminder for some commonly-used commands at the bottom of the screen (e.g. that you need to press 'F1' for help), whereas in atop you have to remember the commands or look them up by pressing 'h' or '?'.
* Htop displays gauges for system-level activity, where atop shows plain numeric lines like these. I think the gauges are a bad tradeoff, though:
CPU | sys 0% | user 1% | irq 0% | idle 799% | wait 0% |
MEM | tot 7.8G | free 5.7G | cache 735.6M | buff 351.2M | slab 248.1M |
The killer feature for atop is logging per-process performance data and reviewing it after the fact.
It represents a fundamental misunderstanding of how modern OSes work. That misunderstanding is not the problem; modern OSes are complex pieces of software, and most people shouldn't have to understand them. OSes should just work. The problem comes in when people who don't understand how they work get the itch to "improve" their system.
It would be interesting if someone more knowledgeable than I am were to do a write-up explaining memory usage in OS X, Windows, and Linux; it would be an awesome resource to share with curious tinkerers who may be slightly misguided in their understanding of the inner workings of their computers.
Windows Vista had an issue where copying large files would set off such a huge swap-storm that the OS became completely unresponsive for several minutes. People gave all the same VM excuses then as well, but there obviously was something wrong.
… [periodically] 'installd' begins using 100% of all 4 CPU cores, my fan goes full speed and the whole computer gets very hot. Seems to happen before Software Update checks for updates. I usually go to the terminal and kill the 'installd' process, which reduces the fan speed and heat to normal within a minute.
I wonder if this guy is also one that says you need to reinstall from scratch every few months to keep things working?
Ubuntu had (or has) the same issue with a daemon used by the graphical package-administration GUIs to rebuild an index (I forget the name). They tried to mitigate it with appropriate nice settings, which defused the issue somewhat, but on old machines you still need to move the cron job to monthly to have a usable system.
You simply can't use the system properly when a background process is using all the resources. And one normally wants to do something other than wait for the system.
I consider such behaviour a bug.
Agreed, but to add to your point: if a background process is taking up all the resources in my system, that defeats the purpose of it being a background process.
One of the most annoying features of modern OSes is when some system process just decides to start going wild, eating memory and CPU. Often I find reinstalling is the only way to fix such things.
If your foreground performance is being impacted too severely (and I haven't seen this from installd, I just noticed and researched installd while removing Mac Keeper (malware) from my wife's laptop) then reboot. It's extreme, but it has the best chance of getting your processes shut down cleanly as opposed to a kill where you could nail a process in the middle of a state that really does not want to persist. Programmers are a lazy sort. They won't consider the effect of termination at each point in their program. You are hunting for bugs using your live system as bait if you kill a program.
In the end I just reinstalled my OS, restored files from Time Machine, and everything was fine. I never did figure out why it was misbehaving. I have had (once) a similar problem with Spotlight. Fortunately, there I knew enough to run lsof to find it had gotten stuck in an infinite loop on one particular mp3 file, which I just deleted.
However, my point (which I should probably have been clearer about) is that bits of OSes are known to just start going wild for no reason, and often killing them, and eventually reinstalling, is the only option.
So... how do reboots work on OS X? On every *nix flavor I know, there's one command that just halts the damn machine, damn the torpedoes, and there's one command that does it more gracefully, by sending progressively harder-to-ignore signals to ~every process except init, ending with SIGKILL (which is not trappable).
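For what it's worth, the graceful path is easy enough to sketch per-process (my own toy illustration, assuming POSIX signals; a real shutdown walks every process and uses its own timeouts):

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Ask a process to exit politely, then force it: roughly what a
       graceful shutdown does for each process before halting. */
    static void stop_process(pid_t pid) {
        if (kill(pid, SIGTERM) == -1) { perror("SIGTERM"); return; }
        for (int i = 0; i < 10; i++) {            /* give it a few seconds */
            sleep(1);
            if (kill(pid, 0) == -1 && errno == ESRCH)
                return;                           /* it exited on its own */
        }
        kill(pid, SIGKILL);                       /* SIGKILL can't be trapped */
    }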
Not only that, but in some cases it is flat-out wrong:
> No, disk caching only borrows the ram that applications don't currently want. It will not use swap. If applications want more memory, they just take it back from the disk cache. They will not start swapping.
Try again. You can tune this to some extent with /proc/sys/vm/swappiness, but Linux is loath to abandon buffer cache, and will often choose to swap out old pages instead.
I have learned this the hard way. For example, on a database machine (where > 80% of the memory is allocated to the DB's buffer pool) try to take a consistent filesystem snapshot of the db's data directory and then rsync it to another machine. The rsync process will read a ton of data, and Linux will dutifully (and needlessly) try to jam this into the already full buffer cache. Instead of ejecting the current contents of the buffer cache, Linux will madly start swapping out database pages trying to preserve buffer cache.
Some versions of rsync support direct I/O on read to avoid this, but they're not mainstream or readily available on Linux. You can also use iflag=direct with dd to get around this problem.
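In case it helps, here is a rough sketch of the direct-I/O idea (mine, not code from rsync or dd; it assumes Linux/glibc and a filesystem that supports O_DIRECT): the reads bypass the page cache entirely, so a bulk copy doesn't evict pages the database still wants.

    #define _GNU_SOURCE            /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

        /* O_DIRECT requires block-aligned buffers, offsets and lengths. */
        int fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd == -1) { perror("open"); return 1; }

        void *buf;
        size_t bufsz = 1 << 20;                  /* 1 MiB, block-size multiple */
        if (posix_memalign(&buf, 4096, bufsz) != 0) {
            fprintf(stderr, "posix_memalign failed\n");
            return 1;
        }

        long long total = 0;
        ssize_t n;
        while ((n = read(fd, buf, bufsz)) > 0)   /* data never enters the page cache */
            total += n;
        if (n == -1) perror("read");

        printf("read %lld bytes via O_DIRECT\n", total);
        free(buf);
        close(fd);
        return 0;
    }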
There are very good reasons that Linux (and most other modern operating systems) makes aggressive use of page caches and buffers. For the vast majority of applications dropping these caches is going to reduce performance considerably (disk is really really slow) and most applications for which this isn't true are probably using O_DIRECT anyway.
The arguments in favor of page caching are: (a) disks have very high latency, (b) disks have relatively low bandwidth, (c) for hot data, RAM is cheaper than disk IO both in dollars and in watts, and (d) it's basically free because the memory would have been unused anyway.
The arguments against page caching are: (a) occasionally the kernel will make poor choices and do something sub-optimal and (b) high numbers in 'free' make me feel better.
Too many inexperienced operators (or those experienced on other OSs) confuse disadvantage (a) for disadvantage (b) and decide to drop caches using a cron job.
 Old but good: ftp://ftp.research.microsoft.com/pub/tr/tr-97-33.pdf
The cache dropping is actually useful when you are doing benchmarking...
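For that benchmarking case, a minimal sketch of the documented /proc interface (run as root, and very much not something to put in cron):

    #include <stdio.h>
    #include <unistd.h>

    /* Flush dirty data, then ask the kernel to drop clean pagecache,
       dentries and inodes so the benchmark starts from a cold cache. */
    int main(void) {
        sync();
        FILE *f = fopen("/proc/sys/vm/drop_caches", "w");
        if (!f) { perror("drop_caches (need root)"); return 1; }
        fputs("3\n", f);
        fclose(f);
        return 0;
    }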
My response was more to the "Let's Put 'echo 3 > /proc/sys/vm/drop_caches' In Cron and Get Free RAM!!!!" thinking, which sadly seems to be widespread.
Linux just looks like it ate your RAM. Firefox straight up does eat it.
VMs like VMware have the exact same problem, where the host might want memory used by the guest, and you end up with weird scenarios where the guest's swap is in the host's disk cache, but the guest's memory is in the host's swap. One of the things guest tools are supposed to do is communicate with the host about memory pressure. Firefox lacks this feedback mechanism.
Modern web browsers have architectures similar to OSes at this point - because they have requirements similar to OSes. I think it's natural that they will take on some of the same responsibilities.
Insert here some pithy comment about Apollo missions and Twitter, or whatever.
I guess so many of us mention FF in this context because it still happens to us, even though we gamely continue to use it. But we still love it, and that's why we continue to use it. Though admittedly, most of us have a Chrome on the side...
gnome-system-monitor has a top-like monitor as well as graphs, and measures memory properly (including a discount for shared maps); smem works in the console; it doesn't have a term interface like top, but it can be combined with watch.
Question for the crowd: on that site, the example given says that in reality there are 869MB of used RAM. I'm comparing this with my VPS values and would like to know if this is the sum of some column in top. Is it? It looks like it's pretty close to the sum of the SHR column. Does this make sense? Thanks in advance.
And you can't just subtract the shared memory numbers, because different sets of pages are shared between different sets of processes, and top doesn't give enough information to figure out what's actually happening where.
Running the pmap tool on all pids and summing the Pss numbers is perhaps the closest you can get to the actual memory use.
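A rough sketch of that summation for one process (my own, assuming the Linux /proc/<pid>/smaps format; looping it over every entry in /proc gives the system-wide figure):

    #include <stdio.h>

    /* Sum the Pss: lines of a process's smaps. Pss splits shared pages
       evenly between the processes mapping them, so the per-process
       totals add up to (roughly) real memory use. */
    int main(int argc, char **argv) {
        char path[64], line[256];
        long kb, total = 0;
        snprintf(path, sizeof path, "/proc/%s/smaps", argc > 1 ? argv[1] : "self");
        FILE *f = fopen(path, "r");
        if (!f) { perror(path); return 1; }
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "Pss: %ld kB", &kb) == 1)
                total += kb;
        fclose(f);
        printf("%s: %ld kB PSS\n", path, total);
        return 0;
    }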