
The many load averages of Unix - jsnell
https://utcc.utoronto.ca/~cks/space/blog/unix/ManyLoadAveragesOfUnix
======
CoffeeDregs
Well that was probably as frustrating to write as it was to read... The author
does a nice job of forensics but can't _really_ pin down the meaning on any
particular OS.

My preference would be to have a well-defined, even if of-limited-use, load
rather than a poorly-defined, but maybe-more-useful?, load. From the article,
it sounds as if kernel authors tried to implement the former, found it
difficult to do as machines grew and then everyone headed off in different
directions only to implement lots of the latter...

~~~
digi_owl
Best I can tell after a quick glance at the history of the metric, the
value of including IOWait depends more on the hardware than the software.

If your IO actions are performed by CPU rather than DMA, having them in there
makes sense (it will slow down anything else). But if it is performed via DMA
(or in other ways offloaded from the CPU) it makes less sense.

~~~
jws
Generally, processes in IOWait are not running, they are sitting around in a
queue waiting for their data.

Anyone waiting on the process is feeling pain, but it isn't eating any CPU so
it comes down to what you want the metric to mean. Should it reflect user
discomfort or consumption of CPU cycles?
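
To make the two readings concrete, here is a rough sketch of an exponentially damped load average computed both ways. The 5-second sample interval and 60-second period follow the classic 1-minute average; the function names and the process counts (1 runnable, 3 waiting on IO) are made up for illustration, and no particular kernel is claimed to do exactly this.

```python
import math


def damped_update(prev_load, active, interval_s=5.0, period_s=60.0):
    """One sampling step of an exponentially damped moving average."""
    decay = math.exp(-interval_s / period_s)
    return prev_load * decay + active * (1.0 - decay)


def simulate(runnable, io_waiting, count_io_wait, steps=360):
    """Feed a constant process mix into the average for `steps` samples.

    runnable and io_waiting are hypothetical counts; count_io_wait picks
    whether processes blocked on IO are included in the sampled total.
    """
    active = runnable + (io_waiting if count_io_wait else 0)
    load = 0.0
    for _ in range(steps):
        load = damped_update(load, active)
    return load


# Same machine state, two definitions of "load".
cpu_only = simulate(runnable=1, io_waiting=3, count_io_wait=False)
with_io_wait = simulate(runnable=1, io_waiting=3, count_io_wait=True)
print(f"CPU-only load: {cpu_only:.2f}, load including IO wait: {with_io_wait:.2f}")
```

With the same process mix, the first number tracks CPU consumption and the second tracks how many processes are stuck waiting for something, which is closer to the "user discomfort" reading.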

~~~
digi_owl
Yes the process is sleeping, but the system (OS and hardware) is still active
doing whatever IO was requested.

------
planckscnst
Brendan Gregg has a nice video on how Solaris 10's iowait works. It starts a
little slow and may even sound imprecise or inaccurate, but it gets better
around 6 minutes in.

http://dtrace.org/blogs/brendan/2011/06/24/load-average-video/

------
guardiangod
As the article said, NFS has a nasty habit of rocketing your load count when
the server fails, and NFS can be very unstable if everything aligns correctly
(or badly.)

When the NFS server decides to take a rest I've seen load averages of 1000+ on
production Linux (3.13+) systems

Or kernel panic

Or OOM

Or cats and dogs living together

------
unixhero
I appreciate the effort, but that article told me almost nothing.

This one from Linux Journal [0] did, however.

[0] - http://www.linuxjournal.com/article/9001

------
amelius
I think a good metric is directly related to the amount of progress a single
thread with default priority makes on the CPU per unit time.
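
A minimal sketch of that idea, assuming "progress" is approximated by the CPU time one busy-looping thread receives divided by the wall-clock time that passed. The function name `progress_ratio` and the 0.2-second window are made up for illustration; a real probe would also have to care about CPU frequency scaling, SMT and cache contention, which this ignores.

```python
import time


def progress_ratio(wall_seconds=0.2):
    """Spin for roughly wall_seconds and return CPU time / wall time.

    Close to 1.0 means this thread had a core to itself; lower values
    mean it spent part of the window waiting to be scheduled.
    """
    start_wall = time.perf_counter()
    start_cpu = time.thread_time()  # CPU time of the calling thread only
    while time.perf_counter() - start_wall < wall_seconds:
        pass  # busy loop: always asks for CPU
    cpu_used = time.thread_time() - start_cpu
    wall_used = time.perf_counter() - start_wall
    return cpu_used / wall_used


ratio = progress_ratio()
print(f"fraction of wall time spent on CPU: {ratio:.2f}")
```

Unlike the classic load average, this says nothing about processes blocked on IO; it only measures how contended the CPU is from one thread's point of view.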

