
Linux Load Averages: Solving the Mystery - dmit
http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
======
ChuckMcM
Awesome analysis, I have added it to my favorites list. Around 1990 or so I
was in the kernel group at Sun, and a team had just embarked on the multi-
processor kernel work that would later result in the 'interrupts as
threads'[1] paper. During that time there was an epic email thread titled
something like "What the F*ck does load average mean on an MP system?" (no
doubt I have a copy on an unreadable quarter inch tape somewhere :-(). If it
helps, the exact same pivot point was identified, which is this: does 'load
average' mean the load on the CPU or the load on the system? While there were
supporters in the 'system' camp, the traditionalists carried the day with "We
can't change the definition on existing customers, all of their shell scripts
would break!" or something to that effect. Basically, the response was that if
we were to change it, we would have to call it something different to maintain
a commitment to the principle of least surprise. This has never been an issue
for Linux :-).

As a "systems" guy I am always interested in how balanced the system is, which
is to say that I am always trying to figure out what the slowest part of my
system is and ensuring that it is within some small epsilon of the other
parts. If you do that, then system load is linear with workload almost
regardless of task composition: disk-heavy processes load the "system" as much
as compute-heavy, memory-heavy, or network-heavy ones do. In an imaginary
world you could decompose a system into 'resource units' and then optimize it
for a particular workload.

[1]
[http://dl.acm.org/citation.cfm?id=202217](http://dl.acm.org/citation.cfm?id=202217)

~~~
samstave
Uh... a complete but relevant aside:

All you old farts (TM) need to get these freaking quarter inch tapes pushed up
to some Glacier S3 bucket or some such bucket before you kick said bucket...

I'm serious. C'mon, don't steal from the future what you actually did in the
past to make the present the reality of today!!

~~~
ChuckMcM
It's all about the context though :-). One of the things the Computer History
Museum has done a good job of is capturing a lot of the historical
underpinnings. But it's interesting mostly in a "wow, isn't that interesting"
sort of way, like the story that railroad track gauge is the same as cart
ruts, which are the same as Roman chariot widths (not exactly:
[http://www.snopes.com/history/american/gauge.asp](http://www.snopes.com/history/american/gauge.asp)).

Much of this stuff was fairly constrained by the choices of the time, and as
such the information generally ages poorly.

~~~
samstave
The usefulness of the info may age poorly... but the historical context won't.

Else we end up in a tech situation far in the future where the world looks in
the mirror on acid and says "how the fuck did we get here?" - and the tapes
provide no answers.

Btw, just last weekend I was harvesting 100-year-old railroad spikes from a
western timber company rail in Sonora because of their historical
significance - not because I plan to lay a new track...

------
siebenmann
This is great work in general and excellent historical research.

As an additional historical note: in Unix, load averages were introduced in
3BSD, and at that time they included processes in disk IO wait and other
theoretically short-term waits that weren't interruptible. This definition was
carried through the BSD series and onward into Unixes derived from them, such
as the initial versions of SunOS and Ultrix. At some point (perhaps SunOS 3 to
SunOS 4, perhaps later), the SunOS/Solaris definition changed to be purely
runnable processes.

(I'm not sure what System V derived Unixes such as Irix, HP-UX, and so on did,
and their kernel source is not readily available online for spelunking.)

As of early 2016 when I last looked at this, the situation on FreeBSD,
OpenBSD, and NetBSD was somewhat tangled. FreeBSD load average only included
runnable processes, but NetBSD and OpenBSD counted some sleeping or waiting
processes as well.

~~~
EvanAnderson
When details of a piece of "open" software are so easily lost, I shudder to
think about the vast quantity of "closed" software that has had its history
lost.

I also kept thinking about how the term "software archaeology" (which I first
saw in the 1999 Vernor Vinge novel "A Deepness in the Sky") becomes more and
more mainstream each day.

~~~
taneq
I always thought "programmer-at-arms" was a brilliant job title, and it well
describes what some of us do.

------
Twirrim
Several years back the company I worked for ended up picking up some work for
a client. Every quarter we'd download a huge trove of TIFFs from some source,
and then do some image conversion work before transferring them to the
customer's infrastructure.

There was a Java application that powered the logic side of things, calling
out to ImageMagick to do the actual processing and conversion. For whatever
reason, after careful benchmarking we settled on a Java thread count that
happened to get us peak throughput, but that also caused the system load
average to hit around 400 and hold steady at about that level.

The day that happened, and I could show that no application on the server took
a performance hit, was the day that I _finally_ persuaded my boss that load
average is an interesting stat, but it's not the be-all and end-all, and that
a high load average doesn't necessarily correlate to an actual problem.

~~~
Bluecobra
I had something similar happen a long time ago on an x86 Solaris 10 mail
server. An employee thought it was a good idea to share best-quality, full-
resolution JPEG pictures of his new baby with the whole company. This swamped
the mail server (the load average was well over 700) while it chugged through
delivering a 50 MB email to 200+ employees. I forget which process was the
culprit (I think GNU Mailman), but after a couple of hours it finally settled
down. I was amazed that I could still SSH into it and figure out what
happened.

------
sreque
One source of high load average spikes that I've seen in my job is when a
process crashes and generates a core dump. While the core dump is being
written, all threads in the process are in the TASK_UNINTERRUPTIBLE state even
though they are doing absolutely nothing, and as such they all count towards
the load average as if they were spinning on a CPU core. If the total virtual
memory of the process is large, say in the multi-GB range, core dumping can
take on the order of a minute, and Linux will report an unreasonably high
load average if that process had a lot of running threads.
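
A minimal reproducer sketch (my own, hypothetical; assumes Linux with core
dumps enabled via `ulimit -c unlimited`): touch a few GB of memory, start
some idle threads, and abort. While the kernel writes the core, `uptime`
shows the load average climbing even though nothing is on-CPU.

    /* build: cc -pthread -o dumper dumper.c */
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    
    #define NTHREADS 64
    #define ALLOC_BYTES (4UL << 30)     /* 4 GiB of anonymous memory */
    
    static void *idle(void *arg) {
        (void)arg;
        for (;;)
            pause();                    /* do nothing, forever */
        return NULL;
    }
    
    int main(void) {
        char *buf = malloc(ALLOC_BYTES);
        if (buf == NULL)
            return 1;
        memset(buf, 1, ALLOC_BYTES);    /* fault pages in so they end up
                                           in the core dump */
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, idle, NULL);
    
        sleep(1);                       /* let the threads start */
        abort();                        /* dump core: every thread goes
                                           TASK_UNINTERRUPTIBLE */
    }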

Things like the above scenario make me treat the load average metric with a
lot of skepticism. I would much rather use other metrics to infer load.

~~~
lotyrin
I rarely recommend alerting, monitoring, or any kind of action based on load
averages or, more generally, any metric derived from queue lengths. It's
trends in high-quantile queue _latencies_ that your users (and therefore you)
should care about.

~~~
haimez
Kind of ironic, given that the whole article is about the divergence of system
load from being a queue length metric.

------
saalweachter
If it was bothering anyone else: yes, the parentheses in the patch in the
email are unbalanced, and the code was checked in as:

    
    
                    if (*p && ((*p)->state == TASK_RUNNING ||
                               (*p)->state == TASK_UNINTERRUPTIBLE ||
                               (*p)->state == TASK_SWAPPING))

~~~
mentat
They're... not unbalanced.

------
simonjgreen
Under Better Metrics the author discusses ways of drilling down to find the
source of a high load average. I feel like this section should mention
`atop`, which is IMO a really underrated single-pane-of-glass view into
everything your system is doing, now and historically.

If you haven't tried `atop`, give it a go.

The historical analysis in this article is great, though, because while load
average has been an oft-discussed and well-understood topic for a long time,
the decisions that got us there are not.

~~~
teddyh
I sometimes hear about “atop”, wonder _Why haven’t I installed this?_,
install it, discover that it starts (and requires) _two_ additional daemon
processes, at which point I remember why, and promptly uninstall it again.

------
mnw21cam
Good article. However, it is missing the reason why load averages include
tasks waiting for disc/swap.

One of the things the load average is sometimes used for is to work out
whether it is appropriate to start more processes running on a system. For
example, make has a "-l" option, which prevents more parallel jobs from being
run while the load is above a supplied number. When a system is thrashing due
to insufficient RAM, the load average will be high, and this option will
appropriately prevent more tasks from being started that would make the
thrashing worse. If the load average were based on CPU only, it would be low
while thrashing, and using that make option could lead to complete system
collapse.
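
For illustration, a sketch of the check that `make -l` effectively performs
(my own simplification, reading the 1-minute value from /proc/loadavg; GNU
make's real implementation differs):

    #include <stdio.h>
    
    /* Return the 1-minute load average, or -1.0 on error. */
    static double load_1min(void) {
        double load = -1.0;
        FILE *f = fopen("/proc/loadavg", "r");
        if (f != NULL) {
            if (fscanf(f, "%lf", &load) != 1)
                load = -1.0;
            fclose(f);
        }
        return load;
    }
    
    int main(void) {
        const double max_load = 8.0;   /* as if invoked with `make -l 8` */
        double now = load_1min();
        if (now >= 0.0 && now < max_load)
            printf("load %.2f below limit, start another job\n", now);
        else
            printf("load %.2f too high, defer new jobs\n", now);
        return 0;
    }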

------
Filligree
Google cache:
[http://webcache.googleusercontent.com/search?q=cache:taDucb9...](http://webcache.googleusercontent.com/search?q=cache:taDucb9WN2YJ:www.brendangregg.com/blog/2017-08-08/linux-load-averages.html+&cd=1&hl=en&ct=clnk&gl=no)

------
Florin_Andrei
> _As a set of three, you can tell if load is increasing or decreasing_

That could be accomplished with a set of two.

A set of three could in theory give you _acceleration_.
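
As a worked example (my own illustration, not from the article): writing the
1-, 5-, and 15-minute values as x1, x5, x15, the first difference gives the
trend and the second difference gives a crude acceleration:

    \text{trend} \approx x_1 - x_5, \qquad
    \text{acceleration} \approx (x_1 - x_5) - (x_5 - x_{15})

So 1/5/15 values of 8.0, 4.0, 1.0 mean load is not only rising (8.0 > 4.0 >
1.0) but rising faster recently ((8.0 - 4.0) - (4.0 - 1.0) = 1.0 > 0). Crude,
because the three windows are not evenly spaced.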

~~~
btilly
This comment would make perfect sense if load were a smooth function. But it
is not. It tends to be a step function.

The most recent two data points tell you whether the problem is currently
getting worse, getting better, or holding steady. The third gives you a sense
of whether it has been doing so for a while.

~~~
Florin_Andrei
> _This comment makes perfect sense if load is a smooth function. But it is
> not. It tends to be a step function._

I think that depends on the sampling frequency, doesn't it? (given a modern OS
with lots and lots of threads and processes)

~~~
lkrubner
No. Check out this video by Zach Tellman. He talks about queues and how they
break down under load. One of the least intuitive things he points out is that
when you have more processors, the breakdown tends to be more of a step
function: everything is running smoothly till the moment that it isn't.

[https://www.youtube.com/watch?v=1bNOO3xxMc0](https://www.youtube.com/watch?v=1bNOO3xxMc0)

The point he makes arises from basic queueing theory and is applicable to all
kinds of systems and how those systems react to load. It's got little to do
with particular hardware and everything to do with basic math.

~~~
hackits
That was a great talk, and the recommendations he makes are really sound.

------
hathawsh
This analysis cleared up a mystery for me. I've noticed that when a server app
is under heavy load in Linux, the load average goes high if the bottleneck is
the CPU or the disk, but stays low if the bottleneck is network resources
(like databases or microservice calls). I know why that happens (tasks blocked
on the network sleep interruptibly, so they aren't counted), but it's very
unintuitive and it confused me when I was new to Linux. I thought load average
would measure the CPU load only. Now I know the historical reasons for
measuring system load instead of CPU load.

I kind of like it the way it is, since it's handy to be able to distinguish
network load from CPU+disk load just by looking at the load average. However,
since the load average includes other stuff as well, sometimes I still don't
know what the load average really means.
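
A tiny demo of why that happens (my sketch; Linux-specific): a process
blocked on a socket read sits in interruptible sleep, state 'S' in /proc, and
so is excluded from the load average, unlike a task in uninterruptible disk
wait (state 'D').

    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>
    
    int main(void) {
        int sv[2];
        char buf[1];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            return 1;
        printf("pid %d blocking on a read; check the state field in "
               "/proc/%d/stat - it stays 'S'\n", (int)getpid(), (int)getpid());
        read(sv[0], buf, sizeof buf);   /* blocks forever: no writer */
        return 0;
    }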

------
ty_a
Holy crap, Brendan Gregg's site went down. Proof he is human I guess?

~~~
brendangregg
Yes, sorry. I guess it's proof this is a hobby on some personal hosting that
can get overloaded. Try refreshing. Although its load averages (couldn't
resist) aren't that high:

    
    
        10:36:09 up 34 days, 20:05,  1 user,  load average: 2.39, 2.34, 2.08

~~~
unilynx
Load averages should arguably go even further and include network load -
right now, a saturated Ethernet card doesn't show up in the load.

(Network card load is one of the next metrics I check if load average and
wait%/user% etc. aren't telling me what's wrong.)
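
A rough sketch of that follow-up check (mine, not from the article): sample
the receive byte counter for one interface from /proc/net/dev twice, a second
apart. The interface name `eth0` is an assumption; adjust for your system.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    
    /* Return cumulative received bytes for ifname, or -1 if not found. */
    static long long rx_bytes(const char *ifname) {
        char line[512];
        long long rx = -1;
        FILE *f = fopen("/proc/net/dev", "r");
        if (f == NULL)
            return -1;
        while (fgets(line, sizeof line, f)) {
            char name[64];
            long long bytes;
            /* data lines look like: "  eth0: 12345 67 0 0 ..." */
            if (sscanf(line, " %63[^:]: %lld", name, &bytes) == 2 &&
                strcmp(name, ifname) == 0)
                rx = bytes;
        }
        fclose(f);
        return rx;
    }
    
    int main(void) {
        const char *ifname = "eth0";    /* assumption: pick your NIC */
        long long before = rx_bytes(ifname);
        sleep(1);
        long long after = rx_bytes(ifname);
        if (before >= 0 && after >= 0)
            printf("%s: %lld rx bytes/sec\n", ifname, after - before);
        return 0;
    }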

------
ge96
Why isn't there a load-style metric for RAM in i3? I read something about how
it's hard to gauge RAM usage, even though htop displays it, as does inxi. In
general on Windows you look at Task Manager and there is memory usage.

~~~
stephengillie
Do you want RAM use, or virtual memory use?

Here's an article on gathering this data on Windows with Powershell:

[https://www.petri.com/display-memory-usage-powershell](https://www.petri.com/display-memory-usage-powershell)

~~~
ge96
Thanks, I primarily use Linux with i3-wm.

Not sure which is which, RAM or virtual memory, though; probably not virtual
memory, my computers are generally garbage.

Thanks for the link.

~~~
stephengillie
Virtual RAM is what your OS gives out to your programs. Hardware devices will
also have addresses mapped in this space.

Physical RAM is just that - the physical RAM in your PC. Virtual RAM uses
this entirely, and then some. The OS keeps page tables that map virtual
address locations to physical address locations. Addresses which are in use
by programs, but not frequently accessed, are "paged" (written) out to the
swap file on the storage device. In this way, programs get the safety net of
having _every version of every possible library ever written in any
permutation of the universe_ loaded into memory, while the OS conserves fast
physical RAM for other active programs.

This is why 32-bit OSes present a variable amount of RAM (less than 4GB) on
systems with 4GB of physical RAM. They can only address 4GB of address space,
and each device has to use a few of those addresses for its hardware
mappings. So 32-bit OSes with more devices actually had slightly less RAM
available to programs.

This only scratches the surface. If you want real fun, delve into the Windows
32-bit 3GB user-mode checkbox.
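
A small sketch of the virtual-versus-physical distinction on Linux (my own
illustration): allocating address space costs almost no physical RAM until
the pages are actually touched. Watch VmSize versus VmRSS in
/proc/self/status before and after the memset.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    /* Print the VmSize and VmRSS lines from /proc/self/status. */
    static void show(const char *when) {
        char line[256];
        FILE *f = fopen("/proc/self/status", "r");
        if (f == NULL)
            return;
        while (fgets(line, sizeof line, f))
            if (strncmp(line, "VmSize:", 7) == 0 ||
                strncmp(line, "VmRSS:", 6) == 0)
                printf("%s %s", when, line);
        fclose(f);
    }
    
    int main(void) {
        size_t n = 1UL << 30;       /* reserve 1 GiB of address space */
        char *p = malloc(n);
        if (p == NULL)
            return 1;
        show("after malloc:");      /* VmSize jumps, VmRSS barely moves */
        memset(p, 1, n);            /* now actually touch every page */
        show("after memset:");      /* VmRSS catches up */
        free(p);
        return 0;
    }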

------
faragon
It's incredibly detailed, including references and historical investigation.
Mind-blowing. Kudos, Brendan Gregg.

------
vfaronov
Worth remembering that essentially the same issue exists at a lower level: the
“%Cpu” number as shown by top includes not just the share of time spent
actually executing your instructions, but also the share of time waiting on
memory access.

As explained by the same author:
[http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html](http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html)

------
solarengineer
When I'd asked Brendan via Twitter for an article on Load Averages in Linux, I
hadn't expected such a detailed response. I've worked on a few projects where
I've had to show that even though the "load" on the Linux system was low, it
was really the steal% and the iowait that were killing performance. I'm sure
that from now on, so many system and support engineers will have a good
article to reference. Thanks, Brendan!

~~~
brendangregg
Yes, thanks for the question, it reminded me that I'd never got to the bottom
of uninterruptible before, so I was really determined to do so this time.

------
sytringy05
My company took over production support of an ESB from another company for a
client a couple of years ago. The worker nodes had about 100 JVMs running on
them, and the resting load avg was around 30. This on a 2-CPU RHEL VM.

Out of morbid curiosity, I restarted one of the test servers and ran top.
Load avg was on the order of 2200 for about 3 hours.

The worst part was that the guys we took it over from didn't even think it
was a problem.

~~~
simtel20
What was the business impact of the high load? When reducing the load improves
the company's bottom line, it should absolutely be pursued.

------
mnarayan01
Page swapping seems like it makes a lot of sense to include in the load
average. Disk I/O seems like something more towards the opposite end of the
spectrum, though TASK_KILLABLE
([https://lwn.net/Articles/288056/](https://lwn.net/Articles/288056/))
presumably mitigates this where used.

------
rotten
What we need is a systems model that allows us to assess the overall health of
a server in a single metric. Indicators of something under strain would be
reflected in the metric and draw our attention for further drilldown and
analysis. "Load Average" is the metric we (the systems community) have
generally been using for this. Unfortunately it appears that the model it is
based on may be rather dated and may have flaws, which means we may miss or
misinterpret system health status by relying on that number. So the million
dollar question is: starting from scratch, how can we design a model of our
system that yields a reliable system health indicator metric?

------
mobilethrow
OT: what could cause a system to have a load of 1 when _idle_?

I have one (unimportant) Linux system that idles with a load of exactly 1. The
issue persists through reboots. It is a KVM virtual machine and qemu confirms
nothing is going on in the background.

Any ideas how to find out what's causing it?

~~~
blinkingled
Any processes stuck in D state?
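
A quick way to answer that (a hypothetical helper, roughly what `ps` does):
scan /proc for tasks whose state field is 'D'.

    #include <ctype.h>
    #include <dirent.h>
    #include <stdio.h>
    
    int main(void) {
        DIR *d = opendir("/proc");
        struct dirent *e;
        if (d == NULL)
            return 1;
        while ((e = readdir(d)) != NULL) {
            if (!isdigit((unsigned char)e->d_name[0]))
                continue;               /* only numeric (pid) entries */
            char path[280], comm[64], state;
            int pid;
            snprintf(path, sizeof path, "/proc/%s/stat", e->d_name);
            FILE *f = fopen(path, "r");
            if (f == NULL)
                continue;               /* process may have exited */
            /* /proc/<pid>/stat starts: pid (comm) state ... */
            if (fscanf(f, "%d (%63[^)]) %c", &pid, comm, &state) == 3 &&
                state == 'D')
                printf("%d %s\n", pid, comm);
            fclose(f);
        }
        closedir(d);
        return 0;
    }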

~~~
mobilethrow
There is one, a [hwrng] process, but I don't think that's it. It's also in D
state on other virtual machines on the same host without this symptom.

(The process is probably from virtio-rng.)

~~~
blinkingled
I am assuming the guest (and host) kernels are sufficiently recent? I remember
older kernels having a bug with load calculation. Does disabling virtio-rng
help? D-state processes will cause a rise in load average, depending on
NRCPUs.

------
fanf2
I thought that including disk wait in the load average was a common Unix
feature. Sadly I can't go spelunking through the archives right now, but it
would be interesting to see what Solaris and BSD do, for comparison with
systems a little bit closer to Linux than TENEX :-)

~~~
brendangregg
Solaris and BSD load averages are based on CPU only. As for avoiding TENEX,
here's the comment from the FreeBSD src:

    
    
        /*
         * Compute a tenex style load average of a quantity on
         * 1, 5 and 15 minute intervals.
         */
        static void
        loadav(void *arg)
        {
        [...]
    

:)

------
gciruelos
There's a very good (and old) article about Linux load averages here:
[http://www.linuxjournal.com/article/9001?page=0,0](http://www.linuxjournal.com/article/9001?page=0,0)

------
js2
It's been years, but I do remember that the Solaris load avg used to be
similarly affected by I/O, particularly NFS.

------
JaggerFoo
Great article. Interesting, insightful, and actionable.

Cheers

------
caf
brendan, you could consider adding an option to offcputime that merges all
kworker stacks together, since they're really just separate workers in the
same thread pool.

~~~
brendangregg
good idea, thanks.

------
SoMisanthrope
Brilliant. Time to patch it back to CPU loads.

~~~
dbenhur
No, it's long past time to discard it. Measure what you actually care about,
not some synthetic, poorly understood "load".

------
Steeeve
This is an incredible analysis! Well done!

------
shimon_e
netdata is a good tool if you are looking for precise data on where the
bottlenecks are on your server.

------
Annatar
"Do we want to measure demand on the system in terms of threads, or just
demand for physical resources?"

The intent behind load averages is to measure how (over)loaded the hardware
is; if you now try to re-define that intent, it will just be yet another Linux
atrocity where Linux is "special" and behaves differently from every other
UNIX-like OS (exempli gratia: ss versus netstat). I would argue that this
would add to the momentum against Linux already fueled by, and well underway
with, the systemd fiasco. You would break the rule of least surprise. It's bad
enough that Linux measures load differently from all other UNIX-like operating
systems; this would make the situation even worse.

