
How CPU load averages work, and using them to triage webserver performance - ingve
http://jvns.ca/blog/2016/02/07/cpu-load-averages/
======
heinrichhartman
Be aware that load averages are exponentially damped averages, not moving
averages over windows of length 1, 5, and 15 minutes.
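
To make the difference concrete, here is a rough floating-point sketch of an exponentially damped update. It assumes the run queue is sampled every 5 seconds, as on Linux; the kernel itself uses fixed-point arithmetic, and the names `update_load` and `simulate` are made up for illustration.

```python
import math

SAMPLE_INTERVAL_S = 5.0  # assumption: run queue sampled about every 5 seconds


def update_load(load, active, window_s, interval_s=SAMPLE_INTERVAL_S):
    """One damped-average step: old value decays, new sample fills the rest."""
    decay = math.exp(-interval_s / window_s)
    return load * decay + active * (1.0 - decay)


def simulate(active_counts, window_s):
    """Feed a sequence of run-queue samples into the damped average."""
    load = 0.0
    for active in active_counts:
        load = update_load(load, active, window_s)
    return load


# One minute (12 samples) with exactly one runnable task, starting from idle.
one_minute_load = simulate([1] * 12, window_s=60.0)
print(round(one_minute_load, 3))
```

A true 1-minute moving average would read 1.0 after that minute. The damped version only reaches 1 - 1/e, about 0.63, and keeps creeping toward 1.0 afterwards.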

More quirks with linux performance tools are outlined in Brendan Gregg's talk:

[http://www.slideshare.net/brendangregg/broken-linux-performa...](http://www.slideshare.net/brendangregg/broken-linux-performance-tools-2016)

[https://www.youtube.com/watch?v=9U4jFpsEyYE&t=150](https://www.youtube.com/watch?v=9U4jFpsEyYE&t=150)

------
catchmrbharath
One of the best articles I read on the subject was this
[http://blog.paralleluniverse.co/2014/02/04/littles-law/](http://blog.paralleluniverse.co/2014/02/04/littles-law/)

------
sserrano44
Load average measures how long the queue of processes waiting for the CPU is.

Most web applications aren't CPU intensive, rather they depend on many other
services like the database, S3, ... If a web worker is waiting for another
service it should release the CPU until the service responds.

So it is entirely possible that all workers are waiting for other services, your
app is not responding and the CPU is free.

~~~
creshal
> If a web worker is waiting for another service it should release the CPU
> until the service responds.

Emphasis on "should". With your average drivel (e.g. WordPress) this doesn't
happen and the load on the application server scales nicely with the response
time of the database server. 10 requests waiting on a locked table for 2
seconds = load of 10.

~~~
icebraining
Are you saying that the PHP mysql driver waits for the results with a busy
loop? I have to say I find that hard to believe.

~~~
creshal
> I have to say I find that hard to believe.

¯\_(ツ)_/¯

Steps to reproduce: Set up one mysql server. Set up one PHP application server
(nginx+PHP fpm in my case, shouldn't make a difference with mod_php) on a
different machine. Configure Wordpress to use said mysql server. Lock the
database in some way (say, mysqldump of a Wordpress installation with millions
of records in wp_options), then hit the application server with regular
traffic (or ab), watch as load reaches pm.max_children. On the application
server, not the database server.

------
daveguy
Related, but not the same... There was an article on HN about various tools to
evaluate resource usage in the last month or two: top, lsof, netstat, etc. I
think it was put out by a big name in web services: Netflix, Amazon, etc (but
I'm not sure). I was sure I had it bookmarked, but I haven't been able to find
it. Does anyone remember that post or have it readily available?

~~~
severine
This?
[https://news.ycombinator.com/item?id=10654681](https://news.ycombinator.com/item?id=10654681)

------
jbert
Isn't the calculation (assuming, for simplicity, one CPU running at 100% purely
on application load):

60 requests/sec => each request takes 1/60 of a CPU-second ≈ 16.7ms of CPU time
to process? (This is time-on-cpu, and doesn't include time-waiting-for-cpu. I
think time-on-cpu is the number you want if you're looking at optimising your
codebase)
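
Spelled out as a quick sketch (the function name and the `busy_cpus` parameter are made up here; the 60 requests/sec and the single saturated CPU are the assumptions stated above):

```python
def cpu_ms_per_request(requests_per_s, busy_cpus=1.0):
    """CPU time consumed per request, in milliseconds.

    busy_cpus is the number of CPUs' worth of time the application
    burns each second (1.0 = one CPU pinned at 100%).
    """
    return busy_cpus / requests_per_s * 1000.0


on_cpu_ms = cpu_ms_per_request(60)
print(f"{on_cpu_ms:.1f} ms on-CPU per request")
```

This only counts time actually spent executing, so it is a lower bound on what a request experiences once there is a queue.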

~~~
deathanatos
She does mention:

> each request was taking 6 / 60 = 0.1s = 100ms of time
> _using-or-waiting-for-the-CPU_.

(emphasis mine)

In my original read, I thought her core count was greater than her load, so
that would also be her direct time-on-cpu. Now I'm not so sure.

And while time-waiting-for-cpu might not be important to optimizing the
codebase, you probably still want to know that your serving processes are
waiting for CPU; after all, it is that number that your user's browser is
seeing (at least, between the two it is more so that one). Such a result might
indicate a larger machine or more machines are required, for example.
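
As a rough sketch of how the two numbers relate, Little's law gives time-in-system as load divided by throughput. The load of 6 and the 60 requests/sec are from the article; the single fully busy CPU is an assumption carried over from the parent comment, and the names are invented for illustration.

```python
def time_in_system_ms(load, requests_per_s):
    """Little's law: average time using or waiting for the CPU, in ms."""
    return load / requests_per_s * 1000.0


REQUESTS_PER_S = 60
LOAD = 6
BUSY_CPUS = 1  # assumption: one CPU, fully busy with application work

total_ms = time_in_system_ms(LOAD, REQUESTS_PER_S)
on_cpu_ms = BUSY_CPUS / REQUESTS_PER_S * 1000.0
waiting_ms = total_ms - on_cpu_ms

print(f"total {total_ms:.1f} ms = on-CPU {on_cpu_ms:.1f} ms + waiting {waiting_ms:.1f} ms")
```

Under those assumptions most of the 100ms would be queueing rather than execution. With more cores than load, the waiting share shrinks toward zero and the two figures converge.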

------
halayli
Any decent web server shows the response time. This can easily be verified. I
am not sure why such a number isn't tracked in the first place.

