
Understanding CPU Steal Time - when should you be worried? - itsderek23
http://blog.scoutapp.com/articles/2013/07/25/understanding-cpu-steal-time-when-should-you-be-worried
======
aliguori
This analysis of steal time is not entirely correct.

Steal time exists to fix a problem. Without steal time, when a hypervisor
pre-empts a _running_ guest and later resumes it, the guest has no way to
tell that anything happened: as far as it can see, the process that was
running when the whole guest was pre-empted had been running the entire
time.

This means that if a guest is pre-empted, then CPU usage reporting in the
guest becomes horribly wrong with some processes having much higher reported
usage than they actually got. This affects fairness and can cause lots of bad
things.

Steal time is simply a way to tell a guest that it was pre-empted. The guest
OS can then use that information to correct its usage information and preserve
fairness.

However, it is not a general indication of overcommit. When a guest idles a
VCPU, that VCPU is descheduled and placed on the hypervisor's wait queue.
An event that would normally wake the VCPU may arrive, but if the system is
overcommitted, it can take much longer for the VCPU to actually be woken
up.

Most clouds are designed to run multiple VCPUs per physical CPU, and there
is certainly capping in place. You can still see steal time even though you
are getting your full share.

Let me give an example:

1) You are capped at 50%. You run for your full 50%, go idle, the hypervisor
realizes you've exhausted your slice, and doesn't schedule you until the next
slice. No steal time is reported.

2) You are capped at 50%. You have a neighbor attempting to use his full
time slice. Instead of you running for the first half of the slice and the
neighbor running for the second half, the hypervisor carves the slice up
into 10 slots and schedules you both in alternating slots. Both guests see
50% steal time.

You will get the same performance in both scenarios even though the steal time
is reported differently.
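
For what it's worth, a Linux guest exposes the raw steal counter as field 9
of the "cpu" line in /proc/stat (in USER_HZ ticks). A minimal sketch of the
percentage calculation "top" does, using made-up tick counts:

```shell
# Minimal sketch (not from the comment above): compute steal % from two
# snapshots of total and steal ticks, as read from /proc/stat's "cpu"
# line (field 9 is steal). All tick counts below are invented.
steal_pct() {
    # $1=total ticks at t0, $2=steal ticks at t0, $3/$4=same at t1
    dt=$(( $3 - $1 )); ds=$(( $4 - $2 ))
    echo $(( 100 * ds / dt ))
}

# 1000 total ticks elapsed, 500 of them stolen -> 50% steal, matching
# scenario 2 above.
steal_pct 10000 200 11000 700   # prints 50
```

On a live guest the two snapshots would come from reading /proc/stat twice
with a sleep in between.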

~~~
rodgerd
My rule of thumb is pretty simple: if I see steal but still have an abundance
of idle, I don't have a problem. If I see steal and low/no idle, I have a
problem with an overcommitted hypervisor.

It's derived from stress testing and production across a variety of
virtualisation platforms, and it's generally proven pretty accurate.
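
That rule of thumb is easy to mechanize. A hedged sketch (the 10% idle
threshold is my own arbitrary choice, not from the comment):

```shell
# Sketch of the rule of thumb above: steal is only worrying when idle
# headroom is also gone. The thresholds are arbitrary example values.
check_steal() {
    steal=$1; idle=$2            # integer percentages
    if [ "$steal" -gt 0 ] && [ "$idle" -lt 10 ]; then
        echo "overcommitted hypervisor?"
    else
        echo "ok"
    fi
}

check_steal 15 64   # steal, but plenty of idle: prints "ok"
check_steal 40 2    # steal and no idle: prints "overcommitted hypervisor?"
```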

------
falcolas
Closely related to CPU steal time is memory ballooning. If one instance
starts to require a lot of memory while the other instances are not using
theirs, hypervisors (particularly VMware) will steal memory from the other
VMs on the same machine and give it to the misbehaving VM.

This can result in swapping on the unfortunate target VMs.

You can detect it by seeing a VMware process using a lot of CPU (but,
ironically, no memory), and by watching your free memory percentage decrease
while your programs are not actually consuming more memory.
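
The second signal described above (free memory falling while your own
processes' footprint stays flat) can be sketched as a check; the sample
values here are invented, and it's a hint rather than proof of ballooning:

```shell
# Sketch of the "free shrinks but RSS doesn't grow" heuristic above.
# Inputs are kB values from two samples taken some interval apart.
balloon_suspect() {
    free0=$1; free1=$2; rss0=$3; rss1=$4
    if [ $(( free0 - free1 )) -gt 0 ] && [ $(( rss1 - rss0 )) -le 0 ]; then
        echo "suspect"
    else
        echo "ok"
    fi
}

balloon_suspect 500000 400000 300000 300000   # free fell, RSS flat: prints "suspect"
balloon_suspect 500000 480000 300000 350000   # our own RSS grew: prints "ok"
```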

~~~
lsc
My understanding is that memory sharing is generally not done in
multi-tenant Xen systems, and that this is one of the reasons why Xen is so
dominant in that space.

OpenVZ, generally speaking, shares memory, but does not use ballooning;
ballooning is specific to systems where each user has their own kernel.

Generally speaking, I think it's much less bad to oversubscribe CPU than
memory. Among other things, if you take CPU away from a heavy user to give
it to a user who has not used their fair share, all the new user has to do
is reload the CPU cache from main memory. That's slow, but super fast
compared to reloading main memory from disk, which is what the light user
has to do when you take memory away from the heavy user.

Of course, the equation is quite different when the system is all owned by the
same entity.

------
sehrope
Using micro instances I've seen it go up to 99% during CPU-intensive work
(ex: an app build). The hang-ups while waiting for it to continue made me
decide to switch the build server to an m1.small instance instead. It's
idle the vast majority of the time, but the extra $$ for it is totally
worth it when you're running a build.

The steal % is usually zero on the m1.small instance. I just tried maxing
out the CPU while watching "top", and this is as high as it got:

    Cpu(s):  5.3%us, 39.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si, 55.7%st

To max out the CPU I ran the following in a separate ssh terminal while
watching "top". An m1.small only has a single vCPU, so only a single
running copy should be necessary.

    while :; do date > /dev/null ; done

~~~
archivator
Micro instances only provide "burst" CPU usage - if you stay above a certain
threshold for long enough, it will throttle you by stealing CPU time (hence
the 99%).

~~~
sehrope
Yes, that's exactly what happened. I originally thought our build times were
short enough that we wouldn't go over the threshold, but we did. It's
surprisingly easy to trigger the CPU throttling on a micro instance.

Would be nice if they could/would average out the cpu usage over a longer
rolling window. That'd be perfect for a use case like this (build server)
where you're idle the majority of the time but want to max out cpu during the
build itself. Seems like the perfect use case for a shared server.

------
MattJ100
I've been having issues with steal time recently, but what I'm seeing isn't
adequately explained by any of the articles and documentation I could find.
Here is an example from one EC2 node:

    19:26:19     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
    06:04:05 PM  all   14.85    0.00    5.94    0.00    0.00    0.00   14.85    0.00   64.36

For me, when the machine was under load, %steal was almost always very close
to %usr. It wasn't always the same, sometimes more and sometimes less. Can
anyone explain how these numbers are related to each other?

~~~
falcolas
They aren't related. Steal is the share of CPU cycles that were promised to
your system but that your system didn't get. User is how much CPU (of the
promised maximum) your programs are using directly, i.e. not time spent in
system calls (IO being the big one), which falls under the system bucket.
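
As a reference for the buckets being discussed, the raw counters behind
top's percentages live on the "cpu" line of /proc/stat. A sketch that
labels them (the sample line is made up):

```shell
# Sketch: label the raw /proc/stat "cpu" counters that top turns into
# %us, %ni, %sy, %id, %wa and %st. The sample line below is invented.
label_cpu_line() {
    echo "$1" | awk '{printf "user=%s nice=%s sys=%s idle=%s iowait=%s irq=%s softirq=%s steal=%s\n", $2, $3, $4, $5, $6, $7, $8, $9}'
}

label_cpu_line "cpu 4705 150 1120 16250 520 30 85 1230 0 0"
# prints: user=4705 nice=150 sys=1120 idle=16250 iowait=520 irq=30 softirq=85 steal=1230
```

On a live system you'd feed it the real line, e.g.
label_cpu_line "$(grep '^cpu ' /proc/stat)".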

~~~
MattJ100
Then why are they the same in this case? Coincidence? (I won't believe that...
but unfortunately I don't have more samples right now to demonstrate the
correlation).

~~~
falcolas
This thread is pretty old, so apologies if you never see this, but I have
been seeing something similar today:

    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0  33960  44844 147332 466588    0    0     0    63  130  198 20  2 59  1 18
     0  0  33960  44844 147340 466652    0    0     1    66  123  187 19  2 59  1 18

Etc.

I believe what's happening here is that the CPU cycles are being requested
and subsequently stolen, and then, since the system is still idle, they're
re-requested and granted, all in the same time period polled by the
monitoring tool.

It's just a theory, but it makes sense (at least to me ;).

------
gopalv
When I ran into this issue on EC2, it was mitigated by leaving cpu0
relatively idle.

All Apache processes were pinned with taskset -c 1-7; the CPU steal and
system load went down massively once that was in place.
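
A sketch of that pinning approach, computing the CPU list rather than
hard-coding it (the apachectl path is an example, not from the comment, and
nothing here touches real processes):

```shell
# Sketch: build a taskset CPU list that leaves cpu0 idle, given the
# instance's vCPU count, and print the command that would pin a service.
cpus_excluding_zero() {
    n=$1                          # total vCPU count
    [ "$n" -gt 1 ] && echo "1-$(( n - 1 ))"
}

list=$(cpus_excluding_zero 8)     # "1-7" on an 8-vCPU instance
echo "taskset -c $list /usr/sbin/apachectl start"
```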

~~~
mh-
was this on an HVM instance?

------
JimmaDaRustla
I use Munin on my VPSs, and it shows the steal time, which is nice. I don't
typically see it showing up as more than a 1-pixel line on RamNode.
Hopefully that doesn't change under higher loads in the future.

~~~
bearbin
I see the same, although the VPS has basically zero load most of the time due
to what it does. (Munin host and Buildserver)

------
Nimi
It's been a while since I used AWS, so pardon me if the question is silly,
but:

Is it really cost-effective to track metrics like steal time, instead of using
a large instance and having the host machine for yourself?

------
thehme
Interesting article. I wonder how this works in Windows.

~~~
falcolas
It's still capable of occurring, but I don't know how Windows would report
it (I haven't run Windows in a VM in a long time).

------
bluedino
Is this the same as CPU Ready in VMware?

