
EC2 Neighbour Caught Stealing CPU - aritraghosh007
http://blog.sematext.com/2013/04/22/ec2-neighbour-caught-stealing-cpu/
======
sehrope
> What we see is that somebody, some other VM(s) sharing the same underlying
> host, is stealing about 30% of the CPU that really belongs to us.

That's not how it works. CPU steal time is when the hypervisor stalls your CPU
because you're using _more_ than your allotted share. Depending on the
hypervisor configuration, this can happen either consistently (e.g. AWS micro
instances) or on demand (e.g. whenever your neighbors are actually using their
fair share). If your neighbors are not using 100% of their slice of the CPU,
you can generally use it yourself[1], but if you can't, they're not
"stealing" from you. You're just not able to use their unused capacity[2].

[1]: That applies to regular AWS instances, not to micro instances, which stall
almost immediately when using a lot of CPU. Try a non-trivial compile on a
micro instance and see what happens.

[2]: Alternatively, you can look at it as them preventing you from "stealing"
from your neighbors when they're not using their share.

~~~
lucaspiller
Could it also be that the hypervisor is allocating more CPU to the other VM
because he isn't using 100% of his allocated share (note that he said his VM
isn't very CPU intensive)? If his VM were using 100% of its allocated share,
the other VM would be throttled back.

~~~
maffydub
I believe that steal only records the percentage of time that your VM wanted
to use the CPU but couldn't. If your VM doesn't want to use the CPU, the time
is always recorded as idle, never steal.
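On a Linux guest you can see this split directly: the aggregate `cpu` line in `/proc/stat` keeps separate cumulative tick counters for idle and steal (steal is the eighth value after `cpu`). The sketch below samples it twice and diffs the counters. It is a minimal illustration, not a monitoring tool: the function names are hypothetical, lumping iowait in with idle is an assumption, and the one-second interval is arbitrary.

```python
import time

# Columns of the aggregate "cpu" line in /proc/stat, in order (see proc(5)).
# The guest/guest_nice columns that follow are already counted inside
# user/nice, so they are deliberately left out of the total.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line of /proc/stat into {column: ticks}."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Very old kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_breakdown(before, after):
    """Percent of elapsed ticks spent busy, idle and stolen between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return 0.0, 0.0, 0.0
    # Assumption: iowait is lumped in with idle (the VM wasn't running code).
    idle = delta["idle"] + delta["iowait"]
    steal = delta["steal"]
    busy = total - idle - steal
    return 100.0 * busy / total, 100.0 * idle / total, 100.0 * steal / total


def sample_steal(interval=1.0, path="/proc/stat"):
    """Hypothetical helper: sample /proc/stat twice, `interval` seconds apart."""
    def read():
        with open(path) as f:
            return parse_cpu_line(f.readline())  # first line is "cpu ..."

    first = read()
    time.sleep(interval)
    return cpu_breakdown(first, read())
```

Calling `busy, idle, steal = sample_steal()` on a Linux VM gives the three percentages for the last second. Per the comment above, a VM with nothing to do should show mostly idle and little steal, whatever its neighbours are doing.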

------
kbar13
CPU steal is a stat that looks scary, and that is compounded by the number of
sensationalist blog faux-tutorials about how CPU steal is the devil.

[http://adrianotto.com/2010/02/time-stolen-from-a-virtual-mac...](http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/)

^ read

------
maffydub
We use EC2 a lot (on Project Clearwater -
[http://www.projectclearwater.org/](http://www.projectclearwater.org/)), but I
don't think we've ever seen significant levels of steal except on m1.small
nodes.

On m1.small nodes you should expect up to 50% steal, because you're explicitly
renting only half the CPU. Check how /proc/cpuinfo compares with 1 ECU:
[http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_...](http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it).

In fact, you can even see over 50% steal because the hypervisor steals from
you if you're doing network-intensive work such as accepting TCP connections
at a high rate (hundreds of new connections per second).

------
chris_wot
I thought steal time was the time your local VM spent trying to get CPU
cycles beyond its allocated share...

Amazon AWS guys got straight to the point about it in the forums:

_The steal time means you're trying to go over your allocated resources. AWS
doesn't give you a dedicated amount of resources, it gives you access to more,
so that when the host machine isn't under heavy load you can use more
resources, which would otherwise be wasted._ [1]

[1]: [https://forums.aws.amazon.com/thread.jspa?threadID=79519#](https://forums.aws.amazon.com/thread.jspa?threadID=79519#)

------
okrasz
This is not fully correct. It is correct that the CPU time has been stolen,
BUT nobody told you that you would be the only one getting a physical
CPU. AWS gives you a specific portion of the CPU, and the portion that is not
committed to your server shows up as steal time. This is normal. Note that the
steal time stays at a virtually constant level. Even if your neighbor weren't
busy, you would still not get additional CPU, as AWS does not allow CPU
bursting (using unused CPU) for instance types other than Micro, which I've
read actually runs at second priority on otherwise unused CPU cycles.

------
jaryd
Is there any actionable advice for those of us who are not your customers and
don't have access to your monitoring panel?

~~~
maffydub
My advice would be to understand what steal is, and what CPU usage your cloud
provider is offering you.

Steal is the time when your VM wants to run but the hypervisor doesn't let it
because another VM is using the CPU. It does not include time when your VM was
idle and another VM was using the CPU.

As a result, steal is often very low until your VM's CPU usage starts
increasing. Then, steal increases rapidly.

For example, on EC2 two m1.small VMs normally share one CPU. You might benchmark
your application and see that it can handle 1000 transactions per second at 10%
total CPU. You might then (possibly naively) assume that you can put ~10000
transactions per second through before you hit 100% CPU. In fact, because your
VM only gets half of that CPU, as you went past 4000 transactions per second
and towards 5000, your steal would rise rapidly and you'd max your CPU out.
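The arithmetic in this example can be written out as a short sketch. It assumes throughput scales linearly with CPU, that the 10% figure was measured while steal was near zero (so it is 10% of the whole physical CPU), and that an m1.small gets half of one CPU; the function names are illustrative only.

```python
def naive_max_tps(measured_tps, measured_cpu_pct):
    """Linear extrapolation of throughput to 100% CPU, ignoring the shared host."""
    return measured_tps * 100.0 / measured_cpu_pct


def shared_max_tps(measured_tps, measured_cpu_pct, cpu_share):
    """Same extrapolation, capped at the fraction of the physical CPU you rent.

    Assumes the benchmark ran while steal was ~0, so measured_cpu_pct is a
    percentage of the whole physical CPU.
    """
    return naive_max_tps(measured_tps, measured_cpu_pct) * cpu_share


# Figures from the example: 1000 tx/s at 10% CPU, two m1.small VMs per CPU.
print(naive_max_tps(1000, 10))        # 10000.0
print(shared_max_tps(1000, 10, 0.5))  # 5000.0
# At 4000 tx/s you are already using 80% of the CPU you actually rent.
print(4000 / shared_max_tps(1000, 10, 0.5))  # 0.8
```

Real workloads rarely scale perfectly linearly, so treat the 5000 figure as an upper bound rather than a target.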

