
Predictive CPU isolation of containers at Netflix using a MIP solver - fanf2
https://medium.com/netflix-techblog/predictive-cpu-isolation-of-containers-at-netflix-91f014d856c7
======
sanxiyn
For related work by Google (2013), see "CPU performance isolation for
shared compute clusters":
[https://ai.google/research/pubs/pub40737/](https://ai.google/research/pubs/pub40737/)

------
mochomocha
I'm one of the authors of the post, happy to answer any questions.

~~~
yomritoyj
In my experiments with MIP there was a dramatic performance difference between
free and non-free solvers. Did you also experience this?

~~~
wenc
I worked with MIPs for 8 years and commercial solvers have always been several
orders of magnitude faster/better than open-source solvers.

This is generally true for domain-specific software -- unlike general purpose
software, incenting a small pool of specialized talent to make open-source
contributions is always hard.

The key to high performance in MIPs comes from having good heuristics, not
necessarily from improving the basic algorithms (the algorithms are pretty
standard -- simplex or interior point). Finding effective heuristics is hard
and tedious, but they make a significant difference in solution speed. For
instance, naive Simplex may take 40 minutes to solve a problem but with
heuristics the solution time might be 5 seconds.
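
To make the point about search effort concrete, here is a minimal sketch on a toy 0/1 knapsack. It is only an illustration of the general idea, not how Cbc, Gurobi or CPLEX work internally: the function names and the knapsack instance are made up for this example, and the only tricks used are a greedy value-density ordering (which tends to find a good incumbent early) and pruning with the fractional relaxation bound. It assumes all weights are positive.

```python
from itertools import product


def brute_force(values, weights, capacity):
    """Enumerate every 0/1 assignment and return the best feasible value."""
    best = 0
    for picks in product((0, 1), repeat=len(values)):
        weight = sum(p * w for p, w in zip(picks, weights))
        if weight <= capacity:
            best = max(best, sum(p * v for p, v in zip(picks, values)))
    return best


def branch_and_bound(values, weights, capacity):
    """Return (best value, number of search nodes visited)."""
    # Greedy ordering: try the densest items first so a good incumbent
    # shows up early in the search.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)

    def bound(i, value, room):
        # Optimistic bound from the fractional relaxation: fill the remaining
        # room greedily and allow a fraction of the first item that does not fit.
        for v, w in items[i:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    best = 0
    nodes = 0

    def visit(i, value, room):
        nonlocal best, nodes
        nodes += 1
        best = max(best, value)  # every node reached here is feasible
        if i == n or bound(i, value, room) <= best:
            return  # leaf, or this subtree cannot beat the incumbent
        v, w = items[i]
        if w <= room:
            visit(i + 1, value + v, room - w)  # branch: take item i
        visit(i + 1, value, room)  # branch: skip item i

    visit(0, 0, capacity)
    return best, nodes
```

Calling `branch_and_bound([60, 100, 120], [10, 20, 30], 50)` returns the same optimum as `brute_force` on that instance while visiting only part of the full search tree. Real solvers layer many more heuristics on top of this basic scheme.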

That said, Cbc _is_ competitive for smaller problems, and here's the thing:
many production-sized problems aren't that big -- it really depends on your
problem domain. I've deployed commercial applications on Cbc (30k
variables/constraints) and it was more than adequate.

I don't have any details on this, but Gurobi (a best-in-class solver) also
offers a cloud service that you can pay for on demand [1]. The
economics of this may work out for some types of problems.

[1] [https://www.gurobi.com/pdfs/user-events/2017-frankfurt/Gurobi.pdf](https://www.gurobi.com/pdfs/user-events/2017-frankfurt/Gurobi.pdf)

Also, if you're in academia, you can get Gurobi/CPLEX licenses for _free_
(yes). My research group in grad school didn't spend a cent on these
solvers, and we still got a taste of best-in-class solution performance
(that's how they get you :).

------
longcommonname
I love this type of work. The final analysis shows a significant change, but
one that I've had difficulty with in the past: decrease your variance and your
99th percentile and you're going to see gains, even if you see an increase in
the volume of run times near the median.

So often we see projects that talk about cutting costs by orders of magnitude,
but this one takes on a number of pods an order of magnitude greater than most
of what we will do and reduces it fractionally.

------
panpanna
Maybe not applicable to this article, but since caches and NUMA were
mentioned...

Has anyone studied the effects of CPU allocation on cache coherency
performance (different L2 caches talking to each other to synchronize data)
and, in the end, on overall system performance?

~~~
truth_seeker
Intel VTune Amplifier can help you with that, although it has to be tested
with each release of your code to get the most juice out of it as every
application is different in terms of Computation, Disk IO and Network IO.

[https://software.intel.com/en-us/vtune](https://software.intel.com/en-us/vtune)

------
truth_seeker
Setting CPU affinity for the process worked for me. I always use this trick in
my DevOps deployment scripts.

[https://linux.die.net/man/1/taskset](https://linux.die.net/man/1/taskset)
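
For anyone who would rather pin a process from code than from the shell, here is a minimal sketch using Python's standard `os.sched_setaffinity`, which is only available on Linux. The helper name `pin_to_cpus` is made up for this example, and the CPU ids you pass are an assumption about your machine; it is a rough equivalent of what `taskset` does, not a replacement for it.

```python
import os


def pin_to_cpus(cpus, pid=0):
    """Restrict process `pid` (0 means the calling process) to the given CPU ids.

    Returns the affinity mask reported by the kernel after the change.
    """
    if not hasattr(os, "sched_setaffinity"):
        # The affinity calls are exposed on Linux only.
        raise OSError("CPU affinity is not supported on this platform")
    os.sched_setaffinity(pid, set(cpus))
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # Pin the current process to the lowest-numbered CPU it is allowed to use.
    first_cpu = min(os.sched_getaffinity(0))
    print(pin_to_cpus([first_cpu]))
```

Child processes started afterwards inherit the mask, so this can sit at the top of a launcher script in the same way a `taskset` prefix would.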

~~~
effie
What is the benefit? Does it run faster, or more consistently?

~~~
truth_seeker
Yes, I have seen up to 8% improvement in execution time. Here consistency is
about having fewer L1/L2 cache misses.

------
seaghost
Does anyone have experience running OpenFOAM in containers? --cpuset-cpus
gives me a huge performance hit. It's painfully slow if I allow Docker 16
cores, while running it directly in a VM with 16 cores is much faster.

------
daxfohl
Why not just buy some hardware to run these jobs rather than doing all this
work to reduce the AWS bill by a couple of percent?

