
Improving Kubernetes Scheduler Performance - philips
https://coreos.com/blog/improving-kubernetes-scheduler-performance.html
======
JoachimSchipper
Am I missing something, or are they really boasting that they are able to
schedule _tens_ of jobs per second? ("Average pod throughput was 16.30
pods/sec") That seems very low...

~~~
ideal0227
Placement quality and throughput are always a tradeoff. Kubernetes currently
uses a naive algorithm that goes over all of its nodes to find the best
placement. This is simple and effective for most web workloads, where most of
your scheduled jobs will take "forever" to run. And as the blog post states,
after some simple optimizations the scheduler can now schedule tens of jobs
per second. There are other limits that make the system slower overall, so we
stopped optimizing the scheduler to avoid adding unnecessary complexity in its
early days.
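
To make "goes over all its nodes and finds the best placement" concrete, here
is a minimal sketch of that kind of loop. It is not the Kubernetes code: the
`Node` and `Pod` types, the fit check, and the "most resources left" score are
all assumptions made up for illustration. The point is only that each pod
costs one full pass over the node list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:  # hypothetical type, not a Kubernetes API object
    name: str
    free_cpu: int  # millicores
    free_mem: int  # MiB


@dataclass
class Pod:  # hypothetical type, not a Kubernetes API object
    name: str
    cpu: int
    mem: int


def schedule(pod: Pod, nodes: List[Node]) -> Optional[Node]:
    """Scan every node, skip those that cannot fit the pod, keep the best score."""
    best, best_score = None, -1
    for node in nodes:
        if node.free_cpu < pod.cpu or node.free_mem < pod.mem:
            continue  # pod does not fit on this node
        # Assumed scoring rule: prefer the node with the most resources left over.
        score = (node.free_cpu - pod.cpu) + (node.free_mem - pod.mem)
        if score > best_score:
            best, best_score = node, score
    if best is not None:
        # Reserve the resources so the next pod sees the updated state.
        best.free_cpu -= pod.cpu
        best.free_mem -= pod.mem
    return best
```

With N nodes, every call does O(N) work, which matters little when pods run
"forever" but caps throughput when many short jobs arrive at once.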

------
shepik
So, they optimized "Round", which took 18 seconds, and thereby reduced total
CPU time from 53 to 23 seconds. Did I get that right?

~~~
smoyer
They also removed two rate-limiters that someone had (presumably
intentionally) put into the code. You always have to wonder when you see
"protective code" like that - why was it added in the first place? Was it
premature optimization? A guess? Often if a weak component is strengthened,
you end up with "forgotten" code like this and it's always an effort to find
and remove it.

~~~
hongchaodeng
Unleashing rate limiters is like releasing a water gate -- now you know which
part is too weak to stop the flood.

By doing this, we gain more insight into system performance and find out
which part is too "weak".

There is no reason we couldn't raise the rate limits if we keep improving the
system overall.
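
As a rough illustration of how a rate limiter caps throughput no matter how
fast the code behind it is, here is a token-bucket sketch. This is a generic
technique, not the limiter that was removed from Kubernetes; the `TokenBucket`
class, the `admitted` helper, and all parameter values are hypothetical.

```python
class TokenBucket:
    """Generic token bucket: `rate` tokens refill per second, capped at `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = now

    def try_acquire(self, now: float) -> bool:
        # Refill according to elapsed time, never exceeding the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def admitted(rate: float, burst: int, requests_per_sec: int, seconds: int) -> int:
    """Count how many evenly spaced requests pass the limiter (simulated clock)."""
    bucket = TokenBucket(rate, burst)
    count = 0
    for i in range(requests_per_sec * seconds):
        if bucket.try_acquire(i / requests_per_sec):
            count += 1
    return count
```

Comparing `admitted(10, 1, 100, 1)` with `admitted(1000, 1000, 100, 1)` shows
the effect: with a tight limit most requests are rejected regardless of how
quickly they could be served, and with a loose one the limiter stops being the
bottleneck, so the next-weakest part shows up.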

