
What Toasters and Distributed Systems Might Have in Common - twakefield
https://www.mailgun.com/blog/for-devs/what-toasters-and-distributed-systems-might-have-in-common
======
pwnna
This article described something I've wanted to do for a bit: using PID (and
possibly other control algorithms) to control software systems. I've thought
about this for a bit and I have some reservations (perhaps unfounded because
I'm just not that knowledgeable) about actually trying to implement it in live
systems:

1\. PID tuning is somewhat non-trivial. It's interesting to see that the
author was able to tune their PID using a simulation. This is something I've
thought about doing for a while, but I don't know how I can effectively
validate that the simulation approximates reality to a reasonable extent.
Sure, I could build the simulator and validate it against the live system, but
the resources spent doing that might be better spent just improving the live
system. It would be nice to have some theoretical arguments for why the
simulation would be correct. Now this shows my lack of knowledge, so perhaps
someone else knows how to deal with this more effectively.

2\. Software systems these days are always changing, especially when these
things are running on your servers that you're constantly updating. Tuned PID
constants are only valid for a particular system configuration (the "plant").
If the plant changes too much, the controller performance can degrade, or
worse: it can become unstable and blow up. This makes any system controlled by
the PID difficult to change: you want to avoid changes to the system because
that means you have to retune the PID, which may be difficult to do. This
effectively becomes technical debt, especially if the original author of the
PID no longer works on that project (and assuming only a few people knew how
it works to begin with).
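
To make the concern in point 2 concrete, here is a minimal sketch, not
anything from the article. It assumes a made-up first-order plant and made-up
gains (`kp`, `ki`), and the name `worst_late_error` is hypothetical. The
controller constants stay fixed while the plant gain changes: a modest change
is absorbed, but a large one makes the loop blow up. In this toy the
instability comes from the loop gain becoming too large for the fixed
sampling period.

```python
def worst_late_error(plant_gain, kp=2.0, ki=1.0, dt=0.1, steps=200, setpoint=1.0):
    """Run a fixed PI loop against a toy plant and return the largest
    absolute tracking error seen in the second half of the run."""
    y = 0.0          # plant output
    integral = 0.0   # accumulated error
    worst = 0.0
    for step in range(steps):
        error = setpoint - y
        integral += error * dt
        u = kp * error + ki * integral
        # Assumed plant: dy/dt = -y + plant_gain * u, stepped with forward Euler.
        y += dt * (-y + plant_gain * u)
        if step >= steps // 2:
            worst = max(worst, abs(setpoint - y))
    return worst


if __name__ == "__main__":
    for gain in (1.0, 3.0, 20.0):
        print(gain, worst_late_error(gain))
```

With these assumed numbers, gains of 1 and 3 settle close to the setpoint,
while a gain of 20 diverges even though the controller code never changed.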

There are some other thoughts about non-linearity of software systems.
Although these are routinely dealt with in the real world, it makes a topic
that's already somewhat complex even more complicated, which further adds to
the non-maintainable status of the controller.

This all makes me feel like PID may not be the best choice in software
systems, but I don't have concrete proof either.

Sorry if this felt a little bit rambling. It's something that has been on my
mind for a bit but I haven't had a chance to formalize.

~~~
anton_
Regarding the point about software systems being more flexible: a nice
thing about controllers such as PID is that they deal well with disturbances
and changing environments. In this particular case the PID controller runs
alongside the system with a fixed sampling rate. It means that it does not
really care about the dynamics of the system under control. Say we want to
update the caching TTL and that affects how the system behaves. The PID algo
only cares
about the delta of the error over a fixed period of time (sampling rate) and
will adjust accordingly. Also, as mirceal mentioned, there are autotuning
algos if it gets really bad.
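
As a rough illustration of that point (a sketch, not the code from the
article), a discrete PID step only ever sees the setpoint and the latest
measurement at a fixed period `dt`. The class name `PID` and the gains below
are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    dt: float                       # fixed sampling period
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float) -> float:
        """Return the control output for one sample."""
        error = setpoint - measured
        self.integral += error * self.dt
        # No derivative term on the first sample, since there is no previous error.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    pid = PID(kp=2.0, ki=1.0, kd=0.5, dt=0.1)
    print(pid.update(setpoint=1.0, measured=0.0))
    print(pid.update(setpoint=1.0, measured=0.5))
```

Nothing in `update` refers to the caching TTL or any other internal of the
controlled system; the controller reacts only to how the error evolves
between samples.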

Unfortunately, it is a problem that if the system changes too much, the algo
can stop working properly. But that could happen any time you make a major
change in
any complex system. To me it’s more of a problem of testing and sanity-
checking, not of PID or other control algorithms. When you get to a certain
point of complexity, testing all possible conditions becomes unrealistic and
that might lead to bugs when a big part of the system gets rewritten.

On the point that only a few engineers might know how it works: one of the
purposes of the article is actually to put the idea out there, to make more
engineers familiar with the concept and collect some feedback. :)

I’ll update the comment with “On non-linearity of software systems” when I get
a chance. This is actually a really good one and I have a few thoughts about
it.

~~~
anton_
On non-linearity of software systems. Correct me if I’m wrong, but I’m assuming
you mean it in the sense of being probabilistic and not easily predictable.
Which is an interesting point because a lot of modern algorithms are like
that. Let’s take neural networks. Being able to explain what the model is
doing and how to make sure it’s correct is a whole area of research. For
example, the results of a convolutional net can change drastically by altering
a single pixel in an image[1].

We also faced the problem of explaining a machine learning model that was
much simpler than a neural net. It is a global problem and unfortunately I
don’t have any solution off the top of my head.

[1] [https://arxiv.org/abs/1710.08864](https://arxiv.org/abs/1710.08864)

------
aidenn0
[http://nikital.github.io/pid/](http://nikital.github.io/pid/) is really neat.

I do wish it either had an option for a saturated integral, or had an option
for "wind" blowing in a particular direction, as any significant change to the
setpoint will make large integrals useless, and small values have little use
absent a force for them to counteract.
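
For readers unfamiliar with the term, a "saturated integral" is commonly
implemented by clamping the accumulated error. The sketch below is a generic
illustration, not the code of the linked toy. The helper names and the
`limit` parameter are hypothetical.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def accumulate(errors, dt, limit=None):
    """Integrate error samples; if limit is given, saturate the running
    integral to [-limit, limit] after every sample."""
    integral = 0.0
    for error in errors:
        integral += error * dt
        if limit is not None:
            integral = clamp(integral, -limit, limit)
    return integral


if __name__ == "__main__":
    # A big setpoint change (large error for a while), then a small overshoot.
    errors = [10.0] * 50 + [-1.0] * 5
    print(accumulate(errors, dt=0.1))             # integral winds up
    print(accumulate(errors, dt=0.1, limit=2.0))  # integral stays bounded
```

Without the clamp, the integral built up during the large error takes a long
time to unwind; with it, the integral term can start recovering almost as
soon as the sign of the error flips.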

~~~
aidenn0
Why complain when it's open source:
[https://jasom.github.io/pid/index.html](https://jasom.github.io/pid/index.html)
has both improvements I suggested, plus a display of the X and Y error values

~~~
nikital
Cool, I made this toy some years ago :) Thanks for the additions, merged your
code into the original.

------
ddebernardy
Wasteland? [0] [1]

[0]:
[https://en.wikipedia.org/wiki/Wasteland_(video_game)](https://en.wikipedia.org/wiki/Wasteland_\(video_game\))

[1]:
[http://wasteland.wikia.com/wiki/Broken_toaster](http://wasteland.wikia.com/wiki/Broken_toaster)

------
stiglitz
“IP warmup.” Is this an algorithm to send broadcast emails at the maximum rate
that avoids ISP spam detection? I find this concept disturbing. Morals of spam
aside, we have two systems clearly working at odds rather than doing
productive work.

