
Kubethanos: a tool to kill half of your pods randomly to test your system - berkay-dincer
https://github.com/berkay-dincer/kubethanos
======
k_sze
This needs to come with a Blip mode where the pods are not killed, but
completely frozen/suspended for a period of time, and then you bring them back
and see what happens.

~~~
lmilcin
That's actually a half-decent idea.

Components getting completely killed is actually the simple case. In fact,
many HA strategies require that malfunctioning components be outright killed.

Components that become temporarily unresponsive or isolated, for example by a
network partition, are much more challenging scenarios.

------
noodlesUK
How likely is it that large numbers of pods just randomly die, as opposed to
nodes dying? I also agree with the other poster: freezing pods or partitioning
the network, then resuming things later to see how they fall apart, is a good
idea.

