
Show HN: CLI Using Chaos Engine on K8s to Validate Autonomous Monitoring Tool - dgildeh
https://github.com/zebrium/zebrium-kubernetes-demo
======
dgildeh
Hi everyone - I've been playing with Litmus, a new Chaos Engine from MayaData,
to create incidents in a Kubernetes cluster for an Autonomous Monitoring
solution to detect.

The purpose of this repository is to build a realistic app environment running
multiple services on a Kubernetes cluster, and then run a series of chaos
experiments to see whether an Autonomous Monitoring solution (without any
pre-configuration) can automatically detect the incidents those experiments
cause.

For those of you wondering what Autonomous Monitoring is - the key difference
from current monitoring tools is that instead of you having to tell the
monitoring tool what to look for by setting up and maintaining a long list of
alert rules, the tool figures out what to alert on using Machine Learning. You
just send it your logs/metrics/traces and it figures out when an incident is
occurring, and its root cause, without you having to configure anything! This
approach is becoming more important as environments become more distributed,
complex and dynamic, making it harder to know what to alert on.
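
To make the contrast concrete, here is a toy sketch in Python. It is only an illustration and not how Zebrium or any real product works: the function names, the 500 ms threshold, the sample latencies and the z-score cutoff are all made up for the example. The first function is a hand-maintained alert rule; the second learns a baseline from recent data and flags anything far outside it.

```python
from statistics import mean, stdev

def static_rule_alert(value, threshold=500.0):
    """Hand-written rule (hypothetical): fire only above a fixed threshold."""
    return value > threshold

def learned_alert(history, value, z_cutoff=3.0):
    """Toy 'learned' detector (hypothetical): flag values far from the
    baseline computed from recent history, using a simple z-score."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Flat history: any change at all is unusual.
        return value != mu
    return abs(value - mu) / sigma > z_cutoff

# Made-up request latencies in milliseconds.
history = [100, 102, 98, 101, 99, 103, 97, 100]
spike = 250

print(static_rule_alert(spike))       # the fixed rule was never tuned for this service
print(learned_alert(history, spike))  # the baseline is learned from the data itself
```

The fixed rule stays silent on the spike because nobody set a threshold that fits this service, while the learned baseline flags it without any configuration. Real systems use far richer models than a z-score, but the division of labour is the same.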

It takes just 2 commands to spin up the cluster and run all the Chaos
experiments end to end. It provides a good example of running Chaos
Experiments in Kubernetes, and also demonstrates how far state-of-the-art
machine learning has come in the monitoring space today!

~~~
gdcohen
Gavin from Zebrium here. Thanks for doing such a great job with this. I played
with an earlier version of this and it's come a long way. I love how you've
made it really easy to set up and run experiments with the Litmus Chaos Engine
([https://litmuschaos.io/](https://litmuschaos.io/)).

------
spicyponey
Thanks for this. It's pretty smart of Zebrium IMO to reach out to and embrace
chaos from Litmus to show off and, I guess, educate your smart monitoring.

~~~
gdcohen
We love Litmus because it gives our users a way to quickly and
deterministically inject failures into their app so they can test the
effectiveness of our solution.

