
Show HN: Prometheus fork for cloud scale anomaly detection across metrics/logs - nak923
https://github.com/zebrium/prometheus
======
nak923
Anil from Zebrium here. We provide an Autonomous Monitoring service that until
now has used logs as the source of our anomaly detection. We wanted to augment
this by correlating log anomalies with anomalies detected within a group of
related metrics.

Since Prometheus is very popular in Kubernetes environments, we wanted to
support discovering and scraping Prometheus targets and sending the scraped
metrics to our software running in the cloud for anomaly detection. Latency is
important to us: we need to receive the metrics in near real time, as they
are scraped. We also need to preserve labels, types, and full-fidelity
timestamps for anomaly detection and log correlation. And we need to do all
this while sending the metrics over the wire as efficiently as possible,
since the data travels over the WAN from a user’s Kubernetes cluster to our
software running in the cloud.

To achieve all of this, we have built and open sourced a fork of the
Prometheus server and a new remote server that collects the metrics. Quick
summary of what we achieved: near real time metric updates, preservation of
valuable information such as labels and types, the ability to handle
out-of-order samples, and a greater than 500x bandwidth reduction. You can
read all about it in this blog post:
[https://www.zebrium.com/blog/a-prometheus-fork-for-efficient...](https://www.zebrium.com/blog/a-prometheus-fork-for-efficient-cloud-scale-autonomous-monitoring)
or in the GitHub repo:
[https://github.com/zebrium/prometheus](https://github.com/zebrium/prometheus).

~~~
QuinnyPig
Why a fork rather than contributing this back upstream?

~~~
nak923
Thanks. We started with a fork so that we can contribute back upstream in
the near future.

