Of course the alerts work. Alertmanager can do it all. Meanwhile, a lot of the a...

trabant00 · on June 28, 2023

That's not what I'm asking. Of course Alertmanager works. Did you get the system to the point where all the false positives are eliminated and you don't miss any failure either, such that you don't need L1 to stay in 3 shifts in front of a couple of monitors and decide when to call Ops? Meaning you can have your alerting system automatically and directly call L3.

pphysch · on June 28, 2023

This discussion is getting pretty far beyond the scope of Nagios vs. Prometheus.

Yes, Alertmanager gives you full control over the routing and deduplication of alerts.

trabant00 · on June 28, 2023

I am not asking about the tech. I know it. I did VictoriaMetrics exclusively for 2 years. I submitted bug fixes to them. I wrote a metrics duplicator loadbalancer with local buffer which stood before vminsert to get HA without paying VM for license. I was ingesting 400Mbits per second of metrics from 20 teams, hundreds of apps.

I am asking if you got the queries good enough to alert L3 directly.

But I'm not really asking, and you did not really misunderstand my questions either. We both know the answer.

dengolius · on June 29, 2023

Could you please elaborate on "I wrote a metrics duplicator loadbalancer with local buffer which stood before vminsert to get HA without paying VM for license. I was ingesting 400Mbits per second of metrics from 20 teams, hundreds of apps."? Why did you need to write your own duplicator when vmagent can achieve this?

trabant00 · on June 29, 2023

That was added in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/14... I did it before this, when replication between clusters was not available (it was in Enterprise if I remember correctly).

dengolius · on July 3, 2023

Ok, I see.