I was already quite convinced VictoriaMetrics is superior to its competitors after reading the author's comprehensive technical write-ups and benchmarks a while back.
If you haven't already, you may want to check out M3 if you're thinking of scaling to a cluster and doing zero-downtime upgrades. You can scale out horizontally with the Kubernetes operator, and it avoids downtime because it replicates data at write time and does quorum-based, masterless reads.
Ah, alright. Haven't looked at M3DB yet, will check it out. We have a lot of non-K8s components too, so we wanted a solution outside the operator, but yeah, I get the idea behind it.
>"GitLab CI pipelines lint and validate the configurations and then upload them to an S3 bucket. There’s a sync server on the Alertmanager cluster to check for new config and automatically reload Alertmanager in case of any config updates."
I would be curious whether this syncing is being done by something custom they wrote, or is it something built into Alertmanager now? I had a look at the latest Alertmanager docs and nothing jumped out at me regarding this.
Yes, we basically have a short-lived periodic job on our Alertmanager server which does a `sync` from the S3 bucket that stores the configs. If something has changed, it just triggers a `docker restart alertmanager`.
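For illustration, a minimal sketch of what such a job could look like (bucket name and paths here are placeholders, not our actual setup):

```bash
#!/usr/bin/env bash
# Rough sketch of the periodic sync job -- bucket and config path are placeholders.
set -euo pipefail

BUCKET="s3://example-alertmanager-configs"
CONF_DIR="/etc/alertmanager"

# `aws s3 sync` prints a line for every file it copies, so empty output
# means nothing changed and we can skip the restart.
changes="$(aws s3 sync "$BUCKET" "$CONF_DIR")"

if [ -n "$changes" ]; then
  docker restart alertmanager
fi
```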
Alertmanager doesn't have any built-in solution for this, although if you run Alertmanager via the Prometheus Operator, you do get automatic config reloads. We decided to keep our AM cluster independent and out of K8s since, as we pointed out, we have a hybrid environment.
Alertmanager can run in a clustered mode with the `--cluster*` flags; you can read more here[1]. So basically multiple nodes run together, and all Prometheus instances are configured to send all alerts to all AM instances. The alerts are then deduplicated at the cluster level.
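To give a rough idea, each node is started with its peers listed, something along these lines (hostnames are placeholders):

```bash
# Run on each Alertmanager node, pointing --cluster.peer at the other members.
alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=am-1.example.internal:9094 \
  --cluster.peer=am-2.example.internal:9094
```

On the Prometheus side, every instance then lists all the AM nodes as alerting targets, so each alert reaches every node and the cluster handles the dedup.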
VictoriaMetrics right now runs as a single-node setup. That works out for now because, as mentioned, Prometheus maintains a WAL. So even if the remote storage is down for some time, the core functionality of alerting isn't affected. We've also set up alerts on VM health, so if it goes down, we have to act on it and get it back up. Once that happens, all the previous data is ingested automatically as well.
This is something we would surely revisit at a later point in time, and set up proper HA for it if needed. :)
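For context, the Prometheus side of this is just a `remote_write` block pointing at VM (hostname below is a placeholder; 8428 is the default single-node port):

```yaml
# prometheus.yml -- illustrative only, hostname is a placeholder
remote_write:
  - url: http://victoriametrics.example.internal:8428/api/v1/write
```

If that endpoint is unreachable, Prometheus keeps retrying from its WAL, which is why a short VM outage only delays ingestion rather than losing data.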
I have tried to use the API in Python for algorithmic trading but have struggled to come up with a simple solution that can do backtesting and the live component together. Nithin had previously mentioned that this market is too small for them to spend time on, but any leads on building this from the tech team? A blog post would definitely help.
So, we maintain a label `prometheus_replica` which carries the pod name of the Prometheus instance. Since the operator deploys it as a StatefulSet, the pods are ordered, so the naming is prometheus-k8s-0, -1 and so on. Having these labels helps deduplicate the metrics at the ingester/federation level.
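Roughly, the external labels the Operator ends up injecting on each replica look like this (values are illustrative; the second replica would get prometheus-k8s-1, and so on):

```yaml
# Illustrative prometheus.yml fragment generated by the Operator
global:
  external_labels:
    prometheus: monitoring/k8s
    prometheus_replica: prometheus-k8s-0
```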
Although yes, you're right, it was a bit difficult to get it running out of the box with native federation capabilities. A secondary long-term storage like VictoriaMetrics/Cortex/Thanos is something you can check out.
Angular has been known for breaking its syntax/API across major version upgrades, and more often than not that leads to a complete rewrite of your app. We took the decision to rewrite and explored React, Vue, etc. Even though Vue 2.x was still the new kid on the block, it turned out to have a relatively easier learning curve, its bundle size was far smaller than Angular's or React's, the docs were amazing, and it was faster in our benchmarks. So we made this decision around 4 years back and are quite happy so far.