Grafana Mimir and VictoriaMetrics: performance tests (victoriametrics.com)
58 points by nikolay_sivko on Sept 9, 2022 | hide | past | favorite | 16 comments



Interesting test, but I find some of these benchmarks kind of miss the point. Even Grafana's.

The appeal of Thanos/Cortex/Mimir is the long-term object storage. The value isn't that it's simpler or cheaper to run; the value is that I can compare current data to data from months, if not years, ago. Storing good metrics over time can cost much more than the price of the instances, even when the data is rolled up.

Scaling the read/write path separately has a lot of benefits as well, but I would guess that doesn't come up often for most folks.

How much telemetry you can get in/out of your system over a day is important, but how much you can get in/out of it over years is overlooked.


Both VictoriaMetrics and Grafana Mimir are a good fit for long-term storage of Prometheus data. The difference is in the underlying storage type: VictoriaMetrics stores data on persistent disks (aka block storage), while Grafana Mimir stores data in S3-like object storage. Both storage types can be used for long-term storage. In the context of the major cloud providers (AWS, GCP, Azure), they differ as follows:

- Object storage space usually costs 2x-8x less than block storage space.

- Object storage has up to 100x higher latency for data access than block storage (hundreds of milliseconds for object storage vs milliseconds for block storage).

- Block storage usually has a much lower network-related error rate compared to object storage. For example, it is quite common practice to retry reads from object storage on network errors, while block storage-based filesystems are much more reliable in this respect at the major cloud providers.

- Cloud providers tend to charge for every read operation on object storage, while reads from block storage are free. This point is usually overlooked when estimating costs for block storage vs object storage.

Given these differences, block storage usually provides better performance than object storage. Block storage can also cost less than object storage when the stored data is read frequently.
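To make the read-cost point concrete, here is a back-of-envelope model; all prices (price_per_tb, price_per_million_reads) are illustrative assumptions, not actual cloud pricing:

```python
# Back-of-envelope monthly cost model for the trade-off above.
# All prices are illustrative assumptions, not actual cloud pricing.

def block_storage_cost(tb_stored, price_per_tb=80.0):
    # Block storage: pay for provisioned space; reads are free.
    return tb_stored * price_per_tb

def object_storage_cost(tb_stored, read_requests,
                        price_per_tb=20.0, price_per_million_reads=0.50):
    # Object storage: cheaper space, but every read request is billed.
    return tb_stored * price_per_tb + read_requests / 1e6 * price_per_million_reads

# With 10 TB of metrics, object storage wins while the data sits idle,
# but heavy read traffic can flip the comparison.
print(block_storage_cost(10))
print(object_storage_cost(10, read_requests=0))
print(object_storage_cost(10, read_requests=2e9))
```

Under these made-up prices, 10 TB of rarely-read data is far cheaper on object storage, while a couple of billion reads per month makes it more expensive than block storage.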

VictoriaMetrics is optimized for HDD-based block storage, so there is no need to use more expensive SSD-based block storage in most cases. Additionally, VictoriaMetrics compresses production metrics 2x-10x better than Prometheus-like solutions that store data in object storage (Thanos, Cortex, Grafana Mimir). This further reduces long-term storage costs.

On top of this, the enterprise version of VictoriaMetrics can be configured to downsample historical data, so it takes less disk space [1].

[1] https://docs.victoriametrics.com/#downsampling
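For illustration, downsampling is configured via the -downsampling.period flag (enterprise-only, per the docs linked above); the binary path and the retention/interval values below are made-up examples:

```shell
# Keep raw samples for 30 days, then one sample per 5 minutes,
# and after a year one sample per hour; total retention 5 years.
/path/to/vmstorage-prod \
  -downsampling.period=30d:5m \
  -downsampling.period=1y:1h \
  -retentionPeriod=5y
```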


To be fair, the benefit of object storage is that it is scalable and effectively bottomless, while with block storage you have to play with EBS volume expansion etc. Some folks find managing a fleet of EBS volumes no big deal; others find it problematic.

I think having "long term storage" in an S3-compatible location is the way to go, but you need the ability to use local storage as a cache so that queries on recent data, or whatever date range you're working with, can be fast.


Agreed with this. That's why we at VictoriaMetrics are investigating a hybrid storage scheme: store recently added data on block storage, while gradually moving older data to object storage in the background. On the query side, the requested data should be transparently queried from both object storage and block storage.
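A toy sketch of that read path, under the assumption of a single time cutoff between the two tiers (the class and function names here are made up for illustration, not VictoriaMetrics internals):

```python
class Store:
    # Minimal stand-in for a storage tier; real tiers would be a local
    # filesystem (block storage) and an S3-like bucket (object storage).
    def __init__(self, samples):
        self.samples = samples  # list of (timestamp, value) pairs

    def read(self, start, end):
        # Return samples with start <= timestamp < end.
        return [(t, v) for t, v in self.samples if start <= t < end]

def query(start, end, cutoff, block_store, object_store):
    # Data older than `cutoff` lives in object storage, newer data on
    # block storage; the caller sees a single merged result.
    results = []
    if start < cutoff:
        results += object_store.read(start, min(end, cutoff))
    if end > cutoff:
        results += block_store.read(max(start, cutoff), end)
    return sorted(results)
```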


> The value is that I can compare data to months if not years ago.

I'd say that's quite a specific case. The most important data is recent data. A monitoring system should help you identify current issues, and should be reliable and performant so you don't spend minutes waiting for a response while your production is on fire.

Second in importance is data from the last N days: the period when you analyze recent changes (updates, releases) or incidents. You want this data to be easy to get and pivot, changing queries ad hoc and getting results immediately, so root cause analysis doesn't take days of work.

Data older than a month is rarely accessed. It is usually used for capacity planning and retrospective analysis, things you do once every three months or even once a year. Here you can afford long, slow queries.

Both VictoriaMetrics and Mimir do a lot to provide fast access to recent data: to get it stored and to get it ready for queries.


In my experience, the business use of retaining years-old metrics is overvalued, because changes in the system tend to prevent any kind of long-term apples-to-apples comparison. You’ll have changed your metrics engine, or your collector, or your tagging strategy, or your hosting strategy, or your containerization, or your deployments, or etc. etc. Even if you can find the like metrics from last year, you can’t trust that they meant the same thing then that they do now.


In industrial settings where the process does not change but equipment is overhauled, it can be quite valuable to compare year over year.


VictoriaMetrics core developer here. Benchmarks are non-trivial to run properly, especially when benchmarking against a competitor. We tried hard to create a benchmark based on production-like data. It collects data from the most frequently monitored scrape target in the Prometheus ecosystem - node_exporter. At the same time, the benchmark runs queries based on real-world alerting rules for the metrics collected from node_exporter. Resource usage (CPU, RAM, disk IO, disk space) is recorded during the benchmark, so these metrics can later be analyzed and compared. The benchmark runs for 24 hours, so it catches transitional states such as periodic data compactions. We hope the benchmark framework will be re-used by others for comparing the performance and resource usage of Prometheus-like monitoring systems such as Cortex, M3DB, Thanos, Promscale, etc.


Been using Victoria for both my personal homelab and at work (~100m active TS, 15b index size). Nothing comes close to its price/performance (resource-wise) ratio. The only thing I miss is rebalance/node drain, but you need to sacrifice something to gain something else.

Another issue is when people start learning about MetricsQL: once dashboards are written with extended PromQL, you can't migrate them easily. Not that I'm planning to do so in the foreseeable future.


You can always stick to pure PromQL to keep things compatible. Or maybe some day the Prometheus (now Grafana) devs will pick up the extensions and introduce them to PromQL.


I've been using VictoriaMetrics for years already and I must say it's simply perfect.

It's not only about its performance but the entire ecosystem and the absolutely zero friction upgrade experience.

It's so, so good.

Happy to share notes if interested.


I just wanted to add my favourite feature which is how easy it is to have high availability.

I run all my Prometheus instances in pairs, each sending all metrics to two on-prem (Hetzner) VM servers. Here's the magic: each VM server will dedup the metrics (remember, they are all coming from two Prom instances). This way you have a simple and reliable HA long-retention storage.
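The setup described above boils down to a remote_write section like this in each of the two identical Prometheus instances (the hostnames are made up), with each VictoriaMetrics server started with -dedup.minScrapeInterval set to the scrape interval so duplicate samples get collapsed:

```yaml
# Each of the two identical Prometheus instances ships every
# sample to both VictoriaMetrics servers.
remote_write:
  - url: https://vm-1.example.com:8428/api/v1/write
  - url: https://vm-2.example.com:8428/api/v1/write
```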

I only deal with 500k active series but I'd bet I can deal with 10-20x more with the same setup.


Can you explain more about the dedup feature? Does it require some kind of unique id from the source? Is it done on ingest or on query? Is that something included with VictoriaMetrics or a separate addon?


It's a VM built-in feature applied at ingestion time. If two independent Prometheus instances send the same metric within the same time interval, only the first one is stored.
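In pseudocode terms, the idea is roughly this (a toy sketch that keeps the first sample per interval, as described above; VictoriaMetrics' actual rule may pick a different sample within the interval):

```python
def dedup(samples, interval_ms):
    # samples: (timestamp_ms, value) pairs sorted by timestamp, as merged
    # from two identical Prometheus instances. Keep one sample per
    # discrete interval of interval_ms milliseconds; drop the rest.
    out, last_bucket = [], None
    for ts, val in samples:
        bucket = ts // interval_ms
        if bucket != last_bucket:
            out.append((ts, val))
            last_bucket = bucket
    return out
```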

https://docs.victoriametrics.com/Single-server-VictoriaMetri...


How does it compare to Clickhouse?


VictoriaMetrics is based on ClickHouse ideas, but is specifically optimized for storing and querying floating-point time series with arbitrary sets of (key=value) tags. Such time series are also known as metrics or measurements. See [1], [2] and [3] for more details.

[1] https://medium.com/@valyala/how-victoriametrics-makes-instan...

[2] https://www.youtube.com/watch?v=p9qjb_yoBro

[3] https://faun.pub/victoriametrics-creating-the-best-remote-st...



