Chronosphere launches with $11M Series A to build scalable monitoring tool (techcrunch.com)
119 points by roskilli 7 days ago | 35 comments

Congrats on the launch, Chronosphere is joining a fast-growing club of monitoring (and related) spinoffs from large Bay Area tech companies:

Uber -> Chronosphere
Google -> Lightstep
Facebook -> Honeycomb
Twitter -> Buoyant (and Zipkin, OSS)


What I'm not getting, from my superficial knowledge, is why Prometheus gets so much traction over Elasticsearch. Elasticsearch claims to be just as good for metrics and events. The ES database itself is more advanced, with eventual consistency and search capability. It can do log analytics, and it can serve as the backend for a tracing tool like Jaeger. Why so much investment in Prometheus? Disclaimer: I have not used Prometheus much myself.

I think you'd be very hard pressed to scale an Elasticsearch cluster to tens of millions of writes/s without breaking the bank (and even if you had a pile of money to light on fire, I don't think an Elasticsearch cluster with the number of nodes you'd require to support that would work very well).

Elasticsearch is a great piece of technology, and it's very versatile, which makes it a great fit for a lot of problems (Uber, where M3 was developed, is a heavy consumer of Elasticsearch for logging, for example), but for the types of metrics workloads and scale that M3/Prometheus were designed for, Elasticsearch simply wouldn't work.


ES is very efficient with metrics, especially in recent releases.

This might be true, but ES has to overcome its history of burning users via breaking changes, performance problems, and reliability issues.

At their core these systems are basically specialized column stores; they have completely different read/write patterns from something like ES. The basic query unit, for example, is always going to be the scan; I'm not even aware of any monitoring system with some kind of secondary index capability. ES supports a bunch of nice result aggregation on top of Lucene, whereas these systems are primarily *built for* this use case.

What's interesting about some of the more modern monitoring systems like M3 and Prometheus is that they have an inverted index on top of the column store entries to very quickly find the relevant metrics for a multi-dimensional query.

In fact M3 uses FST index segments, a common Apache Lucene segment type also used by Elasticsearch, for secondary-index full-text search over metric names and dimensions: https://github.com/m3db/m3/tree/master/src/m3ninx/index/segm...
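A minimal sketch of the idea (all names hypothetical, not M3's actual code): an inverted index maps each label/value pair to the set of series IDs carrying it, so a multi-dimensional query intersects posting sets first and then scans only the matching columns.

```python
# Toy inverted index over time-series labels, illustrating how systems
# like M3/Prometheus narrow a multi-dimensional query before scanning.
# All names here are hypothetical, for illustration only.

class SeriesIndex:
    def __init__(self):
        self.postings = {}   # (label, value) -> set of series IDs
        self.columns = {}    # series ID -> list of (timestamp, value) samples

    def add_series(self, series_id, labels, samples):
        self.columns[series_id] = samples
        for pair in labels.items():
            self.postings.setdefault(pair, set()).add(series_id)

    def query(self, matchers, t_start, t_end):
        """Intersect posting sets for each matcher, then scan samples."""
        sets = [self.postings.get(pair, set()) for pair in matchers.items()]
        ids = set.intersection(*sets) if sets else set()
        return {
            sid: [(t, v) for (t, v) in self.columns[sid] if t_start <= t <= t_end]
            for sid in ids
        }

idx = SeriesIndex()
idx.add_series("s1", {"name": "cpu", "host": "a"}, [(1, 0.5), (2, 0.7)])
idx.add_series("s2", {"name": "cpu", "host": "b"}, [(1, 0.9)])
idx.add_series("s3", {"name": "mem", "host": "a"}, [(1, 123.0)])

result = idx.query({"name": "cpu", "host": "a"}, 0, 10)
```

Real systems keep the postings in compressed FST segments rather than hash maps, but the lookup-then-scan shape is the same.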


Is there a resource to learn about this more? Also, a general introduction about how to design indexes depending on read/write patterns?

elasticsearch is the one thing i've worked with that i've had to learn to pretend i know nothing about. you do not want to get labeled the expert on that thing. it's nice and finicky.

er, crap, i've outed myself


The technical details of their software are described in https://eng.uber.com/m3/

This looks like a competitor to Cortex (https://www.cncf.io/blog/2018/12/18/cortex-a-multi-tenant-ho...).


They took a different path on the "never build your own database" question.

Another viable player that took a path similar to M3DB's is VictoriaMetrics [1]. This allowed implementing various features [2] and optimizations [3] without needing to negotiate their integration into upstream Prometheus. Such negotiations can get stuck forever [4].

[1] https://github.com/VictoriaMetrics/VictoriaMetrics/

[2] https://github.com/VictoriaMetrics/VictoriaMetrics/wiki/Exte...

[3] https://medium.com/@valyala/measuring-vertical-scalability-f...

[4] https://github.com/prometheus/prometheus/issues/3746


> Chronosphere, a startup from two ex-Uber engineers, who helped create the open source M3 monitoring project to handle Uber-level scale, officially launched today with the goal of building a commercial company on top of the open source project.

I recall a thread here from 2-3 weeks ago about how "Uber-scale" wasn't really Uber scale, and that most of these publicized "Uber-scale" projects ended up getting canned internally. Any insider insight into this M3 project?


Rob, co-founder and M3DB creator here. Uber collected billions of metric samples, and we had tens of billions of metrics in M3 at Uber. Netflix, for reference, has not published any numbers higher than single-digit billions of time series. The system has run in production at Uber for several years now. Those are my thoughts on the matter, hah.

New Relic touts collecting trillions of data points per day.

According to https://eng.uber.com/m3/

> Released in 2015, M3 now houses over 6.6 billion time series. M3 aggregates 500 million metrics per second and persists 20 million resulting metrics-per-second to storage globally (with M3DB), using a quorum write to persist each metric to three replicas in a region.

So, if that's accurate, they're collecting one trillion data points every two seconds.


So we collected and aggregated more than 1 billion metric samples per second, which resulted in writing more than 30-40 million unique metric datapoints per second to storage. This resulted in more than 10 billion unique time series being stored (each with a very large number of distinct datapoints).

This was 3.6 trillion metric samples per hour or 2.5 trillion metric datapoints stored a day (after aggregating samples).
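Those figures check out with simple arithmetic (a quick sanity check using the numbers quoted above):

```python
# Sanity-check the quoted M3 throughput figures.
samples_per_sec = 1_000_000_000   # collected and aggregated per second
written_per_sec = 30_000_000      # unique datapoints persisted per second

samples_per_hour = samples_per_sec * 3600   # 3.6 trillion samples/hour
written_per_day = written_per_sec * 86400   # ~2.6 trillion datapoints/day
```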


No, they're collecting one BILLION (with a b) data points every two seconds. Gotta go to 2000 seconds (a little over half an hour) for the TRILLION.

With a 25:1 reduction/summarization before writing. If they're smart, they do that summarization on the way in, rather than at the back-end layer. That's a billion data points written per minute, or a trillion and a half written per day!
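For reference, the corrected arithmetic from the eng.uber.com numbers (500 million samples/s aggregated, 20 million/s persisted):

```python
# Verify the correction: at 500M metrics/s, a billion samples arrive
# every two seconds, and a trillion takes 2000 seconds (~33 minutes).
ingest_per_sec = 500_000_000
persist_per_sec = 20_000_000

seconds_per_billion = 1_000_000_000 / ingest_per_sec        # 2.0 s
seconds_per_trillion = 1_000_000_000_000 / ingest_per_sec   # 2000 s
reduction_ratio = ingest_per_sec / persist_per_sec          # 25:1
persisted_per_day = persist_per_sec * 86400                 # ~1.7 trillion
```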


Oops, don't know how I misread that! Thanks for the correction!

Congrats Martin and Rob on the launch. M3 is one of the best tools I used at Uber, something that just works. I'm sure you guys will be successful, as I witnessed first-hand the value it brings to an organization.

Hey, Rob here, co-founder and M3DB creator; more than happy to answer any queries anyone might have. We're committed to keeping M3 100% Apache 2 licensed, clustering and all other M3 features included. We're focused on providing reliable metrics hosting at scale.

Congrats Rob! Great to see you guys weren't just good indoor soccer players ;). The M3DB architecture doc was a good read.

Haha, TY for the kind words. I had to stop playing years ago, unfortunately, with family commitments. Also, I mainly enjoyed playing on a team more than actually developing real soccer skills (I relied on others on the team to pull me upwards, heh).

Is the name inspired by the building in C&C Red Alert by any chance?

https://cnc.fandom.com/wiki/Chronosphere_(Red_Alert_3)

That's my personal reference for the word, but searching around a bit, it seems it was registered as a trademark by a medical company back in 1991, 5 years before Red Alert.

https://trademark.trademarkia.com/chronosphere-74147725.html


Congrats on the launch!

Metrics monitoring is hugely useful for figuring out what's going wrong (or right...) and where, especially when you can slice and dice by dimensions/tags. Microsoft (where I work) uses lots of metrics internally, for every service. It's nice to see M3/Chronosphere making this kind of thing more affordable and widely accessible.


One thing that I often miss when reading about this stuff is benchmarks. So it's faster than Prometheus? Prove it. So it's faster than Postgres, or TimescaleDB? Prove it.

It should be trivial, and the fact that it's not there and what you find instead is terms like "Uber-scale" is slightly worrying.

I'm not trying to take anything away from the achievements made here by the guys at Uber, but anyone seriously considering using this in production would probably need a better comparison against the alternatives.


It's not raw speed or raw single-node performance that M3DB is optimized for; it's a reliable scale-out story for when you need a considerable number of instances to collect the raw data you operate on (organizations of a certain size/complexity run into this, not just a handful of organizations/companies).

Benchmarks tend to favor their authors and are frequently gamed; look at GPU benchmarks like 3DMark, where manufacturers frequently released optimizations that were really only exercised by specific benchmarks.


> There weren’t any tools available on the market that could handle Uber’s scaling requirements

This isn't a problem that you can build a business around.

Edit: Ah, I get it. This is like a Mesosphere play--they're shepherding the M3 technology in the open source ecosystem and offering a commercial version. That makes more sense.


Really enjoyed using M3 at Uber. Great to see you guys continuing to support the open source community!

It just seems like another clone of Datadog.

The marketplace for DaaS (dashboards as a service) is really big.

Splunk kool-aid drinker here; pardon my ignorant question, but why not just use Splunk?

Actually, I think my real question is: why is there such a proliferation of these monitoring/logging/visualization-as-a-service startups? Who are the target customers, in terms of spend?


The most obvious answer is most of the new alternatives are open source and “free” while Splunk isn’t.

Congrats on the launch Rob and Martin. M3 is an amazing product which I had the privilege to use at Uber. Wish you two the best for your journey ahead!


