
Cortex: a multi-tenant, horizontally scalable Prometheus-as-a-Service - biggestlou
https://www.cncf.io/blog/2018/12/18/cortex-a-multi-tenant-horizontally-scalable-prometheus-as-a-service/
======
sciurus
Looking at this architecture, I wonder if it would make sense to replace
Prometheus itself with something simpler? Normally Prometheus pulls metrics,
evaluates rules against them, stores them, and answers queries on them. It
looks like in Cortex the latter three functions are handled by other services,
and Prometheus is just pulling and forwarding metrics.

Also, the article says Cortex "completes" Prometheus. While it does fill many
of the gaps, there is still a big one: First-class support for pushing
metrics. This seems especially important if you want to offer Prometheus as
SaaS. Your customers won't want to run their own Prometheus installs (that's
what they pay you for, after all), and asking customers to open their
firewalls so that your Prometheus can discover and scrape their infrastructure
is probably a non-starter.

~~~
GauntletWizard
Prometheus is winning because push-based metrics are the wrong approach.
Integration with service discovery to tell if your metrics are operating or
not is, in my experience, the #2 or #3 preventer of outages.
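
To illustrate the point about service discovery, here is a minimal sketch of how a pull-based scraper can derive an `up`-style series: because the scraper already knows every target it expects, a target that never answers still shows up as 0. The function name, the target names, and the `scrape_ok` input are hypothetical and are not Prometheus internals.

```python
def up_series(discovered, scrape_ok):
    """Return {target: 1 or 0} for every target that service discovery lists.

    discovered: set of target names from service discovery (hypothetical input).
    scrape_ok:  dict of target -> True/False for targets we got an answer about.
    A target missing from scrape_ok is treated as down, not as unknown.
    """
    return {t: 1 if scrape_ok.get(t, False) else 0 for t in sorted(discovered)}


# Hypothetical targets; db-1 never answered a scrape at all.
discovered = {"api-1:9100", "api-2:9100", "db-1:9100"}
scrape_ok = {"api-1:9100": True, "api-2:9100": False}

status = up_series(discovered, scrape_ok)
down = [t for t, v in status.items() if v == 0]
print(down)
```

In a pure push setup with no list of expected targets, `db-1:9100` would simply be absent rather than flagged.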

~~~
comboy
I love Prometheus but in some cases push-based metrics have their advantages.
E.g. I'm using a push-based system in my home automation, where apart from
metrics (stats of temp and humidity) it also works as a state machine (door
open, switch off) so that I can build logic on top of that by listening to
changes. I think having realtime data helps in evaluating some server problems
too. Obviously everything depends on use case and not every solution scales
well.

Re having a downtime indicator, that's really a few lines of code: if no event
is received in the last X seconds then the host is down or busy enough to need
attention. With a websocket connection you can react to such downtime pretty
much immediately instead of waiting for the next poll.
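
A rough sketch of that "no event in the last X seconds" check, assuming each pushed event carries a host name. The class name, the 30-second timeout, and the host names are made up for illustration; the `now` parameter exists only so the example is deterministic.

```python
import time


class HeartbeatMonitor:
    """Tracks when each host last pushed an event and reports stale ones."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # host -> time of the most recent event

    def record(self, host, now=None):
        # Call this whenever an event arrives from `host`.
        self.last_seen[host] = time.monotonic() if now is None else now

    def stale_hosts(self, now=None):
        # Hosts whose last event is older than the timeout.
        now = time.monotonic() if now is None else now
        return sorted(
            host
            for host, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=30)
monitor.record("door-sensor", now=100.0)
monitor.record("thermostat", now=125.0)
stale = monitor.stale_hosts(now=140.0)
print(stale)
```

Note that this only covers hosts that have pushed at least once; a host that never reported is not in `last_seen` and would need a separate list of expected hosts.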

I know there are workarounds but I really wish Prometheus would support some
push-based system out of the box. It also helps with tracking simple scripts
that only run occasionally etc. And yes, I know, they provide workarounds, the
docs are really good.

It seems to me that having a realtime system with push-based metrics is cheap
enough that we should have it already. Imagine the ideal monitoring system of the
future. Does it poll and refresh data every so often or are you able to see
everything as it's happening?

------
SlowRobotAhead
Off topic, but Cortex is a bad name.

There is already an amazingly popular tech usage of Cortex; you almost
certainly have one either on your person or within feet of you right now.

For SEO reasons alone it makes no sense to call your new service by a
well-known name belonging to someone else.

------
lima
What's the current state of the art for space-efficient long term Prometheus
storage?

PromHouse looks very promising:
[https://github.com/Percona-Lab/PromHouse](https://github.com/Percona-Lab/PromHouse)

~~~
csmarchbanks
Thanos
([https://github.com/improbable-eng/thanos](https://github.com/improbable-eng/thanos))
or Cortex are both great projects depending on your needs.

I would be wary of PromHouse since at a glance it seems to be storing each
timestamp in a separate row, which will not be as space efficient as either of
the above options.
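
As a rough illustration of why row-per-timestamp storage tends to cost more space, here is a sketch of delta-of-delta timestamp encoding, the general idea behind Gorilla-style chunk compression that Prometheus's own TSDB builds on. This is a simplification under assumed sample data, not the actual on-disk format of Thanos, Cortex, or PromHouse; the function names and timestamps are hypothetical.

```python
def delta_of_delta(timestamps):
    """Encode timestamps as (first value, first delta, list of delta-of-deltas)."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], dods


def decode(first, first_delta, dods):
    """Rebuild the original timestamps from the encoded form."""
    out = [first, first + first_delta]
    delta = first_delta
    for d in dods:
        delta += d
        out.append(out[-1] + delta)
    return out


# Hypothetical scrape times, roughly every 15 seconds with a little jitter.
timestamps = [1545091200, 1545091215, 1545091230, 1545091246, 1545091260]
first, first_delta, dods = delta_of_delta(timestamps)
print(dods)
assert decode(first, first_delta, dods) == timestamps
```

With a regular scrape interval the delta-of-deltas are zero or close to it, so each one can be packed into a few bits, whereas a row per sample stores the full timestamp every time.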

Full disclosure, I am a maintainer of the Cortex project.

