
Prometheus: Monitoring for the next generation of cluster infrastructure - Artemis2
https://coreos.com/blog/coreos-and-prometheus-improve-cluster-monitoring.html
======
ymse
This is more of a job ad than an article. I've wanted to try out Prometheus
for a while, but can't figure out how to:

1\. make it highly available

2\. play nice with firewalls

If I deploy Prometheus outside a NAT, and want to monitor 100 physical
machines on the inside with node_exporter, as well as a dozen different
services, how do I make these metrics available?

What if I have four identical NATed sites and want them all monitored by the
same outside Prometheus instance(s)?

~~~
XorNot
Prometheus can scrape targets through proxies, or via federated scraping.

Though it is an odd requirement to need to only have Prometheus outside a
firewall.

Edit: guessing from your post - the way I'd do it is run a Prometheus instance
at all 4 sites, and have them all federate from each other. That way each site
is the HA redundancy for the others.
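
As a rough sketch of what that looks like in practice: Prometheus serves a `/federate` endpoint that returns the series matching one or more `match[]` selectors, and each site scrapes that endpoint on its peers. The Python below only builds the URLs each site would scrape. The site names, hostnames, and the `{job="node"}` selector are made up for illustration, and it assumes every site can actually reach its peers on the network.

```python
from urllib.parse import urlencode

# Hypothetical site -> Prometheus address map; replace with real hosts.
SITES = {
    "site-a": "prometheus.site-a.example:9090",
    "site-b": "prometheus.site-b.example:9090",
    "site-c": "prometheus.site-c.example:9090",
    "site-d": "prometheus.site-d.example:9090",
}


def federate_url(host, selectors):
    """Build a /federate URL with one match[] parameter per selector."""
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"http://{host}/federate?{query}"


def federation_targets(site, selectors):
    """Return the federate URL of every peer that `site` should scrape."""
    return {
        peer: federate_url(host, selectors)
        for peer, host in sorted(SITES.items())
        if peer != site
    }


if __name__ == "__main__":
    # Print what the (hypothetical) site-a instance would scrape.
    for peer, url in federation_targets("site-a", ['{job="node"}']).items():
        print(peer, url)
```

In a real deployment the same information would go into each instance's scrape configuration (a job with the `/federate` metrics path and `match[]` params) rather than being fetched by a script.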

~~~
ymse
That's sensible, thanks.

What if the inside of the network is further segregated, and the Prometheus
instance does not have access to all endpoints? Can I use the push gateway as
a "proxy", or is a direct route required?

Sorry for the stupid questions. Now it's getting entirely academic of course
:)

Edit: just re-read your post and saw the proxy note. Time to read the
documentation again. This setup should make for a good blog post.

~~~
bbrazil
It's advised not to use the Pushgateway in that fashion. It's meant for
service-level batch jobs, not for subverting your organization's network
security policies. See
[https://prometheus.io/docs/practices/pushing/](https://prometheus.io/docs/practices/pushing/)
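
For contrast, here is a minimal sketch of the intended use: a short-lived batch job pushing its own result once it finishes. The Pushgateway accepts the text exposition format via PUT or POST on `/metrics/job/<job>` with optional label pairs appended to the path. The gateway address, job name, and metric name below are hypothetical, and the sketch assumes job and label values contain no slashes.

```python
import time
import urllib.request
from urllib.parse import quote


def pushgateway_url(base, job, grouping=None):
    """Build the push path: /metrics/job/<job>[/<label>/<value>...]."""
    parts = ["metrics", "job", quote(job, safe="")]
    for name, value in (grouping or {}).items():
        parts += [quote(name, safe=""), quote(value, safe="")]
    return base.rstrip("/") + "/" + "/".join(parts)


def render_gauge(name, value, help_text):
    """Render a single gauge in the Prometheus text exposition format."""
    return f"# HELP {name} {help_text}\n# TYPE {name} gauge\n{name} {value}\n"


def push(base, job, body, grouping=None, timeout=5.0):
    """PUT the metrics, replacing any previous push for this grouping key."""
    request = urllib.request.Request(
        pushgateway_url(base, job, grouping),
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical nightly job reporting when it last succeeded.
    body = render_gauge(
        "backup_last_success_timestamp_seconds",
        int(time.time()),
        "Unix time of the last successful backup.",
    )
    push("http://pushgateway.example:9091", "nightly_backup", body)
```

The point of the pattern is that the job itself has no long-lived endpoint to scrape. Machine-level metrics such as those from node_exporter do have one, which is why they should still be scraped directly.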

------
jamescun
I'm still not convinced of "Google Infrastructure for Everyone Else". Google
built Google's infrastructure because that is what Google needed. Your CRUD
app probably doesn't need that.

~~~
ownagefool
True, but if you're hosting 52 CRUD apps, it becomes very appealing.

------
Dowwie
Do the comparisons from the Prometheus web site still apply? Examples on the
following page pertain to projects with a lot of ongoing activity, such as
InfluxDB:
[https://prometheus.io/docs/introduction/comparison/](https://prometheus.io/docs/introduction/comparison/)

The storage comparison really should be updated.

~~~
jrv
You're right. We've neglected that whole section for too long and it's a bit
out of date. I don't have enough knowledge about the current state of InfluxDB
though (besides clustering becoming a closed-source feature and well-indexed
tags now being a thing, as well as them branching out into more things like
dashboarding and metrics collection). Input welcome!

