
Vulcan: An API-compatible alternative to Prometheus - pandemicsyn
https://github.com/digitalocean/vulcan
======
agentgt
One thing I don't really like about Prometheus is it seems to prefer the pull
aka scraping model over the push model.

I think the push model is better in terms of security and discovery (which is
how I think most of the other metric aggregators work).

I don't even like log scraping. I just push the log data through Kafka or
RabbitMQ and have something else pick it up.

I do like how Prometheus has a dimensional model instead of just raw
timeseries.

Speaking of which I still haven't found an effective way of merging or
correlating metric data with log data (particularly since it is two different
systems).

I sort of made some experimental headway with Druid since it kind of has
generic event and metric support. This was only possible because the events
are being pushed and not pulled (i.e. pushing to a bus allows for syndication).

~~~
andrewstuart2
The #1 reason Prometheus went with pull versus push is that it's fundamentally
impossible to overload a pull-based system. If you have a push-based system,
there's a hard problem of knowing whether you're getting all the messages or
if system load or outages at scale are causing you to drop some metrics on the
floor (as was the case at SoundCloud, which drove them to start Prometheus).

I'd recommend listening to their recent CNCF online meetup [1] for some more
of the background on push/pull and why they made the choices they did.

[1]
[https://www.youtube.com/watch?v=gfpG2_zIOvw](https://www.youtube.com/watch?v=gfpG2_zIOvw)
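
To make the pull argument above concrete, here is a minimal sketch of a scrape loop. It is not Prometheus code: `scrape_all` and the `fetch` callable are hypothetical names used only for illustration. The point it shows is that the collector sets the pace and notices a missing target itself, which it records as `up = 0` (Prometheus does expose an `up` series per scraped target).

```python
def scrape_all(targets, fetch):
    """Pull metrics from every known target, one at a time.

    targets: list of target names the collector already knows about.
    fetch:   hypothetical callable (name -> dict of metrics) that raises on failure.
    """
    results = {}
    for target in targets:
        try:
            metrics = dict(fetch(target))
            metrics["up"] = 1  # scrape succeeded
        except Exception:
            # The collector knows exactly which scrape failed, so a gap
            # in the data is visible rather than silent.
            metrics = {"up": 0}
        results[target] = metrics
    return results


def fake_fetch(name):
    # Stand-in for an HTTP GET against a target's metrics endpoint.
    if name == "node-b":
        raise ConnectionError("target unreachable")
    return {"requests_total": 42}


snapshot = scrape_all(["node-a", "node-b"], fake_fetch)
```

Because the loop only asks for data as fast as it iterates, the targets cannot flood it. The flip side is that it only sees the targets in its list.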

~~~
agentgt
I will for sure check it out but I have found the opposite to be true (well
not the opposite but a different problem). With pull I lose invaluable data
because either the pulling system is down or it just doesn't know about all
the nodes. Of course the converse could be said... what happens if nodes just
choose not to push? Consequently I made the nodes smarter. If they can't push
they crash.

I guess it is an architecture choice, but honestly if our nodes can't push to
the bus (because this bus is more than just metrics that can be dropped, it is
critical biz shit) it is a fatal state for us.

I think if you have a good enough bus like Kafka or RabbitMQ you can create a
big enough buffer to prevent the absolute chaos of overload (and now you just
monitor queue size). If you see overload happening (i.e. a massive queue) you
can selectively drop messages (particularly with RabbitMQ as it has routing).
But you are absolutely right that bad stuff can happen.
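
A rough sketch of the buffering idea above, using an in-memory queue as a stand-in for a real broker. `BoundedBuffer` and its `critical` flag are hypothetical and are not part of Kafka or RabbitMQ. The sketch only illustrates three things: watching queue depth, dropping droppable metrics first under overload, and failing loudly when critical messages cannot be accepted.

```python
from collections import deque


class BoundedBuffer:
    """Hypothetical in-memory stand-in for a broker queue with a size cap."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.items = deque()  # entries are (is_critical, message)
        self.dropped = 0      # how many messages were shed under overload

    def depth(self):
        # The number you would watch to spot overload building up.
        return len(self.items)

    def push(self, message, critical=False):
        if len(self.items) < self.max_size:
            self.items.append((critical, message))
            return True
        if not critical:
            # Overloaded: plain metrics are simply dropped.
            self.dropped += 1
            return False
        # Overloaded with a critical message: evict the oldest droppable one.
        victim = next(
            (i for i, (is_crit, _) in enumerate(self.items) if not is_crit),
            None,
        )
        if victim is None:
            # Nothing left to shed; treat this as a fatal state.
            raise OverflowError("buffer is full of critical messages")
        del self.items[victim]
        self.dropped += 1
        self.items.append((True, message))
        return True


buf = BoundedBuffer(max_size=2)
buf.push("cpu=1")
buf.push("cpu=2")
accepted = buf.push("cpu=3")             # buffer is full, metric is shed
buf.push("order-placed", critical=True)  # evicts the oldest plain metric
```

With a real broker the cap, the depth metric and the routing rules would live in the broker's own configuration rather than in application code.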

------
skybrian
I'm guessing the Prometheus they mean is a time-series database. See:
[https://prometheus.io/](https://prometheus.io/)

~~~
andrewstuart2
It's specifically a TSDB built for metrics and instrumentation. Basically
intended to be Google's Borgmon for everybody else in the same way Kubernetes
is essentially Borg for everybody else.

~~~
agentgt
It is more than a TSDB because it has dimensions. Most metric systems (e.g.
Graphite) are just: <time> <name> <number> <maybe unit>.

It is sort of nice to have dimensions for rollup/filtering instead of doing
weird name mangling like Graphite does. For example one might add a dimension
of build # or release version to each metric.

With a dimension like build # you could even effectively tell whether your
releases are improving performance-wise and graph it. TBH I haven't done it
yet but I think it would be useful instead of just manually doing it with time
filtering.
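
A small sketch of what the build # dimension idea could look like, with samples held as plain dicts. The metric name, label names and values are made up for illustration, and `select` / `mean_by` are hypothetical helpers, not any real query API.

```python
from collections import defaultdict
from statistics import mean

# Each sample carries labels (dimensions) instead of a mangled dotted name.
samples = [
    {"name": "request_latency_ms", "labels": {"build": "101", "host": "web-1"}, "value": 300},
    {"name": "request_latency_ms", "labels": {"build": "101", "host": "web-2"}, "value": 500},
    {"name": "request_latency_ms", "labels": {"build": "102", "host": "web-1"}, "value": 200},
    {"name": "request_latency_ms", "labels": {"build": "102", "host": "web-2"}, "value": 400},
    {"name": "queue_depth", "labels": {"build": "102", "host": "web-1"}, "value": 7},
]


def select(samples, name, **matchers):
    """Keep samples with this metric name whose labels equal every matcher."""
    return [
        s for s in samples
        if s["name"] == name
        and all(s["labels"].get(k) == v for k, v in matchers.items())
    ]


def mean_by(samples, label):
    """Roll samples up by one label and average the values in each group."""
    groups = defaultdict(list)
    for s in samples:
        groups[s["labels"].get(label)].append(s["value"])
    return {key: mean(values) for key, values in groups.items()}


latency_by_build = mean_by(select(samples, "request_latency_ms"), "build")
web1_only = select(samples, "request_latency_ms", host="web-1")
```

Comparing the entries of `latency_by_build` across builds is the "are releases getting faster" question, answered without any time-window filtering.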

~~~
zphds
We do this with Telegraf + InfluxDB. It has support for tags in measurements,
where we add the release version among other metadata, which we can then
visualize in Grafana with a simple group by in the query.

You can also shove in multiple related metrics as part of the same
measurement.

[https://docs.influxdata.com/influxdb/v0.13/write_protocols/l...](https://docs.influxdata.com/influxdb/v0.13/write_protocols/line/)
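
For a feel of what tagging a measurement with the release version looks like on the wire, here is a simplified sketch that formats one point in InfluxDB's line protocol (`measurement,tags fields timestamp`). `to_line` is a hypothetical helper, the measurement and tag names are invented, and the escaping covers only the common cases, so check the linked docs before relying on it.

```python
def escape(text, chars):
    """Backslash-escape the given special characters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry a trailing "i"
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value).replace('"', '\\"') + '"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line: several related fields share one measurement and tag set."""
    line = escape(measurement, ", ")
    for key, value in sorted(tags.items()):
        line += f",{escape(key, ',= ')}={escape(str(value), ',= ')}"
    line += " " + ",".join(
        f"{escape(key, ',= ')}={format_field(value)}" for key, value in fields.items()
    )
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


point = to_line(
    "http_requests",
    {"release": "1.4.2", "host": "web 1"},
    {"latency_ms": 12.5, "count": 3},
    1465839830100400200,
)
```

Here `release` is an indexed tag, so a group by on it is cheap, while `latency_ms` and `count` are the multiple related metrics stored in the same measurement.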

------
nixgeek
Kinda don't get why DigitalOcean "forked" rather than solving the long-term
retention problem by working with upstream, especially given the number of
comments which state "Development in this area should be done in Prometheus
first then merged into Vulcan".

Feels like another "We want our shed to be blue".

~~~
kanwisher
One of the devs here. We are actively working with the Prometheus devs. We are
all at Prometheus Conf this week. Upstream Prometheus is unlikely to ever
have a distributed backend; however, it will support projects like this.

