
Introducing Thanos: Prometheus at Scale - henridf
https://improbable.io/games/blog/thanos-prometheus-at-scale
======
sytse
Really great project. Instead of a fragile centralized approach, Thanos
embraced the federated nature of Prometheus. We can see ourselves using this
for GitLab.com. I just discussed Thanos with Ben, who works on the Prometheus
project itself and leads monitoring for GitLab:
[https://youtu.be/JzlwwGZ3yQ4](https://youtu.be/JzlwwGZ3yQ4)

~~~
INTPenis
My interpretation is that Thanos is for massive scale, perhaps suitable for
GitLab, but most organisations would manage with just Prometheus to start with
and might scale up to Thanos later.

Is this a correct observation?

~~~
witcher
Either that, or just use the basic Thanos setup (as described in the blog
post), which gives you better Prometheus HA support and a global view. You can
always add the setup for long-term metric retention later on.

------
bboreham
Great project - just wanted to mention Cortex which is a different take on
long-term large-scale Prometheus.

[https://github.com/weaveworks/cortex](https://github.com/weaveworks/cortex)

Best to read the design document [1] to learn more about the different choices
that Cortex makes.

Also no connotations of large-scale destruction :-)

(Note I work on Cortex)

1: [http://goo.gl/prdUYV](http://goo.gl/prdUYV)

~~~
audip
Is Weaveworks Cortex completely open source? Does it require Weave Cloud
subscription to post messages to Slack?

~~~
bboreham
Yes it is Open Source: the whole project is on GitHub, Apache 2 licenced.

Posting to Slack (and OpsGenie, PagerDuty, etc) is a feature of the Prometheus
AlertManager, which Cortex builds upon, no subscription required.

The commercial Weave Cloud product gives you a hosted instance of the same
code, storage - we ingest all your metrics and store them for a year - and a
nice GUI with user logins, team permissions, etc. Plus the Deploy and Explore
features which are hosted versions of two more Open Source projects.

------
irrational
Read the title - so disappointed this was not an Avengers - Aliens crossover.

~~~
stephengillie
If only it were hosted on _Pandora Nexus_.

(Unoriginal names are a pet peeve of mine. This is worse - it's "drafting"
like a racecar behind an unrelated recent marketing campaign.)

------
52-6F-62
Not directly related, but I just watched the demonstration video for SURVIVAL—
I am pretty blown away by the apparent scale of it.

I haven't heard of Improbable before now and it looks like so much work has
been done already.

Has anybody worked with SpatialOS here yet?

~~~
mwitkow
(disclaimer: I work at Improbable)

The reason why we built Thanos was to enable monitoring of large scale
simulation systems, which are inherently stateful, such as the Survival demo
(link [https://youtu.be/lGWON5TtS04](https://youtu.be/lGWON5TtS04)).

~~~
52-6F-62
Oh yes I already caught the demo.

So Thanos was your solution for building the energy economy and the larger
ecosystem? The energy simulation is what really struck me as a fundamentally
challenging element in a game. Maybe largely because in the first world we
don't really have to face that problem at its root anymore.

Frankly, I also want to know if there are plans to release something based on
that demo.

~~~
mwitkow
We use Thanos to provide the observability features (monitoring in particular)
to Workers (i.e. user processes we run on our Cloud) that perform the
simulation. You can have multiple Workers collaborating on a simulation of the
economics or ecology that export monitoring variables that you want to track.

Since the simulation is inherently dynamic, and the number of Workers can
change, Thanos helps us achieve the necessary scale and retention for the
hosted platform that is SpatialOS.

~~~
52-6F-62
I understand better now. I didn't read the post in detail, to be honest. I dug
into the rest a bit.

Well it all looks like a pretty immense amount of effort and work to put that
all together. I'm not a game dev, but I signed up to poke around. I've done a
little VR tinkering before, so I'm curious about the potential applications in
that realm as well.

I'm looking forward to seeing what comes of it.

------
Diederich
I would love to hear any experiences folks here have with this. We are
seriously looking at it right now.

~~~
tostaki
I suggest the second part of this talk from the last KubeCon:
[https://www.youtube.com/watch?v=IpGfmmJ2hcw](https://www.youtube.com/watch?v=IpGfmmJ2hcw)

~~~
MetalMatze
Wow. Nice to see my talk here. If you have any questions, feel free to ask.

------
kintoandar
Thanos seems like the closest thing to a silver bullet for the features that
Prometheus is missing (by design).

Quick question:

In a multi-Prometheus setup, if all the Thanos nodes are behind a load
balancer (without sticky sessions), does a particular query from a dashboard
interface like Grafana to that load balancer return the same dataset each
time it is run?

~~~
witcher
If by "Thanos nodes" you mean Thanos querier instances, then yes. It does not
matter which one you actually ask. All of them have the same view, with access
to the old metrics (Store Gateway) and the fresh ones (Prometheus + Sidecar,
the scraper).
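
A toy model of why this holds, assuming each querier is stateless and simply
fans out to the same set of backends before merging the results. This is only
an illustrative sketch, not Thanos code: `make_querier`, `store_gateway`,
`sidecar` and the sample data are all hypothetical names invented here.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[int, float]             # (timestamp_ms, value)
Store = Callable[[str], List[Sample]]  # hypothetical: series name -> samples


def make_querier(stores: List[Store]) -> Callable[[str], List[Sample]]:
    """Return a stateless querier that fans out to every store and merges."""
    def query(series: str) -> List[Sample]:
        merged: Dict[int, float] = {}
        for store in stores:
            for ts, value in store(series):
                # Toy deduplication: keep the first value seen per timestamp.
                merged.setdefault(ts, value)
        return sorted(merged.items())
    return query


# Two hypothetical backends: old data and fresh data, overlapping at ts=2000.
def store_gateway(series: str) -> List[Sample]:
    return [(1000, 1.0), (2000, 2.0)]


def sidecar(series: str) -> List[Sample]:
    return [(2000, 2.0), (3000, 3.0)]


stores = [store_gateway, sidecar]
querier_a = make_querier(stores)
querier_b = make_querier(stores)
```

Because neither querier holds any state of its own, `querier_a("up")` and
`querier_b("up")` produce the same merged list, so a load balancer can route
to either one.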

~~~
kintoandar
Thanks, that's exactly what I was looking for

------
latchkey
Storing the Prometheus data in long term storage raises one question for me...
what is the process for upgrading the TSDB data format when it changes over
time?

~~~
fabr
The format has already gone through one change since Thanos was started. The
format encodes its own version, and Thanos simply supports reading multiple
versions.
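
The general pattern can be sketched roughly like this: read the version that
the stored data carries, then dispatch to a matching decoder. This is a
minimal sketch, not how Thanos implements it. The `version` and `ulid` field
names, the two decoders and `open_block` are assumptions made up for the
example.

```python
import json
from typing import Callable, Dict


# Hypothetical decoders, one per on-disk format version.
def read_v1(meta: dict) -> str:
    return f"block {meta['ulid']} read with v1 decoder"


def read_v2(meta: dict) -> str:
    return f"block {meta['ulid']} read with v2 decoder"


# Supporting a new format means adding an entry; old entries stay readable.
READERS: Dict[int, Callable[[dict], str]] = {1: read_v1, 2: read_v2}


def open_block(meta_json: str) -> str:
    """Pick a decoder based on the version recorded in the block metadata."""
    meta = json.loads(meta_json)
    version = meta["version"]
    reader = READERS.get(version)
    if reader is None:
        raise ValueError(f"unsupported block format version: {version}")
    return reader(meta)
```

With this shape, old blocks never need rewriting in long-term storage; the
reader for their version just has to stay in the table.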

------
Dowwie
What app did you use to make those beautiful diagrams?

~~~
witcher
My hands + Google Drawing (: (blog post co-author here)

------
AaronM
Can Grafana query Thanos directly, or will a datasource plugin be required?

~~~
tedreed
The Thanos query nodes have the same interface as Prometheus itself, including
the web UI (with a few small changes), so you can just use the same Prometheus
plugin pointed at Thanos.
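
To illustrate what "same interface" means for a client, here is a small
sketch that builds an instant-query URL for the Prometheus HTTP API
(`/api/v1/query`). Only the base URL changes between the two targets. The
host names and ports below are placeholders, and `instant_query_url` is a
helper invented for this example.

```python
from urllib.parse import urlencode, urljoin


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus-style instant query URL for the given base URL."""
    return urljoin(base_url, "/api/v1/query") + "?" + urlencode({"query": promql})


# Placeholder addresses: substitute your own Prometheus and Thanos query hosts.
prometheus_url = instant_query_url("http://prometheus.example:9090", "up")
thanos_url = instant_query_url("http://thanos-query.example:10902", "up")
```

The path and query string are identical in both cases, which is why an
existing Prometheus data source only needs its URL changed.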

------
nickthemagicman
The name Thanos is not the greatest branding right now.

~~~
ashleyn
Almost immediately, a joke about half my data disappearing came to mind.

~~~
ayekat
From Wikipedia:

> In the Western classical tradition, Prometheus became a figure who
> represented human striving, particularly the quest for scientific knowledge,
> and the risk of overreaching or unintended consequences. In particular, he
> was regarded in the Romantic era as embodying the _lone genius whose efforts
> to improve human existence could also result in tragedy_ [...]

... effectively, Thanos is this at large scale.

