Introducing Thanos: Prometheus at Scale (improbable.io)
177 points by henridf on May 18, 2018 | 55 comments

Really great project. Instead of a fragile centralized approach, Thanos embraces the federated nature of Prometheus. We can see ourselves using this for GitLab.com. I just discussed Thanos with Ben, who works on the Prometheus project itself and leads monitoring for GitLab: https://youtu.be/JzlwwGZ3yQ4

My interpretation is that Thanos is for massive scale, perhaps suitable for GitLab, but most organisations would manage with just Prometheus to start with and might scale up to Thanos later.

Is this a correct observation?

Either that, or just use a basic Thanos setup (as described in the blog post) that gives you better Prometheus HA support and a global view. You can always add the setup for long-term metric retention later on.

Yes, a single Prometheus server can probably handle the scale of most SaaS applications.

Great project - just wanted to mention Cortex which is a different take on long-term large-scale Prometheus.


Best to read the design document [1] to learn more about the different choices that Cortex makes.

Also no connotations of large-scale destruction :-)

(Note I work on Cortex)

1: http://goo.gl/prdUYV

Is Weaveworks Cortex completely open source? Does it require Weave Cloud subscription to post messages to Slack?

Yes it is Open Source: the whole project is on GitHub, Apache 2 licenced.

Posting to Slack (and OpsGenie, PagerDuty, etc) is a feature of the Prometheus AlertManager, which Cortex builds upon, no subscription required.

The commercial Weave Cloud product gives you a hosted instance of the same code, storage - we ingest all your metrics and store them for a year - and a nice GUI with user logins, team permissions, etc. Plus the Deploy and Explore features which are hosted versions of two more Open Source projects.

Read the title - so disappointed this was not an Avengers - Aliens crossover.

If only it were hosted on Pandora Nexus.

(Unoriginal names are a pet peeve of mine. This is worse - it's "drafting" like a racecar behind an unrelated recent marketing campaign.)

Not directly related, but I just watched the demonstration video for SURVIVAL. I am pretty blown away by the apparent scale of it.

I haven't heard of Improbable before now and it looks like so much work has been done already.

Has anybody worked with SpatialOS here yet?

(disclaimer: I work at Improbable)

The reason why we built Thanos was to enable monitoring of large scale simulation systems, which are inherently stateful, such as the Survival demo (link https://youtu.be/lGWON5TtS04).

Oh yes I already caught the demo.

So Thanos was your solution for building the energy economy and the larger ecosystem? The energy simulation is what really struck me as a fundamentally challenging element in a game. Maybe largely because in the first world we don't really have to face that problem at its root anymore.

Frankly, I also want to know if there are plans to release something based on that demo

We use Thanos to provide the observability features (monitoring in particular) to Workers (i.e. user processes we run on our Cloud) that perform the simulation. You can have multiple Workers collaborating on a simulation of the economics or ecology that export monitoring variables that you want to track.

Since the simulation is inherently dynamic, and the number of Workers can change, Thanos helps us with achieving the necessary scale and retention for a hosted platform that is SpatialOS.

I understand better now. I didn't read the post in detail, to be honest. I dug into the rest a bit.

Well it all looks like a pretty immense amount of effort and work to put that all together. I'm not a game dev, but I signed up to poke around. I've done a little VR tinkering before, so I'm curious about the potential applications in that realm as well.

I'm looking forward to seeing what comes of it.

I would love to hear any experiences folks here have with this. We are seriously looking at it right now.

(disclaimer: Blog post co-author) You are welcome to join our growing community to learn more. (: Follow the Slack join button here: https://github.com/improbable-eng/thanos

I suggest the second part of this talk from the last Kubecon https://www.youtube.com/watch?v=IpGfmmJ2hcw

Wow. Nice to see my talk here. If you have any questions, feel free to ask.

Fantastic, thanks!

Thanos seems like the closest thing to a silver bullet for the features Prometheus is missing (by design).

Quick question:

In a multi-Prometheus setup, if all the Thanos nodes are behind a load balancer (without sticky sessions), does a particular query from a dashboard interface like Grafana to that load balancer result in the same dataset if run multiple times?

If by "Thanos nodes" you mean Thanos querier instances, then yes: it does not matter which one you actually ask. All of them have the same view and access to both the old metrics (Store Gateway) and the fresh ones (Prometheus + Sidecar, i.e. the scraper).
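
To illustrate the answer above, here is a toy model (not Thanos code) of why the choice of querier should not matter: each querier holds no data of its own and simply fans out to the same set of backends. The class names `Backend` and `Querier`, the sample layout, and the "first value wins" merge are all assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp, value); hypothetical layout


class Backend:
    """Stands in for a Store Gateway (old data) or a Sidecar (fresh data)."""

    def __init__(self, samples: List[Sample]) -> None:
        self.samples = samples

    def select(self, start: int, end: int) -> List[Sample]:
        # Return every sample whose timestamp falls in [start, end].
        return [s for s in self.samples if start <= s[0] <= end]


class Querier:
    """Stateless: the result depends only on the backends it fans out to."""

    def __init__(self, backends: List[Backend]) -> None:
        self.backends = backends

    def query(self, start: int, end: int) -> List[Sample]:
        merged: Dict[int, float] = {}
        for backend in self.backends:
            for ts, value in backend.select(start, end):
                # Keep the first value seen for a timestamp (assumed merge rule).
                merged.setdefault(ts, value)
        return sorted(merged.items())


store_gateway = Backend([(1, 0.5), (2, 0.7)])  # older samples
sidecar = Backend([(2, 0.7), (3, 0.9)])        # fresh samples, overlapping at ts=2

querier_a = Querier([store_gateway, sidecar])
querier_b = Querier([store_gateway, sidecar])
```

In this model a load balancer can route to `querier_a` or `querier_b` interchangeably, because neither keeps per-session state.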

Thanks, that's exactly what I was looking for

Storing the Prometheus data in long term storage raises one question for me... what is the process for upgrading the TSDB data format when it changes over time?

The format has already gone through one change since Thanos was started. The format encodes its own version, and Thanos simply supports reading multiple ones.
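
The general idea of "the data carries a version, the reader supports several" can be sketched as a dispatch table. This is only an illustration of the pattern, not the actual TSDB layout: the block structure, the `payload` fields, and the two decoders below are hypothetical.

```python
from typing import Callable, Dict, List


def _decode_v1(payload: dict) -> List[float]:
    # Hypothetical v1 layout: values are stored as-is.
    return list(payload["values"])


def _decode_v2(payload: dict) -> List[float]:
    # Hypothetical v2 layout: a base value followed by deltas.
    values: List[float] = []
    current = payload["base"]
    for delta in payload["deltas"]:
        current += delta
        values.append(current)
    return values


# One decoder per supported version; old blocks stay readable after a format change.
DECODERS: Dict[int, Callable[[dict], List[float]]] = {
    1: _decode_v1,
    2: _decode_v2,
}


def read_block(block: dict) -> List[float]:
    """Pick the decoder from the version recorded inside the block itself."""
    version = block["version"]
    decoder = DECODERS.get(version)
    if decoder is None:
        raise ValueError(f"unsupported block format version: {version}")
    return decoder(block["payload"])
```

With this shape, supporting a new format means adding a decoder rather than rewriting data that is already in long-term storage.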

What app did you use to make those beautiful diagrams?

My hands + Google Drawing (: (blog post co-author here)

Can Grafana query Thanos directly? Or will a datasource plugin be required?

The Thanos query nodes have the same interface as Prometheus itself, including the web UI (with a few small changes), so you can just use the same Prometheus plugin pointed at Thanos.
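
As a rough illustration of what "the same interface" means in practice, the sketch below builds a Prometheus HTTP API instant-query URL (`/api/v1/query`); per the comment above, the same path should work whether the base URL points at Prometheus or at a Thanos query node. The host name is hypothetical, and the helper `instant_query_url` is made up for this example.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for a Prometheus-compatible HTTP API."""
    # urlencode percent-escapes the PromQL expression for use in a query string.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical Thanos query endpoint; substitute your own address.
url = instant_query_url("http://thanos-query.example:9090/", "up")
```

In Grafana this corresponds to configuring a normal Prometheus datasource and pointing its URL at the Thanos query node.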

The name Thanos is not the greatest branding right now.

Almost immediately, a joke about half my data disappearing came to mind.

From Wikipedia:

> In the Western classical tradition, Prometheus became a figure who represented human striving, particularly the quest for scientific knowledge, and the risk of overreaching or unintended consequences. In particular, he was regarded in the Romantic era as embodying the lone genius whose efforts to improve human existence could also result in tragedy [...]

... effectively, Thanos is this at large scale.

... Balanced. As all things should be.

or alternatively:

Mr. database engineer. I don't feel so good.

That and simple confusion. Also, it will get harder and harder to Google as time goes on, especially with the next movie coming out.

Your data is using up too many resources!

I don’t know what a Thanos is, but upon first reading my brain said “Theranos”. Probably not a great brand typo-association either.

I did exactly this when I read the title, as well. I was very confused for a couple seconds.

Because of the Avengers or?

I mean it's the 5th highest grossing film of all time and still moving up.

Does this account for inflation?

Sorry film is a strong word for it. Movie I should say.

How is 'film' a strong word? What's the difference between a 'film' and a 'movie'? I'd normally call them films myself. Does it mean anything else to you?

My guess is they're using "film" as a term for a movie with greater redeeming value, sort of like the "literature" distinction with books. Pedantically, Infinity War was likely not recorded on actual film.

If you consider yourself having more sophisticated, artistic taste in movies, you may reserve the term 'film' for those. For the movie snob, Casablanca is a film, Spiderman 4 is a movie. Nothing too wrong with making that separation in my mind.

I agree with the sentiment that Avengers may not be the most sophisticated movie out there. Most people know this and would agree. But randomly calling attention to it to make sure people notice that you have a sophisticated taste is a little pretentious.

> If you consider yourself having more sophisticated, artistic taste in movies, you may reserve the term 'film' for those. For the movie snob, Casablanca is a film, Spiderman 4 is a movie.

I had no idea this was a distinction. I just thought 'film' was what the British called them and 'movie' was the Americanism. We say 'I'm going to see a film' rather than 'going to see a movie'.

The distinction is a snobby put-down of a popular film :)

Film is inappropriate for most modern movies because they're no longer filmed, they're digitally recorded and distributed. When there's no actual film stock used in the production of a movie, you shouldn't call it a film.

In the case of a Marvel movie, it's almost more accurate to call it an animation, but movie covers all production methods.

Incorrect. According to the dictionary, film is synonymous with motion picture, which is irrelevant as to the underlying medium. Also, people use it that way, so your pedantry is outdated and wrong.

Huh, funny thing, I can film something using my cellphone camera no problem. The power of language.

No, you can't. That's a malapropism that's become common as a result of non-technical people appropriating a technical term. Just because a lot of people say it, doesn't make it right.

It’s not a malapropism - that means accidentally using a word that sounds similar, not just using any incorrect word.

The irony is blinding.

> Just because a lot of people say it, doesn't make it right.

Actually, that’s exactly how the English language works. There’s no governing body that matters who determines what counts as correct English. The closest thing is the dictionaries but those change all the time, in response to....people using words differently.

In some cases, that's true. But not when it's a technical term that has a specific technical meaning. A good example of this is psychological terminology. The general public has latched onto terms like depression and borderline but uses them in ways that are incompatible with the real diagnoses. This causes real issues for people suffering from those illnesses since lay people mistakenly believe they have some idea of what those people are dealing with. The psychological community is vastly outnumbered by the general public, but their definitions remain the official meaning of those words.

Similarly, we can't claim that centrifugal and centripetal forces are the same things just because more than 90% of the population uses the term centrifugal for both. Some words have specific meanings that don't change no matter how many ignorant idiots decide they mean something else.

Come on. Watching a film vs. a movie isn't a technical term, it's something that everyone uses all the time, and the dictionary disagrees with your pedantry. You're just wrong here.
