
I don't know if I can consider my code "great", but I dedicated way too many months to a Prometheus library where I focused on quality, since I built it for myself.

It's relatively small, and I think the main takeaway would be the use of Protocols for the pluggable backend system. I hope you get something out of it :)

https://github.com/Llandy3d/pytheus
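
For a rough idea of what I mean by Protocols, here's a minimal sketch of a pluggable backend interface; the class and method names are illustrative, not the library's actual API:

    from typing import Protocol

    class Backend(Protocol):
        """Anything that can store and read a single metric sample."""

        def inc(self, value: float) -> None: ...
        def get(self) -> float: ...

    class InMemoryBackend:
        """Single-process backend: just a plain float."""

        def __init__(self) -> None:
            self._value = 0.0

        def inc(self, value: float) -> None:
            self._value += value

        def get(self) -> float:
            return self._value

    def observe(backend: Backend, value: float) -> None:
        # Works with any object that satisfies the Protocol,
        # no inheritance required (structural typing).
        backend.inc(value)

Swapping in, say, a Redis-backed class with the same two methods is all it takes to share state across processes.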


Well, I never claimed it was a super simple README that explains everything you need to know :)

The target audience was clear: people who had issues with multiprocessing in the current ecosystem.

But as I noticed the interest growing, I also noticed a lack of simple guides on how to go from zero to having graphs of your metrics in Grafana. That's why I started the "tutorial" section in the docs (pythe.us), initially for FastAPI, to try to fill that void. I'm also considering making a full video tutorial to complement it.

So to recap: here I'm sharing the existence of this project so I can hopefully gather some feedback on its future direction. For beginners, right now there is the "Quickstart" section in the documentation, which guides you through monitoring latency in a Flask application. It does not cover how to set up Prometheus yet, but I'm sure the Prometheus docs can help with that for now :)
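
If you want a taste of what that quickstart covers, here is a minimal sketch of latency monitoring in Flask; note I'm using the official prometheus_client API here for illustration, the quickstart itself uses pytheus' own API:

    from flask import Flask
    from prometheus_client import Histogram, generate_latest, CONTENT_TYPE_LATEST

    app = Flask(__name__)

    # Tracks request duration in seconds with the default buckets.
    REQUEST_LATENCY = Histogram("http_request_duration_seconds", "Request latency")

    @app.route("/")
    @REQUEST_LATENCY.time()  # observes how long the view takes
    def index():
        return "hello"

    @app.route("/metrics")
    def metrics():
        # The endpoint Prometheus scrapes.
        return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}

Point Prometheus at /metrics and the latency histogram shows up on every scrape.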


In theory the push gateway is meant for batch/one-off jobs, and only in limited cases, because it has drawbacks.

But yeah, that's what I'm trying to solve in a different way: registries in different processes are separate, but with the "backend" architecture, if they represent the same metric they will end up in the same place. Redis can be considered the shared memory for all the processes that together make up a single service: when you scrape any one of the processes, you get back the correct value. This is possible because Redis is single-threaded and its float increment operations are atomic.

Not everyone will want to use Redis, so the library defines a "Backend Protocol": as long as you respect the interface, you can build a different solution with different tooling, it's pluggable! (This made it extremely easy for me to implement a different backend in Rust, which might also remove the dependency on the Redis client on the Python side :) )
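
As a rough sketch of why this works (the keying scheme here is made up for illustration, it's not the library's actual layout): every process issues the same atomic command against the shared Redis instance, so the value converges no matter which process handled the request:

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def inc_counter(name: str, amount: float = 1.0) -> None:
        # INCRBYFLOAT is atomic: concurrent increments from many
        # processes never lose updates.
        r.incrbyfloat(f"metric:{name}", amount)

    def read_counter(name: str) -> float:
        raw = r.get(f"metric:{name}")
        return float(raw) if raw is not None else 0.0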


Correct, for multiprocessing support it uses Redis; the idea is that you run it as a sidecar alongside your service, and it keeps the metrics of all the processes in sync for scrapes.

The default operations are blocking and they are fast; for scrapes they are pipelined (retrieving all the metric values is the slowest part, and pipelining speeds that up).
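
A scrape in this model boils down to reading a bunch of keys at once, and a pipeline batches those reads into a single round trip; again an illustrative sketch, not the library's code:

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def scrape(metric_keys: list[str]) -> dict[str, float]:
        # One network round trip instead of one GET per metric.
        pipe = r.pipeline()
        for key in metric_keys:
            pipe.get(key)
        values = pipe.execute()
        return {
            key: float(value) if value is not None else 0.0
            for key, value in zip(metric_keys, values)
        }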

If you are curious, there are some benchmark tests I used to make sure the library was correct by comparing it with the official one. In my tests this approach is faster than the mmap-files one, but they test a really limited scenario, so take it with a grain of salt: https://github.com/Llandy3d/pytheus-bench#the-results-

Correct about the TTL: if a metric doesn't get scraped or incremented for longer than the TTL, it gets cleaned up. One hour is the current default value.
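
Conceptually, every touch of a metric refreshes an expiry on its key, so metrics that stop being updated fall out on their own. A tiny sketch of the increment side, with the key name and the 1-hour TTL hardcoded for illustration:

    import redis

    r = redis.Redis(host="localhost", port=6379)

    def inc_with_ttl(name: str, amount: float = 1.0, ttl_seconds: int = 3600) -> None:
        key = f"metric:{name}"
        pipe = r.pipeline()
        pipe.incrbyfloat(key, amount)
        pipe.expire(key, ttl_seconds)  # refresh the TTL on every increment
        pipe.execute()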

To go a little further into the details: the Rust-based backend has a separate thread for changing metric values, so on the Python side, whether sync or async, the operation is extremely fast. On the Rust side the operations are collected and pipelined together asynchronously. From the tests I've made this is enough, but if someone has an insane number of metrics to modify, it is possible to add support for multiple "writer" threads. The important bit is that operations on a single metric are done in order, which can easily be achieved by hashing across the number of threads, as sketched below. I hope this answers your questions!
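
The ordering trick is just consistent hashing of the metric name onto a fixed set of writer queues. Here's the idea sketched in Python (the real backend does this in Rust, and apply_increment here is a placeholder):

    import queue
    import threading

    NUM_WRITERS = 4
    queues = [queue.Queue() for _ in range(NUM_WRITERS)]

    def apply_increment(name: str, amount: float) -> None:
        ...  # placeholder for the actual backend write (e.g. batched/pipelined)

    def submit(metric_name: str, amount: float) -> None:
        # The same metric always hashes to the same queue, so its
        # operations are applied in the order they were submitted.
        idx = hash(metric_name) % NUM_WRITERS
        queues[idx].put((metric_name, amount))

    def writer_loop(q: queue.Queue) -> None:
        while True:
            name, amount = q.get()
            apply_increment(name, amount)

    for q in queues:
        threading.Thread(target=writer_loop, args=(q,), daemon=True).start()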


The correct naming would be "Prometheus client library", as per the Prometheus docs.

If you didn't have issues with multiprocessing, that's great, but it seems it's not uncommon for people to run into problems with that approach. For example (and this is the reason I started experimenting on this as a side project), on the k6 team at Grafana we had the problem of too many files getting created and not properly cleaned up; we were forced to go into single-process mode, which is fine with Kubernetes. But if you are not running on Kubernetes, I'm not sure there's a clean alternative :)


We ditched Prometheus in favour of Statsd before moving to Kubernetes. We had regular enough releases that we were losing significant data due to the scrape interval and felt a push model worked better for us. The push gateway had too many caveats and when we used it we felt we couldn’t trust the data coming from it.


I've probably cut the wrong word trying to make the title more concise: it's a Prometheus client library. (https://prometheus.io/docs/instrumenting/clientlibs/)

This is the library you use in your Python code to expose metrics so that Prometheus can scrape them. The diagram would definitely be useful: the project started as something that "you already knew about if you needed it", but I've noticed more interest from people with no experience with these systems. I was considering adding more information, or making small guides/tutorials to ease into why you need monitoring even for small applications, since the cost is small. Thanks!


