However, every time I look at the site it seems more and more like abandonware. There is only a quickstart and no proper documentation.
The documentation could use a lot of work, but do note that it now lives here:
However, why should I prefer Graphite over, e.g., Munin?
It's still open source but also has proper documentation.
It doesn't support alarms, right?
You can write a monitoring system that queries Graphite, and send notifications from there. Graphite can give you the raw data that backs the graphs:
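As a rough sketch of what such a check could look like: the render endpoint can return JSON instead of a PNG when you pass `format=json`, and each series comes back as a `target` plus a list of `[value, timestamp]` datapoints. The host name, metric path, threshold, and all function names below are made up for illustration, and the notification step is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def render_json_url(base_url, target, window="-10min"):
    """Build a render URL that asks for JSON instead of a PNG."""
    query = urlencode({"target": target, "from": window, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query}"


def breaches(series_list, threshold):
    """Return (target, value) pairs whose latest non-null value exceeds threshold."""
    alerts = []
    for series in series_list:
        # Each datapoint is [value, timestamp]; value is null (None) for empty slots.
        values = [v for v, _ts in series["datapoints"] if v is not None]
        if values and values[-1] > threshold:
            alerts.append((series["target"], values[-1]))
    return alerts


def check(base_url, target, threshold):
    """Fetch the series from Graphite and return any threshold breaches."""
    with urlopen(render_json_url(base_url, target), timeout=10) as resp:
        return breaches(json.load(resp), threshold)
```

Something like `check("http://graphite.example.com", "servers.web01.load", 4.0)` could then run from cron and mail you whatever it returns.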
Why should I care about Graphite?
I guess for me the best part is the simplicity of the whole thing. Getting data in is just a simple TCP or UDP socket call (which you can do with almost anything, from nc to curl). Getting graphs out is a URL (albeit a bit complex to create by hand :). Tying it all together is a simple, but functional, web interface.
Graphine solves the problem of graph generation and dashboard building in Graphite. By default, Graphite builds PNG graphs, which are expensive to render on the server and aren't dynamic. Graphine converts these static PNG files to SVG and lets the browser do all the heavy lifting of rendering.
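To show how little is involved in the "data in" side, here is a minimal sketch of the plaintext protocol: one line per metric, in the form `<path> <value> <timestamp>`, sent to the carbon listener. It assumes carbon is on its default plaintext port (2003) on localhost; the metric path and function names are invented for the example.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Format one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Open a TCP connection to carbon, send one metric line, and close."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

Calling `send_metric("servers.web01.load", 0.42)` is roughly what the usual `echo ... | nc` one-liner does.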
It is, however, not RRD:
Having said that, Graphite does not use THE RRD, but it is a kind of round robin database. :-)
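To illustrate what "a kind of round robin database" means, here is a toy in-memory sketch: a fixed number of slots, where each timestamp maps to a slot and newer data eventually overwrites the oldest. This is only the general idea, not Graphite's actual storage format; the class and its parameters are hypothetical.

```python
class RoundRobinArchive:
    """Fixed-size store: one slot per `step` seconds, `slots` slots in total."""

    def __init__(self, step, slots):
        self.step = step
        self.slots = slots
        self.data = [None] * slots  # each entry is (interval_start, value)

    def _index(self, timestamp):
        # Timestamps wrap around the fixed number of slots.
        return (timestamp // self.step) % self.slots

    def write(self, timestamp, value):
        start = timestamp - (timestamp % self.step)
        self.data[self._index(timestamp)] = (start, value)

    def read(self, timestamp):
        start = timestamp - (timestamp % self.step)
        entry = self.data[self._index(timestamp)]
        # A slot only counts if it still holds the interval we asked for.
        if entry is not None and entry[0] == start:
            return entry[1]
        return None
```

With `step=60` and `slots=3`, a fourth write lands in the first slot again, so the oldest minute is gone and the size never grows.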
We have stupid bash scripts that run on cron every minute. The scripts collect all sorts of system data. Things like load average, iowait, data in, data out.
They also collect all sorts of info about our custom network daemon. They take all of this info and, using netcat, echo, and sometimes statsd, shove it all into Graphite.
Having all of this info gives us several advantages. When we notice an issue, we are able to track it back in time. Did we see an increase in CPU usage at the same time as a sharp increase in data in? Did our daemon perform a bunch of "foo" actions at the same time? Well, obviously something about the "foo" action is killing the box.
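Our scripts are bash, but the same collect-and-shove pattern can be sketched in Python. This is an illustration, not our actual scripts: the `servers.web01` prefix, the function names, and the carbon address (localhost, default plaintext port 2003) are assumptions, and `os.getloadavg()` only works on Unix-like systems.

```python
import os
import socket
import time


def load_lines(prefix, loads=None, now=None):
    """Turn the 1/5/15 minute load averages into plaintext metric lines."""
    loads = os.getloadavg() if loads is None else loads
    now = int(time.time()) if now is None else int(now)
    names = ("1min", "5min", "15min")
    return [f"{prefix}.load.{name} {value} {now}"
            for name, value in zip(names, loads)]


def push(lines, host="localhost", port=2003):
    """Send all lines to carbon over one TCP connection."""
    payload = "".join(line + "\n" for line in lines)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))
```

Run `push(load_lines("servers.web01"))` from cron every minute and you get three new series with no further setup on the Graphite side.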
In addition to historical data, we can also do A/B testing from an operations perspective. Take a couple of boxes split into two groups, and set sysctl "y" to value x on all of the group A boxes. Put the same load on both groups and see how sysctl "y" affects the performance of the box.
Because Graphite has such a simple API for importing data, we can easily expand our scripts, push them to the boxes, and we have new sources of data. Little fuss.