
Grafana 4.0 with alerting is released - yobo
http://grafana.org/
======
torkelo
This release has been long in the making. We started on Alerting way back in
March this year and it's finally released! Read more about all the highlights
in the release here: [http://grafana.org/blog/2016/11/09/grafana-4.0-beta-release/](http://grafana.org/blog/2016/11/09/grafana-4.0-beta-release/)

Oh, and if you're in New York tomorrow, sign up for GrafanaCon:
[http://grafanacon.org](http://grafanacon.org)

~~~
Mahn
Congrats on the release! We've been using Grafana for a few years now, and
personally, built-in alerting is the only thing I would have added. Some may
argue that this "violates" good separation-of-concerns practices, but honestly
you are going to be alerting on the same data you feed to Grafana, so at the
end of the day it makes a lot of sense. Call it a "two in one" if you will.
Either way, this will make monitoring with Grafana much more streamlined.

~~~
zphds
Note that keeping them separate has the benefit that when your 'Visualization'
portal is down, your 'Alerting' systems are unaffected (and vice versa).

Collectd, Telegraf, etc. can be configured to send the same metrics to your
favorite TSDB and alerting system (like Riemann) in parallel.
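To make the "in parallel" point concrete, here is a minimal sketch of fanning one metric out to independent sinks, so that a dead alerting backend does not block writes to the TSDB (or vice versa). It assumes each sink is just a callable that delivers one metric line; the sink names and functions are hypothetical stand-ins for a real TSDB writer and a Riemann client, not any collector's actual plugin API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical sink type: any callable that delivers one metric line somewhere.
Sink = Callable[[str], None]


def fan_out(metric_line: str, sinks: Dict[str, Sink], timeout: float = 5.0) -> Dict[str, bool]:
    """Send the same metric to every sink concurrently; one failing sink
    does not stop the others. Returns per-sink success."""
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sinks))) as pool:
        futures = {name: pool.submit(sink, metric_line) for name, sink in sinks.items()}
        for name, fut in futures.items():
            try:
                fut.result(timeout=timeout)  # re-raises whatever the sink raised
                results[name] = True
            except Exception:
                results[name] = False
    return results


if __name__ == "__main__":
    tsdb_received: List[str] = []

    def tsdb_sink(line: str) -> None:  # hypothetical TSDB writer
        tsdb_received.append(line)

    def alerting_sink(line: str) -> None:  # hypothetical alerting backend that is down
        raise ConnectionError("alerting host unreachable")

    print(fan_out("cpu,host=web01 usage=0.73", {"tsdb": tsdb_sink, "alerting": alerting_sink}))
    # {'tsdb': True, 'alerting': False}
```

The design choice is isolation: each sink runs in its own worker and its failure is recorded rather than propagated, which is the property the comment above argues for.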

------
cheald
Influx + Telegraf + Grafana is such a simple, sweet stack. No work to
maintain, trivial to set up, I can ship just about anything I want into it,
and reporting is fast.

With alerting in place now, I'm happier than ever. A huge thank you to
the Grafana team for solving a huge pain point!

~~~
mattkrea
What kind of volume are you sending into Influx? It crashed on me probably 5
times a day with only 100 requests per second.

~~~
cheald
Right now it looks like it's around 50/sec. A lot of data points get rolled up
by Telegraf on individual machines, and then it's shipped in via the UDP line
protocol. I've written much larger volumes, though, and never had an issue
with stability.

~~~
user5994461
If I may ask, how is UDP doing for you?

I checked my graphite setup once. We had 27% of metrics lost over UDP. That
was bad.

Pro tip: run "netstat -anus" and look at the error counters.

~~~
cheald
About a 4% err-to-received ratio. That's probably due to untuned UDP buffer
sizes, though; despite the dropped packets, we're still getting the
information we need.
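The counters that "netstat -anus" reports come from the kernel's UDP MIB, which Linux also exposes in /proc/net/snmp as a pair of "Udp:" lines (a header row, then a value row). The sketch below parses that pair and computes the err-to-received ratio mentioned above as InErrors / InDatagrams. It assumes a Linux host, and the exact set of columns varies by kernel version, so only the fields it reads are relied on.

```python
from typing import Dict


def parse_udp_counters(snmp_text: str) -> Dict[str, int]:
    """Map the /proc/net/snmp 'Udp:' header names to their counter values."""
    # "UdpLite:" lines do not match because the 4th character differs.
    rows = [line.split()[1:] for line in snmp_text.splitlines() if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no Udp: header/value pair found")
    header, values = rows[0], rows[1]
    return {name: int(v) for name, v in zip(header, values)}


def udp_error_ratio(counters: Dict[str, int]) -> float:
    """InErrors / InDatagrams, i.e. the err-to-received ratio (0.0 if nothing received)."""
    received = counters["InDatagrams"]
    return counters["InErrors"] / received if received else 0.0
```

On Linux, `parse_udp_counters(open("/proc/net/snmp").read())` gives the current values. They are cumulative since boot, so sample twice and diff to get a recent rate; a growing RcvbufErrors count is the usual sign that the socket receive buffers are too small, which matches the "untuned UDP buffer sizes" explanation.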

------
user5994461
Quick note for those who are tired of the giant clusterfuck of open-source
tools for monitoring + alerting + storage + other, which is no less than:

- statsd
- collectd
- graphite
- whisper
- carbon
- prometheus
- grafana
- seyren
- riemann
- nagios
- icinga
- zabbix

There are several modern SaaS products that do all of that in a single tool,
with better integrations, more polish, less work and no maintenance.

1) See [https://www.datadoghq.com](https://www.datadoghq.com) and the latest news:
[https://techcrunch.com/2016/01/12/investors-feed-datadog-a-h...](https://techcrunch.com/2016/01/12/investors-feed-datadog-a-hefty-94-5-million-round/)

2) [https://signalfx.com/](https://signalfx.com/) and the latest news:
[https://techcrunch.com/2015/03/12/signalfx-emerges-from-stea...](https://techcrunch.com/2015/03/12/signalfx-emerges-from-stealth-to-modernize-cloud-application-monitoring/)

3) [http://www.bmcsoftware.uk/it-solutions/truesight.html](http://www.bmcsoftware.uk/it-solutions/truesight.html)
if you're not anti-enterprisey (that was the "Boundary" startup, bought by BMC
a few years ago and integrated into their offerings).

And don't think that they are "new" fancy tools. They've been around for many
years.

~~~
all_usernames
For installations of a few hundred instances or more, some of the SaaS
offerings cost more than the engineering salaries it would take to maintain
the OSS tools.

~~~
jrv
Case in point: [http://blog.runnable.com/post/153498635761/how-we-saved-98-o...](http://blog.runnable.com/post/153498635761/how-we-saved-98-on-infrastructure-monitoring)

~~~
otisg
Unfortunately, the post doesn't share things like: how much infra is needed
and how much does it cost, how much time it took to set up, how much
maintenance it needs, how long upgrades of the setup take, how much time
future hacking of missing features will take, and so on. After that sort of
stuff is truthfully taken into account, I suspect most, if not all, of the
savings would be lost.

------
boazjohn
What's new in v4: [http://docs.grafana.org/guides/whats-new-in-v4/](http://docs.grafana.org/guides/whats-new-in-v4/)

------
jhacobian
When you team Grafana up with a general-purpose database like Crate.io, some
pretty amazing things can happen. Not only can Crate just "roll with the
punches" of auto-sharding whilst dynamically scaling performance over N
database nodes, it also possesses powerful aggregation capabilities. If that
weren't enough, Crate also dynamically gzips data by default, which is
impressive given its zippy performance.

You get all of this for free with Crate.io without giving up the flexibility
of a general purpose SQL database...

Wanna start storing log data in Crate as well? No problem! Just design your
table schema and an API ingest layer (my favorite is Node.js, but you can use
any language you like).

Or, if public-facing security isn't an issue (say, you're on a subnet shielded
from the public internet), you can simply use the built-in REST API that Crate
exposes.
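A minimal sketch of talking to that REST API from Python's standard library, assuming Crate's HTTP SQL endpoint at `/_sql` on its default HTTP port 4200, a JSON body of the form `{"stmt": ..., "args": [...]}` with `?` placeholders, and a response whose result rows live under `"rows"`. The `syslog` table and `ts` column are hypothetical. As the comment says, this only makes sense on a network where the endpoint is not exposed to the public; there is no authentication or TLS here.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_sql_request(base_url: str, stmt: str,
                      args: Optional[List[Any]] = None) -> urllib.request.Request:
    """Build a POST to the assumed /_sql endpoint; '?' placeholders are bound from args."""
    body: Dict[str, Any] = {"stmt": stmt}
    if args:
        body["args"] = args
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_sql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(base_url: str, stmt: str, args: Optional[List[Any]] = None,
            timeout: float = 10.0) -> List[List[Any]]:
    """Execute one statement and return the result rows (performs a network call)."""
    req = build_sql_request(base_url, stmt, args)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["rows"]
```

For instance, `run_sql("http://localhost:4200", "SELECT host, count(*) FROM syslog WHERE ts > ? GROUP BY host", [cutoff_ms])` would return one row per host. Binding values through `args` rather than string formatting keeps log-derived input out of the SQL text.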

With Crate, I've been able to store hundreds of GB of system log data without
worrying about silly things like table bloat (the auto-sharding of partitioned
tables handles the spectre of bloated table shards for me, for free).

Thanks to the amazing developers over at Crate.io for taking the best of
Elasticsearch and making it sane, fast, and chock-full of SQL goodness!

Also a big thank you to the Grafana team for recognizing the potential
synergies that Crate.io & Grafana could catalyse for unifying time-series &
log data streams.

------
kawsper
Grafana looks really interesting, and I like that you can plug so many
different backends into it; for example, I didn't know you could use
Elasticsearch as a time-series backend.

Is it correct that Grafana works best with Graphite? At least that seems to be
my impression, and it is a bit sad, since I think Graphite is cool, but it
really has a lot of moving parts.

~~~
jmedefind
I've used it with InfluxDB, Elasticsearch, and Prometheus.

They all worked great. I can't think of any reason to use Graphite.

~~~
regecks
Indeed. We reduced our write iops by 95% by moving from Graphite/Carbon to
Influx. Try one of the newer databases!

------
pfranz
I'm currently using Prometheus, Grafana, and Alertmanager. I'm a big fan of
the linux terminal, versioned config files, and separation of concerns but the
rest of my team prefers web interfaces so I'm basically the only one
maintaining Alertmanager. Grafana Alerting looks appealing.

What have other people had success with?

~~~
user5994461
I've had success with killing all the s*** free open source tools (Grafana,
Graphite, Prometheus, Whisper, Icinga, Nagios, Carbon, Ganglia, InfluxDB,
Zabbix...)

And using a single paid tool that does the job better AND doesn't kill me in
maintenance work.

See [https://www.datadoghq.com/](https://www.datadoghq.com/) as the leader,
[https://signalfx.com/](https://signalfx.com/) as the runner-up, or
[http://www.bmcsoftware.uk/it-solutions/truesight.html](http://www.bmcsoftware.uk/it-solutions/truesight.html)
if you're enterprisey.

~~~
jmedefind
I don't see how anyone can afford SaaS metrics/alert services at any sort of
real scale.

$15/month/host gets expensive fast. Datadog doesn't start providing discounts
till you are at 1000+ hosts.

~~~
user5994461
All vendors provide discounts if you negotiate. ;)

$15 * 500 hosts = $7500 per month.

If you think it's expensive, I can only advise you to check how much the
hardware to run the free tools will cost on EC2, plus how much work it will
take to get 8 different, independent OSS tools working not only on their own
but also together, plus how much additional work and maintenance it takes to
keep it all running without hiccups (war story: there is nothing worse than a
monitoring tool that is less reliable than the thing it monitors).
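The arithmetic behind this exchange, as a small sketch: per-host list price scaled to monthly and yearly totals, plus the host count at which SaaS list price equals a yearly self-hosting budget. The self-hosting figure is a parameter you have to estimate yourself (hardware plus engineering time); no such number is given in the thread, and negotiated discounts are ignored.

```python
from typing import Tuple


def saas_cost(hosts: int, per_host_per_month: float) -> Tuple[float, float]:
    """Return (monthly, yearly) list-price cost for per-host SaaS pricing."""
    monthly = hosts * per_host_per_month
    return monthly, monthly * 12


def break_even_hosts(self_hosted_per_year: float, per_host_per_month: float) -> float:
    """Host count at which SaaS list price equals an estimated yearly self-hosting cost."""
    return self_hosted_per_year / (per_host_per_month * 12)


monthly, yearly = saas_cost(500, 15)
print(monthly, yearly)  # 7500 90000
```

So the $7500 per month quoted above is $90,000 per year at list price; whether that beats self-hosting depends entirely on the hardware and salary estimate fed into `break_even_hosts`.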

~~~
jmedefind
Oh, I agree. That's why we ditched EC2 for our own bare-metal cloud based on
Joyent and saved over $200k/year.

------
majewsky
True story: Our monitoring stack now has three distinct components with
alerting functionality.

~~~
raziel2p
We'll probably be in the same position. Grafana will make simple thresholds
easy to visualize, Kapacitor can do more advanced anomaly detection, and we
still need something like Sensu to do alerts that aren't really bound to
metrics - and it provides a dashboard of alerts. Kinda annoying, but it works,
I guess.
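For the "simple thresholds" case, a minimal sketch of how such a rule can be evaluated: the condition has to hold for a minimum duration before the alert fires, which avoids paging on a single spike. The pending/firing states are modelled on the idea behind Prometheus' FOR clause; this is an illustration, not Grafana's or Prometheus' actual implementation, and the class name is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Fire when a metric stays above `threshold` for at least `for_seconds`."""
    threshold: float
    for_seconds: float
    pending_since: Optional[float] = None  # start time of the current breach, if any

    def evaluate(self, value: float, now: float) -> str:
        if value <= self.threshold:
            self.pending_since = None  # breach over: reset
            return "ok"
        if self.pending_since is None:
            self.pending_since = now  # breach just started
        if now - self.pending_since >= self.for_seconds:
            return "firing"
        return "pending"


if __name__ == "__main__":
    rule = ThresholdRule(threshold=90.0, for_seconds=60)
    for t, cpu in [(0, 95.0), (30, 97.0), (60, 96.0), (90, 40.0)]:
        print(t, rule.evaluate(cpu, now=t))
    # 0 pending / 30 pending / 60 firing / 90 ok
```

Anything beyond this (seasonality, anomaly scores) is where a tool like Kapacitor comes in, and alerts not tied to metrics need a check-based system like Sensu, as described above.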

------
creatio
Anybody got tips on how to get started implementing an alert system? Or what
to read to get started?

------
pizza
I can't be the only one who laughed out loud while reading the ad for
GrafanaCon. It contains the word "democratization" and takes place on an
aircraft carrier.

------
poezn
Is anyone using a log management tool in conjunction with Grafana? I.e. if you
see something anomalous or see an alert triggered, how do you investigate
what's going on?

~~~
otisg
We've used Grafana with Sematext Logsene (which exposes the Elasticsearch API,
so it's like having Grafana talk to ES).

Here's a short howto + video: [https://sematext.com/blog/2015/12/14/using-grafana-with-elas...](https://sematext.com/blog/2015/12/14/using-grafana-with-elasticsearch-for-log-analytics-2/)

------
RRRA
It'd be nice if this meant being able to use Grafana as a frontend to
alertmanager.

(Writing those "ALERT ..." rules has a steep learning curve.)

~~~
thesorrow
This is exactly what I was thinking! I'd love to know what the Prometheus devs
think about alerting in Grafana...

~~~
fidget
[https://twitter.com/fabxc/status/803870900097523712](https://twitter.com/fabxc/status/803870900097523712)

> I repeat: Your alerts and dashboards belong into your SCM, not a random SQL
> database!

(And I 100% agree, particularly for alerts)

------
smegel
Hmm, I didn't know this was written in Go. Go seems to be doing quite well in
this space, with Bosun and scollector too.

~~~
katabatic
Go compiles to small, statically linked binaries, has relatively modest memory
usage and reasonably good concurrency support, and has an excellent standard
library for networking, so it's a natural fit for monitoring stacks. Even some
monitoring systems that aren't written in Go on the back end have Go-based
collectors.

------
tokenizerrr
Do users still have full access to all data sources, regardless of which
dashboards they have access to? That's what keeps me from using Grafana to
expose some data to clients.

------
mcncfie
Congrats guys!

------
rsmets
I've had alerting via grafana built and deployed for the last 16 months. Not
sure what took so long... but cool to see it native now. Keep up the good
work.

------
yclept
Switch to Datadog and don't look back. It's the most valuable SaaS for my teams.

