
Simple Alerting for the ELK Stack - be_erik
https://github.com/Yelp/elastalert
======
chrissnell
We're using this. We're not running the full ELK stack; rather, we've replaced
Logstash with fluentd. Our devs have two choices for acting on their logs:
they can use elastalert (running in Kubernetes) to alert on events that are
recorded in Elasticsearch. We've provided a sample elastalert template that
can be easily customized to a developer's needs.

They can also deploy a custom fluentd parser/transformer in Kubernetes. To
make this work, they apply a special label in their Kubernetes replication
controller that specifies the name of the custom fluentd parser service. The
primary fluentd service pulls the logs from Docker and when it detects this
label for a particular log entry, it routes that entry to the custom parser
service. This allows us to have a standard log pipeline that works out-of-the-
box for most projects but also _self-serve_ custom parsing for the apps that
need it.
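
A minimal sketch of the label-based routing described above, in Python rather than fluentd configuration. The label key, the sink names, and the entry layout are all hypothetical; the comment does not say what the real label is called or how the primary fluentd service is configured.

```python
from typing import Dict, List

# Hypothetical label key; the thread does not name the real one.
PARSER_LABEL = "custom-log-parser"


def route(entry: dict, sinks: Dict[str, List[dict]]) -> str:
    """Append the entry to the sink it belongs to and return that sink's name."""
    labels = entry.get("labels", {})
    # Entries without the label stay on the standard pipeline.
    target = labels.get(PARSER_LABEL, "default")
    sinks.setdefault(target, []).append(entry)
    return target


sinks: Dict[str, List[dict]] = {}
plain = {"log": "GET /health 200", "labels": {"app": "web"}}
custom = {
    "log": "k1=v1|k2=v2",
    "labels": {"app": "billing", PARSER_LABEL: "billing-parser"},
}

first = route(plain, sinks)    # goes to the standard pipeline
second = route(custom, sinks)  # goes to the app's own parser service
```

The point of the design is that unlabeled apps need no configuration at all, while an app that wants custom parsing opts in with a single label.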

------
sciurus
Stack Exchange's monitoring system, Bosun
([https://bosun.org/](https://bosun.org/)), also has support for alerting on
the results of Elasticsearch queries.

------
otterley
It sounds like they're using a search engine as a substitute for a proper
CEP/ESP like Apache Flink. Was the latter considered? I'm rather curious as to
why they chose a search engine instead.

~~~
twic
ELK is a very standard stack for collecting and disseminating logs. It's a
fairly natural, and i think sensible, step to build a pattern-based alerting
tool as a bag on the side of ELK.

If you were going to do it more properly, the first choice would probably be
Riemann, which has already seen some adoption in this role. There are already
plugins on both sides to forward events to the other:

[https://www.elastic.co/guide/en/logstash/current/plugins-
out...](https://www.elastic.co/guide/en/logstash/current/plugins-outputs-
riemann.html)

[http://riemann.io/api/riemann.logstash.html](http://riemann.io/api/riemann.logstash.html)

~~~
otterley
With all respect to the very smart author of Riemann, as a practical matter,
its use of Clojure is simply not a hurdle most SRE types are going to overcome
to use it.

Elasticsearch is a great search engine. It's not a CEP or ESP. I don't mean to
discourage its use, but for this purpose it's not a good fit. It does make
sense to tee incoming logs to both as they suit different purposes, but I'd
delegate pattern analysis to the latter.

~~~
zenlikethat
Using Clojure, especially to slap together Riemann configs, is really not an
insurmountable obstacle, especially when something as useful as Riemann is on
the line. The more the self-defeating attitude that "Clojure's too hard"
propagates, the less likely people will be to even try. Most SREs are pretty
smart; I believe in them.

~~~
otterley
There are some great commercial options in this space like SignalFX and Sumo
Logic. The cost of Riemann's learning and implementation challenges would have
to be made significantly less for entire teams to make it a viable alternative
to those, in my experience.

If anyone runs a team of 25 or more SREs and effectively uses Riemann (and
there are more than 3 domain experts), and is not otherwise a Clojure shop, I'd
love to hear from you.

------
be_erik
Is anyone running this in production and willing to share their experiences? This
seems to be a pretty good replacement for Splunk's alerting mechanisms in the
ELK stack, which has always been one of the parts I miss the most.

~~~
tzakrajs
I am using it right now in production. It's worst trait is that it is not
clustered and does not share state about what alerts have been sent already.
This means you can have no more than one agent running without having
duplication of alerts. This also complicates high-availability for obvious
reasons.
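
A toy model of the limitation described above, not ElastAlert's actual code: each agent keeps its own record of sent alerts, so two agents both fire on the same match. The `Agent` class and the alert key are hypothetical and exist only for illustration.

```python
from typing import List, Set


class Agent:
    """Toy alerting agent that remembers which alert keys it has already sent."""

    def __init__(self, name: str, sent: Set[str]) -> None:
        self.name = name
        self.sent = sent  # a private set, or one set shared between agents

    def process(self, alert_key: str, outbox: List[str]) -> None:
        if alert_key in self.sent:
            return  # already alerted on this key, stay quiet
        self.sent.add(alert_key)
        outbox.append(f"{self.name}: {alert_key}")


def run(shared: bool) -> List[str]:
    """Let two agents see the same match and collect what they send."""
    outbox: List[str] = []
    common: Set[str] = set()
    agents = [Agent(n, common if shared else set()) for n in ("a", "b")]
    for agent in agents:
        agent.process("disk-full@host1", outbox)
    return outbox


isolated = run(shared=False)          # both agents send the alert
with_shared_state = run(shared=True)  # only the first agent sends it
```

With private state the outbox holds two copies of the same alert; with a shared set it holds one. In a real deployment the shared set would have to live in some external store, which is exactly the piece the comment says is missing.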

~~~
tzakrajs
(Correction/Public Shaming: Its)

------
yahyaheee
It's a nice product; just be aware that it is purely event-driven. So if you
are trying to alert on the absence of events, things get messy.

~~~
Liuser
Unless I'm misunderstanding your comment, isn't that a flatline type of alert,
which is supported by Elastalert?

~~~
yahyaheee
No, that is just if a value drops off, but that rule will never fire unless an
event occurs in Elasticsearch matching that query. I found this out the hard
way.
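
A toy model of the distinction being argued here, assuming integer timestamps and a simple "fewer than `threshold` events in the last `window`" check. It illustrates the behavior this comment describes and is not a statement about how ElastAlert actually evaluates flatline rules; both function names are hypothetical.

```python
from typing import List


def count_recent(event_times: List[int], now: int, window: int) -> int:
    """Number of events in the half-open interval (now - window, now]."""
    return sum(1 for e in event_times if now - window < e <= now)


def event_driven_alerts(event_times: List[int], window: int, threshold: int) -> List[int]:
    """Check the count only when an event arrives."""
    return [t for t in event_times
            if count_recent(event_times, t, window) < threshold]


def timer_driven_alerts(event_times: List[int], window: int, threshold: int,
                        ticks: List[int]) -> List[int]:
    """Check the count on a fixed schedule, whether or not events arrive."""
    return [t for t in ticks
            if count_recent(event_times, t, window) < threshold]


events = [0, 1, 2, 3]  # then the source goes silent
event_driven = event_driven_alerts(events, window=5, threshold=1)
timer_driven = timer_driven_alerts(events, window=5, threshold=1,
                                   ticks=[0, 5, 10, 15])
```

In this model the event-driven check never fires, because every evaluation is triggered by an event that itself counts toward the threshold. The timer-driven check fires at ticks 10 and 15, once the window contains no events.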

------
jonaf
How does this compare with the Logstash alerting plugin that's available?
Does it work without Logstash (i.e., just Elasticsearch)?

~~~
be_erik
It works against Elasticsearch and is somewhat agnostic as to how those
documents got there in the first place. I've been playing with it as part of
an EFK (fluentd instead of Logstash) install.

