

Riemann, a distributed systems monitor (built in Clojure) - whalesalad
http://aphyr.github.com/riemann/

======
snowwindwaves
I've been using Mango <http://mango.serotoninsoftware.com> since 2006 to
monitor control systems, mostly for small hydro power plants but also
communications networks, smart homes, solar arrays, etc. Unfortunately
development on the freely available source has stopped so I no longer get a
better product every year.

On bigger projects I've used either Citect or Wonderware, both of which get
the job done but show their age and are at times painful to use, although not
as horrible as much of the other legacy control system HMI software out there.

Mostly data is collected by polling modbus slaves, although OPC would be
another important protocol to support.
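
For readers who haven't met Modbus polling, here is a minimal Python sketch of the request a poller sends to read holding registers over Modbus TCP, and of how the reply is decoded. The frame layout follows the public Modbus specification. The helper names, unit id and register addresses are made up for illustration, and nothing here comes from Mango or Riemann.

```python
import struct

READ_HOLDING_REGISTERS = 0x03  # standard Modbus function code


def build_read_request(transaction_id, unit_id, start_address, quantity):
    """Build a Modbus TCP 'read holding registers' request (hypothetical helper)."""
    # PDU: function code (1 byte), starting address (2 bytes), register count (2 bytes)
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start_address, quantity)
    # MBAP header: transaction id, protocol id (always 0), number of bytes
    # following the length field (unit id + PDU), unit id
    header = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return header + pdu


def parse_read_response(frame):
    """Decode the 16-bit register values from a matching response frame."""
    function_code, byte_count = struct.unpack(">BB", frame[7:9])
    if function_code != READ_HOLDING_REGISTERS:
        raise ValueError("unexpected function code or exception response")
    return list(struct.unpack(">%dH" % (byte_count // 2), frame[9:9 + byte_count]))


# Example: ask unit 17 for two registers starting at address 0 (made-up values).
request = build_read_request(transaction_id=1, unit_id=17, start_address=0, quantity=2)
```

A poller would send `request` over a TCP socket, pass the reply to `parse_read_response`, and then wrap each register value in whatever event format the monitor expects.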

It seems like Wonderware or Citect is ready to be replaced by a distributed
system that uses the web browser as the display client. Systems monitoring
software such as Nagios, OpenNMS, and Cacti overlaps with the control system
HMI software arena, as both:

1\. display real-time data, preferably with some context (e.g. gauges to
indicate how close the variable is to its maximum or minimum limit, alarm or
shutdown thresholds) and sometimes overlaid on a diagram to assist in
visualizing or understanding the process

2\. "trend" (log and plot) historical data for analysis and reporting
purposes. Better yet would be interactive plotting (zoom etc).

I've been wondering about graphite (which riemann can use) as part of the
solution, and people seem to be producing great plots with d3.js. As an aside
I've used kst and veusz for desktop interactive plotting with success.

In summary: if riemann supported the modbus protocol it could be useful for
control systems.

~~~
aphyr
1\. The dashboard is in a rough spot right now--I haven't quite finished the
transition to the next-gen dash--but it does do "realtime" visualization of
events matching arbitrary queries with under 50-ms end-to-end latency. It'll
push about a thousand events/sec, depending on size and rendering complexity.
The websocket protocol is pretty straightforward, if you wanted to build
one-off system diagrams with streaming updates.

2\. Yeah, that would be great. The historical event store space is pretty
terrible right now, and it's such a big problem that I doubt I could
realistically tackle it. Librato Metrics and Hosted Graphite are both
approaching this as a service, and there's OpenTSDB if you have Hadoop people
on staff. Riemann has out-of-the-box integration with Librato and Graphite,
but I haven't set up an openTSDB cluster yet.

Modbus: that'd be cool. Implementing a Riemann server (i.e. a thing that
accepts events from the wire) is pretty straightforward, though I'd need to
understand the protocol. If you're interested in building it, I'm happy to
discuss how.

------
aphyr
Thanks for your interest everyone. I'll try to answer any questions here.
Going through some rough health stuff at the moment so I won't be on IRC, but
I do read the backlog and will respond when I get a chance. Cheers! :)

------
dgtized
I'm having a little trouble following your examples. In particular, some of
the examples are wrapped in (streams), which I am interpreting as the data
source to query, but then many of the examples are just a bare (where) clause.
Or is that the final target, i.e. you wrap it in streams if you want to make a
new stream? The system looks pretty slick, but I am having some problems
understanding some of the core concepts in the query DSL.

~~~
aphyr
Ah, yeah I should standardize the docs a bit. It'll be clearer when you've
looked at the stock config: (streams ...) just denotes the section of the
config where streams live. Since most streams are composable, I sometimes omit
the context.

It might be easier to think of streams as literal streams, rivulets, deltas,
and tributaries, which events flow through, rather than a query language with
well-defined clauses like SQL.
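
To make the "events flow through streams" picture concrete, here is a rough Python analogy. It is not Riemann's Clojure API: the names `streams` and `where` are borrowed from the discussion above, but the implementations, the `collect` helper and the sample events are assumptions made up for illustration.

```python
def where(predicate, *children):
    """Return a stream that forwards matching events to each child stream."""
    def stream(event):
        if predicate(event):
            for child in children:
                child(event)
    return stream


def collect(sink):
    """Return a stream that appends every event it sees to a list (hypothetical)."""
    def stream(event):
        sink.append(event)
    return stream


def streams(*top_level):
    """Return a function that offers each incoming event to every top-level stream."""
    def dispatch(event):
        for stream in top_level:
            stream(event)
    return dispatch


critical = []
everything = []

# Each top-level stream sees every event; `where` narrows what flows downstream.
core = streams(
    where(lambda e: e.get("state") == "critical", collect(critical)),
    collect(everything),
)

core({"host": "db1", "service": "disk", "state": "ok", "metric": 0.4})
core({"host": "db1", "service": "disk", "state": "critical", "metric": 0.97})
```

In this analogy a bare `where(...)` is a stream in its own right, and `streams(...)` only marks where the top-level streams are registered, which is why an example can show one without the other.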

------
jondot
Looks great, I'd love to read the code. Any idea if this is in the same space
as Esper?

~~~
necrodome
From the mouth of the project's owner:

 _From skimming the docs, it looks like Esper is much bigger, much cooler, and
with a more abstract version of events. It implements a lot of the primitives
I've been considering but haven't built yet. It looks more difficult to set
up, and has a commercial offering for support and HA; neither of which are
present in Riemann right now._ [1]

I recently finished building a monitoring system using Esper and JRuby since
the client asked specifically for that, but I wished I had used Riemann from
the beginning.

[1] <https://groups.google.com/forum/#!msg/riemann-users/GhVMYJowfe4/Me-6a7OQ8U0J>

~~~
jondot
Thanks

------
astine
I built something like this once for a client, also in Clojure. I'm going to
download and read the code to see how it compares.

------
Heliosmaster
Am I the only one troubled by the name? Riemann's interests were far from
monitoring systems...

~~~
aphyr
Riemann originated as a system for discrete calculus over metric streams; e.g.
Riemann sums.
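
As a small illustration of what a Riemann sum over a metric stream could look like, here is a Python sketch that integrates a sampled rate over time with a left Riemann sum. The function name and the sample data are invented for the example. It is not taken from Riemann's code.

```python
def riemann_sum(samples):
    """Left Riemann sum over (time, value) samples sorted by time."""
    total = 0.0
    # Pair each sample with its successor: the value at t0 is held until t1.
    for (t0, v0), (t1, _) in zip(samples, samples[1:]):
        total += v0 * (t1 - t0)
    return total


# Made-up samples: (seconds, bytes per second) at irregular intervals.
rate = [(0, 100.0), (10, 200.0), (15, 50.0), (20, 0.0)]
total_bytes = riemann_sum(rate)  # approximate bytes transferred over 20 seconds
```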

------
gjvc
Systems monitors are often too complex in themselves :-(

~~~
aphyr
You're right to be concerned about complexity: simple things are easy to
understand, easy to predict, and easy to change.

That said, I think you'll find many of the ideas in Riemann to be radically
simple. The config file is just a Clojure program. Streams are just functions
that take events. Events are just maps of keys to values. Everything is an
event: there is no concept of a first-class host or service, no need to update
the config when you add a host, and no poller loops.
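
A tiny Python sketch of the "everything is an event" idea: if events are plain maps, a table of current state can be built purely from what arrives, so a new host shows up the first time it reports. The `index` dict and helper names are hypothetical and say nothing about how Riemann stores state internally.

```python
# Latest event per (host, service). No host list is configured anywhere.
index = {}


def update_index(event):
    """Record the most recent event for this host/service pair."""
    index[(event["host"], event["service"])] = event


def hosts():
    """Hosts are discovered from the events themselves."""
    return sorted({host for host, _ in index})


update_index({"host": "web1", "service": "cpu", "metric": 0.3, "state": "ok"})
update_index({"host": "web2", "service": "cpu", "metric": 0.9, "state": "warning"})
update_index({"host": "web1", "service": "cpu", "metric": 0.5, "state": "ok"})
```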

Riemann tries to draw strong boundaries between the different layers of
monitoring. It speaks a simple network protocol and interoperates with other
systems for event collection, visualization, alerting, and storage, instead of
building in those systems. In many ways Riemann is defined not by what it
includes, but by what it leaves out.

That said, there's a lot of work required to make simple abstractions behave
correctly, especially around IO and error handling. Wherever possible I try to
draw clear internal boundaries to isolate this complexity, but it's still
there. If you have specific complaints about code or interfaces which seem too
complex to you, I'd be happy to try and explain or change them.

~~~
pjscott
The tone around here is often negative, but I want to give you a huge
compliment: _you understand simplicity._

Reducing hairy things to simple abstractions can save weeks of work in a
matter of minutes. It is the single most powerful programming technique I know
of. And I'm going to seriously consider switching a bunch of stuff over to
Riemann.

~~~
aphyr
Thank you. :)

------
d--b
This looks pretty cool. Has anyone tried it and liked it?

~~~
ispolin
I use it (and like it) to monitor a middleware we wrote. The developer, Kyle
Kingsbury, has a pretty good talk about it here: <http://vimeo.com/45807716>

