Observium: An auto-discovering network monitoring platform (observium.org)
17 points by based2 on March 22, 2015 | 18 comments

Another system focused on the wrong thing in monitoring: alerts and charts. Those are merely ways of consuming data, not the only ones, and not even the most important thing a decent monitoring system should do.

Sending e-mail or displaying a set of charts or a status table is simple. Letting you collect, collate and aggregate the data (metrics and events) in arbitrary ways, including ways you only think of after the fact, is what a monitoring system should do. With virtually everything on the market, when a need arises for any processing the monitoring system's author didn't anticipate, you have to write a lot of code outside said system.

We need fewer systems resembling invoicing systems and more systems resembling general purpose databases.

This is why monitoring still sucks.

Monitoring has been conceptually stuck in a rut for a long time, being little more than a noisy, whiny service on a huge bus of data. But monitoring data is the basis for much more: capacity management, demand management, closed-loop horizontal auto-scaling, security analysis, advanced predictive analytics in general, and of course the usual data-driven novel business insight that people keep spamming as "Big Data" crap. Treating your monitoring data as a data warehouse is, surprisingly enough, still relatively new in IT outside the usual tech companies.

I agree. I want to do this:

(me) Tell me "What other abnormal activity is correlated to that peak in network activity?"

(monitoring) "Sure, servers srv1, srv5 and srv7 CPU activity peaked at the same time, and for the following 30 seconds, server syslog12 disk IO went crazy, mostly with repeats of the log message referenced below."
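To make that wish a bit more concrete, here is a rough sketch of the simplest version of such a query: given the timestamp of a known peak, list the other metrics that spiked at the same time or shortly after. Everything here is an assumption for illustration (the metric names, the z-score threshold, the 30-second window, the `flat_with_spike` helper for fake data); it is not how any existing monitoring system does it.

```python
from statistics import mean, pstdev

def spikes(series, threshold=3.0):
    """Timestamps whose value lies more than `threshold` population
    standard deviations above the mean of the whole series."""
    values = [v for _, v in series]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:          # perfectly flat series: nothing to report
        return []
    return [t for t, v in series if (v - mu) / sigma > threshold]

def correlated_with(peak_ts, metrics, window=30, threshold=3.0):
    """Map metric name -> spike timestamps that fall in
    [peak_ts, peak_ts + window]."""
    hits = {}
    for name, series in metrics.items():
        near = [t for t in spikes(series, threshold)
                if 0 <= t - peak_ts <= window]
        if near:
            hits[name] = near
    return hits

def flat_with_spike(spike_at=None):
    """Hypothetical sample data: one point every 10 s, value 10,
    except a single value of 90 at `spike_at`."""
    return [(t, 90 if t == spike_at else 10) for t in range(0, 200, 10)]

metrics = {
    "srv1.cpu": flat_with_spike(100),
    "syslog12.disk_io": flat_with_spike(120),
    "srv2.cpu": flat_with_spike(None),   # never spikes
    "srv9.cpu": flat_with_spike(180),    # spikes, but too late
}

result = correlated_with(100, metrics)
print(result)
```

With the fake data above, only `srv1.cpu` and `syslog12.disk_io` are reported for a network peak at t=100. A real answer would also need the log messages and a better anomaly model, which is exactly the part that is hard to bolt on from outside.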

It's not clear to me what you're looking for. Most monitoring systems log to a general purpose database, so can't you query against that database when you want to analyze your data?

I admit I do very little systems/network administration work, so maybe there's something I'm missing.

Those monitoring systems use some backend, sure: RRD/Whisper, SQL databases, some even use NoSQL stores. But all of those are abstractions at the wrong level for monitoring. You can't do stream processing on them. You can't do processing in real time. The only viable operation is querying historical data, and even that is hardly doable.

Also, you say I can query such a database. But does the monitoring system allow me to query it? I have no documentation for it. Most of the time I can't easily feed events generated from such a query back into the database. And lastly, current so-called "state of the art" monitoring systems don't facilitate running custom queries, so I would need to write something totally external just to run the query.

Graylog2 and Riemann go a little way in this direction, but they stopped way, way too soon to be an answer to the current state of monitoring systems.
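For anyone unsure what "stream processing" means here as opposed to querying history, a minimal sketch: the state is updated once per incoming event, and a result is available immediately, without going back to stored data. The class and function names, the 10-second window and the limit of 3 are all made up for the example, and it assumes events arrive in non-decreasing timestamp order.

```python
from collections import deque

class SlidingWindowRate:
    """Counts the events seen in the last `window` seconds.
    Updated incrementally, one event at a time."""

    def __init__(self, window):
        self.window = window
        self.times = deque()

    def observe(self, ts):
        """Record an event at time `ts` and return the current count."""
        self.times.append(ts)
        cutoff = ts - self.window
        # Drop events that have fallen out of the window.
        while self.times and self.times[0] <= cutoff:
            self.times.popleft()
        return len(self.times)

def detect_bursts(timestamps, window=10, limit=3):
    """Yield the timestamp of every event that pushes the count
    within the window above `limit`."""
    rate = SlidingWindowRate(window)
    for ts in timestamps:
        if rate.observe(ts) > limit:
            yield ts

bursts = list(detect_bursts([0, 1, 2, 3, 20, 21, 40]))
print(bursts)
```

The point is that `detect_bursts` works on a generator just as well as on a list, so it could sit directly on a live event feed. An RRD file or an SQL table gives you nothing comparable; you would have to poll it.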
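What I mean by running a custom query and feeding the result back, as a sketch only: an ad-hoc query over the metrics, whose output is stored as new events next to the original data. The schema (`metrics`, `events`), the sample rows and the `derive_events` helper are hypothetical; SQLite from the Python standard library just stands in for whatever store a monitoring system would expose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts INTEGER, host TEXT, name TEXT, value REAL)")
conn.execute("CREATE TABLE events (ts INTEGER, host TEXT, message TEXT)")

# Hypothetical sample data.
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?)", [
    (0, "srv1", "cpu", 12.0), (60, "srv1", "cpu", 97.0),
    (0, "srv5", "cpu", 15.0), (60, "srv5", "cpu", 20.0),
])

def derive_events(conn, name, limit):
    """Find samples of metric `name` above `limit` and write one event
    per sample back into the same database. Returns the number written."""
    rows = conn.execute(
        "SELECT ts, host, value FROM metrics "
        "WHERE name = ? AND value > ? ORDER BY ts, host",
        (name, limit),
    ).fetchall()
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(ts, host, f"{name} above {limit}: {value}") for ts, host, value in rows],
    )
    return len(rows)

written = derive_events(conn, "cpu", 90.0)
events = conn.execute("SELECT ts, host, message FROM events").fetchall()
print(written, events)
```

Nothing clever, and that is the complaint: this is a dozen lines when the store is open to you, and a separate project when it isn't.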

What is your opinion on the ELK stack?

This is a great free tool, just don't ask for something the developer doesn't agree with. He and the IRC community supporting it will tear you to shreds.

The thing about Observium is the feature set happens to select for an unusual number of unskilled+entitled+whiny users.

Observium's niche is user-friendly software for comprehensive monitoring of networks (and some server/appliance things). Setup is as easy as WordPress and it will monitor pretty much everything on a network (ISP/carrier). Adding a device is pretty much one click: add the hostname.

In exchange it expects things like having resolvable hostnames for devices and standardized port descriptions.

Compare this with Nagios or Cacti where you have to configure checks/graph collection for each thing you care about. The skill level required to set them up is a bit higher than Observium but still free.
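For readers who haven't used Nagios: "a check" is typically a small plugin that prints one status line and signals its result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and you wire one up per thing you care about. A minimal sketch of that decision logic, with a hypothetical `check_threshold` helper and made-up thresholds; a real plugin would fetch the value itself and exit with the returned code.

```python
# Exit codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_threshold(label, value, warn, crit):
    """Return (exit_code, status_line) for a value where higher is worse.
    `value` of None means the measurement could not be obtained."""
    if value is None:
        return UNKNOWN, f"{label} UNKNOWN - no data"
    if value >= crit:
        return CRITICAL, f"{label} CRITICAL - {value}"
    if value >= warn:
        return WARNING, f"{label} WARNING - {value}"
    return OK, f"{label} OK - {value}"

code, line = check_threshold("LOAD", 7.5, warn=5, crit=10)
print(code, line)
```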

Alternatively you have commercial software like Orion or Intermapper where setup may be easier but you have to pay for every metric you collect.

Because of this Observium gets a stream of vocal small MSP/ISP-type people who demand support and aren't willing to pay for it. I believe Observium went to a pay-for-access model in part to keep this under control.

I use Observium, pay for the subscription (~$200/yr is really trivial for this), and have contributed patches (that were committed) that fixed bugs and added device support (without any compensation aside from it making $DAYJOB easier). From the WISP community alone I saw probably half a year of "I won't buy a subscription until you support X device/X pet feature" from people who often came off as the client from hell in how they said it.

So, yeah, I don't like that I end up with a bit of anxiety over the reception my patches will get, but I see the occasionally rough reaction as an immune response from a community that's under stress.

I agree with this. The developer is no doubt good at what he does, but he and the community that supports him are very poisonous. He does not take criticism well, and is generally hateful to people who say something he doesn't agree with.

Do you have any examples? I'm not trying to provoke you. Seriously, I'd like to know what they do, since I'm still looking for a monitoring solution (currently testing munin).

I feel that it's getting more common recently.

…I'm not pointing in any direction… ducks

On their site they say you only have the "Ability to suggest new features" if you pay £150/year to obtain the Professional edition. Maybe their opinion changes if you throw money at them.

Not only that, I don't think the Community edition receives any new features at all.

The Community edition is updated every half year and seems to receive the new features then, but I can't verify it.

> The Observium Community Edition only receives critical security updates between 6-monthly release cycles and it is intended for small non-critical deployments, home use, evaluation or lab environments.

-- http://observium.org/wiki/Edition_Split#Observium_Community_...

Or send patches that the main developer doesn't agree with; you'll waste your time.

Give LibreNMS a try (www.librenms.org), it's a fork of Observium but with different values.

I do agree with dozzie though, and that's something LibreNMS hasn't resolved yet: being able to correlate data from different devices / systems to give you the one true answer of what's just gone wrong. I'd like to think we could get closer to being able to do that with more people contributing + some fundamental changes to the software.

Also check ntop out -- it has similar traffic reports, but no such fancy multibox support last time I looked.

That's more a job for SNMP clients anyway. Actually, just look at Nagios/Icinga :) It's a very mature and well-rooted infrastructure monitoring solution.
