
Another system focused on the wrong things in monitoring: alerts and charts. Those are merely ways of consuming data, not the only ones, and not even the most important things a decent monitoring system should provide.

Sending e-mail or displaying a set of charts or a status table is simple. What a monitoring system should do is let you collect, collate and aggregate the data (metrics and events) in arbitrary ways, including ways you only think of after the fact. With virtually everything on the market, when a need arises for any processing the monitoring system's author did not anticipate, one has to write a lot of code outside that system.

We need fewer systems resembling invoicing systems and more systems resembling general-purpose databases.

This is why monitoring still sucks.

Monitoring has been conceptually stuck in a rut for a long time, being little more than a noisy, whiny service on a huge bus of data. But monitoring data is the basis for many more things: capacity management, demand management, closed-loop horizontal auto-scaling, security analysis, advanced predictive analytics in general, and of course the usual data-driven business insight that people keep spamming as "Big Data" crap. Treating your monitoring data as a data warehouse is, surprisingly enough, still relatively new in IT outside the usual tech companies.

I agree. I want to do this:

(me) "What other abnormal activity is correlated with that peak in network activity?"

(monitoring) "Sure, CPU activity on servers srv1, srv5 and srv7 peaked at the same time, and for the following 30 seconds disk IO on server syslog12 went crazy, mostly with repeats of the log message referenced below."
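
To make that kind of question concrete, here is a rough sketch of how it could be answered if every metric were available as a plain list of (timestamp, value) pairs. Everything in it is an assumption for illustration: the metric names, the z-score threshold and the 30-second window are made up, and a z-score is just one simple way to define "abnormal".

```python
from statistics import mean, pstdev

def peaks(series, threshold=3.0):
    """Timestamps whose value lies more than `threshold` std devs above the mean."""
    values = [v for _, v in series]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # a flat series has no peaks
    return [t for t, v in series if (v - mu) / sigma > threshold]

def correlated_anomalies(metrics, reference, window=30, threshold=3.0):
    """Find metrics that peak at, or up to `window` seconds after, a peak in `reference`."""
    ref_peaks = peaks(metrics[reference], threshold)
    result = {}
    for name, series in metrics.items():
        if name == reference:
            continue
        hits = [t for t in peaks(series, threshold)
                if any(0 <= t - r <= window for r in ref_peaks)]
        if hits:
            result[name] = hits
    return result

def series_with_spikes(spike_times, n=60, base=10.0, spike=100.0):
    """Hypothetical test data: one sample per second, flat except at `spike_times`."""
    return [(t, spike if t in spike_times else base) for t in range(n)]

# Hypothetical metric names, loosely following the example above.
metrics = {
    "net.traffic": series_with_spikes({20}),
    "srv1.cpu": series_with_spikes({20}),
    "syslog12.disk_io": series_with_spikes({25, 40}),
    "srv2.cpu": series_with_spikes({55}),  # peaks too late to be related
}

related = correlated_anomalies(metrics, "net.traffic")
print(related)
```

The point is not the statistics but that the question is an ad hoc query over all collected data, which is exactly what is hard to express when the data sits behind a fixed set of charts and alert rules.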

It's not clear to me what you're looking for. Most monitoring systems log to a general purpose database, so can't you query against that database when you want to analyze your data?

I admit I do very little systems/network administration work, so maybe there's something I'm missing.

Those monitoring systems use some backends, sure: RRD/Whisper, SQL databases, some even NoSQL stores. But all of those are abstractions at the wrong level for monitoring. You can't do stream processing on them. You can't do processing in real time. The only viable operation is querying historical data, and even that is hardly doable.
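
For a sense of what stream processing means here, a minimal sketch: a derived metric (a sliding-window mean) is updated as each event arrives, rather than computed later by querying stored history. The function name, the event shape of (timestamp, value) pairs and the 10-second window are assumptions made up for this example, not any particular product's API.

```python
from collections import deque

def windowed_mean(events, window=10.0):
    """For each incoming (timestamp, value) event, yield the mean of the
    values seen in the last `window` seconds, including the new one."""
    buf = deque()
    total = 0.0
    for ts, value in events:
        buf.append((ts, value))
        total += value
        # Drop events that have fallen out of the time window.
        while buf and buf[0][0] <= ts - window:
            _, old = buf.popleft()
            total -= old
        yield ts, total / len(buf)

# `events` could be any iterator, e.g. one reading from a socket;
# a short list stands in for it here.
events = [(0, 1.0), (5, 3.0), (12, 5.0)]
out = list(windowed_mean(events))
print(out)
```

Because the function is a generator over an iterator, its output can itself be fed into another such stage or written back as a new metric, which is the kind of composition a storage backend alone does not give you.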

Also, you say I can query such a database. But does the monitoring system allow me to query it? I have no documentation for it. Most of the time I can't easily feed events generated from such a query back into the database. And lastly, current so-called "state of the art" monitoring systems don't facilitate running custom queries, so I would need to write something totally external just to run the query.

Graylog2 and Riemann go a little way in this direction, but they stopped way, way too soon to be an answer to the current state of monitoring systems.

What is your opinion on the ELK stack?
