Nagios vs Icinga: the story of one of the most heated forks in free software (freesoftwaremagazine.com)
80 points by edward on April 28, 2015 | 49 comments


Nagios is great, but Icinga fixed some serious issues with it that were long-standing. At least in the beginning, the reason for the fork was extremely clear, and there was no doubt the fork made sense. As of right now, I'm not so sure there's any huge advantage of Icinga 1.x over Nagios 3.x, other than a better UI.

However, Icinga2 and icingaweb2 are both coming along very nicely. In a position I just recently left, I was using Icinga2 in production, and I really loved it for its performance and clustering capabilities. It really blows Nagios out of the water in those aspects. When you configure the active/passive "master" nodes in a datacenter, the checks get split between the two nodes with only one updating the DB (per the docs), and in practice, this seemed to keep the load very low for us, even with over 20,000 checks polling at least once per minute.

I realize the story is more about history than a direct comparison between Icinga 1.x and Nagios, but I just wanted to share our experience with anyone who may be considering the two options - don't forget to consider Icinga2!

It should also be noted that I still really love Nagios, and the later versions have resolved at least some of the issues we've had with it in the past.


Have you tried OpsView? They package up Nagios with an easy installer and nice UI. Graphing gets enabled by default. It's all very slick and easy. I've used it a couple of times and it makes Nagios a far better system. OpsView is mainly open source, though they have a commercial edition with extra features.


> OpsView is mainly open source, though they have a commercial edition with extra features

It's only free for up to 25 hosts. Then it costs at least $65/mo (for up to 250 hosts) plus the fees for extra features.


Nope. Never seemed appealing to me, as I have easily installed Nagios with the graphing plugins and other added features in the past by hand - for a few companies at least.

One of the biggest wins of Icinga2 for Nagios shops is the fact that it can be sort of retrofitted to work with your existing Nagios hosts using NRPE.

Of course, NRPE is highly insecure and this is not recommended, blah blah, but it works, and I'm sure there are more than just a few people still using an older version of NRPE in production.


Oh, hmm I think that might be a new limitation then. (Which only makes sense since they gotta make money somehow...)


25 hosts: free. 26 hosts: high costs.

Yeah, they could make that more appealing :)


If you are worried about spending money when expanding to 26 hosts, you aren't their market anyway. If you're worried about spending a few grand on monitoring for ~100 hosts or more, then you have a very strange business model.


The problem is 26 bare metal servers is very different from 26 VPSes.


We use Icinga2, after having used Nagios previously. Icinga2's configuration files are a joy to use when compared to Nagios, and the interoperability with Nagios Plugins is really nice.

On the other hand, we're getting really sick of every upgrade breaking our environment (if it's not the post-install scripts choking on our use of a non-default DB name, it's the configuration parser choking on previously-valid configuration files, or...), and we wish that the new UI could catch up with the capabilities of the classic UI.

We also wish we could pin a package version, but the Icinga devs remove the previous version from their repo the moment the new package goes up. Our solution to this will be to host the removed packages on our own repo; we just wish we didn't need to do that.


I had the same problem with other projects like that before. We were lucky in finding older builds in some Linux distribution repo. Maybe you can also try looking in SUSE's or Debian's repos for them. Another often-used alternative is automatically building the open source project yourself, especially if you find that new versions break your system on a regular basis. Building it yourself is also the first step to getting involved in the release process.


Nagios and derivatives are a hassle in modern infrastructure.

There are no good APIs (add/remove/modify hosts or services sanely).

The configuration files are not easy to templatize, and hard to understand even if you are able to programmatically generate them.

They are even more clunky in a dynamic environment like AWS.
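
To make the templating complaint concrete: most shops end up writing little generators that spit out Nagios/Icinga 1.x object definitions from some inventory source. A minimal sketch of that glue in Python (the host list, output filename, and the `generic-host` template name are assumptions for the example, not anything from the thread):

    # Sketch: render Nagios/Icinga 1.x host definitions from a dynamic inventory.
    # Hostnames, addresses, and the output path are hypothetical; in a real
    # setup the inventory might come from an EC2 API call or a CMDB export.

    def host_definition(name, address):
        """Return one Nagios 'define host' block as a string."""
        return (
            "define host {\n"
            "    use        generic-host   ; assumes a host template of this name exists\n"
            "    host_name  " + name + "\n"
            "    alias      " + name + "\n"
            "    address    " + address + "\n"
            "}\n"
        )

    def render_hosts(inventory):
        """inventory: iterable of (name, address) pairs."""
        return "\n".join(host_definition(n, a) for n, a in inventory)

    if __name__ == "__main__":
        inventory = [("web-01", "10.0.0.11"), ("web-02", "10.0.0.12")]  # made up
        with open("generated_hosts.cfg", "w") as out:
            out.write(render_hosts(inventory))
        # The daemon still needs a reload to pick up the new file, which is the
        # clunkiness the parent comment is pointing at in dynamic environments.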

Check_mk is the sanest Nagios improvement add-on I've seen: http://mathias-kettner.com/check_mk.html And that has its own issues.


I started working as a sysadmin about two years ago, and one of the first things I did was set up Nagios to monitor our company network and devices. I was only dimly aware of the Nagios/Icinga split, and since Nagios never gave me any trouble, I have not given it much thought since.

But now I wonder - is there any technical reason to choose one over the other?


I'm in your position right now.

I set up Nagios to monitor about 50 servers last year and now I'm almost done switching to Icinga2. I think it's worth it if you have spare time, but the major improvements I've actually noticed have been UX, clustering, and Graphite plugins. Other than that, it's Nagios with a nice GUI.

Not worth the switch if you have pressing issues at the moment.


Site seems to be suffering, here's a cached link:

http://webcache.googleusercontent.com/search?q=cache:www.fre...


The recent hostile takeover of nagiosplugins by Nagios from the monitoringplugins folks puts the Nagios guy's statements that amount to "they just didn't want to work with us" in a very different light.


In general, what would you support, and (very important) why: someone who focuses on a clean architecture even if it means slower development and fewer features, or someone who will integrate nearly everything even if it means the code base, build process, etc. are a mess?

My feeling always says the first, but all my successes (which I define as "someone was getting his job done using my open source tool") came from the latter. So I wonder if my sample size is simply too small, or if I was too naive in the first place, thinking clean architecture is good and important.


> someone who focuses on a clean architecture even if it means slower development and fewer features

This kills the company when their competition outdoes them.

> or someone who will integrate nearly everything even if it means the code base, build process, etc. are a mess?

This kills the company slightly more slowly, when technical debt means slower development and fewer features in the long run, allowing their competition to outdo them.

> My feeling always says the first, but all my successes (which I define as "someone was getting his job done using my open source tool") came from the latter. So I wonder if my sample size is simply too small, or if I was too naive in the first place, thinking clean architecture is good and important.

It's also possible your bar for "clean architecture" is high, or different from what others would consider "clean architecture".

I have written some terrible, terrible code, that hasn't caused major problems. Either because it was replaced with a proper solution before growing too unwieldy, or because it was contained well enough that it never became unwieldy in the first place. You could say that even though the micro-architecture was terrible, the macro-architecture was at least acceptable - which IMO is the vastly more important of the two to keep "proper" for long term maintenance. Although one has to be wary of a system growing until the micro-architecture becomes macro-architecture, if the former is of poor quality.


The summary is that when you fork internationally you don't really need to follow trademark law in practice, although it'll generate a lot of heat, and forkers will end up having to follow the law sooner or later. So the lesson from the article is that forkers are better off in the long run following trademark law from the start of their fork...

The fundamental argument that generated the fork was that the community permanently diverged between simple and complicated, and the primary maintainer decided to ally with the simple team, probably from a strong historical "eat your own dogfood" culture. Maybe rephrased: some people have to ping 10K boxes per minute and other people have to check 1000 services or files or aspects on 10 boxes, and it's not REALLY possible for the same codebase to serve both groups equally well in actual real-world practice.

Yo, here's some patches to make your Screwdriver(tm) more hammer-ish to help professional nail installers work somewhat more effectively. What, are you crazy? The wood screw installation professionals such as myself will not tolerate that; patch denied. OK, we'll fork, and BTW we're taking the name with us, so on the other side of the planet people will be installing nails using our hammer-shaped fork of Screwdriver(tm). What could possibly go wrong? Oh no you won't, at least not with my trademarked name. (Consults with lawyers here.) Um, yeah, I guess you're right; we're gonna rename our hammer-shaped screwdriver the Hammer(tm). And years later the world's carpenters happily use Screwdriver(tm) and Hammer(tm), sometimes both at the same jobsite, without knowing the somewhat contentious history of those different tools.

The two user communities have little if any overlap. This near total lack of overlap lets the two communities live together in peace and plugins generally interoperate etc etc. More like relations between the Word and Excel communities than like Android vs iPhone communities. Aside from that exciting little trademark spat a long time ago relations have always appeared pretty calm and cool.

I admin'd a large nagios system a bit more than a decade ago that basically pinged the heck out of a large number of customers using configs generated by polling our own network devices automatically (read-only access, auto-generated configs, etc). It depended on the system being simple, stable, reliable, and fast; it was always speed-limited in deployment. I needed Oracle support like I needed a hole in the head, so new, complicated, slow, unstable features held negative appeal for that implementation. So I stuck with nagios, although I can totally understand a user with a completely different use case needing icinga. I wonder if that system is still running at my ex-employer? Probably. It was pretty bulletproof software.


> Maybe rephrased: some people have to ping 10K boxes per minute and other people have to check 1000 services or files or aspects on 10 boxes, and it's not REALLY possible for the same codebase to serve both groups equally well in actual real-world practice.

Really? Why? The first thing seems quite easy all around. Configuration-wise one would expect it to be simple. Computation-wise, pinging 10K boxes per minute means sending <200 packets per second (and receiving at most the same number). Maybe 3X that or so to get multiple samples. That's nothing for a modern machine.

The second one obviously requires some more complexity but I don't know why the extensible configuration and plugin system it'd require would have to interfere with simply pinging a bunch of machines. Computation-wise, again sending <200 requests per second should be trivial unless they're spectacularly heavy-weight.

The Nagios plugin protocol does seem to be somewhat heavy-weight, though. Looks like it still fork()+exec()s on every probe? https://nagios-plugins.org/doc/guidelines.html That seems like a much greater problem in terms of computation than the actual probes themselves, particularly if probes are written in a scripting language with non-trivial startup overhead. Doesn't look like Icinga is any different? What significant architectural differences exist between the two?
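
For reference, the plugin protocol itself is tiny: a plugin is any executable that prints one line of status text (optionally followed by "|" and performance data) and reports its state via the exit code — 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN, per the guidelines linked above. A minimal sketch in Python (the specific check and thresholds are made up for illustration):

    #!/usr/bin/env python3
    # Minimal sketch of a Nagios-compatible check plugin: print one status line
    # (optionally "| perfdata") and exit 0/1/2/3. Path and thresholds are made up.
    import os
    import sys

    OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

    def check_disk_usage(path="/", warn=80.0, crit=90.0):
        try:
            st = os.statvfs(path)
        except OSError as err:
            print("DISK UNKNOWN - %s" % err)
            return UNKNOWN
        used_pct = 100.0 * (1 - st.f_bavail / float(st.f_blocks))
        perfdata = "used=%.1f%%;%s;%s" % (used_pct, warn, crit)
        if used_pct >= crit:
            print("DISK CRITICAL - %.1f%% used | %s" % (used_pct, perfdata))
            return CRITICAL
        if used_pct >= warn:
            print("DISK WARNING - %.1f%% used | %s" % (used_pct, perfdata))
            return WARNING
        print("DISK OK - %.1f%% used | %s" % (used_pct, perfdata))
        return OK

    if __name__ == "__main__":
        sys.exit(check_disk_usage())

Every scheduled check pays the cost of starting that interpreter, which is exactly the fork()+exec() overhead being discussed.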


Nagios has an embedded Perl interpreter[0] so it's possible to avoid fork()/exec() if performance is critical. There are some caveats about how to use and code for it though [1].

[0]: http://nagios.sourceforge.net/docs/nagioscore/3/en/embeddedp...

[1]: http://nagios.sourceforge.net/docs/nagioscore/3/en/epnplugin...


The downsides and caveats are very similar to those of writing mod_perl-based web apps, last I saw those docs. They shouldn't be a problem for most Nagios plugins, though, since plugins hopefully shouldn't require state.


Check out the passive check protocol. Nagios has tons of features like that which you need to understand in order to get it to scale.

The default out-of-the-box config is something you monitor a handful of machines with, but Nagios is regularly run with 100k+ active checks. That requires some careful architecting.
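
Concretely, a passive check result is just a line written to the daemon's external command file (a named pipe), so the work of actually running the check can happen anywhere. A rough sketch, assuming the classic source-install path for the command file (yours will differ depending on distro/packaging):

    import time

    # Classic source-install default; distro packages put it elsewhere,
    # so treat this path as an assumption, not a constant.
    COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

    def submit_passive_result(host, service, return_code, output):
        """Push a passive service check result into the monitoring daemon.
        return_code follows the plugin convention: 0 OK, 1 WARN, 2 CRIT, 3 UNKNOWN."""
        line = "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
            int(time.time()), host, service, return_code, output)
        with open(COMMAND_FILE, "w") as cmd_pipe:  # the command file is a named pipe
            cmd_pipe.write(line)

    # Hypothetical usage: a batch job reports its own result instead of being polled.
    # submit_passive_result("web-01", "nightly-backup", 0, "BACKUP OK - done in 42s")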


Article should perhaps have "(2012)" in the title?



I have been using Nagios for a very, very long time, since just after the first English docs. I have never heard of (or don't remember) Icinga.


A few years ago, I was comparing Nagios to Icinga (I think it had just been forked), but ultimately settled on Shinken, which is not a fork but is still Nagios-compatible (both plugins and configuration).

It's been working like a champ. I will be reinstalling that server soon - does anyone have any insight on Shinken vs. Nagios or Icinga?


I am curious why configuration compatibility is a selling point, for you in particular. Is there some popular inventory management tool out there that has been cranking out Nagios configuration files?


I've been using NConf (http://www.nconf.org/) to manage a Nagios installation.

We migrated over to Icinga 1.x without much trouble. Our existing config still worked, NConf still worked, plugins still worked.


I've been unimpressed with NConf. It's useful for what it is; I even use phpMyAdmin occasionally.

Shinken sounds interesting, but I'd never heard of it. What I'm trying to understand is why anyone would have so much invested in their Nagios configuration syntax. If you have more than a few dozen hosts, you are probably already using a higher level configuration abstraction. NConf, for example.

But once you stop counting servers by the dozens, you are hopefully using an even higher level configuration abstraction for data center management. In which case you either already committed to a Big Vendor years ago, or you'd have no incentive to switch away from a legacy Nagios deployment.

Anyone who needs to replace a Nagios deployment and prioritizes configuration file level compatibility in their new monitoring platform sounds crazy to me.

(edit: plugin compatibility is all that anyone should care about, and even that's just based on good decisions about uniform output)


I am the guy who mentioned shinken[0].

I totally agree, and as I mentioned, configuration compatibility was NOT a selling point at all - I just mentioned it as a property of shinken that makes it (supposedly) a drop-in nagios replacement without even the need to reconfigure.

I see that as a selling point for testing (not relevant to me because I didn't have a previous nagios installation) - to test shinken for real, you just install it (a couple of apt-gets) and run it - you don't need to configure it.

Once you've actually decided to commit to a new system, configuration compatibility with your old system is much less important, of course (provided it's not a tangled mess of 1000 hand-edited files).

[0] http://en.wikipedia.org/wiki/Shinken_%28software%29


I'm pretty sure everyone cares about plugin compatibility, and a few people are impressed by the finesse involved in bringing configuration compatibility.

Your point about the ease of compatibility testing is quite salient.


It wasn't a selling point for me. I was just pointing it out.


I've been wondering: When you either have nagios or icinga, do you need any other monitoring solution? I remember the FIFA live stream guys from a previous article built their custom monitoring solution (named Sauron). Should they have simply used one of these instead of reinventing the wheel?


I love the Nagios vs Icinga bug comparison chart here:

https://wiki.icinga.org/display/Dev/Bug+and+Feature+Comparis...


From the article:

Ethan's comment about the wiki page above is quite critical: "This comparison on the wiki is both inaccurate and skewed, as it contains incorrect bug data and mixes new features that have not yet been implemented. This has apparently been done with the intention of trying to make Nagios look bad in comparison to Icinga."


It's working as intended then.


I wonder if that is really true. If it is, since that is a wiki and I assume one could edit it when logged in, perhaps Ethan would like to... Of course, I can't take sides - we use Nagios at my current company and I'm pretty satisfied with it. We've used it for several years without major complaint.


Never ask a butcher if his meat is better or worse than another’s.


This is very old, what's the current relevance?


Possibly... Node.js/IO.js? Or Arduino.cc/Arduino.org


I vote Arduino.cc/Arduino.org

BUT

ffmpeg vs libav was ridiculous, and Ubuntu switching to avconv and putting a weird warning on ffmpeg about it being deprecated and to please install avconv instead just got under my skin!!! [1]

[1] http://stackoverflow.com/questions/9477115/what-are-the-diff...


> This program is not developed anymore and is only provided for compatibility. Use avconv instead (see Changelog for the list of incompatible changes).


Seriously, it still makes me mad when I think back on first encountering this, especially since ffmpeg is superior in every way IMO.


I was not expecting "one of the most heated forks in free software" to involve two pieces of software I had never before heard of; nor am I entirely sure, after reading the article, what these packages actually do. Despite the mystery, it's more interesting than reading yet another account of the emacs vs xemacs split, or gcc vs egcs, which I suppose are drifting on back into ancient history at this point.


I expect anyone working on web services to know what Nagios is (or at least to have heard of it). If you've never worked on web services you're excused :p

Nagios is a monitoring server. It's the thing that sends emails and other notifications when various checks it runs periodically fail. Stuff like: process dead, CPU 100%, disk IO through the roof, database not responding, etc.


It's true, I have no experience with web services. Thanks for the explanation!


FWIW Nagios and Icinga are monitoring solutions.

The plugins that were talked about so much are small pieces of software that check whether enough free disk space or RAM is available, whether processes are running, whether ports are open, etc.

And then the core component aggregates the results from those checks, does alarming if configured, applies threshold checks (for example, only five failed checks in a row result in an alarm), and offers a web GUI where one can see all checks for a given host, get an overview of hosts with failing checks, etc.
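
The "five failed checks in a row" behaviour corresponds to what Nagios/Icinga call soft vs. hard states (max_check_attempts). Conceptually it is just a consecutive-failure counter; a toy sketch in Python, not the real implementation:

    class ServiceState:
        """Toy model of soft/hard state handling: only alarm once a check has
        failed max_attempts times in a row (cf. max_check_attempts)."""

        def __init__(self, max_attempts=5):
            self.max_attempts = max_attempts
            self.consecutive_failures = 0

        def record(self, check_ok):
            """Feed in one check result; return True when an alarm should fire."""
            if check_ok:
                self.consecutive_failures = 0
                return False
            self.consecutive_failures += 1
            # Fire exactly once, on the Nth consecutive failure ("soft" -> "hard").
            return self.consecutive_failures == self.max_attempts

    # With max_attempts=5, four transient failures stay silent ("soft");
    # the fifth consecutive failure turns the problem "hard" and alerts.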

I didn't have any contact with monitoring tools before joining an ISP, which relies heavily on them.


Got it. Thanks. I can see how that would be useful.


"one of the most heated"

So it's not the coolest. Gotcha!



