
Nagios vs Icinga: the story of one of the most heated forks in free software - edward
http://www.freesoftwaremagazine.com/articles/nagios_and_icinga
======
newsvatore
Nagios is great, but Icinga fixed some serious issues with it that were long-
standing. At least in the beginning, the reason for the fork was extremely
clear, and there was no doubt the fork made sense. As of right now, I'm not so
sure there's any huge advantage of Icinga 1.x over Nagios 3.x, other than a
better UI.

However, Icinga2 and icingaweb2 are both coming along very nicely. In a
position I just recently left, I was using Icinga2 in production, and I really
loved it for its performance and clustering capabilities. It really blows
Nagios out of the water in those aspects. When you configure the
active/passive "master" nodes in a datacenter, the checks get split between
the two nodes with only one updating the DB (per the docs), and in practice,
this seemed to keep the load very low for us, even with over 20,000 checks
polling at least once per minute.
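Here is a rough Python sketch of splitting checks between two HA nodes. It is not Icinga2's code or its real distribution algorithm. The endpoint names, the CRC32-based assignment, and the `db_writer` rule are all hypothetical. They only show how each check can be owned by exactly one node while a single node handles database writes.

```python
import zlib
from typing import Dict, List

# Hypothetical endpoint names; in Icinga2 these would be the endpoints of one zone.
ENDPOINTS = ["master1", "master2"]


def owner(check_name: str, endpoints: List[str]) -> str:
    """Choose the endpoint that executes a check, using a hash that is stable across runs."""
    # zlib.crc32 is deterministic, unlike Python's per-process salted hash() for strings.
    return endpoints[zlib.crc32(check_name.encode("utf-8")) % len(endpoints)]


def split_checks(checks: List[str], endpoints: List[str]) -> Dict[str, List[str]]:
    """Group check names by the endpoint responsible for running them."""
    plan: Dict[str, List[str]] = {ep: [] for ep in endpoints}
    for check in checks:
        plan[owner(check, endpoints)].append(check)
    return plan


def db_writer(endpoints: List[str]) -> str:
    """Assumption for this sketch: one deterministically chosen node writes to the DB."""
    return sorted(endpoints)[0]
```

If one master drops out, `split_checks(checks, ["master2"])` hands every check to the survivor. That is a simple way to model failover in this sketch.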

I realize the story is more about history than a direct comparison between
Icinga 1.x and Nagios, but I just wanted to share our experience with anyone
who may be considering the two options - don't forget to consider Icinga2!

It should also be noted that I still really love Nagios, and the later
versions have resolved at least some of the issues we've had with it in the
past.

~~~
MichaelGG
Have you tried OpsView? They package up Nagios with an easy installer and nice
UI. Graphing gets enabled by default. It's all very slick and easy. I've used
it a couple times and it makes Nagios a far better system. OpsView is mainly
open source, though they have a commercial edition with extra features.

~~~
moe
_OpsView is mainly open source, though they have a commercial edition with
extra features_

It's only free for up to 25 hosts. Then it costs at least $65/mo (for up to
250 hosts) _plus_ the fees for extra features.

~~~
MichaelGG
Oh, hmm I think that might be a new limitation then. (Which only makes sense
since they gotta make money somehow...)

~~~
voidz
25 hosts: free. 26 hosts: high costs.

Yeah, they could make that more appealing :)

~~~
dvanduzer
If you are worried about spending money when expanding to 26 hosts, you aren't
their market anyway. If you're worried about spending a few grand on
monitoring for around 100 hosts or more, then you have a very strange
business model.

~~~
brianwawok
The problem is 26 bare metal servers is very different from 26 VPSes.

------
falcolas
We use Icinga2, after having used Nagios previously. Icinga2's configuration
files are a joy to use when compared to Nagios, and the interoperability with
Nagios Plugins is really nice.

On the other hand, we're getting really sick of every upgrade breaking our
environment (if it's not the post-install scripts choking on our use of a non-
default DB name, it's the configuration parser choking on previously-valid
configuration files, or...), and we wish that the new UI could catch up with
the capabilities of the classic UI.

We also wish we could pin a package version, but the Icinga devs remove the
previous version from their repo the moment the new package goes up. Our
solution to this will be to host the removed packages on our own repo; we
just wish we didn't need to do that.

~~~
erikb
I had the same problem with other projects like that before. We were lucky
enough to find older builds in some Linux distribution's repo. Maybe you can
also try looking in SUSE's or Debian's repos for them. Another often-used
alternative is automatically building the open source project yourself,
especially if you find that new versions break your system on a regular basis.
Building it yourself is the first step toward getting involved in the release
process.

------
ashayh
Nagios and derivatives are a hassle in modern infrastructure.

There are no good APIs (add/remove/modify hosts or services sanely).

The configuration files are not easy to templatize, and hard to understand
even if you are able to programmatically generate them.

They are even more clunky in a dynamic environment like AWS.

Check_mk is the sanest Nagios improvement add-on I've seen: [http://mathias-kettner.com/check_mk.html](http://mathias-kettner.com/check_mk.html)
And that has its own issues.
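A common workaround for the templating complaint is to generate Nagios object definitions from an inventory instead of editing them by hand. The Python sketch below assumes the `linux-server` and `generic-service` templates and the `check_ping`/`check_ssh` commands from the sample configuration that ships with Nagios Core. The hosts, addresses, and the `Host`/`render` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Host:
    name: str
    address: str
    # Maps service_description -> check_command ("command!arg1!arg2" style).
    services: Dict[str, str] = field(default_factory=dict)


def render(host: Host) -> str:
    """Render one host block plus one service block per check, in Nagios object syntax."""
    out: List[str] = [
        "define host {",
        "    use                  linux-server",
        f"    host_name            {host.name}",
        f"    address              {host.address}",
        "}",
    ]
    for description, command in host.services.items():
        out += [
            "define service {",
            "    use                  generic-service",
            f"    host_name            {host.name}",
            f"    service_description  {description}",
            f"    check_command        {command}",
            "}",
        ]
    return "\n".join(out) + "\n"


# Hypothetical inventory; in practice it might come from a CMDB or a cloud provider's API.
PING = "check_ping!100.0,20%!500.0,60%"
inventory = [
    Host("web01", "10.0.0.11", {"PING": PING}),
    Host("db01", "10.0.0.21", {"PING": PING, "SSH": "check_ssh"}),
]

if __name__ == "__main__":
    print("\n".join(render(h) for h in inventory))
```

In a dynamic environment you would rebuild `inventory` from your source of truth and write the output into a directory listed under `cfg_dir` in nagios.cfg. You would also validate the result before reloading. Those steps are left out here.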

------
krylon
I started working as a sysadmin about two years ago, and one of the first
things I did was set up Nagios to monitor our company network and devices. I
was only dimly aware of the Nagios/Icinga split, and since Nagios never gave
me any trouble, I have not given it much thought since.

But now I wonder - is there any technical reason to choose one over the other?

~~~
NovaS1X
I'm in your position right now.

I set up Nagios to monitor about 50 servers last year and now I'm almost done
switching to Icinga2. I think it's worth it if you have spare time, but the
major improvements I've actually noticed have been UX, clustering, and
Graphite plugins. Other than that, it's Nagios with a nice GUI.

Not worth the switch if you have pressing issues at the moment.

------
impostervt
Site seems to be suffering, here's a cached link:

[http://webcache.googleusercontent.com/search?q=cache:www.fre...](http://webcache.googleusercontent.com/search?q=cache:www.freesoftwaremagazine.com/articles/nagios_and_icinga)

------
forgottenpass
In light of the recent hostile takeover of nagiosplugins by nagios from the
monitoringplugins folks, it puts the Nagios guy's statements that amount to
"they just didn't want to work with us" in a very different light.

------
erikb
In general, which would you support, and (very important) why: Someone who
focusses on a clean architecture even if it means slower development and less
features, or someone who will integrate nearly everything even if it means the
code base, build process etc are a mess?

My feeling always says the first, but all my successes (which I define as
"someone was getting his job done using my open source tool") came from the
latter. So I wonder if my sample size is simply too small, or if I was too
naive in the first place, thinking clean architecture is good and important.

~~~
MaulingMonkey
> Someone who focusses on a clean architecture even if it means slower
> development and less features

This kills the company when their competition outdoes them.

> or someone who will integrate nearly everything even if it means the code
> base, build process etc are a mess?

This kills the company slightly slower, when technical debt means slower
development and less features in the long run, allowing their competition to
outdo them.

> My feeling always says the first, but all my successes (which I define as
> "someone was getting his job done using my open source tool") came from the
> latter. So I wonder if my sample size is simply too small, or if I was too
> naive in the first place, thinking clean architecture is good and important.

It's also possible your bar for "clean architecture" is high, or different
from what others would consider "clean architecture".

I have written some terrible, terrible code, that hasn't caused major
problems. Either because it was replaced with a proper solution before growing
too unwieldy, or because it was contained well enough that it never became
unwieldy in the first place. You could say that even though the micro-
architecture was terrible, the macro-architecture was at least acceptable -
which IMO is the vastly more important of the two to keep "proper" for long
term maintenance. Although one has to be wary of a system growing until the
micro-architecture becomes macro-architecture, if the former is of poor
quality.

------
VLM
The summary: when you fork internationally, you don't really need to follow
trademark law in practice, although it'll generate a lot of heat, and forkers
will end up having to follow the law sooner or later. So the lesson from the
article is that forkers are better off in the long run following trademark law
from the start of their fork.

The fundamental argument that generated the fork was that the community
permanently diverged between simple and complicated, and the primary maintainer
decided to ally with the simple team, probably from a strong historical "eat
your own dogfood" culture. Maybe rephrased: some people have to ping 10K boxes
per minute and other people have to check 1000 services or files or aspects on
10 boxes, and it's not REALLY possible for the same codebase to serve both
groups equally well in actual real world practice.

Yo, here's some patches to make your screwdriver(tm) more hammer-ish to help
professional nail installers work somewhat more effectively. What, are you
crazy? The wood screw installation professionals such as myself will not
tolerate that. Patch denied. OK, we'll fork, and BTW we're taking the name with
us, so on the other side of the planet people will be installing nails using
our hammer-shaped fork of Screwdriver(tm). What could possibly go wrong? Oh no
you won't, at least not with my trademarked name. (Consults with lawyers
inserted here.) Um, yeah, I guess you're right; we're gonna rename our
hammer-shaped screwdriver the Hammer(tm). And years later the world's carpenters
happily use Screwdriver(tm) and Hammer(tm), sometimes both at the same jobsite,
without knowing the somewhat contentious history of those different tools.

The two user communities have little if any overlap. This near total lack of
overlap lets the two communities live together in peace and plugins generally
interoperate etc etc. More like relations between the Word and Excel
communities than like Android vs iPhone communities. Aside from that exciting
little trademark spat a long time ago relations have always appeared pretty
calm and cool.

I admin'd a large nagios system a bit more than a decade ago that basically
pinged the heck out of a large number of customers using configs generated by
polling our own network devices automatically (read only access, auto
generated configs, etc.). It depended on the system being simple, stable,
reliable, and fast; it was always speed-limited in deployment. I needed Oracle
support like I needed a hole in the head, so new, complicated, slow, unstable
features held negative appeal for that implementation. So I stuck with Nagios,
although I can totally understand a user with a completely different use case
needing Icinga. I wonder if that system is still running at my ex-employer?
Probably. It was pretty bulletproof software.

~~~
scottlamb
> Maybe rephrased: some people have to ping 10K boxes per minute and other
> people have to check 1000 services or files or aspects on 10 boxes, and it's
> not REALLY possible for the same codebase to serve both groups equally well
> in actual real world practice.

Really? Why? The first thing seems quite easy all around. Configuration-wise
one would expect it to be simple. Computation-wise, pinging 10K boxes per
minute means sending <200 packets per second (and receiving at most the same
number). Maybe 3X that or so to get multiple samples. That's nothing for a
modern machine.
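Here is the back-of-the-envelope arithmetic as a tiny Python helper. The function name is only for illustration. It gives the average rate and says nothing about bursts if every probe fires at the start of the minute.

```python
def probes_per_second(hosts: int, interval_s: float, samples_per_host: int = 1) -> float:
    """Average probe rate needed to check every host once per interval."""
    return hosts * samples_per_host / interval_s


if __name__ == "__main__":
    print(f"{probes_per_second(10_000, 60):.1f}")                      # 166.7
    print(f"{probes_per_second(10_000, 60, samples_per_host=3):.1f}")  # 500.0
```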

The second one obviously requires some more complexity but I don't know why
the extensible configuration and plugin system it'd require would have to
interfere with simply pinging a bunch of machines. Computation-wise, again
sending <200 requests per second should be trivial unless they're
spectacularly heavy-weight.

The Nagios plugin protocol does seem to be somewhat heavy-weight, though.
Looks like it still fork()+exec()s on every probe? [https://nagios-plugins.org/doc/guidelines.html](https://nagios-plugins.org/doc/guidelines.html)
That seems like a much greater problem in
terms of computation than the actual probes themselves, particularly if probes
are written in a scripting language with non-trivial startup overhead. Doesn't
look like Icinga is any different? What significant architectural differences
exist between the two?
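For reference, a plugin in the style those guidelines describe is a standalone program. It prints one status line, optionally followed by `|` and performance data. It then exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN).

Below is a minimal Python sketch of a disk-usage check. The thresholds and output wording are hypothetical, and in practice you'd use the stock `check_disk`. Because Nagios launches it as a fresh process for every probe, each run pays the interpreter startup cost mentioned above.

```python
import shutil
from typing import Tuple

# Exit codes defined by the Nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(path: str, used_pct: float, warn: float, crit: float) -> Tuple[int, str]:
    """Map a usage percentage to an exit code and a one-line plugin output."""
    if used_pct >= crit:
        code = CRITICAL
    elif used_pct >= warn:
        code = WARNING
    else:
        code = OK
    # Text before "|" is for humans; after it comes perfdata: 'label'=value[UOM];warn;crit;min;max
    perfdata = f"'{path}'={used_pct:.1f}%;{warn};{crit};0;100"
    return code, f"DISK {LABELS[code]} - {path} {used_pct:.1f}% used | {perfdata}"


def main(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> int:
    """Check one filesystem, print the status line, and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    if usage.total == 0:
        print(f"DISK UNKNOWN - {path} reports zero size")
        return UNKNOWN
    code, line = evaluate(path, 100.0 * usage.used / usage.total, warn, crit)
    print(line)
    return code


# As a standalone plugin, the script would end with:
#     if __name__ == "__main__":
#         sys.exit(main(*sys.argv[1:2]))
```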

~~~
teh_klev
Nagios has an embedded Perl interpreter[0] so it's possible to avoid
fork()/exec() if performance is critical. There are some caveats about how to
use and code for it though [1].

[0]:
[http://nagios.sourceforge.net/docs/nagioscore/3/en/embeddedp...](http://nagios.sourceforge.net/docs/nagioscore/3/en/embeddedperl.html)

[1]:
[http://nagios.sourceforge.net/docs/nagioscore/3/en/epnplugin...](http://nagios.sourceforge.net/docs/nagioscore/3/en/epnplugins.html)

~~~
devonkim
The downsides and caveats are very similar to those of writing mod_perl-based
web apps, last I saw those docs. They shouldn't be a problem for most Nagios
plugins, though, since those hopefully don't need to keep state.

------
teh_klev
Article should perhaps have "(2012)" in the title?

------
raffapen
Slightly reminiscent of the Bukkit saga
([http://www.slideshare.net/RyanMichela/kicking-the-bukkit-ana...](http://www.slideshare.net/RyanMichela/kicking-the-bukkit-anatomy-of-an-open-source-meltdown)).

------
njharman
I have been using Nagios for a very, very long time, since just after the first
English docs. I have never heard of (or don't remember) Icinga.

------
beagle3
A few years ago, I was comparing Nagios to Icinga (I think it was just
forked), but ultimately settled on Shinken, which is not a fork but is still
Nagios compatible (both plugins and configuration).

It's been working like a champ. I will be reinstalling that server soon - does
anyone have any insight on Shinken vs. Nagios or Icinga?

~~~
dvanduzer
I am curious why _configuration_ compatibility is a selling point, for you in
particular. Is there some popular inventory management tool out there that has
been cranking out Nagios configuration files?

~~~
tapoxi
I've been using NConf ([http://www.nconf.org/](http://www.nconf.org/)) to
manage a Nagios installation.

We migrated over to Icinga 1.x without much trouble. Our existing config still
worked, NConf still worked, plugins still worked.

~~~
dvanduzer
I've been unimpressed with NConf. It's useful for what it is; I even use
phpMyAdmin occasionally.

Shinken sounds interesting, but I'd never heard of it. What I'm trying to
understand is why anyone would have so much invested in their Nagios
configuration syntax. If you have more than a few dozen hosts, you are
probably already using a higher level configuration abstraction. NConf, for
example.

But once you stop counting servers by the dozens, you are hopefully using an
even higher level configuration abstraction for data center management. In
which case you either already committed to a Big Vendor years ago, or you'd
have no incentive to switch away from a legacy Nagios deployment.

Anyone who needs to replace a Nagios deployment _and prioritizes configuration
file level compatibility_ in their new monitoring platform sounds crazy to me.

(edit: plugin compatibility is all that anyone should care about, and even
that's just based on good decisions about uniform output)
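That uniform output, together with the exit codes, is what lets plugins move between Nagios, Icinga, and Shinken. The format is a status line, then `|`, then space-separated `label=value[UOM];warn;crit;min;max` items. Below is a rough Python parser for it. It assumes only the first output line matters; the guidelines also allow perfdata on later lines, which this sketch ignores. `PerfValue` and `parse_output` are made-up names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One perfdata item: a label (optionally single-quoted, '' = literal quote), "=", and its data.
_ITEM = re.compile(r"('(?:[^']|'')+'|[^=\s]+)=(\S+)")
# The value field: a number followed by an optional unit of measure (s, %, B, KB, c, ...).
_VALUE = re.compile(r"([-+]?(?:\d+(?:\.\d*)?|\.\d+))([a-zA-Z%]*)$")


@dataclass
class PerfValue:
    label: str
    value: float
    uom: str
    # Thresholds stay strings because they may be ranges such as "10:20" or "@5:10".
    warn: Optional[str] = None
    crit: Optional[str] = None
    minimum: Optional[str] = None
    maximum: Optional[str] = None


def parse_output(output: str) -> Tuple[str, List[PerfValue]]:
    """Split the first line of plugin output into status text and parsed perfdata."""
    first_line = output.splitlines()[0] if output else ""
    text, _, perf = first_line.partition("|")
    values: List[PerfValue] = []
    for label, data in _ITEM.findall(perf):
        if label.startswith("'"):
            label = label[1:-1].replace("''", "'")
        fields = data.split(";")
        match = _VALUE.match(fields[0])
        if match is None:
            continue  # e.g. "U" (undetermined value); skipped in this sketch
        extras: List[Optional[str]] = [f or None for f in fields[1:5]]
        extras += [None] * (4 - len(extras))
        values.append(PerfValue(label, float(match.group(1)), match.group(2), *extras))
    return text.strip(), values
```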

~~~
beagle3
I am the guy who mentioned shinken[0].

I totally agree, and as I mentioned, configuration compatibility was NOT a
selling point at all - I just mentioned it as a property of shinken that makes
it (supposedly) a drop-in nagios replacement without even the need to
reconfigure.

I see that as a selling point for testing (not relevant to me because I didn't
have a previous nagios installation) - to test shinken for real, you just
install it (a couple of apt-gets) and run it - you don't need to configure it.

Once you've actually decided to commit to a new system, configuration
compatibility with your old system is much less important, of course (provided
it's not a tangled mess of 1000 hand-edited files).

[0]
[http://en.wikipedia.org/wiki/Shinken_%28software%29](http://en.wikipedia.org/wiki/Shinken_%28software%29)

~~~
dvanduzer
I'm pretty sure everyone cares about plugin compatibility, and a few people
are impressed by the finesse involved in bringing configuration compatibility.

Your point about the ease of compatibility _testing_ is quite salient.

------
plq
I've been wondering: When you either have nagios or icinga, do you need any
other monitoring solution? I remember the FIFA live stream guys from a
previous article built their custom monitoring solution (named Sauron). Should
they have simply used one of these instead of reinventing the wheel?

------
christianbryant
I love the Nagios vs Icinga bug comparison chart here:

[https://wiki.icinga.org/display/Dev/Bug+and+Feature+Comparis...](https://wiki.icinga.org/display/Dev/Bug+and+Feature+Comparison)

~~~
bryanlarsen
From the article:

Ethan's comment about the wiki page above is quite critical: "This comparison
on the wiki is both inaccurate and skewed, as it contains incorrect bug data
and mixes new features that have not yet been implemented. This has apparently
been done with the intention of trying to make Nagios look bad in comparison
to Icinga."

~~~
thejrk
It's working as intended then.

------
louwrentius
This is very old, what's the current relevance?

~~~
darylteo
Possibly... Node.js/IO.js? Or Arduino.cc/Arduino.org

~~~
baldfat
I vote Arduino.cc/Arduino.org

BUT

ffmpeg vs libav was ridiculous, and Ubuntu switching to avconv and putting a
weird warning on ffmpeg saying it was deprecated and to please install avconv
just got under my skin!!! [1]

[1] [http://stackoverflow.com/questions/9477115/what-are-the-diff...](http://stackoverflow.com/questions/9477115/what-are-the-differences-and-similarities-between-ffmpeg-libav-and-avconv/9477756#9477756)

~~~
baldfat
> This program is not developed anymore and is only provided for compatibility.
> Use avconv instead (see Changelog for the list of incompatible changes).

~~~
craigyk
Seriously, it still makes me mad when I think back on first encountering this,
especially since ffmpeg is superior in every way IMO.

------
marssaxman
I was not expecting "one of the most heated forks in free software" to involve
two pieces of software I had never before heard of; nor am I entirely sure,
after reading the article, what these packages actually _do_. Despite the
mystery, it's more interesting than reading yet another account of the emacs
vs xemacs split, or gcc vs egcs, which I suppose are drifting on back into
ancient history at this point.

~~~
oblio
I'd expect anyone working on web services to know what Nagios is (or at least
to have heard of it). If you've never worked on web services, you're excused :p

Nagios is a monitoring server. It's the thing that sends emails and other
notifications when various checks it runs periodically fail. Stuff like:
process dead, CPU 100%, disk IO through the roof, database not responding,
etc.
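Here is a toy version of that loop in Python, to make the description concrete. It runs each check as a child process and maps the exit code to a state. It notifies only when a check's state changes. All names here are hypothetical. Real Nagios adds a lot on top: check scheduling and spreading, soft/hard states with retries, contacts, and escalations.

```python
import subprocess
import sys
import time
from typing import Callable, Dict, List, Tuple

# Exit code -> state, following the Nagios plugin convention.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
Notify = Callable[[str, str, str], None]

# Hypothetical example check; real ones would point at installed plugins such as check_ping.
EXAMPLE_CHECKS = {
    "self_test": [sys.executable, "-c", "print('OK - interpreter runs')"],
}


def run_check(argv: List[str], timeout: float = 10.0) -> Tuple[str, str]:
    """Run one plugin as a child process and translate its exit code into a state."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "UNKNOWN", str(exc)
    return STATES.get(proc.returncode, "UNKNOWN"), proc.stdout.strip()


def poll_once(checks: Dict[str, List[str]], last: Dict[str, str], notify: Notify) -> None:
    """Run every check once and notify only when a check's state changes."""
    for name, argv in checks.items():
        state, output = run_check(argv)
        # Treat never-seen checks as previously OK, so a healthy first run stays quiet.
        if state != last.get(name, "OK"):
            notify(name, state, output)
        last[name] = state


def run_forever(checks: Dict[str, List[str]], notify: Notify, interval_s: float = 60.0) -> None:
    """Poll all checks every interval_s seconds, remembering the last state of each."""
    last: Dict[str, str] = {}
    while True:
        poll_once(checks, last, notify)
        time.sleep(interval_s)
```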

~~~
marssaxman
It's true, I have no experience with web services. Thanks for the explanation!

------
sarciszewski
"one of the most heated"

So it's not the coolest. Gotcha!

