

Ask HN: What tools do you use to monitor your LAMP server(s)? - olalonde

What tools do you use to monitor your web server(s)? More specifically, server uptime, resource usage (CPU, RAM, bandwidth, etc.), Apache, MySQL and PHP/Ruby/Python.
======
RyanGWU82
We have a cluster of about 60 servers, and use Monit and Ganglia to manage it.
Monit will alert you when certain problems occur -- for example, if the server
load is too high, if a service has stopped running, or if it can't reach the
database. Ganglia records data and graphs that data over time. We use it to
see how busy our servers are getting, or how our peak resource usage compares
to off-peak. Monit is important for noticing a problem; Ganglia's useful for
diagnosing that problem.
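The split described above (an immediate threshold check that raises an alert, versus a history of samples kept for later diagnosis) can be sketched in a few lines of Python. This is not Monit's or Ganglia's code; the per-CPU load limit and the history length are hypothetical values chosen for illustration.

```python
from collections import deque
from typing import Deque, Optional, Tuple

# Hypothetical limit: alert when the 5-minute load average per CPU exceeds this.
LOAD_PER_CPU_LIMIT = 2.0


def check_load(load5: float, cpus: int, limit: float = LOAD_PER_CPU_LIMIT) -> Optional[str]:
    """Monit-style check: return an alert message if the threshold is crossed."""
    per_cpu = load5 / max(cpus, 1)
    if per_cpu > limit:
        return f"load too high: {load5:.2f} on {cpus} CPUs ({per_cpu:.2f} per CPU)"
    return None


class LoadHistory:
    """Ganglia-style history: keep recent samples so peaks and trends can be compared."""

    def __init__(self, maxlen: int = 1440) -> None:
        # With one sample per minute, 1440 samples cover a day (hypothetical choice).
        self.samples: Deque[Tuple[float, float]] = deque(maxlen=maxlen)

    def record(self, timestamp: float, load5: float) -> None:
        self.samples.append((timestamp, load5))

    def peak(self) -> float:
        return max((load for _, load in self.samples), default=0.0)

    def average(self) -> float:
        if not self.samples:
            return 0.0
        return sum(load for _, load in self.samples) / len(self.samples)
```

On Unix, `os.getloadavg()` returns the 1-, 5- and 15-minute load averages and `os.cpu_count()` the number of CPUs, which could feed both the check and the history.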

That said, I'm a huge fan of Cloudkick. Their software is ridiculously easy to
install and use, and the web application makes it VERY easy to understand
what's going on with your services. It's perfect if you're just monitoring one
server -- it's free -- but it's worth the money even if you've got a whole cluster.
If I could do it all over again, I'd definitely go with Cloudkick.

~~~
staunch
You would have to pay $11,388 per year to use Cloudkick with 60 servers and
still lose all data >1 year.

I don't get the logic of paying that much for Cloudkick at all. Monitoring
tends to be a couple of days' work (at most) to set up perfectly, and then very
little ongoing. Hardly worth $949 every month, IMHO.
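The $11,388 figure is simply the monthly price times twelve. A quick check of the arithmetic, plus the per-server cost and, to frame the staff-time argument in the reply below, a break-even comparison; the $50/hour admin rate is a purely hypothetical number, not something from the thread.

```python
# Figures from the comments: Cloudkick for a 60-server cluster.
MONTHLY_COST_USD = 949
SERVERS = 60

annual_cost = MONTHLY_COST_USD * 12                # 11388
per_server_per_month = MONTHLY_COST_USD / SERVERS  # about 15.82


def break_even_hours(annual: float, hourly_rate: float) -> float:
    """Hours of admin time per year that cost the same as the subscription."""
    return annual / hourly_rate


# HYPOTHETICAL rate, not from the thread: $50/hour of sysadmin time.
hours = break_even_hours(annual_cost, 50.0)        # 227.76
print(f"${annual_cost:,}/year, ${per_server_per_month:.2f}/server/month, "
      f"{hours:.1f} admin hours/year at $50/h")
```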

~~~
sanswork
In my experience running Nagios, it's never as easy as 'set and forget' to
keep your monitoring system effective, even more so when you're growing or
scaling and systems are changing more regularly. To me that $11,388 per year
is cheap compared to the cost of having a salaried employee spend more
time on a slower-to-implement solution.

If you don't have a full time sys admin it's also one of those things that is
very easy to put off or delay or just not keep on top of.

As for the data loss, I can't think of a time I've ever wanted to see resource
statistics from over a year ago, and downtimes and outages are kept in a
problem tracking system.

~~~
z2amiller
One thing I've done in the past to help prevent 'configuration rot' of nagios
configs is to hook it into the same files that drove our deployments - when
the deployment changes, the nagios config changes with it. Nagios supports
template-based configs so once the tests are developed it isn't too hard to
write some automation that spits out a config file. This worked pretty well
for a site with ~350 servers and devices (switches, APC rack PDUs) with ~3000
tests.
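A minimal sketch of that idea, assuming a hypothetical deployment manifest that maps roles to hosts and a hypothetical role-to-checks table. It renders plain Nagios object definitions that inherit from the `generic-host` and `generic-service` templates (as Debian's nagios3 package ships them); the check names are placeholders that would need matching `define command` entries.

```python
from typing import Dict, List

# Hypothetical deployment manifest (role -> hosts); in practice this would be
# loaded from the same files that drive the deployments.
MANIFEST: Dict[str, List[Dict[str, str]]] = {
    "web": [
        {"name": "web1", "address": "10.0.0.11"},
        {"name": "web2", "address": "10.0.0.12"},
    ],
    "db": [{"name": "db1", "address": "10.0.0.21"}],
}

# Hypothetical checks per role; each name must match a defined Nagios command.
ROLE_CHECKS: Dict[str, List[str]] = {
    "web": ["check_http", "check_ssh"],
    "db": ["check_mysql", "check_ssh"],
}

HOST_TEMPLATE = """define host {{
    use         generic-host
    host_name   {name}
    address     {address}
}}
"""

SERVICE_TEMPLATE = """define service {{
    use                  generic-service
    host_name            {name}
    service_description  {check}
    check_command        {check}
}}
"""


def render_nagios_config(manifest: Dict[str, List[Dict[str, str]]],
                         role_checks: Dict[str, List[str]]) -> str:
    """Emit one host block per deployed host plus one service block per check."""
    blocks: List[str] = []
    for role in sorted(manifest):
        for host in manifest[role]:
            blocks.append(HOST_TEMPLATE.format(name=host["name"], address=host["address"]))
            for check in role_checks.get(role, []):
                blocks.append(SERVICE_TEMPLATE.format(name=host["name"], check=check))
    return "\n".join(blocks)


if __name__ == "__main__":
    print(render_nagios_config(MANIFEST, ROLE_CHECKS))
```

Regenerating this file as part of each deployment (and reloading Nagios afterwards) keeps the monitoring config from drifting away from what is actually deployed.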

------
patio11
<http://scoutapp.com> again saved my bacon yesterday when my God monitoring
script for DelayedJob workers decided to pull an Ark with my processes ("let's
have two of everything!") and the server ran out of swap, grinding to a
virtual standstill. Scout sent me an email, the email rang my cell phone, and
I was able to recover before it ruined my best day of sales ever.

~~~
itsderek23
We actually just talked about the monitoring stack we use at Scout. There
isn't one do-it-all tool (if there is, it'd be pretty ugly).

Scout, Monit, Hoptoad, New Relic, and Pingdom:

[http://blog.scoutapp.com/articles/2010/10/19/monitor-
rails-c...](http://blog.scoutapp.com/articles/2010/10/19/monitor-rails-
cluster)

------
jedwhite
Nagios is great for alerts and problem monitoring. Cacti is great for visualizing
performance over time. Both take a bit of time and effort to set up on most
popular Linux distros, but are well worth it. I'm using a base-level Rackspace
cloud server, which costs about $20/mo.

For a reasonably priced hosted solution, check out cloudkick.com. It is also
very good and lets you monitor EC2, Rackspace cloud servers, and your own
boxes with an agent installed.

------
kranner
I use munin to monitor <http://codeboff.in>. The stats page is up at
<http://codeboff.in:8080/> if you want to take a look.

Also I use Supervisor (<http://supervisord.org/>) to launch all my processes
and it is configured to send me an email if any of them grows beyond a memory
threshold and/or crashes (in which case it also attempts to restart it a few
times).
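Supervisor handles the restart side itself through its per-program settings, and memory-based alerts are commonly added with an event listener such as memmon from the separate superlance package. The policy described above can be sketched independently of Supervisor; the class, its limits, and the alert strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RestartPolicy:
    """Hypothetical supervisor-like policy: restart a crashed process a few
    times, and alert on every crash and whenever memory use exceeds a limit."""
    max_restarts: int = 3
    max_rss_bytes: int = 512 * 1024 * 1024  # hypothetical 512 MB limit
    restarts: int = 0
    alerts: List[str] = field(default_factory=list)  # stand-in for e-mails

    def on_exit(self, name: str, exit_code: int) -> str:
        """Decide what to do after the process exits."""
        if exit_code == 0:
            return "stopped"  # clean exit: nothing to report
        if self.restarts < self.max_restarts:
            self.restarts += 1
            self.alerts.append(f"{name} crashed (exit {exit_code}), restart #{self.restarts}")
            return "restart"
        self.alerts.append(f"{name} crashed {self.restarts + 1} times, giving up")
        return "give_up"

    def on_memory_sample(self, name: str, rss_bytes: int) -> bool:
        """Record an alert if resident memory is above the limit."""
        over = rss_bytes > self.max_rss_bytes
        if over:
            self.alerts.append(f"{name} is using {rss_bytes} bytes, above the limit")
        return over
```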

~~~
fnl
Thanks for the link to supervisorctl - didn't know that one - and it looks
pretty neat! :P

------
mrud
For alerts I am using Nagios, as it has all kinds of checks for most services
built in. Besides services, I also monitor security updates (any available
security upgrades that are not installed) on Debian hosts and resource
shortages in OpenVZ containers.
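Checking for uninstalled security updates on Debian usually comes down to simulating an upgrade and counting the packages it would install; the `check_apt` Nagios plugin works along broadly similar lines. A small Python sketch of that parsing, assuming the `Inst ...` line format that `apt-get -s upgrade` prints and treating any line that mentions a security archive as a security update. The exit codes follow the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
import re
from typing import List, Tuple

# `apt-get -s upgrade` prints one "Inst <package> ..." line per package it
# would install; the archive name (e.g. "Debian-Security:...") follows.
INST_RE = re.compile(r"^Inst (\S+)")

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit codes


def pending_updates(simulate_output: str) -> Tuple[List[str], List[str]]:
    """Return (all pending packages, pending packages from a security archive)."""
    pending: List[str] = []
    security: List[str] = []
    for line in simulate_output.splitlines():
        match = INST_RE.match(line)
        if not match:
            continue
        pending.append(match.group(1))
        # Assumption: security updates are recognizable by "security" in the line.
        if "security" in line.lower():
            security.append(match.group(1))
    return pending, security


def apt_status(pending: List[str], security: List[str]) -> Tuple[int, str]:
    """Map update counts to a Nagios-style (exit code, message) pair."""
    if security:
        return CRITICAL, f"CRITICAL: {len(security)} security update(s), {len(pending)} total"
    if pending:
        return WARNING, f"WARNING: {len(pending)} update(s) pending"
    return OK, "OK: no updates pending"
```

The input would come from running `apt-get -s upgrade` (simulation mode, which doesn't change the system), e.g. via `subprocess.run(..., capture_output=True, text=True)`, after an `apt-get update` so the package lists are current.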

I am also using Monit for immediate actions like restarting the web server or
checking some programs.

I manage both via Puppet, which means that when I deploy a new host, the Nagios
configuration gets adjusted automatically with the corresponding services.

For performance data there are several tools like cacti, munin or collectd.

\- Munin, IIRC, has the problem that it gets slowish, as it polls all the data
from the hosts and then generates the charts for all hosts. It's only a problem
if you have many hosts...

\- Collectd is a quite nice and fast monitoring solution (it supports updates
every second) and has advanced features like sending the data to multicast
addresses. The only major drawback, IMHO, is that collectd doesn't come bundled
with a good web UI, though many external UIs exist.

EDIT: I forgot to mention Icinga <http://www.icinga.org/>, a fork of Nagios
created after some trademark problems IIRC.

Monit and Nagios are integrated with the ticketing system, e.g. a recovery
message automatically closes the corresponding ticket.
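The ticket integration described above reduces to a small state machine keyed by host and service: a PROBLEM notification opens a ticket (or reuses the open one), and a RECOVERY notification closes it. The sketch uses an in-memory dict in place of a real ticketing system, and `TicketBridge` is a hypothetical name; PROBLEM, RECOVERY and ACKNOWLEDGEMENT are notification types Nagios passes in its `$NOTIFICATIONTYPE$` macro.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class TicketBridge:
    """Hypothetical glue between monitoring notifications and a ticket tracker."""
    open_tickets: Dict[Tuple[str, str], int] = field(default_factory=dict)
    closed: List[int] = field(default_factory=list)
    _next_id: Iterator[int] = field(default_factory=lambda: count(1))

    def handle(self, notification_type: str, host: str, service: str) -> Optional[int]:
        """Open, reuse, or close the ticket for (host, service); return its id."""
        key = (host, service)
        if notification_type == "PROBLEM":
            # Repeated PROBLEM notifications must not open duplicate tickets.
            if key not in self.open_tickets:
                self.open_tickets[key] = next(self._next_id)
            return self.open_tickets[key]
        if notification_type == "RECOVERY":
            ticket = self.open_tickets.pop(key, None)
            if ticket is not None:
                self.closed.append(ticket)  # a real bridge would call the tracker's API here
            return ticket
        return None  # other types (e.g. ACKNOWLEDGEMENT) are ignored in this sketch
```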

~~~
taa
Can you share the relevant snippets from your puppet setup? I'm currently
looking at using puppet with nagios/monit to manage the 50+ servers that
recently became my responsibility.

~~~
mrud
If you're starting with Puppet, you should definitely have a look at the type
documentation <http://docs.puppetlabs.com/references/stable/type.html>. It
helps a lot to see and understand the types already built into Puppet.

For Monit I am using the Monit pattern from [1]. Quite simple, but IWFM. Nagios
is a little bit more complicated. If you have any questions, I've added my
email address to my profile.

You need to use exported resources for the Nagios part, so you have to
enable Stored Configurations on the puppet server [2]. For the server part I
use:

    
    
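      # Install and run Nagios 3 on the monitoring server
      package { nagios3: ensure => present }
      service { nagios3:
        ensure  => running,
        enable  => true,
        require => Package[nagios3],
      }
      # Make the files written by the collected nagios_* resources
      # world-readable (644) so Nagios can read them, then restart Nagios
      exec { "chmod_nagios":
        command     => "/bin/chmod 644 /etc/nagios3/conf.d/*",
        refreshonly => true,
        notify      => Service["nagios3"],
      }
      file { "/etc/nagios3/conf.d": ensure => directory }

      # Collect the host and service definitions exported by every client
      Nagios_host <<||>>    { notify => Exec["chmod_nagios"] }
      Nagios_service <<||>> { notify => Exec["chmod_nagios"] }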
    
    
    

The client setup is quite easy; I'm using nagios-nrpe for the checks. You have
to deploy your own configuration files, which I omit here.

    
    
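      # NRPE agent that runs the checks locally on behalf of the Nagios server
      package { nagios-nrpe-server: ensure => installed }

      service { nagios-nrpe-server:
        ensure  => running,
        require => Package[nagios-nrpe-server],
        # how Puppet finds the running daemon in the process table
        pattern => "/usr/sbin/nrpe",
      }
      # Exported (@@) resources are stored on the puppet master and realized
      # by the Nagios_host/Nagios_service collectors on the Nagios server
      @@nagios_host { "host_$hostname":
        ensure    => present,
        address   => $ipaddress,
        host_name => $hostname,
        use       => "generic-host",
        target    => "/etc/nagios3/conf.d/host-$hostname.cfg",
      }
      # Per-host apt check, executed on the client through NRPE
      @@nagios_service { "apt-${hostname}":
        ensure              => present,
        use                 => "generic-service",
        host_name           => $hostname,
        service_description => "apt check",
        target              => "/etc/nagios3/conf.d/apt.cfg",
        check_command       => "check_nrpe_1arg!check_apt",
      }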
    
    
    
    

[1]
[http://projects.puppetlabs.com/projects/1/wiki/Monit_Pattern...](http://projects.puppetlabs.com/projects/1/wiki/Monit_Patterns)

[2]
[http://projects.puppetlabs.com/projects/puppet/wiki/Using_St...](http://projects.puppetlabs.com/projects/puppet/wiki/Using_Stored_Configuration)
The stored-configuration example below uses sqlite, but you should not use sqlite; use MySQL if you want to use the dashboard.

    
    
      [puppetmasterd]
       storeconfigs = true
       dbadapter = sqlite3
       dblocation = /var/lib/puppet/storeconfigs.sqlite

------
cashmo777
I've tried many packages, have settled on Nagios + Cacti. Featureful, rock
solid, free, scales well, documentation galore. How can you beat that? A live
example -- Wikipedia uses Nagios for their monitoring solution:
[http://nagios.wikimedia.org/nagios/cgi-
bin/status.cgi?host=a...](http://nagios.wikimedia.org/nagios/cgi-
bin/status.cgi?host=all) , they monitor more than 2,000 services on 426 hosts.
At my college, Nagios has an icon for each host that links to a corresponding
wiki page.

------
619Cloud
We prefer to just let somebody else manage it: ServerDensity is the best we
have found. You can also use PagerDuty in tandem with ServerDensity.

------
krobertson
New Relic, Nagios, Cacti, Pingdom, and a few custom alerts/stat graphs (using
flot), with alerting and on-call handling across all of them through PagerDuty.

Also working on integrating Splunk for analyzing log files and alerting on a
few aspects.

------
viraptor
Munin for stats, nagios for reports / actions, monit for keeping things
running in general. I don't use monit for resource management, since it's just
not complex enough for some of the stuff I need to check.

I tried Cacti once and god, it's bad... It works, but the lack of a simple
overview of how it works / how it's supposed to be configured is glaring.

I'm keeping an eye on <http://www.shinken-monitoring.org/> \- might be worth
checking out in the future if I ever get into large-scale-nagios problems.

------
Sthorpe
STACK: MySQL, RoR, Nginx, Mongrel, Monit Fun. We use:

\- Newrelic

\- Engineyard: http monitor

\- Hoptoad for errors

\- Some custom daemon scripts that fire emails when background processes
aren't working.
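A sketch of such a daemon-check script, assuming (on Unix) that each background worker writes its PID to a pid file; the worker names and file locations are hypothetical. Sending signal 0 with `os.kill` tests whether a process exists without affecting it, and the resulting `EmailMessage` could be handed to `smtplib.SMTP(...).send_message()` from cron.

```python
import os
from email.message import EmailMessage
from pathlib import Path
from typing import Dict, List


def pid_alive(pid: int) -> bool:
    """Signal 0 checks that the process exists without actually signalling it."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def dead_workers(pidfiles: Dict[str, Path]) -> List[str]:
    """Names of workers whose pid file is missing, unreadable, or stale."""
    dead: List[str] = []
    for name, path in pidfiles.items():
        try:
            pid = int(path.read_text().strip())
        except (OSError, ValueError):
            dead.append(name)
            continue
        if not pid_alive(pid):
            dead.append(name)
    return dead


def build_alert(dead: List[str], to_addr: str, from_addr: str) -> EmailMessage:
    """Compose the alert e-mail; sending it is left to smtplib."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(dead)} background process(es) down: {', '.join(dead)}"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content("\n".join(f"{name} is not running" for name in dead))
    return msg
```

One caveat of the pid-file approach: if a stale PID has been reused by an unrelated process, the worker still looks alive, so a stricter check would also compare the process name.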

------
there
custom network management system that does icmp polling and service checks,
snmp polling for bandwidth, cpu, memory, and server-specific data that is
exported server-side (a lot of it is gathered from log files) and collected
through snmp.
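Bandwidth via SNMP is normally derived from interface octet counters such as `ifInOctets`/`ifOutOctets` (32-bit `Counter32` values in IF-MIB; `ifHCInOctets`/`ifHCOutOctets` are the 64-bit variants): poll twice, take the difference, convert bytes to bits, and divide by the interval. A minimal sketch of that arithmetic, allowing for at most one counter wrap between polls; the SNMP polling itself is left out.

```python
COUNTER32_MODULUS = 2 ** 32  # ifInOctets / ifOutOctets
COUNTER64_MODULUS = 2 ** 64  # ifHCInOctets / ifHCOutOctets


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increase between two readings; correct if the counter wrapped at most once."""
    return (current - previous) % modulus


def bits_per_second(previous_octets: int, current_octets: int, seconds: float,
                    modulus: int = COUNTER32_MODULUS) -> float:
    """Average rate over the polling interval, in bits per second."""
    if seconds <= 0:
        raise ValueError("polling interval must be positive")
    return counter_delta(previous_octets, current_octets, modulus) * 8 / seconds
```

For example, two polls 300 s apart reading 1,000 and 376,000 octets give 375,000 bytes × 8 / 300 = 10,000 bit/s. On a saturated gigabit link a 32-bit octet counter wraps in about 34 seconds, which is why the 64-bit counters matter for fast interfaces.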

alerts through jabber, email, and sms. i set up a monitor and a cheapo Eee Box to
watch the data in realtime for my most important stuff.

<http://www.flickr.com/photos/symmetricalism/4435593589/>

------
lindvall
I've been very happy with Scout (<http://scoutapp.com/>) for monitoring. It's
a nice mix of standard monitoring and a pretty simple plugin system for custom
monitoring.

All of the plugins are written in Ruby and the interface for reporting data is
pretty simple.

I'm using it for everything from monitoring the KB/s that systems are swapping
(and alerting if it's anything measurable) to custom things like job
queue depths.
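Scout plugins are written in Ruby, but the swap measurement itself is simple enough to sketch in Python: on Linux, `/proc/vmstat` exposes cumulative `pswpin`/`pswpout` counters of pages swapped in and out, so two readings a known interval apart give a swap rate. The 4 KB page size is an assumption (typical on x86); `os.sysconf("SC_PAGE_SIZE")` reports the real value in bytes.

```python
from typing import Dict


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat-style "name value" lines into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def swap_kb_per_second(before: Dict[str, int], after: Dict[str, int],
                       seconds: float, page_kb: float = 4.0) -> float:
    """Pages swapped in plus pages swapped out, converted to KB/s."""
    pages = (after["pswpin"] - before["pswpin"]) + (after["pswpout"] - before["pswpout"])
    return pages * page_kb / seconds


def should_alert(kb_per_second: float, threshold: float = 0.0) -> bool:
    """Alert on any measurable swapping."""
    return kb_per_second > threshold
```

In use, `/proc/vmstat` would be read twice with a `time.sleep(interval)` in between, and the two parsed snapshots passed to `swap_kb_per_second`.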

------
jberryman
We use monit combined with a number of helper bash scripts running in cron.
For example, we have a cron job that does integrity checks on sqlite databases
and writes to a file if it encounters corruption. Monit is set to monitor that
file for size change and sends an alert if anything is written to it. That
kind of thing is a bit of a hack, but seems to be the way to do it if you are
using monit.
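A minimal version of such a cron job in Python, using SQLite's `PRAGMA integrity_check` (which returns the single row `ok` for a healthy database). Problems are appended to a report file, so a Monit file-size check on that file turns them into alerts. The function name and report format are hypothetical.

```python
import sqlite3
import time
from pathlib import Path
from typing import Iterable


def check_databases(db_paths: Iterable[Path], report: Path) -> int:
    """Run PRAGMA integrity_check on each database (read-only) and append any
    problems to `report`; returns the number of databases with problems."""
    problems = 0
    for db in db_paths:
        uri = Path(db).resolve().as_uri() + "?mode=ro"  # never modify the database
        try:
            conn = sqlite3.connect(uri, uri=True)
            try:
                rows = [row[0] for row in conn.execute("PRAGMA integrity_check")]
            finally:
                conn.close()
        except sqlite3.DatabaseError as exc:  # unreadable, or not a database at all
            rows = [f"cannot check: {exc}"]
        if rows != ["ok"]:
            problems += 1
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            with report.open("a") as fh:  # growth of this file is what Monit watches
                for row in rows:
                    fh.write(f"{stamp} {db}: {row}\n")
    return problems
```

Because nothing is written while every database checks out, the report file only changes size when something is actually wrong, which keeps the Monit side to a single file test.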

We also use pingdom for external uptime monitoring.

------
craighyde
You should check out Rigor for performance and uptime monitoring of your web
servers. It's a subscription service that goes a step beyond the basic
cpu/process details and lets you monitor the accessibility of individual pages
and transactions. They have a free trial that takes two minutes to set up at
<http://rigor.com>.

------
WALoeIII
Collectd + Visage (<http://github.com/auxesis/visage>)

Monit + Monit Aggregator (<http://github.com/mattfawcett/monit-aggregator>)

I've hacked on both a bit, but they're great starts for small clusters.

------
alanpca
I'm definitely biased, but I use the product that I develop, Netmon --
www.netmon.ca. It may be worth checking out for you.

------
MichaelGG
Opsview. <http://www.opsview.com/>

It's built on top of Nagios, and makes the entire process very easy and
straightforward to set up and run.

Edit: The community version is free, and I haven't run into any limitations on
it. The paid version offers a few extra modules.

------
jrockway
Logcheck. This lets mildly important things annoy you until you fix them. Lots
of "failed login for root" every day, so I changed my ssh server to only allow
logins as me. Lots of authentication failures for SMTP AUTH, so I enabled
fail2ban. Now I don't get any annoying emails, and my box is slightly more
secure.
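The idea behind fail2ban, counting authentication failures per source address and acting once a limit is reached (its `maxretry` setting), can be sketched against OpenSSH's `Failed password for ... from <ip>` log lines. This is not fail2ban's implementation and ignores its time window (`findtime`); the limit of 5 is a hypothetical value.

```python
import re
from collections import Counter
from typing import Iterable, List

# OpenSSH logs lines such as:
#   sshd[123]: Failed password for root from 203.0.113.5 port 4242 ssh2
#   sshd[124]: Failed password for invalid user admin from 198.51.100.7 port 5151 ssh2
FAILED_RE = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")


def failures_by_ip(lines: Iterable[str]) -> Counter:
    """Count failed password attempts per source address."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def addresses_to_ban(lines: Iterable[str], max_failures: int = 5) -> List[str]:
    """Addresses with at least `max_failures` failures (hypothetical limit)."""
    return sorted(ip for ip, n in failures_by_ip(lines).items() if n >= max_failures)
```

Fed with the lines of the SSH auth log (its path varies by distribution), this shows which addresses are hammering the server.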

------
alexweber
you can use Cactus to monitor server load/cpu/etc, I'm not sure about the
others...

~~~
ronnier
This is the program/code alexweber is referring to:

<http://www.cacti.net/>

------
shizcakes
Shocked at the lack of zabbix. Alerts, graphs, history, all in one place.
Install the agent and you get the basics for free - I just set up a demo of it
a couple days ago in 30 minutes flat.

~~~
CRASCH
Me too. This is what I use. It was dead simple to set up and had great default
settings. pfSense also has an agent package available for it, and it's a
one-click setup. It was also very easy to set up custom scripts for monitoring
my custom back-end software. I recently tried Nagios, Hobbit, OpenNMS, and
others, and Zabbix worked best for me.

~~~
walterheck
I've been using Zabbix for around 4 years now for all my monitoring needs, and
I love it. In fact, I love it so much that I decided to launch a hosted
'Zabbix as a service' company this year; check it out: <http://tribily.com>

------
runjake
Xymon (formerly known as Big Brother). Yeah it's old, but it's simple,
reliable and what I know. I've tried to switch to Nagios and OpManager, but
ended up back with Xymon.

I use Cacti for graphing. I have some big gripes with it, but I've invested a
lot of brain matter in it and it works.

<http://xymon.com>

<http://cacti.net>

<http://www.opmanager.com> (slow, unreliable j2ee app with a lot of features
and great tech support.)

------
fseek
For security monitoring: OSSEC (open source at ossec.net) and
<http://sucuri.net> (paid external mon)

------
staunch
Munin/monit

------
taa
I use pingdom.com for external monitoring, especially for response time
reports.

------
udfalkso
Munin

------
thirsteh
Nagios

------
ilium
Nagios is probably the best free option, although configuration can be pretty
complicated. I like AppFirst, it does most of the stuff you want out of the
box, and you can use any Nagios plugin to add specific functionality for
whatever else you want.

------
zackattack
host-tracker.com

