

Ask HN: What monitoring/management tools do you use? - chuhnk

I am a system administrator who has been working at my first and only job for the past 3 years. I manage 9 live servers and 25-30 internal ones. We use LAMP for live/staging and CentOS or Red Hat everywhere.<p>Since this is my one and only job, I've had to rely on a lot of reading around the internet to find the best management and monitoring tools. I like mmonit/monit for health-checking running services; it alerts me via SMS, email and XMPP when there are failures. I use Chef for automated deployment of new software, Anthill for code releases and a whole lot of bash scripting for backups.<p>I was wondering what management/monitoring tools you guys use to manage large clusters of servers, and whether you know what the big players like Google, Twitter, Amazon and eBay use. It would be interesting to see the difference between the tools used to manage 10 servers vs 1000.<p>Thanks HN
======
scorchin
I work for a web development firm and we handle a lot of large intranets for
big internationals. Most of the systems are built using combinations (pick
two) of PHP, JSP, Ruby, Perl, MySQL and Oracle.

The servers (total around 60) tend to be spread across different continents
and it always helps to have as much information as possible.

We were originally using Nagios, but found that it was lacking some of the
tools that we required. Instead we switched over to Zabbix and have hooked
into it using a custom PHP/Python application.

This custom app just provides a worldwide view of traffic, points out specific
spike times and most importantly: shows the nearest commits to the master
repos when spikes occur.
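
For illustration, the spike-to-commit lookup described above could be sketched
roughly like this in Python. This is not the actual app: the function names,
the median-based spike threshold and the one-hour search window are all
assumptions, and the traffic samples and commit list are presumed to have been
fetched already (from the monitoring system and the repo respectively).

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from statistics import median


def find_spikes(samples, factor=3.0):
    """Return timestamps whose traffic exceeds factor * median traffic.

    samples: list of (timestamp, request_count) tuples.
    The median-based threshold is an assumed, deliberately simple rule.
    """
    if not samples:
        return []
    baseline = median(count for _, count in samples)
    return [ts for ts, count in samples if count > factor * baseline]


def nearest_commits(spike_time, commits, window=timedelta(hours=1), limit=3):
    """Return up to `limit` commits closest in time to a spike.

    commits: list of (timestamp, commit_id) tuples sorted by timestamp.
    Only commits within `window` on either side of the spike are considered.
    """
    times = [ts for ts, _ in commits]
    lo = bisect_left(times, spike_time - window)
    hi = bisect_right(times, spike_time + window)
    # Order the candidates by absolute distance from the spike.
    candidates = sorted(commits[lo:hi], key=lambda c: abs(c[0] - spike_time))
    return candidates[:limit]


if __name__ == "__main__":
    day = datetime(2010, 1, 1)
    samples = [(day + timedelta(hours=h), n)
               for h, n in [(9, 100), (10, 110), (11, 90), (12, 500), (13, 105)]]
    commits = [(day + timedelta(hours=9), "a1"),
               (day + timedelta(hours=11, minutes=50), "b2"),
               (day + timedelta(hours=12, minutes=20), "c3"),
               (day + timedelta(hours=18), "d4")]
    for spike in find_spikes(samples):
        print(spike, [cid for _, cid in nearest_commits(spike, commits)])
```

Keeping the commit list sorted means each lookup is a binary search plus a
small sort, which stays cheap even with a long commit history.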

The main reason for switching over to Zabbix was the documentation and API. It
made creating our own tools a lot easier.

Info that might be useful:

\- Running on CentOS

\- Apache webservers with nginx front-ends

\- Database servers work in groups of 3: two in master-master replication,
plus a slave.

------
ashishbharthi
For our Java infrastructure we use Introscope. I don't know if they have a
version for the LAMP architecture.
<http://www.ca.com/us/application-management.aspx>

