

Centralized Logging With Rsyslog - gmcquillan
http://blog.urbanairship.com/blog/2010/10/05/centralized-logging-using-rsyslog/

======
illumin8
I've had much better results from using Syslog-NG along with php-Syslog-NG.

Syslog-NG can already split your log files into subdirectories with the
hostname of each server, but it also has the capability of redirecting
messages to named pipes. This is great because you can pipe them into MySQL and
stuff all of your log messages into a database. Combine that with a PHP front-end and
now your developers and sysadmins can search logs intelligently across
multiple servers, and get really fine-grained on their search strings. Want to
tail the output from all Tomcat servers in your app server pool looking for a
specific string? Go right ahead.
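
A minimal sketch of that kind of setup (the destination names, paths, port, and
the MySQL insert template here are just illustrative, not a real config):

    source s_net { udp(ip(0.0.0.0) port(514)); };

    # one log file per sending host, directories created as needed
    destination d_hosts {
        file("/var/log/hosts/$HOST/messages.log" create_dirs(yes));
    };

    # named pipe that a small helper script reads and loads into MySQL
    # for the php-syslog-ng front-end
    destination d_mysql {
        pipe("/var/log/mysql.pipe"
             template("INSERT INTO logs (host, facility, priority, msg) VALUES ('$HOST', '$FACILITY', '$PRIORITY', '$MSG');\n"));
    };

    log { source(s_net); destination(d_hosts); destination(d_mysql); };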

------
sugarcode
One big advantage of rsyslog over syslog-ng is that you can spool messages to
disk if the remote syslog server is down (syslog-ng only offers this in their
'enterprise' paid version).
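
A minimal sketch of that in rsyslog's legacy $-directive syntax (the central
hostname, spool path, and size limit are placeholders):

    # queue forwarded messages on disk while the central host is unreachable
    $WorkDirectory /var/spool/rsyslog
    $ActionQueueType LinkedList
    $ActionQueueFileName fwdqueue
    $ActionQueueMaxDiskSpace 1g
    $ActionQueueSaveOnShutdown on
    $ActionResumeRetryCount -1
    # forward everything over TCP (@@) to the central server
    *.* @@central.example.com:514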

------
psadauskas
I've been pushing to implement this for our application, but I'm told that we
used to, and had to turn it off because it would saturate the IO of the
logging server.

Has anyone else experienced this? Is it just a simple configuration tuning
problem?

~~~
moe
_it would saturate the IO of the logging server._

Rather unlikely unless your deployment is very large or you're doing
extraordinarily expensive filtering/binning on the sink-host.

The first bottleneck is normally disk _space_, not disk I/O. Those logs pile
up very quickly, depending on how long you retain them.

The raw network and disk I/O, however, are rarely of concern. Before you
approach either limit you're already logging to the tune of ~300G per hour -
and have probably switched to a distributed architecture of some sort long
ago.
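
(Back-of-the-envelope: ~300G per hour works out to a bit over 80MB/s of
sequential writes, which a single disk can sustain, but also to over 7TB of
storage per day.)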

Writing broad streams of sequential text is very cheap.

Making sense of what you wrote, ideally before actually writing it, but at the
very least before being forced to purge it due to storage constraints, is the
difficult part. ;-)

------
mmt
Protip: Use syslog-ng.

Besides longer log messages (arbitrarily long, with a recompile) and reliable
delivery, it obviates my main use for logrotate, since it can be configured to
write to a filename (including directory) based on time, date, or other
variables.
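
Something along these lines, using syslog-ng's standard macros (the path layout
is just an example):

    destination d_dated {
        # date-based path; syslog-ng opens new files as the date rolls over,
        # so there is nothing for logrotate to do
        file("/var/log/$HOST/$YEAR/$MONTH/$DAY/messages.log" create_dirs(yes));
    };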

~~~
js2
Protip: look at rsyslog, syslog-ng, and splunk and decide what is right for
your environment.

~~~
mmt
The "pro" that I am is sysadmin, and I'm asserting that evaluating all three
is a waste of time.

Splunk, given its cost and complexity, is almost never right for startups.

Non-ng syslog is, on the other hand, so simplistic that it's not worth the
effort of fancy configuration. Is there some kind of compelling advantage that
I've been overlooking?

I never quite understood the conceit that every environment is a precious-and-
unique snowflake requiring careful evaluation of any given tool.

~~~
js2
Turns out I am a sysadmin as well, and I'm asserting that each has various
strengths. I have been using syslog-ng since as far back as 2001, so I have some
experience with it. Today I would recommend rsyslog. It is the default logger
in Ubuntu 10.04 LTS and Fedora is also transitioning to it:

<http://fedoraproject.org/wiki/Releases/FeatureRsyslog>

Further, I think that RELP and on-demand disk spooling of messages are
compelling features. Its performance and reliability are good enough to feed
your web-server access logs through.
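
For reference, a rough sketch of RELP in rsyslog's legacy syntax (hostname and
port are placeholders):

    # on the central server: accept RELP connections
    $ModLoad imrelp
    $InputRELPServerRun 2514

    # on each client: forward everything over RELP
    $ModLoad omrelp
    *.* :omrelp:loghost.example.com:2514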

I wouldn't overlook rsyslog, but I'm also not saying "just use it" because
syslog-ng is certainly worth evaluating as well.

Edit: see also
<http://www.linuxjournal.com/content/centralized-logging-web-interface>

~~~
mmt
This more in-depth discussion has more value. Thank you.

 _I think that RELP and on-demand disk spooling of messages are compelling
features_

I think we're coming at the question from different perspectives. One of my
primary goals is to avoid wasting my time. Since I've already evaluated and
experimentally proven syslog-ng, switching means a large time investment.

As such, features like RELP and, arguably, misfeatures[1] like disk spooling
fail to compel such an investment.

Once rsyslog has matured, something that I expect will be accelerated by its
inclusion in major distros, it may be a no-brainer.

For my "money," there are far more interesting and productive problems to work
on than logging, which is why I do give the "just use it" advice.

 _Turns out I am a sysadmin as well_

By choice or necessity? Just curiosity on my part.

[1] I have yet to encounter an environment of non-trivial size where the risk
of losing logging outweighs the risk of disk filling up and/or performance
degradation from additional contentious I/O. For me, it's a killer feature of
centralized logging: elimination of a particular source of
failure/degradation.

------
kvs
Has anyone tried Facebook Scribe?

~~~
thwarted
Yes, and it works pretty solidly. We used to use syslog-ng, writing to named
pipes, but you have to be sure that there is something reading from the pipe
before you start writing to it, otherwise it blocks. You don't really want to
use the syslog protocol (via the syslog(3) library call) for random logging
because you may end up hitting the upper bound on log line length. I wanted to
use rsyslog, mainly for the local buffering, but it ONLY seems to support the
syslog format/protocol, including prefixing every line with a date, time, and
hostname.

We use scribe and we have a stdin2scribe program (python) that can be used to
hook into any log output (like apache access and error logs). We have it set
up in a two-tier system: all the systems we want to log from run a "scribe
leaf" on a port on localhost, and this forwards all logs to a "scribe
aggregator" (behind a load balancer), with buffering space on the local disk
when the aggregator cannot be contacted. It's a pretty solid system and I
recommend it.
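
The stdin2scribe part is only a few lines of Python against the Thrift-generated
scribe client. Roughly (the category name and leaf port are placeholders, and
error handling is omitted):

    import sys
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from scribe import scribe

    # connect to the local "scribe leaf" on localhost
    socket = TSocket.TSocket(host='127.0.0.1', port=1463)
    transport = TTransport.TFramedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocol(trans=transport,
                                               strictRead=False, strictWrite=False)
    client = scribe.Client(iprot=protocol, oprot=protocol)
    transport.open()

    # ship each stdin line (e.g. an apache access log line) as one scribe message
    for line in sys.stdin:
        entry = scribe.LogEntry(category='apache_access', message=line)
        client.Log(messages=[entry])

    transport.close()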

We also have services, and command line and library interfaces to those
services, that let you grab all the logs that came in on a certain time frame,
or tail all the data coming into the aggregators in real time (one of them is
a wrapper around a more generic tool that just tags the logs; the wrapper
takes grep-style filtering arguments and the output is piped to a pretty
printer).

------
aguynamedben
If you're interested in higher volume and more flexibility, check out Flume, a new
open-source project from Cloudera (the Hadoop/logging experts). Solid software
and community behind it.
<http://archive.cloudera.com/cdh/3/flume/UserGuide.html>

