

Another Redis case: Centralized logging - theonlyroot
http://sunilarora.org/another-redis-use-case-centralized-logging
Would like to know what logging strategy people have been following in a distributed environment.
======
ajays
The article was fairly devoid of the necessary details. One of the main ones
being: how many events are we talking about? 10 events/sec? 100/sec?
10,000/sec? And what is the size of these events? How many event emitters are
connecting to the Redis server?

With the details, it would be a much more interesting post.

------
antoncohen
Take a look at logstash [1], it's on GitHub [2]. I think it could replace or
integrate with RedisLogHandler.
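The thread doesn't show how RedisLogHandler works, so the following is only a rough illustration of the general idea: a Python `logging.Handler` that LPUSHes each formatted record onto a Redis list. The class name `RedisListHandler`, the key `logs:app`, and the `InMemoryList` stand-in (used so the sketch runs without a Redis server) are all hypothetical. Against a real server you would pass a redis-py `redis.Redis()` instance as `client`.

```python
import logging


class RedisListHandler(logging.Handler):
    """Hypothetical handler: LPUSH each formatted record onto a Redis list.

    `client` is any object with an lpush(key, value) method, e.g. a
    redis-py ``redis.Redis()`` instance.
    """

    def __init__(self, client, key="logs", level=logging.NOTSET):
        super().__init__(level)
        self.client = client
        self.key = key

    def emit(self, record):
        try:
            self.client.lpush(self.key, self.format(record))
        except Exception:
            # Never let a logging failure crash the application.
            self.handleError(record)


class InMemoryList:
    """Stand-in for a Redis client so the sketch runs without a server."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        # Like Redis LPUSH: newest entry goes to the head of the list.
        self.lists.setdefault(key, []).insert(0, value)
        return len(self.lists[key])


client = InMemoryList()
logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = RedisListHandler(client, key="logs:app")
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
logger.addHandler(handler)

logger.info("user logged in")
logger.warning("disk almost full")
print(client.lists["logs:app"])
# ['WARNING app disk almost full', 'INFO app user logged in']
```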

logstash takes logs from various inputs (syslog, files, Redis, HTTP),
filters/normalizes the formats into JSON, and outputs to various destinations
(Elasticsearch, Redis, MongoDB, Graylog2). There is a web UI with search and
graphs. It's designed to scale out and run on multiple machines.

[1] <http://logstash.net/> [2] <https://github.com/logstash/logstash>

------
mkelly
Depends on how much you care about latency, right?

The easy solution is just, y'know, write the log to a file and scp it back to
some central place every so often. But then you have to either (a) keep track
of how much of a file you've copied, which is a pain; or (b) only grab files
that you're no longer actively writing to (as determined by naming scheme or
something), but that introduces some latency, depending on how often you
rotate.
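Option (a), keeping track of how much of a file you've already copied, might look roughly like this minimal sketch. It assumes the shipper remembers an (inode, offset) pair between runs; persisting that pair somewhere is the bookkeeping "pain" and is not shown. `read_new_bytes` is a hypothetical helper. A changed inode is treated as rotation and a file shorter than the saved offset as truncation; either way it starts again from byte 0.

```python
import os
import tempfile


def read_new_bytes(path, state):
    """Return (new_data, new_state) for bytes appended since the last call.

    `state` is None on the first call; afterwards it is the (inode, offset)
    pair returned previously. If the inode changed (file rotated) or the file
    is shorter than the saved offset (file truncated), start from byte 0.
    """
    st = os.stat(path)
    inode, offset = state if state is not None else (st.st_ino, 0)
    if st.st_ino != inode or st.st_size < offset:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data, (st.st_ino, offset + len(data))


with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "app.log")
    with open(log, "wb") as f:
        f.write(b"a\nb\n")
    chunk1, state = read_new_bytes(log, None)   # b'a\nb\n'
    with open(log, "ab") as f:
        f.write(b"c\n")
    chunk2, state = read_new_bytes(log, state)  # b'c\n'
    with open(log, "wb") as f:                  # truncate in place
        f.write(b"z\n")
    chunk3, state = read_new_bytes(log, state)  # b'z\n'
```

The size check is only a heuristic: a file truncated and then refilled past the old offset would go unnoticed, which is part of why real log shippers end up more complicated than this.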

~~~
StavrosK
I prefer sending the logs to some other computer over UDP.
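A minimal sketch of fire-and-forget UDP logging using Python's standard `logging.handlers.SysLogHandler`, which sends datagrams when given a `(host, port)` tuple and no socket type. A local socket bound to 127.0.0.1 stands in for the remote collector; the logger name and message are illustrative. The sender gets no acknowledgement, so if the collector is down or a datagram is dropped, the record is simply gone.

```python
import logging
import logging.handlers
import socket

# A local UDP socket standing in for the remote log collector.
collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
collector.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
collector.settimeout(2)
port = collector.getsockname()[1]

# With a (host, port) address and no socktype, SysLogHandler uses UDP.
handler = logging.handlers.SysLogHandler(address=("127.0.0.1", port))
logger = logging.getLogger("udp-demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("cycle completed in 42 ms")

packet, _ = collector.recvfrom(4096)
print(packet)  # b'<14>cycle completed in 42 ms\x00'

handler.close()
collector.close()
```

The `<14>` prefix is the syslog PRI field: facility "user" (1) times 8, plus severity "info" (6).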

~~~
rarrrrrr
This only works if you can tolerate loss of some log items.

~~~
StavrosK
Very true. If you can, then it's a great way to log things.

~~~
prodigal_erik
Yikes, losing messages? If anyone designs a system that knowingly makes itself
harder to diagnose under stress by destroying evidence, I only hope it's
themselves and not their successors who are there to endure the pain.

~~~
StavrosK
As I said, it depends. If you're logging cycle completion times or web page
visits, you don't care about losing a few.

------
aedocw
A similar concept, though using MongoDB with a capped collection:
<http://graylog2.org>.

------
rbucker
I'm working on a universal subscriber; for now, it does a nice job with
Redis. <http://sub-watcher.com> It forwards messages back to Redis and to
syslog, and it has several filtering options.

------
andrewvc
My only question: how is this better than syslog?

~~~
tptacek
It's queryable.

It's trivially capped and rolled.

It's centralized.

It provides a common logging interface across platforms.

It's extremely simple (the script that pipes syslog output directly into Redis
is probably just a couple lines long).
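To make "trivially capped and rolled" and "a couple lines long" concrete, here is a minimal sketch, assuming syslog output arrives as lines on an iterable such as `sys.stdin`. Each line is LPUSHed onto a list, and LTRIM then keeps only the newest `max_len` entries. `ship_lines`, the key name, and the `InMemoryList` stand-in (which mimics only LPUSH, LTRIM, and LRANGE) are hypothetical.

```python
def ship_lines(client, key, lines, max_len):
    """LPUSH each non-empty line onto `key`, then LTRIM the list so only the
    newest `max_len` entries survive. `client` is any object with lpush/ltrim
    methods, e.g. a redis-py ``redis.Redis()`` instance."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            client.lpush(key, line)
            client.ltrim(key, 0, max_len - 1)  # newest entries sit at index 0


class InMemoryList:
    """Tiny stand-in for the three Redis list commands used here."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        self.lists.setdefault(key, []).insert(0, value)
        return len(self.lists[key])

    def _slice(self, key, start, stop):
        # Redis ranges are inclusive at both ends and accept negative indexes.
        items = self.lists.get(key, [])
        n = len(items)
        start = start + n if start < 0 else start
        stop = stop + n if stop < 0 else stop
        return items[max(start, 0):max(stop + 1, 0)]

    def ltrim(self, key, start, stop):
        self.lists[key] = self._slice(key, start, stop)
        return True

    def lrange(self, key, start, stop):
        return self._slice(key, start, stop)


store = InMemoryList()
ship_lines(store, "syslog", ["first\n", "second\n", "third\n"], max_len=2)
print(store.lrange("syslog", 0, -1))  # ['third', 'second']
```

Against a real server this would be something like `ship_lines(redis.Redis(), "syslog", sys.stdin, max_len=10_000)`; sending each LPUSH/LTRIM pair through a redis-py pipeline would save a round trip per line.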

I have to believe that the people who really seem to like syslog have never
worked in organizations that had to deploy things like Splunk or (worse)
LogLogic and ArcSight just to make sense of the giant morass of useless text
gunk they generate.

Have you noticed how none of the cool kids postprocess http log files anymore?

~~~
cdavid
I don't really understand your list in view of the article. How is this
solution more queryable than syslog? They record events into Redis without any
schema attached (just a string), so I fail to see the improvement there.
They write it back out to a log file anyway.

It is not less or more centralized than syslog configured with centralization
(which is trivial to set up).

How is this more common than syslog across platforms (unless you include
Windows in "across platforms")?

It is not simpler than syslog either, since writing to syslog is just a matter
of using the right python logging backend.

Analysing tons of data from syslog is a pain, but I don't see how any solution
can avoid enforcing a format/structure on your logs at some point in the
stack. How is this fundamentally different from post-processing HTTP logs?

~~~
tptacek
He's LPUSH'ing logs into a list key. Just by doing that one thing, he now has
evented logging: he can subscribe to his raw logs with BRPOPLPUSH and, without
changing anything client-side, start indexing them in better ways or applying
different policies to different logs.
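A rough sketch of the consumer side, assuming log lines were LPUSHed onto a list: BRPOPLPUSH atomically moves the oldest entry onto a second "processing" list, the consumer handles it, and LREM then deletes it, so an entry is not lost if the consumer dies mid-handle. The function `drain`, the key names, and the `InMemoryList` stand-in (which never actually blocks) are hypothetical; the stand-in mirrors redis-py's `brpoplpush(src, dst, timeout)` and `lrem(name, count, value)`. Redis 6.2 and later deprecate BRPOPLPUSH in favor of BLMOVE.

```python
def drain(client, source, processing, handle, timeout=1):
    """Move entries one at a time from `source` to `processing`, handle them,
    then remove them from `processing`. Returns the number handled.

    LPUSH adds at the head and BRPOPLPUSH pops from the tail, so entries come
    out oldest first.
    """
    count = 0
    while True:
        entry = client.brpoplpush(source, processing, timeout)
        if entry is None:  # timed out: nothing left to read
            return count
        handle(entry)
        client.lrem(processing, 1, entry)  # acknowledge: drop from processing
        count += 1


class InMemoryList:
    """Stand-in for the Redis list commands used above."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        self.lists.setdefault(key, []).insert(0, value)

    def brpoplpush(self, src, dst, timeout):
        # Unlike Redis, this never blocks: it returns None when src is empty.
        items = self.lists.get(src)
        if not items:
            return None
        value = items.pop()
        self.lists.setdefault(dst, []).insert(0, value)
        return value

    def lrem(self, key, count, value):
        # Only the count > 0 case (remove from the head) is modelled here.
        items = self.lists.get(key, [])
        removed = 0
        while value in items and removed < count:
            items.remove(value)
            removed += 1
        return removed


store = InMemoryList()
for msg in ["first", "second", "third"]:
    store.lpush("logs", msg)
seen = []
handled = drain(store, "logs", "logs:processing", seen.append)
print(seen, handled)  # ['first', 'second', 'third'] 3
```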

And Redis is easier to understand than syslog. We're pretending that there is
zero friction to understanding syslog, as if any competent Unix person should
automatically grok it. But syslog is a janky old Rube Goldberg machine, and
understanding it well pays off solely in the form of understanding a janky old
Rube Goldberg machine that nobody will be using in 10 years.

~~~
mihasya
The lady doth protest too much, no? The guy is using Redis as a buffer to what
ultimately ends up just being a central log file. You know. rsyslog. None of
the fancy things you have in mind. Which brings me to the same point someone's
already made: using Redis buys you nothing, even if you had a more involved
use case in mind.

Saying that he can do something by subscribing to an event in Redis is sorta
silly, isn't it? You could just as easily tail the logs, as many systems
actually built for this purpose do. Once again, there are already things in
*nix for this.

The reality is, this is a janky, nubile solution to an already well-solved
problem that is now getting thousands of views because of HN and antirez
tweeting about it. Instead of learning the correct way of doing things, I bet
a bunch of inexperienced developers are now going to say "BOY THIS IS SWELL"
and cut themselves a nice fat chunk of technical debt for the not-that-far-off
future.

------
tedjdziuba
Holy balls, talk about going out of your way to avoid syslog.

~~~
tptacek
Syslog is a pile of shit, Ted. It's a relic. You clearly happen to love that
relic, and I think you should find a way to place it just-so in a nicely lit
alcove in your apartment. The rest of us should move on from it. I don't think
less of you for admiring it. I have useless old things on display in my house
too.

* Freeform text is a terrible way to track system events.

* Periodically rotated flat files are not a great way to store log information.

* Goofy little UDP messages are not a good way to convey system events

* The syslog PRI field dates back to when we exchanged messages with UUCP.

I could keep going, but since you're just going to reply with "lolwut umad?",
I'll leave it at that.

~~~
moe
Yes, syslog is a pile of shit. It's a relic.

And I'll add that it tends to ship in a horrible default configuration, with
events scattered randomly over multiple files, no safeguards against filling up
the disk, and no safeguards to ensure the stupid daemon is actually running.

However...

 _Freeform text is a terrible way to track system events._

Nothing stops you from logging structured text.
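As one illustration of logging structured text, here is a minimal sketch of a JSON formatter for Python's standard `logging` module. The class name `JSONFormatter`, the field names, and the convention of passing structured data as `extra={"fields": {...}}` are assumptions for the example, not part of any syslog standard.

```python
import io
import json
import logging


class JSONFormatter(logging.Formatter):
    """Hypothetical formatter: emit each record as one JSON object per line.
    Structured data passed as extra={"fields": {...}} becomes top-level keys."""

    def format(self, record):
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("web")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("request done", extra={"fields": {"status": 200, "ms": 12}})
entry = json.loads(stream.getvalue())
print(entry["status"], entry["msg"])  # 200 request done
```

The same formatter could be attached to a `logging.handlers.SysLogHandler`, in which case the syslog payload itself is a JSON object.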

 _Periodically rotated flat files are not a great way to store log
information._

Modern syslog daemons will write to pretty much anything you want.

 _Goofy little UDP messages are not a good way to convey system events_

Modern syslog daemons offer TCP transport. Some even try to offer some
delivery guarantees (disk-backed spool), although personally I wouldn't rely
on that for truly critical stuff.

 _The syslog PRI field dates back to when we exchanged messages with UUCP._

Thanks, I always wondered where those were from...

And, well, you forgot a couple bullets:

* syslog() is available _everywhere_, out of the box

* It's trivial to move from file-based logging to syslog

* We have mature syslog-daemons that dispatch events pretty reliably

* Unless you're Facebook, you probably don't need anything fancier.

So, I'd say syslog gets the job done quite well, as long as you don't mistake
it for a message queue.

~~~
tptacek
1.

It's not just that logs tend not to be structured, it's that the metadata is
all inband. And syslog in particular exacerbates the problem by decoupling
logging clients from log storage policy: the log generator has no idea whether
its syslog has a super-smart storage policy, and so it has to assume everything
needs fine-grained, human-readable timestamps, a custom facility/severity
notation, the proctitle and pid, and so on.

2.

Most syslog daemons will store anywhere. Sure. The last syslog daemon I
actually read had stack overflows in it, so it's been a while for me. But if
you've got syslog writing to a real store, what value is syslog adding? The
standardized network protocol? Redis has a trivial network protocol too.
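For a sense of how small the Redis side is: clients send commands in RESP, as an array of bulk strings (`*<argc>` CRLF, then `$<byte length>` CRLF, the bytes, CRLF for each argument). A minimal encoder sketch, with `encode_command` as a hypothetical helper name:

```python
def encode_command(*args):
    """Encode a Redis command as a RESP array of bulk strings: a '*<argc>'
    header line, then a '$<byte length>' line plus the bytes for each
    argument, every line terminated by CRLF."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


print(encode_command("LPUSH", "logs", "hello"))
# b'*3\r\n$5\r\nLPUSH\r\n$4\r\nlogs\r\n$5\r\nhello\r\n'
```

Written to a server's TCP socket, that LPUSH would get back an integer reply such as `:1\r\n`, the new length of the list.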

3.

The issue with syslog facilities isn't that the "UUCP" facility harms anything
directly; it's that one of the few bits of out-of-band metadata syslog truly
offers forces applications to decide whether they're "kern" or "daemon" or
"local2". Whose application actually breaks down cleanly into syslog
facilities?
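The facility lives inside the PRI number at the front of each syslog message, where PRI = facility × 8 + severity. A minimal decoding sketch; `decode_pri` is a hypothetical helper, and the table uses the conventional Unix names for the numeric codes listed in the syslog RFCs (codes 12 to 15 are omitted and come back as plain numbers).

```python
import re

# Severity codes 0-7 and common facility codes with their usual Unix names.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
FACILITIES = {
    0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth", 5: "syslog",
    6: "lpr", 7: "news", 8: "uucp", 9: "cron", 10: "authpriv", 11: "ftp",
    **{16 + i: "local%d" % i for i in range(8)},  # local0 .. local7
}
PRI_RE = re.compile(r"<(\d{1,3})>(.*)", re.DOTALL)


def decode_pri(packet):
    """Split a syslog packet into (facility, severity, rest of message).

    PRI = facility * 8 + severity, so divmod by 8 recovers both parts.
    Facility codes not in the table above are returned as a numeric string.
    """
    m = PRI_RE.match(packet)
    if m is None:
        raise ValueError("packet has no <PRI> prefix")
    facility, severity = divmod(int(m.group(1)), 8)
    return FACILITIES.get(facility, str(facility)), SEVERITIES[severity], m.group(2)


print(decode_pri("<34>su: authentication failure"))
# ('auth', 'crit', 'su: authentication failure')
print(decode_pri("<66>uucico: call complete"))
# ('uucp', 'crit', 'uucico: call complete')
```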

4.

I dispute that syslog is "good enough if you're not Facebook". I think you do
too: you probably aren't doing all your metrics stuff with syslog logs; you
probably aren't even getting your web stats through syslog. You may have even
delegated your web logs to _client-side JavaScript_, because that is how bad
text logs stored in flat files are to work with.

~~~
moe
_what value is syslog adding? The standardized network protocol?_

Exactly. Every piece of software under the sun either logs to syslog by
default or has a config-switch/patch to make it so.

Yes, syslog the protocol is pretty nasty. Yes, syslog daemons are pretty
nasty. But they get the job done, and you are free to put anything (including
Redis) at the end of the pipe.

Coping with syslog is by far easier than trying to make all "legacy software"
including the kernel speak something else entirely.

~~~
tptacek
But who's suggesting that? Nobody is saying "let's have the kernel log to
Redis". By all means, hot potato the kernel stuff and the wrapper stuff and
your authlog and whatever through syslog before it gets dumped into Redis.

But if you control it, why bother with syslog? Syslog is a piece of junk.
Don't bother. Any ad hoc scheme you come up with that uses a real backend
store will be better than syslog.

~~~
moe
_But if you control it, why bother with syslog?_

I'd ask this backwards: Why bother with a homegrown solution when syslog
exists and gets the job done?

A home-grown solution takes at least a couple weeks to stabilize, likely much
longer before the last bugs are ironed out. Syslog takes about a day to beat
into shape.

 _Any ad hoc scheme you come up with that uses a real backend store will be
better than syslog._

You know better than that.

Shipping messages reliably is a surprisingly tedious problem. First you
realize you need a disk-spool. Then you realize that spool should be size-
capped anyhow. Eventually your boss says you need a network topology more
complex than A->B for some idiotic but inevitable reason.
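To show why even the "size-capped disk spool" step takes real decisions, here is a minimal sketch of one. `CappedSpool` is hypothetical. It assumes the policy is to drop the oldest undelivered messages when the cap is hit, rather than blocking writers; that choice is exactly the kind of thing a homegrown shipper has to settle. Lines go into numbered segment files, and whole segments are deleted oldest-first once the total size exceeds `max_bytes`.

```python
import os
import tempfile


class CappedSpool:
    """Hypothetical size-capped disk spool. Lines are appended to numbered
    segment files; once the total size exceeds max_bytes, whole segments are
    deleted oldest-first, i.e. the oldest undelivered messages are dropped."""

    def __init__(self, directory, segment_bytes, max_bytes):
        self.dir = directory
        self.segment_bytes = segment_bytes
        self.max_bytes = max_bytes
        os.makedirs(directory, exist_ok=True)

    def _path(self, name):
        return os.path.join(self.dir, name)

    def _segments(self):
        names = [n for n in os.listdir(self.dir) if n.endswith(".seg")]
        return sorted(names, key=lambda n: int(n[:-4]))

    def append(self, line):
        segs = self._segments()
        if segs and os.path.getsize(self._path(segs[-1])) < self.segment_bytes:
            current = segs[-1]
        else:  # start a new segment numbered after the newest one
            current = "%d.seg" % (int(segs[-1][:-4]) + 1 if segs else 0)
        with open(self._path(current), "ab") as f:
            f.write(line.encode("utf-8") + b"\n")
        self._enforce_cap()

    def _enforce_cap(self):
        segs = self._segments()
        total = sum(os.path.getsize(self._path(s)) for s in segs)
        # Keep at least the newest segment even if it alone exceeds the cap.
        while total > self.max_bytes and len(segs) > 1:
            oldest = segs.pop(0)
            total -= os.path.getsize(self._path(oldest))
            os.remove(self._path(oldest))

    def read_all(self):
        lines = []
        for s in self._segments():
            with open(self._path(s), "rb") as f:
                lines.extend(f.read().decode("utf-8").splitlines())
        return lines


with tempfile.TemporaryDirectory() as d:
    spool = CappedSpool(d, segment_bytes=10, max_bytes=25)
    for i in range(10):
        spool.append("msg%d" % i)  # 5 bytes each, including the newline
    result = spool.read_all()
print(result)  # ['msg6', 'msg7', 'msg8', 'msg9']
```

And this still says nothing about delivering the spooled lines, retrying, or surviving a crash halfway through a write.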

And then next month you run into some Redis limitation and realize some kind
of datastore abstraction would have been a better idea to begin with. Hmm,
perhaps dump to plain-text files until we figure that one out?

See the pattern here?

At the end you have reinvented syslog. Sure, yours may be nicer or at least
_different_.

But that's a whole lot of work to avoid something that, despite all its warts,
already works.

