With the details, it would be a much more interesting post.
logstash takes logs from various inputs (syslog, files, Redis, HTTP), filters and normalizes the formats into JSON, and writes to various outputs (ElasticSearch, Redis, MongoDB, Graylog2). There is a web UI with search and graphs. It's designed to scale out and run on multiple machines.
The easy solution is just, y'know, write the log to a file and scp it back to some central place every so often. But then you have to either (a) keep track of how much of a file you've copied, which is a pain; or (b) only grab files that you're no longer actively writing to (as determined by naming scheme or something), but that introduces some latency, depending on how often you rotate.
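For what it's worth, option (a) can be sketched in a few lines. This is only a rough illustration, assuming the offset is kept in a small sidecar file; the function name and both paths are made up for the example, and it treats a shrinking file as a rotation, which is a guess rather than a real rotation check.

```python
import os

def read_new_bytes(log_path, offset_path):
    """Return the bytes appended to log_path since the last call.

    The byte offset is persisted in offset_path (a hypothetical sidecar file).
    """
    try:
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    except FileNotFoundError:
        offset = 0  # first run: start from the beginning

    if os.path.getsize(log_path) < offset:
        # The file shrank, so assume it was rotated or truncated and start over.
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        chunk = f.read()

    with open(offset_path, "w") as f:
        f.write(str(offset + len(chunk)))
    return chunk
```

The returned chunk is what you would ship to the central place. The pain is everything this leaves out: partial lines, crashes between the read and the offset write, and rotation that happens to produce a larger file.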
I get that you were making a snarky allusion to syslog, but what part of the syslog UDP protocol do you feel beats Redis' TCP protocol?
Ideally, I'd use a function that sent things to a small server over UDP, which would then put them in Redis. This assumes you don't mind losing a few lines, of course.
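Roughly what I have in mind, as a sketch only. The port, the key name, and both function names are placeholders, and `store` is assumed to be anything with an `lpush(key, value)` method (a redis-py client has one); nothing here retries or acknowledges, so dropped datagrams are simply gone.

```python
import socket

def send_log(line, addr=("127.0.0.1", 5140)):
    """Fire-and-forget: one datagram per log line, no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), addr)

def relay(sock, store, key="logs", max_messages=None):
    """Read datagrams from an already-bound UDP socket and push each onto a list.

    max_messages is only there so the loop can be stopped in a test;
    a real relay would run with None (forever).
    """
    count = 0
    while max_messages is None or count < max_messages:
        data, _sender = sock.recvfrom(65535)
        store.lpush(key, data)
        count += 1
```

The app side only ever calls `send_log`, so a slow or dead Redis never blocks a request; the relay is the only process that talks to Redis.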
UDP is marginally faster than TCP, but the tradeoff for that is that under heavy load, UDP imposes more costs on the rest of your traffic. Since we're talking about logging, though: who gives a shit how fast it is? With either transport, if logs are taking more than hundreds of milliseconds to clear, you have a problem you need to fix.
It's trivially capped and rolled.
It provides a common logging interface across platforms.
It's extremely simple (the script that pipes syslog output directly into Redis is probably just a couple lines long).
I have to believe that the people who really seem to like syslog have never worked in organizations that had to deploy things like Splunk or (worse) LogLogic and ArcSight just to make sense of the giant morass of useless text gunk they generate.
Have you noticed how none of the cool kids postprocess http log files anymore?
It is not less or more centralized than syslog configured with centralization (which is trivial to set up).
How is this more common than syslog across platforms (unless you include Windows in "across platforms")?
It is not simpler than syslog either, since writing to syslog is just a matter of using the right Python logging backend.
Analysing tons of data from syslog is a pain, but I don't see how any solution can avoid enforcing a format/structure on your logs at some point in the stack. How is this fundamentally different from post-processing HTTP logs?
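To make that concrete, here is a minimal sketch using the standard library's `logging.handlers.SysLogHandler`. The helper name, the default address, and the choice of the local0 facility are assumptions for the example; on a typical Linux box you would pass `address="/dev/log"` to reach the local daemon instead.

```python
import logging
import logging.handlers

def make_syslog_logger(name, address=("127.0.0.1", 514)):
    """Return a logger whose records are sent to a syslog daemon over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,  # assumed facility
    )
    # Prefix each message with the logger name, as most daemons expect a tag.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

After that, `make_syslog_logger("myapp").info("user logged in")` is the whole integration. Call the helper once per logger name, since each call adds another handler.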
And Redis is easier to understand than syslog. We're pretending that there is zero friction to understanding syslog, as if any competent Unix person should automatically grok it. But syslog is a janky old Rube Goldberg machine and understanding it well pays off solely in the form of understanding a janky old Rube Goldberg machine that nobody will be using in 10 years.
Saying that he can do something by subscribing to an event in Redis is sorta silly, isn't it? You could just as easily tail the logs, as many systems actually built for this purpose do. Once again, there are already things in *nix for this.
The reality is, this is a janky, nubile solution to an already well-solved problem that is now getting thousands of views because of HN and antirez tweeting about it. Instead of learning the correct way of doing things, I bet a bunch of inexperienced developers are now going to say "BOY THIS IS SWELL" and cut themselves a nice fat chunk of technical debt for the not-that-far-off future.
The "Redis is easier than syslog" argument is a bit of a strawman, because you will have to understand syslog anyway, since that's the only thing spoken by quite a few applications. In the OP's case, they are already using Redis, so the cost on that side is very low, though.
What you really want to do (and what everyone does btw) is to push your logs to a central syslog-server and stream them into redis or whatever analytics solution from there.
Because that's exactly what I'd have to do until other daemons like, say, redis itself, speak this fancy new protocol.
So, what do I put into the redis configuration-file to make it log to itself?
Context matters, a lot. When my app logs a timeout against redis then my next question is "so what did redis do at the time, did it perhaps log something"?
Following your advice I'd either have to look in two places (redislog and syslog) or feed my syslog stream into redislog, to have everything end up in one place (redis).
Any sane person would do the latter. Under that premise, what's the benefit of having some apps log to redis directly when the syslog-stream also ends up in redis anyways?
I think the key point here is that all the above mentioned implementations have significant adoption and are in a sense "battle-tested". For example, what if your background worker has failed and log events are piling up in the Redis list you are using as queue? Do you have monitors in place to detect that situation, and at what value do your alarms go off? Projects like this have a way of taking a lot more time than originally thought, often at the expense of your core development time. I personally don't like spending the time writing and maintaining code for a project that isn't aligned with the problem I'm trying to solve, so I avoid it whenever possible.
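The backlog check itself is tiny; the hard part is picking the threshold and wiring it into whatever pages you. A sketch, assuming `store` exposes an `llen(key)` method the way a redis-py client does, and with a made-up key name and limit:

```python
def check_backlog(store, key="logs", warn_at=10_000):
    """Return (depth, alarm): the queue length and whether it crossed warn_at."""
    depth = store.llen(key)
    return depth, depth >= warn_at
```

Run it from cron or your existing monitoring agent. If the worker has died, the depth only grows, so even a crude threshold catches it eventually.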
On the flip side, if you are setting out to build a really robust logging system on top of Redis, and that's something of value to your organization, then more power to you!
The thing to realize here is that Redis isn't like Riak or Mongodb or even MySQL. It is stupid simple to stand up a Redis instance. The code to push logs to it: also stupid simple. Even without clever indexing, just stuffing text crud into it, Redis is already natively a great log store.
tail -n 1000
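A sketch of how simple "push, cap, and tail" can be. The function names, key, and cap are invented for the example; `store` is assumed to offer `lpush`, `ltrim`, and `lrange` with inclusive end indexes, which is how the Redis commands of the same names behave.

```python
def log_line(store, line, key="logs", cap=100_000):
    """Prepend a line, then trim so the list never holds more than cap entries."""
    store.lpush(key, line)
    store.ltrim(key, 0, cap - 1)

def tail(store, n=1000, key="logs"):
    """Return the newest n lines, oldest first, roughly like `tail -n`."""
    newest_first = store.lrange(key, 0, n - 1)
    return list(reversed(newest_first))
```

Because new lines go on the head of the list, trimming to the first `cap` entries drops the oldest ones, which is the "capped and rolled" behaviour without any rotation job.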
* Freeform text is a terrible way to track system events.
* Periodically rotated flat files are not a great way to store log information.
* Goofy little UDP messages are not a good way to convey system events
* The syslog PRI field dates back to when we exchanged messages with UUCP.
I could keep going, but since you're just going to reply with "lolwut umad?", I'll leave it at that.
And I'll add it tends to ship in a horrible default configuration with events scattered randomly over multiple files, no safe-guards against filling up the disk and no safeguards to ensure the stupid daemon is actually running.
Freeform text is a terrible way to track system events.
Nothing stops you from logging structured text.
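For example, a formatter that emits one JSON object per record works with any handler, syslog included. This is just a sketch; the class name and the choice of fields are mine, not any standard.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        return json.dumps({
            "ts": record.created,          # seconds since the epoch
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),    # message with %-args applied
        })
```

Attach it with `handler.setFormatter(JsonFormatter())` and whatever sits at the end of the pipe can parse lines instead of regexing them.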
Periodically rotated flat files are not a great way to store log information.
Modern syslog daemons will write to pretty much anything you want.
Goofy little UDP messages are not a good way to convey system events
Modern syslog daemons offer tcp transport. Some even try to offer some delivery guarantees (disk-backed spool), although personally I wouldn't rely on that for truly critical stuff.
The syslog PRI field dates back to when we exchanged messages with UUCP.
Thanks, I always wondered where those were from...
And, well, you forgot a couple bullets:
* syslog() is available everywhere, out of the box
* It's trivial to move from file-based logging to syslog
* We have mature syslog-daemons that dispatch events pretty reliably
* Unless you're Facebook you probably don't need anything more fancy.
So, I'd say syslog gets the job done quite well, as long as you don't mistake it for a message queue.
It's not just that logs tend not to be structured, it's that the metadata is all inband. And syslog in particular exacerbates the problem by decoupling logging clients from log storage policy: the log generator has no idea if its syslog has a super-smart storage policy and so has to assume everything needs fine grained human readable timestamps, a custom facility/severity notation, the proctitle and pid, and so on.
Most syslog daemons will store anywhere. Sure. The last syslog daemon I actually read had stack overflows in it, so it's been a while for me. But if you've got syslog writing to a real store, what value is syslog adding? The standardized network protocol? Redis has a trivial network protocol too.
The issue with syslog facilities isn't that the "UUCP" facility harms anything directly; it's that one of the few bits of out-of-band metadata syslog truly offers forces applications to decide whether they're "kern" or "daemon" or "local2". Whose application actually breaks down cleanly into syslog facilities?
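For anyone who hasn't looked at it: the PRI value at the front of a syslog message is just facility * 8 + severity. A small decoder, as a sketch; the function name is made up, and the short labels for facilities 12 to 15 are informal shorthand since the RFCs only describe them in prose.

```python
FACILITIES = [
    "kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
    "uucp", "cron", "authpriv", "ftp",
    "ntp", "audit", "alert", "clock",      # informal labels for codes 12-15
] + ["local%d" % i for i in range(8)]      # local0..local7 are codes 16-23

SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity) names."""
    if not 0 <= pri <= 191:
        raise ValueError("PRI must be in 0..191")
    facility, severity = divmod(pri, 8)
    return FACILITIES[facility], SEVERITIES[severity]
```

So a message starting with `<146>` is local2 at crit, and that pair is all the out-of-band metadata the application gets to attach.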
Exactly. Every piece of software under the sun either logs to syslog by default or has a config-switch/patch to make it so.
Yes, syslog the protocol is pretty nasty. Yes, syslog daemons are pretty nasty. But they get the job done and you are free to put anything (including redis) at the end of the pipe.
Coping with syslog is by far easier than trying to make all "legacy software" including the kernel speak something else entirely.
But if you control it, why bother with syslog? Syslog is a piece of junk. Don't bother. Any ad hoc scheme you come up with that uses a real backend store will be better than syslog.
I'd ask this backwards: Why bother with a homegrown solution when syslog exists and gets the job done?
A home-grown solution takes at least a couple weeks to stabilize, likely much longer before the last bugs are ironed out. Syslog takes about a day to beat into shape.
Any ad hoc scheme you come up with that uses a real backend store will be better than syslog.
You know better than that.
Shipping messages reliably is a surprisingly tedious problem. First you realize you need a disk-spool. Then you realize that spool should be size-capped anyhow. Eventually your boss says you need a network topology more complex than A->B for some idiotic but inevitable reason.
And then next month you run into some redis limitation and realize some kind of datastore abstraction would have been a better idea to begin with. Hmm, perhaps dump to plain-text files until we figure that one out?
See the pattern here?
At the end you have reinvented syslog. Sure, yours may be nicer or at least different.
But that's a whole lot of work to avoid something that, despite all its warts, already works.
Out of pure curiosity, what do you see as the tool most likely to displace syslog in the future? Is there any alternative available that fixes most of these problems without rolling your own from pieces and parts?
Pretty much any other time I've had to work with it, I've wished for something better, so I, for one, am very happy at the thought of people starting to "go out of their way to avoid syslog".