
Logs Are Streams, Not Files - ph0rque
http://adam.heroku.com/past/2011/4/1/logs_are_streams_not_files/
======
jrockway
Am I the only person that doesn't want logging to block or buffer? Streams are
great until you realize that when the thing that you are piping stdout to
blocks then _your_ program goes to sleep too... all because something couldn't
process your debug logs? Files have the same problem... no space left on
device, so all your apps have to crash!?

Using non-blocking IO is a solution, but then when the fd is unwritable for a
while, you buffer data and your machine runs out of memory. So also not good.

I'm a big fan of the strategy Varnish uses. It mmaps a block of memory, and
then treats it as a ring buffer. A separate program can connect to this and
read the logs to a file or a pipe, but if that program fails, you don't stop
serving web pages. As long as allocated memory doesn't randomly disappear and
the Universe doesn't implode, logging adds no overhead or risk of failure to
your app.

Logs are essential, but I don't necessarily want a failure in the logging
system to break my app. When you use files or pipes, you run this risk.

~~~
haberman
> Am I the only person that doesn't want logging to block or buffer?

If it's an error that you're logging the moment before you crash, then you
probably want it to block.

At Google LOG(ERROR) blocks but LOG(INFO) doesn't, for this reason.
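The policy is simple to sketch (a guess at the general shape, not Google's actual logging code): error writes block, info writes are best-effort and get dropped if the descriptor would block.

```python
import os

def log(level, msg, fd=2):
    data = ("%s: %s\n" % (level, msg)).encode()
    if level == "ERROR":
        # We may be about to crash; losing the message is worse
        # than blocking, so write synchronously.
        os.write(fd, data)
        return
    # INFO is best-effort: if the fd would block, drop the line.
    os.set_blocking(fd, False)
    try:
        os.write(fd, data)
    except BlockingIOError:
        pass  # consumer is stuck; the app keeps running
    finally:
        os.set_blocking(fd, True)
```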

~~~
jrockway
Smart! I think it might finally be time for me to send my resume in the
direction of Google ;)

------
ChuckMcM
"Logs are a stream, and it behoves[sic] everyone to treat them as such. Your
programs should log to stdout and/or stderr and omit any attempt to handle log
paths, log rotation, or sending logs over the syslog protocol."

If one were being pedantic, one might say that logs are a repository for
temporally tagged event notifications that occurred in the past. You can
store these temporally tagged event streams in files, in a memory cache, in an
email inbox, or in a round-robin database (RRD).

Not particularly profound, but useful. Logs from disparate processes that
share a common time basis, however, are the systems analyst's go-to tool for
untangling the secondary and tertiary properties of a loosely coupled
system.

------
enoren
This assumes, though, that you only have a single log. I often find it useful
to break up logs by concern/package/name or even level, which you could not
do here without reinventing the wheel of output filtering.

This also assumes that anyone is actually reinventing the appenders, when
most frameworks already ship a library of common ones.

~~~
imbriaco
I would argue that you want to do the filtering at the aggregation layer and
not at the application level. So while you may want to set different flags or
parameters that will be logged (i.e. severity, component, etc.), those
parameters should be injected into the log stream rather than into alternative
streams. Let
the aggregation point use those clues to filter the logs appropriately.

~~~
prodigal_erik
If you have multiple streams, you can give lower latency or better durability
to critical or low-volume messages.

~~~
encoderer
This should definitely take place at the aggregation level. There are great
tools for this already. This is definitely an implementation detail we
developers should delegate to the systems guys.

Not to mention, it's just less development work to send out messages with,
say, different syslog levels and be done with it.

------
burgerbrain
Correction: streams are files. _Everything_ is a file.

~~~
jasonwatkinspdx
Rather than just being snarky, why not actually correct the post by
mentioning named pipes?

~~~
burgerbrain
Named or anonymous, pipes are still just types of files. "Everything is a
file" isn't snark, it's a saying almost as old as unix itself.

~~~
jasonwatkinspdx
I'm well familiar with the expression. However it's not helpful to someone who
isn't already familiar. It's another form of "you're doing it wrong" without
explanation.

~~~
burgerbrain
It's an intellectual teaser. You read it and think to yourself, "well this
can't possibly be true, let me look into this further". Then you learn.

~~~
cema
Do you expect people to "look into it further" with everything they think
cannot be true? I would not.

~~~
burgerbrain
If they have any interest in the topic in the first place? Of course.

I certainly did, so many years ago...

------
gaius
_if they wish to log to syslog, each program needs to implement the syslog
protocol internally_

This is simply not true. I recently changed the logging method of a C program
from a timestamped text file to syslog in about 10 lines... Utterly trivial
in any language.
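For a sense of scale: the classic BSD syslog wire format (RFC 3164) really is about this small. A sketch in Python, with the tag "myapp" invented for illustration:

```python
import socket

def send_syslog(msg, host="localhost", port=514,
                facility=1, severity=6):
    # RFC 3164 framing: "<PRI>TAG: message", where
    # PRI = facility * 8 + severity (1, 6 = user.info).
    pri = facility * 8 + severity
    packet = ("<%d>myapp: %s" % (pri, msg)).encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(packet, (host, port))
    sock.close()
```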

------
akkartik
So it's basically arguing that server-side code benefits from logging to
stdout rather than a named file? Interesting idea if you aren't using stdout
for something else, but it doesn't seem particularly radical since you can
just use _cat_.

A few months ago I published a suite of scripts that allow me to run complex
queries using shell pipes. They work by communicating in YAML. The first
stage, _cat_logs_ , simply parses logs from a file and emits yaml for each
line. _cat_logs_ understands that log files are split across multiple files,
and that they may be gzipped.

<http://akkartik.name/blog/2010-09-25-05-29-01-soc>

------
jimfl
Even better, logs are databases, and not a linear sequence at all. Often, I
find that I am interested in some very specific subset of what happened;
between these dates, from this IP address, in this module, errors of this
kind, etc. If log entries were put in a database with a well-thought out
schema, using the log in this way would be trivial.
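A sketch with sqlite3 (the schema here is invented): once entries are rows, the "between these dates, from this IP address, errors of this kind" question becomes a one-line query.

```python
import sqlite3

def open_log_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS log (
                    ts      TEXT,   -- ISO 8601 timestamp
                    ip      TEXT,
                    module  TEXT,
                    level   TEXT,
                    message TEXT)""")
    return db

def add_entry(db, ts, ip, module, level, message):
    db.execute("INSERT INTO log VALUES (?, ?, ?, ?, ?)",
               (ts, ip, module, level, message))

def errors_between(db, start, end):
    # The ad hoc subset query, expressed directly.
    return db.execute(
        "SELECT ts, ip, module, message FROM log "
        "WHERE level = 'ERROR' AND ts BETWEEN ? AND ?",
        (start, end)).fetchall()
```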

------
viraptor
Another project worth mentioning (although "as a service") is
<http://www.graylog2.org/> The web interface is not there yet (no complicated
queries, or easy drilldown), but it definitely performs well and has an
interesting design.

~~~
wiredfool
Interesting project, but I was hoping that it was something that it isn't.
I've been looking around for something that would do ad hoc querying and
graphing of 'stuff' from the debug and production logs that I have on hand,
stuff like page gen time statistics vs. time, or split by heaviest pages, or
split by web app machine. And then follow that until I find something
actionable.

I've taken a look at statsd+graphite, but that seems to be much more of a
realtime solution, and I want to mine the logs I have on hand.

~~~
StavrosK
Why not write a simple script that will parse your logs and send the results
to statsd?
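The statsd wire protocol is plain UDP text, so such a script is short. A sketch, assuming a made-up log format with `page_gen_ms=N` fields (note that statsd stamps metrics at receive time, so replaying history this way loses the original timestamps):

```python
import re
import socket

def send_timing(name, ms, host="127.0.0.1", port=8125):
    # statsd line protocol: "<metric>:<value>|ms" over UDP.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(("%s:%d|ms" % (name, ms)).encode(), (host, port))
    sock.close()

def replay(log_lines, port=8125):
    # Hypothetical log format: lines containing "page_gen_ms=123".
    sent = 0
    for line in log_lines:
        m = re.search(r"page_gen_ms=(\d+)", line)
        if m:
            send_timing("page_gen_time", int(m.group(1)), port=port)
            sent += 1
    return sent
```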

~~~
wiredfool
Statsd takes realtime info and dumps it in. You have to at least know what
you're measuring when you start. I've got a month or two of debug logs, and I
don't know where the gems are yet, though I'd probably start with timings and
go from there.

------
silentbicycle
Or you can just use daemontools, and these problems go away.

~~~
jrockway
Not really. If your program writes its logs to /foo/bar then daemontools does
not do anything for you.

~~~
silentbicycle
True, but half the problem there is programs that try hard to handle logging /
daemonizing / etc. themselves, and do it badly.

------
arctangent
I think we have some semantic trickery going on.

It's fairly obvious that a log file is a file. But "the log" is clearly a
never-ending stream of information.

The author makes other good points but the title is linkbait.

------
mcantor
> _Logplex handles input streams (which we call “sinks”)_

Considering the complementary term they use ("drains"), wouldn't it be more
intuitive to call them "faucets" or "spouts"?

Just sayin'.

~~~
metageek
Or, better yet, call the inputs sources and the outputs sinks. Just like, say,
the entire rest of the world.

------
shasta
I disagree. Logs and streams are very different kinds of output. For number
one, streams are usually more fluid, and I tend to have a higher volume of
stream output than logs. For number two, it's more important to produce logs
at regular intervals, IMO.

------
aerique
Just pointing out "tail -F" instead of "tail -f".

~~~
calloc
There are differences between tail -F and tail -f.

tail -f will continue reading a file for as long as possible, but if that file
is renamed underneath it, then tail won't know and you will be wondering why
you don't see any more log lines being output to your terminal.

tail -F checks to see that the file we opened is still in the same location as
before, so if the file is renamed underneath us, and a new file is created in
its place (think log rotation) then it will open that new file and continue
outputting data to its standard out.

~~~
throw_away
I wish there were a tail -f <wildcard> that would keep following even if new
files are created that match. My systems timestamp their logs by hour, and
more than once I've been confused as to why the app just "stopped", only to
realize that it's five past.

~~~
rbanffy
I am on my phone now (and, while its heart is a Linux kernel, it doesn't have
a terminal, bash, or tail). Maybe if you don't expand the wildcard, tail will
detect new files.

~~~
throw_away
no, if you give it a literal wildcard (like you would to find, for example),
it just complains that it can't find the file called *.

~~~
rbanffy
I guess if you really want it, you could submit a patch (or roll your own
fork).

[http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail...](http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c)

Doesn't look very bad, but I am known for being an optimist.

------
phlux
Heh.

I think it would be wonderful if everyone started logging to twitter.

Then you can just follow each machine to get a read of its logs... or, at
least pipe the output of LogWatch or something like it to twitter.

Maybe for 4/1/2012 - we should organize all sys admins to output logs to
twitter randomly choosing @jack or whomever else and flood them with @ log
bursts...

