
Log Everything as JSON. Make Your Life Easier. - kiyoto
http://blog.treasure-data.com/post/21881575472/log-everything-as-json-make-your-life-easier
======
NyxWulf
I've seen several articles like this, and there are a number of things to
consider.

Logging to ASCII means that the standard unix tools work out of the box with
your log files. If you use something like a tab delimiter, you typically don't
even need to specify the delimiter.

As an upside, you aren't storing the column definition in every single line,
which definitely matters if you are handling large-volume traffic. For
instance, we store gigabytes of log files per hour; growing that footprint by
a significant margin impacts storage, transit and processing time during
writes (marshallers and custom log formatting). Writes are the hardest to
scale, so if I'm going to add bulk or extra parsing time, I'd rather handle
that in Hadoop, where I can throw massive parallel resources at it.

Next, you can achieve many of the advantages of JSON or protocol buffers by
having a defined format and a structured release process before someone can
change the format. Add fields to the end and don't remove defunct fields. This
is the same process you have to use with protocol buffers, or conceptually
with JSON, to make it work.

Overall there are advantages to these other formats, but the articles like
this that I've seen gloss over the havoc this creates with a standard linux
tool chain. You can process a LOT of data with simple tools like Gawk and bash
pipelines. It turns out you can even scale those same processes all the way up
to Hadoop streaming.
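
For example, a quick top-10 of error-producing clients (just a sketch; the
field positions are hypothetical and assume a tab-delimited access log with
the client IP in field 1 and the HTTP status in field 4):

    awk -F'\t' '$4 == 500 {n[$1]++} END {for (ip in n) print n[ip], ip}' log \
      | sort -nr | head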

~~~
wladimir
_Especially if you use something like a tab delimiter, then you typically
don't need to specify the delimiter._

What if the fields contain tabs? For every human-readable format that can
contain arbitrary user input (nearly all of them) you need some form of
escaping (I guess you could do length prefixing in ASCII decimal, but that
wouldn't be pretty either, and it would be incompatible with basic tools).

But by far the biggest problem is the logging of text messages aimed at
humans, not the delimiting. Regular expressions can help in searching logs in
quick-hack jobs, but if you need to parse logs for visualization or reporting,
which is very common in organizations, using them is error-prone. After all,
you rely on English messages of a certain form. The complexity of that could
quickly move from "easy with regexps" to "we need NLP in our log parser!"
(never mind security problems with one field leaking into another due to a
slightly wrong regexp).

The application might change the message to make it more readable for humans,
or even move fields around, and your automated parser breaks. Structured
messages, on the other hand, won't change for those reasons, as the formatting
for humans happens in the back-end.

I get a bit annoyed at the "WTF kid, learn UNIX tools" kind of responses here.
UNIX tools are one way of doing things, not the holy one perfect way. Tool
support is important, but there are also tools available for processing JSON,
XML and streams. Shouldn't you use the best tool for the job, not the one you
happen to know?

(I don't mean that JSON is necessarily the best format in every use case, but
for automated processing any structured format at all trumps arbitrarily
delimited and escaped files. You can easily convert between structured formats
should the need arise.)
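
For instance, if someone downstream really wants flat tab-separated files for
the unix tool chain, a rough jq sketch (assuming line-delimited JSON and
hypothetical field names) produces them on demand:

    jq -r '[.time, .user, .event] | @tsv' log.json > log.tsv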

~~~
burgerbrain
> _"Shouldn't you use the best tool for the job, not the one you happen to
> know?"_

Well, 'best' in this case should be determined by combining a number of
factors. Certainly technological superiority should be weighted quite heavily,
but in certain circumstances the _"Nobody ever got fired for buying IBM"_
effect is also very important.

Of course there is also the _"The best tool for the job is the one you
have/know."_ quip, which I don't generally agree with myself... so I guess
what I'm saying is that your mileage may vary.

(disclaimer: I log with JSON)

~~~
wladimir
But such thinking can block innovation. It's a form of path lock-in. Both UNIX
and Windows "gurus" are guilty of this, of seeing their way as the "one true
way". Just because it's always been that way.

New ideas are not always better, but sometimes they might be. In the longer
run "I'm used to this" (on its own) is not a good argument as there will
always be new people that are not used to your specific blub, and if they can
be more productive or make more reliable/secure systems then eventually you'll
be out of the market.

See also: <https://news.ycombinator.com/item?id=3892410>

~~~
burgerbrain
Oh certainly, I fully agree. I think I'm just saying that I think logging is
going to be one of those things, for better or worse, that most developers
look at and do the mental calculus of _"It's a good idea, but do I want to go
out on a limb here, with_ this _issue?"_ In at least some cases all the
factors added together just won't work out to it being worth the risk/effort.

Basically just the IBM thing. Was IBM always the best choice? No. But even so,
it was often the best choice for the individuals making that call. This is the
sort of thing that you have to recognize and contend with if you want to
introduce change.

------
rachelbythebay
This article feels like it would work just as well with "Protocol Buffers",
"Thrift", "XML", or even maybe "ASN.1". If that's truly the case, maybe the
better thing to say is "please don't (only) log in ASCII", followed by "please
use a format which is hard to get wrong".

JSON scares me a little. Don't you have to worry about escaping a whole bunch
of characters just in case something gets the wrong idea about what you have
in a field? I saw a page not too long ago which listed about a dozen
characters which should be substituted in some manner when used in JSON.

Full disclosure: I got tired of ASCII logging from my web server and wrote
something to stream binary protocol buffers (!) to a file instead.
<http://rachelbythebay.com/w/2012/02/12/progress/>

~~~
lpolovets
Over the last 10 years, I've gradually moved from ASCII to
Thrift/ProtocolBuffers to JSON. Here are some random thoughts:

1) As others said, you should definitely use a library for JSON
reading/writing.

2) JSON is simple and surprisingly fast to read/write. In one large benchmark
(<http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking>), the
Jackson JSON library (for Java) comes in somewhere between Thrift and Protocol
Buffers.

3) JSON feels friendlier than protocol buffers and Thrift because it's human
readable instead of binary.

4) JSON is more convenient than multi-line formats like XML because you can
grep for things easily. For example, if you have {time:015128752,
event:"authentication error", user:"bob smith", details: "..."}, then I can
do the following:

      cat log.json | grep 'user:"bob smith"'

It's hard to do that with XML because even if you find the right user, the
entire log message/object spans multiple lines.

5) JSON is not as compact as binary, but it gets surprisingly close if you
gzip it. All of those repeated descriptive attribute names are great for
humans, but compression algorithms love them as well (a quick size check is
sketched after this list).

6) The format is fairly universal, but if you ever come across a language
without JSON parsing libraries, the format is not so hard that you couldn't
write your own parser (compare that to Thrift, XML, etc)
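
Regarding point 5, the effect is easy to check on your own data (a sketch;
log.json is a placeholder filename):

    wc -c < log.json              # raw size in bytes
    gzip -c log.json | wc -c      # gzipped size; repeated keys compress very well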

~~~
deno
> JSON is more convenient than multi-line formats like XML because you can
> grep for things easily.

Only if the text representation is pretty-printed in a certain way.

> It's hard to do that with XML because even if you find the right user, the
> entire log message/object spans multiple lines.

XML tooling makes grepping obsolete.

        cat log.xml | xpath '//log[@user="bob smith"]'

~~~
matan_a
Just please consider that XPath needs to store the items in a tree structure
first (the XML needs to be parsed and loaded into memory) before it can be
useful. Running that on a large log file would be an interesting performance
experiment.

~~~
deno
Nonsense, XPath works very well with streamed content, and there are even
implementations of XPath engines for FPGA and GPGPU, all of which have very
strict memory limitations.

DOM is completely optional, and you only need it for convenience or to observe
the modifications, e.g. the way browsers make use of it.

Just think how you would compile any XPath expression — it’s very similar to
REGEX, only for structured data.

~~~
empthought
I think you're overstating the capabilities and implementation of common XPath
processors here, and silently implying the use of a _subset_ or _modified
version_ of XPath as specified. Arbitrary XPath will likely require loading
the entire document in the worst case, because it allows navigation to any
portion of the tree.

~~~
deno
Navigation to any portion of the tree does not imply any need to keep the
entire document in memory at once.

In the worst-case scenario an XPath expression will not yield any results
before _traversing_ the entire tree. However, any XPath 1.0 expression (and I
imagine 2.0 is no different in that regard) can be compiled into a
deterministic state machine (DFA), which only needs to keep tabs on how many
elements it has seen and which conditions have been met.

What XPath 1.0 specifically doesn’t allow are arbitrary sub-expressions in
predicates. Those would be problematic in certain conditions.

The only potential performance issue is when running a large number of XPEs
against a single stream, and there are various techniques to remedy that,
including merging the state machines for branching expressions, etc.

~~~
empthought
There's a following:: axis in XPath which lets you look ahead to arbitrary
elements. So the processor needs to load the document into memory, which of
course is not streaming.

~~~
deno
I don’t see how this axis itself is a problem. Could you provide a specific
example?

~~~
empthought
Just /node/node2/following::* in a predicate or whatever. Perhaps you're
conflating XPath-as-used with XPath-as-specified?
<http://msdn.microsoft.com/en-us/library/ms950778.aspx> explains that you
can't support all of XPath (even 1.0) and process things the way that you
describe.

~~~
deno
Right, they completely eliminate any buffering. That’s very strict, but you’re
right. I thought you meant that there’s a subset of XPath that can’t work on
anything but DOM.

~~~
empthought
Yeah, that's all. The reason it's relevant is that XPath processors often
value conformance over expanding the utility of their system. I didn't want
someone going off and using libxslt or xalan expecting it to process things in
a memory-efficient manner.

------
skrebbel
The real takeaway is that log files invariably tend to become interfaces for
something. They often end up being used for monitoring tools, business
intelligence, system diagnostic tools, system tests, and so on. And they're
great for this. But not when they contain sentences like "Opening
conection...", which break half those tools the moment someone fixes the typo.

The log strings became an interface. Avoid this. If it's an interface, it has
to be specced, and it has to allow for backward compatibility, just like any
other interface that crosses component / tool boundaries.

Whether you do the actual data storage with JSON or something else doesn't
matter. It's an implementation detail (though I agree that keeping it not only
machine-readable, but also human-readable, is probably a good thing).

Design the classes that represent log files, and treat them like they're part
of a library API. Don't remove fields. Ideally, use the same classes for
writing (from your main software) and parsing the logs (from all that other
tooling) and include version information in the parser so that the class
interface can be current yet the data can be ancient.

~~~
gbog
That's exactly the opposite of my understanding of logs, and the reason why I
can't agree with the OP.

For me logs are a way to store extensive historical data about what happened,
in the cheapest and simplest way possible. Logs are a "write a lot and read
almost never" kind of storage. For this kind of storage, the simplest way to
do it is tab delimited flat files.

Logs are for debug or "legal" purposes: a client complains he has lost all his
data; your boss comes into the room with fumes steaming out of his ears,
calling you names because you "can't store f __* client's F __* valuables
cleanly". Then, using awk or grep kung fu, you come up ten minutes later with
the exact millisecond when the client clicked on "Yes, I am sure I want to
delete my data", his IP, his session_id, his browser fingerprint and so on. In
case of a security audit, you are required to send 10 years of server logs to
a trusted third party (they will not do anything with it, they just want to
make sure you have logs). You zip the thing and send it by email to them (thus
crashing the security auditor's email server). You have done your duty.

These are what logs are for. Having JSON or any other format inside just makes
them more fragile and less versatile, will mess with line-oriented commands,
and is unnecessary.

If you are using your logs regularly to track some business data about your
product, you are using logs for the wrong purpose and should consider using
something else.

~~~
gizzlon
You just email them? No encryption or anything like that?

Kind of funny that a security audit is the thing that triggers this .. =/

------
sciurus
There's been a lot of noise about logging in the linux ecosystem lately.

There's Project Lumberjack
(<http://bazsi.blogs.balabit.com/2012/02/project-lumberjack-to-improve-linux-logging/>)
to encourage applications to generate structured logs and better
document/integrate tools for working with those logs. The proposed structure
is Common Event Expression (<http://cee.mitre.org/>).

At the last kernel summit, ideas (<http://lwn.net/Articles/492125/>) were
presented on how to make kernel messages more structured.

More radically, there's The Journal (<http://lwn.net/Articles/468049/>), a
proposed replacement for syslog.

~~~
oasisbob
Also, the newest syslog RFC (<http://tools.ietf.org/html/rfc5424>) allows for
structured data to be included with the message.
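
For illustration, an RFC 5424 message with a structured-data element looks
roughly like this (a sketch; the SD-ID, field names and values are made up):

    <34>1 2012-04-30T22:14:15.003Z web01 myapp 2133 ID47 [request@32473 user="bob smith" status="500"] authentication error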

The nice thing about this approach is that you can serialize the received
messages however you'd like: JSON, XML, TSV, whatever.

Support for RFC 5424 in syslog daemons and logging libraries is thin, but will
hopefully improve soon.

------
a3_nm
What if I need, say, to find the 10 IPs that make the most requests? With the
Apache log format, I can write the following in about 15 seconds:

      cut -d ' ' -f1 log | sort | uniq -c | sort -nr | head

Say you need to follow accesses to a particular file? The following quick and
dirty one-liner probably works well enough:

      tail -f log | grep --line-buffered file.pdf

How do you do that with json?

Granted, as soon as your logs stop being a sequence of records (lines) with a
fixed sequence of neatly delimited fields, you will need something more than
text. However, I still don't know of tools for working with JSON from the
command line that are as concise, efficient, flexible and robust as the
standard unix utilities for text.

~~~
keenerd
> How do you do that with json?

Depends on the schema used, but it would probably be something like

        jshon -a -e "ip" -u < log | sort | uniq -c | sort -nr | head
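
The quick-and-dirty tail example works essentially unchanged, assuming one
JSON object per line and a hypothetical "path" field (the exact pattern
depends on how your writer serializes):

        tail -f log | grep --line-buffered '"path":"file.pdf"'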

------
Ixiaus
I dunno, this feels like the "web developers" approach to logging. I can't say
that it wouldn't be cool to be able to parse logs into a structured format,
but honestly, the tools to parse logs are already there and they are very
powerful (gawk + shell pipes + {whatever_unix_tool_you_can_think_of}). If you
don't have programmers who can knock out a quick awk one-liner to process a
logfile for you in any custom way you want, then I could see where this
approach (using JSON) is useful, because then they can use something they are
familiar with instead of something they are not. But really, you should know
how to use the Unix tool chain if you're a programmer.

------
leif
TSV and you're done

smaller

readable, esp. with `column -t <log`

works with awk/cut/join/grep/sort/column/etc./etc./etc.

if you have complicated enough logs that you can't maintain the shell scripts
that parse them, you probably also have enough log data that json's going to
blow up your space and you probably want indexes anyway, so throw it in a real
database (oh hi I work for one of these, log analysis is actually one of our
strong suits)

but others have already commented to this effect

------
delinka
"Alex ... [realizes] that someone added an extra field in each line"

Someone?!? Who's touching the server configuration and why? Unless Alex put a
publicly accessible web interface on his .conf files, this shouldn't be
happening.

Back on topic. The increase in size for logging in JSON could easily be a deal
breaker.

~~~
pbiggar
OT, but I really like the way you expressed this: "could easily be a deal
breaker". Too many comments on HN instead say "This would never work because
of the increase in size" instead. Your comment recognizes the nuance without
getting bogged down in what-ifs and disclaimers. Very well put.

------
jakejake
I've used various log formats over the years, including JSON.

One thing I've done for logging errors or warnings is to log them in RSS
format. I monitor them just like any RSS feed. It's really handy because there
are already tons of ways to read these logs, so we don't have to create
anything.

I wouldn't use this for a debug log because it would probably be unusable if
there was a large volume of logs, but for watching errors it's great.

------
zmj
This idea is as old as Lisp. <http://sites.google.com/site/steveyegge2/the-
emacs-problem>

------
jacques_chester
One of the non-functional requirements of logs is that they should be fast to
write. Marshalling data into a structured format takes longer than spitting
out sprintfs.

If you really need structure for ease of querying, you might as well go all
the way and throw it into a proper data store.

~~~
gorset
Remember we're talking about writing to disk. The time it takes to marshal the
data is pretty much insignificant compared to the time it takes to do the
actual write.

We use JSON on disk heavily in several places where we generate or receive
huge amounts of data. JSON-in-a-file is a pretty good datastore for sequential
data processing because it's so convenient and you can work with the data
using cat, tail or any tool supporting JSON.

~~~
jacques_chester
> The time it takes to marshal the data is pretty much insignificant to the
> time it takes to do the actual write.

And JSON is more verbose.

Basically this is another read/write dichotomy.

When do we pay the price of structuring data? At write time? Or at read time?
I'd rather pay it at read time, as writing may be shaving performance from my
actual primary production system.

~~~
gorset
This is premature optimization. Do you think it's expensive to write a few
brackets, quotes and escape a few characters? We have harder problems to think
about than optimizing for writes in our logging system :-)

How can you trade write cost against read cost without some structure in the
log message? And do you know that sprintf actually costs something too? It's
basically a mini-language which must be parsed and interpreted. You can go
google sprintf+performance to see stories of people finding this out - but
most of the time it doesn't matter, because the cost is insignificant and you
can do what's most convenient for you.

~~~
jacques_chester
My point is that structure is imposed _somewhere_ if you want to query your
data. Either that structure is in the data, or in tools that parse the data.
There is a "conservation of structure", if you like.

Most of the bunfights between proponents of different technologies are really
arguments about where to pay the structural tax. You can pay more at write
time or at read time, on a sliding scale.

For logs, I think the smart option is to pay as little write-time overhead as
possible. Their purpose is to maximally describe an event with minimal
interruption to service. Every strictly unnecessary adornment to structure
takes you further away from that core non-functional goal.

------
frsyuki
We're also using Fluentd as well as our own JSON-based logging libraries.

Fluentd deals with JSON-based logs. JSON is a good human-facing format because
it is human-readable and greppable.

Internally, on the other hand, Fluentd handles logs in MessagePack format.
MessagePack is a serialization format compatible with JSON and can be a more
efficient replacement for it.

I wrote a plugin for Fluentd that sends those structured logs to Librato
Metrics (<https://metrics.librato.com/>), which provides charting and
dashboard features.

With Fluentd, our logs became program-friendly as well as human-friendly.
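
As a minimal sketch of that flow: the fluent-cat utility that ships with
Fluentd reads a JSON record from stdin, tags it, and forwards it to a local
Fluentd agent as MessagePack (this assumes an agent listening on the default
forward port; the tag is hypothetical):

    echo '{"event":"login","user":"bob smith"}' | fluent-cat app.events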

------
dasil003
Loggly supports this, and they provide a good interface for querying the data
as well. We used it for a while as a way to unify a couple GB of daily log
data from our Rails app running on multiple instances. I even wrote a library
that allows you to quickly add arbitrary keys to the request log entry
anywhere in the app.

Unfortunately we had to disable it temporarily, as the Ruby client did not
cope well when latency to the Loggly service increased. It was fine for a
while since we are both on AWS, but one day our site started getting super
slow. It took a while to track down the problem because the Loggly client has
threaded delivery, so a given request would not be delayed. But the problem
was that the next request couldn't be started until the delivery thread
terminated.

Okay, I realize this is not the best architecture. There should be a
completely isolated process pushing the queued logs to Loggly so that the app
never deals with anything but a local logging service. Loggly supports
syslog-ng, but that would be standard logging, not JSON, so I think if we want
to go this route we need to come up with something on our own...

------
Simpletoon
I only need three programs to deal with the anti-ASCII, pro-complexity JSON,
XML, etc. crowd: tr, sed and lex.

All the effort these Javascripters expend putting data into JSON just gets
undone by my custom UNIX-style filters; then I can actually work with the
text.

Are they making life easier? For who? Seems like it's just more work for
everybody, translating text back and forth into myriad formats.

But what can you do?

~~~
deno
Your “plain text” probably has some implicit structure. XML, JSON,
ProtocolBuffers, just make that structure explicit.

Dropping to plain text only to run sed or grep is a classic case of “if you
have a hammer….” XML has a myriad of tools that do make your life easier — you
just need to learn to embrace them.

~~~
Simpletoon
It all starts as a stream. That is the "universal format".

sed was designed to edit streams. A stream can be transformed via stream
editing into any text format, for any downstream consumer. It's line based.
That's the only limitation.

lex can handle multiline "records". There is nothing you cannot do with lex.
But only if you know how to use it. It's usually faster than any scripting
language. Worth learning to use? Your choice. But it is what it is. It works.
It, or some clone of it, was used to build the compiler that someone used to
compile the shared library you're using as part of your special solution for
the format of the month.

If you produce a stream as JSON, that's great. But now we're limited to
consumers that understand JSON.

If you know your consumer wants JSON, then sure use some specialised library.
But that's not what this guy is suggesting. He wants everything in JSON.

Well, not every consumer wants JSON.

This is a case of "I learned [X]. Please everyone use [X]."

None of us want to have to learn every language and every application.

Now consider if [X] is UNIX. For better or worse, it's the foundation on which
most stuff talked about here runs. Perhaps it seems crude, it lacks
sophistication in the eyes of a younger generation. It's a "hammer". But what
can you build without a "hammer"?

In his case, [X] is Javascript. What's the foundation for Javascript? A "web
browser".

Perhaps some people think nothing is possible without a web browser that can
run Javascript.

It's a very narrow view.

~~~
deno
Your plain text is less portable than any structured format. You’re creating
ad-hoc parsers to process your ad-hoc format. There has to be some implicit
structure to this text, otherwise you wouldn’t be able to use lex.

All it does is tie your format to the specific implementation of your parser,
including all the bugs in your custom stack. Your logs are now .doc files,
just in plain text.

> It all starts as a stream. That is the "universal format".

False. It starts as a data structure in the memory of the producing entity.
The most direct or lightweight format would be a direct memory dump of the
process. That would be impractical, so the choice is between a generic
portable structured data format and an ad-hoc serialization format.

Here’s your pipeline:

Structured data (Producer) → Plain text → Structured data (Consumer)

It’s like creating JPEGs of your logs and then running OCR to get the
structured data back. That would be insane, right? But that’s the _exact
analogy,_ just your pipeline is a little lighter.

Now consider the alternative:

Structured data (Producer) → Portable structured data (.xml) → Structured data
(Consumer)

The data in a structured, portable and uniform format like XML can be
leveraged to offer rich and powerful tools, like XPath/XQuery/XSLT, all the
while remaining agnostic to the specific data domain.

It’s just the logical thing to do.

------
rhizome
Except that fixed-field loglines are much faster to process than parsing JSON,
which makes a difference when working with large logs.

~~~
kelnos
Depends on what you value more: CPU time, or your developers' time keeping the
log format and parsers in sync after any change.

~~~
mkross
You could test that the logging and parsing are in step, which then reduces
the developer time to "Doh, red test" and a simple fix.

~~~
kelnos
Why increase your burden even to that? Computers are there to do stupid
repetitive stuff for us so we don't have to. Why design a logging format that
requires even _minimal_ manual intervention to keep automated parsing working?

------
daenz
Logging to mongo (as JSON) has proven useful to us. Makes it easy to slice and
dice the data.
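
For example (a sketch, assuming one JSON object per line in app.log and a
local mongod; the names are placeholders), loading the log is a one-liner:

    mongoimport --db logs --collection events --file app.log

After that, the slicing and dicing is ordinary queries against the events
collection.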

~~~
asuth
Ya, logs are soooo much more valuable when they're in a database that you can
query. We log to MySQL and then correlate logs with users, events, IPs, pages,
etc.

Logs are just as relational as anything else, it took me awhile to realize it
though.

------
joelthelion
I would love to see a JSON based shell, instead of the traditional shells
based on raw strings. Heck, we could have a whole ecosystem of tools built
around JSON or similar semi-structured representations.

------
mmphosis
Log Many things as [my favorite format]. Make My Life Easier by doing the
difficult work.

I would log in a fast compact, but not limited, and heavily documented binary
format at a hardware level with lots of fail-safes. Maybe what I am doing is
more appropriately called creating a journal. [My favorite scheduler] would
very lazily and at opportunistic idle times convert the older non-human
readable binary logs and insert the log data into [my favorite] database as
very query-friendly information.

~~~
chronomex

      I would log in a fast compact, but not limited, and heavily documented binary format
    

Such as ASN.1, perhaps?

------
Hopka
How do you even log as JSON?

Is your entire log file a giant JSON array? That would be challenging for most
parsers I know because they would have to read the entire array into memory
first.

Or do you log one JSON object per line? Then you would get problems as soon as
you have line breaks inside strings and still have to parse until the object
ends in some other line. Also, JSON objects do not have to be single-line to
be valid, so you would in fact be working with some self-defined subset of
JSON.

~~~
drostie
_Is your entire log file a giant JSON array?_

That's not necessarily a bad choice, but it's problematic for the reasons you
describe. Still, since JSON is concatenative, you could indeed store all of
your objects with a comma at the end and then use:

        import json; json.loads("[%s null]" % file_contents)[:-1]

_Then you would get problems as soon as you have line breaks inside strings_

Then you are not logging JSON. JSON does not permit that.

 _Also, JSON objects do not have to be single-line to be valid, so you would
in fact be working with some self-defined subset of JSON._

Yes, and? Working in a subset of JSON which forbids newlines as whitespace --
that's still JSON, and it solves your problem elegantly.

Do... do you have multiple logging programs, logging to the same file, and one
of them wants to insert newlines? Is this a real problem in your dev stack?

~~~
Hopka
It is a real problem in someone's dev stack. I have already written code to
preprocess such log files before feeding chunks into a real JSON parser. It
didn't make my life easier.

I'm not against logging as JSON at all, but as pointed out, you have to use a
subset that makes parsing the logs easy.

------
wolframarnold
I like this idea a lot. Frameworks like Rails come with excellent log
messages, granularity and a pub/sub mechanism. Often this can be lower-hanging
fruit than throwing in a ton of custom instrumentation for some third-party
analytics tool, especially when you're pressed for time.

My question is how fluentd can be hooked into Rails so that Rails' native
messages use it, and how it works in the Heroku infrastructure.

------
kablamo
I've been thinking about this recently as well. I wrote a simple JSON logger
for Perl recently. It will probably be on CPAN this weekend. Until then you
can see it on prepan and github.

prepan <http://prepan.org/module/3Yz7PYrBSd>

github <https://github.com/kablamo/Log-JSON>

------
thezilch
Or provide unit tests for said log parser and require (or don't) all tests to
pass pre-commit. A JSON struct isn't going to stop your colleague from
removing or renaming a field, removing the logging altogether, or changing the
format himself, if your company is really set up for allowing colleagues to so
easily break your code -- not that anyone's perfect.

------
anonymoushn
Is it worth switching to JSON to avoid having to edit your bash 1-liner when
you change the format of the log?

------
sauravc
We've been logging all of our analytics data in JSON for years now.

------
majmun
I tried this; it was no good (because of escaping of special characters, and
parsing performance).

Then I switched to newline-delimited records, escaping \n and \r, and all my
problems were solved (for now).

------
webjunkie
Okay, and as soon as I switch to JSON, I have not just 5 million referrers
logged per day, I also have 5 million times the word "referrer" in my log.
Nice.

~~~
deno
Trivially solved by compression. In those five million referrers you’re
probably also repeating “Firefox” and similar strings over and over, so
compressing logs is already standard practice.
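
For example (a sketch; the file names are placeholders), rotated logs get
gzipped and can still be searched directly:

    gzip access.log.1
    zgrep '"user":"bob smith"' access.log.1.gz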

~~~
webjunkie
Compressing logs that are being written to? Never heard of that...

------
wooptoo
Why not go even further and store them in MongoDB?

