It's beyond me how he doesn't understand that text logs are a universal format, easily accessible, that can be instantly turned into whatever binary format you desire with a highly efficient insertion process (Splunk is just one of the tools that does a great job of this).
Here is the thing he doesn't seem to understand - all of us who are sysadmins absolutely understand the value of placing complex and large log files into a database so that we can query them efficiently. We also understand why having multi-terabyte text log files is not useful.
But what we find totally unacceptable is log files being shoved into binary repositories as the primary storage location. Because, you know what, everyone has their own idea of what that primary storage location should be, and those ideas are mostly incompatible with each other.
The nice thing about text - for the last 40 years it's been universally readable, and will be for the next 40 years. Many of these binary repositories will be unreadable within a short period, and will be immediately unreadable to those people who don't know the magic tool to open them.
Uh, I don't know what world you live in but I'd like the address because mine sucks in comparison.
Text logs are definitely not a "universal format". Easily accessible, sure. Human readable most of the time? Okay. Universal? Ten times nope.
I'll give you an example: uwsgi logs don't even have timestamps, and contain whatever crap the program's stdout outputs, so you often end up with three different types of your "universal format" in there. I'm not giving this example because it's contrived, but because I was dealing with it the very moment I read your comment.
But at least you have a fighting chance. What if that exact same data was dumped into a binary file, that you did not know how to decode?
Originally, you had a problem - the data wasn't formatted in a manner that you could parse cleanly.
Now, you have a new problem - not only is the data not formatted properly, it's now in some opaque binary file.
Saying that there are poorly formatted text files isn't a hit against text files, it's a hit against poor formatting. The exact same problem exists if the file is in binary form, and not formatted properly.
> a binary file, that you did not know how to decode
I guess nobody ever advocated putting stuff in a binary file with an undefined format. Databases, syslog-ng, elasticsearch and the systemd journal all have a defined format with plenty of tools to access the data in a more structured way (eg. treating dates as dates and matching on ranges).
I agree the issue at hand is not just binary vs. plain text, it's more "how much you want to structure your data".
The classic syslog format is very loosely defined, with every application defining its own dialect, each with its own way to separate fields and handle escaping. To fix that you could store the log data as JSON, as many online services are doing. But once you have JSON, grep is no longer enough to properly handle the data, even if it's still plain text. Now that you have both a quite verbose format on disk and the need for custom tools, why not store the log as binary-encoded JSON (e.g. something like JSONB in PostgreSQL)? Or make it even more efficient with a format optimized for the specific usage? Add some indexes and you get more or less what databases, ElasticSearch and the journal do.
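For a concrete sketch of what that shift looks like (the field names below are made up purely for illustration), a JSON-per-line entry and the jq query to slice it might be:
{"time":"2015-12-13T01:34:55Z","level":"error","status":504,"msg":"upstream timed out"}
# grep still works for crude substring matches, but jq understands the structure:
jq -r 'select(.level == "error") | [.time, .status, .msg] | @tsv' app.log
Still plain text on disk, but already past the point where grep alone is comfortable, which is the trade-off I'm describing.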
Also keep in mind that most logs right now get rotated and compressed with gzip; I doubt that the above binary formats are less resilient to errors than a gzip stream.
That's what the grandparent was explaining though. We have near-ubiquitous tools for dealing with plaintext files. Every Linux admin knows them and uses them in many more situations than just log files. They can be scripted and piped, and an admin worth his salt could easily find the info he needs with them.
A binary file from whatever logging system, OTOH, is effectively proprietary. Even if the logging system provides you with tools to work on them, you have to 1) know that it's a log file for that logging system, and 2) be familiar enough with the tools in order to work with it.
And the specs will be gone in 40 years. While ASCII will stick around.
Why would they be gone? You realize ASCII is a 'spec' too?
If a binary format has an open specification, it's as future proof as ASCII. ASCII's durability is due to a clear and open specification that's easily implemented. Not some magic sauce that makes it instantly human readable.
That text you see? It's not what's actually in the file. That's just 1's and 0's like every other format. There's literally no difference between ASCII and any other "binary" format.
Does that really matter? Log files are often unimportant when they get over a month or two old, what is it in your log files that has to be kept for 40 years?
Longevity of log files hardly seems like a reason to pick an otherwise inferior format.
It is not about reading 40-year-old logs, but rather reading logs from today generated by a 40-year-old system.
For example, many nuclear power plants in the West were built 40 years ago. Amongst the myriad of sensors and devices in a power plant, I think most of them output ASCII logs. They are still readable today. (The same can be said about avionics, space probes, etc.)
Now imagine yourself 40 years from now, trying to fix or reverse engineer a very legacy system: you will have to recompile a journalctl from 40 years ago before being able to read anything.
There's a good chance that you'd be reading EBCDIC logs. :)
40 years from now, you will probably be able to invoke journalctl on the system and parse the dumped output as plain text. Or call gunzip on the compressed logs, $DEITY knows if we will still be using gzip by then. And if the system does not boot, you won't be able to connect the peripherals anywhere else... :)
There's no tool out there that generates log files it can't itself read. So there's not going to be any "oh gee I have these files being generated and nothing can read them" situation.
However, there are just about zero systems out there that generate text logs they can themselves read. Text logs are write-only for most logging systems, while all binary logs I know of are read+write.
Stepping back though this entire argument is absurd. Thinking about "whatever will those people do 40 years from now with the tools of today" is fairly braindead once you understand that the quality of the tools will affect their longevity. So if the logging system becomes an actual, factual problem over time, the tools will die off by naturally-artificial selection.
I have already worked on very basic embedded systems where your only way of getting logs is connecting to the device using a serial line, and after fiddling a bit with the baud rate, you can get some readable output.
In this case, you can't really do anything from the device itself.
Arguably, this is not the use case for a binary logger, but I was originally addressing the "40-year-old logs" argument, which does exist in the real world.
> There's no tool out there that generates log files it cant itself read.
There are plenty of tools that don't read their logs - more precisely, computer units where you don't log in, units that you don't operate on console. Embedded devices that perform some function and also keep some log, but which cannot be used for reading that log. You will need to read that log using something else. Plain text (ASCII, and now ISO Latin and UTF-8) is a fairly stable format for everything, and will be for the next 50 years.
People usually read log files because something went wrong, like a system crash, why do you assume the OS that generated the log file will be readily available?
I've been producing a few services recently which output a chunk of JSON for each log message followed by a newline.
I think it actually solves most of the problems text logs have that binary don't (inability to easily present structured data, etc.) yet keeps the advantages of a text log (human readable, resistant to file corruption, future-proof).
Speaking for myself - multiline .json output is problematic, as most of the parsing tools work best when the data is on a single line, and it's a cognitive struggle to deal with multi-line output, even if you are clever with your tools. I usually end up having to write a json parser in python to get the data into a format that I can manipulate. (Thankfully, python does 95% of the work for you when reading a json file)
But - here is the thing: even though the .json format isn't convenient for me, I can, with about 20-30 minutes of effort, write a parser that can get the data into a convenient format, because it started out as a text file.
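(For what it's worth, when the multi-line output is a stream of pretty-printed JSON objects with nothing else mixed in, which won't always be the case, jq can do the re-compacting without a hand-rolled parser, something like:
# assuming the file is nothing but pretty-printed JSON objects back to back,
# re-emit each one compacted onto a single line, then the usual line tools apply
jq -c . multiline.json > oneline.jsonl
grep -c '"level":"error"' oneline.jsonl
The "level" field and the file names are hypothetical, just to show the shape of it.)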
If you're just grepping for a single word or phrase it really isn't much different to grepping regular logs.
If you're extracting structured data (e.g. getting the time stamp and a status code), it's actually easier than screwing around with awk and figuring out which exact column the time stamp finishes on and hoping that server #7 doesn't put it on a different column.
Well - to be clear, if I run into a log file with its data on a single line, 95% of the time it will take < 30 seconds to extract the data I need. If I run into a multi-line json file, trying to re-integrate all the data back into a single record will take me on the order of 30 minutes. (Mostly because I usually only do it once or twice a year, so I typically start from first principles each time. Multi-line .json log files are very rare.)
95% of the time I just give up on the multi-line .json files - unless it's really, really critical, I probably don't want to spend 30 minutes writing code to re-assemble the data.
Text log files, wherever possible, should capture their data on a single line. If they need to go multi-line, then having a transaction ID that is common among those lines makes life easier.
.json files (or xml files), are an interesting halfway point between pure text, and pure binary. They aren't easily parseable without tools, but, if you have to, you can always write your own tools to parse them.
Maybe I am misunderstanding, but it sounds like you are encountering bad json log file practices because json entries are spanning multiple lines. Which implies they are being printed in non-compact form, aka prettified. That's a problem in the pure text world too, and it hurts worse when it happens there. It's kind of an apples-to-oranges comparison.
Json log files should ideally be printed in compact form (which will never have raw newlines) so each entry only takes one line, with entries separated by a raw \n.
If that practice is followed, each line will represent a complete json object. So you can then pipe the file through jq, Perl, python etc. one line at a time.
Printing prettified json to a log should be avoided because it then requires reconstituting individual events syntactically before being able to grep for an event. If pretty output is desired, pipe it through a prettifier.
Config files are a different story; those should most definitely be pretty printed, with one atom per line, for nice diffability and the best read- and editability json can offer. Sadly, json for config files is a bad idea if you want humans to enjoy editing them by hand. In that case, using yml is the best option I have encountered (ansible).
I have no problem with json output in log files, but I would greatly prefer it be constrained to the message portion of a logline. At a minimum I generally want three things per line, a timestamp (in ISO 8601 or something close), a message type (info, warning, error, etc) or log entry source, and the message itself. I don't want to be looking into the JSON structure itself for a timestamp, especially when the field encoding the timestamp may be called something slightly different based on what generated the log...
In that respect, whether the message is JSON, or YAML, or XML doesn't matter, that can easily be worked on later, but the first thing I want to be able to do is filter by time and type.
>I don't want to be looking into the JSON structure itself for a timestamp
A) JSON parsers are relatively common and reliable.
B) The timestamp would be human readable even without the parser.
>especially when the field encoding the timestamp may be called something slightly different based on what generated the log...
I often come across logs that put timestamps in different places on the line and encode them differently (or don't output a timestamp at all, sometimes). This is no different to having to deal with a differently named JSON property.
My point is really around having the date be in a well defined place that isn't necessarily defined by the application that's logging. If the log entry date is at the beginning of the line, there's no ambiguity as to whether it's the log entry date or some other date being logged, and it also doesn't require parsing the JSON at all to filter by the date. If it's not at some very standard location that's easy to filter by (a possibly changing JSON property does not qualify), then it's hard to know you are filtering on the right data, and it may also require a transform before filtering. JSON parsers are fast. Multi-GB log files will still cause some extra overhead and slow the operation down, so it's best to reduce the working set before parsing the JSON.
>My point is really around having the date be in a well defined place that isn't necessarily defined by the application that's logging. If the log entry date is at the beginning of the line, there's no ambiguity as to whether it's the log entry date or some other date being logged, and it also doesn't require parsing the JSON at all to filter by the date.
Take this example:
1-1-15 1:1:1 Info Log message A
12-13-15 12:34:55 Debug Log message B
12-13-15 1:34:55 Error log message C
12-13-15 1:34:55Error log message D
[12-13-15 1:34:55]Error log message E
It doesn't require parsing JSON to get the date, you're right about that. It's harder than parsing JSON, though.
Note two replies of mine prior where I state ISO 8601 or similar. Also note where I said the json would be constrained to the message portion of the entry. Preferably there's a logging mechanism that takes care of that for you, so you can't screw up the timestamp and type portions of the entry. In that case, your entries become:
2015-01-01 01:01:01 Info Log message A
2015-12-13 12:34:55 Debug Log message B
2015-12-13 01:34:55 Error log message C
2015-12-13 13:34:55 Error log message D # let's assume that was 1 PM data for the sake of the example
2015-12-13 01:34:55 Error log message E
Getting the date is trivial. Getting the type is also trivial. Give a static field size to type and it's even more so. The point is, you abstract the message from the rest of it, so the message can't screw up the metadata of the entry, and log whatever you want for the actual message (xml, json, plain text, whatever, just no raw newlines). This is what we have today with syslog, sans newline replacement and a slightly different date format (but still unambiguous). It works. It's useful. It's VERY easy to filter by type or date. You can take the first X chars and split on space/whitespace if you need to. You can log a message of a few megabytes, and if there are no raw newlines there are efficient utilities to ignore that until you have what you want (/bin/cut).
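With entries shaped like the ones above, the extraction really is one-liner territory, e.g. (the file name is hypothetical):
awk '$3 == "Error"' app.log                      # filter by type (third whitespace-separated field)
awk '$1" "$2 >= "2015-12-13 01:00:00"' app.log   # filter by time; zero-padded dates compare lexically
cut -c1-19 app.log                               # just the fixed-width timestamp prefix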
Not to disagree with you in any way, but `jq` is something you might look to add to your toolbox. As much JSON as we see these days, it's a good tool to have.
Like you, I also deal with the kinda weird uwsgi logs. I feel like "universal format" probably didn't mean the format of all the lines in all the logs is the same - though your definition is probably more accurate.
Despite that, I can be pretty sure when I walk in to a foreign system there will be nginx logs, just where I expect them, almost certainly in the format I'm used to. And even if the format differs, it's not much of a problem. Binary logs, big problem.
Sure, on a site that uses ElasticSearch for its logs I would have no idea where to look. I'd be more at ease with SQL, but first you need to locate the DB, figure out the schema, and get the SQL dialect right.
That said, I'd be far more at ease writing a SQL query to extract analytics from logs than cooking up some regexes and doing complex stuff with awk.
And I find the --since/--until parameters to journalctl far easier than matching dates by regex. Or even the --boot parameter to restrict logs to a specific boot, which would probably be doable with awk but definitely not as trivial.
I think that binary logs give you some compelling features, without taking away any: you can always just dump the logs on stdout and use grep as much as you want. :)
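For example, something like this works today (these journalctl flags exist; the unit name and patterns are just placeholders):
# dump to plain text and fall back on the usual tools...
journalctl --no-pager -o short-iso | grep -i 'segfault'
# ...or lean on the structured queries when they help
journalctl -u nginx.service --since '2015-12-13 01:00' --until '2015-12-13 02:00'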
"Text logs" is not a format at all, so it can't really be a universal format, either. But if there were such a thing as a "universal" format, it would probably by definition encompass everything in time and space. You think timestamps are a problem? Just wait until your logs get trapped in a quantum state. Talk about a heisenbug...
"But what we find totally unacceptable is log files being shoved into binary repositories as the primary storage location"
The way I read his article, he's not really opposed to additionally keeping your logs around as text.
But you make a good point about using text as the primary storage location, since you can always easily feed it to some binary system for further analysis.
Would the best practice then be to keep your logs around as (compressed) text, but additionally feed it to your log analysis system of choice for greater querying capabilities?
Exactly. And I think that's what every shop that has discovered Splunk (or other such tools) has started doing. Sysadmins love log data in a queryable format in a database. I'm the hugest advocate of this. I have some queries that took greater than 30 minutes when coming from modest text files, that can be performed in under 50 msec when in a database.
But don't cripple me by shoving your primary log files into binary format so I can't quickly pull data out of them with awk/grep/sed when I need to quickly diagnose a local issue.
Agreed. Logs are for when everything and anything is broken. They aren't supposed to be pretty or highly functional, they are just meant as a starting point for gathering data.
Our product stores all the logs raw in flat files on the file system; we don't use databases for keeping the logs in. This allows you to scale massively (the ingestion limit is that of the correlation engine and disk bandwidth). You then just need an efficient search crawler and use of metadata so search performance is good too.
The issue is, if you ever need to pull the logs for court and you have messed with them (i.e. normalized them and stuffed them into a DB), then your chain of custody is broken.
Best of both worlds means parsed-out normalisation, so I don't have to remember that Juniper calls source ip srcIP and Cisco SourceIP, but with the original logs under the covers for grepping if you need to.
Cool, so which standard binary log storage format should we all switch to?
Should I submit patches to jawstats so that it'll support google-log-format 1.0 beta, or the newer Amazon Cloud Storage 5 format? Or both? Or just go with the older Microsoft Log Storage Format? Or wait until Gruber releases Fireball Format? Has he decided yet whether to store dates as little-endian Unix 64 bit int timestamps, or is he still thinking about going with the Visual FoxPro date format, y'know, where the first 4 bytes are a 32-bit little-endian integer representation of the Julian date (so Oct. 15, 1582 = 2299161) and the last 4 bytes are the little-endian integer time of day represented as milliseconds since midnight? (True story, I had to figure that one out once. Without documentation.)
Should I write a new plugin for Sublime Text to handle the binary log formats? Or write something that will read the binary storage format and spit out text? Or is that too inefficient? Or should I give up on reading logs in a text form at all and write a GUI for it (maybe in Visual Basic)?
Do you know when I should expect suexec to start writing the same binary log format as Apache, or should I give up waiting on that and just write a daemon to read the suexec binary logs and translate them to the Apache binary logs?
Should I take the time to write a natural language parsing search engine for my custom binary log format? Do you think that's worth the time investment? I would really like to be able to search for common misspellings when users ask about a missing email, you know, like "/[^\s]+@domain.com/" does now.
I look forward to your guidance. I've been eagerly awaiting the day that I can have an urgent situation on my hands and I can dig through server logs with all of the ease and convenience of the Windows system logs.
The system should provide a standard API for writing and reading logs. The precise format of the underlying log files is thus rather unimportant at this level of abstraction. Other than the logging subsystem and recovery tools, there's no need for any software to be accessing such log files directly (outside of the API functions). This is how Windows has done it for years.
Even if you can manage one per OS, it's not good enough. Have you ever worked in a non-monoculture or dealt with recovery of a system severely damaged by malice or accident?
I doubt my Linux (including webOS & Android), FreeBSD, and OS X boxes are going to settle on a single binary format in the next couple of decades or even a single API & toolset. In your brave new world the very first thing I'm going to need to do if I have to combine logs across them is to extract data from at least three formats and the most convenient format is often going to be text - i.e. right back where we started, but with extra work for each OS. More likely you'll get a mix of things using the system APIs, custom binary formats, custom text formats, and syslog. Adding more steps to get at the same data doesn't help.
More importantly, binary logs are unreliable when you're dealing with a system that's completely trashed. You can often get usable text logs off a disk that's throwing I/O errors every few dozen bytes or even from a corrupted raw disk image. They may not be cryptographically "sealed", but I'd rather have them than an error message about the binary format being corrupt. That should be an implementation detail, but I haven't seen much interest from the binary logs camp in making the file formats resilient.
You missed the joke at the end where he correctly pointed out that Windows' logging is a total joke, and that discovering information from Windows logs is essentially impossible unless the tool writer specifically predicted your use case.
And that's the nub of it: text logs are for when you may have many varied, complex reader use-cases, and you don't understand all those cases well enough yet to lock them down forever, and you have a thousand excellent tools at your disposal that you would like to be able to continue to use.
Recent log spelunking for me included `cat log.? | grep fail | sed 's/^.worker_id$//g' | awk '{ print $5, $4 }' | sort -n -r | sed 30q`.
There's no analogue in any binary logging system I've ever found.
It seems to me that a simple transitional tool for a binary logging system would be for the implementer of the binary logging system to also include a tool that consumed a binary log file on stdin and produced a stream on stdout in one (or more, selecting which by command line arguments) common text log formats.
That lets you develop an ecosystem of supporting tools that take advantage of any strengths of the binary format, while still allowing the freedom of using the (initially, at least, probably far more capable) set of tools available for the text formats.
If there is some benefit (not necessarily a net benefit for all users at first; benefit, after all, is something that varies from user to user, but it can be significant for some subset of users), the point is to mitigate the cost of moving away from a native text format and to increase the number of users for whom there is initially a net benefit. That in turn increases the initial use of the binary format and the effort likely devoted to building auxiliary tools which leverage it to some advantage, increasing the speed at which the net benefit of the format grows for a wider range of users.
This may or may not ever make it a net benefit for every user, but that's okay. There's a whole lot of space between "this technology is the best choice for everyone" and "this technology is the best choice for no one".
This isn't really true as the Windows event logs contain text as well as the other structured data, which you can search for using tools on the system. For example to search for some specific text in the system log using Powershell:
Get-EventLog -LogName System | Where {$_.Message -Match "something"}
To process text as fields, as with awk, one would use the Split method (at least to start off with):
Get-EventLog -Log System | Where {$_.Message -Match "something"} | %{ $_.Message.Split()[5,4] }
But as message text is often parameterised, it may be easier to take advantage of this data to get what you need. For instance, this command would extract the latest machine sleep and wake times from the system log, and calculate the duration:
Most binary formats contain text; that isn't what distinguishes them from text formats.
One of the objections though is that with binary formats you're limited to the capabilities of the tools that have been built to handle that particular format, which you're illustrating nicely. In a binary format world, I would have to know the capabilities and limitation of dozens, maybe hundreds of different tools for extracting useful information from logs, instead of the small handful of tools I use to do the same job now, which can be applied to any log file formatted as plain text.
And that's assuming that all these other tools will be as powerful as Powershell, which isn't a bet I'd want to make.
madhouse has some fair points about the limitations of text logs, but "everything should be stored in binary formats" is not a great idea. Actually, "a terrifying new hell" is probably closer to how I feel about it.
If you want to stick to a text-only workflow, rather than taking advantage of the structured data features, then you only need a tool that converts the binary log format to your preferred text format. Which isn't too arduous. In systemd that would be journalctl; in Windows, anything that can use the event log API, such as Powershell or many other utilities.
The examples I posted above were just to show the equivalent capabilities in Powershell but really it's all flexible enough to use whatever you like.
Binary logs may be fine for you, but don't force it on us!
This is really the important point here. For small systems, grep works fine. The number of people administering small systems is much greater than the number of people administering large systems. The systemd controversy has caused people to fear that change they don't want will be imposed on them and their objections insultingly dismissed: a consequence of incredibly bad social "change management" by its proponents.
They are therefore deploying pre-emptive rhetorical covering fire against the day when greppable logs will be removed from the popular Linux distributions. Plain text is the lingua franca; binary formats bind you to their tools with a particular set of design choices, bugs and disadvantages. My adhoc log grepping workflow has a different set of bugs and disadvantages, but they're mine.
That's really the key for me. My go-to example is searching for IP numbers across different logs. If I have just one machine, and I want to find an IP in the SSH, web and mail logs, I shouldn't have to use multiple tools for getting that data.
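With plain text it's literally one command, e.g. (the paths are just the usual defaults, adjust as needed):
grep -F '192.0.2.15' /var/log/auth.log /var/log/mail.log /var/log/nginx/access.log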
Logstash, Splunk and other tools store stuff in binary, as he writes, and that's perfectly valid, the only solution in fact. But I don't want to be forced to run a centralized logging server if I have just the one or two servers.
If it's okay to claim that binary logging is the only way to go, because you have hundreds of servers, it's also okay to claim that text files are the only solution, because I just have one server.
Finally, aren't those binary logs (those that come from individual services) going to be transformed into text when I transmit them to something like Splunk, only to be transformed back to some internal binary format when received? It seems we could save a transformation in that process.
Yes, which means that if, say, systemd logs were to be shipped to his ElasticSearch instance, he needs to configure Journald to log to text files first, and then what's the point of having the binary format?
Yes, ElasticSearch is storing data in binary, and that's fine, but you're not going to ship the raw Systemd binary log to ElasticSearch, nor any other binary logs for that matter.
In fact, in the examples he provides both sources are plain text. Syslog-ng and Apache are plain-text logs. He then transfers them to ElasticSearch, where they're stored in binary, but that's not what anyone is complaining about. The original source should be text; what you choose to do afterwards is your business.
Oddly enough, even for large (>=1e5 physical machines) systems, grep works fine. Better yet, if the logs are important, you're shunting them off for some sort of longer-term storage for post-processing and indexing _anyway_, irrespective of the underlying disk format. Some folks continue to use plain text even then, just with some distributed systems magic wrapped around the traditional Unix tools.
(If you're shunting _all_ of your log data off at that scale, you're crazy, and you'll melt your switches if you aren't careful.)
The name of the game is to think of the problems that you're solving and how they relate to the business bottom line. No sooner, no later. Additionally, what's most troubling is that we've turned this exercise into an emotional one, not one with any sort of scientific-oriented perspective.
I can personally say with conviction that I'd like to sit down and actually collect data on, e.g., how many instructions it takes to store logs to disk in plain text versus a binary format, how many it takes to retrieve logs from disk in both situations, and how much search latency I incur when trying to retrieve said logs from disk in the same. At scale, which is where most of my attention lies these days, that's the kind of thing that matters because those effects get amplified automatically—often to operators' and capacity planners' horrors—by the number of machines you have.
If you're dealing with smaller systems, it won't matter as much, but at that point, you're probably dealing with the other side of this, which is having information on how many requests you get for historical log data and what sort of criteria were used in that search. If you're getting requests less frequently than, say, once per quarter, it likely wouldn't be worth your time to invest in what Mr. Nagy is evangelizing.
tl;dr: Continue using your ad hoc grep-fu, but be mindful of how much time it takes you to get the data you're looking for. That alone will be your decision criterion for adopting something like this.
grep definitely breaks down on large systems. I have one environment with approx 5 million nodes - (1e6), and the only way to coherently manage the log updates from them is in binary format.
But even still - I like to have the text files as journals of original entry - so I can occasionally do a tail -f incoming.log | egrep -i "somedevice".
And having the original files in text format is zero impediment to getting them into handy binary database form.
I hate arguing semantics, but 1e6 is not just large but very large indeed. (:
That said, I'd be curious to know some more of the details of that system actually! If you're aggregating all of those devices together, using something binary in that context definitely makes sense. In fact, if I were in your shoes and tasked with designing some means of solving that problem, I would probably use something like protobuf or capnp to emit those messages since they're well-known and well-understood serialization mechanisms.
Now, that's the integration and aggregation side of this exercise.
On a local node-by-node basis, though, I absolutely agree; having the raw text as journals of original entry for inspection in real time with `tail -f` (or, if you're using multilog, `tail -F`…) would still be incredibly useful.
Going back to Mr. Nagy's article, the space of problems that `tail -f` solves is barely overlapped by the space of problems solved by aggregation. I think he's conflated the two spaces in his article here (and especially in the one previous) whereby he's applied a one-size-fits-all solution to both where it demonstrably does not fit all.
The remote nodes all log to central DNS servers and trap servers. The DNS servers have a nice update.log file that provides their IP address information, and some nice text configs. The trap data goes into a binary file (a database actually) and requires analysis through a web interface.
As a result, the DNS updates are used by me approximately 20x more often than the trap data when doing diagnostics, even though, in theory, the trap data is far richer and, of course, has the 15 mandatory fields that are functions of the binary logging. (Time, Date, Event ID, Trap Type, etc, etc...)
Memories of supporting subscriber CPEs and having to go through Drum to analyze the data coming out of logged SNMP traps/notifications are flashing back. Thanks for that. (:
But, yeah, assuming that the nodes in discussion here are not amd64 machines but are instead subscriber CPEs, that's a totally workable (and, frankly, agreeable) solution.
There are a lot of hobbyists, a vast number of people with a Linux box in the corner of the office or a few cloud instances, a smaller number of people running IT for multinationals and one or two people who have whole datacenters to themselves. The larger the system, the lower the computer/human ratio.
I would tend to agree with the OP but with a caveat - most of the people who administer systems work on small systems, while most people whose full-time job is administration work on large systems. Basically there are an awful lot of people in the world whose job description includes part-time system administration.
If I read it correctly there are about 250 million active sites (roughly). It seems unlikely that they are all massive corporate sites.
As an aside, the idea that systemd is a good thing is hilarious to me at the least because it is so brash about making an important change to a huge chunk of the system. Yes the bugs will eventually get ironed out, but in the meantime? Count me out! I have work to do and am not interested in being a free tester for Redhat on my live systems.
The link I included states that those are unique hostnames. Perhaps they are including subdomains on the same ip address, but you might note that rather than quoting the 1 billion sites, I reduced that by their estimated 25% being actually active. Additionally they state that there are on average 3 users per site in 2014. Maybe that doesn't mean anything, but as a rough estimate that all implies far more small sites than large ones.
You don't need evidence for the obvious. There are a few million personal desktop pcs with linux on them, then there are single servers used by exactly one person. Count that against the people working as a professional sysadmin on a big system.
For sure the storage format should not hinder you from using grep if you want. Even with systemd you can pipe journalctl's output and use the same old regexes as its default behaviour is to be a glorified `cat` (but being able to use the --since and --until flags instead of matching date ranges by regexes makes it much better than `cat` for me).
Take this philosophy to an extreme and you end up with a dedicated data format and tooling/APIs to access the data for every subsystem, not just logging. Essentially, this is Windows.
The downside to this is that now you don't have a set of global tools which can easily operate across these separate datasets without writing code against an API. I hear PowerShell tackles this; I don't know how well. The general principle though harms velocity at just getting something simple done, to the benefit of being able to do extremely complex things more easily. See Event Viewer for a good example of this.
Logs don't exist in isolation. I want to use generally global tooling to access and manipulate everything. I don't want to have to write (non-shell) code, recall a logging-specific API or to have to take the extra step of converting my logs back to the text domain in order to manipulate data from them against text files I have exported from elsewhere for a one-off job. An example might be if I have a bunch of mbox files and need to process them against log files that have message IDs in them. I could have an API to read the emails, and an API to read the logs, or I could just use textutils because I know an exact, validating regexp is not necessary and log format injection would have no consequence in this particular task.
I do see the benefits of having logs be better structured data, but I also see downsides of taking plain text logs away. Claiming that there are no downsides, and therefore no trade-off to be made, is futile. It's like playing whack-a-mole, because nobody is capable of covering every single use case.
Honestly - I agree about the ELK stack side - piping all your logs into ES / Logstash is a great idea. (Or Splunk / Greylog / Logentries)
If you run any sort of distributed system, this is vital. And while that counts as binary logs, I would argue that on the local boxes it should stay text.
I would agree, if you are running any sort of complex queries on your data - go to logstash, and do it there - it's much nicer than regexes.
If, on the other hand, you just want to see how a development environment is getting on, or to troubleshoot a known bad component, tail'ing to | grep (or just tail'ing, depending on the verbosity of your logs) is fine.
I don't have to remember some weird incantation to see the local logs, worry about corruption etc.
One problem I will point out with the setup described is that syslog-ng can be blocking. If the user is disconnected from the central logstash, and their local one dies, as soon as the FIFO queue in syslog-ng fills, good luck writing to /dev/log, which means things like 'sudo' and 'login' have .... issues.
Instead, if you have text files being written out, and something like beaver collecting them and sending them to logstash, you have the best of both worlds.
Windows has had binary logging forever. Is windows administration some wonderland of awesome capability for getting intelligence out of logs? Hell no.
For administering Unix like systems, the ability to use a variety of tools to process streams of text is an advantage and valuable capability.
That said, your needs do change when you're talking about managing 10 vs 10,000 vs 100,000 hosts. I think what you're really seeing here is a movement to "industrialize" the operations of these systems and push capabilities from paid management tools into the OS.
I think that the largest problem with Event Log is overreliance on structure. Often you have one particular log record that you know is the problem, but no idea what it means, because you have some generic event code and a bunch of meaningless structured data.
Freeform text logs usually contain more detail as to what exactly happened.
Grepping logs is terrible. Reverse engineering a binary format so you can diagnose why you are down/crashing/losing data is far worse. Logs should be handled as text until they reach their long term storage... then whatever helps analyze and query is fine...
Yeah, in the presence of adequate tooling you don't need to grep logs. But how much more effort is required to use those tool-friendly logging systems? Where is your god when the tool fails?
For me the main reason to access plaintext logs is that they seldom fail, and they are simple. They are a bore to analyse, but they CAN be analysed.
Anyway, this discussion only makes sense if the task at hand involves heavy log analysis, don't complicate what is simple when it isn't needed.
As for the razor analogy, you're right, however I wouldn't change my beard to be "razor compatible only". In the software world I'd say it is still not uncommon to find yourself "stranded in a desert island".
Oh jeez. Yes there are better and more performant tools for parsing optimised binary databases; nobody disputes that. And yes, tools like Splunk are more user friendly than grep; nobody disputes that either. But to advocate a binary-only system for logs is short-sighted, because logs are the go-to when everything else fails and thus need to be readable when every other tool dies. There's quite a few scenarios that could cause this too:
* log file corruption - text parsing would still work,
* tooling gets deleted - there's a million ways you can still render plain text even when you've lost half your POSIX/GNU userland,
* network connection problems, breaking push to a centralised database - local text copies would still be readable.
In his previous blog post he commented that there's no point running both a local text version and a binary version, but since the entirety of his rant is really about tooling rather than log file format, I'm yet to see a convincing argument against running the two paradigms in parallel.
The ease of recovering data from a corrupted log file depends on whether the logged events have been written as sequential records. This is true for text-based logs (the record delimiter being a newline), and is also true of the most popular binary (i.e. structured) log formats, namely Windows event logs and systemd's journals. Probably not if you're storing them in a more general purpose database though.
So this really is dependent on the file format of your log data, rather than an inherent difference between text and binary logging.
The difference is that a general purpose database typically organises data by fixed-size pages, so new data could be anywhere in the file, as there is no guarantee of page ordering with regard to inserts. Whereas a specialised file format for logging would add new records at the end of the file (or in a circular fashion, depending on the design), but will have features similar to a database, like a defined schema and some form of indexing. This is true of systemd journals and Windows event logs anyway.
You can do the same kind of indexing with text files too, eg you see this with dictionary and thesaurus databases.
Thus if you're going to sacrifice the "read anyway" ability of a log file, then you really need to go for a fully optimised database to take full advantage of a binary format - rather than this half-and-half approach that has none of the real benefits of either but all of the drawbacks of both.
> If 'tooling gets deleted' is a problem you probably have much bigger concerns than log files.
You do have a bigger concern, but one that needs to be addressed by consulting the log files.
I fully accept that most of the situations I gave as examples are rare fringe cases, but log files are the go-to when all else fails, and thus there needs to be a copy that's readable if and when everything else does fail.
'tooling gets deleted' could easily happen after changing logging systems... while it would be shortsighted to uninstall your old logging system entirely (if you have logs laying around in that format) it's not unheard of.
The more likely situation would be that the logs are stored on a shared storage server, and the machine you are using to look at the logs doesn't have the logging system installed.
> The more likely situation would be that the logs are stored on a shared storage server, and the machine you are using to look at the logs doesn't have the logging system installed.
So expose the shared storage to a system running any current mainstream Linux distribution. I understand what you're saying, but this still doesn't seems like a huge concern.
... We were talking about logging systems with proprietary tools for manipulating logs. Ergo, 'any current mainstream Linux distribution' wouldn't have them installed by default.
This is a discussion for a sake of discussion. The way I see it is that author has a niche situation on his hands and therefore should use a product designed for that particular niche, instead of complaining how everyone's wrong and trying to shove his perspective down peoples' throats.
Sounds like somebody in the systemd camp. I really dislike added complexity when it is totally unnecessary. If people want to transform their logs into a different storage format, that is up to them. Text files, however, are a fantastically simple way of storing... (drumroll please) text. Surprising /s
> For example: find all logs between 2013-12-24 and 2015-04-11, valid dates only.
That’s a straw man. If you’re grepping logs, you don’t need a regular expression that matches only valid dates because you can assume that the timestamps on the log records are valid dates. But I suppose
Not to mention 99.9% of the searches one does of a log file aren't really that complex. Heck, I'm willing to wager that 90%+ of my searches over the last 20 years have been in log files from a particular day.
That's the thing about having simple text log files - the cognitive load required to pull data out of them, often into a format that can then be manipulated by another tool (awk being one of the better known), is so low that you can perform them without a context switch.
If you have a problem, you can reach into the log files, pull out the data you need, possibly massage/sum/count particular records with awk, all without missing a beat.
This is particularly important for sysadmins who may be managing dozens of different applications and subsystems. Text files pull all of them together.
But, and here is the most important thing that people need to realize - for scenarios in which complex searching is required, by all means move it into a binary format - that just makes sense if you really need to do so.
The argument isn't all text instead of binary, it is at least text and then use binary where it makes sense.
More to the point: Text logs are just as structured as binary logs, but they have the additional property of not being as opaque and, therefore, being immediately usable with more preexisting, well-tested, well-known tooling.
> If you’re grepping logs, you don’t need a regular expression that matches only valid dates because you can assume that the timestamps on the log records are valid dates.
Even _if_ I agreed with your assumption[1], are you actually suggesting that
is a serious solution? I admit that it is shorter than the author's solution, _but it still proves his point_.
And then what about multi-line log lines? `grep` can't tell where the next line is; sure, I can -A, but there's no number I can plug in that's going to just work: I need to guess, and if I get a truncated result or too much output, adjust. Worse, I get too much output _and_ a truncated record where I need it…
Using regexs for time is like using regexs for HTML: it's possible-ish, but most people are probably doing it wrong and storing things using their correct data structures is a much simpler solution.
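For comparison, the two approaches look roughly like this, assuming the text log really does put a well-formed ISO 8601 timestamp at the start of every line (which is exactly the assumption I'm doubting):
# text side: zero-padded ISO dates compare lexically, so no date-validating regex is needed
awk '$1 >= "2013-12-24" && $1 <= "2015-04-11"' app.log
# structured side: the range is a first-class query
journalctl --since '2013-12-24' --until '2015-04-11'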
After reading the article I wonder: if there are lots of tools that deliver all the binary advantages through indexes but leave the logs as text files, why is that not fine? To get the binary advantage the log does not have to be binary.
The example with the timestamps is also strange. No matter how you store the timestamps, parsing a humanly reasonable query like "give me 10 hours starting from last Friday 2am" to an actual filter is a complex problem. The problem is complex no matter how you store your timestamp. You can choose to do the complexity before and create complex index structures. You can choose to have complex algorithms to parse simple timestamps in binary or text form, you can build complex regexes. But something needs to be complex, because the problem space is. Just being binary doesn't help you.
And that's really the point here, isn't it? Just being binary in itself is not an advantage. It doesn't even mean by itself that it will save disk space. But text in itself is an advantage, always, because text can be read by humans without help (and in some instances without any training or IT education), binary not.
Yesterday I was thinking there might be something to binary logs. Now I'm convinced there isn't. The only disadvantage seems to be that you also lose disk space if you store it in clear text. But disk space isn't an issue in most situations (and in many situations where it is an issue, you might have resources and tools at hand to handle that as well). It is added complexity for no real advantage. Thanks for clearing that up.
Another advantage of using structured data rather than free-form text is that you can more precisely encode the essence of the event, with fields for timestamp, event source, type of event, its severity, any important parameters, and so on. This permits logging to be independent of the language of the system operator. Rather than grepping for what is almost always English text, one can query a language-independent set of fields, and then, if a suitable translation has been done, see the event in one's native language.
When applied widely throughout a system, this leads to the internationalisation of log messages. Thus lessening the anglocentric bias in systems software. Windows has done this for years, at least with its own system logging (other applications can still put free-form text into the event logs if they wish.)
About your first point: independence. You become less dependent on English but more dependent on the binary format and the tools that can handle it. It's a trade-off. And it might be just an opinion, but for me it's not a good trade-off. Learning English was a one-time endeavour for me. But binary formats and tools have to be learned separately.
About what you put in the log message: You can also put different fields in a line of text. Not getting the advantage or trade-off here.
About the internationalisation: as non-English developers, we force all our systems that have logging internationalisation to an English system language, so we have a common ground for the messages. Understanding the English message is nearly no burden. Log messages are event triggers, either in code or in a developer's/admin's mind. If I get a log message in my native language, I don't know which event that triggers, which makes it actually harder.
Really, I don't know any non-English person who considers log internationalisation a good thing. Fighting anglocentrism is a very anglocentric topic. Outside of the UK/US it's a non-topic. We (non-English people) are happy that there is a language we can use to talk to each other, and we don't really care how it came to be that widely known.
And even if you don't speak English, I don't see the advantage of parsing \x03 instead of "Error:.*". Both are strings that have a meaning which is rather independent of their encoding.
This is also just anecdotal, but I used to work with a Chinese sysadmin (I was in the UK, he was in China) who found it much more preferable to work on the localised Windows servers we had installed over there, as all the UI and messages were in his native tongue. I'm sure it was easier for him to gain expertise in the tools he needed for his work, than to become proficient enough in English to understand every obscure log or error message that the system might throw at him.
With regards to disk space - compressed text logs are pretty common. The frequency with which they are compressed is adjustable, and, gzcat is a pretty well known mechanism for opening them.
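e.g. nothing more exotic than this (the paths are just a typical rotation layout):
# compressed, rotated text logs are still one pipeline away from the usual tools
zgrep -F '203.0.113.7' /var/log/nginx/access.log.*.gz
zcat /var/log/nginx/access.log.*.gz | awk '{ print $1 }' | sort | uniq -c | sort -rn | head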
Now I'm no systemd apologist but maybe some of the hate towards systemd, journald and pals is unwarranted. If one gives these newer tools a chance, they actually have some nice features. Despite the Internet's opinion, seems like they were not actually created to make Linux users' lives difficult.
If binary logs turn out to be the wrong technological decision, I'm sure we'll figure that out and change over to text logs again. All it would take is a few key savvy users losing their logs to journald corruption and the change in the wider "ecosystem" would be made. But if all goes well... then what's to complain about? :-D
It's not a counter argument to journalctl's usefulness being independent of binary logs, right? In fact using that nice tool for querying I don't really care how it stores logs under the hood.
Right, I'm sure you could do the same thing with grep and a classic /var/log/messages - in fact there's probably something to do this already. Or you'd find a gz from the day in question and read all of it. Just happened that I'd recently read the man page for journald and that was something I recalled.
My main problem with this is that ASCII is not something that will ever change over time. The data format is wonderfully static. Forever. Introduce a binary format? You get versioning. It is a major downside.
What you lose when you move away from text logs is not any real benefit; what you lose is the illusion of control you have with text logs.
Text logs can be corrupted, text logs can be made unusable, you need a ton of domain-specific knowledge to even begin to make sense of text logs, etc.
But there's always a sense that, if you had the time, you could still personally extract meaning from them. With binary logs, you couldn't personally sit there and read them out line by line.
The issue is psychology, not pragmatism, and that's why text logs have been so sticky for so long.
A substring of text may or may not be a date and based on the excellent tools available in linux you can decide how to extract that "data point". If binary logging is little more than a stream of text, then that is fine, but I seriously doubt that is the push happening. Personally I prefer having a raw stream of data that I have to work with as best as I can rather than having to use some flag defined by somebody else to range across dates. That is the fundamental difference it seems: do you want a collection of tools that can be applied in a variety of ways or do you want the "one way" (with potential versioning... have fun!).
Again if the binary log is simply better compressed data, well we have ways of compressing text already as an afterthought. This really, fundamentally, seems to be a conflict in how people want to administer their systems and, for the most part, this seems to be about creating a "tool" that people then have to pay money for to better understand.
> Does database store the data in text files? No? That's my point.
This guy is a first class idiot who knows enough to reformulate a decided issue into yet another troll article. "a database (which then goes and stores the data in a binary format)". How about a text file IS a database. It's encoded 1s and 0s in a universal format instead of the binary DB format which can be corrupted with the slightest modification or hardware failure.
I think there are a number of issues that are getting mushed into one.
* Journal is just terrible.
* some text logs are perfectly fine.
* when you are in rescue mode, you want text logs
* some people use text logs as a way to compile metrics
I think the most annoying thing for me about journald is that it forces you to do something their way. However it's optional, and in centos7 it's turned off, or it's beaten into such a shape that I haven't noticed it's there.... (if that is the case, I've not really bothered to look; I poked about to see if logs still live in /var/log/, they did, and that was the end of it. Yes, I know that if this is the case, I've just undermined my case. Shhhhh.)
/var/log/messages for kernel oopses, auth for login, and all the traditional systemy type things are good for text logs. Mainly because 99.9% of the time you get less than 10 lines a minute.
Being able to sed, grep, tee and pipe text files is brilliant on a slow connection with limited time/mental capacity, i.e. a rescue situation. I'm sure there will be a multitude of stable tools that'll pop up to deal with a standardised binary log format, in about ten years.
The last point is the big kicker here. This is where, quite correctly, it's time to question the use of grep. Regex is terrible. It's a force/problem amplifier. If you get it correct, well done. Wrong? You might not even know.
Unless you don't have a choice, you need to make sure that your app kicks out metrics directly. Or as close to directly as possible. Failing that you need to use something like elastic search. However because you're getting the metrics as an afterthought, you have to do much more work to make sure that they are correct. (although forcing metrics into an app is often non trivial)
If you're starting from scratch, writing custom software, and think that log diving is a great way to collect metrics, you've failed.
If you are using off-the-shelf parts, it's worth spending the time interrogating the API to gather stats directly. You never know, collectd might have already done the hard work for you.
The basic argument he puts forth is this: text logs are a terrible way to interchange and store metrics. And yes, he is correct.
I have no idea of its effect in practice, but theoretically it should have a negative effect, because it soaks in all the logs and then forwards them to the logging daemon, even when journal storage is turned off.
Of course you need to log some data in textual format for emergencies, but if you had a tool that indexes events on timestamps, servers, monitorees, severity and event type, while severely reducing the storage required, you would be able to log much more data, and find problems faster. Arguing binary vs text logs is like arguing serial port vs USB on some industrial systems.
Great to see some effort in this area. I've been using New Relic and it's pretty great for errors because we've setup Slack/email notifications. However, there's nothing for general log (e.g.: access log) parsing. I'm installing an ELK stack on my machine right now and hope that it's enough
Doesn't this just mean that we should have a more "intelligent" version of grep? For example, this "supergrep" could periodically index the files it is used on, so searching becomes faster.
It seems to me that most of the worry about a binary log file being "opaque" could be solved with a single utility:
log-cat <binary-log-file>
… that just outputs it in text. Then you can attack the problem with whatever text-based tools you want.
But to me, having a utility that I could do things like, get a range of log lines — in sorted order —, or, grep on just the message, would be amazing. These are all things that proponents of grep I'm sure will say "you can!" do with grep… but you can't.
The dates example was a good one. I'd much rather:
Also, my log files are not "sorted". They are, but they're sorted _per-process_, and I might have multiple instances of some daemon running (perhaps on this VM, perhaps across many VMs), and it's really useful to see their logs merged together[2]. For this, you need to understand the notion of where a record starts and ends, because you need to re-order whole records. (And log records' messages are _going_ to contain newlines. I'm not logging a backtrace on one line.) grep doesn't sort. |sort doesn't know enough about a text log to adequately sort, but
Binary files offer the opportunity for structured data. It's really annoying to try to find all 5xx's in a log, and your grep matches the process ID, the line number, the time of day…
I've seen some well-meaning attempts at trying to do JSON logs, s.t. each line is a JSON object[1]. (I've also seen it attempted where all that is available is a rudimentary format string, and the first " breaks everything.)
Lastly, log files sometimes go into metrics (I don't really think this is a good idea, personally, but we need better libraries here too…). Is your log format even parseable? I've yet to run across one that had an unambiguous grammar: a newline in the middle of a log message, with the right text on the second line, can easily get picked up as a date, and suddenly, it's a new record. Every log file "parser" I've seen was a heuristic matcher, and I've seen most all of them make mistakes. With the simple "log-cat" above, you can instantly turn a binary log into a text one. The reverse — if possible — is likely to be a "best-effort" transformation.
[1]: the log writer is forbidden to output a newline inside the object. This doesn't diminish what you can output in JSON, and allows newline to be the record separator.
[2]: I get requests from mobile developers telling me that the server isn't acting correctly all the time. In order to debug the situation, I first need to _find_ their request in the log. I don't know what process on what VM handled their request, but I often have a _very_ narrow time-range that it occurred in.
Windows systems have had better log querying tools than grep for years now, with a well structured log file format to match. It's good to see Linux distributions finally catching up in this regard.
Not that the log files on Linux are all entirely text-based anyway. The wtmpx and btmpx files are of a binary format, with specialised tools for querying. I don't see anyone complaining about these and insisting that they be converted to a text-only format.