I don't agree with the second assertion there. Text logs are only opaque as far as the format is concerned, but not so much as far as the content goes. Using the example in the article:
127.0.0.1 - - [04/May/2015:16:02:53 +0200] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0"
That isn't necessarily an argument against binary logging, but the notion that text log files are opaque in the same way as binary logs isn't really true.
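As a rough illustration of how much of the content is recoverable without special tooling, here is a minimal Python sketch that splits the line above into named fields. The field names are my own labels, and the pattern assumes Apache's usual "combined" layout; a custom LogFormat would need a different pattern.

```python
import re

# Pattern for the Apache "combined" layout; the group names are illustrative labels.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

LINE = '127.0.0.1 - - [04/May/2015:16:02:53 +0200] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0"'
fields = parse_line(LINE)
print(fields["host"], fields["status"], fields["request"])
```

Lines that don't fit the pattern come back as None, so a caller can still keep them as raw text.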
In the environment I work in, I am frequently looking at logs that other teams generate. If I needed to ramp up on their custom logging toolset just to perform simple queries, I would give up and waste the team's time by getting them to perform the queries for me.
> That's a lot more information than you could get from a binary log without any tools.
Arguably you need a tool to get the information you showed above - a single line from an apache log. The tool may have been grep, cat, vi, awk, less, or whatever. That it was installed as part of a base-build on your computer, or at the behest of your usual configuration management system, is either kind of aside, or kind of the point.
Journal uses a bunch of diagnostic & query tools that get installed at the same time that the journal is installed. Yes, the tool / command to get the same type of data you're looking at above -- something that is comparably readable to a line from an apache log file -- is going to be different. But only different.
With a text based logging system, I can take the USB stick with the system that does not boot on my headless homeserver to any computer and read the logs there. I could even boot the original Linux system on that server, running a really old kernel and practically no userland tools, and read them there. Because that server was using journald, that was not possible.
Still don't know what went wrong.
I'm sorry, but I don't find the "but I can view text on a machine from the last century" argument convincing. We're not in the past century, and when doing forensics, we usually do that on a reasonable machine, where all the tools we need are available. Otherwise it's an exercise in futility.
For one, because it is not packaged for my distribution. For two, because I get exactly nothing in return. All binary logs do for me is force me to use an additional tool.
> I'm sorry, but I don't find the "but I can view text on a machine from the last century" argument convincing.
POGO-E02. I really don't know how old this is, but it has USB-2 and I bought it 2 years ago, though it was marked as classic then. Maybe 2009?
> and when doing forensics, we usually do that on a reasonable machine, where all the tools we need are available
I'm normally doing that at my own environment, with the tools I am used to, and on my machine. Nothing of that includes a binary log viewer.
Are you running Slackware?
I think it is more telling of a lack of organization than of a wrong technical choice, but sometimes you have to deal with legacy systems, and it is good to be able to rely on something as universal as text.
But you're right that there are some work flows and use cases where it'll bite you big time. A recent migration to systemd on my Debian lvm-on-dmcrypt laptop caused me some hours of pain, so I'm not unsympathetic.
Back in the early 90's I was involved in managing a very large network of MS-DOS + Windows 3.x machines. The migration to Windows 95 introduced the same concerns, with similar responses. That's the nice thing about working in IT long enough.
> The migration to Windows 95 introduced the same concerns, with similar responses.
For me, that is the second big negative point, apart from the missing universal access (which, like you said, might get better over time, maybe). This route of having a binary journal with its dedicated journal viewers feels an awful lot like being on Windows. It's the same negative feeling I get when I come into contact with GNOME's regedit clone. Stepping back to Windows 95 is hardly progress.
Anyway, memory may be failing, but the big problem was one of configuration data (typically small volumes) that used to be kept in .ini (text) files, now being shuffled into the registry. There wasn't a size or complexity issue that drove that move, unlike the challenge of managing and merging many large log files from disparate services on multiple hosts.
In the particular case the toolkit did eventually catch up, but it took a very long time (3-5 years for us, I think, to recover the same level of deployment, configuration, automation). With Journal, in contrast, the toolkit's already there, and ultimately I'm just not convinced that 'I don't have Journal tools installed on this computer' is a persuasive argument against the tool.
I'm not saying there are no compelling arguments, just that one isn't.
journalctl -D /<mnt>/<other_system>/var/log/journal
wouldn't work in this case
If you get to the point when standard unix tools are useless, well, it's time to use a _real_ database and/or log management system. Not the time to write your own.
No one is (should be?) going around grepping 100 GBs/day worth of logs.
On the flip side, if the system is huge - then we can use tools like splunk.
grep/tail/awk are the first three tools I use on any system - if you create logs that I can't manipulate with those three tools, then you haven't created logs for your system that I can use.
I'm also not getting why he just doesn't use scripts to parse the logs and insert them into a database at that point. Why use some ad-hoc logging binary format if you're doing complex queries that SQL would be better suited for anyway, on proven db systems?
Maybe I'm missing something.
As the author himself points out: "I'm sorry, but deciding how much and what we log is not your job. Its ours, and this is the amount we have to deal with."
That goes both ways. If I only have one or two servers, having to run a centralized logging services doesn't scale either, the overhead is not worth the trouble.
If I want to look for an IP in logs from multiple service, text files are perfect. Doing the same across multiple servers, yes, then you want centralized logging. Binary logging ruins the first case, while text based works in both (sort of).
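For the single-box case, a minimal Python sketch of "look for an IP across several service logs" could look like this. The file names and log contents are made up for the demo, and it does a plain substring match, so "10.0.0.7" would also hit "10.0.0.70".

```python
import tempfile
from pathlib import Path

def find_in_logs(paths, needle):
    """Yield (path, line_number, line) for every line containing needle."""
    for path in paths:
        # errors="replace" keeps the scan going past stray non-UTF-8 bytes.
        with open(path, "r", errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                if needle in line:
                    yield str(path), number, line.rstrip("\n")

# Demo with two throwaway files standing in for per-service logs (contents made up).
with tempfile.TemporaryDirectory() as tmp:
    web = Path(tmp) / "web.log"
    mail = Path(tmp) / "mail.log"
    web.write_text('127.0.0.1 - - "GET / HTTP/1.1" 304\n10.0.0.7 - - "GET /x HTTP/1.1" 200\n')
    mail.write_text("connect from 10.0.0.7\ndisconnect from 192.168.1.5\n")
    hits = [(Path(p).name, n) for p, n, _ in find_in_logs([web, mail], "10.0.0.7")]

print(hits)
```

This is essentially what grep with several file arguments gives you for free, which is the point being made.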
I don't really see the point of binary logs. Either you're small enough that text files won't be an issue, or you're large enough to have centralized logging.
It seems that there's a push towards "scaleable solution" for everything, but people keep forgetting that you need to scale down as well. Most of us will never have to run more than a handful of servers, and in these cases the Twitter/Google/Facebook-like infrastructure just isn't worth the hassle.
He needs a log database, clearly. And when you put it that way, it's obvious why grepping logs is a nice, quick solution in many cases when you aren't getting "100Gb of logs a day".
Logs have lots of redundancy, so they compress quite nicely. So it is actually practical to grep those files since on disk they are not so large, and 100Gb of memory data is not a problem to grep.
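A small Python sketch of searching a compressed log without unpacking it to disk first. The sample data is synthetic and deliberately repetitive, so the compression ratio it prints says nothing about real logs or about the CPU cost at 100Gb scale.

```python
import gzip
import io

def grep_gzip(fileobj, needle):
    """Return matching lines from a gzip-compressed stream, decompressing on the fly."""
    matches = []
    with gzip.open(fileobj, "rt", errors="replace") as fh:
        for line in fh:
            if needle in line:
                matches.append(line.rstrip("\n"))
    return matches

# Synthetic, highly redundant sample: 1000 identical lines plus one odd one.
raw = ("GET / 200\n" * 1000 + "GET /missing 404\n").encode()
compressed = gzip.compress(raw)

print(len(raw), len(compressed))
print(grep_gzip(io.BytesIO(compressed), "404"))
```

gzip.open also accepts a file name, so the same function works on a rotated .gz file.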
For a small system - a desktop PC and maybe a custom router box - you do not need one central place for the logs. Thus you don't need an easy way to change it. You don't need to preserve logs in a more efficient way than logrotate does. They don't need to be stored more structured than the filesystem does, the queries are local, and grep is more than efficient enough.
Maybe a binary log is the best choice for you - it seems to be what you want. But that does not generalize to the general public. That is why the rant feels very misplaced for me.
Logging format and log storage format are two very different things.
Also, I'm not shocked people prefer text files. I'm shocked why they're so much against binary log storage. There's an important distinction between the two: you can prefer text, if that fits your case better, without hating on binary storage.
Except according to the article (which you posted and are defending all over this thread, so I'm guessing you actually wrote it?) the author has NO intention of honoring those who prefer text logs, in fact using the phrase "so vigilantly against text based log storage". To use your own reply, you can prefer binary, if it fits your case better, BUT DON'T HATE ON TEXT STORAGE.
Many organizations have a fully functional, well-debugged logging infrastructure. The basic design happened years ago, was implemented years ago, and was expected to be useful basically forever. Growth was planned for. Ongoing expenses expected to be small.
That's what happens when you build reliable systems on technologies that are as well understood as bricks and mortar. You get multiple independent implementations which are generally interoperable. You get robustness. And you get cost-efficiency, because any changes you decide to make can be incremental.
Where are the rsyslogd and syslog-ng competitors to systemd's journald? Where is the interoperability? Where is the smooth, useful upgrade mechanism?
Short term solutions are generally non-optimal in the long term. Using AWS, Google Compute and other instant-service cloud mechanisms trades money, security and control for speed of deployment. An efficient mature company may well wish to trade in the opposite direction: reducing operating costs by planning, understanding growth and making investments instead of paying rent.
Forcing a major incompatible change in basic infrastructure rather than offering it as an option to people who want to take advantage of it is an anti-pattern.
One interesting problem with almost all of the "advantages" of binary logs is that if they're good reasons today, they would have been really awesome reasons in '93, when I started admining my first Linux box. The problem with changing the way I've been doing things is that I'm already used to the staggering change in performance from a 40 meg non-DMA PATA drive in '93 to dual RAID fractional-terabyte SSDs. It's really quite a boost in raw power. Yet what I need to log hasn't changed much. So performance gains have been spectacular, and the comparative appeal is incredibly low. It wasn't a "real problem" in '93. It's maybe a thousandth of that problem level today due to technological improvement.
"Hey, if you change everything in your infrastructure, and all your machines, and all your command lines and procedures and ways of thinking to access logs, you MIGHT be 5% more efficient, well, eventually, in the long term" "Eh, so what? I remember transitioning from spinning rust to SSD and getting 100x the overall system-wide performance a couple of years ago; if I want 5%, it's more economical just to wait for the next tech boost. Also, shrinking basically zero load and effort by half is worthless if there's any cost at all, and unfortunately the cost is absolutely huge."
But, to reply: yes, many organisations have fully functional, well-debugged logging infrastructures. A lot of them also use binary log storage, and have been for over a decade, and are more than satisfied with the solution.
Both rsyslog and syslog-ng have been able to assist with setting such a thing up for about a decade now.
> Where are the rsyslogd and syslog-ng competitors to systemd's journald? Where is the interoperability? Where is the smooth, useful upgrade mechanism?
The journal has a syslog forwarder, but both rsyslog and syslog-ng can read directly from the journal. Interoperability was there from day one. A smooth upgrade mechanism took a while to iron out, but it's there now, too.
* you need to use a new proprietary tool to interact with them
* all scripts relating to logs are now broken
* binary logs are easy to corrupt, e.g. if they didn't get closed properly.
>You can have a binary index and text logs too! / You can. But what's the point?
The point is having human-readable logs without having to use a proprietary piece of crap to read them. A binary index would actually be a perfect solution - if you're worried about the extra space readable logs take, just .gz/.bz2 them; on decent hardware, the performance penalty for reading is almost nonexistent.
If you generate 100GB/day, you should be feeding them into logstash and using elasticsearch to go through them (or use splunk if $money > $sense), not keeping them as files. Grepping logs can't do all the stuff the author wants anyway, but existing tools can, that are compatible with rsyslog, meaning there is no need for the monstrosity that is systemd.
And again, there is no need for proprietary tools at all. Everything I want to do is achievable with free software - so much so, that I use only such software in all my systems.
As for compressing - yeah, no. Please try compressing 100Gb of data and tell me the performance cost is nonexistent.
As for LogStash & ES: Guess what: their storage is binary.
Also note that my article explicitly said that the Journal is unfit for my use cases.
I suppose that if there was a large push to universally log things in binary the possibility exists that sanity would prevail and we'd get one format that everyone agreed upon, but I don't see any reason that this would be the case when historically it basically never happens.
So, at least from my prediction of a future where binary logging is the norm, we have a half dozen or so competing primary formats, and then random edge cases where people have rolled their own, all with different tools needed to parse them.
Or we could stick with good ol' regular text files and if you want to make it binary or throw it in ELK/splunk or log4j or pipe it over netcat across an ISDN line to a server you hid with a solar panel and a satellite phone in Angkor Wat upon which you apply ROT13 and copy it to 5.25 floppy, you can do it on your own and inflict whatever madness you want while leaving me out of it.
You just can't say that about binary log formats. Text is a lowest common denominator; and yes, that cuts both ways, but the advantages of universality can't be trivially thrown away.
We don't unexpectedly find machines that don't conform to our policies. We control the machines, we know where and how to find the logs. If we found any where we had to grep, we'd be having a very bad day.
Our lowest common denominator is not text, because we control the environment, and we can raise the bar. Being able to do that is - I believe - important for any admin.
To get the benefits you're claiming, the storage format of your logs is actually irrelevant. If you're going to have an environment where you have to exert that much control over the output of your applications, when you parse the logs doesn't matter. You could do your parsing with grep and awk as the very last step before the user sees results, and you'd see the same benefits. Parsing up-front, assuming you know what data you can safely throw away, might appear to some as a premature optimisation.
> We have well documented tools and workflows, so anyone new to the system can catch up and start working with the logs within minutes.
It sounds like this is something which could be usefully open-sourced, to show how it's done.
> Our lowest common denominator is not text, because we control the environment, and we can raise the bar. Being able to do that is - I believe - important for any admin.
It's a question of what you choose to optimise for. Pre-parsed binary logs in a locked-down environment might be as flexible as freeform text, but I'd need to see a running system to properly judge.
I don't think I'm saying that. The article presents two setups and a few related use cases, where I believe binary log storage is superior.
> With the services you run, you might be able to dictate that the log formats are restrictive enough that writing a parser for each one isn't a problematic overhead.
I don't need to dictate all log formats. If I can't parse one, I'll just store it as-is, with some meta-data (timestamp, origin host, and so on). My processed logs do not need to be completely uniform. As long as they have a few common keys, I can work with them.
For some apps or groups of apps, I can create special parsers, but I don't necessarily need that from day one. If I'm ok with only new logs being parsed according to the new rules (and most often, I am), I can add new rules anytime.
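To sketch what I mean (this is an illustration, not our actual code): every message gets a few common keys, and per-application parsers only add fields when they exist and match. The PARSERS registry, the key names other than MESSAGE, and the sshd pattern are all hypothetical.

```python
import re

# Hypothetical parser registry: application name -> regex with named groups.
PARSERS = {
    "sshd": re.compile(r"Accepted (?P<method>\S+) for (?P<user>\S+) from (?P<ip>\S+)"),
}

def process(timestamp, host, app, message):
    """Build a record with common keys; add parsed fields only when a parser matches."""
    record = {"TIMESTAMP": timestamp, "HOST": host, "APP": app, "MESSAGE": message}
    parser = PARSERS.get(app)
    match = parser.search(message) if parser else None
    if match:
        record.update(match.groupdict())
    return record

parsed = process("2015-05-04T16:02:53", "web1", "sshd",
                 "Accepted publickey for alice from 10.0.0.7 port 22")
raw = process("2015-05-04T16:02:54", "web1", "myapp", "something freeform happened")
print(parsed["user"], parsed["ip"])
print(sorted(raw))
```

Adding a rule later is just another entry in the registry; records stored before that keep their raw MESSAGE.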
> Parsing up-front, assuming you know what data you can safely throw away, might appear to some as a premature optimisation.
>> We have well documented tools and workflows, so anyone new to the system can catch up and start working with the logs within minutes.
> It sounds like this is something which could be usefully open-sourced, to show how it's done.
LogStash is a reasonable starting point. Our solution has a lot in common with it, at least at the idea level.
> Pre-parsed binary logs in a locked-down environment might be as flexible as freeform text, but I'd need to see a running system to properly judge.
Only our storage is binary. That is all the article is talking about. Within that binary blob, there are many traces of freeform text, mostly in the MESSAGE keys of application logs which we care less about (and thus, parse no further than basic syslog parsing). You still have the flexibility of freeform text, even if you store it in a binary storage format.
> Embedded systems don't have the resources!
> I'd still use a binary log storage, because
> I find that more efficient to write and parse,
> but the indexing part is useless in this case.
When I wrote the logging system for this thing http://optores.com/index.php/products/1-1310nm-mhz-fdml-lase... I first fell for the very same misjudgement: "This is running on a small, embedded processor: Binary will probably be much more efficient and simpler."
So I actually did first implement a binary logging system. Not only logging, but also the code to retrieve and display the logs via the front panel user interface. And the performance was absolutely terrible. Also the code to manage the binary structure in the round robin staging area, working in concert with the storage dump became an absolute mess; mind you the whole thing is thread safe, so this also means that logging can cause inter thread synchronization on a device that puts hard realtime demands on some threads.
Eventually I came to the conclusion to go back and try a simple, text only log dumper with some text pattern matching for the log retrieval. Result: The text based logging system code is only about 35% of the binary logging code and it's about 10 times faster because it doesn't spend all these CPU cycles structuring the binary. And even that text pattern matching is faster than walking the binary structure.
Like so often... premature optimization.
Again, transport and storage are different. While I prefer binary storage, most of my transports are text (at least in large part, some binary wrapping may be present here and there).
> You basically have a very fast laser that
> can do volumetric scans at a high framerate,
> did I get this right?
Now the challenge is to get the wavelength spectrum. You can either use a broadband CW light source and a spectrometer. But these are slow, so you can't generate depth scans at more than about 30kHz (which is too slow for 3D but suffices for 2D imaging). Or you can encode the wavelength in time and use a very fast photodetector (those go up to well over 4GHz bandwidth).
This is what we do: Have a laser that sweeps over 100nm at a rate >1.5MHz and use a very fast digitizer (1.8GS/s) to obtain an interference spectrum with over 1k sampling points. Then apply a little bit of DSP (mapping time to wavelength, resampling, windowing, iFFT, dynamic range compression) and you get a volume dataset.
BTW, all the GPU OCT processing and visualization code I wrote, too.
> What do people typically use it for?
Frequency-sweeping... How are you doing that? Is the laser itself able to frequency sweep? Or are you chirping pulses?
> Frequency-sweeping... How are you doing that?
A much more thorough description is found in the paper that introduced FDML for the first time:
> Is the laser itself able to frequency sweep?
> Or are you chirping pulses?
What you're doing sounds a lot like time-domain spectroscopy in an odd sort of way.
What are the advantages of this versus just chirping a pulsed supercontinuum source?
> What you're doing sounds a lot like time-domain
> spectroscopy in an odd sort of way.
> What are the advantages of this versus just
> chirping a pulsed supercontinuum source?
Sweep uniformity: The phase evolution of the sweeps is very stable; the mean deviation in phase differences between sweeps is on the order of millirad. This means that for the time→k-space mapping the phase evolution has to be determined only once and can then be used for hours of operation; in fact the system operates so repeatably that even after being powered off overnight, you can often reuse the previous day's phase calibration the next morning. Without that, you'd have to use a second interferometer and sample a k-space reference signal for each and every sweep in parallel and use that for k-space remapping.
Ease of synchronization: Trigger signals have very small jitter. Also the jitter between electrical and optical synchronization is on the order of a few ps, which is important for things like Doppler-OCT.
Coherence: Supercontinuum Sources have issues with coherence stability, which degrades the imaging range.
Sensitivity issues: Chirping Pulsed Supercontinuum Sources (which are actually used for OCT) is challenging. It requires a lot of dispersion. High dispersion means a lot of loss, which in turn means it requires another output amplification stage, which in turn will also produce significant optical noise. And optical noise is the bane of OCT, since that reduces the sensitivity. In contrast to that, if properly dispersion compensated, an FDML laser will exhibit very little noise.
Price: Pulsed Supercontinuum Sources suitable for chirping and OCT applications are quite expensive. Our laser is not cheap either, but it's still more cost-effective.
People like text logs because local corruptions remain local. Some lines could be gibberish, but that's all. I'm not suggesting that this couldn't be done with binary logs, but you have to carefully design your binary logging format to keep this property.
Otherwise I agree with the author that we shouldn't be afraid of binary formats in general, we need much more general formats and tools though (grep, less equivalents).
I'm not fond of "human readable" tree formats like XML or JSON either. bencode could be just as "human readable" as UTF-8 text if one had a less equivalent for bencode.
From my experience (I do not want to troll and presume you have not tried it), systemd picks up where it left off when an old log is corrupted and starts a new one. There is a command line utility to verify the integrity of these files (on my Windows laptop at work, cannot check). Now, I am not sure about the state of log file repair. I was told it is not possible. However, it seems this means the file is corrupted in a way that it is not easily indexed. It is likely it is still readable. I wish I had seen this last time.
Granted, I use Arch Linux on an old laptop. I had these corruptions routinely happen when I had disabled ACPI controls (I do not use the fancy WMs, I am back to Ratpoison) and completely, and I mean completely, drained the battery until it came crashing to a halt. So, I am not surprised about these corruptions.
Anyone using systemd boxes in production who can comment on this? Flamewar or not, I would like to know more. I do not really care for it one way or the other. Parts I like, parts I do not.
The last few entries of a log file before something catastrophic happens are precisely the entries that are the most important to make sure they aren't lost.
The advantage of the traditional unix pipe manipulation tools is that most of them are simpler and faster than regex.
I think you just described PowerShell (or things that follow down the same path, e.g. TermKit) ;-)
Text is not synonymous with unstructured.
Only entropic bits are truly "unstructured data." The question is one of how much semantic structure you can rely on in the data you are processing, which is a continuum.
Reading this was a waste of my time.
Being a universal open format text is a better format than binary, unless you don't care about being able to read your data in the future. There's already enough issue with filesystems and storage media, no need to add more complexity to the issue.
On the other hand, if you have logs, you need to store them in a centralized place and have an aging policy, etc... Grepping is definitely not the answer. Systems like Splunk exist for a reason.
(For example, I use Kibana at home. Works great, though I have no text logs stored.)
It took a while to get developers to use it, but now it's indispensable - particularly when someone asks me 'what happened to the 1000 emails I sent last month'
I now know; previously, the data would have been logrotated.
Also, if the author has a 5-node cluster producing 100Gbs of logs a day, the logs may also be too verbose or poorly organized. I work on a system that produces 100s of Gbs of logs a day but with proper organization they're perfectly manageable.
I think that a more nuanced solution is to log things that are useful to manual examination in text form, but high-frequency events that are not particularly useful could reasonably be logged elsewhere (e.g. a database or binary log that is asynchronously fed into a database).
In conclusion, as is frequently the case with engineering, I think the author oversimplifies the problem here and tries to present a one-size-fits-all solution instead of taking a more pragmatic approach. Textual logs are useful when meant for human consumption (debugging) and when they can be organized such that the logs of interest at any time are limited in size, and some other binary-based format is useful for aggregate higher-level analysis.
As for our logs being too verbose: nope, read the article.
Also, it's not a one-size-fits-all solution: I have no problem with people using text. All the article wants to show is that binary logs are not evil, bad, useless, etc, and that there are actually very good reasons to use them.
For example, storing logs in a database is one kind of binary log storage: most databases don't store the data as text.
This obviously only works when you are troubleshooting a specific issue, not when you need to investigate something that happened in the past (where the logging for the session wasn't enabled). However, it has proven to be an excellent tool for troubleshooting issues in the system.
I have used session-based logging both when I worked at Ericsson (the AXE system), and at Symsoft (the Nobill system), and both were excellent. However, I get a feeling that they are not in widespread use (may be wrong on that though), so that's why I wrote a description of them: http://henrikwarne.com/2014/01/21/session-based-logging/
And it invites timing-based heisenbugs (enable tracing, problem goes away).
Still a neat approach, however.
Grep them, tail them, copy and paste, search, transform them, look at them in less, open them in any editor. I love to write little bash one-liners that answer questions about logs. I can use these one-liners everywhere, anytime.
I don't have any of the efficiency problems the author talks about.
At best it's a NUL separated database structure where the fields are not compressed, which IS greppable: just use \x00 in your regexp. At worst he might mean BER, which is an ASN.1 data encoding structure.
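A quick Python sketch of the NUL-separated case, to show that a byte-level regexp copes fine. The record layout (fields split by NUL, records by newline) is made up for the demo and is not any real tool's format.

```python
import re

# Made-up record layout: fields separated by NUL bytes, records by newlines.
blob = b"host\x00web1\x00status\x00500\nhost\x00web2\x00status\x00200\n"

# Match whole records whose "status" field is 500; \x00 goes straight into the pattern.
pattern = re.compile(rb"^.*status\x00500$", re.MULTILINE)

matches = [m.group(0).split(b"\x00") for m in pattern.finditer(blob)]
print(matches)
```

The same idea works with grep implementations that support byte escapes in patterns, though the exact flags vary.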
A traditional log with a parallel index would be completely backwards compatible, the query tool should work the same way, and you could even treat the index file as a rebuildable cache which can be useful. The interface presented by a specialized tool doesn't have to depend on any specific storage method.
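Roughly what I have in mind, as a minimal Python sketch: the log stays plain text, and the index is just a map from a key to byte offsets that can be thrown away and rebuilt from the log at any time. Keying on the first field and the sample lines are assumptions for the demo.

```python
import io
from collections import defaultdict

def build_index(log, key_of):
    """Scan a binary-mode log and map each key to the byte offsets of its lines."""
    index = defaultdict(list)
    log.seek(0)
    offset = 0
    for line in log:
        index[key_of(line)].append(offset)
        offset += len(line)
    return dict(index)

def lookup(log, index, key):
    """Fetch lines for key by seeking directly to the recorded offsets."""
    lines = []
    for offset in index.get(key, []):
        log.seek(offset)
        lines.append(log.readline().rstrip(b"\n"))
    return lines

# In-memory stand-in for a text log file opened in binary mode.
log = io.BytesIO(
    b"10.0.0.7 GET /a 200\n"
    b"127.0.0.1 GET / 304\n"
    b"10.0.0.7 GET /b 404\n"
)

def first_field(line):
    """Use the first space-separated field (here, the client address) as the key."""
    return line.split(b" ", 1)[0]

index = build_index(log, first_field)
print(lookup(log, index, b"10.0.0.7"))
```

If the index file is lost or corrupted, grep still works on the log and build_index can regenerate it.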
Really, this recent fad of trying to remove old formats in the belief that the old format was somehow preventing any new format from working in parallel reminds me of JWZ's recommendations on mbox "summary files" over the complexity of an actual database. Sometimes you can get the features you want without sacrificing performance or compatibility.
The alternative is to leave everything unstructured, and understand the formats minimally and lazily. Laziness is a virtue, right?
Then, I can add further parsers for the MESSAGE part whenever I feel like it, or whenever there is need. I don't need that up front.
So even if binary logging is way better (I can't say, not enough experience) you simply can't beat text logging, because text logging is natural. It just happens.
Store important data in the database so that you can query it efficiently.
Keep logs for random searches when something unexpected happens. I log gigabytes per day, but only grep maybe once-twice a year.
(And voila, you have binary log storage.)
I was thinking this would be a cool area of research for me to try programming again, but it seems so daunting I am not sure where to start.
As a software developer, I generally use log levels to indicate severity in my logs. So grepping for ERROR should catch anything I had the foresight to log at the ERROR level.
Simple heuristics like the number of WARN level logs a minute may be useful.
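As a starting point, that heuristic could be sketched in Python like this. The line layout ('YYYY-MM-DD HH:MM:SS LEVEL message') and the threshold are assumptions; real logs would need their own timestamp handling.

```python
from collections import Counter

def warn_rate(lines, level="WARN"):
    """Count lines of the given level per minute, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for line in lines:
        # Assumed layout: 'YYYY-MM-DD HH:MM:SS LEVEL message'
        parts = line.split(" ", 3)
        if len(parts) >= 3 and parts[2] == level:
            counts[f"{parts[0]} {parts[1][:5]}"] += 1
    return counts

def noisy_minutes(counts, threshold):
    """Return the minutes whose count exceeds the threshold, in sorted order."""
    return sorted(minute for minute, n in counts.items() if n > threshold)

sample = [
    "2015-05-04 16:02:53 WARN disk almost full",
    "2015-05-04 16:02:54 WARN disk almost full",
    "2015-05-04 16:02:59 ERROR write failed",
    "2015-05-04 16:03:10 WARN retrying",
]
counts = warn_rate(sample)
print(noisy_minutes(counts, threshold=1))
```

A fixed threshold is the crudest option; comparing against a rolling average would be the obvious next step.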
Beyond that it sounds interesting. It may be hard to do in a general way, so focusing on Apache logs or something common may be a simpler task.
Very cool stuff. Do you use it?
When I say too much overhead, I'm referring to the carbon proxy and redis requirements. We found that just using the json output from graphite was sufficient to feed a trend monitoring system.
The output is pretty sensitive, moreso than Icinga2 (Nagios) expects, so we had to turn down a few of the "is this really down" re-checks, since it would silence legitimate trend alerts.
It emails me any log entries it doesn't know about. I did have to add a large number of ssh lines that it should not bother me about, but other than that it works very well and I find it very useful.
So you can use it for other usages (such as sending an admin a mail if suddenly your server sends 500 errors, or an unusual amount of 404 errors for instance).
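The core idea is simple enough to sketch in a few lines of Python: keep a list of patterns for lines you consider boring and report everything else. This is not the tool's real rule format; the patterns and sample lines are hypothetical, and the mailing step is left out.

```python
import re

# Hypothetical ignore rules; a real rule set would be far longer.
IGNORE = [
    re.compile(r"sshd\[\d+\]: Accepted publickey for \S+"),
    re.compile(r"sshd\[\d+\]: Connection closed by \S+"),
    re.compile(r"CRON\[\d+\]: "),
]

def unknown_entries(lines, ignore=IGNORE):
    """Return the lines that match none of the ignore patterns."""
    return [line for line in lines if not any(p.search(line) for p in ignore)]

sample = [
    "May  4 16:02:53 box sshd[811]: Accepted publickey for alice from 10.0.0.7",
    "May  4 16:03:01 box CRON[902]: (root) CMD (run-parts /etc/cron.hourly)",
    "May  4 16:03:40 box kernel: EXT4-fs error (device sda1): bad block",
]
print(unknown_entries(sample))
```

Only the kernel line survives the filter, which is the one a human would want to see.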
I like fail2ban, a lot, and alternatives in that field, but when I looked at the Arch Linux package last time there were dozens of commented-out, but heavily commented nonetheless regexp template files like you describe. I think this would be a neat machine learning thing.
What I am going for: use AI to train a passive entry-level sysadmin to warn you.