
After reading the article I wonder: there are lots of tools that get all the binary advantages through indexes but leave the logs themselves as text files, so why is that not fine? To get the binary advantage, the log itself does not have to be binary.
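To make that concrete, here is a minimal sketch of a sidecar index over a plain-text log. It assumes every line starts with an ISO 8601 timestamp followed by a space and that lines are in time order; `build_index` and `lines_since` are hypothetical names, not taken from any existing tool.

```python
import bisect
import io
from datetime import datetime

def build_index(log):
    """Return a list of (timestamp, byte offset) pairs, one per log line."""
    index = []
    offset = log.tell()
    for raw in iter(log.readline, b""):
        stamp = raw.split(b" ", 1)[0].decode("ascii")
        index.append((datetime.fromisoformat(stamp), offset))
        offset = log.tell()
    return index

def lines_since(log, index, start):
    """Seek straight to the first line at or after `start` and read from there."""
    pos = bisect.bisect_left(index, (start, -1))
    if pos == len(index):
        return []
    log.seek(index[pos][1])
    return [line.decode("utf-8").rstrip("\n") for line in log]

# The log stays human-readable text; only the index is a derived structure.
log = io.BytesIO(
    b"2015-03-06T01:59:58 sshd session opened\n"
    b"2015-03-06T02:00:03 cron job started\n"
    b"2015-03-06T02:10:41 cron job finished\n"
)
index = build_index(log)
recent = lines_since(log, index, datetime(2015, 3, 6, 2, 0, 0))
```

The index can be thrown away and rebuilt at any time, and `cat` and `grep` keep working on the log itself.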

The example with the timestamps is also strange. No matter how you store the timestamps, turning a humanly reasonable query like "give me 10 hours starting from last Friday 2am" into an actual filter is a complex problem. You can choose to handle the complexity up front and create complex index structures. You can choose to have complex algorithms to parse simple timestamps in binary or text form, or you can build complex regexes. But something needs to be complex, because the problem space is. Just being binary doesn't help you.
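As an illustration that the query translation is the same work whatever the storage format, here is a minimal sketch that turns "10 hours starting from last Friday 2am" into a half-open time range. `last_friday_window` and `in_window` are hypothetical helpers, and "last Friday" is assumed to mean the most recent Friday strictly before today.

```python
from datetime import datetime, timedelta

FRIDAY = 4  # datetime.weekday(): Monday is 0, so Friday is 4

def last_friday_window(now, hour=2, hours=10):
    """Return (start, end) for `hours` hours starting at `hour`:00 last Friday."""
    # Days to step back; never 0, because today does not count as "last Friday".
    days_back = (now.weekday() - FRIDAY) % 7 or 7
    start = (now - timedelta(days=days_back)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    return start, start + timedelta(hours=hours)

def in_window(stamp, window):
    """The filter itself: identical for parsed text and decoded binary stamps."""
    start, end = window
    return start <= stamp < end

window = last_friday_window(datetime(2015, 3, 11, 9, 30))  # a Wednesday
```

Once the range exists, applying it is a plain comparison; the hard part was deciding what "last Friday" means.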

And that's really the point here, isn't it? Just being binary in itself is not an advantage. It doesn't even mean by itself that it will save disk space. But text in itself is an advantage, always, because text can be read by humans without help (and in some instances without any training or IT education), and binary cannot.

Yesterday I was thinking there might be something to binary logs. Now I'm convinced there isn't. The only disadvantage of clear text seems to be that it costs more disk space. But disk space isn't an issue in most situations (and in many situations where it is an issue you might have resources and tools at hand to handle that as well). Binary is added complexity for no real advantage. Thanks for clearing that up.




Another advantage of using structured data rather than free-form text is that you can more precisely encode the essence of the event, with fields for timestamp, event source, type of event, its severity, any important parameters, and so on. This permits logging to be independent of the language of the system operator. Rather than grepping for what is almost always English text, one can query a language-independent set of fields, and then, if a suitable translation has been done, see the event in one's native language.

When applied widely throughout a system, this leads to the internationalisation of log messages, thus lessening the anglocentric bias in systems software. Windows has done this for years, at least with its own system logging (other applications can still put free-form text into the event logs if they wish).
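The idea of language-independent fields can be sketched roughly as follows. The `Event` fields mirror the ones listed above; the `CATALOGUE` table, the `disk.full` event id, and the `render` helper are all made up for illustration and do not describe how Windows or any other system actually stores its messages.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    timestamp: str
    source: str
    event_id: str          # language-independent identifier of the event type
    severity: int
    params: dict = field(default_factory=dict)

# Hypothetical message catalogue keyed by language, then by event id.
CATALOGUE = {
    "en": {"disk.full": "Disk {device} is full ({percent}% used)"},
    "de": {"disk.full": "Datenträger {device} ist voll ({percent}% belegt)"},
}

def render(event, language):
    """Look up the template for the event id and fill in the parameters."""
    template = CATALOGUE[language][event.event_id]
    return template.format(**event.params)

event = Event("2015-03-06T02:00:03", "kernel", "disk.full", 3,
              {"device": "sda1", "percent": 97})
english = render(event, "en")
german = render(event, "de")
```

A query such as "all events with `event_id == 'disk.full'`" then works the same whichever language the operator reads the result in.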


About your first point: Independence. You are less dependent on English and more dependent on the binary format and the tools that can handle it. It's a trade-off. And it might be just an opinion, but for me it's not a good trade-off. Learning English was a one-time endeavour for me. But binary formats and tools have to be learned separately.

About what you put in the log message: You can also put different fields in a line of text. Not getting the advantage or trade-off here.
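For example, a line of `key=value` pairs carries the same fields and stays readable. This is only a sketch under that assumed line format; `parse_fields` is a hypothetical helper that relies on the standard library's `shlex` to handle quoted values.

```python
import shlex

def parse_fields(line):
    """Split a 'key=value key2="quoted value"' log line into a dict."""
    return dict(token.split("=", 1) for token in shlex.split(line))

line = "ts=2015-03-06T02:00:03 src=kernel event=disk.full severity=3 device=sda1"
fields = parse_fields(line)
```

All values come back as strings, so a consumer still has to convert `severity` to a number if it wants to compare it.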

About the internationalisation: As non-English developers we force all our systems that have logging internationalisation to English as the system language, so we have a common ground for the messages. Understanding the English message is nearly no burden. Log messages are event triggers, either in code or in a developer's/admin's mind. If I get a log message in my native language I don't know which event it triggers, which actually makes things harder.

Really. I don't know any non-English person who considers log internationalisation a good thing. Fighting anglocentrism is a very anglocentric topic. Outside of the UK/US that's a non-topic. We (non-English people) are happy that there is a language we can use to talk to each other and we don't really care how it came to be that widely known.

And even if you don't speak English, I don't see the advantage of parsing \x03 instead of "Error:.*". Both are strings that have a meaning which is rather independent of its encoding.


This is also just anecdotal, but I used to work with a Chinese sysadmin (I was in the UK, he was in China) who found it much more preferable to work on the localised Windows servers we had installed over there, as all the UI and messages were in his native tongue. I'm sure it was easier for him to gain expertise in the tools he needed for his work, than to become proficient enough in English to understand every obscure log or error message that the system might throw at him.


With regard to disk space: compressed text logs are pretty common. The frequency with which they are compressed is adjustable, and gzcat is a pretty well-known mechanism for opening them.
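Reading a compressed text log programmatically is not much harder than reading a plain one. A minimal sketch using Python's standard `gzip` module; `grep_gz` is a hypothetical helper and the sample log lines are invented.

```python
import gzip
import io

def grep_gz(fileobj, needle):
    """Return the lines of a gzip-compressed text log that contain `needle`."""
    with gzip.open(fileobj, "rt", encoding="utf-8") as lines:
        return [line.rstrip("\n") for line in lines if needle in line]

raw = (
    "2015-03-06T02:00:03 Info: cron job started\n"
    "2015-03-06T02:10:41 Error: cron job failed\n"
)
# Stand-in for a rotated file such as messages.1.gz, kept in memory here.
compressed = io.BytesIO(gzip.compress(raw.encode("utf-8")))
matches = grep_gz(compressed, "Error:")
```

`gzip.open` also accepts a file name, so the same function works on a rotated log on disk.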


> "give me 10 hours starting from last Friday 2am"

journalctl --since="$(date -d'last friday 2am' '+%F %T')" --until="$(date -d'last friday 2am + 10 hours' '+%F %T')"

Now I'm no systemd apologist but maybe some of the hate towards systemd, journald and pals is unwarranted. If one gives these newer tools a chance, they actually have some nice features. Despite the Internet's opinion, seems like they were not actually created to make Linux users' lives difficult.

If binary logs turn out to be the wrong technological decision, I'm sure we'll figure that out and change over to text logs again. All it would take is a few key savvy users losing their logs to journald corruption and the change in the wider "ecosystem" would be made. But if all goes well... then what's to complain about? :-D


That's not a counterargument to the point that journalctl's usefulness is independent of binary logs, right? In fact, when I use that nice tool for querying, I don't really care how it stores logs under the hood.


Right, I'm sure you could do the same thing with grep and a classic /var/log/messages - in fact there's probably something to do this already. Or you'd find a gz from the day in question and read all of it. Just happened that I'd recently read the man page for journald and that was something I recalled.



