
Right, but this isn't an argument about log formats. You're making a bigger argument about workflows, and you're saying that yours is unconditionally better. In your environment, it might be most appropriate to put in the up-front investment to totally control all your log formats. Within your context, you get to define a lowest common denominator which isn't text, and it sounds like that makes sense. With the services you run, you might be able to dictate that the log formats are restrictive enough that writing a parser for each one isn't a problematic overhead.

To get the benefits you're claiming, the storage format of your logs is actually irrelevant. If you're going to run an environment where you exert that much control over the output of your applications, it doesn't matter when you parse the logs. You could do your parsing with grep and awk as the very last step before the user sees results, and you'd see the same benefits. Parsing up-front, assuming you know what data you can safely throw away, might appear to some as a premature optimisation.
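To make that concrete, here's a rough Python sketch of last-step parsing (the line format, regex, and names are purely illustrative assumptions, not anything from your setup): the logs stay as plain text on disk, and fields are only extracted at query time, much like piping through grep and awk.

    import re

    # Illustrative only: parse plain-text logs at query time, the moral
    # equivalent of `grep ... | awk ...` as the last step before the user
    # sees results. The line layout and field names are assumptions.
    LINE = re.compile(r"(?P<ts>\S+) (?P<host>\S+) (?P<msg>.*)")

    def query(path, wanted_host):
        with open(path) as f:
            for line in f:
                m = LINE.match(line)
                if m and m.group("host") == wanted_host:
                    yield m.group("ts"), m.group("msg")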

> We have well documented tools and workflows, so anyone new to the system can catch up and start working with the logs within minutes.

It sounds like this is something which could be usefully open-sourced, to show how it's done.

> Our lowest common denominator is not text, because we control the environment, and we can raise the bar. Being able to do that is - I believe - important for any admin.

It's a question of what you choose to optimise for. Pre-parsed binary logs in a locked-down environment might be as flexible as freeform text, but I'd need to see a running system to properly judge.




> you're saying that yours is unconditionally better

I don't think I'm saying that. The article presents two setups and a few related use cases, where I believe binary log storage is superior.

> With the services you run, you might be able to dictate that the log formats are restrictive enough that writing a parser for each one isn't a problematic overhead.

I don't need to dictate all log formats. If I can't parse one, I'll just store it as-is, with some meta-data (timestamp, origin host, and so on). My processed logs do not need to be completely uniform. As long as they have a few common keys, I can work with them.

For some apps or groups of apps, I can create special parsers, but I don't necessarily need that from day one. If I'm ok with only new logs being parsed according to the new rules (and most often, I am), I can add new rules anytime.
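As a very rough Python sketch of that (all names and formats here are made up for illustration, not our actual tooling): lines no parser claims still become records with the common keys, and app-specific rules can be registered at any point, affecting only logs processed from then on.

    import datetime

    APP_PARSERS = []  # (predicate, parser) pairs; new rules can be added anytime

    def register(predicate, parser):
        APP_PARSERS.append((predicate, parser))

    def to_record(raw_line, origin_host):
        # Every record gets the common keys, whether or not a parser matches.
        record = {
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "host": origin_host,
            "message": raw_line.rstrip("\n"),
        }
        for matches, parse in APP_PARSERS:
            if matches(raw_line):
                record.update(parse(raw_line))  # extra, app-specific keys
                break
        return record

    # A rule added later; only logs processed from now on pick it up.
    register(lambda l: l.startswith("nginx:"),
             lambda l: {"app": "nginx", "request": l.split(" ", 1)[1]})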

> Parsing up-front, assuming you know what data you can safely throw away, might appear to some as a premature optimisation.

>> We have well documented tools and workflows, so anyone new to the system can catch up and start working with the logs within minutes.

> It sounds like this is something which could be usefully open-sourced, to show how it's done.

LogStash is a reasonable starting point. Our solution has a lot in common with it, at least at the idea level.

> Pre-parsed binary logs in a locked-down environment might be as flexible as freeform text, but I'd need to see a running system to properly judge.

Only our storage is binary; that is all the article is talking about. Within that binary blob there are many traces of freeform text, mostly in the MESSAGE keys of application logs we care less about (and therefore don't parse beyond basic syslog parsing). You still have the flexibility of freeform text, even if you store it in a binary format.
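A minimal illustration (the field names are made up and loosely journald-flavoured, and this is not our actual format): the envelope is binary, but the MESSAGE value inside it is still freeform text.

    import json, struct, time

    record = {
        "_HOSTNAME": "app01",
        "SYSLOG_IDENTIFIER": "myapp",
        "PRIORITY": 6,
        "TIMESTAMP": int(time.time() * 1e6),
        # Freeform text, stored untouched inside the binary envelope.
        "MESSAGE": "worker 3 reconnecting to db after timeout (attempt 2)",
    }

    payload = json.dumps(record).encode("utf-8")
    with open("logs.bin", "ab") as f:
        f.write(struct.pack(">I", len(payload)))  # 4-byte length prefix
        f.write(payload)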



