Hacker News
Labeled Tab-separated Values (ltsv.org)
21 points by naoya 1476 days ago | 9 comments

So what is the purpose of replacing a common format with one that wastes space repeating labels? Many tools already support the combined log format, and you can look up the fields in the config if you really can't figure them out on your own.

Also, this linkbait title is just asking for a mod edit.

> So what is the purpose of replacing a common format with one that wastes space repeating labels?

Some time ago, we had to migrate dozens of HTML forms from legacy servers. We ended up implementing a generic forms handler to process all form submissions.

Initially, we logged all submissions to simple tab-delimited files. But as it turns out, some HTML field types, when left blank by the user, leave no trace in the query string.

So plain tab-delimited was not an option, and the answer turned out to be exactly this format.
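To make the failure mode concrete, here is a minimal Python sketch, assuming the handler simply writes fields in the order the browser sends them. The form and its field names (`name`, `subscribe`, `comment`) are hypothetical; an unchecked checkbox is modeled by leaving `subscribe` out of the query string, which is what browsers do. It also assumes field names contain no `:` and values contain no tab or newline, which a real handler would have to check.

```python
from urllib.parse import parse_qsl

def to_tsv(query_string: str) -> str:
    """Positional output: values only, in the order the browser sent them."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    return "\t".join(value for _, value in pairs)

def to_ltsv(query_string: str) -> str:
    """Labeled output: every value carries its field name."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    # Assumption: names contain no ':' and values contain no tab/newline.
    return "\t".join(f"{name}:{value}" for name, value in pairs)

# Hypothetical form: when the checkbox is unchecked, "subscribe" is simply absent.
checked = "name=Alice&subscribe=on&comment=hi"
unchecked = "name=Bob&comment=hi"

print(to_tsv(checked).split("\t"))    # column 2 is the checkbox
print(to_tsv(unchecked).split("\t"))  # column 2 is now the comment
print(to_ltsv(unchecked))             # labels keep each value's meaning
```

With positional output, a reader has no way to tell which column went missing; with labels, the absence is harmless.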

It's difficult to extend the combined format, even though the demand to write more information, such as response time, to access logs is increasing.

I'm not sure I parsed your comment correctly, but it's trivial to modify the logging format to add things like response time. "%D" for Apache will add response time.

This is orthogonal anyway, because if you can add more information with a label, you can add it without a label too.

Edit: I apparently missed your point. The issue about being able to modify the logging format at arbitrary times without breaking tools seems to be the main concern. Do people typically change their Apache logging format from time to time?

Yes, I think so. There is no doubt that analysing access logs is becoming more important.

Some people, including me, have had to customise parsers after extending the combined format with several fields. It's annoying to rewrite the regular expressions in the parsers of the tools we use every time the log format changes, and to remember the meaning and order of the added fields.
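A sketch of that regex problem in Python. The pattern follows Apache's documented combined format; `COMBINED_WITH_D` is a hypothetical variant for the same format with `%D` (response time in microseconds) appended. The `$` anchor is a choice made here so that an unexpected extra field is detected rather than silently ignored; either way, every tool that parses the log needs its own updated pattern.

```python
import re

# Apache "combined" format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical extended format with %D appended: the pattern must be rewritten,
# and every reader must remember what the new trailing field means.
COMBINED_WITH_D = re.compile(COMBINED.pattern[:-1] + r' (?P<reqtime_us>\d+)$')

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
line_d = line + " 1234"

print(COMBINED.match(line) is not None)             # True
print(COMBINED.match(line_d) is not None)           # False: old regex rejects the new format
print(COMBINED_WITH_D.match(line_d)["reqtime_us"])  # 1234
```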

With LTSV, parsers do not need to be modified in those cases. Logs are easy to extend, and easy to process by label.
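A minimal LTSV line parser in Python, assuming the basic rules described at ltsv.org: fields are separated by tabs, and each field is split into label and value at the first `:`. Splitting only at the first colon matters because values such as timestamps contain colons. The labels in the usage example are illustrative.

```python
def parse_ltsv_line(line: str) -> dict[str, str]:
    """Parse one LTSV record into a dict mapping label -> value."""
    record = {}
    for field in line.rstrip("\r\n").split("\t"):
        if not field:
            continue  # tolerate empty fields, e.g. a trailing tab
        label, sep, value = field.partition(":")  # split at the FIRST colon only
        if not sep:
            raise ValueError(f"field has no label: {field!r}")
        record[label] = value
    return record

old = "host:127.0.0.1\ttime:[10/Oct/2000:13:55:36 -0700]\tstatus:200"
new = old + "\treqtime:1234"  # field appended later; parser unchanged

print(parse_ltsv_line(old)["status"])   # 200
print(parse_ltsv_line(new)["reqtime"])  # 1234
```

The same function reads both the old and the extended lines; a consumer that only looks at `status` never notices the new `reqtime` field.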

Ok, it's sensible to make these logs easier to parse.

I don't understand why an entirely new Tag-Value scheme was invented though, and this article doesn't attempt to justify it. Maybe it's not new and I just haven't heard of it?

Why not use JSON, ASN.1 BER, or any other scheme with existing, mature encoders and parsers?

A lot of tooling, especially in legacy processes, is built around record-per-line formats with loose specifications, such as tab-separated values and comma-separated values. The advancement here is that the values are labeled rather than positional. The use case is when you want to take some legacy process (like some crazy bash script a sysadmin wrote 8 years ago) and tweak it so that you can add and remove columns in its text files more easily.
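As an illustration of adding and removing columns without breaking consumers, here is a hypothetical `select_labels` helper in Python that works like `cut -f` but selects by label instead of by position. Missing labels are printed as `-`, a placeholder chosen here by analogy with Apache's convention for empty fields.

```python
def parse_fields(line: str) -> dict[str, str]:
    """Split an LTSV line into {label: value}, splitting each field at its first ':'."""
    fields = {}
    for field in line.rstrip("\r\n").split("\t"):
        label, sep, value = field.partition(":")
        if sep:  # this sketch silently skips empty or unlabeled fields
            fields[label] = value
    return fields

def select_labels(line: str, labels: list[str]) -> str:
    """Emit the requested labels as plain TSV, in the requested order."""
    fields = parse_fields(line)
    return "\t".join(fields.get(label, "-") for label in labels)

lines = [
    "host:10.0.0.1\tstatus:200\tsize:512",
    # Same kind of record, with fields reordered and a new one added:
    "reqtime:830\tsize:128\thost:10.0.0.2\tstatus:404",
]
for line in lines:
    print(select_labels(line, ["host", "status", "reqtime"]))
```

Downstream scripts that expect a fixed column order can keep working off the output of such a filter, while the log itself is free to grow.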

With LTSV, we don't need any special parser.

See also: http://ltsv.org/faq.html

I think this is sort of clever. Parsing is simple (split the line on tabs, then split each field at the first `:'; there's no `:' in the label and no escaping), and you can add, remove, or reorder fields freely. By the time you gzip it up, it seems like repeating the labels on every row would not add that much weight.
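One way to check the gzip intuition, sketched in Python. The records are synthetic, generated with a fixed seed, and the label set is hypothetical, so the ratios this prints say nothing about real logs. Point it at a real access log to get meaningful numbers.

```python
import gzip
import random

LABELS = ["host", "time", "req", "status", "size", "reqtime"]  # hypothetical labels

def make_rows(n: int, seed: int = 0) -> list[list[str]]:
    """Generate synthetic access-log rows (NOT real data) with a fixed seed."""
    rng = random.Random(seed)
    return [
        [
            f"10.0.{rng.randrange(256)}.{rng.randrange(256)}",
            str(1360000000 + i),  # epoch seconds, one request per second
            f"GET /item/{rng.randrange(100000)} HTTP/1.1",
            rng.choice(["200", "200", "200", "304", "404"]),
            str(rng.randrange(100, 50000)),
            str(rng.randrange(1000, 500000)),
        ]
        for i in range(n)
    ]

def sizes(rows: list[list[str]]) -> dict[str, int]:
    """Byte sizes of the same rows as plain TSV and as LTSV, raw and gzipped."""
    tsv = "".join("\t".join(r) + "\n" for r in rows).encode()
    ltsv = "".join(
        "\t".join(f"{label}:{value}" for label, value in zip(LABELS, r)) + "\n"
        for r in rows
    ).encode()
    return {
        "tsv": len(tsv),
        "ltsv": len(ltsv),
        "tsv.gz": len(gzip.compress(tsv)),
        "ltsv.gz": len(gzip.compress(ltsv)),
    }

if __name__ == "__main__":
    s = sizes(make_rows(10_000))
    print(s)
    print(f"raw overhead:     {s['ltsv'] / s['tsv']:.2f}x")
    print(f"gzipped overhead: {s['ltsv.gz'] / s['tsv.gz']:.2f}x")
```

Because gzip's LZ77 stage replaces repeated substrings with back-references, the repeated labels should cost much less after compression than before, but how much less depends on the data.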
