

Proposal for a text-based, typed, columnar data format - jimktrains2
http://jimkeener.com/posts/TDFrev2

======
ggchappell
Looks like a good idea overall. Getting a format like this widely accepted
would be tricky. Good tools that use it would certainly help.

A few comments/quibbles concerning the format itself:

(1) I think the realities of the modern world are such that you'll need to
embrace Unicode a bit more warmly and deal robustly with files containing
non-ASCII characters. (Some of those ordinary text editors you want to use will
handle UTF-8 transparently.)

I would suggest:

- Officially specifying UTF-8 as the default character encoding, to be used
if no character encoding is specified with some other mechanism.

- Allowing specification of a different encoding within the file format. (You
already have an HTTP-style header; you _could_ use a Content-Type line to
specify encoding in the same way HTTP does.)
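
To make that suggestion concrete, here is a rough Python sketch of how a reader might pick the encoding. It assumes the header block ends at the first blank line and that the charset is given HTTP-style on a Content-Type line; the `text/tdf` media type and the function names are made up for illustration and are not part of the proposal.

```python
import re

DEFAULT_ENCODING = "utf-8"  # suggested default when nothing else is declared


def detect_encoding(raw: bytes) -> str:
    """Return the charset named on a Content-Type header line, else UTF-8."""
    # Assumption: headers come first and end at the first blank line.
    head = raw.split(b"\n\n", 1)[0]
    # Header names and charset labels are ASCII, so decode leniently.
    for line in head.decode("ascii", errors="replace").splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            match = re.search(r'charset\s*=\s*"?([\w.:-]+)"?', value, re.IGNORECASE)
            if match:
                return match.group(1).lower()
    return DEFAULT_ENCODING


def decode_file(raw: bytes) -> str:
    """Decode the whole file using the declared or default encoding."""
    return raw.decode(detect_encoding(raw))


# Hypothetical file that declares Latin-1 in its header.
latin1_file = b"Content-Type: text/tdf; charset=ISO-8859-1\n\nna\xefve\n"
# Hypothetical file with no declaration, so UTF-8 is assumed.
utf8_file = b"Name: example\n\ncaf\xc3\xa9\n"
```

A file without a Content-Type line simply falls through to UTF-8, which is the behaviour most text editors already give you.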

(2) You want the format to be editable using a standard text editor, but you
allow null-terminated data as a type. You can't have both.

(3) You're using "," as a separator with lower precedence than ";". That goes
against common usage in both programming languages and English text. Also your
header, following the HTTP convention, is "key: value".

Putting those together, doesn't it make more sense to do something like this:

    
    
      Field_name:Type,Type; 2nd_field_name:Type
    

Etc.

(The above looks more natural to _me_, anyway.)
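
For what it's worth, a definition line in that style is easy to take apart. The following is only a sketch of the syntax suggested above (";" between fields, ":" between name and types, "," between types); `parse_field_defs` is a hypothetical helper, and it ignores escaping entirely.

```python
def parse_field_defs(line: str) -> list[tuple[str, list[str]]]:
    """Parse 'name:Type,Type; other:Type' into (name, [types]) pairs."""
    fields = []
    for chunk in line.split(";"):  # ';' separates whole field definitions
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';'
        name, sep, types = chunk.partition(":")  # ':' separates name from types
        if not sep:
            raise ValueError(f"missing ':' in field definition: {chunk!r}")
        # ',' binds tighter than ';', so it only separates types within a field
        type_list = [t.strip() for t in types.split(",") if t.strip()]
        fields.append((name.strip(), type_list))
    return fields


parsed = parse_field_defs("Field_name:Type,Type; 2nd_field_name:Type")
```

The nesting of the two `split` calls mirrors the precedence: the outer separator is the weaker one, just as in English punctuation.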

(4) Your escaping looks odd. The explanation about awk doesn't quite do it for
me. Why the strings of "$" and "%"?

To put it differently, why not use escapes in a more traditional style,
something like the following?

    
    
      ; %;
      , %,
      : %:
      % %%
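
As a sanity check that the scheme above round-trips, here is a small Python sketch. It assumes "%" is the only escape character and that just the four characters in the table are special; the function names are mine, not from the spec.

```python
SPECIAL = ";,:%"  # the four characters from the table above


def escape(text: str) -> str:
    """Prefix each special character with '%'."""
    return "".join("%" + ch if ch in SPECIAL else ch for ch in text)


def unescape(text: str) -> str:
    """Reverse escape(); reject a dangling or unknown escape."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "%":
            nxt = next(chars, None)  # the character being escaped
            if nxt is None or nxt not in SPECIAL:
                raise ValueError("dangling or unknown escape")
            out.append(nxt)
        else:
            out.append(ch)
    return "".join(out)


def split_escaped(text: str, sep: str) -> list[str]:
    """Split on sep, skipping any separator that is preceded by '%'."""
    parts, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == "%":
            # Keep the escape pair intact; unescape() handles it later.
            current.append(ch)
            current.append(next(chars, ""))
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


raw_value = "a;b,c:d%e"
escaped = escape(raw_value)
fields = [unescape(p) for p in split_escaped("a%;b;c", ";")]
```

Because every escape is exactly two characters, a splitter only has to look one character ahead, which is the main attraction over variable-length runs of "$" and "%".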

~~~
jimktrains2
I think I do like using the headers to define fields better:
[http://jimkeener.com/posts/TDFrev3](http://jimkeener.com/posts/TDFrev3) That
allows it to fall back to being a simple CSV if all the headers are removed.

I think it looks a lot better and is more readable. I also simplified the
types a bit (most notably, I removed the null-terminated and Pascal strings).

------
jimktrains2
The orderly, typed part of me is always offended by CSV files even though they
are fairly useful. Having a text-based data exchange format that is typed
and doesn't carry per-record field overhead is a good thing, I feel.

I would greatly welcome any comments and experience with anything similar.
After I'm pretty confident with the format I would like to write a reference
library for it.

Note to mods: please don't change the title I submitted under. The post title
doesn't make much sense outside my blog; I should change it but cannot at
the moment. In my defence, the blog title just references the spec because
I'm not completely finished with it. I'm close and hope the HN community
can give me feedback before I write an implementation.

