
Comma Separated JSON - tambourine_man
http://www.kirit.com/Comma%20Separated%20JSON
======
tlrobinson
So literally just JSON without the "[" and "]," delimiting rows, and another
"[" at the beginning and "]" at the end of the file?

There are already streaming JSON parsers/serializers, why not just use (or
write) one of those instead of inventing yet another file format?

There's also [http://ndjson.org/](http://ndjson.org/) and
[http://jsonlines.org/](http://jsonlines.org/)

~~~
simlevesque
It's not just JSON without [ and ]; it's a set of JSON objects represented as a
table. The first line sets the property keys and every other line contains the
values. It's a representation that needs its own parser to be read, but it's
much faster for a row-based database to generate than full JSON. Plus it's
smaller than minified JSON.
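
Going by that description (the field names below are hypothetical), a minimal
Python sketch of a producer might look like this:

```python
import json

# Hypothetical rows from a row-based store.
rows = [
    {"id": 1, "name": "Ada", "tags": ["x"]},
    {"id": 2, "name": "Bob", "tags": []},
]
keys = list(rows[0])  # the shared keys go on the first line

def to_csj(rows, keys):
    # First line: the keys as JSON strings, comma-separated.
    # Each later line: one row's values as JSON, with no surrounding [ ].
    lines = [",".join(json.dumps(k) for k in keys)]
    for row in rows:
        lines.append(",".join(json.dumps(row[k]) for k in keys))
    return "\n".join(lines)
```

Each row can be emitted as soon as it is read from the database, with no
closing bracket to hold back until the end.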

~~~
dalke
The spec doesn't specify what the first line contains.

It only says "Semantically a CSJ file is an array of JSON objects which share
a common set of keys." You used your knowledge of how these representations
usually work to infer that the first line contains a list of keys for the
remaining lines.

Some obvious questions for a table spec include: 1) must those keys be
strings? 2) Are duplicate keys allowed? 3) Can different rows have a different
number of columns? Or must the writer generate null fields?

FWIW, I've used JSON Lines for my own work. It's very easy to read a line and
(in Python), json.loads(line). How should I parse this "JSON without [ and ]"?

There's not a built-in function for it. Either I have to write my own JSON
tokenizer, or I have to wrap the line in '[' and ']' and parse _that_. Both
are more complicated than using JSON Lines. You'll notice they only gave
output generation timing numbers, not input parsing.

If a JSON-based table format is important, why not layer those same table
semantics on top of JSON Lines, rather than make a format which is harder to
read?
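
For what it's worth, the bracket-wrapping workaround can be sketched in a few
lines of Python, assuming one record per line as in JSON Lines:

```python
import json

def parse_csj_line(line):
    # Wrap the bare comma-separated values in '[' and ']' so the
    # standard JSON parser can handle them as an array.
    return json.loads("[" + line + "]")

header = parse_csj_line('"id","name"')   # first line: the keys
values = parse_csj_line('1,"Ada"')       # later lines: the values
record = dict(zip(header, values))
```

It works, but it leans on the assumption that no cell contains a literal
newline, which is exactly the kind of detail the spec would need to pin down.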

~~~
simlevesque
The format for the keys is the same as a javascript object. They are unique
strings or numbers. It's already in the Ecmascript standard for objects.

~~~
dalke
Could you help me understand your response?

I find it hard to match your statement to the JSON spec(s). Neither RFC 7159
nor ECMA 404 use the term "key". The standard for objects requires a string
for the 'name' of an object member, which is certainly what you refer to.
However, JSON does not allow a number.

I tried looking at the ECMAScript spec. In [http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf](http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf) :

> An Object is logically a collection of properties. ... Properties are
> identified using key values. A property key value is either an ECMAScript
> String value or a Symbol value.

That 'Symbol' has nothing to do with numbers, so only strings make sense for
an external representation.

Perhaps you mean to include the integer from the Array object? The spec says
"An Array object is an exotic object that gives special treatment to array
index property key ... A String property name P is an array index if and only
if ToString(ToUint32(P)) is equal to P and ToUint32(P) is not equal to
2^32-1."

Is that limited range of integers what you mean when you say that numbers are
allowed as keys?

Regarding uniqueness, the CSJ spec says nothing about uniqueness, so how do
you know the keys must be unique? It doesn't inherit that from JSON. RFC 7159
says that in an object "The names within an object SHOULD be unique.";
uniqueness is not required. It then adds "When the names within an object are
not unique, the behavior of software that receives such an object is
unpredictable."

Therefore, I don't see how you drew the conclusion you did.
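
As a concrete example of that unpredictability, CPython's json module silently
keeps the last duplicate name:

```python
import json

# RFC 7159 only says names SHOULD be unique; parsers are free to
# differ on what happens when they are not.
obj = json.loads('{"a": 1, "a": 2}')
# CPython keeps the last value seen; other parsers may keep the first,
# raise an error, or preserve both members.
```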

------
dclowd9901
I appreciate the problem this is trying to solve, but it could be more easily
remedied by a JSON stream processor than reinventing the wheel of data
marshalling (yet again). JSON is so simplistic in structure, it's trivial to
know when you've met the conditions of a completed segment of structure.
There's no reason why a stream parser couldn't handle this well.

~~~
proteusguy
How about twice the bandwidth, twice the memory, and at least twice the CPU
work to parse full JSON objects compared to the proposed CSJ format?

------
gregmac
This is actually a very elegant way of formalizing CSV. For most apps, if you
were to simply output "CSV files" using this encoding, really no one consuming
them would know any different -- except you could actually point them at a
spec being used.

I've been in arguments with customers before where they complained we were
quoting strings, and CSV files shouldn't have quotes, and it would have been
nice to say "We're using CSJ, please just parse that way".

I think the name is perhaps misleading though, as it sounds more like it's a
variant of JSON than a formalization of CSV. Perhaps "JSON flavored CSV" would
be more appropriate?

~~~
LukeShu
I mean, you could always point them to RFC 4180.

~~~
gregmac
Okay talk about ironic. Yesterday in our ops meeting, it came up that some
tool another department had built was exporting csv "with extra quotes that
were causing problems" (I've had no involvement in any of this). My ears
perked up remembering this comment and I was able to point out this RFC, so
hopefully it will save them some time in arguing about csv formatting. Thanks!

------
smegel
So you save some space on top-level keys, which are stored once in the
header... but you waste space by having to flatten data structures into a
table to start with, with key values being repeated across lines; the more
deeply nested the data, the worse it gets.

Stick with JSON document per line.

~~~
thewisenerd
is there necessarily a need to repeat key values over lines?

why not have all possible object definitions in the header and data as a
single huge array?

~~~
athenot
You're describing the type-length-value format, used in some places in digital
video or in Open Sound Control.

Basically you have a TYPE (fixed width, eg. 16 bits), a LENGTH (again fixed)
and a VALUE which is simply a stream of bytes that goes on for LENGTH bytes
and is interpreted based on what TYPE it is. You can nest your fields and get
very creative in quite a compact way.

One could represent a JSON document with TLV by adding an associative map
between the field names and their type numbers, then stream away.
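
A rough Python sketch of the TLV scheme described above (the 16-bit widths are
illustrative, not taken from any particular spec):

```python
import struct

def tlv_encode(fields):
    # Each field: 16-bit TYPE, 16-bit LENGTH, then LENGTH raw value bytes.
    out = b""
    for ftype, value in fields:
        out += struct.pack(">HH", ftype, len(value)) + value
    return out

def tlv_decode(data):
    fields, i = [], 0
    while i < len(data):
        ftype, length = struct.unpack_from(">HH", data, i)
        i += 4
        fields.append((ftype, data[i:i + length]))
        i += length
    return fields
```

Nesting falls out naturally: a VALUE can itself be a TLV-encoded byte string.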

I always loved the simplicity of TLV. Though nowadays, I'm not sure this has
much of an advantage over gzipped JSON.

~~~
ubertaco
Off-topic: Howdy, Mr. Thénot! Crazy reading the comments and seeing your
username here. Hope life is goin' well for ya with the HealthNinja stuff --
haven't seen you since I left Wellcentive (around the same time as mbonnette,
if that gives you an idea of who I am).

------
henvic
Elasticsearch's bulk API deals with streaming by using newline-delimited JSON
docs instead of an array. If you are interested in using this Comma Separated
JSON, maybe you want to try this simpler approach first (though the comma-
separated version seems to be lighter; if I really wanted to reduce weight I'd
probably consider protobuf).

Reference:
[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)

------
bevacqua
Or: just use \n and also be able to read individual rows. Many programs in the
wild do this.

~~~
jameshart
Newline-delimited JSON? I've used it informally, but I've never had the nerve
to suggest it as an actual documented interchange format. It makes a lot of
sense for things which CSV is often used for - dumping out big tables of data
into files for shipping over FTP or similar - or even for logfiles with
structured data entries. The tricky thing is having to restrict the JSON on
each line to not contain any newline characters. Even though all legitimate
JSON can be minified into a form which has the same semantics but contains no
newlines, "JSON with no newlines in it" isn't a well defined standard.
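
In practice most serializers already emit that newline-free form by default;
Python's json.dumps, for example, escapes any embedded newline inside strings:

```python
import json

doc = {"msg": "line1\nline2", "n": 1}

# The embedded newline is escaped as \n inside the string, so the
# serialized form contains no literal newline characters.
line = json.dumps(doc)

record = json.loads(line)  # round-trips, one record per line
```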

~~~
bevacqua
"JSON with no commas in it" isn't a thing either. That's not what's being
discussed here.

------
mythz
I've been using my custom JSV Format for years:

[https://github.com/ServiceStack/ServiceStack.Text/wiki/JSV-Format](https://github.com/ServiceStack/ServiceStack.Text/wiki/JSV-Format)

Which is just JSON with CSV-escaping, which ends up being faster, more
concise, and more human-friendly, since quotes are only necessary when values
contain special delimiter chars. It's just not as interoperable or ubiquitous
as JSON, so it's only an option when you control both ends.

------
ctulek
If you are using a proper CSV library you don't have any of the problems this
format is trying to solve. The CSV format does not have any of the issues
mentioned; it's either a buggy CSV library, or some lazy developer who thinks
CSV is just fields separated by commas and writes 2 lines of code instead of
using a proper CSV library.

~~~
danso
Which CSV library is able to infer that a value is a number instead of a
string?

~~~
ctulek
CSV for sure requires a convention to be shared between the producer and the
consumer of the data. Once you have that I don't think you will need to deal
with this problem.

To be honest, what I didn't like is the tone of the article. It is just that
this statement is wrong: "But the problem is that CSV isn't really a well
defined file format with a well defined syntax." There is RFC 4180.

------
guelo
I would call this "JSON encoded CSV" since it appears to be a strict subset of
CSV, not JSON.

~~~
dalke
A cell can embed JSON objects and arrays, so it isn't a strict subset.

------
ptman
I thought MIME types had to go through an application process or start with x-

------
skybrian
I think I'd rather stick to the format supported by Go and the 'jq' command,
where each line is actually JSON.

But if it became popular, this format seems easy to support.

------
th0mat
JSON enhanced CSV might be a more suitable name. After all it is valid CSV and
invalid JSON.

------
mixmastamyk
Have needed something like this a few times, would be great if it was
standardized.

~~~
thewisenerd
inb4: [https://xkcd.com/927/](https://xkcd.com/927/)

the reason json is popular is because it's widely used and is easily parsed
(without having to call any extra libraries).

standardizing this would mean agreeing on a complete working alternative
that works everywhere, just like json (plus the added perks), which will take
a lot of (dis)agreement, especially given that there are already a dozen other
standards trying to replace json.

------
querulous
no one uses csv by choice, they use it because they have to work with tools
(excel) that can only produce/consume csv

given you don't have that constraint there are much better options (json
lines, avro, parquet, protobufs...)

~~~
vortico
I use TSV often when I want to easily dump data from a C program to gnuplot or
similar. It's more of an ad hoc file format than a serious one though.

------
jameshart
So this is something that looks quite a lot like CSV, in some cases produces
identical output to CSV, but which isn't compatible with CSV.

~~~
dalke
To be fair, most CSV isn't compatible with CSV.

Especially with the CSV in RFC 4180, which among other things requires a CRLF.

------
gweinberg
doesn't look very JSONy without any curly braces.

