
JSON Lines - breck
http://jsonlines.org/examples/
======
crdoconnor
This is not a good replacement for CSV. It is better to just use regular JSON
for that.

If you have a file containing this:

    
    
      ["Name", "Session", "Score", "Completed"]
      ["Gilbert", "2013", 24, true]
      ["Alexa", "2013", 29, true]
      ["May", "2012B", 14, false]
      ["Deloise", "2012A", 19, true] 
    

And it gets chopped in half (a problem that is not that uncommon), you will
get this:

    
    
      ["Name", "Session", "Score", "Completed"]
      ["Gilbert", "2013", 24, true]
      ["Alexa", "2013", 29, true]
    

Which is still valid. This will cause missing data rather than a clear error
message. May and Deloise may end up not getting their scores and there's a
good chance nobody will notice.

By contrast when the file is represented as regular JSON:

    
    
      [
        ["Name", "Session", "Score", "Completed"],
        ["Gilbert", "2013", 24, true],
        ["Alexa", "2013", 29, true],
        ["May", "2012B", 14, false],
        ["Deloise", "2012A", 19, true]
      ]
    

(not a completely ideal representation, but you get the picture)

Chopping _this_ in half will give you a clear error, forcing the developer to
recover from it before continuing.

Same principle as strong vs. weak typing.

Individual JSON snippets in lines are a good replacement for plain text-only
logs of indeterminate length, however.
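
The two failure modes can be demonstrated in a few lines of Python (a sketch; the truncated strings stand in for a file chopped at a line boundary):

```python
import json

# A regular JSON document chopped off mid-array fails loudly.
truncated_json = '[\n["Gilbert", "2013", 24, true],\n["Alexa", "2013", 29, true'
try:
    json.loads(truncated_json)
except json.JSONDecodeError as e:
    print("caught:", e)

# The same truncation of a JSON Lines file parses without complaint,
# silently dropping the rows that were cut off.
truncated_jsonl = '["Gilbert", "2013", 24, true]\n["Alexa", "2013", 29, true]'
rows = [json.loads(line) for line in truncated_jsonl.splitlines()]
print(len(rows))  # 2 rows survive; the missing ones raise no error
```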

~~~
nulltype
Regular CSV also has that "problem". Being able to cut the file into pieces is
actually really useful when you want to, for instance, process each piece on a
different computer. Line-delimited JSON preserves that advantage.

~~~
gkop
Exactly. If JSONL takes off, I'd be glad to use it instead of YAML for
streaming serialization/deserialization purposes, if only because JSONL is
more space efficient.

------
srijs
There is also NDJSON (aka Newline-delimited JSON):
[http://ndjson.org/](http://ndjson.org/)

(disclosure: I'm the author of the Haskell ndjson-conduit library)

~~~
sandstrom
Any thoughts on merging the two efforts? They seem similar enough that you
could focus effort on one of them.

~~~
johnhenry
Interesting idea. If I'm not mistaken, JSON Lines looks like a subset of
NDJSON?

------
dlsym
I wonder why we even bother wrapping rows in array brackets instead of doing
it implicitly.

    
    
        ["Name", "Session", "Score", "Completed"]
        ["Gilbert", "2013", 24, true]
    

would be

    
    
        "Name", "Session", "Score", "Completed"
        "Gilbert", "2013", 24, true
    

Which is, well, more or less just CSV. This should work with objects too:

    
    
        "name": "Jane", "key": { "nested": "object" }, "foo": ["bar"]
    
    

Or mixed:

    
    
        "Foo", { "fnord": 23 }, true

~~~
sandstrom
One benefit with each line being valid JSON is that writing a reader/generator
for this format is fairly simple. One can use existing JSON libraries as-is,
with some extra consideration to the line-like nature of this format.

For the above to work you'd have to use a custom JSON-parser when reading the
lines.
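
A minimal reader and generator along those lines, built on the standard json module, might look like this (a sketch, not a reference implementation):

```python
import json

def write_jsonl(file, records):
    """Serialize each record as one line of JSON."""
    for record in records:
        # json.dumps never emits raw newlines (they are escaped inside
        # strings), so one record per line is guaranteed as long as we
        # don't ask it to pretty-print.
        file.write(json.dumps(record) + '\n')

def read_jsonl(file):
    """Yield one parsed record per non-empty line."""
    for line in file:
        line = line.strip()
        if line:
            yield json.loads(line)
```

The only line-specific consideration is the one in the comment: the serializer must not pretty-print across lines.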

\------------

Could also be mentioned that JSONLines-like formats are already pretty common
in log-files, database exports etc. So this is more about giving it a name and
standardizing it. Which I think is great!

~~~
mtdewcmu
>> For the above to work you'd have to use a custom JSON-parser when reading
the lines.

Not necessarily. You could just re-add the square brackets before passing the
lines to the JSON parser.

That would make more work for the machine, because you'd probably have to copy
the whole string. But any time you can make less work for the human by making
the machine work a little harder, it's usually a win.
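
For the hypothetical bracket-less rows above, the wrapping trick is one line per record (a sketch; the string concatenation is the extra copy being traded away):

```python
import json

def parse_bare_line(line):
    # Re-add the brackets so a stock JSON parser can handle the line.
    return json.loads('[' + line + ']')

row = parse_bare_line('"Gilbert", "2013", 24, true')
# row == ['Gilbert', '2013', 24, True]
```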

------
itsananderson
Heh. I wrote a post about this last year (and I'm not the first)
[http://willi.am/blog/2014/07/16/storing-data-as-newline-
deli...](http://willi.am/blog/2014/07/16/storing-data-as-newline-delimited-
json-entries/)

The novel thing here is the proposal that tools like Excel should support
this format. That's definitely an interesting idea. The biggest challenge is
that JSON objects can be nested, so table tools would need a way to handle
that. Still, interesting.

~~~
laumars
Excel already supports nested data in plain XML (I say "plain", because Excel
also supports Microsoft-specific XML schemas), so it might not be that hard to
apply the same parser/deconstructor concepts to nested JSON structures as
well.

------
Rangi42
I don't see what this improves over standard JSON. Just add a comma at the end
of each line and wrap the whole Lines file in brackets, and you have a valid
JSON file. You can petition spreadsheet makers to support this subset of JSON
files, while still letting any general JSON program read and write them.

As for CSV and TSV, they're a dead-simple format to read and write, and you
can enforce a standard way to escape special characters if you aren't dealing
with arbitrary user-submitted files. With numeric data escaping isn't even
necessary, and if you have to deal with strings that might themselves contain
special characters, the ASCII control characters 0x1E and 0x1F work well as
alternatives to comma/tab and newline.

I know Python has a csv module, but I've just gotten used to writing these
snippets:

    
    
        def tsv_write(path, headers, data, sep='\t', end='\n'):
            with open(path, 'w') as file:
                file.write(sep.join(map(str, headers)) + end)
                for datum in data:
                    file.write(sep.join(map(str, datum)) + end)
        
        def tsv_read(path, headers, sep='\t', end='\n'):
            with open(path, 'r') as file:
                for line in file:
                    yield line.rstrip(end).split(sep)
    

(That tsv_read function only works if end == '\n'. Here's a somewhat-
inefficient general-purpose alternative, although I've never needed it.)

    
    
        def read_records(file, end='\n'):
            if end == '\n':
                for line in file:
                    yield line
            else:
                record = []
                while True:
                    c = file.read(1)
                    if c:
                        record.append(c)
                    if c == end or not c:
                        if record:  # skip the spurious empty record at EOF
                            yield ''.join(record)
                        record = []
                    if not c:
                        break
    
        def tsv_read(path, headers, sep='\t', end='\n'):
            with open(path, 'r') as file:
                for record in read_records(file, end):
                    yield record.rstrip(end).split(sep)

~~~
ma2rten
The advantage over standard JSON is that this is trivial to parse
incrementally, e.g. for a dataset that is too big to fit in memory.

I also wouldn't recommend reimplementing functionality that is already part
of the standard library, especially since the standard library's csv module
is implemented in C and is going to be much faster.
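
For comparison, csv-module versions of the hand-rolled snippets above stay about as short (a sketch; tab is passed as the delimiter):

```python
import csv

def tsv_write(path, headers, data):
    # newline='' is the csv module's documented way to avoid
    # platform-specific line-ending translation.
    with open(path, 'w', newline='') as file:
        writer = csv.writer(file, delimiter='\t')
        writer.writerow(headers)
        writer.writerows(data)

def tsv_read(path):
    with open(path, newline='') as file:
        yield from csv.reader(file, delimiter='\t')
```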

~~~
toomim
Actually, the approach with commas and wrapping brackets is equally trivial
to parse incrementally.

Just start by erasing the opening "[", and then read one row at a time, just
like you would without the "[" and the commas.
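
That trick can be sketched as follows, assuming the file puts one row per line with the brackets on lines of their own (as in the examples upthread):

```python
import json

def read_wrapped(file):
    """Incrementally parse a one-row-per-line JSON array.

    Assumes '[' and ']' sit on their own lines and each row ends
    with an optional trailing comma.
    """
    for line in file:
        line = line.strip()
        if line in ('[', ']', ''):
            continue
        yield json.loads(line.rstrip(','))
```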

~~~
smsm42
I would say any tool that outputs lists of things in JSON, where there is
reasonable suspicion that the output may be large, should do this, i.e. put
one thing per line with the opening and closing brackets on separate lines.
It is quite annoying to find oneself with a huge JSON dataset produced by a
naive generator that doesn't do that - when you need to extract something
like "first N elements that satisfy condition X", it is annoyingly more work
than it should be. By doing this, people who don't care still get valid JSON,
and people who do care save some time.
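
With one element per line, "first N that satisfy X" becomes a lazy pipeline that stops reading as soon as it has enough (a sketch; `condition` is a hypothetical predicate supplied by the caller):

```python
import itertools
import json

def first_n_matching(file, condition, n):
    # Generators keep this lazy: the file is parsed only as far
    # as needed to collect n matches.
    records = (json.loads(line) for line in file if line.strip())
    return list(itertools.islice(filter(condition, records), n))
```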

------
skybrian
It's good that someone gave this format a name. It isn't called anything
special in Go, but Go can read this format easily; see the Decoder example
[1].

If you don't actually need a stream, I think it's a bit nicer to write a
special header at the beginning and make the whole file a single JSON object:

    
    
      {"header": {"title": "An example", ...},
       "rows": [
      ["row1"],
      ...
      ["rowN"]
      ]}
    

[1]
[https://golang.org/pkg/encoding/json/#example_Decoder](https://golang.org/pkg/encoding/json/#example_Decoder)

------
KayEss
Been doing this for log files for years now. It's pretty convenient and
simple, but I'm not sure that any end user would want to see this -- they're
just going to want CSV.

------
mtdewcmu
I came up with an idea for a CSV replacement that's half-joke, half-serious.
It's called XML Separated Values (XSV). It's meant to be a 1:1 drop-in
replacement for CSV. It solves the ambiguity and encoding issues of CSV by
recycling XML's solutions. It has basically two tags: <c/> and <n/>, for comma
and newline, respectively.

Unfortunately, it needs a root element to be valid XML, so wrap the document
in <xsv></xsv>. Also, add an XML declaration.

Example:

    
    
      <?xml version="1.0" encoding="UTF-8"?>
      <xsv>one<c/>two<c/>three<n/>1<c/>2<c/>3<n/></xsv>
    

(Whitespace is passed-through, therefore it's significant.)

This solves the ambiguity problems of CSV without complicating CSV by
introducing things like data types (I think the virtue of CSV is its
simplicity), and it should be extremely simple to write parsers based on
either SAX or DOM.
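
As a sanity check on that claim, a DOM-based XSV parser fits in about a dozen lines of Python using the standard library's ElementTree (a sketch for the two-tag format described above):

```python
import xml.etree.ElementTree as ET

def parse_xsv(text):
    """Return rows of cells; <c/> separates cells, <n/> ends a row."""
    root = ET.fromstring(text)
    rows, row = [], []
    cell = root.text or ''          # text before the first tag
    for child in root:
        row.append(cell)
        if child.tag == 'n':        # <n/> closes the current row
            rows.append(row)
            row = []
        cell = child.tail or ''     # text between this tag and the next
    if cell or row:                 # trailing cells without a final <n/>
        row.append(cell)
        rows.append(row)
    return rows
```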

Also, the use of XML provides unobtrusive places to insert application-defined
metadata, for example:

    
    
      <xsv columns="3">
    

This remains backwards-compatible with parsers that ignore attributes.

~~~
oneweekwonder
CSV is human readable; how readable is left to the data.

XML tries to be human readable, but you need knowledge of the elements or
"tags", i.e. the metadata.

JSON Lines also tries to be human readable, and imo it does a better job
compared to XML.

But until Excel or LibreOffice implements a JSONL or XSV reader/writer, CSV
will remain the de facto standard.

~~~
collyw
In my experience XML is easier to read than JSON. I guess it depends on the
complexity of the structure.

~~~
mtdewcmu
I think one predictor of readability is the ratio of markup to data. XML gives
you some flexibility there, but XML-based formats tend to have very verbose
markup, which overwhelms the data.

To cut back on markup in XSV, I used self-closing tags for commas and
newlines. It would have been more conventional to use something like
<row><column>...</column></row>. That has some advantages, maybe, but it
doubles the number of tags (and it introduces things that are not strictly 1:1
with CSV, which could lead to ambiguity).

------
nness
> no standard column separator

Commas are the standard column separator in CSV files; it's in the name.

~~~
Shank
I'll give the author the benefit of the doubt; Excel appears to export by
default using tabs as delimiters, not commas: [https://support.office.com/en-
za/article/Import-or-export-te...](https://support.office.com/en-
za/article/Import-or-export-text-txt-or-csv-
files-5250ac4c-663c-47ce-937b-339e391393ba#bmchange_the_delimiter_that_is_used_in_)

~~~
byroot
It actually depends on the locale of the system.

If you are on a French Windows, it will use semicolons as separators, encode
in Windows cp1252 (close to latin1 but not exactly), and use CRLF (\r\n) as
line endings.

But then if you use Excel for Mac 2008, Excel will generate CSVs in MacRoman
(the old OS 9 encoding) and use CR (\r) as line endings.

Long story short, Excel is not even interoperable with itself when you use
CSVs. (Granted, there is an advanced import tool, but it's a nightmare for
the average user.)

[https://fr.wikipedia.org/wiki/MacRoman](https://fr.wikipedia.org/wiki/MacRoman)

~~~
nness
Wow, never realised it was that complicated, although the locale-dependent
separators make a lot of sense.

------
benatkin
This is great. Good to see it named and an extension proposed. Many tools
assume .json means a single JSON expression. It would be nice to have GitHub
allow one JSON expression per line.

I see someone already created a GitHub linguist issue for it:
[https://github.com/github/linguist/issues/2217](https://github.com/github/linguist/issues/2217)

------
Nemo157
I prefer JSON text sequence
([https://tools.ietf.org/html/rfc7464](https://tools.ietf.org/html/rfc7464)).
Self-healing in the event of any corruption/parser bugs is nice; JSON Lines
seems like it will have issues finding a valid object start if your data ever
contains newlines and you have some issue parsing one of the objects.

------
fenomas
Speaking as someone who just (today) was migrating game entity data from JSON
to CSV, the main reason I can see to use CSV is so you can edit it in
Excel/OpenOffice/Libre/etc.

If there were a decent visual tool (or chrome extension etc.) for editing
data in a format like this new one, I'd probably jump on it, but otherwise it
seems like anyone currently using CSV might as well stick with it.

------
bpicolo
Aka how pretty much everybody stores logs already.

------
thawkins
Mongodb mongoimport/mongoexport already uses this format. It would be nice to
get it formalized with a recognised file extension and mimetype.

------
leeoniya
It's also worth mentioning JSONH for compact representation of CSV data in
JSON:

[https://github.com/WebReflection/JSONH](https://github.com/WebReflection/JSONH)

------
bullen
I'll raise this one, I use JSON lines for version management.

Each line is a version, with only the changes.

This means you can't delete though, but for blockchain-like distributed
systems that's almost a feature.

------
sgt
I agree that CSV doesn't have a standard, but for all practical purposes -
wouldn't most people agree that the Excel dialect is the standard CSV dialect?

~~~
X-Istence
Not when you see some of the CSV data you have to parse from different
vendor's implementations. Especially when writing reporting tools.

I've seen so many different dialects, even from the same vendor... that's the
worst part.

------
ZenoArrow
Problems with CSV parsing? Just use TSV. I've never had a parsing problem with
TSV.

------
phamilton
This is how bulk operations in elasticsearch work. I'm a fan.

------
bikamonki
Nope. Why? You are breaking JSON. Why not a collection of objects? To save
bytes? It seems like one of your main motivations is 'CSV lacks standards',
and then you go and break a standard. In any case, call it something else,
but not JSON.

~~~
angersock
It's not breaking JSON... it's explicitly stating a new format, text
delimited by \n or \r\n, wherein each line must be valid JSON.

This is fine.

