
Representing and Editing JSON with Spreadsheets (2018) - fanf2
https://medium.com/@donhopkins/representing-and-editing-json-with-spreadsheets-8f8832db5239
======
xpe
When I see things like this, I try to think about:

    
    
      - this relative to my existing tools
      - think about what tools you I wish I had
      - think about what tools that other people from other disciplines wish they had
    

Tools and habits and thinking patterns are interwoven. Changing tools might
seem, in some cases, to be a waste of time, relative to a rigid, short-term
performance metrics. However, changing tools can change your thinking
patterns, which can really help with long-term growth.

~~~
DonHopkins
Totally this.

SpreadON (which I'm now calling this tool) comes from working with people who
are quite comfortable and proficient with spreadsheets, use them for modeling,
calculating, and storing data, and have a large investment in certain tools. I
didn't want to make them change the way they worked or switch from using their
favorite tools (like Excel), but I wanted to be forward-compatible with other
tools (Google Sheets). And I definitely didn't want to ask them to write raw
Python or JSON data structures.

I came up with seed of the idea more than a decade ago, working on a
nutritional analysis and questionnaire system, which was configured with huge
tables of structured data about nutritional ingredients, questions, and other
structures, each including numbers, arrays, and dictionaries, all with the
same structure.

We needed to be able to represent what is essentially a whole bunch of
identical JSON structures in a spreadsheet, in a way that was compact, easy to
produce, edit, and check.

So I came up with a dead simple way of formatting the spreadsheet headers to
declare nested dictionary and array structures with numeric values, and
implemented a parser in Python that read a CSV file and returned an array of
nested Python dicts and arrays and numbers. (This was before JSON became so
popular, but Python data structures are essentially the same as JSON, just
calling them dicts instead of objects).

After the row of headers, each subsequent row represents a dict, and the
corresponding headers were either names of the dict keys, or special markup
like "begin array <name>", "end array <name>", "begin dict <name>", and "end
dict <name>". (The name was optional if you were inside an array, but would
typically be the numeric array index). The columns under the begin/end
array/dict headers were left blank.

That worked great, and we're still using it. But using spreadsheets to
represent repetitive structured data did (as you say) change my thinking
patterns, which led me to implement a more advanced version that let you
represent JSON data types, by adding a type after the dictionary key name (or
array index), so the parser knew what type to convert the spreadsheet cell
string to. (I object to guessing like YAML, or even parsing JSON syntax: just
write the data type declaration once, then you don't have to use punctuation
like quotation marks around strings, etc.)

SpreadON has a tag called "table" that implements this idea of DRY-ly and
compactly representing repeated structures by declaring key names,
dictionaries, arrays and structures in the headers, but it has a more
convenient JSON-ish syntax using [ ] { } instead of begin/end array/dict, and
it lets you interleave the structural syntax with the name and data type
declarations in the headers, so it doesn't waste valuable columns on
structural declarations (a disadvantage of the previous technique). But you
can still leave blank columns, if that makes it easier to read and edit, and
you can even use them for comments.

But there are also many cases when you want to represent structured but non-
repetitive JSON data in spreadsheets, so I came up with a tree-like syntax for
that, which is basically like pretty-printing typed JSON into the cells of the
spreadsheet, but is more repetitive (declaring the data type for each cell,
instead of guessing it like YAML), leaves more unused space (but which can be
used for comments), and is not as compact and easily editable as rows and
columns as the "table" representation. It's really great for writing JSON
configurations of nested dictionaries, where keys aren't typically repeated,
values have different types, and you might want to add comments or
intermediate calculation formulas in the unused cells to the right.

I also implemented a "grid" tag for representing two-dimensional arrays of
identically typed values, so you don't have to write out nested arrays, or
repeat the data types for each cell. Besides not having to repeat the same
data type for every cell, there are many obvious advantages to representing
data as dense two-dimensional grids in spreadsheets (it's easy to edit, use
formulas, and integrate with other data sources), so I wanted to support that
well.

An elaboration of the "grid" tag is the "region" tag, which points to a named
region by name, and declares it a data type. That lets you store the raw data
in another sheet independently, easily adding rows and columns, without
requiring you to insert and delete rows and columns in tree-structured sheets
of JSON referring to it. You can also extract rows and columns and cells of a
named region, transpose the region, flatten the region to row major or column
major 1-dimensional arrays, get the number of rows and columns, etc.

Spreadsheet users are accustom to working with named regions, and also to
being able to format the cells in various ways, which can be useful meta-data
in itself. However, when you export a spreadsheet as a CSV/TSV file (or
download a Google Sheet via its URL as CSV/TSV), you lose all of the named
regions and formatting information.

To solve that problem, I wrote a Google Sheets script that exports all the
named regions, as well as any layers of formatting information you're
interested in, into parallel sheets (with corresponding named regions) that
you can download. That way, the JSON consumer has access to all of the named
regions as well as the formatting information from the spreadsheet! The named
regions are necessary in order for the client to convert the CSV file into
JSON, if it uses any "region" references. But they can also be used directly.
The formatting information can be converted into parallel 2D JSON arrays with
the "region" tag, and used along with the numeric or string values of the
cells.

It eventually became obvious that there needed to be a better way to
automatically define the named regions in spreadsheets, instead of creating
and maintaining them by hand. So I made a way of defining user-defined named
tags, which you could place next to regions you wanted to capture, which it
could scan for empty cells as delimiters, and it will create named regions of
rows and columns and grids relative to the position of the user-defined named
tag. It also lets you capture named regions of formatting information. The
user-defined tags and the regions they produce are defined in another sheet in
"table" format, so you can create and edit them easily. (Kind of hard to
explain here without an example, but it looks pretty obvious, works well in
practice, and has been quite useful.)

This has all been the result of sitting down with the people who know how to
use the tools, and finding out how they use them, which features are
important, what they're willing to do, and discovering a fertile common ground
between the worlds of spreadsheets and JavaScript/Python programming where we
can both work together.

------
thatsnice
I made a Google spreadsheet add-on [1] that contains 3 built-in options for
converting nested JSON to tabular format.

Method 1: drills into nested json objects and arrays and returns the key names
as headers, key values as rows. In cases with multiple nested values, they get
returned into multiple columns, differentiated by a number, e.g. orders >
products > 1, orders > products > 2.

Method 2: same as above but returns all nested data into a single column, e.g.
a single column named orders > products. This can break the association
between JSON elements, but is more convenient for certain types of analysis.

Method 3: concatenates all the elements of each nested object into a single
cell, separated with pipes.

In some cases (e.g. when the primary object of interest is nested inside
another object) there are still problems recognizing what keys should
represent rows vs columns, but in my tests the above 3 algorithms cover most
of the use cases for spreadsheets.

[1] [https://mixedanalytics.com/knowledge-base/report-
styles/](https://mixedanalytics.com/knowledge-base/report-styles/)

~~~
anon1m0us
You could put the nested data into a separate sheet with a unique identifier
for the row. Then include a foreign key back to the original sheet's primary
id.

~~~
thatsnice
Interesting idea, thanks. There's often multiple levels of nested data, would
you want to see each level in a new sheet?

~~~
anon1m0us
Absolutely! It's relational data that could map to a defined relational
schema, so why not?

~~~
DonHopkins
As I described on another reply, SpreadOn not only supports a tree structured
format, but also has tags for referring to entire sheets by name, for
declaring grids of identically-typed data, for referring to named regions and
extracting subsets of them, and compactly defining tables of repeating
structures, which you can mix on a case-by as they make sense. (You can't
embed a grid in another grid since it requires more than one cell, but you can
make a grid of sheet references, since the sheet name only takes one cell and
can be put into a grid.)

There is no "one best way" to represent JSON data in spreadsheets, because
JSON data comes in all sizes and shapes, so SpreadON tries to support many
different useful formats that you can link together, with a simple
straightforward syntax that can be easily extended without breaking existing
documents.

------
janee
Haha I did something similar at my company. Our customers are heavy
spreadsheet users, so decided to create a generic Angular component that can
take semi-arbitrary json and display it in a spreadsheet (handsontable).

The code flattens a deeply nested json object into a bunch of 'rows', similar
to Spark's explode or Mongo's unwind.

The objects representing the flattend rows have their getters and setters
hooked to the original json object. That way when you edit the spreadsheet the
json is automatically updated, with parent values shown as merged cells for
array child values.

The code was completely bonkers and I ripped it out later...but must say it
was fun to write and see it doing its thing

------
adrianpike
This looks awesome! Fun timing, too - I'm working on a related problem, trying
to unlock multidimensional data in a similar way to how the spreadsheet
unlocks data for non-technical stakeholders. So many businesses run on
spreadsheets for good reason, but it's getting rare to be able to cleanly
crush a data dump into a CSV and let an analyst or exec crunch away on it.

I'm sure it's confirmation bias, but I've been seeing tons of stuff on HN
related to what I've been tinkering on, and it's really awesome to see other
potential competitors thinking about the same problem and know I'm not way off
in left field. :)

------
perl4ever
When using VBA and Excel, I've found it easiest to just stick to XML for my
REST calls and not try to deal with JSON. Talk about adding features to JSON
to make it richer just makes me think everything goes around in circles.

What I'd like to see is an LLVM back-end that generates VBA, but I may never
get around to it myself.

~~~
caspper69
Everything goes around in circles because we hit walls and obstacles, work
around them and move on. Then at a later date, when we hit new walls and
obstacles, we re-visit old areas, many times to find that they are no longer
problems (faster hardware, more network capacity, algorithmic breakthroughs),
and so the industry half-pivots in a new direction.

Everything old is new again. Wash, rinse, repeat.

------
DonHopkins
I've meant to contribute to this discussion and answer questions, but this
article is more than a year old, and I've been it regularly and made lots of
progress since then, I feel I should writing a fresh new article about it,
including some actual documentation and examples.

I'll post it to HN as soon as I'm done, and put a link to it here. And I would
appreciate hearing from anyone who's interested in or working on similar
ideas, so we can collaborate to combine our forces! (Email in profile.)

It also needs a good name! (As a rule of thumb, I try not to get bogged down
in naming things when I should be spending my time actually working on them
and using them, to figure out what they actually are before naming them. But
it's been more than a year and now I want to talk about it more
mellifluously.)

I was calling it "JSONster", but that sounded too monstrous, don't describe
it, and nobody gets the Napster reference these days.

So I tried to think of better name by repeatedly pounded my fist against my
forehead, which gave me a headache, when suddenly a homeopathic solution to my
naming problem sprang to mind: "SpreadON", short for "Spreadsheet Object
Notation".

Elevator Pitch:

SpreadON applies JSON to spreadsheets like butter to waffles, so it melts into
the nooks and crannies.

Marketing Ploy:

SpreadON: Apply directly to the spreadsheet. SpreadON: Apply directly to the
spreadsheet. SpreadON: Apply directly to the spreadsheet.

[https://www.youtube.com/watch?v=f_SwD7RveNE](https://www.youtube.com/watch?v=f_SwD7RveNE)

[https://en.wikipedia.org/wiki/HeadOn](https://en.wikipedia.org/wiki/HeadOn)

>HeadOn is the brand name of a topical product claimed to relieve headaches.
It achieved widespread notoriety in 2006 as a result of a repetitive
commercial, consisting only of the tagline "HeadOn. Apply directly to the
forehead", stated three times in succession. Originally sold as a homeopathic
preparation, the brand was transferred in 2008 to Sirvision, Inc., who re-
introduced the product with a new formulation.

>HeadOn's notoriety came in part because of its advertisements on cable and
daytime programming on broadcast television which consisted of using only the
tagline "HeadOn. Apply directly to the forehead", stated three times in
succession, accompanied by a video of a model using the product without ever
directly stating the product's purpose.

------
DonHopkins
I wrote some stuff in previous HN discussions about outliners and
spreadsheets, and also some stuff about Dave Winer's Frontier, which I'll
quote and link to here:

~~~
DonHopkins
[https://news.ycombinator.com/item?id=20773851](https://news.ycombinator.com/item?id=20773851)

DonHopkins 44 days ago [-]

I love the collaborative features of Google Docs and Google Sheets.

The thing that's missing from "Google Docs" is a decent collaborative outliner
called "Google Trees", that does to "NLS" and "Frontier" what "Google Sheets"
did to "VisiCalc" and "Excel".

And I don't mean "Google Wave", I mean a truly collaborative extensible
visually programmable spreadsheet-like outliner with expressions, constraints,
absolute and relative xpath-like addressing, and scripting like Google Sheets,
but with a tree instead of a grid. That eats drinks scripts and shits JSON and
XML or any other structured data.

Of course you should be able to link and embed outlines in spreadsheets, and
spreadsheets in outlines, but "Google Maps" should also be invited to the
party (along with its plus-one, "Google Mind Maps").

More on Douglass Engelbart's NLS and Dave Winer's Frontier:

[https://news.ycombinator.com/item?id=20425970](https://news.ycombinator.com/item?id=20425970)

------
seph-reed
Damn. I was going to do this soon. Guess I'll just have to do it even
harder...

EDIT: I already do this a ton at work, but I haven't yet made a tool for going
from json to srpeadsheet.

~~~
adrianpike
How are you doing it at work? Would love to pick your brain about the
workflows you're seeing for some customer research purposes :)

~~~
seph-reed
The route I was planning on going is more of an "someone smart sets it up for
layman's" use case.

Basically, you outline the type defs for your json, then how you want that
json to be displayed as csv.

------
rr-geil-j
Is there something like this but for YAML?

~~~
fourthark
I was going to say, this looks like YAML! Using blank columns for indentation.

~~~
DonHopkins
Part of SpreadON's design philosophy is to completely avoid the guess-work of
YAML, by explicitly declaring the data type as a type name tag. This also
enables extension tags for other formats and features like grids of
identically typed data, sheet and named region references, etc. And also to
avoid the inconvenient syntax of JSON as much as possible.

It addresses the most-complained-about problems with JavaScript: no comments,
and no trailing commas. You can use the unused cells to the right as comments
(or even intermediate calculation formulas), and not only is there no
inconsistency about not being able to put a comma after the last element of an
array or object, but it actually doesn't require any commas at all, no
quotation marks around string, no backslashes in strings, no spaces or tabs
for indentation, nor any other syntactic syrup of ipecac that JSON requires
but YAML foolishly tries to avoid by guessing. Each cell is a separate
syntactic token, so you don't need commas or spaces or tabs to separate them.

------
dustingetz
Graph, not tree

~~~
DonHopkins
JSON can only represent trees. JavaScript data can represent graphs, but JSON
is a subset of that.

