
JSON - bluu00
https://www.json.org/json-en.html
======
chubot
JSON is awesome, but it's troublesome that it can't represent binary data
(without a separate encoding, which requires metadata, and more code on both
sides of the wire).

I "discovered" a format that easily solves this, which I call QSN (quoted
string notation):

[http://www.oilshell.org/release/latest/doc/qsn.html](http://www.oilshell.org/release/latest/doc/qsn.html)

It's just Rust string literals with single quotes instead of double quotes.
Rust string literals have \x01 for arbitrary bytes, like C or Python string
literals, but without the legacy of, say, \v and octal, and with just \u{}
rather than \u and \U.

I use it in Oil to unambiguously parse and print filenames, display argv
arrays, etc. All of these are arbitrary binary data.

GNU tools have about 10 different ways to do this, but QSN makes it consistent
and precise.
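
To make that concrete, here's a simplified encoder sketch (Python; the real
format also has \n, \t, and \u{} escapes per the Rust rules above, but byte
escapes alone show the idea):

    # Hypothetical QSN-style encoder: printable ASCII passes through,
    # everything else becomes a \xFF-style byte escape.
    def qsn_encode(data: bytes) -> str:
        out = ["'"]
        for b in data:
            if b == 0x27:              # the single quote itself
                out.append("\\'")
            elif b == 0x5C:            # the backslash
                out.append("\\\\")
            elif 0x20 <= b <= 0x7E:    # printable ASCII
                out.append(chr(b))
            else:                      # arbitrary bytes, incl. newlines
                out.append(f"\\x{b:02x}")
        out.append("'")
        return "".join(out)

    # A filename with a newline and a non-UTF-8 byte fits on one line:
    print(qsn_encode(b"my file\nwith \xff junk"))
    # => 'my file\x0awith \xff junk'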

I'm expanding it to QTSV, a variant of TSV, but if you like you could also
make some JSON variant. Technically using single quotes rather than double
would make it backward compatible with JSON.

~~~
spicybright
Interesting solution! I usually use base64 encoding unless I'm pushing lots of
data; then, unfortunately, it's easier to make some kind of "file upload"
separate from the JSON if you're going through HTTP.
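
For reference, the base64 route looks like this (a minimal Python sketch; the
field names are made up):

    import base64, json

    # Sender: wrap the bytes in base64 so they fit in a JSON string.
    payload = {
        "name": "logo.png",
        "data": base64.b64encode(b"\x89PNG\r\n\x1a\n...").decode("ascii"),
    }
    wire = json.dumps(payload)

    # Receiver: decode it back out, paying roughly 33% size overhead.
    raw = base64.b64decode(json.loads(wire)["data"])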

~~~
jkoudys
My demand for that has slowed in the past couple of years too, mainly because
it's getting easier to do more on the client. E.g. I don't need to upload an
entire .docx file if all my backend needs is ~50kB of values queried out of
one of its contained .xml files. Not saying this is a _solution_ to any of the
encoding questions, only that it's reduced my immediate need.

------
SigmundA
JSON is great and all, but we really, really need to agree on a binary format
that can support a few more data types, including binary and dates.

I get that JSON being text is easy to debug, but text is just a binary format
that has viewers built into everything (UTF-8). If we agreed on a more
structured, hierarchical binary format, there would be a viewer for it everywhere.

Taking it further, a text file should just be one of these fictional
hierarchical binary format files with a single string node for the text, maybe
with some agreed metadata nodes as well; same with most other file formats.

~~~
xg15
My current hypothesis is that it really depends on the viewer and other parts
of its toolset. A format like this is easily defined (and there are already
dozens that would qualify), but right now, text is in many cases a lot more
convenient to use.

So to convince more people to use such a format, we'd need viewers and editors
that let you exploit the structure of the format, that are easy, intuitive,
and convenient to use, and that are readily available on whatever system
you're working on.

Text is also more robust than binary formats: Even if your
json/yaml/xml/whatever file got corrupted, you can still open it in a text
editor, make sense of it and fix it manually if necessary. An equivalent
binary format would need to have the same property.

~~~
SigmundA
I agree with all you have said, but I stick to my point: text is just a binary
format we have agreed upon for decoding individual characters, and secondary
decoders are built on top of that. Since we have agreed on this binary format
for characters, there is a viewer for it built into almost everything, hence
its convenience.

If we agreed upon a binary format that encodes hierarchical nodes/keys/values
with various data types (raw binary, strings, numbers, dates), then hopefully
there would be a similar viewer everywhere and no need for most secondary
decoders (JSON on top of text).

It should also be faster and more compact when sending binary data, numbers,
or dates.
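
Formats like CBOR (RFC 8949) already fit that description; a quick sketch with
the third-party cbor2 package (my choice of library for illustration):

    from datetime import datetime, timezone
    import cbor2

    doc = {
        "name": "report",
        "created": datetime(2021, 6, 1, tzinfo=timezone.utc),  # a real date type
        "payload": b"\x00\x01\xff",                             # raw bytes, no base64
    }
    wire = cbor2.dumps(doc)          # compact binary encoding
    assert cbor2.loads(wire) == doc  # round-trips with types intact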

Properly designed, it should still be able to handle corruption, but that has
a lot to do with the decoder and how it handles the corruption.

------
SAI_Peregrinus
JSON numbers cannot represent all possible values of an IEEE 754 double (i.e.
JavaScript numbers), since JSON is missing all of the NaNs and infinities.

Conversely, not every JSON number can be represented in an IEEE 754 double,
since JSON numbers are arbitrary-precision decimals of unbounded size.

This bit me recently, and so I'm slightly bitter about it.
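
Both directions are easy to demonstrate (Python; note that the stdlib's
default behaviour is itself non-standard here):

    import json

    # Python's default emits bare NaN -- which is not valid JSON.
    print(json.dumps(float("nan")))  # NaN

    # Ask for strict compliance and there's simply no representation:
    try:
        json.dumps(float("inf"), allow_nan=False)
    except ValueError as err:
        print(err)  # Out of range float values are not JSON compliant

    # The other direction: a perfectly legal JSON number that no
    # IEEE 754 double can hold.
    print(json.loads("1e999"))  # inf -- silently overflows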

~~~
RhysU
NaNs are a big gap and one that constantly worries me.

~~~
olliej
It never stops annoying me that NaN and Infinity are global properties in JS,
and yet JSON doesn't just special-case those values :-/

------
spicybright
I'm always impressed by this website's design and clear information. Obvious
flow charts, the basic info, and all on one simple page. I know json is very
simple, but I would love it if other technologies followed this design.

~~~
dhosek
I'm thinking that the book I learned Pascal from back in 1981 used a similar
notation (if not identical) for showing syntax. Of course, with nearly 40
years intervening, I can't come up with the name of the book, let alone the
book itself, nor can I be 100% certain that I remember what the syntax
diagrams looked like.

~~~
setr
FYI, the syntax diagram is called a railroad diagram, and it is particularly
common in RDBMS docs (probably because SQL syntax is basically nonsense)

~~~
Twixes
It's quite amazing that all these years on, SQL is still king. There are
basically no rules for the syntax; how come nothing cleaner has replaced it in
RDBMSes?

~~~
setr
Two things, I think: first, it ultimately doesn't matter, because the key
differentiator in RDBMS selection is the _management system_, not the query
system.

And second, RDBMSs are historically closed-source enterprise tooling -- the
vendors themselves have little interest in rocking the boat, and there's not
much freedom for the community to inject a new language into the system
(except as ad-hoc, wonky transpilers, or framework wrappers like ORMs)

------
akmittal
JSON has been the go-to format for data exchange for a while. I think native
browser support is the biggest reason for its success.

Is there any competing standard? I don't think I miss any features in JSON.

~~~
OmarShehata
The inability to write comments has been a big driver toward YAML instead.

~~~
t-writescode
But with that positive come all the negatives of YAML, especially around
arrays of structs vs. regular arrays or arrays of structs which contain one
key-value pair each, and especially the struggles around multi-line strings or
other kinds of entries in YAML.

I don’t think it’s worth the cost and would rather pass some extra keys in
JSON that my parser ignores (since additive changes should never cause bugs in
data contracts), or a regular key ‘comment/description’.
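
That convention is as simple as it sounds (Python; the key name is just a
suggestion):

    import json

    # Any parser that ignores unknown keys treats "comment" as a no-op.
    cfg = json.loads('{"comment": "see ticket for rationale", "retries": 3}')
    cfg.pop("comment", None)  # or simply never read it
    print(cfg["retries"])     # 3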

~~~
hnlmorg
> But with that positive come all the negatives of YAML, especially around
> arrays of structs vs. regular arrays or arrays of structs which contain one
> key-value pair each

Are you able to elaborate on this problem? I'm not going to defend the
complexity of YAML, but I've never run into any issues with storing complex
structures within it.

> and especially the struggles around multi-line strings or other kinds of
> entries in YAML.

YAML actually has pretty sophisticated handling of white space within strings.
The problem isn't that whatever edge case you run into can't be done; the
problem is that YAML covers so many edge cases with different parsing
operators that it becomes a bit of a cryptic mess trying to remember which
operator is needed when. Though in fairness, JSON was never intended to be
human readable (it was meant to be machine generated and machine read), so
it's not any better in the readable whitespace department.

> I don’t think it’s worth the cost and would rather pass some extra keys in
> JSON that my parser ignores (since additive changes should never cause bugs
> in data contracts), or a regular key ‘comment/description’.

A third option would be to use hash-prefixed comments (like in Bash) and then
run that JSON through a YAML parser, since technically YAML is a superset of
JSON (literally, valid JSON is also valid YAML). Though I do accept that would
be an unattractive option to some, because you end up with less strict format
checking of your source JSON (less strict in the JSON sense).
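
In other words (PyYAML shown; any YAML parser should behave the same):

    import yaml

    doc = """
    {
      "retries": 3,   # YAML permits comments where JSON does not
      "timeout": 30
    }
    """
    print(yaml.safe_load(doc))  # {'retries': 3, 'timeout': 30}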

~~~
t-writescode
> the problem is that YAML covers so many edge cases with different parsing
> operators that it becomes a bit of a cryptic mess trying to remember which
> operator is needed when.

This was exactly my point around multi-line strings. You look at a mess of >'s
and |'s, and it's absolutely not intuitive which one you should use if the
configuration files for one of the languages you're required to use happen to
use YAML. In JSON, there's virtually no ambiguity. Everything's either a
string, a number, a bool, an array, or a struct, and they all have exactly the
same shape, with ... really no options to make things "easier"
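
For the record, the two operators in question do this (PyYAML; the behaviour
comes from the YAML spec, not the library):

    import yaml

    literal = yaml.safe_load("key: |\n  line one\n  line two\n")
    folded = yaml.safe_load("key: >\n  line one\n  line two\n")
    print(repr(literal["key"]))  # 'line one\nline two\n'  -- | keeps newlines
    print(repr(folded["key"]))   # 'line one line two\n'   -- > folds them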

As for structs and arrays, YAML doesn't really make them clear, in my opinion,
due to its lack of opening and closing delimiters.

So, if you're new to k8s and need to make a configuration change to something
because the darn thing doesn't work, you're forced to learn yet another markup
language when it could just be a very familiar and comparatively intuitive
json blob.

For example,

      options:
        - key: value
          foo: bar
          thing: thing2
          smell: apple

"Oh, so to fix this, I just need to add another entry to turn on the debug
flag? And it's 'debug: true'? Oh, okay, so that's ...

      options:
        - key: value
          foo: bar
          thing: thing2
          smell: apple
          debug: true

right?

Oh, no? It's not... well what is it?

And then a long conversation with a coworker later, they explain, "Oh! No no
no, it's this:"

      item:
        - key: value
          foo: bar
          thing: thing2
          smell: apple
        - key: value2
          debug: true

Turns out debug was another option you needed to add.

Or, in other places in some syntax, you see a bunch of:

      items:
        - entry
        - entry2
        - entry3

or

      item: 1
      item2: 2
      item3: 3
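
For what it's worth, here's what those two shapes actually parse to (PyYAML):

    import yaml

    print(yaml.safe_load("items:\n  - entry\n  - entry2\n  - entry3\n"))
    # {'items': ['entry', 'entry2', 'entry3']}  -- one key holding a list

    print(yaml.safe_load("item: 1\nitem2: 2\nitem3: 3\n"))
    # {'item': 1, 'item2': 2, 'item3': 3}       -- three separate keys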

I've familiarized myself more with YAML over time, but its learning curve is
substantially more difficult than:

      {
        "everything is inside curly brackets": "keys and values can be strings",
        "there's a comma at the end of everything": [
            "arrays exist",
            "they're also comma delimited",
            1, "types don't matter"
        ]
      }

~~~
hnlmorg
To be honest, I much, much prefer YAML for manually handling multiline
strings. And frankly, JSON's strictness about commas after every entry except
the final one catches me out _so many times_ (it doesn't help that most
parsers aren't great at pointing out where the missing comma is).

Being pragmatic, I'd say neither serialisation format is better than the
other. JSON does some things better (it's easier to grok nested structures
and simpler to reason about the specification), but YAML does other things
better (easier to embed multiline blocks of text, handles streaming better,
supports comments).

Let's not forget, either, that most of the stuff people find missing in JSON
is only solved by unofficial hacks (e.g. jsonlines) that might be widely
supported but cannot be relied upon universally. So then you have two
problems: a standard that doesn't support x, and multiple different
implementations that don't strictly support the standard. YAML is a hell of a
lot better when it comes to removing undefined behaviour in parsers -- even if
that does come at a cost to the complexity of the specification.

------
zests
I like JSON a lot; it's my go-to for ad-hoc data storage. I've found it
extremely convenient because nearly all modern APIs can return JSON. Further,
jq makes working with JSON extremely ergonomic.

For certain tasks (personal use) I hand-roll my own JSON document storage
database. I make a bunch of REST API calls to collect data and store it in
flat files. While doing this, I use jq to store subsets of that information
(the stuff I really care about) in smaller JSON blobs. I then write a script
to aggregate all of the smaller blobs into a larger JSON array.
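
The aggregation step is a one-liner with jq's --slurp flag; the equivalent in
Python, assuming a blobs/*.json layout, is:

    import glob, json

    # Read every small blob and write them out as one JSON array.
    blobs = [json.load(open(path)) for path in sorted(glob.glob("blobs/*.json"))]
    with open("all.json", "w") as out:
        json.dump(blobs, out, indent=2)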

That's almost no work, and you already get a nice schema-less database. Write
some commands (stored procedures) to do any kind of filtering/modifications
and you can immediately get whatever view you want of your data with one
command. Write a wrapper script to identify documents by a field (primary key)
and you can make data modifications in an ergonomic way. I run
"modify-document <primary-key>" and it runs a tiny readline bash script
prompting me for info, which immediately modifies the corresponding database
row.

A work flow could be...

1. Make API requests (with GNU parallel), storing the response JSON and a
smaller JSON blob.

2. Aggregate the small blobs into an array.

3. Filter blobs for any that are missing information.

4. Manually update blobs missing information with the readline script.

5. Filter for blobs that need processing.

6. Process blobs; use the readline script to mark them as processed.

7. Continue until all blobs are processed.

This is the kind of thing I would have used Excel for in the past. Hopefully I
never make that mistake again.

~~~
phnofive
Are you me? Bash and jq (and sendemail) go so far in reporting and analyzing
data, it’s like magic. Haven’t figured out a great way to present the data so
folks can draw their own conclusions, though.

------
throwawgler87
Protocol buffers solve most of the issues mentioned by people here, on top of
being type-safe, space-efficient (no need to encode key names, because every
message is an ordered struct), and having great tooling.

[https://developers.google.com/protocol-buffers](https://developers.google.com/protocol-buffers)
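
The space claim is visible at the wire level (a hand-rolled Python
illustration of the encoding rule, not a real protobuf library):

    # A protobuf field is tagged by *number*, not name. Field 1 with
    # wire type 2 (length-delimited) and value b"hi" encodes as:
    tag = (1 << 3) | 2                 # field number 1, wire type 2
    encoded = bytes([tag, 2]) + b"hi"
    assert encoded == b"\x0a\x02hi"    # the field's name never hits the wire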

