
The Fixing-JSON Conversation - robin_reala
https://www.tbray.org/ongoing/When/201x/2016/08/22/Fixing-JSON-Redux
======
Zardoz84
SDLang!!!!: [https://sdlang.org/](https://sdlang.org/)

Full example: [https://github.com/Abscissa/SDLang-D/wiki/Language-Guide#exa...](https://github.com/Abscissa/SDLang-D/wiki/Language-Guide#example-sdl-file)

Examples:

Creating a Tree

    
    
        plants {
            trees {
                deciduous {
                    elm
                    oak
                }
            }
        }
    

Creating a Matrix

    
    
        myMatrix {
           4  2  5
           2  8  2
           4  2  1
        }
    

A Tree of Nodes with Values and Attributes

    
    
        folder "myFiles" color="yellow" protection=on {
            folder "my images" {
                file "myHouse.jpg" color=true date=2005/11/05
                file "myCar.jpg" color=false date=2002/01/05
            }
            folder "my documents" {
                document "resume.pdf"
            }
        }
    

Date and Date/Time Literals (and comments!)

    
    
        # create a tag called "date" with a date value of Dec 5, 2005
        date 2005/12/05
    
        # a date time literal without a timezone
        here 2005/12/05 14:12:23.345
    
        # a date time literal with a timezone
        in_japan 2005/12/05 14:12:23.345-JST

~~~
_pmf_
That's beautiful!

~~~
masklinn
For simple data exchange? Way over-engineered is what it is. TFA specifically
notes that they want to stay within the overall conceptual complexity of JSON:

> “Just use X” […] Nah, most of them are way, way richer than JSON, often with
> fully-worked-out type systems and Conceptual Tutorials and so on.

------
RoryH
I think now is a good time to re-quote the man himself... Douglas Crockford:

[https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaG...](https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaGSr)

    
    
      I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability.  I know that the lack of comments makes some people sad, but it shouldn't. 
    
      Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
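
For illustration, the stripping step is easy to sketch in JavaScript (a
minimal sketch only, not JSMin itself, which does considerably more): remove
// and /* */ comments while leaving comment-like text inside strings alone.

    
    
        function stripJsonComments(src) {
          // Assumes well-formed input; a sketch, not a replacement for JSMin.
          let out = "", i = 0;
          while (i < src.length) {
            if (src[i] === '"') {                       // copy strings verbatim
              out += src[i++];
              while (i < src.length && src[i] !== '"') {
                if (src[i] === "\\") out += src[i++];   // keep escape pairs intact
                out += src[i++];
              }
              out += src[i++];                          // closing quote
            } else if (src[i] === "/" && src[i + 1] === "/") {
              while (i < src.length && src[i] !== "\n") i++;  // line comment
            } else if (src[i] === "/" && src[i + 1] === "*") {
              i += 2;                                   // block comment
              while (i < src.length && !(src[i] === "*" && src[i + 1] === "/")) i++;
              i += 2;
            } else {
              out += src[i++];
            }
          }
          return out;
        }
    
        // JSON.parse(stripJsonComments('{"a": 1 /* note */ }')) → { a: 1 }
    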

~~~
coldtea
> _I removed comments from JSON because I saw people were using them to hold
> parsing directives, a practice which would have destroyed interoperability._

I call BS. If people want custom parsing directives they can send them out of
band, encode them in the filename, or whatever. But they don't. And I've not
seen this happen with most other serialisation formats either, so why would
JSON be a particular target? After all, its value comes from being trivially
parsable across languages, and custom parsing directives would kill exactly
that. Anyone who wanted them would be implementing their own parsers anyway.

Addendum: besides, reading comments to decide how to parse implies either
"comments at the top of the file" or two-stage parsing.

With two-stage parsing, you could implement comments and whatever else
yourself, even on top of pure JSON.

As for "comments at the top of the file": just disallow them (only allow
comments after the first JSON object starts), and the "parsing directives"
problem goes away...

~~~
mort96
If people really wanted parsing directives, they could just say that keys
starting with # and their values are parsing directives - e.g.:

    
    
        {
            "#if": "parserversion > 1.5",
            "key": "somevalue",
            "#else": "",
            "key": "othervalue"
        }
    

Though I also don't really see any reason to include parsing directives in
JSON.

~~~
DonHopkins
If you're going to design something like that, it's wise to make sure it can
be represented as valid JSON.

Since JSON objects don't support order or repeated keys, that syntax can't be
represented, edited or processed by the rich ecosystem of JSON tools. Most
decent JSON editors will show that text with squigly red underlines. It's not
worth giving up interoperability, and having to make yet another new set of
tools for an incompatible syntax.

That was the mistake that the Angular 2 template syntax made.

But XML-based templating languages like Genshi [1] show how you can obey the
rules of XML, use namespaces correctly, support element, attribute and text
based expressions, looping, logic and macros, and it works just fine and
interoperates perfectly with existing tools.

Genshi was based on another Python based XML templating system called Kid [2],
which itself was influenced by Zope's page templates, TAL template attribute
language, TALES expressions [3] and METAL templates [4]. Genshi and Kid
templates are simple and easy to use compared to the conglomeration of Zope
stuff.

Here is the essential trick, shared by all those languages and described in
the Zope manual, that makes it possible to sidestep the fact that attributes
are not ordered:

> When there is only one TAL statement per element, the order in which they
> are executed is simple. Starting with the root element, each element’s
> statements are executed, then each of its child elements is visited, in
> order, to do the same.

> Any combination of statements may appear on the same elements, except that
> the content and replace statements may not appear together.

> Due to the fact that TAL sees statements as XML attributes, even in HTML
> documents, it cannot use the order in which statements are written in the
> tag to determine the order in which they are executed. TAL must also forbid
> multiples of the same kind of statement on a single element, so it is
> sufficient to arrange the kinds of statement in a precedence list.

> When an element has multiple statements, they are executed in this order:

    
    
        1) define
        2) condition
        3) repeat
        4) content or replace
        5) attributes
        6) omit-tag
    

It would be great to have a Genshi-like templating language for JSON, tightly
integrated with JavaScript the same way Genshi is integrated with Python.
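
To make that concrete, here is a hypothetical sketch - all directive names
invented, not an existing library - of how the TAL precedence trick could
carry over to JSON with JavaScript as the host: directives are applied in a
fixed order (condition, then repeat, then expression), so the unordered keys
of a JSON object never matter.

    
    
        function render(node, scope) {
          if (Array.isArray(node)) return node.map(n => render(n, scope));
          if (node === null || typeof node !== "object") return node;
    
          // 1) condition first: drop the node if the named scope value is falsy
          //    (dropped nodes leave holes in arrays; a real implementation
          //    would filter them out)
          if ("$if" in node && !scope[node.$if]) return undefined;
    
          // 2) then repeat: expand the node once per item of a scope array
          if ("$repeat" in node) {
            const { $repeat, ...rest } = node;
            return scope[$repeat].map(item => render(rest, { ...scope, item }));
          }
    
          // 3) then expression substitution from the scope
          if ("$expr" in node) return scope[node.$expr];
    
          // 4) finally, ordinary keys render recursively
          const out = {};
          for (const [key, value] of Object.entries(node)) {
            if (key.startsWith("$")) continue;  // directives handled above
            const r = render(value, scope);
            if (r !== undefined) out[key] = r;
          }
          return out;
        }
    
        // render({ users: { $repeat: "people", name: { $expr: "item" } } },
        //        { people: ["ann", "bob"] })
        // → { users: [{ name: "ann" }, { name: "bob" }] }
    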

[1] [https://genshi.edgewall.org/](https://genshi.edgewall.org/)

[2]
[http://turbogears.org/1.0/docs/GettingStarted/Kid.html](http://turbogears.org/1.0/docs/GettingStarted/Kid.html)

[3]
[https://docs.zope.org/zope2/zope2book/AppendixC.html](https://docs.zope.org/zope2/zope2book/AppendixC.html)

[4]
[https://docs.zope.org/zope2/zope2book/AppendixC.html#metal-o...](https://docs.zope.org/zope2/zope2book/AppendixC.html#metal-overview)

~~~
masklinn
> It would be great to have a Genshi-like templating language for JSON,
> tightly integrated with JavaScript the same way Genshi is integrated with
> Python.

But… that already exists. It's called "Python". Just define your data
structure using bog-standard Python and serialise it.

~~~
DonHopkins
What I mean is using JSON as the syntax, and JavaScript as the expression and
scripting language.

Zope has this supposedly "restricted subset" of Python that you're allowed to
use as expressions and scripts, but it's missing important features and isn't
meaningfully safe from a paranoid security perspective, and you end up having
to drop down to lower level Zope external methods to write real Python code,
which is very inconvenient.

If you trust someone enough to give them access to editing templates with
"restricted python expressions", then you can probably trust them enough to
use real Python expressions. You'd be unwise to give somebody you don't
actually trust access to even a "restricted subset" of Python running on your
server. That's just asking for trouble.

~~~
masklinn
> What I mean is using JSON as the syntax, and JavaScript as the expression
> and scripting language.

… it's the exact same process except with Javascript as the language? Generate
your JS data structure then JSON.stringify it? I don't understand what the
issue is, or why you'd want a templating language when much of JSON's point is
that it maps directly to common standard data structures.
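
For instance, a minimal sketch (example data invented) of what "just generate
and stringify" looks like:

    
    
        // No templating layer: ordinary JavaScript builds the structure,
        // and the only JSON-specific step is the final call.
        const people = ["ann", "bob"];                 // example input
        const doc = {
          users: people.map(name => ({ name, active: true })),
          count: people.length,
        };
        const json = JSON.stringify(doc, null, 2);     // the whole "template engine"
    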

------
thymelord
Timestamps are so complicated once you factor in timezones and daylight
saving time that they don't belong in JSON. Time zones are not static: they
can change from country to country, or even between states within a country.
Ditto for when daylight saving time starts and ends during the year - even
that has changed over the years. There is no rhyme or reason to any of this.
The data has to be stored in tables, and time zone meanings can change
retroactively. The only reliable timestamp is UTC without leap seconds.
(Speaking of leap seconds, who thought seconds going from 0 - 60 rather than
0 - 59 was a good idea?)

Accurate time is one of the most difficult things to model in computer
science.

~~~
realharo
Time is actually quite simple if you have a good mental model of what you're
trying to represent and don't try to mix different concepts into a single
value.

This talk explains it VERY nicely:
[https://www.youtube.com/watch?v=2rnIHsqABfM](https://www.youtube.com/watch?v=2rnIHsqABfM)

Basically just decide whether you're trying to store an absolute time (a
timestamp will do) or a civil time (year, month, day, etc.) and treat them as
two separate data types.

(If you just use "civil time + offset from UTC" like RFC 3339 does, then you
can convert it to an absolute time, but you can convert _only that one
specific value_ using that offset, and not any other - i.e. that offset is
_not_ a substitute for an actual timezone identifier.)
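
A concrete illustration of that parenthetical, assuming a pair of real zones
that happened to share an offset on a given day:

    
    
        // "+14:00" identifies this one instant exactly...
        const t = new Date("2012-01-01T09:00:00+14:00");
        // ...but on that day both Pacific/Apia (+14 in DST) and
        // Pacific/Kiritimati (+14 year-round) match that offset, and they
        // disagree on other dates - so the offset alone cannot answer
        // "what instant is 09:00 local in this zone next June?"
    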

~~~
coldtea
> _Time is actually quite simple if you have a good mental model of what
> you're trying to represent and don't try to mix different concepts into a
> single value._

Not even close.

[http://www.creativedeletion.com/2015/01/28/falsehoods-progra...](http://www.creativedeletion.com/2015/01/28/falsehoods-programmers-date-time-zones.html)

~~~
realharo
If you watch the video I linked above, he explicitly mentions that 99% of the
time you _should not_ be dealing with any of those things manually - that if
you find yourself working with offsets and DST values, it's a sign that you're
most likely doing something wrong.

~~~
coldtea
> _that if you find yourself working with offsets and DST values, it's a
> sign that you're most likely doing something wrong._

The video is wrong then.

If you're writing an application that deals with times (e.g. stores and
queries events with specific timestamps) and you don't take offsets and DST
values into account, you get all kinds of weird edge cases.

~~~
realharo
No, the point is that you should not be doing those things _manually_, i.e.
you should never be adding integer offsets to something or doing similar
operations. Instead, all the rules for conversions between times are already
stored in the timezone database, so all you should do is something like
stored in the timezone database, so all you should do is something like

    
    
        ToAbsolute(CivilTime, TimeZone)
    

and the reverse. He also mentions (at 26:50) proper ways of dealing with
repeating and non-existent civil times close to DST transitions (and sane
default ways if you just don't want to bother).
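
As a sketch of that shape in JavaScript - here using the TC39 Temporal API, a
much later addition to the language, shown only because it makes the
civil/absolute split and the disambiguation policy explicit:

    
    
        // Civil time + IANA zone in, absolute instant out, with an explicit
        // policy for civil times that are skipped or repeated around DST.
        const civil = Temporal.PlainDateTime.from("2012-01-01T09:00");
        const zoned = civil.toZonedDateTime("Pacific/Apia", {
          disambiguation: "compatible", // the sane default; "reject" to be strict
        });
        const instant = zoned.toInstant(); // the absolute time, for storage
    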

When talking about JSON or serialization formats specifically, none of the
complexities need to ever leak into the representation.

~~~
masklinn
> Instead, all the rules for conversions between times are already stored in
> the timezone database, so all you should do is something like

That only works if you're completely detached from the user and don't care for
them. Example:

On January 1st 2011, a Samoan user living in Samoa (timezone Pacific/Apia)
records an event for January 1st 2012. You convert January 1st 2012 09:00:00
to UTC, storing 2012-01-01T20:00:00.

On January 2nd 2012 at 10AM, you remind your user that they had an event set.

Because in May 2011 Samoa announced they were going to skip a local day and
move across the international date line. So 2011-12-30T09:00:00 UTC was
2011-12-29T23:00:00 Pacific/Apia, but 2011-12-30T10:00:00 UTC was
2011-12-31T00:00:00 Pacific/Apia.

And as far as your user is concerned, they told you to ping them on January
1st at 9AM and you pinged a day late. Just because you store absolute
datetimes doesn't mean you won't fuck up, and when the data is user-provided,
chances are good that _that decision_ will be the fuckup.

~~~
realharo
That still doesn't mean that you should do any of those operations by hand
(which is what the previous comment was about).

But sure, if you do scheduling for future times, you do need to be aware of
such possibilities and store the future times as civil times (and have some
sane way of handling non-existing/repeating times - but again, most of the
time the system/library will do this for you).

Then even if the user is flying across the world, they can still get the alert
at the right time wherever they are (assuming the device updates local
timezone based on location).

I never said that you should only store absolute times - only that they are a
separate data type and you shouldn't mix them or try to convert them by hand.
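
Concretely, a stored future reminder might look something like this (field
names invented for illustration):

    
    
        // One possible stored record for a future reminder:
        const reminder = {
          civilTime: "2012-01-01T09:00",  // what the user actually asked for
          zone: "Pacific/Apia",           // an IANA zone id, not a UTC offset
        };
        // Resolve to an absolute instant only at (or near) fire time, using
        // the then-current timezone database, so rule changes like Samoa's
        // date-line jump are picked up automatically.
    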

------
wtbob
I think the idea of demoting commas to whitespace is kinda hilarious, because
that would mean one could write:

    
    
        ["a" "b" "c"]
    

or:

    
    
        {"a": "foo" "b": "bar" "c": "baz"}
    

The advantage over:

    
    
        (a b c)
    

or:

    
    
        (a foo b bar c baz)
    

or:

    
    
        ((a foo) (b bar) (c baz))
    

or:

    
    
        ((a . foo) (b . bar) (c . baz))
    

seems … non-existent.

------
tarnacious_
I don't think # or // for comments is a very good idea, as it would also make
newline characters significant. I find it useful to be able to store a JSON
object per line.

------
sanqui
Personally, I would really like to see integer object keys (as opposed to only
string keys). For simple numeric mappings, strings feel really heavy and
require annoying conversions in most languages. E.g. {"10": 60, "42": 2}.
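
For example, in JavaScript, where object keys are strings regardless,
consuming that object means a conversion on every access:

    
    
        const m = JSON.parse('{"10": 60, "42": 2}');
        for (const [k, v] of Object.entries(m)) {
          const key = Number(k);    // the annoying conversion step
          console.log(key + 1, v);  // without it, "10" + 1 === "101"
        }
    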

~~~
niftich
The flipside is that an integer-keyed map is similar in meaning to an array,
which, by virtue of placement, associates an integer with the value sitting at
that index.

While it's possible to spec the format to forbid this interpretation, Lua has
made this interpretation a language feature, and it'd become impossible to
construct an unambiguous parser/printer in Lua for this new format.

~~~
thaumasiotes
> it'd become impossible to construct an unambigous parser/printer in Lua for
> this new format

How so? JSON is just text; you can parse it however you like. For example, I
can write a parser that produces the byte sequence "It checks out." for any
valid JSON input. A Lua JSON parser that represents objects with integer keys
as maps with string keys, and a Lua JSON parser that represents objects with
integer keys as sparse arrays, are both unambiguous and both correct (as to a
hypothetical JSON which allowed integer keys). JSON is a format, not a Lua
structure.

Even if your JSON data is

    
    
        { 10: 10, "10": "ten" }
    

there's no problem being _able_ to write an unambiguous parser. Define what
your parser does in that situation, and it's unambiguous.
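
For instance, a JavaScript parser could simply target Map, where the number 10
and the string "10" are distinct keys - one perfectly unambiguous choice among
many:

    
    
        // Both entries of the example above survive, unambiguously:
        const parsed = new Map([[10, 10], ["10", "ten"]]);
        parsed.get(10);   // → 10
        parsed.get("10"); // → "ten"
    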

------
fiatjaf
This guy is an idiot anyway. There's no way to "fix JSON". All you can do is
create a new language; it doesn't matter if you call it JSON 2.0, it will
still be incompatible with all the JSON parsers of today. I don't get why he
is so mad at people suggesting that he use one of the JSON supersets that
exist today.

------
willvarfar
If // and /* are used as comments, then most of this new extended-JSON will
still be valid Javascript.

If # is used as comments, then this breaks documents being Javascript.

The post says "don't eval() JSON, ever", but that's like Crockford originally
leaving comments out in order to stop them being abused as processor
directives...

~~~
daenney
Like the post says, JSON is already not guaranteed to be valid JS, so this
isn't really a problem. The fact that it works 99% of the time to just eval it
is great, and granted, the "feature" that triggers the incompatibility is a
bit obscure.

But if you just do the right thing from the start you'll never have a thing to
worry about in the first place, # comments or not.
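
For the record, the obscure feature is the pair of line separators
U+2028/U+2029: legal unescaped inside JSON strings, but line terminators in
JavaScript source until ES2019. A minimal demonstration:

    
    
        const payload = '{"note": "a\u2028b"}'; // raw U+2028 inside the string value
        JSON.parse(payload);                    // fine: JSON allows it unescaped
        eval("(" + payload + ")");              // SyntaxError in pre-ES2019 engines
    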

~~~
abofh
This is a bit of a strawman argument though. JSON is rarely used in a generic
form - but instead as a format between endpoints (servers, clients,
whathaveyou). When you control the server, you control the API - so random
UTF-8 magic crap? Wasn't valid in the first place.

Just because you can express things that aren't JavaScript with it doesn't
mean that ever actually turns up in practice. More importantly, 95% of the
time you control both the producer and the consumer of the JSON because it's
consumed internally anyhow - at which point you're not _going_ to write
illegal JSON to yourself.

That JSON isn't guaranteed to be valid JS is a true statement, but virtually
nothing in practice cares.

~~~
josephg
I'm jealous. I've had it show up in production. I think we were using script
tags with sanitized JSON to re-hydrate an isomorphically rendered page.

Of course, a user entered one of the valid-JSON-but-invalid-JS characters in a
form field, and it made its way into our database. Once there, we started
having weird errors show up between the server and the client. That little gem
took us _days_ to track down.

JSON should be either a subset of javascript, or obviously not a subset of
javascript. 99% compatible systems are dangerous landmines. That you haven't
been blown up yet doesn't make it ok.

------
velox_io
JSON (for the most part) is a nice format to work with, aside from the loosely
defined datetimes, as mentioned.

There are two areas where I believe the format could greatly be improved: #1,
a standard way to define the structure (sometimes schemas can be handy!); #2,
a standard binary format - yes, right now we have UBJSON (which doesn't have a
date format; the datetime problem is even worse in binary) and BSON (which
contains some MongoDB-specific stuff).

I'm not saying they don't have their place, but Protocol Buffers are more akin
to .NET or Java serialization, in that they're quite fragile if used across
different versions and/or different vendors.

~~~
thymelord
MessagePack is better supported than UBJSON and has far more implementations
in almost every language.

[http://msgpack.org/index.html](http://msgpack.org/index.html)

JSON Schema draft 4 is the de facto schema standard for JSON.

[http://json-schema.org/latest/json-schema-core.html](http://json-schema.org/latest/json-schema-core.html)

Having used both XML schema and this one, I much prefer using JSON schema.

------
fiatjaf

        “Just use X” · For values of X including Hjson, Amazon Ion, edn, Transit, YAML, and TOML. ¶
        Nah, most of them are way, way richer than JSON, often with fully-worked-out type systems and Conceptual Tutorials and so on.
    

What? MOST OF THEM? YAML is not, Hjson is not, TOML is not.

~~~
gengkev
YAML... really? Looking at the examples in the Wikipedia article
([https://en.wikipedia.org/wiki/YAML](https://en.wikipedia.org/wiki/YAML))
gives me a headache. Fortunately, most actual YAML files I've seen are not
that complicated.

~~~
fiatjaf
[https://en.wikipedia.org/wiki/JSON#YAML_sample](https://en.wikipedia.org/wiki/JSON#YAML_sample)

------
niftich
He summarized the most upvoted posts from the last thread [1] really well.

[1]
[https://news.ycombinator.com/item?id=12328088](https://news.ycombinator.com/item?id=12328088)

Regarding datetimes, it's worth pointing out the conversation that TOML had
about it. It's a pretty long read [2][3][4][5] with lots of points raised for
and against, but it also shows some of the process of how consensus was
eventually forged: through trial and error, some enlightening realizations,
expert opinions, and a willingness to leave _some_ aspects of the behavior up
to the parser, to avoid requiring every other language to reimplement half of
Java 8 Time.

[2] [https://github.com/toml-lang/toml/pull/414](https://github.com/toml-lang/toml/pull/414)

[3] [https://github.com/toml-lang/toml/pull/362](https://github.com/toml-lang/toml/pull/362)

[4] [https://github.com/toml-lang/toml/issues/412](https://github.com/toml-lang/toml/issues/412)

[5] [https://github.com/toml-lang/toml/issues/263](https://github.com/toml-lang/toml/issues/263)

The salient point being that RFC 3339 does _not_ in truth describe _exactly
one_ datatype, so you can't just reference the spec and hope everyone reads it
the same way. EDIT: Specifically, RFC 3339 says:

"Date and time expressions indicate an instant in time. Description of time
periods, or intervals, is not covered here." - but it then goes on to define
[6] a number of different syntaxes in ABNF, to indicate the subsets of ISO
8601 that "SHOULD be used in new protocols on the Internet." It never really
defines what a 'valid' RFC 3339 object looks like, and it doesn't explicitly
say which productions are complete representations, so it's not clear whether,
say, '2016' is a valid RFC 3339 object... but the productions towards the
bottom contain more than one discrete term, and can be presumed to be
'complete' representations. These are:

[A] partial-time: HH:MM:SS(.SSS)

[B] full-date: YYYY-MM-DD

[C] full-time: 'partial-time' +/- offsetFromUTC(HH:MM)

[D] date-time: 'full-date' "T" 'full-time'

Out of these, [D] is _clearly_ a timestamp of an absolute instant in time, but
the rest are debatable.

[6]
[https://tools.ietf.org/html/rfc3339#section-5.6](https://tools.ietf.org/html/rfc3339#section-5.6)

~~~
outsidetheparty
> He summarized the most upvoted posts from the last thread [1] really well.

I feel like he glossed right past the objections to the biggest and (to my
mind) most destructive proposed change, the commas-to-whitespace thing; in
fact he doubles down on it (let's just declare that commas _are_ whitespace!
_That_ surely won't confuse anyone!)

~~~
thymelord
I've written a few JSON parsers over the years that treat commas as
whitespace. The grammar is simpler and the parser is faster as a result. As
long as one always emits standards-compliant JSON there's no problem.

Had the JSON standard supported ECMAScript array holes [1,,,2,,3] this grammar
shortcut would not have been possible. But luckily that's not the case.

~~~
outsidetheparty
Well, that's the thing, though: sure, for machine parsing it really doesn't
matter what you use as a delimiter.

But the only reason to get rid of commas is to eliminate the trailing comma
problem, which only occurs when hand-editing JSON. Replacing that with
whitespace, or worse both whitespace and commas, would be a lot more prone to
hand-editing errors, I think, than would the much less drastic change of
allowing trailing commas. Or better still of just leaving JSON as is and
letting people use a more robust protocol, if that fits their needs, or pre-
parsing whatever special snowflake variations they want into standards-
compliant JSON.

I'm more or less in agreement with the commenter on his site who said

> this entire proposal pretty much comes down to "I like JSON, but need more
> and am too lazy to write the extra 3 line wrapper to process type 'x'." I'd
> say no thanks.
> [https://www.tbray.org/ongoing/When/201x/2016/08/20/Fixing-JS...](https://www.tbray.org/ongoing/When/201x/2016/08/20/Fixing-JSON#c1471881712.859312)

