
Specifying JSON - andyjohnson0
https://www.tbray.org/ongoing/When/201x/2016/04/30/JSON-Schema-funnies
======
skrebbel
This was my personal pet peeve for a while. I even half-assed a spec language
called RELAX JSON that I bet Bray would like. [0]

But then I realized that what I had made up was _very_ similar to type
definitions in TypeScript!

Especially if you use `interface` and not `class`, TypeScript gives you a lot
of flexibility in specifying data structures. You get optional values, typed
arrays, typed but mixed arrays, references to other named types that you
define yourself, and so on. Really, it's all you need and more. The only
downside is that you can spec stuff that you can't put in JSON, such as Date
objects. As a schema language, that is a bit weird. But just don't spec it
like that! :-D

I'll post some examples in a reply when I'm back behind a computer.

[0][https://github.com/eteeselink/relax-
json](https://github.com/eteeselink/relax-json)

~~~
bcherny
Ha! That's the reason I started working on [https://github.com/bcherny/json-
schema-to-typescript](https://github.com/bcherny/json-schema-to-typescript).

For the record, JSON Schemas are quite a bit more expressive than TypeScript
interfaces, or even Scala Traits. I actually put together a big list of every
JSON Schema constraint that isn't checkable at compile time in TS:
[https://github.com/bcherny/json-schema-to-typescript#not-
exp...](https://github.com/bcherny/json-schema-to-typescript#not-expressible-
in-typescript).

~~~
skrebbel
Cool stuff!

Fair enough on the constraints. One of the design goals I had at the time was
that _I didn't want_ stuff like that to be expressible. Basically, it felt
over the top to me - you might as well just allow writing predicates in plain
JS then. Even more expressive!

My primary goal was a spec language for humans to read and write. Machine
validation came second. TypeScript, oddly maybe, appears to have a similar
design goal.

------
carsongross
I created a minimal schema for JSON a few years ago:

[http://jschema.org/](http://jschema.org/)

The idea was/is to capture the spirit of JSON minimalism, avoiding the
insanity of XML-based schema languages.

Currently, some UCSC students are working on JavaScript and Java tools to work
with schemas. We should be able to release that work in the next month or so.

~~~
drdaeman
There's also JSONSchema ([http://json-schema.org/](http://json-schema.org/)),
which is significantly more verbose, but has optionals, string length
constraints and other sort of quite useful stuff.

And there is (or, better said, was) an "Orderly JSON" language that had a
nice, compact, human-manageable representation, but it essentially died from
lack of use.

JSchema:

    
    
        {
            "name": "@string",
            "age": "@int"
        }
    

Pros: is JSON, is simple to read and write. Cons: is (currently) limited to
very primitive schemas.

JSONSchema:

    
    
        {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer", "minimum": 0, "description": "Age in years"}
            },
            "required": ["name"]
        }
    

Pros: is JSON and reasonably powerful. Seems to be the most common standard out
there, with lots of libraries available. Cons: is verbose to the extent that
it's no fun to read or write by hand. Has some ambiguities and - as the OP
article correctly says - the spec is unfinished.
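For a feel of what a validator does with the example above, here is a stdlib-only Python sketch covering just the "type", "minimum" and "required" keywords. (A real project would normally use a library such as `jsonschema`; this is an illustration, not a conforming implementation.)

```python
import json

# The JSONSchema example from above, restated as Python data.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
    },
    "required": ["name"],
}

# Hand-rolled sketch: checks only "type", "minimum" and "required".
def check(instance, schema):
    if not isinstance(instance, dict):
        return ["not an object"]
    errors = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append("missing required property: " + key)
    simple_types = {"string": str, "integer": int}
    for key, sub in schema.get("properties", {}).items():
        if key not in instance:
            continue  # properties are optional unless listed in "required"
        value = instance[key]
        # bool is a subclass of int in Python, so rule it out explicitly
        if not isinstance(value, simple_types[sub["type"]]) or isinstance(value, bool):
            errors.append(key + ": expected " + sub["type"])
        elif "minimum" in sub and value < sub["minimum"]:
            errors.append(key + ": below minimum " + str(sub["minimum"]))
    return errors

print(check(json.loads('{"name": "Ann", "age": 37}'), schema))  # []
print(check(json.loads('{"age": -1}'), schema))
```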

Orderly JSON:

    
    
        object {
            string name;
            integer {0,} age?;
        };
    

Pros: compact and powerful. Cons: not JSON but a custom grammar, no one uses
it.

~~~
xirdstl
We use JSONSchema in several places. It has met our needs pretty well,
particularly for dynamic form generation (using the angular-schema-form
library)

~~~
throwanem
As long as you don't need good interop, JSON Schema can be pretty great. (If
you need interop, it's not, because none of the tools seem to implement
unspecified behavior the same way.)

~~~
drdaeman
Even when you need interoperation, the well-defined subset of JSONSchema may
do reasonably well. I think every library out there handles the essentials
(the core types and basic constraints) in exactly the same manner; the
incompatibilities start with less common things like references.

------
ivan_gammel
JSON is a child of JavaScript, with all its pains and mistakes - one of the
biggest being not looking at what's going on on other platforms.

There was great work done by OASIS to look into all aspects of meta-modelling
and data warehousing, that resulted in specs like MOF and UML2. I'd say that
every modern DOM or other object metamodel must be based on these specs (or
have some sensible mapping like the one done by Eclipse EMF project for XML
schema to Ecore). What's good about MOF is that it is easy to map to
programming language metamodels (for example, you can map a MOF model to the
Java Reflection API).

Another option, if you don't need to design it from scratch, is just to use
XML schema: JSON is basically a representation of DOM. In Java you can even
use the same API to serialize both to JSON and to XML, which means that they
both can be described by the same model language. If someone doesn't like XML
schema syntax, it's probably possible to load XML Schema XSD into DOM and then
serialize it to JSON - voila, you have JSON schema.
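As a rough sketch of that last idea, using Python's stdlib in place of a DOM and a toy XSD fragment invented for illustration (this is a naive tree walk, not a real XML Schema-to-JSON mapping):

```python
import json
import xml.etree.ElementTree as ET

# A toy XSD fragment, invented for illustration.
xsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Naive DOM-to-dict walk: keep the local tag name, attributes and children.
def to_dict(node):
    entry = {"tag": node.tag.split("}")[-1]}  # strip the namespace part
    if node.attrib:
        entry["attributes"] = dict(node.attrib)
    children = [to_dict(child) for child in node]
    if children:
        entry["children"] = children
    return entry

print(json.dumps(to_dict(ET.fromstring(xsd)), indent=2))
```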

------
virmundi
On a related note, Transit [1], by Hickey et al., can provide a meta scheme
for data interchange that inter-operates with JavaScript. It provides a richer
set of data types than traditional schemas. You can define things as a Set or
List, for example. As a result, you can express data behavior while expressing
general structure.

1 -
[http://blog.cognitect.com/blog/2014/7/22/transit](http://blog.cognitect.com/blog/2014/7/22/transit)

~~~
Scarbutt
But it's just too slow compared to JSON.

~~~
dignati
Do you have any data on that? We found that using transit could actually
increase overall throughput because it does some minimal
compression/deduplication.

------
RangerScience
_Or just use bloody YAML_

Seriously. Everything the others can do, it does better (or at least as well
- not gonna argue that YAML tags are pretty); and most importantly,
_optionally_. (Plus, it's got a few features the others don't.)

You don't have to use block text. You don't have to use quotations. You don't
have to use tags, or references. A simple usage of YAML is simpler than any
other. A complex use of YAML is still less complex than many uses of XML, but
has the same information content.

~~~
bsandert
Try writing YAML in combination with templating (such as in Salt). Because
indentation means hierarchy, it's terrible. I'd _much_ rather use JSON in such
a context.

~~~
nojvek
What's salt?

~~~
ptman
Probably [https://saltstack.com/](https://saltstack.com/)

------
pfooti
Yeah, so I do a lot with JSON right now, and having some kind of spec for APIs
is pretty important. My biggest pain point is dates, since there's no simple
way to throw some JSON at JSON.parse and get a date out. You have to know, a
priori, which fields are dates and do some follow-up. There are other issues,
with some fields sometimes getting serialized as string representations of
themselves (e.g. "10" vs 10, or "false" vs false).

On the other hand, I did a lot with XML in the 00s, and I sure don't want to
go back to dealing with that anymore. I'm sure the world has come a long way
since SAX was the thing to use for XML, but that experience left me so deeply
scarred that I'm willing to just write an ad hoc schema on my frontend and
say, "after JSON.parse, go in and coerce these fields to be int / bool / Date,
etc". Not great for interop, but what the hey.

~~~
PretzelFisch
Why we could not have # as a date prefix and suffix boggles my mind. And
comments: if we want to use JSON for a config file, we need to agree to allow
// and /* */ to temporarily comment out properties and objects.

~~~
sopooneo
Is your idea of having # as a date prefix and suffix compatible with having
json be a subset of valid javascript? Because that was one of its original
selling points.

Are you thinking something like

    
    
        {
            "business": "Jim's Hat Shop",
            "date_visited": #2016-05-03#
        }
    

Or were you thinking

    
    
        {
            "business": "Jim's Hat Shop",
            "date_visited": "#2016-05-03#"
        }

~~~
PretzelFisch
I was thinking { "business": "Jim's Hat Shop", "date_visited": #2016-05-03# }
because the parsing would be simpler, though the other would work too.

~~~
pfooti
Personally, this isn't bad - some kind of literal notation for date objects
seems like a good idea (we already have octal and binary literals).

The other option ("#date#") _is_ bad, IMO, since it's a magic string. I don't
like magic strings because they blow up at unexpected times. Imagine the
notice to the users: "please stop using the hashtag #racecar#, we know
palindromes are cool, but it is causing our database to crash". I had similar
issues with CDATA blocks in XML, actually - escaping strings to afford
reasonable parsing is never fun.

~~~
sopooneo
Your note about octal and binary literals cuts both ways for me, compared to
the idea of date literals. Octal and binary literals _are_
supported in vanilla javascript, but are _not_ explicitly supported in JSON.
However, throwing them in seems like only a minor transgression since they are
native to javascript, and are effectively just another representation of a
number.

On the other hand, the date literal you suggest is _not_ part of javascript.
And on that note, why not just use timestamps for dates in json?
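For comparison, the plain-timestamp approach needs nothing beyond what JSON already has (a Python sketch; the epoch value is just an example):

```python
import json
from datetime import datetime, timezone

# A numeric (epoch-seconds) timestamp survives any JSON parser unchanged.
payload = json.dumps({"date_visited": 1462233600})  # example value

ts = json.loads(payload)["date_visited"]
when = datetime.fromtimestamp(ts, tz=timezone.utc)
print(when.isoformat())  # 2016-05-03T00:00:00+00:00
```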

------
EdSharkey
We wish a better standard would emerge for JSON Schema as well. We had to
choose from just a few validators that would grok the style of $ref we used -
that part of JSON Schema is woefully underspecified. Among the few validators
we found, performance and the quality of validation failure messages differed
wildly.

The fastest JavaScript implementation we found was is-my-json-valid, but its
error messages are nearly useless.

The best error messages (and IMHO conformance to the spec) came from fge's
json-schema-validator, which is a Java implementation. The test suite on this
impl is impressive. The validation failure messages are excruciatingly long
and detailed if you have a large schema, but help a lot with debugging your
document generators. But, the performance is awful. If you do any regex
patterns, it shells out to a Rhino script to do the validations, which is
predictably catastrophic for performance, making that validator unusable in
production environments.

JSON Schema must be very difficult to optimize the way it is structured, which
tells me it is probably a badly conceived specification.

JSON in general needs the kind of tooling and library love that XML got back
in the day. I've searched in vain forever just for a great query language for
JSON documents. I'm convinced that JavaScript+Lodash is the query language for
JSON, to which I say meh. And, I'd love to see a first-class JSON streaming
API and tooling/libraries (similar to StAX in the good old days of XML). How
is anyone supposed to work with huge documents??

We're simply not giving ourselves enough tools, it's all very loosey goosey in
JavaScript land.

------
andyjohnson0
_" If it looks like a document, use XML. If it looks like an object, use
JSON. It's that simple. The essential difference isn't
simplicity/complexity or compact/verbose or typed/text, it's
ordered-by-default or not."_

I can appreciate that an order-independent representation can make schema
versioning easier, but what other advantages does this provide?

~~~
teach
Mostly for programming languages with hash tables / dictionaries.

Python dictionaries hold keys but the keys aren't in any particular order, so
allowing them to come out in whatever order maps very naturally to the
language.

~~~
Pxtl
Yes, the order-obsession of XML is a constant source of surprise. The idea
that

    
    
       <myObj>
          <Foo>myFoo</Foo>
          <Bar>myBar</Bar>
       </myObj>
    

is somehow different from

    
    
       <myObj>
          <Bar>myBar</Bar>
          <Foo>myFoo</Foo>
       </myObj>
    

is incredibly surprising and frustrating. XML Schema actually _supports_
ignoring the difference, but XML Schema tools seem to hate using "all"
instead of "sequence" in schema definitions.

The worst offender, imho, is Microsoft's DataContractSerializer, whose default
behavior is to silently ignore this ordering failure and also silently fail to
deserialize any out-of-order elements.

Either fail loudly or be flexible about ordering.

~~~
jdnier
"If it looks like a document, use XML." And the reason for that is that XML
originated from SGML, and both were initially focused on marking up documents
(where everything is ordered, just like the words in this sentence), not
objects or bits of data. That's why JSON is usually simpler and more
useful for examples like yours. Whether or not the order is significant
in your XML example becomes a schema-level issue (DTD, RELAX NG, XML Schema).
Maybe you are being forced to use XML, but if not, you're describing an
object, so use JSON.

~~~
Pxtl
Yes, but XML Schema provides a decent platform-agnostic system for defining
objects. The problem is that most tools default (or even force you) to
"sequence" instead of "all". With "all" you get object-ish XML behavior.

There is no well-adopted analog to xmlschema for json.

------
int_handler
Why not just define your schema using Protocol Buffers and use the [proto3
JSON mapping]([https://developers.google.com/protocol-
buffers/docs/proto3#j...](https://developers.google.com/protocol-
buffers/docs/proto3#json))?

On a similar note, text protos actually work very well as configuration files
and are used by SyntaxNet/Parsey McParseface [1] and Bazel's CROSSTOOL config
[2].

[1]
[https://github.com/tensorflow/models/blob/master/syntaxnet/s...](https://github.com/tensorflow/models/blob/master/syntaxnet/syntaxnet/context.pbtxt)

[2]
[https://github.com/bazelbuild/bazel/blob/master/tools/cpp/CR...](https://github.com/bazelbuild/bazel/blob/master/tools/cpp/CROSSTOOL)

~~~
habitue
"just" sort of implies this is the most obvious thing to do, and for some
reason he's eschewed this solution

~~~
int_handler
By "just" I mean this is a simpler solution (compared to JSONSchema), not that
it is an obvious solution. I'm pretty sure the author does not know about this
solution since using Protocol Buffers for configs is not a well-known use case
outside of Google.

------
willvarfar
I felt this pain when working with JSON APIs. I ended up writing my own
literate JSON spec tool that looked like JSON but was in fact Python:

    
    
        {"name": str, "age": int, "groups": [str]}
    

The template has the same 'shape' as the expected JSON, making it very easy
for humans to grok.

We put all these specs into a file that acted both as documentation and as the
actual Python code we used as templates to validate the real JSON. The doc was
always accurate ;)

It turned into a full-blown dynamic type checking thing for Python, but it's
still most useful for checking JSON APIs at runtime:
[https://pypi.python.org/pypi/obiwan/1.0.8](https://pypi.python.org/pypi/obiwan/1.0.8)

If you validate the JSON with an obiwan template, you can avoid having to
type-check it as you parse it.
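The template idea is small enough to sketch stand-alone. This is not obiwan's actual API, just a minimal illustration of checking a JSON value against a template of the same shape:

```python
import json

# Minimal shape-checker: the template mirrors the expected JSON.
# (Not obiwan's actual API -- just an illustration of the idea.)
def matches(template, value):
    if isinstance(template, type):
        # bool is a subclass of int, so keep True/False out of int fields
        return isinstance(value, template) and not isinstance(value, bool)
    if isinstance(template, dict):
        return (isinstance(value, dict)
                and all(k in value and matches(t, value[k])
                        for k, t in template.items()))
    if isinstance(template, list):  # [str] means "a list of str"
        return (isinstance(value, list)
                and all(matches(template[0], item) for item in value))
    return False

template = {"name": str, "age": int, "groups": [str]}
print(matches(template, json.loads('{"name": "Ann", "age": 37, "groups": ["staff"]}')))  # True
print(matches(template, json.loads('{"name": "Ann", "age": "37", "groups": []}')))  # False
```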

Obiwan syntax would be easy to use in other languages, but we never needed to
try.

~~~
tracker1
I often do something similar when documenting a structure as a first pass:

    
    
        {
          "name": String, // [required]
          "age": Number, //int
          "groups": [String], // groupId (UUID)
          ...
        }
    

The structure represents the actual object, and comments allow for additional
context/details... it's worked pretty well for documenting APIs that are under
development, or for conveying intent when working with other devs.

~~~
GauntletWizard
At that point, you're inches from specifying protobufs or thrift, and the
advantages in structured object parsing and tooling that come from using one
of those ecosystems are not to be ignored.

~~~
tracker1
It's worth noting that depending on the language, JSON parsing is faster than
binary implementations. Also worth noting that you can do JSON + gzip if there
are larger records to be transferred regularly.

------
adsharma
Whether some spec has expired or not makes little difference to me, but some
simple things seem to be hard to express with json schema:

[https://github.com/fge/json-schema-
validator/issues/182](https://github.com/fge/json-schema-validator/issues/182)

------
mianos
I am using
[https://github.com/keleshev/schema](https://github.com/keleshev/schema) with
_python_. I see a solid need for a schema here and there but don't buy into
the whole heavy handed XSD. (Who hasn't struggled with the completely broken
Microsoft .NET SOAP schema). From a practical perspective I am tending to
validate multiple sections of json with a schema closer to the code that uses
it. This way I don't have to maintain some massive schema at the top. It is
great for decorators:

    
    
        @validate_body(Schema({'size': Use(int), 'name': unicode}))
        def handler(body):
            aa = body['size']
            ...
    

------
tetron
[http://github.com/common-workflow-
language/schema_salad](http://github.com/common-workflow-
language/schema_salad) is another schema language for JSON & YAML based on
Apache Avro.

------
boronine
My attempt at this problem: [http://www.teleport-
json.org](http://www.teleport-json.org) Currently working on a Haxe port so I
can deliver this library to multiple languages from one code base.

------
thelastone
JSONSchema doesn't seem to be quite there yet. There are some conflicts in the
standard and it's not very simple to read (see for example
[http://www2016.net/proceedings/proceedings/p263.pdf](http://www2016.net/proceedings/proceedings/p263.pdf)).
One tool available for the same task that is really neat and extensible is
Cerberus ([http://docs.python-cerberus.org/en/stable/](http://docs.python-
cerberus.org/en/stable/)), so maybe you could try that. Cheers!

------
danh1979
I use a lot of JSON and YAML via libvariant [0], a set of C++ libraries. e.g.
the core class Variant is a JSON-ish object implementation; there are
pluggable Serializers & Deserializers, etc.

Libvariant includes a command line tool, varsh ("Variant shell"), that can
schema-validate JSON and YAML documents.

[0]
[https://bitbucket.org/gallen/libvariant](https://bitbucket.org/gallen/libvariant)

------
snaky
No one mentioned JCR (JSON Content Rules)?

There's
[https://tools.ietf.org/html/rfc7159](https://tools.ietf.org/html/rfc7159),
co-constraints [https://www.ietf.org/internet-drafts/draft-cordell-jcr-co-
co...](https://www.ietf.org/internet-drafts/draft-cordell-jcr-co-
constraints-00.txt), compact syntax

------
drewda
I've also been surprised that there are no JSON editors that are both powerful
and attractive. The one that comes closest in my experience is
[http://www.jsoneditoronline.org/](http://www.jsoneditoronline.org/) but I'd
appreciate pointers to any others.

Unlike XML and UML, there's just less of a market for GUI-like editors for
JSON?

~~~
tracker1
I found one (windows based) a while back, but cannot for the life of me
remember the name... really liked it. It was similar to XML Notepad from MS,
but for JSON. Unfortunately the name of it was _not_ "JSON Notepad", which
would have made it far easier to find again. :-(

Something that would also be nice, if anyone comes up with such a beast,
would be the ability to edit a multi-record JSON file... effectively each
line is a single JSON document, with an LF at the end of each line/record.
I've used this
structure a few times. Given that JSON will typically encode special
characters allowing for single-line representation by default, it's worked
_really_ well, especially combined with gzip as a stream for archiving data.
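That layout is nowadays commonly called JSON Lines (or newline-delimited JSON); a small Python sketch of writing and re-reading such a file through gzip:

```python
import gzip
import json

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# One JSON document per line, LF-terminated, gzipped as a stream.
with gzip.open("records.jsonl.gz", "wt", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Read it back one record at a time -- no need to load the whole file.
with gzip.open("records.jsonl.gz", "rt", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

print(loaded == records)  # True
```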

If anyone knows what that program was, would love to find it again.

~~~
TMSZ
I hope this would be JSONedit as it is mine ;) -
[http://tomeko.net/software/JSONedit/](http://tomeko.net/software/JSONedit/)
There are also - I think two different - plugins for Notepad++, JsonViewer:
[https://jsonviewer.codeplex.com/](https://jsonviewer.codeplex.com/),
commercial: XMLSpy and JSONBuddy and one or two half-baked projects on
codeproject.

~~~
tracker1
That may well have been it, thanks. I'll need to get WINE set up so I can
give it a try.

Tweeted it... It's funny, but I often use twitter to tweet about stuff, just
so that I can find them later.

------
oceanswave
CSV must have been frustratingly hard for Tim.

------
snikeris
> By the way, it's amusing that this century hasn't yet offered a plausible
> new markup alternative to the last one's mouldy leftovers.

See: [https://github.com/edn-format/edn](https://github.com/edn-format/edn)

~~~
drjeats
Isn't edn a data/object format, rather than a markup format? It's like a
better JSON.

------
ythl
Even if my data looks like a document, I refuse to use XML. It's just too
clumsy to traverse compared to JSON.

~~~
MichaelGG
My biggest problem with using JSON, apart from the needless quotation marks on
identifiers, is the lack of comments. Any human-editable format (like using
.json for project config files or build files) should have comments.

~~~
TMSZ
There are a few ways of inserting comments:
[http://stackoverflow.com/questions/244777/can-i-use-
comments...](http://stackoverflow.com/questions/244777/can-i-use-comments-
inside-a-json-file)

If you do not care too much about exchanging data with other apps, then a
parser/generator supporting JavaScript-style comments (e.g. json-cpp) would
work. With each JSON value it can store "commentBefore" and "commentAfter".
If you need to, you can strip these comments at any time with a parse + write
cycle with the "collectComments" option inactive.

~~~
MichaelGG
Adding data and calling it a comment isn't really a comment.

And I find the whole "run minify/strip first" suggestion from Douglas
Crockford hilarious. It can be rephrased as "If you want comments in JSON,
for a config file say, then don't use JSON."

That, and XML requiring the tag name in the closing tag, are some of the
obvious, silly mistakes - XML's in particular. Without the named closing
tags, just using, say, <>, would reduce the apparent bloat and annoyance by a
very large factor.

------
CGamesPlay
I really liked what Atom did with CSON (CoffeeScript object notation). It
feels very similar to CSS: in fact, you specify a CSS selector and then a
string representation of a key sequence, and map it to an action.

    
    
        'atom-text-editor .insert-mode':
          'j k': 'vim-mode:enter-normal-mode'

~~~
BinaryIdiot
Looking at the code on their project's page, it doesn't seem to quite match
what you wrote, but maybe they don't have examples that cover it. I'm not
quite sure what your example is supposed to do, but those can be perfectly
valid JSON keys and values, so if it's doing anything but straight
translating them, that's kinda weird.

