
Introducing TJSON: Tagged JSON with Rich Types - alexatkeplar
https://www.tjson.org/
======
portlander12345
Surprised no one here has mentioned Transit. It's an extensible typed data
format with both JSON and binary representations. In other words, you can
configure custom types, such as Immutable data structures, and they'll be
automatically serialized and restored for you.

Good intro: [http://cognitect.github.io/transit-tour/](http://cognitect.github.io/transit-tour/)

GitHub: [https://github.com/cognitect/transit-js](https://github.com/cognitect/transit-js)

Note that in the introduction they provide a simple benchmark where Transit is
both more compact and faster to parse than JSON with custom hydration.

------
keithwhor
Biggest complaint:

Unreadable format, as mentioned in this thread.

{"key:A<A<s>>":[["values"],["here"]]}

This doesn't mean anything to me as a developer, unless I've seen the spec.
It's kludgy. It's not reverse-compatible if you don't install a TJSON parser.

Two solutions immediately strike me as better, one has been mentioned here.

(1) Not optimal, but _actually spell out words_ in key names. There's no
reason "A" has to mean Array. That doesn't mean anything to me. If I'm seeing
it for the first time and have no idea what TJSON is, the very next value
could be "key2:B<B<t>>".

(2) Far more optimal: as in the "date" example already provided, just nest
objects as values for any extended types. Then this spec is _completely_
reverse-compatible and compliant, and as a developer I don't have to worry
about parsing key names.

e.g.

    
    
      {
        "some_nested_array": {
          "type": "array.array.string",
          "value": [
            ["values"],
            ["here"]
          ]
        }
      }
    

Extremely easy to implement and not reliant on a governing body.
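
A minimal sketch of such a decoder in plain JavaScript (the `type`/`value` field names are just the convention proposed above, not part of any spec, and `hydrate` is a made-up helper name):

```javascript
// Walk a parsed JSON tree and hydrate any {"type": ..., "value": ...}
// wrapper objects; unknown or trivial types fall through as plain values.
function hydrate(node) {
  if (Array.isArray(node)) return node.map(hydrate);
  if (node && typeof node === 'object') {
    if (typeof node.type === 'string' && 'value' in node) {
      if (node.type === 'date') return new Date(node.value);
      // "array.array.string" etc. need no conversion in JS
      return hydrate(node.value);
    }
    const out = {};
    for (const key of Object.keys(node)) out[key] = hydrate(node[key]);
    return out;
  }
  return node; // primitives pass through unchanged
}

const doc = JSON.parse(
  '{"some_nested_array":{"type":"array.array.string","value":[["values"],["here"]]}}'
);
const hydrated = hydrate(doc);
```

A parser that doesn't know the convention still gets well-formed JSON; one that does can hydrate in a single pass.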

~~~
kafkaesq
And that was the beauty of JSON: There was no "format", per se. And you
certainly didn't need to think or care about some friggen "spec". All you had
to do was sort of take a look at it, say "OK", and get moving with the
business of writing your application.

------
tobltobs
Dear JS hipsters, even if you all suffer from NIHS, could you please take a
look at XML before you invent another format? I am sure you will get used to
those angle brackets.

~~~
seagreen
First: XML is a markup language for encoding documents, JSON is a data-
interchange language. Each can be twisted to do the job of the other, but they
don't naturally do the same job.

Second, XML is extraordinarily complicated. Flipping around the XML 1.0 spec
([https://www.w3.org/TR/xml/](https://www.w3.org/TR/xml/)) isn't really
encouraging me that all of this is there for a reason. I'd love to be proved
wrong though!

In contrast, RFC 7159 is incredibly short and readable:
[https://tools.ietf.org/html/rfc7159](https://tools.ietf.org/html/rfc7159).
The TJSON spec isn't bad either:
[https://www.tjson.org/spec/](https://www.tjson.org/spec/). Even combined,
the two are still far shorter and clearer than the XML spec.

~~~
tootie
First, both XML and JSON are suitable for encoding documents or for data
interchange. Second, XML is also very _sophisticated_ and has an array of
useful features that JSON developers are suddenly realizing can be pretty
valuable sometimes. XSD is verbose, but it's rock solid. XPath and XInclude
are also pretty awesome.

------
laurent123456
I'm curious what the use case for this would be? JSON is a human-
readable/writable format, but this kind of syntax no longer is:
{"nested-array:A<A<s>>": [["Nested"], ["Array!"]]}

So it feels more like a machine format, but in that case why not use a more
efficient one, like a binary format?

~~~
bascule
Hello, I created TJSON. The answer to your question can be found in the second
sentence on the page:

 _> TJSON documents are amenable to "content-aware hashing" where different
encodings of the same data (including both TJSON and binary formats like
Protocol Buffers, MessagePack, BSON, etc) can share the same content hash and
therefore the same cryptographic signature._

TJSON is designed to facilitate documents that retain the same content hash
when transcoded to/from binary formats.

~~~
laurent123456
If hashing is the main concern, wouldn't a "strict" spec for JSON do the job?
eg. "all keys must be sorted", "all dates must be ISO-xxx", etc.?

~~~
bascule
You're describing canonicalization, which incorporates elements of the
encoding format into the final hash, and therefore does not facilitate
retaining the same content hash when transcoding to _different_ formats.

Also, canonicalization is a bit of a mess. There are several incompatible
canonicalization schemes for JSON, and even within a single one of those
people have a difficult time implementing them correctly. See e.g.
[https://github.com/theupdateframework/tuf/issues/362](https://github.com/theupdateframework/tuf/issues/362)

~~~
dolmen
So you are just creating another non-interoperable canonicalization format
that takes out from JSON everything that makes it great: terseness.

[https://xkcd.com/927/](https://xkcd.com/927/)

~~~
jerf
I'm not sure what you mean by "terseness". Do you mean the size of the spec?
Because as serialization formats go, it is on the short side. However, typical
JSON data is anything but terse; it is almost the most verbose serialization
format in use, beaten out only by XML. Which I suppose could be where you're
getting the idea that it is terse, but in that case it is "terse" only in the
sense that it is the second worst of the couple dozen common formats. It's
kind of like how I've said a couple of times on HN that new compiled-language
designers should be grateful to C++ for setting the bar for compilation speed
so low; it makes it very easy to put "compiles more quickly than C++!" into
the initial elevator pitch. However, you are not "fast" merely by beating the
slowest, nor are you "terse" merely by being slightly more efficient than the
worst.

~~~
dolmen
Can we at least agree that TJSON is more verbose than JSON?

------
hajile
If you're making it unreadable with types, you might as well switch to a
statically typed binary JSON format like bson or ubjson instead. You get
smaller files, faster parsing, partial parsing (skip what you don't need), and
(in some implementations) streaming of large files.

[http://ubjson.org/](http://ubjson.org/)

[http://bsonspec.org/](http://bsonspec.org/)

------
escherize
Edn seems like a better solution here. Not only is the tagging more
straightforward (wow, not embedded in a string?), but you can also write your
own tags for custom types.

------
user5994461
If you want JSON with type checking, use a json schema.

[http://json-schema.org/examples.html](http://json-schema.org/examples.html)

Been there for almost a decade. Already supported by all the major json
libraries in all the major languages.
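
For instance, the nested array of strings from the TJSON examples could be described like this (a sketch in ordinary draft-04 JSON Schema; the property name is taken from the article):

```json
{
  "type": "object",
  "properties": {
    "nested-array": {
      "type": "array",
      "items": { "type": "array", "items": { "type": "string" } }
    }
  }
}
```

The data itself stays plain JSON; the type information lives in the schema, not in the key names.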

------
ungzd
This is literally hungarian notation for JSON.

------
dolmen
As no normal JSON document is a valid TJSON document (and worse, some JSON
documents may be valid TJSON documents on which TJSON imposes a different
interpretation), using the "JSON" suffix is just misleading.

------
pfooti
literally the only use case I see here is dates. For everything else I can
infer the type of the field from its contents. "boolean": false, no
kidding. "event_ts": 1223349483, is that an index number or milliseconds since
epoch or what? Well, probably ms since epoch, but my one gripe about JSON is
that there's no good way to push dates without domain knowledge (anything
whose property name ends in _at or _ts gets converted? all numbers in a
certain range get converted?).

~~~
geezerjay
> literally the only use case I see here is dates.

{ "date": "1937-01-01T12:00:27.87+00:20" }

As you can see, JSON doesn't stop anyone from using RFC3339 to encode dates.

~~~
pfooti
Sure, and then on the other end of the connection you need to say:
newThing.date = new Date(newThing.date), or else it'll deserialize as a
string.

What I'm getting at is that a date gets serialized into JSON as either a
string or a number, depending on who wrote the toJSON method, and that the
consumer of that JSON needs knowledge of the data's schema in order to
properly deserialize it.

~~~
geezerjay
> Sure, and then on the other end of the connection you need to say:
> newThing.date = new Date(newThing.date), or else it'll deserialize as a
> string.

Only if the software running on the other end does not support your data
format.

Meanwhile, HTML uses string attributes to declare languages, and no one ever
complained that browsers may interpret the lang tag as a string.

~~~
pfooti
I feel like we may be talking past each other here. Imagine:

    
    
      const data = { name: 'foo', time: new Date('05 October 2011 14:48 UTC') };
      const data2 = JSON.parse(JSON.stringify(data));
    
      typeof data.time === 'object'  // true
      typeof data2.time === 'string' // true
    

What I'm getting at is: JSON.parse and JSON.stringify convert date objects
into strings. While dates are not javascript primitives, there are no other
things I can think of that you'd use to hold state that aren't preserved
across JSON.parse(JSON.stringify(thing)) boundaries. If you want to write a
JSON handling API endpoint, you can pretty easily process all the incoming
data and deal with it in a zero-knowledge fashion, but if you want to properly
deal with dates there, you're going to have to add knowledge to the parser
receiving the JSON object.
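
One common workaround (not zero-knowledge, but at least contained in one place) is a JSON.parse reviver that pattern-matches ISO 8601 strings; the regex and the `dateReviver` name here are my own illustration, not anything from the JSON spec:

```javascript
// Reviver that turns anything shaped like an ISO 8601 timestamp back
// into a Date. The regex is deliberately narrow; this is a heuristic
// applied to string values, not part of JSON itself.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function dateReviver(key, value) {
  return typeof value === 'string' && ISO_DATE.test(value)
    ? new Date(value)
    : value;
}

const parsed = JSON.parse(
  '{"name":"foo","time":"2011-10-05T14:48:00.000Z"}',
  dateReviver
);
// parsed.time is a Date; parsed.name is still a string
```

The obvious downside is exactly the complaint above: any string that merely looks like a timestamp gets converted too.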

This is a complaint I have about JSON in general, as a method for serializing
data for transport across the wire. XML is an annoying format to work with,
but at least metadata was _possible_ (of course, dealing with DTDs or other
metadata formats basically made zero-knowledge parsing impossible anyway; you
just wrote a meta-parser that used the DTD to encode the knowledge).

If, instead of '2011-10-05T14:48:00.000Z' or 1317826080000, we took a page
from binary/octal/hex numeric representations and defined a date prefix:
0xc0ffee is a number, so why can't Dx1317826080000 be parsed by JSON.parse as
a date?

I mean, I know why. But it's still annoying that, when receiving data from an
API, I basically can treat everything in the API as fine as written, except
that I have to go in and fiddle date strings into date objects.

------
dep_b
Backward compatibility seems terrible to me. A regular JSON parser will
produce garbage from this, since key names are changed, while an XML parser
with no context about how to parse specific fields will still produce correct
data. Your dates might remain strings, for example, but the strings are still
correct.

------
TazeTSchnitzel
I question the value of the tags where they state the obvious. Does this

    
    
      {"foo:O":{}}
    

really tell you more than

    
    
      {"foo":{}}
    

?

The ability to encode sets, integers, binary data and time stamps is useful.
But why tag things which are what they look like? It's a waste of space.

~~~
bascule
Domain separation. Unless everything is tagged, an attacker can trick the
parser into misinterpreting the type of an object.

Or, a more mundane explanation: the parser will silently clobber the name
because it contains a ":"

Leaving any names untagged is ambiguous.

------
zwerdlds
Anyone care to give a first blush comparison vs protobuffs/json schema?

~~~
dolmen
JSON Schema is a much more complete language for validation, and every type
in TJSON's limited set can be described with JSON Schema.

Besides that, in JSON Schema the schema is not bundled with the data. This is
a feature for input validation: the receiver must know what it allows, not
just what it receives. It is also a feature for readability (which is a great
strength of JSON), as the data is not encumbered with the schema. A receiver
is free to use a schema or not, while TJSON forces a receiver to recognize its
dirty format.

So TJSON brings nothing new, except interoperability problems.

------
panic
Some previous discussion:
[https://news.ycombinator.com/item?id=12856968](https://news.ycombinator.com/item?id=12856968)

------
keredson
Why would you define the type of something in the PARENT object?

~~~
cartercole
so you can be cute and have a reason to enforce no top-level arrays, with the
added benefit of AJAX security... but really, why would you dirty your key,
which allows fast lookup, with type information, and keep yourself from using
":" in keys?

------
Entangled
I prefer YAML no matter how much lipstick you put on JSON.

~~~
dolmen
A major problem with YAML is that the spec is so complex that no existing
parser implements it fully. And each implementation supports a different
subset. YAML is just not interoperable.

