
Introducing TJSON, a stricter, typed form of JSON - bascule
https://tonyarcieri.com/introducing-tjson-a-stricter-typed-form-of-json
======
bruth
All keys in JSON must be strings, so they should not need tags of their own.
Instead, why not put the tag for the value assigned to a key in the key
itself:

    
    
        {
            "s:string":"Hello, world!",
            "b64:binary":"SGVsbG8sIHdvcmxk",
            "i:integer":42,
            "f:float":42.0,
            "t:timestamp":"2016-11-02T02:07:30Z"
        }
    

This avoids having to alter the values themselves, and integers don't need to
be encoded as strings.
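
A rough Python sketch of how a consumer might decode this convention (the
`decode_tagged` helper and its tag table are illustrative, not from any spec):

    
    
        import base64
        import json
        from datetime import datetime
        
        def decode_tagged(obj):
            # Decode an object whose keys carry "tag:name" type prefixes.
            out = {}
            for key, value in obj.items():
                tag, sep, name = key.partition(":")
                if not sep:
                    out[key] = value  # untagged key passes through unchanged
                elif tag == "i":
                    out[name] = int(value)
                elif tag == "f":
                    out[name] = float(value)
                elif tag == "b64":
                    out[name] = base64.b64decode(value)
                elif tag == "t":
                    out[name] = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
                else:  # "s" and anything unrecognized
                    out[name] = value
            return out
        
        doc = json.loads('{"s:string": "Hello, world!", "i:integer": 42}')
        print(decode_tagged(doc))  # {'string': 'Hello, world!', 'integer': 42}
    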

EDIT:

I see this constraint:

    
    
       Member names in TJSON must be distinct. The use of the same member name more than once in the same object is an error.
    

which is still satisfied; however, you could have `i:foo` and `s:foo`, which
would result in redundant keys in the resulting JSON document. This constraint
could be clarified to state that untagged key names must be unique.

Another question: is a MIME type planned for this? `application/tjson`?

~~~
snarf21
I agree: why define a new format that is more verbose when you can just make
it a convention at first and let parsers evolve? I probably wouldn't use `:`
even in quotes, to prevent confusion. Something like this seems safe and
doesn't break anything:

    
    
        {
            "string$s":"Hello, world!",
            "binary$b64":"SGVsbG8sIHdvcmxk",
            "integer$i":42,
            "float$f":42.0,
            "timestamp$t":"2016-11-02T02:07:30Z"
        }
    

This makes it easy for the parser to determine whether it should perform type
checking. If you run this JSON through an untyped parser, you could easily
strip out the `$type` suffix yourself (until those parsers evolve as well).
Surely not perfect, but it gives you self-describing data and the ability to
perform type checking if desired. $0.02
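
A consumer on an untyped stack could do that stripping with a few lines; a
minimal Python sketch (the `strip_dollar_tags` name is hypothetical):

    
    
        import json
        
        def strip_dollar_tags(obj):
            # Drop a trailing "$type" marker from each key; values stay as-is.
            return {key.rsplit("$", 1)[0]: value for key, value in obj.items()}
        
        doc = json.loads('{"string$s": "Hello, world!", "integer$i": 42}')
        print(strip_dollar_tags(doc))  # {'string': 'Hello, world!', 'integer': 42}
    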

~~~
bascule
Putting type sigils on object keys does not solve the problem of typing
arrays, unless array elements are always homogeneous in type, disallowed as
the root symbol (they are presently allowed), and always typed by their
membership in an object (and therefore by the key referring to them). It also
does not solve the problem of how to type multidimensional arrays.

The question of homogenous types for non-scalars is still an open issue, and
is probably the best place to further discuss this:

[https://github.com/tjson/tjson-spec/issues/23](https://github.com/tjson/tjson-spec/issues/23)

As an aesthetic note: I personally find "$" visually noisy as a sigil, and
think it has generally lost favor as a sigil for commonly used expressions in
programming languages, though it is probably familiar to users of Perl, PHP,
bash, and BASIC.

~~~
snarf21
Arrays are a more complex issue. The biggest question is whether a change can
break existing parsers or not. Additionally, the extra typing for things like
heterogeneous or nested arrays would require application code that understands
the typing, instead of leaving that up to the parser. I think the simplest
rule for now would be to only allow homogeneous arrays. This is quite an
interesting problem. (Other suggestions to send along a JSON Schema seem
unrealistic; the beauty of JSON is its simplicity and brevity, and nobody
wants another XML.)

    
    
        { "homog1$ai": [2, 3, 4] }
        { "homog2$as": ["a", "b", "c"] }
    

I don't love `$`, but `_` is so much more likely to be used in a key name for
clarity, like `first_name`. I also doubt that many people end their keys with
`$type`, so there is unlikely to be a conflict; if they do, it is probably an
internal coding standard used for a similar purpose anyway. Personally, I
think things like jQuery have trained people to see `$` as a marker for
"identifier", so it feels pretty natural, at least at this point. Again, just
my $0.02, and your mileage may vary....

~~~
bascule
Please see this issue for homogenous typing of arrays:

[https://github.com/tjson/tjson-spec/issues/23](https://github.com/tjson/tjson-spec/issues/23)

Also based on the feedback I've received, I'm putting together a full proposal
for moving all type information to object keys, and fully typing all non-
scalars (and nested non-scalars) in a way that will be friendlier to
statically typed languages.

That said, I don't think the "$" thing is going to happen.

~~~
bascule
I've made a concrete proposal for moving type signatures exclusively to a
postfix tag on object keys here:

[https://github.com/tjson/tjson-spec/issues/30](https://github.com/tjson/tjson-spec/issues/30)

------
tiglionabbit
When have you ever written a program that doesn't know ahead of time what type
of data it's going to be operating on? Especially if you're using a statically
typed language.

Whether you validate incoming payloads in JSONSchema or not, you will always
have some understanding of what the shape of the incoming JSON is supposed to
be, down to the most concrete types. You'll probably receive many JSON
payloads that all conform to the same schema. So why bother redundantly
describing that schema in every individual payload?

If you want strict types, write a JSONSchema. If you need to know specific
sub-type information, start specifying what should go into the "format" field
in JSONSchema. They did it in Swagger:
[http://swagger.io/specification/](http://swagger.io/specification/)

Since the article complains about JSON parsers not knowing how to handle
certain situations, perhaps people should start writing JSON parsers that
allow you to pass in a JSONSchema document at parse time so they're sure to
handle each field type correctly.
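
Something close to this already exists in Python with the `jsonschema`
package, though the validation runs after parsing rather than during it (the
schema here is just an illustration):

    
    
        import json
        import jsonschema
        
        schema = {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "created": {"type": "string", "format": "date-time"},
            },
            "required": ["id"],
        }
        
        payload = json.loads('{"id": 42, "created": "2016-11-02T02:07:30Z"}')
        jsonschema.validate(instance=payload, schema=schema)  # raises on mismatch
    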

~~~
lobster_johnson
There are many use cases where you don't know the shape of the data. Many apps
need to index, store, or transform arbitrary key/value pairs without knowing
anything about what those keys or values mean. JSON is a schemaless
interchange format, so those situations arise pretty much by default.

Not that I love this format -- fixing JSON needs a bit more effort, especially
on syntax.

~~~
colanderman
> Many apps need to index, store, or transform arbitrary key/value pairs
> without knowing anything about what those keys or values mean.

Then those apps _shouldn't be interpreting those values_. E.g. if you don't
know and don't care whether a given JSON number is an integer or a decimal,
_don't represent it as a number in your app_. Just copy the serialized number
verbatim (or a canonicalized version thereof).
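
Python's standard json module supports exactly this via its parsing hooks:

    
    
        import json
        
        # Parse the structure, but keep every numeric value as its literal text.
        doc = json.loads('{"id": 90071992547409921}', parse_int=str, parse_float=str)
        print(doc)  # {'id': '90071992547409921'} -- no precision lost
    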

~~~
lobster_johnson
Of course, if you're not touching it, that's a fine strategy. But maybe you're
transforming it. Or you want to parse it, extract some value and send it
somewhere. There are lots of use cases where there's no way around parsing and
interpreting the data types in a JSON blob.

~~~
colanderman
I don't mean the entire JSON structure. Go ahead and parse that; leave the
individual _values_ opaque.

------
bpicolo
I'm still waiting on XML with curly braces instead of angle brackets. As far
as I can tell, that's all that's holding us back.

~~~
falcolas
Yup, we already have schema validation, JSON-RPC, and transformations; all
that's really missing is namespaces and comments.

Then we can go full WSDL and SOAP.

~~~
mianos
I find this comment extremely offensive. Microsoft is going to fix the
namespaces in their SOAP for .NET real soon now. In the meantime, all you have
to do is put a few patches in your non-.NET SOAP code to deal with the badly
formed namespaces. Besides, those 20 patches have only been needed for ten
years now.

~~~
falcolas
The sarcasm detectors seem faulty today. I got it at least. ;)

~~~
mianos
My mistake is probably making a joke of the pain people have gone through with
SOAP. It is probably not funny to many. :)

------
msoad
It's amazing how many people are trying to reinvent protocol buffers! Every
time I see something like this I think the developer didn't do their research,
or maybe they wanted to make a hobby project anyway. Stuff like this is
dangerous to use in production. Even JSON, as simple as it looks, had a lot of
bugs that are only now well understood.

If you want typed data structure transfer, use protocol buffers.

~~~
dewitt
I take it you don't know who either of the two authors are?

- [https://en.wikipedia.org/wiki/Ben_Laurie](https://en.wikipedia.org/wiki/Ben_Laurie)

- [https://github.com/tarcieri](https://github.com/tarcieri)

They know about protocol buffers.

~~~
Cyph0n
Pretty damn impressive. I never knew about them either to be honest.

------
zeveb
> Its primary intended use is in cryptographic authentication contexts,
> particularly ones where JSON is used as a human-friendly alternative
> representation of data in a system which otherwise works natively in a
> binary format.

The author might care to take a look at canonical S-expressions, a format from
the 90s which attempted to do the same thing for many of the same reasons, and
has the advantage of being rather more elegant.

E.g.:

    
    
        {
            "s:string":"s:Hello, world!",
            "s:binary":"b64:SGVsbG8sIHdvcmxk",
            "s:integer":"i:42",
            "s:float":42.0,
            "s:timestamp":"t:2016-11-02T02:07:30Z"
        }
    

could be:

    
    
        (string "Hello, world!"
         binary [b]|SGVsbG8sIHdvcmxk|
         integer [i]"42"
         float [f]"42.0"
         timestamp [t]"2016-11-02T02:07:30Z")
    

That is a perfectly valid encoding, but one can also use the canonical
encoding (useful for cryptographic hashes):

    
    
        (6:string13:Hello, world!6:binary[1:b]13:Hello, world!7:integer[1:i]2:425:float[f]4:42.09:timestamp[1:t]20:2016-11-02T02:07:30Z)
    

Which can be encoded for transport as:

    
    
        {KDY6c3RyaW5nMTM6SGVsbG8sIHdvcmxkITY6YmluYXJ5WzE6Yl0xMzpIZWxsbywgd29ybGQhNzpp
        bnRlZ2VyWzE6aV0yOjQyNTpmbG9hdFtmXTQ6NDIuMDk6dGltZXN0YW1wWzE6dF0yMDoyMDE2LTEx
        LTAyVDAyOjA3OjMwWik=}
    

Granted, 'elegance' is in the eye of the beholder, but I like it.
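
The canonical (length-prefixed) encoding is simple enough that a sketch fits
in a few lines of Python; this ignores display hints like [1:b] and is not a
full csexp implementation:

    
    
        def canonical_sexp(atoms):
            # Encode a flat list of byte-string atoms as a canonical S-expression,
            # each prefixed with its decimal length and a colon.
            out = b"("
            for atom in atoms:
                out += str(len(atom)).encode() + b":" + atom
            return out + b")"
        
        print(canonical_sexp([b"string", b"Hello, world!"]))
        # b'(6:string13:Hello, world!)'
    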

I also think that there's a deeper concern with any shallow notion of types.
An application doesn't care so much about 'some integer' as it does about 'a
valid integer for this domain,' and _that_ concern is what leads to schemas
and profiles and things like that. Just encoding the machine type of a value
is insufficient: one has to encode the _domain_ type, which means conveying
the domain, which means assuming some sort of shared knowledge.

~~~
bascule
S-expressions are great, and I'm a big fan of SPKI/SDSI, which used
S-expressions in a security context.

However, they have generally not gained favor in the greater programming
ecosystem, whereas JSON has. TJSON is trying to tap into the greater ecosystem
of people who are familiar with JSON to some extent. Hence its backwards
compatibility with JSON, and not adding a backwards-incompatible type syntax,
as Amazon Ion did.

------
lillesvin
I feel like there's a missed opportunity in not calling it TySON or something
like that.

That aside, wouldn't it make more sense to fix the JSON parsers instead? They
are the ones having issues parsing e.g. 64-bit integers; JSON itself has no
problem holding them.

~~~
rpedela
I was confused by the claim that JSON parsers do not handle 64-bit integers.
If the parser is written in JavaScript, then it has a problem, because
JavaScript does not support 64-bit integers. But I have not seen that problem
in any other language. For example, Postgres's JSON parser can handle whatever
the maximum size of PG numeric is, and Python can handle extremely large
numbers as well.

~~~
bascule
From RFC 7159 section 6. Numbers:

[https://tools.ietf.org/html/rfc7159#section-6](https://tools.ietf.org/html/rfc7159#section-6)

    
    
       Note that when such software is used, numbers that are integers and
       are in the range [-(2**53)+1, (2**53)-1] are interoperable in the
       sense that implementations will agree exactly on their numeric
       values.
    

You can't depend on interoperable support for 64-bit integers in JSON.
Furthermore, many JSON libraries convert all numbers to floats, so this
problem doesn't affect only JavaScript.

TJSON requires conforming parsers to support the full 64-bit signed and
unsigned ranges. This will involve using bignums in JavaScript.
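
The boundary is easy to demonstrate in Python by simulating a parser that
routes integers through a double:

    
    
        import json
        
        n = 2**53 + 1  # 9007199254740993, just outside the interoperable range
        as_float = json.loads(str(n), parse_int=float)
        print(int(as_float) == n)  # False: the value silently became 2**53
    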

~~~
rpedela
Which ones besides JavaScript implementations? Do you have examples?

~~~
bascule
Go's JSON parser parses all numbers as floats, for example.

~~~
bascule
Also note that the sinister problem here is that implementations which convert
numbers to floats will silently lose precision when they overflow the range
allowed in RFC 7159. This leads to quite subtle errors, and is why Twitter
moved to encoding Snowflake IDs as strings:

[https://blog.twitter.com/2011/important-direct-message-ids-will-become-64-bit-snowflake-ids-on-sep-30](https://blog.twitter.com/2011/important-direct-message-ids-will-become-64-bit-snowflake-ids-on-sep-30)

------
justin_vanw
wow, this looks awful and painful.

There's no reason to tag the type of a field when you have a typed syntax.
The real problems with JSON aren't addressed by this at all:

keys have to be strings

lack of 'attributes' like XML, which means you have to make a document
convoluted from the start

For example, let's say I'm storing product data. I might do it like:

    
    
        {"title": "Billy goes to Buffalo", "page_count": 193, "author": "Ray Broadbunky"}
    

But later I might want to store attributes or metadata; in XML this doesn't
change the schema of the document:

    
    
        <product>
          <title>Billy goes to Buffalo</title>
          <page_count>193</page_count>
          <author>Ray Broadbunky</author>
        </product>
    

Can be extended to:

    
    
        <product>
          <title human_verified="false">Billy goes to Buffalo</title>
          <page_count human_verified="true">193</page_count>
          <author human_verified="true">Ray Broadbunky</author>
        </product>
    

It's not beautiful, but anything using this data will not have to change at
all to accommodate metadata like this.

However, with JSON you have to either add new data that can somehow be joined
to the original, or, more commonly, you have to be very defensive and 'plan
for' this stuff, greatly complicating the schema.

You end up starting with:

    
    
        {"attributes": [ {"name":"title","value":"Billy goes to Buffalo"},
                         {"name":"page_count", "value":193}, ...
    

so that you can add unanticipated things later without breaking consumers of
the data.

But at least some of JSON's problems are addressed:

no standard way to store bytestrings

lack of a time type

------
marianoguerra
Isn't there a way to extend the types to specify our own and register
constructors for them, like Transit does?

Otherwise we will be in the same place as JSON in terms of extension, where
our own types are second-class citizens.

~~~
cdmckay
Agreed. Just adding some fixed types doesn't really help that much.

Something like EDN for JSON would be cool: [https://github.com/edn-format/edn](https://github.com/edn-format/edn)

~~~
fnordsensei
Isn't Transit basically EDN for JSON in that it adds types and whatnot, and
encodes to JSON?

Or do you mean, you want a format that's sort of halfway between EDN and JSON?

~~~
marianoguerra
Transit works great, except that it's unreadable with current tools (for
example, browser devtools or attaching listeners to Kafka).

I know it's a tooling problem, but I don't see the whole world embracing
Transit.

If this format gets adopted with extensible types, we get a readable format
that has what Transit provides, and if there's no tooling support we can still
read it with standard JSON tools, or none at all.

~~~
unlogic
Transit is unreadable exactly because it has to work around the limitations of
JSON (like string-only keys) to deliver its primary features: true maps,
tagged collections, etc. TJSON only has tags for primitives, so yeah, it's not
much different from JSON in this regard, and the tooling is happy.

------
emmelaich
The labels in the example are confusing. Instead of string, binary, integer,
float, timestamp, please use something like name, password, age, height,
sessiontime.

Using string and binary is worse than using foo and bar.

------
Kinnard
Reminds me of Tyre – Typed regular expressions:
[https://news.ycombinator.com/item?id=12292389](https://news.ycombinator.com/item?id=12292389)

------
jasonkostempski
"underspecification has lead to a proliferation of interoperability problems
and ambiguities."

So TJSON has a perfect spec and everyone, now and forever, will interpret it
perfectly?

~~~
bascule
No, but it has a set of machine-readable examples which are intended to cover
JSON's currently underspecified edge cases:

[https://github.com/tjson/tjson-spec/blob/master/draft-tjson-examples.txt](https://github.com/tjson/tjson-spec/blob/master/draft-tjson-examples.txt)

------
kevinSuttle
What about [https://amznlabs.github.io/ion-docs/](https://amznlabs.github.io/ion-docs/) ?

~~~
bascule
Ion is a superset of JSON: not all Ion documents are valid JSON documents.

TJSON can be viewed as a subset of JSON: all TJSON documents are valid JSON
documents, and can be parsed by existing JSON parsers. Consuming TJSON
documents as JSON will involve stripping the tags, but as noted in the post,
people already do this sort of transformation on parsed JSON to e.g. extract
binary data.
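
A sketch of what that stripping might look like in Python; the tag set
follows the post's examples and is illustrative, not a conforming TJSON
parser:

    
    
        import base64
        import json
        
        def strip_tjson(value):
            # Recursively turn tagged TJSON keys/strings back into plain values.
            if isinstance(value, dict):
                return {k.split(":", 1)[-1]: strip_tjson(v) for k, v in value.items()}
            if isinstance(value, list):
                return [strip_tjson(v) for v in value]
            if isinstance(value, str):
                tag, sep, body = value.partition(":")
                if sep and tag == "i":
                    return int(body)
                if sep and tag in ("s", "t"):
                    return body
                if sep and tag == "b64":
                    return base64.urlsafe_b64decode(body + "=" * (-len(body) % 4))
            return value  # untagged floats etc. pass through
        
        doc = json.loads('{"s:string": "s:Hello, world!", "s:integer": "i:42"}')
        print(strip_tjson(doc))  # {'string': 'Hello, world!', 'integer': 42}
    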

------
shitgoose
JSON became so popular in the first place because of its simplicity: no
schemas, namespaces, or attributes, and less bizarre notation than XML. Let's
keep it that way.

~~~
bascule
TJSON doesn't add any of the things you just complained about.

~~~
shitgoose
It doesn't. Instead, it takes things to new heights:

"s:id":"i:11"

This illustrates what, in my mind, is the main problem with contemporary
software development. In the old days, first there was a problem, for which we
had to find a tool that was good enough. Nowadays there are plenty of tools,
for which we are hoping someone will find a problem.

------
drawkbox
Why muddy up the actual values, so that you have to parse the value itself for
a "t:" prefix, where t is the type?

Why stuff it all into one key/value pair? Why not keep them separate, so a
consumer checks whether a type is present and, if so, converts to it and
validates against it? (You can also place other validations/constraints on it,
like min/max values, length, etc.; that falls apart if you are trying to stuff
it all into one key/value pair.)

Like this:

    
    
      {
        "val":"Hello, world!",
        "type":"string",
        "validation": "[regex]"
      }
    

Instead of:

    
    
      {
        "s:string":"s:Hello, world!"
      }
    

This is typically how we type fields in JSON when needed, as there is no
parsing needed on the value. If you need to check the type and it is present,
you can act on it.
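
A sketch of acting on such a wrapper in Python (the coercion table and field
names mirror the example above and are illustrative):

    
    
        import re
        
        COERCE = {"string": str, "integer": int, "float": float}
        
        def check_field(field):
            # Coerce "val" according to "type", then apply an optional regex check.
            value = COERCE.get(field.get("type", "string"), str)(field["val"])
            pattern = field.get("validation")
            if pattern is not None and not re.fullmatch(pattern, str(value)):
                raise ValueError("validation failed")
            return value
        
        print(check_field({"val": "Hello, world!", "type": "string"}))
    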

~~~
wojcikstefan
That's a lot of extra bytes you have to send over the wire. Also, I don't
think validation makes sense. When sent by the server, it's too limited (would
lead to situations where you're doing half the validation in TJSON and half in
the client code). When sent by the client, it can't be trusted anyway.

~~~
drawkbox
True if validation is on there. I just put it in to show you could have other
easily added validations aside from type (TJSON is locked to just type as it
is concat/mashed in one value colon separated). If you just take the "val" and
"type" it is really no extra bytes or very minimal but cleaner.

    
    
      {
        "val":"Hello World",
        "type":"string"
      }
    
      OR 
    
      {
        "s:string":"s:Hello, world!"
      }
    

Pretty much the same. I guess my personal preference is that I don't like to
mash values together and then parse them back out of key/value pairs.

In the end, all validation is done on the server anyway, so types/schemas for
JSON are really just a nice-to-have and should not be relied on unless you
control both ends of the pipe.

------
mitchtbaum
This looks similar to msgpack, with saltpack for the crypto parts. Right?

[http://msgpack.org/](http://msgpack.org/)

[https://saltpack.org/](https://saltpack.org/)

------
colanderman
Six things:

1) "Lack of full precision 64-bit integers" is bullshit. Numeric precision is
not specified by JSON. If a parser can't deal with 64-bit integer values, it's
a poor parser.

2) "s: UTF-8 string" What does this mean? JSON strings are strings of Unicode
code points; JSON itself may be encoded as UTF-8, -16, or -32. So does this
mean "encode the string as UTF-8, then represent as Unicode code points"? That
makes no sense.

Does this mean "encode the string as UTF-8 and output directly regardless of
the encoding of the rest of the JSON output"? That makes no sense either.

So I'm guessing the author just conflated "UTF-8" with "Unicode", which is
concerning given that he is attempting to define an interchange protocol.

3) "i: signed integer (base 10, 64-bit range)" What does this mean?
(-2^64,2^64)? (-2^63,2^63)? [-2^63,2^63)?

4) "t: timestamp (Z-normalized)" What does that mean? There are literally
dozens of timestamp formats. Does he mean full ISO 8601, restricted to UTC?

5) What is the point of TJSON anyway? When you deserialize, you _still_ have
to check that the data is of the type you expect. At best this saves a bit of
parsing, since the deserializer can do that automatically. Various JSON schema
languages already exist, which give you this richer typechecking.

The only use case I can think of for this is exactly what the author mentions
further down the article: canonicalization for content-aware hashing. But this
only works if the only types you care about fall into the small handful he
thought of. What about, say, IP addresses? Case-insensitive strings (such as
e-mail addresses)?

6) If we're talking about canonicalization, TJSON does not say how to
canonicalize decimal numbers. I suppose this stems from the author's mistaken
belief that numbers in JSON are IEEE floats (they're not, regardless of what
common broken parsers do).

I hate to be so negative, but this really comes off as half-baked.

EDIT: Looking at the spec [1] it seems to address _some_ of these, but still
indicates a strong confusion between data _types_ (Unicode, rational numeric)
and data _representations_ (UTF-8, IEEE double).

[1] [https://github.com/tjson/tjson-spec/blob/master/draft-tjson-spec.md](https://github.com/tjson/tjson-spec/blob/master/draft-tjson-spec.md)

~~~
bascule
Responding to:

 _EDIT: Looking at the spec [1] it seems to address some of these, but still
indicates a strong confusion between data types (Unicode, rational numeric)
and data representations (UTF-8, IEEE double)._

The format is described in terms of the tags (which act as type annotations),
each of which corresponds to a specific on-the-wire format. Different tagged
serializations of the same data may correspond to data of the same type. A
better place to discuss ambiguities in the spec regarding this issue is here:
[https://github.com/tjson/tjson-spec/issues/27](https://github.com/tjson/tjson-spec/issues/27)

The idea that different on-the-wire representations of an object correspond to
the same typed data object (and can therefore result in the same hash) is core
to understanding content-aware hashing.

So to your _I'm guessing the author just conflated_ accusations, I don't
think you fully understand what's going on here.

~~~
colanderman
JSON is not defined in terms of UTF-8. That would be patently ridiculous,
since UTF-8 is a serialization.

JSON is defined in terms of Unicode code points. A string in JSON is a
sequence of code points, some of which are (necessarily) escaped, others of
which may be.

So, to say "the string must be UTF-8" _makes no sense_. The JSON serialization
_itself_ can be UTF-8 (which I presume is what the author means). But nowhere
does JSON talk about the encoding of a string _within JSON_, because _it is
not encoded_.

Furthermore, what does the author intend for escaped characters? Are they
allowed? Presumably not, since that would provide for non-canonical
representations. But _some_ escapes must be allowed, since control characters
(i.e. code points less than U+0020) _must_ be escaped per the JSON spec.
Nowhere does he address this; just a technically meaningless "strings must be
UTF-8".

~~~
bascule
_JSON is not defined in terms of UTF-8. That would be patently ridiculous,
since UTF-8 is a serialization._

TJSON is defined as a serialization format on top of a JSON-like data model.
The TJSON spec originally used the terminology "Unicode String", but moved to
using "UTF-8 String", the rationale for which is given here:
[https://github.com/tjson/tjson-spec/issues/27](https://github.com/tjson/tjson-spec/issues/27)

If your intent is to actually effect a change in the specification, that is
the proper place to do it; specific criticisms of the exact wording of the
specification, preferably in the form of pull requests, would be the best way
to effect such changes.

If your intent is not to effect a change in the specification, you're entitled
to your opinion, but I'm done discussing the matter as the discussion has
ceased to be meaningful to me. Generic criticisms like "You used 'UTF-8'
instead of 'Unicode'" outside the context of specific sections of the
specification aren't particularly helpful.

 _Furthermore, what does the author intend for escaped characters? Are they
allowed? Presumably not, since that would provide for non-canonical
representations._

You are continuing to miss the point: TJSON intends to provide a foundation
for using content-aware hashing in lieu of a canonicalization scheme, an
alternative solution which works across multiple encodings of the same data,
sidesteps the exact problems you're talking about, and also allows arbitrary
subsets of an object graph to be authenticated without rehashing/resigning.
Please see this closed issue on canonicalization ("won't do"):

[https://github.com/tjson/tjson-spec/issues/24](https://github.com/tjson/tjson-spec/issues/24)

From what I can gather, TJSON is offering a degree of abstraction you have not
yet fully gleaned. The core idea is: many serializations, one underlying data
structure/object graph. TJSON is a mere serialization layer, and indeed many
TJSON documents may refer to the same underlying data structure, but all will
have the same "objecthash":

[https://github.com/benlaurie/objecthash](https://github.com/benlaurie/objecthash)
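
The gist of that approach, heavily simplified (this sketches the general idea
of type-tagged hashing, not the exact objecthash algorithm):

    
    
        import hashlib
        
        def tagged_hash(value):
            # Hash each value together with a one-byte type tag, recursing into
            # containers, so equal object graphs hash equally regardless of the
            # serialization they arrived in.
            h = hashlib.sha256
            if isinstance(value, bool):
                return h(b"b" + (b"1" if value else b"0")).digest()
            if isinstance(value, int):
                return h(b"i" + str(value).encode()).digest()
            if isinstance(value, str):
                return h(b"u" + value.encode("utf-8")).digest()
            if isinstance(value, list):
                return h(b"l" + b"".join(tagged_hash(v) for v in value)).digest()
            if isinstance(value, dict):
                pairs = sorted(tagged_hash(k) + tagged_hash(v)
                               for k, v in value.items())
                return h(b"d" + b"".join(pairs)).digest()
            raise TypeError(type(value))
        
        # Key order doesn't change the hash:
        print(tagged_hash({"a": 1, "b": "x"}) == tagged_hash({"b": "x", "a": 1}))
    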

~~~
colanderman
> The core idea is: many serializations, one underlying data structure/object
> graph.

 _Then why is UTF-8 even mentioned?_ Or time zone offsets, for that matter?

~~~
bascule
So that it's possible to specify a rigorous set of test cases which, if all
are passed, can ideally be used to certify a conforming implementation.

In other words, to solve this problem:

[http://seriot.ch/parsing_json.php](http://seriot.ch/parsing_json.php)

While in some cases it might make sense to relax some of the requirements, I'm
a fan of keeping things simple. Call me one of those crazy people who think
Postel's Law is wrong.

TJSON specifies a set of test cases for this purpose here:

[https://raw.githubusercontent.com/tjson/tjson-spec/master/draft-tjson-examples.txt](https://raw.githubusercontent.com/tjson/tjson-spec/master/draft-tjson-examples.txt)

I prefer to define things in such a way that it's relatively easy to write a
test suite that covers all of the corner cases.

A secondary goal of TJSON is to produce a stricter format, so I'd prefer to
start with additional strictness requirements, and relax them if a reasonable
case can be made.

------
kr0
Why don't float types use a tagged string? It says "tagging is mandatory" in
the initial document, but floating point types are then omitted in the
official spec.

~~~
jerf
Floating point types are tagged by the use of the floating point grammar. It
would require the standard to be clear that the _only_ way to indicate
integers is via "i:288", though, or there will be ambiguity.

I don't know if that circle can be squared, either; if you require integers to
use the tagged string, it isn't really backwards compatible any more. If you
don't, the floats remain ambiguous.

Given that the text of the blog post suggests, probably correctly, that new
parsers will be necessary to use this format, I'm not convinced that trying to
reuse JSON's grammar is that advantageous. If I'm switching parsers, the
competition is no longer JSON, it's the full range of possible replacements,
including Protocol Buffers, Cap'n Proto, XML, BSON, and everything else. If
you're willing to replace parsers there's probably already something out there
for you.

~~~
bascule
_It would require the standard to be clear that the only way to indicate
integers is via "i:288", though, or there will be ambiguity._

The spec does this here:

[https://www.tjson.org/spec/#rfc.section.4.3](https://www.tjson.org/spec/#rfc.section.4.3)

    
    
      4.3.  Floating Points
    
         All numeric literals which are not represented as tagged strings MUST
         be treated as floating points under TJSON.  This is already the
         default behavior of many JSON libraries.
    
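
That rule is a one-liner with Python's json module, for instance, which lets
the integer handling be overridden:

    
    
        import json
        
        # Untagged numeric literals all become floats, per section 4.3.
        print(json.loads('{"s:answer": 42}', parse_int=float))  # {'s:answer': 42.0}
    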

_If I'm switching parsers, the competition is no longer JSON, it's the full
range of possible replacements, including Protocol Buffers, Cap'n Proto, XML,
BSON, and everything else._

As noted in the post (which names a similar list of binary formats), TJSON is
intended to be supplemental to binary formats, not a "replacement".

~~~
jerf
Thank you. I skimmed over that accidentally. Good.

------
rurban
This argumentation is complete bullshit and even dangerous.

> "Parsing JSON is a Minefield": From a strictly software engineering
> perspective these ambiguities can lead to annoying bugs and reliability
> problems, but in a security context such as JOSE they can be fodder for
> attackers to exploit. It really feels like JSON could use a well-defined
> “strict mode”.

Not at all. That article just outlined the differences between the various
implementations with regard to the two specs, and then added a spec test
suite, including all the undefined problems, with suggestions on how to go
forward.

JSON is already strict enough. The problem is people like the OP making it
even less strict. The latest JSON spec, RFC 7159, adds ambiguity by allowing
all scalar values at the top level, which leads to practical exploitability.
See e.g. [https://metacpan.org/pod/Cpanel::JSON::XS#OLD-VS.-NEW-JSON-RFC-4627-VS.-RFC-7159](https://metacpan.org/pod/Cpanel::JSON::XS#OLD-VS.-NEW-JSON-RFC-4627-VS.-RFC-7159)

"For example, imagine you have two banks communicating, and on one side, the
JSON coder gets upgraded. Two messages, such as 10 and 1000 might then be
confused to mean 101000, something that couldn't happen in the original JSON,
because neither of these messages would be valid JSON.

If one side accepts these messages, then an upgrade in the coder on either
side could result in this becoming exploitable."

What the OP now suggests is repeating the security mistake YAML made by adding
tags to all keys. Here types don't add security, they weaken it!

It is a security nightmare, as it leads to exploits which have, e.g., already
been added to Metasploit (CVE-2015-1592). Tagged decoders are always a
problem, and currently JSON and msgpack are the only serializers safe from
such exploits, due to their strictness.

I would suggest that the remaining JSON libraries first fix their problems by
conforming to the specs: first the secure old variant (RFC 4627) as the
default, and then maybe the relaxed new RFC 7159 variant, while noting the
security problems with interop of scalar values.

Currently only my Cpanel::JSON::XS library passes all the tests from the
Minefield article; the Ruby one, which the author complains about, does not,
for example. The type problem is especially acute in dynamic languages like
Ruby, where classes are not finalized by default.

------
amorphid
I've been writing a JSON parser when I have a few minutes here and there. I
was surprised by the lack of specificity in defining numbers, specifically
floats. If floats are known to lose precision after a few decimal places...

    
    
        iex> 1.5555555555555555
        1.5555555555555556
    

...why not just specify a max precision? You can always say "if you need a
more precise number, just store it as a string". If I wanted room for
interpretation, I'd use YAML!

------
DiabloD3
So, why would I use this instead of actual JSON (== browser support), BSON
(binary JSON), or Cap'n Proto (when I control both ends)?

------
romanovcode
I'd rather use XML than this atrocity.

------
mnarayan01
> All base64url strings in TJSON MUST NOT include any padding with the '='
> character.

This seems like it makes a streaming parser's job (slightly) more of a
headache, without any serious advantage, which seems particularly odd to me
given that this format is heavily focused on binary data.

~~~
bascule
Padding is redundant when base64url is encapsulated in a quoted string.

If you're writing a state machine-based parser that is processing quoted
base64url, it will, in amortized time, be able to find a close-quote token
faster than it will be able to find valid closing padding.
------
gengkev
I'm a bit confused that TJSON only allows UTF-8 strings. The only way to
escape Unicode characters in JSON is \uXXXX, but to encode astral characters
with this syntax, UTF-16 surrogate pairs must be used. How does TJSON handle
this if strings must be encoded as UTF-8 only?

~~~
rurban
JSON is defined to use surrogate pairs to encode these, so TJSON needs to do
nothing here.

e.g. \ud8a4\uddd1 => U+391D1
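
Python's json module illustrates this: the escaped surrogate pair comes out
as a single code point:

    
    
        import json
        
        s = json.loads('"\\ud8a4\\uddd1"')
        print(len(s), hex(ord(s)))  # 1 0x391d1
    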

------
jasoncchild
Does a time zone key trigger the enforcement of a specific ISO standard format
for the value?

------
rxbudian
Why not just have a separate metadata file? It would keep the JSON file lean.

------
novaleaf
And still no ability to have comments, which is one reason I strongly prefer
JSON5: [http://json5.org/](http://json5.org/)

------
partycoder
Can you have a typed array too?

~~~
bascule
TBD: [https://github.com/tjson/tjson-spec/issues/23](https://github.com/tjson/tjson-spec/issues/23)

------
jaimex2
This is literally protobuffs.

~~~
aboodman
It's actually the opposite of protobufs. This format is self-describing - the
type information is carried along with the data. Protobufs aren't self-
describing. You need to have the type information out-of-line in order to make
any sense of serialized protobufs.

~~~
drdaeman
I'm sort of nitpicking, but protobufs do have wire-level type tags (so old app
versions are able to handle newer schemas, with fields they don't know about
yet). They're limited, but they exist.

