JSON with Commas and Comments (nigeltao.github.io)
155 points by todsacerdoti 5 days ago | 244 comments





Let's stop pretending JSON isn't popular because it lets us be loosey-goosey with our specifications. Writing specifications is hard and time consuming, and getting people to follow them is almost impossible.

JSON is something we can (almost) all agree on for dumping loosely specified, human readable representations of data structures. It lets users and client application developers be lazy and not have to learn a new library for our preferred data format. The lack of schemas or validation means we can often get away with wishy-washy, hand-written, plain English specifications, or specifications "by example". It's convenient.

For anything intended to be robust, for serious data interchange, use something else. Anything with support for machine-readable schemas, validation, and robust encoding rules. Binary is preferable but a JSON serialization is still a must-have escape hatch.

And here's a shocking idea... provide your clients with an SDK. It's pretty easy to build one around a multi-language data serialization framework and maintaining an SDK is still easier than maintaining a spec that your clients will probably implement incorrectly.


I think the reason JSON caught on outside of the javascript community is because it provided a succinct way to represent tree structures of common types (numbers, strings, dictionaries, and lists), using a syntax familiar to anyone who has used C, Perl, Python, Ruby, or Java.

S-expressions might have caught on for this purpose, but they lacked a killer app (jquery, and later rails). Like JSON, they are easy to parse and easy to generate, but being more loosely-specified than JSON, it's less clear how to map a given S-expression to a native type. As a glue format, JSON feels just right, even if it's a bit picky sometimes (e.g. trailing commas).

For anyone who wants extra features like comments, YAML is the oldest popular format I am aware of that is a superset of JSON. For any new format to succeed, IMO it needs to sufficiently distinguish itself from both YAML and JSON, and not just support a feature set that happens to lie somewhere between the two.


I think the reason JSON caught on outside the Javascript community is because it caught on so massively inside the Javascript community, at a time when there was a lot more crossover between people working on the javascript layer of web apps and people working on the server side (back then, almost always in a different language).

I was doing mostly Perl and Javascript when it caught on, and to this day I have very mixed feelings: it's wasteful of space but still doesn't allow comments; its type system is basically a technical-debt generator; and for all that people still get it wrong pretty often. On the other hand, it's more or less human readable for simple data structures and it's more or less everywhere.

My hunch is that for a new format to take off it wouldn't so much need to not fall between YAML and JSON: it would need to be the default format of something with such super exponential growth that even us oldies would have to use it.


I suppose simple > complex wins sometimes. JSON was a serialization format. XML was a markup language being used for serialization... and created a lot of extra work and bugs for developers. Markup languages can be incredibly useful (evidence: the web, SVG), but often bring immense complexity to simple problems.

> I think the reason JSON caught on outside of the javascript community is because it provided a succinct way to represent tree structures of common types (numbers, strings, dictionaries, and lists), using a syntax familiar to anyone who has used C, Perl, Python, Ruby, or Java.

I believe it's hard to explain JSON's popularity and wide adoption without talking about javascript. With javascript, JSON was right from the start an ‘eval’ away from being parsed. The barrier to entry to adopt it simply was never there. Once you start to expose JSON APIs to clients, other servers also start to need to consume data from those APIs. Rinse and repeat until you reach mass adoption.


Oh yeah, the "good old days" where one ran eval on the response to an XMLHttpRequest.

> Like JSON, they are easy to parse and easy to generate, but being more loosely-specified than JSON, ...

They would not have been loosely-specified if they were specified, like JSON was :-) I mean this is taking things a bit backward. When tools use a data format based on S-exprs, they define more clearly what is or isn't valid (OCaml Dune, Guix, etc.)


I think JSON demonstrates that most of the time you don't need it, and doing that sort of thing badly is worse than not doing it at all.

E.g. XML has all that, and yet the average use of xml is much more fragile than the average use of json. Similarly asn.1 has all that, but does anyone actually like the fact that x509 certs use it? (To be clear, json would definitely not be appropriate for that case either).


There are both JSON [1] and XML [2] encoding rules specified for ASN.1... I have never seen it used in the wild, but realising it's a thing is horrifying enough.

[1] https://www.itu.int/rec/T-REC-X.697/en

[2] https://www.itu.int/rec/T-REC-X.693/en


I've used BER-encoded ASN.1 at work quite a bit, and the reasons you won't see it much are pretty much:

- HORRIBLE schema format.

- Very few FOSS libraries, most of which are terrible. I used Lev Walkin's asn1c [0], which I patched slightly and coupled with some semi-generic code that walks the generated data structures to emit JSON (for debugging purposes).

[0] https://github.com/vlm/asn1c


Erlang's ASN.1 is awesome! Have a look at http://erlang.org/doc/apps/asn1/asn1.pdf! It could even be used as an introductory reading to ASN.1, to be honest.

I've implemented (a subset of) ASN.1 for an SNMP server I wrote for an embedded monitoring system, so I know the pain, but it was one thing to implement it for a compact binary format. Inflicting it on a text-based format just seems like it'd multiply the pain...

The resulting JSON was pretty clean, no worse than you'd get out of Flatbuffers or Protocol Buffers JSON encoding.

It was very handy to be able to use jq on the data.


What programming languages and platforms will your SDK be compatible with, will you also be around to port the SDK to new platforms in twenty or thirty years? Those are good reasons to agree on a standardized data format instead of a specific implementation. Pretty much every language on the planet has JSON parser libraries, and if not, it's fairly trivial to write your own (or convert the JSON data to something else).

> Pretty much every language on the planet has JSON parser libraries.

Yes, but they all implement the JSON RFC slightly differently, or implement it in a totally non-compliant way.


Then it's totally fine to pick another commonly used data exchange format which is better defined, that's still much better than hiding the data format behind an SDK.

However, we're coming from XML, which was better defined, and there are also binary "quasi standard" formats like protobuf, so those problems had already been solved.

Yet still "the world" has moved to JSON, which seems to indicate that those problems probably were not all that important for most use cases.


As long as you set a limit on integer sizes (in most cases an arbitrarily low limit should be fine), don't expect exact precision on floats (each conversion between text and binary representation there is problematic), and don't use extensions (like comments or trailing commas), you should be quite fine with most JSON libraries. This won't cover all things, but a lot.

It's not a great standard if every program/library using it must explicitly state its limits for such basic things as number values.

It's very easy to do JSON wrong, to not even be aware of doing it wrong, and do it wrong in a way that limits its interoperability (see: the default behaviour of the json library in Python). This is not theoretical either, just google for 'json nan github' to see the hundreds of production programs accidentally emitting JSON that cannot be ingested by RFC-compliant parsers.


I agree that as a standard it isn't great. However the fact that it is so successful shows that the minimalistic design has benefits for adoption.

And regarding integer ranges: Any user has restrictions on top of the generic format. Some fields have to be present for the application to work, some fields have to be a string, others an array. Some integer has to be between 0 and 100, some array has to have 5 elements. Given that 99% of integers in practice fit in a signed int32 there is little problem (97% are probably 0, 1% are 1, 0.5% are -1 and only the rest other values ...) If you are on the edge you have to know and work-around ...


> Any user has restrictions on top of the generic format.

The problem is that you are not guaranteed to know, as a JSON library user, whether the library has mangled the numbers it received on the wire before your application sees them - so you don't know whether .a being set to 42 is the result of 42 being sent over the wire, or of an implementation dropping bits that are outside the range of your library's support. The RFC does not mandate what a JSON implementation should do with numbers outside its supported range.

> Given that 99% of integers in practice fit in a signed int32 there is little problem.

Until you hit that 1%, and you find this out the hard way, and you have no way of solving it because you don't control the emitting side. Again, this has happened in practice to me, when a Python library was emitting large numbers as numbers (as the RFC permits), while the receiving side silently cast them to 32-bit floats, losing data (as the RFC permits). Both sides are right per the spec. Technically, everything worked as per spec. Practically, the product was broken.

99% of the time you might be okay. But the 1% of edge cases means you can never rely on JSON, which makes it a bad interchange format if you care about reliability and safety. You _can_ make it work if you severely limit yourself and are deeply aware of all the possible issues that using JSON has (and there are a lot more). Or you can just pick some other standard that solves these basic things for you (eg. Protobuf).
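
A small Python sketch of this failure mode: the sender emits every digit (as the RFC permits), while any receiver that maps JSON numbers onto doubles silently rounds the value. Sending big identifiers as strings, as suggested elsewhere in this thread, is the usual workaround.

    import json

    big = 2**60 + 1                 # exceeds the 2**53 exact-integer range of an IEEE 754 double
    text = json.dumps(big)          # Python emits every digit, as the RFC permits
    assert json.loads(text) == big  # and round-trips it exactly (arbitrary-precision ints)

    # A receiver that maps JSON numbers onto doubles (e.g. JavaScript's JSON.parse)
    # silently yields 1152921504606846976 instead of 1152921504606846977.
    # The usual defensive workaround: transmit such identifiers as strings.
    print(json.dumps({"id": str(big)}))   # {"id": "1152921504606846977"}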


Any half serious library will be able to handle 42. If, as the designer of the API, I leave the signed int32 range, I should be aware of the problem and either clearly document what I do or find some alternative. (Like making it a string with quotes)

Random side note: ECMAScript has no integer type, but only Number, which is an IEEE 754 double, thus when dealing with such numbers in a JavaScript frontend you are in Problem Land anyways ... which again shows that boundaries have to be thought of, independently from the specification of the data exchange layer.


> the receiving side silently casted to 32-bit floats, losing data (as the RFC permits)

Had to look that up. Was under the impression the rfc only specified the syntax. But here it is

> This specification allows implementations to set limits on the range and precision of numbers accepted.

I don’t think the implication is that it must be done by silently losing precision though. A loud error would fit that description just fine. So in this case I would blame the implementation not the spec.


I really want something that’s strongly typed but doesn’t require code generation like protobufs do. Yaml doesn’t do it for me. The closest I can get is putting the type guarantees in the database and using GraphQL.

I made https://concise-encoding.org/ to deal with this:

- strongly typed

- ad-hoc or schema (your choice)

- no code generation step

- edit in text, send in binary


Why would someone choose this rather than msgpack or CBOR or protobuf or any of the other existing things in that space?

Because there isn't anything else in this space that:

- supports ad-hoc data structures or schemas per your preference

- supports all common types natively (doesn't require special string encoding like base64 or such nonsense)

- supports comments, metadata, references (for recursive/cyclical data), custom types

- doesn't require an extra compilation step or special definition files

- Has parallel binary and textual forms so that you're not wasting CPU and bandwidth serializing/deserializing text. Everything stays in binary except in the rare cases where humans want to look or edit.


That looks pretty good, actually.

I use JSON Schema to validate JSON documents.

https://json-schema.org/
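
A minimal sketch of that workflow in Python, assuming the third-party jsonschema package (pip install jsonschema); the schema and documents here are made-up examples:

    import json
    from jsonschema import validate, ValidationError  # third-party "jsonschema" package

    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "port": {"type": "integer", "minimum": 1, "maximum": 65535},
        },
        "required": ["name", "port"],
    }

    good = json.loads('{"name": "api", "port": 8080}')
    bad = json.loads('{"name": "api", "port": "8080"}')

    validate(instance=good, schema=schema)  # passes silently
    try:
        validate(instance=bad, schema=schema)
    except ValidationError as err:
        print("invalid document:", err.message)  # "'8080' is not of type 'integer'"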


Imho, statically typed languages are the ones that benefit most from schema. The current schema version is 12 but the implementations for Go, Rust, C++ and Java are all listed as draft 7. None of them support codegen either, just validation, so not exactly compelling.

> The current schema version is 12 but the implementations for Go, Rust, C++ and Java are all listed as draft 7

It's actually 2020-12, which is two versions after Draft 7 (they shifted from Draft n to YYYY-MM after Draft 7, and since then have had 2019-09 and 2020-12.)

And that's true of most languages, though there is some 2019-09 support. (It really doesn't help that there is also OpenAPI which baked in a variant—“extended subset”—of Draft 5 JSON Schema.)


OpenAPI 3.1, which was released recently, uses JSON Schema 2020-12 as the primary schema format. As a result, we can expect further consolidation of tooling, etc. in the community.

I benefit greatly from schema validation in Ruby, ensuring that ingress-processing code does not receive e.g. a String or Hash instead of an Array, which would have things blow up way after the ingress edge when a call to an Array method fails, or worse, produce silently broken behaviour that may or may not blow up even farther down the road because both String and Hash respond to e.g. #[](Integer).

Yeah but Ruby is a dynamically typed language. There's not much benefit to codegen since nothing is checked at compile time anyway.

I found code generation to be useful in Ruby with protobuf. This:

https://github.com/lloeki/ruby-skyjam/blob/master/defs/skyja...

gives that:

https://github.com/lloeki/ruby-skyjam/blob/master/lib/skyjam...

I would certainly enjoy having a DSL to write descriptive code to validate using JSON schema, but it would be even better if the Ruby definitions could be generated and persisted in Ruby files using that DSL.

Also, storing things in basic hash/array types works, but having dedicated types is useful, so that one can ensure not shoving one kind of hash in place of another unrelated kind of hash.

As for types themselves in general, there's RBS and Sorbet. One could have type definition generation as well for even deeper static and runtime checks.


Do you really want generated code to manipulate JSON? I'm not sure there is a demand for that.

Manipulating anything dynamic in a statically typed language is generally tedious and not type safe, so yes.

In any language I eventually need to validate. Whether I do it early, using a validator, or later while processing the data, is a choice that depends on the problem.

Existence of a schema definition file, and checking responses against it, signals that I can trust an API vendor to be at least aware of the requirements for clients. (Whether they randomly change the schema definition or ignore it is a second question, but at least somebody once thought about formalising it, and it's not a complete ad-hoc dump of today's internal data representation.)


And there's Amazon Ion too - https://amzn.github.io/ion-docs/

"Amazon Ion is a richly-typed, self-describing, hierarchical data serialization format offering interchangeable binary and text representations. The text format (a superset of JSON) is easy to read and author, supporting rapid prototyping. The binary representation is efficient to store, transmit, and skip-scan parse. The rich type system provides unambiguous semantics for long-term preservation of data which can survive multiple generations of software evolution."


Strongly typed + no code generation is obviously doable in any dynamically typed language.

Apache Avro has support for parsing and utilizing schemas at runtime, even in C++.

For Apache Thrift you have things like thriftpy: https://thriftpy.readthedocs.io/en/latest/

I'm not aware of a type-safe mechanism for Flatbuffers or Protocol Buffers.


There are protobuf libraries without code generation, for instance: https://github.com/cloudwu/pbc (you lose the connection to the language's type system though).

JSON does have schemas.

Sure, they are just not that widely used. As in, I've not really encountered much json with schema in the wild in my career. I know it exists (in several forms). Support for schemas in mainstream parsers is pretty much non-existent, making their use very optional, as you pretty much have to jump through hoops to do anything with them. Most people seem to opt not to do that.

I never got anything out of xml validation either. It just led to this ridiculously verbose garbage xml that was even less readable with all the namespaces, namespace declarations, etc. Very tedious to write manually as well. Also lots of weirdness doing e.g. xpath or xsl transformations against that. The whole SOAP / web services bubble eventually imploded when people figured out that you could send tiny json objects instead via REST.

I have many issues. People sending invalid JSON to my APIs is not one of them. You get a bad request if you do. End of story. Don't do that if you don't like bad requests. It's not a problem that justifies a lot of over engineering.

Json, yaml, toml, hocon, etc. are basically all just variants of attempting to send human editable blobs of information. Fine for apis where all of the requests are created by programs. Unfortunately people also abuse them for things like DSLs that are authored by humans. The real problem there is not using a strongly and statically typed language that simply does not allow illegal things. A schema is just a stop gap solution when you don't have that.

Kotlin is actually great for creating proper DSLs. You get IDE auto-completion and red squiggly lines when you do it wrong. I think Rust also has some nice syntactical constructs for creating DSLs. Typescript might also emerge as a language that is very suitable for that (with maybe a few more features borrowed from Kotlin). Using a non compiled language with weak typing kind of defeats the purpose. Hence the endless ruby but not quite ruby like DSLs for things like puppet.


The JSON schema ecosystem is a mess. json-schema.org is one thing, openapi is another (it uses an 'extended subset' of json-schema.org, which is another way of saying 'incompatible'), there’s also typeschema, and a bunch of other pet projects.

There is no official JSON schema RFC, there is no official linking between json-schema.org and the JSON RFC. There is no push to make every language that supports JSON to also support a JSON schema system.

Finally, there isn't even a guarantee that all human-readable JSON document specifications can be expressed within a JSON schema in any sensible way.


OpenAPI 3.1, which was released recently, uses JSON Schema 2020-12 as the primary schema format. The incompatibilities have been completely resolved as of today.

XML Schema is a joy?

Yes it does, but as someone else mentioned, it can be really challenging to work with. It's difficult to visualize the structure of the document. It's difficult to program against. This was one of the problems which I wanted to solve when I created unify-jdocs. Agreed that it may not do all the things that JSON schema does but it does all those which are required 90% of the time. Along with other things like not having to generate any model classes (code generation) and being able to read and write any path in a single line of code. And being able to merge JSON documents into one another. Strong typing has its pros and cons. In my opinion, for complex JSON documents, going the strongly typed way makes making changes difficult in the long run. Think of JSON documents the size of 3000 JSON paths going 10 levels deep etc. You can read more about this Java library at https://github.com/americanexpress/unify-jdocs.

Rather than keep making new variants of JSON, it'd be nice if somebody could convince some mainstream language maintainers to just update their built-in JSON parser to add optional features like skipping over comments and not caring about trailing commas. Most parsers support various flags already to configure things, so there could just be an ALLOW_COMMENTS flag and ALLOW_TRAILING_COMMAS flag.

As a case in point, the python parser breaks the JSON spec already with regards to Infinity/NaN, but then has flags to configure this. See https://docs.python.org/3/library/json.html#module-json

> The RFC does not permit the representation of infinite or NaN number values. Despite that, by default, this module accepts and outputs Infinity, -Infinity, and NaN as if they were valid JSON number literal values:
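
A short illustration of the behaviour the Python docs describe, plus the existing knobs (allow_nan, parse_constant) that restore strictness; this is exactly the kind of opt-in flag the parent comment is asking for:

    import json

    # By default the stdlib emits and accepts non-standard tokens:
    print(json.dumps({"x": float("nan")}))  # {"x": NaN}  -- not valid JSON per RFC 8259
    print(json.loads("Infinity"))           # inf

    # Existing flags/hooks turn the divergence into loud errors instead:
    try:
        json.dumps({"x": float("nan")}, allow_nan=False)
    except ValueError as err:
        print("encode refused:", err)

    def reject_constant(name):
        raise ValueError(f"non-standard JSON constant: {name}")

    try:
        json.loads("Infinity", parse_constant=reject_constant)
    except ValueError as err:
        print("decode refused:", err)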


> Rather than keep making new variants of JSON, it'd be nice if somebody could convince some mainstream language maintainers to just update their built-in JSON parser to add optional features like skipping over comments and not caring about trailing commas.

I think the most correct way to deal with this problem is to get IETF and ECMA to update the JSON standard first. Honestly, comma-after-final-element and comments are such important quality-of-life features that I never understood why they weren't part of the original spec.

On the other hand, it might be easier to just adopt TOML and forget about JSON.


>I never understood why they weren't part of the original spec

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

>Trailing commas in objects were only introduced in ECMAScript 5. As JSON is based on JavaScript's syntax prior to ES5, trailing commas are not allowed in JSON.

json was largely `eval`d when it first hit the scene. It was easy to parse because you just received it and `eval`d it directly. It worked in every browser. Security concerns over time led to this being rightfully replaced with the JSON object and its associated encoder and decoder.

however, since early javascript didn't allow for trailing commas, neither could JSON if it wanted to be able to be `eval`d.


Interesting note: Douglas Crockford's version of JSON's origin story is great. At Yahoo Crockford wanted to return data from an API as a Javascript object because it skipped a translation step and avoided messy XML. And his bosses originally objected because there was no JSON spec. So Crockford bought a website, quickly wrote up a JSON spec and published the spec to his new official looking JSON.org site. With an official spec in place and documented on the web, there was no longer cause for his bosses to object to APIs returning JSON :)

Comments were intentionally excluded from JSON, lest they be used to instruct the parser and cause fragmentation of the ecosystem.

I know it's the official reason, but it's also a really bad one - nothing prevents current JSON parsers from adding some weird syntactic rules inside plain strings (similar to "use strict" in JS), but you don't see that happen. It's always been a completely hypothetical issue.

The only real reason why comments would be problematic is that they are a pain to preserve in a consistent way when editing a file, and thus would require extra work in parsers / serializers. Still, it would be worth the cost imo.


I agree that it was stupid. I am 100% pro-comment.

In most applications reading and writing JSON are completely separate operations and roundtripping comments makes no sense.

Actual JSON-reading applications must ignore comments because they aren't allowed to care about them. On the other hand anything useful can be placed in a proper field, a comment would be a hack to pass information to human readers but not to JSON consumers (saving some time and memory). Such a technique could only be considered if people are supposed to read the JSON files and some information is useful for them but not for the reading application: a niche within a niche.

Preserving comments would actually be a complicated special purpose feature, reserved for something like structured text editors that can perform nonstandard parsing with comments included in their special object model (of the large Javascript subset/superset/variant they choose to support and roundtrip, not of JSON).


I think that json is mostly a lightweight, human-readable data exchange format for machines to communicate. It's generally not meant to be written by humans.

It's not even good for machines to communicate with, considering:

- There is no minimum/maximum number/integer value (or sizes) defined in the RFC [1] - you have no guarantee that a number you emit will be readable by all RFC-compliant parsers, so the only safe option is to emit all numbers as strings. There is also no behaviour mandated for parsers that encounter numbers/integers outside of their supported size range, so you can't rely on any fail-safe behaviour.

- There is no standardized behaviour for repeated dictionary keys (which are allowed but 'discouraged' per spec), and different implementations will treat them in different ways. This is especially painful when trying to build JSON middleware that does a parse/check/modify/emit of arbitrary data (see the sketch after this comment).

- Implementations in some languages (eg. Python) are non-RFC compliant by default (`json.dumps` will emit NaN/Inf/-Inf, even though the RFC forbids that), and generally all implementations are similar-but-different-enough to trip you up [2].

All three of these have bitten me in the past when trying to interface with something that spoke JSON, and as such I refuse to design new systems that use JSON as any source of truth.

[1] - https://www.rfc-editor.org/rfc/rfc8259.html

[2] - http://seriot.ch/parsing_json.php
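
For the duplicate-key point above, a small Python sketch of one parser's default behaviour plus a strict hook; the reject_duplicates policy is hypothetical, not part of the standard library:

    import json

    def reject_duplicates(pairs):
        # Hypothetical strict policy: refuse repeated keys instead of silently
        # keeping whichever value the implementation prefers.
        result = {}
        for key, value in pairs:
            if key in result:
                raise ValueError(f"duplicate key: {key!r}")
            result[key] = value
        return result

    print(json.loads('{"a": 1, "a": 2}'))  # {'a': 2} -- the last value silently wins
    try:
        json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
    except ValueError as err:
        print("rejected:", err)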


There are already better formats if we just want machines to communicate though; e.g. protobuf, msgpack.

The whole reason JSON is text-based is to make it human-readable while also enabling data exchange, but the lack of comments works against that goal.


Not in the browser or in many stalin.

> many stalin.

Wasn't there just one Stalin, though?


One was certainly more than enough. Unfortunately, many have tried to pluralize.

:D

Stdin. New phone, forgot to disable autocorrect.


But it's used for all kinds of config files. And there I would like to comment why stuff is done the way it is.

That's not a problem with JSON, that's a problem with people choosing JSON as a configuration file format.

I remember seeing some conversation from the JSON authors around comments and specifically not allowing comments into the spec because they did not want people to use them as extension mechanisms, so the whole no comments in JSON thing was very much intentional.


JSON parsing is already a minefield. Please see [0], specifically this chart [1]. As mentioned in a sibling comment, I think a new mimetype makes a lot more sense than stirring this pot further.

[0] http://seriot.ch/parsing_json.php

[1] http://seriot.ch/json/pruned_results.png


It is a minefield, but we all walk it all the time.

A new mime type doesn’t really help because the parser doesn’t check the mime type, it assumes the programmer did that. I’m not against making a new mime type or specification, but there are already so many and making more doesn’t seem to help. Yes it would be nice if PHP and Python implemented the logic for ignoring trailing commas the same way, but 99% of the time that isn’t all that important. Yes there will be cases where it matters and bugs are introduced, but since there are already significant differences between JSON parsers I don’t see these kinds of things as making it much worse.


It's going to have to be a different mode and mimetype because this format would break all kinds of parsers.

I wish something would be done in this area relatively soon though, because JS has added a number of features that would make a new JSON much nicer, like multi-line strings and BigInts.

I think JSON + comments, commas, template literals, BigInt, NaN, Infinity, and BigDecimals (if/when those land in JS) would be very useful. (It'd be nice to include dates, but that's tricky w/o a literal and because dates)


JSON doesn’t specify its numeric types: the mapping of a string of digits to concrete numeric types is implementation-defined: so, JSON doesn’t need specific syntax for BigInts or arbitrary-precision decimals.

Current parsers cannot start returning BigInts instead of numbers without that being a breaking change. And I'm not sure anyone wants a format where the result may change types based on the size of the number, especially for languages where arbitrary precision types are not compatible with other numbers.

> Current parsers cannot start returning BigInts instead of numbers without that being a breaking change.

Current parsers aren't uniform here. Since the JSON spec is silent on what post-parsing format is used for numbers, each parser is free to do whatever makes sense in the context of the host language. I reckon you'll find some JSON parsers use bigints already, especially in languages with first-class bigint support (such as various Lisp dialects)

> And I'm not sure anyone wants a format where the result may change types based on the size of the number

That's nothing to do with the format, that's to do with the parser. Some parsers already do exactly that – use an integer type for numbers that are integers, use a floating point type for numbers that contain decimal points
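
CPython's json module is a concrete example of that parser freedom: the in-memory type follows the lexical form, and the caller can override it entirely. A small illustration:

    import json
    from decimal import Decimal

    # The RFC is silent on in-memory numeric types, so the parser decides:
    print(type(json.loads("42")))          # <class 'int'>   -- arbitrary-precision integer
    print(type(json.loads("42.5")))        # <class 'float'> -- IEEE 754 double
    print(json.loads("9007199254740993"))  # exact, even though it exceeds 2**53

    # ...and the caller can swap in another representation entirely:
    print(repr(json.loads("42.5", parse_float=Decimal)))  # Decimal('42.5')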


> I reckon you'll find some JSON parsers use bigints already

Yep. Python is one such language. I've seen this catch people by surprise when they discover their serial number (which granted, should have been a string in the first place) doesn't survive a trip from Python to JSON to Javascript, among other languages.


It doesn’t help that it’s called a serial number, and is sequential (hence, serial, but some manufacturers don’t care). num++ is easier than implementing increment_number_string that works on ASCII digits.

Yep, "serial number" was basically the argument for using a number type in the first place, and here they were sequential. Even then, ignoring the JSON peculiarities, it would have been ok up till someone had the brilliant idea to have x digit serial numbers for one product line, and y digit for another. With only zero padding to tell them apart. This person, not a programmer, could not fathom why it caused the "tech heads to go crazy".

Sometimes I debate collecting stories like this.


I can see why this example appeals to you. However, asking implementors to go outside of a spec is _in effect_ making a new JSON variant. Once you see this, you can understand why people want a spec with a name.

Wouldn't you just end up with (for example) "Python-flavoured JSON", just like we have "GitHub-flavoured Markdown"?

On the other hand, there is an RFC for JSON where there isn't for Markdown, so it ought to be possible to make use of the usual standards process to formalise a successor to the original JSON which might include some of the very common additions some people always want (trailing commas, comments, proper dates) - that this hasn't happened over the last 15 years suggests really that the problem is actually due to lack of agreement.


Seriously, I believe the omission of Infinity and NaN from JSON is a huge mistake, if not Crockford's bad joke. It is commonly said that JSON was originally going to be `eval`ed and Infinity and NaN could have been redefined, but that eval already had to be preceded by filtering anyway, so you could put in a few lines to ensure that Infinity and NaN are the expected values. Or use `Function` instead. Or use `1/0` or `0/0` as pseudo-literals. Not every JS value is present in the JSON data model, but it's absurd that not every JS number is present in it.

Or just do RSON? https://github.com/rson-rs/rson

I honestly don't want to bash JS, but typing is not its strength.

(Not affiliated)


I think that's the opposite of what GP is advocating; they're arguing that rather than defining entirely new formats, a better approach is to just augment existing JSON implementations. I'd guess that the rationale is that it would be a tougher uphill battle to get people to switch to an entirely new format, but adding new features to implementations they're already using wouldn't require making anyone switch.

And I generally think good constraints are more empowering than good options when it comes to any form of systems design. It took me 20 years in the industry and 20 minutes with cynefin to understand this.

While we are at it, how about making the parsers accept non-ASCII whitespace, so that things don't break if somebody using Windows edits a file somewhere and adds a BOM.
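
For the BOM case specifically, Python's json module is one example of a parser that rejects it outright, though the stdlib codecs make the workaround a one-liner. A small illustration (the config.json filename is hypothetical):

    import json

    payload = "\ufeff{\"retries\": 3}"  # UTF-8 BOM prepended by a Windows editor

    try:
        json.loads(payload)             # rejected: "Unexpected UTF-8 BOM (decode using utf-8-sig)"
    except json.JSONDecodeError as err:
        print(err)

    # The BOM-aware codec strips it before the parser ever sees it:
    print(json.loads(payload.encode("utf-8").decode("utf-8-sig")))  # {'retries': 3}

    # Or, when reading from disk:
    # with open("config.json", encoding="utf-8-sig") as f:
    #     config = json.load(f)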

ruby's JSON.parse accepts comments (but not trailing commas)

The page addresses that there are other alternatives/supersets. I think the most common one that has decent adoption is JSON5.

I would like to see why the author chose to do yet another format, vs adopt JSON5.


I find that JSON5 does too much.

What are your specific dislikes?

I've been finding plain JavaScript to be a very nice alternative to the many config file formats out there. It already has the trailing commas + comments bells and whistles, plus the ability to do computation to generate repetitive elements. Of course there's always the danger of the user creating a monstrosity of a 'config file', but if they're the ones using the software that consumes the config file, that's on them to keep the config file complexity down. Go already has a very useful JavaScript engine written in pure Go (goja), and I recently opted for that as the config file format over templated JSON/TOML/HJSON or Starlark/Tengo because JavaScript is well defined and expressive. I'm thinking it will be a good choice for an authorization rules engine as well (over custom DSLs like OPA/Casbin) because using it is so much simpler if the user already knows JavaScript.

Turing complete config files makes me a bit nervous.

Not just turing complete, but impure. Configuration files shouldn't be able to perform arbitrary IO.

Starlark exists to solve this exact problem, and is a Python subset so that people don't have to learn something totally new either. It's still turing complete, but at least guaranteed to not have any side effects, or even be able to access anything but an explicitly given execution context.


@borkdude’s sci actually seems like a nice contender for this as well.

This is my preferred solution as well, nowadays.

Alternatively, just use JSON as a low-level data interchange much like a CSV file. Put all your human-managed content in a configuration language that emits JSON, like jsonnet, cue, dhall, etc. These languages add comments, functions, variables, and much, much more (like fancy data cascades, validation, schemas, imports, etc.) to make managing configuration at scale easy.

Why even go to the trouble of converting it to json then?

It simplifies your code that depends on configuration. Instead of having to parse the config language you just read the JSON. Pretty much every programming language has a JSON parser built into its standard library, while support for higher-level config languages is a lot less ubiquitous.

It also makes it easier for other tools to generate, validate, or output config that feeds into your system. They can do whatever processing they want and emit plain old JSON.


For interop with the massive json ecosystem.

For even more flexibility in JSON format there is SEN. https://news.ycombinator.com/item?id=26237048 talks about making JSON look good and introduces SEN at the end, with optional commas and quotes. It also allows comments and converts back and forth with pure JSON with no loss of data. Comments are obviously lost though, as JSON does not support comments.

I wholeheartedly agree with this idea. In case anyone wants a fast C++ parser for this, RapidJSON [1] has optional support for comments and trailing commas; I actually added trailing comma support to it.

I did this to support hand-writing game data in JSON (e.g. monster stats, campaign dialog scripts) and then converting it to MessagePack at build time [2]. This gives you very high simplicity (no schemas) and excellent performance.

[1] http://rapidjson.org/

[2] https://github.com/ludocode/msgpack-tools


For comments in plain JSON a common approach is to use keys like "comments" or "description" in object literals.

As for the trailing comma, the only real issue in practice is noisy unified diff output. But for JSON, a word diff or similar works better in any case, and then the trailing comma is not an issue.


Adding a property to an object that does not belong to the object is pretty much the worst workaround for comments, in addition to it being much harder to read due to the lack of comment-specific syntax highlighting.

That is not the only issue in practice. I can't count the number of times somebody changed the json, adding a single line, thinking "what could go wrong", without a test...

I’ve been using Cue (https://cuelang.org) as a human-friendly superset of json, and it’s a breath of fresh air. The creator is one of the original authors of GCL, Google’s core infrastructure configuration language, and you can tell he poured all the hard-earned lessons from that into Cue. Highly recommended.

JSON is a data format. You actually want strict formalism in a data format, because the integrity of the data matters more than flexibility.

Comments would only serve to make humans edit the file by hand more than they do now, and to maintain it like a snowflake. You should not be maintaining your data in JSON, it should only be a momentary stop along the way to a more robust data storage system [with a schema].

Optional end commas would only encourage people to use shitty hacks to try to craft JSON without a real parser or encoder, which would 1000% end up with broken JSON all the time, which would then force vendors to support shitty broken JSON.

There's a reason it's designed like it is. Cutting corners is not going to result in better outcomes.


What's nice is that this can essentially be ignored by APIs and browsers.

Computer-generated data never needs to include trailing commas or comments.

Really the principal use case here is JSON as human-edited configuration files. In which case I suppose it's nice to have a name like "JWCC", but really it's just two flags for JSON decoding libraries to add (or a single nice combined flag).

So that really would be great IMHO, if you could just call json_decode($json, JSON_ALLOW_JWCC). I don't think we need a new MIME type or anything. But a file extension of ".jwcc" would be a nice convention too.


It doesn’t improve the dev experience if the compilers don’t produce it, and this is really about dev comfort. You need to be able to modify JSON programmatically and keep the comments, if the output is pretty-printed.

Also, NaN numbers are an entire feature of their own, which they talk about above.


But why should re-jsoning keep //-comments? I've never personally met the need for that. Comments help the initial filename.json to be parsed by human eyes, and when it is sucked into a program (and possibly transformed), they lose meaning, even when pretty printed. For them to stay, one can put comments into real keys and/or structure their data appropriately.

Even if one ultimately needs to keep a JSON document's human structure (like key order, comments, whitespace nuances), they may create and use a more syntax/structure-aware library to do that.


> they may use a more structure-aware library to do that

Sure. I’m particularly thinking about mvn upgrades or “npm update”, which modify pom.xml/package.json files to upgrade the libraries, after checking rules (non-breaking changes or not, vulnerabilities or latest, etc).

For mvn, libraries never succeeded in modifying the pom in place without wrecking the file format, so mvn upgrades never became a thing. It's also a demonstration that a DOM with comments in memory doesn't ensure we can output the file as-is ;)


That sounds like an entirely separate issue from what JWCC tries to achieve though.

For programmatically upgrading JSON config files, it's not just comments but whitespace, indentation, etc. that need to be preserved. Honestly that needs an entirely separate tool/library from normal JSON encoding and decoding -- e.g. special jwcc_insert() and jwcc_delete() functions that guarantee the file remains untouched (exact formatting preserved) except for the modified part.

It's really more akin to when your apache.conf file gets modified programmatically -- everything is preserved exactly except for the specific lines that get touched/added.


Here's another flavour of JSON with comments: https://pypi.org/project/commentjson/

Also, there is JSON5, which I think is more widely known

https://json5.org

https://github.com/json5/json5


> Yes, Doug Crockford deliberately removed comments from JSON but people keep putting them back in. If we’re going to have comment-enriched JSON (e.g. for human-editable configuration files), we might as well have a standard one.

The text "deliberately removed comments from JSON" links to https://web.archive.org/web/20150105080225if_/https://plus.g... where Doug not only explains the reason, but also a solution which works with the existing standard. It's imperative that the JWCC author explain what they find lacking in Doug's solution - stripping comments before handing off the a JSON parser. A new format is a really weird way to go about fixing a problem that already has a solution.


Crockford regularly displays a stunning degree of tunnel vision.

Without a standard format for comments, you have no reason whatsoever to expect the ad-hoc comments in a JSON file made by anyone who isn't you to be styled like JavaScript comments. And if your response is "then everyone needs to use JavaScript-style comments", well, TA-DA, you've just added comments to the spec. You can't have it both ways.

> It's imperative that the JWCC author explain what they find lacking in Doug's solution - stripping comments before handing off to a JSON parser

If the syntax rules for adding comments are unspecified, then the syntax rules for what to strip out are unspecified too!


I wish JSON supported comments, but I don’t really understand your criticism. You say:

> Without a standard format for comments, you have no reason whatsoever to expect the ad-hoc comments in a JSON file made by anyone who isn't you to be styled like JavaScript comments.

But, since comments are not supported in JSON, you can expect JSON from external sources to contain no comments whatsoever.

That’s precisely Crockford’s point: he didn’t want the format to have a comment syntax that works across different parties so that the only way to use comments is internally, i.e. with a custom comment syntax that you strip before interacting with external sources.


I can't believe that anyone who has ever wanted to comment a package.json file could think Crockford's argument is anything but nonsense. The problem is these files are both edited by humans, and by a plethora of tools. Without a standard comment format, it's been a nightmare trying to comment package.json file in a way that doesn't break something in the NPM/Node ecosystem.

Using JSON for configuration could be the mistake? That wasn't the original goal of JSON.

>That wasn't the original goal of JSON.

That's neither here nor there. The narrow vision many/most tools were created with is laughable compared to the actual creative uses people put them to.

Heck, the internet wasn't created for collaborating, socializing, shopping, reading, listening to music, etc., anyway, it was created to have a war-proof network for army use, yet here we are...


ARPANET, the direct predecessor for the internet, was created for collaborating, including socializing and reading. It was NOT created to "have a war-proof network for army use". The latter is a pernicious falsehood.

Quoting https://en.wikipedia.org/wiki/ARPANET :

> It was from the RAND study that the false rumor started, claiming that the ARPANET was somehow related to building a network resistant to nuclear war. This was never true of the ARPANET, but was an aspect of the earlier RAND study of secure communication. The later work on internetworking did emphasize robustness and survivability, including the capability to withstand losses of large portions of the underlying networks.[51]

See also Licklider's work with https://en.wikipedia.org/wiki/Intergalactic_Computer_Network leading up to ARPANET, or his "The Computer as a Communications Device" - https://signallake.com/innovation/LickliderApr68.pdf where he describes how the network might be used:

> You will not send a letter or a telegram; you will simply identify the people whose files should be linked to yours and the parts to which they should be linked - and perhaps specify a coefficient of urgency. You will seldom make a telephone call; you will ask the network to link your consoles together. You will seldom make a purely business trip, because linking consoles will be so much more efficient. When you do visit another person with the object of intellectual communication, you and he will sit at a two-place console and interact as much through it as face to face. If our extrapolation from Doug Engelbart’s meeting proves correct, you will spend much more time in computer-facilitated teleconferences and much less en route to meetings.

and has a section titled "On-line interactive communities":

> Available within the network will be functions and services to which you subscribe on a regular basis and others that you call for when you need them. In the former group will be investment guidance, tax counseling, selective dissemination of information in your field of specialization, announcement of cultural, sport, and entertainment events that fit your interests, etc. In the latter group will be dictionaries, encyclopedias, indexes, catalogues, editing programs, teaching programs, testing programs, programming systems, data bases, and—most important—communication, display, and modeling programs.

Collaboration and reading were surely some of the main goals in the vision that became ARPANET.


From the very page you link to:

Nonetheless, according to Stephen J. Lukasik, who as Deputy Director and Director of DARPA (1967–1974) was "the person who signed most of the checks for Arpanet's development":

"The goal was to exploit new computer technologies to meet the needs of military command and control against nuclear threats, achieve survivable control of US nuclear forces, and improve military tactical and management decision making."


That quote was immediately after a quote from Charles Herzfeld, ARPA Director:

> "The ARPANET was not started to create a Command and Control System that would survive a nuclear attack, as many now claim. To build such a system was, clearly, a major military need, but it was not ARPA's mission to do this; in fact, we would have been severely criticized had we tried."

History's complicated, isn't it?

So, we look at other things: Licklider became Program Director at ARPA in 1962 and ARPANET started in 1966 (which is when Lukasik joined as Director of Nuclear Test Detection before becoming A.D. the next year, then Director in 1971). And as Lukasik writes in his paper, the late 1960s were a different funding era than the early 1960s when Herzfeld's "foundling" started.

Recall that the 1968 Mansfield Amendment prohibited military funding of research that lacked "a direct or apparent relationship to specific military function" - far different than the Ruina years where office directors and program managers had significant autonomy and funding authority. An effective ARPA director after the Mansfield Amendment was passed is going to be someone who is good at viewing ARPA projects through that military support lens, yes? Which might be different than the lens used earlier?

Next, quoting https://en.wikipedia.org/wiki/Robert_Taylor_(computer_scient... :

> Taylor hoped to build a computer network to connect the ARPA-sponsored projects together, if nothing else, to let him communicate to all of them through one terminal. By June 1966, Taylor had been named director of IPTO; in this capacity, he shepherded the ARPANET project until 1969.[11] Taylor had convinced ARPA director Charles M. Herzfeld to fund a network project earlier in February 1966, and Herzfeld transferred a million dollars from a ballistic missile defense program to Taylor's budget.

It therefore seems very much like collaboration, at the very least, was indeed part of ARPANET's goals when it started in 1966.

FWIW, as Martin Campbell-Kelly and Daniel D Garcia-Swartz point out, at https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.58...

> ARPANET network was one among a myriad of (commercial and non-commercial) networks that developed over that period of time – the integration of these networks into an internet was likely to happen, whether ARPANET existed or not.

They further consider it "Whig history" to think ARPANET plays a critical role in the modern day internet.

And I agree with that assessment.


>History's complicated, isn't it?

Yes, but in the end it's he who pays the bills who decides. Without them it's lights out.


Herzfeld paid the initial bills. Herzfeld said "ARPANET was not started to create a Command and Control System that would survive a nuclear attack".

Lukasik came in later, and had no signing authority or supervisory position on the initial ARPANET work.

Following your guideline, the ARPANET therefore wasn't "created to have a war-proof network for army use"; that was a design goal which came later, after the collaboration goal.


Who is to say what is legitimate use and what is misuse of a creation, if not the creator?

Obviously, the world at large.

Would you consider your use of fire (e.g. for cooking) "illegitimate" if the creator of fire said so? ("No, must eat food raw! Fire is meant for heating only! Ugh!").


The people who use it.

It's like the inventor of gif declaring it is pronounced 'jif'. Who cares, basically nobody else says it like that.


What if it's a goal of JWCC?

I think TOML and YAML are both better for configuration files.

Exactly, hasn’t NPM caused enough problems? Now you want to spread that mindset to JSON?

The fact that lots of people want to comment their package.json file and can’t easily due to Crockford’s decision does not mean that Crockford’s argument is nonsense. Crockford’s argument is not that there shouldn’t be comments in JSON because no one would want to use them. And again, I would also like to have comments in JSON.

My point is that I don't buy Crockford's argument, at all, against having comments in JSON. "I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability".

So instead you have people coming up with a million different bad hacks to support comments in non standard ways (e.g. "fake comment" properties, duplicate keys, etc.) Coupled with the fact that since it's not in the standard, you never really know if the file you're creating may be read by something that won't strip comments. It's the worst of all possible worlds in my opinion, with 0 benefit.


I’ve personally never encountered any combination of JSON-encoded data and JSON parser that didn’t work perfectly together. I don’t know whether the right trade-offs were made between interoperability and other features, but I’d say it’s very clear that if interoperability was a chief design goal, it was executed extremely well.

I don't understand why you keep making the same circular arguments. If the lowest-level standard for JSON-encoded data also included comments and trailing commas, then I have no doubt that every combination of JSON-encoded data and JSON parser would be able to parse that. Instead, it wasn't put in the lowest-level spec, so what you have are lots of individual parsers with non-compatible custom flags. If Crockford's argument was "I got rid of comments because people were using them for preprocessor directives", then he just made the problem a million times worse, because now there are a million different, incompatible "preprocessor directives" that are just implemented as basically command-line args if a user wants a very basic feature, which is comments.

> If Crockford's argument was "I got rid of comments because people were using them for preprocessor directives", then he just made the problem a million times worse because now there are a million different, incompatible "preprocessor directives"

No, because his goal was to make JSON a highly-compatible interchange format, which it is! There are roughly zero incompatible preprocessor directives, and roughly zero problems using JSON to exchange data between parties. Seriously, when have you received JSON that you couldn’t immediately parse with your standard JSON parser of preference? The fact that some people might devise their own tools to store incompatible JSON on their end with things like comments, and strip those things out when transmitting them in order to make them compatible JSON, does not qualify as an incompatible preprocessor directive. On the contrary, that's precisely the intent of the choice to not have comments in JSON.


> Seriously, when have you received JSON that you couldn’t immediately parse with your standard JSON parser of preference?

Every single time for any non-mediocre definition of parsing. JSON has no way to transmit the meaning of values, no datetimes, no units, no sets, nor any other semantic attribute. That means your "parsed" result is always useless and wrong on its own, literally a lesser-dimensional projection of the original information, until you interpret it again in an entirely ad-hoc custom manner. Wouldn't it be nice if you could store instructions inline for how to do that.


JSON Schema exists exactly for that reason; hacks like adding a type field are just nonsense. If somebody needs comments and types, XML has all that. Also, how hard is it to write {"type": "datetime", "value": "2021-02-23 12:46:37.07"}?
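
A hedged Python sketch of consuming such a tagged value; the {"type": ..., "value": ...} wrapper is just the convention from the comment above (not any standard), and an unambiguous ISO 8601 timestamp is assumed for the value:

    import json
    from datetime import datetime

    def decode_tagged(obj):
        # Convention from the comment above: a wrapper object carrying an explicit
        # type tag plus an ISO 8601 value. Anything else passes through untouched.
        if obj.get("type") == "datetime":
            return datetime.fromisoformat(obj["value"])
        return obj

    doc = '{"created": {"type": "datetime", "value": "2021-02-23T12:46:37+00:00"}}'
    record = json.loads(doc, object_hook=decode_tagged)
    print(record["created"].isoformat())  # 2021-02-23T12:46:37+00:00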

> JSON Schema exists exactly for that reason; hacks like adding a type field are just nonsense

Creating schemas for declaring types is literally a hack for adding types! I think it's very bizarre to say "You don't need to do X because you can just do X."

> Also, how hard is it to write {"type": "datetime", "value": "2021-02-23 12:46:37.07"}?

"datetime" isn't a universal format specification. 2021-01-10 could be October 1 or January 10. What time zone is this in? Local to the sender, UTC, local to the recipient, somewhere else? Is this a 24 hour clock or something that gets a modifier for the other half of the day? You're making some classic mistakes that people make when they don't think about the complexity of the problem domain of communicating information unambiguously.


> Creating schemas for declaring types is literally a hack for adding type!

It is a "communication protocol" - base of any messaging system. Message has header with protocol name and version: HTTP, TCP, DB connection, SOAP, any message queue - everything working that way.

> "datetime" isn't a universal format specification. than use another (see the comment above)

A self-contained message is an anti-pattern.


> I’ve personally never encountered any combination of JSON-encoded data and JSON parser that didn’t work perfectly together.

As recently as 2017, the Google Translate API used to return completely invalid JSON with empty array indices instead of nulls ([,,,] instead of [null,null,null,null]). This broke several JSON parsers until they patched around Google's cavalier abuse of the language. Your experience of everything always working perfectly doesn't mean that everything always works perfectly.


That sounds more like a bug that, because the group responsible for the bug didn’t fix the bug and was extremely important to the community, developers of JSON parsers had to put in bug fixes on their end instead. I’m not familiar with the details of that event, but it doesn’t sound like anyone was actually accepting this as a new valid variant JSON, and AFAIK it didn’t lead to the propagation of different variants of JSON that are both in significant use but are incompatible. Or to put it another way, I don’t think people need to worry about whether they’re using “Google Translate JSON” or “Standard JSON.”

> I’m not familiar with the details of that event, but it doesn’t sound like anyone was actually accepting this as a new valid variant JSON

When they accept it, it becomes defacto valid. That's how acceptance works.

See also http://seriot.ch/parsing_json.php#41 for a big table of "JSON-encoded data and JSON parser that didn’t work perfectly together". Read the whole page though. It's quite enlightening.


> he didn’t want

He didn't want. That's great, but using JSON doesn't magically eliminate the need for documenting what things _mean_. Oh, this text isn't just text and is actually a datestamp? Oh, this array isn't allowed to have repeat values? Oh, the valid values for this field are "red", "yellow", "purple", and 5? You could have had that documentation inline. Instead you're forced to have it somewhere else because the need for documentation didn't magically vanish when Crockford waved his wand.

All that eliminating comments accomplishes is needlessly limiting the utility of an otherwise mostly fine format. JSON could have been a good configuration language. Instead it isn't and apparently people have to fight to justify fixing that.


>But, since comments are not supported in JSON, you can expect JSON from external sources to contain no comments whatsoever.

You can expect it, and then you'd be surprised when they do.

See, people really want/need comments, and they're gonna implement them anyway in dozens of little parsers/libs, and you're gonna have to deal with files having them anyway...


You can expect JSON from external sources to be whatever JSON-like notation their parser accepts - not always JSON.

If the rules for comments are unspecified, the only thing you wouldn't be able to put in them is anything that reads as JSON syntax. If you add minimal specifications for escape sequences, then even that is allowed.

> If you add specifications then

Indeed.


If I’m going to use “JSON with comments” that I have to run through a preprocessor before parsing, why not just use a language where comments are explicitly allowed (YAML, TOML, etc.)? Crockford’s answer is not a solution. His words can be rewritten as: “you want comments? Use a different language that looks and feels like JSON, but isn’t.” All that does is add confusion.
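To make the friction concrete, here's a minimal sketch of the kind of preprocessing step Crockford suggests; note that a naive version like this also mangles "//" appearing inside string values (think URLs), which is exactly why people end up wanting comments in the spec rather than bolted on:

    // Naive comment stripper run before JSON.parse.
    // Deliberately simplistic: it would also strip a "//" that appears
    // inside a string value, so a real preprocessor needs a tokenizer.
    function stripLineComments(text) {
      return text
        .split('\n')
        .map(line => line.replace(/\/\/.*$/, ''))
        .join('\n');
    }

    const config = stripLineComments('{\n  "retries": 3 // per request\n}');
    console.log(JSON.parse(config).retries); // 3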

The only reason JSON is as popular as it is is because JavaScript parses it natively (JSON.parse is defined by the JS spec).


> The only reason JSON is as popular

I mean, I remember using XML for serialization back in the day and how it made me want to kill anyone who was involved in its creation. JSON won because it's a great and simple standard and, yes, because it maps 1:1 to JS. It was very popular with people who hated JS when it came out too - just because of how much better it was than the other options.


I think the move from XML to JSON mirrored a lot of the overall switch in the software industry at large from unnecessarily overly complicated "design-by-committee" specs to simple, get-shit-done practicality. (Heck, anyone remember the horror that was the original EJB spec, or god forbid, CORBA?) The rise in scripting languages like Python and JS on the backend is another example of this.

And I see this obsession in some people in the industry with trying to prematurely optimize every last drop of their code, but they don't understand that the added complexity results in inefficiency (mostly lost improvement potential) in the long term.

>Doug not only explains the reason

Just because he had a reason to give doesn't mean it wasn't a bad reason. His stated reasoning failed with hindsight.

(a) People already do what he thought he was preventing (custom parser behavior) by removing comments (so his choice failed to prevent what he wanted to prevent).

(b) People have already created dozens of JSON parser variants that accept comments, because we really want them. And not just some niche devs - Microsoft and dozens of other big companies have tools that accept JSON + trailing commas + comments (so his choice was second-guessed and bypassed anyway, just in ad-hoc ways instead of a better, universal one).

It wasn't his place to define whether we get comments or not based on some potential parser abuse. But he did it, and now we're stuck with that decision...


> It wasn't his place to define whether we get comments or not based on some potential parser abuse. But he did it, and now we're stuck with that decision...

He was writing the specification, so of course it was his place to do so. Who else would be in the right "place" to write his own specification, one that no one was forced to use?


I disagree: he DID create a common data format that works. If you do modify it, for example by adding comments to the data sent, 99.9% of people would agree it is no longer JSON.

Which is neither here, nor there.

People still do it, and they still call it JSON. Heck, many JSON parsers which otherwise work fine with regular JSON still accept it.

Even parsers using a different name like JSONC still allude to the JSON connection - and still are an argument that people found a lack in JSON.

Etymology/originology never produced much benefit over a pragmatic, non-prescriptive examination of how people use things. If anything, it helped confuse the situation with pedantic objections.


Given the number of times people have invented and re-invented JSON + comments, I think it's clear there is a demand for JSON + comments, and that Crockford's view that you can solve everything by asking people to manually strip the comments out before parsing the JSON isn't viable.

Petabytes of JSON formatted data flowing across the internet every second of every day disagree with your assessment.

And I suppose the vast amounts of XML (still!) flowing across the internet show there's no need for JSON at all, so the real mistake was even inventing JSON?

(Also, as gets pointed out every time this topic pops up on HN, including multiple times on the submission already, the main demand isn't people wanting to add comments to the JS flowing across the wire, but to config files sitting on disk. Responding to people saying "feature X would help use case Y" by noting how many people are using it for use case Z even without feature X is logically incoherent.)


Crockford (and many others) found that the many flaws inherent to XML warranted a new specification, so JSON was born. If you have an issue with JSON, go create your own spec, don’t shit all over the perfectly capable, simple, elegant specification that is JSON.

> If you have an issue with JSON, go create your own spec

And many people have. You're literally posting this in response to someone submitting the new spec they created to solve this.

I don't understand what you're trying to argue here.


The very simple thing lacking in this solution is that sometimes you want to programmatically _add_ comments to (or preserve comments within) some output file format. If they must be stripped before parsing, then it is definitionally impossible to do either of these things.

It's not unlike saying "actual code shouldn't allow comments - you can have a separate preprocessor in your non-code file that strips those out." Sure, you could do that, but all your line numbers would be meaningless and everyone would hate the language from day 1.


> A new format is a really weird way to go about fixing a problem that already has a solution.

Stripping comments before handing off to a JSON parser is not a full solution. For example, if the file has to be augmented with additional data and written out again, the comments would be lost.


Formatting would be lost too (multiple newlines, array formatting, wrap limits, etc), and the order of keys. There are two separate tasks: comment json for a human and ignore that on parse(), and maintain pretty-json with all its usefulness programmatically with parseWithSyntaxStructure(). We shouldn’t mix them into one single problem, because in one case you just want multiple readers (a human and a computer) and in the other you want to rich-edit a human representation on its own.
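A tiny illustration of the first point - a plain parse/stringify round trip keeps only the data:

    // Round-tripping through JSON.parse/JSON.stringify preserves the data
    // but discards layout: blank lines, alignment, and any comments a
    // preprocessor stripped earlier are gone.
    const pretty = '{\n  "b": 1,\n\n  "a": [ 1, 2,\n         3 ]\n}';
    console.log(JSON.stringify(JSON.parse(pretty)));
    // {"b":1,"a":[1,2,3]}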

> a practice which would have destroyed interoperability

A baseless reason is not a compelling reason. Some imagined eventual abuse does not outweigh the everyday utility and convenience of comments. Interoperability (in this greater, undefined sense posited) cannot be maintained via syntax anyway.

> stripping comments before handing off to a JSON parser.

Putting an extra processing step before use of JSON is impractical.


The base is the part right before that.

> I removed comments from JSON because I saw people were using them to hold parsing directives

> Putting an extra processing step before use of JSON is impractical.

Care to explain?


People want to use JSON as on-disk configuration files, and that's where they want the comments and trailing commas. You can either generate the configuration file from another file, which is frankly stupid annoying and leads to repositories with two copies of the same information, or modify the software that reads the configuration file to strip it before handing it to the JSON parser... which clearly "solves the problem" but also raises the question "ok, so what is the format of the actual configuration file, then?". The answer to that latter question is more important, as the goal would then be to get everyone to standardize around that format, at which point (I guess) "JSON" would be obsolete as no one would use it and everyone would be using the new JSON wrapper format, and all implementations would be designed to have a way to read that format... but, like, at that point you are just making a weird semantics argument.

I think the core problem is we just have too many of these... like, what happened to JSON5? We need some reason to all rally around a single specific format of "JSON, but with comments and trailing commas (and I will personally request multi-line strings)".

> I removed comments from JSON because I saw people were using them to hold parsing directives

Java has not been destroyed, nor has any other language that uses injected behavior. Ironically, applying parsing directives is basically another tool being run (similar to a linter), so I'm not sure what he's even getting at when suggesting running yet another tool. This just complicates the ecosystem and is part of what makes JavaScript seem so primitive in use and syntax. Removing features for poorly considered reasons is counterproductive.


Simple example. JSON is frequently used for configuration files. Comments break syntax highlighting for JSON.

Similar thing if you want to inspect a JSON dump into a file.


Oh, it's because people were abusing them. I thought it must have been that you can't read a file then write it back out without either losing the comments or having a whole special data structure to store them in.

If the solution is to store config in a format with comments and strip them off before handing to JSON, what should that format be? Shouldn't it be formalized? Couldn't JWCC be that format?

I just happen to be working on JSON parsing today. What a mess. People don't even follow what is already there. I've seen several 'JSON' data files like this:

{<object>} {<object>}

That isn't valid JSON, is it? The Qt JSON parser doesn't handle it.


That looks a bit like NDJSON, assuming that the space between the objects is actually a newline.

There are libraries[1] that support parsing it and it's not too hard to do yourself, either. Some fairly popular projects use it to represent multiple responses in a single response body[2].

[1] http://ndjson.org/libraries.html

[2] ElasticSearch uses it for msearch response bodies, for example.
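A minimal sketch of consuming that layout, assuming one complete JSON value per line:

    // Newline-delimited JSON: parse each non-empty line as its own document.
    function parseNDJSON(text) {
      return text
        .split('\n')
        .filter(line => line.trim() !== '')
        .map(line => JSON.parse(line));
    }

    const records = parseNDJSON('{"a": 1}\n{"a": 2}\n');
    console.log(records.length); // 2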


Or JSON lines - whose website looks very similar to NDJSON's:

https://jsonlines.org/

This format works great when using Amazon Athena (Presto) against log files written with one JSON object per line.


Both websites are fancy ways of saying "one JSON object per line". The idea of laying out data that way was in common use before either of the websites; they're just recording the practice.

It looks like it might be this: https://jsonlines.org/ Which I hadn't heard of before.

FYI: I collect JSON variants with extension at the Awesome JSON - What's Next? page [1].

[1] https://github.com/json-next/awesome-json-next


I wonder if there's an equivalent for markdown variants. I wonder if there are more markdown variants or JSON variants.

I actually think the lack of comment support in JSON has resulted in better naming and better documentation by those who create JSON to be consumed by others. I don’t want comment support in JSON at this point personally.

I agree. Also, if JSON had comments, people would start putting data in comments, and soon you would have to parse the comments as well. Then someone would ask for ' for strings, and the whole of JSON would turn into garbage.

Aren't these deserialization options already in, say, Jackson?

These options are already in lots of parsers. That's part of why this proposal is attractive: we don't need much to start using it.

In fact I've been using JSON with comments and trailing commas in my own projects a lot already, and I imagine lots of other people have converged to the same idea since these are really the only two pain points when handwriting JSON. It's about time we give it a formal name.


I'm honestly surprised EDN hasn't taken off as a JSON alternative. It keeps all the good things about JSON but then adds the features people want like comments and more clear specification. The ability to tag elements to extend for specific types while allowing intermediaries to not care about those types is incredibly powerful.

IMO it would also be really nice to have non-string keys in objects, like:

{ [1, 2]: "a", [3, 4]: "b" }


You realize this would push outside of what JavaScript provides?

    Welcome to Node.js v15.7.0.
    Type ".help" for more information.
    > { [1, 2]: "a" }
    ({ [1, 2]: "a" })
        ^
    Uncaught SyntaxError: Unexpected token ','
Thar' be dragons! https://stackoverflow.com/questions/32660188/using-array-obj...

It's true JS doesn't support this, and it's also true that any JSON that supported it would therefore no longer be _JavaScript_ object notation.

However, it's equally true that a better language and object notation would support non-string scalar keys.


I can see why you want non-string scalar keys. In general, I do too. (I'm writing a data language that has them.)

However, I don't think it is useful to prod JSON (or a variant) to go in this direction.


Why not?

JSON is extremely closely linked with JavaScript. Trying to move it in a different direction, in my opinion, would require a lot of effort, without much benefit, and is unlikely to succeed.

JSON is used lots of places JavaScript isn't. It hasn't been closely linked with JavaScript since people realized eval is dangerous.

People are more interested in JSON derivatives like JSON5 or Ion than more complex languages like Dhall in my experience.


All fair points.

Dhall is new to me. Would you tell me more about the other configuration languages you've seen? I often use TOML myself.


JavaScript wouldn't be able to consume those resulting objects, so it would be pretty diminished as an alternative to JSON. Object keys are all strings.

Not that I find this idea useful for an otherwise limited DTO format, which JSON is, but JS has Map, which may be used for such maps.

Sadly, because JS Maps compare object keys by reference and arrays are mutable, you can't meaningfully use arrays as Map keys. You'd need a tuple type with value semantics, which does not currently exist.

It's usually a pretty bad idea to have a mutable object (the array) as a key. Though I guess if you consider it a 'tuple', since in JSON form it can't be modified, it'd be technically possible, though with some heavy caveats.

There’s no concept of mutability in JSON. A JSON parser could give you immutable arrays and maps if it wanted to.

Yes, but JSON by itself doesn't actually do anything. It'd always be used in the context of JavaScript/Python/etc. As I said, you could interpret it as a tuple or, I guess, a frozen array for JavaScript? I don't think there is a native immutable array in JavaScript.

If it's a map, I'm a bit more unsure how you'd check to find the object (quickly). You're basically getting into how to store a struct/class as a map key. Either way it's a bit more involved than just having the JSON parser return an immutable array and map, and the more I think about it the more edge cases there are.


> I don't think there is a native immmutable array in javascript.

I'm here to tell you that `Object.freeze()` works just fine on arrays.
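For example:

    'use strict';
    const xs = Object.freeze([1, 2]);
    try {
      xs.push(3); // throws: the frozen array cannot grow
    } catch (e) {
      console.log(e instanceof TypeError); // true
    }
    console.log(xs.length); // still 2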


Nice. Though I don't think that solves the rest of the issues I outlined sadly.

I guess you could have this json+ convert into some custom javascript class that allows arrays/objects as map keys rather than the normal javascript object that only accepts strings as map keys.


> I guess you could have this json+ convert into some custom javascript class that allows arrays/objects as map keys rather than the normal javascript object that only accepts strings as map keys.

Sure - why not? JSON doesn't have to map directly to JavaScript objects.
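As a sketch of how that could look, here's a hypothetical container (not part of any spec or library) that compares keys by their JSON text rather than by object identity:

    // Hypothetical: a map a JSON-variant parser could return for objects
    // with non-string keys, comparing keys by their canonical JSON text.
    class JsonKeyMap {
      constructor() {
        this.entries = new Map();
      }
      set(key, value) {
        this.entries.set(JSON.stringify(key), value);
        return this;
      }
      get(key) {
        return this.entries.get(JSON.stringify(key));
      }
    }

    const m = new JsonKeyMap();
    m.set([1, 2], "a");
    console.log(m.get([1, 2])); // "a", even though this is a different array instance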


Why?

One of the uses is mapping values to objects instead of strings, so you don’t have to use ids:

  ob = {...}
  somemap.set(ob, value)   // keyed by the object itself (e.g. a Map/WeakMap)
  // vs
  somemap[ob.id] = value   // keyed by a string id, as plain JSON-style objects require
In languages which allow weak keys in such maps, a garbage collector may even reclaim whole key-value pairs once ob falls out of existence. But in my opinion, in JSON this is barely useful, for a number of reasons. It is more a programming technique than a data-transfer-format feature, and these features (and reference loops) should be encoded at a higher level than JSON.

Why can't these no-brainer features just be added to browsers without a new standard!!

You mean like when Google implements some Chrome-only features, crippling other browsers, so people are indirectly forced to switch to their browser? We've been through the era of "this website works only in IE6 and better" already, and I can't remember a single advantage it brought.

That would make fetching JSON data significantly more complicated. You'd need to write your fetch call to detect if the browser supports comments and commas in JSON, and then make the request to the server to ask for whichever variant the user's browser supports, presumably with a different Accept header. Users who haven't updated to a new version of the browser would still ask for the old JSON format, and users who have updated would ask for the newer version.

That would mean your API would need to serve different variants based on the Accept header, and your backend code would need to generate the two different variants.

And then you'd have to question if sending API responses with unused things like comments is actually a good idea given it's entirely wasted data in a production environment.

I suspect few people would use a newer JSON format until it's been available for a long time, and even then they wouldn't use it on big sites.
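A sketch of what that dance would look like on the client, assuming a hypothetical application/jwcc media type and a hypothetical JSON.parseWithComments to feature-detect (neither exists today):

    // Hypothetical content negotiation for a comments-and-commas JSON variant.
    async function fetchData(url) {
      const supportsJwcc = typeof JSON.parseWithComments === 'function'; // hypothetical
      const res = await fetch(url, {
        headers: { Accept: supportsJwcc ? 'application/jwcc' : 'application/json' },
      });
      const body = await res.text();
      const isJwcc = (res.headers.get('Content-Type') || '').includes('application/jwcc');
      return supportsJwcc && isJwcc
        ? JSON.parseWithComments(body) // hypothetical parser
        : JSON.parse(body);
    }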


Yeah, but it would be cool if you could include comments, wouldn't it?

People use JSON for configuration files? What a PITA. Use a tool suited to the job, e.g., TOML.

https://en.m.wikipedia.org/wiki/TOML

Not that there's anything wrong with adding comments and commas to JSON. Call it JSONv2, keep the .json extension and let people bump around upgrading for a bit. It's hardly much of a change but has significant benefits, even outside the config file use. For instance it can sometimes be useful to annotate raw data, and it'd be nice to have that built in. Certainly the commas is a no-brainer.


Looks pretty similar to rome json (https://rome.tools/#rome-json)

Could we solve dates first, then worry about comments and commas?

ISO-8601 not good enough for you? It’s a standard, human readable, has every datetime detail, every language that supports JSON can read it.

What’s missing?


A json parser can discriminate between the integer 42 and the string “42”, but it cannot discriminate between the date 2021-02-23T08:10:19+0000 and the string “2021-02-23T08:10:19+0000” because in json, the date 2021-02-23T08:10:19+0000 will have to be encoded in a string.

That means your json parser either will accidentally parse some strings as dates, or will have to be told which strings are (or even might be) dates, introducing a partial schema, or it will have to leave it to the application to convert strings into dates. Workable? Somewhat, but not ideal, just as not supporting numbers, and leaving string-to-integer-conversion to the application (after all, JavaScript will happily convert strings to integers in eval) would be ‘not ideal’.

Also, are you going to accept 2021-02-23T08:10:19Z, too? Leaving out the seconds, minutes, …? For interoperability, you would have to specify that.
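To make the "told which strings are dates" option concrete, a common workaround is a JSON.parse reviver that guesses by shape, which is exactly the ad-hoc, partial-schema step described above (a sketch, not a recommendation):

    // A reviver that guesses which strings are dates purely by shape.
    const ISO_DATETIME = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})$/;

    function parseWithDates(text) {
      return JSON.parse(text, (key, value) =>
        typeof value === 'string' && ISO_DATETIME.test(value) ? new Date(value) : value
      );
    }

    const obj = parseWithDates('{"created": "2021-02-23T08:10:19+00:00"}');
    console.log(obj.created instanceof Date); // true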


Be careful ever saying “every datetime detail”! An obvious one is that ISO 8601 can’t represent the location of the local time with enough information to know the current (or historical) daylight saving time rules for that location. You might not need that now, but you might if you’re building a calendar app!

ISO 8601 can represent a simple offset from UTC, but it can’t do a lot of the things that the IANA time zone database can do.


Fair, very fair. Have gotten bit by that in the past.

Adding an IANA library to the ISO8601 library (often part of the same lib) has so far solved that problem for me. I’m sure it starts struggling with dates before the early 1800’s though.

Also doesn’t work well if you’re dealing with relativistic effects I think.


As silly as it is, I don't use it, strictly because the 'T' between the date and the time makes them harder to read. "YYYY-MM-DD hh:mm:ss.nnn" and a separate timezone field is the format for me.

Another problem with ISO is that the specs aren't freely available, so people will generally guess or refer to old versions, drafts, or RFCs (like RFC 3339).


I've never had a problem with dates by always serializing them as ISO-8601.

It absolutely has every benefit you cite.

It's the only ISO spec I know the name of without having to look it up.


Is the -0700 offset PDT or MST? ISO-8601 doesn’t solve a whole bunch of problems related to locale-aware date math.

Three character timezones like "PDT" also aren't good enough because they make it too easy to encode timestamps that don't make sense, like a British Summer Time (BST) time in December.

To be honest, I think you're better off just not encoding the timezone in your main timestamp string at all and instead adding the full IANA timezone name (e.g. "Europe/London") in a separate field. In sensible formats all the dates within a single JSON object will have the same timezone anyway, and to save bytes you can simply say in your spec "If absent, assume UTC".
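That approach might look something like this (the field names are just an example):

    {
      "starts_at": "2021-02-23T08:10:19",
      "timezone": "Europe/London"
    }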


Yeah, I'm just using the three character timezone as a handy shortcut: the point is an ISO-8601 timestring isn't unambiguous in all contexts.

Truly representing dates would make the semantics more complicated than JSON, Javascript, CSS, and C++20 combined.

The practical approaches are to represent it as time since the Epoch (in floating point) or a string where both sides somehow agree what "03/04/05" means.


I'd recommend always representing dates or datetimes as a string in ISO-8601 format.

Which ISO-8601 format though? There are a variety to choose from. You want the date-only format? or the date+time format? you want the T separator or space? Is time zone represented?

All these are allowed according to the ISO standard.


> You want the date-only format? or the date+time format?

If your data field conceptually represents a date (such as a date of birth, or the user's selection in a date input), use the ISO-8601 date format.

If your data field represents a time (such as the time at which a daily task should recur each day), then you should use the ISO-8601 time format.

If your data field represents a specific point in time (such as the time an action happened), then you should use the full representation of the date and time.

> you want the T separator or space?

When concatenating the date and time, the 'T' separator is used. Although, this is an implementation detail I've never had to touch myself, since every date library I've ever used has been able to format dates into an ISO format, and parse them from an ISO format without me having to consider the separator.

> Is time zone represented?

Again, this depends on what the data being represented fundamentally is. If it's a point in time that an event happened, yes. _Usually_ if you're representing a full date & time the answer is yes, you should represent the time zone (I always serialize these dates in UTC and include the TZ).

The TZ should only be forgone if the data represents the abstract notion of a specific date and time, rather than a specific point in time.

To me (other than the "T" separator), these are all fundamental data questions, and not serialization questions. You need to know whether a field is a DATE, a TIME, a DATETIME and whether is a Zoned Date Time or a Local Date Time regardless of your serialization format.

ISO-8601 conveniently supports all these uses-cases and more.
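A quick sketch of the three shapes, produced from JavaScript's built-in Date (which natively covers only the UTC-instant case):

    // Illustrative ISO-8601 serializations for the three cases above.
    const now = new Date('2021-02-23T08:10:19.070Z');
    const instant = now.toISOString();       // "2021-02-23T08:10:19.070Z" (zoned date-time)
    const dateOnly = instant.slice(0, 10);   // "2021-02-23"               (date)
    const timeOnly = instant.slice(11, 19);  // "08:10:19"                 (time)
    console.log(instant, dateOnly, timeOnly);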


Sure, but all of those variations can be parsed unambiguously as long as it's communicated to be an ISO-8601 datetime.

Well, at least the article is aware of its own vanity. I was going to cite the XKCD about "standards", but it did it for me. Just use JSON5 and stop wasting time (or YML; yes, it's not that terrible for configuration files).

Just use YAML. JSON is valid YAML.

Problem is... YAML sucks as a human-readable format.

No, it doesn’t. I know the HN crowd's feeling about this, but no, YML doesn’t suck as a human configuration language. TOML and XML properly suck. YML is actually a superset of JSON, with comments and fewer brackets if you want. It’s widespread and supported and fulfills the OP's need, so there's no need to add yet another format. Or use JSON5. But please, not yet another standard.

YAML is one of the worst formats I've ever encountered due to its mind-boggling complexity.

There's a ton of ways to represent booleans, a ton of ways to represent dates, a ton of ways to represent numbers in a ton of different bases, and a ton of ways to represent strings with various line-ending behaviors. It really has caused a significant number of easily avoidable issues in my experience.


How is TOML bad as a config language? I would not transmit data in TOML, but it makes a lot of sense as a config language.

Neither YAML nor TOML is a config language; they are data formats.

A data format is a way of structuring or serializing data. The whole point of the format is to read and write arbitrary data. It may have a simple schema, or features designed for the loading and unloading of data. But all those features are designed to assist the machine, not the human operator.

A config format is designed specifically to assist a human operator, not a program. Humans should not have to consider data types when they write a config, or when they feed it to a program. Humans should have useful features to make their lives easier, like variable substitution, pattern matching, inheritance, namespaces, simple newline-separated whitespace-indifferent commands, etc.

Programming languages are just elaborate configuration formats. And that's where the problem begins: how much "power" do you give the configuration format before it gets unruly? It's difficult to find a balance. People end up using simple data formats because the parsers are widely available and they can "fake" advanced features by making programs interpret a specifically-crafted data structure as a configuration instruction. But there's very little thought put into how this can be extended to make more complex configurations easier.

Most web servers and other complex software [usually run by sysadmins] have a real configuration format or configuration language. Poorly-written software pretends a data format is a programming language, or even worse, forces you to use an actual programming language. These designers fundamentally don't understand or don't care about the user.


Why do you think TOML sucks for configuration files? I use it everywhere and it has been a joy so far. Granted, I only used it with Go, but it has been a great experience, and the configuration files are nice. For configuration files I would rather use TOML than JSON. For other stuff? I would probably go with ASN.1!

How complex is your configuration? I've used TOML for Traefik config[1] and I've found there's so much repetition.

The single/double square bracket thing is also non-obvious at first.

[1] https://doc.traefik.io/traefik/routing/routers/


As a human-readable configuration language, YAML probably beats JSON and XML, but the implicit typing rules in the spec make it horrible to work with. I've written a lot of YAML for Ansible and it has some awful footguns.

YAML is fine as a user. It can be just JSON without some of its visual clutter if you like brackets, or much more concise if you prefer.

All the complexity related to anchors and such is unfortunate, but that’s a problem for parser writers.


> YAML is fine as a user

The link in the main article convinced me otherwise: https://noyaml.com/




Combining effort with JSON5 (~4K GitHub stars) makes sense to me.

They explicitly do not want unquoted keys. The author reviewed most prior art in TFA.

Would you like to spell out what TFA means?

People use TFA (the fine/featured article) without malice here. I apologize that I did not convey my spirit adequately.

Thanks for clarifying.

The term is very easily confused, to state the obvious.

In my opinion, I think it would be better to avoid it. I'm sorry if one negative meaning deprives people of the joy of using a nicer one, but such is the nature of language.



I did not intend this spirit. Instead I wanted to constructively point out that the article reviews several JSON quirks and supersets.

I'm aware, thanks.

I wanted to see if CameronNemo would spell it out.

I'd like to point out the HN Guidelines:

https://news.ycombinator.com/newsguidelines.html

> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."

Even putting aside what "F" stands for, this means that comments like TFA or RTFA are not in the spirit of the discussion here.


"TFA" is a common usage on HN just to neutrally mean "the article that this comment page is about". It doesn't by itself imply "RTFA".

>"Did you even read the article? It mentions that" can be shortened to "The article mentions that."

That's exactly what the parent wrote, so he's totally fine with respect to the HN Guidelines.

To quote: "The author reviewed most prior art in TFA.".


The sentence you quote explicitly states that "The article mentioned that." is a permissible and in-spirit comment.

The guidelines are important and relevant. Kind, civil discussion is important. Putting the f-word in an acronym doesn't change its meaning.

The fine article.

You can say this without using that acronym.

Please review https://news.ycombinator.com/newsguidelines.html.


[flagged]


This comment is off the mark. You are conflating comments with extensibility. They are very different.



