The Pretty JSON Revolution (ohler.com)
169 points by peterohler on Feb 23, 2021 | 156 comments

If you're on a Mac or Linux system, you likely have a JSON formatter already installed:

`python3 -m json.tool somefile.json` or `cat foo.json | python3 -m json.tool` will print it in "one line per node" format. Python 3.5 and later also provide a `--sort-keys` switch for sorted objects.
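The same formatting is available from the standard library inside a script. A minimal sketch, assuming a small made-up document (`raw` is illustrative only, not from the article):

```python
import json

# Hypothetical input; any JSON text would do.
raw = '{"b": [1, 2], "a": {"z": null, "y": true}}'

data = json.loads(raw)

# indent=4 gives the "one line per node" layout that json.tool prints;
# sort_keys=True corresponds to the --sort-keys switch.
pretty = json.dumps(data, indent=4, sort_keys=True)
print(pretty)
```

Sorting happens at every nesting level, so the inner object is reordered as well.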

jq is a very nice tool for JSON wrangling available to install on most distros. It also provides key sorting, which is great for diff-ing JSON.

jq is a nice tool. oj is similar in many ways but different in others. jq has its own proprietary query language while oj uses JSONPath. The output options are also different, with some overlap. Maybe jq will get a pretty output option after reading the article. :-)

> Maybe jq will get a pretty output option after reading the article. :-)

In my experience jq already prettifies the output. Maybe I'm missing something in your comment.

"pretty output" meaning the one called "Human Style" in the article, where multiple array elements or key-value pairs are compacted onto 1 line, IF they fit into the specified line length.

jq output is "pretty" by default, just without the ability to customize the prettification (as far as I know). If you want non-"pretty" output you need to add the `-c/--compact` option.

jq output is also colorized on a tty output. This can be forced when piping to a pager e.g. `<json-producing command> | jq -S -C . | less -R`

I found JSONPath to be so, so weak and limiting. It is missing powerful axis traversals like XPath has, and it also has very confusing semantics (the filter condition also changes the output???).

I hoped to find jq as a gem/module/library but I was disappointed. After days of searching and trying different things, I honestly could not find any powerful library or API for traversing and searching JSON.

JMESPath (https://jmespath.org/) is closer to XPath, with support for axis traversal, filter expressions, and embeddable libraries in most mainstream programming languages. It also comes with jq-style processing pipelines; the syntax is the same one that AWS uses to query AWS resources.

What does "proprietary query language" mean in this context? jq is open source, right?

I would be a little nervous sorting the keys. I thought it was not too uncommon for parsers to treat them as an alist where order matters. I guess so long as it is a stable sort, no big deal?

Pretty printing JSON is mostly for developer consumption, I'm not sure how the pretty printed JSON would end up being fed automatically to another system?

(I have actually encountered an order-dependent JSON-subset parser before, but to my mind, that code is broken)

Pretty JSON is still just JSON. It should work with any JSON parser. Of course if it is being fed from one system to another there is no need to make the JSON anything other than compact one line JSON. Pretty is for human consumption.
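A quick way to convince yourself of this is a round trip. The sketch below uses a made-up document and Python's standard `json` module; it only shows that a conforming parser sees the same value either way, not that every parser in the wild is conforming.

```python
import json

# Hypothetical one-line document.
compact = '{"name":"oj","tags":["json","pretty"],"depth":3}'

value = json.loads(compact)
pretty = json.dumps(value, indent=2)

# Both texts decode to equal Python values; only whitespace differs.
same_value = json.loads(pretty) == json.loads(compact)

# Re-serialising without padding gives back the original one-line form.
roundtrip = json.dumps(json.loads(pretty), separators=(",", ":"))
```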

Some parsers will reject JSON that has whitespace around the elements. Pretty JSON is really only for human consumption.

Outside of pretty printing - MySQL breaks your app if you depend on key order in an object because it alphabetically sorts object keys when you store it in a JSON column.

JavaScript itself will also sort object keys if they are numeric, so `{a: "a", c: "c", b: "b", "1": 1};` will be transformed to `{1: 1, a: "a", c: "c", b: "b"}`.

If you have a parser that's looking at keys in a hash as sorted, you should change your parser. Lists sure but not keys.

You're free to encode meaning in the order if you want, as the JSON spec explicitly punts on the issue:

"The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange."


It is a stable sort since, for almost every parser out there, there can be no duplicate keys. It would be dangerous for a JSON parser to assume JSON object keys are in some specific order. It is certainly not something that can be counted on in golang, Ruby, or Python.

> It is certainly not something that can be counted on in golang, Ruby, or Python.

As of Python 3.6 (in theory not guaranteed until Python 3.7), key order is preserved when reading and writing. That's a consequence of the fact Python dictionaries can now remember the order of insertion.

It's true that it doesn't support duplicate keys though (unless you pass in a different class to the object_pairs_hook parameter of loads() to replace its use of dict).
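For what it's worth, here is a small sketch of both behaviours with the standard `json` module. The input string is made up for illustration.

```python
import json

text = '{"b": 1, "a": 2, "b": 3}'

# Default: a dict, so the later "b" silently replaces the earlier one,
# while the surviving keys keep their insertion order (Python 3.7+).
as_dict = json.loads(text)

# object_pairs_hook receives every (name, value) pair in document order;
# returning the list unchanged keeps the duplicates visible.
as_pairs = json.loads(text, object_pairs_hook=list)

print(as_dict)
print(as_pairs)
```

Note that the hook is applied to every object in the document, so nested objects come back as lists of pairs too.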

Also in violation of the spec:

> An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.


The spec does call out that some parsers deal with this. And since it is JavaScript-based, leniency is the norm.

That is, you could rely on this, but be aware of it, is my point. Firefox, for example, will happily take an object with duplicates and report only the last one.

Well, by definition "unordered" means you can't count on any particular order. So while parsers may indeed preserve order, anything that relies on it is in violation of the standard.

That said, I agree that being aware of this is important if you're emitting JSON. You'd think nobody would ever address a JSON object by its ordinal position, but programmers are lazy and worse, think they're clever. :)

Exactly that. I did not mean to disagree that it is somewhat wrong. Just feels dangerous, as behavior could change with no side channel warnings.

If you have an application that's dependent on JSON key order, then you're not passing it prettified JSON.

The main apps I've seen that depend on JSON structure are for hashing, which would also be broken by whitespace / linebreak variances in pretty-printers.
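A sketch of why hashing is sensitive to formatting, and one naive way around it: hash a normalised re-serialisation rather than the raw bytes. This is only an illustration with the Python standard library; `canonical_digest` is a hypothetical helper, and sorted keys plus tight separators is not a full canonicalisation scheme (number and string escaping details are ignored here).

```python
import hashlib
import json

def canonical_digest(text: str) -> str:
    """SHA-256 of a normalised serialisation: sorted keys, no padding."""
    value = json.loads(text)
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# The same data, once compact and once pretty-printed with sorted keys.
compact = '{"b":1,"a":[1,2]}'
pretty = '{\n  "a": [1, 2],\n  "b": 1\n}'

# Hashing the raw text: whitespace and key order change the digest.
raw_equal = (hashlib.sha256(compact.encode("utf-8")).hexdigest()
             == hashlib.sha256(pretty.encode("utf-8")).hexdigest())

# Hashing the normalised form: both texts give the same digest.
canonical_equal = canonical_digest(compact) == canonical_digest(pretty)
```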

I would rather pretty printing retained the original order. Some JSON schemas I've seen put a version or class type at the top of every object and I wouldn't want that to be ordered elsewhere.

I prefer Nushell[1] for data processing. It's a full-fledged shell, but I rarely use it as an interactive shell, mostly as a scripting language and for one-off one-liners. It supports CSV, JSON, and other formats by default and gives the data a much nicer common interface.

Something like:

    open file.json | select colors | each { ^echo $it.hex }

is much nicer than jq.

[1]: https://www.nushell.sh/

I did not know about sort keys. I'm adding that to my alias. Thank you.

Oj supports a config file as well. .oj-config.sen. -help-config will describe it in more detail.

I use json_pp a lot. It's usually installed in the base OS.

    json_pp < somefile.json

The only thing I don't like is that it doesn't process commandline arguments. You have to pipe the file in. It is also fairly strict, I've run into a number of malformed JSON files that it rejects but other parsers would accept. Naked TRUE/FALSE statements are one thing it hates that are super common, especially from places like Google.

> JSON can be made prettier by sorting the JSON object members by element keys.

This seems to be a bad idea. The JSON language spec has ORDERED object members. But the order is arbitrary (precisely the one given in the JSON string) and does not have to be lexicographic.

Sorting the object members by default would introduce problems whenever the order matters to the consumer of the JSON.

> The JSON language spec has ORDERED object members.

False. “An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.” [emphasis added][0]

[0] https://tools.ietf.org/html/rfc8259

Not exactly, unfortunately. The text you cite is in the introduction, which is non-normative. That text talks about the conceptual data model, but that's just to frame the reader's thinking.

The normative text has the "real" answer, and the real answer is that it's basically undefined behavior. It starts by saying "The names within an object SHOULD be unique", and then elaborates:

  An object whose names are all unique is interoperable in the sense that all
  software implementations receiving that object will agree on the name-value
  mappings.  When the names within an object are not unique, the behavior of
  software that receives such an object is unpredictable.  Many implementations
  report the last name/value pair only.  Other implementations report an error
  or fail to parse the object, and some implementations report all of the
  name/value pairs, including duplicates.
  JSON parsing libraries have been observed to differ as to whether or not they
  make the ordering of object members visible to calling software.
  Implementations whose behavior does not depend on member ordering will be
  interoperable in the sense that they will not be affected by these differences.

> The normative text

Unlike some RFCs, which clearly and explicitly delineate normative from informative material, RFC 8259 does not, but the text you cite is on its face informative rather than normative: it does not specify what an implementation MUST or SHOULD do, or what the object model IS, it describes the variety of preexisting implementations (based, correctly or not, on prior specifications) that are in the wild.

This reads like multiple instances of the same name in a json object is undefined behavior and the user will get what they deserve if they try to rely on that behavior.

So basically, "it's a free-for-all, we guarantee absolutely nothing unless you 1) don't generate duplicate keys and 2) the code of your consumer doesn't depend on their ordering".

Yeah. Maybe it was a bit unclear, but when I used the term "undefined behavior" I think I meant exactly what "free-for-all" means in your sentence.

In theory, order shouldn't matter, right? But I recall seeing this trick for adding comments to json:

    "foo": "this is a comment about foo",
    "foo": "actual value of foo that overwrites the comment"

The trick is that the second value of foo overwrites the first. But, clearly, sorting would wreak havoc here (if the value was used in the sort key). ;)
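To make the sort-key point concrete, here is a sketch using Python's `json` module, whose `dict` handling keeps the last duplicate. It assumes the two lines above are wrapped in braces to form an object. Python's `sorted` is stable, so sorting on the name alone is harmless, while sorting on name and value flips the winner.

```python
import json

text = '''{
  "foo": "this is a comment about foo",
  "foo": "actual value of foo that overwrites the comment"
}'''

# Keep every (name, value) pair in document order.
pairs = json.loads(text, object_pairs_hook=list)

# Stable sort on the name only: duplicates keep their relative order,
# so "last one wins" still selects the real value.
by_name = dict(sorted(pairs, key=lambda kv: kv[0]))

# Sort on (name, value): "actual..." sorts before "this...",
# so the comment ends up last and wins instead.
by_name_and_value = dict(sorted(pairs))
```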

Yes. That is valid JSON.

A fun fact about MongoDB is it will actually store that JSON, both duplicate keys. The implication is that whatever MongoDB client you're using, that maps Mongo data to dictionaries/maps, is not capable of representing all valid MongoDB documents. It's important to recognize that Mongo may be storing data your client will not be able to access.

I learned this when the Python client was showing one value for a key, and the Ruby client was showing another value for the same key, and neither client was showing the whole document.

I can't reproduce that behaviour on MongoDB 4.2.x using Python and the Shell client.

It was the case some years ago. I'm not sure if it still is the case.

The spec[1] says:

> An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.

"whenever the order matters to the consumer of the JSON" should be never.

More pragmatically, regardless of what the spec says, a ton of JSON tooling assumes the order doesn't matter and relying on it would be a big mistake.

[1]: https://tools.ietf.org/html/rfc7159#section-1

The other spec (ECMA-404) disagrees with RFC 7159 on this point.

That's the great thing about specs -- if you don't like what one says, there's always another to support your position. :)

Well, falling back to the pragmatist position: a lot of JSON tooling assumes the order isn't relevant semantically. Not taking that position will cause you a lot of headaches.

...and a lot of other tooling assumes the order is relevant. Quite a lot of application logic in the real world, too. Which is why browsers preserve iteration order.

JSON is a serialization format. Its components inherently have a serial order. You can't change this any more than you can legislate the value of pi to be 3.

Nitpicking, but JSON was designed as a data-interchange format, it only happens to be used for serialization. I agree that any kind of pipeline component (i.e., not a source, and not a sink) should preserve ordering where possible, for the sake of robustness.

JSON is defined (and has always been) as a serial stream of characters. I think it's fair to call that serialization.

I'm not disagreeing that JSON is used for data serialization, but your last comment just muddies the waters. There are plenty of things that are serial in nature, but have nothing to do with "serialization" in the sense of data marshalling. I guess that the term is kind of unfortunate.

Consider just how many data formats are ultimately defined as a "serial stream of characters" -- and then consider how few of those you would practically use for marshalling a general data structure.

Well, you, as a programmer, can make the assumption that anyone creating JSON for consumption by your program will obey this requirement. Generally speaking, it's not a good idea to make that assumption, because even people within your team or company may not know of that rule or expect it to be enforced and may break it.

Also, there's the problem of different JSON libraries behaving differently. Such as using unordered hashmaps as an internal data representation for parsed content, making compliance difficult.

Sure, but we were talking about being a consumer. In general, I would argue Postel's law: be careful with what you send, be liberal with what you accept.

Don't mess with the order yourself, but don't assume other tooling will respect it.

Agreed, and if you want to write tools that are conformant with both specs, you really don't have a choice. The RFC pretty much spells out this point: "JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences."

(This kind of text is why I prefer the ECMA document -- it's clearly written to be a normative standard, rather than as a field-report and Request for Comments.)

That ECMA (via ECMA-404) and IETF (via RFC 8259, and before that RFC 7159) have subtly incompatible standards with the same title is annoying; perhaps we need to talk about “IETF JSON” (or “application/json”, as it is expressly the basis of the MIME type) vs “ECMA JSON”.

Here is our opportunity to publish an ANSI or ISO spec, and take the market by storm. :)

Personally, I will adopt the first version of JSON that lets me insert a flipping comment!

> More pragmatically, regardless of what the spec says, a ton of JSON tooling assumes the order doesn't matter and relying on it would be a big mistake.

Agreed. But this does not mean that a tool should break it.

My assumption would be that

should always evaluate to the same as

(for all functions fn)

JavaScript objects are ordered maps (as are PHP’s arrays, iirc): it’s not common, but relying on this is specified behavior in JavaScript: it would be a little surprising for JSON object literals to follow a different rule.

> JavaScript objects are ordered maps (as are PHP’s arrays, iirc): it’s not common, but relying on this is specified behavior in JavaScript: it would be a little surprising for JSON object literals to follow a different rule.

JSON object literals expressly follow a different rule: they are explicitly unordered per the IETF specs, and order has no specific significance at the JSON level, though some might conceivably be introduced in ancillary specifications or tooling per the ECMA spec.

JSON is syntactically a subset of JS but not semantically identical. Unambiguous order would require an array of one-entry objects in JSON.
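A sketch of that encoding, in case it helps: the helper names are hypothetical, and it only handles a flat object (nested objects would need the same treatment recursively).

```python
import json

def object_to_ordered_array(text: str) -> str:
    """Rewrite a flat JSON object as an array of one-entry objects."""
    pairs = json.loads(text, object_pairs_hook=list)
    return json.dumps([{name: value} for name, value in pairs])

def ordered_array_to_pairs(text: str) -> list:
    """Recover (name, value) pairs, in array order, from that encoding."""
    return [next(iter(entry.items())) for entry in json.loads(text)]

# Hypothetical object where "version" is meant to come first.
encoded = object_to_ordered_array(
    '{"version": 2, "type": "color", "name": "red"}'
)
pairs = ordered_array_to_pairs(encoded)
```

Because arrays are ordered in every reading of the specs, the order of `pairs` no longer depends on how a given parser treats object members.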

This isn’t true, the ECMA spec for JSON specifies that order is treated in an implementation-defined fashion.

> An object structure is represented as a pair of curly bracket tokens surrounding zero or more name/value pairs. A name is a string. A single colon token follows each name, separating the name from the value. A single comma token separates a value from a following name. The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.


This wording allows a particular implementation to define its own meaning to the order of key-value pairs, or even to produce a multimap.

I checked the spec again:

It's unclear: At one point it says "An object is an unordered set of name/value pairs." while in the actual grammar it is ordered:

        '{' ws '}'
        '{' members '}'
        member ',' members

I recommend reading the ECMA-404 spec instead. It's less ambiguous, and basically says that you can treat the pairs as ordered if you want, as the syntax itself doesn't imbue the order with any meaning:


Grammar is inherently ordered, semantics is different from syntax.

I think you will find the JSON object members are not ordered while JSON array members are ordered. Since the JSON object members are not ordered, changing the order for display purposes does not change the data in any material way.

I have trouble getting excited about tools to prettify JSON as long as the guy controlling the standard has a stubborn attitude about allowing comments or decent storage for long/multiline strings.

At this point I honestly take XML over JSON where I have a choice because of CDATA and comments.

I was interested in this link just for its take on comments, bummed to see that skipped over.


      "colors": [
        { "color": "black",   "hex": "#000", "rgb": [ 0,   0,   0   ] },
        { "color": "red",     "hex": "#f00", "rgb": [ 255, 0,   0   ] },
        { "color": "yellow",  "hex": "#ff0", "rgb": [ 255, 255, 0   ] },
        { "color": "green",   "hex": "#0f0", "rgb": [ 0,   255, 0   ] },
        { "color": "cyan",    "hex": "#0ff", "rgb": [ 0,   255, 255 ] },
        { "color": "blue",    "hex": "#00f", "rgb": [ 0,   0,   255 ] },
        { "color": "magenta", "hex": "#f0f", "rgb": [ 255, 0,   255 ] },
        { "color": "white",   "hex": "#fff", "rgb": [ 255, 255, 255 ] }

The position of commas in the rgb array are triggering me

Numbers to the right make it much more pleasant to my eyes

      "colors": [
        { "color": "black",   "hex": "#000", "rgb": [   0,   0,   0 ] },
        { "color": "red",     "hex": "#f00", "rgb": [ 255,   0,   0 ] },
        { "color": "yellow",  "hex": "#ff0", "rgb": [ 255, 255,   0 ] },
        { "color": "green",   "hex": "#0f0", "rgb": [   0, 255,   0 ] },
        { "color": "cyan",    "hex": "#0ff", "rgb": [   0, 255, 255 ] },
        { "color": "blue",    "hex": "#00f", "rgb": [   0,   0, 255 ] },
        { "color": "magenta", "hex": "#f0f", "rgb": [ 255,   0, 255 ] },
        { "color": "white",   "hex": "#fff", "rgb": [ 255, 255, 255 ] }

> The position of commas in the rgb array are triggering me

You mean:

      "colors": [
        { "color": "black"  , "hex": "#000", "rgb": [   0,   0,   0 ] },
        { "color": "red"    , "hex": "#f00", "rgb": [ 255,   0,   0 ] },
        { "color": "yellow" , "hex": "#ff0", "rgb": [ 255, 255,   0 ] },
        { "color": "green"  , "hex": "#0f0", "rgb": [   0, 255,   0 ] },
        { "color": "cyan"   , "hex": "#0ff", "rgb": [   0, 255, 255 ] },
        { "color": "blue"   , "hex": "#00f", "rgb": [   0,   0, 255 ] },
        { "color": "magenta", "hex": "#f0f", "rgb": [ 255,   0, 255 ] },
        { "color": "white"  , "hex": "#fff", "rgb": [ 255, 255, 255 ] }

You mean:

      "colors": [
        { "color": "black"  , "hex": "#000", "rgb": [   0,   0,   0 ] },
        { "color": "red"    , "hex": "#f00", "rgb": [ 255,   0,   0 ] },
        { "color": "yellow" , "hex": "#ff0", "rgb": [ 255, 255,   0 ] },
        { "color": "green"  , "hex": "#0f0", "rgb": [   0, 255,   0 ] },
        { "color": "cyan"   , "hex": "#0ff", "rgb": [   0, 255, 255 ] },
        { "color": "blue"   , "hex": "#00f", "rgb": [   0,   0, 255 ] },
        { "color": "magenta", "hex": "#f0f", "rgb": [ 255,   0, 255 ] },
        { "color": "white"  , "hex": "#fff", "rgb": [ 255, 255, 255 ] },

The last extra comma is valid javascript, but, (possibly) unfortunately, invalid JSON.

Exactly. I format things like that in my source code. Anything else is too painful to bear. When code is written by other people who don't care about those things, it's hard to understand :)

And then somebody wants to edit the file, your careful alignment gets gradually mangled, there's no tooling to redo it automatically, we refactor something globally and it gets worse, and ultimately I hit shift-alt-F to restore it to autoformatted sanity.

Opinionated Opinion:

That's an incredibly XML-ified version of a color table. I can clearly see the tags now. Can't just do a lookup of a color; instead I would have to iterate over the members or store it in a different data structure.

Why even use JSON? Blech.

Just because the data is structured this way in the example doesn't mean it's not possible. What's hindering you from defining a class property for each color?

"colors": { "red": {"rgb": "f00"}, ... }

It's absolutely possible, yet the example doesn't do it. That's what the comment was calling out (plus a bit of superfluous "WTF").

What if you want to get the name for a hex color? Your "opinion" is just assumptions on how the data structure will be used, which you want to bake into the transmission format itself.

Then you're no worse off than if you had the original format. You'd have to iterate through the members in either case.

That said, given there are just shy of 17 million possible RGB combinations, and a small fraction of those are named colors, I'd personally continue to optimize for the named color case.

You should definitely optimize your data storage for the kind of operations you're going to do on them. However, you're talking about the data transmission format here!

Surely you wouldn't think of sending a piece of JSON over the wire with both name:color and rgb:color, regardless of whether that is what the recipient wants to operate on. You just have to let it unmarshall the data into whatever form it needs.
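As a sketch of that split between wire format and in-memory form: the list below is a shortened copy of the article's color table, and the receiver builds whichever lookups it needs after parsing. The index names are made up.

```python
import json

# Wire format: a plain ordered list, as in the article's example.
wire = '''[
  {"color": "black",  "hex": "#000", "rgb": [0, 0, 0]},
  {"color": "red",    "hex": "#f00", "rgb": [255, 0, 0]},
  {"color": "yellow", "hex": "#ff0", "rgb": [255, 255, 0]}
]'''

colors = json.loads(wire)

# The receiver picks its own indexes; the sender does not need to guess.
by_name = {entry["color"]: entry for entry in colors}
by_hex = {entry["hex"]: entry for entry in colors}

print(by_name["red"]["rgb"])
print(by_hex["#ff0"]["color"])
```

The original list is still there for anything that cares about position, and each index is one pass over the data.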

That's the point GP is making. Use JSON objects for dictionary-like structures instead of directly translating XML to JSON.

But the point is that it's always usage-dependent.

You key by color names, that seems obvious. But what if you want to look up a color by hex? Now you have to look through them all.

What if this list is actually an ordered list of colors for different headings? And they can repeat? Then indexing by index is exactly what you want.

Point is, you don't know the reason behind the data structure.

(That said, I do wish the json had a standardized way to remove redundancy for objects that always follow the same structure. One list of property names, and then everything in arrays.)
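There is no standard for this as far as the thread establishes, but a home-grown version is short. In this sketch the `"keys"` and `"rows"` field names are invented for illustration, and every row is assumed to have exactly the same set of keys.

```python
def to_columnar(rows):
    """Pack uniform dicts as one list of names plus one array per row."""
    keys = list(rows[0]) if rows else []
    for row in rows:
        if set(row) != set(keys):
            raise ValueError("all rows must share the same keys")
    return {"keys": keys, "rows": [[row[k] for k in keys] for row in rows]}

def from_columnar(table):
    """Expand the packed form back into a list of dicts."""
    return [dict(zip(table["keys"], values)) for values in table["rows"]]

colors = [
    {"color": "black", "hex": "#000", "rgb": [0, 0, 0]},
    {"color": "red", "hex": "#f00", "rgb": [255, 0, 0]},
]
packed = to_columnar(colors)
```

The packed form names each property once, at the cost of the rows no longer being self-describing.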

> But what if you want to look up a color by hex? Now you have to look through them all.

As you would with the original one as well.

> Then indexing by index is exactly what you want.

A much better use case. That said, the list order is kinda fragile, having an explicit row identifier might be worth adding. Especially since all of the columns are being explicitly called out instead of in their own list.

Perhaps the purpose is to list colors in a specific order, such as to show to the user in a desired order?

And before you argue that dictionaries can still be iterated in order, you better check the sibling threads where people are arguing you shouldn’t rely on that.

I don't get why you're railing against this comment - that's the same data structure used in the article we're discussing. The only difference between the proposal in the grandparent comment and the article is the indentation and spacing.

It being the same structure doesn't make it good.

That said, my comment may be a bit misdirected as a result, in which case: Mea Culpa.

An issue has been added to the repo. It is on the list now. Great idea, thanks. https://github.com/ohler55/ojg/issues/35

Personally, aligning columns like that drives me nuts. It's awkward to maintain and makes things harder to actually read most of the time.

How does it make it harder to read? Patterns and repetitions jump out of the page. Indeed, maintaining it takes time and effort, I wish the editor were smart enough, but I only see advantages with regards to reading the code.

visidata shows it as:

  color   | hex  | rgb ║
  red     | #f00 | [3] ║
  black   | #000 | [3] ║
  yellow  | #ff0 | [3] ║
  green   | #0f0 | [3] ║
  cyan    | #0ff | [3] ║
  blue    | #00f | [3] ║
  magenta | #f0f | [3] ║
  white   | #fff | [3] ║

It would be nice to inline the rgb column here.

Ouch! This is borderline unreadable to me. Even more so when there's a value of length, say, 50 for one of the color keys.

What on earth is difficult to read about aligned data?

More importantly, what’s more readable to you?

Ginormous distances between values and keys. It's enough to have just 2-3 long values in 2-3 different rows and the amount of whitespace becomes huge, much larger than the amount of data.

I hadn't seen the SEN format before. I would like the keys unquoted as far as possible, but the commas kept in and otherwise also to keep it 100% Javascript-compatible and usable to cut and paste it into Javascript code.

I would prefer one that aligned the like-named keys, if it fits on screen. Makes it dead easy to scan the values.
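Roughly what I have in mind, as a sketch: `aligned_rows` is a hypothetical helper, not something oj or jq provides, and it assumes a non-empty list of objects that all share the same keys. The output is still valid JSON.

```python
import json

def aligned_rows(rows):
    """Format same-shaped dicts so each key starts in the same column."""
    keys = list(rows[0])
    last = len(keys) - 1
    # One cell per member, with the separating comma attached to the value.
    cells = [
        [f"{json.dumps(k)}: {json.dumps(row[k])}" + ("," if i < last else "")
         for i, k in enumerate(keys)]
        for row in rows
    ]
    # Pad every cell to the widest cell in its column.
    widths = [max(len(line[i]) for line in cells) for i in range(len(keys))]
    out = []
    for n, line in enumerate(cells):
        body = " ".join(cell.ljust(widths[i]) for i, cell in enumerate(line))
        out.append("  { " + body + " }" + ("," if n < len(cells) - 1 else ""))
    return "[\n" + "\n".join(out) + "\n]"

colors = [
    {"color": "black", "hex": "#000", "rgb": [0, 0, 0]},
    {"color": "magenta", "hex": "#f0f", "rgb": [255, 0, 255]},
]
print(aligned_rows(colors))
```

A real tool would also need a fallback for rows that do not fit the line width, which this sketch ignores.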

That said, it is just another pun on the text as art thing. In that it doesn't really scale, and you are going to upset someone by not having a codified tool for automatically doing this. (I don't recall seeing align-regex in any popular tool.)

As a joke I developed a format called "KVIN" which is like GRON but "context-sensitive":

    foo.bar.baz = 10
           .biz = 12 // foo.bar.biz
      ..boz.baz = 31 // foo.boz.baz
etc. It basically combines really brittle context-sensitive grammar production with complete lack of greppability.

Can you explain what you mean by "like named"? The sorting helps a lot but I'm always interested in additional features.

I assume they mean vertical alignment like:

  [ { foo: a   bar: 123.45 }
    { foo: abc bar:   6.7  } ]

Another post said something similar. An issue was added to add the feature. I think it is a good one to add.

The current top comment is what I meant. On my phone, so couldn't put an example easily.

Also the revolution of two-letter command names :/.

I mean, at least before one has proven a tool's ubiquitous use, use a longer name.

jq just got lucky but I don't think it was because of its name ;).

Oj is actually pretty well known as a Ruby JSON parser. The OjG project is in the same family.

Nicely done. First I've seen "SEN".

Treating the colons as white space, as you've done with the commas, will move you one step closer to The Correct Answer™.

I suppose that is possible to remove the colons but it is nice having the extra reminder that the left side of the colon is a key and the right a value. That could easily become lost if a new line is inserted after the key.

SEN is new. After dealing with broken JSON due to commas missing or one at the end of an array and some of the team using Javascript this was a way of sucking in the broken JSON and fixing it.

> After dealing with broken JSON...

Postel tried to warn us.

> ...nice having the extra reminder that the left side of the colon is a key and the right a value.

Totally. IMHO: whitespace, formatting, delimiters are for humans. The parsers can do without. With some exceptions, like your examples of quoting strings to remove ambiguity.

The colons really aren't necessary in most cases. Clojure does away with the colons for instance. And I'm pretty sure GP is referring to some kind of Lisp.

In addition to outputting HTML, it could output JSON that could be rendered to HTML, the terminal, or JSX:


oo cool, wooorm looks useful, thanks for the link

You're not wrong, but wooorm is a person.

Remark and Unified are some well-known projects that wooorm maintains.

https://github.com/remarkjs/remark https://unifiedjs.com/

hahaha (facepalm) I use remark, too. Sorry wooorm!

I recall seeing something that claimed all JSON is syntactically valid JavaScript. If that's correct, shouldn't it be possible to use JS code formatting engines to intelligently format JSON?

Yes, Prettier can format JSON.


Basically yes, though `{` and `}` also start and end expression blocks in JavaScript. So the JS formatter needs to be aware that it is working with a JSON object and not a complete expression.

Valid JSON:

    {
      "key1": "hello",
      "key2": "world"
    }

You could "trick" a JS formatter to format it by wrapping with a fake function, etc. Some minimum valid JS:

    key1: "hello",
    key2: "world",
MongoDB has some JS libraries that use similar tricks to use JS parsers for their shell query format (which is similar to JSON). For example, around line 597: https://unpkg.com/browse/ejson-shell-parser@1.1.1/dist/ejson...

Easy node one-liner:


The human style format reminds me of the default format of js-beautify [1]. We use it to get the "human-style" instead of the "One Line Per Node" for a project where we store json files in a git repo. That way the git diff is pretty easy to read and not bloated. Too bad, not many tools have the "human-style" option.

[1] https://github.com/beautify-web/js-beautify

This beautiful post is missing a last section called "Gron: do away with json altogether and print something actually readable". That would be a good punchline!
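For anyone who hasn't seen the idea, here is a simplified sketch of that kind of flattening: one assignment line per leaf, so the output can be grepped. It is not gron's actual implementation or exact output format; in particular it uses bracket notation for every key.

```python
import json

def flatten(value, path="json"):
    """Yield one 'path = value' line per leaf of a parsed JSON value."""
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from flatten(child, f"{path}[{json.dumps(key)}]")
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars and empty containers are printed as JSON literals.
        yield f"{path} = {json.dumps(value)}"

doc = {"colors": [{"color": "red", "rgb": [255, 0, 0]}]}
for line in flatten(doc):
    print(line)
```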

Interesting idea. Solves the issue of trying to find data in JSON fairly well. I'm a bit biased but I like using the oj with a JSONPath extraction (-x option) to do something similar but with the power of JSONPath.

> Those two parameters are specified as a float where the whole number part is the edge and the fractional part or the number of 10ths is the maximum depth on a single line.

Is that a convention I'm not aware of? Seems a little obtuse and unnecessary, why not just accept two arguments? One less arbitrary usage detail to remember.

I keep the `-p` (pretty) option as a single option. Not a convention at all. I toyed with 80x3 and 80:3 but ended up with 80.3. You are right though, it probably makes sense to support two options as well to avoid the unusual convention. Maybe `-edge` and `-max-depth` options in addition. Issue created: https://github.com/ohler55/ojg/issues/36
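To spell out how the combined value reads, here is a sketch of the decoding as described in the article quote above. `split_pretty_option` is a hypothetical name and this is not OjG's actual parsing code; it assumes the depth is a single digit in the tenths place.

```python
def split_pretty_option(value: float) -> tuple:
    """Split e.g. 80.3 into (edge, max_depth) == (80, 3)."""
    edge = int(value)
    # Round rather than truncate: 80.3 - 80 is 0.2999... in binary floats.
    max_depth = round((value - edge) * 10)
    return edge, max_depth

edge, max_depth = split_pretty_option(80.3)
```

The rounding step is the kind of detail that makes two separate integer options easier to reason about.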

Thanks for the reply, the combined parameter was borne out of your real world usage so I'm glad it sounds like you'll keep it in. Hope I didn't come off cynical, this is a very cool feature for OjG.

All good. You were polite and friendly or at least I read it that way.

Another way to display JSON files in a more readable format is catj (https://github.com/soheilpro/catj)

I tried using a json pretty printer in the lisp pp family of pretty printers (but it didn’t have miser mode.) Maybe we were just formatting things wrong and should have put brackets or breaking rules in different places, but changing that sort of code is hard and the results weren’t particularly great and people preferred the standard JSON.stringify(_,null,2) method. We switched to this and it was simpler and better. This format is also easier to process with something like grep or sed or awk or editor macros when needed.

It seems a mistake to format JSON as non-JSON text (SEN Format) in the name of "pretty". That will inevitably lead to copy/paste and monkey-see errors.

I think that even if you discard the last step ("sen") and stick to plain "human style with colors", this is already much prettier than what many languages support.

I like the idea that the incompatible format is off by default.

> the conversion from SEN to JSON and the reverse is lossless

How does SEN deal with numbers encoded as strings? Is it something like .4? That's a bit confusing.

SEN still uses quotes when necessary. Any string that starts with a number is quoted. Note that in the example the hex colors are in quotes, since the `#` character is not a valid unquoted token character.
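
The quoting rule described here can be sketched roughly as follows. The exact set of characters allowed in an unquoted token is an assumption made for illustration (a leading letter or underscore, then letters, digits, underscores, dots, or hyphens); the real SEN rules in OjG may differ, and `sen_string` is a hypothetical name.

```python
import json
import re

# Assumption: an unquoted token starts with a letter or underscore and continues
# with letters, digits, underscores, dots, or hyphens. Real SEN rules may differ.
_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")
# Bare true/false/null would be read back as literals, so they must stay quoted.
_RESERVED = {"true", "false", "null"}


def sen_string(text: str) -> str:
    """Return text bare when it is a safe token, otherwise as a quoted string."""
    if _TOKEN.fullmatch(text) and text not in _RESERVED:
        return text
    return json.dumps(text)


for sample in ["floatCell", "4", ".4", "#ff0000", "true"]:
    print(sample, "->", sen_string(sample))
```

Under these assumptions a string such as "4" or ".4" keeps its quotes, so it cannot be confused with the number 4 or 0.4 when converted back to JSON.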

The biggest advantage of the one line format is the ndjson/jsonl file where one line = one record.
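
A minimal sketch of the one line = one record idea, using only the Python standard library. The helper names `write_ndjson` and `read_ndjson` are made up for this example.

```python
import io
import json


def write_ndjson(records, stream):
    """Write each record as compact JSON on its own line."""
    for record in records:
        # json.dumps without indent never emits a raw newline (newlines inside
        # strings are escaped as \n), so one line always equals one record.
        stream.write(json.dumps(record, separators=(",", ":")) + "\n")


def read_ndjson(stream):
    """Yield one decoded record per non-empty line."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


buffer = io.StringIO()
write_ndjson([{"id": 1, "note": "first\nrecord"}, {"id": 2}], buffer)
buffer.seek(0)
print(list(read_ndjson(buffer)))
```

Since each record can be parsed on its own, such files can be streamed, appended to, or split with line-oriented tools.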

Slightly off topic, but I sometimes use https://json.pizza (a site I know from HN) to format JSON. However, it does not offer different ways to format, just the standard indentation.

Funny coincidence for cli tool name and the Icelandic meaning: https://en.wiktionary.org/wiki/oj#Interjection

Quite a variety of meanings. Some pretty funny.

Relevant plug: if Pretty Notations interest you, then you should keep an eye on Tree Notation https://treenotation.org/.

The Tree Notation is like the opposite of pretty notation though, no?

The whole idea of pretty notation is automatically inserting non-significant whitespace to make it look nice. Step 2, "one line per node", inserts spaces and newlines. Step 4, "human style" strategically removes some of those so the lines look nice -- the 2nd level dict has lots of content, so it was split across multiple lines... while the 3rd level dict has fewer data, so it all fits on one line.
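
To make the "fits on one line" step concrete, here is a small Python sketch of a width-aware printer. It is not oj's algorithm: it does no column alignment or coloring, it ignores the trailing comma when measuring, and the function name and parameters are assumptions for illustration. It only shows the basic decision of keeping a container on one line when it fits and expanding it otherwise.

```python
import json


def pretty(value, width=80, indent=2, level=0, prefix_len=0):
    """Render JSON, keeping a container on one line when it fits in width."""
    flat = json.dumps(value)
    # Characters already used on this line: indentation plus any "key: " label.
    used = indent * level + prefix_len
    if not isinstance(value, (dict, list)) or not value or used + len(flat) <= width:
        return flat
    pad = " " * (indent * level)
    inner = " " * (indent * (level + 1))
    if isinstance(value, dict):
        items = []
        for key, child in value.items():
            label = json.dumps(key) + ": "
            items.append(label + pretty(child, width, indent, level + 1, len(label)))
        opener, closer = "{", "}"
    else:
        items = [pretty(child, width, indent, level + 1) for child in value]
        opener, closer = "[", "]"
    body = ",\n".join(inner + item for item in items)
    return f"{opener}\n{body}\n{pad}{closer}"


data = {"a": [1, 2, 3], "b": {"c": "x", "d": "y"}}
print(pretty(data, width=80))  # everything fits on one line
print(pretty(data, width=30))  # outer object is split, inner values stay compact
```

The output is still valid JSON at any width, since only non-significant whitespace changes.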

As opposed to this, Tree Notation is all about single canonical representation. So whitespace is significant, and you can never add or remove it to make output look nicer. You do whatever your schema tells you, and I hope you like many short lines.

The OP is sort of all over the place (their favorite "Pretty JSON" is not actually JSON at all, but SEN, which is definitely not JSON, which has a very discrete specification).

So what they are really talking about is just pretty code. Their favorite examples utilize alignment (tree notation does that better: every tree doc is isomorphic to a spreadsheet and you don't have to align things to the left spine, and there are grid langs that don't do that).

The colors et al are called "secondary notations" and again Tree Notation can't be beat. Adding secondary notations is simple. Here's an example: https://www.youtube.com/watch?v=vn2aJA5ANUc

I think you missed the point here -- the great thing about the author's "oj" here is that it takes an existing document, with any format, and makes it look nice, in a fully automated way and without changing semantics.

(I am talking about "Human Style with Colors" here -- this is the cool part, and I don't really expect SEN to take off except to display things on the terminal)

That tabular-like alignment was generated automatically -- I can take any existing JSON data source and the program will automatically make it look nice while not requiring any changes in the consumers or producers of the data.

Compare to Tree Notation, for example this code: https://jtree.treenotation.org/designer/#standard%20iris has this block, which has a clear structure:

    sepalLengthCell
     extends floatCell
    sepalWidthCell
     extends floatCell
    petalLengthCell
     extends floatCell
    petalWidthCell
     extends floatCell
    speciesCell
     enum virginica versicolor setosa
     highlightScope constant.language
This looks pretty ugly to me. There is clearly the table-like structure, but it is hard to see, because each line is split in 2. If this were JSON/SEN, I could make it look nicer:

    sepalLengthCell: { extends: floatCell }
    sepalWidthCell: { extends: floatCell }
    petalLengthCell: { extends: floatCell }
    petalWidthCell: { extends: floatCell }
    speciesCell: {
        enum: [virginica, versicolor, setosa]
        highlightScope: constant.language
    }
See how it's all aligned now and how the structure comes out? And all at zero effort on my part, since it was all computer generated? But with Tree Notation, the above is invalid -- it has a different meaning, so the compiler won't accept it. You have to use the much uglier vertical method, with all the newlines.

And Pretty JSON can also adapt to display width. Someone with large fonts or a small display can request 80-character-wide output, and "speciesCell" will be wrapped. Someone with a huge display can request output 250 characters wide, and "speciesCell" will be column-aligned with the others. Another thing which is pretty much impossible in Tree Notation without a lot of work.

> "oj" here is that it takes an existing document, with any format

No. It takes JSON only. Which is one format out of 10,000 (though a popular one).

> There is clearly the table-like structure

It is a directed acyclic graph structure.

I won't disagree that perhaps in certain older tools oj may be better in certain situations. But Tree Notation (or more generally 2D/3D languages where positioning is the only thing used for syntax) are the future. The key thing to keep in mind is that without the colors in the last 2 examples OJ is not very useful or pretty. So to make oj nice you need to start adding parsers which are necessary for secondary notations. Then once you start adding secondary notations, 2D/3D langs make that orders of magnitude easier.

You keep saying that "Tree Languages are the future" but I don't see any impressive examples of it. Instead, when I look at the examples, I see a rather clunky notation.

Is that iris grammar document in designer a good example? Because it does not really show why it is better than something like SEN or JSON. The text alignment is awkward, and it does not fit well on the screen. The colors are there, but I am not convinced that having keywords be a different color outweighs the ugly formatting. And I am not sure what "secondary notations" are, but I am guessing they are not present in the document?

The idea of schemas is good. The world could use more editors with context-aware syntax highlighting and auto-completion. However, the data model you have chosen ("TreeNode = tuple(string, list[TreeNode])") doesn't map well to any programming language. And the text serialization you have chosen -- having a single canonical representation that is using whitespace as the only syntax element -- unnecessarily restricts the data that can be represented.

I think switching to a more conventional data model and text serialization will significantly increase uptake of your project, as well as make it more aesthetically pleasant. Because I seriously doubt anyone can call your existing grammar programs "nice looking".

First, I just want to say thanks for continuing to take the time to provide feedback. Often I'm blunt, but that's just for the sake of time, and I'm very appreciative of the time you have taken to reflect on this.

    You keep saying that "Tree Languages are the future"
    but I don't see any impressive examples of it.
Binary notation as an idea was worked through for ~250 years before we had very impressive examples of it in computers, so I'm pretty happy with the examples so far given that I'm ~10 years in, and ~4 years since publication. So just at a high level the short term game common in the rest of tech isn't something I'm interested in playing.

Beyond what's out there, I've seen the results from thousands of experiments in everything from assemblies to compiler compilers to declarative data notations, and from software to hardware and everything in between, so the amount of data I have dwarfs what everyone else has seen. While I totally get that it's not raining buckets now, and in fact people feel barely a drizzle, I have quite a dataset that there are big clouds on the horizon.

    Is that iris grammar document in designer a good example?
It's mildly neat. Here's how we used an early version of that a couple of years ago to publish synthesized data for a GWAS EOPEC study (https://github.com/breckuh/eopegwas/blob/master/mockData/cli...). Tree Notation will become the standard way to describe data schemas and make synthesis a breeze.

    outweighs the ugly formatting
The formatting can be described as this: minimal. In fact, the most minimal. If you think minimal is ugly, then we probably won't come to an agreement. Keep in mind though that you can write code to project Tree programs in whatever way you want. I won't disagree with the statement "Tree Notation doesn't work as well with my existing tools as other langs", but if you go back to stuff from 2017 and look at the trajectory, you'll see that Tree Notation tooling has improved remarkably and in a couple more years you'll see stuff that just isn't possible with 1-D langs.

    doesn't map well to any programming language
Do you know Lisp? Tree Notation maps to S-Expressions without parens.
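
As a rough illustration of that mapping, here is a Python sketch that reads indentation-based text into nested nodes (the "string plus list of children" model mentioned above) and prints them with explicit parentheses. It assumes one leading space per nesting level and clamps deeper jumps; it is not taken from the Tree Notation tooling, and the function names are made up.

```python
def parse_tree(text: str) -> list:
    """Parse indentation-based text into nested [line, children] nodes."""
    root = []
    # stack[d] is the child list that a line at depth d is appended to.
    stack = [root]
    for raw in text.splitlines():
        if not raw.strip():
            continue
        depth = len(raw) - len(raw.lstrip(" "))
        # Assumption: one leading space per level; deeper jumps are clamped.
        depth = min(depth, len(stack) - 1)
        node = [raw.strip(), []]
        del stack[depth + 1:]
        stack[depth].append(node)
        stack.append(node[1])
    return root


def to_sexpr(nodes: list) -> str:
    """Render the nodes with explicit parentheses, Lisp style."""
    return " ".join(
        f"({line} {to_sexpr(children)})" if children else f"({line})"
        for line, children in nodes
    )


source = "speciesCell\n enum virginica versicolor setosa\n highlightScope constant.language"
print(to_sexpr(parse_tree(source)))
# (speciesCell (enum virginica versicolor setosa) (highlightScope constant.language))
```

In a language with lists the nested structure is easy enough to hold; whether that counts as mapping "well" is the point under debate.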

    unnecessarily restricts the data that can be represented.
From the paper (2017): "Prediction 1: no structure will be found that cannot serialize to TN."

    your existing grammar programs "nice looking".
To each their own. I think in the long run simplicity lasts. Also, the bigger idea isn't Tree Notation, but the idea of 2-D and 3-D languages https://longbets.org/793/

This site uses only images to show the code but doesn't provide any text alternative for the image. Every image just has a `title` attribute of "Code you could hold in your hand"

Lots of code examples here: https://jtree.treenotation.org/designer/

And the source for that homepage is here: https://github.com/treenotation/treenotation.org

Always open to PR!

a) That's not relevant, b) it's not nearly as new, interesting, or revolutionary as you think it is, and c) please stop spamming links to your website in every second thread here.

I don't think YAML is perfect, but it is better than every one of these pretty formats.

Pretty JSON is inevitably for either logging or config files, and YAML is better at both of those.

I expected a Mona Lisa as json by the end of this article

Just look at that #smile

Now that would be cool! :-)

I maintained a code beautification tool for about a decade. Here is what I learned from code beautification.

1. First notice that there is a world of difference between what users want and what they are willing to achieve. Know this more than anything else. People will ask for all kinds of shit, and.... A wish list is not a fully explored business requirement with known sub-tasks and test cases. A simple ask can become something worthy of a different independent project.

2. Too subjective. Everybody has subtly different personal preferences. In some cases the inability to support some edge case of some language will cause certain users to have an emotional episode. WTF. This is free software providing a convenience that you can easily live without.

3. A lot of work. You have to be very clear about what language, grammar, class of languages, or other variety of characters you are willing to support. For example there is HTML, and then there are about a billion trillion different HTML template schemes, each with its own syntax, and inside that syntax is a wildly different language than the surrounding HTML.

4. Carve out a measurable portion of your life. This is an investment of time you will never get back. Writing a code beautifier is far more work than it sounds. First, you need a parser. If one does not exist for the language you wish to support in the language or format of your tool you will need to write one. Be careful though, because that parser will have to support conventions that are unique to beautification and not necessarily useful elsewhere. In the case of the HTML example above you will need multiple different parsers that can achieve a nesting of parse trees or achieve harmony of a uniform parse tree beloved by all languages. This is achievable, as I have done it, but good luck.

5. Maintenance. There are always new edge cases, new languages, new grammars, new features and your users will want them all. Set hard boundaries.


With the amount of work required you will begin to ask yourself some basic life questions:

Does this tool bring me more money or a better job? Does it bring me prestige AND satisfy a craving for attention? Does it improve my work, as in other real work outside your beautification tool?

In my case, for a while, the tool did allow me access to better jobs with increased pay. It demonstrated I could do things many other developers could not and that I was willing to dedicate some absurd amount of effort into something people actually used. But, that will only take your career so far after which you are just spinning your wheels and burning time.

When I got further in my career I realized I wasn't beautifying my code ever. I had no need for the tool I was maintaining and despite continuous maintenance by me the tool started to decay, because the requirements had grown out of control and I was no longer an end user.

If you want something human readable why limit yourself to printing the raw JSON with different indentation rules? Just write a JSON "visualizer" that does something smart with the data.

Your final example is just approaching a JSON -> YAML converter. If your complaint about your chosen human readable serialization format is that it isn't human readable enough, then switch to something more inherently human readable instead of writing tools to temporarily transform it.

`oj` looks like a useful tool. I wish it was Homebrew-installable.

An incremental formatting tweak is not a revolution.

The title was meant to be fun. JSON format is a pretty light topic.

What if the JSON has multiple nested objects?

Works fine. Give it a try.

IMHO, this is what YAML is actually for.

YAML is “a superset of JSON”, yes, but there are two separate meanings to that:

• YAML has alternative syntactic sugar for expressing the same underlying JSON-equivalent semantics (sort of the same as Avro being canonically a binary compact expression of underlying JSON — in both cases, libraries for the codec expect JSON-encodable data structures as #encode input, and produce JSON-encodable data structures as #decode output)

• YAML has its own semantics (like node type annotations, or references) that JSON doesn’t have, such that documents that use these are no longer transposable into JSON.

I love bullet point #1. I hate bullet point #2.

Personally, I wish there was a name for the reduced subset of YAML that is still a “syntactic superset of JSON”, but which has none of the extended semantics of bullet-point #2.

Many systems that “consume YAML” already actually require their documents to be this “strictly-JSONifiable YAML”! Kubernetes, for example: it might seem to expose a YAML manifest API, but actually, internally, it does everything in JSON. All the resources in k8s etcd are stored in canonicalized JSON. The k8s controller just prettifies that JSON to YAML on its way out to you; and uglifies it back to JSON when you send it in. Which means that any YAML features that don’t survive that translation, can’t be used.

IMHO, if YAML hadn’t been designed with any extended semantics, but instead had strictly targeted being a “sugared alternative encoding of JSON”, I think everyone would have switched to sending YAML in place of JSON a long time ago. Browsers would have likely added YAML parsing as well.

But those added semantics are just so much extra work for everybody. Type annotations are a source of so many vulnerabilities in programs that were unaware their input could “reach in and do things” through those types; and yet many YAML parser libs don’t have any flag to restrict them from decoding these type annotations (i.e. no way to “defuse the bomb.”) References change the entire way you have to write a YAML parser, disallowing some types of parsing grammar altogether, meaning you might no longer have access to the first-class parsing solution of your language runtime; meaning that for many runtimes, the YAML codec lib for that runtime is much slower (and more memory-intensive!) than the JSON codec lib for the same runtime. Etc.

Honestly, if we could all agree on a name for “strict, JSONifiable YAML”, and create libraries that only parse/validate/accept that subset of YAML while rejecting the higher-level semantics, those libs—and that interchange format—would be immediately more popular than YAML. The time for this to happen hasn’t passed! We still have a chance!
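
One way to picture "strict, JSONifiable" is a check that a decoded document contains nothing but plain JSON types. The Python sketch below does that for an already-decoded value; it is only an illustration with a made-up function name, not part of any YAML library. A check like this runs after parsing, so it cannot see references that the parser has already expanded; a real strict parser would have to reject those at parse time.

```python
import datetime
import math


def is_json_compatible(value) -> bool:
    """True when value uses only types that JSON can represent directly."""
    if value is None or isinstance(value, (str, bool, int)):
        return True
    if isinstance(value, float):
        # NaN and infinity have no JSON representation.
        return math.isfinite(value)
    if isinstance(value, list):
        return all(is_json_compatible(item) for item in value)
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_json_compatible(item)
            for key, item in value.items()
        )
    # Dates, sets, tuples, custom objects, and so on fall outside plain JSON.
    return False


print(is_json_compatible({"replicas": 3, "labels": ["a", "b"]}))   # True
print(is_json_compatible({"when": datetime.date(2021, 2, 23)}))    # False
```

Anything that passes can be written out as JSON and read back without losing meaning, which is the property the Kubernetes example relies on.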

In the python world this is StrictYAML: https://hitchdev.com/strictyaml/

JSON is mostly for machines not people. When needed, developers format their json in their code editor of choice or bash.

There are a lot of people who store JSON in NoSQL databases. After fetching a JSON record you generally view the JSON, or at least you do as a developer. That is where the tool is handy, as you can get something like a FHIR record on a single page instead of crunched into a single line or expanded over multiple pages.

True, and this is a nice tool to do so.

JSON is mostly for people and not machines in that it is meant to be easily readable and editable by humans. If you wanted something for machines you would store your data in a compressed/binary format.

Which demonstrates that JSON is pointless.

It's too ugly for humans (too many quotes, too many escape characters, and no comments) and too texty for machines.
