JSON has become today's machine-readable output format on Unix (utcc.utoronto.ca)





Oddly enough, BitTorrent's bencode format is a significant subset of JSON but is much easier to parse as a binary format. Bencode only supports integers, byte strings, lists, and dictionaries.

I wrote a more detailed comparison at: https://www.nayuki.io/page/bittorrent-bencode-format-tools
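For anyone who hasn't seen it, the whole encoding fits in a few lines; here is a minimal sketch of an encoder (my own illustration in Python, not code from the linked page):

    def bencode(value):
        # Integers: i<digits>e, e.g. 42 -> b"i42e"
        if isinstance(value, int):
            return b"i%de" % value
        # Byte strings: <length>:<bytes>, e.g. b"spam" -> b"4:spam"
        if isinstance(value, bytes):
            return b"%d:%s" % (len(value), value)
        # Lists: l<items>e
        if isinstance(value, list):
            return b"l" + b"".join(bencode(v) for v in value) + b"e"
        # Dictionaries: d<key><value>...e, with keys sorted as raw bytes
        if isinstance(value, dict):
            return b"d" + b"".join(bencode(k) + bencode(v)
                                   for k, v in sorted(value.items())) + b"e"
        raise TypeError("bencode only has integers, byte strings, lists and dictionaries")

    # bencode({b"count": 3, b"spam": [b"a", b"b"]}) -> b'd5:counti3e4:spaml1:a1:bee'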


"BitTorrent became popular around year 2005, whereas JSON became popular around year 2010. The usage of JSON in the real world has greatly eclipsed BitTorrent or bencode, so there is a natural bias to view bencode through the lens of JSON even though JSON was adopted later (though not necessarily invented later)."

Netstrings was proposed in 1997.

https://cr.yp.to/proto/netstrings.txt

Whether bencoding's remarkable similarity to netstrings is purely coincidence is left as a question for the reader.

Perhaps there is a "natural bias" to view netstrings through the lens of bencoding even though bencoding came much later.

"Oddly enough, BitTorrent's bencode format is a significant subset of JSON but is much easier to parse as a binary format."

Curious what makes it "odd".

JSON assumed that memory is an unlimited resource.^1 Parsing it is like having to read an entire file into memory before processing it. Hence we see revisions such as "line-delimited JSON". Netstrings are even more memory-efficient than line-oriented processing; there is no need to read in an entire line.

1. This makes sense if one agrees that JSON was designed for graphical web browsers, programs notorious for excessive memory usage.
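For illustration, a netstring reader needs almost no buffering; a minimal sketch (the function name is my own, not from the spec):

    import io

    def read_netstring(stream):
        # Read the decimal length prefix byte by byte, up to the ':'
        length = b""
        while (byte := stream.read(1)) != b":":
            length += byte
        data = stream.read(int(length))        # read exactly that many bytes
        assert stream.read(1) == b",", "a netstring ends with ','"
        return data

    # read_netstring(io.BytesIO(b"12:hello world!,")) -> b'hello world!'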


tnetstrings[1] was a later refinement of netstrings.

I liked it, but alas it never seemed to really take off.

[1]: https://tnetstrings.info/


So if you want strings, you need to guess what encoding was used or store the encoding in another field? I don't think that makes it a much nicer format. I do like the ability to store byte strings directly.

Oh, I didn't know about Bencode. It looks interesting. Thank you for sharing!

I really like bencode. The only thing I miss is floats.

You can use two integers: one that holds all the digits, including the decimal part, and one that gives the precision, i.e. how many of those digits are decimals. For example, you'd represent "123.45" as 12345 and 2. That's often how monetary amounts are stored in databases, to avoid a lot of common floating-point arithmetic pitfalls.
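A rough sketch of the idea (my own illustration):

    # "123.45" stored as the integer pair (12345, 2): all the digits, plus how many are decimals
    digits, places = 12345, 2
    value = digits / 10 ** places     # 123.45

    # Arithmetic stays exact as long as you stay in integers: 123.45 + 10.55
    total = 12345 + 1055              # 13400, i.e. (13400, 2) == 134.00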

Or just '123' & '45'?

I think it is to optimize arithmetic operations. There are significantly fewer steps with the first method, which only requires adjusting how many digits are considered decimals, rather than the rejoin, arithmetic, separate-again dance of your proposal. Plus, a wider float.

But then you can't tell the difference between 0.12 and 0.00012.

Unless you're suggesting to use the strings "0" and "00012", at which point you could just use a byte string with the utf8 encoding of the value.


But that's just floats with extra steps? Floats have two parts in their binary representation: mantissa and exponent (and sign), which correspond exactly to your "entire number" and "precision", only in base 2 instead of 10.

The difference being that with integers you never end up with rounding errors when doing addition, subtraction, or multiplication (only division).
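For example (a quick Python illustration):

    # Binary floats pick up rounding error even on a simple sum:
    0.1 + 0.2 == 0.3                  # False (the left side is 0.30000000000000004)

    # The same sum as scaled integers is exact:
    1 + 2                             # 3, i.e. (3, 1) == 0.3 with one decimal place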

No love for XML…

I agree, and I feel like the reason for this is the mere existence of 'jq'. Without 'jq', working with JSON in a Unix shell would be a lot more uncomfortable, though not impossible.

Also Python. Python handles JSON really well, at least compared to, say, bash.

I want to say server side JavaScript plays a role in this too, but I’m not a JS developer.


Though jq syntax leaves a lot to be desired

What do you think the trade-off with the syntax is, given what jq was designed for?

A bash script might need to execute it, and you want something without lots of funny characters or whitespace, as it's going straight into the terminal.

That necessary terseness makes it the opposite of readable.


The syntax felt OK to me; selectors felt natural and pipes felt very conventionally shell-like. But man, the vocabulary, the variety of different operators you'll need to use in this circumstance or that: brutal.

There are some decent cookbooks/recipes, but they're still a fifth as big as they could be.


Oh dear - bitten by his "scraping" protection.

His site no longer allows me to view it with Firefox-115 ESR, which is deemed "too old" despite still being a supported release.


There was a little glitch in the scraping protection where an errant regular expression briefly blocked all Firefox versions. Which is especially bad because I (the blog author) exclusively use Firefox, so I was blocking myself. The management apologizes for the problem (and generally allows much older Firefox versions than Chrome versions, as people seem to still use them on various platforms).

Hmm - it still blocks me when my browser reports its native string of:

    Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0
however if I override it to the following, the site lets me through:

    Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0

This should be fixed for you now. To my surprise, some legitimate versions of Firefox report 'rv:109.0' with a Firefox/<ver> version that is not 109 (along with all of the crawlers that have latched on to 'rv:109.0' as something they put in their User-Agent strings along with random old Firefox versions).

It looks like those versions will continue for some time, at least until Sep 2025 on Win 7-8.1 and macOS 10.12-10.14, given the recent announcement:

    https://www.ghacks.net/2025/02/19/mozilla-extends-firefox-support-for-windows-7-to-september-2025/
and I can confirm that I can now read your site using the native user agent string. (FWIW, I expect to upgrade to FF-128 within the next couple of months.)

I hit reload on FF and it worked...? I don't know why I got flagged once and not subsequent times.

Sort of agree... but only because you can gron it to remove the madness and then grep/cut/sed/awk the output like a human being.

JSON is just a modern flavor of XML, and in a few years we'll likely mock both of them as if they were the same silly thing. They are functionally equivalent: not human-writable, cumbersome, crufty, with unclear semantics and unclear conventions for using them, and all around ugly.


I unfortunately write a bit of both XML and JSON as part of my day to day. JSON is significantly easier to read and write as a human. So many XML files use a combination of the file structure, the nodes, and the attributes to encode their data, meaning that to parse one you need to know the specifics of how it was written. JSON is much simpler and 95% of the time can be thought of as map<string, JsonObject>, and it just works.

YAML goes too far on the brevity side: I find the 2-space indent, the use of "-" as a list delimiter, the whitespace sensitivity, and the confusing behaviour with quoted vs. unquoted strings incredibly hard to work with.


> 95% of the time can be thought of as map<string, JsonObject>

But for that case you don't need json. A dockerfile-like text file with lines of the form

    STRING other stuff
is trivial to parse in any language and without requiring any library. And it's actually human-editable.

Using json for such trivial stuff is like using a linear algebra library to compute b/a (the quotient of two numbers) by calling linalg.solve([[a]],[b]). Of course it will work. Of course it is more general. But it feels silly.
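A minimal sketch of such a parser (illustrative only; it assumes keys contain no spaces and that '#' starts a comment):

    def parse_simple(text):
        entries = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue                          # skip blanks and comments
            key, _, rest = line.partition(" ")
            entries[key] = rest
        return entries

    # parse_simple("FROM debian\nWORKDIR /app") -> {'FROM': 'debian', 'WORKDIR': '/app'}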


How does your approach handle keys with spaces? How does it handle multiline strings, escapes and so on?

With your approach, I have to either not use these features, do a lot of manual testing to figure out the corner cases of your implementation, or spend a non-trivial amount of time to implement them because they weren't considered from the start. This hardly seems better than just using JSON.


> How does your approach handle keys with spaces?

It doesn't, and that's OK; not a big deal.

JSON has no support for standard floating-point numbers, and that's a bigger problem for me. I can easily change my keys to make them sane, but I cannot change the numbers!


> It doesn't, and that's OK; not a big deal.

It's only not a big deal if your use case doesn't require this feature.

You don't always have full control over the keys. Say I take over your legacy project, and I need to add a way to configure settings related to file paths. Whoops, suddenly I'm either limited in the FS layout I can use, or I have to implement support for this feature (and properly test it, and document it, and...)

> JSON has no support for standard floating-point numbers, and that's a bigger problem for me. I can easily change my keys to make them sane, but I cannot change the numbers!

You can always encode them as strings, no?
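Reading the complaint as being about IEEE 754 specials like NaN and Infinity (which valid JSON cannot carry), yes, strings work; a quick Python illustration:

    import json, math

    json.dumps(float("nan"))                        # 'NaN', which strict parsers reject
    # json.dumps(float("nan"), allow_nan=False)     # raises ValueError instead

    # Encoded as strings, the values round-trip, at the cost of an explicit conversion:
    encoded = json.dumps({"x": str(float("inf"))})  # '{"x": "inf"}'
    restored = float(json.loads(encoded)["x"])
    math.isinf(restored)                            # True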


> But for that case you don't need json. A dockerfile-like text file with lines of the form

I don't think that would solve any of your listed problems with json. The key thing is that it's represented by:

    struct JsonObject {
        Dictionary<string, JsonObject> entries;
    }
and that's the entire data structure. Your dockerfile example handles the absolute most basic of cases, e.g.

    {
        "foo": ["bar", "baz"]
    }
would be

    FOO ["bar", "baz"]
But when you nest objects, e.g.

    {
        "foo": {
            "bar": "baz"
        }
    }
The dockerfile-style example is

    FOO { BAR baz }
or

    FOO
        BAR baz # Now we're whitespace sensitive
Or maybe we write:

    Foo # What do we put here?
    Foo.bar baz
> Using json for such trivial stuff is like using a linear algebra library

I disagree. I think using xml for that is like using a linear algebra library, as you can have:

    <Foo>Bar</Foo>
    <Foo value="Bar"></Foo>
    <Foo><Bar>Baz</Bar></Foo>
    <Foo bar="Baz"></Foo>
    <Foo><Bar value="Baz"></Bar></Foo>
and end up with the same thing. But with json, it's just:

    { "Foo": "Bar"} 
and

    { 
        "Foo": {
            "Bar: "Baz"
        }
    }
E: Formatting

Cool, now I need to write a custom parser (no matter how simple it is) for your custom format. With no guarantee that it won’t evolve with breaking changes when you realize you forgot to handle some use cases and end up reinventing JSON.

JSON is trivial as it is, don’t try to reinvent the wheel badly.


But it's easier to write this "parser" than to call any JSON library.

The article suggests that GNU Awk might soon improve its understanding of JSON.

Can somebody please shed some light on this? Will gawk get JSON support, or is it already there and I just need to get a recent version?


GNU awk already does. [0] Sorta. The "non-essential" stuff, like XML, JSON, and Redis, gets put into gawkextlib. Usually packaged for your platform.

[0] https://www.gnu.org/software/gawk/manual/html_node/gawkextli...


Thanks. I wasn't aware of this library.

This is part of the reason I love working with PowerShell. I like having things already in a JSON-like format by default.

JSON solves enough problems and is a simple enough format to become ubiquitous. My only beef is with the serialization/deserialization costs.

That's why I've made a 1:1 binary format

https://github.com/kstenerud/bonjson


About time to move from undefined to at least something... programs should have clear interfaces for input and output. I'm sure there were sound reasons, but slapping wads of unstructured text to and fro in 2025 sounds almost primordial -_-

I like that more and more CLI tools are implementing a JSON output mode, like `ip -j a`.

I prefer JSON5 myself, but it's unfortunately not well supported.

https://json5.org/


Not sure why they had to add additional whitespace characters... also, single-line comments seem problematic in this respect... machine-readable JSON is often one line.


JSON is a human-readable format. Machine-readable means only a specialized program can be used to make a human understand it.

Machine readable just means a machine can read it. Whether humans can read it as well is irrelevant to the definition.

(And there are pretty much no formats that a human hasn't learned to read, up to and including binary)


So every text is machine-readable? Because even English can now be read by a machine via an LLM. Then why say machine-readable? Just say text.

It generally means something with a well specified format which can be processed by a parser.

Technically, an LLM can be a parser, if you tell it to output its findings in another data format, since:

     A parser is a software component that takes input data (typically text) and builds a data structure

Parsers don't hallucinate.

That is a very good way of shooting yourself in the foot

Now every API call is a "call to an LLM"? How much will that cost? Oh and how are you calling the LLM API in the first place?


Keep in mind, I consider machine-readable to mean: what's only readable by a machine. I.e., a magnetic tape is machine-readable, while JSON is human-readable.

> I consider what's machine-readable to mean [...]

I hate to break it to you but the world has a very different and well agreed-upon definition of what "machine readable" means.

You're going to get nowhere if you continue to argue that your definition is the correct one. That ship sailed long ago.


Ok. But what is machine-readable, then?

Is a picture of my passport machine-readable? Is a PDF machine-readable? How do you classify it? And what happens ten years in the future, once algorithms become more optimized? If an AI can read Shakespeare, and parse its paragraphs for verbing a noun, is all human-written stuff then machine-readable?


The picture of your passport is machine-readable to any machine that can read it. That is not all machines.

The significance of JSON, and the submission itself, seems to be in the ubiquity of JSON making it more machine-readable than other formats.

You're being too strict with language and definitions.


> You're being too strict with language and definitions.

Yeah, because the stricter the definition, the more useful it is. A "thing" has less informational value than a "yellow jacket" and that has less informational value than "white and yellow jacket, with Adidas logo".


> Yeah, because the stricter the definition, the more useful it is

Language is capable of expressing broad and narrow concepts. I don't think it can be said that either is inherently "more useful" - it just depends what you're intending to convey.

Moreover I suspect iinnPP may not have meant you're being too strict as in specific, but strict as in rigid and inflexible - seemingly needing to separate everything with a sharp objective binary line rather than being able to consider the context (something that is "big" in one context may not be in another) and varying degrees.

> A "thing" has less informational value than a "yellow jacket" and that has less informational value than "white and yellow jacket, with Adidas logo"

If the word "flavoste" came to commonly refer to yellow jackets in general, and someone used that word to refer to a yellow jacket, it doesn't make a whole lot of sense to call them wrong because the jacket doesn't have an Adidas logo if that's not what the word flavoste is used to mean.

If you want a term to refer to things that are only readable by machines, such as to exclude JSON/CSV/XML/etc., something like "binary file" or "non-human-readable format" may be closer.


> but strict as in rigid and inflexible - seemingly needing to separate everything with a sharp objective binary line rather than being able to consider the context (something that is "big" in one context may not be in another) and varying degrees.

Rigid as in: I like things to be as concise and precise as possible. The more rigid the definition, the fewer things a word can refer to.

> Language is capable of expressing broad and narrow concepts. I don't think it can be said that either is inherently "more useful" - it just depends what you're intending to convey.

Sure, language isn't a perfect conveyor of meaning, that's for sure.

That said, I try to be as concise and precise as possible, and vague, duplicated words just tick me off. Why say machine-readable if you can omit it (since everything appears to be machine-readable nowadays)? Just say something like THE text format on Linux or whatever.


That's a minority definition. Few of the rest of us would say "JSON isn't machine-readable".

At 2 tokens per second? Needing a network connection, or sending your data (including confidential data) to the cloud? Needing very expensive GPUs? With hallucinations and tremendous energy expense.

I usually parse things at a rate of 2-20 million tokens per second, on a local computer. It never hallucinates and is always correct.

Don't get me wrong. I use LLMs a lot, but they are good at what they are good at.


"Trivially parseable by machines" is not mutually exclusive with "Trivially parseable by humans". JSON is both.

It's both. "Readable" means "possible to read". On the human-readability side, though, the obligation to quote string keys irks me a lot.

Having had the displeasure of writing parsers for YAML: the ability to start a word with a valid token symbol in an unquoted string (for example :this or ,this or &:hate_you) is so limiting and prevents many optimizations.

No. It means that the parsing code is trivial to write, instead of needing some LALR, SLR, or Earley monstrosity with hundreds of megabytes of grammar tables in memory, just to understand the output of a program like "ls" or "find" or "grep".

JSON is ALSO easy for machines to read. I know because I made several JSON parsers myself. I have also made parsers for computer languages (with things like PEG) and natural languages, and there is a world of difference.


JSON is machine-readable first, and human-readable for convenience. Its primary purpose is machine-to-machine communication, but it does allow human intervention. People have used that as a signal to start using it for configuration files, but that is an anti-pattern in my opinion.

If you want something like JSON that is designed for humans take a look at YAML


I will pick JSON over YAML every single time. Not because JSON is so good, it's just YAML is so much more cancerous.

Can you explain why YAML is cancerous? (Genuine question)

I have always preferred (even without thinking about it) to write configuration files as YAML and keep JSON for interprocess communication.


Many people explained it better than I ever could, so let me defer to them. TL;DR: YAML is an overcomplicated spec with too many footguns.

- https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-fr...

- https://www.arp242.net/yaml-config.html

- https://noyaml.com/ (couldn't not mention this famous piece)
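A small taste of those footguns, as a sketch (assuming PyYAML, which implements YAML 1.1):

    import yaml

    yaml.safe_load("country: NO")      # {'country': False}, the "Norway problem"
    yaml.safe_load("version: 3.10")    # {'version': 3.1}, the trailing zero is gone
    yaml.safe_load("country: 'NO'")    # {'country': 'NO'}, quoting fixes it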


“I don’t want to learn how to use it properly therefore it’s like [bad thing I clearly have no experience of]”

JSON is not fit for purpose as a configuration format. Use YAML or don't; use properties if you like. JSON is first and foremost for m2m: it's even in the name, "Object Notation".



