Oddly enough, BitTorrent's bencode format is a significant subset of JSON but is much easier to parse as a binary format. Bencode only supports integers, byte strings, lists, and dictionaries.
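For reference, those four types are encoded as i<digits>e for integers, <length>:<bytes> for byte strings, l...e for lists, and d...e for dictionaries with sorted keys. A minimal Python sketch of an encoder (function name is mine; the real format is specified in BEP 3):

    def bencode(value) -> bytes:
        # Integers: i<digits>e, e.g. 42 -> b"i42e"
        if isinstance(value, int):
            return b"i%de" % value
        # Byte strings: <length>:<bytes>, e.g. b"spam" -> b"4:spam"
        if isinstance(value, bytes):
            return b"%d:%s" % (len(value), value)
        # Lists: l<items>e
        if isinstance(value, list):
            return b"l" + b"".join(bencode(v) for v in value) + b"e"
        # Dictionaries: d<key><value>...e, keys are byte strings in sorted order
        if isinstance(value, dict):
            return b"d" + b"".join(bencode(k) + bencode(v)
                                   for k, v in sorted(value.items())) + b"e"
        raise TypeError("bencode supports only int, bytes, list, dict")

    print(bencode({b"spam": [b"a", b"b"]}))   # b'd4:spaml1:a1:bee'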
"BitTorrent became popular around year 2005, whereas JSON became popular around year 2010. The usage of JSON in the real world has greatly eclipsed BitTorrent or bencode, so there is a natural bias to view bencode through the lens of JSON even though JSON was adopted later (though not necessarily invented later)."
Whether bencoding's remarkable similarity to netstrings is purely coincidence is left as a question for the reader.
Perhaps there is a "natural bias" to view netstrings through the lens of bencoding even though bencoding came much later.
"Oddly enough, BitTorrent's bencode format is a significant subset of JSON but is much easier to parse as a binary format."
Curious what makes it "odd".
JSON assumed that memory is an unlimited resource.^1 It is like having to read an entire file into memory before processing it. Hence we see revisions such as "line-delimited JSON". Netstrings is even more memory-efficient than line-oriented processing; there is no need to read in an entire line.
1. This makes sense if one agrees that JSON was designed for graphical web browsers, programs notorious for excessive memory usage.
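To make the netstrings point concrete: a netstring is just <length>:<bytes>, (for example 12:hello world!,), so a reader knows up front exactly how many bytes to pull from the stream and never has to buffer a whole line. A rough Python sketch, function name mine:

    import io

    def read_netstring(stream):
        # Read the length prefix one byte at a time until the ':' separator.
        digits = b""
        while True:
            c = stream.read(1)
            if c == b":":
                break
            if not c.isdigit():
                raise ValueError("malformed netstring length")
            digits += c
        length = int(digits)
        # Now we know exactly how many bytes to read; no line buffering needed.
        payload = stream.read(length)
        if stream.read(1) != b",":
            raise ValueError("missing trailing comma")
        return payload

    print(read_netstring(io.BytesIO(b"12:hello world!,")))   # b'hello world!'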
So if you want strings, you need to guess what encoding was used or store the encoding in another field? I don't think that makes it a much nicer format. I do like the ability to store byte strings directly.
You can use two integers, one that represents the entire number including decimals, and one that represents the precision, to know how many decimals are there. For example, you'd represent "123.45" as 12345 and 2. That's often how monetary amounts are stored in databases, to avoid a lot of common floating-point arithmetic pitfalls.
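A rough Python sketch of that scheme (variable names mine): the arithmetic stays in plain integers, and you only format back to a decimal string at the edges.

    # "123.45" stored as (value, precision) = (12345, 2)
    amount, precision = 12345, 2

    other = 1055            # 10.55 at the same precision
    total = amount + other  # 13400, i.e. 134.00 -- plain integer addition

    # Format back to a decimal string only for display.
    print(f"{total // 10**precision}.{total % 10**precision:0{precision}d}")   # 134.00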
I think it is to optimize arithmetic operations. Significantly fewer steps with the first method, which only requires adjusting how many digits are considered decimal, rather than rejoining, doing the arithmetic, and separating again as in your proposal. Plus, a wider float.
But that's just floats with extra steps? Floats have two parts in their binary representation: mantissa and exponent (and sign), which correspond exactly to your "entire number" and "precision", only in base 2 instead of 10.
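For instance, Python's math.frexp exposes exactly that decomposition:

    import math

    # Every finite float is sign * mantissa * 2**exponent.
    m, e = math.frexp(123.45)
    print(m, e)        # 0.964453125 7
    print(m * 2**e)    # 123.45
    # The scaled-integer scheme is the same idea in base 10: 12345 * 10**-2.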
I agree, and I feel like the reason for this is the mere existence of 'jq'.
Without 'jq', working with JSON in a Unix shell would be a lot more uncomfortable, but not impossible.
What do you think the trade-off with the syntax is here, and what was jq designed for?
A bash script might need to execute it, and you want something without lots of funny characters or whitespace, as it's going straight into the terminal.
That necessary terseness makes it the opposite of readable.
The syntax felt ok to me; selectors felt natural and pipes felt very conventionally shell-like. But man, the vocabulary, the variety of different operators you'll need to use in this circumstance or that: brutal.
There are some decent cookbooks/recipes, but they're still only 1/5th as big as they could be.
There was a little glitch in the scraping protection where an errant regular expression briefly blocked all Firefox versions. Which is especially bad because I (the blog author) exclusively use Firefox, so I was blocking myself. The management apologizes for the problem (and generally allows much older Firefox versions than Chrome versions, as people seem to still use them on various platforms).
This should be fixed for you now. To my surprise, some legitimate versions of Firefox report 'rv:109.0' with a Firefox/<ver> version that is not 109 (along with all of the crawlers that have latched on to 'rv:109.0' as something they put in their User-Agent strings along with random old Firefox versions).
And I can confirm that I can now read your site using the native user agent string. (FWIW, I expect to upgrade to FF-128 within the next couple of months.)
Sort of agree... but only because you can gron it to remove the madness and then grep/cut/sed/awk the output like a human being.
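For anyone who hasn't seen gron: it flattens JSON into one assignment per line so the usual line tools can work on it. Roughly this (my own Python approximation, not gron itself; real gron also quotes awkward keys):

    import json

    def flatten(value, path="json"):
        # Emit gron-style "path = value;" lines for a parsed JSON value.
        if isinstance(value, dict):
            print(f"{path} = {{}};")
            for k, v in value.items():
                flatten(v, f"{path}.{k}")
        elif isinstance(value, list):
            print(f"{path} = [];")
            for i, v in enumerate(value):
                flatten(v, f"{path}[{i}]")
        else:
            print(f"{path} = {json.dumps(value)};")

    flatten(json.loads('{"user": {"name": "ann", "tags": ["a", "b"]}}'))
    # json = {};
    # json.user = {};
    # json.user.name = "ann";
    # json.user.tags = [];
    # json.user.tags[0] = "a";
    # json.user.tags[1] = "b";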
JSON is just a modern flavor of XML, and in a few years we'll likely mock both of them as if they were the same silly thing. They are functionally equivalent: not human-writable, cumbersome, crufty, with unclear semantics and unclear usage, and all around ugly.
I unfortunately write a bit of both XML and JSON as part of my day-to-day work. JSON is significantly easier to read and write as a human. So many XML files use a combination of the file structure, the nodes, and the attributes to encode their data - meaning that to parse it you need to know the specifics of how it was written. JSON is much simpler and 95% of the time can be thought of as map<string, JsonObject> and it just works.
YAML goes too far on the brevity side - I find the 2-space indent, the use of "-" as a list delimiter, the whitespace sensitivity, and the confusing behaviour with quoted vs unquoted strings incredibly hard to work with.
> 95% of the time can be thought of as map<string, JsonObject>
But for that case you don't need json. A dockerfile-like text file with lines of the form
    STRING other stuff
is trivial to parse in any language and without requiring any library. And it's actually human-editable.
Using JSON for such trivial stuff is like using a linear algebra library to compute b/a (the quotient of two numbers) by calling linalg.solve([[a]],[b]). Of course it will work. Of course it is more general. But it feels silly.
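Something along these lines, say (a rough Python sketch; the comment-skipping and key/value split rules are my own assumptions):

    # Parse "KEY rest of the line" pairs, dockerfile-style; no library needed.
    def parse(text):
        entries = []
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):   # skip blanks and comments
                continue
            key, _, rest = line.partition(" ")
            entries.append((key, rest))
        return entries

    print(parse("FROM debian:12\n# comment\nRUN apt-get update"))
    # [('FROM', 'debian:12'), ('RUN', 'apt-get update')]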
How does your approach handle keys with spaces? How does it handle multiline strings, escapes and so on?
With your approach, I have to either not use these features, do a lot of manual testing to figure out the corner cases of your implementation, or spend a non-trivial amount of time to implement them because they weren't considered from the start. This hardly seems better than just using JSON.
JSON has no support for standard floating-point numbers, and that's a bigger problem for me. I can easily change my keys to make them sane, but I cannot change the numbers!
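A concrete case is IEEE 754's special values: JSON has no literal for NaN or the infinities, so libraries either emit non-standard tokens or refuse outright. Python's json module, for example:

    import json

    print(json.dumps(float("nan")))                 # prints NaN, which is not valid JSON
    try:
        json.dumps(float("inf"), allow_nan=False)   # strict mode refuses instead
    except ValueError as e:
        print(e)                                    # "Out of range float values are not JSON compliant"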
It's only not a big deal if your use case doesn't require this feature.
You don't always have full control over the keys. Say I take over your legacy project, and I need to add a way to configure settings related to file paths. Whoops, suddenly I'm either limited in the FS layout I can use, or I have to implement support for this feature (and properly test it, and document it, and...)
> JSON has no support for standard floating-point numbers, and that's a bigger problem for me. I can easily change my keys to make them sane, but I cannot change the numbers!
Cool, now I need to write a custom parser (no matter how simple it is) for your custom format. With no guarantee that it won’t evolve with breaking changes when you realize you forgot to handle some use cases and end up reinventing JSON.
JSON is trivial as it is, don’t try to reinvent the wheel badly.
About time to move from undefined to at least something... programs should have clear interfaces for input and output. I'm sure there were sound reasons, but slapping wads of unstructured text to and fro in 2025 sounds almost primordial -_-
Not sure why they had to add additional whitespace characters... also, single-line comments seem problematic in this respect... machine-readable JSON is often one line.
Keep in mind, I consider what's machine-readable to mean, what's only readable by a machine. I.e. a magnetic tape is machine-readable, while JSON is human-readable.
Is a picture of my passport machine-readable? Is a PDF machine-readable? How do you classify it? And what happens ten years in the future, once algorithms become more optimized? If an AI can read Shakespeare, and parse its paragraphs for verbing a noun, is all human-written stuff then machine-readable?
> You're being too strict with language and definitions.
Yeah, because the stricter the definition, the more useful it is. A "thing" has less informational value than a "yellow jacket" and that has less informational value than "white and yellow jacket, with Adidas logo".
> Yeah, because the stricter the definition, the more useful it is
Language is capable of expressing broad and narrow concepts. I don't think it can be said that either is inherently "more useful" - it just depends what you're intending to convey.
Moreover I suspect iinnPP may not have meant you're being too strict as in specific, but strict as in rigid and inflexible - seemingly needing to separate everything with a sharp objective binary line rather than being able to consider the context (something that is "big" in one context may not be in another) and varying degrees.
> A "thing" has less informational value than a "yellow jacket" and that has less informational value than "white and yellow jacket, with Adidas logo"
If the word "flavoste" came to commonly refer to yellow jackets in general, and someone used that word to refer to a yellow jacket, it doesn't make a whole lot of sense to call them wrong because the jacket doesn't have an Adidas logo if that's not what the word flavoste is used to mean.
If you want a term to refer to things that are only readable by machines, such as to exclude JSON/CSV/XML/etc., something like "binary file" or "non-human-readable format" may be closer.
> but strict as in rigid and inflexible - seemingly needing to separate everything with a sharp objective binary line rather than being able to consider the context (something that is "big" in one context may not be in another) and varying degrees.
Rigid as in I like things to be as concise and precise as possible. The more rigid the definition, the fewer things a word can be.
> Language is capable of expressing broad and narrow concepts. I don't think it can be said that either is inherently "more useful" - it just depends what you're intending to convey.
Sure, language isn't a perfect conveyor of meaning, that's for sure.
That said, I try to be as concise and precise as possible, and vague, duplicated words just tick me off. Why say machine-readable if you can omit it (since everything appears to be machine-readable nowadays)? Just say something like THE text format on Linux or whatever.
At 2 tokens per second? Needing a network connection, or sending your data (including confidential data) to the cloud? Needing very expensive GPUs? With hallucinations and tremendous energy expense.
I parse things usually at a rate of 2-20 million tokens per second, on a local computer. Never hallucinates and it is always perfect.
Don't get me wrong. I use LLMs a lot, but they are good at what they are good at.
Having had the displeasure of writing parsers for YAML: the ability to start a word with a valid token symbol in an unquoted string (for example :this or ,this or &:hate_you) is so limiting and prevents many optimizations.
No. It means that the parsing code is trivial to write, instead of needing some kind of LALR, SLR, or Earley monstrosity with hundreds of megabytes of grammar tables in memory, just to understand the output of a program like "ls" or "find" or "grep".
JSON is ALSO easy for machines to read. I know because I have made several JSON parsers myself. I have also made parsers for computer languages (with things like PEG) and natural languages, and there is a world of difference.
JSON is machine-readable first and human-readable for convenience. Its primary purpose is machine-to-machine communication, but it does allow human intervention. People have used that as a signal to start using it for configuration files, but that is an anti-pattern in my opinion.
If you want something like JSON that is designed for humans take a look at YAML
“I don’t want to learn how to use it properly therefore it’s like [bad thing I clearly have no experience of]”
JSON is not fit for purpose as a configuration format. Use YAML or don't, use properties if you like. JSON is first and foremost for m2m; it's even in the name, "Object Notation".
I wrote a more detailed comparison on: https://www.nayuki.io/page/bittorrent-bencode-format-tools