I’m a big fan of tools producing json output, but I couldn’t help but chuckle a bit at:
> This allows easier parsing in scripts by using JSON parsing tools like jq, jello, jp, etc. without arcane awk, sed, cut, tr, reverse, etc. incantations.
While I love jq, I find its syntax to be utterly baroque. Far more than awk and sed (let alone simple tools like cut and tr). To the point where I end up looking at the jq man page to remind myself how to do even seemingly simple things at times. And often jq is just the beginning of a pipeline to get hierarchical data into columnar format so I can hit it with awk and sed.
But maybe this is all about familiarity; I have years of experience cobbling together shell tools and reach for jq comparatively less. I suspect the author may be the opposite.
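To make that concrete, the kind of pipeline I mean looks roughly like this (endpoint and field names invented for illustration): jq only flattens to columns, and awk does the real filtering.

    curl -s https://example.com/api/users.json \
      | jq -r '.users[] | [.name, .logins] | @tsv' \
      | awk -F'\t' '$2 > 100 { print $1 }'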
It's not you; parsing line-based text is fundamentally simpler than parsing structured data like JSON, especially nested JSON. It's just a lot easier to reason about, and a lot easier to split up in smaller chunks, too. Who hasn't ended up with something like:
cmd | sed | grep | sed | grep
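A concrete, made-up instance of that shape might be:

    dmesg | grep -i usb | sed 's/^\[[^]]*\] *//' | grep -v -i disconnect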
Each bit gets you closer to what you want in small, easy-to-understand steps, making it fairly easy to develop. And for things you only do once or twice, who cares as long as it works, right? I find that's a lot harder to do with jq. I feel this is a problem waiting for a better solution (and gron isn't it).
I probably have deeper knowledge of this kind of thing than most, and in most cases I found that doing a "crude" low-cognitive approach is better than spending brain cycles and effort on a "clever" high-cognitive approach.
My jq familiarity has definitely waxed and waned a number of times. And sometimes it is a bit of a stress.
But this feels like a vastly unfair basis for comparison. None of the old tools do 1/1000th of what jq does. Arbitrary structured data in, a whole pipe of simple or complex commands, structured data out. Is jq complex? Maybe, sure, but it replaces some really gnarly, inefficient, awful zsh scripts I'd have to cobble together with something that, in the end, is sensible, clear, and direct, and that other people would probably also write the same way I did.
Living in the structured-data world is so much better than arbitrary line-oriented text systems. The old tools may individually be better, but the utter lack of consistency in the old world meant reinventing too much each time you wanted to do a task. jq is effectively a mini shell, full of shell-like concepts (operators, variables, pipes, ...), and on those grounds, it's not like zsh has any shortage of capabilities most people don't know about or use well. Most of our shell scripting is beating on the same 10% and having to go hunt around through the other 90% on demand, too.
Completely agree. I use jq multiple times a day now, purely because GPT-4 or Claude Opus mean I never have to think about how to express anything in it.
Speaking as an inexperienced, infrequent jq user, using ChatGPT to create a jq version of my above PowerShell example took more time than writing a jq version by hand with the help of the jq documentation (and to have any confidence that the result is actually correct in the general case, rather than merely producing expected results for my example, I'd need to inspect ChatGPT's final command and RTFM anyway):
Oof, that "don't" in the code comment within the single quotes was causing your single quoted command to end, causing the command to be parsed incorrectly in your shell. I wonder if that was the only issue originally because it actually looks pretty good to me. Regardless, I can tell you with confidence it would take much less time for me to construct a working version based on the first input than starting from RTFM, despite already having some jq experience myself. Was this 3.5 or 4?
I've been using sed, awk, and friends (the tools I find myself reaching for) for 30+ years, I refer to the jq documentation 90% of the time I use it (vs. well under 10% when using traditional UNIX tools), and I still appreciate JSON output as an option.
Incidentally, PowerShell has nice built-in facilities for working with JSON (ConvertFrom-Json and ConvertTo-Json).
I would argue it's not just about familiarity. I had the same reaction to that quote and feel the same as you do, and yet I was using jq before I ever learned and started using awk.
For a long time I had been familiar with jq, but every time I had to use it for any non-trivial task after not touching it for a while, I had to go to the docs and really scratch my head. Meanwhile, I had no idea about awk, and for ages I would just see those awk one-liners on Stack Overflow as weird, arcane alternatives to "simply" piping sed, grep, and tr together in bash.
However, one day I finally had the time and the reason to have a look at AWK Programming for a specific use that I needed, and it immediately clicked. Awk makes sense and now I can easily fall back to it every time, even if I haven't used it for ages. At most, all I need to look up is the order of parameters in the gawk regex functions.
And yet, time and time again I am baffled by jq, however much I have used it in the past. Whenever I need to parse JSON, I'm happy enough if I can use jq to get it to a good-enough halfway point so that I can pipe it to awk to do the actual work.
On Linux systems, I was thinking it would be great if there were a D-Bus wrapper that output all data as JSON by default.
Many things are exposed on D-Bus (systemd units, timers, services, etc.), and busctl [0] already has --json output.
We could have dbus-services, dbus-timers, and so on, to retrieve those from D-Bus and emit JSON.
If most things were on D-Bus, one could have access to almost everything on the system as JSON directly, while talking to the underlying implementation of most CLIs by default.
I suppose something similar could be done on macOS with launchctl and others (I don't know much about macOS, though).
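As a taste of what's already possible, busctl's --json output composes with jq today (a sketch; which busctl commands accept --json depends on your systemd version, and the property here is just an example):

    busctl get-property --json=short \
        org.freedesktop.systemd1 /org/freedesktop/systemd1 \
        org.freedesktop.systemd1.Manager Version | jq -r '.data'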
I prefer jsonl from command-line tools, just because it's nicer to log and to parse using standard tools. Plus it's like a two-second function if you're doing it in a real programming language. I think it's the best of both worlds.
They say to use JSON Lines for streaming, but even if you're not streaming, it's still very nice to be able to do my_command | grep xyz and still have the output be parseable.
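That's the nice property: a plain grep leaves you with lines that are each still valid JSON (command, flag, and field names here are hypothetical):

    my_command --json-lines | grep '"status":"error"' | jq -r '.id'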
Given that I prefer using ubiquitous, relatively small and simple utilities such as sed and tr, I use a simple custom utility I wrote myself using flex to make JSON output line-delimited so it (a) is human-readable and (b) can be processed with traditional UNIX utilities such as sed and tr. Although it's quite popular, jq, also originally written using flex (and yacc), is (c) not as ubiquitous as UNIX utilities such as sed and tr and (d) too large and complicated for me.
I have tried many alternatives, even including large Go binaries such as gron. None of them appealed to me. I use a comparatively simple C program that is 45.9 KiB when statically-compiled. Of the alternatives I have tried, this one was the most interesting:
...obviously, in some cases a graph/hierarchy is very useful to be able to traverse, but emitting records and "coalescing" the path into a record-field makes simple operations simple.
Don't make me go like `.cpu.bank[0].chip[3].temp` or whatever... just give me "all the CPUs" and then give me a good URI(!!) to describe it or search for it later.
Rationale: coalescing and traversing weird object paths is tough to do dynamically via current shell DSLs, and the URI concept as an ID is an excellent application.
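To make the coalescing idea concrete, gron-style flattening turns every leaf into one greppable line whose path doubles as an identifier (the sensor path here is made up, and -j assumes a reasonably recent lm-sensors):

    $ sensors -j | gron | grep -i temp
    json.cpu.bank[0].chip[3].temp = 42.5;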
This is really the only reason, because JSON as a format is otherwise terrible: no good way to represent 64-bit numbers, very limited types, and no comments.
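The 64-bit problem is easy to demonstrate: any consumer that parses numbers as IEEE doubles silently rounds large integers (older jq versions behave this way; newer ones try to preserve the literal):

    $ echo '{"id": 9007199254740993}' | jq '.id'
    9007199254740992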
Parsing is hard and this definitely beats having to write ad-hoc parsers for stuff. I worry we're essentially reinventing a less verbose form of SQL though.
I love that book. Parsing Techniques is even more detailed. It's still pretty hard, though. "Easy" would be a parse(grammar, input) function in the standard library; the Earley algorithm is perfect for that. I wonder why no one's thought of it.
> The file systems of Unix machines all have the same general structure. [...] Note the obsessive use of abbreviations and avoidance of capital letters; this is a system invented by people to whom repetitive stress disorder is what black lung is to miners. Long names get worn down to three-letter nubbins, like stones smoothed by a river.
-- Neal Stephenson, In the Beginning Was the Command Line (Ch. 14), 1999
But why? Just for people to gron it back into usable form?
Unix tools work best with line-based data. Using some "structured" monstrosity like xml or json forces all other tools to deal with this particular format, thus breaking the orthogonality between programs.
okay, run it once with head (assuming we have a --dry option...) ah, that's the column. okay cut -d"," -f0, ah whoops it's starting at 1. ah, damn, there's a weird comma in the name/quote. oh weird, that one's null, ah heck.
schemas are cool. JSON extends and builds on the UNIX idea of having things pipe and plumb well together.
But why JSON and not CSV [1]? Most of what the article suggests, like formatting the output as lines and flattening, is how CSV works. CSV is much easier to parse than JSON (use split(",") or the equivalent). A complete record (JSON object) of CSV data can be parsed in a single line, unlike typical JSON. The line-based nature of CSV makes it far more fault-tolerant to broken streams/truncation and more in line with standard Unix conventions.
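For example (header and fields invented; the naive comma split does assume no quoted commas):

    printf 'name,age\nalice,42\n' | awk -F, 'NR>1 { print $1, $2 }'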
Plain text is easier to reason about, because we are used to processing text. Good textual output, that is, records delimited by spaces, tabs, or some other delimiter, is all that's needed for most applications.
An object structure is much more complex to use. For example, output that is a set of records can easily be imported into an Excel sheet or an SQL database, or processed line by line, without issues. Processing JSON is not straightforward, and not all programs support JSON.
Finally, JSON can't be processed as a stream, meaning that tools like head, tail, etc. don't work on it; you have to read it all into memory, or use JSON Lines, which is not a standard format, which not all parsers support natively, and so on.
JSON is good for integrating a program inside other programs (as a subprocess), so having an option to input/output JSON in a program is useful, but to me it's not as useful for interactive shell usage. I prefer to use UNIX tools such as grep, cut, head, tail, etc.
But as the article points out, the advice to use unbuffered JSON Lines for commands that are line-oriented is well given. Not doing that can really make life sad.
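The sad case is a long-running producer whose output sits in a block buffer; keeping each stage line-buffered lets every event flow through as it's emitted (slow_events and its fields are made up; jq's -c and --unbuffered flags are real):

    slow_events --json-lines | jq --unbuffered -c 'select(.level == "error")' | tee errors.jsonl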
You still need to know the schema of the JSON, which could also use a dry run. I'm not really sure how just being JSON solves that in your mind.
But that output is far more variable between applications, especially in cases involving escape sequences, whitespace handling, and so on. All of these are specified in JSON.
I can make whatever I want in a [{},{},{}] and have it be valid JSON. If it's the first time you've used my thing, you'll have to somehow look up how the JSON is structured. Whether that's from howtousething.com, man thing, or thing --help, you'll still need to find out what thing does. It doesn't matter if it's your thing or my thing; somehow, thing needs to be able to tell people what to do. There is no universal thing that thing outputs; otherwise nobody would need your thing or my thing, because someone else's thing would already do it.
Newline-delimited JSON is so much more useful to me than weird Unix line-formatted output that I have to parse with pattern matching or regular expressions.
It's basically a line-based format that can represent all of the JSON types and includes support for nested data structures where necessary. What's not to like?
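For anyone who hasn't seen it: one complete JSON value per line, nothing more (records invented for illustration):

    {"name":"eth0","addr":"192.168.1.10","up":true}
    {"name":"lo","addr":"127.0.0.1","up":true}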
If this had been in shells and cmdline tools since the beginning it would have saved so much work, and the security problems could have been dealt with by an eval that only set variables, adding a prefix/scope to variables, and so on.
Unfortunately it's too late for this, and today you'll be using a pipeline to make the JSON output shell-friendly, or some substring hacks that probably work most of the time.
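The "make it shell friendly" pipeline people actually write tends to look something like this (URL and variable names invented; eval'ing generated assignments carries exactly the risks mentioned above):

    eval "$(curl -s https://example.com/status.json \
      | jq -r '@sh "STATUS=\(.status) COUNT=\(.count)"')"
    echo "$STATUS $COUNT"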
That's great for key=value data, but more complex data structures don't work so well in that format; JSON handles them. "Why would you need to represent data as a complex data structure?" Sometimes attributes are owned by a specific entity, and that entity might own multiple attributes. It might even own other sub-entities. JSON represents that. Key=value does not.
JSON is literally key=value, just nested. Which you can do with shell variables.
The question was "What's not to like [about JSON output from cmdline tools]?" and the answer is that it's cumbersome to read in a shell and all but requires another pipeline stage.
I didn't even recommend shell-variable output, and I made it clear this isn't a reasonable solution today, so I'm not sure where the hostility in the replies comes from; I assume it's from recognizing that it's a more practical solution for reading data within a shell, but not wanting that to be so.
The nature of being nested, and also containing structures like lists, maps, etc. All of which makes it more complicated than key=value.
> The question was "What's not to like [about JSON output from cmdline tools]?" and the answer is that it's cumbersome to read in a shell and all but requires another pipeline stage.
It depends on the intended use for your shell program. If you intend the CLI tool to be used in CI pipelines (e.g. your CLI tool's output is being read by an automated process on a computer) and the data it outputs is more complicated than a simple key=value, JSON is great for that. Your CI program can pipe to jq. You as a human can pipe to jq, though I agree it's somewhat less desirable. Though just piping to jq without any arguments pretty-prints it for you, which also makes it fairly readable for humans.
> so I'm not sure where this hostility in the replies comes from
You're reading into hostility where there isn't any.
> The nature of being nested, and also containing structures like lists, maps, etc. All of which makes it more complicated than key=value.
These are JavaScript objects, which are key-value. A list/array is just keyed by a number instead of a string. They're functionally exactly the same as name=value, except JSON is parsed depth-first whereas shell variables are parsed breadth-first (which is way better for shells).
Do you have an example of a CLI tool - intended for human use - that has output so complicated it can't be easily mapped to name=value? I don't think there is one, and it's certainly not common.
> You're reading into hostility where there isn't any.
I think "it seems you're determined not to use jq" is pretty hostile since I made no intimation of that at all.
> I think "it seems you're determined not to use jq" is pretty hostile since I made no intimation of that at all.
Well, I didn't say that, so I don't know what that other person's feelings or intentions are, to be fair. I personally have no feeling of hostility towards you just because we (apparently) disagree on the usefulness of JSON to represent complex data types, or at least disagree on how often human-usable CLI tools output complex data. But to answer:
> Do you have an example of a CLI tool - intended for human use - that has output so complicated it can't be easily mapped to name=value? I don't think there is one, and it's certainly not common.
kubectl, which to be fair defaults to a table-like output format, though it gets all the data in that table from JSON for you.
smartctl is another one, which also defaults to table format.
To be honest, I could go on and on if the only qualifier is a CLI tool that emits complex data, not suited for just key=value.
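For example, the raw object behind kubectl's pod table is deeply nested; pulling two fields back out looks something like this (the filter is illustrative, the field names are from the standard Pod object):

    kubectl get pods -o json | jq -r '.items[] | "\(.metadata.name)\t\(.status.phase)"'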
> These are JavaScript objects, which are key-value. A list/array is just keyed by a number instead of a string. They're functionally exactly the same as name=value, except JSON is parsed depth-first whereas shell variables are parsed breadth-first (which is way better for shells).
As mentioned before, just because you can compare JSON to key=value does not mean it's as simple as key=value. It's a data serialization language that builds well on top of simple key=value formats. You're welcome to enjoy other data serialization languages, like YAML, HCL, or Pkl. But none of those are simple key=value formats either; they build the ability to represent more complex structures on top of that.
A data serialization language allows the end-user to specify how they would like to use that data, while allowing them to use standard parsing tools like jq. Cramming complex data into a value string in a key=value format gives end users the same allowance to use that data however they want, while also giving them a chore to handle parsing it in custom ways tailored to just your CLI application, likely in ways that would seem far more brittle than parsing a defined language with well defined constraints. That doesn't sound like great UX to me. But to be fair to you, you're not saying that you wish to use key=value to represent complex data. Rather, you're saying there's a general lack of complex data to be found, which I also disagree with.
> But none of those are simple key=value formats either.
What is the difference between:
{ object: { name: value }}
{ object: "{ name: value }"}
object="name=value"
There's zero difference between any of them except how you parse and process the data.
> kubectl, which to be fair defaults to a table-like output format.
With line-based shell-variable output you have a line of variables and you have blocks of lines separated by an empty line (like an HTTP 1 header).
This can easily map to any table, two dimensions, or two levels of data structure, without even quoting subvariables like in the example above. So no, kubectl is not an example, at least not as you've described it.
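To be concrete about the format I mean (values invented):

    NAME=web-1
    STATUS=Running
    RESTARTS=0

    NAME=web-2
    STATUS=Pending
    RESTARTS=2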
> What is the difference between .. There's zero difference between any of them except how you parse and process the data.
Answered in the previous message... "A data serialization language allows the end-user to specify how they would like to use that data, while allowing them to use standard parsing tools like jq. Cramming complex data into a value string in a key=value format gives end users the same allowance to use that data however they want, while also giving them a chore to handle parsing it in custom ways tailored to just your CLI application, likely in ways that would seem far more brittle than parsing a defined language with well defined constraints."
> With line-based shell-variable output you have a line of variables and you have blocks of lines separated by an empty line (like an HTTP 1 header)...
I would not choose to write application logic that foregoes defined data serialization languages for parsing barely structured strings the way you seem to prefer. But you go about it the way you prefer, I guess. This whole discussion leaves a lot of room for personal opinions. I think we both agree that the other person's opinion here is subjectively the more annoying route to deal with. But that's the way life is sometimes.
That's not your original request though, to use line-based data. It seems you're determined not to use jq but if anything, json output | jq is more the unix way than piping everything through shell vars.
> That's not your original request though, to use line-based data.
It wasn't my request and OP (not me) said "line-based data" is best. The comment I replied to said "Newline-delimited JSON ... a line-based format".
If the only objection you have is "but that's line-based!" then you're in a completely different conversation.
> if anything, json output | jq is more the unix way than piping everything through shell vars.
The unix way is line-based. The comment I replied to is talking about line-based output. Line-based output is the only structure for data universal to unix cmdline tools - even tab/space isn't universal; sending structured non-line-delimited data to a program to unpack it is the least unix-like way to do it.
Also there's no pipe in the shell-variable output scheme I described, whereas "json | jq" is a shell pipeline.
And the author isn't suggesting only having JSON output, but adding it as an option for those of us who would make use of it. The plain text should remain as well (and has to, or many, many things would break).
On a separate point, I find the JSON much easier to reason about. The wall of text output doesn’t work for my brain - I just can’t see it all. Structuring/nesting with clear delineations makes it far easier for me to grok.
Line-oriented formats, like most traditional Unix-style tools, are for human consumption. JSON is bad at that, thus gron.
On the other hand, structured output formats, like JSON, make it easier to consume with other programs. Standard formats have readily-available and commonly used libraries, whereas line parsing tends to be one-off for every program. Whether JSON is the best format for this is certainly debatable, but it is quite ubiquitous, which is a huge advantage. I doubt many folks would propose XML as a general recommendation.
Tools should have both options on their path to maturity -- both human-consumable and computer-consumable output format options.
If you're using UNIX tools to parse it, it sucks, but generally if I'm reading the output of a command, and the command is more than one word, I'm doing the whole thing in Python.
That's real programming, and for that I want type checkers and debuggers and modern syntax and all that. And the performance is often faster because you're not spinning up subprocesses for each command.
Unix tools work great on newline-delimited, list-like objects, but try them on dict-like objects and it becomes clunky really fast. Parsing `ip` output is a good example, where the data naturally lends itself to iface:attrs. Plucking the value you want with `.[.name | startswith('eth')].addr` is way easier than pulling it out with grep/awk.
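Spelled out against iproute2's actual JSON, that pluck is something like the following (ifname and addr_info are real fields in ip -j addr output; the eth prefix is just an example):

    ip -j addr | jq -r '.[] | select(.ifname | startswith("eth")) | .addr_info[].local'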