Illustrated Jq Tutorial (mosermichael.github.io)
272 points by MichaelMoser123 3 months ago | 72 comments



I've been a die-hard Linux user for about a dozen years. Recently I had to do some development with MS Powershell. I was very reluctant at first, but after getting familiar with the technology, I almost fell in love.

"Cmdlets", basically commands used in Powershell, output "objects" instead of the streams of text used in a more classical shell. Powershell has built-in tools to work with these objects. For example, you can take the output from one Cmdlet, pipe it through `SELECT` with a list of fields specified, and get a stream of objects only containing those fields. Other operations can be performed against those objects as well, such as filtering and whatnot.
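For readers who have not used it, the shape of that object pipeline can be sketched in Python. This is only an analogy: the process records, the field names, and the `select`/`where` helpers are made up for illustration and are not PowerShell's implementation.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical records standing in for the objects a cmdlet would emit.
processes = [
    {"pid": 101, "name": "sshd", "cpu": 0.1, "user": "root"},
    {"pid": 202, "name": "nginx", "cpu": 2.5, "user": "www"},
    {"pid": 303, "name": "python", "cpu": 7.0, "user": "alice"},
]


def select(records: Iterable[dict], *fields: str) -> Iterator[dict]:
    """Yield each record reduced to the named fields."""
    for record in records:
        yield {field: record[field] for field in fields}


def where(records: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield only the records for which the predicate holds."""
    for record in records:
        if predicate(record):
            yield record


# Filter first (while the cpu field is still present), then project fields.
busy = list(select(where(processes, lambda p: p["cpu"] > 1.0), "pid", "name"))
print(busy)
```

The point is that each stage receives named fields rather than columns of text it has to re-parse.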

Back to normal *nix commands, we're starting to see more and more commands introduce direct JSON support [1]. There are even tools to translate output from common commands into JSON [2]. We'll probably see `jq` shipped directly with modern distros soon. Eventually we'll reach a tipping point where every command is expected to support JSON output. Tools like `awk`/`sed` might get updated to have richer support for JSON. Finally, we'll have ubiquitous Powershell-like capabilities on every *nix machine.

Powershell _is_ available on Linux. The model of piping objects instead of JSON is both powerful and more efficient. (For example, there are no redundant keys as in a stream of JSON objects, so fewer bytes are moved, much like how CSV headers aren't repeated with every row. Plus, binary data is smaller than text.) But most developers are hesitant to switch out their shell and existing workflows for a completely new tool, which is why Powershell will likely only be adopted by a small subset of sysadmins.
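The repeated-keys point can be checked with a rough sketch in Python. The records are invented, and this compares only two text serialisations (JSON Lines versus CSV); it says nothing about how PowerShell passes objects internally.

```python
import csv
import io
import json

# Hypothetical records; the field names are made up for illustration.
rows = [{"pid": i, "state": "Is", "command": "sh"} for i in range(1000)]

# JSON Lines: every record carries its own copy of the keys.
json_lines = "\n".join(json.dumps(row) for row in rows)

# CSV: the keys appear once, in the header row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["pid", "state", "command"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

json_size = len(json_lines.encode("utf-8"))
csv_size = len(csv_text.encode("utf-8"))
print(f"JSON Lines: {json_size} bytes, CSV: {csv_size} bytes")
```

How large the gap is depends on how long the keys are relative to the values.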

[1] https://daniel.haxx.se/blog/2020/03/17/curl-write-out-json/

[2] https://github.com/kellyjonbrazil/jc


Though it's pretty immature, nushell has a similar idea, with its own internal data model being streams of structured, typed data: https://www.nushell.sh/

And back to *nix commands, libxo is used by a chunk of the FreeBSD base tools to offer output in JSON, amongst other things: https://github.com/Juniper/libxo

    -% ps --libxo=json,pretty
    {
      "process-information": {
        "process": [
          {
            "pid": "52455",
            "terminal-name": "5 ",
            "state": "Is",
            "cpu-time": "0:00.00",
            "command": "-sh (sh)"
          },

    -% uptime --libxo=json,pretty
    {
      "uptime-information": {
        "time-of-day": " 8:34p.m.",
        "uptime": 1730360,
        "days": 20,
Be nice to see more tools converted.


# ip -j a | jq


> But, most developers are hesitant to switch out their shell and existing workflows for a completely new tool, which is why Powershell will likely only be adopted by a small subset of sysadmins.

What I would like to see is some sort of stddata stream be offered by the kernel itself so devs won't have to switch their shells and object manipulation can be a standard.


Not only that, but pwsh support for objects doesn't stop at passing objects around and mapping properties to parameters. There are a number of mechanisms in place. All the *nix variants cover just one of those mechanisms.

IMO, Powershell should be added to ALL mainstream distros as a first-class citizen. There is no downside to that, given that MS is now a legit FOSS player and that anybody can fork in case something goes wrong along the way...


Has Microsoft given some sort of explicit patent license for Powershell?



That is a no then, as the MIT license does not contain an explicit patent grant.

So Microsoft could very well sue anyone using Powershell, using patents. We don't know if they would win, as there might (or might not) be some protection in an "implicit" grant that a copyright license gives. But it would be a fight most organizations would not be willing to take on, and they would settle out of court, which is a win for Microsoft in practical terms. It is not long since they did this with FAT32 against practically all Android manufacturers.

Apache 2.0 would be the permissive license to go for if desiring to guarantee safety from patent issues. http://en.swpat.org/wiki/Patent_clauses_in_software_licences


What are you talking about ?

> deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so


Yeah, that is the implicit grant. But maybe it is clearer in practice than I thought. The Open Source Casebook puts it under "Express but Non-Specific Licenses" in its patents chapter https://google.github.io/opencasebook/patents/, and Scott K Peterson at Red Hat says its explicitness does not depend on the mention of "patent" https://opensource.com/article/18/3/patent-grant-mit-license


Looks like a corner case, but really, if implicit grant is not enough then ANY explicit grant will miss some of the real life moments - you mentioned patents, but probably many more exist or will exist in the future.

If somebody pulled this trick on me, and I were a judge, I would punish both the lawyer and the company he is representing. It's simply embarrassing and belittling toward human intellect.

Also:

> the Open Source Initiative has stated its view that all open source licenses implicitly include a patent grant

Thx for the info.


Is there a good source to get started with Powershell, perhaps aimed at people with good bash knowledge?

Every time I open it up I just nope back out, and most of the docs I peeked at didn't draw me in.


If you're not aware of what jq is, it's "JSON Query", a cli tool for filtering JSON streams - https://stedolan.github.io/jq/manual/


Very neat project! I think visibility into intermediate stages of a pipeline can be enormously useful for data restructuring tasks, where the individual steps aren't necessarily "difficult" to understand, but it can be hard to keep track of what's going on without a good feedback loop.

Here's a demo of a prototype live programming environment I made for jq, which similarly shows step-by-step views of the data but also gives live feedback as you construct your pipeline:

https://twitter.com/geoffreylitt/status/1161033775872118789

By the end of that Twitter thread I ultimately morphed it into a tool for building interactive GUIs (eg, get API data in JSON, use jq to morph it into the right shape for your UI, output to a HTML template).


Please don't put "Show HN" on posts like this. See the rules at https://news.ycombinator.com/showhn.html.

I'm sure it's fine reading material, but if we allowed Show HN to be reading material, every submission would be a Show HN.


Shameless plug: if you really like jq, I built a project that uses libjq to process various formats.

https://github.com/jzelinskie/faq


That is very cool! For other readers, "process" means submit jq-like document queries, and "various formats" means other JSON or JSON-like representations, such as BSON, Bencode[1], TOML, XML, and YAML. Thank you for sharing!

[1]: Bencode dates from 2001, before JSON became hugely popular. It is an ASCII-coded dict+string+int+list format used in BitTorrent .torrent files.
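To make the "dict+string+int+list" description concrete, here is a minimal Bencode encoder sketched in Python. It is an illustration only (no decoder, no validation beyond type checks), and the `bencode` function name is my own, not part of faq or any library.

```python
def bencode(value) -> bytes:
    """Encode ints, strings, lists and dicts as Bencode."""
    if isinstance(value, bool):
        raise TypeError("Bencode has no boolean type")
    if isinstance(value, int):
        # Integers: i<digits>e
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Strings: <length>:<bytes>
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        # Lists: l<items>e
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionaries: d<key><value>...e, keys are strings in sorted order.
        pairs = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            if not isinstance(key, bytes):
                raise TypeError("Bencode dictionary keys must be strings")
            pairs.append((key, item))
        pairs.sort(key=lambda pair: pair[0])
        body = b"".join(bencode(key) + bencode(item) for key, item in pairs)
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode({"spam": ["a", 1]}))  # b'd4:spaml1:ai1eee'
```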


If you'd like to avoid cgo, you can use a pure go implementation of jq called gojq: https://github.com/itchyny/gojq


Thanks for this link. If you'd like to create a GitHub issue, I'd appreciate it. My justification for linking to libjq is that it's a moving target: there are various builtins added across updates etc...


great content, it's just a bit annoying that one has to click on every portion of the command to see the "illustrated" part


Why does the reader mode icon not show up for this page on Firefox Android? I don't think it's possible to read on mobile atm.


There is a specific requirement to trigger that. Not sure whether desktop and Android Firefox are using the same criteria though. (https://stackoverflow.com/questions/30661650/how-does-firefo...)

Edit: the article does not have any <p> tags, so reader view is not triggered.


Nice overview of jq! You may want to say demonstration rather than illustration though.

I liked jq but liked json, a similar npm package, a little bit better for simple tasks.

You can find more about it here: https://github.com/trentm/json

As a JS dev I tend to have node installed anyhow so I just use a shell alias to wrap ‘node -pe’ these days. It’s not really for shell scripts but it’s great for quick every day usage. Plus you can use JS if needed instead of their DSL.

Here's the code for the alias in my shell profile: https://github.com/KylePDavis/dotfiles/blob/master/.profile#...


Shameless plug: If you use jq a lot, catj (https://github.com/soheilpro/catj) can really help you with writing query expressions.


I use gron to do this same thing:

https://github.com/tomnomnom/gron

Some options of gron I use often:

--stream, which treats the input as "JSON lines" format

--ungron, which converts from the flat format back to JSON


Can confirm, quite like catj. As a python dev though, I find the npm installation non intuitive. Jq is simple - stuff a binary in some bin/.


Looks like a good tool, but can everyone reading this please add a thumbs up here to get the reverse operation implemented?

https://github.com/soheilpro/catj/issues/7


Is there something similar for YAML? I've tried `yq` briefly but weirdly enough it doesn't seem to accept standard input in the way that jq does (ie pipe in some json, and output some pretty json)


Highly recommend https://github.com/mikefarah/yq

It can do everything with JSON as well, and convert between the two.


Worth noting OP has tried it and has some critiques.

> I've tried `yq` briefly but weirdly enough it doesn't seem to accept standard input in the way that jq does (ie pipe in some json, and output some pretty json)




It shouldn't be too difficult to convert between YAML and JSON. Funny, I couldn't easily find a lightweight converter. I think I will try to write one.


Doesn't yaml have significantly more baggage as far as "advanced features" that couldn't be properly duplicated in json?


Basic YAML, sure, but YAML has some insane features that I don't think would be possible to replicate in JSON. Or, at least, you'd lose some information in the conversion.

For example, YAML supports the concept of reusable fragments. You can define a fragment in one place and reference it further down in your YAML file. A JSON converter could take the final YAML output and turn it into JSON but you would lose the context of the fact that in the original YAML it was an included fragment and not just the same section repeated a few times.
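The loss described here can be shown by analogy in Python using only the `json` module. This is not a YAML parser: the shared `defaults` dictionary merely stands in for a YAML anchor and its aliases, and the config keys are invented.

```python
import json

# One fragment referenced from two places, standing in for a YAML
# anchor (&defaults) and two aliases (*defaults).
defaults = {"adapter": "postgres", "host": "localhost"}
config = {"development": defaults, "test": defaults}

# In memory, both keys point at the very same object.
shared_before = config["development"] is config["test"]

# JSON has no reference syntax, so the fragment is written out twice.
text = json.dumps(config)

# After a round trip the two copies are independent objects.
restored = json.loads(text)
shared_after = restored["development"] is restored["test"]

print(shared_before, shared_after)  # True False
print(restored == config)           # True
```

The values survive the conversion; the fact that they were one fragment does not.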


Yeah, YAML can also embed arbitrary code in the original Ruby incarnation and that clearly cannot be translated to JSON.

Yeah, no straightforward translation of references to JSON. You could provide both translation ramps, but it would be an implementation-specific convention and not something other JSON tools understood.


jc has a yaml to json converter:

cat file.yaml | jc --yaml


yq r test.json > test.yaml

yq r -j test.yaml > test.json


the tutorial could be dramatically improved by showing some json data and then the results of processing.


If you click on the query string in the command it does. Not intuitive at all though.


Nice! I struggled to learn jq initially and I made a similar page for my team.

One suggestion is to use with_entries as a replacement for the 'to_entries | map(...) | from_entries' pattern. For example:

  jq '.metadata.annotations | with_entries(select(.key == "label1"))'
is equivalent to

  jq '.metadata.annotations | to_entries | map(select(.key == "label1")) | from_entries'
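For anyone who finds the equivalence easier to read outside jq syntax, here is a rough model of the three builtins in Python. The function names mirror jq's, but this is a simplified sketch (it ignores jq's alternative key names and streaming semantics), and the annotation data is made up.

```python
def to_entries(obj: dict) -> list:
    """{"a": 1} -> [{"key": "a", "value": 1}]"""
    return [{"key": key, "value": value} for key, value in obj.items()]


def from_entries(entries: list) -> dict:
    """[{"key": "a", "value": 1}] -> {"a": 1}"""
    return {entry["key"]: entry["value"] for entry in entries}


def jq_map(entries: list, f) -> list:
    """Like jq's map(f): collect every output f yields for each element."""
    return [output for entry in entries for output in f(entry)]


def select(predicate):
    """Like jq's select(...): yield the input once if it matches, else nothing."""
    def f(entry):
        if predicate(entry):
            yield entry
    return f


def with_entries(obj: dict, f) -> dict:
    """Shorthand for to_entries | map(f) | from_entries."""
    return from_entries(jq_map(to_entries(obj), f))


# Hypothetical annotations object.
annotations = {"label1": "a", "label2": "b"}
is_label1 = select(lambda entry: entry["key"] == "label1")

short_form = with_entries(annotations, is_label1)
long_form = from_entries(jq_map(to_entries(annotations), is_label1))
print(short_form, long_form)
```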


Not related to with_entries, but I didn't see anywhere else in this thread that mentioned dealing with awscli output

from_entries handles nicely the Tags in a lot of awscli output, you can do things like

    aws ec2 describe-instances | \
        jq '.Reservations[].Instances[] | 
            {Role: .Tags | from_entries | .role,
             Name: .Tags | from_entries | .name,
             Id: .InstanceId}' \
        -C -c | sort | less -R
to get a summary of all your instances sorted by role.
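What `from_entries` is doing to the Tags list can be sketched in Python. The response below is invented sample data shaped like `describe-instances` output, and the helper is a rough stand-in that, like jq's builtin as far as I know, accepts the capitalised `Key`/`Value` names that AWS uses.

```python
def from_entries(entries: list) -> dict:
    """Rough stand-in for jq's from_entries, accepting key/Key and value/Value."""
    result = {}
    for entry in entries:
        key = entry.get("key", entry.get("Key"))
        value = entry.get("value", entry.get("Value"))
        result[key] = value
    return result


# Hypothetical sample shaped like `aws ec2 describe-instances` output.
response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0aaa",
                    "Tags": [
                        {"Key": "role", "Value": "web"},
                        {"Key": "name", "Value": "web-1"},
                    ],
                },
                {
                    "InstanceId": "i-0bbb",
                    "Tags": [
                        {"Key": "role", "Value": "db"},
                        {"Key": "name", "Value": "db-1"},
                    ],
                },
            ]
        }
    ]
}

summary = []
for reservation in response["Reservations"]:
    for instance in reservation["Instances"]:
        tags = from_entries(instance.get("Tags", []))
        summary.append(
            {"Role": tags.get("role"), "Name": tags.get("name"), "Id": instance["InstanceId"]}
        )

# Sort by role, treating a missing role as an empty string.
summary.sort(key=lambda row: row["Role"] or "")
for row in summary:
    print(row)
```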


thanks for the suggestion, i will add another example like this.


I have been using jq for years and still can't get it to work quite how I would expect it to. kubectl's jsonpath seems just about workable.


I have the same issue with jq. I need to use my google fu to figure out how to do anything more than a simple select.

I created jello[0], which uses python list and dict syntax to filter JSON. Here's a blog post[1] I wrote that shows how it can be used.

[0] https://github.com/kellyjonbrazil/jello [1] https://blog.kellybrazil.com/2020/03/25/jello-the-jq-alterna...


I don't know what the term would be, maybe "mental model", but I just can't get jq to click. Mostly because I only need it every once in a while. It's frustrating for me because it seems quite powerful.


I tend to only use it for much simpler queries than it is capable of. Ditto regular expressions. But I do not find myself missing advanced features of either very often. If I run out of my ability with jq, the manual page and playing around with the query sometimes illuminate the correct answer. And if not, it's often time (for me) to switch to using a less ephemeral program anyway.


I like jq for simple stuff which is what I mostly use it for. Whenever I have to dive into the documentation to do something complicated I die a bit on the inside.

Give something like this a go if you know javascript: https://www.npmjs.com/package/jsling


I like how you have output for each process of the pipeline, however it would be much better in terms of usability if you could dynamically load the result just below the query rather than opening a new page.

With that said this is a great overview!


I added another version where all the links are part of the same page / inline div's that are displayed https://mosermichael.github.io/jq-illustrated/dir-single-fil...


> if you could dynamically load the result just below the query rather than opening a new page.

s/rather than/in addition to/

For those of us with javascript disabled, the way the page works is perfectly fine as it is.


I thought of doing it with frames, but then frames annoy a lot of people.

Do you have a small example where such a UI is done properly, so that I can copy it? I am not much of an expert in JavaScript/CSS.


Would like to have a kind of "rosetta stone" where each of these examples is rewritten by passing the json to "gron" and then using the standard unix tools.

I guess some of the examples would be simpler than the jq solution.


Kudos for sharing "gron," I hadn't heard of that tool before and it looks quite useful: https://github.com/tomnomnom/gron


For a truly unix experience, filter the output of gron through this

    grep -Ev '({}|\[\])' | tr -d \; | cut -c 6-
It will remove useless cruft that is added for the output to be valid javascript
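The flattening idea itself is small enough to sketch in Python. This is a simplified imitation, not gron's actual output format: it prints only leaf values, skips the container lines and trailing semicolons (roughly what the grep/tr above strip), does not quote unusual keys, and drops empty containers entirely.

```python
import json
from typing import Iterator


def flatten(value, path: str = "json") -> Iterator[str]:
    """Yield one 'path = value' line per leaf, so the result is greppable."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Leaves are printed as JSON literals (strings keep their quotes).
        yield f"{path} = {json.dumps(value)}"


# Hypothetical document.
document = {"a": {"b": [1, "x"]}, "ok": True}
lines = list(flatten(document))
print("\n".join(lines))
```

Once every value sits on its own line with its full path, `grep`, `cut` and friends can do the selecting.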


apart from all the useless use of cat, great content!


God forbid that someone should ever write (- y + x) rather than (x - y), what a useless use of the plus sign!

Why have a problem with this? Catting a single file is a well-known idiom for outputting its contents into a stream, plays well with positioning in pipelines, and has the nice property that you can erase as much of the pipeline as you want, to be able to peek inside it at any point.


Completely agree. Cut it if your file is 10GB of logs, because it requires an extra stream copy, but otherwise it is the best way to interactively drill down into data and write the pipeline as you go. It is almost never a valid criticism, and when it is, calling it "Useless Use of Cat" (which reads as an insult to the author!) is the worst way to communicate that to the masses.


I think that was done deliberately to emphasise the pipelining aspect; redirection would visually obscure the main principle of the article.


not only that but I do the same when I use the CLI because it makes sense to say "I have a file. Now I will pipe it to ___" rather than "Using ___ I will pipe a file in then ___".

It just makes more sense to do things in order even if it's a couple extra characters.


You can keep the order but save a process by doing:

<input_file command_line


I was going to suggest that shells should allow specifying the input redirect before the command, but it turns out that "<input command" already works in bash. Does anyone know about other shells?


All POSIX-compliant shells should support this.
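What the redirect buys you, compared with `cat file | command`, is that the command reads the file directly as its standard input with no extra process in between. A small Python sketch of the same arrangement, where the consumer command and the temporary file are stand-ins invented for the demo:

```python
import os
import subprocess
import sys
import tempfile

# Create a throwaway input file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
    handle.write('{"hello": "world"}\n')
    path = handle.name

# Stand-in consumer: a child Python process that upper-cases its stdin.
consumer = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

# Equivalent in spirit to `<input_file command`: the open file becomes the
# child's stdin, and no intermediate `cat` process copies the bytes.
with open(path, "rb") as source:
    result = subprocess.run(consumer, stdin=source, capture_output=True, check=True)

os.unlink(path)
output = result.stdout.decode("utf-8")
print(output)
```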


This is so much easier with Powershell. V7 is totally cross-platform, and I can't see why people have a problem using it, if for nothing else than the `ConvertFrom/To-Json/CSV/Whatever` cmdlets.


I'd love to see an example. Let's say I'm on a Debian server. How would I acquire Powershell (is it GPL/MIT?) and use it to convert some JSON?


Powershell Core and .NET core are MIT Licensed as of 2016 https://github.com/PowerShell/PowerShell/blob/master/LICENSE...

Debian install instructions https://docs.microsoft.com/en-us/powershell/scripting/instal...

Enable https, add feed, apt install

As someone who started out scripting in Powershell but preferred Linux as an OS, I found myself missing Powershell's object passing, as opposed to bash's string passing.

That being said I now primarily use iPython for advanced shell tasks as I can leverage all of Python's libraries like JSON or YAML.


> That being said I now primarily use iPython for advanced shell tasks as I can leverage all of Python's libraries like JSON or YAML.

TBH they're both excellent choices. nushell looks really good too. The point is not to scrape text: rather than write a bash script that can handle JSON properly, write a pwsh/python/nushell script that handles everything properly (i.e., by selecting fields rather than scraping text).


Yeah, let's wait 10 years for nushell to catch up.


Terrific, thank you!


You'd install it via apt obviously. For Ubuntu:

    curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
    sudo sh -c 'echo "deb [arch=amd64] https://packages.microsoft.com/repos/microsoft-debian-jessie-prod jessie main" > /etc/apt/sources.list.d/microsoft.list'
    sudo apt-get update
    sudo apt-get install -y powershell
I'm a node developer, here's what I used to find packages that had a cracked dependency recently:

    $results = @{} 
    ls -recurse -filter "eslint-scope" | foreach { 
      $file = "${PSItem}\package.json" 
      $version = cat $file | convertfrom-json | select -ExpandProperty version 
      $results.Add($file,$version) } 
    echo $results | format-list
PS. pwsh should really combine reading a file, determining its type, and parsing it, into one command. Like:

    open $file | select -ExpandProperty version 
I know I can make this, it just should be in the stdlib.
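The wished-for behaviour is easy to sketch, here in Python rather than PowerShell. `open_structured` is a hypothetical helper, not an existing command in either ecosystem, and it guesses the format from the file extension only; the `package.json` content is invented for the demo.

```python
import csv
import json
import tempfile
from pathlib import Path


def open_structured(path):
    """Hypothetical helper: read a file and parse it based on its extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    with path.open(newline="", encoding="utf-8") as handle:
        if suffix == ".json":
            return json.load(handle)
        if suffix == ".csv":
            return list(csv.DictReader(handle))
    raise ValueError(f"unsupported file type: {suffix!r}")


# Demo with a throwaway package.json (contents invented for illustration).
with tempfile.TemporaryDirectory() as directory:
    package_file = Path(directory) / "package.json"
    package_file.write_text('{"name": "demo", "version": "1.2.3"}', encoding="utf-8")
    version = open_structured(package_file)["version"]

print(version)
```

A real version would probably sniff the content as well, since extensions are not always trustworthy.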


As someone fairly ignorant of Powershell, how well does it interact with the rest of the Unix-minded ecosystem? jq is nice because it's plainly compatible with being piped into another command like xargs. I don't want a better jq if it means relearning and remastering everything else along with it.


The same as any other CLI app.

> I don't want a better jq if it means relearning and remastering everything else along with it.

As I said, better jq may be equal to:

pwsh -Command "(cat test.json | ConvertFrom-Json).Whatever.Property.Or.Normal.Filter"



