An Introduction to JQ (earthly.dev)
420 points by sidcool on Aug 25, 2021 | 71 comments

One small but really convenient tip missing from the article is object shortcuts - these two commands are the same:

    curl -s https://api.github.com/repos/stedolan/jq/issues?per_page=2 | jq '[ .[] | { title: .title, number: .number } ]'
    curl -s https://api.github.com/repos/stedolan/jq/issues?per_page=2 | jq '[ .[] | { title, number } ]'
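The shorthand is plain key projection. As a rough Python analogue (a sketch assuming the issues are already parsed into a list of dicts; the helper `pick` and the sample data are hypothetical), `dict.get` plays the role of jq's `.title`, returning `None` (jq's `null`) for a missing key instead of failing:

```python
import json


def pick(obj: dict, *keys: str) -> dict:
    """Rough analogue of jq's {title, number}: copy only the named keys.

    Missing keys become None, mirroring jq's null for absent fields.
    """
    return {key: obj.get(key) for key in keys}


# Hypothetical stand-in for the GitHub API response.
issues = json.loads('[{"title": "Bug", "number": 1, "state": "open"},'
                    ' {"title": "Docs", "number": 2, "state": "closed"}]')

# Equivalent of: jq '[ .[] | { title, number } ]'
summary = [pick(issue, "title", "number") for issue in issues]
print(summary)
```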

Not to distract from your point, but in this specific instance I'd probably use

  map({title, number})

instead of

  [ .[] | <something> ]

Side note, but one of the hardest things about jq is realizing that all the operators operate on streams of values.

For example, map() operates on a stream of arrays. That kind of doubling of iteration can be confusing at first.

The first is nice because it highlights being able to remap the value to different key.

For anyone wrangling with data like I do: I use https://github.com/TomWright/dasel quite a lot (it supports various formats and conversion between them). Also csvkit https://csvkit.readthedocs.io for CSV to SQL, and of course pandas for analysis.

As long as we're recommending data wrangling tools, I'm a fan of visidata: https://www.visidata.org/

Thank you! More data wrangling tips are much appreciated!

If you prefer a visual drag and drop approach to command line take a look at: https://www.easydatatransform.com/

> csvkit


I always look for "jq for csv" but perhaps I should just convert to json, then back to csv with jq

I love these - I use jq daily and am always happy to read about it from another angle.

The author probably has internalized more of the manual than they realize, and maybe improved at least one explanation; FTA:

> map(...) lets you unwrap an array, apply a filter and then rewrap the results back into an array. You can think of it as a shorthand for [ .[] | ... ] and it comes up quite a bit in my experience, so it’s worth committing to memory.

From https://stedolan.github.io/jq/manual/#map(x),map_values(x) :

> map(x) is equivalent to [.[] | x]. In fact, this is how it's defined. Similarly, map_values(x) is defined as .[] |= x.

Note here the casual introduction of the update assignment operator, '|='
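The difference between the two is easiest to see on an object: `map` always rebuilds an array (because `.[]` yields only the values), whereas `map_values` updates each value in place and keeps the keys. The sketch below models this in Python; the names `jq_map`, `jq_map_values` and `add_one` are hypothetical, and it only covers filters that produce exactly one output per input (jq's behavior when the filter emits zero or several values is not modeled).

```python
from typing import Any, Callable


def jq_map(value: Any, f: Callable[[Any], Any]) -> list:
    """map(f) == [.[] | f]: `.[]` yields array elements or object *values*,
    so the result is always an array, even for an object input."""
    items = value.values() if isinstance(value, dict) else value
    return [f(v) for v in items]


def jq_map_values(value: Any, f: Callable[[Any], Any]) -> Any:
    """map_values(f) == .[] |= f: each element is updated in place,
    so an object keeps its keys and an array stays an array."""
    if isinstance(value, dict):
        return {k: f(v) for k, v in value.items()}
    return [f(v) for v in value]


def add_one(n: int) -> int:
    return n + 1


print(jq_map({"a": 1, "b": 2}, add_one))         # [2, 3]
print(jq_map_values({"a": 1, "b": 2}, add_one))  # {'a': 2, 'b': 3}
```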

What does [.[] | x] do?

Not a jq expert, but my understanding is:

`.[]` takes a list and turns it into a sequence consisting of each element of that list.

`| x` applies the filter `x` to that sequence, turning it into a new sequence.

The outermost `[ ]` builds a list from that new sequence.

Take each sub-item in `.` (the current item being processed), apply the function `x` to each sub-item, and collect all the result values from `x` into an array.
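One way to make this concrete is to treat each jq filter as a function from one value to a stream of outputs, which Python generators model reasonably well. This is an illustrative sketch, not how jq is implemented; all helper names (`each`, `pipe`, `collect`, `select`, `twice`, `jq_map`) are hypothetical stand-ins for `.[]`, `|`, `[...]`, `select`, `(., .)` and `map`. Because `|` concatenates whatever each step emits, a filter can drop elements (like `select`) or multiply them:

```python
from typing import Callable, Iterator

# A jq filter takes one input value and produces a *stream* of zero or more outputs.
Filter = Callable[[object], Iterator[object]]


def each(value) -> Iterator[object]:
    """`.[]`: emit each element of an array, or each value of an object."""
    yield from (value.values() if isinstance(value, dict) else value)


def pipe(f: Filter, g: Filter) -> Filter:
    """`f | g`: run g on every output of f and concatenate the resulting streams."""
    def run(value):
        for out in f(value):
            yield from g(out)
    return run


def collect(f: Filter) -> Callable[[object], list]:
    """`[ f ]`: gather every output of f into a single array."""
    return lambda value: list(f(value))


def select(pred: Callable[[object], bool]) -> Filter:
    """`select(p)`: pass the input through unchanged if p holds, else emit nothing."""
    def run(value):
        if pred(value):
            yield value
    return run


def twice(value) -> Iterator[object]:
    """`(., .)`: a filter that emits its input two times."""
    yield value
    yield value


def jq_map(x: Filter) -> Callable[[object], list]:
    """map(x), defined in jq as [.[] | x]."""
    return collect(pipe(each, x))


print(jq_map(select(lambda n: n % 2 == 0))([1, 2, 3, 4]))  # [2, 4]
print(jq_map(twice)([1, 2]))                               # [1, 1, 2, 2]
```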


This is the most jq-ish reply xD

This is a terrific introduction to a tool that I use often enough to pretend to be familiar with, but honestly I just grep my bash history every time I need to use it.

But I'd really like to see a discussion about the tool that the host website promotes: Earthly. It's a build system, which is a family of tools that I've always hated, but it seems to be pretty decent. Is anybody using it?

I'm off to find HN threads on Earthly.

This is an amazing article. I think the two things that set it apart are (a) showing the process of building a complex command from scratch step by step, and (b) actually using a real API endpoint to start with.

One convenient tip I discovered after reading this article and trying out the command is that

    jq 'map({ title: .title, number: .number, labels: .labels | length }) | map(select(.labels > 0))'
can be refactored into

    jq 'map({ title: .title, number: .number, labels: .labels | length } | select(.labels > 0))'
or in other words, map(filter1) | map(filter2) == map(filter1 | filter2).
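The identity follows from the definition `map(f) = [.[] | f]`: collecting the first map's results into an array and then immediately iterating it with `.[]` re-emits the same stream, so the two maps can be fused into one. A Python sketch of the same pipeline, using made-up issue data and hypothetical helper names (`summarize`, `has_labels`), shows that the two-pass and one-pass versions agree:

```python
# Made-up issues standing in for the GitHub API response.
issues = [
    {"title": "A", "number": 1, "labels": ["bug"]},
    {"title": "B", "number": 2, "labels": []},
    {"title": "C", "number": 3, "labels": ["docs", "help wanted"]},
]


def summarize(issue: dict) -> dict:
    """filter1: { title: .title, number: .number, labels: .labels | length }"""
    return {"title": issue["title"], "number": issue["number"],
            "labels": len(issue["labels"])}


def has_labels(summary: dict) -> bool:
    """Predicate of filter2: select(.labels > 0)"""
    return summary["labels"] > 0


# map(filter1) | map(filter2): build the intermediate array, then filter it.
intermediate = [summarize(i) for i in issues]
two_pass = [s for s in intermediate if has_labels(s)]

# map(filter1 | filter2): each element flows through both steps in one pass.
one_pass = []
for issue in issues:
    s = summarize(issue)
    if has_labels(s):
        one_pass.append(s)

assert two_pass == one_pass
print(one_pass)
```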

jq is unsurprisingly Turing complete, so I wrote a Whitespace interpreter[0] in jq. It is able to handle real-time I/O by requesting lines on-demand from stdin, which is the main input source, with `input` and outputting strings in a stream.

With a relatively large jq program like that, it is critical that the main recursive loop run efficiently, so it's annoying that there's no way to detect whether tail call optimization was applied, other than benchmarking. It would also be nice if object values were lazily evaluated so that it would be possible to create ad hoc switches.

[0]: https://github.com/andrewarchi/wsjq

The jq documentation is good, and thorough, but we see a lot of articles like these, and personally I think I would remember the particulars better (and maybe guess them without even referring to the reference manual) if the author wrote a narrative account of how they conceived of jq and how the language emerged from that conception. The tutorial is a great start, but something a little more fleshed out and introspective would help a lot, in my opinion.

You mean the jq creator should write up about their linguistic choices, or the author of this article?

The jq creator.

As much as I love jq and as much as it has helped me, it always has an „awk“ vibe for me, and awk is one of the tools I use less despite its capability, for the same reason: I tend to forget all the special syntax over the years.

Compare that with „gron“ ( https://github.com/Deitar13/gron ), which is arguably less powerful and clunkier, but lets me compose with and tie into the other Unix tools much better.

It‘s mainly for that reason I use it more than jq these days for ad-hoc analysis.

AWK and JQ definitely suffer from "power-creep" as shell tools. IMO they are unreasonably powerful; full-fledged scripting languages masquerading as simple text-processing tools.

Part of the blame is definitely on the fact that the "idiomatic" way to filter columns in a shell pipeline is to invoke AWK: `awk '{print $3,$5}'`. Similarly for JQ. Virtually every sysadmin and programmer gets introduced to these languages as Unixy tools when in fact they are antithetical to the Unix philosophy.

The result is ending up with overcomplicated "production" pipelines (curl|sed|awk|jq) when you really could be writing one far more coherent, maintainable, scalable C, Go, Python, etc. program with their standard libraries.

This. I stopped writing shell scripts for exactly this reason. When something is more complex than launching a couple of commands and involves, for example, JSON processing, I write the script in Python.

I realized it after wasting multiple hours at work debugging scripts that used jq, AWK, or similar tools. Most of the time the problem was solved by randomly quoting things, until I discovered an edge case I hadn't considered and the program broke again.

Now when I have that sort of problem I don't even bother trying to fix it; I just rewrite the whole script in Python (they are usually small scripts, so it's a matter of 15 minutes most of the time). And writing new scripts in bash is banned (except in particular cases).

There is also portability: most people assume that everyone can install these tools because they have them on their own system. It's not that simple, and when putting things into production or CI, they break because jq is missing. And good luck with Windows, by the way.

Good writeup!

One nit: ‘jq -r’ seems more fundamental than a sidenote, especially considering it’s a CLI tool. That could just be how I use it, though (as glue between JSON and bash).
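The reason `-r` matters for shell glue is that jq normally prints every result as JSON, so a string comes out quoted (and escaped). A minimal Python sketch of the difference, assuming compact output as with `-c` (the function `jq_output` is hypothetical):

```python
import json


def jq_output(value, raw: bool = False) -> str:
    """Approximate how jq prints one result in compact mode (-c).

    With -r, a top-level *string* is written without JSON quoting or
    escaping; any other value is still printed as JSON.
    """
    if raw and isinstance(value, str):
        return value
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)


print(jq_output("Fix the docs"))            # "Fix the docs"  (quotes included)
print(jq_output("Fix the docs", raw=True))  # Fix the docs
```

Without `-r`, `title=$(jq '.title' file.json)` would store the surrounding double quotes in the shell variable as well.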

Another good tool in that family is ‘jtbl’. It provides table output, which is useful for cut, awk, sed and column.

`jtbl` seems nice! I'd add:

* https://github.com/tomnomnom/gron - make json greppable

* https://sr.ht/~gpanders/ijq/ - interactive jq

The interactive ijq is nice! It’s useful and the ui makes it a good training tool. Thank you!

You might know this, but feeding arrays to '|@tsv' will provide output ready to pipe to these as well.
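Note that `@tsv` also escapes characters that would otherwise break the columns. A rough Python approximation of the jq 1.6 rules as we understand them (tab, newline, carriage return and backslash become `\t`, `\n`, `\r` and `\\`; `null` becomes an empty field), with hypothetical helper names `tsv_field` and `tsv_row`, and no claim to match jq's float formatting:

```python
import json


def tsv_field(value) -> str:
    """Render one array element roughly the way jq's @tsv does."""
    if value is None:
        return ""  # null becomes an empty field
    if isinstance(value, str):
        # Escape the backslash first so the escapes added below are not doubled.
        return (value.replace("\\", "\\\\")
                     .replace("\t", "\\t")
                     .replace("\n", "\\n")
                     .replace("\r", "\\r"))
    if isinstance(value, (bool, int, float)):
        # true/false and integers match jq; float formatting may differ.
        return json.dumps(value)
    raise TypeError("@tsv only accepts scalar values inside the array")


def tsv_row(values: list) -> str:
    """One output line: fields joined by a single tab character."""
    return "\t".join(tsv_field(v) for v in values)


print(tsv_row(["title\twith tab", 42, None, True]))
```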

Something I just learned about the other day was jid [0] to help query the json keys

[0] https://github.com/fiatjaf/jiq

You can do the same thing using fzf preview mode

Can you elaborate?

I guess they mean something like this, which I find useful for quickly iterating on jq filters:

  fzf --print-query --preview-window wrap --no-clear --preview 'cat file.json | jq {q}'
You can also pipe curl output into the above, but that means a lot of (slow) requests, so I have this snippet saved for running fzf --preview with jq on something from a web service:

  curl -L https://datahub.io/core/covid-19/r/worldwide-aggregate.json > /tmp/foo && echo '' | fzf --print-query --preview-window wrap --no-clear --preview 'cat /tmp/foo | jq {q}'
Now if you type a jq filter at the fzf prompt, you get a live preview of the result as you type.

I did not know about `jiq`, so thanks for that tip, it looks like it does the same or something similar, but without the extra cruft of storing a temp file

It can be just `jq {q} file.json`. No need for `cat`.

`echo '' | fzf --print-query --preview "cat file.json | jq {q}"`


For anyone struggling with jq's syntax, I recommend taking a look at fx [1] which uses JavaScript as the query language.

The author of the tool has also written a guide [2] and recorded a screencast [3] about the tool.

[1] https://github.com/antonmedv/fx

[2] https://medium.com/@antonmedv/discover-how-to-use-fx-effecti...

[3] https://youtu.be/ktfeRxKog98

> However, some things never stick in my head, nor my fingers, and I have to google them every time. jq is one of these.

However powerful jq may be, the comment above, which exactly matches my experience with jq, sums up the biggest hurdle with this tool quite nicely.

99% of my jq use looks like this:

     cat file.json | jq . | <regular list of unix filters>
and the remaining 1% is straight cut and paste from google / stackoverflow that may or may not end up doing what I want.

jq's DSL is inscrutable

You may enjoy one of the various "gron" commands, which format JSON as a list of lines, one field per line, making it very easily greppable, seddable, and awkable.
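The gron idea is simple enough to sketch: walk the document and print one assignment per node, with its full path on the left. The Python below approximates the style of gron's output (a `json` root, dotted keys, bracketed indices, keys in sorted order); it is not gron's implementation, and the name `gron_lines` and its rules for quoting unusual keys are assumptions:

```python
import json
import re

# Keys that can be written as .key; anything else is written as ["key"].
IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def gron_lines(value, path: str = "json"):
    """Yield gron-style assignment lines, one per node, keys in sorted order."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key in sorted(value):
            child = f"{path}.{key}" if IDENT.fullmatch(key) else f"{path}[{json.dumps(key)}]"
            yield from gron_lines(value[key], child)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, item in enumerate(value):
            yield from gron_lines(item, f"{path}[{index}]")
    else:
        yield f"{path} = {json.dumps(value)};"


for line in gron_lines({"name": "jq", "tags": ["cli", "json"]}):
    print(line)
```

Each line stands alone, so `grep tags` on the output returns the complete path to every matching field.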

>you may enjoy one of the various "gron" commands

This is the second time someone recommended this on HN, so this time, I did go and have a look.

Really nice indeed, thanks for the tip.

I have to use jq once in a blue moon, which means I have to rely on trial and error, because I have forgotten everything I learned last time.

If you work with JSON and CSV data regularly, I'd also recommend checking out Miller: https://github.com/johnkerl/miller

From the project's description: Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

A very useful function in jq is "join", which I use a lot to cook the final shape of the data (often together with fzf/dmenu).

Here's a simple way to list and browse github issues of a given user/repo:

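    #!/usr/bin/env sh
    # Usage: pass the user/repo as the only argument

    browse_url() {
      firefox "https://github.com/$1/issues/$2"
    }

    issue=$(curl -s "https://api.github.com/repos/$1/issues" |
          jq -r 'map([(.number|tostring), .title] | join(" | ")) | join("\n")' |
          dmenu -i -l 10 |
          awk '{print $1}')

    # Only open the browser if an issue was actually picked in dmenu
    [ -n "$issue" ] && browse_url "$1" "$issue"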

Although the `| join("\n")` part could be done more idiomatically with just `[]`, sometimes the manual way is still clearer to me:

    map([(.number|tostring), .title] | join(" | "))[]

Personally I find it clearer in many cases to use a format string instead. ie. instead of writing:

    [(.number|tostring), .title] | join(" | ")
I would write:

    "\(.number) | \(.title)"
which IMO is more readable in cases where you have specific values you want to put in specific places, as opposed to a list of unknown length which you want joined (eg. I would still use join("\n") in your example).
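For comparison, the same two styles in Python, on made-up sample issues: joining a fixed two-element list versus interpolating the fields into a format string (jq's `\(...)` corresponds roughly to an f-string here), with a final join over the list of unknown length:

```python
# Made-up sample issues standing in for the GitHub API response.
issues = [
    {"number": 42, "title": "Add @tsv docs"},
    {"number": 7, "title": "Crash on empty input"},
]

# jq: [(.number|tostring), .title] | join(" | ")
joined = [" | ".join([str(i["number"]), i["title"]]) for i in issues]

# jq: "\(.number) | \(.title)"
interpolated = [f"{i['number']} | {i['title']}" for i in issues]

assert joined == interpolated

# jq: ... | join("\n")  (a join still fits a list of unknown length)
print("\n".join(interpolated))
```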


I didn't know that one could build an arbitrary string like that inside a map.

Thanks a lot for that, I agree it looks better!

My OpenAPI spec is 60,000 lines of machine generated goodness. JQ is great for doing non-trivial but useful things. Eg: Which endpoints are available if you have a specific OAuth Scope?
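As a sketch of that kind of query, here it is in Python, assuming an OpenAPI 3 document already parsed into dicts: an operation's own `security` list overrides the top-level one, and path-level keys such as `parameters` are skipped. The function name `endpoints_for_scope` and the sample spec are hypothetical, and "mentions the scope" is a simplification, since OpenAPI security requirements can combine several schemes:

```python
from typing import List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def endpoints_for_scope(spec: dict, scope: str) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs whose security requirements mention `scope`."""
    default_security = spec.get("security", [])
    matches = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # path-level keys such as "parameters" or "summary"
            # An operation-level `security` list replaces the top-level default.
            requirements = operation.get("security", default_security)
            scopes = {s for req in requirements for names in req.values() for s in names}
            if scope in scopes:
                matches.append((method.upper(), path))
    return matches


spec = {
    "security": [{"oauth": ["read"]}],
    "paths": {
        "/pets": {"get": {}, "post": {"security": [{"oauth": ["write"]}]}},
    },
}
print(endpoints_for_scope(spec, "write"))  # [('POST', '/pets')]
```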

The most useful jq CLI flag is -f: take the jq script from a file.

The most useful tutorial for learning to manipulate your OpenAPI spec: https://apihandyman.io/api-toolbox-jq-and-openapi-part-1-usi...

Thx. I hate typing on my mobile.

Big fan of JQ. I like it more than the traditional UNIX suite of text manipulation commands, because I get closer to "querying" rather than just filtering. It has really made me rethink where I want "interacting with a computer" to go in the future -- less typing commands, more querying stuff.

I have a few utilities involving JQ that I wrote.

For structured logs, I have jlog. Pipe JSON structured logs into it, and it pretty-prints the logs. For example, time zones are converted to your local time, if you choose; or you can make the timestamps relative to each other, or now. It includes jq so that you can select relevant log lines, delete spammy fields, join fields together, etc. Basically, every time you run it, you get the logs YOU want to look at. https://github.com/jrockway/json-logs. Not to oversell it, but this is one of the few pieces of software I've written that passes the toothbrush test -- I use it twice a day, every day. All the documentation is in --help; I should really paste that into the Github readme.

I am also a big fan of using JQ on Kubernetes objects. I know what I'm looking for, and it's often not in the default table view that kubectl prints. I integrated JQ into a kubectl extension, to save you "-o json | jq" and having to pick apart the v1.List that kubectl marshals objects into. https://github.com/jrockway/kubectl-jq. That one actually has documentation, but there is a fatal flaw: it doesn't integrate with kubectl tab completion (a limitation of k8s.io/cli-runtime), so it's not too good unless you already have a target in mind, or you're targeting everything of a particular resource type.

This afternoon I wanted to see the image tag of every pod that wasn't terminated (some old Job runs exist in the namespace), and that's easy to do with JQ: `kubectl jq pods 'select(.status.containerStatuses[].state.terminated == null) | .spec.containers[].image'`. I have no idea how you'd do such a thing without JQ; probably just `kubectl describe pods | grep something` and do the filtering in your head.

(The recipes in the kubectl-jq documentation are pretty useful. One time I had a Kubernetes secret that had a key set to a (base64-encoded) JSON file containing a base64-encoded piece of data I wanted. Easy to fix with jq: `.data.THING | @base64d | fromjson | .actualValue | @base64d`.)
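One subtlety in that filter: `.status.containerStatuses[]` yields one boolean per container status, and `select` passes the pod through once per `true`, so a pod with several live containers has its images printed several times. A Python sketch that mirrors this behavior over already-parsed pod objects (the generator `running_images` and the sample pods are hypothetical; unlike jq, which errors on a missing `containerStatuses`, it treats a missing list as empty):

```python
from typing import Iterator


def running_images(pods: list) -> Iterator[str]:
    """Mirror `select(.status.containerStatuses[].state.terminated == null)
    | .spec.containers[].image` over a list of parsed pod objects."""
    for pod in pods:
        for status in pod.get("status", {}).get("containerStatuses", []):
            # A missing `terminated` key counts as null, just as in jq.
            if status.get("state", {}).get("terminated") is None:
                # select emits the pod once per matching status, so every
                # container image is yielded again for each matching status.
                for container in pod["spec"]["containers"]:
                    yield container["image"]


pods = [
    {"spec": {"containers": [{"image": "app:1.2"}]},
     "status": {"containerStatuses": [{"state": {"running": {}}}]}},
    {"spec": {"containers": [{"image": "migrate:0.9"}]},
     "status": {"containerStatuses": [{"state": {"terminated": {"exitCode": 0}}}]}},
]
print(list(running_images(pods)))  # ['app:1.2']
```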

JQ is something I definitely can't live without. But I will admit to sometimes preprocessing the input with grep, `select(.key|test("regex"))` is awfully verbose compared to "grep regex" ;)

Very useful tool indeed. Worth investing some time in if you like doing data processing on the command line.

Stuff I do with it:

- prepare json request bodies for curl commands by constructing json objects using environment variables

- grab content from a deeply nested json structure for usage in a script

- extract csv from json

- pretty print JSON or NDJSON output: curl ... | jq -C '' | less -r. I actually have an alias set up for that.
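On the first point, jq itself can build the body safely, e.g. `jq -n --arg name "$NAME" '{name: $name}'`, which avoids hand-quoting JSON in bash. The same idea as a Python sketch, where the helper `request_body` and the variable name `DEMO_USER` are hypothetical:

```python
import json
import os


def request_body(**env_vars: str) -> str:
    """Build a JSON object whose values come from environment variables.

    Each keyword maps an output key to the name of the variable supplying it;
    json.dumps does the quoting and escaping that hand-built bash strings get wrong.
    """
    return json.dumps({key: os.environ[var] for key, var in env_vars.items()})


os.environ["DEMO_USER"] = 'ann "the admin"'
print(request_body(user="DEMO_USER"))  # {"user": "ann \"the admin\""}
```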

The syntax is a bit hard to deal with. I find myself copy pasting from stack overflow a lot when I know it can do a particular thing but just can't figure out how to do it.

If there's an API for working with data that you know well already, it's somewhat pointless to learn the (quite esoteric) jq one.

Just pipe curl to node (https://github.com/jareware/howto/blob/master/Replacing%20jq...) or Python or Ruby or whatever you already know!

Or use that language’s native HTTP client and cut out the shell middleman.

jq is much more terse. When I’m working in bash, I much prefer to write `kubectl get secret foo -o json | jq -r '.data | map_values(@base64d)'` rather than the equivalent Python.
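For comparison, a rough Python version of that one-liner, assuming the Secret has already been parsed from `kubectl get secret foo -o json` and that its values are UTF-8 text (the function `decode_secret_data` is hypothetical):

```python
import base64


def decode_secret_data(secret: dict) -> dict:
    """Rough equivalent of: jq '.data | map_values(@base64d)'

    Decodes every base64 value under .data and keeps its key.
    """
    return {key: base64.b64decode(value).decode("utf-8")
            for key, value in secret.get("data", {}).items()}


secret = {"kind": "Secret", "data": {"user": "YWRtaW4="}}
print(decode_secret_data(secret))  # {'user': 'admin'}
```

It works, but the jq version fits on one line of a shell session.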

JQ is great once you get the hang of it. Some time back I had to write some simple tests on JSON output and I wrote a couple of helpers if anyone else is interested [0].

[0] https://gist.github.com/Checksum/72d927471c76c76c46418b3ee88...

I really want to like JQ. I know the tool is super powerful. Unfortunately I find the syntax obtuse and very hard to remember, especially when I only use it on rare occasions.

Every time I want to use it, I end up searching for examples of the syntax and not quite getting it right.

For exploratory purposes one can also use `jq 'keys'`

  cat file.json | jq 'keys' | grep ependencies

This lists the keys, which is sometimes really helpful with a big JSON file whose schema you don't know, when you have a hunch that some key should be there.

Should you wish, jq can do the search too, without piping through another tool:

  jq 'keys | map(select(test("ependencies")))' file.json
map(select()) is a pretty useful construct to loop over stuff and pick out the interesting parts.
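A Python sketch of the same search, for readers who want to check the semantics: jq's `keys` returns the keys sorted, and `test` is an unanchored regex match. The helper `matching_keys` and the sample object are hypothetical, and Python's `re` differs from jq's Oniguruma engine in some details:

```python
import re


def matching_keys(obj: dict, pattern: str) -> list:
    """Like jq 'keys | map(select(test(PATTERN)))': sorted keys that match."""
    return [key for key in sorted(obj) if re.search(pattern, key)]


package = {"name": "demo", "dependencies": {}, "devDependencies": {}, "scripts": {}}
print(matching_keys(package, "ependencies"))  # ['dependencies', 'devDependencies']
```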

I don't work with data, so my most common usage for jq is checking package.json properties from the command line:

  cat node_modules/some_lib/package.json | jq '.version'

I have a jq tutorial where you can click on each step of the pipeline to view what it is doing [1]; it was even featured here on HN a while ago [2].

[1] https://mosermichael.github.io/jq-illustrated/dir/content.ht...

[2] https://news.ycombinator.com/item?id=22626080

Much like in JavaScript, you can also concatenate strings with + as part of your output, e.g. '.Name + "(" + .Email + ")"'

Most of the APIs in a normal workflow are hidden behind a token. Is there a way to smooth the workflow for putting tokens into curl commands?

If the token is passed as a url parameter in a get request, -G can be useful as it forces the -d key=value switches to append to the url. So that:

  curl -G 'https://api.example.com' \
    -d foo=bar \
    -d baz=whee
Is the same as

  curl 'https://api.example.com?foo=bar&baz=whee'
That can make all the quoting hell a bit easier, as you're doing it one param at a time.
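The same query-string building is available in Python's standard library. A minimal sketch (the helper `with_query` is hypothetical); note that `urlencode` percent-encodes values, whereas curl's `-d` sends them as given unless you use `--data-urlencode`:

```python
from urllib.parse import urlencode


def with_query(base_url: str, params: dict) -> str:
    """Append params as a query string, like curl -G does with its -d pairs."""
    return f"{base_url}?{urlencode(params)}"


print(with_query("https://api.example.com", {"foo": "bar", "baz": "whee"}))
# https://api.example.com?foo=bar&baz=whee
```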

Assuming you pass your token in a header, put your token in an environment variable and add it to your request like this:

  curl -H "Authorization: Bearer $TOKEN" ...
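The equivalent in Python's standard library is to attach the header when building the request, reading the token from the environment so it never appears in the script. A minimal sketch that builds but does not send the request (the function `authorized_request` and the `TOKEN` variable name are assumptions); pass the result to `urllib.request.urlopen` to actually send it:

```python
import os
import urllib.request


def authorized_request(url: str, token_var: str = "TOKEN") -> urllib.request.Request:
    """Build (but do not send) a GET request carrying a bearer token from the environment."""
    token = os.environ[token_var]
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```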


Excellent introduction!

It can really be overwhelming when you realize they control all of the major institutions in our country.

Has anyone figured out a workflow to nicely integrate jq filters into a postman type application?

Been looking for such a write up. Kudos to the author.

alias json_query=jq

no need to memorize

That title though.

dirty mind

Can anyone explain why I'm seeing JQ all over the web?! Yesterday a vendor gave a presentation saying that we should use it when using their command line tools, what's going on?

Frequency illusion

> The frequency illusion is that once something has been noticed then every instance of that thing is noticed, leading to the belief it has a high frequency of occurrence


What is the cognitive bias called in which you see an internet post talking about a personal experience that is clearly perfectly plausible/likely, but instead choose to down vote them and assert the said experience is fraudulent by citing a wikipedia article that is barely incidentally related?

This is clearly a flavor-of-the-month tool that has been getting a lot of coverage on these sorts of sites lately; why impugn the guy's mental function?
