Mastering Jq: Part 1 (codefaster.substack.com)
205 points by code-faster on June 30, 2020 | 72 comments



By far the most useful jq operator I've used is recursive descent: `..`. It can take a positively heinous json blob and pick out the attributes/keys you care about. For example:

Yikes:

> wget -O - -q 'https://reddit.com/r/unixporn/new.json' | jq

I just want the links:

> wget -O - -q 'https://reddit.com/r/unixporn/new.json' | jq '..|.permalink? | select(.)'
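To see what each stage is doing, here's the same pattern on a small made-up blob (my example, not the reddit data):

    echo '{"a": {"permalink": "/r/x/1", "b": {"permalink": "/r/x/2"}}, "c": 1}' \
      | jq '.. | .permalink? | select(.)'
    # "/r/x/1"
    # "/r/x/2"

`..` walks every value in the tree, `.permalink?` reads that key where it can (the `?` silently skips strings and numbers it can't index), and `select(.)` drops the nulls from objects that don't have the key.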


I've been casually using jq for years, yet you just managed to put three things I didn't know about it and will use extensively from now on in a single one-liner. Thank you!


Thank you. This will be added to my 'How To JQ' list, since I'm convinced now that I'll never fully understand its syntax enough to master it outright.


Oh, would you mind sharing your list please?


I know I might be a dissenting opinion here, but I can never wrap my head around `jq`. I can manage `jq .`, `jq .foo` and `jq -r`, but beyond that, the DSL is just opaque to me.

And every time I try to learn, I get lost in a maze of a manpage:

https://manpages.debian.org/jq

I mean there are useful tips in there, but I rarely find what I need. I usually find it simpler to write a Python script (which ships with `json` because it's "batteries included") and operate on lists and dicts the normal way...

It's longer, but at least I don't need to learn a new programming language (if we can call jq that...)


> the DSL is just opaque to me.

May I suggest "gron"? As compared to jq, it is simpler (good!), has fewer features (good!), has no DSL (good!), and is less powerful (good!, in many cases). It is a tool that "expands" json into standalone lines, and you can further process them using the standard tools grep, sed, cut, sort, awk...

EDIT: for example, if you have this json:

    {
      "outdir" : "out3",
      "data" : [
        {"img" : "img_01.jpg"},
        {"img" : "img_02.jpg"}
        ],
      "roi" : {
          "x" : 150,
          "y" : 150,
          "w" : 700,
          "h" : 700
      },
      "margin_h": 20,
      "margin_v": 5,
      "tile_size" : 300,
      "resolution": 0.5
    }
running it through gron produces this

    outdir = "out3"
    data[0].img = "img_01.jpg"
    data[1].img = "img_02.jpg"
    roi.x = 150
    roi.y = 150
    roi.w = 700
    roi.h = 700
    margin_h = 20
    margin_v = 5
    tile_size = 300
    resolution = 0.5
which some people find really convenient to deal with.
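For instance (assuming that json is saved as config.json; the filename is mine), you can grep out just the image entries, and (if I remember the flag right) `--ungron` turns the surviving lines back into JSON:

    gron config.json | grep 'img'
    gron config.json | grep 'img' | gron --ungron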



This online tool also produces assignment statements from JSON: https://www.convertjson.com/json-path-list.htm (note: creator here)


> I know I might be a dissenting opinion here, but I can never wrap my head around `jq`.

I don't think this is uncommon. People generally reach for jq for simple path expression evaluation against JSON objects, and never get deeper into it than that. It seems like the kind of thing you could re-implement yourself in an afternoon. However, as soon as you start taking advantage of some of the more complex functionality--say, a program like `.entries[] | select(.size > 1024) | .name`--there's a disquieting feeling of "what the hell is actually happening here?"
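For instance, that kind of program on a made-up directory listing:

    echo '{"entries": [{"name": "a.log", "size": 100}, {"name": "b.log", "size": 4096}]}' \
      | jq '.entries[] | select(.size > 1024) | .name'
    # "b.log"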

I took some time a while back to really dig into how jq programs work, and was surprised to discover how deep and powerful jq really is, while being built out of some very simple fundamental building blocks. Most of the jq builtin functions are actually implemented in jq itself[1]; even more of them could be, but aren't for the sake of performance. And the power and self-consistency of jq made sense when I found out that the creator Stephen Dolan is an "actual" Computer Scientist (in the academic sense) with extensive programming language research experience[2].

The main obstacle I see for most people is the lack of an accessible introduction to how jq actually works when it comes to streams, function calls, generators, and backtracking. They're explained somewhat in the docs and on the project wiki[3][4][5], but in a fairly blunt way that assumes existing familiarity with terminology and concepts. It takes some effort to learn initially, but once you understand it, everything falls into place. I'm hopeful that subsequent installments in this "mastering" series can explain how the jq model works in an approachable way.

[1] https://github.com/stedolan/jq/blob/master/src/builtin.jq

[2] http://stedolan.net/research/

[3] https://stedolan.github.io/jq/manual/#Advancedfeatures

[4] https://github.com/stedolan/jq/wiki/Advanced-Topics

[5] https://github.com/stedolan/jq/wiki/Internals:-backtracking


Jq is really powerful but I think the bit it gets wrong is in array handling and filtering arrays. It is such a common thing that some syntactic sugar would be justified, but jq requires you to reach for map() and select() to do even the simplest stuff, and I can never remember if I should filter the `.array` or `.array[]`.
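For the record (and so I stop forgetting it), the two spellings side by side on a toy input:

    echo '{"array": [1, 5, 10]}' | jq -c '.array | map(select(. > 3))'    # [5,10]
    echo '{"array": [1, 5, 10]}' | jq -c '.array[] | select(. > 3)'       # 5 and 10, as a stream
    echo '{"array": [1, 5, 10]}' | jq -c '[.array[] | select(. > 3)]'     # [5,10] again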

Yq [1] syntax for that is much better, and yq can accept JSON files too.

[1] https://github.com/mikefarah/yq


Yeah, the fact that jq has both 'streams of values' and 'arrays of values', and that the two are somehow completely orthogonal and support different operations is a huge design wart, in my opinion.


Let's just say it: jq is an amazing tool, but the DSL is just bad.

This is not to shame the developers but to voice our issues with it, since this seems to be the prevailing sentiment; perhaps some brave soul will try and simplify it for human brains.


I cannot remember the syntax and have to search for it every time, but it's very powerful. It basically gives you a DSL to process json. I can do map, filter and even math.

Part of the reason it's popular is that it's available on both CentOS- and Debian-based systems; just use `yum` or `apt`. Most of the time you may be limited to what's available on a container, and `jq` may even be pre-installed by DevOps.

Second, it plays well with unix tools, pipes and friends.
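For example, the kind of map-plus-math one-liner I mean:

    echo '[1, 2, 3]' | jq 'map(. * 2) | add'
    # 12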


Yeah, I find jq similar to writing regexes: I always have to look up the syntax, only get it working after some confusion why my patterns aren't matching, then forget it all in a few days so have to relearn it again later.

Any other alternatives to Python or Node that are lightweight to add to a project? I'd rather use Node, but that's a massive dependency for a non-Node project.


You can use any language with json support, which is most languages.

In Python, you can have a jq-like experience with:

  import json
  import sys

  def f(data):
    # put code here: transform the parsed lists/dicts and return the result
    return data

  data = json.load(sys.stdin)
  output = f(data)
  print(json.dumps(output))
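Save that as, say, filter.py (the name is mine) and it drops into a pipeline the same place jq would:

  curl -s https://api.example.com/data.json | python3 filter.py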


Agree, and I also struggle with the docs. The examples are so trivial that they're useless if you're trying to learn how to use the functions in context.

I think the examples for any() are any(true, false) and any(true, true). Fair play but not super helpful if I’m new and trying to understand how this fits into a typical jq filter.
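Something like this would have helped me more: any/2 inside a select, on data I made up:

    echo '[{"name": "a", "tags": ["x"]}, {"name": "b", "tags": ["x", "y"]}]' \
      | jq '.[] | select(any(.tags[]; . == "y")) | .name'
    # "b"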


I agree, jq is very advanced when all most people want is simple json transformation and highlighting.

I've made jql[0] for that reason.

Check it out, it's less featureful, but much simpler with a uniform lispy syntax.

[0]: https://github.com/cube2222/jql


Is there anything in my tutorial that I can help clarify for you?


This resonates with me. I don't need more examples of basic selects... json is a tree structure and jq is a tool that allows us to select elements from a tree and transform them.

I would like to see a document that talks about jq from this more abstract perspective and what capabilities exist in this context. For example, can I select two arbitrary branches? Can I merge / move branches? Once we have a grasp of the tree manipulation capabilities made available through the DSL, we can look up the syntax. Can I select a node and apply filters to its children? There are so many ways in which a tree can be manipulated, and I personally need a higher-level description of what I can accomplish with jq.


> [...] but at least I don't need to learn a new programming language (if we can call jq that...)

jq has looping, branching, recursion (with tail recursion optimization), variable assignment, modules, I/O... Yeah, it's a programming language.


I can totally relate.

I think the only way would be a long web page with a massive list of examples to "solve problem X".

But the DSL is... byzantine, to say the least.


AFAICT, it's basically xpath applied to json. Which isn't helpful, but perhaps explains why it's so obtuse.


I know xpath pretty well, and cannot wrap my head around jq's DSL. It might have borrowed from elements of xpath's more obtuse expressions, but simple xpath seems much simpler (to me) than simple jq.


And I still have no idea why jq is slow af.


Seems pretty fast to me considering it is reading JSON. JSON is difficult to read quickly.


The man page might not be the best way to learn jq. Did you read the documentation on its website? It's pretty comprehensive, with straightforward examples.

Spending one hour or so actually learning jq is a good time investment.


One of the most useful ways I've found to learn/test queries in `jq` is by using it as a semi-REPL via fzf with `--preview`:

    echo '' | fzf --print-query --preview 'cat example.json | jq {q}'
Here's a rough example of it in action: https://asciinema.org/a/y4WGyqcz1wWdiyxDofdPXKtdC


Wow, thank you for sharing this. I had no idea you could do that with fzf.


This example is weirdly spelled:

    echo '{"k1": [{"k2": [9]}]}' | jq '.k1 | .[0] | .k2 | .[0]'
is equivalent to

    echo '{"k1": [{"k2": [9]}]}' | jq '.k1[0].k2[0]'

I kept waiting for the author to explain this, but they did not.


It is a weird way to write it; I never use this notation myself.

When teaching, though, I'd rather make it more visually explicit that jq commands are a sequence of filters. With the former notation, a | explicitly tells the reader that it's the end of one filter and the start of a new one; the latter is more implicit.


That's possible! I've never had to teach it, good point.


When I first started using jq, all of the examples I was following used the short-form. It took me a while to figure out how to use the functions and do some advanced transforms because of this. Since this article is Part 1 of a series I suspect the author will cover this in a later article.

From a teaching perspective it is probably best to start with the long-form.


good point, I hadn't considered that idea.


`jq` is up there with `nix` for me where I can sort of remember the DSL but always have to open the docs. I recently found https://jqplay.org/ and it's a huge help. I can crank scripts out much quicker now that I'm iterating with that.


https://jqterm.com is also a good one :)


jq is to cloud and "Infrastructure as Code" software what grep and awk are to unix software. It's the glue that lets you combine the pieces into a system. It's an odd language though, and not that easy to master. Tutorials like this one are helpful and needed... Thanks, Tyler!

However, there are quite a few typos in the examples... even the very first one is missing a quote and won't parse if cut-and-pasted as is.


I used to use shell scripts and jq a lot, but got tired of my scripts growing into monstrous mudballs of Bash hackery.

I do my scripting mostly in Go now and it's much easier to structure my code and grow it over time. I sometimes use Python, but Go is more flexible for what I do. I can compile an exe and drop it onto a host and run it, without worrying about VM and library versioning (or in Bash's case, making sure jq is installed and all the Unix utils have compatible versions).


How’s go’s batteries-included story? I’ve got nigh-on 50k lines of Python2 that IT is about to boot. My choice is to port it to ... something. I was going to do Python3 but I’ve always thought that a memory-management-free systems-like language would’ve always been better.


There is not a language as batteries-included as Go, that I've found. The standard lib includes almost everything, including an excellent HTTP server and client (even an HTTP proxy), a test framework, JSON/XML/CSV libs, SQL driver, CLI parser, a template library, crypto, etc. The "go" command includes tooling like a package manager, unit test runner, code formatter, static analyzer, race condition checker, code coverage, profiler, cross-compiler, etc.

I don't know what you mean by "memory-management-free" though, Go is still a garbage collected language.


I'd say Python has more batteries than Go.

Also I recently learned Go's CLI parser is pretty weak compared to argparse.


YMMV there, but for what I'm looking for, Go is usually more helpful than Python straight out of the box, specifically because you don't have to reach for a third-party library to make simple http requests, which is what I often end up using a scripting language for. I did just pick up argparse, though, and was very pleasantly surprised by how nice it is; Go's `flag` package in the stdlib is unfortunately not quite as nice and is one of the things I reach for a third-party library on. However, because it's pure Go, it still compiles down to a single binary which you can just scp around.


go's standard library and docs are high quality and well designed. You can take a look here to see if it fits your needs: https://golang.org/pkg/


Go's error-oriented style makes it very painful to write long sequences of fallible operations, particularly if your organization has test coverage requirements. I'll write servers in Go all day long, but scripting is the last thing I'd use it for.


There is some unfortunate boilerplate involved for scripting, but the benefits dramatically outweigh that minor annoyance, IMO. Also, as the script accretes complexity (which happens quite often for me), the disciplined error handling starts adding more value.


Explain the benefits. "if err != nil { return nil, err }" is implicit for every statement in languages with exceptions.


That's only one way to handle an error. The benefit is that it's explicit, not implicit, and it's a psychological nudge to ask yourself "oh yeah this could produce an error, how should I handle it?". I can scan code and see where all the errors occur and how that particular component intends for each to be handled.

And handling errors in concurrent code is consistent--still passed as values. Languages with implicit stack-unwinding exceptions have inconsistent ways to deal with the errors because, in concurrent programs, error handling doesn't end once the stack is unwound, because there are lots of concurrent stacks. You then usually catch the exception near the top of the stack and pass it to another stack...as a value.

So yes, there is some boilerplate involved in that. But as a codebase grows, I appreciate that the language encourages me and others to be intentional about how errors are handled, and also provides a consistent mechanism for handling them in concurrent code (concurrency is a big reason I use Go to begin with).


You're not actually supposed to care about errors. It's the C tradition, YOLO.

(bash doesn't care about errors either, by the way.)


Thanks for flagging that, they've been fixed and retested, thank you jbotz!


I don't know what it is, but anything more than moderately simple stuff gets super annoying for me with JQ.

After getting super frustrated with the documentation while trying to accomplish stuff that should have been straightforward, I created jsling so I could pipe output through node for JavaScript one-liners. For anything moderately complicated that doesn't need to be portable, I just use that.

EDIT: I should mention that by "moderately complicated" I mean stuff that starts to get into the realm of joins, correlated subqueries, and the like.


When I find myself reaching for jq, that’s a signal that maybe this bash script should be implemented in a language such as python or go



I myself am developing a very similar tool [1] with a somewhat saner (IMHO!) DSL. In fact this is a side project from a JS lib I developed long ago [2].

My approach uses very simple ideas and is heavily based on JS internally. This is simple once you understand it [3] (thus saner DSL) and gives the whole power of JS in your hands.

I've also prepared a basic comparison jq vs jsqry based on some examples in article [4].

It's worth noting that currently the CLI tool [1] is written in Java using Graal VM polyglot native image compilation. Thus HUGE executable size of 95 MB (sic!) because it bundles JS engine. I'm considering rewriting this to QuickJS by Fabrice Bellard [5]. This should make it MUCH smaller.

[1] https://github.com/jsqry/jsqry-cli

[2] https://github.com/jsqry/jsqry

[3] https://jsqry.github.io/#filtering

[4] https://gist.github.com/xonixx/d6066e83ec0773df248141440b18e...

[5] https://bellard.org/quickjs/


Just recently used xq to convert and clean up a bunch of weird xml files to JSON. It was such a breeze.

I love how nested lists inside objects can expand into a particular one with the .[] operator.

For example:

{ a: [{b: 5}, {b: 3}], c: 5} can be transformed into: [{ b: 5, c: 5}, {b: 3, c: 5}] using jq '{c: .c, b: .a[].b}'
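(Strictly speaking that filter emits a stream of two objects; wrap it in [...] to collect them into the array:)

    echo '{"a": [{"b": 5}, {"b": 3}], "c": 5}' | jq -c '{c: .c, b: .a[].b}'
    # {"c":5,"b":5}
    # {"c":5,"b":3}
    echo '{"a": [{"b": 5}, {"b": 3}], "c": 5}' | jq -c '[{c: .c, b: .a[].b}]'
    # [{"c":5,"b":5},{"c":5,"b":3}]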

For heavily nested XMLs I can get a nice flat output.


Nice! I've found XML transforms are one of the best applications of jq (yes, jq, not xq).

In a future post, I'll cover how to use jq not just for json and xml but any data format.


I wrote https://github.com/moreati/jq-filter for using jq as a filter in Ansible.


The author of jq is quite talented. I hope jq becomes a gateway drug.


Jq is quickly becoming one of the most useful tools in devops.


Mhm. Can you please add some details on how you are using it?


Generally we use it to manipulate JSON configuration files. So often we'll have a single config file across many microservices. We just use the same config for simplicity. Then we deploy many different services that have access to that same config. Some of the docker containers know how to natively access the config file. Others, especially things where we might be running one-off jobs, are often built from outside software that has no idea how to read our config file. Instead of programming a harness around it, we typically just use jq to read the config elements we need and save them as variables in a script. Then we can pass those variables as needed to whatever command line arguments.
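A stripped-down sketch of the pattern (the file name, keys, and job command here are all made up):

    # read settings out of the shared config into shell variables
    DB_HOST=$(jq -r '.database.host' config.json)
    DB_PORT=$(jq -r '.database.port' config.json)
    # hand them to a one-off job that knows nothing about our config format
    run-one-off-job --host "$DB_HOST" --port "$DB_PORT"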

Now, admittedly anywhere we have Node available, it pretty much negates a lot of it. But even with node, it's sometimes easier to just throw in a couple jq calls to extract data.


I would also be fascinated to hear how you're using it.


I reckon JMESPath should be first choice in 2020, as it's properly specified, so you can find x-language implementations that behave the same.


jq is honestly one of my favorite tools ever. Super powerful yet simple.

I've written some relatively complex programs with it, e.g. this one [1] I just wrote to manipulate the output from neuron [2] to generate index files based on tags. It's probably overkill/not as efficient as it could be, but I wanted to be able to expand upon it later.

I think once you grok the basics, which can be a bit confusing at first, you can accomplish some amazing things with the help of a few of the more advanced features like reduce, to_entries/from_entries, etc.
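For example, the to_entries/from_entries round trip that makes object rewrites so pleasant:

    echo '{"a": 1, "b": 2}' | jq -c 'to_entries | map(.value += 10) | from_entries'
    # {"a":11,"b":12}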

I really wish it was more actively developed; I feel like it actually has potential to be a semi-general-purpose functional programming language.

1: https://gist.github.com/b50a047115d1dcf1f15c16a6f7b71e3c

2: https://github.com/srid/neuron


Is there stuff jq can do that I can't do with Invoke-RestMethod and manipulating objects in PowerShell?


It can run without PowerShell.


There's nothing in jq you can't do with any programming language with a JSON parser and serializer implemented.

But I find jq more pleasant than most for lots of JSON-to-JSON transformations, and Powershell worse than not only jq but also Node or Python for that task.


It’s a one-liner in PowerShell to grab JSON from an endpoint and deserialize it.


> It’s a one-liner in PowerShell to grab JSON from an endpoint and deserialize it.

Yes, and that's great. I don't prefer jq for downloading JSON (which it doesn't do), but, as I said, for JSON-to-JSON transformations.


That’s fair, it probably makes more sense for that usecase than PS. It just feels like yet another DSL to learn


I really feel like powershell is underrated in the silicon valley ecosystem. I just had my work computer switched from mac to win10 and am so glad to have powershell handy again.


My favorite way to use jq is as follows:

`tail -f my-log-file.json | jq`

This can be great if you've set up Nginx to log as JSON.
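With a filter you can also narrow it on the fly; the field names below are just an example, use whatever your log format defines (and the status comparison assumes it's logged as a number):

    tail -f my-log-file.json | jq --unbuffered 'select(.status >= 500) | {time, status, request}'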


At first I thought this was about github.com/timestored/jq and was very confused.


jq really is an awesome tool.



