Extracting Objects Recursively with Jq (simonwillison.net)
268 points by edward on Aug 1, 2021 | 71 comments



I tend to use jq a lot. As others have said, jq can be hard to grasp, and it often takes multiple attempts to get the correct answer. To make it a little easier for myself, I've written a helper function[0] that combines it with fzf[1] to run jq as a REPL on any JSON. It lets you incrementally alter your query without having to re-invoke jq by hand. This is similar to jid/jiq but a little more powerful. It includes functions to change the preview to output raw, compact (or not), and some other things.

It is essentially similar to jqplay but local.

I didn't use jid/jiq because jid uses go-simplejson, which is nowhere near as powerful as jq, and jiq seemed very buggy when I used it and felt hacked together. Plus there was no way to change jq's arguments while running it.

I'm sure this function can be improved on, but this has been good enough for me so far.

Also, I run gojq[2] instead of jq. It is a drop-in replacement for jq but is written in Go, and has some improvements over jq such as bug fixes, support for yaml input, and it also provides more helpful error messages.

[0] https://github.com/hoshsadiq/dot_files/blob/master/zshrc.d/m...

[1] https://github.com/junegunn/fzf

[2] https://github.com/itchyny/gojq/


If anyone is an emacs user and this sounds compelling, I recommend counsel-jq[0] for the sort of feedback loop described here.

[0]: https://github.com/200ok-ch/counsel-jq


I use counsel-jq occasionally, and my main issue with it is the single line input. As soon as the filtering is not trivial, it's much more convenient to be able to use multiple lines.


I've written about this before, but for tmux users, I have interactive querying in the form of a ten-line shell script [1]. See it live [2]. It can easily be modified to interactively query YAML, HTML with XPath or CSS, or text with awk or what have you, and I have variants for each of these. This has the advantage over the parent comment that you have your text editor's key bindings instead of those of fzf.

Dependencies: tmux, nodemon, less, jq, vim

[1]: https://gist.github.com/psacawa/e63c4e25a8b0405309d3a03b6b50...

[2]: https://streamable.com/jwdrqu


Just sharing my take on that interactive jq (or anything else) repl:

https://github.com/kbd/setup/blob/master/HOME/bin/fzr

It's just an fzf wrapper that sets up temporary files and so on. It works really well; it's amazing all the things one can use fzf for.


This is nice, it uses the latest Python 3 features. Thanks, I needed a use case to see how those features are useful in the real world...


I'm the opposite, I find that jq doesn't have a reason to exist, other than pretty printing JSON files on a terminal and doing basic filtering on a JSON object and only as interactive shell usage, NOT in a script.

It's the classic tool, like sed, awk, and a ton of other Unix utilities, that at first seems easy to use; then you have to do something complex and you start abusing it by piping things through jq multiple times, and you end up writing things like this:

    echo $json | jq "something $(echo $variable | jq 'something else' | tr '"' '\'') | sed 's/"/\\'/g" | jq "another js invocation" | awk '...' > file2.json
I stopped using jq after realizing that I was wasting my time trying to fix a script that used jq and didn't manage quoting correctly: trying different kinds of quotes, even filtering the input with tr before passing it to jq. It's just another tool prone to abuse, like sed, awk, tr, cut and similar things.

I thought: why am I wasting my time on a tool with a complex and limited DSL when I can write a clean Python script in 10 minutes that does the same thing and is easier to write, to read and, most importantly, to maintain?

To me, a script that has to manipulate JSON should be written in a high-level programming language like Python, not pieced together with tools like jq. Even if there is an existing big bash script that you don't want to rewrite and you have to do some JSON processing in it... you can write an inline Python script like this:

     python3 <<PYEND
     ## your python code
     PYEND
Also, jq is one more dependency that must be installed for the script to run.


Speaking as a long-time python developer...

If you actually "get into" jq you find out that it's a significantly neater language than it appears on the surface. Firstly it does allow you write multi-line scripts, and things start to look a lot neater once you do. Secondly it's actually a real, working, functional programming language, which allows very succinct expression of ideas which, in python, would likely require the reader to track state across explicit loops and the like.

Once you dig into the manual, you also tend to discover that a lot of the things that cause you to string multiple jq invocations together aren't actually necessary because there are quite sensible ways of handling them in-language.

It's quite laughable though to tout python over jq because of it adding a dependency. Perhaps if you're already embedded in python-land and all your environments already have python - but many (most?) of us are increasingly targeting extremely minimal container image environments. In that case, adding python is a much larger and more complex dependency than jq's single 3.8MB binary.

As someone who has to read an awful lot of other people's deployment scripts, it's also quite nice when I see jq because it loudly advertises "all I'm doing here is mangling one piece of json into another! no side effects!". I'd much rather follow the thread of execution into that than some mystery ruby script any day.


> Secondly it's actually a real, working, functional programming language, which allows very succinct expression of ideas which, in python, would likely require the reader to track state across explicit loops and the like.

This is the problem. It is yet another language that someone has to learn and know, like the ton of other UNIX commands that have their own DSL.

> Once you dig into the manual, you also tend to discover that a lot of the things that cause you to string multiple jq invocations together aren't actually necessary because there are quite sensible ways of handling them in-language.

Nice. I can dig into the manual and spend a day learning it, but I don't have that time. I have a script to fix and I know how to program in Python, so I throw away the jq code, substitute 10 lines of Python in a minute, and the problem is solved.

> It's quite laughable though to tout python over jq because of it adding a dependency

If jq enters your codebase then everywhere you have to install it. It's not that difficult (but not trivial on Windows), but it is annoying, you run a script and you then find out that you don't have jq installed and you need to install it.

Also what is faster: eliminating the need for the jq command in a script, or installing jq on tens of different systems with different operating systems?

No, where I work I established a rule that every script should be written in Python and should use only the standard library, with a few exceptions (e.g. we work with AWS, so boto3 is an exception). Otherwise every developer would use whatever tool they thought was cool (like jq), write a ton of bash spaghetti code gluing all these tools together, and force every other developer to install them on their system (and cause a lot of work for IT, i.e. me, to install them when they couldn't and to fix all the resulting problems).

> Perhaps if you're already embedded in python-land and all your environments already have python

Python is everywhere. Every Linux distribution has a Python interpreter, same for macOS, and on Windows nowadays you install it directly from the store with one click. The problem is that if jq enters the codebase then of course every developer machine has to have it installed. This is annoying.


I sympathize with both points of view. I think of autotools as an extreme example of how less general tools tend to grow in complexity to the point that the effort to learn them outweighs the benefit you get from knowing them well. But there is something missing from the "just use python everywhere" argument as well. For many use cases, choosing a less expressive language can guard against excessive complexity creeping in at that layer. IMO, json+jq fills a useful niche between semistructured/unstructured-text+awk/sed/bash and APIs+some-gp-scripting-language.


If you have quoting issues, and going by the example you posted of replacing quotes with sed and tr, you haven't learned how to pass parameters to jq correctly. You should use `--arg` and `--argjson`, not shell string interpolation, to pass in external strings / JSON object strings to your jq command. And use `--raw-output` to convert JSON strings to displayable strings (ie no quotes, evaluated escapes) for output.
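To make the `--arg` advice concrete, here is a minimal Python sketch of the same principle: the value travels as its own argument and is referenced inside the filter as `$who`, so no quote-juggling is needed. The helper names `jq_command` and `run_jq`, the variable name `who`, and the sample filter are made up for illustration; `run_jq` assumes a `jq` binary on the PATH.

```python
import subprocess


def jq_command(jq_filter, string_args=None, raw_output=True):
    """Build an argv list for jq; values go in as --arg pairs, never inside the filter text."""
    argv = ["jq"]
    if raw_output:
        argv.append("--raw-output")
    for name, value in (string_args or {}).items():
        # Each value becomes the jq variable $name, already a properly quoted string.
        argv += ["--arg", name, value]
    argv.append(jq_filter)
    return argv


def run_jq(json_text, jq_filter, string_args=None):
    """Run jq on json_text and return its stdout (assumes jq is installed)."""
    argv = jq_command(jq_filter, string_args)
    result = subprocess.run(argv, input=json_text, capture_output=True, text=True, check=True)
    return result.stdout


# A value full of quotes is passed through untouched; the filter itself never changes.
awkward = 'O\'Brien "the" user'
cmd = jq_command(".[] | select(.author == $who) | .text", {"who": awkward})
```

Because the command is an argv list rather than a shell string, there is no shell quoting layer at all; the same idea applies in a shell script by writing `jq --arg who "$variable" '...'`.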


I do the same thing (want to iterate on some complicated jq query -- or, let's be real, blunder around with my loose grasp of jq). I use https://github.com/lotabout/skim#interactive-mode instead of fzf.

  function jqsk {
    sk --tac --ansi --regex  --query . --multi --interactive --cmd '{}' \
       --bind 'enter:select-all+accept,ctrl-y:select-all+execute-silent(for line in {+}; do echo $line; done | pbcopy)+deselect-all' \
       --cmd-history=${HOME}/.sk_history --cmd-history-size=100000 \
       --no-clear-if-empty \
       --cmd-query "cat $1 | jq --raw-output --color-output --exit-status '.'"
  }
I run it like `jqsk file.json` and then change the `'.'` in the cmd-query to whatever I'm trying to do with jq.

I'll definitely look into some of the other mentioned solutions for this though. I settled on skim a year+ ago and haven't revisited.


Another option for interactive version is: https://sr.ht/~gpanders/ijq/


That’s cool! What’s the use of FZF? Isn’t it a fuzzy finder, what are you searching for in a jq REPL?


It only uses FZF's preview. The suggestions list is completely empty. I tried to find an alternative as FZF has no way of disabling the selector window, but I was unable to find anything that was good enough for this.

I considered forking jid/jiq and using gojq as a library, but I ended up not going down that route because of reasons that I cannot remember. I also considered using a tui or something but FZF has so much already implemented and has a lot of it right, and I didn't particularly feel like re-inventing the wheel.


This looks cool - would you be willing to share it in a brew-installable format?


Not particularly. I don't use a Mac, but I'd be happy to separate out the function into its own script so it can be downloaded and put in your $PATH. I personally use zinit to manage individual files from random repos.


Why not just use inotify (or similar tools offered by other OS)?


JQ is often frustrating when you want to do something non-trivial but you can't figure out how and the documentation is of little help.

I think JQ could really benefit from having a classic programming language style "book", like "The AWK Programming Language". JQ is fundamentally a functional programming language with semantics that are not obvious reading its current docs.


I've also found the documentation tough to use when I need to jump back in and answer something like "how do I _____?" It feels like one of those sets of docs that's better approached by just reading the whole damn thing, working through some examples, etc. Studying the docs as opposed to referring to them.


Right, I agree. But I think the docs as currently written are not really meant to be read through like that. That's why I mentioned "The AWK Programming Language" book, which is excellently written, and easy to read from start to end.

I don't mean to denigrate the current docs, writing documentation is hard!


This exists. At least, a detailed manual exists. It's the jq wiki on github [1]. Lots of detailed information/recipes.

[1]: https://github.com/stedolan/jq/wiki


Whoa, I didn't know this existed. This does seem to address a lot of what I want. I wish that were more prominently linked to from https://stedolan.github.io/jq/.


Agreed. It's usually just a bunch of trial and error for me, or trying to phrase my problem in a way that matches stack overflow questions.

And the docs for yq (the jq for YAML) are 100 times worse. The documentation feels like it was written by aliens. Even incredibly simple things are impossible for me to find in the yq docs.



It doesn't seem to provide a command-line tool, which is the point of jq



A bit offtopic, but I don't see many people knowing/using the Algolia API[0]. It's much better to use than the HN official API[1], since it returns the whole tree data in one request.

Unfortunately (I guess this is a big reason why people don't use it), it doesn't sort the comments – if you need the order, you'll have to parse HN HTML (or just use the official API).

Still, just two requests (the HN site, the Algolia API) is much better than recursively making a hundred requests, so I use this approach in my client[2].
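As a rough illustration of the "whole tree in one request" point, here is a small Python sketch using only the standard library. It assumes the item endpoint URL shape quoted elsewhere in this thread (`https://hn.algolia.com/api/v1/items/<id>`) and a nested `children` list on every item; the function names are made up for the example.

```python
import json
import urllib.request

# URL shape assumed from the curl example elsewhere in this thread.
ALGOLIA_ITEM_URL = "https://hn.algolia.com/api/v1/items/{item_id}"


def fetch_item(item_id):
    """Fetch a story or comment together with its whole reply tree in a single request."""
    with urllib.request.urlopen(ALGOLIA_ITEM_URL.format(item_id=item_id)) as response:
        return json.load(response)


def count_comments(item):
    """Count every descendant of item by walking the nested 'children' lists."""
    children = item.get("children") or []
    return len(children) + sum(count_comments(child) for child in children)
```

Something like `count_comments(fetch_item(27941108))` would then walk the whole thread locally, with no further network calls after the first.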

[0]: https://hn.algolia.com/api

[1]: https://github.com/HackerNews/API

[2]: https://github.com/goranmoomin/HackerNews


JSON data is also a valid Prolog term, and the declarative programming language Prolog is ideally suited for handling tree-shaped data.

Using for example Scryer Prolog, we can conveniently relate the data to a flat list of items with Prolog's built-in grammar mechanism, definite clause grammars (DCGs):

    flat_json(JSON) -->
            { JSON = {A,B,C,D,E,F,_:Cs} },
            [{A,B,C,D,E,F}],
            flat_items(Cs).

    flat_items([]) --> [].
    flat_items([I|Is]) -->
            { I = {(A,B,C,D,E,_:Cs)} },
            [{A,B,C,D,E}],
            flat_items(Cs),
            flat_items(Is).
Sample query, using the example JSON data from the article:

     ?- JSON = {
        "id": 27941108,
        "created_at": "2021-07-24T14:15:05.000Z",
        "type": "story",
        "author": "edward",
        "title": "Fun with Unix domain sockets",
        "url": "https://simonwillison.net/2021/Jul/13/unix-domain-sockets/",
        "children": [
            {
                "id": 27942287,
                "created_at": "2021-07-24T16:31:18.000Z",
                "type": "comment",
                "author": "DesiLurker",
                "text": "<p>one lesser known...",
                "children": []
            },
            {
                "id": 27944615,
                "created_at": "2021-07-24T21:26:33.000Z",
                "type": "comment",
                "author": "galaxyLogic",
                "text": "<p>I read this from Wikipedia...",
                "children": [
                    {
                        "id": 27944746,
                        "created_at": "2021-07-24T21:49:07.000Z",
                        "type": "comment",
                        "author": "hughrr",
                        "text": "<p>Yes although I ...",
                        "children": []
                    }
                ]
            }
        ]
    },
       phrase(flat_json(JSON), Cs),
       maplist(portray_clause, Cs).

yielding the flat list of entries, as desired:

    [{("id":27941108,"created_at":"2021-07-24T14:15:05.000Z","type":"story","author":"edward",...)},
     {("id":27942287,"created_at":"2021-07-24T16:31:18.000Z","type":"comment",...)},
     {("id":27944615,"created_at":"2021-07-24T21:26:33.000Z","type":"comment",...)},
     {("id":27944746,"created_at":"2021-07-24T21:49:07.000Z","type":"comment",...)}]


Any way to avoid the need to enumerate the json fields that come before children with those placeholders? Otherwise, it will be brittle to modifications.

Prolog was the one language I couldn’t get my head around in the programming languages class I took at school.


Awesome technique, I need to get back into Prolog stuff :) love your videos btw.


I love jq for many reasons:

* it is very potent

* it has extensive documentation, written more like parables and new-age sorcery than actually helpful content

* it improves your search skills greatly, for many internet results try to give sense to its format

* it gives great satisfaction after you've died-and-retried 300x a whole afternoon for a command you'll only need once.

* it looks cool to use jq instead of [any language you're already familiar with].

Yes, I'm being sarcastic, yet honest. Note that I still use it and recommend it for simpler use cases though.

_[shrug]_


I too find the jq syntax arcane and I've found https://jqplay.org/ to be an invaluable help.


Jq makes me feel stupid, it's comforting to know that it's probably not just me.


To better appreciate the structure of the document the author is dealing with (and to cast a bit of light on which words are variables in the document and which are `jq` syntax), I offer a shameless plug for a one-liner [0].

Well, I would, but the result is "too long for a HN comment", so here a bunch is snipped out of the middle (unedited, the result would currently be 140 lines):

  curl -s  https://hn.algolia.com/api/v1/items/27941108 | ~/bin/json2jqpath.jq  

  .
  .author
  .children
  .children|.[]
  .children|.[]|.author
  .children|.[]|.children
  .children|.[]|.children|.[]
  .children|.[]|.children|.[]|.author
  .children|.[]|.children|.[]|.children
  .children|.[]|.children|.[]|.children|.[]
  .children|.[]|.children|.[]|.children|.[]|.author
  .children|.[]|.children|.[]|.children|.[]|.children
  .children|.[]|.children|.[]|.children|.[]|.children|.[]
  <snip>  
  ...  
  </snip>
  .children|.[]|.children|.[]|.created_at
  .children|.[]|.children|.[]|.created_at_i
  .children|.[]|.children|.[]|.id
  .children|.[]|.children|.[]|.options
  .children|.[]|.children|.[]|.parent_id
  .children|.[]|.children|.[]|.points
  .children|.[]|.children|.[]|.story_id
  .children|.[]|.children|.[]|.text
  .children|.[]|.children|.[]|.title
  .children|.[]|.children|.[]|.type
  .children|.[]|.children|.[]|.url
  .children|.[]|.created_at
  .children|.[]|.created_at_i
  .children|.[]|.id
  .children|.[]|.options
  .children|.[]|.parent_id
  .children|.[]|.points
  .children|.[]|.story_id
  .children|.[]|.text
  .children|.[]|.title
  .children|.[]|.type
  .children|.[]|.url
  .created_at
  .created_at_i
  .id
  .options
  .parent_id
  .points
  .story_id
  .text
  .title 
  .type
  .url

[0] https://github.com/TomConlin/json_to_paths


Looks a bit like `jq -rc '[path(..)|map(strings//"[]")]|unique[]|"."+join("|.")'`, but might I suggest `jq -rc '[paths|map(("."+strings)//"[]")|join("")]|unique[]'`?
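For readers who do not read jq fluently, here is a rough Python sketch of what the first filter computes: every distinct path in the document, with array indices collapsed to `[]` and steps joined by `|.`. The function name `json_paths` is invented for the example, and the ordering is plain Python string sorting, which may not match jq's `unique` in every case.

```python
def json_paths(doc):
    """Return the sorted set of jq-style paths present in a parsed JSON value."""
    seen = set()

    def walk(value, steps):
        # The root has no steps and is rendered as "."; deeper paths are joined with "|.".
        seen.add("." + "|.".join(steps))
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, steps + [key])
        elif isinstance(value, list):
            for child in value:
                # Every array element shares the same "[]" step, so siblings collapse.
                walk(child, steps + ["[]"])

    walk(doc, [])
    return sorted(seen)
```

Collapsing indices is what keeps the listing short: a thread with hundreds of comments still produces only one `.children|.[]|.author` line per nesting depth.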


There is no need to guess, you can just look in the repo.

I like the explicit separators when I'm reading them instead of writing them. Helps the individual steps stand out for me.

Is there a reason other than a more compact line?


Thanks for linking this! I've wanted exactly this script, many times.


I've really loved having jq at my disposal ever since learning about it, but I feel like it took the combination of it and gron [1] to really transform my debugging and JSON workflows.

1: https://github.com/TomNomNom/gron


`gron` is really a lifesaver for people like me who can't survive JSON (by simply looking at / reading it) but have to deal with JSON on a daily basis (REST API, K8s related). Oftentimes a few gron / ungron runs save the day ;-)


I can't believe the number of times I've wanted to grep a JSON file, and yet never thought to look and see if there was any kind of tool for it. Thanks!
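The idea behind making JSON greppable can be sketched in a few lines of Python: print one line per leaf value, prefixed with the full path to it, so that ordinary `grep` sees the context. This is only an illustration of the approach, not gron's implementation, and gron's real output format differs in its details; the function name `greppable_lines` and the `json` root label are choices made for this example.

```python
import json


def greppable_lines(value, path="json"):
    """Yield one 'path = value' line per scalar leaf of a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            # json.dumps quotes and escapes the key, so odd keys stay unambiguous.
            yield from greppable_lines(child, f"{path}[{json.dumps(key)}]")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from greppable_lines(child, f"{path}[{index}]")
    else:
        yield f"{path} = {json.dumps(value)}"
```

Note that this simplified version emits nothing for empty objects and arrays, since it only reports leaves.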


There's a real need for a tool like jq, but jq unfortunately isn't it.

What I mean is this: the functionality offered by jq (parsing json on the command line and extracting what you need from it) is really needed in many modern data processing tasks, but jq's DSL is one of the most horrible things I've had to learn in recent years.

The only way a casual user of that thing can hope to succeed in actually crafting a working jq query is to pray that there is a stack overflow topic answering his exact need.

Here's the way I use jq 99% of the time:

    cat somefile.json | jq . | <long pipeline of traditional, well thought-out unix text processing tools such as sed, grep, awk, cut, etc...>
And when this doesn't cut it, I write a python script.


You might prefer gron for this use case: https://github.com/tomnomnom/gron


I often use jq for random hacks but for something like this I would turn to XPath. I know that sounds impossibly retro, but XPath 3.1 for JSON is awesome and its language makes so much more sense than jq’s. There are several good implementations, and all of them are faster than jq, too.


jq is nice, but the moment i need anything more complex than "pull this attribute out of bunch of objects" i vastly prefer spinning up an actual language runtime. or use a tool built around a language (e.g. https://github.com/borkdude/jet) rather than a language built around a tool.


I love posts that are this information-dense. I know for a fact that this will be useful to me some day.


jq seems incredibly powerful. I only find myself using it a few times a year though, and have never been able to conceptualize the syntax enough to use it without prodigious googling.


Think of it just like bash. jq is the ultimate functional language and/or ETL tool. If I look back at larger jq transforms I've written a while ago (e.g. https://git.io/JBSfB) they still make perfect sense to me.


Yes, jq is very powerful yet incredibly inefficient to work with when you haven't mastered its arcana.


As a systems engineer working with containers, while I find jq really annoying to use, I much prefer it as a dependency to Python or Perl. Not only is it smaller, it's easier to install in weird environments.

Say you need to jump into a running container that isn't running as root, doesn't have sudo/su, and doesn't even have packaging tools. The only practical way to quickly install a debugging tool is either 1) volume-mount a file, 2) copy in a file, or 3) curl/wget. So a small(ish) static binary is the best thing you can hope for. I've actually built a dozen static versions of common debugging tools just to make this easier.

If I had more free time I would port jq's semantics to BusyBox and gain that much more utility out of that lovely little environment.


An introductory script showing the jq, REST, and JSON trinity:

https://github.com/DaveJarvis/github-email/blob/master/githu...


Instead of using recurse(), you can also define a function in jq and call that one recursively. E.g., the example from the OP could also be written as:

  def my_func:
    del(.children),
    (.children[] | my_func)
  ;
  [my_func]
(The function first emits its input object without the "children" field, then emits each element of the "children" array separately, after passing each through a recursive call of the function.

The ; char ends the function definition.

Finally, the function is called once on the top-level input and all emitted results are collected into an array.)

This is more verbose than recurse() but may sometimes be easier to understand if you want to know what is going on.
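For comparison, the same recursion can be sketched as a Python generator. The name `flatten` is made up for the example, and one difference is deliberate: the jq version's `.children[]` fails on an item where `children` is null, while this sketch treats a missing or null `children` as an empty list.

```python
def flatten(item):
    """Yield item without its 'children' key, then every descendant, depth-first."""
    # Counterpart of del(.children): emit a copy of the object minus one key.
    yield {key: value for key, value in item.items() if key != "children"}
    # Counterpart of (.children[] | my_func): recurse into each child in order.
    for child in item.get("children") or []:
        yield from flatten(child)
```

Wrapping the call as `list(flatten(story))` plays the role of the surrounding `[my_func]`, collecting everything the generator emits into one flat list.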


I wish jq could be used in Python. It would be so nice. Imagine writing a simple jq string and running that, rather than writing complicated loops for examining json data.

It does exist as a library for Python, but unfortunately it links to jq’s C library. That means if I use it, then whoever uses my program has to compile C code whenever they pip install my program. I’m unwilling to do that, because history shows that as t approaches infinity, the chance of C code failing to compile during a pip install approaches 100%.

If anyone’s looking for a challenge, please implement all of jq’s logic in pure Python. It would be so wonderful…


The most powerful thing about jq, IMO, is that it can alter some of the json without having to parse and convert to an object all of the json. It is like using data transformation lasers.


Can someone comment on how does `jq` compare with `fx`?

- https://github.com/antonmedv/fx


In a job I did years ago, we had to deal with errors caused by small changes made in huge object trees in C# in a domain that requires handling thousands of parameters that all influence each other.

Fortunately there was a Visual Studio extension that could export objects into JSON/XML/C# representation; then all I had to do was to diff the object dump. Was often called a "Jedi Wizard" for using this trick lol


I'm happy to see that I'm not alone in my struggles with jq. I wanted to love it right out of the box. It appears to be very well engineered, but over and over again I have struggled with its syntax.

What I think I want is a syntax closer to css selectors. What I think I'm going to have to do is really stop and learn jq. It looks like some of the links in here may help.


Jq is a Command-line JSON processor, if like me you didn’t know and thought it might be Jquery abbreviated


Jq has a playground that’s great for learning how to use it https://jqplay.org/

I often build my Jq steps in the playground before committing.


Jq's query language feels like regular expressions for JSON.

With the same amount of benefits and drawbacks.


Could the title be changed to specify `Extracting JSON objects recursively with jq` please?


jq is a fantastic tool for exploring data and doing simple transformations. I often wish it could consume/write data in other formats.


In python-land, I like to use glom[1], which is basically jq but operating on arbitrary python data structures. I believe there are bindings for jq in python and in other languages, which would allow operating on data structures, but I imagine they are just spawning jq as a subprocess, since it doesn't seem like jq has a public C api.

[1] https://glom.readthedocs.io/en/latest/index.html



Jq cries out for a good O'Reilly cookbook.


Forget about jq. JSONata is much more powerful - http://docs.jsonata.org/overview.html


jq is a single binary which I can easily just drop into a remote server or throw into my ~/.local/bin, whereas JSONata requires NPM.


Yea what's going on here, you have a couple of other comments pushing Jsonata?

You an author or something?


Not at all. After using jq in the past and seeing both its potential and flaws I was looking for something more powerful and easy to use, so I stumbled upon JSONata and became a fan.


It doesn't appear to have a CLI, which is the whole point of jq




