JJ: JSON Stream Editor (github.com/tidwall)
184 points by ingve on May 26, 2023 | 48 comments



I'll take the chance to bring attention to the maintenance issues that 'jq' has been having in recent years [1]; there hasn't been a new release since 2018, which IMO wouldn't necessarily be a bad thing if not for the fact that the main branch has been collecting improvements and bug fixes [2] since then.

A group of motivated users are currently talking about what direction to take; a fork is being considered in order to unlock new development and bug fixes [3]. Maybe someone reading this is able and willing to join their efforts.

[1]: https://github.com/stedolan/jq/issues/2305

[2]: https://github.com/stedolan/jq/pull/1697

[3]: https://github.com/stedolan/jq/issues/2550


What exactly is missing/broken in jq right now which warrants a fork? I've been using jq daily for years, and I can't remember the last time I hit a bug (must have been many years ago), and I can't recall any features I felt were missing in all the years I've been using it.

For me it's kind of done. It could be faster, but then I tend to program a solution myself instead, otherwise I feel like it's Done Enough.


I wouldn't say I need the program to grow with more features, but at the bare minimum they should have been more diligent with cutting releases after accepting bug fixes, instead of letting those contributions languish on the main development branch out of reach for users.

I mean, it would be understandable if the maintainers didn't have the time to keep working on it at all, but clearly the review work was done to accept some patches, so why not make point releases to let the fixed code reach users via their distribution's channels?


What I miss from jq and what is implemented but unreleased is platform independent line delimiters.

jq on Windows produces \r\n terminated lines which can be annoying when used with Cygwin / MSYS2 / WSL. The '--binary' option to not convert line delimiters is one of those pending improvements.

https://github.com/stedolan/jq/commit/0dab2b18d73e561f511801...


You’ll have a much better experience in Cygwin/MSYS2/WSL if you treat them as isolated environments and don’t call programs from outside of them. If you want to use ‘jq’ (or any tool) within Cygwin, install the Cygwin package. Rely on the Windows install and you’re guaranteed to run into problems like this.


> What exactly is missing/broken in jq right now which warrants a fork

AFAIK there’s quite a few bug fixes and features that have accumulated on the unreleased main branch, or were opened as PRs but never merged.

IIRC I hit one of the bugs while trying to check whether an input document is valid JSON.

I should check out what’s happening with the fork. I’ve never opened a PR or anything, but I’ve read the source while trying to understand the jq language conceptually, and I’d say it’s quite elegant :)


The README for jj points out that it is dramatically faster than jq. Presumably some of those improvements would help here.


> It could be faster

A decaffeinated sloth could be faster.


Looks like it's because @stedolan has gone silent without delegating the necessary GitHub repo access to the existing maintainers.

He seems to be working at Jane Street though, so if anyone is able to reach him please help the jq community :)

https://signals-threads.simplecast.com/episodes/memory-manag...


I like jq, but jj is so fast it is my go-to for pretty printing large json blobs. Its parsing engine is available as a standalone go module, and I've used it in a few projects where I needed faster parsing than encoding/json:

https://github.com/tidwall/gjson
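For anyone curious what that looks like in Go, here's a minimal sketch (document and path invented for illustration): gjson.Get pulls a single value straight out of the raw JSON string without unmarshalling anything into structs.

    package main

    import (
        "fmt"

        "github.com/tidwall/gjson"
    )

    func main() {
        // No struct definitions and no json.Unmarshal: gjson scans the
        // raw document and extracts only the requested value.
        doc := `{"name":{"first":"Tom","last":"Smith"},"age":37}`
        fmt.Println(gjson.Get(doc, "name.last").String()) // Smith
    }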


I don't think I've ever been limited by jq's speed, but good to know there are alternatives if it ever becomes a bottleneck.

Other than that I can't think of a reason to use this over jq; the query language is perhaps a bit more forgiving in some ways, but not as expressive as jq's (and I've spent ~8 years getting pretty familiar with jq's quirks).


The limiting speed factor of jq for me is, by far, figuring out how to write the expression I need to parse a fairly small amount of data. I do a bunch of support analysis and often write a one-liner to put into a shell script to extract some bit of JSON to re-use later in the script. Often this is going to be used only once by me or a customer to run some task.

Followed closely by figuring out the path to the area of data I'm interested in. "gron" has been a real time saver there - it converts the JSON into single lines of key/value pairs - so you can use grep to find the full path for any string.
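In case it helps anyone unfamiliar with it, gron's output looks like this (sample document invented), which is exactly why grep works so well on it:

    $ echo '{"name":{"first":"Tom","last":"Smith"}}' | gron
    json = {};
    json.name = {};
    json.name.first = "Tom";
    json.name.last = "Smith";

    $ echo '{"name":{"first":"Tom","last":"Smith"}}' | gron | grep Smith
    json.name.last = "Smith";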

Switching to a GUI to browse the JSON that would let you copy the path to the current value would probably also help there, but I'm usually in the terminal doing a bunch of different tasks, looking through all manner of command outputs, logs, etc :)

Relatedly, my primary use of ChatGPT has been asking it to write jq queries for me; it's not too bad at getting close. Its biggest blind spot seems to be keys containing a dash, which you have to write as ["key-name"].


> Switching to a GUI to browse the JSON that would let you copy the path to the current value would probably also help there

Try https://jless.io/ then.


I agree that figuring out non-trivial jq expressions takes a lot of time, often accompanied by a consultation of the somewhat lacking docs and some additional googling.

Nonetheless, it is pretty slow at processing data. For example, converting a 1 GB JSON array of objects to JSON Lines takes ages, if it works at all. Using the streaming features helps, but they are hard to comprehend. Streaming gets memory consumption under control and doesn't take super long, but still way too long for such a trivial task IMO.
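For reference, the naive version is a one-liner that loads the whole array into memory, while the streaming incantation I know of looks something like this (file names invented):

    # simple, but materializes the entire 1 GB array first
    jq -c '.[]' big.json > big.jsonl

    # streaming variant: processes the array element by element
    jq -cn --stream 'fromstream(1 | truncate_stream(inputs))' big.json > big.jsonl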


I’m far more likely to parse JSON in a Clojure REPL session and go from there these days. Learning jq for the odd JSON manipulation I need to do seems like overkill.


For me it's usually for some automation task to gather a list of IDs for some cloud environment to build infra things.


> Switching to a GUI to browse the JSON that would let you copy the path to the current value would probably also help there

I use an app called OK JSON on the Mac for this. It's okay.


Emacs has a command to get the current path at point.


Which one is it exactly, please? I'd like to use it.


Interesting! I tend to use gron to bring JSON into (and out of) the line-based bailiwick of sed and awk where I'm most comfortable, rather than a custom query language like jq that I'd use much more rarely. But I guess that's at the opposite extreme of (in)efficiency than both this and the original jq.

There might be a nice 'edit just this path in-place in gron-style' recipe to be had out of jj/jq + gron together...
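Something along these lines might work, going from memory on jj's flags (-v to set a value, -i/-o for input/output files), with paths and values invented:

    # find the full path with gron...
    $ gron config.json | grep -i timeout
    json.server.timeout = 30;

    # ...then edit just that path with jj, writing back to the same file
    $ jj -i config.json -o config.json -v 45 server.timeout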


Are there any gron-like tools for xml? I'm aware it's a harder problem (and an increasingly rare problem) but perhaps someone has tackled it nonetheless?


xml2 [1] turns XML into line-based output, and 2xml reverses it.

[1] https://github.com/clone/xml2


Just looked up gron - thanks. This looks useful.


Am I correct in understanding that this can only manipulate (get or set values) from a JSON path? That is, is it not a replacement for jq?

For example, I frequently use jq for queries like this:

    jq '.data | map(select(.age <= 25))' input.json
Or this:

    jq '.data | map(.country) | sort[]' input.json | uniq -c
Is it possible to do something similar with this tool?

This is not a slight at jj. Even if it's more limited than jq, it's still of great value if it means it's faster or more ergonomic for a subset of cases. I'm just trying to understand how it fits in my toolbox.


It looks like the README in the jj repository doesn't do justice to the available query syntax. jj uses gjson (by the same author) and its syntax [0]. From what I saw, the first one can be handled with:

    jj 'data.#(age<=25)#' -i input.json
I don't think there is a way to sort an array, though. However, there is an option to have keys sorted. Personally, I don't find that much of an annoyance: one could just pipe jj output to `sort | uniq -c`.

I just discovered that gjson supports custom modifiers [1]. So technically, you could fork jj, add a file registering a `@sort` modifier via `gjson.AddModifier`, and have a custom jj build that supports sorting (rough sketch below).

[0]: https://github.com/tidwall/gjson/blob/master/SYNTAX.md

[1]: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#modif...
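To illustrate, here's a rough sketch of what registering such a modifier could look like; the @sort modifier itself is hypothetical, only gjson.AddModifier and the query syntax are real:

    package main

    import (
        "encoding/json"
        "fmt"
        "sort"

        "github.com/tidwall/gjson"
    )

    func main() {
        // Hypothetical @sort modifier: sorts a JSON array of strings.
        // Anything that isn't a string array is passed through unchanged.
        gjson.AddModifier("sort", func(jsonStr, arg string) string {
            var items []string
            if err := json.Unmarshal([]byte(jsonStr), &items); err != nil {
                return jsonStr
            }
            sort.Strings(items)
            out, _ := json.Marshal(items)
            return string(out)
        })

        doc := `{"data":[{"country":"NO"},{"country":"DE"},{"country":"AT"}]}`
        // "data.#.country" collects the values; "|@sort" applies the modifier.
        fmt.Println(gjson.Get(doc, "data.#.country|@sort").Raw)
        // Output: ["AT","DE","NO"]
    }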


Annoyingly, I think `jq` might still be the only tool capable of these kinds of things. The rest seem to be "query simple paths and print the result" (which is handy, of course - I often use `gron` to get an idea of the keys I'm after because the linear format is easier to handle than JSON.)


A while ago I wrote jlq, a utility explicitly for querying/filtering jsonl/json log files. It’s powered by SQLite. A nice advantage is that it can persist results to a SQLite database for later inspection or to pass around. Hope it helps someone :)

https://github.com/hamin/jlq


I've been using the gjson (get) and sjson (set) libraries this is based on for many years in Go code to avoid deserialising JSON responses. Those libraries act on a byte array and can extract only the value(s) you want without creating structs and other objects all over the place, giving you a speed bump and fewer allocations if all you need is a simple value. It's been working well.

This program could be an alternative to jq for simple uses.
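For the curious, the byte-level API is about this simple (document and paths invented); GetBytes and SetBytes work directly on the []byte you got off the wire:

    package main

    import (
        "fmt"

        "github.com/tidwall/gjson"
        "github.com/tidwall/sjson"
    )

    func main() {
        // e.g. the raw body of an HTTP response
        body := []byte(`{"user":{"id":42,"name":"Tom"},"active":false}`)

        // read one value without defining any structs
        fmt.Println(gjson.GetBytes(body, "user.id").Int()) // 42

        // set a value; returns a new []byte with the change applied
        updated, err := sjson.SetBytes(body, "active", true)
        if err != nil {
            panic(err)
        }
        fmt.Println(string(updated))
    }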


For those wondering, the README states it's a lot faster than jq, which may be the selling point.


jj is faster than jq.

However, jsonptr is even faster and also runs in a self-imposed SECCOMP_MODE_STRICT sandbox (very secure; also implies no dynamically allocated memory).

  $ time cat citylots.json | jq -cM .features[10000].properties.LOT_NUM
  "091"
  real  0m4.844s
  
  $ time cat citylots.json | jj -r features.10000.properties.LOT_NUM
  "091"
  real  0m0.210s

  $ time cat citylots.json | jsonptr -q=/features/10000/properties/LOT_NUM
  "091"
  real  0m0.040s
jsonptr's query format is RFC 6901 (JSON Pointer). More details are at https://nigeltao.github.io/blog/2020/jsonptr.html


Looks neat. One suggestion: add better build instructions to the wuffs README/getting-started guide. I jumped in and tried to build it using the "build-all.sh" script that seemed convenient, but gave up (for now) after the nth build failure due to yet another missing dependency. It's extra painful because build-all.sh is slow, so maybe also consider a proper build automation tool (seeing as this is a Google project, maybe Bazel?).


Thanks for the feedback. I'll add better build instructions.

If you just want the jsonptr program, instead of everything in the repo (the Wuffs compiler (written in Go), the Wuffs standard library (written in Wuffs), tests and benchmarks (written in C/C++), etc) then you can use "build-example.sh" instead of "build-all.sh".

  ./build-example.sh example/jsonptr
For example/jsonptr, that should work "out of the box", with no dependencies required (other than a C++ compiler). For e.g. example/sdl-imageviewer, you'll also need the SDL library.

Alternatively, you could just invoke g++ directly, as described at the very top of the "More details are at [link]" page in the grand-parent comment.

  $ git clone https://github.com/google/wuffs.git
  $ g++ -O3 -Wall wuffs/example/jsonptr/jsonptr.cc -o my-jsonptr


Presumably the memory footprint is often far less too.


Hey there,

Just wanted to drop a quick note to say how much I'm loving jj. This tool is seriously a game-changer for dealing with JSON from the command line. It's super easy to use and the syntax is a no-brainer.

The fact that jj is a single binary with no dependencies is just the cherry on top. It's so handy to be able to take it with me wherever I go and plug it into whatever I'm working on.

And props to you for the docs - they're really well put together and made it a breeze to get up and running.

Keep up the awesome work! Can't wait to see where you take jj next.

Cheers


This behaviour looks confusing to me:

$ echo '{"name":{"first":"Tom","middle":"null","last":"Smith"}}' | jj name.middle

null

$ echo '{"name":{"first":"Tom","last":"Smith"}}' | jj name.middle

null

It can be avoided with option '-r', which should be the default, but is not.


I don't get this behavior for your second command, it just seems to return an empty string.

edit:

There are three cases to cover:

1. The value at the path exists and not null.

2. The value at the path exists and is null.

3. The value at the path doesn't exist.

jj seems to potentially confuse 1 and 2 without the -r flag: "middle": "null" and "middle": null, more specifically. It probably confuses "middle": "" and a missing value as well; that's 1 and 3.
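If I understand the -r (raw) flag correctly, it keeps string values quoted, which makes the three cases distinguishable (the output for the missing-key case is inferred from the sibling comment):

    $ echo '{"middle":"null"}' | jj -r middle
    "null"

    $ echo '{"middle":null}' | jj -r middle
    null

    $ echo '{}' | jj -r middle
    (empty output)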


I wish this existed when I was trying to look at 20G of firebase database JSON dump.


That is what gets me: why did the file get to 20G? At that point just ship a SQLite file.


Does it matter why? Sometimes files get big, and you don't control the generation, or changing the generation is a bigger task than just dealing with a "big" (I'd argue 20GB isn't that big anyway) file with standard tools.


Nope, it matters a lot! Unstructured, unindexed files usually get that big as the result of some design flaw.


Interesting. How often do you manipulate a 1+MB JSON file? Maybe I am wrong, but going from 0.01s to 0.001s doesn't motivate me to switch to jj.


Datasets are often stored in (sometimes gzipped) jsonlines format in my field (NLP). The file size could reach 100s of GBs.


100s of GBs?

In those cases, querying un-indexed files seems quite a thinko. Even if you can fit it all in RAM.

If you only scan that monstrous file sequentially, then you don't need either jq or jj or any other "powerful" tool. Just read/write it sequentially.

If you need to make complex scans and queries, I suspect a database is better suited.


Usually you do indeed scan the file sequentially, doing some filtering / transformation. As you apply the transformation to each record, the speed of the tool used (e.g. jq) really matters.

Databases are not used in this case because of the complexity overhead compared to plain-text files. The ability to use Unix pipelines and tools (such as grep) is a bonus.
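A typical pipeline in that setting looks something like this (field names and files invented); each record is transformed independently, so memory stays constant no matter how big the file is:

    zcat corpus.jsonl.gz \
      | jq -c 'select(.lang == "en") | {id, text}' \
      | gzip > corpus.en.jsonl.gz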


I would like to see a comparison with jshon. Jshon is way faster than jq and has been available in distro repositories for many years.


Cool, didn’t know about jshon, how’s the query language?


Almost non-existent. A couple of excerpts from the man page:

  {"a":1,"b":[true,false,null,"str"],"c":{"d":4,"e":5}}
  jshon [actions] < sample.json
  jshon -e c -> {"d":4,"e":5}
  jshon -e c -e d -u -p -e e -u -> 4 5
Yet this covers like ~50% of possible use cases for jq.


Is this the SAX of JSON?



