Hacker News
Gron: A command line tool that makes JSON greppable (github.com)
247 points by oftenwrong on Apr 2, 2018 | hide | past | favorite | 51 comments

This is cool... but it appears to also work as an HTTP client? I'm not sure why that additional complexity is needed.

I maintain that any HTTP client functionality that supports enough options to be useful is complex and if it supports so few options that it's a toy, why include it?

There is non-negligible overhead in keeping track of which shell tools can make network requests. Failing to stay correct and up to date in that area has been the cause of numerous bugs and security issues in tool sets and programs that include and utilize a component like this without realizing it.

To many, this might sound like an overly nit-picky complaint, but I maintain that if you ask just about anyone who's been in the trenches as a sysadmin for more than a couple of years whether saving the few characters it takes to use curl and a pipe is a good trade-off for the possible unintended consequences of some developer shelling out to this from a webapp without proper validation, they'll tell you no.

This tool seems awesome, but keep it simple. It's not like it's a GUI tool and it's hard to pass data between programs. A pipe is perfect here.

Author of the tool here.

I think you raise a good point; and I don't think it's overly nit-picky.

Personally I don't find myself using the built-in HTTP client at all (and pipe the output of curl instead); but I know people who do. I ummed and ahhed about keeping this functionality for a while, and surveyed the—at the time fairly small—user base to figure out what I should do. What I found was a subset of users whose use-case was very different to my own; they were often running Windows, with no install of curl (which I believe is actually a default on Windows now?), and only wanted very basic functionality.

I'm definitely a proponent of the Unix philosophy, but I try my best to be pragmatic, especially where it can lower the barrier to entry for users.

I had thought of this after my original comment. Windows provides a slightly different set of assumptions than a traditional UNIX-like platform. Mainly, piping is not as common, and the chances that the operating system the tool is deployed on will be used as a server are much reduced. There are a couple of middle-ground solutions which might be worth exploring:

- Only build that functionality on Windows. Optionally, provide two packages for Windows, one with URL fetching and one without it.

- Provide a flag that enables/disables the feature. Either default it to off and allow it to be enabled with a flag (with a helpful error message if you attempt to use it without the flag), or vice versa.

- Do nothing. In the end, it's an entirely valid decision to leave it unchanged. I thought it was something worth discussing, both for this tool and for the larger trend in general, but that doesn't mean the problem is so large that it makes the tool unusable until addressed.

In any case, thanks for making this. The simplicity and obvious usefulness of this tool means that as someone who deals with JSON quite often, generally in an archived form where I care about one or two entries in lists of hundreds, I imagine I'll be finding lots of use for it in the future.

curl ships out of the box with the next Win10 release, expected this month (version 1803), but is still not present in older releases or older versions of Windows.

By the way, do the Windows builds of gron automatically recognize and support UTF-16? If anyone does try piping curl output in PowerShell, that's what gron would receive -- PS automatically converts stdout and stderr output to UTF-16 because of its use of .NET String for stream I/O.
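Not speaking to what gron actually does here, but the usual way a tool handles this is to sniff the input for a byte-order mark before parsing. A minimal sketch in Python (the function name and the UTF-8 fallback are my assumptions, not gron's behaviour):

```python
def decode_stdin(raw: bytes) -> str:
    """Decode input bytes, sniffing for the UTF-16 BOMs PowerShell may emit."""
    # FF FE = UTF-16 little-endian BOM, FE FF = UTF-16 big-endian BOM.
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        # The "utf-16" codec uses the BOM to pick endianness and strips it.
        return raw.decode("utf-16")
    return raw.decode("utf-8")  # assumed fallback for everything else
```

BOM-less UTF-16 would need extra heuristics (e.g. looking for interleaved NUL bytes), which is part of why auto-detection is rarely bulletproof.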

>with no install of curl (which I believe is actually a default on Windows now?)

Alas, no. It's just an alias for Invoke-WebRequest, or whatever the equivalent PowerShell incantation is.

As of Spring Creators Update (coming any time this month) both curl and bsdtar are present in the base Windows install. On my Insiders build 17133.1, curl 7.55.1 and bsdtar 3.3.2 are both present in C:\Windows\System32.

The problem is that PowerShell 5.x still maintains the "curl" Invoke-WebRequest alias, and that captures `curl` on the PS command line before curl.exe from the path. However, this (and a bunch of other *nix-conflicting aliases) is removed in PowerShell Core, and annoying aliases can be deleted from the Alias:\ PS drive on PS 5.x and older.

Good to know, thanks. I’m sure I will be getting more questions from my co-workers, but adding these utilities is a good thing in the long run.

I just wanted to say that gron is one of the most useful tools I've seen posted to HN in a very long time.


Thank you so much! That really means a lot! :)

I came here to say the same thing. It looks like a great tool, which I will use frequently on large JSON files, but I don't get why the author bothered to implement a basic HTTP client with some curl-like options when it can just as well read from stdin.

Maybe in a container build environment like CircleCI where having a small concise toolset is desirable?

Is it? The one thing curl does very well is make network requests. Wait, no, that's netcat.


That seems overly harsh. It's a cool utility, and the author even shows it being used with pipes in numerous ways. I'm just arguing that including non-essential features is a trade-off. Often it's just complexity for ease of use, but I think this one has some possible security implications, and what you get for that may not be worthwhile.


Harsh because it became negative about the person, rather than the tool. Being immediately negative and critical of the tool is already quite harsh in the real world, but engineers (and especially HN) have a tendency to do this and find it normal. I don't know why, but that horse seems out of the barn. But a glib dismissal of an emotional being's knowledge after they make themselves vulnerable is just that: harsh.

I appreciate your relentless cutting through the bullshit and fluff attitude. I get it, and I do see the value: say it like it is, don't sugar coat it. But in this case, you cut right down to and through a person's flesh. It's different when you talk about established culture or faceless companies, from when you talk about a human being who is literally reading this right now.

Edit: just to add, it seems to me, your attitude is precisely what would make you a good engineer. Bam, here's the problem, fix this, the system is better. It wasn't intended as personal, I don't think. But yeah, at the end of the day, we're all vulnerable emotional beings who crave love and affection. We can be honest and clear, and still a little supportive and loving at the same time :)


> Please... Do you really think that upon reading my comment the author going to assume the foetal position and cry?

I'm not sure about the GP here, but my original reply was not because I thought you were hurting someone's feelings, but because ultimately it seemed inaccurate. I looked through the project README before commenting, and what was obvious is that the author does know how to use the command line, so I tried to keep my comments relevant to the point.

The README contains aliases, redirection (both truncating and appending), multiple pipes within a single command line, and ultimately, within a reproduction of the output of the --help flag an example of piping curl to this command.

Ultimately I found your original comment inaccurate. I find comments that are style over substance, especially when that style is used to denigrate an individual, rarely useful.

> Children in my school used to use the phrase "that's harsh!" upon certain remarks from other children. And do you know which part cut the deepest? It was the "that's harsh". Why? Because it is a double attack on me. It both acknowledges that was said is true and adds that I am too weak to handle the attack myself.

Saying "That's harsh!" doesn't itself imply that the statement is accurate. Just the opposite, in fact, since it's an acknowledgement that the person went too far. The delivery, on the other hand, can change the meaning entirely, and I think that's what you are referring to.

That said, I think you're mistaken to equate my use of it to this past experience of yours. I obviously wasn't saying it in a way like "Ha, sick burn!", but instead "I think you've overstepped in a way that hurts your point, if your point is valid at all".

> Are you projecting your insecurities on the author?

It's interesting that you included this, as I think you're perhaps projecting some insecurities from your own past onto some of the statements being used here. At a minimum, your own words have brought that possibility into the discussion.

Author of the tool here.

I can definitely see why you might think that, but I don't think it's fair to bring my presumed lack of understanding of the command line up as a reason; especially when some of the documentation[0] is a bit of a pipe party.

I've addressed this a little in another comment already[1], but the truth is that the HTTP client functionality only really exists for the sake of a few users that I heard from directly. curl being a default on Windows now[2] may be enough that this feature can be removed. Perhaps the feature should never have existed, but now that it does it's difficult to justify removing it - you never know whose workflow you might break[3].

[0] - https://github.com/tomnomnom/gron/blob/master/ADVANCED.mkd

[1] - https://news.ycombinator.com/item?id=16733361

[2] - https://blogs.technet.microsoft.com/virtualization/2017/12/1...

[3] - https://xkcd.com/1172/

Fair enough. Might be worth changing the docs to use curl, and noting that the built-in simple HTTP client is a convenience that exists for limited platforms.

If you use jq[0] and are wondering why Gron, the answer is at the very bottom of the readme:

jq is awesome, and a lot more powerful than gron, but with that power comes complexity. gron aims to make it easier to use the tools you already know, like grep and sed. gron's primary purpose is to make it easy to find the path to a value in a deeply nested JSON blob when you don't already know the structure; much of jq's power is unlocked only once you know that structure.

[0] - https://stedolan.github.io/jq/

    $ jq -c tostream <<<'{"a":[{"b":2}]}'
However, filtering that and then reconstructing JSON from that is... not possible at this time:

    $ jq -c tostream <<<'{"a":[{"b":2}]}'|jq -crn 'fromstream(inputs)'
    $ jq -c tostream <<<'{"a":[{"b":2}]}'|grep b|jq -crn 'fromstream(inputs)'

The reason is that tostream and fromstream can handle multiple top-level JSON texts (since jq normally does too), but that introduces an ambiguity, which is resolved by emitting a sort of object terminator. Filtering tostream's output with grep loses the terminators, and so fromstream cannot operate normally.

But it should be possible to define a function that does allow this, by, e.g., requiring just one top-level JSON text.
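As a sketch of that idea in Python (illustrative only, not jq's actual fromstream): if you assume a single top-level document, the leaf (path, value) pairs alone are enough to rebuild it, with no terminators, as long as array indices arrive in order:

```python
def from_pairs(pairs):
    """Rebuild one nested JSON value from (path, leaf value) pairs.

    Assumes a single top-level document and in-order array indices,
    so no stream terminators are needed.
    """
    root = None
    for path, value in pairs:
        if root is None:
            # The first path component tells us whether the root is an array.
            root = [] if isinstance(path[0], int) else {}
        node = root
        for key, nxt in zip(path, path[1:]):
            child = [] if isinstance(nxt, int) else {}
            if isinstance(node, list):
                if key == len(node):  # next in-order index: create the child
                    node.append(child)
            elif key not in node:
                node[key] = child
            node = node[key]
        if isinstance(node, list):
            node.append(value)
        else:
            node[path[-1]] = value
    return root
```

With that relaxed contract, grepping away some leaves just yields a smaller document instead of a parse failure.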

The other thing is that a path-based encoding that does not require quotes and commas would be handier -- tostream's output is itself JSON, so it's not shell-friendly. This is gron's brilliant innovation: it's got a path-based encoding of JSON that is easy to deal with in a shell script. (Mind you, I'm not sure that using brackets to denote array indices is all that easy to use, but the need to disambiguate object keys that look like numbers is critical. Also, there's an ambiguity as to keys that have embedded periods ('.') in them. And lastly, even gron can't shake off the string quotes for values.) That jq has the builtin functionality needed to do the same is not good enough if it doesn't actually do it out of the box.

It occurs to me that the way to get rid of quotes in string values is to not include the quotes but print the actual string with newlines (and maybe other characters, like double-quotes) escaped.
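In Python terms that idea is just JSON string escaping with the delimiting quotes dropped; a one-line sketch (the helper name is mine):

```python
import json

def raw_value(s: str) -> str:
    """JSON-escape a string (newlines, quotes, backslashes) but drop the
    surrounding double quotes, so the value prints on one line, unquoted."""
    return json.dumps(s)[1:-1]
```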

And the way to get rid of ambiguity regarding object keys that contain periods or " = " (and also square brackets) is to escape them: ".." and " == " or similarly.


    .foo.bar[0].baz == ..blah = this is a\ntwo-line string
where the last key in the path is "baz = .blah".

Also, " = " is a bit annoying. I'd prefer ": ":

    .foo.bar[0].baz: this is a\ntwo-line string
The quoting rule for the special chars in keys can then be generic: double them.

    .foo.bar[0].baz[[5]]:: ..blah: this is a\ntwo-line string
Here the last key in the path is "baz[5]: .blah". Mind you, this is still not trivial to deal with in a shell script, so perhaps we need some other escaping mechanism -- one that doesn't reuse the escaped characters, such as \u escaping.
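A quick sketch of that doubling rule (purely illustrative; gron itself quotes such keys rather than escaping them):

```python
SPECIALS = ".[]:"

def escape_key(key: str) -> str:
    """Double each special character so the path stays unambiguous."""
    return "".join(ch * 2 if ch in SPECIALS else ch for ch in key)

def unescape_key(escaped: str) -> str:
    """Collapse doubled special characters back to singles."""
    out, i = [], 0
    while i < len(escaped):
        ch = escaped[i]
        if ch in SPECIALS and i + 1 < len(escaped) and escaped[i + 1] == ch:
            i += 1  # skip the duplicate half of the pair
        out.append(ch)
        i += 1
    return "".join(out)
```

The round trip reproduces the example above: "baz[5]: .blah" escapes to "baz[[5]]:: ..blah" and back.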

What exactly was the benefit of removing quote characters? It does allow for easier grepping in some instances, but as you already covered, the brackets from array notation generally need to be quoted as well. I think you're better off just assuming the need for single quotes when grepping this output, since the fact that the output is valid JavaScript is very useful, IMO (especially with autovivification in JS), and losing that to make it slightly easier to search is a regression in my eyes.

But maybe I'm missing the benefit you're seeing, and it's not about searching?

First off, you'd still have to escape newlines, and probably keep all other escapes required by JSON. But then the quotes would be unnecessary, thus wasteful. In particular, if I wanted to print a raw string (with escapes) at a particular path I could first use grep(1) to extract that path, then I'd have to write a fairly complex sed(1) command to first remove the path, then remove the quotes, whereas I could otherwise use grep(1) and cut(1) only.

Mind you, I'm sticking to jq, as I know it really well. But I'm thinking of other users here. I think the value of a path-based transformation of JSON is ease of use, which motivates me to think about making it even easier to use, such as by removing those quotes.

> then I'd have to write a fairly complex sed(1) command to first remove the path

No need for sed for the path, use cut for that as well. cut -d'=' -f2- will remove the path (but leave a space).

In the end, you can accomplish it with the following, which I think is fairly easy:

  echo 'json[0].foo.bar.baz = "some string";' | cut -d'=' -f2- | sed -e 's/^\s*"//' -e 's/";$//'
For me, it's a toss up whether I would use that or Perl, since chances are I'm doing it as a first step in some other process, and I can just continue on in Perl for the rest of the process anyway.

  echo 'json[0].foo.bar.baz = "some string";' | perl -pE 's/^.*?"//; s/";$//;'
I find keeping the output as valid JS extremely useful though, since I can just paste a grepped entry into a developer console to get a valid object to play with on a page. That's cutting out a pipe to a js prettifier, pipe to less, search for identifying text, and careful cut and paste to get the enclosing block of text for what ends up being a semi-common action for me. On the other hand, I can get raw strings, but barely ever have need of that, and could fairly easily make an alias for that if it became common.

A single grep and cut would be simpler. You've proved my point :)

> Also, " = " is a bit annoying. I'd prefer ": ":

    .foo.bar[0].baz: this is a\ntwo-line string

The output of gron is perfectly valid JavaScript. It wouldn't be that way with `:` as the key-value delimiter.

I see. And, indeed, it's perfectly valid jq as well...

Mind you, one should not eval code to parse data. So I count this as a minus.

And this sounds awesome; I'll definitely test Gron. I love jq, but oh boy, how complex it is. I find its syntax gets very confusing very quickly as soon as you try to be a little fancy.

It's a full-blown, dynamically-typed functional programming language. Well, not quite full-blown: it's missing closures of indefinite extent, but still.

I wrote a tool called jsonsmash[0][1] that's meant for a more exploratory view of JSON files. It basically exposes the data in a minishell, complete with `ls` (with a ton of the standard flags), `cd`, `pwd`, `cat` (which outputs in YAML), and some others. My main use case was reading JSON files that were far too big to load in standard editors (210 MB+), and for that it has worked great.

[0] https://www.npmjs.com/package/jsonsmash

[1] https://blog.tedivm.com/open-source/2017/05/introducing-json...

This sounds awfully similar to Augeas CLI. Not that I endorse Augeas in any capacity...

Tools like this sit right at the very interesting threshold where writing your own tool can be less mental effort than discovering, learning, and keeping track of these small utilities separately.

I've written a similar tool in Python, for both JSON and XML. The JSON version especially was dead simple: it probably fits on a single screen and took 15 minutes to write and test. Sure, it didn't have any "features", but it does the job of letting me grep JSON.
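For a sense of scale, a gron-like flattener really does fit on one screen. This is a sketch, not gron itself (nor the script mentioned above); in particular, it only approximates gron's rule for when a key can be written unquoted:

```python
import json
import re

IDENT = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")  # rough JS identifier test

def flatten(value, path="json"):
    """Yield gron-style 'path = value;' lines for a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for k, v in value.items():
            # Dot notation for identifier-like keys, bracket-quoting otherwise.
            step = f".{k}" if IDENT.match(k) else f"[{json.dumps(k)}]"
            yield from flatten(v, path + step)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for i, v in enumerate(value):
            yield from flatten(v, f"{path}[{i}]")
    else:
        yield f"{path} = {json.dumps(value)};"
```

Joining the lines with newlines and piping through grep then gives the path-to-value view being discussed.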

Gron is probably 10x more versatile and actually comes with useful features but I'd really have to have pressing needs to do transformations of JSON on a regular basis to switch over.

The same applies to libraries in programming languages. There is a very vague threshold, depending on the expressiveness of the language and operating environment as well as the hardness of the problem itself, where it either makes sense to write your own library or reuse an existing one.

If you'd like to see more tools for dealing with structured text, take a look at https://github.com/dbohdan/structured-text-tools, a pretty nice list.

There's some prior art; namely jsonpipe/jsonunpipe:


Alternative: use ogrep (https://github.com/kriomant/ogrep-rs / https://github.com/kriomant/ogrep) on pretty-printed JSON.

I find jq very useful for tasks like this. But it's more awk for JSON than grep for JSON.

And rq has even easier syntax.

I... really don't agree. I had to use rq because I was working with a TOML file, and I could not for the life of me figure out even the basic things I was trying to do. The documentation was sparse and unclear compared to jq's relatively concise but clear documentation.

In the end I wound up just using rq to convert the TOML to JSON so I could use jq on it.

It was a while ago so I suppose it's possible I just used it too early in its life or something though.



I'd never heard of rq, so thanks to the person above for that. I also use jp from JMESPath. I like it because both the Azure and AWS CLIs use the JMESPath way of dealing with JSON, so filtering between two different cloud providers is at least a _little_ easier when hunting needles in haystacks.

For some compare and contrast between jsonpath, jq, and jp, these were for getting AWS EC2 instance IDs:

  cat foo.json | jsonpath -p $.InstanceProfiles.[*].RoleId
  cat foo.json | jq .InstanceProfiles[].Roles[].RoleId
  cat foo.json | jp InstanceProfiles[].InstanceProfileId

apologies to mobile users!

I have written a similar tool called catj [1]. I mostly use it when I need to construct JSON expressions when working with jq.

[1] https://github.com/soheilpro/catj

A few years ago I made a Python script that does the same thing, minus the reverse mode of converting the assignments back into JSON: https://github.com/xenomachina/jsflat

I've found flattening JSON in this way not only useful for line based tools like grep, but also for understanding unfamiliar JSON. Sometimes it's nice to be able to see the whole path down to the value you're looking at.

Thanks for this! "Grep for absolute path" is the use-case that the otherwise awesome `jq` doesn't address well. I've found myself having to iteratively drill down to find the field that has the data I need. One of those "eh, I'll automate this [poorly] someday..." things. :D

If you’re stuck on a foreign box or don’t want to install a new tool:

   cat xyz | python -mjson.tool | grep foo

Python's json.tool is a pretty-printer. It's not anything like gron.

(json_pp is another pretty-printer that is likely to be already on your system.)

This assumes that you have python installed ;)

I might have chosen a name that sounds less like Cron, but regardless, I can see how Gron would be useful.

    json.Host = "headers.jsontest.com";
    json["User-Agent"] = "curl/7.43.0";
Why not use the simple notation for everything without a '.' in the key?

Author of the tool here.

The output is designed to be valid JavaScript, which doesn't allow certain characters in unquoted object keys, like the dash in User-Agent.

Using JavaScript's rules for quoting keys makes it a lot easier to specify the grammar (and therefore write the parser); and makes it trivial to 'parse' the output using JavaScript should you want to.

There must be _some_ rules in place for when to quote the key (e.g. when there is a dot, equals sign, square bracket, etc. in the key name), so I see no reason to adopt something custom and potentially error-prone when a known-good set of rules already exists.

Hope that answers your question well enough!

Looks like I have a new Chocolatey package to create and push up when I get home tonight.
