I've been frequently derided for demoing it, but I should write something about how to do similar tricks with PowerShell; I've found it easier to learn than jq, and it's an underrated tool for dealing with JSON.
I haven't used pwsh in a few years, but does `ConvertTo-Json` still have that terrible `-Depth` parameter that you have to remember to set so that it doesn't mangle your JSON? A quick web search indicates that it does.
So you have to remember to set `-Depth 100` for every invocation of `ConvertTo-Json`. You also can't set it higher because it has a hard-coded limit of 100, so hopefully you never deal with JSON that has more nesting than that.
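To make it concrete, a quick repro of the behaviour (this assumes the documented default of -Depth 2):

```powershell
# Default -Depth is 2, so anything nested deeper is silently flattened
# to its ToString() value (e.g. "System.Collections.Hashtable"):
@{ a = @{ b = @{ c = @{ d = 1 } } } } | ConvertTo-Json

# ...so you have to remember the flag on every call, and 100 is a hard cap:
@{ a = @{ b = @{ c = @{ d = 1 } } } } | ConvertTo-Json -Depth 100
```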
Yeesh. I like PS, but this one cmdlet's design has always baffled me.
I was wondering about that, because the docs say it uses Json.NET, which, while an excellent library, is being replaced with System.Text.Json in the BCL/FCL as far as I know.
Yep. Love powershell, disliked this decision, but now understand it: objects returned from native powershell commands have a loooooooooooooooooooot of nesting. Try:
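(A sketch of the kind of thing that's meant — even a plain directory listing serializes into a surprisingly big tree:)

```powershell
# FileInfo/DirectoryInfo objects drag in a lot of nested .NET state
# once you serialize more than a couple of levels deep:
(Get-ChildItem | ConvertTo-Json -Depth 4).Length
```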
In the words of Raymond Chen, adding a `-Depth` parameter is using a global solution to a local problem. If certain types should not be serialized beyond a certain depth, the config should be per type, not on the whole serialization.
Or leave it as it is and expect the user to map the complex values into hashtables. Eg in your example, make the user pipe the objects through `%{ @{ Name = $_.Name; Mode = $_.Mode; } }` first.
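Spelled out, assuming the example above was a Get-ChildItem listing:

```powershell
# Project each item down to a flat hashtable first, then serialize:
Get-ChildItem |
    ForEach-Object { @{ Name = $_.Name; Mode = $_.Mode } } |
    ConvertTo-Json
```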
In any case, I don't know about other PS users, but all my command lines that ended with ConvertTo-Json either started with ConvertFrom-Json (transforming an existing JSON file), started from hashtables (building a JSON file from scratch), or were a mix of the two. Therefore I always wanted everything to be serialized, and the `-Depth` parameter was always a nuisance.
Why? The memory usage of both a JSON string and its deserialized value grows only linearly as you nest things (unless you're doing something dumb).
By that logic they should also add -MaxArrayLength to make the parser stop parsing long JSON arrays, -MaxProperties to make it only parse object properties up to a limit, -MaxStringLength to make it only parse strings up to a certain length...
Again, I'm not saying there shouldn't be a default limit. I'm saying the limit should be per type, not for the whole serialization.
It can be something as trivial as "PSHashTable is allowed to serialize without limit. Every other type is limited to N levels, unless the user sets `$SERIALIZATION_LIMIT[System.Type]` to some other value." Or make the `-Depth` parameter a `Dictionary<System.Type, int>`.
It turns an object hierarchy into flat text. References in the object hierarchy can be circular, and the Depth parameter limits the infinite loop a circular reference would cause.
Detecting circular structures (sometimes just called graphs) is a solved problem. The Lisp reader and printer even serialize them correctly: #1=(42 . #1#) makes a pair of two items. The first item is 42 and the second is the pair itself.
It sounds like whoever implemented this just didn't know what they were doing.
Idk that's just not something I do with the shell. At work I process a lot of large files, but typically those require hand written code to use off heap or file/db backed collections (I work in the JVM).
Since PowerShell works on objects rather than text, you might just want to use grep for this.
Anyways, this should reproduce the output of grep in powershell:
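Something along these lines, with $pattern standing in for whatever you're searching for:

```powershell
# Recurse through the current directory and match each file's contents:
Get-ChildItem -Recurse | Select-String -SimpleMatch $pattern
```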
You will want to only search in files, so this doesn't output some lines of InputStream when there is a match in the actual path.
If $pattern is an actual regex you will want to drop -SimpleMatch.
In case you want some proper powershell objects you might want to pipe this into Get-Item.
Everything in powershell is an object, but sometimes the output format doesn't make it clear. Select-String basically returns a match object with some nested results, but the output formatting makes it awkward to see. Pipe it to
| format-list *
to investigate it, and you'll see a bunch of members. If what you want is just the matched string, you want to expand the `Matches` property of the object, then expand the `Value` property of each of those. Here the term 'expand' is key: if you just use `select` it'll return you the property, name and all. Use `-expand` to return the plain array. So something like:
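A rough sketch, reusing the Select-String pipeline from above:

```powershell
# End result: just the matched strings, as a plain array:
Get-ChildItem -Recurse -File |
    Select-String -Pattern $pattern |
    Select-Object -ExpandProperty Matches |
    Select-Object -ExpandProperty Value
```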
PowerShell is supported on Linux and macOS as well [1]
I completely agree with the parent that PowerShell blows the existing shell options out of the water. Having pipelines with objects rather than just text makes everything so much easier. Instead of spending an hour futzing with awk or regex or applications like jq to parse values from command results, you can just access what you want directly and get on with your work.
I'm glad you're productive in your CLI tool usage but just because it takes you an hour futzing with basic *nix tools doesn't mean it's the same for everyone else. Powershell is extremely cumbersome for me due to the sheer verbosity of each command and the apparent MO of "let's come up with a new name for everything" and "sensible defaults? what's that?"
PowerShell is, in my opinion, quite discoverable and usable with a few commands to find your footing. I don't memorize a lot, because it's so well-structured and intuitive for the most part that once you get the basic structure it's easy to hack around with the docs.
Three commands that are useful to memorize, though, (particularly if you're having trouble remembering names) are Get-Command, Get-Alias (also with -Definition), and Get-Member. Get-Command gets you info about a command, like if it's an alias or not, and the path if it's a unix command. Get-Alias shows you all the active aliases, and Get-Alias -Definition shows you the active aliases for a given command. Get-Member shows you all the members of an object which can help you with Select-Object and Where-Object and so on.
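For example:

```powershell
Get-Command ls                        # what 'ls' is actually bound to (an alias on Windows)
Get-Alias -Definition Get-ChildItem   # which aliases point at Get-ChildItem
Get-ChildItem | Get-Member            # what the output objects actually expose
```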
With tab completion you do not need aliases, but you do need to name your commands in a sane way; putting the meaningless words first like PowerShell does is really cumbersome for tabbing. Examples: Where-, Get-, Remove-, Export-, From-, Select-, Invoke-
FWIW I'm talking about the CLI; I have little experience with menu completion. But how does menu completion solve the Get-<TAB> problem? It will give too many hits, and parsing them will take too long. Menu completion in IntelliJ and its ilk is more about discoverability than fast typing, since there you can take your time parsing the output.
I just don't want to have to type that much physically, let alone the verbose commands and slightly odd syntax you often have to memorize. I know some love it, but I'm much happier just using bash everywhere, even though msys/git bash on Windows has its own quirks. WSL ftw though.
Yes, awk and jq “just work” for me: whether it’s because I’ve trained myself through constant usage or whatever, I’ve yet to understand the supposed benefits of a richer IPC mechanism than byte streams.
You definitely should! Is it alright if I copy this comment onto my blog underneath this post in the "comments" section?
If you do write such a post make sure to let me know so I can link it from mine! You could write a response "here's how I'd do all the stuff in that post in Powershell with no extra tools" that would be very cool :)
I needed this tutorial 6 months ago (and 6 months before that, and 6 months before that). :D Highly recommend looking at and maybe including "gron"[1] in this as a very nice complement to jq. It fills in some use cases in a very straightforward way that are pretty cumbersome in jq, such as finding a field deeply nested in an optional parent.
Earlier today I was looking at some deeply nested structures which had leaf nodes with a field named "fileURL", and values that were mostly "https://" but some were "http://". I needed to see how many, etc.
And later I had to do the same for other fields that ended in URL (websiteURL, etc.)
Anyway, gron made it simpler to get a quick summary of what I needed because it represents every node in the tree as a separate line that has the path in it, which is perfect for grep.
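For example (data.json standing in for the file I was digging through):

```bash
# every *URL leaf, wherever it is nested, one assignment per line:
gron data.json | grep 'URL = '

# quick tally of http:// vs https:// values:
gron data.json | grep 'URL = ' | grep -oE '"https?://' | sort | uniq -c
```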
gron is great! And because the 80% use case of "gron -> grep -> gron -u to turn it back into JSON" is so common, I use these two convenience wrappers that make it easier. I called them grop because it's gron + grep:
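Roughly along these lines — the second name and the argument handling here are guesses:

```bash
# gron + grep:
grop()  { gron "$1" | grep "$2"; }
# ...and the same thing turned back into JSON with gron's --ungron/-u flag:
gropu() { gron "$1" | grep "$2" | gron -u; }
```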
Of course, although I wouldn't be surprised if a jq expert comes along and gives an easy way to deal with deeply nested children of an optional parent, making my comment moot. :D
Ha, I never let that stop me! You can do some of the stuff in my post other ways, like using kubectl's JSONPath output (`-o jsonpath=...`), but 1. I prefer learning jq rather than the idiosyncratic syntax/flags of each tool & 2. let 1000 flowers bloom. I figure it's OK if there's more than one way to do something!
I love the idea of "JSON lines", which is implicitly here in jq. You can convert a giant JSON array of objects to "JSON lines" and back with jq.
A lot of tools can produce them, and then you can mix and match jq, plain Unix grep, head, etc., because each line of your stream is guaranteed to be a complete, parsable JSON value.
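For example (file names and the grep pattern are placeholders):

```bash
# array of objects -> one compact JSON object per line:
jq -c '.[]' big-array.json > records.jsonl

# now ordinary line tools work per record:
grep '"type":"error"' records.jsonl | head -5

# ...and back into a single JSON array:
jq -s '.' records.jsonl > filtered.json
```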
Back in the day I loaded a lot of data into PostgreSQL and Elasticsearch after preprocessing it with a very simple but powerful chain of CSV parsers (I used CSVfix), jq, sort, grep, etc.
I found the most value in some of the additional tricks shown in the article (like the “pretty print JSON stored in the clipboard” one) over the introduction/tutorial aspect; there are hundreds of these “look, I figured out how to do a handful of interesting things in jq, so let me share with you all” tutorials floating around, and most of the basics are redundant among them by now.
If jq’s documentation had some basic tutorials it would probably be a better authoritative source for this information.
Until then, as said, thanks for preparing and sharing this!
> But, this prints all 317 resource types on a new line, not super helpful.
> So, let's group by and count:
> cat 1396-Ledner.json | jq '.entry[].resource.resourceType' | sort | uniq -c | sort -nr

> This gives me a list that looks like this:
This is so funny! We came up with some of the exact same combinations of tools (jq + sort, uniq, wc etc.). I mean, it makes sense so I shouldn't be surprised!
> We could select for this using jq, but promql already lets us filter by metric names so we'll do that instead
I think it would have been much better to do this via jq instead, for the sake of demonstration. This is, after all, an article on jq, not on promql...
The reason I was putting the querying into promql where possible is that it is (I assume) more efficient to do that filtering on the server & only return matching values, rather than returning "everything" & filtering locally. But your point stands, it's tangential here. Thanks for the feedback, this will definitely improve the post!
I think it makes sense to point out that it could be done better using just promql, but it's useful to show how to do it in just jq for those using other tools!
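For example, something like this keeps the filtering entirely in jq (the URL and label are placeholders; it assumes the standard Prometheus HTTP API response shape):

```bash
# client-side equivalent of the PromQL selector up{job="node"}:
curl -s 'http://localhost:9090/api/v1/query?query=up' |
  jq -r '.data.result[] | select(.metric.job == "node") | .value[1]'
```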
+1 for jq. It is so useful for mucking about with API responses. I used it to write a bash script that acts as a NAGIOS plugin for spotting changes in our Digital Ocean server and DNS configs - curl to pull from the API, jq to strip out the bits we don't care for, and sed to put actual newlines in instead of \n.
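The general shape of that kind of check — the endpoint and fields here are from the DigitalOcean domains API, but the rest is a guess:

```bash
# pull the DNS configs, keep only the bits we care about,
# and turn the \n escapes in zone_file into real newlines (GNU sed):
curl -s -H "Authorization: Bearer $DO_TOKEN" \
     'https://api.digitalocean.com/v2/domains' |
  jq '.domains[] | {name, zone_file}' |
  sed 's/\\n/\n/g'
```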
We had Cisco as a customer, and infrastructure change-monitoring was one of their infosec team's requirements... that was part of our solution.
Suggestions for a more memory-efficient JSON processor for resource constrained computers? All the ones I have tried were no better than using standard line-oriented UNIX utilities.
If you want to extract some data from JSON and wrangle it into another form (filter, count, dedupe etc), but _don't_ want to use a CLI, you might want to look at our drag and drop software for Windows and Mac: https://www.easydatatransform.com
JSON is so slow and inefficient. All that marshaling and unmarshaling is crazy. Binary formats will rise again.
This comment says it best I think, "On the other hand, if you only need 53 bits of your 64 bit numbers, and enjoy blowing CPU on ridiculously inefficient marshaling and unmarshaling steps, hey, [JSON parsing] is your funeral." - Source https://rachelbythebay.com/w/2019/07/21/reliability/
REST is so inefficient. All those bloated text fields.
HTTP is so inefficient. All those newline-delimited case-varying strings.
Logs are so inefficient. Error codes are so much shorter and precise.
Computers are here to make humans' lives easier. But it's not easier if we're forced to use tools that communicate in obscure ways we can't understand and later need translators for. If a human has to debug it, either it has to be translated into a human-language-like form, or it can just stay that way to begin with and save us the trouble of having to build translators into anything that outputs, extracts, transforms, or loads data.
JSON is actually a wonderful general-purpose data format. It's somewhat simple (compared to the alternatives), it's grokable, it's ubiquitous, and you can `cat` it and not destroy your terminal. But of course it shouldn't be used for everything. Actually one of the only things I find distasteful about JSON is the lack of a version indicator to allow extending it. But maybe this was its saving grace.
If "blowing CPU on JSON marshaling" is your big problem, boy, your business must be thriving.
I'm struggling to remember who, but I just recently saw a tech talk about microservices at scale by a fairly large company that had started adopting more efficient request/response formats because they discovered that in their tens of thousands of cores, some high single-digit percentage of all CPU cycles were spent marshaling and unmarshaling JSON.
Actually, some parsers support bigint/bigdecimal. I've used Jackson to parse JSON numbers into those data types. It's just that not all JSON parsers are created equal.
jq + bash + (some other CLI tools) make a very powerful toolset for simple websites that need to act on external data. I've been generating static websites from cron jobs using this sort of workflow for a while now and it works really well, mainly dashboards and such. I have one that doesn't use jq but instead sqlite3 to query a database and generate the static site content.
I usually mash together a 10-step pipeline of jq, grep (-v), vim, and xargs until I find the data I need. If I need a reliable processing pipeline, I can afford to spend 10 minutes googling how to use jq in more detail!
Shameless plug, but that was one of the reasons I created jello[0], which allows you to use pure python list/dict/comprehension, etc. syntax. Also, here is a blog post[1] explaining in more detail.
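If I'm remembering the syntax right, `_` is the parsed document, so it looks something like this:

```bash
# index into the JSON with plain Python subscripting:
echo '{"foo": {"bar": [1, 2, 3]}}' | jello '_["foo"]["bar"][1]'
# 2
```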
The KATE text editor mentions using this as well, but I've had so much trouble getting it to work. It has its own console in the editor where you can use JavaScript to modify things if you want. Something I still need to figure out, but it's just more complicated than I think it needs to be for some reason.
The first example:
Getting properties by name: