The YAML document from hell[1] needed three changes ("*.html", "*.png", "!.git") before yq would parse it at all. The "Norway problem" didn't appear, since `no` was converted to a quoted string. The unquoted strings in the "allow_postgres_versions" section, however, were not quoted by yq.
It's a bit hard to be sure because YAML is so insane, but I would argue that yq's behaviour is not incorrect?
It is erroring on `*.html`, which is reasonable because `.html` is an invalid anchor identifier, though the error is not that useful. The parsing of the unquoted version numbers also seems to be "correct", in that things that look like numbers are supposed to be parsed as numbers.
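For illustration, a minimal sketch of that number-parsing behaviour using Python's PyYAML (an assumption on my part; yq uses Go's YAML library, which resolves plain scalars the same way here):

```python
import yaml

# Unquoted tokens that look like numbers are resolved as numbers...
doc = yaml.safe_load("allow_postgres_versions: [9.5, 9.6, 10]")
print(doc["allow_postgres_versions"])  # [9.5, 9.6, 10] -- floats and an int, not strings

# ...while quoting forces them to stay strings.
doc = yaml.safe_load('allow_postgres_versions: ["9.5", "9.6", "10"]')
print(doc["allow_postgres_versions"])  # ['9.5', '9.6', '10']
```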
Templating YAML with a text templating language like Helm's is a terrible idea. Templating objects and serializing them to YAML (with the input also being YAML) I find quite nice: https://github.com/con2/emrichen
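A hypothetical sketch of the difference, using Python and PyYAML rather than Helm (the names and values here are made up for illustration): text templating pastes strings into a context where indentation is significant, while object templating cannot produce invalid YAML in the first place.

```python
import yaml

multiline = "line one\nline two"  # e.g. a certificate or script body

# Text templating: the second line lands at column 0 and breaks the document.
text = f"config:\n  cert: {multiline}\n"
try:
    yaml.safe_load(text)
except yaml.YAMLError:
    print("text-templated YAML failed to parse")

# Object templating: build the structure, then serialize -- always valid.
doc = yaml.safe_dump({"config": {"cert": multiline}})
assert yaml.safe_load(doc) == {"config": {"cert": multiline}}
```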
1. yq appears to accept the unquoted tag !.git for me, without change? (This is at least a correct parse syntactically, I think.)
2. The unquoted aliases (*.html, *.png) are invalid: that yq errors on them is the correct output. (I.e., if you want those as literal strings, they MUST be quoted.)
"The plain (unquoted) style has no identifying indicators and provides no form of escaping. It is therefore the most readable, most limited and most context sensitive style."
Which reads to me that they expected people to treat the plain style as a convenience that had notable downsides.
Edit: Note that this is the "plain" style, where there's also single and double quoted styles.
A reasonable choice (in contrast to "no" or "null"). YAML is simply not a format for all use cases. It's good enough for many tasks, and more readable than most other formats where it fits.
Maybe, but the fact that unquoted strings aren't the right choice for every use case (or, similarly, that structure-by-indentation isn't) doesn't show that: YAML supports both unquoted and quoted strings, and supports both the indent-sensitive "block style" and the delimiter-based "flow style".
To me, it doesn't fit where people with a less technical background can/must edit configuration files, or where there's a large risk of mixing up null and "null". For readability, it's fine, except for the ugly node reference syntax.
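The null vs "null" trap is easy to demonstrate with PyYAML (an assumption on my part; any YAML 1.1 parser behaves the same way):

```python
import yaml

# Unquoted null (and friends like ~, Null, NULL) becomes a real null...
print(yaml.safe_load("value: null"))    # {'value': None}

# ...and YAML 1.1's "Norway problem": unquoted no becomes a boolean.
print(yaml.safe_load("country: no"))    # {'country': False}

# Quoting is the only way to get the literal strings.
print(yaml.safe_load('value: "null"'))  # {'value': 'null'}
```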
gojq works great with YAML and reimplements jq itself in Go. I use gojq with --yaml-input or --yaml-output (or sometimes both), flip back and forth between JSON and YAML promiscuously, and get 100% jq UI compatibility, which helps because I use jq a lot. The first thing I looked at in yq is '-s', which is 'slurp' in jq but means something different in yq. Slightly altered semantics would just trip me up, and since there's a nearly straight bijection between YAML and JSON, you can do exactly the same things with either one (with some minor exceptions).
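That near-bijection is easy to see in a few lines of Python (assuming PyYAML is installed; the "minor exceptions" are things like YAML anchors, non-string keys, and multi-document streams):

```python
import json
import yaml

y = "name: demo\nports: [80, 443]\nenabled: true\n"

# YAML -> JSON: a plain YAML document maps cleanly onto JSON types...
data = yaml.safe_load(y)
j = json.dumps(data)
print(j)

# ...and back: the round trip loses nothing for JSON-compatible documents.
assert yaml.safe_load(yaml.safe_dump(json.loads(j))) == data
```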
gojq does not preserve key order or offer option to sort keys. Which is a non-starter for me. The majority of my jq use is to cleanup API responses for easier human review.
Those should be feature requests to gojq. There must be libraries for maps with sorted keys, or ones that preserve insertion order, that could be used in place of the standard map.
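In Python terms, the requested behaviour is roughly this (a sketch of the semantics, not gojq's implementation): dicts preserve insertion order, and `sort_keys` gives the jq `-S`-style sorted output.

```python
import json

data = json.loads('{"b": 1, "a": 2}')    # insertion order preserved by dict
print(json.dumps(data))                  # {"b": 1, "a": 2} -- key order kept
print(json.dumps(data, sort_keys=True))  # {"a": 2, "b": 1} -- keys sorted
```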
You might enjoy the httpie cli, which is better than curl for testing APIs for many reasons, one of which is automatic pretty printed and colorized text response output. https://httpie.io/cli
I recently dug into the docs of jq and was surprised to find that, contrary to my prior belief based on shallow experience with it, jq's expressions aren't merely a path syntax but apparently a Turing-complete language. I was blown away.
I wish MySQL and AWS could have figured out a way to adopt it, or a subset of it, rather than each using different ones. Now I have varying levels of knowledge for 4-5 variations of JSON path semantics/standards, it’s annoying.
I have a similar complaint but I'd guess there are (at least) two problems standing in the way of awscli getting jq language support: a python impl of the language with a license that awscli tolerates, and awscli being (in general) very conservative about changes. There are innumerable open issues about quality of life improvements that are "thank you for your input" and I'd expect that change to be similarly ignored
> I wish MySQL and AWS could have figured out a way to adopt it, or a subset of it, rather than each using different ones.
For AWS CLI, you can just output unfiltered JSON and pipe the results through jq; the filtering is client-side anyway, so it’s not like you are losing anything doing external filtering vs. filtering within the AWS CLI.
Looks very cool! I don't care so much about YAML, but I do a ton of processing of JSON and csv/tsv. Any word on the performance relative to jq and xsv [1]?
I am all for faster tools, but I am curious about your use case where the jq speed would be limiting. I only ever clean up a maximum of a few megabytes at a time, where the jq response is close enough to instant that it has never been a concern.
I typically work with multi-gigabyte JSON and CSV files. I just did a quick test with yq and it's only about 30% faster than just using Python's csv and json libraries. Whereas the same thing is 1,200% faster with jq and xsv. It's just my use-case though, so YMMV.
I personally find the yq tool from https://github.com/kislyuk/yq much more useful: it has all the same options and formats as `jq` (as it's really a wrapper around jq), rather than the `yq` in the OP here, where only partial functionality exists.
I think my dream is yq but with JSONata and an interactive editor at the command line.
I love yq and jq, but imo the core feature they’re missing is queryability. The problem is that afaia the jq syntax doesn’t support things like “where value = x”.
There’s another lesser known but imo better querylang called JSONata [0], which is basically a querying and reshaping syntax for structured data.
I’m working on this in my spare time but if any know of one that exists so that I don’t have to GO (lang) down a rabbit hole, please do share.
Okay, I must have forgotten this, now that you point it out. But that's nowhere near as elegant or compact as how JSONata handles it. The ideal tool probably lets you choose: XPath, jq, JSONata, etc.
Side rant: Every normal yaml processor I’ve tried struggles with CloudFormation. I end up using the cfn-flip command line program/Python module to deal with CFT Yaml.
Lately I have had to do a lot of flat file analysis and tools along these lines have been a godsend. Will check this out.
My go to lately has been csvq (https://mithrandie.github.io/csvq/). Really nice to be able run complicated selects right over a CSV file with no setup at all.
There is also another tool named yq that is Python based, and passes everything through jq. The Go-based yq is pretty awesome, but does have some limitations.
the python yq is by far my preferred utility. i love native jq, and the fact that it simply wraps it means everything just works, which is not the experience i've had with the tool mentioned in the topic
Looks cool, but disappointing it's written in Go. Go is fast, but not as fast as Rust or C. I'm sure with large streams, you probably can see the difference in time if it's written in Rust or Go.
As they say: always measure when it comes to performance. Unless it's a programming language that you don't like. Then you don't even have to run the program.
"I have a <very large> YAML file, and it takes yq <some long time> to parse, whereas <some workflow I use written in rust/c> takes <much less time>".
"Go is slower than Rust, the author should have written it in that".
It's likely that for most YAML documents you encounter, the difference between using Go and using Rust or C is negligible. -- Though if this isn't true, some numbers would be useful, too.
A comment like "Go is a bad language to use" is just a thought; but it's also a low-effort dismissal of something someone has put effort into, and of a tool that's quite useful.
[1] https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-fr...