> Conform to tools that share a similar niche. If you’re using Rust to make a fast alternative to popular coreutils: model its behavior, help-text, and man pages after `ripgrep` and `fd`.
I'm surprised by this. Why wouldn't you model the behaviour after grep and find? The fact that ripgrep uses a different regular expression syntax than grep is my main problem with it; if there were a utility with the functionality of ripgrep, but the RE syntax of grep, I would switch over in a heartbeat.
What syntax is in grep that you miss from ripgrep?
Anyway, there are a lot of different ways to interpret the OP here. Homing in on "use the same regex syntax as grep" seems strange. Maybe they just mean: if you want to see how man pages are produced in Rust programs, then take a look at how these programs do it. Rather than something about the specific content that should go in them.
Notice also that the OP started this thought with "alternative to coreutils" program. Rather than a "clone of a coreutils program."
It's not that anything is missing; it's just different for no clear reason, and that means remembering different syntaxes depending on which tool I'm using at the moment. The one that bites me most: GNU grep and git grep use `\<` and `\>` for word boundaries, while ripgrep rejects those with "error: unrecognized escape sequence" and I have to use `\b` instead. (Note: GNU grep and git grep do support `\b` too.)
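To illustrate the equivalence being discussed: GNU's `\<` and `\>` anchor the start and end of a word respectively, while Perl-style engines offer only the two-sided `\b`. A quick sketch in Python, whose `re` module shares the `\b` syntax (the sample text is made up for illustration):

```python
import re

text = "cat catalog concat concatenate"

# \b before the pattern plays the role of GNU's \< (start-of-word anchor)
starts = re.findall(r"\bcat\w*", text)   # words beginning with "cat"

# \b after the pattern plays the role of GNU's \> (end-of-word anchor)
ends = re.findall(r"\w*cat\b", text)     # words ending with "cat"

print(starts)  # ['cat', 'catalog']
print(ends)    # ['cat', 'concat']
```

So every `\<` or `\>` can be rewritten with `\b`; the GNU operators are just more explicit about which edge of the word you mean.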
> Maybe they just mean: if you want to see how man pages are produced in Rust programs, then take a look at how these programs do it.
I do not think that is what they mean. A man page looks the same to the end user no matter what tools are used to produce it.
> Notice also that the OP started this thought with "alternative to coreutils" program.
The start of the thought was "Conform to tools that share a similar niche." ripgrep and fd being implemented in Rust are details that are not relevant to an end user. For ripgrep, the "tools that share a similar niche" would not be "tools that are implemented in Rust", it would be "tools that search in files using regular expressions", and I would be happier with a tool that matches the other tools that I use. So actually I completely agree with the author's principle, I just think it was a bad example.
> I do not think that is what they mean. A man page looks the same to the end user no matter what tools are used to produce it.
And you can look to ripgrep to learn how to produce one! And in particular, how to do it without duplicating docs.
In any case, my point is that maybe there is a more expansive viewpoint here that makes sense, instead of homing in on something hyper-specific.
> It's nothing missing, it's just different for no clear reason, and that means remembering different syntaxes depending on which tool I am using at the moment.
Right, so you want a POSIX compatible tool because POSIX is what you're used to and you don't want to use any other interface. Which is fine. Keep using the old stuff. I don't see any good reason to exactly imitate POSIX when I can go build something that I think is better without being constrained by POSIX. For example, I think just having \b is better than having both \< and \>.
> I just think it was a bad example.
I don't. There's a lot that tools like ripgrep and fd do that the old tools don't do that turn it into its own niche. Like enabling "smart" filtering by default and other things that many consider to be quality of life improvements. So if you're in that niche, then looking to those tools for what they do is a good idea.
> Right, so you want a POSIX compatible tool because POSIX is what you're used to and you don't want to use any other interface.
Well, I would want to use a single RE syntax if possible. It does not have to be POSIX BRE with GNU extensions but that's what is used by the other tools I use.
Looking closer, GNU grep, git grep, ripgrep all support PCRE. That might be an okay alternative. It would take some time and effort to switch over. I do not know yet whether it would be worth the effort, but getting used to a different syntax that's used by a wide number of tools seems more useful than getting used to a different syntax that's used by a single tool, no matter how useful the tool.
> Which is fine. Keep using the old stuff. I don't see any good reason to exactly imitate POSIX when I can go build something that I think is better without being constrained by POSIX. For example, I think just having \b is better than having both \< and \>.
To be clear, \< and \> are GNU extensions, not part of POSIX.
I can't really see how making \< and \> errors is better for users than supporting it in addition to \b. I can see an argument that it's not worth the cost to implement the feature, and that's of course one you're entirely within your rights to make, but not that the feature has a negative overall value.
For the rest, it seems like you and I have different ideas about what the author meant that are going to be impossible to clear up without input from the author, so I will suggest that we drop that unless and until we do get such input.
> Looking closer, GNU grep, git grep, ripgrep all support PCRE. That might be an okay alternative. It would take some time and effort to switch over. I do not know yet whether it would be worth the effort, but getting used to a different syntax that's used by a wide number of tools seems more useful than getting used to a different syntax that's used by a single tool, no matter how useful the tool.
Yes, that's true. You could do that.
I will say though that if you take PCRE, EREs and ripgrep's default regex engine, they all support largely the same syntax. There are differences of course (you've pointed out one already), but they're a lot more alike than they are different.
BREs are the oddball with their escaping rules. But if you know those rules, they can be quite useful in a context where most of what you're searching for is a literal with a few regex aspects. Vim's default regex syntax is like this, for example, and it is quite useful in that context.
And yes, greps tend to have a lot of extensions. So I guess what you're after is "GNU's particular extension of regex syntax." Idk, it just seems like if that's what you care most about, then just keep using GNU tooling. Otherwise, ripgrep's default regex engine is very, very similar to EREs and other Perl-inspired regex syntaxes. Which is intentional.
The \< and \> stuff might be something I implement eventually, but I've always found them strange. YMMV.
BRE vs ERE is really a non-issue for me: the way they're done in GNU's regular expression implementation, BRE and ERE are the same except for small differences in when you do and don't put \ in front of certain characters. GNU supports \< and \> in BRE and in ERE mode exactly the same way, and I do use ERE mode a lot as well.
I had a stab at implementing this to see how easy or difficult it would be. Coming from someone who doesn't write Rust: it seems to be pretty much trivial. A quick and dirty implementation is at <https://github.com/hvdijk/regex/commit/511c6485e49c22c79b47a...>. I guess I can use ripgrep with this patch for now to stop my own complaining.
> BRE and ERE are the same except for small differences in when you do and don't put \ in front of certain characters
I guess I'd argue that EREs and the regex crate syntax are the same except for small differences too. :P And I'd absolutely categorize "use \b instead of \< or \>" as a pretty small difference.
In any case, yes, I agree that adding \< and \> to the regex crate is not a difficult implementation challenge. To me, it's not "can we" but "should we." Another question is whether we need to also add the negation of each of \< and \> too, to match \b and \B. (I don't know of any commonly accepted syntax for the negation of \< or \>.)
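For reference, this is how `\b` and its negation `\B` pair up in a Perl-style engine (sketched in Python's `re`, which shares this syntax; the sample text is made up):

```python
import re

text = "concatenate cat educate"

# \b matches at a word boundary; \B matches wherever a boundary is absent.
whole = re.findall(r"\bcat\b", text)   # "cat" as a standalone word
inner = re.findall(r"\Bcat\B", text)   # "cat" strictly inside a word

print(whole)  # ['cat']
print(inner)  # ['cat', 'cat']  (inside "concatenate" and "educate")
```

`\<` and `\>` have no such established negated forms, which is exactly the asymmetry raised above.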
> And I'd absolutely categorize "use \b instead of \< or \>" as a pretty small difference.
I don't know, to me \b always felt like a bad idea. Start of string (^) and end of string ($) make sense as different symbols; I'm not aware of people asking for a single "boundary of string" symbol that acts as (^|$), nor of any realistic use case for it that isn't better written using plain old ^ or $. I feel the same way about \< and \>: in any place where I would otherwise have to write \b, I already know whether I want "start of word" or "end of word" and it won't make sense to allow the other, but \b forces that to be accepted too.
That said...
> Another question is whether we need to also add the negation of each of \< and \> too, to match \b and \B. (I don't know of any commonly accepted syntax for the negation of \< or \>.)
...the negation of word boundaries clearly has uses, but the negation of \< or \> would be a new invention that would add more incompatibilities between ripgrep and other implementations. As much as I'd be in favour of a negation of \< and \>, I'd be more in favour of something that also works in other implementations, which is the \B that you already have.
Just my personal views. I'm served either way by my locally patched version of ripgrep, I'll assume you know better what the rest of your users want.
They're not saying base your new version of grep on ripgrep, they're saying if you reimplement a core util then follow the standards of other tools in a similar spirit in that ecosystem.
Ripgrep and fd have re-implemented old, well established tools. If they've chosen to diverge from them in a common way then there's probably a very good reason for it, which is worth at least knowing about before doing something different.
All I care about is whether weird stdout output messes with scripts at all, and whether proper exit codes come out for consumption by scripts as well. A lot of these guidelines are doubly useful: they serve not only human readability but machine readability.
Beware of one thing: the strong point of CLIs over GUIs was, and still is, IPC, the ability to pipe I/O from one single-purpose program to another. Rich textual output for better human interaction means you also need a "quiet mode" for easy piping, something many forget.
For interactive use, though, the case for CLIs wanes quickly: document UIs, which some have called "2D CLIs", generally serve us far better; compare Emacs with the classic Unix shell. There, CLIs are just a tool to call "functions" and see their return values.
For inclusivity, also consider this: blind people who use screen readers might have trouble with document UIs, so they might prefer CLIs, but only CLIs designed to be audio-friendly; decorations and the like do not help.
For good processing of the output it makes sense to add a flag like "--json", otherwise people have to come up with their own parsing that makes assumptions of the output and breaks as soon as the output is changed slightly, e.g., by adding a new field.
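As a sketch of that idea (the tool name and fields here are made up for illustration), a CLI can keep a single internal data structure and choose the serialization only at the output edge, so adding a field later doesn't break JSON consumers:

```python
import json

def render(result, as_json=False):
    """Format a result either for humans or for machines."""
    if as_json:
        return json.dumps(result)  # stable, parseable; new fields are additive
    return f"{result['name']}: {result['lines']} lines"

stats = {"name": "example.txt", "lines": 42}  # stand-in data

print(render(stats))                # example.txt: 42 lines
print(render(stats, as_json=True))  # {"name": "example.txt", "lines": 42}
```

A script would then invoke the hypothetical `--json` flag and parse the output with `jq` or a JSON library, instead of scraping the human-oriented text.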
I hope not. CLIs are meant for interactive usage; quick passes with simple key-value output or a crafted regexp are far better than ANY pre-made DSL. JSON is not for humans, and CLIs are for humans.
The "integration" issue of CLIs is the classic Unix problem relative to Smalltalk systems, Lisp Machines, etc.: Unix chose to be a container ship full of individual containers instead of a single ecosystem where programs are just functions/methods of a common application. That issue can't be solved by bludgeoning ourselves with JSON, YAML, or anything else; it was already well solved by OS designs like the aforementioned systems. A single language, perhaps flexible enough to build new ones when needed (like Racket, though essentially any Lisp can do the same easily), but still an OS-framework-operating-environment acting as a single application inside which everything lives.
The reason we do not have that at scale is simply commercial: divide et impera is far more profitable than crafting unified systems where being open is the most natural, sometimes mandatory, way to go...
I’m personally not a fan of it, but it seems unfair to label it as “very wrong”. I prefer that people have the freedom to make design choices like this.
Would there be any value in creating a document spec for CLI help output? Once something existed, individual projects could adopt it, or not, but it would work toward the goal of a future CLI UX that supports something like NL2Bash (https://github.com/TellinaTool/nl2bash), where we can create natural-language interfaces to OS-based services.
> How “unique” is your tool’s output? Output should look as similar to other common utilities as possible, to reduce the learning curve. Keep it boring.
I actually disagree with this. When skimming through long logs looking for errors, I find it really useful that output from different tools looks different at a glance. It helps me narrow in on what I'm looking for.
I must say I thought this was an amazing critique. It was respectful, thoughtful, and well-structured.
Thank you, Rohan. You’ve got excellent points and great references too.
Much appreciated.