
Show HN: Choose – An alternative to cut and sometimes awk - theryangeary
https://github.com/theryangeary/choose
======
earthboundkid
I think it is very cool that Rust has led to a renaissance of rewriting
classic Unix tools to make them fit more with current use. Unix was never
meant to stand still. It just happened that AT&T broke up and it took a while
for Linux to catch up, and by then people got used to the idea of a fixed set
of POSIX utilities. But their CLIs are often quite bad, and security was never
a consideration in the olden days, so it’s good to see them re-evaluated.

~~~
masklinn
> I think it is very cool that Rust has led to a renaissance of rewriting
> classic Unix tools to make them fit more with current use. Unix was never
> meant to stand still.

Technically it has not stood still: even the core tools have been getting extended,
usually incompatibly, in both GNU and BSD lineages. Though it's pretty funny
how much the rust community has been taken up by providing alternatives and
replacements for "classic" (POSIX) utilities.

~~~
ryi
While development has not stopped, there haven't really been many advances in
improving the syntax. Many of these tools are stuck in awkward, unintuitive
syntax, which is cumbersome unless you use them frequently. I feel like the
renaissance has lately been one of usability, which I personally really
appreciate. Obviously, hard-core daily users would disagree, but considering I
use `awk` at most once every 6 months, I hate that I need to spend 20 minutes
re-learning how to use it every single time, particularly for basic purposes.

------
oefrha
Hmm, so compared to cut this

1\. saves -f because it doesn’t support cut’s -b and -c modes (edit: actually
-c is supported, I just didn’t see it);

2\. Uses -f instead of -d, making it rather confusing for cut users;

3\. Uses : instead of - for range specifications;

4\. Offers an exclusive indexing mode;

5\. Misses a bunch of other cut features (assuming coreutils cut).

Not sure I see much appeal...

Edit: Another thing I missed: regex separator instead of just character list.

~~~
masklinn
> Not sure I see much appeal…

The appeal is the same as replacing grep with a fancier searcher:

1\. it has good and sensible defaults (field mode, also I'd have to check but
hopefully and _unlike cut_ it doesn't print the entire line when it's unhappy
with the selection you asked for, that's worse error handling than ed) (edit:
confirmed, if you give `choose` a nonsensical selection _it doesn't print
anything_ e.g. if you ask `cut` for columns 10-15 of data with 3 columns it's
going to print the source as-is, choose is properly going to print a bunch of
empty lines, that alone makes it better than cut)

2\. It works better on actual data, which is generally whitespace-separated
rather than tab-separated, meaning cut requires preprocessing before it'll do
anything of use

Can you massage cut or the data to fit? Yes, in the same way you can massage
grep or your data to fit. That you don't have to and the utility behaves
sensibly by default is appealing. This exact thing is one I've been thinking
about for some time now, I'm glad somebody else agreed and did the legwork.

------
xthetrfd
I can't understand why you are mentioning awk. Cut or choose cannot be
compared to awk, awk is a programming language.

Also I don't think that it's so much easier to use than cut. On the other hand
every *nix system has cut so if you make scripts with it they are portable.

~~~
BiteCode_dev
> I can't understand why you are mentioning awk. Cut or choose cannot be
> compared to awk, awk is a programming language.

Because 99% of awk IRL use is just as a fancier cut.

It's very rare someone even sets a variable using awk. If you do it, you are a
statistical rarity.

> Also I don't think that it's so much easier to use than cut. On the other
> hand every *nix system has cut so if you make scripts with it they are
> portable.

I, for one, never remember the syntax for cut. If "choose" gets a deb, I'll
use it: Python slicing is something familiar to me.

I don't care if cut is on every unix system: if I have the possibility to
install things on the machine, then I'll just install what I need. I have a
script for that. If I don't, I'll google/man/--help GNU commands as usual.

And as for writing shell scripts, I use Python anyway.

~~~
masklinn
> Because 99% of awk IRL use is just as a fancier cut.

You say "fancier", I say "working": since cut can't work on general whitespace
without a pre-processing phase (e.g. tr), it simply doesn't work for the vast
majority of the things I try to shove into it, and I pretty much always end up
using awk instead.

Choose means my awk use will fall down by 99% or so.
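
To illustrate the whitespace problem outside the shell, here is a minimal Python sketch. It assumes that splitting on one literal space is a fair stand-in for what `cut -d ' '` does with runs of spaces; the function names are made up for this example and come from neither tool.

```python
def split_single_delim(line, delim=" "):
    """Split on every occurrence of one delimiter character.

    Assumed analogue of cut -d ' ': consecutive delimiters
    produce empty fields between them.
    """
    return line.split(delim)


def split_whitespace_runs(line):
    """Split on runs of any whitespace, ignoring leading/trailing blanks."""
    return line.split()


line = "foo   bar   baz"
print(split_single_delim(line))     # ['foo', '', '', 'bar', '', '', 'baz']
print(split_whitespace_runs(line))  # ['foo', 'bar', 'baz']
```

With the single-delimiter split, "bar" ends up as the fourth field rather than the second, which is why a squeezing step such as `tr -s ' '` is needed before cut.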

~~~
BiteCode_dev
Agreed.

In fact, anybody promoting cut, please give me the cut version of:

    
    
        echo -e "foo   bar   baz" | choose -1 -2
    

It should work on an arbitrary number of spaces, and fields.

The oneliner is going to be... interesting.

Now you can do it with awk using:

    
    
       echo -e "foo bar baz" | awk '{ print $NF " " $(NF-1)}'
    

But it's neither easy to type, nor to remember.

Choose is what cut should have been.

~~~
errnoh
By using only basic functionality that's easy enough to remember I guess I'd
go with something like

`echo -e "foo bar baz" | tr -s ' ' | rev | cut -d ' ' -f 1-2 | rev | awk
'{print $2 " " $1}'`

Everything except the awk part is something that I use all the time and is
easy to type & remember.

To be honest I'd use `choose` if it was available everywhere, but for string
manipulation I can't justify using nonstandard tools since they aren't always
available.

Every now and then there are some new ones I actually start to use. For
example `ripgrep` mostly replaced `grep -R` for me some time ago, a lot of it
has to do with the fact that if `rg` is not found I can fallback to normal
grep and get the same result, just a bit slower.

I guess my point is that while I do appreciate innovation & making better
tooling, the hard part always is getting the tool where it's most needed.

------
asicsp
Not sure why it is even compared to awk instead of just cut. It could've been
introduced as cut-like command with regex input field separator. Or at least
not say things like:

>However, the awk command is not ideal for rapid shell use

And

>cut is far from ideal for rapid shell use, because of its confusing syntax

anything new is confusing until you learn enough to be comfortable

>ranges are just plain difficult to get right on the first try

and how does choose become easy to use with ':' character instead of '-'

Is this a typo or does inclusive/exclusive depend on whether first number is
specified?

>choose 2:5 # print everything from the 2nd to 5th item on the line,
_inclusive_ of the 5th

>choose :3 # print the beginning of the line to the 3rd item _exclusive_

~~~
BiteCode_dev
I heard those arguments when ag went out as an alternative to grep, and ffind
as an alternative to find.

But now, I install their successors, ripgrep and fdfind, on all my machines.
Including the windows ones.

~~~
oefrha
Tools like rg offer additional features and/or save you from tediously
specifying many options. This one gives you -f for free, that’s about it.
Detailed comparison:
[https://news.ycombinator.com/edit?id=23445931](https://news.ycombinator.com/edit?id=23445931)

~~~
BiteCode_dev
\- it supports regex separators. That's a great feature to me.

\- the default separator is "\s", like python's split(). Just for that I will
adopt it: not having to care about tabs/spaces/mixes is a much better
experience.

\- it has negative indexes, again like python. Getting the last field, or the
last nth field, is something common enough. I don't want to rewrite the thing
with a twisted double "rev" with proper index. And I don't want to have to
google it.

\- plus the syntax is just much easier to remember to me. When I use cut, I
always try: "echo 'foo bar baz' | cut 2", just to realize that I need to pass
'-f', then I do "cut -f 2", and get stumped, and google it, to then remember I
need to pass the delimiter explicitly even if it's a space.

\- it works the same on windows. I dual boot.

Compare:

    
    
        echo -e "foo bar baz" | choose -1
    

To:

    
    
        echo -e "foo bar baz" | rev | cut -d ' ' -f 1 | rev
    

cut is, to me, the opposite of a friendly API.

Something so basic in the Unix world should have sane defaults.

Defaults are not sane if I have to google them every other time.
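
For readers who do not know the Python behaviour referred to above, here is a small sketch of the two ideas: a regex separator and negative indexing for the last field. It only illustrates the Python side of the comparison; `split_fields` and `last_field` are hypothetical helper names, not part of choose.

```python
import re


def split_fields(line, sep_pattern=r"\s+"):
    """Split a line on a regular-expression separator.

    The default pattern treats any run of spaces or tabs as one separator.
    """
    return re.split(sep_pattern, line.strip())


def last_field(line, sep_pattern=r"\s+"):
    """Return the last field via a negative index, no double `rev` needed."""
    return split_fields(line, sep_pattern)[-1]


print(last_field("foo bar   baz"))         # baz
print(split_fields("a, b;c", r"[,;]\s*"))  # ['a', 'b', 'c']
```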

~~~
masklinn
One more great feature of choose which cut doesn't have: not returning garbage
output on garbage input.

If you give cut columns which don't exist, it's going to output the _entire_
source as-is.

------
tzs
Can it output fields in an order other than their input order? That's the one
thing I regularly wish cut could do. I would like the output of the second cut
below to be "3,1", not "1,3".

    
    
      $ echo "1,2,3,4,5" | cut -d , -f 1,3
      1,3
      $ echo "1,2,3,4,5" | cut -d , -f 3,1
      1,3

~~~
Klasiaster
Yes:

    
    
      $ echo "1,2,3,4,5" | choose -f , 2 0
      3 1
      $ echo "1,2,3,4,5" | choose -f , 2:0
      3 2 1
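
For comparison, the selection rules visible in this thread's examples (zero-based indexes, negative indexes counting from the end, inclusive ranges written with ":", and a descending range reversing the output) can be sketched in Python. This is an assumption-laden imitation of the observed behaviour, not the tool's implementation: `select` is a hypothetical name, open-ended ranges such as `:3` are deliberately not handled, and ranges are assumed to use non-negative indexes.

```python
def select(line, specs, sep=None):
    """Pick fields from `line` according to `specs`.

    Each spec is either a single index such as "2" or "-1", or an
    inclusive range "a:b" with non-negative bounds. A range with a > b
    is emitted in reverse order. Indexes that do not exist are skipped.
    With sep=None, fields are separated by runs of whitespace.
    """
    fields = line.split(sep)
    out = []
    for spec in specs:
        if ":" in spec:
            start, end = (int(x) for x in spec.split(":"))
            step = 1 if start <= end else -1
            indexes = range(start, end + step, step)
        else:
            indexes = [int(spec)]
        for i in indexes:
            # Skip anything outside the line instead of raising IndexError.
            if -len(fields) <= i < len(fields):
                out.append(fields[i])
    return " ".join(out)


print(select("1,2,3,4,5", ["2", "0"], ","))  # 3 1
print(select("1,2,3,4,5", ["2:0"], ","))     # 3 2 1
print(select("foo   bar   baz", ["-1", "-2"]))  # baz bar
```

Asking this sketch for fields that do not exist, for example `select("a b c", ["10:15"])`, returns an empty string rather than echoing the input.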
    

Note that the indexing starts with 0, "-d" is "-f", and a range is denoted by
":" instead of "-" which is used for indexing from the end.

------
typon
I love my coreutils replacements...not because they're in Rust but because
they're generally faster and easier to use. fd, rg, bat and now i shall use
choose! I _almost_ always have to look up awk's syntax but the defaults in
choose seem trivial to remember. Thanks for making this!

------
ilovetux
I am not sure why being zero-indexed is considered a feature. I have no
problem using a zero-indexed system, but I've never really thought of it as a
feature. Is there something I'm missing that makes zero-indexed systems
faster, easier to use or otherwise better than one-indexed system?

~~~
xxpor
There's a better reason than this that I'm forgetting, but never underestimate
the power of being the same as what people are already familiar with. Every
time I have to write lua or read some Matlab, the mental overhead of having to
remember everything is one-indexed is just incredibly annoying.

~~~
fwip
Anyone used to command-line tools is used to fields being 1-indexed.

awk uses $0 as the whole line, and $1 as the first field.

cut uses -f1 as the first field.

$1 is the first argument to a posix shell script.

\1 is the first matched reference in sed.

$1 is the first regex match in perl.

A command-line tool being 0-indexed breaks from expectation of what everybody
is used to using on the command line.

~~~
akritrime
Anyone is a generalization based on a narrow viewpoint. I would say I am
pretty used to command-line tools at this point, at least enough to be using
linux as a daily driver comfortably. And frankly I didn't know 1-indexed
fields were the norm, even though I knew $1 is the first argument to a posix
shell script (I always assumed $0 referred to the script or command itself).

~~~
fwip
That's fair, I overgeneralized.

------
Klasiaster
Inspired by a remark about Python's default split behavior in comment
[https://news.ycombinator.com/item?id=23446146](https://news.ycombinator.com/item?id=23446146)
I wrote a Python oneliner for field selection that works similarly to "choose" but
throws exceptions when the field cannot be found:

    
    
      $ echo " a    b c" | choose 1 2
      b c
      $ echo " a    b c" | python3 -c 'import sys; [print(f[1], f[2]) for line in sys.stdin if (f := line.split()) or True]'
      b c

------
BiteCode_dev
Someone should make a bundle installer with this, bat, fdfind and ripgrep. I
do enjoy those alternatives to GNU, and install as many as I can: they are
easier to use, usually faster, and just make more sense to my brain.

This pain is real: [https://xkcd.com/1168/](https://xkcd.com/1168/)

~~~
asicsp
There is
[https://github.com/uutils/coreutils](https://github.com/uutils/coreutils)
implemented in Rust

~~~
BiteCode_dev
Yes but their goal seems to be API compatible, which I understand the point
of, but is not useful to me.

------
jpxw
What does this do that cut can’t?

~~~
LeonB
This is a poor question as it invokes the Turing tarpit. (Why use any
language that is higher level than machine code?)

If it is more comfortable to use for some people then it’s a great invention.

------
tyingq
Very cool. An inverse mode to suppress matched fields might be a neat feature.

------
LockAndLol
Very nice. Good work.

