Useless Use of Cat Award (2000) (porkmail.org)
31 points by Alupis on May 28, 2020 | 42 comments



The ‘useless cat’ shtick is dumb. The Unix command line is based on connecting programs in pipelines. The first step of a pipeline like that is handing the file to a filtering/processing util (or whatever else). Like, if I decide to use awk instead of sed, I shouldn't have to dance around the ‘<file’, because the ‘<file’ is the first step and awk/sed is the second.
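
For instance (a sketch; file and pattern names are made up): with cat up front, swapping the first filter is a pure substitution, while the argument form makes the file name move along with the swap:

    # cat up front: sed -> awk is a drop-in replacement
    cat access.log | sed -n '/ERROR/p'
    cat access.log | awk '/ERROR/'

    # file as an argument: the file name has to move too
    sed -n '/ERROR/p' access.log
    awk '/ERROR/' access.log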

(Edit: another comment provides a different example: putting in ‘cat’ instead of ‘head’—or often, switching back and forth a couple times.)

The ‘useless cat’ bunch of dorks are just astoundingly oblivious to the actual interface of the system. Instead, their argument is the hilariously nitpicky ‘omg one more process, it's so slow’. Did running a ‘cat’ really take more time in '95 than fiddling around with the ‘<file’?


> Did running a ‘cat’ really take more time in '95 than fiddling around with the ‘<file’?

It does now for a sufficiently large file. And it's even more extreme when the recipient can do things with a file that it can't do with a pipe.

Which isn't to say it's worth bitching about the rest of the time.

Edited to add a silly example:

    $ time cat /dev/zero | tail -c +5000000000 | head -c 10 | xxd
    00000000: 0000 0000 0000 0000 0000                 ..........
    
    real 0m2.808s
    user 0m0.348s
    sys 0m4.849s
    $ time </dev/zero tail -c +5000000000 | head -c 10 | xxd
    00000000: 0000 0000 0000 0000 0000                 ..........
    
    real 0m0.006s
    user 0m0.008s
    sys 0m0.004s


Yeah, big files need all kinds of fun special attention, like ripgrep or zgrep.


<file is allowed to be at the front (for some reason it seems counterintuitive, but '<file grep foo' is valid).

Still, 'useless use of x' complaining doesn't make much sense to me. Unixy tools are generally used in pipelines, and using separate tools makes each step clearer and the pipeline easier to modify. I don't have to skim through pages of grep's (or whatever tool's) manual just to check whether it happens to reimplement wc -l.


> unixy tools are generally used in pipelines

Yeah, for this reason I won't write ‘<file’ at the front even though I've heard of it before. I'm in the Unix game for good command-line pipelines among other things, and bending over backwards to sidestep the pipeline isn't really in my plans.


<bucket hose | hose | hose

vs

hose <bucket | hose | hose


Something I like about putting my redirections at the front is that if I pull this out into a function there often is no redirection, or the redirection will take the form of an `exec` on an earlier line. In either case, keeping the redirection out of the pipeline proper makes that transformation easier.
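
Roughly what I mean (a sketch; the file name is made up):

    # interactive form: redirection up front, pipeline after it
    <input.txt grep error | sort | uniq -c

    # script form: the pipeline moves over verbatim, and the
    # redirection becomes an exec on an earlier line
    exec <input.txt
    grep error | sort | uniq -c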


First, we shouldn't be insulting people for having different opinions on the nuances of command line tools.

Second, I agree with you that "useless use of x" should not be viewed as dogma, and people shouldn't stress too much about it. Instead, these can be viewed as optimizations, or just details of how unix shells and tools work that can help inform your coding.

IMO, the best thing to do when writing shell scripts, regardless of skill level, is configure ShellCheck in your text editor's linter, and then do everything it tells you to do: https://github.com/koalaman/shellcheck/


It's no more "dumb" than pointing out to people they don't need to do something like:

`if (something == true) { return true; }`

It's about understanding what the system does, and why. It makes you a better systems administrator in cat's case, or a better programmer in the return true case.


> makes you a better systems administrator

Better because you move past the ‘<file’ every time you decide to correct the first command in the pipeline? Such better, wow.


I mean, if you're concerned about the keystrokes, learn your editing shortcuts. Jumping to the beginning of the line is typically one or two keystrokes.

But the parent obviously wasn't saying that the act of avoiding useless cats makes you better; it's the understanding behind it that drives the recognition of which cats are useless and why you might want to avoid them.

I'm... not sure about that claim, fwiw. On the one hand, I see a lot of people really mushy on the unix process model, what pipes are, how redirection works, etc. Firming that up is important, will lead you to be a better sysadmin and a better programmer if you work in a sufficiently unixy environment (and possibly otherwise), and spotting useless cats falls out of it.

On the other hand, caring about useless cats is not a necessary consequence. Caring might motivate understanding... but might just lead to overfitting a bunch of poorly understood patterns to avoid the label.

So I am skeptical. But you should respond to the claim actually made.


Indeed. Unix philosophy is many tools that each do one small thing well. Cat handles opening a file and sending its contents to standard output. I start pipelines with cat whenever possible, in case later on I have to drop in zcat or similar instead.

The other examples given in the article also demonstrate that they have completely failed to understand not only Unix philosophy but also basic readability. For example, preferring the cryptic `something | grep . >/dev/null && ...` over `something | grep '..*' | wc -l`.


I always used to wonder what kind of attitude has been making new software slower and slower; now I know. There are people who write Turbo Pascal, and there are people who insult others for pointing out waste and inefficiency in a program.

The former write software that stays relevant and leave behind a legacy. Not so sure about the latter. Granted, it is easy being the latter.


Here's why the suggestion (in Randall's “form letter”) to use `<file command` was bad in 1995, and is still bad today. In bash, if you type `<file co` and press TAB, you don't get command completion on `co`. You get filename completion. If you type `cat file | co` and press TAB, you get command completion.

(Bash has behaved this way for as long as I can remember, and I just verified it in bash 5.0.3 on a Pi.)


Here's why I often have command chains that start with cat:

    head -n999 $LOG | grep $WANT | cut $BYTESORFIELDS | sed $CHANGES
I build that iteratively, a pipe at a time, until I'm getting the right output. Then I replace "head -n999" with "cat". That saves me a lot of time, as the log file might be hundreds of MB or even GB in size, and working on the whole file when a small representative chunk will do is stupid.

Fewer places to change, all at the very front, makes it less likely that I'll make a mistake and put a param in the wrong spot. In that case, cat isn't useless: it was the minimal edit to go from test case to production case, and minimizing differences between testing and production is good engineering.
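
Concretely, the test-to-production swap touches only the very front of the line:

    # while iterating: a small representative chunk
    head -n999 $LOG | grep $WANT | cut $BYTESORFIELDS | sed $CHANGES

    # once the output looks right: swap in cat, touch nothing else
    cat $LOG | grep $WANT | cut $BYTESORFIELDS | sed $CHANGES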


A reasonable workflow. I suppose the counterargument is that you could replace "head -n999 $LOG |" with "<$LOG" instead.


I could, but then I'm replacing characters before $LOG, and a character (or two, depending on spacing) after $LOG as well. It's both conceptually and work-wise easier to think of it as "I take the first X lines of" versus "I take all of", and just replace those bits.

It's a very small bit of complexity, but when chaining things together, those small bits of complexity build on each other as well, so reducing them where you can reduces the complexity (and problems caused by it) overall.

It's the same reason I don't use grep -c usually, but instead always pipe to wc -l. There's less cognitive overhead in having one way to do it that always works, rather than recognizing when I'm ending some command or chain of commands with a grep and I can use a flag on grep to count instead. The "| wc -l" that are always at the end are much easier to notice and correctly remove than a -c flag to grep I might or might not use depending on the last command.
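
For instance (file name made up):

    grep -c ERROR app.log        # grep does the counting itself
    grep ERROR app.log | wc -l   # same number, but the counting step is
                                 # visible at the end and trivially removable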

Just like they say the best diet is the one you can keep doing, the best/most useful way to formulate your shell commands is whatever way provides you the most benefit. Don't let anyone change your workflow "because you're doing it wrong". At the same time, you should examine suggestions and try them out to see if you find them beneficial. It's a fine line to walk. I had a friend who recommended screen for years while I didn't use anything of the sort. I use tmux (an equivalent) religiously now, and I can't imagine going back to life without it, just from the ability to resume work on servers from lost shell connections or from different computers.


One could just as easily say this is AT LEAST a useless use of head and grep because one could replace both with a single POSIX command:

    awk "NR < 999 && /$WANT/" $LOG | ...
...but such pedantry should only be unleashed on someone pointing out a UUOC.


I used to reject the 'useless use of cat' crowd: I like to build my commands from left to right, and the original file should be leftmost, not an argument to the first transformation.

But I changed my mind. The correction for `cat file | grep something` is not `grep something file`, it is `< file grep something`. The file is still the beginning of the pipe, just as `cat file` and `< file` alone both dump the content of the file.

I still won't knock people's use of cat, but I'm using it less myself.


Amusing, though I wish there were more/better examples.

I not infrequently do something like this when experimentally/interactively building up a pipeline:

    cat somefile | foo | bar | baz
Even though I could say

    foo < somefile | bar | baz
the former is clearer, and easier to edit if I'm shuffling things around a lot in real time.

And honestly, on modern hardware, worrying about an extra 'cat' is almost always a foolish optimization, unless you're doing it a zillion times.

I can probably make up other reasons why the 'cat' might be useful. For example, it makes it easier to separate the case of errors due to reading the file versus other errors, if that matters.

Also, 'cat' makes a nice "null" pipeline element if you need to slip one in for some reason.

And so on.

Perhaps one of the best "uses" of these "useless" commands is as a marker of the author's experience. When I'm working with someone who writes these sorts of things, I can adjust my expectations downward. :-)


> Also, 'cat' makes a nice "null" pipeline element if you need to slip one in for some reason.

It's a great way to get another 64KB kernel buffer in your command pipeline. Very useful for buffering your data stream when you have an impedance mismatch between the priority/scheduling/io/cpu time of the input and output.
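
Something like this (a sketch; the producer and consumer names are stand-ins):

    # the extra cat brings its own pipe buffer along, smoothing out
    # a bursty producer feeding a slower consumer
    bursty_producer | cat | slow_consumer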


Your second form can be `<somefile foo | bar | baz`, which preserves the order of your first example. And `<somefile` on its own behaves as `cat somefile` (in my shell at least).


> And `<somefile` on its own behaves as `cat somefile` (in my shell at least).

zsh?

It doesn't work in sh, bash, csh, tcsh, or fish.


In short, you seem to be right.

Whilst fish will complain and tell you that it wants a command and not a redirection first, it should work in POSIX shells.

    sh -c '< tmp.c tail'
Testing the above command, it works:

With Busybox 1.31.0. (Probably the least compliant.)

GNU bash, version 5.0.16.

However, dropping the command being redirected into doesn't result in output to stdout. Redirecting stdin to stdout doesn't seem to work either, but I'm not sure that redirecting stdin is all that well defined.


Redirection is dup2, opening files first where relevant.

If all you do is open files and dup around descriptors, no file contents actually go anywhere.
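
You can see it directly (throwaway file names):

    $ echo hello > in.txt
    $ sh -c '<in.txt >out.txt'   # both files are opened, nothing runs
    $ wc -c out.txt
    0 out.txt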


One of the zsh options which changes this behaviour is called CSH_NULLCMD, which seems to suggest it is possible to be csh compatible in zsh.

    $ setopt cshnullcmd
    $ < task >task2
    zsh: redirection with no command
`man zshmisc` and search for "REDIRECTIONS WITH NO COMMAND" for all the gory details :/

Edit: inverted csh compat logic in my head :(


I've been using Unix in one form or another since 1986, and I was today years old when I learned about 'grep -c'. I've probably done some form of 'grep foo | wc -l' once or twice a month every month over those 34 years.

While I'm sure that says something about me, it also says something about the depth and complexity of Unix/Linux. There is a LOT there, so it's not surprising that people find less-than-optimal ways to solve their problems.


This is a classic example of blaming users for not being able to understand/remember/intuit a mess of wretchedly hostile design with no consistent logic or standardisation.

Vintage operating systems like TOPS-20 and VMS (up to a point) would go out of their way to be friendly and helpful. Conscious effort was put into this.

Unix shell commands seem to be the opposite - random feature accretion with deliberately obscure magic-spell UX.

What percentage of the population can define what "catenate" means without looking it up - never mind work out what "cat" abbreviates without being told?

How about left/right precedence and data flow? Why do some commands/operators have left precedence while others have right precedence? How about switch standardisation? Are the switches '--' or just '-' or maybe just a letter? Can you pipe subcommands to variables or not? [1]

And so on. Of course users don't immediately produce minimal solutions. Most users won't, most of the time.

[1] It depends on the shell. Mostly not reliably, because you often get different behaviour inside a terminal command and a shell script.


That particular example actually says nothing against you, and certainly doesn't mean the Unix tools are hostile, as the sibling comment says. The ‘grep|wc’ flow is natural in the Unix paradigm, because each command does its own thing there—it's exactly like functional programming. The ‘grep -c’ thing is a shortcut that may be an optimization but doesn't conform to the ideals.

I don't know if it's easier for humans to remember one command with flags or two commands, but it's certainly best to recognize the principles and learn to use them, which is what the ‘grep|wc’ case demonstrates.


I have to admit that I originally thought this was about useless ways to apply the small domestic animal.


One reason that I use cat is that if by mistake I write ">" instead of "<", I'll lose my whole file. There's no risk with cat.
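
e.g. (hypothetical file name):

    grep pattern < notes.txt   # reads notes.txt
    grep pattern > notes.txt   # one-character typo: the shell truncates
                               # notes.txt to zero bytes before grep even runs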


I know of and have used

    <file.txt program | program1 | ...

And zsh handles all this well, but honestly I prefer `cat` because I like each logical area to be separated by a `|`. I've tried and failed to get rid of the `cat`, so you're going to have to wait for me to die.


While I think its popularity in such contexts came about after this page was published, I'd also add "Useless use of dd": using dd to do a non-transformative copy to or from device files.

In Unix, "everything is a file" means that even hardware devices are accessible using standard file APIs, and as such, doing procedures like "cp debian.iso /dev/sdg" to write out a USB device with a bootable ISO image is perfectly valid. One could even use "cat debian.iso > /dev/sdg" if desired, though the command doesn't quite convey the same meaning as cp, nor is it a concatenate operation.

In the 21st century, however, it has become popular to use "dd if=debian.iso of=/dev/sdg" instead, and as best I can tell, it stems from former DOS and Windows users, who assume that hardware devices are special and require special programs to access, and who somehow landed on dd as the special tool. Even though dd is really meant for transformative copies -- the program was originally written to convert documents between ASCII and EBCDIC! Worse still, dd's mechanism of reading in and writing out in fixed-size blocks means that it's often much slower than just using cp.


Pretty sure older Unixes wouldn't handle "cat debian.iso > /dev/sdg" correctly, though I'm not certain. If nothing else, 'dd' will pad out the final block, which can be handy.

Also, if you have lots of RAM and the file isn't too big, 'dd' can pull the whole thing into RAM before the first byte is written, which could be handy if you have a read I/O error.
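
Something like this should do it, I think (GNU dd; bs has to exceed the image size, and iflag=fullblock keeps dd from accepting short reads):

    # read the whole image as one block before the first write
    dd if=debian.iso of=/dev/sdg bs=4G count=1 iflag=fullblock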


With dd you can specify the block size with bs=1M or so. Sometimes you do need this capability for performance reasons.
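
For example (reusing the thread's example device):

    # 1 MiB blocks instead of dd's small default
    dd if=debian.iso of=/dev/sdg bs=1M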


Might it also be that the destructive nature of writing directly to a device warrants a more explicit syntax, so as to avoid potential errors? There's no confirmation when copying an iso directly to a device, or anything.


Unlike cp, dd can report progress (the status=progress switch).


You could use pv, which I think gives a much better progress bar.
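
e.g.:

    pv debian.iso > /dev/sdg   # cp-style copy, plus a progress bar,
                               # ETA, and throughput readout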


Or the excellent progress¹, which can show progress state for already-running commands—great for when a process has been running longer than you expected. It comes with a bunch of filtering options and can be applied to all sorts of commands, as it just digs about in the file handles.

1. https://github.com/Xfennec/progress


    cat debian.iso > /dev/sdg
Does this win the award?

    < debian.iso > /dev/sdg


While cat could be argued to be overkill (... but not by me...), that's not a useless use of cat because if you pull it out (as you've done) there's nothing to actually read the bytes from stdin and write them to stdout. All you've done is opened two files and closed them again.

(Edited to add: unless you're using zsh, apparently)


Ah interesting! I do use zsh and didn't know this didn't exist in bash.



