A lengthy discussion alluding to the pitfalls of mixing fork() arbitrarily with libraries and threads; discussing terminal sizes in the line discipline and on the terminal itself (although missing that these aren't necessarily the same); showing how library functions are underpinned by system calls; and basically covering the ground that led to the invention of Daniel J. Bernstein's pty tool in 1990, and others.
There's even scope for observation that, like the number of processes that get spawned and instructions that get executed to increment an integer shell variable by 1 (in the traditional way using expr), it's taking four extra forked processes just to write a coloured shell prompt.
Hacker News comments don't get beyond the useless use of cat aside in the third paragraph. (-:
Huh, go figure. I’ve always used `unbuffer` from the expect package to make things think they are connected to a tty and was annoyed at the bulkiness of that solution (IIRC it’s an invocation of `expect` with a passthrough script).
Besides per-program flags like `--color=always`, you can use the `unbuffer` command¹ from the expect package² to make a program that you're piping act like it's being run interactively (pretty-printed, colorized, etc.). unbuffer takes the command to run as its own arguments, so you run it like
unbuffer rg TODO src/
or
find . -iname '*.md' -exec unbuffer ls {} \; | less -R
or
unbuffer cowsay --rainbow 'Hello from a very colorful cow!' | sudo wall --nobanner
and so on.
If you make an alias or an abbreviation for your shell, it ends up being more convenient than figuring out all the flags for making piped output look like interactive input for each individual program you might want to use that way. :)
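For example, something as small as (the name ub is just my own choice)

alias ub='unbuffer'

lets you write ub rg TODO src/ | less -R and friends without thinking about it.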
Yep, what I had would definitely eat it under either of at least two conditions: too many files and whitespace in filenames.
Your correction fixes both of those issues (the former through using xargs and the latter by splitting the arguments passed to it on the nul character instead of the default (newline, right?)).
find does not pass arguments through an intermediate shell, so there are no quoting concerns with regard to spaces in filenames, etc.
And with the form of -exec terminated by a ; it runs a single command for each matching filesystem entry, so there will be no issues with exceeding the maximum argument size.
You can verify via strace:
$ touch 'a b.md' 'c d.md'
$ strace -fe execve find . -iname '*.md' -exec unbuffer ls {} \;
At the end of the exec chains you should see the following:
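(Illustratively; exact paths and PIDs will differ, and unbuffer's intermediate expect exec is omitted:)

[pid 12345] execve("/usr/bin/ls", ["ls", "./a b.md"], ...) = 0
[pid 12346] execve("/usr/bin/ls", ["ls", "./c d.md"], ...) = 0

Each filename arrives as a single argv element, spaces intact.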
No, your solution works correctly in both of those cases. My issue with it is that it spawns a separate `ls` subprocess for each file, whereas mine will do it the minimum number of times.
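(For what it's worth, find can batch on its own too: the -exec ... {} + terminator packs as many filenames into each invocation as will fit, xargs-style:

find . -iname '*.md' -exec unbuffer ls {} +

so you also get the minimum number of ls runs without leaving find.)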
I have to strongly disagree with "useless use of cat". Using cat first has 3 main advantages, making it vastly superior:
1. It allows you to drop context quickly and focus on the algorithm. You write "cat filename" and you can free your mind and focus on the rest of the pipeline. Without cat, you have to keep the filename in your mind while you write the first command.
2. It puts the filename closer to the beginning, so when you need to repeat the command for a few more files, you just press Home and one Ctrl+Right, whereas without cat you would have to move right several times depending on how many switches the first command needed, which is always random. Plus most of the back-and-forth switch editing will be in the first command, and the filename would just get in the way.
3. It works the same for every app. Some apps need the file as the last parameter, some need -i or whatever switch, some use some other random scheme. Cat is always the same.
There was a recent thread on HN about generating HTML using string templates, arguing that if users keep on doing something for years despite constant recommendations against it, the recommendations must be wrong.
I didn't agree with it - it's an ok rule of thumb in moderation but I don't think HTML string templates are a good example. But, `cat` is absolutely the perfect example.
I use shellcheck, which flags all my "useless" uses of cat, and I can tell you there's no more common error in other people's code that it complains about. And the benefits of "fixing" this error are always so marginal, the resultant code so much less readable.
I honestly have to agree ... I tend to use cat because I don't always know what the next thing in my pipeline is going to be and frequently change out later steps in the pipeline. Putting the real first operation in place of cat means it's harder to reuse a particular idiom or "stage" parts of a pipeline that might be slow in a temp file while you're figuring out how to string things together.
> You write "cat filename" and you can free your mind and focus on the rest of the pipes. Without cat, you have to keep filename in your mind while you write the first command.
In bash (don't know about other shells) the < operator sets the stdin of the process to the given file.
< filename.json jq
Here, data is read directly from the given file.
As for the | operator, the stdout of the left-hand process and the stdin of the right-hand process become a pipe:
cat filename.json | jq
Here, data is read from the file, copied to a pipe buffer, and the other process is notified (in most cases, by being woken up) that there is incoming data; only then does it finally read the data.
Not that any of this matters in today's world of pocket supercomputers but I just don't see how any of your points back up the useless use of cat. It's still useless :)
The cat and pipe are semantically easier to understand, though; at least that has been my experience when onboarding new hires.
While Bash has all sorts of nice bells and whistles for various use-cases, I really think they are only worth relying on when you are doing something highly specific. In general mastering wildcards and pipes is sufficient for the vast majority of situations, I think for everything else it's better to use commands that people can easily search or find the man page for.
In the world of pocket supercomputers, as you called it, interface is king. So I'd argue that if anything, `<` is the useless one here, when we could just have used the more semantically obvious cat and pipe instead.
I have honestly not seen a more egregious case of "premature optimization". Who the fuck cares about 1% extra performance on something they are doing on the interactive shell? (And the answer wouldn't even change if the performance difference was 100%.)
What an utterly useless thing to get pedantic about.
Infix operators are very weird, though. They come from math, where every elementary school student has to memorize "please excuse my dear aunt sally" to be able to answer a simple question like "what is 2³ + 4 ÷ 2". C copied this concept, and nobody really ever bothered to memorize C's operator precedence, so they just end up writing Lisp but with the operators in the middle. pow(2, 3) + (4/2). Note that some operators appear prefix instead of infix for no apparent reason.
(You can't just remember what you learned in math class when writing a computer program, because now we have bitwise and, or, xor, left shift, right shift, equality, pointer dereferencing, assignment, logical and, or, modulo, pre-increment, post-increment, bitwise not, logical not, etc. So you have to learn it all again. And it's different for every language.)
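(A concrete C example: 1 << 2 + 3 evaluates as 1 << (2 + 3), i.e. 32, because + binds more tightly than <<, which is exactly the sort of thing nobody remembers from math class.)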
Put everything up front, and suddenly you don't have to think about this anymore. (+ (expt 2 3) (/ 4 2)). The computer just does what you say and there are no rules to memorize.
Personally, I always use RPN calculators (dc is my favorite), because I don't trust the calculator to remember operator precedence. They usually do it right, but it's difficult to reason about. (The example would be "2 3 ^ 4 2 / + p" in dc.)
Unfortunately we will never get rpn taught in schools so it will always feel unnatural and weird to many people.
What is unnatural is the complicated, case-specific precedence rules infix notation requires. RPN still has precedence rules, but because they are simple it appears that there are none, and because they are universal there is no confusion as to where any new operators fit in.
I wonder if anybody has ever written a linguistic history of math notation and how it ended up the way it did.
>every elementary school student has to memorize "please excuse my dear aunt sally"
In Indian schools it used to be BODMAS, for Brackets Of Division Multiplication Addition Subtraction, for operator precedence, for which our teachers told us the mnemonic was badmaash, meaning a scoundrel or a villain, in Hindi. Guaranteed to make kids remember it. :)
Of, as in half or a quarter or ... of some other number.
I think FORTRAN copied it first. Which makes sense, because of the original scientific target market for FORTRAN, represented in its name derived from “formula translating system”. Translating “a sufficiently rich mathematical language”, as Backus put it, was an explicit goal as early as 1954.
C then (presumably) just followed the convention established by FORTRAN.
Yes, but it cleanly separates code and data, and notably on an interactive shell makes editing a historical command much simpler due to cursor positioning.
It’s like an “article” or “preposition” in English. Arguably not required, but in practice invaluable. One could argue that shell could have been designed better to not need it but, meh, that ship sailed. Uses of “cat” are ubiquitous and excellent. Not sure why people remotely care when there are bigger problems to worry about.
I agree; I use cat all the time, as a normalization of access patterns. It lets you isolate the file input and output into its own section of the pipeline. There is nothing wrong with using the shell redirection operators, but I like seeing the input and output in its own explicit pipeline section.
get file data and put it on the pipe: cat
get pipe data and put it in a file: tee (with the huge caveat that you have to do something with stdout, so you still need to mess around with the redirection operators. tee does not really gain you any normalization. Sigh. I would make a tee that is just "tee > /dev/null", but I am used to it at this point. Stockholm syndrome?)
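(If I ever did make one, it would just be a one-line shell function, something like

sink() { tee "$@" > /dev/null; }

with sink being a made-up name, used as some-command | sink output.txt.)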
It's just OCD people not having anything to bitch about in life. But to add to the point:
4. You are just ctrl-a away from changing source
A common use for me, for example:
cat f.log | grep sth # okay it shows in history; C-a
tail -f f.log | grep sth # let's try to repeat it live; C-a
zcat f.log.1.gz | grep sth # no dice, let's try to dig a bit in history
A good friend of mine called "cat file.txt" UUOC on the grounds I should have run "less file.txt". If you're writing a script, sure. If it's the first step in a pipeline that you're typing on the fly, does it really matter?
cat file.txt is always worthless. Best case scenario, you end up with text that fits into one screen, but more often than not you will have to retype the command with $EDITOR filename or less filename anyway, and now you just wasted (precious) time doing what could've been done a few seconds earlier.
Sure, a few seconds isn't a lot, to someone who opens a terminal once a year.
If you're using a terminal multiplexer to handle scrollback, you can navigate the output using your usual pager whether you explicitly invoked it in the pipeline or not. And that leaves it searchable in your scrollback history even after you've gone on to run several more commands, unlike if you pipe it directly to a pager.
Additionally, if you're using a nicer colorizing pager, like bat, it'll automatically send the output to your pager for you when it detects that the output wouldn't fit on screen.
For both of those reasons, I almost never use `less` directly on a file.
cat file leaves the contents on my screen as I compose my next shell command, as opposed to erasing them as quitting the editor flips back from the alt screen.
I sometimes use the following to prevent screen clearing when exiting vi:
:set t_te=
Useful because vi makes it more convenient to select the chunk of file I want to view after exiting. More convenient than cat/head/tail at least. Competitive with more. :)
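(The same setting without the colon, i.e. set t_te=, can go in your ~/.vimrc if you want it permanently.)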
After your cat command has run, type 'vim ' then hit the escape key followed by the period/full stop key. That inserts the previous command's last argument at the cursor.
Obviously you don't have to use vim, any other editor or whatever can be used.
My terminal emulator has supported scrollback since I started using it in '95. I didn't get the impression at the time that it was a particularly new feature :)
pipes have an inherent data-first natural order processing that is counter to most programming languages
<take this file> | <do this to the file> | <then do this>
And UNIX philosophy says that is a good thing. It takes a universal UNIX technique (the pipe) and the programs just have to concentrate on processing input and output.
Sure you CAN use the args. Look up the fucking specific flag and syntax among the bajillion flags in the man pages (which all seem to lack good examples that would communicate this much more quickly).
But the cat | jq seems FAR more UNIXy. I agree, it is stupid to criticize cat there. This is pointless fad thinking by the author. Ignore it. There MAY be certain performance advantages in certain high-volume conditions, like large files or something similar, if the program has "better" file processing than cat + a pipe. Maybe.
Anyway, I really like the data flow of a pipe command like:
cat <somefile> | sort | head -n 10
Meanwhile, this is how a programming language would look (one liner version):
head( sort( cat(<somefile>) ), 10)
The logical flow of that, where you get data, sort it, then head it, is natural in the piped instance, but in the practically-every-language-ever version of it you need to parse the expression first, extract the inner steps, then sub-parse / resolve the args to head, then get the value.
The expression is "inside out" in a normal programming language. You have to turn it inside out: PUSH the head op onto the stack, PUSH the sort op onto the stack, resolve cat, POP the sort op, POP the head op.
The CLI + pipes has no stack, just a handoff
I have started doing "data first" programming a lot with, say, Groovy, where you can do:
String passed_in_filename = "/path/to/file"
passed_in_filename.
    with { new File(it).readLines() }.  // read the file into a list of lines
    with { it.sort() }.
    with { it.take(10) }
with a lot of the I/O or transform-data code I write. (The with construct in Groovy passes the receiver into the provided closure and hands back the closure's result, so I can chain the closures in the natural way that piped output works on the CLI.)
That function takes no arguments. I've never seen initializing a single variable and returning it from a zero-argument function called "memoization".
Memoization is a specific type of caching with a space/time trade-off where the result of a particular (set of) input(s) are cached.
I find it quite a stretch to call initializing a single integer for a function taking no parameters "memoization". I'd just say the function's return value is "cached".
I mean: by kinda, sorta, bending the definition of memoization (for example by changing "caching the function's results based on the inputs" to "caching the only result for there's no input to the function") you could say that it is actually memoization but then I'd argue this is not a great example of what memoization is.
> an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again
"the same" could be none, default, or null, I think.
I mean it's a perfectly fine hill to die on, but I think "memoization for n=0 arguments" is cromulent. Or even "memoization for n=1 argument, which only has one possible value" (like a unit enum in Rust). I could see both positions being justifiable tbqh.
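To make the pattern under discussion concrete, here's a generic sketch (not the article's actual code; all names made up):

    /* Stand-in for whatever expensive work the real function does. */
    static int expensive_calculation(void) { return 6 * 7; }

    /* A zero-argument function whose single result is computed once and
       then cached: "memoization for n=0", or plain caching, as you prefer. */
    int cached_answer(void)
    {
        static int computed;   /* zero-initialized: nothing computed yet */
        static int value;
        if (!computed) {
            value = expensive_calculation();   /* runs at most once */
            computed = 1;
        }
        return value;
    }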
I enjoyed the _meat_ of the article, but really disliked the stream-of-thought narrative of the post. But I know it's this author's style. So that's fine.
Anyway. Is there something better than just screen-shotting your terminal window and making PNGs or GIFs for stuff like this?
That's an interesting thought. Obviously one can present this stuff as just in-line text instead of using images, with appropriate styles for monospacing and colours (although that does then highlight that these fancy prompts often make unwarranted assumptions about what's in the Unicode Private Use Area). There are many WWW pages out there doing exactly that. But I for one have not heard of a tool that can screenshot a text terminal emulator's buffer straight into HTML.
That's not recording the terminal. That's recording the characters, escapes, and control sequences sent to your terminal. It isn't actually recording what the terminal does in response to them. It isn't screenshotting, and terminal emulators differ in what they do; the same output potentially appearing very differently.
Screenshotting would involve reading a Linux KVT /dev/vcsa file, or a display file with my terminal emulator, or using a Kitty plug-in to get at its display buffer contents, or attaching to a screen/tmux server and reading out its display, or similar (if even possible) for other terminal emulators.
Ironically, BRLTTY can do some of the aforementioned. But it most certainly doesn't spit out HTML.
That's not screenshotting the actual terminal display buffer, and it relies on whatever one uses to do the conversion to process all of the escapes and control sequences in exactly the same way as the terminal emulator did.
There are some long standing bugs against the aha tool pointing out that it does not even come close; including not coping with switching to and from the alternate display buffer. No-one seems to have yet noticed that it doesn't recognize ED or EL or any relative cursor motions, so will have serious problems with things like ZLE in the Z shell, or just plain clearing the screen; or that it doesn't handle 24-bit colour in the same ways as some popular terminal emulators do.
> I enjoyed the _meat_ of the article, but really disliked the stream-of-thought narrative of the post. But I know it's this author's style. So that's fine.
This is why it was such a struggle for me to get through Gödel, Escher, Bach as well. If I'm engaged on the topic, I want the author to get straight to the point, not to try to anticipate my questions or lead me with someone else's potted questions.
> Anyway. Is there something better than just screen-shotting your terminal window and making PNGs or GIFs for stuff like this?
I've done it manually in HTML/CSS for documentation, to preserve copy-pasteability. It's straightforward: body bgcolor, and font-family (something fixed width), then a bunch of spans with colors for syntax highlighting.
Would love to find something automatic, though!
Well there's http://www.andre-simon.de/doku/ansifilter/en/ansifilter.php for the latter. But it doesn't do direct screenshots. So one would have to find something else that would take the terminal emulator's display grid and, ironically, turn it back into a sequence of characters, escapes, and control sequences.
> Is there something better than just screen-shotting your terminal window and making PNGs or GIFs for stuff like this?
There is and it's been on my TODO list forever. In fact, the article you just read (congrats on getting through my narrative devices) was written /while tackling that/.
https://asciinema.org does it for "moving pictures"; it shouldn't be too hard to do it for stills.
These sorts of articles are popular for a reason; maybe it reflects a pedagogical trend both ways (preferences in teaching and how one likes to be taught). Personally I gravitate in the opposite direction.
I’ve come to appreciate “dry”, technical guides devoid of overt personality. I’m not a fan of “Why_’s Poignant Guide to Ruby” in the least, but there is a visceral element in its whimsicalness (and other why_ material) that similar efforts don’t compare to (I count the guide listed here among them). For me, why_’s work is an outlier in this “literary tech writing” genre. If I ever intended to learn the language, I would contemplate scrubbing out the comic strips if I couldn’t find a better alternative guide.
Yeah, I pretty much skipped over it looking for some tidbit I don't know, as terminal workings are pretty well known to me already, but ugh, it's painful to read for that type of article.
I think that style could work well when you're doing, say, a post-mortem of some failure, where you want to show the whole train of thought that led to it or to finding a fix, but for this kind it just doesn't work very well.
I couldn’t finish it, it felt like noise to me. It was also frustrating because there was little content and a lot of words/pictures (even the prompt was distracting).
I guess I’m getting too old for this - I’m leaving this to the younger crowd!
> There! Now you have pretty colors! But say your JSON file is actually quite large, and it doesn't fit in your terminal window, so you want to use a pager, maybe something like less.
> Well, showing a less invocation is actually quite annoying, so instead we'll just use cat instead.
If `cat` is a pager on your system (which is the only way I can make sense of this)... Why? And what's so annoying about paging with `less` (retaining colors) versus paging with `cat`? I also don't see why the author doesn't use `< $FILE jq $FILTER` instead of `jq $FILTER $FILE`. Finally, if it makes the process easier for the only user of the script, then that `cat` served a purpose and was not useless.
I imagine the comment about less being "annoying" to show is just about conveying the interactive TUI in an unchanging plain-text form, given that the terminal wouldn't ever contain both the invocation and the output.
And the comments about useless cat are just a gag/OCD.
Maybe a beginner question (since IDK much about Rust or Tokio) but in the final Tokio example, what happens if the child program writes something and exits while we're in the read block? Skimming the docs[1] it seems the `select!` macro does not check its branches in order. So wouldn't it be possible that the next loop iteration processes the exit first and some output is lost?
Upon the child application closing, you loop reading from the primary using a synchronous but non-blocking read, until you finally get back a zero-byte read. Since the child process is now dead, and nothing else should be writing to the TTY, once you get a single zero-byte non-blocking read, you know you have all the data.
Ideally in C on Linux, you would structure this code a bit differently. (I'm only talking about the C API here, because I don't know rust or tokio well enough to have any real chance of knowing how to do this correctly for those.)
To avoid having to use multiple threads, or handle SIGCHLD, you would fork with the clone() call and CLONE_PIDFD, to get back a PIDFD. [1]
Now you can use poll() against both the parent TTY descriptor and the pidfd. Regardless of which one became ready, you do the synchronous non-blocking read loop, to get any data available. Then you conditionally run the process exit code if it was the pidfd that became ready, including reaping the child process with wait(), and closing the pidfd. Otherwise loop back to the poll() call.
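A rough sketch of that shape (pty setup, e.g. via forkpty(), and error handling elided; assumes mfd is the pty primary fd already set O_NONBLOCK, and uses the pidfd_open() route from footnote [1] rather than clone()):

    #define _GNU_SOURCE
    #include <poll.h>
    #include <sys/syscall.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void pump(int mfd, pid_t child)
    {
        int pidfd = (int)syscall(SYS_pidfd_open, child, 0);
        struct pollfd pfds[2] = {
            { .fd = mfd,   .events = POLLIN },
            { .fd = pidfd, .events = POLLIN },  /* readable once the child exits */
        };
        for (;;) {
            poll(pfds, 2, -1);
            /* Whichever fd woke us, first drain any output that is buffered. */
            char buf[4096];
            ssize_t n;
            while ((n = read(mfd, buf, sizeof buf)) > 0)
                (void)write(STDOUT_FILENO, buf, (size_t)n);
            /* n < 0 with EAGAIN just means drained for now; 0 or EIO means
               the child's side of the pty is gone. */
            if (pfds[1].revents & POLLIN) {
                waitpid(child, NULL, 0);  /* reap; output was drained above */
                close(pidfd);
                return;
            }
        }
    }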
Needless to say this sort of thing is extremely complicated, as there are many potential race conditions, and getting things wrong here is quite common. I've seen multiple languages have bugs in their code for reading just normal process output, and many lack built-in support for running in a pty, which has pretty much all the same considerations.
Footnotes:
[1] You could also do a normal fork, and then get a PIDFD from the resulting PID using pidfd_open(). This also works fine, but the lack of a glibc syscall wrapper for pidfd_open may be annoying, and it is an extra syscall. This is safe though, as we are the parent process, and until we reap the child by calling one of the wait() functions, the PID will not be reused.
Colored output feels like something a syscall should handle. Something like setcolor, unsetcolor, separate from the typical printf functions. At least escape sequences are easy to detect and remove, albeit a hassle.
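(The closest existing thing is probably the terminfo layer rather than a syscall; e.g. tput setaf 1; echo warning; tput sgr0 at least abstracts away the literal escapes.)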
The sad thing is that none of them are right. To do it properly, one effectively has to take the full state machine of a DEC VT-like terminal emulator and stick it into a filter program. Regular expressions in perl/sed/python aren't enough. There are quite a number of things that can be output to real or emulated terminals that none of these collections of regular expressions will strip correctly. And conversely, these regular expressions will wrongly strip things that, whilst odd, will actually work on real/emulated terminals, including C0 characters that take effect in the middle of control sequences.
I hate colorized output with a passion and disable it in my personal shell. You never know what the background color of a terminal might be, and there is a good chance at least one color will disappear entirely into the terminal background.
Nothing grinds my gears like the DARK BLUE on BLACK colors in the UEFI shell.
Each to their own of course, but I can't imagine _not_ using colours. It would be like using a code editor with syntax highlighting turned off. Madness!
Even on revered terminals such as xterm or rxvt (and variants) it's often possible to modify the displayed characteristics of ANSI colour sequences, see:
I'm not sure that this is also possible on, say, the Linux console itself, though I suspect it may be.
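(xterm, for instance, accepts OSC 4 to redefine individual palette entries at runtime; something like printf '\033]4;4;rgb:6c/6c/ff\a' swaps ANSI colour 4's default dark blue for a lighter shade.)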
I too had a first impression over a quarter-century ago the first time I saw a colourised 'ls' output on my first freshly-installed Linux system that they were garish, and have often griped about dark-blue-on-black (which is how I'd discovered the palette configurations referenced above). But I quickly found the feature useful and helpful and miss it when on a true monochrome device (of late: an e-ink tablet with Termux installed for a Linux userland).
On Linux, if something annoys you, there's almost always an alternative or configuration to alleviate that.
>You never know what the background color of a terminal might be
You can set the background color too. So if you set the background to dark grey you can assume the background is dark grey. Assuming the terminal supports color, that is.
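(Many terminals let you set it programmatically, too, via xterm's dynamic-colour sequences; e.g. printf '\033]11;rgb:20/20/20\a' for a dark grey background, assuming the emulator supports OSC 11.)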
In the specific example I give, no I cannot change the background color of the UEFI shell as far as I know. That's why the dark-blue on black color chosen by the UEFI shell people is so horrible.
* https://jdebp.uk/Softwares/djbwares/bernstein-ptyget.html