> `pee` [...] It runs the commands using popen, so it's actually passing them to /bin/sh -c (which on most systems is a symlink to Bash).
Do not assume /bin/sh is Bash!!
On Debian-based systems, including Ubuntu, Dash is the default non-interactive shell, and Dash does not support common Bashisms.
(Not to mention Alpine, OpenWRT, FreeBSD, ...)
This is a bit of a pet-peeve of mine. If you're dropping a reference to `/bin/sh -c` like the reader knows what that means, then you don't need to tell them that "it's a symlink to Bash." They know their own system better than you.
You'll never know for sure what shell is the default, so write your scripts to a minimum shell family, and encode the name of that shell family in the shebang.
The default shell, for either the entire system or an individual user, could be:
- A Bourne shell
- A C shell
- A POSIX shell
- A "modern" shell, like Bash, Zsh, Fish, Osh
Use the following shebangs to call the class of shell you expect. Start with /usr/bin/env to allow the system to locate the executable based on the current PATH environment variable rather than a fixed filesystem path.
#!/usr/bin/env sh
# ^ should result in a Bourne-like shell, on modern systems, but could also be
# a POSIX shell (like Ash), which is not really backwards compatible
#!/usr/bin/env csh
# ^ should result in a C shell
#!/usr/bin/env bash
# ^ should result in a Bash shell, though the only version you should
# expect is version 3.2.57, the last GPLv2 version
#!/usr/bin/env dash
# ^ you expect the system to have a specific shell. if your code depends
# on Dash features, do this. otherwise write your code using an earlier
# version of the original shell family, such as Bourne or POSIX.
If you need to pass a command-line argument to the shell before it interprets your script, use a fixed filesystem path for the shell. Because of how kernels parse the shebang line, everything after the interpreter path may be passed as a single argument, which is probably not what you (or the shell) expect. (e.g. use #!/bin/bash --posix instead of #!/usr/bin/env bash --posix, as the latter may fail)
Different shells have different implementation details. For example, Bash tends to skew toward POSIX conformance by default, even if it conflicts with traditional Bourne shell behavior. Dash may try to implement POSIX semantics, but also has its own specific behavior incompatible with POSIX.
In order to get specific behavior (like POSIX emulation), you may need to add a flag like --posix, or set an environment variable like POSIXLY_CORRECT. For shells that support it, you can also detect the version of the shell running, and bail if the version isn't compatible with your script.
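A version guard along these lines can do the bailing for you (the minimum version here is arbitrary; adjust it to whatever your script actually needs):

```shell
#!/usr/bin/env bash
# Refuse to run under anything that isn't bash.
if [ -z "${BASH_VERSION:-}" ]; then
    echo "error: this script must be run with bash" >&2
    exit 1
fi
# BASH_VERSINFO[0] is the major version; require at least 3 here
# (the threshold is illustrative).
if [ "${BASH_VERSINFO[0]}" -lt 3 ]; then
    echo "error: bash >= 3 required, found ${BASH_VERSION}" >&2
    exit 1
fi
echo "running under bash ${BASH_VERSION}"
```

The plain-string check on BASH_VERSION comes first so that a non-bash shell exits before it ever has to parse the bash-specific array syntax.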
Here are some references of the differences between shells:
- Comparison of command shells[1]
- Major differences between Bash and Bourne shell[2]
- Practical differences between Bash and Zsh[3]
- Fundamental differences between mainstream *NIX shells[4]
I use `ts` quite often in adhoc logging/monitoring contexts. Because it uses strftime() under the hood, it has automatic support for '%F' and '%T', which are much easier to type than '%Y-%m-%d' and '%H:%M:%S'. Plus, it also has support for high-resolution times via '%.T', '%.s', and '%.S':
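For instance (the format strings are from ts(1); the log lines are made up):

```shell
# Timestamp each line as it arrives; %.T adds sub-second resolution.
# Each line comes out prefixed like: 2024-01-01 12:00:00.000000 starting
(echo starting; sleep 1; echo done) | ts '%F %.T'
```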
Assuming semi-recent bash(1), you can also get away with something like
while read -r line; do printf '%(%F %T %s)T %s\n' "-1" "${line}"; done
as the right-hand side/reader of a shell pipe for most of what ts(1) offers. ("-1" makes the embedded strftime(3) format string assume the current time as its input).
Not a moreutil, but I recently discovered `pv`, the pipe viewer, and it's so useful. Like tqdm (the Python progress-bar library) but as a Unix utility. Just put it between two pipes and it'll display the rate of bytes/lines.
Apparently it's neither a coreutil nor a moreutil.
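A couple of sketches (the file names are made up; `-l` switches pv from bytes to lines):

```shell
# Byte throughput of a compression pipeline; pv's meter goes to stderr,
# so it doesn't disturb the data flowing through the pipe.
cat big.log | pv | gzip > big.log.gz

# Line throughput: -l makes pv count newlines instead of bytes.
grep ERROR big.log | pv -l > errors.log
```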
If you have a long running copy process running but forget to enable progress, you can use the "progress" utility to show the progress of something that is already running.
pv file.tar.gz.part1 file.tar.gz.part2 | tar -x -z
Just like cat, pv used this way will stream out the concatenation of the passed-in file paths; but it will also add up the sizes of these files and use them to calculate the total stream size, i.e. the divisor for its displayed progress percentage.
I have also discovered that certain implementations of dd have a progress printing functionality that can be used for similar purposes. You can put a "dd status=progress" in a pipeline and it will print the amount and rate of data being piped!
This dd option is not as nice as pipe viewer but it's handy for when pv isn't around for some reason.
Even if you don't pass this argument, you can poke most implementations of dd(1) with a certain POSIX signal, and they'll respond by printing a progress line.
On Linux, this is SIGUSR1, and you have to use kill(1) to send it.
On BSDs (incl. macOS), though, the signal dd listens for is instead called SIGINFO (which makes it a lot clearer why a process would respond to it this way). On these platforms, the terminal driver sends SIGINFO to the foreground process group when you type Ctrl+T!
Bonus fact not mentioned in the above article: dd used in the middle of a pipeline will still "hear" Ctrl+T and print progress, since signals generated by a shell (think: SIGINT from Ctrl+C) are propagated to all processes in the process group started by the command. Test it yourself:
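Something like this, on a macOS/BSD box (the sizes are arbitrary):

```shell
# dd sits in the middle of this pipeline; while it runs, press Ctrl+T
# and dd prints a progress line, because the terminal delivers SIGINFO
# to every process in the foreground process group.
cat /dev/zero | dd bs=65536 count=16384 | wc -c
```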
...except in the absolute most-common case, as the article mentions: making or restoring backups of disks.
Not because dd is needed for this case, mind you. Rather just because disks are big, and so it's a lot faster to (be able to) specify a big block size. One much larger than the disk's own sector size; one much larger, even, than the current 128KiB default of cat(1)!
Consider: cat's design principle is that it's reading a stream (or series of streams) serially, and writing that concatenated stream serially, as a synchronous back-and-forth process. cat(1) aims to have a low, fixed memory usage: it reads data into a buffer, then writes data from that buffer to the output. It only ever uses the one buffer, and that buffer is static — part of the program's .bss section.
dd(1), meanwhile, is an asynchronous pipeline between the read and write ends. It doesn't care how much memory it uses; it'll use however much is required to do the job you've configured it to do. It doesn't mind submitting a bunch of simultaneous read requests; assembling them out-of-order into a buffer as large as you like; and then writing out the buffer with a bunch of simultaneous write requests. And it doesn't need to fully assemble the "read batch" before turning it into a "write batch" — dd(1) uses a read-write ring-buffer, so as soon as the earliest contiguous read-results are available, they get fed into the write queue.
This design means that dd(1) can take strong advantage of devices like NVMe that use high internal read/write parallelism when slammed with a high IO-queue-depth of read/write requests. Which is exactly what you're doing when you specify a `bs=` to dd(1) that's higher than the disk's sector size.
Try it out for yourself — do something like:
for p in $(seq 5 20); do
    bs=$(echo "2^$p" | bc)
    echo "bs=$bs"
    time dd if=/dev/nvme0n1 of=/dev/nvme1n1 bs="$bs"
done
And, while each one is executing, do an `iostat -x` and look at the `aqu-sz` column. The speed of the copy should peak at exactly the same time that your aqu-sz hits the effective limit of the read-parallelism of the source disk or the write-parallelism of the target disk.
Yeah, that bit me in the butt once on macOS: SIGUSR1 killed dd. When I want progress from dd and didn't set status=, I mostly use progress now. It also works with many other utils like cp, xz, and all the usual suspects.
For copying disks, an even more powerful tool is ddrescue. It works exactly like dd but writes its progress into a map-file that can be used to pause and resume the operation any time. It can also skip certain sectors known to be broken.
Super useful if you have a big slow disk that you have to copy under unreliable conditions. Saved my ass several times with old partially broken hard drives.
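A sketch of the workflow (the device names are placeholders):

```shell
# First pass: copy everything readable, recording progress in rescue.map.
ddrescue -f /dev/sdX /dev/sdY rescue.map
# Interrupt any time; re-running the same command resumes from the map.
# A later pass can retry the areas that failed, here three times each:
ddrescue -f -r3 /dev/sdX /dev/sdY rescue.map
```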
I use `sponge` and `ts` (mentioned in the article) pretty regularly, and am really happy for them.
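`sponge` exists to fix the classic self-overwrite pitfall; a minimal example:

```shell
# This clobbers file.txt before sort can read it:
#   sort file.txt > file.txt
# sponge soaks up all of stdin before opening the output file:
sort file.txt | sponge file.txt
```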
I have used `isutf8` a fair amount in the past, but I find it mostly redundant these days (thankfully!)
The other one that I don't use very often, but is absolutely invaluable when I do need it, is `ifne` - "Run command if the standard input is not empty". It's like `-r` from GNU `xargs`, but for any command in a pipeline.
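For example (the log path and alert command are illustrative):

```shell
# Only send mail if grep actually found something; with a bare pipe,
# mail would fire off an empty message every time.
grep ERROR /var/log/app.log | ifne mail -s 'errors found' ops@example.com
```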
I'd never heard of the :cq command in vim before. Seems useful, but in practice it's so unknown that things like editing the git commit message cannot rely on it and instead check whether the file has been changed. Also, reading its documentation, it probably would be better named :cqall.
I was wondering that too, although I don't have access to vim right now. What's the punch line?
EDIT: The difference with :q! is the exit code!
(Yes, and :wall is actually the :update command on all your buffers; that is, unlike :w, buffers are written only if there have been changes. Bad naming is the mother of all pedagogical pain)
Yeah, I should've written "cannot rely on that alone and also check". I've worked with vi, later vim, for 34 years and read about it here first; ddg'ing for it doesn't give many hits.
I just learned about vidir [1]. Emacs Dired [2] can rename & delete files by editing the buffer directly, and let's say I was thrilled when I saw someone replicated that behavior as a general Unix tool.
I have a personal vendetta against moreutils because it provides a vastly inferior version of parallel compared to GNU Parallel. GNU Parallel alone provides more value than moreutils, all of which can be replaced by UNIX one-liners or Emacs.
Yes, the exit code. See e.g. `:help cq` in vim.
:q! and ZQ will yield exit code 0, which sometimes is not what you want if you want to ensure some task is properly aborted.
vidir within ranger is really nice. vipe is also pretty cool. Mostly I use vipe for editing my clipboard contents and then sending the modified version back to the clipboard, or occasionally editing some text stream before sending it to my clipboard, such as some grep output I only want some of.
moreutils parallel can also come in handy for quick command parallelization (not to be confused with GNU parallel which serves a similar purpose but can be more complicated)
The parallelism isn't part of POSIX though (AFAIK), that's an extension by whoever wrote your xargs.
If what you really mean is that it's already installed on every machine you use, fair enough. But it's not strictly portable in some standards-based sense.
In case anyone was wondering, the moreutils tools:
chronic: runs a command quietly unless it fails
combine: combine the lines in two files using boolean operations
errno: look up errno names and descriptions
ifdata: get network interface info without parsing ifconfig output
ifne: run a program if the standard input is not empty
isutf8: check if a file or standard input is utf-8
lckdo: execute a program with a lock held
mispipe: pipe two commands, returning the exit status of the first
parallel: run multiple jobs at once
pee: tee standard input to pipes
sponge: soak up standard input and write to a file
ts: timestamp standard input
vidir: edit a directory in your text editor
vipe: insert a text editor into a pipe
zrun: automatically uncompress arguments to command
Confusingly, moreutils parallel is not "GNU parallel". Moreutils parallel is very simple, while the other parallel is very featureful. Linux distributions can deal with the conflict, but bad package managers like homebrew cannot.
Because capturing the output of `seq` requires spawning a whole separate process (significant for small sequences) and shoving all the data into a single buffer (significant for large sequences) rather than working incrementally.
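The two shapes side by side (the loop bodies are placeholders):

```shell
# Capture: forks seq, then builds the entire word list in memory
# before the loop body runs even once.
sum=0
for i in $(seq 1 10000); do sum=$((sum + i)); done
echo "$sum"   # prints 50005000

# Stream: the loop consumes seq's output line by line as it is produced,
# so memory use stays flat no matter how long the sequence is.
seq 1 10000 | while read -r i; do : "$i"; done
```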