Practical Shell Patterns I Use (zwischenzugs.com)
130 points by zwischenzug on Jan 10, 2022 | 67 comments



The one that I find repeatedly useful:

find . -name <some pattern> | { while read -r F; do <do something to file "$F">; done; }

This lets you do things of almost any level of complexity to files or data listed as input, and it doesn't require temporary files, so you can process a lot of data.

e.g. you might use sed -i to alter the files in some way, or you might feed media files of some kind into ffmpeg to convert them.
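For instance, a minimal sketch of the ffmpeg case (the extension and output naming are made up; adjust to taste):

    find . -name '*.avi' | { while read -r F; do
        # -nostdin stops ffmpeg from eating the loop's stdin (the filenames);
        # -n refuses to overwrite an existing output file.
        ffmpeg -nostdin -n -i "$F" "${F%.avi}.mp4"
    done; }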


You can use another construct built into find that negates the need for a for loop:

    find . -name foo -exec do something to {} \;
For example:

    find . -type f -exec chmod 644 {} \;

    find . -size +1G -exec ls -lh {} \; | awk '{ print $5" "$9 }'


Yes, I think most of the time this is all you need.

I just like the explicit loop because I can do filename manipulations inside it, and the input to the loop doesn't always have to be find - it can be "grep <something> -l ./", for example - so I use it even though it's more verbose, because it's only one thing to remember.

Perhaps that is a problem with bash and the tools - there are so many features that take years to discover. Every few weeks I come across a new one.


If you really had a lot of ffmpeg work to do, then GNU xargs can let you run the jobs in parallel, one for each of your available CPUs. You can even adjust the number of processes up or down by sending the xargs parent the appropriate signals.

https://www.linuxjournal.com/content/parallel-shells-xargs-u...
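A rough sketch of that (GNU xargs; filenames and conversion are invented):

    # One ffmpeg per CPU; with GNU xargs, SIGUSR1/SIGUSR2 sent to the xargs
    # process raise/lower the number of parallel jobs while it runs.
    find . -name '*.avi' -print0 |
        xargs -0 -P "$(nproc)" -I{} ffmpeg -nostdin -n -i {} {}.mp4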


xargs is a fantastic tool - there's another similar one, I think called GNU parallel, which is worth a look:

https://www.gnu.org/software/parallel/


Unfortunately filenames can have newlines so this isn't safe for robust scripts.


One can make it safe if it's worth the effort. In practice I never come across such files, but if I were trying to write a script that would work everywhere, I would have to use

  find -print0 | sed -zr 's#\r#\\r#g;s#\n#\\n#g' | tr '\0' '\n'
... and then I would have to remember to use the filenames in such a way that my escaped newlines would be converted back to real ones at the point where I accessed the file.

Newlines in filenames are one of those things that I would personally try to avoid purely because of the problems they can cause.
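The other common route is to skip the escaping entirely and keep everything NUL-delimited (bash; -d '' makes read stop at a NUL, which can never appear in a filename):

    find . -name '*.cfg' -print0 | while IFS= read -r -d '' f; do
        printf 'got: %s\n' "$f"
    done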


It is fine for a personal script where you know your files don't have newlines, but for a robust script as part of a package running on other people's systems you'd have to take it into account.


> ps -ef | grep VBoxHeadless | awk '{print $2}' | xargs kill -9

There's pkill for that :)

  pkill -KILL VBoxHeadless
(the above assumes that the binary is called `VBoxHeadless`)


   kill -9 $(ps aux | grep .exe | sed -r 's/ +/ /g' | cut -d ' ' -f 2)
I fucking hate Wine. I wouldn't use it at all if it weren't for Steam. And this is the only way to kill a Wine process.


It's usually easier to use pkill -9 -f <pattern>, depending on the goal.


let me introduce you to TASK_COMM_LEN

    $ pgrep systemd-journald
    $
and:

    $ pgrep systemd-journal
    431
    $
Guess how long it took me to figure this out. (The kernel stores the process name in comm, which is capped at TASK_COMM_LEN - 15 characters plus the terminating NUL - so "systemd-journald" gets truncated.) This affects killall, pkill, and pgrep. You can use the "-f" flag with pkill/pgrep to match against the full command line instead.
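You can see the stored name directly (assuming journald is running):

    $ cat /proc/$(pgrep systemd-journal)/comm
    systemd-journal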


Generally you might want to use the `-x` argument to `pkill`.


Or killall.


> Here’s a real-world example I used recently

    gci -rec | ? name -match zabb | ? Extension -eq .deb | % Name  
Of course it could be replaced with

    gci -rec | ? name -match zabb.+\.deb$ | % name  
And you can filter on any property (a little excerpt):

    Mode                : --r--  
    BaseName            : zabbix-release_5.0-1+bionic_all  
    Length              : 4240  
    DirectoryName       : /home/appliance  
    FullName            : /home/appliance/zabbix-release_5.0-1+bionic_all.deb  
    Extension           : .deb  
    Name                : zabbix-release_5.0-1+bionic_all.deb  
    CreationTime        : 5/11/2020 12:07:35 PM  

> ps -ef | grep VBoxHeadless | awk '{print $2}' | xargs kill -9

    Get-Process apach* | Stop-Process -Force  
But 'pOwErShEll iS sO vErBoSe'

> Find Files Ending With…

    gci / -re | ? name -match \.deb$  
Would take longer, sure, but it's still the same Get-ChildItem cmdlet; no need to remember another two utilities.

> $ env | grep -w PATH | tr ':' '\n'

    PS /home> ( env ) -match '^PATH' -replace ':',"`n"  
    PATH=/opt/microsoft/powershell/7  
    /home/appliance/bin  
    /home/appliance/.local/bin  
    [stripped]  
    
Again, just built-in functions, no external utilities

As usual - there are a ton of useful and nice QOL utils in the *nix world, but it always feels like scratching your left ear with your right hand.


It even gets way easier than that in PowerShell:

> Get-Process apach* | Stop-Process -Force

    ps apach* | kill
> gci -rec | ? name -match zabb.+\.deb$ | % name

    gci -rec | ? name -like zabb*.deb | % name


> ps apach* | kill

Not in Linux, because it calls native binaries there.

> gci -rec | ? name -like zabb*.deb | % name

Yep, even simpler. Just the original example was with regex, so I made a comparable regex example.

> $ env | grep -w PATH | tr ':' '\n'

Today it occurred to me that you can just use split:

  ( env ) -match '^PATH' -split ':'


Linux is clearly rotting my mind.

   $env:PATH -split ':'


> Quick Infinite Loop

I use this a lot for polling type commands:

  do_every_n_secs "ps -eaf | grep blablabla" 5


  function do_every_n_secs (){
    cmd=$1
    secs=$2
    if [[ -z $secs ]]; then
      secs=10
    fi
    while true; do
      echo "################################################################"
      date
      eval "$cmd"
      sleep "$secs"
    done
  }


In case you're not aware, watch(1) performs basically that task: watch -n 5 "ps -eaf | grep blablabla". It can also highlight changes in the text, break itself out of the loop under given conditions, and a few other things.

The major difference between your function and watch is that watch will use the alternate screen for output if possible.


Nope, I was not aware. Neat.


The grep above will match itself in the output. One way that I have seen to filter it out is grep -v grep, but if you are actually looking for grep processes, then this won't work.

The shell can also filter these out. Consider this basic shell function:

    pps () (

    # A subshell, so changing IFS doesn't leak outside the function.
    IFS=\|

    ps -ef |
    while read -r a
    do case $a in
         # Print lines containing $1, or lines with no digits at all (the header).
         *$1*|+([!0-9])) printf %s\\n "$a";;
       esac
    done
    )
This uses a shell pattern instead of a regular expression (shells with more features than POSIX can also use regex). It's actually a dual pattern: either no digits at all, which matches the header, or a match of the pattern given as the first function argument.
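Usage is just:

    $ pps VBoxHeadless
Since the filtering happens inside the shell, there is no grep process around to match itself.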

The function uses a subshell () instead of command grouping {}. This allows IFS to be manipulated without modifying this setting outside the function, at the cost of a fork. POSIX doesn't support local shell variables; if your shell does (with typeset or local keywords), it will be slightly more efficient.

My function would slot into yours, but yours would not work in the Debian default (dash) shell, because the "function" keyword is not defined in POSIX.

POSIX approaches are the most important, because you can use them just about everywhere.


Is there a way to make `watch` do this instead? It always goofs up my output, but it would be cool if you could use it here.


Interesting stuff. One small correction that I noticed:

   $ sudo locate cfg | grep \.cfg$
should contain quotes, like this:

   $ sudo locate cfg | grep '\.cfg$'
because the shell swallows the backslash.

But locate(1) supports globs, so I think this could be simplified to just:

   $ sudo locate '*.cfg'
But on my Ubuntu, /etc/updatedb.conf has PRUNE_BIND_MOUNTS="yes", so my /home, on a separate partition, does not get updated. So I often resort to:

   $ find -name '*.cfg'
somewhere in my $HOME directory.


You can create a separate locate database for /home if you want. Set it up for periodic updates (via cron or other means) and then use "locate -d <path_to_db>" for searching.
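With mlocate that's roughly (paths invented; -l 0 skips the visibility checks so the database can be used unprivileged):

    $ updatedb -l 0 -U /home -o ~/.home.locatedb
    $ locate -d ~/.home.locatedb '*.cfg'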


Or using globbing in zsh:

    $ echo **/*.cfg
You can even order by most recently modified:

    $ echo **/*.cfg(om)
`setopt EXTENDED_GLOB` needs to be on.


You might want to add -b:

       -b, --basename
              Match only the base name against the specified patterns.  This is the opposite of --wholename.
To ignore files which have .cfg somewhere in their path, but aren't called .cfg. Or you might not!


bash 4 supports `|&` as shorthand for `2>&1 |`, which looks better in pipelines, e.g.

  $ docker logs container |& grep word


I was looking for this just the other day, but thought it would have been the other way around (&|, consistent with &>), and moved on with 2>&1 |


Can anyone recommend a modern explanation of sed's more advanced features?

I'm ashamed that, despite having learned many other ostensibly more complex systems and languages, sed continues to elude me. Of course I get the usual 's/foo/bar/g' stuff with regular expressions and placeholders, but I still don't really understand most of the other commands or, more usefully, the addresses.

Perhaps I need to spend quality time with the manpage in a cabin in the woods, away from modern distractions, but so far it has proven impenetrable.

Perhaps I'm missing a proper understanding of its 'pattern space' vs 'hold space'.


Unlike the GNU sed manpage, the info docs are really quite thorough and contain many examples. They cover the vast majority of the features, and they call out GNU specific extensions so they're useful for learning non-GNU sed too.

They're available in HTML too¹, if you don't like the info user experience.

¹ https://www.gnu.org/software/sed/manual/html_node/index.html


Awesome! They are indeed much clearer than the FreeBSD manpage I have on macOS.


https://www.grymoire.com/Unix/Sed.html is a good resource to get started, though it's definitely not something you should just read, since sed code is pretty hard to follow. Have a terminal on the side and try it out live. Once you understand pattern and hold spaces, sed becomes a pretty fun esoteric language :)
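As a taste of the hold space, here's the classic tac-style line-reversal one-liner:

    # 1!G - on every line but the first, append the hold space to the pattern space
    # h   - copy the pattern space into the hold space
    # $!d - on every line but the last, delete the pattern space (print nothing)
    $ sed '1!G;h;$!d' file.txt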


I try to use "smaller" commands instead of more versatile but complex alternatives. I'm not sure if I get better cpu/io/etc usage by using cut or tr instead of sed or awk (or perl and so on) but it is a pattern I followed by decades, at least for simple enough commands. And I may have something less to remember if I do i.e. rev | cut | rev instead of awk to get the last field.

And having something less to remember, and lowering the complexity for the reader (that not always be me, or if I am, may not be in the same mind state as when I wrote that) is usually good. But it also is having predictable/consistent patterns.
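The last-field trick, concretely:

    # Reverse the line, take the (now) first colon-separated field, reverse back.
    $ echo 'root:x:0:0:root:/root:/bin/bash' | rev | cut -d: -f1 | rev
    /bin/bash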


As an aside, "zwischenzug" is a great name. A "zwischenzug" is an "inbetween move" in chess that might happen in the middle of a sequence, and can often lead to unexpected results if one side hasn't been precise in their move order.

https://www.chessjournal.com/zwischenzug/ for example has some more info if you're curious.


One feature I really wish for in shells is something similar to Perl's unquoted strings / single line concise HEREDOC syntax / raw literal syntax. e.g.

  $ echo q(no ' escaping ` required " here)
  no ' escaping ` required " here
This would make typing SQL / JSON much easier. To my knowledge none of the shells implement this. Does anyone know why?
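The closest bash gets today is probably a quoted heredoc, which at least disables all interpolation, though it isn't a single-line form:

    $ cat <<'EOF'
    no ' escaping ` required " here
    EOF
    no ' escaping ` required " here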


The POSIX shell was set in stone in the early '90s. The standards board actually removed features from the Korn shell in order for Xenix-286 or comparable systems to be able to run it in a 64k text segment, with clean and maintainable C (ksh88 is very ugly C).

The standards for the POSIX shell are controlled by the Austin group/OSF, and they are not receptive to changes, unfortunately.


Going to echo: Bash is bad. Bash is a feature-impoverished language and belongs in the dustbin. I don't understand why we script with one hand tied behind our back.

With Python I can install "click" and get great argument parsing and multi-command support with a simple install. Bash's argument parsing is much more verbose and arcane.

Sure, it has nice ways to combine tools and parse strings and such. However: You could also implement those nice abstractions in a higher quality language. So a good thing about bash does not cancel out all the bad.

I would LOVE a toolkit that is:

- A single binary

- That parses a single file like make (but NOT a makefile)

- That uses a well-known, powerful language

- That lets me declare tasks, validations (required arguments, etc.), and workflows as first-class citizens

- With out-of-the-box `--help` support

- That lets me import other makefile-like files to improve code re-use (both remote and local)


It feels like trying to program in an esoteric language, or a toy language made by a 16yo.

It's nice if you have no logic or loops or variables, but it just goes downhill from there.

I really don't know why we can't just take a fork of MicroPython, agree that we are not gonna make any backwards incompatible changes anytime this decade, and none at all without changing the name, and then just include it everywhere.

All these "Let's make a better shell" projects are still held back by being shell. They are trying to be a UI and a language at once, and they're just OK at UI and kinda crappy at programming.

I think the closest tool to what you describe is Ansible which is in fact wonderful.


I think the features that it has make it the language that it is.

If you tried to remove the need for combining other programs by making everything a builtin, you wouldn't design it the way it is.


As much as I like these, these are also how you blow both feet off.

As long as you're just exploring (everything is read only), you're okay.

The moment "kill" and "rm" (anything mutable) come into the picture on a long shell pipeline, stop. You probably don't want to do that.

You need to be thinking very hard to make sure that an errant glob or shell substitution isn't going to wipe out your system. Make a complicated pipeline that creates a simple file, examine the simple file by hand, and then feed that file to "kill"/"rm"/"whatever" as close to unmodified as possible.

You may still blow your system away. However, at that point, you've given yourself the best chances to not do so.
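Concretely, something like (GNU xargs assumed; the pattern is illustrative):

    # 1. Generate the candidate list; 2. eyeball it; 3. only then act on it.
    find . -name '*.bak' -mtime +30 > /tmp/to-delete
    less /tmp/to-delete
    xargs -d '\n' -r rm -- < /tmp/to-delete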


Being able to write a file directly in the shell with heredocs is so cool, but I know I'll never use it because it's far quicker to just open vim and type with all the commands and verbs and motions that I'm used to.


Ah, but that only writes the file once. Heredocs can write a new file, in a different position and with different variable replacements, as many times as you like. Very powerful.
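A sketch of the repeated-write idea, with fresh substitutions each pass (names invented):

    for host in alpha beta; do
        # The unquoted EOF delimiter means $host is expanded on every pass.
        cat > "/tmp/$host.conf" <<EOF
    server_name $host
    log_file /var/log/$host.log
    EOF
    done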


I’m not sure I follow. It’s pretty straightforward to write the contents of a vim buffer to a file.


I know there are many ways the same thing can be done in the shell, but there are so many problems here. Please take this as feedback and not harsh criticism (I know there's a comment section on the blog, but I'd rather not give my e-mail to yet another website).

    > cat /etc/passwd | sed 's/\([^:]*\):.*:\(.*\)/user: \1 shell: \2/'
Besides the needless cat, this is clearer in awk, a tool you mention just prior.

    awk '{FS=":"} {print "user: " $1, "shell: " $NF}' /etc/passwd

    > $ ls | xargs -t ls -l
Beware files with whitespace in their names.

    > find . | grep azurerm | grep tf$ | xargs -n1 dirname | sed 's/^.\///'
    >
    > ...
    >
    > for each of those files, removes the leading dot and forward slash, leaving the final bare filename (eg somefile.tf)
No it doesn't; it returns the names of parent directories where files ending in "tf" are found and where the path also includes the string "azurerm".

    > $ ls | grep Aug | xargs -IXXX mv XXX aug_folder
Ew.

   mv *Aug* aug_folder/
Though the whitespace-in-path-names issue mentioned earlier is worth keeping in mind here too. Another minor point: with GNU xargs at least, using -I implies -L 1, so the original example is equivalent to a loop.

    > $ ps -ef | grep VBoxHeadless | awk '{print $2}' | xargs kill -9

    pkill VBoxHeadless
You acknowledge that SIGKILL should be avoided if possible, so I don't know why you put it in the example.

    > $ sudo updatedb
    > $ sudo locate cfg | grep \.cfg$

    locate '*.cfg'
No idea why sudo is used when running locate; also, locate takes a glob, so the grep here can be avoided.

    > $ grep ^.https doc.md | sed 's/^.(h[^])]).*/open \1/' | sh
Now this is just nutty.

    > This example looks for https links at the start of lines in the doc.md file
No it doesn't; it matches the start of the line, followed by any single character, then "https".

The sed then matches start of the line, followed by a character, starts a capture group of the letter "h" followed by a single character that is not "]" or ")", closes the capture group and continues to match anything.

The example given will always result in "open ht" for any hit.

Then there's the piping to sh. You mention liking to drop that part in order to review the list. I'd suggest focusing on extracting the list of links, then piping to xargs open. If you get a pipeline wrong, you could unintentionally blast something into a shell that will execute anything, which might be very bad.
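Something in this direction, say (assuming the links really do start a line; xdg-open on Linux, open on macOS):

    # Extract just the URLs, then hand each one to the opener.
    grep -o '^https://[^ ]*' doc.md | xargs -r -n1 xdg-open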

    > Give Me All The Output With 2>&1

    &> stdout-and-stderr
But each to their own.

    > env | grep -w PATH | tr ':' '\n'

   echo $PATH | tr ':' '\n'
or

    tr ':' '\n' <<< "$PATH"
Edit: I know the formatting of this comment is terrible, I think quoting the sed example caused the rest of the comment to be marked up as italics so I indented it along with the other quotes as code.


Thanks for taking the time to explain all of these cases.

While I was reading I noticed some of them being weird but your insight was really helpful.


No mention of my pet peeve! You don't need a subshell/seq to count.

The shell is smart, it can count for you - and even respect leading zeros. These examples are subtle but different in result:

    for i in {00..10} ; do echo $i ; done
    for i in {0..10} ; do echo $i ; done
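The first form zero-pads every number to the same width; the second doesn't:

    $ echo {00..03}
    00 01 02 03
    $ echo {0..3}
    0 1 2 3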


I have written thousands of lines of bash, but every time I have to write another script my teeth start to ache. Not sure why.


After almost 30 years and tens of thousands of LOC, possibly even into 6 figures, I've now thrown in the towel and just use Python for all my scripting needs. I've yet to have a single shell script that hasn't turned into a maintenance nightmare, regardless of "bash best practices" (such as they are).

Bash is a terrible language, and needs to die.


I've occasionally wondered why Python isn't the default scripting language on *nix by now. It's almost always installed by default, is more capable than bash, is dynamic/interpreted, has a repl, and seemingly everything else you would want in a Bash replacement. Is it just Bash's momentum, or something else that keeps bash hanging around?


Please show me how to do `ls | grep -i old` in Python?

For a short while I was following https://github.com/matz/streem (from the original author of Ruby) with interest in it possibly being adapted to supersede Bash. I have no idea how realistic that actually is.

I also use fish a lot, but it has a lot of its own problems.

The thing about all these "shells" that really separates them from Python (et al.) is that they look up commands from PATH and execute them directly. Anything else is simply not a shell language, no matter how great it is at scripting.


> Please show me how to do `ls | grep -i old` in Python?

  import os

  for filename in os.listdir(os.curdir):
    if 'old' in filename.lower():
      print(filename)
One of the advantages of using a language like Python is you don't need to set up a lengthy pipeline of program executions just to reformat program output. In my personal experience using Python for basic automation scripting, I just haven't really come across any need for me to pipe output from one command into another.

The problem I have with a lot of shells is that their variable syntax is almost completely broken if you want to do anything slightly harder than trivial. A recent example: I was trying to write a script that reads a shell script and executes its last line with some extra arguments. That should just be `PROG=$(tail -n1 commands.sh); creduce-script "$PROG"`, nothing hard about that... except the last line was `clang "-DFOO= " bar.c`.


I'm not disputing that something like Python is nice for when I'm writing scripts out and saving/committing them.

A shell is a REPL designed to use quickly and easily for one-off jobs. Shell scripts are also sometimes just simpler to string together than referencing a whole new language.


You probably wouldn't do that in Python because Python isn't a shell. It was never designed to write mini programs in 5 seconds as a way to interact with the computer.

You would write a script with os.listdir and 'if "old" in path' if you wanted to automate it.

If you were doing things interactively in a Python-minded OS, you might either have something much like the shell is today (maybe very stripped down, without a focus on programming features beyond globbing, expansion, and pipes), or you would have a Python library of misc utils.

Maybe you would do

x=ls() search(x, "old")

But that would be rather slow. Maybe you'd have some completely new model of text based interaction that was more menu driven and looked a lot like VS Code's command palette.

You'd press tab at the beginning of a line (an arbitrary, easy-to-reach shortcut), and get your command palette. Start typing and fuzzy search happens.

Select "Search filenames", and maybe that's a common enough snippet that it would be built in. Now you have that whole script, and your cursor is already in the search term field.

Press tab there. You get a menu of recently set variables, recently used short strings, etc, that the heuristic might want you to have.

With better UI we could afford a more explicit traditional language syntax.

Both scripting and shell could be a lot better.


How about Groovy?

("ls".execute() | "grep -i old".execute()).waitForProcessOutput(System.out, System.out)

It can also be written as

"ls".execute().or("grep -i old".execute()).waitForProcessOutput(System.out, System.out)

It's not exactly the same because it uses a thread and a buffer instead of an OS pipe but I guess it's close enough.


I wonder why Groovy decided to map "or" to the shell's "|". Wouldn't it have made more sense to map "or" to the shell's "||" and "pipe" to the shell's "|"?


I think this is just how operator overloading works in Groovy: https://groovy-lang.org/operators.html#Operator-Overloading

"Or" is the name of the "|" operator. From the looks of it it's impossible to overload "||" in groovy.

By the way the "|" operator and the "or" method both call "pipeTo" directly so a third way to write it would be:

"ls".execute().pipeTo("grep -i old".execute()).waitForProcessOutput(System.out, System.out)


Why should I need to explicitly state that I want to wait for my process output? I'm in a shell, I put a `&` afterwards if I want things in the background. The model is synchronous by default.


    ls | ? name -eq old

    ls | ? Name -match '^old$'
Any other /property/ of the file (datetime fields, name, extension, path) can be matched with a string representation or object methods.

But yes, this isn't Python.


Common shell tasks (piping, process substitution, capturing a program's output in a string, creating and redirecting output and input to files) are extremely cumbersome in Python.


Because shell is really good at gluing together commands and has reasonably straightforward error management mechanisms

Contrast that with:

    subprocess.run("some_thing with-output")
    subprocess.run("the_next_thing")
which requires that the caller take steps to grab, check, and then re-emit any errors (or even just the "normal" output) produced by either of those, not to mention the horrors of output buffering for long-running processes and the ever-present encoding woes since subprocess interfaces as bytes not str

I guess it is the same situation with all programming -- diligent programmers can make it work, less disciplined users inflict pain upon all downstream users of poorly coded scripts


Historically, this is exactly the role Perl has played. It's also why people keep repeating the nonsense that Perl looks like line noise: the only Perl code people are familiar with is throwaway scripts written by (usually) some sysadmin who is absolutely not a developer.


And JAPH (Just Another Perl Hacker) type constructs that are meant to bend the language in weird ways and show off the person's knowledge of the language intricacies. They were a lot of fun, but likely gave a distorted idea of an average Perl program, to those unfamiliar with it.


I've also wanted a better shell scripting experience, and have bashed [no pun intended] my head against this several times. I think some of the major pain points that resist adoption of a language like Python or Ruby for in-line shell scripting or simple automation are:

* These languages prefer to operate on structured data. If it parses from json, you have a much easier time. But, most commands you'd invoke from the shell emit unstructured text by default. You can deal with this, but it's a pain point, and it means any serious scripting starts off with a "here's how you parse the output of <foo>", and several rounds of debugging.

* The shell's first and foremost usage is a user-facing interactive interface to the computer. Most of what you do is invoking other programs and seeing their output, and doing this is very easy. While python & ruby have REPLs, these are mostly focused on trying out language features or testing code, not invoking other programs, nor navigating a file tree. A lot of shell scripts start as 1-liners that grow and grow.

* Invoking other programs: In sh, invoking other programs is the primary activity, and it's relatively straightforward - you type the name of that program, press enter, and hope that it's on your path. In Python or Ruby, it requires importing the proper library, choosing the correct command, wrapping it and arguments, and ensuring everything escaped correctly. [Ruby _does_ have the backticks operator, which does actually make a lot of 1-off scripts easy, but this is not a panacea]

* In sh, a lot of the 'utility' programs like cut, sed, awk, grep, head, tail, etc., are standing in for the capabilities of the language itself. In pure Python or Ruby, you'd do these kinds of things with language built-ins. But, that's a learning curve, and perhaps a bit more error-prone than "| head".

* On top of all that, yes, momentum. If tomorrow you showed me a shell replacement for *nix that was unambiguously improved in every way, had excellent documentation, community support, and was actually pre-installed on every machine, it would still take a decade or more before it was really a default.

-----

I want it to happen, so I'd never discourage anyone from taking a swing. IMO, some of the top-level considerations that are necessary for making a successful sh alternative are:

* minimize the additional # of characters required to invoke a program with arguments, compared to bash.

* Decide which suite of typical utilities should actually be built-ins (i.e., things like cd, ls, cp, grep, curl), and make those standard library, built-in, without additional import or namespacing.

* Focus on an append-style workflow. Functional programming styles can kind of help here. Wrapping things in loops or blocks is a point of friction.

* An additional highly-desired feature which just isn't in sh by default, to overcome momentum. I have no idea what this would be. More reliability and better workflow are _nice_, but sh is sticky.


If you are producing lengthy bash scripts, you are probably using the wrong tool.

Bash is good at being a programmable shell, and bad at being a programming language. It's great to be able to express complex operations on the command line, but once you start putting them in files you're gonna have headaches.


Define "lengthy"? I write bash scripts in the 300 line range regularly, and they are fine for my purpose. Why should I be using Python?


Maybe MicroPython to save some memory and cycles, Python for the larger scripts if need be.

I'm surprised how bash survives. I think it's part of a few traditions that are tied to *nix, just like sysvinit, or raw string pipes as IPC. They'll soon fall in sequence.



