[flagged] Advanced Shell Scripting Techniques: Automating Complex Tasks with Bash (omid.dev)
78 points by omidfarhang 14 days ago | 63 comments



Both `[` and `test` are built-ins (they are the same command). There's nothing inefficient about using them. You might always use `[[` for consistency, but I personally use `if test` unless I need the functionality of `[[`, since I find `if test` more legible and when I see `[[` it's a hint that, yes, I really do need its functionality.

https://www.gnu.org/software/bash/manual/html_node/Bourne-Sh...

All of the recommendations about efficiency seem misguided to me.


> All of the recommendations about efficiency seem misguided to me.

I've seldom had to worry about the performance of my scripts. It almost always is dominated by the performance of whatever the script is running. I did have one very notable exception, though. It's kind of embarrassing.

On one project---for "reasons"---I had written several bash scripts to collect system performance metrics to push into Nagios or Graphite. A couple of these had to calculate some floating point, and I was invoking `dc` to do that, as bash only supports integer arithmetic. The script would loop once every 5 seconds or so, iirc, invoking dc each time. One small process every 5 seconds doesn't sound like a lot, but the systems also ran ClamAV---again, for "reasons"---and were already under high load, and that extra process invocation had a noticeable and significant impact on the system.

My workaround was to run a single instance of dc using bash's `coproc` keyword, then redirect all the calculations through the returned descriptors. This had a drastic improvement.
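
The shape of it is roughly this (a sketch, not the actual script): one long-lived dc behind `coproc`, with expressions written to and results read back from the coprocess descriptors:

    coproc DC { dc; }

    calc() {
        # write one dc expression, read one result line back
        printf '%s\n' "$1" >&"${DC[1]}"
        local result
        read -r result <&"${DC[0]}"
        printf '%s\n' "$result"
    }

    calc '3 k 10 3 / p'   # -> 3.333
    calc '2 k 95 7 / p'   # -> 13.57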

Except... it turns out that dc leaked 80 bytes of memory whenever it read and parsed a float. So, the long-running dc process would eventually get oom-killed. So, I added a check to restart the dc process every 5000 iterations, or something like that.

Which all sounds like a big headache, but for "reasons" this was far easier than getting some other solution into that environment.

I did submit a bug report (2020 Feb 19), and Ken Pizzini (dc's original author, I think) was kind enough to reply even though he wasn't the maintainer. Unfortunately, my patch never got upstreamed.

Then again, who else is crazy enough to use dc as a long-running math server?


Note that you've almost certainly spent your time correcting "LLM slop" - https://news.ycombinator.com/item?id=40723766


I do not use bash for scripting^1 but I prefer to use "if test" rather than "if [". Same reason: legibility.

However I continue to see shell scripts from what I believe are probably competent scripters, i.e., scripts in tarballs of software from authors I trust, and they almost always seem to prefer using "[". Wondering if there is a reason or it is purely style.

1. Bash is too big and slow for scripting. The bash author admits it is "too big and slow" in the manpage. The world's most popular Linux distribution does not use bash as its scripting shell; it uses something smaller and faster derived from NetBSD Almquist shell. There are more efficient shells for scripting than bash.

NB. Nothing against bash. As an experiment I have been using busybox bash as an interactive shell. Normally I use NetBSD ash. I like the way busybox bash does not keep duplicates in history and its command search is quick. But like its full-sized counterpart it is too slow for scripting. Using busybox to do scripting is so slow that it is basically useless for anything but the smallest scripts.


FWIW, in a tight for loop of a million executions of either test 1, [[ 1 ]], true, or :, I get:

    test 1    -> 2.70s
    true      -> 2.55s
    :         -> 2.25s
    [[ 1 ]]   -> 2.10s

Like, I don't even have a way to measure the overhead of just the for loop as [[ is faster than that.

But we can do something and see the cost of that: let's do a -z $a when a=. We don't need to quote the $a with [[, but I'll do it both ways:

test -z "$a" -> 3.25s [[ -z "$a" ]] -> 2.65s [[ -z $a ]] -> 2.10s

So, sure: it isn't a performance difference that is likely to kill a use case, but it does exist. I also agree that it isn't necessarily worth bothering with... but I just can't get behind something -- using test -- that is both slower and more complex... like, to me, it isn't [[ that has "functionality" that I might not need, it is test: with test I can programmatically decide operators by putting them in variables, while with [[ everything must be syntax. This is an incredible amount of extra power that you get with test that I frankly don't just not need... I actively would prefer not having except in a scant handful of scenarios.

It is much easier to glance at a use of [[ and know 100% for sure what it is doing than with test, where you must carefully consider all of the quoting rules, and even then the behavior might be dynamic.


Now what about my own probably unjustified bash performance (I mean style) habit: using (( )) instead of [[ ]] whenever the arguments are all numerical.

I'll even use $# or ${#a} in place of -z sometimes if it means being able to keep the whole test in a single (()) instead of needing a separate [[]].

(also [[]] does not need -z. Just [[ $a ]] is the same.)
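
Something like this, as a sketch:

    a= count=3
    # all-numeric test in one (( )), with ${#a} standing in for -z
    if (( count > 2 && ${#a} == 0 )); then
        echo "count is high and a is empty"
    fi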


I always prefer `if test` too, and a line break instead of semicolon:

    if test 1 -gt 2
    then echo 123
    fi

    if test 1 -gt 2
    then
        echo 123
    fi
Much cleaner than the random symbol fiesta


This is a pretty suspect list.

"Built-in commands" showcases [ ] vs [[ ]]. This is not an efficiency win, but a matter of using older POSIX syntax vs more modern and richer bash syntax. The utility of the bash syntax is that it supports nicer operators, such as < > in addition to stuff like -gt / -lt.

"Efficient File Operations" showcases IFS=, which is not used for the sake of efficiency, but to change how the line is tokenized. (And what good is an efficiency claim that's not backed up by any data?)

I did not look too closely, but I suspect the rest of this list is equally eyebrow-raising.


You probably have the `printf` in your PATH. However, if you type `printf` on a bash script, a builtin will be used instead.

Try a small benchmark to see the results of /usr/bin/printf vs printf. It really makes a difference (not invoking an external program is a huge performance win).
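
Something along these lines (a sketch; exact numbers will vary by machine):

    time for i in {1..1000}; do /usr/bin/printf '%s\n' "$i"; done > /dev/null   # forks 1000 processes
    time for i in {1..1000}; do printf '%s\n' "$i"; done > /dev/null            # builtin, no forks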

The author is indeed confused though. `test`, for example, is also a builtin. You can have an empty PATH= and still invoke printf, test, echo, and a few others.

The list is weird.


Type `help` in your shell to see a list of all the builtins.


In _some_ shells! (most user shells anyway). `dash` for example does not have help.

Some form of builtin list is always on the man pages though.

There's no good way of telling if a builtin exists programmatically (for, let's say, feature detection).

There's an almost good way with `{ PATH=; command -v isthisreallife; }` and some shells will provide a `builtin isthisreallife` that you can use without having to nuke the PATH variable.
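
E.g. a sketch that runs the probe in a subshell so the emptied PATH doesn't leak out:

    if ( PATH=; command -v printf ) >/dev/null 2>&1; then
        echo "printf is a builtin (or function) here"
    fi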


I appreciate pedantry as much as the next person, but I think it's safe to say that if you're using a shell without support for `help` you can make your way without it.

As a more practical suggestion, I find functionality detection is much less important since I learned to disavow all knowledge of a script when sharing it


> There's no good way of telling if a builtin exists programatically (for, let's say, feature detection).

Doesn’t the `type` command (technically a builtin) meet your requirements?


It kinda does, and it is portable enough (modern bash zsh ksh dash busybox), thanks!

Of the mainstream packaged shells, it fails only on `posh`, but that's fine, posh is super austere, it also lacks `command`.

However, I will probably still use `command -v` for sporadic uses. That's because I don't need a subshell to capture the output:

    OLDPATH=$PATH;PATH=
    if command -v echo
    then echo is builtin
    fi
    PATH=$OLDPATH

    if test "$(type echo)" = "echo is a shell builtin"
    then echo is a builtin
    fi
`type` however seems to be able to pick up more stuff, so I might use it in some cases like this:

    case $(type echo) in
        'echo is a shell builtin') do_something;;
        'echo is aliased'*) do_something_else;;
        'echo is /'*) do_something_else_entirely;;
    esac
It kinda fits an early feature detection that I could put in a more sensible place (my lib init) and pay the subshell cost only once.


Both of those examples stood out to me also. All the examples could use explanation.

This reads like someone's personal cheat sheet more than an informative post.

There are some good suggestions but I'd really like to know why the author has these opinions.


Maybe the blog writer put it up for their own reference, but personally I like learning the ‘whys’.

I think I will just stick with Posix KSH.


KSH has just as many extensions beyond posix as Bash does.


The blog writer is ChatGPT


Was this article written by AI? It's a strange mix of really basic stuff and true advanced suggestions, jumbled together with no apparent organization scheme.

Then the "complex tasks" are just a couple basic system administration examples that I can't figure out the purpose of either, they don't show off most of the tips mentioned above and don't really help inspire a sense of power for shell scripts.


> a strange mix of really basic stuff and true advanced suggestions, jumbled together with no apparent organization scheme

Sounds very bash-ish.


Plus none of this "advanced" garbage is handling quoting properly. ShellCheck would light it up like Christmas.


I really don't understand the "User Management" example.

Rather than typing `useradd foo`, you can type `user.sh add foo`! So much easier?


If I look at the other posts, yeah it feels very LLM-ish

https://omid.dev/posts/

They are good at generating code with some stilted prose around it

Also yeah the cadence and number and topics of the posts make me suspicious -- I don't see a theme or purpose, just a bunch of "content"


I'm 110% certain it's ChatGPT based on the other posts. If you look at the current top three latest posts:

https://omid.dev/2024/06/19/advanced-shell-scripting-techniq...

https://omid.dev/2024/06/19/advanced-angular-change-detectio...

https://omid.dev/2024/06/18/haskell-monads-functors-and-appl...

They all end with: "Happy coding!", "Happy scripting!", and "Happy Haskell coding!". For some reason, when you ask ChatGPT for a blog post about coding, it'll always end with "Happy coding/scripting".

I run a similar blog, and there's a lot more massaging I have to do post-query to get it into a more "human-like" state (removing any match to "Happy%" as last line, expanding certain sections, etc). Also, all of the blog posts are about the same length - which is a dead giveaway if you don't ask it for a variable length.

The script I wrote that does the query changes topics/categories at random, and assigns a random length between 2000 and 4000 words, then it does paragraph length detection and re-queries when a paragraph is not long enough. It does a few more things like updating the Front matter and adding some times to the Front matter (this linked blog also runs on Hugo with the hugo-coder theme by the looks of it lol) before outputting to the markdown file.


Yup, so the question is then why does this story have 56 points?

I thought Hacker News readers were better than that ...

"Slop" is indeed taking over the web, even Hacker News.


That's a very good question. I don't post mine because I just use it to drive adsense traffic - and because it's disingenuous to post autogen stuff and pass it off as your own.

I honestly would have thought readers here would have been better at picking this sort of stuff out.


Also quite a few of the explanations are flat out false.


Reminder of the handy ShellCheck:

* https://www.shellcheck.net

Even if you don't follow or agree with its advice, it can be a handy and quick second opinion / sanity check.


Integrated into IDEA (and other IDEs)... so it checks as you type.


Agreed, so useful if your whole identity isn’t bash.


Definitely just as useful even if it is!


I’ll throw out a recommendation for Pro Bash Programming as a great book for advanced scripting techniques. I just received my copy of the third edition (which is now simply Pro Bash). The original author and his co-author also wrote Shell Scripting Recipes. I bought the books because I recognized the original author from Unix.com forums, where he wrote about how his code would be more efficient than another example when deployed over N number of instances. I had no ability to verify at the time, but his confidence made me believe in the “Pro” title. It was way over my head until I became fairly proficient through other books. Definitely would recommend. No affiliation.


If you start to think about optimising the performance of your bash scripts it might be time to rewrite in another language


Published in 2003 in sysadmin magazine:

https://www.kozubik.com/published/self_parsing_shell_scripts...

Backups, system monitoring (same df example) …

I will charitably conclude that great minds think alike!


So that is where this ChatGPT generated article got its data!


The very first "technique" is odd and based on a false assumption. Bash doesn't call /bin/[ binary when evaluating expressions. It uses its own builtin implementation instead.


> Using set -e ensures that your script exits immediately if any command fails, preventing cascading errors.

There's a huge asterisk around using -e

https://www.in-ulm.de/~mascheck/various/set-e/ contains a matrix of the different shells (and versions) and how they handle different conditional orders with -e
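
The classic surprises look something like this (a sketch):

    set -e

    false && echo "never printed"   # a failure on the left of && does NOT exit the script
    echo "still running"

    if false; then :; fi            # failures inside an if condition are ignored too
    echo "also still running"

    false                           # a bare failing command does exit
    echo "never reached"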


I would add `shift` and `command shift` to that matrix.


Both of the examples given in the "avoid subshells" section use subshells. There are probably more problems, but I stopped there.


I've written an article sharing how I use Bash to do all sorts of things. I love Bash (even though I use python a lot too now) :)

https://www.dsebastien.net/2020-05-09-bash-aliases-are-aweso...


I have a thing that I only wrote in bash as a stunt to see if I could. It interacts with a hardware device over rs232, and the protocol is all polling and timeouts. So it needs to sleep for short durations many bezillion times, and bash has no built-in sleep like ksh does, and I don't want the script to rely on plugins. So....

  # read from an fd that never delivers data; -t makes it return after N
  # seconds, and the trailing : discards read's non-zero "timed out" status
  _sleep () {
   read -t ${1:-1} -u 4 ;:
  }

  # per-user fifo, opened read-write on fd 4 so the open itself never blocks
  readonly sleep_fifo="${XDG_RUNTIME_DIR:-/tmp}/.${0//\//_}.${LOGNAME:-$$}.sleep.fifo"
  [[ -p ${sleep_fifo} ]] || mkfifo -m 600 "${sleep_fifo}" || abrt "Error creating \"${sleep_fifo}\""
  exec 4<>${sleep_fifo}


I’ve found my comfort zone writing my commonly used CLI tasks into a single Go binary as subcommands. It’s a departure from shell scripting but it’s been worth it so far to have the entire suite of CLI building libraries, rich loggers and sane syntax.

If there’s some new task I need then I add a subcommand then `go install`. Fairly quick process end to end.

For example I wrote a `ghcd <org>/<repo>` command to clone GitHub repos or `cd` to them if they already exist. Could’ve been a pure shell script but I enjoyed clobbering cobra, charm bracelet logger and go-got together and I’ve got the foundation down to quickly build more tasks.


How does defining `IFS` here make the second statement more efficient than the first?

    # Inefficient
    while read -r line; do
        echo "$line"
    done < file.txt

    # Efficient
    while IFS= read -r line; do
        echo "$line"
    done < file.txt

https://omid.dev/2024/06/19/advanced-shell-scripting-techniq...


Too late to edit, but serious question if anyone actually knows.

While the tricks are nice, we also have the option of jumping to something like http://www.nushell.sh/ (or one of the other options) for an overall more sane environment.


My pet peeve, grep unnecessarily followed by awk/sed.

Original: df -h | grep "$partition" | awk '{print $5}' | sed 's/%//'

Efficient: df -h | grep -Po "\d+(?=%\s+$partition)"


Original is pretty readable. Efficient looks like what I get when I accidentally get shifted over to the right by one column.


FWIW, on ubuntu 22.04, your "Efficient" doesn't work

    # df -h | grep "$partition" | awk '{print $5}' | sed 's/%//'
    24
    # df -h | grep -Po "\d+(?=%\s+$partition)"
    #


Easiest to read is what I usually try optimize for, especially in a shell script.

I think the first one is much easier to read.


To be fair, if I needed something optimal or it was used often enough to matter, I'd probably reach for the original data in a real language. For a one-off, I can tell what grep/awk/sed does immediately - but I need to stop and think for the efficient solution.


Shouldn't the first one be `grep -F/--fixed-strings "${partition}"`? The second example will break in any case where $partition contains special characters.


Yes, it's an easy fix though:

    df -h | grep -Po "\d+(?=%\s+\Q$partition\E)"


Oh cool, this is the first time I've heard of `grep \Q..\E`

https://stackoverflow.com/questions/54405892/are-q-and-e-sup...

Thanks!

At some point I should probably write an article about the 20k lines of bash and a little python that power my homelab and various automations. Bash isn't perfect but often 99% good enough is fine.


I would say the first way lets you learn more tools you can write one-line CLIs with, without digging through grep's man page.


So Omid Farhang is at best an AI blogspammer, at worst a plagiarist.

So this is overkill for many other people, but you can also write scripts in Rust. I’m authoring a Cloud Native Buildpack (CNB) and my team picked Rust even though all prior proprietary-spec Buildpacks save one are written in Bash.

Is it actually overkill though? Well, the classic buildpack I maintain is older than 13 years and maintenance is by far the most expensive part. Scripts get kind of ignored but deserve the time, testing, and ecosystem for extension that other “mainstream” languages enjoy. We get that from Rust, along with a strong type system, the ability to cross-compile to different systems, and to deploy and run self-contained binaries (the scripts).

It’s not for everyone, but just putting it out there in a chance to find some like minded weirdos, maybe we can help each other.

A lot of my (in progress) library work has been trying to improve the ecosystem around these kinds of workflows. Examples: https://crates.io/users/schneems

I don’t think rust should replace all scripting (of course) but I really want to stress test just how good of an experience it could be.


This sounds like slop. Why would anyone want to script in Rust when bash and python exist?


I replied to another comment with some details.


Scripts generally refer to interpreted languages. They're useful when you just want to do something ad hoc with low effort.

Purpose built binaries are extremely useful and should replace a ton of script usage, however, scripts will always have their place.


When people say “scripts” many times it’s right after “deploy”, which is my target area. I’m not trying to take away from the bash conversation, I’m trying to find allies who care about writing the best possible “scripts” (and script-adjacent programs) in one or more languages.

One of the most recent things I worked on was a hybrid of bash and rust. So I still write a fair share of bash.

> scripts will always have their place

100% agree, hence all the caveats. I resonated with the article in that it was talking about ways to make your code faster, better, etc.

It’s my experience that few aim to write a suite of bash scripts that stay in production for a decade+. It just grows that way because it was the lowest cost, lowest effort thing to do. To your point.

I think content like this is a really good first step, but what about a second step when it’s no longer enough or you realize this adhoc “temporary fix” utility is now the linchpin of your team?

One idea is to rewrite in another language. If not the whole thing, a subsection. I tried Ruby scripting (Ruby dev since 2006) and while I feel it’s better than bash (to maintain) it still has some strong downsides like the need to bootstrap with an interpreter. Which eats into the upsides.

Bash and Ruby struggle with sharing or extending functionality. Bash relies on system commands and Ruby relies on gems/libraries. And in a lot of these adhoc weird situations I find myself in I cannot install packages let alone bundle install libraries. Then you mix in interoperability of gnu versus bsd tools and throw some windows in the mix and it can get really unwieldy really fast.

On the deploy front here’s the CNB Rust project https://github.com/heroku/libcnb.rs

And docs for people who might want to use (but not author) them https://github.com/heroku/buildpacks.


I use rust-script + run_cmd (for external commands). Rust-script recompiles the script when it has been changed, otherwise it just runs it.


Awesome! I've seen several command runner variations but haven't looked too closely at them. How do you like `run_cmd`?

I've got a crate for running commands that helps with the run ergonomics as well as error formatting but still allows for relatively-"ideomatic" rust usage. I would love to hear your feedback https://docs.rs/fun_run/latest/fun_run/


run_cmd uses macros so you can just type shell commands directly, but it does make empty strings disappear ("").

I'd rather not split up arguments, but to each their own.

Either way, it's great to do some shell things to speed up coding, and Rust things to speed up the program safely.



