How to write idempotent Bash scripts (arslan.io)
654 points by kiyanwang 10 days ago | 189 comments

It’s good to know which command line tools are idempotent, but I think once you start running automated scripts repeatedly you’re better off with something like Chef where you declaratively define the end state and let chef figure out how to converge upon that state. Taking the “create a directory” example, you would say “I want to end up in a state where this directory is created and has these permissions” and chef will determine if the directory already exists.

There are of course some downsides to that approach, which stem from the fact that you can’t possibly declare every possible thing. For instance, if you were to run the above declaration and then later remove the declaration, chef wouldn’t know to delete the directory; at that point chef wouldn’t know anything about the directory at all. So ideally you can build infrastructure from scratch each time (Docker, etc) rather than having to converge an existing machine.
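For the single-directory case, the "declare the end state" idea can be approximated in plain Bash. This is only a rough sketch: `ensure_dir` is a made-up helper name (nothing from Chef), and it assumes GNU `stat` for reading the current mode.

```shell
#!/usr/bin/env bash
# ensure_dir DIR MODE: converge on "DIR exists and has permissions MODE".
# Hypothetical helper for illustration; assumes GNU stat (-c '%a').
# MODE is expected in the form stat prints it, e.g. 750.
ensure_dir() {
    local dir="$1" mode="$2" current
    # mkdir -p succeeds whether or not the directory already exists.
    mkdir -p -- "$dir" || return 1
    # Only touch the permissions when they differ from the desired mode.
    current=$(stat -c '%a' -- "$dir") || return 1
    if [ "$current" != "$mode" ]; then
        chmod -- "$mode" "$dir"
    fi
}
```

Calling `ensure_dir /srv/app 750` any number of times should leave the same result. The limitation described above still applies: if the call is later removed from the script, nothing deletes the directory.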

Absolutely this. When the intent is to get the system into a desired state, reach for Puppet/Salt/Ansible.

If you are doing sysop^Wdevops work it is the single most important thing you can learn, once you have a good grasp of your shell and your editor.

The difference between a giant git archive of shell scripts that various people have modified over the years and state changes described in a configuration management language is the difference between fixing things in the small and reasoning about integrated systems. It's something that needs to be experienced to be appreciated.

Reasoning about system state as an integrated whole is just as relevant when you are shipping applications as containers, if not more so. It's not uncommon to start using something like Kubernetes without first being able to describe global state and the result is just as messy as before, if not worse. Something like Helm is impossible to understand unless you have complete control over your configuration.

Is it not a bit of a large jump from "shell script" to "chef"? There is some inbetween.

I understand a shell script is not suitable for orchestrating a datacenter, but sometimes it is suitable for installing a couple of things.

Ansible is good for small or large installs.

100% Agree. I originally got into Puppet after spending several months with a client that wanted me to write bash scripts to push out a major middleware deployment, switchover, and uninstall to 10,000+ stores. There was no standard for the hardware in the stores, and their configuration was all over the place because hundreds of teams also used (non-idempotent) shell scripts to push changes.

I learned a lot about catching all the different bad things that can happen, adding prerequisite checks, rolling back changes automatically, etc. In the end, a tool like Puppet/Chef/Ansible would have taken a fraction of the time to develop scripts for.

Containers have definitely lessened the need for mutable configuration management, but they are still useful in the 90% of environments that are still working in a past decade.

Chef does have an action on their primitives, so it is possible in the directory example to set the action to :delete instead of only implying :create

You’re right, it’s totally functional, the (minor) challenge I see with it is that it breaks the illusion that you’re describing the state of the system, and feels more like describing a set of changes you want to make.

Chef isn't magic, it just uses a lot of cookbooks which are just running things to check what the state is.

Usually those cookbooks will want to have things in a particular way, which might not be the way you want.

Chef/Ansible/Puppet all have the problem of having many layers of overhead for the same thing.

This makes them slow, hard to debug, hard to read and even write (in my opinion).

Sure, bash is hell, but then go from bash to something like Rust, instead of Chef/Ansible.

Sure, again, this might sound rather unconventional, but you get safety, speed and convenience of a full fledged programming language, single binary output, etc. And you need to test cookbooks/playbooks too, so why spend enormous effort on cobbling together scripts and high-level abstractions instead of writing just what you need with the assumptions you really have - instead of what a general cookbook/playbook has to have (which is a vast difference).

Rust doesn't get you anything resembling what you want here.

Chef/Ansible/Salt could be described as rudimentary type systems for operating system states, that enable you to convert between different types/states.

It's conceivable you could build such a system in Rust, but Rust itself has no primitives that would make it easy. It certainly isn't a good starting point when you want to set up a database server.

All Config Management systems execute other programs, parse their output, check/parse files underneath their fancy skin.

I've used both Chef and Ansible. Their maintenance cost is pretty high unfortunately, and they are neither flexible nor resilient enough to be worth it.

No wonder the container/k8s boom is so big. Immutable infrastructure (docker images) with really powerful operational features is what can justify high maintenance efforts. Whereas fancy but brittle (due to the inherent problem of state discovery) CM systems are at best only useful for the initial platform setup.

Rust is just useful because you can quickly and relatively safely produce single binary programs. (Many people use Go for this, but it's easier to skip error handling in Go, which results in runtime problems, which is very inconvenient in a provisioning/setup system.)

Nix/Guix might be a better approach. Ansible and Chef describe part of the desired state of a system and require a very large definition to cover the whole system. Nix aims to track the whole system state and lets you supply changes which combine into another whole-system state.

Yes. Though I'd probably want to simply use something "trusted", either Ubuntu LTS or CentOS/RHEL, keep the installed packages to a minimum, use a local repository mirror proxy, track package changes there, etc.

And the image build should be just a simple imperative install these packages, use this config, run this command on invocation.

Nix is rather amazing with its powerful CLI stuff it provides (S3 compatible dependency store, fetching via SSH, closures, etc).

My only problem with NixOS is that it's very much like Gentoo. It has infinite composability built-in, but it means you have to rebuild everything. In Debian/Ubuntu land you usually can simply enable/disable install/uninstall specific feature related packages. (For example postfix and postfix-mysql packages.)

Writing a shell script in rust would be obnoxiously difficult for no real benefit.

Yeah, and tools like Chef are 99% blocked on disk operations anyways, so I have no idea why GP thinks Rust would help

CM systems spend a lot of time on bootstrapping, creating their workspace, downloading the scripts, parsing them, etc.

Then they simply execute other programs to then parse their output. And/or fiddle with files (parse them, alter them, write them).

Sure fundamentally the syscalls and apt/dnf/yum will be the slow parts, but I found that development of CM scripts/plays/recipes are usually bottlenecked on the turnaround time of the CM system's own workflow. And execution time is a significant part. (The bootstrap, the transfer of whatever files, and so on.)

Rust would help with writing things that are relatively well error handled at compile time, and gives you single binaries.

Why do we care about speed here? Surely reliability and readability are paramount features...

Development speed is important. Even the most reliable thing in the world is of little use if it takes a day to test/build/deploy.

And a simple imperative script is a lot more readable than a custom DSL with who knows what ruby hooks.

Currently the CM system that makes sense in the long run is a git repo for Terraform. (Because everything else runs in containers anyway, and to set up the immutable images you don't need "state management". And where you need it you need active state management such as k8s and its specific operators.)

It's about functionality and maintainability, not performance. If you have complicated scripts then they start to resemble actual programs, at which point you might be better off using a real programming language to code.

I wouldn't pick Rust for this but there are lots of examples of using Node/JS, or C# or Python. It's a much nicer environment and benefits from full IDE support.

For a good example, look at what Pulumi is doing for modern infrastructure: https://www.pulumi.com/

Rust would help due to the type system and lack of library dependencies (it does static linking AFAIK), I suppose.

I feel like my comment went in one eyeball and out the other.

Chef and friends are blocked on disk I/O. How does a typesystem and/or thinner abstraction layer to disk I/O speed up the underlying expensive operation: blocking disk I/O?

I - on the other hand, feel like your comment somehow forgot that the op said “safety, speed and convenience”, so you attacked the least important point made in the first place.

It’s much easier to see the failure conditions in a Rust program than in bash. Rust also seems easier to maintain.

How does having a borrow checker and snazzy memory safety benefit you when nearly all the operations you're performing are disk I/O?

How are 100 lines of Rust easier to maintain than 10 lines of shell script?

It's exactly what the other comment said: "Writing a shell script in rust would be obnoxiously difficult for no real benefit."

Well, for one, Rust's type system can enforce the usage of objects (e.g. a temporary file must be used or deleted).

>> How are 100 lines of Rust easier to maintain than 10 lines of shell script?

In 10 lines of _proper_ bash you won't be able to even check if the arguments supplied to the script even exist, let alone parse something more than simply subscripting argv[].

Rust would be a step up from writing bash scripts of yore—imagine, being able to tell the difference between a variable being empty vs unset—but really any non-stringly-typed language will get you that, and the GC-based ones are unlikely to have memory errors and are generally fast enough for this purpose.
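For what it's worth, shell can tell unset from empty through the `${VAR+word}` parameter expansion, though it is easy to forget. A small sketch, where `TARGET` and `check_target` are just illustrative names:

```shell
#!/usr/bin/env bash
# ${TARGET+x} expands to "x" when TARGET is set (even to the empty string)
# and to nothing when TARGET is unset.
check_target() {
    if [ -z "${TARGET+x}" ]; then
        echo "TARGET is unset"
    elif [ -z "$TARGET" ]; then
        echo "TARGET is set but empty"
    else
        echo "TARGET is $TARGET"
    fi
}
```

This does not change the broader point about stringly-typed languages; it only shows that this particular distinction is available.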

I don’t think there would be anything wrong with a rust implementation of infrastructure management, but I also don’t think it’s the silver bullet to solve what plagues the space.

They didn't claim Chef was magic, and yeah it's just a bunch of cookbooks, but that doesn't change their point that it's declarative and useful for describing an end-state. Chef is hard and slow if you rely solely on others' work without understanding what it's doing, which is functionally equivalent to magic. But if you know how to tune things, keep your cookbooks down to only required ones, and a few other tricks you'll learn from usage, it can be fast as anything. I've seen one particular setup meant to keep a server up to date and configured to have HAProxy serve over 100+ endpoints run in under a minute. It's possible, just a different approach.

I loved Chef after we got familiar with it. It was nice, cute, had solutions for a lot of problems. (foodcritic, kitchen, rubocop, etc.)

But it simply is a solution in search of a problem in today's container orchestrator/platform world.

HAproxy is amazing, but a k8s Ingress service [0] has a nice API, so I don't have to run Chef to add a new vhost. Yet I can persist the config in a YAML. (Or json, or whatever.)

[0] which can of course be backed by HAproxy down below, but traefik has dynamic config; though haproxy2 will do that too


Yeah, I find out-of-the-box cookbooks about as helpful as out-of-the-box docker images: useful as a starting place but to leverage chef you really need to write your own. Chef resources/providers are nicely composable and the primitives are pretty easy to rewrite if you want, my main point is that chef gets you to think in terms of convergence instead of brittle imperative changes.

You could rewrite your bash scripts in rust but while it would probably be safer/more correct than bash, the whole paradigm of writing imperative scripts to manipulate existing machines is flawed, whichever language you write it in. And executor performance is almost never the bottleneck in my experience.

My comment tried to argue that Chef's convergence is nothing more than a well hidden imperative thing.

I agree that manipulating existing machines is folly, but that's why people use containers and immutable images. And then you can prepare the images any way you like, but since they are very deterministic (modulo the external repositories/packages/curl-bash-piped-scripts), you usually use something simple (e.g. bash), because there's no need for that state management.

And actual state management should be left for the platform (k8s, swarm, or some cluster manager).

I found the chef workflow slow, not just the executor itself. (Uploading cookbooks, fiddling with dependencies, testing them, debugging them, etc.)

Ansible was even worse in my experience. Much slower execution, harder to debug, opaque python blobs, extremely confusing and fragile "YAML programming" coupled with hostile variables files. (All the things that are tenfold more intuitive in Chef.)

Fair enough, most of my experience is with a sort of homegrown chef-solo. It has served us pretty well for almost ten years but these days we're in the process of moving to kubernetes + immutable infrastructure.

Many of the comments in this thread seem to be sucked into a false dichotomy. There are better options than _either_ Rust or Bash for writing complicated shell scripts. Perl/Python are more suited to the task than either. The challenge of deciding what degree of complexity warrants the investment of more complicated language is a design decision left to the developer to figure out. Any thought given to designing more resilient software is a good thing, but I can't help thinking that expecting any real degree of safety and security from bash is expecting too much. I don't work in devOps mind you, but whenever I have to write scripts that do anything more dangerous than creating files I leave bash on the shelf.

My experience is that going from bash to something without a compiler has an overhead that rarely pays off. mypy or TS can probably do the job well, but at that point Rust gives you a better deployment story.

Don't get me wrong, I'm absolutely an advocate for statically typed languages. Most of the work I'm doing with bash scripts involves command invocation, piping, and filesystem I/O. Which are things that I've never seen compiled, statically typed languages do particularly better than Perl/Python. I've never done this kind of programming in Rust, but my experience with Rust suggests to me that it's not any easier or more robust to do this work in Rust than it is in Java or C++. Mind you, you might have a completely different experience depending on the task at hand.

Many people who write scripts do not factor in the environment they are run in and miss including traps to handle events like being terminated mid-way thru and cleaning up to a sanitized state. After all, when creating a file you can't presume there is enough free space, but people do in scripts all the time.

Then you have scripts that you want run once and if run again not have any more effect. For example, a script to be run by ops on production servers: what happens if they accidentally run it twice? An easy way to cover that is to have the script create a file, which you check at the start of the script and if present you exit. For this you can use the date/time, PID and script name for a unique file name, create in a /tmp directory that you cleanup weekly via sculker or whatever frequency you require. Handy way to handle scripts that you want run successfully once and never to be run again.

I have not been doing DevOps for very long, but a useful trick I've found is to create a staging directory, perform my work there, diff the result with the existing state, and copy if a difference exists. I've found it very easy to clean up to a sanitized state in this way; the existing state is safe, since all I have to do is just destroy my work.
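A rough sketch of that workflow, in case it helps. The function name `deploy_if_changed` and the "generator" convention (a command that writes its output into the directory passed as its first argument) are assumptions made for the example:

```shell
#!/usr/bin/env bash
# deploy_if_changed GENERATOR TARGET_DIR
# Runs GENERATOR in a scratch directory and copies the result over
# TARGET_DIR only when it differs from what is already there.
# Note: files that exist only in TARGET_DIR are left alone by the copy.
deploy_if_changed() {
    local generator="$1" target="$2" staging
    staging=$(mktemp -d) || return 1
    if ! "$generator" "$staging"; then
        rm -rf -- "$staging"   # the existing state was never touched
        return 1
    fi
    mkdir -p -- "$target"
    if diff -r -q "$staging" "$target" >/dev/null 2>&1; then
        echo "unchanged"
    else
        cp -R "$staging"/. "$target"/
        echo "updated"
    fi
    rm -rf -- "$staging"
}
```

If the generator fails halfway, cleaning up is just a matter of deleting the scratch directory, which is the property described above.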

If possible, I’ll try to make a symlink just point to the staging dir after tests complete. This isn’t always possible, but makes it easy to roll back to the old prod folder if anything goes awry.

You might find rsync cleaner to use than a diff/cp loop.

Another approach is to create a unique directory (named by hash or date) with the results, and once it's good, symlink the real directory to it. (Full disclosure, I stole this approach from nextflow.io, a great workflow system aimed at bioinformatics).

    ln -sfn "build-cache/2019-07-07-21:23:04" "build"

Apple's new-ish APFS filesystem includes something for this pattern: the renamex_np(2) system call with the RENAME_SWAP flag. This atomically swaps two files, and can be used to atomically swap your staging directory with the existing directory. I'm not sure if there's command line support yet, though.

Linux has something similar with RENAME_EXCHANGE. Unfortunately MV(1) doesn't appear to have support for any of the renameat2(2) flags.

What do you use to compute the diff? Just the diff command itself on individual files?

diff -r

You can use numerous tools to compare/manage branches; md5sum / shasum (md5 is usually useful, though not entirely safe), diff and kin, vimdiff, rsync, fdupes, jdupes, git, hg.

I went the „git” way once and I found it really cool! With all the goodies you get for free from git, like applying patches, checking differences, resetting to previous state etc.

Same, but with mercurial.

Could use something like rsync to automatically check for a difference and copy

> traps to handle events like being terminated mid-way thru and cleaning up to a sanitized state

Traps are underused. They solve a lot of error-handling and cleanup problems really easily!

Here[1] is a simple example of using a trap to cleanup a tempfile, that safely handles pipeline errors and unexpected exit/return. The same basic idea works for many kinds of error handling, cleanup, or similar work.

If you ever need to cleanup something at the end of a function

    foo() {
        # ...
        cleanup_command
    }

doing it using a trap might be as simple as moving the cleanup command to a trap at the earliest possible time in the function the command can run:

    foo() {
        trap "cleanup_command" RETURN
        # ...
    }

[1] https://gist.github.com/pdkl95/61fc242e7961cc2584a787ed1760c...

Intriguing. I always trap before setting up, on the basis that a script could get interrupted before setup is completed. Is there any general agreement on what the correct order should be?

It probably depends on if the cleanup is safe to run without its setup. For "rm -f", that's true so the trap probably should be first in that case. If you have to worry about strict dependency ordering, trapping first may not be possible.

Of course, the better solution to those situations is to make the cleanup command idempotent ...

(that said, any order is much better than the unfortunately-common style of simply ignoring errors and exceptions.)
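As a sketch of the "trap first, idempotent cleanup" ordering: the helper below (the name `with_scratch_file` is invented for the example) installs the trap before the temp file exists, which is safe because the cleanup does nothing harmful when there is nothing to remove.

```shell
#!/usr/bin/env bash
# with_scratch_file CMD...: run CMD with the path of a temporary file as
# its last argument, and remove that file when the function's subshell
# exits, whether CMD succeeded or failed.
with_scratch_file() (
    tmp=""
    # Trap first: the cleanup is safe to run even if mktemp never ran.
    trap '[ -z "$tmp" ] || rm -f -- "$tmp"' EXIT
    tmp=$(mktemp) || exit 1
    echo "$tmp"        # print the path so the caller can log or inspect it
    "$@" "$tmp"
)
```

The body uses `( ... )` rather than `{ ... }` so the EXIT trap is scoped to this function's subshell and does not replace a trap set by the calling script.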

> An easy way to cover that is to have the script create a file, which you check at the start of the script and if present you exit.

This does sound a good solution on paper, but "checking at the start" and "creating a file" are two different steps (i.e. not atomic) and will cause trouble eventually if your system has the tendency to run the script twice. A better solution is to use the `flock` command before creating the flag file. For example,

  # hold an exclusive lock on fd 999 for the lifetime of the script
  exec 999>/var/lock/my.lock
  flock -n 999 || exit 1
  if [ ! -f flag_file ]; then
    echo "script has not run before, running"
    touch flag_file
  else
    echo "already ran, exiting"
    exit 1
  fi
  # do stuff here

Within a shell script context, 'mktemp' and 'mv' are almost always what you want, possibly 'ln -s' for libraries.

The first safely creates a temporary file. The second (with noclobber set or -n argument) will atomically rename a file, but not overwrite existing contents.
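A minimal sketch of the noclobber variant, used as a "run once" marker. `claim_marker` is an illustrative name, and the marker path in the usage line is hypothetical:

```shell
#!/usr/bin/env bash
# claim_marker FILE: create FILE, failing if it already exists.
# With noclobber set, the ">" redirection refuses to overwrite an
# existing file, so "check" and "create" are a single step.
# The subshell body keeps noclobber from leaking into the caller.
claim_marker() (
    set -o noclobber
    { : > "$1"; } 2>/dev/null
)
```

Typical use at the top of a script would be something like `claim_marker /var/tmp/myscript.done || { echo "already ran"; exit 0; }`.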

Linux uses symlinks to system library files in order to allow in-place atomic replacement without affecting running processes with open filehandles to earlier versions. MS Windows' lack of this feature is what makes (or made, I'm very out of date) malware removal and system updates so painful.

>Linux uses symlinks to system library files in order to allow in-place atomic replacement without affecting running processes with open filehandles to earlier versions

Nope. If you rm a library, the processes using that library keep using the old library. Symlinks have nothing to do with it.

Just have a database table, and update it with a special key. When the transaction commits, you know the script has executed correctly, and the key is written as a result. When the script doesn't complete, the commit is not performed, and you can run the script again. No need for "idempotent" scripts.

Trying to do this with a filesystem instead of a real database, and things get messy.

You still have to make that messy choice between at-most-once and at-least-once execution of each transactionally guarded section.

The idea is that you run the entire script as a single transaction. Then you only have to check for a key once (which is trivial).

Seems risky. What if it blows up, or is killed, halfway through?

I'm into using transactions as explicit guards around chunks of known-non-reversible logic as a robust way to warn that manual intervention may be necessary, but using them to replace lockfiles seems to have the worst drawbacks of both systems: it's inherently at-least-once, plus it's substantially more risk surface than a lockfile to establish and maintain a database connection/transaction.

Other concerns:

What if the database has gone away or crashed by the end of your possibly-long-running script (unless you're using sqlite, in which case, why not use a lockfile? On second thought, sqlite is probably better at lockfiles than flock...)?

What about the tech choices that using a transaction precludes? You can't write ops scripts in pure shell any more (without some seriously weird file descriptor munging and job backgrounding to keep the transaction open while you do stuff). Installing your runtime of choice plus a database driver may be unnecessary and time/space-consuming on a lot of systems you need to manage this way.

You now also have a bootstrap problem: your database driver and associated tooling may not already be available on your target system, so using such transaction-managed scripts to provision clean/empty systems is another area where additional challenges emerge.

All of those can be mitigated or worked around, and this shouldn't be taken as advice to never use an RDBMS's transaction system to manage ops tasks, but I think the cases where it adds more than it costs are pretty rare.

Without proper locks, this sets you up for race conditions.

Any modern database will perform the locking for you as part of running code inside a transaction.

Is the OS operation part of that transaction?

Assuming umask is one that bites any dev that has never worked with hardened systems.

This is generally terrible advice. A better option is to 'set -e' and ensure that the bash script exits when there's a failure.

Bash scripts can't be idempotent because they operate in an external environment that can't ever be. The better option is to just be extra safe.


> http://redsymbol.net/articles/unofficial-bash-strict-mode/

That's orthogonal to the article - in fact, the article hints that you'd take this advice when your bash script is failing midway, such as it might with `set -e`.

The article is about writing bash scripts that don't care if you've ran them once, or 10 times.

Op here. Exactly this. I’ve seen many times where people had scripts and had to modify it (such as commenting code) when something has failed in the middle of the script.

Don't forget about pipefail!!

    set -eu
    set -o pipefail

Anything else I'm missing?


IIRC, `set -o nounset` is the bash equivalent for `set -u` in sh.

`set -x` can be very useful.
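To make the effect of these options concrete, here is a small sketch. The function names are invented, and each body runs in a subshell so the options do not leak into the calling shell:

```shell
#!/usr/bin/env bash
# Default behaviour: a pipeline's status is that of its last command,
# so the failure of `false` is hidden by `true`.
pipeline_default() ( set +o pipefail; false | true )

# With pipefail, the pipeline fails if any command in it fails.
pipeline_pipefail() ( set -o pipefail; false | true )

# With nounset (-u), expanding an unset variable is an error instead of
# silently producing an empty string.
expand_unset() ( set -u; unset NOT_DEFINED; echo "$NOT_DEFINED" ) 2>/dev/null
```

Here `pipeline_default` succeeds while `pipeline_pipefail` and `expand_unset` fail, which is the difference you want `set -e` to be able to see.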

Seems like all the pedants came out for this article. In good-faith I understand the principles OP is suggesting and have first-hand experience of their usefulness. But it's easier to nit-pick I guess.

Every time I discuss Bash on the net, these gremlin style characters ruin the fruitful discussions I've had ... I think the article provides excellent advice.

This is a case where the pedants should at least be considered.

>The article is about writing bash scripts that don't care if you've ran them once, or 10 times.

Is the idea that participants don't know what "idempotent" means in a thread with that word in the title?


I'm not sure why people see errors as a bad thing. Errors are good. They stop you from doing stupid things. But maybe this article is good for telling people why you should handle errors well or when you write code to make sure it fails gracefully. Because if you don't people are going to start telling others to use the force flag when removing (almost always a terrible idea).

Learn to handle errors, don't learn to brute force your way through them.

My understanding of 'set -e' is to ensure that errors are handled appropriately, not to brute force around them. If an error occurs that is not handled, this is treated as a problem and the script exits to ensure that one failure doesn't cause additional unforeseen problems as a result of the script running to completion.

The brute force method is the force flag (eg: `rm -f`). `set -e` is error handling. Force flag was constantly used in the post which is what I disagree with.

I agree. Using -f is a poor excuse for not handling errors properly. Imagine doing that in any other language... oof

This is surely a good collection of flags to know, but I sort of object to the premise of the article.

There is a good reason that many of these flags are not default behavior, and it’s that they can be quite destructive.

If you’re writing a script, as the beginning scenario says, that has errors in it, and you expect to have more, do you really want to tell all the commands to power through and delete and overwrite stuff if it encounters an unexpected state? Telling people to just always use these flags and not only when they’re quite sure what their script is doing, is probably playing with fire.

This may make the script idempotent in repeated runs, but the first run might not be what you expect at all.

> If you’re writing a script, as the beginning scenario says, that has errors in it, and you expect to have more, do you really want to tell all the commands to power through and delete and overwrite stuff if it encounters an unexpected state?

Where are they saying this? These flags are about making sure the script wouldn't error out right away if you run it again, eg how a regular mkdir for a path that already exists would not exit 0 and thus end script execution (given set -e is active, which the article seems to imply). Also it is not so much about purposely writing a script that contains errors, because that would just mean the error is encountered every time, but more about transient errors like running out of disk space, a curl call failing because of network hiccups etc. Everything your script did up until that failure shouldn't prevent a second call from succeeding.
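The mkdir case can be checked directly. A quick sketch, where `second_run_status` is a throwaway helper written for this demonstration:

```shell
#!/usr/bin/env bash
# second_run_status CMD...: run CMD twice on the same path in a fresh
# scratch directory and report whether the second run succeeds.
second_run_status() {
    local dir
    dir=$(mktemp -d) || return 1
    "$@" "$dir/data" || return 1            # first run creates the directory
    if "$@" "$dir/data" 2>/dev/null; then   # second run, as on a rerun
        echo "ok"
    else
        echo "fails"                        # under set -e this would abort
    fi
    rm -rf -- "$dir"
}
```

`second_run_status mkdir` reports "fails" and `second_run_status mkdir -p` reports "ok", without anything being deleted or overwritten in either case.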

> This is an easy one. Touch is by default idempotent. This means you can call it multiple times without any issues. A second call won’t have any effects

Every call to touch will alter the modification time of the file (example.txt in this case).

It will change the access time too. Almost every suggestion in the post has this problem which is unfortunate since access/modify/create time are used in a lot of scripts to see which files were or were not processed by prior runs of a script (whether this is a good idea or not).

It is very difficult to write truly idempotent bash scripts (consider log files). Creating sort of idempotent or at least rerunnable scripts would be nice but I think even that would take more care than this.

Do people actually use access time for anything?

I occasionally do full-text searches on system files, and this resets access times on all files. Thus, I cannot imagine using a script which cares about it.

A lot of the suggestions in this post are "idempotent" only in the sense of the specific use case that the author is interested in. Please do not take this advice to be generally applicable for the uses of "idempotent" or in general to mean best practice while writing bash scripts.

The advice such as replace rm <FileName> with rm -f <FileName> could lead to Disaster depending on the scenario. So a HUGE YMMV

"The -f flag removes the target destination before creating the symbolic link, hence it’ll always succeed."

Removing something and adding it back seems to be, by definition, not idempotent. What if something tries to access that file in between? What about the filesystem timestamps (same issues as with the "touch" claim)?

True. In such case you need to create a different symlink within the same filesystem and then use "mv" to atomically overwrite the current symlink.

ln -sfn target new && mv -Tf new old
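Wrapped up as a function, that could look like the sketch below. It assumes GNU `mv` (for `-T`); the name `replace_symlink` and the temporary-name scheme are made up for the example:

```shell
#!/usr/bin/env bash
# replace_symlink TARGET LINK: point LINK at TARGET without a window in
# which LINK does not exist. Assumes GNU mv (-T treats LINK as a plain
# file even when it is a symlink to a directory).
replace_symlink() {
    local target="$1" link="$2" tmp
    tmp="$link.tmp.$$"   # temporary name next to LINK, so same filesystem
    ln -sfn -- "$target" "$tmp" || return 1
    mv -Tf -- "$tmp" "$link"   # a rename replaces LINK in one step
}
```

Running it repeatedly with the same arguments leaves the same link, and readers of the link see either the old target or the new one.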

Idempotency focuses on the result, rather than implementation. If the intent of the script is to ensure the file is there when the script is finished, then it’s idempotent. If the time stamps matter for the intent of the script, then, yes, what you pointed out would be an issue.

The chmod command can alter the flagset on the link not the target, if you select the right options.

How do you guarantee the replacement symlink had the same chmod flags and grp flags as the original? (chmod -h for reference*)

I don't think these commands are idempotent in the wide. They satisfy only the narrowest "yea, most normal-ish cases are sort of maybe ok" but since we lost atomicity, there are significant windows of time in the filesystem where inodes are in flux, and the new thing has a different inode to the old thing, if you delete and replace it. The disk is different.

This is incorrect. Idempotency is about not needing to worry about running something more than once. If you introduce race conditions with follow-on runs, you're not idempotent.

If race conditions are an issue within the context of your intent, then yes, you'll need to take that into account. Again, one needs to take into account the intent and context of the script. If other aspects of the environment ensure that the operations are serialized (say you're making an update to a system which you've rotated out of production for the update), you won't need to worry about race conditions. Need to make sure you have no timing effects? That will affect how the script is written. Need to worry about CPU or IO utilization (including reads)? That will need to be considered.

All the examples are more-or-less idempotent. For example the `blkid /dev/sda1 || mkfs.ext4 /dev/sda1` doesn't guarantee what type of filesystem the partition will have. `touch file` changes the mtime of the file. `ln -sf` will fail if the target is a directory and is not atomic.

It would be nice to have a toolset of commands that are all idempotent to make that type of task easier.

For everybody interested in portable shell programming, I'd like to recommend the following page, which offers a good overview of POSIX-compliant commands with links directly into the POSIX standard:


Is POSIX relevant today? I'd think that overwhelming majority of the shell scripts would require either Linux or BSD dialects.

And if I wanted to limit my shell scripts significantly, I'd limit myself to commands supported by busybox's "ash" shell -- this is a limited shell environment which is pretty widely used in initrds and embedded devices.

The first example ("idiom") is a bit tricky, because the title is "Creating an empty file". An alternative would for example be

    echo -n > example.txt
Both fit the description of "Creating an empty file". Is it idempotent, or more or less so than `touch example.txt`? If there is already an existing file, then touching it obviously will not empty it, so the end state is not "empty file exists" like one would expect from operation "Creating an empty file"

Ironically, the article says "This is an easy one".

The "echo -n" is unnecessary; this is sufficient:

    > example.txt

without prepending the echo command. This will create an empty file or it will truncate an existent file. The touch command could be preferable to preserve the content, if needed.


    >> example.txt

this one (with a double > ) could replace touch in this application case: it's less readable but more efficient.
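A small sketch of the difference, using a scratch directory and placeholder file names. The `:` no-op is only there so the lines do not rely on how a given shell treats a bare redirection.

```shell
dir=$(mktemp -d "${TMPDIR:-/tmp}/demo.XXXXXX")

printf 'data\n' > "$dir/a.txt"
printf 'data\n' > "$dir/b.txt"

: > "$dir/a.txt"     # truncate: a.txt exists and is now empty
: >> "$dir/b.txt"    # append nothing: b.txt keeps its content
: >> "$dir/c.txt"    # c.txt did not exist, so it is created empty
```

So `>` gives the end state "an empty file exists", while `>>` gives "a file exists", which is the closer match for what `touch` is used for in the article.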

If a file has a line ending in it, is it truly empty?

The -n flag keeps echo from emitting a line ending.

doh! I read it as \n, but then that's for other languages, so i was just all sorts of not paying attention

I used a distro in recent memory where `echo -n foo` really produced `-n foo`.

The workaround was `printf "%s" foo`.

Ah yeah, posix doesn't require -n to do anything in particular. For most use cases, strict adherence to posix compatibility isn't important, but sometimes it really really is.
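For scripts that have to run under several shells, the `printf` workaround can be wrapped once. A minimal sketch; the helper name `print_raw` is made up for illustration.

```shell
# print_raw STRING: hypothetical helper; writes STRING with no trailing newline.
print_raw() {
    # The format string is fixed, so STRING is never parsed as an option or escape.
    printf '%s' "$1"
}

print_raw foo    # writes the three bytes "foo"
print_raw -n     # writes "-n" literally
```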

This is useful and I think it would be good if more tools had idempotency options available. Just today I needed to run a script that added a zypper repo in OpenSuse. It fails the second time because the repo is already added and there is no option to avoid it. Also the error code is not distinct so it is not easy to script around. Ideally every tool should have a mode "accomplish this result" in addition to "do this".

I think Python is getting so popular & ubiquitous now I can't see any reason to write bash scripts any more except for perhaps very simple ones that are just a few lines. Other people's bash scripts are regularly just too difficult and complex to maintain.

I agree, in general, but I do think it is worth pointing out that bash is very good at what it was made for---i.e., starting processes, redirecting their output, etc.


  { a; b && c; } >> log.txt
With the same written in Python:

  from subprocess import check_call, CalledProcessError
  f = open("log.txt", "a+")
  try:
      check_call(["a"], stdout=f)
  except CalledProcessError as e:
      pass
  try:
      check_call(["b"], stdout=f)
      check_call(["c"], stdout=f)
  except CalledProcessError as e:
      pass
(Note: I took these examples from an article I wrote a while back about programming languages https://innolitics.com/articles/programming-languages/ )

That's why Xonsh exists !

While not disagreeing, I would not agree that "everyone should go to Python" since I would personally prefer (for example) Ruby or Elixir, which can work just as well as scripting languages and are at least as readable, while also not triggering too much "VM startup" time.

Of course, others would prefer, say, Clojure, or Kotlin (although the JVM startup time is the worst of all), but therein lies the issue... We're stuck with Bash scripting because everyone (more or less) knows/understands it to some extent, it's installed everywhere, and it has essentially no startup-time cost.

Python startup time is just not in the same league as bash though, if you tried to replace bash across the board you'd end up with lots of unnecessary lag.

Is that really true with limited imports though, and/or with precompiled bytecode?

I was surprised to find that Ubuntu’s handler for unknown commands that tries to suggest what you could apt install is written in Python. I’ve never noticed it being slow.

If you’re into minimal docker images, bash or similar shell might be the only script interpreter available.

Python might not exist on your windows machine.


Surely we could all agree on a minimal subset of useful Python to write scripts : loops, arrays, lists, some strings functions, etc. The resulting Python interpreter would probably be tiny.

For example, there is https://docs.bazel.build/versions/master/skylark/language.ht... which is a subset of Python.

Micropython runs in 16k RAM.


> Python might not exist on your windows machine.

Bash might not exist either.

Besides, EAFP is actually a good way to make things idempotent AND free of race conditions.

Is there any hope for a more modern ubiquitous light weight scripting language?

It can be learned, and one can even learn to love it. But scripting is often done by people that very seldom write scripts, the barrier to make decent scripts is much too high and readability isn't much better. The nonsensical syntax is also easy to forget, the result is just an awful lot of headache and poor scripts floating around. Surely we can do better?

I think the problem should be solved within the OS using sandboxing, like in iOS (or to some degree Android too) and macOS and Windows store applications. For example, when you uninstall an app, there are no leftovers in the system. You don't need to check, if file exists or not ...

To some degree, this has also been solved with containers. It's much easier to create a new container image than to write a proper idempotent script. And you can create an image from a running system too. So you can just "bash" commands and not care about the state of the system. When done, you just create the final image and you can be sure that the state of the system in the container is what you intended.

I'm talking mostly about install scripts. Of course there are valid use cases for idempotent bash scripts.

> ln -s source target

Terminology is broken. Correct is "ln -s TARGET LINK_NAME" per its man page.

Don’t forget mktemp for making temporary files or directories, instead of putting files in a set place in /tmp.


Well, sadly mktemp is not part of POSIX. So while it is available on a wide range of systems, there are also different implementations with different options. So while I like the job it does, I am also pretty disappointed how 'unportable' scripts become as soon as they use mktemp :-/

It is sad to read these comments by people believing POSIX is something relevant today. My advice is to give up on false hope that POSIX will solve portability problems. POSIX ideas of shell are 30 years old and refusing to use anything newer alone won't make your script work correctly on all systems. If feasible, install bash, which has become the standard de facto. Or just write your script to test for the shell version and then launch appropriate code path.

I know that POSIX doesn't meet the expectations many of us have, but I don't see how it isn't relevant anymore? I mean, if you want to write portable shell scripts it is still a good reference on what you can expect to find on many systems (and which options were introduced by GNU and the like).

Yes, ultimately you have to test your scripts on the actual systems, but that is something you have to do anyway. For example, when you run scripts on MacOS and you run into old Bash bugs because Apple refuses to ship an up-to-date version, those are issues a standard can't solve.

However, I have no experience how much you can count on POSIX when it comes to C APIs and the like.

POSIX is part of unix history and running systems have some compatibility level with it, so in that respect it is relevant. But as for writing universally working scripts, it is not a panacea, because systems shells are not exactly POSIX compliant. For example, bash and Linux, the standard of unix de facto, deviate from POSIX, the standard de iure.

> but that is something you have to do anyway.

Exactly my point.

You can invoke mktemp(1) in ways that work across most contemporary systems. At least if you don't care that the resulting names are going to look a bit weird (have lots of XXXXXX in them due to name prefix vs. template differences among implementations).

Actually, I am not sure if that is true. I have no example at hand (I could look it up, if you are interested), but a few months ago I was trying hard to create a folder with mktemp using the same command for Linux, MacOS and Android, and as far as I remember, that wasn't as easy as it is supposed to be.

Shouldn't mktemp -d -t name.XXXXXX do it? Possibly in combination with setting TMPDIR if you want to create the directory somewhere else than within the default temp directory.

Works with GNU mktemp, the old BSD variant on macOS and also Busybox. Current Android uses Toybox, I think? Its mktemp implementation looks like it takes the same flags as Busybox mktemp.


Well, at least on my Android 7.1.2 (toybox --version 0.7.1-3125af0e06f4-android) mktemp doesn't recognize the -t option (https://github.com/landley/toybox/blob/0.7.1/toys/lsb/mktemp...). So it seems to be fixed in newer versions, but I am sure there will be some installations with the old versions around for a while.

Instead, I could use --tmpdir, but somehow that one seems to be buggy:

  $ mktemp -d --tmpdir name.XXXXXX
  mktemp: Failed to create directory name.XXXXXX/name.XXXXXX/tmp.sL2WbI: No such file or directory
The macOS version, on the other hand, does work with those options, but it creates a file like name.XXXXXX.veCNnwkX (instead of name.veCNnwkX) so not a deal-breaker, but it is certainly not what you would have expected. And using --tmpdir with macOS doesn't work either.

So yes, in theory, it shouldn't be too hard but sadly, the reality is often buggy and outdated :-/
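One way to sidestep the `-t` / `--tmpdir` differences is to pass the full template path yourself. This is a sketch, on the assumption that the implementations discussed here all accept a template as a plain argument to `mktemp -d`; that is worth checking on each target. The name `workdir` is a placeholder.

```shell
# Full-path template, so neither -t nor --tmpdir is needed.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/name.XXXXXX") || exit 1

# Remove the directory again when the script exits.
trap 'rm -rf "$workdir"' EXIT

printf 'scratch data\n' > "$workdir/state"
```

Honouring `TMPDIR` here also leaves room for pointing the script somewhere other than the default temp directory.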

And anyway you really want mkstemp ;)

Not in a shell. There is no `mkstemp` command, at least on a typical GNU system. There is a `mktemp`, which may very well use the `mkstemp` function in its source.

I.e. this is about `mktemp(1)`, not `mktemp(3)`.

mktemp couldn't make a script idempotent. I remember a script that failed because it needed to create temporary executables in /tmp. To apply some hardening policies, someone remounted /tmp in "noexec" mode and ... kaboom. So if you use that, better if you specify TMPDIR with proper checking.

I like to use stuff like

mountpoint -q $MOUNTPOINT || mount ...

Instead of

if ! mountpoint -q $MOUNTPOINT; then mount ...; fi

It's much more compact in the case of having a lot of commands that need to be checked in case they don't need to be run again.

I was going to mention this, but frankly it is just a style preference. No real difference between either variant. Also, you can do something like:

    mountpoint -q /proc || {
        mount -t proc none /proc
        chmod 0400 /proc/slabinfo
    }
if you need multiple statements.

I use ansible only not to have to do this :) nice tricks though

Another tricky thing: recursively copying a folder is not easy to get right in an idempotent way due to arcane bash behaviors:


That is hardly arcane, the context dependent behaviour of cp and mv is a well-known standard. You are right it does make making a copy in a given path more complicated task.
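For context, the behaviour in question: `cp -R src dst` creates `dst` on the first run, but copies into `dst/src` once `dst` exists. A sketch of one common workaround, with a made-up helper name: create the destination first and copy `src/.`, so every run targets the same layout.

```shell
# copy_tree SRC DST: hypothetical helper; DST ends up mirroring the contents of SRC.
copy_tree() {
    # "SRC/." copies the contents of SRC, whether or not DST already exists.
    mkdir -p "$2" && cp -R "$1/." "$2/"
}
```

This does not remove files that were deleted from the source in the meantime, so it is idempotent only for a fixed source tree.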

Context and purpose of the bash script in question is important here. In the example, the author is writing a simple bootstrap script for a dev machine. A number of the critiques here, while valid, are aimed at different use-cases.

A good way to remember idempotent is to think of the 'delete' or 'mark as read' function for gmail.

You can delete an email or mark it as read even if it's been done before (on another open screen as an example).

Actually I think the "summon lift" button is a better example of idempotence. It doesn't matter how many times you mash the button, the lift is coming ASAP, no quicker. Hitting the button when it's lit doesn't cancel summoning the lift.

By 'lift' you mean 'elevator'? When I first read this I thought you meant 'summon lyft' the ride service but then realized what you meant!

Interestingly in the 'old days' of elevator operators hitting it multiple times would be either a 'hurry up' for the operator or an annoyance that maybe made it come slower!

> This means you can call it multiple times without any issues.

Unless you have processes that depend on modification time.

> ln -sfn

If you don’t mind the possibility of ownership changing, this is OK.

Great article, but a common scenario is missing: I want to run a long running script, but not if another copy of it is already running.

A word of caution on flock: it may fail when used on some network filesystems, and it is tricky to make sure you've got it right. If you're locking on-disk it's pretty straight-forward, though.
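For the local-disk case, a minimal sketch using the file-descriptor form of `flock(1)` from util-linux. The helper name `run_exclusive` and the choice of descriptor 9 are arbitrary.

```shell
# run_exclusive LOCKFILE COMMAND...: hypothetical helper.
run_exclusive() {
    lockfile=$1; shift
    (
        # -n: give up immediately instead of waiting for the lock.
        flock -n 9 || { echo "already running" >&2; exit 1; }
        "$@"
    ) 9>"$lockfile"
}
```

The lock is tied to the open descriptor, so it goes away when the subshell exits, even if the command crashes. No stale lock file has to be cleaned up.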

flock(1) is not POSIX, though. mkdir(1) can be used if you absolutely want a POSIX way to manage locks. For example:

    if ! mkdir .lock; then
        printf >&2 "Already running?\n"
        exit 1
    fi
Some network file system implementations do not guarantee atomic mkdir, so you still need extra caution with this method.
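The mkdir approach also needs the lock removed afterwards, or the next run reports "Already running?" forever. A sketch with a made-up helper, `with_dir_lock`, that releases the lock after the command finishes:

```shell
# with_dir_lock LOCKDIR COMMAND...: hypothetical helper.
with_dir_lock() {
    lockdir=$1; shift
    # mkdir either creates the directory or fails; there is no separate check step.
    mkdir "$lockdir" 2>/dev/null || { printf >&2 'Already running?\n'; return 1; }
    "$@"
    status=$?
    # Release the lock and pass on the command's exit status.
    rmdir "$lockdir"
    return "$status"
}
```

If the script is killed before `rmdir` runs, the directory stays behind. That is the weakness of any lock based purely on a file or directory existing.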

Existence of a file is an unreliable indicator of script instance running. Much more reliable is to search the script name or other characteristic in the list of running processes. To use this, the script has to have unique name though.

Hey I just learned about `mountpoint`. Great. I used to grep the output of `df`. No more.

Why wouldn’t you use make? Side-effect tracking and cleanup is the job of a build tool.

make only knows how to track side-effects on files though.

A better system needs to be able to track directories creation, package installation, filesystems creations and mounting, service restarts and so on.

How do you get make to track something other than the existence and age of a file?

If it’s IO, then the inputs and outputs can be tracked as files. If it’s not IO, and just a pure computation it can be reproduced on demand.

   something_else && touch something_else_done
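The same stamp-file idea also works in a plain shell script. A sketch with a made-up `run_once` helper: the step runs only if its stamp is missing, and the stamp is written only when the step succeeds.

```shell
# run_once STAMP COMMAND...: hypothetical helper.
run_once() {
    stamp=$1; shift
    # Skip if the stamp exists; otherwise run the step and record success.
    [ -e "$stamp" ] || { "$@" && touch "$stamp"; }
}
```

Like the make version, this tracks "did the step succeed once", not whether its effect is still in place.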

Why would you use a build tool when you're not writing a build script?

It's not necessarily a build tool - it's commonly used for that, but it's a useful DSL. rake in the Ruby world is used for all sorts of tasks for web apps and more, even though it was originally designed as the Ruby version of make. Look at npm - the NODE package manager, but it quickly grew to be used for more than server-side applications. A title is a description, not a constraint.

Just because a language can do the task doesn't mean it's actually a good idea to use it for that task.

Make is much less well known than shell script. In fact, I'd go so far as to say that make is infamously poorly understood and makefiles are infamously poorly written. That makes it less maintainable, or otherwise you'll need to put "experienced with GNU make" on your job description.

If you've got the script scheduled, then you're going to have to document it everywhere that it's not actually a build script so please don't disable it because it doesn't obviously look like it needs to be running every 6 hours.

And because it's a DSL and you're somewhat going out of scope, you're more likely to have a problem like needing to extend your script to do something you can't do as easily in make. Worse, you might want to do something that you're expressly not supposed to do [0].

You're going to have to defend your decision every time you present the script, too. And if you ever need to pass it off, the first thing you're going to have to do is explain why you used make. Whether or not the person you hand it off to is an idiot, you know the first question is going to be "why isn't this a shell script or Python script?" You're going to have to defend your decision to use make instead of shell script because of idempotence, even though you can write idempotent bash scripts. And if the person you're talking to is an idiot -- and let's be fair that there's a good chance that that is the case -- then they'll never figure it out.

So, for me, you've got to go beyond "this language can technically do the task" in order for me to understand why you'd want to use something that's generally understood to be used for build scripts for, well, anything other than that. By choosing make, you're doing something unexpected. That's a bad idea.

[0]: https://www.gnu.org/prep/standards/html_node/Utilities-in-Ma...

Instead of "build tool" think "state machine". (What is the state of the .o file compared to the .c file? Newer or older?)

Let's try to avoid using a hammer on anything that resembles a nail: https://github.com/valvesoftware/steam-for-linux/issues/3671

How is that related to the article aside from "both involve bash"?

It's because rm -rf "$STEAMROOT/"* will evaluate as rm -rf "/"* if $STEAMROOT for some reason is blank or unset, which is what happened here. (As someone in that discussion mentioned, a little more Bash knowledge might have suggested using rm -rf "${STEAMROOT:?}/"* instead, to force it to (at least) error if it is empty or unset.)
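A sketch of that guard in isolation. The helper name `clean_root` and the error message are made up; the `${parameter:?word}` expansion itself is standard shell.

```shell
# clean_root DIR: hypothetical helper; empties DIR but refuses an unset or empty path.
clean_root() {
    # If $1 is unset or empty, the shell prints the message and aborts
    # before rm is ever run, instead of expanding to "/"*.
    rm -rf "${1:?refusing to run with an empty path}/"*
}
```

In a non-interactive script the failed expansion ends the whole script, which is the desired outcome here.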

So literally the only thing in common with the article is that they involve bash

I can't help you if you don't see the connection.

Great article. Very helpful.

I got hung up on the claim "Touch is by default idempotent. ... A second call won’t have any effects". The whole point of touch is to have an effect every single time you run it. It updates the file atime and mtime. That may seem harmless in your application, but it's definitely not no effect. Also promiscuous touching is the source of a bunch of bogus last modified dates in source code bundles.

Last modified timestamps in source code are a bad idea and they make builds non-hermetic. Better to fix up those scenarios in code than work around them.

Irrespective you have a good point that touch does have side effects.

FWIW, I don't think "hermetic" is in widespread usage.

My understanding is that "hermetic builds" refer to the kind of thing that tools like Guix, Nix, etc. go for. Something like "Declare all the inputs and dependencies to get a reproducible output".

I prefer "deterministic" in this case (I think?)

I've always heard "deterministic" or "reproducible," curious if the term "hermetic" is somehow different in this context.

Google’s release engineering docs describe hermetic build processes as those which are “insensitive to the libraries and other software installed on the build machine. Instead, builds depend on known versions of build tools, such as compilers, and dependencies, such as libraries. The build process is self-contained and must not rely on services that are external to the build environment.”

Basically, if your build process requires you to pull code from any repository that you do not own, it isn’t hermetic. If you have to `pip install` or `go get` a third party (or even in-house!) dependency from a source that you do not control, your build is not hermetic.

Effectively, this means that you have to have versioned copies of all of your third party dependencies, and version-specified build graphs. Very hard to do without a mono repo and a build tool.

In the context of the GP, I’d say that deterministic would have been a better word choice. A reliance on time stamps technically wouldn’t make a build process non-hermetic, but it would definitely make it non deterministic. It’s technically possible to have hermetic builds without having reproducible builds, although that would be a very bizarre org. :)

I agree that deterministic would've been a better choice in this context.

I work on things close to Bazel, and the word "hermetic" gets thrown around a lot. And because of that, hermetic in my mind gets translated to "how an ideal build of a project should behave" (which obviously is wrong).

One way of putting this might be that deterministic implies that the same input artifacts give the same output artifacts on the same machine.

Hermetic implies that unexpectedly different input artifacts are not possible, and builds are deterministic across machines/environments.

The (subtle?) distinction is around "what it is" rather than "what it is for":

A deterministic process always produces the same output given the same set of inputs.

A hermetic behavior ensures that indeed you'll always have the same inputs. It can just be a set of best practices (e.g. being very careful of not depending on external inputs that might change outside of your control) or it can involve an active barrier that sandboxes your environment in order to ensure that you indeed always have the same inputs.

A reproducible process is a process that can be repeated later in the future. There are various degrees of reproducibility you might be interested in. For example, you might want "bit for bit" reproducibility (important for security) or you just want to make sure you can rebuild something functionally equivalent (e.g. the compilation or link phase might not be fully deterministic in the order and layout of compilation units).

Reproducible processes usually rely on a deterministic system and leverage hermetic behaviours to ensure reproducibility (over time or across locations)

"deterministic" just means with a given set of inputs you get the same bitwise-identical output each time.

I hate to nitpick, but from the man page of touch:

> touch -- change file access and modification times

If one were to argue side effects of touch, it would be that non-existent files are created. The purpose of touch is to update access and modification times.

I think you're misunderstanding the way the term side effect is being used in this context. A side effect can refer to any impacts on a system that happen outside of the scope of the code itself. In this case, the primary intended effect of "touch" is to initiate the side effect of modifying file access and modification times.


I had the same thought about `ln -sf` - removing and recreating the symlink aren't no-effect.

To avoid that I use readlink in my scripts to read the previous value of the link and call ln -s only when the value is not what is expected.
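That readlink check might look roughly like this. It is a sketch with a made-up helper name, and `ln -sfn` is the GNU/BSD form from the article, not POSIX.

```shell
# ensure_symlink TARGET LINK_NAME: hypothetical helper.
ensure_symlink() {
    target=$1 link=$2
    # Leave the link untouched when it already points at the desired target.
    if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
        return 0
    fi
    ln -sfn "$target" "$link"
}
```

A repeated run then leaves the existing link, and with it its inode and ownership, as they were.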

Op here. Yeah you’re right. I’ve mentioned it though in the context of file contents. Just updated to make it clear.

Within the scope of the article, which is about modifying the structure and logical contents of the filesystem (a-la `chef converge` - the author never once mentions FS metadata), it's idempotent.

I've heard Idempotency confused with Consistency. Idempotency is where a function f can be applied (as a composition) multiple times and it gives the same result, so Idempotency: f(f(x)) = f(x). Whereas Consistency: f(x) = f(x) = f(x). An example of an idempotent function f is RaiseToThePowerZero(x) on x > 0. We usually take consistency for granted, so it's perhaps better to think of an example of a non-consistent function f which would be RandomNumberGenerator().

If you count the filesystem stats on a file as part of the system state, it is certainly not idempotent. You need to be aware of that effect on system state before you can decide whether to ignore it.

Idempotency requires that the function map from a space back onto itself. Touch is only roughly idempotent in that you can define a statistic S on the overall system state discarding filesystem stats such that S(f(f(x))) = S(f(x)).

> as a convolution

Sorry, what does idempotency have to do with convolution?

Should be composition, not convolution, my brain fart.

Better examples: Write("filename", position, data) is idempotent; Append(handle, data) is not.
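In shell terms, a sketch with placeholder file names: overwriting with `>` reaches the same state however often it runs, appending with `>>` does not, and a guarded append (the made-up `add_line` helper) gets the idempotency back.

```shell
# add_line FILE LINE: hypothetical helper; appends LINE only if FILE lacks it.
add_line() {
    # -x: match the whole line, -F: fixed string, -q: no output.
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

dir=$(mktemp -d "${TMPDIR:-/tmp}/idem.XXXXXX")
for run in 1 2; do
    printf 'x\n' >  "$dir/write.txt"     # same state after every run
    printf 'x\n' >> "$dir/append.txt"    # grows by one line per run
    add_line "$dir/guarded.txt" x        # appends on the first run only
done
```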

Write("filename", position, data) inputs 3 arguments and outputs probably only 1, a file. So is not a function that you can convolute, you cannot pass the output of Write() as the input to another Write(). So instead, f: SetFirstWordOfFileToZeros(file)=>file, would be a better example if we want file functions. Then SetFirstWordOfFileToZeros(SetFirstWordOfFileToZeros(file))=SetFirstWordOfFileToZeros(file).

Yes, mathematically composability is part of it. That's because math functions return the same type. It's easy to express the 'state' of a function as the value of the operation.

There's a wider idea of idempotency in computing - that state changes in general respond to operations. Repeating the same function more than once (because the network stuttered or the sender resent the message) and arriving at the same state is a common example.

Just a note... even reading a file updates the access time, so if you reason this way then no program touching any files in the worktree will ever be idempotent... which seems to lose the utility of the concept.


Please don't do this here.

I'm sure you can find a way to avoid promiscuous touching if you read the (gentle)man page.

personally, I feel like rm -f is dangerous, and never would recommend using it.

To make your bash scripts truly idempotent, go to root and turn your system into a git repo. You're welcome.

The problem with the idempotent trend in configuration management is that it’s all based on not tracking or knowing what the current state of something is. So reasoning about how these systems work is fundamentally impossible. It would be better to focus on systems whereby we can always know the state and improve the tooling there.

On the contrary. With idempotence you know the state after an action. rm -f will always delete the file (if possible). ln -sfn will always work, even for a directory.

With the default behaviour of rm, ln -s, etc you know neither the state before, nor after.

The option -f makes the script more heavy-handed and less likely to fail. But this is not necessary for the script to be idempotent and sometimes is not desirable. You can deal with errors in a safer way and still be idempotent.

I don't see any conflict here. Idempotency is not a silver bullet and cannot fully replace state tracking, but it adds simplicity where appropriate.

It's a pragmatic approach to this problem. Usually tracking state means that you write some version number or similar to a file, but then if for some reason that goes out of sync with the actual state, for example because you put it in /etc and someone accidentally restores an older version of it, hilarity ensues, because it's guaranteed the devs never expected this to happen and don't handle it in any way.

For the case of a setup script you'd need to update your state file after every step and in turn when running it again see what the file says and resume at whatever point it says. But then you risk running into above problem.

I try to write idempotent scripts whenever possible, combined with general sanity checks specific to whatever environment the script expects.

The reason it’s pragmatic is due to how systems are designed to begin with, it’s not what you want.

We know what the state is right after the command for an undetermined amount of time, but we have no knowledge of what’s changing it. Why not cover that? There’s no reason not to other than it hasn’t been worked on enough. Consider a database schema migration, there’s a reason those are not idempotent, you know each state. Seems quite solvable.
