find . -name <some pattern> | { while read F; do <do something to file F>; done; }
This lets you do things of almost any level of complexity to files or data listed as input, and it doesn't require temporary files, so you can process a lot of things.
e.g. you might use sed -i to alter the files in some way, or you might feed media files of some kind into ffmpeg to convert them.
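For instance, a sketch along those lines (the glob and the sed expression are made up for illustration, and -i assumes GNU sed; read -r keeps backslashes in names intact):

find . -name '*.txt' | { while read -r F; do sed -i 's/  */ /g' "$F"; done; }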
Yes, I think most of the time this is all you need.
I just like the for loop because I can do filename manipulations in the loop and the input to the loop doesn't always have to be find - it can be "grep <something> -l ./" for example - so I use it even if it's verbose because it's only one thing to remember.
Perhaps that is a problem with bash and the tools - there are so many features that take years to discover. Every few weeks I come across a new one.
If you really had a lot of ffmpeg work to do, then GNU xargs can let you run the jobs in parallel, one for each of your available CPUs. You can even adjust the number of processes up or down by sending the xargs parent the appropriate signals.
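A minimal sketch of that, assuming GNU find/xargs and made-up file names and codecs:

find . -name '*.wav' -print0 |
    xargs -0 -P "$(nproc)" -I {} ffmpeg -nostdin -i {} {}.ogg

(With GNU xargs, sending the parent SIGUSR1 adds a parallel slot and SIGUSR2 removes one.)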
One can make it safe if it's worth the effort. In practice I don't ever come across such files but if I was trying to write a script that would work everywhere I would have to use
find -print0 | sed -r 's#\r#\\r#g;s#\n#\\n#g;s#\x0#\n#g'
... and then I would have to remember to use the filenames in such a way that my escaped newlines would be converted back to real ones at the point where I accessed the file
Newlines in filenames are one of those things that one would personally try to avoid, purely because of the problems they can cause.
It is fine for a personal script where you know your files don't have newlines, but for a robust script as part of a package running on other people's systems you'd have to take it into account.
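For what it's worth, the usual NUL-delimiter route avoids the escaping round-trip entirely; a bash-specific sketch (the action is just a placeholder):

find . -type f -print0 | while IFS= read -r -d '' F; do
    printf 'found: %s\n' "$F"
done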
In case you're not aware, watch(1) performs basically that task: watch -n 5 "ps -eaf | grep blablabla". It can also highlight changes in the text, break itself out of the loop under certain conditions, and a few other things.
The major difference between your function and watch is that watch will use the alternate screen for output if possible.
The grep above will match itself in the output. One way that I have seen to filter it out is grep -v grep, but if you are actually looking for grep processes, then this won't work.
The shell can also filter these out. Consider this basic shell function:
pps () (
    # Subshell body, so IFS is changed only inside the function.
    # Setting IFS to "|" stops read from splitting the line and
    # trimming leading whitespace.
    IFS=\|
    ps -ef |
    while read a
    do case $a in
       # Either the caller's pattern matches, or the line contains
       # no digits at all (i.e. the ps header line).
       *$1*|+([!0-9])) printf %s\\n "$a";;
       esac
    done
)
This uses a shell pattern instead of a regular expression (shells with more features than POSIX can also use regex). It's actually a dual pattern: either a line with no digits, which matches the header, or a match of the pattern given as the first function argument.
The function uses a subshell () instead of command grouping {}. This allows IFS to be manipulated without modifying this setting outside the function, at the cost of a fork. POSIX doesn't support local shell variables; if your shell does (with typeset or local keywords), it will be slightly more efficient.
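As a hypothetical example of calling it:

pps sshd     # prints the ps header plus any line containing "sshd",
             # with no grep process to match itself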
My function would slot into yours, but yours would not work in the Debian default (dash) shell, because the "function" keyword is not defined in POSIX.
POSIX approaches are the most important, because you can use them just about everywhere.
You can create a separate locate database for /home if you want. Set it up for periodic updates (via cron or other means) and then use "locate -d <path_to_db>" for searching.
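A sketch of the setup, assuming GNU findutils' updatedb (paths are made up):

updatedb --localpaths="$HOME" --output="$HOME/.cache/home.locatedb"
locate -d "$HOME/.cache/home.locatedb" '*notes*'

Run the first line from a user crontab to keep the database fresh.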
Can anyone recommend a modern explanation of sed's more advanced features?
I'm ashamed that, despite having learned many other ostensibly more complex systems and languages, sed continues to elude me. Of course I get the usual 's/foo/bar/g' stuff with regular expressions and placeholders, but I still don't really understand most of the other commands or, more usefully, the addresses.
Perhaps I need to spend quality time with the manpage in a cabin in the woods, away from modern distractions, but so far it has proven impermeable.
Perhaps I'm missing a proper understanding of its 'pattern space' vs 'hold space'.
Unlike the GNU sed manpage, the info docs are really quite thorough and contain many examples. They cover the vast majority of the features, and they call out GNU specific extensions so they're useful for learning non-GNU sed too.
They're available in HTML too¹, if you don't like the info user experience.
https://www.grymoire.com/Unix/Sed.html is a good resource to get started, though it's definitely not something you should just read, because sed code is pretty hard to read. Have a terminal on the side and try it out live.
Once you understand pattern and hold spaces sed becomes a pretty fun esoteric language :)
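The classic toy example, if anyone wants one to chew on: reversing a file's line order (a poor man's tac). h copies the pattern space into the hold space, G appends the hold space to the pattern space, and $p prints only on the last line:

sed -n '1!G;h;$p' somefile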
I try to use "smaller" commands instead of more versatile but complex alternatives. I'm not sure if I get better cpu/io/etc usage by using cut or tr instead of sed or awk (or perl and so on), but it is a pattern I have followed for decades, at least for simple enough commands. And I may have something less to remember if I do e.g. rev | cut | rev instead of awk to get the last field.
And having something less to remember, and lowering the complexity for the reader (who will not always be me, and even if it is, I may not be in the same state of mind as when I wrote it), is usually good. But so is having predictable/consistent patterns.
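For example, the two spellings of "last field" side by side (input string made up):

echo 'a:b:c:d' | rev | cut -d: -f1 | rev     # d
echo 'a:b:c:d' | awk -F: '{print $NF}'       # d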
As an aside, "zwischenzug" is a great name. A "zwischenzug" is an "inbetween move" in chess that might happen in the middle of a sequence, and can often lead to unexpected results if one side hasn't been precise in their move order.
One feature I really wish for in shells is something similar to Perl's unquoted strings / single line concise HEREDOC syntax / raw literal syntax.
e.g.
$ echo q(no ' escaping ` required " here)
no ' escaping ` required " here
This would make typing sql / json much easier.
To my knowledge none of the shells implement this.
Does anyone know why?
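(The closest existing thing I can think of is the quoted heredoc, which disables all expansion, though it's multi-line rather than the concise inline form being asked for:)

cat <<'EOF'
no ' escaping ` required " here, and $vars stay literal
EOF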
The POSIX shell was set in stone in the early '90s. The standards board actually removed features from the Korn shell in order for Xenix-286 or comparable systems to be able to run it in a 64k text segment, with clean and maintainable C (ksh88 is very ugly C).
The standards for the POSIX shell are controlled by the Austin group/OSF, and they are not receptive to changes, unfortunately.
Going to echo: Bash is bad. Bash is a feature-impoverished language and belongs in the dustbin. I don't understand why we script with one hand tied behind our back.
With Python I can install "click" and get great argument parsing and multi-command support with a simple install. Bash's argument parsing is much more verbose and arcane.
Sure, it has nice ways to combine tools and parse strings and such. However: You could also implement those nice abstractions in a higher quality language. So a good thing about bash does not cancel out all the bad.
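For a sense of the contrast, here is roughly what minimal flag handling looks like with bash's getopts (the option names are hypothetical):

usage() { echo "usage: $0 [-v] [-o outfile] input" >&2; exit 1; }
verbose=0
outfile=/dev/stdout
while getopts 'vo:' opt; do
    case $opt in
        v) verbose=1 ;;
        o) outfile=$OPTARG ;;
        *) usage ;;
    esac
done
shift $((OPTIND - 1))
[ $# -eq 1 ] || usage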
I would LOVE a toolkit that is:
- A single binary
- That parses a single file like make ( but NOT a makefile )
- That uses a well known, powerful language.
- That lets me declare tasks, validations ( required arguments, etc ), and workflows as a first-class citizen
- With out of the box `--help` support
- That lets me import other makefile-like files to improve code re-use ( both remote and local )
It feels like trying to program in an esoteric language, or a toy language made by a 16yo.
It's nice if you have no logic or loops or variables, but it just goes downhill from there.
I really don't know why we can't just take a fork of MicroPython, agree that we are not gonna make any backwards incompatible changes anytime this decade, and none at all without changing the name, and then just include it everywhere.
All these "Let's make a better shell" projects are still held back by being shell. They are trying to be a UI and a language at once, and they're just OK at UI and kinda crappy at programming.
I think the closest tool to what you describe is Ansible which is in fact wonderful.
As much as I like these, these are also how you blow both feet off.
As long as you're just exploring (everything is read only), you're okay.
The moment "kill" and "rm" (anything mutable) come into the picture on a long shell pipeline, stop. You probably don't want to do that.
You need to be thinking very hard to make sure that an errant glob or shell substitution isn't going to wipe out your system. Make a complicated pipeline that creates a simple file, examine the simple file by hand, and then feed that file to "kill"/"rm"/"whatever" as close to unmodified as possible.
You may still blow your system away. However, at that point, you've given yourself the best chances to not do so.
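A sketch of that workflow, with made-up patterns: build the candidate list, review it by eye, then feed it onward as literally as possible.

find . -name '*.tmp' -mtime +30 > /tmp/candidates
less /tmp/candidates                          # inspect by hand first
xargs -d '\n' rm -- < /tmp/candidates         # GNU xargs: one argument per line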
Being able to write a file directly in the shell with heredocs is so cool, but I know I'll never use it because it's far quicker to just open vim and type with all the commands and verbs and motions that I'm used to.
Ah, but that only writes the file once. Heredocs can write a new file, in a different position and with different variable replacements, as many times as you like. Very powerful.
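Something like this hypothetical loop, for instance, writes a fresh file in each location with different values substituted each time:

for host in alpha beta; do
    cat > "conf/$host.ini" <<EOF
[server]
name = $host
logdir = /var/log/$host
EOF
done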
I know there are many ways the same thing can be done in the shell but there are so many problems here. Please take this as feedback and not harsh criticism (I know there's comment section on the blog but I'd rather not give my e-mail to yet another website).
> cat /etc/passwd | sed 's/\([^:]*\):.*:\(.*\)/user: \1 shell: \2/'
Besides the needless cat, this is clearer in awk, a tool you mention just prior.
awk -F: '{print "user: " $1, "shell: " $NF}' /etc/passwd
> $ ls | xargs -t ls -l
Beware files with whitespace in their names.
> find . | grep azurerm | grep tf$ | xargs -n1 dirname | sed 's/^.\///'
>
> ...
>
> for each of those files, removes the leading dot and forward slash, leaving the final bare filename (eg somefile.tf)
No it doesn't: it returns the names of the parent directories where files ending in "tf" are found and where the path also includes the string "azurerm".
> $ ls | grep Aug | xargs -IXXX mv XXX aug_folder
Ew.
mv *Aug* aug_folder/
Though now the issue of whitespace in path names is mentioned. Another minor point here is that with GNU xargs, at least, using -I implies -L 1, so the original example is equivalent to a loop.
No idea why sudo is used when running locate; also, it takes a glob, so the grep here can be avoided.
> $ grep ^.https doc.md | sed 's/^.(h[^])]).*/open \1/' | sh
Now this is just nutty.
> This example looks for https links at the start of lines in the doc.md file
No it doesn't: it matches the start of the line, followed by any single character, then "https".
The sed then matches start of the line, followed by a character, starts a capture group of the letter "h" followed by a single character that is not "]" or ")", closes the capture group and continues to match anything.
The example given will always result in "open ht" for any hit.
Then there's the piping to sh. You mention liking to drop this part in order to review the list. I'd suggest focusing on extracting the list of links, then piping to xargs open. If you get a pipeline wrong, you could unintentionally blast something into a shell that will execute whatever it receives, which might be very bad.
> Give Me All The Output With 2>&1
&> stdout-and-stderr
Edit: I know the formatting of this comment is terrible, I think quoting the sed example caused the rest of the comment to be marked up as italics so I indented it along with the other quotes as code.
After almost 30 years and tens of thousands of LOC, possibly even into 6 figures, I've now thrown in the towel and just use Python for all my scripting needs. I've yet to have a single shell script that hasn't turned into a maintenance nightmare, regardless of "bash best practices" (such as they are).
I've occasionally wondered why Python isn't the default scripting language on *nix by now. It's almost always installed by default, is more capable than bash, is dynamic/interpreted, has a repl, and seemingly everything else you would want in a Bash replacement. Is it just Bash's momentum, or something else that keeps bash hanging around?
Please show me how to do `ls | grep -i old` in Python?
For a short while I was following https://github.com/matz/streem (from the original author of Ruby) with interest in it possibly being adapted to supersede Bash. I have no idea how realistic that actually is.
I also use fish a lot, but it has a lot of its own problems.
The thing about all these "shells" that really separates them from Python (et al.) is that they look up commands from PATH and execute them directly. Anything else is simply not a shell language, no matter how great it is at scripting.
> Please show me how to do `ls | grep -i old` in Python?
import os

for filename in os.listdir(os.curdir):
    if 'old' in filename.lower():
        print(filename)
One of the advantages of using a language like Python is you don't need to set up a lengthy pipeline of program executions just to reformat program output. In my personal experience using Python for basic automation scripting, I just haven't really come across any need for me to pipe output from one command into another.
The problem I have with a lot of shells is that their variable syntax is almost completely broken if you want to do anything slightly harder than trivial. A recent example I had was trying to write a script that reads a shell script and executes its last line with some extra arguments. So that should be `PROG=$(tail -n1 commands.sh); creduce-script "$PROG"`, nothing hard about that... except the last line was `clang "-DFOO= " bar.c`.
I'm not arguing that something like Python is nice for when I'm writing scripts out and saving/committing them.
A shell is a REPL designed to be used quickly and easily for one-off jobs. Shell scripts are also sometimes just simpler to string together than reaching for a whole new language.
You probably wouldn't do that in Python because Python isn't a shell. It was never designed to write mini programs in 5 seconds as a way to interact with the computer.
You would write a script with os.listdir and 'if "old" in path' if you wanted to automate it.
If you were doing things interactively in a Python-minded OS, you might either have something much like the shell is today (maybe very stripped down, without a focus on programming features beyond globs, expansion, and pipes), or you would have a Python library of misc utils.
Maybe you would do
x=ls()
search(x, "old")
But that would be rather slow. Maybe you'd have some completely new model of text based interaction that was more menu driven and looked a lot like VS Code's command palette.
You'd press tab at the beginning of a line (an arbitrary, easy-to-reach shortcut), and get your command palette. Start typing and fuzzy search happens.
Select "Search filenames", and maybe that's a common enough snippet that it would be built in. Now you have that whole script, and your cursor is already in the search term field.
Press tab there. You get a menu of recently set variables, recently used short strings, etc, that the heuristic might want you to have.
With better UI we could afford a more explicit traditional language syntax.
I wonder why Groovy decided to map "or" to the shell's "|". Wouldn't it have made more sense to map "or" to the shell's "||" and "pipe" to the shell's "|"?
Why should I need to explicitly state that I want to wait for my process output? I'm in a shell, I put a `&` afterwards if I want things in the background. The model is synchronous by default.
Common shell tasks (piping, process substitution, create a string with the output of a program, creating and redirecting output and input to files) are extremely cumbersome in python.
which requires that the caller take steps to grab, check, and then re-emit any errors (or even just the "normal" output) produced by either of those, not to mention the horrors of output buffering for long-running processes and the ever-present encoding woes, since the subprocess interfaces deal in bytes, not str
I guess it is the same situation with all programming -- diligent programmers can make it work, less disciplined users inflict pain upon all downstream users of poorly coded scripts
historically, this is exactly the role Perl has played. It's also why people keep repeating the nonsense that Perl looks like line noise. Because the only Perl code people are familiar with are throwaway scripts written by (usually) some system admin person who is absolutely not a developer.
And JAPH (Just Another Perl Hacker) type constructs that are meant to bend the language in weird ways and show off the person's knowledge of the language intricacies. They were a lot of fun, but likely gave a distorted idea of an average Perl program, to those unfamiliar with it.
I've also wanted a better shell scripting experience, and have bashed [no pun intended] my head against this several times. I think some of the major pain points that resist adoption of a language like Python or Ruby for in-line shell scripting or simple automation are:
* These languages prefer to operate on structured data. If it parses from json, you have a much easier time. But, most commands you'd invoke from the shell emit unstructured text by default. You can deal with this, but it's a pain point, and it means any serious scripting starts off with a "here's how you parse the output of <foo>", and several rounds of debugging.
* The shell's first and foremost usage is a user-facing interactive interface to the computer. Most of what you do is invoking other programs and seeing their output, and doing this is very easy. While python & ruby have REPLs, these are mostly focused on trying out language features or testing code, not invoking other programs, nor navigating a file tree. A lot of shell scripts start as 1-liners that grow and grow.
* Invoking other programs: In sh, invoking other programs is the primary activity, and it's relatively straightforward - you type the name of that program, press enter, and hope that it's on your path. In Python or Ruby, it requires importing the proper library, choosing the correct command, wrapping it and arguments, and ensuring everything escaped correctly. [Ruby _does_ have the backticks operator, which does actually make a lot of 1-off scripts easy, but this is not a panacea]
* In sh, a lot of the 'utility' programs like cut, sed, awk, grep, head, tail, etc., are standing in for the capabilities of the language itself. In pure Python or Ruby, you'd do these kinds of things with language built-ins. But, that's a learning curve, and perhaps a bit more error-prone than "| head".
* On top of all that, yes, momentum. If tomorrow you showed me a shell replacement for *nix that was unambiguously improved in every way, had excellent documentation, community support, and was actually pre-installed on every machine, it would still take a decade or more before it was really a default.
-----
I want it to happen, so I'd never discourage anyone from taking a swing. IMO, some of the top-level considerations that are necessary for making a successful sh alternative are:
* minimize the additional # of characters required to invoke a program with arguments, compared to bash.
* Decide which suite of typical utilities should actually be built-ins (i.e., things like cd, ls, cp, grep, curl), and make those standard library, built-in, without additional import or namespacing.
* Focus on an append-style workflow. Functional programing styles can kind of help here. Wrapping things in loops or blocks is a point of friction.
* An additional highly-desired feature which just isn't in sh by default, to overcome momentum. I have no idea what this would be. More reliability and better workflow are _nice_, but sh is sticky.
If you are producing lengthy bash scripts, you are probably using the wrong tool.
Bash is good at being a programmable shell, but it is a poor programming language. It's great to be able to express complex operations in the command line, but once you start putting them in files you're gonna have headaches.
Maybe MicroPython to save some memory and cycles, Python for the larger scripts if need be.
I'm surprised how bash survives. I think it's part of a few traditions that are tied to *nix, just like sysvinit, or raw string pipes as IPC... they'll soon fall in sequence.