The suite of tools you're typically working with in bash (i.e. bash built-ins and the basic utilities that come with most Linux systems) doesn't handle arrays very well. It's generally more productive to split things into multiple lines and/or insert delimiters so that you can play to the strengths of the tools you have at your disposal.
Everything is basically either a formatted input or a formatted output at the end of the day. Trying to shoehorn things into data "structures" that enable efficient access in certain cases rarely makes sense, because of all the processing you need to do to get the data there in the first place; it winds up being more efficient to just deal in formatted streams of data, even if you need some ugly nested loops and function calls to do it.
This! I was doing some work with a colleague and he was showing off some bash knowledge. I know a bit, but my experience is limited since my last job was at a Windows shop.
We did a bunch of stuff with “xargs”. He said it was the most misunderstood command. One of the cool things was the ability to parallelize the work with a simple -P flag.
You do a find, make sure to use the -print0 flag, pipe it into xargs with the -0 option, then run a command over all the files. If you add -P, it magically becomes parallel.
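A minimal sketch of that pattern (the path, pattern, and gzip are just stand-ins for whatever you're actually doing):

find . -type f -name '*.log' -print0 \
  | xargs -0 -P 4 -n 10 gzip
# -print0 / -0: NUL-delimit filenames so spaces and quotes survive
# -P 4: run up to 4 processes in parallel
# -n 10: pass at most 10 filenames per invocation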
I really hate the way there are no standard flags, so you have to remember that find is -print0 while xargs is -0 and something else might be -null.
I spend half my time trying to find the right arguments and the whole time wishing I just bit the bullet at the start and used a real programming language.
And God forbid a filename has a quote character or a space in it. Then you have to think 5x as hard about what you're doing.
While I agree that arrays are usually not the thing you want in bash, in the end this just feels to me like having to adapt the job to the tool instead of being able to select the right tool for the job. Both of these principles have their place, of course, but having to tell people that something as common as an array is not really available in this widely used environment is like telling a carpenter to avoid nails or glue altogether because this particular workshop only uses screws and a non-electric screwdriver. Perhaps not the most appropriate metaphor, but you get the point: mistakes will be made and time will be wasted. Because 'formatted stream of data' really means 'any tool can output a different arbitrarily formatted stream, so if you need a field it is up to you to figure out how, every single time, because no one can remember this for all possible tools anyway'.
Agreed! I love bash and use it all the time, but if some solution gets to the point of needing actual data structures, that's when you know it's time to move to a real programming language.
Does anyone get the "Accept all cookies" stackexchange modal popup every time? And if you go into Customize settings, you're faced with a dark pattern of being confronted with the "Accept all cookies" button AGAIN, and in the spot where you'd expect "Confirm my choices" to be.
Honestly I'm flabbergasted that stackoverflow is willing to engage in such unethical, immoral behavior on their very first customer-facing interaction.
I get it all the time when I don’t use blockers, but what I’ve seen when I choose to customize is a set of toggles where the additional cookies (don’t recall them, but they’re two among a total of three) are turned off. The size of the dialog on desktop is a huge annoyance.
When I use a browser with blockers, the experience is better.
Each of StackExchange's sub-sites has its own cookie consent management. That said, once I make my choices initially on each one, I haven't had any issues with them popping up again (I'm also signed in, which could impact things).
It is this kind of thing that makes me want to avoid Bash every time. Something that should be trivial has no general consensus on how it should be done and, on top of that, is full of pitfalls.
Bash is indeed very limited in some respects (and confusing/quirky in others, etc., etc.).
Regarding the multiple ways though, I think it's a matter of good judgment. Depending on the input, one may choose different tools - one should not forget that Bash is a glue language, intended to make tools work together.
This whole class of problems, where there's no general consensus about how to achieve something, makes it incredibly difficult to train juniors who still have a one-correct-answer mindset. It makes me wonder how any of us got past it in the Java world, where there were no fewer than three commonly-accepted ways just to represent time.
Declarative languages, like SQL or (dare I say) Terraform, seem to be good teaching tools in that regard.
Looks like the correct answer is to call a python script (or similar). I'm impressed by the bash knowledge of the replier, but yikes. I stand by my rule to abort from bash as soon as anything less trivial than an if statement is required.
Sometimes it's nice to have a pure bash script if you don't know your environment.
If the script is simple enough, it might be worth writing it as a bash script rather than trying to test whether Python is installed, is the right version, etc.
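The guard itself is only a few lines of shell, but it grows quickly once you also care about versions (a sketch; the 3.8 cutoff is an arbitrary example):

command -v python3 >/dev/null 2>&1 || { echo "python3 not found" >&2; exit 1; }
python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)' \
  || { echo "python3 too old" >&2; exit 1; }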
And that's why I switch to a sane language like python once my shell scripts get longer than a few lines.
But I would love a shell with a sane language. At the moment I am using zsh + oh-my-zsh because I cannot let go of the autocompletion I get. I tried some other shells like oil shell, ion shell, nushell, and elvish, but sadly the completion is just not there yet. The only shell with maybe even better completions I came across is fish, but I don't love the language; while it seems better than sh/bash to me, I'd much rather have something more similar to ion shell with stronger typing.
Thinking about command completion, this seems like the analogous problem to editors and the language server protocol. Is there something like a command completion server?
The downside to Python is people start screeching if you use subprocess to call executables, and dammit, sometimes I don't want to find and implement the API in Python - I just want to run the program that already does that for me.
My personal flip point is error handling. If errors aren't important, shell. If they are, shell with `set -e`. If errors are important but also shouldn't immediately kill the script from one failure, Python.
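For the middle case, the usual incantation at the top of the script looks something like this (-u and pipefail are a common, stricter companion to plain set -e):

set -euo pipefail  # exit on errors, on unset variables, and on failed pipeline stages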
Doesn't everyone :) Piping typed objects instead of text (notably PowerShell) solves quite a few general bash issues; unfortunately, PowerShell isn't exactly a sane language.
The long rebuttal from bgoldst fails to answer the question of how to solve this "in bash" when he introduces additional commands like tr(1) and sed(1). You should avoid using additional programs to perform actions where bash builtins can do the job. The extra overhead and impact on runtime of context switching to load in a new program is non-trivial if you have to loop over it thousands of times. It's better to normalize the string data for use with 'read' using builtin string substitution.
$ string="Los Angeles, London, Belfast, New York"
$ IFS="," read -r -a array <<< "${string//, /,}"
$ echo "${array[0]}"
Los Angeles
$ echo "${array[1]}"
London
..etc.
Don't have the free time today to read the rest of it unfortunately.
> The extra overhead and impact on runtime of context switching to load in a new program is non-trivial if you have to loop over it thousands of times
The specification is: "speed does not matter".
The long answer addresses this solution:
$ string="Los Angeles, London, Belfast, New York"
$ IFS="," read -r -a array <<< "${string//, /,}"
$ echo "${array[0]}"
Los Angeles
$ echo "${array[1]}"
London
as "not very generic" in point #3, which is correct. Bash simply doesn't support generic splitting by itself (things go downhill quickly once, for example, newlines are introduced, and so on), and if precision/flexibility are priority over speed, then it's better to use standard linux tools.
If you have newlines present then process the data a line at a time, as you would if reading from a file. This is nowhere near as difficult or cumbersome as you're making out.
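A sketch of that line-at-a-time approach (input.txt is a hypothetical file of comma-separated records):

while IFS= read -r line; do
  IFS="," read -r -a fields <<< "${line//, /,}"
  echo "${fields[0]}"  # first field of each record
done < input.txt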
One certainly can, but the increase in complexity shows that Bash stops being the most effective tool when performing tasks it isn't designed for (and compared to a full-blown programming language, there are many such tasks).
The pure bash solution has overhead too. If you need to split 1000 strings it will create, write, and read 1000 temp files. Depending on your hardware and file system, that's more expensive than creating 1000 or 2000 processes.
I would worry about making it correct before making it fast, the former being a big challenge!
(Oil doesn't do this; it creates a process for here docs without touching disk. In theory this could be eliminated for here docs less than PIPE_BUF, which is probably a lot of them)
It's the here-string feeding the read/readarray builtin that's creating the tmpfiles, and that's not great, but the string substitution itself doesn't. My point was there's no need to call out to another program to do something that bash is capable of doing itself.
Sadly shell is not very good at string manipulation :-/ I would pipe the string to something like this and read it back into an array of lines (readarray):
python3 -c 'import sys; print("\n".join(sys.stdin.read().split(", ")))'
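Wired up with readarray, that might look like this (a sketch, reusing the example string from the thread):

string="Los Angeles, London, Belfast, New York"
readarray -t array < <(printf '%s' "$string" \
  | python3 -c 'import sys; print("\n".join(sys.stdin.read().split(", ")))')
echo "${array[2]}"  # Belfast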
Or you can use sed and read -d $'\x01':
echo -n "$mystr" | sed $'s/, /\x01/g'
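Reading that back into an array might look like this (a sketch; the trailing delimiter added by printf keeps the last field from being dropped):

parts=()
while IFS= read -r -d $'\x01' part; do
  parts+=("$part")
done < <(printf '%s\x01' "$mystr" | sed $'s/, /\x01/g')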
That will handle newlines but not the 0x01 byte.
I think really a shell should have the ability to iterate over bytes/code points reasonably efficiently to do arbitrary string processing. Python isn't great at this either, since it creates a lot of 1 byte string objects.
Afterwards you can also use the fact that arrays and array elements act far more predictably in fish (no implicit splitting on whitespace, for example).
Shells aren't designed for general programming, they are designed for quick and dirty tasks. They are excellent for that purpose, not because the shell is useful, but because of the large ecosystem of tools that makes the shell useful.
Virtually any problem of "how do I do X in bash?" is solved trivially by Awk. If you don't know Awk, then any number of tools cobbled together in an inefficient, janky Bash function will get the job done so you can get on with your day.
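For the splitting problem in this thread, for instance, one sketch among many:

echo "Los Angeles, London, Belfast, New York" \
  | awk -F', ' '{ for (i = 1; i <= NF; i++) print $i }'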
If you need something robust, use an actual programming language, as "robust" things already need significant investment in development, testing, and maintenance.