Understanding the fork system call in Unix (athwani.net)
77 points by objectivem 39 days ago | 104 comments



Question: What happens (and should happen) if a GUI app forks? Do/should the windows get duplicated? Do/should two apps control the same windows? Is the answer the same/should it be the same regardless of all the underlying stack (X/Wayland/Gnome/KDE/whatever)?

Edit: I really care about the "should" part too, not just what currently happens. I've been trying to think of semantics for forking that could work in these situations but I haven't been able to think of great solutions.


The connection to X is just a socket file descriptor. Two processes using it will most likely desync the protocol and result in wreckage.


No, windows are not duplicated. Yes, two processes control the same windows. And yes, the answer is the same regardless of the underlying stack.

That solution is very obvious once you realize it is exactly the same as what happens with the terminal and network connections.

For example, after forking, both processes are still attached to the same terminal (what else could they do? It's not like they can make new terminals - what if it was an ssh connection?). They had better agree on who gets to read/write, or the user will get confused - in a shell, the parent voluntarily stops accessing the terminal until the child is done.


It's not clear to me if you can really just solve this like you can with the terminal case. The thing with a terminal is that it's insanely easy to fork streams -- you can just interleave the streams, and even without active cooperation between the processes you still get a behavior that has some semblance of semantic sense. But with a GUI system, you're not just dealing with a pipeline; you're dealing with a state machine. Do you really take a similar approach? When a window receives a message/event, which process responds? If multiple processes respond, which message/event gets sent and which gets discarded? If only one responds, does the same process also get the reply to the reply, or might the fork get the reply instead? You could interleave events randomly here too, but it seems like far less sensible of a solution than with the terminal case.


The answer to most of your questions is that the GUI would quickly break after being forked.


You close the file descriptors in the child process to avoid interfering with the parent; this happens automatically on execve if they are marked with FD_CLOEXEC.

Or you simply avoid touching them. Remember that forking only creates one new thread in the child process; there won't be any code running that you don't control or that would know about those file descriptors existing.


Mainly only the address space is copied, other resources usually become shared, except threads (which don’t get resumed in the child).


It's worth noting that on Linux, the address space itself is copy-on-write, but the page tables describing the address space are nevertheless copied eagerly, and this copying can take significant time. If you're going to be calling execve right after fork, you should be using vfork, not fork.


Why don't people just use the standardized posix_spawn() or popen() in the first place? Why fork then exec?


posix_spawn (at least in GNU/Linux) is a library function, not a syscall. Under the hood, it calls clone (which, like vfork, avoids copying page tables) and then execve.

The reason fork (or vfork, or clone) and exec need to be separate steps is because you may want to do some stuff in between the two steps. The most obvious example is redirection: when you run "grep foo < bar", the shell needs to set stdin to point to "bar" before it execs "grep".


Thanks but why don't people just use the portable posix_spawn when they can (which should be the vast majority of the time)? That it calls clone/execve underneath doesn't really answer my question...


As far as I know, there isn't any good reason not to. So indeed people should, if they can.

(IIRC, posix_spawn was introduced relatively recently, so most old software wouldn't use it).


I was trying to find a year on it and couldn’t, but agree it seems newer.

FWIW, being able to run setup code in the child isn't required for arbitrary setup. For instance, WinNT makes that very hard, but has a larger syscall interface to compensate, so it's actually more general. An alternate design of running all code in the parent can be made to work if the kernel supports passing a pid handle to every syscall: https://lwn.net/Articles/360747/

On the other hand, this is perhaps a bit more complex in some ways? or at least just different.


> I was trying to find a year on it and couldn’t, but agree it seems newer.

http://man7.org/linux/man-pages/man3/posix_spawn.3.html#CONF... says POSIX.1-2001, so I'm guessing between 1998 and 2002 (according to https://en.wikipedia.org/wiki/Single_UNIX_Specification#2001... it started development in 1998 and was released in 2002). That man page also says "since glibc 2.2", which according to Wikipedia was released November 2000.


Seems newer? Fork predates the whole idea of POSIX by decades, let alone the particular API posix_spawn.


Yeah this. We had problems on an embedded system running python where things started breaking when the system was low on memory, but not so low that things should have failed. We ended up writing a posix_spawn() wrapper to avoid the OOM caused by duplicating the page table. I think it's weird Python doesn't do that by default when you're just launching an external command.


vfork isn't particularly of sound design: https://dwheeler.com/secure-programs/Secure-Programs-HOWTO/a...


What does "sound design" mean? If you'd said vfork is error prone, I might have agreed --- but lacking soundness suggests that it's not possible to use vfork correctly, and, well, it is possible to use vfork correctly.


It is not possible to (usefully) use the former POSIX vfork correctly:

> The vfork() function shall be equivalent to fork(), except that the behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(), or returns from the function in which vfork() was called, or calls any other function before successfully calling _exit() or one of the exec family of functions.

This roughly simplifies to "the only safe thing to do after a vfork is exec", and in this decade you can use posix_spawnp for that instead.

In particular, you can't safely do anything that you'd normally want to after forking, like closing fds or moving them around (that'd be "calls any other function"). This means that unless the executable you're execing into is also under your control and knows what it's getting into, you have to mangle your fds &c. in the parent, vfork-and-exec, then undo the mess you've just made.

It is in principle _possible_ to navigate this error-prone process without observable side-effects. (The same could be said about implementing speculative execution, and look where we are now.) But I would hesitate to call it correct; the approach is clearly wrong, even if the result looks okay.

The situation is only slightly better if you're targeting some particular OS, which can provide stronger guarantees.


The POSIX specification for vfork() is a poor match for reality. Linux and NetBSD, and all good OSes, provide a vfork() that is actually useful. There have been some bad OSes that mapped it to fork(), just like there are bad OSes that restrict you to the portable character set (not even full ASCII) or that don't have coherency between mmap() and read(). Somewhere you must draw the line, or you'll spend forever trying to deal with obscure portability issues that are mostly theoretical.

After a vfork(), the memory image is shared and the parent is prevented from running. The child is a fully separate process in all other ways. This special vfork state ends when the child does an exit or exec. This is how it works, no matter if POSIX wants to add weasel words or not.

Even with the lame POSIX specification, the "executable you're execing into" is perfectly safe messing with file descriptors. Once that exec happens, the child becomes like any other. Systems with a proper vfork() are also safe before the exec.

It certainly would be nice to have an interface similar to vfork that would take a function pointer and allocate a stack, optionally letting the parent continue on. Linux itself actually can support this, but glibc is uncooperative in exposing it.


POSIX is frequently hyperconservative garbage. There's what POSIX allows, and there's what Linux and other real Unix systems actually allow. How do you think posix_spawnp is implemented on systems that don't have a native system call for it? vfork(2)! close(2) in practice works fine.

I don't understand how you can claim that vfork can't be used correctly when there are thousands of existence proofs showing that it works fine.


> I don't understand how you can claim that vfork can't be used correctly when there are thousands of existence proofs showing that it works fine.

As libc implementer, you're inherently OS- and probably compiler-bound, and you can either fudge it to work on the system you're writing for, or have enough clout to make it so. It's an implementation detail.

As the author of a program which is not strictly OS-bound, you do not have this luxury. Even if you want it, you're probably better off calling clone(2) directly so that you won't get an unintended semantic.

Without counting indirect uses via libc popen/system/&c., I do not believe there are "thousands" of correct uses of vfork. If we count distinct hand-written code paths that call vfork, I'd wager closer to a few dozen, with at best a handful outside of a libc. It's a hideous interface.


I'm not suggesting that people use vfork over posix_spawn --- I'm objecting specifically to the claim that it can't be used correctly. It can be. Emacs uses it, portably, and the sky doesn't come crashing down.


> So “cin“, “cout” and “cerr” also known as stdin, stdout and stderr are shared.

Note that these are different things: std::cin is a std::istream, std::cout and std::cerr are std::ostreams, and stdin, stdout, and stderr are FILE *s.


They clearly mean in a practical sense they have the same purpose, not that they’re literally aliases.


Please don’t use this code. fork can fail, and this doesn’t handle it.


I wouldn't criticize anyone for posting code snippets for educational purposes without confounding the learner with all of the error checking. Pedagogy vs pedantry.


I think the number of people who will copy-paste from a website and not revisit it may be alarming. It is on those who publish snippets to make these things explicit. Maybe the following is enough:

    if (pid < 0) { /* TODO: handle error */ }
Reading the actual code I doubt it's just an oversight; the coding style used seems to lean strongly towards not thinking an error is possible:

    if ((pid = fork()) == 0)
    { //Child
        longTask(); 
    } else {
        children[i] = pid;
    }


> the coding style used seems to lend strongly towards not thinking an error is possible

So does every tutorial which doesn't check the return from printf or check for errors from cout. Did you notice that the parent and child are both writing to the same file descriptor? Maybe the writer should add some words and tests in his code about the inherent race condition there. A true pedant would push this even further and teach their audience how to eliminate this problem with cross process mutexes or semaphores. Better be sure to check the error codes on those calls too.

I know arguing with a pedant is like wrestling with a pig, but honestly I'm more offended by the "#define N 2" than by the lack of error checking.


It would be a good idea to mention the error cases it doesn't handle, though.



Thanks, the site itself can no longer fork ;)


Stick a fork in it. It's done.


At first I thought this was a tongue in cheek joke: the title asks a question about forking, and the answer is a site crash.


While we're on the topic of calling fork and overloading servers to the point that they can't establish database connections or serve web pages, I'd like to point out that forking another process is an astronomically EXPENSIVE and WASTEFUL way of performing simple arithmetic and string manipulation.

Please stop wasting electricity and time by writing shell scripts that fork processes to do ridiculously simple things like calling "bc" to add two numbers or "sed" to perform simple string manipulation, since your computer has a "mul" instruction that doesn't require millions of instructions and bytes of memory and disk accesses to perform.

It's like using Bitcoin to pay for a stick of gum.

https://www.quora.com/How-much-energy-does-it-take-to-make-a...

>According to Digiconmist, each bitcoin transaction took 215 kWh. Whereas an average Indian household consume : 90 kWh per month. It sums up that, each bitcoin consumption can power a average indian household for 2.5 months probably. The average electricity cost in India is 5 rupees per kwh.

It's like rolling coal because you want to trigger the libs.

https://www.youtube.com/watch?v=fkmLf5OXxz0

Stop and think before you write that shell script. It's not going to take you any more time to write it in Python, and then you won't be so shocked and surprised and totally fucked when you realize later that you need to add more features and make it more complicated than you originally thought it would need to be in the first place.

It just boggles my mind that some people have such a hard time understanding how spectacularly inefficient shell scripts are, on top of how horrible to write, read, understand and maintain they are.

https://news.ycombinator.com/item?id=19886164


For many things shell scripts are the easiest and most expressive way to write them. And you probably don't realize that human resources are more expensive than CPU cycles.

Sure, I love Python, but if I want to find and filter some files nothing is easier to write than bash, so it's possible for even a fairly big bash script to do a good job and remain readable.

I'm honestly a little shocked that this doesn't seem to be common knowledge anymore and your post got upvoted so high.

My main language is Python btw.


What if one of your file names has a space in it? Most bash scripts will crash when that happens.

Please don't tell me never to use spaces in file names.

How many human resources are required to track down and fix why a bash script suddenly starts failing mysteriously, years after it was written, when somebody mistakenly dares to use a space in a file name?

That kind of problem never happens with Python scripts (including problems with file names including other punctuation and unicode characters), because Python simply represents file names as strings.

For a scripting language that was intended to manipulate file names all the time, bash sure fucked up big time.

https://www.linuxjournal.com/article/10954

>Work the Shell - Dealing with Spaces in Filenames

>[...] I've outlined three possible solution paths herein: modifying the IFS value, ensuring that you always quote filenames where referenced, and rewriting filenames internally to replace spaces with unlikely character sequences, reversing it on your way out of the script.

>By the way, have you ever tried using the find|xargs pair with filenames that have spaces? It's sufficiently complicated that modern versions of these two commands have special arguments to denote that spaces might appear as part of the filenames: find -print and xargs -0 (and typically, they're not the same flags, but that's another story).

>During the years I've been writing this column, I've more than once tripped up on this particular problem and received e-mail messages from readers sharing how a sample script tosses up its bits when a file with a space in its name appears. They're right.

>My defensive reaction is “dude, don't use spaces in filenames”, but that's not really a good long-term solution, is it?

>What I'd like to do instead is open this up for discussion on the Linux Journal discussion boards: how do you solve this problem within your scripts? Or, do you just religiously avoid using spaces in your filenames?

>Dave Taylor has been hacking shell scripts for a really long time, 30 years. He's the author of the popular Wicked Cool Shell Scripts and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.

(Dave wrote that 8 years ago, and there are still zero comments on it. Nobody has a good solution! Bash is just permanently fucked when it comes to file names.)


bash has handled spaces in filenames easily for a long time:

    #!/bin/bash
    # run this script with 0 or more filename args
    for file in "$@" ; do
        do_something_with "$file"
    done
> Nobody has a good solution! Bash is just permanently fucked when it comes to file name

From the man page bash(1):

    Special Parameters
       @  Expands to the positional parameters, starting from one.
          When the expansion occurs within double quotes,
          each parameter expands to a separate word.
          That is, "$@" is equivalent to "$1" "$2" ...


It's possible to write bash that handles spaces in filenames properly, but it's really hard to do it consistently. Unless you test it then you've probably got it wrong.

Most people don't test their bash on spaces. Look at the dismissive commenter above who just thinks "nah, why would people use spaces in filenames?". Do you think he quotes everything properly? No chance.

CMake is also really bad about this. Although it is arguably slightly better because it uses semicolons as list delimiters instead of spaces, and semicolons in filenames are going to be very uncommon. I would expect there isn't a single CMake project out there that works if you build it in a path containing a semicolon, though!


To compound the problem, bash tends to attract the kind of care-free security-unconscious people who don't want to be bothered with going out of their way to handle unexpected "trivialities" like spaces in file names or carefully sanitizing user input or meticulously testing their code.

The same kind of lazy care-free people who love the convenience of PHP's "register_globals", because it "saves typing".

If they're too lazy to use something like Python in the first place, they're probably also too lazy to be conscientious and anal retentive enough to write and test bash scripts that handle spaces in file names properly, or do any kind of user input sanitization.

And they don't think twice about running their scripts as root, either. Or writing blog postings telling other people to copy and paste their code snippets into a root shell, or worse yet download and run them without auditing first, by piping curl into a root shell.

I recently had an Xcode project fail to build (blender, which uses cmake), because I was using a version of Xcode that had a space in its directory name (because I had foolishly added a space followed by the version number to the name "Xcode 9.4.1.app"). Of course that was because some of the tools and custom build steps were calling out to bash. The problem wasn't in any of the source code or configuration files, it was in the file name of the tool itself! It took hours of frustration poring over machine generated project and makefiles to diagnose the problem from the obscure unhelpful error message, which I will never get back.

In the long term, isn't it a more efficient use of a lazy person's time to simply use a scripting language that just doesn't have these problems in the first place, and doesn't require you to preemptively bend over backwards and dodge silent invisible bullets all the time?

It's pretty ridiculous that a scripting language intended to be used to operate on files falls flat on its face and is so difficult to use when you have something as common and innocuous as files with spaces in their names (or worse yet, file names that begin with dashes -- think of the security implications of that: not something a lazy careless bash scripter would ever pause to reflect about)!

It really is an extremely common and unsolved problem. Notice how each "better answer" is actually worse than the last and comes with its own host of problems and edge cases and work-arounds:

https://unix.stackexchange.com/questions/9496/looping-throug...

https://stackoverflow.com/questions/3967707/file-names-with-...

https://www.cyberciti.biz/tips/handling-filenames-with-space...

https://www.tecmint.com/manage-linux-filenames-with-special-...

https://www.hecticgeek.com/2014/02/spaces-file-names-command...


And how do you safely iterate over the output of "find" (including properly handling file names beginning with dashes, while you're at it), without forking another sub-bash for each file name?


As I said, that hasn't been a problem for a long time:

    $ find . -type f
    ./foo bar
    ./--stupid-filename
    ./-annoying dir name!/--very annoying path...

    $ echo "pid=$$"
    pid=31281

    $ while IFS= read -r -d '' file ; do
    >     echo "[pid=$$] do something with '${file}'"
    >     list[i++]="$file"
    > done < <(find . -type f -print0)
    [pid=31281] do something with './foo bar'
    [pid=31281] do something with './--stupid-filename'
    [pid=31281] do something with './-annoying dir name!/--very annoying path...'

    $ declare -p list
    declare -a list='([0]="./foo bar" [1]="./--stupid-filename" \
       [2]="./-annoying dir name!/--very annoying path..." [3]="./iter.sh")'
Using the character literal $'\0' also works as the delimiter, but Bash doesn't support sending null bytes in args. In bash, $'\0' is the empty string:

    $ x=$'\0'
    $ declare -p x
    declare -- x=""


> What if one of your file names has a space in it? Most bash scripts will crash when that happens.

And good for them! What kind of savage puts spaces in their filenames?


All my Whitespace scripts have file names with nothing but spaces in them!

https://en.wikipedia.org/wiki/Whitespace_(programming_langua...

I also use spaces in the file names of all my C++2000 source code that takes advantage of the wonderful new whitespace overloading feature -- it's conventional to use the ".cpp " and ".h " suffixes with trailing spaces.

http://www.stroustrup.com/whitespace98.pdf


That kind of savagery is deeply respected, of course! And it should go even further.

Now that the modern compilers and interpreters (clang, python3, etc) allow unicode in variable names, the next step is certainly to accept spaces in variable names. With a slight modification, the python language can be adapted to support spaces inside variable names.

After all, people have the right to name their variables however they want!


Not to mention that Unicode has several delicious varieties of invisible non-breaking, zero width spaces and shy hyphens to choose from! (Oooops, I mentioned it.) Tastes great, less filling, and perfectly justified!

https://en.wikipedia.org/wiki/Non-breaking_space#Width_varia...

https://en.wikipedia.org/wiki/Zero-width_space

https://en.wikipedia.org/wiki/Soft_hyphen


I seriously would have never thought that anybody would really consider using spaces in filenames. For me it's a sign to not work in such an environment, which might be the reason I rarely even see it. But sure if you want a script that is widely used you might want to handle that case as well.

In any case I don't say put everything in a Bash script and let it grow to ten million lines over 20 years.

My approach to bash scripts is this:

1. start with a bash script to understand a problem and see how a solution might be structured. PoC level is enough here.

2. When you know you solve the right problem and the customer is willing to make something bigger, then put ~80% of the code in a good programming language (Python is my personal favorite, but Ruby, Java, PHP, C#, C++, Lisp might also work). That happens about 3 months into a project.

3. If you start to hit bottlenecks and performance ceilings, put the 30-50% that is relatively stable and most performance hungry into C/C++ code and really optimize the heck out of it. This should be done only with an aged project, like 2 years in, and very pessimistically, since highly optimized code is usually hard to read and maintain. You certainly don't want to go into that code 6 years later and drastically change its API to your python code.

So a ten year project would have a thin layer of bash/docker at the top, a strong core of highly flexible code (of Python for me as well) in the middle, and a leg of high performance C code here and there.

I honestly have never seen a project that really needed the C-layer though. In the modern IT world it seems most problems only live, at least from a political/management point of view, for between 2 weeks and 3 months. So in fact, although I count myself as a Python coder, the last few years I've spent writing far more bash and docker related stuff, which is then either never touched again or thrown away after a few weeks, whether successful or not.

PS: Please don't use spaces in file names. Not just for scripts. It is also not clear for users where something ends, and they are even less used to putting string indicators around file names than you are about handling spaces with bash there.

And last but not least, stuff like spaces in filenames, starting scripts with an upper case letter etc. might be just a small annoyance. But you can be sure that skilled developers will put you 2 classes below their average when they see this, because it simply shows a lack of the right spirit.


Not everyone who uses files is a developer. People should be able to name their files what they want. When the time comes that somebody runs a script on it, it should not do something wonky because they decided that "Spring Pictures" was a much better name than "spring_pictures".


People also shouldn't use a razor for cutting grass. It's sadly true that you need to learn how to use your tools, and every tool takes some learning to use efficiently. That's the nature of things.

I do agree though that filesystems are not the most user friendly abstraction of data. Maybe we can do better than that.

But also filesystems have the huge advantage that they are somewhat user friendly while also being somewhat technology friendly. Therefore I'm also not too mad about their existence. There are a lot of end user problems that can be solved quickly exactly because they just need a professional one time query and FS+bash+clis are such a good combo to craft that query.


Razors can cut more than grass, and some file systems are more user hostile than others, while making it difficult to clean up forensic evidence.

https://en.wikipedia.org/w/index.php?title=Comparison_of_fil...

(Spoiler: Note the last column. Also: https://en.wikipedia.org/wiki/Hans_Reiser ...)


> People should be able to name their files what they want.

This is not true, and I do not see why it should be like that. You cannot, for example, have slashes in filenames; and for good reasons. Disallowing spaces in filenames solves a lot of problems.


Slash is the ONLY character you're not allowed to have under Unix. There are no good reasons to disallow spaces. Disallowing characters in file names that you're allowed to have solves absolutely no problems, it only causes them.

There used to be a bug in the Gatorbox Mac Localtalk-to-Ethernet NFS bridge that could somehow trick Unix into putting slashes into file names via NFS, and Unix would totally shit itself when that happened. That was because Macs at the time (1991 or so) allowed you to use slashes (and spaces of course, but not colons), and of course those silly Mac people, being touchy feely humans instead of hard core nerds, would dare to name files with dates like "My Spreadsheet 01/02/1991".

https://en.wikipedia.org/wiki/GatorBox

I just tried to create a file name on the Mac in Finder with a slash in it, and it actually let me! But Emacs dired says it actually ended up with a ":" in it. So then I tried to create a file name with a colon in it, and Finder said: "Try using a name with fewer characters or with no punctuation marks." Must be backwards compatibility for all those old Mac files with slashes in their name. Go figure!

If you think nobody would ever want to use a space or a slash in a file name, then you should get out more often and talk to real people in the real world. There are more of them than you seem to believe!


> Slash is the ONLY character you're not allowed to have under Unix. There are no good reasons to disallow spaces. Disallowing characters in file names that you're allowed to have solves absolutely no problems, it only causes them.

I think that there are more good reasons to disallow spaces than slashes. For one, it is very useful to have a natural character to separate filenames in a string. Then your bash scripts can be simpler! :)

> trick Unix into putting slashes into file names via NFS, and Unix would totally shit itself when that happened

LOL. I love this stuff. There is a similar trick that you can play today to wreak havoc on osx systems. You commit into a git repository a new file whose name only differs in case with an existing one, and ask a mac user to update the repo. Hilarity ensues.

> If you think nobody would ever want to use a space or a slash in a file name, then you should get out more often and talk to real people in the real world. There are more of them than you seem to believe!

But these users should not see the filename anyway, but the "title" of the document, which can be stored inside the file and can contain whatever characters they want. At the minimum, the user-facing interface can convert user-typed spaces into another character (e.g., a unicode non-breaking space) when writing the file into the filesystem. Raw spaces in filenames are an abomination that makes shell scripting unnecessarily difficult. I would love a mount option that disallows the appearance of such things on my disk.


That ship has sailed a LONG time ago. We're here. We're spaced. Get used to it.

So you think all files should have a human readable title inside of them, huh? Then you're OK with MICROS~1 8.3 file names, and not ok with any file format that doesn't allow a human readable title inside? Should empty files be prohibited too?

https://en.wikipedia.org/wiki/8.3_filename


That would be a great argument if only you were Hitler and could tell everyone how to name their files, but you're not. I can think of a lot of great changes I would make if only they'd make me Hitler. Or Darth Vader! Or even Scorpius.

https://farscape.fandom.com/wiki/Scorpius


Shell scripts and doing a good job don't go well together.

For simpler, more expressive Python system scripts use Plumbum: https://plumbum.readthedocs.io/


Yeah and don't use vim for code editing. It's too much work to learn, right? Rather use nano. </sarcasm>


Easiest and most expressive way for you maybe. And yea human resources are expensive... doubly so when you need to throw more of them at this thing written by a former developer that thought he or she was saving money for the org by not spending an extra 30 minutes writing the program in Python or Go.

> but if I want to find and filter some files nothing is easier to write than bash

That's like 10 lines in Python and much easier to read and test.


> Easiest and most expressive way for you maybe.

Look. We are working in a professional field with a strong scientific foundation. There is no "this bridge is harder for me to drive over, therefore I think it's bad architecture". The easiness and expressiveness depend on the problem you are trying to solve, not on the people, their tastes or their experience.

Shell tools are highly optimized to work in pipes, to work on files, and to work on line-separated entries like lines of code or lines of CSV. So if you have a problem where you need to pipe file handling into some filtering and into some line-based transformations, then you will have a hard time finding anything as easy or expressive as a bash script. And most projects on a unix-like operating system at least start like this: understand the problem based on database dumps, logs, CSV data measurements, previously written code, etc.

If your regular expressions are not able to cover all the edge cases anymore and/or become unreadably complex then of course you should start an actual program. That only happens a few weeks into most topics though.

> doubly so when you need to throw more of them at this thing written by a former developer that thought he or she was saving money for the org by not spending an extra 30 minutes writing the program in Python or Go.

Yes, there are people who lack the skill or ambition to develop good code. We all hate when we need to deal with their stuff. But honestly, would they have written better code if they had used Python or Java or C++ for that problem? Unlikely.

A good coder would have some feeling for when to outsource some functionality to a Python script. And if he develops a code base for a longer time, most of his code will be in a real programming language. But even then it might be wrapped in a thin layer of bash to use find or grep or something that cleans up the input a little.


> That's like 10 lines in Python and much easier to read and test.

It's a single command in bash, invoking a program far more efficient than anything you can put together in python. It is literally a tool written for this exact purpose. I think it is simply ridiculous to not use a pre-existing tool that frankly you cannot out-do with a scripting language.

Even if you are writing a python script, you should _still_ use find to search for files if there is _any_ complexity in the search requirements.


> Stop and think before you write that shell script. It's not going to take you any more time to write it in Python

Have you actually measured the amount of energy each takes? I would not be surprised if the Python VM's startup makes it more inefficient.


You don't need to start a new Python VM every time you need to multiply two numbers or split two strings, like you need to do with a bash script that calls dc and sed.

That is because Python actually has a built-in way to multiply numbers and perform string manipulation, right out of the box, without forking another process! The overhead of the Python interpreter required to get to that "mul" instruction is nothing compared to the overhead of forking another process to run "dc". This isn't rocket science, it's not even close. You really should check it out some time. So do a lot of other languages. Just not bash.

I can't believe I need to explain this. This isn't about Python being amazingly well designed or efficient. It's about bash being extremely badly designed and inefficient, and people having absolutely no clue how expensive it is to fork processes like "bc" to multiply numbers and "sed" to split strings.
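To make the fork overhead concrete, here is a minimal sketch (the loop count, and GNU date's `%N` nanosecond format, are assumptions; `expr` stands in for `bc`/`dc` since any external command pays the same fork+exec cost):

```shell
#!/bin/bash
# Compare builtin arithmetic against forking an external process per operation.
n=200

start=$(date +%s%N)
for ((i = 0; i < n; i++)); do
    : $((i * 2))                     # builtin: no fork, no exec
done
builtin_ns=$(( $(date +%s%N) - start ))

start=$(date +%s%N)
for ((i = 0; i < n; i++)); do
    expr "$i" '*' 2 > /dev/null     # forks and execs a new process each time
done
fork_ns=$(( $(date +%s%N) - start ))

echo "builtin: ${builtin_ns}ns  fork: ${fork_ns}ns"
```

On a typical Linux box the forking loop comes out orders of magnitude slower, which is the point being argued here.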


> That is because Python actually has a built-in way to multiply numbers and perform string manipulation, right out of the box! You really should check it out some time. So do a lot of other languages. Just not bash.

  $ echo $((1*2))
  2
  $ string="hello, world"
  $ echo ${string/hello/goodbye}
  goodbye, world


Now show me how to add 1.5 and 0.5.

    bash-3.2$ echo $((1.5+0.5))
    bash: 1.5+0.5: syntax error: invalid arithmetic operator (error token is ".5+0.5")
https://unix.stackexchange.com/questions/40786/how-to-do-int...
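The standard workaround, as that link discusses, is to hand the float math to another process; a sketch using awk (chosen here because it is in POSIX, though the same pattern works with `bc -l`):

```shell
#!/bin/bash
# bash has no native floating point, so the usual workaround is to fork
# awk (or bc) for the arithmetic -- which is exactly the overhead at issue.
sum=$(LC_ALL=C awk 'BEGIN { printf "%.1f", 1.5 + 0.5 }')
echo "$sum"   # prints 2.0
```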

I gave up INTEGER BASIC for APPLESOFT several decades ago, and never looked back.

    (LOADING INTEGER INTO LANGUAGE CARD)
    ]INT
    >PRINT 1.5 + 0.5
    *** SYNTAX ERROR
    >FP    
    ]PRINT 1.5 + 0.5
    2
It's 2019, and you're struggling to do in bash what you could have done on an Apple ][ in 1979, but the syntax you need to use is even WORSE than BASIC, so your code looks like line noise from when your mom picks up the phone while you're using a 300 baud modem. Is that really progress?

https://en.wikipedia.org/wiki/Integer_BASIC

>Integer BASIC was phased out in favor of Applesoft BASIC starting with the Apple II Plus in 1979. This was a licensed but modified version of Microsoft BASIC, which included the floating point support missing in Integer BASIC.

https://en.wikipedia.org/wiki/Applesoft_BASIC

>Applesoft BASIC is a dialect of Microsoft BASIC, developed by Marc McDonald and Ric Weiland, supplied with the Apple II series of computers. It supersedes Integer BASIC and is the BASIC in ROM in all Apple II series computers after the original Apple II model. It is also referred to as FP BASIC (from "floating point") because of the Apple DOS command used to invoke it, instead of INT for Integer BASIC.

Notice that there are even machine language instructions for cos, atan, tan, sin, ln, etc. How do you call those instructions in bash?

https://docs.oracle.com/cd/E18752_01/html/817-5477/eoizy.htm...

>x86 Assembly Language Reference Manual: Floating-Point Instructions

>The floating point instructions operate on floating-point, integer, and binary coded decimal (BCD) operands.


> Notice that there are even machine language instructions for cos, atan, tan, sin, ln, etc. How do you call those instructions in bash?

How do you call those in Python ;)

Fundamentally, you're making a big deal about something that isn't really that horrible. Not everyone has Python on their system, and a fork() is not a terribly costly operation for a shell script. Sure, it might take an extra microwatt, but arguing that this is the same as Bitcoin mining or rolling coal is an argument in the extreme.


[flagged]


> Name somebody in the real world who doesn't work full time for the Computer Museum, who doesn't already have Python on their system, and can't easily install it? That's a straw man argument and in no way justifies using bash in 2019.

Me: I SSH into systems that I don't have permission to install new things on.

> without decorating it with a bunch of ridiculous punctuation ((double parens, really?))

This is a nitpick, and you're missing the point: bash isn't the best tool to do numerical computation because it can call out to other programs with very little syntactical overhead. The fact that it has the ability to do basic arithmetic is something I brought up to be snarky: this feature is rarely used in practice.


>Me: I SSH into systems that I don't have permission to install new things on.

Then you probably also don't have permission to demand that people not use spaces in their file names.


> Notice that there are even machine language instructions for cos, atan, tan, sin, ln, etc. How do you call those instructions in bash?

Simple! You shell out to your math-compiler:

https://github.com/skx/math-compiler/

That compiles your math down to assembly language, which has got to make sure they're super-fast, right? ;)



bash is Turing complete... the rest is left as an exercise to the reader. :-)


Some people just love to dive into the deep end of the Turing Tarpit, because their point is to be as wasteful and inefficient as possible. Thus my point about buying gum with Bitcoin and rolling coal to trigger the libs.

https://en.wikipedia.org/wiki/Turing_tarpit

>54. Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy.

But I think that most people who use bash simply have no clue how expensive forking another process is, and not enough experience to realize that software always ends up growing up to be much more complex than they originally expected, and think that "1+1" is as much arithmetic as they will ever need.

https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

>Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

https://en.wikipedia.org/wiki/Jamie_Zawinski#Principles

>Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.


I really thought you were being facetious comparing bash and forking to bitcoin. Shell scripts aren't so terrible and you wouldn't use them in performance critical scenarios. It's not like all our CPUs are running at 100% utilization.

You're talking to someone who ran a web farm in the 90s so yeah, I saw the evolution from CGI to mod-perl and on. I mean I know the cost of forking. But come on, aren't you taking your criticism a bit far?

I also had an Apple ][+. Actually, one wasn't available so we Frankensteined one together from an Apple ][ and an Applesoft language card by swapping the motherboard ROMs and language card ROMs.


As long as people don't understand how wasteful and inefficient bash scripts are, I haven't taken my criticism far enough. ;)

We both have some perspective from the Apple ][ days, and can appreciate how astronomically fast computers are compared to how slow they used to be.

But doesn't it give you pause (literally and figuratively) when you see people thoughtlessly and casually write code in 2019 that runs slower on a high-end MacBook Pro than the equivalent code would run on an Apple ][ in 1979?

People should take a moment to reflect on what they're trying to accomplish and whether there are better ways of going about it, before writing Rube-Goldbergesque bash scripts heavily punctuated with line noise, haphazardly stringing together Unix processes and temporary files with cryptically abbreviated commands using mysteriously curt flags and parameters (like "dc -l", or is it "bc -l", and is that dash-ell or dash-one? I can't remember, and we haven't even gotten to the part about how I have to quote and escape the arithmetic expression parameter)!

It's not as if they're making thoughtful trade-offs, and that as a result of choosing to use bash despite all its weaknesses, they're actually developing more robust, easier to read and maintain code than they would have had they used Python.


Promoting Python over shell scripts in the name of efficiency is silly. It's like saying you should prefer turtles instead of snails.


Thanks for illustrating how some people still completely miss the point. We are discussing the overhead of forking processes to do math, which Python doesn't have to do.

You're making a false equivalence like "there is slow arithmetic on both sides". No: one side is astronomically slow, requiring millions of instructions, the other side is tolerably slow, requiring hundreds of instructions. There is no comparison.

Again, this isn't about Python. That's just an example. It's about forking or not forking. You know, the topic of this thread.

If you like, I could translate my Python argument to JavaScript, and you could explain how a bash JIT compiler could optimize out all the instructions necessary to fork "bc" to call the cos instruction.

Let me head you off at the pass before you ask "Why would I ever want to calculate a cosine in a bash script?" when you should be asking "Why would I ever want to use a language that doesn't let me calculate a cosine? Or even draw a circle?"

To illustrate my point: Perhaps you want to write a script that draws a pie chart to illustrate your disk space. That's a typical sysadminy scripty thing to do, right?

With bash, you'd have to call out to many other processes to do that. Why is that? Tell me why hasn't anyone come up with a way to draw like the canvas API in bash, without forking off dozens if not thousands of processes?

Don't you think floating point would be useful for calculating disk space usage more accurately than integer percentages, and the canvas api would be useful for drawing pie charts?

If you think you'd never need to do something like that in bash, then you have an enormous blind spot from bash tragically lowering your expectations, imagination, and productivity as a programmer.

Because bash can't even perform the simplest APPLESOFT BASIC floating point math operations or HGR hires graphics HPLOT drawing commands.

So you would first have to fork off to one process to multiply and add a few numbers and calculate a cosine, and then you'd have to fork off to another process to multiply and add a few numbers and calculate a sine (because "bc" can only return one result at a time, thank you). Then you would have to pass those numbers to another program to draw them, probably with a here-file, some loops, a bunch of variable substitutions, temporary variables, intermediate files, lots and lots of string concatenation, converting back and forth between binary floating point numbers and strings, all meticulously decorated with a phalanx of quotes, escapes, double and triple backslashes, back-ticks, colons and exclamation marks before single character codes, and double secret parentheses.
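For what it's worth, the fork-per-trig-call pattern being described looks roughly like this sketch (using awk rather than bc, since plain bc lacks trig without `-l`; the angle and formatting are arbitrary):

```shell
#!/bin/bash
# Every trig value costs a fork+exec of a separate process; a pie-chart
# script would repeat this once or twice per slice.
angle=0
c=$(LC_ALL=C awk -v a="$angle" 'BEGIN { printf "%.4f", cos(a) }')
s=$(LC_ALL=C awk -v a="$angle" 'BEGIN { printf "%.4f", sin(a) }')
echo "cos($angle)=$c sin($angle)=$s"
```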


Ahh, you edited significantly while I was replying. Here goes to your current message:

> [...] you ask "Why would I ever want to calculate a cosine in a bash script?"

I wouldn't ask that. This kind of argument strategy is silly too. You should choose your own words, and I'll choose mine. Leave the fake dialogs to Aristotle.

> To illustrate my point: Perhaps you want to write a script that draws a pie chart to illustrate your disk space. With bash, you'd have to call out to another language to do that. Why is that? Tell me why hasn't anyone come up with a way to draw like the canvas API in bash, without forking off dozens if not thousands of processes? Don't you think that would be useful?

I routinely use gnuplot as plotter in a child process. This works with bash, python, C++, or any other language which supports popen (that's fork, dup, and exec).

> Because bash can't even perform the simplest APPLESOFT BASIC floating point math operations [...]

I'm not defending bash. I'm saying that putting Python much ahead of it is silly. Why would you want to run a race with either a snail or a turtle? Silly parables about the tortoise and the hare aside, pick a cheetah or something.

> And of course needlessly converting all those numbers back and forth between strings and floating point so many times.

Python tediously checks if the vtable contains the necessary operator, evaluates the function pointer it finds, which checks the other operand's type, unpacks the 8 bytes of doubles from each 32 byte struct, finally performs the operation, then allocates a new 32 byte structure for the return result, which is checked to see if an exception occurred, and then pushed onto the stack. I probably missed some steps, and I certainly ignored the sophomore level interpreter loop overhead.

I mean, yeah that's faster than fork, but I'm not sure it's a huge win over anything else.


> Thanks for illustrating how some people still completely miss the point.

You're being feisty, whatever.

> We are discussing the overhead of forking processes to do math, which Python doesn't have to do.

Python uses dynamic dispatch on boxed types for every single operation. You're right this /only/ takes hundreds of instructions, but that just shows my analogy is appropriate. That's the turtle, in case you didn't understand it. I think it's silly for you to say this is "tolerably slow" for anyone but yourself. For other people, a fork per operation is completely tolerable. (I'm not one of those people...)

> [...] explain how a bash JIT compiler could optimize out all the instructions necessary to fork "bc" to call the cos instruction.

If anyone cared, bash could implement "bc" as a builtin the same way it does "echo". That'd be ugly, but it seems like the kind of thing the GNU folks would do if it was a common enough use case.


You're still making a false equivalence and a bad analogy, whatevers.

So I'll make an analogy too: A turtle -vs- a snail are the same order of magnitude slow. The Nazis -vs- the protesters are not of the same order of magnitude evil in Charlottesville, just like forking a new process -vs- dynamically dispatching to a function in the same process are not of the same order of magnitude slow in Unix.

And I'll reiterate:

With Python (and JavaScript) it's POSSIBLE and even NORMAL for a JIT compiler to optimize the code to call mul instructions directly. PyPy is a thing, it's free, and it works just fine, thank you:

https://pypy.org/

But it's IMPOSSIBLE to implement a bash JIT compiler (or even AOT compiler) that optimizes out the millions of instructions that are required to make a system call to fork "bc" to call that same "mul" instruction. In Linux, you simply can't optimize out system calls. (Unless Alexia Massalin's Synthesis kernel is giving you a free piggy back ride ;).

https://en.wikipedia.org/wiki/Alexia_Massalin

http://infolab.stanford.edu/~manku/quals/summaries/gribble-s...

And no, integrating "bc" into bash isn't a valid solution, or they would have done it ages ago. How about integrating the canvas API into bash too, while you're at it? Tell me when you convince them to accept your pull request!


> The Nazis and the Anti-Nazi protesters were not of the same order of magnitude of evil in Charlottesville

Are you intentionally invoking Godwin's law because you know your argument is untenable?

> PyPy is a thing, and it works just fine, you know

I like PyPy. It's not perfect, but it's pretty good. Maybe you should recommend that in the first place next time.

> But it's IMPOSSIBLE to implement a bash JIT compiler [...] to fork "bc"

Never say never. Before V8, everyone thought JavaScript had to be as slow as Python too. Now we have PyPy and LuaJIT, which I would've thought were impossible.

> And no, integrating "bc" into bash isn't a valid solution, or they would have done it ages ago

You seem really hung up on "bc". You do realize that "echo" wasn't always a builtin, and that they added it for performance reasons, right? If anyone cared enough, they could add "bc" as a builtin. Hell, the expression evaluator for integers is well on its way. You're right that probably nobody cares enough to build in "bc", but I wouldn't be surprised to see a triple paren or square bracket math interpreter with floating point support in bash someday. Heh, maybe they'll even add complex numbers. And a bash JIT is exactly the kind of thing some grad student would do for the absurdity of it all.


[flagged]


> Even Michael Godwin says there are times when it's acceptable to compare shitheads to Nazis

I wonder if we were in our early 20s during Nazi Germany if either of us would have the wherewithal to reject being a Nazi. I suspect we're all shitheads depending on the context.

> I'm dis-recommnending bash.

Yeah, I got that. I just thought it was absurd for you to only aim for a few orders of magnitude speed improvement when you could've easily gotten several more by arguing for something faster than Python.

> You admit floating point math in bash would be a good idea, and you seem to think [...]

You're doing that thing again where you try to represent my argument for me. This time instead of telling me what I might say, you're telling me what I think. You're wrong again, and it's still a poor argument strategy.

I'm not defending or advocating bash. I'm certainly not recommending any improvements to it - I only said they were possible. I am however calling you out for recommending Python in any case where performance matters.


Speak for yourself, but the exterminating Jews thing would have turned me off even as a small child. Please don't represent my tolerance for Nazis for me, and I won't represent your arguments for you.

Once again: If performance matters, then use PyPy or V8 or SpiderMonkey or ChakraCore or JavaScriptCore or Carakan or Nashorn of course.

It doesn't invalidate my argument that I used the name of the language Python instead of listing out every possible interpreter or VM that runs Python scripts, or every JavaScript VM, or every C++ compiler.

I've already said several times that Python is just an example. Did you not understand that part?

The hundreds of instructions difference between multiplying with CPython and PyPy still pales in comparison to the millions of instructions required to fork "bc".

And with bash you're still stuck with that terrible mish-mash syntax and converting back and forth between text and binary floating point numbers.


[flagged]


[flagged]


First you say I'm a bash apologist and then a Nazi apologist. Where is a philosophical Python criticizer supposed to go in order to get a fair shake? You can do better than ad hominem rebuttals.


Your criticisms of Python and your rationalizations that we would all probably follow Adolf Hitler if we were 20 year olds in Nazi Germany are non sequiturs. Speak for yourself, and find another platform for arguing about Python and Nazis.


Ugh, guys guys guys. Please don't lead HN into circles of hell.

We love your oracular knowledge of internet and software history and (dare I say it) we love you, but you simply can't do this on Hacker News. If I can't persuade you on first principles just think of what sort of example you're setting.

https://news.ycombinator.com/newsguidelines.html

civility 38 days ago [flagged]

> Speak for yourself, and find another platform for arguing about Python and Nazis.

You're the jackass who brought Nazis into a tech conversation, and you're also the one who brought Python into the conversation. As for telling me where or what to argue, you don't really have any authority in the matter. We can add this to your other weird delusions.


Heh, you edited significantly again while I was replying. It's tough to keep the conversation coherent when you do that.

Thank you for the link. I'll give it a look.

> While you're farting bubbles in the bathtub about integrating "bc" into bash, and haven't even proposed how you'll integrate the canvas api yet.

I tend to fart in the shower, and I don't give a crap about bc or bash. As for the canvas api, you must've missed the part where I said you can call gnuplot from bash (or any language which supports fork, dup, and exec, or popen). I'd guess that bash calling gnuplot as a subprocess is on par with Python calling the Tkinter canvas through their hackish hidden-Tcl abomination. You're probably a matplotlib guy though...

Incidentally, I have built command line plotters before, but I can't share the code, so you don't have any reason to believe me on that.


Nope, Python's Tkinter canvas calls into TCL/Tk directly. It doesn't run it as a separate process. There is no comparison: Another false equivalence.

(No I'm not a matplotlib guy, but I've built command line plotters too and even terminal emulators that accepted escape codes with PostScript commands. ;) And I've done a lot of work with TCL/Tk and canvas both directly and through Tkinter, and also with PyGTK using pycairo, and I've integrated lots of code into Python with SWIG, including Python C extensions integrated with PyCairo. Can you integrate C extensions into bash with SWIG, or any other way short of recompiling?)

Here are some open source reasons to believe me:

(search for "termulate-ps-string"):

https://www.donhopkins.com/home/code/term.ps.txt

https://github.com/SimHacker/micropolis/blob/master/micropol...

https://github.com/SimHacker/micropolis/blob/master/micropol...

https://github.com/SimHacker/micropolis/tree/master/Micropol...

https://github.com/SimHacker/micropolis/tree/master/Micropol...

You keep going back to this mistaken belief that forking off separate processes is incredibly cheap. Or even convenient. You really should rid yourself of that misconception.

Sure, there is some cost to all the glue code required by something like TCL or SWIG or COM or P/Invoke or any foreign function interface. But that's not even in the ballpark of how long it takes to fork off a Unix process.

It's ironic that the strongest argument bash apologists have is that it's convenient, but once you are faced with all that mish-mashed haphazard syntax and kludgy tricks (like testing for an empty string like "x$foo" == "x"), it's actually one of the most inconvenient languages there is!


> Nope, Python's Tkinter canvas calls into TCL/Tk directly. It doesn't run it as a separate process. There is no comparison: Another false equivalence.

Python's Tkinter converts all calls into strings which are parsed and executed by the hidden Tcl interpreter in there. It's really pretty hackish. I didn't say it ran in a separate process, I said it was probably on par performance-wise as bash calling gnuplot through a separate process.

> Can you integrate C extensions into bash with SWIG, or any other way short or recompiling?

Yuck. Why would I want to do that? I don't like SWIG or bash. SWIG generates an awful mess. When needed, I prefer to write Python extensions by hand.

However, I suspect the only reason bash doesn't support plugins/extensions has more to do with some GPL ideology.

> You keep going back to this mistaken belief that forking off separate processes is incredibly cheap. Or even convenient. You really should rid yourself of that misconception.

You should really stick to things I've actually written. I'm starting to think you're having an argument with some imaginary bash and fork advocate in your head and pretending I'm him.


No, it's not on par with bash calling gnuplot as a separate process, because bash and gnuplot ALSO have to convert back and forth between text and binary floating point numbers, and more importantly, as I keep repeating and you refuse to acknowledge, the cost of forking a Unix process is way more expensive than converting between text and binary strings. Not comparable. False equivalence.

And then on top of that, you still have to convert just as many numbers and strings back and forth -- if not more! It's not just the calling and conversion overhead of gnuplot: bash also has to fork off many more "bc" processes that themselves convert text numbers to binary, evaluate expression, then convert binary numbers back to text (and there could be hundreds or thousands of expressions).

While Python with pycairo can draw a pie chart without once converting numbers between text and binary, reading and writing text to a pipe, context switching, or forking off thousands of processes.

The only thing I was imagining about you was that you were listening to what I was saying, but I keep finding I need to repeat myself.


> I keep finding I need to repeat myself

Communicating with people is hard, especially when you only really want to repeat yourself and not try to understand what anyone else is saying. I get your point - fork is slow. Now what do you think my point was? Please don't bother repeating yourself again.


> Can you integrate C extensions into bash with SWIG, or any other way short or recompiling?

Yes[1], although I don't recommend doing that. See my longer comment[2] re: bash is a simple scripting/glue language, not a general-purpose programming tool.

[1] https://github.com/taviso/ctypes.sh

[2] https://news.ycombinator.com/item?id=19951990


Ugh, you edited a bunch while I was replying again. I can't keep up. My only point had been that Python is also slow.


But nowhere near as slow as forking separate processes. The subject of this discussion.


> one side is astronomically slow, requiring millions of instructions

This is not "astronomically slow" on any hardware from this millennium.

> Let me head you off at the pass before you ask "Why would I ever want to calculate a cosine in a bash script?" when you should be asking "Why would I ever want to use a language that doesn't let me calculate a cosine? Or even draw a circle?"

Because there are places for languages that can't do everything you'd ever want. Sometimes they can be extra specialized for a common task you do often: for bash, this happens to be "how can I glue together things quickly?"

> Tell me why hasn't anyone come up with a way to draw like the canvas API in bash, without forking off dozens if not thousands of processes?

Because this isn't something that bash would be good at, as you've mentioned. It's a task that would be a lot better suited to some other tool, which bash could then interact with.


You missed the point. The argument was that if your problem is simple enough, a dozen sed and bc calls might still be more efficient than spinning up the whole Python VM.


Sometimes I think complicated shell scripting is just some weird form of vanity. By making something simple look utterly incomprehensible some people think they're looking really clever to their colleagues (when in reality they're just annoying the hell out of them.) It's the same urge that spawns bizarre C++ template meta-programming and macros that are more complicated than the problem they're solving. I don't mind those things when people just admit they were bored or doing it for laughs, but the number of people that do those things with no sense of irony is way more than I'd care to really think about.


I think you can actually start a Bash process and quite a few bc processes in the time the CPython interpreter needs for startup.
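That claim is easy to probe on your own machine; a rough sketch (numbers vary widely by system, GNU date's `%N` is assumed, and python3 is only timed if it happens to be installed):

```shell
#!/bin/bash
# Rough interpreter-startup probe: time N fresh bash processes, and N fresh
# CPython processes if python3 is available, then compare.
n=10

start=$(date +%s%N)
for ((i = 0; i < n; i++)); do bash -c ':'; done
bash_ms=$(( ( $(date +%s%N) - start ) / 1000000 ))
echo "$n bash startups: ${bash_ms}ms"

if command -v python3 > /dev/null; then
    start=$(date +%s%N)
    for ((i = 0; i < n; i++)); do python3 -c 'pass'; done
    py_ms=$(( ( $(date +%s%N) - start ) / 1000000 ))
    echo "$n python3 startups: ${py_ms}ms"
fi
```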


As others have already mentioned, running 'bc' isn't actually that expensive. Comparing it to Bitcoin is ridiculous. Let's investigate the actual cost of that sum:

    $ cat foo.sh 
    #!/bin/bash
    a="1.5"
    b="0.5"
    sum="$(echo "${a} + ${b}" | bc)"
    echo "The sum of ${a} and ${b} is ${sum}"

    $ /usr/bin/strace -q -D -w -c ./foo.sh 
    The sum of 1.5 and 0.5 is 2.0
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     55.53    0.001225         123        10           read
      9.75    0.000215         215         1           execve
      5.53    0.000122           7        18           mmap
      4.03    0.000089           8        11         3 open
      3.04    0.000067           7        10           mprotect
      2.99    0.000066           5        13           rt_sigprocmask
      2.54    0.000056          56         1           clone
      2.45    0.000054           5        10           close
      2.40    0.000053           5        11           rt_sigaction
      1.81    0.000040           5         8           fstat
      1.41    0.000031           8         4           lseek
      1.22    0.000027           5         5         3 stat
      0.91    0.000020          20         1           write
That was 56us to clone(2) the process and 215us to execve(2) 'bc'. All together just 12.74% of the time was spent running syscalls, and an even smaller portion of the total time to launch and execute this simple example. Most of the time is spent launching bash itself at the beginning. Using a language such as Python or launching a custom binary will have similar costs to launch the program. If we're talking about a single persistent process (e.g. an interactive shell), then we're still talking about paying just 0.25ms of CPU time to perform a useful calculation. Compared to intentionally wasting time by design maximizing hash prefix guesses to "mine" Bitcoin, this "ain't the same fuckin' ballpark, it ain't the same league, it ain't even the same fuckin' sport"[1].

However, the actual cost to launch an external program like 'bc' isn't particularly important. It could be multiple orders of magnitude slower and it would still be acceptable because we're talking about a cost that only needs to be paid a few times. Yes, compared to "full" programming languages like Python, running an external program to perform a simple calculation might seem like a wasteful, half-assed solution. As Gary Bernhardt explained[2], "Half-assed is OK when you only need half of an ass."

I don't care about the cost of running 'bc' to do simple floating point calculations because I usually don't need to do floating point calculations (which are a lot harder to use safely[3] compared to simple integer math) for the type of task I would implement in a shell script. If the task required more than a few calculations (or other complex features such as a GUI), then another language like python/ruby/C/rust would be a better tool than shell script. Learning how to choose the appropriate tools to use when solving a problem is an important skill for any engineer.
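For the rare floating point calculation, shelling out to bc(1) is a one-liner. A sketch, assuming bc is installed (scale controls decimal places):

```shell
# Compute 5% of 1000 to two decimal places by piping an expression to bc(1).
interest=$(echo "scale=2; 1000 * 0.05" | bc)
echo "interest: $interest"
```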

Shell scripting languages like Bourne (sh) are designed to be a glue language that you use for simple (often interactive) tasks. They are the user interface (or "IDE") of Unix. Asking why shell doesn't have built-in support for efficient floating point math or complex GUI features is kind of like asking why Eclipse or Visual Studio don't have a built-in spreadsheet or email client. Yes, you could probably make that happen as a half-assed hack, but it's obviously not the type of task those tools were designed for.

As for the syntax, if you find it confusing or ugly and prefer to use other tools, that's fine. There are a lot of great languages available, so use the tool that is best for you. If it has #!/bin/sh at the top, you can probably substitute any language you prefer without any problems. However, remember that for some of us Bourne shell is the easy, reasonably efficient tool we prefer to use for some problems.

[1] https://en.wikiquote.org/wiki/Pulp_Fiction#Dialogue

[2] https://www.youtube.com/watch?v=sCZJblyT_XM

[3] https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...


>"for the type of task I would implement in a shell script"

And therein lies the rub. Let's put the donkey before the cart: The reason you would not do many kinds of tasks in a shell script is that shell scripts are not any good for many kinds of tasks, not because you are wrong to want to do some kinds of tasks.

The term "shell script" is an artificial construct, just like the shell scripting language itself. There is no reason you need to limit your imagination, aspirations and abilities by defining yourself as a "shell scripter" who cannot add 1.5 and 0.5 to get 2, and who cannot draw a pie chart.

Of course you should learn other languages; then you won't need to write shell scripts. Their disadvantages vastly outweigh their advantages in important ways specific to scripting and unix administration (like dealing with file names with spaces, or performing arithmetic and statistics, for example).


> shell scripts are not any good for many kinds of tasks

Shell scripts are good for gluing together tools that do disparate things and simple automation. Most shell scripts end up becoming "real programs" once they start containing actual logic in them.


More importantly, it doesn't matter that there are many kinds of tasks that shell scripts aren't good at. It matters that there are tasks that shell scripts are good at, and indeed better suited to than any popular general-purpose language.

When you have a pipeline between two nontrivial programs, for example, or any kind of fd manipulation.
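A sketch of the kind of fd manipulation that stays a one-liner in shell (commands and paths here are illustrative):

```shell
# Merge stderr into the pipeline alongside stdout, then sort both streams:
sh -c 'echo data; echo oops >&2' 2>&1 | sort

# Or keep the streams separate: stdout flows through the pipe,
# stderr goes to a log file.
sh -c 'echo data; echo oops >&2' 2>/tmp/glue.err | sort
```

Expressing the same plumbing in a general-purpose language means explicit subprocess, pipe, and dup2 bookkeeping.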


Yet the problem remains that bash scripts are NOT actually good for what they're designed for: dealing with file names. Because some file names have spaces in them, and some file names begin with hyphens, and so many careless bash programmers don't give a shit, and would rather blame their audacious uncontrollable users for daring to put spaces or hyphens in their file names, than write scripts that don't suddenly and inexplicably crash, years after they're written and the programmer has long since left.
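To be fair, the failure modes described here are avoidable with a few defensive habits: quote every expansion, end option parsing with --, and use null-delimited names for recursive work. An illustrative sketch in a scratch directory:

```shell
# Create two awkward file names: a leading hyphen and an embedded space.
dir=$(mktemp -d) && cd "$dir"
touch -- '-dashed' 'two words'

# Quoting "$f" keeps the space intact; the ./ prefix and -- keep the
# hyphenated name from being parsed as an option.
for f in ./*; do
    rm -- "$f"
done

# For recursive trees, pass names null-delimited (any byte but NUL is legal):
# find . -type f -print0 | xargs -0 rm --
```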

Python and JavaScript are also good for gluing together tools that do disparate things and simple automation. But they can do arithmetic and graphics, too! And without the enormous overhead of forking a process to perform simple arithmetic and string manipulation.

Bash scripts aren't good for many useful tasks because bash is badly designed, not because there's some universal law of computing that proclaims that scripts should never use floating point numbers or draw graphics.

Apple figured all this out in 1979 when they made a deal with Microsoft and switched from INTEGER BASIC to APPLESOFT BASIC, and added floating point and HIRES graphics. Even Apple ][ DOS 3.2 could deal with file names with spaces in them! How long will it take the Unix community to figure that out, too?

https://www.youtube.com/watch?v=RiWE-aO-cyU


fork() explanations generally describe the APIs, maybe talk a little about COW and memory protection... but I never see a sufficient explanation of WHEN to use fork().



