GNU Parallel is extremely sluggish because it does all sorts of different things behind your back: it buffers output on disk (so the output from different jobs is not mixed and you are not limited by the amount of RAM - it will even compress the buffers if you are short on disk space), it checks whether the disk is full for every job (so you do not end up with missing output), it gives every process its own process group (so a process with children can be killed reliably with --timeout and --memfree), and a lot of other stuff.
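A couple of those safety features can be seen directly from the shell; a minimal sketch, assuming GNU parallel is installed:

```shell
# Output is buffered per job, so lines from different jobs are
# never interleaved even though the jobs run concurrently.
parallel 'echo start {}; sleep 0.2; echo end {}' ::: 1 2 3

# --timeout kills a job's whole process group if it runs too long;
# here the "sleep 3" job is killed after ~1 second while the
# fast job still prints its line.
parallel --timeout 1 'sleep {}; echo finished {}' ::: 0 3
```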
It lets you code your own replacement string (using --rpl), and lets you make composed commands with shell syntax:
myfunc() { echo joe $*; }
export -f myfunc
parallel 'if [ "{}" = "a" ] ; then myfunc {} > {}; fi' ::: a b c
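--rpl itself takes a name plus a snippet of Perl that rewrites $_; a minimal sketch (the {uc} name here is made up for illustration):

```shell
# Define a custom replacement string {uc} that upper-cases each input.
parallel --rpl '{uc} $_=uc($_);' echo {uc} ::: foo bar
```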
It does not need a special compiler, but runs on most platforms that have Perl >=5.8. Input can be larger than memory.
In other words: Treat GNU Parallel as the reliable Volvo that has a lot of flexibility and will get the job done with no nasty corner case surprises.
It is no doubt possible to make a better specialized tool for situations where the overhead of a few ms per job is an issue and where you neither need brakes, seatbelts nor airbags. xargs is an example of such a tool, and you can have both GNU Parallel and xargs installed side by side.
A few years ago, Debian made GNU parallel provide the "/usr/bin/parallel" executable instead of moreutils. The maintainer of moreutils had some interesting things to say about that:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=597050#75
He lost me at the point where he complained that GNU parallel "includes the ability to transfer files between computers". For me at least that is _the_ feature of GNU parallel that actually makes it really useful. Which I guess is the problem with all these discussions: one person's useless bloat is another person's no. 1 killer feature.
The actual Perl code calls out to ssh and rsync (or can be configured to use something else) when it's time to actually connect and transfer files. It just does it in a way that is nice and reasonably transparent to the end user.
It felt like his complaint was that this was 'bloat', since Real Men can achieve almost the same thing by just piping some output through some bash scripts they hacked together.
And it is exactly the hacking part that GNU Parallel tries to help with: A lot of the helper functions in GNU Parallel could be done by expert users (--nice, --tmux, --pipepart, env_parallel, --compress, --fifo, --cat, --transfer, --return, --cleanup, --(n)onall).
But non-expert users will invariably make mistakes (e.g. get quoting wrong, not getting remote jobs to die if the controlling process is killed, or re-scheduling jobs that were killed by --timeout), and why not just have small wrapper scripts built into GNU Parallel that are well-tested, so the non-expert users can enjoy the same stability as the expert users?
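Quoting is a good example of the kind of mistake this protects against; a small sketch, assuming GNU parallel is installed:

```shell
# Arguments containing spaces, quotes and shell metacharacters pass
# through unmangled - a hand-rolled `for f in $(...)` loop typically
# gets at least one of these wrong.
parallel echo {} ::: 'a b' "it's" 'semi;colon'
```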
Having written my fair share of those hacky wrapper scripts before I discovered GNU parallel, I certainly am very happy that it offers everything I need in a single, easy-to-use command.
I'm sure that's the cause. Their test script is just a straight `echo` of the input, so each test process exits essentially immediately - it's unlikely the parallelism actually kicks in to any meaningful degree. The majority of the test time is spent in the Rust/Perl code rather than in actually running commands. That said, while the test isn't hugely useful, the fact that this Rust implementation has much less overhead is still a notable improvement.
To add to this, parallel mentions as much in its man pages (that there is a certain startup cost, and a certain per-job startup cost), and offers tips for speeding up the processing of jobs that exit quickly. But there's no reason you couldn't also do those things in the Rust version, so it's going to win every time. When dealing with commands that take a while to complete, though, the extra overhead of the Perl script would probably be negligible.
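One of those man-page tips is to amortize the job-startup cost by passing several arguments per invocation, e.g. with -X; a sketch, assuming GNU parallel is installed:

```shell
# One job per argument: 1000 process startups.
seq 1000 | parallel echo > /dev/null

# -X packs as many arguments as fit onto each command line
# (xargs-style), so only a handful of echo processes are started.
seq 1000 | parallel -X echo > /dev/null
```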
Err, no. It execs other processes. The runtime overhead of the interpreter is irrelevant, as it adds nothing on top of the general runtime of the sub-process.
GNU Parallel takes a surprising amount of CPU time. It does have various tasks (tracking its children, feeding them input, gathering all their output and writing it to the screen in the correct order), but I'm still surprised how much CPU it takes.
Hmm... The example in the GitHub README is a bad test case (for performance comparison), as it does almost no processing in the actual subprocesses. A better one could be creating thousands of small files (with dd or something) in a directory, though that is IO-bound, not CPU-bound. Best might be some kind of repeated floating-point exponentiation.
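Something along those lines could look like this; a sketch (the constants are arbitrary):

```shell
# Each job burns real CPU in awk with repeated floating-point
# multiplication, seeded by its input number, so the coordinator's
# per-job overhead becomes a small fraction of the total runtime.
parallel 'awk "BEGIN { x = {}; for (i = 0; i < 5000000; i++) x *= 1.0000001; print x }"' ::: $(seq 8)
```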
Slides 28 and 35 suggest it was 3 servers, each with 4 GPUs, i.e. 12 GPUs total. If that's the case, then you can probably build a Google scale network (1 billion parameter, 9 layer neural network, which needed 1,000 machines and 16,000 cores running for a week in 2012) at home for around £4K (US$6.2K).