
GNU Parallel Cheat Sheet [pdf] - ole_tange
https://www.gnu.org/software/parallel/parallel_cheat.pdf
======
bonoboTP
Ah, the rare case of nagware in GNU.

From the man page:

"--citation Print the BibTeX entry for GNU parallel and silence citation
notice. If it is impossible for you to run --bibtex you can use --will-cite.
If you use --will-cite in scripts to be run by others you are making it harder
for others to see the citation notice. The development of GNU parallel is
indirectly financed through citations, so if your users do not know they
should cite then you are making it harder to finance development. However, if
you pay 10000 EUR, you should feel free to use --will-cite in scripts."

Asking for donations/citations is one thing, but putting this junk about 10000
EUR in the man page and nagging users is quite an annoyance. How GNU allows
such junk in their man pages puzzles me. Obviously the GPL allows one to
remove the nagware and redistribute, but I don't know if anyone has forked it.

~~~
musicale
Beyond being supremely irritating, nagware is simply not scalable.

Imagine if every utility, library, or driver in a typical Linux distribution
took this approach. :(

I encourage Debian et al. to adopt a "no nagware" policy.

~~~
jwilk
There's something similar in Debian Policy §2.3
([https://www.debian.org/doc/debian-policy/ch-archive.html#cop...](https://www.debian.org/doc/debian-policy/ch-archive.html#copyright-considerations)):

> _Programs whose authors encourage the user to make donations are fine for
> the main distribution, provided that the authors do not claim that not
> donating is immoral, unethical, illegal or something similar; in such a case
> they must go in non-free._

BTW, the nagware code has been removed in Debian unstable:

[https://bugs.debian.org/905674](https://bugs.debian.org/905674)

~~~
musicale
It's about time! In addition to my "it's annoying and simply not scalable"
comment, the bug discussion brings up some additional compelling points:

1. It included a click-wrap agreement in violation of the Debian Free
Software Guidelines.

2. Fishing for inappropriate citations should not be encouraged, as it
compromises the integrity of scholarship.

------
gcommer
A few slightly more advanced GNU Parallel features that I've used:

- --joblog writes out a detailed logfile of the jobs, which can be used to
resume interrupted runs with --resume or --resume-failed

- `--slf filename` can be used to provide a list of ssh logins for remote
worker nodes to run jobs on. Importantly, parallel will automatically reread
this list when it changes. This lets you very easily distribute batch jobs
across preemptible gcloud VMs (or EC2 spot instances) and gracefully handle
worker nodes appearing/disappearing with just a few lines of bash:
[https://gist.github.com/gpittarelli/5e14fb772ce0230a3c40ffad...](https://gist.github.com/gpittarelli/5e14fb772ce0230a3c40ffad2c2262be)

- When used with bash, parallel can run bash functions if you export them
with `export -f functionName`.
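A minimal sketch of the `--joblog`/`--resume` and `export -f` tips together, assuming bash and GNU parallel are installed; `process_item`, `run_batch` and the job-log file name are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: process_item is a hypothetical stand-in for real work.
process_item() {
  # Print the argument in upper case (tr keeps this portable to old bash).
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}
# Export the function so the bash shells that parallel starts can call it.
export -f process_item

run_batch() {
  local joblog=$1
  # --joblog records one line per finished job; --resume skips any job
  # already recorded there, so calling run_batch again after an
  # interruption only runs what is left. (--resume-failed would also
  # re-run jobs that exited non-zero.)
  parallel --joblog "$joblog" --resume process_item ::: alpha beta gamma
}
```

Running `run_batch jobs.log`, interrupting it, and running it again should only execute the jobs not yet recorded in `jobs.log`.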

~~~
ziotom78
Yeah, --joblog is a very handy feature. I once hacked together a small Python
script to produce an ASCII time plot from its output:

[https://github.com/ziotom78/plot_joblog](https://github.com/ziotom78/plot_joblog)
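For a quick text summary instead of a plot, the job log is plain tab-separated text. A sketch, assuming the column order GNU parallel uses for `--joblog` (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command, after one header line); `summarise_joblog` and `failed_commands` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Assumed joblog columns (tab-separated, one header line):
# 1 Seq, 2 Host, 3 Starttime, 4 JobRuntime, 5 Send, 6 Receive,
# 7 Exitval, 8 Signal, 9 Command

summarise_joblog() {
  # Count jobs, count failures (non-zero exit value or signal),
  # and add up the per-job runtimes in seconds.
  awk -F'\t' 'NR > 1 {
      jobs++
      runtime += $4
      if ($7 != 0 || $8 != 0) failed++
    }
    END { printf "jobs=%d failed=%d runtime=%.3f\n", jobs, failed, runtime }' "$1"
}

failed_commands() {
  # Print the command of every job that failed.
  awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $9 }' "$1"
}
```

`summarise_joblog jobs.log` prints the job count, failures and summed runtime; `failed_commands jobs.log` lists the commands that `--resume-failed` would retry.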

------
mruts
I've never used GNU Parallel. But could someone explain to me the value add vs
GNU xargs -P/--max-procs? From the examples at the top, it seems like those
could be achieved with xargs.

~~~
gcommer
parallel is like xargs++; for simple cases it does the same thing as xargs,
but it also has many more advanced features such as:

- Splitting input lines into multiple fields and building more complex
commands from them

- Running jobs on remote nodes

- Pausing/resuming batch jobs (--joblog)

- ETA and progress bars

- Passing data to programs on stdin, and generally many, many other ways of
distributing and collecting data that xargs can't do

You can see a bunch of examples at:
[https://www.gnu.org/software/parallel/man.html](https://www.gnu.org/software/parallel/man.html)

    
    
      $ PAGER=cat man xargs | wc -l
      259
      $ PAGER=cat man parallel | wc -l
      3985
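To make the "splitting input lines into fields" point concrete, a rough side-by-side sketch, assuming GNU (or BSD) xargs and GNU parallel; the sample data and function names are made up for illustration:

```shell
#!/usr/bin/env bash
# The same fan-out written two ways. Input is one "name count" pair per
# line; the data and function names are illustrative only.

with_xargs() {
  # -L 1: one input line per command; its fields arrive as $0 and $1 of sh -c.
  # -P 4: run up to 4 commands at once (output order is not guaranteed).
  printf '%s\n' 'apple 3' 'banana 5' |
    xargs -P 4 -L 1 sh -c 'echo "$0 x $1"'
}

with_parallel() {
  # --colsep ' ' splits each line into {1}, {2}, ...; -k keeps input order.
  printf '%s\n' 'apple 3' 'banana 5' |
    parallel -j 4 -k --colsep ' ' echo {1} x {2}
}
```

With xargs the fields are only reachable as positional arguments of a wrapper shell, while `--colsep` lets parallel place `{1}` and `{2}` anywhere in the command.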

~~~
e12e
OT: I got curious, and this also works:

    PAGER="wc -l" man xargs

(although my man page for xargs is just 211 lines)

~~~
magissima
Try changing your terminal's width and running it again ;)

~~~
e12e
Ah.

    $ MANWIDTH=80 PAGER="wc -l" man xargs
    292

------
bloopernova
Parallel is Good Stuff (tm) and works very well but I haven't had much cause
to use it.

For ad-hoc system modifications I've found myself using tmux's synchronize-
panes feature, or xargs. For anything bigger or more involved then I break out
Ansible/Chef/Puppet depending on which client project I'm working on.

I remember one place I worked at had a huge elaborate configuration/deployment
system hand written by the head IT guy which used Parallel+bash+perl
extensively. Thing is, while it was a great system, I could make the same
changes in Ansible or Puppet with a couple of lines and push them within
minutes, while making changes using the hand written system might take hours.
Plus no logging and poor error handling led to all sorts of problems with that
system, despite it being a real labour of love by that wacky Finnish dude.

However, this sheet is really nice because it is just one side of a letter/A4
piece of paper and lays out the information clearly. I definitely want to mess
around with Parallel now because of this cheat sheet. I wonder how it was
typeset or laid out on the page? I try to write my own cheat sheets but they
always seem way too sparse with too much white space. Maybe it is written in
LaTeX or similar.

~~~
mechanical_jane
Not LaTeX, but LibreOffice:
[http://git.savannah.gnu.org/cgit/parallel.git/tree/src/paral...](http://git.savannah.gnu.org/cgit/parallel.git/tree/src/parallel_cheat.fodt)

------
jason_slack
I use GNU Parallel for pulling stock data from various sources, massaging it,
creating flatfiles of the data, creating models of the data, etc.

I also use it as a rudimentary queue system for stacking up the next jobs
(while scripts stack up the next jobs, but...).

It had a bit of a learning curve because the docs are really technical and not
geared enough towards new users, but reading and re-reading and trying some
examples helped it sink in.

Here are a few ways I use it:

    echo "Number of RAR archives: "$(ls *.rar | wc -l)
    ls *.rar | parallel -j0 1_1_rarFilesExtraction
    ls -d stocks_all/Intraday/*.txt | parallel -j${ccj}% 1_2_stockFileProcessing {}

I'd like to scale this to work with multiple machines (as Parallel can do), but
I get really tempted to just write my own parallel processor so I'm relying on
my own code.
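A sketch of the same pipelines fed from a shell glob via `:::` rather than by parsing `ls` output, assuming bash and GNU parallel; `extract_one` is a hypothetical placeholder for the poster's `1_1_rarFilesExtraction` script, and `count_rars`/`extract_all` are illustrative names:

```shell
#!/usr/bin/env bash
# Sketch: pass filenames to GNU parallel with ::: instead of parsing `ls`.
shopt -s nullglob   # an unmatched *.rar expands to nothing, not to '*.rar'

count_rars() {
  # Count .rar files in directory $1 without spawning ls or wc.
  local files
  files=( "$1"/*.rar )
  echo "${#files[@]}"
}

extract_one() {
  # Hypothetical stand-in for the real extraction script.
  printf 'would extract %s\n' "$1"
}
export -f extract_one

extract_all() {
  local files
  files=( "$1"/*.rar )
  (( ${#files[@]} )) || return 0   # nothing to extract
  # -j0 runs as many jobs at once as possible; a value such as -j 50%
  # instead scales the job count to the number of CPUs.
  parallel -j0 extract_one ::: "${files[@]}"
}
```

`extract_all .` would then stand in for the `ls *.rar | parallel -j0 ...` line, and `count_rars .` for the `ls | wc -l` count; filenames containing spaces are passed through intact.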

------
scrummyin
My favorite parallel command:

    find ~/Source/folder -name .git | parallel "cd {}/.. && git pull && git checkout -b new_branch"
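A variant of the command above as a sketch, assuming bash, GNU parallel and a git recent enough to support `git -C`; `update_repo` and `update_all` are hypothetical helpers. GNU parallel's `{//}` replacement string (the directory part of the input) is another way to drop the `cd {}/..`:

```shell
#!/usr/bin/env bash
# Sketch: pull every git checkout under a directory in parallel.
# update_repo and update_all are hypothetical helpers; new_branch is the
# branch name from the comment above.
update_repo() {
  local repo
  repo=$(dirname "$1")   # "/path/to/repo/.git" -> "/path/to/repo"
  # git -C avoids cd; && stops before branching if the pull failed.
  git -C "$repo" pull && git -C "$repo" checkout -b new_branch
}
export -f update_repo

update_all() {
  # -prune stops find from descending into each .git directory.
  find "$1" -name .git -prune | parallel update_repo
}
```

Call it as `update_all ~/Source/folder`. Note that `checkout -b` fails in any repository that already has a `new_branch`.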

------
akramer
Each time I've seen something about GNU parallel pop up I've been tempted to
post, but I've never made an account until now.

I wrote a very different style of command parallelizer that I named lateral.
It doesn't require constructing elaborate command lines that define all of
your work at once. You start a server, and separate invocations of 'lateral
run' add your commands to a queue to run on the server, including their file
descriptors. It makes parallelizing commands with complex arguments easier.

Take a look if this sort of thing interests you, as I haven't seen anyone
write one like this before. Its primary difference is the ease with which each
separate command can output to its own log, and the lack of need to play games
with shell quoting and positional arguments.

Check it out:
[https://github.com/akramer/lateral](https://github.com/akramer/lateral)

~~~
the_it_girl
I think it is good you finally made an account: How are people going to find
your software if you do not tell them about it? :)

Can you make a comparison between lateral and sem?

[https://www.gnu.org/software/parallel/sem.html](https://www.gnu.org/software/parallel/sem.html)
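For comparison, the queue-as-you-go style is roughly what `sem` (shipped with GNU parallel as an alias for `parallel --semaphore`) offers. A sketch, assuming GNU parallel is installed; the `demo` id, the job count and `queue_jobs` are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: queue jobs one call at a time with sem (parallel --semaphore).
# The id "demo" and the jobs themselves are illustrative only.
queue_jobs() {
  local i
  for i in 1 2 3 4; do
    # Each call starts its job in the background once a slot is free;
    # at most 2 jobs sharing --id demo run at the same time.
    sem --id demo -j 2 "sleep 0.1; echo job $i done"
  done
  # Block until the jobs queued under this id have finished.
  sem --id demo --wait
}
```

Unlike a single `parallel` command line, the `sem` calls could come from different scripts, as long as they share the same `--id`.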

------
res0nat0r
Lots of good examples also here:
[https://www.gnu.org/software/parallel/man.html](https://www.gnu.org/software/parallel/man.html)

------
Mizza
If you're using GNU Parallel for simple, non-parallel command line tasks and
scripting, I've written a tool which I find to be much more intuitive:

[https://github.com/Miserlou/Loop](https://github.com/Miserlou/Loop)

The author of GNU Parallel wrote a pretty detailed comparison, which you can
find in the linked README.

~~~
isaachier
Your tool looks nice, but it doesn't seem to parallelize the work in any way.

~~~
isaachier
Never mind, missed your point about not being parallel.

------
devy
Is there a Rust port of GNU parallel? It's written in Perl, and installing Perl
dependencies is not as simple as downloading a binary :)

~~~
wyoh
[https://github.com/mmstick/parallel](https://github.com/mmstick/parallel) but
it's unmaintained and the author wanted to do a rewrite.

------
hprotagonist
Still often the simplest way to get parallel computation in Python, sadly.

~~~
bonoboTP
The multiprocessing module is pretty good in Python.

~~~
zaphirplane
Logging was lost last time I tried it

~~~
ZeroCool2u
Had this issue recently. Turns out there's a great library for this
specifically: [https://github.com/jruere/multiprocessing-logging](https://github.com/jruere/multiprocessing-logging)
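On the shell side of hprotagonist's point, a sketch of fanning a Python script out with GNU parallel. Because each input runs in its own `python3` process, per-job output stays separate without any shared logging setup. It assumes GNU parallel and python3 are installed; the squaring script and `run_squares` are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: fan a small Python script out over several inputs with GNU parallel.
# Each job is a separate python3 process, so no interpreter state (including
# logging handlers) is shared between jobs. The script itself is illustrative.
square_script=$(mktemp)
cat > "$square_script" <<'EOF'
import sys
n = int(sys.argv[1])
print(n * n)
EOF

run_squares() {
  # --tag prefixes each output line with its argument and a tab;
  # -k prints results in input order.
  parallel --tag -k python3 "$square_script" ::: "$@"
}
```

`run_squares 2 3` should print each square next to the input that produced it.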

