
GNU Parallel 2018 - Tomte
https://zenodo.org/record/1146014
======
tbrock
It’s a shame that this tool is so incredibly useful but simultaneously so
unergonomic to use that it requires a 12 page manual.

Every time I reach for it the magical incantations that finally get it working
properly are so completely unmemorable that it’s begging to have a modern
replacement (like httpie is to curl).

If someone wrote a version with a better interface it would be instantly
adopted in preference to this. Might be a good weekend project.

~~~
kylek
Maybe not any better syntax, but xargs is usually installed by default in most
distros, and I've never found parallel to be advantageous over it. I don't
think this particular wheel needs to be reinvented.

~~~
tbrock
xargs is great, but it doesn't run the commands in parallel or chunk input,
does it?

~~~
kylek
-P sets parallelism, -n number of arguments for each execution. Those are my main weapons. (-i is usually also necessary, and can allow for some pretty creative situations)
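
For anyone who hasn't combined them, here's a minimal dry-run sketch of -n and
-P together (the file names are made up; drop the "echo" to actually run the
command):

```shell
# -n 2 chunks the input two arguments per invocation; -P 4 runs up to
# four invocations at once. Note that -I (the modern spelling of -i)
# implies one argument per command, so it can't be combined with -n.
printf '%s\n' a.log b.log c.log d.log | xargs -n 2 -P 4 echo gzip
```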

~~~
hyperpallium
I don't swear on HN, but Holy Shit, I just added -P to xargs in a testsuite
bash script, and it's 7 times faster.

I only have a quad core, but -P values higher than 4 give better performance.
Maybe because the tests take different amounts of time, and higher -P avoids
them backing up. To get the 7x, I had -P equal to the number of tests.

~~~
waterhouse
If the tests are ever waiting for I/O or something, or otherwise using less
than one full CPU core, then I think that could explain why running more than
the number of cores would be an improvement. If you run "top", or use the
"time" utility, what fraction of a CPU core does each test use?
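
A toy illustration of that effect, with sleep standing in for a job that
waits rather than computes (the numbers are illustrative):

```shell
# 8 jobs that each sleep 0.3s: with -P 8 the whole batch finishes in
# roughly 0.3s even on a quad core, since a sleeping job occupies no core.
# Under "time", user+sys far below real is the tell-tale sign of waiting.
yes 0.3 | head -n 8 | xargs -n 1 -P 8 sleep
```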

~~~
hyperpallium
Good thinking, but they don't do any file I/O, and I'm not sure how they could
use less than a full CPU (wouldn't they just finish sooner? Though waiting for
I/O could do that). OTOH it's Java (on Android, so dex format) and does some
classloading of user classes... so maybe that is I/O? Here are the time results:

    
    
      real    0m1.818s
      user    0m2.020s
      sys     0m0.900s
    

top: it completes in 1 sec, so I put it in a while loop. (BTW top isn't behaving
normally; "1" isn't giving multiple CPUs and it doesn't fit on screen.) Just the
top line:

    
    
      User 51%, System 23%, IOW 0%, IRQ 0%

------
AlexDragusin
Immensely useful; I am forever grateful that it exists. I made thorough use of
it during a search engine indexer test I ran back in 2013, which
allowed me to pretty much download and index (download + indexing in one go)
an entire country's (Romania) pages in less than 1 hour on a laptop with only
4GB of RAM and an i7 Ivy Bridge processor (a Zenbook UX21, in particular).

All the components I wrote are optimized to the core (short of going to
assembly). It allowed me to scale my operations up to 10000 processes running
continuously for the duration of the test. All this was done on a 4G
connection with roughly 40Mbps of bandwidth at the time, on a virtualized
(VirtualBox) CentOS LAMP installation.

It blew away all my expectations; without parallel it would have taken more
than a month. I learned a lot from the experience of developing it and maybe
will write a post sharing what I've learned.

In regards to being complicated, well, it isn't really when you consider its
use scenarios; for these sorts of tasks, it's an indispensable tool.

Think of it this way: it is far less difficult than the tasks for which you
need it.

~~~
mkl
How did you get the list of pages to download? Or was this spidering from some
starting set?

One hour on that connection means a maximum of 18GB of data, which seems
surprisingly small for a whole country's internet (not that I have any point
of comparison), small enough that you could easily do interesting analysis of
the whole lot.
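
(For reference, the arithmetic behind that ceiling, taking Mbps as 10^6
bits per second:)

```shell
# 40 Mbit/s for 3600 s, divided by 8 bits/byte and 1000 MB/GB
echo "$((40 * 3600 / 8 / 1000)) GB"
```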

~~~
AlexDragusin
I started off with a seed list (compiled from both Alexa and the Romanian
Wikipedia) and from there spidered through the links found on each of the
pages. I needed only the pages written in the Romanian language, so right
after download a language check would take place, along with the ranking and
finally the indexing. Please note that only the HTML was downloaded, no
images/css/js etc., so the size would be considerably less than if you were
browsing the pages normally. The HTML/CSS/JS markup would be stripped off,
leaving the content for analysis, ranking and indexing.

True, and after that I realized how small the Romanian internet really is in
the big picture; it surely allowed me to put things into perspective. It's
small enough that it would be doable to have a near-realtime update of the
index, since less than 10% of sites update daily and less than 1% on an hourly
basis, such as news sites.

It correlates with the traffic details of the top-trafficked websites in
Romania, where one of the top sites gets around 200k uniques a day, a small
amount when you consider the big picture. It is likely that today the amount
of data would be larger, but not by much.

------
edoceo
For everyone who doesn't like this one, they publish a big list of alternatives:

[https://www.gnu.org/software/parallel/parallel_alternatives....](https://www.gnu.org/software/parallel/parallel_alternatives.html)

------
matt-attack
I'm actually a huge fan of parallel. I use it often for various media
processing pipelines. Um, you don't need to be daunted by the 112 page manual.
A brief read of the man page has always sufficed. The best part is that every
time I think of something new I need to do, it's _always_ there.
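
For the curious, a dry-run sketch of a typical media-pipeline shape (the
filenames and the ffmpeg invocation are illustrative, not from the comment;
drop the "echo" to actually transcode):

```shell
# {} is parallel's replacement string for each input; {.} is the input
# with its extension removed. parallel runs one job per CPU core by default.
parallel echo ffmpeg -i {} {.}.flac ::: a.wav b.wav
```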

------
LeoPanthera
Any discussion about GNU Parallel inevitably degenerates into complaints about
the requirement for "--will-cite".

Removing that requirement is a two line patch:

    
    
        4723,4724d4722
        <        1
        <        or

~~~
gnulinux
There is no such requirement. If you read the text you'll see that the author
explains that it's their only income source, so they ask you to consider citing
them in your paper. It is an entirely reasonable thing for a software engineer
to ask of their users.

~~~
bdowling
> It is an entirely reasonable thing for a software engineer to ask from their
> users.

It's also a reasonable thing for a user of free software to modify the
software and improve it by removing this kind of garbage.

------
edibleEnergy
Dang that's some list of options :D
[http://git.savannah.gnu.org/cgit/parallel.git/tree/src/paral...](http://git.savannah.gnu.org/cgit/parallel.git/tree/src/parallel#n1412)

------
nullc
I like some of parallel's functionality, but I wish it were more performant and
that it were easier to avoid the complexity when you only want a few features.

As a result of parallel's performance and invocation complexity I find myself
using xargs -P more frequently than parallel.

------
0xFFFE
This is such a nice tool. Here is a handy one-liner I use all the time.

parallel -a input_file_here -j16 --gnu 'ssh -q {} "echo Matrix has you"'

~~~
mechanical_jane
Why not save some quoting:

    
    
      parallel --slf input_file_here -j16 --nonall echo 'Matrix has you'

~~~
0xFFFE
Sweet, this is more elegant, thank you.

