

Processes Paralleling to Speed up Computing and Tasks Execution in Linux - andreygrehov
http://kukuruku.co/hub/nix/processes-paralleling-to-speed-up-computing-and-tasks-execution-in-linux

======
deathanatos
> _Besides, xargs isn’t good at transferring special symbols, such as space or
> quotes._

Well, no — the problem only arises when you're piping ls into it. That's what
--null (or -0) is for (use find instead of ls). The man page offers a
similarly baffling argument:

> _xargs deals badly with special characters (such as space, ' and "). To see
> the problem try this:_
    
    
      touch important_file
      touch 'not important_file'
      ls not* | xargs rm
      mkdir -p "My brother's 12\" records"
      ls | xargs rmdir
    

Same problem as the article: piping in the output of ls into xargs.

> _You can specify -0 or -d "\n", but many input generators are not optimized
> for using NUL as separator but are optimized for newline as separator._

I still don't see the argument: if the generator is "optimized for newline as
a separator", then -d '\n' is what you need. But if your generator is
outputting a list of file names (perhaps the most common case for xargs), then
-0 is what you need. It is perhaps somewhat unfortunate that xargs's use of
whitespace (and bash's) makes it so easy to write subtly incorrect programs;
this is perhaps why the advice to use Python, or anything but Bash, is so
common. But the argument above suggests that `ls | parallel` would somehow
work, which it cannot, for the very reasons we must use xargs -0.

I suspect this is a long way of saying "we have (somewhat more, but still not
quite) sane defaults".

The correct command has always been `find . -maxdepth 1 -name 'not*' -print0
| xargs -0 ~/code/random/args.py`. Anything else is subject to weird corner
cases of failure. It's a sad state of affairs, really, and I'd love to change
it. If you're just pounding out a command on the command line, you can usually
ignore those corner cases, since you probably mentally _know_ whether `ls
not*` will cause issues; if you're writing a script for general use, it's only
a matter of time until it fails in interesting ways.
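
To make that concrete, here's a NUL-safe rerun of the man page example above
(a sketch; rm and rmdir stand in for any consumer):

      touch important_file
      touch 'not important_file'
      # NUL delimiters survive spaces and quotes in filenames
      find . -maxdepth 1 -name 'not*' -print0 | xargs -0 rm
      mkdir -p "My brother's 12\" records"
      find . -maxdepth 1 -mindepth 1 -type d -print0 | xargs -0 rmdir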

~~~
fulafel
> The correct command has always been `find . -maxdepth 1 -name 'not*'
> -print0 | xargs -0 ~/code/random/args.py`

If you need to be portable, you can do pretty well with

    
    
      find ... | sed 's/./\\&/g' | xargs command 
    

This will only break with filenames that have newlines in them.

(see
[http://www.etalabs.net/sh_tricks.html](http://www.etalabs.net/sh_tricks.html))
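
A quick demonstration of why the escaping works (filename borrowed from the
example further up the thread):

      touch 'not important_file'
      # sed puts a backslash before every character, so xargs's
      # whitespace and quote parsing can no longer split the name
      find . -name 'not*' | sed 's/./\\&/g' | xargs rm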

------
rbc
Use pipelines with simple filter commands. No thread programming required. Use
ssh to distribute this across nodes securely, assuming the I/O doesn't make
that unworkable.
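
A minimal sketch of that idea (hostnames hypothetical; gzip stands in for any
simple filter):

      split -n 2 input.txt chunk.               # makes chunk.aa, chunk.ab
      ssh node1 'gzip -9' < chunk.aa > out.aa.gz &
      ssh node2 'gzip -9' < chunk.ab > out.ab.gz &
      wait                                       # both remote filters done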

------
rdtsc
The 'make' trick is cool. Used that a couple of times. I just kind of
discovered that on my own. Then someone told me about xargs, so now that is my
default.

Sometimes by playing with make's -j or xargs' --max-procs you discover
interesting bottlenecks. I remember spending time trying to speed up a
script: I used the make trick, it still wasn't running fast enough, and it
turned out disk I/O was the bottleneck.
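
For example (a sketch; the file pattern is illustrative):

      # Raise --max-procs until wall-clock time stops improving; if the
      # CPUs sit idle at that point, disk I/O is likely the bottleneck.
      find . -name '*.log' -print0 | xargs -0 -n 1 --max-procs=4 gzip -9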

~~~
voltagex_
I don't know enough about make to explain, but sometimes -j will break the
build. Some Debian packages are explicitly built without -j enabled.

~~~
rdtsc
Sometimes the dependencies between source files and the things that depend on
them are incorrectly specified. For example, rules may be implicitly
sequential in -j1 mode, but if they are written as

      target: dep1 dep2

then dep1 and dep2 might be executed in parallel when -j > 1.

There are not many make experts (I am not one), so tracking down the issue
can be tricky, which is why I often see -j1 used by default.
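
A minimal sketch of that failure mode (rule and file names hypothetical;
recipe lines must start with a tab):

      target: dep1 dep2
      	cat dep1.out dep2.out > target.out
      
      dep1:
      	date > dep1.out
      
      dep2:
      	# silently reads dep1's output without declaring it
      	tr a-z A-Z < dep1.out > dep2.out

With -j1 this happens to work because make builds dep1 first; with -j2 the
two rules race and dep2 can fail to find dep1.out. The fix is to declare the
real dependency: `dep2: dep1`.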

------
noipv4
I also like the simple option of spawning processes in a bash loop and then
calling wait.

[http://stackoverflow.com/questions/356100/how-to-wait-in-
bas...](http://stackoverflow.com/questions/356100/how-to-wait-in-bash-for-
several-subprocesses-to-finish-and-return-exit-code-0)
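
A minimal sketch of that pattern (file names illustrative):

      # Launch one background job per file, then block until all exit.
      for f in *.txt; do
          gzip -9 "$f" &
      done
      wait   # returns once every child process has finished
      echo "all jobs done"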

------
jchavannes
Here's a bash script for running scripts in parallel (which I use in
production at work).

[https://github.com/jchavannes/procman](https://github.com/jchavannes/procman)

~~~
shmerl
Yes, running jobs in the background and checking the status of the pid comes
to mind when one has to do that in Bash. I always feel it's stretching Bash
beyond its normal level of usage. For example, what can you do with the
output? There is no way to synchronize it unless you capture it in a file and
analyze it later. When I need to write something parallelized, I feel that
Bash isn't the best choice.
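
A minimal sketch of the capture-to-file workaround described above (commands
and input files illustrative):

      # Give each job its own output and error files, inspect after wait.
      sort big1.txt > out1.log 2> err1.log &
      sort big2.txt > out2.log 2> err2.log &
      wait
      cat err1.log err2.log    # analyze the captured output afterwards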

~~~
jchavannes
Yeah, I agree, but there are some use cases for a simple bash script. I
updated the Git repo with a PHP version that handles output and errors like
you said.

------
gegtik
See also GNU Parallel -
[http://www.gnu.org/software/parallel/parallel_tutorial.html](http://www.gnu.org/software/parallel/parallel_tutorial.html)
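
For example, a NUL-safe equivalent of the xargs pipelines above (a sketch):

      # --null (-0) consumes NUL-delimited names; {} is each filename
      find . -name '*.log' -print0 | parallel -0 gzip -9 {}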

------
DrJ
Why doesn't the author just use 'time' to show how long it took, instead of
saying off the cuff that it 'saved X seconds'?
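
For example (a sketch; the pipeline is illustrative):

      # bash's time keyword reports real/user/sys for the whole pipeline
      time (find . -name '*.log' -print0 | xargs -0 -n 1 -P 4 gzip -9)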

------
dang
There have been 14 posts from this site (not counting deleted ones) and most
appear to be unauthorized, uncredited translations from a Russian site,
habrahabr.ru. Some of the Russian texts were themselves translated from
English. See the discussion at
[https://news.ycombinator.com/item?id=7754641](https://news.ycombinator.com/item?id=7754641).

Should we ban kukuruku.co? There's good content (if not good English) in the
articles, but these tactics are sleazy, and it doesn't feel like behavior HN
should encourage.

[https://news.ycombinator.com/item?id=7770909](https://news.ycombinator.com/item?id=7770909)

[https://news.ycombinator.com/item?id=7770887](https://news.ycombinator.com/item?id=7770887)

[https://news.ycombinator.com/item?id=7770727](https://news.ycombinator.com/item?id=7770727)

[https://news.ycombinator.com/item?id=7768391](https://news.ycombinator.com/item?id=7768391)

[https://news.ycombinator.com/item?id=7765671](https://news.ycombinator.com/item?id=7765671)

[https://news.ycombinator.com/item?id=7764938](https://news.ycombinator.com/item?id=7764938)

[https://news.ycombinator.com/item?id=7763187](https://news.ycombinator.com/item?id=7763187)

[https://news.ycombinator.com/item?id=7753792](https://news.ycombinator.com/item?id=7753792)

[https://news.ycombinator.com/item?id=7753166](https://news.ycombinator.com/item?id=7753166)

[https://news.ycombinator.com/item?id=7741937](https://news.ycombinator.com/item?id=7741937)

[https://news.ycombinator.com/item?id=7734202](https://news.ycombinator.com/item?id=7734202)

[https://news.ycombinator.com/item?id=7732049](https://news.ycombinator.com/item?id=7732049)

[https://news.ycombinator.com/item?id=7727016](https://news.ycombinator.com/item?id=7727016)

[https://news.ycombinator.com/item?id=7702566](https://news.ycombinator.com/item?id=7702566)

~~~
sbahra
Please ban them; this is absolutely unacceptable. The original author put
hard work into his writing, and I _still_ don't see any credit in the
majority of their translations. They have published 14 articles in total
without any credit given to the original authors. It's definitely not the
type of news I would expect to be associated with a "hacker". I do not see
any link to the original source at
[http://kukuruku.co/hub/cpp/lock-free-data-structures-
introdu...](http://kukuruku.co/hub/cpp/lock-free-data-structures-introduction)

They are stealing content.

Another user (leephillips) posted the following about their posts: "I went to
the site and left a comment complaining about how they stole your article.
They deleted the comment, and you still aren't credited. Does HN have a
blacklist of sites that steal content or publish only blogspam? I think they
should consider it." \-
[https://news.ycombinator.com/item?id=7754641](https://news.ycombinator.com/item?id=7754641)

~~~
dang
I understand how you feel. But to be fair, the author of that translation has
said (on HN, as well as in a reply to an email I sent him) that he didn't mind
it being used and that it was itself a translation of an English article to
begin with.

