
GNU Coreutils Gotchas - signa11
http://www.pixelbeat.org/docs/coreutils-gotchas.html
======
justinsaccount
oooh, can we talk about the broken BSD utils on OS X next?

    
    
      $ du -hs big.log
      199M	big.log
      $ cat big.log | time tr 'a' 'b' | md5
             27.22 real        23.70 user         0.43 sys
      c171314106134f6fde035b11f4354464
      $ cat big.log | time gtr 'a' 'b' | md5
              1.05 real         0.30 user         0.23 sys
      c171314106134f6fde035b11f4354464

~~~
pixelbeat
Yes, GNU spends a lot of time improving performance. A couple of examples from
the most recent release, which you might think were too simple to optimize
significantly:

The yes command (which is generally useful for generating repetitive text):

    
    
        $ yes-old | pv > /dev/null ^C
        ... 55.8MiB/s ...
        $ yes-new | pv > /dev/null ^C
        ... 3.44GiB/s ...
    

Details on that fairly simple change are at
[http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h...](http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=35217221)

Also we more than doubled the speed of wc -l (by avoiding function call
overhead):

    
    
        $ yes | pv | wc-old -l ^C
        ... 230MiB/s ...
        $ yes | pv | wc-new -l ^C
        ... 558MiB/s ...
    

We now also generate an infinite stream of integers more efficiently:

    
    
        $ seq-old inf | pv > /dev/null ^C
        ... 13.3MiB/s ...
        $ seq-new inf | pv > /dev/null ^C
        ... 497MiB/s ...

~~~
justinsaccount
Nice!

------
feld
Just tested on FreeBSD and the chmod -R 644 works correctly. Weird that it's
broken in GNU coreutils.

~~~
jordigh
Hm, it should probably first recur then remove permissions, right?

At any rate, removing list permissions from directories is probably not what
you want to do.
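
A minimal sketch of one way to avoid stripping search permission from directories, assuming the goal is "644 for files, 755 for directories". The helper name `fix_tree_perms` is hypothetical and not from the thread; it relies on the standard symbolic mode `X`, which adds execute/search only to directories and to files that already have an execute bit.

```shell
# Hypothetical helper (not from the thread): reset a tree using symbolic modes.
# Capital X grants execute/search only to directories (and to files that
# already have some execute bit), so directories stay traversable.
fix_tree_perms() {
    chmod -R u=rwX,go=rX -- "$1"
}

# Demo on a throwaway tree.
demo=$(mktemp -d)
mkdir "$demo/sub"
: > "$demo/sub/file.txt"
chmod 700 "$demo/sub"
chmod 600 "$demo/sub/file.txt"
fix_tree_perms "$demo"
ls -ld "$demo/sub" "$demo/sub/file.txt"
rm -rf "$demo"
```

Note that files which were already executable keep their execute bits under this approach, so it is not an exact equivalent of a per-file `chmod 644`.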

------
kodis
The chmod gotcha that I hit occasionally when I haven't used the chmod command
for a while is the difference between "chmod -R <perm> <dir>" which recurses
as would be expected and "chmod -r <perm> <dir>" which interprets "-r" as the
<perm> spec, complains that the file <perm> doesn't exist, then removes read
permission from <dir>.
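
The behaviour described above can be reproduced with a small sketch. It assumes GNU chmod (other implementations may parse `-r` differently), and the function name `mode_after_lowercase_r` is made up for illustration. The umask is pinned to 022 because a mode without an explicit "who" part, such as `-r`, leaves alone any bits that are set in the umask.

```shell
# Assumes GNU chmod; the function name is hypothetical.
# Runs in a subshell so the umask and cd do not leak into the caller.
mode_after_lowercase_r() (
    umask 022
    tmp=$(mktemp -d) || exit 1
    cd "$tmp" || exit 1
    mkdir dir
    chmod 755 dir
    # Lowercase -r is taken as the mode "-r"; "644" is then treated as a
    # file name, which does not exist, so chmod reports an error for it.
    chmod -r 644 dir 2>/dev/null
    # Print the permission string of dir after the mistaken command.
    ls -ld dir | cut -c1-10
    chmod 755 dir            # restore access before cleaning up
    cd / && rm -rf "$tmp"
)

mode_after_lowercase_r
```

With the uppercase `-R` the same arguments would instead apply mode 644 recursively to `dir`.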

------
lmm
Why pipe into xargs when you can just use find -exec?

~~~
aaptel
xargs can consume multiple files at once. I'd say fewer fork/exec calls, better
performance.

~~~
lmm
find -exec can do multiple files at once too (use + instead of ;).
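
To make the difference concrete, here is a rough sketch that counts how many times the command is launched under each form. The helper names are hypothetical; `-print0` and `xargs -0` are assumed to be available, which is the case for the GNU and BSD tools.

```shell
# Hypothetical helpers: each prints how many times the command was started.
runs_with_semicolon() {
    # ";" starts one process per file found
    find "$1" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' '
}
runs_with_plus() {
    # "+" appends as many file names as fit onto one command line
    find "$1" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' '
}
runs_with_xargs() {
    # xargs batches in a similar way; -print0/-0 keep odd file names intact
    find "$1" -type f -print0 | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' '
}

demo=$(mktemp -d)
: > "$demo/a"; : > "$demo/b"; : > "$demo/c"
echo "semicolon: $(runs_with_semicolon "$demo")"
echo "plus:      $(runs_with_plus "$demo")"
echo "xargs:     $(runs_with_xargs "$demo")"
rm -rf "$demo"
```

For a handful of files both batching forms start the command once; with very long file lists either may split the work across several invocations.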

------
dvh
for me, it's details like this:

    
    
      $ basename $PWD
      etc
      $ echo $PWD | basename
      <this prints nothing>
    

I even reported it a decade ago but nobody cares.

~~~
dhamidi
Why should basename accept its argument on stdin? It doesn't act as a filter,
processing a stream of data, but rather as a plain function.

~~~
protomyth
> Why should basename accept its argument on stdin?

Because a lot of people have a use for a filter version.

> It doesn't act as a filter, processing a stream of data, but rather as a
> plain function.

That's the problem with a lot of commands: some work as filters, some as simple
commands, and others do both. The reasoning behind the decision is confusing.

[edit: yes, combining with xargs will often work]
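
For anyone who wants the filter form, a small wrapper is one option. This is only a sketch; `basename_filter` is a made-up name, and it reads one path per line so that paths containing spaces survive, which plain `xargs basename` would not guarantee.

```shell
# Hypothetical wrapper: apply basename to every line read from stdin.
basename_filter() {
    while IFS= read -r path; do
        # "--" guards against paths that begin with a dash
        basename -- "$path"
    done
}

# Usage: behaves like the pipeline from the parent comment was expected to.
printf '%s\n' "$PWD" | basename_filter
```

GNU basename also has `-a` to accept several operands in one call, which pairs with xargs when the input is well behaved.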

