

An interesting performance difference between perl and awk - bsg75
http://libertysys.com.au/blog/an-interesting-performance-difference-between-perl-and-awk

======
dtamhk
From the split man page

[http://perldoc.perl.org/functions/split.html](http://perldoc.perl.org/functions/split.html)

=====

In time-critical applications, it is worthwhile to avoid splitting into more
fields than necessary. Thus, when assigning to a list, if LIMIT is omitted (or
zero), then LIMIT is treated as though it were one larger than the number of
variables in the list; for the following, LIMIT is implicitly 3:

($login, $passwd) = split(/:/);

=====

and by using the a2p converter,

      $ a2p <<< '{ SUM+=$5 } END {printf "%d\n", SUM}'
      
      #!/usr/bin/perl
      eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
          if $running_under_some_shell;
                  # this emulates #! processing on NIH machines.
                  # (remove #! line above if indigestible)
      
      eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
                  # process any FOO=bar switches
      
      while (<>) {
          ($Fld1,$Fld2,$Fld3,$Fld4,$Fld5) = split(' ', $_, -1);
          $SUM += $Fld5;
      }
      
      printf "%d\n", $SUM;

So this behavior is confirmed.
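
For anyone unfamiliar with LIMIT, here is a quick sketch of what the quoted rule means in practice (the passwd-style line is just made up for illustration):

      use strict;
      use warnings;
      
      my $line = "root:x:0:0:root:/root:/bin/bash";   # made-up passwd-style line
      
      # No LIMIT: every colon is a split point, so @all has 7 fields.
      my @all = split /:/, $line;
      
      # LIMIT of 3: splitting stops after two fields and the rest stays joined,
      # so @some is ("root", "x", "0:0:root:/root:/bin/bash").
      my @some = split /:/, $line, 3;
      
      # Assigning to a two-element list gets the same implicit LIMIT of 3,
      # which is the optimisation the docs describe.
      my ($login, $passwd) = split /:/, $line;
      
      printf "%d fields without a limit, %d with limit 3\n", scalar @all, scalar @some;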

------
dtamhk
As for the second case, it was likely due to time spent matching Unicode
digits, per the reasoning in this Stack Overflow post:

[http://stackoverflow.com/questions/16621738/d-is-less-efficient-than-0-9](http://stackoverflow.com/questions/16621738/d-is-less-efficient-than-0-9)
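
If that is the cause, a rough way to see it in the code (not from the post; the sample field is made up) is that a bare \d matches any Unicode digit, while [0-9] or \d under the /a modifier (Perl 5.14+) are ASCII-only:

      use strict;
      use warnings;
      
      my $field = "1234567890";   # made-up sample field
      
      print "matched by \\d\n"          if $field =~ /^\d+$/;     # \d can match any Unicode digit
      print "matched by [0-9]\n"        if $field =~ /^[0-9]+$/;  # ASCII digits only
      print "matched by \\d under /a\n" if $field =~ /^\d+$/a;    # /a restricts \d to ASCII (Perl 5.14+)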

------
bsg75
Submitted to see if the local Perl hackers have a better take on the author's
script.

~~~
draegtun
The Perl script will be slower because it's going through some extra steps in
that code.

So it can be improved. First, here is the baseline (using my own
transaction.list file):

    
    
      $ time awk '{ SUM+=$5 } END {printf "%d\n", SUM}' transaction.list 
      7725580800
      
      real	0m1.419s
      user	0m1.391s
      sys	0m0.027s
    
      $ time perl -e 'my $sum = 0; while (<>) { my @F=split; $sum+=$F[4]} printf"%d\n", $sum' transaction.list
      7725580800
    
      real	0m3.453s
      user	0m3.428s
      sys	0m0.024s
    

So this can be improved upon by removing the need to create that temporary @F
array on each loop...

    
    
      $ time perl -ne '$sum += (split)[4]; END {printf "%d\n", $sum}' transaction.list 
      7725580800
      
      real	0m2.727s
      user	0m2.700s
      sys	0m0.026s
    

Better, but as mentioned by _dtamhk_ elsewhere here, _split_ can be sped up
if you are able to give it a limit:

    
    
      $ time perl -ne '$sum += (split /\s+/, $_, 5)[4]; END {printf "%d\n", $sum}' transaction.list  
      7725580800
    
      real	0m1.657s
      user	0m1.632s
      sys	0m0.023s
    

So now it's pretty much identical to awk's speed. And if you're able to use
_substr_ instead of _split_, then you can make it significantly faster than awk:

    
    
      $ time perl -ne '$sum += substr($_, 30, 12); END {printf "%d\n", $sum}' transaction.list 
      7725580800
    
      real	0m0.368s
      user	0m0.347s
      sys	0m0.021s
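
Worth noting the substr offsets (30, 12) only work because the amount sits at a fixed column position in every line of my file. If your records really are fixed-width, _unpack_ is another way to express the same thing (a sketch only, assuming that same 30/12 layout):

      # Sketch: skip 30 bytes, take the next 12 as the amount field
      $ perl -ne '$sum += unpack "x30 A12", $_; END {printf "%d\n", $sum}' transaction.list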

