
Faster Command Line Tools in D - petercooper
http://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
======
KirinDave
> (Note: For brevity, error handling is largely omitted from programs shown.)

I found this article interesting but to be honest I hate it when people do
this. Usually it is the real-world considerations and error handling that
cause the code to cease being an elegant demo and look more like the same
stuff everyone else writes.

~~~
agumonkey
Yeah, code that only deals with the elegant cases is elegant. News at 20.

Sometimes I wonder if software should handle errors first: once you've bounded
the failure space, you can iterate on the success space as you see fit.

~~~
AceJohnny2
You've just defined Test-Driven-Development.

~~~
KirinDave
Sorta? But unit testing is such a problem-solving methodology that it really
does seem like we want something stronger.

------
gregwebs
If you are interested in fast TSV tooling you might also try xsv [1], which is
written in Rust and has similar features to the linked tsv-utils-dlang
project.

The frequency command computes the frequency of values in 2 columns in < 2
seconds on my machine. It's a different task, but there are some similarities.

    
    
        xsv frequency -n -s 1,2 -l 100 googlebooks-eng-all-1gram-20120701-0.tsv
        5.19s user 0.06s system 351% cpu 1.490 total
    

[1] [https://github.com/BurntSushi/xsv](https://github.com/BurntSushi/xsv)

~~~
acehreli
xsv appears in the comparisons in the author's tsv-utils-dlang repo:
[https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md](https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md)

------
ramses0
This article is trash. They start with "obvious" Python at 12s, run it with
PyPy instead for 3s, and then rewrite and optimize a D version from 3s to 1s
without attempting any further optimization of the Python version(!?!).

In my opinion, omit _all_ of the discussion of Python and just talk about "how
to optimize a D program", b/c that's what this article is.

~~~
qznc
In general, Python is slow (compared to C or whatever) because of excessive
memory allocation and overuse of hash maps.

PyPy probably manages to optimize the hash map/method call lookups for these
small programs, which explains the speedups. Removing memory allocations is
still hard.

The D language provides finer mechanisms to control memory and data
structures. This makes the language larger, but enables you to optimize if it
becomes necessary.
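
For example, one of the optimizations the article walks through is replacing
eager splitting with a lazy range: std.algorithm.splitter yields slices of the
input line, where std.array.split allocates a new array of strings for every
line. A rough sketch of that style (my paraphrase of the approach, not the
article's exact code):

    import std.algorithm : splitter;
    import std.conv : to;
    import std.stdio;

    void main()
    {
        long[string] sums;                     // key -> running sum
        foreach (line; stdin.byLine)           // byLine reuses its buffer
        {
            auto fields = line.splitter('\t'); // lazy: no per-line array
            auto key = fields.front.idup;      // copy before the buffer is reused
            fields.popFront();
            sums[key] += fields.front.to!long;
        }

        long maxSum = long.min;
        string maxKey;
        foreach (key, total; sums)
            if (total > maxSum) { maxKey = key; maxSum = total; }
        writeln("max_key: ", maxKey, " sum: ", maxSum);
    }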

Still, I agree, and I would like to see a Python expert optimize it.

~~~
dom0
> In general, Python is slow (compared to C or whatever) because of excessive
> memory allocation and overuse of hash maps.

Python is a highly dynamic language with an API (towards both Python and C)
that is very invasive. Taken together, these two things make optimizing the
interpreter extremely difficult, because practically all of it can be modified
or introspected. CPython being implemented largely as a hashtable interpreter
is only one facet of its performance.

Perhaps a talk recommendation:
[https://www.youtube.com/watch?v=qCGofLIzX6g&list=PLRdS-n5seL...](https://www.youtube.com/watch?v=qCGofLIzX6g&list=PLRdS-n5seLRqszBqVDF342RMlCWgOTm6q&index=11)

------
mhh__
Is it worth pointing out that the compiler flags for the D program didn't use
ldc's full optimisation settings? Namely: link-time optimisation (not sure
whether that would have helped here), cross-module inlining (possibly the same
story, but I know it inlines parts of the runtime, such as the AA
implementation), and -O3 (as opposed to -O).

------
biomcgary
Just FYI, I tried using a couple of the pre-compiled binaries in bash on
Ubuntu on Windows and got a segmentation fault. Same binaries worked fine in
real Linux.

~~~
voltagex_
I wonder if that's a WSL bug you should report.

Edit: do you mean the tsv utilities? Because they're working fine here on the
Creators Update.

~~~
biomcgary
Yes, the tsv utilities. I haven't upgraded yet.

------
ajbonkoski
I don't get articles like this; they seem to miss the bigger point.

Typically there are two modes in my computing: (1) Scripting / Command-line
get stuff done and throw-away, and (2) Serious applications that are heavily
used, need to process lots of data and be as fast as possible (e.g. processing
millions of files like this one where the algorithm constant factor really
matters).

In case 1: hack something together with shell or Python and get an answer. If
it takes 100-1000x the time of the equivalent C program, then fine.

In case 2: Custom special purpose C or C++ code

I really don't understand the middle ground here. The equivalent C version
that does the same job runs in ~250 ms on my slow Yoga 2 Pro laptop, in a
total of 83 lines of pure C (no other libraries).

Is it "elegant"? Depends who you ask.. But, at the end of the day, "elegant"
doesn't pay the bills..

~~~
bionsuba
This is a nice sentiment, but in my experience it rarely plays out that way in
practice.

People use Python all the time to manipulate data sets >= 100G in size despite
its speed failings at that size. Why? Because Pandas is just so damn
convenient. It would take me a grand total of 30 seconds to write Pandas code
that read a TSV and gave me the sum of two columns multiplied together,
grouped by the day of a timestamp column. Doing that in C would take several
orders of magnitude more time.

It's a matter of optimizing people's time. You could probably spend several
hours (or days) writing a C program for a specific problem. But if you can
spend only 40% of the time writing the program and have it run only 20%
slower, then that's a definite win (these numbers are just an example).

------
maxpert
Makes me wonder how many people actually use D in production. Specifically,
with an HTTP stack, what kind of numbers/performance benchmarks are we looking
at? Other than toy projects, is there a company out there running D at massive
scale (millions per day)?

~~~
maxxxxx
D would make a good case study of why some technologies find widespread use
while others don't. I don't understand why it never took over from C++.

~~~
milcron
D's standard library uses garbage collection.

~~~
destructionator
So do Java, C#, Python, Ruby, PHP, Javascript, and virtually everything else
and they are very heavily used. Garbage collection is a smashing success in
the real world and D made the right decision to follow that success.

Of course, it is also true that much of the standard library doesn't actually
use it... but these objections are never actually about facts.

~~~
pixel_fcker
None of those languages you listed are for systems programming, which is the
one niche you need to excel at if you want to replace C and C++. People who
care about that stuff tend to care a lot about managing memory.

~~~
dr_zoidberg
D's GC can be turned off to do systems programming.
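
Concretely, a minimal sketch of the two usual mechanisms (my own example,
nothing from the article):

    import core.memory : GC;

    // @nogc is enforced at compile time: anything in here that would
    // allocate from the GC (array appends, closures, ...) is an error
    void hotPath() @nogc nothrow
    {
        // allocation-free work goes here
    }

    void main()
    {
        GC.disable();  // suspend automatic collections; allocation still works
        hotPath();
        GC.enable();
        GC.collect();  // run a collection at a point you choose
    }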

~~~
milcron
How much of the standard library can you use in that mode?

------
chickenfries
I'd be interested in the perspective of someone who has used both Go and D for
command-line tools. Go is my current tool for this, but D seems like a much
nicer, less dogmatic language.

~~~
mhh__
The answer is fairly obvious: D is a nicer language, by design. Go isn't meant
to be particularly impressive on the language design/innovation front.

~~~
penpapersw
Honest question: what strengths does Go have over D? I became very proficient
at Go several years ago, and was reading about D for hours and hours last
night, and it looks like D is overall a much better language.

~~~
chickenfries
This is sort of what I'm wondering. Are the compile times as good as Go? Is
the GC as good as Go, or do you end up having to do manual management to get
similar performance to Go? What is the build tool ecosystem like (I found Go's
to be the one part of Go that was not easy to pick up quickly)?

~~~
Mister_Snuggles
I've dabbled in D occasionally and the compile times are really fast. They're
fast enough that there's even a companion tool (rdmd) that lets you run D
programs as if they were scripts.

See [https://dlang.org/rdmd.html](https://dlang.org/rdmd.html)
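
For example, the shebang usage from that page (rdmd compiles on the first run
and caches the binary, so subsequent runs start almost instantly):

    #!/usr/bin/env rdmd
    // hello.d: mark it executable with chmod +x, then run ./hello.d directly
    import std.stdio;

    void main()
    {
        writeln("Hello from a D script!");
    }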

~~~
chickenfries
This is awesome, thanks for the link. This is where I'll start when I have a
good project to try out D.

------
systems
Well, I don't think there was much doubt that D was faster than Python.

I am, though, impressed with how fast PyPy did.

~~~
e12e
I tried the Python programs under PyPy and Python 3 (after running them
through 2to3) -- and got speeds similar to the author's. I was a little
surprised that my little awk script was slower than PyPy (completing in 6 to 7
seconds):

    
    
      $ cat sum.awk
      { a[$2] += $3 }
      END {
        for (i in a) {
          if (a[i] > max) {
            maxk = i
            max=a[i]
          }
        }
        print "max_key:", maxk, "sum:", max
      }
    
      $ time gawk -f sum.awk -O <ngrams.tsv
      max_key: 2006 sum: 22569013
    
      real    0m7.041s
      user    0m3.797s
      sys     0m3.156s
    

(This is under the Windows Subsystem for Linux.)

According to the gawk profiler and strace -c (count), the awk program mainly
spends its time reading the file (without the loop at the end looking for the
max value, the runtime is essentially the same).

In fact, on the surface, PyPy and Python 3 are quite similar on the
syscall/strace front, with roughly 23k read calls -- awk did 375k. And adding
cat in front sped it up by about two seconds:

    
    
      $ time (cat ngrams.tsv |awk -f sum.awk )
      max_key: 2006 sum: 22569013
    
      real    0m3.969s
      user    0m3.719s
      sys     0m0.516s
    
      $ time awk -f sum.awk ngrams.tsv
      max_key: 2006 sum: 22569013
    
      real    0m6.465s
      user    0m3.609s
      sys     0m2.859s

~~~
e12e
Out of curiosity (at the risk of hijacking this thread into a Stack Overflow
discussion), I changed the awk code so that it could be used with GNU parallel
- the "reduce" step is essentially the same program as before:

    
    
      (cat ngrams.tsv \
        |parallel --pipe awk -f map.awk \
        |awk -f reduce.awk )
      max_key: 2006 sum: 22569013
    

This now runs in 17 to 18 seconds... :-/

    
    
      $ cat map.awk
      { a[$2] += $3 }
      END {
        for (i in a) {
          print "ignore", i, a[i]
        }
      }
    
      $ cat reduce.awk
      { a[$2] += $3 }
      END {
        for (i in a) {
          if (a[i] > max) {
            maxk = i
            max=a[i]
          }
        }
        print "max_key:", maxk, "sum:", max
      }
    

[ed: However, there are faster awks than gawk:

    
    
      $ time mawk -f sum.awk ngrams.tsv
      max_key: 2006 sum: 22569013
    
      real    0m2.826s
      user    0m2.391s
      sys     0m0.422s
    

mawk is (a little) faster than pypy on my machine.

]

~~~
ole_tange
--pipe is well known for being slow.

Try --pipe-part instead:

    
    
        parallel -a ngrams.tsv --pipe-part --block -1 awk -f map.awk |
          awk -f reduce.awk

~~~
e12e
Thanks for the tip, always nice to see the author of a tool commenting on HN
:-)

The (old) version of parallel packaged with Ubuntu 16.04 (on the Windows
Subsystem for Linux) doesn't have --pipe-part, but running the upstream
version, the speed is more reasonable:

    
    
      $ time (./parallel-20170522/src/parallel -a ngrams.tsv \
        --pipe-part --block -1 -j4 mawk -f map.awk \
        | mawk -f reduce.awk )
      max_key: 2006 sum: 22569013
    
      real    0m2.265s
      user    0m4.672s
      sys     0m1.672s
    

(Tried a few variants with/without -jN -- and this seems typical for the fast
end of the spectrum).

    
    
      $ time (cat ngrams.tsv \
         | mawk -f map.awk \
         | mawk -f reduce.awk )
      max_key: 2006 sum: 22569013
    
      real    0m3.472s
      user    0m2.891s
      sys     0m2.406s
    

[ed: btw, did a double-take when I saw your GNU Privacy Guard ID: 0x88888888
:-) ]

~~~
ole_tange
--line-buffer may or may not give an additional speedup.

------
gwu78
"The task is to sum the values for each key and print the key with the largest
sum."

What is the smart way to do this in kdb+?

This is my naive, sloppy 15-minute approach.

Warning: Noob. May offend experienced k programmers.

    
    
       k)`t insert+:`k`v!("CI";"\t")0:`:tsvfile
       k)f:{select (*:k),(sum v) from t where k=x}
       k)a:f["A"]
       k)b:f["B"]
       k)c:f["C"]
       k)select k from a,b,c where v=(max v)

~~~
qesa
Using the file from the original,

    
    
        1#desc sum each group (!/) (" II";"\t") 0: `:tsvfile
    

Took about 3 seconds, 2.5 of which were spent reading the file.

EDIT:

    
    
        q)\ts d: (!/) (" II";"\t") 0: `:tsvfile
        2489 134218576
        q)\ts 1#desc sum each group d
        486 253055104

~~~
gwu78
I was using the first example with a char in the first column.

    
    
       A 4
       B 5
       B 8
       C 9
       A 6
    

How to solve with only a dict?

Regarding the 1gram file at
[https://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-1gram-20120701-0.gz](https://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-1gram-20120701-0.gz)

This is the result I got

    
    
       3| 1742563279
    

using

    
    
       q)\ts d:(!/)(" II";"\t")0:`:1gram
       q)\ts 1#desc sum each group d
       1897 134218176
       371 238872864
    

or

    
    
       k)\ts d:(!/)(" II";"\t")0:`:1gram
       k)\ts desc:{$[99h=@x;(!x)[i]!r i:>r:. x;0h>@x;'`rank;x@>x]}
       k)\ts 1#desc (sum'=:d)
       1897 134218176
       0 3152
       372 238872864
    

No doubt I must be doing some things wrong.

~~~
qesa
I actually had it wrong in mine. I wasn't paying attention and had the
dictionary the wrong way around. It probably would have been more obvious with
the char, since you can't sum them...

With reverse thrown in to switch the key/value around, we get the correct
answer:

    
    
        q) 1#desc sum each group (!/) reverse (" II";"\t")0:`:1gram
        2006| 22569013
    

or

    
    
        k) {(&x=|/x)#x}@+/'=:!/|(" II";"\t")0:`:1gram
        (,2006i)!,22569013i
    

Works the same for the simple example

    
    
        k)e: 4 5 8 9 6!"ABBCA"
        k){(&x=|/x)#x}@+/'=:e
        (,"B")!,13

------
mikulas_florek
Does anybody here have "D version 5" compiled? I would like to compare it with
my naive C++ version.

------
faragon
TL;DR: D is faster than Python.

~~~
gshulegaard
Is it though? Unoptimized Python vs. a D program iterated on five times?

Certainly eye-catching, but I wouldn't call it conclusive.

~~~
throwaway7645
No, it isn't debatable which is generally faster (D), but I was impressed that
you could just prototype in Python (probably significantly faster than in D),
run it through PyPy when you're done, and get 90% of the performance of D. Of
course, Python has the bloat of the interpreter and JIT, while D is just a
binary. The point is, I was expecting more speed from D. I'm curious whether
this is just luck or whether the two would be neck and neck across a range of
tests.

~~~
gshulegaard
Optimizing/squeezing performance out of Python is a rabbit hole:

[https://www.ibm.com/developerworks/community/blogs/jfp/entry...](https://www.ibm.com/developerworks/community/blogs/jfp/entry/A_Comparison_Of_C_Julia_Python_Numba_Cython_Scipy_and_BLAS_on_LU_Factorization?lang=en)

I would speculate that using Numba or Cython would yield further performance
gains over PyPy... but that's mostly just based on anecdotal comparisons:

[https://cardinalpeak.com/blog/faster-python-with-cython-and-pypy-part-2/](https://cardinalpeak.com/blog/faster-python-with-cython-and-pypy-part-2/)

I just think it's a bit dishonest to make a claim as pointed as this article's
in 2017 while stopping at simply running an unoptimized CPython script under
PyPy.

~~~
throwaway7645
I think they're just giving you bounds on what to expect, not selling
anything. The D optimizations looked a lot easier (write it slightly
differently) than mucking with Cython or Numba. Simply running through PyPy is
another thing altogether.

~~~
gshulegaard
I don't know; a lot of benefit can be had from Cython just by declaring types
and flagging the code for compilation:

[http://cython.readthedocs.io/en/latest/src/tutorial/cython_t...](http://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html)

[http://cython.readthedocs.io/en/latest/src/quickstart/cython...](http://cython.readthedocs.io/en/latest/src/quickstart/cythonize.html)

But that is just my opinion.

