Hacker News
Faster Command Line Tools in D (dlang.org)
123 points by petercooper on May 24, 2017 | 94 comments

> (Note: For brevity, error handling is largely omitted from programs shown.)

I found this article interesting but to be honest I hate it when people do this. Usually it is the real-world considerations and error handling that cause the code to cease being an elegant demo and look more like the same stuff everyone else writes.

Well, the real world code is also linked at the start of the article: https://github.com/eBay/tsv-utils-dlang

Yeah, code only dealing with elegant cases is elegant. News at 20.

Sometimes I wonder if software should be designed error-first: once you've bounded the failure space, you iterate on the success space as you see fit.

You've just defined Test-Driven-Development.

Sorta? But unit testing is such a problem-solving methodology that it really does seem like we want something stronger.

TDD is informal, I was thinking about something a bit more mathematical. It's hard to reason about spaces with TDD.

In a way, isn't this how software works? The problem being that an unaccounted-for error might make your program exit, or produce an incorrect result, etc. I like your idea conceptually but am struggling to imagine what working on software would look like in a practical sense.

Given that one normally throws an exception when encountering a runtime error in D, it often is not necessary to have specific runtime error handling code.

If you are interested in fast TSV tooling you might also try xsv [1], which is written in Rust and has similar features to the linked tsv-utils-dlang project.

The frequency command computes the frequency of values in 2 columns in < 2 seconds on my machine. It's a different task, but there are some similarities.

    xsv frequency -n -s 1,2 -l 100 googlebooks-eng-all-1gram-20120701-0.tsv
    5.19s user 0.06s system 351% cpu 1.490 total
[1] https://github.com/BurntSushi/xsv

xsv appears in comparisons on the author's tsv-utils-dlang repo: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Per...

This article is trash. They start with "obvious" Python at 12s, run it with PyPy instead for 3s, and then rewrite and optimize a D version from 3s to 1s without attempting any further optimization of the Python version(!?!).

In my opinion, they should omit all of the discussion of Python and just talk about "how to optimize a D program", because that's what this article is.

In general, Python is slow (compared to C or whatever) because of excessive memory allocation and overuse of hash maps.

PyPy probably manages to optimize the hash map/method call lookups for these small programs, which explains the speedups. Removing memory allocations is still hard.

The D language provides finer mechanisms to control memory and data structures. This makes the language larger, but enables you to optimize if it becomes necessary.

Still, I agree, and I would like to see a Python expert optimize it.

> In general, Python is slow (compared to C or whatever) because of excessive memory allocation and overuse of hash maps.

Python is a highly dynamic language with an API (towards both Python and C) that is very invasive. These two things, taken together, make optimizing the interpreter extremely difficult, because practically all of it can be modified or introspected. CPython being implemented largely as a hashtable interpreter is only one facet of its performance.

Perhaps a talk recommendation: https://www.youtube.com/watch?v=qCGofLIzX6g&list=PLRdS-n5seL...

The article wasn't really an attempt to show how D is "faster" or "better" than Python. I think the author was trying to baseline code size and relevance to the problem by illustrating how it compares to a typical Python solution.

I think this repository gives more context as to why they might have shown some Python code:


They basically explored new languages to rewrite some Perl scripts in and liked D enough to shift over. They have other tooling in other languages; my guess is they'll unify a good amount of it in D. Disclaimer: this is based on my own assumption that they like D so much they want to use it everywhere. Seeing as he wrote the TSV utilities, it wouldn't surprise me if he wants to rewrite all the in-house tooling he uses in D, as the repository states.

You're right... naive PyPy and D are in the same ballpark. Then they only optimized the D version.

I was excited by D's performance before I realized it is barely faster than PyPy. There's almost no point unless it saves in other ways, like concurrency and parallelism?

This problem is not designed to compare language speed. You can get C speeds (and sometimes better) using D.

Not to mention using .split(delim) instead of the proper CSV parsing library that ships with Python.
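For reference, here's a minimal sketch of the csv-module variant (the sample data and column layout are made up; assume tab-separated rows with the key in column 1 and the value in column 2, as in the article's task):

```python
import csv
import io

# Toy stand-in for the article's TSV data (assumed layout: word, key, value).
# The csv module handles quoted fields that contain tabs, which a plain
# line.split('\t') would break apart.
data = 'a\t2006\t4\n"b\tc"\t2006\t5\n'

sums = {}
for row in csv.reader(io.StringIO(data), delimiter='\t'):
    key, value = row[1], int(row[2])
    sums[key] = sums.get(key, 0) + value

print(sums)  # the quoted field containing a tab survives as a single column
```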

Out of curiosity, I gave it a shot. I came out roughly 20% faster using python's inbuilt csv library.

When I switched to PyPy, the csv library actually made it nearly 2x slower than PyPy using .split(delim).

Of course it's faster. Sorry, I wasn't clear. But good luck handling a tab in a quoted field, then.

And you were using the pure Python CSV library, not the C one?

I used the stdlib one, just straight "import csv"

How can you tell?

Is it worth pointing out that the compiler flags for the D program didn't use LDC's full optimisation settings? Namely, link-time optimisation (not sure if that would have helped here or not), cross-module inlining (possibly the same as before, but I know this inlines parts of the runtime, e.g. the AA implementation), and -O3 (as opposed to -O).

Just FYI, I tried using a couple of the pre-compiled binaries in bash on Ubuntu on Windows and got a segmentation fault. Same binaries worked fine in real Linux.

I wonder if that's a WSL bug you should report.

Edit: do you mean the tsv utilities? Because they're working fine here on the Creator's Update.

Yes, the tsv utilities. I haven't upgraded yet.

I don't get articles like this.. they seem to miss the bigger point.

Typically there are two modes in my computing: (1) Scripting / Command-line get stuff done and throw-away, and (2) Serious applications that are heavily used, need to process lots of data and be as fast as possible (e.g. processing millions of files like this one where the algorithm constant factor really matters).

In case 1: Hack something together with shell or Python and get an answer. If it takes 100-1000x the time of the equivalent C program, then fine.

In case 2: Custom special purpose C or C++ code

I really don't understand the middle ground here. The equivalent C version that does the same job runs in ~250 ms on my slow Yoga 2 Pro laptop. Total line count: 83 lines of pure C (no other libraries).

Is it "elegant"? Depends who you ask.. But, at the end of the day, "elegant" doesn't pay the bills..

This is a nice sentiment, but this rarely plays out in practice in my experience.

People use Python all the time to manipulate data sets >= 100G in size despite its speed failings at that size. Why? Because Pandas is just so damn convenient. It would take me a grand total of 30 seconds to write Pandas code which read a TSV and gave me the sum of two multiplied together columns grouped by the day of a timestamp column. Doing that in C would take several orders of magnitude more time.
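As a rough illustration (a sketch with made-up column names and inline data, not taken from the article), that 30-second Pandas version might look like:

```python
import io

import pandas as pd

# Hypothetical TSV with a timestamp column and two numeric columns.
tsv = (
    "ts\tprice\tqty\n"
    "2017-05-24 10:00\t2.0\t3\n"
    "2017-05-24 18:00\t1.5\t2\n"
    "2017-05-25 09:00\t4.0\t1\n"
)

df = pd.read_csv(io.StringIO(tsv), sep="\t", parse_dates=["ts"])

# Sum of price * qty, grouped by the day of the timestamp column.
daily = (
    df.assign(total=df["price"] * df["qty"])
      .groupby(df["ts"].dt.date)["total"]
      .sum()
)
print(daily)
```

For a real 100G file you'd pass the file path instead of io.StringIO and probably read in chunks, but the shape of the code stays the same.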

It's an optimization of people's time problem. You could probably spend several hours (or days) writing a C program for a specific problem. But if you can spend only 40% of the time writing the program and have it only 20% slower, then that's a definite win (these numbers are just an example).

It's one of the points of this article that you don't need those two modes with D. Scripting and custom code in the same language is possible but it's still an alien concept.

The point here is that you can write both the python and the C version in the same language. Your C version, excluding the preprocessor, might already compile as a D program anyway.

I'd expect a significant amount of the shell code I cobble together to run faster than anything I could write in a more advanced programming language.

Makes me wonder, how many people actually use D-lang in production. Specifically with HTTP stack what kind of numbers/performance benchmarks we are looking at? Other than toy projects is there a company out there running D on massive scale (millions per day)?

They have a page here: http://dlang.org/orgs-using-d.html

Based on that I would look at Funatics since they seem to have the "bigger" traffic, or even AdRoll...



To be fair, a lot of the companies there look like they handle heavier loads. Also, garbage collection is optional, and there are alternatives to D's standard library that others have made that are probably usable without the GC. Some people have done successful embedded systems programming without the GC; I remember one guy talking about it on the D IRC channel.

Netflix and Machine Learning with D.

There was a similar comment (https://news.ycombinator.com/item?id=14064012) from someone who also seems to work on machine learning with D at Netflix. It would be a great help if you could add it to https://dlang.org/orgs-using-d.html, or share the experience and the scale at which it's being used at Netflix, if that's not a problem. Thanks.

D would make a good case study to look at why some technologies find widespread use whereas others don't. I don't understand why it never took over from C++.

D's standard library uses garbage collection.

So do Java, C#, Python, Ruby, PHP, Javascript, and virtually everything else and they are very heavily used. Garbage collection is a smashing success in the real world and D made the right decision to follow that success.

Of course, it is also true that much of the standard library doesn't actually use it... but these objections are never actually about facts.

None of those languages you listed are for systems programming, which is the one niche you need to excel at if you want to replace C and C++. People who care about that stuff tend to care a lot about managing memory.

D's GC can be turned off to do systems programming.

How much of the standard library can you use in that mode?

> So do Java, C#, Python, Ruby, PHP, Javascript, and virtually everything else and they are very heavily used.

That's the point. There are already many popular languages with GC. Why would people switch from C++ (this was the original question) to D instead of one of those much more popular languages that you mentioned?

C++ was deliberately backwards compatible with C. D was not backwards compatible with C or C++.

D uses C calling conventions and struct layout, so it's pretty trivial to call D code from C (or vice versa) and link them together; the only step added over C++ is writing an include file with the declarations of the stuff you want to call.

I'd be interested in the perspective of someone who has used both Go and D for command line tools. Go is my current tool for this, but D seems like a much nicer, less dogmatic language.

The answer is fairly obvious: dlang is a nicer language, by design. Go isn't meant to be particularly impressive on the language design/innovation front.

Allow me to defend the honor of Go a bit here!

It's not to everyone's taste, but it was designed by people who've been programming for a long time to be a language they'd like to program in. It is my favorite language for many tasks (and I've been coding for a couple decades now).

For me what's impressive about it is it's very simple, reasonably expressive, and just a really well-designed cohesive whole that doesn't usually expose sharp edges.

The longer I program the less I care about fancy things or being elegant or writing the smallest possible code, and the more I care about eliminating bullshit problems and wtf moments. Go's good at that.

Honest question: what strengths does Go have over D? I became very proficient at Go several years ago, and was reading about D for hours and hours last night, and it looks like D is overall a much better language.

The other strength that Go has over D is that Go, being used by Googlers, is much more likely to have better networking support. Two examples come to mind: HTTP/2 support (Go was used for either the first or second implementation of that and has standard library support), and TLS cryptography, where Go has its own suite written by experts, whereas a good portion of the non-Microsoft ecosystem relies on OpenSSL. For web/app servers, CLI tools, and networking stuff, I think Go has a strong set of options. Go also has a nice cross-platform story for that subset of programs (networking and CLI tools).

That set of good internet sensibilities is what attracted me to Go in the first place, and why I still like working in it in my spare time. If D has a similarly strong story in that area, I'd love to know about it.

In other words, it's because Go has the backing of a huge software company, while D doesn't.

That is part of what allows Go to have those advantages, but it is those advantages that give Go an edge, not the corporate backing directly, in my opinion.

There is Vibe.d, but I don't know if it is as good as Go's stdlib.

I can only give my own biased opinion: Go's only advantage is its commercial adoption, e.g. built-in support in App Engine and co. I am not referring to its ecosystem: while gofix and friends are very good, D has similar tools available, plus the usual bells and whistles for IDEs.

By chance, I found (while looking for a Rob Pike quote) this article critiquing Go's design; it uses D to demonstrate its arguments. http://nomad.so/2015/03/why-gos-design-is-a-disservice-to-in...

The first code comparison in that article compares Go error handling with doing a `catch Exception` in D. I don't find that a fair comparison; `catch Exception` for a block of code that can fail in different ways at different points is a bad practice.

It depends. If you don't care about the specific error and just want to propagate the failure up the stack, or recover, it's perfectly fine. A lot of Go code with the same intent just does a lot of "if err != nil { return err }", which is the same thing but written for every function call.
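To illustrate the point in Python terms (a toy sketch, not from the linked article): a blanket handler collapses distinct failure modes into one branch, while specific handlers keep them apart.

```python
def lookup_and_convert(table, key):
    raw = table[key]     # may raise KeyError (missing column)
    return int(raw)      # may raise ValueError (non-numeric value)

table = {"year": "2006"}
messages = []

# Blanket handler: every failure mode falls into the same branch.
try:
    lookup_and_convert(table, "missing")
except Exception as e:
    messages.append("something went wrong: " + type(e).__name__)

# Specific handlers keep the failure modes distinguishable.
try:
    value = lookup_and_convert(table, "year")
except KeyError:
    messages.append("no such column")
except ValueError:
    messages.append("column is not an integer")
else:
    messages.append("ok: %d" % value)

print("\n".join(messages))
```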

Simplicity and consistency is the biggest strength of Go. D is a complicated language with a ton of features. That makes it harder to learn, and it makes it more likely that you'll encounter "cleverness" in other people's code.

As a tiny nit, it's interesting to compare D's `out` keyword to Go's multiple return values. An `out` keyword seems like a prime example of "thinking in blub."

I find that kind of ironic. I know there's a lot of people that love Go and it's a solid language, but I personally see it as the ultimate blub language.

That's fair. Go was designed from the start to be pragmatic rather than innovative. It is a refinement, rather than an enhancement, of its predecessors. And I definitely appreciate that when working with other people's code: it's unsurprising. Still, there is a gradient to "blub-ness". I think Go does a marvelous job of walking the line between discarding old conventions (like 'out' parameters) while remaining at the low level of abstraction conducive to writing fast code (e.g. unboxed types).

I haven't actually used Go, but one of its big strengths seems to be that you get a binary with no dependencies (beyond libc I believe) at the end of it. You can just deploy it through a file copy and have it just work.

So can you with D.

With D, in addition, you can link against C shared libraries (dynamic linking) AND you can write standalone libraries in D that even C can link against (see Mir/betterC).

AFAIK, you can do neither of these in Go.

Well, Go will require compilation just like C++. The strength is its build tooling seems better integrated with the language implementation. C++ has make and friends but honestly, I prefer Go's. This is all history. Go is designed from the ground up with pitfalls of other languages in mind. Rust is obviously even better at this, with Golang as a case study.

For production you want to omit things like debug info from the binary so the binary is smaller. You have to play with the build flags. I can get a simple TCP server in Go in just a dozen lines, but the binary is around 5MB before optimization.

[1]: https://blog.filippo.io/shrink-your-go-binaries-with-this-on...

Golang: Easy concurrency using go-routines, a (comparatively) good minimal latency GC, an excellent comprehensive and stable standard library and few bugs.

Dlang: Traditional concurrency (thread based), great generics implementation, nice algorithm/container libraries and C++ interop

Actually, the standard library's containers (in D) are fairly poor/ignored. Also, concurrency in D is not traditional in the way you seem to imply: immutability and message passing are the recommended way to do it, and fibers are also in the standard library.

Actually, D supports "fibers", where you have a "Task". You can then switch the task handler to another implementation and have Go-like M:N coroutines. The default handlers in the stdlib are only thread-based or single-thread coroutines, but Vibe.d, for example, supports Go-like coroutines.

Still, Go is really hard to beat on that front because it makes it super easy and has the really brilliant feature of making almost everything that waits for IO interruptible inside a function when you call "go function()", without having to mark operations with "async".

This is sort of what I'm wondering. Are the compile times as good as Go? Is the GC as good as Go, or do you end up having to do manual management to get similar performance to Go? What is the build tool ecosystem like (I found Go's to be the one part of Go that was not easy to pick up quickly)?

D can be arbitrarily slow to compile due to its metaprogramming features. The comparable feature for Go would be some code generation integrated into the build process. This difference makes it hard to compare.

For simple (non-metaprogramming) code, DMD and Go are in the same league with respect to compilation speed.

Btw, compilation speed is the main reason for me to put off Rust, C++, and Scala. I cannot stand slow compilers anymore after using D for a while.

D compile times are great for the reference compiler and ok for the other two. The GC is a bit of a mess, but still works well enough for most things thrown at it. D allows "manual management" C or C++ style (#include <memory>).

D's go-to build system is called dub; it also handles packages hosted on dlang.org. I know of no defects or problems with it; it's pretty good.

I've dabbled in D occasionally and the compile times are really fast. They're fast enough that there's even a tool (rdmd) that basically lets you use D like a scripting language.

See https://dlang.org/rdmd.html

This is awesome, thanks for the link. This is where I'll start when I have a good project to try out D.

Well, I don't think there was much doubt that D was faster than Python.

I am, though, impressed with how fast PyPy did.

I tried the Python programs under PyPy and Python 3 (after running through 2to3) and got similar speeds as the author. I was a little surprised that my little awk script was slower than PyPy (completing in 6 to 7 seconds):

  $ cat sum.awk
  { a[$2] += $3 }
  END {
    for (i in a) {
      if (a[i] > max) {
        max = a[i]
        maxk = i
      }
    }
    print "max_key:", maxk, "sum:", max
  }

  $ time gawk -f sum.awk -O <ngrams.tsv
  max_key: 2006 sum: 22569013

  real    0m7.041s
  user    0m3.797s
  sys     0m3.156s
(This under Linux subsystem for windows)

According to the gawk profiler, and strace -c (count) -- the awk program mainly spends its time reading the file (without the loop at the end, looking for the max value, the runtime is essentially the same).

In fact, on the surface, pypy and python3 are quite similar on the syscall/strace front - with roughly 23k "read" calls -- awk did 375k. And adding cat in front sped it up by about two seconds:

  $ time (cat ngrams.tsv |awk -f sum.awk )
  max_key: 2006 sum: 22569013

  real    0m3.969s
  user    0m3.719s
  sys     0m0.516s

  $ time awk -f sum.awk ngrams.tsv
  max_key: 2006 sum: 22569013

  real    0m6.465s
  user    0m3.609s
  sys     0m2.859s

Out of curiosity (and with the danger of hijacking this thread as a stackoverflow discussion), I changed the awk code so that it could be used with GNU parallel - the "reduce"-step is essentially the same program as before:

  (cat ngrams.tsv \
    |parallel --pipe awk -f map.awk \
    |awk -f reduce.awk )
  max_key: 2006 sum: 22569013
This now runs in 17 to 18 seconds... :-/

  $ cat map.awk
  { a[$2] += $3 }
  END {
    for (i in a) {
      print "ignore", i, a[i]
    }
  }

  $ cat reduce.awk
  { a[$2] += $3 }
  END {
    for (i in a) {
      if (a[i] > max) {
        max = a[i]
        maxk = i
      }
    }
    print "max_key:", maxk, "sum:", max
  }
[ed: However, there are faster awks than gawk:

  $ time mawk -f sum.awk ngrams.tsv
  max_key: 2006 sum: 22569013

  real    0m2.826s
  user    0m2.391s
  sys     0m0.422s
mawk is (a little) faster than pypy on my machine.


--pipe is well known for being slow.

Try --pipe-part instead:

    parallel -a ngrams.tsv --pipe-part --block -1 awk -f map.awk |
      awk -f reduce.awk

Thanks for the tip, always nice to see the author of tools commenting on hn :-)

The (old) version of parallel packaged with Ubuntu 16.04 (linux subsystem for windows) - doesn't have --pipe-part -- but running from upstream, the speed is more reasonable:

  $ time (./parallel-20170522/src/parallel -a ngrams.tsv \
    --pipe-part --block -1 -j4 mawk -f map.awk \
    | mawk -f reduce.awk )
  max_key: 2006 sum: 22569013

  real    0m2.265s
  user    0m4.672s
  sys     0m1.672s
(Tried a few variants with/without -jN -- and this seems typical for the fast end of the spectrum).

  $ time (cat ngrams.tsv \
     | mawk -f map.awk \
     | mawk -f reduce.awk )
  max_key: 2006 sum: 22569013

  real    0m3.472s
  user    0m2.891s
  sys     0m2.406s
[ed: btw, did a double-take when I saw your Gnu Privacy Guard id: 0x88888888 :-) ]

--line-buffer may or may not give additional speed up.

> GNU parallel

Which is a 15,000-line Perl script, so that's to be expected.

The un-optimized (and probably most readable/maintainable) version in D was actually slower than PyPy

I wonder how fast you can make the Python version. For example by not producing intermediate lists with .split(delim). This produced a significant speedup in the D version, it should do the same in the Python version.
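As a sketch of that idea (sample data made up; assume the key is in column 1 and the value in column 2, as in the article), you can locate just the tabs you need instead of materialising a list of fields for every line. Whether this actually beats .split in CPython would need measuring, but it mirrors the D optimisation:

```python
from collections import defaultdict

# Toy stand-in for the ngram file: word <TAB> year (key) <TAB> count (value).
lines = ["wandering\t2006\t4", "wandering\t2007\t3", "wondrous\t2006\t5"]

sums = defaultdict(int)
for line in lines:
    # Find the two tab positions by hand rather than building a full
    # list of fields with line.split('\t') on every row.
    first = line.index("\t")
    second = line.index("\t", first + 1)
    sums[line[first + 1:second]] += int(line[second + 1:])

max_key = max(sums, key=sums.get)
print("max_key:", max_key, "sum:", sums[max_key])  # prints: max_key: 2006 sum: 9
```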

collections.Counter in Python 2.7 was not speedy at the time of this Stack Overflow question:


Another likely optimization would be to use the csv module (which would parse the rows in C).

"The task is to sum the values for each key and print the key with the largest sum."

What is the smart way to do this in kdb+?

This is my naive, sloppy 15min approach.

Warning: Noob. May offend experienced k programmers.

   k)`t insert+:`k`v!("CI";"\t")0:`:tsvfile
   k)f:{select (*:k),(sum v) from t where k=x}
   k)select k from a,b,c where v=(max v)

Using the file from the original,

    1#desc sum each group (!/) (" II";"\t") 0: `:tsvfile
Took about 3 seconds, 2.5 of which was reading the file


    q)\ts d: (!/) (" II";"\t") 0: `:tsvfile
    2489 134218576
    q)\ts 1#desc sum each group d
    486 253055104

I was using the first example with a char in the first column.

   A 4
   B 5
   B 8
   C 9
   A 6
How to solve with only a dict?

Regarding the 1gram file at https://storage.googleapis.com/books/ngrams/books/googlebook...

This is the result I got

   3| 1742563279

   q)\ts d:(!/)(" II";"\t")0:`:1gram
   q)\ts 1#desc sum each group d
   1897 134218176
   371 238872864

   k)\ts d:(!/)(" II";"\t")0:`:1gram
   k)\ts desc:{$[99h=@x;(!x)[i]!r i:>r:. x;0h>@x;'`rank;x@>x]}
   k)\ts 1#desc (sum'=:d)
   1897 134218176
   0 3152
   372 238872864
No doubt I must be doing some things wrong.

I actually had it wrong in mine. Wasn't paying attention and had the dictionary the wrong way around. Probably would have been more obvious with the char since you can't sum them...

With the reverse thrown in to switch the key/value around we get the correct answer

    q) 1#desc sum each group (!/) reverse (" II";"\t")0:`:1gram
    2006| 22569013

    k) {(&x=|/x)#x}@+/'=:!/|(" II";"\t")0:`:1gram
Works the same for the simple example

    k)e: 4 5 8 9 6!"ABBCA"

Does anybody here have "D version 5" compiled? I would like to compare it with my naive C++ version.

TL;DR: D is faster than Python.

Is it though? Un-optimized Python vs. a D script iterated on 5 times?

Certainly eye-catching but I wouldn't call it conclusive.

No, it isn't debatable which is generally faster (D), but I was impressed that you could just prototype in Python (probably significantly faster to write than D) and run it through PyPy when you're done and get 90% of the performance of D. Of course, Python has the bloat of the interpreter and JIT, while D is just a binary. The point is I was expecting more speed from D. I'm curious whether this is just luck or whether the two would be neck and neck on a range of tests?

Optimizing/squeezing performance out of Python is a rabbit hole:


I would speculate using numba or Cython would yield further performance gains over PyPy...but that's mostly just based on anecdotal comparisons:


I just think it is a bit dishonest to try and make a claim as pointed as this article's in 2017 by stopping at simply running an un-optimized CPython script with PyPy.

I think they're just giving you bounds on what to expect, not selling anything. The D optimizations looked a lot easier (write it slightly differently) than mucking with Cython or Numba. Simply running it through PyPy is another thing altogether.

I don't know, a lot of benefit can be had from Cython by just declaring types and flagging for compilation:



But that is just my opinion.

Sure, it was for saving time. In my opinion, that article could be more elaborate.
