
Wc in D: 712 Characters Without a Single Branch - aldacron
https://dlang.org/blog/2020/01/28/wc-in-d-712-characters-without-a-single-branch/
======
greggyb
Perhaps I'm just being pedantic or maybe I am misunderstanding.

The author claims to be IO bound toward the end. But they are comparing to two
versions that are faster.

It is my understanding that IO-bound means that the IO subsystem is the thing
which limits run time of the program. But the author clearly demonstrates that
the IO subsystem of their machine is capable of supporting faster wc binaries.

So what am I missing here?

~~~
EvenThisAcronym
> So what am I missing here?

The Haskell program is multi-threaded (I don't know about wc's official
implementation, but I assume it is as well), while the D program is single-
threaded.

~~~
unwind
GNU coreutils' wc (which is the reference) does not seem to be using threads,
no.

At least not at all obviously: no threading header is included and there is no
mention of threads. I only checked the code [1] very quickly, though.

[1]
[https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;...](https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/wc.c;h=d18eaee6c33e1a67b65f1e62bb427e8777bc0aed;hb=HEAD)

------
thom
Gave myself a quick D lesson just to understand the approach to flags here
(Yes.keepTerminator rather than just a meaningless bool). Turns out D lets you
define a template with any args you like, in this case taking a string for the
name of a Flag type, which in turn contains an enum with 'yes' and 'no'
boolean values. This means that you can only use the right type of Flag, with
a matching name, and its yes/no value. But the syntax is a bit icky
(Flag!"keepTerminator") and so _another_ nice feature of D appears to be that
you can intercept field dispatch in a struct. And so the 'Yes' struct does
this, captures the 'keepTerminator' as a string, and creates the correct type
of flag.

For whatever reason I found all this rather cute (and apologies to any actual
D programmers if I've misread this whole situation).

~~~
acehreli
Spot on! :) With the help of opDispatch (the catch-all member function
template), it's possible to drop the string from the use site. (I don't know a
way of dropping it from the type name.)

    
    
      import std.stdio;
      import std.typecons;
      import std.string;
      
      // This type's opDispatch removes the need for string in the flag name.
      struct FlagFromBool {
        auto opDispatch(string flagName)(bool value) {
          mixin (format!q{
            return value ? Yes.%s : No.%s;
          }(flagName, flagName));
        }
      }
      
      // A convenience function to remove the need for empty struct construction parenthesis.
      auto flagFromBool() {
        return FlagFromBool();
      }
      
      // Unfortunately, the type name still requires string flag names:
      void bar(Flag!"foo" flag) {
        writeln("called with ", flag);
      }
      
      void main() {
        // However, the expressions don't need a string:
        bar(flagFromBool.foo(false));
        bar(flagFromBool.foo(true));
      }

~~~
biotronic
Both your convenience function and the quotes in the type name can be removed
via the use of static opDispatch:

    
    
      struct FlagImpl(string name) { bool value; alias value this; }
      
      struct Flag { alias opDispatch(string name) = FlagImpl!name; }
      
      struct Yes { static auto opDispatch(string name)() { return FlagImpl!name(true); } }
      
      struct No { static auto opDispatch(string name)() { return FlagImpl!name(false); } }
      
      void fun(Flag.foo a) {} // Look ma, no quotes!
      
      unittest { fun(Yes.foo); fun(Flag.foo(true)); }

------
andrepd
From "without a single branch" I thought it meant an actually branchless
version of wc. Turns out it just means no explicit if statements.

~~~
jnordwick
If you were careful, it seems pretty plausible to be able to use some indexing
tricks and CMOV to write a jump/branchless version of wc. You are basically
counting newlines and runs of whitespace.

~~~
zerr
CMOV aside, I remember it was proven that MOV by itself is Turing-complete.

~~~
agumonkey
Slightly related: is embedding arithmetic in mov instructions faster than
explicit ALU operations?

~~~
Filligree
Depending on the arithmetic, it seems it can be! I've noticed gcc using LEA
instructions for arithmetic of the form (x * a + b), where 'a' and 'b' fit
within the instruction's encoding.

~~~
jnordwick
Using LEA (load effective address) for calculation seems to be pretty common
in code from both gcc and llvm. You basically get smaller code, and it used to
let the CPU schedule the work across more execution ports. Not sure whether
the CPU can fuse the ADD+MUL now.

------
dig1
I believe I'm missing something here or my day was too long, but in Clojure
this can be squeezed into 13 lines and 435 characters while keeping things
fairly readable (for Clojure & Lisp developers ;)).

    
    
        (defn wc [^String file]
          (with-open [rdr (clojure.java.io/reader file)]
            (apply (partial printf "%d %d %d\n")
                   (reduce
                    (fn [[nl nw nb] ^String ln]
                      (let [words (count (.split ln "[ ]+"))
                            bytes (alength (.getBytes ln "UTF-8"))]
                        [(inc nl) (+ nw words) (+ nb bytes)]))
                    [0 0 0]
                    (line-seq rdr)))))
        
        (defn -main [& args]
          (wc (first args)))
    
    

I haven't tested how fast it is, but startup time can be optimized by
compiling it with GraalVM.

~~~
tazjin
Neat! I wonder if the character decoding & regex usage has noticeable
performance impact. My Common Lisp version was sped up somewhat by switching
from a character stream to a byte stream:
[https://git.tazj.in/tree/fun/wcl/wc.lisp](https://git.tazj.in/tree/fun/wcl/wc.lisp)

You can try this one via Nix with:

    
    
      nix-build -E '(import (builtins.fetchGit "https://git.tazj.in") {}).fun.wcl'

~~~
patrec
It's been a while since I last wrote CL but I think your program can produce
any counts between zero and the correct one – you need to use eql instead of
eq if you want this to work in standard common lisp.

~~~
tazjin
Ah, you're right of course - fixed. Though that still won't make this
portable, as I'm using an SBCL-specific way of accessing argv.

~~~
gmfawcett
`unix-opts` is available in quicklisp -- it's small, and has a portable
`argv` wrapper function.
`argv` wrapper function.

~~~
tazjin
Thanks for the tip, added that:
[https://git.tazj.in/commit/?id=10e2e56b67b41eb6315fdc4cc1bc1...](https://git.tazj.in/commit/?id=10e2e56b67b41eb6315fdc4cc1bc161f43440dad)

------
bestouff
There's also a wc in Rust (of course) with more code (120 lines) but quite a
bit more efficient: [https://medium.com/@martinmroz/beating-c-with-120-lines-
of-r...](https://medium.com/@martinmroz/beating-c-with-120-lines-of-rust-
wc-a0db679fe920)

~~~
ses1984
In the blog post you linked, a library for parallelism is used.

~~~
kbenson
It's an additional two lines of code when they add it (and probably one more
to pull it in above), and only happens at the end of the actual work, after
they've matched C's performance and beat C's memory footprint. Using the
parallelism library at the very end hardly invalidates the rest of the
exercise.

------
pixelbeat__
A couple of points on comparing with coreutils:

* recompiling with -march=native can give significant wins over the more generic binaries provided by Linux distros.

* parallel processing helps with bigger files, and it's easy enough to leverage the existing wc binary to process in parallel.

Both points are discussed at: [https://www.pixelbeat.org/docs/unix-parallel-
tools.html](https://www.pixelbeat.org/docs/unix-parallel-tools.html)

------
forgotpwd16
I'm wavering between D and Rust as to which could serve as a better
alternative to C++. (Though I don't see C++ getting replaced anytime soon.)
Even if Rust is getting most of the attention, D also seems to be a strong
candidate.

~~~
tastyminerals
Rust is definitely more difficult but it has its benefits, I guess? For me as
a Python guy D was just easier, both concept-wise and syntax-wise. I could
write a relatively complex algorithm after 2 weeks of reading a D programming
book with just standard ops. And it was definitely faster. Maybe not as fast
as C but I felt efficient. Personally, I liked that D does pray FP like Scala
while also being a multiparadigm language. Aaand it is definitely more
readable.

~~~
tastyminerals
*does not pray FP

------
ape4
Fairly elegant code. Using a library that splits the line into words makes it
pretty simple.

------
monadic2
This is ridiculous: of course there are branches, but you don’t explicitly
write them. This is purely aesthetic.

Edit: I simply wish the author illustrated why this is good or
desirable -- conditionals are not difficult to read.

~~~
coldtea
The aesthetic aspect is insignificant, it's about the semantics -- and thus
reasoning about the code and other such properties. So nothing ridiculous
about it.

It's like saying Haskell code "of course has" anything C has, since underneath
they both run assembly instructions full of gotos and state manipulation; you
just "don't explicitly write it".

~~~
egdod
Branches aren’t hard to reason about though. The only* reason anyone cares
about branches is that branch misprediction is expensive.

~~~
BoiledCabbage
I think branches are actually harder to reason about; we're just used to doing
it since all of us have done it for so long. I do believe that branches are
less straightforward than "linear" code and add to mental complexity on larger
projects.

Of course there is no data to back this up, but I think one of the next trends
in programming beyond the adoption of functional styles, immutable data,
"functional core / imperative shell" will be abstracting away from explicit
conditional/branching logic in higher level code.

Obviously people can come up with pedantic/extreme cases where the abstraction
does nothing to hide the complexity, or even makes things more complex, but
I'm not talking about that. I mean simpler abstractions like what was used in
the OP, or a filter() abstracting over a while-and-if combo.

I'm convinced based on personal experience that it makes for cleaner code and
will become more widely adopted over the years as people explore it.

~~~
egdod
How is it possible to reason about filter() without thinking it through as
effectively a conditional? How is that easier than being explicit about it?

~~~
BoiledCabbage
Why is separating pure functional code from imperative / side effect causing
code easier to reason about? You still have to have the side-effect somewhere.

Why are immutable data structures easier to reason about? They still produce
the same results.

None of the above are proven facts, but they are generally agreed upon
principles that a number of people have observed. And I happen to believe them
as well.

Overall my opinion of why it's easier is that like most optimizations it
reduces the number of states/cases you have to think about in the "common
path". With a solid abstraction you rarely need to think about the internals,
with explicit code you need to check and think through all of the edge cases.

filter() (or map, or ...) is probably the best simple case I can think of.
There is more room for bugs in an explicit loop (e.g. for (int i = 0; i < ...)
{}) than in a filter over a collection/enumerable. When reading the explicit
code you need to think through base and termination conditions as well as
confirm aggregation is happening correctly. With a filter() you don't have to
do it / be explicit about it. Obviously you still need to know what filter
means (which means you know its implementation), but you don't need to focus
on the details, only the filter condition: the signal amid the noise. The rest
is pushed behind a solid reusable abstraction. Thinking about it in isolation,
if someone just introduced the filter function to you, you might look at it
and say it really doesn't add much and if anything obscures your code, so you
don't see the value. But after adopting it, and its related family of monadic
collection operations, code can be written at a denser level of abstraction,
with fewer off-by-one errors and a higher signal-to-noise ratio. It provides
structure and common abstractions for unstructured iteration, to the point
that you begin to think in these abstractions (which I think is a benefit).

Another example is parser combinators. There is nothing that parser
combinators do that can't be done by hand. But one of the primary ways they
simplify coding is by hiding away conditional (and looping) logic: the
"?*..", "|", and "&" concepts. Parser combinators also simplify things by
building an algebra for users to work in, with nice closed operations. But
again, that only became possible by abstracting away the conditional and
looping logic and packaging it as nice simple abstractions that can be easily
combined.

So my summary is: my view is that abstracting over control-flow code does a
lot to simplify logic. A lot of logic involving looping has already been
abstracted over and included in languages and libraries (in general,
"functional programming" styles). Now that the low-hanging fruit has been
incorporated, I think the next layer will be non-looping conditional logic.

------
brian_herman__
I find D easier to grok coming from a Java/Python background.

~~~
vips7L
My only issue is that it's far more complicated than Java, and for a language
that fills the same niche it's hard for me to justify putting my resources
into learning it, even though I do like a lot of the features.

~~~
forgotpwd16
In what way is it more complicated than Java?

~~~
zojirushibottle
Well, Java isn't really complicated, it's just verbose. But my understanding
is that D has many of the features of C and modern C++. That alone makes it
more complicated than Java...

~~~
gallier2
You can also write verbose code in D. The language doesn't prohibit it.

------
cestith
I'm no D expert but this seems to assume one file to count given as one
argument on the command line. The wc in coreutils takes any number of
arguments or will happily count STDIN.

------
bjarneh
Every time I see a post with some D source, I think the language looks great;
but for some reason I never try to learn it...

~~~
WalterBright
That's ok, one of these days you'll try it and then wonder what took you so
long :-)

