
Beating Decades of Optimized C with 80 Lines of Haskell - szemet
https://chrispenner.ca/posts/wc
======
olliej
This is a pretty good example of why I don't like Haskell:

* The claim is that Haskell is better because it's simpler: the final code is decidedly not simple

* You don't need to know about machine behavior: the example required numerous manual interventions, e.g. explicit inlining, explicit strictness, and avoiding the "obvious" data structures (see the sketch below this list)
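
To make that concrete, here is a sketch of the kind of tuning involved -- strict accumulator fields so the fold doesn't build thunks, plus an explicit INLINE pragma. This is illustrative of the technique, not the post's actual code:

    import Data.Char (isSpace)
    import Data.List (foldl')

    -- Strict fields (!) keep the counts evaluated as the fold runs;
    -- with lazy fields this would leak a chain of thunks on big inputs.
    data Counts = Counts !Int !Int !Int !Bool  -- lines, words, chars, wasSpace

    step :: Counts -> Char -> Counts
    step (Counts l w c wasSpace) ch =
      Counts (l + fromEnum (ch == '\n'))
             (w + fromEnum (wasSpace && not (isSpace ch)))
             (c + 1)
             (isSpace ch)
    {-# INLINE step #-}  -- the explicit inlining hint GHC may otherwise skip

    main :: IO ()
    main = do
      s <- getContents
      let Counts l w c _ = foldl' step (Counts 0 0 0 True) s
      putStrLn (unwords (map show [l, w, c]))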

After all that contortion, the only reason it was able to beat the C version
was by using multiple cores: on a 4-core machine it was only ~60% faster, and
around 80% faster when parsing UTF-8. What's more, the C implementation
actually does true UTF-8 parsing, via libc, so it's not inlined, whereas the
"faster" Haskell code only counts the beginnings of multibyte sequences.

I would argue that, sans the special cases (which the example doesn't
trigger), the C version is the obvious first-pass implementation.

~~~
olliej
I just wrote a very dumb implementation of wc and it’s easily twice as fast.

I think the core issue is that wc is _not_ a super-optimized program. It is
sufficiently fast for most purposes and so hasn’t ever been improved.

~~~
xrisk
Can you test it against GNU wc? I'm curious:
[https://github.com/coreutils/coreutils/blob/master/src/wc.c](https://github.com/coreutils/coreutils/blob/master/src/wc.c)

~~~
olliej
Here is my super dumb implementation. It doesn't handle multibyte characters
and doesn't handle stdin -- but I don't think the Haskell version did either,
so I don't have a problem with those limits.

[https://github.com/ojhunt/wc](https://github.com/ojhunt/wc)

It's the simplest thing that could obviously work, but it's generally not
tested beyond the most basic cases.

------
imode
I don't consider this "beating" the C version. The new version isn't even
semantically equivalent. You had to resort to multiple cores. A parallelized C
version of wc would probably be even faster.

There's not much content here, IMO.

------
Noe2097
What clickbait indeed. The final version still uses three times more memory,
but the worst part is that it just doesn't work in the general case.

The way I generally use "wc" is at the end of a somewhat complex command line,
with the preceding commands feeding characters to it through a pipe.

There is a reason why "wc" is not multithreaded: it can't be. It must work
sequentially, because in the general case the input of "wc" cannot be skipped
over.

This is one of the two big assumptions the author makes ("wc works only for
files, so we can lseek"). The second, which the author does identify, is that
the underlying hardware and filesystem must support efficient concurrent
access to the same location.
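
You can even see the distinction from inside a program: a pipe is not
seekable, so the per-core file-splitting trick is simply unavailable. A
minimal sketch in Haskell (hypothetical, not from the post), using
System.IO's hIsSeekable:

    import System.IO (stdin, hIsSeekable)

    -- "cat file | wc" hands wc a pipe; only a real file supports lseek,
    -- which is what splitting the input across cores relies on.
    main :: IO ()
    main = do
      seekable <- hIsSeekable stdin
      if seekable
        then putStrLn "regular file: can lseek and split into per-core chunks"
        else putStrLn "pipe: must consume the stream sequentially"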

~~~
olliej
Don’t forget: the best case (where it can run in parallel) is still only
slightly faster than the single-threaded version.

------
carapace
The "flux monoid" for word counts is pretty cool. IIRC there was a paper about
doing something like that with regular expression monoids to make
parallelizable RE matchers, etc. I can't recall the title at the moment.
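
From memory, the construction in the post is roughly the following sketch:
each chunk is summarized by its word count plus whether each edge sits
mid-word, and the Semigroup merge glues together a word split across a chunk
boundary. Associativity is exactly what makes the per-core split legal:

    import Data.Char (isSpace)

    data Side = Space | Word

    -- Unknown is the empty chunk; Flux l n r is a chunk that starts in
    -- state l, ends in state r, and contains n words.
    data Flux = Unknown | Flux !Side !Int !Side

    instance Semigroup Flux where
      Unknown <> y = y
      x <> Unknown = x
      Flux l n Word <> Flux Word m r = Flux l (n + m - 1) r  -- rejoin split word
      Flux l n _    <> Flux _    m r = Flux l (n + m) r

    instance Monoid Flux where
      mempty = Unknown

    flux :: Char -> Flux
    flux c
      | isSpace c = Flux Space 0 Space
      | otherwise = Flux Word 1 Word

    -- foldMap over any chunking gives the same answer, so the chunks
    -- can be processed on separate cores and merged afterwards.
    wordCount :: String -> Int
    wordCount s = case foldMap flux s of
      Unknown    -> 0
      Flux _ n _ -> n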

