
Destroying C with 20 lines of Haskell: wc - develop7
https://0xd34df00d.me/posts/2020/02/destroying-c-with-20-lines-of-haskell.html
======
notacoward
This doesn't seem to be comparing anything like the same thing. Does the
Haskell version _really_ do the same thing as the C version? Does it handle
all of the same error cases, providing the same quality of error messages if
they occur? Does it handle localization? If not, that makes the comparison
_very_ skewed, as unhammer already pointed out. Sure, if you strip out all of
the things that the people who wrote wc actually spent all that time on then
you can go faster, but failing to note the differences is simply dishonest.

~~~
jacobush
Does it handle multiple files, and stdin?

Anyway, destroying Haskell with 100 lines of C :-)

[https://raw.githubusercontent.com/gdevic/minix1/master/comma...](https://raw.githubusercontent.com/gdevic/minix1/master/commands/wc.c)

~~~
jstimpfle
Do you get a speed-up if you use getc_unlocked() instead of getc()? And if you
write your own isspace()? As far as I know, isspace() is locale sensitive.
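For illustration, here is a minimal sketch of what that suggestion looks like, assuming a POSIX system (`getc_unlocked`, `flockfile` and `funlockfile` are POSIX, not ISO C) and ASCII-only input. The helper `ascii_isspace` is hypothetical. It accepts exactly the six characters that `isspace` accepts in the "C" locale, but it never consults the current locale.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical locale-independent classifier: the same six characters
   that isspace() accepts in the "C" locale. */
static int ascii_isspace(int c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

/* Count words in a stream one byte at a time. getc_unlocked() skips the
   per-call stream locking that getc() performs, so we take the lock once
   with flockfile() for the whole loop instead. */
static long count_words(FILE *f)
{
    long words = 0;
    int in_word = 0;
    int c;

    flockfile(f);
    while ((c = getc_unlocked(f)) != EOF) {
        if (ascii_isspace(c)) {
            in_word = 0;          /* whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;          /* first non-space byte starts a word */
            words++;
        }
    }
    funlockfile(f);
    return words;
}
```

A driver would just call `count_words(stdin)` from `main` and print the result. Whether this is measurably faster than `getc` plus `isspace` depends on the libc and should be benchmarked rather than assumed.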

~~~
sauerbraten
isspace is defined as a macro at the top of the file, and it's definitely not
locale sensitive ;)

------
ColinWright
So I just wrote the most naive version of wc I could think of in C, matching
the capability of this Haskell version, and I _smoked_ the system _wc_ ... my
code, unoptimised, was over twice as fast.

$ time wc Backups/Tera2/files.txt

      1123699   2283439 161361844 Backups/Tera2/files.txt

      real 0m2.010s
      user 0m1.964s
      sys 0m0.020s

$ time naive Backups/Tera2/files.txt

      L: 1123699
      W: 2283439
      C: 161361844

      real 0m0.864s
      user 0m0.835s
      sys 0m0.028s

Smoked !!

(See my comment elsewhere[0] for a more sensible comment)

[0]
[https://news.ycombinator.com/item?id=22234673](https://news.ycombinator.com/item?id=22234673)

\--------

Update: I just compiled with -O3 and got user time of 0.24 secs. This version
is 8 times faster than the system wc.

~~~
zzzcpan
I got a 5.6x speedup on wc alone with:

      export LANG=C

(obviously in both cases with prewarmed filesystem cache)

~~~
felixge
That doesn't seem to help on macOS. I suspect you're on Linux?

------
ColinWright
Not being great at reading Haskell, I have some questions I was hoping people
here could answer:

* Does this cope with different whitespace, such as tabs?

* Does this cope with different settings of locale?

* Does this include the option of the "longest line"?

* Does this perform the character counts?

I'm pretty sure wc does all these, and that stripping them out would make it
faster. If this Haskell version doesn't do that, and yet still compares
against a fully-featured version of wc, the comparison hardly seems fair.

~~~
unhammer
* isSpace handles tabs, but since it looks at a single byte at a time it won't handle all the multibyte space symbols you can have in Unicode. If you read further down, they rip out the remains of Unicode handling for further speed improvements.

* Looking at a single byte at a time, it presumably only handles the "C" locale :) They don't say what locale GNU wc was tested with (if it's not LANG=C, that benchmark should be re-run)

* --max-line-length? no. But I'm guessing GNU wc isn't benchmarked with that option on (can't find the invocation in the blog post though)

* data State { ws, bs, ls } keeps count of words, bytes (more honest than calling it characters) and lines.
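To make that concrete, here is a rough C analogue of such a state record, written as a pure fold over a byte buffer. The names `wc_state`, `wc_step` and `wc_buffer` are hypothetical, and the extra `was_space` flag is an assumption (some record of the previous byte is needed to detect word boundaries); this is not the article's code.

```c
#include <stddef.h>

/* Hypothetical C analogue of the State record: words, bytes, lines,
   plus a flag remembering whether the previous byte was whitespace. */
struct wc_state {
    long ws, bs, ls;
    int was_space;
};

/* ASCII-only whitespace test, i.e. the "C" locale's six characters. */
static int is_ascii_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

/* One step of the fold: consume a single byte and update the counters.
   A word is counted on each space-to-non-space transition. */
static struct wc_state wc_step(struct wc_state s, unsigned char c)
{
    int space = is_ascii_space(c);
    s.bs++;
    if (c == '\n')
        s.ls++;
    if (!space && s.was_space)
        s.ws++;
    s.was_space = space;
    return s;
}

/* Fold wc_step over a whole buffer, starting as if preceded by a space. */
static struct wc_state wc_buffer(const unsigned char *buf, size_t len)
{
    struct wc_state s = {0, 0, 0, 1};
    for (size_t i = 0; i < len; i++)
        s = wc_step(s, buf[i]);
    return s;
}
```

Because `wc_step` only looks at one byte, it counts bytes rather than characters, which is the same limitation discussed above for multibyte locales.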

~~~
ColinWright
Thanks for the reply ...

> _... further down, they rip out the remains of unicode handling ..._

Ah. Well, that makes it a little unfair, surely.

> _Looking at a single byte at a time, it presumably only handles the "C"
> locale ..._

Again.

> _--max-line-length? no. But I'm guessing GNU wc isn't benchmarked with
> that option on_

I wonder if wc does the work anyway, and only reports it if asked, or if it
actually changes the code path if it's not needed.

So this entire post feels ... intellectually dishonest. Personally, I'm all in
favour of Haskell, and I wish I had the chance to use it "in anger" rather
than just doing the occasional toy thingie that I do. But this post doesn't do
it or its community any favours.

Disappointing.

~~~
inimino
Yes. "Destroying C". The whole post feels like youthful bravado untempered by
experience.

------
mkup
But how would the Haskell version of wc compare with the C version of wc
running with the LC_ALL=C environment variable? It's a well-known fact that
the UTF-8 locale is much slower than the C locale in coreutils, and their
Haskell version of wc already uses fixed 8-bit characters.

~~~
0xd34df00d
wc was actually slower with LC_ALL=C as opposed to ru_RU.UTF-8 that my system
normally runs with (about 10 s against 7.2 s).

Which actually raises a good question of whether I should have been comparing
with that one — but that'd probably raise more questions and lead to more
people accusing me of cheating in favour of Haskell.

~~~
zzzcpan
The only way I can think of for this to be true is if you made a mistake
setting the env var.

------
megous
Write me a Haskell program that will init DRAM on the PinePhone, set up the
PMIC and eMMC, load 25 MiB of Linux kernel and initramfs from eMMC into DRAM,
fit within a hard limit of 32 kB, and do all of this in 300 ms max.

Then I'll consider C destroyed.

------
GlitchMr
Destroying Haskell with 1 line of C:
[https://www.ioccc.org/2019/burton/prog.c](https://www.ioccc.org/2019/burton/prog.c)

------
tsukurimashou
I don't think it is very "hard" to "destroy" most of these programs. They were
written a long time ago, they evolved with backwards compatibility or
portability in mind, and they can run on pretty much any system. It does seem
a bit unfair to compare them to a quickly hacked-together program that you
test against one use case.

Like the others said, the second article will probably be more interesting.

~~~
StavrosK
But multiple people have been working full time on wc for decades! At least,
if you believe the article.

~~~
iainmerrick
Decades yes, but who said “full time”?

~~~
joosters
I've been working full time on 'wc' for only the last five years, since being
promoted from full time working on 'cat'.

Once I've wrung every last drop of performance out of wc, I can move on to the
next largest limiting factor in GNU performance, 'tail' :-)

------
quadrifoliate
I would really like to see fewer of these clickbait post titles. The author
honestly admits here (props!)
[https://news.ycombinator.com/item?id=22235536](https://news.ycombinator.com/item?id=22235536)
that they only used the title because it encourages discussion.

In that case, I would say that it's up to the community to not take clickbait
titles like this seriously if we want to encourage reasoned, detailed content
rather than borderline flaming with words like "destroy" and "smash". I
personally don't enjoy this new Buzzfeed style of technical post at all.

Therefore, I have some comments about the actual code here, but I'm going to
keep them to myself so I don't encourage more people to follow OP's example.

------
bobowzki
I don't like these "Destroying C" titles.

~~~
0xd34df00d
Me neither. But experience shows that these titles lead to more folks looking
at the post, leading to more feedback, leading to better
writing/experimenting/etc in the long run.

Also, I was trying to make a reference to the original post that was "beating
C", with the connotation of further improving on that. I'm not a native
speaker, so my language model might be terribly flawed.

------
jeen02
Ah, the weekly "I beat C using X language".

I don't understand why LOC was even brought up. You can put all your code on a
single line in most languages. The GitHub code linked even has 26 lines of
Haskell, which makes this even more nonsensical.

------
mhh__
Not unimpressive, but this is a bit of a hot take. If I'm reading it right, a
single-file benchmark (Big O who?) against a fully fledged production C
program is not a complete measurement.

------
throwawa66
Related: I have an F# snippet I run from LINQPad to keep or remove lines
(based on keywords) from huge log files, and it breezes through the log in
just a few seconds.

------
iainmerrick
I take it this is “destroying” in the sense of presidential debates or late-
night satirical monologues, i.e. diverting but meaningless.

------
TickleSteve
He is comparing a multi-threaded Haskell version against a single-threaded C
version???

"...There’s also a parallel version that relies on the monoidal structure of
the problem a lot, and that one actually beats C"

coreutils wc is single-threaded, just checked.
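The "monoidal structure" in that quote can be sketched as follows. This is an assumption about how such a combine could work for wc, not the article's code, and the names `chunk`, `summarize` and `combine` are hypothetical. Each chunk records its counts plus whether it starts and ends inside a word, so two adjacent chunks merge without rescanning, and a word split across the boundary is counted once.

```c
#include <stddef.h>

/* Hypothetical per-chunk summary. */
struct chunk {
    long words, bytes, lines;
    int starts_in_word;   /* first byte is non-space */
    int ends_in_word;     /* last byte is non-space */
    int empty;            /* identity element: no bytes at all */
};

static int is_blank_byte(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

/* Sequentially summarize one chunk of input. */
static struct chunk summarize(const unsigned char *buf, size_t len)
{
    struct chunk r = {0, 0, 0, 0, 0, 1};
    int in_word = 0;
    for (size_t i = 0; i < len; i++) {
        int space = is_blank_byte(buf[i]);
        if (buf[i] == '\n')
            r.lines++;
        if (!space && !in_word)
            r.words++;
        in_word = !space;
    }
    r.bytes = (long)len;
    if (len > 0) {
        r.empty = 0;
        r.starts_in_word = !is_blank_byte(buf[0]);
        r.ends_in_word = in_word;
    }
    return r;
}

/* Associative combine of two adjacent chunks (a comes before b). If a ends
   mid-word and b starts mid-word, that word was counted in both halves. */
static struct chunk combine(struct chunk a, struct chunk b)
{
    if (a.empty)
        return b;
    if (b.empty)
        return a;
    struct chunk r;
    r.words = a.words + b.words - (a.ends_in_word && b.starts_in_word);
    r.bytes = a.bytes + b.bytes;
    r.lines = a.lines + b.lines;
    r.starts_in_word = a.starts_in_word;
    r.ends_in_word = b.ends_in_word;
    r.empty = 0;
    return r;
}
```

Since `combine` is associative and the empty chunk is its identity, the input can be split at arbitrary byte offsets, each piece summarized on its own thread, and the results folded in order; that threading driver is omitted here.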

~~~
divisonby0

      But that post left me wondering: is it possible to do better without resorting to parallel processing?
      Turns out the answer is yes.
    

From the introduction

------
divan
Happy 33rd anniversary of Haskell devs claiming to destroy C.

------
deathnoto
OS: CentOS Linux release 7.7.1908 (Core)
CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz

Average time for wc on the same test file: 0.8 seconds

------
jstimpfle
> So we’ve managed to just smash a C program that was looked at by thousands
> of eyes of quite hardcore low-level Unix hackers over a few decades. We did
> this with a handful of lines of pure, mutation-less, idiomatic Haskell,
> achieving about 4 to 5 times of throughput of the C version and spending
> less than an hour on all the optimizations.

I've done many very arrogant things in my life, because I've been a strange
guy with a lack of self-esteem who doesn't have a clue how many things he
doesn't know. I hope I've never been _this_ arrogant, though.

~~~
pinkfoot
Perhaps - just possibly maybe perhaps - he was suggesting it is his superior
tool and technology that allowed his team to do this.

Do you think you took the most generous interpretation possible?

~~~
jstimpfle
Given that I was reminded of myself, I concede it might be closer to the least
generous one.

------
b-3-n
The post is very insightful and well written. The title is provocative, which
is very common nowadays. However, there should be a "serious" section that
puts things into perspective. As others have pointed out, it leaves a bit of
an aftertaste otherwise.

~~~
iainmerrick
Those are serious flaws. I think it’s misleading, not insightful at all.

