
When Haskell is Faster than C (2013) - mightybyte
http://paulspontifications.blogspot.com/2013/01/when-haskell-is-faster-than-c.html
======
tikhonj
The title is a bit click-baity, but the core point is sound: C is not _fast_ ,
it's _optimizable_ [1].

If you just write readable, friendly C code it won't be all that much faster
than normal code in a high-level language like Haskell—it might even be
slower. I know, I've done that myself. But you never see that in benchmarks,
do you? That's not what benchmarks are about.

Here's another illustration: we can compile high-level languages like Haskell
and Scheme to C. Does this make them "as fast as C"? Yes—they're literally
running as C. But also _no_. Hand-optimized C code is going to beat those
compilers any day. It's not even close.

C is not magical performance sauce you can sprinkle over your code to make it
fast. It's just a language that doesn't have much mandatory overhead (no GC,
minimal runtime... etc) and gives you access to certain low-level knobs that
you can twiddle to optimize your code. I mean, that's important and useful,
but unless you're going to apply that level of effort _to your whole codebase_
, most of your C code won't be all that fast. Some applications need this
level of optimization; most don't.

[1]: I actually wrote a little article about this myself too:
[http://www.forbes.com/sites/quora/2014/01/09/can-a-high-leve...](http://www.forbes.com/sites/quora/2014/01/09/can-a-high-level-language-like-python-be-compiled-thereby-making-it-as-fast-as-c)

~~~
rtpg
When talking about performance, C actually can have overhead though.

For example, people rely on how C lays things out in memory, so C compilers
have to lay things out in memory in specific ways. Haskell doesn't have this
constraint and so can do a bunch of tricks.

And then there are cases where C cannot move a variable or optimise it away,
because it's unknown whether some unrelated bit of code may go poke at its
memory.

Now one might say "well, if you know what you're doing then you can get the
compiler to do what you want." Everything's possible, since everything's
Turing complete. But I would consider "variables need to be somewhere in
memory" to be mandatory overhead.

~~~
caf
Variables _don't_ necessarily need to be somewhere in memory. Compilers
absolutely can and will notice if you never take the address of a variable and
do this optimisation - it's very common for a local variable in C to be
materialised nowhere but in a register.

~~~
tacos
Thank you. ANSI C defines a language containing 32 keywords. And one of them
is "register."

------
jlg23
That's a great short read on the idiocy of over-optimization. I get a lot of
similar questions when people learn that I write Common Lisp for a living - in
most cases they go silent when I provide a sufficiently efficient solution to
a problem that was only just given to me in the meeting where we were supposed
to agree on a schedule for the implementation... (which btw also happened a
lot when I was mostly hacking in Perl 15 years ago - if you haven't read
"Beating the Averages"[1], read it now!).

[1] [http://paulgraham.com/avg.html](http://paulgraham.com/avg.html)

~~~
codemac
I'm a big Scheme fan and really enjoy programming in it,

but the lack of a real ecosystem is really starting to hurt my progress.

What suggestions do you have for moving from a lisp-1 to a lisp-2? While I
prefer lisp-1 + syntax-parse, I'd rather have... quicklisp. Quicklisp is life.
The quicklisp must flow.

~~~
DigitalJack
He who controls the quicklisp controls the universe!

I personally would recommend clojure + cider + emacs for an excellent
ecosystem and tooling.

~~~
jlg23
> He who controls the quicklisp controls the universe!

Too true, and that's been my main complaint about ql since day 1. Give me a
simple way to define a ql-repo hierarchy and I'll be fine with it. I did
invest a few hours maybe 3 years ago but realized that it would take too much
work (research and patching) to allow for that. Recent developments (the
author pitching for money in the ql context) make me even more worried :(.

------
vvanders
If you're not using C/C++ to take advantage of locality of reference, low
memory footprint and the things that make C/C++ fast then why the hell are you
using it?

Seems like this should just be filed under "right tool for the job".

~~~
adrianratnapala
While I agree with this, I look around the history of programming and see that
too much code was written in C, and too much is being written in C++. And
writing even more of it is my day job.

One thing that slows down change is that new languages come with parallel
universes called "runtimes". The D language
([https://dlang.org/](https://dlang.org/)) is an honourable exception (it has
a runtime, but not a parallel universe).

But D is only an incremental improvement on C++, and somehow never took off.
Maybe Rust will be the Saviour.

~~~
ktRolster
The advantage (one advantage?) of C is that if you write a library in C, then
it can be used on any platform, and from any language. Basically every
language has bindings into C.

~~~
pjmlp
Every language has OS ABI bindings, and on an OS written in C those tend to be
the same thing as C bindings.

~~~
ktRolster
[http://www.swig.org/](http://www.swig.org/)

------
chadaustin
If you find yourself setting out to write Haskell that can outperform C, you
will probably be disappointed. I love Haskell, but the myth that it just takes
a little bit of elbow grease to make it as fast as C really needs to die. See
for example: [https://chadaustin.me/2015/02/buffer-builder/](https://chadaustin.me/2015/02/buffer-builder/)

THAT SAID, I have a relevant story. IMVU used this file format called "CFL"
for its 3D content. It was basically a kind of zip file except it used LZMA
for asset compression. The original CFL library was written in C++, and while
it worked fine, it was getting annoying to maintain and compile it, as well as
inefficient to pass data from Python file buffers into C++ and back out. So
one day I decided to see if I could replace it with a bit of Python code and
pylzma. After I made this change, parsing our content files was _twice_ as
fast. Having the code in a couple hundred lines of simple Python allowed me to
optimize the data flows, minimizing copying from the file buffers into the
LZMA decoder and then to the consumer of the data.

Sometimes the best way to make something fast is to write it in a safe,
expressive language that lets you directly express your intent. :)

~~~
khedoros
Where was the wasted time? Something about converting the data back and forth
through the CFFI? Just convoluted data flow?

~~~
chadaustin
Convoluted data flow and unnecessary copies! The C++ implementation was
thousands of lines and the Python one was a couple hundred, so it was much
easier to see the path the data had to take.

------
im3w1l
> I wrote the inner loop of the C to operate on a linked list of blocks

Linked lists are slow...

~~~
reikonomusha
A linked list has extremely fast prepend/insert/delete operations compared to
an array!

~~~
cyphar
Not really. It depends how big your list is. If the answer is "not really
that big", then cache misses and poor memory locality can cause linked lists
to be less efficient. Besides, arrays actually have the same amortized time
complexity as lists for dequeue operations. It's just in-the-middle deletions
and insertions that can cause problems with arrays.

~~~
enqk
Using a linked list is not such an issue if the nodes are not allocated with
the general-purpose allocator, and instead are allocated from a pool of memory
addresses next to one another.

~~~
goldenkey
Then there's no point in using a linked list... A linked list exists
precisely to connect distant memory addresses.

~~~
enqk
Such a linked list backed by a linear pool of nodes is still useful if your
algorithm makes many inserts and/or removals.

The leap you're making here is about how distant the memory addresses should
be in general.

That is entirely dependent on your problem and your data set.

~~~
goldenkey
A skiplist array is the proper structure to use. A linked list is never the
right structure for a local block of memory.

------
theseoafs
A ridiculous article. Yes, Haskell will outperform C if you write a crazily
inefficient C program.

------
mmaldacker
1\. The "reverse-complement" problem here is simply about reading a file,
reversing strings, mapping a small set of characters to another, and printing
the result. It's a really simple problem and there aren't many optimisations
to be done. Really this is about having the fastest I/O library, and thus it
doesn't seem like a good way to compare C and Haskell. If you look at the
various solutions presented in the benchmarks game, the core algorithms are
all the same; they only differ in how they read/write and in their use of
threads.

2\. The author talks about the importance of reducing cache misses due to
pointer indirection and then proceeds by implementing a character buffer as a
linked list of small buffers...

3\. The C version reads and writes characters one by one; this can be greatly
improved by reading/writing bigger chunks at once. The author actually points
out this optimisation but says it "would require significant changes to the
code". So the author spent time optimising the Haskell version, but spending
time optimising the C version is too much work? And then arrives at the
conclusion that Haskell is faster?

4\. Commenters on the author's blog cannot reproduce the results...

------
rtpg
A paper on Haskell's stream fusion makes a similar claim right in its PDF
filename ([http://research.microsoft.com/en-us/um/people/simonpj/papers...](http://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-beats-C.pdf)),
with benchmarks showing the "naive" Haskell implementation of a simple stream
problem beating hand-tuned C.

Haskell is still pretty well-defined in terms of execution, but freedom from
specifying how memory is managed is in itself a major liberator for the
optimisers.

------
bipvanwinkle
I think it's worth pointing out, though, that idiomatic C is probably going to
be more consistently performant. It seems common to run into situations in
Haskell where one change can cause a 10x speedup, but I don't see that nearly
as often with C code. I don't have a lot of evidence on hand to support this,
just what I've observed personally. Does this seem fair? Relevant?

------
pskocik
I don't understand why someone thought it was a good idea to require that
POSIX stdio should lock the thread. It slows down stdio several times compared
to the nonlocking version, and it completely goes against "don't pay for what
you don't use" as single threaded programs have to pay the cost too. And
locking and unlocking the stream manually is like two lines of code, which a
person writing MT code should be well capable of writing.

------
plinkplonk
C and Haskell have different strengths.

I've been programming in both for a while now, and I wouldn't use 'faster than
C' as a selling point.

In my (limited) experience, what Haskell gives you is conciseness and
abstraction. What C gives you is speed and 'close to the machine' programming
for when you need it. I'm not sure this article provides any useful insights.

------
jacquesm
previously on HN:

[https://news.ycombinator.com/item?id=5090717](https://news.ycombinator.com/item?id=5090717)

[https://news.ycombinator.com/item?id=5080210](https://news.ycombinator.com/item?id=5080210)

------
tacos
"...you don't need C. Haskell will give you the same performance, or better,
and cut your development and maintenance costs by 75 to 90 percent as well."

I want both the compiler from the year 2030 and the drugs this guy has.

~~~
goldenkey
Well, he uses putc and getc, which are per-character-I/O-bound to smithereens
>.<

I guess you can call self-delusion a drug..

------
divkakwani
What tools did the OP use for profiling his C and Haskell code?

~~~
pjmlp
You can use ThreadScope for Haskell:

[https://wiki.haskell.org/ThreadScope](https://wiki.haskell.org/ThreadScope)

