
When Haskell is Faster than C - metajack
http://paulspontifications.blogspot.com/2013/01/when-haskell-is-faster-than-c.html
======
haberman
These "faster than C" claims are almost always embarrassing (usually involving
C code that would easily win if it were as aggressively optimized as the high-
level language) but that's almost not the point.

The real point is the larger narrative. The subtext of these posts is what we
are really arguing about. So let's just duke that out directly.

High-level language fans have a point, which is that high-level languages are
sometimes a better overall "bang for the buck" in developer time, and that
sometimes they can be pretty fast (possibly even out-performing an un-
optimized C program). Reasonable C guys aren't arguing against this. We're
certainly not arguing that people should use C for everything.

But here's what high-level language fans have to understand. First of all, you
depend on us. Your language runtime is (very likely) implemented in our
language (possibly with a little assembly thrown in). So as much as you may
like your language, it certainly does not _obsolete_ C. C guys like me get
cranky when high-level language fans imply that it does.

Second of all, a C+ASM approach will _always_ win eventually, given enough
time invested. That is because a C+ASM programmer has at his/her disposal
literally every possible optimization technique that is implementable on that
CPU, with no language-imposed overhead. What this means is that a higher-level
language being "faster than C" is just a local maximum; the global maximum is
that C is faster.

Yes, it's absolutely true that in limited development timeframes a higher-
level language might still be the right choice, and in rare cases might even
have better performance. But for long-term projects that want the absolute
best performance, C (or C++) are still the only choice. (But maybe Rust
someday).

~~~
Xurinos
I guess I will be your essay-writing dissenter.

C is a high-level language with fewer features than many other languages, is
not necessarily the engine behind other languages, has the problem of its
programs poorly implementing a percentage of what other high level languages
are capable of doing quickly and securely, and provides slow and troublesome
memory allocation out-of-the-box. When comparing the speed of operations,
people are rarely comparing apples to apples. And contrary to what is
spattered on the boards, C is not an understandable, close-to-the-metal
wrapper around assembly instructions (compilers have advanced quite a bit).

I love C. It does feel fast, and I get the illusion of being close to the
metal. It was one of my first languages, holding a sentimental place in my
heart. Very important things are written in it. It is a high-level language
with some okay abstractions.

Is it underneath other high level languages? Maybe, if you mean that the
compiler might be written in C in order to bootstrap the language. Of course,
one could write the compiler in any language; it's all about translating
programmer-friendly symbols into assembly or VM bytecode, right? And speed of
compile is a different subject from speed of the compiled program.

But here is the gotcha on raw performance: Your large C program poorly
implements a percentage of what other high level languages are capable of
doing quickly and securely.

I once foolishly argued in favor of C's performance, saying that one could
write a layer that supports all these nice features speedily, such as the data
structures I will mention below as well as GC; by the time you do that, you
might as well be using a different language. You probably implemented that
layer poorly, compared to other languages with large communities pounding at
and optimizing that layer. For example, when you implemented your "fast" list
with the basic struct and next pointer, did you also implement the new-node
creation in such a way that it still uses raw malloc(), as opposed to managing
previously-malloced memory efficiently?

How many implementations of a basic list do we need in C? Super large
integers? Fixed-point integers? Growable arrays? Lazy/infinite lists? Trees?
Hash maps? Surely you don't think these other language designers said to
themselves, "Let's support hash maps and make them slow." No, they came up
with a fast standard, supported by their language, sometimes complete with
various configuration options to make all the tradeoff decisions on making
those data structures speed-efficient or memory-efficient for reads or writes.
Others, of course, subscribe to a religion, er, a specific tradeoff, such as
perl's approach to {}s ("There's more than one way to do it" ... unless you
are dealing with hash tables).

What about all the wonderful memory management you can do in C? Aren't you
closer to the metal that way, able to make basic memory allocation super
speedy? Not really. This is part of the illusion. malloc() is slow enough that
developers have rewritten versions of it several times. ROM-based MUDs, for
example, manage their own memory, using an initial malloc, of course, but
regularly using their own set of allocators and deallocators (free_string,
str_dup, etc) on top of that allocation. There are these tricks and more in
high level languages, including the sharing of partial structures (kinda like
union but with more pointers and fewer bugs associated with those pointers),
allowing for resource allocation strategies that can be "faster than C".

If the argument in favor of C's speed at the end of the day is, "When we write
crappy programs with buffer overrun holes, memory leaks, and no error
handling, it's super fast!", we are (1) not comparing apples to apples and (2)
doing ourselves and our customers a grave disservice.

Let's be honest: C is no "closer to the metal" than other high level languages
(<http://news.ycombinator.com/item?id=3753530> and
[https://en.wikipedia.org/wiki/Low-level_programming_language#Relative%20meaning](https://en.wikipedia.org/wiki/Low-level_programming_language#Relative%20meaning)).
The days of manually XORing to assign 0 to a variable are well behind us.

Note... I don't mean all high-level languages. There are many very slow
implementations of these languages. Programmers are getting better at this
stuff in modern implementations, gcc included. And it is fair to say that
there are many things other languages can do that are, when you compare apples
to apples, faster than C, especially when you factor in modern JIT
compilation; and they also do some things slower than a similar function in C.

No, C isn't and won't be obsolete, not until people write popular OSes in
other human-readable languages, complete with a body of excellent libraries.
We operate in a world of legacy, working code.

Edit: Looks like we both must have not read the article before replying. The
author goes over many of these points.

~~~
haberman
> Let's be honest: C is no "closer to the metal" than other high level
> languages

This is dead wrong, and your links do not support it. This is exactly the kind
of statement that gets me grumpy.

Your link illustrates that an aggressive C optimizer can collapse a chunk of C
code down to something smaller and simpler than the original code. This is
true.

But what you said is that C is "no closer to the metal" than other high-level
languages. Let's examine this assumption.

Take this C function:

    
    
      int plus2(int x) { return x + 2; }
    

You can compile this down into the following machine code on x86-64, which
fully implements the function for all possible inputs and needs no supporting
runtime of any kind:

    
    
      lea    eax,[rdi+0x2]
      ret
    

Now take the equivalent function in Python:

    
    
      def plus2(x):
        return x + 2
    

In CPython this compiles down to the following byte code:

    
    
      3           0 LOAD_FAST                0 (x)
                  3 LOAD_CONST               1 (2)
                  6 BINARY_ADD          
                  7 RETURN_VALUE
    

Notice this is byte code and not machine code. Now suppose we wanted to
compile this into machine code, could we get something out of it that looks
like the assembly from our C function above? After all, you are claiming that
C is "no closer to the metal" than other languages, so surely this must be
possible?

The tricky part here is that BINARY_ADD opcode. BINARY_ADD has to handle the
case where "x" is an object that implements an overloaded operator __add__().
And if it does, what then? Surely just a very few instructions of machine code
will handle this case, if C is "no closer to the metal" than Python?

Well __add__() can be arbitrary Python code, so the only way you can implement
this BINARY_ADD opcode is to implement an _entire Python interpreter_ that
runs __add__() in the overloaded operator case. And the Python interpreter is
tens of thousands of lines of code in... C.

The end result is that writing the same function in C and Python is the
difference between two machine code instructions and implementing an entire
interpreter.

This is why I get grumpy when people deny that C is any different than other
high-level languages. While this is a somewhat extreme case, you could make a
similar argument about most operations that happen in other high-level
languages; similar constructs will very frequently have less inherent cost in
C.

~~~
gnuvince
The equivalent `plus2` OCaml function compiles to:

    
    
        camlAdd__plus2_1030:
        .L100:
        	addq	$4, %rax
        	ret
    

(It's using 4 instead of 2 because ints are tagged rather than boxed: OCaml
represents an int n as 2n+1, so a 1 in the last bit denotes an int and a 0
denotes an address, and adding 2 to the int means adding 4 to its tagged
representation.)

~~~
PaulAJ
Part of the point of the original article is that not even assembler is "close
to the metal" any more. How long does that fragment of assembly code take to
execute? Depends on whether the instructions are in the I-cache, whether some
previous branch prediction has failed, and whether the data are in the cache.
All this adds up to a couple of orders of magnitude.

------
jlarocco
I don't think his example is helping his argument at all.

He cherry picks optimizations for the Haskell, such as using
Data.Vector.Unboxed instead of the regular lists and removing calls to
isLetter, but then he rolls his own linked list and uses getc in the C
version. He doesn't even have the correct return type for main.

Haskell written by decent Haskell programmers is faster than C written by poor
C programmers. Not very surprising.

~~~
ozataman
I'm no C expert - too bad to hear that about his C code. However I can tell
people here that Vector.Unboxed is a very common optimization as soon as you
start thinking about performance in Haskell. Nothing "expertly" about it,
really. I, for one, use it in all of my computational Haskell code.

~~~
jlarocco
Fair enough.

My complaint is that the C code isn't given the same chance. He even calls out
that reading data with getc is a known performance problem, but then does it
anyway. Any book on learning C will point out that getc is slow for reading
lots of data, and fscanf or fread should be used instead.

~~~
gnuvince
From what I understood from his post, using a buffered input function would a)
make the code differ from the specification, and b) require more refactoring
than the Haskell code needed. b) seems particularly important when the program
is not <100 LOC, but hundreds or thousands of lines of code.

~~~
cube13
The problem is that his implementation is essentially a test of how he's using
libc versus how Haskell is using it.

The pointer arithmetic he's using shouldn't need to be optimized, since the
blocks he's malloc'ing and pulling from memory are small enough that they
should stay in the L1/L2 cache for the entire run (he's using 1k blocks of
data; most processors use 4k pages). There's almost no optimization to be done
there.

The biggest performance hit is actually the single character puts versus a
block read or write.

Haskell is probably implemented to read a large block of the file (or perhaps
the entire file) into memory, then parse it afterwards. That would be a minimal
number of read() system calls over the entire run. Versus the C code, where one
block's parse could be 1024 getc and putc calls.

~~~
dllthomas
It's not about system calls, it's about locks. The getc function _is_ buffered
(by default) - that's what's going on behind the scenes in that FILE
structure. What is slow about calling getc over and over is synchronization
around that FILE object (hence the existence of functions getc_unlocked, &c).

------
jacquesm
I'm in no position to judge the quality of the Haskell code (I couldn't
program my way out of a wet paper bag in Haskell) but after half a lifetime of
writing C for a living I can say with confidence that the C code is horribly
written and horribly inefficient.

It is tempting to pull the code and fix it.

If you want to compare two languages make sure that you are proficient in
both.

~~~
CountSessine
I agree - if you're comparing runtime implementations.

I think this is actually a useful comparison, though. I would argue it's a lot
easier to become a proficient, performance-conscious programmer in python,
java, or even Haskell, than C. And you're more likely to shoot yourself in the
foot with C.

From the point of view of someone who hasn't learned either language (maybe a
scientist or engineer looking to do some simulation work), the message here
is, "with the same time and effort, not only is Haskell as fast as C in many
cases, but in some cases it will actually be faster than the C code that you,
a beginner, can write."

------
metajack
A similar anecdote from my own experience:

Chesspark had a web and a win32 native client. It kept track of your friends
with a roster (the underlying stack was based on XMPP). As a way to make new
users feel welcome, I and a few coworkers were added to all new user's roster
(like MySpace Tom). It wasn't long before this overwhelmed our clients.

Each client was written by a different developer. One was in JavaScript, and
one in C. Due to some poor design choices on the C side, the JavaScript client
was substantially faster. I don't remember the exact benchmarks, but I think
it was close to 50x faster.

The JavaScript code took less time to develop and less time to fix. The C code
did eventually get fixed and outpace the JavaScript on raw speed, but if I
were doing it again, I never would have made a native client to begin with.

~~~
tedunangst
Good design choices are better than bad design choices?

~~~
PaulAJ
More like: good design choices are vastly more important than micro-
optimisation, and Haskell is a good design choice.

------
raphaelj
I want to point out something nobody ever talks about:

Efficient data containers.

For instance, Haskell and C++ come standard with a lot of these (maps, sets,
various linked lists, ...) whereas the C standard library doesn't come with
anything like this. Moreover, type parameters and templates in these languages
make such structures easier and safer to use.

I've seen C code where programmers used sub-optimal containers for the problem
(like arrays or simple linked lists) because they didn't want to depend on a
non-standardized library or implement a faster but also more complex container.
In C++ or Haskell, these efficient structures come for free.

Also, we can say the same about algorithms and concurrency. Haskell's awesome
safety and expressiveness made it really easy for me to implement such complex
yet efficient systems.

These two features of high level languages often lead to faster programs in
those languages.

~~~
Snoptic
Haskell maps, being persistent, aren't anywhere close to efficient for mutable
maps.

------
taylodl
The paragraph beginning with "To put it another way, C is no longer close to
the real machine" really drove the point home. The further C gets away from
the real machine then the less useful C will become. Higher-level languages
have the advantage that a compiler can more easily determine what the program
is attempting to accomplish and optimize the result for a specified
architecture. This will be very difficult to do for C. As a result I would
expect the performance of higher-level languages to start exceeding the
performance of C. I guess we're just going to have to change our conventional
wisdom when that time comes.

~~~
ajg1977
Well that point is complete nonsense so you should undrive it.

Just because a piece of code incurs hardware related performance issues does
not mean it's "no longer close to the real machine". Cache misses? Reorder
your data, or start inserting prefetch statements. Mispredicted branches?
Issue a hint, or structure your code better.

Both of these are profile-guided optimizations. It's very difficult for
compilers to optimize for access patterns that will only become clear in the
context of execution, and often depend on your target spec machine.

~~~
graue
But a JIT can perform such optimizations. Which doesn't help Haskell, but may
help JavaScript or Lua beat C in some cases.

~~~
ajg1977
I'm doubtful that a JIT could realistically adapt its output based on runtime
performance metrics, but if you have links I'd love to read more.

~~~
vsync
[http://en.wikipedia.org/wiki/Tracing_just-in-time_compilation](http://en.wikipedia.org/wiki/Tracing_just-in-time_compilation)

------
nemo1618
As many advocates of functional programming point out, in many cases the speed
of _development_ is more valuable than the running time of the code. The great
strength of Haskell and other FPs is their readability and modularity. Trying
to win people over with benchmarks is the wrong approach IMO.

~~~
PaulAJ
It's more a matter of trying to nix the "Haskell would be nice, but it's too
slow to be practical" line that many see as a killer.

~~~
chc
Does Haskell really have a reputation for being slow? In a world where Ruby is
the de facto standard for startups, I wouldn't think Haskell would have
anything to worry about.

~~~
chongli
It does. A lot of people have written ignorant blog posts whereby they show
their first "real" Haskell program is extremely slow compared to their heavily
optimized C version (with years of experience behind it).

A cursory glance at the code, however, reveals that their Haskell program is
using linked lists of linked lists of boxed arbitrary precision integers while
the C version uses a 2D array of ints.

Okay, perhaps that's a bit of an exaggeration but you get the idea.

------
TheNewAndy
Before you go and rewrite everything in Haskell for speed, it might be worth
trying to reproduce the results from here. I tried to and C was much faster
than Haskell:

[http://paulspontifications.blogspot.com.au/2013/01/when-haskell-is-faster-than-c.html?showComment=1358558313945#c3252681956440975671](http://paulspontifications.blogspot.com.au/2013/01/when-haskell-is-faster-than-c.html?showComment=1358558313945#c3252681956440975671)

~~~
vilhelm_s
When I try it, the Haskell version is faster (1.3s versus 2.6s on a 98MB input
file). GHC 7.0.4 with -O3, GCC 4.6.2 with -O3. Not sure why these results are
different...

------
Xcelerate
Hmm... it seems to me that whenever any article is posted that claims "X is
faster than C", there are immediately 40 replies saying "Well, the author's C
is horrible. If I wrote that, it would be much different."

Okay, as someone who has NOT been programming in C 8 hours a day for years on
end, I would actually like to see somebody do this -- to show me what GOOD C
looks like.

So if someone wouldn't mind, could you take his C code and show me the
improved version? This would really help me understand. (And I don't just mean
an example where one line could be improved; I'm talking about the whole
thing.)

~~~
jacquesm
Ok, I pledge to re-write this properly and to benchmark the current
implementation vs a nice one. I'll post the results. I need something to get
my mind off things and this is as good as any. It will take at least until
Monday (it is my son's birthday tomorrow).

~~~
ay
Hey Jacques, may I ask for a code review? :-)

<http://stdio.be/revseq.c>

Compiled with "gcc -pipe -Wall -O3 -fomit-frame-pointer -std=c99 -pthread" on
my Mac, it's about twice as fast as the blog author's version.

Time spent: coding, 30 minutes; bugfixing, 30 minutes.

I have a feeling there is some kind of catch in the description of the
algorithm in terms of implementing the output, but I for the life of me could
not grok whether they wanted me to parse the entry into these three pieces or
not...

EDIT to add: The only optimization I made was inlining the tightly-called
"subst" function, and I did it without any profiling (so the optimization
process literally took about 30 seconds :). Before inlining, this version was
still about 15% faster than the blog author's one.

~~~
jacquesm
Sure, I'll pick it up in the post. Neat little project this. I won't peek at
your code until I'm done.

~~~
ay
Thank you! It was indeed a nice little fun exercise. It is interesting what
bugs I made by being tired and not reading the task carefully / not thinking
clearly (yesterday was a bit of a long and stressful day):

1) My initial understanding was that I do need to reverse the order, yet
somehow after re-reading the article I understood the order does not need to
change, and the "reverse" in the name is some kind of jargon. This is quite
stupid, and probably not worth mentioning, if only to prove I was tired :)

2) missing that the first iteration of the "business logic" code in my case
happens before anything is filled in. Crash.

3) forgetting about the "\n"s - with rather funny "partially correct" output
effect.

Very much looking forward to seeing your code!

~~~
jacquesm
Hey Andrew,

Ok, I looked at your code. What you really should do (before coding up the
solution) is to look at the problem specification. Other than that I like the
'direct' approach, it isn't quite as fast as what I cooked up but yours is a
lot shorter.

~~~
ay
Thanks! Yes, this goes to show the perils of coding after a 12h work day on
Friday :-). It affected my use of fgets() I/O (I understood you _have_ to
call the line-buffered routines based on their description).

Enjoyed reading your code. Beautiful. Thanks!

------
confluence
I want an article that says "Why it doesn't matter what's faster than what -
just ship product people use". I code everything I create using the language
that gives me the least amount of friction between me and the working product.

On Android that's Java/C/C++, for my servers it's Python/Ruby/Java/C++/C
(everything is service oriented consumable APIs), for the browser it's
Javascript. Any time I want something faster - I recode and bind it in C/C++
if at all possible.

I feel like a lot of these language comparison posts are just a massive
pissing contest. All that matters is that people use the thing that you made.
Billions of lines of code have been written in the past. Make sure your code
isn't part of the billion that nobody cares about.

------
Yttrill
I'm afraid the argument that C+ASM is always faster is flawed in reality. Pure
ASM, with a bit of C thrown in, maybe, but this is just as impractical for
complex codes as C itself is.

It is well known that for numerical codes Fortran beats the pants off C. Why
is this? Because the structure of C programs proves difficult to optimise
automatically. Indeed the C committee attempted to address one of the main
problems by introducing the restrict keyword (the problem of course is
aliasing).

For complex codes, ASM isn't an option. For large functions, high levels of
optimisation aren't an option for C because C compilers are incapable of
optimisation in a reasonable time frame: I have short code that cannot be
compiled in less than 2 minutes on either gcc or clang. Full alias analysis
requires data flow which is cubic order on function size and C compilers are
incapable of partitioning functions to keep the cost down.

Furthermore, C has a weak set of control operations and an object model which
is generally inappropriate for modern software. K&R C compilers were hopeless
because the ABI required the caller to push and pop arguments to support
varargs, preventing tail-recursion optimisation of C function calls.

Subroutine calling is useful, but it is not the only control structure.
Continuation passing, control exchange, and other fundamentals are missing
from C. These things can always be emulated by turning your program into one
large function, but then, it isn't C and it cannot be compiled because the
best C compilers available cannot optimise large functions.

Similarly, complex data structures which involve general graph shapes require
garbage collection for memory management. With C that's not built in so you
have no choice but to roll your own (there is no other way to manage a graph).
It's clear that modern copying collectors will beat the pants off C in this
case.

C++ pushes the boundaries. It can trash C easily because it has more powerful
constructions. It had real inlining before C, and whole program compilation
via templates. It is high enough level for lazy evaluators to perform high
level optimisations (expression templates) C programmers could never dream of.
And C++ virtual dispatch is bound to be more effective than roll your own OO
in C, once the program gets complex because the C programmer will never get it
right: the type system is too weak.

Many other languages generate C and have an FFI, some, like Felix, go much
further and allow embedding. Indeed, any C++ program you care to write is a
Felix program by embedding, so Felix is necessarily faster than C by the OP's
argument: C++ is Felix's assembler.

As the compiler writer I have to tell you that the restriction to the weak
C/C++ object model is a serious constraint. I really wish I could generate
machine code to get around the C language. It's slow. It's hard to express
useful control structures in. It tends to generate bad code. With separate
compilation, bad performance is assured.

I am sorry but the OP is just plain wrong. C is not assured to be faster; on
the contrary, it's probably the worst language you could dream up in terms of
performance. The evidence is in the C compilers themselves. They're usually
written in C, and they're incapable of generating reasonable code in many
cases, and impossible to improve because C is such a poor language that all the
brain power of hundreds of contributors cannot do it.

Compare with the Ocaml compiler, written in Ocaml, which is lightning fast and
generates reasonable code, all the time: not as fast as C for micro-benchmarks,
but don't even think about solving complex graph problems in C; the Ocaml GC
(written in C) will easily trash a home-brew collection algorithm.

Compare with the ATS(2) compiler, written in Ocaml (ATS), which by using
dependent typing eliminates the need for the run time checks that plague C
programs, given the great difficulty of reasoning about the correctness of C
code. ATS generates C, but you would never be able to hand write that same C
and also be confident your code was correct.

Compare with Felix, compiler written in Ocaml, which generates C++, can do
very high level optimisations, which can embed C++ in a more flexible way than
a mere FFI, and which provides some novel control structures (fibres,
generators) which you'd never get right hand coding in C.

The bottom line is that OP's claim is valid only in a limited context. C is
good for small functions where correctness is relatively easy to verify
manually and optimisation is easy to do automatically, and any decent C code
generating compiler for a high level language will probably generate C code
with comparable performance.

So the converse of the argument is true: good high level languages will trash
C in all contexts other than micro tests where they will do roughly the same.

~~~
DannyBee
It's not actually the structure of C programs, it's the guarantees the
language offers. So only about the first paragraph of your rant is right.

Fortran had almost no memory aliasing, explicit global accesses, and offered
almost unbridled implementor freedom. As long as the operations got done, it
didn't care what happened behind the scenes.

None of the rest of the things you talk about matter when it comes to
optimizing C, to be honest. If you gave me no aliasing by default and explicit
globals, I probably could do near as well as fortran (though it would take
significantly more analysis) in terms of loop transformations.

Note that "full alias analysis" is statically undecidable. When you say cubic
order, you are thinking of Andersen's subtyping based algorithm. There are
unification based algorithms that are almost linear time (inverse Ackermann).

At this point, we have scaled these algorithms almost as far as you can on a
single computer. You can do context insensitive andersens on many million LOC
without too much trouble.

Context-insensitive unification points-to can scale to whatever size you like.

Context sensitive unification based algorithms do quite well in practice with
10 million LOC + codebases.

The main reason you don't see unification based algorithms used often in free
compilers is because the entire set of algorithms are covered by patents owned
by MS Research.

As a final note, note that C++ does not really help optimization in practice,
it often hurts it.

It is very hard to teach a pointer analysis algorithm about virtual calling.
Most compilers treat them like function pointers that get type-filtered, and
do some form of class hierarchy analysis to limit the number of call graph
targets, or try to incrementally discover the call graph. It's a bit of a
mess.

On the other hand, straight function pointers get resolved either context-
sensitively or insensitively.

In fact, C++ makes type based aliasing a _lot_ worse due to placement new
being able to legally change the type of a piece of memory, which is very hard
to track over a program.

Even outside the realm of alias analysis, C++ involves a lot more structures,
which means a lot more time has to be spent trying to get pieces back into
scalars, or struct splitting, or something, so that you don't end up having to
fuck around with memory every time you touch a piece of the structure.

I could go on and on.

In short: Any of C++'s lower level optimization advantages come from less
pointer usage by programmers, not language guarantees.

At the high level, it's from better standard implementations and common usage
idioms.

In any case, high level languages, particularly those with memory objects (not
real pointers, just memory objects, like Java) usually solve none of the
pointer/alias analysis related problems. You are still stuck with the same
pointer analysis algorithms.

For example: The only nice thing about java's memory system is that doing
structure-field sensitive pointer analysis can only help, whereas in C it can
hurt, due to some weirdness.

It's just nobody usually gets around to doing pointer analysis on the higher
level languages (because it's harder and offers no particular benefit), they
lower their language to an IR that already has a good algorithm in it.

Just in case you were wondering, I'm not talking out of my ass. I wrote GCC's
first set of high level loop optimizations, and also its pointer analysis.

~~~
srean
Lots of interesting information. However, I think you over-reacted about
Fortran. After reading the parent comment and your comment, it seems both of
you are saying the same thing: Fortran has the advantage of having no pointer
aliasing.

Regarding unification based algorithms, does Microsoft use any of them in their
F# compiler? I ask because they have from time to time said they won't sue you
over F# technology. I don't know how much of those sweet nothings are binding.
Given your knowledge about compilers and legal systems I am very curious to
hear your opinion.

~~~
DannyBee
I don't know off hand if they use it. I know they do in static analysis tools.
As nice as MS is, they seem to consider compilers solely a cost center. Their
compilers produce "relatively good code", but have never really been state of
the art.

------
pekk
The contradiction of "C is always faster than everything" (apparently shown untrue by the comparison on the reverse-complement problem) is not "Haskell is faster than C"; it's merely "C is not always faster".

~~~
threedaymonk
Indeed not. The useful point this article makes, I think, is that if you're a
good Haskell programmer and a competent C programmer, you can produce working
code in both, but you can still produce better-performing code in Haskell.

Essentially, the lesson is that there's little point writing code in C for
performance unless you're good at it. If you can write good-enough code in a
higher-level language that you're happy with, you might already have reached
an optimum.

------
sampo
> The winningest programs are always written in highly optimised C

Actually, for example looking at the winningest programs for x64 single core
(<http://benchmarksgame.alioth.debian.org/>):

Ada: 1, C: 5, C++: 2, Fortran: 1, Haskell: 1, Java: 2, Javascript: 1

So plain C is only winningest in 5/13 = 38%, and C/C++ in 7/13 = 54%.

------
zxcdw
As always, talking about performance differences and optimization without thoroughly profiling and pointing out bottlenecks should be frowned upon, big time.

I would really love to see the generated assembly code and see _what_ makes
the difference in performance. Anyone up to analyzing?

------
chj
Another monthly claim of XXX is faster than C. Please improve your C
experience first.

------
martinced
I've got an honest question: it's not sarcasm...

How do you write fast multithreaded C-code? The article mentions that the C
code is too disconnected from the real hardware which, in this case, has
multiple cores.

Do you need to call (non portable?) code setting mutexes manually? (not that
it would be a problem)

How do you use the CPU's underlying CAS operation? (by inline assembly?)

As an example: the guys who wrote the very fast LMAX Disruptor pattern in Java relied on the fact that Java provides methods in the AtomicXXX classes that call CAS operations under the hood. But sadly they couldn't pick the exact CAS variant they wanted, which would have been faster than the one Java decided to use (it's an RFE if I recall correctly: they'd like Oracle to modify Java so that it uses the faster version when it makes sense).

I take it that in C you can inline assembly and do as you want!?
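To the question above: since C11 you often don't even need inline assembly, because `<stdatomic.h>` exposes compare-and-swap portably. A minimal sketch (the function name is mine) of a lock-free increment built on CAS:

```c
#include <stdatomic.h>

/* Lock-free increment via compare-and-swap: read the current value,
   then retry until it is still current at the moment we swap in
   the incremented value. */
void atomic_increment(atomic_int *counter) {
    int expected = atomic_load(counter);
    /* The weak form may fail spuriously, so we loop; on each failure
       'expected' is refreshed with the value actually observed. */
    while (!atomic_compare_exchange_weak(counter, &expected, expected + 1))
        ;
}
```

When C11 atomics are not available, GCC and Clang offer `__atomic_*`/`__sync_*` builtins, and as a last resort you can indeed drop to inline assembly and pick the exact instruction yourself.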

------
DannoHung
The program in question copies data from stdin, does the barest minimum of
preprocessing, statically remaps all characters, and writes to stdout...

Why on earth is this a demonstration of what Haskell can do effectively?
There's no room to exploit anything interesting about type level reasoning.
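For reference, the remap described above reduces to a 256-entry lookup table. A rough C sketch of that core (entries shown are the basic DNA complements; the actual benchmark also reverses each sequence and handles more letter codes):

```c
#include <stddef.h>

/* 256-entry translation table: identity by default, with the
   complement pairs patched in. */
static unsigned char tbl[256];

static void init_table(void) {
    for (int i = 0; i < 256; i++)
        tbl[i] = (unsigned char)i;
    const char *from = "ACGTacgt", *to = "TGCATGCA";
    for (size_t i = 0; from[i]; i++)
        tbl[(unsigned char)from[i]] = (unsigned char)to[i];
}

/* Remap a buffer in place, one table lookup per byte. */
static void complement(unsigned char *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        buf[i] = tbl[buf[i]];
}
```

The whole program is essentially this loop wrapped around buffered reads and writes, which is why it mostly measures I/O handling rather than anything language-specific.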

------
pretoriusB
> _Conventional wisdom says that no programming language is faster than C, and
> all higher level languages (such as Haskell) are doomed to be much slower
> because of their distance from the real machine._

No: conventional wisdom just says that no higher level programming language is
consistently faster than C AND/OR better to reason about with regards to
memory consumption and runtime behaviour.

(Conventional wisdom also puts Fortran, Forth, Ada and C++ in the same
"speedy" category).

Conventional wisdom adds that micro-benchmarks of some BS outlier examples
(Java where the JIT can take advantage of some known condition to do something
clever, etc) do not matter in real life programs, which are far more complex.

Conventional wisdom also adds that for C to be speedier, it doesn't even have to be highly optimized or carefully crafted by some C programming wizard. Merely avoiding gross mistakes (like using an algorithm of the wrong complexity for the job) will do.

Conventional wisdom concludes that writing your high level language in an unconventional (non-idiomatic) way to get to "as fast as C" speeds is bullshit too, because it doesn't represent idiomatic (and far more common) high level language use.

------
dakimov
1) There is the C++ language. Will you ever remember it? Stop comparing everything to C: it is an oversimplified, outdated language with specific uses in low-level system programming where you don't need complex data structures or high-level application logic.

2) If you summarize all the benchmarks comparing C to other languages, it
turns out that the C language is one of the slowest. Of course, that's hardly
the case. It's just the dudes who did the benchmarks suck in C/C++ and suck in
programming in general.

3) Haskell makes you 10 times more productive without microoptimizing? ORLY? Try doing some big enough data manipulation with C++11 & Boost vs Haskell. In C++ you're done with the task, since you get decent performance from the simplest naive, very high-level code. In Haskell you're only at the beginning of your optimization journey, because the simplest code is not good enough, and after optimizing you end up with code that is much more complex, cluttered and harder to understand than the C++, with performance that is still only comparable to the naive C++ version if you're lucky.

