

Measuring the Haskell Gap [pdf] - dons
http://www.leafpetersen.com/leaf/publications/hs2013/haskell-gap.pdf

======
breckinloggins
This is a great paper; it is very readable and well motivated, and I learned
quite a bit. I'm also now looking forward to perusing the 2012 Ninja C paper.

One small change I would make to the preprint would be to normalize the graph
symbols better. In particular, readers who fail to read the legend of each
graph carefully might misattribute results in subsequent graphs (for example,
my mind wanted to associate Intel's HRC with "the lighter gray ones with the
boxes", which is not a stable representation across all graphs).

~~~
yalue
I would add to this comment that I would appreciate labels for the y-axes of
the graphs.

~~~
jfarmer
The y-axis is unit-less. They normalized the run times by the "Normal C" run
time, so that, e.g., 2.0 means "took twice as long as Normal C."

------
arocks
It is interesting to read how Haskell optimized the algorithm based on the
intrinsic properties of the data structures. In contrast, the C compilers
leveraged knowledge of the underlying machine. It is amazing how far Haskell
compilers have come.

------
berkut
It's a good conclusion, I feel: this is always the issue with language
benchmarks - who wrote the code, and how good they were with each of the
languages.

Similarly, as the article points out, the compiler matters a lot: ICC can in
certain cases be more than 200% faster than GCC with similar flags, and is
generally 15-20% faster anyway, mainly due to more intelligent inlining and
much faster (and more accurate with fpmath=fast) math libs.

~~~
copx
..only on Intel chips. It deliberately generates code that runs slowly on
non-Intel CPUs:

[http://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler#Criticis...](http://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler#Criticism)

As an AMD user I really hope most programmers know this by now. If you make a
build for the general public as opposed to only targeting Intel machines,
please don't use ICC.

~~~
berkut
Not since 2010 it doesn't:

[http://www.hardware.fr/articles/847-1/impact-compilateurs-
ar...](http://www.hardware.fr/articles/847-1/impact-compilateurs-
architectures-cpu-x86-x64.html)

It can generate code which, when run on AMD chips, is faster than what GCC and
MSVC produce.

~~~
copx
I don't read French, but Intel's current official compiler documentation...

[http://software.intel.com/sites/products/documentation/docli...](http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-
win/index.htm)

...suggests nothing has changed. Search for "non-Intel".

Maybe ICC generates code which beats GCC even when run on an AMD chip in one
particular benchmark but that doesn't mean it generates better code in
general.

Personally I will never trust the Intel compiler, because it's part of their
business strategy to generate bad code for AMD processors.

Even if the claim in the original post about being "generally 15-20% faster"
were true for Intel chips, either it wouldn't be 15-20% faster on AMD, or
Intel's documentation - which clearly states that the compiler generates
inferior code for non-Intel chips - is wrong.

~~~
berkut
You could look at the graphs - it's pretty obvious ICC wins in almost all
benchmarks on all processors, not just one benchmark.

Intel state that they do different optimisations for different chips, and I'd
guess the compiler does so based on how many load/store ports there are, as
this seriously affects fp throughput.

These change per chip - e.g. the Core i7 Sandy Bridge doubled the number of
front-end float load ports from Nehalem, so more OOO execution can be done,
and thus the compiler can generate code differently to take this into account.

You can't expect Intel to optimise their compiler very thoroughly for all of
their competitors' processor models.

~~~
copx
>Intel state they do different optimisations for different chips,

The documentation says _more highly optimized for Intel® microprocessors than
for non-Intel microprocessors._ again and again. Not just different, inferior.

Also the optimization notice mentioned in the Wikipedia article I linked, the
one Intel was mandated to add by the courts is still there. Maybe I should
quote the current version in full:

 _Intel's compilers may or may not optimize to the same degree for non-Intel
microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction
sets and other optimizations. Intel does not guarantee the availability,
functionality, or effectiveness of any optimization on microprocessors not
manufactured by Intel. Microprocessor-dependent optimizations in this product
are intended for use with Intel microprocessors. Certain optimizations not
specific to Intel microarchitecture are reserved for Intel microprocessors.
Please refer to the applicable product User and Reference Guides for more
information regarding the specific instruction sets covered by this notice._

>You can't expect Intel to optimise very thoroughly their compiler for all the
processor models of their competitors.

As I said, I don't - I expect it to generate bad code for the competition's
products. And that's what it did and does. AMD dragged Intel to court over
this and won. That's why the compiler documentation is now full of these "non-
Intel" disclaimers.

ICC exists to sell Intel processors, one should always remember that. Intel
isn't trying to make money selling compilers..

~~~
berkut
> Not just different, inferior.

Because they're not going to spend time working out by trial and error (it's
possible based on timing and evaluating code) how many float ops / cycle each
AMD chip can do. For their own chips they know the numbers themselves.

So they make an assumption for non-Intel chips. Maybe they assume 2 when some
AMD chips can do 4.

> ICC exists to sell Intel processors

And strangely enough, if you use ICC you'll generally be getting better code
out the other end _regardless_ of what chip you run it on compared to the
other two major compilers.

~~~
aidenn0
I don't know the current state of ICC, but previously it would ignore the
instruction set the CPU claimed to support and not use a large fraction of SSE
instructions on non-Intel CPUs that supported them.

Centaur did a study where they changed their CPUID vendor string to claim to
be an Intel part, and they got a significant performance boost when running
code compiled with ICC.

------
joelthelion
This is a nicely done benchmark, and an impressive demonstration of HRC.
Thanks for posting!

------
mhaymo
I'm surprised by how dramatic the difference is between the speed of C and
Haskell. One of my professors (at the University of Glasgow, so appropriately
a Haskell fan) once claimed that it had "C-like performance".

I suppose that's the point of this paper, though: "C-like performance" is a
terribly vague term, meaningless without knowledge of the specific comparisons
being made.

~~~
octo_t
For lots of algorithms, fairly naive Haskell can get very close in performance
(within 10%) to pretty decent C or C++.

For example, [1] shows that for very advanced algorithms (such as BLAS),
Haskell can be very performant - with the optimisation being _reusable_ and
transparent to the programmer.

[1] - [http://research.microsoft.com/en-
us/um/people/simonpj/papers...](http://research.microsoft.com/en-
us/um/people/simonpj/papers/ndp/haskell-beats-C.pdf)

~~~
strmpnk
That's kind of an odd comparison, using unfused C code compared to fused
Haskell code. Their point seems to focus on stream fusion's advantages and
possibly that optimizing C takes more effort.

To quote the paper: "Clearly “properly”-written C++ can outperform Haskell.
The challenge is in figuring out what “proper” means."

------
beefman
Summary: Intel's HRC (Haskell Research Compiler) is an optimizing compiler for
GHC's (Glasgow Haskell Compiler) "Core" intermediate language.* On six common
benchmarks, it improves the performance of Haskell dramatically. But Haskell
is still 4 times slower than the best C implementations of these benchmarks,
on average.

* Core is just desugared Haskell and should not be confused with GHC's other intermediate languages, STG and C--. And there is no relation to Intel's "Core" microarchitecture.

------
6ren
Scribd got better! It's comparable to, perhaps better than, Google's viewer:
[https://docs.google.com/viewer?url=http%3A%2F%2Fwww.leafpete...](https://docs.google.com/viewer?url=http%3A%2F%2Fwww.leafpetersen.com%2Fleaf%2Fpublications%2Fhs2013%2Fhaskell-
gap.pdf)

~~~
jkldotio
Isn't Google's viewer really the default PDF viewer in Chrome? The one that
doesn't add a toolbar, or tooltips that pop up despite my mouse not hovering
over them.

Scribd is for people who want to share a PDF but don't know how to do it in
any other way. It's pretty much never been welcome on HN because that's the
only problem it solves and everything else it does is inferior to just having
a native PDF you can view with no problems and save with no problems. We are
not the target market so I never understood why it was pushed on HN at all.

~~~
soganess
Chrome use an unbranded version of foxit[1][2].

[1][http://googlesystem.blogspot.com/2010/08/google-chromes-
pdf-...](http://googlesystem.blogspot.com/2010/08/google-chromes-pdf-plugin-
uses-foxit.html) [2] independently verified to me by a Foxit employee.

------
crncosta
Does anyone know if the authors are sharing the benchmarks' source code? Thanks.

------
yogsototh
I would be interested to see how JHC [1] compares to GHC. I am not sure Repa
could be easily compiled with JHC, though.

[1]: [http://repetae.net/computer/jhc/](http://repetae.net/computer/jhc/)

~~~
dumael
[http://mirror.seize.it/report.html](http://mirror.seize.it/report.html)

That report is quite old, though, and just compares compilers on the nofib
benchmark suite.

JHC can't compile Repa, as Repa requires multi-parameter type classes, which
JHC unfortunately doesn't support.

------
sirspazzolot
"DRAFT - Not for redistribution" haha

Very interested in seeing where Haskell is headed in the future. Major props
to Intel for the disclaimer that this isn't a definitive study.

