GHC's new cutting-edge dataflow-based code generator (haskell.org)
151 points by dons on Aug 4, 2012 | 36 comments



I just wanted to say that it's refreshing that I can write high level code and I can expect it to be fast.

I am just learning Haskell, but I like the fact that I can return very large structures (even infinite ones) and then operate on them much later, and it magically pipelines everything behind the scenes.
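For instance (a minimal sketch; the names are just illustrative), only the prefix of the "infinite" list that is actually demanded ever gets built:

    -- An "infinite" list of primes; laziness means only the part demanded
    -- downstream is ever constructed.
    primes :: [Int]
    primes = sieve [2 ..]
      where
        sieve []       = []
        sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

    main :: IO ()
    main = print (takeWhile (< 100) primes)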


Agreed. I know a lot of Ruby and a bit of C, and it's great that I can write quite high level abstractions that run a few orders of magnitude faster than the equivalent Ruby.

This is not to say that Haskell is always more performant. From the little I've done, it seems you need an awareness of when thunks will take up more memory than the results of your expressions, and how to write code accordingly.
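A minimal sketch of the usual example (nothing beyond base assumed):

    import Data.List (foldl')

    -- Lazy foldl builds a chain of unevaluated (+) thunks as long as the
    -- input list before anything is reduced, which can blow up the heap:
    sumLazy :: [Integer] -> Integer
    sumLazy = foldl (+) 0

    -- foldl' forces the accumulator at every step, so the running total
    -- stays a single small value:
    sumStrict :: [Integer] -> Integer
    sumStrict = foldl' (+) 0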


But could you really determine whether things would be efficiently pipelined if I gave you arbitrary Haskell code examples? Magic that sometimes works and sometimes doesn't is fun for small projects, but on a large project you really want predictable tools. You shouldn't need knowledge of the internals of the compiler in order to write efficient code.


"Magic that sometimes works and sometimes doesn't is fun for small projects..."

SQL has a magical compiler. It's actually a lot more magical in some ways than GHC, because good implementations use statistics on the actual input data and a sophisticated cost-based optimizer. And SQL is certainly suitable for large projects.

I mostly program in C, so you don't have to convince me that predictability of code fragments has its benefits.

But it has a lot of costs, too. Sometimes, good performance fundamentally requires a lot of complexity, so if your compiler is not helping you (being "magical") then the complexity is forced on the programmer. Once the programmer takes on that complexity, they also take on the burden of additional maintenance cost and additional cost to add new capabilities forever.

So, you have to consider whether you really have a net increase in predictability by using more predictable (but less helpful) tools. People complain about garbage collection and virtual memory and NUMA for the same reasons, but those are ubiquitous now -- dealing with the annoyances simply takes less effort than managing the memory yourself.

You could argue that laziness, in particular, is not worth the trade-off (though it has worked out pretty well for SQL).


SQL has EXPLAIN PLAN. SQL without EXPLAIN PLAN is useless, for exactly this reason: you have no idea how it will perform.


You can get GHC to output its various internal representations. I believe that, or something similar, can show you a lot about the kinds of transformations that were done.
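For example, GHC's -ddump-* flags print the intermediate code after various passes (a small sketch; the exact output format varies by GHC version):

    -- Example.hs: compile with
    --   ghc -O2 -ddump-simpl -ddump-stg -ddump-cmm Example.hs
    -- to see the Core, STG and Cmm that GHC actually produced, and hence
    -- which transformations (fusion, strictness, unboxing) fired.
    module Example where

    sumSquares :: Int -> Int
    sumSquares n = sum [x * x | x <- [1 .. n]]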

I'm not really a GHC expert, but that doesn't seem like a fundamental problem that can't be solved, and I assume it is (to some extent) solved already.


On a large project you optimize the hotspots using an excellent profiler...
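For example (a small sketch; the cost-centre name is just illustrative):

    -- Build with profiling support and run with +RTS -p:
    --   ghc -O2 -prof -fprof-auto -rtsopts Main.hs
    --   ./Main +RTS -p   -- writes Main.prof with time/alloc per cost centre
    module Main where

    expensive :: Int -> Int
    expensive n = {-# SCC "expensive" #-} sum [x * x | x <- [1 .. n]]

    main :: IO ()
    main = print (expensive 1000000)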


You also want reasonable and understandable default behavior so you can avoid problems before they even occur.


Different languages make different tradeoffs.


[deleted]


"Goebbels of Haskell"? What the hell is wrong with you? Assholes like you are why this site is going down the toilet.


[deleted]


Are you fucking serious? Goebbels used propaganda to further the cause of genocide. Don Stewart thinks Haskell is pretty cool and shares that with the world. The fact that you think the two are even slightly comparable...


Ok. Godwin's law. You just lost.


It's not that easy to see efficiency from the source code even in a low-level language like C. It shouldn't really be the programmer's job to tell from the source code whether things get executed efficiently. Profilers are for that, and if you really want to know what gets executed, you can always read the generated assembly. Code should express the intention, not the implementation of the execution. That is what high-level languages are for.


In SQL, you don't even specify the algorithm; the cost-based optimizer chooses the algorithm for you (that is, if it's a good SQL implementation).


Except that it isn't magic. It is simply a different method of evaluation. Calling this "magic" is the same as calling strict evaluation in C++ magic.


I am talking about things like strictness analysis, automatic unboxing, etc., not lazy evaluation. And even an 'optimization' as simple as common subexpression elimination can introduce space leaks in a lazy language, e.g. http://hackage.haskell.org/trac/ghc/ticket/917.
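A sketch of the kind of program where CSE can hurt (illustrative only; whether GHC actually shares the expressions depends on flags and version):

    -- Two syntactically identical subexpressions:
    main :: IO ()
    main = print (length xs1, sum xs2)
      where
        xs1 = [1 .. 10 ^ 7] :: [Integer]
        xs2 = [1 .. 10 ^ 7] :: [Integer]

    -- Normally each list is consumed and garbage collected incrementally,
    -- in constant space. If CSE shares one list between both traversals,
    -- the whole 10^7-element list stays alive from the length traversal
    -- until the sum traversal: a space leak introduced by an "optimization".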


I don't understand. Strictness analysis and automatic unboxing are "free" optimizations and are directly related to laziness, so you can't talk about one without the other.


To clarify, I meant that even if you exclude the claimed unpredictable performance of lazy evaluation itself, you still have the unpredictability of optimizations that are performed on top of lazy evaluation like strictness analysis and unboxing.


My point was simply that the unpredictability of these optimizations is irrelevant in my opinion. They simply make your programs run faster. I rarely if ever have seen a library that relies on these optimizations triggering. Usually strictness, when absolutely necessary, is specifically annotated by the developer with bang patterns. Same with unboxing with the UNPACK pragma.
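For instance (a minimal sketch):

    {-# LANGUAGE BangPatterns #-}

    -- The bang pattern forces the accumulator on every step, so no thunk
    -- chain builds up, regardless of what strictness analysis infers:
    sumBang :: [Int] -> Int
    sumBang = go 0
      where
        go !acc []       = acc
        go !acc (x : xs) = go (acc + x) xs

    -- UNPACK asks GHC to store the fields as raw unboxed doubles inside
    -- the constructor, rather than as pointers to boxed values:
    data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double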


If everything is annotated then obviously you don't have predictability problems. I was talking about relying on automatic optimizations. In practice, do Haskell users rely mostly on the explicit annotations rather than the 'magic' optimizations?


I think if you polled a bunch of devs, you'd come up with a bunch of things you "shouldn't have to do" in order to write efficient code. It varies by framework, and GHC/Haskell is generally unapologetic about what you have to do to write fast Haskell programs. It is a language nerd's language, after all.


The really important advice boils down to simple things, like not using the default String type for heavy text processing, since it is just a [Char]. In any language, a string represented as a linked list is going to be slow. It is still useful, just slow. Use Data.Text instead and you'll likely see a massive performance gain when processing a lot of text, due to the use of proper data structures and algorithms.
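A minimal sketch (assuming the text package):

    import qualified Data.Text as T
    import qualified Data.Text.IO as TIO

    -- String is [Char], a linked list of boxed characters; Data.Text is a
    -- packed, array-backed representation, far better for heavy processing.
    countWords :: T.Text -> Int
    countWords = length . T.words

    main :: IO ()
    main = TIO.getContents >>= print . countWords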

Having a solid grasp of the fundamentals gets you quite far.


Is there any good documentation for the GHC architecture for someone who is relatively familiar with modern (ML) compiler architecture theory, but not practice? Or should I just start reading the cited papers for STG and the rest of the GHC design?


Volume II of “The Architecture of Open Source Applications” has a good introduction to GHC's architecture:

http://www.aosabook.org/en/ghc.html


beautiful :-) didn't know there was a volume 2.


The GHC Commentary contains a wealth of information: http://hackage.haskell.org/trac/ghc/wiki/Commentary


I'm familiar with Haskell but that's about all the context I have. Anyone care to translate, summarize, or contextualize?


GHC currently has an excellent semi-whole-program optimizer for Haskell-level code, and it has LLVM for optimizing the generated code at the bottom. What has been missing is an intermediate-level optimizer that works on the low-level intermediate code (Cmm) before it reaches LLVM, and that is aware of the properties of the Haskell runtime: e.g. immutability and lack of aliasing (properties LLVM can't see). With this last piece, all stages of code generation get heavy-duty optimizer treatment.

The new code gen is also pluggable, so you can add new optimizer passes easily (e.g. by implementing textbook algorithms).
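As a rough illustration of the kind of textbook pass involved (a self-contained toy, not GHC's actual API): backwards liveness analysis over a tiny straight-line instruction set.

    import qualified Data.Set as Set

    type Reg = String

    -- A tiny instruction set, standing in for low-level intermediate code.
    data Instr
      = Move Reg Reg      -- dst := src
      | Add  Reg Reg Reg  -- dst := a + b
      | Ret  Reg          -- return the value in reg

    -- One backwards transfer step: drop what the instruction defines,
    -- add what it uses.
    transfer :: Instr -> Set.Set Reg -> Set.Set Reg
    transfer i live = case i of
      Move d s   -> Set.insert s (Set.delete d live)
      Add  d a b -> Set.insert a (Set.insert b (Set.delete d live))
      Ret  r     -> Set.insert r live

    -- Live registers before each instruction, computed back to front.
    liveness :: [Instr] -> [Set.Set Reg]
    liveness = init . scanr transfer Set.empty

    main :: IO ()
    main = mapM_ print (liveness [Move "x" "a", Add "y" "x" "b", Ret "y"])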


Heh, 'semi-whole' seems to be a bit of an oxymoron. Could you explain exactly what that means? Do you just mean that it looks at your whole program but not the libraries you use?


Basically, it inlines across module boundaries (except where such modules are mutually recursive).
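For example, marking a definition INLINABLE records its full unfolding in the module's .hi interface file, so call sites in other modules can inline and specialise it (a small sketch; the module and function names are illustrative):

    module Util (dot) where

    -- The unfolding of dot is exported in Util.hi, so an importing module
    -- can inline it and specialise away the Num dictionary.
    {-# INLINABLE dot #-}
    dot :: Num a => [a] -> [a] -> a
    dot xs ys = sum (zipWith (*) xs ys)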


I don't think cyclic imports are allowed in Haskell.


Sure they are, and there's full cross-module inlining too.


Since when? I just merged 3 files together because GHC complained of the cyclic imports. I would love to pull them apart again.



Right, .hs-boot files. I was being naive and trusting the compiler warnings.
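For reference, a minimal sketch of how GHC handles a module cycle with an .hs-boot file (the file names and contents are illustrative):

    -- A.hs-boot: a cut-down "boot" interface for A, used to break the cycle
    module A where
    data T

    -- B.hs: imports the boot interface instead of the real A
    module B where
    import {-# SOURCE #-} A
    wrap :: T -> Maybe T
    wrap = Just

    -- A.hs: the real module is then free to import B
    module A where
    import B
    data T = T Int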


As far as I know it looks at out-of-module calls and does some optimization, but can't run the full optimization suite on outside modules.



