

The Illustrated GHC [pdf] - dons
http://takenobu-hs.github.io/downloads/haskell_ghc_illustrated.pdf

======
amelius
As someone interested in functional programming and compilers, I recently
tried to use GHC's intermediate output. However, I got a little disappointed
about the documentation that's available for compiler writers. From what is
essentially a research compiler, I'd expected a little higher standards, also
in the department of available tools, and examples. In other words, the
experience came too close to what I would call "hacking".

Furthermore, the "Core" language is, I believe, for many purposes, more
complicated than necessary (e.g., it is typed). It would be nice if there were
a simpler alternative, e.g., for small projects.

Of course, I could be wrong about this (perhaps I looked in the wrong places),
but this is just what I noticed.

Besides this, of course, Haskell is a cool language, and GHC an awesome
compiler! :)

~~~
thoughtpolice
> As someone interested in functional programming and compilers, I recently
> tried to use GHC's intermediate output. However, I got a little disappointed
> about the documentation that's available for compiler writers. From what is
> essentially a research compiler, I'd expected a little higher standards,
> also in the department of available tools, and examples. In other words, the
> experience came too close to what I would call "hacking".

We don't really emphasize or support people using the intermediate languages in
an external way very well. They're parts of the compiler that are intimately
and deeply tied to other internals elsewhere, and it's not really an explicit
design goal that those components be reusable in a general way.

Now, people do use GHC for this (years ago I even helped write a compiler that
transformed GHC's core language into a whole-program IR that was compiled to
C), and we do have an API you can use to leverage the compiler, but overall
the number of people using it for things like this, versus using it for things
like dynamic loaders or typechecking utilities (such as ghc-mod), is very
small.

In fact, just last year we removed 'External Core' from GHC, which was a way
of serializing the Core representation to disk. Why? Because it was actually
broken for close to two years I think, and nobody ever really complained or
fixed or wanted to support it! And after a discussion, we didn't want to
support it either. It has been used, but when it bitrots that badly, I think
it's clear this isn't one of the largest driving use cases for the compiler.

That said, we could improve the documentation a lot, and add a lot more
examples. But I don't think there ever has been (or probably will be) a huge
push to make the core IRs reusable in an easy way. It's simply not a high
priority design goal.

> Furthermore, the "Core" language is, I believe, for many purposes, more
> complicated than necessary (e.g., it is typed). It would be nice if there
> were a simpler alternative, e.g., for small projects.

The Core language being typed is a good thing for GHC - it makes it easy to
turn on an internal typechecker and determine if the compiler has produced
invalid IR, your optimization pass did, etc. It adds a lot of safety to your
produced programs when you can ensure they type-check in a sane way.

This component, along with the core linting passes, has likely caught an
_innumerable_ number of optimization and desugaring bugs over the years, while
being extremely low cost to support and use. Overall, typed Core has been a
very huge win for GHC, and I don't ever see us adopting a different route. In
fact, it's planned for our Core language to get an even fancier (dependent)
type system not far off in the future. :)
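
To make the point about typed IRs concrete, here is a minimal sketch of the
idea, not GHC's actual Core or its linter. Every name in it (Ty, Expr, lint,
buggyBeta, checkedPass, sample) is hypothetical and made up for illustration.
It shows how re-typechecking the output of a pass can catch a broken
transformation, here a "beta reduction" that forgets to substitute. In GHC
itself the corresponding check is switched on with the -dcore-lint flag.

```haskell
-- Toy typed IR. Every name here is hypothetical; this is not GHC's Core.
data Ty = TInt | TBool | TFun Ty Ty
  deriving (Eq, Show)

data Expr
  = Var String
  | IntLit Int
  | BoolLit Bool
  | Lam String Ty Expr   -- the binder carries its type
  | App Expr Expr
  | Add Expr Expr
  deriving (Eq, Show)

type Env = [(String, Ty)]

-- "Lint": recompute the type of a term, failing on any inconsistency.
lint :: Env -> Expr -> Either String Ty
lint env expr = case expr of
  Var x        -> maybe (Left ("unbound variable: " ++ x)) Right (lookup x env)
  IntLit _     -> Right TInt
  BoolLit _    -> Right TBool
  Lam x t body -> TFun t <$> lint ((x, t) : env) body
  App f a      -> do
    tf <- lint env f
    ta <- lint env a
    case tf of
      TFun argTy resTy
        | argTy == ta -> Right resTy
        | otherwise   -> Left "argument type mismatch"
      _ -> Left "applying a non-function"
  Add l r      -> do
    tl <- lint env l
    tr <- lint env r
    if tl == TInt && tr == TInt
      then Right TInt
      else Left "Add expects two Ints"

-- Deliberately buggy pass: "beta-reduces" by dropping the argument
-- without substituting it, which leaves the bound variable free.
buggyBeta :: Expr -> Expr
buggyBeta (App (Lam _ _ body) _) = body
buggyBeta e                      = e

-- Lint before and after a pass, and insist that the type is preserved.
checkedPass :: (Expr -> Expr) -> Expr -> Either String Expr
checkedPass pass e = do
  before <- lint [] e
  let e' = pass e
  after <- lint [] e'
  if before == after
    then Right e'
    else Left "pass changed the type of the program"

-- (\x : Int -> x + 1) 2
sample :: Expr
sample = App (Lam "x" TInt (Add (Var "x") (IntLit 1))) (IntLit 2)
```

In GHCi, checkedPass id sample returns the program unchanged, while
checkedPass buggyBeta sample is rejected with Left "unbound variable: x".
Note that a check like this only catches bugs that break typing; a pass that
picks the wrong but equally typed result would still slip through.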

~~~
amelius
I agree that the number of potential users for this particular use of GHC is
small. But as with everything, it is the small groups of people that produce
the most interesting things ;)

I don't think an external format (serialized Core) is even necessary. Just a
well-documented API would be nice. I'm hoping that somebody could write a set
of examples, and put those in a test-suite so that it keeps getting updated
whenever something breaks it.

I also agree that typing is extremely useful. But for small research projects,
it can get in the way of the actual goal.

~~~
thoughtpolice
Yes, I suppose my larger point was more to illustrate that nothing is free.
GHC is software like any other piece of software and it has a lot of competing
constraints we must balance.

If the cost of supporting reusable IRs like you want is high in terms of LOC
and needed changes, but the number of people who will leverage that work is
incredibly small, it's honestly not clear to me if making that change and
maintaining it is sensible or worthwhile. It might be worth it purely from a
code cleanliness standpoint if it did a lot of cleanup, but that's not
immediately clear either. We have to maintain everything people submit to us,
presumably forever - the barrier to entry for large changes should be high,
and well-motivated.

It is similar with external core; it was several thousand extra lines of
code throughout the compiler that were rarely - if ever - used by anyone. Did a
few people use it to do cool work - for example, by writing the Intel Research
Haskell compiler? Absolutely, and Intel's results were incredible - but that
does not mean it's worth keeping that ball of code around for that one use
case.

I'd actually argue keeping it for that one case would be a terrible decision
for everyone, and when I deleted that part of the compiler, Intel's compiler
lingered in my mind, as it was a user. But the fact is nobody was maintaining,
submitting patches or using it in a modern GHC, including them - so I rm -rf'd
it.

GHC is a research compiler, but it is also a production compiler for many
users, and a project several dozen of us work on.

So while I'm sure your work is incredibly interesting (most of our large
features come from people doing interesting research, so I understand your
plight!), this really doesn't mean large, sweeping changes on our part - for
your one case - are very worthwhile at all for us. But maybe you don't need
sweeping changes, either!

(Of course, it's impossible to say how many people _did not_ use GHC because
these constraints made it unsuitable for their projects - but we have to draw
a line somewhere and balance our own needs vs those of others.)

> Just a well-documented API would be nice.

To be clear - what API are you exactly envisioning? An API that makes it easy
to construct Core or STG programs using GHC's ASTs, and then feed them through
the rest of the compiler? Or an API to do things like construct and manipulate
the same core representation GHC uses, to later do whatever you want with,
perhaps in a separate tool?

It sounds like this is something you could tackle with a library that wrapped
the GHC API. The problem is the Core representation in GHC is very prone to
change, so I'm not sure there's any sane way to really maintain things like
API stability between versions. But you also might not need that - just a
simple API around "GHC Core version X.Y.Z" to build programs may be enough.

We could distribute such a library with GHC, but this sounds like something
easily doable out-of-band first, at least.

Either way, something like this I can surely see the need for. Where it should
live is a different question.

> I also agree that typing is extremely useful. But for small research
> projects, it can get in the way of the actual goal.

Yes, but again, GHC's internal needs and developer needs outweigh those of
random small research projects that might use it - the fact that it makes some
kinds of work slightly harder is unfortunate, but in the grand scheme of
things, this isn't really a concern at all for any of us, and likely never
will be. There's no free lunch, I'm afraid.

~~~
amelius
I would be mostly interested in reading the Core or STG representation, so I
can apply transformations to them. Of course, it would also be nice to
generate intermediate code from scratch, but that is not an obstacle at this
point (as it would be easy to just generate
Haskell code instead).
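
As a rough illustration of the kind of transformation I mean, here is a
minimal sketch over a toy expression type. It is not GHC's Core or STG, and
all names in it (Term, bottomUp, simplify, optimise) are hypothetical. It
only shows the shape of the work: read a tree, apply a local rewrite rule at
every node, and get a new tree back.

```haskell
-- Toy expression language; hypothetical names, not GHC's Core or STG.
data Term
  = Lit Int
  | Ref String
  | Plus Term Term
  | Times Term Term
  | Let String Term Term
  deriving (Eq, Show)

-- Apply a rewrite rule to every node, children first.
bottomUp :: (Term -> Term) -> Term -> Term
bottomUp rule term = rule (descend term)
  where
    go = bottomUp rule
    descend t = case t of
      Plus a b  -> Plus (go a) (go b)
      Times a b -> Times (go a) (go b)
      Let x e b -> Let x (go e) (go b)
      _         -> t

-- One local rule: fold arithmetic on literals and drop identities.
simplify :: Term -> Term
simplify t = case t of
  Plus (Lit a) (Lit b)  -> Lit (a + b)
  Times (Lit a) (Lit b) -> Lit (a * b)
  Plus e (Lit 0)        -> e
  Times e (Lit 1)       -> e
  _                     -> t

-- The whole "pass" is the local rule applied everywhere.
optimise :: Term -> Term
optimise = bottomUp simplify
```

For example, optimise (Let "x" (Plus (Lit 1) (Lit 2)) (Times (Ref "x") (Lit 1)))
gives Let "x" (Lit 3) (Ref "x"). Keeping the traversal separate from the rule
means a new transformation only needs a new rule, which is what makes the
number of cases in the real syntax tree the hard part.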

I'm not looking for high-performance solutions, just some tools to make
researching new ideas a little simpler.

To give you some idea about the things I'd like to do:

* Transforming programs into "incremental programs", so they run faster on changing inputs.

* Automatically translating functional data-structures to other languages (C++)

* Sending code over the network
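
For the last item, a minimal sketch of what I have in mind, again on a toy
term type rather than anything from GHC. The names Code, encode and decode are
hypothetical, and the derived Show/Read text format is just the simplest thing
that round-trips, not a proposal for a real wire format.

```haskell
import Text.Read (readMaybe)

-- Toy term type; hypothetical, not GHC's Core.
-- Derived Show and Read give a textual encoding for free.
data Code
  = Num Int
  | Name String
  | Call Code Code
  | Fn String Code
  deriving (Eq, Show, Read)

-- Turn a term into text that can be written to a socket or a file.
encode :: Code -> String
encode = show

-- Parse the text back; Nothing if the input is not a well-formed term.
decode :: String -> Maybe Code
decode = readMaybe
```

With this, decode (encode t) is Just t for any term t, and malformed input
yields Nothing instead of a crash. A real system would also need versioning
and some way to name the free variables the receiver is expected to provide.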

I tried using the GHC parser directly, but that route turned out to be too
much work (too many cases to handle). Then I looked at Core, but the
documentation was lacking or not up to date.

So, from where I stand, the internal documentation is the biggest problem,
and, according to good software engineering practices, there can be no excuse
for having no good internal documentation :)

Besides just plain documentation, a tutorial with examples would be nice, but
I can understand if that is not a priority.

Anyway, keep up the good work :) I'm happy with GHC as it is. I just wish it
were a little more "open".

------
hardwaresofton
Fantastic in-depth PDF. While lacking the talk (as others have mentioned), I
was able to determine in broad strokes what's going on at the lower layers of
Haskell. I particularly found the FFI section (e.g., the breakdown of
a call to getLine) extremely interesting/useful.

While I think real understanding will come if/when I have to actually wrestle
with the lower layers of GHC (or am contributing to the source code or
something), this served as a great overview, and a quick explanation of how
Haskell "really" works.

------
jamesfisher
This looks great, but needs some interpretation, like an accompanying talk. I
got lost in box-and-line diagrams which I did not know how to interpret.

An introduction to GHC which I found helpful is its AOSA chapter:
[http://www.aosabook.org/en/ghc.html](http://www.aosabook.org/en/ghc.html)

------
mcguire
I haven't looked at the source; is GHC using _both_ C--
([http://www.cminusminus.org/](http://www.cminusminus.org/)) and LLVM?

