
Why OCaml GADTs matter for performance - michaelochurch
https://blogs.janestreet.com/why-gadts-matter-for-performance/
======
kragen
I'm still not quite sure what GADTs are (after skimming the WP article,
reading this article, and skimming the code in it) but very surprisingly to
me, it looks like OCaml's GADT implementation tackles a lot of the memory
fragmentation problems that come with object-graph-heap memory-model languages
like Lisp, Python, Java, previously OCaml, JavaScript, and so on, in exchange
for a little more type declaration overhead. (Although, in my small
experience, debugging OCaml type errors usually involves adding some redundant
type declarations anyway.)

This is really interesting and I hope I understand it someday soon!

~~~
tel
Generalized Algebraic Data Types, GADTs, are algebraic data types which
include both (a) existential types and (b) type equalities. The combination
allows for a certain kind of interaction between the specific kind of value
within a type and the type itself.

For instance, we might have a little expression language using normal ADTs [0]

    
    
        type exp = 
          | Int  of int               -- integer constants
          | Bool of bool              -- bool constants
          | Add  of exp * exp         -- addition
          | Neg  of exp               -- negation
          | Ifte of exp * exp * exp   -- if then else
          | Lte  of exp * exp         -- less than or equal to
    

but there's no particular guarantee that a value of `exp` is well-typed. For
instance, the following is a valid value of `exp`

    
    
        let x : exp = Add (Int 3, Bool true)
    

GADTs allow us to fix this situation by reflecting the type of the expression
value into the type of the expression. Here's some syntax

    
    
        type _ exp =
          | Int  : int  -> int exp
          | Bool : bool -> bool exp
          | Add  : int  exp * int exp -> int exp
          | Neg  : int  exp * int exp
          | Ifte : bool exp * 'a  exp * 'a exp -> 'a exp
          | Lte  : int  exp * int exp -> bool exp
    

The way to read this syntax is to see it as giving the "type" of each
constructor. Not only do we now parameterize the `exp` type with an index type
corresponding to the type of the value this expression denotes, we also create
very non-trivial type equations like

    
    
        Ifte : bool exp -> 'a exp -> 'a exp -> 'a exp
    

and

    
    
        Lte : int exp -> int exp -> bool exp
    

The upshot is that while the type of an interpreter for the first exp would
have to look like

    
    
        type value = VInt of int | VBool of bool
    
        val interpret : exp -> value option
    

the type of the second `exp` is so constrained as to eliminate (a) the need
for the separate `value` type and (b) the possibility of failure:

    
    
        val interpret : 'a exp -> 'a
    
        let interpret = function
          | Int i          -> i
          | Bool b         -> b
          | Add (i1, i2)   -> interpret i1 + interpret i2
          | Neg i          -> - (interpret i)
          | Ifte (c, t, e) -> if interpret c then interpret t else interpret e
          | Lte (i1, i2)   -> interpret i1 <= interpret i2
    

Such a construction is known to be type-error free because GADTs allow us to
ensure that type constructors follow certain type equations and it's possible
to write such a clean evaluator because GADTs keep around enough type
information so that the compiler can "trust" them.

Finally, the short answer of how they work is something like they interpret a
constructor like

    
    
        Ifte : bool exp * 'a exp * 'a exp -> 'a exp
    

as one with some existential types

    
    
        Ifte :    ...    | 'x exp * 'a exp * 'a exp 'a ex
    

and some type equalities

    
    
        Ifte : 'x == int |              ...
    

then when typechecking it just has to collect all the unique new type
variables and solve the system of equalities which arises.

[0] Yeah, the author pooh poohs at examples like this, but I think they're
really particularly clear.

~~~
gsg
I've come to think that these toy interpreter examples are a bit questionable
unless they contain a variadic structure like an arbitrary tuple - they give
the impression that programming with GADTs is simpler than it really is.

~~~
joeyh
Problem with toy AST examples is they're great if you're doing something like
that, but don't give much motivation if you're not. (Unless you starting
seeing all programming as AST manipulation, which can be an interesting
viewpoint.)

Here's an example of another way to use GADTs.
[http://joeyh.name/blog/entry/making_propellor_safer_with_GAD...](http://joeyh.name/blog/entry/making_propellor_safer_with_GADTs_and_type_families/)

More important that using GADTs for one thing or another is to start looking
at the type checker as something you use to assist you in writing the kind of
code you want to write, and avoiding the kind of code you want to avoid. GADTs
are one amoung many useful tools in the modern toolbox for that.

~~~
kragen
That’s a really fascinating post! Thank you! I’m still not sure I understand
GADTs (in part, here, because I know very little Haskell indeed) but this
helps.

Do GADTs allow Turing-complete computation at compile time, like C++
templates? Are there limits to what kind of correctness properties you can use
them to get compile-time machine-checked proofs of?

~~~
vilhelm_s
GADTs by themselves do not allow arbitrary computation[1]. But Haskell also
allows you to write functions at the type-level ("type families" in the
Haskell jargon), so you can directly write functions to compute types without
any trickery, and the Propeller example uses this feature.

GADTs let you write down basically the same things that you could write as an
inductive datatype in a dependent language like Coq and Agda. So (with some
patience) you can make a function return a "proof" of anything that can be
witnessed by a first-order inductive type, e.g. equations between expressions,
list membership, proofs that a term in an object-language is welltyped, etc.
The main limitation is that Haskell doesn't have any termination checking, so
if you consider it as a dependently typed language there is no way to write a
trustworthy proof that uses induction. Also its not very user-friendly, see
[2] for some examples.

[1] Basically, GADTs are equivalent to having equations between types. If all
the equations are written out in full, it is reasonably straightforward to
check whether two types are provably equal. If you want to do type
_inference_, i.e. figure out what equality assumptions a given function should
make in order to become welltyped, then you run into various undecidable
problems (e.g. "rigid E-unification"). My guess would be that if you wrote
down a completely general typesystem featuring GADTs, then you would be able
to encode arbitrary computations as type-inference problems, but the encoding
would be very clumsy. The Haskell and Ocaml implementations deliberately don't
try to be very general in the type inference, and instead try to pick a
decidable subset, so the programmer has to add enough type annotations to
guide it.

[2]
[https://personal.cis.strath.ac.uk/conor.mcbride/pub/hasochis...](https://personal.cis.strath.ac.uk/conor.mcbride/pub/hasochism.pdf)

------
AceJohnny2
GADTs are
[http://en.wikipedia.org/wiki/Generalized_algebraic_data_type](http://en.wikipedia.org/wiki/Generalized_algebraic_data_type)

~~~
marktangotango
Upvoting, so obnoxious of the author of the post to not define them.

~~~
adultSwim
I can't tell if you are serious or not. OP goes through them by example.

Original type:

    
    
      type 'a t = | Array of 'a array
                  | Bytes of bytes
    

GADT'd (same functionality as before):

    
    
      type 'a t = | Array : 'a array -> 'a t
                  | Bytes : bytes -> 'a t
    

Now using power of GADT to add constraint:

    
    
      type 'a t = | Array : 'a array -> 'a t
                  | Bytes : bytes -> char t
    

Just jumping into it seems like a fine choice to make. Allows for a (narrow)
introduction to GADTs without getting bogged down in their full
complexity/limitations. Shows that it's useful before/without a theoretical
discussion.

Still, I wouldn't have minded a little discussion introducing GADTs.

~~~
AceJohnny2
In the original version of the article (when I wrote my comment), the author
hadn't defined them.

------
mercurial
It would have been interesting to get actual performance numbers with the
article.

~~~
throwaway000002
The specific performance aspect is, essentially, immaterial. The idea, in
over-simple terms, is say you have Foo<T> but for say T=X there is a faster
specialized way to implement Foo<X>, well you can do this in the type system
of Ocaml via GADTs and still write generic type-safe code for Foo<T> and the
compiler does the right thing when when it happens that T=X.(+)

So you get to have your cake and eat it too.

(+) without brittle, verbose, horrific re-factoring required to do similar
things in simpler type systems.

~~~
kragen
Yes, having a faster specialized way to implement Foo<X> is great. But the
reason it's great is "the specific performance aspect". If the cost of using
the faster specialized way is, say, massive interpretive overhead, then it may
not actually be faster, and then there's no reason to use it.

GADTs actually seem to go in the other direction, though — they seem to enable
the compiler to prove _more_ at compile-time and thus generate code with
_less_ run-time tag checking.

But some hard performance numbers would go a long way toward proving that.

~~~
mercurial
Especially since GADTs don't do too well with type inference. My take on
"premature optimization is the root of all evil" is that I will take any
"cheap" premature optimization which doesn't impact readability, but I'd like
to be sure there is an actual performance improvement first.

