

Phantom Types in Rust - 0x1997
http://bluishcoder.co.nz/2013/08/15/phantom_types_in_rust.html

======
tel
Phantom types are used frequently in Haskell to really great effect. Perhaps
the most well-known example is the ST monad which allows you to do mutable,
destructive updates in a "pure" fashion by containing them all within a narrow
region. From outside of that region, the effects are invisible and
referentially transparent. This is great for embedding your favorite
iterative/mutable algorithm in a pure program.

Phantom types, together with rank-2 quantification, let the compiler force
these impure regions to stay contained. They statically prevent variables or
results from being shared between ST monad invocations.

To do so, most operations are polymorphic in the phantom variable. For
example, `writeSTRef ref x >> readSTRef ref` has type `ST s Int` for
`x :: Int` and `ref :: STRef s Int`. The `s` corresponds to whichever ST
region this action gets run within.

Running ST looks like `runST :: (forall s. ST s a) -> a`, which means that the
ST action you pass to `runST` must be completely agnostic about its choice of
region. Another way of thinking about it is that the `s` parameter is chosen
adversarially by the compiler, so only ST actions that can run in any region
are allowed to run at all.
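
Rust has no `forall` over types in argument position, but a higher-ranked
lifetime bound (`for<'id>`) can play a role similar to `runST`'s rank-2 type.
The following is a minimal sketch under that assumption, not code from the
article: `Brand`, `Region`, `Ref`, `run_region` and `sum_to` are hypothetical
names, the "region" is just a `Vec<i64>`, and the invariant lifetime `'id`
stands in for the phantom `s`:

```rust
use std::marker::PhantomData;

// Invariant lifetime "brand" (hypothetical): `'id` appears in both argument and
// return position of the fn type, so the compiler can neither shrink nor grow it.
#[derive(Clone, Copy)]
struct Brand<'id>(PhantomData<fn(&'id ()) -> &'id ()>);

// Hypothetical mutable region, playing the role of `ST s`.
struct Region<'id> {
    cells: Vec<i64>,
    brand: Brand<'id>,
}

// Hypothetical handle into one particular region, like `STRef s Int`.
#[derive(Clone, Copy)]
struct Ref<'id> {
    index: usize,
    _brand: Brand<'id>,
}

impl<'id> Region<'id> {
    fn new_ref(&mut self, value: i64) -> Ref<'id> {
        self.cells.push(value);
        Ref { index: self.cells.len() - 1, _brand: self.brand }
    }

    fn read(&self, r: Ref<'id>) -> i64 {
        self.cells[r.index]
    }

    fn write(&mut self, r: Ref<'id>, value: i64) {
        self.cells[r.index] = value;
    }
}

// Analogue of `runST :: (forall s. ST s a) -> a`: the closure must work for
// every 'id, and `R` is fixed outside the `for<'id>`, so it cannot mention 'id.
fn run_region<R>(body: impl for<'id> FnOnce(&mut Region<'id>) -> R) -> R {
    let mut region = Region { cells: Vec::new(), brand: Brand(PhantomData) };
    body(&mut region)
}

// Sums 1..=n using a mutable cell that never escapes its region.
fn sum_to(n: i64) -> i64 {
    run_region(|region| {
        let acc = region.new_ref(0);
        for i in 1..=n {
            let current = region.read(acc);
            region.write(acc, current + i);
        }
        region.read(acc)
    })
}

fn main() {
    println!("{}", sum_to(10)); // prints 55
    // Rejected by the compiler: the result type would have to mention the
    // closure's own 'id, just as `runST` rejects a leaked `STRef s`.
    // let leaked = run_region(|region| region.new_ref(1));
}
```

Because `Brand` is invariant in `'id`, handles from two nested regions also
cannot be mixed, which mirrors the "no sharing between invocations" guarantee.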

------
yebyen
I loved this blog. BluishCoder was my go-to for Factor articles when I was
learning Factor. At least I thought it was. Browsing the titles under the
Factor tag, I don't recognize a single article, and every link I've tried
clicking through ends up at an nginx 404.

Sadly! Books and their covers... I'm sure they were good.

I did find the article I was thinking of, but keyword search failed me today.
I was looking for "monotonically increasing timers" and the correct search was
"fast now":

[http://re-factor.blogspot.com/2011/03/fast-now.html](http://re-factor.blogspot.com/2011/03/fast-now.html)

Chris D, are you here reading comments? Can you bring back your Factor
articles?

~~~
doublec
Yes, it's unfortunate that I lost a bunch of articles to drive corruption on
Linode: [http://bluishcoder.co.nz/2013/04/03/parts-of-this-site-tempo...](http://bluishcoder.co.nz/2013/04/03/parts-of-this-site-temporarily-down.html)

I've been going through archive.org recovering posts when I can. I do have a
bunch of the factor articles in my project that produces a PDF from them:
[https://github.com/doublec/factor-articles](https://github.com/doublec/factor-articles)

~~~
yebyen
Thanks very much!

------
valtron
(I don't know Rust)

So I assume the compiler infers the type of `TI(1)` as `T<int>` and
`TS(~"Hello")` as `T<~str>` because...?

What if you defined:

    
    
        enum T<A> {
          TX(int, ~str),
          TY(~str, int)
        }
    

What's the inferred type of `TX(1, ~"a")`?

~~~
andolanra
No, the compiler doesn't infer the types quite like you're imagining. It is
possible, depending on usage, to have TI(1) of type T<~str>, or TS(~"Hello")
of type T<int>. The following is actually _still legal_ in the first example,
and will result in a runtime error:

    
    
        let a = TI(1);
        let b = TS(~"foo");
        let z = concat(a, b);
    

The reason is that the Rust compiler is trying to work out the types for you.
In this case, it sees that concat requires two T<~str>s and infers logically
that both a and b must have the type T<~str>... even though one of them was
created with TI.
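
For readers on current Rust: the `~str`/`int` syntax above predates Rust 1.0,
and today an unused type parameter on an enum is rejected, so a phantom
parameter is usually carried with `std::marker::PhantomData`. A sketch under
that assumption (`Raw`, `ti`, `ts`, `as_str` and `mixed_compiles` are
hypothetical names; `concat` returns `None` where the original failed at
runtime) reproduces the inference behaviour described here:

```rust
use std::marker::PhantomData;

// Untyped payload; the phantom parameter lives on the wrapper below.
enum Raw {
    TI(i64),
    TS(String),
}

// `PhantomData<A>` records `A` without storing a value of that type.
struct T<A> {
    raw: Raw,
    _marker: PhantomData<A>,
}

// Generic constructors: nothing in the argument fixes `A`, so inference
// picks whatever the use site demands, as in the 2013 code.
fn ti<A>(i: i64) -> T<A> {
    T { raw: Raw::TI(i), _marker: PhantomData }
}

fn ts<A>(s: &str) -> T<A> {
    T { raw: Raw::TS(s.to_string()), _marker: PhantomData }
}

// Only accepts "string" values at the type level, yet must still check tags.
fn concat(lhs: T<String>, rhs: T<String>) -> Option<T<String>> {
    match (lhs.raw, rhs.raw) {
        (Raw::TS(a), Raw::TS(b)) => Some(ts(&(a + &b))),
        _ => None, // the original example hit a runtime failure here
    }
}

fn as_str(t: &T<String>) -> Option<&str> {
    match &t.raw {
        Raw::TS(s) => Some(s.as_str()),
        Raw::TI(_) => None,
    }
}

fn mixed_compiles() -> bool {
    let a = ti(1); // inferred as T<String> because of the call below
    let b = ts("foo");
    concat(a, b).is_none() // type-checks, but the tags do not match
}

fn main() {
    println!("{}", mixed_compiles()); // prints true
}
```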

If you look at the example with a compile error, you'll notice that the two
things being concatenated there were themselves the results of calls to
`plus` and `concat`, which are guaranteed to produce T<int> and T<~str>. You
could also guide the compiler by providing the types manually:

    
    
        let a : T<int>  = TI(1);
        let b : T<~str> = TS(~"foo");
        /* now this produces a compile-time error */
        let z = concat(a, b);
    

Or by writing wrapper functions over the constructors:

    
    
        fn makeTI(i : int) -> T<int> { TI(i) }
        fn makeTS(s : ~str) -> T<~str> { TS(s) }
        ...
        let a = makeTI(1);
        let b = makeTS(~"foo");
        /* this also now produces a compile-time error */
        let z = concat(a, b);
    

Other languages with phantom types use something called Generalized Algebraic
Data Types (GADTs), which (to handwave a bit) are something like the latter
solution, except built directly into the data type's constructors rather than
written by hand as extra functions.
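
In modern Rust the wrapper-function approach can be pushed a little further
by making the fields private, so the typed constructors are the only way to
build a value from outside the module. This gets part of the way toward what a
GADT guarantees, although the compiler still cannot prove the `unreachable!`
arms impossible. A hedged sketch; `typed`, `make_ti`, `make_ts`, `int_value`
and `str_value` are hypothetical names, not the article's code:

```rust
mod typed {
    use std::marker::PhantomData;

    enum Raw {
        TI(i64),
        TS(String),
    }

    // Private fields: outside this module, a `T` can only come from the
    // typed constructors below, so `A` always matches the payload's tag.
    pub struct T<A> {
        raw: Raw,
        _marker: PhantomData<A>,
    }

    pub fn make_ti(i: i64) -> T<i64> {
        T { raw: Raw::TI(i), _marker: PhantomData }
    }

    pub fn make_ts(s: &str) -> T<String> {
        T { raw: Raw::TS(s.to_string()), _marker: PhantomData }
    }

    pub fn plus(lhs: T<i64>, rhs: T<i64>) -> T<i64> {
        match (lhs.raw, rhs.raw) {
            (Raw::TI(a), Raw::TI(b)) => make_ti(a + b),
            _ => unreachable!("a T<i64> is only ever built by make_ti"),
        }
    }

    pub fn concat(lhs: T<String>, rhs: T<String>) -> T<String> {
        match (lhs.raw, rhs.raw) {
            (Raw::TS(a), Raw::TS(b)) => make_ts(&(a + &b)),
            _ => unreachable!("a T<String> is only ever built by make_ts"),
        }
    }

    pub fn int_value(t: &T<i64>) -> i64 {
        match &t.raw {
            Raw::TI(i) => *i,
            Raw::TS(_) => unreachable!(),
        }
    }

    pub fn str_value(t: &T<String>) -> &str {
        match &t.raw {
            Raw::TS(s) => s.as_str(),
            Raw::TI(_) => unreachable!(),
        }
    }
}

use typed::{concat, int_value, make_ti, make_ts, plus, str_value};

fn main() {
    let n = plus(make_ti(1), make_ti(2));
    let s = concat(make_ts("foo"), make_ts("bar"));
    println!("{} {}", int_value(&n), str_value(&s)); // prints "3 foobar"
    // Compile-time error: expected `T<String>`, found `T<i64>`.
    // let bad = concat(make_ti(1), make_ts("foo"));
}
```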

------
Anderkent
How does the compiler know that TI(1) is of type T<int>, and TS(~"foobar") is
T<~str>? Does it infer the type of the enum constructors from their
arguments? Does that mean the arities must be the same?

Say something like:

    
    
        enum Node<A> {
          StringNode(Node, Node, ~str),
          IntNode(Node, Node, int),
          VoidNode(Node, Node)
        }
    

Is there any way of declaring that StringNode gives you Node<~str>, and
IntNode gives Node<int>?

~~~
tel
I'm not really sure of Rust's semantics, but this sort of thing is achieved
using GADTs in Haskell. They look like this:

    
    
        {-# LANGUAGE GADTs #-}
    
        data T a where
          TI :: Int    -> T Int
          TS :: String -> T String
    

which lets you reflect a value up into its type. Something similar but more
implicit seems to be going on with that Rust example.

Another method is kind of a hack: unify the type variable with an unused
argument. This looks like:

    
    
        data T a b = T b              -- `a` is phantom; we want a way to choose it
    
        setA :: a -> T a b -> T a b
        setA _ t = t                  -- Just returns the second argument,
                                      -- BUT tells the compiler that `a` unifies
                                      -- with the type of the first argument
    
        > :t setA (undefined :: Int) (T "foo")
        setA (undefined :: Int) (T "foo") :: T Int String
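
The same unification trick carries over to Rust's type inference, although
there a turbofish (`new_t::<i32, _>(...)`) is the more idiomatic way to pick a
phantom parameter. A small sketch; `T`, `new_t`, `set_a` and `phantom_name`
are hypothetical names, and `std::any::type_name` is used only to make the
chosen type visible (its exact output is not a stable guarantee):

```rust
use std::any::type_name;
use std::marker::PhantomData;

// `A` is phantom: no value of type A is stored.
struct T<A, B> {
    value: B,
    _marker: PhantomData<A>,
}

fn new_t<A, B>(value: B) -> T<A, B> {
    T { value, _marker: PhantomData }
}

// Like `setA`: returns `t` unchanged, but forces the phantom `A` to unify
// with the type of the otherwise unused first argument.
fn set_a<A, B>(_witness: A, t: T<A, B>) -> T<A, B> {
    t
}

// Reports which type the phantom parameter was resolved to.
fn phantom_name<A, B>(_t: &T<A, B>) -> &'static str {
    type_name::<A>()
}

fn main() {
    let t = set_a(0_i32, new_t("foo"));
    println!("{} {}", phantom_name(&t), t.value);

    // Idiomatic alternative: name the phantom type directly.
    let u = new_t::<i32, _>("foo");
    println!("{}", phantom_name(&u));
}
```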

------
Symmetry
Does anyone here know why they call those structures Enums rather than Unions?

~~~
chongli
It's rather infuriating. An Enum is supposed to be just a Union of nullary
constructors. What they're doing here is using the word Enum where they really
ought to say Union.

This sort of overloading of language happens all over the place in programming
and it leads to endless confusion. I find it extremely frustrating but what
can I do about it?

~~~
sixbrx
I was put off slightly at first by the name as well, but not so much now. It
_is_ a generalization of the enums from C and its descendants.

Also, Enum seems a good name because these types represent fixed alternatives
that can be (finitely) enumerated: we're saying "Let me _enumerate_ the
alternatives that this type may hold." Whereas "union" seems to speak to the
underlying implementation as a tagged, overlapping union (which may or may not
hold). That is, that usage of union says much less about the intended
semantics and is more of a possible implementation detail. Besides, if you
look at the actual meaning of "union", it doesn't really seem to fit. "Sum"
might be closer, but again that's leaning towards the theoretical, and in that
case you'd probably also want to rename struct to "product" for symmetry.
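
A small illustration of the sense in which Rust's `enum` generalizes C's (not
from the thread; `Color`, `Shape` and `area` are made-up names): a C-style
enum has only nullary variants with integer discriminants, while the general
form lets variants carry data, giving a tagged union whose tag every `match`
checks:

```rust
// C-style enum: only nullary variants, each with an integer discriminant.
enum Color {
    Red,
    Green,
    Blue,
}

// The generalization: variants may carry data (a tagged union / sum type),
// and nullary variants can still appear alongside them.
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
    Empty,
}

// The compiler requires every variant to be handled.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
        Shape::Empty => 0.0,
    }
}

fn main() {
    // Discriminants default to 0, 1, 2 as in C.
    println!("{} {} {}", Color::Red as i32, Color::Green as i32, Color::Blue as i32); // prints "0 1 2"
    println!("{}", area(&Shape::Rect { width: 2.0, height: 3.0 })); // prints 6
    println!("{}", area(&Shape::Circle { radius: 1.0 }));
    println!("{}", area(&Shape::Empty)); // prints 0
}
```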

~~~
chongli
Hey, I'd be all for having them called sum and product types. But then why not
go full out and have (generalized) algebraic data types?

