Hacker News new | past | comments | ask | show | jobs | submit login

IMO, the single most interesting thing about OCaml is its object model. On one hand, you have a fairly conventional setup with classes, methods, inheritance, generics etc. On the other hand, the entire system is structurally typed, with pervasive type inference. Recycling bits from my earlier comment on the subject...

    # let f obj x y = (obj#m x)#n y;;
    val f : < m : 'a -> < n : 'b -> 'c; .. >; .. > -> 'a -> 'b -> 'c = <fun>
Here I defined a function taking 3 arguments, called method m on obj, passing x as argument, then called method n on the result of that, passing y as argument.

Note the inferred type signature for this: the first argument is of type <m : 'a -> <n : 'b -> 'c; ..>; ..> - i.e. any object that has a method named m (and possibly some other unspecified members - that's what ".." means), with said method m having a signature that allows us to pass some 'a, and returning another object of type <n : 'b -> 'c; ..> - i.e. something with a method named n that accepts 'b and returns 'c. 'a and 'b' are then used as types of x and y, respectively, and 'c is the the result of f.




This is actually row polymorphism rather than structural subtyping. The difference is rather small, but important:

Under structural subtyping (or any kind of subtyping, for that matter), the subtyping relation admits a rule called "subsumption", which can lead to information loss. Consider this behavior, under structural subtyping:

    fun f {x : real, y : real} = {x = 2 * x, y = 2 * y};
    (* f : {x : real, y : real} -> {x : real, y : real} *)

    f {x = 2.0, y = 3.0, z = 4.0};
    (* it = {x = 4.0, y = 6.0}; *)
Above, when `f` is applied to a subrecord with an additional `z : real` field, the result is missing the extra field! This is because the subrecord was subsumed by the (super-) record type taken by `f`, leading to information loss. This is the reason why you often see people using up-casts in languages like Java (especially before generics were introduced) and Go (which is lacking generics for some befuddling reason): up-casts allow you to circumvent the type system and force a particular value to take a particular type, thus recovering the lost information, but can be unsafe.

Under row polymorphism, however, you can achieve basically the same effect, but you get to save any unused/irrelevant type information inside of a row variable (the `..` in your example) and include it in the result. Then, no information is lost, and no safety is compromised:

    fun f {x : real, y : real, R} = {x = 2 * x, y = 2 * y, R};
    (* f : {x : real, y : real, _ : ..a} -> {x : real, y : real, _ : ..a} *)

    f {x = 2.0, y = 3.0, z = 4.0};
    (* it = {x = 4.0, y = 6.0, z = 4.0}; *)
Users of languages like Lua or Python might be reminded of "duck typing", and not without reason: row polymorphism enables many of the same programming patterns/techniques/abstractions in typed languages that are enabled by duck "typing" in untyped ones. It's a shame that more typed languages don't incorporate the idea, because it's a really wonderful one, and it can lighten the relative "burden" of writing typed code in some cases :)

The folks at INRIA has come up with some really neat stuff, and, if you haven't already, it's definitely worth it to check out their publications, too!


Thank you for the clarification!

And, indeed, "static duck typing" is exactly what I'm thinking about when I'm dealing with OCaml objects.

FWIW, while it was relatively obscure, TypeScript is currently popularizing this concept (albeit with a lot less type inference). Although they also refer to it as "structural subtyping":

https://www.typescriptlang.org/docs/handbook/type-compatibil...

But if I understand correctly, they're basically doing the same thing that's going on here - the only difference is that the ".." is implicit in any TS interface type. Or am I wrong?


> Thank you for the clarification!

No problem! The difference between the two was something I personally struggled with at first, so now I feel like I've got a pretty good grasp on it. Funny how that works ;)

I don't see anything here in the TypeScript docs that looks like row polymorphism. Without a row variable somewhere in the type (even if it's inferred), I'm not sure it could be considered row polymorphism. However, they could be doing something fancy with their notion of type compatibility that avoids information loss without introducing row variables (which, I think, would similarly imply a lack of subsumption[0], and might not be sound, but the docs admit the system is unsound in places). It's also possible that every record type in TS carries an implicit row variable.

The important part is the (lack of) information loss, and good way to tell would be to construct a test case like the one I wrote above and see if you wind up losing your `z` field (information loss => subsumption => "by-the-book" subtyping).

This all reminds me that I've been meaning to spend some time with TypeScript :)

[0]: I've known type theorists who claim that without subsumption, there is no subtyping. I disagree because I think the fundamental/important part of subtyping is the subtyping relation itself.


I actually didn't fully understand your example originally (I missed the part where ".." now has a name!). I grok it now, and yeah, TS doesn't have it - there's simply no type-safe way to write a function like that (of course, TS being a strict superset of JS, you can still write it, it just won't typecheck).

It appears that this was discussed in the context of supporting the ES7 "spread" operator, since row variable is the obvious typed counterpart to that:

https://github.com/Microsoft/TypeScript/issues/2103

but they ended up implementing just the operator, without a way to reflect it in a function signature:

https://github.com/Microsoft/TypeScript/pull/11150

However: "I expect a spread type or something similar to be available in future versions of TypeScript."


Ah! Good to know :) TypeScript seems like a neat language, and I've got a JavaScript project coming up so I intend to look into it more deeply soon.

Yeah, in my example, `R` is the row variable itself, and it represents all those other, irrelevant values in the record, while I gave it the name `_` in the type itself, since rows themselves don't have field labels, but rather represent sets of labels. In ML, the labels of the fields are part of the type, but the labels of the rows are not part of the type. Then, `..a` is a type variable representing the type of the row (it gets a distinct name because you can imagine a function which takes two separate records with two separate rows[0], or even a function which takes two records but constrains them to the same type with a single row[1]), and is akin to a type variable used for parametric polymorphism, such as `'a`, and, in Standard ML with eqtypes, `''a`.

To the best of my knowledge, the term "row" comes from The Definition of Standard ML, where the grammar given for the core language includes productions called "TyRow", "PatRow", and "ExpRow", which correspond to the syntax of record types, patterns, and expressions, respectively, but only the part in between the `{` and `}`:

    TyRow  <- Label `:` Ty [ `,` TyRow ]
    PatRow <- Label `=` AtPat [ `,` PatRow ]
    ExpRow <- Label `=` AtExp [ `,` ExpRow ]
(The `At` prefix means "atomic", and has a specific meaning in the definition.)

So, then, a row variable is one which literally holds a sequence of label×type or label×value pairs. It sometimes even feels like it's a metavariable which holds a branch of the syntax tree the way a variable in a macro does, which might be another reason why row polymorphism feels so cool ;)

Sorry if I'm boring you at this point, I just find this stuff really fascinating and fun :)

———

[0]: Any two records of any two record types, as long as they each have both an `x : real` and a `y : real`, with intentional information loss:

    f : {x : real, y : real, ..a} -> {x : real, y : real, ..b} -> {x : real, y : real}

    (* f {x = 1.0, y = 1.2, z = 1.3} {x = 1.0, y = 2.0, w = 42}; *)
    (* ^ typechecks okay!
     * row type `..a` can be different from row type `..b`,
     * no problem
     *)
[1]: Any two records which both have the same type, which can be any record type that at least has an `x : real` and a `y : real`:

    f : {x : real, y : real, ..a} -> {x : real, y : real, ..a} -> {x : real, y : real, ..a}

    (* f {x = 1.0, y = 1.2, z = 1.3} {x = 1.0, y = 2.0, z = 0.0}; *)
    (* ^ typechecks okay! *)

    (* f {x = 1.0, y = 1.2, z = 1.3} {x = 1.0, y = 2.0, w = 42}; *)
    (* ^ type error!
     * the second record has a different type than the first,
     * but the type of `f` demands that they be the same type,
     * since the single row `..a` must match itself
     *)


A largely unrelated question: do you know why most ML dialects have this weird scoping rule for record members, where they share the same scope as the record itself (and so another record cannot easily reuse the same field name)? It seems awfully inconvenient, and it feels like it could be trivially resolved by allowing reuse with an explicit disambiguation syntax. Or am I missing something?


Which dialects? Both Standard ML and OCaml can handle multiple record types with homonymous fields just fine:

    $ poly
    Poly/ML 5.5.2 Release
    > type point2 = {x : real, y : real};
    type point2 = {x: real, y: real}
    > type point3 = {x : real, y : real, z : real};
    type point3 = {x: real, y: real, z: real}
    > let val pt = {x = 1.0, y = 2.0} in #x pt end;
    val it = 1.0: real
    > let val pt = {x = 3.0, y = 4.0, z = 5.0} in #x pt end;
    val it = 3.0: real
    >



    $ ocaml
            OCaml version 4.03.0
    
    # type point2 = {x : float; y : float};;
    type point2 = { x : float; y : float; }
    # type point3 = {x : float; y : float; z : float};;
    type point3 = { x : float; y : float; z : float; }
    # let pt = {x = 1.0; y = 2.0} in pt.x;;
    - : float = 1.
    # let pt = {x = 3.0; y = 4.0; z = 5.0} in pt.x;;
    - : float = 3.
    # 
The only nuance with record labels in ML, as far as I can remember, is that the type of projection operations can be ambiguous if the exact type of the record is unknown (e.g., in `(fn pt => #x pt)`, the type of `pt` cannot be inferred since it could be `point2`, `point3`, `{x : int}`, `{x : real}`, ...; but this can be disambiguated with a type annotation: `(fn (pt : point3) => #x pt)`).

Perhaps I've misunderstood the question. If so, can you give me an example?


This is weird. Last time I really did anything with ML was ages ago, and I distinctly remember fighting with the fields... I wish I could remember the context, though.

Anyway, good to know that it's definitely not OCaml!


I would say that the most interesting feature (for me) is the module system (although with structural typing as a close second) :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: