
Ask HN: Can I have heterogenous lists in a Lisp while preserving type inference? - urs2102
I'm trying to see if I can implement type inference on a toy Lisp, but I can't figure out what the type of (car '(x y z)) should be at compile time. For those with experience in dealing with types and Lisp: can I keep these heterogenous lists and Hindley-Milner's Algorithm W at the same time, or will they naturally be at odds (which is what I'm thinking)?
======
bjourne
I think you need to run dataflow analysis. There is a paper I read which
explains the approach, but I lost the link. It describes how to propagate
types to optimize code, but the exact same method can be used to type check
code too.

Basically the compiler annotates each value it deals with with what it knows
about it. So in your example:

    
    
        (sqrt (car '(1 2 "hello")))
    

First '(1 2 "hello") what do we know about it? The value is of type list, the
elements types are: integer, integer and string. Since it is a literal, we
even know the values: 1, 2, "hello".

We record it somehow. Next we apply car to that value. What do we know about
the resulting value? If car was a random function, we wouldn't know anything
about the result. But car is not random. It is one of the most common Lisp
functions so we have custom code for this "known" function that says that its
value is the first item of its first parameter.

Therefore the type of (car '(1 2 "hello")) is integer and the value is 1. Since
sqrt accepts an integer N >= 0 (let's ignore negative roots), the
expression (sqrt (car '(1 2 "hello"))) type checks! And if we have annotated
car and sqrt as side-effect free we can fold the whole expression to 1.

Note that if the expression was (sqrt (car '(x y z))) where x, y, z are unknown
variables then it would be a "maybe". We can't know if the code type checks or
not. Note also that if the expression was (sqrt (car '(-1234 y z))), then the
code would _not_ type check, as (sqrt -1234) has no type (ignoring complex
numbers here). That is strictly more powerful than ordinary type systems, which
can't determine that (sqrt -1234) is a programming error.
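
To make the idea above concrete, here is a minimal sketch in Python of what "annotating each value with what we know" could look like. It is not taken from the paper mentioned above; the names `Info`, `literal`, `variable`, `quoted_list`, `car` and `check_sqrt` are all made up for illustration, and only integers and strings are modelled.

```python
from dataclasses import dataclass

UNKNOWN = object()  # sentinel meaning "no compile-time value recorded"


@dataclass
class Info:
    """What the (hypothetical) analysis knows about one expression."""
    type: str                 # "integer", "string", "list" or "unknown"
    value: object = UNKNOWN   # the literal value, when it is known
    elems: tuple = ()         # per-element facts, for list literals


def literal(v):
    # Literals give us both the type and the value.
    if isinstance(v, int):
        return Info("integer", v)
    if isinstance(v, str):
        return Info("string", v)
    raise TypeError(f"unsupported literal: {v!r}")


def variable():
    # An unknown variable: we know nothing about it.
    return Info("unknown")


def quoted_list(*elems):
    return Info("list", elems=tuple(elems))


def car(lst):
    # Rule for the "known" function car: the result is whatever
    # was recorded for the first element of its argument.
    if lst.type == "list" and lst.elems:
        return lst.elems[0]
    return Info("unknown")


def check_sqrt(arg):
    # Returns "ok", "maybe" or "error" for (sqrt arg).
    if arg.type == "unknown":
        return "maybe"
    if arg.type != "integer":
        return "error"
    if arg.value is UNKNOWN:
        return "maybe"  # an integer, but its sign is not known
    return "ok" if arg.value >= 0 else "error"
```

With these definitions, `check_sqrt(car(quoted_list(literal(1), literal(2), literal("hello"))))` gives `"ok"`, a list of three `variable()` entries gives `"maybe"`, and a list starting with `literal(-1234)` gives `"error"`.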

~~~
urs2102
I just saw this - but funnily enough, this is how I handled it. Thanks man -
appreciate the comment!

------
DonaldFisk
You want a statically typed Lisp dialect?

Why don't you just declare your list, say, x, as having type (List Any) where
Any is the most general type? (car x) would then have type Any at compile time
(i.e. you don't know its type), so you can't take its square root or cons it
onto a list of type (List Int), but you could still print it or cons it onto
another list of type (List Any).

(List Any) then has to be distinct from (List ?x) where ?x is an undetermined
type. If y is of type (List ?x) and your function contains (sqrt (car y)), you
can infer that ?x is a numeric type and y is a list containing only that type
of elements.

~~~
urs2102
So let's try an experiment:

(sqrt (car '(1 2 "hello")))

How would this work according to you, if the car function will return type
Any? And what is the difference between ?x and Any?

Thanks by the way!

~~~
DonaldFisk
If that's typed into the listener, it should probably just be interpreted and
dynamically typed. If it's an expression in the process of being compiled, it
can be simplified to (sqrt 1) and thence to 1.

car should return the type of the elements of the list it is supplied with,
whatever that is. E.g. if x has type (List (List Int)), (car x) should return
(List Int). I did this for Full Metal Jacket (see
[http://web.onetel.com/~hibou/fmj/interpreter.pdf](http://web.onetel.com/~hibou/fmj/interpreter.pdf),
section 6).

If something has a variable type (e.g. declared as being of type ?x) its type
can be inferred once known. If it's of type Any, its type cannot be inferred
more specifically than that. If, instead of the above example, you declared x
of type (List Any), the list '(1 2 "hello") could still be bound to it, and
the call to car would still be accepted by the compiler, but not the call to
sqrt.
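
The difference between ?x and Any can be sketched with a tiny unifier. This is only an illustration in Python, not how Full Metal Jacket does it: the names `Con`, `Var`, `list_of`, `unify` and `check_sqrt_car` are hypothetical, sqrt is assumed to have type Int -> Int, car is assumed to have type (List ?a) -> ?a, and there is no occurs check or subtyping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Con:
    """A concrete type: Int, Any, or a constructor such as (List t)."""
    name: str
    args: tuple = ()


@dataclass(frozen=True)
class Var:
    """An undetermined type such as ?x."""
    name: str


INT = Con("Int")
ANY = Con("Any")  # just another concrete type as far as unification goes


def list_of(t):
    return Con("List", (t,))


def resolve(t, subst):
    # Follow variable bindings until we reach something unbound.
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t


def unify(a, b, subst):
    """Return an extended substitution, or None on a type error."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if isinstance(a, Var):
        return {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if a.name != b.name or len(a.args) != len(b.args):
        return None
    for x, y in zip(a.args, b.args):
        subst = unify(x, y, subst)
        if subst is None:
            return None
    return subst


def check_sqrt_car(list_type):
    """Type-check (sqrt (car y)) where y has type list_type."""
    a = Var("a")
    subst = unify(list_of(a), list_type, {})  # car : (List ?a) -> ?a
    if subst is None:
        return None
    return unify(a, INT, subst)               # sqrt wants an Int
```

Calling `check_sqrt_car(list_of(Var("x")))` succeeds and binds ?x to Int, while `check_sqrt_car(list_of(ANY))` returns `None`: the call to car alone unifies fine, but Any is not Int, so the call to sqrt is rejected.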

~~~
urs2102
> If, instead of the above example, you declared x of type (List Any), the
> list '(1 2 "hello") could still be bound to it, and the call to car would
> still be accepted by the compiler, but not the call to sqrt.

(sqrt (car '(1 2 "hello")))

So in this example, since it's a (List Any), in order to get this to work
would we cast the Any to be an Int to then put it through sqrt?

(sqrt (int (car '(1 2 "hello"))))

Also checked out FMJ - super interesting. I do like the way you used
Prolog's unification as a way to describe successful type matching. How far
along is your work with FMJ?

~~~
DonaldFisk
> So in this example, since it's a (List Any), in order to get this to work
> would we cast the Any to be an Int to then put it through sqrt?

I wouldn't recommend you use casting the way it's done in C. It defeats the
purpose of having a strong type system if you can change the types like that.
Having a function which checks the type and gives an error if it's not the
type you expect is, I think, a better approach.
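
As a rough illustration of that approach (in Python, with a hypothetical helper called `expect_int`; nothing here is FMJ syntax), a checking function passes the value through only if it really has the expected type:

```python
import math


def expect_int(value):
    """Return value unchanged if it is an integer, else signal a type error."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(
            f"expected Int, got {type(value).__name__}: {value!r}"
        )
    return value


def car(lst):
    return lst[0]


# The element is checked, not reinterpreted, before sqrt sees it.
root = math.isqrt(expect_int(car([1, 2, "hello"])))
```

Unlike a C-style cast, `expect_int(car(["hello", 1, 2]))` does not pretend the string is a number; it raises `TypeError` at the point of the check.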

I'm still actively developing Full Metal Jacket. Glad you find it interesting.
An interpreter for the basic language is almost complete. I'll soon add to my
tutorials
([http://web.onetel.com/~hibou/fmj/tutorials/TOC.html](http://web.onetel.com/~hibou/fmj/tutorials/TOC.html))
how to do inlining and substitution macros, and how to define new types. Then
I'll make my code more robust, possibly add a compiler, and arrange a release
on a shared-source licence.

~~~
urs2102
Oh yeah, my mistake - you're totally right. So in this case what type does car
return (or how do I write the type definition for car)? It can't return
multiple types, because then I wouldn't be able to check its result at
compile time.

And yeah, Full Metal Jacket seemed cool (and it's a great name). Do post it
here when you start putting those tutorials up!

------
justin_vanw
It depends on the implementation, but even if you have proper type inference,
I don't understand how it would help with a heterogenous list, since it's the
heterogenous nature that will prevent optimization anyway, even if you knew
exactly what the types were.

Most Lisps, and certainly Common Lisp, do not depend on type inference for
correct behavior in any way, as they are fundamentally strongly but
dynamically typed (like Python). The reason for type inference in Lisps is
generally optimization. Knowing that an array is of type X lets you compile
X-specific code for functions that take arrays of type X. It also allows an
efficient memory layout, where your array is made up of contiguous X objects
rather than pointers to X objects, which avoids the cache misses that the
indirection causes (as well as the overhead of allocating space for the
pointers). Just for completeness: in the absence of type information, arrays
are always actually arrays of pointers.
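
A loose analogy in Python (not Lisp, and only meant to illustrate the layout point): the standard `array` module stores one declared element type contiguously, while a plain list stores references to arbitrary objects. The helper `try_store` is made up for this sketch.

```python
from array import array

# Typed: every element is a C double, stored back to back.
floats = array("d", [1.0, 2.0, 3.0])

# Untyped: a list of references to objects of any type.
mixed = [1.0, "two", 3]

# The packed representation is exactly itemsize bytes per element.
packed_bytes = len(floats.tobytes())


def try_store(arr, item):
    """Return True if item can be stored in the typed array arr."""
    try:
        arr.append(item)
        return True
    except TypeError:
        return False
```

Here `packed_bytes` equals `floats.itemsize * len(floats)`, and `try_store(array("d"), "two")` is `False`: the price of the compact layout is that the container can no longer be heterogeneous.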

Lists are always made up of cons cells which contain pointers, so the
memory-layout optimization is out regardless (it is possible that primitive
types like int or float would be packed into the cons cell itself, but most
implementations would achieve this by some kind of bit tagging rather than
type information being 'known'). The other optimization is out because it
would likely be equivalent to dispatch via the usual method for untyped
objects rather than trying to somehow use the type information.

~~~
urs2102
Thanks man, this is great.

> since it's the heterogeneous nature that will prevent optimization anyway

That was the part I was wondering about. I wanted to see if it was possible to
do compile time type checking on a list where I don't know what element would
be inside.

It was less for optimization and more for experimentation. I was trying to
see, if I approached building a Lisp with OCaml-style type checking, how much
of the "Lisp" I would have to trade to get proper HM type inference. No
purpose other than for play.

Then, although I assumed it was naturally impossible, figured I'd ask.

So car can work for type cons, and then that returns the element regardless of
type since the first part of the cons that is pulled becomes just the element?

------
sjayasinghe
I'm not exactly sure. I think they may be at odds. Scala can handle lists of
Any, but I think you have to decide whether you want compile time type
checking or run time dynamic type checking since, to my knowledge, you can't
have heterogenous lists in languages like OCaml (and from the little I know of
Haskell, Haskell as well).

~~~
urs2102
Hm, that's what I'm thinking as well. Scala's Any list is probably not the
same, but I think with Haskell you can do it if you make a union of types. I
just want to see if there are ways to allow type inference while keeping the
list flexible and heterogenous as otherwise the Lisp loses power if I need to
use functions like (car-int '(1 2 3)) to get an int. Do you know how pattern
matching works in the case of matching cars to their respective types?

~~~
Xophmeister
A sum type (union type) solves the problem by making the list homogeneous. A
`car` against such a list would return a member typed as this sum type and it
would then need to be extracted (e.g., with pattern matching) to get at the
value.
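
A small sketch of that idea, written in Python with made-up names (`IntV`, `StrV`, `Value`, `describe`); in an ML-family language the same thing would be a variant type plus a `match`:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class IntV:
    value: int


@dataclass(frozen=True)
class StrV:
    value: str


# The sum type: every list element is a Value, so the list is homogeneous.
Value = Union[IntV, StrV]


def car(xs: List[Value]) -> Value:
    # car has a single return type: the sum type itself.
    return xs[0]


def describe(v: Value) -> str:
    # "Pattern matching": inspect the tag to get at the wrapped value.
    if isinstance(v, IntV):
        return f"int {v.value}"
    if isinstance(v, StrV):
        return f"string {v.value!r}"
    raise TypeError(f"not a Value: {v!r}")


mixed = [IntV(1), IntV(2), StrV("hello")]
```

`describe(car(mixed))` yields `"int 1"`. The cost is visible in `mixed`: every literal has to be wrapped in its constructor before it can go into the list.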

------
kazinator
> _can't figure out what the type of (car '(x y z))_

What? (quote (x y z)) is a constant expression denoting the value (x y z). We
know statically that its car is x, of symbol type.

~~~
urs2102
> We know statically that its car is x, of symbol type.

I'm saying that within a statically typed Lisp, or when trying to use HM's
Algorithm W, you don't know the type of the car of (x y z). The type of
"symbol" is not specific enough, as what operations will you allow to be run
on symbol if you're working in a statically typed Lisp with type inference?

If symbol is a character you can't run a float addition on it, and any attempt
to do so should throw an error at compile time.

In the case above, I'm using x, y, and z as stand-ins for characters,
floating point numbers, integers, etc. I'm trying to see if I can figure out
the specific type at compile time rather than at runtime.

~~~
kazinator
If we know that L is a list, and that is all we know, we don't know what type
(car L) is.

If we know that L holds the value (X Y Z), then we know that the type of (car
L) is symbol.

> _The type of "symbol" is not specific enough_

It absolutely is specific enough. It's a fundamental type in the language.

We just don't know what kind of symbol for certain purposes. If we _only_ know
that some object O is a symbol, we don't know whether it is self-evaluating
(nil, t, keyword symbols in CL) or bindable (any other symbol).

In _this particular case_, not only do we know that (car L) is of type
symbol, we know _which symbol_!!!

We know the exact _identity_ of the object: it is X.

If we know this at compile time, because L is a quoted list literal, we have
the strongest possible information about what it is, even more precise than
most type information.

> _as what operations will you allow to be run on symbol if you're working in
> a statically typed Lisp with type inference_

If we know that O is a symbol, then we can, for instance, compile the
(symbol-name O) operation such that O's type tag isn't checked; however, if
there is a special representation for the symbol nil, we have to distinguish
that. In this case we know that X isn't nil.

> _If symbol is a character_

... then you're talking about a completely different language that isn't one
of the mainstream Lisps. At best a very oddball dialect.

What Lisp tutorial have you been following?

Perhaps you have confused symbol objects, and the use of a symbol as a
variable name (X having a value binding in a given environment, and the value
of that binding being a character object).

~~~
urs2102
Oh perhaps I'm not being specific enough. Apologies mate.

> If we know this at compile time, because L is a quoted list literal, we have
> the strongest possible information about what it is, even more precise than
> most type information.

By '(x y z) I'm assuming that it's not a quoted list literal but, let's say, a
list generated by a function, where x, y, and z could be of different types,
like ('(2.5 7) 6 "hello"). It's very hypothetical, and the contents of the
generated list cannot possibly be known at compile time.

> ... then you're talking about a completely different language that isn't one
> of the mainstream Lisps. At best a very oddball dialect.

And yes, what I'm talking about is building a toy lisp between an ML language
and a Lisp to see how much Lispyness you have to trade off for the static
typing and the type inference of an ML language. The definition of oddball!

In OCaml, for example, which has static typing and type inference, you can't
just have a "symbol". Even though symbols are a basic part of Lisp, you can't
have really strong type inference for a "symbol" despite it being a
fundamental type in the language: for strong type inference you would want to
do things like have separate operators for adding integers and adding floating
point numbers. This is very important for HM's Algorithm W. Similarly, car
needs to take one type and return one type for it to comply with Algorithm W.
I'm trying to see if other people have ever tried implementing something like
this.

Anyway, great conversation! I think DonaldFisk helped me out already, but
thanks!

------
brudgers
Perhaps it could infer statically typed tuples from heterogeneous types and
then apply list semantics to the operations? If the default is immutability,
then all lists can be treated that way heterogeneous or not.

SML/NJ and other Hindley-Milner based type-inferring languages are good at
inferring tuples at compile time so there's probably a little prior art.

------
gosub
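Could it be something like this (with some sugar)?

        -- A cons cell whose car and cdr types are tracked separately.
        data Cons a b = Nil | ConsCell a b

        cons = ConsCell

        -- car and cdr are partial: they fail on the empty list.
        car Nil = error "car of Nil"
        car (ConsCell a b) = a

        cdr Nil = error "cdr of Nil"
        cdr (ConsCell a b) = b

        mylist = cons 'a' $ cons True $ cons "hello" Nil

        -- The type of mylist records the type of every element:
        mylist :: Cons Char (Cons Bool (Cons [Char] (Cons a b)))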

------
lmm
AIUI you can't have HLists in languages without higher-kinded types, and H-M
becomes incomplete in that case.

------
mveety
The actual elements of the lists are basically typeless. The easiest way to do
this is to make each car of a cons cell just a pointer to a value (actually
both the car and the cdr are). Many Lisps use tagged pointers, where the tag
identifies the type of the value, or each variable could even be a struct
holding the type and the value.
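
A toy model of tagged words, sketched in Python under simplifying assumptions: a one-bit tag, small integers stored directly in the word, and everything else kept in a list called `heap` that stands in for memory. The names `box`, `unbox` and `cons` are illustrative; real implementations use more tag bits and real addresses.

```python
TAG_BITS = 1
FIXNUM_TAG = 1  # low bit 1: the word itself holds a small integer

heap = []       # stand-in for memory; "pointers" are indexes into it


def box(obj):
    """Encode a Python object as a single tagged integer "word"."""
    if isinstance(obj, int) and not isinstance(obj, bool):
        return (obj << TAG_BITS) | FIXNUM_TAG  # immediate integer
    heap.append(obj)
    return (len(heap) - 1) << TAG_BITS         # low bit 0: heap reference


def unbox(word):
    """Decode a tagged word by looking only at its tag bit."""
    if word & FIXNUM_TAG:
        return word >> TAG_BITS
    return heap[word >> TAG_BITS]


def cons(a, d):
    # A cons cell is just two tagged words; it knows nothing about types.
    return (box(a), box(d))


cell = cons(1, cons("hello", None))
```

The cell itself is typeless: the type of each slot is only discovered at run time, when `unbox` inspects the tag, which is why a static checker gets no help from this representation.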

------
personjerry
Would this perhaps be a better question for StackOverflow?

~~~
sillysaurus3
I hope not. This place is already more like Noob News than Hacker News, and
the transition has been very gradual.

That was a bit harsh. I'm saying we should try hard to retain the culture of
the site.

It was delightful to see this question pop up on the front page. And of
course, what was the top comment but a reminder that the old HN is now firmly
"the old HN," rather than "the present HN."

~~~
personjerry
Ah, to be clear I'm not opposed to this post. I guess it's a combination of a)
surprise at seeing this type of technical question here, and b) being used to
this sort of question on StackOverflow. I feel like at least for this type of
question in particular, StackOverflow is better equipped to produce and sort
out a good answer. But I can certainly see why it might fit on both.

~~~
urs2102
Just checked back on this post from yesterday. The reason I didn't pose it to
SO is that 1. I still do associate HN with a strong Lisp community. 2. It's
more of a question of opinion versus a technical question itself and figured
the discussion could be interesting for the site. I also didn't give too much
thought as to where I was posting it.

