Hacker News new | comments | ask | show | jobs | submit login

Because with dependent typing, the length can be specified at runtime and you still get static checking.

There's also a huge difference between untyped templates and strongly typed generics, but in this particular case the difference is somewhat subtle.




How does it work exactly? Both languages check things at compile time, but of course in c++ the length has to be known statically. Does idris generate code at runtime for the right size?


Think about generics in any language, say Java Stack<String>. The type Stack is parametrized by the type String. What would you do in a language without generic types? To achieve the same effect, you'd need to create a custom StackString type, which itself is trivial. But then, you'd need StackInteger, StackBoolean, StackSomeoneElsesCustomType, etc.

Language with a type system but without generics have this above downside. Usually you'd just give up making a type for each of these, and simply use the Stack type.

Language without Idris-like dependent types suffer the same downside: instead of creating Stack<String, n>, you're forced to create Stack<String, One>, Stack<String, Two>, Stack<String, Three>, where One/Two/Three are unique types you created to conceptually represent a number. If you don't even have generics, you'd need to create StackStringOne, StackStringTwo, ...

Dependently typed languages encode numbers (among others) as unique types, to simplify. Imagine this:

  put: Stack<String, Three> -> String -> Stack<String, Four>
This isn't too hard to imagine is it? No runtime checks needed. Once you get the output you can carry it around, along with its type, to places that accept Stack<String, Four>.

So, asking "how come you don't have to generate these checks at runtime?" is similar to a programmer who's never seen generics asking "how come you can parametrize? Don't you have to generate all the StackString and StackInteger, etc.". It just works. Hope that analogy was clear =).

Dependent types in general blur the line between values and types, as you've seen above. Think of the usual values you use, and imagine having a new type to represent each one. Dependent types let you do this in a sane and logical way.


Thank you, I understand that, but I need a little more detail. Let me explain.

Languages that have generics can either erase the generic argument at runtime (like Java or Scala do) or generate a different concrete type for each instantiation (like C++ or Nim do). In either case, you usually know at compile time which concrete instances of the generic parameter are going to be used where.

That is, even in Java you may have a single class Stack<A>, but at the final use site this will eventually turn into some concrete type, such as Stack<String> (it may remain generic in intermediate steps). In other words, you are not going to use Stack with a generic parameter that you did not anticipate. You cannot read the runtime type of some instance, call it X, and then start writing functions that work on Stack<X>.

Now, let me turn to dependent types. A lot of the techniques that Idris promotes can be simulated in other languages, such as Scala, C++ or Nim. But in each case that I know of, you are required to know at compile time which concrete values you are going to use as generic parameters. That is, in C++ you can write functions that operate on types like Vector<A, N>, where N is the length, but you have to eventually specialize N to a concrete number before compiling.

The comment above states that Idris does not have this restriction, but it is not clear to me how that might work. I concede that by using an erasing technique (as opposed to specialization) one is not required to materialize such functions for all N - this is what one does for instance in Scala.

But the problem is: what do you do at the usage site? Say you are reading two lists off two JSON files, and you want to concatenate them using the above function. Wouldn't you need to provide at compile time a concrete value for N and M? (and consequently cast the unsized vectors to sized ones?).

If this is the case, then there is not much difference from letting the compiler materialize a function for concrete values of M and N that are used (because in either case, you have to encode knowledge about concrete values of M and N at compile time).

Maybe everything would be more concrete if I saw an Idris program that reads two vectors from external files at runtime, and then concatenates them, whatever their size is.


I believe the Idris runtime uses erasure rather than code generation[1]. The key here is that Idris will make call sites prove statically that the conditions hold, either through having static conditions of their own or proving that the programmer has checked (at runtime) that the conditions hold.

(This case of append-n-to-m-vector is a bit weird since it basically just amounts to calculating the length of the output, but my previous statement applies a bit more generally.)

[1] So one plausible representation for Vec n a would be a size + array of pointers to a's. I believe they're actually using linked list for Vec at the moment, though.


Yes, but my question is: even if you use erasure, what are you going to do at the call site? Won't you need to state a concrete size there? For usual generic types, erasure and code generation are equivalent in what they can do, because at some point you will specialize to concrete types.

I would be curious to see an Idris program that reads two vectors out of some file (without knowing the size a priori) and concatenates them


> Yes, but my question is: even if you use erasure, what are you going to do at the call site? Won't you need to state a concrete size there?

No, Idris doesn't actually care what "n" and "m" are (concretely)... it just knows that if you happen to have a (Vec n a) and (Vec m a) lying arond, then you'll get a (Vec (n+m) a) out.

> For usual generic types, erasure and code generation are equivalent in what they can do, because at some point you will specialize to concrete types.

I'm not sure I fully understand what you mean, but I think this is slightly wrong... You can think of the fully erased (Vec n a) representation as (n, void-star) in C-style or Array<Object> in JVM style. Note that "a" doesn't actually appear. So the type is actually never needed at runtime -- all functions which have (Vec n a) as a parameter will just assume that it has been (statically) proven[1] that the pointed-to-array has the correct size. Because of parametricity, functions will not be allowed to actually access any 'a' values directly. So it all works out.

Here's another comment of mine that goes a bit deeper into the interplay between static proof and runtime checks: https://news.ycombinator.com/item?id=10302863

Key being: You either need to prove statically or you need to prove statically that you've checked at runtime.

HTH.

[1] This is valid because call sites must provide such a proof (statically).


I will try to be more clear. Say the size is actually unknown until runtime, for instance a vector is read from a file.

You cannot apply directly any of the size-aware functions, because it will not typecheck. I only see two avenues:

Either you assert statically that the vector has some concrete size, in which case the size is known statically and the code generation will work as well;

Or you use some other size-unaware function, in which case it is false that Idris can use runtime values as type parameters.

If I understand correctly, you are saying that there is a third possibility, but I am missing it


> I will try to be more clear. Say the size is actually unknown until runtime, for instance a vector is read from a file. You cannot apply directly any of the size-aware functions, because it will not typecheck.

Yes, you can in fact do that! Let's assume you're building the list you're reading from a file as if it were a linked list. That's the point of dependent types -- the types can (and often do) depend on run-time values.

EDIT: I suppose I should expand on this a bit. The idea is that the "read-Vec-from-file" function can look like this:

    readVecFromFile :: File -> (n, Vec n Int)
(let's say it's reading Ints)

So the readVecFromFile function will return you a run-time value "n" (the number of items) and a vector which has that exact length. When using readVecFromFile, you'd do something like

    (n, v) <- readVecFromFile
and you'd be absolutely free to call any function which expects a vector of any length. (If necessary you could also supply the actual length, but that's just a detail.)

The implementation of readVecFromFile is left as an exercise, but it can be done. One idea is to start with the empty vector (length 0) and an "n" of 0 -- and then to just accumulate into it, all the while keeping a "proof" that the length of the vector is way the runtime value says it is. (Using structural induction, basically, to build a proof that the runtime value is the same as the length of the vector.)


Ok, I see how it can work. Thank you very much for your explanation!


Actually, I see that I was a bit sloppy in that post wrt. the type annotations and editing, but I'm glad it helped :).




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: