
In Further Praise of Dependent Types - EvgeniyZh
https://golem.ph.utexas.edu/category/2020/05/in_further_praise_of_dependent.html
======
nyanpasu64
> So, a good place to try and understand underlying linguistic structure is
> informal mathematics, something he studied in his 1993 paper Type theory and
> the informal language of mathematics. This investigates both directions of
> the passage between formal and informal mathematics: sugaring and
> formalization.

Okay...

> x:A ⊢ B(x): Type

I'm lost.

~~~
curryhoward
Unfortunately you need to know type theory/logic notation in order to read
this. Here's the English translation:

- There is a variable called `x` in scope.

- `x` is known to have type `A`.

- `B` is a function that, when given `x` as an argument, returns a type. If
this seems weird, it's because you need dependent types to even be able to say
this.

Honestly, this article is not written for programmers, and I don't know why
it's on Hacker News right now. Most HN readers don't have the necessary
background to follow it.
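
A rough programmer's analogy (not from the article, and with invented names): in Python, types are ordinary runtime values, so you can at least write something shaped like `B`, a function from a value to a type. This gives you none of the static checking, it's just to show the shape:

```python
# Runtime-only analogue of the judgment  x : A |- B(x) : Type.
# Here A is int, and B maps a value of type A to a type.
def B(x: int) -> type:
    return list if x >= 0 else dict

print(B(3))   # <class 'list'>
print(B(-1))  # <class 'dict'>
```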

~~~
Gravityloss
Thank you, it is easy to understand when explained like that! Maybe there
could be a programming language where these concepts were used in a slightly
more explicit, verbose, or in general easier-to-read fashion. A lot of
successful languages are big on ergonomics or low barriers to entry.

~~~
kungato
Unfortunately, it often seems like "low barrier to entry" means "no prior
effort invested", even when the barrier is just knowing basic logic notation,
which is taught in gymnasiums.

~~~
nyanpasu64
Is a gymnasium a school in Europe? It's an exercise facility in the USA.

I've gone through high school and taken CS (and a bit of discrete math and
logic) in college, and never seen the ⊢ symbol before.

~~~
throwawaygh
_> Is a gymnasium a school in Europe? It's an exercise facility in the USA._

[https://en.wikipedia.org/wiki/Gymnasium_(Germany)](https://en.wikipedia.org/wiki/Gymnasium_\(Germany\))

And yes, teaching basic notation from mathematical logic is pretty standard in
Gymnasiums. I've taught sequent calculi at US high school enrichment
programs; honestly, the students get it perfectly fine. The notation is no
more difficult to understand than a two-column proof.

It's always the teachers who struggle. Every attempt at math-ed reform in the
US runs into the same crux: the teachers' and parents' willingness/ability to
learn anything new is always massively over-estimated. The only way to teach
real mathematics in US high schools is to smuggle it in through "enrichment"
programs.

 _> I've gone through high school and taken CS (and a bit of discrete math and
logic) in college, and never seen the ⊢ symbol before._

Discrete math courses are different from university to university; sometimes
formal logic is covered, and sometimes it's more of an "introduction to basic
proof techniques and combinatorics" course.

A university course in logic should certainly have shown you a sequent
calculus at some point. I'm actually confused about how you would fill a
semester-long course on logic without a turnstile ever showing up. What
notation were you using to write down your derivations?

They used to say "the US has the best high schools in the world;
unfortunately, they're called universities". But the universities have become
so watered down that many Americans get university degrees without ever
passing through an institution at the level of a good gymnasium.

~~~
nyanpasu64
It was not strictly a logic class, but more of an overview course of discrete
math, logic and proofs, and some statistics. They taught "->", I think.

I've never heard of "sequent calculus" before, and I don't expect anyone I
know to have heard of it. Apparently it's not calculus.

> They used to say "the US has the best high schools in the world;
> unfortunately, they're called universities". But the universities have
> become so watered down that many Americans get university degrees without
> ever passing through an institution at the level of a good gymnasium.

I don't think universities are watered down just because they don't teach
notation that most people find obscure. My university is famous for a
difficult admissions process, courses tough enough that most people only learn
2/3 of the material, and grading curves calibrated so that getting 60% on
exams is enough for a B or an A.

----

[https://en.wikipedia.org/wiki/Sequent_calculus](https://en.wikipedia.org/wiki/Sequent_calculus)

Wikipedia says "Sequent calculus is one of several extant styles of proof
calculus for expressing line-by-line logical arguments."

I didn't learn "every statement is conditional" logic involving turnstiles; I
learned something where the assumptions were given in the problem, and each
line was proven based on the previous lines, which boiled down to the initial
givens.

------
exdsq
After playing with dependent types and type driven development, almost every
language has been spoiled for me :(

~~~
dgellow
Could you explain what was so great about it?

I guess you did this with Idris?

~~~
icen
(Not OP.)

When moving from dynamically/untyped/unityped languages to those with types,
you are liberated from having to check the shape of inputs. If you're given
something that claims to be a list of strings, it absolutely is a list of
strings, and you can proceed without thinking about its list-ness or string-
ness any further.

When moving from a typed language to a dependently typed language, you are
liberated from having to check the state of inputs. You don't have to worry
that your file handle is closed, or ensure that you update the state of the
object correctly to satisfy some protocol (if you don't, you will be picked up
on it!). If you are writing a parser, you don't need to check that the result
has the form you asked for - you know it upfront.

There are other niceties: the type of the function is always a very rough
exposition of what the function should do. With dependent types, it becomes
considerably clearer. Consider these two functions, which could have very
similar implementations (in some Idris-like syntax):

    f : Int -> Int -> Bool

    data T : Int -> Int -> Type where
        Tz : T 0 x
        Ts : T a b -> T (a + 1) (b + 1)

    g : (x : Int) -> (y : Int) -> Maybe (T x y)

Given just the structure, it should be a bit clearer what the functions intend
to do.

~~~
tsimionescu
But wouldn't the actual code tell you exactly what the function does? Why is
it beneficial to express a similar version of the same thing separately, in a
more or less different language?

The flipside of your description of moving from untyped to typed to dependently
typed languages is the amount of specification you need to think about and add
to your code. This is awesome for component interfaces, but it can often get
in the way of internal code. For example, if I try to refactor a long function
and extract a few bits, those bits may only be called with very specific
argument values, despite what the types I have on hand (or am willing to
define) might say. Languages without some dynamic escape hatches can force you
to create code that is too general, or define extremely specific types, for
situations that really don't call for it.

It also makes very generic or dynamic code much harder to write, or requires
ever more complex constructions. For example, most mainstream typed languages
can't express something as simple as 'a function that returns either an A or a
B' without resorting to dynamic typing and casting. And even in Haskell, a lot
of code is more loosely typed than might be expected. For example, even though
measurement units are often touted as a feature of typed languages, there are
no complex mathematics libraries that use measurement units, simply because it
is so much effort to correctly type everything, for relatively little gain.

~~~
icen
Yes, the actual code will always tell you exactly what it does.

There is always a necessary distinction between specification and
implementation: if there weren't, one of them would have no value. We often
suggest that the intent of a piece of code should be specified in comments,
and this is in that category. This way, the compiler can read your comment and
figure out if it needs updating! Languages without dependent types work this
way too - but you have to try very hard indeed to be eloquent in type
signatures, and being able to talk about values lets you say a bit more.

Coming onto your second point: if the function is only valid for some set of
values of the given type, why is it 'not really called for' to create
specialised types for it? If you call it with something else and it runs, it
will be a bug. Newtypes or wrappers are common patterns.

I think that most mainstream typed languages have common constructs to offer
that 'A or a B' value you want: this is solved via inheritance, `Either`,
`Result`, or `std::variant`.
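
For instance, a minimal Python sketch of that 'A or B' return shape, using `typing.Union` (the function and its behavior are invented for illustration):

```python
from typing import Union

# Returns either an int (success) or a str (error message): the
# "A or B" shape, analogous to Either/Result/std::variant.
def parse_int(s: str) -> Union[int, str]:
    try:
        return int(s)
    except ValueError:
        return f"not an integer: {s!r}"

print(parse_int("42"))   # 42
print(parse_int("hi"))   # not an integer: 'hi'
```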

~~~
tsimionescu
Types don't generally specify intent as well as comments do. They still can
only tell you what the code does, not really _why_. They do clarify some
assumptions the implementation makes, essentially defining pre- and post-
conditions, which you'd normally write in comments if you didn't have types.
But at some point, it does get much easier to say 'performs a stable sort'
than to define the dependent types to express that the list is sorted and that
the order of equal elements doesn't change.

My point about newtypes and wrappers is that they are almost pure overhead for
helper functions. There is a reason why we don't define precise types for each
of our intermediate calculations in general, and this doesn't change when we
move those computations to a separate function.

Finally, inheritance is not a real solution for returning A or B: with
inheritance I can only return a C that both A and B derive from, and this C
often doesn't exist and can't be added post-hoc. What you normally do is go
for Object or void*, which is much closer to dynamic typing than inheritance.
And most mainstream typed languages don't have Either or Result. C++ with
std::variant is the only exception; I forgot that it was added. BTW, I should
be more explicit - the way I see it, the mainstream typed languages are Java,
C#, C, C++, and maybe Go.

------
curryhoward
I've been meaning to write a tutorial on dependent types, but aimed at
programmers rather than mathematicians and computer scientists. Here are some
of the reasons I love them so much:

- The most popular motivation: the ability to write proofs about your
programs, using the same language for both programming and proving. It's so
satisfying to write code and prove it correct (with a type checker to catch
your mistakes), but so few people get to experience this feeling.

- Of course, you can also use dependent types to just write proofs for the
purpose of doing mathematics, without any programming. I've written about my
experience doing that here: [https://www.stephanboyer.com/post/134/my-hobby-
proof-enginee...](https://www.stephanboyer.com/post/134/my-hobby-proof-
engineering)

- But dependent types are also useful in ordinary everyday programming! Many
"features" of programming languages that we use at work are actually just
limited special cases of the general idea of dependent types. For example, in
a dependently typed language, you don't need special support for generics—you
get it for free!

- Dependent types also eliminate much of the need for macros. In Rust, for
example, you might use a macro to generate a JSON parser for a given type. If
you had dependent types, you could just write a function that takes your type
as an argument and returns the parser!

- Of course, there's that famous example of length-indexed vectors. The idea
is that you can keep track of the sizes of your arrays in their types, and
statically prevent out-of-bounds errors. So, for example, trying to get the
first element of an empty array would be a type error.

- Haskell's generalized algebraic datatypes are another example of a limited
special case of the full power you'd get from a dependently typed programming
language.

- Another example: some languages have special support for existential types
in some form or another. For example, Rust has something called "impl Trait"
which is a limited use of existential types. This is another thing you get for
free with dependent types (or rank-2 polymorphism).

- Other examples of things you get for free with dependent types: type
aliases, higher-kinded types, higher-rank types, and compile-time code
execution.
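
The macro point above can be sketched even without dependent types, in a language where types are runtime values. A hedged Python analogue (`make_parser` is an invented name; a derive-macro in Rust would do this work at compile time, and a dependently typed language would let the returned parser's type mention `t`):

```python
import json

# An ordinary function from a type to a JSON parser for that type:
# the run-time cousin of "a function that takes your type as an
# argument and returns the parser".
def make_parser(t: type):
    def parse(s: str):
        value = json.loads(s)
        if not isinstance(value, t):
            raise TypeError(f"expected {t.__name__}, got {type(value).__name__}")
        return value
    return parse

parse_int = make_parser(int)
print(parse_int("42"))  # 42
```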

Dependent types may seem complicated at first glance. But after seeing how all
these programming language features collapse into a single unified framework,
you might change your mind: dependent types are extremely simple compared to
the cornucopia of concepts we have to learn in their absence! I strongly
believe that programming languages have grown too complex, and dependent types
have the right power-to-weight ratio to cull that complexity.
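
To make the length-indexed vector example above concrete, here is a runtime-checked Python sketch (all names invented). In Idris or Agda the length lives in the type and the bad call is rejected at compile time; here the checks only fire at run time, which is exactly the difference dependent types buy you:

```python
class Vec:
    """Runtime analogue of a length-indexed vector `Vec n a`."""

    def __init__(self, n, items):
        # In a dependently typed language this equality is enforced
        # by the type checker, not by an assertion.
        assert len(items) == n, "length must match the index n"
        self.n = n
        self.items = list(items)

    def head(self):
        # In Idris, head : Vect (S n) a -> a, so taking the head of an
        # empty vector is a *type* error; here it is only caught at run time.
        assert self.n > 0, "head of an empty vector"
        return self.items[0]

v = Vec(3, [10, 20, 30])
print(v.head())  # 10
```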

~~~
devit
The problem is that, as far as I can tell, there is no language with zero-cost
dependent types, i.e. a language with dependent types that has a subset
equivalent to Rust that compiles to machine code as efficient as the Rust
compiler outputs.

Hopefully Rust will evolve into it, or a new language will come up for that:
we really need this to finally have a programming language that is strictly
better than all others and can thus be the single language in use, finally
solving the programming language problem.

~~~
curryhoward
> The problem is that, as far as I can tell, there is no language with zero-
> cost dependent types, i.e. a language with dependent types that has a subset
> equivalent to Rust that compiles to machine code as efficient as the Rust
> compiler outputs.

The Coq language groups types into two categories: Set and Prop. Prop, which
(informally speaking) contains all your proofs, is erased when you "compile"
your Coq code into another language like Haskell or OCaml. This is something
people have thought about quite a bit already.

I think the reason people aren't using dependent types has more to do with the
ecosystem around these languages: the tooling, the libraries, the
documentation, the tutorials, ... None of that is where it needs to be for
these languages to be appealing to industry programmers.

> Hopefully Rust will evolve into it, or a new language will come up for that:
> we really need this to finally have a programming language that is strictly
> better than all others and can thus be the single language in use, finally
> solving the programming language problem.

I have serious doubts about this. Rust, like every mainstream language,
already has too many features that overlap with what dependent types give you
for free. Also, Rust is not a functional language (some people claim it to be,
but there are side effects everywhere!), so it would be a bit of an impedance
mismatch.

As much as I'd like to believe in this unicorn language that is better than
all the others, experience has taught me to be skeptical of that. Programming
is used to solve so many different kinds of problems that it's hard to imagine
a one-size-fits-all solution, but I won't claim it's impossible.

~~~
devit
AFAICT Coq doesn't have linear types, and thus can't provide in-place
modification of array elements or non-reference-counted/GCed non-primitive
values, which means it can't take full advantage of a real CPU+RAM machine.

Also it's not enough to erase irrelevant types, the types that end up in the
program also need to be treated efficiently, which means generic
monomorphization, not auto-boxing things unless essential (or ideally never),
using computed layouts where possible (i.e. store (n, [T; n], [T; n]) in a
single allocation if n is immutable), etc.

The issue is that I think all of the current dependently typed languages don't
have zero-cost abstraction and compiled code efficiency as a primary goal,
they are designed for research or as machine-checked proof systems.

There's also the issue that some features may not compose well, e.g. Rust's
ability to mutate a single field in place (needed for zero-cost since CPUs can
do that with a store instruction) doesn't compose very well with dependent
types because that can change the type of other fields, and also linear types
(again needed to fully use CPUs) don't go well with having proof-like types
that refer to other values, etc. All this seems fixable, but it seems quite
involved to find the most general and ergonomic solution.

~~~
curryhoward
> AFAICT Coq doesn't have linear types, and thus can't provide in-place
> modification of array elements or non-reference-counted/GCed non-primitive
> values, which means it can't take full advantage of a real CPU+RAM machine.

You don't need linear types for in-place updates. You can use a monad (which
is another thing you can easily do with dependent types) to express the side
effect of doing mutations. Then you can compile that monadic Coq code to
Haskell, which then compiles to efficient machine code that actually does in-
place updates.

Though, perhaps it is also worth mentioning that Idris 2 has both linear types
and dependent types.

Regarding the need for automatic memory management: yes, most dependently
typed languages have a garbage collector, but many "real" programming
languages have garbage collectors and programmers are still happy to use them.
I don't know why everyone in this thread is so concerned about performance.
Dependently typed code can compile to reasonable machine code without any
stretch of the imagination.

> Also it's not enough to erase irrelevant types, the types that end up in the
> program also need to be treated efficiently, which means generic
> monomorphization, not auto-boxing things unless essential (or ideally
> never), using computed layouts where possible (i.e. store (n, [T; n], [T;
> n]) in a single allocation if n is immutable), etc.

Memory layout is certainly something that needs to be dealt with, but
compilers can monomorphize where possible to generate fast code with unboxed
types in many cases. I don't think this is the main blocker for dependent
types. I've written dependently typed code that easily outperforms the
dynamically typed code I write for production at work (by at least an order of
magnitude). In fact, dependent types can be used to guarantee that certain
things don't need to be checked at runtime, which can in some cases give even
better performance than what you'd get in a regular old statically-typed
language.

~~~
devit
Monads work for just doing in-place updates, but they seem unable to also
prevent/control mutable aliasing, avoid rc/gc of arrays/records, put
arrays/records inline in the containing record, and anyway they unnecessarily
linearize code, which also seems to play badly with providing proofs.

Mandatory GC is not a zero-cost abstraction (it is ridiculously inefficient
and unnecessary in general), and a language with mandatory GC is a non-starter
as a universal language. C programs (like web browsers) are starting to have
new code written in a different language only now that Rust is available as
the first zero-cost non-GC safe language.

Yes, dependent types improve performance by eliding checks, but that's only
likely to be a net win if the rest of the compilation is optimal.

~~~
tsimionescu
I agree with your point in general, but I think it is vastly exaggerated to
call GC 'ridiculously inefficient' or 'unnecessary in general'. For many
common problems, GCs add at most a constant overhead. And almost all highly
performance-critical programs have some kind of (admittedly specialized) GC
mechanism in place, because often releasing objects as they go out of scope
is not efficient enough.

Even more, the performance limitations of most GC languages have more to do
with the lack of good ways of writing code which simply doesn't allocate,
rather than with the problem of collection. GCs are often faster than
malloc/free, but not as fast as simply not allocating/freeing anything.

Finally, it's important to remember that there are algorithms that are
significantly more difficult to implement without a GC than with one. Even
simple compare-and-swap atomic sets can require many times more code and care
to implement if you have to handle cleanup of temporaries as well.

~~~
devit
All non-toy languages and compilers add only constant overhead; unfortunately
that's still a problem, since the goal is to have no overhead at all.

Allocation is fast with a moving GC, but the moving is not, and if an object
is never moved it's likely that it should not have been allocated on the heap
in the first place.

The cmpxchg problem is solvable by epoch-based reclamation libraries (RCU,
crossbeam-epoch, etc.), and while a traditional GC also solves it, it's not
necessary.

------
auggierose
If you want to read a somewhat longer treatise about dependent types and
language, try the book "Modal Homotopy Type Theory". Modal logic was
basically invented for linguistic problems.

I don't think dependent types are essential for giving language a formal
semantics though, I think discourse representation theory has a more direct
approach.

------
benrbray
> The main characteristic of the language of mathematics is perhaps that
> different members of the community can completely agree about the meaning of
> each word and sentence.

Haha, this got a laugh out of me :). While this is true in principle, the
reality can be quite messy! For instance, there seems to be little consensus
about what the words "vector", "contravariant", "covariant", and "coordinate"
should mean on the Wikipedia page [1] claiming to explain just that!

[1]
[https://en.wikipedia.org/wiki/Talk:Covariance_and_contravari...](https://en.wikipedia.org/wiki/Talk:Covariance_and_contravariance_of_vectors)

------
gitgud
Is that the same as inheritance of types in CS?

~~~
jmaa
No; look up the Calculus of Constructions. Dependent types allow you to
specify a dependence between different values directly within your type
system, which has great implications for type safety and so on.

A classic example is that you can specify the 2-vectors on the unit circle as
"{ (x,y) : x,y in R | x^2 + y^2 = 1 }" (read as "the set of real 2-tuples
whose norm is equal to 1"). This is not possible in most languages; the
closest you can get is "{ (x,y) : x,y in R }".

My personal pet peeve with dependent types and proof-driven programming in
general is the wealth of options in formulating propositions. I've seen at
least five different ways of defining equivalence.

~~~
fuoqi
> you can specify the 2-vectors on the unit circle as "{ (x,y) : x,y in R |
> x^2 + y^2 = 1 }"

Which is not possible for any language targeting real machines since floats
have fixed precision.

~~~
icen
You can specify the 2-vectors on the unit circle as "{ (x, y) : x, y in R |
|x^2 + y^2 - 1| < 0.000001 }" instead.
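
A small Python sketch of that tolerance check (invented name; note the absolute value, without which any point inside the circle would also pass):

```python
import math

def on_unit_circle(x: float, y: float, eps: float = 1e-6) -> bool:
    # abs() matters: without it, e.g. (0, 0) would count as "on" the circle.
    return abs(x * x + y * y - 1.0) < eps

print(on_unit_circle(math.cos(1.0), math.sin(1.0)))  # True
print(on_unit_circle(0.0, 0.0))                      # False
```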

