At first I wanted to finish this up dismissively by suggesting that maybe only the easier things end up in a gist, but having thought about it a bit, I think there's a lot to gain from larger examples that won't fit in half a column in a paper.
I've tried learning about them on my own using books, but I have trouble staying focused and motivated enough to get very far.
As a trivial counterargument, there's nothing preventing programmers from ignoring even the most advanced type system (see "stringly typed programming", for example). In fact, in my experience with "advanced type systems" (e.g. Agda, Idris and Coq), the types have never been a hindrance; it's their totality checking that can be painful (which is why Idris makes it easy to turn that off, either completely or for certain pieces of code).
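To make the "stringly typed" point concrete, here's a hypothetical Haskell sketch (the names are illustrative): even with a strong type system available, nothing stops us from threading everything through `String` and throwing away its guarantees.

```haskell
-- "Stringly typed": a user record smuggled around as a comma-separated
-- String, discarding every guarantee the type system could have given us.
stringlyUser :: String
stringlyUser = "alice,30"

stringlyAge :: String -> String
stringlyAge s = drop 1 (dropWhile (/= ',') s)  -- hope the format holds!

-- The typed alternative: now the compiler tracks the structure for us.
data User = User { name :: String, age :: Int }

typedUser :: User
typedUser = User "alice" 30
```

Both compile and "work"; the difference is only in how many mistakes the compiler is allowed to catch for us.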
> In fact, in my experience with "advanced type systems" (e.g. Agda, Idris and Coq), the types have never been a hindrance; it's their totality checking that can be painful (which is why Idris makes it easy to turn that off, either completely or for certain pieces of code).
A data point: if I'm reading this right, the seL4 formally verified kernel is 14,400 lines of code, which required 2.2 person-years to write and 20 person-years to verify (the proofs are done in Isabelle, so it doesn't use types for proofs the way e.g. Coq does, but you can't escape a similar verification effort):
"The abstract spec took about 4 person months (pm) to develop. About 2 person years (py) went into the Haskell prototype (over all project phases), including design, documentation, coding, and testing. The executable spec only required setting up the translator; this took 3 pm. The initial C translation was done in 3 weeks; in total the C implementation took about 2 pm, for a total cost of 2.2 py including the Haskell effort."
"The cost of the proof is higher, in total about 20 py. This includes significant research and about 9 py invested in formal language frameworks, proof tools, proof automation, theorem prover extensions and libraries. The total effort for the seL4-specific proof was 11 py."
Why do you think function termination checking is a big factor? I find the majority of loops in apps are of the form "for everything in this collection" or a similar restriction that is simple to check.
Either way, if you start pushing complicated program properties into your types, you're very likely to have complicated proofs to solve by hand. What properties were you proving with types where you say they weren't getting in the way?
However, I don't think it's relevant to the point I was making. You seem to be talking about formally proving things. If you look through my comment that you're replying to, you'll see that I don't talk about proving anything, let alone "complicated software properties".
I was talking about writing programs in languages with 'advanced type systems', as opposed to dynamically typed languages; and that I've not seen compelling evidence to back up common claims like "dynamic languages are faster to program in".
You've given compelling evidence that "formally verifying complex properties of large systems is slow and hard", which I think many would already agree with. That doesn't seem relevant to the typed/untyped programming comparison, since formal verification is still a very uncommon thing to do (ignoring language-provided guarantees like memory safety or type safety).
> the proofs are done in Isabelle, so it doesn't use types for proofs the way e.g. Coq does, but you can't escape a similar verification effort
I don't understand what you mean by "you can't escape a similar verification effort". Not only was I not talking about proving anything, I even stated that the type system can often be ignored and gave "stringly typed programming" as an example (where a program is strongly typed, but almost everything has type `String`, which gives us something quite close to a dynamically typed language; including no guarantees that our variables even have meaningful contents, let alone for any "complicated program properties"!).
> Why do you think function termination checking is a big factor? I find the majority of loops in apps are of the form "for everything in this collection" or a similar restriction that is simple to check.
I tend to write in a functional programming style, which makes heavy use of recursion.
Termination checking (AFAIK) doesn't have the same sort of powerful, composable nature that type checking has, so languages like Coq and Agda (which require termination checking) are very conservative about which forms of recursion to allow. It's easy to do "structural recursion", where recursive calls are run on a sub-expression of the argument, e.g.

    odd Zero     = False
    odd (Succ n) = not (odd n)

It's harder to do non-structural recursion, where the recursive call isn't acting directly on part of the argument, e.g. the greatest common divisor algorithm:

    gcd a 0 = a
    gcd a b = gcd b (mod a b)
The hardest cases I tend to run into involve co-termination, which Coq is especially bad at, both in practice (co-recursive calls must be manually unwrapped, and there doesn't seem to be any way to abstract over the raw `cofix` form) and in theory ( https://github.com/coq/coq/issues/5288 , https://github.com/coq/coq/issues/6768 ). Agda's co-patterns are nicer to work with, but still require weird type-level hacks (like the "sharp" and "flat" operators, "sized types", etc.)
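For contrast, here's a classic productive corecursive definition, sketched in Haskell (which performs no productivity checking at all); my understanding is that Coq's guardedness checker rejects this shape because the corecursive call sits under `map` rather than directly under a constructor:

```haskell
-- Corecursion via guarded self-reference: an infinite stream of naturals.
-- Each element is produced after finitely many steps, so consuming any
-- finite prefix terminates, even though the whole stream never does.
nats :: [Integer]
nats = 0 : map (+ 1) nats

firstFive :: [Integer]
firstFive = take 5 nats
```

The definition is perfectly productive; it's just hard for a conservative syntactic check to see that.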
When languages require termination, these sorts of overly-conservative restrictions are inevitable, and we have to deal with them regardless of whether we actually care about termination (or type soundness). That's why I like programming in languages like Idris, where we can selectively ignore termination, and hence the restrictions on what forms we're allowed to write.
> What properties were you proving with types where you say they weren't getting in the way?
I wasn't talking about proving anything, using types or otherwise. I was specifically talking about not proving things, despite the type system allowing us to do so if we want to (which, if we're happy to consider dynamic types, we probably don't want to).
However, when I do happen to be using types to prove things (which, again, is absolutely not required at all, whether or not we're using an 'advanced type system') I find it hard to say that types are "getting in the way", since they're the system of logic for the proofs. We could get such types 'out of the way', but that would also throw away the point of the endeavour. After all, we can 'prove' all sorts of things if we don't allow logic to get in the way ;)
Well, I strongly agree with that, if you meant strongly, statically typed programming languages. I think they're a good sweet spot and I'd pick them over dynamic languages even for small projects.
The reason I went down the route of software verification is that you said "In fact, my experience with "advanced type systems" (e.g. Agda, Idris and Coq)". Wouldn't OCaml or Haskell have been better examples? Why use Agda, Idris and Coq at all if you're not going to prove advanced program properties?
I would say Coq is used solely for software verification with the help of dependent types, while Agda and Idris are an attempt to make software verification with dependent types more palatable. I'm not sure why you'd use them over OCaml or Haskell if you weren't going to use types that were more advanced.
> It's harder to do non-structural recursion, where the recursive call isn't acting directly on part of the argument, e.g. the greatest common divisor algorithm:
How often are you really doing non-structural recursion, though? My point was that it's exceptionally rare to write complex loops like a GCD algorithm. Either way, I agree an escape hatch for proving properties can be a practical solution.
Why not? Just because they have advanced type systems doesn't mean that we need to make use of those features, just as writing a Haskell program doesn't require us to use closed type families, multi-parameter type classes with functional dependencies, etc., and writing OCaml doesn't require everything to be an applicative functor over a row-polymorphic existential.
As an aside, when I am doing fancy things with types, I personally find dependent types much simpler to think about and work with than the house of cards that Haskell and GHC have been building via language extensions. The downside is limited type inference :(
In any case I was trying to highlight the common implicit assumption that programs written in languages with feature X (like powerful types) must therefore use feature X. There's no reason for that to be the case: programs in Haskell and OCaml can be stringly typed, so can those in Coq and Idris. Yet this clouds comparisons, where on one side we might have a formally verified microkernel, and on the other we might have a pile of bash string interpolations.
I will readily admit that Coq would be a pretty horrible choice for writing regular software, but mostly due to ergonomics like having multiple nested languages (the Vernacular, Gallina and Ltac) and having to "extract" programs to run them. On the other hand I think Idris is really nice, and I would enjoy using it for projects which don't rely on some existing library infrastructure.
> How often are you really doing non-structural recursion, though? My point was that it's exceptionally rare to write complex loops like a GCD algorithm.
It's definitely a small fraction of the number of `map` or `fold` calls I write, but it's not insignificant. I worked on a project recently that made heavy use of sets, so there was a lot of sorting going on which IIRC wasn't structurally recursive; it also calculated least common multiples (similar to greatest common divisor) as part of a random sampling function. That project was in Racket though :P
> Either way, I agree an escape hatch for proving properties can be a practical solution.
In Coq I often just stick on a decreasing "fuel" parameter, turn the result into an `option` and bail out with `None` if the fuel runs out :P
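The fuel idiom can be sketched in Haskell too, using `Maybe` in place of Coq's `option` (the function and its name are illustrative). Collatz step-counting is a fitting example, since nobody has proven that it terminates:

```haskell
-- The "fuel" idiom: thread a decreasing counter so the recursion is
-- trivially structural (on the fuel), returning Nothing if fuel runs out.
collatzSteps :: Int -> Integer -> Maybe Int
collatzSteps 0 _ = Nothing          -- out of fuel: bail out
collatzSteps _ 1 = Just 0
collatzSteps fuel n
  | even n    = succ <$> collatzSteps (fuel - 1) (n `div` 2)
  | otherwise = succ <$> collatzSteps (fuel - 1) (3 * n + 1)
```

A termination checker accepts this immediately, because every recursive call shrinks the fuel; the price is that callers must handle the `Nothing` case.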
What specific program properties have you tried proving where the hand proofs weren't all that difficult? The core issue is that once you try to capture a program property in one part of your program, you have to start capturing it everywhere and it's very difficult to contain the complexity. Simple looking properties can require very deep knowledge and expertise of writing proofs.
OCaml etc. are only likely to add features that give a good amount of power for a small amount of expertise, whereas Coq, Isabelle etc. don't need to hold back on the level of expertise expected from users (e.g. many users are researchers publishing journal papers based on that expertise). Dependent types give a massive amount of power for a massive rise in the expertise expected of the user.
I don't do much proving by hand; I tend to sketch out possible encodings by hand, but work on actual proofs on a computer. Incidentally I find that such exploratory work is where CoqIDE, ProofGeneral, etc. shine; as opposed to Agda or Idris which seem to work best for "writing up" and "filling in the holes" of proofs we're already confident in.
Most non-difficult things I've formally proven have been toy examples. I did enjoy formalising a particular AI approach, which was pretty simple to prove properties about once I figured out a nice encoding https://github.com/warbo/powerplay (for theorems see https://github.com/Warbo/powerplay/blob/master/SimpleTests.v ). Though that's more of a toy theory, rather than a huge programming effort like seL4, CompCert, etc.
> The core issue is that once you try to capture a program property in one part of your program, you have to start capturing it everywhere and it's very difficult to contain the complexity. Simple looking properties can require very deep knowledge and expertise of writing proofs.
Very true. There are essentially two ways to do it:
- Use standard encodings (e.g. lists and Peano naturals), then build up a mountain of lemmas about them, and plug those together in different combinations each time we want to prove a property (e.g. that the elements are sorted).
- Use "correct by construction" encodings (e.g. a list of differences) which guarantee the properties we want, then build up a mountain of helper functions which act on those encodings and preserve the properties (e.g. `mapDiffs`, `foldDiffs`, `filterDiffs`, etc.)
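The second approach can be sketched in Haskell (all names here are illustrative). In Agda or Idris the gaps below would be natural numbers, making sortedness hold by construction; in this sketch, smart constructors stand in for that type-level guarantee:

```haskell
-- "Correct by construction" sketch: a sorted list stored as its first
-- element plus non-negative gaps, so sortedness holds by representation.
data Diffs = Empty | Diffs Integer [Integer]  -- head, then gaps >= 0

toList :: Diffs -> [Integer]
toList Empty        = []
toList (Diffs x ds) = scanl (+) x ds

-- A helper that works on the encoding and preserves the invariant:
-- every gap it writes is provably non-negative.
insertDiffs :: Integer -> Diffs -> Diffs
insertDiffs y Empty = Diffs y []
insertDiffs y (Diffs x ds)
  | y <= x    = Diffs y (x - y : ds)
  | otherwise = case ds of
      []       -> Diffs x [y - x]
      d : rest
        | y <= x + d -> Diffs x (y - x : (x + d - y) : rest)
        | otherwise  -> case insertDiffs (y - x) (Diffs d rest) of
            Diffs d' rest' -> Diffs x (d' : rest')
            Empty          -> Diffs x ds  -- unreachable: insert never empties
```

The "mountain of helpers" shows up immediately: `map`, `fold`, `filter` and friends all need `Diffs`-specific versions like this one.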
I've seen a couple of potential improvements for this proposed; though I'm not sure whether they scale:
- Adam Chlipala likes to build up a custom proof tactic (in Certified Programming with Dependent Types this is called `crush`). Whenever that tactic doesn't work, he proves the statement manually, then alters the tactic's search algorithm so it can find that proof, then replaces the manual proof by a call to the improved tactic.
- Conor McBride proposed "ornaments" as a way to "add on" property-guaranteeing information to standard encodings, e.g. treating normal lists as lists-of-differences to preserve sortedness.
> OCaml etc. are only likely to add features that give a good amount of power for a small amount of expertise, whereas Coq, Isabelle etc. don't need to hold back on the level of expertise expected from users (e.g. many users are researchers publishing journal papers based on that expertise). Dependent types give a massive amount of power for a massive rise in the expertise expected of the user.
I don't think dependent types themselves are particularly difficult to grok, although they can certainly be used for all sorts of elaborate shenanigans. I don't think that's any different from programming languages themselves though, e.g. some languages like Basic and Scheme are routinely taught to children, despite there presumably being some monstrously fiendish projects out there which use those languages. I've also encountered some bewilderingly complex spreadsheets in my time, but I wouldn't say that spreadsheets themselves require difficult expertise.
I certainly find the few, simple rules of dependent types much easier to understand than the mess that is type level programming in Haskell (data kinds, singletons, associated types, open/closed type families, etc.). I still don't fully understand that stuff, but I do know that:
- Data kinds in Haskell are just normal values in dependently typed languages
- (Closed) type families in Haskell are just normal functions in dependently typed languages
- Type classes and instances in Haskell are just normal record types and values in dependently typed languages
- Constraints in Haskell are just function arguments in dependently typed languages (this also generalises constraint solving to implicit arguments)
- Associated types in Haskell are just normal record fields in dependently typed languages
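The type-classes-as-records correspondence can even be sketched inside Haskell itself (the dictionary names here are illustrative): the "class" becomes an ordinary record of functions, an "instance" becomes an ordinary value, and instance resolution becomes explicit argument passing.

```haskell
-- A Monoid-like "class" desugared into an ordinary record of functions.
data MonoidDict a = MonoidDict
  { memptyD  :: a
  , mappendD :: a -> a -> a
  }

-- A "method" is just a field lookup on an explicitly-passed dictionary.
mconcatD :: MonoidDict a -> [a] -> a
mconcatD d = foldr (mappendD d) (memptyD d)

-- "Instances" are just values; we can even have several per type.
sumDict :: MonoidDict Integer
sumDict = MonoidDict 0 (+)

listDict :: MonoidDict [x]
listDict = MonoidDict [] (++)
```

In a dependently typed language this is exactly how classes are modelled, with implicit arguments recovering the convenience of automatic instance search.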
Most of these features are just a "shadow language" ( https://gbracha.blogspot.com/2014/09/a-domain-of-shadows.htm... ), completely unnecessary when types are first-class, and especially when dependent types are allowed.
Again, just because something is available doesn't mean that we need to use it. The only pressure to do so comes from existing libraries. As another example, many Haskell libraries, including the Prelude (built-ins) and "do notation", make use of monads (via the `Monad` type class). Yet we're free to ignore all of that and never touch monads at all; we could spend our entire programming career in the language, blissfully unaware that such a thing even exists.
The language does require that we define a `main` value as our program's entry point (just like Java and C), and that value has a type like `IO ()`; hence we're forced to use `IO`. Yet we can use `IO` without ever knowing or caring that it just-so-happens to implement the `Monad` type class: we can do everything we want via functions like `runIO :: IO (IO a) -> IO a` instead of `join :: (Monad m) => m (m a) -> m a`, and `sequenceIO :: [IO a] -> IO [a]` instead of `sequence :: (Monad m) => [m a] -> m [a]`, etc., in the same way that many programmers spend entire careers making heavy use of lists (perhaps even using a LISt Processing language!) without having to know or care that `List` just-so-happens to be a monad.
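These hypothetical helpers (the names `runIO` and `sequenceIO` are the commenter's, not standard library functions; note `sequenceIO` naturally returns `IO [a]`, collecting the results) could be defined once by someone who does know about monads, and then used forever after without that knowledge:

```haskell
-- Monad-free IO helpers: users combine IO actions through these without
-- ever hearing the word "monad".
runIO :: IO (IO a) -> IO a
runIO action = do
  inner <- action
  inner

sequenceIO :: [IO a] -> IO [a]
sequenceIO []       = return []
sequenceIO (x : xs) = do
  v  <- x
  vs <- sequenceIO xs
  return (v : vs)
```

From the user's point of view these are just ordinary functions on `IO` values, no different from `length` on lists.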
I wouldn't call that data point robust or empirical, but Scala developers have been complaining about the slowness of their builds for more than ten years now, and it's pretty much accepted that the Scala compiler will never get more than marginally faster than it is today. The Scala 3 compiler is not looking much better either.
The speed of the compiler depends on the design of both the language and the compiler as a whole, not just the presence or absence of a decent type system. I would even argue that one mark of a well-designed type system is that it's conceived in a way that admits fast typechecking.
I've not used Scala much, but I know that while Haskell's de facto implementation is the Glasgow Haskell Compiler, there are also (unmaintained) interpreters like Hugs, so compilation isn't a necessity.
Plus there's the whole spectrum of Futamura projections, JITs, etc. ;)
Note: This could also be true for tooling/environmental reasons, and strongly typed languages do help smaller teams handle larger codebases and reduce the need for tests!
One clear failure of dynamic types for Web programming is injection attacks, which are trivially solved by treating HTML, SQL, etc. as separate types (e.g. http://blog.moertel.com/posts/2006-10-18-a-type-based-soluti... )
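The idea behind that type-based fix can be sketched in Haskell (the names here are illustrative, not from the linked post): untrusted `String`s can only become `Html` via an escaping constructor, so forgetting to escape becomes a type error instead of an injection vulnerability.

```haskell
-- HTML as a distinct type: the only ordinary route from String to Html
-- goes through escaping.
newtype Html = Html { rawHtml :: String }

escapeHtml :: String -> Html
escapeHtml = Html . concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Trusted markup can still be wrapped unescaped, but only deliberately.
trustedHtml :: String -> Html
trustedHtml = Html
```

A rendering function that accepts `Html` rather than `String` then makes it impossible to splice raw user input into a page by accident; the same trick works for SQL, shell commands, etc.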
If we are talking about a first iteration/MVP, then I can agree with you. On the other hand, once you have to maintain/extend/refactor this first iteration, you'll start seeing the (IMO) enormous benefits of static typing.
There is a (perhaps wrong) belief that being first to market greatly increases your chances of success if not outright dominance in an area.
People are unconcerned with the possibility that if successful they will be left with a mess to clean up and a system that is hard to evolve. They believe those are easy problems to fix.
But it's mostly statically typed already anyway.
Is it normal to refer to data structures as objects? To my knowledge, most typed languages shun the awesome potential of object-oriented programming as found in, say, CLOS.
Also, lexical closures in a functional language are "classes" from the OO setting, so it is natural to implement classes if you want them; see e.g. OCaml. What you rarely see in the functional setting are common OO design patterns, because many of those patterns are unnecessary or awkward in functional languages.
That is, objects and closures can both be implemented in terms of the other, if your language doesn't happen to offer both.
    (defun constructor (x y)
      (list (lambda (a) (+ x a)) (lambda (a) (* y a))))
    ;; e.g. (funcall (first (constructor 1 2)) 10) => 11
- terms allowed to depend on types, corresponding to polymorphism.
- types depending on terms, corresponding to dependent types.
- types depending on types, corresponding to type operators.
`object' here refers neither to the objects of object-oriented programming nor to objects in the Common Lisp Object System.
The meta level is about lambda terms, involving things like variable names, abstractions, judgements, derivations, equivalences, reductions, substitutions, etc.
The object level is what lambda terms are about, involving things like values, types, variable contents, etc.
direct link to a preview of the 2nd edition:
direct link to an unabridged version of the 1st edition:
The whole book basically takes a core minimal language (Oz) and extends it in various ways to obtain different paradigms (functional programming, declarative concurrency, message-passing concurrency, imperative/stateful programming, OOP, shared-state concurrency, constraint programming, logic programming, etc).
It's very fun and illuminating.
It also includes a part on formalizing semantics, of which I recall nothing.
And it's free.