Clojure.spec – Rationale and Overview (clojure.org)
474 points by SCdF on May 23, 2016 | 153 comments



This is an important move for the Clojure language. As with all dynamic languages - and exacerbated by the fact that it encourages putting data first, as opposed to, say, classes in other languages - Clojure puts the onus on the developer to catch errors in data.

The Clojure community as a whole embraced Prismatic's Schema [1] as a way to provide occurrence typing for data. But since library authors tend to minimize their dependency surface, most libraries do not ship with it.

With its inclusion in Clojure core, it will now be possible to provide occurrence typing at the edge of functions instead of tracking malformed input deep inside apps.

The Racket-style contract notation is also a very welcome change from other similar approaches, in my opinion.

[1]: https://github.com/plumatic/schema


Agreed that this is a big step, but what is being included in core is very distinct from Prismatic Schema. For example the way keysets vs values are handled is intentionally different from schema.

The rationale is a good read.


Yes, I've read it and see the difference from Schema. The huge advantage here is that even though the approach is different, it supersedes Schema's abilities, and it will be a `require` away for every library and application author.


> The rationale is a good read.

Hickey's design choices consistently appear to be exceptionally well reasoned.


It's because of the hammock time.


The code has just landed [1]. It's not a .cljc file yet, but fingers crossed this becomes a thing at some point.

[1]: https://github.com/clojure/clojure/commit/3394bbe616c6202618...


It will not be a cljc file, but it will be added to ClojureScript too.


Luckily Common Lisp already has ways to do that:

Functions have defined keyword argument lists and can carry type declarations, which a compiler like SBCL can partially check at compile time.

For anything more complex than lists of keyword/value combinations, Common Lisp uses CLOS classes, which form a multiple-inheritance hierarchy and provide types for slots.

That way I can look into a debugger at runtime and see what functions are actually supposed to receive, and what these collections of keyword/value maps actually mean (since they are CLOS classes).

Having simple lists as maps was popular with assoc lists in the 70s - basically the same as hash tables, but slower. Then lots of dynamic object systems were developed, which were in essence dynamic key/value sets. Later various object systems appeared, with CLOS as a local maximum.

Where Rich Hickey points to RDF, there is a lot of prior art in Lisp as well -> frame systems / description logics / ...

http://wilbur-rdf.sourceforge.net/docs/ivanhoe.html

http://franz.com/agraph/racer/krss-spec.pdf and a zillion others

CLIM, the Common Lisp interface manager, uses 'presentation types' to specify user level data structures...


Clojure, with its dabbling in schemas and rule systems / logic programming (as a substitute for conditional statements) and thanks to its great tools (like figwheel or devcards), could really be establishing a cheaper, "as-good" alternative to static typing and full formal verification.

Instead of encoding constraints as type signatures, the Clojure folks (true to character) encode them in data. In my eyes a very interesting, pragmatic trade-off between expressiveness and automatic verifiability.


> Instead of encoding constraints as type signatures, the Clojure folks (true to character) encode them in data.

I've never bought this line of thinking when the "Clojure folks" also say "code is data", because then a syntactic type signature you write down is also just data. If you want to consume the type signature (or spec signature or whatever) as syntax (as a sibling comment suggests), use a macro or any other program which consumes programs.

The thing I do see as being important is not how you write these things down, but whether they have an accessible representation at run/read/compile/whenever time.

Lastly, I strongly disagree these approaches are meant as a replacement for static typing or formal verification because (as far as I can tell) a value can be checked against a spec on demand, but makes zero guarantees as to what that value does in other parts of a program. This is also in contrast to Racket contracts, which will give you the correct blame information between parties using and not using contracts.


You make a very good point, the comparison was flawed there.

> The thing I do see as being important is not how you write these things down, but whether they have an accessible representation at run/read/compile/whenever time.

This is a better way to put it. The second important factor for me is expressiveness. With spec or any other contracts-like system, one gets the full power of the language to express constraints. Of course, type systems are not artificially restricted in this regard; they simply make a different trade-off.

I hope my comment did not come off as a riff on static vs dynamic typing, and I don't think any contract system is meant to replace type systems. Until expressing all important program specifications formally becomes viable for everyone (maybe through this work https://www.math.ias.edu/vladimir/current_work?), a less-formal, dynamic approach seems very attractive.


I hear you on expressiveness of using the language. Of course (and as you mention) the strength of restricting the expressiveness is that it becomes decidable in some amount of acceptable time for most programs.

I agree with your last sentiment but I don't think it has to be any less "formal" than something like Coq. I just think we shouldn't be so concerned with proving all properties of a program before running. And then we can turn the knob to adjust how much to prove during vs. before running.


> a value can be checked against a spec on demand, but makes zero guarantees as to what that value does in other parts of a program

The spec is just metadata. How it's used is part of the tooling. I see nothing precluding using specs in tools that employ and analyze that information in different ways.


Yep you are totally correct. I only point this out because the tool presented here does not do the same thing as other contract systems, but could (as you say) be used as a building block to get there.


Important to note that, as data, specs can be consumed by other systems for other purposes. For example, it's feasible to consume a spec within Alloy and use a SAT solver (which is made even easier given the split on keysets). This pushes spec from design and requirements validation, into runtime constraints, and on through to tests and verification.


> dabbling in schemas and rule systems / logic programming (as a substitute for conditional statements

That sounds interesting. Could you elaborate or post a link?


As a side note: in his talk "Simple Made Easy" (https://www.infoq.com/presentations/Simple-Made-Easy, around minute 42), Rich Hickey mentions that conditional statements are complex, because they spread (business) logic throughout the program.

As a simpler (in the Hickey-sense) alternative, he lists rule systems and logic programming. For example, keeping parts of the business logic ("What do we consider an 'active' user?", "When do we notify a user?", etc...) as datalog expressions, maybe even storing them in a database, specifies them all in a single place. This helps to ensure consistency throughout the program. One could even give access to these specifications to a client, who can then customise the application directly in logic, instead of chasing throughout the whole code base.

Basically everyone involved agrees on a common language of predicates explicitly, instead of informally in database queries, UI, application code, etc...

But Hickey also notes that this thinking is pretty "cutting-edge" and probably not yet terribly practical.


It can work. My current company uses a rule system to represent most of our business logic since it is so dynamic. The downside is that we have to rebuild the entire graph into memory (times the number of threads, times the number of app servers) every time anything changes (which is constant).

Facebook wrote about rebuilding a similar system in Haskell that only changes memory incrementally, so it's definitely possible to do better.


Interesting note, thank you. Are you referring to "Sigma" https://code.facebook.com/posts/745068642270222/fighting-spa... ?


That's the one.


Quick refs to logic programming and pattern matching in Clojure.

core.logic https://github.com/clojure/core.logic core.match https://github.com/clojure/core.match


I assume @ComNik is referring to core.schema and core.logic[1] here, respectively? (if not, sorry @ComNik)

1: https://clojure.github.io/core.logic/


Cheaper? How so?


Full, formal verification of complex systems (especially in the distributed case) requires a lot of modelling effort, beyond what is needed for "mere" implementation. A lot of important constraints cannot even be expressed in most type systems. And if one uses a specialised modelling language for the proof, the actual implementation might introduce bugs. Also relevant to real, shipping software: specifying a program in as much detail as automatic verification requires makes it even harder to change (which, given infinite time and money, is indeed a very, very good property!).

This just goes to say that a type system can be very helpful, but is ultimately just one part of regular testing.

So for most companies and most developers, anything that aids in keeping documentation up to date, writing or generating tests, and helping developers understand what they are reading probably has a better ROI.


If static type systems cannot do everything, it does not make sense to me to conclude that they shouldn't then be used for anything. Type systems aren't part of "regular testing", they're part of compile-time error checking.


That was not the intention behind my comment at all. I rely on type systems a lot myself.

I was trying to note that Clojure, as a dynamic language, is making a (to my eyes) very interesting choice, of doubling down on these dynamic methods of verification. For some uses, I can see this as the better choice, for others it isn't and won't be.


Sure, and I can't fault anyone for wanting less dangerous dynamic languages. But to me it's kind of like wanting less dangerous cigarettes.


As I commented elsewhere, Clojure very deliberately adopts the stance that type systems like Haskell's are valuable but heavyweight, and chose development speed and concision over types.

In my own work, I chose Clojure because it mirrored my experiences: that type systems weren't worth the trade-offs for me and my field.

Would I prefer stronger typing in different domains (longer project times available, greater consequences of failure like medical devices, etc.)? Sure, and I have in the past. But it's not the case that dynamic languages are always the incorrect choice, like smoking cigarettes.


Or like leaving the house without a suit of plate mail.


When you prove that a language like Clojure (untyped, but with dispatch that is no more dynamic than that of many modern typed languages) combined with a construct like specs is significantly less safe than any typed language, then you can make that statement. It's not like untyped and typed are two binary categories; otherwise, I could say that wanting less dangerous typed languages as opposed to formal tools are just like wanting less dangerous cigarettes. These ideas exist on a spectrum.


> spec has been designed from the ground up to directly support generative testing via test.check. When you use spec you get generative tests for free.

Great, I've been holding out on digging into the test.check generator syntax... looks like my laziness paid off!


It's still worth digging in! There are times when you will want to write your own generators.
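

For comparison's sake, test.check is inspired by Haskell's QuickCheck, where a hand-written generator looks like the sketch below. This is only an illustration: the property name and the "valid age" range are invented, not anything from the thread.

```haskell
import Test.QuickCheck

-- Hypothetical example: a custom generator for "valid ages" (0-120),
-- analogous to attaching a hand-written generator to a spec.
validAge :: Gen Int
validAge = choose (0, 120)

-- A property checked against 100 generated values by default.
prop_ageInRange :: Property
prop_ageInRange = forAll validAge (\a -> a >= 0 && a <= 120)

main :: IO ()
main = quickCheck prop_ageInRange
```

In test.check the analogous pieces are `gen/choose` and `prop/for-all`.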


It's unfortunate that Rich Hickey (assuming he's the author) can't write this without trolling about type systems. It's pretty clear to anyone who's read the literature, or thought for a few moments, that contract systems provide different guarantees from type systems, so there's no need to poke at type systems throughout the rationale.


For one, it goes with the territory. When designing a language you need to emphasize its strong points, and Clojure's strong point is its (sane) dynamic nature; they are playing to its strengths, for example in solutions such as transducers.

I don't view it as trolling. The mentality and style in dynamic languages are simply different. Once you add static typing, you break that mentality, replacing it with something different. Not inferior, just different. The patterns, the solutions end up looking different. This is why static typing added to a LISP like Clojure can only be optional and probably less powerful than what you can get with something like Haskell.

Or in other words, type systems are not practical for Clojure. There are attempts, like core.typed. Such solutions don't work very well because as said, adding types changes what you're allowed to do, even if those types are optional. So he's speaking primarily in the context of Clojure.

Also, poking fun at your competition is good marketing. People love it :-P


I think you're "seeing what you want to" in his comments. One of the reasons I appreciate clojure and the community so much is that it's one of the few that doesn't actually troll things.

If anything, the community tends to give really well thought out reasoning around why they make the decisions they are making.

When making an argument, sometimes it's important to explain why something is different, or why you've made the choice you made. Just because you explain why your idea is different doesn't make it trolling.


I highly doubt he's trolling. In his own words from a post where he asked people to refrain from language wars: "Haskell is a tremendously ambitious and inspiring language, for which I have the highest respect. There is zero point in bashing it (or its users) in this forum."

Clojure very deliberately adopts the stance that type systems like Haskell's are valuable but heavyweight, and chose development speed and concision over types.


It's quite annoying, but he definitely has a tendency to do this, though I wouldn't go so far as to call it 'trolling' per se. I think he's more sort of speaking to the 'in-group' and that can often be quite off-putting for people who are not members of that group because all the "but..." caveats are missing.

Btw, remember 'transducers'? He did the exact same thing there and claimed (in so many words) that it couldn't really be done in a typed language until people showed that, yes, it really could be done in a typed language such as Haskell. I suspect it's because he just doesn't know enough Haskell and couldn't get the type signatures to work. However, it's a big leap from "I cannot figure out how to do it" to "It cannot be done" and it requires a wee bit of (probably 'well-earned') arrogance to make that leap. :)


Rich provided the Haskell translation at https://www.reddit.com/r/haskell/comments/2cv6l4/clojures_tr.... His point was that some aspects of transducers were not possible to capture well in a type signature.


Some aspects, as in: not possible to describe state machines by types.

This is a well known problem, an active subject of research actually. Add asynchrony in the equation and you're out of luck. And transducers can also have asynchronous behavior.


> Some aspects, as in: not possible to describe state machines by types.

I really don't understand this, I mean:

    data TrafficLight = Red Int
                      | Green Int
                      | Yellow
(let's assume that you have a flexible traffic control system which permits changing the duration of Red/Green due to live traffic data, but Yellow must always be, say, 5 seconds because the safety requirements say so.)

Add a function

    nextState :: Input -> TrafficLight -> TrafficLight
    nextState = ...
and you're done. Am I missing something?

EDIT: Oh, I think I see what you're saying. This doesn't provide that many guarantees at the type level about state transitions, but I'd counter that: a) you have other options than "Int", so you could manage valid states more strictly, and b) there's such a thing as dependently-typed languages, say Idris, so you can go even further. Then, perhaps dependently-typed languages were what you meant by "active subject of research". I'd probably agree on that :)

> Add asynchrony in the equation and you're out of luck.

Not sure how that's to be interpreted? Are you talking about asynchronous updates to state, or...?

> And transducers can also have asynchronous behavior.

Is there a formal description of what semantics they have in that scenario? Or are we just going by implementation details?

EDIT: I should say: I'm not being facetious or going for any kind of 'gotcha' type thing here. I'm really curious; maybe I'm just misunderstanding your post.


I'm as clueless as you are about that asynchrony business, but would like to point out that you are not fully describing your state machine within the type system unless the typechecker prohibits you from defining (nextState Yellow == Green).


That would be an invalid definition since you must provide an (integer) parameter to Green. Effectively you must "magic" a default value for what happens when Yellow goes to Green.

(Obviously in my sketched scenario the value would really come from Input.)


That's true. My point was just that your type definition does not tell the compiler/typechecker that the light, if it is currently yellow, must become red before it can become green.

That's a rule of the state machine and, if the compiler doesn't stop you from writing a function that breaks that rule then your state machine has not been modeled fully within the type system.

I'm pretty sure this is what Hickey et al mean when they say that Haskell's type system is not powerful enough to describe a state machine.


> That's true. My point was just that your type definition does not tell the compiler/typechecker that the light, if it is currently yellow, must become red before it can become green.

Ah, yes, I understand what you mean now, but you can actually encode that if you really want to. (If we assume Haskell, there's very limited DT programming which will probably let you state which transitions are legal. I must admit I'm not very up-to-speed on it, but see the next paragraph.)

The thing is that people usually don't want to go that far. Whether you're working in a typed scenario or not. In the unityped scenario, you can't go that far without reinventing a (run-time?) type-checker.
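

For what it's worth, the "you can actually encode that" route can be sketched in Haskell with GADTs and DataKinds. This is just an illustration under my own assumptions (all names invented), and only one of several possible encodings (phantom types or indexed monads would also work):

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

-- Index the light by its colour at the type level.
data Colour = Red | Green | Yellow

data Light (c :: Colour) where
  RedLight    :: Int -> Light 'Red
  GreenLight  :: Int -> Light 'Green
  YellowLight :: Light 'Yellow

-- Only the legal transitions get signatures. There is no function from
-- Light 'Yellow to Light 'Green, so yellow -> green is a compile-time
-- type error rather than a runtime surprise.
toGreen :: Int -> Light 'Red -> Light 'Green
toGreen n _ = GreenLight n

toYellow :: Light 'Green -> Light 'Yellow
toYellow _ = YellowLight

toRed :: Int -> Light 'Yellow -> Light 'Red
toRed n _ = RedLight n

main :: IO ()
main = case toRed 30 (toYellow (toGreen 25 (RedLight 30))) of
         RedLight n -> print n  -- prints 30
```

The payoff is that `toGreen 25 YellowLight` is rejected by the type checker; the cost is exactly the extra machinery on display, which is the "people usually don't want to go that far" point.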

> That's a rule of the state machine and, if the compiler doesn't stop you from writing a function that breaks that rule then your state machine has not been modeled fully within the type system.

> I'm pretty sure this is what Hickey et al mean when they say that Haskell's type system is not powerful enough to describe a state machine.

Certainly this is true, but if you're in an entirely unityped setting... do you have any guarantees?

(To which the answer is, of course, "no".)

Foregoing all guarantees just because you can't guarantee everything seems... rash. (I believe this is the position that Hickey takes, for better or worse.)

Obviously, that's why I, personally, prefer languages with multiple types.

Note: I do think there are genuine scenarios where unityped may win out, e.g. "generic data transformation without knowing anything about the structure", but there are even "typed" answers for that, e.g. lenses. For my money -- and I'm obviously semi-joking -- ekmett >> hickey :)


> Certainly this is true, but if you're in an entirely unityped setting... do you have any guarantees?

This is what clojure.spec is all about. The answer is not, of course, "no". They are "reinventing a run-time type checker" and prefer it that way because it allows increased expressive power at the cost of delayed verification.

> Ah, yes, I understand what you mean now, but you can actually encode that if you really want to. (If we assume Haskell, there's very limited DT programming which will probably let you state which transitions are legal. I must admit I'm not very up-to-speed on it, but see the next paragraph.)

So you don't actually know how to do it, but assume that you can. This is a little strange. Are you sure that "people usually don't want to go that far"? Because the real answer is probably "it's too hard to do".

By the way, you probably want linear types in order to encode state machines in the type system in a natural way, which Haskell does not have.


> reinventing a run-time type checker

It's not a runtime type checker, but a spec (not a type[1]!) that tools can use to check in different ways.

------------

[1]: A spec is a proposition; a type is a judgment (that corresponds to a proposition). E.g., you can have a type: `data A = 1 | 2 | 3`, and `data B = 3 | 4 | 5`, but then testing `a < b` (assuming `a:A b:B`) may not be straightforward (depending on the language), while a spec of `A = {1, 2, 3}, B = {3, 4, 5}, a ∈ A, b ∈ B` does let you test `a < b`.
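
The footnote's example, rendered as actual Haskell for concreteness (an illustration of the point, not the original poster's code; all names are invented):

```haskell
-- A and B are distinct types even where their "values" overlap, so
-- a < b across them is not well-typed; you must first map both into a
-- common carrier.
data A = A1 | A2 | A3 deriving (Eq, Ord, Enum, Show)
data B = B3 | B4 | B5 deriving (Eq, Ord, Enum, Show)

toInt :: A -> Int
toInt = (+ 1) . fromEnum   -- A1 -> 1, A2 -> 2, A3 -> 3

toInt' :: B -> Int
toInt' = (+ 3) . fromEnum  -- B3 -> 3, B4 -> 4, B5 -> 5

-- A spec-style check is just a predicate on plain data:
inA, inB :: Int -> Bool
inA n = n `elem` [1, 2, 3]
inB n = n `elem` [3, 4, 5]

main :: IO ()
main = print (toInt A2 < toInt' B3)  -- prints True
```

The spec-style predicates compose directly (`inA 3 && inB 3`), while the typed values need the explicit embedding before they can be compared.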


Confusing types with typing judgments is wrong. A type is just one of the constituent parts of a typing judgment.


Types are propositions. See Curry-Howard.


Types encode propositions (see Curry-Howard) but are logical judgments in the language's logic. The expression `a:A` does not have a boolean truth value (like a logical proposition), but is rather a judgment. In my example above, the value of the expression `a ∈ B` is true iff a = 3 and false otherwise, but the expression `a:B` is just not a well-formed expression. In other words, you could create a type that corresponds to a proposition, but type assignment is distinct from a proposition (e.g. the proposition `b1 < b2`, assuming `b1,b2:B`) in the language.

More precisely, you could say that types encode propositions at the type level (even if the language allows the same data-level syntax in type definitions, as in Idris), while propositions at the data level are distinct; specs are data-level propositions.


`x:T` is a typing judgment. A typing judgment is not a type, even if one of its constituent parts (`T`) is a type.

What you call “data-level propositions” is really just “data-level abstract syntax representation of propositions”. Propositions really are types. (Though, of course, there are types that are not propositions.)

And your example is simply not valid Haskell.


> A typing judgment is not a type

You are absolutely right. `x` is a proof object of `T` in the logic induced by the language, but I didn't want to get to that level of technicality, just to show the difference between data-level propositions and type-level ones. That's why I added the second paragraph, but in the process confused matters; sorry.

> Propositions really are types.

Types have a syntactic meaning that determines the well-formedness of an expression. Once you talk about types, you must talk about a language; a type cannot exist without a language because it is a syntactic construct, but a proposition can: it is a set (of programs, or Turing machines if you like; people in the verification community may say it's a set of states in a Kripke structure). Now, you could say that given the proposition (or the set) P, you can construct type P in language L that expresses it, but P would still not be a data-level expression in L. You can then argue about whether or not this matters, but you can't deny that the two expressions of proposition, i.e., the dynamic `x < 10` and `T`, are distinct and "live" in two separate worlds. When data-level propositions are concerned, this is not the case. E.g., in TLA+, an untyped logic (TLA+ZFC), the program and the propositions are the very same objects and not at all just a "data-level abstract syntax representation of propositions".

> And your example is simply not valid Haskell.

It wasn't meant to be (I haven't written a Haskell program in over 15 years). Sadly, unlike standard logic and set notation, there isn't a standard notation for typed LC with data types, or any typed logic for that matter (not one that I'm familiar with), so you have to use something with a meaning people would understand or define a syntax every time.


> but a proposition can: it is a set

Set theories are normally built on top of the apparatus of logic, so a proposition can't be a set.

> (of programs, or Turing machines if you like; people in the verification community may say it's a set of states in a Kripke structure)

Ah, I see. You aren't talking about general propositions. You're talking about assertions about program states that can be checked dynamically, but want to verify statically. You can't represent an arbitrary proposition as a data structure, but it's of course a-whole-nother matter if you can decide the proposition's truth value by evaluating a Java expression of type `boolean`. The `boolean` expression itself is still not a proposition, though.

> but you can't deny that the two expressions of proposition, i.e., the dynamic `x < 10` and `T`, are distinct and "live" in two separate worlds.

Most programming languages don't have propositions at all, for a very good reason: a proposition is the type of a programmer-supplied proof that has no runtime computational relevance, and how to integrate proofs into programming is still a research question whose answer language designers can't wait for. In Java, “the dynamic `x < 10`” is just a `boolean` expression - not a proposition!

> E.g., in TLA+, an untyped logic (TLA+ZFC), the program and the propositions are the very same objects and not at all just a "data-level abstract syntax representation of propositions".

In TLA+, or any other external program logic, the program itself is just a syntactic object. And by extension so is any `boolean` expression the program might contain. OTOH, if Java had propositions, you could carry out your reasoning internally, side-stepping the need to encode propositions as abstract syntax for a metalanguage to see.


> Set theories are normally built on top of the apparatus of logic, so a proposition can't be a set.

You don't need a set theory. P is a "naive" set. It's well defined no matter what, because TMs are a recursive set, and P is just a RE subset. You're not in danger of running into any sharp edges.

> The `boolean` expression itself is still not a proposition, though.

I'm not sure what you mean by that, but I think we're talking past each other. That boolean expression is not a proposition about the program -- that is correct -- but the distinction may be artificial. You may be used to languages where there's a clear distinction between program properties and program code, but if you think about it, that distinction is artificial. For example, the expression `IF x > 0 THEN x ELSE -x` can be understood to be a program description taking the absolute value, but it can also be a proposition, or a description of the set of programs returning the absolute value of their input. Your language may use the same expression in both cases, yet still make a distinction about its meaning based on where it is used. Another language (like TLA+) has no such distinction.

> a proposition is the type of a programmer-supplied proof that has no runtime computational relevance

That's your definition. For me, a proposition about code is a set of TMs with a property (in particular, safety properties are a closed set of TMs, and liveness properties are dense etc. etc.). Its truth value is whether your program is in this set or not. To be more precise, a proposition to me is a sentence in any logic denoting that set.

I think my definition is both more encompassing and more precise than yours. It is more encompassing because every proposition by your definition is a proposition by my definition; it is more precise because the converse may not be true (although, again, I'm not certain about what your definition really means, its ties to a specific language, or whether or not it applies only to denotational semantics of programs).

> has no runtime computational relevance

I'm not sure what you mean by that. Does the proposition "the program returns a list containing exactly the same elements as its input, each appearing the same number of times, only in sorted order" have a runtime computational relevance or not?

> In TLA+, or any other external program logic, the program itself is just a syntactic object.

TLA+ is not an external program logic, but a complete logic -- i.e. a TLA+ expression can describe any unique and arbitrary TM, or a set of them -- and there is no difference between a program and a proposition about a program in TLA+. The program is certainly not a syntactic element in TLA+ (certainly no more than any proposition is a syntactic element). Conceptually, in TLA+ you can describe the set of all programs that perform quicksort (it can be more than one; TLA+ allows nondeterminism) and then ask whether quicksort implies sorting, or, in other words, whether the set of quicksort programs is a subset of all programs that sort their input. Both the program and the proposition are logical expressions (describing a set of TMs, if you will), and you may then try to prove that one implies the other (or any logical combination of the two):

> In a quest for minimality and orthogonality of concepts, TLA+ does not formally distinguish between specifications and properties: both are written as logical formulas, and concepts such as refinement, composition of systems or hiding of internal state are expressed using logical connectives of implication, conjunction, and quantification. Despite its expressiveness, TLA+ is supported by tools such as model checkers and theorem provers to aid a designer carry out formal developments.[1]

[1]: http://www.loria.fr/~merz/papers/tla+logic2008.pdf


P.S.

If you think about it, the conjunction of all true propositions about your program is the program (modulo the equality semantics of your logic). This is true whether or not your program is deterministic or, in the case of LC, a collection of state machines (depending on the evaluation strategy). Conceptually, then, there is no difference between a program and the true propositions about it. Some languages make a syntactic distinction between the two (Haskell), some languages use similar syntax but different semantics (Idris), and some make no syntactic or semantic distinction (TLA+): your program is a logical sentence (a conjunction of axioms, if you will), from which all true propositions about it can be derived (and, obviously, only those). A program is the complete axiomatization of all true propositions about it[1].

[1]: This may get more tricky if intuitionistic logic is used, where the program is actually the proof object of all true propositions, but I think that even in intuitionistic logic, the distinction may be artificial: a proof may be no more than a hidden proposition.


There's so much that's wrong with this comment. You're either ignorant of formal logic, or deliberately trolling. If the former: being ignorant isn't a crime, but pretending you know what you don't know is uncivilized. If the latter: of course trolling is uncivilized. Let's keep things civilized, okay?

> If you think about it, the conjunction of all true propositions about your program is the program (modulo the equality semantics of your logic).

This doesn't even make sense. A program can be run. What does it mean to “run a proposition”? Even if I charitably interpret that as “decide its truth value”, well, not all propositions are decidable!

> your program is a logical sentence (a conjunction of axioms, if you will),

What you can treat as a bunch of axioms is the semantics of your programming language of choice. But a program isn't the same thing as the language it's written in.

> the distinction may be artificial: a proof may be no more than a hidden proposition.

This isn't true in either classical or intuitionistic logic. A proposition has a truth value - a proof does not.


> Let's keep things civilized, okay?

Formal verification of large, very complex real-world systems is what I do all day every day. If you don't understand what it is that I'm saying (and that's my fault -- I'm probably not clear enough) you can just ask. I'm a practitioner, not a theoretician, so I may make mistakes, but I think that in this particular case you are too attached to the typed LC to consider other reasoning systems (which, BTW, happen to be far more prevalent in the verification community).

> This doesn't even make sense. A program can be run. What does it mean to “run a proposition”?

Let's step back and consider the theory and then get to the specifics. Consider the set of all TMs (perhaps TMs isn't the best name because it may imply a specific programming model, but that is not my intent; I'm using the term here interchangeably with an abstract state machine, or a program). Each and every one of them is "runnable", right? Each and every subset of that set also consists of TMs, which are also runnable, right? A proposition about a TM constitutes such a subset. Under certain conditions, you can incorporate all DTMs in the set corresponding to the proposition into a single NDTM, which can also be run. Hence, there is a direct correspondence between propositions about programs (e.g., the program returns its input sorted) and NDTMs. Hence, you can most certainly run a proposition about programs (though perhaps not all propositions).

The temporal logic of actions (TLA) is a complete logic in the sense that for every NDTM (or a DTM as a special case) there exists a (non-unique) TLA expression, consisting of the simple logical connectives, the priming operator (an action) and possibly LTL operators (describing, say, fairness properties) that describes that particular TM. But, of course, a TLA expression can be less precise, and describe a whole set of TMs -- a proposition. Therefore in TLA, propositions and programs are the same object. Here's Lamport in the opening paragraph introducing TLA in the 1993 paper[1]:

Correctness of [an] algorithm means that the program satisfies a desired property. We propose a simpler approach in which both the algorithm and the property are specified by formulas in a single logic. Correctness of the algorithm means that the formula specifying the algorithm implies the formula specifying the property, where implies is ordinary logical implication.

FYI, that paper has 2458 citations and is far from obscure. This theory is much better known in the software verification community than type theory (certainly when you consider it within the greater framework of temporal logics, of which TLA is one), and has had a greater impact -- both academic and real-world -- on formal methods, at least so far. Specifying programs as propositions in temporal logics was a modern theoretical turning point in CS, and has yielded two or three Turing awards (Pnueli in '96, Clarke et al. in '07, and Lamport -- at least partly for this -- in '13).

> Even if I charitably interpret that as “decide its truth value”, well, not all propositions are decidable!

This is where things are a bit tricky. For a complete treatment see the preliminaries section in this '88 paper by Abadi and Lamport[2].

> A proposition has a truth value - a proof does not.

Of course, but there's room for interpretation here (like I said, I'm not a type expert so I'm not 100% sure about it): `<x = 3, y = 5>` can be viewed as a proof object of the proposition `∃x,y ∈ 1..10 . x < y`, but it can also be viewed as the more precise proposition, `x = 3 ∧ y = 5`. Existential propositions (or types) can therefore be viewed as a set of multiple programs, while the proof object is a particular program, that could have been specified with a more precise proposition. That is what I meant when I said "hidden proposition".
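A quick sketch of those two readings in Lean notation (hypothetical, just to make the point concrete; not something from the thread):

```lean
-- The pair ⟨3, 5⟩, packaged with a proof that 3 < 5, is a proof object
-- of the existential proposition:
example : ∃ x y : Nat, x < y := ⟨3, 5, by decide⟩
-- The same data can instead be read as the sharper proposition
-- x = 3 ∧ y = 5, of which the existential is a consequence.
```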

If you step back from typed LC and consider the original proof objects, those objects are just a series (a tree, actually) of logical deductions. Each of those deductions is a proposition. A proof is, therefore, a series of true propositions, each derived from the previous one using one of the deduction rules. You can certainly say, then, that a proof is a collection of hidden (true) propositions.

[1]: http://research.microsoft.com/pubs/64074/lamport-actions.pdf

[2]: http://research.microsoft.com/pubs/64046/abadi-existence.pdf


> Consider the set of all TMs (...) Each and every one of them is "runnable", right? Each and every subset of that set also consists of TMs, which are also runnable, right?

This much makes sense.

> A proposition about a TM constitutes such a subset.

A unary predicate whose argument ranges over TM may have such a subset as its extension. Two syntactically different predicates may have the same extension.

> If you step back from typed LC

At this point I'm not even talking about types. I'm talking about deductive systems in general.

> and consider the original proof objects, those objects are just a series (a tree, actually) of logical deductions.

This part is correct.

> Each of those deductions is a proposition.

Propositions exist inside of a logic, deductions exist in the logic's metatheory. There oughtn't even be the possibility of conflating them.

> A proof is, therefore, a series of true propositions, each derived from the previous one using one of the deduction rules.

A deduction is a series of judgments related by rules of inference. But a judgment is not a proposition!


> A unary predicate whose argument ranges over TM may have such a subset as its extension.

Yes, but now we can talk about a specific logic/language, and now it matters exactly how the NDTM is expressed syntactically. E.g. in TLA (and LTL, or CTL for that matter) `[](x ∈ {1,2,3})` means "the variable x in the program is always in 1..3", or (semantically) "the set of all TMs where x is always in 1..3". x is a name given to a variable in an infinite set of variables defining the abstract machine's state (i.e. a state in TLA is defined to be an assignment to the infinite set of all possible variables), and the existence of the machine (or a Kripke structure, if you will) is implied. If you only supply a proposition on variable x, it means that all other (infinite set of) variables can take any value.

You can read about TLA/TLA+ in one of the links I provided to get the specifics (it's extremely easy to pick up, BTW). In any event, a proposition in TLA can be thought of as a predicate over a(n implicit) TM. You may also find this short note by Lamport about the history of TLA/+ (and in particular the unification of temporal-logic propositions and programs) interesting[1].

> Two syntactically different predicates may have the same extension.

True, I didn't say the syntactic representation is unique. Also, in TLA it's no longer just a predicate but any proposition.

> There oughtn't even be the possibility of conflating them.

I am not conflating them, and I'm not saying a proof is formally a proposition. I'm saying that in intuitionistic logic, a proof (especially of an existential quantification) may be also interpreted as a more precise/limited proposition. But this is just a vague notion, and it could be completely wrong (even if I were able to communicate it clearly), so I see no point in debating this.

[1]: http://research.microsoft.com/en-us/um/people/lamport/pubs/c...


> That boolean expression is not a proposition about the program -- that is correct -- but the distinction may be artificial.

A Java `boolean` expression isn't a proposition about anything. A proposition might be, say, the assertion that the expression always evaluates to `true`, but this is inexpressible in Java itself.

> You may be used to languages where there's a clear distinction between program properties and program code, but if you think about it, that distinction is artificial.

The distinction I made wasn't between program properties and program code, and in certain programming languages (say, Agda), propositions themselves can be normal code. But Java doesn't have propositions at all, so if you need to express propositions, you'll have to use an external logic.

> For me, a proposition about code is a set of TMs with a property (in particular, safety properties are a closed set of TMs, and liveness properties are dense etc. etc.).

This is just wrong, and it has nothing to do with types. Even a set theorist would disagree with you. A proposition is not a set in any conceivable setting.

> Its truth value is whether your program is in this set or not. To be more precise, a proposition to me is a sentence in any logic denoting that set.

The denotation of a proposition is always a truth value.

> Does the proposition "the program returns a list containing exactly the same elements in its input, each appearing the same number of times, only in sorted order" has a runtime computational relevance or not?

It does not. It's simply a fact about your program. Your program can't branch on it.

> TLA+ is not an external program logic, but a complete logic

It's external because it's not a part of Java or whatever other language you're using to write your actual program. And nothing prevents an external logic from being “complete” or whatever other desirable property you want - in fact, the very reason for using external program logics is to avoid the limitations of the language your program is written in!


> A proposition is not a set in any conceivable setting.

What you called "predicate" in your other comment, is a proposition in a logic that expresses machines in a certain way (where the object of the predicate, i.e. the machine, is implicit). TLA is a (linear) temporal logic, and so, like all temporal logics (at least those concerning computer programs) its structure is an abstract state machine (which, BTW, can also describe LC, either by choosing an evaluation strategy, in which case the machine is deterministic, or not -- treating it as an abstract rewriting system -- in which case the machine is nondeterministic).

> It's external because it's not a part of Java or whatever other language you're using to write your actual program.

Now you completely lost me. You said "In TLA+, or any other external program logic, the program itself is just a syntactic object". In TLA+, the only notion of "the program itself" is the TLA+ expression. Whether your TLA+ expression is automatically generated from Java or C code (as some research teams have done) or coded manually, the other representation is not an object in TLA+ at all. It's like saying that if I implement quicksort in Java and you implement it in C, then your program is a syntactic object of mine. I'm not sure what it means. When you code quicksort in TLA+, TLA+ is not aware in any shape or form of any other existing or possible representations.

Unrelated to your statement about external logics, it is obviously true that TLA+ is not used as a programming language (I guess it could be, but it would be extremely inefficient[1]) but a specification language. TLA+ is used across many companies in the industry to specify systems that are very large (the program I alone verified in a few months is many times bigger than seL4 or CompCert), and therefore full end-to-end language-level (or extracted) verification does not apply to them anyway. TLA+ provides a different level of guarantees, but also aims at a different scale of programs (Lamport: "The development of TLA+ was motivated by the needs of engineers building large systems, for which complete formal development is out of the question."[2])

[1]: Synchronous programming languages (like Esterel, or the modern Céu) share some similarities with TLA, and indeed Esterel and its descendants are used in the industry to write and verify safety-critical real-time systems (esp. avionics and other embedded software). However, in Esterel (unlike TLA+) the LTL propositions are indeed distinct from the program, and are therefore external.

[2]: http://research.microsoft.com/en-us/um/people/lamport/pubs/c...


Again, perhaps I've complicated simple things unnecessarily:

> A proposition is not a set in any conceivable setting.

A proposition can be seen as a predicate over the logic's structures, and therefore a set of structures. TLA is a logic whose structures are abstract state machines (and therefore the machine is implicit). Any proposition in TLA is therefore a predicate over programs, or a set of them, and is, therefore, runnable.

But since I've added this comment, let me give a specific example. The TLA+ statement `A ≜ x = 3 ∧ (x' = x)` says that the value of variable x in the machine's initial state is 3, and it never changes. It can also be viewed as the set of all programs with this behavior. If we add the following statement, `VARIABLES x`, we tell TLA+ that variable x is the only variable we care about (of all infinite variables), and at this point the proposition is runnable, because all (infinitely many) machines would appear to behave in the same way with regards to variable x, so we may as well imagine that x is the only variable. "Running" within this logic, then, means visiting all states of all possible machines where the proposition holds true.

Now, let's define proposition B: `B ≜ x = 3 ∧ y ∈ {-1, 1} ∧ (x' = x ∧ y' ∈ {-1, 1})`. This is a proposition about the behavior of two variables, but also a description of a nondeterministic program (in which at any state, y can take the value 1 or -1), which we can run (once we tell TLA+ `VARIABLES x, y`).

Now we can also write `B ⇒ A` -- we say B implements A -- (which is true) and semantically means that B is a simulation of A (or a trace-subset; I'm not sure, but I think it's the same as far as we're concerned), but this is just a restatement of treating `⇒` as the familiar implication connective within the modal (temporal) logic. "Running" `B ⇒ A` is the same as running B, but visiting all states where `B ⇒ A` is true, which is everywhere. Similarly, we can define `C ≜ x = 3 ∧ y ∈ {-1, 1} ∧ (x' = x ∧ y' = -y)`, which is still nondeterministic but now has only two possible behaviors, and it is true that `C ⇒ B`. If `D ≜ x = 3 ∧ y = 1 ∧ (x' = x ∧ y' = -y)`, then D is now fully deterministic, and D ⇒ C (and therefore D ⇒ B and D ⇒ A).


It's more accurate to say that some types are propositions. More precisely, a type is a proposition if and only if any two inhabitants of it are equal.


> So you don't actually know how to do it, but assume that you can. This a little strange.

Also a little strange to follow this ticking off with a false claim "you probably want linear types in order to encode state machines". State machines certainly do not require linear types.


My claim is not false: state machines do not "require" linear types, but that is not what I said. I said linear types are what you want if you want to encode these things in a natural way.

Substructural (linear) types are ALL ABOUT encoding stateful propositions in the type system. Even if you had an expressive dependent type system, the best way to type check a stateful protocol would be to add substructural types to it!

See http://users.eecs.northwestern.edu/~jesse/pubs/dissertation/... for a good discussion of encoding stateful propositions with substructural (linear) type systems.
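As a rough illustration of the idea (not taken from the linked dissertation; the types and method names are invented), Rust's move semantics give an affine flavor of this: the protocol state is baked into the type, and each transition consumes the old state value so it cannot be replayed:

```rust
use std::marker::PhantomData;

// Sketch: a two-state handle protocol enforced by the type system.
// `Closed`/`Open` and the methods are invented for illustration.
struct Closed;
struct Open;

struct Handle<State> {
    name: String,
    _state: PhantomData<State>,
}

impl Handle<Closed> {
    fn new(name: &str) -> Self {
        Handle { name: name.to_string(), _state: PhantomData }
    }
    // Consumes the closed handle; the old value cannot be reused afterwards.
    fn open(self) -> Handle<Open> {
        Handle { name: self.name, _state: PhantomData }
    }
}

impl Handle<Open> {
    // Reading is only defined in the Open state.
    fn read(&self) -> String {
        format!("contents of {}", self.name)
    }
    fn close(self) -> Handle<Closed> {
        Handle { name: self.name, _state: PhantomData }
    }
}

fn main() {
    let h = Handle::new("log.txt").open();
    println!("{}", h.read());
    let _closed = h.close();
    // h.read();                // rejected: h was moved by close()
    // Handle::new("x").read(); // rejected: read() only exists on Handle<Open>
}
```

Out-of-protocol uses (the commented-out lines) fail at compile time, which is the "stateful proposition in the type system" being discussed.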


This is wrong. Linear types are about values having a single-threaded logical history. It's true that linear types allow you to soundly implement certain operations as in-place modifications of their inputs, because these inputs are guaranteed not to be used anymore by other computations. But that's an implementation detail. If you don't need to control the number of logical futures of your values, you don't need linear types.


> increased expressive power at the cost of delayed verification.

I don't see where the increased expressive power comes from. Dynamic checks are equally doable in a typed and an untyped setting.

But let's say it gives you more expressive power. It does so at the cost of no verification. Validation is a very poor substitute.

> Because the real answer is probably "it's too hard to do".

It's common usage of GADTs and DataKinds nowadays. Granted, strictly speaking not Haskell, but GHC Haskell.

> By the way, you probably want linear types in order to encode state machines in the type system in a natural way

You only need linear types if you need to enforce that (some) values will have a single logical future.


> Dynamic checks are equally doable in a typed and an untyped setting.

Yes, but specs are not "dynamic checks" but rather "static metadata". How they are checked depends on the tooling. It's true that a typed language could support specs, but those would either be an additional orthogonal mechanism (like Java and JML/C# and Spec#, or somewhat differently, Isabelle/HOL) or encoded in dependent types, which require the tooling to be built into the compiler, which would support either manual/interactive proofs (as in Agda/Coq respectively) or automatic inference (as in Liquid Haskell).

> It's common usage of GADTs and DataKinds nowadays. Granted, strictly speaking not Haskell, but GHC Haskell.

That's not the same level of expressivity, and certainly not the same level of effort.


> Yes, but specs are not "dynamic checks" but rather "static metadata".

It's the checking that matters, not the declaration of intentions. If the checking is dynamic, I'm calling the whole thing dynamic. And, because there exist hard limits on what can be inferred about programs, certain specification styles don't support static verification without additional programmer-supplied information.

> That's not the same level of expressivity, and certainly not the same level of effort.

Using GADTs parameterized by DataKinds to encode problem domain logic is by now a de facto standard practice in the Haskell community. It wouldn't have become one if it required an unreasonable amount of effort.


> If the checking is dynamic, I'm calling the whole thing dynamic.

Who says the checking is dynamic? This release contains several orthogonal capabilities: specs, which are a completely static language feature, and a few dynamic checking tools. Tomorrow they can release some static checking tools. Java's JML and C#'s Spec# are similar. They are static specs, but there are various tools for checking them, some static (using SMT solvers or model checkers), some dynamic (dynamic contracts), and some mixed (e.g. concolic tests).

> Using GADTs parameterized by DataKinds to encode problem domain logic is by now a de facto standard practice in the Haskell community. It wouldn't have done so if it required an unreasonable amount of effort.

First, I didn't say unreasonable, but I think it is still more effort. Second, without dependent types (which require proof, either manual -- which is a lot of effort -- or automatic -- which is rather limited), you can't nearly express the same propositions.

The advantages and disadvantages of types do not stem from them expressing static propositions -- there are many ways of doing that -- but from them expressing propositions in a certain way that influences how the program is organized and how propositions propagate throughout the program.

Whether propositions about code are best expressed as types or as data-level propositions is an interesting debate, but a rather subtle one; it is certainly not binary. IMO, it is likely that some propositions are best expressed as types while others are best expressed differently.


> Tomorrow they can release some static checking tools. Java's JML and C#'s Spec# are similar.

If, in order to check a program, you need to supply additional annotations, you aren't checking the original program - you are checking the extended annotated program. So let's not conflate these two languages:

(0) The whole of Java, and,

(1) The collection of Java programs, equipped with JML annotations, that can be statically verified by some given tool.

If the annotations express facts inexpressible in Java's type system, and the tool performs a sound analysis, then you can prove more safety theorems about the second language. (And if the analysis is unsound, why bother with it?) That both languages happen to have mostly the same surface syntax and dynamics is best regarded as a happy accident.

> First, I didn't say unreasonable, but I think it is still more effort.

How so? All you have to do is think what properties you want to hold for your system. But this is something you have to do whether you encode these properties in types or not.

> Second, without dependent types (which require proof, either manual -- which is a lot of effort -- or automatic -- which is rather limited), you can't nearly express the same propositions.

I never said otherwise. I only said “using GADTs and DataKinds to express problem domain constraints is a de facto standard practice”.

> The advantages and disadvantages of types do not stem from them expressing static propositions -- there are many ways of doing that

Merely expressing static propositions is boring. What really matters is proving them.

> but from them expressing propositions in a certain way

In a compositional way: the types of expressions can be synthesized from those of their subexpressions, and the types of defined symbols can be recovered from their definitions' right-hand sides.

> that influences how the program is organized and how propositions propagate throughout the program.

In a modular way: when checking a module Foo, you won't observe a failure of an invariant internal to another module Bar.

> Whether propositions about code are best expressed as types or as data-level propositions is an interesting debate, but a rather subtle one; it is certainly not binary.

I can see legitimate use cases for expressing propositions as types (most of the time), as well as legitimate use cases for not bothering to represent propositions in a mechanically verifiable manner (when the proof is obvious, only needed in a very small part of the code, and explicit annotations would only add noise). But I can't see a use case for expressing yet not proving nontrivial propositions that you expect to hold. If it isn't trivial, you need a proof.

> IMO, it is likely that some propositions are best expressed as types while others are best expressed differently.

Whatever makes the proof go through the easiest.


> If, in order to check a program, you need to supply additional annotations, you aren't checking the original program - you are checking the extended annotated program.

I don't understand what you're saying. The annotations express propositions about the code, i.e. the original program. You could say that typing is also "supplying additional annotations" in a type-level language. I don't understand your definition of "language" in this case.

> All you have to do is think what properties you want to hold for your system. But this is something you have to do whether you encode these properties in types or not.

Not quite because 1/ there are very few properties of programs you can express without dependent types, and 2/ the language used to express the properties is usually very different from the data-level language (except in some languages with dependent types, but those have their own many drawbacks) and often requires clever tricks and thought not about the proposition but about how to express it.

> Merely expressing static propositions is boring. What really matters is proving them.

What matters is verifying them to the possible or desirable degree. Proving most interesting propositions may often be too time-consuming or downright unfeasible.

> In a compositional way: the types of expressions can be synthesized from those of their subexpressions, and the types of defined symbols can be recovered from their definitions' right-hand sides.

This is true for most ways of expressing propositions and is not unique to types. In fact, type inference is more limited than "spec inference", because every expression usually belongs to a single type only (assuming no subtyping). So, e.g., from the expression `x = 3` you can infer only the type judgment `x:Int`, but you can infer many relevant propositions, like `x ∈ Int`, `x ∈ Nat`, `x ∈ {1, 2, 3}`, `x < 50` and `x = 3`. This can be both an upside or a downside for either system.

What is unique to types is indeed some level of modularity (on the one hand, and lack of modularity on the other -- it's complicated). Also, types, because they are a syntactic construct, enforce certain structure on the code which may be beneficial (or not).

> In a modular way: when checking a module Foo, you won't observe a failure of an invariant internal to another module Bar.

This is not so simple. You don't observe a failure provided that the type of Bar fully expresses every property that may affect Foo. I.e., Foo is not allowed to assume anything about Bar other than what is expressed in the type -- that's OK -- but it can't even inquire about it (and let the checker tell you whether the answer to your inquiry is true or false). For that to be truly modular, the author of Bar has to encode every potentially interesting property of Bar, and none of the properties that may change due to Bar's implementation; this is very hard to do right. At the extreme, your type must encode the full denotational semantics of Bar, making none of it internal; on the other hand, this allows the author of Bar to cleanly hide some properties (OTOH, hiding properties is also possible in untyped specs). This is, again, both an advantage and a disadvantage. Overall, once interesting properties are expressed (through dependent types), I don't think types have a clear modularity advantage anymore.

Another problem with types is that a type is more than a proposition: it is a specific construction of the proposition. OTOH, when you state propositions directly, the expressions `x ∈ Nat ∧ x < 3`, `x ∈ {0, 1, 2}`, `x ∈ Int ∧ 0 ≤ x ≤ 2` all denote the same proposition; in that respect they are more modular than types (I believe, though I may be wrong -- I'm far from a type expert -- that when using dependent types you may need to take into account definitional equality which makes this extensional equivalence tricky).

> I can see legitimate use cases for expressing propositions as types (most of the time),

In practice, though, this is rarely the case. Even those few programs that are fully verified deductively usually use a language like Isabelle/HOL, where most interesting propositions are not expressed as types. AFAIK, there has only been a single nontrivial (though still rather small) real-world program ever written that uses types to express all interesting properties (CompCert), and even Xavier Leroy didn't bother proving termination (he says it was too hard/tedious, so he just put in a counter and emits a runtime error if it runs out).

> But I can't see a use case for expressing yet not proving nontrivial propositions that you expect to hold. If it isn't trivial, you need a proof.

What do you mean by "need"? And what do you mean by "proof"? Obviously you don't need formal proof because 99.999% of software written don't prove their interesting, non-trivial properties. Sometimes you want or need a certain level of confidence. That confidence can be provided by a manual or automatic deductive proof, an automatic model-theoretic proof, or something less than proof, like auto-generated tests.

To summarize, I think that even among type enthusiasts, the question of whether types are always the best way to express (static) propositions about code is an open one. Too little is known about how well either approach works in practice.


P.S.

As to observing failure, it's even more complicated than just a matter of which proposition to express in types -- which may be hard in and of itself -- but also some failure modes may be very hard to express using types altogether because they don't naturally lend themselves to denotational semantics (e.g. time and space complexity). Not that those are easy to express in Clojure's spec, and not that they are necessarily a problem with types, only with typed LC, but still (they're easy to express in TLA+).


> I don't understand what you're saying. The annotations express propositions about the code, i.e. the original program.

If you add JML annotations to a Java program, one point of view is that the annotations are external and you're just proving properties about the original unannotated program. I contend that a more useful point of view is that you actually have written a program in another language that happens to be a subset of Java. The safety theorem for the sublanguage rules out more ways of “going wrong” than Java's safety theorem. This is consistent with the fact you wouldn't accept the program if the supposedly “external” SAT solver or type checker or whatever rejected it - you'd check why the tool rejected it.

> You could say that typing is also "supplying additional annotations" in a type-level language.

If you remove the type level from, say, ML, what you get isn't ML anymore. Actually, it isn't even a well-defined language, since ML's dynamic semantics only assigns a meaning to well-typed programs. But, even if you extend ML-minus-types' dynamic semantics to give a meaning to every well-formed syntax tree, its safety theorem will rule out a different set of misbehaviors than ML's.

> I don't understand your definition of "language" in this case.

A programming language is a possibly infinite collection of programs, and is generated from three finite components: a grammar, a static semantics and a dynamic semantics. The static semantics determines which syntax trees are considered valid programs, and the dynamic semantics assigns a meaning to every valid program. I didn't expect to have to provide such an obvious definition here.

> Not quite because 1/ there are very few properties of programs you can express without dependent types.

Plain parametric polymorphism and substructural types can enforce security policies (who gets to manipulate certain kinds of data and how often) and conformance to communication finite-state protocols. GADTs and DataKinds can extend this to protocols with an unbounded number of states, as well as enforce invariants of data structures. (GADTs probably count as a limited form of dependent types, though.)
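For the data-structure-invariant part, here's a small sketch (invented type; Rust standing in for the GHC features mentioned, and much weaker than real GADTs): emptiness is ruled out by construction, so the "first element" operation needs no partiality:

```rust
// A list that cannot be empty: the invariant is carried by the type,
// not re-checked at each use site.
struct NonEmpty<T> {
    head: T,
    tail: Vec<T>,
}

impl<T> NonEmpty<T> {
    fn new(head: T, tail: Vec<T>) -> Self {
        NonEmpty { head, tail }
    }
    // Total: no Option/None case is possible, because every value
    // of this type has a head by construction.
    fn first(&self) -> &T {
        &self.head
    }
    fn len(&self) -> usize {
        1 + self.tail.len()
    }
}

fn main() {
    let xs = NonEmpty::new(1, vec![2, 3]);
    assert_eq!(*xs.first(), 1);
    assert_eq!(xs.len(), 3);
}
```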

> and 2/ the language used to express the properties is usually very different from the data-level language (except in some languages with dependent types, but those have their own many drawbacks)

That's a limitation of your tooling.

> and often requires clever tricks and thought not about the proposition but about how to express it.

In what meaningful sense can you claim to have a proposition if you don't know how to express it?

> So, e.g., from the expression `x = 3` you can infer only the type judgment `x:Int`

What you can infer from the definition `x = 3` varies from one language to another. While ML will indeed only infer `x: int`, this is just because ML's static semantics lacks a notion of definitional equality for term-level identifiers.

> Foo is not allowed to assume anything about Bar other than what is expressed in the type -- that's OK -- but it can't even inquire about it (and let the checker tell you whether the answer to your inquiry is true or false). For that to be truly modular, the author of Bar has to encode every potentially interesting property of Bar, and none of the properties that may change due to Bar's implementation.

Again, depends on your type system. With dependent types, `Bar` can have a rather uninformative type (say, `int`), and separately but in the same context, you can provide a witness `p` of some useful property of `Bar` (say, that it is prime). If the language has a module system, this lets you control how much knowledge you give others about `Bar`. It isn't true that you need to cram absolutely everything you want others to know about `Bar` in its type.


> I didn't expect to have to provide such an obvious definition here.

What I meant was that it seems to matter to you what name you give your language. What difference does it make whether I call my program's language "Java" or "Java+JML"? In the case of Lisps, like Clojure, the distinction seems doubly superficial.

> Plain parametric polymorphism and substructural types can...

Yes. In general, the expressive power of non-dependent types is that of an FSM (or a regular expression, if you will), with the number of states equal to the number of concrete types (i.e., all combinations of generic type instances). The question is how much that limited power affects program quality in practice. This is an empirical question with unclear answers.

Don't get me wrong: I don't doubt that type systems can have great benefits. I'm just saying that the way those benefits express themselves, the "best" type system strength (and the point of diminishing or negative returns), and the interaction with verification tools are all very much open problems. My personal, very biased, mostly aesthetic preference -- for a programming language -- is for a relatively simple type system, possibly even unsound, maybe in combination with a bit of cleverness in the form of linear types (again, possibly unsound), plus specifications expressed as direct, "data-level" logic (even as mere syntax; no need for TLA+-style complete logic/program unification). For full formal verification of high-level specifications, I think TLA+ is as close to a perfect language as anyone has ever designed.

> In what meaningful sense can you claim to have a proposition if you don't know how to express it?

In a sense similar to the fact that I could express every context-free grammar on a machine with finite memory as a regular expression, but doing so may be hard.

> It isn't true that you need to cram absolutely everything you want others to know about `Bar` in its type.

Fair enough; I stand corrected. Do you have examples? Like I said, I'm far from a type expert (I just know a bit of the theory), though I do a fair bit of formal verification (nowadays mostly untyped logic).


> What I meant was that it seems to matter to you what name you give your language.

I don't care about the name: you could call both languages “Java” (just like two different people could be called “John”), as long as we're all aware that they're two languages and it's clear from context which one we're talking about.

> Do you have examples?

    let x: int = 97
    let p: is_prime x = <proof-that-97-isn't-divisi...>


That's not what I meant. Can x be a function, and can you then prove a proposition about x in a different module/in the same module in a different definition? E.g., can x be a sorting function whose type specifies that the result is sorted, and can you then have another proposition specifying that the result is also stable?


> That's not what I meant. Can x be a function, and can you then prove a proposition about x in a different module/in the same module in a different definition?

Well, in my previous example I used an `int`, but there's no fundamental reason why you couldn't use a function (or any other type) instead.

    let f : list int -> list int = <mergesort>
    let p : (forall x. is_sorted (f x)) = <proof>
In this example, `f` itself doesn't have a dependent type, and everything I want to prove about it is proved separately.

But you can also do:

    let f : list (int * int) -> sorted_list (int * int) = <mergesort-sorting-by-first-int>
    let p : is_stable f = <proof-that-it-respects-the-relative-ordering-of-second-ints-amongst-entries-with-equal-first-ints>
In this example, `f` contains a baked-in proof of one fact (that it sorts), but there's another fact that's proved separately (that it's stable).

In general, the proof `p` needs access to the definition of `f`. So if `f` is held abstract (its definition isn't exposed to other modules), `p` has to be in the same module as `f`.

In most programming languages, with respect to modularity, you only get two options:

(0) Keep a value private. (Not expose it.)

(1) Expose the existence of a value, indicating its type.

Dependently typed languages give you a third option:

(2) Expose the full definition of a value. As in, let developers of other modules know that `x` isn't just “a” function (or int or whatever) - it's this particular function (or int or whatever).


Got it, thanks! See my other comment explaining how programs and propositions about programs are the same thing (and are artificially separated by the designer of the language).


> They are "reinventing a run-time type checker" and prefer it that way because it allows increased expressive power at the cost of delayed verification.

Well, to me "delayed verification" is worth nothing. There is an almost unbounded infinity of program states. I want proof before I run the program.

(That's, like, the point, man! I'm not going to send up a space shuttle and just count on "oh, well, our error recovery is great and we saw no errors during testing anyway". To quote a great comedian: "Are you having a laugh?")

> So you don't actually know how to do it, but assume that you can.

Oh, come on, be a little more charitable. I don't know, but people in the DT community know and will tell you... at length how to do it.

> By the way, you probably want linear types in order to encode state machines in the type system in a natural way, which Haskell does not have.

Indeed it does not... but I must say I don't miss them that much for all practical purposes. If what you want to do is encoding state machines and prove things about said machines, then you probably want an even more specialized language. I'm OK with that.

I'm not OK with saying: "Can't be done in language X, therefore all effort at static verification is futile."

So there.


> I want proof before I run the program.

Except that for most interesting propositions you can't have proof. You need to show that the crude propositions that your type system can prove are what makes all the difference. I'm not so sure that's the case.


Well, what's "most interesting"?

To me there's a lot of interesting stuff that can automatically be proven by the compiler given a little effort on your part. It may not be the most interesting, but that doesn't make it worthless, does it?


> So you don't actually know how to do it, but assume that you can. This a little strange.

Not all that strange. I trust the members of the Algorithms Research community when they tell me it's possible to test a number for primality in polynomial time. I have no idea how to do it.


AKS is the name of that algorithm, by the way, after the initials of the authors. The paper is called "PRIMES in P".

https://en.wikipedia.org/wiki/AKS_primality_test


See e.g. https://hackage.haskell.org/package/foldl-transduce

(That's just a random package from a 5 second Google search -- I think there are several at this point. Some of them even generalize transducers, I believe.)

EDIT: Thanks for the link, btw.


To me, his comments on transducers & types sounded more like: I don't care to write a type signature for this, and good luck to you if you do.


In one of his posts he actually provided Haskell-like type signatures (and stated why they weren't sufficient), so that's my basis for concluding that he thought about it in terms of types.

(Perhaps my memory is fuzzy, but I'm 99% sure on this point.)


He did provide them but didn't do a very good job, tbh. Types capture the behavior you want to express in your program, and Rich ultimately concluded that types could not achieve this for transducers. On the other hand, many libraries since then have generally shown that types can and do... It's just that Rich's types didn't.


> He did provide them but didn't do a very good job, tbh.

Well, yeah, that was what I pointed out in my original post :).

I think we agree.


I don't think he's trolling, he's proposing spec due to issues he personally sees in type systems whilst acknowledging that clojure without types or spec or whatever is definitely missing "something" ... it's just that something doesn't have to be types. Given his comments here he seems well aware that not everyone shares his views on type systems and that is fine: https://groups.google.com/forum/#!msg/clojure/DUKeo7sT4qA/TU...


Yes, especially since there's no reason you can't have contracts in a language with a type system.


> or has thought for a few moments

As is obvious even to a child...


Could someone explain what the paragraphs under "Map specs should be of keysets only" are getting at?

I'm trying to map it onto my experience with Clojure and plumatic/schema, but nothing is clicking.


Generally in most of the existing libs like Schema you specify that a "person entity" is a map that has particular attributes that have particular properties and a "company entity" is a map that has other attributes (some of which might overlap).

In spec, you specify that a first-name attribute is a string and an email attribute matches some regex. Those are specifications about the attributes and can be used across many entity types. A person entity spec is then defined as a map that contains a set of required or optional attributes. A company entity spec is a different map with different attributes, some of which might be shared with a person (like email address).

This is in many ways a shift in perspective, allowing you to start defining and reusing attribute specs to a much greater degree.

The use of structural systems with mathematical properties (regular expressions for sequence structure and sets for map keys) turns out to have many useful effects in computing error sets, etc.
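A sketch of this attribute-first style (the attribute names are illustrative, and the namespace is the one used in the early alphas):

```clojure
(require '[clojure.spec :as s])

;; Attributes are specified once, keyed by namespaced keyword.
(s/def ::email (s/and string? #(re-matches #".+@.+" %)))
(s/def ::first-name string?)
(s/def ::company-name string?)

;; Entity specs are just keysets; both reuse the same ::email attribute spec.
(s/def ::person  (s/keys :req [::first-name ::email]))
(s/def ::company (s/keys :req [::company-name] :opt [::email]))
```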


The division -- between predicates for attribute values vs. specs for maps/seqs -- is interesting because AFAICT Datomic schemas ~= the former, while leaving the latter to your app? (Although I guess :db.cardinality/many straddles that line.)


thanks for this excellent explanation; this provides just about all the justification I need. looking forward to toying around with this.


Here's my understanding...

In Schema, you would represent a map's spec like this:

    (def my-map {:a s/Str :b s/Int})
In spec, you'd do it like this:

    (s/def ::a string?)
    (s/def ::b integer?)
    (s/def ::my-map (s/keys :req [::a ::b]))
In other words, the values are specified separately from the keys. This promotes reuse of specs, which is important especially as they get more complex.


If you have a function that accepts a map of a certain shape and want to write a specification for that map, you should only be specifying the required/optional keys in that spec. You should not be forced to also specify the values for those keys. When you are (as in Schema) then you wind up with redundancies when you have a slightly different map structure that also includes one or more (but not all) of the same keys with the same semantic meaning.


I think that (spec/keys ...) specifies the possible keys of a map, i.e., required and optional keys. The specification of what values these keys take are external to the specification of which keys go into a map. So somewhere else you will have to specify the valid value(s) for some (namespaced) key.


And that's the problem. With Schema you can compound key specs and values. This makes the code self-documenting. With spec you have to point to another spec, all the way down the complex tree.


Those regular expressions for sequences seem related to "Efficient dynamic type checking of heterogeneous sequences", or maybe some other prior work I don't know about.

See https://github.com/jimka2001/regular-type-expression and http://www.lrde.epita.fr/dload/papers/newton.16.rte.report.p...



I enjoy using racket and contracts make it quite enjoyable to use - error messages in particular are much improved by their presence. Looking forward to this!


If you are new to using a snapshot of clojure (like me), you can find some helpful information here: http://dev.clojure.org/display/community/Maven+Settings+and+...

tl;dr

  In a Leiningen project.clj file:
  :repositories {"sonatype-oss-public" "https://oss.sonatype.org/content/groups/public/"}

  Also, in that lein project file, reference clojure as
  [org.clojure/clojure "1.9.0-master-SNAPSHOT"] (1.9.0 being the dev head right now)


This won't work with Cider btw. I had to change it to 1.9.0-SNAPSHOT in pom.xml before a mvn install.

  error in process filter: version-to-list: Invalid version syntax: '1.9.0-master-SNAPSHOT'
  error in process filter: Invalid version syntax: '1.9.0-master-SNAPSHOT'


Interesting. It works for me, but I'm using a dev snapshot of cider too.

Having -master- in there is pretty non standard, but I was poking in oss.sonatype.org and that's what the jar looked like in there. Maybe I goofed. There is a little bit of magic that happens with maven and snapshots (magic to me anyway).


Excellent. This definitely looks to be an improvement over schema, which I use but don't particularly like. Though, I'm fairly new to Clojure (about a month or so), so it may just be my inexperience :)


> Defining specifications of every subset/union/intersection, and then redundantly stating the semantic of each key is both an antipattern and unworkable in the most dynamic cases.

This is the reason I haven't jumped on the Schema bandwagon. Surprised to see I'm not the only one who had reservations about it.

> Finally, in all languages, dynamic or not, tests are essential to quality. Too many critical properties are not captured by common type systems.

Glad to see some acknowledgement of the need for at least some tests from Rich, who previously criticized TDD, saying that you don't keep a car on the road by banging into the guardrails, and advocated H(ammock)DD.

> But manual testing has a very low effectiveness/effort ratio. Property-based, generative testing, as implemented for Clojure in test.check, has proved to be far more powerful than manually written tests.

(Assuming by manual testing he means manually writing automated tests,) I can't help but wonder if he came to that conclusion by writing the wrong tests and, naturally, finding them unhelpful.

> Enable and start a dialog about semantic change and compatibility

Okay, slightly rant-y tangent, but out of all the fluff phrases and buzzwords I've ever heard, "start a dialog" is probably the worst offender, and I wish it was wiped from the English language.

> I hope you find spec useful and powerful.

Looks useful so far. But where's the Github repo? Or is it coming in the next version of Clojure? Or, like, how/when can we use it?


Rich was never against testing, he was against TEST DRIVEN development, and thats what TDD stands for.


Bits will be coming into the Clojure repo today and we'll have an alpha release in the next couple days along with a lot more info.


>of all the fluff phrases

You're forgetting "it is worth noting", "it is important to note," "it should be noted that," "I should note," and their many derivatives which serve no purpose and are nothing but filler. [0]

[0] https://en.wikipedia.org/wiki/Wikipedia:It_should_be_noted


This is awesome. Is it available for clojurescript too ?


From Alex Miller via slack:

> will be in cljs as well, but not on first release

> most of the code should work as is, most of the tricky bits are around vars


>The basic idea is that specs are nothing more than a logical composition of predicates.

This reminds me of this technique:

https://github.com/astoeckley/Eat-Static#going-further


I'm confused what a function spec would look like using fdef (etc). Can someone write an example?

If in Haskell we have a type signature

f :: Type1 -> Type2

...what will a clojure function spec roughly look like?


  (spec/fdef clojure.core/range
    :args (spec/alt
            :infinite (spec/cat)
            :range-0  (spec/cat :end number?)
            :range    (spec/cat :start number? :end number?)
            :step     (spec/cat :start number? :end number? :step number?))
    :ret #(instance? clojure.lang.Seqable %))
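Conforming sample argument lists against such an :args spec tags the result with the matched alternative, which is how spec reports which arity/shape you hit. A small sketch using the same combinators:

```clojure
;; Sketch: conform a two-element argument list against an alt of cats.
(s/conform (s/alt :range-0 (s/cat :end number?)
                  :range   (s/cat :start number? :end number?))
           [1 10])
;; => [:range {:start 1, :end 10}]
```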


Interesting. I don't know what to think. It's a lot to read. But in some ways adds more information than types (you can make reference to values). Maybe you could make it more concise with more definitions? For example:

    (def UnboundedInterval (spec/cat))
    (def PositiveUnboundedInterval (spec/cat :end number?))
    (def ClosedInterval (spec/cat :start number? :end number?))
    (def StepClosedInterval (spec/cat :start number? :end number? :step number?))
    (def SeqableInstance #(instance? clojure.lang.Seqable %))
And then:

    (spec/fdef clojure.core/range
        :args (spec/alt
                :infinite UnboundedInterval
                :range-0  PositiveUnboundedInterval
                :range    ClosedInterval
                :step     StepClosedInterval)
        :ret SeqableInstance)
Perhaps this would be non-idiomatic, and perhaps I'm trying to make this into something that looks like a static type annotation when it isn't.


> (you can make reference to values)

Dependent types let you do just that as well. While theoretical computer scientists for the most part focus on dependently typed systems capable of expressing and proving any mathematical proposition (which obviously results in increased complexity), there exist more pragmatic variants of dependent types aimed at solving the everyday needs of programmers, with a minimum of annotation burden, see: http://hackage.haskell.org/package/liquidhaskell


Looks sane, specially because you can (s/and ...) the specs so you should have reusable specs in some common namespace.
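For example, (s/and ...) lets a registered spec in a shared namespace be narrowed locally with further predicates (the spec names here are illustrative):

```clojure
(s/def ::port (s/and integer? #(< 0 % 65536)))

;; Narrow the shared ::port spec for a local use case.
(s/def ::privileged-port (s/and ::port #(< % 1024)))

(s/valid? ::privileged-port 80)    ;; => true
(s/valid? ::privileged-port 8080)  ;; => false
```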


Here's another example showing how you can use the :fn spec to relate the :args and :ret values in arbitrary ways.

  (defn title-case
    "Capitalizes the first letter of each word"
    [s]
    (str/replace s #"(\w+)" (comp str/capitalize second)))
  
  (title-case "moving pictures")   ;; => "Moving Pictures"
  
  (spec/fdef title-case
    :args (spec/cat :s string?)
    :ret string?
    :fn #(= (-> % :args :s str/capitalize) (-> % :ret str/capitalize)))
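Once such an fdef is registered, instrumenting the function makes bad calls fail at the call boundary. A hypothetical REPL session (hedged: the instrumentation entry point moved between namespaces during the alphas, so the exact call below is illustrative):

```clojure
;; Illustrative only: `instrument` lived in different namespaces
;; across the early alphas (clojure.spec vs. clojure.spec.test).
(clojure.spec.test/instrument `title-case)

(title-case "moving pictures")  ;; => "Moving Pictures"
(title-case 42)                 ;; throws: 42 does not conform to the :args spec (string?)
```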


clojure.spec is available in 1.9.0-alpha1 via [org.clojure/clojure "1.9.0-alpha1"] in your project.clj.

There is an excellent guide available here: http://clojure.org/guides/spec


Will there be performance hits to using this?


This site could really benefit from a custom stylesheet for printing. Having to delete lots of nodes with devtools just to get a printable doc.


Noted. Won't get to it anytime soon though. https://github.com/clojure/clojure-site/issues/99


I'm waiting for the day somebody will write a Clojure transpiler for a non-Lispy syntax! Although I love Lisp (I used to use it daily 20+ years ago), the year is 2016!


Not sure what the year has to do with anything, but most things in computer science were invented in the sixties and seventies or even earlier, and since then we've only been refining them. In terms of programming languages, the theoretical breakthroughs have been in static typing, though note that even the Hindley-Milner type system was first described in 1969.

In other words, strange thing to complain about a syntax that has stood the test of time. After all, most mainstream languages (C, C++, Simula, Pascal, Java, etc) are Algol-based, a language that first appeared in 1958. And no, you can't have LISP without its syntax, because that's what's giving it power. The "code is data" mantra only works because Lisp is homoiconic, because it basically has no meaningful syntax other than syntax for describing its primitive data-structures, which makes its macros first class and its "eval" natural.

But in the LISP family Clojure is actually quite modern. It runs on both the JVM and Javascript engines. It does have extra syntax for describing sets and maps. Its data-structures are immutable and it encourages immutability to a far larger degree than other LISPs. But it also has very good interoperability with the underlying platform.
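To make the "code is data" point concrete: a Clojure macro receives its arguments as plain lists and symbols and builds new code with the same functions used on any other list. A minimal sketch (`unless` is just an illustration; clojure.core spells it `when-not`):

```clojure
;; The macro body manipulates code as an ordinary list.
(defmacro unless [test & body]
  (list 'if test nil (cons 'do body)))

(macroexpand '(unless false (println "runs")))
;; => (if false nil (do (println "runs")))
```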


> And no, you can't have LISP without its syntax, because that's what's giving it power. The "code is data" mantra only works because Lisp is homoiconic

No. Homoiconicity is nice for "code is data" and macros, but not necessary. Being able to reify and generate ASTs is sufficient. See for example Dylan, Terra/Lua, Scala LMS, MetaML, or Julia's or Rust's macros.


Not sure what you're arguing against. I stand by my claim.

I'm actively working with Scala's macros and played with Scala LMS. In fact working with those convinced me of how much all LISP alternatives suck for expressing macros.

Do you know why? Because in these languages macros are not first class, hence (1) they tend to be an afterthought, (2) hard to use and (3) filled with bugs because, wouldn't you know, they are targeting "library authors" and not users.

Scala's macros for example are exposing the compiler's internals. And boy, I can tell you stories. Like how it wants an "untypecheck" call on the AST if you modify it, because those ASTs happen to be mutable (the horror), but then "untypecheck" has a bug in it, crashing the compiler if you have an implicit "unapply" in a pattern match, so you have to rewrite your AST first to get rid of implicit "unapply" calls to work around it. Behold Sincron, an otherwise simplistic library, for which it took me a whole month to achieve inlining Function0 literal arguments, with help from others and copy/pasted code, between cries that I crashed the compiler, again and again: https://sincron.org - and btw, this rewrite for getting rid of unapply, this was the still-dangerous shortcut I took, because the general consensus at this point is that if you want to workaround the bugs of "untypecheck", you need to fix that AST manually.

You can't blame Scala too much really. This is definitely preferable to hacking your own compiler plugin or to forking the compiler. And eventually beautiful solutions can emerge from that.

The alternative to something like this is to expose a limited form of macros. For example F# quotations, or Rust's macros, since you mentioned those. The problem with these macro systems is that they are very limited, only applicable to a narrow set of use-cases. Sorry, but I personally think Rust's macros are next to useless as they needlessly complicate the language without that big of a gain. And the irony is that macros are basically used for error-handling in Rust. You know, if they added higher-kinded types, there wouldn't have been such strong demand for those macros in the first place.

LISP is the only language (family) exposing macros to users.


The standard library contains many macros that are not about error handling: http://doc.rust-lang.org/stable/std/#macros Only try! is, specifically. vec!, write!, panic!, and println! are also very popular.


It's just a research language (as far as I know, at least), but Honu[1] has support for syntax-case style macros.

[1] https://www.cs.utah.edu/~rafkind/papers/honu-2012.pdf


Don't know about the others, but Julia is homoiconic.


I wish Julia the best after 7 years of having to use Matlab, but despite its metaprogramming facilities, it's not actually homoiconic. It appears enough people pointed this out that they finally stopped using the word on their website.


"Like Lisp, Julia represents its own code as a data structure of the language itself. Since code is represented by objects that can be created and manipulated from within the language, it is possible for a program to transform and generate its own code. This allows sophisticated code generation without extra build steps, and also allows true Lisp-style macros operating at the level of abstract syntax trees." (http://docs.julialang.org/en/stable/manual/metaprogramming/#...)

Doesn't this qualify? I may certainly be missing something in my understanding of what "homoiconic" means.


Homoiconic means that the code structure is similar to the actual AST. In Lisps, this is easy to see, because a nested list of symbols is pretty much the definition of an AST. In truth, real homoiconicity is hard to achieve in any language that doesn't represent code as trees/S-expressions.

In Julia, they built a data structure that can represent code, and can be accessed via reflection, but it's no more homoiconic than most languages (note that "homoiconic" is nowhere on the page you linked to). In Lisps, the code is just nested lists, like other list data, and can be manipulated using the all of the standard list-processing functions. Whereas, if you look at the examples in Julia, you'll see that using macros vs functions involves special syntax all over the place.

A good page for this is http://c2.com/cgi/wiki?HomoiconicLanguages


Usually I avoid splitting hairs, but since the terminology bikeshed is all about that, let me make a little correction. In julia, there is a data structure called `Expr`, which is what's used internally by the compiler to represent code and is exposed to julia. It's not a list, it's a tree, but the power of being able to access it like any other tree still remains. Of course julia's surface syntax is significantly more complicated than lisp, so macros are more complicated.


I see - you're exactly right. Thanks for the clear explanation.


By the way, it's now "Lisp", not "LISP".

I love the original Lisp because its simple syntax was so suitable for AI and for treating code as data, but Clojure's syntax is not as simple as Lisp's! So it's both complicated and unusual to today's developer!


For those who find all of those parenthesis off-putting: http://clochure.org/


I suspect that if you think parentheses are the problem, they are not the problem.


Yeah, I actually love the parentheses!


What syntax do you feel would be more expressive and as versatile, while not losing macros?


Well, Rust has macros [0], too!

[0]: https://doc.rust-lang.org/book/macros.html


Rust has tons of problems with macros and they are still not fully in the language (last I checked). Rust macros actually do go back to Lisp. Specifically the Dylan language, a Lisp once envisioned as the equivalent of modern Objective-C by Apple. There are a number of papers from that period that you can read.

Macro systems are not all the same, they don't all have the same power. Clojure uses a very powerful form that is not as safe. There is a lot of modern research in Racket.

An extremely powerful macro system for a language with syntax was specified but never implemented by David Moon; you can find it here: http://users.rcn.com/david-moon/PLOT/


"Macros" can be an overloaded term in Rust. Specifically, "macros" mean "macro_rules!", which is very much in the language. Compiler extensions, which are sometimes called "procedural macros", are still unstable.


How could you implement macros in any sane way without s-exps? You need an AST and the hoops involved if you didn't have something like lisp would be terrible. Imagine running a macro over python or C code.


Elixir has an interesting way of approaching the problem. http://elixir-lang.org/getting-started/meta/macros.html


But this introduces additional ast format. Something in between simplicity of the syntax and complexity of the real ast. In lisp like languages you are working with syntactic forms from the language. Seems like extra overhead without any gain.


I would personally rather optimize for the convenience of reading the language than writing the macro, but I understand the tradeoff.

(And though I came to S-exprs early in my learning, algol-style still is easier for me to parse and read.)


And this is precisely the point of divergence, as "readable" understandably means different things to different people.

Personally I find Clojure very readable, but of course I don't mean that in some kind of objective sense of the word.


Coming from a Clojure background, "extra overhead without any gain" is more or less the definition of Elixir.


Isn't the Erlang VM and ecosystem sort of the selling point of Elixir? Not that I'm particularly well versed in the specific strengths and weaknesses of that platform.


Yes, but you can have that with Erlang as well and I think this is exactly what grandparent was alluding to.


Runtime overhead possibly. Developer productivity is likely increased. Hard to measure that though.


Usually people do the opposite — write transpiler from lispy syntax to some language's syntax. For example: lfe, hy, to some extent clojurescript.



