Hacker News new | comments | show | ask | jobs | submit login
The Ergonomics of Type Checking (2016) (silvrback.com)
69 points by adamnemecek 7 months ago | hide | past | web | favorite | 93 comments

The type checker has questions, and it is adamant about them, refusing to let you move forward until they are answered. Often, I’m ok with that. They are good questions, and most of the time they really should be answered. The question is: when?

This is a good point. The poster then proposes being able to nail down your types progressively -- which I think is fine as a long as you don't let the iteration cycle get too long.

Optionally typed languages are one way we approach this ideal. Another way is linters that are stricter than the compiler itself. You must run the compiler to have testable code, but you can run the linter later, perhaps only as part of CI.

It is possible that in future, both of these approaches will look like primitive exteremes of some richer process progressively nailing down the static safety of a program.

Haskell has a flag called -fdefer-type-errors (https://downloads.haskell.org/~ghc/7.6.1/docs/html/users_gui...) that gets you pretty close to the experience of ignoring errors you don't care about right now.

You can do something similar with Eclipse's compiler for Java, JDT, which features [1]:

An incremental Java compiler. Implemented as an Eclipse builder, it is based on technology evolved from VisualAge for Java compiler. In particular, it allows to run and debug code which still contains unresolved errors.

That means that you can write something like:

    private void describe(String string) {
        if (string.isEmpty()) empty();
        else nonempty(string);
    private void empty() {
    private void nonempty(String string) {
        System.out.println("contains " + string.size() + " characters");

Then you can still run the code and call describe, but if execution reaches nonempty, you get a runtime error telling you about the compilation problem.

It's one of the few things from Eclipse i miss when using IntelliJ.

[1] http://www.eclipse.org/jdt/core/

I can't see a problem with nonempty, am I stupid?

A `String` in Java has a `.length()`, not a `.size()`.

Apologies! I meant to include a comment pointing out the error (as Thiez writes, it should be length, not size), but that somehow didn't make it into the comment as published. I blame my sub-editors, can't find the staff these days.

And this has been working for a decade. It's one of the reasons I laugh in their face when a JS developer tries to sell his babel reload crap as progressive.

> I often wish I worked in a language where I could start dynamic and end statically... I can imagine writing the core of an algorithm and then gradually tightening the conditions as it comes into focus. More than a tool, this would be a practice: progressive type elaboration.

Bingo. I love this idea. I haven't tried it yet myself, but I think mypy [1] from Dropbox makes this a reality today in the Python world.

[1]: http://mypy-lang.org/

I believe it's called Common Lisp, specifically SBCL :)

Live the dream :P

Came here to mention MyPy and optional typing, the preferable term Gradual Typing means something distinctly different.


Dropbox employs the creator of MyPy but it isn't a creation of dropbox.

Type Systems as Macros http://www.ccs.neu.edu/home/stchang/pubs/ckg-popl2017.pdf

Pluggable Type Systems http://bracha.org/pluggableTypesPosition.pdf

Propositions as Types http://homepages.inf.ed.ac.uk/wadler/papers/propositions-as-...


> Came here to mention MyPy and optional typing, the preferable term Gradual Typing means something distinctly different.

Are you sure that gradual typing is different than what Mypy supports?

The Wikipedia page for gradual typing cites mypy as an example.


The Mypy docstrings specifically state:

> Mypy has a powerful type system with features such as type inference, gradual typing, generics and union types.


The section on gradual typing in PEP 483 seems consistent with this as well.


Do they really insert extra dynamic checks?

Is Sound Gradual Typing Dead? https://news.ycombinator.com/item?id=11041272


It was my understanding that MyPy was purely compile time with no runtime checks. But I could be wrong. Gradual Typing as implemented in the original paper would lead to much much slower code. The weight of a static language bolted onto a dynamic runtime for the worst of both possible worlds. Sufficiently abstract Python code is almost un-typable.

I think eventually we will get there, where one can explore a program in a dynamic language and ship a correctly statically typed one. Types are only the beginning.

I believe you're right that Mypy only offers compile time type checking, but not runtime.

It looks like there are some active projects for runtime type checking in Python and while I don't believe Mypy is related to any of them, enforce [1] looks to also use the same standard type hinting syntax from PEP 484.

[1]: https://github.com/RussBaz/enforce

General question for those more experienced than I am with statically typed languages: would Clojure's 'spec' or Elixir's 'dialyzer', or perhaps TypeScript, fit the bill? If not, what's the downside between these what appear to me 'gradually typed' languages/approaches?

To provide some context: my first experience with static (or rather explicit?) typing was TypeScript, and I really liked what I saw.

That's how I write in Julia. Start with an algorithm with no types and eventually introduce type contraints everywhere, and even adding new types.

I know it's not a favorite lang around here but this is exactly what modern php is.

How exactly does modern php (not Hack or another vm) let you "start dynamic and end statically"?


is checking and throwing type errors at runtime, not compile time.

Hate to be that guy but, Flow for JS is something like that. You can optionally enable it and even enable/disable it in certain files, functions, lines, etc.

The author's ideal language seems to also describe TypeScript https://eng.lyft.com/typescript-at-lyft-64f0702346ea

That is optional typing. The author is talking about gradual or progressive typing.

But he mentions he wishes he could start dynamically and end statically. By enabling Flow at the end, don't you basically achieve that? You can also enable it progressively for files as you move on, and still rapidly prototype without it. In fact, unless I'm mistaken I believe it's "opt-in" by default, ie. Flow won't scan the file unless you include the special comment keyword at the top of the file.

Could you explain the difference? I though TypeScript was a good example of gradual typing.

Can't a compiler be created to traverse the AST, locate all operations performed on the variable and infer the narrowest type definition that allows for all of those operations, and then enforce at compile time?

Edit: example to further clarify what I meant (it's a bit more than type inference at initial assignment)

    var x = readUserInput(); // narrow down: string, int, double
    x += y; // if y is also a user input, no change
    x += 1; // narrow down to int, double
    p = x % 2; // narrow down to int

You're describing type inference.

An example is Crystal; [edited] example from the website:

  if rand(2) > 0
    my_string = "hello world"
  puts my_string.upcase


  $ crystal hello_world.cr
  Error in hello_world.cr:4: undefined method 'upcase' for Nil (compile-time type is (String | Nil))

I edited my comment to explain further -- I'm talking about going beyond the initial assignment...

This is a well explored area. Wikipedia article is short but has some good references [0].

Languages like Haskell, OCaml, F#, and Elm use global type inference.

Languages like Scala and TypeScript use local type inference. TypeScript in particular uses flowed type inference (there might be a more formal name for this). For example:

  let a = document.querySelector('#a') // typeof a is Element | null
  if (a == null) throw 'not found'     // typeof a is now Element
[0] https://en.wikipedia.org/wiki/Type_inference

Check out the hindley milner type system [1]. It's used by haskell and some other languages, and allows for a very extensive amount of type inference. You can often write large sections of code with no type annotations and the compiler can figure out the most general type based on the type of some function called somewhere, which means the thing you did with that result must be of a certain class of types, which makes the surrounding expressions' types have to be a certain class of types, etc etc.

[1] https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_sy...

I think it's fair to say that (GHC) Haskell no longer really uses the Hindley-Milner type system. It doesn't have let generalization (except in a few restricted cases), and it doesn't have principal types, which I would consider to be the two distinguishing features of HM.

Let generalization should only be disabled if one uses the GADT type extension and even then can be turned on in addition to GADTs. While Haskell isn't really Hindley-Milner, it's close enough for most development.

Depending on how versed you are in typed languages, but it looks like you're describing type inference, which exists in Haskell and in some other languages as well. I.e you don't need to add type signatures everywhere.

But gradual typing means avoiding typing alltogether in some parts of the program and in some sense going against the inferred type. This is probably most relevant in Haskell where IO is typed, and less so in others, e.g Clojure.

I come from a C++ background, with a recent interest in JavaScript and Go. I'm trying to envision something a little more than simple inference. I'm imagining how I would take an untyped codebase and how I will infer the data type of a variable or a parameter by studying all its usages within the code.

That would still fall under the umbrella of "type inference"

If you want to learn more about this, the Damas Milner unification-based algorithm used in ML, Haskell and others is a must-read starting point. It does global type inference of all types in the program without requiring any annotations, by analyzing how the types are used. It also can detect when a variable is only passed around, without being used in primitive operations, in which case it infers a generic/parametric type for it.

It depends on the type system, but often all usages are taken into account during type inference.

Coming from C++, take a look at OCaml, and specifically it's object-oriented facilities. It does exactly what you want, but in an usual (for a C++ coder) way - by making objects structurally typed. Suppose you write a function like this:

   let f(x, y) = x#foo(y#bar(x))
`#` is the member access operator. So this is a function that takes two objects (strictly speaking, it takes a single tuple of two objects), calls the method `bar` on `y` passing `x` as argument, then calls the method `foo` on `x` passing the result of the previous call as argument, and finally returns the result of that. Nowhere are the types declared, but they are fully inferred and statically checked. The actual type of `f` is rather convoluted when written out fully:

   (<foo: 'b -> 'c; ..> as 'a) * (<bar: 'a -> 'b; ..>) -> 'c
The `<>` notation means "object", with members listed inside delimited by semicolons. `foo: 'b -> 'c` means that there must be a member named "foo", which is a function that takes a single argument of some type 'b, and returns a value of some other type 'c - these are like template type parameters in C++, except you don't have to declare them. Finally, `..` means "and any other members" - without it, our function would only accept objects that have only the member named "foo" with the designated signature. So, `<foo: 'b -> 'c; ..>` means "any object that has method "foo" that takes a value and returns a value, and also possibly some other members". The `as 'a` part then assigns another type parameter to describe that object for reference later.

Then we see the object type for the second function argument, which is `<bar: 'a -> 'b; ..>`. This is structured in the same way, but note that it uses type parameters with the same names as before - this indicates that they must be the same type. This makes it possible to make the definitions recursive; the entire bit before `-> 'c` at the end can be described in a more verbose fashion like so:

"A tuple, the first element of which is an object, with method "foo", taking an argument of the same type that is returned by method "bar" of the second element of the tuple, and returning a value of any type, and possibly some other members; and the second element of which is an object, with method "bar", taking an argument of the same type as the first element, and returning a value of the same type that method "foo" takes as an argument, and possibly some other members."

Note that you don't actually need to write any of this. You can explicitly declare the type of "f" if you want to, but the compiler can figure it all out by itself, just by looking at the body, and will print out the type for you. It will also statically verify how the function is used. So, this is okay:

   let a_foo = object
      method foo y = y + 1
   let a_bar = object
      method bar x = x#foo 0
   f(a_foo, a_bar)
But on the other hand, this:

   f(a_bar, a_foo)
gives a compile-time error for the first argument:

   This expression has type < bar : < foo : int -> 'a; .. > -> 'a >
   but an expression was expected of type < foo : 'b -> 'c; .. >
   The first object type has no method foo
And this:

   f(a_foo, a_foo)
gives an error for the second:

   This expression has type < foo : int -> int >
   but an expression was expected of type < bar : < foo : int -> int > -> int; .. >
   The first object type has no method bar
(OCaml also has classes, but their sole purpose is to provide a common implementation for objects, and to allow code reuse and extensibility via inheritance; classes aren't types, and class inheritance does not necessarily denote subtyping.)

Effectively, this is static duck typing. This all looks a great deal like C++ templates, in that you can "just do" things, and the validity is effectively determined at the point of use. However, because it is captured at the type system level, the point of use doesn't have to see the implementation of your "template" to typecheck it - it just checks the arguments against the type signature that it computed when the "template" was defined. This also means that `f` is an actual function - not really a template - and so it can be passed around, stored in a variable etc (which in C++ is only possible for specific instantiations of a template).

Call site analysis and such are ways to do that. That's for example the difference between TypeScript and Flow. The later does stuff like that (which is both a boon and a bane. Its amazingly powerful, but you can get into states where figuring out why you're getting a compile time error is a mind fuck. Languages Elm handle it beautifully though.

My understanding is that Julia does something like this? But I'm far from an expert.

Yep. Julia is a great language and I think other languages could learn from it, especially it's typing model and the excellent implementation of the REPL. but I wouldn't say Julia is good for everything (I wouldn't want to write a web service in it).

Yes. It works well as long as you have all the pieces and don’t have to worry about nominal subtypes.

Could you explain a little more about the complications of nominal subtypes?

I should have qualified that with the presence of type parameters (nominal subtypes without type parameters are still easy to deal with). Without subtypes, you only unify X and Y (so X = Y), where X and Y are trees with open variables. It turns out X < Y is much harder (semi unification rather than unification).

It is basically equivalent to an alias analysis with imperative assignments (subtypes) and fields (type parameters).

You might have seen this already, but there was a thesis on combining subtyping and type inference here: http://www.cl.cam.ac.uk/~sd601/papers/mlsub-preprint.pdf

It’s an active research area (less so now than in the 90s), so there is a lot of work on the subject. Nothing in production beyond ordersky’s color type system in scale (and that is completely local).

Also, I believe the work you are citing above explores structural subtypes, not nominal ones.

Many languages do this (type inference), to some extent. The moment you have any expression that can evaluate to more than one type, though, it requires annotations.

That's not always true, rank-1 polymorphic functions (i.e. functions polymorphic in their argument) don't usually require type annotations. Higher-rank polymorphic functions though do generally require them, because without the annotations type inference is undecidable in the general case for such functions. For example:

    pair x y = (x, y)
This function creates a pair for any two types, and doesn't require any annotations.

Are there any languages that do it beyond the first value assignment, or outside the current scope? Also, any languages that subsequently revise the type definition?

Not sure if this is exactly what you're asking but one can reassign variables to change their types in Python:

    >>> x = 1
    >>> type(x)
    <class 'int'>
    >>> x = 'foo'
    >>> type(x)
    <class 'str'>

I believe the question was about inferred static typing for union types. Python does not do any static type inference.

Also, I'm not sure how a union type could be distinguished from an error during type inference. Maybe a nullable, but otherwise it'd be very tricky.

What do you do about two types with methods that have the same signature?

It will need to be a language without method overloading.

So no two classes anywhere in the entire project can have a method with the same signature?

No, they just cant have the same name if they are in the same namespace.

Even languages with overloading dont allow them in the same namespace if they also have the same signature.

How does this hypothetical compiler know whether to take NamespaceA.Queue.Size or NamespaceB.DynamicArray.Size or NamespaceC.File.Size?

Because a compiler will only load symbols from the current namespace. If you import both Queue.Size and DynamicArray.Size, then try to call Size, the compiler may fail due to the ambiguous reference to "Size". If you wanted access to both of them in the same namespace, you would have to do some form of a qualified import/reference to disambiguate for the compiler. This is the way that any language that does not support overloading handles this, regardless of type system.

It is also possible for the compiler to use type information to attempt to disambiguate. Especially because this would normally happen across module boundaries, a hypothetical compiler could:

1) Infer types based on all non-ambiguous references (or all references within the compilation unit that contains the definition)

2) Disambiguate all ambiguous reference using type information.

I suppose a sufficiently smart compiler could attempt to do type inference and disambiguation at the same time; but that sounds like introducing NP hard problems to the compiler for the sole purpose of allowing confusing code.

Julia does exactly this.

As much as it gets a bad rep, I think PHP is actually the language that comes closest to this. It's not quite there yet, as although newer versions of PHP have been adding optional type annotations, they are enforced at runtime rather than compile time. It seems not too much of a leap to imagine that some compile time checks might be added at some point though once type-annotated code becomes more universal.

"Gradual typing" is one of the reasons I still love using Groovy :)

Same here - working up an algorithm is so much more pleasant when you can take advantage of all the Groovy features to get the logic right interactively in a REPL, then you can bring it into your code and sprinkle in types afterwards.

I quite like the pattern of having fully typed interfaces and type signatures while letting the body of a function be dynamic where it makes a big difference to expressiveness. Like a hard shell with a soft inside.

I've recently come become aware of Groovy while looking for a programming language with Java's features _without_ the need for object orientation. How production ready is it? (I'm talking enterprise systems, not web based or consumer applications)

It was booming in JUG talks about 10 years ago, Sun was even doing talks about making it available as implementation language for JEE beans.

Nowadays if it wasn't for Gradle, I bet it would have been gone by now.

There's no question it's production ready ... plenty of large companies are deploying it routinely in production for years. The real question is more the opposite of that - is it past its prime to the point where it will stop receiving enough fresh developer input to keep it up with modern trends? Groovy's lunch was mostly eaten by Ruby, unfortunately, and even that is looking old now.

For the other part of your question, it depends a bit what you mean by "without the need for object orientation". Groovy definitely relaxes a lot of the strict object orientation of Java. However, since it is based on the JVM it doesn't have platform support for core functional features like tail recursion and immutability. Groovy adds a number of features to add support for these, but as with all such features added after the fact, they aren't as good or reliable as if it was built into the core platform.

Groovy's forte is really in how it makes Java massively more developer friendly while staying within the original philosophy and maintaining full compatibility with Java. This is where I think it wins over Scala and other JVM languages which introduce their own types and radically different design philosophies. The integration of Groovy with Java is nearly completely seamless, and that's where I think it's actually really awesome for enterprise systems.

It is good when used an an extension to Java. Sort of like TCL works with C. It is not so great if you try to use it as a stand alone language.

I really only use it as a glue code language or for automation. I thought that at least a few years ago there were a bunch of people using Grails? After using it extensively, I have found a few strange bugs that's killed some time but nothing show stopping. For what it's worth, the bugs were at least in their bug tracker. I would love to use it for everything but it seems to be losing popularity, which is really the only reason that makes me hesitant to do so. Maybe I should start contributing :)

We still use Grails at work. Kill it. Kill it with fire.

It's been around for a long time. Works pretty well as a scripting language to plug into Java enterprise systems. At least we've been using it for that for six or eight years.

Nowadays I would look at Kotlin, if you really need Java's slowness—er, features—in a shinier package.

I still use Groovy to write quick scripts or cron jobs that need to use Java libraries. Basically the Bash or Python of the JVM. Otherwise there are much better choices.

I've never worked on an "enterprise system, not web based," so I'm not sure what it is, but it sounds important and large. I wouldn't write such a thing in Groovy, or any other dynamically typed language: one misspelled variable and you have a runtime error waiting for you. (I wouldn't write such things in Java either: one missing null check and you have a runtime NPE waiting for you. But alas, the industry thinks otherwise.)

PS. Whatever you do, stay away from Grails. I've never seen such a bug-ridden, slow and crappy piece of junk in my entire career. And when I say bug, I mean a running production system suddenly starting to omit all WHERE clauses from all SQL queries, out of the blue. The app still working properly, but customers seeing the data of all other customers. I'm not joking.

Apache Groovy's worked quite well as the "Bash of the JVM", though other JVM languages have been moving into that space lately. Clojure's also quite good for running tests -- its macros can easily eliminate clutter in repetitive tests. Kotlin's been chosen by Gradleware as their preferred language for writing Gradle plugins and build scripts. And Scala doesn't really need glue code for tying its classes together because it can do that itself. Virtually no-one's ugraded Grails to version 3 since it came out over 2 years ago, or starting new projects with it. Groovy's adoption is dropping fast, e.g. it slipped out of Tiobe's top 50 a few months ago, and some robot running in Germany is responsible for most of its downloads.

Kotlin and Scala are mostly superior choices to groovy for my purposes.

“I often wish I worked in a language where I could start dynamic and end statically. There are a number of languages with optional type systems, but they aren’t the ones that I use.“

That’s a shame because typescript is basically this. Typing is optional, and you can become more granular as you want.

Now that it is finally there, I can say this without the usual jokes we have grown so accustomed to:

Perl 6 does this.

Granted, (almost) nobody is using it, but it does exist (finally!).

Right on time too!

Back in like 2006 they were estimating that all of the stuff they wanted to put in 6 would require until 2016 or 2017 to complete. I figured that was a comment meant to negotiate down the scope to something reasonable.

Turns out, no...

Julia does gradual typing very well. I think it delivers what the user wants. It has a REPL, you can write code using no types, as types are added to existing code it improves in performance and safety.

If you're interested in learning more about gradual type systems, check out Jeremy Siek's work [0, 1]. He's one of the early researchers in this area.

[0] https://wphomes.soic.indiana.edu/jsiek/what-is-gradual-typin... [1] https://github.com/samth/gradual-typing-bib

I think that type hints are becoming like that in the python world. Almost all my code starts untyped and later I start adding types when they are not trivial

in JavaScript I pepper each function with if(...) throw, eg the program will throw if something is wrong. Doing this catches a lot of errors and bad state. - Without having to babysit a compiler (Typescript) and I can enjoy all the dynamic features. Sometimes I miss the type annotation as documentation, but then I just rename the variable to make it more clear what it is and does.

Instead of "babysitting" your compiler, you're "babysitting" your application at runtime - I fail to see the difference.

Good point. Think of it as hand-holding your peers instead of hand-holding the compiler. if(x==y) throw "friendly error" // avoids eternal loop ... The compiler doesn't care if you get an eternal loop at runtime, and static types wont prevent it.

What prevents you from writing the exact same exceptional code in a static system? I'm not sure I understand your point.

If you make it a habit to write exceptions, it's more likely that you would. Some people think tests etc is only needed because the language is dynamic.

I can agree with the author's premise that types are essentially great, but it would be useful to have control over when (if ever) in the development stage I need to bring them, much like we do with testing. I guess there is an underlying analogy between TDD and strong static typing. I think this is one of the motivations for typed/racket, which is a lot of fun to use.

"I often wish I worked in a language where I could start dynamic and end statically."

C++ 'auto' type. Workflow:

1. Write 'auto =', see what the compiler picks up (iterator of a vector of a set of classes that points to a list of structs, etc).

2. Potentially copy/paste that (unless the line is bigger than 350 characters :) )

Type inference or deduction is not the same as dynamic typing. An auto variable resolves to exactly one type, which is determined by the expression it is bound to. If you pass that variable to a method expecting a different type, the compiler will stop you. If it were dynamically typed, it would let you proceed and fail at runtime.

> I wish I had a language that did optional typing

> they exist, but I don't use them.


The author's thoughts (I want to write dynamically to get the code working, but then make it static afterwards) are nearly the exact thing I've heard from Matthias Felleisen when he talks about the motivation behind Typed Racket.

The biggest value of static types is when you go to edit some code written years and years ago and you can tell what the hell the intended usage was without digging through the method

Not even remotely universally true. I've seen plenty of static code that is as obtuse as any dynamic code. Authorship holds far more sway than dynamic vs static.

It is 100% less likely that you'll run into out-of-date docs wrongly describing the types of the input arguments and return types. Of course the code might still be obtuse but the type declarations are a form of documentation that the compiler ensures is still up-to-date.

This is related to why all my companies use TypeScript rather than plain JavaScript for their frontends and sometimes their backends (vs other languages).

I imagine the author would appreciate the static analysis/success type checking that erlang/elixirs dialyzer provides.

So PEP 484? You can write out whatever and then add annotations (at least to the API).

Also Cython: start with pure Python and add types and other annotations to get the performance gains of static compilation (to C Python extension).

So you spend time writing the things, which then are totally ignored by the compiler/interpreter. Seems like a waste of time, except for documentation purpose.

Those are actually enforceable by IDE or interpreter/compiler if implemented. Even if this wasn't the case, it is better than some arbitrary variable naming convention or comments format.

You can check them at runtime with mypy.


Documentation is never a waste of time. Especially if your IDE is aware of it, and pointing out errors for you.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact