Hacker News
Revenge of the Types (pocoo.org)
193 points by rwosync 1204 days ago | 133 comments

Of all Armin's articles to date, this is the one I agree with most.

> Python is a language that suffers from not having a language specification ... There are so many quirks and odd little behaviors that the only thing a language specification would ever produce, is a textual description of the CPython interpreter... Keeping a language lean and well defined seems to be very much worth the troubles. Future language designers definitely should not make the mistake that PHP, Python and Ruby did, where the language's behavior ends up being "whatever the interpreter does".

This is an incredibly important point. The rise of PyPy is just one compelling illustration of how Python is at a point where a language specification is needed, and these crazy CPython-specific bugs need to be purged.

> I think for Python this is very unlikely to ever change at this point, because the time and work required to clean up language and interpreter outweighs the benefits.

I would disagree - I think it's possible for Python to change this. Any such bizarre behaviors need to be treated as a bug, and eliminated in the next release.

I think Python had one chance to do what you suggest - remove bizarre behaviors. This chance was Python 3.

This is also one of the things that bothers me a bit about Rust. The complexity of the compiler means that writing an actual language spec is hard, and the spec is very much tied to "what rustc's borrow checker is capable of".

There is a relatively formal document describing the borrow checker's behaviour: http://doc.rust-lang.org/master/rustc/middle/borrowck/doc/

A language spec (other than "CPython is the spec") would probably be good. But I know of Jython, IronPython, Stackless Python and MicroPython, in addition to PyPy, all of which are more or less complete and/or alive as alternative implementations. I'm not sure about the .NET variant (IronPython); I've heard rumours that it's dead. So it's not as if the lack of a spec is an insurmountable obstacle, nor is this something that no one is working on...

It's probably not an insurmountable obstacle, but getting over it is by no means trivial. Exhibit A: None of the alternative Python implementations are 100% compatible with CPython. I get the impression that, in practice, the "Python spec" is something that can only be approached asymptotically.

> Any such bizarre behaviors need to be treated as a bug, and eliminated in the next release.

Going back in time (sorry for the long story).

From what I observed, there seems to have been a split in the community over the importance of PyPy and its future versus the "core" CPython.

When PyPy was making very fast progress there was a feeling that maybe in the next big release PyPy would merge into CPython. So maybe if you downloaded Python 3.0 you'd just get PyPy with it. There was a lot of hype and hope. And I think at some level the old guard (Guido and some other core developers) saw that as a bit of a threat, and they made public remarks that I understood to mean: "CPython is not PyPy", "it won't be merged in any time soon", "CPython is fast enough".

So why all this background? Well that might explain why there would be resistance to modifying CPython to "help out" PyPy.

Python 3 should have been a from-the-ground-up rewrite: a new core runtime written to a language specification. Maybe we can get that in Python 4.

Who's going to work on that and, most importantly, bless it as official and unite people? I haven't seen anything from Guido on the criticisms of Python 3 and CPython, like this post and the other complaints about unicode complexity in Python 3. It seems that a lot of people who like Python because it lets them move fast hit a wall when they find the main Python developers moving so slowly that they consider it okay to have two major versions of the language competing until 2020.

There should be a spec and M versions of the runtime. The batteries should be portable between those versions. I used to split my time evenly between Jython, CPython and PyPy. Now I use PyPy about 80% of the time. But having the spec defined as how CPython and the included libraries operate is detrimental to the language. Python3 could have been a chance to clean up the syntax, define an excellent FFI protocol and split the standard library. Python3 could have been a reference implementation that spread Python to all corners of computation.

What are good examples of languages with specifications and at least two implementations that fully implement the spec?

C99 and C++11 have several compliant implementations, as does ECMAScript 5 (javascript).

The benefit of a spec is not so much that all implementations are necessarily completely compliant, it's that you know what the behavior is supposed to be so you can implement to the spec and work around any differences in implementation. Plus, you can report bugs without them being closed as "by design".

Given how complicated any specification for a non-trivial language is, I wonder if there are any. But I think this is beside the point.

Not having a specification for a language pushes every language feature discussion down onto application and library developers 'in situ'. When a developer or library maintainer stumbles onto unexpected interpreter or compiler behavior in a language without a specification, it means that in the short term there's no guidance about how they should deal with it. Do I hack this for Visual C++ in a way that can be removed eventually or do I really have to assume going forward that variadic templates aren't going to be available?

In Lua the user manual is the reference, so if the language does something different, it is considered a bug. If something is not described in the manual then it is an implementation detail (unspecified) and code that relies on it is broken.

This led to several alternative implementations, at least one of which (LuaJIT) completely implements a version of the language (Lua 5.1).

Go has gc and gccgo. They made two implementations to keep the spec honest and not just "whatever the compiler does".

Common Lisp, for sure

Yes, the various Lisps were the first possible examples I came up with (I think Scheme probably qualifies as well).

"Because there is basically no type system that fights against you, you are unrestricted in what you can do, which allows you to implement very nice APIs."

If you feel like the type system fights against you, chances are that you are doing something wrong. When I program, the type system definitely fights for me. It gives me a lot of guarantees, plus it's a really convenient way of self-documentation. I mean, I'm not even talking about Haskell or something really sophisticated: I see the advantages even in plain old Java or C++ (so much so that, in fact, most of my gripes about Java are about its type system not being complex enough).

Also, being "unrestricted" is far from being a universally good thing, since it also means many more opportunities for mistakes. I know for a fact that I make more mistakes in languages without static typing, but well, I guess it's just me.

>If you feel like the type system fights against you, chances are that you are doing something wrong.

Like most things, I think this depends on context. Doing exploratory data analysis in a static, strongly-typed language, for example, is extremely painful. And writing quick, one-time scripts in such languages is usually more trouble than it's worth. For this reason, Python is a wonderful language for doing data analysis and writing quick one-time data munging scripts (and yes, I realize that Python can be considered strongly-typed).

On the other hand, I've now been exposed to languages like OCaml and Rust, and have had time to think about Java, and I now have a different attitude when I'm building a large application that will have to be maintained. I'm beginning to feel extremely vulnerable when I use a language like Python or (worse yet) Javascript to build such applications.

It's not that I'm a Javascript hater. I like Javascript in general -- it has a kind of functional feel to it, it's easy to prototype quickly, you can use it on client and server, it has a nice ecosystem, and unlike so many I actually enjoy async programming. And I've long enjoyed Python for similar reasons.

But for building a robust application that will have to be maintained for a long time, you have to religiously write tests in these languages or you'll be buried in bugs. And even then, you still may be buried in bugs that a static, strongly-typed language would have detected. For this reason, I'm looking hard for a static, strongly-typed alternative to Javascript for the browser (i.e., a language compiling to JS) for building large applications. And OCaml's js_of_ocaml is definitely on my list of things to evaluate.

So the programming language philosophy that I adhere to now is encapsulated in the hackneyed old saying "use the right tool for the job".

"But for building a robust application that will have to be maintained a long-time, you have to religiously write tests in these languages or you'll be buried in bugs. And even then, you still may be buried in bugs that a static, strongly-typed language would have detected."

People make this sort of claim all the time, but my personal experience has not borne it out, and I have seen no data to back up this claim which would override my personal experience.

First, I find I have to write tests nearly as religiously in statically typed languages, so I don't find that a convincing argument.

Second, most bugs I've experienced in large code bases using a dynamic language are not the sort that would be caught by most typed languages. On this point there is even some data that backs up my personal experience (http://vimeo.com/74354480).

Third, writing code with a static language takes more time in my experience. Maybe it prevents a tiny fraction of bugs, but for most things I'll take saving significant time and cost over preventing 3% of bugs (I made up that number).

In short, I'm not sold on these arguments for static typing.

Funnily enough, my experience has been quite the opposite.

I never felt the need for TDD or such… until I used Lua. I was drowning in bugs for a tiny 200-line program! I just couldn't finish. Then I wrote some visualization code, then more of it, until I had the equivalent of a domain specific debugger. It worked. I found my mistakes, and corrected them. Then I thought "that is where TDD comes from".

A statically typed language such as OCaml or Haskell would have caught the vast majority of my mistakes right away. Heck, even C++ would have helped. In my experience, writing code that "mostly works" is faster with a statically typed language. So many trivial tests you don't have to write, so much debugging you don't have to do…

> Second, most bugs I've experienced in large code bases using a dynamic language are not the sort that would be caught by most typed languages.

So, no Null pointer bug?

Also don't underestimate the number of domain logic errors that could have been prevented by encoding the domain logic into the type system. Type systems can encode many invariants, they're not limited to "this is a bool, I wanted an int".
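To make "encoding domain logic into the type system" a bit more concrete, here is a minimal Python sketch. It is my own illustration, not something from the article: `UserId`, `OrderId` and `cancel_order` are hypothetical names, and it assumes you run a static checker such as mypy, since `typing.NewType` does nothing at runtime.

```python
from typing import NewType

# Two distinct "kinds" of integer. At runtime both are plain ints;
# the distinction only exists for a static type checker.
UserId = NewType("UserId", int)
OrderId = NewType("OrderId", int)


def cancel_order(order_id: OrderId) -> str:
    """Hypothetical domain function that must only ever receive an order id."""
    return f"order {order_id} cancelled"


user = UserId(42)
order = OrderId(1001)

message = cancel_order(order)

# cancel_order(user) would run without error at runtime, but a static
# checker is expected to reject it because UserId is not OrderId.
```

The invariant here ("never pass a user id where an order id is expected") is more than "bool vs int", yet it costs two lines of declarations.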

I always wonder, if languages like Haskell are so good, why don't we see a lot of amazing open source projects built with these languages? If you can get things done faster with fewer bugs it should be reflected in real life code bases. Yet I haven't seen it.

"The only arguments that hold water, in terms of programming language suitability, are bold, finished projects." http://t.co/ahOBAnXZ18

Programmers write the same number of debugged lines (or expressions) of code per day, regardless of the language they write in. A language only enables faster development if it requires fewer expressions to say the same thing. That is why high-level languages trounce low-level languages when it comes to productivity. I admit to not being very familiar with Haskell, but if it is not more concise it's not likely to be more productive.

I don't believe that. The speed of producing code is not relative to typing speed. LOC per day depends on the complexity of the lines. I can write 500 simple lines or 50 complex lines of code in a day. I'd agree that higher level languages deal with some of the complexity for you, like memory management. But I bet the business logic takes a similar amount of time in almost any language, assuming the developer has the same amount of experience in each language.

I don't think typing speed factors into it. It's more a factor of cognitive load. While programming you are working at the level of "expressions", and your speed is limited to how quickly you can turn business logic into appropriate sets of expressions. Programming languages that are lower level require more expressions to say the same thing, hence are slower to program in. I don't think it's a controversial notion to say that assembly is less productive than C, which is less productive than Python. The reason is the number of lines (or rather expressions) needed to achieve a piece of functionality. The business logic most definitely does not take the same amount of time when written in assembly compared to python. The higher level languages tend to be more productive, but it is only a given if the higher level language is actually more concise.

Here are some papers that support this argument: http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprt_computer... https://files.ifi.uzh.ch/rerg/amadeus/teaching/seminars/semi...

The way to reduce lines of code is to create good abstractions. Some languages may be better at this than others, and I think this is more important than the type system of the language.

In some languages, the type system helps quite a bit in creating abstractions.

I doubt it makes it easier to create abstractions compared to a lisp.

There are upsides and downsides to working in something with a sufficiently useful type system versus working in a lisp - even as pertains specifically to abstractions.

You can doubt, but I'm speaking from some experience with both. That doesn't necessarily mean I'm right, of course, but take it for what it's worth.

I think maybe it's just another example of the worse-is-better principle.

Great as it is, the language itself (and functional programming in general) tends to be pretty opaque to most programmers. Purity is powerful; we've all heard it before so I'll spare us yet another apologia. But, as my time spent TAing an introductory course in Scheme taught me in agonizing detail, purity is painful, too. Most people just don't naturally think of processes as a composition of mappings from values in a domain to values in a codomain. For better or for worse, the "process as widget-fiddling" way of thinking comes much more easily.

> Most people just don't naturally think of processes as a composition of mappings from values in a domain to values in a codomain.

This is most difficult to accept, because I do.

It's like programmers are divided into two camps: those who think procedurally, and those who think mathematically. I have noticed lately how this can influence API design, down to things as simple as this:

    void        latin9_to_utf8(      std::string& latin9str);
    std::string latin9_to_utf8(const std::string& latin9str);
The first interface is a design mistake: it encourages side effects, and requires more code to use in practice. But it is also a very natural one: you want to convert strings, so you write a function that converts strings (KISS). The second interface is better, but more convoluted: instead of just converting strings, it returns converted strings.

Guess which ended up in a real program I worked on…
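The same contrast in Python terms, as a rough sketch. The function names mirror the C++ signatures above, but the bodies and the use of `bytearray`/`bytes` are my own assumptions, not the code from that program.

```python
def latin9_to_utf8_in_place(buf: bytearray) -> None:
    """Procedural style: overwrite the caller's buffer (a side effect)."""
    buf[:] = bytes(buf).decode("iso8859_15").encode("utf-8")


def latin9_to_utf8(data: bytes) -> bytes:
    """Mathematical style: leave the input alone and return a new value."""
    return data.decode("iso8859_15").encode("utf-8")


euro = b"\xa4"  # the euro sign in Latin-9 (ISO 8859-15)

# Second interface: one expression, and the original is still available.
converted = latin9_to_utf8(euro)

# First interface: the caller has to make a copy first to keep the original.
buf = bytearray(euro)
latin9_to_utf8_in_place(buf)
```

The extra copy in the last two lines is the "more code to use in practice" part.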

Cases like that are fairly straightforward.

Where things really fall apart is using recursion instead of loops. Especially when the time comes to mention tail recursion.

Cases like that are straightforward, and people still manage to botch them. Which is why I suspect there is something deeply wrong, or at least fundamentally different, about the minds that produce such mistakes.

Though for this particular mistake, it may have come from some misguided notion about performance (like perceived overhead of constructing a string then passing it up the stack).

As for things like recursion and closures… I just avoid them because I know my colleagues will recoil in horror at such a uselessly indecipherable excess of cleverness.

I think maybe you do.

The size of Sonatype Snapshots and Central is roughly around Rubygems.org size I think. But so so so many Ruby projects are half baked, while so many public JARs are so much more mature (if often not very pleasant to work with).

It is interesting that I often see people coming primarily from a strongly typed language say this kind of thing, whereas people used to working in dynamic languages rarely make obvious type errors.

This says something about what tooling does to people. Haskell people say they think in types, but this makes that look wrong. It would appear they are less good with types when you take their tools away. People working primarily in dynamic languages probably end up writing programs that are mostly correct in that respect because they have no other way to become productive.

This looks similar to how people who are raised in clean environments are more vulnerable to bacteria. This is not a bad thing: we still improved life expectancy a lot with hygiene and antibiotics. But we got weaker immune systems and allergies.

If you value time spent writing code higher than preventing bugs, I have doubts about the size of codebases you were maintaining. The reality I have experienced is that working with a large codebase in Python (talking 100kloc+ here, and teams larger than 3) is an absolute nightmare. Lack of explicit typing on argument values and returns means that every time you add an argument to a function, or move things around, you are frantically grepping trying to find all the callers and hoping that you caught all of them. Add in the more functional aspects such as passing functions around and lambdas and *args and **kwargs and different return types depending on function behavior and it gets very ugly. Given the fact that adding functionality and refactoring are the most common activities when working with a large existing codebase, this becomes very old, very quickly. I suspect that Guido's newfound interest in explicit typing coincides with his employment at Dropbox and Google and realizing the very real pain that comes with maintaining large Python codebases.
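A small Python sketch of that refactoring hazard, with hypothetical names (`send_invoice`, `rarely_used_path`). The assumption is that `currency` was just added as a required argument and one caller was missed while grepping.

```python
def send_invoice(customer_id: int, amount: float, currency: str) -> str:
    # 'currency' is the newly added required argument (hypothetical example).
    return f"invoice for customer {customer_id}: {amount:.2f} {currency}"


def rarely_used_path() -> str:
    # Stale caller that still passes only two arguments. Defining this
    # function is fine; nothing complains until the line actually runs.
    return send_invoice(7, 19.99)


error = None
try:
    rarely_used_path()
except TypeError as exc:
    # The mistake only surfaces when this code path is exercised.
    error = str(exc)

ok = send_invoice(7, 19.99, "EUR")
```

A static checker (mypy, for instance) would be expected to report the stale call without running anything, which is the feedback the grep-and-hope workflow lacks.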

I work with several codebases (+250Kloc, +150Kloc and 50Kloc), all in Python (plus some bash scripts). The teams working on them are obviously bigger than 3 people. Though they present challenges, I wouldn't consider them "a nightmare", and the number of bugs introduced by the lack of typing is absolutely minimal.

In my opinion, the maintainability of a codebase is not that much a function of the language, but of the architecture and structure of the code.

I've been writing software for a long time at some of the largest companies in the world. My experience does not match yours.

I agree with you on one part, writing static language INITIAL code, does generally take more time. Depending on the language, it can vary from slightly more in something like Go (where composition is valued over type hierarchies) to a great deal longer in languages that encourage lots of castle building and type relationships.

That said, I have been in industry for over a decade and a half, and the most miserable, hellish experience I ever had was dealing with a large (100k+) non-trivial consumer facing Python application. It was a special type of pain, with novel errors happening in production on a regular basis 4+ years into the project. Refactoring was horrifically risky, and the analysis tooling stack is AWFUL... to the point that when used, it was often outright wrong.

I will gladly give up those cost savings on the leading edge, to have a product I can actually maintain in the long term. I suspect this inability to maintain apps written in certain dynamic languages is why they have a cultural bias towards starting over (and being proud of it, "version 2.0, rewritten from scratch!")... which I found a bit shocking at first versus C, which would trend far more toward refactoring.

I have also been in the industry for about as long and noticed:

* Sometimes, the time it takes to build something in a statically typed language is much longer. Long enough that by the time it is built it might not matter because somebody would have built it in node.js, ruby, or python. It really depends on the market.

* Type mismatches in a Python program are not the biggest and deadliest bugs. Unit tests and integration tests (that should be there anyway) should catch those well.

* Often static type languages that have pointers (C, C++) hide much deadlier and dangerous types of bugs -- pointer bugs like "wild pointers".

* Statically typed languages that don't offer protocol checks also can't easily detect bugs where you open a file handle, close it, then read from it. The compiler knows it is a FILE or some iostream but it doesn't help much there. I've heard other more advanced languages have something in place to handle that.

* The above point coupled with being able to use the REPL on a production system and code has saved significant debugging time. In a complex system there could be an interaction that is hard to reproduce and this one system is in that funky state. You can log in to it via SSH, edit the code to add some debug statements and reproduce it again.

In summary, I would like Python to have a standard type annotation syntax that good linters will be able to check for and IDEs will know how to read and use. But it would not be at the top of the list of features I would like Python to have next.

EDIT: (sorry, last sentence should have said "it would not be" instead of "it would be")

> * Statically typed languages that don't offer protocol checks also can't easily detect bugs where you open a file handle, close it, then read from it. The compiler knows it is a FILE or some iostream but it doesn't help much there. I've heard other more advanced languages have something in place to handle that.

You can catch these errors with a suitably advanced type system; but often a simple construct like Python's with-statement works well for most cases, too. (If you have higher order functions or macros or RAII, that kind of construct doesn't even need to be built into the language.)
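For anyone who hasn't seen the pattern, a minimal sketch of both the bug and the with-statement version. The file name and temporary directory are just scaffolding I made up so the example is self-contained.

```python
import os
import tempfile

closed_error = None

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as out:
        out.write("hello")

    # The bug: open, close, then read. Nothing stops you from writing this;
    # it fails only at runtime, when read() is called.
    handle = open(path)
    handle.close()
    try:
        handle.read()
    except ValueError as exc:
        closed_error = str(exc)

    # The with-statement scopes the handle, so all reads naturally
    # sit inside the block where the file is open.
    with open(path) as f:
        contents = f.read()
    # Here the block has ended and f has been closed for us.
```

It doesn't make the mistake impossible (you can still touch `f` after the block), but it makes the correct structure the easy one.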

> You can catch these errors with a suitably advanced type system

Not that it matters in most cases though. A programmer picking a language for a project does not realistically have access to the full design space of type systems. If the only implementation of a language with some type system feature is an interpreter for a core calculus written in Agda and not touched for the past several years, that feature is effectively unavailable. Most programmers are not in the business of designing and implementing custom type systems (and associated languages) and probably wouldn't gain enough from it for it to be worth the work of learning how anyway.

Yes. Though you can get pretty far with the extensions built into a mainstream compiler like ghc these days. If you are already doing Haskell, then the investment necessary for getting familiar with these is lower.

> I have seen no data to back up this claim which would override my personal experience.

I think that there are two reasons why good data on this is not forthcoming.

The first is that for a long time the dominant statically typed languages have been C++, which forfeits most of the potential advantages of static typing by also being weakly typed, and Java, which handles types somewhat better (while by no means being state-of-the-art) but brings its own productivity challenges.

The second is, the kinds of enormous long-lived maintained-by-50-full-timers hoary old enterprise applications that I think most fans of static typing have in mind when they say, "I wouldn't want to maintain that in a dynamic language!" aren't often written in dynamic languages. This in and of itself could be seen as evidence that dynamic languages really are unsuited to the task.

Perhaps the more interesting evidence, though, is the tendency of what few larger- and longer-lived dynamic projects there are to become famous for messy code. This hits at what I think fans of static typing are really talking about - that doing any sort of large-scale refactor in a dynamic language is a terrifyingly error-prone experience. You need to rely on ever-fallible humans to have written tests that exercise all relevant code paths in order to verify that the refactor was successful in even the most basic terms. For someone who's used to having the compiler quickly, reliably and exhaustively provide the same feedback, having to fall back on the source of all known bugs (programmers) to do the same job leaves one feeling a bit exposed.

"I have seen no data to backup this claim which would override my personal experience"

There have been attempts at measuring the effect of type systems, see for example this excellent presentation for some references: http://www.slideshare.net/Felienne/putting-the-science-in-co.... One of the studies mentioned here does claim that, other things being equal, static typing helps find bugs quicker.

Thanks for the link. I've read a lot of research papers along these lines. The results vary a lot.

The problem was in the phrase "other things being equal". Other things are never equal. Studies try to make them as equal as they can, and wind up differing as to which things are more important to get equal, and wind up with different results.

I know that this is a cop out argument, but trust me, we all went through this phase. Eventually you'll see static typing as another tool in the box, that is only helpful and doesn't get in your way.

Trust me, I've been doing this longer than you. Or maybe I'm just smarter, but you'll understand one day.

Why be so condescending? Why assume things you have no way of knowing?

Told you it was a cop-out argument. I'm just telling you what my experience was when encountering this topic, and then my conclusion after revisiting it many times over the course of a dozen years or so.

You also made major condescending assumptions. Why?

Because I'm an asshole?

What do you want me to say? I have no illusions about the integrity of my rhetoric—it has none. It was a statement made without much thought based on my experience. It does not require or request your approval.

You are interacting with another human being. Not caring about the person on the other end of that is something you should question.

I do care, it's why I made the comment I did in earnest. The lack of a fully formed argument was simply because of my lack of time and typing on a phone. Point taken: I will try to be more kind even to strangers on the internet. You need to let this go.

> Doing exploratory data analysis in a static, strongly-typed language, for example, is extremely painful.

As bullish as I normally am on static typing, I agree with that wholeheartedly—today.

I'm also completely convinced that a language with good enough record row-subtyping will take over as being far better for EDA than any dynamic language within the next 5 years.

That's the thing about research in types—it's almost always about discovering new techniques to appropriately structure and represent more complex programming techniques. It can lag for reasons of sophistication, but once it's in place it'll be hard to beat.

Already I feel it's absolutely the case that good static types (Haskell I'm most familiar with, but I imagine it's the case with Scala, OCaml, F#) are a win for short, one-off scripts. The difference is that the types you use are looser than the ones you'd use for a super robust application. The typing ends up looking a lot more like the typing you'd implicitly apply to a Python program [0].

So, "use the right tool for the job" applies to typing technology as well. Static types, I believe, will eat the use cases of dynamic types.

[0] This is essentially an extension of the "unityped argument". If you program in Haskell in an "always IO, frequently ambiguous" fashion then you can get all the flexibility you're used to in scripting in Python with little overhead.

In my opinion the trick to large-scale software architecture is to build small software systems tied together by strictly defined interfaces. There are many ways to ensure the strictness of the interface: having strict typing and a compilation step, unit tests, static code validation (compiling without compiling), and even Hungarian notation and systematic code reviews. Those are not mutually exclusive tools; quality improves with each one added, but none are mandatory.

My personal experience working on large systems without unit tests (among others a 200,000-line JavaScript codebase) is that as long as you adopt good coding practices around how you define clearly typed interfaces, the lack of unit tests doesn't hurt you that badly and there is no wild growth of bugs. Using the JetBrains IDEs also made a big difference for me due to the static code checking. I want to see the little code quality box at the top of the scrollbar turn green, and the process of doing that tends to weed out a lot of those bugs that a unit test or a strict type system would catch.

I do think that if you're building large-scale or long-lived systems, it is not any faster to use a dynamically typed language. I've written lots of code in both strictly typed and loosely typed languages, and I find that while I enjoy a loosely typed language more, I am equally productive in the strictly typed environment because strict typing makes it easier to write reliable code from the outset.

Dart is a language worth considering if you want something with static types that compiles to js.

Alas, Dart is pretty much a dynamic language with optional (and intentionally unsound) static typing. I understand why they've made some of the decisions they did, but it doesn't appeal to me much as a Javascript alternative, since it's not going to catch most of the errors that a proper statically-typed language would.

It is a very nice Javascript-without-all-the-weirdness, however.

Absolutely - like the NoSQL guys don't realize that a DB enforcing a schema is helping you.

About this... Some types are more important than others. And the NoSQL guys usually choose the most important ones to throw away.

Static vs. dynamic typing in languages is orders of magnitude less relevant, and one can choose to trade safety for other features much more easily in this domain.

Armin didn't address what I believe is the main point of adding type annotations to Python: compiler-checkable documentation and hints for IDEs. Seen from that perspective, a sucky type system is good enough because its use isn't to verify program correctness. Personally, I think a standardized format for describing types in docstrings would be 100x better but that's not the way the Python core devs have chosen.

Another argument in favor of types is that it will enable Python to optimize code better. But since Python isn't built for static typing, the CPython bytecode interpreter has no facilities for exploiting the extra information. And even if it had, the V8 Javascript VM proves that you don't need static types to generate optimized code.

On the topic of IDEs, WingIDE has an interesting approach where you can add `assert isinstance(your_var, SomeClass)` to the beginning of your functions to help Wing analyse your code[1]. It's more verbose than type annotations, and sprinkling asserts all through your code doesn't feel very idiomatic, but I recall it being pretty helpful (I haven't used Wing in years now).

[1] https://wingware.com/doc/edit/helping-wing-analyze-code
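
For anyone who hasn't seen the pattern, here is a minimal sketch. The `Invoice` class and `format_total` function are made up for illustration; the only real ingredient is the `assert isinstance(...)` line, which works as a runtime check and as something an analyzer can read.

```python
class Invoice:
    def __init__(self, total):
        self.total = total


def format_total(invoice):
    # Runtime check that also tells the analyzer what `invoice` is.
    assert isinstance(invoice, Invoice)
    return "{:.2f}".format(invoice.total)


print(format_total(Invoice(12.5)))  # prints 12.50
```

Keep in mind that asserts are stripped when Python runs with `-O`, so this is a hint, not validation.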

Sometimes I do it in PyCharm as well when it has troubles auto-detecting types.

PTVS supports it, as well. Basically, this is the way to go if you want cross-IDE type hints right now.

Yes. If you believe the point of a type system is to enforce correctness, then this doesn't make sense. Most type theorists seem to take this position. But if you believe that the point of a type system is to provide programmers with better feedback, then this completely makes sense. I wish more type systems research would explore the latter rather than fixating on the former.

Typescript and Dart have been developed with similar philosophies.

I think Armin didn't address these valid points, because as he points out, "The first one is that I barely understand them[type systems] at all myself."

Seems odd for someone who doesn't understand type systems to be writing a high profile blog post about why they don't belong in Python, no?

People should be able to write on their blogs their thoughts. It's a really good way to learn, and to get feedback on your ideas.

Hopefully Armin listens to people who have been thinking about this for a long time, and done work in this area. But why should his ideas, which are admittedly uninformed, get spread wider than better ideas? This happens a lot anyway.

I don't think it's entirely positive when people rant when they don't have the knowledge to back it up. However, it can produce a reaction from other people to step up and argue their case better. Or even better, to put out their code. I think in this case Armin does have a bit of a clue, and this essay is informing himself, and others quite well.

You can statically check Python with types now (PyCharm, PySonar2, etc.), and you can use things like ABCs and interfaces to enforce constraints.
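
A small sketch of the ABC half of that, with invented class names (`Shape`, `Square`, `Broken`). Note that the constraint is enforced when you instantiate, not statically:

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Return the area of the shape."""


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class Broken(Shape):
    pass  # forgets to implement area()


print(Square(3).area())  # prints 9

try:
    Broken()  # abstract method missing, so instantiation is refused
except TypeError as exc:
    print("rejected:", exc)
```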

"Union types" and "intersection types" are the type systems people have used to statically type check python, and other dynamically typed languages. You can see the various types coming into or out of a function. This is what Armin is talking about with Option/Composite types.

So these various type systems which have been used to add extra type checking on top of python are now being blessed, and brought into the language proper.

There are plenty of places in Python where the types are not specified well, because mostly it doesn't matter to people using it. The type checking tools have since added external type definitions, and fixed up inconsistencies outside of the Python implementations. So Armin is pointing out a few examples of type inconsistencies within Python. There are lots more, especially at the C API level, where things are a bit weird. But they haven't really bothered people that much, so they haven't been fixed. As someone who has written C extensions for Python, I can tell you that it is weird, and things have changed with every Python release (even, and especially, in the 2.x series).

However, these external type definitions are being brought together into the language (as per Guido's email) in the mypy format. There is a hope that the other definitions from tools like PyCharm can be translated automatically. These definitions are being used in useful tools today.

These external type definitions are in effect a specification of the types for the core language, the standard library, and even other popular libraries (like Django etc).

What came first the Duck or the specification of the Duck?

Some people may not understand the context of this part:

    So not long ago someone apparently convinced 
    someone else at a conference that static typing
    is awesome and should be a language feature. I'm 
    not exactly sure how that discussion went but the
    end result was that mypy's type module in combination
    with Python 3's annotation syntax were declared to be
    the gold standard of typing in Python.

Ronacher is referring to the fact that Guido van Rossum, the Python language creator and BDFL, recently said he wanted to turn mypy's type annotation syntax into a standard by making use of Python 3 function annotations.

The original function annotations standard is PEP-3107[1], GvR's proposal is on the python-ideas list[2], and information on mypy can be found at the project's site[3].

I agree with Ronacher's conclusion; I don't think static types -- even if only used at runtime -- are a good fit for the language. As for function annotation syntax, I think we just need to admit that it isn't really good for anything.

Great article!

[1]: http://legacy.python.org/dev/peps/pep-3107/

[2]: https://mail.python.org/pipermail/python-ideas/2014-August/0...

[3]: http://www.mypy-lang.org/
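
For context, this is all that PEP 3107 annotations do on their own in Python 3. The `scale` function is just an illustration; the point is that the interpreter stores the annotations and otherwise ignores them, so any checking has to come from an external tool.

```python
def scale(value: float, factor: int = 2) -> float:
    return value * factor


# The annotations are kept as plain metadata on the function object.
print(scale.__annotations__)

# Nothing checks them at call time: a str goes straight through.
print(scale("ab"))  # prints abab
```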

The thing that function annotation syntax is good for is the ability to catch errors before they hit your production server at 2AM on a Tuesday. If you can describe the types that a function accepts and returns, violations of those rules can be caught by static analysis such as pylint and let you know before you commit. The current approach in Python after refactoring a large chunk of code is to grep through all the existing callers, fix what you can find, run the tests and hope for the best.

I've been using Nimrod to replace Python on a Bitcoin project.

elliptic.nim: https://github.com/def-/bigints/blob/master/examples/ellipti...

elliptic.py: https://github.com/wobine/blackboard101/blob/master/Elliptic...

Nimrod looks and feels like python, but it compiles to C. It's like C except with Pythonic syntax and with Boehm GC optional. In addition, Nimrod has a burgeoning NPM-like module ecosystem developing, albeit in the early stages.

    import rdstdin, strutils
    let
      time24 = readLineFromStdin("Enter a 24-hour time: ").split(':').map(parseInt)
      hours24 = time24[0]
      minutes24 = time24[1]
      flights: array[8, tuple[since: int,
                              depart: string,
                              arrive: string]] = [(480, "8:00 a.m.", "10:16 a.m."),
                                                  (583, "9:43 a.m.", "11:52 a.m."),
                                                  (679, "11:19 a.m.", "1:31 p.m."),
                                                  (767, "12:47 p.m.", "3:00 p.m."),
                                                  (840, "2:00 p.m.", "4:08 p.m."),
                                                  (945, "3:45 p.m.", "5:55 p.m."),
                                                  (1140, "7:00 p.m.", "9:20 p.m."),
                                                  (1305, "9:45 p.m.", "11:58 p.m.")]
    proc minutesSinceMidnight(hours: int = hours24, minutes: int = minutes24): int =
      hours * 60 + minutes
    proc cmpFlights(m = minutesSinceMidnight()): seq[int] =
      result = newSeq[int](flights.len)
      for i in 0 .. <flights.len:
        result[i] = abs(m - flights[i].since)
    proc getClosest(): int =
      for k,v in cmpFlights():
        if v == cmpFlights().min: return k
    echo "Closest departure time is ", flights[getClosest()].depart,
      ", arriving at ", flights[getClosest()].arrive

Thanks for posting this, was not aware.

Spent much of Sunday reading the language docs and I must say, as a "die-hard Pythonista", this is really cool stuff.

Will definitely keep an eye on this one.

Random question for the type experts out there: is there any language that lets me track the units of my numeric variables? For instance, something like this:

    float<km> drop(float<m> x0, float<s> duration) {
      float<m> x = x0;
      float<s> t = 0;
      float<m/s> v = 0;
      float<m/s^2> g = -10;
      float<s> dt = 0.01;
      while (t < duration) {
        v += g*dt;
        x += v*dt;
        t += dt;
      }
      return x * (1<km>/1000<m>);  // abbrev for a cast: (float<km/m>).001
    }

Then I want the compiler to check that I'm not mixing up my units. It seems like this would be really useful, but I've never seen it before.
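
For comparison, plain Python can only give you this check at runtime, not from a compiler. A rough sketch: `Quantity` and its keyword-argument unit exponents are made up for illustration, and unit conversion (the km cast above) is left out.

```python
class Quantity:
    """A float tagged with unit exponents, e.g. m=1, s=-2 for m/s^2."""

    def __init__(self, value, **units):
        self.value = float(value)
        # Drop zero exponents so that (m/s)*s compares equal to plain m.
        self.units = {u: e for u, e in units.items() if e != 0}

    def __add__(self, other):
        if self.units != other.units:
            raise TypeError("unit mismatch: %r vs %r" % (self.units, other.units))
        return Quantity(self.value + other.value, **self.units)

    def __mul__(self, other):
        units = dict(self.units)
        for unit, exp in other.units.items():
            units[unit] = units.get(unit, 0) + exp
        return Quantity(self.value * other.value, **units)


def drop(x0, duration):
    x, t = x0, Quantity(0, s=1)
    v = Quantity(0, m=1, s=-1)
    g = Quantity(-10, m=1, s=-2)
    dt = Quantity(0.01, s=1)
    while t.value < duration.value:
        v = v + g * dt  # (m/s^2)*s is m/s, so the addition is accepted
        x = x + v * dt  # (m/s)*s is m
        t = t + dt
    return x


print(drop(Quantity(50, m=1), Quantity(1, s=1)).units)  # prints {'m': 1}
```

Writing `x = x + v` by mistake raises a `TypeError` the first time the line runs, which is exactly the part a static checker would catch earlier.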

Haskell has a few libraries to do this. I particularly like unittyped[1], which not only keeps track of units but also converts among compatible ones automatically. So 1 meter + 1 inch would typecheck and be converted automatically, but 1 meter + 1 second would give you a type error.

The wiki page[2] has a bunch of examples, which I find pretty compelling. The one problem is that error messages are ugly, but they're ugly in a consistent way. You can just ignore the ugliness as unnecessary noise.

    *Main> (1 meter / second) * 5 second
    5.0 m/s⋅s
    *Main> 2 meter + (1 meter / second) * 5 second
    7.0 m

One cute thing is that prefixes like "kilo" are just functions, letting you write things like:

    *Main> (42 kilo meter) `as` mile
    26.097590073968025 mile
    *Main> gallon `as` (cubic (deci meter))
    4.546089999999999 dm⋅dm⋅dm⋅#
Haskell is really good at dealing with numeric types in general. For example, it's quite easy for a library to define its own types, which then behave just like built-in ones including nice syntax. Unittyped follows this philosophy, letting you use units with things that aren't floats, like rational numbers, meaning you don't have to lose precision.

    *Main>  (1 % 2) . meter `as` foot
    625 % 381 ft
It's a really slick design and manages to give you safety as well as additional expressivity (since units get converted automatically).

[1]: https://hackage.haskell.org/package/unittyped

I'm not sure why, but your examples aren't working with me...

    1 *| meter |+| 35 *| centi meter
    1.35 m
works, but

    1 meter
fails with

        No instance for (Num (Value LengthDimension (U Meter) f0 -> t0))
          arising from the literal `1'
        Possible fix:
          add an instance declaration for
          (Num (Value LengthDimension (U Meter) f0 -> t0))
        In the expression: 1
        In the expression: 1 meter
        In an equation for `it': it = 1 meter
        No instance for (Fractional f0) arising from a use of `meter'
        The type variable `f0' is ambiguous
        Possible fix: add a type signature that fixes these type variable(s)
        Note: there are several potential instances:
          instance Fractional Double -- Defined in `GHC.Float'
          instance Fractional Float -- Defined in `GHC.Float'
          instance Integral a => Fractional (GHC.Real.Ratio a)
            -- Defined in `GHC.Real'
          ...plus four others
        In the first argument of `1', namely `meter'
        In the expression: 1 meter
        In an equation for `it': it = 1 meter

Try running "import UnitTyped.NoPrelude" first.

  EDIT: you may also want to start ghci with the "-XNoImplicitPrelude" flag, or explicitly disambiguate operations such as "*" with "UnitTyped.NoPrelude.*" or "Prelude.*"

Thanks, but that's not enough. The error seems due to the fact that `meter` is passed as an argument to `1` which is not a function, and indeed I don't see how that could be valid haskell code (unless there's some language flag I should enable, but unfortunately the wiki/docs of unittyped doesn't seem to be comprehensive enough)

Weird, it is enough on my machine (running ghc 7.6.3, unittyped 0.1), but unittyped does seem to use a lot of language extensions internally, so it seems possible that it would behave weirdly on different versions.

To answer the question of how it is possible, we can start by looking at the types. ":t" is a ghci command to show the type of an expression:

  ghci> :t 1
  1 :: Num a => a
This indicates that "1" can be any type that implements the "Num" typeclass.

Next, we want to determine what type "1" takes in the expression "1 meter". We can do this with:

  ghci> let a=1; b=a meter
  ghci> :t a
  a :: UnitTyped.Value Double LengthDimension Meter
     -> UnitTyped.Value
             * Length ('UnitTyped.Pos 'UnitTyped.One) 'UnitTyped.UnitNil)

We can see that, in the expression "1 meter", "1" is actually a function. Looking at the source code [1], it seems like this is accomplished in a relatively hacky manner:

  instance (Fractional f, Convertable a b, t ~ Value f a b) => (Prelude.Num (Value f a b -> t)) where
      fromInteger i x = Prelude.fromInteger i . x
      (+) = error "This should not happen"
      (*) = error "This should not happen"
      abs = error "This should not happen"
      signum = error "This should not happen"

This works because the "+" being defined here is Prelude.+, not UnitTyped.NoPrelude.+ (which is defined separately).

[1] https://hackage.haskell.org/package/unittyped-0.1/docs/src/U...

I see. I tried to report this as a bug, fwiw:


Btw, it's weird... I just tried

    ack-grep "Num\b"
and I cannot see any instance of Num defined anywhere inside unittyped's src

PS: ok, since I wanted to see exactly what was the problem with unittyped compiling under ghc7.8 I cloned the sources, but I forgot to checkout the actual release... thus running directly from tip was the cause

Installing it directly from hackage solved it, it's embarrassing how I was stunned by this in hindsight

1 absolutely could be a function—all you need is a function type to instantiate Num.

> Haskell is really good at dealing with numeric types in general.

I just wish they had based the numbers on mathematical concepts like rings and fields, instead of this weird Num thing.

pjungwir's example code is actually really close to what you'd see in F#. In F# you'd typically write it in a more functional way with say a recursive inner function but I'll leave it imperative for clarity's sake:

    [<Measure>] type s
    [<Measure>] type m
    [<Measure>] type km  

    let mtokm (x:float<m>) = x * 1.<km>/1000.<m>
    //the types of g and ground are inferred
    let drop g ground (x0:float<m>) =
         let mutable x = x0  //type inferred here
         let mutable t = 0.0<s>
         let mutable v = 0.0<m/s> 
         let dt = 0.01<s>

         while x >= ground do
            v <- v + g*dt
            x <- x + v*dt
            t <- t + dt

         t  // return the elapsed time; the session below shows a float<s> result

    > drop -10.0<m/s^2> 0.<m> 50.<m>
    val it : float<s> = 3.16

I've got to confess that I love how the design of MS languages regularly exposes my ignorance towards them.

Edit: removed unrelated Wikipedia article (Unit Type)

The unit type is a completely different concept in functional languages. It does not track units of measure.

Oh my, thanks for the heads-up I guess it's too late to post on HN. Time to go to bed :)

Adding to all the other answers, here is what Haxe can do:

    class Metric {
        static function main() {
            var coinRadius:Millimeters = 12;
            var myHeight:Centimeters = 180;
            var raceLength:Meters = 200;
            var commuteDistance:Kilometers = 23;

            diff( coinRadius, myHeight ); // 1.788 meters
            diff( raceLength, commuteDistance ); // 22800 meters
            sum( commuteDistance, coinRadius ); // 23000.012 meters
        }

        static function diff( a:Meters, b:Meters ) {
            var d = Math.abs( a-b );
            trace( '$d meters' );
        }

        static function sum( a:Meters, b:Meters ) {
            var s = Math.abs( a+b );
            trace( '$s meters' );
        }
    }

And the best part is, at runtime those are all floats. Take a look at the compiled JS code:

    (function () { "use strict";
    var Metric = function() { };
    Metric.main = function() {
            var coinRadius = 12;
            var myHeight = 180;
            var raceLength = 200;
            var commuteDistance = 23;
            Metric.diff(coinRadius / 1000,myHeight / 100);
            Metric.diff(raceLength,commuteDistance * 1000);
            Metric.sum(commuteDistance * 1000,coinRadius / 1000);
    };
    Metric.diff = function(a,b) {
            var d = Math.abs(a - b);
            console.log("" + d + " meters");
    };
    Metric.sum = function(a,b) {
            var s = Math.abs(a + b);
            console.log("" + s + " meters");
    };
    })();

The compiler takes care of all the conversions for you. To see the complete compilable example, including my type definitions for each unit, take a look at this gist: https://gist.github.com/jasononeil/b6b1845824f45f5d19df

And the manual on abstract types: http://haxe.org/manual/abstracts

Nimrod features distinct types (http://build.nimrod-lang.org/docs/manual.html#distinct-type) which allow you to implement something like this. Here is an example: https://github.com/def-/units/blob/master/units.nim#L220, I don't think it's complete yet though.
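
The closest Python equivalent I know of is `typing.NewType`, assuming a Python recent enough to ship the `typing` module. The `Meters` and `Kilometers` names are invented for this sketch. As with the Haxe abstracts above, the values are plain floats at runtime; the distinction only exists for a static checker such as mypy, which would reject passing a `Meters` where a `Kilometers` is expected.

```python
from typing import NewType

Meters = NewType("Meters", float)
Kilometers = NewType("Kilometers", float)


def to_meters(distance: Kilometers) -> Meters:
    return Meters(distance * 1000)


commute = Kilometers(23.0)
print(to_meters(commute))  # prints 23000.0
print(type(commute))       # prints <class 'float'>
```

Unlike the Haxe version, there are no automatic conversions here: you write `to_meters` yourself and the checker only verifies that you call it.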

Frink is a programming language that is all about keeping track of units:


Scala has a number of libraries that let you do this, for example Squants (http://www.squants.com/) and ScalaQuantity (https://github.com/zzorn/ScalaQuantity).

The boost units library does this for C++.

In theory you can do that in any language that allows you to override operators. It's definitely being done in Rust. For instance you sleep for `Duration::milliseconds(10)`.

Statically checked dimensional analysis is a great deal trickier though.
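
Python's standard library already does the operator-overloading version of this for time. A small sketch with `datetime.timedelta`, where the check happens at runtime:

```python
from datetime import timedelta

pause = timedelta(milliseconds=10) + timedelta(seconds=1)
print(pause.total_seconds())  # prints 1.01

try:
    pause + 5  # a bare number carries no unit, so the addition is refused
except TypeError as exc:
    print("rejected:", exc)
```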

Java's pluggable type system can do this. The Checker framework provides a Units checker that can be plugged into the compile stage. See this blog for more info: http://blog.paralleluniverse.co/2014/05/01/modern-java/

Jav^H^H^H errr... I mean, Xtend [1] :). It does this through a mechanism called "extension methods" [2]. Kotlin has something similar called "extension functions" [3], so I'm guessing it should be possible with it too.

1: https://github.com/eclipse/xtext/blob/master/examples/org.ec...

2: http://www.eclipse.org/xtend/documentation.html#extensionMet...

3: http://confluence.jetbrains.com/display/Kotlin/Extension+fun...

From what I've read, Ada can do this.

Ada also allows this. Has for quite some time, and it comes with a built in real world units system.

Value classes in Scala.

We live in a wonderful time in development, where a lot of things are being questioned. A lot of languages that used to rule everywhere are questioned, as in the OP, and devs try to come up with the ultimate language that would solve every use case.

There is, of course, no such language, but future languages, whether they are dynamic or static, strong or weak (type-wise), will certainly not make the same mistakes as their ancestors.

Personally I want a scripting language with type inference but real strong static typing, that can be easily interfaced with C++, that handles concurrency the right way (i.e. not like JavaScript callbacks), that is truly multipurpose (like Python), elegant (a bit like Ruby), OO with composition, strong encapsulation and access modifiers in mind, with immutable structures and variables by default but not limited to them, with some functional features without looking like Haskell, reasonably fast for GUI dev, scientific computation and non-trivial web apps, easy to deploy on a server, with sane error handling, IO streams everywhere, a clear syntax (no characters that are unreachable on non-English keyboards), a good package manager, an interactive REPL (like IPython or the tool for Swift, I forgot the name) and batteries included.

So we are definitely living in an exciting period.

I don't think a C++ interface is easily feasible; as far as I know no language has managed to do that yet, probably because of the wildly different ABIs between C++ compilers.

Anyway, I would add to your list Go concurrency constructs (channels, "go"-like statements), polymorphic types, and, personally, significant whitespace for indentation, like Python.

Rust is a big step in the right direction, although it's a bit too complex|heavy for scripts.

Haxe (pronounced "hex") is pretty close to what you want... http://haxe.org/ http://en.m.wikipedia.org/wiki/Haxe

You probably also want sum types (or generally Algebraic Data Types) with your unicorn language.

> Personally I want ...

Julia, maybe?

The argument seems to be that if the added type system is not perfect, then it will be useless. For a working counterexample, see the Closure Compiler (https://github.com/google/closure-compiler), which adds a bolt-on, underspecified, occasionally-changing type system to Javascript. Similarly to GvR's proposal (https://mail.python.org/pipermail/python-ideas/2014-August/0...), the types are only ever checked at compile time: there's no attempt to use them as runtime assertions and very limited attempts to use them for compile-time improvements. So, yes, because they are imperfect and optional, you don't catch every type error, and you need to add type hints / casts that in a perfect system wouldn't be necessary.

Is this kind of flawed type system worth it? Hell yes. I've maintained large programs in both Python and (closure-compiled) Javascript, and with the former I've wished I had the help of the limited type checking available in the latter.

If you want to learn more about how 'optional' and 'unsound' type systems can still deliver a lot of value there are some interesting articles on Dart's optional types:

* http://journal.stuffwithstuff.com/2011/10/21/wrapping-my-hea...

* https://www.dartlang.org/articles/why-dart-types/

* https://www.dartlang.org/articles/optional-types/

I find it interesting that this is exactly what Perl6 tried to handle gracefully, both on the technical side (a type system that supports both static and dynamic typing) as well as on the practical side (a re-implementation from scratch with some measure of interoperability).

However, they did not set out to just design the next version of Perl, but the last version, reasoning that if you have proper extension mechanism in place, you won't have to do a reboot ever again.

This resulted in gradual typing (which sometimes needs to fall back to runtime checks), a pluggable syntax with an extensible grammar, a flexible meta-object protocol, default numeric types that are objects (rationals or bigints), lazy lists, reified language constructs (variables as container objects) and other stuff that makes a straightforward implementation horribly slow.

Not only that, but look at the niche Python is occupying. It's a well established and respected niche to which, by definition, the language is suited very well.

By adding static types, the focus of the language would move to a different niche, which would probably be already occupied by some competitor language(s) which do(es) types much better.

If you want sort-of Python-ish syntax with elaborate types, just use Nimrod and leave Python alone. Get the right tool for the job, don't mutilate a perfectly good existing tool.

What exact niche is that? Unless you mean "dynamically typed" (in which case that begs the question), I don't know what niche you think it couldn't occupy with static types. Not even arguing for static types: your comment is just really unclear.

And this argument would be why PHP is in the shape it is today.

Hrm. I think this article does a great job of explaining weird inconsistencies in Python's current type system, but I think it does a somewhat less good job of demonstrating that adding annotations is actively harmful. What would be some examples of situations where inconsistencies in the type system made annotations problematic in practice? Are those situations compelling enough to warrant not adding any annotations to the language?

Python's API structure may be fast and loose, but that is a feature, I think. Its type system may have some threadbare implementation spots – the _sre example from the standard library, for instance – but there are some compelling examples as well.

I, for one, have been working on an app implemented mostly with PyObjC, the bridge between Python and Objective-C. I had all but written off PyObjC as a bizarre yet unuseful language mule... but lately I had the occasion to read through the PyObjC source code base, in service of my project. Did you know that when you subclass a wrapped Objective-C type in Python, an all-new Objective-C class is created and wrapped as the descendant class, behind the scenes? That blew my mind.

That happens transparently, in concordance with Python's type hierarchy and runtime ABI. As it turns out, PyObjC predates Mac OS X, and the authors have put a lot of work into mitigating things like the GIL and system events.

I am also a fan of Django's field type system, as the author mentioned, and I am curious about what he thinks about descriptors (which he mentioned once but did not address). I think descriptors are an amazing addition to the duck-typing regime in Pythonville.
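
To make the descriptor point concrete, here is a minimal sketch of a type-checked field. It is loosely in the spirit of Django's fields but is not Django's implementation; `Typed` and `Article` are invented names, and `__set_name__` assumes Python 3.6 or later.

```python
class Typed:
    """Data descriptor that only accepts values of one type."""

    def __init__(self, expected):
        self.expected = expected

    def __set_name__(self, owner, name):
        # Called once at class creation with the attribute's name.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.expected):
            raise TypeError("%s must be %s" % (self.name, self.expected.__name__))
        instance.__dict__[self.name] = value


class Article:
    title = Typed(str)
    views = Typed(int)

    def __init__(self, title, views):
        self.title = title
        self.views = views


post = Article("Revenge of the Types", 193)
print(post.title, post.views)
```

Assigning `post.views = "many"` raises a `TypeError`, so the class still duck-types from the outside while each field polices its own values.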


The more I read Armin's posts, the more I believe he should switch to Lua. It has all the core features he wants:

- simple design

- fast

- consistent behavior

In addition to what's mentioned above, LuaJIT is a marvelously designed JIT (please donate to the project [1]. Let's keep allowing Mike Pall to have a livelihood)

[1] http://luajit.org/sponsors.html

This is a very nice article showing some of the issues troubling Python and the proposed type annotations.

I would summarize my view of the type annotation proposal as follows: Statically typed languages can introduce inference heuristics that minimize the amount of type declarations. They can "jump" into the dynamically typed world more easily. The other way around is a lot harder. Not only are all those type annotations lacking in the standard library and the tools around, but there is also a lack of function design by types.

No, it's a horrible article. And no, they cannot. I implemented such a type inferencer for perl (B::CC), which has the exact same problem.

With a highly dynamic run-time which changes the types at will, the compiler will never be able to do proper optimizations without explicitly forbidding certain coercions. The inferencer has to give up in 90% of all cases.

Only with optional explicit types, esp. for loop counters and array members, can those operations be efficiently optimized. Up to 2000% for the typical array or loop benchmarks.

And no, a v8-style tracing jit also does not help in all cases. This kind of run-time type caching has a huge overhead, and will always need an explicit run-time check, and does not catch the typical type errors writing programs. It helps with simple scripts, but not with programs.

You don't have to worry. If you don't like to use types, don't use them. Nothing will change. It will only compile-time optimize and check the cases where someone added the types.

Of course the standard library is free to add type-optimized versions later, as e.g some Common Lisp's did with their arrays and vectors. But this was done mostly in the compiler, not in the library itself.

You are right about optimizations, there are a lot of situations where type annotations could greatly help the compiler. I just do not have the feeling that performance optimizations in CPython are the reason for Guido's proposal. It rather seems to be motivated by development tooling.

Is `for i in xrange(10000):` especially optimized in python at the moment? I doubt it and those low hanging fruits could be tackled with some explicit pattern matching. (I probably should do my homework now and dive into the python source code).
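
One way to poke at this without reading the C source is the `dis` module. A sketch using Python 3's `range` (the `xrange` above is Python 2); the `count` function is just a stand-in. In the CPython versions I'm aware of, the loop compiles to a generic `FOR_ITER` over whatever iterator `range` returns, with no dedicated counting-loop opcode, but treat that as an observation about particular versions rather than a guarantee.

```python
import dis


def count():
    total = 0
    for i in range(10000):
        total += i
    return total


# Collect the opcode names the compiler emitted for the function.
opnames = [ins.opname for ins in dis.get_instructions(count)]
print("FOR_ITER" in opnames)

# Full listing, for reading the loop body instruction by instruction.
dis.dis(count)
```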

Can somebody explain in greater detail what this means:

> This is true for the CPython interpreter world, but not true for Python the language. That these are not the same things is disappointing but generally the case.

My understanding was that the CPython interpreter is considered the "reference" implementation of Python, and that (except for bugs) it defined Correct Behavior for Python-the-language.

If the author is reading this thread, I spotted this type-o:

  Type type claims that it's a subclass of object.

Is that a typo? I read that as '[The type named] "type" claims that it's a subclass of object'

Compare to "Type integer claims that it's a subclass of number."

That makes sense. It's confusing though because "object" gets a monospace font but "type" doesn't.
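
The claim is easy to check in an interpreter, and it shows the circularity the article is pointing at: `type` is a subclass of `object`, while `object` is itself an instance of `type`.

```python
print(issubclass(type, object))  # prints True
print(isinstance(object, type))  # prints True
print(isinstance(type, type))    # prints True
print(type.__mro__)              # prints (<class 'type'>, <class 'object'>)
```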

My hope is that Guido's plan to incorporate the type declaration module (`typing`, a subcomponent of mypy) will encourage the Python team to clean up the issues brought up in TFA.

I didn't know Python users hate static typing so much.

"Python users" is a vast generalization (this is literally one of the most popular programming languages in the world - there are a lot of users), and "hate" is a strong word. So if you want to convey meaning in your comment, you should try to be less hyperbolic.

FWIW: I'd argue that most "users" of Python don't know the difference between static and dynamic typing. Or care about fanatic language wars in which the word "hate" is used to describe preferences between purely technical details.

I don't "hate" static typing. I dislike the current proposal though, because the proposed syntax is a horrible wart on top of a very elegant language.

Do you mean aesthetically or the proposed specifications?

Little bit of both actually.

If the primary motivation behind optional type checking is IDE hinting and compile time checks I'd actually prefer a standard docstring format instead of annotations.


Could you expand on why you feel that way?

>So what's the solution? Not having null references and having explicitly typed arrays.

Throwing the baby out with the bath water. Null types are extremely useful.

I don't get why the author claims the null type in C# is a form of "damage". It's just said that it's bad, not why. The problem with None in python was that you can't tell what it's supposed to be. In C# you know what type the null is meant to be.

If you have ever programmed in a language with strict enforcement of something like a "maybe" or "option" type then you'll see that it's fairly trivial to have a separate type for a nullable pointer and a non-nullable pointer.

It is essentially brain-dead that the type Foo really means "Either a Foo, or null". Failure to properly check for null is a source of many, many bugs, and there is an almost trivial way (assuming you're creating a new language, like C# was) to completely prevent it in statically typed languages that was well known when C# was invented.

I don't see the issue. Armin went completely insane IMHO.

This problem is usually solved by expressing the type "Foo or Null" as Foo? in such a gradually typed language. Maybe the critics should look up the basic type literature first.

Such posts will kill python. And this type of type critic already did kill python for Google. (And kept perl 10 years behind)

If when the type of a function F is "Foo or Null", you can express it as something like

F(...): Foo?

and if you're later forced to deal with such a value, Null cannot catch you by surprise, then haven't you just rediscovered Option types?

If, on the other hand, "?" is just an operator you use on values that can be null, to guard against an exception, this seems less useful. Are you going to wrap every return value of every function call with it? What happens if you forget? How can you know, at a glance, if a function may return a null value?

What are nulls extremely useful for in programming languages?

At least one serious problem with nulls in many languages is that you can get one somewhere you are not expecting it, and you won't handle it correctly, resulting in a null pointer exception (if you're lucky) or a crash (if you're unlucky). That's why alternatives like Option types are so useful: you really cannot ignore them, so they won't catch you by surprise. And at the same time, you can often handle them in pretty painless ways.
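
In Python terms, the nearest thing is `Optional` from the `typing` module. A small sketch with invented names (`find_user`, `describe`): the return type says out loud that `None` is possible, and a static checker such as mypy would complain about something like `find_user(users, name) + 1` because the `None` case is unhandled.

```python
from typing import Optional


def find_user(users: dict, name: str) -> Optional[int]:
    """Return the user's id, or None when the name is unknown."""
    return users.get(name)


def describe(users: dict, name: str) -> str:
    user_id = find_user(users, name)
    if user_id is None:
        return "no such user"
    # Past the check, a static checker treats user_id as a plain int.
    return "user #%d" % user_id


print(describe({"ann": 7}, "ann"))  # prints user #7
print(describe({"ann": 7}, "bob"))  # prints no such user
```

The interpreter itself enforces none of this; the guarantee only holds for code that is actually run through the checker.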
