Python 3 Types in the Wild: A Tale of Two Type Systems [pdf] (rpi.edu)
128 points by zdw 30 days ago | hide | past | favorite | 124 comments



>A key question arises: why are so few repositories type-correct?

The authors don't seem to ever discuss the fact that mypy version changes frequently make previously-passing code fail type checks.

I don't think I have ever upgraded mypy and not found new errors. Usually they're correct, sometimes incorrect, but it's a fact of being a mypy user. Between mypy itself and typeshed changes, most mypy upgrades are going to involve corresponding code changes. The larger your code base and the more complicated your types are, the worse it'll be, but it's basically an ever-present issue for any program interesting enough to really benefit from a type checker.

How many of those repositories were "type-correct" but only on particular versions of mypy? I bet it's a lot!


I don't have this experience. Can you give an example of a previously-valid codebase that failed typechecking unexpectedly on a recent MyPy update, that wasn't a result of a false negative bug in MyPy?


I can't give you any links because it's not open source code, but there was a bug fix in 0.790 named "Don't simplify away Any when joining union types" that caused me some problems with bad annotations in the existing code: the annotations implied that Any was possible (it wasn't), but the bug dropped it from the final Union, so we never had to handle Any. Dataclasses have had some backwards-incompatible improvements as well.

But the big culprit is typeshed. Something will get new/fixed annotations and suddenly you aren't handling all possible return types in your callers, or whatever.
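A hypothetical sketch of how this plays out (the function names are made up, not from any real typeshed change): suppose a stub update changes some function's return type from `str` to `Optional[str]`. Every caller that assumed `str` now fails type checking until it narrows away None.

```python
from typing import Optional

# Hypothetical: before the stub change this returned `str` as far as
# the type checker knew; after the change it returns Optional[str].
def lookup(key: str) -> Optional[str]:
    return {"a": "1"}.get(key)

def caller(key: str) -> str:
    value = lookup(key)
    # This narrowing branch is what the stricter stub forces on every
    # caller; without it, mypy reports an incompatible return type.
    if value is None:
        return ""
    return value
```

The runtime behavior never changed; only the type checker's view of it did, which is why a previously "type-correct" repository can fail on the next mypy release.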


I've had this problem a few times. For example, it happened with the 0.800+ versions: mypy got stricter and was more aggressive in finding code (e.g. small scripts not in the proper Python package hierarchy).

I can't show any of my professional work (no publicly available src) but this side project of mine is locked to 0.790 until I can find time to sort the issues: https://github.com/calpaterson/quarchive/tree/master/src/ser...

It's hard to classify anything as a "false negative" with mypy since it is very liberal (often unexpectedly so, which I think is one of the sharp edges of gradual typing).


I obviously can't share it here, but my primary codebase at my job needs a few dozen changes every time we change mypy versions. There are a few places with false positives where I've had to comment `# shut up, mypy` and add a `type: ignore`. Usually when I'm being clever with the subprocess module.


I wonder if that's a mypy issue or more that the typeshed types are bugged, since typeshed versions also get shipped (used to?) with new type checker versions.

https://github.com/python/typeshed


In a codebase with tens of thousands of lines of typed code, I see maybe a few new type errors with a new mypy release. They've always been fixable within a few minutes.

I think mypy has some problems, but this isn't one of the bigger ones for me.


I don't think it's a "problem" with mypy, I just think it's likely the cause of a lot of the programs that the authors think don't type check.

Though I will say, while most have been fixable in a few minutes, some have been a real chore to fix. Sometimes an innocuous looking error balloons into several hours of reconciling obscure type system behavior errors once you start fixing it. Regardless, it's a small price to pay for proper type checking in Python. I've more than made up the lost time in detecting bugs before they ship.


Even for fully statically typed languages like C++, it is very common that some old code can’t compile with latest compiler. Shrugs.


No it isn't. C++ has extremely good backwards compatibility.


I disagree. C++ often removes language features, standard library features (std::random_shuffle), etc. Also, object files compiled with different standard versions are often ABI-incompatible with each other, which means you can't just pick new C++ for new code; it's rather all-or-nothing.

You can argue that when it removes features it provides a replacement (sometimes), but that does not change the fact that if you have any reasonably large project (>1 million LOC), every standard upgrade will break your app.

One of the main reasons we write all new code in Rust and are migrating the C++ code base step by step to Rust is because Rust offers infinitely better backward compatibility guarantees than C++.

Rust never ever breaks your code, and you can opt-in to newer Rust editions for new code only, and these are ABI compatible with Rust code written using older editions.


Even aside from deliberate backwards-compatibility breaks in the standard, compilers sometimes break compatibility. Both MSVC and GCC 11 have changed their header file transitive includes within the past few years, causing projects (like doctest and Qt5) to stop compiling because they forgot to include headers, which built fine in the past but not anymore. IDK if it's "very common", but it's definitely happening in the wild.

MSVC: https://github.com/onqtam/doctest/issues/183

GCC:

- https://invent.kde.org/qt/qt/qtbase/-/commit/8252ef5fc6d0430...

- https://invent.kde.org/qt/qt/qtbase/-/commit/cb2da673f53815a...


In the C and C++ worlds, this is the same line of thinking that kept new warnings out of "-Wall" for so long.


I’ve found the opposite with pyright. Code that seems right but is failing checks is fixed with the next release :)


Very good discussion from my wife's alma mater!

One thing I didn't see mentioned that is a particularly annoying false positive in mypy is the enforcement of Liskov Substitution for equality checks. All types share the common base type object, and object itself defines the __eq__ magic method as accepting any other object. That means every custom Python class that wants to define an equality check has to either declare the '==' operator as accepting anything as a second operand, or tell mypy to ignore the error, which is the obviously saner thing to do, since you want a type error, not a return of False, when you check equality against a completely unrelated type. Unfortunately, mypy provides no way to give you that statically.
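A hedged sketch of the complaint (the class and field names are invented for illustration): narrowing __eq__'s parameter from `object` to a specific class violates Liskov substitution relative to object.__eq__, so mypy flags the override, hence the ignore comment.

```python
class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y

    # mypy would report: Argument 1 of "__eq__" is incompatible with
    # supertype "object" -- silenced here with a targeted ignore.
    def __eq__(self, other: "Point") -> bool:  # type: ignore[override]
        return self.x == other.x and self.y == other.y

assert Point(1, 2) == Point(1, 2)

# The price of the narrow signature: comparing against an unrelated
# type raises at runtime instead of quietly returning False.
try:
    Point(1, 2) == "not a point"
except AttributeError:
    pass
else:
    raise AssertionError("expected AttributeError")
```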

I'm glad they mentioned the default parameter None to avoid the classic footgun of empty collections, though. That has to be the single most annoying false positive: you declare a parameter as Optional, and the type checker then complains that it was set to None in the parameter definition but later set to an actual value. You have to do this because of the totally insane and unintuitive fact that all invocations of a Python function share the same default parameter value, so filling in the empty list with something else upon calling it means everything else that calls it now gets the changed default value instead of the empty list.

This is probably the single greatest rookie mistake made in Python because of how unexpected it is if you don't deeply understand the object model, but applying the literal by-the-book solution to get around it results in the official type checker, endorsed by Guido himself, reporting a type error.
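The footgun itself fits in a few lines; this is a minimal, self-contained sketch (function names are made up):

```python
from typing import List, Optional

# The bug in miniature: the default list is created once, at function
# definition time, and shared by every subsequent call.
def buggy(items: List[int] = []) -> List[int]:
    items.append(1)
    return items

# The standard workaround: default to None and allocate a fresh list
# inside the body on each call.
def fixed(items: Optional[List[int]] = None) -> List[int]:
    if items is None:
        items = []
    items.append(1)
    return items

buggy()
assert buggy() == [1, 1]  # state has leaked across calls
fixed()
assert fixed() == [1]     # each call gets its own list
```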


It's possible to work around it without forcing the wrong type on the parameter:

    from typing import Any, List, cast

    def f(l: List[int] = cast(Any, None)) -> int:
        if l is None:
            l = []
        ...

    f()

Using casts is useful for all sorts of things like this when sentinel values/objects are involved. People often feel it's somehow 'wrong', but it's fine to trick the type checker as long as it's localized to neighbouring lines of code :)
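A hypothetical sentinel example along those lines (the `_MISSING` name and `greet` function are made up): a unique object distinguishes "argument not passed" from any real value, while cast() keeps the declared parameter type clean for callers.

```python
from typing import Any, cast

# A unique sentinel object; no caller can accidentally pass it.
_MISSING: Any = object()

def greet(name: str = cast(str, _MISSING)) -> str:
    # Identity check against the sentinel, localized to this one spot,
    # as the parent comment suggests.
    if name is _MISSING:
        name = "world"
    return f"hello, {name}"
```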


You learn something every day! I didn't even know about the cast function.


> you want a type error, not a return of False, when you check equality against a completely unrelated type

What? No I don't?

    foo = Foo() if x else 1
    foo == 1  # why would I want this to throw? I just want False!
Disclaimer that I haven't done much Python recently (or Java, ever – I wonder if raise-on-== is a Java-ism)


Yup, that's true. In Python, generally if you want to compare equal only to the same type, you would use the NotImplemented value:

    class MyClass:
        def __eq__(self, other):
            if not isinstance(other, MyClass):
                return NotImplemented
            else:
                ...  # whatever your comparison is
When NotImplemented is returned, there is a fallback sequence described in https://docs.python.org/3/library/numbers.html#implementing-... - where basically, in the expression `A == B`, if `A.__eq__(B)` returns NotImplemented, then `B.__eq__(A)` is checked as well (falling back to an `is` identity comparison if both return NotImplemented).

This is neat because it allows you to define a class B that can compare equal to an existing class A with the expression A == B, even though class A has no knowledge of B.
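A minimal sketch of that pattern (the `Celsius` class is invented for illustration): it compares equal to plain numbers even though int and float know nothing about it, via the NotImplemented fallback.

```python
class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees

    def __eq__(self, other):
        if isinstance(other, Celsius):
            return self.degrees == other.degrees
        if isinstance(other, (int, float)):
            return self.degrees == other
        # Defer to the other operand's __eq__, then identity.
        return NotImplemented

# Both directions work: in `21.5 == Celsius(21.5)`, float.__eq__
# returns NotImplemented, so Python tries the reflected
# Celsius.__eq__, which handles floats.
assert Celsius(21.5) == 21.5
assert 21.5 == Celsius(21.5)
assert Celsius(0) != "zero"  # both sides defer, identity says False
```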


You don't want it to raise an exception at run-time, but you probably want it to be a static type error: two things that aren't the same type can never be equal, so if you're bothering to compare them you're at best wasting cycles and at worst making a dumb mistake.

In your example, `foo` could be marked (in mypy) a `Union[Foo, int]`, which would be a reasonable value to compare against an `int` and not a type error.


But there are times you want heterogeneous types to compare with ==. Your approach isn't compatible with ad-hoc polymorphism.

For an example, consider `np.eye(4) == 1`. In this, you're comparing a doubly nested array of floats with a scalar int, but the operation is well defined (vectorized equality comparison).

And yes, it should be feasible to typecheck that operation.
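NumPy isn't needed to see the shape of the idea; this pure-Python sketch (function names are made up) mimics broadcast equality: comparing a nested list of floats against a scalar yields elementwise booleans, not a single bool.

```python
# A plain-Python stand-in for np.eye: the n-by-n identity matrix.
def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Elementwise comparison of a matrix against a scalar, the way
# np.eye(4) == 1 broadcasts the scalar across the array.
def broadcast_eq(matrix, scalar):
    return [[x == scalar for x in row] for row in matrix]
```

Each `x == scalar` here is itself a cross-type comparison (float vs int) that is both meaningful and well-typed.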


Yep. This is what I meant. I don't want to throw a runtime error. I want the type checker to be able to actually type check this statically, which mypy can't do.


>two things that aren't the same type can never be equal

Well, actually, as long as they implement the right dunder methods, they can totally be.


Besides, there is a difference between the actual type of the objects `a` and `b` in `a == b` and the structural type you specify in an annotation.

For instance:

  from typing import Protocol


  class MyClass:
      def method_a(self) -> None:
          ...

      def method_b(self) -> None:
          ...


  class ProtocolA(Protocol):
      def method_a(self) -> None:
          ...


  class ProtocolB(Protocol):
      def method_b(self) -> None:
          ...


  def compare(a: ProtocolA, b: ProtocolB):
      if a == b:
          print("a == b")
      else:
          print("a != b")


  a = MyClass()
  b = MyClass()

  compare(a, b)

Clearly, you would expect the output to be "a == b".


> two things that aren't the same type can never be equal

You're saying that an 8-bit number can never be equal to a 16-bit number, or a 32-bit number, or a...

I hope you agree that's ridiculous.


> I hope you agree that's ridiculous.

No comparison without explicit casts is a valid design choice. I think Ada and Rust work that way, though I've never used either language and might be mistaken...


> No comparison without explicit casts is a valid design choice.

Sure, but that wasn't the claim.


Yeah, there was some interpretation of the intent behind the comment on my part (cf "you probably want it to be a static type error").

How you view the claim itself depends on your thoughts about the level at which equality should operate (should type tags be considered part of a value's identity, is it about the bit representation, or the abstract value encoded by that representation, ...)


> I wonder if raise-on-== is a java-ism

There are two forms of equality in Java:

- 'x == y' is an identity check, like 'x is y' in Python (are x and y referencing the same memory?). These must be of the same type, or else it won't compile.

- 'x.equals(y)' is an overridable method, like 'x == y' in Python. Its argument is of type 'Object', so it will accept anything.

I don't do much Java these days, but I do write a lot of Scala (which runs on the JVM). One of the first things I do in a Scala project is load the 'cats' package so I can use 'x.eqv(y)' which refuses to compile if x and y are of incompatible types. This problem also propagates up through different APIs, e.g. 'x.contains(y)' checks whether y appears in the collection x (which could be a list, hashmap, etc.); this doesn't check the types, and I've ended up with bugs that would have been caught if it were stricter. I now prefer to define a safer alternative 'x.has(y)' which uses 'x.exists(z => z.eqv(y))'.


The Python stdlib all seems to disagree with the GP and provide False for comparisons with values of different types. Personally, I didn't expect this, because Python tends to not like mixing types in any way, but it's the one place where it makes sense.


>which you have to do because of the totally insane and unintuitive fact that all invocations of a Python function share the same parameter

I think it does make sense if you consider the function signature an expression scoped one level above the function definition. I agree it's initially not intuitive, and it's tripped me up a few times before as a beginner, like probably most other beginners.

As a more experienced Python developer, I think I'd actually now find it unintuitive if they changed it, due to everything I've internalized about Python scope, values, assignment, expressions, etc. Same for always requiring "self" as the first method argument. Overall, I'd prefer the changed versions, but I think they (or someone else) would basically need to make a whole new spiritual successor language.


You want equality to return False for unrelated types for when you have dictionaries with heterogeneous types as keys.
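A small illustration of why (the dict contents are arbitrary): dict lookups probe candidate keys with ==, and with heterogeneous keys those probes routinely compare unrelated types. If == raised instead of returning False, ordinary membership tests could blow up.

```python
# A dict with keys of different types.
d = {1: "int key", "one": "str key"}

assert d[1] == "int key"
assert "one" in d
# Probing with a key of yet another type simply isn't found;
# no error is raised even though no key has a matching type.
assert (1, 2) not in d
```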


On the other hand, you probably don't want to have dictionaries with heterogeneous types as keys in the first place...


Neat stuff. I've become a huge Python types fan. Even on small personal things, my flow is to write the empty classes, add attributes, add method signatures with hints, then fill in the methods. Leads to a lot less rewriting. Writing the hints doesn't take long at all.

What does take a long time is learning about it all. There's the docs, then PEP 484, then at least eight other related PEPs of various impact. It would be a good topic for a small book / website / course, except things change kinda fast.

I do think it could be a hack for a new dev to establish credibility. Spend a few months type hinting code and fixing bugs in popular open source repos. You'll learn way more about real software engineering than following another tutorial. If someone wants to do that and wants some mentoring, hit me up at my last name at gmail.


This is, in my opinion, the best way to design software. If you’d like to see what happens when you take it further with a more powerful type system, read type-driven development in Idris or any of the F# type-driven development blog posts.


I will upvote anyone who says my way of designing software is the best. And thanks for the suggestions.


I gave up on using mypy as I was spending more time telling it to shut up than actually using it to catch mistakes ahead of time.

That said, using type annotations made me realize I want to be fluent in a language with a strong static type system (huge productivity booster imo), so I'm currently studying TypeScript and Nim.


I wish that you could use typed Python to speed up the execution of the program instead of them just being hints.


But that is precisely what mypyc[0] is doing!

It is part of mypy, and it compiles type-valid Python files to CPython binary extensions, taking advantage of a lot of shortcuts to cut execution time. It is still a bit early to advocate for its use everywhere, and it has its drawbacks (extensions compiled with mypyc also typecheck the inputs and outputs of the module at runtime, which is great for improving code quality and the validity of the annotations, but also means you may get TypeErrors in production if you hit a bad edge case).

If there is no shortcut it will fall back to normal, slow, Python performance, with some smallish benefits (e.g. vtables for attribute lookups, no parsing time, etc.).

[0] https://mypyc.readthedocs.io/


That's really neat, I hadn't heard of mypyc!

How does it compare to Nuitka? https://nuitka.net


I have not run benchmarks, but I would expect Nuitka to be more mature and optimized. I really like the idea of having it bound to the type checker, though.


Not exactly what you're talking about, but... For some types of functions, Numba's JIT offers some pretty nice speed ups. It can infer function types (or you supply them yourself) and compiles to optimized machine code


It's tricky, since only the typed part can be sped up, and there will usually be lots of back-and-forth between typed and untyped code; which requires conversions and checks.

For comparison, Racket has gradual typing which gives a speed boost when everything is typed, but can end up much slower if anything remains untyped, e.g. see figure 3 in http://www.ccis.northeastern.edu/home/types/publications/gra...

Python would probably be even worse, since it's "more dynamic" than Racket; e.g. there are many hooks that can run arbitrary code (like dunder methods), and there's little encapsulation so values may get swapped out via monkey-patching at any time.


Nim feels very close to a typed Python.


afaik that was first planned for Python 4 and then dropped again :(


Any links on it being planned for Python 4? I didn't know they'd ever even considered it.


I think that the mentions of that have been deleted from the relevant PEP since, but e.g. https://www.reddit.com/r/Python/comments/7zvyhx/pep_563_ment... is also asking about it.


How do you think those two stack up against each other?

Not trying to start language wars here, of course, I'm just interested because I find Nim one of those languages that I'd like to try but don't have a project/use case to try on.


“At the same time, developers still see value in writing type annotations as they serve as documentation.”

Also useful for autocomplete.

For me, most of the value of typing is at the function level: what does this function accept as input and what does it produce as output?

Also mypy gets in the way of using loop constructs, as the iterating variables have to be predeclared with typing before the loop, breaking encapsulation.


I'm curious what you mean by "iterating variables have to be predeclared". In my experience mypy's type inference works with bog standard python constructs like a for loop that appends into a list that gets declared before the loop, aka:

  output = []
  for x in range(1):
      output.append(x)
      
  reveal_type(output)
  
  # note: Revealed type is 'builtins.list[builtins.int*]'


Maybe they're referring to the fact that if you do `for x in y` and y isn't properly annotated, then you will need to write

  x: MyType
  
  for x in y:
      ...
for x to be annotated properly.


Hmm, what do you mean? If you type the thing you're iterating over, the type of variable will be inferred?


If you're too lazy to read the paper, here's a short explainer video on the paper: https://youtu.be/_o9CgLjo9Kk


Hmm, I mostly use Pylance/Pyright because of tight integration with VS Code. Wonder why they chose PyType, which I believe is not very widely used.


Pylance/Pyright are proprietary and don't even work on open source builds of VSCode (they're hardcoded to work only on MS builds of VSCode), let alone any other IDE.


From what I can tell, Pylance is proprietary and intended to "extend/extinguish" the open-source VSCode codebase with Microsoft's proprietary VSCode binaries, but Pyright is open-source.


Hmm, I hadn't checked the license. Pylance is indeed proprietary.


Pyright is open source (MIT)


Pyright works well for me in vim-coc. Of course ymmv


Google's pytype seems to be better deployed within the standard PyPI ecosystem. The Microsoft packages you're using aren't.


That's because Pyright was written in TypeScript, so it makes sense to use npm to install it. Pyright has weak IDE support beyond the Visual Studio stuff. I'm pretty fond of Pyright due to how responsive the developer is on GitHub. I've made multiple issues before and gotten an answer the same day (sometimes the same hour), and some small bugs I've reported got fixed in the next release (which happens like twice a week).


I was wondering the same thing. I've never even heard of PyType, but I use Pyright every day. I expect many developers are similar.

Does Pyright behave like MyPy or PyType? Or neither?

Also did Python really create a type annotation system without specifying the semantics of how it is supposed to work? That's crazy.


The type system is specified, but the main PEP people point to covers only the base type system. There have been multiple extension PEPs adding new type features, and different type checkers have made different progress implementing them. Different type checkers also have different bonus features. Mypy comes with plugin support to make it easier to extend the system for very dynamic custom situations. Pyright is able to generate rough-guess type stubs for untyped codebases. Pyre has the best tensor-related type support, likely motivated by Facebook having PyTorch. Pytype I know the least about, but it likely has some unique features motivated by Google usage.

If the type checkers disagree on stuff defined in PEP 484, that's a bug to report. Beyond that they can vary, but they feel mostly similar.


The article mentions that MyPy and PyType use two different type system models and therefore may report different errors. Does it make sense to use both of them on a large codebase? I've been working with MyPy for a while and it has gotten pretty good and fast. Assuming PyType is a valuable tool, I think I'll try adding it to our CI pipeline and see if we get new typing errors.


Curious: what do you do when the two type systems disagree with each other?


You determine which is right and report a bug in the other. After that you add a type ignore statement to keep the wrong one happy until the issue is resolved. My current workplace I have two type checkers enabled (mypy/pyright) and that’s our way of dealing with it.


I mean, it seems obvious, but most (all?) modern type systems are not portable and don't compose with other type systems.

Splitting out the mistake checking and the data specification aspects of types would probably make pieces more portable across different languages


Vivek Haldar did an overview of the paper this week: https://www.youtube.com/watch?v=_o9CgLjo9Kk


Someone who needs types could just write in a strongly-typed language.


Python is strongly typed. It's also dynamically typed. The two are not contradictory.


My ELI5 for this has always been that static vs dynamic concerns variables; strong vs weak concerns objects.

- Static + Weak = C

- Static + Strong = Haskell

- Dynamic + Strong = Python

- Dynamic + Weak = JS


> strong vs weak concerns objects

and, in particular, explicit vs. implicit type casts of those objects.


But C and Haskell don't really have objects...


It is strongly typed in the sense that 3 + "5" is an error. It also has duck typing, which is weak. So, which is it? I would say it is more weak than strong.
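Both behaviors are easy to demonstrate side by side (the `Duck` class is invented for illustration): Python refuses implicit coercion between unrelated types, yet accepts any object that implements the right method, with no declared interface.

```python
# Strong typing: no silent coercion between int and str.
def strong_typing_demo():
    try:
        return 3 + "5"
    except TypeError:
        return "TypeError"

# Duck typing: `+` works because __add__ exists, not because Duck
# declares or inherits any particular interface.
class Duck:
    def __add__(self, other):
        return ("quack", other)
```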


There are two axes: strong/weak and dynamic/static.

Examples:

Python: strong, dynamic

PHP: weak, dynamic

C: weak, static

Haskell: strong, static


Duck typing doesn't imply weak typing, but rather dynamic typing. Can you explain why you consider duck typing to be weak typing?


I once had someone try to tell me that python was "untyped" - I guess people assume a lot when they come from a first-class statically-typed language.


I wonder if my professor of the Programming Languages course at university would laugh or cry at such a definition.

I expect that everybody doing this job has a CS or equivalent degree and knows the difference between strong / weak and static / dynamic typing. Reading other comments here demonstrates this is not the case. This is not a bad thing (luckily we only need a screen and a keyboard) but hopefully they'll have learned something new today.

Btw, there is a lot I don't know despite my degree. No hard feelings.


“Untyped” is the phrase used in literature too

https://www.sciencedirect.com/science/article/pii/S147466701...

Practically, untyped languages would include asm and forth.


In Python, objects have types. Variables are best thought of as void* pointers to objects.


I've found over the years that people have an easier time understanding how Python manages variables when you tell them that, rather than "in-memory boxes with data" as is the case in other languages, Python variables are more like "name tags that you attach to an object".
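That mental model can be shown in a few lines (the variable names are arbitrary): rebinding one name never touches the object another name still references.

```python
a = [1, 2, 3]
b = a             # b is a second tag on the same list object
assert a is b
b.append(4)       # mutation is visible through both names
assert a == [1, 2, 3, 4]
b = "hello"       # rebinding just moves the tag; a is unaffected
assert a == [1, 2, 3, 4]
```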


Pointers that also carry the underlying type information.


I don’t remember what the CPython implementation looks like, but even if the information is there, it’s not really exposed to end users. “type(foo)” doesn’t return the type of the name “foo”, but of the object that “foo” points to.


I meant that it's a pointer that is aware of the type of object it's pointing to by virtue of all objects knowing their own type. Same difference I guess.

Unlike void pointers or Object references in Java you can't "forget" an object's type.


Ah, I think I see what you mean. I still think it's more accurate to say that names are just pointers that don't have any typing info themselves (or if they do, it's an implementation detail that doesn't actually get surfaced and isn't part of the language definition). In Python it doesn't quite make sense to talk about the type of a "name" or "variable" or "pointer", because it points to an object, and that's where the canonical type info actually lives.


IIRC "untyped" is not incorrect. In the untyped lambda calculus, for instance, a value's type is unknown until runtime, like with dynamic typing; "untyped" doesn't mean you don't have types in your language.


There's no such thing as "dynamic types". A type is by definition associated with an expression in a language, but those expressions are part of the program's static source code. Python has some kind of checking of the runtime values involved in certain operations, but that's not typing.


Types are not associated with expressions in Python: evaluating an expression produces a typed value, whose type, and not only its value, depends on the type and/or values of the inputs. Example:

    def comparativeInsult(a, b):
        if b > a:
            return "try again"
        if a > b:
            return a / b
        else:
            return None
The type is, for all practical purposes, a part of the value of a Python object.


Such a thing is not a type. Not by the usual definition, which predates not only python but mechanical computation.


Everyone needs types. Here's one of my favorite quotes:

> "Dynamic typing" The belief that you can't explain to a computer why your code works, but you can keep track of it all in your head.

— Chris Martin (https://chris-martin.org/2015/dynamic-typing)


> Here's one of my favorite quotes

Here's my response to that quote:

"Static typing": The belief that if your code type-checks correctly, it will do exactly what you intended. (A variant of the belief that if your code compiles correctly, it will do exactly what you intended.)

Of course, taken literally, that's unfair, but so is the quote you gave. Everything you add to your code has a cost. Time you spend writing type specifications is time you're not spending doing something else that might add more value. Sometimes writing type specifications is the best use you can make of that time; sometimes it isn't. Python at least gives you both options: use type annotations if you want, but you don't have to if you don't want to.


> "Static typing": The belief that if your code type-checks correctly, it will do exactly what you intended.

No one claims that typechecking solves 100% of problems. But it has a better cost-benefit ratio than anything else that's been tried.

> Time you spend writing type specifications is time you're not spending doing something else that might add more value.

So is time you spend thinking about the behaviour of the code in your head - the main difference is that it's slower and more error-prone.

Occasionally you really can do the calculation about what kinds of expressions are valid in which places better than the computer. But that's a rare case that gets rarer every day.

> Python at least gives you both options: use type annotations if you want, but you don't have to if you don't want to.

Not really: your code will almost always be silently unsound. E.g. libraries you're using will usually have incorrect type annotations (because their type annotations aren't checked, and the checker is unsound even if they were).

In contrast if you use say Haskell you genuinely do have both options: you can write code and have it be safely typed, or you can call unsafeCoerce at any point where you don't want typechecking to happen.


> it has a higher cost-benefit than anything else that's been tried

I would want to see a lot of data, for a lot of different kinds of programs, to back up this claim.

> So is time you spend thinking about the behaviour of the code in your head

You have to do this anyway; static typing doesn't write your code for you. Even Haskell, which I will freely admit is probably the closest thing to an AI that I've ever seen in a programming language, can't do that. :-)

> your code will almost always be silently unsound

Code that has type annotations has this same problem. That was the point of my response: static typing != sound code.

> libraries you're using will usually have incorrect type annotations

They can't be incorrect if they aren't there. The comparison I'm making is not between code with correct type annotations and code with incorrect ones. It's between code with type annotations and code without any of them at all.


> You have to do this anyway; static typing doesn't write your code for you.

The amount is what matters. If spending x minutes writing down types saves you y minutes of thinking, and y>x, that's a win.

> Code that has type annotations has this same problem. That was the point of my response: static typing != sound code.

Well-typed code has certain soundness properties - they may not fully encode all the properties you want your program to have (that part is up to you), but the properties that you have encoded will be reliable. Optional typing undermines that: even if the types say one thing, you have no guarantee that that thing is true.

> They can't be incorrect if they aren't there. The comparison I'm making is not between code with correct type annotations and code with incorrect ones. It's between code with type annotations and code without any of them at all.

My point is that if the ecosystem is not well-typed then you don't actually have the choice of using types. What I'm arguing against is your claim that Python gives you the choice: it doesn't, because to be able to write well-typed code you need well-typed libraries.


> > So is time you spend thinking about the behavior of the code in your head

> You have to do this anyway; static typing doesn't write your code for you. Even Haskell, which I will freely admit is probably the closest thing to an AI that I've ever seen in a programming language, can't do that. :-)

I think what he meant is that you spend 95% of your time thinking about: "what kind of thing do I get from this method call, how can and should I use it, and what should I return in the end?" Writing down the result of your thoughts takes almost no time in comparison. Not only that: in good statically typed languages, you don't have to write down the types explicitly most of the time (but you can still have your IDE show them to you if you want to see them).

And in addition to that, you certainly have to think _less_ in many cases. A good example is functions that return lists. Often the list is guaranteed to contain at least one element. A good type system can express that, so when I get the list, I know that I don't have to handle the case where it's empty. In Python I would often have to think _more_ about it and look into the documentation or maybe even the implementation to understand whether it could return an empty list or not.
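For readers unfamiliar with the idea, here's a minimal Python sketch of such a non-empty list type (the class and method names here are hypothetical; Haskell ships this as Data.List.NonEmpty):

```python
from dataclasses import dataclass, field
from typing import Generic, List, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class NonEmpty(Generic[T]):
    # At least one element is guaranteed by construction:
    # you cannot build a NonEmpty without a head.
    head: T
    tail: List[T] = field(default_factory=list)

    def to_list(self) -> List[T]:
        return [self.head, *self.tail]

    def first(self) -> T:
        # No empty-list check needed: the type guarantees an element.
        return self.head
```

Any function annotated as returning `NonEmpty[T]` then documents, in a checker-enforced way, that callers never have to handle the empty case.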


> you spend 95% of your time thinking about "what kind of thing do I get from this method call, how can and should I use it, and what should I return in the end?". Writing down the result of your thoughts takes almost no time in comparison.

The issue here isn't static vs. dynamic typing, it's API documentation. In cases where static type declarations work out to be sufficient as API documentation, sure, use them as API documentation. But the "time to think" issue isn't being solved by static typing; it's being solved by having good documentation for the APIs you are using (or writing good documentation for the APIs of the libraries you are writing).

> A good example are functions that return lists. Often you always have an element in the list. Using a good type-system, it will be indicated, so when I get the list, I know that I don't have to handle the case that the list is empty.

Same comment here: whether or not the function can return an empty list is part of the API. The API needs to be documented somehow. A static type declaration might be enough, or it might not. Either way, what's solving the issue isn't static typing, it's API documentation.


Let me quote your original claim again:

> Time you spend writing type specifications is time you're not spending doing something else that might add more value.

To which I say: writing out a type (= a few characters) is negligible anyway compared to the time you spend thinking about what the type is (which you have to do in Python as well, maybe even more).

> Same comment here: whether or not the function can return an empty list is part of the API. The API needs to be documented somehow.

If you have proper API documentation in both cases, then you have to think less, that's true. In the real world, however, I can testify that I have to think longer in Python because 1) APIs are not always well documented and 2) even if they are, a well-documented API that uses static types is still easier to use due to the automatic compiler/IDE support.


> writing out a type (= a few characters) is negligible anyways when compared to the time you spend thinking about what the type is

Writing out a type is only a few characters if it's a type that's already built into your type system. (In which case, as others have already pointed out, you probably won't have to write it anyway in a statically typed language because the type system will automatically infer it. But in such cases, you're not gaining any documentation benefit from it.)

If it's a type you're having to invent as part of writing the code, writing it out everywhere it gets used can be a lot more work.

> I have to think longer in python because 1) APIs not always well documented

Meaning, not well documented compared to APIs in other languages? That hasn't been my experience; my experience has been that API documentation pretty much sucks everywhere.

> a well documented API that uses statical types is still easier to use due to the automatic compiler/IDE support

If you use an IDE, perhaps. (I don't; I find that they cost me more than they save me.)


> Writing out a type is only a few characters if it's a type that's already built into your type system.

Right, but if it is not, then you also have to write extra code in Python (e.g. create a new class).

Of course, alternatively you can also just use "String" for everything (you can do that with a static type system too), but I hope you agree with me that this isn't a great idea in any language.

I also have the feeling that you might change your mind if you tried out a modern fully fledged IDE with a good statically typed language. I think this because of things you said such as "But in such cases, you're not gaining any documentation benefit from it", which are correct if you are not using an IDE, but wrong if you are. IDEs like IntelliJ can be configured to show all (or only certain) types that aren't written in the source but that the compiler infers, without pressing any key. I like to use this feature a lot when diving into an unknown code base.

> Meaning, not well documented compared to APIs in other languages?

No, just not being well documented. Many popular libraries in Python are well documented, but not all are. And the less popular/public ones are often poorly documented. The same is true for other languages as well, but in those, at least you often have a "basic" documentation through the types.


> if it is not, then you also have write extra code

In any language, not just Python. A type that isn't already built-in has to be defined in your code no matter what language you are using.

> I hope you agree with me that this isn't a great idea in any language.

Of course it isn't. Different types exist for good reasons.

> I also have the feeling that you might change your mind if you try out a modern fully fledged IDE with a good statically typed language.

I doubt you would have this feeling if you knew how many times I have tried "a modern fully fledged IDE with a good statically typed language". And every time has ended up the same.

> IDEs like IntelliJ can be configured to show all (or only certain) types that are not written in text but that the compiler infers

This is a fair point, but in a language like Python this could be provided as a library function if it were needed. (Python already has the "help" built-in function that shows you the documentation for whatever object you pass it as an argument, at the Python interactive prompt. Inferred types could be handled the same way if Python had them.)
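For what it's worth, annotations that are written down are already introspectable at runtime in Python, alongside `help`; a small sketch:

```python
import typing

def greet(name: str, excited: bool = False) -> str:
    """Return a greeting for the given name."""
    suffix = "!" if excited else "."
    return f"Hello, {name}{suffix}"

# help(greet) prints the signature and docstring at the interactive prompt;
# the annotations themselves can be read back programmatically:
hints = typing.get_type_hints(greet)
# a dict mapping parameter names (and 'return') to their annotated types
```

Inferred types would indeed need compiler support, but for declared annotations this "library function" already exists.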

> at least you often have a "basic" documentation through the types

Which might be significant useful information. Or it might not. It depends on what kind of code you are dealing with. It's quite possible that the particular kind of code I have dealt with has simply not been the kind where static typing is much of a help, and that there are other kinds of code where it is. But the original claim that I responded to was "Everyone needs types" (by which was meant "everyone needs static typing"). It is that blanket, general claim that static typing is always better that I was disputing, not the claim that static typing can be helpful in some cases.


> In any language, not just Python. A type that isn't already built-in has to be defined in your code no matter what language you are using.

Yeah exactly! So it's the same for every language. I just don't understand why you then say that I would have to type more in a statically typed language.

> I doubt you would have this feeling if you knew how many times I have tried "a modern fully fledged IDE with a good statically typed language". And every time has ended up the same.

Everyone is different and that's one reason why people choose different tools and languages. Nothing wrong with this - one of the best developers that I know uses vim for everything. I on the other hand would not be productive without a good IDE.

> But the original claim that I responded to was "Everyone needs types" (by which was meant "everyone needs static typing"). It is that blanket, general claim that static typing is always better that I was disputing

Fair enough, I agree with you on that one. The thing I disagree with is that static typing (always) requires more effort when writing code. For Java this is totally true, but for many other languages, my experience is that I neither have to type more nor think more.


> If it's a type you're having to invent as part of writing the code, writing it out everywhere it gets used can be a lot more work.

Not really; you write it out fully once, when creating a type alias/definition, and then just use the name later.
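In Python this is just an assignment; a small sketch with hypothetical aliases:

```python
from typing import Dict, List, Tuple

# The full type is spelled out once; every signature reuses the name.
Matrix = List[List[float]]
Sparse = Dict[Tuple[int, int], float]

def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]

def to_sparse(m: Matrix) -> Sparse:
    return {(i, j): v for i, row in enumerate(m)
            for j, v in enumerate(row) if v != 0.0}
```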

> Meaning, not well documented compared to APIs in other languages?

Yeah, I’ve found Python to have a pretty good documentation culture (docstrings especially facilitate this), often better than some statically typed languages with ecosystems where signatures are regularly mistaken for adequate documentation.


> In python I would often have to think _more_ about it and look into the documentation or maybe even the implementation to understand if it could return an empty list or not.

In many cases in Python, you don't have to care. For example, if you're going to iterate over the list, Python will iterate over an empty list just fine: it will execute zero iterations. No need to check anything.
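A quick illustration:

```python
def report(errors: list) -> list:
    lines = []
    for e in errors:  # an empty list simply yields zero iterations
        lines.append(f"error: {e}")
    return lines

report([])             # no special-casing needed for the empty list
report(["bad input"])  # one line per error
```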


Right. I was referring to the cases where you must care, such as presenting a list of errors to someone or doing a calculation where an empty list must be considered in a special way if it occurs etc.


Chris Martin is a Haskeller, and so will spend hardly any time writing type specifications. He'll let the compiler infer them.

I'm very enthusiastic about static types with global type inference, where I don't have to write a single type annotation if I don't want to. My enthusiasm degrades pretty quickly the more annotations I have to manually clutter my code.


This. Can also be applied to unit testing: the belief that if your code passes its unit tests, it will do exactly what you intend.

I see way too many code bases that test things that are easy and xfail things that are hard because it turns out that you can’t just test your way to a complex product, but you sure can make yourself feel good by testing the fact that a dict acts like a dict.


I see your point, but I do think the amount of time spent writing type specifications is (usually) negligible, in return for an extremely large benefit whenever the code needs to be read by you or someone else later.

I would certainly argue that anyone who writes a library, whose public API will be consumed by hundreds or thousands of people, should always include type annotations.

After working in a TypeScript codebase for so long, it's jarring to return to Python or even vanilla JavaScript where I no longer have breadcrumbs to help me keep track of how the code is organized.

Python's type annotations leave something to be desired, in terms of how easy they are to include in code and how well the tooling supports them. I'd almost rather see something like "Typethon" with better syntax+soundness that compiles-to-Python.

Types are an optional tool, sure, but they're one of the most valuable tools we have to catch (not eliminate entirely) bugs and hasten development (without introducing errors).


> The belief that if your code type-checks correctly, it will do exactly what you intended.

With a little care and a sufficiently strong type system it’s surprising how often this is actually true.


This is the last bug I had to fix. Scenario: a credit transfer marketplace. When we auction an invoice, we snapshot the data about all the involved parties and show those data in all the screens about the invoice. However, some of those screens didn't show the stored data; they showed the current data, which is useless and very wrong. I know which tests can catch that bug, but I wonder what type system could catch it at compile time. To make it worthwhile, it must be easier than writing the test. We can stop at the JSON generation. No need to look into the React frontend.


That sounds like a pretty frustrating bug to try to find and fix. No type-level feature that I'm aware of could prevent this issue directly. What you're describing is pretty high-level business logic. While I've seen some pretty impressive demos with dependent types, nothing that I know of could handle this.

I don't know much at all about the structure of your application, and it's hard to speculate, but what strikes me is your live and snapshotted data have the same shape/type and seem to be interchangeable for the compiler/interpreter, but are very different to you and your customers. The engineer creating that screen may have had the forethought at the outset to say "this should never deal with live data, only snapshots", so that warning probably gets put in a comment, but there's very little to stop someone from later passing in the live data that you definitely don't want.

There's a feature in Haskell (and similar languages e.g. PureScript) called newtype declarations. They're a lightweight way you can tag and optionally limit access to your data. You essentially tell the compiler "this is a new and distinct type that can wrap the original one". The original and newtype are not interchangeable, and an attempt to do so won't compile.

Having that feature, whenever I recognize overloaded meanings encoded with the same type (e.g. live/snapshot), I'm in the habit of wrapping one of them in a newtype, and before I even implement the application logic, I write out the signatures and specify which type I actually want. They add a bit of tedium, but they have saved me many times, and I've never regretted using them.
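Python's closest built-in analogue is `typing.NewType`: erased at runtime, but kept distinct by the type checker. A sketch with hypothetical invoice types (the field names are made up):

```python
from typing import NewType

# Tag live vs. snapshotted invoice data so the checker keeps them
# apart, even though both are plain dicts at runtime.
LiveInvoice = NewType("LiveInvoice", dict)
SnapshotInvoice = NewType("SnapshotInvoice", dict)

def render_screen(data: SnapshotInvoice) -> str:
    # This screen is only ever allowed to see snapshotted data.
    return f"Seller: {data['seller']}"

snap = SnapshotInvoice({"seller": "Acme (as of auction)"})
live = LiveInvoice({"seller": "Acme (current)"})

render_screen(snap)    # OK
# render_screen(live)  # mypy: incompatible type "LiveInvoice"
```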


Thanks, maybe we can build something along those lines even if we're using a totally different language (Elixir backend, JSON to a React frontend).

That is an administrator console. Sometimes we do have the foresight to understand what our users actually want, but in a large system things like "it was obvious to us" or "after months of use we realized that it's better to do it this way" inevitably happen.


You can do the same in TypeScript with newtype-ts. I'd recommend it for the same reasons you'd use them in Haskell et al.


Depending on how your code is structured, phantom types might work for your use case. They're useful for tracking metadata about the values being passed around, even though the methods and fields don't differ between your use cases.
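In Python, a phantom type can be approximated with an unused parameter on `Generic`; a sketch with hypothetical `Live`/`Snapshot` markers:

```python
from typing import Generic, TypeVar

State = TypeVar("State")

class Live: ...       # marker classes, never instantiated
class Snapshot: ...

class Invoice(Generic[State]):
    # State is a phantom parameter: no field depends on it; it only
    # exists so the type checker can tell the two uses apart.
    def __init__(self, seller: str) -> None:
        self.seller = seller

def render(inv: "Invoice[Snapshot]") -> str:
    return f"Seller: {inv.seller}"

snap: "Invoice[Snapshot]" = Invoice("Acme")
render(snap)  # OK; passing an Invoice[Live] here would be a mypy error
```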



I can't seem to find the exact quote, but

> Dynamically typed languages are actually statically typed with precisely one type: hash table.


This doesn't describe Python at all, though. In Python, objects have some data structure holding their attributes (usually a dict, but not necessarily, e.g. with __slots__), but that's not the whole identity of the object.

Besides, by this definition C++ and Java would look dynamic because of their vtables.
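A quick demonstration of the __slots__ point:

```python
class Plain:
    pass

class Slotted:
    __slots__ = ("x",)  # attributes live in fixed slots; no per-instance dict

p = Plain()
p.anything = 1          # stored in p.__dict__

s = Slotted()
s.x = 1                 # allowed: 'x' is a declared slot
# s.y = 2               # AttributeError: no such slot, and no __dict__
```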


True; it's a better description of Javascript.


Dynamic typing is just "objects have types but variables don't" which is far less controversial.
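A two-line illustration of that definition in Python:

```python
x = 42
assert type(x) is int  # the *object* 42 has a definite type
x = "hello"            # the *name* x carries no type; it simply rebinds
assert type(x) is str
```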


He should stick to songwriting...


Unfortunately, this isn’t always an option if you need to use a library that’s only available in Python.


It's possible to use it in a Julia-like fashion (like PyCall.jl[1]). For example, with OCaml one could use Pyml[2].

[1] https://github.com/JuliaPy/PyCall.jl

[2] https://github.com/thierry-martinez/pyml


Only if they don't also need X other things that Python provides and the other language doesn't...


This is informative, but Python is an example of bad type safety.


Maybe I've written something that really doesn't make any sense. I should have understood the blog in the first place.



