Type hinting sucks (reddit.com)
100 points by aiNohY6g on Feb 11, 2023 | 72 comments



I'm usually a strict proponent of static typing, but I discovered that for the things I use Python for (mostly quick'n'dirty cross-platform scripting, not "real programs") it's indeed quite a nuisance, mainly in non-trivial cases where the type hints grew entirely too complex and 'dominated' the code, resulting (paradoxically) in less readability. In the end I quickly abandoned my experiments and returned to no type hints.

I also find it kind of hilarious that type checking isn't built into the standard python interpreter, but you need a separate tool instead. It took me a couple of wasted hours until I discovered that my carefully type-hinted code was not actually type-checked, and after discovering that I was baffled that I can't just do something like "python --type-checked bla.py".
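
(For the record, the missing step turned out to be running a separate checker over the file:

    pip install mypy
    mypy bla.py

Serviceable, but it still feels like it should be an interpreter flag.)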

TL;DR: for many use cases, Python's duck typing is a feature, not a bug :)


Perhaps it might not make sense to write type hints for your own prototyping code, but the fact that many libraries and stdlib functions are annotated certainly improves my velocity.


Yes! I find this with TypeScript and JavaScript too. Even if I’m just writing a quick script in JS, the fact that most of the libraries I’m using have TS definitions is really useful.


> I also find it kind of hilarious that type checking isn't built into the standard python interpreter, but you need a separate tool instead.

Agreed, typically you don't add new syntax to a language only to silently ignore it. I think the plan may be to add runtime checking to the Python interpreter later.


The reason is that the community decided they needed some form of type annotations, but nobody could agree on which ones or how they should work. So some syntax was added just so the parser wouldn't fall over, but the semantics were left to other projects to "let 100 flowers bloom".

This worked out about as you would expect. If you grabbed a random package from PyPI, your typechecker of choice might suddenly complain about its API if the package wasn't written against the same typechecker as yours, so a separate, "standard" typechecker emerged in the form of mypy. But they couldn't break older code not using mypy, so it's strictly optional.


> so a separate, "standard" choice in typechecking was made called mypy

This distorts reality a bit. Mypy is one of the flowers; Jukka developed the annotation syntax. It got popular enough to become the de facto standard. I would say the strategy worked.


> Agreed, typically you don't add new syntax to a language only to silently ignore it.

The only new syntax here is variable and function annotations and it allows you to put anything into the annotation, so it makes sense the compiler would ignore them. Type hints are one possible usage of such annotations but they are not part of the syntax (because the types are just regular Python objects).
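
For illustration, annotations are simply evaluated and stored; the interpreter attaches no meaning to them:

    def f(x: "whatever" * 2) -> 1 + 1:
        return x

    print(f.__annotations__)
    # {'x': 'whateverwhatever', 'return': 2}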


> I think the plan may be to add runtime checking to the Python interpreter later.

And slow the run time back down several versions.


Usually static types allow for better optimizations, but this is python, so the implementation is probably maximally pessimal.


My understanding is that the relationship is a lot more complex than "types = faster code". Types are often necessary for highly efficient code, yes, because they allow you to avoid lots of indirection and just give the hardware the raw numbers and instructions that it wants. But they aren't sufficient - you can't just have types, you also need to guarantee that those types are always the correct ones (i.e. that when I tell you it's an int, it will never be a string). In practice, the performance gains you get from using types are matched by the performance losses from having to validate them.

You can solve this by restricting what's possible significantly (you see this, for example, in RPython and Cython, where static types are used more effectively, but come with significant metaprogramming restrictions). But at that point, you're losing a lot of the value of a dynamic language. One of the big reasons that Typescript works so well here is that it has all sorts of escape hatches, so that you can write mostly correct Typescript code, but if you need to write something completely dynamic, you can always just define the arguments and return types and cast everything else to `any`. This is why, even though Cython exists and is much quicker than Python, it tends to only be used to speed up specific sections of code, with the rest of the codebase being dynamic Python glue.

The other side of this is that there already is a solution to performance issues in dynamic languages, and that's JIT compilation. Javascript is consistently one of the faster languages for lots of web tasks, and that's in no small part due to the significant effort put into optimising V8 and the other JIT runtimes. The best thing about JIT compilation is that you don't need to write the types at all, because the type information can be read at runtime. Theoretically, you could incorporate type hints into the optimisation process, but this probably won't give as much good information as actually running the code and seeing what types pop out.

Of course, the other side of this is to ask what the value of type hints are in the first place. I think a lot of Python developers are used to things like Cython, where types almost exclusively refer to the underlying data formats (i8, i32, u64 etc), but the biggest value to type hints and tools like mypy is not just defining struct layouts, but describing a program's behaviour at a higher level - for example, declaring that a variable can be one of these two types and ensuring that the developer never forgets that. Or defining a closeable resource like a file handle that can't be used after it's been closed. Or making sure that the developer handles each case of an enumeration and doesn't miss anything.

The types that are useful for those sorts of things tend not to be so useful for optimisation purposes, but they are _very_ useful for ensuring that code is semantically correct.
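
For example, a sketch of the exhaustiveness point, using typing.assert_never (Python 3.11+, or typing_extensions before that):

    from enum import Enum
    from typing import assert_never

    class Color(Enum):
        RED = 1
        GREEN = 2

    def describe(c: Color) -> str:
        if c is Color.RED:
            return "warm"
        elif c is Color.GREEN:
            return "cool"
        else:
            assert_never(c)  # mypy flags this branch if a member goes unhandled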


I don't think people select Python for its performance ;)

But the type-checking could be a command line parameter, used only during development.


It already can be used for development, IDEs such as PyCharm perform quite effective live type checking based on type hints.



The trick in general is to focus on the types of (top-level) functions. This way, it can be super concise and machine-readable API documentation. (In Haskell, it's great how the type is defined above the function, where documentation also goes.)
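
For instance, a signature like this (a made-up function, just to illustrate) already serves as most of the documentation:

    def group_by_host(urls: list[str]) -> dict[str, list[str]]:
        ...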

Regarding the example of the original article, it demonstrates how type hinting doesn't turn an inherently dynamic language like Python or JavaScript into Haskell, where everything is neatly typed. Type-wise, it just does not make much sense to use the operator + for all the things it's used for in Python, hence the example function slow_add does not make sense either.

I admit I have difficulty knowing when I should just use Any and stop trying harder.


You should try F#. It infers all the types, so you get the benefits of type checking without the noise.


My trouble with this approach is that my brain is not very good at type inference, so I don't really understand what's going on just by reading the code. IDEs can help, but it's still a hassle (esp. in contexts where you don't have IDE - github PRs etc.)


F# really is a great language. I wish it was used more.


slow_add is not simple. It reimplements the Python + operator, which is quite complex, especially if you look at it from a type system perspective. As noted, its implementation is very core to the language and it has several edge cases. That's not usually what you do when you create a function. Rather, you have very specific types that you support. It's only "simple" because adding things together is a basic operation.

Even if you write a function that encapsulates some funky arithmetic, you will probably be using more operations, to the point that nothing but numbers will really make sense.

That said, yes, adding types to an API that wasn't implemented with a typing system in mind can be challenging. But I feel like you get a better API at the end of that exercise.


This is exactly what bothered me with this article but I couldn't put it into words:

> That said, yes, adding types to an API that wasn't implemented with a typing system in mind can be challenging.

The title is just straight up clickbait. Type hinting doesn't suck, adding types to an existing project used by users who have accustomed themselves to an untyped system that lets them do whatever sucks. If this were typed as int, int from the get go, the article would be a hell of a lot shorter.


Groovy taught me that opt-in typing always sucks. Either a language is statically typed from its inception or it's not. Type hinting is like putting up doors with locks around a house that has no walls to begin with.


It’s done wonders for quality in JavaScript land


I really like statically-typed languages where you can let the compiler fill in the blanks. Like Rust or Haskell at the farther extreme.


For what it's worth, Erlang's dialyzer definitely does not suck.


When it works, yeah; sometimes good luck figuring out what it wants.


… ten minutes later.


Is anyone able to summarise how python got to this point? I haven't used it seriously since the tail end of the 2.7 woes but liked it a lot back then. Good native dictionary type and first class closures worked for me. Scattering of unit tests. The type annotation idea always seemed inconsistent with the language to me so I've ignored it.

In particular I'm wondering if this is a consequence of the rumours that python scales badly - someone changes some module far away, runs the massive test suite and it passes, checks in their code, and your code promptly blows up because the interaction wasn't tested.

There also seems to be a current enthusiasm for statically typing everything on hackernews which might be reflective of the wider industry, possibly making python acceptable collateral damage as it was on the wrong side.

That's my conjecture though, would love to hear from someone closer to the game how type hints became such a big deal in the python world.


My understanding is that Python is used by millions of people to write programs measured in tens of lines, and by tens of large companies writing millions of lines.

Many in the first group are happy with dynamic typing. Those in the second group would have been better off with a compiled language but are too invested in Python to switch to something different, so are trying to morph Python itself into something different.

Unfortunately, the way Python’s governance works, all decisions about the future of the language are made by people in the second group. For example, IIRC the Python Steering Council is composed entirely of developers working for large tech companies - no scientists, finance workers etc at all.


My theory is that huge code bases are always more error prone than small code bases. And if a language is controlled by a committee they will destroy it.


> I'm wondering if this is a consequence of the rumours that python scales badly

My (completely uninformed) guess is that people want to statically type all the things not because it improves correctness, but because it makes editor facilities like code completion and jump-to-documentation so much more effective.


Web devs, same reason asyncio got railroaded through.

Python is increasingly trying to cater to the extremely online and loud webdev crowd. Living in an ivory tower is fine if you're inside it.


At first glance this is a pretty odd take, would you mind elaborating? I wouldn't say Python devs and web devs have any more overlap than web devs and some other random group of devs you could select, so why is it the case here? It's always easy to say that it's [outgroup]'s fault that [ingroup] is going sour, but is it really the case?


I'm coming more and more to the conclusion that if you need typing for Python, you might as well go full throttle and use Java (maybe TS/Deno?), unless you're in a specialized ecosystem like data science and you have no choice.


Are you talking about Java or JavaScript?


Java or Typescript.


Type hints are a great way to document code. If the type hints get too complex they can be cumbersome though. For example, passing an int, or None if the value doesn't exist.


Passing an int and, if the value doesn't exist, a None can be expressed quite neatly like this: "Optional[int]"


`int | None` is often preferred in the newest versions.


I must have missed this. Union types using | are possible now?


Yes, saves the import and looks cleaner IMHO https://www.blog.pythonlibrary.org/2021/09/11/python-3-10-si...
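
Side by side, with a hypothetical function just to show the two spellings:

    from typing import Optional

    def first_old(xs: list[int]) -> Optional[int]:  # pre-3.10 spelling
        return xs[0] if xs else None

    def first_new(xs: list[int]) -> int | None:  # PEP 604 union syntax, 3.10+
        return xs[0] if xs else None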


I was expecting the article to get to the end and have him invent a massive type system that allows you to pass any two types as long as you can add them together. Which, of course, is what it did at the beginning.


By the fourth attempt the author seems to be halfway to inventing typeclasses.


Opinion: people don't actually need a rigorous typing system; what they need is to:

1. prevent type/interface errors at runtime.

2. get auto-complete for unfamiliar objects or function signatures.


That's true, people don't need rigorous typing systems. However,

1. the way to verify types before deployment is usually through a rigorous type system.

2. auto-completion of type signatures is massively easier if the language has a rigorous type system.

It's the tools that people use that need a type system, not the people themselves.


Sure, but the experience tells us that the best way to achieve these goals is through a rigorous type system.


What do you think solves those better than a type system?


I think (optional) static typing is important for "serious" programming, so I've been experimenting with Python's type hints after my good experiences with TypeScript (which fundamentally works in the same way).

Oh boy, what a mess, especially if you have to keep compatibility with 3.7 (which has slightly different types for builtins).

The ecosystem is definitely improving and I do think the benefit will eventually be worth it, but this example illustrates (albeit in a somewhat exaggerated way) why I would advise users to move cautiously.

IDE completions are a good benefit for a start; it'll probably take a while until you can support strict checking (errors when types don't match).


For compatibility, use typing_extensions. You can use most of the latest typing features with older versions of Python.
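
A minimal sketch of that approach:

    # typing_extensions backports newer typing features, so the same
    # annotations work on older interpreters such as 3.7:
    from typing_extensions import Literal, Protocol, TypedDict

    class Point(TypedDict):
        x: int
        y: int

    Mode = Literal["r", "w"]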


I find this an interesting example to ponder, because I think it actually reveals a weakness in Python’s data model, more than in its type hinting—a weakness inherent in class-based programming that is also found to a considerable extent in just about every dynamic language I can think of, but which is made particularly obvious in Python’s operators.

The cause of most of the trouble here is that `lhs + rhs` isn't simple sugar, as a reasonable person might imagine if they haven't thought through the implications; instead, it turns into something like this terrific mess:

  # (pseudocode: `lhs + rhs` behaves roughly like add(lhs, rhs) below)
  def add(lhs, rhs):
      lhs_type = type(lhs)
      rhs_type = type(rhs)
      # rhs gets first try only if its type is a proper subclass of lhs's type
      if do_radd_first := (issubclass(rhs_type, lhs_type) and rhs_type is not lhs_type):
          if hasattr(rhs_type, '__radd__'):
              output = rhs_type.__radd__(rhs, lhs)
              if output is not NotImplemented:
                  return output
      if hasattr(lhs_type, '__add__'):
          output = lhs_type.__add__(lhs, rhs)
          if output is not NotImplemented:
              return output
      if not do_radd_first and hasattr(rhs_type, '__radd__'):
          output = rhs_type.__radd__(rhs, lhs)
          if output is not NotImplemented:
              return output
      raise TypeError(f'unsupported operand type(s) for +: {lhs_type.__name__!r} and {rhs_type.__name__!r}')
(This is almost certainly imperfect in details, and may be imperfect in larger pieces; it’s a quick sketch based on memory from about nine years ago, plus a quick check of https://docs.python.org/3/reference/datamodel.html#object.__... where I had completely forgotten about the subclass thing. But I think it’s pretty close. Good luck figuring any of this out as a beginner, though, or even being confident of exactly what it does, because there’s no clear documentation anywhere on it, and the reference material misses details like the handling of NotImplemented in __radd__, so that I’m not in the slightest bit confident that my sketch is correct. If I still worked in Python, I’d probably turn this into a reference-style blog post, but I don’t.)

And why is this so? Because with class-based programming, the only place you can attach behaviour to an object or type is on that object or type. Which means that for operator overloading, it must go on the first operand. But there are many legitimate cases where you can't do that, and not all operators are commutative (e.g. a - b ≠ b - a in general), so you pretty much have to support the reflected operators, rhs.__radd__(lhs) instead of lhs.__add__(rhs), and then you get worried about subclassing problems, and it all just gets painfully complicated.

Is it any wonder, then, that a typing system would have trouble with it? Certainly Addable/RAddable are the wrong level of abstraction: you need a bound that covers both of them in one go, and I doubt you can do that with Protocol or similar, since you’re defining a pair of types, where one has this method, or the other has that method (and good luck handling NotImplemented scenarios). I imagine it needs to be built into the typing module as a new primitive.

—⁂—

By contrast, in Rust, you might possibly start with a concrete type (the first attempt), but you’d be more likely to just go straight to the fully-correct solution that this example is never able to reach in Python, of being generic with an Add trait bound <https://doc.rust-lang.org/std/ops/trait.Add.html>:

  fn slow_add<A: std::ops::Add<B>, B>(a: A, b: B) -> A::Output {
      std::thread::sleep(std::time::Duration::from_secs_f32(0.1));
      a + b
  }
Certainly Rust does have the advantage of having started with and built upon a type system, rather than retrofitting it. But note how this allows you to use different left-hand side, right-hand side and output types (A::Output is here short for <A as std::ops::Add<B>>::Output, meaning “the type produced by A + B”; Output is what’s called an associated type), and threads the types through fully properly. And note more importantly how this doesn’t run into the __add__/__radd__ problems, because the implementation isn’t attached to the single type A, but is rather defined, as it were, for the (A, B) tuple: when you compile it, it’s like you have a lookup table keyed by a (Lhs, Rhs) tuple. Depending on what sorts of additions the left hand side type defines, the right hand side type may be able to define additions of its own. (If you’re interested in how this is done, so that different libraries can’t define conflicting additions, look up trait implementation coherence.)

Rust thus demonstrates one solution to the problem this case exposes with using classes in this way: instead of putting data and behaviour together in classes, separate them.

(It’s not all sunshine and roses: some things do map to class structures very nicely, so that implementing them in Rust can be painful and take time to figure out a decent alternative, especially when interacting with existing systems; but in general, I find myself strongly appreciating languages with this sort of data/behaviour division.)

—⁂—

My favourite demonstration of the advantages of separating data and behaviour is actually iterators:

• In Rust, when you implement the Iterator trait on your type, you can call any iterator methods on it—built-in ones like .map() and .filter(), but also methods defined in other extension traits, e.g. https://docs.rs/itertools/latest/itertools/trait.Itertools.h.... This is why iterators are very popular in Rust: because they just work, with no trouble.

• By contrast, in Python map() and filter() have to be globals, leading to messy code reading order and the preferred alternative approach of list comprehensions/generator expressions (which work pretty well, but are more limited, really only covering map, filter and flat_map in their capabilities).

• In JavaScript, you get Array.prototype.{map, filter, …}, and those methods are defined as working on any iterator, not just an array, because otherwise it’d be just too painful—but unless you copy the methods you want (like NodeList has done with forEach, but not any other method!) you can’t just chain things automatically, and you can’t add new methods anywhere.

• Ruby has a… different approach to all this, but I can’t remember all that much about it and this comment is long enough already.


> but you’d be more likely to just go straight to the fully-correct solution that this example is never able to reach in Python

Couldn't this be done if Python added C++ Concepts-like facilities? As in, something like (extremely ugly oversimplified un-pythonic pseudocode):

    def slow_add(a: Any, b: Any) requires({a+b}) -> decltype(a+b):
        return a + b
This is still effectively duck typed, but since the type checker is able to typecheck `a+b`, it should be able to handle `slow_add(a, b)` given these extra hints.


Yes, OOP binding is more complex than functional polymorphism.

But roughly half of the extra complexity in your Python code comes from the fact that OOP classes are closed, while FP traits (often called "classes" too) are open. As a consequence, you must support the OOP functionality defined in many places to get the same amount of freedom.

Python classes are actually open, so if you accept monkey patching, you can expect the functionality to always come from the lhs's class and get much more obvious and simple code.


Type hinting does not suck. But missing support for method overloading does.
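
For what it's worth, typing.overload recovers some of this at the checker level, even though the runtime still dispatches through a single implementation. A sketch:

    from typing import overload

    @overload
    def slow_add(a: int, b: int) -> int: ...
    @overload
    def slow_add(a: str, b: str) -> str: ...

    def slow_add(a, b):  # the single runtime implementation
        return a + b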


> Type hints are great! But I was playing Devil's advocate on a thread recently where I claimed actually type hinting can be legitimately annoying, especially to old school Python programmers.

> TL;DR Turning even the simplest function that relied on Duck Typing into a Type Hinted function that is useful can be painfully difficult.


Tbh, for smaller projects I feel this way. But when you're in a big workspace with lots of files and lots of variables, you forget after a while.


Isn't that a problem of types in general? I mean, you have the same problem in a strongly typed language.


This really triggered me for some reason! I hope it's a joke :)

(I am not sure exactly what triggered me but I think it's because at least to me it's very clear that in this very generic case, you just want to annotate with `Any`. Why go to such great lengths?)


the problem is not type hinting. the problem is people trying to use a single function for any type. as can be seen in the article, that is extremely difficult and error prone. so just don't do that:

    def slow_add(a: int, b: int) -> int: ...
    def slow_add_float(a: float, b: float) -> float: ...
OK, it's not as elegant or pretty, but it's really clear what is going on. the problems described in the article crop up because people are trying to have the interpreter intuit what the programmer wants, when what the programmer wants might be ill defined or simply invalid.


So because the type system is far far far far less polymorphic than the underlying code, we're to monomorphize the code manually?

Angry rant: maybe mypy (and its PEPs) can be designed well? Maybe tacking Java on top of Python was a terrible idea, and Python is much better than this awful type system.

cf. the syntactic catastrophe that is "protocols". Consider that to type `+` correctly, you're writing more than a dozen lines of Python.

This is mad, and the type system shouldn't have been designed by people this inexperienced with type systems.


Python always had a type system. It's not checked ahead of time but it's there.

You can do things like return different types based on values read from a text file at runtime; that is, it's a dependent type system. Python is not a total language - no termination proof of given functions to be found. It's also not pure: the calculation that picked a type can pick a different one next time it runs on the same arguments.
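
A toy version of that claim, assuming a file whose contents pick the type:

    def load_flagged(path: str):
        # the *type* of the result depends on data known only at runtime
        with open(path) as f:
            return 42 if f.read().strip() == "int" else "forty-two"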

A dependent type system in a pure language with no user-provided termination proof is in the ballpark of things academia hasn't made work yet.

Python's dynamic type system _cannot_ be checked at compile time, however much care goes into the design. What can be done is cripple python to some subset that can be statically typed, combining the elegance of java with the performance of python, with some end goal that doesn't make any sense to me.


Just make types functions which are executed at compile time:

    @type
    def Monoid(T: type):
        return {'+': type[T, T].to(T) }

    def add(left: Monoid(int), right: Monoid(int)) -> Monoid(int):
        return left + right
...if it takes too long to compile, terminate and report an error.

As for this kind of dependence, that seems straightforward:

    @type
    def Dependent():
       return {0: int, 1: str, "Hello": bool}

    def rtn_dependent(x: Dependent().keys) -> Dependent().values:
      if x == 0: 
        return x
      elif x == 1:
        return "message"
      else:
        return x == "Hello"


Do you know of a type checking algorithm which is likely to deal with something like the second case? I'd say it would need to decompose the function body into cases and typecheck them individually - I think Typed Racket calls this pattern occurrence typing.

Push it a little harder though and you're into the domain of automatic theorem provers. Which is indeed an answer to how to write a dynamically typed program that doesn't need manual proof annotation - a great trick to have, but not one that exists yet as far as I know.

For what it's worth I quite like "abort and apologize" for a type check that takes too long. The termination requirement is probably flexible in practice.


I don't, but as far as practical compilation for Python goes, I'm fairly happy with a type system that panics -- we don't need proofs, we need to model the underlying language in a way that's 95% close to its actual semantics.

For me, above, we can observe that all functions either return or implicitly return None. Therein lies a straightforward algorithm: are the types of all the returns consistent with the keys of the specified dict? etc. etc.

You may, for sure, need to run compile-time user-supplied fns to augment the type system. That's fine.

You can even, if needed, call run-time fns with generic args -- in order to determine their type.

"static" analysis need not presume the absence of a runtime, only the absence of a particular runtime state.


This definition of dependent type system is a bit controversial. There was a conflict between Julia language maintainers and FP researchers a few years back.


Well, sort of. There's the (dominant?) school of thought that believes that types encode a program that runs before your actual program runs (and may thus be deleted at some point). This is often characterized by writing the type in a more awkward language than the usual one, C++'s separate languages for example.

Dependent Type Theory is quite strongly associated with the compiletime/runtime phase separation assumption, where the weirdness is that values from the runtime world leak into the compiletime one. This is magic in the static/AoT type world and trivial in the dynamic/JIT one.

So, strictly you're right, Python is not dependently typed, as that term is owned by a different branch. However, if you were to put an ahead-of-time type system on arbitrary Python, it would need to be dependently typed to be able to describe the programs people write in Python today. It would also be prone to failing to terminate during type checking.


Wait until you use math in OCaml.


I quite enjoyed programming in OCaml.


Same, but the math system is strongly typed and doesn't have what the parent is talking about. There's more than one way to type a math system.


Not actually sufficient though, you'll need:

    def slow_add_int_float(a: int, b: float) -> float: ...  # etc.

where an argument could be made for encoding the return type in the name as well, at least for functions that don't do addition.

What that has going for it is the warm familiarity of C99. People really like function (and sometimes operator, if they draw a distinction) overloading though, and can reasonably argue that writing template instantiations by hand is a poor use of their time.


You should never be adding an int to a float; you should be casting one first.


* Declare `slow_add` to be used with instances implementing `__add__` protocol only.

* Declare that `concat` should be used to achieve a similar result with other types implementing `Foldable` protocol.

* Carry on.
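
The first bullet might look roughly like this (a sketch; `Foldable` isn't a stdlib protocol, so only the `__add__` half is shown):

    from typing import Protocol, TypeVar

    T = TypeVar("T", bound="Addable")

    class Addable(Protocol):
        def __add__(self: T, other: T) -> T: ...

    def slow_add(a: T, b: T) -> T:
        return a + b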


It seems to me that the author isn't comfortable with static typing. The obvious solution (which might be impossible in MyPy?) is to have a two-parameter type class (protocol in MyPy?) of the lhs and rhs, and have the output be of an associated type. That should cover almost all cases.
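
Something close to that is expressible with a generic Protocol; a sketch (it models __add__ but not the __radd__ fallback):

    from typing import Protocol, TypeVar

    Rhs = TypeVar("Rhs", contravariant=True)
    Out = TypeVar("Out", covariant=True)

    class SupportsAdd(Protocol[Rhs, Out]):
        def __add__(self, rhs: Rhs) -> Out: ...

    A = TypeVar("A")
    B = TypeVar("B")

    def slow_add(a: SupportsAdd[A, B], b: A) -> B:
        return a + b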


It seems the problem stems from a poorly defined purpose of the API, which is to say, from this being a shitty example. Why would someone use slow_add instead of the plus operator anyway?


Yeah, it seemed like a straw man example. For me, type hints are most useful as a sort of meta domain specific language. Type hints can be hard to get right for abstract data types, and it’s often not worth it (and I rarely write ADTs in Python, relying on what the rich standard library offers). I find type hints to be very useful for “annotating” domain-specific code that carries out my application’s “business” logic. This rarely includes things like basic arithmetic functions.



