Static Duck Typing in Python with Protocols (daan.fyi)
69 points by EntICOnc on Nov 23, 2021 | 47 comments



Basic as it is, it is worth reminding folks reading this and similar articles that Python, even with type annotations, does not have a compile pass and that IDEs like PyCharm tend to fail open when they can't figure out how to type check something.

I see a lot of folks coming to Python from TypeScript and assuming that there's any enforcement of type annotations a la tsc. There's not, not unless you run a tool like MyPy on your code and configure it to be strict enough to provide the type assurance you need (what level of strictness is appropriate depends on the project and the programmers).
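For a minimal illustration (hypothetical function): annotations are simply ignored at runtime, and only a checker like MyPy or Pyright flags the mismatch:

    def double(x: int) -> int:
        return x * 2

    # Runs fine and prints "abab"; a type checker reports something like:
    # error: Argument 1 to "double" has incompatible type "str"; expected "int"
    print(double("ab"))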

I also see folks getting lulled into a false sense of security by their IDE. PyCharm is great, and it highlights many typing errors in code as you write them. But it gives up very easily in the presence of any indirection at all (many basic decorators which don't change function signatures, for example, are enough to cause it to stop trying to type check decorated functions at all). And when it gives up, there's no indication of that; giving up looks the same as error-free code.

That's not an indictment of any tools. Python is the way it is for good reason, and its type annotation system is good and getting better all the time (e.g. Protocols as discussed in this article). PyCharm is also super helpful, and given how it performs its type checking, it's borderline miraculous that it works as well as it does in the first place.

Rather, this is a reminder, especially for new python programmers: don't assume that type annotations give you anything (except documentation, which it's on you to keep accurate) for free.


People talk a lot about MyPy, but I must recommend Pyright as a better alternative. It's much more robust.

In my experience the main problem with MyPy is that when it's unsure about what's going on (especially due to imports it doesn't understand, sometimes because of misconfiguration) it just gives up on type checking with absolutely no warning. Pyright does not have this failure mode and handles Unknown types properly (with configurable strictness).


Have you compared pytype? I'd like to know if Pyright is worth switching to.


PyCharm has some longstanding issues with type checking aliased and union types. My fix is coming in 2021.3.1 [0].

[0] https://github.com/JetBrains/intellij-community/pull/1655


I face these issues daily. PyCharm's type checking is sufficient for my use case, but the TypeVar and Union type checks are a bother and I always pay close attention to them. Really appreciate the fix(es)!


I just got bit tonight by Python's duck typing. I was trying to use PIL to draw rectangles for a super simple script to visualize results from another component. I found an example and tweaked accordingly:

    w, h = 220, 190
    shape = [(40, 40), (w - 10, h - 10)]
    # ...
    img.rectangle(shape)
When my results came out totally wrong, I spent 45 minutes trying to figure out if my code that built the shapes was off or if my drawing was off. What I had assumed were rectangle dimensions in the form of `(width, height)` were actually bounding box coordinates `(x2, y2)`. Both of those are tuples of ints, but semantically, they're very different.

I'm growing increasingly frustrated with Python, and in my frustration, I feel like this nonsense could be wholly avoided by stronger typing. Were I to use Rust or C#, two languages with which I have much more experience, I have to imagine the library would prevent this loss of productivity. Simply define a Point struct and it's perfectly clear what is being passed in.

Edit: I shouldn't have said duck typing but rather the lack of strong/compile checks. Apologies for my sloppy thinking.


> Edit: I shouldn't have said duck typing but rather the lack of strong/compile checks

Every compiled language I can think of would also be totally happy for you to pass (int width, int height) to a function designed to handle (int x2, int y2) - what language are you thinking of that would make this a compile error?

Some languages do allow you to create your own custom types which mean you can spot the difference between two types of int and avoid mixing them up… but python is one of those languages.


It's a library design thing rather than a language design thing. You could make a separate class Size and Point that both take an (int, int) constructor, and even have BoundingBox whose constructor takes (Point, Point), and the function could take a BoundingBox instead of a list of tuples. Of course, you can do that in Python just as well.
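A rough sketch of that design in Python (the class names here are made up):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Point:
        x: int
        y: int

    @dataclass(frozen=True)
    class Size:
        width: int
        height: int

    @dataclass(frozen=True)
    class BoundingBox:
        top_left: Point
        bottom_right: Point

    # A drawing function taking a BoundingBox can no longer be handed
    # a (width, height) pair by accident.
    def rectangle(box: BoundingBox) -> None: ...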


Sounds like you want 'newtypes', e.g.

    from typing import NewType, Tuple

    XCoord = NewType("XCoord", int)
    YCoord = NewType("YCoord", int)
    Point = NewType("Point", Tuple[XCoord, YCoord])
These are just new names for existing types (in this case the existing types `int` and `Tuple[int, int]`); they make no difference to the way the code actually runs (unlike, say, introducing a new class).

However, checkers like Mypy will not let us mix up these newtypes with their underlying types. For example, if a function requires a `Point`, Mypy will complain if we try to call it with a `Tuple[int, int]`.
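For instance (with a hypothetical `draw` function):

    def draw(p: Point) -> None: ...

    draw(Point((XCoord(40), YCoord(40))))  # OK
    draw((40, 40))  # error: incompatible type "Tuple[int, int]"; expected "Point"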

Newtypes are useful for keeping semantically-distinct types separate, even if they just-so-happen to use the same underlying implementation. In other words, they let us selectively avoid duck-typing if it might cause problems.

Of course, this is only useful if the code we're calling actually exposes such types!


There's nothing stopping you doing that in Python. If it's important to you to define a point struct, do that, add a type annotation, and add mypy to your build.

When I used to work with C++ I would spend hours debugging compile errors that were just the wrong type (templates are especially bad!) and that really had no impact on the code; it was just a waste of my time.

What I like about python is that the static typing is there if you want it, but you don't have to.


I have several problems with this:

* I have had the installation of mypy fail on multiple machines and crash (‽) several machines

* mypy is a separate component; yes, it's very well supported, but it's not included with python3 by default

* It's not required by packages to function, which means that packages can and do simply neglect to add typings. The package I used in my example doesn't provide any such typing information, so me using mypy wouldn't help

* Besides mypy, the package in question lacks appropriate documentation (e.g., the mouseover of the function I was calling provided nothing in the IDE to give a hint). The method signature defines the shape parameter as being named "xy". Only when I went to the website did I find that it's supposed to be either List[int, int, int, int] or List[Tuple[int, int], Tuple[int, int]] and that the second pair of values is a point and not dimensions. Were I doing this in another language, the type annotation would likely provide some details.

This is my problem with Python -- yes, you can use typing hints if you want, but nowhere is it required. This leads to a world in which I spend 45 minutes painfully debugging this kind of thing only after my code is written. In my experience, this type of bug very rarely exists in stricter, compiled languages.


To my knowledge Python doesn't have any sound type system. Mypy doesn't provide that.

So, no, you can't have (currently) proper static typing in Python. It's more like in TypeScript, where it's a best-effort thing. (At least you get notified when the typer gives up. That's much better than nothing, of course!)

C++ templates aren't a good example either. Templates are a Turing-complete language, so they can't provide guarantees in all cases and, as such, aren't type safe at all.

Better examples of type systems and their usefulness would be languages from the ML family (also including Haskell and similar in this case).


>> Python doesn't have any sound type system

Neither do popular statically typed languages like Scala, Java, C#, … Sound in the context of PL theory means the type checker will only accept a correctly typed program, which is not true for any of the languages I mentioned: you can get the compiler to accept an incorrectly typed program.

>> you can't have (currently) proper static typing in Python

That puts an even bigger group of languages in that bucket: type checking is undecidable in “favourite” languages like Rust, OCaml, C++, F#.


Scala's base calculus is proven sound!

The implemented language as such has minor soundness holes, correct. But you need to resort to "murky" parts (like combinations of null, top and bottom types and sub-type relationships between them) to exploit this.

In practice I've never run into any of these soundness holes. You need to explicitly construct a pathological case. (Just avoid null and you should be safe in any case, afaik.)

This can't be said about things like MyPy or TS. There you run into soundness holes on a daily basis.

OCaml and F# are undecidable? Also because of sub-typing?

And what's the problem with Rust? It doesn't have sub-typing.

The problems with sub-typing in general could go away in the future when algebraic sub-typing catches on.


I don't think this example has anything to do with "stronger typing". A Tuple<Tuple<int, int>, Tuple<int, int>> would have passed the type check just fine the same way, without revealing your error.


> I don't think this example has anything to do with "stronger typing".

I disagree. There are definitely languages where you could easily differentiate between a "length-int" and a "coordinate-int" in the type system, while still representing it as only an int in memory at runtime.

I believe both Haskell and Ada can do this, for instance.


> There are definitely languages where you could easily differentiate between a "length-int" and a "coordinate-int" in the type system

Python, via the shared portion of the type system enforced by the major type checkers, is one of those languages.

The fact that a particular library doesn't choose to do this doesn't mean the type system doesn't fully support it.


If you have a Rect type, it'll likely take four arguments, and you can still make the same mistake of confusing the Rect(x, y, w, h) signature with Rect(x1, y1, x2, y2). This of course could not happen if Rect were defined as Rect(Point origin, Size size), but that might get a tad annoying for callers.
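One cheap mitigation is keyword-only parameters, which at least force callers to state which convention they mean (a sketch; the names are made up):

    class Rect:
        def __init__(self, *, x: int, y: int, width: int, height: int) -> None:
            self.x, self.y = x, y
            self.width, self.height = width, height

    # Rect(40, 40, 170, 140) raises a TypeError; callers must name the fields:
    r = Rect(x=40, y=40, width=170, height=140)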


Sure that is annoying, but I'm not sure it has anything to do with duck typing?


You don’t need strong typing so much as good naming to avoid these issues. While a type system kind of requires everything to be named, it really doesn’t matter to the type checker whether those names are semantically correct or have any semantic meaning at all.


> Basic as it is, it is worth reminding folks reading this and similar articles that Python, even with type annotations, does not have a compile pass

This is literally false: Python does have a compilation phase, and with its default settings it saves the compiled results for reuse.

You probably mean a static analysis phase, and that's...up to your workflow. If you are using mypy, pyright/pylance, or another typechecker, it does have a static type analysis phase; that's what those are. And if you are getting type feedback in an IDE, that's probably what's happening.

> PyCharm is great, and it highlights many typing errors in code as you write them. But it gives up very easily in the presence of any indirection at all (many basic decorators which don't change function signatures, for example, are enough to cause it to stop trying to type check decorated functions at all)

Most of Python's major typecheckers (not sure about PyCharm’s custom one, definitely Pyright/Pylance and mypy) have configurable levels of strictness. They can definitely be configured not to give up easily.

But a decorator is a function, and if it is untyped rather than expressing how it transforms the wrapped function signature (even if it does not, in fact, transform the function signature), then it will probably have an Any type (none of the typecheckers are great at inference) and either defeat or fail typechecking, depending on strictness settings.
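For what it's worth, PEP 612's ParamSpec lets a decorator declare that it preserves the wrapped signature, which keeps checkers from degrading it to Any. A sketch:

    import functools
    from typing import Callable, TypeVar

    from typing_extensions import ParamSpec  # typing.ParamSpec on Python 3.10+

    P = ParamSpec("P")
    R = TypeVar("R")

    def logged(func: Callable[P, R]) -> Callable[P, R]:
        @functools.wraps(func)
        def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
            print(f"calling {func.__name__}")
            return func(*args, **kwargs)
        return wrapper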

> Rather, this is a reminder, especially for new python programmers: don't assume that type annotations give you anything (except documentation, which it's on you to keep accurate) for free.

Annotations alone don't give you anything (and aren't free, as you have to create them), but static analysis definitely gives you something.


You're right, my phrasing was inaccurate.

The intent was: Python does not perform type checking based on annotations (e.g. Protocols as discussed in the article) when it starts.

Unfortunately PyCharm's analyzer is pretty fundamentally oriented towards failing open, rather than closed.

For decorators, PyCharm often infers whether or not a decorator changes a signature. However, it does this inconsistently (and it may be backed by a specific list of supported decorators by name/module, I'm not sure). When it can't tell what a decorator does, it doesn't try to validate calls to decorated functions, which is kind of a bummer. What I'd really like is a way to tell the IDE that unrecognized/unparseable decorators should be assumed not to emit functions with a different signature; that would be wrong sometimes, but useful in what is for me the most common case.


I'm not an expert in either, but the difference doesn't seem dramatic to me. You can put type annotations in JS also and just use typescript in a way that's similar to MyPy. Adding types to either JS or Python seems "not free" to me. You pull in extra tools and syntax for either.


In TypeScript, there's no* way to run your code without invoking the type checker, as it has to transpile TS to JS. While you can suppress/reduce the strictness of the checker, you can't avoid it entirely, and opting out takes work.

Python is the opposite; using type annotations for any sort of verification is opt-in exclusively, and not part of the language/runtime itself.

*Yes, there are some direct TS runtimes/compilers, but they aren't how the vast majority of TS code runs.


The article mentions a type `List[Union[TemperatureMeasurement, HumidityMeasurement]]` to average over, i.e. a list of potentially mixed temperature and humidity measurements. Which I guess you could average, but the result has no physical meaning (though that's not the point of the article).

I think what is meant is Union[List[TemperatureMeasurement], List[HumidityMeasurement]], so a list of only TemperatureMeasurements or a list of only HumidityMeasurements.


Article author here. You're absolutely right of course! Thanks for pointing this out. I immediately fixed it.


You also say,

> you care only about the behaviour ... not the explicit type of that input

"duck typing" is a run-time equivalent of structural typing. It would be a little more accurate to say the value's properties (ie., the object behaviour) *is* the type.

A "protocol" isnt a means of "ignoring the real type, ie., the name we give a type". It is a means of expressing a structural type. Nominal typing, ie., a type "being some name" isn't fundamental.

Indeed, my biggest complaint with Python's static typing efforts is how ham-fistedly nominal typing has been shoved on top of Python, as if it were just "failing to be Java".


> Indeed, my biggest complaint with Python's static typing efforts is how ham-fistedly nominal typing has been shoved on top of Python, as if it were just "failing to be Java".

This is the best way I've seen my feelings on this summarized. Thanks! I find that efforts to bring nominal typing to Python really just restrict it, and it becomes not the same language anymore. If I want that kind of typing, I prefer to use a language that does it properly. Otherwise, I want a type system that works for how Python is designed.

I'm really finding these days that I have to restrict the range of types my functions are designed to work on just to satisfy a bunch of linters. It's very annoying. And imho goes against the grain of why I chose Python in the first place.


I regard `Callable[Blah..., ...., ..., [Blah, Blah....]]` as essentially a hostile gesture to the data profession.

Guido wanted to create a procedural imperative language that used runtime tricks to look simple. What he actually created was a highly (ad-hoc) polymorphic language with extraordinary power.

This is the inverse story to JavaScript: there Brendan wanted to hide the Lisp. Here, Guido accidentally created one.

Since realising this, Guido has gone about doing everything to hobble efforts to realise the underlying efficacy of the Python model -- to shove it back into the procedural box he had intended it to be in.

Given Python 3, this appears to be a remarkable contradiction. Python 2 was what Guido wanted; when Python 3 came along and "software engineers" won over "educators", the "let's keep it dumb and procedural" game was conclusively dead.

I have a lot of respect for Guido's vision, but this approach to typing has frustrated me. It feels like he's fighting a proxy war over procedural Python, using the type system to force the rest of us to concede.


What exactly is wrong with Callable though? The argument and return types can be structural types. It doesn’t really make the function less powerful if you only constrain the input to have the exact interface you need, and nothing more.


And if we are really using duck typing here, you would want Sequence instead of Union.
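i.e., something along these lines (a sketch reusing the article's protocol idea; `window` is a made-up name):

    from datetime import datetime
    from typing import Protocol, Sequence

    class MeasurementLike(Protocol):
        timestamp: datetime

    # Accepts a list, tuple, or any other sequence of timestamp-bearing objects.
    def window(measurements: Sequence[MeasurementLike]) -> None: ...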


I'm very curious to hear where people stand on Protocols vs ABCs. While I definitely see the appeal of Protocols, I _think_ I prefer ABCs. Realistically it probably depends on the situation.


Article author here. To me the big advantage of Protocols as opposed to ABCs is that you don't have to have your classes implement a special base class and can instead focus on the _consuming_ code. Using a Protocol for a function parameter works even for passing in objects of classes you don't control.
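A small sketch of that (hypothetical names): any object with a matching attribute qualifies, including instances of classes from libraries you can't modify:

    from datetime import datetime
    from typing import List, Protocol

    class HasTimestamp(Protocol):
        timestamp: datetime

    # Works for any objects carrying a `timestamp`; no common base class needed.
    def latest(items: List[HasTimestamp]) -> HasTimestamp:
        return max(items, key=lambda item: item.timestamp)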


I think it boils down to thinking in composition (x has a y) vs inheritance (x is a y). I’m happy about protocols because most of the time, coming up with some pattern or order of inheritance using ABCs really feels beside the point of what I’m trying to achieve (stating that an object _has_ a specific set of attributes).


In my case I mostly use ABCs, not because I think they're universally better but because my use case fits them: I'm dealing mostly with my own types, base classes provide some common functionality, and the number of types is limited. Protocols are great for the opposite case: you use types that are not necessarily under your control, there's no shared functionality, and there's a wider variety of types.
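For example (a sketch with made-up names), an ABC earns its keep when the base class also carries shared behaviour:

    from abc import ABC, abstractmethod

    class Measurement(ABC):
        @abstractmethod
        def read(self) -> float: ...

        # Shared functionality every subclass inherits for free.
        def describe(self) -> str:
            return f"{type(self).__name__}: {self.read():.2f}"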


ABCs lend themselves well to creating inheritance-heavy code with deep hierarchies, which tends to get messy quickly.


Question. The idea here is to introduce a type declaring what the requirements are of the input arguments, right? The article's example:

  from datetime import datetime
  from typing import Protocol

  class MeasurementLike(Protocol):
      timestamp: datetime
declares that the object needs a field called timestamp. This is a contract that ensures the following line of code can work:

  window_upper_bound = measurements[0].timestamp + window_size
The benefit being promoted by the article here, which I agree with, is that you only need to declare what the actual code requires and not name the full type of the expected input.

So, for me this raises the following: why can't this be done automatically? I can look at the code directly and see that measurements is a list with items that require a field called timestamp. Why is it necessary to write a bunch more code to tell this to a linter, when it's directly inferable from the code itself?

I can see directly that measurements must contain objects with timestamp, and that timestamp must be some object that can be added to whatever window_size is.

If it's entirely possible to get this information from the code, I don't really understand why it's necessary to burden the programmer with type hinting at all. It seems like the job for a type inference engine.

If I have to repeat a bunch of information that the compiler/(interpreter/linter..) should be able to figure out, as far as I can tell all it adds is (1) more work to do and (2) more opportunities to make mistakes. The idea of writing it all out explicitly is often celebrated as some kind of assurance of correctness; but you often figure out the types after, or in tandem, with writing the code. So it effectively is just like a "check", like those systems that make you type your password twice to make sure you didn't mistype it. Sure, it serves a purpose in the moment, but is it necessary to add maintenance burden, forever, to your code base for something that is totally redundant?


Declaring the types manually ensures that you don’t change the interface by accident. If I start using an attribute of an argument that wasn’t required before, I’ll get an error from the linter that will prevent me from breaking user code.
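For instance, with the article's MeasurementLike protocol (restated here; `first_after` is a made-up name):

    from datetime import datetime
    from typing import List, Protocol

    class MeasurementLike(Protocol):
        timestamp: datetime

    def first_after(ms: List[MeasurementLike], cutoff: datetime) -> MeasurementLike:
        # Start depending on a new attribute, e.g. ms[0].value, and the checker
        # objects: error: "MeasurementLike" has no attribute "value"
        return next(m for m in ms if m.timestamp > cutoff)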


Type checking in Python is a complete waste of time and only leads to a false sense of improvement. People use type annotations for the same reason they wear a shirt in the pool: it feels weird not to see those reassuring type names everywhere once you are used to them from some other language. Take out all the non-enforced type annotations and spend the time and effort you put into them on writing tests and docstrings instead.


One thing that drives me nuts is that enthusiasts of python type hints claim that it means you will no longer give a function the wrong type of value and thus eliminate a class of errors. This should imply that you no longer need to test for those types of problems. Since writing tests is not free, and often not even trivial, this would be a huge benefit of type hints.

But, because type hints are just window dressing in Python, you still have to write those tests. So now when you make a change, you not only have to update your tests but also update your type hints everywhere. As far as I can tell, type hints just double the work for little to no benefit. If they actually removed a need for unit testing, that would be a different story, but then it wouldn't be Python anymore.

If I want to use types to enforce things, I'd prefer to use a language designed for that. But I choose Python because it lets me get things done quickly. Despite helping you out in some cases, overall maintaining type hints are a drag on that feeling of getting things done quickly. The last thing I want to do when writing Python is spend an afternoon composing a complicated TypeVar arrangement to express the full range of inputs my function can handle.

What this article talks about (Protocol) seems to be at least an improvement in that regard, as you can worry about what your function does instead of what your types do.


> But, because type hints are just window dressing in Python,

Type annotations are not window dressing in a Python workflow using typechecking tools over the final code base, unless you explicitly tell the typechecker to ignore a detected error.

It's true that a type annotated library can't necessarily rely on downstream clients to have type discipline, and some library packages may not be fully annotated, which may create gaps to address.

> If I want to use types to enforce things, I'd prefer to use a language designed for that.

Python's type annotations were designed largely to make Python compatible with a separate existing language designed for that purpose; there's a reason mypy’s domain is mypy-lang.org

> What this article talks about (Protocol) seems to be at least an improvement in that regard, as you can worry about what your function does instead of what your types do.

Protocols are types; specifically, they are structural types.


To add to this: when we are reading/writing Python source code, there are types involved. The question becomes whether we want the types to exist only in our heads or also denoted in the source.

To me even though the type annotations don't provide the same kind of guarantees that the type system of say Haskell does, they do increase the readability of the source code.

Finally (and this is more of my workflow), I utilize a build tool that has a mypy --strict call that will check my type annotations and alert me when things don't match up.


> Take out all the non-enforced type annotations

If you are using a static analyzer before deploying, type annotations are just as enforced in Python as in Haskell; that is, they are enforced by a process that happens before and detached from run time.


I wear a shirt in the pool because otherwise I get sun poisoning. Sunscreen works, but it's so easy to miss a spot, and the consequences are bad enough that I just keep the shirt on -- the skin never gets sun, so it's the most sensitive part.


I know this sounds crazy, but hear me out, I just use statically typed languages when I require static types.


Hear me out too - I just take advantage of tools which can help me reduce bugs, especially if I am already comfortable in some language. Type hints do help a lot of people.


I've used Enthought's Traits very extensively for a form of static type checking in Python, and I can strongly recommend it.



