Structural pattern matching in Python 3.10 (benhoyt.com)
273 points by chmaynard 35 days ago | 147 comments



A while ago, PEP 591 [1] made its way into the language. It introduced a "final" decorator / type annotation to indicate to mypy that a certain class shouldn't be subclassed, or a certain method shouldn't be overridden. At the bottom, there is the comment:

>A related feature to final classes would be Scala-style sealed classes, where a class is allowed to be inherited only by classes defined in the same module. Sealed classes seem most useful in combination with pattern matching, so it does not seem to justify the complexity in our case. This could be revisited in the future.

Has this been picked up? Replacing long if-elif blocks is only half the power of pattern matching. The other is to have exhaustiveness-checking by the static type checker.

I have pieces of code that would really benefit from this, and I was looking forward to migrating it to the new pattern matching feature. But I can't find any word on whether this is being revisited. That's disappointing.

[1] https://www.python.org/dev/peps/pep-0591/


This is precisely the thing I do not want from python. Subclassing is super useful in convoluted tests where fakes are superior to mocks. I don't like being told "you can't do this" by some library's code.


From my experience with Kotlin, sealed classes aren't really used for classes you'd like to fake in tests - they're usually used to have an exhaustive list of data types with different parameters, basically enums on steroids. Kotlin has recently added support for sealed interfaces, which could potentially suffer from the issue you described, but I haven't found a compelling use case (or any usage in the wild) for sealed interfaces. If anything, I'd expect them to be used for types that explicitly shouldn't be faked by the library consumer anyway


It would be enforced by the optional static type checker, not the language runtime.


And if you want to enforce it at runtime you already can!

    class Final(type):
        def __new__(cls, name, bases, dct):
            if any(isinstance(base, Final) for base in bases):
                raise TypeError("No!")
            return type.__new__(cls, name, bases, dct)

    class NoSubclasses(metaclass=Final):
        pass

    class WillThrow(NoSubclasses):
        pass
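On Python 3.6+ there is also a lighter-weight way to get the same runtime ban without a metaclass, via `__init_subclass__` (a sketch, not from the comment above):

```python
# Same runtime ban, no metaclass: __init_subclass__ on the base class
# runs whenever a subclass is defined, so it can simply refuse.
class NoSubclasses:
    def __init_subclass__(cls, **kwargs):
        raise TypeError(f"{cls.__name__}: subclassing NoSubclasses is forbidden")
```

Defining `class WillThrow(NoSubclasses)` raises the TypeError at class-creation time, just like the metaclass version.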


If it's enforced at the type level, then don't check types during your test runs? shrugs


You can already have exhaustiveness-checking by the type checker, right now, today.

    from enum import Enum
    from typing import NoReturn


    class Stuff(Enum):
        Cat = 1
        Dog = 2
        Bear = 3


    def exhaustiveness_check(value: NoReturn) -> NoReturn:
        raise AssertionError()


    def do_thing(stuff: Stuff) -> None:
        if stuff is Stuff.Cat:
            print("meow")
        elif stuff is Stuff.Dog:
            print("woof")
        else:
            exhaustiveness_check(stuff)

As written, the type checker will flag the call (Bear is unhandled, so `stuff` isn't narrowed to Never in the else branch); add an elif for Bear and it will validate!


Okay but what if I need something more than an enum? Like if the stuff parameter is of type "BaseThing", and there are only meant to be five subclasses of BaseThing to check. I would want the type checker to verify that I dealt with every case. But without sealed classes, it can't determine that some other code that subclasses from BaseThing is wrong.


Absolutely! And since type checkers will likely never be able to implement support for __subclasscheck__ or __instancecheck__ the best we can ever do is annotations about the desired results of those methods. My only statement was that match isn't magic and you can do exhaustiveness checking (inasmuch as type checkers can detect it) without it.
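One workaround available today is to spell the closed set as a Union instead of a base class; a sketch (Cat, Dog, Thing, assert_never are made-up names for illustration):

```python
from dataclasses import dataclass
from typing import NoReturn, Union


@dataclass
class Cat:
    name: str


@dataclass
class Dog:
    name: str


# The closed set lives in the Union, not in a base class, so the type
# checker knows these are the only alternatives and nobody can widen it
# by subclassing from another module.
Thing = Union[Cat, Dog]


def assert_never(value: NoReturn) -> NoReturn:
    raise AssertionError(f"unhandled: {value!r}")


def speak(thing: Thing) -> str:
    if isinstance(thing, Cat):
        return "meow"
    elif isinstance(thing, Dog):
        return "woof"
    else:
        assert_never(thing)
```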


Some more discussion in this thread

https://mail.python.org/archives/list/typing-sig@python.org/...

I have a fork of the "adt" library that supports sealed classes. For static type checking, I'm using a fork of typpete, an SMT-based type checker.

It's not as user friendly as mypy or pyright, but in the longer term can be a winner because it supports user defined type constraints (aka refinement types).


Once my favorite language by far, Python is starting to become too cluttered. Very few of the additions to the language in the 3.x branch are worth it IMO.

I remember that in the early 00's people called it "executable pseudocode". I bet nobody would use that expression now for modern Python, with decorators, comprehensions, walruses and the like.


I hear that quite often, but up to now, I haven't seen it translate in the field.

Code examples in docs and tutorials are still easy, library APIs are still clean, code bases are still readable.

Optional things indeed stayed optional.

My gripe is actually the contrary: many Python code bases are stuck in the past. But I'm ok with it; people are not all pro devs, they need a productive language subset.

And the new syntax is providing some fantastic new tools, such as pydantic/fastapi/typer (if you haven't tried it, it's really a paradigm shift, and I say that as a django fan). f-strings are awesome. dataclasses are nothing but convenient and readable. Advanced unpacking as well. I seldom use the walrus, almost nobody does, but when one does, it's useful.

> decorators, comprehensions,

Those existed in python 2, 10 years ago. It's hardly new python. And web frameworks would be terrible without those.


I have seen someone who could program, but was a non-programmer, try to learn Python for testing purposes, using a testing framework that did all sorts of magic with decorators, magically deciding what to call your function with based on what parameter names you gave your callbacks, etc. It was awful.

That was when I officially stopped recommending Python as a new language. It's just got too many things to explain now. And too many of those things to explain are like decorators, which requires two explanations, one to explain what it looks like they are doing when you use a decorator prepared by someone else, and a second explanation about what they are actually doing if someone ever wants to write one of their own.
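The two explanations in question, side by side (a minimal sketch; shout/greet are made-up names):

```python
import functools


# A decorator is just a function that takes a function and returns a
# replacement for it.
def shout(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


# Explanation one, the user's view: "put @shout on a function and its
# result comes back upper-cased."
@shout
def greet(name):
    return f"hello, {name}"


# Explanation two, what is actually happening: the @ line is sugar for
# an ordinary reassignment.
def greet2(name):
    return f"hello, {name}"

greet2 = shout(greet2)
```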

It looks like "x.y" resolves the "y" value out of the instance, but actually, it's this complicated thing so that properties and other things work. It looks like the for loop simply ranges over the list, but it's actually this thing with iterators. It looks like calling this function "yields" a list but it's actually this magical thing where using a "yield" in the function body completely and non-locally changes the entire function. It looks like....

When Python was new and the list of this "it looks like..." was short, it was kinda neat to be able to override __getitem__ for certain useful effects and such, but stacking on more and more of these with every release has created a real monster of comprehension.

Now it feels like Python has raised "it looks like... but actually..." to a design principle.

It looks like the match operator simply takes the argument you give it and sees if it is the same "shape" as the case statements, but it actually consults a new magic class variable and has this logic and that logic and....


> using a testing framework that did all sorts of magic with decorators

This seems a little unfair. Testing frameworks are notorious for pushing the envelope of language constructs and weird introspection tricks. And if in this case you picked an exceptionally ripe example I'm not sure that should reflect on the entire language.

Back in the day people used to use the baroque nature of some frameworks such as Zope to bash Python but Zope was an outlier. There will always be some gorgonzola in any ecosystem.


> And if in this case you picked an exceptionally ripe example I'm not sure that should reflect on the entire language.

I'm not sure i agree entirely: regardless of whether we're talking about decorators/annotations in Python/Java/.NET/... in the context of testing/web/... frameworks, any and every of those combinations have only caused me pain and headaches.

While i do agree in principle that it's nice to abstract away code with something that annotates a method and gives it additional meaning, in practice this sort of indirection usually has something to do with dynamic code calls, reflection or whatever the idiom is called in any particular language. All of those feel harder to debug and more brittle than just writing 3 lines of code instead of 1 annotation/decorator.

The problem with that is that it's usually:

  - hard to make your own decorators/annotations (especially if you want to extend their functionality)
  - hard to introduce state, dependencies and just get them working nicely with the rest of your codebase and IoC/testing frameworks, or pass data between the logic behind them (from one decorator/annotation to another)
  - hard to use them, as opposed to your "regular mode" of writing code, IDEs won't offer many refactoring options
  - hard to debug them, especially in the case of wanting to examine their internals and use breakpoints

To me, it just generally seems like someone is trying to tack on functionality in ways that are hard to utilize, as opposed to just writing code in the first place, the very same thing that you see happen with generics, deeply nested inheritance and other "tricks" which are supposed to make one's life easier but do the opposite in practice in many cases.

Personally, i enjoy libraries like Typer (https://typer.tiangolo.com/) for Python and think that it has plenty of use cases, but i've been hurt far too many times by similar functionality in non-trivial cases in every single language that i've used that supports something like that, so at best i'm guarded about utilizing them.

Java rant: But perhaps that's just because Java is a major source of pain for me in that regard. To give you a concrete example: i'm migrating Spring to Spring boot and someone used Jersey instead of RESTEasy as their JAX-RS implementation and now i need to transpose hundreds of API endpoint definitions (for example @GET to @GetMapping with bunches of parameters). If it were just Java code, i could probably use some clever refactoring in it with the help of my IDE, but now i'd have to figure out where the annotations come from, create a stub to replace them, call the proper stuff from within them and hope it works, unless the reflection that's used by them breaks. Not only that, but debugging is kind of hard when the source code and the actual logic behind said annotation is hidden below dozens of layers of Eldritch indirection and you have no chances of feasibly finding all of that stuff out.


What was the name of the framework?


> My gripe is actually the contrary, many python code bases are stuck in the past.

From the outside looking in, the Python community seems aggressively opposed to reworking their code to adapt to new language features. The 2 to 3 debacle comes to mind when compared to other languages, like JS, which made a much bigger change and everyone jumped on it and rewrote their code to adapt, or Ruby, which went through some backwards-incompatible changes at the same time as Python.


I don't think there has been a non-backwards compatible JS change in a very long time. They even named the array flatten method as 'flat', just to avoid having a conflict with MooTools!


My point with JS is more that the JS community sees no problem and will actively rewrite vast swaths of code to use features that are not yet supported in browsers. Hence the extensive use of polyfills and transpilers. Major language changes are generally seen as a good thing (which they mostly are in JS' case).

With Python I don't get the same feeling, they seem more resistant to change in general. I don't think it's just that Python has more so-called "non-programmers" (meaning, data science and math people). Most of the libraries for these things are made by full-time programmers anyway. Perhaps the idea that "there's only one way to do it" brought about a "the current way is good enough, why rewrite in a new one?" attitude.

But anyway, I'm not familiar enough, just an outside perspective.


List comprehensions date from Python 2.0 (October 2000), decorators from 2.4 (November 2004) -- so both of those are from when people called it "executable pseudocode", right? At the least there's a big discontinuity before the walrus operator appears (3.8, October 2019).

I was once an avid Python user, but I switched to mostly Rust a few years ago and now write very little Python. A while ago I wrote some not quite trivial script-like code in Python, but ended up converting it to Rust before it finished running. The attraction of Python used to be that it was simple, but I now value robustness (through the type system in particular) much more than simplicity of the language.


Well. I have a book that uses a minimal subset of Python as the illustrative language, and oh boy would the examples have been vastly improved if the author had gone the extra yard and added comprehensions into the mix rather than ploddingly building his transformed lists in loops like a Neanderthal.

Decorators though I totally agree; had some tooling at work that was gussied up to the point of impenetrability by gratuitous use of decorators. A colleague got so fed up with it he rewrote it in Perl, which was, shockingly, actually a maintainability improvement.


Decorators become pretty hard to reason about as soon as you have to apply more than one of them to the same method. So I think there are cases for them where the ergonomics are fantastic, but exercising restraint is critical— and it's restraint on the part of API/library authors that is needed more than on the part of users, since a lot of the time, it's actually out of users' hands.


This. Nothing wrong with decorators per se. Just one typical example: measuring a couple of functions' execution time by just decorating them definitely fulfils what the OP wants: simple and readable. Obviously there is more than one way to measure execution time, but the 'there should be one way' mantra really only holds true for very basic things anyway imo. And yes, just like any other pattern out there, it can be abused. One could try and blame the language for allowing such abuse, but that is not always fair I think, and also not in this case. Bad code is written all the time all over the place, and if a language tried to really, really fix that, it would probably be way too restrictive and hinder development in other ways.

Same story for list comprehensions: it's almost beyond me how one can be against a simple single list comprehension. It's simple, readable, almost beautiful. Now if people start nesting 5 comprehensions it's not so nice anymore; but then is the language to blame (as in, would we really want to forbid nested comprehensions?), or the people?


I really love the idea of decorators. However, after having used them in some rather large projects, I have to agree that it contributes to “magic” happening elsewhere in your code that can complicate debugging quite a bit.


a new simple language is created (Python) -> beginners start using it -> they become proficient in it, some build successful businesses around the language -> limits due to the simplicity are hit -> people lobby for advanced features (lambdas, match, etc) which make code much easier for the subset who understand it -> other people complain that they can't understand the code anymore -> a new simple language is created (Go) -> ... -> people lobby for advanced features (generics) -> ...


Go's over ten years in, though, and the first really significant overhaul is still about six months out. It breaks that cycle.

More on topic, I do sort of feel like maybe the Python team should have made a Newthon (with a better name) or something, because Python is becoming a shambling beast. Starting as one of the world's most dynamic OO languages and then shambling towards being a static language, maybe a bit of a functional language, it's crazy.

I often joke about "Katamari Dama-C++!", which has the irresistible phonetic allusion, but damn if Python isn't trying to keep up at this point.

If you don't like Python for some task... use something else suited for your task, then. It's not like using something else means you must put Python down and never use it again on pain of being shot or something. Stop waiting for Python to mutate into O'Caml or something and just use O'Caml or whatever.


I googled "Katamari Dama-C++" to see what it was all about and found your post from 2014 [1]. That is impressive commitment.

https://news.ycombinator.com/item?id=7315163


Is it not all optional? You don’t have to use it if it doesn't improve your code


It is. But established idioms become obsolete, and the code that used them becomes unidiomatic. And you have to learn them if you want to read other people's code.

Python doesn't need more features but better implementations, IMO.

Edit: Just remembered the Zen of Python, I don't think modern Python is following it.

...

Simple is better than complex.

...

Readability counts.

...

There should be one-- and preferably only one --obvious way to do it.


And yet i would argue that comprehensions and decorators are more readable and simple than wrapped functions and for-loops that append to a list.

Having one way of doing something and sticking with it even when better ways emerge is nonsensical imo.

f-strings are more readable (for me) and concise than string formatting or .format()

I do agree on the implementation front though, many of the other implementations are still on 2.7 or 3.6 at most, and don't have mainstream appeal. Made more with interop in mind (like ironpython with .net and jython with the jvm)


> And yet i would argue that comprehensions and decorators are more readable and simple than wrapped functions and for-loops that append to a list.

Not for beginners though. One of Python's main strengths was that it looked simple. Not anymore.


Decorators, I'll grant. But I remember loving comprehensions as a beginner, because they resembled the set-builder notation I was already familiar with from mathematics. I actually found it harder to remember how to construct lists the "manual" way: do I use `append`, or `extend`, or `+=`?
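The comparison in question (a trivial sketch):

```python
# Reads like the set-builder notation {x^2 : x in 0..9, x odd}:
squares_of_odds = [x * x for x in range(10) if x % 2 == 1]

# The "manual" construction, where you have to remember append
# (vs. extend, vs. +=):
manual = []
for x in range(10):
    if x % 2 == 1:
        manual.append(x * x)

assert squares_of_odds == manual == [1, 9, 25, 49, 81]
```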


But we're way past decorators and comprehensions. I don't understand why people focus on these 2 features to fuel the debate here.

Point is Python used to be a complete yet simple language with few constructs. It more or less followed the path of least surprise and people revered it for that. For several years, it feels like PEPs just add more and more clutter to that foundation.

It sort of looks like an arms race and I just don't get why or what's going on. Certainly, Python doesn't feel the same anymore.


Python has turned into the C++ of dynamic languages, but without the speed. My feeling is that Python is suffering the same fate as Pascal, where a "teaching and prototyping language" is used for things it was not intended for. Programmers that grew up with Python want more features, not thinking about the next generation of programmers.


An example loop:

    elif cmd.op == 'TUPLE':
        arity, = cmd.args
        tuple_args = []
        for _ in range(arity):
            tuple_args.append(self.data_stack.pop())
        result = '{' + ', '.join(map(str, tuple_args)) + '}'
        self.data_stack.append(result)

Is the following use of a comprehension any better? I don't think so:

    elif cmd.op == 'TUPLE':
        arity, = cmd.args
        tuple_args = [str(arg) for arg in reversed(self.data_stack[-arity:])]
        self.data_stack = self.data_stack[:-arity]
        result = '{' + ', '.join(map(str, tuple_args)) + '}'
        self.data_stack.append(result)

Any other suggestions for improvement?


    elif cmd.op == 'TUPLE':
        arity, = cmd.args
        stack = self.data_stack

        text = ', '.join(str(stack.pop())
                         for _ in range(arity))
        
        stack.append(f'{{{text}}}')


My personal rule is to use comprehensions when they improve readability and to avoid them when they don't.

There's usually a threshold where I go "nope" and unroll it into a good old loop.

But when you're in their sweet spot, they definitely improve readability.


It still ends up in other people's code, so it's hard to meaningfully contribute to python projects without learning at least the basics for all these new features.


I was going to say the same, but if you start to read others' code, you will need to deal with it.

But I personally like it when I read other people's code and see something which I don't understand, which I've never seen, and then it turns out to be a feature which actually makes coding easier and more efficient. Then I think about all the places where I could have used that feature in my old code.


Part of me wants to disagree since some of these features make my code shorter, but I have to agree since they commit a sin of language design that I dislike - implicit magic. Decorators are useful because they can help you shrink code by letting the decorator generate boilerplate for you. The code gets smaller, but now it’s harder to know what’s going on since you need to know what magic happened behind the scenes due to the decorator. It feels very much like issues I ran into when I used C++ meta programming libraries - my code shrank, and at the same time my understanding of what it actually did also shrank. Same with the walruses - my code gets smaller because now there’s some implicit stuff happening.

Comprehensions are a little less magical - if anything, they are more explicit. If I want to create a list where each element is generated by some function over another list, I just say it. Doing so with loops is obscuring what I wanted to say in the first place. The problem with comprehensions isn’t so much the comprehension, but the obtuse ways people can use them to eke out performance by avoiding explicit loops.

I’m all for things that are closer to what a programmer means, but less keen on features that entail obscuring details that may come back to haunt the programmer later (I see this most often with decorators).


I don't really agree that it's so magical, especially the walrus operator. Sure, a beginner may be confused by it, but it's really easy once you learn the difference between `=` and `:=` IMO. Python is not a language designed to be only used by beginners. Decorators may be a little more "magic", but it makes little difference in practice if you do @bar or foo = bar(foo), the former is just a cleaner syntax (and stops linters wanting you to put two blank lines between the function and the "decorator").
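The difference in one sketch: `=` is a statement, `:=` is an expression you can embed.

```python
data = [1, 4, 9, 16]

# Plain assignment is a statement, so you bind first, then test.
total = sum(data)
if total > 10:
    print(f"big total: {total}")

# The walrus is an expression, so binding and testing fit in one line.
if (big := sum(data)) > 10:
    print(f"big total: {big}")
```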


Oh c'mon with the walrus. Assignment being an expression is the case in so many languages. They should have made it the case in Python and just be done with it.

"But someone will make a typo one day resulting in = instead of ==" argument is nonsense as that hasn't been an issue since forever in other languages. You can either require parentheses if you want to use the value (the way C compilers want it) or just make it := to begin with.


The biggest upset is that they didn't go the extra bit and implement match assignments. Such an insanely powerful, very functional feature. Helps reduce bugs simply by facilitating orthogonal branches of assignment that cannot interact, unlike if/else or for loops.

Oh well, maybe in a few years someone will put in a PEP to use walrus := with match/case to give match assignment and give the old Python guard something to _really_ clutch pearls about.


Do you mean like this?

  x = match expr:
      case 'a':
          1
      case 'b':
          2
      case _:
          0

I think mainly it doesn't work very well in Python due to indentation problems. Kind of like how lambda is restricted to single expressions, instead of blocks. Here's what they say in the rationale PEP (https://www.python.org/dev/peps/pep-0635/):

> Statement vs. Expression. Some suggestions centered around the idea of making match an expression rather than a statement. However, this would fit poorly with Python's statement-oriented nature and lead to unusually long and complex expressions and the need to invent new syntactic constructs or break well established syntactic rules. An obvious consequence of match as an expression would be that case clauses could no longer have arbitrary blocks of code attached, but only a single expression. Overall, the strong limitations could in no way offset the slight simplification in some special use cases.


I think they're right that it would be too foreign to python to be a good addition, but this really just goes to show that expression oriented languages, as opposed to statement oriented ones, are the way to go.


The debugging and instrumentation in Python is line and statement oriented, so there’d be a lot more work behind the scenes.


That would be at odds with the language's style, which separates statements from expressions.

You can dislike this language characteristic, but it's the right call to stay congruent.


I agree it's the right call, but boy would it be nice.


So my dream is that eventually mypy (the optional python static typing module) will be extended so that python becomes a gradually typed language — able to detect if an entire program has valid static typing, and if so, use a much higher-performance python compiler to skip over the python object system to increase code speed by 10x.

This would make python both a viable choice for rapid prototyping, as well as easy to convert to much higher performance code (just by handling some typing conflicts), and potentially making python a mega-language used even more pervasively.


If you stick to this subset of python

https://github.com/adsharma/py2many/blob/main/doc/langspec.m...

The tool can convert your code to C++/Rust/Go and give you a native binary which, as you observe, will run faster without the Python runtime.

There are many open tasks that could use help.


How does this compare to Cython?


Cython is a compiler: it converts Python source code into C, which is then compiled into a more performant extension module.

Py2many is a transpiler, which converts source code of one language to source code of another language (which then needs to be compiled).

So far, Py2many supports Rust and C++14, with preliminary support for Julia, Kotlin, Nim, Go and Dart.


I realize that many people, like yourself, make a distinction between compilers and transpilers.

There are also many, like me, who find it to be a distinction without a difference. We call them all compilers.

One example is the TypeScript team:

> Let’s get acquainted with our new friend tsc, the TypeScript compiler.

https://www.typescriptlang.org/docs/handbook/2/basic-types.h...

Another is Bjarne Stroustrup. Back in the glory days of BIX, I posted a comment in one of the forums that said, "C++? That's a preprocessor, isn't it?" (At the time, the only C++ implementation was Cfront, which translated C++ to C.) Bjarne replied in no uncertain terms that Cfront was a compiler.

Longer version of the story with some discussion of other words and phrases:

https://news.ycombinator.com/item?id=15154994

Of course, the word "transpiler" had not yet been coined, but I have a feeling Bjarne still prefers to call Cfront a compiler, as the Wikipedia page does:

> Cfront was the original compiler for C++ (then known as "C with Classes") from around 1983, which converted C++ to C; developed by Bjarne Stroustrup at AT&T Bell Labs.

https://en.wikipedia.org/wiki/Cfront

So to my mind, a compiler doesn't have to generate a "compiled executable". It also doesn't have to take "source code" as input.

Most implementations of JavaScript and Java have at least two compilers: one that translates source code into bytecode, and a Just In Time Compiler that never sees the source code, but analyzes and profiles the bytecode while it runs, and translates sections of the code into machine language as it finds hotspots.

Now if you like the word "transpiler", as many do, I won't try to convince you otherwise. I just wanted to explain why some don't find it a useful distinction and prefer "compiler" for all these cases. It's a program that translates computer code from one language to another, whatever the form of those languages may be.

I suppose by my own argument I should also call an assembler a compiler! But no one does that... :-)


>Back in the mid-80s, [...] Of course, the word "transpiler" had not yet been coined,

Friend, fyi... the word "transpiler", making the distinction of converting the source to another higher-level language rather than to lower-level machine language, appeared as early as the 1960s. In the 1964 paper, see the last page, 2nd-to-last paragraph:

http://comjnl.oxfordjournals.org/content/7/1/28.full.pdf+htm...

>There are also many, like me, who find it to be a distinction without a difference. We call them all compilers. [...] Now if you like the word "transpiler", as many do, I won't try to convince you otherwise. I just wanted to explain why some don't find it a useful distinction and prefer "compiler" for all these cases.

I understand your point of why "compilers" already encompasses transpilers so the word "transpilers" seems redundant. But I'll try to explain why "transpilers" still endures. If you're not familiar with concept of "lumpers vs splitters", take a look at: https://en.wikipedia.org/wiki/Lumpers_and_splitters

Imagine if we did not have the word "transpiler". Language usage would still evolve to make a distinction via extra prefixes or suffixes or extra adjectives. Examples of alternative history might be:

- "compiler-to-asm" or CTA acronym, or "compiler-to-executable", or "native compiler"

- "compiler-to-another-high-level-language" or CTAHLA acronym, or "programming language translator"

But humans don't like to say or write verbose multi-syllable terminology and they'd eventually substitute a simpler word as shorthand for the distinction. That shorthand word might be "transpilers" or another word to serve the same purpose. And that's what happened in 1964 when AF Parker-Rhodes suggested the word "transpiler".

Similar concept of labeling Linux ext4, Microsoft NTFS, Apple HFS as "file systems" instead of just "databases" -- even though they are all conceptually "databases" which have same computer science concepts of keys, blocks, btree indexes. If we tried to police the language and insist that "file system" adds no useful distinction because it's a "database"... human language usage would still evolve with extra adjectives such as "operating-system-built-in-database-for-documents-and-arbitrary-files-etc" ... which is a cumbersome mouthful which motivates a shorthand terminology... perhaps call it "file system".


Oh, now that is interesting! Thank you for the very cool history and linguistics lesson, and I stand corrected on when the word "transpiler" was coined. :-)


No, Cython is a transpiler as well: it converts the code into C, which you can then compile.

If you use additional type markings it can generate more performant C code.

BTW: I remember a blog article of someone who was using Cython as the primary language instead of Python.


The typical C compiler is a transpiler as well. It converts C code to assembly, which is then converted by the assembler (another transpiler) into machine code.

As the sibling comment notes, transpiler is not a very useful term.


>assembler (another transpiler) to machine code.

I thought an assembler was a "compiler", where a "compiler" is a special type of transpiler that is only capable of producing machine code.


You can also think of Cython as a .pyx -> .c transpiler, but it tries to build a python C extension primarily.

Py2many tries to primarily build a standalone C program without the python runtime, although it supports a --extension flag, where it generates a PyO3 extension for rust.


Cython requires you to express types in a special way (cdef int in a .pyx file). Py2many uses standard Python 3 types plus a few derived from ctypes, with Rust-compatible names.


I don't think typing is the main obstacle to Python performance, otherwise it wouldn't be that much slower than JavaScript. In fact, there are several alternative interpreters that are much faster, but they entail incompatibilities with popular packages. Static typing cannot fix this.

Now, my dream would be for Python to actually enforce type annotations at runtime when they exist. Maybe add some kind of syntax to avoid doing it when it's expensive, e.g. `list[nocheck int]`, but for the love of god, enforce it, otherwise I feel like I'm just peppering my code with limp suggestions that only stand if some third party package can find all paths that lead to them, at which point I'd rather use a true statically typed language where the whole system doesn't look and smell like a janky afterthought.
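Out of curiosity, here's roughly what opt-in enforcement could look like today as a decorator -- a minimal sketch (the `enforce` name and everything inside it is made up), handling only plain classes, not generics like `list[int]`:

```python
import functools
import inspect
import typing

def enforce(func):
    """Check annotated arguments against their declared types at call time.

    Minimal sketch: only handles plain classes, not generics like list[int].
    """
    hints = typing.get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only check simple class annotations; skip anything fancier.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)
    return wrapper

@enforce
def greet(name: str, times: int) -> str:
    return " ".join(["hello", name] * times)
```

Of course this is exactly the per-call overhead people complain about -- which is why it would need to live in the interpreter (with a `nocheck` escape hatch) rather than in a decorator.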


I was just looking at runtime type checking. Apparently the reason nobody does it is because of the roughly 30,000% slowdown compared to not doing it.

I've said it before and I’ll say it again: runtime type checking saves you from the case where you pass None instead of an actual object but not much else. How wrong would your program have to be to pass an instance of Animal instead of an instance of BankAccount? And how quickly would it error out with “giraffe does not have an attribute routing_number”?


You can’t compare to JavaScript as most JavaScript runtimes have a JIT. You’d have to compare against PyPy or something like that but it hasn’t seen anywhere near the amount of performance attention like any of the browser JavaScript engines.

Finally, CPython is very old. That means it made decisions around multithreading it still has to live with to this day whereas the JavaScript implementations have always been single threaded and added multithreading much later (learning from the experiences of other languages making the transition).

Dynamic typing is part of the problem but you’re right that it’s not the whole thing.


As far as I know, all JavaScript interpreters and JIT VMs are also single-threaded (web workers aren't real threads because they can't directly share memory with the main thread).


Web workers are real threads in the sense that they map directly to low-level threads (+ shared memory via SharedArrayBuffer which was only temporarily disabled due to Spectre).

The bigger thing is that the Python runtime has a GIL because that's kind of how you solved the problem when multiple cores started becoming mainstream in the 90s. The Linux kernel had a similar GIL. It's an easy way to technically support running in a multi-threaded environment while reducing maintenance issues at the cost that you can never use more than 1 core at a time. JS runtimes don't have this problem because the runtime support layer is almost non-existent & there's no multi-threading support at all (indeed, V8 by itself comes with almost no JS APIs - there's a clear delineation between language runtime & "app runtime"). The web worker API is extremely thin & works on channels instead of shared memory, further simplifying the design. The JS GC is typically written to support concurrent multi-threaded operation to minimize the "stop the world" phase runtime whereas Python's cycle detector needs to stop the world due to the GIL. These are all small decisions that add up.


except via SharedArrayBuffer?


> my dream would be for Python to actually enforce type annotations at runtime when they exist

Does any mainstream language do this? For sure most don't, even statically-typed ones. I'm all for enforcement of types, but why does it need to happen at runtime?


But what other statically typed languages enforce static type constraints at runtime?


While it's definitely not quite the same idea, Typescript supports run-time type predicates that inform the type system.

https://www.typescriptlang.org/docs/handbook/2/narrowing.htm...

    function isFish(pet: Fish | Bird): pet is Fish {
      return (pet as Fish).swim !== undefined;
    }
Which I've found extremely useful


In python with mypy, runtime checks (e.g. assert) also inform the type system.


That's good to know -- it's a great feature for these sorts of gradual systems, in my opinion.


Flow types (for JavaScript) can be compiled into run-time type-checks.

Properly statically typed languages do not need run-time checks because the compiled code is already proven to be valid by the type-checker.


> Flow types (for JavaScript) can be compiled into run-time type-checks.

Okay, so there are some, though flow is pretty obscure, and flow-runtime (which is the part which does this AFAICT) is even more obscure. I don't think this is a popular feature.

> Properly statically typed languages do not need run-time checks because the compiled code is already proven to be valid by the type-checker.

If you want that, just enable strict mode for mypy.

And if you want to interface with code that is not strictly typed, just use runtime type checks, which will inform the static type system:

   foo: Any = untyped_code()
   assert isinstance(foo, MyClass)  # after this mypy will treat foo as instance of MyClass.
Same can be done with other types of checks also.

So if you want runtime type checking that informs the type system, you have it. I certainly don't want runtime checks for what the type system has already checked for me.


> If you want that, just enable strict mode for mypy.

But I can't do that if I want to do gradual typing. If I type a function, but the type system cannot figure out what's going to call it, that's when I want it to perform a check: at the boundary.

> I certainly don't want runtime checks for what the type system has already checked for me.

The type system can remove the checks if it can prove that they always pass. My point is that if it cannot prove that they will pass, I want them at runtime.


> But I can't do that if I want to do gradual typing. If I type a function, but the type system cannot figure out what's going to call it, that's when I want it to perform a check: at the boundary.

Unless you start using # type: ignore - the only way to interface with untyped code will be to use asserts or other checks (if + throw), i.e. perform checks at the boundary.


> Properly statically typed languages do not need run-time checks

Well, what's a properly compiled language these days? Golang e.g. uses run-time dispatching due to the way interfaces work and, of course, garbage collection. Even C++ needs to do run-time dispatching of virtual methods. So the only languages left these days that do not use run-time checks are probably C and Fortran.


I mean run-time checks that the user must implement.

For example, in JS you might put these all over your code-base:

    if (typeof x !== 'string') {
      throw Error('Expected a string');
    }


This will already inform mypy that after that if statement x is type string.
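Right -- the Python equivalent of that JS snippet is a plain check-and-raise, and mypy narrows `x` the same way (minimal sketch):

```python
def shout(x: object) -> str:
    if not isinstance(x, str):
        raise TypeError('Expected a string')
    # mypy now treats x as str from here on
    return x.upper()
```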


PyPy is able to dynamically codegen significantly faster code. Other projects have found similar success. The most challenging part is complying with the extension system, because the interface strongly implies CPython.

The use case for mypy is strictly for static type checking AFAIK.


I think https://mypyc.readthedocs.io/en/latest/ is the thing you want. In my experience, it isn’t (yet) 10x performance, but it can make significant improvements.


Static typing does not make programs automatically faster. The compiler also needs to know typical patterns and apply optimized assembly code for these cases like e.g. loop unrolling or binary arithmetic. Since typical python programs do not do number crunching much (NumPy delegates that to C routines), I wouldn't expect a great performance boost by a static compiler. Also note that statically compiled languages like Go e.g. still do a lot of run-time dispatching (due to the way interfaces work) which can not be removed by the compiler despite the presence of static types. Furthermore, much of the performance is lost due to garbage collection that still needs to be done by a compiled python program. This is why C programs will stay a thing when it comes to performance sensitive niches, since C is fast due to the absence of language features.

I'd argue a good JIT compiler will always beat a static compiler, especially for a dynamic language using a complicated syntax such as python.


Cython has been around for years, can be ported gradually, and sometimes the speedup is near 100x.


A really nice collection of examples in their original and their refactored form - nice work. I'm similarly in two minds about the feature, it's cute, but is it worth it?

I think it's indicative that the nicest use is in AST processing. As someone that works on the occasional parser-y side project it looks pretty cool, but also, 97% of professional Python development is in data-science/application development. I suspect this is a case of the core language devs having a bias towards features that are useful to them as opposed to the community as a whole.


> but also, 97% of professional Python development is in data-science/application development

No. There is not a single field with " 97% of professional Python development". As a freelancer, I see python used for everything, everywhere. From education, to math, to biology, to geography, to web dev, to automation, to testing, to desktop app...


Thanks! Yeah, that's how it seemed to me too. The AST processing examples really sell it. But the fact that Warehouse is a sizeable application (60kLoC) but didn't really have any good uses for pattern matching is telling.


Pattern matching enables an entirely new class of features that are too tedious and error prone to implement otherwise. Specifically, anything that has to do with recursive structures. Are recursive structures useful? Perhaps, given that natural language is fundamentally based on recursive structures.

Apps designed pre-ubiquitous pattern matching say little about whether pattern matching has good uses or not. More likely, features alluding to pattern matching were culled early in the design process.


I think it's a bit of a stretch to say that "anything that has to do with recursive structures" is too tedious to implement without pattern matching. I've written my share of recursion in Python, and have written a number of recursive parsers and language interpreters in Go (which also doesn't have pattern matching). I didn't find it bad at all.

That said, I think your overall point is interesting. Will this new feature in Python mean that Python is used for certain domains it wasn't previously? I have my doubts, partly because it's a fairly slow byte-code interpreted language, and that limits its usefulness (I've seen a number of compilers written in OCaml, for example, which has pattern matching, but is also fast).


I'd say it's a good thing people don't try to use the feature in a context it's not useful. Don't use the shiny new toy just to play.

Python has to cater to a vast crowd, and geographers don't have the same needs as data scientists, web devs, 3D artists or sysadmin.

So it's logical some tools will be useful in some cases, and not some others.

Amusingly, yesterday I was doing something in JS, and thought, "ah, I wish I had this new PM feature from Python there".

The day before, I had to deal with pydantic exceptions, and wished I had pattern matching to make a case depending on the number of errors and their messages. I had to do a hacky loop for something that had to do with structure, not repetition.

Just like the walrus, I don't think it's something that will pop up everywhere, all the time. But when it is useful, it will make the code nicer.


I’m a heavy professional python user, and I’ve read about this new feature maybe a half dozen times. And I still don’t get it. Sigh.

Powerful? I guess so.

Complicated? Definitely.

Confusing? You bet.

Good addition? Uhh, I really hope so.


The concept came over from Rust and Haskell.

The value of the concept really comes out when you start designing your types to be case...match-ed against. I am not the best person to explain this, but the value I mean is allowing more 'algebraic datatypes' to be used in python. This way you can have the structure of your data(types) more closely resemble your business logic. Making code more readable, and also making it harder to represent invalid states, which reduces the need to validate data. (I recall an article called "parse don't validate" the same idea applies here)

I think it will take a while for python users to start using this paradigm, but I expect it will be very useful for folks who know Rust, Haskell, or similar languages.


The concept is much older than Rust and Haskell. It was present in ML which is from the 70s and has strongly influenced most modern languages (and yes, Rust and Haskell particularly strongly).


of course. But within Haskell and Rust the concept gained traction with the wider developer community. I think especially Rust was important for getting python developers to consider adopting it. Since it is a 'serious' language rather than an 'ivory tower' language.


This is the best piece I've read on the subject, you might give it another try. I'm now lukewarm on the feature, whereas before I thought it an abomination.


Give a try to fastapi + vscode with pylance. Eye opening to what you can do if you use modern features right.


"It’s tempting to think of pattern matching as a switch statement on steroids. However, as the rationale PEP points out, it’s better thought of as a “generalized concept of iterable unpacking”."

This is the problem with the recently added features in Python. As the author of the blog post correctly notes, the features are unintuitive and full of special cases.

You always have to know and recreate the steps that the unpacking machinery takes, there is nothing logical and declarative like in the ML family languages.

Pattern matching in SML or OCaml is simple and obvious.

These features are added to Python in order to give an impression that some "development" is happening. With the benefit that they prevent implementations like PyPy from catching up. Meanwhile a non-existing vaporware JIT from Microsoft is pushed as the future.


The indentation alone makes the code clearer to read. Generally it seems to clean up the code quite a bit in comparison to the if-statements.

I would like to have seen some benchmarks for it, though, especially for things like the game loop, where, compared to the if-variant, the type of the event has to be checked for every case, since grouping is not possible.

This specific game loop is not adequate, since input events are rare, but if this were to be used with Scapy for example, it would really start to matter, like when iterating over raw socket events.


See my other comment with some quick benchmarks: https://news.ycombinator.com/item?id=28601862

Yeah, the match is slower than if-else in that case because they're not grouped. However, the PEP points out that faster implementations are possible -- it would not have to check the type each time if it already knows it can't match. I guess some of those optimizations are left for later.


> Python evaluates the match expression

One other limitation from the perspective of people looking to transpile static python to other languages: match is a statement, not an expression in the language.

I've been looking to define an extension for transpilation purposes that doesn't break match in 3.10 but provides an avenue to transpile code to rust and other FP languages.


Now also add back lambda tuple unpacking. I wish Guido et al would totally get over the cringe aversion to functional programming concepts.


Seriously. I think Python 3 was a big improvement but removing tuple unpacking in function arguments (while adding it elsewhere!) was a huge regression.


I wonder what the rationale was for the removal?


https://www.python.org/dev/peps/pep-3113/

"In order to get all of the details about the tuple from the function one must analyse the bytecode of the function. This is because the first bytecode in the function literally translates into the tuple argument being unpacked. Assuming the tuple parameter is named .1 and is expected to unpack to variables spam and monty (meaning it is the tuple (spam, monty)), the first bytecode in the function will be for the statement spam, monty = .1. This means that to know all of the details of the tuple parameter one must look at the initial bytecode of the function to detect tuple unpacking for parameters formatted as \.\d+ and deduce any and all information about the expected argument. Bytecode analysis is how the inspect.getargspec function is able to provide information on tuple parameters. This is not easy to do and is burdensome on introspection tools as they must know how Python bytecode works (an otherwise unneeded burden as all other types of parameters do not require knowledge of Python bytecode).

The difficulty of analysing bytecode not withstanding, there is another issue with the dependency on using Python bytecode. IronPython [3] does not use Python's bytecode. Because it is based on the .NET framework it instead stores MSIL [4] in func_code.co_code attribute of the function. This fact prevents the inspect.getargspec function from working when run under IronPython. It is unknown whether other Python implementations are affected but is reasonable to assume if the implementation is not just a re-implementation of the Python virtual machine."

I'm not very convinced without further information - it sounds like it was already solved in CPython, and a bug in IronPython? The whole PEP reads as if it's looking for excuses for removal.

I find this especially annoying for lambdas.



I think to simplify `__signature__`.


The author commits a logical fallacy, I think. The use cases aren't there yet, because pattern matching is missing. It's not the deconstruct statement that matters. But with it, you can work well with variants, which you couldn't really do before.

There are so many natural real-world applications to tagged union structures that I bet the ratio will look very different in a couple of years.


This functionality has been around a long time in other languages, enough for use cases to be discovered, no?


Type-safe tagged unions, one of the nicest features of Typescript:

    interface Foo { type: "foo"; name: string; }
    interface Bar { type: "bar"; size: number; }
    type Variant = Foo | Bar;
    
    function tell(x: Variant) {
        switch (x.type) {
            case "foo":
                // x now has type Foo
                console.log(`The name is ${x.name}`);
                break;
            case "bar":
                // x now has type Bar
                console.log(`The size is ${x.size.toFixed(2)}`);
                break;
            default:
                // statically unreachable
        }
    }
Hope to see this in Python type checkers as well.


You can already do this without pattern matching:

  from typing import Union, NoReturn
  
  class Foo:
      name: str
  
  class Bar:
      size: int
 
  Variant = Union[Foo, Bar]       # or `Foo | Bar` in Python 3.10
  
  def assert_never(x: NoReturn) -> NoReturn:
      # runtime error, should not happen
      raise Exception(f'Unhandled value: {x}')
  
  def tell(x: Variant):
      if isinstance(x, Foo):
          print(f'name is {x.name}')
      elif isinstance(x, Bar):
          print(f'size is {x.size}')
      else:
          assert_never(x)
mypy returns an error if you don't handle all cases.


The difference is that in Typescript, this works on any structurally matching object that you may get e.g. through JSON.


Typescript is so weird. Like this is cool I guess but I'd much rather just match on `Foo` or `Bar` directly. But Typescript doesn't have pattern matching so it compensates with this bizarre paradigm of enforcing dynamic type checks statically.


You can check instanceof of course, but the idea behind doing this is that object literals and raw data from JSON/whatever can be treated as a first class entity, rather than mucking around with all the boilerplate dataclass junk a lot of languages get tangled up in.


I am a big fan of this but the class based matching is a weak example as each case could be implemented as a method call on the class.


Which specific class matching example are you referring to, and how would you implement the matches as method calls?


Yes, but what if you don't have control over the class definitions because you're importing them from elsewhere? It seems a bit overkill to subclass all of them.


Odd that something this complex doesn't support regular expressions or even greater-than, less-than. I'm also not that good at Python so perhaps it is obvious and I missed it?

I didn't RTFM but I did CFTFM (ctrl-F'd the FM).


Those are supported via the `if` clause. But I wouldn't say they're "obvious" features, given that they're not present in the languages this idea is borrowed from (ML, Haskell, etc).


That's what I thought, it just feels weird to do a bunch of ifs, and then invoke this behemoth of a construct.


Can anyone comment on the performance of structural pattern matching in Python 3.10 for simple cases compared to if/elif? For example, say I am just checking for which of several cases from an enum to diverge based on.


I did a quick test of match vs if...elif for a basic enum switch, as well as a slightly more involved class/structural matching test. Results and code: https://gist.github.com/benhoyt/f648d6dd13ce112ba1a4422b175e...

For the enum switch, match generates very similar bytecode (use dis.dis to see) and the execution time is almost identical.

For the structural matching, match is significantly slower, which surprised me a bit (it's almost twice the number of bytecode instructions). That said, we're down in the nanoseconds range -- use of this feature shouldn't be about ns-level performance, but code clarity. I suspect "match" will get faster over time as they add optimizations or specialized bytecode for it.


I used to do this with a dict. Glad this is in Python 3.10.


Yeah, I think the author focused a bit too much on structural pattern matching being a replacement/alternative for if...elif blocks.

It will be much more useful as an alternative to dispatch dictionaries, which are more a side-effect of the language lacking any case based control-flow.


Yeah, there are several giant dictionaries (variously class to function, string to class, etc.) in just the project I’ve worked on most recently that fit that description.

Though considered from that perspective, it’s actually a bit of a disadvantage that the new match statement is a part of the language. It’s nice to dynamically extend (either as a first party developer or third party user) that sort of dictionary, effectively adding new branches to the implicit switch statement. A match statement is going to be fixed as written. Probably a readability and consistency win to remove spooky-action-at-a-distance, but at the cost of a bit of (ugly, pragmatic) usability


By the way, does anyone here know how Python compiles the match statements under the hood? Are they converted into something equivalent to if-elif, which tries each option in sequence? Or does it do something more clever?


The author replied here that yes, the bytecode is simple, with the exception of the Class matching.

https://news.ycombinator.com/threads?id=benhoyt


Do you mean like "f = handlers.get(args.command, error); f(args)"? It's a fine pattern (of course, it's only a tiny fraction of what pattern matching can do). However, it does mean that if the handlers aren't already functions, you have to convert them to (possibly nested) functions, but that can sometimes make the code clearer and easier to test anyway.

But if you're not matching structure I really don't see what's wrong with plain old if...elif? And you can easily add more complex tests like "in" clauses this way too:

  cmd = args.command
  if cmd == 'push':
      # do push
  elif cmd in ['pull', 'pul']:
      # do pull
  else:
      error(f'command {cmd!r} not yet implemented')


It took way too long to get this considering how case has been around forever. Now I just want a do-while.


dang, could we please merge the comments from https://news.ycombinator.com/item?id=28591173 into here (or vice versa)?

Thanks for resubmitting, chmaynard!


You should email the mods if you need something tweaked, summoning from comments doesn't work reliably but email does.


Good call, I emailed and he merged them!


I usually do a quick check for duplicates by clicking on the domain name in parentheses. If there is an identical post with some activity, I delete mine. Forgot to do this, sorry. You can have my karma. :)


No worries! Yours was getting lots of votes, mine wasn't (possibly a time-of-day thing). Side note: I re-read my comment and "dang" sounded like I was annoyed. By "dang" I mean "HN moderators". :-)


The key benefit of pattern matching is not code that is more concise, but code that is more likely to be correct.

Consider their first example, interpreting a command in a textual game. In many places the logic is divided into a few pieces: check that there are the right number of arguments, do something with some of them, pull off the rest, etc. It's really easy to make mistakes with this kind of code, where your conditions don't quite align. With pattern matching, you write a single pattern for a case, and I would expect this to give many fewer bugs.


On the theme of skepticism, though, I would worry that there may only be a fairly narrow band of complexity for this type of logic where Python 3.10's pattern matching is the ideal solution, past which you want some kind of grammar and a proper parser that just spits out a polymorphic object with a do_thing() method.


Would be nice if we could just get out of this local minimum of Python 3.30, and start working on a Python 4 that's somewhat backwards compatible, has no GIL, and has meaningful threading support.


To rehash my typical HN reply about GIL:

I find that 99% of the time that you need to use threads for CPU work (not just IO), that means that you care enough about performance that your Python just calls into some C library (C extension libraries e.g. numpy, scipy, or even bits of the standard library e.g. zipfile operating on an in-memory buffer). The thing is, all these calls (almost) all release the GIL anyway. So threading support in Python is already meaningful, unless you want to run pure-Python CPU intensive code on multiple threads, which I'll admit does happen but in my experience is rare.


That only works if you can hide your inner loop in the extension module, and can get rid of all accesses to the Python object model inside the inner loop. This works great with numpy if you can work with sufficiently-large arrays so that the GIL can be released for a nontrivial amount of time. But if your algorithm doesn't have this structure, "call into C and release the GIL" just means the inner loop is constantly releasing+reacquiring the GIL, and multi-threaded execution won't be any faster than single-threaded execution due to the fights over GIL.

Unfortunately we're mostly dealing with complex stuff happening during graph traversals. There is no "inner loop" for us -- just the "outer loop" of the graph traversal. So the only way to get our code to run in parallel is to port the whole algorithm to C. But that's 150k+ lines of code, there will be barely any Python remaining :(

IMHO Python is a trap, it's easy to get started with but you can get stuck in a position you can't get out of without a full port to a different language. (and yes, we've already tried multi-processing and using shared-memory segments in the C code portions to avoid re-doing work in every process, ... but after more than a man-year of work, >80% of our application remains single-threaded thanks to the GIL, even though in any other language it would be trivial to parallelize the independent graph traversals)

It's neat to have a scripting language for fast prototyping but we would be much much better off if we had picked Javascript instead of Python (the ability to have multiple interpreters in the same process running in parallel would solve our issues).


Absolutely agreed, especially the part about needing to hide the inner loop in an extension module. If you're using numpy to multiply 2 x 2 matrices together then the fact it releases the GIL is not that much help.

As I said, my experience is that this is not normally a problem in practice. Clearly that's not your experience! I'm not sure whether I've been unusually lucky or you've been unusually unlucky.


I would say that the key benefit for this isn't really in making mature codebases nicer (though it can), but rather for rapidly-developing codebases, especially when doing prototyping or iterative development.

The structured nature and reified syntax works alongside the ability to quickly change the spec, as otherwise you would need to pick apart and reverse-engineer someone's in-the-moment if/elif (likely nested) when there's a change. The structure being based on first-class concepts in the language means that its meaning is common to all code, whereas the post went out of its way to specify how you'd "organise this differently as if/elif", which is clearly something requiring skill and experience (which experts always overestimate the level of), and would produce varying results depending on the person.

This will be taught at a more intermediate level of Python, leaving the complexity only skin-deep, akin to teaching pattern matching in Haskell or such. After all, the full complexity isn't necessary for most users; the vast majority of people will only read the tutorial PEP and not the full description and grammar, while those advanced users can make use of this to make life even easier. Combined with typing, this means tools can do more exhaustive checks and catch bugs pre-emptively.

In my opinion, this will be a boon for clean programs that are easy to read and pleasant to write. In line with what the post says about usage settling down in a couple years, libraries and APIs will find the right balance with time, whilst making it easier and faster to write the thing you meant correctly, especially as this is a boon to tooling since you're quite literally indicating the structural intention of your code.

I see this alongside decorators (used sparingly), comprehension (used one or two levels deep), and types as helping people write the code they want to write, what they mean to write, and the language helping them in that endeavour. Just as we finally have dict.__or__ and dict.__ior__, it's something that lets people write what they intend, rather than the scaffolding that is needed to support it. I personally think that's a positive, and the kind of feature that will become very natural for many people to use in Python even if they don't have a background in languages that have pattern matching.


Honestly I'm just disappointed that I can't match against random locals.

I'm at the point where I'm willing to pay someone to write a preprocessor that'll let me put switch/case statements in my code and then run the preprocessor and have it convert them to if/elif/else statements.


I find it strange how obsessed people are with structural pattern matching while simultaneously putting little mind towards function head pattern matching e.g. Erlang/Elixir. These break up recursive functions so nicely. Maybe it is the lack of TCO in said languages?
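Python's closest built-in analogue is probably `functools.singledispatch`, which dispatches on the first argument's type only -- far from full Erlang-style function heads, but it does keep each "clause" as a separate body. A sketch (the function and return strings are made up):

```python
from functools import singledispatch

@singledispatch
def describe(x) -> str:
    # Fallback clause for unregistered types.
    return f"something: {x!r}"

@describe.register
def _(x: int) -> str:
    return f"int {x}"

@describe.register
def _(x: list) -> str:
    # A separate clause per type keeps each body free of isinstance checks.
    return f"list of {len(x)}"
```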


Not just recursion but also error handling. Gets all the if x = nil checks out of the way and the actual function only has Business Logic. Same for functions that need to operate differently based on input, no pollution of the body with if branches


I love this style of programming but it does not work well in Python, the language just isn't built for it. TCO is one limitation. Another is that performant Python code sends large chunks of work to native libraries and recursion inside of Python functions is pretty much the opposite of that.


Too much complexity now. This will make languages like Go more relevant.


Too many 'case' repetitions for my taste.


it's amusing that Python is going from no switch case to this ultra powerful version of it


Wtf is non-structural pattern matching? Pattern ~ structure. It's like saying eruptive explosion.

Stop trying to look more important and smart by using more (unnecessary) words.


Pattern matching can be done on run-time properties, see F# Active Patterns



