I strongly think Python should have more functional programming support. Functional programming languages are usually scary for many programmers to look at. Python would not only be a good FP introduction for them, it would also benefit greatly itself.
Years ago I found out that Guido wouldn't let tail recursion be included, and even tried to remove the map function from the built-ins. That gave me the impression that Python wouldn't get further FP support. I really wish that weren't the case. With pattern matching coming in 3.10, my hope is high again.
I have very high respect for Guido van Rossum; I'm not here to discredit him. He's one of the authors of PEP 634.
I wish Python had a simpler syntax for lambdas than the current lambda expression; even JS does better on this. A built-in syntax for partial application would also be great. It would be even better if we could have function composition.
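For reference, a rough sketch of what those look like today: partial application goes through functools.partial, and composition has to be a hand-rolled helper (compose is not in the standard library; the names here are purely illustrative):

    from functools import partial, reduce

    add = lambda x, y: x + y
    increment = partial(add, 1)          # partial application via functools.partial

    def compose(*fns):
        """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
        return reduce(lambda f, g: lambda x: f(g(x)), fns)

    shout = compose(str.upper, str.strip)
    print(increment(41))        # 42
    print(shout("  hello  "))   # HELLO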
Some problems are better solved the FP way. It could even make the program more readable, which is one of the strengths of Python.
Yes, coming back to Python after many years with Kotlin and Elm, I see how the Python language guides people to write code in a less readable way.
Mutations everywhere. But mainly because mapping and flattening stuff is so burdensome. The lack of many FP functions, and of an ergonomic way to call them, makes a crazy list comprehension the go-to tool, which is often much less explicit about what's going on than calling a function named after a widely understood concept.
> The lack of many FP functions, and of an ergonomic way to call them, makes a crazy list comprehension the go-to tool, which is often much less explicit about what's going on than calling a function named after a widely understood concept.
I've said roughly this before somewhere on HN but cannot find it right now: list comprehensions are like regexes. Below a certain point in complexity, they're a much cleaner and immediately-graspable way of expressing what's going on in a given piece of code. Past that point, they rapidly become far worse than the alternative.
For example, "{func(y): y for x, y in somedict.items() if filterfunc(x)}" is clearer than the equivalent loop-with-tempvars, and significantly clearer than "dict(map(lambda k: (func(somedict[k]), somedict[k]), filter(filterfunc, somedict.keys()))))". Even with some better mapping utilities (something like "mapkeys" or "filtervalues", or a mapping utility for dictionaries that inferred keys-vs-values based on the arity of map functions), I think the "more functional" version remains the least easily intelligible.
However, once you cross into nested or simultaneous comprehensions, you really need to stop, drop, and bust out some loops, named functions, and maybe a comment or two. So too with regex! Think about the regex "^prefix[.](\S+) [1-7]suffix$"; writing the equivalent stack of splits and conditionals would be more confusing and easier to screw up than using the regex; below a point of complexity it's a much clearer and better tool to slice up strings. Past a point regexes, too, break down (for me that point is roughly "the regex doesn't fit on a line" or "it has lookaround expressions", but others may have different standards here).
The annoying thing is that they have a performance cost, though: list comprehensions can be very fast because they stay on the C code path in Python, whereas once unrolled into a loop you lose some of that.
Please suggest a pattern matching expression syntax that doesn’t require completely changing how the language works.
Pattern matching took so long because it was extremely hard to find that compromise, and I actually think they did an okay job. No, it won't cover every case, but it will cover the important ones.
    x = match y:
        case 0: "zero"
        case 1: "one"
        case x:
            y = f(x)
            y+y
    print(x)
it can require semicolons or parentheses or even braces somewhere if that's necessary to make the grammar work, idc. just don't make me type out `x = ...` a hundred times...
(yes, i know you could use the walrus operator in the third case)
(also, if anyone wants to reply with "use a named function because READABILITY", please save it)
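For contrast, a rough sketch of what the statement form forces you to write today on 3.10, with an assignment repeated in every arm (f and y here are just placeholders):

    def f(v):
        return v * 2

    y = 5
    match y:
        case 0:
            x = "zero"
        case 1:
            x = "one"
        case other:
            t = f(other)
            x = t + t
    print(x)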
Judging by the votes you got, your proposal is not well-liked.
In general, breaking Python's syntax rules to introduce block expressions, or whatever you want to call them, seems like a steep price to pay for what amounts to syntactic sugar in the end.
However, your proposal actually misses the mark in the same way the actual match/case does. What would be /really/ handy in Python is something like
    {'some_key': bind_value_to_this_name} = a_mapping
because it is so common, especially if you're consuming a JSON API. You of course see the myriad problems with the above.
I think you should read the PEPs on match/case, these things have actually been considered. One idea was to introduce a keyword to say which variables are inputs and which are outputs, but that also violates the general grammar in a way that isn't very nice.
The accepted solution has warts, I agree, but at some point you just have to accept that you can't reach some perfect solution.
> Judging by the votes you got, your proposal is not well-liked.
i'm at peace with that :) [1]
> In general, breaking Python's syntax rules to introduce block expressions, or whatever you want to call them, seems like a steep price to pay for what amounts to syntactic sugar in the end.
that's your opinion, i disagree! i think Python would be a better, more pleasant to use language if they figured this out. a girl can dream, ok?
> What would be /really/ handy in Python is something like
    {'some_key': bind_value_to_this_name} = a_mapping
yes, extending the destructuring syntax would be nice, i agree! but the original question was "Please suggest a pattern matching expression syntax [...]", and the post you were responding to was specifically talking about `match`'s syntax. [2]
> I think you should read the PEPs on match/case
i read them when they came out. i'm assuming you're referring to this section from PEP 622:
> "In most other languages pattern matching is represented by an expression, not statement. But making it an expression would be inconsistent with other syntactic choices in Python."
i believe Python's statement-orientation kinda sucks, so to me this is just "things aren't great, let's stay consistent with that". yeah, yeah, "use a different language if you don't like it" etc.
---
[1] well, maybe not quite, after all i'm here arguing about it.
[2] `match` uses patterns like the one you described in `case` arms, but is distinct from them. i don't see why dict-destructuring syntax couldn't be added in a separate PEP.
I'm interested in similar proposals that can provide the basis of some common syntax that could be transpiled to other functional programming languages: https://github.com/adsharma/py2many
I'm less interested in the syntax (will take anything accepted by a modified ast module), more in the semantics.
Here's a syntax I played with in the past:
    def area(s: Shape):
        pmatch(s):
            Circle(c):
                pi * c.r * c.r
            Rectangle(w, h):
                w * h
        ;;

    def times10(i):
        x = pmatch(i):
            1:
                10
            2:
                20
        ;;
        return x
Two notes:
* Had to be pmatch, since match is more likely to run into namespace collision with existing code.
* The delimiter `;;` at the end was necessary to avoid ambiguity in the grammar for the old parser. Perhaps things are different with the new PEG parser.
Learn You a Haskell for Great Good taught me list comprehension, and I was surprised to see such a pithy shorthand available in Python (+ dict comprehension). That's a very functional way to transform objects from one shape to another with an in-line expression, no lambda keyword there.
The lambda keyword is better than nothing, but it can definitely be improved. Just imagine using JavaScript syntax in your example.
> To your point, I only recently learned there's a Map function in Python, while in JS I'm .map(x=>y).filter(x=>y).reduce(x=>y)ing left and right.
I think with the introduction of list comprehensions Guido saw that the map function was no longer needed, which is why he wanted it removed. I don't deny that, but map and filter are sometimes just easier to read. Say [foo(v) for v in a] vs map(foo, a).
It's a bit hard to find recent numbers in that decade-old thread, but the relevant, more up to date, evidence I see (in this answer) says that map is not much slower than a list comprehension: https://stackoverflow.com/a/57104206
One gripe that I have with functions like map is that they return a generator, so you have to be careful when reusing results. I fell into this trap a few times.
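For example (map actually returns a lazy iterator, and it can only be consumed once):

    squares = map(lambda n: n * n, range(5))
    print(list(squares))   # [0, 1, 4, 9, 16]
    print(list(squares))   # [] -- already exhausted, easy to miss when reusing results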
I'd also like a simpler syntax for closures, it would make writing embedded DSLs less cumbersome.
> One gripe that I have with functions like map is that they return a generator, so you have to be careful when reusing results
I hope that is never changed; I often write code in which the map function's results are very large, and keeping those around by default would cause serious memory consumption even when I am mapping over a sequence (rather than another generator).
Instead, I'd advocate the opposite extreme: Python makes it a little too easy to make sequences/vectors (with comprehensions); I wish generators were the default in more cases, and that it was harder to accidentally reify things into a list/tuple/whatever.
I think that if the only comprehension syntax available was the one that created a generator--"(_ for _ in _)"--and you always had to explicitly reify it by calling list(genexp) or tuple(genexp) if you wanted a sequence, then the conventions in the Python ecosystem would be much more laziness-oriented and more predictable memory-consumption wise.
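Something along these lines, as a small sketch of that lazy-by-default style:

    lazy = (n * n for n in range(1_000_000))   # generator expression: nothing computed yet
    total = sum(lazy)                          # consumed on demand, constant extra memory
    first_ten = list(n * n for n in range(10)) # reify explicitly only when you need a list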
Personally, I mostly just avoid using map/filter and use a list/generator/set/dict comprehension as required. I don't find map(foo, bar) much easier to read than [foo(thingamajig) for thingamajig in bar]
i think that's the most powerful part of the tool. being able to effortlessly write lazy generators is absolutely amazing when working with any sort of I/O system or async code.
> It would be even better if we could have function composition.
I think composition and piping are such basic programming tools that make a lot of code much cleaner. It's a shame they're not built-in in Python.
So, shameless plug, in the spirit of functools and itertools I made the pipetools library [0]. On top of forward-composition and piping it also enables more concise lambda expressions (e.g. X + 1) and more powerful partial application.
First you claim functional languages are scary to look at, then you say that you want Python to become more like functional languages. But maybe the reason Python is elegant and easier to read is exactly because Guido had the self-restraint not to go full functional. You also miss the part of the story where he actually did remove `reduce` from the builtins, exactly because of how unreadable and confusing it is to most.
It's exactly that kind of decision making I expected from a BDFL, and I think Guido did a great job, while he was one, of keeping Python from going down such paths.
Well, the BDFL is probably the only dictator we all love.
Any decision he made counts infinitely more than mine could, because I am just a Python user, and an outsider to any decision-making process. So for me, he's right all the time. That's a perfect definition of a dictator :)
But I do have wishes. It's like I love my parents but I do want to stay up late sometimes.
Yeah, I totally missed the part where he removed reduce from the builtins. Sorry about my memory. map, filter, or reduce, it doesn't matter. As I stated, some problems are better solved the functional way. Because Python is such a friendly language, if it included the functional paradigm properly, it would make the functional parts more readable than in other functional languages.
FP is scary not because it has evil syntax to keep people at a distance; it's just an alien paradigm to many. Lots of non-functional languages have functional support, which doesn't make them less readable, e.g. C# and JS. I suspect these languages have helped many understand FP better. Python could make the jump by including more FP without turning into a full-fledged FP language.
> Because Python is such a friendly language, if it included the functional paradigm properly, it would make the functional parts more readable than in other functional languages.
I guess my argument is that Python is friendly exactly because it uses FP paradigms sparingly and conscientiously. Some, such as map and filter, can definitely make your code cleaner, while others (such as reduce) only lead to headaches. That being said, as you mention, we are getting pattern matching, and I'm curious to see how that ends up.
i would argue that C# and JS do also use it sparingly and don't go too hard, but also that Python is generally a cleaner language than those. It's hard to know how much of it can be attributed to what, but generally, it's not a trivial thing to predict what impacts a given language feature will have on the readability of the code. A given FP paradigm might be cool in isolation, but when stacked with 4 other ones, it can quickly lead to gibberish code.
After giving it some more thought, I just realized it was Python that got me into FP. Although the majority of my projects are in C#, LINQ never got me interested in FP.
You are right, C# and JS use FP sparingly. I think I expect more from Python because I enjoy writing in Python and my functional adventures were rooted in Python.
Personally, I find the lambda expression painful to look at. Many times I have deliberately avoided it. It's possible to make improvements, but it seems the FP part has just stagnated. Looking forward to pattern matching though.
I'd much rather debug someone's pure functional code rather than trying to figure out how someone's imperative code got into some unexpected/invalid state.
The worst part of Python is the lack of utility functions for munging collections. But it sits at a slightly higher level than this - things that are idiomatic in other scripting languages, like groupBy(fn), sortBy(fn), countBy(fn), and collate, are all inexplicably like IKEA self-assembled furniture in Python instead of first-class entities. It makes lots of routine data munging and algorithmic work much more annoying than it needs to be (compared to Groovy, Ruby, etc.).
The real issue is that python doesn’t really make it easy to define anonymous functions. If you want one with more than one line, you need to name it and move its definition outside the point of use. Quite annoying.
Note that the parentheses are to avoid parsing a '+=' expression with 'lambda x: x' on the left and '1' on the right:
    >>> lambda x: x += 1
      File "<stdin>", line 1
    SyntaxError: cannot assign to lambda
This restriction is problematic because of the following combination:
- Python statements don't compose as well as expressions; we can write one statement followed by another, and we can nest statements (or blocks) inside control statements (if, else, with, etc.), but that's about it. We can't assign statements to variables, we can't call a function on a statement, or return a statement from a function, etc. i.e. we can't abstract over statements (unless we try wrapping them in a named function and using that as an expression, but that can break the underlying functionality of the statement, e.g. 'def ifThenElse(cond, true, false):' will evaluate both branches when called).
- Python uses statements for a whole bunch of stuff. In particular it has a 'with foo as bar: baz' statement, rather than e.g. a 'with(foo, lambda bar: baz)' expression; the recent pattern matching functionality is a statement rather than an expression; defining a named function is a statement; etc.
Taken together, we can often find ourselves needing to introduce a statement somewhere in our code; that requires us to turn a 'lambda' into a 'def'; but 'def' is a statement, so we have to turn any enclosing expression into a statement too; and this propagates up to disintegrate whatever expression we had; leaving us with a big pile of statements, full of boilerplate for managing scopes and names.
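A small illustration of that propagation (the file names are just placeholders): a 'with' statement can't live inside a lambda, so the whole surrounding expression has to become statements.

    # read_contents = lambda path: with open(path) as f: f.read()   # SyntaxError
    # ...so it has to become a def -- itself a statement -- and everything around it follows:
    def read_contents(path):
        with open(path) as f:
            return f.read()

    # contents = [read_contents(p) for p in paths]   # 'paths' being whatever iterable you had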
I honestly prefer that; any function longer than a line could probably use a name that gives it some context, maybe even a short docstring, to make the code more readable.
Excessive usage of anonymous functions is a major pain. JavaScript developers for some reason almost refuse to name their functions, even though it would make reading and reusing code much easier.
You can make lambdas in Python, but naming functions is in my mind very much a "feature".
Have you actually used other languages like Clojure, Haskell, Kotlin, Go or Ruby? Python is the weird one here for not allowing multi-line lambdas. There are legitimate and actually useful cases for multi-line lambdas; disallowing them does more harm than good.
If the idea of multi-line alone compromises readability, then why even have multi-line blocks at all? Function, loop and conditional blocks should all at most have one line each. I hope you can see the absurdity of this.
Weird as in it goes against well-established practices. If there were good, compelling reasons to disallow multi-line lambdas, other languages would have adopted the same restriction. But no, the opposite is happening: newer and recent languages actually have multi-line lambdas. Get this, even Java adopted multi-line lambdas. Crazy, right? What's wrong with everyone?
There's nothing wrong with having to name a function. But there is something wrong with not having a choice and being forced to name a multi-line function. Just as you don't want to name every sub-expression, you don't want to name every block of code. Unnecessarily naming things pollutes the namespace.
Where's the strawman? The premise was that functions that span more than a line should be given a name because they are "unreadable". I have shown that, by this logic, anything that spans more than a line is "unreadable". You can't make special exceptions and say this only applies to lambdas. Lambdas (i.e. anonymous functions) are blocks of code just as much as anything else.
A pure function (which a lambda typically is) answers a question. It's helpful to have words describing what question it answers. The next person to read the code will actually be able to read the words.
An if statement is a lower level mechanism for directing the code path within a function. If it needs a name, wrap it in a function. I often do.
Control statements rarely "need" a name, but if they are significant enough to be named they can be named by extracting meaningful and somewhat reusable functions.
If you just have a short one or two lines of code in a callback, maybe it's not so important. Where I have an issue is when you have five nested anonymous functions. Why wouldn't you break those out into named functions?
Assuming you have five levels of nested ifs, ignoring the fact that you might be doing something wrong, then you should name those, but that requires you to extract those code paths and put them into separate functions.
Because an if statement is very constrained in what it can do. Every if statement has two branches (maybe one is a no-op) and one of them executes. If someone sees an if statement, it is implementing a fork in the code path.
A function can do literally anything. Seeing a function call, it hasn't narrowed down the realm of what might be happening even a little bit.
"lambda f:" gives away very little information about what is about to happen. "if f:" gives away some information about what is about to happen.
Not a perfect rule of thumb, convoluted if statements do exist. But that is why functions are more in need of hints and comments than other control structures.
> "lambda f:" gives away very little information about what is about to happen. "if f:" gives away some information about what is about to happen.
I think that's an unfair comparison:
If 'the thing that happens' for "if f:" is 'branch on the truthiness of f', then I agree that's pretty clear. However, if we're going to ignore the content of the branches then we should also ignore the body of the function, so "lambda f:" is also pretty clear: 'parameterise with f'.
If we're worried that "lambda f:" could do pretty much anything in the body, then we should also worry that "if f:" can do pretty much anything in the branches.
> If we're worried that "lambda f:" could do pretty much anything in the body, then we should also worry that "if f:" can do pretty much anything in the branches.
That is the part common to both. If you take that common part out, you will be left with more information in the if statement than the function call.
In the function, you know what arguments it might be interested in. You know they are evaluated. But in the if statement, you know what values it is interested in, you usually get the same amount of information about whether they are evaluated, and you get much clearer guarantees about how they are used and that the aspect of them that matters is their truthiness. Also, the code might be about to branch.
Put it this way - if(...) is a specific - and common - named function. It is a very easy argument to make that it is carrying more information about what is going on than a general anonymous function.
> If you take that common part out, you will be left with more information in the if statement than the function call.
I don’t think this argument generalises well. Once you start working with structured data and more complicated control logic, a generic loop with an arbitrary body often provides less immediate information to a developer reading the code than a function that explicitly names a specific pattern of computation like map or filter.
There are many recurring patterns of computation that are much more specific than a generic map or filter, too. While imperative languages tend to have only a small number of built-in control structures, if we’re using higher-order functions then we can name as many of these patterns as we like.
For example, suppose we want to generate a list of the first few triangle numbers, which is a sequence of the form:
    [0, 0+1, 0+1+2, 0+1+2+3, ...]
Writing this in imperative Python code with a typical loop would give us something like this:
    triangle_numbers = []
    total = 0
    for n in range(6):
        total = total + n
        triangle_numbers.append(total)
Here’s an idiomatic functional version, which is real Haskell code using a standard library function that captures this pattern of sequence generation in just five characters:
    triangle_numbers = scanl (+) 0 [1..5]
Assuming you’re equally familiar with both programming styles, I think it’s fair to say that “scanl” tells you much more about what this code is doing than “for”.
The inner function here is just addition in both cases. However, the code within the Python loop is allowed to do whatever it wants, so the reader has to recognise the pattern surrounding the addition. In the Haskell version, the function you pass to scanl must take an accumulated value so far and a next value as parameters and it must return the next accumulated value to add to the sequence being generated, and that is all it is allowed to do. So you just need to write (+), which is Haskell’s notation for what Python calls operator.add or an equivalent lambda, and that tells the reader exactly how the next accumulated value is being derived at each step to build up the sequence.
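For what it's worth, Python does ship a close analogue of this scan pattern in itertools.accumulate, so a roughly equivalent (if less idiomatic) Python version might look like:

    from itertools import accumulate
    from operator import add

    triangle_numbers = list(accumulate(range(6), add))   # [0, 1, 3, 6, 10, 15]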
But you've named all your functions here so I'm not sure which part of this you're arguing about.
My intended point was that comparing lambda: to if: is apples-to-oranges.
Both styles in my example have an outer structure that builds a sequence, wrapped around an inner calculation that says how you build the next value in the sequence. The inner calculation here was simple addition, a one-liner in the for loop and (+) in the Haskell version. However, it could equally have been something more complicated.
If it were, a reader would still have to figure out what the body of the for loop was really doing. In contrast, the (possibly anonymous) function passed into scanl would still have a more constrained purpose, which is immediately known because scanl represents one specific pattern of computation. The implicit algorithm it represents has a hole that is a certain shape, and you can only pass it a function with the right shape to fit into that hole.
Put another way, if you’re programming in this functional style, it’s not writing the word lambda that tells you what the next piece of code is for, it’s passing that lambda expression into a higher-order function like scanl. Passing an anonymous function is like writing the nested block for a branch of an if statement or the body of a for loop. It’s the call to scanl that is roughly analogous to writing the if or for statement itself.
Lambdas are constrained by their type. If you see a lambda being passed to, let's say `map`, then (assuming you know the type of `map`) you know the type of the passed lambda.
Of course this assumes the lambda is pure, but in the kind of code we're talking about (pseudofunctional code with lots of higher-order functions) they should be pure for the most part.
Indeed, there are many such explicit omissions in Python (switch statements, assignment expressions (before 3.8), increment/decrement operators, etc.) that may seem to some like a bug, but they really are explicit decisions, and part of what makes Python the clean and readable language it is.
IMO multi-line lambdas are unnecessary; they can't be documented or maintained any more easily than a named function. If your anonymous function needs more than a statement, it makes perfect sense to define it as a separate named function. Faster to find, document and maintain.
I don't understand why you wouldn't just define a function if you needed multi lines.
For the same reason that you wouldn’t necessarily define a function every time you wanted more than one line in the body of an if statement or for loop.
In a functional programming style, typically most of your control structures are represented as higher-order functions and you pass them other functions where you might use nested blocks of statements in an imperative style.
That is, where imperative pseudocode might look like this:
    for n in [0..10]:
        output_array[n] = n * n
some corresponding functional pseudocode might look more like this:
    output_array = for_each [0..10] (n -> n * n)
In some functional languages, it’s even idiomatic to write the supporting function so it reads more like this (borrowing Haskell’s $ notation, which avoids the awkward parentheses):
    output_array = for_each [0..10] $ \n ->
        n * n
Now you might want a multiline body for the supporting function in much the same situations that you might want a multiline body for the imperative for loop:
    for n in [0..10]:
        location = get_location(n)
        route = fastest_route_to(location)
        times[n] = average_journey_time(route)
    times = for_each [0..10] $ \n ->
        location = get_location(n)
        route = fastest_route_to(location)
        average_journey_time(route)
Similarly, if the logic you’re using in the inner part of the code is a self-contained concept, you might want to factor it out into a named function of its own in either case.
This functional style can be tidy and very flexible in languages that are designed to support it. However, I wouldn’t necessarily encourage it in a language like Python, precisely because Python’s syntax and language features don’t make it natural and concise to write like this.
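To illustrate why: here's roughly what the last example has to become in actual Python, since a multi-line body can't be a lambda (get_location and friends are the hypothetical helpers from the pseudocode above, stubbed out here just so the sketch runs):

    def get_location(n): ...            # hypothetical helpers from the pseudocode above
    def fastest_route_to(location): ...
    def average_journey_time(route): ...

    output_array = [n * n for n in range(11)]   # the single-expression case is fine

    def journey_time(n):                        # the multi-line body needs a def
        location = get_location(n)
        route = fastest_route_to(location)
        return average_journey_time(route)

    times = [journey_time(n) for n in range(11)]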
itertools.groupby isn't really the groupBy operation that people would normally expect. It looks like it would do a SQL-style "group by" where it categorizes elements across the collection, but really it only groups adjacent elements, so you end up with the same group key multiple times, which can cause subtle bugs. From my experience, it's more common for it to be misused than used correctly, so at my work we have a lint rule disallowing it. IMO this surprising behavior is one of the unfriendliest parts of Python when it comes to collection manipulation.
It's common in other languages as well (at least Haskell) and a bit surprising at first. However, a `.sortBy(fn).groupBy(fn)` is easy and of similar efficiency, and when you actually need the local-only `groupBy()` you're happy it's there.
A bit more expressive overall.
At least it is better than lodash's useless groupBy, which creates this weird key-value mapping, loses order, converts keys to strings, and what not.
yep, that's a good example of what I refer to as IKEA assembling your groupby. You need to put something like 3 parts together before it does what you want, and they aren't that intuitive (or they only are in retrospect).
The resulting groups are also iterators which are exhaustible. It's good if you're running group by on a huge dataset to save some memory, but for everyday operations it's another trap to fall into.
Yes, for itertools.groupby to work as most people would expect, the data needs to be sorted by the grouping key first. That may obviously cause a significant performance hit.
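In other words, the idiom people usually want looks something like this (grouping words by first letter, purely as an invented example):

    from itertools import groupby

    words = ["apple", "avocado", "banana", "blueberry", "cherry"]
    key = lambda w: w[0]
    groups = {k: list(g) for k, g in groupby(sorted(words, key=key), key=key)}
    # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}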
Annoyingly, itertools is one of those packages in the Python standard library with wordsruntogether-named functions, so it's really known as itertools.groupby instead.
It's a package, not a built-in, but pytoolz [0] is a very complete solution to the type of functional programming munging you are talking about. I wish more people knew about it (no one here seems to have mentioned it). And it has been Cython-optimized in another package called cytoolz [1]. The author explains in the Heritage section how it is an extension of, and plays nice with, itertools and functools [2].
For Python coders new to functional programming, and how it can make working with data easier, I highly recommend reading the following sections of the pytoolz docs: Composability, Function Purity, Laziness, and Control Flow [0].
Yes - the description of writing Python as like assembling Ikea furniture is absolutely spot-on. Yes, you can do it, and the results can be nice, but by God it is sometimes such a pain.
The comment showing where groupBy, sortBy etc. can be found just shows the problem - they are all in different libraries. That's just plain annoying! And don't get me started on the pain of trying to build an Ordered Dictionary with a default initial value!
What leaps out at me is that these 3 functions are all straight out of relational algebra style worldview.
Python the language doesn't support relational algebra as a first-class concept. The reason it feels like IKEA self-assembly is probably because you are implicitly implementing a data model that isn't how Python thinks about collections.
    groupBy(fn): {fn(x): x for x in foo}
    sortBy(fn): sorted(foo, key=fn)
    countBy(fn): Counter(fn(x) for x in foo)
    flatten: [x for y in foo for x in y]
    filter(fn): [x for x in foo if fn(x)]
All the "for x in blah" can seem like a lot of boilerplate for a non-Pythonista, but actually it becomes subconscious once you're used to it and actually helps you feel the structure (a bit like indentation isn't necessary in C but it still helps you to see it).
For compound operations (e.g. merge two lists, filter and flatten), I find the code a lot easier to "feel" than if you'd combined several functional-style functions, where you have to read the function names instead of just seeing the structure.
Hmm, you're right... that one is actually quite hard to do in a Python-ish way. There is a one-liner, just for the record, but it's absolutely horrific code because it redundantly re-evaluates the same conditions over and over:
    g = {fn(x): [y for y in foo if fn(y) == fn(x)] for x in foo}
As you said, you'd have to use a for loop. I think people are sometimes unnecessarily afraid of for loops in Python (not everything has to fit on one line!), but this is a case where it's a pity to take up so many lines. Or, back to square one, probably this is a case where it's best just to use itertools.groupby (after a sort) after all.
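For the record, the loop version people usually reach for is a short defaultdict pattern (just a sketch, with group_by as an invented name):

    from collections import defaultdict

    def group_by(fn, items):
        groups = defaultdict(list)
        for item in items:
            groups[fn(item)].append(item)
        return dict(groups)

    group_by(len, ["to", "be", "or", "not"])   # {2: ['to', 'be', 'or'], 3: ['not']}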
> All the "for x in blah" can seem like a lot of boilerplate for a non-Pythonista
It's not the boilerplate that's the problem, it's that using the same construct for everything doesn't convey its intention.
If I see a chain of filter, map, group etc., I instantly know what each part does. I know that filter does a filtering, and then I can look at the function passed to it do know how exactly it's filtering. If I see a list comprehension, I have to first understand if it's filtering or mapping or whatever (or even all at once), before I can grok it.
> About 12 years ago, Python aquired lambda, reduce(), filter() and map(), courtesy of (I believe) a Lisp hacker who missed them and submitted working patches. But, despite of the PR value, I think these features should be cut from Python 3000.
("Python 3000" was the code name for Python 3.) He eventually stepped back from this hard line view but it remains a fact that it's a bit of a historical accident that it's present at all.
The thing is, filter is present in almost all languages now while list comprehensions are in Python, Julia and Haskell. Maybe it's not "pythonic", but Python doesn't exist in a vacuum and is influenced by other languages, as proven with the optional typing and the pattern matching. Maybe it's time to review the definition of pythonic?
One of the harder things for beginners to grasp when learning a second or third programming language is that language is not just syntax. You see it all the time in Python in particular: people writing code like it's C or Java. It's no different to grabbing a French dictionary and transliterating your English sentence to French word by word. It just doesn't work and, in some cases, is completely wrong. Grokking a language means so much more than knowing the key words for doing all the stuff you used to do in your old language.
Most popular languages offer similar operations on collections these days. I've been using Java, Scala, JS/TS, purescript, Ruby, C++, Rust in the past, and these days all have a similar offering to iterate over collections using idiomatic functions such as map/filter/sort/etc.
I'm working on some Python these days and I find the one-line lambdas quite unpleasant, and the options for implementing common collection operations generally messy.
10 years is very optimistic: flatten has little use without a functional pipeline (something like Clojure's threading macro), as in other contexts it's a trivial comprehension.
I think the motivation there historically was to encourage use of generators and generator expressions — it’s definitely intentional, because things like `apply` and `reduce` used to be builtins but now they’re imports.
functools and itertools are amazing and I love them both. They are especially useful for teaching high school CS without having to stray from Python, which the kids at all levels know well and are comfortable with.
However.
Using @cached_property feels like a bad code smell and it’s a controversial design decision. The example given is more like an instance of a factory? Perhaps if the result of the render function was an Important Object™ (not just a mere string) then the function’s callers might not be so cavalier about discarding the generated instance.
I would like to see the calling code that calls `render` more than once in two different places (hence the need for caching with no option for cache invalidation?). When every stack frame is a descendant of main() then there’s no such thing as “two different places”!
    def main():
        page = HN().render()
        while 'too true':
            do_work(page)
            tea_break(page)
Because of the GIL the cache is often pretty useless on a webserver anyway, as each and every request gets a different process not sharing that cache. So then one has to deploy Redis in addition and complicate the stack. It makes sense to have a shared cache when there are multiple instances, but often it's overkill. If I have to go "external" to fetch the data, I might as well not cache anything.
I might have been a bit unclear. Not every request gets a new process, but there are a pool of processes. For instance gunicorn will maybe have ~20 processes ready to serve requests, with a main process sending work to the workers. Since it's different processes, there is very little communication or shared resources between them. It's almost as if you're running your main entry point multiple times in different terminals, just with gunicorn managing it for you.
This is different than for instance java, which there is one process and then multiple threads. But because of the GIL in python, only one thread can ever run at the same time (in the same process), therefore one has to launch multiple instances/processes of the application.
That's true, cache utilization declines as a function of your processing parallelism (and in gunicorn/celery, your max-requests-per-process-before-suicide setting). Cached properties can still help a good deal though, especially when coupled with a) careful prefork initialization of cached data and b) even more careful use of multiprocessing's little-known shared memory interface, which really does allow caches to be shared, without read locks, between multiple processes.
It’s just a smell. It hints that instead of passing the instance, one should instead be passing the output of the function.
It also doesn’t really matter. The only time things like this have actually been important was in giant Java codebases, where attempts were made to keep packages cleanly separated (and so therefore passing the simplest possible public types reduces the amount of public classes).
That's one of the things I love about JS. Sure, you don't have all the good static typing that comes with Scala or OCaml, but map, filter, reduce and lambdas are easily accessible. We may get records, tuples, pattern matching and a pipeline operator at some point!
Huh, it's the little things that are the neatest surprises. Having to write the EEA multiple times during my crypto class to do the homework was a real pain in the ass.
Big shoutout to lru_cache. I tossed in two lines of code and was able to get a 60x speedup in my code by reducing the amount of regular expression compilations I had to do.
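Presumably something along these lines (a sketch of the pattern, not the actual code in question):

    import re
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def compiled(pattern):
        return re.compile(pattern)

    # compiled(r"\d+") now compiles each distinct pattern only once,
    # no matter how often it's requested in a hot loop.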
For people wanting to do more functional programming in Python, have a look at toolz (https://toolz.readthedocs.io/en/latest/). It goes a bit further in that direction.
The only thing I really think is missing from python is good pattern matching (curious to try out the new implementation), and a pipe operator.
Chaining functions is somewhat annoying in Python if you want to do any sort of cross-library work (which is super common on the data side of the language). Pandas has nice function chaining, but if you want to chain data through cleaning functions and maybe do some NLP work, you don't end up with very readable or nice syntax IMO.
> Coconut is a functional programming language that compiles to Python. Since all valid Python is valid Coconut, using Coconut will only extend and enhance what you're already capable of in Python.
> Why use Coconut? Coconut is built to be useful. Coconut enhances the repertoire of Python programmers to include the tools of modern functional programming, in such a way that those tools are easy to use and immensely powerful; that is, Coconut does to functional programming what Python did to imperative programming. And Coconut code runs the same on any Python version, making the Python 2/3 split a thing of the past.
This is how I would expect it to work. Caching mutable collections is a legitimate use case, distinct from caching the immutable values those collections contain.
`lru_cache` does not wrap your cached values in container types on its own, so if you’re getting bit by mutable values it’s probably because what you cached was a mutable container (like `list`, `dict`, or `set`). If you use primitive values in the cache, or immutable containers (like `tuple`, `namedtuple`, or `frozenset`), you won’t hit this problem.
If you were to manually cache values yourself using a `dict`, you’d see similar issues with mutability when storing `list` objects as values. It’s not a problem specific to `lru_cache`.
This isn’t special behavior that lru_cache “implements”, it’s intrinsic to how mutable contained objects work.
Imagine you `lru_cache` a zero-argument function that sleeps for a long time, then creates a new temp file with random name and returns a writable handle for it. It should be no surprise that a cached version would have to behave differently: Only one random file would ever be made, and repeated calls to the cached function would return the same handle that was opened originally.
If you didn’t expect this, you might complain that writes and seeks suddenly and mysteriously persist across what should be distinct handles, and maybe this `lru_cache` thing isn’t as harmless as advertised. But it’s not the cache’s fault that you cached an object that can mutate state outside the cache, like on a filesystem.
Logically, the behavior remains the same if instead of actual file handles we simulate files in memory, or if we use a mutable list of strings representing lines of data. Generalizing, you could cache any mutable Python collection and see that in-place changes you make to it, much like data written to a file, will still be there when you read the cache the next time.
The reason you don’t see “frameworks” for this is because tracking references to instantiated Python objects outside of the Python process is pointless — objects are garbage-collected and are not guaranteed to stay at the same memory location from one moment to the next. Further, if the lists themselves are small enough to fit in memory, surely there’s no need for out-of-memory scale to cache simple references to those objects.
Stepping back, I think part of your surprise toward `lru_cache` stems from familiarity with out-of-core or distributed caches, where the immutability of cached values is simply imposed by the architecture. In a distributed cache, modifying the cached value in-place means making another API call, so you can’t mutate the cached object accidentally because you mistook it for a copy.
The only way you can have this confusion is if the actual cached object could be somehow returned. That only happens when everything in the cache is actually just a Python object in your same process.
I disagree - it works wonders for expensive functions that return strings and ints. Maybe "don't use lru_cache with mutable values" would be more accurate.
Thanks for noting that edge case though, I've never thought about that.
Kinda the same as using mutable values (lists and dicts maybe being the most common pitfalls) as default arguments to functions. They're shared between all function calls.
It’d be annoying for every type used in a cache to need to properly implement __deepcopy__(), and it’d pose significant performance impact, especially if the cached objects are large (which there’s a good chance they are, given you’ve felt the need to cache them rather than build them from scratch).
Much better off using the same assignment semantics used throughout Python and let people choose to deepcopy() all the objects they read from the cache if they really really want to modify them later; it would even work as a simple decorator to stack onto the existing one for such cases.
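Such a stacking decorator is only a few lines (a sketch, with names invented here):

    import copy
    import functools

    def deepcopy_results(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return copy.deepcopy(fn(*args, **kwargs))
        return wrapper

    @deepcopy_results                    # copies on the way out of the cache
    @functools.lru_cache(maxsize=None)
    def build_config(name):
        return {"name": name, "flags": []}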
Aside: If we’re missing “deep coping” anything by default in Python, it’s definitely default parameters :) deep copying caches makes much less sense than that.
> An important fact about assignment: assignment never copies data.
Is that really what's going on here? I'm in way too deep to be sure what's best for beginner programmers, but I feel like Python must surely optimise...
    sheep = 1
    goats = sheep
    sheep = sheep + 10
... by simply copying that 1 into goats, rather than tracking that goats is for now an alias to the same value as sheep and then updating that information when sheep changes on the next line.
Now, if we imagine those numbers are way bigger (say 200 digits), Python still just works, whereas the low level languages I spend most time with will object because 200 digit integers don't fit in a machine word. You could imagine that copying isn't cheaper in that case, but I don't think I buy it. The 200 digit integer is a relatively small data structure, still probably cheaper to copy it than mess about with small objects which need garbage collecting.
The semantics of assignments in Python are not the same as assignment in C. When you assign a local like `x = some_expression` in Python, you can read it as, “Evaluate `some_expression` now, and call that result `x` in this local namespace.”
The behavior that results from your example follows from this rule. First, evaluate `1` and call it `sheep`. Then evaluate whatever `sheep` is, once, to get `1` (the same object in memory as every other literal `1` in Python) and call it `goats`.
The last line is where the rule matters: The statement `sheep = sheep + 10` can be read as, “Evaluate `sheep + 10` and call the result `sheep`.” The statement reassigns the name `sheep` in the local namespace to point to a different object, one created by evaluating `sheep + 10`. The actual memory location that `sheep` referred to previously (containing the `int` object `1`) is not changed at all — assignment to a local will never change the value of any other local.
This is easy to remember if you recall that a local namespace is effectively just a `dict`. Your example is equivalent to:
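    # the local-variable example above, written with an explicit dict
    d = {}
    d["sheep"] = 1
    d["goats"] = d["sheep"]       # the right-hand side evaluates to 1, once
    d["sheep"] = d["sheep"] + 10  # rebinds "sheep" to a new object, 11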
It should be clear even to beginners that `d["goats"]` has a final value of `1`, not `11`, because the right-hand side of `d["goats"] = d["sheep"]` is only evaluated once, and at that time it evaluates to `1`. Assignment using locals behaves in exactly the same way.
> Is that really what's going on here? I'm in way too deep to be sure what's best for beginner programmers, but I feel like Python must surely optimise...
For these particular numbers, CPython has one optimization I know of: Small integers (from -5 to 256) are pre-initialized and shared.
On my system these "cached" integers seem to each be 32-byte objects, so over 8 kB of RAM is used by CPython to "cache" the integers -5 through 256 in this way.
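You can see the sharing from object identity, e.g.:

    a = int("256")
    b = int("256")
    print(a is b)    # True in CPython: -5..256 come from the shared small-int cache
    c = int("257")
    d = int("257")
    print(c is d)    # False: 257 is outside the cached range, so each one is a fresh object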
There's similar craziness over in String town. If I mint the exact same string a dozen times from a constant, those all have the same id, presumably Python has a similar "cache" of such constant strings. But if I assemble the same result string with concatenation, each of the identical strings has a different id (this is in 3.9)
So, my model of what's going on in a Python program was completely wrong. But the simple pedagogic model in the article was also wrong, just not in a way that's going to trip up new Python programmers.
This isn't a pre-made list of certain strings that should be cached, this is the compiler noticing that you mentioned the same constant a bunch of times.
Also in general you would see a lot of things with the same id because python uses references all over the place. E.g. assignment never copies.
Might be interesting as an option for the @lru_cache decorator to be able to specify a function to call before handing a cached value to the user. Then you could just do
    @lru_cache(post=deepcopy, ...)
to have a new (copied) instance in cases where that's required. Or do whatever else you needed. Maybe you happen to know some details about the returnee that would let you get away with copying less and could call my_copy instead.
Something something power of function composition.
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://martinheinz.dev:8080/api/v1/posts/52. (Reason: CORS request did not succeed).
Note to author: I am unable to parse this sentence: "It's a simple wrapper on top of the lru_cache which omits the max_size argument making it smaller and after as it doesn't need to evict old values."