Toolz: A functional standard library for Python (github.com/pytoolz)
166 points by optimalsolver on Jan 21, 2021 | 108 comments

I had a bit of a derp moment here at first. I was like, what? Python already has a functional (as in working) standard lib. Then I read the docs and was like, ohhhhh, that functional.

I think Python made some half measures with regard to functional programming. That's not a bad thing, since Python tends to blend the different styles decently, but at least for me it would be nice to have a good library that extends the functional side a little more. Hopefully this scratches the itch.

Toolz, like seemingly everything Matt Rocklin is a major contributor to, is something of a model library: cleanly designed and coded, with strong documentation.

Although Python is not going to match a full Lisp, Haskell or ML in all their strengths, using a functional style can be useful and expressive. The toolz docs give some relevant background at https://toolz.readthedocs.io/en/latest/heritage.html .

At a language level, Peter Norvig gave a lengthy comparison of Python and Lisp at https://norvig.com/python-lisp.html in 2000.

I've been writing functional code at work for a year or so now with the help of Ramda in Node.js; there is also a port for Python.

My colleagues don't like it because it's so different from what they are used to, but I am putting out code faster and with fewer bugs, so I'm not going to stop.

Why use a library for it? Can it not be done with just Python? And if using the library is much different from normal Python, does it mitigate Python's problems with functional programming? (For example, single-expression lambdas and no TCO.)

I also do use some functional concepts in my Python work, but do not use a library for it. Only procedures or functions. No additional dependencies.

I think `groupby` is a compelling example:


> Most programmers have written code exactly like this over and over again, just like they may have repeated the map control pattern. When we identify code as a groupby operation we mentally collapse the detailed manipulation into a single concept.

If you don't use a library, then you have to re-write something like groupby many times, I would expect. Or WORSE, you don't even use the pattern, writing "code exactly like this over and over again".

You probably know this, but for the other readers, I'd like to note that "groupby" specifically is part of Python's standard library (in "itertools" module)


Don't forget that python is batteries included -- and you can avoid 3rd party dependencies a lot of times.

Huh, interesting. This does not remind me of the database groupby operation, but rather of partition, like in SRFI-1. I mean in natural language it is clear to me, why they'd name it groupby, but in programming terms I think partition is more appropriate, as groupby is already "blocked" by the database operation.

Often one only needs one of the partitions though, which is when filter is sufficient. Otherwise I guess one can easily write partition oneself and then use that function over and over again, without resorting to a library.
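For reference, a minimal sketch of such a hand-rolled `partition`, assuming SRFI-1 semantics (one pass, returning the items that satisfy the predicate and the items that don't). The name and signature here are illustrative; note that toolz's own `partition` means something different (splitting a sequence into fixed-size tuples).

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")

def partition(pred: Callable[[T], bool], seq: Iterable[T]) -> Tuple[List[T], List[T]]:
    """Split seq in one pass into (items where pred is true, items where it is false)."""
    yes: List[T] = []
    no: List[T] = []
    for item in seq:
        # Pick the target list with a conditional expression, then append.
        (yes if pred(item) else no).append(item)
    return yes, no

evens, odds = partition(lambda n: n % 2 == 0, range(10))
print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]
```

With `filter` you would need two passes (or `itertools.tee`) to get both halves; this version walks the input once.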

But perhaps it is a good example, so that you do not have to write partition in every project and if the additional dependency is OK to have, why not, if it is indeed a good one.

> This does not remind me of the database groupby operation, but rather of partition, like in SRFI-1. I mean in natural language it is clear to me, why they’d name it groupby, but in programming terms I think partition is more appropriate, as groupby is already “blocked” by the database operation.

But…this is exactly what a database GROUP BY does. (You’ll always have aggregations in the SELECT clause which work on the data in the groupings, but the GROUP BY itself just specifies splitting the dataset up into this kind of groupings.)

> Otherwise I guess one can easily write partition oneself and then use that function over and over again, without resorting to a library.

Yeah, literally all a library is avoiding having to rewrite code that someone has already written once.

Ah I see. Perhaps the need to always have an aggregation confused me, which is specific to relational databases (all? most of? some?). Of course the aggregation has to work on something and that might be the same as a partitioning. Thanks for clearing that up!

The downside of a library is often that it comes with its own dependencies and a lot of things you might not need. In general, you should not buy into a library just because one is available that, among other things, offers the one procedure you need. The decision to use a library deserves a little more thought.

In some databases like Postgres, you can do something like “SELECT x, array_agg(y) FROM t GROUP BY x” to get exactly the same effect as this groupby operation in toolz.
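A rough plain-Python equivalent of that query, as a sketch: the `rows` list is a made-up stand-in for table `t`, and unlike toolz's `groupby` (which keeps the whole item in each group) this keeps only the `y` values, matching `array_agg(y)`.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table t with columns x and y.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

def group_pairs(pairs):
    """Collect every y under its x, like SELECT x, array_agg(y) FROM t GROUP BY x."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return dict(groups)

print(group_pairs(rows))  # {'a': [1, 3], 'b': [2, 5], 'c': [4]}
```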

The source code for that function (groupby in toolz) is bizarre! Creating a defaultdict where the entries are append-to-list functions, calling them, then going over the dictionary again to extract the underlying list objects. Does anyone know what this pattern is for, and why one wouldn’t just create a defaultdict(list)?

The function: https://toolz.readthedocs.io/en/latest/_modules/toolz/iterto...

> why one wouldn’t just create a defaultdict(list)?

One shouldn't even do that. groupby is supposed to assume that the input is already sorted by the given key, so it can be implemented as a generator. What's more, it should be implemented as a generator (that's the way the Python stdlib's itertools.groupby does it), to avoid having to realize the entire iterable at once.

The Toolz version has a different purpose than the one from Python's stdlib (itertools): one is for unsorted input, the other for sorted input. The Toolz version is not a replacement, and the documentation states:

> Not to be confused with ``itertools.groupby``
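To see the difference concretely: `itertools.groupby` only merges *consecutive* items with equal keys, so on unsorted input the same key shows up more than once. The word list below is just sample data.

```python
from itertools import groupby

words = ["apple", "bob", "avocado", "banana", "cherry", "apricot"]

def first_letter(word: str) -> str:
    return word[0]

# Unsorted input: a new group starts every time the key changes.
unsorted_groups = [(k, list(g)) for k, g in groupby(words, key=first_letter)]
print(unsorted_groups)
# [('a', ['apple']), ('b', ['bob']), ('a', ['avocado']), ('b', ['banana']), ('c', ['cherry']), ('a', ['apricot'])]

# Sorting by the same key first (a stable sort) yields one group per key.
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(words, key=first_letter), key=first_letter)]
print(sorted_groups)
# [('a', ['apple', 'avocado', 'apricot']), ('b', ['bob', 'banana']), ('c', ['cherry'])]
```

The sort costs O(n log n) and realizes the whole input, which is the trade-off the toolz version sidesteps by building a dict in one pass.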

It's the avoiding dots pattern [1].

The idea is that referencing the `append` method in the for grouping loop, i.e. doing

takes more time than rebuilding the `rv` groups dictionary with lists instead of `append` methods. Of course, one should run a benchmark to see how much longer the input sequence needs to be than the number of resulting groups for this to pay off.
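A sketch of the two approaches side by side. The second illustrates the pattern described above (store a bound `list.append` per key, then recover each list through the bound method's `__self__`); treat it as an illustration of the idea, not a verbatim copy of toolz's source.

```python
from collections import defaultdict

def groupby_plain(key, seq):
    """Straightforward version: looks up .append on every iteration."""
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    return dict(groups)

def groupby_bound_append(key, seq):
    """'Avoiding dots' version: each dict value is a bound list.append method."""
    appenders = defaultdict(lambda: [].append)  # a fresh list for every new key
    for item in seq:
        appenders[key(item)](item)  # no attribute lookup inside the loop
    # A bound method keeps a reference to its list in __self__.
    return {k: append.__self__ for k, append in appenders.items()}

words = ["a", "bb", "c", "dd", "eee"]
print(groupby_bound_append(len, words))  # {1: ['a', 'c'], 2: ['bb', 'dd'], 3: ['eee']}
```

Both return the same result; whether the second is measurably faster depends on how many items land in each group, so something like `timeit` on realistic data is the way to decide.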

[1]: https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Avo......

Hell yeah, me too! Some of my coworkers like it, but there are a few out there who basically don't want to learn something new. I have been using Ramda professionally for about two years now and I love the heck out of it. My org is using more and more TypeScript, which makes using Ramda a little harder, but it still works pretty well in that context too. I have been working on a test data inventory project with Ramda recently, and it has made the implementation so much nicer than it would have been otherwise. I started using pipeP (by defining it with pipeWith) and omg, it's so great for dealing with a mix of async and non-async calls.

Are you using andThen with pipeWith?

We have started using types too. I've noticed that some of the types in @types/ramda are wrong: they only allow arrays, but functions like R.startsWith work with strings too.

I might do a PR and update a few when I have time.

I've dealt with a noob Python codebase that had many deep side effects spread over multiple files (~2000 LOC total). Using FP idioms, the overall logic fit into a single page, bug-proof and easy to augment.

Yet people want to stick with a "style".


Exactly. I have many discussions with a front-end dev about this sort of thing. Our FE team uses Angular, so it is very class-focused.

I use classes too, but that doesn't mean I can't be functional inside them.

It is annoying when devs religiously stick to a style even if that makes the codebase worse.

> My colleagues don't like it because it's so different from what they are used to

Is it worth being faster if it makes everyone else slower?

I do take this into account in my code: we use types, so I fully type my code, write comments, and give functions fitting names.

I find classes much harder to follow than functional code. Should they change their style because it makes me slower?

I think we all need to play to our strengths and as a team make it so it's as easy as possible to follow code even if it's in a style you don't fully understand.

Yeah let's all code for the lowest common denominator.

Functional programming in Python is severely hampered by Python's lambda keyword, which - simply put - sucks.

I like the way the Functional Programming HOWTO in the Python docs (https://docs.python.org/3/howto/functional.html#small-functi...) puts it:

  Fredrik Lundh once suggested the following set of rules for refactoring uses of lambda:

  1. Write a lambda function.
  2. Write a comment explaining what the heck that lambda does.
  3. Study the comment for a while, and think of a name that captures the essence of the comment.
  4. Convert the lambda to a def statement, using that name.
  5. Remove the comment.

Yeah, Python would really benefit from multi-line anonymous functions.

There was a limitation in Python where spacing and indentation get ignored between parentheses, which makes it impossible to pass a multi-line lambda as an argument to a method or function. However, with the new parser, that limitation might now be possible to work around.

I would argue that multi-line anonymous functions are unpythonic. Exhibit A is this line from PEP 20:

  Readability counts.

I'm currently about a month into learning a legacy codebase that was written in a functional language. If I could single out one thoroughly egregious practice that has made this code far, far more difficult to read and understand than it should have been, it's multi-line anonymous functions. In general, if a function is doing something complicated enough to need multiple lines of code, it's doing something complicated enough to merit an explicit name.

> if a function is doing something complicated enough to need multiple lines of code, it's doing something complicated enough to merit an explicit name.

  def foo_and_bar(x):
    foo(x)
    bar(x)

whew! good thing i named that

IME this limitation just leads to throwaway names like

  do_foo(x) # in foo(x)

because a lot of things just don't have sensible names! just like how it'd suck to have to come up with a name for every loop body

(although there was some book advocating for "replace every loop body with a named function" so some people enjoy that i guess...)

I see you chopping the first two words off that quote. ;)

In this specific case, that single instance of a single pattern is such a throw-away that it doesn't deserve a name, but the pattern itself is easy enough to name. So I'd skip the single-purpose function and create a combinator.

  def do_each(*args):
    def helper(x):
      for fn in args:
        fn(x)
    return helper

and then, when I need to do both foo and bar, I don't even need a lambda.

  map(do_each(foo, bar), some_sequence)

That's a fairly specific case, though. Moving back to the general, I would say that a function that does more than one thing, but can't easily be named, is a code smell.

Of course, every general rule has its exceptions. But I'm not so keen on the idea of optimizing one's coding style for the exceptional cases. Going back to PEP 20, "Special cases aren't special enough to break the rules."

(I realize mapping a function that returns nothing is terrible, but I'm feeling too lazy to think of a better example.)
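One more reason mapping a side-effecting function is awkward in Python 3: `map` is lazy, so nothing runs until the iterator is consumed. A small sketch, re-declaring the combinator above (with the parameter renamed `fns`) and using hypothetical `foo`/`bar` functions that just record their calls:

```python
def do_each(*fns):
    """Combinator: return a function that calls every fn in fns on its argument."""
    def helper(x):
        for fn in fns:
            fn(x)
    return helper

calls = []

def foo(x):
    calls.append(("foo", x))  # hypothetical side effect

def bar(x):
    calls.append(("bar", x))  # hypothetical side effect

lazy = map(do_each(foo, bar), [1, 2])
print(calls)  # [] -- the map object has not been iterated, so nothing ran

# A plain for loop makes the side effects explicit and eager.
for item in [1, 2]:
    do_each(foo, bar)(item)
print(calls)  # [('foo', 1), ('bar', 1), ('foo', 2), ('bar', 2)]
```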

i like a nice combinator as much as the next person! but consider this: if python already had multiline lambdas, would you be arguing for using a narrow-purpose combinator instead of `lambda x: foo(x); bar(x)`?

[this kind of reminds me of Go's generics mess, where workarounds for lack of generics are "just how you write Go and that's the language's philosophy"... until generics land and suddenly they won't be]

Probably. I think the former is more readable and, at the use site, more concise.

Regardless, I don't think the hypothetical is super useful, because its unstated major premise is, "But what if we, for the sake of argument, ignore all the other good reasons why Python doesn't have them?"

My favorite programming language is functional, and has significant whitespace and multiline anonymous functions. While it is my favorite, I do have to concede that the Python language maintainers' worries about the syntactic implications of multiline lambdas in a whitespace language are accurate.

(I could quote the zen of python some more here, too. Lines 5 and 6.)

Assignment expressions were "unpythonic" based on PEP 20 until they weren't. Modern languages have multi-line anonymous functions, and developer expectations have changed in the 17 years since PEP 20 was published.

the thing with python is that it's "indentation-based" -- it's difficult to see "where does this anonymous function return or end?"

maybe it's time to allow optional brackets for marking where a function starts and ends? eg. a { ... } (or anything else -- even {| ... |} will do)

But you can define functions inside of functions (recursively).

  def function1():
      print("hello outer")
      def function2():
          print("hello inner")
      function2()

  function1()

  hello outer
  hello inner

But you can’t do that (or try/except) in an expression such as a list comprehension or similar.

Yes, because gross.

Nor, IMO, should you want to.

Why not? Many other languages have closures and even Python has lambdas. Multi-line closures work well for these languages including in map/reduce/etc or comprehensions where applicable. Seems like your preference is overfit to Python’s limitations.

Because you can do it in a function. It's not a limitation when you can do it. Having to define a function that's easier to read, and testable as an independent unit, is not a problem.
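A sketch of that approach: `try`/`except` can't appear inside a comprehension, but a small named function can, and it is testable on its own. `safe_int` is a hypothetical helper, not a standard library function.

```python
from typing import Optional

def safe_int(text: str) -> Optional[int]:
    """Parse text as an int, returning None instead of raising ValueError."""
    try:
        return int(text)
    except ValueError:
        return None

raw = ["1", "two", "3", ""]
parsed = [safe_int(s) for s in raw]
print(parsed)  # [1, None, 3, None]
```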

I don’t think this limitation is likely to change - the only ways to allow whitespace-sensitive statements inside an argument list are all really ugly.

If you really want though, you can write a web server in a single python lambda:

    (lambda flask:
        (lambda app:
            (app.route('/')(lambda: 'Hello World!'), app.run())
        )(flask.Flask(__name__))
    )(__import__('flask'))

See https://gist.github.com/e000/1023982 for more horrible examples

I hate it, but it is clever.

You can do multiple lines, just not multiple statements:

    >>> f = lambda x: x * \
    ...  20
    >>> f
    <function <lambda> at 0x7f8b69f2c6a8>
    >>> f(2)
    40

In fact, no statements at all - just a single expression.

well, you could pack a lot of other expressions into that expression, especially with 3.8/3.9, e.g.

  x, y = 1, 2
  q = list(map(lambda t: (tx := t*x, ty := t*y, tx+ty)[-1], [1, 2, 3]))

Walrus operator: not even once.

walrus, actually, being an expression, it's a perfect fit for lambdas

Agreed. I suspect that most uses of python decorators become moot with proper multi-line anonymous functions. I assume some would argue that decorators create more readable code but they seemed like a syntax hack to me.

Functional programming in Python is severely hampered by the fact that Python is statement-oriented rather than expression-oriented.

The lambda limitations are a product of this plus the particular whitespace rules Python has.

This has really not been true since Python grew a ternary operator. Syntactically, you can write functional code perfectly fine in Python; the only significant syntactic shortcoming is the lack of pattern matching.

What actually hampers functional programming is semantics, and that is mostly two-fold:

1. Function calls are terribly slow and always grow the stack, so recursion cannot replace iterative constructs.

2. Although python actually comes with a fair amount of functional data structures out of the box (str, bytes, tuple, frozenset, namedtuple, ...) none of them can be "updated" efficiently.

There are some other things (exception as opposed to sum-type based error handling), but fixing the above would be enough to write functional code pretty unhampered, I think.
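To make point 2 concrete: "updating" a tuple or namedtuple means building a brand-new one, copying every element reference, so each update costs O(n) in the size of the structure. A minimal stdlib illustration (`tuple_set` is a hypothetical helper):

```python
from collections import namedtuple

def tuple_set(t: tuple, i: int, value) -> tuple:
    """Return a copy of t with index i replaced; the whole tuple is rebuilt."""
    return t[:i] + (value,) + t[i + 1:]

original = (1, 2, 3, 4)
updated = tuple_set(original, 2, 99)
print(original, updated)  # (1, 2, 3, 4) (1, 2, 99, 4)

Point = namedtuple("Point", ["x", "y", "z"])
p = Point(1, 2, 3)
q = p._replace(y=20)  # also builds a new tuple rather than sharing structure
print(p, q)  # Point(x=1, y=2, z=3) Point(x=1, y=20, z=3)
```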

Pattern matching is on the roadmap for Python 3.10[1].

[1] https://www.python.org/dev/peps/pep-0622/

> Although python actually comes with a fair amount of functional data structures out of the box (str, bytes, tuple, frozenset, namedtuple, …) none of them can be “updated” efficiently.

If they can’t be updated efficiently, they are just immutable rather than particularly functional.

Of course I agree that calling such data structures immutable is more precise, but it's still in contrast to both imperative programming and imperative data structures. In particular, every single functional language (including the purest of the pure) uses immutable array-based structures fairly heavily in one way or another. So if functional programming requires only using data structures with structural sharing, there are no functional programming languages.

Not really.

Python's lambda does pretty much what you would expect from Scheme. It creates a callable that binds arguments to parameters in a lexically scoped namespace and then evaluates the body of the lambda in that namespace. And with an open parenthesis, you can write multiline lambdas and indent them however you want.

All the usual wizardry is possible:

  >>> (
  ...     lambda n: (lambda fact: fact(n, fact))(
  ...         lambda n, inner: 1 if n == 0 else (n * inner(n - 1, inner))
  ...     )
  ... )(5)
  120

Also, Python has rough equivalents to some special forms in Scheme:

  (if testexp posexp negexp)  ⟶  (posexp if testexp else negexp)
  (cond (p1 e1) (p2 e2) (else e3))  ⟶  (e1 if p1 else e2 if p2 else e3)
  (begin e1 e2 e3)  ⟶  [e1, e2, e3][-1]
  (and e1 e2 e3)  ⟶  (e1 and e2 and e3)
  (or e1 e2 e3)  ⟶  (e1 or e2 or e3)

Some statements do have a functional form but aren't well known:

  class keyword  ⟶ type(name, bases, namespace)
  import keyword ⟶ __import__(name, globals, locals, fromlist)

You have map(), filter(), partial(), and reduce(). The operator module provides function equivalents for most operators. Also, the itertools were directly based on their equivalents in functional languages or array manipulation languages.
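A quick sketch exercising those functional forms together (the `Point` class and `product` helper are made up for illustration). In real code, `importlib.import_module` is the documented preferred alternative to calling `__import__` directly.

```python
import operator
from functools import partial, reduce

# `class` statement as a function call: type(name, bases, namespace)
Point = type("Point", (object,), {"norm2": lambda self: self.x ** 2 + self.y ** 2})
p = Point()
p.x, p.y = 3, 4
print(p.norm2())  # 25

# `import` statement as a function call
math_mod = __import__("math")
print(math_mod.sqrt(p.norm2()))  # 5.0

# operator functions instead of lambdas, combined with reduce and partial
product = partial(reduce, operator.mul)
print(product([1, 2, 3, 4]))  # 24
```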

That said, Python does lack some essential tooling that you would really miss:

  - There is no way to create new special forms.
  - Some important special forms are missing: let, let*, and letrec
  - The language spec precludes tail call optimization.
  - Some Python statements lack functional equivalents: try/except, with-statement, and assert-statement

Actually there is a let and let* equivalent as well:

  (let* ((v1 e1) (v2 e2)) e3)  ⟶ ((v1:=e1), (v2:=e2), e3)[-1]

DrRacket example:

    > (let* ((x (+ 3 4)) (y (* x x))) (+ y (- x) 8))
    50

Python equivalent:

    >>> ((x:=3+4), (y:=x**2), y-x+8)[-1]
    50

Yes, with walrus expression, lambda gains a lot of flexibility

Except:

- Python scoping is not really lexical (and it captures variables, not the values they contain).
- Python is statement-based, so lambdas are less powerful than in Scheme.

So not really the same thing.
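A quick illustration of the "captures variables, not values" point, plus the usual default-argument workaround:

```python
# Each lambda closes over the variable i, not the value i had when it was created.
late = [lambda: i for i in range(3)]
print([f() for f in late])  # [2, 2, 2]

# Binding the current value through a default argument captures it at creation time.
early = [lambda i=i: i for i in range(3)]
print([f() for f in early])  # [0, 1, 2]
```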

Given that Python allows unicode characters in source, I'm surprised it doesn't allow lambda to be replaced by the λ character.

For those who, like me, get excited by this: note that there are restrictions on the Unicode categories allowed in identifiers; see the supported characters¹ and the gory details². It is often enough to write math-y code in a usable way, but occasionally you'll find you can't use the character you want.

¹ https://docs.python.org/3/reference/lexical_analysis.html#id...

² https://www.python.org/dev/peps/pep-3131/

That would be quite unpythonic. Down that road lies Raku.

EDIT: To be clear, I like Raku. But Python has its own distinct aesthetic.

Well you can't do `l = lambda` to replace it either, it's not really anything to do with allowing full unicode identifiers.

I'm surprised that you're surprised.

Also by the lack of persistent data structures with structural sharing. That forces you to take an expensive copy whenever you want a modified version of some data, or bash it in place.

There's Pyrsistent[1], which provides persistent data structures.

[1] https://github.com/tobgu/pyrsistent

Isn't it a true functional way?

There are smarter ways to deal with data structures in the functional setting than copying the entire structure over and over again.

See https://en.wikipedia.org/wiki/Purely_functional_data_structu... and Okasaki’s book Purely Functional Data Structures.

I could definitely be wrong but I think most functional data structures aren’t fully copied. Instead, they’ll utilize something like a pointer to the data that stayed the same so only the part that changes is “copied”.
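The simplest instance of that idea is an immutable singly linked list: prepending creates one new cell and shares all the existing ones. This is a stdlib-only sketch (`Node`, `cons`, and `to_list` are illustrative names); libraries like Pyrsistent use tree-based structures to get the same sharing for vectors and maps.

```python
from typing import Any, NamedTuple, Optional

class Node(NamedTuple):
    """An immutable singly linked list cell."""
    head: Any
    tail: Optional["Node"]

def cons(value: Any, lst: Optional[Node]) -> Node:
    """Prepend in O(1): the new list shares every existing cell with lst."""
    return Node(value, lst)

def to_list(lst: Optional[Node]) -> list:
    """Walk the cells into a regular Python list (for display)."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out

base = cons(2, cons(3, None))  # 2 -> 3
a = cons(1, base)              # 1 -> 2 -> 3
b = cons(9, base)              # 9 -> 2 -> 3
print(to_list(a), to_list(b))  # [1, 2, 3] [9, 2, 3]
print(a.tail is b.tail)        # True: both versions share the same cells
```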

I imagine it is problematic that Python is already very slow, and then to pile up idioms with performance downsides on top of that is kind of rough.

Functional programming doesn’t have an inherent performance downside, but it depends on specific optimizations in compilers and runtimes to make it fast. Python doesn’t really optimize anything from what I know.

This is one of the biggest reasons I really like (the python superset) Coconut: https://github.com/evhub/coconut

It looks interesting, but I’m not entirely sold. All the examples are about math functions, which is kind of stupid to implement in Python.

How does this benefit “plumbing”?

My biggest problem with Python is that I either have to write


Or try to avoid all the variables by using classes... which is fine... if it wasn’t for self taking up roughly a third of the word count.

Can you elaborate on this? What is wrong with Python's `lambda`?

Like others have said:

- Can only have one line

- Can only use expressions, not statements. E.g. `print`s, loops, conditionals are out.

- Overall just kinda clunky

Here's an SO post about lambdas where the answer is "Use def instead." https://stackoverflow.com/questions/14843777/how-to-write-py...

Exactly. Kind of an own goal, too, from none other than the ex-BDFL. While not totally obvious, it's not impossible to widen the syntax to allow multi-line lambdas; you just need to ditch the stack-based, lexer-integrated whitespace sensitivity behaviour.

This works for conditionals:

    In [4]: f = lambda x: "Yes" if x else "No"

    In [5]: f(True)
    Out[5]: 'Yes'

    In [6]: f(False)
    Out[6]: 'No'

It's Python's version of a ternary operator, so I'm not sure if that counts as a "true" conditional, but it is one.

Loops don't work, but list comprehensions do, and they are definitely the way to go here. Multi-line loops deserve a `def`.

> Can only use expressions, not statements. E.g. `print`s, loops, conditionals are out.

print is a function (and thus can be used in expressions)

python has conditional expressions (<true-val> if <cond> else <false-val>)

loops are a limitation, though comprehensions, map(), functools.reduce(), and the itertools module can allow lots of looping functionality in an expression.
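A few sketches of looping done entirely inside single-expression lambdas (the lambdas are assigned to names here only so they can be called in the demo):

```python
from functools import reduce
from itertools import accumulate, takewhile

# A comprehension replaces a filtering loop.
squares_of_evens = lambda xs: [x * x for x in xs if x % 2 == 0]
print(squares_of_evens(range(7)))  # [0, 4, 16, 36]

# reduce folds a sequence into a single value.
total = lambda xs: reduce(lambda acc, x: acc + x, xs, 0)
print(total([1, 2, 3, 4]))  # 10

# itertools composes lazily: running totals while they stay below 10.
small_prefix_sums = lambda xs: list(takewhile(lambda s: s < 10, accumulate(xs)))
print(small_prefix_sums([1, 2, 3, 4, 5]))  # [1, 3, 6]
```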

print() is a function now and you can use it with lambdas.

Even in Python 2, you can do something like this:

    import sys
    println = lambda s: sys.stdout.write(s + '\n')

Not that it really makes things much better, but at least it shows you can do it.

The lambda calculus is Turing complete, so in theory, Python’s lambdas should suffice...

Turing completeness is completely unrelated to whether or not anonymous functions are easy to use in Python.

Which of course says nothing about usability and readability.

I mean... You're right. They suffice. They're just less friendly (i.e. useful) than lambdas in other languages.

Python’s `lambda` can only contain a single expression.

There’s no good way to add support for full anonymous functions to Python’s grammar. One of the rules that makes significant-whitespace work elegantly is that statements can contain expressions, but never vice-versa.

Plenty of languages with significant whitespace have multiline anonymous functions, like Haskell, Standard ML, OCaml, etc. Maybe it's not possible in Python for some other reason, but the reason is not that the syntax is based on indentation.

The syntaxes of those languages are fundamentally different from Python’s. They don’t have an “expressions may not contain statements” rule - they don’t even have statements.

It sounds like you agree with me: the reason Python does not have multiline lambdas is not that it has significant-whitespace, but other decisions including the “expressions may not contain statements” rule.

Nitpick: neither OCaml nor SML has significant whitespace.

You're right! It's been too long since I used them. I think I was probably remembering F#'s light syntax.

It can only have one line for example.

no, not correct

Since lambda is used for simple, short anonymous functions most of the time, why do I need to type the whole word each time? Could they also do what JavaScript does (or something similar):

    x => x * 2

instead of

    lambda x: x * 2

Until very recently, Python’s grammar was strictly LL(1), so the parser couldn’t handle, for example, `(x, y) => x * y`.

Perhaps with the move to a PEG parser, this syntax could now be supported?

I like:

    λx: x*2

The use of lambda is becoming something of a smell in Python in general. Linters will complain about it (pycodestyle's E731 flags assigning a lambda to a name) and pretty much always say to just use a def instead.

Also the recursion limit.

And no tail-call optimization.

Intentional, to improve tracebacks. There are trade-offs with every design decision.
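What that means in practice, as a sketch: plain recursion hits CPython's recursion limit (usually 1000 frames by default), while a hand-written trampoline keeps the stack flat by returning zero-argument thunks that a loop keeps calling. `trampoline` and `count_down_thunked` are illustrative names, not stdlib functions.

```python
import sys

def count_down(n: int) -> int:
    """Plain recursion: every call adds a stack frame."""
    return 0 if n == 0 else count_down(n - 1)

print(sys.getrecursionlimit())  # usually 1000 by default in CPython
try:
    count_down(100_000)
except RecursionError as exc:
    print("RecursionError:", exc)

def trampoline(fn, *args):
    """Keep calling while fn returns a zero-argument thunk instead of a value."""
    result = fn(*args)
    while callable(result):
        result = result()
    return result

def count_down_thunked(n: int):
    # Return a thunk for the next step instead of recursing directly.
    return 0 if n == 0 else (lambda: count_down_thunked(n - 1))

print(trampoline(count_down_thunked, 100_000))  # 0
```

The catch with this simple version is that a function can no longer return a callable as its final value, and each step pays for an extra function call.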

I prefer functional style whenever I can. I noticed that this style slows Python programs down. It seems function calls are rather expensive. Does anybody have a similar experience?

PEP 590[1] makes calling callables less expensive. It was finalized for Python 3.9.

[1] https://www.python.org/dev/peps/pep-0590/

That sounds promising, thanks!

It's been a while since I knew those kinds of details, but it has certainly been the case in the past.


Yes, I suspect it is a pitfall of a multi-paradigm language that cannot assume as much about code in order to optimize. Opinionated functional languages (e.g. Clojure, Haskell) can have a lot more guarantees about what is going on in order to optimize all those function calls, lexical bindings, etc.
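An easy way to check the call overhead on your own machine is `timeit`. This sketch times the same computation inline, through a named function in a comprehension, and through `map`; no numbers are shown because they vary by machine and Python version.

```python
import timeit

def square(x):
    return x * x

data = list(range(1000))

inline = timeit.timeit(lambda: [x * x for x in data], number=1000)
via_call = timeit.timeit(lambda: [square(x) for x in data], number=1000)
via_map = timeit.timeit(lambda: list(map(square, data)), number=1000)

print(f"inline: {inline:.3f}s  call: {via_call:.3f}s  map: {via_map:.3f}s")
```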

There's a bunch of oddities in Python, e.g. recursion limits.

They have to check a recursion limit, for one.

For a type safe approach to functional programming in python, try returns (https://returns.readthedocs.io/en/latest/).

It always amazes me the lengths people will go to to avoid learning & using a different language.

Perhaps if you want statically-typed functional programming, you shouldn’t be using Python? It’s not the best choice for that - in fact, it’s almost the worst choice.

Looks like a very nice more functional-focused alternative to pydash[1] (which is a Python port of lodash.js, which in turn is a superset of another library called underscore.js - whew).

[1] https://pydash.readthedocs.io/en/latest/index.html

I've always found logging and debugging to be just so darn tedious in functional realms.


How do you easily log/instrument f(g(h(x)))?

Perhaps a custom compose function can help with these use cases? This series has a few examples of composing computation in Python that might be useful.


And there's a general list of resources here.


Not sure how convenient this is in practice, but the docs mention using the "do" function [1]

[1] https://toolz.readthedocs.io/en/latest/api.html#toolz.functo...
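The idea behind `do` (run a function for its side effect, pass the value through unchanged) is easy to sketch with just the standard library. Here `tap` and `compose` are hypothetical helpers, and `f`, `g`, `h` are stand-in pipeline stages:

```python
import logging
from functools import reduce

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("pipeline")

def tap(label):
    """Return an identity function that logs its input, for splicing into a pipeline."""
    def _tap(value):
        log.debug("%s: %r", label, value)
        return value
    return _tap

def compose(*fns):
    """compose(f, g, h)(x) == f(g(h(x))): functions apply right to left."""
    return lambda x: reduce(lambda acc, fn: fn(acc), reversed(fns), x)

def h(x):
    return x + 1

def g(x):
    return x * 2

def f(x):
    return x - 3

traced = compose(tap("f"), f, tap("g"), g, tap("h"), h, tap("x"))
print(traced(5))  # 9, logging x: 5, h: 6, g: 12, f: 9 along the way
```

Removing the `tap(...)` entries gives back the plain `compose(f, g, h)`, so the instrumentation never touches the stage functions themselves.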
