Hacker News new | past | comments | ask | show | jobs | submit login
Speeding up function calls with lru_cache in Python (hackeregg.github.io)
125 points by Immortal333 12 days ago | hide | past | favorite | 79 comments

This technique is called Memoization [0]

Here is an implementation of a memoize decorator in Python that will support all inputs [1]. You'd have to modify to not use all the Pylons framework stuff.

[0] https://en.wikipedia.org/wiki/Memoization

[1] https://github.com/reddit-archive/reddit/blob/master/r2/r2/l...

Caching the result is not speeding up function calls.

One way to optimize performance is to change program text from one thing into something else. In this case, "function calls" refers to the text before the transformation is applied.

It seems reasonable to me to describe an optimization by identifying where it can be applied. If someone wrote a blog post titled "optimizing multiplication by 2" and went on to describe changing "x * 2" into "x + x", there would be nothing unusual about that. It would no longer be multiplication, but we wouldn't necessarily expect it to be.

I agree it can be interpreted another way in this case, though. So while I maintain it is not wrong, I admit it is ambiguous. Someone could have used the same title had they written about a one-line optimization to the Python interpreter.

It is, though. It doesn’t speed up function execution, but it literally speeds up the function call.

It only speeds up calls to pure (side-effect free) functions, that have been executed before with these exact parameters and where the result is still cached. All other calls get slower.

Also, there's an argument to be made that "function call" is the overhead involved in moving control from the call site to the callee (which can be significant in interpreted languages such as Python). That's at least how I interpreted it at first.

It speeds up recursive functions that haven’t been called before with the same arguments. In this case, it caches all the intermediate results that would otherwise have to be recalculated.

No, you have to be more specific than "recursive functions", many recursive functions will see no such benefit. A simple example is factorial function: adding @lru_cache to that will only slow it down.

This technique ("memoization") is a technique from dynamic programming, and it only works on very particular types of functions. The two criteria are:

* Optimal substructure: the function has to be a recursive function that can be solved by solving "smaller" versions of the same function (this is roughly what people mean when they say a function can be solved "recursively")

* Overlapping subproblems: the smaller sub-problems have to overlap, so that memoizing them actually introduces a benefit. For instance, in order to calculate F(10) (F(n) being the Fibonacci sequence), you have to calculate F(9) and F(8), and in order to calculate F(9), you have to calculate F(8) and F(7). That means that F(10)'s subproblems overlap with F(9)'s (both require F(8)).

For the factorial function, the first is true, the second is not. There are no overlapping subproblems for calculating the factorial, so memoizing the function in order to speed it up is pointless.

This is a very particular kind of problem. It's a type of problem you learn about in computer science class, have to face in terrible developer job interviews, and find all over the place on Project Euler. It's is very rare that you actually encounter it in the wild. Giving the advice "@lru_cache speeds up functions!" is misguided: it only speeds up a very particular kind of function. It slows down basically all other pure functions. Also: if you've identified a problem of this kind, memoization is not the fastest way to solve it: turning the recursion upside-down and iterating usually is.

The real use-case for @lru_cache, IMHO, is to use it as an actual cache: like, caching database entries or other data you'd need to do I/O to fetch. That is a much more reasonable use case.

I agree, I think of function call as the time it takes to copy the stack frame, setup variables, and move the program counter. I'm not sure that adding a decorator slows this down, but I suspect it does. A better title would just be 'add memoization in one line in python' or something.

It speeds up algorithms that are amenable to memoization.

Author here. Sorry, If it feels misleading. I have no intention to mislead anybody. If you can suggest alternative title, I am happy to change title.

Thanks for article

Just a side note: with Fibonacci + caching it solidly become Dynamic programming problem so Time complexity reduces from quadratic to o(n), IIRC .

There is a whole class of problems where recursion + memoization(caching) = Top Down Dynamic programming , The other way to Increase performance and actually reduce call stack in these class of problems including Fibonacci would be Bottom Up Dynamic Programming

Some gists I found on it https://gist.github.com/trtg/4662449

Plugging my own writing, but I did a deep dive on top-down vs. bottom-up dynamic programming on my blog: https://avikdas.com/2019/04/15/a-graphical-introduction-to-d...

The follow-up posts go into even more detail on individual problems that can be solved using either form of dynamic programming, including some real-world problems like content-aware image resizing.

Thank you for the link, I enjoyed the post a lot. The drawings you've done add a lot to the explanation. Do you have any other deep dives you've done in the same style that you'd recommend?

Thanks! I'm glad the drawings were helpful.

Most of my writing can be found at my blog (https://avikdas.com/), but here are some from the same dynamic programming series:

- Deep-dive into the Chain Matrix Multiplication Problem - https://avikdas.com/2019/04/25/dynamic-programming-deep-dive... - Real-world applications of DP: https://avikdas.com/2019/05/14/real-world-dynamic-programmin... and https://avikdas.com/2019/07/29/improved-seam-carving-with-fo... - Another real-world application, this time in machine learning - https://avikdas.com/2019/06/24/dynamic-programming-for-machi...

If you look on my blog, you'll also see my recent series is on scalability, things like read-after-write consistency and queues for reliability.

Yes, I have mentioned DP in my article as just reference. The Purpose of article was to show that how easy it is to implement caching in python using lru_cache decorator which is in std lib. So, We don't have to install any 3rd party lib.

oh sorry , somehow I missed it :)

the simple recursive solution is actually O(fib(n)), which is exponential

I'd suggest something along the lines of "Speeding up functions [in Python] by caching results" :)

"python memoization in one line"

Edit: I don't actually have much of a problem with the article headline as it is now. It depends a lot on your target audience! For Hacker News, yeah, we know what memoization is because we learned it in CS 102, right after we learned about recursion.

But for a lot of people who would get the most value from the article, the word "memoization" isn't going to mean much, and wouldn't read the article.

Maybe something like: "Memoization in Python, or how I learned to speed up function calls in just one line"

I had worked on a open source project called 'safecache' on the similar note. As others has already commented, @lru_cache does not play well with mutable data structures. So my implementation handles for both immutable and mutable data structures as well as multi-threaded operations.


Thanks for this.

(Fibonacci numbers have a closed form analytic solution that can be computed in constant time: https://www.evanmiller.org/mathematical-hacker.html)

Yep, this is called Binet formula, and actually all linear recurrence sequences have one.

What defines 'linear' here?

The solutions form a vector space

I regret that does not elucidate. Does linear mean additions only? Or non-exponentiation, or something else?

Linear means that f(n) is a linear function of f(n-1)...f(n-k). This means that you can calculate f(n)...f(n-k+1) in terms of a matrix multiply on f(n-1)...f(n-k) (most of the matrix is empty, with ones just off the diagonal). Taking powers of the matrix lets you take more steps. Diagonalizing the matrix lets you take powers in constant time. (You may have to worry about precision. Often rounding to nearest integer after that fixes things though). If you are worried about that, or just don't have floating point, the "fastexp" algorithm turns it into logarithmic time.

If A and B are a solution, then A+B is also a solution. For a constant r, if C is a solution, r*C is also solution.

That's it. The nice closed form of exponentials as solutions is just for linear difference equations with constant coefficients. In that case, you can always find a solution by trying λ^n and seeing what that forces λ to be -- it'll be a root of some polynomial.

Does this work in practice for large values of n? You will be limited by numerical error in phi.

It does not! It fails very early (I tested it once in Python, and it failed on something like F(70) or something) and should not ever be used for practical calculation of Fibonacci numbers.

For small values, the iterative version works just fine. For larger values, there are other methods: one obvious one is using the matrix form of the Fibonacci numbers, which just requires you to take an exponent of a 2x2 matrix (which you can do quickly because of exponentiation by squaring).

The Fibonacci series grows so fast that for "large" values of N you'll need a arbitrary size integer implementation anyway, so at that point you might as well go for a (non-IEEE) arbitrary size float type and get all the significant digits you need.

You’ll still need to compute the digits of Phi though, and then exponentiate that. My intuition is that using ints is still faster.

Calculating powers isn't constant, as far as I know.

I recently discovered that joblib can do something similar, both on disk and in memory: https://joblib.readthedocs.io/en/latest/memory.html

This memoizes closures of functions, and lets them be executed on other processors? Does it support dependency tracking between memoized closures (incremental recomputation), or do I have to roll that, myself?

Unfortunately you need to roll that yourself in joblib. But it is automatic in bionic (https://github.com/square/bionic), assuming your workflow can be structured in the way bionic likes.

Bionic is what I'm looking for (right on their use-case page): a Make replacement that can work with intermediate computations, and not just files.

> As, we can see the optimal cache size of fib function is 5. Increasing cache size will not result in much gain in terms of speedup.

Try it with fib(35), curious what you find.

After, reading your comment. I cross checked result and found out that 3 is most optimal size. I even ran fib(40) on it. with size 2, There are many misses but after 3 and on wards misses are constant(if fib(40), then misses are only 40 which emulates DP approach of O(N)).

Why 3 is optimal, because of how recursion and LRU work. I wish i can explain it using animation.

You can play with it. https://repl.it/repls/NocturnalIroncladBytecode#main.py

It makes sense. f(n) depends on f(n-1) and f(n-2). So if the cache is able to produce these 2 values, you basically get the linear algorithm from the article. I assume the running f(n) also takes up a cache slot, hence 3 instead of 2.

If this theory is correct, every recursive function f(n) requiring access to f(n-x) should have x+1 as maximum usefull cache size.

Why would fib(35) be more optimal with a differently sized cache? And how come _5_ is optimal? Wouldn’t 2 be all you need? What am I missing?

I think it's likely that the author's analysis of why 5 is the optimal cache size heavily depends on the fact that they only tried fib(10). For example, it's clear that cache 20 and cache 30 are identical if your universe of possible inputs is only [0 .. 10], haha.

Because of the way the recursion goes down the tree depth-first, I think 5 basically cuts down the computation to fib(10-5) = fib(5) which is pretty manageable, so the author couldn't really see any further measurable performance gains by increasing the cache further. I think for fib(35) it'd be clear that cache size 35 would help compared to cache size 5. (I picked 35 instead of, say, 300, because I think 300 would just not finish with cache size 5, it'd take forever haha.)

Maybe I'm abusing lru_cache but another use for it is debouncing.

We had a chatbot that polls a server and sends notifications, but due to clock skew it would sometimes send two notifications. So I just added the lru_cache decorator to the send(username, message) function to prevent that.

I noticed that anytime I thought I was clever by using memoization on a function with side-effects, that it in fact could become very complicated very quickly. It's a bit similar in your situation: What if someone actually wants to send the same message twice? Depending on your codebase it could be hard to debug the issue which will arise.

One thing where I do think it is kind of safe is for loading a file into a class instance. The memoization will avoid defining (1) a method to check to see if the file is loaded, (2) a method to load the file and (3) a class variable which points to the loaded file.

That sounds very interesting. Can you detail the problem a bit (I'm not sure I understand how clock skew affects python code) and how lru_cache decorator fixed it?

It's not clock skew of the CPU/Python in any sense. My theory: Their script is run every 10seconds without taking into account the execution time of previous runs, and they noticed duplicates as the same message would be picked up on multiple executions of their polling script because of probably network delays on the REST API they're querying. So they used the lru_cache decorator to filter out duplicates according to the message content and the user that made it. Really bad design if you ask me, as it's essentially "throwing spaghetti" onto the wall instead of figuring out what's really going wrong. As other people have mentioned, they should instead be using a plain ID to filter out duplicates because it is the most unique identifier.

Essentially the OP is using the decorator to prevent the function call from ever being run more than one time. It’s at most once.

More specifically, it prevents a call being repeated _in quick succession_, if there are enough calls in between repetitions it'll fall out of the cache and be reprocessed.

Maybe clock skew is the wrong word. Basically we'd run the script every 10 seconds, which would use the Jira API to fetch any comments made in the last 10 seconds. So sometimes Jira would return the same comment twice, causing the script to send out two notifications, about 10 seconds apart.

That sounds to me like a Push setup would be preferable here. There are a number of Jira add-ons that can run code in response to system events like a new comment. That way you should be able to avoid the concurrency problems completely.

That comment entry that you get back from JIRA should have a unique "ID" attached to it that you can use to do a duplicate check on. According to you, your function signature is send(username, message), in which case this solution will fail if the same user makes the same comment on two different issues.

Have a look at their REST API docs for retrieving comments on an Issue:


You'll see that they respond with an ID for each comment. That ID is unique and probably the PK of the comment on the JIRA application's DB. That is the the key that you need to use to do a duplicate check. Not username and message content.

I checked the code (probably should have done that in the first place) and it turns out we include a Jira link in the message, which contains the comment ID.

This is neat atho I would agree it isn’t the right tool. The reason is that the time until a second message with the same content can be sent is indeterminate, based on how many intervening messages are sent. If your app does want to send the same notice twice, with an hour in between, and there are no intervening messages, it would fail, so the behavior is unpredictable.

A timed algorithm seems better suited. That said, it probably doesn’t matter much.

Are you not using a message id of some sort?

Do you mean in the send() function? No sorry I'm talking about a chatbot that forwards Jira comments into a chatroom, so usually the combination of the username and the message is unique enough on its own to produce a good hash for lru_cache without needing a message_id if that's what you're referring to.

Well, that's pretty cool.

Functools’ lru_cache also has good methods for getting more info about the cache’s utilisation (.cache_info() I think), which is quite helpful when found in logs.

One line summary: Use lru_cache decorator


def fib(n):

lru_cache doesn't work with lists or dicts, or indeed any non-hashable data, so it's not quite a transparent change. About half the time I use a cache I end up implementing my own.

What approach are you using to efficiently store and look up non-hashable function arguments?

Converting to tuple has mostly been enough. I want to say always, but I don't trust my memory that much, haha.

This is neat and I learned something new. TLDR: use the functools.lru_cache decorator to add caching to slow functions.

I must admit I was hoping for a general approach to speeding up all function calls in python. Functions are the primary mechanism for abstraction and yet calls are relatively heavy in their own right. It would be neat if python had a way to do automatic inlining or some such optimization so that I could have my abstractions but avoid the performance hit of a function call (even at the expense of more byte code).

A benefit of Python is that it removes consideration of low level type details. You can just code your problem and think at a higher level, in terms of larger custom classes etc. This is great for developer speed. But to improve performance of code, you need to micro manage your types, you need to understand details about the compiler and the processor, you need to know what the machine is doing to the data to service your particular process, and streamline it accordingly.

Is there a magic function? If one crops up, it would get baked into Python, development of which is extremely active. So magic functions trend obsolete. The main thing developers can consistently do to improve performance, is micro manage types and algorithms to improve the performance of bottlenecks. This is done with a tool like Cython, but you have to learn the discipline of coding for performance, you have to learn the details of what the machine is doing with the data in your particular piece of code. When do you want to replace linked lists with arrays? When are you doing unnecessary type conversion? How overweight are your high frequency objects? How is memory being used? Are there more efficient ways on my microarchitecture to get this calculation done... Performance coding is a discipline like application coding.

Is this a problem with Python? I don't think so, because performance comes second in the developer process. Write the code first then optimize the real bottlenecks.

You might be interested in pypy or nuitka

What's the ruby equivalent of `functools` and specifically `functools.lru_cache`?

Note that `lru_cache` doesn't just do caching, it also provides convenient methods on the original function to get cache stats etc in a pythonic way.

The last bit of deterministic functions is a main selling point of functional programming (and get help from compilers) or any design that promotes creating pure functions.

> One line summary: Use lru_cache decorator

Okay, at least got the decency of providing a TL;DR. But if your summary is literally three words, why not put it in the title: “speed up Python function calls with functools.lru_cache”?

God I hate clickbait.

Huh? He sped it up much more by rewriting as a loop . . .

dvic CV function on problems


tl;dr – memoization.

Instead of the last quote in the article, I prefer this one (got it from [0])

>"There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors." – Martin Fowler

And there's plenty of similar articles, for example [1] [2]

[0] https://www.mediawiki.org/wiki/Naming_things

[1] https://dbader.org/blog/python-memoization

[2] https://mike.place/2016/memoization/

Even in this version, the quote still feels incomplete without someone shouting "Concurrency!" while it is delivered...

Concurrency.""There are three hard things in computer science: cache invalidation, naming things, off-by-one errors, and

Thanks for similar articles. I am aware of similar articles and that's why I have kept it short and simple. I will add these as references, later.

I have enjoyed reading Python Tricks[0] book by Dan bader (author of https://dbader.org/blog/python-memoization ). It's awesome. Might be outdated but I still recommend it to any body who has just started learning python.

[0] https://realpython.com/products/python-tricks-book/

Hey, sorry if my post came off wrongly. In hindsight, I should've worded better, something like "similar articles for further reading".

That attribution is wrong - the mediawiki page links to Martin Fowler’s collection of quotes, where he says the off-by-one version came from Leon Bambrick https://twitter.com/secretgeek/status/7269997868?s=21

thanks for the correction, I cannot edit the post though :(

> @functools.lru_cache

Avoid these tricks if you care about thread safety.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact