
Optimize Python with closures - jaimebuelta
http://tech.magnetic.com/2015/05/optimize-python-with-closures.html
======
rpcope1
If you're going to optimize with closures, why not go a step further and pull
the conditional out of the closure? You could instead have a conditional in
the function that produces the right closure based on mode rather than
constantly re-checking mode every single time you call the closure.
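
Something like this (a sketch built on the post's whitelist filter; the mode
flag and the blacklist variant here are just illustrative):

    # check `mode` once, hand back the right closure
    def make_filter(mode, categories):
        if mode == "whitelist":
            def whitelist_filter(bid_request):
                return not categories.isdisjoint(bid_request["categories"])
            return whitelist_filter
        else:
            def blacklist_filter(bid_request):
                return categories.isdisjoint(bid_request["categories"])
            return blacklist_filter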

Also, maybe it would be worth discussing building temporary function variables
in outer scope that resolve the methods before the closure is called (i.e.
my_list_append = my_list.append). There was no opportunity to do this in the
example, but it's worth discussing.
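
For instance (hypothetical, since the post's filter has no such call to
hoist):

    # bind the bound method once in the enclosing scope, so each call of
    # the closure skips the attribute lookup
    def make_collector(my_list):
        my_list_append = my_list.append
        def collect(item):
            my_list_append(item)
        return collect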

Finally, dict access seems like a bad idea if you're this pressed for
performance, especially since you only need one value -- maybe better just to
encourage explicitly passing in a parameter.
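
That is, something like (again a sketch, not the post's code):

    # hand the filter the one value it needs, instead of a dict to index
    # into on every call; callers do whitelist_filter(bid_request["categories"])
    categories = {"sports", "news"}  # stand-in for the real config

    def whitelist_filter(request_categories):
        return not categories.isdisjoint(request_categories)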

~~~
dcrosta
As I mentioned in a comment on Jaime's response
([https://wrongsideofmemphis.wordpress.com/2015/05/08/optimise...](https://wrongsideofmemphis.wordpress.com/2015/05/08/optimise-python-with-closures/)), we actually do quite a bit more than this in our
production code. The blog post here is meant to demonstrate the point, not to
show our exact production code.

In particular, I wanted all 3 examples to use nearly identical code inside the
filter function, to isolate the differences in the benchmark results to just
the ways of accessing data, and to show that closures are an easy way to gain
some performance in hot spots (in CPython at least).

~~~
syllogism
I get that you wanted to simplify the example, but the example you've written
really just doesn't make much sense, so it's hard to understand your point.

The API in the example is really bad. Accepting a dictionary, only to require
a specific key in the dictionary, is the worst of both worlds. But then if
you use .get to access the dictionary instead of attribute access, you'll
take on additional performance penalties, and other solutions will start to
compete.

------
syllogism
Why not just use Cython[1]?

Optimizing pure Python is a waste of time. You end up with weird, unidiomatic
code that it takes ages to come up with, because you're fighting the language.
And in the end you hit a wall: a point at which the code can't go any faster.

If you just write Cython, you can easily reason about what the code is doing,
and how you should write it to be more performant. Ultimately you can make the
code run as fast as C, if necessary.
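
For instance, the post's filter in Cython's "pure Python" mode might look
roughly like this (a sketch, not their code; it stays valid Python, but
cythonize turns the annotations into static C types):

    import cython

    @cython.ccall
    def whitelist_filter(bid_request: dict,
                         categories: frozenset) -> cython.bint:
        return not categories.isdisjoint(bid_request["categories"])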

[1] [http://cython.org/](http://cython.org/)

~~~
elyase
Exactly my thoughts. If the code is still slow after algorithmic
optimizations, I would go directly for pypy, Cython, Numba, Pythran, etc. In
my opinion it makes little sense to optimize pure Python.

~~~
jerf
Errr, did the article change since you read it?

"At Magnetic, we’ve switched from running all of our performance-sensitive
Python code from CPython to PyPy, so let’s consider whether these code
optimizations are still appropriate... PyPy is incredibly fast, around an
order of magnitude faster than CPython in this benchmark. Because PyPy
performs more advanced optimizations than CPython, including many
optimizations for classes and methods, the timings for the class vs. closure
implementations are a statistical tie."

~~~
elyase
What I said is that I would go _directly_ to pypy and related, without first
trying to optimize like they did in the Python world. The guys at Magnetic
came to the same conclusion in the end: "Curiously, the function
implementation is actually slower than the class approach in PyPy." I don't
find this surprising anymore because it is what almost always happens in my
code.

~~~
gknoy
I think the techniques discussed here have value, pedagogical and otherwise.
Perhaps one is working on a codebase without the luxury of moving to pypy or
similar (e.g., my current one, which uses some incompatible libraries, so the
transition would be nontrivial).

The fact that method accessors are much slower than properties, or that
closures can speed things up, is something we might not always be mindful of,
and that new Python programmers might not think through.

------
PhantomGremlin
People say Python is "slow", so it's impressive to me that:

    our application handles about 300,000 requests
    per second at peak volumes, and responds in
    under 10 milliseconds

Of course that is using PyPy instead of CPython.

~~~
istvan__
Which percentile is the 10 milliseconds? I would like to hear about the
p99.99 latency of this platform. More on this:

[https://www.youtube.com/watch?v=9MKY4KypBzg](https://www.youtube.com/watch?v=9MKY4KypBzg)

~~~
dcrosta
Hi, I'm the author of the post. Our 95% latency is just shy of 10ms, and max
latency around 100ms. Our monitoring tool pre-calculates the percentiles, so I
don't have 99% or 99.99%, but my guess is that they're under or around 50ms.
Too much more than that and we'd be hearing from our partners about timeout
rates. We haven't thoroughly profiled the difference between "most" and "all"
in terms of latency sources, but I'd guess that GC pauses account for some of
it, and some requests are simply much more expensive for us to process than
others.

~~~
sologoub
Hi, I work for one of the partners. From what I can tell, your account
manager should be able to get you this info, including the 99th percentile,
if you are interested. This is the round trip from our point of view, of
course.

Spot checking, you've done better than you think :)

Exciting to see python implementation doing this!

~~~
dcrosta
That's always good to hear :) Drop me a line if you want to get in touch.

------
Animats
The bottleneck seems to be overuse of dictionaries instead of fixed
structures. Attributes on the class object might be faster, especially under
PyPy.
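
For example, a hypothetical fixed structure in place of the bid_request dict
(the field names here are made up):

    # __slots__ drops the per-instance dict; attribute access on a fixed
    # layout like this is something PyPy's JIT handles particularly well
    class BidRequest:
        __slots__ = ("categories", "user_id")

        def __init__(self, categories, user_id):
            self.categories = categories
            self.user_id = user_id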

------
jaimebuelta
I wrote a follow-up on my blog (too long to post here), optimising a little
more:
[https://wrongsideofmemphis.wordpress.com/2015/05/08/optimise...](https://wrongsideofmemphis.wordpress.com/2015/05/08/optimise-python-with-closures/)

~~~
masklinn
You could get one last slight bit of juice by localising the closed-over
variables, binding them as default parameters, e.g. going from

    def whitelist_filter(bid_request):
        return not categories.isdisjoint(bid_request["categories"])

to

    def whitelist_filter(bid_request, categories=categories):
        return not categories.isdisjoint(bid_request["categories"])

------
masklinn
Aside from the criticisms provided by others, there is one more optimisation
which is regularly used in the stdlib: using default parameters to make
closure and global values local.

Locals are the fastest lookup in CPython (by a fair bit), and since default
parameters create local variables and are bound once at function creation, you
can use them to alias both nonlocals and globals, improving their lookup time
significantly:

    def func1():
        a = 42
        def func2(param, a=a, bool=bool):
            bool(param + a)
        return func2

Of course the gain depends on the exact number of lookups performed on these
localised variables and how much work is performed aside from that, but it
exists in both PyPy and CPython, and for lookup-heavy functions (such as the
one above) it can be well into the 10% range.
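
A rough way to measure the difference (a sketch; absolute numbers vary by
machine and interpreter):

    # time the plain version against the default-parameter-aliased one
    import timeit

    a = 42

    def plain(param):
        return bool(param + a)      # global lookups of `a` and `bool`

    def aliased(param, a=a, bool=bool):
        return bool(param + a)      # local lookups only

    print(timeit.timeit("plain(1)", globals=globals()))
    print(timeit.timeit("aliased(1)", globals=globals()))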

------
schmidtc
So the basic gist of this blog post was: Identify bottleneck, optimize
bottleneck, throw out optimization because pypy.

------
heydenberk
Alternate title, given the article's conclusion: "Don't optimize Python with
Closures"

~~~
dr_zoidberg
If you're using PyPy.

------
_ZeD_
why not use namedtuples (or plain tuples) instead of dictionaries?

~~~
dr_zoidberg
Dicts are the fastest dynamic structure out of the box. Tuples will bite you
with immutability sooner or later.

~~~
boothead
I usually think of immutability in terms of saving myself from getting bitten
:-)

~~~
dr_zoidberg
Touché! That's a nice thought; however, string concatenation with the +=
operator inside a loop is the kind of bite I was thinking about.
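
e.g. the classic:

    # each += can copy the whole accumulated string, so the loop can go
    # quadratic in the worst case
    words = ["optimize", "python", "with", "closures"]  # made-up data
    result = ""
    for word in words:
        result += word

    # the usual fix: accumulate and join once (linear)
    result = "".join(words)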

