
Guido van Rossum's Performance tips for Python - gits1225
https://plus.google.com/u/0/115212051037621986145/posts/HajXHPGN752
======
fijal
This is, sadly, completely opposite to what we try to advocate with PyPy,
which is that you should not sacrifice your abstractions in favor of
performance, and you should not need to rewrite in C, barring a few corner
cases like writing kernels.

Surprisingly enough, I happen to have a talk where I discuss this precise
topic, for people with 30 minutes to kill.
[http://www.youtube.com/watch?v=ZHF5Aius_Qs&feature=youtu...](http://www.youtube.com/watch?v=ZHF5Aius_Qs&feature=youtu.be)

~~~
jmilloy
The point of the OP was "patterns for fast python", i.e. performance. It's
absurd to claim that you "advocate abstractions over performance" in the
context of achieving performance, so your comparison doesn't make sense to me.
Just because you don't advocate performance doesn't mean that these aren't the
ways to achieve performance. In addition, it sounds like you mostly agree with
regard to rewriting in C: use it as a last resort. It sounds to me like you
are just trying to disagree.

~~~
JulianWasTaken
I don't see where in fijal's comment you got the impression that he doesn't
advocate performance. I very much would hope (and believe) otherwise, that he
cares very much about it. It also does not seem that he agrees at all with
regard to writing in C, since his "last resort" sounds much more last than
yours (again, a thing I'd expect from him ;). His point sounds like it is "if
you care about performance, these are not really ways to get it, because they
sacrifice more important things". If your issue with that is "yeah but I still
want performance" I would bet that fijal would point you to PyPy :).

~~~
jmilloy
>you should not sacrifice your abstractions in favor of performance

I only point out that a list of optimization techniques that advocates nothing
will be different than a list of optimization techniques that advocates
certain abstractions at the expense of performance. And there's nothing sad
about that.

~~~
JulianWasTaken
I repeat, I see nothing there that says "don't optimize for performance".

It says "if you want performance don't do X because X is not a reasonable
sacrifice". Then he linked to some better ones.

If you watch the linked talk, "write simple code when given the choice between
simple and complex code" seems like fijal's main suggestion (and in my limited
experience with PyPy it has worked for me). The other main suggestion that I
assume is underlying it is "if you want performance, try PyPy". Neither of
those requires sacrificing abstraction. Some of Guido's suggestions do.

------
Xion
Python code optimization is actually tricky ground where analogies from other
languages (like C/C++) don't necessarily apply.

One of the more surprising results (esp. for non-pythonists) is the fact that
string formatting:

    
    
        s = "%s" % some_integer
    

is faster than "casting" to string:

    
    
        s = str(some_integer)
    

That's solely because looking up the name 'str' requires a hash-table search
of the global (and then builtin) namespace. That turns out to be more
expensive than parsing the format string and building the result of the %
operator.
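A quick way to check this claim on your own interpreter (a sketch; the exact
numbers, and even the ordering, can vary between CPython versions and
machines):

```python
import timeit

# Micro-benchmark the two spellings. Measure on the interpreter you
# actually deploy on; results differ between CPython versions.
fmt_time = timeit.timeit('s = "%s" % i', setup='i = 12345', number=1_000_000)
cast_time = timeit.timeit('s = str(i)', setup='i = 12345', number=1_000_000)
print("format: %.3fs  cast: %.3fs" % (fmt_time, cast_time))
```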

~~~
fauigerzigerk
Apparently there's a function call overhead as well that the % operator
doesn't incur because

    
    
      for i in xrange(10000000):
        s = "%s" % i
    

is also faster than

    
    
      lstr = str
      for i in xrange(10000000):
        s = lstr(i)
    

[Edit] After thinking about it a second longer, I wonder whether there is some
lookup for lstr as well even though it's local. But storing the function in
lstr is faster than using str so I'm not sure how this is actually
implemented. I'm sure someone here will know more.
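One way to see what kind of lookup each form incurs on CPython is to
disassemble both (a sketch; exact opcode names differ across versions):

```python
import dis
import io

def global_lookup(i):
    return str(i)        # 'str' is resolved by LOAD_GLOBAL (dict search) on every call

def local_lookup(i):
    lstr = str           # resolved once, stored in a local slot
    return lstr(i)       # 'lstr' is fetched by LOAD_FAST: an array index, no hashing

buf = io.StringIO()
dis.dis(global_lookup, file=buf)
dis.dis(local_lookup, file=buf)
print(buf.getvalue())
```

So yes, the local is still "looked up", but as an indexed read from the
frame's local-variable array rather than a hash-table search, which is why
the lstr version is faster than the bare str version but still slower than
avoiding the call entirely.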

~~~
njharman
Yes, there is a lookup; there is always a lookup. Python is a dynamic
language, so something in the loop could change what lstr is.

Each "dot" incurs a lookup. Your example reminds me of one idiom: assigning
nested.look.up to a local var for access inside a loop.
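That idiom looks something like this (a sketch; math.sqrt stands in for
whatever nested attribute you hit in a hot loop):

```python
import math

values = range(10000)

# Naive: the attribute lookup math.sqrt happens on every iteration.
total_a = 0.0
for v in values:
    total_a += math.sqrt(v)

# Idiom: hoist the lookup into a local once, then use the local in the loop.
sqrt = math.sqrt
total_b = 0.0
for v in values:
    total_b += sqrt(v)

assert total_a == total_b
```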

~~~
dmorgan
> _Yes there is lookup, always lookup. Dynamic language, something in loop
> could change what lstr is._

And why is that presented as something inevitable?

The interpreter/compiler could analyze that part, see that the
function/name is not changed during the loop, and cache the lookup for the
duration of the loop.
I'd guess that PyPy tries to do it that way, anyway...

~~~
lvh
Yes, that's one of the many optimizations PyPy does, but it's only a very
limited one. PyPy goes much, much deeper than this, with tools like function
inlining and escape analysis.

------
freyrs3
Numba and Cython are also worth mentioning if you're doing numeric
programming.

~~~
pjscott
Cython is definitely worth learning if you're writing in Python and want to
make something that runs fast -- numeric or otherwise. It's a Python-to-C
compiler that has two killer features:

1. For the most part, you can just take existing Python code and have it
magically transformed into not-too-horrible C. A few optional type annotations
will help with the speed, but the compatibility is great right from the
get-go.

2. With a little care, you can often get the inner loops of your Python code
to be just as fast as hand-written C.

The tutorial is a quick read, and gets you up to speed without much effort:

<http://docs.cython.org/>

------
CJefferson
This is one advantage of C++ (and C, and others) which I didn't really realise
until fairly recently. Because I have a good quality compiler, I do not have
to worry about little functions.

If I want to make a class with a single int member, and a bunch of member
functions, I can trust that will, in most cases, compile away to be just as
efficient as a raw int and inline code. It is very liberating to not have to
worry about the efficiency of creating another function.

------
avallark
I would just say:

1. Use built-in data structures whenever possible.

2. First fix your process flow, then spend more time fixing your code. Doing
steps in the order A C E F B D might be better than A B C D E F.

3. Write direct queries against the db when all else fails.
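Tip 1 in a nutshell (an illustrative sketch): picking the right built-in
container can change the complexity class. Membership tests, for instance,
are O(n) on a list but O(1) on average on a set.

```python
haystack_list = list(range(10000))
haystack_set = set(haystack_list)      # one-time conversion

needles = range(0, 10000, 10)          # 1000 values, all present

# Each 'in' on a list is a linear scan; on a set it's a hash probe.
hits_list = sum(1 for n in needles if n in haystack_list)
hits_set = sum(1 for n in needles if n in haystack_set)

assert hits_list == hits_set == 1000
```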

------
dbaupp
Another good one is: use PyPy (assuming that it supports your code).

------
zobzu
It's missing one tip I like: make sure your functions call into CPython's
C-implemented functions for heavy operations, especially when using libraries.

I often use Python libs whose functions are written in Python itself and
create bottlenecks. Rewriting those functions to call "native" CPython
functions generally makes an extremely large difference.
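For example (a sketch of the idea): push the loop into a C-implemented
builtin instead of running it bytecode-by-bytecode.

```python
data = list(range(100000))

# Pure-Python accumulation: every iteration runs in the interpreter loop.
total = 0
for x in data:
    total += x

# sum() is implemented in C in CPython, so the same loop runs at native speed.
assert total == sum(data)

# Same idea for strings: str.join beats repeated += concatenation.
parts = [str(x) for x in range(100)]
slow = ""
for p in parts:
    slow += ("," + p) if slow else p
fast = ",".join(parts)
assert slow == fast
```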

------
fox91
Be aware that a lot of Python tips for improving performance are related to
the Python implementation you're using.

We should draw a clear distinction between Python tips and CPython tips. In
PyPy or other implementations, some tips just don't make sense and are
useless (e.g. local scope stuff vs. outer scope stuff in loops).

------
Nursie
Don't assume that threads will be useful: in CPython they are real OS
threads, but the GIL means they don't actually run Python code in parallel.
To make use of a multi-core system you need to use processes.

I'd still like it if they could get rid of the GIL...

~~~
d0mine
GIL or no GIL, threads run concurrently in Python. The difference might be in
performance:

    
    
      - IO-bound tasks: GIL is released
      - CPU-bound tasks on N-core on a single host. To exploit multiple CPUs: 
    
         a) no GIL (hypothetical): N times speed up (optimistically)
             (it suggests weak data dependency i.e., 
              multiprocessing can be used to the same effect)
    
         b) option a) with multiple processes (shared memory or communication-based approach)
             Code complexity is the same on average (except on Windows)
    
         c) C extensions (existing or new): speed up 10*N or more on numerical code
             Cython makes it easy to write new extensions.
             Currently due to dynamic nature of Python, GIL or no GIL, 
             C extensions might be necessary to exploit hardware fully
             (though C extensions might not be an option in some projects)
    
      - scalable to multiple hosts tasks: different processes i.e., GIL is not a problem
    

Benefits of GIL:

    
    
      - C extensions (and the interpreter) are simpler to write correctly.
         Multithreaded programming is not trivial; we need all the help we can get.
      - no performance penalty for single-threaded code
      - it encourages a synchronize-through-communication concurrency model (built into Go)
    

Disadvantages:

    
    
      - some applications have no localized performance bottlenecks,
        so writing a small C extension won't help to get the possible
        performance benefits of running on multicore in parallel.
        If performance is critical, Python might not be the right tool in this case.
      - there are pathological cases where performance suffers greatly due to the GIL
        (though other approaches would have their own pathological cases)
      - a (non-informed) perception that Python can't benefit from multicore
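Option b) above, sketched with the stdlib (a toy CPU-bound task; pool size
and chunking would be tuned for real workloads):

```python
from multiprocessing import Pool

def cpu_bound(n):
    # Toy CPU-bound work. The GIL would serialize this across threads,
    # but each worker process gets its own interpreter and its own GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # The __main__ guard matters on platforms using the "spawn" start method.
    with Pool(4) as pool:
        results = pool.map(cpu_bound, [100000] * 8)
    print(results[:2])
```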

~~~
Nursie
OK, sure, they run concurrently, but they can't be in the interpreter at the
same time. The result is that using python threads to do python things, you
won't utilise more than one core, most of the time.

There are a myriad of ways around it, but it is another thing you need to take
into account when writing python programs or deciding to use python for a
project.

~~~
d0mine
Yes, writing efficient programs is different in different languages.

"- Don't write Java (or C++, or JavaScript, ...) in Python." and then you
don't need to search for workarounds.

~~~
Dylan16807
So if someone writes code that makes heavy use of nonlibrary python code in
multiple threads, you're automatically going to blame them as writing in the
style of some other language? That's pretty No True Scotsman.

I'm not arguing for or against the GIL here but it's something to keep in
mind. "Threads run concurrently" is misleading. And it's not a matter of being
pythonic, the GIL isn't even a _python_ feature.

------
rjh29
Programmers should not be forced to sacrifice the flexibility of function
calls, objects, or getters/setters in order to speed up a program. The
performance hit from those things should either be minuscule or optimised
away by the compiler as appropriate.

~~~
grifaton
Sounds like you're looking for a Sufficiently Smart Compiler[0]. James Hague
has a good piece on why this might not be so desirable[1].

One of the reasons I'm fond of Python is that, while there is a tradeoff
between flexibility and performance, it gives you the means to sacrifice
flexibility to aid you in improving performance -- once you've identified what
(if any) actual performance bottlenecks you face.

[0] <http://c2.com/cgi/wiki?SufficientlySmartCompiler> [1]
<http://prog21.dadgum.com/40.html>

~~~
Peaker
Optimizing away getter/setter calls and function call overhead in general is
well within the reach of current compilers.

~~~
tjgq
In general, function calls cannot be inlined by the Python compiler because
(almost) any function name may be re-bound to a different object at run time.

A very smart compiler could probably attempt to prove that no such
modification can occur at run time throughout the whole program; but this is
much harder than simply deciding whether inlining a given call is worth it or
not.
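A tiny illustration of why the call site can't be naively inlined: the name
being called can be rebound at run time, and the interpreter must honour the
new binding.

```python
def greet():
    return "hello"

def caller():
    return greet()            # which 'greet'? decided at call time, not compile time

assert caller() == "hello"

greet = lambda: "goodbye"     # rebind the global name at run time
assert caller() == "goodbye"  # the same call site now reaches the new object
```

Inlining greet() into caller() would bake in the first binding and silently
miss the second, which is exactly the proof burden a smart compiler would
have to discharge.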

~~~
cma
V8 handles them with minimal overhead in JavaScript.

~~~
pjscott
And more to the point, PyPy can do Python inlining with its tracing JIT. The
method, in both cases, is similar: find some type assumptions that are useful
and usually true, generate code under those assumptions, and fall back on more
generic code if they're ever false.

------
stefantalpalaru
"The universal speed-up is rewriting small bits of code in C" - pretty much
sums up the subject of Python performance. Even in Cython you have a Python
dialect that's translated to C.

