

Profiling Python with cProfile - ymichael
http://yrmichael.com/2014/03/08/profiling-python-with-cprofile.html

======
habitue
I'd say you had it right the first time with the list comprehension. List
comprehensions in Python are way faster than explicit loop-and-append code
because they're implemented in C. The issue is that you had the "not in",
which is O(n) over a list (making the comprehension O(n^2)). If the documents are hashable,
I'd suggest making results a set. "not in" is constant time on a set. If a
document is something not hashable, you can make results a dictionary and get
the same constant time access.

In Python, function and method calls are expensive, so avoid them at all
costs inside tight loops.
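A minimal sketch of that suggestion (the `all_docs`/`results` names are taken from the thread; the data itself is made up for illustration):

```python
# Stand-ins for the documents discussed in the article.
all_docs = list(range(10000))
results = set(range(0, 10000, 2))   # hashable docs -> keep them in a set

# "not in" on a set is an average-O(1) hash lookup, so the whole
# comprehension is O(n) instead of the O(n^2) you get with a list.
new_docs = [doc for doc in all_docs if doc not in results]
```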

~~~
pjscott
For fairly small values of "implemented in C". In CPython 2.7.5, the list
comprehension

    
    
        [doc for doc in all_docs if doc not in results]
    

generates this bytecode (output by the dis module):

    
    
              0 BUILD_LIST               0
              3 LOAD_GLOBAL              0 (all_docs)
              6 GET_ITER
        >>    7 FOR_ITER                24 (to 34)
             10 STORE_FAST               0 (doc)
             13 LOAD_FAST                0 (doc)
             16 LOAD_GLOBAL              1 (results)
             19 COMPARE_OP               7 (not in)
             22 POP_JUMP_IF_FALSE        7
             25 LOAD_FAST                0 (doc)
             28 LIST_APPEND              2
             31 JUMP_ABSOLUTE            7
             34 (whatever comes next in the program)
    

Let's go through it. Instruction 0 (BUILD_LIST) makes an empty list and puts
it on the stack. Instruction 3 (LOAD_GLOBAL) does a global variable lookup and
puts it on the stack. (If this occurs in a function it would be a local
variable lookup, but this isn't important.) Then we start iterating over an
iterable with GET_ITER and FOR_ITER. In the body of the loop, STORE_FAST pops
the top of the stack into the local variable "doc", LOAD_FAST pushes its value
back onto the stack, LOAD_GLOBAL fetches "results", and a single COMPARE_OP
bytecode evaluates the "not in". Then we either jump back to the beginning of
the loop, or load doc onto the stack again, append it to the list we're
building with LIST_APPEND, and jump back to the beginning of the loop.

Now, let's compare with the obvious non-list-comprehension version:

    
    
        docs = []
        for doc in all_docs:
            if doc not in results:
                docs.append(doc)
    

This generates some similar bytecode:

    
    
             0 BUILD_LIST               0
             3 STORE_FAST               0 (docs)
    
             6 SETUP_LOOP              42 (to 51)
             9 LOAD_GLOBAL              0 (all_docs)
            12 GET_ITER
       >>   13 FOR_ITER                34 (to 50)
            16 STORE_FAST               1 (doc)
    
            19 LOAD_FAST                1 (doc)
            22 LOAD_GLOBAL              1 (results)
            25 COMPARE_OP               7 (not in)
            28 POP_JUMP_IF_FALSE       13
    
            31 LOAD_FAST                0 (docs)
            34 LOAD_ATTR                2 (append)
            37 LOAD_FAST                1 (doc)
            40 CALL_FUNCTION            1
            43 POP_TOP
            44 JUMP_ABSOLUTE           13
            47 JUMP_ABSOLUTE           13
       >>   50 POP_BLOCK
       >>   51 (whatever comes next in the program)
    

The main difference here is that there's a lot more loading and saving of
variables (e.g. the "docs" variable which was implicit in the list
comprehension version), and things like the docs.append() method have to be
loaded each time through the loop and then called, which slows things down
some compared to using a single LIST_APPEND opcode.
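If you want to measure how small that gap actually is, a rough `timeit` sketch (with made-up data; absolute numbers will vary by machine and interpreter):

```python
import timeit

# Shared setup: a list of fake documents and a set to test membership against.
setup = "all_docs = list(range(1000)); results = set(range(0, 1000, 2))"

# Time the list comprehension version.
t_comp = timeit.timeit(
    "[doc for doc in all_docs if doc not in results]",
    setup=setup, number=1000)

# Time the explicit for/append version.
t_loop = timeit.timeit(
    "docs = []\n"
    "for doc in all_docs:\n"
    "    if doc not in results:\n"
    "        docs.append(doc)",
    setup=setup, number=1000)

print(t_comp, t_loop)  # the comprehension is usually somewhat faster
```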

I'm not really going anywhere with this; I just think that interpreter
internals are fun.
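For anyone who wants to poke at this themselves, the `dis` module will disassemble a function directly (note that the exact opcodes vary by CPython version, so newer interpreters will show different names than the 2.7.5 listing above):

```python
import dis

def comp(all_docs, results):
    # The list comprehension from the discussion above.
    return [doc for doc in all_docs if doc not in results]

# Print the bytecode for the function to stdout.
dis.dis(comp)
```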

~~~
habitue
Nice! I knew comprehensions were faster, but apparently was mistaken as to
why.

------
pekk
I'm glad that there wasn't a "complain about the GIL" or "write everything in
Go because Python is too slow" step in this optimization workflow.

------
nwenzel
Very helpful to have a UI for digging through profiler results. I use the
django debug toolbar but I think your profiler will be incredibly helpful for
finding bottlenecks. Thanks for posting, thanks for sharing.

~~~
jbaiter
Then you might _really_ enjoy runsnakerun:
[http://www.vrplumber.com/programming/runsnakerun/](http://www.vrplumber.com/programming/runsnakerun/)

------
tgb
Kernprof and line_profiler are also helpful.

------
RogerL
You'll perhaps take this as snark, but I am not going to read light gray text
on a white background with a poorly rendered font (windows 7, firefox 27, 47yo
eyes).

~~~
ymichael
Ahh, sorry about that. I'm like the most terrible UI person ever... Thanks
for the feedback; made the text darker.

