
Python Patterns - An Optimization Anecdote - niyazpk
http://www.python.org/doc/essays/list2str.html
======
nene
I decided to try writing the same thing in JavaScript and discovered something
really strange.

My first idea was:

    numbers.map(function(x){return String.fromCharCode(x);}).join("");

This was already pretty fast, but why not eliminate the anonymous function
entirely and pass String.fromCharCode directly to map():

    numbers.map(String.fromCharCode).join("");

I timed it and... this was ~100 times slower than the previous version. WTF!

Somehow passing this native function directly to Array.map() is way slower
than wrapping it inside another function and passing that to Array.map().

I have tested it so far in Chrome, Firefox and Opera - the results are the
same. I also tried forEach(), which behaves similarly.

Does anybody have an idea why this is so?

Update: I tried the same with Math.round and Math.sin, and with these the
results were as one would expect: passing the function to Array.map() directly
was a little faster than using an intermediate anonymous function. So it seems
the problem is with String.fromCharCode specifically.

~~~
lysium
Just a guess: your anonymous function cannot be redefined (because it has no
name), but String.fromCharCode could potentially be. Thus, a similar reason to
the one the article gives for global vs. local variables.

One would think that String.fromCharCode would be looked up only once, though.

~~~
pornel
Actually it's the opposite. When you pass String.fromCharCode directly, you're
passing a reference (not a name) to that particular implementation, and it
can't change.

When you pass an anonymous function, every execution of that function needs
to look up `String.fromCharCode` (anonymous functions capture scope, not
references).

I'm surprised by the benchmark as well. I suspect it may be because calls to
native functions are handled differently from calls to JS functions, and the
JS engine is able to optimize the call inside the anonymous function (create a
trace/JIT and inline it), but not when calling by reference (perhaps it keeps
calling it through some expensive proxy object).

------
edanm
I once found myself writing a Python program to calculate poker hands. It was
pretty fun to write - it generated, for any given hand you hold, every
possible way the game could go forward, and produced statistics about how many
times you win (against one opponent only, who could be holding anything).

The program initially ran for 20 minutes on each hand. I spent a day
optimizing it and got it down to around 1-2 minutes per hand, so I learned a
few tricks for making things quicker. Lessons learned:

1. Local lookups are _much_ faster.

2. Anything that you can somehow offload to a C-loop makes things much
faster.

3. Surprisingly, the fastest construct for building a new list was always a
list comprehension (e.g. [x for x in calc(l)] or whatever); see the sketch
after this list. This was far faster than a regular ol' for-loop, of course,
but was also faster than using map and reduce.
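
A hypothetical micro-benchmark in the spirit of those three lessons (a sketch, not the original poker code; written Python 2 style to match the thread):

    import timeit

    data = range(256) * 10

    def with_for_loop():
        result = []
        append = result.append          # lesson 1: bind the attribute lookup to a local
        for x in data:
            append(chr(x))
        return result

    def with_map():
        return map(chr, data)           # lesson 2: the loop runs in C

    def with_listcomp():
        return [chr(x) for x in data]   # lesson 3: the comprehension beats the explicit loop

    for f in (with_for_loop, with_map, with_listcomp):
        print f.__name__, timeit.timeit(f, number=1000)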

~~~
devinj
Point 3 is weakened by the fact that map can indeed be faster:
<http://codepad.org/SzXcubXy> -- in fact, points 2 and 3 are in conflict,
because map() _does_ use a C-loop, whereas list comprehensions do not. As it
turns out, point 2 was the correct one.

Also, on a more minor, pedantic note: list comprehensions basically are
regular for-loops, except they use the list-append opcode instead of a
getattr-call pair. This produces a minor speedup, but it's nothing to write
home about, honest. (Using a different pastebin because it has a feature I
needed): <http://ideone.com/dT1LG>. The story is a bit different on 3.x,
where list comps are implemented very differently: <http://ideone.com/58FGZ>
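
The opcode difference is easy to see with the dis module (a sketch; the exact bytecode varies by CPython version):

    import dis

    def loop(seq):
        out = []
        for x in seq:
            out.append(x)        # LOAD_ATTR 'append' + CALL_FUNCTION per element
        return out

    def comp(seq):
        return [x for x in seq]  # a single LIST_APPEND opcode per element on 2.x

    dis.dis(loop)
    dis.dis(comp)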

~~~
edanm
First off, I'm really no expert in Python _or_ optimization - I'm only
reporting observations.

Having said that, I was pretty surprised that the list comprehensions ran
faster than map, exactly because of point 2 - I had assumed that map used a
C-loop, whereas list comprehensions did not. Unfortunately, I don't have that
code sitting around anymore, so I can't redo the tests - I just remember being
very surprised by those results.

~~~
deathflute
In my experience edanm is correct: map/reduce/filter are faster with built-in
functions, but list comprehensions are faster with user-defined ones.

E.g., using Python 2.5.2:

    In [9]: l = ['10'] * 5000000

    In [10]: timeit -n5 map(int, l)
    5 loops, best of 3: 1.29 s per loop

    In [11]: timeit -n5 [int(i) for i in l]
    5 loops, best of 3: 1.64 s per loop
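
A sketch of how one might test both halves of that claim outside IPython (to_int is a hypothetical user-defined wrapper, introduced here only for illustration):

    from timeit import timeit

    l = ['10'] * 1000000

    def to_int(s):      # hypothetical user-defined stand-in for int
        return int(s)

    # built-in: map() keeps both the loop and the calls in C
    print timeit(lambda: map(int, l), number=5)
    print timeit(lambda: [int(i) for i in l], number=5)

    # user-defined: every element now costs a Python-level call either way
    print timeit(lambda: map(to_int, l), number=5)
    print timeit(lambda: [to_int(i) for i in l], number=5)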

~~~
devinj
You mean "incorrect". Also, I did post detailed timing results above.

------
danielh
It's a whole different story if the input gets longer and the quadratic
complexity kicks in. Just change the testdata like this:

    testdata = range(256)*100

Result:

    f1 5.539
    f2 40.747
    f3 5.072
    f4 5.311
    f5 0.068
    f6 3.304
    f7 1.431
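
The blow-up comes from building an immutable string by repeated concatenation. A minimal illustration of the two shapes involved (not the essay's actual f-functions):

    def quadratic(items):
        s = ""
        for i in items:
            s = s + chr(i)    # may copy the whole string each time: O(n^2) overall
        return s

    def linear(items):
        # collect fragments, then do one final copy: O(n)
        return "".join(map(chr, items))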

------
rix0r
The first thing that stuck out to me like a sore thumb was the string-
concatenation-in-a-loop. Though the author was aware of its consequences, he
only addressed it late in the article, optimizing relatively trivial stuff
like variable lookups first. And then he did so in a rather strange way,
instead of adding string fragments to a list and then join()ing them.

I'm not sure how fast that loop would be in Python compared to the implied
loop&join, but I sure would have liked to see them compared.
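
A sketch of that comparison (hypothetical code, not from the article; timings left to the reader):

    from timeit import timeit

    testdata = range(256) * 100

    def append_then_join(items):
        fragments = []
        append = fragments.append
        for i in items:
            append(chr(i))
        return "".join(fragments)

    def map_then_join(items):
        return "".join(map(chr, items))

    print "append+join:", timeit(lambda: append_then_join(testdata), number=100)
    print "map+join:   ", timeit(lambda: map_then_join(testdata), number=100)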

~~~
RiderOfGiraffes

      > Not sure how fast that loop would be in Python
      > compared to the implied loop&join but I sure
      > would have liked to see them compared.
    

Why don't you try it and tell us the results?

------
codefisher
For those interested, the author of this was actually Guido himself.

I would have used "".join(map(chr, list)) myself. But I am not sure that was
supported in the version of Python available at the time. It is equivalent to
f6.

To those saying that the dynamic nature of Python means you have to look up
chr every time: there is a way to stop that, see this project:
<http://github.com/rfk/promise/>

------
RyanMcGreal
Why not just:

    def f(list):
        return ''.join([chr(l) for l in list])

~~~
Goladus
This would be my default approach as well. I'd be interested to see how it
compares -- my guess is that it would be equivalent to or slightly faster
than f6(), and slower than the version that uses the array library.

~~~
RyanMcGreal
From the article, I'm guessing the author was using a version <2.x.

Incidentally, I ran my function and f6 from the article on a list of 9600
integers using Python 2.6.3. f6 ran in 0.00303250832159 seconds and my
function ran in 0.00421450212248 seconds.

------
rmc
Incidentally, are there any programs that can scan your Python functions and
suggest the improvements described in this article? E.g. any program that'll
suggest aliasing global names into the local namespace, using map, etc.?

------
Devilboy
I hope that one day soon our compilers will be smart enough to optimize simple
things like this automatically, so we can focus more on writing for clarity.
This seems like exactly the kind of thing a next-gen compiler should be able
to optimize.

~~~
edanm
I'm not sure that's really possible, considering the dynamic nature of Python.

Consider one of the biggest problems - dynamic lookups. Calling a function
like "chr", which is looked up dynamically and found in the global scope each
time, takes a lot of time. Just adding a local variable which has the _same
value_ (i.e. lchr = chr) already gives a speedup.

How can an optimizer fix this? Technically, a new lookup must be performed
each time, since you never know if "chr" has been mapped to something new
during the iterations.
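
The speedup edanm describes, as a runnable sketch (hypothetical timings, Python 2):

    from timeit import timeit

    data = range(256) * 100

    def global_lookup():
        out = []
        for i in data:
            out.append(chr(i))    # 'chr' is re-resolved on every iteration
        return out

    def local_alias():
        out = []
        lchr = chr                # one lookup up front, then cheap local access
        for i in data:
            out.append(lchr(i))
        return out

    print timeit(global_lookup, number=100)
    print timeit(local_alias, number=100)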

~~~
derefr
> you never know if "chr" has been mapped to something new during the
> iterations.

Why must this be true? Scan the loop body, find nothing that redefines chr,
create a local variable holding it at the beginning of the loop. The same goes
for all identifiers mentioned in the loop.

~~~
lysium
Generally, you cannot scan the loop body and decide whether `chr` gets
redefined; in the general case this reduces to the halting problem.

Consider, for example, a redefinition of `chr` inside an `if`-clause: you
would have to actually run the program to know whether that branch is taken.

Further, in Python, the redefinition of `chr` does not even have to take the
form `chr = ...`; it could be an assignment into a global hash, maybe even
using a variable as the index.
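
For instance (an illustrative Python 2 snippet, not from the thread):

    import __builtin__

    name = "ch" + "r"                   # the "index" is only known at run time
    globals()[name] = lambda i: "?"     # shadows the built-in chr in this module

    # or rebind it for every module at once:
    __builtin__.chr = lambda i: "?"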

~~~
derefr
I'm not suggesting that you have to know specifically that chr is being
redefined; to cancel the optimization, you simply have to know that this loop
body is not limited to the subset of operations which are known to _not_
redefine chr (basically the same sort of analysis Google NaCl does). This
forgoes a large number of optimizations (e.g. any loop body that modifies a
global hash will cancel the optimization), but it's better than no
optimization at all.

~~~
lysium
I wonder how large (or small) that restricted subset is, and how easily it is
'violated', turning off the optimization even though it would have been
possible. (For example, the global hash could be referred to via a local
variable which might carry the result of a function call.) Do you have any
insights on that?

Until then, using a 'dirty flag' on the `chr` definition that turns off the
optimization at run time, as mentioned elsewhere in this thread, seems like
the more appropriate approach to me.

