

A Python Optimization Anecdote - emwa
http://tech.dropbox.com/?p=89#disqus_thread

======
6ren
I wonder how much JIT compilation would help, without any hand-optimization?
e.g. it'd do the initial inlining.

I've been amazed at Java's speedup over multiple runs: dead-slow on startup,
then improving rapidly over the next 10-20 runs, and even keeps improving
slowly after that. It's a bit magical. Much (all?) of that JIT tech should be
applicable to Python, I'd think.

BTW: link is to the comments, not the story

~~~
lloeki
Take a look at PyPy, which goes out of its way to produce incredible results.

------
padobson
What performance gain could be had from using cStringIO as described here:

<http://www.skymind.com/~ocrow/python_string/>

It seemed like the concatenation was the primary bottleneck in this case.
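
For anyone who wants to measure this themselves, here is a minimal sketch of the three string-building approaches the linked page compares. It assumes Python 3, where `io.StringIO` stands in for the old `cStringIO` module; the function names are made up for illustration.

```python
import io


def build_concat(chunks):
    # Naive approach: grow a str with repeated +=
    out = ""
    for chunk in chunks:
        out += chunk
    return out


def build_join(chunks):
    # Hand all the pieces to str.join and concatenate once
    return "".join(chunks)


def build_stringio(chunks):
    # Write into a file-like in-memory buffer, then read it back
    buf = io.StringIO()
    for chunk in chunks:
        buf.write(chunk)
    return buf.getvalue()
```

All three return the same string, so they can be dropped into `timeit` interchangeably. Which one wins depends on the interpreter version and the size of the pieces.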

Also, it's worth noting that percentage gains in performance translate into
huge infrastructure cost savings at scale. That's why blog posts like this are
valuable: the user experience improves while the cost to provide it is reduced.

~~~
pavpanchekha
By the end, the primary overhead was dictionary lookup or iteration overhead,
so I doubt this would have mattered.

------
mhd
I'm just amazed that it took ten iterations until regexps were even
considered, especially considering that the set '[a-zA-Z0-9]' is involved…

~~~
pavpanchekha
Lots of function calls in the worst case made it seem unlikely to be a general
solution. So I held off on testing that before I'd explored other paths.
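
For reference, a rough sketch of what the regexp route can look like. The `&#<decimal>;` output format and the names `_UNSAFE`, `_escape_match` and `escape_regex` are assumptions for illustration, not the code from the post. The callback runs once per non-whitelisted character, which is where the "lots of function calls in the worst case" comes from.

```python
import re

# Assumption: everything outside [a-zA-Z0-9] gets escaped.
_UNSAFE = re.compile(r"[^a-zA-Z0-9]")


def _escape_match(match):
    # Called once for every character that is not on the whitelist
    return "&#%d;" % ord(match.group(0))


def escape_regex(text):
    # Whitelisted runs are skipped inside the regex engine;
    # only the unsafe characters reach the Python callback.
    return _UNSAFE.sub(_escape_match, text)
```

On mostly alphanumeric input the callback is rarely hit. On input made up entirely of punctuation or non-ASCII text it is called for every character.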

------
Jabbles
In C you could replace

    if x in WHITELIST

with

    if ((x >= '0' && x <= '9') ||
        (x >= 'A' && x <= 'Z') ||
        (x >= 'a' && x <= 'z'))

which I suspect would be much faster than any hash-table implementation. I
also believe that should work for UTF-8 as well as ASCII. I realise that this
makes it harder to expand the whitelist. I'm not very familiar with Python; is
there something similar that could be done?

~~~
JesseAldridge
You could do that in Python using the ord function.

    white_ranges = [('0', '9'), ('A', 'Z'), ('a', 'z')]
    any([ord(low) <= ord(x) <= ord(high) for low, high in white_ranges])

------
cool-RR
I wonder whether it'll be more efficient to just have a big dict mapping every
character to what it should look like post-escape. (e.g. {'a': 'a', '(':
'&#(;', ... })

Then in your loop you're only making dict lookups.

~~~
cool-RR
Or possibly faster than a dict lookup: An array or a list of strings where the
index number is the ordinal number of the character. So `array[ord('(')] ==
'&#(;'`.
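
A minimal sketch of that table idea, assuming a `&#<decimal>;` escape format and a table covering only code points 0..255, with a fallback for everything above. `WHITELIST`, `TABLE` and `escape_table` are hypothetical names.

```python
WHITELIST = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
)

# Precomputed once: index is the ordinal, value is the escaped form.
# The 256-entry cutoff is an arbitrary choice for this sketch.
TABLE = [chr(i) if chr(i) in WHITELIST else "&#%d;" % i for i in range(256)]


def escape_table(text):
    table = TABLE  # local alias for the module-level list
    out = []
    for ch in text:
        code = ord(ch)
        # Characters beyond the table are formatted on the fly
        out.append(table[code] if code < 256 else "&#%d;" % code)
    return "".join(out)
```

The loop still pays for one `ord` call per character, so whether this beats a plain dict keyed by character is something to measure rather than assume.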

~~~
gbog
I think I remember that list indexing "list[x]" is O(n), while hash access
"hash[x]" is O(1), but I don't have my Martelli at hand.

If I were the guy, I'd first ensure the final escaped result is cached in a
key-value store, then I'd check whether this func is the real bottleneck. If
so, I might also have tried accessing its content as a byte array. Then, if the
file is of Asian origin (ASCII being a very small minority), I'd bulk escape it
with the "&#x%s;" trick. It is rare to have documents with an even mix of
ASCII/Latin and other glyphs, so it makes sense to have two functions, like he
did.

~~~
kbd
Python's lists are not linked lists. List indexing is O(1).

~~~
gbog
You are right, double checked in my Martelli.

------
pdhborges
It would be nice if the author had profiled the code instead of just timing
the test templates.
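
As a rough illustration of what profiling adds over wall-clock timing, here is a sketch using the standard `cProfile` and `pstats` modules. `escape_char` and `render` are hypothetical stand-ins, not the code from the post.

```python
import cProfile
import io
import pstats


def escape_char(ch):
    # Hypothetical stand-in for the per-character work in the template code
    return ch if ch.isalnum() else "&#%d;" % ord(ch)


def render(text):
    return "".join(escape_char(ch) for ch in text)


def profile_render(text):
    # Collect per-function call counts and timings for one render() call
    profiler = cProfile.Profile()
    profiler.enable()
    render(text)
    profiler.disable()

    # Format the ten most expensive entries by cumulative time
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(10)
    return report.getvalue()
```

Printing `profile_render("<b>hello</b>" * 1000)` gives a per-function breakdown, which shows where the time goes rather than only how long the whole template took.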

------
phren0logy
At the risk of exposing my ignorance, I thought

>Other “common wisdom”, like using locals instead of globals, yields
relatively little gain.

this advice was typically more related to avoiding collisions with variable
names, rather than performance?

~~~
thristian
In Python, it's both - the basic structured-programming advice to avoid global
variables is always good, but it's a specific quirk of Python that makes
global variables (including built-in functions) slower than local variables.

Python has nested lexical scopes, which means that inside a function you can
refer to any variables set in enclosing scopes. Because Python is a dynamic
language, every time you refer to a variable, the Python interpreter looks for
it in the local scope first, and then each enclosing scope until it hits the
containing module. A local variable will always be found in the first
iteration of that loop, a global variable will take at least two iterations.

~~~
encukou
Another CPython quirk is that global (module-level) variables are looked up in
a dict, but a function's local variables normally get a reserved chunk of
memory that's directly indexable.
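
One way to see the difference is to look at the bytecode with the standard `dis` module. This is a small sketch; the function names are made up, and the exact opcode names vary between CPython versions, so it only filters for the `LOAD_FAST` and `LOAD_GLOBAL` families.

```python
import dis

LIMIT = 10  # module-level (global) name


def uses_global():
    return LIMIT


def uses_local(limit):
    return limit


def variable_loads(func):
    # Keep only the instructions that read a local or a global variable
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if "LOAD_FAST" in ins.opname or "LOAD_GLOBAL" in ins.opname
    ]
```

Comparing `variable_loads(uses_global)` with `variable_loads(uses_local)` shows the global read compiled to a `LOAD_GLOBAL` instruction (a dict lookup by name) and the local read compiled to a `LOAD_FAST`-style instruction (an index into the frame's slots).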

------
wladimir
Interesting story, with a very good speedup. Though I personally wouldn't be
this patient, and would write a simple function like this in Cython or even
the C API directly (especially as the Python code drifts further from
idiomatic with each step...).

~~~
pwang
Writing it in Cython would have been my solution. No futzing around with 8 or
9 iterations. It would just be fast, and the code would still look clean
(unlike the obfuscated garbage they ended up with in the blog post).

------
mace
This is slightly better, IMHO:
<http://www.python.org/doc/essays/list2str.html>

Some best practices when optimizing CPython code:

* Re-evaluate your algorithm (an inefficient quicksort is still faster than an optimized bubblesort)

* Use Python functions and constructs implemented in C (ex. most builtins, list comprehensions)

* Move loops from outside functions to inside (function call overhead is high)

* Use try/except to handle uncommon cases rather than using conditional checks in a loop.

* Eliminate dots (attribute lookup) in tight loops (create a local alias if needed)

See also: <http://wiki.python.org/moin/PythonSpeed/PerformanceTips>
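
To make the last two tips concrete, here is a small sketch, assuming a `&#<decimal>;` escape format and a cache dict pre-filled with the whitelisted characters. `make_cache`, `escape_checked` and `escape_tuned` are hypothetical names.

```python
import string


def make_cache():
    # Whitelisted characters map to themselves
    return {ch: ch for ch in string.ascii_letters + string.digits}


def escape_checked(text, cache):
    out = []
    for ch in text:
        if ch not in cache:        # conditional check on every iteration
            cache[ch] = "&#%d;" % ord(ch)
        out.append(cache[ch])      # attribute lookup on every iteration
    return "".join(out)


def escape_tuned(text, cache):
    out = []
    append = out.append            # local alias: no dot inside the loop
    for ch in text:
        try:
            append(cache[ch])      # common case: one dict lookup
        except KeyError:           # uncommon case: first time we see ch
            escaped = cache[ch] = "&#%d;" % ord(ch)
            append(escaped)
    return "".join(out)
```

Both return the same result. Whether the second is measurably faster depends on how often the `except` branch fires, since raising an exception is far more expensive than a failed `in` check.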

------
dgrant
Why is string interpolation in Python so slow?

~~~
wladimir
Good point which would be worth an article in itself. Also comparing the
various string interpolation methods that Python has (% versus format).
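
A starting point for such a comparison could look like the sketch below. It assumes Python 3.6+ for f-strings, and the function names and the `&#<decimal>;` format are illustrative only.

```python
import timeit


def with_percent(n):
    return "&#%d;" % n


def with_format(n):
    return "&#{};".format(n)


def with_fstring(n):
    return f"&#{n};"


def with_concat(n):
    return "&#" + str(n) + ";"


def compare(number=100_000):
    # Time each variant on the same input; returns seconds per variant
    candidates = (with_percent, with_format, with_fstring, with_concat)
    return {
        func.__name__: timeit.timeit(lambda: func(40), number=number)
        for func in candidates
    }
```

`compare()` returns a dict of timings keyed by function name. The ranking differs between interpreter versions, so the numbers are only meaningful for the build they were measured on.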

------
jamwt
Clean C > Ugly Python, if you're really going to force the issue out of
Python's comfort zone (readability).

------
MostAwesomeDude
Dude, use PyPy. Seriously. Please. Sacrificing readability for this kind of
work is not good.

~~~
jacoblyles
I would love to see the results of a similar test case for PyPy.

------
captain-asshat
I haven't written any Python in a while, but a 15% speedup from inlining a
function call? Really? This reinforces my preference for statically typed
languages.

~~~
pavpanchekha
I haven't written C in a while, but I dereferenced null and my program
crashed. No stack trace, no exception? Really? This reinforces my preference
for dynamic languages.

How about we stop painting pictures of languages from one specific difference?

~~~
beej71
I agree with you, not the parent, but I just want to clarify for posterity
that it is generally possible to get a stacktrace from a crashed C program.
:-)

