
Python: Faster Way - kozlovsky
http://pythonfasterway.uni.me/
======
perlgeek
And this, ladies and gentlemen, is how you do _not_ make speed comparisons.

#7 surprised me: why would 'if a:' be slower than 'if not a:'? So I ran the
code myself, and ran it a few times. And lo and behold, the third digit varies
greatly between runs. And the timing between the first and second variant only
differ in the third digit.

I ran the first and second variant ten times, and got these values:

first: 0.138 +- 0.05

second: 0.137 +- 0.04

So there is no significant difference in my timings, though individual runs
differed as much as 0.132 vs. 0.150.

Though kudos for providing the full code and the raw output. It made it quite
easy to reproduce!

These are the timings I measured, in octave/matlab syntax:

    
    
        t1 = [0.132781028748, 0.140866041183, 0.13587808609, 0.138177871704, 0.137129068375, 0.150414943695, 0.138676166534, 0.137069940567, 0.13484787941, 0.136043071747];
        mean(t1)
        std(t1)
    
        t2 = [ 0.134783983231, 0.143576860428, 0.138277769089, 0.142880916595, 0.140572071075, 0.136868000031, 0.131128787994, 0.134744167328, 0.132730007172, 0.137500047684];
    
        mean(t2)
        std(t2)

~~~
thomasahle
I was going to post this: If you do an average of `N` independent trials, your
confidence interval is `O(1/sqrt(N))`.

Hence, with the 10^6 trials done in the article, we should expect a couple of
points of deviation in the third digit.
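The 1/sqrt(N) scaling is easy to see with a toy simulation (an assumed noise
model, not the article's data): the spread of the mean over batches of 1000
simulated timings comes out roughly ten times smaller than over batches of 10.

```python
import random
import statistics

random.seed(0)  # deterministic for reproducibility

def spread_of_means(batch_size, batches=200):
    """Std-dev of the mean of `batch_size` simulated timings."""
    means = []
    for _ in range(batches):
        # simulated timing: a 0.137 s base cost plus OS-induced jitter
        samples = [0.137 + random.expovariate(1 / 0.005)
                   for _ in range(batch_size)]
        means.append(statistics.mean(samples))
    return statistics.stdev(means)

s10 = spread_of_means(10)
s1000 = spread_of_means(1000)
print(s10, s1000)  # the second spread is roughly 10x smaller
```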

~~~
Matumio
Taking the average is a bad idea for performance benchmarks like this, run
under an operating system. The minimum is much more stable. (This is easy to
verify experimentally: do 10x10 runs of a short benchmark and compare the
variance of the 10 minima to the variance of the 10 averages.)
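That 10x10 experiment can be sketched like so (assuming a one-sided noise
model in which OS interruptions only ever make a run slower, never faster):

```python
import random
import statistics

random.seed(1)

def run_once():
    # the benchmark's true cost plus one-sided OS noise (assumed model)
    return 0.100 + random.expovariate(1 / 0.01)

# 10 groups of 10 runs each
groups = [[run_once() for _ in range(10)] for _ in range(10)]

minima = [min(g) for g in groups]
averages = [statistics.mean(g) for g in groups]

sd_min = statistics.stdev(minima)
sd_avg = statistics.stdev(averages)
print(sd_min, sd_avg)  # the minima vary less than the averages
```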

~~~
stefantalpalaru
True, but the minimum might also depend on the various caches being primed by
previous runs of the same code, a condition that will not occur during normal
use.

~~~
Matumio
Caches being primed is a concern, but not when choosing between minimum and
average. Neither is affected much if the first run takes longer than the other
nine.

~~~
gcr
IPython's %timeit magic works by running the code in three timing loops. It
reports the minimum of the three per-loop averages. This seems like a good
compromise.
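Outside IPython, the plain timeit module offers the same pattern via
repeat(); taking the min of the repeats mimics what %timeit reports (a quick
sketch, written for Python 3):

```python
import timeit

# three repeats, each timing one million executions of the statement
results = timeit.repeat("x = not []", repeat=3, number=1000000)

print(min(results))  # report the best repeat, like IPython's %timeit
```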

------
jhgg
Examples b and c in test #6 are something you should NEVER do in Python. Using
"is" to compare integers is a huge no-no, and will only work for small
integers (in CPython, -5 to 256) that are cached by the interpreter. "is"
does identity checking, not equality checking.

    
    
        >>> a = 500
        >>> b = 500
        >>> a is b
        False
        >>> a == b
        True
        >>> a = 2
        >>> b = 2
        >>> a is b
        True
        >>> a == b
        True

------
ayrx
I honestly do not get the point of this page.

None of the time differences in any of the test cases are significant.
Concerning yourself with this is premature optimization of the highest order,
especially in a language like Python. One should definitely be more concerned
with writing clearer and more idiomatic code.

~~~
vegabook
I have to agree, though there is some information to be gleaned, precisely
because this guy has gone to great lengths to show us that there is no point
in trying to get clever with alternative, non-idiomatic ways of doing things.
You can use the cleanest idioms with your mind at ease.

It reminds me a bit of some of my big data jobs. Sometimes you spend days and
days torturing data to find interesting aspects, only to find nothing and end
in "failure". Only it isn't failure because you've eliminated a bunch of dead-
end avenues that you might otherwise have wondered about, freeing you to
pursue more promising uses for your resources.

~~~
jarcane
This is mainly what I took from it. The faster way here is almost always the
most idiomatic way to do it in the examples, and the reasons why are usually
pretty obvious.

And while the individual differences are small, in a language already as slow
as Python, starting with the faster, idiomatic code is likely to save a lot of
headache down the line.

------
TickleSteve
Two points:

\- The time differences displayed here are simply not significant on a desktop
OS such as Windows or Linux. Maybe Windows decided to swap, or fetch your
email (or, more likely, run a virus check) while the test was running? The
error bars on these numbers make them insignificant.

\- When using Python, readability >> performance. If you want performance, use
C; don't even try to optimise Python, it's not worth the effort.

~~~
nine_k
In Python it's worth optimizing your _algorithm_, e.g. replacing an O(n²)
algorithm with an O(n log n) one, but trying to shave 0.1% of performance off
a tight loop is the wrong approach. Instead, reach for NumPy / Numba,
Shedskin, etc., down to plain C.
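The kind of win that matters can be sketched on a made-up task (hypothetical
helper names): finding values that appear more than once, first with a
quadratic scan, then with a set-based pass that is linear on average.

```python
def duplicates_quadratic(items):
    """O(n^2): compare every element against all earlier ones."""
    return [x for i, x in enumerate(items)
            if x in items[:i]]

def duplicates_linear(items):
    """O(n) on average: one pass, remembering seen values in a set."""
    seen, dupes = set(), []
    for x in items:
        if x in seen:
            dupes.append(x)
        else:
            seen.add(x)
    return dupes

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
print(duplicates_quadratic(data))  # [1, 5, 3]
print(duplicates_linear(data))     # [1, 5, 3]
```

Either version dwarfs any micro-tweak once the list gets large, which is the
point: change the algorithm, not the spelling.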

~~~
acveilleux
I've written a few specific python apps where optimization at this level was
worth contemplating in a couple spots after profiling (basically, tight
loops). That said, replacing the algorithm has usually been a much smarter
play.

------
Tobu
Those are the quirks of CPython 2.7. The best optimisation is to run with PyPy
— those benchmarks run about 10 times faster.

After that, if you want to optimise, you need an application-level benchmark.
A microbenchmark won't work because type specialisation operates on (traces
of) the program as a whole.

------
jevgeni
This just feels like picking up pennies in front of a steam roller.

------
jwl
Could someone please explain test #11? How is using %s twice as fast as %d,
even though the same thing is being calculated? I always use %d because it
seems like the proper way to do it.

~~~
nikital
I looked at the source of 2.7:

The function PyString_Format (Objects/stringobject.c) does the formatting for
the % operator. For %s it calls _PyObject_Str, which in turn calls str() on
the object. For %d it calls formatint (located in the same file).

The str() implementation for ints is in int_to_decimal_string
(Objects/intobject.c) and it's incredibly simple:

    
    
        do {
            *--p = '0' + (char)(absn % 10);
            absn /= 10;
        } while (absn);
    

The code for formatint is considerably more complex, and it contains two calls
to the native snprintf:

    
    
            PyOS_snprintf(fmt, sizeof(fmt), "%s%%%s.%dl%c",
                          sign, (flags&F_ALT) ? "#" : "",
                          prec, type);
            // ...
            PyOS_snprintf(buf, buflen, fmt, -x);
    

The native snprintf is heavier because it handles precision, zero-padding and
stuff like that.

I believe this is why %d is slower: %s is a straight divide-by-10-and-subtract
loop, while %d makes two library calls to the full-blown snprintf. However, I
didn't actually profile the code because I don't have a debug build, so I
might be completely wrong.

Edit: Link to code:
[https://github.com/python/cpython/blob/a5c7a4257507a77699a1a...](https://github.com/python/cpython/blob/a5c7a4257507a77699a1ab8ec079f41d505b9eee/Objects/stringobject.c#L4444)
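The gap itself is easy to reproduce with timeit (the exact ratio will vary by
CPython version; the code path above is specific to 2.7):

```python
import timeit

# time both format codes on the same integer
t_s = timeit.timeit("'%s' % 12345", number=1000000)
t_d = timeit.timeit("'%d' % 12345", number=1000000)

print(t_s, t_d)

# both conversions produce identical output for plain integers
print('%s' % 12345, '%d' % 12345)
```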

------
thu
I met someone who was specifically using `dict()` instead of `{}` because, he
told me, the CPython parser was much slower on the latter than on the former,
and that it was noticeable on a larger code base.

(That being said I think using Python should be about writing maintainable
code, not about micro-optimization.

I don't claim the purpose of that site is to push you to use those micro-
optimizations, even if the title seems to suggest so. Including the
disassembly makes for a nice visualization.)

~~~
chriswarbo
> I met someone who was specifically using `dict()` instead of `{}` because,
> he told me, the CPython parser was much slower on the later than on the
> former and that it was noticeable on larger code base.

I've been told by code reviewers to use micro-optimisations like this before.

My default reply is "I think this way is faster, since the shorter code
results in fewer cache misses. I'd be happy to change it if you show me some
benchmarks."

Of course, the cache may or may not affect anything; the point is to burst
their philosophical bubble with an example of why measurements are needed.

There's also the good old fashioned "If we cared that much about performance,
we wouldn't be using the branching logic of a templating system in an
interpreted language."

~~~
Spidler
Optimize for readability. Optimize for maintenance. Optimize for consistency.
Once those three are done, you can optimize for performance.

~~~
jqm
But you can also habitually change a few things in your initial coding syntax
so that it takes less computational effort to begin with....

I thought it was a good article.

~~~
chriswarbo
1) We _don't know_ which one takes more computational effort, hence the need
for realistic benchmarks with decent confidence intervals.

2) If benchmarks show one style to be slightly faster than another, then I
_will_ take note. I've already noted that these benchmarks have no associated
statistical information; they're just numbers quoted to an arbitrary
precision.

3) If I ever find myself _needing_ such slight increases in speed, it's
probably a sign that there's a looming disaster, since it means that a)
there's a bottleneck in the program, b) I can't improve the algorithm, c)
porting that section to native code hasn't been enough.

If it's a case of death by a thousand papercuts, where slight slow-downs
throughout an application are actually making it unusable, then I'll roll up
my sleeves and patch the interpreter/compiler (this is easier in self-hosting
languages, but doable in all).

~~~
jqm
No need to get all carried away talking about rewriting compilers or pulling
out statistical confidence intervals...

If a certain style has a slight performance improvement, and this can be
demonstrated, and that style is equally readable and maintainable, there is
simply no reason not to adopt it as a habit. That's all I'm saying.

------
chriswarbo

        >>> timeit(a, number=1000000)
        ... 0.102125167847
    

Measurements shouldn't be quoted without some kind of range, confidence
interval, error bar, etc.

I've not come across timeit before, but maybe they could learn a thing or two
from systems like Criterion
[http://www.serpentine.com/criterion/](http://www.serpentine.com/criterion/)

------
Beltiras
I found the most enlightening part to be the module 'dis', which I was totally
oblivious of but will partake of often from here on in.
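For anyone else new to it, a minimal taste (Python 3 shown; the article used
2.7, where the opcodes differ slightly):

```python
import dis

def explicit(x):
    return x == True   # the non-idiomatic spelling: an extra comparison

def idiomatic(x):
    return bool(x)

dis.dis(explicit)      # prints the bytecode, including a COMPARE_OP
dis.dis(idiomatic)     # no comparison instruction here

# the instruction stream can also be inspected programmatically
ops = [ins.opname for ins in dis.Bytecode(explicit)]
print(ops)
```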

------
njharman
Profile dammit!

Things that improve by few tenths of a second in a million iterations are NOT
WORTH YOUR TIME.

------
ssebastianj
Make It Work -> Make It Right -> Make It Fast

------
TheLoneWolfling
These all seem like simple peephole optimizations to me. So why doesn't the
Python compiler (and yes, Python has a compiler) do these optimizations?

(I am aware that some of these are not in fact equivalent, but at least some
of them are)

On an unrelated note, the website would be easier to grok if it did diff-like
highlighting of changed lines.

~~~
bjourne
Most of them are not equivalent. dict() is not the same as {}, since someone
might have rebound the name dict to do something completely different.
Realizing that dict() and {} are, in fact, equivalent in this particular case
requires whole-program analysis, which maybe PyPy is capable of.
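A contrived demonstration of why the compiler can't assume dict() means a dict
literal: the name is just a global lookup that anyone can rebind at runtime
(Python 3 here, but the same holds in 2.7).

```python
def count():
    # this global lookup of "dict" happens at call time, not compile time
    return dict(calls=1)

dict = lambda **kw: {"hijacked": True}  # rebind the global name

print(count())   # {'hijacked': True}, not {'calls': 1}
print({})        # the literal is unaffected: always a real dict
```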

~~~
TheLoneWolfling
At least tests 3, 4, 5, 6, the first 3 cases of test 7, the first and third
cases of test 11, 13, 16, 17, 18, 20, 21 cases 1 and 2, 22, the first two
cases of 24, 25, 26, 27, 28, 29 tests 2 and 4, 30, and 32 can be optimized
without this issue, if not more.

As I said, "some of these are not in fact equivalent, but at least some of
them are".

~~~
bjourne
That's not right; #5 depends on what True is defined to be:

    
    
        True = 0
        def a():
            a = True
            if a:
                return True
            return False
        print a()
    

There are likely similar subtleties with the other examples, but I don't know
the Python spec well enough to see them. You asked why the Python compiler
didn't optimize these snippets more and that is the answer.

~~~
TheLoneWolfling

        >>> True = False
        SyntaxError: can't assign to keyword
    

...

~~~
bjourne
The examples are for Python 2, not 3. But it's blatantly obvious that you are
more interested in "not being wrong" than discussing Python's compiler so I'll
stop here.

------
ww2
Almost none of them are significant.

------
vanderZwan
I'm more surprised that the disassemblies differ _at all_ with half of these
tests.

------
eveningcoffee
I am wondering what exactly is measured here (by timeit). For example, for
test 5 there is a noticeable difference between the timings of variants b and
c despite them evaluating to the same list of instructions (and I ran the
tests myself).

~~~
im3w1l
The first has

    
    
        12 COMPARE_OP               8 (is)
    

while the second has

    
    
        12 COMPARE_OP               2 (==)

~~~
eveningcoffee
You are right! I overlooked this. Thanks!

------
nosefrog
For the tests where the speed difference is significant, these are really
areas where the python optimizer (I assume the bytecode compiler does some
optimization?) needs to improve. Humans shouldn't be concerned with such small
differences in the literal representation of code.

------
im3w1l
What's up with test 11, variant 1 vs 3?

~~~
sbergot
I guess that the float formatter has more work to do than the string one.

------
nichochar
This is so easy and quick to read, yet very interesting. Good job and thanks
for making it

------
jdimov
Here are my annotations for how I'm reading these. Feel free to correct or
update:

1\. Faster to use dict literals than the `dict()` constructor

2\. Faster to sort a list in-place than to use `sorted()`

3\. Faster to use multiple assignment than multiple individual assignments

4\. Faster to evaluate multiple individual conditions than a single multi-
comparison condition

5\. Faster to use idiomatic implied truth value than to compare to `True`

6\. Faster to use `!=` than `is not`

7\. Fastest way to check for empty list `a` is the idiomatic `if not a`

8\. Fastest way to check that an object is not null is `if a`

9\. `enumerate` is faster than doing it yourself

10\. Constructing strings with `+=` is faster than string surgery

11\. Fastest way to place a number in a string is "%s" (not "%d")

12\. For built-in types, using built-in functions (e.g. `len`) is faster than
calling methods (e.g. `a.__len__`)

13\. For built-in types, using built-in operators (e.g. `+`) is faster than
calling methods (e.g. `a.__add__(b)`)

14\. But for custom-defined classes, the opposite of #13 is true.

15\. Adding up a bunch of numbers using `+` is faster than using `sum`

16\. When initializing a large list, it is faster to use list comprehensions
than to append each element individually

17\. The fastest way to build a large dict is to use a dictionary
comprehension

18\. Same as #17

19\. Not sure what the author is trying to demonstrate here. That `bool(a)` is
faster when `a` is an empty list?

20\. Faster to use implied truth value of an expression than to compare
explicitly to `True` or `False`

21\. The fastest way to refer to a list is to use the list literal (duh...)

22\. It is faster to perform individual comparisons than to check for
membership with `in`

23\. Faster to `map` the `str` function to a list of values and join, than to
manually build up the string. Also, when doing this, faster to use `xrange`
than `range`

24\. Faster to unpack than to access each list element individually. Also, as
in #1, faster to use dict literal than the `dict()` constructor.

25\. Faster to just return the boolean value of an expression than to check
for it and then decide what to return.

26\. Faster to give explicit values to keyword arguments than to expand a dict
with kwargs.

27\. To swap the values of `a` and `b`, it is faster to use a third variable
than to do it in place using unpacking.

28\. Multiple assignment is faster than unpacking, which is faster than
multiple individual assignments.

29\. Fastest way to get integer part of a real is to use `math.floor`

30\. Similar to #26 - it is faster to give explicit values to positional
arguments than to expand a list of args.

31\. Faster to access local vars than globals.

32\. When trying to safely access a key in a dictionary, it is faster to use
the idiomatic `D[k] if k in D` than to use the `get` method.

~~~
msellout
I'd expand on #9.

It's generally faster and more readable to use idiomatic Python iteration than
to use indexing inside the loop. The difference was small over 5 elements, but
`enumerate(a)` becomes dramatically faster than `range(len(a))` with a large
list. A better comparison would be `enumerate` vs `xrange`.
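The two idioms side by side (Python 3, where range is lazy like 2.x's
xrange):

```python
a = ["spam", "ham", "eggs"]

# index-based loop: an extra a[i] lookup on every iteration
pairs_indexed = [(i, a[i]) for i in range(len(a))]

# idiomatic: enumerate yields (index, element) pairs directly
pairs_enumerated = [(i, x) for i, x in enumerate(a)]

print(pairs_enumerated)  # [(0, 'spam'), (1, 'ham'), (2, 'eggs')]
```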

