

Python idioms - preek
http://jaynes.colorado.edu/PythonIdioms.html

======
Estragon

      > [Despite the existence of "enumerate", "xrange(maxint)"] can still
      > be useful when you want to include an index along with several
      > other lists, however, e.g. zip(list_1, list_2, indices)
    

I would just use

    
    
      for idx, (elt1, elt2) in enumerate(zip(list_1, list_2)):
    

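As a runnable sketch (the list names are just illustrative):

```python
# Pairing two lists while keeping an index: enumerate over the zip,
# instead of zipping in a separate indices list.
list_1 = ["a", "b", "c"]
list_2 = [10, 20, 30]

result = []
for idx, (elt1, elt2) in enumerate(zip(list_1, list_2)):
    result.append((idx, elt1, elt2))

print(result)  # [(0, 'a', 10), (1, 'b', 20), (2, 'c', 30)]
```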
The following is not necessarily good advice, and he should take his own
earlier advice and validate his claims with some profiling.

    
    
      > [map is ] much faster, since the loop takes place entirely in
      > the C API and never has to bind loop variables to Python
      > objects.
    
      > If you find yourself making the same list comprehension
      > repeatedly, make utility functions and use map and/or filter...  
    

Note that the following "map" snippet is considerably slower than the
corresponding list-comprehension snippet. Function calls are expensive in
Python. Also, his claim that you save time with "map" by avoiding the loop
variable is obviously bogus: you still have to bind the variables in the
signature of the function you're mapping, unless the signature is empty.

    
    
      met% python -m timeit "[x**3 for x in xrange(10000)]"
      1000 loops, best of 3: 1.27 msec per loop
      met% python -m timeit "map(lambda x: x**3, xrange(10000))"
      1000 loops, best of 3: 1.88 msec per loop

~~~
orlandu63
I'm getting entirely different results with Python 3.2:

    
    
        % python -m timeit "[ x**3 for x in range(10000) ]"
        100 loops, best of 3: 7.85 msec per loop
        % python -m timeit "map(lambda x: x**3, range(10000))"
        1000000 loops, best of 3: 1.26 usec per loop
    

By these benchmarks map is over 6,000 times faster than the corresponding
list comprehension.

~~~
imurray
When times are that short you should question whether Python is actually doing
anything, or just promising to do it later (remember that map returns a lazy
iterator in Python 3, and range is a lazy sequence rather than a list).

    
    
      $ python3.1 -m timeit "map(lambda x: x**3, range(10000))"
      1000000 loops, best of 3: 1.01 usec per loop
    

Seems fishy. Adding up the elements forces Python to actually do the cubing on
all of them:

    
    
      $ python3.1 -m timeit "sum(map(lambda x: x**3, range(10000)))"
      100 loops, best of 3: 13 msec per loop
      $ python3.1 -m timeit "sum([ x**3 for x in range(10000) ])"
      100 loops, best of 3: 10.6 msec per loop
    

map really is slower, at least for me in python3.1.
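The laziness being described is easy to demonstrate directly in Python 3 (a
minimal sketch):

```python
# In Python 3, map returns a lazy iterator: constructing it does no cubing.
cubes = map(lambda x: x**3, range(10000))

# Elements are computed only on demand.
first_three = [next(cubes) for _ in range(3)]
print(first_three)  # [0, 1, 8]

# Forcing the whole iterator (e.g. with sum) is what does the real work.
total = sum(cubes)  # consumes the remaining 9997 elements
assert total == sum(x**3 for x in range(3, 10000))
```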

~~~
ldng
    
    
      $ python3.2 -m timeit "sum(map(lambda x: x**3, range(10000)))"
      100 loops, best of 3: 7.61 msec per loop
      $ python3.2 -m timeit "sum([ x**3 for x in range(10000) ])"
      100 loops, best of 3: 6.27 msec per loop
      $ python2.7 -m timeit "sum(map(lambda x: x**3, range(10000)))"
      100 loops, best of 3: 2.81 msec per loop
      $ python2.7 -m timeit "sum([ x**3 for x in range(10000) ])"
      100 loops, best of 3: 2.28 msec per loop
    

I wasn't expecting Python 3 to be that much slower.

~~~
imurray
[ldng: put some whitespace before code on hacker news: it indents it and the
asterisks don't disappear.]

I infer that ldng has a 64 bit machine. On my 32 bit machine, Python 3.1 is
faster than 2.6 for these examples. On a 64 bit machine I get similar results
to ldng's, with Python3 being slower. If I wrap long() around the x in the
example, Python2 becomes as slow as Python3.

Note that taking the cube of lots of big integers is not typical for many
people: it generates very large integers that have to be in Python2's special
long type on a 32 bit machine. On a 64 bit machine they stay as normal ints in
Python2, which are much faster. Python3 has a single automagic int type, which
seems to internally convert to the arbitrary precision type sooner than it has
to on 64 bit machines(?).

Examples more typical of my use would wrap float() around the x, or change the
example to add up 3x instead of x^3. These examples are all faster in Python3
for me. Faster still is to use numpy (which is now supported in Python3).

Summary: the people who would be affected by this regression have both a 64
bit machine, and do a _lot_ of exact integer arithmetic on integers that can
be represented in 64 bits, but not 32.

------
Ysx
Good stuff, though there's a few outdated idioms:

Lists, or any iterable, can be reverse-sorted with reversed(sorted(my_list)).
That'll give you an iterator, though you can call list() on the result if you
need it.
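Both spellings give the same descending order for plain values (a quick
check):

```python
nums = [3, 1, 4, 1, 5]

# reversed() wraps the sorted list in a reverse iterator...
via_reversed = list(reversed(sorted(nums)))

# ...while reverse=True builds the descending list directly.
via_keyword = sorted(nums, reverse=True)

print(via_keyword)  # [5, 4, 3, 1, 1]
assert via_reversed == via_keyword
```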

"while True" should be used in place of "while 1". Reads better.

In Python 2, xrange() is preferred over range() when looping - it won't create
an in-memory list of integers, and behaves mostly the same as range but for a
few edge cases. Python 3 renamed xrange() to range(), and removed the original
range() function.
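Python 3's range is a lazy sequence, not a materialized list, so it supports
len(), indexing, and membership tests without allocating the whole range (a
small sketch; the size check assumes CPython):

```python
import sys

# No million-element list is built here.
r = range(10**6)
assert len(r) == 10**6
assert r[123456] == 123456
assert 999999 in r  # membership works without iterating

# The range object itself stays tiny regardless of its length.
print(sys.getsizeof(r) < 100)  # True on CPython
```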

~~~
shazow
> reversed(sorted(my_list))

I thought sorted(my_list, reverse=True) would be slightly faster, but it seems
not (by a tiny amount). Weird.

> "while True" should be used in place of "while 1". Reads better.

If you disassemble these statements, you'll see that "while 1" creates fewer
instructions:

    
    
        def foo():
            while True:
                pass
    
        def bar():
            while 1:
                pass
    
        import dis
    
        dis.dis(foo)
        #       0 SETUP_LOOP              10 (to 13)
        # >>    3 LOAD_GLOBAL              0 (True)
        #       6 POP_JUMP_IF_FALSE       12
        #
        #       9 JUMP_ABSOLUTE            3
        # >>   12 POP_BLOCK
        # >>   13 LOAD_CONST               0 (None)
        #      16 RETURN_VALUE
    
        dis.dis(bar)
        #       0 SETUP_LOOP               3 (to 6)
        #
        # >>    3 JUMP_ABSOLUTE            3
        # >>    6 LOAD_CONST               0 (None)
        #       9 RETURN_VALUE
    
    

Edit: This is in Python 2.7.1. d0mine pointed out that the discrepancy is no
longer the case in Python 3. Good to know. :)

~~~
d0mine
There is no difference on Python 3:

    
    
      >>> dis.dis(foo)
      2           0 SETUP_LOOP               3 (to 6) 
    
      3     >>    3 JUMP_ABSOLUTE            3 
            >>    6 LOAD_CONST               0 (None) 
                  9 RETURN_VALUE         
      >>> dis.dis(bar)
      2           0 SETUP_LOOP               3 (to 6) 
    
      3     >>    3 JUMP_ABSOLUTE            3 
            >>    6 LOAD_CONST               0 (None) 
                  9 RETURN_VALUE

------
kqueue
    
    
      ~$ python -mtimeit "'a' + 'b' + 'c' + 'd'"
      10000000 loops, best of 3: 0.026 usec per loop
      ~$ python -mtimeit "''.join(('a','b','c','d'))"
      10000000 loops, best of 3: 0.197 usec per loop

~~~
Ysx
Interesting! ''.join() has the advantage on longer strings though:

    
    
      $ python -mtimeit "'aaaaaaaaaaaaaaa' + 'bbbbbbbbbbbbbbb' + 'ccccccccccccccc' + 'ddddddddddddddd'"
      1000000 loops, best of 3: 0.224 usec per loop
      $ python -mtimeit "''.join(('aaaaaaaaaaaaaaa','bbbbbbbbbbbbbbb','ccccccccccccccc','ddddddddddddddd'))"
      10000000 loops, best of 3: 0.201 usec per loop
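Where join really pays off is when accumulating many pieces in a loop, since
repeated "+" re-copies the growing string on every iteration (a minimal
sketch):

```python
# Collect pieces in a list, then join once: each character is copied once,
# instead of re-copying the whole accumulated string per iteration.
pieces = []
for i in range(1000):
    pieces.append(str(i))

joined = "".join(pieces)
print(joined[:10])  # 0123456789
assert joined.endswith("999")
```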

~~~
kqueue
That's definitely interesting.

------
1amzave
> _Calling it from the empty string concatenates the pieces with no separator,
> which is a Python quirk and rather surprising at first._

This is stated twice, but I can't make any sense of it. How is this even
remotely surprising? What _else_ would anyone possibly expect joining with the
empty string as the separator to _do_?

~~~
kragen
The surprising thing is that .join is a method of the separator, not of the
list of pieces.

~~~
dzderic
I'm pretty sure it's a string method so it can take any iterator as an
argument.
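A quick illustration: because join is a method of the separator string, it
can consume any iterable of strings, including a generator.

```python
# join is a method of the separator, so any iterable of strings works.
assert "-".join(["a", "b", "c"]) == "a-b-c"

# The empty-string separator just concatenates the pieces:
assert "".join(["py", "thon"]) == "python"

# A generator expression works too; no intermediate list is needed:
csv = ", ".join(str(n) for n in range(3))
print(csv)  # 0, 1, 2
```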

~~~
kragen
I think that was the deciding argument, yes.

------
skimbrel
_Use function factories to create utility functions. Often, especially if
you're using map and filter a lot, you need utility functions that convert
other functions or methods to taking a single parameter. In particular, you
often want to bind some data to the function once, and then apply it
repeatedly to different objects. In the above example, we needed a function
that multiplied a particular field of an object by 3, but what we really want
is a factory that's able to return for any field name and amount a multiplier
function in that family:_

    
    
      def multiply_by_field(fieldname, multiplier):
        """Returns function that multiplies field "fieldname" by multiplier."""
        def multiply(x):
            return getattr(x, fieldname) * multiplier
        return multiply
    
      triple = multiply_by_field('Count', 3)
      quadruple = multiply_by_field('Count', 4)
      halve_sum = multiply_by_field('Sum', 0.5)
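(Note that as originally written, the inner function reused the name
multiplier, shadowing the argument it was supposed to multiply by.) A
self-contained check of the corrected factory, with a made-up Record class
just to exercise it:

```python
def multiply_by_field(fieldname, multiplier):
    """Return a function that multiplies attribute `fieldname` by multiplier."""
    def multiply(x):
        return getattr(x, fieldname) * multiplier
    return multiply

# Hypothetical class for illustration only:
class Record:
    def __init__(self, Count, Sum):
        self.Count = Count
        self.Sum = Sum

triple = multiply_by_field('Count', 3)
halve_sum = multiply_by_field('Sum', 0.5)

rec = Record(Count=2, Sum=10)
print(triple(rec))     # 6
print(halve_sum(rec))  # 5.0
```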
    

Other languages (most prominently Haskell, though you can convince most Lisps
to do it through the clever use of macros) have built-in support for doing
this with whatever function you want, and it's called partial function
application. It's a rather useful technique and it saddens me that it's not
supported as such in more languages claiming to support functional
programming.

~~~
Lycanthrope
There is functools.partial:

    
    
      from functools import partial
      def add(x,y): return x+y
      add3 = partial(add, 3)
      add3(2) # returns 5

~~~
skimbrel
Ah! I didn't know about that. It'd be nice to have the syntactic sugar à la
Haskell, but oh well. Close enough.

------
j_baker
_Use if not x instead of if x == 0 or if x == "" or if x == None or if x ==
False; likewise, if x instead of if x != 0, if x != None, etc._

Be careful with this one. "if not x" isn't necessarily the same as "if x ==
None". It's easy to forget that "if not x" will be true for values other than
None.

Also, use "if x is None" rather than "if x == None". :-)
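The distinction matters because several distinct values are falsy (a quick
check; apply_default is a made-up function for illustration):

```python
# All of these are falsy, so "if not x" lumps them together:
falsy = [0, 0.0, "", [], {}, None, False]
assert all(not v for v in falsy)

# But only None is actually None:
assert [v for v in falsy if v is None] == [None]

# So where 0 (or "") is a legitimate value, test identity with None:
def apply_default(x, default=42):
    return default if x is None else x

print(apply_default(0))     # 0, not 42
print(apply_default(None))  # 42
```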

------
mixmastamyk
I read many years ago that '%s%s' % (a, b) was faster than a + b, the reason
given being that the formatting was done in C. But after reading this thread
and trying it myself, that seems to be false as well. On Py 2.6:

    
    
      python -m timeit " '%s%s%s' % ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'bbbbbbbbbbbbbbbbbbbbbbbb', 'ccccccccccccccccccccccccccccccc') "
      1000000 loops, best of 3: 0.201 usec per loop
    
      python -m timeit " 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaa' + 'bbbbbbbbbbbbbbbbbbbbbbbb' + 'ccccccccccccccccccccccccccccccc' "
      10000000 loops, best of 3: 0.136 usec per loop
    

Perhaps it changed in new versions of Python, but either way, I guess I'll be
using the % form less than I used to.

~~~
wahnfrieden
Use whatever is clearest to read, since the benchmarks vary upon circumstance,
VM, and VM version.

------
fijal
For what it's worth, most "performance" idioms are anti-idioms on PyPy.
Especially these:

    
    
      sum = 0
      for d in data:
          sum += d
      product = 1
      for d in data:
          product *= d
    

These are much faster than the reduce/map equivalents. The zip/dict example at
the end is even more confusing. I'm convinced PyPy would be fastest on the
simplest-possible code (the first version, the one marked as "bad").
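For reference, the reduce equivalents those loops are being compared against
might look like this (same results; as noted above, not necessarily faster,
especially on PyPy):

```python
from functools import reduce
import operator

data = [1, 2, 3, 4]

# The plain loops from the comment above:
total = 0
for d in data:
    total += d
product = 1
for d in data:
    product *= d

# The reduce equivalents:
assert total == reduce(operator.add, data) == 10
assert product == reduce(operator.mul, data) == 24
```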

------
astrofinch
What's the rationale for preferring map/filter over list comprehensions?

------
tocomment
I wish every language listed its idioms somewhere.

