
Python Is Not C - johndcook
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Python_Is_Not_C?lang=en
======
mikeash
It's worrying that this article completely glosses over the fact that the
Manhattan distance approximation is seriously wrong. It may have given the
right answer in this case, but it definitely won't do so in all cases, and if
you don't already have the right answer to compare to then how will you know
if it's working or not?

Literally any problem can be solved quickly in any language if you're willing
to accept an incorrect answer.

~~~
cowsandmilk
It would have been nice to come back and see how fast calculating the correct
answer would be using numpy arrays. I bet it would have been acceptably fast.

~~~
jfpuget
I did it. It runs way faster than the loop, but it runs about 4x slower than
my approximation. Details in the post. Thanks for suggesting that.

------
adultSwim
Seems the lesson is that Python is slow. The author's realization about how to
use it correctly was just to call C instead.

Article not worth reading. Would have been much better as a quick tip. "Quick
tip: if you need to loop through an array in Python, do it this way
instead..."

~~~
dekhn
Actually, in most cases, you can speed up loops in python knowing just a few
tricks.
[https://wiki.python.org/moin/PythonSpeed/PerformanceTips](https://wiki.python.org/moin/PythonSpeed/PerformanceTips)

I see numerous issues with the loop, for example, hash key lookups on 'Lat'
and 'Long' for every iteration.
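
For illustration, here is a hypothetical reconstruction of that kind of loop (the article's data isn't shown, so the dict-of-lists shape and the values are made up), with the repeated lookups hoisted out:

```python
# hypothetical data in a dict-of-lists shape
trkpts = {'Lat': [48.1, 48.2, 48.3], 'Long': [2.1, 2.2, 2.3]}

# Before: two hash-key lookups on every iteration
def closest_slow(lat0, lon0):
    best_i, best_d = None, float('inf')
    for i in range(len(trkpts['Lat'])):
        d = abs(trkpts['Lat'][i] - lat0) + abs(trkpts['Long'][i] - lon0)
        if d < best_d:
            best_i, best_d = i, d
    return best_i

# After: look the lists up once, then iterate over them directly
def closest_fast(lat0, lon0):
    lats, lons = trkpts['Lat'], trkpts['Long']
    best_i, best_d = None, float('inf')
    for i, (la, lo) in enumerate(zip(lats, lons)):
        d = abs(la - lat0) + abs(lo - lon0)
        if d < best_d:
            best_i, best_d = i, d
    return best_i
```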

~~~
rasz_pl
umm NO. What's next, BasicPerformanceTips? It's an interpreted language, it
will always be at least 10x slower. Great for prototyping, bad for serious
work.

~~~
outworlder
There is no such thing as an "interpreted language", there are only
implementations. Please get rid of these old concepts.

CPython, which is the standard and most common implementation, uses bytecode
(as does Java). There's nothing preventing one from generating machine code
from Python source. Some of Python's dynamic features are mandated by the
spec, and there are limits on how fast those can be, but that has nothing to
do with "being an interpreted language".
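
For the curious, CPython's bytecode is easy to inspect (a small illustration; the exact opcode names vary slightly across Python versions):

```python
import dis

def add_one(x):
    return x + 1

# dis.Bytecode lets us inspect the compiled instructions programmatically.
ops = [ins.opname for ins in dis.Bytecode(add_one)]
# The function is compiled once to opcodes such as LOAD_FAST / BINARY_ADD
# (BINARY_OP on newer versions) / RETURN_VALUE; the "interpreted" part is
# the loop that executes these opcodes, not the source text.
```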

------
toyg
The lesson here is that for any fairly common Python task, there is likely a
library that already does it 300% faster than your implementation ever will.
It has nothing to do with using this or that style of programming.

~~~
collyw
I don't think that is exclusive to Python.

------
compostor42
His variable names really bother me. I can live with "lat", "lon" if I must,
but "d", "md", "trkpts"? Readability would be greatly enhanced if he just
spelled those out.

Programmers shouldn't be wasting brain power deciphering cryptic variable
names. Save that energy for where it counts (solving actual problems!)

~~~
gregor7777
Gotta say I agree with this. I notice a lot of Python programmers tend to use
less descriptive variable names. That's always been a pet-peeve of mine.

~~~
JupiterMoon
Source?

~~~
asdfas123414
Himself.

"I notice a lot of Python programmers tend to use less descriptive variable
names."

~~~
shele
Lovely :-)

------
larrydag
The same mental shift applies to R as well. For instance, use the *ply family
of functions (apply, lapply, sapply, tapply) if you want to loop over vectors
of data instead of using "for" loops.

~~~
mbq
Nope; the *ply functions are implemented as loops, so you are only saving the
time spent reallocating memory structures. The real speed-up comes from using
vectorised functions and operators (implemented in C or Fortran, in the R
engine or in a package) on whole vectors.

------
dr_zoidberg
There are many programmers out there using Python as if it were C, and that
leads to slower-than-necessary code. It takes some time to get used to the
language and learn the performant way to write loops. For example, if 'trkpts'
were a list of (lat, lon) tuples/lists, he could have avoided the lookups
(though it would also mean using a different structure).

Another example: if memory isn't a concern, he could write a list
comprehension with all the distance values and then get the index of the
smallest. This, however, has the problem that, though the list comprehension
surely runs faster than the for loop, it takes more memory, and then looking
for the lowest value can take back all the time you saved (and maybe more).
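
A sketch of that approach, plus a single-pass variant that pairs each distance with its index so no list is kept in memory (toy data, not the article's):

```python
pts = [(48.1, 2.1), (48.2, 2.2), (48.35, 2.4)]   # hypothetical (lat, lon) tuples
lat0, lon0 = 48.22, 2.18

# List comprehension of all distances, then a second pass for the index
dists = [abs(la - lat0) + abs(lo - lon0) for la, lo in pts]
i_best = dists.index(min(dists))

# Single pass: min over (distance, index) pairs, generated lazily
_, i_best2 = min((abs(la - lat0) + abs(lo - lon0), i)
                 for i, (la, lo) in enumerate(pts))
```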

Without profiling his use case, it's difficult to say what "the best solution"
is, but his problem comes mostly from coding Python as if it were C.

Edit: yes, I ignored the fact that he used numpy (a good solution, given his
300x speedup from changing the structures), because sometimes your data isn't
amenable to "numpy array conversion" -- for example, if you aren't writing
numerical code.

~~~
odonnellryan
> the list comprehension surely runs faster than the for loop

How much quicker? It isn't too significant, is it? I mean, we have two O(1)
operations to worry about, I suppose: lookup and appending to the list.

~~~
dr_zoidberg
It depends on what the for loop does, but the main difference is that a for
loop runs "interpreted", vs. a list comprehension, which runs "compiled" (or
in "C-space"). The list comprehension has some additional benefits; consider
the pattern:

    
    
        squares = []
        for x in xrange(100):
            squares.append(x*x)
    

in which you have an append call on every iteration, which is a mess because
it may lead to a lot of memory operations that are really unnecessary. The LC
version is much shorter and runs faster, because the memory allocation is
handled differently:

    
    
        squares = [x*x for x in xrange(100)]
    

Append may be O(1), but in some cases it doesn't behave nicely. On large
lists, you can hit spots where it reallocates parts of the list in memory,
leading to weird slowdowns. Timings from ipython on a Windows 8 64-bit Python
2.7.9 machine:

    
    
        In [12]: def f1():
           ....:     sq = []
           ....:     for x in xrange(100):
           ....:         sq.append(x*x)
           ....:     return sq
           ....:
        
        In [13]: def f2():
           ....:     return [x*x for x in xrange(100)]
           ....:
        
        In [14]: %timeit f1()
        10000 loops, best of 3: 20.7 µs per loop
        
        In [15]: %timeit f2()
        100000 loops, best of 3: 11.9 µs per loop

~~~
odonnellryan
Very interesting!

That's a pretty good speed-up! Makes me reconsider writing some "complex" LC
as for-loops.

~~~
dr_zoidberg
Bear in mind that the LC will take all the memory it needs at once, and that
can bring some problems. You can also use a generator expression (similar to
an LC, but with parentheses instead of brackets) to avoid this problem, if
you're expecting an iterable object and have no further need for the values
after the computation is done:

    
    
        In [1]: (x*x for x in xrange(100))
        Out[1]: <generator object <genexpr> at 0x0000000003558438>
    

And the generator expression could be used inside a for loop in cases where
the LC would be awkward or take more time (because you'd have to allocate
memory for a billion numbers all at once, just to check those that meet a
certain requirement).
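
For example, filtering lazily with a generator expression instead of materializing the whole list first (a toy sketch):

```python
# Only values passing the test are ever produced; nothing is stored up front
multiples = (x for x in range(10**6) if x % 97 == 0)

first_five = []
for x in multiples:
    first_five.append(x)
    if len(first_five) == 5:
        break   # the generator stops here; the remaining values are never computed
```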

------
Dav3xor
If you're iterating over all your points calculating distances, you are going
about this the wrong way.

(edit) The author could use my handy python quad tree if he so wishes --
[https://github.com/Dav3xor/pyquadtree](https://github.com/Dav3xor/pyquadtree),
and if he asks nicely, I could even add support for simple approximation of
spherical coordinates.

~~~
dhenneberger
I feel like a k-d tree would be a more appropriate solution, if the programmer
wanted to write more than a simple loop.
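
For reference, scipy ships one; a minimal sketch (random planar points as stand-ins for the track data, ignoring the spherical-geometry caveat near the poles and the date line):

```python
import numpy as np
from scipy.spatial import cKDTree

pts = np.random.uniform(0, 1, size=(10000, 2))   # hypothetical (lat, lon) pairs
tree = cKDTree(pts)                               # build once: O(n log n)

dist, idx = tree.query([0.5, 0.5])                # nearest neighbour per query
```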

~~~
Dav3xor
Sure, absolutely. Just wanted to point out that the article isn't very
sophisticated.

~~~
jfpuget
It is simple indeed. The performance I got was good enough to spare me the
effort of using quad trees. Yes, I know what quad trees are. I even coded
octrees for 3D reasoning in a previous life ;)

------
erjiang
Interesting to see this - we ran into a similar problem of finding points
within a certain distance from amongst thousands or millions of points. We
ended up using Cython[0].

Would this numpy trick work if he still needed an accurate distance
calculation? Kind of underwhelming to throw away the accuracy to get speed
without adding it back later.

[0] [http://doublemap.github.io/blog/2015/05/29/optimizing-python...](http://doublemap.github.io/blog/2015/05/29/optimizing-python/)
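
For what it's worth, the exact great-circle distance vectorizes just as well; a haversine sketch (my own, not the article's code, with illustrative city coordinates):

```python
import numpy as np

def haversine(lat0, lon0, lats, lons, r=6371.0):
    """Great-circle distance in km from (lat0, lon0) to arrays of points."""
    lat0, lon0, lats, lons = map(np.radians, (lat0, lon0, lats, lons))
    dlat, dlon = lats - lat0, lons - lon0
    a = np.sin(dlat / 2)**2 + np.cos(lat0) * np.cos(lats) * np.sin(dlon / 2)**2
    return 2 * r * np.arcsin(np.sqrt(a))

lats = np.array([48.8566, 51.5074])   # Paris, London
lons = np.array([2.3522, -0.1278])
d = haversine(48.8566, 2.3522, lats, lons)   # distances from Paris
```

The whole array of distances is still computed in C by numpy, so the accurate version costs only a few extra vectorized trig calls over the Manhattan approximation.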

~~~
noreasonw
Just thinking a little about this problem: there is an easy-to-code, precise,
high-performance algorithm for solving the following two problems:

1) finding the points Pi within a certain distance d0 of a fixed point P0 in
your database

2) finding the nearest point Pmin to P0, with Pmin in your database.

I will keep it to myself, but as a hint, here are two steps: first, read the
John Cook article about deriving the distance formula; second, think of an
easy way of avoiding unneeded computation.

It took me just a minute to realize the correct way to solve the OP's problem,
so it shouldn't take you long to solve it.

------
noreasonw
The post is about using numpy or pypy to get better speed in Python, since
for loops are slow in Python. That is well known, much like the well-known
fact that you should use vectorized operations in R to get better performance.
Anyway, there is something interesting here: the problem of, given a point P0
as input, finding the nearest point to P0 among a fixed billion points (all of
them on a sphere) can be solved easily and quickly. You would be surprised at
the code a mathematician could devise to solve this.
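
One standard observation, which may or may not be the trick this commenter has in mind: for points on a sphere, the point nearest by great-circle distance is also the nearest by straight-line chord distance. So you can precompute 3D unit vectors once and answer each query by maximizing a dot product, with no trigonometry per candidate point (toy data below):

```python
import math

def to_unit(lat, lon):
    """Unit vector on the sphere for a (lat, lon) pair in degrees."""
    la, lo = math.radians(lat), math.radians(lon)
    return (math.cos(la) * math.cos(lo), math.cos(la) * math.sin(lo), math.sin(la))

pts = [(48.8566, 2.3522), (51.5074, -0.1278), (40.7128, -74.0060)]  # toy data
vecs = [to_unit(la, lo) for la, lo in pts]   # precomputed once

def nearest(lat0, lon0):
    v0 = to_unit(lat0, lon0)
    # max dot product == min angle == min great-circle distance
    return max(range(len(vecs)),
               key=lambda i: sum(a * b for a, b in zip(v0, vecs[i])))
```

The arccos needed for the actual distance is monotonic, so it only has to be applied once, to the winner, at the end.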

~~~
jfpuget
I agree it is well known. I am just providing yet another example.

I contemplated quad trees but the performance of these 2 lines of code was
good enough. Why would I bother writing something more complex?

You'd be surprised by my mathematical background ;)

------
bite_victim
Some time ago I wrote a really simple code snippet to see the performance
differences between Python, PHP, C and Java (the languages I tinker in) on my
particular machine (i3 M 330, 2.13 GHz / 4 GB RAM / Ubuntu 15.04 x64).

The results were as follows:

~ 14.2 seconds for Python 3.4.3 [1]

~ 9.0 seconds for Python 2.7.9 [1]

~ 9.0 seconds for PHP 5.6 [2]

~ 2.3 seconds for C [3]

~ 2.3 seconds for Java 8 [4]

Again, this was on my machine with out-of-the-box settings. I have linked the
test code that I wrote, and perhaps there is something wrong with my Python
and PHP code, but to me the results were quite revealing. It's also
interesting that on my configuration C and Java both hit the limit of my CPU
(I can't explain the scores otherwise), and I can't know for sure whether Java
would still be on par with C on a more powerful CPU.

[1]
[https://gist.github.com/anonymous/7edafa3889be967a1e1d](https://gist.github.com/anonymous/7edafa3889be967a1e1d)

[2]
[https://gist.github.com/anonymous/56ff76849f5a312340d9](https://gist.github.com/anonymous/56ff76849f5a312340d9)

[3]
[https://gist.github.com/anonymous/5717ba935b43bad09e1d](https://gist.github.com/anonymous/5717ba935b43bad09e1d)

[4]
[https://gist.github.com/anonymous/6b0c2f11609b951b64f3](https://gist.github.com/anonymous/6b0c2f11609b951b64f3)

~~~
maxerickson
You should use the built-in pow for the Python version (or the double-asterisk
operator). I guess most of the difference with PHP is there.

The type conversion behavior of the math.pow function is clearly documented:

[https://docs.python.org/3/library/math.html#math.pow](https://docs.python.org/3/library/math.html#math.pow)
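
A quick check of that documented behavior: `**` on ints stays in integer arithmetic, while `math.pow` always converts to C doubles:

```python
import math

a = 3 ** 2           # integer arithmetic, no conversion
b = math.pow(3, 2)   # converts both arguments to float and returns a float

# So a benchmark using math.pow pays for a function call plus float
# conversions (and any int() call back) on every iteration.
```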

~~~
bite_victim
I have updated the test with the new figures that I got. Initially I had only
tested Python 2 with math.pow. I have to say I am quite disappointed with the
performance of Python 3, though, and even using the built-in pow function, PHP
5.6 is still the fastest in this particular case.

~~~
maxerickson
The call to int is unnecessary when using the built in.

~~~
bite_victim
I am a clumsy individual. Dropping the int call shaved off 3.5 seconds. But
how is this possible?! It's ridiculous, really!

~~~
maxerickson
The CPython interpreter is pretty naive, so it more or less does what you tell
it to.

~~~
bite_victim
PHP is no better: if you add an intval call, it adds almost 4 seconds to the
result! This is striking proof that being a little sloppy can cost you a lot
when using interpreted scripting languages!

Also, I am thinking if and how much speed gain could one inject by not using
OOP in PHP...

------
ArenaSource
It's not just Python; you face the same problem with Matlab: unless you
vectorize your code to remove loops, it's quite unusable for anything but
small arrays

------
sonium
To make it even faster, there is a (non-free) numpy build compiled with the
Intel MKL math library [1]. We use this library in high-performance computing;
it's as fast as you can get on Intel hardware.

[1] [https://store.continuum.io/cshop/mkl-optimizations/](https://store.continuum.io/cshop/mkl-optimizations/)

------
TheLoneWolfling
I'd personally have jumped straight to PyPy.

And let me get this straight. He can use C, but cannot use PyPy? How does that
make sense? If he's able to use C, he's able to run binaries anyway, at which
point he should be able to use PyPy. Unless I'm missing something?

~~~
jfpuget
I didn't use C. Read again ;)

Pypy was not an option either, details in the post now.

------
hardwaresofton
I wonder if the poster has heard of pypy

Not many things approach C speed, but PyPy has always seemed pretty close

~~~
_dark_matter_
Or, you know, Cython. Just find whatever is slowing you down and write that
part in (semi-)raw C. Keep everything else in Python.

~~~
jfpuget
And you lose python interactivity. If I have to compile code then I use C or
C++ directly.

------
Lofkin
He missed a critical option: you can write those loops in Python and JIT them
to C-fast LLVM code with numba:
[https://github.com/numba/numba](https://github.com/numba/numba)

~~~
jfpuget
I tried numba. It isn't helping here.

Yes, I used @jit(nopython=True). It does not compile pandas code.

~~~
Lofkin
Try converting to a numpy array, executing the operations and converting back
to pandas.

More details here: [http://pandas.pydata.org/pandas-docs/version/0.16.2/enhancingperf.html?highlight=numba](http://pandas.pydata.org/pandas-docs/version/0.16.2/enhancingperf.html?highlight=numba)

Also they recently added support for array expressions and allocation so you
should upgrade if you haven't already.
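
A sketch of that round trip (the column names and values here are made up, and it falls back to plain Python when numba isn't installed, so it only illustrates the shape of the workflow):

```python
import numpy as np
import pandas as pd

try:
    from numba import jit
except ImportError:                # no numba: run the same code un-jitted
    def jit(**kwargs):
        return lambda f: f

@jit(nopython=True)
def closest_idx(lats, lons, lat0, lon0):
    # A plain loop over numpy arrays: exactly what nopython mode can compile
    best_i, best_d = 0, 1e18
    for i in range(lats.shape[0]):
        d = abs(lats[i] - lat0) + abs(lons[i] - lon0)
        if d < best_d:
            best_i, best_d = i, d
    return best_i

df = pd.DataFrame({'Lat': [48.1, 48.2, 48.3], 'Long': [2.1, 2.2, 2.3]})
# hand numba plain numpy arrays rather than pandas objects
i = closest_idx(df['Lat'].values, df['Long'].values, 48.19, 2.21)
```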

------
kazinator
> _The reason for that speedup is that numpy array operations are written in
> C._

> [ ... ]

> _The lesson is clear: do not write Python code as you would do in C._

:)

------
ovis
If this is more than a one-off, then maybe he should be using a quadtree
rather than a search along a list of points.

~~~
jfpuget
Agreed.

------
chomp
Actually the Python C API is part of the Python language, so part of Python is
actually C.

------
dilap
The same thing happens in reverse for Julia programmers coming from Python.

~~~
JupiterMoon
So if I move to Julia, will I need to go back to looping over arrays all the
time? I mean, Fortran 90 already had whole-array operations...

~~~
elcritch
My initial thoughts as well. It's hard to think about going back to manual
looping. Seemed like a step backwards even.

This article changed my opinion quite a bit and highlights the power of having
a real macro system: [http://julialang.org/blog/2013/09/fast-numeric/](http://julialang.org/blog/2013/09/fast-numeric/)

I've begun to leverage the points made in the article quite a bit. Alas, now I
get stuck when writing numpy-based code, because it constrains my ability to
express a solution in the best-suited idiom.

~~~
JupiterMoon
I just glanced through that document. Some thoughts (which may be stupid).

> Devectorize expressions

Why can't a compiler/whatever do this devectorisation for us if it is faster?
I see that there is a macro package for this; why not do it as standard?

> Write cache-friendly codes

Well duh.

> Identify opportunities to use BLAS

Is this not what numpy does already?

I suppose my real question about Julia is: why? Why not put the effort into
optimising numpy, or into making Fortran a little more convenient to use,
rather than making a whole new special-snowflake language just for one set of
problems?

