
Python performance the easy(ish) way - craigkerstiens
http://jiaaro.com/python-performance-the-easyish-way
======
wulczer
I've tried this with GCC 4.7.1 on Debian x86_64 and did not get it to work in
constant time with -O2.

I'm guessing (from the OP's usage of -install_name) that he's been compiling
this on OS X. I wonder what my compiler missed that the OP's didn't.

EDIT: just tried with clang and got constant time behaviour, interesting

EDIT 2: reading the comments in the post, I now suspect it has to do with
integer overflow. However, compiling with -fwrapv did not change anything.
Need to dig into it more.

EDIT 3: it seems that clang simply notices that the computation can be done in
constant time, whereas gcc does not. I'm not sure if it's actually useful in
real-world code, but it's certainly somewhat magical to see a compiler
understand that you can substitute the entire loop with a simple calculation.

~~~
keeperofdakeys
I assume the OP would be using clang/LLVM, as this is the default OS X
compiler, and the gcc command is usually just an alias or link to clang.

In real world code this kind of optimisation is designed to catch things
programmers aren't aware of, for a large decrease in required computation. In
this case, sum(1 .. n) == n*(n+1)/2.
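A quick sanity check of that identity in Python (a minimal sketch, not the article's code):

```python
def sum_loop(n):
    # The linear-time version: n addition operations.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_closed_form(n):
    # Gauss's identity: 1 + 2 + ... + n == n * (n + 1) / 2.
    # Integer division is exact because n * (n + 1) is always even.
    return n * (n + 1) // 2

print(sum_closed_form(100))  # 5050, same as sum_loop(100)
```

This is the constant-time substitution clang appears to be making automatically.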

------
afhof
Did anyone even read his code? The C code gets the wrong answers once its
integers overflow. The Python code will produce the correct answer using longs
once the integer calculations overflow. Ctypes is the right answer for a
different problem.
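A sketch of the divergence being described, simulating the C side's wraparound from Python (assuming the C code sums into a 32-bit signed int; the exact width depends on the article's code):

```python
import ctypes

n = 99_999
py_sum = sum(range(1, n + 1))  # Python ints never overflow

# Reinterpret the same value as a 32-bit signed int, which is what a
# wrapping C int would hold at the end (addition mod 2**32 is
# associative, so truncating once matches wrapping at every step).
c_sum = ctypes.c_int32(py_sum).value

print(py_sum)  # 4999950000
print(c_sum)   # 704982704 -- the kind of "wrong answer" the C version reports
```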

~~~
jiaaro
I agree that it'd be more correct to make the C and python code output the
same answer.

The important thing here is how to hook up C code to python.

The benchmarks are still (mostly) valid because it's still the same number of
addition operations (regardless of the overflow).

Honestly, I'm not a very good C programmer (as you can see). I was really just
documenting how to do this for myself. I never expected to get such a big
surge of traffic from HN.

That being said, I'll take a second look at the code and try to make it more
correct, suggestions welcome!
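The hookup itself is only a few lines of ctypes. Since the article's own shared library isn't reproduced in this thread, the sketch below loads the system C library and calls its abs function instead; the mechanics (load the library, declare argument and return types, call) are the same for any compiled C function:

```python
import ctypes
import ctypes.util

# Load a shared library -- here the system libc, purely for illustration;
# the article loads its own compiled .so/.dylib the same way.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments correctly.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))  # 42
```

Declaring argtypes/restype is optional for simple int-taking functions, but skipping it is a common source of silent bugs with pointers and larger types.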

------
arunchaganty
I'd also recommend the cython library
(<http://docs.cython.org/src/tutorial/>). You write essentially the usual
Python code with some type annotations, and achieve similar speedups.

------
elliptic
So I've done some limited work with writing C/numpy extensions (for numerical
work) and it's gone swimmingly. However, a lot of performance-critical things
I need to do don't necessarily fit into array/matrix numerical computing -
lots of times there are difficult, complex data structures involved that need to
be accessed in the "inner loop" (not really a loop, but you get the point). I
typically end up using Java for these sorts of problems, although sometimes
Pypy does okay.

I rather dislike the constant advice I see doled out to those who mention
Python's performance issues - namely, that they can just rewrite the
performance critical parts in C. Lots of the time, the performance critical
parts are just as sophisticated and require just as much abstract, high-level
coding as the rest of the program. Some awareness of that would be nice.

~~~
evgen
I hate responding with such a flip answer, but have you tried Cython? If you
have to do some serious munging inside inner loops and need something that is
a bit more aware of data structures and the shape of your data I have found
Cython to be a nice option for building the equivalent of a Python C-extension
module with mostly Python-like code and structure.

------
freyrs3
CTypes is neat for simple tasks but Cython is the industrial strength solution
for large extensions.

------
tsahyt
First of all, the C code will return wrong answers because of integer overflow
for sufficiently large inputs. But since this is only for the sake of an
explanation on Python performance, I'm willing to take that.

What disturbs me much more is that the article basically concludes that if
Python performance isn't good enough for you, you should write it in C. How is
that Python performance? If speeding up Python is what you want, I'd suggest having
a look at PyPy, which is a JIT python runtime. It doesn't work with some
libraries (for reasons I didn't look into yet) but works wonders on
performance!

Oh yeah, and if anyone ever sums up large runs of consecutive integers using a
linear-time algorithm: please use n*(n+1)/2. This is obvious, but since the
article omitted it I thought I'd say it here.

~~~
dagw
As awesome as PyPy is, it isn't going to give you anywhere near the sort of
speedup you'll get from dropping down to C for your critical functions. The
best I've seen on actual code I've written (as opposed to benchmarks) is a
4-5x speedup, which is certainly nothing to sneeze at, but still a long way
to go before you can start comparing with C. For what it's worth, in the
contrived example used in the article I only got a 1.2-1.3x speedup using
PyPy (PyPy 1.9 vs Python 2.7).

(of course you can use ctypes from pypy as well, so you can get the best of
both worlds)

~~~
tsahyt
This is true, yes. The kind of performance C delivers is often only achieved
by C, Fortran and some other compiled languages. But that's not the point of
what I was saying. I was talking about speeding up the Python _language_.
That's what PyPy is incredibly good at.

I once wrote a small traffic simulation in Python. Not real-time. It would
take about 15 minutes to run a medium sized simulation using the CPython
interpreter. PyPy ran the same simulation in slightly less than a minute. This
is a substantial speedup. In this case it was because function calls can be
incredibly slow in Python and PyPy takes care of that.

I'm not claiming that implementing parts of the program in C is bad (I'm
mainly a C programmer anyway). All I'm saying is that it's not really "speeding
up Python" when you're not using Python. It's definitely an option though, and
a very good one too!

ctypes might not be the best way to do it for some use cases though. In my
opinion, the most beautiful way is extending Python with native C via the C
API. That option produces the most LOC though, and therefore probably more bugs.
However, if you've got large portions of a Python program in need of speedup
and PyPy won't cut it, it's probably the best option to write a decent part of
the low level stuff in C and provide native Python bindings. This way you get
the best of both worlds: The performance of C for your underpinnings and the
flexibility of Python for developing all the other parts, saving you time and
therefore money and sweat.

------
wladimir
If you like this kind of dynamic code generation and usage, LLVM with ctypes
is also a promising avenue, as you don't (necessarily) have to invoke an
external compiler. You can build the module on-the-fly.

For example see bitey, an abstraction to directly import LLVM bitcode files:
<https://github.com/dabeaz/bitey>

That said, there's so many ways to speed up Python these days...

In most practical cases where I need performance I'd try to run it in PyPy
(easiest), use Cython (great python-C hybrid), use Theano (to generate GPU
code), or even write a plain Python C API extension (no dependencies to
deploy, and unlike ctypes you can directly offer Python classes without a
Python wrapper).

------
sdfjkl
Thanks for showing it was that easy. I knew this was possible, but I thought
it would be way more complicated. Now I'm more inclined to use it (if I really
need that level of optimisation).

------
charliesome
I wrote a program in assembly a few weeks ago to do this exact thing.

It can spawn worker threads to speed up the summation. I have a quad core CPU,
so I've set it to 4. If you have a different number of cores, you can modify
line 8 to change the number of workers it will spawn.

<https://gist.github.com/3369946>
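For readers who'd rather stay in Python, the same split-the-range strategy can be sketched with the standard library. Threads are used here only to keep the sketch portable; CPython's GIL means real parallel speedup for CPU-bound work would need multiprocessing, which is closer to what the assembly version's workers achieve:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(lo, hi):
    # Each worker sums one contiguous chunk [lo, hi).
    return sum(range(lo, hi))

def parallel_sum(n, workers=4):
    # Split 1..n into `workers` chunks; the last absorbs any remainder.
    step = n // workers
    chunks = [(i * step + 1, (i + 1) * step + 1) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n + 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda c: partial_sum(*c), chunks))

print(parallel_sum(1_000_000))  # 500000500000
```

The `workers` parameter plays the role of line 8 in the gist: set it to your core count.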

------
azylman
I'm not a big user of Python, but it seems to me that any language that
requires you to embed a different language for performance is inherently
broken.

To regular pythonistas: Is this a common issue with Python, or only in the
author's contrived scenario of summing a ridiculous amount of numbers?

~~~
azylman
Wow, I don't think I've ever had such a visceral reaction to anything I've
posted on HN before. By people who clearly didn't take the time out of their
day to even read what I posted!

Basically, I don't use python very much and wanted to know if python had
performance issues that required you to drop down to C code often, or if that
was just for a scenario such as this.

I think that any language that REQUIRES you to drop down to a lower level for
performance (not one that SUPPORTS it, but one where it's REQUIRED) is
inherently broken. I would think that would be a pretty noncontroversial
statement - if every python script you wrote had to have embedded C code
because otherwise it was too slow, that would be pretty broken. That's why I
wanted to know how common this was.

~~~
bryanh
I think the reaction was because you put Python in a broadly negative camp by
implying ctypes extensions are required for any sort of performant code. It's
actually relatively uncommon to use ctypes, and most developers who use Python
know this.

Python is a great general-purpose language and is used as such. No one would
be silly enough to rely solely on it in highly performance-sensitive
environments (e.g. 3D renderers or high-frequency trading). Just like no one
would be silly enough to write an entire CRUD web app in C.

The point is, Python isn't broken, and claiming that it may be because you
read a short article on one Python feature is what got you the negative
reaction.

~~~
azylman
I never implied that ctypes were required - in fact, I explicitly did the
opposite. I asked if it was required for performant code. See the part of my
post where I said "To regular pythonistas: Is this a common issue with Python,
or only in the author's contrived scenario of summing a ridiculous amount of
numbers?"

I also never claimed that python was broken.

~~~
bryanh
> I'm not a big user of Python, but it seems to me that any language that
> requires you to embed a different language for performance is inherently
> broken.

It seems like you implied both of those things, though perhaps not as
explicitly as "Python is broken because ctypes are required", which you did not
say.

