Prediction: lots of emerging startups that are building on Python today are going to turn to this (or PyPy, but that's a separate topic) when the scaling pain begins, or simply to try to make things run a bit faster. I think this is great. Most people I know avoid C completely because of the hidden pitfalls every novice has to go through, but maybe Cython will slowly change that. It just needs a bit more maturity and some endorsements to pop up around here... which shouldn't take too long.
With C and Python you get to play on both ends of the spectrum -- concise, clear code with Python, and high-performance code with C.
Cython is interesting, but as cited, there are also some limitations and caveats. See http://docs.cython.org/src/userguide/limitations.html and http://docs.cython.org/src/tutorial/caveats.html
About playing on both sides of the efficiency/expressiveness spectrum, the important thing is to be sure to do real benchmarks so you only drop down into C when it pays off.
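For what it's worth, here is a minimal sketch of that kind of measurement using the standard-library timeit module. The two functions and the helper name best_time are made up for illustration (they are not from anyone's project above); swap in your own hotspot before deciding whether C is worth it.

```python
import timeit


def sum_squares_loop(n):
    """Hand-written loop: the kind of hotspot one might consider moving to C."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n):
    """Same result, with the accumulation done by the built-in sum()."""
    return sum(i * i for i in range(n))


def best_time(func, n, repeat=3, number=20):
    """Return the best total time in seconds over `repeat` runs of `number` calls."""
    return min(timeit.repeat(lambda: func(n), repeat=repeat, number=number))
```

Calling best_time(sum_squares_loop, 100_000) and best_time(sum_squares_builtin, 100_000) gives two numbers you can compare directly. Taking the minimum over several repeats is the usual way to reduce noise from other processes.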
Sometimes, the reasons are not just efficiency -- you may already have C code that does the right thing.
In that case, however, you should very definitely (as in, I'm not even considering an alternative) go with Python/Cython, as you will develop many times faster and your program will have near-C speed.
My current favorite is ShedSkin, which compiles your (unmodified) program written in a subset of Python (not a particularly limiting subset, mind you) into C++ which you can then compile (and link as a module, if you like).
My experiences with both:
Back when I used Pyrex, my main problem was accidentally using some variable or function that was not C-declared: Pyrex would silently generate code that called into Python, so I had to review the generated C code. I think Cython has fixed that, so you can see when it turns something like foo(x) into Py_GetTheGlobalVar("foo") etc.
Anyway, that's a view from a production environment which has used Pyrex for 3-4 years with great success.
I've also been replacing some rather excessive struct.unpack usage in my code with Cython's C struct pointer casting syntax, and uncovering _massive_ performance gains. 45 seconds of parsing now takes 3 seconds.
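To make the starting point concrete, here is a rough pure-Python sketch of the kind of struct-based record parsing being described. The record layout (uint32 id, uint16 flags, float64 value) is an assumption invented for illustration, and this is not the Cython pointer-cast version. It only shows the struct module side, using a precompiled struct.Struct so the format string is parsed once.

```python
import struct

# Hypothetical record layout (an assumption, not from the comment above):
# little-endian uint32 id, uint16 flags, float64 value, packed with no padding.
RECORD = struct.Struct("<IHd")


def build_records(rows):
    """Pack an iterable of (id, flags, value) tuples into one bytes object."""
    return b"".join(RECORD.pack(*row) for row in rows)


def parse_records(buf):
    """Decode every fixed-size record in buf into (id, flags, value) tuples."""
    # iter_unpack requires len(buf) to be a multiple of RECORD.size.
    return list(RECORD.iter_unpack(buf))
```

Every record still comes back as a freshly built Python tuple, which is presumably the overhead that casting the buffer to a C struct pointer in Cython avoids.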
I'm pretty much convinced there's no reason to learn CPython's C API, given Cython's maturity and PyPy's improbable, scintillating ascendancy. viz. RPython may be Python's performance future, but Cython is ready now.
Was there an increased complexity within the two modules you rewrote?
Do you have an estimate on how long it would have taken you to reimplement the two modules (or their critical components) completely in C/C++?
(Qs intended out of curiosity, to help quantify the benefits)
I've done some integration of C code using ctypes, which works quite well, and offers the obvious speed boost, but feels less coherent and ultimately less maintainable, project-wise, than a well-coded Cython module. Writing a full-on CPython module from scratch would probably offer better performance than Cython if you know the quirks and are disciplined. But to someone who doesn't already drip CPython C modules, Cython is a godsend.
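For anyone who hasn't seen the ctypes approach, a minimal sketch looks something like this. It assumes a POSIX system where the C library can be loaded by name; the wrapper name c_strlen is hypothetical.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature by hand: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C library's strlen on a bytes object."""
    return libc.strlen(data)
```

The argtypes/restype declarations live in Python, separate from the C header, and nothing checks that they match. That is a big part of why this feels less maintainable as a project grows.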
Ultimately, there are 5 commonly used ways (the CPython C API, Boost::Python, SWIG, Cython, ctypes) to integrate C into Python, and right now you'd be crazy not to give Cython a shot, if that's your need. It's very easy to learn for anyone familiar with both C and Python.
Wow, interesting. Is it released somewhere? Googling found me the Copperhead project, was that what you used?
I'm not sure if "implemented" means you implemented your code for it, or you implemented the entire compiler. :)
I originally wanted to use Copperhead and got in contact with the developer a year ago, but it was too early even for "private" beta testing, so I never got access to their code. Also, my compiler is specialized for image processing, so Copperhead probably wouldn't have worked anyway. I'm only jealous of Copperhead's type inferencer. :) But then again, I have to finish my thesis, and a type inferencer wouldn't help with that goal. ;)
Basically, if you are doing any work that requires heavy numerical processing, Cython is the way to go. On the other hand, I was playing with it to do some basic text processing and the improvements were negligible.
As far as text processing is concerned, it seems like the Python code is just a nice interface to the underlying compiled library, so there isn't much left to speed up.
"Shed Skin is an experimental Python-to-C++ compiler designed to speed up the execution of computation-intensive Python programs. It converts programs written in a static subset of Python to C++. The C++ code can be compiled to executable code, which can be run either as a standalone program or as an extension module easily imported and used in a regular Python program."
I feel that PyPy and Cython alleviate that fear.
The ability to transition parts of one's code base to C/C++/ASM through CPython's excellent library makes using Python from day one such an easy decision. The surrounding library ecosystem makes the transition easier still.
Quick googling only found one comparison: http://jaredforsyth.com/blog/2010/jul/21/cpython-vs-pypy-vs-...
The routines he benchmarked are simple string tokenizing, though, so I'm not really surprised the translated-to-C version came in so much faster.
And it's easy to wrap C libraries in Go. See http://code.google.com/p/gosqlite/source/browse/sqlite/sqlit... for an example of a Go Sqlite binding in < 300 lines.
Implement some navigation, and it's perfect.
I'm unable to come up with a project of my own which can use Cython, but I'm curious enough to try it on an open source project if it benefits developers.
I would disagree. Cython is less well known and has a learning curve. The CPython extension interface is well established and more common. That is not an insignificant advantage.
Do you stop for a couple of weeks to learn Cython, and do you have complete confidence in its generated code, or do you just start using something you know that is tried and true? It depends. We chose C Python extensions or just writing hotspots in C in a separate process.
The problem with C Python extensions is that they are so hard to write correctly because of reference counting.
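If reference counting sounds abstract, you can watch the counts from Python itself. This is just a sketch assuming CPython (sys.getrefcount is an implementation detail, and the function name here is made up). In a hand-written C extension you have to do this bookkeeping yourself with Py_INCREF and Py_DECREF, and getting it wrong means leaks or crashes.

```python
import sys


def refcount_delta_when_stored():
    """Show how storing and dropping a reference changes an object's refcount."""
    obj = object()
    before = sys.getrefcount(obj)

    holders = [obj]                  # the list now holds one more reference
    after_store = sys.getrefcount(obj)

    del holders                      # dropping the list releases that reference
    after_release = sys.getrefcount(obj)

    # Report changes relative to the starting count rather than absolute
    # values, since the absolute numbers depend on interpreter internals.
    return after_store - before, after_release - before
```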
Some types of array addressing will take the slower---but still correct---invocation path through Python, but the common form is compiled to direct (albeit strided) array access.
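In case "strided" is unfamiliar: the element at a given index lives at an offset computed from per-dimension strides. Here is a small pure-Python sketch of that arithmetic over a flat buffer. Strides are counted in elements rather than bytes to keep it simple, and the function name is hypothetical. This only illustrates the addressing scheme, not the code Cython actually generates.

```python
from array import array


def strided_get(buf, strides, index):
    """Fetch buf[sum(index[k] * strides[k])] from a flat buffer."""
    # Strides here are in elements, not bytes (a simplification).
    offset = sum(i * s for i, s in zip(index, strides))
    return buf[offset]


# A 3x4 row-major matrix stored flat: element (r, c) sits at r * 4 + c.
flat = array("d", range(12))
ROW_MAJOR = (4, 1)
# The transpose is the same buffer with the strides swapped, no copy needed.
TRANSPOSED = (1, 4)
```

With these definitions, strided_get(flat, ROW_MAJOR, (2, 3)) and strided_get(flat, TRANSPOSED, (3, 2)) read the same element, which is why strided access lets slices and transposes share one underlying buffer.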