I stand by my assertion that ctypes along with an external C library is a great way to do Python speed-ups. It's very simple to do, see here:
This is the kind of optimization I usually use ctypes for; the other case is interfacing with a third-party shared library.
ctypes + C code can be quite efficient, but you have to write the entire fast path in C rather than flip-flopping between C and Python, because every call across the boundary carries overhead. It works best when you have one operation that needs to be fast (say, serving the most common path on a webserver, or running a complex matrix operation in NumPy), plus a bunch of additional features that are only called once in a while.
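To make the "whole fast path in C" point concrete, here's a rough sketch using nothing but the C standard library, so there's nothing to compile. It assumes a Unix-like system where ctypes can find libc, and `find_byte` is just a name I made up for illustration. The point is that Python crosses into C once and the loop over a million bytes runs entirely on the C side:

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like system. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the current process, which
# include libc on Linux and macOS.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: void *memchr(const void *s, int c, size_t n);
libc.memchr.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
libc.memchr.restype = ctypes.c_void_p


def find_byte(buf, value):
    """Return the index of the first `value` byte in `buf`, or -1.

    Hypothetical helper for illustration; `buf` is a ctypes char array.
    """
    # A single call crosses into C; the scan over the buffer happens there.
    hit = libc.memchr(buf, value, len(buf))
    if hit is None:  # memchr returned NULL: byte not present
        return -1
    # memchr returns an address, so convert it back to an offset.
    return hit - ctypes.addressof(buf)


data = ctypes.create_string_buffer(b"a" * 1_000_000 + b"z")
index = find_byte(data, ord("z"))
missing = find_byte(data, ord("q"))
```

The slow way to write this would be calling a tiny C function once per byte from a Python loop, which pays the ctypes call overhead a million times over. With your own shared library the pattern is the same: load it with `ctypes.CDLL`, declare `argtypes` and `restype`, and hand it the whole buffer.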