
Writing C in Cython - syllogism
http://honnibal.wordpress.com/2014/10/21/writing-c-in-cython/
======
Smerity
Python + NumPy wins when the core loop is computationally intensive and
benefits from vector ops.

Cython wins when repeated invocations overpower any core-loop advantage.

In the case of NLP, which is the author's field, you'll find that not only do
you have an intensive core task, but the number of times you'll call it is
insane.

To give an idea: the time complexity for parsing is O(n^3) [n = words in a
sentence]. For Combinatory Categorial Grammar (CCG), the formalism that the
author Matthew and I commonly use, it's O(n^6). At that point, the inner loop
almost doesn't matter -- even if it were a no-op, the function-call overheads
and everything around it are performance killers. Both cases are what Cython
works incredibly well for, especially when you get full control of the data
structures.

This is one of those situations where the maxim that premature optimization
is evil is really quite far off..! =]

~~~
davvolun
I don't see where this is premature optimization. You, and the author, clearly
know a lot about the field and the nature of the problem, so you're already
past the 'premature' portion. If I know from the start that I've got a
function that _will_ be called 10,000 times per second, there's no reason I
should write the un-optimized version first. That's like starting every sort
off with a bubble sort; I don't need to optimize to quick sort switching off
to insertion as appropriate from the beginning of coding, but I know I can do
better than bubble from the get-go.

Defining "premature optimization" as the less hyperbolic "optimization before
we know we need it," I think it's clear that the problem you've written shows
you already know you need it!

------
qewrffewqwfqew
I prefer Writing C in Tcl:

[https://chiselapp.com/user/rkeene/repository/tcc4tcl/wiki/Ex...](https://chiselapp.com/user/rkeene/repository/tcc4tcl/wiki/Examples)

~~~
_pmf_
That's amazing and makes me really sad that Tcl has not won the DSL world
over.

~~~
pjmlp
Back in the late 90's I worked for a startup that used it as their core
language.

Quite a good experience.

------
krick
Speaking of "really is just C with some syntactic sugar" — so, is there really
a point in putting effort into learning (and using) Cython over C? Reading the
code in the article, I wasn't completely sure whether I should find it "cool"
or "confusing". Not that there was anything I actually disliked, but it's
somewhat awkward to see C written like Python, so I can't "rate" it myself
right away.

~~~
syllogism
Well, here's a less trivial example:

[https://github.com/syllog1sm/redshift/blob/develop/redshift/...](https://github.com/syllog1sm/redshift/blob/develop/redshift/parser.pyx)

The task/algorithm is roughly (but not precisely) the one described here:

[http://honnibal.wordpress.com/2013/12/18/a-simple-fast-algor...](http://honnibal.wordpress.com/2013/12/18/a-simple-fast-algorithm-for-natural-language-dependency-parsing/)

If you just scroll through parser.pyx, you might get the flavour of why this
is better than straight C. The "core", or "skeleton" of the code is all
C-typed, but I "fall back" to Python for workaday stuff like reading and
writing config files, collecting the set of labels, etc.

There are also places in the algorithmic code where I use Python data
structures. In Parser._count_feats, I use a dictionary of dictionaries. This
would be quite annoying in C, and efficiency is quite irrelevant here.

I think using Cython in this way has given me a competitive edge in my
research. This library has by far the best published results on speech
parsing[1], and its speed/accuracy trade-off on written text is as good as
any other library out there.

[1] [http://www.transacl.org/wp-content/uploads/2014/04/46.pdf](http://www.transacl.org/wp-content/uploads/2014/04/46.pdf)

~~~
Beltiras
Which profiling tool do you use to target your optimizations?

~~~
andreasvc
I use cProfile; for the Cython code you need to set profile=True as a
compilation directive.
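
For illustration, a minimal sketch of how that fits together (the module and
function names are invented): the directive goes at the very top of the .pyx
file, and profiling then works through cProfile as usual.

```cython
# cython: profile=True
# The directive above must come before any code in the .pyx file; it makes
# Cython emit profiling hooks so cProfile can see cdef/cpdef calls.

cpdef long dot(long[:] a, long[:] b):
    cdef long total = 0
    cdef Py_ssize_t i
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total
```

From the Python side you'd then run something like
cProfile.run('mymodule.dot(x, y)'); directives can also be set per module via
compiler_directives in setup.py.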

------
dang
This is similar to how people are using the LuaJIT FFI, as a higher-level,
garbage-collected C that lets you control data layout in memory.

~~~
haberman
That's an interesting comparison, because under the hood Cython and LuaJIT FFI
could hardly be more different.

Cython is an optimizing Python (+ extensions) -> C compiler.

LuaJIT FFI is a library for Lua. It doesn't change the language at all (there
are none of Cython's optional annotations), and there is no build-time tool
or need to run a C compiler.

But I can see how they could serve some of the same use cases.

~~~
dang
In both cases a tool is being adapted for something it wasn't intended for: to
write C-style code—specifically, code that works with memory the way C does—in
a high-level scripting language.

This way of combining high and low levels within the same program seems
qualitatively unlike, say, C++, traditional FFIs, and other answers to this
problem. It's fascinating to see it emerge from more than one existing
technology.

~~~
maxerickson
"Python with C data types" was somewhat of a design goal for Pyrex, the
language that Cython followed/built on:

[http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/ver...](http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/About.html)

~~~
dang
That's fascinating because it is very much about the same problem.

~~~
maxerickson
I guess I'm not sure which tool you meant was adapted; Cython (and Pyrex)
certainly adapt Python to work more directly with memory, but I would say the
style of code in the article is an intended use case of them.

(I also might have undersold the connection between the two: Cython is
arguably just a friendly fork; I guess mostly they wanted to move faster.)

~~~
dang
I just meant that the OP is adapting Python (via Cython) similarly to how FFI-
driven LuaJIT adapts Lua. The resulting programming style is different enough
from classic Python or Lua that it approaches being a different
language—something like an embedded C DSL.

------
xioxox
When I've tried to use Cython, I've not enjoyed it. I feel like I'm invoking
a load of barely comprehensible copy-and-paste magic whenever I need to use
numpy arrays. I imagine it would be possible to understand this stuff, but I
need it so rarely that I can't justify learning it. C or C++ feels so much
simpler.

~~~
ngoldbaum
Can you give an example of the copy/paste you're worried about?

I agree that working with numpy arrays in Cython introduces some semantic
overhead; then again, newer versions of Cython introduce the more generic
concept of typed memoryviews [0]. I had a lot of success recently using these
to generate some very efficient multithreaded code with a minimum of hassle.

[0]
[http://docs.cython.org/src/userguide/memoryviews.html](http://docs.cython.org/src/userguide/memoryviews.html)

~~~
xioxox
It's things like the square brackets and triangular brackets, e.g. def
naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g)

I know what this is for, but I have no idea how this is implemented - is it
some sort of library, some sort of macro, something inside the compiler, or
what? It feels very opaque. I don't understand the lifetime of these objects
or what could go wrong.

Even with C++ templates, I can usually work out how something is happening,
but not here without research. The Cython-specific syntax also feels pretty
unpythonic. The feeling I get from using it is that it's an ugly Python/C/C++
mutant with a load of hidden rules.

What I really would like is a set of templates or even macros for C++ which
would allow easy wrappers to be written for different Python implementations,
handling the reference counting, etc. Does such a thing exist?

~~~
andreasvc
The example you give is the old way to specify the type of a numpy array. With
typed memory views it's more straightforward IMHO:

    cdef convolve(double [:, :] f, double [:, :] g)

I don't know if there are templates or macros for C++ which do the same thing,
but if you look at the output of Cython and the complexity and amount of work
it's doing, I don't think it's likely something like that could work.

The generated code is of course clunky and difficult to follow, but I found
that when trying to understand specific things it's easy to follow what parts
of the Python C API are being invoked and consider the overhead involved. The
annotated HTML output is useful for this (cython -a).

~~~
ngoldbaum
And usually you want to _avoid_ invoking the Python C API from Cython unless
you have to use it for some reason. The whole point of typed memoryviews is
that you get direct access to the raw memory buffer and can access it like a
numpy array or even through a raw pointer.
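
A rough sketch of that pattern (the function name is invented for
illustration): inside the loop, memoryview indexing compiles to direct buffer
access with no per-element Python overhead.

```cython
import numpy as np

def row_sums(double[:, :] m):
    # Typed memoryview: indexing goes straight to the underlying buffer,
    # not through the Python C API.
    cdef Py_ssize_t i, j
    out = np.zeros(m.shape[0])
    cdef double[:] out_view = out
    for i in range(m.shape[0]):
        for j in range(m.shape[1]):
            out_view[i] += m[i, j]
    return out
```

Any object exposing the buffer protocol (a numpy array, an array.array, etc.)
can be passed in as `m`.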

This blog post [0] incrementally shows how to get micro-optimizations out of
your Cython code.

In the end, the autogenerated code is usually not too bad to follow, since
it's a one-to-one transpilation from the Cython to C. cython -a helps a ton,
or, if you can use it, the cythonmagic [1] extension for the IPython notebook
embeds the output of cython -a inline in a notebook, making the iteration
process much quicker.

[0] [https://jakevdp.github.io/blog/2012/08/08/memoryview-benchma...](https://jakevdp.github.io/blog/2012/08/08/memoryview-benchmarks/)

[1] [http://ipython.org/ipython-doc/2/config/extensions/cythonmag...](http://ipython.org/ipython-doc/2/config/extensions/cythonmagic.html)

------
syllogism
In response to
[https://news.ycombinator.com/item?id=8482445](https://news.ycombinator.com/item?id=8482445)

------
MordodeMaru
We developed biicode (a C and C++ deps manager) in python using Cython

[http://blog.biicode.com/bii-internals-compiling-your-python-...](http://blog.biicode.com/bii-internals-compiling-your-python-application-with-cython/)

------
mahyarm
So how would you do multithreaded apps with Cython while avoiding stuff like
the GIL?

I work with Objective-C a lot, and Python was my first language, so seeing
this kind of unification just blows my mind.

~~~
syllogism
In theory this is easy and good, in practice it's not.

[http://docs.cython.org/src/userguide/parallelism.html](http://docs.cython.org/src/userguide/parallelism.html)

The problem is that it's really nasty to declare all your functions nogil,
because that means you don't get exception handling. There may be a trick to
this that I've missed --- I find the GIL handling stuff in Cython sort of
confusing.

I think my applications are genuinely hard to multi-thread, so maybe it's
usually easier than I've found it. It doesn't help that I use a MacBook Air,
and OpenMP isn't available for Clang. Developing on OS X sucks sometimes.

~~~
ngoldbaum
I've had a lot of success with Cython's parallelism. The key is to release
the GIL in tight inner functions of big loops. You can then parallelize the
loop by iterating over calls to the nogil function or a "with nogil" block.
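
A rough sketch of that pattern, assuming the module is compiled with OpenMP
flags (the function names are invented):

```cython
from cython.parallel import prange

cdef double work(double x) nogil:
    # Tight inner function declared nogil: no Python objects, no
    # exceptions, so it can run without holding the GIL.
    return x * x + 1.0

def total(double[:] xs):
    cdef double acc = 0.0
    cdef Py_ssize_t i
    # prange releases the GIL and splits iterations across OpenMP
    # threads; the in-place += makes acc a reduction variable.
    for i in prange(xs.shape[0], nogil=True):
        acc += work(xs[i])
    return acc
```

Without OpenMP flags at build time, prange quietly degrades to an ordinary
serial loop.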

That only OpenMP is supported kind of sucks, but I'm working on a Linux
cluster, so it doesn't bother me except when I'm on my laptop.

------
cturner
In C, you need header files so that variables are declared before they're
used. How does Cython deal with this? I find headers painful, and would
appreciate a memory-foundry language more if it didn't suffer from this.

~~~
syllogism
Unfortunately, you're dead right --- this sucks in C and it sucks in Cython!

I don't understand why .pxd files are necessary --- they should be
generatable from the .pyx files. At some point I'll have a go at this...

But, as of now, you need to maintain this separate file, which you must keep
synchronised with changes in the original. And, yes --- this sucks!

The only mercy is that writing code in the headers is illegal.

~~~
andreasvc
It is not illegal to write code in pxd files. For cdef functions to be
inlineable across modules, they need to be specified in a pxd file. But that
is the exception to the rule.
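
A sketch of that exception (file names invented): the inline function's body
lives in the .pxd so that modules which cimport it can inline it; everything
else stays a bare declaration.

```cython
# mymath.pxd -- declarations that other modules cimport.

cdef inline double square(double x):
    # The one case where a body belongs in the .pxd: without it here,
    # cross-module inlining couldn't happen.
    return x * x

# The ordinary case: declaration only; the body lives in mymath.pyx.
cdef double norm(double x, double y)
```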

------
Reef
It is a bit hard to read code with one-letter variable names.

~~~
zellyn
I'm not sure why position is "r", but F for force, m for mass, w for world (in
a short function), N for, well, N - doesn't seem too bad. Math looks much
worse if you spell everything out.

~~~
randlet
It's pretty standard to use _r_ as a position vector in physics.

------
lhc-
I took a peek at Cymem just to see what was going on, as it was described as
a "small" wrapper. A 3600-line .c file was... more than I expected.

~~~
syllogism
[https://github.com/syllog1sm/cymem/blob/master/cymem/cymem.p...](https://github.com/syllog1sm/cymem/blob/master/cymem/cymem.pyx)

The .c file really shouldn't be in the repo...

------
pjmlp
While it is interesting to see all these efforts improving the Python
ecosystem, I go straight to an ML language with a native compiler.

------
timClicks
I wonder how much time is needed to learn C proficiently to make use of Cython
properly.

~~~
Fede_V
It depends what you mean by 'use Cython properly'. To speed up some numerical
routines, you hardly need to know anything about pointers, malloc, etc.: you
just move your Python function into its own separate file, rename it from .py
to .pyx, add type annotations, and enjoy the free speed-up.
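
As a sketch of that workflow (the function is invented for illustration): the
untyped version is already valid Cython; adding C types just lets the loop
compile down to plain C arithmetic.

```cython
# Before (fib.py -- plain Python, also valid Cython):
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# After (fib.pyx -- same logic with C types; the loop now runs on
# C integers instead of Python objects):
def fib_typed(int n):
    cdef int i
    cdef long a = 0, b = 1, tmp
    for i in range(n):
        tmp = b
        b = a + b
        a = tmp
    return a
```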

To do more complex things (low level wrappers, etc) then you do need to know
more things about C.

------
ape4
It's good for job security.

------
aleksandrm
Cylons write in Cython.

