
Python vs Julia – an example from machine learning - ajtulloch
http://tullo.ch/articles/python-vs-julia/?
======
hartror
As others have pointed out, improvements could be made to the performance of
the Cython code. However, it seems to me the takeaway from this post should be
that in this application, Julia has a better performance profile while being as
simple and expressive as Python [1].

As a Python programmer who often works on these sorts of problems, this
reinforces my perception that Julia, while not suited to all programming
problems, offers a productivity gain when it comes to mathematical/algorithmic
computing.

Sure, I could sit and tweak the Cython code or rewrite the hot loops in C++,
but it seems to me Julia hits a sweet spot where I am less likely to have to
compromise programmer productivity for computational performance.

[1] I will note, however, that the code in the OP doesn't seem like the
clearest implementation of the algorithms in either language; some of that
heavy nesting should be factored out, for a start.

~~~
StefanKarpinski
Another subtle but important point is that the Julia code can be very easily
made completely generic without sacrificing any performance or clarity. All
you need to do is _remove_ those Float64 annotations and replace the 0.0
values with `zero(y[1]*weights[1])`. With that small change, the same exact
code will work for Float32, Rational, BigFloat, Complex (if that even makes
sense), etc. – even new, user-defined numeric types that didn't exist when you
wrote this code and that you know nothing about, as long as they can be
compared, multiplied, added, and divided.
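For comparison, duck typing gives plain Python the same kind of generality by default (though without Julia's compiled speed). A minimal sketch with a hypothetical `weighted_mean` helper, not code from the OP:

```python
from fractions import Fraction

def weighted_mean(y, weights):
    """Works for any number-like type supporting +, *, and /."""
    total = y[0] * weights[0]
    wsum = weights[0]
    for yi, wi in zip(y[1:], weights[1:]):
        total += yi * wi
        wsum += wi
    return total / wsum

# The same code handles floats and exact rationals unchanged.
weighted_mean([1.0, 2.0], [1.0, 3.0])              # 1.75
weighted_mean([Fraction(1), Fraction(2)], [1, 3])  # Fraction(7, 4)
```

The difference Stefan is pointing at is that Julia compiles a specialized, C-fast method for each concrete type, while Python pays dynamic-dispatch cost on every operation.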

This kind of generality isn't necessary for a lot of user-level code, where
there's a specific application, but it is quite important when writing
libraries that need to be as reusable as possible. And this is where the
Cython solution has some problems – sure, you can make it pretty fast, but in
doing so, you lose the generality that the original Python code had, and your
code becomes as monomorphic as C.

~~~
btown
Cython does have limited support for generics in the form of its fused types
[1]; it can be made to generate C code for different primitive types, for
instance. It's nowhere near as flexible as Julia in that regard, since you
can't, for instance, redefine operators to work on complex numbers, but if
you want the interoperability and maturity (read: available programmers and
existing libraries) that Python offers, you don't need to trade away generics
completely. I'd love to live in a completely-Julia ecosystem, but
unfortunately that's not quite reality yet, and luckily Cython is more than
capable of making do.

[1]
[http://docs.cython.org/src/userguide/fusedtypes.html](http://docs.cython.org/src/userguide/fusedtypes.html)

------
srean
If the objective is to get a fast implementation of isotonic regression, I
think the priority would be to choose the right algorithm. PAVA is indeed one
of the fastest such algorithms and has a running time bound that is linear in
the number of data points. I may be wrong, but from a quick look at the code
it seems neither the [P/C]ython version nor the Julia version has an
implementation that ensures linear time. The files are named "Linear PAVA", so
I believe there was an expectation that they are indeed linear time.

Consider applying it to this sequence

    1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, -30

It seems that the number of operations required by these implementations would
be quadratic in the size of the sequence. (They will average the tail
repeatedly.)

I cannot read Julia yet, so I am not sure about the Julia version, but the
others do indeed look very much like a quadratic-time algorithm, at the least.
The behavior will of course depend on the input, so on certain inputs they may
have a runtime that scales linearly. Complexity addresses the worst-case
behavior, and as I showed, coming up with such bad cases is not hard, and they
need not be very unnatural either.
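For reference, the standard way to guarantee linear time is to keep a stack of already-pooled blocks and merge backwards on each violation, so every element is pushed and popped at most once. A minimal Python sketch of this pooled PAVA (my own illustration, not the OP's code):

```python
def pava(y, w=None):
    """Isotonic regression via Pool Adjacent Violators, O(n) amortized."""
    if w is None:
        w = [1.0] * len(y)
    # Stack of pooled blocks: each entry is [mean, weight, count].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand pooled blocks back into a fitted, non-decreasing sequence.
    out = []
    for mean, _, count in blocks:
        out.extend([mean] * count)
    return out
```

On the adversarial sequence above, each element joins the shrinking tail block exactly once, so the whole run stays linear in the length of the input.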

OTOH:

If the objective was to compare the speeds of CPython, Cython and Julia, then
it does not matter whether an efficient algorithm was chosen for the task. My
hunch is that because the current implementations compute more than necessary,
this is going to magnify the speed difference between Cython and Julia.

Does Julia do any inlining? Or are there any plans for it?

If so, that would be one aspect where it could shine. As mentioned in the
comments, the Cython version is _unfactored_ and manually inlined for
performance. If Julia inlines in its JIT pass, such measures would be
unnecessary, which would allow one to write cleaner code.

~~~
ajtulloch
Yeah, I implemented a guaranteed O(N) PAVA in Julia (see
[https://github.com/ajtulloch/Isotonic.jl/blob/master/src/poo...](https://github.com/ajtulloch/Isotonic.jl/blob/master/src/pooled_pava.jl)),
and experimented with one in Cython - it was uniformly slower than the
'linear' PAVA across a few sample datasets I tried. See e.g.
[http://bit.ly/1oDF7nH](http://bit.ly/1oDF7nH) for some graphs.

I'm not claiming I've written the fastest possible implementations or anything
though - please feel free to improve any of this code and submit to scikit-
learn if you find some improvements!

~~~
srean
> I'm not claiming I've written the fastest possible implementations or
> anything though

Oh, I never thought that you were claiming so. Thanks for clarifying the
implementation. Julia is indeed something that I want to pick up. BTW, what
data structure does this snippet create?

    ydash = [1 => y[1]]

A fixed-length array?

> please feel free to improve any of this code and submit to scikit-learn if
> you find some improvements!

Ah! Then you would all hound me for my insistence on keeping
[https://bitbucket.org/sreangsu/libpav](https://bitbucket.org/sreangsu/libpav)
LGPL, a discussion I don't want to engage in. BTW, I am not even sure if
libpav would be faster as it stands now, because it does some really stupid
stuff, like unnecessary allocation/deallocation on the heap. It's an old piece
that I wrote years ago to entertain myself (proof: comments like these

       // I hereby solemnly declare
       // that 'it' is not aliased
       // with anything that may get
       // deleted during this object's
       // lifetime. May the runtime
       // smite me to my core otherwise.

[https://bitbucket.org/sreangsu/libpav/src/dcc8411b10a0d4e220...](https://bitbucket.org/sreangsu/libpav/src/dcc8411b10a0d4e2202e95fc2fec29480ea5da61/src/under_oath.h?at=default#cl-48)
and the perverse namespace pollution).

It's such a strange coincidence: I started hacking on it again just a few days
ago, and lo and behold, a thread on HN on PAVA. Now I'm motivated enough to
fix my crap :) Once I get rid of that damned linked list I think it would be
decent enough, and there is a fair chance that it would be faster than the
current Julia version.

That said, libpav does a lot more than just isotonic regression; it's more of
a marriage between isotonic regression and large-margin methods like SVM. Even
for just the isotonic regression case, it can work with any abstraction of an
iterator (so it is container agnostic), can take an _ordering_ operator, and
is of course fully generic (meaning it can handle float32 or float64 without
any change).

@jamesjporter Thanks

~~~
jamesjporter

    ydash = [1 => y[1]]

creates a Dict mapping from Int64 to Float64; `=>` is used for declaring Dict
literals in Julia. Cf.:
[http://docs.julialang.org/en/latest/stdlib/base/?highlight=d...](http://docs.julialang.org/en/latest/stdlib/base/?highlight=dict#associative-collections)
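In Python terms (illustrative values only, not the OP's data), the closest analogue is a dict literal; note that Julia arrays are 1-indexed:

```python
y = [2.5, 4.0]     # hypothetical data
# Julia's  ydash = [1 => y[1]]  is roughly:
ydash = {1: y[0]}  # an int -> float mapping
ydash[1]           # 2.5
```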

------
idunning
Very cool. We had a similar experience when implementing the Simplex method
for linear optimization, except we evaluated PyPy versus Julia instead of
Cython. It's similar in that it benchmarks "just" the language, not BLAS,
which is a common pitfall I see when people compare the two. I guess at this
point, a year or so later, it isn't surprising to me anymore that Julia is as
fast as it is for these sorts of computational tasks; it's been demonstrated
pretty comprehensively by now.

Slides:
[http://www.mit.edu/~mlubin/informs2013.pdf](http://www.mit.edu/~mlubin/informs2013.pdf)

Paper: [http://arxiv.org/abs/1312.1431](http://arxiv.org/abs/1312.1431)

~~~
lqdc13
I am always confused by this. I think the primary reason for picking a
language is how simple it is to write something in it.

So if I can call scipy and get the result easily in one line, I would do that.
When implementing new algorithms, it will indeed be easier with Julia.

PyPy unfortunately cannot be used by most people who would benefit a lot from
it, because NumPy/SciPy aren't currently supported.

I am curious whether you tried running this code with Numba.

~~~
EpicEng
Depends on your requirements. In many applications, performance matters a
whole lot more than "how much can I accomplish in one line of code."

~~~
lqdc13
In those applications everyone would be writing everything in C, and then such
benchmarks would be useless.

~~~
pjmlp
Why spend time increasing the amount of unsafe code in the world, if
comparable speeds can be achieved in better designed languages?!

------
thisrod
This looks dodgy to me. The purple and red lines on the graph don't end up
parallel, so the active set algorithm appears to have time complexity O(n) in
Julia, but O(n^2) or O(n^3) in Python. Can that be explained?

------
Sweyla
No kidding the Active Set solution in Cython is slow; it barely leverages
Cython at all! It uses a Python list, which is managed by Python, and uses
none of Cython's static typing.

Implement the algorithm inside a "with nogil" block and you will see some nice
speed-ups. It won't look pretty, but for certain performance-sensitive code it
will be worth it.

~~~
ajtulloch
Hi, author here.

Yeah, the active set implementation in Cython is relatively slow - see [1] for
where I discussed how I benchmarked the existing algorithm and demonstrated
the PAVA implementation was faster.

Then again, until two days ago the Cython active set algorithm was the
production implementation of isotonic regression in scikit-learn, so it's not
like it was a strawman example that I made up.

[1]: [http://tullo.ch/articles/speeding-up-isotonic-
regression/](http://tullo.ch/articles/speeding-up-isotonic-regression/)

~~~
Sweyla
Try this implementation for PAVA:

[https://gist.github.com/gustavla/9499068](https://gist.github.com/gustavla/9499068)

Also, how are you timing Python? When I run this on a sample size of 10 using
IPython's %timeit, it takes a few microseconds, which is two orders of
magnitude off from what you report.

------
crasta
Coming from R, I have to ask... Why am I seeing 'for' and 'while' loops? Why
isn't this code vectorized? It looks like both implementations suck.

~~~
3JPLW
Yeah, it's a big change in mindset for Julia.

Basically, the reason MATLAB/NumPy/R folks harp so much about vectorizing code
is that their interpreters are _slow_, and within a vectorized function they
can punt to some C/C++/Cython implementation. But Julia loops compile down to
almost ideal assembly; they are remarkably fast. In fact, verbose loops acting
in-place almost always beat the vectorized alternatives because they don't
need to allocate space for the result.

The wonderful thing about Julia is that you can code it vectorized first, and
if that's not fast enough (or if it uses too much memory), you can easily
rewrite it in the same language.
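The allocation point can be sketched in NumPy terms (an illustration of the trade-off; in Python the loop itself would still be slow, which is exactly what Julia avoids):

```python
import numpy as np

a = np.ones(1_000_000)
b = np.ones(1_000_000)

# Vectorized style: clear, but `a + b` allocates a fresh temporary array.
c = a + b

# In-place style: reuses an existing buffer, no new allocation per call --
# the same win Julia gets from writing the loop out explicitly.
out = np.empty_like(a)
np.add(a, b, out=out)
```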

~~~
crasta
You can't parallelize a for loop the way you can apply a function to every
member of a vector. If code can be vectorized in Julia, that is what I'm
interested in seeing. These samples are obviously not intended for distributed
compute; thousands of CUDA cores sit idle waiting for this synchronous for
loop to finish.

~~~
maxerickson
Cython has a construct for parallelizing for loops.

[http://docs.cython.org/src/userguide/parallelism.html](http://docs.cython.org/src/userguide/parallelism.html)

I wouldn't want to claim an informed opinion about whether it is a good style
or not, but a block with no side effects isn't that far off from a function
with no side effects.

------
westurner
1. Where is the source for this benchmark?

2. [http://benchmarksgame.alioth.debian.org](http://benchmarksgame.alioth.debian.org)
could be a bit more representative of broad-based algorithmic performance.

3. There are lots of Python libraries for application features other than
handpicked algorithms. I would be interested to see benchmarks of the
marshaling code in IJulia (IPython with Julia).

------
maxerickson
How much time is Cython spending making the copy into the solution array?

(Or is Julia making a copy and I just don't see it?)

------
gregimba
If you need speed, wouldn't the ability to write a C extension beat using
Julia?

~~~
ihnorton
Writing performant extensions for (C)Python requires knowing both C and the
CPython API. In Julia it is easy to call C shared library functions directly
with no overhead (similar to PyPy with CFFI).

~~~
mmerickel
CFFI works fine with CPython, and ctypes loads shared libraries without issue
as well.
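For illustration, the ctypes route looks like this (a minimal sketch; the fallback library name assumes a glibc-style Linux system):

```python
import ctypes
import ctypes.util

# Load the C math library directly; no C glue code or CPython API needed.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

libm.cos(0.0)  # 1.0
```

The overhead ihnorton mentions below is per-call marshaling between Python objects and C types, which Julia and PyPy's CFFI can largely compile away.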

~~~
ihnorton
Both of which have non-trivial overhead in CPython.

------
pjmlp
Another good example of how dynamic languages can be made to execute faster
with good native code generation support, rather than forcing people to use C
or similar.

Looking forward to Julia's JIT improvements. These are only the early days.

------
e12e
Wait, are we looking at the same graphs? Doesn't this show that for large
problem sets Cython and Julia are pretty much equivalent, as long as you
choose the best algorithm?

------
amit_m
tl;dr: Julia is killing it.

~~~
est
In every benchmark article: you are doing X wrong. Use Y method instead.

~~~
coldtea
Yes, it's as if those people think that instead of trying to measure the rough
relative speed of similar implementations of some algorithm (which is what you
want when you benchmark), you are interested in finding the optimal algorithm
for the problem.

------
halayli
Why not use C++11 if performance matters?

~~~
emidln
"Why not use Fortran 2008 if performance matters?"

~~~
pekk
Well, why not? Julia is very different from Python; if I am willing to use a
completely different tool, then the field is wide open and includes anything
that has better performance on average.

~~~
StefanKarpinski
Julia's mental programming model for many things is not much different from
Python's: dynamically typed, pass by sharing, and the implicit scoping rules
are even quite similar. Sure, Julia has a few extra bells and whistles, like
type annotations and dynamic multiple dispatch (which you can ignore, and then
you have something very much like Python but faster), but still, Fortran and
C++ are much more different from Python than Julia is.

~~~
tejinderss
Any tentative idea when Julia 1.0 will be released?

~~~
StefanKarpinski
Possibly late 2014 or early 2015, but we're not going to rush it if things
don't feel ready yet.

------
t3kcit
Why are you not using typed memory views in Cython? That might easily close
the performance gap.

------
dbecker
Would love to see Numba added to this benchmark.

