
How to Make Python Run as Fast as Julia - bsg75
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Python_Meets_Julia_Micro_Performance
======
jordigh
I don't think this is accurately representing Julia's aims. Of course the
Julia team wrote the Python code in a way that makes it run slowly. But it
looks like perfectly natural Python code! It's almost a literal translation of
the Julia code. Julia's benchmark is even far worse with Octave, which they
almost deliberately wrote in the worst way possible for Octave, with lots of
loops and recursion.

We have written some documentation for Octave in order to guide people towards
writing faster Octave code:

[https://www.gnu.org/software/octave/doc/interpreter/Vectoriz...](https://www.gnu.org/software/octave/doc/interpreter/Vectorization-and-Faster-Code-Execution.html)

But look at how much we have to explain, and look at all the hoops we have to
jump through in Python and Octave in order to write faster code. Matlab used
to have similar guides telling people don't write this, write that instead.
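
To make those hoops concrete, here's a small sketch (the `scale_*` function
names are made up for illustration; NumPy assumed) of the kind of rewrite
such guides push you toward:

```python
import numpy as np

def scale_loop(xs, factor):
    # "Don't write this": the loop runs step by step in the interpreter
    out = []
    for x in xs:
        out.append(x * factor)
    return out

def scale_vectorized(xs, factor):
    # "Write that instead": one expression, the loop runs in compiled code
    return np.asarray(xs) * factor
```

Both compute the same thing; the second is only faster because the programmer
restructured the code for the runtime, which is exactly the burden a good
compiler should lift.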

What Matlab eventually did was look at what people were writing and make
that fast. Julia did the same and made it even faster.

This is a lesson from C and C++ compilers that seems to be taking a long time
to trickle to other programming languages: as long as you're writing
reasonable code, speeding up your code is your _compiler_ 's job, not yours.
Your compiler usually knows better than you how to unroll loops, how to cache
results, how to use multiple cores, how to elide unnecessary intermediate
results, how to completely remove dead code. You should focus on writing easy
to understand, maintainable, high-level code. That's why you're using a
programming language and not machine code.

~~~
jacobolus
> _What Matlab eventually did was look at what people were writing and making
> that fast._

Except in practice, to write fast Matlab code you need very deep understanding
of Matlab’s internals and years of experience. Seemingly trivial patterns end
up slowing your code down by multiple orders of magnitude, and there’s more
“guess and check” involved when trying to write code that can be effectively
JIT compiled by Matlab than careful reasoning. Even worse, the JIT and the
profiler don’t get along, so it’s often impossible to get any insight into the
reasons for JIT-related performance differences.

In general, Matlab is an extremely unpleasant and frustrating environment
compared to almost any other language I’ve worked with. The only thing Matlab
has on Python/Numpy is a nice quantity of publicly available code for various
technical functions. Most of this code is hacky academic prototype stuff, but
that’s much better than nothing if you’re trying to follow someone’s algorithm
written up in a paper.

The Matlab GUI and tooling is a buggy and unpolished Java turd from the 90s
which fits in poorly with any modern operating system.

~~~
dagw
_The Matlab GUI and tooling is a buggy and unpolished Java turd from the 90s
which fits in poorly with any modern operating system._

And yet every time I try to get Matlab users to use something else (Python or
Julia) the one thing they almost immediately complain about is the lack of a
GUI IDE as good as the Matlab one.

~~~
skierscott
> And yet every time I try to get Matlab users to use something else (Python
> or Julia) the one thing they almost immediately complain about is the lack
> of a GUI IDE as good as the Matlab one.

Spyder offers a similar environment. It has a variable explorer, script
editor, console, etc all in one window (which I think is what matlab users are
looking for).

~~~
MagnumOpus
It's not just that - it is also the extremely extensive and comprehensive
documentation of every function (integrated in the GUI), and the powerful
profiler (integrated in the GUI), and the charting (integrated in the GUI)...

------
thebooktocome
The point of the Julia benchmarks was to show _compiler_ performance.

You can do something clever in any language. There are plenty of really,
really smart people that spend a lot of time writing incomprehensible (to me)
Haskell that outperforms C.

The question is, do you _have_ to do something clever to get performant code
in the language of your choice?

In Julia -- not often. I've written around 50kloc of Julia; almost all of it
is first-pass prototype code that manages to be performant despite itself. The
most polished code I've written in Julia is about 100x faster than the MATLAB
it replaced.

IMO, the main advantage of Python is its massive library of modules. As a
prototyping language, on the other hand, it just seems to me that Julia is
more flexible.

~~~
lqdc13
I agree, but this is only true as long as you stick with numerical
applications, which is kind of the point of Julia.

It is currently more of a domain specific language, kind of like Matlab,
because it's not really optimized for other things and has no libraries in the
other domains.

On the other hand, I implemented a prototype neural network in it, and it went
very smoothly. Will have to eventually rewrite it in Python though.

~~~
niutech
Julia is a general purpose language, not a DSL. You can create web services,
desktop apps, and file tools with Julia. And it has access to a lot of
external libraries using ccall and PyCall.jl.

------
saurabhjha
I have worked with Python in two domains: scientific computing and web
applications.

\- In scientific computing, we can either use better algorithms (yes, that
makes a lot of difference) or drop to C as necessary. The canonical example
of the second alternative is Numpy.

In my humble opinion, the expressive power of python is what makes it an
excellent language for scientific computing. For these kind of problems you
cannot afford to worry about buffer overflows and memory allocations. You need
a free mind to think about mathematical algorithms.

My own approach is to use python whenever I can do it and then use cProfile to
determine whether to port some parts to C.
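
A minimal sketch of that workflow (the `heavy_math`/`cheap_glue` names are
invented for illustration):

```python
import cProfile
import io
import pstats

def heavy_math():
    # Stand-in for a numeric hot spot that might be worth porting to C
    return sum(i * i for i in range(100_000))

def cheap_glue():
    # Stand-in for code that is not worth optimizing
    return 42

profiler = cProfile.Profile()
profiler.enable()
heavy_math()
cheap_glue()
profiler.disable()

# Sort by cumulative time and keep the top entries; the functions that
# dominate this report are the candidates for a C port.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
```

If `heavy_math` dominates the report, that's the piece to rewrite; everything
else stays as plain Python.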

\- If you are using Python in an application server, most of the time is spent
waiting for data. Mostly, the job of an application server is to collect data
and make some kind of response, which is not CPU intensive.

What you should optimize for in this case is data access patterns and creation
of data objects. On the other hand, if you have any CPU intensive work, write
it as a separate service outside of your application.

------
pathsjs
They really lost me at "Caching computations". It should be clear that the
benchmark is NOT the quickest way to compute Fibonacci numbers. The reason why
it is included at all is that - without caching - computing Fibonacci numbers
this way involves an exponential amount of computation, so it is easy to get
long running times. Using a cache invalidates the point of doing the benchmark
at all!
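
A sketch of why: with a cache, the "benchmark" no longer performs an
exponential number of recursive calls, so it stops measuring what it was
designed to measure:

```python
from functools import lru_cache

def fib_naive(n):
    # What the benchmark intends: roughly phi^n recursive calls
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_cached(n):
    # Same answers, but each value is computed once: O(n) calls,
    # so recursion overhead all but disappears from the measurement
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)
```

Both return identical values; only the naive version exercises the recursion
the benchmark is about.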

------
tokai
You can easily make Python run nearly as fast as Julia while only using the
standard library.

    
    
      import subprocess
      with subprocess.Popen(["julia", "tongue-in-cheek.jl"], stdout=subprocess.PIPE) as proc:
          print(proc.stdout.read())

~~~
mahouse
Nearly? That's underestimating how long it will take for the Python
interpreter to wake up!

------
jankiel
"Making it fast" was not a goal of that benchmark. The whole point of it is
measuring core features of the language: looping, recursion and so on. That's
why they have this horrible Fibonacci implementation: to measure how the
language handles recursion.

The author misunderstands this. If you say that you're optimizing Python by
calling C, then something's wrong here.

------
Animats
Surprisingly, they didn't try PyPy, which is about two orders of magnitude
faster than CPython on simple loops. PyPy needs to become the main production
version of Python. CPython should be viewed as obsolete technology, like the
original non-compiling Java interpreter in Netscape 1.

~~~
jfpuget
I did not try PyPy because last time I checked, it didn't support Numpy. That
means PyPy would not have been able to run these micro benchmarks, as Numpy is
used in some of them.

Please let me know if Numpy is now supported in PyPy. I'd be happy to add PyPy
to the mix in that case.

~~~
itsadok
I'm pretty sure everything you used is already supported:
[http://buildbot.pypy.org/numpy-status/latest.html](http://buildbot.pypy.org/numpy-status/latest.html)

~~~
jbssm
Does PyPy work with Python 3?

Python 3 has been out for 7 years and I refuse to use anything that doesn't
work in Python 3, it's just ridiculous to keep building stuff for Python 2,
it's hindering the language and keeping it back in the past.

~~~
heinrich5991
Yes, it does:
[https://en.wikipedia.org/wiki/PyPy#Project_status](https://en.wikipedia.org/wiki/PyPy#Project_status).

------
chrispeel
The benchmarks seen at [1] which compare Julia with other languages were
written to be idiomatic in those languages. I.e., they weren't supposed to be
the fastest you could be in that language (which often would mean call a C
library), but rather something which represents the language well. So the
right criticism is not just whether the benchmarks could be made faster, but
also whether the speedup is too tricky or requires calling wrapped C
libraries.

[1] [http://julialang.org/benchmarks/](http://julialang.org/benchmarks/)

~~~
RyanHamilton
Actually Chris it says on the website the benchmarks were "written to test the
performance of specific algorithms, expressed in a reasonable idiom". I took
issue with how they had written some of the Java, e.g. they wrote their own
quicksort which was slower than just using Arrays.sort, the much more
idiomatic way in Java. I even submitted a PR which went nowhere:
[https://github.com/JuliaLang/julia/pull/14229](https://github.com/JuliaLang/julia/pull/14229)
I then broke the code improvements into smaller PRs and am still waiting after
2 weeks for the first PR to be merged.

~~~
ViralBShah
Hi Ryan, the point was not to use Java's built-in sort, but to implement a
textbook quicksort implementation in all languages to see how the compiler
performs. That is why the original PR was not merged.

On the smaller PR, I had requested fixing the mandel benchmark, which in Java
is doing less work than the Julia and Lua benchmarks, giving it an unfair
advantage. That should be easy enough to fix too - but I didn't get a reply.

Let's get it merged though, and continue the discussion on the PR.

------
pjmlp
The difference is that in Julia one stays within the language, doesn't need to
use subsets of the language or optimized C libraries where the language plays
a glue role.

I would agree with the article if Cython and Numba would support 100% Python
or Numpy wasn't used to achieve similar execution speed.

~~~
Nrpf
Julia also requires subsets of the language. Try writing rolled array
expressions (in a loop or otherwise) in Julia and in Numba and see which one
is faster.

Also try IO or text processing in Julia. Python is known to be faster right
now.

~~~
tavert
Counterpoint - try user-defined types (classes) in Numba.

~~~
Nrpf
User defined types/classes are currently being worked on in an open PR.
Excellent counterpoint for the time being, though.

Aside- Do you know if multiple inheritance/traits will happen at some point? I
need this for modeling, even though it can be worked around for general
software architecture.

~~~
tavert
Likely. There's some recent discussion at
[https://github.com/JuliaLang/julia/issues/6975#issuecomment-...](https://github.com/JuliaLang/julia/issues/6975#issuecomment-160857877)
regarding taking inspiration from Clojure's protocols. There are a few
different Traits and Interfaces packages floating around showing proof-of-
concept implementations. Couldn't give you a timeline on when it'll make it
into master, but it should happen.

~~~
Nrpf
Thanks. The link still seemed to be focused on single inheritance (sorry, I
don't know the right terminology; I come from an OO background), but I could
be wrong. Though I did get the sense that it's just an interim step towards
multiple inheritance.

What about the dataframe and stats infrastructure? It's currently in shambles.
Any idea when this can be expected to be fixed?

~~~
tavert
Stats and dataframes now have someone working full time on them. They will get
appreciably better soon.

~~~
Nrpf
Very glad to hear that. Is there a roadmap or central place where I can
contribute and follow progress?

~~~
tavert
Hopefully there will be a blog post soon with some details and plans.

~~~
Nrpf
Looking forward to it. I was about to embark on a new long-term project with
Python, but I might delay that pending the new blog post. If possible, do you
have a guesstimate on what sort of time window we are looking at for this blog
post? Days, weeks, months?

~~~
tavert
Probably not days. I won't be the one to write it so I can't make any
especially reliable predictions here. If enough people ask for this, probably
some time in January.

------
gaze
Turns out that if you bend over backwards and use some libraries and do a
bunch of awkward stuff you can make most languages fast.

~~~
kriro
I don't know but

    
    
      import numpy as np
      
      def benchmark_sort_numpy():
          lst = np.random.rand(5000)
          np.sort(lst)  # returns a sorted copy; the result is discarded here
    

isn't awkward at all and I'd argue that numpy (and pandas etc.) are very
natural choices for anyone working on the problems they solve well. That's
precisely the beauty of Python. There are very good libraries for almost
everything. Usually there's also great communities around those libraries and
it's usually not very hard to identify the "state of the art" library for any
given problem.

For me the main question is not "will the compiler optimize well in the
general case" but rather "will you naturally reach for the right libraries
which are optimized well". For me/Python, I'd say more often than not the
answer is yes. I understand that that's not the point the Julia team is trying
to make, but it's a decent practical approach (imo).

~~~
tavert
The motivating factor for using Julia in a lot of cases is: what do you do
when the problem you're trying to solve hasn't been exactly solved already by
someone else's C extension? Can an average person (scientist, grad student,
etc) who knows the math behind the problem they're trying to solve, but
doesn't want to jump through hoops of awkward extension compilation (where you
have to know not only the high level language and the low level language, but
also how to use the interface layer API's that sit between them), write a
high-performance implementation from scratch without it taking too much time
or effort? If libraries do exist, do they work in parallel? And the standout
features of the language like multiple dispatch and metaprogramming also allow
some new, very natural ways of approaching a lot of problems in technical
computing.

~~~
dr_zoidberg
That motivating factor can be achieved in Python using Numpy and Cython
effectively. Check any of Ian Oszvald's High Performance Python talks.

~~~
tavert
Numpy is great for dense multidimensional arrays of (edit: fixed precision)
floating point numbers. Most problems I face need to deal with richer, more
complicated, less uniform data structures than that. Similarly Cython is way
better than writing a C extension by hand, but it feels very tacked-on (why
are you writing libraries in a different sub-language than you use them
from?), what you can do in nogil mode is pretty limited, and the choice of
supported compilers is depressingly limited for when you need C++11, inline
assembly, Fortran, linking to libraries that build with autotools, etc all to
work cross-platform. If absolutely everything in the Python ecosystem were
written using Cython then Python would have less of a performance problem, but
there's a productivity, distribution, and difficulty barrier there.

------
princeb
I noticed that running arithmetic loops tends to be the popular method of
benchmarking these three languages.

I don't know how many folks who use MATLAB care that much about loop
performance that they will be inclined to look at Python or Julia just because
someone found an amazing improvement there.

The one thing that made me finally go over to Scipy from MATLAB was that I
changed focus and no longer had to do analysis involving nonlinear/stochastic
optimization. Several years ago (maybe 2009?), Octave was really struggling
with speed here, and numpy still felt too new. I vaguely remember needing to
wrangle with the mathematics a lot more (like approximating the Jacobians or
Hessians) in order to get Octave to work. On the other hand I can only
remember a handful of times where I needed to get MATLAB to do loops like
these benchmarks (like maybe Runge-Kutta or FD ODE solutions). Loops are
really quite unnatural/unidiomatic in MATLAB, if you can keep your algorithm
as close to linear systems as possible it's quite fast.

Has it changed much since? I know right now most of the important nonlin opt
algorithms are available in scipy and if you want more there are external
packages, but you still have to tinker a bit for the best solution. There's
nothing like mindlessly using fmincon for every single problem in the world. I
am only a layperson at nonlinear optimization, so I can't tell you why MATLAB
is so much better out of the box.

~~~
tavert
"Vectorizing" (interpreter-out-of-the-way vectorization, which isn't the same
and doesn't necessarily give you SIMD vectorization) your code to get Matlab
or NumPy to run it efficiently always seems like an unnecessary burden to put
on the programmer, especially for algorithms that don't lend themselves to
expressing in a vectorized way. Sometimes you just need to write a for loop,
and it's great when the language gets out of your way and lets you do so
without slowing your code down by an order of magnitude.

If you care about constrained optimization, Julia has leaps-and-bounds more
sophisticated tools than anything Matlab or Python have to offer. Check out
[http://www.juliaopt.org](http://www.juliaopt.org) and especially JuMP.jl.
[http://www.optimization-online.org/DB_FILE/2015/04/4891.pdf](http://www.optimization-online.org/DB_FILE/2015/04/4891.pdf)
has some detailed comparisons. Macros and
fast generic programming make Julia a very well-suited language for doing
automatic differentiation
([https://en.wikipedia.org/wiki/Automatic_differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation)).

------
porker
While it's great to make code run as fast as possible, not every scientific
problem is bounded by code execution.

I've recently been writing satellite image processing code, and profiled it
thinking the algorithm was the problem. It turned out that even on a SSD
nearly 90% of the program time (~30 minutes) was reading from & writing to
disk intermediate image files.

More could be kept in RAM, but high-memory cloud machines aren't cheap, and
our local development machines only have 16-32GB.

~~~
sgt101
You can get some amazing rack-mounted machines now - OK, not for cheap, but
relatively cheap given the value that they bring. I'm buying an analytics
"box" at the moment with 20k GPU cores, 386GB RAM and a 45k IOPS SSD. OK, it's
$40k, but shared across 20 engineers the productivity boost more than pays for
itself really fast. It sounds to me that you need to take an investment case
to your boss.

~~~
porker
> It sounds to me that you need to take an investment case to your boss.

Welcome to academia...

~~~
sgt101
Well, if you want the publications, you need the kit.

------
mahouse
By rewriting Python code into something that does not even remotely resemble
Python, you can make Python code run faster.

~~~
IndianAstronaut
True. It's almost the equivalent of using Rcpp with R and saying it runs
really fast.

~~~
Mikeb85
But at the end of the day, it does run really fast. And that's really all that
matters.

Rcpp is really nice too, and definitely is a win for R.

------
awqrre
Python is really slow but performance can be improved by just switching to a
different interpreter.

------
jfpuget
I updated the post with Julia running times on the same machine as the one
used for Python.

------
hltt
use Pyston

------
stefantalpalaru
> I am not using an alternate implementation of Python here

What does he think Cython and Numba are?

> I am not writing any C code either

No, he's just using libraries written in C and Fortran in a ridiculous attempt
to praise Python.

> Writing better Python code to avoid unnecessary computation

You don't improve a language benchmark by changing the algorithm. If you don't
understand why, you have a lot more to learn before teaching people how to
"make Python fast".

