
Python JITs are coming - Derbasti
https://lwn.net/Articles/691070/
======
chrisseaton
> If it needs to be fast for CPython, it has to be written in C, but if it
> needs to be fast for a JIT, you cannot use C. He showed a simple mysum()
> function that totaled up the elements in an iterable. If it is passed a
> Python object like list(range(N)), the JIT knows what it is and can do lots
> of optimizations. But if it is passed a NumPy array, which is "opaque C
> stuff", the JIT doesn't understand it, so it will have trouble even
> achieving the performance of a non-NumPy version on a JIT-less Python.
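The mysum() routine from the talk isn't reproduced in the article; a minimal sketch (the exact signature is an assumption) might be:

```python
def mysum(iterable):
    # Sum the elements of any iterable, as in the talk's example.
    total = 0
    for x in iterable:
        total += x
    return total

# With a plain Python list, the JIT sees ordinary objects it can optimize:
print(mysum(list(range(100))))  # 4950
# With a NumPy array, each iteration goes through the "opaque C stuff"
# the JIT cannot see into, so the same loop can end up slower.
```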

There's a third way here - run the C code using the same JIT as the Python
code, instead of compiling it natively.

That might sound mind-bending, but C is just a language like any other. It's
actually fairly simple and consistent to implement compared to something like
Python. You can interpret and JIT compile it if you want to - there's no major
magic to that.

Then you can optimise the C code and the Python code at the same time, inline
the two, do the same optimisations as you do on Python code, etc. We're using
this technique to run Ruby C extensions in JRuby and the results so far are
great - running real C extensions faster than native code because we can
optimise both the Ruby and the C at the same time.

[http://chrisseaton.com/rubytruffle/cext/](http://chrisseaton.com/rubytruffle/cext/)

~~~
vmorgulis
Very interesting point of view.

I see a fourth option:

- compile the scripting language to C (like Crystal for Ruby)

~~~
duaneb
Sure, but then you lose any advantage using a scripting language gives you.

~~~
natermer
That, and you lose what JIT compiling can get you.

------
chriswarbo
The Truffle implementation of Ruby can be plugged into a Truffle
implementation of C, in order to maintain compatibility with C extensions.

[http://chrisseaton.com/rubytruffle/cext/](http://chrisseaton.com/rubytruffle/cext/)

Has anyone in the Python community thought of doing something similar, e.g.
using PyPy to build a C interpreter, maybe re-using front-end components
(preprocessor, parser, type-checker, etc.) from an existing C compiler? In fact,
it might be useful to build an interpreter for something like LLVM IR, or even
x86 machine code, in order to gain access to a bunch of existing languages.

Once access is gained at that low (ABI?) level, abstractions and interfaces
can be built to hide the horribleness. The performance would initially be
terrible, but some targeted, profile-guided optimisation might bring it down
to reasonable levels, in a similar way to a JS interpreter adding specific
optimisations to make asm.js code run fast.

~~~
PeCaN
Something like Sulong¹ in RPython would be really cool and probably entirely
feasible—I too am curious if that's on the PyPy team's radar.

¹ [https://github.com/graalvm/sulong](https://github.com/graalvm/sulong)

~~~
ltratt
Sulong is a really good idea -- I wish I'd thought of it first! We've had a
student do a small project looking at something equivalent for RPython. There
are, as expected, no show-stoppers yet, but I have no idea how far we'll be
able to go with the limited resources we have to throw at the problem.

------
fijal
I must say I'm always a little annoyed by the split in Python community around
the scientific stack vs the rest. There are two ecosystems, two packaging
tools, etc. (which is fine), but the insistence that there are no other Python
users worth considering (both sides are guilty) is really
frustrating.

~~~
dagss
Former Cython developer here.

To get some perspective on this, consider the alternatives for a scientific
programmer: MATLAB, R, Fortran, Mathematica, or if you're hip, Julia -- all
specifically made for scientific programming, with 0% general-purpose/web/etc.
development going on.

So I would say that the scientific Python community has been doing extremely
well, given that it is even using a language that wasn't designed from the
ground up for scientific computing.

I could write a lot about why that is (and how some of the CS and IT crowd
doesn't "get" scientific computing...) -- I'll refrain. I just wanted to say
that where you see something and get frustrated, I see the same picture and
think it's actually an incredible success to bring so many scientists into at
least the same ballpark as other programmers, even if they are still playing
their own game.

~~~
Athas
> I could write a lot about why that is (and how some of the CS and IT crowd
> doesn't "get" scientific computing..)

Please do! I'm a PhD student in CS, and I don't think I "get" scientific
computing (I'm in compilers myself).

~~~
alcidesfonseca
I am a PhD in CS, specifically in Programming Languages and Parallel
Programming, and I believe I do get scientific computing.

Most people doing it did not have a formal CS education. They are biology,
physics, mathematics, or chemistry majors who have had one or two courses on
programming, taught by other scientific programmers.

There are two main families. One comes from the Fortran background and still
writes programs like they did in the 80s, with almost no new tooling. Programs
are written over some time, and then they are scheduled on clusters that spend
months calculating whatever it is.

The other family of scientific programmers, which I believe is the majority,
uses a tool like MATLAB, or more recently R, to dynamically inspect and modify
data (RStudio is a MATLAB/Mathematica-like friendly environment for this task)
and uses libraries written by more proficient programmers to perform some kind
of analysis (machine learning, DNA segmentation, plotting, or just basic
statistics).

Most of these programmers know one or two languages (maybe plus Python and bash
for basic scripting). They write programs that are relatively small, and the
chances of someone else using that code are low. Deadline pressure is high, so
code maintainability is not a priority.

For a non-CS programmer, learning a new programming language is almost
impossible, because they are used to that way of doing things, and to those
libraries. They take much more time to adjust to new languages because they do
not see the language logically, the way anyone who has taken a basic compilers
course does.

Given this context, web apps, REST APIs, and all the other trending tech in IT
are not commonly used in scientific programming, because scientists typically
do not need them (when they do, they learn them). Datasets are retrieved and
stored in CSV and processed in one of those environments (or even in Julia or
Python pandas).

~~~
matt4077
You're painting an awfully dark picture of scientists' skills. Having been on
both sides, I believe the deciding factor is simply the availability of
libraries.

If you're doing web development you have an insane number of languages to
choose from, because after String, Array, and File are implemented, HTTP is
next. Having done a bit of web development, I'd also say a typical project
only uses a subset of libraries that is surprisingly small.

Scientific computing is quite different: a paper in structural biology (my
former stomping grounds) can easily require a few dozen algorithms that each
once filled a 10-page paper. These could easily be packaged as libraries, but
it's a niche so it rarely happens. Newer languages quite often don't even have
a robust numerics library. Leave the beaten track and your workload just
increased by an order of magnitude.

That's also why science, unlike "general purpose" programming, often uses a
workflow that connects five or more languages: a Java GUI, Python for
network/string/file IO, maybe R for larger computations, all held together by a
(typically too long) shell script.

But these workflows are getting better. There's a build tool that formalizes
the pipeline somewhat (I forgot the name) and APIs are surprisingly common.
The reason why CSV will never die is that the data fetched from APIs is
usually more static than it is in a typical web app (-> local cache needed)
and that scientists often work with data that just isn't a good fit for a
database. Postgres just doesn't offer anything that enriches a 15MB gene
sequence.

~~~
pjmlp
I worked in academia for a few years about a decade ago, and for the last
couple of years I have been interacting with biology research in industry.

The way he painted scientists' skills matches my experience thus far.

~~~
dagss
Yes, scientists' programming skills (averaged over the population) suck.
Factor 1: programming is not credited in itself or reviewed in the publishing
process. Factor 2: often little education in, or focus on, programming
relative to the wall-clock time spent doing it.

But I don't think that is only fixed by more education and making scientists
behave more like programmers. I think that to change things one also needs far
better alternatives than the options available today, so that people are
really encouraged to switch. Somehow, these must be written by people who know
their CS and can write compilers, yet who engage with _why_ scientific
computing is a mess on the tool side too, rather than dismissing it as
laziness.

I started out as a programmer, I have contributed to Cython, and the past two
years have been pure web development in a startup. So I know very well why
MATLAB sucks. Yet the best tool I have found for doing numerical computing is
a cobbled mess of Fortran, pure C, C/assembly code generated by Python/Jinja
templates, Python/NumPy/Theano...

The scientific Python and Julia communities have been making great
progress, but oh how far there is left to go.

~~~
pjmlp
I agree, this is also one of the things that drives me against C and more into
saner programming languages.

Because the majority of programmers in areas where software isn't the core
product being sold don't spend one second thinking about code quality.

As such, tooling that is on the one hand forgiving and allows fast
prototyping, but at the same time enforces some kind of guidelines, is
probably the way to improve the current workflows.

------
vonnik
Total tangent: Anyone doing scientific computing work on the JVM may be
interested in:

ND4J: N-dimensional arrays for the JVM [http://nd4j.org/](http://nd4j.org/)

Libnd4j: The C++ engine powering the above
[https://github.com/deeplearning4j/libnd4j](https://github.com/deeplearning4j/libnd4j)

JavaCPP: The bridge between Java and C++ (Cython for Java)
[https://github.com/bytedeco/javacpp](https://github.com/bytedeco/javacpp)

FWIW, all of that works on Spark with multiple GPUs.

[http://deeplearning4j.org/spark](http://deeplearning4j.org/spark)

[http://deeplearning4j.org/gpu](http://deeplearning4j.org/gpu)

~~~
adrianm
Wow, how did I miss javacpp? Looks featureful and actively maintained to boot.
Thank you for these recommendations.

~~~
vonnik
Glad you like the looks of it. It was built and is maintained by a Skymind
engineer. All of us are on this Gitter channel if you have questions:
[https://gitter.im/deeplearning4j/deeplearning4j](https://gitter.im/deeplearning4j/deeplearning4j)

------
bakery2k
An example of how, for certain types of code, more than an order-of-magnitude
performance improvement can be obtained with a good JIT compiler. I wrote a
decompression routine (for a simple compression format) in several languages -
it took the following lengths of time to run (normalized to C++ = 1 second):

* _CPython_, Jython, Lua, MicroPython, Ruby (1.8): >100 seconds

* Cython (naive), IronPython, Ruby (2.3): 50-80 seconds

* _LuaJIT_, _PyPy_, Cython (with type hints): ~4 seconds

* C#, Go, _JavaScript (V8)_: ~1.5 seconds
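The compression format isn't specified above; as a purely hypothetical stand-in with the same flavor of tight, branchy inner loop that JITs reward, consider a byte-wise run-length decoder:

```python
def rle_decode(data):
    # Decode (count, value) byte pairs -- a hypothetical stand-in
    # for the simple compression format benchmarked above.
    out = bytearray()
    for i in range(0, len(data), 2):
        count, value = data[i], data[i + 1]
        out.extend([value] * count)
    return bytes(out)

print(rle_decode(bytes([3, 65, 2, 66])))  # b'AAABB'
```

An interpreter pays per-iteration dispatch and boxing costs in a loop like this, while a tracing JIT can compile it down to a few machine instructions, which is consistent with the order-of-magnitude spread reported.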

~~~
Jcol1
can you try it in julia?

~~~
bakery2k
I actually wrote a Julia version at the time, but it was difficult to
accurately benchmark because the total runtime was significantly affected by a
long (~5 seconds) delay at startup.

Recently, however, I came across
[https://www.reddit.com/r/Julia/comments/4c09m1/](https://www.reddit.com/r/Julia/comments/4c09m1/),
which suggests running with `--precompiled=yes` to reduce startup time. I have
now run the Julia version with that command-line option, and the runtime is
impressive - comparable to C#/Go/V8.

~~~
tavert
Most Julia users run code in an interactive REPL session more often than as a
script that does a single thing. The precompiled-code loading is now on
by default on Windows in Julia 0.5 since the backtrace issue was fixed, but
there's still a 0.3-0.5 second delay on startup at the moment.

------
vegabook
I can't help thinking that we need a new scientific programming language that
is properly vectorised and directly targets GPUs, as opposed to all these
hacks that unroll loops. Now that a large proportion of data science and
scientific computing is GPU-centric, aren't there new languages designed with
these parallel architectures _foremost_ in mind? Doesn't functional
programming map very well to this?

It almost seems to me that just as we're about to get C speed in Python via
JIT tech, it'll almost immediately be left behind because the GPU is where
it's at.

~~~
Athas
One problem is that functional programming style is often focused on recursion
- not just in its functions, but also its data structures. Trees, linked lists
and graphs are GPU poison, while arrays map very well to the hardware.
However, arrays don't have the pleasant recursive structure that allows you to
encode nifty properties (and do pattern matching) the way that is popular in
functional programming.

However, array programming is basically functional programming, and _does_ map
very well to parallel execution in general. While most array programming is
nowadays of the fairly simple form you see in libraries like Numpy, older
languages like APL show how far you can go. It's a somewhat alien style of
programming to most modern programmers, though.
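A small illustration of the contrast, using NumPy as a stand-in for the array style:

```python
import numpy as np

def loop_sum_of_squares(a):
    # Element-at-a-time style: a sequential dependency chain that is
    # hard to map onto vector units or GPUs.
    total = 0.0
    for x in a:
        total += x * x
    return total

def array_sum_of_squares(a):
    # Whole-array style: one dot product the runtime is free to hand
    # to vectorized (or, in other systems, GPU) kernels.
    return float(np.dot(a, a))

xs = np.arange(10, dtype=np.float64)
print(loop_sum_of_squares(xs), array_sum_of_squares(xs))  # 285.0 285.0
```

Both compute the same result, but only the second expresses the computation as a whole-array operation with no ordering constraint between elements.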

My rule of thumb: parallel programming is hard. Functional programming makes
it trivial to make parallel programs that are correct, but they still might
not be fast.

~~~
vegabook
Is Rust the right way forward then? Functional enough without obsessing about
linked-memory structures / recursion etc?

~~~
Athas
I don't think so. Rust has excellent support for concurrency, but does not
have any interesting parallel constructs. Which means that while it can
probably be fine for coarse-grained multicore parallelism, it does not provide
the fine-grained parallelism you need on something like a GPU.

~~~
steveklabnik
Not built in, but there are libraries, like rayon or crossbeam.

------
faragon
Why not a Python-to-machine-code compiler like Cython, but much faster and
with all Python 3 capabilities (e.g. like GNU gcj for Java)?

A 30x speed-up with compiled code using optimized data structures (e.g.
pointer compression for better data-cache usage) could be good enough, even
without reaching the performance of the best JIT implementations.

------
robohamburger
The best way forward seems to be to continue to fund these existing JITs.
Maybe once one is ready it can replace cpython.

Also, PyPy not supporting the latest Python 3.5 is kind of a big deal for
adoption. I would rather have 3.5 support than NumPy/CPython compat, but I
don't do scientific computing :)

~~~
ambivalence
You ask for it, you got it!

[https://morepypy.blogspot.com/2016/08/pypy-gets-funding-from-mozilla-for.html](https://morepypy.blogspot.com/2016/08/pypy-gets-funding-from-mozilla-for.html)

~~~
robohamburger
That is awesome! Python 3.5 all the things :)

------
ksec
Compare this to Ruby, which is NOT getting much love from the compiler / JIT
community.

While one can argue Ruby gets the best, leading-edge JIT from JRuby / Graal /
Truffle, I don't think it is used much in production. It seems the majority of
Python and Ruby users have stuck with the default CPython and CRuby runtimes;
both PyPy and JRuby seem to be rather small in usage.

Why may that be?

------
ChickeNES
> LLVM or libraries from Microsoft and IBM can be used to ease the building of
> JITs.

Which libraries are they talking about? I tried searching, but the only
mentions of JITs I found on Google were IBM's JVM and Microsoft's CLR.

~~~
mastax
Oracle has GraalVM

[http://www.oracle.com/technetwork/oracle-labs/program-languages/overview/index-2301583.html](http://www.oracle.com/technetwork/oracle-labs/program-languages/overview/index-2301583.html)

------
Animats
Google tried to speed up CPython with a JIT, and failed. Remember Unladen
Swallow?

~~~
ragebol
IIRC, Unladen Swallow aimed to remove the GIL, which is a different can of
worms to open.

~~~
brettcannon
Actually, Unladen Swallow's aim was simply to be fast. :) They attempted to do
that by using LLVM's JIT, but ran into a lot of bugs in LLVM and had to spend
their time fixing those rather than working on Python itself. There was no
specific work to deal with the GIL.

------
dman
Are any of the JIT strategies able to create something that can be invoked as
a raw function pointer from C/C++ without any dependency on the GIL?

~~~
corysama
Not Python, but you might like [http://terralang.org/](http://terralang.org/)

------
tripzilch
> Jupyter (formerly IPython) is "an overgrown REPL"

They say that like it's a _bad_ thing.

------
denfromufa
How about adding a JIT and removing the GIL at the same time?

~~~
gtirloni
Interesting talk to answer part of that question:
[https://www.youtube.com/watch?v=P3AyI_u66Bw](https://www.youtube.com/watch?v=P3AyI_u66Bw)

