
Let's Write an LLVM Specializer for Python - tenslisi
http://dev.stephendiehl.com/numpile/
======
Erwin
Interesting. I currently use Python's AST to convert some nested logical query
expression (in a syntax unique to my application) into bytecode executed by a
specialized VM (I originally tried using V8 and LuaJIT for this, but
performance-wise that was unsuccessful; the project replaced some old
Boost::Python C++ code). This article should make it easy to get started
attempting an LLVM replacement.
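
For readers who have not seen the AST-to-bytecode pattern, here is a minimal sketch of the idea. It is not the poster's code: the instruction names (LOAD, NOT, AND, OR), the `compile_query` and `run` helpers, and the use of plain Python boolean syntax as the query language are all assumptions made for illustration.

```python
import ast

def compile_query(source):
    """Compile a boolean expression into a flat list of stack instructions."""
    tree = ast.parse(source, mode="eval")
    code = []

    def emit(node):
        if isinstance(node, ast.BoolOp):
            op = "AND" if isinstance(node.op, ast.And) else "OR"
            emit(node.values[0])
            for value in node.values[1:]:
                emit(value)
                code.append((op,))
        elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            emit(node.operand)
            code.append(("NOT",))
        elif isinstance(node, ast.Name):
            code.append(("LOAD", node.id))
        else:
            raise SyntaxError("unsupported node: " + type(node).__name__)

    emit(tree.body)
    return code

def run(code, env):
    """Tiny stack VM for the instructions above (no short-circuiting)."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "LOAD":
            stack.append(bool(env[instr[1]]))
        elif op == "NOT":
            stack.append(not stack.pop())
        else:
            right, left = stack.pop(), stack.pop()
            stack.append((left and right) if op == "AND" else (left or right))
    return stack.pop()

program = compile_query("a and not b")
result = run(program, {"a": True, "b": False})
```

An LLVM backend would replace `run` with code generation over the same instruction list, which is roughly where the article picks up.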

~~~
travisoliphant
Yes, LLVM is a great approach for doing code-gen from arbitrary specialized
VMs. And the Python interface to it makes it easy to experiment. We no longer
use llvmpy for Numba (we use a simpler interface, llvmlite), so llvmpy could
use a maintainer.

------
walkamages
An excellent article! I had wanted to get back into some Python recently after
seeing the changes in 3.4, and I had also wanted to become more familiar with
LLVM. This does both.

~~~
gtirloni
There is this discussion (flame war?) about Python 3 not bringing many
benefits. I haven't made up my mind yet. Could you elaborate on what you saw in
3.4 that was nice?

~~~
exDM69
Simply put: it's a better language. The whole "discussion" is whether it makes
sense to migrate since many parts of the ecosystem (many important libraries
and frameworks) have not made the transition. Many distros also ship Python 2
by default with Python 3 as an optional install, so going Python 3 only is not
feasible.

To me, the killer feature is better lazy evaluation (generators). In
particular, important builtins like map, filter, zip, enumerate, etc. are
lazy (generator-like) instead of returning lists. This makes it feasible to
write things like

    
    
        (process(line) for line in map(str.upper, open('giantfile.txt')) if not line.lstrip().startswith('#'))
    

Some of the above can also be done with itertools package in Python 2, but not
everything.
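
A small runnable sketch of that kind of lazy pipeline on Python 3 follows. The `process` function and the in-memory `io.StringIO` source are stand-ins invented for the example (the original reads a large file), and `startswith` is used for the comment check so that blank lines do not raise an IndexError.

```python
import io

def process(line):
    # Hypothetical stand-in for real per-line work.
    return len(line)

def non_comment_lengths(lines):
    # In Python 3, map() returns a lazy iterator: no intermediate list is built.
    upper = map(str.upper, lines)
    return (process(line) for line in upper
            if not line.lstrip().startswith("#"))

source = io.StringIO("# header\nalpha\n\n  # indented comment\nbeta\n")
pipeline = non_comment_lengths(source)

first = next(pipeline)   # pulls only as many lines as needed for one result
rest = list(pipeline)    # drains the remainder
```

In Python 2 the same `map` call would read and upper-case the whole file into a list before the generator expression saw its first line.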

Python 3.4 changelog is here, it contains e.g. asynchronous io facilities
(asyncio module):
[https://www.python.org/downloads/release/python-342/](https://www.python.org/downloads/release/python-342/)

edit: added enumerate() to the example above, since for line in open(filename)
is already lazy in Python 2.x too.

edit2: enumerate is lazy in Python 2 as well, so I replaced it with
map(str.upper).

~~~
halflings
The example you gave works perfectly in Python 2.7 (it would also be a
generator, and you're not using map, filter, or the like); but I agree: those
should've been generators from day 1, especially zip and enumerate, since they
make for more elegant code but often come with a performance overhead in
Python 2.7.

~~~
maxerickson
Python 2 enumerate returns an 'enumerate' object that is more or less a
lightweight wrapper of the sequence that was passed in.

Generators provide a convenient syntax to implement that sort of object.
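
As a rough illustration of that point, a generator function can reproduce the behavior of the built-in in a few lines. The name `my_enumerate` is made up for this sketch; the real `enumerate` is implemented in C.

```python
def my_enumerate(iterable, start=0):
    """Yield (index, item) pairs lazily, like the built-in enumerate."""
    index = start
    for item in iterable:
        yield index, item
        index += 1

pairs = list(my_enumerate("ab", start=1))
```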

~~~
exDM69
D'oh. I put in a map(), which returns a list in Python 2.

------
travisoliphant
This is a great tutorial about first-generation Numba. The author learned a
lot about LLVM and llvmpy while working with several of our devs. If you are
interested in the "Further work" in his article, come join the Numba project.

~~~
tadlan
What is second-generation Numba? And are there any plans to branch Numba out
beyond purely numeric applications?

------
jonstewart
I really appreciate the length and detail in this blog post. It's
comprehensive, not just showing off.

~~~
ch0wn
Stephen Diehl continues to blow my mind on a regular basis. His latest work in
progress "Write You a Haskell"[0] is also worth keeping an eye on. I've worked
through the first couple of chapters and it's fantastic.

[0] [http://dev.stephendiehl.com/fun/](http://dev.stephendiehl.com/fun/)

------
illumen
Very nice article! :)

Storing types via traces could be another way of gathering types, as could
using the more advanced static type checking code that is around for Python.
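
One way to read "storing types via traces" is to record the argument types a function actually sees at run time and specialize on those. The sketch below assumes that reading; the `trace_types` decorator and the `observed_signatures` table are hypothetical names, and only positional arguments are handled.

```python
import functools
from collections import defaultdict

# Maps function name -> set of argument-type tuples seen so far.
observed_signatures = defaultdict(set)

def trace_types(func):
    """Record the types of positional arguments on every call."""
    @functools.wraps(func)
    def wrapper(*args):
        signature = tuple(type(arg).__name__ for arg in args)
        observed_signatures[func.__name__].add(signature)
        return func(*args)
    return wrapper

@trace_types
def add(a, b):
    return a + b

add(1, 2)
add(1.5, 2.5)
add(1, 2)   # repeated signature, recorded only once
```

A specializer could then compile one native version per recorded signature instead of inferring types from the source alone.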

Now I have something to work through on the weekend. Looking forward to part
2!

~~~
travisoliphant
The numba code-base implements quite a bit of this. We actually moved away
from the AST approach and went back to the byte-code approach because the AST
approach quickly becomes unwieldy as the number of Visitors that you apply
grows. Compile times are also slower.
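
To give a feel for what the byte-code approach starts from, the standard `dis` module exposes a function's instructions as a flat sequence that a single loop can walk. This is only a sketch of the starting point, not Numba's actual pipeline, and the exact opcode names differ between CPython versions.

```python
import dis

def add(a, b):
    return a + b

def opnames(func):
    """Return the opcode names of a function's bytecode, in order."""
    return [instr.opname for instr in dis.get_instructions(func)]

names = opnames(add)
```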

The author is definitely helping people learn about LLVM and how it can be
used with Python --- which is great, because this is exactly what Numba is:
[http://numba.pydata.org](http://numba.pydata.org). But, please don't start
another "Numba". Just come help us improve the current one.

------
wedesoft
I did something similar in Ruby but using GCC as a "JIT" compiler for image
processing (software [1], thesis [2]). I can really recommend JIT compilation
for doing array processing.

[1] [http://www.wedesoft.de/hornetseye-api/](http://www.wedesoft.de/hornetseye-api/)

[2] [http://www.wedesoft.de/downloads/thesis_wedekind.pdf](http://www.wedesoft.de/downloads/thesis_wedekind.pdf)

EDIT: In my approach I didn't go through the Ruby AST though. Rather I used
the approach of injecting "GCCVariables" which emit C code instead of doing
the actual computation.
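
The injection idea can be sketched in Python (the original is Ruby) with operator overloading: call the ordinary function with placeholder objects whose operators build C source text instead of computing values. The `CExpr` class and `emit_c` helper are hypothetical stand-ins for the "GCCVariables" mentioned above, not the HornetsEye implementation; only `+` and `*` with the placeholder on the left are supported.

```python
class CExpr:
    """Placeholder value whose operators build C source text."""
    def __init__(self, code):
        self.code = code

    def _binary(self, op, other):
        other_code = other.code if isinstance(other, CExpr) else repr(other)
        return CExpr("(%s %s %s)" % (self.code, op, other_code))

    def __add__(self, other):
        return self._binary("+", other)

    def __mul__(self, other):
        return self._binary("*", other)

def saxpy(a, x, y):
    # Ordinary code: works on numbers and on CExpr placeholders alike.
    return a * x + y

def emit_c(func, name, params):
    """Run func on placeholders and wrap the resulting expression in a C function."""
    body = func(*[CExpr(p) for p in params])
    args = ", ".join("double " + p for p in params)
    return "double %s(%s) { return %s; }" % (name, args, body.code)

c_source = emit_c(saxpy, "saxpy", ["a", "x", "y"])
```

The generated string could then be handed to a C compiler and loaded back in, which is the "JIT" step described above.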

~~~
chrisseaton
You should submit your thesis to the Ruby Bibliography
[http://rubybib.org](http://rubybib.org)

~~~
wedesoft
Ok, will do. Cheers :)

------
ericfrederich
I love Python, but when I use a statically typed language like C, Rust, Go, etc.,
I really feel that static typing is missing from Python.

I'd love to see a new language exactly like Python but compiled and statically
typed. Something similar to Cython, but rather than generating a bunch of C
code it would target LLVM. Additionally it would be able to generate pure
Python code simply by removing any typing syntax.
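
The "remove the typing syntax" step can be approximated today with Python 3 annotations standing in for the hypothetical language's type syntax. The sketch below assumes Python 3.9+ for `ast.unparse`; the `StripAnnotations` class is a made-up name, and async functions are not handled.

```python
import ast

class StripAnnotations(ast.NodeTransformer):
    """Remove parameter, return, and variable annotations from a module."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)   # visits arguments and body first
        node.returns = None
        return node

    def visit_arg(self, node):
        node.annotation = None
        return node

    def visit_AnnAssign(self, node):
        if node.value is None:
            # A bare declaration such as `x: int` becomes `pass`
            # so that the enclosing body never ends up empty.
            return ast.Pass()
        plain = ast.Assign(targets=[node.target], value=node.value,
                           type_comment=None)
        return ast.copy_location(plain, node)

typed_source = """
def scale(x: float, factor: int = 2) -> float:
    result: float = x * factor
    return result
"""

tree = StripAnnotations().visit(ast.parse(typed_source))
ast.fix_missing_locations(tree)
plain_source = ast.unparse(tree)
```

A compiler for the typed dialect would consume `typed_source`, while `plain_source` remains runnable on a stock interpreter.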

------
cyberneticcook
Could this be used to ahead-of-time compile Python code? I'm more interested
in getting to a native executable or library.

