
JIT compiling a subset of Python to x86-64 - ksaua
https://csl.name/post/python-compiler/
======
benhoyt
Oh, very interesting! This is somewhat similar to a hack project I did
recently, but more dynamic -- going from bytecode and doing the assembly at
runtime amps things up a bit! Mine works from a Python AST and is much more
static, but it also supports quite a lot more Python syntax.
[http://benhoyt.com/writings/pyast64/](http://benhoyt.com/writings/pyast64/)

~~~
csl
Indeed, I had already linked to your really cool project at the end. Perhaps I
should have made it stand out more. The AST approach is more stable: the
Python bytecode can change between any two releases, while the AST evolves at
a slower pace.

------
sandGorgon
The future of this approach is the Java 9 Truffle + Graal language compiler.

Already, a substantial amount of work has gone into making Ruby, R, Node and
Python work.

This is the Python implementation -
[https://github.com/securesystemslab/zippy](https://github.com/securesystemslab/zippy)

[http://chrisseaton.com/rubytruffle/jokerconf17/](http://chrisseaton.com/rubytruffle/jokerconf17/)

~~~
csl
Truffle + Graal looks awesome. I looked at some old ZipPy slides (JIT compiler
for Python) here:
[http://socalpls.github.io/archive/2013nov/slides/zippy.pdf](http://socalpls.github.io/archive/2013nov/slides/zippy.pdf)

They have an example where this code is compiled:

    
    
        def sumitup(n):
            total = 0
            for i in range(n):
                total = total + i
            return total
    

It was optimized quite well, but still had loops. I know this is a lot to ask,
but I would have expected it to be possible to specialize it to a loopless
variant:

    
    
        def sumitup(n):
            if n < 0:
                return 0
            else:
                return n*(n-1) // 2
    

Clang 4+ actually finds this optimization, but gcc and icc don't seem to:
[https://godbolt.org/g/v4zhrm](https://godbolt.org/g/v4zhrm)

That is, the assembly generated by clang seems to be equivalent to

    
    
        if n <= 0:
            return 0
        else:
            return ((n-1)*(n-2) >> 1) + (n-1)
    

which might even run faster on the CPU, due to the specific code emitted.
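For what it's worth, the equivalence of the three variants is easy to sanity-check from plain Python (my own check, not from the slides or the godbolt link):

```python
def sumitup(n):
    # The original loop from the ZipPy slides.
    total = 0
    for i in range(n):
        total = total + i
    return total

def sumitup_closed(n):
    # The loopless specialization: sum of 0..n-1 is n*(n-1)//2.
    if n < 0:
        return 0
    return n * (n - 1) // 2

def sumitup_clang(n):
    # The shape of the code clang appears to emit.
    if n <= 0:
        return 0
    return ((n - 1) * (n - 2) >> 1) + (n - 1)

# All three agree on negative, zero, and positive inputs.
for n in range(-5, 200):
    assert sumitup(n) == sumitup_closed(n) == sumitup_clang(n)
```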

~~~
sandGorgon
You should file this as an issue against Graal VM directly.
[https://github.com/graalvm/graal](https://github.com/graalvm/graal)

------
wallnuss
The blog is a fun introduction to JIT compilers, but the big thing that is
missing, and that makes Python a hard problem for a JIT, is supporting
objects. From
[http://blog.kevmod.com/2017/02/personal-thoughts-about-pystons-outcome/](http://blog.kevmod.com/2017/02/personal-thoughts-about-pystons-outcome/)

    
    
        I'll try to just talk about what makes Python hard to run
        quickly (especially as compared to less-dynamic languages like
        JS or Lua).

        The thing I wish people understood about Python performance is
        that the difficulties come from Python's extremely rich object
        model, not from anything about its dynamic scopes or dynamic
        types.  The problem is that every operation in Python will
        typically have multiple points at which the user can override
        the behavior, and these features are used, often very
        extensively.  Some examples are inspecting the locals of a
        frame after the frame has exited, mutating functions in-place,
        or even something as banal as overriding isinstance.  These are
        all things that we had to support, and are used enough that we
        have to support efficiently, and don't have analogs in
        less-dynamic languages like JS or Lua.
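To make that last point concrete (my own example, not from Kevin's post): overriding isinstance only takes a metaclass that defines `__instancecheck__`, so a JIT can't assume an isinstance call is a cheap type check — it may run arbitrary user code:

```python
class AlwaysMatches(type):
    # Defining __instancecheck__ on a metaclass overrides isinstance()
    # for every class that uses this metaclass.
    def __instancecheck__(cls, obj):
        return True

class Anything(metaclass=AlwaysMatches):
    pass

print(isinstance(42, Anything))    # True: user code decided the result
print(isinstance("hi", Anything))  # True
```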

~~~
nerpderp83
Newer versions of Python have an in-band way of detecting whether an object's
behavior has been overridden. Foreseeably, this could have been done entirely
inside the VM. Defaulting to closed, and requiring an explicit step to
override behavior, would have made optimizing Python far easier.
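The usual way a VM exploits the "usually not overridden" case is a guarded fast path: cache the result of a lookup, and re-validate a cheap guard on every use, falling back to the full lookup when the guard fails. Here is a minimal Python-level sketch of a monomorphic inline cache to illustrate the idea (my own illustration — CPython's actual machinery uses internal version tags on type dicts, not this):

```python
class InlineCache:
    """A monomorphic inline cache: a guarded fast path for method
    lookup that falls back to a full lookup when the guard fails."""

    def __init__(self, name):
        self.name = name
        self.cached_type = None
        self.cached_method = None

    def call(self, obj, *args):
        tp = type(obj)
        if tp is not self.cached_type:
            # Guard failed: do the full (slow) lookup and refill the
            # cache.  A real VM also has to invalidate when the type is
            # mutated in place, e.g. via a version tag on its dict.
            self.cached_type = tp
            self.cached_method = getattr(tp, self.name)
        return self.cached_method(obj, *args)

upper = InlineCache("upper")
print(upper.call("hello"))  # HELLO (slow path, fills the cache)
print(upper.call("world"))  # WORLD (fast path, guard hits)
```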

------
smortaz
Related:
[https://github.com/Microsoft/Pyjion](https://github.com/Microsoft/Pyjion)

~~~
int_19h
Also:
[https://www.python.org/dev/peps/pep-0523/](https://www.python.org/dev/peps/pep-0523/)

~~~
UncleEntity
A couple of years ago I was toying around with libjit in Python: wrote some
bindings, tracked down (most of) the segfaults, and then got distracted by
other things. I was going to dust it off to try to hook into PEP 523, but
totally forgot about it until you posted that link.

[https://github.com/eponymous/libjit-python](https://github.com/eponymous/libjit-python)

~~~
csl
What is your impression of libjit? It looks really nice. I've only tried GNU
Lightning, and I made Python bindings for it (although today I would use cffi
to interface with it instead of my ctypes approach, because of the header
files):
[https://github.com/cslarsen/lyn](https://github.com/cslarsen/lyn)

I'd love to see a comparison of the various JIT libraries. I guess both give
you optimizations and register allocation for free. Of course, the JITting
speed is quite important as well.

~~~
UncleEntity
It seemed pretty solid, though I didn't do anything too complicated with it.

Mostly, messing around with it taught me that I had a _lot_ to learn about
compiler construction, which is what I've been slowly doing since then.

------
nerdponx
This was a great intro to how just-in-time compilation works, thank you!

------
denfromufa
[https://github.com/Maratyszcza/PeachPy](https://github.com/Maratyszcza/PeachPy)

------
jjawssd
My favorite Python JIT is Numba.
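For context, Numba compiles numeric functions like the sumitup example from the thread with a single decorator. A minimal sketch (assuming numba is installed — `numba.njit` is its no-Python-object compilation mode; the code falls back to the plain interpreted version otherwise):

```python
def sumitup(n):
    # The same numeric loop as the ZipPy example above.
    total = 0
    for i in range(n):
        total += i
    return total

try:
    from numba import njit
    sumitup = njit(sumitup)  # compiled to machine code on the first call
except ImportError:
    pass  # numba not installed: keep the interpreted version

print(sumitup(1_000_000))  # 499999500000
```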

