JIT compiling a subset of Python to x86-64 (csl.name)
128 points by ksaua on Nov 21, 2017 | 18 comments



Oh, very interesting! This is somewhat similar to a hack project I did recently, but more dynamic -- going from bytecode and doing the assembly at runtime amps things up a bit! Mine works from a Python AST and is much more static, but it also supports quite a lot more Python syntax. http://benhoyt.com/writings/pyast64/


Indeed, I had already linked to your really cool project at the end; perhaps I should have made it stand out more. The AST approach is more stable, since the Python bytecode can change between any release, while the AST evolves at a much slower pace.
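
To make the contrast concrete, here's a quick stdlib-only sketch (nothing from either project, just `ast` and `dis`) showing both representations of the same function. The AST node names below have been stable across recent releases, while the opcodes `dis` prints vary by CPython version:

    import ast
    import dis

    source = "def add(a, b): return a + b"

    # AST view: node names like FunctionDef and BinOp change at a slow,
    # deliberate pace across Python releases.
    tree = ast.parse(source)
    print(ast.dump(tree.body[0]))

    # Bytecode view: the opcodes printed here (BINARY_ADD vs. BINARY_OP, ...)
    # have changed between CPython versions, so tools built on them are
    # version-bound.
    namespace = {}
    exec(compile(tree, "<example>", "exec"), namespace)
    dis.dis(namespace["add"])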


I tried writing a bytecode optimizer a few months ago and actually got some basic inlining, constant propagation, and other such optimizations working, but then realized it was going to be fairly useless, since bytecode changes between Python versions and isn't necessarily even consistent across different Python implementations. So I abandoned that project and started working on an AST-based optimizer. Sadly, I actually deleted the bytecode project (which is otherwise not something I normally do) as a way to stop myself from sinking more time into it, but that means I now have to mentally reconstruct some of the techniques I came up with if I want the AST-based version to go anywhere...
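
For the curious, the skeleton of an AST-based constant folder is pleasantly small thanks to `ast.NodeTransformer`. A minimal sketch, handling only binary operations on literal operands (nowhere near the full optimizer described above):

    import ast

    class ConstantFolder(ast.NodeTransformer):
        """Fold BinOp nodes whose operands are numeric literals."""

        def visit_BinOp(self, node):
            self.generic_visit(node)  # fold children first, bottom-up
            if (isinstance(node.left, ast.Constant)
                    and isinstance(node.right, ast.Constant)):
                try:
                    # Re-compile just this sub-expression and evaluate it.
                    expr = ast.fix_missing_locations(ast.Expression(node))
                    value = eval(compile(expr, "<fold>", "eval"))
                except Exception:
                    return node  # e.g. division by zero: leave it alone
                return ast.copy_location(ast.Constant(value), node)
            return node

    tree = ast.parse("x = 2 * 3 + 4")
    tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
    print(ast.dump(tree.body[0].value))  # the whole expression folds away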


The future of this approach is the Java 9 Truffle + Graal language framework.

A substantial amount of work has already gone into making Ruby, R, Node and Python work on it.

This is the Python implementation - https://github.com/securesystemslab/zippy

http://chrisseaton.com/rubytruffle/jokerconf17/


Truffle + Graal looks awesome. I looked at some old ZipPy slides (JIT compiler for Python) here: http://socalpls.github.io/archive/2013nov/slides/zippy.pdf

They have an example where this code is compiled:

    def sumitup(n):
        total = 0
        for i in range(n):
            total = total + i
        return total
It was optimized quite well, but still had loops. I know this is a lot to ask, but I would have expected it to be possible to specialize it to a loopless variant:

    def sumitup(n):
        if n < 0:
            return 0
        else:
            return n*(n-1) // 2
Clang 4+ actually finds this optimization, but gcc and icc don't seem to: https://godbolt.org/g/v4zhrm

That is, the assembly generated by clang seems to be equivalent to

    if n <= 0:
        return 0
    else:
        return ((n-1)*(n-2) >> 1) + (n-1)
which might even run faster on the CPU, due to the specific code emitted.
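
For anyone wanting to convince themselves that the two closed forms really agree with the loop, a quick brute-force check (the function names here are just made up to mirror the snippets above):

    def sumitup(n):
        total = 0
        for i in range(n):
            total = total + i
        return total

    def closed_form(n):
        # The hand-derived specialization: sum of 0..n-1 is n*(n-1)//2.
        return 0 if n < 0 else n * (n - 1) // 2

    def clang_form(n):
        # What clang's output appears to compute, translated back to Python.
        return 0 if n <= 0 else ((n - 1) * (n - 2) >> 1) + (n - 1)

    for n in range(-5, 1000):
        assert sumitup(n) == closed_form(n) == clang_form(n), n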


You should file this on the GraalVM repo directly: https://github.com/graalvm/graal


About ZipPy: the license says:

Copyright (c) Regents of the University of California and individual contributors.

On the wiki home page, we have:

Authors: Wei Zhang, Facebook, Inc.; Mohaned Qunaibit, University of California, Irvine

So basically, this is half owned by Facebook... right?


The blog post is a fun introduction to JIT compilers, but the big thing that is missing, and that makes Python a hard target for a JIT, is supporting objects. From http://blog.kevmod.com/2017/02/personal-thoughts-about-pysto...

    I'll try to just talk about what makes Python hard to run quickly (especially as compared to less-dynamic languages like JS or Lua).

    The thing I wish people understood about Python performance is that the difficulties come from Python's extremely rich object model, not from anything about its dynamic scopes or dynamic types. The problem is that every operation in Python will typically have multiple points at which the user can override the behavior, and these features are used, often very extensively. Some examples are inspecting the locals of a frame after the frame has exited, mutating functions in-place, or even something as banal as overriding isinstance. These are all things that we had to support, and are used enough that we have to support efficiently, and don't have analogs in less-dynamic languages like JS or Lua.
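
Two of the examples in that quote are easy to demonstrate, and show why a JIT can't simply specialize on observed types. A minimal sketch (`Sneaky` and `leak_frame` are made-up names for illustration, and `sys._getframe` is CPython-specific):

    import sys

    # 1. Overriding isinstance: __instancecheck__ on a metaclass runs
    #    arbitrary user code on every isinstance() call.
    class AnythingGoes(type):
        def __instancecheck__(cls, obj):
            return True  # claim every object is an instance

    class Sneaky(metaclass=AnythingGoes):
        pass

    print(isinstance(42, Sneaky))       # True, despite 42 being an int
    print(isinstance("hello", Sneaky))  # also True

    # 2. Inspecting a frame's locals after the frame has exited.
    def leak_frame():
        secret = "still reachable"
        return sys._getframe()

    frame = leak_frame()
    print(frame.f_locals["secret"])     # the local outlives the call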


Newer versions of Python have an in-band way of detecting whether an object's behavior has been overridden. Foreseeably this could have been done entirely inside the VM. Defaulting to closed, and requiring an explicit step to override behavior, would have made optimizing Python far easier.




A couple of years ago I was toying around with libjit in Python: wrote some bindings, traced down (most of) the segfaults, and then got distracted by other things. I was going to dust it off to try to hook into PEP 523, but totally forgot about it until you posted that link.

https://github.com/eponymous/libjit-python


What is your impression of libjit? It looks really nice. I've only tried GNU Lightning, and I made Python bindings for it (although today I would use cffi to interface with it instead of my ctypes approach, because of the header files): https://github.com/cslarsen/lyn

I'd love to see a comparison of the various JIT libraries. I guess both give you optimizations and register allocation for free. Of course, the JITting speed is quite important as well.


It seemed pretty solid though I didn't do anything too complicated with it.

Mostly, messing around with it taught me that I had a lot to learn about compiler construction, which is what I've been slowly doing since then.


I had super high hopes for this because they have a solid PoC and are taking a really neat, pragmatic approach.

Sadly it looks like it's stalled :(


This was a great intro to how just-in-time compilation works, thank you!



My favorite Python JIT is Numba.



