Bitey: Import LLVM bitcode directly into Python (github.com)
177 points by mace on Aug 9, 2012 | 34 comments

I very much like the concept, but I dislike the "magic" feel of overloading "import", especially as it creates namespace collisions, e.g. when you have a Python module and an object file with the same name. IMHO something like "bitey.import_obj(name)" would have been nicer and clearer.

I wonder how complicated it would be to parse header files to populate the field names of structs automatically? Maintaining separate .pre.py and .h files seems like a recipe for trouble.

Neither do I, but to be fair to the author, this particular magic is encouraged in Python (see http://www.python.org/dev/peps/pep-0302).
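For anyone curious what that PEP 302 hook machinery looks like, here is a minimal sketch using the modern importlib interface. The names demo_bitcode and fake_add are invented for illustration; this is not bitey's actual implementation, just the general shape of an import hook:

```python
# A minimal PEP 302-style import hook: a finder on sys.meta_path that
# intercepts imports of a made-up "demo_bitcode" module and builds the
# module object itself, roughly the way bitey hooks bitcode files.
import sys
import importlib.abc
import importlib.machinery

class BitcodeFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path, target=None):
        if fullname != "demo_bitcode":
            return None  # let the normal import machinery handle it
        return importlib.machinery.ModuleSpec(fullname, self)

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # A real loader would parse the bitcode and wrap each function;
        # here we just install a stand-in.
        module.fake_add = lambda a, b: a + b

sys.meta_path.insert(0, BitcodeFinder())

import demo_bitcode
print(demo_bitcode.fake_add(2, 3))  # 5
```

The collision concern from the parent comment is visible here too: whichever finder sits earlier on sys.meta_path wins the name.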

To parse headers for field names would require a complete C preprocessor and parser. That wouldn't be a problem for this author (who wrote a very popular parser generator for Python), but it still wouldn't be perfect until it completely replicated the system compiler's behaviour with respect to system headers (consider conditional compilation). It is particularly annoying if the host and target systems are different, i.e. in cross compilation. I've tried this exact thing (header parsing to get type information) and it is quite a pain to get it right.
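A toy, pure-Python sketch of that pain point (a deliberately naive hand-rolled #ifdef handler, nothing like a real preprocessor): the same header yields different struct fields depending on which macros the build defines, so a header parser that ignores the build configuration gets the layout wrong.

```python
# Why header parsing must replicate the compiler's preprocessing:
# conditional compilation changes the struct layout.
HEADER = """
struct point {
    int x;
    int y;
#ifdef HAS_Z
    int z;
#endif
};
"""

def fields_after_cpp(header, defined):
    """Extremely naive #ifdef/#endif handling, for illustration only."""
    out, keep = [], [True]
    for line in header.splitlines():
        s = line.strip()
        if s.startswith("#ifdef"):
            keep.append(keep[-1] and s.split()[1] in defined)
        elif s.startswith("#endif"):
            keep.pop()
        elif keep[-1] and s and s[0].isalpha() and s.endswith(";"):
            out.append(s.rstrip(";").split()[-1])  # crude field-name grab
    return out

print(fields_after_cpp(HEADER, set()))      # ['x', 'y']
print(fields_after_cpp(HEADER, {"HAS_Z"}))  # ['x', 'y', 'z']
```

A struct binding generated from the wrong branch would silently misread every field after the divergence point, which is exactly the cross-compilation hazard described above.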

Maybe you could use LLVM/Clang to parse the headers. That way at least the parsing would be consistent with the compilation. I don't know how difficult it would be to access such type information, but I thought the whole point of LLVM/Clang was to be usable as a library.

The collision is already happening between .so and .py files, so it's not a new issue.

I'll be interested to see how this compares to numba and how easy it is to fit into the scientific Python ecosystem. I'm also curious about performance versus numba and my current go-to solution, Cython.

What about performance? Is there any benchmark comparing a function against pure Python and C implementations?

You can find one in the mandelbrot example.


It would be nice if it were compared to a pure C implementation too, to measure the overhead.

"I call the big one Bitey."

First thing that came to my mind :-)

Very cool! Can't wait to test it! Why exactly is there no C++ support?

It would be hard in the presence of parametric polymorphism (multiple functions having the same name but different numbers and types of parameters), as well as the difference between C++'s and Python's method resolution semantics (in Python, for instance, everything is virtual).

Also, templates.

Also, a bunch of other things.

That's not parametric polymorphism, that's function overloading. The former makes it possible to use the same function implementation on different types, whereas the latter is about using the same name for different functions.

Thanks for the reply. I just spent the last 3-4 days hacking around osgswig; a good (and simple) binding solution for C++/Python is still an open problem...

And obj-c? Also hard?

You'd have to shoehorn the obj-c runtime in there somehow. Almost definitely not worth the effort. What would you want to run in this way?

While the mapping between C and LLVM IR is rather straightforward, things are more complex with C++. C++ constructs get dismantled to be compiled to IR, and the binding generator would have to restore all the C++-ish information from metadata and type info, which isn't easy.

It seems cool, but why would you use it instead of a C library (or C object file) interfaced with ctypes or SWIG? Maybe I'm missing what LLVM brings? Thanks

In addition to cdavid's comment: LLVM IR is system-independent, so if you had a deployment over heterogeneous machines, you could write the extension code once and have it run everywhere without recompilation.

Fair enough, that was too strong a claim. But it is more portable than an extension module compiled with the system compiler.

It's not portable enough to let me take IR generated for my x86 laptop and run it on my arm board, in general, even though both platforms are ILP32, little-endian, and running the same OS.

Yes, but this is a problem with the source language and/or the host system libraries, not with LLVM IR itself. There is a broad domain of applications for which it is portable.
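To make the portability caveat concrete, here are two illustrative hand-written IR fragments (not verbatim clang output): target-specific facts are baked into the IR the moment clang emits it, even for two ILP32 little-endian targets.

```llvm
; The same C local "long double v;" lowers differently per target,
; and each module records its target explicitly:

; emitted for a 32-bit x86 Linux target:
target triple = "i386-unknown-linux-gnu"
%v = alloca x86_fp80      ; long double is the 80-bit x87 format

; emitted for a 32-bit ARM Linux target:
target triple = "armv7-unknown-linux-gnueabihf"
%v = alloca double        ; long double is plain 64-bit double here
```

Type sizes, the datalayout string, and ABI-specific lowering (struct passing, varargs) are all frozen at emission time, which is why IR for one target generally can't be retargeted to another.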

I was confused about this too, but I think the idea is that it's easier to infer type information from the bitcode than from raw object code.

More closely related to ctypes than SWIG.

SWIG requires glue code (.i files) to be written and generated, then compiled. ctypes can take a system-native library (.so or .dll) and access it directly.

bitey, on the other hand, uses platform-neutral LLVM bitcode. Imagine ctypes, but platform-neutral. That's bitey.
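The ctypes half of that comparison can be shown with nothing but the standard library; here it loads the system math library (find_library resolves the platform-specific filename) and calls its sqrt:

```python
# ctypes loads a platform-native shared library and calls into it;
# bitey's pitch is the same convenience, but from platform-neutral bitcode.
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))  # e.g. libm.so.6 on Linux

# Without declared types, ctypes would assume int arguments and returns.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(16.0))  # 4.0
```

Note the manual restype/argtypes declarations: the .so carries no type information, which is the gap bitey fills by reading types out of the bitcode instead.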

pretty darned cool.

Once you can import LLVM code at runtime, you are pretty close to injecting LLVM code at runtime: you write some LLVM IR in a string and "llvm_eval" it.

So can you use this with C functions compiled normally with gcc? It seems like that would be even more useful.

What difference would it make? To my understanding, Clang is fully compatible with GCC.

I guess I mean can you use this with C instead of LLVM?

Your terms are off: C is compiled using LLVM (Clang) into an intermediate form that is compatible across platforms (LLVM IR).

(AFAIK, please correct me if I'm wrong)

No. You have to use LLVM.

I could be completely wrong here, but doesn't this completely bypass the optimizer that makes C so fast?

LLVM performs its optimization passes on the LLVM bitcode itself (the "middle-end" of the compiler), before finally translating the optimized bitcode into machine-specific binary code.

Not an LLVM expert though, I could be glossing over a few details.

You're correct. LLVM includes a pass manager that performs extensive optimizations.
