Mapping Python to LLVM (exaloop.io)
121 points by arshajii on Jan 9, 2023 | 32 comments



A major type incompatibility not mentioned in the linked blog post is this, from [1]:

* Strings: Codon currently uses ASCII strings unlike Python's unicode strings.

Judas priest, after all the effin' grief we went through to learn how to handle Unicode strings in Python 3, and to finally begin to realize their value, you take this step backward? Forget the i64 limits, the lack of native Unicode strings is a flat deal-breaker. (For example, will Codon warn if it sees an "open(encoding='UTF8')" call? Or a normal open(mode='rt') if the default local encoding is UTF8?)
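To make the concern concrete, here is a small sketch of how the code-point view and the byte-oriented view of the same text differ. It shows only standard Python 3 behavior; the sample string is an arbitrary illustration, and how Codon handles the same input is not shown or assumed here.

```python
# Standard Python 3 behavior; the sample text is an arbitrary illustration.
# Escapes are used so the string is unambiguous: "naive cafe" with accents.
text = "na\u00efve caf\u00e9"

# Python 3 str counts and indexes by Unicode code point.
code_points = len(text)

# A byte-oriented (ASCII-style) view of the same text sees UTF-8 bytes instead.
utf8_bytes = text.encode("utf-8")
byte_count = len(utf8_bytes)

# Slicing code points keeps characters whole; slicing bytes can split one.
first_three_chars = text[:3]        # ends with the complete accented "i"
first_three_bytes = utf8_bytes[:3]  # ends with only half of that character

print(code_points, byte_count)
print(first_three_chars, first_three_bytes)
```

Any code that assumes one byte per character gets different lengths and slice boundaries as soon as the input leaves the ASCII range.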

It doesn't help that the same doc also mentions that

* Dictionaries: Codon's dictionary type is not sorted internally, unlike Python's

Current Python dicts are not "sorted"; rather they "preserve insertion order, meaning that keys will be produced in the same order they were added sequentially over the dictionary."[2]
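The difference between insertion order and sorted order can be sketched in a few lines of plain Python (the keys and values below are made up for illustration):

```python
# Insertion order (guaranteed since Python 3.7) is not the same as sorted order.
ages = {}
ages["zoe"] = 31
ages["adam"] = 27
ages["mia"] = 45

insertion_order = list(ages)  # keys in the order they were added
sorted_order = sorted(ages)   # keys in alphabetical order, computed on demand

print(insertion_order)  # ['zoe', 'adam', 'mia']
print(sorted_order)     # ['adam', 'mia', 'zoe']
```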

This ordering has been a language guarantee only since 3.7 (it was a CPython implementation detail in 3.6), so its lack would not inconvenience a lot of existing code. OTOH, why did they not plan to reproduce this useful feature from the start?

Possibly they were thinking of the pypi package SortedContainers[3]?

[1] https://docs.exaloop.io/codon/general/differences

[2] https://docs.python.org/3/reference/datamodel.html#index-30

[3] https://pypi.org/project/sortedcontainers/


Breaking compatibility with the current spec/functionality of Python should be a definite no-no for any implementation. That being said, I can still appreciate that they didn't try to write their own busted Unicode implementation, since many such implementations have contributed to security issues.


I don't understand why you say that. Like, it's gonna cost them in adoption to diverge, they don't need a lecture to understand that, they are doing what meets their needs and sharing it.


The implementation is optimised for genomics, that's why.


The type conversion assumptions here are really problematic. "64 bits ought to be enough for anybody"-style statements ignore integers as bitfields, large constants (e.g. Avogadro's number), any kind of math with large intermediate terms, all kinds of stuff.

Makes me very suspicious of the rest of this project when they try to glide past all of these issues with nary a mention.
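A rough sketch of the two cases mentioned above, a large constant and a large intermediate term. The `wrap_i64` helper is hypothetical and only emulates two's-complement wraparound of a signed 64-bit value in plain Python; it is not taken from Codon, and whether a given compiler wraps, traps, or promotes on overflow is a separate question.

```python
# Hypothetical helper: emulate two's-complement wraparound of a signed 64-bit integer.
def wrap_i64(value: int) -> int:
    value &= (1 << 64) - 1  # keep only the low 64 bits
    return value - (1 << 64) if value >= (1 << 63) else value

I64_MAX = (1 << 63) - 1  # 9_223_372_036_854_775_807

# Avogadro's number, 6.02214076e23, is an exact integer by SI definition.
avogadro = 602_214_076 * 10**15
fits_in_i64 = avogadro <= I64_MAX

# A large intermediate term: 21! no longer fits in 64 bits.
factorial_21 = 1
for n in range(1, 22):
    factorial_21 *= n  # exact with Python's arbitrary-precision int

wrapped = wrap_i64(factorial_21)

print(fits_in_i64)
print(factorial_21, wrapped)
```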


> There are many things we took for granted here, like how we determine the data types to begin with, or how we put the source code in a format that’s suitable for code generation. These, among other things, will be topics of future posts in this series. Stay tuned!

I don't feel they are trying to glide past anything. It's the first post in the series about a product in 0.x state, it's gotta start somewhere other than perfection and they seem to know that.


https://docs.exaloop.io/codon/general/differences

Looks like you can use bigger integers, and they're very explicit about it not being a drop-in replacement for Python.


I recall that Google had a project to compile Python to LLVM (Unladen Swallow @ https://code.google.com/archive/p/unladen-swallow/), but work stopped on it a long time ago.

If I recall it really wasn't that much faster than CPython given the overhead, but it's been a long time; if it was faster I assume it wouldn't have been abandoned.


I think the main difference is that this doesn’t purport to be a drop-in replacement. Also, they seem to be doing some multithreading.


Quite. Unladen Swallow was unfortunately a failure, in part because LLVM at the time was quite buggy, and in part because LLVM wasn't (isn't?) magic enough to speed up a dynamic language.

The blog post here mentions they do their own optimization passes, before handing over to LLVM. I imagine that's pretty important.


LLVM really wasn't that buggy at the time (circa 2009); the project I was using it for at the time, a .NET compiler that targeted video game consoles, was quite stable from a code generation point of view, and we were shipping games with it.


Ah, that's cool. Thanks for the correction. I was misremembering the Unladen Swallow retrospective[1]. It's fair to say they used a lot of their available time contributing to LLVM, but it sounds like that was feature work, not bug-focused.

[1] https://qinsb.blogspot.com/2011/03/unladen-swallow-retrospec...


Codon is very impressive, it feels a lot like Python without being slow like Python.

Don’t think of it as a Python compiler, it is its own language. (Esp. re choice of int == i64, this saves SO MUCH computation for the CPU.)

I will say though that I’m not sure where to use it yet, since it’s too immature for important projects and also aims at the “we need a nuclear bomb” level performance.


> How can I use Codon for production or commercial use?

> Please reach out to... to inquire about a production-use license.

Having "contact us" pricing with several incompatibilities makes this pretty hard to consider in a commercial environment. I wish they had a public pricing structure.

From their FAQ: https://docs.exaloop.io/codon/general/faq


I agree, I'm curious enough to try it but not gonna bother if I have to email someone just to get an idea of how much it costs to use in real code.


> int can become an LLVM i64

You can be efficient without sacrificing correctness; using LLVM shouldn’t mean “throw out arbitrary-precision semantics”.


I don't know how Codon does this. But I always supposed that existing optimized Pythons like PyPy map integer operations to native types and promote them to arbitrary precision when they encounter overflow. It's IMO a similar problem to "but what if someone decided to overwrite int.__add__ with some other function?" - arguably these are weird/bad things to do but AFAIK permitted by the language semantics. So to fix problems like these you just make it work for the paranoid case and implement optimizations that rely on that not being common. When the weird behavior is detected you fall back to the slower path.
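The fast-path/slow-path idea described above can be sketched roughly as follows. This is only an illustration of the control flow in plain Python; the function names and the `stats` counter are hypothetical, and it is not a description of how PyPy or Codon actually implement integer arithmetic.

```python
I64_MIN, I64_MAX = -(1 << 63), (1 << 63) - 1

# Counts how often each path is taken, purely for illustration.
stats = {"fast": 0, "slow": 0}

def add_i64_checked(a, b):
    """Stand-in for a machine add that also reports an overflow flag."""
    result = a + b
    overflowed = not (I64_MIN <= result <= I64_MAX)
    return result, overflowed

def add_bigint(a, b):
    """Stand-in for the slower arbitrary-precision routine."""
    return a + b

def add(a, b):
    # Optimistic case: the result fits in a native 64-bit integer.
    result, overflowed = add_i64_checked(a, b)
    if not overflowed:
        stats["fast"] += 1
        return result
    # Rare case: overflow detected, fall back to the general path.
    stats["slow"] += 1
    return add_bigint(a, b)

small = add(2, 3)
big = add(I64_MAX, 1)
print(small, big, stats)
```

The common case pays only for an overflow check, while the result stays correct in the rare case that needs more than 64 bits.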


How much does this differ from PyPy's RPython in terms of how the language is restricted?


While RPython is a restricted version of Python used to build the PyPy Python interpreters, the interpreters themselves are not restricted. Any deviation from CPython behavior, intended or not, is considered a bug. So Codon should be compared to the PyPy Python interpreter, not to RPython. The advantage of writing the interpreter in RPython rather than C (CPython), or pre-compiling Python code to LLVM IR and from there creating an executable (Codon), is that RPython comes with a metaJIT (which can generate a JIT) and a mark-and-sweep garbage collector for any interpreter built on top of it.


Kind of like Numba, isn't it?


See https://docs.exaloop.io/codon/general/faq

"While Codon does offer a JIT decorator similar to Numba's, Codon is in general an ahead-of-time compiler that compiles end-to-end programs to native code. It also supports compilation of a much broader set of Python constructs and libraries."


Super neat that it has an FFI as well


How good is Codon's performance compared to Cython and mypyc?


Very well written article. Delves into some details of things like exception handling semantics without going too far in the weeds. Thanks for sharing.


Have to assign variables to a bit of memory on the stack because SSA??


LLVM has an optimization pass that takes care of that.

https://llvm.org/docs/Passes.html#passes-mem2reg


That's not the only way to do that tho? Or is that the recommended way to do it?


You can write your code generator to produce the optimized output right away, of course. But the whole point of LLVM is to not have every compiler worry about doing stuff like this well.
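The approach discussed in this sub-thread can be sketched with a toy code generator that gives every variable a stack slot and emits a load or store for each use, leaving the promotion to SSA registers to LLVM. The statement format and the `emit_function` name are hypothetical, and the IR text assumes a recent LLVM that uses the opaque `ptr` type; this is a minimal illustration, not how Codon generates code.

```python
# Toy code generator: every local variable lives in a stack slot (alloca),
# so the front end never has to build SSA form or phi nodes itself.
# Statement format and function names are hypothetical, for illustration only.
def emit_function(name, statements, result_var):
    lines = [f"define i64 @{name}() {{", "entry:"]
    temp_count = 0

    def new_temp():
        nonlocal temp_count
        temp = f"%t{temp_count}"
        temp_count += 1
        return temp

    def load_operand(operand):
        # Integer literals are used directly; variables are loaded from their slot.
        if isinstance(operand, int):
            return str(operand)
        temp = new_temp()
        lines.append(f"  {temp} = load i64, ptr %{operand}")
        return temp

    # One stack slot per variable that is ever assigned.
    for var in dict.fromkeys(target for target, _, _ in statements):
        lines.append(f"  %{var} = alloca i64")

    # Each statement is (target, left, right), meaning: target = left + right.
    for target, left, right in statements:
        lhs = load_operand(left)
        rhs = load_operand(right)
        temp = new_temp()
        lines.append(f"  {temp} = add i64 {lhs}, {rhs}")
        lines.append(f"  store i64 {temp}, ptr %{target}")

    result = load_operand(result_var)
    lines.append(f"  ret i64 {result}")
    lines.append("}")
    return "\n".join(lines)

# Source program: x = 1 + 0; x = x + 2; return x
ir_text = emit_function("f", [("x", 1, 0), ("x", "x", 2)], "x")
print(ir_text)
```

Reassigning `x` needs no renaming or phi nodes in the generator, because each assignment is just another store to the same slot. Running a pass such as mem2reg over this kind of output (for example with LLVM's `opt` tool) is what turns the slots back into registers.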


How does Codon leverage the lessons learned from Unladen Swallow?


Looks like it does it by not being Python, but rather Python-ish.


The LinkedIn href is wrong on the exaloop.io website.


A very clean and beautiful blog design.



