
Show HN: Cannoli – A compiler for a subset of Python written in Rust - joncatanio
https://github.com/joncatanio/cannoli
======
joncatanio
I recently finished the code for my thesis and wanted to share with you all
:). The goal of the thesis was to evaluate language features of Python that
were hypothesized to cause performance issues. Quantifying the cost of these
features could be valuable to language designers moving forward. Some
interesting results were observed when implementing compiler optimizations for
Python. An average speedup of 51% was achieved across a number of benchmarks.
The thesis paper is linked on the GitHub repo; I encourage you to read it!

This was also my first experience with Rust. The Rust community is absolutely
fantastic and the documentation is great. I had very little trouble with the
"learning curve hell" that I hear associated with the language. It was
definitely a great choice for this work.

I also included PyPy in my validation section and "WOW". It blew both Cannoli
and CPython out of the water in performance. The work they're doing is very
interesting and it definitely showed on the benchmarks I worked with.

~~~
sandGorgon
I still don't understand why PyPy hasn't been adopted by Google or Dropbox
(the standard bearers of the Python ecosystem) as a forward looking
investment. It is constantly underfunded
([https://pypy.org/py3donate.html](https://pypy.org/py3donate.html)) and given
the potential for the work that's happening, I don't understand why these guys
don't write cheques for a few hundred k.

~~~
sanxiyn
I am hoping Facebook will fund PyPy, given that Instagram runs on Python.

It seems Google and Dropbox are not interested. Google is working on Grumpy,
Dropbox worked on Pyston.

~~~
stevekemp
It seems that Instagram is making tweaks here and there, as reported in this
week's LWN for example:

[https://lwn.net/SubscriberLink/754163/a38214c50e7b3ece/](https://lwn.net/SubscriberLink/754163/a38214c50e7b3ece/)

To be honest I barely use Python, but when I read LWN, etc., I get the
impression multiple people are trying to solve multiple problems with Python
(from the GIL onwards).

It seems like a hard problem, given the dynamic nature of the language and the
unwillingness to break the C API.

~~~
sandGorgon
Thanks for linking to that!

> _Thomas Wouters asked if he had looked at PyPy. Shapiro said the company
> had, but there was only a modest bump in performance for its workload. He
> was not the one who did the work, however. Wouters noted that PyPy is more
> than "Python with a JIT" because it has its own data model as well._

This is interesting. How much was a "modest bump" in performance? And why was
the bump in performance not a reason for adoption? Does PyPy break a lot of
stuff?

Oh and this

> _Some of what Shapiro presented did not sit well with Guido van Rossum, who
> loudly objected to Shapiro's tone, which was condescending, he said. Van
> Rossum thought that Shapiro did not really mean to be condescending, but
> that was how he came across and it was not appreciated. The presentation
> made it sound like Shapiro and his colleagues were the first to think about
> these issues and to recognize the inefficiencies, but that is not the case.
> Shapiro was momentarily flustered by the outburst and its vehemence, but got
> back on track fairly quickly._

> _Shapiro's overall point was that he felt Python sacrificed its performance
> for flexibility and generality, but the dynamic features are typically not
> used heavily in performance-sensitive production workloads. So he believes
> it makes sense to optimize for the common case at the expense of the less-
> common cases. But Shapiro may not be aware that the Python core developers
> have often preferred simpler, more understandable code that is easier to
> read and follow, over more complex algorithms and data structures in the
> interpreter. Some performance may well have been sacrificed for
> readability._

------
taoistextremist
This isn't a really important question, but why the name Cannoli? I feel like
you missed an opportunity here to call it "PycRust". (c standing for "compiled
to" of course)

~~~
jrs95
I think I read this as "Pike Rust" about 5 times before I got the joke. I
guess it's time for that 2PM coffee.

~~~
tothrowaway
Took me a minute too. "pie crust".

------
harrisreynolds
Nice work Jon! The cannoli logo is great!

Spun up a quick dashboard of the project here:
[https://chart.ly/github-dashboard/joncatanio/cannoli](https://chart.ly/github-dashboard/joncatanio/cannoli)

Not tons of revelations there, but cool to see your longest streak was 7 days
straight committing to the repo. Also cool to know this is part of your
thesis.

What are your plans after Cal Poly?

~~~
joncatanio
This is very cool! Thanks for doing that :)!

I'm actually moving out to NYC this July to work for Major League Baseball.
The Advanced Media division (MLBAM). I'll be doing some software engineering
there, mainly API work for various apps, I'm very excited about it!

I'll have to work on compilers in my free time haha, I really enjoyed the work
I did on this thesis.

------
alex_g
This is awesome! I definitely recommend reading through Jon's thesis (link on
GitHub). It's well written and very readable even if you know nothing of Rust
or compilers.

------
ufo
How was your experience using Rust as a target language (instead of C)? I
understand that Rust has lots of features for when you want to write code by
hand but do those also help when you are working with generated code? Or does
the borrow checker get in the way all the time?

~~~
joncatanio
Great question! The "Compiling Python" section of my thesis is pretty much an
explanation of how I had to translate elements of Python into Rust _because
of_ the borrow checker. There were a couple of tricks (like using closures for
functions) for getting around compile-time borrow checking. Some situations
required the use of Rc & RefCell to provide multiple references to mutable
data, which defers borrow checking to run time. So yes, the borrow checker got
in the way. But I didn't have to write a garbage collector because the
automatic memory management was handled via Rust's ownership rules (the caveat
here is with cyclical references which would need to be tracked, this work was
omitted for time).
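
For anyone unfamiliar with the pattern, here is a minimal sketch of what "deferring borrow checking to run time" with Rc & RefCell can look like. It is not taken from Cannoli's generated code: the `PyList` alias and both function names are hypothetical, chosen only to illustrate the idea under the assumption that a Python list is modeled as a shared, mutable `Vec`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for a Python list object (an assumption for this
// sketch, not Cannoli's actual value type).
type PyList = Rc<RefCell<Vec<i64>>>;

/// Mirrors `a = [1, 2]; b = a; b.append(3)` in Python: two handles, one list.
fn shared_append() -> Vec<i64> {
    let a: PyList = Rc::new(RefCell::new(vec![1, 2]));
    let b = Rc::clone(&a); // second owner of the same allocation
    b.borrow_mut().push(3); // mutate through `b` ...
    let snapshot = a.borrow().clone(); // ... and observe it through `a`
    snapshot
}

/// Returns true if a mutable borrow is refused while a shared borrow is live.
/// The compiler accepts this code; the conflict is only detected at run time.
fn conflicting_borrow_is_rejected() -> bool {
    let a: PyList = Rc::new(RefCell::new(vec![1, 2]));
    let reader = a.borrow(); // shared borrow is held here
    let rejected = a.try_borrow_mut().is_err(); // run-time check fails
    drop(reader); // release the shared borrow
    rejected
}

fn main() {
    println!("{:?}", shared_append());
    println!("{}", conflicting_borrow_is_rejected());
}
```

Using `borrow_mut()` instead of `try_borrow_mut()` in the second function would panic rather than return an error, which is the run-time cost of moving the check out of the compiler. The memory is freed when the last `Rc` handle is dropped, so no separate garbage collector is needed unless handles form a cycle.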

It does complicate the generated code, I don't know if Rust is the greatest
intermediate representation. But I do think it was a better choice than C.
Debugging the generated code was so great because of the detail that the Rust
compiler displays for warnings/errors.

I'd be interested in seeing how a Python interpreter written in Rust would
compare to CPython, this would probably make use of more Rust optimizations
(than trying to generate code).

~~~
ufo
Ah, I hadn't realized that Cannoli is also using Rust-style memory management.
In that case compiling to Rust would certainly help a lot.

------
tathougies
Interesting project... Why Python, out of curiosity?

~~~
joncatanio
I've used Python quite a bit for various projects. For a compilers class I
wrote a compiler in Python and had a blast. So I spoke with that advisor and
decided I wanted to get a Master's and he had suggested a project that
analyses Python. The main question concerned which dynamic features of a
language cause performance issues. Python just happened to have a lot of the
features that we hypothesized caused slowdowns so we chose it. Plus we were
both familiar with the language so that was a draw.

The same analysis could be done on JS or Ruby. It would be cool to see if a
similar compiler would yield the same performance results for restricting
features in JS/Ruby. It would also validate this work nicely.

------
gabcoh
The name of the thesis this repo is a part of

> Leave the Features: Take the Cannoli - Jonathan Catanio

That's pretty good

------
Beltiras
Leave the features - take the cannoli.

Made me laugh out loud.

------
alexnewman
Skimmed it. They spent a lot of time flushing printing to the screen

