
Unladen Swallow Retrospective - jnoller
http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospective.html
======
malkia
Somewhat relevant to the topic - Mike Pall discusses (2009) the usage of llvm
for lua - he's not ditching the approach, just pointing out the difficulties,
and why he took on making luajit the way it is:

<http://lua-users.org/lists/lua-l/2009-06/msg00071.html>

~~~
kingkilr
A big one Mike misses (because it isn't as critical for Lua AFAIK) is escape
analysis. For PyPy our aggressive escape analysis can bring huge wins on
things like numeric code and some types of string processing, but no static
compilers really do this type of optimization, because it's not very important
in C-type languages.

~~~
haberman
Out of curiosity what makes this such a big win for Python, but not so much
for Lua or C-type languages?

~~~
kingkilr
I honestly don't know why it's not such a big deal for Lua, but given LuaJIT's
performance, and knowing that it doesn't do escape analysis I know it must not
be a big deal :) In Python it's a big deal because everything is boxed, and
your inner loops just get bogged down with allocations, which are expensive
compared to arithmetic operations.
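As a rough sketch of the boxing cost being described (CPython semantics; the function name is invented for illustration): every arithmetic result in the loop is a fresh heap-allocated object, which is exactly what an aggressive escape analysis can prove unnecessary and remove.

```python
# Illustrative sketch: in CPython every arithmetic result is a fresh
# heap-allocated float object, so a numeric inner loop spends much of
# its time allocating boxes. An escape analysis can prove the
# intermediate boxes never escape the loop and use unboxed machine
# floats instead.

def dot(xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        # each multiply and each add below allocates a new float object
        total = total + x * y
    return total

x = 0.5
a = x + 1.0
b = x + 1.0
# equal values, but two separate additions produced two separate boxes
print(dot([1.0, 2.0], [3.0, 4.0]), a == b, a is b)
```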

~~~
beagle3
LuaJIT being a trace compiler, escape analysis is implicit. As long as a value
stays (say) an int or float, the code to handle it stays int or float. If for
whatever reason it switches type, the code will change to accommodate that --
but as long as the types (and/or values) stay the same, the machine code to
handle them will take advantage of that.
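The mechanism being described can be sketched roughly like this (a hedged illustration, not any particular VM's code; all names are invented): a tracing JIT records type-specialized code behind guards, and a failing guard side-exits back to the generic path.

```python
# Hypothetical sketch of trace specialization: the "trace" below stands
# for the machine code a tracing JIT might emit after observing only
# floats at a loop's add. A guard checks the assumption; on mismatch we
# fall back to the generic (interpreted) path.

class GuardFailure(Exception):
    pass

def generic_add(a, b):
    # slow path: full dynamic dispatch, as the interpreter does it
    return a + b

def traced_float_add(a, b):
    # guard: the trace was recorded while observing float operands
    if type(a) is not float or type(b) is not float:
        raise GuardFailure
    return a + b  # specialized path: no dispatch needed

def run(a, b):
    try:
        return traced_float_add(a, b)   # try the compiled trace
    except GuardFailure:
        return generic_add(a, b)        # side exit to the generic path

print(run(1.5, 2.5))  # guard holds: fast path
print(run(1, 2))      # guard fails: generic path
```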

~~~
kingkilr
It's absolutely not true that escape analysis is implicit in tracing:
<https://bitbucket.org/pypy/extradoc/raw/63e4617062b2/talk/pepm2011/bolz-allocation-removal.pdf>

------
jnoller
See also Collin's comment on the post:
<http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospective.html?showComment=1301186609242#c5054925394205960815>

------
rayiner
The point about LLVM being designed for "static C-like languages" isn't
totally on point. There is an impedance mismatch between Python and LLVM, but
it's less about dynamic versus static than it is about the nature of the stack
frame.

In LLVM, most optimizations operate on SSA values (the Value class in the LLVM
IR). There is some support for CSE-ing loads, etc, but the effectiveness of
that depends heavily on your alias information. So to get good optimization
out of LLVM, you've got to represent things as Value's.

This is hard to do in Python. Python's lexical scoping semantics are a little
bit wonky and there are lots of scenarios in which the thread's stack frame is
reified as an object. So without some heavy analysis, you end up keeping your
local variables in an activation frame object on the heap, and at that point
you're toast. The local variables themselves won't be represented as SSA
Value's, and most of the LLVM optimizations won't do anything with them.
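One concrete way Python reifies the stack frame (a sketch, using CPython's real `sys._getframe` API; the function names are invented): any callee can reach back and inspect a caller's frame, so without heavy analysis locals must live in a heap frame object rather than in SSA registers.

```python
# A sketch of the frame reification problem: in CPython any callee can
# reach back and inspect a caller's frame, so a naive compiler must keep
# locals in a heap-allocated frame object instead of SSA values.
import sys

def observer():
    caller = sys._getframe(1)      # reify the caller's stack frame
    return caller.f_locals["x"]    # read a local the caller never passed

def victim():
    x = 42                         # would ideally live in a register
    return observer()

print(victim())  # 42: x was observable through the reified frame
```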

This is not a "dynamic language" thing per se. Lisp, which is also a dynamic
language, actually maps quite cleanly to LLVM. Every binding introduced by LET
is already in SSA form unless it is either closed-over, assigned-to, or both.

1) If the value is just closed-over, you demote it to the function's
environment and replace uses of the variable with a load from the environment
vector.

2) If the value is just assigned-to, you just demote it to a stack slot via
ALLOCA and LLVM's mem2reg pass will take care of re-promoting it to a Value.
This latter technique is exactly what Clang does for all C local variables, so
LLVM is highly-tuned for handling this scenario. In C, variables that have
their address taken cannot be promoted, but this cannot happen in Lisp so
every assigned-to value demoted to an ALLOCA should be promotable.

3) If a value is both assigned-to and closed over, you demote it to a heap-
allocated box and replace all uses with heap references.
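The three cases above amount to a small decision table. A hedged sketch (the labels are mine, not LLVM's or any Lisp compiler's):

```python
# Sketch of the binding classification described above: given whether a
# LET binding is assigned-to and/or closed-over, pick the representation
# the compiler would lower it to.

def classify(assigned_to: bool, closed_over: bool) -> str:
    if not assigned_to and not closed_over:
        return "ssa-value"         # already in SSA form
    if closed_over and not assigned_to:
        return "environment-slot"  # case 1: load from the closure env
    if assigned_to and not closed_over:
        return "alloca"            # case 2: mem2reg re-promotes it
    return "heap-box"              # case 3: mutable and shared

print(classify(False, False))  # ssa-value
print(classify(False, True))   # environment-slot
print(classify(True, False))   # alloca
print(classify(True, True))    # heap-box
```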

After this transformation, nearly every Lisp variable is an SSA Value, and the
optimizers can work with them. Even if you use function calls to do generic
arithmetic, etc, LLVM will happily CSE them for you as long as you mark those
functions as readnone (ie: pure).

Now, there are some things LLVM won't do for you. It can't const-propagate
generic arithmetic because it doesn't know the semantics. It can't do
reassociation, etc, because you're not using the ADD/SUB, etc instructions. I
don't see anything that would prevent you from doing it yourself in a custom
pass, however.

In short, the criticism isn't so much that LLVM has an impedance mismatch with
dynamic languages as it is that it only handles the bottom of the optimization
stack. You still need to do the high-level language-specific optimizations
before handing things to LLVM.

~~~
beza1e1
I think the deeper problem is not that stuff like arithmetic is done by
function calls, but that it is done by method calls. This requires a points-to
analysis and a lowering to function calls first. Since this analysis will be
very conservative for performance reasons, you want to steal more tricks.
Tracing seems to be in vogue today.

~~~
rayiner
Global type inference (which is what you'd be doing if you used a points-to
analysis to figure out what methods would be called) is a fragile
optimization.

Runtime type feedback (observing the actual type at call sites and doing
optimistic compilation based on that) is more robust and predictable (your
points-to analysis doesn't suddenly fail just because there is a single
polymorphic use somewhere in the code). That in itself is orthogonal to
tracing. Runtime type feedback was developed on whole-method compilers, after
all. Tracing is neat, and it's in vogue because it gets you to a "good"
optimizer with relatively little development cost, but you can do all the
things tracing does in a whole-method compiler with more engineering work. On-
stack replacement (necessary for doing optimistic compilation) is arguably
easier in a tracing compiler, and is definitely the major thing missing in
LLVM for dynamic language implementations.
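As a rough illustration of runtime type feedback (a generic sketch, not any particular VM's implementation; all names are invented): a monomorphic inline cache records the one receiver type observed at a call site and takes a guarded, pre-resolved path while that assumption holds.

```python
# Hypothetical sketch of runtime type feedback via a monomorphic inline
# cache. In a real VM each cache is tied to one call site, so the method
# name is fixed when the site is created.

class InlineCache:
    def __init__(self, name):
        self.name = name            # the method this call site invokes
        self.expected_type = None
        self.cached_method = None

    def call(self, receiver, *args):
        if type(receiver) is self.expected_type:        # guard
            return self.cached_method(receiver, *args)  # fast path
        # cache miss: do the full lookup, then (re)specialize the site
        method = getattr(type(receiver), self.name)
        self.expected_type = type(receiver)
        self.cached_method = method
        return method(receiver, *args)

site = InlineCache("upper")
print(site.call("ab"))   # miss: looks up str.upper, caches it
print(site.call("cd"))   # hit: guarded direct call, no lookup
```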

But all of this is one step beyond the threshold problem. Even all these
tricks don't help you if you can't reify your local variables as SSA values
--- all that fancy LLVM machinery won't see anything to optimize.

------
mitchellh
The blog post touched on it briefly with regards to PyPy but I wanted to ask
here since I'm sure someone knows a lot more than I do on this. I didn't
follow Unladen Swallow very closely, but it seemed like an exciting project
from the get-go. It's sad to see it go, but it appears it's not in vain.

I want to ask: What lasting effects did the work that went into Unladen
Swallow have on any Python implementations (or any other languages)? What
"legacy" was left?

~~~
kingkilr
The primary legacy of Unladen Swallow (this was something that everyone agreed
on at the VM summit at PyCon) was: a) improvements to LLVM for all users of
its JIT, things like valgrind and GDB integration, lifting that 16 MB limit on
machine code, and fixing various bugs, b) developing a cohesive set of
benchmarks for the Python community, and c) drawing more attention to the idea
of a fast Python.

~~~
cdavid
(b) is especially important for Python people - it is my understanding that a
lot of PyPy's current benchmarks come from Unladen Swallow.

------
tianyicui
> Our potential customers eventually found other ways of solving their
> performance problems that they felt more comfortable deploying.

Just curious, what are the "other ways"?

------
nikcub
_"Most Python code at Google isn't performance critical."_

What about AppEngine? The primary 'cloud' platform at Google.

~~~
jjm
You would think that such a product would in fact be an excellent candidate
for high performance, but I believe that group strives more toward process
scalability than any particular low-level optimization.

------
jbarham
"Our potential customers [within Google] eventually found other ways of
solving their performance problems that they felt more comfortable deploying."

I wonder how much this includes teams within Google that have since started
using Go for writing performance sensitive applications...

~~~
jnoller
Not very many, as far as I've been told.

------
18pfsmt
Are we talking about an African or European swallow?

~~~
rbanffy
It's definitely unfair to mod this down. It's a very pertinent question when
Python is discussed.

~~~
rbanffy
sigh... There was a time most HNers would get the joke.

