
A supercompiler pass for Erlang - bryanrasmussen
https://github.com/fenollp/erlscp
======
fenollp
Author of the repo here.

Just wanted to say that 99% of this is thanks to [1] and his 2013 Master's
thesis.

There has been some discussion about integrating a supercompiling pass (more
involved than the ones already in the compiler) over at [2].

Hope to motivate enough people to see this happen, especially since we now
have even cheaper constant literals [3]! Let's get on this.

[1] [https://github.com/weinholt](https://github.com/weinholt)

[2]
[https://github.com/erlang/otp/pull/1080#issuecomment-3145985...](https://github.com/erlang/otp/pull/1080#issuecomment-314598581)

[3] [https://medium.com/@jlouis666/an-erlang-otp-20-0-optimizatio...](https://medium.com/@jlouis666/an-erlang-otp-20-0-optimization-efde8b20cba7)

~~~
cpeterso
Can you share any more information about the types of code patterns that can
be superoptimized or how much speedup you see on benchmarks?

~~~
fenollp
So if you look into [1] you'll find .hrl sources (actually .erl) and their .S
counterparts in the subfolders. From these you will see that functions `a` and
`b` look more and more like each other. Note that `b` is the literal version
of `a`.

How much speedup? Potentially unbounded, at the price of compile time. In
real-world cases, though, I don't have any data yet. I want to be able to
compile a whole project first, and that fails for one last reason, described
in the README. But I have faith :)

The patterns that can be superoptimized are few for now. [2] describes how
some of the lists.erl functions work so that they can be supercompiled. The
long-term idea is to have a local (or global, if secured) cache of code so
that inter-module calls can be optimized if the developer wants.

My real world case/motivation is supercompilation of code such as [3] (goal is
to turn `a` into `b` at the assembly level). This pattern is extremely common
in my employer's software [4] (which is probably the biggest open source
Erlang project AFAIK).

A short example of `a` being supercompiled into `b`:

    a() -> hd([get()]).
    b() -> get().

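The `hd([get()])` to `get()` rewrite is a single driving step: the compiler
unfolds the known list constructor so the intermediate list is never built.
A minimal sketch in Python of that idea (a toy rewriter over a made-up
tuple-based AST, not erlscp's actual algorithm):

```python
# Toy sketch of "driving" in a supercompiler (not erlscp's real algorithm).
# An expression is ('call', Name, Args), ('cons', Head, Tail), or ('nil',).
# The rule hd([H|T]) -> H removes the intermediate list at compile time.

def drive(expr):
    """Repeatedly rewrite a tiny expression tree until no rule applies."""
    if expr[0] == 'call' and expr[1] == 'hd':
        arg = drive(expr[2][0])
        if arg[0] == 'cons':          # hd([H|T]) -> H: list never allocated
            return drive(arg[1])
    if expr[0] == 'call':
        return ('call', expr[1], [drive(a) for a in expr[2]])
    return expr

# a() -> hd([get()]).  drives to  b() -> get().
a_body = ('call', 'hd', [('cons', ('call', 'get', []), ('nil',))])
print(drive(a_body))  # ('call', 'get', [])
```

A real supercompiler also handles unknown (dynamic) values by splitting into
case branches and folding back repeated configurations, which is where the
termination machinery comes in.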
[1]
[https://github.com/fenollp/erlscp/tree/master/test/asm_data](https://github.com/fenollp/erlscp/tree/master/test/asm_data)

[2]
[https://github.com/fenollp/erlscp/blob/master/src/erlang_sup...](https://github.com/fenollp/erlscp/blob/master/src/erlang_supercompiler.erl#L29)

[3]
[https://github.com/fenollp/erlscp/blob/master/test/asm_data/...](https://github.com/fenollp/erlscp/blob/master/test/asm_data/unfold3.hrl)

[4] [https://github.com/2600hz/kazoo](https://github.com/2600hz/kazoo)

~~~
smaddox
From what I understand (on phone so no refs right now), "super compilation" is
limited to a linear speedup. However, "distillation" can achieve super-linear
speedup.

------
makapuf
For those wondering what a supercompiler is, here is a simple introduction on
c2: [http://wiki.c2.com/?SuperCompiler](http://wiki.c2.com/?SuperCompiler).

To quote it :

A SuperCompiler is a dynamic optimizer that can transform the code of an
object according to its actual usage.

This may be simple static optimizations, removing state that isn't used from
an object; or it may be very complex e.g. partial evaluation of functions
leading to code transformation.

~~~
Sean1708
How does a supercompiler differ from the optimisations that occur in a JIT
compiler?

~~~
willvarfar
They're on the same spectrum. JITs don't actually do much "optimization",
because they need to be fast. It's subjective, but I'd say supercompiling is
nearer to profile-guided whole-program optimization.

~~~
exikyut
Huh.

It would be kind of cool to hook a profiling execution tracer into a JIT to
find hot paths, resolve that to the nearest "usefully supercompilable" code
block (it sounds like supercompilation works best when fed whole programs, I'm
not sure how much bigger-picture comprehension [large-scale] JITs have of
program scope in practice) and then supercompile that in a CPU-throttled
background thread.

Thinking in the context of LLVM et al.

I'm aware that JS engines do a "fast" compile designed to just generate code
_now_ within a few ms, and a fractionally slower compile that does do some
optimizations and returns executable results within a few hundred ms.

This could be a third optimization stage that takes 1 second, or maybe even 2
or 3, but produces really really good results. If this doesn't already exist -
I don't know, so I can't really assume either way - it almost sounds like low
hanging fruit, and a super fun project too.

~~~
willvarfar
Completely agree! There is plenty of unexplored space and tradeoffs across the
spectrum :)

------
DonbunEf7
"Not that the Futamura projections can be used with erlscp yet, but, you know,
just in case that day ever comes. A man can dream."

I feel like this is the lament of every compiler author.

~~~
toolslive
When I first read about partial evaluation, I misread Futamura as "Futurama".
Maybe my brain was trying to tell me something...

~~~
devmunchies
Yes, and then I found this wiki page about the topic, and I'm still confused:
[https://en.wikipedia.org/wiki/Partial_evaluation](https://en.wikipedia.org/wiki/Partial_evaluation)

Can anybody explain like I'm 5?

~~~
toolslive
Not like you're five, but if you're interested: the standard introduction is
still 'Partial Evaluation and Automatic Program Generation', which is freely
available at
[http://www.itu.dk/~sestoft/pebook/pebook.html](http://www.itu.dk/~sestoft/pebook/pebook.html)
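To give a flavor of the idea, here is a toy sketch in Python (illustrative
only; real partial evaluators, as described in the book, specialize program
text rather than building closures by hand): we specialize a general `power`
function to a known ("static") exponent, producing a residual function of the
remaining ("dynamic") argument with the loop unrolled away.

```python
# Toy partial evaluation sketch: specialize power(x, n) for a known n.

def power(x, n):
    """The general program: both x and n are dynamic."""
    result = 1
    for _ in range(n):
        result *= x
    return result

def specialize_power(n):
    """Partially evaluate power with n static: unroll the loop now,
    emitting a residual program that no longer mentions n or the loop."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    return eval(f"lambda x: {body}")  # residual program: lambda x: x * x * ...

cube = specialize_power(3)            # residual: lambda x: x * x * x
print(cube(2))   # 8, same answer as power(2, 3), but with no loop
```

The Futamura projections mentioned upthread are this same trick applied to an
interpreter: specializing an interpreter to one source program yields a
compiled program.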

