
Scala on LLVM - DanielRibeiro
https://days2011.scala-lang.org/sites/days2011/files/ws3-2-scalallvm.pdf
======
ErikCorry
I am very sceptical about an approach that concentrates on code generation and
punts GC to later. My experience is that you can always improve the code
generation later, but the early decisions you make about GC will be with you
forever. Things like data representation, whether objects can move, the
interface to native code, the concurrency model etc. are very dependent on the
GC and once you have built a big system that doesn't have a moving, precise GC
you have probably boxed yourself in in terms of making a good design for your
memory management later.
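
To illustrate why object movement and precision are tied together, here is a minimal sketch in Python. It is a toy model, not how any real runtime lays out memory: the heap is a list, each object is a list of indices into that heap, and the names `compact`, `heap`, and `roots` are made up for this example. When live objects slide to new positions, every reference to them has to be rewritten, which only works if the collector knows exactly which slots hold references.

```python
def compact(heap, roots):
    """Slide live objects to the front of the heap and rewrite every reference.

    `heap` is a list of objects; each object is a list of indices into `heap`.
    `roots` is a list of indices. Both are assumed to contain only references,
    which is what makes this collection precise.
    """
    # Mark phase: find everything reachable from the roots.
    live = set()
    pending = list(roots)
    while pending:
        index = pending.pop()
        if index not in live:
            live.add(index)
            pending.extend(heap[index])

    # Forwarding table: old position -> new position after compaction.
    order = sorted(live)
    forward = {old: new for new, old in enumerate(order)}

    # Move phase: copy live objects and patch each reference they hold.
    new_heap = [[forward[ref] for ref in heap[old]] for old in order]
    new_roots = [forward[ref] for ref in roots]
    return new_heap, new_roots


# Hypothetical heap: objects 1 and 3 reference each other, 0 and 2 are dead.
heap = [[], [3], [], [1]]
roots = [3]
new_heap, new_roots = compact(heap, roots)
```

If a root slot might be a plain integer that merely looks like a reference, the rewrite step would silently change its value, which is why a collector that cannot tell the two apart cannot move objects safely.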

~~~
wbhart
LLVM has almost no support for GC and it is about the biggest problem that
just hits you in the face when you use it.

What I've been doing for my own experiments is very naive (and quite possibly
broken). I've been implementing an interpreter with all GC done on the
"compile side" of the Jit. What this means is that to allocate at runtime it
reaches back around and calls functions which are pre-compiled into the
interpreter codebase.

The interpreter itself uses linked lists extensively for its symbol table and
to hold references to memory that it needs at compile time, and this is just
hooked into by Jit compiled code at runtime, i.e. there's automatically an
object associated with each allocation done at runtime because there is an
object associated with it at compile time which is stored in the symbol table.
Once a given object is out of scope, assuming no other object (which is still
in the symbol table) holds a reference to it, the GC can then see the object
can be cleaned up.

[And just writing this I know this sounds very much like nonsense. After all,
once something is out of scope at compile time, what is to stop the GC
cleaning it up before runtime? Indeed one has to be careful about this. In my
case, the source code itself is connected with the symbol table. The source
code is itself represented as a linked list and the symbols in it _are_ the
symbols in the symbol table. Until the source is relinquished, the part of the
symbol table that is currently in scope is not relinquished. And of course the
source is not relinquished until after it has been "evaluated", i.e. Jit
compiled and executed.]
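
The idea of the symbol table acting as the root set can be sketched as follows. This is a toy mark-and-sweep model in Python, not the actual interpreter described above and not the Boehm collector; `Obj`, `collect`, and the sample symbols are hypothetical names used only for illustration.

```python
class Obj:
    """A heap object that may reference other heap objects."""

    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)
        self.marked = False


def collect(heap, symbol_table):
    """Mark everything reachable from the symbol table, then sweep the rest."""
    for obj in heap:
        obj.marked = False

    # The symbol table entries are the only roots in this model.
    pending = list(symbol_table.values())
    while pending:
        obj = pending.pop()
        if not obj.marked:
            obj.marked = True
            pending.extend(obj.refs)

    # Sweep: keep only the objects that were reached.
    return [obj for obj in heap if obj.marked]


# Hypothetical program state: `a` references `b`; `c` is referenced by nothing.
b = Obj("b")
a = Obj("a", refs=[b])
c = Obj("c")
heap = [a, b, c]
symbol_table = {"a": a, "c": c}

# Leaving the scope of `c` removes its symbol, so the collector may reclaim it.
del symbol_table["c"]
survivors = [obj.name for obj in collect(heap, symbol_table)]
```

Here `b` has no symbol of its own but survives because `a`, which is still in the symbol table, holds a reference to it.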

So far I haven't managed to trick it, and it is very efficient. But then again
I'm using Boehm-Demers-Weiser which is possibly so conservative that nothing
is actually getting cleaned up in the examples I am running. I simply don't
know.
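
For readers unfamiliar with why a conservative collector can retain garbage, here is a minimal sketch in Python. It is a toy model with made-up addresses, and `conservative_collect` is a hypothetical function, not part of the Boehm-Demers-Weiser API. The collector treats any word that equals a heap address as a pointer, because it has no type information to say otherwise.

```python
def conservative_collect(heap, root_words):
    """Keep every object whose address appears in a root word or a live object.

    `heap` maps an address to the list of words stored in that object.
    Any word that equals a heap address is assumed to be a pointer.
    """
    live = {}
    pending = [word for word in root_words if word in heap]
    while pending:
        addr = pending.pop()
        if addr not in live:
            live[addr] = heap[addr]
            pending.extend(word for word in heap[addr] if word in heap)
    return live


# Hypothetical heap: 0x1000 points to 0x2000; nothing points to 0x3000.
heap = {0x1000: [0x2000], 0x2000: [], 0x3000: []}

# The stack holds one real pointer (0x1000) and one plain integer, 12288,
# that happens to equal the address 0x3000.
stack_words = [0x1000, 12288]
live = conservative_collect(heap, stack_words)
```

In this model the object at `0x3000` stays alive only because an unrelated integer matches its address. Whether that happens often enough to matter in a real program depends on the workload.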

I do know that this is a technical kludge and (I think) I can see it isn't
going to work for a compiler, only for an interpreter.

~~~
aphexairlines
Have you looked at how GHC provides a GC for its LLVM backend?

~~~
wbhart
No. I did read an article about the GHC LLVM backend and it mentions the Cmm
language (a variant of C--). But there was no mention of GC.

In fact, this thesis:

<http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf>

seems to state that Cmm implements almost everything but the runtime aspects
of C--. In particular GC and exception handling are done more directly by
Haskell without the need for direct support in the back end.

I do know some hooks were added to LLVM to support Haskell, e.g. the calling
conventions used by GHC.

If you have any references regarding the GHC garbage collection, I'd be
interested in seeing them. Of course I can also dig some up for myself.

I assume they do something clever.

(Pretty much everything that is clever that is missing from Scala, from a
mathematician's point of view, is in Haskell or Lisp I suppose. Shame, because
most mathematicians refuse to learn something as mind bending as Haskell.)

Edit: OK, now based on what I do know, I'm going to take a guess at this. The
GHC GC is written in Haskell? Now that's gonna bake someone's noodle.

~~~
eru
Lots of Lisps have GCs written in those Lisps, too. But I guess it's harder
for Haskell, since you cannot just "not cons" and mutate in Haskell.

------
rickmode
I'll totally jump back on to the Scala bandwagon once it breaks free of the
JVM.

I'd love for Clojure to make the same jump to LLVM; however, it relies far
more on the host platform's libraries (the Java/C# libraries).

~~~
jherdman
I'm actually a little confused by this. Isn't one of the major benefits of
Scala, Clojure, JRuby, et al, that it _is_ hosted on the JVM?

~~~
wbhart
Yes, it is. And I suspect the Scala people aren't advocating that it move away
from the JVM.

However, I can definitely say, should a Scala on LLVM become available with
trivial FFI with native code and close to compiled C performance _even for the
Scala interpreter_ (this is definitely possible), I would heartily switch to
Scala from C.

I absolutely love the idea of Scala on the LLVM back end. That opens up a
whole new world to me which currently doesn't exist.

I've recently been trying to design my own language on the LLVM Jit. So far,
many of the design decisions I am making are very similar to, or the same as,
Scala's, even though I am familiar with many different languages.

What's missing from Scala for me personally, is precisely what is proposed in
this paper.

~~~
runT1ME
Functional and hybrid functional languages benefit _greatly_ from a highly
tuned GC. I don't see LLVM supporting something on par with the JVM's
collector anytime soon so I think you'll lose a lot of the speed you see with
Scala on the JVM unfortunately.

~~~
aphexairlines
The flip side is that the Scala developers can provide a GC tuned to Scala's
use case, rather than the JVM's focus on Java's use case.

------
swah
Were there compromises in Scala's design because it targeted the JVM?

~~~
bobbyi
Scala had a CLR version from pretty much the beginning, so I don't think it
was designed with the intention of exclusively targeting the JVM.

~~~
skrebbel
The current CLR version is definitely not at production quality like the JVM
version, and afaik it never was.

At a recent talk in the Netherlands, Martin Odersky also basically admitted
that there's still a fairly long way to go until the CLR version is really at
production quality, and even then it will probably keep lagging behind the JVM
version.

So surely you're right about intention, but that's also about all it is.

------
skrebbel
If this works out, it basically also means Scala->JavaScript through
emscripten (<https://github.com/kripken/emscripten>).

Slow, probably, and probably with nearly impossible interfacing with the DOM
and JS libraries, but still, pretty cool.

------
smcj
I'm really interested in that idea. That would create the possibility to clean
up the remaining Java cruft like "every object is a monitor", clone vs.
Cloneable, ...

