
ZetaVM, my new compiler project - ingve
https://pointersgonewild.com/2017/04/29/zetavm-my-new-compiler-project/
======
grashalm01
Your mission seems to be matching with what we are trying with Graal and
Truffle. The Truffle API aims to be stable as well. We also provide basic
building blocks like an object model. I am curious how you plan to support
speculative optimizations that need to deoptimize and reconstruct interpreter
stack frames? In my experience that's essential for building high performance
dynamic language implementations.

~~~
mpweiher
> essential for high performance dynamic language implementations

Is it? Objective-C is a dynamic language (AOT-compiled) that doesn't have
these features, and it is possible to write very high performance code with
it.

~~~
chrisseaton
I don't have any numbers, but I think it's generally possible to write very
high performance code in Objective C... by not using the dynamic language
features of it such as message sends.

Objective C I believe does a globally cached method lookup for every message
send (!) and so can't inline through message sends (!), and since inlining is
the mother of all optimisations I would imagine this would severely limit
performance if you tried to use a lot of message sends in your inner loops. We
should actually think about doing that experiment to see what the cost would
be.

In a language like Ruby almost all operators, even basic arithmetic, are
dynamic method calls, so you can't avoid using message sends anywhere. I think
if you tried to do that in Objective C things might grind to a halt.

Objective C also lacks many dynamic language features which are the ones
solved through speculative optimisation, such as integer overflow, access to
frames as objects, and so on.

I'm not an expert on Objective C though, so happy to be corrected.

~~~
mpweiher
> I don't have any numbers

I do :-) Wrote a book about it, in fact.

> but I think it's generally possible to write very high performance code in
> Objective C... by not using the dynamic language features of it such as
> message sends.

Yes, that's a common misconception...with a grain of truth. In my experience,
you get the best performance by judiciously mixing dynamic and static
features. And yes, that means eschewing some dynamic features in some inner
loops (The 97:3 rule applies). However, you can also often gain significant
performance by hiding behind a polymorphic dispatch.

For example, I reimplemented Apple's binary plist parsers+generators in
Objective-C (from C) for a significant speed boost: the polymorphic
implementation allowed me to put in override points for things such as lazy
loading, and interface-based (de-)serialization removes the need for a generic
intermediate representation. Compared to those advantages, the cost of
message-sends is negligible (and optimizable if it becomes a problem).

> Objective C I believe does a globally cached method lookup for every message
> send(!)

Yes. It's quite fast and, despite what people fret about, rarely a problem.

> and so can't inline through message sends (!),

If it does become a problem (measure, measure, measure!), there are techniques
to avoid the lookup: IMP-cache, convert to C function call, convert to inline
function, convert to Macro.

> since inlining is the mother of all optimisations

Hmm...the mother of all optimizations is measuring and removing unnecessary
code. Then comes eliminating/reducing and "sequentializing" memory access.

Very few of these can be automated.

Inlining is nice, too.

~~~
chrisseaton
Thanks for that extra info.

How do you think Objective C would perform if every operator was a dynamic
method call as it is in Ruby? Surely then you'd start to get frustrated with
the overhead? That's why languages like Ruby need the speculative
optimisations.

~~~
mpweiher
> How do you think Objective C would perform if every operator was a dynamic
> method call as it is in Ruby?

Depends very much on what you mean by "every": the 97:3 rule applies, and is
almost certainly even more highly skewed today[1]. So for the vast majority of
code, it wouldn't matter. Correction: it _doesn't_ matter. For example, Apple's
Swift language produces code that is incredibly slow when non-optimized; loops
and the like can easily be 1000x (a thousand times!) slower than optimized.
Yet Xcode's debug builds default to non-optimized, and people don't report
that their debug builds are unusable.

Another example: I implemented the central re-pagination loop in my
BookLightning imposition app[2] in my Objective-Smalltalk language[3], which
currently has just about the slowest implementation imaginable (an AST-walker
inefficiently implemented in Objective-C), at least an order of magnitude
slower than Ruby. Despite that, BookLightning is at least an order of
magnitude faster than the OS-X print system, which is largely written in C.
Why? It computes by page (which is sufficient for this task), rather than by
individual PDF graphical element. That difference is so great that the
steering code controlling the computation just doesn't matter.

> Surely then you'd start to get frustrated with the overhead?

As long as Objective-C were still a hybrid language: probably not, because I
could always eliminate the overhead in the (very) few places that mattered,
and could do so reliably/predictably [4]. In fact, for Objective-Smalltalk I
am very much leaning towards that approach (Smalltalk-ish by default,
optimizations optional), and so far things are looking good.

> That's why languages like Ruby need the speculative optimisations.

Or C libraries, which is what I believe high performance Ruby code does.

p.s.: I think Truffle and Graal are awesome, and as a researcher I wish I'd
come up with them. When doing actual practical performance work, I prefer
simpler and more predictable tech.

[1] "The Death of Optimizing Compilers"
[http://cr.yp.to/talks/2015%2E04%2E16/slides-djb-20150416-a4.pdf](http://cr.yp.to/talks/2015%2E04%2E16/slides-djb-20150416-a4.pdf)

[2] [http://www.metaobject.com/Products/](http://www.metaobject.com/Products/)

[3] [http://objective.st/](http://objective.st/)

[4]
[http://blog.metaobject.com/2015/10/jitterdammerung.html](http://blog.metaobject.com/2015/10/jitterdammerung.html)

~~~
chrisseaton
But going back to the original argument that was being made - you say that you
don't need speculative optimisations to make something like Objective C fast.
But to do that, you say, you don't use the dynamic features where you need
performance - you use macros and C functions instead.

So yes you don't need speculative optimisations... as long as you apply
similar optimisations manually in the source code yourself. I'm not convinced
therefore :)

~~~
mpweiher
> you don't need speculative optimisations to make something like Objective C
> fast.

There is a difference between "make X fast" and "write fast code in X", a
distinction that is hugely important in practice.

So no, I don't need speculative optimization to write fast code in
Objective-C, but I would need speculative optimization to make Objective-C
code fast [without touching the Objective-C source code].

> But you say to do that you don't use the dynamic features where you need
> performance

No, I did not say that at all. I said I mix dynamic features and non-dynamic
features where necessary, and that I need both to make things fast. And (more
importantly), that the most important optimizations have nothing to do with
either (which counters the assertion that inlining is "the mother of all
optimizations").

> So yes you don't need speculative optimisations... as long as you apply
> similar optimisations manually in the source code yourself.

The point being that (a) those optimizations are such a small part of the
overall optimization process, which in turn is applied to such a tiny part of
the overall code-base, that automation is not needed. Which invalidates the
assertion that these optimizations are "necessary". Nice to have? Yes.
Necessary? No.

The other point (b) is that optimizations by the compiler/JIT aren't as good
as those applied manually for many reasons, one being that the compiler/JIT
has to make the transformations indistinguishable at a fairly low-level,
whereas the author has a higher-level overview and can adjust the semantics to
fit. So doing it manually is also worth it.

The third point is that I cannot rely on the compiler/JIT making those
optimizations; there is no guarantee that they will be applied. And with
today's performance landscape being what it is, reliability is paramount,
meaning that being able to guarantee even a slightly higher bound is more
important than meeting a lower bound some of the time or even on average.

> I'm not convinced therefore :)

As long as I am _capable_ of applying those optimizations manually, I have
made my original point which is that having these optimizations done
automatically is not necessary for performance, but at best "nice to have".
Which doesn't mean that more compiler support wouldn't be nice; the process in
Objective-C is currently too ad-hoc.

------
msangi
I'm wondering how different languages implemented on top of this VM could be,
especially at the semantic level.

Having different syntaxes on top of the same VM is nice and all the JVM
languages show that there is room for a good variety and for different
paradigms.

On the other hand, the choice of a VM sets some constraints while providing
important features. Think, for instance, of the differences between the
languages running on the JVM and the languages running on the BEAM (Erlang's
VM).

I think there is a lot of untapped potential for innovative ideas at the VM-
level but for some reason most of the effort goes into proposing new syntaxes
that reuse the same concepts all the other languages are using.

~~~
Johnny_Brahms
I have been playing with toy languages for the guile VM, and you can do quite
a lot of things before the VM limits you much.

With some exceptions, though: the JVM does not support tail call optimization,
which limits how you can implement loop constructs.

------
marktangotango
How does this project compare to nekovm? Sounds very similar.

~~~
tachyonbeam
Not familiar with Neko, but superficially, it seems that the Neko VM is
designed for a specific programming language. It uses a bytecode IR, whereas
Zeta uses a textual one. Neko is also farther along, more mature and
feature-complete than Zeta.

I intend to take a more experimental direction with Zeta. It won't have an
FFI, for instance. It will only provide a small set of minimalist APIs. The VM
will be intentionally designed to avoid code rot and breaking changes.

Another thing I would like to experiment with is transparent compilation of
pixel shaders to run on GPUs, the ability to run code in any language running
on Zeta (given some restrictions) on both the CPU and GPU. I believe I have
found a way to make this work, based on my type-specialization research.

~~~
akkartik
Hmm, I haven't heard this idea before that the FFI or surface area of
libraries in a language causes bitrot. Could you elaborate on the connection?
I can take version 1 of the C sources of Vim from back in '92 and compile them
without trouble. I'm not aware of dynamic languages like Javascript or Python
2 having any bitrot issues either. Backwards compatibility seems like a pretty
big constraint for everyone.

 _Edit 16 minutes later_ : I just tried your benchmark with my VM-like
language ([https://github.com/akkartik/mu](https://github.com/akkartik/mu)),
and the time taken was almost identical. Interesting exercise! Here's the Mu
and Plush/0 programs side by side:

    
    
      $ cat fib.mu
      def fib n:num -> result:num [
        local-scope
        load-ingredients
        base-case?:bool <- lesser-or-equal n, 1
        return-if base-case?, n
        n <- subtract n, 1
        fib-n-1:num <- fib n
        n <- subtract n, 1
        fib-n-2:num <- fib n
        result <- add fib-n-1, fib-n-2
      ]
    
      def main [
        local-scope
        x:num <- fib 29
        $print x, 10/newline
      ]
    
      $ cat benchmarks/fib29.pls
      #language "lang/plush/0"
      var fib = function (n)
      {
          if (n < 2)
              return n;
          return fib(n-1) + fib(n-2);
      };
      var r = fib(29);
      print(r);

~~~
tachyonbeam
C has been fairly stable over time, but you have to admit, a C program from
1992 still compiling as-is is the exception rather than the rule. The reason
Vim from 1992 might still compile is that it doesn't have many dependencies
apart from standard C and POSIX APIs.

JavaScript has massive bitrot issues. The HTML DOM is huge and constantly
changing. I have had my own web apps break multiple times over the years. As
for Python and C, if you stick to the core language and minimize dependencies,
you might be OK. The problem is that the more dependencies you have, the more
likely it is that one of them breaks and renders your program broken... And if
you're not there to fix it, your program remains broken forever.

My argument for reducing API surface and keeping APIs low-level and minimalist
is that the smaller an API, the more difficult it is to implement it wrong.
It's easier for two implementations to implement a smaller API and have the
same behavior. It's also easier to test small APIs for conformance, etc.

~~~
akkartik
I think I see what you mean. However, all these previous languages would claim
that the parts they control have not suffered from bitrot. The dependencies
you're thinking of are not considered part of the language in each case. Is
that right? How would you keep people from creating new libraries in your
language for unanticipated use cases? Say a self-driving car library, or a
command-and-control module for all the IoT devices in a house from 2029?

It seems to me that bitrot is fundamentally a result of change in the outside
world. The only way to opt out of bitrot when the world changes rapidly seems
to be to disengage from the world and become irrelevant. I'm fairly certain Mu
will not suffer from bitrot in 30 years -- but it'll only be because nobody
ever built anything with it :)

~~~
sitkack
I have come to the same conclusion as tachyonbeam: it is interaction with the
outside world that causes bitrot. Rather than having APIs, we need
communication protocols with simple semantics. The VM should interact over a
constrained, well-defined protocol; that protocol could implement a POSIX IO
model, but it would be up to the client to offer that abstraction over
messages.

The surface area of POSIX is too large to build systems that can run for tens
or hundreds of years. Compare the design of Lua with Python's: the assumptions
Lua makes about the underlying platform are much, much cleaner, so it is more
portable and platform issues are easier to debug.

~~~
akkartik
Interesting. Can you show an example of how Lua's interface to the underlying
platform improves on Python? Doesn't replacing APIs with protocols merely
shift the breakage to higher-level logical errors rather than lower-level
mechanical ones?

~~~
sitkack
Lua is deliberately under-coupled with the base system. For the longest time
it wouldn't even support dynamically loaded modules, because dynamic loading
wasn't portable: it isn't part of C89.

So when one uses Lua, it is up to the user (the embedder) to supply IO. This
threading through the eye of a needle, or hour-glass design, lets the embedder
define how scripts interact with the base system. This makes them more
resilient over time: the interactions are better specified and highly
mediated.

Somewhat analogous to the library vs framework dichotomy. Libraries survive
better than frameworks do.

~~~
akkartik
Fascinating! Seems to have some commonality with dependency injection, which
is one of my interests
([https://github.com/akkartik/mu#readme](https://github.com/akkartik/mu#readme))

Can you point me at any Lua docs where I can read more about overriding IO in
Lua?

------
salmonlogs
+1 for the amazing domain name

------
zerr
Why do you need Tag member in the Word union?

~~~
tachyonbeam
Author here: sometimes you want to get the tag of a value, and operate on that
tag as a value, but I guess this may not be necessary in this system, given
that the get_tag instruction will produce an immutable string.

EDIT: removed it. It was probably left over from old code, previous VM's I've
written:
[https://github.com/maximecb/zetavm/commit/7ef746f06113530468...](https://github.com/maximecb/zetavm/commit/7ef746f061135304684c5029820631a688c399fa)

