

Implementing your own language on the Parrot VM - Xichekolas
http://www.parrotblog.org/2008/03/targeting-parrot-vm.html

======
icey
In my opinion, if someone were to want to port Clojure to something other than
the JVM, Parrot would be an amazing place to go.

The idea of having Clojure + CPAN makes my pants tighter.

I know that there is someone working on Clojure on .NET, but I don't think
that really adds much over the JVM, especially since it seems like some of
the best libraries for the .NET stack have come from Java.

I'm sure I'm at risk of sounding like yet another Clojure fanboy, but to me
this really seems like one of the few times where I've seen an article and
that's the first thing that popped into my head.

~~~
old-gregg
In my opinion, anything on JVM isn't going to become a mainstream language
because VMs refuse to share code between OS processes.

.NET and Mono solve this with AOT-precompiled modules. Ruby and Python kind of
solve this by heavily reusing C runtime/libraries, so at least there aren't 25
copies of printf() and qsort() wasting space in RAM.

The JVM doesn't solve this at all. JVM-based software acts like a pig, showing
zero respect for the environment it runs in. It's good at "one program = one
computer" tasks, i.e. only for server-side or perhaps developer-workstation
scenarios.

Flash/AIR is even worse: not only does it not share anything, it also brings
in its own graphics, font rasterization, hotkey bindings, scroll bar
behavior, etc., making every piece of software built on top of it look and
feel like a poorly designed console game running on your PC under a software
emulator.

This is why I never touch anything like JVM/AIR: I don't believe the cloud is
"the answer" and I want my code to run on servers, desktops, netbooks, routers
and cell phones.

~~~
nradov
I find it hard to believe that sharing libraries in memory makes a significant
difference. Most libraries are small relative to RAM sizes. The only exception
is on cell phones, which only run a few processes anyway.

~~~
old-gregg
You don't need to "believe", you just have to know it. CS is a discipline, not
a church.

As an exercise, get a full list of all your processes and do the math,
replacing each one with a hypothetical JVM instance [one per process]; I
suggest paying attention to the "shared pool" size for each process too. I
guarantee you'll be shocked by the total RAM consumed by an "empty" OS X or
Vista. Just imagine: every tiny process, even a simple background service,
would have a full copy of your entire userspace layer in its address space.
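
The exercise above boils down to simple arithmetic. A sketch of that math
(all figures below are made-up illustrations, not measurements):

```python
# Back-of-the-envelope sketch of the exercise above: total RAM when
# processes share library code vs. each carrying a private copy.
# All numbers are hypothetical illustrations, not measurements.

def total_ram_mb(private_sizes, shared_code_mb, sharing=True):
    """private_sizes: list of per-process private data sizes (MB)."""
    if sharing:
        # One copy of the common code, mapped into every address space.
        return sum(private_sizes) + shared_code_mb
    # No sharing: every process carries its own copy of the code.
    return sum(p + shared_code_mb for p in private_sizes)

# Say 60 background processes, each with ~5 MB of private data, and
# ~100 MB of common userspace code (libc, GUI toolkit, etc.).
procs = [5] * 60
print(total_ram_mb(procs, 100, sharing=True))   # 300 + 100 = 400 MB
print(total_ram_mb(procs, 100, sharing=False))  # 60 * 105  = 6300 MB
```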

Moreover, with 12-core CPUs becoming a reality very soon, the issue becomes
even _more_ important. With proper code sharing at the OS level, 12 instances
of a process should (theoretically) take not much more RAM than one. But
because of VM overhead you'll essentially be duplicating it 12 times.

More code duplication also means more I/O hits on disk and system bus and CPU
L1/L2 cache drain.

This is why Java is for enterprise/web software only. It is simply retarded to
build on it in a true multi-process environment.

OS-level code sharing is hugely important. Your hard drive is full of code,
gigabytes of it, and you can't just clone big chunks of it into every process
that needs it. Without code sharing you'd have had to wait a few more years
for an iPhone. Heh, and actually, because of that lack, you aren't reading
this in a JVM-powered browser and probably never will.

~~~
nradov
Have you actually tested this on a complex system with multiple applications
or JVMs running to measure the effects, or are you just guessing that it
would be a problem? I don't dispute that sharing libraries across multiple
processes would be a nice optimization, but for typical use cases it's way
down the priority list.

Most Java applications are written to use multiple threads (rather than
multiple processes), which all execute on the same JVM and share all memory.
Web apps typically run in some sort of J2EE container and those can be
configured to share libraries across multiple applications; the OS doesn't
come into play. Even if you're talking about running multiple separate desktop
applications those will mostly only have the J2SE runtime library in common;
for Java 1.6 the rt.jar file is 42MB, and only a fraction of it even needs to
be loaded into memory.
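
The thread-sharing point above can be sketched quickly (Python used here for
brevity; the JVM behaves analogously): threads within one process share a
single heap and a single copy of the loaded code, so nothing is duplicated
per thread.

```python
# Sketch: threads in one process share a single address space, unlike
# separate processes, which each carry their own copies.
import threading

counter = {"n": 0}       # one object on one shared heap
lock = threading.Lock()

def work():
    for _ in range(1000):
        with lock:
            counter["n"] += 1  # every thread mutates the same object

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["n"])  # 4000: four threads, one shared heap, no copies
```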

~~~
old-gregg
_Have you actually tested this on a complex system?_

What? Have you actually tested the multiplication table with rocks or buttons?
That's simple math. Memory consumption is in front of you: I already pointed
you to tools and told you what numbers to use.

Your last paragraph only restates what I already said: Java is for the
few-processes-per-machine use case. _And cloning 42MB of code into every
process is insane_, I can't believe you're actually proposing this. BTW,
that's a _compressed_ 42MB; you seem to actually be using the J2EE download
size to estimate your RAM consumption. Nice.

I am amazed by the damage done by the Java lobotomy at US schools and have
very little hope for progress moving forward: UNIX clones from the '70s are
here to stay. Not only can't we hope to advance in that area (OS design),
I'm afraid we're losing the skill to even comprehend existing systems.

One more time: right now, as you're reading this, nearly all processes on
your machine have much, much larger code segments than data segments. Most of
your RAM, young man, is consumed by code, not data. The code-vs-data ratio
becomes even more ridiculous if you're running a few instances of VMware.
Most of the time you're waiting for your MacBook Pro to do something, it's
pushing _code_ around. And the reason Vista is so bloated and so much slower
is that it loads a lot more code into RAM than XP did. So yes, code sharing
is hugely important. A server machine, however, is very different: there are
very few applications and lots of data, so using something as inefficient as
the JVM makes sense if it speeds up development/deployment.

~~~
nradov
You missed the point. Code sharing in memory only helps if you have multiple
processes all using the exact same library. That problem is already solved for
server applications, and is mostly irrelevant for mobile platforms, so I guess
you're mostly worried about the desktop environment. But typical desktops only
run a few user-mode processes at a time, and wouldn't be able to share much
code regardless of whether the JVM supported it or not.

While it would certainly be possible to implement that feature, it seems most
of us just don't care. RAM is cheap and I would rather see the Sun and IBM
developers focus on more important issues. Dynamic method invocation and tail-
call optimization are a lot higher on my list.

The reason you're not seeing much progress on OS design is not because we
don't comprehend the systems, but rather because what we have now is good
enough for what most people want to do. The OS has largely ceased to be an
obstacle and quietly faded into the background. Disruptive OS innovation will
have to wait until someone comes up with a killer app that requires
fundamental capabilities which current OS designs can't support.

Most of the time I'm waiting for my Vista machine to do something it's not
pushing code around. It's idling, waiting for a response from a server.

~~~
old-gregg
_But typical desktops only run a few user-mode processes at a time_

No. Run ps -A, goddammit. A "typical" desktop, especially a UNIX-derived
variant, runs a lot of processes. And servers in the near future will too.

 _and wouldn't be able to share much code_

No, there is a ton of userspace code shared between processes: the code that
allocates memory, the code that draws lines and buttons on your screen, the
code that sorts strings, the code that implements threading, the code that
renders bitmaps out of TTF glyphs, reads files and opens sockets. Each
process needs megabytes of code you seem to have no idea about: it's not even
about libraries, it's about _everything that isn't drivers_. Do you
understand now? There are good reasons why JVM programs are memory pigs: they
are, essentially, running their own OS in complete isolation.
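
On Linux that sharing is directly visible in /proc/&lt;pid&gt;/maps, where
read-only executable (r-xp) file-backed regions are the code pages the OS can
share between processes. A rough sketch (the maps excerpt below is invented,
and counting only /lib mappings is a deliberate simplification):

```python
# Sketch: estimate how much of a process's address space is shareable
# library code by scanning a /proc/<pid>/maps listing. The excerpt is a
# made-up example; real listings are much longer.
sample_maps = """\
00400000-00452000 r-xp 00000000 08:01 1234 /usr/bin/some-daemon
7f1a00000000-7f1a001c0000 r-xp 00000000 08:01 2345 /lib/x86_64-linux-gnu/libc-2.27.so
7f1a00400000-7f1a00420000 r-xp 00000000 08:01 3456 /lib/x86_64-linux-gnu/libpthread-2.27.so
7f1a00600000-7f1a00800000 rw-p 00000000 00:00 0 [heap]
"""

def shared_code_kb(maps_text):
    total = 0
    for line in maps_text.splitlines():
        fields = line.split()
        # r-xp file-backed library mappings are code pages the kernel
        # shares between every process that maps the same file.
        if len(fields) >= 6 and fields[1] == "r-xp" and fields[5].startswith("/lib"):
            start, end = (int(x, 16) for x in fields[0].split("-"))
            total += (end - start) // 1024
    return total

print(shared_code_kb(sample_maps))  # 1920 KB of shared library code
```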

 _Disruptive OS innovation will have to wait until someone comes up with a
killer app that requires fundamental capabilities which current OS designs
can't support._

The problem is that programmers are cheap. There is no market pressure to
evolve and leave '70s technology in the past. Armies of incubated Java code
monkeys cost less in the short run than going against the stream of
mediocrity. This is why new platforms (the iPhone) are so exciting and
refreshing: they let us leave obsolete, inherited junk behind. The iPhone has
neither a JVM nor Flash/AIR, for very good reasons: the OS is your VM. It
always has been. It's just the right thing to do.

 _The OS has largely ceased to be an obstacle and quietly faded into the
background._

You have demonstrated enough ignorance regarding _what an OS actually is_ to
allow me to safely ignore this comment. There are a few folks at Apple and
Microsoft who are still capable of understanding these issues, which is why
Java, despite 10+ years of availability and all its beautiful promises, still
lives in the obscurity of server rooms.

~~~
nradov
This is silly. The "typical" desktop runs MS Windows, with only a few user
processes running at a time. Some of those processes may have a bunch of
threads going, but the process count is small. Of course all of those
processes share the low-level OS libraries. And guess what: each JVM process
shares those OS libraries, too.

The Java standard library does duplicate some of the higher-level OS features,
which is the only practical way to achieve cross-platform compatibility. Any
cross-platform solution is going to impose some overhead. Of course you can
write native iPhone apps with no extra overhead, but then they can't be used
by the 95% of mobile device users who have other platforms.

I'm not sure what you mean about code running in server rooms being obscure.
That's what drives web applications and network services which are more
visible and critical to many of us than our desktop applications.

By the way, anyone who wants to use the JVM for writing desktop applications
should take a look at the Eclipse RCP. I haven't had a need to develop for
that platform myself, but the results I've seen from other companies look
pretty good.

------
maximilian
I'm planning a summer "write my own language" project for fun, and I wonder
how well the Parrot compiler tools would work for this. I imagine it's a nice
way to get my language into an AST, even if I don't target the Parrot VM.

~~~
Xichekolas
Yeah I'm doing the same 'summer project'. I was writing my own parser/lexer by
hand, but vaguely remembered hearing that Parrot had hit 1.0, so figured it'd
be a good platform to experiment on. No idea if it'll support everything I
want to try, but it at least seems easy to get started.

~~~
maximilian
I tried doing that too, but writing a parser by hand in C was a pain in the
ass. I was going a little crazy trying.

I do a lot of numerical work for my master's, and I want to write a simple
numerics language that is heavily JIT-compiled to speed up the code. Most
numerics is done in pretty tight loops (at least everything I see), so I'm
hoping to get pretty good performance. It's also just fun to read about VMs,
and armed with a parser, it won't be as hard to target different ones and
compare performance.

~~~
chancho
You should check out LLVM. It generates really fast code and has a JIT
compiler. I don't know much about Parrot, but it seems heavily skewed toward
making dynamic languages fast whereas I believe LLVM aims more toward static
languages. Not that you couldn't implement any language on top of either, but
take for example the fact that Apple's support of LLVM was (I think) in part
so they could use it as a JIT compiler for OpenGL shaders in their software-
fallback drivers. It has good support for cross-platform SIMD instructions and
such, which will benefit numerical computation greatly.

-------------

Also, why don't you both use something like flex and bison? Is writing a
lexer/parser by hand still that much of a nerd rite-of-passage? That's the
most banal part of implementing a language. You could spend your summer
writing an optimizer instead.
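
For a sense of scale, the hand-written part being argued about here — a lexer
plus a recursive-descent parser for plain arithmetic, i.e. roughly what flex
and bison would generate for you — fits in a few dozen lines. A sketch in
Python (chosen for brevity; the commenters above were working in C):

```python
# Minimal hand-written tokenizer and recursive-descent parser for
# arithmetic expressions with precedence and parentheses.
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(src):
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def evaluate(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def next_tok():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():  # factor := NUMBER | '(' expr ')'
        tok = next_tok()
        if tok == "(":
            val = expr()
            assert next_tok() == ")", "expected ')'"
            return val
        return int(tok)

    def term():  # term := factor (('*'|'/') factor)*
        val = factor()
        while peek() in ("*", "/"):
            val = val * factor() if next_tok() == "*" else val // factor()
        return val

    def expr():  # expr := term (('+'|'-') term)*
        val = term()
        while peek() in ("+", "-"):
            val = val + term() if next_tok() == "+" else val - term()
        return val

    return expr()

print(evaluate(tokenize("2 + 3 * (4 - 1)")))  # 11
```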

~~~
Xichekolas
I wasn't doing it to prove anything or even to get a working language to
use... it just seemed like a fun problem to solve. I've done a bit with yacc
and lex before, and had a lot of fun back in the day with the Metacircular
Evaluator in SICP.

~~~
chancho
Ah sorry. You said "I wrote it by hand" and he said "it's a pain in the ass"
so I mistakenly thought you felt the same way. One person's banal is another's
fun.

