
*JS : Low-Level JavaScript - bpierre
http://mbebenita.github.com/Mvm/
======
geoffschmidt
Why does malloc take a byte count, just to divide it by sizeof(u32)? Why not
embrace the fact that the natural "word" in the JavaScript VM is a boxed value
that stores an int, float, or string, and define sizeof(u8) === sizeof(u16)
=== sizeof(u32) === 1? Unless you really want each cell in your memory arena
to literally emulate a byte and always store a value between 0 and 255?

In a similar vein, why not use the native JavaScript object type to back the
structs? This will cooperate nicely with the inline caching in typical
JavaScript JITs. The result will be native assembler that indexes directly
into the structs in the way you'd hope, since your static typing system forces
every object reference to be monomorphic.

Does this language actually eliminate GC pauses? Isn't the GC still going to
run, except now it has to walk every single cell in your memory arena every
time, defeating the very heuristics that GCs use to reduce pauses?

There's definitely something alluring about the static typing, but I'm less
sold on the manual memory management. Maybe there is a way that the language
could help you generate less garbage, without actually requiring you to
manually control the lifetime of each object?

From your linked-list example, it's eminently clear that it needs a C++-style
template system and a port of the STL :) If you wrote a C-to-*JS translator,
you could hypothetically use a pre-existing cfront (C++-to-C translator) and
STL implementation to get that working.

~~~
azakai
> Does this language actually eliminate GC pauses? Isn't the GC still going to
> run, except now it has to walk every single cell in your memory arena every
> time, defeating the very heuristics that GCs use to reduce pauses?

Yes, this eliminates GC pauses.

The JS GC will only run if allocations occur, allocations in the sense of
actual JS objects. If you do a malloc which amounts to manually handing out
indexes into a single array, no JS objects are created or destroyed.

Also, JS GCs would not walk the memory arena. The memory arena is just an
array of numbers; it does not contain native JS references, which are what the
JS GC traces.
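
To illustrate (a minimal sketch of the idea, not the project's actual malloc;
all names here are made up):

```javascript
// A "heap" that is one typed array; malloc hands out plain integer
// indexes, so no JS objects are created per allocation and there is
// nothing new for the GC to trace.
const HEAP_WORDS = 1 << 20;  // 1M 32-bit words
const U32 = new Uint32Array(HEAP_WORDS);

let brk = 1;  // next free word; 0 is reserved as "null"

function malloc(words) {
  const ptr = brk;  // a "pointer" is just an index into U32
  brk += words;
  if (brk > HEAP_WORDS) throw new Error("out of memory");
  return ptr;
}

// Writing through a "pointer" is plain array indexing:
const p = malloc(2);
U32[p] = 42;
U32[p + 1] = 7;
```

A real implementation would also need free() and a free list, but the point
stands: the only long-lived JS object here is the single typed array.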

~~~
geoffschmidt
I meant the synthetic memory arena, the JS array that malloc() and free()
manage. That certainly could contain JS references and needs to be examined by
the GC.

I don't think you can actually avoid GC by avoiding JS object allocations. Any
form of string manipulation, any form of IO (event handling, DOM manipulation,
XHR), any use of timers, etc, is going to allocate objects, right? Also, the
JS runtime could generate garbage internally. For example, since JS has
closures, it wouldn't be totally unreasonable for the runtime to manage the
lifetime of activation records (JS stack frames) using the GC.

~~~
modeless
The memory arena is a typed array, which cannot contain object references and
so does not need to be scanned by the GC.

If you keep JavaScript object allocations to a minimum, then GC pauses will be
short and infrequent. This could actually be useful for writing a game render
loop or any other code that can't tolerate pauses.
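
For example (my own sketch), a render loop can preallocate all of its buffers
once and only mutate them per frame:

```javascript
// Sketch: preallocate everything up front so the per-frame work
// allocates no new JS objects at all.
const N = 1000;
const posX = new Float32Array(N);
const posY = new Float32Array(N);
const velX = new Float32Array(N);
const velY = new Float32Array(N);

function step(dt) {
  // Pure mutation of preexisting buffers: no object literals, no arrays,
  // no closures created here, so no garbage is generated per frame.
  for (let i = 0; i < N; i++) {
    posX[i] += velX[i] * dt;
    posY[i] += velY[i] * dt;
  }
}

velX[0] = 3; velY[0] = 4;
step(0.5);
```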

------
keyle
I understand low level languages going high level. But high level languages
going low makes me frown very intensely.

Btw if you like this kind of thing, you will love Caterwaul JS. My best
frowning faces are made reading that language.

<http://spencertipping.com/> / <http://caterwauljs.org/doc/caterwaul-by-example.pdf>

~~~
dherman
In practice, most safe languages provide back doors and foreign interfaces to
unsafe code. Look at Obj.magic in OCaml, or unsafePerformIO in Haskell, or
ffi/unsafe in Racket.

But on the web, you obviously can't tolerate _true_ unsafe back doors. This
project gives you something that is "unsafe" in that you can read or write to
parts of the allocated "heap" buffer that you aren't supposed to, but it's not
possible for that to violate _JavaScript_ 's safety. And the compiler is
generating extremely optimizable code, so it JITs very, very well.

On top of that, the fact that it's all implemented on top of a simple API
(typed arrays) means that you could compile in a safe mode that does extra
checks, or even trap all memory traffic and build valgrind-like diagnostic
tools. It's starting out as a virtualized abstract machine, which means it's
easy to hook into and provide great development tools.
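
As a sketch of the "safe mode" idea (illustrative only; the names are mine,
not the project's actual codegen): a load could compile to a bounds-checked
helper instead of a raw index.

```javascript
// Sketch: in "fast mode" a load compiles to `HEAP[ptr]`; in a "safe mode"
// it could compile to a checked helper like this instead.
const HEAP = new Uint32Array(256);

function checkedLoad(ptr) {
  // Reject non-integers, negatives, and out-of-range indexes.
  if (ptr >>> 0 !== ptr || ptr >= HEAP.length) {
    throw new RangeError("heap access out of bounds: " + ptr);
  }
  return HEAP[ptr];
}

HEAP[10] = 99;
```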

I probably wouldn't reach for this kind of dialect for most purposes. But for
very performance-sensitive kernels, it's got promise. Still, it's an
experiment, so we'll see!

~~~
hermanhermitage
Dave, is there any way to get involved?

There would always be the option of running the code in a raw VM host using a
NaCl or Xax-style container (I know Mozilla doesn't see these approaches as
necessarily compatible with the open web). But to take an approach like Boot
to Gecko to the next level while staying compatible with low power usage, I
dream of running near-native code: inline SSE/NEON with a fallback to generic
code.

~~~
mbebenita
Why not extend the typed array API to include SIMD operations on multiple
array elements? We could compile to those and special-case them in the JIT.
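
As a sketch of what that might look like (an entirely hypothetical API;
nothing like this exists in the typed array spec): an element-wise add whose
semantics are defined by a scalar loop, which an engine could special-case
into SSE/NEON.

```javascript
// Hypothetical: addInto(dst, a, b) with the semantics of this scalar loop;
// an engine that recognizes the pattern could emit SIMD instructions.
function addInto(dst, a, b) {
  for (let i = 0; i < dst.length; i++) {
    dst[i] = a[i] + b[i];
  }
}

const a = Float32Array.of(1, 2, 3, 4);
const b = Float32Array.of(10, 20, 30, 40);
const dst = new Float32Array(4);
addInto(dst, a, b);
```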

~~~
hermanhermitage
Makes perfect sense.

I'm also thinking about how the VM/language could extend itself in the field;
how the SIMD extensions (or any feature) could be introduced a posteriori, or
independently of the browser release cycle.

------
pixie_
Cool. Does anyone else write their code with the * after the type, instead of
before the variable? Like int* x; as opposed to int *x;. To me int* is the
type of x, so keeping it together makes sense. Like a function returning an
int pointer would be int* myfunction(); or type casting something: (int*).
Putting the * before the variable and not as part of the type just seems
unintuitive. Anyone else agree/disagree?

edit: another point - putting * before the variable makes me think of a
dereferencing operation, and that's part of why putting it before the variable
name in a declaration is unintuitive, and very confusing to people learning C.

~~~
Peaker
This style rests on the premise that C declaration syntax is:

    <type> <space> <name>

It isn't, and using that style misleads people into thinking that it is.

You can't write:

    int[10] a;

You write:

    int a[10];

Basically any type that involves more than a base-type and a pointer is going
to fail to work with that style, and then you will lose uniformity as well.

There's actually a pretty good reason for the way C syntax is: The declaration
and use syntax are virtually identical, and the precedence rules are
identical. It's nice to only have to learn that once.

The fact that the * before the variable looks like dereferencing is
_intentional_; you're supposed to read this:

    int *x;

as: "Declare the dereference of x to be an int".

And this:

    int (*(*x)(void))[10];

As "The dereference of x is a function of void, the dereference of which is an
array of 10 ints".

~~~
pixie_
The declaration and use syntax of pointers is not 'virtually identical.' int*
a; could be used to set int* b = a; or int c = *a; or int** d = &a;

Also for arrays it's unfortunate the syntax is like it is, because int a[10];,
is of course a pointer and would be a bit more intuitive in the form of int[]
a = new int[10];

Also, you say that the * before the variable looking like dereferencing is
intentional, but is it really? What's your source?

~~~
Peaker
You are misunderstanding.

I'll start from what you said about arrays:

> because int a[10];, is of course a pointer

This is a common misconception about C.

    int a[10]

defines an array, _not_ a pointer. typeof(a) is int[10], not (int ptr). The
thing is, whenever you take the _value_ of an array in C, it is degraded to a
pointer to the array's first element, and this confuses people to no end into
thinking that arrays are pointers.

For example, if we declare:

    int a[10];
    int *b;

Then sizeof(a) and sizeof(b) are very different. typeof(&a) is
pointer-to-int[10], whereas typeof(&b) is pointer-to-pointer-to-int.

    void f(int (*x)[10]) { ... }

If you call:

    f(&a); // will type-check
    f(&b); // will not

So the syntax for declaring arrays is fine, and when you declare an array, no
pointer is being declared at all.

This is C syntax for type declarations:

    <base type> <type declaration here>

The "<type declaration here>" part uses virtually the same syntax as use
syntax.

For example, if x is a pointer to a function that returns a pointer to an
array of floats, you could reach a float via this syntax:

    (*(*x)())[1]

You could declare x using this syntax:

    float (*(*x)())[10];

The main differences are:

* Array size instead of index

* There's no requirement to dereference a function pointer to call it, but there is a requirement to use the dereference syntax to declare it.

* Array values can be used as pointers to their first elements (due to the automatic degradation), but must be declared as arrays.

------
hermanhermitage
I like to see this line of work.

I'd really love to see a Frankenstein of:

* C/Javascript/Lisp on an augmented LuaJit (structs, efficient array slices, machine word types).

* dual syntax: interchangeable S-EXPR and C{}.

* lang(machine) { instr over machine } blocks. Where machine is either a syntactic transformation or a way of hosting binary code in another language and interchanging objects.

* raw continuations and stack management.

* explicit and implicit memory management.

* language profiles to allow assert() and compilation failures based on the feature set used, so that different restricted forms of the language can be used according to where in the software stack the code is targeted (embedded, client, server, enterprise boilerplate). eg. profile(no-gc, no-lang, must-types)

* full ability to specify code to execute and participate at: edit/view, compile (pre-process, parse, code-gen etc), link, load and runtime (debugging, re-jit, trace analysis).

------
dysoco
I love how they name languages like *JS or C! so I can't google them later.

~~~
dherman
Just a temporary name. We're tossing around name ideas. Michael's a true
hacker. He writes code first, asks questions later. I'm more useless, so I
like to think about names. ;-)

~~~
doubleconfess
I'm going to throw 'J--' into the suggestion box. :-)

~~~
gbog
Jscape?

~~~
yuchi
Ecmalang?

------
bsaul
Not really into high-speed JavaScript, but wouldn't you be able to achieve the
same kind of memory reuse by simply using factory-method patterns that reuse
object instances from a pool instead of reallocating? It's the same kind of
trick we use in iOS/Android programming when we want to avoid memory
allocation or garbage collection...

~~~
VMG
Using the pattern you describe you could only reuse the same kinds of objects.
With a general heap you could reuse the memory for arbitrary structures.
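
A minimal sketch of the pooling pattern (my own illustration): note that it
can only recycle one shape of object, which is the limitation described above.

```javascript
// Sketch of the pooling pattern: recycles Point-shaped objects only.
// A malloc-style heap could reuse the same bytes for any structure.
const free = [];

function acquirePoint(x, y) {
  // Reuse a released Point if one exists, else allocate a fresh one.
  const p = free.length ? free.pop() : { x: 0, y: 0 };
  p.x = x; p.y = y;
  return p;
}

function releasePoint(p) {
  free.push(p);
}

const p1 = acquirePoint(1, 2);
releasePoint(p1);
const p2 = acquirePoint(3, 4);  // same object, recycled
```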

------
IsaacSchlueter
I have a problem with this line:

> Objects in JavaScript are not cheap, they need to carry around lots of
> extra information and can be many times larger than their C style
> counterparts; moreover, property access is slow.

It's easy to say "X is not cheap, Y is slow".

It's much more valuable to actually prove it with numbers that show _where_
it's fast and _where_ it's slow.

In my experience, object creation and property access are almost never the
bottleneck, and GC pauses are only relevant (in v8 at least) if you're leaking
a large number of objects, especially where those objects are of mixed
generations. (Ie, have a lot of long-lived and short-lived objects that refer
to one another and occasionally leak.) That is, if you are getting hit with
long GC pauses, then it's worth re-thinking your design and tracking down
objects that may be leaking.

Newer JS engines (ie, the ones that implement TypedArrays) already highly
optimize object creation and property access. The benefit therefore seems slim
and highly niche. Looking at the generated code, it seems like the debugging
cost is going to be very high, and it drastically increases the number of
lines that the programmer has to write, which will increase bugs.

I could see the value in certain niche situations, but I would really like to
see the performance characteristics explored more fully.

~~~
dherman

> It's easy to say "X is not cheap, Y is slow".

Yeah, I probably wouldn't have worded it that way. I think the better way to
look at it is: it's hard to predict the performance of JavaScript, given the
complex, dynamic, heuristic optimizations performed by modern engines. This
project is better thought of as an attempt to build a dialect of JS that can
be more easily tuned for performance, because its performance model is simpler
than that of JS.

If I were going to try to make that claim more precise, I might do some
experiments to demonstrate the high _variance_ of object performance in modern
JS engines.

> I could see the value in certain niche situations...

That's really the idea. This isn't meant to be an alternative to JS, but
rather a tool for performance-sensitive kernels. Interoperability will
therefore be key, because it has to be easy to write just a small component of
a larger app in this dialect while smoothly integrating into the rest of the
app.

> ...but I would really like to see the performance characteristics explored more fully.

Perhaps. I'm skeptical of our ability to accurately measure general claims
like "objects are slow" or "X is easier to program in than Y" or "*JS is
easier to performance tune than JS." But I do think it's easier to evaluate
those claims, at least informally, after you've built something you can
experiment with. Hence this experiment!

------
stcredzero
I'm a bit confused by the "it compiles to JavaScript" part. I would've
expected it to compile to C.

~~~
dherman
Think of it more like C (really more like a hybrid of C and JS) compiling to
JS. It runs on the web, but it compiles to a very stylized kind of JS: it
allocates a large typed array that represents a C-like heap and does its
memory access on that typed array. This is the same basic approach that
Emscripten uses for running C/C++ programs.
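
Concretely (my own sketch of the style, not the compiler's actual output), a
struct access in the dialect might compile down to fixed-offset indexing into
the shared heap:

```javascript
// Sketch: `struct Point { int x; int y; }` with `p.y = 5` might compile
// to something like this: one shared typed-array heap plus constant offsets.
const HEAP32 = new Int32Array(1 << 16);
let hp = 0;  // simple bump pointer for this sketch

// sizeof(Point) == 2 words; field offsets: x = 0, y = 1
function newPoint() {
  const p = hp;
  hp += 2;
  return p;
}

const p = newPoint();
HEAP32[p + 0] = 3;  // p.x = 3
HEAP32[p + 1] = 5;  // p.y = 5
```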

The purpose of this project is to experiment with a dialect of JS that
integrates well with regular JS, so you can write performance-sensitive parts
of your application in this dialect, and they can interoperate smoothly with
the other parts of your normal JS code.

------
psykotic
Cool! For a while I've been pondering a similar idea with the goal of cross-
compiling the language to idiomatic C and JavaScript. The JavaScript
translation would necessarily be less than perfectly idiomatic, but it would
be more idiomatic and have better performance and smaller size than the code
generated by Emscripten. The intent was to use it for high-performance
applications like games.

------
petegrif
This is interesting. Who is doing this? And where?

~~~
dherman
This is a project run by Michael Bebenita at Mozilla Research, along with our
colleague Shu-Yu Guo.

------
aphexairlines
Has Typed Scheme shown that this approach yields significant performance
gains, or do JS typed arrays not have a Scheme counterpart?

------
jhrobert
Performance matters. An efficient JavaScript dialect is very much needed
indeed.

Another solution would be to include a C compiler in every browser, with some
DOM binding and a convenient JavaScript bridge.

It will be interesting to see what optimizing interpreters can achieve,
performance-wise, when source code uses only "performance friendly" features.

~~~
ilaksh
Serious question. Do you browse the web with Internet Explorer?

V8 and the new Mozilla engine are optimized to hell already, and they compile
to native code:
<http://kailaspatil.blogspot.com/2011/08/jaegermonkey-architecture.html>. And
they are efficient. These engines have proven that you don't need pointers or
explicit types to be an efficient JavaScript implementation.

The only reason I can see for features like explicit types and pointers is to
support systems programming, where they are required for certain activities
(probably fewer than you would think, though). For that, I would really love
to see a CoffeeScript with those features available for optional use. I think
to do that you would want to find a way to compile directly to machine code or
assembly or LLVM bytecode or something like that.

~~~
hermanhermitage
There is still a lot of performance slippage out there in high-level languages
(e.g. look at the need for assembly when people write raw codecs in C: x264
etc). We are rapidly approaching the battery wall on mobile devices, and the
clock wall is already here. ISAs may be extended to support dynamic languages
more efficiently (tags, direct uop translation from user-specified
instructions), but right now, at a given TDP, static execution is greener for
many applications than dynamic, and history has shown us CPU vendors are
unwilling to go down this path.

JavaScript is anything but efficient right now: some code paths are fast, but
with a high startup cost. Look at the startup problem with Chrome/V8, for
instance, or watch a nodejs application hit the GC wall with 800MB of live
data.

There are times when static is faster than dynamic and vice-versa; times when
heavy upfront compilation is better than incremental run-time analysis and
vice-versa. As the clock wall, TDP wall and process-communication wall hit, we
need all the tools available to exploit maximum performance.

~~~
ilaksh
I'm proposing to compile as much upfront as possible, more than the default
for those engines. But for something like JavaScript, without the types
specified or inferred in all cases, you can't necessarily compile everything
upfront. Type inference would be much better than requiring manual
specification of all types.

I think that it is just much better software engineering to improve the GC and
JIT compilation rather than to code all of the memory management and types
manually or use pointer tricks. If you are building a codec or critical part
of an operating system then you may need assembly or well-defined types for
static compilation.

It would be nice to have the memory management and other features available
for when you need them but I don't think they should be the default.

Anyway, I think it could actually be useful to find ways to remove the
separation between assembly-level coding and higher-level coding. For example,
if I were writing a codec in CoffeeScript, I would probably write something
like interleave.highBytesFromQuads rather than PUNPCKHBW.

~~~
hermanhermitage
I agree totally about removing the separation between assembly level coding
and higher level coding.

With current ISAs, no matter how much genius is thrown at GC/JIT, it's always
going to yield a layer of overhead. The pipes are a fixed width, the caches a
fixed size - the plumbing is static.

Below this floor, a thinner abstraction will yield greater performance. The
thinner abstraction is useful for implementing the GC/JIT itself. Any language
that won't let you bust outside of the GC heap is always going to hit a pain
point.

Until a language can self-host with no significant efficiency loss, there will
always be another language to wedge under it. Until a platform has a mechanism
to expose its raw feature set up to a hosted language, we will forever be in a
world of software rasterizers and sluggish Java UIs, where software developers
re-implement functionality that highly tuned hardware pipes already provide.

Better to be able to write using interleave.highBytesFromQuads, and better to
be able to include your implementation of it using a punch-through to
PUNPCKHBW where available, or an emulation where not. I guess it's the old
argument of high-level interfaces versus low-level ones. Useful high-level
abstractions appear over time, but without access to the low level we cannot
experiment and build them on a rapid cycle.
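
For instance, a generic-code emulation of PUNPCKHBW (my own sketch; the
instruction interleaves the high four bytes of two 8-byte operands) could look
like:

```javascript
// Emulation sketch of PUNPCKHBW for two 8-byte operands:
// take the high 4 bytes of each and interleave them a, b, a, b, ...
function interleaveHighBytes(a, b) {
  // a, b: Uint8Array of length 8; returns a Uint8Array of length 8
  const out = new Uint8Array(8);
  for (let i = 0; i < 4; i++) {
    out[2 * i]     = a[4 + i];
    out[2 * i + 1] = b[4 + i];
  }
  return out;
}

const lhs = Uint8Array.of(0, 1, 2, 3, 4, 5, 6, 7);
const rhs = Uint8Array.of(10, 11, 12, 13, 14, 15, 16, 17);
const r = interleaveHighBytes(lhs, rhs);
```

A "punch-through" version would be the same function, just recognized by the
engine and replaced with the native instruction where available.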

I'm sure some OS vendors would like to keep the browser crippled, because in
the natural end game their OS doesn't need to exist as an expensive product.
It's good to see Mozilla pushing the boundaries.

------
phamilton
As a C programmer I like the concept, but debugging is going to be a
nightmare. Look at how pointers translate to JavaScript. I don't foresee being
able to debug anything with code like that in Chrome Developer Tools.

~~~
sounds
The code is still very young (less than a month old). Debugging in C is done
by printing hex values and using valgrind, so debugging here can be done by
printing hex values (array indexes in the heap or stack) and by building a
valgrind equivalent, one that could run at about the same speed as a
non-valgrind session, since it can be implemented in the JS engine.

~~~
mbebenita
Actually, you could build something like Valgrind quite easily: replace the
underlying heap with a JS proxy and trap all reads/writes. It would be a fun
project.
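
A tiny sketch of that Proxy idea (illustrative only):

```javascript
// Sketch: wrap the heap in a Proxy that logs every indexed read/write,
// Valgrind-style. Slow, but useful as a debug mode.
const raw = new Uint32Array(16);
const log = [];

const heap = new Proxy(raw, {
  get(target, prop) {
    if (typeof prop === "string" && /^\d+$/.test(prop)) {
      log.push(["read", Number(prop)]);
    }
    return Reflect.get(target, prop);
  },
  set(target, prop, value) {
    if (typeof prop === "string" && /^\d+$/.test(prop)) {
      log.push(["write", Number(prop), value]);
    }
    return Reflect.set(target, prop, value);
  }
});

heap[3] = 7;        // trapped and logged
const v = heap[3];  // trapped and logged
```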

------
wowoc
If it ever gets popular, it will happen _despite_ its name. Googling it would
be a flicking nightmare. Or change the name now, before it spreads.

------
drivebyacct2
I was skeptical at first, but I've been writing so much Go lately that the
struct syntax, typing, and pointers would be lovely to be able to use in a web
app without having to go to such an extreme as to use Dart.

