Hacker News new | past | comments | ask | show | jobs | submit login
Architecture for a JavaScript to C compiler (timr.co)
90 points by timruffles 39 days ago | hide | past | web | favorite | 45 comments

I don't understand why people keep making js-to-native compilers. the result is always less performance that what modern js engines can do to the original source code, because all those runtime behaviours the author mentions build up really fast. js engines actually run the code and figure out the types and such to create highly optimized machine code.

Yeah, the JS runtimes are really impressive. I'd be interested to see what a compiler could do with TypeScript though. There could perhaps be guidelines of which features you can use in hot code paths in order to get optimised code, which could be enforced by the compiler for sections of the code which you mark...

I think that the AssemblyScript project is halfway there. It compiles a stricter subset of Typescript to webassembly format, which can be then translated to CPU-specific machine code.

> Yeah, the JS runtimes are really impressive. I'd be interested to see what a compiler could do with TypeScript though.

You can probably just compare AOT and JIT Java or C# implementations and get something of an upper bound.

Actual improvements will likely depend on your use case: for short-running programs the overhead of the parsing, JIT machinery & interpretation might dwarf the execution time so even a straightforward compilation full of virtual calls will edge out the pre-JIT interpreter, however for longer-running programs the JIT should be able to devirtualise, inline and optimise the code much better than the AOT compiler will.

Inlining is the most important optimisation, and it's hard to AOT inline dynamic dispatch.

I would imagine JS JITs can determine more specific types at runtime than you can express in TypeScript source code.

Could you ELI5 why you think that?

TypeScript lets you annotate that a parameter will only ever be a number, but the JIT can find that out for itself by seeing that it’s only ever a number when it runs the program and a number is all you pass to it, using stamps and class profiling. The JIT can also see types beyond that, like that it’s only ever a number and it’s always below 0xff, using stamps and value profiling. The JIT has to guard that the type is as expected, yes, but so would the static compiler in order to do the more specific types.

Lets say you've built a house in lego. But the computer do not know it's actually a house. Something called a compiler has to take apart the lego house, then package the parts in groups of 8,16,32 or 64 (dissembles the house into stacks with 8 lego bit's in each). Then the compILer have to tell the compUter what to do with the stacks, and what each stack is. For example take this stack with red pieces and this stack with green pieces and build a wall, then take these bits, and combine with these bits to build a roof, etc. The computer is dumb, but it's very fast.

Modern JS engines rely on tracing JITs to achieve good performance. These optimizations only kick in once code has executed multiple times.

But a lot of JS code executes only once, such as layout code running at app launch. Heck, lots of JS code executes zero times. This code still imposes a cost (parsing, etc).

Consider the complaints about app launch time of Electron apps. A static compiler can be more effective at the runs-once or runs-zero cases.

Not tracing. Speculation.

The difference is that you compile user control flow as-is unless you have overwhelming evidence that you should do otherwise.

And yes you are right. This is promising for run-once code, but all of the cost will be in dynamic things like the envGet. An interpreter can actually do better here because most environment resolution can be done as part of bytecode generation. So it’s possible that this experiment leads to something that is slower than JSC’s interpreter.

Agreed that envGet() will be brutal, but optimizations can eliminate a lot of that, e.g. by promoting locals to stack variables instead of heap variables.

It is possible to statically decide which environment contains each upvar. So I don't understand your conclusion that this experiment may be slower than JSC's bytecode interpreter. What information can JSC exploit at this stage that is statically unavailable?

Yeah, and escape analysis could be used to avoid putting variables that can't be easily placed onto the stack in the scope of a garbage collector. Further optimization work could be performed to compute variables with constant results or to partially compute expressions. It's interesting, got me thinking about building one.

I got to thinking... with the right types, isn't it possible to take each JS function and emit a Rust generic function over every JS type (value/object) and have the Rust compiler emit optimized code for each specialization. Each invocation at compile time can be optimized by the Rust compiler then and so long as there's no eval or dynamic dispatch in the translation unit, each call should be maximally efficient and all unused paths can be trimmed statically.

Not possible to do statically if you’ve got eval or with. But maybe this experiment will not support those.

JSC has speculations that eval won’t introduce variable names we’ve already resolved for example.

v8 is remarkable but what is described in those links is primarily a JIT with a cache tacked on. A pipeline optimized for static compilation would make different tradeoffs.

Indeed, only so much can be done to improve v8 in this regard.

Tracing and other dynamic analysis can do a lot to improve the quality of generated code -- it's effectively a realtime PGO. On the other hand, there's a substantial activation energy in the form of the initial lex, parse, AST generation/manipulation, interpretation and dynamic analysis, re-optimization, etc. that can all be handled ahead of time, albeit with less information to optimize.

> Modern JS engines rely on tracing JITs

Which modern JS engine still relies on tracing? I thought they’d all moved on from that technique many years ago, but I’m not an expert in JS.

Yes, I misused a term of art, thank you for the correction. I meant only to distinguish modern engines that rely on runtime profiling, from the wholly static approach in this post.

Can the compiler really do much considering how dynamic Javascript is?

Depends on your definition of "much", but more than you might expect.

Pre-parsing is an obvious big win, but there's also classical compiler optimizations: constant propagation, DCE, certain inlining, etc. Local variables can be statically determined to be non-captured, which means they can be stack allocated.

One totally bizarre but profoundly useful property of JS is the restrictions on eval. eval has the ability to run new code, but this code is globally scoped unless it's run as the "eval" function itself. [1] Thus it's easy to decide which local variables are immune to eval; this in turn unlocks lots of optimizations. To my knowledge this weirdo behavior is not present in Python or other dynamic languages.

1: https://es5.github.io/#x15.

I can not visualize what you mean. What does it mean to be run as the eval function itself?

It's the dorkiest thing you can imagine, literally a string comparison against "eval".

In JS functions are first class, so one might attempt:

    function wut() {
      var x = 1;
      var obj = {sneaky: eval};

Here we are calling `obj.sneaky()` with some JS code. The sneaky property is the eval function: won't it run the code and thereby increment x?

The answer is no: because the caller's name 'obj.sneaky' does not literally compare equal to "eval", the eval function is run with global scope instead of local scope. Therefore the 'x' inside the eval'd string is a global property, not the local variable.

So with this analysis we can (statically) apply constant propagation and replace x with 1.

Now if we replace obj.sneaky with eval, the string comparison succeeds, it becomes "direct eval", and the constant propagation is invalid.

Since this seems such a roundabout way of calling eval, does anyone actually use this?

Well usually we want to run code at global scope, and direct eval is an obstacle.

There's workarounds like "(1, eval)", but the most idiomatic way is the Function constructor, which always runs code at global scope. That leads to this pattern:

    var global = (new Function("return this")())
which really is the safest way to get the global object. I know right.

TIL. I love this.

I guess that makes sense. You can optimize the not so dynamic parts of the code.

Oh sure, it's just for fun/learning! Check my first post.

Maybe you want to port some existing JS to another platform (eg. OS X, iOS, Android) where you want to run it natively rather than spinning up a JS engine and dealing with the bridge.

It may be slower, but I'd give up speed if this would result in less memory usage.

On the one hand, you’re totally right.

But maybe there will be some breakthrough, so it’s important to stay open-minded. That breakthrough may be something modest like if this style of JS execution was better for some niche use case.

I could imagine a use case: you want to include some Javascript code base into your own EXE, but you don't want to drag in node.js or something heavy like that.

I wonder how much optimization will it bring to compared to existing JS runtimes such as V8.

Thanks to competing world for web browsers, JS runtimes not only efficiently parse to optimized native code but also provide really good JIT compilation benefits.

Speculative optimization for V8 - https://ponyfoo.com/articles/an-introduction-to-speculative-...

Parallel and Concurrent GC - https://v8.dev/blog/trash-talk

Good summary on 10 years of V8 - https://v8.dev/blog/10-years

v8-style engines do not parse to optimized native code.

As described in the links, v8 parses to an AST, which then is compiled to bytecode. A bytecode VM then executes the JS, collecting runtime type information, which is input (along with the bytecode itself) into the next compilation tier; only at that point is machine code generated.

The key idea is that v8 expects to execute the JS code before it can generate native code. It won't generate native code from parsing alone.

A lot of research and hard work has gone into TclQuadCode [0], which compiles Tcl (which is even more dynamic than JavaScript) into machine code via LLVM.

The authors indicated at one point it took around 5 PhDs to get it going.

[0] https://core.tcl.tk/tclquadcode/dir?ci=trunk

This is bizarre and fascinating. I had no idea there were Tcl codebases of a size that could benefit from this sort of perf work. How much Tcl is out there?

I actually think this is possible; and started prototyping it (because esprima is awesome, and not to many other languages have an equivalent that's so easy to use).

Some thoughts: I think it's easier to target C++ than C, since C++ can help you write more type generic code. I think it's easy to generate tagged unions, then for optimizations try to prove monomorphism. Finally, it may be simpler to start off with support for typescript, and fail to compile if there are any ANY types. I do think it's possible though. JS/TS -> C++ -> WASM (yes, I was out of my mind when I thought of this)

Isn't this kind of what V8 and other modern JavaScript engines do on the fly already?

Yes: one of the Google engineers working on V8 talked about it here: https://softwareengineeringdaily.com/2018/10/03/javascript-a...

It was this conversation that make me wonder if at some point in the future V8 might have experimental support natively for TypeScript but it makes more sense that compiling web assembly to native binary would make more sense. Who knows? It's an awesome time to be a programmer!

With all the dynamic stuff Javascript has it seems really difficult to create performant C code.

There is a PHP to .NET compiler which probably has similar problems. On second thought that one is probably easier because .NEt has a dynamic runtime.

That, and in practice modern PHP is often relatively staticly typed (classes declare their fields, etc), and many codebases even include type annotations (which are supported in PHP and usually enforced at runtime).

Congrats to completing this project. What's the status and further plans for it? I didn't find a license.

How are exceptions handled with this design? For example the `n < 3` may throw (it invokes the valueOf method).

I went for longjmp(), non-local goto, for precisely the reason you highlighted: I realised pretty well everything in JS can throw, so needed something easy to trigger from anywhere

See https://github.com/timruffles/js-to-c/blob/1befbf4220753576e...

I wonder why people still try write transpiler from JS to C/C++ or LLVM (which make sense at least). But this not performant way and usually produce much bigger overhead than jit vm which use speculative optimizations.

Some projects: 1. https://github.com/fabiosantoscode/js2cpp 2. https://github.com/raphamorim/js2c 3. https://github.com/ammer/js2c 4. https://github.com/NectarJS/nectarjs 5. https://github.com/ovr/StaticScript

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact