Opal: Ruby to JavaScript Compiler (opalrb.org)
67 points by napolux on June 5, 2014 | 67 comments


I'm going to use this as a little example of why compiling to JS is hard. I work on the Dart team, and people often ask what the big deal about JS compilation is. Opal's compiler is a good example of why it can be hard.

I want to stress, though, that I'm not singling Opal out here. I think Opal is a really cool project, and I hope it works well for lots of people. It's just a good example language since it's on HN right now.

Let's compile this Ruby code to JS with Opal:

    a = 0
    10000000.times do
      a = a + 1
    end
    puts a
Opal gives us:

    /* Generated by Opal 0.6.2 */
    (function($opal) {
      var $a, $b, TMP_1, self = $opal.top, $scope = $opal, nil = $opal.nil, $breaker = $opal.breaker, $slice = $opal.slice, a = nil;

      $opal.add_stubs(['$times', '$+', '$puts']);
      a = 0;
      ($a = ($b = (10000000)).$times, $a._p = (TMP_1 = function(){var self = TMP_1._s || this;

      return a = a['$+'](1)}, TMP_1._s = self, TMP_1), $a).call($b);
      return self.$puts(a);
    })(Opal);
We'll compare it to some vanilla JS:

    var a = 0;
    for (var i = 0; i < 10000000; i++) {
        a = a + 1;
    }
I don't care at all that the generated code is a little funny looking. That's fine. Opal's code is actually pretty readable to me. What is a problem for some (many?) users is the performance.

You'll note that Opal did not compile "a + 1" to "a + 1". Instead it generated "a['$+'](1)". That's because Ruby's arithmetic semantics are different from JavaScript's. To implement those semantics correctly, it needs to use a method call instead of using the built-in arithmetic.
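To make the "operators are methods" point concrete, here is a minimal JavaScript sketch. It is not Opal's actual runtime. It assumes a hypothetical `$+` method installed on Number.prototype, mirroring the shape of `a['$+'](1)` above. Patching a built-in prototype like this is for illustration only. The sketch shows why a faithful compiler can't hardcode JS `+`: a Ruby program is allowed to redefine `+` on Fixnum at runtime, and the compiled code has to notice.

```javascript
// Illustration only: a hypothetical "$+" method on JS numbers, standing in for
// Ruby's Fixnum#+. Not Opal's runtime; patching built-ins is not for real code.
Object.defineProperty(Number.prototype, '$+', {
  value: function (other) {
    return Number(this) + other; // default: ordinary numeric addition
  },
  writable: true,
  configurable: true,
});

function compiledLoop(n) {
  let a = 0;
  for (let i = 0; i < n; i++) {
    a = a['$+'](1); // dispatch through the method, as a Ruby-faithful compiler must
  }
  return a;
}

const before = compiledLoop(5);

// A Ruby program may monkey-patch Fixnum#+ at runtime. Hypothetical redefinition:
Object.defineProperty(Number.prototype, '$+', {
  value: function (other) {
    return Number(this) + 2 * other;
  },
  writable: true,
  configurable: true,
});

const after = compiledLoop(5); // the same compiled code now observes the new `+`
console.log(before, after); // 5 10
```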

We can profile the two using this fiddle: http://jsfiddle.net/3UtNf/1/

On my laptop, the Opal code is 264 times slower than the raw JS code. In other words, it runs at less than 0.4% of the speed of the JS code. Now imagine sacrificing that much perf on a mobile device. That's enough to make the language unsuitable for many real-world use cases.

This isn't intractable, though. You just need to compile math down to real JS arithmetic operators when JS's semantics line up with your language's (which typically means when you're sure you've got numbers and not some other type with a user-defined operator).
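The emit decision itself is easy once the types are known. Here is a minimal sketch of the code-generation side. It assumes a hypothetical `emitAdd` helper that receives each operand's generated code plus a statically inferred type. It also assumes the target maps Ruby numbers onto plain JS numbers.

```javascript
// Hypothetical code-generator fragment: pick native JS `+` or a Ruby-style
// method send, based on statically known operand types.
// Assumption: 'number' means "proven to be a number" and Ruby numbers are
// represented as plain JS numbers, so native `+` is acceptable there.
function emitAdd(left, right) {
  if (left.type === 'number' && right.type === 'number') {
    return `${left.code} + ${right.code}`; // fast path: real JS arithmetic
  }
  // Unknown or user-defined type: fall back to the general method call.
  return `${left.code}['$+'](${right.code})`;
}

const fast = emitAdd({ code: 'a', type: 'number' }, { code: '1', type: 'number' });
const slow = emitAdd({ code: 'a', type: 'unknown' }, { code: '1', type: 'number' });
console.log(fast); // a + 1
console.log(slow); // a['$+'](1)
```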

Determining where you can do that is the hard part. It requires type analysis. Doing that well in a language that doesn't have a sound static type system requires whole-program analysis. It's extremely complex, monolithic, and leads to very strange output code.
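As a rough illustration of why the analysis has to see the whole program, here is a toy flow-insensitive type inference in JavaScript over a hypothetical mini IR. A variable's type is the union over every assignment to it anywhere, iterated until nothing changes. So a single assignment from a Foo somewhere else is enough to force the slow path. Real analyses, dart2js's included, are far more involved. This only shows the shape of the problem.

```javascript
// Toy, flow-insensitive type inference over a whole (tiny) program.
// Hypothetical IR: each statement is { target, expr }, with expr one of
//   { kind: 'num' } | { kind: 'new', cls } | { kind: 'var', name }
//   | { kind: 'add', left, right }
// Types are sets of names; 'dynamic' means "may need a method send".
function inferTypes(program) {
  const env = new Map(); // variable name -> Set of possible types

  function typeOf(expr) {
    switch (expr.kind) {
      case 'num': return new Set(['number']);
      case 'new': return new Set([expr.cls]);
      case 'var': return env.get(expr.name) ?? new Set();
      case 'add': {
        const l = typeOf(expr.left);
        const r = typeOf(expr.right);
        if (l.size === 0 || r.size === 0) return new Set(); // nothing known yet
        const allNumbers = [...l, ...r].every((t) => t === 'number');
        return new Set([allNumbers ? 'number' : 'dynamic']);
      }
      default:
        throw new Error(`unknown expr kind: ${expr.kind}`);
    }
  }

  // Iterate to a fixed point: sets only grow and the set of type names is
  // finite, so this terminates regardless of statement order.
  let changed = true;
  while (changed) {
    changed = false;
    for (const { target, expr } of program) {
      const current = env.get(target) ?? new Set();
      for (const t of typeOf(expr)) {
        if (!current.has(t)) {
          current.add(t);
          changed = true;
        }
      }
      env.set(target, current);
    }
  }
  return env;
}

const addOne = { kind: 'add', left: { kind: 'var', name: 'a' }, right: { kind: 'num' } };

// a = 0; a = a + 1   -> a is only ever a number, so `+` can be native
const numeric = inferTypes([{ target: 'a', expr: { kind: 'num' } }, { target: 'a', expr: addOne }]);

// a = Foo.new; a = a + 1   -> a may be a Foo, so `+` must stay a method call
const withFoo = inferTypes([{ target: 'a', expr: { kind: 'new', cls: 'Foo' } }, { target: 'a', expr: addOne }]);

console.log([...numeric.get('a')], [...withFoo.get('a')]);
```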

This is why, for example, Dart's dart2js compiler is so complex and heavyweight. It does do this kind of analysis. It compiles this Dart program:

    main() {
      var a = 0;
      for (var i = 0; i < 10000000; i++) {
        a = a + 1;
      }
      print(a);
    }
to this JS:

    function() {
      var a, i, line;
      for (a = 0, i = 0; i < 10000000; ++i)
        ++a;
      line = "" + a;
      H.printString(line);
    }
That has the same performance as the JS code. It can do this because it knows "a" is a number. If we change the Dart code to:

    class Foo {
      operator +(other) => this;
    }

    main() {
      var a = new Foo();
      for (var i = 0; i < 10000000; i++) {
        a = a + 1;
      }
      print(a);
    }
Then the generated JS changes completely:

    function() {
        var i, line;
        for (i = 0; i < 10000000; ++i)
          ;
        line = H.S(new Q.Foo());
        H.printString(line);
      }
Interestingly, here you can see the compiler understood that the "+" operator on Foo always returns the same object and was able to inline the call to it and then hoist it out of the loop completely.
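For readers who want the two steps spelled out, here is a hand-applied JavaScript sketch of inlining followed by dropping the now-useless work. The `Foo` class and its `plus` method are hypothetical stand-ins for the Dart class above, and the rewrite is only valid because constructing a Foo has no side effects. This is not dart2js's actual pipeline.

```javascript
// Hypothetical JS stand-in for the Dart class: `+` ignores its argument and returns `this`.
class Foo {
  plus(_other) {
    return this;
  }
}

// Before: a literal translation makes one method call per iteration.
function naive(n) {
  let a = new Foo();
  for (let i = 0; i < n; i++) {
    a = a.plus(1);
  }
  return a;
}

// After, applied by hand:
// 1. Inline plus: its body is `return this`, so `a = a.plus(1)` becomes `a = a`.
// 2. `a = a` does nothing, so the loop body is empty and `a` is just the Foo
//    allocated up front, which can be created where it is used instead.
function optimized(n) {
  for (let i = 0; i < n; i++) {
    // empty, like the dart2js output above
  }
  return new Foo();
}

console.log(naive(3) instanceof Foo, optimized(3) instanceof Foo); // true true
```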

This is the kind of stuff you need to do if you want to have a language that compiles to JS and (unlike, say, CoffeeScript and TypeScript) has semantics that aren't very, very similar to JS.


Not necessarily. You can compile a Ruby VM from C to JS, as people have done. That should give you somewhere around 2 times slower performance, or better, not 264 times slower (and with very little effort).

Things get more complicated if your VM has a JIT, but there are interesting results even there, see pypy.js.

This approach lets you have arbitrary semantics, even ones that differ hugely from JavaScript, with decent performance.

I've suggested this in the past on HN - I think that approach could work for Dart as well. Would be happy to help investigate it.


> That should give you somewhere around 2 times slower performance, or better, not 264 times slower (and with very little effort).

You are comparing different ratios: munificent was comparing against JS and your "2 times" prediction is against native performance of the compiled VM.

> there are interesting results even there, see pypy.js.

Yes, the most interesting result is that you have to warm up pypy.js with a blow torch, otherwise performance is abysmal.

> I think that approach could work for Dart as well.

Have you seen people complaining about size of dart2js output? Can you imagine how big emscripten output would be?

I have been arguing that dart2js could have used a hand-written JIT compiler on the client side, but the startup performance compared to AOT would really be a big deal. I think a combination of AOT and JIT would be best, but sadly it is also true that JavaScript lacks the right level of abstraction. It is either too low-level (typed arrays, hand-rolled allocations, etc.) or too high-level.


> You can compile a Ruby VM from C to JS, as people have done.

That sounds to me like it would just kill your startup perf. Users would have to download an entire Ruby VM every time they hit your site, wouldn't they?

Maybe I'm just a luddite, but spending network resources downloading a garbage collector written in JS only so that I can run it in JS... which natively supports GC just seems really gross to me.

Don't get me wrong, I think Emscripten is very very cool. It just feels like a strange fit for applications written in a language whose semantics aren't that far from JS.


I see your point, but don't think it is quite as bad as that. For one thing, it would be cached etc., so it is about as bad as every site on the web using jQuery (that is, not great, but not horrible either).

Yes, it seems ironic to download a GC written in JS, when JS can do GC. But JS can only do SOME types of GC. For example, it lacks destructor callbacks, which things like Lua require. Some other language might need weakrefs, which JS also lacks. So it is not quite that unreasonable to download a GC, since that way you get exactly the semantics you want.

But I do agree it gets less clear when the semantics are very close to JS. I'm not sure if Ruby is close enough, though (CoffeeScript certainly is).


While convoluted, I believe this results in some pretty fast JS: mruby -> LLVM -> Emscripten -> asm.js

See: http://vimeo.com/70673036


Looks like we should have eliminated the loop entirely.

Also, a disclaimer in case anyone tries this themselves: there is a bunch of platform/library code in the Dart output that's not used by such a trivial program and could be eliminated. The dart2js team is focusing on reducing the size of large programs, which usually do use this code, so eliminating it in small programs isn't much of a real-world win right now.


> there is a bunch of platform/library code in the Dart output that's not used by such a trivial program and could be eliminated.

Note that the Opal code also has this.


The core of the issue here is that, in Ruby, it's all about messages and method invocation, right?

I'd suggest that, while correct, your observation about types is a strawman: the generated JavaScript (while quite slow!) follows the semantics of the Ruby language correctly; the addition operator is just a function, and in a different context could've done something entirely different. We could've, say, briefly overwritten the + operator on Fixnum or done silly alias_method tricks.


> follows the semantics of the Ruby language correctly;

Exactly, the compiled code follows the Ruby language correctly. It's also unusably slow.

It's possible to generate compiled code that also follows the Ruby language semantics by using knowledge of which portions of those semantics the actual program in question relies on. That program can be a lot faster.

In other words, if you haven't overloaded arithmetic operators, your program compiled to JS shouldn't pay the runtime penalty as if you had.

That's a compiler's job: preserve semantics but make the resulting code run as fast as possible by whatever means necessary.

If you've got a way to compile Ruby to JS that has tolerable arithmetic perf and doesn't do type analysis or generate a huge volume of code, I would definitely like to know more about it.


> If you've got a way to compile Ruby to JS that has tolerable arithmetic perf and doesn't do type analysis or generate a huge volume of code, I would definitely like to know more about it.

Yeah, that's kinda the point: can you do that and still preserve Ruby semantics?

Let me note here that this is indeed a strawman: Dart has frozen classes, and Ruby is a bit different. To do something like that you would have to inline a ton of ternary operators to check whether the method has been redefined on that particular object.

Something like:

    (a._isNumber && !a._plusRedefined) ? a + 1 : a['$+'](1)
plus listening on new method definitions to set `_plusRedefined`.
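A runnable version of that guard might look like the following. It is a minimal sketch, not how Opal works. It assumes a hypothetical runtime object that routes every method definition on numbers through one function, so that redefining `+` can flip a flag. It also uses `typeof a === 'number'` in place of the hypothetical `_isNumber` property above.

```javascript
// Hypothetical runtime: number methods live in one table, and every
// definition goes through defineNumberMethod ("listening on new method definitions").
const runtime = {
  plusRedefined: false,
  numberMethods: {
    '$+': (self, other) => self + other,
  },
  defineNumberMethod(name, fn) {
    this.numberMethods[name] = fn;
    if (name === '$+') this.plusRedefined = true; // invalidate the fast path
  },
};

// General path: a full method send.
function slowPlus(a, b) {
  return typeof a === 'number'
    ? runtime.numberMethods['$+'](a, b)
    : a['$+'](b); // user-defined object: ordinary method call
}

// The inlined ternary guard from the comment above.
function add(a, b) {
  return (typeof a === 'number' && !runtime.plusRedefined) ? a + b : slowPlus(a, b);
}

const r1 = add(1, 2); // fast path: native +
runtime.defineNumberMethod('$+', (self, other) => self - other); // hypothetical monkey patch
const r2 = add(1, 2); // guard fails, redefined method runs
const r3 = add({ '$+': (o) => `custom ${o}` }, 2); // non-number receiver
console.log(r1, r2, r3); // 3 -1 custom 2
```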

It's definitely a trade-off; readability will suffer.

Opal had optimized operators in the past (that assumed native operations on numbers), they could even be back in the near future. But what I see here is a comparison of two different language semantics. I agree that “compiling to JS is hard”, but I don't think that this demonstrates anything. What can be argued instead is that JS VMs could and should do that for us.


It's not just readability that suffers. Even inlining the check like you do here, what you have will be dramatically slower than raw arithmetic.

(My hunch is that your inlined check would actually be slower than just calling the method. If you just do a method call, inline caching will optimize it away in many cases.)
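For context, inline caching is the trick JS engines use to make repeated method calls at one call site cheap. Below is a simplified JavaScript sketch of a monomorphic inline cache keyed on the receiver's prototype. Real engines key on hidden classes/shapes, do this in generated machine code, and can then inline the cached target. `makeCallSite` is a hypothetical name. The sketch deliberately omits invalidation when a method is redefined or shadowed by an own property, which a real engine must handle.

```javascript
// Hypothetical, simplified monomorphic inline cache for a single call site.
// Limitation (on purpose, for brevity): no invalidation if the method is later
// redefined or shadowed by an own property.
function makeCallSite(methodName) {
  let cachedProto = null;
  let cachedMethod = null;
  let misses = 0;

  function call(receiver, ...args) {
    const proto = Object.getPrototypeOf(receiver);
    if (proto !== cachedProto) {
      // Miss: do the full property lookup once and remember it.
      cachedProto = proto;
      cachedMethod = receiver[methodName];
      misses++;
    }
    // Hit (or freshly filled): call the cached method directly.
    return cachedMethod.apply(receiver, args);
  }

  call.misses = () => misses;
  return call;
}

// Usage: every receiver at this site shares a prototype, so only the first
// call pays for a lookup.
class Counter {
  constructor(n) { this.n = n; }
  plus(k) { return new Counter(this.n + k); }
}

const site = makeCallSite('plus');
let c = new Counter(0);
for (let i = 0; i < 1000; i++) {
  c = site(c, 1);
}
console.log(c.n, site.misses()); // 1000 1
```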

> I agree that “compiling to JS is hard”, but I don't think that this demonstrates anything.

All I was trying to demonstrate was just that. If you want a new language that runs in a browser, you can have:

    1. Different (presumably better) semantics from JS.
    2. A simple compiler that generates readable JS.
    3. Performance competitive with raw JS.
But you only get to pick two. CoffeeScript picks 2 and 3 (though it does subset out some of the bad JS behavior). Dart picks 1 and 3. Opal (from the very little that I know about it) seems to be picking 1 and 2.

My belief is that for a client-side language (i.e. a language that runs on the end user's hardware, which you don't control), you don't have the luxury of sacrificing perf. You can in a server-side language, since you can just throw more hardware at it. That's why languages like C++ still dominate client-side apps and games.

Given that, I don't think a language that targets the browser can really discard #3 and expect to get a decent number of users. Given that I love different languages, I'd be happy to be proven wrong here.


The name is a bit awkward, given the existence of Opalang, which is another $SomeLanguage to JavaScript compiler.


Given that Opal.rb was started in 2010 and Opalang (back then called opages) in 2010 as well, I would file that under "unfortunate".

https://github.com/opal/opal/commits/master?page=107 https://github.com/MLstate/opalang/commits/master?page=126


I don't remember when Opalang was started, but I joined the project in 2008 :) Fwiw, back then the name was Opa. Opages was the name of the CMS written in Opa/Opalang.

Funnily, Opa and Ur/Web were two projects started around the same time and sharing very similar ideas, and they also collided linguistically – in German, Opa == grand-dad, while Ur == ancestor.

Anyway, have fun with Opal.rb :)


Thanks a lot for clearing that up (also, my post is missing a "?" after opages)!


What is a good use case for this? Interesting project otherwise! :)


I have a few developers who can't wrap their heads around too many languages without popping one out. The less variability we have, the more robust the final code will be. Theoretically, all developers are ideal learning machines with a robust CS background, a powerful abstracting mind, and infinite time to learn and jump between languages at will, with a perfect understanding of each language's caveats and idiosyncrasies. In practice, one has to build stuff. This kind of thing, while not a perfect solution, allows my team to produce better code more reliably.


Say you have a function written in Ruby that validates a file. Now you want client-side validation in the browser for good UX. The spec for validation may be complicated, or you are not sure about the implementation (no docs, the person who wrote it just left). It's an internal format with no good parsing library. I'd try it out.


Or you're completely sure about the implementation, but it may change in the future (because requirements evolve) and you don't want to have to maintain two different versions.


I never got to using opal, but I was quite intrigued after watching this https://www.youtube.com/watch?v=GH9FAfKG-qY


What about something like rubular.com inside the browser?


There's something to be said for writing the client and server app using the same language. Node.js serves a similar use case.


What is the thing to be said?


The thing to be said is that client and server often share functionality, and if they are in separate languages, code for that shared functionality is necessarily duplicated in two different languages and must be maintained twice.


Sorry if you were attempting to troll, but if not, 'there is something to be said about...' is a phrase which translates along the lines of 'there are definitely good things that could be said about...'


He's asking what those things are.


I was entertaining the idea of using this to make a CodeCombat for Ruby when it was open-sourced.


Ruby aficionado programmers who think coffeescript is too pythony perhaps? :)


As DouweM said; there is a case to be made for having the whole stack in a single language.

While I use different languages server side, Ruby (either with Rails or Sinatra) is my go to option more often than not these days.

I have used CoffeeScript extensively, but this has always been on contracts where the decision was made by someone else.

It is important to learn and use real JavaScript (IMHO) rather than all of these transpiled languages. This is the route I take when I have a choice on the client side; I use JavaScript.

If I were to use a transpiled language, though, Opal makes more sense (IMHO) than CoffeeScript if you are working on a project with a Ruby backend. At least then you are creating a uniform stack and hopefully getting some kind of productivity boost for your development team, instead of constantly context-switching between different languages.


>I have used CoffeeScript extensively, but this has always been on contracts where the decision was made by someone else.

>It is important to learn and use real JavaScript (IMHO) rather than all of these transpiled languages. This is the route I take when I have a choice on the client side; I use JavaScript.

I've been using Rails and Coffeescript. Intuitively, I feel like using Javascript directly would be better somehow, but I prefer the Coffeescript syntax. (I started using it just because Rails nudges you that direction.)

What are the benefits of using Javascript directly instead of Coffeescript in a Rails environment?


I don't think there would be much benefit except possibly some minor performance gains. But then to even take advantage of these you would likely have to be extremely proficient in native JS.

Personally, I don't really see much of a downside to JS transpiled languages. You are already working with a high-level scripting language. The incredible amount of benefits you gain from using something like Coffeescript or Typescript far exceed any negligible downside to them.

(You do have to trust that the transpiled language is good however. You don't want to accidentally start using a bad language that fails to transpile or is simply harder to work with than native js. With Coffeescript/Typescript being very popular, I would consider them both "safe".)


If you don't know JS really well, I wouldn't recommend Coffeescript. It's hard to tell what will work correctly and what won't unless you're quite familiar with the actual code the Coffeescript is compiling down to. I don't know how true this is for other transpiling languages, but it's definitely true for Coffeescript.

I'm a pretty experienced JS dev, so I know exactly what my CS is compiling to and what that's going to do. If I didn't, CS would make me tear my hair out within a week. As it is, it's just shorthand for JS with a bunch of quality of life improvements (?, no var, etc).


This is a good post by wycats that discusses this:

https://meta.discourse.org/t/is-it-better-for-discourse-to-u...


It seems to me that complaining that you need to use "real JS" instead of a compiled-to-JS language for web apps is like saying you need to use real machine code instead of a compiled-to-machine-code language for any other kind of app.


That is an apples to oranges comparison. Hence the terminology difference between transpiler/compiler.

A transpiler is a very specific subset of a compiler that takes one language that has a "similar" level of abstraction and compiles it to a different language with a "similar" level of abstraction.

C++ compiled to machine code is not at all similar to CoffeeScript transpiled to JavaScript.

If you are working in CoffeeScript/TypeScript/Opal you are still likely to need to understand JavaScript to best leverage existing JavaScript libs/frameworks and to build on or debug those tools when necessary.

A couple of problems arise when you introduce something like CoffeeScript: you have added yet another layer to the infinite pool of required knowledge.

Problems

1) Knowledge required

Now your team is required to know

CoffeeScript(Replace with other transpiled language here) + JavaScript + HTML + CSS + Backend Language of Choice

2) Hiring pool

Eventually you will need to hire someone. This presents a problem on both the employer and the employee side.

Employer Side (Pointy Haired Boss)

We use CoffeeScript/TypeScript/etc... so we need to post that as a job requirement.

Employee Side

Really smart potential employee who wants to apply knows JavaScript; sees posting requires CoffeeScript. This employee hits indeed.com and sees there are <1000 coffeescript jobs total, versus >43,000 javascript jobs. Employee decides to stick with JavaScript and doesn't apply.

This is a bit of an old interview and I don't know Jeremy Ashkenas personally, so I don't know his current stance on CoffeeScript, but I'm going to post it here since it's from the horse's mouth and I believe it illustrates why using JavaScript is really a better choice.

"If the question is 'why is CoffeeScript not a DocumentCloud project?' - it's because I can't justify using it for the main DocumentCloud development. Imagine trying to hire someone. 'You'll have to learn to use a new language that we made up...'" - Jeremy Ashkenas

http://readwrite.com/2011/01/07/interview-coffeescript-jerem...

3) Future proofing

I'm just going to link back to the same wycats link someone else posted, because it's much more elegant than how I'd put it.

https://meta.discourse.org/t/is-it-better-for-discourse-to-u...


How hard is it to use an existing javascript library with Opal?


Since the Opal developers are reading:

I'm curious: do you plan to support Encodings and proper Ruby strings, or is it not worth the effort, so you'll keep using plain JavaScript strings?

(I tried to look for this detail on the website but came back empty-handed)


How do people know that one lang can compile down to another? Is there something about the lang that makes this possible?

Also, is it possible that something won't translate from ruby --> js and cause a bug?


Ruby and JS are both Turing-complete languages (as with just about any other programming language you'll ever use), which means that one can simulate the other. The simulation might be messy, complicated, or slow, but it will still work. In this case, Ruby and JS are close enough that it can be done without too much trouble (see the compiled JS).


But does "anything written in language A can be simulated using language B" imply "A can compile down to B"?


Yes, it does. It may not be easy, efficient, or elegant, but theoretically any problem solvable by a Turing-complete language can also be solved by any other Turing-complete language.



Yes. If you can write a body of code X in language B which performs the same operations as any given body of code Y in language A, then all an A->B compiler needs to do is find X given Y.


Ok, but is finding X given Y tractable?

To me, it seems like compiling is translation of source (usually higher level source to lower level source), whereas simulation is more like translation of behavior.

Edit: and the ability to translate behavior does not seem to imply the ability to translate source


Couldn't you translate source given the ability to translate behaviour? If some behaviour in language X can be simulated by a state machine in Y, then you could just emit said state machine as the translation of the behaviour, give its output to the next state machine, etc. And gradually build up the behaviour of a program in X, but using a bunch of generated code in Y instead of just an interpreter. In other terms, if an interpreter for / simulation of X may be expressed in Y, then for each bytecode / AST node in X you could just inline the bytecode handling you would use in Y into the compiled output.
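The last step of that argument (inline each interpreter handler into the output) can be sketched in JavaScript for a hypothetical toy language with numbers, variables, addition, and `let`. The interpreter and the compiler share the same case structure. The compiler just emits each handler's work as JS source instead of doing it. The sketch assumes variable names are already valid JS identifiers.

```javascript
// Hypothetical toy language AST: num, var, add, let.

// Interpreter: walks the AST and computes a value at run time.
function interpret(node, env) {
  switch (node.type) {
    case 'num': return node.value;
    case 'var': return env[node.name];
    case 'add': return interpret(node.left, env) + interpret(node.right, env);
    case 'let': return interpret(node.body, { ...env, [node.name]: interpret(node.value, env) });
    default: throw new Error(`unknown node: ${node.type}`);
  }
}

// Compiler: same cases, but each one emits the JS its handler would have run.
function compile(node) {
  switch (node.type) {
    case 'num': return String(node.value);
    case 'var': return node.name; // assumes a valid JS identifier
    case 'add': return `(${compile(node.left)} + ${compile(node.right)})`;
    case 'let': return `((${node.name}) => ${compile(node.body)})(${compile(node.value)})`;
    default: throw new Error(`unknown node: ${node.type}`);
  }
}

// let x = 20 in x + 22
const program = {
  type: 'let', name: 'x', value: { type: 'num', value: 20 },
  body: { type: 'add', left: { type: 'var', name: 'x' }, right: { type: 'num', value: 22 } },
};

const source = compile(program); // "((x) => (x + 22))(20)"
const compiled = new Function(`return ${source};`);
console.log(interpret(program, {}), compiled()); // 42 42
```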


I think that's a very clever algorithm.

But I think your first articulation amounts to converting a turing machine to a huge generated state machine, which would be impossible. The behavior of the X program given one input could be modeled as a sequence of states if it terminates. But the process of generation that you described would have to be repeated for all possible inputs into the simulated X program, which would be impossible.

Maybe a single "behavior" could be modeled by a state machine, but it would also seem impossible to me to decompose a program into individual units of behavior.

But as for bytecode -- one could always compile the X code into bytecode, and then decompile that bytecode into Y code. If you can always find a common bytecode language for two languages X and Y, then I think that would be a generic algorithm for X->Y.

Edit: I guess that common bytecode could just be in a language that describes a turing machine.
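The common-bytecode route can be sketched too. This assumes a hypothetical stack bytecode with three opcodes (`push`, `add`, `mul`). A toy expression is compiled to it, and the bytecode is then translated into straight-line JS by simulating the stack at compile time and naming each intermediate result.

```javascript
// Step 1: toy AST -> hypothetical stack bytecode.
function toBytecode(node, out = []) {
  switch (node.type) {
    case 'num':
      out.push(['push', node.value]);
      break;
    case 'add':
    case 'mul':
      toBytecode(node.left, out); // operands first...
      toBytecode(node.right, out);
      out.push([node.type]); // ...then the operator pops two, pushes one
      break;
    default:
      throw new Error(`unknown node type: ${node.type}`);
  }
  return out;
}

const OPERATORS = { add: '+', mul: '*' };

// Step 2: bytecode -> JS. The stack holds JS expressions (literals or temp
// names) rather than values, so the stack machine disappears from the output.
function bytecodeToJs(code) {
  const stack = [];
  const lines = [];
  let temp = 0;
  for (const [op, arg] of code) {
    if (op === 'push') {
      stack.push(String(arg));
      continue;
    }
    const symbol = OPERATORS[op];
    if (!symbol) throw new Error(`unknown opcode: ${op}`);
    const right = stack.pop();
    const left = stack.pop();
    const name = `t${temp++}`;
    lines.push(`const ${name} = ${left} ${symbol} ${right};`);
    stack.push(name);
  }
  lines.push(`return ${stack.pop()};`);
  return lines.join('\n');
}

// (2 + 3) * 4
const expr = {
  type: 'mul',
  left: { type: 'add', left: { type: 'num', value: 2 }, right: { type: 'num', value: 3 } },
  right: { type: 'num', value: 4 },
};
const js = bytecodeToJs(toBytecode(expr));
console.log(js);
console.log(new Function(js)()); // 20
```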


If there is a bug in the compiler, it is possible that something won't translate from Ruby to JS and will either:

* throw some error from the compiler.

* cause a bug in the resulting code.


If performance is an issue, you can drop back down to JS using %x{} or backticks. It's a good idea to do this for mathematical operations.


I pasted Opal-generated JS code into the Chrome console, but I get an error that "Opal" is not defined.


Never mind, solved.

I see potential on this.

Thank you.


I'd rather learn JS when I need to. Auto-generation is a waste of time in this case.


How would Opal handle meta programming like method_missing and define_method?



This is not a pure way of doing metaprogramming. It looks at the code to figure out which methods are going to be called in the future.

    Opal.add_stubs(["first", "second", "to_sym"]);

But let's say I am dynamically generating a method using a string passed from user input or the server; then this would fail.


I had the same fear at first, but actually, if you generate a method name dynamically, you need to use `#__send__` (and friends) to call it.

Of course `#__send__` supports method_missing: https://github.com/opal/opal/blob/0-6-stable/opal/corelib/ba...
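Roughly, a name built at runtime still works because `__send__` looks the method up by name at call time and falls back to method_missing. Here is a hypothetical JavaScript sketch of that dispatch, not Opal's implementation. It borrows the `$`-prefixed method naming visible in the Opal output at the top of the thread.

```javascript
// Hypothetical dynamic dispatch with a method_missing fallback, in the spirit
// of Ruby's __send__. Methods are stored under "$"-prefixed names.
function send(receiver, name, ...args) {
  const method = receiver['$' + name];
  if (typeof method === 'function') {
    return method.apply(receiver, args); // found at call time, no stub needed
  }
  const missing = receiver['$method_missing'];
  if (typeof missing === 'function') {
    return missing.call(receiver, name, ...args);
  }
  throw new Error(`undefined method '${name}'`);
}

const greeter = {
  $hello(who) { return `hello, ${who}`; },
  $method_missing(name, ...args) { return `no method ${name} (${args.length} args)`; },
};

const methodFromUserInput = 'hello'; // e.g. a name only known at runtime
console.log(send(greeter, methodFromUserInput, 'world')); // hello, world
console.log(send(greeter, 'wave', 1, 2)); // no method wave (2 args)
```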


Great, I love Ruby. But what do I do for debugging?


(Opal developer here).

When it comes to debugging, I just use the standard Chrome dev tools with source maps. Variables and properties on objects all compile using their Ruby names, so viewing local vars and ivars in the debugger is easy. I only ever debug the generated JavaScript when I am tracking down a bug in Opal's runtime, internals, or compiled code.


If you use any 'something-to-another' abstraction, be it language, ORM, whatever... You pray that it doesn't come to debugging.

Might be a great tool for adding a grain of JS salt. But of course if you want to build an entire project based on JS, you better get a book and learn JS...


So compile a bad language to worse :) (Ruby developer here)

I think it makes sense to transcompile type-checked languages to JS, though. Elm is pretty neat.


AFAIK, it's not really a Ruby-to-JavaScript compiler (the way CoffeeScript compiles down to JS).

You can't write xhr = XMLHttpRequest.new and expect it to compile to JS without importing some libs.


It is really a ruby to JS compiler.

Ruby, however, is not just a thin layer over JS the way CoffeeScript is. The thing you are referring to is a sign of CoffeeScript being a thin layer over JS, not CoffeeScript's compiler being a "real compiler".


Interestingly enough though they are both transpilers (source-to-source compilers).


My point was that it doesn't transpile Ruby to JavaScript.


It does, though; it just doesn't expose things that are exposed in the underlying JS environment to Ruby code without jumping through certain hoops. Which makes sense, because unlike CS, which is tied to the underlying JS environment, Ruby isn't, and exposing the underlying JS environment would make it harder to port general-purpose Ruby code as you'd be more likely to run into collisions with the JS environment that you wouldn't see on other Ruby platforms.


My use case was an online IDE where, instead of using JavaScript to code web apps, users would use Ruby to do the same. But since Opal doesn't expose the window object, it's pretty useless for me. Maybe you have a solution for that, but I just didn't find one.


Exposing the window object is a different issue from being a transpiler and, in any case, you can access the underlying JS environment in Opal, including fairly simple access to things like the window object. http://opalrb.org/docs/interacting_with_javascript/



