Before speculating too much about "a bytecode standard", etc., it would probably be helpful to understand virtual machines and instruction sets.
I know web programmers generally aren't big on assembly or writing virtual machines, but an instruction set (group of bytecodes) design and implementation predisposes a processor/VM to certain operations. A VM for an OO language is going to have bytecodes (and other infrastructure) for doing fast method lookup, because it will really hurt performance otherwise. A functional or logic language VM will probably have tail-call optimization, perhaps opcodes specific to pattern matching / unification, and a different style of garbage collector.
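To make that concrete, here's a toy sketch (in JavaScript, all names invented) of the machinery behind a hypothetical CALL_METHOD opcode: an inline cache that turns repeated method lookups at a call site into a single pointer compare.

    function Klass(methods) { this.methods = methods; }
    Klass.prototype.lookup = function (name) { return this.methods[name]; };

    // One cache per call site; a VM would embed this in the instruction stream.
    function callMethod(receiver, name, cache) {
      if (cache.klass === receiver.klass) {
        return cache.method.call(receiver);       // fast path: cache hit
      }
      var method = receiver.klass.lookup(name);   // slow path: full lookup
      cache.klass = receiver.klass;               // remember for next time
      cache.method = method;
      return method.call(receiver);
    }

    var Point = new Klass({ describe: function () { return "a point"; } });
    var site = { klass: null, method: null };
    callMethod({ klass: Point }, "describe", site); // miss: does the lookup, fills the cache
    callMethod({ klass: Point }, "describe", site); // hit: skips the lookup entirely

A VM whose instruction set lacks that kind of support pays the slow path on every call.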
Compare the JVM to the Lua or Erlang virtual machines; think about the issues people run into when trying to port languages that aren't like Java to the JVM. Unless people are very deliberate about making a really general instruction set, a "bytecode standard" informed by Javascript could be similarly awkward for languages that aren't just incremental improvements on Javascript. Besides, you can't optimize for everything.
There are a LOT of details I'm glossing over (e.g. sandboxing/security concerns, RISC vs. CISC, the DOM), but I've been meaning to point this out since I read someone saying, "Why do people keep writing more VMs? Why don't we just use the JVM for everything and move on?" It's not that easy.
Another issue is that when you create a virtual machine you basically tie down your potential for optimization to what the opcodes can do, instead of optimizing for the language itself. This is a major problem with Java.
For example, Smalltalk has an advantage over Java in that VM designers can change the opcodes as needed to get better performance. That's not possible with Java because of the JVM.
The future of the language may be compromised by the decisions you make on the VM. For example, think of the problems Java has in fully supporting 64-bit software -- most of it based on the decisions that were made when 32-bit processors were the norm.
All true. But I think the potential benefits are real, and I don't necessarily think it's a bad thing if the JS VM was specialized for JS. Standardizing the byte-codes could allow a looser coupling between browsers and JS. It could also allow for people to play with JS optimizations and augmentations without having to touch the VM internals itself.
All of this makes me think that if I was a VM researcher, I'd seriously consider going in this direction. And not being a VM researcher makes me think maybe I should be one.
However, the more useful question is how long it takes to convert JS code to bytecode and how much eliminating that step would boost overall performance. It seems the conversion time is very small compared to execution times. https://lists.webkit.org/pipermail/squirrelfish-dev/2009-May...
On the contrary, V8 compiles JS directly to machine code, while still following JVM/JIT techniques. More details by Lars Bak here http://channel9.msdn.com/Shows/Going+Deep/Expert-to-Expert-E... and here http://www.youtube.com/watch?v=hWhMKalEicY V8 also does some less traditional things like snapshotting, hidden classes, etc., which give incremental performance boosts. If there was a bytecode standard, some V8 techniques might not be applicable, but it should be possible to maintain the performance boost.
I would love to see bytecode become a standard, so that people could stick to whatever language they are comfortable with. For instance, I like both JS and Python, but if there were an option, I would stick to Python for all my needs.
Sure, so it'd be hard to come up with a bytecode standard that worked really well for everything you might want to run on it - but you could probably come up with a reasonable bytecode / VM that would work reasonably well for most things.
For example, imagine that there was no threat of being sued by Oracle, etc. You could just use the JVM - that already has lots of things that compile to it which work reasonably well. I'm not arguing that we should use the JVM, but only that something like the JVM seems to work reasonably well.
Anyway - having a way that's OK or reasonable to run (say) Ruby in the browser is a lot better than the current situation where there is no such way (without using proprietary stuff).
It's probably less to do with a bytecode 'standard' vs. a set of standard libraries for doing things outside of the browser (file manipulations, etc). Then browsers could just not support those libraries, but JavaScript VM developers could. In this way, it would be possible to not have 'the future of JavaScript' tied down to a specific bytecode implementation. Then people could pick and choose the VM that they want to run their JS in based on what sort of optimizations they needed.
One of the largest requirements here would probably be a method of linking against/using C libraries, and also a standard for 'import/#include/etc' statements.
Maybe I'm being naive here though. Feel free to correct me.
I'm not talking about pulling C libraries into the browser. I'm talking about expanding the use of the JavaScript language outside of the browser (beyond even something like Node.js).
Unless I'm completely misunderstanding NativeClient, that's not what it's about.
Right, it's the same reason we still have ARM despite the success of x86. Or why some hardware vendors bundle FPGAs rather than rely on GPUs or custom ASICs. The instruction set is absolutely vital for a given domain's performance.
One way to avoid the problem is to make the bytecodes low level enough. You don't add bytecodes for method lookup or unification. You add bytecodes on top of which these can be implemented efficiently.
The downside is that if the bytecodes are low level the chances of different languages interoperating easily are small (e.g. take x86 assembly: Python doesn't automatically interoperate with Ruby, but take MSIL, now they do more easily).
My admittedly biased view: I spent two years of my life trying to make the JVM communicate gracefully with Javascript - there were plenty of us at Netscape who thought that bytecode was a better foundation for mobile code. But Sun made it very difficult, building their complete bloated software stack from scratch. They didn't want Java to cooperate with anything else, let alone make it embeddable into another piece of software. They wrote their string handling code in an interpreted language rather than taint themselves with C! As far as I can tell, Sun viewed Netscape - Java's only significant customer at the time - as a mere vector for their Windows replacement fantasies. Anybody who actually tried to use Java would just have to suffer.
Meanwhile Brendan was doing the work of ten engineers and three customer support people, and paying attention to things that mattered to web authors, like mixing JS code into HTML, instant loading, integration with the rest of the browser, and working with other browser vendors to make JS an open standard.
So now JS is the x86 assembler of the web - not as pretty as it might be, but it gets the job done (GWT is the most hilarious case in point). It would be a classic case of worse is better except that Java only looked better from the bottom up. Meanwhile JS turned out to be pretty awesome. Good luck trying to displace it.
SWF was the other interesting bytecode contender, but I don't know much about the history there. Microsoft's x86 virtualization tech was also pretty cool but they couldn't make it stick alone.
x86 is a perfect comparative example. An architecture that is a patch on a patch on a patch (add several more layers here until you're tired) going back to the 8086 a kajillion years ago (a processor which was less sophisticated and powerful than an Arduino). Intel tried to kill the architecture (replacing it with IA64) but AMD patched it yet again and the result was successful.
Nobody sane would design an architecture like x86 (or even x86-64) from the ground up today. Yet here we are.
I would just like to point out that IA64 wasn't really a significant step up. VLIW is a good idea for DSPs and GPUs and whatnot, but for the kind of dynamic, branchy code that we all know and love, IA-64 was quite probably the only somewhat modern CPU arch that was actually worse than x86.
I'm not so sure about that. Modern IA64 compilers and systems are actually pretty damned decent. Though the fact that they are still only comparable to a monstrous, teetering pile of hacks and kludges (x86) is not much of a recommendation.
Turing complete language ≈ Turing complete language. Just make something to compile your language of choice into JavaScript, or make a new one (CoffeeScript). Bytecode is just a little denser, a little faster to execute, and far harder to investigate than JavaScript, which the browser can compile for added speed anyway.
I prefer my language-of-the-web to be readable, thanks. Then I can find out wtf it's doing. And where you really need speed, native client is just about your only option.
The difference is that Javascript is about the worst possible IL for a compiler to compile down to. Yeah, you can recover some of the speed by spending man-years on making Javascript execute somewhat fast.
As a bytecode you wouldn't use a bytecode that is targeted at handling Javascript constructs. You'd use a bytecode that, for example, supports machine integers (unlike Javascript). Something that allows you to allocate objects, rather than string indexed hash tables. Something that allows you to allocate an array of floats, rather than an array of boxed floats.
With enough effort you can put lipstick on a pig. But really, an IL that doesn't support integers? An IL where a member field access may involve looking up a string in a hash table instead of being a single machine instruction? Propose that as an IL to a compiler guy and they'll laugh at you. Javascript is an historical accident. Let's look at some alternatives if history had been different:
- Scheme: this would have been a much better choice, because it supports integers and has a compact object representation instead of representing everything with hash tables. Bad point: all objects are boxed (like in JS). Good point: tail calls are handy when compiling all kinds of control structures.
- Java: As an IL to compile down to this would have been better than Javascript. Good points on top of Scheme: unboxed primitive types, static typing means fewer runtime checks. Bad point: lack of tail calls.
- ML: excellent choice: has both unboxed primitives and tail calls.
I challenge you to find a language that would have been a worse choice than Javascript. It may turn out that by working really hard on smart runtimes and compilers you can get OKish performance out of Javascript in some cases, but that doesn't mean that it's a good choice for an intermediate language. An ML compiler that just translates to LLVM IL as simply as possible, without doing any optimizations itself, will easily beat the highly optimized Javascript engines we have today.
Just to note though: Javascript in fact doesn't require representing objects as hash tables. For example, the V8 javascript engine represents objects as instances of classes, more or less exactly how C++ would do it (they build the class definitions automatically in the background).
You are right on integers though, and other types of memory block in general, they are trickier to fix in a javascript engine. Surely the best IL is LLVM IL though, since it was designed so everything can compile down to it.
> Just to note though: Javascript in fact doesn't require representing objects as hash tables. For example, the V8 javascript engine represents objects as instances of classes, more or less exactly how C++ would do it (they build the class definitions automatically in the background).
Yep, sophisticated implementations try to infer the patterns in which objects occur and then dynamically generate classes for them. However this is not without problems, because if you assign to a new slot then you need to change the object's class, it needs to gracefully degrade to when the objects are used as hash tables, etc.
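A sketch of the pattern being exploited (the comments describe what an engine like V8 does behind the scenes; none of it is visible from the language itself):

    function makePoint(x, y) {
      return { x: x, y: y };  // same shape every time: both objects share one hidden class
    }
    var a = makePoint(1, 2);
    var b = makePoint(3, 4);  // property reads on a and b become fixed-offset loads
    b.z = 5;                  // new slot: b transitions to a different hidden class
    delete b.x;               // worst case: the object may degrade to dictionary (hash table) mode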
LLVM IL would be excellent for speed. The problem is that it's not memory safe (i.e. it allows reading and writing to arbitrary memory locations). NativeClient at Google is trying to solve that. http://code.google.com/p/nativeclient/
"But really, an IL that doesn't support integers? An IL where a member field access may involve looking up a string in a hash table instead of being a single machine instruction? Propose that as an IL to a compiler guy and they'll laugh at you."
Yup, JS arrays look crazy when you are used to C and pointers (I was shocked when I first found out).
That's why the WebGL folks had to come up with typed arrays.
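Roughly, the difference (a sketch):

    var anything = [1.5, "two", { three: 3 }]; // plain JS array: heterogeneous, elements may be boxed
    var floats = new Float32Array(1024);       // typed array: one flat block of unboxed 32-bit floats
    floats[0] = 1.5;                           // stored as a raw machine float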
"- Scheme: this would have been a much better choice, because it supports integers and has a compact object representation instead of representing everything with hash tables. Bad point: all objects are boxed (like in JS). Good point: tail calls are handy when compiling all kinds of control structures."
The boxing part is untrue. Even on 32-bit systems (that have word-aligned addressing, which all current 32-bit systems do), you can get 6 arbitrary type tags by using up the three least significant bits of a pointer (6 and not 8 because you use up two of the possible tag values for even/odd immediate integers (fixnums)).
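Simulated in JavaScript, with plain integers standing in for machine words (all names invented), the scheme looks something like this:

    function boxFixnum(n)   { return n << 2; }        // lands on tag 000 (even n) or 100 (odd n)
    function isFixnum(w)    { return (w & 3) === 0; } // both fixnum tags have low bits 00
    function fixnumValue(w) { return w >> 2; }
    var PAIR_TAG = 1, STRING_TAG = 2, VECTOR_TAG = 3; // hypothetical tags for heap pointers
    // heap objects are word-aligned, so a pair is stored as (address | PAIR_TAG)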
Tail calls are overrated, maybe you're thinking about continuations? Those would have been truly terrible - you get all the downsides of concurrency with no benefits. One of the reasons arbitrary crap JS code runs as well as it does is because there's no possibility of concurrency. The browser event-loop model is a good one.
"- Java: As an IL to compile down to this would have been better than Javascript. Good points on top of Scheme: unboxed primitive types, static typing means fewer runtime checks."
Don't forget: getting sued by Oracle.
Java blows in every way compared to JavaScript. If Netscape had put Java instead of JS into Navigator the web would be a very different place today, because no one would have used it [Java].
"ML: excellent choice: has both unboxed primitives and tail calls."
You're right. Scheme implementations usually have unboxed fixnums and other types. However Scheme systems do not usually support user defined unboxed objects, like C#, C and ML do.
> Tail calls are overrated, maybe you're thinking about continuations? Those would have been truly terrible
I agree, I did mean tail calls. Tail calls may not be so good for programming with, but they are good for compiling to, because you can easily encode your control structures like while, for, until, state machines and more exotic forms of looping to them. You can also compile (parts of) code in continuation passing style without blowing the stack, which is hard to do efficiently if you don't have tail calls.
The reverse is not true, you cannot easily express tail calls on top of the usual control structures like while. This is why Scala and Clojure still don't have support for tail calls.
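For instance, the loop "var acc = 0; while (n > 0) { acc += n; n--; }" compiles naturally into a tail call (a sketch):

    function loop(n, acc) {
      if (n === 0) return acc;
      return loop(n - 1, acc + n); // tail position: a VM with tail calls reuses the stack frame
    }
    loop(100000, 0); // without tail-call optimization this risks blowing the stack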
> Java blows in every way compared to JavaScript. If Netscape had put Java instead of JS into Navigator the web would be a very different place today, because no one would have used it [Java].
I'm not arguing that Java is better than Javascript for programming in. It's just a better target for compilers.
> In 1995? Give me a break, or a time machine.
Do you have any arguments to back up why ML would not be a better IL than JS?
It seems to me that you are arguing that these languages are not better than JS for programming in, but that is not what I claimed.
"You're right. Scheme implementations usually have unboxed fixnums and other types. However Scheme systems do not usually support user defined unboxed objects, like C#, C and ML do."
That's because Scheme has to support type safety for arbitrary code loaded at run-time. Exactly like JavaScript has to do when the web browser GETs a script. Exactly what any of the other proposed ILs would have to do as well.
"I agree, I did mean tail calls. Tail calls may not be so good for programming with, but they are good for compiling to, because you can easily encode your control structures like while, for, until, state machines and more exotic forms of looping to them. You can also compile (parts of) code in continuation passing style without blowing the stack, which is hard to do efficiently if you don't have tail calls.
The reverse is not true, you cannot easily express tail calls on top of the usual control structures like while. This is why Scala and Clojure still don't have support for tail calls."
A better argument can be made for gotos than tail calls in this case. JavaScript already has gotos in the form of labels and break.
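For reference, the label/break form (a sketch):

    var grid = [[1, 2], [3, 4]], target = 3, found = null;
    search:
    for (var i = 0; i < grid.length; i++) {
      for (var j = 0; j < grid[i].length; j++) {
        if (grid[i][j] === target) { found = [i, j]; break search; } // out of both loops at once
      }
    }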
"I'm not arguing that Java is better than Javascript for programming in. It's just a better target for compilers."
JavaScript provides run-time types and extensive reflection and introspection facilities. This makes it relatively straightforward to write something like Parenscript. OTOH trying to compile even a static subset of Common Lisp to C or Java is pretty complicated.
"Do you have any arguments to back up why ML would not be a better IL than JS?"
Your original statement was: "I challenge you to find a language that would have been a worse choice than Javascript." Given the state of ML implementations in 1995, I think it would have been a worse choice.
"It seems to me that you are arguing that these languages are not better than JS for programming in, but that is not what I claimed."
And you seriously think people would have first run out and built compilers from other languages to the "hypothetical better than JavaScript" language rather than use the latter directly? Because that's not what actually happened.
x86 was a historical accident, and yet it has become the most popular architecture for personal computers and servers in history.
There are plenty of examples of languages that are worse than Javascript for the web. Java being a perfect example. Java and Javascript co-existed on the web for quite some time (in theory they still do). Java is based on a "proper" IL. And yet here we are, talking about some hypothetical new IL to replace Javascript, despite the fact that there is no guarantee it would actually be better, as the historical example of Java has shown us.
You need to get more than just the VM "right"; there are a billion other factors. Java screwed up; Javascript hit the mark (despite its many other flaws). Perhaps LLVM is the future of web applications; it's too soon to tell. What I do know is that Javascript may not be perfect, but it's still fundamentally good, and powerful enough (largely through closures and prototypes) to allow for robust workarounds to its flaws (jQuery, coffeescript, etc.)
Worse is Better doesn't quite fit here. WiB is about getting a "basically right" prototype out, and then improving it with real world feedback, rather than waiting forever to have a perfect 1.0.
With Javascript, unfortunately, it didn't work out so well - a bunch of early bugs (which could have been quickly fixed) were preserved when Microsoft made a bug-compatible clone for IE, and then became part of the ECMAScript standard in 1997. This derailed efforts to improve the language - there is a lot of work on sophisticated implementations now, but the spec itself is buggy.
People invoke "Worse is Better" as retroactive justification for all kinds of engineering blunders, but it's a bit more limited in scope. Here you go: http://www.jwz.org/doc/worse-is-better.html
Actually, I think you're right - Javascript was frozen very early and Unix had a while to develop, but the same forces (competing commercial implementations) locked in minor errors in Unix as well.
I'd say Unix got more of its warts fixed than JS did, but that's only a difference of degree.
Mainly: the whole "Worse is Better" thing works better when you don't get stuck maintaining backward compatibility with your early releases.
I'd say Unix and C have more warts, and more serious warts, than Javascript does. But we have come to accept those warts in Unix and C as "the way things are", whereas we generally perceive Javascript's warts as such. Arguably the decision to use null-terminated strings in C in order to save one or two bytes of memory per string is an error of much greater than Y2K proportions that we are still paying for.
Except that VBScript is implemented as a DOM scripting language in IE!
I've had several mid-end managed switches where the web configuration UI used it and was unusable outside of IE with no sign as to why unless you read the source.
As a bytecode you wouldn't use a bytecode that is targeted at handling Javascript constructs.
So now you're asking the browsers to support two different VMs: one for JS and another one. Standardizing bytecode for JS (perhaps based on the existing Nitro or JaegerMonkey bytecode) would be much less work.
The browser provides very high level constructs (e.g., the DOM), handling numbers quickly is uninteresting and would be a poor direction for optimization. While I think an appropriate VM could be something interesting, I doubt an appropriate VM in this case would look like the JVM.
One difference is in compiler optimizations. For a given language, compilers can often make inferences based on the code and the semantics of the language, allowing for very targeted optimizations. If you go from LanguageFoo -> Javascript -> bytecode, then you may be limiting the range of optimizations possible (because Javascript may not be able to infer the same things about the code, be it because of program structure, language semantics, etc.)
I don't think you understand the proposal: JavaScript would still be the language of the web. The things you don't see would just be handled differently.
In fact, it would be easier to "find out wtf it's doing" because something that is now ad-hoc - what to do with the JS given to a browser - would be standardized. Keep in mind that the browsers already do something to your JS: either compile it to byte-code for the browser's own VM, or compile it directly to machine code. This proposal would standardize what the intermediate representation would be.
No, the idea is that you could compile whatever language you like into byte code that would then run in the browser.
Javascript might still be the main language on the web (and would probably continue to be handled in the same way it is now) but you could also use other languages.
For reasons that silentbicycle presents elsewhere in this thread, that goal is quite difficult. The benefits I mentioned above would be true even if no other languages compiled down to the standardized byte-code - we'd have the above benefits first.
Well, you would be developing with plain javascript or symbols just like when writing native code. And besides, minified javascript might be more readable than bytecode but just barely.
The difference is that Javascript is a crappy language and I resent having to write in it. In addition, it lacks types and is therefore slower and does not execute the same way on all platforms.
Then why don't you treat your JavaScript as object code, and compile a language you like better to it?
It does not lack types (it just lacks static type checking); it is not significantly slower (JavaScript JIT is very advanced, and you're not compute-bound in JavaScript). And "it does not execute the same way on all platforms" is true for pretty much all languages on all platforms, so I don't know what you're getting at.
If you'd like a better language I can suggest CoffeeScript and ghcjs.
JavaScript lacks 64-bit integers and binary data, just for starters. You can emulate these things (GWT does) but not efficiently.
Byte code isn't necessarily the solution though. It might be a good idea to add these types directly to JavaScript, so they're usable both directly and as an object language.
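For illustration, a sketch of the kind of emulation meant here (not GWT's actual representation): a 64-bit integer as a pair of 32-bit halves, with carries handled by hand on every operation.

    function int64Add(a, b) {
      var alo = a.lo >>> 0, blo = b.lo >>> 0;  // treat the low halves as unsigned
      var lo = alo + blo;                      // exact: fits in a double
      var carry = lo >= 0x100000000 ? 1 : 0;   // did the low half overflow?
      return { hi: (a.hi + b.hi + carry) | 0, lo: lo >>> 0 };
    }
    int64Add({ hi: 0, lo: -1 }, { hi: 0, lo: 1 }); // { hi: 1, lo: 0 }, i.e. 2^32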
JavaScript, although dynamic and loosely typed, does have types: string, number, object (including array, function and date), null and undefined are all JavaScript types.
IMO, the issue isn't whether you like Javascript or not. It's whether Javascript is a reasonable language for compilers to target.
I think the core proposal is that Javascript stays the de facto language of the web, but you build a standardized IL that all browsers target. Of course Javascript would be one of the core languages that would guide the design of the IL, because you don't want this new IL and Javascript to be incompatible.
The nice thing though is after you do that, other people can write new languages that target that IL, and they're not second-class citizens. It allows there to be real language innovation on the IL, just as there is language innovation on x86.
The problem today is that JS is not a great language to target, and it relegates anything that targets it to second-class status (why are you using XYZ, why not just use Javascript?).
I think it's a great idea. IMO, this would be a far more important development than HTML5 for the life of the web.
I find Javascript depressing. If it had a few more years to develop, it could have been a really great language. I understand why it wound up that way, but, sigh.
If you want to see a much better language with the same general design, look at Lua. It's made for scripting C programs rather than web pages, and it has had over a decade longer to mature.
Lua has been able to make major, backward-compatibility-breaking changes to improve its design in ways Javascript hasn't. Where Javascript has "The Good Parts" and incrementally improving implementations, Lua has been able to fix things and evolve.
Lua (without JIT) is also one of the fastest interpreted, non-natively-compiled languages there is. LuaJIT is also one of the faster JIT-compiled languages. Now, I'm not sure how it compares to the popular JS JIT engines, but from what I've read, it's very hard to beat LuaJIT for performance.
I've talked and written recently about why "just pick[ing] up Lua or Python" was not an option, but there are other strong reasons that was not in the cards.
Think about what mid-1990s Lua and Python were like, how much they needed to change in incompatible ways. The web browsers would never have tolerated that -- you'd get a fly-in-amber-from-1995-or-1996 version of Python or Lua, forced into a standards body such as Ecma (then ECMA), and then evolved slowly _a la_ ECMA-262 editions 2, 3, and (much later, after the big ES4 fight) edition 5.
Interoperation is hard, extant C Python and Lua runtimes were OS-dependent and at least as full of security holes as JS in early browsers, and yet these languages and others such as Perl were also destined to evolve rapidly in their virtuously-cycling open source communities, including server-side Linux and the BSDs (also games, especially in Lua's case -- Python too, but note the forking if not fly-in-amber effects: Stackless Python in Eve Online, e.g.).
JS, in contrast, after stagnation, has emerged with new, often rapidly interoperating, de-facto (getters, setters, array extras, JSON) and now de-jure (ES5) standards, the latter a detailed spec that far surpasses C and Scheme, say, in level of detail (for interop -- C and Scheme favor optimizing compiler writers and underspecify on purpose, e.g. order of evaluation).
The other languages you cite have been defined normatively over most or all of their evolving lives entirely by what their C implementations do. Code as spec, single source implementations do not cut it on the web, what with multiple competing open- and closed-source browsers.
You have to compare Lua at 1995 (version 2) to JavaScript at 1995, as its development, at least inside the browser, would have been arrested the same way JavaScript's was. Lua 2 certainly had a much better implementation than the first JavaScript, but I would not call it a better language (it didn't have JavaScript's annoying quirks, but it also didn't have closures, for example). 2000s-era Lua (5.0 and 5.1) is quite different from its earlier incarnations.
I'm mostly thinking about how many of the problems highlighted in "Javascript: The Good Parts" have been fixed in Lua, while Javascript can't be fixed. Not a matter of design and taste, but outright bugs.
That's what I try to do, which turns it into a crippled Ruby. And there's no getting around it silently doing the wrong thing if I ever forget a "var" or "===".
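The two pitfalls, sketched:

    function count(items) {
      total = 0; // forgot "var": silently creates (or clobbers) a global named total
      for (var i = 0; i < items.length; i++) total += items[i];
      return total;
    }
    if ("1" == 1) { /* reached: == coerces its operands; === would not */ }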
For what it's worth, there are tools that can help you avoid the bad parts. JSLint won't let you forget a "var" or "===" ... and if you use CoffeeScript, it's not possible to forget "var" or "===", because there aren't any.
I'm not saying that there aren't better programming languages than Javascript. I'm just saying that there is a subset of Javascript which is really expressive and elegant.
That elegant, expressive subset is basically Lua. It's been able to jettison most cruft over the years.
I'm not a web developer, but every time I read/use Javascript, it feels like a broken fork of my favorite language. Javascript could have been that good, too. I like where Eich was going with it, but the browser wars etc. meant that shipping an early version made the most business sense, and design errors (which would have shaken out) got frozen in the spec.
It's not the case that the Web dooms us forever to use JS as it was in 1995.
That is simply false on a number of JS-specific points, but more generally: the Web's deployed-browsers-with-enough-market-share-to-matter intersection semantics moves over the years. It does not stay still or grow only compatibly. Bad old forms (plugins, especially, but also things like spacer GIFs used in pre-modern table layouts, not to mention old JS versions) die off.
So, cheer up! JS can't be Lua, but it doesn't need to be. Its job is to be the best it can be according to its species and genus, its body plan. Which may be related to Lua's, but which was not and will never be the same as Lua's, because JS and Lua live in quite different ecosystems.
A concrete example: Lua has coroutines now, but JS is unlikely to get them in the same way, interoperably. Some VMs would have a hard time implementing them, especially where the continuation spans a native method activation.
This is a case where Lua's single-source implementation shines, but that's just not going to happen on the web, in variously-implemented, open- and closed-source (the latter including clean-room, for fear of patents) browsers. So, we're aiming for Pythonic generators at most.
If we go further than generators, I'll be surprised. Pleased too, don't get me wrong. However I doubt it will happen, because some on the committee do not want deeper continuations than generators, since greater than one frame of depth breaks local security-hat-wearing reasoning about what is invariant in JS's run-to-completion, apparently single-threaded execution model.
Not the same on all platforms is a wholly legitimate complaint. It's frickin' annoying that browsers don't all implement the same language.
But bytecode won't solve that. IE will still refuse to implement subset-X, WebKit will still prefix their CSS with -webkit, etc. Until everything settles down and decides on one implementation, in which case I'd still prefer to be able to read what the web is sending to me.
Google is already heading there with native client. They're changing the approach to serving up LLVM bytecode on the server, which is translated to x86 or ARM by the browser prior to execution. For future apps that require performance, it should work well and with Google's weight behind the tech I think it'll be widely adopted.
There's already a version of python that runs in native client, and pretty much anything could conceivably be ported.
I really liked this comment (by khoth): "And once we have standardised bytecode, the next logical step would presumably be to improve performance by creating CPUs that can execute it directly. In 20 years we'll all be back where we started."
Talk to Cliff Click (or sift through his blog http://www.azulsystems.com/blogs/cliff) and he will quickly disabuse you of the notion that having bytecode instructions in a processor is a good idea. For optimal performance you want a generic RISC-type processor with good hardware performance for critical things like read-barriers that the bytecode might require; you really don't want to design a processor that directly executes Java bytecode.
There's a whole hell of a lot of optimization that gets done in the JIT layer besides just translating bytecode to machine code, and you want all that stuff to get done in software rather than trying to build it into your processor. Once you've done all that stuff, emitting actual assembly code isn't really the hard part, so you might as well just design a processor that you can make fast, give it a simple general-purpose instruction set, target that instruction set in your JIT, and then add a few special goodies as you need them for things that are really, really hard to do fast without specialized hardware support.
There have been several "JMs" (JVMs without the "V", i.e. hardware Java machines) created[1]. None of them really caught on other than ARM's Jazelle[2], which isn't really a JM; it is more of a JVM accelerator: it has support for direct execution of many of the JVM opcodes.
After investing lots of time and money into creating a JM, the companies were chagrined to find a general purpose processor with a good JVM (especially with JIT) could run circles around a direct-execution processor.
And that makes an excellent case for why JavaScript should simply be the bytecode of the future. x86 is gaudy and unsuited for many tasks--but it's where most of the speed innovation happens, thus it's the best platform to compile other languages to.
I'll propose an alternative: fix JavaScript by adding APIs like ByteArrays and shorts and a proper int to the language. Over time, JS could become an excellent IL. We can standardize on an intermediate bytecode, but like all things in web adoption, it will probably be the path of least resistance that works. (Who would ever give HTML graphics, multimedia, and threading abilities?)
They've been slowly withering away for the past couple of years (at least judging by the number of cars in their parking lot - I work in the building they are in). They just announced a cloud-based product based on their systems, so maybe there's still some life there.
In that sense, Intel is just making CPUs on top of x86 bytecode that runs natively. I wonder if a little extra firmware can eliminate the need for an OS altogether.
Am I missing something, or did we all have a browser bytecode standard that everyone hopped on board with, and it turned out nobody really liked feeding browsers compiled programs after all?
It was never integrated into the DOM, though, was it? I am only aware of it being used in a separate, sandboxed area on the page, unable to interact with anything else except in a very limited way.
You could integrate Java applets into the DOM. Now you have two problems. (Keep in mind that back when this was popular IE 4 was busy kicking the pants off of Netscape Navigator. Fancy programming against IE 4?)
From the first hit: "Java applets may need to perform Java to JavaScript communication to access the Document Object Model (DOM)..."
So it seems that it is hardly an alternative.
> ...or is every link on the first SERP wrong in some way?
What I don't expect to find quickly is exactly how long this has been possible in which browsers since when. Was it possible back in the days of IE4 and Netscape 4 or 4.5 or whatever it was?
The performance characteristics of the JVM didn't match the web IMHO. Javascript wasn't fast in a raw sense, but it didn't trade startup time for running time, didn't trade memory for performance, etc. The basis of Javascript is one that made sense for the web, for pages, for transient code. Since then they've done stuff to improve it, to make those issues less of a tradeoff, but when they started and had to make those tradeoffs Javascript made them correctly. The JVM didn't make them correctly for a web page. But I'm not sure the JVM was ever really meant for that purpose.
In general though I think this demonstrates the environment and language mismatch problem. Retrofitting existing languages onto the web won't be very satisfying. There might be room for new languages (CoffeeScript is cool, for instance), and bytecodes could make that process more elegant, but ultimately those languages need to be different takes on Javascript or else it will feel ugly and awkward.
It was the core, defining feature of the most used language in the business world... and everyone with any sense is trying to forget it ever happened.
Java applets were never first-class citizens in the browser. Same as Flash.
I think it's a really good idea to have some LLVM-type of bytecode standard.
I'm on Ecma TC39 and in touch with JS hackers working for all the major browser vendors. FWIW, as far as I can foresee there will be no standard JS bytecode. Commenters here touch on good reasons:
* divergent, evolving VMs can't agree (not all NIH, see third bullet);
* IPR concerns;
* lowering from source over-constrains future implementation, optimization, and language evolution paths;
* view-source still matters (less, but still).
A binary encoding of JS ASTs, maybe (arithmetic coders can achieve good compression; see Ben Livshits' JSZap from MSR, and earlier work by Michael Franz and Christian Stork at UCI). But this too is just a gleam in my eye, not on TC39's radar.
Meanwhile, we are making JS a lot better as a target language, with things like ES5 strict mode, the ES-Harmony module system and lexical scope all the way up (no global object in scope; built on ES5 strict), and WebGL typed arrays (shipping in Firefox 4).
We have a Mozilla hacker, Alon Zakai, building an LLVM-based C++-to-JS compiler. Others are doing such things (along with good old SNES emulators and the like).
So being a good mid-to-high-level, memory-safe compiler target language is on TC39's radar. Not to the exclusion of other JS desiderata, and never in a way that compromises mass-market usability or buy-by-the-yard rapid-prototyping "scriptability". But one among several goals.
In my experience NaCl is not a viable platform at this time.
My friend and I spent a couple weeks just trying to get the developer tools to work. We tried on Ubuntu, OS X, Windows, and Arch Linux. We weren't able to get started on a single platform. In the process we came across bugs that were several months old that haven't been fixed, even on their supported platforms.
Another thing to worry about is whether NaCl will take off. We can't know how committed Google is and how many users will install the plugin. It was supposed to be enabled by default in Chrome 6 (I believe) but it still isn't enabled in 7.
NaCl is still promising (especially Portable Native Client) but Google has to get it working first.
I think the change to PNaCl (LLVM) is largely responsible for this - after that decision was made and without much in the way of existing apps, there really isn't much incentive for them to invest more effort in the x86 specific version of NaCl. When PNaCl is ready and bundled with Chrome and Android, it's only going to take one killer app to really establish it.
I was at David Sehr's presentation at the LLVM conference. There are something like a dozen Googlers working on getting PNaCl working. They are able to get pretty close to the performance of native code, which is pretty enticing. There's still plenty of work left to do, though.
One of the problems that needs to be fixed is that LLVM bitcode is kind of bloated, something like 6x compared with native code. Another issue is that bitcode is not platform independent: things like pointer sizes and exception handling depend on the target you are compiling for. Thus, they have to define a bunch of specifications for PNaCl programs so that they'll run everywhere.
It will definitely be interesting to see if anyone uses it. Like you said, one killer app will do the trick. I'd like to see Google Earth reimplemented for PNaCl, or any Google project for that matter. That would give more confidence that this is a project that is going to be taken seriously within Google.
LLVM is really how NaCl should be done anyway. The idea of compiling your Web app for a physical hardware architecture is completely backwards, particularly now that there are two families in wide use among consumers.
There was a session about it, this time at the Google Developer Day in Munich. Interesting stuff. Right now they're focusing on C/C++, but C# is probably up next.
Discussions of technical feasibility aside, getting a sufficiently large installed base to make this interesting is (as usual) the "fun" part; your plan for getting to critical mass against an available and "good enough" solution.
As for previous bytecode approaches to consider, there are the Lisp Machines, or the Burroughs B5000 Algol box, or the EFI byte code (EBC) interpreter and the EBC I/O drivers, or UCSD pCode, or all the gonzo things you could do with the VAX User Writable Control Store, the JITs underneath Java and Lua and other languages, or...
And HTTP is printable, which means you're working within character-encoding constraints or with escapements.
There are the not-inconsequential security requirements.
And then there's the question of how a provider might make money with this, if you're not undertaking this effort for free.
It's an interesting theory, but ultimately irrelevant. The thing is that there are SO MANY people working right now to make JavaScript incredibly fast. It's not hard to imagine a future where JavaScript is the fastest reasonable way to write software -- simply because it's the language that has the most R&D going for it.
Did I say future? Oh, hello NodeJS.
I don't envision that we'll be writing JavaScript itself forever -- but rather a super-syntax on top of it that compiles down to JS. CoffeeScript is the first generation of this kind of programming language.
I do, however, believe that for the foreseeable future, JavaScript will become the lingua franca of day-to-day programming.
I'd take it even further and implement the various web standards on this VM. That way when HTML6 or CSS4 comes out you don't need to wait 4 years for everyone to upgrade their browser, you just download it automatically the first time it's used.
This is sort of taking Cappuccino/Objective-J's principle of "shipping the runtime" to the extreme.
It would improve script parsing speed but perhaps hinder the evolution of the interpreters' internals. Of course, engines could always retranslate this standard bytecode into another internal representation and JIT from there.
I might be missing something here, but this implies a few things:
You won't be writing code in the browser anymore. Otherwise you'd have fragmentation where the user doesn't have the proper interpreter installed, and a never ending stream of "language downloads" which might be cool to devs but horribly impractical for end users.
Because we're not writing in the browser anymore, we'd be back to offline compilation and static code generation. I suppose it'd be possible to JIT the bytecode (assuming that current Javascript optimizers work on the IR; I don't know enough to qualify this any further). From my casual browsing of LtU, it seems that tracing JITs in particular may offer optimizations difficult to achieve with static bytecode compilation, particularly in identifying hot spots. But the more I think of it, that's a non-issue because you're still compiling to bytecode and JITing from there.
This feels very Java-ish to me. If you really want to script the browser in another language, compiling to Javascript (à la GWT) would essentially do the same thing.
> You won't be writing code in the browser anymore
You could be. Implement a URL like http://mydomain/scripts/MyScript.bytecode. This is a page which compiles, say, MyScript.rb into bytecode and delivers it back to the client. You get a new copy by saving a file on the server and hitting refresh, and there's only one interpreter to download.
That just moves the bytecode compilation to the server -- so you're back to my original point of generating the code statically. In fact it'd probably be worse, since there is no reason to compile MyScript.rb to bytecode more than once. It's like generating a dynamic page with Rails when all you really need is static, cacheable HTML.
I'd be really interested in seeing this develop. Define a bytecode interpreter (JSVM?) and push bytecode into it. Performance may be god-awful, but you'd be free to use whatever language you fancy.
Why, after decades of JIT bytecode VMs, do we still have this misconception? Many of the widespread JVM implementations are faster than V8. LuaJIT and C# Mono are also VM implementations that JIT bytecode and are faster than V8! Is this a troll?
As people have commented, I'm suggesting writing a javascript function which takes pre-compiled bytecode; like so;
    function interpret(bytecode) {
      // stack machine implementation goes here
    }
And called like so;
    var bytecode = load("http://my.domain.com/myscript.bytecode");
    interpret(bytecode);
I'm not suggesting that bytecode is inherently slow -- just that I could take a great stab at writing a slow bytecode interpreter in JavaScript. ;)
So under this scheme, you could do a server-side compilation of any language -- let's imagine Pascal as an example -- and deliver it back to the client as bytecode. Now you've broken the browser dependence on Javascript. At the cost of a VM written in JavaScript.
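For concreteness, a minimal (and suitably slow) sketch of what that interpret function might look like, with an invented three-opcode instruction set:

    function interpret(bytecode) {
      var stack = [], pc = 0;
      while (pc < bytecode.length) {
        switch (bytecode[pc++]) {
          case 0: stack.push(bytecode[pc++]); break;            // PUSH <constant>
          case 1: stack.push(stack.pop() + stack.pop()); break; // ADD
          case 2: alert(stack.pop()); break;                    // PRINT
          default: throw new Error("bad opcode at " + (pc - 1));
        }
      }
    }
    interpret([0, 2, 0, 3, 1, 2]); // PUSH 2, PUSH 3, ADD, PRINT: alerts 5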
I'm suggesting writing a javascript function which takes pre-compiled bytecode
Sorry, I misunderstood.
Now you've broken the browser dependence on Javascript. At the cost of a VM written in JavaScript.
I keep looking at these two sentences again. On one hand I know what you mean. On the other hand, this doesn't make any sense because it contradicts itself. (Which is why I misunderstood the original comment, I think.)
The post you replied to seemed to be suggesting a VM written in Javascript. I don't think it's particularly controversial to speculate that the performance of that would be pretty awful :)
Maybe browser makers could expose a few lower level constructs in JS to make efforts like CoffeeScript, Objective-J and GWT easier?
That could be a short term solution. Long term we have to hope that LLVM and NaCl pick up steam.
But many languages for the same tasks will create fragmentation. Right now there is a big pool of developers proficient in JS. They even use the same libraries! The libraries are where a lot of productivity gains come from. The web became an "easy" platform to develop for. With fragmentation that could be lost.
CoffeeScript is interoperable with JS, easy to pick up because it's mainly a cleanup, but Objective-J and GWT are too far out. GWT needs a reworked JQuery, called GQuery.
So there are these economic considerations that are probably more important than language preference.
Not sure what you mean by Objective-J being too far out. Objective-J is much closer to and more "interoperable" with JS than CoffeeScript, because it is a strict superset of JavaScript (in other words, all JavaScript is also Objective-J). Practically, what this means is that any existing JS works alongside Objective-J with no changes whatsoever; the syntaxes do not "collide". The difference between ObjJ and JS is comparable to that between ECMAScript 2 and 5 (additions of new features and keywords).
Not to be combative, but I think what he means is this: Although it's perfectly easy to use JavaScript from Objective-J, the same is not true in reverse. ObjJ generates code that looks like this:
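    objj_msgSend(someObject, "setTitle:", "Hello") // illustrative; roughly what [someObject setTitle:"Hello"] becomes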
... where objj_msgSend is how the Objective-J "interpreter" does its magic. If I wanted to use a library that was originally written in Objective-J from JavaScript, I'd have a hell of a hard time calling it correctly, unless the library author took special care to make it JavaScript-compatible in the first place. The syntaxes may not collide, but the semantics certainly don't match up.
On the other hand, you'd never know that you were calling a CoffeeScript library from JS, unless you inspected the source, and vice-versa. In that sense, they're interoperable.
No not combative at all, what you say makes perfect sense. I suppose the confusion arose because he bundled objj with gwt, saying you'd need a totally different jquery, which is certainly not the case with objj.
In fact, why not take it a step further and make a stand-alone client that only executes the byte code? It would be different from Java for two reasons:
(a) The bytecode format could be much simpler if it weren't tied to a particular language (no type system, no objects, maybe not even a garbage collector). If the bytecode was similar to a real CPU architecture, it would be possible to target it from LLVM.
(b) There would be no humongous standard library to install, because it could just be downloaded on demand. With almost all of the library living on the server side, application authors could avoid a lot of the usual compatibility nightmares with different client implementations each one with its own bugs and workarounds.
Just keep on saying it! I've also been coming to the same conclusion and saying it, though, just as for you, ranting about other stuff brought negative reactions:
I remembered that post when I saw this headline. I think from a philosophical viewpoint it's a good idea - though I've no idea what the implementation difficulties might be (could google have created V8 if we had a bytecode system?). But from a practical stance, what's the benefit? What would it achieve?
>But from a practical stance, what's the benefit? What would it achieve?
The reasons I outlined in my post were things like having a single form validation codebase, deployed both server side and client side. More ambitiously you could have the same codebase used for the server-side online portions of something like gmail and also retarget that to use for offline gmail. Basically the big divide between your server-side and client-side codebases would start to disappear.
Seems like Google's native client system is a great step in this direction. Soon enough it may be possible to target, say, Lua to LLVM bytecode and deploy that to a browser in a bundle of javascript. If the browser supports NaCl you execute the LLVM bytecode with the built-in VM; if not, you have a fallback bytecode interpreter in Javascript. That way you get full coverage of browsers, with degradation of performance only when NaCl isn't available. I'm not sure however how fast you could make the Javascript fallback. If it is very slow it may not be workable.
imho an intermediate bytecode makes no sense, as silentbicycle explained.
3 alternatives:
1) make every browser vendor implement multiple runtimes
2) x-to-js translators
3) interpreters implemented in javascript
ad 1) multiple runtimes: not a good idea, because of multiple reasons.
first, versioning hell. javascript is ~15 years old, and browser vendors are still not able to provide 100% compatibility. you'd have versioning hell, only worse. second, it would hurt javascript performance, because browser vendors would have to split resources. third: which languages should be supported? you just couldn't please everyone, so there would be ongoing "why is language x supported but not y?" problems.
ad 2) translators: are in use now. see coffeescript, gwt and ghcjs, ...
pros:
- almost native javascript speed
- already possible
cons:
- small translation overhead (depending on if it's JITted or precompiled)
- not everything is possible. if the javascript runtime doesn't support tail call optimization, the translation won't have it either. certain magic just doesn't translate.
ad 3) interpreters implemented in js
afaik there are some, e.g. js brainfuck interpreters.
pros:
- the sky's the limit
- already possible
cons:
- slow (adds an emulation layer).
that said, i'm against additional browser-provided runtimes except NACL. you could use javascript for your day to day work and NACL for your special needs. i'm not sure if NACL would be a complete stand-in for JS though - would it be possible to communicate with the DOM (or at least JS)? if yes, there you have it - problem solved.
JS is just not a good language as a compiler target. If you had a bytecode with all the right low-level types and features, it would be easier for the VM implementers (Mozilla, Google, ...) to optimise, you could modify JS without changing the VMs, languages would not be blocked by the speed of JS, and faster languages would be possible. It would be easier to make languages that are very different from JS work in the browser.
Your solution works, but a Haskell interpreter in JS will never have good performance; with a bytecode standard, though, you could make a GHC backend that compiles to "web bytecode".
I think a standardized bytecode for Javascript could make sense, but designing a VM for JS and then trying to port (say) Ruby or ML to it later would be awkward, and I don't think most web devs are aware of that.
Compiling to Javascript is a practical choice now. I'm not a fan of the language, but that avoids a lot of its pain points.
I also wonder why these discussions seem to assume we're going to stay with the web stack as-is forever, but that's neither here nor there.
Also, you can run VBScript. I have seen code that avoids javascript confirm() and tries to check first if it can use VBScript's msgbox function, just so it can provide 'yes'/'no' buttons. Aka:
    function confirmVB (text)
        confirmVB = msgbox ( text , vbYesNo )
    end function
No, it wouldn't help at all: let me point to the overwhelming success of standards (XML, HTML, Javascript, CSS, Postscript) that are character-based.
Think of Javascript as a (human-readable) virtual machine layer itself, below which the implementation is free to do as it pleases as long as it meets the JS standard semantics.
I took a whack at this 9 years ago. My code's at http://wry.me/~darius/software/idel and there were other, probably worthier, attempts around the same time and before, like Michael Franz's work with Oberon. Basically: make a low-level VM more or less like LLVM without the machine-dependent semantics and with a compact, easy-to-verify wire format. Apparently PNaCl is working on fixing those infelicities now, or at least the machine dependence.
(I had some fun but decided the obstacles to adoption were too great and we'd end up with x86 in the browser someday. So the NaCl announcement years later amused and gratified me.)
Having a standard bytecode would make it much harder to steal code, which is much needed as web and mobile applications shift towards using lots of JavaScript.
Currently it's way too easy to steal everything, since the whole source is exposed. You can obfuscate JavaScript and CSS, but the main semantics will still be there, and someone that's interested will still steal your code (this has happened twice to us, even though all of our JS is obfuscated using Google Closure...). The same thing could happen with bytecode, but their takeout would not be maintainable. Obfuscated JavaScript is still maintainable, since the structure and semantics are largely there.
I'd counter that that ability to easily copy what the browser sees and dissect it is what leads to rapid growth and innovation on the web front.... obfuscating everything will act counter to that, and leave the web not nearly as nice a place as we'd like it to be.
The point isn't that you can't decompile bytecode-languages, but that the decompiled bytecode is a lot harder to maintain than obfuscated JavaScript...
With obfuscated JavaScript you get all the semantics, and most of the time all the global function and data structure names, since they aren't obfuscated away. This is a lot more structure and information than you get in your average bytecode (especially since bytecode tends to be much more low level than something like obfuscated JavaScript).
Have you ever looked at e.g. .NET assemblies in Reflector? It will decompile them into C# code that usually isn't even all that strange or hard to follow. There's plenty of structure and information there.
and Javascript. It is very very fast with Javascript.
You can do what is discussed in this thread today by detecting Silverlight in the client and if it exists use it instead of Javascript source. DOM access, manipulation and everything else is just so much faster.
The Silverlight SDK is free, you can get it on Linux and OS X (Moonlight), and there are a lot of Silverlight VMs out there.
I'd also be interested in knowing. AFAIK, things like coffeescript compile to readable, plain JS. Can a relatively efficient system be made which interprets something more 'bytecodey' than readable JS;
This is what I've been thinking about too. If javascript like what you wrote above were much faster than nicely structured, readable javascript, then perhaps the idea of "Javascript is the bytecode" may not be such a bad one.
I'm just not sure that it would really provide the performance improvement we would hope for.
It would almost certainly be a good deal slower. Or at least, it would be if I wrote it.
I know Cappuccino (http://cappuccino.org/) works with a JavaScript language interpreter, so there are mature products out there heading along these lines.
It'll probably be slower than smartly written, natively compiled code, but not necessarily by enough to matter. It really depends on what you're trying to do, so the best way to know is to try it and measure.
Compiling to a human-readable format can be pretty straightforward. You can skip some steps entirely (e.g. register selection), and modern tools make it easy to ignore others (lexing & parsing) for quite a while. Don't think about it as "compiling", if you think that's a scary hard thing. Think of it as "reading in a couple kinds of simple structures and converting them to text". Like a templating engine.
Figure out what kind of operations you want to support, how to map them into small chunks of Javascript, and go for it. You can pass hand-written JSON, XML, sexps, dicts, etc. to the compiler in a REPL and worry about parsing and other stuff later.
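A sketch of that shape of thing, with a made-up JSON AST format:

    // Compile a tiny expression language, given as JSON, down to JavaScript source text.
    function compile(node) {
      switch (node.op) {
        case "num":  return String(node.value);
        case "ref":  return node.name;
        case "add":  return "(" + compile(node.left) + " + " + compile(node.right) + ")";
        case "call": return compile(node.fn) + "(" + node.args.map(compile).join(", ") + ")";
        default:     throw new Error("unknown node: " + node.op);
      }
    }
    compile({ op: "add", left: { op: "num", value: 1 }, right: { op: "ref", name: "x" } }); // => "(1 + x)"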
Why use a virtual machine when you have a real machine? It takes some cleverness to do safely, but the AI lab folks are plenty clever, and they've found a way:
"Vx32 is a user-mode library that can be linked into arbitrary applications that wish to create secure, isolated execution environments in which to run untrusted extensions or plug-ins implemented as native x86 code."
JavaScript is so moldable, you could easily make a compiler that translates to JavaScript. You could even write a language and translate it in the browser, see CoffeeScript.
(I didn’t downvote you but) you answered the wrong parsing of OP’s original question: It wasn’t “why can’t we use JS elsewhere?” but rather “Why can’t we use other languages in the browser,” a semi-tired plea from those who haven’t learned to love JavaScript and/or don’t realize that due to the countless number of browsers out there, JavaScript is going to be the only option for DOM scripting for years, at minimum.