
Towards a JavaScript Binary AST - Yoric
https://yoric.github.io/post/binary-ast-newsletter-1/
======
nfriedly
To clarify how this is not related to WebAssembly: this is for _code written
in JavaScript_, while WASM is for code written in other languages.

It's a fairly simple optimization - it's still JavaScript, just compressed and
somewhat pre-parsed.

WASM doesn't currently have built-in garbage collection, so to use it to
compress/speed up/whatever JavaScript, you would have to compile an entire
JavaScript Virtual Machine into WASM, which is almost certainly going to be
slower than just running regular JavaScript in the browser's built-in JS
engine.

(This is true for the time being, anyway. WASM should eventually support GC, at
which point it _might_ make sense to compile JS to WASM _in some cases_.)

~~~
pier25
But couldn't other languages also compile to this binary AST too?

~~~
Akkuma
Sure, why not? The difference is that WASM is aiming for near-native
performance, while the binary JS would still be limited to JS performance.

~~~
pier25
Yes, but it would have access to DOM, GC, etc.

~~~
Akkuma
But that's what languages that compile to JavaScript support already today.
The binary JS shouldn't prevent this behaviour if it is just a binary
representation of a text version.

------
cabaalis
So, compiled Javascript then? "We meet again, at last. The circle is now
complete."

The more I see interpreted languages being compiled for speed purposes, and
compiled languages being interpreted for ease-of-use purposes, desktop
applications becoming subscription web applications (remember mainframe
programs?), and then web applications becoming desktop applications
(Electron), the more I realize that computing is closer to clothing fads than
anything else. Can't wait to pick up some bellbottoms at my local Target.

~~~
Yoric
Well, it's more "compressed JavaScript" than "compiled JavaScript".

~~~
MadcapJake
How can you say it's "compressed" over "compiled" when you are actually
parsing it into an AST and then (iiuc) converting that to binary? That's
exactly what compilers do. You are in fact going to a new source format
(whatever syntax/semantics your binary AST is encoded with) so you really are
compiling.

To be fair, these two concepts are similar and I may be totally
misunderstanding what this project is about. In the spirit of fairness, let me
test my understanding. You are saying wasm bytecode is one step too early and
a true "machine code" format would be better able to improve performance
(especially startup time). I'm not following wasm development, but from
comments here I am gathering that wasm is too low-level and you want something
that works on V8. Is that what this project is about?

On a side note, it's truly a testament to human nature that the minute we get
close to standardizing on something (wasm), someone's gotta step up with
another approach.

~~~
Yoric
> How can you say it's "compressed" over "compiled" when you are actually
> parsing it into an AST and then (iiuc) converting that to binary? That's
> exactly what compilers do. You are in fact going to a new source format
> (whatever syntax/semantics your binary AST is encoded with) so you really
> are compiling.

I am not sure, but there may be a misunderstanding around the word "binary".
While the word "binary" is often used to mean "native", this is not the case
here. Here, "binary" simply means "not text", just as, for instance, images or
zipped files are binary.

A compiler typically goes from a high-level language to a lower-level
language, losing data. I prefer calling this a compression mechanism, insofar
as you can decompress without loss (well, minus layout and possibly comments).
Think of it as the PNG of JS: yes, you need to read the source code/image
before you can compress it, but the output is still the same source
code/image, just in a different format.
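
To illustrate "same source, different format" concretely, here is a sketch
using off-the-shelf tools (Esprima + escodegen here, not the actual BinAST
toolchain):

    // Source -> AST -> regenerated source; only the layout changes.
    const esprima = require("esprima");
    const escodegen = require("escodegen");
    
    const original = "function   add(a,b){return a+b;}";
    const ast = esprima.parseScript(original);
    console.log(escodegen.generate(ast));
    // Prints the same program with normalized layout, roughly:
    //   function add(a, b) {
    //       return a + b;
    //   }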

> You are saying wasm bytecode is one step too early and a true "machine code"
> format would be better able to improve performance (especially startup
> time). I'm not following wasm development, but from comments here I am
> gathering that wasm is too level and you want something that works on V8. Is
> that what this project is about?

No native code involved in this proposal. Wasm is about native code. JS BinAST
is about compressing your everyday JS code. As someone pointed out in a
comment, this could happen transparently, as a module of your HTTP server.

> On a side note, it's truly a testament to human nature that the minute we
> get close to standardizing on something (wasm), someone's gotta step up with
> another approach.

Well, we're trying to solve a different problem :)

~~~
MadcapJake
> Here, "binary" simply means "not text", just as for instance images or
> zipped files are binary.

If it's not text, then what is it? I'm not sure "not text" is a good
definition of the word "binary".

> A compiler typically goes from a high-level language to a lower-level
> language, losing data.

I don't agree; I don't think there is any loss of data. The compiled-to
representation should cover everything you wanted to do (I suppose not
counting tree-shaking or comment removal).

> I prefer calling this a compression mechanism, insofar as you can decompress
> without loss (well, minus layout and possibly comments).

Ahh, so you mean without losing the original textual representation of the
source file.

> Wasm is about native code

Here you are making claims about their project that are just not the whole
picture. Here's the one-line vision from their homepage[1]:

> WebAssembly or wasm is a new portable, size- and load-time-efficient format
> suitable for compilation to the web.

With that description in mind, how do you see BinAST as different?

> Well, we're trying to solve a different problem :)

I think you might be misunderstanding what wasm is intended for. Here's a
blurb from the wasm docs that is pertinent:

> A JavaScript API is provided which allows JavaScript to compile WebAssembly
> modules, perform limited reflection on compiled modules, store and retrieve
> compiled modules from offline storage, instantiate compiled modules with
> JavaScript imports, call the exported functions of instantiated modules,
> alias the exported memory of instantiated modules, etc.

The main difference I can gather is that you are intending BinAST to allow
better reflection on compiled modules than wasm intends to support.

Here's another excerpt from their docs (and others have mentioned this
elsewhere):

> Once GC is supported, WebAssembly code would be able to reference and access
> JavaScript, DOM, and general WebIDL-defined objects.

[1]: [http://webassembly.org/](http://webassembly.org/)

[META: Wow, I thought downvotes were for negative or offtopic comments]

~~~
MadcapJake
Ok I am understanding the distinction now.

I ran a google search for "js to wasm" and found a ticket on the webassembly
github that explained it all:
[https://github.com/WebAssembly/design/issues/219](https://github.com/WebAssembly/design/issues/219)

~~~
Yoric
Thanks for the link, it will certainly prove useful in the future.

------
apaprocki
From an alternate "not the web" viewpoint, I am interested in this because we
have a desktop application that bootstraps a lot of JS for each view inside
the application. There is a non-insignificant chunk of this time spent in
parsing and the existing methods that engines expose (V8 in this case) for
snapshotting / caching are not ideal. Given the initial reported gains, this
could significantly ratchet down the parsing portion of perceived load time
and provide a nice boost for such desktop apps. When presented at TC39, many
wanted to see a bit more robust / scientific benchmarks to show that the gains
were really there.

~~~
kannanvijayan
Yoric is working on a "proper" implementation and I'll be assisting in the
design work and, if need be, some implementation. I think the next milestone
here is a working demo in Firefox with Facebook page load.

At a personal level, I feel pretty confident that BinaryJS will help your
case. The theory behind the gains was pretty solid before we designed the
prototype. The prototype, for me, basically proved the theory.

My personal hope is that by the time we're done squeezing all we can out of
this - which includes zero-cost lazy function parsing and on-line bytecode
generation - we can cut the "prepare for execution" time by 80%.

------
le-mark
Here's some perspective for where this project is coming from:

> So, a joint team from Mozilla and Facebook decided to get started working on
> a novel mechanism that we believe can dramatically improve the speed at
> which an application can start executing its JavaScript: the Binary AST.

I really like the organization of the present article, the author really
answered all the questions I had, in an orderly manner. I'll use this format
as a template for my own writing. Thanks!

Personally, I don't see the appeal of such a thing, and it seems unlikely that
all browsers would implement it. It will be interesting to see how it works
out.

~~~
lj3
All they have to do is convince Google. Chrome's the king of the castle right
now.

~~~
markdog12
Native Client

~~~
lj3
Firefox had a much higher market share than Chrome back then. Context matters.

~~~
markdog12
Mozilla and others had pretty strong technical objections to Native Client.
Context matters (sometimes :))

~~~
lj3
Oh, I think I misunderstood you. You mean the AST is going to go the way of
Native Client because Mozilla doesn't have the muscle it used to? Or do you
think Google's going to sandbag it as revenge for Native Client? :)

~~~
markdog12
Sorry, to clarify:

> All they have to do is convince Google. Chrome's the king of the castle
> right now.

When I replied "Native Client", I was giving an example where Chrome came out
with something the other browser vendors rejected on technical grounds.

I love this development (Binary AST) and hope it's adopted.

------
mannschott
This is reminiscent of the technique used by some versions of ETH Oberon to
generate native code on module loading from a compressed encoding of the parse
tree. Michael Franz described the technique as "Semantic-Dictionary Encoding":

«SDE is a dense representation. It encodes syntactically correct source
program by a succession of indices into a semantic dictionary, which in turn
contains the information necessary for generating native code. The dictionary
itself is not part of the SDE representation, but is constructed dynamically
during the translation of a source program to SDE form, and reconstructed
before (or during) the decoding process. This method bears some resemblance to
commonly used data compression schemes.»

See also "Code-Generation On-the-Fly: A Key to Portable Software"
[https://pdfs.semanticscholar.org/6acf/85e7e8eab7c9089ca1ff24...](https://pdfs.semanticscholar.org/6acf/85e7e8eab7c9089ca1ff24531c341168f93c.pdf)

This same technique also was used by JUICE, a short-lived browser plugin for
running software written in Oberon in a browser. It was presented as an
alternative to Java byte code that was both more compact and easier to
generate reasonable native code for.

[https://github.com/Spirit-of-Oberon/Juice/blob/master/JUICE.txt](https://github.com/Spirit-of-Oberon/Juice/blob/master/JUICE.txt)

I seem to recall that the particular implementation was quite tied to the
intermediate representation of the OP2 family of Oberon compilers, making
backward compatibility in the face of changes to the compiler challenging. I
also recall a conversation with someone hacking on Oberon who indicated that
he'd chosen to address (trans)portable code by the simple expedient of just
compressing the source and shipping that across the wire, as the Oberon
compiler was very fast even when just compiling from source.

I'm guessing the hard parts are: (0) Support in enough browsers to make it
worth using this format. (1) Coming up with a binary format that's actually
significantly faster to parse than plain text. (SDE managed this.) (2)
Designing the format to not be brittle in the face of change.

~~~
Yoric
Very interesting, thanks.

Indeed, one of the challenges was designing a format that will nicely support
changes to the language. I believe that we have mostly succeeded there, and
I'll blog about it once I find a little time. Now, of course, the next
challenge is making sure that the file is still small enough even though we
are using this future-proof format. We haven't measured this yet, but I expect
that this will need additional work.

------
onion2k
This is a really interesting project from a browser technology point of view.
It makes me wonder how much code you'd need to be deploying for this to be
useful in a production environment. Admittedly I don't make particularly big
applications, but I've yet to see parsing the JS code as a problem, even when
there's 20MB of libraries included.

~~~
Yoric
We're working towards experimenting with this. For the moment, benchmarks
suggest that this will be noticeable after 1 or 2Mb of code.

We'll only know for sure once we can start testing in real conditions, though.

~~~
Yoric
I should have mentioned "1 or 2Mb of code on an average-ish computer".

------
nine_k
This is what BASIC interpreters on 8-bit systems did from the very beginning.
Some BASIC interpreters did not even allow you to type the keywords. Storing a
trivially serialized binary form of the source code is a painfully obvious way
to reduce RAM usage and improve execution speed. You can also trivially
produce the human-readable source back.
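
A toy sketch of that trick in JS, with invented token values:

    // Keywords become one-byte tokens, everything else stays verbatim,
    // and the original line can be reconstructed exactly.
    const TOKENS = { PRINT: 0x99, GOTO: 0x89, IF: 0x8b };
    const NAMES = {};
    for (const [kw, b] of Object.entries(TOKENS)) NAMES[b] = kw;
    
    const tokenize = (line) => line.split(" ").map((w) => TOKENS[w] ?? w);
    const detokenize = (toks) => toks.map((t) => NAMES[t] ?? t).join(" ");
    
    tokenize("PRINT X");             // [0x99, "X"] -- smaller, cheaper to dispatch
    detokenize(tokenize("PRINT X")); // "PRINT X"   -- the source comes back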

It's of course not compilation (though parsing is the first thing a compiler
would do, too). It's not generation of machine code or VM bytecode. It's mere
compression.

This is great news because you get to see the source if you want, likely
nicely formatted. You can also get rid of the minifiers, and thus likely see
reasonable variable names in the debugger.

~~~
thanatropism
Ha, great memories of the ZX Spectrum programming-keyword keyboard!

------
ryanong
This is some amazing progress, but reading this and hearing how difficult
JavaScript is as a language to design around makes me wonder how many hours
we have spent optimizing a language designed in 2 weeks and living with those
consequences. I wish we could version our JavaScript within a tag somehow so
we could slowly deprecate code. I guess that would mean browsers would have to
support two languages, though, which would suck... this really is,
unfortunately, the path of least resistance.

(I understand I could use Elm, cjs, Emscripten or any other transpiler, but I
was thinking of hours spent on improving the JS VM.)

~~~
exikyut
> _I wish we could version our JavaScript within a tag somehow_

Behold one of the implementational details that emerged out of this language
that was indeed "designed in 2 weeks":

    
    
      <script type="text/javascript1.0">...</script>
    
      <script type="text/javascript1.1">...</script>
    
      <script type="text/javascript1.2">...</script>
    
      <script type="text/javascript1.3">...</script>
    
      <script type="text/javascript1.4">...</script>
    
      <script type="text/javascript1.5">...</script>
    

[https://tools.ietf.org/html/rfc4329](https://tools.ietf.org/html/rfc4329):

    
    
      3.  Deployed Scripting Media Types and Compatibility
    
        Various unregistered media types have been used in an ad-hoc fashion
        to label and exchange programs written in ECMAScript and JavaScript.
        These include:
    
          +-----------------------------------------------------+
          | text/javascript          | text/ecmascript          |
          | text/javascript1.0       | text/javascript1.1       |
          | text/javascript1.2       | text/javascript1.3       |
          | text/javascript1.4       | text/javascript1.5       |
          | text/jscript             | text/livescript          |
          | text/x-javascript        | text/x-ecmascript        |
          | application/x-javascript | application/x-ecmascript |
          | application/javascript   | application/ecmascript   |
          +-----------------------------------------------------+
    

[https://www.quirksmode.org/js/intro.html](https://www.quirksmode.org/js/intro.html)
(from circa 2007, when it was trendy to NOT date webpages _shakes fist_):

> _JavaScript versions_

> _There have been several formal versions of JavaScript._

> _1.0: Netscape 2_

> _1.1: Netscape 3 and Explorer 3 (the latter has bad JavaScript support,
> regardless of its version)_

> _1.2: Early Version 4 browsers_

> _1.3: Later Version 4 browsers and Version 5 browsers_

> _1.4: Not used in browsers, only on Netscape servers_

> _1.5: Current version._

> _2.0: Currently under development by Brendan Eich and others._

The link above points to a bit of interesting general discussion about
versioning, and isn't as dense as the RFC.

~~~
bryanlarsen
Another mechanism created to do something similar to versioning was strict
mode.

~~~
AgentME
And along these lines: javascript modules automatically turn strict mode on,
so most people will just be on the "new" (strict) version of javascript once
modules are popular.
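
For example:

    // In a classic script this line silently creates a global; inside a
    // module (<script type="module"> or an imported file) it throws,
    // because module code is implicitly strict:
    undeclared = 1; // ReferenceError in a module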

------
iainmerrick
This article says "Wouldn’t it be nice if we could _just_ make the parser
faster? Unfortunately, while JS parsers have improved considerably, we are
long past the point of diminishing returns."

I'm gobsmacked that parsing is such a major part of the JS startup time,
compared to compiling and optimizing the code. Parsing isn't slow! Or at least
it shouldn't be. How many MBs of Javascript is Facebook shipping?

Does anyone have a link to some measurements? Time spent parsing versus
compilation?

~~~
sltkr
I think the main issue with parsing is that you probably need to parse all the
JavaScript before you can start executing any of it. That might lead to a long
delay before you can start running scripts.

Compiling and optimizing code can be slow, too, but JIT compilers don't
optimize all the code that's on a page. At first, the code gets interpreted,
and only hot code paths are JIT-compiled, probably in a background thread.
That means that compiling/optimizing doesn't really add to the page load
latency.

But I agree with you that this is a strange suggestion. If parsing is so slow,
maybe browsers should be caching the parsed representation of javascript
sources to speed up page loading, or even better: the bytecode/JIT-generated
code.

~~~
hashseed
Chrome already caches compile results for previously visited pages to bypass
the initial parsing/compiling.

~~~
Yoric
So does Firefox, btw.

This is addressed here: [https://yoric.github.io/post/binary-ast-newsletter-1/#improving-caching](https://yoric.github.io/post/binary-ast-newsletter-1/#improving-caching)

------
nikita2206
In this thread: people not understanding the difference between byte code
(representing code in the form of instructions) and AST.

------
vvanders
Lua has had something very similar (bytecode, not an AST) via luac for a long
while now. We've used it to speed up parse times in the past and it helps a
ton in that area.

~~~
spc476
There's a problem---you can break out of Lua with malformed bytecode [1], and
the Lua team doesn't want to spend the time trying to validate Lua bytecode
[2]. That's why the latest Lua version has an option to ignore precompiled Lua
scripts.

And sadly, I can see the same happening here.

[1] [https://www.corsix.org/content/malicious-luajit-bytecode](https://www.corsix.org/content/malicious-luajit-bytecode)

[2] The overhead will eat any performance gained by using precompiled Lua code
in the first place.

~~~
vvanders
Sure, if you're loading scripts from an untrusted source then don't use
bytecode. They're pretty clear about that in the docs. However probably about
90% of Lua's use cases are embedded and so in that case it works just fine.

------
s3th
I'm very skeptical about the benefits of a binary JavaScript AST. The claim is
that a binary AST would save on JS parsing costs. However, JS parse time is
not just tokenization. For many large apps, the bottleneck in parsing is
instead in actually validating that the JS code is well-formed and does not
contain early errors. The binary AST format proposes to skip this step [0],
which is equivalent to wrapping function bodies with eval… This would be a
major semantic change to the language that should be decoupled from anything
related to a binary format. So IMO the proposal conflates tokenization with
changing early-error semantics. I'm skeptical the former has any benefits, and
the latter should be considered on its own terms.

Also, there’s immense value in text formats over binary formats in general,
especially for open, extendable web standards. Text formats are more easily
extendable as the language evolves because they typically have some amount of
redundancy built in. The W3C outlines the value here
([https://www.w3.org/People/Bos/DesignGuide/implementability.h...](https://www.w3.org/People/Bos/DesignGuide/implementability.html)).
JS text format in general also means engines/interpreters/browsers are simpler
to implement and therefore that JS code has better longevity.

Finally, although WebAssembly is a different beast and a different language,
it provides an escape hatch for large apps (e.g. Facebook) to go to extreme
lengths in the name of speed. We don't need to complicate JavaScript when such
a powerful mechanism, already tuned to perfectly complement it, exists.

[0]: [https://github.com/syg/ecmascript-binary-ast/#-2-early-error-semantics](https://github.com/syg/ecmascript-binary-ast/#-2-early-error-semantics)

~~~
Yoric
Early benchmarks seem to support the claim that we can save a lot on JS
parsing costs.

We are currently working on a more advanced prototype on which we will be able
to accurately measure the performance impact, so we should have more hard data
soon.

~~~
iainmerrick
It seems like one big benefit of the binary format will be the ability to skip
sections until they're needed, so the compilation can be done lazily.

But isn't it possible to get most of that benefit from the text format
already? Is it really very expensive to scan through 10-20MB of text looking
for block delimiters? You have to check for string escapes and the like, but
it still doesn't seem very complicated.

~~~
comex
Well, for one thing, a binary format’s inherent “obfuscatedness” actually
works in its favor here. If Binary AST is adopted, I’d expect that in
practice, essentially all files in that format will be generated by a tool
specifically designed to work with Binary AST, that will never output an
invalid file unless there’s a bug in the tool. From there, the file may still
be vulnerable to random corruption at various points in the transit process,
but a simple checksum in the header should catch almost all corruption. Thus,
most developers should never have to worry about encountering lazy errors.

By contrast, JS source files are frequently manipulated by hand, or with
generic text processing tools that don’t understand JS syntax. In most
respects, the ability to do that is a _benefit_ of text formats - but it means
that syntax errors can show up in browsers in practice, so the
unpredictability and mysteriousness of lazy errors might be a bigger issue.

I suppose there could just be a little declaration at the beginning of the
source file that means “I was made by a compiler/minifier, I promise I don’t
have any syntax errors”…

In any case, parsing binary will still be faster, even if you add laziness to
text parsing.

~~~
iainmerrick
_a simple checksum in the header should catch almost all corruption_

For JavaScript, you have to assume the script may be malicious, so it always
has to be fully checked anyway.

It's true that the binary format could be more compact and a bit faster to
parse. I just feel that the size difference isn't going to be that big of a
deal after gzipping, and the parse time shouldn't be such a big deal.
(Although JS engine creators say parse time is a problem, so it must be harder
than I realise!)

~~~
comex
> For JavaScript, you have to assume the script may be malicious, so it always
> has to be fully checked anyway.

The point I was trying to make isn't that a binary format wouldn't have to be
validated, but that the unpredictability of lazy validation wouldn't harm
developer UX. It's not a problem if malicious people get bad UX :)

Anyway, I think you're underestimating the complexity of identifying block
delimiters while tolerating comments, string literals, regex literals, etc.
I'm not sure it's all that much easier than doing a full parse, especially
given the need to differentiate between regex literals and division...

~~~
iainmerrick
I was figuring you could just parse string escapes and match brackets to
identify all the block scopes very cheaply.

Regex literals seem like the main tricky bit. You're right, you definitely
need a real expression parser to distinguish between "a / b" and "/regex/".
That still doesn't seem very expensive though (as long as you're not actually
building an AST structure, just scanning through the tokens).
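
For example:

    // The same leading `/` means two different things:
    x = a / b / c;    // two divisions
    x = /b/g.test(c); // a regex literal
    // Deciding which requires knowing whether the previous token can end an
    // expression, so the scanner has to track expression context anyway.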

Automatic semicolon insertion also looks fiddly, but I don't think it affects
bracket nesting at all (unlike regexes where you could have an orphaned
bracket inside the string).

Overall, digging into this, it definitely strikes me that JS's syntax is just
as awkward and fiddly as its semantics. Not really surprising I guess!

------
d--b
I am puzzled by how a binary AST makes the code significantly smaller than a
minified+gzipped version.

A JavaScript expression such as:

var mystuff = blah + 45

Gets minified as var a=b+45

And then what is costly in there is the "var " and character overhead which
you'd hope would be much reduced by compression.

The AST would replace the keywords by binary tokens, but then would still
contain function names and so on.
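
Something like this invented sketch is what I imagine (not the real format):

    // Invented encoding of `var a = b + 45`, for intuition only:
    const stringTable = ["a", "b"];
    const encoded = [
      0x01,       // VarDeclaration
      0x02, 0,    // Identifier -> stringTable[0] ("a")
      0x03, 0x2b, // BinaryExpression, operator '+'
      0x02, 1,    // Identifier -> stringTable[1] ("b")
      0x04, 45,   // NumericLiteral 45
    ];
    // Keywords and punctuation vanish, but names and literals remain -- and
    // gzip was already compressing "var", "=", "+" efficiently.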

I mean, I appreciate that shipping an AST will cut an awful lot of parsing,
but I don't understand why it would make such a difference in size.

Can someone comment?

~~~
bryanlarsen
According to a comment by Yoric, they're seeing a 5% improvement in size over
minified+gzip.

[https://news.ycombinator.com/item?id=15046750](https://news.ycombinator.com/item?id=15046750)

~~~
infogulch
I would be more excited by the possible reduction in parse time since the
grammar should be less ambiguous.

~~~
Yoric
Indeed, this is the main objective. Reduction of file size is also an
objective, albeit with a lower priority.

------
kyle-rb
The linked article somehow avoids ever stating the meaning of the acronym, and
I had to Google it myself, so I imagine some other people might not know: AST
stands for "abstract syntax tree".

[https://en.wikipedia.org/wiki/Abstract_syntax_tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree)

~~~
Yoric
Good point, I'll add this to the article.

------
mnarayan01
For those curious about how this would deal with Function.prototype.toString,
via [https://github.com/syg/ecmascript-binary-ast#functionprototypetostring](https://github.com/syg/ecmascript-binary-ast#functionprototypetostring):

> This method would return something like "[sourceless code]".

~~~
_wmd
It seems this would break a number of libraries (I can't name one, but I'm
certain I've seen it), although per the article it would be a simple affair
for the browser to reify the encoded AST into a compatible representation,
especially if comments are also eventually preserved (per the article, it's on
the roadmap).
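
The pattern would look something like this sketch (AngularJS-style dependency
injection is the classic example of relying on it):

    // Extract parameter names by parsing Function.prototype.toString:
    function argNames(fn) {
      const match = fn.toString().match(/\(([^)]*)\)/);
      return match[1].split(",").map((s) => s.trim()).filter(Boolean);
    }
    
    argNames(function (http, scope) {}); // ["http", "scope"]
    // Under this proposal, fn.toString() would return "[sourceless code]"
    // and the extraction breaks.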

~~~
Yoric
It would be interesting to find any library that actually depends on that.
There used to be some, but I haven't seen any in years.

If you _do_ find one, please feel free to add a comment on the blog or one of
the trackers :)

~~~
mnarayan01
I remember reading about some framework which was doing something nutty like
embedding actual data in the _comments_ on a function, and then parsing out
those comments at run-time. For what it's worth, I believe it was on HN and in
relation to the caveats for older V8 optimization w.r.t. function length
([https://top.fse.guru/nodejs-a-quick-optimization-advice-7353b820c92e](https://top.fse.guru/nodejs-a-quick-optimization-advice-7353b820c92e)),
but it was years ago so, as you say, hopefully they moved to something less
insane in the intervening time.

An aside, since you're here. The "Would it be possible to write a tool to
convert serialized AST back to JS?" portion of the FAQ
([https://github.com/syg/ecmascript-binary-ast#faq](https://github.com/syg/ecmascript-binary-ast#faq)) says that it would
be possible to generate source which would be "semantically equivalent" -- you
might want to call out the Function.prototype.toString exception explicitly
there, though admittedly that level of pedantry might be more obscuring than
enlightening.

------
svat
However this technology pans out, thanks for a really well-written post. It is
a model of clarity.

(And yet many people seem to have misunderstood: perhaps an example or a
caricature of the binary representation might have helped make it concrete,
though then there is the danger that people will start commenting about the
quality of the example.)

------
Existenceblinks
These are random thoughts I just wrote on Twitter in the morning (UTC+7):

"I kinda think that there were no front-end languages actually. It's kinda all
about web platform & browsers can't do things out of the box."

"Graphic interface shouldn't execute program on its own rather than rendering
string on _platform_ which won't bother more."

"This is partly why people delegate js rendering to server. At the end of the
day all script should be just WebAssembly bytecodes sent down."

"Browser should act as physical rendering object like pure monitor screen.
User shouldn't have to inspect photon or write photon generators."

"SPA or PWA is just that about network request reduction, and how much string
wanted to send at a time & today http/2 can help that a lot."

"Project like Drab
[https://github.com/grych/drab](https://github.com/grych/drab) 's been doing
quite well to move computation back to "server" (opposite to self-service
client)"

"WebAssembly compromise (to complement js) to implement the platform. JS api
and WebAssembly should be merged or united."

"VirtualDom as if it is a right way should be built-in just like DOM get
constructed from html _string_ from server. All JS works must die."

"That's how WebComponent went almost a half way of fulfilling web platform. It
is unfortunate js has gone far, tools are actively building on"

"I'd end this now before some thought of individualism-ruining-the-platform
take over. That's not gonna be something i'd like to write (now)"

-----

Not a complete version, though. Speaking in general terms, but I've been
thinking about the details a bit. Then, hours later, I found this thread.

------
TazeTSchnitzel
It's really exciting that this would mean smaller files that parse faster, but
also _more readable_!

------
c-smile
To be honest, I (as the author of Sciter [1]) do not expect too much gain from
this.

Sciter contains a source-code-to-bytecode compiler. Those bytecodes can be
stored to files and loaded, bypassing the compilation phase. There is not too
much gain, as JS-like grammar is pretty simple.

In principle, the original ECMA-262 grammar was so simple that you could parse
it without the need for an AST - a direct parser with one-symbol lookahead
that produces bytecodes is quite adequate.

JavaScript use cases require fast compilation anyway - for source files as
well as for eval() and similar cases like onclick="..." in markup.

[1] [https://sciter.com](https://sciter.com)

And JS parsers used to be damn fast indeed, until the introduction of arrow
functions. Their syntax is what requires an AST.
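
For instance:

    // At `(` a one-token-lookahead parser cannot tell these apart:
    (a, b) => a + b;   // `(a, b)` is an arrow-function parameter list
    (a, b);            // `(a, b)` is a parenthesized sequence expression
    // Only the `=>` after the matching `)` disambiguates, which forces
    // unbounded lookahead, backtracking, or parsing to an AST first.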

~~~
timdorr
How do you do one-symbol-lookahead with things like function hoisting?

[https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/function#Function_declaration_hoisting](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/function#Function_declaration_hoisting)

~~~
c-smile
Not clear how hoisting is related to parsing, really...

The compiler builds a table of variables (a register map) for the function, to
generate the proper bytecodes to access registers. At this point hoisting has
some effect. But it has nothing to do with the parse/AST phase.

------
iamleppert
I'd like to see some real-world performance numbers when compared with gzip.
The article is a little overzealous, with claims that simply don't add up.

My suspicion is it's going to be marginal and not worth the added complexity
for what essentially is a compression technique.

This project is a prime example of incorrect optimization. Developers should
be focused on loading the correct amount of JavaScript that's needed by their
application, not on trying to optimize their fat JavaScript bundles. It's such
lazy engineering.

~~~
Yoric
I'm guilty here: I described this format as "compression technique" because
early feedback indicated that many people assumed this was a new bytecode.
However, the main objective is indeed to speed up parsing. Compressing the
file is a secondary goal.

> My suspicion is it's going to be marginal and not worth the added complexity
> for what essentially is a compression technique.

In terms of raw file size and according to early benchmarks (which may, of
course, be proved wrong as we progress), Binary AST + gzip affords us a
compression that is a little bit better than minification + gzip. Unlike
minification, Binary AST does not obfuscate the code.

The real gain is in terms of parsing speed, in which we get considerable
speedups. I do not want to advertise detailed numbers yet because people might
believe them, and we are so early in the development process that they are
bound to change dozens of times.

> This project is a prime example of incorrect optimization. Developers should
> be focused on loading the correct amount of JavaScript that's needed by
> their application, not on trying to optimize their fat JavaScript bundles.
> It's such lazy engineering.

Well, you are comparing optimizing the language vs. optimizing the code
written in that language. These two approaches are and always will be
complementary.

------
mnemotronic
Yea! A whole new attack surface. A hacked AST file could cause memory
corruption and other faults in the browser-side binary expander.

~~~
Yoric
Sure, a new file format always introduces a new risk. This has never prevented
browsers from adding support for new image formats or compression schemes,
though.

------
kevinb7
Does anyone know where the actual spec for this binary AST can be found? In
particular, I'm curious about the format of each node type.

~~~
Yoric
The specifications are not nearly stable enough to be publicized yet. In
particular, there are several possibilities for file layout, compression, etc.
and we have barely started toying with some of them, so any kind of spec we
publish at this stage would be deprecated within a few weeks.

If you wish to follow the development of the reference implementation, you can
find it here: [https://github.com/Yoric/binjs-ref/](https://github.com/Yoric/binjs-ref/).
It's very early and the format will change often as we

1/ fix bugs;

2/ remove debugging info;

3/ start working seriously on compression;

4/ optimize.

------
z3t4
I wish for something like evalUrl() to run code that has already been parsed
"in the background" so a module loader can be implemented in userland. It
would be great if scripts that are prefetched or http2 pushed could be parsed
in parallel and not have to be reparsed when running eval.
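
Something like this (evalUrl and registry are hypothetical -- this is the
wish, not an existing API):

    // The browser would parse the prefetched/pushed script off the main
    // thread; evalUrl would run the already-parsed result without reparsing.
    evalUrl("/modules/lodash.js").then((moduleExports) => {
      registry.set("lodash", moduleExports); // userland loader bookkeeping
    });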

------
malts
Yoric - the Binary AST size comparisons in the blog - was the original
javascript already minified?

~~~
Yoric
On our sample, x0.95 for minified code (so that's a 5% improvement) and ~x0.3
for non-minified code.

As usual, read these numbers with a pinch of salt, they are bound to change
many times before we are done.

~~~
malts
Does BinJS perform AST transforms like `!0` to `true` which would be shorter
in binary AST encoding? Or does it faithfully model the original code?

~~~
Yoric
It faithfully models the original code. The idea is that if you want to
transform `!0` to `true`, or if you want to obfuscate, etc. you can always
plug an additional tool.

~~~
malts
The 5% gains over minified code are impressive given that BinJS performs no
AST manipulation.

The BinJS prototype will be written in pure JavaScript, I assume, to speed its
adoption?

~~~
Yoric
For the moment, it's JS + Rust, because Rust is better than JS when you keep
refactoring Big Hairy Data Structures. However, once the data structures
stabilize we're planning to shift it entirely to JS.

------
bigato
Trying to catch up with webassembly, huh?

------
limeblack
Could the AST be made an extension of the language similar to how it works in
Mathematica?

~~~
Yoric
What do you mean? How does it work in Mathematica?

~~~
chr1
Mathematica has several representations for code. The main one is the
InputForm, the thing that users type in; then there is the FullForm, which is
the full AST; and there is a way to manipulate it to create new functions on
the fly (or really any other objects, like graphics or sound, since everything
in Mathematica is represented in a uniform way).

For instance, if in input form you have {1 + 1, 2 * 3}, in FullForm this
becomes List[Plus[1, 1], Times[2, 3]], which is a readable version of Lisp,
internally represented the same way: [List [Plus 1 1] [Times 2 3]].

The FullForm is accessible from the language and can be manipulated using
standard language features, kind of like eval but better.

It would be really cool to have something like this in JavaScript, but
unfortunately it looks like JavaScript tools tend to create nonuniform ASTs
that are hard to traverse and manipulate.

~~~
Yoric
For the moment, we're not trying to make the AST visible to the language,
although this would definitely be interesting.

The "recommended" manner to manipulate the AST these days is to use e.g.
Babel/Babylon or Esprima. I realize that it's not as integrated as what you
have in mind, of course, but who knows, maybe a further proposal one of these
days?
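
For instance, with Esprima (the tree is ESTree-shaped plain objects):

    const esprima = require("esprima");
    
    const ast = esprima.parseScript("const x = 1 + 1;");
    const init = ast.body[0].declarations[0].init;
    console.log(init.type, init.operator); // "BinaryExpression +"
    // The tree can be walked and rewritten with ordinary JS -- comparable to
    // Mathematica's FullForm, just not built into the language itself.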

------
tolmasky
One of my main concerns with this proposal is the increasing complexity of
what was once a very accessible web platform. There is this ever-increasing
body of tooling knowledge you need to develop, and with something like this it
would certainly grow, as "fast JS" would require you to know what a compiler
is. Sure, a good counterpoint is that it _may_ be incremental knowledge you
can pick up, but I still think a no-work, make-everything-faster solution
would be better.

I believe there exists such a no-work alternative to the first-run problem,
which I attempted to explain on Twitter, but it's not really the greatest
platform for that, so I'll attempt to do so again here. Basically, given a
script tag:

    
    
        <script src = "abc.com/script.js" integrity="sha256-123"></script>
    

A browser, such as Chrome, would kick off _two_ requests, one to
abc.com/script.js, and another to cdn.chrome.com/sha256-123/abc.com/script.js.
The second request is for a pre-compiled and cached version of the script (the
binary AST). If it doesn't exist yet, the cdn itself will download it, compile
it, and cache it. For everyone except the first person to ever load this
script, the second request returns before the time it takes for the first to
finish + parse (see the sketch at the end of this comment). Basically, the
FIRST person to ever see this script online takes the hit for everyone, since
it alerts the "compile server" of its existence; afterwards it's cached
forever and fast for every other visitor on the web (that uses Chrome). (I
have later expanded on this to have interesting security additions as well --
there's a way this can be done such that the browser does the first compile
and saves an encrypted version on the Chrome cdn, such that Google never sees
the initial script and only people with access to the initial script can
decrypt it.) To clarify, this solution addresses the exact same concerns as
the binary AST proposal. The pros to this approach in my opinion are:

1. No extra work on the side of the developer. All the benefits described in
the above article are just free without any new tooling.

2. It might actually be FASTER than the above example, since cdn.chrome.com
may be way faster than wherever the user is hosting their binary AST.

3. The cdn can initially use the same sort of binary AST as the "compile
result", but this gives the browser flexibility to do a full compile to JIT
code instead, allowing different browsers to test different levels of compiles
to cache globally.

4. This would be an excellent way to generate lots of data before deciding to
create another public facing technology people have to learn - real world
results have proven to be hard to predict in JS performance.

5. Much less complex to do things like dynamically assembling scripts (like
for dynamic loading of SPA pages) - since the user doesn't also have to put a
binary ast compiler in their pipeline: you get binary-ification for free.

The main con is that it makes browser development even harder to break into,
since if this is done right it would be a large competitive advantage and
requires a browser vendor to now host a cdn essentially. I don't think this is
that big a deal given how hard it already is to get a new browser out there,
and the advantages from getting browsers to compete on compile targets make
up for it in my opinion.
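
Roughly, from the browser's side (a sketch; the URLs and cache behavior here
are hypothetical):

    // Both requests go out in parallel:
    async function loadScript(src, integrity) {
      const fromCache = fetch(`https://cdn.chrome.com/${integrity}/${src}`);
      const fromOrigin = fetch(src);
      const cached = await fromCache;
      if (cached.ok) return cached; // pre-compiled binary AST: no parse cost
      return fromOrigin;            // first visitor ever: parse locally; the
                                    // compile server compiles and caches now
    }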

~~~
tomdale
I don't think the binary AST proposal changes the accessibility status quo. In
my mind, the best analogy is to gzip, Brotli, etc.

If you had to have a complicated toolchain to produce gzipped output to get
the performance boost, that would create a performance gap between beginners
and more experienced developers.

But today, almost every CDN worth its salt will automatically gzip your
content because it's a stateless, static transformation that can be done on-
demand and is easily cached. I don't see how going from JavaScript -> binary
AST is any different.

~~~
tolmasky
I actually think gzip serves as a good example of this issue: this comment
alone is daunting to a beginner programmer, and it really shouldn't be. This
chrome/cdn thing could ALSO be auto-gzipping for you, so that a beginner
throwing files on a random server wouldn't need to know whether it supports
gzip or not. I think we really take for granted the _amount_ of stuff
completely unrelated to programming we've now had to learn. If our goal is to
make the web fast by default, I think we should aim for solutions that work by
default.

It's definitely the case that once a technology (such as gzip) gets popular
enough it can reach "by default"-feeling status: express can auto-gzip, and
you can imagine express auto-binary-AST-ing. It's slightly more complicated,
because you still need to rely on a convention for where the binary AST lives
if you want to get around the dual-script-tag issue for older browsers that
don't support binary AST yet (or, I suppose, have a header that specifies the
browser supports binary AST results for JS files?). Similarly, at some point
CDNs may also do this for you, but this assumes you know what a CDN is and can
afford one. The goal I'm after is that it would be nice to have improvements
that work by default on day 1, not after they've disseminated enough.
Additionally, I think it's really dangerous to create performance-targeted
standards this high in the stack (gzip pretty much makes everything faster;
binary AST, one kind of file - and it introduces a "third" script target for
the browser). The chrome/cdn solution means that firefox/cdn might try caching
at a different level of compilation, meaning we get actual real-world
comparisons for a year before settling on a standard (if necessary at all).

Edit: another thing to take into account is that it now becomes very difficult
to add new syntax features to JavaScript if it's no longer just the browser
that needs to support them, but also the version of the Binary AST compiler
that your CDN is using.

~~~
tomdale
The process of getting content onto the web has historically been pretty
daunting, and is IMO much easier now than the bad old days when a .com domain
cost $99/year and hosting files involved figuring out how to use an FTP
client.

In comparison, services like Now from Zeit, Netlify, Surge, heck, even RunKit
make this stuff so much easier. As long as the performance optimizations are
something that can happen automatically with tools like these, and are
reasonable to use yourself even if you want to configure your own server, I
think that's a net win.

I do agree with you though that we ought to fight tooth and nail to keep the
web as approachable a platform for new developers as it was when we were new
to it.

On balance, I'm more comfortable with services abstracting this stuff, since
new developers are likely to use those services anyway. That's particularly
true if the alternative is giving Google even more centralized power, and
worse, access to more information that proxying all of those AST files would
allow them to snoop on.

------
agumonkey
hehe, reminds me of emacs byte-compilation..

------
Laaas
Why does this guy use bits instead of bytes everywhere?

------
jlebrech
with an AST you can visualise code in ways other than text, and also reformat
code like in go-fmt.

~~~
Yoric
Indeed. That's not one of the main goals of the project, but I hope that
standardizing the AST will help a lot with JS tooling, including code
visualisation.

~~~
jlebrech
I want to see a code editor that represents common programming-language
concepts such as classes and functions (OO) in a graphical way and displays
the rest as text.

~~~
robertkrahn01
[http://imgur.com/a/c7cNx](http://imgur.com/a/c7cNx)

~~~
jlebrech
Bit old school, but in that direction, yes.

------
megamindbrian
Can you work on webpack instead?

~~~
Yoric
I personally hope that webpack can eventually support this format. But we're
not there yet.

------
FrancoisBosun
I feel like this may become some kind of reimplementation of Java's bytecode.
We already have a "write once, run anywhere" system. Good luck!

~~~
Yoric
Not really. The reimplementation of something like Java's bytecode is Wasm.
This one is more of a compression format than a bytecode.

