
Proposal for a Binary Encoding of the JavaScript AST - bobajeff
https://github.com/syg/ecmascript-binary-ast
======
fake-name
Jesus Christ. The solution to "our webpage has 7.1 MegaBYTES of javascript"
isn't "let's use a more compact representation of the code", it's STOP FUCKING
ADDING MORE JS SHIT.

\------

It's basically:

Users: "My house is on fire!"

These devs: "Ok, we're here to help, we cleaned your windows. You can see the
fire from outside _so_ much better now!"

\------

Maybe they should solve the problem of their website being a bloated
javascript blob-monster, rather than just trying to fit it into a smaller box.

~~~
rtpg
I feel like this is an unfair criticism unless you know what the page is
doing.

People are building entire vector editing programs on the web. How can you be
fine with Photoshop being several hundred megs but not with this person having
less than 10 megs of code to achieve something similar?

The web isn't just Hacker News and CNN. People write real programs on it.

Obviously shipping less code is Good(TM). But isn't an across the board
improvement also good?

~~~
chroem-
>People are building entire vector editing programs on the web.

This is the exception rather than the rule. There is no reason why it should
take more than ten seconds to load a static news article with text.

~~~
spoiler
In which you, and the original starter of this comment thread, completely
missed the point. The linked page clearly uses complex applications as
examples. Whatever anyone's opinion of them is, they _are_ quite complex.

This proposal is aimed at web _applications_, not simple websites with
image/text content and minimal JavaScript.

~~~
mirko22
Yet Google Sheets and Gmail are applications, while Facebook and LinkedIn are
websites that might as well be fully static and are twice the size, so his
point is kinda correct.

Mind you, Gmail also has features like Chat as do Facebook and LinkedIn.

~~~
djeric3
Facebook.com does a lot more than show a newsfeed, profiles, and
notifications.

    
    
      - It contains a fully-fledged messenger application that supports payments, videoconferencing, sharing rich media, etc.
      - The newsfeed supports interactive 360-degree video, a live video player, mini-games in the newsfeed, and lots of other rich/interactive media
      - It's a gaming platform for third-party games
      - It's a discussion forum, groups management system, and event-planning UI
      - A photo sharing and editing platform, as well as a live video streaming tool
      - A platform for businesses to have an online presence (Pages)
      - A peer-to-peer online marketplace (called "Marketplace")
    

And a dozen other things I can't think of right now. You might say "but I
don't use all of those things". That's another tricky part: every user has a
different "configuration" and different types of content in their newsfeed at
any given moment, requiring them to be served different sets of JavaScript
code.

~~~
codedokode
Most of these features are unnecessary until the user tries to play a video
or a game, opens a dialog, etc.

~~~
djeric3
Right, and that's what happens today: the JS for the secondary functionality
is loaded on demand.

Here's what I have on my FB homepage during a random load:

    
    
      - Search bar for searching people/groups/posts/pages
      - News ticker
      - Friend requests, Notifications
      - Sidebar ads
      - A rich text editor for sharing my status
      - A newsfeed story with a special "memory - 3 years ago" feature
      - Comments & commenting UI under newsfeed stories
      - Suggestions for "People You May Know"
      - A video auto-playing a clip from a friend, with the capability to auto-select between tracks of different video quality based on bandwidth (including bandwidth estimator code)
      - And probably a dozen different A/B experiments that I'm a part of
    

It takes a lot of code just to render all these UI elements. If I interact
with any of them, additional code is loaded (you can see this in the Network
monitor).
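
Here's a rough sketch of that on-demand pattern, using the standard dynamic
`import()` (the selector, module path, and function names are hypothetical,
not Facebook's actual code):

    // Hypothetical: a button that opens the comment UI.
    const commentButton = document.querySelector('.comment-button');

    // Load the editor code only when the user actually needs it.
    commentButton.addEventListener('click', async () => {
      const { mountCommentEditor } = await import('./comment-editor.js');
      mountCommentEditor(commentButton.closest('.story'));
    });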

This homescreen UI is as rich as any desktop application's and requires no
less code to render. The problem this proposal addresses is that a native
version of this app would start a lot faster than the web version. That's
because a browser parses all the code files loaded at startup (inefficiently,
by necessity), while a native app only reads the code for the code paths that
are actually executed.

Basically, O(code executed) is a lot better than O(all the files that contain
executed code). This proposal offers a more parser-friendly encoding and a
change to the parsing semantics.

------
codedokode
I don't like the part that explains the motivation for implementing binary
encoding:

> A brief survey from July 2017 of the uncompressed JS payload sizes for some
> popular web applications on desktop and mobile:

> Facebook 7.1 MB

I think Facebook should just remove the code they don't need rather than
invent new compression methods. There is no way a social network with a chat
needs so much JS code.

~~~
13of40
This is kind of a noob question, but do you have any idea if Javascript
actually gets fully parsed on the client side every time it's used, or if the
AST gets cached? (Let's assume the "client side" is a newish version of
Chrome.)

~~~
tachyonbeam
Chrome now caches compiled code: [https://v8project.blogspot.ca/2015/07/code-
caching.html](https://v8project.blogspot.ca/2015/07/code-caching.html)

In addition to this, browsers do lazy parsing. They do not fully parse the
bodies of functions until the functions are executed.

~~~
Yoric
Note that "lazy parsing" means "do most of the parsing, just don't store the
result". That's a consequence of the specifications of JavaScript that require
all syntax errors to be thrown as early as possible.

This is something that would change with the Binary AST, hence allowing much
lazier parsing.
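
A small illustration of that constraint (the function name is made up): even
though the broken function below is never called, today's engines still have
to scan its body at load time so they can report the SyntaxError, which is why
"lazy" parsing still does most of the parsing work.

    // The whole script fails to compile at load time, even though
    // brokenAndNeverCalled() is never executed.
    function brokenAndNeverCalled() {
      return 1 +;   // SyntaxError: the spec requires this to be reported early
    }
    console.log("never reached");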

------
dgreensp
I love the idea of binary ASTs as the thing browsers compile instead of
source, but I don't think making browsers even more complex is worth it. These
new binary files would also be yet another type of asset to serve alongside
your .js files, with implications for debugging and source maps. If the reward
is that bloated SPAs load 10% faster, so long as they ship both JS and binary
JS forever, I would question whether browsers should parse the binary format
or just keep optimizing their text parsers.

~~~
paulddraper
This seems pretty easy though.

When you parse, you parse to the same JS AST.

When you show it in the browser, you show the text form of the JS AST. Source
maps like normal.

It's a reasonably lightweight proposal, though IDK if it will be compelling
enough.

------
gcoda
It might be a good idea. But aren't scripts inside HTML left uncompiled
because every user should have the right to see what gets executed on their
device? Or is this just my fantasy and not a design choice by the early-days
engineers? I feel really conflicted about wasm and this.

Maybe now we can just ship index.exe and forget about the openness of the web
already.

~~~
geofft
I don't know of anyone who actually audits the JS they run. I know people
(like Richard Stallman) who don't run JS at all, and people (like my roommate)
who use NoScript with exceptions for websites they trust, which is trust by
website, not by code; they'll trust any JS that origin produces in the future.
I suppose there are people running GNU LibreJS who will run any JS that claims
to be freely licensed, but again without looking at the code to see if it's
heavily obfuscated. And then there's the rest of us, the vast majority, who
just run all the JS we see.

If we already had people who audit JS before running it, preserving
auditability would be worth doing, but keeping the ability to do something
that (I think) literally nobody has done in over 20 years of JavaScript isn't
worth trading off actual technical improvements for.

Fortunately we have something very different and (IMO) more practical that
makes this very different from an "index.exe" model: isolation between
websites, enforced at the browser level. aim.exe can read my saved quicken.exe
documents, but hangouts.google.com cannot interact with my bankofamerica.com
account; it can't even know that I have one.
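
As a rough illustration of that browser-enforced isolation (the URLs are used
purely as examples), a cross-origin request from one site to another is
blocked unless the target site explicitly opts in via CORS:

    // Hypothetically running on hangouts.google.com: the browser refuses to
    // expose the response from another origin, so the promise rejects.
    fetch('https://www.bankofamerica.com/accounts', { credentials: 'include' })
      .then(res => res.json())
      .catch(err => console.log('blocked by the same-origin policy:', err));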

I think I have a right to see source to all the code that runs on my device
_with full privilege_. I also think I have a right to limit the powers of code
to which I don't have source, including both limiting what sorts of things it
has access to and how much CPU, memory, etc. it's allowed to consume. I don't
really know that insisting on the right to see source to code that runs
confined is actually going to measurably improve my computing freedom.

~~~
kalleboo
> aim.exe can read my saved quicken.exe documents. hangouts.google.com cannot
> interact with my bankofamerica.com account; it can't even know that I have
> one.

Although these days we have sandboxed native software as well. iOS, Android,
Mac App Store and I believe Windows has something like this now too?

------
jacobush
Thoughts. This looks to make JavaScript even more "lisp-ish" than it already
was, by simplifying syntax. Albeit a binary lisp.

Second thought: regular zip compression already minimises JavaScript, doesn't
it?

Which leads to my third thought: isn't the most important problem they are
trying to solve the slow parsing speed of JavaScript text? They solve that by
introducing unambiguous operators and other things. This could be done without
resorting to binary encodings. They could introduce a plain-text encoding
where (for instance) only UTF-8 is allowed, with unambiguous operators.

THEN they could put a binary encoding on top of that, if necessary. Or just
zip it.

Instead they jumped straight at the binary representation. I'm not saying it's
wrong to do so. But maybe unnecessary?

Please let me know if my 5 minute understanding of what they are doing is
wrong...

EDIT: spelling.

~~~
Yoric
Slow parsing speed is due to several things:

\- encoding issues;

\- ambiguous stuff (for instance, `/` can be the start of a comment, the start
of a regexp or a division, `()` or `{}` can have several meanings, etc.);

\- odd behavior (variables can be used before they are declared, double-
declaring variables is sometimes valid but not always, etc.);

\- plenty of information that is necessary for execution only becomes
available after the point where you need it (sometimes much later, e.g.
determining free variables, determining whether `eval` can affect local
variables, etc.).

Also, this syntax makes it impossible to perform concurrent parsing (at least
without extremely costly non-concurrent pre-parsing) and basically impossible
to perform lazy parsing (at least without that same extremely costly
pre-parsing).
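
To make the ambiguity and hoisting points concrete, a couple of tiny,
contrived examples:

    var a = 6, b = 3, g = 1;
    var x = a / b / g;   // here `/` is division: x === 2
    var y = /b/g;        // here the same characters start a regexp literal

    {}                   // `{}` parsed as an empty block statement...
    var o = ({});        // ...while the same tokens can be an object literal

    console.log(later);  // hoisting: logs undefined, not a ReferenceError
    var later = 1;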

If I understand correctly, your suggestion would have been to:

1\. provide a "better" developer-facing text syntax;

2\. if necessary, move to binary parsing.

Your suggestion makes sense, but I believe that it seriously underestimates
how hard it is to:

a. come up with an alternative, easier to parse, text syntax;

b. make sure that your text syntax has the same semantics as the original
syntax;

c. get _everybody_ to agree on that text syntax;

d. maintain this text syntax through successive versions of JavaScript,
without losing its desirable properties;

e. have every browser maintain two text parsers.

Also, note that b. most likely means standardizing upon an AST, which is
roughly half of the difficulty of the current proposal.

Finally, unless I am missing something, your suggestion does not improve
concurrent or lazy parsing. One of the core benchmarks behind the current
proposal is that lazy parsing can be made dramatically faster than non-lazy
parsing once the expensive pre-parsing phase has been eliminated.

Does this answer your questions?

------
throwaway2016a
I think the web could actually benefit from this, and most developers these
days already send their client code through a compiler, so it would make a
minimal difference in tooling.

Personally, though, I find webpack pretty amazing. When I first heard of it I
very much disliked it, but it combines transpiling with minification and with
code optimizations like tree shaking (getting rid of code from the compiled
result that is unused; see the sketch below).
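
A minimal sketch of what tree shaking does (the module and function names are
made up):

    // math.js
    export function square(x) { return x * x; }
    export function cube(x)   { return x * x * x; }  // never imported anywhere

    // main.js
    import { square } from './math.js';
    console.log(square(4));
    // Because static import/export make cube() provably unused, a
    // tree-shaking bundler (webpack, Rollup, etc.) can drop it from the output.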

------
z3t4
You don't have to parse every single module at page load. You can defer, or
only load code the first time it's used. This 10-15% of time saved is only on
first load. If you have a web "app" it's not like the user will reload it over
and over again like the old-school server-rendered apps, which still work fine
btw; they are even faster now with faster computers and better network
latency. So if this is not the problem of classic web apps, nor the problem of
modern web apps, whose problem will it solve!?

Make cache-friendly modules, only abstract code that can be reused, and keep
the rest of the code inline.

~~~
Yoric
I wrote
[https://news.ycombinator.com/item?id=14908903](https://news.ycombinator.com/item?id=14908903)
in answer to a different question. Does it answer your points, by any chance?

~~~
z3t4
JavaScript is a fun language because it _does not require a build step_.
Developing JavaScript is _easy_, and you want to make it harder just to shave
off a few milliseconds!?

Show me some stats and graphs on how long it actually takes to parse JS into
an AST.

I care a lot for performance, if you want to make me happy, make text
rendering in the canvas faster. And if you could also make eval faster I would
be very happy.

~~~
Yoric
Most professional development I have seen in the past few years already uses a
build step, whether to pack dependencies, run linting, or polyfill language
features, so this Binary AST will not really change that. Indeed, I hope that
Babel, Webpack & co will eventually ship with an option `--binary-ast` and
that the only change for developers will be to add this option to their
toolchain.

However, if you wish to write JavaScript without a build step, you will of
course be free to continue doing so. I don't think that anybody has ever
considered making the Binary AST compulsory. It's just a build target that you
can use to decrease file size and parse time.

For stats, see the proposal. We'll add more stats once the advanced prototype
is complete.

> I care a lot for performance, if you want to make me happy, make text
> rendering in the canvas faster. And if you could also make eval faster I
> would be very happy.

I'm sure both are in progress, but entirely orthogonal to this proposal.

------
twii
> For parsing desktop facebook.com, 10-15% of client-side CPU time is spent
> parsing JavaScript. The prototype we implemented reduced time to build the
> AST by 70-90%.

Whatever the win from their prototype, it should never be a reason to change
core JavaScript behavior like .toString(). A Facebook representative in this
little proposal club doesn't feel good. My first thought was: please optimise
your own codebase and don't break other people's code for your own profit.

~~~
Yoric
Out of curiosity: do you actually use `Function.prototype.toString()`? I
haven't seen any use of this method in years. Plus, the few uses that I have
seen were libraries that attempted to rewrite other libraries and broke at
pretty much every update of their dependencies.

There may be legitimate uses of this method in production code, but I can't
think of any off the top of my head. If you can think of one, don't hesitate
to file it as an issue in the linked tracker.

~~~
amptorn
`Function.prototype.toString()` is the basis for an alternate, ES5-era
multiline string syntax:

    
    
        // Grab the text between "/*" and "*/" in the function's source text.
        // Note: the slice offsets depend on the engine's exact toString()
        // output, which (pre-ES2015) was implementation-dependent.
        var str = (function(){/*
            STRING
            GOES
            HERE
        */}).toString().slice(14, -3);

~~~
TheAceOfHearts
Prior to ES2015, the spec's definition of Function.prototype.toString() was
pretty vague. The behavior was implementation-dependent, although I don't know
the differences between browsers, since I've never used this feature
seriously.
Here's the text from ES5.1 [0]:

> An implementation-dependent representation of the function is returned. This
> representation has the syntax of a FunctionDeclaration. Note in particular
> that the use and placement of white space, line terminators, and semicolons
> within the representation String is implementation-dependent.

[0] [http://www.ecma-
international.org/ecma-262/5.1/#sec-15.3.4.2](http://www.ecma-
international.org/ecma-262/5.1/#sec-15.3.4.2)

------
codedokode
This suggestion also requires a JS engine to have an AST compatible with the
one that is described. So browser developers will have less freedom in their
choice of internal AST representation.

~~~
Yoric
For what it's worth, I'm currently implementing the Binary AST parser (using
the current candidate AST) in SpiderMonkey (which uses its own AST, of
course), and that hasn't caused any issues so far.

The AST used in the file is a convenient abstraction but doesn't need to match
the AST used in-memory.

------
snarfy
It seems like this is a very small gain. After the initial fetch, everything
should be cached locally. If it's not, you did it wrong.

Am I missing something? How is this better than normal web caching?

~~~
Yoric
Yes and no.

You are right, after the initial fetch, everything should be cached locally.
However, caching traditionally does not affect parsing speed. While size gains
are a nice side-benefit of this proposal, the main benefit is parsing speed.

Now, both Chromium and Firefox have introduced smarter caching that stores
post-parsing data for websites that you have already visited (or visit often,
I don't remember the heuristics). This is very useful for all the sites that
you visit regularly and that do not update their JS code between your visits.
However, for all the sites that do update their JS code between your visits,
including Facebook (which updates several times per day), Google Docs, etc.,
and every random website that you read because of a link on Hacker News but
will never revisit, there is a potentially pretty large benefit.

Does this answer your question?

------
Animats
It could be called "Java".

------
z3t4
Send your JavaScript as an image, then unpack it using WebGL.

------
dmitriid
Isn't "binary AST" called "WebAssembly"?

Oh. Right. I forgot. "Web" assembly was specifically designed to exclude
JavaScript.

So yes, go ahead. Create another incompatible binary format.

~~~
dmitriid
People are downvoting my comment for some reason.

The whole premise of "binary JavaScript AST" is "compile JavaScript into a
binary representation because bundle sizes are just too big and take too much
time to parse".

Isn't WebAssembly designed to kinda solve that, among the many other problems
presented in the Readme? Instead of parsing JS (or whatever language) you just
download a binary (which has been compiled, optimised, etc.).

So, instead of designing _web_ assembly to support the most important language
on the web, the committee (as always, there's an effing committee to committee
the hell out of things) designs webassembly with just one single language in
mind: C++. Because reasons.

Hence, this ridiculous proposal: oh, let's create a binary representation of
Javascript AST because reasons. Because "oh, we can't compile to webassembly
because webassembly hasn't even been designed to support the main language on
the web and many other major languages".

Instead, we're just going for the lame excuse of "no vendor would agree to
ship bytecode." (why would they agree to ship binary ASTs?) and "there is no
guarantee that it would be any faster to transmit/parse/start" (where's the
guarantee with binary AST?).

~~~
Yoric
Well, if you look at the design goals of WebAssembly [1], you'll see very
different high-level goals. Sure, there is some intersection, but not nearly
as much as you seem to believe.

Now, there are very good reasons to not ship bytecode (e.g. because once you
standardize the bytecode on which your VM runs, you're dead in the water and
your VM stops evolving). The binary AST was carefully designed to avoid this
issue.

Indeed, as written in the proposal, there is no guarantee that sending
bytecode would be any faster to transmit/parse/start. Moreover, experiments
suggest that it wouldn't. On the other hand, if you have read the proposal,
you have seen that experiments with binary AST suggest that it is faster to
transmit/parse/start.

As for committees, well, that's how the web works. It is certainly not
perfect, but it beats single-vendor-decides-everything.

[1]
[https://github.com/WebAssembly/design/blob/master/HighLevelG...](https://github.com/WebAssembly/design/blob/master/HighLevelGoals.md)

~~~
dmitriid
> Well, if you look at the design goals of WebAssembly [1], you'll see very
> different high-level goals. Sure, there is some intersection, but not nearly
> as much as you seem to believe.

Yes. Because it's not tied to a single programming language.

> Now, there are very good reasons to not ship bytecode (e.g. because once you
> standardize the bytecode on which your VM runs, you're dead in the water and
> your VM stops evolving). The binary AST was carefully designed to avoid this
> issue.

Oh. Right. Because binary AST is not something that will be standardized on,
and once you're set on a particular version of the AST, you're not dead in the
water?

Oh. Wait. Let me quote the proposal:

\--- quote ---

The grammar provides the set of all possible node kinds and their ordered
properties, which we expect to be monotonically growing. However, not all
vendors will support all nodes at any given time.

... parsing a correctly formatted file fails if a parser encounters a feature
that the engine does not implement

\--- end quote ---

Same problems

> Indeed, as written in the proposal, there is no guarantee that sending
> bytecode would be any faster to transmit/parse/start. Moreover, experiments
> suggest that it wouldn't.

Let me quote the WebAssembly READMEs:

\--- quote ---

Wasm bytecode is designed to be encoded in a size- and load-time-efficient
binary format. WebAssembly aims to execute at native speed by taking advantage
of common hardware capabilities available on a wide range of platforms.

The kind of binary format being considered for WebAssembly can be natively
decoded much faster than JavaScript can be parsed (experiments show more than
20× faster). On mobile, large compiled codes can easily take 20–40 seconds
just to parse, so native decoding (especially when combined with other
techniques like streaming for better-than-gzip compression) is critical to
providing a good cold-load user experience.

\--- end quote ---

~~~
Yoric
Mmmhhh... I believe that I am starting to understand your point. If I
understand correctly, you regret that WebAssembly does not offer a set of
opcodes that would let it trivially (or at least simply) compile JavaScript,
right?

If so, let's reset the discussion, because we were actually talking past each
other.

Yes, such opcodes might essentially solve the issues that the binary AST is
attempting to solve, albeit probably at the cost of losing the source code of
JavaScript.

I have not been part of the debates on Wasm, but I believe that I can
extrapolate some of the reasons why this wasn't done:

1\. Coming up with a decent target for statically compiled languages, whose
compilation is basically well understood, is much easier than coming up with
(and maintaining) a decent target for a dynamic language whose specification
is amended roughly once per year.

2\. Each browser vendor has its own JS-specific bytecode format. Transpiling a
neutral bytecode to a vendor-specific bytecode isn't particularly easy.

3\. If a vendor decides to expose their own bytecode format and let third
parties ship to this format, they suddenly get a low-level compatibility
burden that will be extremely damaging to their own work on the JS VM.

4\. If the JS opcodes are high-level, suddenly Ecma needs to maintain two
different specifications for the semantics of the same language:
specifications by interpretation and specifications by compilation. On the
other hand, if the JS opcodes are low-level, there is a pretty large burden on
developers to maintain a JS-to-bytecode compiler and make sure that it has the
exact same semantics as JavaScript. In the latter case, there is also a pretty
good chance (and some early experiments suggest) that the compiled files would
be much larger and much slower to parse.

In comparison with these issues, coming up with a simple Binary AST format is
rather simple, hence has much higher chances of achieving success and
consensus. Additionally, we have encouraging numbers that indicate that Binary
AST can speed up things a lot. Finally, the Binary AST has several interesting
side-benefits, including the fact that it maintains source code readability.

~~~
dmitriid
> Mmmhhh... I believe that I am starting to understand your point. If I
> understand correctly, you regret that WebAssembly does not offer a set of
> opcodes that would let it trivially (or at least simply) compile JavaScript,
> right?

Yes. Because SURPRISE it's called _web_ assembly

> Each browser vendor has its own JS-specific bytecode format. Transpiling a
> neutral bytecode to a vendor-specific bytecode isn't particularly easy.

Newsflash: webassembly is:

1\. based on asm._js_

2\. supported (eventually) by all browser vendors

3\. specifically targeting the _web_

4\. already shown to be a significant improvement (see, for example,
[https://blog.figma.com/webassembly-cut-figmas-load-time-
by-3...](https://blog.figma.com/webassembly-cut-figmas-load-time-
by-3x-76f3f2395164))

> If a vendor decides to expose their own bytecode format

Why would they want to expose their own bytecode format? The entire point of
WebAssembly was to be a cross-platform format.

> If the JS opcodes are high-level, suddenly, Ecma needs to maintain two
> different specifications for the semantics of the same language:
> specifications by interpretation and specifications by compilation.

Why? No one maintains two different specifications for C++ just because it
compiles to JavaScript (via Emscripten) and to WebAssembly.

Meanwhile, with a binary AST, Ecma will indeed need to maintain two different
specifications.

> In comparison with these issues, coming up with a simple Binary AST format
> is rather simple, hence has much higher chances of achieving success and
> consensus.

Issues that you pulled out of nowhere, frankly.

> Additionally, we have encouraging numbers that indicate that Binary AST can
> speed up things a lot.

We have encouraging numbers that WebAssembly speeds up things a lot. Yet, you
insist "oh, we don't know, it might, or it might not".

> Finally, the Binary AST has several interesting side-benefits, including the
> fact that it maintains source code readability.

Quoting WebAssembly FAQs:

\--- quote ---

WebAssembly is designed to be pretty-printed in a textual format for
debugging, testing, experimenting, optimizing, learning, teaching, and writing
programs by hand. The textual format will be used when viewing the source of
wasm modules on the web.

\--- end quote ---

I strongly suggest you

1\. read up on WebAssembly

2\. possibly apply efforts to bring JavaScript to WebAssembly

~~~
Yoric
I'm afraid that we are on vastly different wavelengths. If you feel that you
can improve WebAssembly, then by all means, please do so.

I am convinced that I can do good by making Binary AST a reality, so I will
focus my energy on that.

Thanks for the conversation.

~~~
dmitriid
So, basically, all your "arguments" for binary AST resolve to:

\- binary AST will not have versioning problems unlike WASM (this is false)

\- binary AST is better than ASM, because as soon as you standardize on
bytecode, "your VM will not evolve" (this is false)

\- binary AST can be used to view/debug source code, while WASM can't (this is
false)

\- binary AST is faster to load and parse than WASM (neither false nor true,
because there are no comparisons or experiments)

\- supporting WASM means maintaining two versions of the ECMAScript standard,
unlike binary AST (this is false)

and so on and so on

However, JavaScript is already a dumpster fire, so this effort will neither
improve nor worsen the situation.

