
Faster JavaScript parsing - mbrubeck
http://blog.mozilla.com/nnethercote/2011/07/01/faster-javascript-parsing/
======
cgs1019
Reminds me of something I once read that said "You can't make code run faster,
you can only make it do fewer things"

~~~
bsiemon
[http://lists.freebsd.org/pipermail/freebsd-current/2010-Augu...](http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html)

------
daeken
You know that you've done a good job of optimizing your stack when parsing JS
is a bottleneck. Kudos on getting to this point and pushing past it.

~~~
praeclarum
In the article he says you will see about a tenth of a second improvement on a
1 MB codebase. That's fantastic, but I wouldn't call it a bottleneck.

~~~
kevingadd
If each page you load takes 0.3 seconds and your workload involves loading a
hundred pages, that's a tangible improvement.

------
justincormack
Why do they parse 16-bit UCS-2? Although JavaScript strings are internally
UCS-2, the files as transferred on the web are all UTF-8, which should be
easier to parse quickly and involve less conversion, unless there is something
really broken in the spec.

~~~
bzbarsky
> the files as transferred on the web are all UTF8

Except they're not. There's a ton of ISO-8859-1 out there, as well as Big5,
Shift_JIS, and so forth.

So Gecko canonicalizes all input to a single encoding; that happens to be
UTF-16 for various historical reasons.

Note, by the way, that since some non-ASCII characters are syntax-special in
JavaScript, parsing it in UTF-8 may not in fact be simpler to do fast, because
locating particular non-ASCII chars in a UTF-8 stream is a bit of a PITA.
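
To make that concrete, here is a minimal TypeScript sketch (not SpiderMonkey's
actual scanner) of finding one syntax-significant character, U+2028 LINE
SEPARATOR: in UTF-16 it is a single code-unit compare, while in UTF-8 it is a
three-byte sequence the scanner has to match.

    // Minimal sketch, not SpiderMonkey code: locating U+2028 LINE SEPARATOR,
    // which is a line terminator in JavaScript syntax.

    // UTF-16 input: one 16-bit code-unit comparison per position.
    function findLineSeparatorUtf16(src: string): number {
      for (let i = 0; i < src.length; i++) {
        if (src.charCodeAt(i) === 0x2028) return i;
      }
      return -1;
    }

    // UTF-8 input: U+2028 is the byte sequence E2 80 A8, so the scanner has
    // to match a multi-byte pattern (or decode) just to notice it.
    function findLineSeparatorUtf8(bytes: Uint8Array): number {
      for (let i = 0; i + 2 < bytes.length; i++) {
        if (bytes[i] === 0xe2 && bytes[i + 1] === 0x80 && bytes[i + 2] === 0xa8) {
          return i;
        }
      }
      return -1;
    }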

~~~
justincormack
Has anyone got any more up-to-date figures than these
<http://www.w3.org/QA/2008/05/utf8-web-growth.html> from 2008, when UTF-8 +
ASCII was already 50% of the web? I think there is a lot less ISO-8859-1 than
there was back then.

I also think that for JavaScript files, almost all will be ASCII, with some
UTF-8, and little else. JSON is of course only allowed to be Unicode, and I
suspect that almost all of it is UTF-8, although that will not generally hit
the JavaScript parser I presume (although maybe Gecko also normalises it?).

Locating non-ASCII chars obviously is an issue, but no worse than the line
feed issue mentioned in the article. Doubling the size of the input is
generally a big performance hit that will be much more significant.

~~~
bzbarsky
> Doubling the size of the input is generally a big performance hit that will
> be much more significant.

Apart from cache locality issues, it's not, really. And for a linear scan like
this prefetching does OK.

And yes, Gecko normalizes JSON input to the JS engine.

I don't have numbers, sorry. But yes, UTF-8 would be the other thing to try
normalizing to; it has its own benefits and drawbacks when the system is
considered as a whole.

------
illumen
Interesting to see that the parsing does not work on a stream anymore, but on
a whole 'file'.

This gives up the real-world optimisation of starting work on a partially
downloaded file on slow network connections.

Even fast internet connections take a while to download 1 MB of JavaScript,
not to mention the majority of internet users on slower connections.

I think this work could be improved to go back to the old behaviour.

~~~
spjwebster
According to the stats gathered by HTTP Archive, the average response size
(i.e. after compression) of a JavaScript file is 15kB:
<http://httparchive.org/interesting.php#responsesizes>

This will make JavaScript parsing faster for the vast majority of sites, and
for the majority of users.

If you have a single JavaScript file that is 1MB _after_ gzipping, you really
should think about modularising your code and lazy-loading the bits that
aren't immediately required. See
<http://ajaxpatterns.org/On-Demand_Javascript>
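
A minimal sketch of that on-demand pattern (the URL and function names below
are illustrative, not taken from the linked article): inject a script tag only
when its feature is first used, so the initial payload stays small.

    // Minimal sketch of on-demand script loading; URL and names are
    // illustrative, not from the article.
    function loadScript(url: string): Promise<void> {
      return new Promise((resolve, reject) => {
        const script = document.createElement("script");
        script.src = url;
        script.onload = () => resolve();
        script.onerror = () => reject(new Error("Failed to load " + url));
        document.head.appendChild(script);
      });
    }

    // Fetch the (hypothetical) charting module only when the reports view
    // is first opened, instead of shipping it in the initial bundle.
    async function showReports(): Promise<void> {
      await loadScript("/js/charts.js");
      // ... render charts now that the module is available ...
    }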

~~~
nonane
Also, don't forget that in most cases the JS code is probably loaded from the
browser's cache anyway.

