Faster JavaScript parsing (blog.mozilla.com)
117 points by mbrubeck on July 1, 2011 | 16 comments

Reminds me of something I once read that said "You can't make code run faster, you can only make it do fewer things."

You know that you've done a good job of optimizing your stack when parsing JS is a bottleneck. Kudos on getting to this point and pushing past it.

Parsing JS becomes a bottleneck when the web page throws a ton of scripts at you.

Last I profiled, something like 30% of the pageload time for gmail in Firefox was parsing the JS.

In the article he says you will see about a tenth of a second improvement on a 1 MB codebase. That's fantastic, but I wouldn't call it a bottleneck.

If each page you load takes 0.3 seconds and your workload involves loading a hundred pages, that's a tangible improvement.

Why do they parse 16-bit UCS-2? Although JavaScript strings are internally UCS-2, the files as transferred on the web are all UTF-8, which should be easier to parse fast and involve less conversion, unless there is something really broken in the spec.

> the files as transferred on the web are all UTF8

Except they're not. There's a ton of ISO-8859-1 out there, as well as Big5, Shift_JIS, and so forth.

So Gecko canonicalizes all input to a single encoding; that happens to be UTF-16 for various historical reasons.

Note, by the way, that since some non-ASCII characters are syntax-special in JavaScript, parsing it in UTF-8 may not in fact be simpler to do fast, because locating particular non-ASCII chars in a UTF-8 stream is a bit of a PITA.
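To make the point concrete: ECMAScript treats U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR as line terminators, so the tokenizer must recognize them. The following is an illustrative sketch (not SpiderMonkey's actual scanner, and the function names are made up): in UTF-16 each is a single code unit, one comparison; in UTF-8 each is a three-byte sequence, so the scanner needs multi-byte lookahead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UTF-16: U+2028 and U+2029 are single code units, so spotting a
 * line separator is one or two 16-bit comparisons. */
int is_js_line_sep_utf16(const uint16_t *s, size_t i, size_t len)
{
    return i < len && (s[i] == 0x2028 || s[i] == 0x2029);
}

/* UTF-8: U+2028 encodes as E2 80 A8 and U+2029 as E2 80 A9, so the
 * scanner has to match a three-byte sequence with bounds checking. */
int is_js_line_sep_utf8(const uint8_t *s, size_t i, size_t len)
{
    return i + 2 < len && s[i] == 0xE2 && s[i + 1] == 0x80 &&
           (s[i + 2] == 0xA8 || s[i + 2] == 0xA9);
}
```

The UTF-16 check also never straddles a code-unit boundary, which keeps the tokenizer's inner loop simple.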

Has anyone got any more up-to-date figures than these http://www.w3.org/QA/2008/05/utf8-web-growth.html from 1998, when UTF-8 + ASCII was already 50% of the web? I think there is a lot less ISO 8859-1 than there was back then.

I also think that for JavaScript files, almost all will be ASCII, with some UTF-8 and little else. JSON is of course only allowed to be Unicode, and I suspect that almost all of it is UTF-8, although I presume that will not generally hit the JavaScript parser (though maybe Gecko also normalises it?).
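As an aside on the "only allowed to be Unicode" point: RFC 4627, the JSON spec current at the time, lets a parser detect which Unicode encoding is in use from the NUL-byte pattern of the first four octets, since the first two characters of a JSON text are always ASCII. A sketch of that detection rule (the function name is invented for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 4627 encoding detection: the first two characters of a JSON
 * text are ASCII, so the placement of NUL bytes in the first four
 * octets identifies the encoding.  Patterns shown per check. */
const char *detect_json_encoding(const uint8_t *b, size_t len)
{
    if (len < 4)
        return "UTF-8";  /* too short to apply the rule; assume the common case */
    if (b[0] == 0 && b[1] == 0 && b[2] == 0) return "UTF-32BE"; /* 00 00 00 xx */
    if (b[0] == 0 && b[2] == 0)              return "UTF-16BE"; /* 00 xx 00 xx */
    if (b[1] == 0 && b[2] == 0 && b[3] == 0) return "UTF-32LE"; /* xx 00 00 00 */
    if (b[1] == 0 && b[3] == 0)              return "UTF-16LE"; /* xx 00 xx 00 */
    return "UTF-8";                                             /* xx xx xx xx */
}
```

Note the order of the checks matters: the UTF-32 patterns are supersets of the UTF-16 ones, so they must be tested first.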

Locating non-ASCII chars obviously is an issue, but no worse than the line feed issue mentioned in the article. Doubling the size of the input is generally a big performance hit that will be much more significant.

> Doubling the size of the input is generally a big performance hit that will be much more significant.

Apart from cache locality issues, it's not, really. And for a linear scan like this prefetching does OK.

And yes, Gecko normalizes JSON input to the JS engine.

I don't have numbers, sorry. But yes, UTF-8 would be the other thing to try normalizing to; it has its own benefits and drawbacks when the system is considered as a whole.

Also, does it mean serving UCS-2-encoded JavaScript files to clients presenting a Mozilla user agent would speed things up?

"Maybe". You still have to go through and convert lone surrogates to the replacement char.

Interesting to see that the parsing no longer works on a stream, but on a whole 'file'.

This gives up the real-world optimisation of starting work on a partially downloaded file on slow network connections.

Even fast internet connections take a while to download 1 MB of JavaScript, not to mention the majority of internet users on slower connections.

I think this work could be improved to go back to the old behaviour.

According to the stats gathered by HTTP Archive, the average response size (i.e. after compression) of a JavaScript file is 15kB: http://httparchive.org/interesting.php#responsesizes

This will make JavaScript parsing faster for the vast majority of sites, and for the majority of users.

If you have a single JavaScript file that is 1MB after gzipping, you really should think about modularising your code and lazy-loading the bits that aren't immediately required. See http://ajaxpatterns.org/On-Demand_Javascript

Also, don't forget that in most cases the JS code is probably loaded from the browser's cache anyway.

That optimization never existed for <script> loads in Gecko in the first place.

And it turns out that the optimization is overrated, especially for the common case of the 1MB file being in your cache anyway.

