Last I profiled, something like 30% of the page-load time for Gmail in Firefox was parsing the JS.
Except they're not. There's a ton of ISO-8859-1 out there, as well as Big5, Shift_JIS, and so forth.
So Gecko canonicalizes all input to a single encoding; that happens to be UTF-16 for various historical reasons.
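For concreteness, here's a minimal sketch of that canonicalization step in Rust, using encoding_rs (the decoder library Firefox actually ships). The to_utf16 helper is my own illustration, not Gecko's internal API:

    // Sketch: canonicalize arbitrarily-encoded bytes to UTF-16, roughly the
    // step a browser performs before handing text to the JS engine.
    use encoding_rs::Encoding;

    fn to_utf16(label: &[u8], bytes: &[u8]) -> Option<Vec<u16>> {
        // Map an HTTP/HTML charset label ("big5", "shift_jis", ...) to a decoder.
        let encoding = Encoding::for_label(label)?;
        let mut decoder = encoding.new_decoder();
        // Worst-case output size for a one-shot decode.
        let mut buf = vec![0u16; decoder.max_utf16_buffer_length(bytes.len())?];
        let (_result, _read, written, _had_errors) =
            decoder.decode_to_utf16(bytes, &mut buf, true);
        buf.truncate(written);
        Some(buf)
    }

    fn main() {
        // ISO-8859-1 bytes for "café": every byte widens to one UTF-16 unit,
        // which is where the size-doubling cost for ASCII-heavy input comes from.
        let utf16 = to_utf16(b"iso-8859-1", b"caf\xe9").unwrap();
        assert_eq!(utf16, vec![0x63, 0x61, 0x66, 0xe9]);
    }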
Locating non-ASCII chars is obviously an issue, but no worse than the line-feed issue mentioned in the article. Doubling the size of the input is generally a big performance hit, and that will be much more significant.
Apart from cache locality issues, it's not, really. And for a linear scan like this, prefetching does OK.
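If you want to sanity-check the doubling claim yourself, here's a rough, hypothetical micro-benchmark (run with --release; numbers are machine-dependent). Both scans are sequential, so the prefetcher helps either way; the UTF-16 pass just pulls twice as many bytes through the cache hierarchy:

    // Hypothetical micro-benchmark: count line feeds in the same text stored
    // as one-byte units (ASCII/UTF-8) vs. two-byte units (UTF-16).
    use std::time::Instant;

    fn main() {
        let line = "var x = 1;\n";
        let utf8: Vec<u8> = line.repeat(1_000_000).into_bytes();        // ~11 MB
        let utf16: Vec<u16> = utf8.iter().map(|&b| b as u16).collect(); // ~22 MB

        let t = Instant::now();
        let n8 = utf8.iter().filter(|&&b| b == b'\n').count();
        let d8 = t.elapsed();

        let t = Instant::now();
        let n16 = utf16.iter().filter(|&&u| u == u16::from(b'\n')).count();
        let d16 = t.elapsed();

        assert_eq!(n8, n16);
        println!("utf8: {:?}  utf16: {:?}", d8, d16);
    }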
And yes, Gecko normalizes JSON input to the JS engine.
I don't have numbers, sorry. But yes, UTF-8 would be the other thing to try normalizing to; it has its own benefits and drawbacks when the system is considered as a whole.
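One concrete benefit on the UTF-8 side, sketched with encoding_rs's one-shot API: input that's already valid UTF-8 can be passed straight through without a copy, and on today's web that's the common case. (A sketch, not Gecko's actual code path.)

    // decode() returns a Cow<str>: valid UTF-8 input without a BOM is
    // borrowed rather than re-encoded, i.e. a zero-copy pass-through.
    use encoding_rs::UTF_8;
    use std::borrow::Cow;

    fn main() {
        let (text, _encoding, had_errors) = UTF_8.decode(b"already utf-8");
        assert!(!had_errors);
        assert!(matches!(text, Cow::Borrowed(_)));
    }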
This gives up the real-world optimisation of starting work on a partially downloaded file on slow network connections.
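Roughly what that streaming behaviour looks like, as a hypothetical sketch using encoding_rs's incremental decoder; a real engine would hand each decoded piece to the tokenizer instead of buffering it all:

    // Feed each network chunk to the decoder as it arrives, so work can start
    // before the download finishes. Chunk boundaries may split a multi-byte
    // sequence; the Decoder carries that state across calls.
    use encoding_rs::SHIFT_JIS;

    fn main() {
        let mut decoder = SHIFT_JIS.new_decoder();
        // Pretend these arrive as separate TCP reads; the two-byte Shift_JIS
        // character 0x93 0xFA ("日") is split across the chunks.
        let chunks: &[&[u8]] = &[b"abc\x93", b"\xfadef"];
        let mut out = Vec::new();

        for (i, chunk) in chunks.iter().enumerate() {
            let last = i == chunks.len() - 1;
            let needed = decoder.max_utf16_buffer_length(chunk.len()).unwrap();
            let mut buf = vec![0u16; needed];
            let (_res, _read, written, _errors) =
                decoder.decode_to_utf16(chunk, &mut buf, last);
            out.extend_from_slice(&buf[..written]);
            // ... hand &out to the parser here rather than waiting ...
        }
        assert_eq!(String::from_utf16(&out).unwrap(), "abc日def");
    }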
I think this work could be improved to go back to the old behaviour.
And it turns out that the optimization is overrated, especially for the common case of the 1MB file being in your cache anyway.