
Off-main-thread HTML parsing in Servo - robin_reala
https://blog.servo.org/2017/08/24/gsoc-parsing/
======
vmarsy
I really liked the blog post and was interested in seeing the end results, but...

> I hope to publish another blog post describing [speculative parsing]
> thoroughly, along with details on the performance improvements this feature
> would bring.

Aren't there any preliminary results on the perf improvements this would bring?

I understand that the percentage of the time speculative parsing succeeds
(instead of having to roll back to sequential parsing) can't be known easily,
since it would require testing a large number of real-world web pages, but
I'd love to see just two simple examples:

* One full of `document.write`, so we can get an idea of the time lost when speculation fails

* One 'embarrassingly parallel' DOM tree, where speculation should pay off the most.

That would give us a good idea of worst-case and best-case results.
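A toy model can make the best-case/worst-case question concrete. This Python sketch is not Servo's implementation (Servo is written in Rust); `tokenize`, `speculate`, `speculative_parse`, and the chunk format are all hypothetical. It assumes a background pass tokenizes all markup ahead of time as if no script calls `document.write`, and that a write invalidates every speculative token after the script's insertion point.

```python
from typing import Callable, List, Optional, Tuple, Union

# Hypothetical document model: a list of chunks, each either
# ("markup", text) or ("script", fn), where fn() returns the string the
# script passes to document.write, or None if it never calls it.
Chunk = Tuple[str, Union[str, Callable[[], Optional[str]]]]


def tokenize(text: str) -> List[str]:
    # Toy tokenizer: whitespace-separated words stand in for HTML tokens.
    return text.split()


def speculate(chunks: List[Chunk]) -> List[Optional[List[str]]]:
    # What a background parser would do: tokenize every markup chunk
    # ahead of time, assuming no script will modify the input stream.
    return [tokenize(p) if k == "markup" else None for k, p in chunks]


def speculative_parse(chunks: List[Chunk]) -> Tuple[List[str], int]:
    """Return (tokens, wasted), where wasted counts discarded speculative tokens."""
    pending = list(chunks)
    speculated = speculate(pending)
    tokens: List[str] = []
    wasted = 0
    i = 0
    while i < len(pending):
        kind, payload = pending[i]
        if kind == "markup":
            tokens.extend(speculated[i])  # speculation held: reuse its tokens
        else:
            written = payload()  # run the script on the main thread
            if written is not None:
                # document.write changed the input: everything speculated
                # after this script is invalid and is thrown away...
                wasted += sum(len(t) for t in speculated[i + 1:] if t is not None)
                # ...the written markup is spliced in at the insertion point...
                pending.insert(i + 1, ("markup", written))
                # ...and the rest of the document is speculated again.
                speculated = speculated[: i + 1] + speculate(pending[i + 1:])
        i += 1
    return tokens, wasted
```

With no writes, `wasted` stays at 0 (the best case). Each write discards the speculative tokens for the rest of the document, so a page that writes early and often approaches the worst case.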

~~~
kevingadd
With things like this, they sometimes do the work without preliminary perf
numbers because other browsers have already done it successfully. IIRC, at
least one production browser already does off-main-thread parsing.

~~~
Touche
Do you know which one? Is it speculative parsing like this one, or does it
stop at scripts?

~~~
bla2
Pretty sure all WebKit-based browsers (including Chrome) do this. I'd guess
they block if needed, but I'm not sure.

~~~
esprehn
FWIW, WebKit never turned on the threaded HTML parser (and deleted it after
the fork [1]), and Blink removed it a while ago too [2]. In Chrome we measured
it across a number of sites and found that the time saved by background
tokenization didn't benefit real-world content enough to justify the cost,
and that in some situations it actually made things worse.

That's not to say Servo shouldn't try it, of course! Part of having a healthy
multi-browser ecosystem is each browser trying lots of ideas and coming up
with new solutions to problems that were encountered in other implementations.

[1]
[http://trac.webkit.org/changeset/162260/webkit](http://trac.webkit.org/changeset/162260/webkit)
(the number cited here is a joke :P)

[2]
[https://groups.google.com/a/chromium.org/forum/#!topic/loadi...](https://groups.google.com/a/chromium.org/forum/#!topic/loading-dev/_cXkjQ0-L_E)
(see also the design doc linked from the email)

~~~
krzat
I guess one of the goals of Rust is to make these sorts of optimisations
viable (the compiler provides safety nets). It will be nice to see what they
can do.

~~~
hsivonen
Gecko has had off-main-thread HTML parsing since Firefox 4. Docs:
[https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_...](https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading)

WebKit/Blink put fewer parts of the HTML parser off the main thread, so
negative results should not be taken to apply to more comprehensive
approaches.

------
DiabloD3
So, has anyone tried Firefox Nightly lately, now that Stylo has finally landed
there? The browser actually feels fast now; I think I've finally made my
decision to dump Chrome.

~~~
Garbage
I have been using Stylo for some time now. I have faced exactly zero problems.
I think it's rock solid. I don't think it's the default CSS style system yet,
though. You can enable it by setting layout.css.servo.enabled to true in
about:config.

~~~
KitDuncan
You can already enable this flag in Firefox stable, but I don't think it
takes effect yet.

~~~
NiLSPACE
It only works in Firefox Nightly. The flag exists in Beta and Stable, but
since Stylo isn't compiled into those versions, the flag doesn't actually do
anything.

~~~
infogulch
Are you sure? I thought I saw a FF dev mention right here, just a couple of
days ago, that the flag is available in stable.

~~~
DiabloD3
From what I can tell, Beta 56 has it (but you have to enable it), Nightly 57
has it
([https://bugzilla.mozilla.org/show_bug.cgi?id=1330412](https://bugzilla.mozilla.org/show_bug.cgi?id=1330412)
is tracking when it'll be enabled by default), and Stable 55 does not have it
at all (although the pref might be there).

------
lewisl9029
Last time I checked, the Servo team also had plans to _execute JS off the main
thread_ and introduce async versions of DOM APIs to enable better parallelism:
[https://news.ycombinator.com/item?id=9011215](https://news.ycombinator.com/item?id=9011215)

Servo is by far the technology I'm most excited about as a frontend developer.
It will be a complete game changer for web performance once all the pieces are
in place.

~~~
pcwalton
There isn't really a main thread in Servo, because WebRender, layout, and
JS/DOM per origin all live in separate threads.
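The point about separate threads can be sketched, very loosely, as a pipeline of threads that only talk through message queues. This is a Python toy, not Servo's code: Python threads and `queue.Queue` stand in for Servo's Rust threads and channels, the function names and message strings are made up, and it assumes one script thread and one layout thread per origin feeding a single shared renderer. Because of Python's GIL, it illustrates the structure rather than any real parallel speedup.

```python
import queue
import threading

DONE = None  # sentinel that shuts a stage down


def script_thread(origin, inbox, layout_q):
    # JS/DOM work for one origin: each event "mutates the DOM" and then
    # asks layout, which runs on another thread, to reflow.
    for event in iter(inbox.get, DONE):
        layout_q.put((origin, f"reflow after {event}"))
    layout_q.put(DONE)


def layout_thread(layout_q, render_q):
    # Turns reflow requests into display lists for the renderer.
    for origin, request in iter(layout_q.get, DONE):
        render_q.put((origin, f"display list for {request}"))
    render_q.put(DONE)


def render_thread(render_q, producers, frames):
    # Stand-in for WebRender: one shared thread handles every origin and
    # stops once all layout threads have signalled DONE.
    remaining = producers
    while remaining:
        item = render_q.get()
        if item is DONE:
            remaining -= 1
        else:
            frames.append(item)


def run_browser(events_by_origin):
    render_q = queue.Queue()
    frames = []
    threads = [threading.Thread(target=render_thread,
                                args=(render_q, len(events_by_origin), frames))]
    for origin, events in events_by_origin.items():
        inbox, layout_q = queue.Queue(), queue.Queue()
        threads.append(threading.Thread(target=script_thread, args=(origin, inbox, layout_q)))
        threads.append(threading.Thread(target=layout_thread, args=(layout_q, render_q)))
        for event in events:
            inbox.put(event)
        inbox.put(DONE)
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return frames
```

Each origin's events stay in order through its own script and layout threads, while the renderer interleaves output from all origins; no stage waits on another except through its queue.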

------
euyyn
I wish he had gone into more detail explaining his solution to the synchronous
queries, as the linked gist assumes much more knowledge of Servo's internals
than I have.

------
andy_ppp
I wish browser vendors would make a stand at this point and say that in one
year all major browser engines will stop supporting document.write. It's
horrible for developers to use and no one I know advocates it, but I'm fairly
sure ad networks still use it for no reason whatsoever.

Implementing browsers is already difficult enough! It's great to see the
lengths Servo is going to in order to be compatible, but on this one I say:
warn developers and deprecate this feature that no one in 2017 should ever use.

~~~
jgraham
Unfortunately the reality is that a double-digit percentage of all sites rely
on document.write [1]. Whilst it's true that a lot of usage is advertising,
it's also used for functionality in sites like Google Docs. Perhaps with a lot
of work you could get the usage down to single-digit percentage figures, but
disabling it entirely is going to break an awful lot of sites that are
unmaintained. For something like NPAPI, where there is a real ongoing cost of
support, breaking old content (with a multi-year timeline for migration)
might be worthwhile, but I think it's hard to make the case that
document.write is in the same category. Yes, it's an old and terrible API that
causes layering violations, makes writing scripted parsers much harder, and
ideally authors would avoid it. But making it go away would be a huge project
with relatively low payoff compared to other possible uses of the same effort.

[1] [https://discuss.httparchive.org/t/how-and-where-is-document-...](https://discuss.httparchive.org/t/how-and-where-is-document-write-used-on-the-web/1006)

~~~
acdha
In case anyone else was curious about precisely what those numbers measured, I
just confirmed that Lighthouse only reports the no-document-write violation
when the page actually calls it, so those numbers aren't inflated by sites
that still have fallbacks left over from the paleolithic-era Web (I'd seen
that with Adobe Analytics/Omniture, where the tracking code injection had a
fallback path for Netscape 2/IE3 until very recently).

------
Touche
Is it possible to use Servo yet? Last time I tried to use it, it was unusably
slow on every website I tested. It was always weird to read about how fast
it was when my (albeit limited) experience was the opposite. Is browser.html
still the preferred way to test?

~~~
dbaupp
As I understand it, a lot of pieces critical for real-world use are really
just minimum viable implementations to allow Servo to work end-to-end and feed
data into the interesting parts. In particular, things like the network stack
don't benefit nearly as much from aggressive parallelism, so little effort has
gone into them, not even enough to bring them up to the level of the
techniques other browsers use.

~~~
Touche
Thanks, I can understand how that might be the case (that many parts are MVIs,
as you said). My issues were never with things like networking; it was more
that scrolling was really bad and stuff rendered slowly.

~~~
pcwalton
WebRender has matured a lot lately; it's even in Firefox nightlies behind an
about:config preference. That said, there are still rough edges. If you see
rendering slowness, feel free to file issues on
[http://github.com/servo/webrender](http://github.com/servo/webrender).

There are no silver bullets for performance, only lots of lead bullets. I'm
confident that parallelism via GPU and multicore is a large architectural win.
But a lot of making the Web fast involves a ton of little special-case
optimizations that just need to be implemented. That's how Stylo went:
promising benchmarks and a mediocre user experience at the research stage
turned into great benchmarks and a great user experience with production
engineering work.

