
Inside a fast CSS engine - rbanffy
https://hacks.mozilla.org/2017/08/inside-a-super-fast-css-engine-quantum-css-aka-stylo/
======
crescentfresh
I always wonder, who puts together nifty little blog posts on this kind of
thing complete with graphics just for the article? By that I mean, literally
what job title do they have?

My colleagues and I would/could write up a technical breakdown of
something neat or innovative we might have done to solve some problem at work,
but we sure as shit can't make cool little graphics interspersed between
opportune paragraphs, nor could we figure out how to make the thing
entertaining to read.

Is this kind of thing done in coordination with like a PR/graphics department?

~~~
linclark
No, I created Code Cartoons in my spare time. After I worked at Mozilla for a
while, I pitched the idea of me making Code Cartoons explaining the things we
were developing in Emerging Technologies. My (current) boss was super into the
idea.

I talked more about this on a recently recorded Hanselminutes podcast. It
should come out soon.

~~~
Kagerjay
your cartoon explanations are wonderful :)

By the way, what program are you using to make your art?

~~~
linclark
Thank you :) I use Photoshop on a Wacom Cintiq.

------
fpgaminer
Isn't it just crazy that we're gonna get all this cool tech in a browser that
is completely free and open source?

And along the way, Mozilla created what is perhaps the most disruptive
programming language of the past decade. For free. And open source.

It's really hard to appreciate the gravity of this.

------
robin_reala
I turned this on a couple of weeks ago on Nightly and have noticed precisely
zero problems, and a really nice little speedup on CSS-heavy sites. Really
good to see large chunks of parallelised Rust code start making their way over
from Servo to Firefox.

~~~
hamandcheese
Are there any plans for Servo to be a "real" browser, or will it always be
more of an R&D playground for Firefox?

~~~
Manishearth
It's a possible eventual goal but we don't have concrete plans or a timeline.

Stylo and the like let us get improvements out to users well before we need to
deal with that.

~~~
Brakenshire
Also, all the discussion so far seems to focus on desktop Firefox. Are these
improvements coming to Firefox for Android, or is that a longer-term project,
perhaps for when Servo is ready?

~~~
Manishearth
We're hoping to have Stylo in 57, and I would expect it to be on Android in
the next release (58), but no promises.

Basically, we intend for it to work on Android eventually, but it didn't get
prioritized.

~~~
enz
I can turn on `layout.css.servo.enabled` on Firefox 55 for Android (and
desktop). What does that mean? If I understand correctly, Firefox will have
this property turned on by default in 57/58, but is it possible to enable it
early?

~~~
Manishearth
No, it won't do anything on 55. Or on 56/57 Android.

Stylo can be disabled at build time, and this was the case for 55 and is still
the case for Android. We don't remove the associated prefs in that case
(about:config doesn't get UX attention because it's not exactly user-facing).

You can go to about:support to see whether Stylo is actually enabled; there's
a "stylo" column. That column isn't there in 55, but no 55 release has Stylo
enabled at build time anyway, so if you don't have the column at all, Stylo
isn't there.

------
sanxiyn
You may want to actually read this code. You can start by searching
"LayoutStyleRecalc" at
[https://github.com/servo/servo/blob/master/components/layout...](https://github.com/servo/servo/blob/master/components/layout_thread/lib.rs).
The following is a verbatim copy.

    // Perform CSS selector matching and flow construction.
    if traversal_driver.is_parallel() {
        let pool = self.parallel_traversal.as_ref().unwrap();
        // Parallel mode
        parallel::traverse_dom::<ServoLayoutElement, RecalcStyleAndConstructFlows>(
            &traversal, element, token, pool);
    } else {
        // Sequential mode
        sequential::traverse_dom::<ServoLayoutElement, RecalcStyleAndConstructFlows>(
            &traversal, element, token);
    }

~~~
taeric
What am I aiming for on this read? I'm assuming the hard parts are swallowed
elsewhere, which is good. I'm a little surprised there are two methods for the
traversal, depending on the parallel versus sequential. (I'd expect that could
have been switched based on the "pool" parameter passed to a single method.
No, I don't actually care.)
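
For what it's worth, the single-method version I had in mind would look
roughly like this (hypothetical sketch, not Servo's actual code: the
traversal "work" is faked as a sum over a slice, and `std::thread::scope`
stands in for the thread pool):

```rust
use std::thread;

// Hypothetical sketch, not Servo's actual API: one traversal entry point
// that picks parallel or sequential mode based on an Option, instead of
// exposing two separate traverse_dom functions.
fn traverse(nodes: &[u64], num_threads: Option<usize>) -> u64 {
    match num_threads {
        // "Parallel mode": split the slice across scoped worker threads.
        Some(n) if n > 1 => thread::scope(|s| {
            let chunk = (nodes.len() + n - 1) / n; // ceiling division
            nodes
                .chunks(chunk.max(1))
                .map(|c| s.spawn(move || c.iter().sum::<u64>()))
                .collect::<Vec<_>>() // spawn everything before joining
                .into_iter()
                .map(|h| h.join().unwrap())
                .sum()
        }),
        // "Sequential mode": walk the slice on this thread.
        _ => nodes.iter().sum(),
    }
}
```

Two explicit entry points do make the parallel/sequential split more visible
at the call site, though, which may be the point.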

~~~
sanxiyn
What do you consider the hard parts? The actual magic is that there are no
hard parts.

~~~
taeric
I haven't read it. That snippet just seemed odd to me.

The hard parts, in this case, would be in Rust, most likely. I'm assuming that
flipping CSS to parallel is not a trivial process. I'm further assuming that
there is still some sort of locking mechanism so that you don't fire off two
parallel traversals at the same time. Or allow updates during a scan. (Or
abort running scans if an update happens?)

~~~
sanxiyn
Ah yes, the parallelism (locking etc.) part is handled entirely by a generic
Rust library (Rayon in this case), and is not specific to Stylo at all. Here
is where Stylo meets Rayon:
[https://github.com/servo/servo/blob/master/components/style/...](https://github.com/servo/servo/blob/master/components/style/parallel.rs)

    //! Implements parallel traversal over the DOM tree.
    //!
    //! This traversal is based on Rayon, and therefore its safety is largely
    //! verified by the type system.

    /// A parallel top-down DOM traversal.
    ///
    /// This algorithm traverses the DOM in a breadth-first, top-down manner. The
    /// goals are:
    /// * Never process a child before its parent (since child style depends on
    ///   parent style). If this were to happen, the styling algorithm would panic.
    /// * Prioritize discovering nodes as quickly as possible to maximize
    ///   opportunities for parallelism. But this needs to be weighed against
    ///   styling cousins on a single thread to improve sharing.
    /// * Style all the children of a given node (i.e. all sibling nodes) on
    ///   a single thread (with an upper bound to handle nodes with an
    ///   abnormally large number of children). This is important because we use
    ///   a thread-local cache to share styles between siblings.
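
That "safety verified by the type system" point is easy to demo outside
Servo: scoped threads can freely share immutable borrows of the tree, and the
compiler rejects anything that would race. Here's a toy top-down traversal in
the same spirit (this uses `std::thread::scope` rather than Rayon, the `Node`
type is made up, and it spawns one thread per child where the real code
batches siblings onto one thread):

```rust
use std::thread;

// Toy stand-in for a DOM node; not Servo's real types.
struct Node {
    style_input: u64,
    children: Vec<Node>,
}

// Top-down traversal: a child's "style" is computed from its parent's,
// so a parent is always processed before its children, matching the
// first goal quoted above.
fn style(node: &Node, parent_style: u64) -> u64 {
    let own = node.style_input ^ parent_style;
    // One scoped thread per child subtree (simplified; Stylo batches
    // siblings). The borrow checker proves the shared &Node borrows
    // can't outlive the scope or be mutated concurrently.
    let child_sum: u64 = thread::scope(|s| {
        node.children
            .iter()
            .map(|c| s.spawn(move || style(c, own)))
            .collect::<Vec<_>>()
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum()
    });
    own + child_sum
}
```

If you tried to mutate a node from two of those spawned closures, the code
simply wouldn't compile; that's the guarantee Rayon leans on.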

~~~
peoplewindow
It looks like Rayon is the same thing as Java's ForkJoinPool and parallel
streams but with the neat trick (really, Rust's neat trick) that it can
statically check that the code is safe to parallelise through the borrow
checker.

~~~
sanxiyn
Yes, that's a good summary.

------
lucideer
It's great to see any company going into detail about their technical
implementation, so I'm extremely hesitant to be critical, but I'm really
curious who the target audience for this one particular article is.

It's a very very odd mix of language that sounds like it's directed at a very
young child and standard technical speak. Not the usual for the Hacks blog.

Not to fault the article too much, but I just found the tone a bit confusing.
Even veering towards condescension in some parts, though I'm certain that's
entirely accidental and wasn't the author's intent at all.

~~~
ChrisSD
Presumably it's trying to bring technical details to the widest possible
audience. So some details may fly over the heads of some folk, whereas more
domain-knowledgeable people might think some parts sound condescending. It's
a hard circle to square.

I think Google tried something similar when they were introducing the Chrome
browser, although maybe with a different balance.

~~~
Stratoscope
You're probably thinking of this Chrome comic:

[https://www.google.com/googlebooks/chrome/](https://www.google.com/googlebooks/chrome/)

~~~
wlesieutre
When you said "Chrome comic" I assumed it was this one
[http://i.imgur.com/bhfYx6R.jpg](http://i.imgur.com/bhfYx6R.jpg)

------
pducks32
This is really great of Mozilla. I’m really excited to see such a large Rust
project used at such a scale; after that, I think there will be few doubts
that it’s a really impressive language. The fact that Mozilla knew this and
decided to take a step as bold as rewriting their engine is also super cool.
I’ve done rewrites and they never go well, so hats off to them.

------
aembleton
The writeup is inspiring. I found it very clear and yet reasonably in depth.
It helps me to understand how much work modern browsers are doing.

Also, excellent use of Rust.

~~~
moosingin3space
> excellent use of Rust

I should hope so! It was one of the major motivators behind Mozilla's
stewardship of Rust.

------
jancsika
> 4. Paint the different boxes.

Is this really what happens under the hood?

1. If I overlap 52 HTML <div>s like a deck of cards, does the browser really
paint all 52 div rectangles before compositing them?

2. If I overlap 52 <g>s like a deck of cards, does the browser really paint
all 52 <g>s before compositing them?

3. In Qt, if I overlap 52 QML rectangles like a deck of cards, does the
renderer only paint the parts of the rectangles that will be visible in the
viewport? I was under the impression that this was the case, but I may be
misunderstanding how the Qt QML scene graph (or whatever it is called) works
in practice.

edit: typo

~~~
pcwalton
> 1. If I overlap 52 HTML <div>s like a deck of cards, does the browser
> really paint all 52 div rectangles before compositing them?

Browsers don't typically do very good occlusion culling in general.

WebRender aims to change that, by using the hardware Z-buffer. :)

~~~
vvanders
You may want to be careful with that approach.

Not all GPUs have the Z-buffer fillrate to make that approach viable
(especially on mobile). I've seen more than a few cases where it was actually
faster to turn off Z-buffering and do the overdraw. More than a few
architectures share Z-buffer bandwidth with other pipelines.

On tiled architectures it'll also increase your tile count which can impact
your per-drawcall overhead.

For low-tri things like UI you're better off doing the rect culling yourself
in software (ideally on another thread) and only falling back to the Z-buffer
if you have actual 3D transforms that need per-pixel culling.

~~~
pcwalton
Not in our experience. WebRender 1 used to do the rectangle culling on CPU and
it ended up being way slower than using the Z buffer on every architecture we
tried, including mobile. (Overdraw was even worse.) There are a surprisingly
large number of vertices on most pages due to glyphs and CSS borders. Note
also that rounded rectangles are extremely common on the Web and clipping
those in software is a big pain.

Generally, we are so CPU bound that moving _anything_ to the GPU is a win. We
had to fight tooth and nail to make WebRender even 50% GPU bound...

~~~
vvanders
Fair enough; my data is from about 4 years ago, so it may be out of date.
There are some embedded GPUs with pretty 'interesting' architectures.

I would argue, though, that if overdraw vs. the Z-buffer hurt your
performance, then you were more than 50% GPU bound :).

------
tannhaeuser
Congrats! Beyond the CSS engine itself, I also very much appreciate inside
development stories like these. I'd also like to read a meta-story about the
development efforts in terms of time spent, prior knowledge required etc., and
CSS spec feedback, with a reflection on the complexity of implementing CSS
from scratch.

------
t20n
Haven't even read it, just looked at the drawings, and now I know how a
browser parses CSS.

------
altotrees
Really late to this discussion, but wow. Having worked as a web developer and
as a tech writer and editor, this writeup pressed all the right buttons for
me. High-level concepts broken down in an exciting way. Nothing turns me off
quite as much as clicking
on a tech blog post and suddenly feeling like I am reading a whitepaper or
portion of someone's Ph.D. dissertation. This is the kind of post I love —
stripping things down to the nuts and bolts but keeping me engaged in a way
that gives me that excited feeling in the pit of my stomach like I am watching
something important take shape.

This is a fantastic technology and I feel like Servo has a pretty amazing
future ahead of it. Exciting stuff.

------
om2
I wish this post included some benchmarks or measurements.

~~~
pcwalton
Just asked Emilio on IRC for some quick numbers.

emilio: pcwalton: wrt gecko we have stuff like
[https://bugzilla.mozilla.org/show_bug.cgi?id=1342220#c25](https://bugzilla.mozilla.org/show_bug.cgi?id=1342220#c25)
and similar

emilio: pcwalton: there's also the tp6 numbers, though those also measure CSS
parsing and other stuff that isn't the style engine per se

pcwalton: emilio: our tp6 numbers are improved over Gecko at this point, yes?
:)

emilio: pcwalton: amazon by a huge amount, facebook not yet I believe, but
patches are on the queue that should make it turn around :)

emilio: pcwalton: happy to talk with mjs about impl details too, if he wants.
I know a bit of WK stuff too :)

~~~
bholley
This is an easy benchmark for rejecting a lot of selectors:

[http://bholley.net/testcases/style-perf-tests/perf-
reftest/b...](http://bholley.net/testcases/style-perf-tests/perf-
reftest/bloom-basic.html)

Firefox with STYLO_THREADS=1 gets about 160ms on my machine, which is
basically parity with Safari and Chrome. With the parallelism, Firefox gets
40ms. :-)

You can also simulate sequential mode in recent nightlies by ctrl-clicking and
loading the tab in the background (we disable parallelism for background
loads).

~~~
om2
Thanks, that is the kind of thing I was looking for (though a bit of a narrow
test). Cool to hear that this gets a solid speedup from parallelism.

------
ndh2
Very nice writeup! One thing I found strange is that multi-threading gets the
ELI5 treatment, but the reader is expected to know what the DOM is.

------
c-smile
Parallel processing shows benefits only if you have physical cores to run the
code on. If just one core is available to the app, then parallel processing
is a loss due to thread-preemption overhead.

Are there any real-life examples of achieved speedup?

~~~
taeric
I have similar concerns. I'm all for using my machine to its fullest, but by
and large, applications like web browsing should be an additional thing I am
doing on my computer, not something that thinks it can take over the full
computer.

Though, I have to admit I am also a little torn on this. Yes, browsing is
typically done "during compile" or some other task. However, I have also
begun doing most of that work remotely so that I can save battery on my
laptop. To that end, preserving cores for the tasks that actually need them
is now less of a concern.

~~~
matthewmacleod
If that’s what you want to do, then you should ‘nice’ your web browser (or
use whatever priority mechanism your environment has available). Artificially
limiting performance is absolutely the wrong approach!
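
For instance, on Linux or macOS (niceness runs from -20, highest priority, to
19, lowest; `sleep` stands in for the browser binary here):

```shell
# Start a process at the lowest CPU priority (niceness 19), then check
# what the kernel actually assigned. Swap `sleep 5` for your browser
# binary, e.g. `nice -n 19 firefox`.
nice -n 19 sleep 5 &
pid=$!
ps -o nice= -p "$pid" | tr -d ' '   # prints 19
kill "$pid"
```

An already-running browser can be deprioritized the same way with
`renice -n 19 -p <pid>`.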

~~~
taeric
It isn't artificially limiting. If my browser didn't need all of my cores to
perform, we wouldn't be having this discussion. :)

And I seriously question whether my browser needs this to perform well. I am
not completely closed to the idea, but I am highly skeptical.

~~~
matthewmacleod
The fact that it's been done is some evidence that it's required. If it
weren't, it's hard to imagine why limited resources would be wasted on it!

Almost every modern device running a browser has multiple cores, and that
trend is almost certainly going to increase – so it definitely feels like
allowing a core part of the web platform to expand across cores will be
beneficial.

~~~
taeric
I challenge that. That it was done is as much evidence that it was doable as
that it was required.

It is also a bloat race. Web pages are getting increasingly complicated, with
very little benefit to end users. I'd wager a growing number of the cycles
and network requests go to tracking nowadays, not to mention the UI language
doing more and more contortions to give us a page that could be described
much more succinctly.

None of this is to say I want it stopped. I just have a feeling of concern
that this is leading to an ever-increasing march toward faster and faster
machines to do basic work.

------
kristofferR
It's such a shame Firefox (including the nightlies) kills my Mac (making most
other applications hang/break), since the new versions are otherwise way
better than Chrome.

Does anyone know what it is about Firefox that makes the rest of my system
unable to spawn new processes?

~~~
robin_reala
That’s weird. If you’re not attached to your profile you could try resetting
it (go to about:support) and see if that fixes it. Otherwise I’d file a bug.

~~~
cpeterso
"Refreshing" your Firefox profile from about:support is not as scary as it
sounds. :) It creates a clean profile and imports your old profile's
bookmarks, passwords, and browsing history. You just lose any settings tweaks
and old add-ons you had. Your old profile directory is also backed up in case
something goes wrong during the new profile import.

