Inside a fast CSS engine (hacks.mozilla.org)
660 points by rbanffy on Aug 22, 2017 | 141 comments

I always wonder, who puts together nifty little blog posts on this kind of thing complete with graphics just for the article? By that I mean, literally what title do they have?

Myself and my colleagues would/could write up a technical breakdown of something neat or innovative we might have done to solve some problem at work, but we sure as shit can't make cool little graphics interspersed between opportune paragraphs, nor could we figure out how to make the thing entertaining to read.

Is this kind of thing done in coordination with like a PR/graphics department?

No, I created Code Cartoons in my spare time. After I worked at Mozilla for a while, I pitched the idea of me making Code Cartoons explaining the things we were developing in Emerging Technologies. My (current) boss was super into the idea.

I talked more about this on a recently recorded Hanselminutes podcast. It should come out soon.

I loved your crash course in memory management. It had a real Randall Munroe vibe to it. Please keep up the good work.

That's a great idea, blog posts are usually too dry and we try to spice them up with photos and related media, but they're usually only tangentially related. Having the ability to draw something specific to the post really makes it more enjoyable to read.

your cartoon explanations are wonderful :)

By the way, what program are you using to make your art?

Thank you :) I use Photoshop on a Wacom Cintiq.


Did you also do the React Fiber cartoons?

Yes, that was me :) I've created a lot of cartoons around things in the React ecosystem. I've also done cartoons on WebAssembly and SharedArrayBuffer/Atomics.

Cool. I had the feeling they looked similar, but I didn't think someone who does this whole low-level stuff AND works for Mozilla would also like React, which got much hate from non-FB employees lately :D

The one on WebAssembly was also very informative, thanks!

I knew it!

This is kind of inspiring, I tend to doodle bits down on paper, and putting diagrams like this into posts seems like a great idea.

Which tools do you use to make the drawings? Just a tablet?

What font did you use for this? To me it looked like Comic Sans, but I guess the reaction would be different if it was the case...

I created my own font based on my handwriting. It's called codecartoons. I also have a more Marker Felt-ish one called Clarker Felt.

That's honestly one of those big differences between Mozilla and the tons of other idealistic free/open source projects out there. They care about presentation. They care about clarity. They care about layout. They care about style.

IMO understanding how to appeal to non-tech-y users (and that tech-y users don't always want to sift through 5 pages of links to SDK-tarballs when trying to download a piece of software) was how they managed to compete with IE/Microsoft back in the day. They're also one of the few non-profit organizations releasing software that understands user interfaces.

All that stuff going on with FF57 and whatnot must be tough for them but overall, I believe they're a huge success story and more open projects should look up to them for inspiration. If there was an operating system (sorry, Firefox OS didn't count) or office suite developed with a similar attitude, it could really make an impact.

Just compare the download page of Libre Office (https://www.libreoffice.org/download/download/) with that of Firefox. I'd link the latter, but it basically just starts to download the file you actually want. It's the little things...

> Just compare the download page of Libre Office (https://www.libreoffice.org/download/download/) with that of Firefox.

Without the budget of Mozilla, there is no time to put enough thought and effort on presentation. Our volunteers are our money. I wear like five different hats in LibreOffice and do not get paid for any of them. If I focus on web layouts for some weeks, I watch in horror as our unconfirmed bug stats start growing out of control.

The number of people actively thinking about LibreOffice web presentation and layout is zero. If someone wants to start doing it for no compensation whatsoever, feel free to ping me.

As a fellow open source developer I completely understand you. Thank you for your work on LibreOffice.

Having said that, the grandparent reflects an undeniable truth of our time. User expectations are high, users are "spoiled": they expect it all and they expect it for free. Not commenting on whether that is a good or bad thing, just pointing out that it is true.

Yes, psychologically it is healthier to draw motivation from the fun of working with other contributors. I also like delegating stuff to others and that is a really good habit for avoiding burnout.

An astute observation -- applies to more than just open source. Doesn't it seem to sum up western society in general these days, or am I becoming a grumpy old man?

The sense of automatic entitlement, with no expectation of reciprocity or putting in the hard work (in open source or otherwise), appears pervasive.

Thank you for all your work on Libre Office.

I believe she just has a talent for it, she's an engineer as far as I know and has other related cartoon descriptions: http://lin-clark.com/

Her talk on react fiber this year is quite good as well: https://www.youtube.com/watch?v=ZCuYPiUIONs

> Lin is an engineer on the Mozilla Developer Relations team. She tinkers with JavaScript, WebAssembly, Rust, and Servo, and also draws code cartoons.

Staff blogger perhaps?

I've been hired before by companies to take something their engineering team has done and turn it into a blogpost. I can't do graphics, but I can write a mean blogpost about pretty much anything you can think of.

Some posts/articles that came out of such initiatives have gone on to get published in dead-tree magazines or stick around the frontpage of HN for a few hours.

Broadly speaking, if you were doing this as a job job, I think it would fall under "developer evangelist". Your job is to make the company look good to potential hires and API consumers. Do whatever it takes. That involves going to conferences, writing blogposts, drawing graphics, and using freelancer resources to help you out in areas where you can't go. Usually evangelists can do most of the job on their own.

In my experience, it's happenstance that varies on an author-by-author basis. I've worked in places where one author would put up a mostly text article which would be immediately followed by something incredibly creative like this from another colleague.

It's a bit of a pity really, as it would be nice to take some of the insight of the not-quite-so-graphically-gifted engineers and have them collaborate with colleagues to present it in a great accessible format, but I've not seen this happen in practice.

Technical writers often do this. At an agency I worked at my co-worker worked that job + being a freelance technical writer and she used to make graphics all the time for documentation and articles.

If you read this article with a HN mobile app, try again with a real web browser. The cartoons are awesome but are not displayed in the app.

The marketing dept. usually has a budget and can hire external graphic talent as needed. Not saying that this was the case here, just speaking generally.

Drawing is learned like any other skill. Just practice it for a while and you will become better at it

YES, this! Who on earth does that?

Isn't it just crazy that we're gonna get all this cool tech in a browser that is completely free and open source?

And along the way, Mozilla created what is perhaps the most disruptive programming language of the past decade. For free. And open source.

It's really hard to appreciate the gravity of this.

I turned this on a couple of weeks ago on Nightly and have noticed precisely zero problems, and a really nice little speedup on CSS-heavy sites. Really good to see large chunks of parallelised Rust code start making their way over from Servo to Firefox.

Are there any plans for Servo to be a "real" browser, or will it always be more of a R&D playground for Firefox?

I think of Servo as a non-production race car, taking risks and advancing the needle of technology to new unexplored areas. When it pays off, the results are drawn back into the production model.

Servo is still considered a research project, but one of the benefits of sharing WebRender, Stylo, etc between Servo and Firefox is that it is really helping to productionize those systems. That then will make it easier to turn it into a real thing.

Servo should be referred to as a prototype or experimental rendering engine. If you come across anything still calling it a research project, please let me know! I should have purged them all a year ago, but may have missed some.

It's a possible eventual goal but we don't have concrete plans or a timeline.

Stylo and the like let us get improvements out to users well before we need to deal with that.

I have to say, that seems like exactly the right approach and it's seriously impressive to see what you're doing.

I remember the original Mozilla, how it started out as a total rewrite of Navigator. It took years to reach feature parity and Navigator lost all of its market share in that time.

Being able to do refactorings like this, and introduce a whole new language at the same time, is seriously impressive engineering. Bravo.

Also, all the discussion so far seems to focus on desktop firefox, are these improvements coming to firefox for android, or is that a longer term project, perhaps when servo is ready?

We're hoping to have stylo in 57, and I would expect it to be on android in the next release (58), but no promises.

Basically we intend for it to work eventually on android, but it didn't get prioritized.

I can turn on `layout.css.servo.enabled`, on Firefox 55 Android (and Desktop). What does it mean? If I understand correctly, Firefox will have this property turned on by default in 57/58 but it's possible to have it early?

No, it won't do anything on 55. Or on 56/57 Android.

Stylo can be disabled at build time, and this was the case for 55 and is still the case for android. We don't remove the associated prefs in that case (about:config doesn't have UX because it's not exactly user facing).

You can go to about:support to see if stylo is actually enabled. There's a "stylo" column. It's not there in 55, but no 55 release has stylo enabled at build time; so if you don't have that column at all, stylo isn't there.

Stylo isn’t available on Firefox for Android yet but you can track this bug; they’d like to enable it “sooner rather than later” https://bugzilla.mozilla.org/show_bug.cgi?id=1366049

There's more to putting it in the product than just the code, but the style system in Servo does already work on Android. Once 57 ships, I suspect there will be some resources to start doing the remaining product integration work for Firefox on Android.

Ideally it would be both; at a minimum, it should be a full browser engine, suitable for embedding (like WebKit, and unlike Gecko).

I second this for diversity and safety in rendering engines if nothing else. Plus, kiosk-style systems might find good use out of it if it was lean since they can match how they develop their web apps to what Servo supports. Finally, if small and using portable layer on bottom, it could be mixed with self-healing systems like QNX or Minix 3.

Apart from Kiosk-style systems I think it might also work very well in an Electron-like setting. Supposedly, supporting only new development in the standards (i.e. without having workarounds for every quirk browsers have gathered over the years) is relatively easy, so if you could simply limit yourself to modern techniques to build Electron apps that perform better, that would be great.

You don't even need the whole web platform to build these sort of UIs -- in theory, you could use the WebRender engine to render CSS-box-like UIs. You could avoid some of the weaknesses of the web platform like the memory-heavy DOM. I think this would be an excellent target for a React Native-like framework for desktop, as it would support the full generality of CSS with excellent performance.

Qt has been doing something like this for a long while: https://doc.qt.io/qt-5/stylesheet-syntax.html

QML is more comparable to what the grandparent means. Stylesheet support still works on the fairly rigid structure of nested widgets.

Absolutely agreed; I'd love to see Servo used as a browser engine for apps. And if you fit within what Servo supports today, you can already do this. I've seen people demonstrate Android applications (written in Rust) that embed Servo.

> I've seen people demonstrate Android applications (written in Rust) that embed Servo.

Do you have a link? I went looking for information about embedding Servo and found nothing.

https://blog.mozvr.com/webvr-daydream-support-lands-in-servo... for one, though that doesn't do a very good job of using Servo as a separate module.

I can't seem to find the original example I'm thinking of that showed how to build an Android application (using cargo-apk) that embedded Servo.

Ideally neither.

The article says "We’re swapping in parts from our experimental browser Servo" but servo.org says "Servo is a modern high-performance browser engine". That's quite a difference.

Would be quite happy with servo being an engine and firefox being the browser (love firefox btw), just like how webkit/blink are the engines powering chrome/safari/opera. with bits and pieces of servo landing into firefox (just like how the CSS engine is landing in now).

I could be wrong though.

You may want to actually read this code. You can start by searching "LayoutStyleRecalc" at https://github.com/servo/servo/blob/master/components/layout.... Following is verbatim copy.

  // Perform CSS selector matching and flow construction.
  if traversal_driver.is_parallel() {
      let pool = self.parallel_traversal.as_ref().unwrap();
      // Parallel mode
      parallel::traverse_dom::<ServoLayoutElement, RecalcStyleAndConstructFlows>(
          &traversal, element, token, pool);
  } else {
      // Sequential mode
      sequential::traverse_dom::<ServoLayoutElement, RecalcStyleAndConstructFlows>(
          &traversal, element, token);
  }

Hm, you may be lost if you don't have a Rust IDE handy to jump around. A web cross-reference for Rust code navigation is in development, but until that is ready, the next step is https://github.com/servo/servo/blob/master/components/layout...

Both the parallel and the sequential traverse_dom are generic traversal code that calls DomTraversal::process_preorder etc. to do the actual work. Here is lightly edited code:

  impl<'a, E> DomTraversal<E> for RecalcStyleAndConstructFlows<'a>
      where E: TElement,
            E::ConcreteNode: LayoutNode,
            E::FontMetricsProvider: Send,
  {
      fn process_preorder<F>(&self, traversal_data: &PerLevelTraversalData,
                             context: &mut StyleContext<E>, node: E::ConcreteNode,
                             note_child: F)
          where F: FnMut(E::ConcreteNode)
      {
          // Only element nodes are restyled here; text nodes are skipped.
          if !node.is_text_node() {
              let el = node.as_element().unwrap();
              let mut data = el.mutate_data().unwrap();
              recalc_style_at(self, traversal_data, context, el, &mut data, note_child);
          }
      }
  }

Note that everything in components/layout is servo-specific and not used by Stylo / Firefox. The code in components/style is shared, and the code that hooks it up to Firefox is in ports/geckolib/glue.rs (specifically Servo_TraverseSubtree).

What am I aiming for on this read? I'm assuming the hard parts are swallowed elsewhere, which is good. I'm a little surprised there are two methods for the traversal, depending on the parallel versus sequential. (I'd expect that could have been switched based on the "pool" parameter passed to a single method. No, I don't actually care.)

What do you consider hard parts? The actual magic is that there are no hard parts.

I haven't read it. That snippet just seemed odd to me.

The hard parts, in this case, would be in Rust, most likely. I'm assuming that flipping CSS to parallel is not a trivial process. I'm further assuming that there is still some sort of locking mechanism so that you don't fire off two parallel traversals at the same time. Or allow updates during a scan. (Or abort running scans if an update happens?)

Ah yes, the parallelism (locking etc.) part is handled entirely in a generic Rust library (Rayon in this case), and is not specific to Stylo at all. Here is where Stylo meets Rayon, https://github.com/servo/servo/blob/master/components/style/...

  //! Implements parallel traversal over the DOM tree.
  //! This traversal is based on Rayon, and therefore its safety is largely
  //! verified by the type system.

  /// A parallel top-down DOM traversal.
  /// This algorithm traverses the DOM in a breadth-first, top-down manner. The
  /// goals are:
  /// * Never process a child before its parent (since child style depends on
  ///   parent style). If this were to happen, the styling algorithm would panic.
  /// * Prioritize discovering nodes as quickly as possible to maximize
  ///   opportunities for parallelism.  But this needs to be weighed against
  ///   styling cousins on a single thread to improve sharing.
  /// * Style all the children of a given node (i.e. all sibling nodes) on
  ///   a single thread (with an upper bound to handle nodes with an
  ///   abnormally large number of children). This is important because we use
  ///   a thread-local cache to share styles between siblings.

It looks like Rayon is the same thing as Java's ForkJoinPool and parallel streams but with the neat trick (really, Rust's neat trick) that it can statically check that the code is safe to parallelise through the borrow checker.

Yes, that's a good summary.
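
To make the "never process a child before its parent" rule concrete, here is a minimal sketch of a level-by-level traversal. It is not Servo's code and it does not use Rayon: `std::thread::scope` stands in for the thread pool, and `Node`, `own_color`, and `style_tree` are made-up names for illustration. It also assumes parents are stored before their children. Spawning one thread per node is wasteful and only keeps the example short.

```rust
use std::thread;

/// A toy DOM node stored in an arena; `parent` is an index into the same slice.
/// (Hypothetical type, not taken from Servo.)
struct Node {
    parent: Option<usize>,
    /// Color declared on this node, if any; otherwise it is inherited.
    own_color: Option<&'static str>,
}

/// Computes the inherited color for every node, one tree level at a time.
/// Nodes on the same level are styled on separate scoped threads.
fn style_tree(nodes: &[Node]) -> Vec<&'static str> {
    // Depth of each node; assumes parents appear before children in the arena.
    let mut depth = vec![0usize; nodes.len()];
    for (i, n) in nodes.iter().enumerate() {
        if let Some(p) = n.parent {
            depth[i] = depth[p] + 1;
        }
    }
    let max_depth = depth.iter().copied().max().unwrap_or(0);
    let mut computed: Vec<&'static str> = vec!["black"; nodes.len()];

    for level in 0..=max_depth {
        let ids: Vec<usize> = (0..nodes.len()).filter(|&i| depth[i] == level).collect();
        // Shared, read-only view of the styles computed for earlier levels.
        let done = &computed;
        let results: Vec<(usize, &'static str)> = thread::scope(|s| {
            let handles: Vec<_> = ids
                .iter()
                .map(|&i| {
                    s.spawn(move || {
                        let color = match (nodes[i].own_color, nodes[i].parent) {
                            (Some(c), _) => c,
                            (None, Some(p)) => done[p], // parent was styled on an earlier level
                            (None, None) => "black",
                        };
                        (i, color)
                    })
                })
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        // Writes happen only after every reader of `computed` has finished.
        for (i, color) in results {
            computed[i] = color;
        }
    }
    computed
}
```

The borrow checker is what keeps the shared `done` view read-only while the threads run; trying to write into `computed` from inside the spawned closures would not compile.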

I'm curious how well this plays with some of the fancier selectors. Adjacent sibling, in particular.

I'm also curious if this pretty much prohibits ever having a parent selector. I guess you could do a first pass through the styles to remove any DAG nature from them?

A DOM element has parent and sibling pointers, so there is nothing particularly difficult about adjacent sibling, later sibling, or even parent selectors as far as selector matching goes. Avoiding restyling becomes trickier, but nothing serious. "Parent before child" is for CSS inheritance, not for selector matching.

For not-very-difficult details, read https://github.com/servo/servo/blob/master/components/select... searching for NotMatchedAndRestartFromClosestLaterSibling and HAS_SLOW_SELECTOR_LATER_SIBLINGS.
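
As a rough illustration of why sibling pointers make these combinators easy to match, here is a toy sketch. The `Element` type and both functions are hypothetical and far simpler than the real selectors crate; they only handle a tag name on each side of the combinator.

```rust
/// Toy element in an arena; `prev_sibling` mirrors the DOM's sibling pointer.
/// (Hypothetical type for illustration only.)
struct Element {
    tag: &'static str,
    prev_sibling: Option<usize>,
}

/// Matches `prev + target`: `el` has tag `target` and the sibling immediately
/// before it has tag `prev`.
fn matches_adjacent_sibling(dom: &[Element], el: usize, prev: &str, target: &str) -> bool {
    dom[el].tag == target && dom[el].prev_sibling.map_or(false, |p| dom[p].tag == prev)
}

/// Matches `earlier ~ target`: `el` has tag `target` and some earlier sibling
/// has tag `earlier`. Walks the sibling chain backwards.
fn matches_later_sibling(dom: &[Element], el: usize, earlier: &str, target: &str) -> bool {
    if dom[el].tag != target {
        return false;
    }
    let mut cur = dom[el].prev_sibling;
    while let Some(p) = cur {
        if dom[p].tag == earlier {
            return true;
        }
        cur = dom[p].prev_sibling;
    }
    false
}
```

With siblings `h1, p, p`, the selector `h1 + p` matches only the first `p`, while `h1 ~ p` matches both. Matching only reads the tree, which is why it does not conflict with the parent-before-child ordering used for inheritance.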

As soon as there are parent selectors, the pointers don't help. Specifically, if I restyle a parent, all children of parent need to be adjusted, right?

Siblings... I'm assuming there is nothing hard there. Positions, maybe?

I still don't see what problem parent selectors cause. Restyling parent happens anyway without parent selectors (say, by JavaScript), so it is handled. In https://github.com/servo/servo/blob/master/components/style/..., MustCascadeChildren is returned when you... must.

I was basically restating the restriction for the parallel scan. It goes from parent down so that during the scan you don't do something that causes the scan to have to restart.

I was also shooting from the hip for things I'm interested in, here. I'm not sure it matters. But, if on scanning, you pick relevant rules to apply based on the path to a node, then I could see some trickiness on having to consider both the top down and the bottom up paths to a node.

Siblings... I don't think they have this problem. Not sure why I thought they might.

And javascript is ultimately a different topic. So I wasn't worried about that here.

Coming back to this before I finally sleep.

I missed your point about selector matching and style application. I think that is ultimately where my hip shot missed on this. (I have never had to reserve the right to be wrong. Just to assume it. :)

I still feel there is some danger there, but I think that is just clinging to an initial shot. I am curious why rust's help was needed to get this, now. A basic thread pool seems like it would have been somewhat easy to wire up in any language.

Yeah, the key reason Rust is necessary here is that there is an insane amount of complexity in the heart of modern web engines. Injecting concurrency into the intersection of DOM and Layout is only realistic with some kind of static guarantees against data races, and Rust is the only tool I'm aware of to do that.

One neat thing is that we use rust-bindgen to walk C++ data structures, which extends Rust's concurrency guarantees into C++. We also have an FFI layer for invoking C++ code, and we have some careful static analysis of that callgraph from the entry points to be sure we're not mutating anything.

Right, you don't need Rust to do this. In fact, Qualcomm did this with C++ and TBB in 2013, see http://dl.acm.org/citation.cfm?id=2442543. Rust does give you peace of mind that parallel styling won't be an endless source of bugs, or worse, vulnerabilities.
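
For readers who have not seen what "static guarantees against data races" looks like in practice, here is a small standard-library sketch. It has nothing to do with Stylo's or TBB's actual code; `add_offset_parallel` is a made-up function that gives each thread its own disjoint `&mut` chunk plus a shared read-only reference.

```rust
use std::thread;

/// Adds `offset` to every element, splitting the slice across scoped threads.
/// Each thread owns a disjoint mutable chunk; `offset` is shared read-only.
fn add_offset_parallel(values: &mut [i32], offset: &i32, workers: usize) {
    let workers = workers.max(1);
    // Round up so every element lands in some chunk; chunk size must be non-zero.
    let chunk_len = ((values.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        for chunk in values.chunks_mut(chunk_len) {
            s.spawn(move || {
                for v in chunk.iter_mut() {
                    *v += *offset;
                }
            });
        }
        // Handing the same `&mut` chunk to two threads here would be rejected
        // at compile time, so the race never reaches a running program.
    });
}
```

The same logic in C++ would compile whether or not two threads aliased the same memory; the difference discussed above is that Rust refuses to build the aliased version.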

It's great to see any company going into detail about their technical implementation, so I'm extremely hesitant to be critical, but I'm really curious who the target audience for this one particular article is.

It's a very very odd mix of language that sounds like it's directed at a very young child and standard technical speak. Not the usual for the Hacks blog.

Not to fault the article too much, but I just found the tone a bit confusing. Even veering towards condescension in some parts, though I'm certain that's entirely accidental and wasn't the author's intent at all.

Huh, I had the opposite reaction. I thought that it was a very nice, clear article, which is phrased to be approachable by as many people as possible with some amount of technical background.

That technical background may not be as programmers. It may be power users, or web developers, or the like.

And I thought the analogies and illustrations helped out with giving some context that people who aren't systems developers might be missing.

I found the clear explanations quite nice; it doesn't assume deep knowledge of CSS or Rust or the history of the various projects involved. The Stylo project has been extremely visible, being part of the latest Firefox Nightly, so it makes sense to explain it for a very broad audience.

And personally, I loved the illustrations, as well.

Lin Clark's "Code Cartoons" series is intended to take a potentially complex programming topic, and break it down into easy-to-understand terms, using a combination of simpler language and the cartoons. She has previously written Code Cartoons entries on topics such as Flux, Redux, hot module reloading/time travel debugging, and the "React Fiber" rewrite of React. She's also written several entries specifically for the Mozilla Hacks blog on topics such as WebAssembly.

Presumably it's trying to bring technical details to the widest possible audience. So some details may fly over the heads of some folk, whereas more domain knowledgeable people might think some parts sound condescending. It's a hard square to circle.

I think Chrome tried something similar when they were introducing the chrome browser, although maybe with a different balance.

You're probably thinking of this Chrome comic:


When you said "Chrome comic" I assumed it was this one http://i.imgur.com/bhfYx6R.jpg

Open source depends on people wanting to work on it. Firefox is a very intimidating beast under the hood. Making it interesting is a vitally important goal.

I get that, and that's exactly what the Hacks blog is for (it has always done this masterfully). The recent posts on Hacks and Mozilla Tech on Medium have done this really well I think.

I don't think the tone here necessarily does anything for generating interest though. This is highly subjective, but at least for me it made for a jarring read and I lost interest pretty quickly. As I said above, I respect the intent behind the article, but I just found it very odd.

Different strokes I guess. There's plenty of documentation out there on CSS engines that you'd find readable while others lose interest pretty quickly. The diversity in approaches ensures that the pool drawn into Firefox development is diverse.

There were only two small parts that rubbed me this way, and they may just be related to my own knowledge/interests/biases:

1. The brief diversion about registers vs RAM and the brain analogy seemed a bit silly. I feel like anyone who would be interested in this article would find it completely unnecessary. It was just a tiny paragraph and one doodle though so it didn't really detract from the article.

2. Going the other way, Lin could have spent another few sentences talking about the compositing step and how GPUs enter into the picture (also why some CSS properties are "compositor-only"). I know compositing is tangential to the core of the article, but there were other brief tangents like the Rust stuff so a similar blurb about compositing wouldn't have been out of place (plus I find it super interesting).

Those are my only two nitpicks and the rest of the article did an excellent job threading the needle between informativeness and approachability without being too fluffy or verbose (e.g. Lin rightly didn't waste words explaining what the DOM is).

This is really great of Mozilla. I’m really excited to see such a large rust project used on such a scale; after that I think there will be few doubts it’s a really really impressive language. Also the fact that Mozilla knew this and decided to take such a bold step as rewrite their engine is super cool. I’ve done rewrites and they never go well so hats off to them.

The writeup is inspiring. I found it very clear and yet reasonably in depth. It helps me to understand how much work modern browsers are doing.

Also, excellent use of Rust.

> excellent use of Rust

I should hope so! It was one of the major motivators behind Mozilla's stewardship of Rust.

> 4. Paint the different boxes.

Is this really what happens under the hood?

1. If I overlap 52 html <div>s like a deck of cards, does the browser really paint all 52 div rectangles before compositing them?

2. If I overlap 52 <g>s like a deck of cards, does the browser really paint all 52 <g>s before compositing them?

3. In Qt, if I overlap 52 QML rectangles like a deck of cards, does the renderer only paint the parts of the rectangles that will be visible in the viewport? I was under the impression that this was the case, but I may be misunderstanding how the Qt QML scenegraph (or whatever it is called) works in practice.

edit: typo

> 1. If I overlap 52 html <div>s like a deck of cards, does the browser really paint all 52 div rectangles before compositing them?

Browsers don't typically do very good occlusion culling in general.

WebRender aims to change that, by using the hardware Z-buffer. :)

You may want to be careful with that approach.

Not all GPUs have the Z-Buffer fillrate to make that approach viable (esp. on mobile). I've seen more than a few cases where it was actually faster to turn off Z-Buffering and do the overdraw. More than a few architectures share Z-Buffer bandwidth with other pipelines.

On tiled architectures it'll also increase your tile count which can impact your per-drawcall overhead.

For low-tri things like UI you're better off doing the rect-culling yourself in software (ideally on another thread) and only falling back to the Z-Buffer if you have actual 3D transforms that need per-pixel culling.

Not in our experience. WebRender 1 used to do the rectangle culling on CPU and it ended up being way slower than using the Z buffer on every architecture we tried, including mobile. (Overdraw was even worse.) There are a surprisingly large number of vertices on most pages due to glyphs and CSS borders. Note also that rounded rectangles are extremely common on the Web and clipping those in software is a big pain.

Generally, we are so CPU bound that moving anything to the GPU is a win. We had to fight tooth and nail to make WebRender even 50% GPU bound...

Fair enough, my data was from about 4 years ago so it may be out of date. There's some embedded GPUs that have some pretty 'interesting' architectures.

I would argue though if overdraw vs z-buffer hurt your performance then you are more than 50% GPU bound :).

> Not all GPUs have the Z-Buffer fillrate to make that approach viable (esp. on mobile). I've seen more than a few cases where it was actually faster to turn off Z-Buffering and do the overdraw. More than a few architectures share Z-Buffer bandwidth with other pipelines.

Uh, mobile GPUs tend to be way, way ahead of desktop GPUs in Z-Buffer bandwidth (especially in relative terms)

and you can't increase your tile count; there's a fixed number of tiles in the frame buffer (unless you're referring to not drawing some tiles at all - in which case set your scissor rect appropriately. oh, and if you can find out the GPU's tile size round your rectangle up to cover whole tiles)

Not quite, tile based GPUs tend to be better on Z-Buffer bandwidth but not all mobile GPUs are true tile-based GPUs.

The number of tiles can definitely change, it's something Qualcomm calls out directly in one of their talks[1]. As your tile count increases so does your setup cost for your drawcalls.

[1] https://youtu.be/SeySx0TkluE?t=41

> 3. In Qt, if I overlap 52 QML rectangles like a deck of cards, does the renderer only paint the parts of the rectangles that will be visible in the viewport? I was under the impression that this was the case, but I may be misunderstanding how the Qt QML scenegraph (or whatever it is called) works in practice.

The rendering part of the question has been answered in another comment, but I would like to point out a couple things.

QML elements are `QObject`s with all the overhead[1] that comes with them. They are created even if they are not visible unless you have some `Loader` or C++ magic to prevent it. A QtWidget `QStyledItemDelegate` only had a single instance for all views/elements that used it. It could scale to millions of "cards" (assuming a QAbstractListModel was used) without any overhead[2].

So even if Qt manages to batch the draws and avoid painting everything, there is still a massive overhead in having such a deck. If you want to minimize the performance impact, I would suggest using a Loader and keeping only `n` cards loaded.

[1] Object tree management, signals and slots, `n` connections for each QML expression (recursively down to each tree leaf), memory impact, slower GC, etc.

[2] Assuming the model had the lazyloading functions implemented.

> 3. In Qt, if I overlap 52 QML rectangles like a deck of cards, does the renderer only paint the parts of the rectangles that will be visible in the viewport? I was under the impression that this was the case, but I may be misunderstanding how the Qt QML scenegraph (or whatever it is called) works in practice.

Nope, QML draws everything back-to-front with overdraw. From what I can tell, there is an ability to batch things into standard forward-renderer opaque/alpha passes, but that requires setting a flag QSGRenderNode::DepthAwareRendering which no builtin nodes set, from what I can tell, so the whole thing is skipped.

The core renderer code is here: http://code.qt.io/cgit/qt/qtdeclarative.git/tree/src/quick/s...

Search for m_useDepthBuffer and DepthAwareRendering and follow the trail -- the only use of the flag is from an example about raw OpenGL integration.

In a web browser, typically the layout engine generates a "display list" which is a list of drawing commands that includes all of the rectangles, borders, text, etc. It hands the display list to a graphics engine to be rasterized. The graphics code may optimize the display list before rasterizing it, doing things like removing items that are completely occluded.
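
A minimal sketch of the "remove completely occluded items" step, under heavy simplifying assumptions: items are axis-aligned rectangles, an item is dropped only when a single opaque item later in the list covers it entirely, and the check is quadratic. `Rect`, `DisplayItem`, and `cull_occluded` are illustrative names, not any browser's API.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x: i32,
    y: i32,
    w: i32,
    h: i32,
}

impl Rect {
    /// True if `other` lies entirely inside `self`.
    fn contains(&self, other: &Rect) -> bool {
        self.x <= other.x
            && self.y <= other.y
            && self.x + self.w >= other.x + other.w
            && self.y + self.h >= other.y + other.h
    }
}

/// One drawing command; the list is ordered back to front.
#[derive(Clone, Copy, Debug, PartialEq)]
struct DisplayItem {
    bounds: Rect,
    opaque: bool,
}

/// Drops every item that is fully covered by one opaque item painted after it.
fn cull_occluded(list: &[DisplayItem]) -> Vec<DisplayItem> {
    list.iter()
        .enumerate()
        .filter(|&(i, item)| {
            !list[i + 1..]
                .iter()
                .any(|above| above.opaque && above.bounds.contains(&item.bounds))
        })
        .map(|(_, item)| *item)
        .collect()
}
```

Translucent items never hide anything in this sketch, which lines up with the "if they have alpha" caveat raised elsewhere in the thread.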

> If I overlap 52 html <div>s like a deck of cards, does the browser really paint all 52 div rectangles before compositing them?

Well, if they have alpha, it should.

There is probably some trickery going on to optimize e.g. completely hidden elements. But when you compare the two options of "loop through the rectangles from below and for each pixel in the rectangle set its color" and "for each pixel, determine the rectangle it is in and set its color accordingly" the amount of work is not much different. You could speed up the second strategy for nested rectangles by using a smart search structure, but this will most likely be trumped by specialized rendering hardware that can paint lots of rectangles very fast but isn't very good at branching logic.

AFAIK this is a common problem in rendering, be it CSS or games: how do you not render something that's not currently visible on screen? There are all sorts of tricks to calculate the "render only this" set as quickly as possible.

Congrats! Beyond the CSS engine itself, I also very much appreciate inside development stories like these. I'd also like to read a meta-story about the development efforts in terms of time spent, prior knowledge required etc., and CSS spec feedback, with a reflection on the complexity of implementing CSS from scratch.

Haven't even read it, just looked at the drawings, and now I know how a browser parses CSS.

Really late to this discussion but wow. Having worked as a web developer/tech writer and editor, this writeup pushed all my buttons. High-level concepts broken down in an exciting way. Nothing turns me off quite as much as clicking on a tech blog post and suddenly feeling like I am reading a whitepaper or portion of someone's Ph.D. dissertation. This is the kind of post I love — stripping things down to the nuts and bolts but keeping me engaged in a way that gives me that excited feeling in the pit of my stomach like I am watching something important take shape.

This is a fantastic technology and I feel like Servo has a pretty amazing future ahead of it. Exciting stuff.

I wish this post included some benchmarks or measurement.

Just asked Emilio on IRC for some quick numbers.

emilio: pcwalton: wrt gecko we have stuff like https://bugzilla.mozilla.org/show_bug.cgi?id=1342220#c25 and similar

emilio: pcwalton: there's also the tp6 numbers, though those also measure CSS parsing and other stuff that isn't the style engine per se

pcwalton: emilio: our tp6 numbers are improved over Gecko at this point, yes? :)

emilio: pcwalton: amazon by a huge amount, facebook not yet I believe, but patches are on the queue that should make it turn around :)

emilio: pcwalton: happy to talk with mjs about impl details too, if he wants. I know a bit of WK stuff too :)

This is an easy benchmark for rejecting a lot of selectors:


Firefox with STYLO_THREADS=1 gets about 160ms on my machine, which is basically parity with safari and chrome. With the parallelism, Firefox gets 40ms. :-)

You can also simulate sequential mode in recent nightlies by ctrl-clicking and loading the tab in the background (we disable parallelism for background loads).

Thanks, that is the kind of thing I was looking for (though a bit of a narrow test). Cool to hear that this gets a solid speedup from parallelism.

(for more context, Facebook has a very small style recalc time, so we're bottlenecked right now on the time it takes for us to parse CSS, and build the data structures for selector-matching and invalidation, and the patches on the queue are https://github.com/servo/servo/pull/18191)

Thanks for the pointer! Sadly, I couldn't find any concrete numbers or tests to run at that link.

I'm mildly interested in impl details. But I'm more interested in what it speeds up and how it was measured (ideally in a form where I could try my own old-to-new and cross-browser comparisons, but just numbers would be interesting too).

The STR for that bug are in comment 3, but I can try to get something better when I'm back home (on my phone now).

We get significant speedups from parallelism and such on big DOMs overall during page load, but we also get speedups from dynamic change handling, by implementing smarter invalidation than Gecko, which basically restyles the whole subtree / every sibling if it finds a relevant combinator affected by a change.

WebKit does much better than Gecko for class and other attribute changes, at least for descendants, where you go down with the relevant selectors. Stylo's system is relatively similar, but does selector matching left to right, and handles state and id changes the same way.

You can see components/style/invalidation/element in the servo repo for the relevant code in that regard.
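
For readers who want the gist of "restyle the whole subtree" versus "go down with the relevant selectors", here is a heavily simplified Python sketch. It is not how Gecko, WebKit or Stylo are implemented: it only models rules of the form `.ancestor .descendant`, and `Node`, `restyle_subtree` and `restyle_targeted` are hypothetical names.

```python
class Node:
    def __init__(self, name, classes=(), children=()):
        self.name = name
        self.classes = set(classes)
        self.children = list(children)


def descendants(node):
    """Yield every descendant of `node` in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def restyle_subtree(node):
    """Coarse invalidation: the element and every descendant get restyled."""
    return [node.name] + [d.name for d in descendants(node)]


def restyle_targeted(node, changed_class, rules):
    """Finer invalidation for a class change on `node`.

    `rules` is a list of (ancestor_class, descendant_class) pairs, each
    standing in for a selector like `.ancestor .descendant`. Only descendants
    that could match a rule depending on `changed_class` are restyled.
    """
    targets = {desc for anc, desc in rules if anc == changed_class}
    return [node.name] + [d.name for d in descendants(node)
                          if d.classes & targets]
```

On a big subtree where only a handful of descendants carry the dependent class, the second function returns a much shorter list, which is the whole point of smarter invalidation.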

I suspect you will hear a lot of numbers around the time of the 57 launch, and until then people will keep quiet as they try to make the numbers that get revealed as good as possible.

For old numbers, you can get some in some of my old presentations at conferences. I think the LCA 2015 one[1] has the most details in that regard.

[1] https://www.youtube.com/watch?v=7q9vIMXSTzc

Very nice writeup! One thing I found strange is that multi-threading is ELI5, but the reader is expected to know what DOM means.

Parallel processing demonstrates benefits only if you have physical cores to run the code on. If just one core is available to the app, then parallel processing is a loss due to thread preemption overhead.

Is there any real life examples of achieved speedup?

> Is there any real life examples of achieved speedup?

Yes, most pages have significant speedups during the initial restyle especially. Wikipedia pages tend to get 3x or so improved style recalc time on typical systems with Intel quad core CPUs, for example.

Of course, styling is only one part of the whole, so your overall speedups are limited by the rest of the rendering pipeline. That's Amdahl's Law for you. But that's no reason to not parallelize at all; it just means that we have lots more work to do once this is done :)
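
For anyone who wants to plug in their own numbers, Amdahl's Law is a one-liner. In this sketch the function name and the 30% figure in the usage note are assumptions for illustration; only the 3x styling speedup comes from the comment above.

```python
def amdahl_speedup(fraction, part_speedup):
    """Overall speedup when `fraction` of the work gets `part_speedup` faster.

    fraction:      share of total time spent in the improved part (0..1)
    part_speedup:  how many times faster that part becomes (>= 1)
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if part_speedup < 1.0:
        raise ValueError("part_speedup must be at least 1")
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)
```

If styling were, say, 30% of the rendering pipeline (a made-up share), a 3x faster style recalc gives `amdahl_speedup(0.3, 3)`, roughly 1.25x overall. The remaining 70% is the "lots more work to do".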

> Wikipedia pages tend to get 3x or so improved style recalc time on typical systems with Intel quad core CPUs, for example.

Do you have absolute numbers? If it's 100 vs. 300 ms, that'd be huge. If it's 1 vs. 3 ms, I don't really care.

On my Haswell MBP, Stylo can take first pageload styles on Wikipedia pages from 150-200 ms down to 50 ms or so, last I checked.

It's even better now - we get down to about 14ms with various recent improvements in style sharing.

Okay, that's seriously awesome. Nice job!

First, Stylo has a sequential mode. The choice between sequential and parallel is entirely modular. It should be easy to use sequential mode when you are on a single core.

Stylo actually achieves nearly linear speedup if you just measure styling. It's really impressive.

Servo also makes an argument from power saving: do the work across many cores and go back to CPU power-saving modes faster.

The sequential traversal code is separate and does not have parallel overhead. On a single core machine you will get the sequential traversal.
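
The shape of that design (a separate sequential path, picked when there is only one worker) can be sketched in a few lines of Python. This is not Stylo's code: `pick_worker_count` and `style_all` are invented names, the handling of `STYLO_THREADS` is my guess at sensible semantics, and Python threads are used only to show the structure, not to demonstrate a real speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pick_worker_count(env=os.environ):
    """Use STYLO_THREADS if set, otherwise the CPU count (at least 1)."""
    override = env.get("STYLO_THREADS")
    if override is not None:
        return max(1, int(override))
    return max(1, os.cpu_count() or 1)


def style_all(nodes, style_one, workers):
    """Style every node, using a plain loop when only one worker is allowed."""
    if workers == 1:
        # Sequential path: no pool, no task hand-off, no threading overhead.
        return [style_one(n) for n in nodes]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in input order.
        return list(pool.map(style_one, nodes))
```

Because the two paths share the per-node function and differ only in how the traversal is driven, a single-core machine pays nothing for the parallel machinery.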

Note that Stylo is still faster than the old Gecko system even in sequential mode.

I have similar concerns. I'm all for using my machine to its fullest, but by and large, applications like web browsing should be an additional thing I am doing on my computer, not something that thinks it can take the full computer.

Though, I have to admit I am also a little torn on this. Yes, browsing is typically done "during compile" or some other task. However, I have also begun doing most of that work remotely so that I can save battery on my laptop. To that end, preserving cores for the tasks that actually need them is now less of a concern.

This problem is the responsibility of the OS scheduler. One of the most important goals of any scheduler should be to give foreground UI processes that spend a lot of time sleeping in event loops priority. So if your OS scheduler is properly designed (as the ones in all major OS's are) a threaded browser shouldn't interfere with the system's responsiveness.

If that’s what you want to do, then you should ‘nice’ your web browser (or use whatever priority mechanism your environment has available). Artificially limiting performance is absolutely the wrong approach!

It isn't artificially limiting. If my browser didn't need all of my cores to perform, we wouldn't be having this discussion. :)

And I seriously question whether my browser needs this to perform well. I am not completely closed to the idea, but I am highly skeptical.

The fact that it's been done is some evidence that it's required. If it weren't, it's hard to imagine why limited resources would be wasted on it!

Almost every modern device running a browser has multiple cores, and that trend is almost certainly going to increase – so it definitely feels like allowing a core part of the web platform to expand across cores will be beneficial.

I challenge that. That it was done is as much evidence that it was doable as that it was required.

It is also a bloat race. Web pages are getting increasingly complicated, with very little benefit to end users. I'd wager a growing number of the cycles and network requests are going to tracking nowadays. Not to mention the UI language doing more and more contortions to give us a page that could be much more succinctly described.

None of this is to say I want it stopped. I just have a feeling of concern that this is leading to an ever increasing march to faster and faster machines to do basic work.

Do you have a similar concern for all multithreaded programs?

In general, yes. That my computer can do many things is something I take advantage of as a user. The programs should use them to their advantage, but by and large, most programs do not need all of the processing capabilities of my computer, so I expect they should play well together. (Indeed, it takes effort to get the GPU of my computer to help out with anything.)

This is the job of your operating system's scheduler - to divide the limited resource of your computer's CPU time among different competing tasks.

In the modern era, a program cannot take more than its fair share of CPU time - otherwise, a runaway program could easily render your computer nearly unusable. (Linux, macOS and Windows all use preemptive multitasking.)

The way to tell your operating system what you desire prioritized is, on *nix systems, 'nice'.

I'm well aware of that. I also know that, in general, having to schedule things slows them down. If everything I'm running is trying to schedule something on my entire machine, it is giving my OS more work. Which will, by necessity, be harder to schedule and slow things down.

I'm not necessarily against all of this, but I'm also not eagerly embracing more crap to slow down my machine for no apparent reason.

You are assuming that your OS scheduler is nearly at capacity; it isn't. Moreover, your CPU will spend most of its life idle.

I could almost guarantee you this is not the bottleneck of any modern setup.

I can guarantee you that I have jumped over to my browser when doing a compile that is definitely limited by my machine. I'm ok doing this knowing that I will only take up so much work on my machine. I do not go off and kick off another giant compile at the same time.

Now, if the browser becomes more and more consuming, hitting my browser could get closer and closer to kicking off another giant compile.

Keep in mind that by parallelizing work the browser can finish the work it's doing faster, which means your CPU can spend more time idle, which is better for power consumption.

That is a big "ostensibly" there. Hard data showing this would be somewhat nice.

And, as others, you are also assuming I was not pegging out my machine doing something by choice.

Metajack mentions power usage with parallelization briefly here[0], but doesn't provide the data.

[0] https://youtu.be/7q9vIMXSTzc?t=35m (2015)

To be clear, the reasoning is sound on this argument. I'm skeptical due to it never having delivered data, though. :(

I want it to be true. I expect that someone should be able to show this with data. I've never seen it done, though.

You can collect much of this data for yourself on a GNU/Linux system by using the cgroups feature of the Linux kernel, which is more powerful than nice: https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.t...

Using the various CPU* options, you can turn on CPUAccounting, pin a given process and its children/threads to a range of CPUs, place CPUQuotas and so forth. There's a lot of power and granularity there.

I know, anecdotally, that devops/sysadmin folk use this to also audit and test energy consumption of processes over time. (Certain popular PID 1 programs have a run tool that allows you to easily, dynamically change and audit process resource usage.)
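
As a rough example of what that can look like, here is a small Python helper that builds such a command line. I'm assuming the "run tool" meant here is systemd's `systemd-run`; the helper name and the default quota are made up, and whether `CPUQuota` takes effect in a `--user` scope depends on how the cpu controller is delegated on your system.

```python
import shlex


def systemd_run_command(argv, cpu_quota_percent=50):
    """Build a `systemd-run` invocation that runs `argv` in a transient scope.

    Turns on CPU accounting and caps CPU time via the CPUQuota property.
    This only builds the argument list; it does not execute anything.
    """
    if not argv:
        raise ValueError("argv must contain the program to run")
    if cpu_quota_percent <= 0:
        raise ValueError("cpu_quota_percent must be positive")
    return [
        "systemd-run", "--user", "--scope",
        "-p", "CPUAccounting=yes",
        "-p", f"CPUQuota={cpu_quota_percent}%",
        *argv,
    ]


def as_shell_line(command):
    """Render the argument list as a copy-pasteable shell line."""
    return shlex.join(command)
```

Something like `as_shell_line(systemd_run_command(["firefox"], 200))` gives a line you can paste into a terminal to cap the browser at roughly two cores' worth of CPU time while you measure.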

My typical use case, for instance, is auditing and managing the lifetimes of various Emacs processes while running potentially racy elisp code.

I know the data can be gathered. Could go even more direct and measure power usage of the computer before and after the upgrade.

It would be nice if everyone pushing some of these would collect some data for their claims, though. Especially if any of them have better setups (read: more than the single machine I have).

It's such a shame Firefox (including the nightlies) kills my Mac (making most other applications hang/break), since the new versions are otherwise way better than Chrome.

Does anyone know what it is about Firefox that makes the rest of my system unable to spawn new processes?

That’s weird. If you’re not attached to your profile you could try resetting it (go to about:support) and see if that fixes it. Otherwise I’d file a bug.

"Refreshing" your Firefox profile from about:support is not as scary as it sounds. :) It creates a clean profile and imports your old profile's bookmarks, passwords, and browsing history. You just lose any settings tweaks and old add-ons you had. Your old profile directory is also backed up in case something goes wrong during the new profile import.

Seconded, I'm regularly opening FDE these days and never noticed anything odd on my mac.
