> 4. Paint the different boxes. Is this really what happens under the hood? 1. I...

pcwalton · on Aug 22, 2017

> 1. If I overlap 52 html <div>s like a deck of cards, does the browser really paint all 52 div rectangles before compositing them?

Browsers don't typically do very good occlusion culling in general.

WebRender aims to change that, by using the hardware Z-buffer. :)

vvanders · on Aug 22, 2017

You may want to be careful with that approach.

Not all GPUs have the Z-Buffer fillrate to make that approach viable (esp. on mobile). I've seen more than a few cases where it was actually faster to turn off Z-Buffering and do the overdraw. More than a few architectures share Z-Buffer bandwidth with other pipelines.

On tiled architectures it'll also increase your tile count which can impact your per-drawcall overhead.

For low-tri things like UI you're better off doing the rect-culling yourself in software (ideally on another thread) and only falling back to the Z-Buffer if you have actual 3D transforms that need per-pixel culling.

pcwalton · on Aug 22, 2017

Not in our experience. WebRender 1 used to do the rectangle culling on CPU and it ended up being way slower than using the Z buffer on every architecture we tried, including mobile. (Overdraw was even worse.) There are a surprisingly large number of vertices on most pages due to glyphs and CSS borders. Note also that rounded rectangles are extremely common on the Web and clipping those in software is a big pain.

Generally, we are so CPU bound that moving anything to the GPU is a win. We had to fight tooth and nail to make WebRender even 50% GPU bound...
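
To illustrate the trade-off being discussed (this is a toy CPU simulation, not WebRender's code and not a claim about how any real GPU behaves), the sketch below paints the same deck of 52 opaque cards twice: once back to front with overdraw, and once front to back with a depth test. The names `paint_back_to_front` and `paint_with_depth`, and the card size and offset, are made up for the example.

```python
def paint_back_to_front(rects, width, height):
    """Painter's algorithm: every rect writes every pixel it covers."""
    frame = [[None] * width for _ in range(height)]
    writes = 0
    for name, (x0, y0, x1, y1) in rects:  # rects are ordered bottom to top
        for y in range(y0, y1):
            for x in range(x0, x1):
                frame[y][x] = name
                writes += 1
    return frame, writes


def paint_with_depth(rects, width, height):
    """Draw front to back; a depth test rejects pixels already covered."""
    frame = [[None] * width for _ in range(height)]
    depth = [[-1] * width for _ in range(height)]  # -1 means "nothing drawn"
    writes = 0
    for z in range(len(rects) - 1, -1, -1):  # topmost rect first
        name, (x0, y0, x1, y1) = rects[z]
        for y in range(y0, y1):
            for x in range(x0, x1):
                if z > depth[y][x]:  # passes only if nothing nearer is there
                    depth[y][x] = z
                    frame[y][x] = name
                    writes += 1
    return frame, writes


# 52 opaque cards, each 20x30 pixels, each shifted 1 pixel to the right.
deck = [(f"card{i}", (i, 0, i + 20, 30)) for i in range(52)]
width, height = 51 + 20, 30

overdraw_frame, overdraw_writes = paint_back_to_front(deck, width, height)
depth_frame, depth_writes = paint_with_depth(deck, width, height)

print(overdraw_writes, depth_writes)
```

Both passes produce the same image, but the depth-tested pass writes each covered pixel once while the overdraw pass writes every pixel of every card. Whether that saving is worth the depth-buffer traffic on a given GPU is exactly what the comments here disagree about.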

vvanders · on Aug 22, 2017

Fair enough, my data was from about 4 years ago so it may be out of date. There are some embedded GPUs that have some pretty 'interesting' architectures.

I would argue, though, that if overdraw vs. Z-buffer hurts your performance, then you are more than 50% GPU bound :).

oshepherd · on Aug 22, 2017

> Not all GPUs have the Z-Buffer fillrate to make that approach viable (esp. on mobile). I've seen more than a few cases where it was actually faster to turn off Z-Buffering and do the overdraw. More than a few architectures share Z-Buffer bandwidth with other pipelines.

Uh, mobile GPUs tend to be way, way ahead of desktop GPUs in Z-Buffer bandwidth (especially in relative terms).

And you can't increase your tile count; there's a fixed number of tiles in the frame buffer (unless you're referring to not drawing some tiles at all, in which case set your scissor rect appropriately. Oh, and if you can find out the GPU's tile size, round your rectangle up to cover whole tiles).

vvanders · on Aug 22, 2017

Not quite: tile-based GPUs tend to be better on Z-Buffer bandwidth, but not all mobile GPUs are true tile-based GPUs.

The number of tiles can definitely change; it's something Qualcomm calls out directly in one of their talks[1]. As your tile count increases, so does the setup cost for your drawcalls.

[1] https://youtu.be/SeySx0TkluE?t=41

Elv13 · on Aug 22, 2017

> 3. In Qt, if I overlap 52 QML rectangles like a deck of cards, does the renderer only paint the parts of the rectangles that will be visible in the viewport? I was under the impression that this was the case, but I may be misunderstanding how the Qt QML scenegraph (or whatever it is called) works in practice.

The rendering part of the question has been answered in another comment, but I would like to point out a couple things.

QML elements are `QObject`s with all the overhead[1] that comes with them. They are created even if they are not visible unless you have some `Loader` or C++ magic to prevent it. A QtWidget `QStyledItemDelegate` only had a single instance for all views/elements that used it. It could scale to millions of "cards" (assuming a QAbstractListModel was used) without any overhead[2].

So even if Qt manages to batch the draws and avoid painting everything, there is still a massive overhead in having such a deck. If you want to minimize the performance impact, I would suggest using a `Loader` and keeping only `n` cards loaded.

[1] Object tree management, signals and slots, `n` connections for each QML expression (recursively down to each of the tree's leaves), memory impact, slower GC, etc.

[2] Assuming the model had the lazy-loading functions implemented.

Jasper_ · on Aug 22, 2017

> 3. In Qt, if I overlap 52 QML rectangles like a deck of cards, does the renderer only paint the parts of the rectangles that will be visible in the viewport? I was under the impression that this was the case, but I may be misunderstanding how the Qt QML scenegraph (or whatever it is called) works in practice.

Nope, QML draws everything back-to-front with overdraw. From what I can tell, there is an ability to batch things into standard forward-renderer opaque/alpha passes, but that requires setting the flag QSGRenderNode::DepthAwareRendering, which no built-in nodes set, so the whole thing is skipped.

The core renderer code is here: http://code.qt.io/cgit/qt/qtdeclarative.git/tree/src/quick/s...

Search for m_useDepthBuffer and DepthAwareRendering and follow the trail -- the only use of the flag is from an example about raw OpenGL integration.

mbrubeck · on Aug 22, 2017

In a web browser, typically the layout engine generates a "display list" which is a list of drawing commands that includes all of the rectangles, borders, text, etc. It hands the display list to a graphics engine to be rasterized. The graphics code may optimize the display list before rasterizing it, doing things like removing items that are completely occluded.
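
As a rough sketch of that last optimization (not taken from any real browser engine), the code below models a display list as a back-to-front list of rectangles and drops any item that is completely covered by a single opaque item painted later. The `Item` type and the `covers` and `cull_occluded` helpers are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One drawing command, reduced to an axis-aligned rectangle."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int
    opaque: bool = True


def covers(top: Item, bottom: Item) -> bool:
    """True if `top` is opaque and fully contains `bottom`."""
    return (top.opaque
            and top.x0 <= bottom.x0 and top.y0 <= bottom.y0
            and top.x1 >= bottom.x1 and top.y1 >= bottom.y1)


def cull_occluded(display_list):
    """Remove items hidden behind one later opaque item.

    `display_list` is ordered back to front. The check is conservative:
    an item hidden only by the union of several items is kept.
    """
    kept = []
    for i, item in enumerate(display_list):
        if any(covers(later, item) for later in display_list[i + 1:]):
            continue  # fully hidden, so never rasterize it
        kept.append(item)
    return kept


display_list = [
    Item("background", 0, 0, 100, 100),
    Item("card_a", 10, 10, 50, 70),
    Item("card_b", 10, 10, 50, 70),                # exactly on top of card_a
    Item("glass", 0, 0, 100, 100, opaque=False),   # translucent overlay
]

print([item.name for item in cull_occluded(display_list)])
# prints ['background', 'card_b', 'glass']
```

This pairwise check is quadratic in the number of items and ignores rounded corners, clips, and transforms, which is part of why doing this well on the CPU gets expensive on real pages.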

coldtea · on Aug 22, 2017

> 1. If I overlap 52 html <div>s like a deck of cards, does the browser really paint all 52 div rectangles before compositing them?

Well, if they have alpha, it should.
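
A small numeric sketch of why alpha changes the answer, assuming the standard "over" blend on a single grayscale channel with non-premultiplied alpha (the `over` and `composite` helpers are illustrative names, not any browser's API):

```python
def over(src, alpha, dst):
    """Blend a source value onto a destination with the 'over' operator."""
    return src * alpha + dst * (1.0 - alpha)


def composite(layers, background=0.0):
    """Composite (value, alpha) layers ordered back to front."""
    result = background
    for value, alpha in layers:
        result = over(value, alpha, result)
    return result


# Bottom layer is white (1.0); top layer is dark gray (0.25).
opaque_stack = [(1.0, 1.0), (0.25, 1.0)]
translucent_stack = [(1.0, 1.0), (0.25, 0.5)]

# With an opaque top layer, dropping the bottom layer changes nothing.
print(composite(opaque_stack), composite(opaque_stack[1:]))
# prints 0.25 0.25

# With a 50% translucent top layer, the bottom layer shows through.
print(composite(translucent_stack), composite(translucent_stack[1:]))
# prints 0.625 0.125
```

So a card can only be skipped when everything covering it is fully opaque; any translucent layer above it makes its pixels contribute to the final color.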

yorwba · on Aug 22, 2017

There is probably some trickery going on to optimize e.g. completely hidden elements. But when you compare the two options of "loop through the rectangles from below and for each pixel in the rectangle set its color" and "for each pixel, determine the rectangle it is in and set its color accordingly" the amount of work is not much different. You could speed up the second strategy for nested rectangles by using a smart search structure, but this will most likely be trumped by specialized rendering hardware that can paint lots of rectangles very fast but isn't very good at branching logic.

dmitriid · on Aug 22, 2017

AFAIK this is a common problem in rendering, be it CSS or games: how do you avoid rendering something that's not currently visible on screen? There are all sorts of tricks to calculate the "render only this" set as quickly as possible.