"Platform text rendering (CoreText, DirectWrite) not performant enough" That abo...

raphlinus · on Jan 30, 2018

I'll have to try this out. My experiments indicated that DirectWrite could not keep up with drawing on a 4k monitor at 60 Hz, though it was OK with a smaller window. I think it might depend a lot on the driver too. I'll see if I can instrument the xi-win prototype to give performance numbers. I do note that your lines aren't very wide, but still, in my tests I wasn't seeing anything like 500fps. DirectWrite does at least seem to use the GPU, while Core Text appears to rely entirely on software rendering.

Skia is definitely capable of good performance, as it resolves down to OpenGL draw calls, pretty much the same as Alacritty, WebRender, and now xi-mac. One thing, though, is that it doesn't do fully gamma-corrected alpha compositing, so it's nowhere near pixel-accurate to CoreText rendering.
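To make "gamma-corrected alpha compositing" concrete, here is a generic illustration using the standard sRGB transfer functions: blending directly on the encoded pixel values versus decoding to linear light, blending, and re-encoding. It is a sketch under that assumption only, not Skia's or Core Text's actual code, and the function names are hypothetical.

```python
def srgb_to_linear(c: float) -> float:
    """Decode an sRGB-encoded channel value in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Encode a linear-light channel value in [0, 1] back to sRGB."""
    if c <= 0.0031308:
        return c * 12.92
    return 1.055 * c ** (1 / 2.4) - 0.055


def blend_naive(src: float, dst: float, alpha: float) -> float:
    """Alpha-blend directly on the sRGB-encoded values (not gamma-correct)."""
    return src * alpha + dst * (1 - alpha)


def blend_linear(src: float, dst: float, alpha: float) -> float:
    """Alpha-blend in linear light, then re-encode (gamma-correct)."""
    lin = srgb_to_linear(src) * alpha + srgb_to_linear(dst) * (1 - alpha)
    return linear_to_srgb(lin)


# A half-covered edge pixel of black text (0.0) on a white background (1.0).
naive = blend_naive(0.0, 1.0, 0.5)
correct = blend_linear(0.0, 1.0, 0.5)
print(f"naive={naive:.3f} gamma-correct={correct:.3f}")
# → naive=0.500 gamma-correct=0.735
```

The same half-covered pixel comes out noticeably lighter when blended in linear light, which is why two renderers that differ only in this step can disagree visibly on antialiased glyph edges.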

Doing proper measurement is not easy, but seems worth doing.

c-smile · on Jan 30, 2018

Let me know if you need more tests around this.

If you want to consider more complex DOM cases, you can try the https://notes.sciter.com/ application, or run it directly from the SDK: https://github.com/c-smile/sciter-sdk/blob/master/bin/32/not...

The Notes window layout resembles an IDE layout pretty closely. Notes works on Windows, Mac and Linux, so you can compare the different native text rendering implementations (without the overhead of a conventional browser).

jwilm · on Jan 30, 2018

> Skia is definitely capable of good performance, as it resolves down to OpenGL draw calls, pretty much the same as Alacritty, WebRender, and now xi-mac.

This claim is a bit surprising to me. I was under the impression Skia is an immediate mode renderer which ends up issuing a lot of GL calls that could be avoided with a retained mode renderer.

kllrnohj · on Jan 31, 2018

An immediate-style API does not mean the work is performed immediately. Skia defers and reorders internally to batch commands so minimal GL state changes are required.
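The deferral idea can be sketched as a toy model: record commands instead of executing them, then reorder them by the GPU state they need so consecutive draws share state. `DrawCmd`, the state key, and both helpers are hypothetical, not Skia's internals. The sketch assumes the reordered draws don't overlap; a real renderer has to check that, because reordering overlapping draws changes the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawCmd:
    # Hypothetical recorded command: the GPU state it needs plus a payload.
    pipeline: str  # which shader/blend setup the draw requires
    texture: str   # which atlas or image it samples from
    payload: str   # what is being drawn (for illustration only)


def count_state_changes(cmds: list) -> int:
    """Count how often consecutive commands need different GPU state."""
    changes = 0
    current = None
    for cmd in cmds:
        key = (cmd.pipeline, cmd.texture)
        if key != current:
            changes += 1
            current = key
    return changes


def reorder_for_batching(cmds: list) -> list:
    """Group commands by state. sorted() is stable, so the original order is
    kept within each group. Only valid if no reordered draws overlap."""
    return sorted(cmds, key=lambda c: (c.pipeline, c.texture))


# Immediate-style API usage: text and solid fills interleaved.
recorded = [
    DrawCmd("text", "glyph_atlas", "a"),
    DrawCmd("solid", "none", "box1"),
    DrawCmd("text", "glyph_atlas", "b"),
    DrawCmd("solid", "none", "box2"),
    DrawCmd("text", "glyph_atlas", "c"),
]
before = count_state_changes(recorded)
after = count_state_changes(reorder_for_batching(recorded))
print(before, "->", after)  # → 5 -> 2
```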

That said, a "lot of GL calls" for a 2D UI is actually a trivially insignificant number of GL calls to the actual GPU/driver for most cases. That's basically never the bottleneck unless you've done something insanely wrong.

IshKebab · on Jan 31, 2018

I wouldn't be so sure. A single draw call is surprisingly slow. If you drew each glyph with its own draw call, that could be hundreds of calls, which will definitely cause slowness.
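The usual way around one-call-per-glyph is to pack every glyph quad into a single vertex array and draw them all at once. A minimal sketch, assuming all glyphs live in one texture atlas (so they share GPU state); the `Glyph` fields and the helper name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    # Hypothetical positioned glyph: screen rectangle plus its atlas cell (UVs).
    x: float
    y: float
    w: float
    h: float
    u0: float
    v0: float
    u1: float
    v1: float


def glyph_quads_to_vertices(glyphs: list) -> list:
    """Pack every glyph into one flat vertex array: two triangles per glyph,
    each vertex stored as (x, y, u, v). One upload, one draw call."""
    verts = []
    for g in glyphs:
        x0, y0, x1, y1 = g.x, g.y, g.x + g.w, g.y + g.h
        corners = [
            (x0, y0, g.u0, g.v0), (x1, y0, g.u1, g.v0), (x1, y1, g.u1, g.v1),
            (x0, y0, g.u0, g.v0), (x1, y1, g.u1, g.v1), (x0, y1, g.u0, g.v1),
        ]
        for corner in corners:
            verts.extend(corner)
    return verts


# A 120-character line of 8x16 px glyphs, all sampling the same atlas.
line = [Glyph(i * 8.0, 0.0, 8.0, 16.0, 0.0, 0.0, 0.1, 0.1) for i in range(120)]
verts = glyph_quads_to_vertices(line)
print(len(verts) // 4, "vertices for", len(line), "glyphs")
```

Uploading `verts` once and issuing a single draw (for example `glBufferData` followed by `glDrawArrays(GL_TRIANGLES, 0, len(verts) // 4)`) replaces 120 per-glyph calls; that GL step is not shown here.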

kllrnohj · on Jan 31, 2018

"hundreds" is actually what I meant by insignificant to a modern driver.

For example: https://images.anandtech.com/graphs/graph11223/86100.png

Granted, that's a 1060, but since we're looking at driver CPU overhead, that shouldn't matter much. So: 2.3 million draw calls per second in single-threaded DX11.

It's not until you start getting into the 10k+ draw calls a frame that you are putting your 60fps at risk.

It's often worth the work to avoid this anyway (after all, faster is better if you're an engine/renderer), but it takes a lot for it to be an actual _problem_.

IshKebab · on Feb 3, 2018

Yeah, so 2 million; cut that down by a factor of 10 for integrated graphics. Then you need 60 fps, which brings it down to about 3,000 per frame, and that's if you're just doing empty draw calls and nothing else. Throw in WebGL and hundreds is really significant.
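Both estimates in this exchange are the same division: draw calls per second, scaled down for slower hardware, divided by the frame rate. The factor of 10 for integrated graphics is the rough guess from the comment above, not a measurement, and the result is an upper bound that assumes the entire frame is spent on empty draw calls.

```python
def draw_calls_per_frame(calls_per_second: float, fps: float,
                         slowdown: float = 1.0) -> float:
    """Per-frame draw-call budget if the whole frame went to empty draws."""
    return calls_per_second / slowdown / fps


discrete = draw_calls_per_frame(2.3e6, 60)               # figure quoted above
integrated = draw_calls_per_frame(2e6, 60, slowdown=10)  # factor-of-10 guess
print(f"discrete: ~{discrete:,.0f}/frame, integrated: ~{integrated:,.0f}/frame")
# → discrete: ~38,333/frame, integrated: ~3,333/frame
```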

vardump · on Jan 30, 2018

> On window caption you see real FPS that is around 500 frames per second for the whole screen for the sample.

Sure, 500 fps, but that's not the important part; latency is. At 500 fps, how much output latency is there to the display?

c-smile · on Jan 30, 2018

For that particular text editor, the latency for a typed character to appear on screen will be 16ms (the normal 60 FPS refresh rate).

The editor keeps each line in a separate <text> DOM element (like <p>, but with no margins and only text inside).

So we only need to re-layout one particular line in order to show a typed character.

vardump · on Jan 30, 2018

> For that particular text editor, the latency for a typed character to appear on screen will be 16ms (the normal 60 FPS refresh rate).

That's very impressive, if so. On Windows 7 with DWM (GPU display compositor) switched off?

How did you validate and measure latency?

Someone correct me if I'm wrong, but I'm under the impression that the Windows 10 DWM adds latency, making 16 ms unachievable.

c-smile · on Jan 30, 2018

Not sure I understand your concerns. If you have DirectX there, then you will have the same GPU rendering.

If that's about CPU rasterizers, then the Direct2D/WARP and Skia rasterizers are pretty good.

The problem is that if you have two monitors of the same size, one at a "standard" 96 ppi and the other, say, Retina grade (300+ ppi), then GPU rendering is the only reasonable option. The number of pixels to rasterize is about 9 times larger in the Retina case, since roughly 3 times the linear density means 3² = 9 times the pixels. We don't have 9 times more CPU performance...
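The 9x figure follows from pixel count scaling with the square of linear density when physical screen size is held fixed. A quick check (the helper name is just for illustration):

```python
def pixel_ratio(ppi_low: float, ppi_high: float) -> float:
    """For two screens of the same physical size, pixel count scales with
    the square of the linear pixel density."""
    return (ppi_high / ppi_low) ** 2


print(pixel_ratio(96, 288))             # exactly 3x linear density → 9.0
print(round(pixel_ratio(96, 300), 2))   # a 300 ppi panel → 9.77
```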

kllrnohj · on Jan 31, 2018

It depends on how deep the pipeline is. In the fairly common case of one or two app threads (UI + rendering) plus GPU work, you're looking at a pipeline depth of 3, so if it's doing 500fps, no stage of the pipeline can be taking longer than 2ms. With 3 pipeline stages, that puts your worst-case latency at 6ms.
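The reasoning above as a small formula: throughput bounds each stage's time at 1000/fps ms, and an input can wait at most one stage-time per stage. Like the comment, this covers only the app's own pipeline; compositor (e.g. DWM) and display scanout time would come on top.

```python
def worst_case_latency_ms(fps: float, pipeline_depth: int) -> float:
    """At `fps` frames/second no stage takes longer than 1000/fps ms, so an
    input waits at most one stage-time at each of `pipeline_depth` stages."""
    stage_ms = 1000.0 / fps
    return stage_ms * pipeline_depth


print(worst_case_latency_ms(500, 3))            # → 6.0
print(round(worst_case_latency_ms(60, 1), 1))   # one 60 Hz frame → 16.7
```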

z3t4 · on Jan 30, 2018

Platform rendering is very hard to beat. I tried to make my own bitmap text rendering, but the native one is much faster. And you won't notice any difference between 1ms and 10ms due to the monitor refresh rate and human perception. Text editors like Notepad++ and Sublime are already at ~5ms input latency. I think text rendering is already a solved problem, and not the bottleneck in, for example, browser-based text editors.

jamesrom · on Jan 30, 2018

This is awesome. Thank you for sharing.