
"Platform text rendering (CoreText, DirectWrite) not performant enough"

That claim needs some reliable proof, to be honest.

I am testing this claim with Sciter (https://sciter.com)... On Windows, Sciter uses Direct2D/DirectWrite (with an option to render on a DirectX device directly) and/or Skia/OpenGL. On Mac it uses Skia/OpenGL with an option to use CoreGraphics. On Linux it uses Cairo or Skia/OpenGL.

Here is full-screen text rendering on a DirectX surface using Direct2D/DirectWrite primitives. The display is 192 ppi, 3840x2160:

https://sciter.com/temp/plain-text-syntax.png

In the window caption you can see the real FPS, which is around 500 frames per second for the whole screen in this sample. CPU consumption is 6% because it renders constantly in a "game event loop" style. In reality (render on demand, as in sciter.exe) CPU consumption is 3% during kinetic scroll (120 FPS).

As you can see, platform text rendering is quite capable of drawing text on modern hardware.

As for the problems of text layout: it is not that CPU- and memory-consuming either, to be honest.

Sciter uses custom ::mark(name) pseudo-elements that allow styling arbitrary text runs/ranges without the need to create artificial DOM elements (like a <span> for each "string" in MS Visual Studio Code), see: https://sciter.com/tokenizer-mark-syntax-colorizer/

To try this yourself: get https://github.com/c-smile/sciter-sdk/blob/master/bin/32/sci... (Sciter on DirectX) and put sciter.dll next to it. Download https://github.com/c-smile/sciter-sdk/blob/master/samples/%2... and open it with sciter-dx.exe.




I'll have to try this out. My experiments indicated that DirectWrite could not keep up with drawing on a 4k monitor at 60 Hz, though it was OK in a smaller window. I think it might depend a lot on the driver too. I'll see if I can instrument the xi-win prototype to give performance numbers. I do note that your lines aren't very wide, but still, in my tests I wasn't seeing anything like 500 fps. DirectWrite does at least seem to use the GPU, while Core Text appears to rely entirely on software rendering.

Skia is definitely capable of good performance, as it resolves down to OpenGL draw calls, pretty much the same as Alacritty, WebRender, and now xi-mac. One thing though is that it doesn't do fully gamma-corrected alpha compositing, so it's not anywhere near pixel-accurate to CoreText rendering.

Doing proper measurement is not easy, but seems worth doing.


Let me know if you need more tests around this.

For more complex DOM cases you can try the https://notes.sciter.com/ application, or run it from the SDK directly: https://github.com/c-smile/sciter-sdk/blob/master/bin/32/not...

The Notes window layout resembles an IDE layout pretty closely. Notes also works on Windows, Mac and Linux, so you can compare different native text rendering implementations (I mean without the overhead of conventional browsers).


> Skia is definitely capable of good performance, as it resolves down to OpenGL draw calls, pretty much the same as Alacritty, WebRender, and now xi-mac.

This claim is a bit surprising to me. I was under the impression that Skia is an immediate mode renderer which ends up issuing a lot of GL calls that could be avoided with a retained mode renderer.


An immediate-style API does not mean the work is performed immediately. Skia defers and reorders internally to batch commands so minimal GL state changes are required.
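To make the batching idea concrete, here is a minimal sketch of "record first, reorder, then submit". It is not Skia's actual algorithm: `DrawCmd`, `count_state_changes` and `batch_by_state` are hypothetical names, the "state" string stands in for a shader/texture/blend combination, and the sketch assumes the draws do not overlap, so reordering cannot change the picture.

```python
from collections import namedtuple

# Hypothetical recorded command: `state` stands in for a shader/texture/blend
# combination, `payload` for the geometry that would be drawn.
DrawCmd = namedtuple("DrawCmd", ["state", "payload"])


def count_state_changes(commands):
    """Count how often the (simulated) GPU state has to be switched."""
    changes = 0
    current = None
    for cmd in commands:
        if cmd.state != current:
            changes += 1
            current = cmd.state
    return changes


def batch_by_state(commands):
    """Stable-sort recorded commands so equal states become adjacent.

    Assumption: the draws do not overlap, so reordering is safe. A real
    renderer would have to check overlap before moving a command.
    """
    return sorted(commands, key=lambda cmd: cmd.state)


# Commands in the order an immediate-style API call sequence might record them.
recorded = [
    DrawCmd("text", "glyph run 1"),
    DrawCmd("solid", "selection rect"),
    DrawCmd("text", "glyph run 2"),
    DrawCmd("solid", "caret"),
    DrawCmd("text", "glyph run 3"),
]

naive = count_state_changes(recorded)
batched = count_state_changes(batch_by_state(recorded))
print(naive, batched)  # 5 state switches in call order, 2 after batching
```

The API the caller sees is still immediate-style; only the submission order changes.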

That said, a "lot of GL calls" for a 2D UI is actually a trivially insignificant number of GL calls to the actual GPU/driver in most cases. That's basically never the bottleneck unless you've done something insanely wrong.


I wouldn't be so sure. A single draw call is surprisingly slow. If you drew each glyph with one draw call, that could be hundreds, which will definitely cause slowness.


"hundreds" is actually what I meant by insignificant to a modern driver.

For example: https://images.anandtech.com/graphs/graph11223/86100.png

Granted, that's a 1060, but since we're looking at driver CPU overhead that shouldn't matter much. So 2.3 million draw calls per second in DX11, single threaded.

It's not until you start getting into the 10k+ draw calls a frame that you are putting your 60fps at risk.

It's often worth the work to avoid this anyway (after all, faster is better if you're an engine/renderer), but it takes a lot for it to be an actual _problem_.


Yeah, so 2 million, cut that down by 10 for integrated graphics. Then you need 60 fps, which brings it down to about 3000 per frame. And that's if you're just doing empty draw calls and nothing else. Throw in WebGL and hundreds is really significant.
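For reference, the arithmetic behind both budgets in this exchange can be written out as below. This is only a sketch: `draw_calls_per_frame` is a made-up helper, the 10x penalty for integrated graphics is the commenter's estimate rather than a measurement, and the figures are for empty draw calls with no other per-frame work.

```python
def draw_calls_per_frame(calls_per_second, fps=60, slowdown=1.0):
    """Per-frame draw call budget for a given driver throughput.

    `slowdown` is a hypothetical divisor for slower hardware or APIs.
    """
    return calls_per_second / slowdown / fps


# 2.3 million draw calls per second (the DX11 single-threaded figure) at 60 fps.
discrete = draw_calls_per_frame(2_300_000)

# Roughly 2 million per second, divided by an assumed 10x penalty for
# integrated graphics, at 60 fps.
integrated = draw_calls_per_frame(2_000_000, slowdown=10)

print(round(discrete), round(integrated))  # 38333 3333
```

So the "10k+ per frame" threshold and the "about 3000" figure follow from the same division, just with different assumptions about the hardware.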


> In the window caption you can see the real FPS, which is around 500 frames per second for the whole screen in this sample.

Sure, 500 fps, but that's not the important part. Latency would be. So at 500 fps with how much output latency to the display?


For that particular text editor, the latency from a typed character to its appearance on screen will be 16 ms (one frame at the normal 60 FPS refresh rate).

The editor keeps each line in a separate <text> DOM element (like <p>, but with no margins and only text inside).

So we only need to re-layout that one line in order to show the typed character.
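A rough sketch of the per-line idea, under stated assumptions: this is not Sciter's implementation, `LineEditor` and its methods are hypothetical, and `_layout` is a stand-in for real text shaping. The point is only that an edit invalidates one cached line layout instead of the whole document.

```python
class LineEditor:
    """Toy editor that caches one layout result per line (hypothetical)."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.layout_cache = {}  # line index -> cached layout result
        self.layout_calls = 0   # how many times layout actually ran

    def _layout(self, text):
        # Stand-in for real text shaping and measurement.
        self.layout_calls += 1
        return {"width": len(text)}

    def render(self):
        """Lay out only the lines that have no cached layout."""
        for i, text in enumerate(self.lines):
            if i not in self.layout_cache:
                self.layout_cache[i] = self._layout(text)
        return [self.layout_cache[i] for i in range(len(self.lines))]

    def type_char(self, line, col, ch):
        """Insert a character and invalidate only the affected line."""
        text = self.lines[line]
        self.lines[line] = text[:col] + ch + text[col:]
        self.layout_cache.pop(line, None)


editor = LineEditor(["fn main() {", "    body", "}"])
editor.render()                # first frame: all 3 lines are laid out
editor.type_char(1, 8, "!")    # type one character on the second line
frame = editor.render()        # second frame: only that line is laid out again
print(editor.layout_calls)     # 4
```

Inserting or deleting whole lines would also need the cache keys shifted, which this sketch leaves out.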


> For that particular text editor, the latency from a typed character to its appearance on screen will be 16 ms (one frame at the normal 60 FPS refresh rate).

That's very impressive, if so. On Windows 7 with DWM (GPU display compositor) switched off?

How did you validate and measure latency?

Someone correct me if I'm wrong, but I'm under the impression that the Windows 10 DWM adds additional latency, making 16 ms latency unachievable.


Not sure I understand your concerns. If you have DirectX there then you will have the same GPU rendering.

If that's about CPU rasterizers then Direct2D/WARP and Skia rasterizers are pretty good.

The problem is that if you have two monitors of the same size, one at the "standard" 96 ppi and the other, say, Retina grade (300+ ppi), then GPU rendering is the only reasonable option. The number of pixels to rasterize is about 9 times higher in the Retina case. CPU performance has not increased 9 times...
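The "9 times" figure is the square of the linear density ratio. A quick check, assuming both monitors have the same physical size (`pixel_ratio` is just an illustrative helper):

```python
def pixel_ratio(ppi_high, ppi_low=96):
    """How many times more pixels cover the same physical area."""
    return (ppi_high / ppi_low) ** 2


print(pixel_ratio(288))  # 9.0 (exactly 3x the linear density of 96 ppi)
print(pixel_ratio(300))  # 9.765625
```

So 9x corresponds to a 3x linear scale (288 ppi); at 300 ppi and above the factor is slightly higher.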


It would be dependent on how deep the pipeline is. In the fairly common case of 1 to 2 app threads (ui + rendering) + GPU work you're looking at a pipeline depth of 3, so if it's doing 500fps that must mean no stage of the pipeline is taking longer than 2ms. With 3 pipeline stages that puts your worst-case latency at 6ms.
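Spelled out as a small calculation (a sketch only; the helper names are made up, and it models just the app/GPU pipeline stages, not display scanout or any compositor delay):

```python
def stage_time_ms(fps):
    """Upper bound on the slowest pipeline stage at a sustained frame rate."""
    return 1000.0 / fps


def worst_case_latency_ms(fps, stages):
    """Worst case if a frame spends the full stage time in every stage."""
    return stages * stage_time_ms(fps)


print(stage_time_ms(500))              # 2.0 ms per stage at 500 fps
print(worst_case_latency_ms(500, 3))   # 6.0 ms through a 3-stage pipeline
print(round(stage_time_ms(60), 2))     # 16.67 ms, one refresh interval at 60 Hz
```

The last line is the 60 Hz refresh interval for comparison, which is where the "16 ms" figure earlier in the thread comes from.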


Platform rendering is very hard to beat. I tried to make my own bitmap text rendering, but the native one is much faster. And you won't notice any difference between 1ms and 10ms due to monitor refresh rate and human perception. Text editors like Notepad++ and Sublime are already at ~5ms input latency. I think text rendering is already a solved problem, and not the bottleneck in, for example, browser-based text editors.


This is awesome. Thank you for sharing.



