Very happy to see this become public, and it looks very impressive. I'm blushing a bit. Patrick and I indeed had very stimulating conversations, but all the hard work figuring out how to map rendering efficiently to a GPU is Patrick's.
This is an exciting demonstration, but I have one worry that's unaddressed: I'm assuming all the other nitpicky details of hinted font rendering etc. are handled correctly by all the renderers compared? I mean, do they all provide pixel-identical output?
Otherwise the comparison is mostly moot: a faster rendering implementation is of little use if the output doesn't look as nice. I wouldn't care whether the text I'm reading over 5 minutes takes 200ms or 800ms to render.
Pathfinder's rendering is best-in-class, equaling or exceeding the quality of the other renderers. FreeType, stb_truetype, font-rs, and Pathfinder all use exact trapezoidal area calculations for antialiasing. Most GPU algorithms use lower-quality multisample antialiasing, but matching the system renderer in quality is one of the key goals of Pathfinder.
Regarding hinting: it is mostly a transformation applied to the vectors before rasterization, so it's (again, mostly) independent of the actual vector rasterizer. Note that many systems do no hinting at all. To make the benchmarks fair, hinting was disabled for the libraries that support it.
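As a toy illustration of the point (a crude grid-fitting pass I made up, not how FreeType's hint interpreter actually works; real hinting executes bytecode embedded in the font):

    // Hypothetical "hinting" pass: snap each outline point's y-coordinate
    // to the nearest whole pixel before rasterization. Whatever rasterizer
    // runs afterwards is completely unaffected by this step.
    fn grid_fit(outline: &mut [(f32, f32)]) {
        for point in outline.iter_mut() {
            point.1 = point.1.round(); // snap y to the pixel grid
        }
    }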
Having done some work that involved comparing a number of text rendering libraries in the past, I'm aware of the minute differences in rendering and hinting (even when the output should be the same). Could you briefly mention how hinting here would compare to ClearType?
I've been planning lately on writing my own little 2D rasterizer for fun. While I really like the trapezoidal signed-area method, when reading stb's write-up I noticed that the method can overestimate coverage when a path self-intersects or when subpaths intersect. Does Pathfinder solve this?
Between that and clipping paths, I've been thinking of sticking with multisampling, as nice as the exactness of the trapezoidal algorithm can be.
Pathfinder doesn't do anything special in that case; it has the same limitations as stb_truetype. There's probably a way to fix these issues if they come up in practice.
I would recommend doing the trapezoidal area approach in spite of these limitations. Having 256 shades of gray instead of 8 or so makes a big difference for all paths, not just pathological ones. It often ends up faster than MSAA, too.
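If it helps, the heart of that method is tiny. Here's a minimal sketch of the final step, in the style of font-rs (the names are mine, not from any of the libraries): each edge first deposits signed winding deltas into a per-pixel buffer, then a running prefix sum recovers the exact coverage:

    // Turn per-pixel signed winding deltas into 8-bit coverage with a
    // prefix sum. The clamp is where self-intersection overestimates get
    // capped (not fixed), per the limitation discussed above.
    fn accumulate(deltas: &[f32]) -> Vec<u8> {
        let mut acc = 0.0f32;
        deltas
            .iter()
            .map(|d| {
                acc += d;
                let coverage = acc.abs().min(1.0); // nonzero winding rule
                (coverage * 255.0 + 0.5) as u8     // 256 shades of gray
            })
            .collect()
    }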
I think one can mix hinters and renderers from different libraries. If it's all about fast vector rendering, I don't see why they couldn't just use FreeType's hinter (maybe they do).
You can. For my programming font [0], I provide versions of it that bake in the output of FreeType's hint interpreter applied at different point sizes and display DPIs. This way, platforms such as macOS that don't always apply hinting can show it as though they did.
> I mean, do they all provide pixel-identical output?
I wonder if this is even possible once you move it to the GPU. Reading about some of the fingerprinting methods in browsers lately, I've seen that many rely on different GPUs doing different things with the same input. A few of the reasons:
- Poorly defined APIs/types that allow vendors to diverge.
- Legit driver bugs.
- Vendors preferring optimizations to following the spec (including skipping memzero requirements).
- Hardware that doesn't behave how it should (fp32 silently falling back to fp16, for instance).
Overall, you don't really win any points in the market for correctness; instead, performance and efficiency are what's highly valued.
It depends on what kind(s) of development you are doing. I've become more comfortable on Mac and Linux than on Windows... but even on Windows now, I'm more inclined to be in a bash prompt (via Git and ConEmu) than a console prompt.
Node is my current dev environment of choice, but I'm mostly doing service-level things or web-based applications, so it fits well for that use case. It also shapes my choices: if I did do desktop stuff, I'd more likely reach for React Native or Electron, even though I'm very interested in getting my feet wet with Go and Rust.
Eh, I can't blame them terribly for introducing it. It was pre-Vulkan so they didn't have any open options when it came to a DX12/Vulkan-level graphics API.
There's a Vulkan-on-Metal implementation[1], and I'd have absolutely no complaints about the platform if there was simply an open source equivalent (well, maybe geometry shaders).
Given that Vulkan is in some ways lower-level than Metal, AIUI, surely there's a significant cost that would make targeting Metal directly a better solution for performance-critical things?
Most of the critical low-level components of Vulkan are there: multiple command queues with manual synchronization, resource heaps (although they are optional in Metal), and mixed compute/render support. Mostly, Metal simplifies some of the distinctions Vulkan drivers may dump on the user: for example, Metal has only one command queue type, while a Vulkan driver may force the user to manage several.
The biggest hurdle I can think of is geometry/tessellation shaders. Newer versions of Metal support tessellation shaders, but they work rather differently than Vulkan's (they're based on a compute shader, instead of the Hull/Domain split other APIs use). Geometry shaders aren't supported at all.
Oh, yes, let's do it!
Having worked with OpenGL and DirectX for decades, we suffered immensely from "standard by committee" and the lack of competition in OpenGL. When I was young I was naive and thought things would improve, but it was so painful.
Most people doing graphics have to implement an abstraction layer anyway for handling DirectX, vulkan or old OpenGL.
Once you have it, it is not so much work adding one additional back end.
I now believe competition is great, and one of the things that made Europe, for example, prosper while China stagnated for five centuries.
If you say, for example, "We only support text shaders because it is simple and works well enough," and someone else uses an API or DOM that goes 200 times faster, I want the option to drop the first guys and pick the 200x version, not to be stuck with the original "because it is the standard and we don't care."
> Once you have it, it is not so much work adding one additional back end.
Yes, it is. I would have been able to develop Pathfinder a lot faster if I didn't have to write and debug the "compute-shader" abstraction.
> If you say, for example, "We only support text shaders because it is simple and works well enough," and someone else uses an API or DOM that goes 200 times faster, I want the option to drop the first guys and pick the 200x version, not to be stuck with the original "because it is the standard and we don't care."
Vulkan exposes more of the hardware than Metal does. Having separate tessellation evaluation/tessellation control shaders is important to the way that Pathfinder works.
It's one thing to have a vendor-specific API that's better than the standard. It's quite another to have a vendor-specific API that's worse...
Android with Vulkan support means Android 7, currently available on 1.2% of worldwide devices: hardly a market worth spending resources on. Also, it is an optional API; Android 7-compliant devices aren't required to actually provide it, and apps are supposed to check for it.
Windows support is provided by GPU vendors, not Microsoft, and only for Win32 applications, not UWP.
Sony doesn't plan to support Vulkan, PS* APIs are much better.
Nintendo did introduce support on the Switch, but they are so confident in it that they also have NVN, which offers much better control over the hardware.
"NVIDIA additionally created new gaming APIs to fully harness this performance. The newest API, NVN, was built specifically to bring lightweight, fast gaming to the masses."
I couldn't tell you how much impact this actually has, but Vulkan/D3D12/Metal have to be conservative with pipeline state changes, and have to manage the layout of various hardware data structures for you.
They've gotten partway there with things like D3D12 descriptor heaps, but hardware-specific APIs just don't have those limitations.
"An afterthought" is a pretty strong term. It has direct equivalents for most of Vulkan's concepts (resource heaps, command buffers and queues, etc.), and it's definitely miles ahead of OpenGL on any platform. What Vulkan features do you think can't be added to Metal without effectively trashing the whole thing?
No, you're correct, "afterthought" is not the word I want to use. Apple's actions certainly show forethought and planning. I meant instead that the graphics support overall has felt like an afterthought: Radar bugs open without meaningful response for months, outdated graphics hardware and drivers, no choice of GPU brand, which means you can't run CUDA on "pro" computers, no support for Vulkan (though I think there's an official mapping of Vulkan onto Metal that hasn't materialized yet)... The list gets tiring to read after 15 years of waiting for Apple to support their software/hardware combos to their equivalent capacities on Windows, let alone support recent hardware or novel APIs.
However, I can't help but see it as modeled on the DirectX vendor lock-in model that starved Apple of games (and of users, because of those games). While it may not be a sign of change in Apple, it is a sign to me that I shouldn't depend on them to provide products that meet my needs as a cross-platform developer. All those years I invested in a neglected OpenGL I should have devoted instead to DirectX; perhaps I would be less bitter now.
For what it's worth, during the development of Pathfinder, I have spent more time working around OpenGL driver bugs on Apple systems (at least 4 or 5) than on any other system (zero).
I don't own a Mac, so I haven't actually tried Metal.
But from the documentation, it looks like quite a nice and simple API that also exposes the modern capabilities of shaders. It would probably make a great base for a new WebGPU API.
The existence of Metal isn't really an issue. The real issue is that Apple has silently shifted to supporting only Metal, not implementing Vulkan, and neglecting OpenGL.
This looks impressive, but does anybody know why "exact coverage" is apparently considered the gold standard for rendering vector graphics? Mathematically, computing pixel coverage corresponds to sampling a box-filtered version of a characteristic function.
In practice I would expect, say, a Gaussian filter to be both easier to approximate and less prone to aliasing artifacts. Apparently that expectation is completely wrong, though, since nobody seems to implement it that way! What's so special about vector graphics that makes the box filter behave well?
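To make the question concrete (my notation, not the article's): the "exact coverage" sample for the pixel at (i, j) is

    C(i,j) = \iint \chi(x,y) \, k(x - i, y - j) \, dx \, dy

where \chi is the characteristic function of the filled region and k is a unit box filter; my suggestion amounts to swapping k for a Gaussian.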
It would be interesting to connect Pathfinder to Alacritty. They accelerate two different things: Pathfinder accelerates the glyph rasterization (converting the vector outlines to bitmaps), while Alacritty accelerates the glyph compositing (blitting the bitmaps onto the screen).
Would it change much? I imagine Alacritty doesn't rasterize the same glyphs over and over when they repeat, and keeps some kind of atlas, so using Pathfinder would just give a boost when creating that atlas, which in the grand scheme of things wouldn't change much?
> Instead, it is expected that users of Alacritty make use of a terminal multiplexer such as tmux.
Alacritty does one thing well. If you're using a terminal multiplexer or tiling window manager anyway, you're probably not using tabs; instead, you're hoping there's a way to hide the tab bar when only one tab is open. If you're using tmux, it has scrollback.
I've been using Alacritty without tmux or any other way of achieving scrollback, and I like it. The few times I've thought 'damn, I need to scroll back to that thing' it's because I should have done it differently anyway.
It doesn't appear to render text properly on my mid-2014 MacBook, however. I ended up with what appeared to be a red channel of random GPU memory when I pressed the screenshot key.
Thanks for the report! Could you file a GitHub issue mentioning your graphics hardware (which you can get from System Profiler)? This is the kind of GPU driver issue I was hoping to be able to get wide coverage for. :)
Didn't work at first on my Zenbook under Arch Linux when using the Intel i915 GPU, but trying again with primusrun on the Nvidia 620M just worked.
The Intel just doesn't support modern enough OpenGL:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: CompileFailed("Tessellation control shader", "0:11(10): error: GLSL 4.10 is not supported. Supported versions are: 1.10, 1.20, 1.30, 1.40, 1.50, 3.30, 1.00 ES, and 3.00 ES\n\u{0}")', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:837
Very nice work and awesome to see so many interesting Rust projects popping out every week.
> Didn't work at first on my Zenbook under Arch Linux when using the Intel i915 GPU, but trying again with primusrun on the Nvidia 620M just worked.
Cool! Great to see successful Linux compatibility :)
Looks like I'll have to wait for the new Mesa to land -- no support for GL > 3.3 on my Intel cards. When I get it, I'll let you know how it works on Intel.
I started a project last night and ran into dependency hell because my distro's compiler was on version 1.14 instead of 1.15, so it's not always that easy.
I did. It's OK for now because the ecosystem is still maturing, but a systems language will need to learn to live with the system, not in its own isolated world like Ruby/Python/Java.
Actually, your distribution needs to learn to live with the world rather than isolating itself to a stagnant point in history. If it's going to ship Rust in the repository, it needs to ensure that it always ships the latest version. If it doesn't plan on doing that, it should ship rustup instead. Otherwise, there's very little point in shipping Rust as you'll limit yourself to only being able to compile older versions of software. Ain't nobody got time to manage multiple versions of the same project for the sake of system X and Y having different versions of the Rust compiler.
The Rust ecosystem is able to make leaps and bounds on a regular basis precisely because we aren't limiting ourselves to ancient versions of the Rust compiler because the system we use isn't technologically savvy enough to keep up with the times. Rust basically follows the Internet age of development, whereas point release distributions are stuck in a metaphorical floppy disk era (and often times still use mailing lists).
Rustup is a major tool for development because it allows us to keep our toolchains updated. It's used for obtaining nightly compilers, stable compilers, official documentation, Rust source code for Racer autocompletion, and various different types of targets, such as MUSL vs Glibc on Linux, or installing the Windows GNU toolchain on Linux for cross-compiling.
And there's literally zero reason not to follow the latest Rust compiler. Rust follows semantic versioning, which dictates that all 1.x.x releases are backwards compatible with 1.0.0. Upgrading the compiler brings no breaking changes, but it does bring improved performance and new features.
The blog post mentions integrating with WebRender as an alternative rasterizer on capable systems. Could performance of the GPU-based rasterizer ever get to the point that WebRender's glyph cache is no longer needed?
After glancing quickly at the code, it looks like the lorem ipsum example renders to a texture atlas. Is that part of Pathfinder or just part of that example? I'm trying to understand whether managing the atlas would be up to the application or Pathfinder.
> Could performance of the GPU-based rasterizer ever get to the point that WebRender's glyph cache is no longer needed?
I would like to try eliminating the frame-to-frame glyph cache. Doing so would reduce load on the texture atlas allocator, which can get slow as it's approximating an NP-complete problem. For me, Pathfinder can rerasterize the entire ASCII character set in 1.5ms or so (depending on the font size), which easily fits under the frame budget.
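For a feel for what that allocator is doing, here is a toy sketch of the simplest common approximation, a "shelf" packer (illustrative only; it is not the packing algorithm Pathfinder actually uses):

    // Toy shelf atlas allocator: rectangles go left to right on the
    // current shelf; when the shelf fills up, a new one opens above it.
    struct ShelfAtlas {
        width: u32,
        height: u32,
        shelf_y: u32,      // bottom edge of the current shelf
        shelf_height: u32, // tallest rectangle placed on this shelf
        cursor_x: u32,     // next free x position on this shelf
    }

    impl ShelfAtlas {
        fn new(width: u32, height: u32) -> ShelfAtlas {
            ShelfAtlas { width, height, shelf_y: 0, shelf_height: 0, cursor_x: 0 }
        }

        // Returns the origin for a w x h rectangle, or None when full.
        fn allocate(&mut self, w: u32, h: u32) -> Option<(u32, u32)> {
            if self.cursor_x + w > self.width {
                // Current shelf is full: open a new one above it.
                self.shelf_y += self.shelf_height;
                self.shelf_height = 0;
                self.cursor_x = 0;
            }
            if w > self.width || self.shelf_y + h > self.height {
                return None; // out of room
            }
            let origin = (self.cursor_x, self.shelf_y);
            self.cursor_x += w;
            self.shelf_height = self.shelf_height.max(h);
            Some(origin)
        }
    }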
> After glancing quickly at the code, it looks like the lorem ipsum example renders to a texture atlas. Is that part of Pathfinder or just part of that example?
Pathfinder's API is based around the concept of an atlas in order to improve batching. Especially at small sizes it's a lot more efficient to render multiple glyphs all in one go without issuing separate draw calls for each one. There's nothing preventing you from making a separate "atlas" for each glyph if you want, though you'll pay some performance cost for this.
> How does/will Pathfinder support ligatures?
Ligatures are just glyphs like any other. If you want to use ligatures, you can run a full-featured OpenType shaper, like HarfBuzz or Core Text, on your text before sending the resulting glyphs to Pathfinder to be rendered.
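For example, a rough sketch using the harfbuzz_rs crate (the font path is a placeholder, and the println! stands in for handing glyph IDs to the rasterizer; this is not Pathfinder's API):

    use harfbuzz_rs::{shape, Face, Font, UnicodeBuffer};

    fn main() {
        // Load a face and wrap it in a font for shaping.
        let face = Face::from_file("SomeFont.ttf", 0).expect("couldn't load font");
        let font = Font::new(face);
        // Shaping turns the string into positioned glyph IDs; if the font
        // has an "ffi" ligature, this comes back as a single glyph.
        let output = shape(&font, UnicodeBuffer::new().add_str("ffi"), &[]);
        for info in output.get_glyph_infos() {
            println!("glyph id: {}", info.codepoint); // send these on to be rendered
        }
    }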
> Pathfinder's API is based around the concept of an atlas in order to improve batching.
And the result of a raster job is then coordinates in the atlas?
> Especially at small sizes it's a lot more efficient to render multiple glyphs all in one go without issuing separate draw calls for each one.
Makes sense
> There's nothing preventing you from making a separate "atlas" for each glyph if you want, though you'll pay some performance cost for this.
It's not exactly an atlas then, is it? :P Sorry if I wasn't clear; I was trying to understand whether the library or the application is managing the atlas. Sounds like the library.
> And the result of a raster job is then coordinates in the atlas?
Yes.
> Sorry if I wasn't clear; I was trying to understand whether the library or the application is managing the atlas. Sounds like the library.
The library manages the atlas, because it uses a particular packing algorithm that maximizes the performance of the accumulation step (by increasing parallelism) when rasterizing many glyphs at once.
Please forgive the stupid question, since I don't know low-level coding too well, but: what are some uses of this? It says it's a "Rust library for OpenType font rendering," so, being a library, other Rust code can use it for its own purposes. Might this someday find its way into Servo? Is it possible for other languages to take advantage of this too?
Impressive work. One of my (far too many) ongoing projects is also a GPU-based glyph renderer. However, I'm using a different method, which relies on some preprocessing/conversion of the glyph data. Described in as few words as possible, my method could be called "trivariate polynomial distance fields". The renderer is mostly done, but there's still a lot of work needed for the glyph preprocessor to be robust and universally usable.
Months ago I posted a few screenshots on Twitter (https://twitter.com/datenwolf/status/714934185564225536), and the comment by Michael IV is spot on. The renderer has no problem with sharp corners, but so far the glyph preprocessor still struggles with them, and I have to manually adjust the emitted output to get nice results.
I'd be curious to see how this compares to CoreText on Mac and DirectWrite on Windows. Both are highly tuned for their respective platforms, so I'd treat them as the baseline.
I haven't been able to find documentation from Microsoft as to which algorithm DirectWrite uses, but if it's the same as the algorithm Direct2D uses for general path rendering it has an expensive CPU-side tessellation step first and so has the typical drawbacks of long setup times. I wouldn't be surprised if DirectWrite does the actual path rendering on the CPU, like Skia does in typical configurations.
Core Text (really, Core Graphics) renders paths on CPU. I benchmarked it and it generally performed a bit worse than stb_truetype.
I cover this at the end of the article. Hinting is generally just a transformation applied to the glyph outlines before rasterization; it doesn't affect the vector graphics renderer itself. Adding hinting would not cause problems for the algorithm. (I personally consider hinting obsolete, but I would rather not argue about it.)
I'm not sure what you mean by "subpixel rendering". If you mean subpixel positioning, that is correctly handled, though not fully exposed in the API yet. If you mean subpixel AA, that is straightforward to add, and I expect it to improve performance relative to the CPU rasterizers by effectively tripling the glyph area.
Watch out: some CJK fonts from Microsoft require hinting to render correctly. They are made up of composite glyphs where each sub-glyph corresponds to a brush stroke, and hinting is used to scale and position the strokes.
> (I personally consider hinting obsolete, but I would rather not argue about it.)
That depends heavily on your preferences in font rendering, and in particular whether you prefer Apple-style "accurate shapes with no respect for pixel alignment" or Windows-style "monitors have pixels, better to ignore the designer and snap to pixels than to be blurry" rendering.
It also depends on the fonts you use; some fonts don't need hinting to look good, while others do.
If you do add subpixel AA, please make it optional. Some people, like me, can directly see the colour artifacts from it, even on hi-DPI displays. And others have multiple displays with differing orientation or pixel layout.
Hinting is very much obsolete on modern systems. I haven't used hinting with my text in many, many years, and yet my fonts look visually superior to those on a typical Windows PC with hinting enabled.
Wonder how it compares to using Antigrain Geometry. I once had to implement text rendering in a game engine a bunch of years back and I used FreeType to load font glyphs and then fed the geometry data into Antigrain and had it rasterize (FreeType rasterization is meh while AGG is heavenly). Even on very old iPhone hardware I was able to render in the main update loop and not encounter any frame rate hiccups.
Unfortunately, AGG 2.5 is now GPL, so if you need to stick it into anything closed-source you are stuck using 2.4 with its modified-BSD license.
That said, I think having a CPU tessellation path is going to be critical if you want to see wide adoption. Platforms like Android don't always have geometry shaders, which is why you see FreeType so widely used.
You need not only geometry/tessellation shaders but also compute shaders and signed framebuffers. In theory it would be possible to work around the lack of compute shaders (making the minimum requirement GLES 3.0), but it would involve multiple passes, and I'm not optimistic about the performance in that case.
Would the fallback be better placed into WebRender or Pathfinder? It would be nice to be able to know that you only need to import one library for font rasterization, regardless of the hardware you're running on.
I did some work a while back that involved using genetic algorithms to solve a problem pertaining to PC errors. Long story short, the bottleneck ended up being the generation of an error screen (which is then compared to a pre-existing one in the cost function). We first used GDI (on Windows), then switched to DirectWrite, but couldn't get it fast enough to make the algorithm feasible. This definitely piqued my interest!