
Show HN: 2D Graphics on Modern GPU - raphlinus
https://raphlinus.github.io/rust/graphics/gpu/2019/05/08/modern-2d.html
======
Jasper_
Impressive work, Raph! As I'm sure everyone knows by now, the PostScript
imaging model is _hard_ to make work on the GPU, and the GPU's traditional
geometry pipeline isn't well suited to curves or analytic coverage-based AA.

Since I come from the games space, a lot of my thoughts have been about baking
as much data as possible: if we had infinite build time, or artist tooling
akin to Maya, how would that help us develop better UIs? If we leave
PostScript behind and think about new graphics models, what can we do? So a
lot of my own research has been studying really old graphics history, since
there are always fun ways of looking at the problem from before we settled on
one solution. I'll have to collect some of my research and demos and publish
them sometime soon.

Although, one thing I've noticed during my time in games is that the frame
budget is the constant: if we speed up our rasterizer, we'll just add more
junk to the scene until it hits our 60fps frame target again. More blurs! More
effects! :)

~~~
pcwalton
> the GPU's traditional geometry pipeline isn't well suited to curves or
> analytic coverage-based AA

I actually don't agree with this: Pathfinder shows that analytic AA works
quite well with the GPU rasterizer. You simply use a floating-point render
target with additive blending to sum signed areas. In fact, as long as you
convert to tiles first, which both piet-metal and Pathfinder do, vector
rendering is actually a task very well suited for the GPU.
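
To make "sum signed areas" concrete: once the float render target holds the
accumulated signed area per pixel, resolving it to a coverage value is only a
couple of lines. A minimal Rust sketch of that resolve step (my names, not
Pathfinder's actual shader code):

    // Nonzero rule: any nonzero accumulated winding/area means "inside",
    // so saturate the magnitude to get alpha.
    fn resolve_nonzero(sum: f32) -> f32 {
        sum.abs().min(1.0)
    }

    // Even-odd rule: coverage folds back on each winding increment,
    // i.e. a triangle wave over the accumulated value.
    fn resolve_even_odd(sum: f32) -> f32 {
        let t = sum.abs() % 2.0;
        if t > 1.0 { 2.0 - t } else { t }
    }

Because the accumulation is a plain sum, the per-edge contributions can be
blended in any order, which is what makes it GPU-friendly.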

The hard part is that filling a path is a fundamentally sequential operation,
since whether a pixel is on or off depends on every other path that intersects
the scanline that pixel is on. This means that you either need an expensive
sequential pass somewhere, or you lose work efficiency. Both piet-metal and
Pathfinder try to strike a balance by doing some parts sequentially and
sacrificing some work efficiency, using tiling to keep the efficiency loss
bounded. This approach turns out to work well.

~~~
Jasper_
> The hard part is that filling a path is a fundamentally sequential
> operation, since whether a pixel is on or off depends on every other path
> that intersects the scanline that pixel is on.

I'm not quite sure I get this. Are you simply talking about overdraw
optimizations and the requirement for in-order blending here, or something
else like even-odd fill rules?

Obviously, the hardware itself has a massive serial component in the form of
the ROP which will make sure blend draws retire in-order, and pixel-shader
interlock gives you a bit more granular control over the scheduling without
relying on the fixed-function ROP unit.

~~~
pcwalton
I'm talking about the fill rule (whether even-odd or winding). Given a path
outline made of moveto/lineto/etc. commands, you can't tell just by looking
locally at a pixel whether it should be filled or not without looking at every
path segment that intersects the scanline it's on.
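
For a concrete picture, here is the classic scanline crossing test for a
single sample point, as a Rust sketch (assuming the path has already been
flattened to a polygon; curves would be subdivided into lines first). Note how
the loop has to visit every edge:

    /// Winding number of sample point (px, py): count signed crossings
    /// of the horizontal ray extending to the right of the point.
    fn winding(polygon: &[(f32, f32)], px: f32, py: f32) -> i32 {
        let mut w = 0;
        for i in 0..polygon.len() {
            let (x0, y0) = polygon[i];
            let (x1, y1) = polygon[(i + 1) % polygon.len()];
            // Half-open test so shared vertices aren't counted twice.
            if (y0 <= py) != (y1 <= py) {
                // x where the edge crosses the sample's scanline.
                let x = x0 + (py - y0) / (y1 - y0) * (x1 - x0);
                if x > px {
                    w += if y1 > y0 { 1 } else { -1 };
                }
            }
        }
        w // nonzero rule: filled iff w != 0; even-odd: iff w % 2 != 0
    }

No edge can be skipped, because any one of them can flip the answer.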

~~~
fluffything
I have no idea about the topic being discussed here, but this sounds like you
basically have to do a:

    for each path outline in all path outlines:
        for each pixel in all pixels:
            check whether pixel is inside/outside the path outline

?

That looks like a massively parallel problem to me: you can parallelize both
for loops (all paths in parallel, all pixels in parallel) and then do a
reduction for each pixel, which can be a parallel reduction.

Some acceleration data structures for these kinds of problems can also be
constructed in parallel and on the GPU (e.g. bounding volume hierarchies), and
some methods for inside/outside checking might be more amenable to
parallelization than even-odd or winding-number rules (e.g. level-set /
signed-distance fields).

~~~
pcwalton
> That looks like a massively parallel problem to me: you can parallelize both
> for loops (all paths in parallel, all pixels in parallel) and then do a
> reduction for each pixel, which can be a parallel reduction.

The reduction step is still less work-efficient than the sequential algorithm
(O(n log n) work vs. O(n)), even if it can be faster due to the increased
parallelism.

But yeah, if you go down this road you will eventually end up with a delta
coverage algorithm. You've basically described what I'd like to do in
Pathfinder for an optional compute-based tiling mode. (Because I have to
support GL3, though, I can't depend on compute shaders, so right now I do this
work in parallel on the CPU.)
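
A toy version of the delta-coverage idea on one scanline, sketched in Rust
(hypothetical names; a real implementation deposits fractional signed areas
for antialiasing rather than whole winding deltas):

    /// Each edge crossing deposits a signed winding delta at its pixel;
    /// a prefix sum then turns deltas into per-pixel winding numbers.
    fn fill_scanline(width: usize, crossings: &[(usize, i32)]) -> Vec<bool> {
        let mut deltas = vec![0i32; width];
        for &(x, dir) in crossings {
            deltas[x] += dir; // +1 for a downward edge, -1 for upward
        }
        // The sequential prefix sum is O(n) work; a parallel scan gets
        // O(log n) steps at the cost of extra work, as noted above.
        let mut winding = 0;
        deltas.iter().map(|d| { winding += d; winding != 0 }).collect()
    }

Depositing the deltas is embarrassingly parallel; only the scan has the
sequential flavor.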

> Some acceleration data structures for these kinds of problems can also be
> constructed in parallel and on the GPU (e.g. bounding volume hierarchies)

Yes, that's the tiling step that both piet-metal and Pathfinder do.
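
For illustration, a minimal sketch of such a binning pass in Rust (assuming
16-pixel square tiles and non-negative coordinates; this is my toy version,
not either project's actual code):

    const TILE: usize = 16;

    /// Assign each path's bounding box (x0, y0, x1, y1) to every tile
    /// it overlaps, producing a per-tile list of path indices.
    fn bin_paths(
        boxes: &[(f32, f32, f32, f32)],
        tiles_x: usize,
        tiles_y: usize,
    ) -> Vec<Vec<usize>> {
        let mut bins = vec![Vec::new(); tiles_x * tiles_y];
        for (id, &(x0, y0, x1, y1)) in boxes.iter().enumerate() {
            let tx0 = (x0 as usize / TILE).min(tiles_x - 1);
            let ty0 = (y0 as usize / TILE).min(tiles_y - 1);
            let tx1 = (x1 as usize / TILE).min(tiles_x - 1);
            let ty1 = (y1 as usize / TILE).min(tiles_y - 1);
            for ty in ty0..=ty1 {
                for tx in tx0..=tx1 {
                    bins[ty * tiles_x + tx].push(id);
                }
            }
        }
        bins
    }

After binning, each tile can be rasterized independently, which bounds how
much of the scene any one pixel's fill has to consider.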

~~~
fluffything
I think it is quite funny that 2D vector graphics and 3D applications such as
computational fluid dynamics simulations have to solve pretty much the same
problem.

In 3D we use STL geometries, NURBS/T-splines from CAD, and signed-distance
fields, often all of them simultaneously in the same simulation, and for a big
3D volume (with 10^9-10^10 "cells"/"3D pixels") we have to figure out whether
these cells are inside or outside the geometry. Our 3D domain is adaptive and
dynamic to track the movement of bodies and features of the flow, so we have
to update it on every iteration, and all of this has to happen in distributed
memory on 100,000-1,000,000 cores, without blocking.

There is a lot of research about, e.g., how to update signed-distance fields
quickly, in parallel, and in distributed memory when they slightly move or
deform, as well as how to use signed-distance fields to represent sharp
corners, how to extract the input geometry "as accurately as possible" from a
signed-distance field, how big the maximum error is, etc. The Journal of
Computational Physics, Computer Methods in Applied Mechanics and Engineering,
and the SIAM journals are often full of this type of research.

For computer graphics, errors are typically OK. But for engineering
applications, the difference between a "sharp" edge and a smoothed one can be
a completely different flow field, which results in completely different
physical phenomena (e.g. turbulent vs. laminar flow) and completely different
loads on a structure.

------
dahart
> Well known to game developers, what is efficient on GPU is a structure-of-
> arrays approach. In particular, the tiling phase spends a lot of time
> looking at bounding boxes, to decide what belongs in each tile. [...] Now we
> get to the heart of the algorithm: going through an array of bounding boxes,
> looking for those that intersect a subset of tiles.

Raph, you're describing ray tracing! I haven't thought a lot about this, but
maybe the ray tracing hardware trend really could be put to use for 2D UI.
With ray tracing support now announced all over the place, are you already
thinking along those lines?

The main difference between rasterizing and ray tracing is, at its most basic,
SoA vs. AoS, i.e., whether the outer loop is over triangles or over pixels.
And the per-pixel operation is going through an array of bounding boxes to see
what overlaps.

The somewhat amazing thing about the ray tracing hardware is that you get, in
effect, your entire search query through all the bounding boxes in a single
"instruction". It's such a departure from all the predictable bite-sized
hardware operations we're used to. It reminds me of stories from college about
the DEC VAX and its single-instruction memcpy & strcmp.

> While I mostly focused on parallel read access, I’m also intrigued by the
> possibility of generating the scene graph in parallel, which obviously means
> doing allocations in a multithread-friendly way.

FWIW, this is definitely doable and already happening on the 3D GPU ray
tracing side of the world...! Parallel scene graph construction, parallel BVH
builds, etc.

------
kllrnohj
> and many people believe that damage regions are obsolete

Old-style damage regions are obsolete. But new-style ones leveraging buffer
age & swap-with-damage are not.

Vulkan changes this, but these older EGL extensions are good introductions to
the concepts:
[https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_...](https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_buffer_age.txt)
[https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_...](https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_partial_update.txt)
and
[https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_...](https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_swap_buffers_with_damage.txt)

> Flutter is a good modern approach to this, and its “layers” are one of the
> keys to its performance.

Flutter actually does it badly. Layers are terrible for performance because
they are a source of jitter (and memory usage), and jitter often results in
jank. They are necessary in some cases, like for correct alpha blending, but
in general should be _avoided_ if your goal is to be smooth & fast. You want
to optimize for your slowest frames, not your fastest ones.

Flutter's excessive layering can also rapidly become a net loss in performance
due to the overhead of render-target switching, increased memory bandwidth
requirements, and fewer opportunities for overdraw avoidance and disabled
blending.

~~~
Jasper_
In Vulkan, buffer age is implicit with the FIFO swap-chain model. Agreed that
there's currently no SwapWithDamage equivalent to tell the compositor about
the damaged region in Vulkan, but if it's wanted, it shouldn't be too
difficult an extension to draft or support. I'm open to throwing it in the
queue if wanted. Which swapchain platforms were you thinking of it for?

~~~
raphlinus
If you're talking about improving support for incremental present in Vulkan
implementations, let's definitely connect. Certainly I can see how to do it
well on Windows/DirectX, using IDXGISwapChain1::Present1. On macOS I think
it's impossible (though Chrome sometimes fakes it by pasting update layers on
top of the base window in the compositor).

~~~
kllrnohj
For incremental buffer updates you already know the content of the new VkImage
you're going to draw into - it's the same as it was the last time you
presented it. So you keep a ring buffer of dirty rects for the last N frames,
and when you acquire the next image you union the dirty rects of the frames
presented since it was last used; that's the area of the buffer to update.
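
Sketched in Rust with hypothetical types (the age itself would come from the
swapchain, or from EGL_BUFFER_AGE_EXT in the EGL world), the bookkeeping is
just a union over recent frames' damage:

    use std::collections::VecDeque;

    #[derive(Clone, Copy)]
    struct Rect { x0: i32, y0: i32, x1: i32, y1: i32 }

    impl Rect {
        fn union(self, o: Rect) -> Rect {
            Rect { x0: self.x0.min(o.x0), y0: self.y0.min(o.y0),
                   x1: self.x1.max(o.x1), y1: self.y1.max(o.y1) }
        }
    }

    /// Damage rects of the most recently presented frames, newest last.
    struct DamageHistory { frames: VecDeque<Rect> }

    impl DamageHistory {
        /// Region of the acquired buffer to re-render this frame.
        /// `age` is the buffer's age in frames; 0 means unknown content.
        fn repaint_region(&self, age: usize, new_damage: Rect, full: Rect) -> Rect {
            if age == 0 || age - 1 > self.frames.len() {
                return full; // contents unknown or too old: full redraw
            }
            // This frame's damage plus everything that changed since
            // this buffer last held valid content.
            self.frames.iter().rev().take(age - 1)
                .fold(new_damage, |acc, &r| acc.union(r))
        }
    }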

This is "standard" vulkan, there's no platform aspect to this. It has very
widespread support in EGL/GL as well with either egl_ext_buffer_age extension
or, more commonly, the EGL_KHR_partial_update extension.

If you then want to limit the area of the screen that's re-composited THAT'S
when swapchain extensions or platform support enters the picture. That would
be eglSwapBuffersWithDamage or VkPresentRegionKHR. But this optimization has
very little performance impact, and typically minimal battery life improvement
as well. Worth doing, but the partial re-render of the buffer is far more
significant.

> though Chrome sometimes fakes it by pasting update layers on top of the base
> window in the compositor

For a long time that was also just how Chrome handled display list updates.
They just had a stack of them, and re-rendering a part of the screen just
created an entirely new display list and plopped it on top:
[https://www.chromium.org/developers/design-documents/impl-si...](https://www.chromium.org/developers/design-documents/impl-side-painting)
(see "PicturePile")

I wouldn't necessarily follow Chrome as an example of what to do - there's a
lot of legacy in Chrome's rendering stack. Also, web pages are _huge_, so they
do things like put rendering commands in an R-tree, because quick rejects
happen 100x more often than actual draws.

------
Waterluvian
Something I've been really surprised and disappointed by is the lack of 2D
vector graphics support in modern game engines.

I'm an amateur enthusiast, but I spent hours and hours unsuccessfully trying
to find any decent engine where I could make games using SVG or other vector
formats.

~~~
DonHopkins
I know what you mean! Unity3D has no way to simply draw a circle or pie chart
into a texture.

It would be great to be able to use the canvas API (and any JavaScript library
like d3 that uses canvas) to dynamically draw Unity3D textures.

To address that problem (and others), I've been developing a Unity3D extension
called UnityJS, which integrates a web browser and JavaScript with Unity3D, so
you can use the canvas API to draw images for use as 2D user interface
overlays and 3D mesh textures in Unity. And of course the other (main) point
of UnityJS is to use JavaScript for scripting and debugging Unity apps, and to
integrate Unity apps with off-the-shelf JavaScript web libraries (d3,
socket.io, whatever).

[https://github.com/SimHacker/UnityJS/blob/master/doc/Anatomy...](https://github.com/SimHacker/UnityJS/blob/master/doc/Anatomy.txt)

Drawing on canvases actually works pretty well in the WebGL build, because the
JavaScript runs in the same browser tab and address space that's running the
Unity app, so you can efficiently blit the canvas's pixels right into Unity's
texture memory.

(I don't believe there's a way to share the canvas texture directly through
GPU memory, or even to share Unity's ArrayBuffer memory directly, but it's
fast enough for interactive user interface stuff, and millions of times better
than the "portable" technique of writing out a data URI with a base64-encoded
compressed PNG file, which is the "standard" way of getting pixels out of the
web browser! Web tech sure sucks!)

[https://github.com/SimHacker/UnityJS/blob/master/Libraries/U...](https://github.com/SimHacker/UnityJS/blob/master/Libraries/UnityJS/StreamingAssets/UnityJS/bridge.jss#L640)

~~~
bufferoverflow
Actually a circle is very easy to draw using a shader:

(4th answer down)
[https://stackoverflow.com/questions/13708395/how-can-i-draw-...](https://stackoverflow.com/questions/13708395/how-can-i-draw-a-circle-in-unity3d)

A pie chart would be harder.
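
The shader approach boils down to a per-pixel distance test. Here's the idea
as plain Rust functions (a sketch with a crude one-pixel antialiasing ramp;
names are mine, and a real shader would express the same thing in HLSL or
ShaderLab):

    use std::f32::consts::TAU;

    /// Antialiased circle coverage for one pixel: alpha ramps from 1 to 0
    /// across a one-pixel-wide band around the radius.
    fn circle_alpha(px: f32, py: f32, cx: f32, cy: f32, radius: f32) -> f32 {
        let dist = ((px - cx).powi(2) + (py - cy).powi(2)).sqrt();
        (0.5 - (dist - radius)).clamp(0.0, 1.0)
    }

    /// The pie chart's extra wrinkle: also test the pixel's angle
    /// against a wedge [a0, a1), measured counterclockwise.
    fn in_wedge(px: f32, py: f32, cx: f32, cy: f32, a0: f32, a1: f32) -> bool {
        let a = (py - cy).atan2(px - cx);
        (a - a0).rem_euclid(TAU) <= (a1 - a0).rem_euclid(TAU)
    }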

~~~
DonHopkins
But drawing that circle (or a pie chart) into a texture (so you don't have to
draw each triangle every frame) is a whole can of worms.

It might be fine in some cases on a big honking workstation, but reducing the
number of draws and triangles becomes quite important when you're doing mobile
user interfaces or VR.

Also, it would take a hell of a lot of programming to duplicate the canvas API
with LineRenderer and dynamic meshes.

And then you've rolled yourself a custom C# drawing API, so there aren't any
full-featured, well-supported off-the-shelf libraries like d3 or chartjs to
drive it.

One of the nice things about the way UnityJS integrates JavaScript with a
standard web browser in Unity apps is that you can use the enormous ecosystem
of existing off-the-shelf JavaScript libraries without modification (both
visual canvas-drawing libraries and JSON-based data-wrangling libraries and
APIs for services).

It's all about leveraging existing JavaScript libraries and web technologies,
instead of reinventing half-assed Unity C# versions.

Given any JSON-based web service or SDK you might want to talk to (Google
Sheets or Slack, for example), the chances of finding a working, well-
supported JavaScript library that talks to it are a lot higher than finding an
equivalent C# library that runs in Unity. (Socket.io is a good example: some
implementations exist for Unity, but they all have problems and limitations,
and none of them compare in quality and support to the standard JavaScript
socket.io library.)

~~~
pcwalton
You've described exactly why I want to integrate Pathfinder into Unity [1]. :)
Pathfinder offers a subset of HTML canvas that I would like to expand over the
next few weeks and months into the full API. (Ideally it will actually _be_
the HTML canvas implementation for Firefox someday.)

There's also the benefit that Pathfinder can transform the canvas in 3D while
keeping the content in vector form (i.e. no quality loss), which is important
for VR.

[1]:
[https://github.com/pcwalton/pathfinder/issues/147](https://github.com/pcwalton/pathfinder/issues/147)

------
rock_artist
> it seems quite viable to implement 2D rendering directly on GPU, with very
> promising quality and performance.

I guess we all agree that GPU rendering is the future. Yet I'm still in favor
of libraries that abstract the actual renderer so it can fall back if needed,
especially when future and past conflict.

I come from audio development, where people use plug-ins (dynamic libs) that
load into a host that provides a native OS window you compose into. Cross-
platform support is pretty vital, so there are frameworks to target macOS,
Windows, iOS (and sometimes Android & Linux).

Here is a real-life scenario that many audio developers are currently facing:

- You had a product with a UI that needed to show a "real-time" visualization
of a signal - an analyzer (let's say an FFT).

- Eventually you went the OpenGL way (because it was the right way in 2010...)
and you wrote shaders.

- It worked great on macOS and OK on most Windows machines (some old machines
had limited or broken OpenGL in their drivers).

- Apple deprecates OpenGL in favor of Metal...

So now, what was "modern" 9 years ago requires a complete rewrite. If there
were an abstraction layer, it might have been a matter of minor changes and
switching to a new Metal renderer instead of the OpenGL one.

In comparison, the native/CPU C++ code for the same product proved to be much
more future-proof and only required simple maintenance.

It is indeed important work, but I guess targeting Vulkan is better,
especially when there are libs like MoltenVK.

I guess the question is how important a factor portability is vs. optimal
performance.

~~~
zackmorris
I've been through the same iterations you talked about, writing game graphics
in MacOS Classic QuickDraw, then OS X Cocoa, then OpenGL, then ES2, now
Unity...

IMHO this is all going the wrong direction. I'm strongly against SIMD as a
pattern because it makes things hard to generalize. The fact that we don't
even know the architectures of the video cards we use every day is a huge red
flag. I mean, that's the point we're trying to get to with abstractions, but
SIMD exacerbated the pain by being so opinionated.

The reason all of this is going in such strange (informal) directions is that
video cards are proprietary and the money is in photorealistic rasterization.
It's working along one branch of the tree of possible architectures. Then CUDA
and TensorFlow etc. are attempts to overlay other narrow generalizations on
top of that, and IMHO they aren't nearly as good an approach as
MATLAB/Octave/Erlang/Go, etc.

I've said this many times, but I'd prefer to have an array of general-purpose
CPUs, at least 256 of them, with the number roughly doubling in a predictable
way every year or two. Then we could have abstractions over that to handle
data locality or cache coherency (if needed... it most likely isn't). A chip
like that would be trivial to use for ray tracing, for example.

We're going to continue seeing this uninspired churn burning out developers
until there is a viable many-core CPU to compete with GPUs.

------
pavlov
What an interesting rabbit hole. Thanks for sharing your notes too!

Re: > _" As a digression, I find it amusing that the word for packing a data
structure into a byte buffer is “serialization” even when it’s designed to be
accessed in parallel. Maybe we should come up with a better term, as
“parallel-friendly serialization” is an oxymoron."_

NeXTStep/Cocoa calls this simply "encoding" which always made sense to me.

------
DonHopkins
>Other than that, the serialization format is not that exotic, broadly similar
to FlatBuffers or Cap’n Proto. As a digression, I find it amusing that the
word for packing a data structure into a byte buffer is “serialization” even
when it’s designed to be accessed in parallel. Maybe we should come up with a
better term, as “parallel-friendly serialization” is an oxymoron.

Great observation! I get the feeling there must be a great punny term for the
enigmatic oxymoron "parallel-friendly serialization", but I can't quite put my
finger on it.

(Reminds me of "de-optimizer" => "pessimizer", or "spectral frequency analysis
and filtering" => "cepstral quefrency alanysis and liftering".)

[https://en.wikipedia.org/wiki/Cepstrum](https://en.wikipedia.org/wiki/Cepstrum)

~~~
strmpnk
Marshalling has precedent in some languages. It makes a lot of sense.
"serde[s]" always reminds me of the sequencing of a transmission more than a
format concern about what goes where in a buffer.

------
amelius
> I’m particularly interested in the rendering quality.

Shouldn't the rendering be done using oversampling? E.g. instead of 100x100,
you render at 400x400, then scale back down to 100x100 to get the best result.

One of the problems is that anti-aliasing creates pixels which are half-on,
e.g. at the edge of a line or polygon. Now if multiple edges meet near a
certain pixel, then rendering the polygon(s) will touch a "half-on" pixel
several times, and as a result it may receive the wrong opacity.
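
For reference, the downsampling half of that scheme is just a box filter over
the high-resolution buffer. A Rust sketch of a 4x4 supersampling resolve (on a
GPU this would typically be an MSAA resolve or mip generation):

    /// Average each 4x4 block of the high-res coverage buffer into one
    /// output pixel. `w` and `h` are the high-res dimensions.
    fn downsample_4x(hi: &[f32], w: usize, h: usize) -> Vec<f32> {
        let (ow, oh) = (w / 4, h / 4);
        let mut out = vec![0.0; ow * oh];
        for oy in 0..oh {
            for ox in 0..ow {
                let mut sum = 0.0;
                for sy in 0..4 {
                    for sx in 0..4 {
                        sum += hi[(oy * 4 + sy) * w + (ox * 4 + sx)];
                    }
                }
                out[oy * ow + ox] = sum / 16.0;
            }
        }
        out
    }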

~~~
pcwalton
As far as I know, you basically get to choose two of antialiasing quality,
correctness, and performance. The viable antialiasing options for vectors are:

1. Supersampling/multisampling. This is what you suggested. This provides
correctness with coincident edges, but 256xAA (what analytic AA gives you) is
far too slow and memory-intensive to be practical if you supersample. In
general, practical implementations of MSAA/SSAA are limited to 16xAA or so,
which is a noticeable drop in quality and in fact is still pretty slow
relative to a high-quality implementation of analytic AA.

2. Analytic antialiasing. This has problems with coincident edges, but in most
cases the effects are minimal. The performance and quality are excellent if
implemented properly.

I generally think that analytic AA is the right tradeoff for most
applications. Subtle rendering problems on coincident edges are usually a
small price to pay for the excellent results, and designers can work around
those issues when they come up. A 100% theoretically correct rasterizer can't
exist anyway, because of floating point/fixed point precision issues among
other reasons.

~~~
amelius
Interesting, this is new to me. Is it true that for analytic AA you'd have to
consider the rendered objects all at once, instead of just rendering one after
the other?

~~~
pcwalton
Usually with analytic AA you just use Porter-Duff blending, because rendering
all the objects simultaneously doesn't buy you much except for performance
(which is why piet-metal does it that way). For supersampling AA you can
render all the objects at once, like Flash did, and this _does_ improve
rendering quality around coincident edges.
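
The seam on coincident edges is easy to see numerically. A sketch in Rust
(alpha only, names mine): two shapes share an edge, and each covers half of an
edge pixel.

    /// Porter-Duff "over" for a single alpha channel.
    fn over(dst: f32, src: f32) -> f32 {
        src + dst * (1.0 - src)
    }

    fn main() {
        // Blending the two half-covered shapes one after the other
        // leaves the pixel 75% covered: a visible seam.
        let sequential = over(over(0.0, 0.5), 0.5); // 0.75
        // Summing coverage before compositing gives the correct result.
        let combined = (0.5f32 + 0.5).min(1.0); // 1.0
        println!("sequential: {sequential}, combined: {combined}");
    }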

Oddly (in my opinion), some rasterizers such as Skia's use supersampling for
individual paths and Porter-Duff blending when overlaying different paths on
top of one another. This means that you don't have seams between subpaths of
the _same_ path, but _different_ paths can have seams on coincident edges. I
don't like this tradeoff personally, because it has a lot of the downsides of
analytic AA without the benefit of antialiasing quality, but for better or
worse this approach is widely used.

~~~
amelius
OK. I personally think a 100% correct rendering would have great benefit. The
coincident-edge problem happens a lot, especially when the scene is computer-
generated. Also, when overlaying the same object exactly a number of times,
the edges lose their AA property. (E.g. in Inkscape, draw a circle, copy it,
then paste it exactly on top using ctrl+shift+V; repeat 20 times and you see
the edges becoming jagged.)

------
tenaciousDaniel
As someone who knows literally nothing about the GPU, the notion that the GPU
might not be good for UI graphics seems...odd? Like, it can handle 3D graphics
right? Aren't those waaaay more computationally expensive than 2D graphics?

I know that I'm missing something important, but I don't know what it is.

~~~
raphlinus
It's complicated. 2D graphics isn't inherently harder or more computationally
demanding than 3D, but it requires very different optimizations. I think it
boils down to the fact that 2D graphics is ultimately data-structure heavy,
whereas 3D graphics can often be represented as a huge array of triangles. So
there's a lot of literature (some of which I linked; those papers in turn have
deep literature reviews), but no single obviously best solution. One thing I
think I contributed is a relatively simple solution, especially one that does
almost all the work on the GPU rather than relying on the CPU.

~~~
incompatible
Is there any particular reason why recent / future GPUs would be worse at 2D
graphics than older versions? I.e., what needs to change, when current methods
for 2D graphics have worked for years?

Also, not knowing much about the field: GPUs seem to have a lot of
unpleasantness in the form of incompatible and/or proprietary APIs, so
implementing anything that's supposed to be cross-platform (and to support
old, new, and future GPUs as they are released) may be hard work.

~~~
raphlinus
What makes you say that 2D graphics has worked well? Scrolling often janks, UI
designers constantly have to work around limitations in the imaging model and
performance quirks (for a long time, clipping to a rounded rect would kill
performance), and there were lots of shortcuts with things like gamma for
blending. I'm hoping to build something way better, where the UI _always_
updates smoothly, and the UI designer doesn't have to be conscious of the
limitations in the renderer.

~~~
tenaciousDaniel
> I'm hoping to build something way better

Are you talking about a specific project you're working on? I'm currently
building a UI tool for designers, and I'm extremely interested to hear about
any work in that area.

~~~
raphlinus
Yes. I'll have more to say soon, but I'm basically building what I think is
next-gen UI infrastructure in Rust. That's a project with perhaps overly
ambitious scope, so I'm _starting_ with one specific app, namely a font
editor. That will be a great showcase for beautifully antialiased vector paths
and smooth-as-silk scrolling and interaction, I think.

------
adamnemecek
I've also been working on a 2D UI on the GPU. It's quite amazing, actually:
compared with code running on the CPU, the code is very terse. The setup can
be somewhat verbose, but the shaders themselves are short. A 20-line shader
can do miracles.

~~~
dman
Any GitHub links that are freely available?

------
dragontamer
> It’s often said that GPU is bad at data structures, but I’d turn that
> around. Most, but not all, data structures are bad at GPU. An extreme
> example is a linked list, which is still considered reasonable on CPU, and
> is the backbone of many popular data structures. Not only does it force
> sequential access, but it also doesn’t hide the latency of global memory
> access, which can be as high as 1029 cycles on a modern GPU such as Volta.

You're almost correct here, from my understanding. But... I still feel like
it's important to note that pointer jumping exists, and linked-list-like
structures (linked lists, trees, and graphs) can be traversed in parallel in
many cases.

[https://en.wikipedia.org/wiki/Pointer_jumping](https://en.wikipedia.org/wiki/Pointer_jumping)

For pointer jumping on GPUs / SIMD systems, I know that "Data Parallel
Algorithms" discusses the technique as applied to a "scan" (parallel prefix)
operation:
[https://dl.acm.org/citation.cfm?id=7903](https://dl.acm.org/citation.cfm?id=7903)

The requirement (not fully discussed in the "Data Parallel Algorithms"
article) is that you still need an array of the nodes for the SIMD system to
iterate over. But this array of nodes does NOT need to be sorted.

As such, you can accomplish a "GPU reduce" over a linked list in O(lg(n))
steps (assuming an infinite-core machine). This is still slower in practice
than an O(lg(n)) parallel array reduction, since it does more total work, but
you're at least within the same asymptotic class.
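
A sequential Rust sketch of pointer jumping for suffix sums over a linked list
(the inner loop is what every node would run in parallel per round on a GPU;
names are mine):

    /// List stored as parallel arrays: `next[i]` is node i's successor,
    /// `val[i]` its value. Afterwards `val[i]` holds the sum from node i
    /// to the end of the list, reached in O(log n) rounds.
    fn pointer_jump_suffix_sums(next: &mut Vec<Option<usize>>, val: &mut Vec<f32>) {
        loop {
            let prev_next = next.clone(); // double-buffer one round
            let prev_val = val.clone();
            let mut active = false;
            for i in 0..next.len() { // on a GPU: all i at once
                if let Some(j) = prev_next[i] {
                    val[i] = prev_val[i] + prev_val[j]; // grab successor's sum
                    next[i] = prev_next[j];             // then skip over it
                    active = true;
                }
            }
            if !active { break; }
        }
    }

Each round halves every node's remaining distance to the tail of the list,
which is where the O(lg(n)) step count comes from.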

------
aasasd
> _Performant UI must use GPU effectively, and it’s increasingly common to
> write UI directly in terms of GPU rendering, without a 2D graphics API as
> the intermediate layer._

Y tho? I'm just a passerby in this topic, but I see this sentiment now and
again these days, and I don't understand what happened. I don't remember
hearing complaints of DirectDraw being too slow, for example.

~~~
Jasper_
DirectDraw wasn't a 2D graphics package; you probably mean GDI/GDI+. These
libraries were used at much smaller resolutions (1024x768, but fullscreen was
pushing it; your window was smaller). GDI could roughly be used at 60Hz, but
it didn't support antialiasing or transparency; you needed GDI+ for that. And
GDI+ was much slower, even at the smaller resolutions, so you tended to draw
into a big bitmap ahead of time and scroll that. Animations in GDI+ were
unheard of.

~~~
roel_v
I'm not sure what you mean by "2D graphics package", so it may be a semantics
thing here, but DirectDraw was a 2D graphics API on top of DirectX. I only
used it a bit, and didn't find that, for my uses, the added complexity was
worth it over just using GDI, but DirectDraw did do GPU-accelerated 2D
rendering. How fast it was in light of today's requirements and hardware, I
don't know.

~~~
Jasper_
Are you confusing it with Direct2D? DirectDraw was an old API from the
D3D5-ish era that basically allocated a front buffer you could draw to with
GDI. It was not GPU-accelerated.

~~~
roel_v
Well, you've got me wondering now what I worked with, but
[https://en.wikipedia.org/wiki/DirectDraw](https://en.wikipedia.org/wiki/DirectDraw)
seems to say DirectDraw was hardware-accelerated too. I guess the point is
that there are, and have been for a long time, GPU-accelerated 2D APIs in
DirectX land.

------
iamleppert
Very interesting work. I'm wondering if you've ever worked with vector tiles?
Mostly they're used for drawing large numbers of vectors on maps, but they
have a similar architecture and approach for 2D drawing on the GPU, via an
efficient command-based language scoped to the extent of each tile.

------
vbuwivbiu
I don't get it. 2D is a subset of 3D, which the GPU already does. Why the need
for tiles and whatnot? Why not render 2D text and other Bézier curves as 3D
curves with a z coordinate of 0, under an orthographic projection?

~~~
pavlov
The short answer is that 3D is all about triangles, and that's not a good fit
for the PostScript-based 2D rasterization model used by most APIs and tools
(SVG, Illustrator, Apple CoreGraphics, Sketch...)

See the answers to this question in the same thread:

[https://news.ycombinator.com/item?id=19864066](https://news.ycombinator.com/item?id=19864066)

------
Retr0spectrum
Is there a demo/source anywhere? I admit I only skimmed the article, for now.

~~~
raphlinus
Yes, [https://github.com/linebender/piet-metal](https://github.com/linebender/piet-metal),
it's Mac-only (for now).

------
malms
"graphics"

and then no video :(

~~~
raphlinus
I know. I was eager to get this out. I'll be working on more visuals for a
Libre Graphics Meeting presentation, where I'll include some of this in the
Rust 2D graphics talk.

