This is quite a bit slimmer than using one of the WebGPU libraries like wgpu-rs or Dawn, because sokol-gfx doesn't need an integrated shader cross-compiler (instead, translation from a common shader source to the backend-specific shader formats happens offline).
Eventually I'd also like to support the Android, iOS and WASM backends from Zig (currently this only works from C/C++, for instance here are the samples compiled to WASM: https://floooh.github.io/sokol-html5/)
And finally, I'd also like to become completely independent from system SDKs for easier cross-compilation, but with a different approach: by automatically extracting and directly including only the required declarations right into the source files instead of bundling them into separate git repositories (similar to how GL loaders for Windows work).
I love this codebase. I’ve been working with it on the side for a few weeks: it’s extremely well documented, supports Metal, has reasonable abstractions that don’t try to be the One True Way, and the Zig port looks really clean.
very cool work :) If you like, add a screenshot or two and I'll include this in the next zigmonthly.org article
you might also be interested to know about stub dynamic libraries, e.g. `.tbd` files on macOS. That's primarily what's in the Mach SDK repos[0], though I'm sure you could slim them down some more with a concrete list of symbols you depend on.
author here, I do think that's a reasonable choice of tech in general, but I won't be using most of those for Mach for various reasons:
* glm -> zig has native vector types with SIMD support, and I think in the long term these could be much nicer.
* bgfx is truly awesome, but I'm keen to use WebGPU for the reasons outlined in the post.
* imgui - I do think it's the best thing out there currently, but I have something up my sleeve for a much better UI library built around vector graphics and have done a lot of research in this area.
re: zig's SIMD vectors - I would caution you to build a proof of concept with some perf testing, because as I understand it, these are better suited to bulk operations on a large array of things than to modeling linear algebra.
For example, if you had 100 PointA objects and 100 PointB objects and you wanted to compute the midpoint of all of them in an output array, you would use SIMD vectors to represent, e.g. 10 X values at a time, and 10 Y values at a time, and then do the math on that, rather than to represent each Point as a separate vector.
But that's just something I heard, of course real world experience would be interesting to hear about.
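For what it's worth, here's a rough, untested sketch of that vertical / structure-of-arrays style using Zig's builtin vectors (assuming 0.11+ @Vector/@splat semantics; the `Points`/`midpoints` names are just made up for illustration):

    const Vec8 = @Vector(8, f32);

    // Structure-of-arrays: batches of X values and batches of Y values,
    // rather than one 2-wide vector per point.
    const Points = struct {
        xs: [100]f32,
        ys: [100]f32,
    };

    fn midpoints(a: *const Points, b: *const Points, out: *Points) void {
        var i: usize = 0;
        while (i + 8 <= a.xs.len) : (i += 8) {
            const ax: Vec8 = a.xs[i..][0..8].*;
            const bx: Vec8 = b.xs[i..][0..8].*;
            const ay: Vec8 = a.ys[i..][0..8].*;
            const by: Vec8 = b.ys[i..][0..8].*;
            out.xs[i..][0..8].* = (ax + bx) * @as(Vec8, @splat(0.5));
            out.ys[i..][0..8].* = (ay + by) * @as(Vec8, @splat(0.5));
        }
        // 100 isn't a multiple of 8, so finish the tail scalar-wise.
        while (i < a.xs.len) : (i += 1) {
            out.xs[i] = (a.xs[i] + b.xs[i]) * 0.5;
            out.ys[i] = (a.ys[i] + b.ys[i]) * 0.5;
        }
    }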
Pathfinder uses horizontal SIMD everywhere (e.g. vec4 stores x/y/z/w in a single SSE register). I measured about a 10% improvement across the board when enabling SSE+SSE2+SSSE3+SSE4.2+AVX vs. using scalar instructions only. The primary wins were from operations like de Casteljau curve subdivision which are very efficient when expressed in vector operations.
So, at least in this particular application, horizontal SIMD didn't make an enormous difference, but a 10% boost is nice.
Interesting to hear, thanks for the info! I imagine Zig's SIMD is still very immature, I haven't tinkered with it at all yet either - so what I said above is pure speculation about the future.
I clicked hoping for pictures of vegetables chopped in silly ways, was slightly disappointed tbh, but then stayed for the rest of the excellent presentation. 10/10
I generally agree that vertical SIMD is more efficient than horizontal SIMD when both are options, but the dot product is kind of an unfair example: that's literally the worst case for horizontal SIMD. A fairer example would be something like 4x4 matrix times vector transformation, where horizontal SIMD is pretty efficient with AVX2 broadcast instructions.
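Just to illustrate, here's a minimal (unbenchmarked) sketch of that broadcast formulation with Zig vectors; whether it actually lowers to AVX2 broadcast + FMA instructions depends on the target features you build with, and the `Mat4` layout is made up for the example:

    const Vec4 = @Vector(4, f32);

    // Column-major 4x4 matrix.
    const Mat4 = struct {
        cols: [4]Vec4,
    };

    // r = M * v, as four broadcast-multiply-accumulate steps over the columns.
    fn mulVec(m: Mat4, v: [4]f32) Vec4 {
        var r = m.cols[0] * @as(Vec4, @splat(v[0]));
        r += m.cols[1] * @as(Vec4, @splat(v[1]));
        r += m.cols[2] * @as(Vec4, @splat(v[2]));
        r += m.cols[3] * @as(Vec4, @splat(v[3]));
        return r;
    }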
Sure, but how often do you do 4x4 matrix times vector? In most cases I imagine you're going to be using a 3x4 matrix times a vec3 with implicit w=1, or a 3x3 matrix times a vec3 with implicit w=0, making the wins a lot less clear. If you want to implement 3x3 * vec3 with SIMD, then go for it, but I don't quite think that should influence your core data structure design.
Well, I generally think that CPU SIMD is best used for little one-off things that you don't want to bother uploading to the GPU or other accelerators, not large-scale batch data processing. Like, if your game logic is running on CPU, then you're probably running the logic for one entity at a time anyway, in which case you probably just don't have enough batch data to make good use of vertical SIMD. So horizontal SIMD is useful there.
This is why Pathfinder uses horizontal SIMD, incidentally: most of the heavy number crunching is in super branchy code (e.g. clipping, or recursive de Casteljau subdivision) where there just isn't enough batch data available to make good use of vertical SIMD. Compute shader is going to be faster for any heavy-duty batch processing that is non-branchy. The game code I've written is similar.
I guess I can imagine things like physics simulations where you could have a lot of batch data processing on CPU, but even then, running that on GPU is attractive…
Does SSE not have a one-cycle dot product between two vector registers? I'd be surprised if it didn't... though I'm not sure if Zig compiles to that assembly via LLVM.
... no. And your mental model of what operations cost is a little out of whack.
On the newest Intel CPUs, a vector double precision multiply is 4 cycles. A horizontal add is 6 cycles. There is a single instruction dot product, which is 9 cycles.
Something like a dot product for fp can only ever be 1 cycle on a CPU that has a very low clock speed. (Or is a barrel processor or something.) There is simply too much to do.
Sorry, I should have been precise and said dispatchable at one per clock cycle, not one clock cycle of latency. Amortized over pipelined operations with no dependencies, the net cost of such an operation is one clock cycle.
Anyway, I checked: it's DPPS, and it is a 25-clock-cycle latency operation, with a throughput that yields an amortized 6-clock-cycle cost.
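For reference, this is how the 4-wide dot product is usually written with Zig's vectors; I haven't checked the codegen, and whether LLVM emits a multiply plus shuffles/adds or a DPPS will depend on the target CPU features and its cost model:

    const Vec4 = @Vector(4, f32);

    // Elementwise multiply, then horizontal sum of the four lanes.
    fn dot(a: Vec4, b: Vec4) f32 {
        return @reduce(.Add, a * b);
    }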
Pretty sure they degrade to scalar; I accidentally compiled an 8x64-bit vector for ARM and it worked just fine (maybe not as fast), even though ARM only supports 2x64-bit SIMD.
It'll be a bit before I can publish the paper, libraries, tooling etc. around this - it's been maybe 8 years in the making but slow - the general idea is:
1. GPUs are actually _ridiculously and stupidly_ excellent at natively rendering Bézier curves if you "limit" yourself to quadratic curves.
2. Most of the reason designers / end users hate working with Bézier curves is the absolutely terrible control point manipulation that cubic curves require.
3. GPUs cannot natively render cubic curves (look at any of the papers on this and you'll find they do CPU-side translation of cubic -> quadratic before sending to the GPU in the best case) which harms perf substantially and makes animation of curves expensive. Plus handling the intersection rules is incredibly complex.
...so maybe using the 'mathematically pure' cubic curves is not actually so great after all. :) If only we had tools and formats (not SVG) that worked on isolated quadratic curves (without care for intersection), then GPU rendering implementations wouldn't need to be absolutely horridly complex (NVIDIA et al.) or terrible approximations (Valve SDF) - we could literally send a vector graphic model directly to the GPU and have it render (and animate!) at incredible speeds, with easier to understand tools.
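For anyone wondering what "natively rendering quadratic curves" means here: in the Loop-Blinn quadratic case the per-fragment work boils down to a one-line implicit test. A rough sketch of the math (in real use this lives in a fragment shader; the `Fill` naming is just for illustration):

    // The three control points of the quadratic get interpolation coordinates
    // (0,0), (0.5,0), (1,1); the curve is exactly where u^2 - v == 0.
    const Fill = enum { convex, concave };

    fn insideQuadratic(u: f32, v: f32, fill: Fill) bool {
        const d = u * u - v;
        return switch (fill) {
            // which sign counts as "filled" depends on your fill convention;
            // convex and concave triangles just flip the test
            .convex => d <= 0.0,
            .concave => d >= 0.0,
        };
    }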
> 3. GPUs cannot natively render cubic curves (look at any of the papers on this and you'll find they do CPU-side translation of cubic -> quadratic before sending to the GPU in the best case) which harms perf substantially and makes animation of curves expensive. Plus handling the intersection rules is incredibly complex.
Loop-Blinn natively renders cubic curves in the same way as quadratic curves (implicitization), with some preprocessing to classify cubic curves based on number of inflection points (but this is not the same as converting to quadratic). But I don't think the complexity is really worth it when you can just approximate cubic Béziers with quadratic Béziers.
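In case it's useful to anyone reading along, the usual approximation trick is pretty small. A hedged sketch (the subdivide-until-within-tolerance loop is omitted, and the names are made up):

    const Point = struct { x: f32, y: f32 };

    // One cubic segment (p0, c1, c2, p3) -> the control point of the single
    // quadratic that best matches it, by averaging the two degree-reduction
    // candidates: q = (3*(c1 + c2) - p0 - p3) / 4. In practice you'd split the
    // cubic into pieces first until each piece is within your error tolerance.
    fn cubicToQuadControl(p0: Point, c1: Point, c2: Point, p3: Point) Point {
        return .{
            .x = (3.0 * (c1.x + c2.x) - p0.x - p3.x) * 0.25,
            .y = (3.0 * (c1.y + c2.y) - p0.y - p3.y) * 0.25,
        };
    }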
> we could literally send a vector graphic model directly to the GPU and have it render (and animate!) at incredible speeds, with easier to understand tools.
You can still submit high-level display lists to the GPU with an all-quadratic pipeline. Just add a compute shader step that runs before your pipeline that converts your high-level vector display lists to whatever primitives are easiest for you to render.
You may want to look at Raph Levien's piet-gpu, which implements a lot of this idea. Google's Spinel is also a good resource, though the code is impenetrable.
My understanding is that Loop-Blinn performs constrained Delaunay triangulation prior to uploading an actual display list to the GPU, in effect turning its cubic curves into quadratic ones. From the paper:
> The overlap removal and triangulation is a one-time preprocess that takes place on the CPU
All I'm suggesting is merely to ditch that process, ditch the idea that you can have intersecting lines, and instead design tools around convex and concave isolated triangles[0] (quadratic curves) so you end up composing shapes like the ones from the Loop-Blinn paper[1] directly, with your tools aiding you in composing them rather than more complex shapes (cubic curves, intersecting segments, etc.) being in the actual model one needs to render.
I don't think any of this is truly novel from a research POV or anything, it's just a different approach that I think can in practice lead to far better real-world application. It's considering the whole picture from designer -> application instead of the more limited (and much harder) "start with SVGs, how do we render them on GPUs?" question.
The constrained Delaunay triangulation in Loop-Blinn is for tessellating the object so that every fragment belongs to at most one curve (i.e. removing self intersections). It doesn't approximate cubic Béziers with quadratic ones. See section 4 and especially section 4.5 in the paper.
It's been a year since I looked at the Loop-Blinn paper, I had forgotten they did this. That's not a good excuse, I should've shut up or spent the time to actually look it up before commenting :)
Regardless of what they do there, hopefully my general idea comes across clearly from my comment above: create tools that allow users to directly compose/animate convex/concave quadratic curves (which render trivially on a GPU as triangles with a fragment shader), and leave dealing with overlaps to the user + tools.
I wouldn't strictly say that. Even with quadratic curves, you still have a serial process of parsing a shape to determine interior vs. exterior. This is true of any non-convex polygon, not just those with curves. You have to solve that somehow, whether it be through linear search (e.g. Slug), through geometry simplification (e.g. Loop-Blinn's triangulation), through signed-area tricks (e.g. red-book-style stencil-and-cover), or through brute force (e.g. SDFs).
Eliminating cubics eliminates some trickier problems, but I don't think it fully solves the issues.
Yep, you probably read my comment before I edited and clarified, but you also need to eliminate overlapping geometry.
Basically just build your graphics around non-overlapping convex and concave quadratic curves represented as triangles. The trick is in building tooling that makes that a pleasant experience (and do any required reductions of more complex shapes -> isolated quadratic curves at tool time, not at render time.)
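A hedged sketch of the kind of flat, GPU-friendly model that implies (every name and field here is made up for illustration, not taken from any actual implementation):

    // Each triangle is either solid fill, or carries one quadratic curve edge
    // marked as convex or concave; p1 is the curve's control point.
    const Kind = enum(u8) { solid, convex_curve, concave_curve };

    const CurveTriangle = struct {
        p0: [2]f32,
        p1: [2]f32,
        p2: [2]f32,
        kind: Kind,
    };

    // A "vector graphic" is then just a non-overlapping soup of these,
    // uploaded once as a vertex buffer and rendered with one fragment shader.
    const Shape = struct {
        triangles: []const CurveTriangle,
    };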
Existing approaches try to fit what we have (SVGs, complex polygons) onto what GPUs can do. This approach tries to take what GPUs _can do_ and make that competitive with what we have (SVGs, etc.)
In working with existing graphics tools to make drawings, something I've come to realize is that overlap is crucial to the workflow: it's easier to draw the line you want by overshooting and erasing than it is to get it precisely drafted. Likewise it's much easier to paint within a stencil mask than it is to stay within the lines freehand.
To me, that suggests the main distinction between source and output formats comes from how/whether they go about decimating the "erasure" data, since the final result produces much more complex non-overlapped paths. The attractiveness of the Bézier usually isn't in the initial definition of a shape - it just happens to be great at reproduction of multiple curves cut and spliced together. So it has the feel of a premature optimization that probably should be reconsidered in making more ergonomic tooling.
What do you think of NURBS? e.g. http://verbnurbs.com/. It's intended for 3D instead of 2D, but still kind of relevant.
How do you plan on doing text rendering? I feel like that's often a challenge with cross-platform UI frameworks. Electron/Chromium does this well. Often Qt apps don't handle pixel ratio correctly, or the text looks non-native.
There are applications in CAD where I imagine working with degree 2 surfaces/curves only is going to be very tricky.
What about differentials? How do you get a change of normal or a curvature tensor without a continuous 2nd derivative?
Edit: In product design, e.g. cars, people want curvature continuity, i.e. C2 at minimum, for their surfaces, because on machined surfaces you see any curvature discontinuity as an edge in reflected light... that's why degree 3 is kind of an industry standard.
There's a lot of talk on what platforms it supports (or plans to) and how it intends to achieve that. That's nice but what will it actually provide?
Does it implement an interface like Raylib? Is it a scene graph? How about an entity component system? Will game logic and graphical representation be tightly bound or kept separate?
None of these things are necessarily good, bad, required or forbidden, but they're fairly important to know before you dive in.
The project is only 4 months old :) I will get to those bits and explain my vision there once the groundwork is done.
In general, I have a preference for providing (1) solid groundwork, (2) libraries, and (3) extensive tooling. We're only at phase 1 of 3 :)
I don't want to _force_ one way of interacting with this, either. You're free to take the work I've done on making GLFW easy to use and build on top of that (several people are), same with the upcoming WebGPU work.
But yes, there will eventually be a graphical editor, component system, and "obvious" ways to structure your code if that's not something you want to choose for yourself.
Is that confirmed? I can't find anything official, and Windows 11 was supposed to be the moment they could do DX13 (DX12 Ultimate was naming madness), but they didn't.
https://github.com/yuki-koyama/bigger