Clip control on the Apple GPU (rosenzweig.io)
276 points by stefan_ on Aug 22, 2022 | 119 comments



Clip space is the bane of my existence. I've been building a software rasterizer from scratch and implementing vertex/triangle clipping has turned into one of the hardest aspects. It took me about 50 hours of reading various references before I learned you cannot get away with doing this in screen space or any time after perspective divide.

It still staggers me that there is not one coherent reference for how to do all of this. Virtually every reference about clipping winds up with something like "and then the GPU waves its magic wand and everything is properly clipped & interpolated :D". Every paper I read has some "your real answer is in another paper" meme going on. I've got printouts of Blinn & Newell, Sutherland & Hodgman, et al. littered all over my house right now. About four decades' worth of materials.

Anyone who works on the internals of OGL or the GPU stack itself has the utmost respect from me. I cannot imagine working in that space full-time. About 3 hours of this per weekend is about all my brain can handle.


Not sure if you got through clipping, but it was one of those things I had to go through first at some point in the mid 90s myself. I feel your pain, but after having implemented it about 5-10 times in various situations, variants and languages, I can promise it gets a lot easier.

In my experience it is most elegant to clip against the 6 planes of the view frustum in succession (one plane at a time). Preferably clip against the near plane first, as that reduces the set of triangles the most for subsequent clips.

Your triangles can turn into convex polygons after a clip, so it is convenient to start with a generic convex polygon vs. plane clipping algorithm. The thing to be careful about here is that points can (and will) lie exactly on the plane.

Use the plane equation (f(x,y,z)=ax+by+cz+d) to determine if a point is on one side, on the plane, or the other side.

It is convenient to use a "mask" to designate the side a point v=(x,y,z) is on:

  1 := inside_plane  (f(x,y,z) > eps)
  2 := outside_plane (f(x,y,z) < -eps)
  3 := on_plane      (-eps <= f(x,y,z) <= eps)

Let m(v_i) be the mask of v_i.

When you go through each edge of the convex polygon (v_i->v_{i+1}), you can check if you should clip the edge using the mask. I.e.:

if (m(v_i) & m(v_{i+1})) == 0, the points are on opposite sides => clip [determine the intersection point].

Since you are just clipping to the frustum, just return a list of the points that are inside or on the plane (i.e. m(v_i)&1==1) plus the added intersection points.
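
A minimal sketch of the convex-polygon-vs-plane clip described above, assuming a simple Vec3 struct and an eps tolerance (the names and types here are mine, not from any particular codebase):

    #include <cstddef>
    #include <vector>

    struct Vec3  { float x, y, z; };
    struct Plane { float a, b, c, d; };              // f(v) = a*x + b*y + c*z + d

    static float eval(const Plane& p, const Vec3& v) {
        return p.a * v.x + p.b * v.y + p.c * v.z + p.d;
    }

    // 1 = inside, 2 = outside, 3 = on the plane (within eps)
    static int mask(const Plane& p, const Vec3& v, float eps = 1e-6f) {
        float f = eval(p, v);
        if (f >  eps) return 1;
        if (f < -eps) return 2;
        return 3;
    }

    // Clip a convex polygon against one plane; call once per frustum plane,
    // near plane first.
    std::vector<Vec3> clipPolygon(const std::vector<Vec3>& poly, const Plane& p) {
        std::vector<Vec3> out;
        for (std::size_t i = 0; i < poly.size(); ++i) {
            const Vec3& a = poly[i];
            const Vec3& b = poly[(i + 1) % poly.size()];
            int ma = mask(p, a), mb = mask(p, b);
            if (ma & 1) out.push_back(a);            // keep points inside or on the plane
            if ((ma & mb) == 0) {                    // edge crosses the plane: add intersection
                float fa = eval(p, a), fb = eval(p, b);
                float t  = fa / (fa - fb);
                out.push_back({ a.x + t * (b.x - a.x),
                                a.y + t * (b.y - a.y),
                                a.z + t * (b.z - a.z) });
            }
        }
        return out;
    }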

There is plenty of potential for optimization, of course, but I wouldn't worry about that. There are lots of other places in a software rasterizer with more optimization potential, in my experience.


I also implemented clipping in my software rasterizer a while ago and can definitely sympathize! (Although I've written several simple scanline rasterizers in my life, this was the first time I actually bothered to implement proper clipping. I actually reinvented Sutherland–Hodgman from scratch which was pretty fun.) The problematic part is actually only the near plane due to how projective geometry works. At z=0 there's a discontinuity in real coordinates after z division, which means there can be no edges that cross from negative to positive z. Z division turns such an edge [a0, a1] into an "inverse" edge (-∞, a0'] ∪ [a1', ∞) which naturally makes rendering a bit tricky. In projective/homogeneous coordinates, however, it is fine, because the space "wraps around" from positive to negative infinity. All the other planes you can clip against in screen space / NDC space if you wish, but I'm not sure there are good reasons to split the job like that.


You have to clip against planes in 4D space (xyzw) before perspective divide (xyz /= w), not 3D (xyz).

This simplified sample shows Sutherland-Hodgman with 4D clipping: https://web.archive.org/web/20040713023730/http://wwwx.cs.un... The main difference is that the intersect method finds the intersection of a 4D line segment against a 4D plane.
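
For concreteness, here's a hedged sketch of that 4D intersection step in clip space, done before the perspective divide. The plane vectors in the comments are the standard GL clip planes; the Vec4 type and function names are my own:

    struct Vec4 { float x, y, z, w; };

    static float dot4(const Vec4& p, const Vec4& v) {
        return p.x * v.x + p.y * v.y + p.z * v.z + p.w * v.w;
    }

    // Intersect the clip-space segment [a, b] with the plane dot(plane, v) = 0.
    // All four components (including w) are interpolated; the perspective divide
    // happens only after clipping.
    Vec4 intersect(const Vec4& a, const Vec4& b, const Vec4& plane) {
        float da = dot4(plane, a);
        float db = dot4(plane, b);
        float t  = da / (da - db);                   // da and db have opposite signs
        return { a.x + t * (b.x - a.x),
                 a.y + t * (b.y - a.y),
                 a.z + t * (b.z - a.z),
                 a.w + t * (b.w - a.w) };
    }

    // GL-style clip planes expressed as 4D plane vectors:
    //   near : ( 0,  0,  1, 1)   // z >= -w
    //   far  : ( 0,  0, -1, 1)   // z <=  w
    //   left : ( 1,  0,  0, 1)   // x >= -w
    //   right: (-1,  0,  0, 1)   // x <=  w
    //   (and similarly for bottom/top with y)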


Vertex/triangle clipping is quite rare, and mostly used for clipping against the near plane (hopefully rare in practice). Most other implementations use a guard band as a fast path (aka doing it in screen space) -- real clipping is only used where your guard band doesn't cover you, precision issues mostly.

I'm not sure what issues you're hitting, but I've never found clipping to be that challenging or difficult. Also, clip control and clip space aren't really specifically about clipping -- clip space is just the output space of your vertex shader, and the standard "clip control" extension just controls whether the near plane is at 0 or -1. And 0 is the correct option.


Guard band clipping is only really applicable to "edge function" type rasterizers. For the classic scanline-based algorithm, sure, you can easily clip to the right and bottom edges of the viewport while rasterizing, but the top and left edges are trickier. Clipping in clip space, before rasterization, is more straightforward, given that you have to frustum cull primitives anyway.


> clipping against the near plane (hopefully rare in practice)

I am not sure I understand why this would be rare. If I am intending to construct a rasterizer for a first-person shooter, clipping is essentially mandatory for all but the most trivial of camera arrangements.


Yes, of course, I was definitely imagining you were struggling to get simpler scenes to work. But also, proportionally few of your triangles in any given scene should be near-plane clipped. It's OK to have a slow path for it, and then speed it up later. I've never felt the math for the adjusted barycentrics is too hard, but it can take a bit to wrap your head around. Good luck :)


You and the GP have different rasterization algorithms in mind I think. The GP, I presume, is talking about a classic scanline-based rasterizer rather than an edge function "am I inside or not" type rasterizer that GPUs use.


Clipping or culling? I expect it’s mostly the latter unless your camera ends up intersecting the geometry.


Both. You almost always need both.

Clipping deals with geometry that is partially inside the camera. Culling (either for backfaces or entire instances) is a preliminary performance optimization that can be performed in a variety of ways.


As an example, if you're writing a shooter then the floor might be a large square that will almost definitely be intersecting the near plane. You absolutely need clipping here.


This is precisely the first place I realized I needed proper clipping. Wasted many hours trying to hack my way out of doing it the right way.


Even with a guard band don’t you need to at least test the polygons for Z clipping prior to the perspective divide?

Clipping in X and Y is simpler at least, and again the guard band hopefully mostly covers you.


> Even with a guard band don’t you need to at least test the polygons for Z clipping prior to the perspective divide?

Yes. Guard band is an optimization that reduces the amount of potential clipping required. You still need to be able to clip for fundamental correctness.

If you totally reject a vertex for a triangle without determining precisely where it intersects the desired planes, you are effectively rejecting the entire triangle and creating yucky visual artifacts.


I wrote about this a few years ago (https://fabiensanglard.net/polygon_codec/index.php).

It was a pain to learn indeed and the best resources were quite old:

- "CLIPPING USING HOMOGENEOUS COORDINATES" by James F. Blinn and Martin E. Newell

- A Trip Down the Graphics Pipeline by Jim Blinn (yes the same Blinn that co-authored the paper above).


Be the documentation you want to see in the world.


But that's the thing... there is no documentation.


"RTFM" - The manual


"Look up error code on stack exchange to find the error in question seeking a solution, but its you from 5 years ago."


"What was I working on? What did I see?!"

https://xkcd.com/979/


OpenGL on macOS is so frustrating that I and many other developers have basically abandoned it, and not in favor of using Metal--the easier alternative is to just no longer support Macs.

Yes, OpenGL on macOS is now implemented over Metal, but unfortunately a side effect of this is that implementation-level details that were critical to debugging and profiling OpenGL just no longer exist for tools to work with. Anything is possible? Maybe? I'm sure Apple Graphics engineers could make old tooling work with the new abstraction layer, but it's not happening.

Tooling investment is all on Metal now. But so much existing NON-LEGACY software relied on OpenGL.

So what do you do? You debug and perf test on Windows and Linux and hope that fixing issues there addresses concerns on macOS, and hopefully your problems aren't platform-specific.

This is how some graphics engineers, including myself, continue to ship for macOS while never touching it.

Edit: Also, Vulkan is a waste of time for anyone who isn't a large studio. No one wants to write this stuff. The most common argument is "You only write it once." No, you don't.

You have to support this stuff. If it were that easy, bgfx would have been written in a month and it would have been considered "done" afterwards.


> This is how some graphics engineers, including myself, continue to ship ...

The snarky response would be "you're not a graphics engineer if you're still using OpenGL" (please laugh at the bad joke)

But seriously, knowing how GPUs actually work, OpenGL is pretty horrible. Full OpenGL has 20 years of cruft hacked and glommed on to it. As someone whose job is to write OpenGL emulation, it really needs to die and people need to stop using it.

If you don't care but want cross-platform OpenGL ES, then use ANGLE; or if you want something more modern that ditches most of that 20 years of cruft, look into one of the native WebGPU implementations (dawn, wgpu, ...)


There is WebGPU (for example provided by wgpu/wgpu-native, or dawn). Easier to write, portable, and debuggable as Metal on macOS.


  > Also, Vulkan is a waste of time for anyone who isn't a large studio. No one wants to write this stuff. The most common argument is "You only write it once." No, you don't.

This is really interesting. It's not my area of much knowledge, but I'm really curious why that is? What are the issues exactly?


What others have said here is accurate. Vulkan is excessively low-level for most.

OpenGL is not API-machine precise, but for people who are primarily concerned with shipping products this is largely irrelevant.

APIs like Vulkan, DirectX 12, and WebGPU need to exist, but there is too much fragmentation in the graphics API space now and because of this API decision making that has taken place, the situation will not be smoothed over for probably a decade.

Many people hoped for an OpenGL successor that provided API-machine precision without explicit verbosity that you could simply opt-into if you needed control. Instead what we ended up with were required definitions of API mechanics.

These aren’t obvious to implement. It also debatably moves engineering efforts to the wrong people. There are graphics programmers who should not be involved in adapter-device type programming, who instead should be focusing simply on graphics shader work and the like.

Vulkan is a textbook case of throwing the baby out with the bathwater in pursuit of a more precise API. It throws out development practicality by allowing vendor engineers to wipe their hands of things that some of us want them to take care of.

Imagine if every time you wanted to write software you first needed to probe for what CPU cores existed, and also define how to allocate memory, and also how your program should run.

This is needless. It would also create a scenario where people would say similar things like, “Well sure it’s verbose, but of course you would only write this once.”

And everyone, individually, would do this, because everyone, individually would be forced to, and they’d all have to maintain it.

And along the way, most would get it wrong. Or implement only a subset of desired behavior.


thanks for the reply!

so if i understand correctly, it would have been better to have an api that had more of an onramp with increasing complexity when you needed it, but instead all we got was fragmented low-level apis?


Yes


For me BGFX has become the API, since I'm basically tired of wrestling with how fragmented graphics programming seems to be. I just wish the shader language wasn't semi-custom GLSL-with-C-macros so I could use existing tooling for previewing/editing what I'm working on.


I don't know why there's so much love for OpenGL in the communities still. Maybe it's the "open" part in the name, which was always confusing people, thinking it's an open source standard or something like that.

The API is very antiquated, doesn't match modern GPU architectures at all and requires many workarounds in the driver to get the expected functionality, often coming at a performance cost.

Vulkan is nice, but it goes into the other extreme. It's very low level and designed for advanced users. Even getting anything on the screen in Vulkan is intimidating because you have to write everything from scratch. To go beyond hello world, you even have to write your own memory allocator (or use an existing open-source one) because you can only do a limited number of memory allocations and you're expected to allocate a huge block of memory and suballocate it as needed by your application.

In comparison, DX12 is a bit easier to grasp. It has some nice abstractions such as committed resources, which take some of the pain away.

Personally I like Metal as an API. It is lower level than OpenGL, getting rid of most nasty OpenGL things (state machine, lack of pipeline state objects), yet it is very approachable and easy to transition to from DX11/OpenGL. I was happy when I saw WebGPU was based on Metal at first. WebGPU is my go-to 3D API at the moment, especially with projects like wgpu-native which make it usable on native platforms too (don't let the Web in WebGPU confuse you).


You acknowledge that Vulkan is too low level for people who aren't investing billions into an AAA graphics engine. And you surely know that OpenGL and Vulkan are the only two cross-platform graphics APIs. Are you sure you can't infer why people like OpenGL from those two points? Especially in Linux-heavy communities where DX and Metal aren't even options?

I assure you, none of the "love" for OpenGL comes from the elegance of its design.


There should be more effort to support Direct3D under Linux. We have Wine and DXVK, but it should be easier to integrate the D3D support into Linux applications.


There is DXVK native which doesn't require wine.

https://github.com/Joshua-Ashton/dxvk-native


As well as Wine and DXVK there's also native Gallium Nine, which is `libd3dadapter9-mesa` in Debian.


> Vulkan is nice, but it goes into the other extreme. It's very low level and designed for advanced users. Even getting anything on the screen in Vulkan is intimidating because you have to write everything from scratch.

I honestly believe that this is the major reason. Developing a hobby project with OpenGL is little more than using SDL or GLFW to get a window with a GLContext and then you can just start calling commands. Vulkan is much more complicated and unless you're really pushing performance limits, you're not getting much of a benefit for the extra headache.
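
As a rough illustration of how small that on-ramp is, here is a sketch using GLFW and a legacy GL context (nothing here is from a specific project; it just needs to be linked against GLFW and the system GL library):

    #include <GLFW/glfw3.h>

    int main() {
        if (!glfwInit()) return 1;
        GLFWwindow* win = glfwCreateWindow(640, 480, "hello", nullptr, nullptr);
        if (!win) { glfwTerminate(); return 1; }
        glfwMakeContextCurrent(win);                 // legacy GL context, no loader needed

        while (!glfwWindowShouldClose(win)) {
            glClearColor(0.1f, 0.1f, 0.2f, 1.0f);
            glClear(GL_COLOR_BUFFER_BIT);

            glBegin(GL_TRIANGLES);                   // fixed-function: just start drawing
            glColor3f(1, 0, 0); glVertex2f(-0.5f, -0.5f);
            glColor3f(0, 1, 0); glVertex2f( 0.5f, -0.5f);
            glColor3f(0, 0, 1); glVertex2f( 0.0f,  0.5f);
            glEnd();

            glfwSwapBuffers(win);
            glfwPollEvents();
        }
        glfwTerminate();
        return 0;
    }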


OpenGL is what you use if you just want to render some triangles on the GPU with a minimum of hassle on the most platforms (which is quite a few if you include GLES, WebGL, and ANGLE). Most people aren't writing graphics engines for AAA games so OpenGL is all they need.


It's a difficult API to do that with; it's as imperative and state based as you could possibly be and the error handling isn't exactly good. Surely there's easier cross platform libraries for it now.


I think that still falls under more than "putting a few triangles on the screen".

If you're doing anything advanced with OpenGL, yeah, debugging it is a pain and the statefulness isn't fun. But if all you're doing is trying to put some textured quads (two triangles) on the screen and you're not wrangling multiple framebuffers, there's not much to go wrong.


Optimistic that OpenGL 2.1 will be available by the end of the year on Asahi - well that is news. It's only 2.1, but that's enough (as stated) for a web browser, desktop acceleration, and old games.

Also RIP all the countless pessimistic "engineers" here and elsewhere saying we'd be waiting for years more for any graphics acceleration.

Edit: It is true though that AAA gaming will wait: "Please temper your expectations: even with hardware documentation, an optimized Vulkan driver stack (with enough features to layer OpenGL 4.6 with Zink) requires many years of full time work. At least for now, nobody is working on this driver full time. Reverse-engineering slows the process considerably. We won’t be playing AAA games any time soon."

Still, even if that be the case, accelerated desktop is an accelerated desktop, much sooner than many expected.


OpenGL 2.1 is not enough for browsers - Firefox requires OpenGL 3.2 for limited hardware acceleration support, and there will be limitations on Chrome as well. Desktop acceleration is a similar story, KWin will have a degraded experience for OpenGL less than 3. There is also the possibility of breakage due to missing extensions.

Most old games won't work because most are going to require OpenGL>=3. For D3D games from Windows, translation efforts have long been focused on Vulkan, and even then, you are very likely to require extensions which aren't present or further versions of OpenGL.

So that's to say: it's not about AAA games, it's about Firefox, (full) KDE, and Minecraft. Which is going to take a few years, probably. There is no RIP, it's going about as well as those people were predicting - I'm expecting it won't be able to have full browser acceleration and play Minecraft for at least 2 more years.


It's pretty insane that OpenGL 2.1 is even functional on a GPU this strange, but remember: this is still an unfinished, hacky implementation (the author's own concession). Plus, you're going to be stuck on x11 until any serious GPU drivers get written, which in many people's opinion is just as bad as no hardware acceleration at all. No MacOS-like trackpad gestures either, you'll be waiting for Wayland support to get that too. It'll definitely be a boon for web browsing though, so I won't deny that. What I'm really curious about is older WINE titles with Box86; if you could get DOS titles like Diablo 2 running smoothly, it could probably replace my Switch as a portable emulation machine...


> pretty insane that OpenGL 2.1 is even functional on a GPU this strange,

Well... you were one of the most vocal critics saying it wouldn't happen anytime soon.

> unfinished, hacky implementation (the author's own concession)

Still more stable than Intel's official Arc drivers, so who defines "hacky"? ;)

> Plus, you're going to be stuck on x11 until any serious GPU drivers get written

Only because it is running on macOS, which supports X11 but not Wayland. On Linux, Wayland or X11 will both work, no problem.

> No MacOS-like trackpad gestures either, you'll be waiting for Wayland support to get that too

Again, Wayland will work on Day 1, it's just a limitation of running the driver on macOS until the kernel support is ready. When it is on Linux, Wayland will be a full-go.


> Well... you were one of the most vocal critics saying it wouldn't happen anytime soon.

Yep. Been beating that drum since 2020; looks like history proved me right on this one.

> Still more stable than Intel's official Arc drivers, so who defines "hacky"? ;)

Apparently not me, I had no idea that the M1 supported Vulkan and DirectX 12.


I'm a bit confused as to why X11/wayland would be a huge issue here? The Mesa docs do say X11-only, but they're referring to running the driver on macOS (hence the XQuartz reference), where Wayland basically doesn't exist.


Ah, looks like I definitely missed that.

In any case, I don't think Asahi/M1 has proper KWin or Mutter support yet. It's still going to take a while before you get a truly smooth desktop Linux experience on those devices, but some hardware acceleration is definitely better than none!


I mean the signs were clear for basically one and a half years now. It was never a question of if. But a question of when. There were just so many voices that didn't know what they were talking about. Comparing it to nouveau for example.


> if you could get DOS titles like Diablo 2

Did you mean some other title? Even diablo 1 was a Windows game.


Why would you need Wayland for trackpad gestures?


You technically don't, but the implementations on X are kinda terrible and can't do 1:1.


Apple could help by documenting this stuff. I remember the good old days when every Mac OS X came with an extra CD with Xcode, and Apple was regularly publishing Technical Notes detailing implementation details. Today the same level of detail is treated as top secret, and it seems that Apple doesn't want developers to even think beyond the surface of the tiny App Store sandbox.


Even back in the day, those technical notes would not cover private APIs like this, because they’re subject to change or are for internal use only.

The same is true in any closed-source OS.


I really wish Apple would do another "Snow Leopard" - go an entire year WITHOUT any new features and just fix bugs and documentation.

This twitter thread is a perfect example of why it's needed

https://twitter.com/nikitonsky/status/1557357661171204098


Ahh yes, that "just fix bugs" release that would delete your main user account if you used a guest user https://www.engadget.com/2009-10-12-snow-leopard-guest-accou...

Besides, rebuilding/redesigning the settings screen is a perfect "snow leopard" thing. That's not an actual Feature.

The problem isn't doing features, the problem is doing a bad job.


We've had several.

• Mountain Lion / Mavericks

• El Capitan

• High Sierra

Yes, these releases had new features, but so did Snow Leopard. Behind the scenes there was grand central dispatch, and in front there was QuickTime X, App Exposé, and Exchange support.

I also happen to think Snow Leopard is overrated, its legacy largely due to the fact that Lion was really bad.

This isn't to say Apple shouldn't do another polish release. They're overdue for one.


Snow Leopard had few user-facing features, but it did have new APIs, such as Grand Central Dispatch and OpenCL, and also an optional 64-bit kernel.

https://en.wikipedia.org/wiki/Mac_OS_X_Snow_Leopard


OpenCL is not an OS level API. I guarantee you they were basically just redistributing Intel and NVidia implementations. GCD isn't OS level either, it's just a library, but it is at least a new API.


> I guarantee you they were basically just redistributing Intel and NVidia implementations.

OpenCL was created and open sourced by Apple.

>OpenCL was initially developed by Apple Inc., which holds trademark rights, and refined into an initial proposal in collaboration with technical teams at AMD, IBM, Qualcomm, Intel, and Nvidia.

OpenCL 1.0 released with Mac OS X Snow Leopard on August 28, 2009.

https://en.wikipedia.org/wiki/OpenCL#History


I think you might not understand what OpenCL is. It's not software, it's a spec. Apple created the original spec and probably an initial reference implementation. Then they brought it to the Khronos Group, where the spec was changed and made ready for production. Then the vendors made non-reference implementations, which is the actual code that makes OpenCL more than a curiosity. It makes no sense to say that you open sourced OpenCL because OpenCL isn't software, and it was worked on by more than Apple before its release.

I'm talking about the GPU OpenCL implementation. Implementing OpenCL on the CPU is not really a big deal. I can guarantee you that these implementations were Intel and NVidia and were just redistributed.


I mean, the other poster "guaranteed" it, so I don't know who to believe here.


You don't have to believe anyone. Try to run code on the GPU using OpenCL on Snow Leopard; you will be using NVidia or Intel implementations. I know as much because I've been there and done that, 13 years ago.


libdispatch/GCD has kernel side integration in some parts for performance, though it can live without it.


I mean that thread is looking at pre-release software


There is software that approaches barely functional after dozens of rounds of QA testing, and then there is software that is implemented on a solid foundation with care and happens to have a few bugs. Unfortunately that many bugs in a beta implies the former. I think the thread comes from a disappointment that Apple is moving from the second category to the first.


But it is not even a "Consumer Beta", it is a developer beta - for catching bugs and allowing devs to create applications for new APIs while Apple polishes the build for release. Was Snow Leopard ever even released as a dev beta?


There wasn't a customer beta program at the time IIRC. Either way, as a dev you'd still want to test on an OS after a year of work even if there weren't "new features".

Bug fixes and performance work on the OS are even more likely to break your app than feature work is! Bug fixes in the OS are pretty likely to cause new bugs too…


Apple Platform Security: May 2022. 242 pages.

https://help.apple.com/pdf/security/en_GB/apple-platform-sec...


There is no good reason to flip the flag dynamically at runtime and apps just don't do that, so flushing the pipeline should be perfectly fine, even in an implementation of the clip control extension.


This brings back memories of a computer graphics course. It was (re)implementing part of the OpenGL 1.3 pipeline in C++98 as a 2D triangle-to-trapezoid engine using the painter's algorithm (and then W- or Z-buffering). Coincidentally, it happened to use GLUT+GLU and work on Windows 98, Solaris, and Linux... circa 2001.

https://www.opengl.org/resources/libraries/glut/glut_downloa...


Amazing that this works because of the herculean effort of just a handful of people.


This person has an awesome set of blog posts. One of the few rss feeds I keep track of.


Despite the progress here, for me it raises a question: Most of the old games she mentions are x86 32-bit games. What's the story for how these programs are actually going to run on Asahi? Box86 [1] doesn't sound like it's projected to run on M1. Rosetta 2 on macOS allows 32-bit code to be run by a 64-bit process, which is the workaround CrossOver et al. use (from what I understand), but that obviously won't be available?

[1] https://box86.org


Someone would need to make an x86 -> ARM recompiler like Rosetta 2. That's not an easy task, but also not the task she's tackling with the GPU driver.

It's not unprecedented in the open-source space though; the PCSX2 PlayStation 2 emulator for example contains a MIPS -> x86 recompiler, and the RPCS3 PlayStation 3 emulator contains a Cell -> x86 recompiler.


QEMU has a “user mode” feature where it can transparently emulate a Linux process and translates syscalls. You can probably run at least old 32-bit Linux games that way, assuming you have appropriate userland libraries available. Windows content might be trickier.


Rosetta 2 runs on Linux. There's also FEX.


I guess that's true, I forgot about Apple making Rosetta 2 installable in Linux VMs.

Also though, since Rosetta 2 was released, it's had an incredibly slow implementation of x87 FPU operations, and anything that relies on x87 floating point math (including lots of games) is currently running about 100x slower than it ought to. Apple is aware of it but it's still not fixed in Ventura.

I hadn't heard of FEX before, looks interesting.


Huh, I thought everyone used SSE floats these days. I suppose there may be old games compiled with x87 floats, but I'd expect those to be made for CPUs so old that even slow x87 emulation wouldn't be a big issue.

What software do people have x87-related issues with?


The software I personally have the most issues with is Star Wars Episode 1: Racer, a 3d title from 1999 that from what I understand uses x87 math extensively. In Parallels (i.e. no Rosetta) it runs at 120fps easily, while in CrossOver the frame rate barely ekes above 20. Old titles like Half-Life, all other Source games, Fallout 3, SWTOR etc. all run vastly worse than they should, and many cannot run at playable framerates through Rosetta. Honestly, the problem most likely extends to more of Rosetta's floating point math than just x87.

The author of REAPER has also written about it some: https://user.cockos.com/~deadbeef/index.php?article=842

There's been lots of discussion about the issue in the Codeweavers forums, and Codeweavers points the blame squarely at Apple, who have been, predictably, very quiet about it.


FEX decided to not support page sizes >4k, which is fair (it's a lot of work), but it means that it won't be usable on Apple silicon, so that's not an option.


AFAIK you can run kernels with a 4K page size on the Apple Mx platform. It's just not optimal.


Does it? Or does Rosetta 2 run on Mac OS with a Linux shim to ask the host to kindly Rosetta-ify a given binary?


Pretty much immediately after the first developer beta providing Rosetta for Linux shipped, people tried to get it to work on non-Apple hardware, and succeeded: https://news.ycombinator.com/item?id=31662978

There's no macOS host to interact with on AWS Graviton3 instances.


Huh, I misunderstood the virtiofs mechanism.


Does Rosetta on Linux support 32 bit code? I believe FEX does.


Rosetta supports emulating 32-bit code.


On Linux? I know it has been confirmed on macOS. I haven’t heard anyone say they ran 32 bit code on Linux.


I should double-check but I seem to recall the functionality being present, although of course nobody uses it


Can you run a PCIe enclosure over Thunderbolt on Asahi Linux yet? Could this enable GPUs that already work on aarch64 Linux?


I would assume all Linux GPU drivers would need to be adapted at least a little to support the larger page size (most Linux AArch64 kernel-level code is written assuming 4 KB pages).


Yes, but NOT for GPUs. Apple Silicon does not support non-Device mappings over Thunderbolt, so eGPUs will never work.


Couldn't you just pre-multiply the projection matrix to remap the Z range from [-1,1] to [0,1]?


This is effectively what the vertex shader modification would do -- the same trick that ANGLE does: gl_Position.z = (gl_Position.z + gl_Position.w) * 0.5;

This is the same as folding the remap into a projection matrix -- the same fix-up applied to its z row (or z column, depending on your matrix convention). But note that there's no guarantee there's ever a projection matrix. Clip space coordinates could be generated directly in the vertex shader.
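
In code, the equivalent matrix fix-up for a column-major GL-style projection matrix might look like this (a sketch; a raw float[16] layout is assumed rather than any particular math library):

    // Fold  z' = 0.5 * z + 0.5 * w  (x, y, w unchanged) into a column-major
    // 4x4 projection matrix, moving clip-space z from [-w, w] to [0, w].
    // Equivalent to the shader epilogue:
    //   gl_Position.z = (gl_Position.z + gl_Position.w) * 0.5;
    void remapProjectionToZeroOne(float P[16]) {
        for (int col = 0; col < 4; ++col) {
            float z = P[col * 4 + 2];                // third row of this column
            float w = P[col * 4 + 3];                // fourth row of this column
            P[col * 4 + 2] = 0.5f * (z + w);
        }
    }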


What projection matrix?

Remember that this translation needs to happen at the graphics driver level. For fixed-function OpenGL where the application actually passes the graphics driver a projection matrix this would be doable. But if your application is using a version of OpenGL newer than 2004, the projection matrix is a part of your vertex shader. The graphics driver can't tell what part of your shader deals with projection, and definitely can't tell what uniforms it would need to tweak to modify the projection matrix -- many shaders might not even have a projection matrix.


I know. But the second sentence of the article starts with:

"Neverball uses legacy “fixed function” OpenGL."

But also you could simply remap the Z coordinate of gl_Position at the end of the vertex stage, do the clipping in [0,1] range, then map it back to [-1,1] for gl_FragCoord at the start of the fragment stage.


> "Neverball uses legacy “fixed function” OpenGL."

Sure, it'd work for Neverball, but the article is clear that they're looking for a general solution: something that'd work not just for Neverball, but for all OpenGL applications, and would ideally let them give applications control over the clip-control bit through OpenGL/Vulkan extensions.

> But also you could simply remap the Z coordinate of gl_Position at the end of the vertex stage, do the clipping in [0,1] range, then map it back to [-1,1] for gl_FragCoord at the start of the fragment stage.

Yes, that was the current state-of-the-art before this article was written:

> As Metal uses the 0/1 clip space, implementing OpenGL on Metal requires emulating the -1/1 clip space by inserting extra instructions into the vertex shader to transform the Z coordinate. Although this emulation adds overhead, it works for ANGLE’s open source implementation of OpenGL ES on Metal.

> Like ANGLE, Apple’s OpenGL driver internally translates to Metal. Because Metal uses the 0 to 1 clip space, it should require this emulation code. Curiously, when we disassemble shaders compiled with their OpenGL implementation, we don’t see any such emulation. That means Apple’s GPU must support -1/1 clip spaces in addition to Metal’s preferred 0/1. The problem is figuring out how to use this other clip space.


Can someone explain to me why support OpenGL at all? Vulkan is easier to implement. Is there a need for OpenGL on Linux?


On a reverse-engineered GPU like this, because of Vulkan's low-level design, implementing (early) OpenGL might actually be significantly easier.

Also, Vulkan isn't popular with game developers because availability sucks. Vulkan doesn't run on macOS. Or iOS. Or 40% of Android phones. Or Xbox. Or PlayStation. Or Nintendo Switch[1].

Unless you are targeting Windows (which has DirectX and OpenGL already), or those 60% of Android phones only, or Linux, why would you use Vulkan? On Windows, DirectX is a generally-superior alternative, and you get Xbox support basically free, and if you also support an older DirectX, much broader PC compatibility. On Android, just use OpenGL, and don't worry about separate implementations for the bifurcated Vulkan/OpenGL support. On Linux, just use Proton with an older DirectX. Whiz bang, no need for Vulkan whatsoever. Yes, some systems might perform better if you had a Vulkan over OpenGL, but is the cost worth it when you don't need it?

[1] Technically, Vulkan does exist for Nintendo Switch, but it is so slow almost no production game uses it, and it is widely considered not an option. Nintendo Switch is slow enough without Vulkan making it slower. Much easier just to use the proprietary NVIDIA library.


Because Vulkan, despite the mystical reputation it has in gaming circles, actually has fairly low adoption vs OpenGL.

Very few applications in the grand scheme of things use Vulkan, and a minority of games do.

Therefore the ROI on supporting OpenGL is very high.


Doesn’t implementing Vulkan give you DirectX with DXVK and VKD3D and OpenGL with Zink for free?


Only if you support all of the necessary Vulkan features and extensions. The article states that getting to that point would be a multi-year full time effort, whereas "only" OpenGL seems to be within grasp for this year. And arguably having a lower OpenGL standard soon is better than OpenGL 4.6 in a few years.


Yes, with appropriate (and reasonably-available) Vulkan extensions.


Because of how mesa is structured. OpenGL is notoriously terrible to implement, so there's a whole framework called Gallium that does the hard work for you, and you slot yourself into that. Meanwhile, Vulkan is easier to implement from scratch, so there's a lot less infrastructure for it in mesa, and you have to implement more of the boring paperwork correctly.

It's an accident of history more than anything else. Once the reverse engineering is further along, I expect a Vulkan driver to be written for it, and the Gallium one to be phased out in favor of Zink.


Keep in mind that Mesa actually implements most of OpenGL for you. It's not like you are implementing a whole OpenGL driver from scratch; you are mostly implementing a hardware abstraction layer.

My understanding is that this hardware abstraction layer for mesa is way easier to implement than a full vulkan driver, especially since the earlier versions of OpenGL only require a small subset of the features that a vulkan driver requires.


> Here’s a little secret: there are two graphics APIs called “Metal”. There’s the Metal you know, a limited API that Apple documents for App Store developers, an API that lacks useful features supported by OpenGL and Vulkan.

> And there’s the Metal that Apple uses themselves, an internal API adding back features that Apple doesn’t want you using.

Apple does stuff like this so much and gets so little flak for it.

I use macOS since it seems like the least bad option if you want a Unix but also don't want to spend a lot of time on system management, but this is a real turn-off.


As a graphics engineer, good riddance to the old clip space, 0...1 really is the correct option. We also don't know what else "OpenGL mode" enables, and the details of what it does probably change between GPU revisions -- the emulation stack probably has the details, and changes its own behavior of what's in hardware and what's emulated in the OpenGL stack depending on the GPU revision.

Also, to Alyssa, if she's reading this: you're just going to have to implement support for shader variants. Build your infrastructure for supporting them now. It's going to be far more helpful than just for clip control.
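
For readers unfamiliar with the term, a "shader variant" here just means recompiling the same source shader keyed on a bit of fixed-function state that the hardware (or compiler) can't switch at draw time. A generic sketch of such a cache, not Mesa's actual data structures:

    #include <cstdint>
    #include <unordered_map>

    // Pack the state that forces a recompile into a small key. Bit 0 here is the
    // clip-control mode (0 = [-1,1], 1 = [0,1]); more bits tend to accumulate.
    using ShaderKey = uint32_t;

    struct CompiledShader;                           // backend-specific binary

    struct ShaderVariants {
        std::unordered_map<ShaderKey, CompiledShader*> cache;

        CompiledShader* get(ShaderKey key) {
            auto it = cache.find(key);
            if (it != cache.end()) return it->second;  // already compiled for this state
            CompiledShader* bin = compileForKey(key);
            cache.emplace(key, bin);
            return bin;
        }

        CompiledShader* compileForKey(ShaderKey key) {
            // A real driver would lower/compile the shader for this key here.
            (void)key;
            return nullptr;
        }
    };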

But yes, the Vulkan extension was just poorly specified; allowing you to change clip spaces between draws in the same render pass is, again, ludicrous, and the extension should just be renamed VK_EXT_i_hate_tilers (like so many others of their kind). Every app is going to set it at app init and forget it; the implementation using the render pass bit and flushing on change will cover the 100% case, and won't be slow at all.


> you're just going to have to implement support shader variants

I admittedly have zero experience with Mesa, but it seems like shader variants is something that should be common infrastructure? Though of course the reason that a variant is needed would be architecture specific.


> good riddance to the old clip space, 0...1 really is the correct option

More like 1...0, which nicely improves depth precision. Annoyingly, due to the symmetric -1...1 range, reverse-Z cannot be used in OpenGL out of the box, but it can be fixed with ARB_clip_control. https://developer.nvidia.com/content/depth-precision-visuali...
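
The usual reversed-Z setup once ARB_clip_control (core in GL 4.5) is available looks roughly like this; it assumes a current GL 4.5 context, and you still need a projection matrix that maps near to z = 1 and far to z = 0:

    // Requires OpenGL 4.5 or ARB_clip_control.
    glClipControl(GL_LOWER_LEFT, GL_ZERO_TO_ONE);  // clip-space z in [0, 1], no [-1, 1] remap
    glClearDepth(0.0);                             // "far" is now 0
    glDepthFunc(GL_GREATER);                       // nearer fragments have larger depth
    // Best paired with a floating-point depth buffer (e.g. GL_DEPTH_COMPONENT32F)
    // to actually get the precision win.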


> Apple does stuff like this so much and gets so little flak for it.

It would be one thing if the private APIs were limited to system frameworks and features while Apple’s own apps weren’t allowed to use them, but they do. E.g. The Swift Playgrounds app for iPad is allowed to share and compile code, run separate processes, etc. which isn’t normally allowed in the AppStore. They also use blur and other graphical effects (outside of the background blur material and the SwiftUI blur modifier) that are unavailable outside of private APIs.

It stinks because of the perceived hypocrisy and the inability to compete on a level playing field or leave the AppStore (and I say this as someone who normally doesn’t mind the walled garden!)


Unfortunately such a behavior is not at all new.

The best-known example of these methods is how Microsoft exploited the replacement of MS-DOS with Windows 3.0, and especially with Windows 95.

During the MS-DOS years, the only Microsoft software products that were successful were their software development tools, i.e. compilers and interpreters, and even those had strong competition, mainly from Borland. Those MS products addressed only a small market and they could not provide large revenues. The most successful software products for MS-DOS were from many other companies.

That changed abruptly with the transition to various Windows versions, when the Microsoft developers started to have a huge advantage over those from any other company, both by being able to use undocumented internal APIs provided by the MS operating systems and also by knowing in advance the future documented APIs, before they were revealed to competitors.

Thus in a few years MS Office went from an irrelevant product, much inferior to the competition, to the dominant suite of office programs, which eliminated all competitors and became the main source of revenue for MS.


> Apple does stuff like this so much and gets so little flak for it.

Why should they get flak for having internal APIs? The fact that the internal API is a superset of the external API is smart engineering.

Think about it this way: Apple could just as well have made the "Metal that Apple uses themselves" some arcane "foocode" IR language or something, as I'm sure many shader compilers and OpenGL runtime implementations do, and nobody would be nearly as mad about it.

The fact that they use internal APIs for external apps in their weird iOS walled garden is obnoxious, but having private, undocumented APIs in a closed-source driver is not exactly an Apple anomaly.


> Why should they get flak for having internal APIs? The fact that the internal API is a superset of the external API is smart engineering.

It's not about having good segmentation of user-facing and kernel-side libraries, no one faults them for that.

It's about Apple building user-facing apps that use the whole API, and then demanding that other developers not use the features required to implement those apps because we're not trusted to maintain the look-and-feel, responsiveness, or battery life expectations of apps on the platform.


But isn't it kind of fair to say that when you look at the case studies presented by (a) the Android app store in the past decade and (b) Windows malware in the decade before that, this trust has in fact not been earned?

I hate a walled garden as much as the next developer, and the median HN reader is probably more than trustworthy. But past performance does predict future performance.


This is why the Asahi Linux project is so exciting!! You get the great performance at low-power (M* ARM processors) while still getting the more performant and useful Linux experience.

I am really thankful to the Asahi Linux team, and specifically in this instance for the GPU, [Alyssa Rosenzweig](https://github.com/alyssarosenzweig), [Asahi Lina](https://github.com/asahilina), and [Dougall Johnson](https://github.com/dougallj).


> Apple does stuff like this so much and gets so little flak for it.

To be fair, Windows has a ludicrous amount of undocumented APIs for internal affairs as well, and you can get deep into the weeds very quickly, just ask the WINE Developers who have to reverse-engineer the havoc. There is no OS without Private APIs, but Windows is arguably the worst with more Private or Undocumented APIs than Apple.

This actually bears parallels to Metal. Until DirectX 12, Windows had no official way to get low-level. Vulkan and OpenGL are only 3rd-party supported, not Microsoft-supported, Microsoft officially only supports DirectX. If you want Vulkan/OpenGL, that's on your GPU vendor. If you wanted low-level until 12, you may have found yourself pulling some undocumented shenanigans. Apple hasn't gotten to their DirectX 12 yet, but they'll get there eventually.

As for why they are Private, there could be many reasons, not least of which that (in this case) Apple has a very complicated Display Controller design and is frequently changing those internal methods, which would break compatibility if third-party applications used them. Just ask Asahi about how the DCP changed considerably from 11.x to 13.x.


> Apple has a very complicated Display Controller design

Can anyone in the know give more information here? Why would Apple want to do this? What could they be doing that's so complicated in the display controller?


https://twitter.com/marcan42/status/1549672494210113536

and

https://twitter.com/marcan42/status/1415360411260493826?lang...

and

https://twitter.com/marcan42/status/1526104383519350785

As to why? Well, "if it ain't broke don't fix it" carried over from the iPhone, but it is still a bit of a mystery.

In a nutshell from those threads:

1. Apple's DCP silicon layout is actually massive, explaining the 1 external display limit

2. Apple implements half the DCP firmware on the main CPU and the other half on the coprocessor with RPC calls, which is hilariously complicated.

3. Apple's DCP firmware is versioned, with a different version for every macOS release. This is also why Asahi Linux currently uses a "macOS 12.3" shim, so they can focus on the macOS 12.3 DCP firmware in the driver, which will probably not work with the macOS 12.4+ DCP firmware or the macOS 12.2- firmware.

I can totally see why Apple doesn't want people using their low-level Metal implementation that deals with the mess yet.


The complexity with the firmware split across the main CPU and a coprocessor seems to be a historical artefact.

Seems the DCP driver was originally all on the main CPU, and when Apple got these cheap coprocessor cores, they took a lazy approach of just inserting a simple RPC layer in the middle. The complexity for Asahi comes from the fact that it's a C++ API that can change very dynamically from version to version.

And yes, these ARM coprocessor cores are cheap; Apple has put at least 16 of them [1] on the M1, on top of the 4 performance and 4 efficiency cores. They are an Apple custom design that implements only the 64-bit parts of the ARMv8 spec. I'm not entirely sure why the actual DCP is so big, but it's not because of the complex firmware. Potentially because the DCP includes enough dedicated RAM to store an entire framebuffer on-chip.

If so, they will be doing this because it allows for lower power consumption. The main DRAM could be put in a power-saving mode and kept there for seconds or even minutes at a time without having to wake it up multiple times per frame, even when just showing a static image.

[1] https://twitter.com/marcan42/status/1557242428876537856


@marcan42 said that on the M1 MacBook Pro models, the DCP also implements hardware-level antialiasing for the notch and rounded display corners.


Yeah it makes perfect sense that they don't want to expose any of that complexity to 3rd parties and risk constant breakage with new models. I'm just really curious about what sort of complex logic they have going on in that silicon.


If you buy a hackintosh, you have to sometimes mess around to get stuff to work. Same goes for Linux on random hardware. If you check first and buy a machine that supports the OS you're using, you don't have to do anything special. It'll work as you expect.

It's freeing not to be beholden to the likes of someone like Tim Cook who, it would seem, spends the majority of his waking hours figuring out how to hide anticonsumer decisions under rugs.



