It's not super complex. I just moved the layer toggle keys. In the default Miryoku layout, switching the right-hand keys to a different layer means holding a key on the left hand. I found that annoying, since some actions, like entering and using the navigation layer, could otherwise be done with one hand.
Do you have any good resources that go into detail on GPU ISAs or GPU architecture? There's certainly a lot available for CPUs, but the resources I’ve found for GPUs mostly focus on how they differ from CPUs and how their ISAs are tailored to the GPU's specific goals.
Edit: I should add that Apple also publishes decent material; see the video linked below and the resources linked at the bottom of that page. Note, though, that this puts you in UMA/TBDR (unified memory architecture / tile-based deferred rendering) territory, and discrete GPUs work considerably differently: https://developer.apple.com/videos/play/wwdc2020/10602/
I assume most people learn microarchitecture for performance reasons.
In that case, the question you're really asking is which aspects of assembly matter for performance.
Answer: there are several GPU matrix multiplication examples covering memory channels (especially channel conflicts), load/store alignment, memory movement, and more. That should cover the issue I talked about earlier.
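To make "channel conflicts" a bit more concrete, here is a minimal sketch assuming a made-up memory system with 8 channels interleaved every 256 bytes (hypothetical numbers; real devices differ, so check your vendor's guide). It shows why walking down a column of a row-major float matrix whose row pitch is a multiple of channels × interleave keeps hitting a single channel, and how padding each row spreads the accesses out.

```python
# Hypothetical parameters, not taken from any specific GPU: real channel
# counts and interleave granularities vary by device.
NUM_CHANNELS = 8
INTERLEAVE_BYTES = 256
FLOAT_BYTES = 4


def channel_of(addr: int) -> int:
    """Map a byte address to a memory channel using simple modulo interleaving."""
    return (addr // INTERLEAVE_BYTES) % NUM_CHANNELS


def column_channels(row_pitch_floats: int, rows: int, col: int = 0) -> list[int]:
    """Channels touched when walking down one column of a row-major float matrix."""
    pitch_bytes = row_pitch_floats * FLOAT_BYTES
    return [channel_of(r * pitch_bytes + col * FLOAT_BYTES) for r in range(rows)]


def distinct_channels(row_pitch_floats: int, rows: int = 8) -> int:
    """How many different channels a column walk of `rows` rows uses."""
    return len(set(column_channels(row_pitch_floats, rows)))


if __name__ == "__main__":
    # 512 floats per row = 2048 bytes = 8 channels * 256 bytes,
    # so every row of the column maps to the same channel.
    print(distinct_channels(512))
    # Padding each row by 64 floats (256 bytes) shifts successive rows
    # onto successive channels.
    print(distinct_channels(576))
```

With a 512-float pitch the column walk uses one channel; with 576 floats it uses all eight. Padding is only one of the fixes the optimization guides discuss; reordering which work-item reads which address is another.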
Optimization guides help. I know it's 10+ years old, but I found AMD's OpenCL optimization guide easy to read and follow, and it's still modern enough to cover most of today's architectures.
Beyond that, you'll have to watch conference talks on DirectX 12's newer instructions (wave instructions, ballot/voting, etc.) and their performance implications.
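As a rough illustration of why ballot/voting matters, here is a plain-Python model of ballot-based stream compaction. It assumes a 64-lane wave (the AMD GCN wavefront size; NVIDIA warps have 32 lanes) and loosely mirrors what HLSL's `WaveActiveBallot` and `WavePrefixCountBits` provide. It is a CPU sketch of the idea, not how any driver or shader compiler actually implements it. The performance point is that each wave reserves output space once, instead of every lane doing its own atomic increment.

```python
from typing import Callable

# Assumption: 64 lanes per wave, as on AMD GCN.
WAVE_SIZE = 64


def ballot(predicates: list[bool]) -> int:
    """Bitmask with bit i set when lane i's predicate is true."""
    mask = 0
    for lane, p in enumerate(predicates):
        if p:
            mask |= 1 << lane
    return mask


def prefix_count_bits(mask: int, lane: int) -> int:
    """Number of set bits in mask strictly below this lane."""
    return bin(mask & ((1 << lane) - 1)).count("1")


def compact(values: list[int], keep: Callable[[int], bool]) -> list[int]:
    """Keep values where keep(v) is true, processing one wave at a time."""
    out: list[int] = []
    for start in range(0, len(values), WAVE_SIZE):
        wave = values[start:start + WAVE_SIZE]
        preds = [keep(v) for v in wave]
        mask = ballot(preds)
        # On a GPU, a single lane would do one atomic add of popcount(mask)
        # here to reserve space for the whole wave.
        base = len(out)
        out.extend([0] * bin(mask).count("1"))
        for lane, (v, p) in enumerate(zip(wave, preds)):
            if p:
                # Each kept lane writes at its rank among the kept lanes.
                out[base + prefix_count_bits(mask, lane)] = v
    return out
```

For example, `compact(list(range(100)), lambda v: v % 3 == 0)` keeps the multiples of 3 in their original order while reserving output space only twice (once per wave).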
It's a mixed bag: everyone knows one or two optimization techniques, but learning all of them takes a lot of study.
Branch Education apparently decapped and scanned a GA102 (Nvidia 30 series) for the following video: https://www.youtube.com/watch?v=h9Z4oGN89MU. The beginning is very basic, but the content ramps up quickly.
Just out of curiosity, have you checked out Spack (https://github.com/spack/spack)? It has a lot of HPC users. Its support for mixing and matching system-installed and from-source dependencies has been extremely useful in my work.
Another area where it has worked well is just offloading the typing/prototyping. I know exactly what the code should look like, so I rarely run into issues.