
PerryStyle · 2026-03-29T22:06:08 1774821968

Meson is indeed nice, but has very poor support for GPU compilation compared to CMake. I've had a lot of success adopting the practices described in this talk, https://www.youtube.com/watch?v=K5Kg8TOTKjU. I thought I knew a lot of CMake, but file sets definitely make things a lot simpler.

PerryStyle · 2026-03-15T22:23:29 1773613409

I work in HPC and I’ve found it very useful in creating various shell scripts. It really helps if you have linters such as shellcheck.

Other areas of success have been just offloading the typing/prototyping. I know exactly what the code should look like, so I rarely run into issues.

PerryStyle · 2026-01-21T19:13:25 1769022805

I would love to do this in the future, but knowing me I’d get more caught up in making sure I’m benchmarking properly than in actually writing code.

PerryStyle · 2025-12-29T20:43:29 1767041009

I’d definitely recommend Miryoku for those starting out. You’re then free to make any modifications to suit your preferences.

I ended up making the layer activations happen on the same hand to allow 1 handed use.

MorehouseJ09 · 2025-12-30T18:01:01 1767117661

Using this for my next build. Could you share more on how you did the activations for 1-handed use? That sounds quite interesting.

PerryStyle · 2026-01-01T00:05:04 1767225904

It's not super complex. I ended up just modifying the locations of the layer toggle keys. In the default Miryoku layout, in order to switch the keys on the right hand to a different layer you need to hold a button on the left hand. I found this to be annoying since some actions, like entering and using a navigation layer, can be done with 1 hand.
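A rough way to picture the change, as a sketch only: this is a toy model in Python, not Miryoku or keyboard firmware code, and the key names (`L_thumb`, `R_thumb`, `R_home_1`, `R_home_2`), the two tiny layers, and both helper functions are hypothetical. It just shows that moving the hold key to the same hand as the remapped keys lets the layer be used with one hand.

```python
# Toy model of hold-to-activate layers. Key names are hypothetical and
# start with "L_" or "R_" to mark which hand presses them.
BASE = {"R_home_1": "h", "R_home_2": "j"}        # base layer (right hand)
NAV = {"R_home_1": "LEFT", "R_home_2": "DOWN"}   # navigation layer (right hand)


def resolve(pressed, nav_hold_key):
    """Return the keycodes emitted for a set of held physical keys.

    While `nav_hold_key` is held, the other pressed keys are looked up in
    the NAV layer instead of the BASE layer.
    """
    layer = NAV if nav_hold_key in pressed else BASE
    return [layer[key] for key in sorted(pressed) if key in layer]


def hands_used(pressed):
    """Return the set of hands ("L", "R") needed to hold these keys."""
    return {key.split("_")[0] for key in pressed}


# Arrangement as described above: the hold key is on the opposite hand.
default_press = {"L_thumb", "R_home_1"}
print(resolve(default_press, nav_hold_key="L_thumb"), hands_used(default_press))

# Modified arrangement: the hold key sits on the same hand as the layer keys.
modified_press = {"R_thumb", "R_home_1"}
print(resolve(modified_press, nav_hold_key="R_thumb"), hands_used(modified_press))
```

Both presses emit `LEFT`, but only the second one needs a single hand.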

PerryStyle · 2025-08-24T19:33:03 1756063983

+1. Learned about this in a DB research course during grad school. Feldera is really cool.

Also I love their website design.

lsuresh · 2025-08-24T21:58:25 1756072705

Thanks for the kind words (Feldera co-founder here). I'll pass it on to the design team. :)

PerryStyle · 2025-08-18T03:50:50 1755489050

There are some solutions that try to tackle this in HPC. For example https://github.com/LLNL/mpibind is deployed on El Capitan.

Would be interesting to see if something similar appears for cloud workloads.

PerryStyle · 2025-07-10T02:03:30 1752113010

Do you have any good resources that go into detail on GPU ISAs or GPU architecture? There's certainly a lot available for CPUs, but the resources I’ve found for GPUs mostly focus on how they differ from CPUs and how their ISAs are tailored to the GPU's specific goals.

grg0 · 2025-07-10T02:25:37 1752114337

Unfortunately this is a topic that isn't open enough, and architectures change rather quickly so you're always chasing the rabbit. That being said:

The RDNA architecture slides (a few gens old) have some breadcrumbs: https://gpuopen.com/download/RDNA_Architecture_public.pdf

AMD also publishes its ISAs, but I don't think you'll be able to extract much from a reference-style document: https://gpuopen.com/amd-gpu-architecture-programming-documen...

Books on CUDA/HIP also go into some detail of the underlying architecture. Some slides from NV:

https://gfxcourses.stanford.edu/cs149/fall21content/media/gp...

Edit: I should say that Apple also publishes decent stuff. See the link here and the stuff linked at the bottom of the page. But note that now you're in UMA/TBDR territory; discrete GPUs work considerably differently: https://developer.apple.com/videos/play/wwdc2020/10602/

If anyone has more suggestions, please share.

dragontamer · 2025-07-10T14:19:08 1752157148

I assume most people learn microarchitecture for performance reasons.

At which point, the question you are really asking is what aspects of assembly are important for performance.

Answer: there are multiple GPU Matrix Multiplication examples covering channels (especially channel conflicts), load/store alignment, memory movement and more. That should cover the issue I talked about earlier.

Optimization guides help. I know it's 10+ years old, but I think AMD's OpenCL optimization guide was easy to read and follow, and still modern enough to cover most of today's architectures.

Beyond that, you'll have to look at conference talks about the new DirectX 12 instructions (wave instructions, ballot/voting, etc.) and their performance implications.

It's a mixed bag: everyone knows one or two ways of optimizing, but learning all of them requires lots of study.

xelxebar · 2025-07-10T05:35:35 1752125735

Branch Education apparently decapped and scanned a GA102 (Nvidia 30 series) for the following video: https://www.youtube.com/watch?v=h9Z4oGN89MU. The beginning is very basic, but the content ramps up quickly.

PerryStyle · on March 27, 2025

Wow, this is one of the most interesting things I’ve come across. I could definitely learn a lot by tinkering with this.

Thanks!

PerryStyle · on March 11, 2025

Would it be possible to leverage the Python array API standard? Or is that more suited for just computations?

PerryStyle · on March 6, 2025

Zotero's PDF viewer also does this now. Being able to annotate PDFs and having a reference manager has been a life saver.