I totally understand why George is upset. Troubleshooting crashes within a vendor's closed-source product is one of the most painful things I've had to do professionally. In my case, it took months of patience and collecting pcaps to finally get it resolved. George doesn't have months.
As crappy as the situation is, I must say it's been great to see George's passion to solve these types of technical puzzles reignited! It seems like he's tried his hardest to work with AMD in a nonabrasive manner, but he has become very frustrated by the lack of progress.
AMD would be wise to just pair one of their best, who has access to the source, with George for just a couple of days.
[1]
Spoke with @LisaSu. She was very polite, but said nothing of substance and indirectly rejected my offer.
Here’s a firmware crash, run ./loop.sh to trigger. Tested on tinybox and 1x7900XTX machine, ROCm 6.0.2 and 6.0.3 preview. Must reboot to bring back.
[2]
No need to hire me, just open source the 7900XTX firmware+docs and remove the signature check. We'd treat it like bring up for our own chip, get builds in CI, HITL testing, fuzzing, etc...
Deadline, end of the week? Otherwise I'm not spending more time thinking about this.
I've been fascinated by his live streams where he has been digging into the bug himself: https://www.youtube.com/@geohotarchive/videos