I've been on a similar quest for hard to reproduce, timing/hardware/... bugs, and if you're facing any kind of skepticism (your own or otherwise) it can be very comforting to have a 10x or even 100x no failure occurred confidence.
It's particularly comforting when the reason for the failure/fix/change in behavior isn't completely understood.
If the bug occurs reasonably often, say usually once every 10 minutes, you can model an exponential distribution of the intervals between the bug triggering and then use the distribution to "prove" the bug is fixed in cases where the root cause isn't clear: https://frdmtoplay.com/statistically-squashing-bugs/
It's particularly comforting when the reason for the failure/fix/change in behavior isn't completely understood.