It's a very interesting approach. However, once you have the probability distribution for each pixel, independent random sampling produces a poor dither pattern compared to Floyd-Steinberg or other error diffusion approaches.
I think once you have the target distributions, you could combine the sampling with some error diffusion approach. The idea is to make the sampling of neighboring pixels negatively correlated, so the colors average out over a shorter length scale.
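Something in that spirit, as a rough sketch (my own construction, not the post's method): sample each pixel as a Bernoulli draw on its error-adjusted probability, then push the residual onto unvisited neighbors with the Floyd-Steinberg weights, so a pixel that comes out "on" makes its neighbors less likely to. The array `p` of per-pixel probabilities is assumed to come out of the post's optimization.

    import numpy as np

    def sampled_error_diffusion(p, rng=None):
        # p: 2D array of per-pixel "on" probabilities in [0, 1]
        rng = np.random.default_rng() if rng is None else rng
        h, w = p.shape
        acc = p.astype(float).copy()   # probabilities plus diffused error
        out = np.zeros((h, w))
        for y in range(h):
            for x in range(w):
                q = float(np.clip(acc[y, x], 0.0, 1.0))
                out[y, x] = 1.0 if rng.random() < q else 0.0
                err = acc[y, x] - out[y, x]   # residual pushed onto neighbors
                if x + 1 < w:
                    acc[y, x + 1] += err * 7 / 16
                if y + 1 < h:
                    if x > 0:
                        acc[y + 1, x - 1] += err * 3 / 16
                    acc[y + 1, x] += err * 5 / 16
                    if x + 1 < w:
                        acc[y + 1, x + 1] += err * 1 / 16
        return out

With p uniformly 0.5 this gives a checkerboard-ish texture rather than white noise, which is the negative correlation doing its job.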
As a sledgehammer approach, you could put a blur in your loss function and sample from the joint probability distribution of all the pixels (i.e. sample whole images). That would probably make the calculation even more expensive, possibly even infeasible.
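To make the cost concrete, here is one rough way whole-image sampling could look: Metropolis over single-pixel flips, with a blurred squared error as the energy. The knobs `sigma`, `beta` and `n_steps` are made up, and recomputing the blur on every proposal is exactly why it gets expensive.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def sample_dither_metropolis(target, sigma=1.5, beta=200.0, n_steps=200_000, rng=None):
        # target: grayscale image in [0, 1]
        rng = np.random.default_rng() if rng is None else rng
        h, w = target.shape
        img = (rng.random((h, w)) < target).astype(float)  # start from independent sampling
        blurred_target = gaussian_filter(target, sigma)

        def energy(x):
            return np.mean((gaussian_filter(x, sigma) - blurred_target) ** 2)

        e = energy(img)
        for _ in range(n_steps):
            y, x = rng.integers(h), rng.integers(w)
            img[y, x] = 1.0 - img[y, x]          # propose flipping one pixel
            e_new = energy(img)                  # full blur recomputed per step: expensive
            if e_new <= e or rng.random() < np.exp(-beta * (e_new - e)):
                e = e_new                        # accept
            else:
                img[y, x] = 1.0 - img[y, x]      # reject, undo the flip
        return img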
A differentiable error diffusion loss would dither the image with quantisation (like in the post), but then minimise the difference between the blurred dithered image and the blurred original, instead of between the dithered image and the original. This would tend to distribute errors so that the average colour in an area is the same in the dithered image as in the original, similar to Floyd-Steinberg.
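A minimal sketch of that loss in PyTorch (an assumption about tooling, not what the post uses), with the per-pixel probabilities standing in for the quantised image so everything stays differentiable; a straight-through estimator would be the alternative if you want the hard dither inside the loss.

    import torch
    import torch.nn.functional as F

    def gaussian_kernel(size=7, sigma=1.5):
        ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
        g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
        k = torch.outer(g, g)
        return (k / k.sum())[None, None]   # shape (1, 1, size, size)

    def blurred_dither_loss(probs, original, kernel):
        # compare the *blurred* soft dither against the *blurred* original
        pad = kernel.shape[-1] // 2
        blur = lambda img: F.conv2d(img[None, None], kernel, padding=pad)[0, 0]
        return F.mse_loss(blur(probs), blur(original))

    # usage sketch: optimise per-pixel logits, sample a hard dither at the end
    # original = torch.rand(64, 64)
    # logits = torch.zeros_like(original, requires_grad=True)
    # loss = blurred_dither_loss(torch.sigmoid(logits), original, gaussian_kernel())
    # loss.backward(); ...
    # dithered = torch.bernoulli(torch.sigmoid(logits.detach()))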
Although I agree with the sentiment, I am going to point out for fun that the algorithm does not have to assign probabilities less than 1. Technically, a solution like the one Floyd-Steinberg produces is in the search space; you would just need the right objective to motivate it.