All similarly sparse data would suffer from the batchnorm issue. I don't remember whether I tried a convnet with batchnorm on galaxy classification, but I did try one on piano rolls, and it performed badly precisely because of batchnorm. Had I first tried the same model on MNIST I would have caught the issue much faster (I tested it on CIFAR instead).
I suspect a chess position evaluation would suffer from batchnorm just as much, if the intermediate feature maps remain sparse.
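Here's a minimal sketch of the failure mode (PyTorch is my assumption; the original models aren't shown): on sparse inputs the batch statistics are dominated by the zero background, so normalization shifts every "empty" location to a nonzero constant per channel, destroying the sparsity that later layers would otherwise exploit.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Fake "piano roll" batch: binary, ~2% of entries active (my assumption
# about density; any similarly sparse input shows the same effect).
x = (torch.rand(32, 1, 88, 64) < 0.02).float()

conv = nn.Conv2d(1, 8, kernel_size=3, padding=1, bias=False)
bn = nn.BatchNorm2d(8)

h = conv(x)      # background stays exactly zero wherever no note is nearby
h_bn = bn(h)     # BN shifts that background to -mean/std per channel

# Before BN most activations are exactly zero; after BN almost none are,
# so the feature maps are no longer sparse.
print("near-zero fraction before BN:", (h.abs() < 1e-6).float().mean().item())
print("near-zero fraction after  BN:", (h_bn.abs() < 1e-6).float().mean().item())
```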