1. Could it be that the (small fraction of) problems being solved correctly involve feature recognition and spatial position? E.g., all images on one side have a shape in the top-right corner vs. the bottom-left corner.
2. Using a model with 1050 features makes me uncomfortable, because of the potential to overfit or to guess solutions by chance. However, I'm not able to make a watertight argument and would appreciate an analysis along these lines. Numbers such an analysis might involve: there are 10 choose 5 = 252 ways to partition 10 images into two sides of 5 (and 2 ways to assign the remaining two images). If one wants to improve the training by some form of augmentation or cross-validation, there are 6 × 6 = 36 ways to drop one image each from the left and the right side of a Bongard problem.
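A quick sanity check of those counts (assuming the standard Bongard layout of six images per side, so ten training images leave one held-out image per side):

```python
from math import comb

# 10 training images split into two labeled sides of 5 each:
# choosing which 5 go on the left fixes the other 5 on the right.
partitions = comb(10, 5)
print(partitions)  # 252

# With 6 images per side, holding out one image from each side
# gives 6 * 6 possible train/test splits.
splits = 6 * 6
print(splits)  # 36

# A classifier guessing each of the 2 held-out images at random
# gets both right with probability 1/4 - not a rare event.
p_lucky = 0.5 ** 2
print(p_lucky)  # 0.25
```

That last number is, I think, the core of the discomfort: a single 2-image test passes by pure luck a quarter of the time.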
2. For testing, one should definitely add cross-testing on many partitions, and then report the average or worst result. Still, for me, better-than-random results are good for a start.
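The cross-testing could look something like this sketch, looping over all 36 leave-one-out-per-side splits; `train` and `evaluate` here are hypothetical stand-ins for whatever the actual model code does:

```python
from itertools import product
from statistics import mean

def cross_test(left_images, right_images, train, evaluate):
    """Evaluate a Bongard solver on every leave-one-out-per-side split.

    train(left, right) returns a fitted model; evaluate(model, pair)
    returns a score for the held-out (left, right) image pair. Both
    are placeholders for the real training/scoring code.
    """
    scores = []
    for i, j in product(range(len(left_images)), range(len(right_images))):
        train_left = left_images[:i] + left_images[i + 1:]
        train_right = right_images[:j] + right_images[j + 1:]
        model = train(train_left, train_right)
        scores.append(evaluate(model, (left_images[i], right_images[j])))
    # Report both the average and the worst-case split.
    return mean(scores), min(scores)
```

With six images per side this runs 36 train/test cycles per problem, so reporting the worst split is cheap and quite informative about lucky guesses.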
I realize now that my comments might have sounded overly critical; I didn't mean them to. The intention behind my probing was to understand precisely the regimes in which the algorithm works and fails :-) In particular, towards understanding how it might be improved and whether a NN approach is capable of solving Bongard problems.
2. You can't think of these features the same way you'd think of features in a linear model or something similar. This is typical for a neural network.
It's not enough to know that the system correctly categorized the two test images. It could just be luck, since the input set is so tiny.
One way to gain confidence that the correct rule was learned would be to use the correct answer to generate another dozen (or hundred) test images for both sides. Then the question would be whether all the test images are correctly categorized based on training over the initial dozen (or ten) images.
Does this suggestion sound reasonable to someone with ML experience?
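For what it's worth, the scoring part of that protocol is simple; here is a sketch where `classifier` and the extra generated images are hypothetical placeholders for the real system:

```python
def rule_confidence(classifier, extra_left, extra_right):
    """Score a trained classifier on freshly generated examples.

    classifier(image) is a placeholder returning "left" or "right";
    extra_left / extra_right are new images generated from the known
    rule, never seen during training.
    """
    correct = sum(classifier(img) == "left" for img in extra_left)
    correct += sum(classifier(img) == "right" for img in extra_right)
    total = len(extra_left) + len(extra_right)
    # Near-1.0 accuracy on, say, 100 new images is very unlikely by
    # luck (random guessing on 100 images succeeds with prob. 2^-100),
    # so a high score here is real evidence the rule was learned.
    return correct / total
```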
Generative NN architectures could be used as well, such as variational autoencoders (VAEs) or generative adversarial networks (GANs). In this case, it would be "explanation by example".
[...] NN would generate new examples of images from Bongard problems, and if these generated images capture the concepts behind the classification rule, it might be enough to show understanding of the problem.
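The generation step itself is the standard VAE recipe: sample the latent prior and decode. A minimal stdlib-only sketch, where `decoder` is a hypothetical stand-in for a trained VAE decoder mapping a latent vector to an image:

```python
import random

def explain_by_example(decoder, latent_dim=8, n_samples=5, seed=0):
    """Generate new candidate images for one side of a Bongard problem.

    At generation time a VAE samples z ~ N(0, I) from its prior and
    runs the decoder; `decoder` here is a placeholder for that
    trained network.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        samples.append(decoder(z))
    return samples
```

A human (or the known answer) could then judge whether the sampled images respect the side's rule, which is exactly the "explanation by example" idea.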