
Show HN: Solving visual analogy puzzles with Deep Learning - coolvision
https://k10v.github.io/2018/02/25/Solving-Bongard-problems-with-deep-learning/
======
ssivark
My first response on seeing these results:

1\. Could it be that the small fraction of problems being solved correctly
involves only feature recognition and spatial position? E.g.: all images have
a shape in the top right corner -vs- the bottom left corner.

2\. Using a model with 1050 features makes me uncomfortable, because of its
ability to overfit or to guess solutions randomly. However, I'm not able to
make a watertight argument and would appreciate an analysis along these lines.
Numbers such an analysis might involve: there are 10 choose 5 = 252 ways to
partition 10 images into two sets of 5 (and 2 ways to assign the remaining two
images). If one wants to improve the training by some form of augmentation or
cross-validation, there are 36 ways to drop one image each from the left and
right side, for each Bongard problem.

~~~
coolvision
1\. Yes, you are right. Most of the solved problems are quite simple. Still,
you have to start somewhere? :)

2\. For testing, I should definitely add cross-testing on many partitions,
and then report the average or worst result. Still, for me, better-than-random
results were good for a start.

~~~
ssivark
Oh, I found the experiment and blog post very interesting... Thanks for that!
Also, I’d never heard of Bongard problems, so I enjoyed reading about that
too.

I realize now that my comments might have sounded overly critical; didn’t mean
to. The intention behind my probing was to understand precisely the regimes in
which the algorithm works and fails :-) In particular, towards understanding
how it might be improved and whether a NN approach is capable of solving
Bongard problems.

------
akkartik
Really cool write-up, so I figure it's worth airing my ignorance.

It's not enough to know that the system correctly categorized the two test
images. It could just be luck, since it's such a tiny input set.

One way to gain confidence that the correct rule was learned would be to use
the correct answer to generate another dozen (or hundred) test images for both
sides. Then the question would be whether all the test images are correctly
categorized based on training over the initial dozen (or ten) images.
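That evaluation could be sketched roughly as follows (everything here is hypothetical: `generate_example(side)` would produce fresh images satisfying each side's rule, and `classifier` returns the predicted side):

```python
def accuracy_on_generated(classifier, generate_example, n_per_side=100):
    """Score a trained classifier on freshly generated images for both
    sides; near-chance accuracy suggests the rule wasn't really learned."""
    correct = 0
    for side in (0, 1):
        for _ in range(n_per_side):
            correct += classifier(generate_example(side)) == side
    return correct / (2 * n_per_side)
```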

Does this suggestion sound reasonable to someone with ML experience?

~~~
coolvision
Yes, I'm thinking about exactly the same way to prove understanding of the
rule; it's at the end of the post:

>>> Generative NN architectures could be used as well, like variational
autoencoders (VAE) or generative adversarial networks (GAN). In this case, it
would be “explanation by example”. [...] The NN would generate new examples of
images from Bongard problems, and if these generated images capture the
concepts behind the classification rule, it might be enough to show
understanding of the problem.

