Show HN: Solving visual analogy puzzles with Deep Learning (k10v.github.io)
60 points by coolvision 4 months ago | 6 comments



My first response on seeing these results:

1. Could it be that the (small fraction of) problems being solved correctly involve only feature recognition and spatial position? E.g., all images on one side have a shape in the top-right corner vs. the bottom-left corner on the other.

2. Using a model with 1050 features makes me uncomfortable, because of the potential to overfit or to guess solutions correctly by chance. However, I'm not able to make a watertight argument, and would appreciate an analysis along these lines. Some numbers such an analysis might involve: there are 10 choose 5 = 252 ways to partition 10 images into two sets of 5 (and 2 ways to assign the remaining two images). If one wants to improve the training by some form of augmentation or cross-validation, there are 36 ways to drop one image each from the left and right side, for each Bongard problem.
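
Just to make the counting in point 2 concrete, a quick sketch (this only reproduces the numbers above; none of it comes from the post's code):

    from itertools import combinations, product

    train = list(range(10))                     # the 10 training images (5 left + 5 right)
    partitions = list(combinations(train, 5))   # pick which 5 are treated as the "left" set
    print(len(partitions))                      # 252 = 10 choose 5

    left, right = range(6), range(6)            # the full 6 + 6 Bongard problem
    holdouts = list(product(left, right))       # drop one image from each side
    print(len(holdouts))                        # 36 = 6 * 6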


1. Yes, you are right. Most of the solved problems are quite simple. Still, you have to start somewhere? :)

2. For testing, I should definitely add cross-testing on many partitions and then report the average or worst result. Still, for me, better-than-random results were good for a start.
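
As a rough sketch of what I mean by cross-testing (train_and_test here is just a placeholder for the actual training/evaluation step, which is not shown):

    from itertools import product

    def cross_test(left_images, right_images, train_and_test):
        """Hold out one image per side, train on the rest, collect accuracies."""
        accuracies = []
        for i, j in product(range(len(left_images)), range(len(right_images))):
            train_left = left_images[:i] + left_images[i + 1:]
            train_right = right_images[:j] + right_images[j + 1:]
            test_pairs = [(left_images[i], 0), (right_images[j], 1)]
            accuracies.append(train_and_test(train_left, train_right, test_pairs))
        # report both the average and the worst partition
        return sum(accuracies) / len(accuracies), min(accuracies)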


Oh, I found the experiment and blog post very interesting... Thanks for that! Also, I’d never heard of Bongard problems, so I enjoyed reading about that too.

I realize now that my comments might have sounded overly critical; I didn't mean them to be. The intention behind my probing was to understand precisely the regimes in which the algorithm works and fails :-) In particular, toward understanding how it might be improved and whether an NN approach is capable of solving Bongard problems.


1. I think this is the idea behind convolutional neural networks. You learn individual features in the image in the convolutional layers, and you learn their arrangement in the subsequent layers. It turns out that networks don't actually do that AFAIK, but that's the idea as I've been taught it.
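
A toy sketch of that intuition (made-up layer sizes, not the model from the post):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 1)),  # local strokes/blobs
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),                           # combinations of them
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),    # in theory: the spatial arrangement of features
        layers.Dense(1, activation="sigmoid"),  # left-page class vs right-page class
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])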

2. You can't think of these features the same way you think of features in a linear model or similar; a feature count like that is typical for a neural network.


Really cool write-up, so I figure it's worth airing my ignorance.

It's not enough to know that the system correctly categorized the two test images. It could just be luck, since it's such a tiny input set.

One way to gain confidence that the correct rule was learned would be to use the correct answer to generate another dozen (or hundred) test images for both sides. Then the question would be whether all the test images are correctly categorized based on training over the initial dozen (or ten) images.
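
Concretely, something like this is what I have in mind, where draw_example is an imaginary helper that renders a fresh image for either side of the problem:

    def check_rule_learned(classify, draw_example, n=100):
        """classify() is the trained model; draw_example(side) is a hypothetical
        generator of new images for side 0 (left page) or side 1 (right page)."""
        correct = 0
        for side in (0, 1):
            for _ in range(n):
                image = draw_example(side)           # never seen during training
                correct += int(classify(image) == side)
        return correct / (2 * n)                     # ~1.0 only if the actual rule was learned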

Does this suggestion sound reasonable to someone with ML experience?


Yes, I'm thinking about exactly the same way to prove understanding of the rule; it's at the end of the post:

>>> Generative NN architectures could be used as well, like a variational autoencoder (VAE) or generative adversarial networks (GAN). In this case, it would be “explanation by example”. [...] The NN would generate new examples of images from Bongard problems, and if these generated images capture the concepts behind the classification rule, it might be enough to show understanding of the problem.
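
Nothing like this is implemented yet, but very roughly the GAN variant would look something like the sketch below (all layer sizes are made up). If the generated samples visibly obey the rule, they would serve as the “explanation by example”.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # Generator: noise vector -> candidate 64x64 image for one side of the problem
    generator = models.Sequential([
        layers.Dense(8 * 8 * 64, activation="relu", input_shape=(32,)),
        layers.Reshape((8, 8, 64)),
        layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(16, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="sigmoid"),
    ])

    # Discriminator: real image from that side of the problem vs generated one
    discriminator = models.Sequential([
        layers.Conv2D(16, 4, strides=2, padding="same", activation="relu",
                      input_shape=(64, 64, 1)),
        layers.Conv2D(32, 4, strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])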



