
Show HN: Transformation Invariant Reverse Image Search - pippy360
https://pippy360.github.io/transformationInvariantImageSearch/
======
pippy360
Hey guys, I'm the guy who made this. Let me know if anything isn't clear or if
you have any questions.

I developed this after I saw how poorly Google (and every other reverse image search engine I tested, like Bing/TinEye) performed on rotated images. For example, Google doesn't match this image [0] to its original [1] (Google's neural network will find that it's a cat, but it won't match it to the original). After playing around with making my algorithm invariant to (uniform/non-uniform) scale and rotation, I found that I was able to make it fully 2D affine invariant (pretty much by luck).

I developed all this in my spare time, actively for about a year. I also
have a C++ implementation that lets you reproduce all this, but it's just a
proof of concept and so it's quite slow. You can check that out on my GitHub
here [2]. There are loads of ways this can be improved, but I wanted to get the
idea out quickly.

[0]
[https://github.com/pippy360/transformationInvariantImageSear...](https://github.com/pippy360/transformationInvariantImageSearch/blob/master/fullEndToEndDemo/inputImages/cat2.png)

[1]
[https://github.com/pippy360/transformationInvariantImageSear...](https://github.com/pippy360/transformationInvariantImageSearch/blob/master/fullEndToEndDemo/inputImages/cat_original.png)

[2]
[https://github.com/pippy360/transformationInvariantImageSear...](https://github.com/pippy360/transformationInvariantImageSearch)

~~~
activatedgeek
Hey Tom,

Great work there!

One question - Why are we transforming the triangles to equilateral triangles?

Just throwing out a few ideas; I wanted to discuss what potential flaws they
could have.

1\. For keypoint detection, what if you use ASIFT (Affine-SIFT), which is
affine-transformation friendly? In that case, you'd probably save time doing
the rotations. Given the huge number of proposals we get, we might still need
to filter some keypoint proposals out, maybe using a metric like: if too many
keypoints fall within some 2D window, choose the one that is farthest from all
edges of the window (very rough; I don't know if there are other ways).

2\. With the final set of keypoints, I propose we do Delaunay triangulation,
in the hope of getting a collection of triangles that covers the complete
surface area of the image, making it a spatially equidistant breakdown of the
image pixels.

3\. Hash those triangles (maybe? how?)

4\. Now, given a query image, perform steps 1-3 and find the triangles which
match the triangles from a database image. If the fraction of matches is
above a given threshold, then this is a potential candidate for the search
result.
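A rough Python sketch of steps 2-4 (purely my own illustration, not the author's code: `scipy.spatial.Delaunay` for step 2, and a toy side-ratio signature standing in for a real perceptual hash in step 3):

```python
import numpy as np
from scipy.spatial import Delaunay

def triangulate(keypoints):
    """Step 2: Delaunay triangulation over the keypoint set."""
    tri = Delaunay(keypoints)
    return keypoints[tri.simplices]          # shape (n_triangles, 3, 2)

def hash_triangle(verts):
    """Step 3 (toy stand-in): a similarity-invariant side-ratio signature.
    A real system would hash the pixel content inside the triangle."""
    sides = np.linalg.norm(verts - np.roll(verts, 1, axis=0), axis=1)
    a, b, c = sorted(sides)                  # ascending: a <= b <= c
    return round(c / b, 2), round(b / a, 2)

def match(query_kps, db_kps, threshold=0.5):
    """Step 4: is the fraction of query-triangle hashes that also appear
    among the database image's hashes above the threshold?"""
    db = {hash_triangle(t) for t in triangulate(db_kps)}
    query = [hash_triangle(t) for t in triangulate(query_kps)]
    return sum(h in db for h in query) / len(query) >= threshold
```

The side-ratio signature is only similarity-invariant, not affine-invariant, which is exactly the weakness discussed below for Delaunay-based schemes.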

~~~
pippy360
>Why are we transforming the triangles to equilateral triangles?

This is so that the extracted image fragments/triangles are always very
similar and so have an extremely similar hash. If I didn't transform the
triangles then, for example, a fragment of a query image might be stretched
compared with the matching fragment in the database. Intuitively it makes
sense that it would be easier to match image fragments if they have the same
scale/aren't stretched versions of each other. The transformation to
equilateral triangles is what makes the algorithm 2D affine-invariant.
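A minimal numpy sketch of that canonicalisation step (my own illustration; the real implementation warps pixel data, not just vertices): solve for the affine matrix that sends a triangle's three vertices onto a fixed equilateral target.

```python
import numpy as np

# Canonical target: an equilateral triangle with unit side length.
TARGET = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3.0) / 2.0]])

def affine_to_equilateral(tri):
    """Return the 2x3 affine matrix mapping the vertices of `tri`
    (a 3x2 array) onto the corresponding vertices of TARGET."""
    src = np.hstack([tri, np.ones((3, 1))])  # homogeneous source points, 3x3
    M, *_ = np.linalg.lstsq(src, TARGET, rcond=None)
    return M.T                               # standard 2x3 affine form

def warp_points(M, pts):
    """Apply the 2x3 affine matrix to an Nx2 array of points."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M.T
```

Because every triangle lands on the same canonical shape, two fragments related by any invertible affine map become directly comparable, which is what lets the hashes line up.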

>what if you use ASIFT (Affine SIFT) which is Affine transformation friendly

I only discovered ASIFT while reading this thread; it looks very interesting
and useful, but I'll need to look into it more. I did test out SIFT as a
keypoint finder and it performed really poorly for what I wanted to do, but I
didn't do a lot of testing. My current method of finding keypoints is a load
of crap that I threw together for this demo; a better one would seriously
improve the accuracy and speed.

>like if too many keypoints are within some 2D window/I propose that let us do
Delaunay Triangulation

I looked into both Delaunay triangulation and a 2D window as ways to decrease
the number of proposals/triangles generated, but I had problems with both. I
will still try to implement a 2D window later. As for Delaunay triangulation,
if you play around with it on paper you can see that it's not 2D affine
invariant; it might still work well for small transformations, but there are
affine transformations where it fails. Still, I think it would be worth
looking into more.

~~~
activatedgeek
Your observation regarding the triangulation makes sense. It might end up
failing in most cases.

------
ansgri
Good in its simplicity. Basically, they construct not-quite-but-local
hyperfeatures using edge triangles and then compute a descriptor (perceptual
hash) for all these hyperfeatures. In this way, you get many features on
different scales allowing for retrieving composed images, and, when comparing,
use the most informative fragment available.

The big problem is, a moderately complex image can yield n³ triangles, so you
have to limit your feature space in a way that would have consistent behavior
across transformed versions of the image.
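Concretely, the triangle count is C(n, 3) = n(n-1)(n-2)/6, which grows as n³:

```python
from math import comb

# Triangles obtainable from n keypoints: C(n, 3) = n(n-1)(n-2)/6
for n in (10, 100, 1000):
    print(n, comb(n, 3))
# 10   -> 120
# 100  -> 161700
# 1000 -> 166167000
```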

~~~
pippy360
>The big problem is, a moderately complex image can yield n³ triangles

This is a big problem I have been facing. You will notice that all the images
I use in my C++ demos are small for this reason. I'm trying to solve this
scaling issue, but I wasn't able to find a solution quickly, so I decided to
release this and keep working on it, hopefully releasing an improved version
at some point (or letting other people work on it and hopefully find
solutions).

~~~
bobosha
Could you not use feature hashing to reduce to something manageable?

~~~
pippy360
Possibly; I'm not sure what implementation you have in mind?

I find the problem is the combination of a poor keypoint finder, noise, and
the need to match partial images. If I could guarantee that 99% of the
keypoints in the query image would be in the matching image, that the
extracted hashes wouldn't be affected (hugely) by noise, and that most of the
query image was present in the matching image, then I could simply discard a
huge number of the triangles and only query a manageable number of them.
Obviously that's a lot to ask, but the better I can make each part, the more
options I will have for culling the hashes. For the moment I need to query as
many fragments as possible to have a good chance of finding the matching
image.

------
zfedoran
Awesome work! Thanks for sharing this!

Reminds me of nova.astrometry.net and PixInsight (and others), which are able
to determine which stars are in an image, regardless of transforms.

Relevant paper (outlines an approach using triangle space):
[https://hal.inria.fr/inria-00548539/document](https://hal.inria.fr/inria-00548539/document)

Relevant section: "Provided the reduced object lists, our next step is to
construct all possible triangles. From the list of n objects we can construct
n(n − 1)(n − 2)/6 triangles. Those will be represented in so called "triangle
space". We need to choose a triangle representation that will let us find
similar triangles, that is to find corresponding object triplets being
insensitive to translation, rotation, scaling and flipping. A triangle in the
triangle space could be represented as a two-dimensional point (x, y) where x
= a/b, y = b/c.

a, b and c are the lengths of triangle sides in decreasing order. Similar
triangles will be located close to each other in the triangle space."

------
onion2k
This is really impressive. What I particularly like is how the demo
demonstrates the usefulness of the algorithm - an article about it would be
interesting, but after I cropped, scaled, rotated and moved the dog's head,
and it was still found, it was just so _obviously_ useful. For example, if
someone did the same thing with a photo of a person's face you could find the
original image it came from.

------
jboggan
This is great, I was toying with this kind of image recognition a year or two
ago and I am impressed with this.

Have you considered doing other common transformations like various Instagram
filters and compression loss? If you had those abilities combined with the
current algorithm you could have the kernel of a photo copyright protection
system . . .

------
deevnull
That is really gorgeous. Do you have anything quick written up about how your
hashes are stored? I spent far too much of a few months optimizing pHash to
hell (I can do fuzzy lookups in a table of ~17 million images in constant
time) and I'd love to play with your math a little bit.

------
tw1010
Really cool! I love how more and more mathematical ideas (like translation
invariance) are seeping into the programming community. I suspect this will be
a bigger and longer-lasting impact of the current machine-learning hype on
software engineering than any particular technology it may produce.

~~~
ansgri
These invariants are standard practice in computer vision. In fact, you won't
get anywhere without translation and brightness invariance; more often than
not you also need scale (uniform) and rotation invariance, plus limited
tolerance of projective transforms.

There are various ways to deal with this in ML (I'm not really an expert in
ML), but AFAIU in most cases you get candidate transforms via model-based
methods and then make the decision using ML-based methods trained to allow for
limited transformations (convolutional nets are good at that).

~~~
lscharen
I've been out of the Machine Vision space for a while, so my knowledge is
somewhat out of date.

What research is out there on general Affine invariant vision algorithms and
techniques?

For context, my practical experience ended in the mid-2000s, when "jittering"
was a bit of a thing, along with the occasional closed-form estimator based on
basic linearization approximations.

~~~
activatedgeek
ASIFT or Affine SIFT is one of the Affine invariant versions of SIFT. Take a
look here -
[http://www.cmap.polytechnique.fr/~yu/research/ASIFT/demo.htm...](http://www.cmap.polytechnique.fr/~yu/research/ASIFT/demo.html).

~~~
ansgri
It's actually perspective transform invariant, which is a more general class.
The name is misleading.

SIFT and its multiple descendants, yes, and also things like random ferns;
google "Boosted Random Ferns for Object Detection" (it's damn difficult to get
a clean link to the PDF from Google...)

------
osrec
Very impressive indeed. I was wondering, how did you determine your criteria
for matching (i.e. the hash distance at which two triangles are considered
different)? Also, could this be extended to colour transformations as well
(perhaps by using a wavelet transform to extract phase for comparison, rather
than amplitude)?

~~~
pippy360
>I was wondering, how did you determine your criteria for matching

Yeah, the query image is broken into a number of fragments, and then the hash
of each fragment/triangle is checked against the hashes stored in the database
using a nearest neighbor search (well, you find any neighbors within a certain
distance/threshold; I found a Hamming distance of 8 to be a pretty good
threshold).

Currently only one fragment of the query image needs to match a database image
for the images to be considered a match. You could also put some threshold on
the number of fragments that need to match, but this is only a proof of
concept; I haven't had time to find the perfect combination yet.
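That matching rule could be sketched like this (a brute-force scan, purely illustrative; only the Hamming threshold of 8 and the one-fragment-match rule come from the description above, the names and database layout are my invention, and a real system would use an index such as a BK-tree):

```python
def hamming(h1, h2):
    """Number of differing bits between two integer hashes."""
    return bin(h1 ^ h2).count("1")

def find_matches(query_hashes, db, threshold=8):
    """Return ids of database images with at least one fragment hash
    within `threshold` bits of any query fragment hash.
    `db` maps image id -> list of fragment hashes."""
    return {image_id
            for image_id, fragment_hashes in db.items()
            if any(hamming(q, h) <= threshold
                   for q in query_hashes
                   for h in fragment_hashes)}
```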

>could this be extended to colour transformations

Yeah, actually, after you have re-transformed the triangles to be equilateral
you can apply a lot of other techniques to further improve the search. You can
even use NON-affine-transformation partial image matching on the triangles to
match fragments that only partially match fragments stored in the database.

------
rosenbus
When you say this is a reverse image search, does it search over multiple
images to find the one which matches the closest? In the demo it seems to only
search over a single image to find the cropped, stretched or rotated part of
the image. Do you also have examples of negative searches where it doesn't
find the image in question?

------
mendeza
This is awesome! Do you have any resources to dive into reverse image search?
I am not familiar with the standard methods, and I am looking for more than
pyimagesearch's tutorial on using histogram matching. I am interested in
learning what the standard is, plus any tutorials/implementations.

------
rcarmo
This is pretty neat. I've been using pHash to find duplicate photos across my
personal library, but this seems significantly better. I'd like to wrap it as
a Python library - what are the bits that need improving?

~~~
pippy360
Unfortunately it needs a huge amount of work to be production ready. I'd love
to see it wrapped up in Python, though, and I'll be continuing to work on it.
The main issue you will run into is that the number of triangles computed from
the keypoints is O(n^3) (where n is the number of keypoints), so currently it
will take far too long to match large images. I have a few ideas on how to
solve this, but none of them have led to quick and easy solutions... yet.

I'm currently working on an improved method for getting 2D affine invariant
keypoints because the current one just barely works. After that I will be
working on handling the scaling issue.

------
Sean1708
I'm amazed at how simple this is! Making the triangles equilateral is one of
those things that seems so simple in hindsight, but I doubt it was easy to
think of. Were you inspired by anything in particular?

------
sova
Beautifully done! Affine transformations! Gotta love that Jacobian, baby.

