I eventually came up with a contrived set of heuristics to tackle this problem, as you can see in the example below, and with the right set of weights managed to get accurate thresholding more than 90% of the time for pathological cases like these. --- https://imgur.com/a/XMhdnjH
I would change that for something more standard.
...30 minutes later the horse is still running and I'm like 'wtf? what does a horse have to do with the DDG logo?' close tab. read comments...
It turns out the app doesn't handle SVG (it is actually on the to-do list) and returned a 500, but the failure was never presented to the user.
Massive kudos for delivering; I like it a lot :D
Please help yourself to my icon data - I spent a while collecting it and hope it can be useful to someone!
I would test on the MPEG-7 dataset to begin with and, once the precision and recall values are good enough, go ahead with testing on logos and icons. I must've manually tested the algorithm more than 100,000 times, probably, because that was the only way to do it with untagged datasets. Quite tedious indeed. This version gives pretty decent results about 7-8 out of 10 times, I'd say.
 - http://www.dabi.temple.edu/~shape/MPEG7/dataset.html
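For anyone who wants to try the tagged-dataset route, here is a minimal sketch of scoring a single query by precision and recall. The function name `precision_recall_at_k` and the example labels are made up for illustration, and `class_size=20` assumes the usual MPEG-7 shape set layout of 70 classes with 20 shapes each; this is not the code used by the tool discussed here.

```python
def precision_recall_at_k(ranked_labels, query_label, k, class_size):
    """Score one query against a tagged dataset.

    ranked_labels: class labels of the retrieved shapes, best match first
                   (hypothetical input; assumes the query itself is excluded).
    query_label:   class label of the query shape.
    k:             how many top results to score.
    class_size:    number of relevant shapes that exist for this class.
    """
    top_k = ranked_labels[:k]
    hits = sum(1 for label in top_k if label == query_label)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / class_size if class_size else 0.0
    return precision, recall


# Made-up ranking for a "bat" query: 3 of the top 5 results are bats.
ranked = ["bat", "bat", "apple", "bat", "bird"]
precision, recall = precision_recall_at_k(ranked, "bat", k=5, class_size=20)
print(precision, recall)
```

Averaging these two numbers over every shape in the dataset gives a single score per algorithm version, which is much cheaper than eyeballing each result.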
This is a great starting point in case you are interested in knowing more -- http://www.staff.science.uu.nl/~kreve101/asci/smi2001.pdf More recently, I've been exploring some ideas of Tversky.
The first one (the arrow into the door) seems to have worked well for the first three 'similar icons'.
The second one (remove user) didn't work at all. Maybe because it is circled.
In both cases, half the similar icons are 'download' icons, and I can kind of see why for the first case but not at all for the second case.
A closer inspection actually shows some of the results aren't that bad a match. The results ranked 1, 4, 5, 7 and 7 in particular vaguely have the same outline as the query image. If I had to score this result, I wouldn't give it more than a 3 out of 10, for sure.
I've re-tried the "remove user" one but uploaded an SVG instead of a PNG (so technically the resolution is unlimited). Uploaded it both circled and not circled.
Here are the results: https://imgur.com/a/OT8Spjt
I actually uploaded SVGs so I think you might already (unintentionally) support SVGs?
I spent a lot of time working on a hack for 'normalizing' white-on-black and black-on-white backgrounds, and also on choosing between adaptive and Gaussian thresholds dynamically during preprocessing.
Here is an example with white-on-black and black-on-white variations of the Nike logo that works as intended.
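To make the 'normalizing' idea concrete, here is a rough sketch of one possible approach. It is an assumption for illustration, not the hack described above: it guesses the background from the outermost ring of pixels and inverts the image when that ring is mostly dark, so both variants end up dark-on-light before thresholding. The names `border_pixels` and `normalize_polarity` are hypothetical, and the choice between threshold types is not covered.

```python
def border_pixels(img):
    """Collect the outermost ring of pixels of a grayscale image (list of rows)."""
    height, width = len(img), len(img[0])
    ring = list(img[0])
    if height > 1:
        ring += list(img[-1])
    for row in img[1:-1]:
        ring.append(row[0])
        if width > 1:
            ring.append(row[-1])
    return ring


def normalize_polarity(img, max_val=255):
    """Return a copy with a light background, inverting if the border is dark.

    Assumption: the border is mostly background, which fails for icons
    that touch or fill the edges of the image.
    """
    ring = border_pixels(img)
    mean_border = sum(ring) / len(ring)
    if mean_border < max_val / 2:
        return [[max_val - p for p in row] for row in img]
    return [row[:] for row in img]


white_on_black = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
black_on_white = [[255, 255, 255], [255, 0, 255], [255, 255, 255]]
# Both variants come out identical after normalization.
print(normalize_polarity(white_on_black) == normalize_polarity(black_on_white))
```

A border-based guess is cheap, but a circled or full-bleed icon can fool it, which may be one reason a weighted mix of several heuristics ends up being necessary.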
Those seem to be actual download buttons where you can download the found similar icon.
PS: You can explore company logos here http://compute.vision/brands/index.html . It's implemented using an older iteration of the algorithm and performance isn't that great compared to the one used with the icons database.
Speaking of Telugu, I recently got hold of a treasure trove (about 700GB) of scanned copies of Telugu magazines and newspapers, some of them dating back as far as 1880! Gonna upload them to archive.org very soon.