

How We Built a Cutting-Edge Color Search App - mattangriffel
http://bits.shutterstock.com/2014/01/30/how-we-built-a-cutting-edge-color-search-app/

======
joshdotsmith
This is really awesome. For anyone who wants a naive, poor man's
implementation of this, I have an unfinished Ruby gem that's a good starting
point for you:
[https://github.com/JoshSmith/kaleidoscope](https://github.com/JoshSmith/kaleidoscope)

Here's how I got it to work (cribbed from my own README):

TL;DR: I used k-means clustering to segment a database of images into color
bins for quick searching.

Using imagemagick, I ran histograms on images and converted their top n most
frequent colors into L*a*b* color space for an approximate representation of
human vision.
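
For anyone following along, here is a minimal sketch of that conversion step in Python (this is not the gem's Ruby code). It assumes sRGB input and a D65 white point, and the names `hex_to_rgb` and `srgb_to_lab` are made up for illustration.

```python
def hex_to_rgb(hex_color):
    """Parse '#rrggbb' into an (r, g, b) tuple of 0-255 integers."""
    h = hex_color.lstrip('#')
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def srgb_to_lab(rgb):
    """Convert an sRGB (r, g, b) tuple to CIE L*a*b* (D65 white point assumed)."""
    def linearize(channel):
        # Undo the sRGB gamma curve.
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)

    # Linear RGB -> CIE XYZ using the standard sRGB/D65 matrix.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    # Reference white for D65.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))
```

White should come out near L = 100 with a and b near 0, and black near L = 0, which is a quick sanity check on the constants.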

Each color was then matched, by Euclidean distance, to the nearest color in a
user-defined set of "bins". The bins could be any array of RGB values of
arbitrary length.
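
The matching step can be sketched roughly like this in Python. It is an illustration, not the gem's code: `nearest_bin` is a hypothetical name, and it works on any 3-component tuples, so the colors and bins are assumed to have been converted to the same color space beforehand.

```python
import math


def nearest_bin(color, bins):
    """Return (closest_bin, distance) for a color given a list of bin colors.

    `color` and every entry of `bins` are 3-tuples in the same color space.
    """
    closest = min(bins, key=lambda candidate: math.dist(color, candidate))
    return closest, math.dist(color, closest)


# Hypothetical bins, given as (L, a, b)-style triples for illustration only.
example_bins = [(0, 0, 0), (100, 0, 0), (50, 50, 50)]
matched, distance = nearest_bin((90, 5, 5), example_bins)
```

`math.dist` needs Python 3.8 or later. Keeping the distance alongside the matched bin is what makes sorting by tolerance possible later.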

I then stored hexadecimal values of the image's original color and the matched
color, along with the frequency of that color within the image (for sorting
based on frequency) and the Euclidean distance (for sorting by tolerance).

Then finding images close to a certain color was as simple as
Photo.all.with_color('#993399') and order by frequency and Euclidean distance.
Here's a photo of the results:
[https://github-camo.global.ssl.fastly.net/89cc87ac84cd3a1d12...](https://github-camo.global.ssl.fastly.net/89cc87ac84cd3a1d1223e8f9d560e65eb8447ef6/687474703a2f2f636c2e6c792f696d6167652f336e3243313631373069306b2f53637265656e25323053686f74253230323031332d30322d30352532306174253230362e35362e3434253230504d2e706e67)
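
To make the storage and lookup idea concrete, here is a small in-memory sketch in Python. It is not how the gem does it (the gem's query is the Ruby call quoted above); the `ColorMatch` record and this `with_color` function are hypothetical stand-ins, and a real version would presumably live in a database table.

```python
from dataclasses import dataclass


@dataclass
class ColorMatch:
    photo_id: int
    original_hex: str   # color actually found in the image
    matched_hex: str    # bin color it was assigned to
    frequency: float    # share of the image covered by the original color
    distance: float     # Euclidean distance between original and bin color


def with_color(records, matched_hex, max_distance=None):
    """Return matches for one bin, most frequent first, then closest first."""
    wanted = matched_hex.lower()
    hits = [
        r for r in records
        if r.matched_hex.lower() == wanted
        and (max_distance is None or r.distance <= max_distance)
    ]
    hits.sort(key=lambda r: (-r.frequency, r.distance))
    return hits


# Made-up sample rows, for illustration only.
sample = [
    ColorMatch(1, '#8f2f95', '#993399', 0.20, 9.0),
    ColorMatch(2, '#9a359a', '#993399', 0.45, 2.5),
    ColorMatch(3, '#00ff00', '#33cc33', 0.80, 30.0),
    ColorMatch(4, '#a040a0', '#993399', 0.45, 12.0),
]
ordered_ids = [r.photo_id for r in with_color(sample, '#993399')]
```

Passing `max_distance` acts as the tolerance: a smaller value keeps only images whose color sits close to the bin.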

I might spend some time reverse-engineering Shutterstock's implementation,
since it sounds way better than mine and clearly works at scale. But for my
purposes, my own implementation worked just fine.

If you want help implementing it, feel free to reach out to me!

~~~
kevinh
clbecker, your account has been shadowbanned so your replies won't show up on
any posts. This is very unfortunate because it seems like you're the origin of
this post. So, make a new account or something.

Everyone else: If you want to see his comments, turn on the showdead option in
your profile.

------
darsham
Reminds me of TinEye's multicolr demo [0] that searches through CC-licensed
images on Flickr. Their multiple color feature was really nice (however, they
don't have Shutterstock's keyword filtering).

I wonder if anyone ever bought TinEye's color-search-engine-as-a-service [1].
The as-a-service model seems really awkward for something that requires so
much integration, and this new Shutterstock feature (developed from the ground
up) seems to confirm this.

[0] [http://labs.tineye.com/multicolr](http://labs.tineye.com/multicolr)

[1]
[http://services.tineye.com/MulticolorEngine](http://services.tineye.com/MulticolorEngine)

~~~
strebler
I'm really interested in that question too: did anyone buy TinEye's color
search service? I hope so.

Personally, I think the TinEye color results are better than Shutterstock's
approach, although having metadata alongside is definitely a must.

------
basseq
I think the next step for them is to cluster by similar images. A green hue,
for instance, shows a lot of similar-looking pictures of leaves—better to show
one picture and have a "show similar images" feature to dig into a finer level
of variance.

V. cool, though.

~~~
strebler
That'd be a good next post, color retrieval using solr.

------
frik
Airliners.net had something similar since 2005.

[http://www.airliners.net/similarity/](http://www.airliners.net/similarity/)

It was on Slashdot back then:
[http://tech.slashdot.org/story/05/05/04/2239224/searching-by...](http://tech.slashdot.org/story/05/05/04/2239224/searching-by-image-instead-of-keywords)

------
clbecker
Thanks to the Hacker News Gods, my account is no longer blocked on here. I'm
happy to answer any questions about the original blog post.

------
kfk
Is anybody thinking of applying this to a different stock photo model? I
mean, I have heard Shutterstock takes a big cut and pays very little to
photographers. Is nobody out there with these innovative ideas but with a
business plan that is friendlier to photographers?

------
carlob
The three sliders here correspond to averages of the three LCH channels. Has
anyone thought of looking into the second moments of those? That is, variances
and covariances, in order to select for high contrast in luminance or high
contrast in hue…
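
A rough Python sketch of what those second moments could look like, under some assumptions of mine: each pixel is an (L, C, H) tuple with H in degrees, population (not sample) moments are used, and hue spread is measured with circular variance because hue wraps around at 360. The name `lch_moments` is hypothetical.

```python
import math


def lch_moments(pixels):
    """Compute first and second moments for a list of (L, C, H) pixels."""
    n = len(pixels)
    mean_l = sum(p[0] for p in pixels) / n
    mean_c = sum(p[1] for p in pixels) / n

    # Population variances and covariance for lightness and chroma.
    var_l = sum((p[0] - mean_l) ** 2 for p in pixels) / n
    var_c = sum((p[1] - mean_c) ** 2 for p in pixels) / n
    cov_lc = sum((p[0] - mean_l) * (p[1] - mean_c) for p in pixels) / n

    # Hue is an angle, so use circular variance: 0 means one hue,
    # values near 1 mean hues spread around the color wheel.
    mean_sin = sum(math.sin(math.radians(p[2])) for p in pixels) / n
    mean_cos = sum(math.cos(math.radians(p[2])) for p in pixels) / n
    hue_circular_var = 1 - math.hypot(mean_sin, mean_cos)

    return {
        'mean_l': mean_l, 'mean_c': mean_c,
        'var_l': var_l, 'var_c': var_c, 'cov_lc': cov_lc,
        'hue_circular_var': hue_circular_var,
    }
```

A high `var_l` would flag images with strong luminance contrast, and a high `hue_circular_var` would flag images with contrasting hues.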

------
frik
"the prototype had over 20 sliders to control all the visual attributes"

I cannot read the slider labels, the screenshot is very low-res :(

~~~
clbecker
Sorry about that! The labels on the actual prototype were a bit vague and
undescriptive anyway. Basically, most of the sliders represented various
statistics taken over the histograms for each dimension in the LCH colorspace
(mean, median, stddev, etc.), and then there were a few magical sliders that,
to this day, I think only one engineer around here understands...

------
abvdasker
This is incredibly cool. Shutterstock has done a great job in creating a
beautiful product to address a fun problem. Kudos.

------
bpphillips
Neat! It'd be nice to see some more discernible screenshots of the prototypes
to compare with the final version.

------
iterable
this is dope. Shutterstock is bomb

