
How to implement search-by-color when all you have is a good coffee - helloiloveyou
https://www.mikealche.com/software-development/how-to-implement-search-by-color-when-all-you-have-is-a-good-coffee
======
bobbyi_settv
Using the distance of the colors in a Euclidean space makes sense, but RGB
isn't the best space to use.

You probably want something like CIELAB which is (quoting from Wikipedia)
"designed so that the same amount of numerical change... corresponds to
roughly the same amount of visually perceived change."

[https://en.m.wikipedia.org/wiki/CIELAB_color_space](https://en.m.wikipedia.org/wiki/CIELAB_color_space)
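Here is a minimal sketch of that suggestion in Python. It assumes the pixels are 8-bit sRGB with a D65 white point, which is the usual default when an image carries no other profile. It converts to CIELAB using the standard sRGB and CIE formulas, then takes the plain Euclidean distance between the results, which is the CIE76 colour difference. The function names are illustrative.

```python
import math

# D65 reference white (2 degree observer), scaled so that Y = 1.0.
XN, YN, ZN = 0.95047, 1.00000, 1.08883

def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def _f(t: float) -> float:
    """CIELAB companding function (cube root with a linear toe)."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

def srgb_to_lab(rgb: tuple[int, int, int]) -> tuple[float, float, float]:
    """Convert an 8-bit sRGB triple to CIELAB (D65)."""
    r, g, b = (srgb_to_linear(v / 255) for v in rgb)
    # Linear sRGB -> CIE XYZ using the sRGB (D65) matrix.
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    fx, fy, fz = _f(x / XN), _f(y / YN), _f(z / ZN)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e76(lab1, lab2) -> float:
    """CIE76 colour difference: Euclidean distance in L*a*b*."""
    return math.dist(lab1, lab2)
```

The Lab values are still three numbers per colour. That means they could be stored and queried the same way as the RGB triples, just with perceptually more meaningful distances.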

~~~
mkagenius
Interesting. Perhaps this information is not available with the images? Or
is it?

The conversion between RGB and CIELAB is almost meaningless (according to
Wikipedia [1]), or perhaps there is still a way to do it that is helpful?

[1]
[https://en.wikipedia.org/wiki/Color_space#Absolute_color_space](https://en.wikipedia.org/wiki/Color_space#Absolute_color_space)
: "However, in general, converting between two non-absolute color spaces (for
example, RGB to CMYK) or between absolute and non-absolute color spaces (for
example, RGB to L*a*b*) is almost a meaningless concept."

~~~
tveita
Image formats like JPEG or PNG typically have a color profile embedded in the
metadata, or if they don't you can assume sRGB.

------
rozab
This seems like a really fun problem and I think you found the solution with
the best cost/effort ratio. The first thing that sprang to mind for me on
seeing the problem was to use the fact that products often come in several colour
variations but use very similar images for each. If you mask out the parts of
the image that are similar you can get the single 'actual' colour as it might
be labeled by a human. This would also allow you to search for neutral tones
like grey and black.
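One way to sketch the masking idea assumes you have two photos of the same product in different colour variants, already aligned pixel for pixel. That is a big assumption for real catalogue shots. Pixels whose RGB distance exceeds a threshold are treated as the "variant" region and averaged. The function name and the threshold value are hypothetical.

```python
import math

def variant_colour(image_a, image_b, threshold=40.0):
    """Average colour of image_a over the pixels where it differs from image_b.

    image_a, image_b: equal-length lists of (r, g, b) tuples, assumed aligned.
    threshold: hypothetical RGB distance above which a pixel counts as changed.
    Returns None when no pixel differs enough.
    """
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    # Keep only the pixels of image_a that change between the two variants.
    changed = [pa for pa, pb in zip(image_a, image_b) if math.dist(pa, pb) > threshold]
    if not changed:
        return None
    n = len(changed)
    return tuple(round(sum(p[i] for p in changed) / n) for i in range(3))
```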

And seeing as you've gone to all the trouble of figuring out how to grab
dominant colours from images, why not allow users to upload an image of their
logo and grab the dominant colour from that? This could help avoid the problem
of having noticeably different shades, without making the user do anything
technical like searching with an RGB value and a threshold.
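Grabbing a dominant colour from an uploaded logo could be as simple as bucketing the pixels and averaging the most populated bucket. The sketch assumes the image has already been decoded into a list of (r, g, b) tuples, for example with Pillow's `img.convert("RGB").getdata()`. The bucket size is an arbitrary choice. In practice you would probably also discard near-white background pixels first.

```python
from collections import Counter, defaultdict

def dominant_colour(pixels, bucket_size=32):
    """Return the average colour of the most populated RGB bucket.

    pixels: iterable of (r, g, b) tuples with 8-bit channels.
    bucket_size: width of each quantisation bucket per channel (arbitrary default).
    """
    counts = Counter()
    sums = defaultdict(lambda: [0, 0, 0])
    for r, g, b in pixels:
        # Similar shades fall into the same coarse bucket.
        key = (r // bucket_size, g // bucket_size, b // bucket_size)
        counts[key] += 1
        s = sums[key]
        s[0] += r
        s[1] += g
        s[2] += b
    if not counts:
        raise ValueError("no pixels")
    key, n = counts.most_common(1)[0]
    # Average the actual pixel values in the winning bucket.
    return tuple(round(c / n) for c in sums[key])
```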

------
julian37
You'll get better results using a perceptually uniform color space [1]

CIE76 is not ideal but would let you continue to use CUBE.

You could consider doing a first pass with CIE76 and then sort by something
more accurate in a second pass.

[1]
[https://en.wikipedia.org/wiki/Color_difference](https://en.wikipedia.org/wiki/Color_difference)
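Here is a sketch of that two-pass idea, assuming the colours are already stored as CIELAB triples. The first pass is a cheap CIE76 radius filter, the kind of Euclidean query the cube extension can answer. The second pass re-ranks the survivors with CIE94 using the graphic-arts weights. CIE94 is a swapped-in choice here for brevity; CIEDE2000 would be the more accurate but much longer option. The radius, the limit and the function names are hypothetical.

```python
import math

def delta_e76(lab1, lab2):
    """CIE76: plain Euclidean distance in L*a*b*."""
    return math.dist(lab1, lab2)

def delta_e94(ref, sample):
    """CIE94 difference with graphic-arts weights (kL = kC = kH = 1)."""
    l1, a1, b1 = ref
    l2, a2, b2 = sample
    c1, c2 = math.hypot(a1, b1), math.hypot(a2, b2)
    dl, dc = l1 - l2, c1 - c2
    # Hue difference squared; clamp tiny negative rounding errors to zero.
    dh_sq = max((a1 - a2) ** 2 + (b1 - b2) ** 2 - dc ** 2, 0.0)
    sc, sh = 1 + 0.045 * c1, 1 + 0.015 * c1
    return math.sqrt(dl ** 2 + (dc / sc) ** 2 + dh_sq / sh ** 2)

def two_pass_search(query_lab, candidates, radius=30.0, limit=10):
    """candidates: dict of id -> Lab tuple. radius: hypothetical CIE76 cut-off."""
    # Pass 1: cheap Euclidean filter.
    shortlist = [cid for cid, lab in candidates.items()
                 if delta_e76(query_lab, lab) <= radius]
    # Pass 2: re-rank the survivors with the more perceptual CIE94 formula.
    shortlist.sort(key=lambda cid: delta_e94(query_lab, candidates[cid]))
    return shortlist[:limit]
```

CIE94 is not symmetric. Here the query colour is treated as the reference, so chroma differences around a saturated query are weighted down. The re-ranking can therefore reorder results that CIE76 alone would rank differently.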

------
aasasd
> _I just took the euclidean distance between the given color and (0,255,0)_

If you used HSL, you possibly could match on the hue while being more lax on
the brightness and saturation—so a pastel-pink color could still match to a
red image. Not sure if the database can use different weights for positions in
a vector (though with pre-processing S and L could be collapsed to narrower
ranges, so they match more freely afterwards). Just need to make sure that
white and black aren't matched to random hue values.

Though, I guess if you use only the distance to fixed chosen high-saturation
values then it's about the same thing.

Except it seems that with just RGB distance, a semi-saturated cyan or yellow
color might be counted as closer to ‘pure green’ than dark-green. Something
like (0,255,180) vs (0,70,0).
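The arithmetic behind that example is easy to check with plain Euclidean distance in RGB:

```python
import math

pure_green = (0, 255, 0)
cyan_ish = (0, 255, 180)   # semi-saturated colour from the example above
dark_green = (0, 70, 0)

print(math.dist(pure_green, cyan_ish))    # 180.0
print(math.dist(pure_green, dark_green))  # 185.0
```

So the cyan-leaning colour comes out closer to pure green than the dark green does.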

Also:

> _values( '(${ color })'_

2020 and still no placeholders.
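For reference, here is a parameterised version of that kind of insert. The sketch uses Python's built-in `sqlite3` module as a stand-in. SQLite has no `cube` type, so the colour is stored as three integer columns, and the table is hypothetical. Postgres drivers offer the same feature, e.g. `$1` placeholders in node-postgres or `%s` in psycopg2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colours (name TEXT, r INTEGER, g INTEGER, b INTEGER)")

def insert_colour(name, rgb):
    # The ? placeholders pass values separately from the SQL text,
    # so a hostile `name` is stored as data instead of being executed.
    conn.execute("INSERT INTO colours (name, r, g, b) VALUES (?, ?, ?, ?)", (name, *rgb))

insert_colour("green", (0, 255, 0))
insert_colour("x'); DROP TABLE colours; --", (1, 2, 3))
print(conn.execute("SELECT COUNT(*) FROM colours").fetchone()[0])  # 2
```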

~~~
ajnin
I'm not sure HSL would be better in that case. For unsaturated colors (black
to grey to white) HSL has many points that have perceptually the same color
but different HSL values. If S is zero, then you have the same color for any
value of H so you might get a large distance for two points that are actually
white for example. At least when using Cartesian coordinates like he is here.

~~~
aasasd
Yes, as I noted:

> _Just need to make sure that white and black aren't matched to random hue
> values._

Colors with too small saturation can just be excluded from the search, since
gray/white/black go with anything.
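Here is a minimal sketch of hue-based matching along those lines. It uses Python's `colorsys`, which returns HLS (hue, lightness, saturation). Candidates with low saturation or extreme lightness are rejected because their hue is meaningless. Hue differences wrap around the colour wheel. All thresholds are hypothetical.

```python
import colorsys

def hue_match(query_rgb, candidate_rgb, max_hue_diff=30.0,
              min_saturation=0.2, lightness_range=(0.1, 0.9)):
    """True if the candidate's hue is within max_hue_diff degrees of the query's.

    Near-grey, near-black and near-white candidates are excluded outright,
    since their hue carries no useful information. Thresholds are hypothetical.
    """
    qh, _, _ = colorsys.rgb_to_hls(*(c / 255 for c in query_rgb))
    ch, cl, cs = colorsys.rgb_to_hls(*(c / 255 for c in candidate_rgb))
    lo, hi = lightness_range
    if cs < min_saturation or not (lo <= cl <= hi):
        return False
    diff = abs(qh - ch) * 360
    diff = min(diff, 360 - diff)  # hue is circular: 350 and 10 degrees are 20 apart
    return diff <= max_hue_diff
```

With these settings a pastel pink such as (255, 182, 193) matches pure red, while white, black and green do not.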

------
Angostura
I'm only marginally technical, but I would like to congratulate the author on
a great example of how to explain the issues, the sequence of problem solving,
and the solutions so clearly and entertainingly. Very good writing.

~~~
helloiloveyou
Thank you very much!! I rewrote the article a couple of times

------
kinkrtyavimoodh
This might be a dumb question, but what does coffee have to do with this task?

~~~
ctack
It's trying to say that you don't need a lot of time to do this.

I literally skimmed the article while drinking a coffee.

------
LoSboccacc
The palette approach is a little reductive. We've built a solution in this
space. A more robust approach is to get the dominant colour and then, from that
colour's brightness and tone, extract the accent colour and the contrast colour.
Pick them by considering both their distance from the dominant colour and their
extent in the image. That solves the problem of a weighted palette ignoring,
say, a red stitching because it's too small.
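Here is a rough sketch of that kind of scoring. It assumes you already have a weighted palette, i.e. each colour plus the fraction of the image it covers. The dominant colour is the largest entry. The accent is the colour whose hue is furthest from the dominant, weighted by its saturation. The contrast colour is the one whose lightness is furthest from the dominant. Both scores use the square root of coverage so a small detail like stitching is not swamped. The weighting scheme is hypothetical, not the commenter's actual method.

```python
import colorsys
import math

def _hls(rgb):
    return colorsys.rgb_to_hls(*(c / 255 for c in rgb))

def describe_palette(palette):
    """palette: list of ((r, g, b), coverage) pairs, coverage in [0, 1].

    Returns (dominant, accent, contrast); accent and contrast are None if the
    palette has a single colour. The scoring is a hypothetical illustration.
    """
    dominant = max(palette, key=lambda p: p[1])[0]
    dom_h, dom_l, dom_s = _hls(dominant)
    others = [p for p in palette if p[0] != dominant]
    if not others:
        return dominant, None, None

    def accent_score(entry):
        rgb, coverage = entry
        h, _, s = _hls(rgb)
        gap = abs(h - dom_h) * 360
        gap = min(gap, 360 - gap) / 180  # 0 = same hue, 1 = opposite hue
        if dom_s < 0.1:                   # neutral dominant: any hue stands out
            gap = 1.0
        return s * gap * math.sqrt(coverage)

    def contrast_score(entry):
        rgb, coverage = entry
        return abs(_hls(rgb)[1] - dom_l) * math.sqrt(coverage)

    accent = max(others, key=accent_score)[0]
    contrast = max(others, key=contrast_score)[0]
    return dominant, accent, contrast
```

Consider a navy product with a grey zip, a white logo and a thin red stitching. With this scoring the red comes out as the accent and the white as the contrast colour, even though the red covers only a few percent of the image.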

~~~
ComodoHacker
>extract the accent colour and the contrast colour

That wouldn't work for products with a poorly chosen main color (which aren't
rare) that nonetheless have the right non-dominant color for the customer's brand.

~~~
LoSboccacc
The opposite, really: measuring tone contrast and luminosity contrast from the
background average separately, and extracting both, is more robust for goods
that have patterns, e.g.
[https://www.seven.eu/it_it/seven/zaini.html](https://www.seven.eu/it_it/seven/zaini.html)

It also works both ways, so white elements that have a color accent and a dark
contrast element still have them recognized for what they are.

------
njudah
I wrote a tutorial a while back on how to do this in Postgres using the Google
Cloud Vision API, in case it's useful for anyone.

[https://medium.com/@adamgross_6978/getting-started-with-json-and-postgres-782cba93c706](https://medium.com/@adamgross_6978/getting-started-with-json-and-postgres-782cba93c706)

------
crawdog
I like the color palette option you provided showing the primary colors. Adding
the colors as facet options would be interesting since you can align each of
the colors with the items, then maybe a numeric metric you can use to sort the
results by the percent coverage of that color.
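Here is a small sketch of colour facets with coverage-based sorting. It assumes each product has already been reduced to named colours, each with the share of the image it covers. The catalogue, the names and the coverage floor are made up for illustration.

```python
from collections import Counter

# Hypothetical catalogue: product -> {colour name: share of the image covered}.
products = {
    "mug": {"red": 0.6, "white": 0.4},
    "cap": {"blue": 0.7, "red": 0.1, "white": 0.2},
    "tote": {"red": 0.3, "black": 0.7},
}

def facet_counts(items, min_coverage=0.05):
    """How many products show each colour above a (hypothetical) coverage floor."""
    return Counter(colour
                   for palette in items.values()
                   for colour, cov in palette.items() if cov >= min_coverage)

def filter_by_colour(items, colour):
    """Names of products containing `colour`, highest coverage first."""
    hits = [(name, palette[colour]) for name, palette in items.items() if colour in palette]
    return [name for name, _ in sorted(hits, key=lambda h: h[1], reverse=True)]
```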

Another item to consider is also taking into account how the user might search
- "red shoes", then classify the query as "color: red" & "shoes".
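Query classification could start as a simple vocabulary lookup. The colour list below is a hypothetical placeholder. A real one would presumably reuse the colour names the index is built on.

```python
import re

# Hypothetical colour vocabulary.
COLOUR_WORDS = {"red", "green", "blue", "black", "white", "grey", "gray", "pink"}

def classify_query(query):
    """Split a free-text query into colour facets and remaining keywords."""
    tokens = re.findall(r"[a-z]+", query.lower())
    colours = [t for t in tokens if t in COLOUR_WORDS]
    terms = [t for t in tokens if t not in COLOUR_WORDS]
    return {"color": colours, "terms": terms}
```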

The next fun challenge is determining which image to display if you have a
variant product (color/size combination for clothing for example). Figuring
out which product image to show in the result set requires identifying the
primary color in the search.

------
jbmsf
Always happy to learn about more good ways to use PostgreSQL!

------
maurits
Mike Bostock (the d3 creator) has a nice little demo illustrating the use of
linear RGB to assign an average color to an image. [1]

[1] [https://observablehq.com/@mbostock/image-average](https://observablehq.com/@mbostock/image-average)
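The idea, as I understand it, is that averaging should happen on linear light rather than on gamma-encoded sRGB values. Below is a Python sketch of that, assuming 8-bit sRGB input and the standard sRGB transfer functions. It is not Bostock's code.

```python
def srgb_to_linear(v: int) -> float:
    """8-bit sRGB channel -> linear light in [0, 1]."""
    c = v / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c: float) -> int:
    """Linear light in [0, 1] -> 8-bit sRGB channel."""
    c = 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
    return round(min(max(c, 0.0), 1.0) * 255)

def average_colour(pixels):
    """Mean colour of a list of (r, g, b) tuples, averaged in linear RGB."""
    n = len(pixels)
    return tuple(
        linear_to_srgb(sum(srgb_to_linear(p[i]) for p in pixels) / n)
        for i in range(3)
    )
```

Averaging pure black and pure white this way gives a grey of about 188 per channel. A naive average of the encoded values would give 127.5.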

------
thdrdt
Vibrant.js [1] also does this.

It is nice to look into the source code to see how it is done.

But in my experience it works around 80% of the time because there are so many
exceptions.

[1] [https://jariz.github.io/vibrant.js/](https://jariz.github.io/vibrant.js/)

------
snemvalts
There's a classic injection attack when interpolating values into the SQL
query.

Also, as others have mentioned, RGB space and human-perceived color space are
different enough that distance in RGB doesn't equate to difference/similarity
between colors.

~~~
helloiloveyou
No, it isn't. What is interpolated into the string comes from an object
containing a predefined set of strings. I don't allow the user to type anything
they want. I didn't want to clutter the tutorial.

------
tomcam
Impressive. I wasn't expecting that approach! The whole thing is implemented as
a stored procedure in PostgreSQL, using a data type (cube) I didn't even know
existed. That was a wild ride for me.

------
kelvin0
Great article. However in my experience, using the Euclidean distance in RGB
space can bring about some really weird visual 'matches'. The best color space
I've found so far for this type of task is the L*a*b* color space.

And there are also other tricks of the trade. But for starters this simple
solution might be enough.

This solution also assumes the colors in the images are 'calibrated' and
consistent. This is almost never the case, but it may be overlooked for the
given use case.

------
alphachloride
I don't believe coffee (good or bad) contributed to this article.

~~~
saagarjha
I wouldn’t be surprised if coffee _did_ contribute to this article.

------
yoava
Good article, and I like the implementation.

I am sure it has some drawbacks, like distance in RGB space may not be the
best option, or that it does not ignore the background color if it is not
transparent. Still, I like the way of thinking.

------
raldi
What does the title refer to?

------
lmilcin
I really like this title: "Can we implement something easier than a
Convolutional Neural Network?"

Nice to see people still remember problems can be solved with something other
than, ehm, "AI".

------
ponker
I thought this was going to be about how coffee is a consistent color so you
can white balance your customers' uploads by asking them to photograph coffee
in their ambient lighting.

~~~
helloiloveyou
OP here, this is amazing, I love it!

------
shalmanese
xkcd did a perceptual color survey a few years ago:
[https://blog.xkcd.com/2010/05/03/color-survey-results/](https://blog.xkcd.com/2010/05/03/color-survey-results/)

It's specific to the population of people who read xkcd, but it's an
interesting starting point if you want higher-resolution info on, e.g., exactly
where people draw the line between "green" and "blue".

Raw data is here:
[http://xkcd.com/color/colorsurvey.tar.gz](http://xkcd.com/color/colorsurvey.tar.gz)

------
ComodoHacker
Nice write-up, thank you.

Nitpick: what's the point of writing a good guide with rich text and images
but showing the end result only in a video? I have embedded YouTube videos
disabled by default, so it's a little embarrassing.

~~~
klodolph
Why is this _embarrassing?_ That is incomprehensible to me. You’ve disabled
embedded YouTube videos and should not be surprised when content is missing
from webpages. Video is the perfect choice for showing off the results, video
distribution is hard, and YouTube embedding makes this fairly easy.

