
Using imagemagick, awk and kmeans to find dominant colors in images - rubikscube
http://javier.io/blog/en/2015/09/30/using-imagemagick-and-kmeans-to-find-dominant-colors-in-images.html
======
neya
This is much easier using the LAB color space. I wrote an algorithm using the
LAB color space several years ago to achieve something similar:

[http://i.imgur.com/M6Oo6dp.jpg](http://i.imgur.com/M6Oo6dp.jpg)

Will open source it soon, perhaps.

------
bootload
There is an improvement that can be made here. There is no attempt to find the
brightest colour. If you look at the dark image, the brightest colours (red)
stand out over and above the dark colours. Exploiting this is an illustration
technique. Notice how the red stands out in one image, and a small blue line
in the other?

This particular algorithm, and others that I've seen, don't take this into
account. Depending on the image, you could also take lightness and darkness
into account to find the dominant colour, however small an area of the image
it covers.
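
One way to read the suggestion above is to score each candidate colour by its pixel share multiplied by a vividness bonus, so a small but vivid patch can beat a large dull area. The sketch below is only an illustration of that idea, not the article's method: `most_salient`, the `boost` parameter, and the use of RGB chroma (largest minus smallest channel) as a crude stand-in for vividness are all assumptions.

```python
def most_salient(palette_counts, boost=10.0):
    """Pick the colour with the best (pixel share x vividness) score.

    palette_counts maps (r, g, b) tuples with 0-255 channels to pixel counts.
    boost is a hypothetical tuning knob: 0 means "largest area wins".
    """
    total = sum(palette_counts.values())

    def score(item):
        (r, g, b), count = item
        # Crude vividness: 0.0 for greys, 1.0 for fully saturated bright colours.
        chroma = (max(r, g, b) - min(r, g, b)) / 255
        return (count / total) * (1 + boost * chroma)

    return max(palette_counts.items(), key=score)[0]


# A mostly dark image with a small pure-red patch.
counts = {(20, 20, 20): 900, (255, 0, 0): 100}
print(most_salient(counts))           # vividness-weighted choice
print(most_salient(counts, boost=0))  # plain "largest area" choice
```

How large `boost` should be is a matter of taste and would need tuning per image set.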

~~~
minikomi
Hmm. I wonder if converting to HSV first would change the results.

~~~
jacobolus
HSV is a terrible color model for any human purpose. (To be honest, for any
purpose whatsoever, unless it’s required for backwards-compatibility with
legacy systems.) HSV, like the RGB model it is a trivial derivative of, has
dimensions which are not closely correlated with any color attributes relevant
to human perception.

Instead, use a model such as CIECAM02, IPT, CIELAB (from the 70s but not too
bad), or Munsell (basically a big lookup table, from experiments done in the
40s).
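
For anyone who wants to try CIELAB without extra libraries, here is a minimal sketch of the standard sRGB to CIELAB conversion (D65 white point), plus the plain Euclidean colour difference in that space (often called CIE76). The function names are mine, and the matrix constants are the commonly published sRGB/D65 values, so treat this as a sketch rather than a reference implementation.

```python
def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB channels to CIELAB (L*, a*, b*), D65 white point."""
    def linearize(c):
        # Undo the sRGB transfer curve to get linear light in 0..1.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB -> CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # Reference white (D65).
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def delta_e76(rgb1, rgb2):
    """Euclidean distance between two sRGB colours, measured in CIELAB."""
    lab1, lab2 = srgb_to_lab(*rgb1), srgb_to_lab(*rgb2)
    return sum((p - q) ** 2 for p, q in zip(lab1, lab2)) ** 0.5


print(srgb_to_lab(255, 0, 0))
print(delta_e76((45, 40, 51), (93, 86, 100)))
```

Clustering on the `(L*, a*, b*)` triples instead of raw RGB is then a drop-in change for most k-means code.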

~~~
ThomPete
I am not sure what you are getting at here.

The models you propose are not what most designers, if any, use, as the
purpose is not to manually select the best color combinations (there are other
ways to do that) but to approach colors a little more structurally.

Whether you calibrate your perception to understand one model or the other
doesn't seem to be relevant.

I am glad to be taught something I didn't know, but for now it sounds more
like a theoretical claim than an actually useful one.

~~~
theoh
I think the parent was referring to the lack of properties like perceptual
uniformity in RGB or HSV colour spaces. Check out the "Advantages" section on
this Wikipedia page:
[https://en.m.wikipedia.org/wiki/Lab_color_space](https://en.m.wikipedia.org/wiki/Lab_color_space)

~~~
ThomPete
Yes, but in this context it seems wrong; maybe that's just me.

------
andrewgleave
I wrote a small Go tool[1] that extracts a color palette from an image using
either a median or a mean quantizer.

The quantizer code[2] is well documented and worth reading if you're looking
at doing this in Go.

[1] [https://github.com/andrewgleave/color-extract](https://github.com/andrewgleave/color-extract)

[2] [https://github.com/soniakeys/quant](https://github.com/soniakeys/quant)
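
For readers who don't read Go, here is a rough Python sketch of the classic median-cut idea, assuming that is what "median quantizer" refers to. It is not a port of the linked code: `median_cut` and its details (splitting the box with the widest channel range, averaging each box) are my own simplifications.

```python
def median_cut(pixels, n_colors):
    """Return up to n_colors palette entries from a list of (r, g, b) tuples."""
    def channel_range(box, ch):
        values = [p[ch] for p in box]
        return max(values) - min(values)

    def spread(box):
        # Widest extent of the box along any of the three channels.
        return max(channel_range(box, ch) for ch in range(3))

    boxes = [list(pixels)]
    while len(boxes) < n_colors:
        # Only boxes holding at least two distinct colours can be split.
        candidates = [b for b in boxes if len(b) > 1 and spread(b) > 0]
        if not candidates:
            break
        box = max(candidates, key=spread)
        boxes.remove(box)
        # Sort along the widest channel and cut at the median pixel.
        ch = max(range(3), key=lambda c: channel_range(box, c))
        box.sort(key=lambda p: p[ch])
        mid = len(box) // 2
        boxes += [box[:mid], box[mid:]]

    # Represent each box by the mean of its pixels.
    return [tuple(sum(c) / len(b) for c in zip(*b)) for b in boxes]


pixels = [(0, 0, 0)] * 5 + [(255, 255, 255)] * 5
print(median_cut(pixels, 2))
```

Unlike k-means there is no iteration to converge, which makes the running time easy to predict.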

------
w00kie
I was expecting a step by step decomposition of the process. I'm a bit
disappointed.

------
dxbydt
To find k dominant colors using kmeans and then replace all the colors with
their closest dominant color, try this -
[http://github.com/krishnanraman/colorquantization](http://github.com/krishnanraman/colorquantization)

50 lines of Scala with lots of comments. I threw in some example images as
well.
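
The "replace every colour with its closest dominant colour" step is small enough to sketch in a few lines. This is not taken from the linked Scala repository; `nearest` and `quantize` are hypothetical names, and plain squared RGB distance is assumed as the closeness measure.

```python
from collections import Counter


def nearest(color, palette):
    """Return the palette entry closest to color (squared Euclidean distance)."""
    return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(color, c)))


def quantize(pixels, palette):
    """Map every pixel to its closest palette colour."""
    return [nearest(p, palette) for p in pixels]


palette = [(0, 0, 0), (255, 0, 0)]
pixels = [(10, 10, 10), (250, 5, 5), (30, 0, 0)]
result = quantize(pixels, palette)
print(result)
# Counting the mapped pixels gives each dominant colour's share of the image.
print(Counter(result).most_common())
```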

------
brute
Does the while (1) {...} loop always terminate? I am not entirely sure about
it and would like to hear some opinions. What if the first guess is already
the best possible solution? Could there be any pictures that cause trouble,
e.g. ones that consist of only 4 (equally spaced) colors?
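
On termination in general: in the textbook k-means loop (Lloyd's algorithm), no pass can increase the total squared distance, and there are only finitely many ways to assign pixels to clusters, so with consistent tie-breaking the assignment eventually repeats and the loop can stop. I have not checked whether the article's awk script tests for exactly this. The sketch below is my own Python version, with an iteration cap as a safety net in case a stopping test based on floating-point centroid movement misbehaves; `kmeans` and `max_iter` are names I made up.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on tuples of numbers.

    Stops when the assignment no longer changes, or after max_iter passes.
    Returns (centroids, passes_used).
    """
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assign each point to the nearest centroid (ties go to the lowest index).
        new_assignment = [
            min(range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            for p in points
        ]
        if new_assignment == assignment:
            return centroids, iteration
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # an empty cluster keeps its old centroid
                centroids[i] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centroids, max_iter


# The worrying case: only four distinct colours, and the first guess is already exact.
four = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
print(kmeans(four * 10, four))
```

With a perfect first guess the second pass sees an unchanged assignment and returns, so that case is the cheapest one rather than a problematic one.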

------
madsravn
I read the same article as the guy and was also discouraged by using PIL.
Python 2 vs Python 3 just makes life hard for people not used to Python. Maybe
someone should do something about that.

I just went ahead and implemented the code in C++ instead.

~~~
creshal
It's mostly a solved problem nowadays: Pillow exists as a much better,
Python 3-compatible fork of PIL.

~~~
madsravn
I wasn't talking about just PIL - I was talking about Python 2 vs Python 3 in
general.

EDIT: And I'm not talking about development only. I'm talking about
downloading small scripts and being able to run them seamlessly as well.

~~~
JupiterMoon
Just pick one and use it. If it is a long project, pick Python 3. If you
actually use Python, you'll barely notice the difference.

EDIT If the script uses a well formed #! this isn't too much of a problem
either.

~~~
creshal
> EDIT If the script uses a well formed #! this isn't too much of a problem
> either.

He does have a point, though: Dependency management in Python is a pain in the
ass. Virtualenvs aren't really a solution, distribution packages are usually
horribly outdated, …

------
jlhonora
Couldn't this be achieved with imagemagick's builtin blur + histogram?

~~~
e12e
Don't know if this is the best way to do it, but reading a bit[1] as well as
looking at comments above about color spaces, I came up with:

    
    
      #shell:
      for space in sRGB RGB HSV LAB
      do
        # I think it should be possible to do this without writing
        # tiff-images to disk in-between, but having a look at the
        # resulting images next to the original is actually quite nice;
        # it gives some idea of the difference the colorspace makes:
        convert akira_800x800.jpg -quantize "$space" +dither \
               -colors 4 "akira_lab_$space.tiff"
    
        echo "Histogram in $space colorspace:"
    
        convert akira_lab_$space.tiff -format %c histogram:info:-
    
        echo
      done
    

Output:

    
    
      Histogram in sRGB colorspace:
      123498: ( 5811, 4253, 6632) #16B3109D19E8 srgb(8.86702%,6.48966%,10.1198%)
      110248: (17780,16101,19392) #45743EE54BC0 srgb(27.1305%,24.5686%,29.5903%)
       47520: (34608,35255,34146) #873089B78562 srgb(52.8084%,53.7957%,52.1035%)
       47534: (47890,19471,10567) #BB124C0F2947 srgb(73.0755%,29.7108%,16.1242%)
    
      Histogram in RGB colorspace:
      182611: ( 45, 40, 51) #2D2833 srgb(45,40,51)
       76221: ( 93, 86,100) #5D5664 srgb(93,86,100)
       36583: (139,119,124) #8B777C srgb(139,119,124)
       33385: (218,125, 74) #DA7D4A srgb(218,125,74)
    
      Histogram in HSV colorspace:
       94957: ( 30, 13, 51) #1E0D33 srgb(30,13,51)
      128709: ( 56, 52, 80) #383450 srgb(56,52,80)
       37291: ( 60, 31, 10) #3C1F0A srgb(60,31,10)
       67843: ( 78,155, 87) #4E9B57 srgb(78,155,87)
    
      Histogram in LAB colorspace:
      133802: ( 42, 41, 60) #2A293C srgb(42,41,60)
       74332: ( 51, 72, 87) #334857 srgb(51,72,87)
       42436: ( 88, 33, 27) #58211B srgb(88,33,27)
       78230: (151,117, 89) #977559 srgb(151,117,89)
    

In fact, it became a bit of an obsession, and the hack evolved a bit:

[https://gist.github.com/e12e/7990e56f48ceff5506d7](https://gist.github.com/e12e/7990e56f48ceff5506d7)

I'm not sure if it actually does the same thing as the script in the post -
but here's a sample output:

[http://htmlpreview.github.io/?https://gist.githubusercontent...](http://htmlpreview.github.io/?https://gist.githubusercontent.com/e12e/7990e56f48ceff5506d7/raw/d5d65b3ea49d09a9234b94c2e43d6263b6524cbf/index.html)

Maybe someone with more knowledge of imagemagick can improve on the pipeline
etc.

[ed: Just noticed that the sRGB output for "RGB" values is different enough
that my script doesn't consider them to be RGB values (well, they're not) - so
the line for sRGB is blank in the html. Still think it's interesting to see
the difference between RGB/HSV/LAB.]
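
One possible way around the blank sRGB row: ignore the `srgb(...)` field and read the hex field instead, which appears in both the 8-bit (`#2D2833`) and 16-bit (`#16B3109D19E8`) lines shown above. This Python sketch is not what the gist does; `parse_histogram_line` is a hypothetical helper, and it assumes the line layout matches the sample output in this comment.

```python
import re

# count, then the parenthesised tuple (ignored), then the hex colour
_LINE = re.compile(r"^\s*(\d+):\s*\([^)]*\)\s+#([0-9A-Fa-f]+)")


def parse_histogram_line(line):
    """Return (pixel_count, (r, g, b)) with 8-bit channels, or None if no match."""
    m = _LINE.match(line)
    if m is None:
        return None
    count, hex_digits = int(m.group(1)), m.group(2)
    # 6 or 8 hex digits: 8 bits per channel; otherwise assume 16 bits per channel.
    width = 2 if len(hex_digits) in (6, 8) else 4
    max_value = 16 ** width - 1
    channels = [int(hex_digits[i:i + width], 16) for i in range(0, 3 * width, width)]
    # Rescale to 0..255 so both formats end up comparable.
    rgb = tuple(round(c * 255 / max_value) for c in channels)
    return count, rgb


print(parse_histogram_line(
    "  123498: ( 5811, 4253, 6632) #16B3109D19E8 srgb(8.86702%,6.48966%,10.1198%)"))
print(parse_histogram_line("  182611: ( 45, 40, 51) #2D2833 srgb(45,40,51)"))
```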

[ed2: Changing the number of colors to 3, to better compare with op, op's
algorithm clearly chooses different colors. Not sure which is "best", but just
FYI]

[1]
[http://www.imagemagick.org/Usage/quantize/#extract](http://www.imagemagick.org/Usage/quantize/#extract)

