Hacker News new | past | comments | ask | show | jobs | submit login
Using python and k-means to find dominant colors in images (charlesleifer.com)
189 points by coleifer on Oct 24, 2012 | hide | past | web | favorite | 31 comments

Awesome, I used this technique to find the dominant color in favicons so I could create a gradient in NewsBlur.

See the effect here: http://cl.ly/KElb

Here's the Python I use to do it: https://github.com/samuelclay/NewsBlur/blob/master/utils/Ima...

    from PIL import Image
    import scipy
    import scipy.cluster
    from pprint import pprint

    image = Image.open('logo.png')

    # Convert image into array of values for each point.
    ar = scipy.misc.fromimage(image)
    shape = ar.shape

    # Reshape array of values to merge color bands.
    if len(shape) > 2:
        ar = ar.reshape(scipy.product(shape[:2]), shape[2])

    # Get NUM_CLUSTERS worth of centroids.
    codes, _ = scipy.cluster.vq.kmeans(ar, NUM_CLUSTERS)

    # Pare centroids, removing blacks and whites and shades of really dark and really light.
    original_codes = codes
    for low, hi in [(60, 200), (35, 230), (10, 250)]:
        codes = scipy.array([code for code in codes 
                             if not ((code[0] < low and code[1] < low and code[2] < low) or
                                     (code[0] > hi and code[1] > hi and code[2] > hi))])
        if not len(codes): codes = original_codes
        else: break

    # Assign codes (vector quantization). Each vector is compared to the centroids
    # and assigned the nearest one.
    vecs, _ = scipy.cluster.vq.vq(ar, codes)

    # Count occurences of each clustered vector.
    counts, bins = scipy.histogram(vecs, len(codes))

    # Show colors for each code in its hex value.
    colors = [''.join(chr(c) for c in code).encode('hex') for code in codes]
    total = scipy.sum(counts)
    color_dist = dict(zip(colors, [count/float(total) for count in counts]))

    # Find the most frequent color, based on the counts.
    index_max = scipy.argmax(counts)
    peak = codes[index_max]
    color = ''.join(chr(c) for c in peak).encode('hex')

I like this solution much better, here is why:

k-means is not a mode seeking algorithm, I think. You are clustering your color space, but you're not even guaranteed to end up with colors that are very close to those in your image. With a high k you're getting actual colors in the image, but they're not really dominant anymore.

What about mean shift? It's based on one method of non-parametric density estimation. Another method is this: Some sort of histogramming, which is another method of non-parametric density estimation.

Also other colors spaces will pay off immensely.

I love subtle flairs like this. It's something that a lot of people would only notice if removed and I think that's an indicator of good design.

Just wanted to say thanks for mentioning my book! I'm so happy that people are still getting value from it.

>I'm so happy that people are still getting value from it.

I took a look at it a few weeks ago based on a recommendation here on HN and have been enjoying it ever since.

I'm in the process of trying to extend some very basic programming skills and refresh / extend equally basic math.

I started reading expecting something that would assume enough to fly right over in my head. Instead, I found a comfortable pace and very "practical" examples.

Every few pages is a delightful "Ohhhh...So that's how that works." demonstration of things that I've always wondered and theorized about, but never understood the mechanics of.

It's a great book, one that I expect will end up being pretty significant to me.

I had bought the book a few months ago for work in a big batch, thinking ahead to the time I would need it. That time came a day ago when I needed to do clustering of markers for displaying on a map. A stackoverflow post mentioned the book, so I pulled it out and was wowed! Great stuff so far.

I like that the code is in python, which I find easier to read than the java examples in Algorithms of the Intelligent Web. Being a perl guy (and a perl shop), python feels pretty similar and it's easy to port anything I need. If the rest of the book is as good as the clustering chapter, it's going to be a fun treat for my brain!

just wanted to say that your book is what got me interested in data science and python. I would be in a very different place today if it wasn't for your book! So, thank you!

/metoo! devoured it and learned a lot. I recently dumped a bunch of technical books, but that is one that I kept and plan on keeping for a long time. It's got a timeless nature to it as a "working-reference" compared to only theoretical descriptions or any one specific implementation in "real-world" code.

It's a fantastic book! I read it years ago (I think shortly after it came out), and I still browse it from time to time. I really learned a lot reading it.

I thought I'd share some good links that came up in the reddit discussion:

Color theme generator for Xresources and more: https://gist.github.com/3946121

Reference to similar features in scipy: http://docs.scipy.org/doc/scipy/reference/cluster.vq.html

Old-school computer graphics w/kmeans: http://pragprog.com/magazines/2011-12/revisiting-graphics-ha...

There was also some discussion on using different color spaces than RGB to get better results, lab*, HSL, YUV, etc. And recommending using something like Numpy to make this shit hum.

Nice. I'm not sure though that the geometric distance between two RGB vectors is always a good measure of their perceived similarity. Converting to a device independent space and using a different metric could improve the results. See the following links for standard metrics:



I'm guessing this is a solved-many-times problem :) Here is my own version of the same thing:

http://tylerneylon.com/a/imghist/ (and source) https://github.com/tylerneylon/imghist/blob/master/imghist.p...

That also includes color histograms.

Thoughts on actually using this:

* If a human is using the color output, it's fun to weigh the colors by cluster sizes. * To find human-perspective dominant colors, it helps to throw out very-light/dark pixels. * Dribbble does a very good job of this - look at any of their individual photo pages. * For production use, C would be much more appropriate at solving this problem (but python is more fun to use).

Another cool application of color analysis like this:


A lot depends on what you are trying to achieve. In my case, for example, I wanted to group images that may have perceptually similar colours either as a dominant component or as an accent (say, a red shirt and a white shirt with red stitching).

To expand on your points, based on my own experience:

* Rather than using the centroid to represent the cluster, pick a representative peak from the hue histogram. This tends to make things less muddy.

* Not only do I throw away extreme S/L values, I aggressively weigh everything by S * (0.5 - |0.5 - L|), so bright, saturated colours dominate. Black and white are usually very thin slivers, and I almost never have grey.

* As a last step, I convert to Lab space and merge any colours that are perceptually similar (dE < 5.0). This means that large gradients (like an unevenly lit surface) only occupy one sliver of the pie. Unfortunately, in my current implementation this conflicts with my first point above, so the colours I end up with aren't always in the original image -- something I need to fix.

Here's my result for one of your images: https://dl.dropbox.com/s/azl2bag84riugg8/imghist.png, notice how the orange has a disproportionally large weight because it is so saturated.

There are a few things you should take into account:

1. You determine the 'dominant' colours to be the centroids of your clusters. The centroid is the mean of the points within the cluster, this mean is not necessarily a colour that is in your image. If you, for example, take a picture divided into four different solid coloured squares, and use this to find the 3 dominant colours it will average 2 (or more) colours. (The same might happen for more complex images with a a lot of contrast).

2. When randomly initializing k-means there is a good chance you'll find one of the local optima, so running it more than once will return different colours. In general it is good practice to run it several times and choose the outcome with the lowest cost.

3. K-means can take a long time to converge; limit the amount of iterations it can do.

These things aside, very cool usage of k-means on image data!

IRT #1 good point. Doing a "quick" nearest neighbor to the centroids would help with that I'd bet.

Cool stuff... Andrew Ng's excellent Machine Learning course on Coursera also had a programming exercise that involved using k-means to reduce an image's palette (in Octave, though, rather than Python), so if you find this interesting you might consider signing up for his course the next time it comes around:


I did this exact technique as a hobby project.

Your next assignment: find similarly coloured photos using the right data structure. See my stack overflow ticket http://stackoverflow.com/questions/10555511/need-a-proper-da...

Google Chrome implements a similar thing for picking colors to go with the thumbnails of your most visited websites: http://www.quora.com/Google-Chrome/How-does-Chrome-pick-the-...

K-means clustering is super cool. Myself and a friend used it a few months ago to filter moving things out of webcam video in JavaScript. It takes samples of video and puts the color of each pixel across all the recent samples into clusters, and discards the smaller cluster (for each pixel) because it was probably some object (like a person) moving across the frame without sticking around.

It’s not super performant, but the end result is awesome — after a few seconds, you and up with a picture what your webcam sees minus the moving things. For the curious:


(I haven’t touched the code in a while, save for fixing a bug just now)

I made my own implementation of this a while ago as well. One thing to keep in mind though is that the human eye doesn't perceive all colors uniformly. This means you can't treat the R,G,B spectrum as a 3-dimensional space, as the distance between two points in this space doesn't take the characteristics of the human eye into account.

The solution to this is to convert your RGB colors to the CIELab color space (http://en.wikipedia.org/wiki/Lab_color_space) and use the Lab values for your 3-dimensional space.

Cool, I did something like this for an e-commerce store so that I could programmatically sort new products into color bins. Scipy handled the kmeans stuff for me though.

One problem I ran into but never solved was ignoring the background color. For example in the second picture it might be more interesting to bring out the oranges of the tail-lights and the light-blues of the street lights, rather than just the dark blues of the roads. You could ignore the largest cluster, but that's not always necessarily the background color. Have you thought about this at all?

I myself did not think of this, but a thoughtful commenter on my site suggested that the cluster furthest from the center might be a good indicator of the background color, and the cluster nearest the center the foreground.

I did this and have a set with either white or close to white backgrounds so I filtered them out. Depending on your images you could also sample some points near each corner to determine which cluster is likely background.

I think this would work well for a lot of things, but I forgot to mention one strange caveat - there were a lot of bracelet/necklace type items, where the important colors might actually be at the corners and not in the center.

Just a thought: If it's a photograph, you could use the areas of highest contrast to identify and outline the shape of the subject - assuming the photograph was focused. This then becomes the working area for your color selection.

Either that, or be sure that the background is consistent every time you take the picture for sorting into the color bins. It may be cheaper than trying to create a catch-all solution.

It's funny that he uses Akira as an example, a movie that famously went over budget because it needed a much larger colour pallet than usual (for the night scenes).

A fun project might be to use a dominant color-finding algorithm on each frame of a movie in order to create a color time series (then do it on a corpus of movies). Then you could attempt to create a "Shazam for movies" that could be used to identify movies just by pointing your iPhone at a wall (in a dark room). That would hopefully use the changing brightness of the movies to read a time series and allow you to match it up. Easier said than done of course :)

Anyone know of a similar style article going over determining an optimal k value? I remember reading some papers on this in my honors days but when I was doing something similar recently I just used a max distance to determine whether something should join a current cluster or form a new one.

There are density based clustering algorithms which I believe help with this.

Here's a recent blog post of a very elegant k-means implementation in cloud haskell. It's part of the effort to bring an erlang-like, CSP style, DSL-based message passing to haskell.


very nice code, thanks for sharing!

Registration is open for Startup School 2019. Classes start July 22nd.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact