There are many really effective algorithms in this field. A Google search for the major research conferences (ICCV, ICIP, SIGGRAPH, etc.) will give you the latest and greatest of these algorithms. You'll also find good image segmentation work if you limit the search to csail.mit.edu.
If you want your uploader to help you, you can also go for one of the supervised image segmentation algorithms. Otherwise you'll need an unsupervised algorithm.
Given that the authors also want to detect what is in the image, this might be helpful for them:
This guy is also doing some great work in that field:
The really crazy thing is that they seem to be reinventing the wheel so that they can lean on the optimized code a graphics library provides. I think that's flawed thinking to begin with. In my experience with graphics programming, a reasonably optimized direct implementation tends to beat the hell out of a nine-step filter chain, no matter how good your graphics library. Even if they used the same algorithm, a direct implementation could condense steps 2, 3, and 4 into a single convolution [edit: whoops, no you can't; Sobel has a non-convolution step].
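To illustrate the general point (a sketch of mine, not their pipeline): successive linear filters can be pre-combined into a single kernel, since convolution is associative; it's only nonlinear steps like Sobel's magnitude that break the fusion.

    # Two successive linear filter passes equal one pass with a pre-combined
    # kernel, because (full) convolution is associative. Sobel's nonlinear
    # magnitude step is what breaks this. Illustrative sketch, not Lyst's code.
    import numpy as np
    from scipy.signal import convolve2d

    img = np.random.rand(64, 64)                 # stand-in grayscale image
    box = np.ones((3, 3)) / 9.0                  # box blur kernel
    sobel_x = np.array([[1, 0, -1],
                        [2, 0, -2],
                        [1, 0, -1]], float)      # the linear part of Sobel-x

    two_passes = convolve2d(convolve2d(img, box, mode="full"), sobel_x, mode="full")
    one_pass = convolve2d(img, convolve2d(box, sobel_x, mode="full"), mode="full")
    assert np.allclose(two_passes, one_pass)     # same result, one pass over the image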
But more importantly, they've tied their hands by limiting themselves to a tiny set of operations. Combing the computer vision literature would have been a better use of time than trying to chain filters together.
In fact, some background subtraction algorithms effectively do as an intermediate step what Lyst wanted as an end result.
Actually I disagree with you, even though what you say is correct in principle. On any given day, the vast majority of my workload revolves around working with other people’s code. So I may spend 7 hours trying to get a new library to compile or figure out why I’m getting an exception or how to get out of dependency hell and only 1 hour getting “real work done”.
For me, it's exhausting to find a new library or API that loosely does what I need, only to discover that I have to install a new language, new framework, new compiler, or even new package manager to use it. Developers have a tendency to copy other developers (even when the "normal" way of doing things is not ideal), so many libraries have no binary that I can test, and in fact no example of usage or up-to-date documentation.
Then there are subtleties with new libraries such as speed or memory usage. So perhaps a library that does exactly what you want runs at 1 frame every N seconds while the highly optimized function in a mainstream graphics library runs at many hundreds or even thousands of frames per second by utilizing concurrency or the graphics card.
So in fact, when all is said and done, I tend to think more in transformations. I ask myself whether I can express a solution slightly differently so that I can use an existing tool, then encapsulate it in a black box that has the same inputs and outputs as my ideal solution. Then my frustration is that other developers don't seem to think this way. "Make one tool that does one thing well" has become the mantra and driven us into this fragmented ecosystem.
This is just an aside, but: The only truly general purpose language that has a syntax that doesn’t make me want to club myself over the head is probably MATLAB, but unfortunately their licenses are too expensive for me. So I have high hopes for Octave, and after that, maybe NumPy. So maybe one point we could agree on is that we shouldn’t need a graphics “library” in the first place. If we had a good mainstream concurrent language, then many of these algorithms become one paragraph code snippets and run with speed comparable to C or Java.
Edit: I wanted to give a concrete example. Low-level languages like C are overly verbose in their implementations, usually by 10 or 100 times, because they try to leverage the wrong metaphors. Notice how compactly concepts like image compression can be expressed with the right ones:
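For instance, low-rank image compression is a handful of lines in NumPy (the closest mainstream thing to what I mean; the random array and the rank k here are arbitrary stand-ins):

    # Rank-k image compression via the SVD, as a sketch of how compact the
    # right metaphor makes things. A random array stands in for a real image.
    import numpy as np

    img = np.random.rand(256, 256)                      # stand-in grayscale image
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    k = 20                                              # keep the 20 strongest components
    compressed = (U[:, :k] * s[:k]) @ Vt[:k]            # rank-k approximation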
(Tip: if you have trouble compiling something you found on Github, contact the author and offer $200 to walk you through the installation. Might save a lot of frustration)
Also, OpenCV includes functions for image segmentation. If they couldn't use that, it would have been nice for the article to at least touch on why.
I think that's being a bit unkind; it's simply clear that the people involved are not experienced with computer vision or image processing in general. They approached the problem from their domain and found the solution that worked for them. If I had gotten to the point that I wanted a domain-specific image segmentation heuristic, I would certainly build it up from a series of filters and image morphology steps. If the performance (both accuracy and speed) was satisfactory, I don't know why you would invest in optimization at that point. Also, from experience I know that implementing algorithms from papers is a slow and often painful process, as the original authors generally don't publish reference code, and if they do, it's probably in Matlab. And if they don't have someone adept at image processing, their potential for success would likely be low.
I agree that a quick "Here's what we tried that's readily available" would have been good for others to learn from, because for many the general solutions would be sufficient.
A good algorithm would surely solve part of the problem, but given the imagery's non-uniform patterns, a great deal of resources (engineering skill + computing power) would likely be required. Given Lyst's recent VC rounds, maybe they can afford that, but I surely cannot.
So I went for the poor-man option: https://gist.github.com/mvsantos/5554663
(I'm definitely bookmarking the link you suggested and also the one suggested by sjtgraham.)
And since it's the magnitude of the gradient that Sobel finds, Sobel(Invert(img)) is mathematically equivalent to Sobel(img): inverting only negates the gradient, and taking the magnitude discards the sign. The invert step is essentially an expensive no-op.
You need a couple of extra markings, but that's life on images with background colors close to the foreground colors.
(I'm one of the creators of clippingmagic.com)
I think it is called GrabCut - http://research.microsoft.com/en-us/um/cambridge/projects/vi...
The disadvantage compared to the parent is that it's a semi-supervised algorithm: it requires a bounding box to be drawn around the desired object. If you had a bunch of very similar images, you could pre-generate the Gaussian Mixture Model that GrabCut expects and turn it into a supervised learning algorithm, but then you'd lose the major innovation of GrabCut compared to the graph cut methods discovered before it: you would not be able to re-run with the newest "best-guess" and be guaranteed that the energy would monotonically decrease. It would also fail spectacularly if it encountered something it hadn't seen before.
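For reference, driving GrabCut through OpenCV's Python bindings looks roughly like this (a minimal sketch; the file name and rectangle coordinates are placeholders):

    # Minimal GrabCut invocation via OpenCV (sketch; rect values are made up).
    import cv2
    import numpy as np

    img = cv2.imread("product.jpg")
    mask = np.zeros(img.shape[:2], np.uint8)

    # Buffers GrabCut fills with its Gaussian Mixture Model parameters
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)

    # The semi-supervised part: a human-drawn box around the object
    rect = (50, 50, 400, 500)  # (x, y, width, height) - placeholder values

    cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

    # Definite/probable foreground pixels become the cut-out
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype("uint8")
    result = img * fg[:, :, None]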
Any recommendations other than keeping Photoshop CS5 around just to use KillWhite? (Why Adobe axed Pixel Bender is still just beyond me.)
MathMap for GIMP looks promising for creating pixel filters, but I haven't tried it yet. Being able to do this on the command line would be nice, but isn't a requirement.
Expected behavior in HSL color space would be: anything with the same hue and saturation (with optional adjustable thresholds) as a selected color gets assigned an alpha value of 1 - lightness, where a lightness of 1 is white and an alpha of 1 is opaque (sketched in code below).
KillWhite seems to do exactly this:
I guess by now, it's easiest for me to re-implement this as a filter for HTML5 2DCanvas.
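For the record, the rule I described above comes out to something like this (a Python sketch rather than canvas code; white_to_alpha is my made-up name, the thresholds are arbitrary, and hue wraparound is ignored for brevity):

    # Pixels whose hue and saturation are within a threshold of the picked
    # color get alpha = 1 - lightness; everything else stays opaque.
    import colorsys
    import numpy as np

    def white_to_alpha(rgb, picked, hue_tol=0.05, sat_tol=0.10):
        """rgb: (H, W, 3) floats in 0..1; picked: (r, g, b) of the selected color."""
        ph, _, ps = colorsys.rgb_to_hls(*picked)
        height, width, _ = rgb.shape
        alpha = np.ones((height, width))
        for y in range(height):
            for x in range(width):
                h, l, s = colorsys.rgb_to_hls(*rgb[y, x])
                if abs(h - ph) <= hue_tol and abs(s - ps) <= sat_tol:
                    alpha[y, x] = 1.0 - l    # lightness 1 (white) -> fully transparent
        return alpha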
Here is the comparison between KillWhite's method and gimp's color to alpha:
Note that it adds transparency pretty much everywhere on the apple and alters it too much. (The reflection on the apple is gone after blending with the new background.)
Killwhite is identical to Gimp's "Color to Alpha" if you use white as source color.
In their example, it leaves a grey background with a little alpha. They started with a brighter blue, which was dimmed by their resulting image.
I'm guessing that in your example, you used the background grey as the input color, so it made the background fully transparent, and then you used a color sample of KillWhite's background as the background of yours.
...but I may be wrong.
edit: I am playing with it, and if you use white as the color and then put the result on a white background, it's identical to the original. If you do the same with the grey color, same results. So "Color to Alpha" is the same; it just allows you to pick a color.
If this is the actual code, it looks like it's operating in RGB color space.
Which explains why it extracts the lowest-value color component as alpha from every pixel, and not just from those with low saturation.
alpha1 = (1 - a1) / (c1);
a1, a2, a3 are the channels of the current pixel (and c1, presumably, the corresponding channel of the selected color).
Later the highest alpha (lowest transparency) is kept for the output.
Which results in the effect that all highlights, regardless of their saturation, are made transparent.
Meanwhile the core function of KillWhite is executed in HSV color space:
Alpha = 1 - (Value - Saturation)
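In NumPy/OpenCV terms that formula comes out to roughly this (my sketch of the quoted formula, not the plugin's actual source; file names are placeholders):

    # alpha = 1 - (value - saturation), per pixel in HSV space.
    import cv2
    import numpy as np

    img = cv2.imread("input.png")                                  # BGR, uint8
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    s, v = hsv[..., 1] / 255.0, hsv[..., 2] / 255.0                # scale S, V to 0..1

    alpha = np.clip(1.0 - (v - s), 0.0, 1.0)     # bright, unsaturated pixels go transparent

    rgba = cv2.cvtColor(img, cv2.COLOR_BGR2BGRA)
    rgba[..., 3] = (alpha * 255).astype(np.uint8)
    cv2.imwrite("output.png", rgba)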
If there is need for further discussion, I would suggest switching to Stack Exchange or opening something new.
Adding to these methods, a hole-filling algorithm can be very helpful; a sketch follows below. This would probably be used in place of your Alpha Mask.
The process can be run multiple times with different parameters for better results. For example, doing edge detect->threshold->edge detect (with different parameters) can help remove some of the smaller artifacts you see in the images.
If you're able to have some manual input, simply clicking the object of interest or doing a rough outline can be a great addition to an algorithm, especially in avoiding the problem seen in Global Threshold with Complex Background.
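Here is the hole-fill sketch mentioned above (mine, assuming a binary uint8 mask whose background is reachable from the top-left corner): flood-fill the background, invert, and OR with the original mask so enclosed holes become foreground.

    # Fill enclosed holes in a binary mask via flood fill.
    import cv2
    import numpy as np

    mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
    _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

    flood = mask.copy()
    h, w = mask.shape
    ff_mask = np.zeros((h + 2, w + 2), np.uint8)   # floodFill needs a 2px-larger mask
    cv2.floodFill(flood, ff_mask, (0, 0), 255)     # fill the background from a corner

    holes = cv2.bitwise_not(flood)                 # pixels the fill couldn't reach
    filled = cv2.bitwise_or(mask, holes)           # holes become foreground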
When you zoom into an image, it's clear that the edges of objects can affect pixel intensities a few pixels away (blur). So you have to reverse that, which can be tricky. Image-processing people like MATLAB, so that's where the premade solutions will be found.
First off, they don't mention searching for an off-the-shelf solution to the problem. If OpenCV's built-in GrabCut or Watershed filter wouldn't work for them, they should explain why.
Secondly, they don't examine the literature for existing approaches to the problem. Sometimes you won't find what you're looking for in those approaches, but in that case, the problem with existing approaches will inform how you decide to tackle the problem.
Finally, they solve the problem by building a filter chain, but they don't seem to actually understand what the filters are doing. They say that the Sobel operator "looks for light to dark transitions". This is completely incorrect. The Sobel operator does nothing more than estimate the magnitude of the gradient of the image. Furthermore, based on this misunderstanding, they invert the image before applying the Sobel filter - a step which does literally nothing.
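To make that concrete (an illustrative check, not their code): Sobel is just two derivative estimates combined into a magnitude, and since inversion only negates the derivatives, the magnitude is unchanged.

    # Sobel = x/y derivative estimates combined into a gradient magnitude.
    # Inverting first only flips the derivatives' signs, which the
    # magnitude throws away.
    import cv2
    import numpy as np

    img = cv2.imread("product.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

    def sobel_magnitude(im):
        gx = cv2.Sobel(im, cv2.CV_32F, 1, 0, ksize=3)   # d/dx estimate
        gy = cv2.Sobel(im, cv2.CV_32F, 0, 1, ksize=3)   # d/dy estimate
        return np.sqrt(gx**2 + gy**2)

    # The "invert then Sobel" step is an expensive no-op:
    assert np.allclose(sobel_magnitude(img), sobel_magnitude(255.0 - img))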
> it normally has moved more background than product
Should probably be:
> it normally has removed more background than product
I think you'd have to apply a fair amount of pre-processing anyway before you pass it over to the ANN.
We open-sourced a pretty cool standalone machine learning library in Java that addresses those issues about a week ago. Looking for feedback...
There are a lot of innovations in image processing wrt neural nets specifically. The right neural network can learn everything from scene detection to simple object recognition.
I would highly recommend taking a look at the neural nets course on Coursera to understand some of the use cases.
When researching this, did you not find any good premade solutions, or were they simply too highly priced?
We used GraphicsMagick, via the pgmagick package, because it was easy to integrate into our code base and because GraphicsMagick is crazy fast.
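For anyone curious, a minimal pgmagick chain looks roughly like this (an illustrative sketch; the steps and parameter values are examples, not the actual pipeline from the article):

    # A small GraphicsMagick filter chain via pgmagick (illustrative).
    from pgmagick import Image

    img = Image("product.jpg")
    img.blur(0, 1.0)    # light denoise; radius 0 lets GraphicsMagick choose
    img.edge(2)         # edge detection with radius 2
    img.negate()        # invert the edge map
    img.write("mask.png")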