Image Processing 101 (recurse.com)
368 points by abecedarius on Mar 11, 2016 | 41 comments

Slightly related, but I had a small epiphany when taking a class on DSP on Coursera. The kernel that is used to blur an image and that which is used to remove the treble/high frequencies from an audio sample are identical, except one is in 2 dimensions and the other is in 1. And this makes perfect sense! A low pass filter removes high frequencies, and sharp edges are high frequencies in the 2D plane.
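Here is a minimal sketch of that equivalence. It assumes signals and images are plain Python lists, and the 3-tap kernel and the helper names `smooth_1d`/`smooth_2d` are illustrative, not taken from the course. The same smoothing kernel is used directly in 1D and as an outer product in 2D. It wipes out the fastest possible oscillation (an alternating +1/-1 signal, or a checkerboard image), while a constant signal passes through unchanged.

```python
# Hypothetical helpers: the same small low-pass kernel for audio (1D) and images (2D).
KERNEL = [0.25, 0.5, 0.25]  # binomial smoothing kernel; weights sum to 1


def smooth_1d(signal):
    """Convolve a 1D signal with KERNEL, keeping only fully overlapped ('valid') outputs."""
    r = len(KERNEL) // 2
    return [
        sum(KERNEL[j] * signal[i + j - r] for j in range(len(KERNEL)))
        for i in range(r, len(signal) - r)
    ]


def smooth_2d(image):
    """Convolve a 2D image with the outer product KERNEL x KERNEL ('valid' region only)."""
    r = len(KERNEL) // 2
    h, w = len(image), len(image[0])
    out = []
    for y in range(r, h - r):
        row = []
        for x in range(r, w - r):
            acc = 0.0
            for dy in range(len(KERNEL)):
                for dx in range(len(KERNEL)):
                    acc += KERNEL[dy] * KERNEL[dx] * image[y + dy - r][x + dx - r]
            row.append(acc)
        out.append(row)
    return out


alternating = [(-1) ** i for i in range(8)]  # highest-frequency 1D signal
checkerboard = [[(-1) ** (x + y) for x in range(6)] for y in range(6)]  # highest-frequency 2D pattern

print(smooth_1d(alternating))      # all zeros: the "treble" is removed
print(smooth_1d([3.0] * 8))        # constant passes through unchanged
print(smooth_2d(checkerboard)[0])  # all zeros: the sharpest "edges" are removed
```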

TFA only mentions Gaussian blur, but a Gaussian blur is just a weighted moving average, with "closer" pixels weighted more heavily and a smooth falloff with distance. When you replace each value with an average of its neighborhood, you "soften" the transitions.
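To illustrate "a weighted moving average with closer pixels weighted more", here is a sketch using hypothetical helpers `gaussian_kernel` and `weighted_moving_average`. The truncation radius of about 3 sigma and the clamping of indices at the borders are common conventions assumed here; the article does not specify them. Applied to a hard 0-to-1 step, the filter turns the edge into a smooth ramp.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Sampled 1D Gaussian weights, normalized to sum to 1."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))  # assumed rule of thumb: cover about +/-3 sigma
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def weighted_moving_average(signal, weights):
    """Replace each sample with a weighted average of its neighborhood (borders clamped)."""
    r = len(weights) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(weights):
            k = min(max(i + j - r, 0), n - 1)  # repeat the edge sample past the borders
            acc += w * signal[k]
        out.append(acc)
    return out


step = [0.0] * 5 + [1.0] * 5  # a hard edge
ramp = weighted_moving_average(step, gaussian_kernel(1.0))
print([round(v, 3) for v in ramp])  # values rise smoothly from 0 to 1 around the edge
```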

One of the remarkable things about the Gaussian function is that its Fourier transform is also a Gaussian. A Gaussian blur means convolving a function with the Gaussian, and convolution in the spatial domain becomes multiplication in the frequency (Fourier) domain. Since the Gaussian's transform goes to zero rapidly as you move away from zero frequency, high-frequency components get attenuated, whereas low-frequency content is preserved.
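For reference, here are the two standard facts this comment relies on. They use the angular-frequency convention f^(omega) = integral of f(x) e^{-i omega x} dx; other conventions change only the constants. g_sigma is the unit-area Gaussian kernel:

```latex
g_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/(2\sigma^2)}
\;\Longrightarrow\;
\hat{g}_\sigma(\omega) = e^{-\sigma^2\omega^2/2},
\qquad
\widehat{f * g_\sigma}(\omega) = \hat{f}(\omega)\, e^{-\sigma^2\omega^2/2}
```

So a spatial Gaussian of width sigma acts as a frequency-domain Gaussian of width 1/sigma. A wider blur keeps a narrower band of low frequencies.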

A pretty interesting point made here last year is that picking x, y, z coordinates from three independent Gaussian distributions with the same variance results in a spherically symmetric distribution. https://news.ycombinator.com/item?id=9446126
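A common practical use of this fact is picking uniformly random directions. The joint density of independent, equal-variance, zero-mean Gaussians depends only on the distance from the origin, so normalizing a sample leaves a direction with no preferred orientation. Here is a minimal sketch; the helper name `random_unit_vector` is hypothetical, and the generator is seeded only to make runs repeatable.

```python
import math
import random


def random_unit_vector(dim=3, rng=random):
    """Draw a direction uniformly on the unit sphere by normalizing i.i.d. Gaussians."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:  # an all-zero draw is astronomically unlikely, but avoid dividing by 0
            return [c / norm for c in v]


rng = random.Random(0)
samples = [random_unit_vector(3, rng) for _ in range(10000)]
# If directions are spread evenly, each coordinate should average close to 0.
print([round(sum(s[i] for s in samples) / len(samples), 2) for i in range(3)])
```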

Yes, this is the separability property of the Gaussian: a 2D Gaussian is the product of two 1D Gaussians. This makes Gaussian blur cheap to compute compared to, say, the circular pillbox filter (lens blur or bokeh simulation), because you can convolve with a 1D Gaussian once along each dimension separately. So blurring an n by n image with a Gaussian of size h takes only O(n^2 h) instead of O(n^2 h^2). When h is large compared to log n, you can use the property I mentioned in my previous comment and apply the Fast Fourier Transform to the rows and columns of the image to bring the time complexity down to O(n^2 log n). It doesn't have to be a symmetric distribution either, but in the general case of a multivariate Gaussian with some d-dimensional covariance matrix S, you'd have to rotate the data to align with the eigenvectors of S, which is lossy.
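Here is a small sketch showing that the two approaches give the same result: a full 2D convolution with the outer-product kernel, and a row pass followed by a column pass with the 1D kernel. The helper names, the 3-sigma truncation, and the clamped borders are assumptions for illustration. The two outputs differ only by floating-point rounding.

```python
import math


def gaussian_kernel(sigma):
    """Sampled 1D Gaussian, truncated at about 3 sigma and normalized to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    w = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [x / total for x in w]


def clamp(i, n):
    """Clamp an index into [0, n - 1] (edge pixels are repeated at the border)."""
    return min(max(i, 0), n - 1)


def blur_direct(img, k):
    """Full 2D convolution with the outer product k x k: about K*K work per pixel."""
    r, h, w = len(k) // 2, len(img), len(img[0])
    return [[sum(k[a] * k[b] * img[clamp(y + a - r, h)][clamp(x + b - r, w)]
                 for a in range(len(k)) for b in range(len(k)))
             for x in range(w)]
            for y in range(h)]


def blur_separable(img, k):
    """Blur every row with the 1D kernel, then every column: about 2*K work per pixel."""
    r, h, w = len(k) // 2, len(img), len(img[0])
    rows = [[sum(k[b] * img[y][clamp(x + b - r, w)] for b in range(len(k)))
             for x in range(w)]
            for y in range(h)]
    return [[sum(k[a] * rows[clamp(y + a - r, h)][x] for a in range(len(k)))
             for x in range(w)]
            for y in range(h)]


img = [[(3 * x + 7 * y) % 10 for x in range(8)] for y in range(6)]
k = gaussian_kernel(1.0)
diff = max(abs(p - q)
           for rp, rq in zip(blur_direct(img, k), blur_separable(img, k))
           for p, q in zip(rp, rq))
print(diff)  # only a tiny floating-point difference
```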

The rectangular pillbox filter also has the separable property. So does the parallelogram pillbox filter (albeit also rotated), which, by extension, allows us to do blurring by a hexagon (sum of three parallelograms) which can be used to simulate cool-looking bokeh [1].

[1] McIntosh, L., Bernhard E. Riecke, and Steve DiPaola. "Efficiently Simulating the Bokeh of Polygonal Apertures in a Post‐Process Depth of Field Shader." Computer Graphics Forum. Vol. 31. No. 6. Blackwell Publishing Ltd, 2012. http://ivizlab.sfu.ca/media/DiPaolaMcIntoshRiecke2012.pdf
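As a sketch of why the rectangular (box) case is so cheap, here is a separable box blur that uses a prefix sum per row and column, so the cost per pixel does not depend on the filter radius. The helper names are hypothetical. Averaging over the truncated window at the image borders is one edge-handling choice among several. This does not implement the hexagonal bokeh technique from [1].

```python
def box_blur_1d(signal, radius):
    """Mean over a window of 2*radius+1 samples using a prefix sum (window truncated at edges)."""
    n = len(signal)
    prefix = [0.0]
    for v in signal:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo, hi = max(i - radius, 0), min(i + radius, n - 1)
        out.append((prefix[hi + 1] - prefix[lo]) / (hi - lo + 1))
    return out


def box_blur_2d(img, radius):
    """Separable box blur: a 1D box blur on every row, then on every column."""
    rows = [box_blur_1d(row, radius) for row in img]
    cols = [box_blur_1d(list(col), radius) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


print(box_blur_1d([0, 0, 3, 0, 0], 1))                  # [0.0, 1.0, 1.0, 1.0, 0.0]
print(box_blur_2d([[0, 0, 0], [0, 9, 0], [0, 0, 0]], 1))  # centre 1.0, corners 2.25
```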

Gaussian blur uses the same equation as the normal distribution, so basically it's a bell curve in one or two dimensions.

It doesn't actually have to be 1 or 2; like many distributions, it can be generalized to n dimensions. Depends on the context.
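For reference, here is the isotropic Gaussian kernel (same sigma on every axis) in n dimensions, which is the normal density with covariance sigma^2 times the identity. The 2D case used for image blur is written out, and it factors into two 1D Gaussians:

```latex
G_\sigma(\mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{n/2}}
  \exp\!\left(-\frac{\lVert\mathbf{x}\rVert^2}{2\sigma^2}\right),
  \qquad \mathbf{x} \in \mathbb{R}^n

% n = 2 (image blur)
G_\sigma(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/(2\sigma^2)}
  = \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/(2\sigma^2)}\right)
    \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-y^2/(2\sigma^2)}\right)
```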

When you start to look at things as waves a lot changes.

Could you expand on that?

May I ask which Coursera course that was?

I would presume https://www.coursera.org/course/dsp, which was pretty good when I did it; https://www.coursera.org/course/images also covered some similar things.

That looks like it, great, thanks.

The reason why I'm asking instead of searching on Coursera is that Coursera has become increasingly hard to search, at least in my view, so it's easier to ask directly.

I have a bunch of audio friends who love to make glitch art by running images through audio processing tools.

Something like this: https://questionsomething.wordpress.com/2012/07/26/databendi...

Gaussian blur is a low-pass filter.

*is a kind of/is an example of
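One way to check the "low-pass" label is to evaluate the magnitude of a sampled Gaussian kernel's frequency response. The response is about 1 at zero frequency and falls off quickly toward the Nyquist frequency (0.5 cycles per sample). This is a sketch: the helper names are hypothetical, and sigma = 1.5 is an arbitrary choice.

```python
import cmath
import math


def gaussian_kernel(sigma):
    """Sampled 1D Gaussian, truncated at about 3 sigma and normalized to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    w = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [x / total for x in w]


def frequency_response(kernel, freq):
    """|H(f)| of a centred FIR kernel at normalized frequency f (cycles/sample, 0..0.5)."""
    r = len(kernel) // 2
    h = sum(c * cmath.exp(-2j * math.pi * freq * (i - r)) for i, c in enumerate(kernel))
    return abs(h)


k = gaussian_kernel(1.5)
for f in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f, round(frequency_response(k, f), 4))  # near 1 at f = 0, near 0 at f = 0.5
```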

I know that this is an introduction, but I wish there were a warning about using the wrong color space for a task (like using sRGB or any other nonlinear color space for downscaling images, blurring, or Phong shading). A note about the existence of different color spaces and their use cases would be enough in an introductory write-up. It's still an issue in most of today's software [1].

For example, open this image in your browser or your favorite image viewer and scale it down to 50%: http://www.4p8.com/eric.brasseur/gamma-1.0-or-2.2.png

[1] http://www.4p8.com/eric.brasseur/gamma.html
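Here is a minimal sketch of the difference, using the standard piecewise sRGB transfer curve; the helper names are hypothetical. Downscaling a pattern of 1-pixel black and white stripes by 2 averages one black and one white pixel. Averaging the 8-bit sRGB codes directly gives 128, which displays noticeably darker than the stripes. Averaging in linear light and re-encoding gives 188.

```python
def srgb_to_linear(c):
    """sRGB-encoded component in [0, 1] -> linear-light value (standard sRGB curve)."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """Linear-light value in [0, 1] -> sRGB-encoded component."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def average_8bit(values, linear=True):
    """Average 8-bit sRGB samples, either naively or in linear light."""
    if not linear:
        return round(sum(values) / len(values))
    lin = [srgb_to_linear(v / 255) for v in values]
    return round(255 * linear_to_srgb(sum(lin) / len(lin)))


print(average_8bit([0, 255], linear=False))  # 128: naive average of the encoded values
print(average_8bit([0, 255], linear=True))   # 188: average of the actual light intensities
```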

So I just spent half an hour googling for libraries that could help with this. Chroma.js seems to be a pretty nice option for dealing with this issue in a web context:


The person behind it has some nice blog posts on color generation too:


The only issue that I see is that the library is focused on translating data to colours. The problems with blurring and downscaling that you mentioned go the other way: the colours are the data.

So OK, you can use cv2.cvtColor with cv2.COLOR_RGB2GRAY to get a grayscale image. But what does that teach you? In my image processing 101 course a couple of years ago, we didn't use any libraries except for reading and showing images (written by the teacher). Just Java. A picture was a 2-dimensional array, and each pixel was represented by an integer. So you just need a nested for loop and you can manipulate every pixel yourself, thus learning what really happens under the hood and how filters work.
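Here is a sketch of that exercise, in Python rather than the course's Java. It assumes each pixel is packed as a 0xRRGGBB integer, and `to_grayscale` is a hypothetical name. The luma weights 0.299/0.587/0.114 are the ones OpenCV documents for its RGB-to-gray conversion; a plain average of the three channels is a simpler alternative.

```python
def to_grayscale(pixels):
    """Convert a 2D array of packed 0xRRGGBB integers to 0..255 gray levels with nested loops."""
    height, width = len(pixels), len(pixels[0])
    gray = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            p = pixels[y][x]
            r, g, b = (p >> 16) & 0xFF, (p >> 8) & 0xFF, p & 0xFF  # unpack the channels
            # BT.601 luma weights: green contributes most to perceived brightness
            gray[y][x] = round(0.299 * r + 0.587 * g + 0.114 * b)
    return gray


image = [[0xFF0000, 0x00FF00], [0x0000FF, 0xFFFFFF]]  # red, green / blue, white
print(to_grayscale(image))  # [[76, 150], [29, 255]]
```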

i totally agree that the fun part lies in understanding how things work under the hood. i never understood how edge detection works until i worked on an implementation with JavaScript and canvas, and that was a fantastic learning experience. the motivation behind this article was to provide a general idea of what image processing can do, and hopefully encourage someone to delve deeper into its mechanics :)

A good follow on from this is the Learning OpenCV book (O'Reilly), written by a couple of the lead developers. It goes into detail on the mathematics, but it's not heavy or verbose at all. I found it far more useful than a lot of introductory image processing books simply for its theoretical content.

Don't forget scikit-image and scipy.ndimage too.

Strictly speaking, image processing is image in -> image out, and image analysis is image in -> data out. The author gives the impression that everything is image processing. Not a big thing, but it helps to know the difference if you want to take the right course :)

The first paragraph in https://en.wikipedia.org/wiki/Image_processing

"In imaging science, image processing is processing of images using mathematical operations by using any form of signal processing for which the input is an image, a series of images, or a video, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image.[1]"

[1] Rafael C. Gonzalez; Richard E. Woods (2008). Digital Image Processing. Prentice Hall. pp. 1–3. ISBN 978-0-13-168728-8.

I was going to answer with the Gonzalez & Woods definition. Extremely clear book by the way.

Yes, I read this one too. Never applied the concepts in practice though.

Have you looked at http://aishack.in/ - it has a bunch of opencv tutorials and projects.

ahh this is great! opencv tutorials with detailed explanations of the algorithms are hard to come by. i've found tutorials really useful for learning some of the standard processes in image processing (e.g. grayscale conversion and blurring to remove noise).

Instead of using the built-in method calls that come with a library, why not look into the algorithms used to generate the different transforms?

I once worked on a UI where the users wanted to capture a screenshot of the current page.

Because color toner is more expensive they also wanted the option to print grey scale. I'm pretty terrible at working in 2D space but a quick Google search let me know that the conversion to grey scale involved averaging the RGB values for each pixel.

Unfortunately, the UI's color scheme was more dark than light, so the resulting greyscale image still used a lot of black toner. So we provided an additional option to invert black and white.

To make it work, a second transform was applied to each pixel that flipped its value from the upper bound of the range to the lower bound (or vice versa, depending on how you look at it).

The result was an output that trended toward white instead of black. The output looked surprisingly good and saved on toner so the users could print many screen captures without worrying about wasting resources.

For the business, it resulted in cost and resource savings. For users, choosing the inverted output gave results that were easier to understand. From a development perspective, the implementation wasn't difficult at all to add. So, win-win-win.

What surprised me was how easy these transforms were to apply. It's a bit CPU-intensive on high-resolution images, but it's not terribly difficult to get good results.
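Here is a minimal sketch of the two transforms in this story. It assumes 8-bit RGB tuples, the helper names are hypothetical, and the original UI code isn't shown. It averages the channels to get gray, then inverts with 255 - v so that dark UI areas print light.

```python
def gray_average(r, g, b):
    """Naive grayscale: integer average of the three 8-bit channels."""
    return (r + g + b) // 3


def invert(v, max_value=255):
    """Flip a value to the opposite end of the range: 0 <-> 255."""
    return max_value - v


def printer_friendly(pixels):
    """Hypothetical pipeline: average to gray, then invert so dark areas use little toner."""
    return [[invert(gray_average(*rgb)) for rgb in row] for row in pixels]


dark_ui = [[(20, 20, 30), (240, 240, 240)]]  # dark background pixel, light text pixel
print(printer_friendly(dark_ui))  # [[232, 15]]: background prints nearly white
```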

It would be awesome to see some more examples of algorithms used for image processing. So much material covers generic algorithms and data structures that come with the typical CS degree.

It would be much more interesting to see algorithms that can be used in practice. For example, how to scale images, implement blur, color correction, calculate HSL, etc...

Libraries are great but these concepts are simple enough that they don't require 'high science'.
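As one example of "calculate HSL", here is a sketch of the usual RGB-to-HSL conversion for components in [0, 1]. The function name is hypothetical. Returning a hue of 0 for pure grays is a convention, since hue is undefined there.

```python
def rgb_to_hsl(r, g, b):
    """Convert RGB components in [0, 1] to (hue in degrees, saturation, lightness)."""
    mx, mn = max(r, g, b), min(r, g, b)
    lightness = (mx + mn) / 2
    delta = mx - mn
    if delta == 0:
        return 0.0, 0.0, lightness  # gray: hue undefined, report 0 by convention
    saturation = delta / (1 - abs(2 * lightness - 1))
    if mx == r:
        hue = 60 * (((g - b) / delta) % 6)
    elif mx == g:
        hue = 60 * ((b - r) / delta + 2)
    else:
        hue = 60 * ((r - g) / delta + 4)
    return hue, saturation, lightness


print(rgb_to_hsl(1.0, 0.5, 0.0))  # (30.0, 1.0, 0.5): a fully saturated orange
```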

The article mentions a curiosity related to how edge detection works. I'd assume that you select a color and exclude anything that falls outside a pre-determined or calculated threshold. For instance, take a color and do frequency analysis of colors above-below that value by a certain amount. Make multiple passes testing upper and lower bounds.

A full color image @ 24 bit (8R 8G 8B) will take a max of 24 passes and will likely have logarithmic runtime cost if implemented using a divide-conquer algorithm.

Things like blur and lossy compression sound a hell of a lot more interesting because they have to factor in adjacency.

> I'd assume that you select a color and

This is not at all how edge detection works. See https://en.wikipedia.org/wiki/Sobel_operator for a key building block.
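Here is a minimal sketch of the Sobel step on a grayscale image stored as a 2D list. The helper name is hypothetical, and border pixels are simply left at 0. It computes horizontal and vertical gradients from the two 3x3 kernels and combines them into a gradient magnitude. The kernels are applied as a correlation, which flips the sign of the gradients relative to a true convolution but leaves the magnitude unchanged. Thresholding the magnitude is one simple way to mark edges.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to left-right intensity changes
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to top-bottom intensity changes


def sobel_magnitude(img):
    """Gradient magnitude of a grayscale image (2D list); border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out


# A vertical edge: dark on the left, bright on the right.
img = [[0, 0, 0, 10, 10, 10] for _ in range(4)]
print(sobel_magnitude(img)[1])  # [0.0, 0.0, 40.0, 40.0, 0.0, 0.0]
```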


I wasn't aware of this approach. Looks like a reasonable single-pass solution.

I'm actually taking an image processing course right now, and one thing I didn't see in this article that I have found very useful is histogram equalization (OpenCV equalizeHist). It takes low-contrast images and increases their contrast. This is useful for many applications, but one I've actually been able to use it for is making scanned pencil-on-paper images more legible.

yes! i've found it really useful when dealing with images of items in different lighting conditions. i've struggled with understanding histogram equalization, would you happen to have good resources that explain how it is done?

Does this help any? Say we're looking at a grayscale image, where each pixel has a value between 0.0 and (just less than) 1.0. Sort all the pixel values, then scan them from lowest value to highest. The pixels with the lowest value in the image get reassigned the value 0; and in general, a pixel that's brighter than k of the other pixels, out of n total, gets changed to intensity k/n. This spreads the intensities as evenly as we can. (Which might not be so even, for example if all of the input pixels were the same intensity to start -- then this just changes them to 0!)

Maybe the image is, say, 8-bit grayscale, so the values can range from 0 to 255 instead -- then it'd be floor(k/n times 256).

This is usually described in terms of a cumulative distribution function; I tried to say the same thing with less jargon.
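Here is a sketch of the rank-based version described in this comment for 8-bit values: a pixel brighter than k of the n pixels maps to floor(k / n * 256). The "number of pixels strictly darker" table is the shifted cumulative histogram. The function name is hypothetical, and library implementations such as OpenCV's equalizeHist may normalize slightly differently.

```python
def equalize(pixels, levels=256):
    """Rank-based histogram equalization of a flat list of 8-bit values."""
    n = len(pixels)
    counts = [0] * levels
    for v in pixels:
        counts[v] += 1
    # darker_than[v] = number of pixels with a value strictly below v (shifted CDF)
    darker_than = [0] * levels
    running = 0
    for v in range(levels):
        darker_than[v] = running
        running += counts[v]
    # floor(k / n * levels), computed with integers to avoid rounding surprises
    return [darker_than[v] * levels // n for v in pixels]


print(equalize([0, 10, 10, 200]))  # [0, 64, 64, 192]
print(equalize([7, 7, 7]))         # [0, 0, 0]: a flat image collapses to 0, as noted above
```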

As a sort of abstract question, do readers here think of <Class X> 101 as meaning fundamentals of X, or basic techniques in X? Having taken image processing from both sides, I'd say that learning the principles was much more useful (and would have been more useful still if I'd had a proper background in linear algebra). This article is the equivalent of naming some tools and showing us where they fit.

I've taken a course in image processing, and I think this presents a fantastic high-level introduction for those that have not been presented with the material before. There's a lot that's missing, but that is to be expected.

I agree. I'm terrible at maths so struggled with aspects of Computer Vision in my degree, but I don't see how you could use OpenCV without understanding the principles to at least a basic degree. It seems like you're purposely creating a black box within which magic is happening. Which would be nice and fine and abstract in many circumstances, except OpenCV keeps you greatly abstracted from the concepts, and any non-trivial Vision application needs you to get close to the theoretical-metal anyway, in my experience.

If anyone's interested in the theory, I'd recommend Sonka

totally. while OpenCV is really useful and magical (i did learn a lot of theory from the OpenCV tutorials though), it didn't help me wrap my head around concepts, but implementing the algorithms (gaussian blur, edge detection, etc) without having fully grasped the math helped a whole lot with understanding how things work. i still can't explain the math behind edge detection, for example, but i can describe how it works. when i wrote this, i had in mind a person who would like to get a big picture idea of what image processing is, and will hopefully be inspired to learn what happens under the hood.

> I'd say that learning the principles was much more useful

Well, of course one should learn both :) An interesting question to ask is: principles first or applications and techniques first? Maybe it depends on the individual. For me, certainly it has to be the application. Principles without application would not keep me engaged long enough... Ideally, it should be something like 30% high level principles with applications, and then the rest of the principles and fundamentals later on, which really elucidate how and why the technique works.

That's my two cents anyway. :)

My CS lab specialized in image processing; I can confirm this is indeed the 101.

-1 for explicitly disabling pinchzoom on mobile/tablet devices.

Try using your browser's reader mode. You should be able to pinch/zoom from there.

It's just a simple css value and not intentional, I think. I don't remember where I read this or what it was exactly, some other comment about another site.

The default is to allow zoom. Why add a value to remove end user functionality?
