Image Kernels explained visually (setosa.io)
363 points by phodo on Aug 8, 2016 | hide | past | web | favorite | 47 comments

This is super cool but it's unfortunate that it clamps negative result values to black.

It's probably worth mentioning that there are a lot of ways to implement convolution with a kernel, and the kernel can be of any size, not just 3×3. The explanation here shows how to implement the output-side algorithm nonrecursively; http://www.dspguide.com/ch6/3.htm gives this for the one-dimensional case. But you can implement it on the input side instead (iterating over the input samples instead of the output samples), there are kernels that have a much more efficient recursive implementation (including zero-phase kernels using time-reversal), you can implement very large kernels if you can afford to do the convolution in the frequency domain, and there's a whole class of kernels that have efficient sparse filter cascade representations, including large Gaussians.
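As a concrete reference point for the nonrecursive output-side algorithm mentioned above, here is a minimal 1D sketch (my own naming, not from the linked chapter). Strictly speaking this computes correlation; true convolution flips the kernel first, which only matters for asymmetric kernels.

```python
def convolve1d(signal, kernel):
    """Output-side direct form: each output sample is a weighted sum
    of nearby input samples, with zero padding outside the signal."""
    n, m = len(signal), len(kernel)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - m // 2          # center the kernel on sample i
            if 0 <= k < n:              # treat out-of-range samples as zero
                acc += w * signal[k]
        out.append(acc)
    return out
```

This is the O(n·m) baseline that the frequency-domain and recursive implementations improve on for large kernels.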

(To say nothing of convolutions over other rings.)

> This is super cool but it's unfortunate that it clamps negative result values to black.

Curious: What would a negative pixel look like? What use would that have?

The standard thing to do in these cases is to make zero be a medium gray, so that you can see negative output as well as positive. If you use one of the edge-detection filters, you'll see what I mean.
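The remap described above might look like this (a sketch; the 0.5 scale factor is an arbitrary choice to keep full-range responses visible):

```python
def to_midgray(value, scale=0.5):
    """Map a signed filter response to an 8-bit display value:
    0 -> mid gray (128), negative -> darker, positive -> lighter."""
    shifted = 128 + scale * value
    return max(0, min(255, int(round(shifted))))
```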

In fluid dynamic sims for example. Water level can be positive or negative. Grids of data can represent more than colors in images.

Well, yeah, that has uses, but when one mentions clamping negative results to black, I assumed that was for pictures. Obviously negative values mean something in other contexts, but I assumed we were talking about pictures.

No experience with this stuff, but it feels like you could take the absolute value of the output to extract some meaning from the negative values.

Applying this to the "outline" kernel, it seems like white bordered by black would show up just as white as black bordered by white, with homogeneous kernels still showing up black.

(Can't comment on much else, but just wanted to say thanks for the link. I just got an SDR and the link you sent looks incredibly useful for learning. Thanks!)

Thanks for the DSP link. The book looks interesting.

How are the kernels derived? Is it just an art where people play with the matrices to see their effects or is there well-understood math behind them?

A little art and plenty of science. The kernel matrices can be broken down logically once you know what the numbers are operating on. Considering a 3x3 kernel, the center of the kernel matrix is the origin pixel, and the kernel elements around the origin are the neighboring pixels in their respective directions and distances.

The identity kernel is [0 0 0; 0 1 0; 0 0 0]. For every input pixel, the output is the original pixel value. Don't change the pixel value based on what is around it.

A simple blur kernel would be 1/9 * [1 1 1; 1 1 1; 1 1 1]. The output for each input pixel is the average of the origin pixel's value and all of its neighbors' values, with even weighting. A less dramatic blur can weight the origin pixel higher than the neighbors, as a Gaussian blur does. This results in the output pixel being more similar to the origin pixel than to its neighbors.

Edge detection like [-1 -1 -1; -1 8 -1; -1 -1 -1] can be understood as multiplying the origin pixel value by 8 and subtracting off all eight of its neighboring values. If the values are all fairly similar, let's say all gray with value A, your output will be black: 8A - 8A = 0. So it "punishes" pixels that are similar to their neighbors. When a pixel is different from some or all of its neighbors, you will be left with some value > 0 at the output, which detects a change from its neighbors: an edge. Horizontal edge detection: [1 0 -1; 2 0 -2; 1 0 -1]. It ignores the pixels above, below, and center, but accentuates the difference between what is on the left and what is on the right.
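The walk-through above can be checked numerically. Here is a small sketch (my own helper name) that applies a 3x3 kernel with zero padding; note that on a flat gray patch the edge kernel produces 0 everywhere in the interior, exactly as described.

```python
import numpy as np

def apply_kernel(img, kernel):
    """Correlate a 2D image with a 3x3 kernel, zero padding the border."""
    img = np.asarray(img, dtype=float)
    k = np.asarray(kernel, dtype=float)
    padded = np.pad(img, 1)
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * padded[dy:dy + img.shape[0],
                                      dx:dx + img.shape[1]]
    return out

identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
edge     = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
```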


To put it succinctly, convolution is just a fancy term for a weighted sum, and a kernel is the name for the set of weights.

In general terms, convolution is defined for two functions, both of which can be continuous, and is then essentially a "weighted integral". In the 1D case (sound), when both functions have discrete domains and the filter's domain is finite, it is the same thing as an FIR filter. Convolution with a continuous-domain kernel (e.g. sinc()) is useful, for example, for resampling/scaling.

There is well-understood math behind it.

For blur, you can do one of two things:

- generate a matrix in which every element is equal, normalized so they sum to 1 (i.e. simple averaging, a box blur)

- generate a matrix that represents the Gaussian distribution (you can sample a 2D Gauss function)
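The second option can be sketched as follows (point-sampling the 2D Gauss function on an integer grid; real implementations often integrate the Gaussian over each cell instead, but the idea is the same):

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    """Sample a 2D Gaussian on an integer grid centered at 0 and
    normalize so the weights sum to 1."""
    r = size // 2
    ax = np.arange(-r, r + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()
```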

For edge detection, you essentially take a derivative, i.e. a rate of change; the more abrupt the change, the brighter the resulting pixel, hence why edges are highlighted. A good convolution kernel for edge detection is the Laplacian.

For sharpening, it's pretty much the Laplacian kernel + the identity kernel.
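Using the common 4-neighbor form of the Laplacian, that sum works out to the classic sharpen kernel:

```python
import numpy as np

identity  = np.array([[0,  0, 0], [ 0, 1,  0], [0,  0, 0]])
laplacian = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]])  # 4-neighbor form
sharpen   = identity + laplacian
# sharpen is [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]: the original pixel
# plus its Laplacian, which accentuates local differences.
```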

I never realized sharpness was identity + Laplacian, but that absolutely makes sense. Thanks!

You might be interested in "The Scientist and Engineer's Guide to Digital Signal Processing":


...you might want to start by skimming chapters 23 and 24.

An example of a kernel that has a mathematical explanation for its values is the (discrete) Gaussian kernel, used to blur/smooth images: http://dev.theomader.com/gaussian-kernel-calculator/

The values of the Gaussian kernel matrix are determined by discretely sampling the Gaussian function. You get to choose sigma (the Gaussian's standard deviation) and the kernel size (the spatial neighborhood of the kernel, i.e. how much of the surroundings the kernel will examine).

Another example is the Sobel operator, used to extract edges from images: https://en.wikipedia.org/wiki/Sobel_operator

The kernel matrix is the result of composing a gaussian smoothing with a spatial-differencing operation. Thus, the Sobel estimates edges from smoothed images.
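That composition is literally an outer product of two 1D filters, which is easy to verify:

```python
import numpy as np

smooth = np.array([1, 2, 1])    # 1D Gaussian-like smoothing
diff   = np.array([1, 0, -1])   # 1D central difference
# Smooth along one axis, differentiate along the other:
sobel_x = np.outer(smooth, diff)
```

The result is the [1 0 -1; 2 0 -2; 1 0 -1] matrix mentioned elsewhere in this thread.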

As for the sharpen kernel described in the post -- an intuitive explanation is that you want to accentuate differences in pixel intensities.

Both, but lots of science. In particular, convolution is linear so if you know what you want a bright dot (a single pixel—a delta function) to look like, then that is your kernel. Want a dot to look like a fuzzy blob? Then your kernel should be a fuzzy blob (Gaussian blur). Want a dot to turn into a positive blip on the left and a negative one on the right (an x-derivative), then (-1, 0, 1) is your buddy (I lied in the symmetric case: it needs to be flipped).

If you understand how things work in the frequency domain, you can design them there and convert them back to the time domain (or leave them in the frequency domain, because if they're of any significant size, convolution is faster in the frequency domain anyway).
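The frequency-domain route rests on the convolution theorem: pointwise multiplication of spectra equals (circular) convolution of signals. A minimal 1D sketch, assuming the signal is long enough that wrap-around is acceptable (zero-pad both inputs to avoid it):

```python
import numpy as np

def fft_convolve(signal, kernel):
    """Circular convolution via the FFT. For large kernels this is far
    cheaper than the direct O(n*m) sum."""
    n = len(signal)
    k = np.zeros(n)
    k[:len(kernel)] = kernel    # zero-pad the kernel to the signal length
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(k)))
```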

Image kernels are 2D convolutions, and you can think of them as extensions of 1D convolutions. In 1D, understanding the frequency-domain behavior is much easier, since you don't have to worry about spatial frequency, and 1D frequency is something most people can intuitively understand.

Look at FIR filters; they are essentially 1D image kernels.

Also an interesting question: given a kernel K, how do you derive the inverse of that kernel?

Would that not just be the inverse matrix?


Nope, and in general it doesn't exist. That said, there are techniques for "deconvolution" which partially reverse (or, in some special cases, completely reverse) a kernel convolution.

Ah, thanks. I guess that makes sense since this isn't straightforward matrix multiplication.

I would suggest to try a 1-dimensional example first. In that case, the kernel does not look like a matrix, but more like a vector.
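A naive 1D version makes the point about when the inverse exists: divide by the kernel's spectrum in the frequency domain, which blows up wherever that spectrum is (near) zero. This sketch (my own naming, circular boundary) recovers the signal exactly only for well-behaved kernels; practical deconvolution uses regularization such as Wiener filtering.

```python
import numpy as np

def deconvolve_circular(blurred, kernel, eps=1e-8):
    """Naive frequency-domain deconvolution, 1D and circular.
    Fails for kernels whose spectrum has (near-)zeros."""
    n = len(blurred)
    k = np.zeros(n)
    k[:len(kernel)] = kernel
    K = np.fft.fft(k)
    K = np.where(np.abs(K) < eps, eps, K)   # crude guard, not a real fix
    return np.real(np.fft.ifft(np.fft.fft(blurred) / K))
```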

I don't see image kernels compared to cellular automata, but that's what they are. We just don't iterate more than once or twice with a kernel, and the long-term evolution (stability, chaos, or more interesting dynamics) is not the concern here.

That is to say, there is more to cellular automata than the GOL, and one bit per cell.

I'd be interested to learn more about this! I'm completely outside image processing/machine learning/whatever this is, but articles about it really fascinate me.

You might find this paper interesting:


A Survey on Two Dimensional Cellular Automata and Its Application in Image Processing

Parallel algorithms for solving any image processing task is a highly demanded approach in the modern world. Cellular Automata (CA) are the most common and simple models of parallel computation. So, CA has been successfully used in the domain of image processing for the last couple of years. This paper provides a survey of available literatures of some methodologies employed by different researchers to utilize the cellular automata for solving some important problems of image processing. The survey includes some important image processing tasks such as rotation, zooming, translation, segmentation, edge detection, compression and noise reduction of images. Finally, the experimental results of some methodologies are presented.

Golly is a free game-of-life program you can play with:


There's an active community searching-for/categorizing/discussing patterns in Conway's game-of-life (GOL) (and other cellular automata). Examples:

http://conwaylife.com/forums/ http://pentadecathlon.com/lifenews/

I know about GOL and other simple automata, I'm mostly curious about applications in image processing.

Here's a classic article that blurs the lines between image processing and automata. Very sad that it seems offline today.


Thanks for this. I have been playing with CAs recently[1], and doing a lot of GIF rendering based on the Game of Life rules. This water rendering algorithm seems like a great next step for me to experiment with.

[1] https://twitter.com/tweetgameoflife

One way to get into image processing / computer vision would be to pick up an OpenCV book. I liked this book: https://www.amazon.ca/Practical-Introduction-Computer-Vision....

Image kernels generalize very naturally to real values (so naturally that it hardly constitutes a generalization). CAs can be generalized but it seems less natural.

The common element you are picking up on, I think, is [comonadic computation](http://blog.sigfpe.com/2006/12/evaluating-cellular-automata-...).

CAs are already highly generalized, we are just conditioned to only think of the famous ones like the Game of Life.

All you need for a CA is a space in which to map your cells (in an arbitrary number of dimensions, with an arbitrary mapping), some state for each cell (maybe a real, this is arbitrary) and some transition function to compute the next state of each cell from its current state and that of its "neighborhood" (which again is arbitrarily defined.)

The problem is that CAs are too generalized. They don't have the kind of structure image kernels have (in particular, they aren't linear transformations).

Just wanted to add this is quoting the definition of CA as instances of dynamical systems. It is interesting to think about how a large number of filter passes would affect an image. Makes me wonder if there are very simple filters that create absolute chaos after a while (could an edge detector be one?).

It is a pretty good comparison - I implemented a fast GOL code in Python using FFT-based kernel convolution a few years back.
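The trick is that the neighbor count in GOL is itself a convolution with a 3x3 kernel of ones (minus the center). A sketch of the idea, using np.roll for the neighbor sum rather than the FFT the commenter used (same convolution, simpler to read), on a toroidal grid:

```python
import numpy as np

def life_step(grid):
    """One Game of Life step: sum the 8 neighbors (a 3x3 convolution
    with wrap-around), then apply the birth/survival rule."""
    neighbors = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
    born    = neighbors == 3
    survive = (grid == 1) & (neighbors == 2)
    return (born | survive).astype(int)
```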


The previous discussion has some good comments and links: https://news.ycombinator.com/item?id=8966785

Nice site. This would've been very helpful to me in college when I was trying to get an intuitive grasp of Gaussian blurs and so on via the formulas.

In what context did you learn about Gaussian Blurs?

In Computer Vision, for image manipulation and so on. Using this book if you're interested: https://www.amazon.com/Image-Processing-Analysis-Machine-Vis...

I was interested. Thank you very much.

Man we need more articles like this.

So nice and clear.

I've been working with images for 12 years and I was never sure exactly what 'sharpen' actually did ...

This was an interesting article, though I am not an image-o-phile. However, what I really liked was the base site! I am a part-time instructor for business students, and I am teaching them about the power of visualization. This is an incredibly illustrative source that explains points well, and I'll be able to use it as a teaching tool! Thanks for the post!

Really cool way to see how image kernels work.

Wish we had this when I was taking Discrete Math and Numerical Methods back in college. Really neat visualization!

Bug: 3x3 blur kernel, e.g. [1 1 1; 1 1 1; 1 1 1], outputs white image.

