Hacker News new | past | comments | ask | show | jobs | submit login
The Magic Kernel (johncostella.com)
197 points by aleyan on March 22, 2021 | hide | past | favorite | 43 comments

As I just told a colleague who told me of this post, "I'm so old I don't even know how to use that [this] site." :)

Happy to answer questions or address criticisms.

I would be interested to know the comparison of the magic kernel to cardinal B-Splines [1]. The idea in the paper is that by combining an IIR filter with a traditional FIR/convolutional filter, you can get a much better sinc approximation, and preserve the high frequencies well. For those interested in this, many researches following that have been summed up into a book by Hugues Hoppe [2]. The book also includes some discussion of filters that are commonly used today like Lanczos.

Also, I'm not sure if I agree with this claim regarding partition of unity:

> This remarkable property, that prevents “beat” artifacts across a resized image, is not shared by any other practical kernel that I am aware of ...

Lanczos is surely a weird filter having many less than desirable properties, but IIRC cubic polynomial filters like Mitchell-Netravali and Catmull-Rom satisfies partition of unity, and both of them are popular in today's image processing. Even the bilinear filter, which can be box or triangle filter depending on whether one is minifying or magnifying, satisfies the partition of unity property. Did I get something wrong?

Edit: the bilinear filter actually does not satisfy partition of unity because it's scaled down, but a properly scaled box filter or triangle filter satisfies partition of unity.

[1] http://bigwww.epfl.ch/publications/blu9903.pdf [2] http://hhoppe.com/proj/filtering/

Yes I have recently been asked about splines, and the Magic Kernels are of course a subset of those, but generated in a specific way. I have not had a chance to compare to all previous interpolation kernels (and there are many of them).

Yes, the "beat" statement survives from a much earlier version of the page, and is not quite right. I'll fix that.

> I would be interested to know the comparison of the magic kernel to cardinal B-Splines

Well, the observation in the paper

> the rectangular window function can itself be thought of as the nearest neighbor kernel, with Fourier transform sinc f; and the convolution of the rectangular window function with itself—the "tent" function—is simply the linear interpolation kernel, with Fourier transform sinc² f. The first of these has discontinuities at its endpoints, and significant spectral energy outside the first Nyquist zone. The second is continuous everywhere, and has significantly less spectral energy outside the first Nyquist zone, but it still has discontinuous first derivative at its endpoints and midpoint. The third kernel in this fundamental sequence, the Magic Kernel, is continuous and has continuous first derivative everywhere, and has even less spectral energy outside the first Nyquist zone.

corresponds quite precisely to the construction of the uniform cardinal B-splines by repeated convolution of a boxcar filter. The "succession of kernels, each obtained by convolving rect x with itself the corresponding number of times" that Costella describes in his paper is quite precisely the uniform cardinal B-splines. (The terminology around B-splines is unfortunately very confused, with different authors using "spline", "B-spline", and "cardinal B-spline" with different meanings, so I'm doing the best I can here.)

This is also an efficient way to implement the Magic Filter in practice if addition is cheaper than multiplication:

    >>> x = [0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0]
    >>> for i in range(len(x)-3): x[i+2] += x[i+3]; x[i+1] += x[i+2]; x[i] += x[i+1]
    >>> x
    [1, 3, 3, 1, 0, 2, 6, 6, 2, 3, 9, 9, 3, 0, 0, 0, 0, 0]
You can see that the sequence has been convolved with the Magic Kernel. Each of the addition operations computes the two-sample boxcar filter on a sequence of input samples, and the resulting sequence is pipelined to the following addition operator, so the final sequence you get is the desired convolution.

If you want to do this at a different scale, what you have is a Hogenauer filter or CIC filter, which requires an addition and a subtraction per sample per order, six in this case. Commonly they use a FIR sharpening filter, though usually just in direct form.

The multidimensional splines you get by doing this kind of interpolation successively in more than one dimension are commonly called "box splines", because you get them by convolving "boxes" instead of the one-dimensional boxcar function. This is the normal way to do Gaussian blur in image processing, for example.

If you want to use B-splines of a given degree to approximate ideal interpolation with a sinc kernel as closely as possible with a given support, avoiding any blurring, that's a solvable problem; https://www.cs.tut.fi/~gotchev/DIPII/lecture3.pdf is a set of lecture notes on the topic.

If you're interested in this kind of thing you might enjoy my notes from a couple years ago: https://dercuano.github.io/notes/sparse-kernel-cascade-gabor... https://nbviewer.jupyter.org/url/canonical.org/~kragen/sw/de... https://dercuano.github.io/topics/sparse-filters.html.

Nice — it did look to be a more general class of interpolating functions, and that does indeed look to be the case. Interesting!

How did Facebook feel about you publishing the algorithm publicly? Specifically the Sharp+ version, as it seems that some part of that originally derived from existing Instagram code. (if I understood it correctly - or was it just an additional sharpening matrix? regardless, it sounds like something they might have wanted to keep proprietary)

I put the Magic Kernel into the public domain in 2011, long before starting at Facebook. Yes, the Sharp step is nothing more than a three-tap sharpening filter, not at all remarkable in itself, and yes the "+" is just extra sharpening with the same filter to match existing properties; it was convenient in practical terms in that it did not require extra processing, but is not special in itself.

Your bicubic image looks different to my bicubic images.

Starting with http://www.johncostella.com/magic/small.png when doubling 3 times I get this result in Photoshop CS2 https://i.ibb.co/9tCLPGS/PSCS2triple.png and when doubling 3 times in GIMP’s cubic the result is https://i.ibb.co/p2b0Trq/GIMPtriple.png

This is the result when using GIMP to go straight from 57x43 to 456x344: https://i.ibb.co/r0SXVY2/GIMpstraight.png and Photoshop CS2: https://i.ibb.co/Sn1xt3Y/PSCS2straight.png

Which one is the correct representative of the bicubic filter?

I generated bicubic and Lanczos versions in 2006. It was stated that the Lanczos one evidently came from a version of GIMP that was buggy, so I removed it a long time back. The bicubic is still there from that era. If it's wrong, I will have to redo that part of the page.

The Lanczos images in the paper should be correct.

It's remarkable what can be done with simple filters.

The trendiest thing lately is image supersizing using neural networks. That must use far more processing power than your approach. Is this overkill? A different approach for a different use.

Randomly Googled link, in case anyone is curious: https://openaccess.thecvf.com/content_cvpr_2017_workshops/w1...

Right. Fundamentally, Magic Kernel Sharp seeks to be an efficient but good approximation to a rectangular low-pass filter; it doesn't add information, and ideally doesn't destroy or distort it either.

Yes, I have seen other methods for guessing or dropping in extra detail. (Indeed, my old "UnBlur" method of deblurring effectively makes a best guess as to what an image looked like before it was blurred, which in the presence of noise is always a selection of one particular possible solution.) There's nothing wrong with that — it's just a different operation.

Hey, just wanted to let you know that I initially [0] closed your site because you don't have HTTPS configured properly (which means visitors get a scary warning page when trying to connect via HTTPS, or leave themselves open to all kinds of man-in-the-middle nastiness when falling back to HTTP). Please consider getting a proper SSL certificate, they're free :)

[0]: I tried again but without forcing HTTPS to confirm the issue after seeing that you're around in the comments. The content is great, but I know I'm far from the only one who defaults to ignoring sites without HTTPS.

Granted, it's a bad example for a Security Engineer to not offer HTTPS. Unfortunately I use a provider that doesn't make it easy at all. I will get there eventually.

This is the part where I timidly link http://n-gate.com/software/2017/07/12/0/ and wait to see what happens next.

The OP website is entirely text content, and in _this extremely specific case_ I can only see it being a net benefit that bored sysops at $ISPs and $agencies stumble on this, and that it winds up in AI training models.

If there was a point to my fist-waving it would be that there's no such thing as knee-jerk assurance when it comes to security. Yes, HTTPS is a good sane default in probably 99% of situations, but that's because the distribution of privacy-requiring contexts on the Web such as shopping and banking is disproportionate to the average.

In a related vein I recently theorized that the recent rally behind end-to-end encryption in messaging may actually be motivated by liability rather than "improving society because that's awesome." https://news.ycombinator.com/item?id=25522220

> If people don't want to see my site with random trash inserted into it, they can choose not to access it through broken and/or compromised networks.

That's valid, and just as valid as users (especially in technical circles!) saying "I won't visit HTTP-only sites".

Hence why I didn't frame my request as "the site sucks because it has no HTTPS" but just "I (and others) simply wouldn't visit the site". I'm just trying to spread awareness, no more.

(The rest of those objections boil down to "I don't use HTTPS hence this QA point is irrelevant").

Edit: Thank you for the site though, the weekly digest is absolutely hilarious :D

> If people don't want to see my site with random trash inserted into it, they can choose not to access it through broken and/or compromised networks.

But isn't that often false, since in many places there are few choices of which ISP connects your home? Am I supposed to just move somewhere else?

For some reason I get stuck at the "security check" when clicking that n-gate link -- none of the buttons do anything, which might or might not be the point.

Granted, the site was interesting enough that I clicked around until I made my way back to the article you were linking to in the first place, but the link itself at least failed to fulfill its purpose :P

The n-gate person put this fake captcha to prevent people from accessing their site when hackernews is the referer.

Just paste the address in your address bar !

Note that the original redirect is a 301, so Chrome will keep redirecting you even if you paste the URL in a new window. Clear your cache or use incognito to avoid it.

Ha, that's kinda awesome given the general tone of that site. Hooray for well-executed tech-cynicism.

Okay that is extremely antisocial. I had no idea the site author did that. Live and learn I guess...

Very cool stuff. Do you happen to have examples of the Sharp or Sharp+ variations? I didn’t notice any on the page, but I may have missed something.

The paper includes examples of Magic Kernel Sharp 3 (the original), 4, 5, and 6. The C code allows you to use any value 'a' of these "generations," and any version 'v' of the Sharp kernel listed in the paper (up to a = 6, v = 7), for any case you want to try. I may add more versions later.

The Sharp+ is just extra sharpening. It's included in the executable, but isn't fundamentally anything more than extra sharpening over what tries to be as-close-to-ideal resizing as possible.

Ahh, that would make sense; it didn’t occur to me to check the paper. I was mostly curious to see at a glance how it would visually compare with the addition of sharpening. Ultimately, it ends up looking pretty good.

Yes, and if you're comfortable pulling down the code and running 'make' on any *nix system, there is an executable 'resize_image' that lets you test it out for yourself. I'm always looking out for corner cases that stress-test the differences between the algorithms, so let me know if you find any good ones.

Thanks for the very interesting research. It is unexpected to me another kernel + sharpening will outperform lanczos2/3 in practice.

Thanks. It was unexpected to me as well. The original "magic" kernel was really along the lines of interpolation (particularly by large factors — there were all sorts of algorithms out there, e.g. repeatedly upsizing by small factors, to avoid aliasing ands pixelation artifacts) and wasn't seeking to approximate the properties of the sinc function / rectangular low-pass filter at all. Only after really understanding in practical terms (i.e. on millions of test images) the blurring nature of the continuum generalization of that kernel did I apply sharpening, and that's when it starts to resemble the Lanczos approximations. It's really the properties around harmonics of the sampling frequency that make the big difference — and that's where the high-order zeros come into play, that the Lanczos approximations don't possess.

Ah thanks, I wasn't aware of this. That mainly talks about the original "magic" kernel ([1,3,3,1]) and an earlier version of my page. It rightly references the fact that the magic kernel (any version) by itself slightly blurs an image. The paper http://johncostella.com/magic/mks.pdf sheds more light on all of this.

Isn't the magic kernel simply a 2× bilinear downscale filter? For example, if you use 2× bilinear downscaling in Tensorflow (as opposed to bilinear downsampling which would end up being just a box filter) using tf.image.resize it uses the magic kernel [1,3,3,1].

There is an article [0] that points this out, although later it goes into recommending odd filters, which insidiously break many applications where you would want to use downscaling by shifting the image by a fraction of a pixel.

[0] https://cbloomrants.blogspot.com/2011/03/03-24-11-image-filt...

I discuss Charles Bloom's 2011 article in some detail on my page. The original "magic" kernel (i.e. [1,3,3,1]) can be rightly categorized as just many things. It's the generalization to the continuum case, and the addition of the Sharp step — and, now, to arbitrary higher "generation" 'a' — that makes it useful and more powerful. Bloom's post ten years ago inspired me to play some more with the "original" and figure out at least the "continuum" version.

I recommend you check out the paper http://johncostella.com/magic/mks.pdf for developments since the original 2006 "magic" kernel. :)

>"In early 2015 I numerically analyzed the

Fourier properties

of the Magic Kernel Sharp algorithm, and discovered that it possesses high-order zeros at multiples of the sampling frequency, which explains why it produces such amazingly clear results: these high-order zeros greatly suppress any potential aliasing artifacts that might otherwise appear at low frequencies..."


>"Following that work, I analytically derived the

Fourier transform

of the Magic Kernel in closed form, and found, incredulously, that

it is simply the cube of the sinc function.

This implies that the Magic Kernel is just the

rectangular window function convolved with itself twice

-which, in retrospect, is completely obvious. This observation, together with a precise definition of the requirement of the Sharp kernel, allowed me to obtain an analytical expression for the exact Sharp kernel, and hence also for the exact Magic Kernel Sharp kernel, which I recognized is just the third in a sequence of fundamental resizing kernels. These findings allowed me to explicitly show why Magic Kernel Sharp is superior to any of the Lanczos kernels. It also allowed me to derive further members of this fundamental sequence of kernels, in particular the sixth member, which has the same computational efficiency as Lanczos-3, but has far superior properties."

Paper: http://www.johncostella.com/magic/mks.pdf

PDS: My thoughts: Absolutely fascinating! (Also, that the the "Magic Kernel is just the rectangular window function convolved with itself twice" -- was not obvious to me, so it's good you included that!)

"Not obvious": yes, I was originally writing up the paper just to contain the derivation and results of the Fourier transform calculation — whatever it turned out to be — and was dumbstruck when it turned out to be sinc^3 x. The original title was just "The Fourier Transform of Magic Kernel Sharp." It all turned out to be more interesting than a boring calculation. Some of it also doesn't quite flow properly because the story kept changing as the mystery unfolded, but I think it's close enough to summarize it all.

I also realized while writing it that the original doubling "magic" kernel (the 1/4, 3/4) was just the linear interpolation kernel for a factor of 2 with "tiled" positions, which I added to the page after writing the paper. I have since seen that others had previously made that comment (which I missed, but they are completely right). It's interesting that the infinitely repeated application of this "second" order kernel, i.e. what I now call m_2(x), originally led me (i.e. in 2011) to the "next" one, i.e. m_3(x). I still haven't fully sorted through how that works, but it's lucky that it did.

An absolutely brilliant work in my opinion!

I'm always curious when/where/how we see Fourier Transforms of any sort -- because we see them in so many diverse fields, and in so much diverse Mathematics to consider them almost ubiquitous and fundamental for any meaningful understanding in many areas of Physics...

Here we see them in Image Processing...

Again, an absolutely brilliant work!

You imply that this is implemented over gamma compressed data... Doesn't that make all the analysis you have done void?

The kernels themselves should be applied in linear space, and their properties are independent of the signals to which they are (assumed linearly) applied. If the practical examples I applied them to should have had gamma compression removed before being transformed, then that's an oversight in the code that could be easily corrected. The general results won't be affected.

Hmm, aside from my note in https://news.ycombinator.com/item?id=26543937 about how the family of kernels Costella discovered are precisely the uniform cardinal B-splines, I noticed this slight error:

> Computationally, however, nothing short of Fast Fourier Transforms or hardware acceleration can make a large blurring job efficient: for large values of blur, the support of any blur kernel is large, and not practical for some at-scale applications.

The standard way to do Gaussian blur is to approximate it as a B-spline, specifically a cascade of simple moving average filters, also known as a Hogenauer filter or CIC filter in the time domain. Typically a cascade of three filters is used, precisely as Costella describes. This is both simpler and faster than doing the task with a Fourier transform, and moreover it is an online algorithm — it produces output samples at some processing lag behind the input samples.

As you can easily verify, the following program uses this approach to convolve its input buffer with a Gaussian kernel with support of 766 samples, constructed in just the way Costella describes (as a B-spline), in a single pass over the input buffer, with only six additions and subtractions per sample, not counting the additions and subtractions used for addressing.

    #include <stdio.h>

    enum {lag = 256, image_size = 4096};

    unsigned long buf[image_size] = {0};

    main(int argc, char **argv)
      buf[image_size/2] = 1;

      for (size_t i = 0; i < image_size - 3 * lag - 3; i++) {
        buf[i + 3 * lag + 2] += buf[i + 3 * lag + 1];
        buf[i + 3 * lag + 1] += buf[i + 3 * lag];
        buf[i + 3 * lag] += buf[i + 3 * lag - 1];
        buf[i + 2 * lag] -= buf[i + 3 * lag];
        buf[i + lag] -= buf[i + 2 * lag];
        buf[i] = buf[i + lag] - buf[i];

      for (size_t i = 0; i < image_size - 3 * lag - 3; i++) {
        if (i) printf((i%8) ? ", " : ",\n");
        printf("%6ld", buf[i]);

      return 0;
Removing the printf calls, this takes about 162'677 instruction executions to run, according to Valgrind, which is 39 instructions per pixel. But most of that is still startup and shutdown; increasing the image size by another 4096 increases the number to 232'309 instructions, only 69632 more instructions, or 17 instructions per sample. (printf takes almost two orders of magnitude more time than the blur call.)

Adding an outer loop reveals that on my nine-year-old laptop it takes 6.5 μs to run over the whole buffer, or 1.6 nanoseconds per sample. In theory this performance does not depend on the lag and thus the blur radius, but I've only tested a single lag.

The output samples are in some sense scaled by 2²⁴ = 16777216, or by 2¹⁶ = 65536 if you're interpolating a signal that's being upsampled by zero insertion by a factor of 256. This scale factor does depend on the lag (it is the cube of the lag).

This version computes the results in place starting from output that is already in memory, but you can also use a ring buffer with a capacity of at least 3lag + 2 = 770 samples. If you're downsampling, an alternative is to decimate the samples as soon as they've passed through the cascade of three integrators at the top of the loop, thus reducing both the amount of memory needed and the number of operations to do.

Overflow of intermediate results can happen, but it is not a problem as long as the final output is within range, because exact integer addition mod k is a group. If you were doing this in floating point, you might prefer to interleave the integrators with the comb filters to prevent intermediate overflow and consequent loss of precision leading to numerical instability.

Only the parts of the result buffer over which all six pipeline stages have passed contain valid results. The simplest solution to this is padding, but you can also use special logic to run the integrators but not the combs over the beginning of the input.

I would argue that 17 instructions or 1.6 nanoseconds per sample is always "efficient" and "practical" for nearly all "at-scale applications". If you were blurring a 3840×2160 display at 120 fps, one gigapixel per second, it would take 13 milliseconds of my laptop CPU cores per second for each of the R, the G, and the B; you could do it on about 6 cores of a modern CPU, without a GPU.

Correction: if you want to do that in two dimensions, you have to do that twice, and the second dimension is going to have worse locality of reference, especially if you don't tile your images. I still think that's efficient and practical.

GCC unrolled the loop twice, so it was 34 instructions that processed two samples per iteration. The instructions are mostly things like this:

        addq    (%rax), %rbp
        subq    %rdi, %rcx
        addq    %rbx, %rdi
        subq    %rcx, %rdx
        movq    %rcx, -2064(%rax)
        movq    -2056(%rax), %rcx
That should clarify a bit how one core of my mere 2.8GHz Core i7 is managing to run over ten billion instructions per second.

Godbolt link for those who want to play: https://godbolt.org/z/MfEqYn3Pd

Gaussian blur gets used for this a lot precisely because it's so easy to compute this way — it's the only convolution kernel that's anisotropic and separable. So people press it into service doing things like camera bokeh that it's not really very good at.

Ah nice, I will correct that statement, which I was sure was bound to have a counterargument. :)

I see that now your page links to my comment, saying, "More efficient ways of performing Gaussian blur have been noted; the Magic Blur was convenient in that it made use of existing hardware libraries and calling code stack."

But in fact what I described above is the Magic Blur algorithm, just in a slightly different guise, and it can probably take advantage of existing hardware libraries in the same way. However, I don't know what your hardware libraries are doing in the blur case!

Happy to help!

I always thought the Lanzcos kernel was overrated.

They are actually pretty good — I believe much better than truncated sinc (likely because of the discontinuity in first derivative of the latter), but I haven't done that analysis myself.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact