Hacker News new | past | comments | ask | show | jobs | submit login
Fourier transforms of images (2017) (maths.org)
174 points by sytelus on Jan 22, 2018 | hide | past | web | favorite | 36 comments

An optical lense is essentially transforming the image into fourier space and back again. The radius of the lense (approx) is a limit to the spacial-frequencies, i.e. some frequencies are clipped off. In the image plane, the signal has been backtransformed - however, some frequencies were lost due to the aperture.

I.e. the fourier transform explains, why apertures limit the resolution of optical devices. It is not only absolute brightness, but fundamentally an information-bottleneck.

You can actually use a laser, run it through a beam expander, pass it through a projector slide, run it through a lens, apply the filtering operations at the focal plane just like you would with a fourier transformed image, and get the expected result out. I thought this was just something cool you could do with lasers, but it seems that there is maybe some use for this: https://en.wikipedia.org/wiki/Optical_computing#Optical_Four...

When you have straight-edged aperture blades, the diffraction pattern creates spikes on the point spread function of the lens, and this can be predicted by just taking the Fourier transform of the aperture shape.

I know some (rudimentary) harmonic analysis and am familiar with the Shannon-Nyquist theorem but I'm not sure I follow. Care to explain?

Edit: nevermind, I got it [1]. I see, however, the result only holds 'modulo' paraxial waves well-behaved in the linear regime... So it's not that every lens automagically computes a FFT at the speed of light as it seemed at first!

[1] https://en.wikipedia.org/wiki/Fourier_optics#Fourier_transfo...

quite happy my thesis supervisor does not read this really really simplified portrayal.

Wow, that's awesome, I've never thought of that! I guess it's similar to how a prism deconvolutes light eh?

not sure what you mean by a prism deconvoluting light.

Good lord.

The very strong 45º lines in the rotated image are probably mostly artifacts caused by the image edges seen in the rotated version. A rotation in real-space is equivalent to one in reciprocal space, so the two transforms should look the same (barring the interpolation differences, etc.) but rotated.

I wonder whether one could show this. maybe by pretending the image is actually a cell in a periodic grid, then transform this grid, and rotate this grid. The periodicity would make roation possible without getting empty spots.

That's a good idea. The condition implicitly assumed to get the result mentioned ("A rotation in real-space is equivalent to one in reciprocal space") is that the original image is periodically tiled in all directions. Your suggestion mimics this condition.

Since your other comment mentioned limitations of optical apertures, maybe it's worth noting that the way the 45 degree rotation was done here (plunking the rotated image into a black background) mimics the action of an optical aperture -- that is, clipping the observed image along that rectangle.

This causes 45 degree spikes, similar to: https://en.wikipedia.org/wiki/Diffraction_spike#Diffraction_....

Yes. I was going to add that the lines in the original image are more likely due to the discontinuity between the top, bottom, left, right edges. The other comment wasn’t mine, but is right that an aperture in the back focal plane of a lens would cut out image information at higher resolution.

Something I find interesting is that, although on the surface this seems like a way to deal with rotations in general, it only works for certain angles. For instance, with a 45 degree rotation, you have to scale the image, too, and the resulting tile contains two of the original image.

Topologically, an underlying problem is that there is not a continuous family of rotations for a torus (the topological space corresponding to two independent periodic dimensions).

> The very strong 45º lines in the rotated image are probably mostly artifacts caused by the image edges seen in the rotated version.

I think this is correct, and there are also some (more subtle) artifacts in the unrotated version for the same reason. The thing I don't understand is why this occurs for 2D images, but there are no artifacts created by the "edges" of a 1D sound (i.e. the start and end of the clip).

When you do a Fourier transform of sound, you'd usually apply a windowing function: multiply the sound with a function that descends continuously to zero at the boundary of the "window", so that there is no discontinuity there. That function is chosen in such a way that it should affect frequencies of interest in the transform's output only mildly.

(Off-topic for OP, on-topic for the above interesting comment..)

I've looked at the windowing options on a typical oscilloscope. (The Tektronix MSO 3014 is the one we happen to use in our lab.) Hanning window, Hamming window, Blackman-Harris, etc.

I'm wondering how much of the off-frequency noise in a typical signal is due to the discontinuity at the end/beginning of the fragment of the signal captured by the scope.

If it turns out that the discontinuity is the dominant source, there may be a straightforward way to avoid it.

A way to eliminate the discontinuity is to capture the data from the scope, transfer it to a computer, and trim the file so that it is a multiple of the fundamental frequency in length.

Then, use a mixed-radix FFT algorithm on the resulting file.

(It may be necessary to add or subtract a few lines from the file to ensure that the length of the file does not have large prime factors. Mixed-radix FFT is quadratic in the largest prime factor of the size of the input.)

Is this common practice? Or, are there reasons other than masking the effect of the boundary discontinuity that people typically use windowing functions?

What fundamental frequency? If your signal is a mixture of frequency you care about and some other random ones, the random ones will generate lot of discontinuity if you tune the period to the interesting one. Also, if you're slightly off with your estimation of the interesting frequency, your signal will not match up.

Using a window trades off sensitivity for low frequencies and a slight bit of frequency resolution against these problems.

That makes sense. Seems like the same technique could be used with images, yes?

Yes, although not with very low-resolution images (8x8 or 16x16), because then you run out of frequency gap between interesting frequencies and ones that are contributed to by the windowing function. EDIT: Another facet of the same issue viewed from pixel space is that you lose a lot of contribution of pixels close to the edge, and in such small images majority of pixels are close to the edge.

There's also an interesting alternative for images, if you don't care about inverting the transform to get the original image: you can express an image as a sum of a periodic component and a very slowly changing one. The periodic component looks like the original to the human eye, so taking DFT of it is usually sufficient for analysis. The paper that discovered that technique: https://hal.archives-ouvertes.fr/hal-00388020v1/document

As one example of how this can be useful, Jo and Bengio recently used Fourier filtering to measure the susceptibility of neural networks to adversarial examples. By changing the statistics of the images in a principled manner, they confirmed that even networks that generalize well are learning mostly surface-level statistics. E.g. an image of a car is more likely to also have asphalt and building colors than greenery. The networks overweight these kinds of features, and that turns out to be good enough to get high scores on many data sets. Using Fourier filtering, they were able to alter the images to generate arbitrarily different surface statistics while preserving how humans would perceive the images.

https://arxiv.org/abs/1711.11561 (https://news.ycombinator.com/item?id=16165126)

I was pretty concerned by this paper but now I'm much more calmed down :). I think authors have jumped to conclusion here. In the two papers that they have cited themselves mensions that human visuals system actually filters out high freq components before transmitting signals further down the chain. The paper says the same thing: if you apply low pass filtering, network is more well behaved. So the expectations that network should consume high freq components and still be resistant to adversarial attacks is unfounded.

I love how you can transform images to wavelets and then edit those components to remove high and low frequency pieces of the image, doing some retouches that will make photos look much more natural.

Example: http://registry.gimp.org/node/11742

Removing freckles should be a crime.

Maybe we could make one of those "style transfer" neural nets to re-add them browser-side.

Retouched face seems wider and less feminine...

Optical illusion, or processing side-effect?

It's the same width. If you can view stereograms, you can use the same technique to quickly compare side-by-side images. Cross your eyes so the images overlap but keep them focused. You can then easily see that the biggest changes are to the skin on the nose, chin, and under the eyes. The structure of the face is unchanged.

An MRI scanner is effectively Fourier domain camera - see k-space [https://en.wikipedia.org/wiki/K-space_(magnetic_resonance_im...] - with the final spatial images being reconstructed once acquisition is complete. Quite beautiful really.

MRI is probably better thought of as a physics experiment than as a camera, but it is quite elegant (at least on paper).

Hardly the only example, though, cf for example OCT (which is commonly used clinically in ophthalmology)

I always liked the imagemagick introductions to fourier transform filtering of images:

Here's another one for advanced uses, including sharpening through FFT division:


Reminds me of the recent Productchart blog post where they swapped the brightness of pixels with their position:


Here is a 250 lines jpeg decompressor written using vanilla python


Enjoy the 5 lines (but slow) IDCT :)

  for y in range(0,8):			
    for x in range(0,8):
      nn = an*math.cos( n* math.pi * (x +.5)/8.0 )
      mm = am*math.cos( m* math.pi * (y +.5)/8.0 )
      self.base[ XYtoLin(x, y) ] += nn*mm*coeff

For anyone reading this, I highly recommend doing as much as possible within MATLAB. Its secret sauce is that it operates on vectors (arrays and matrices) instead of primitives (ints, floats etc). It simply runs circles around almost every other language that I've used. Its real power comes from its domain toolboxes (libraries):


But the language itself is uniquely readable even for its brevity. You can do things in a few lines of code that would take 10 pages in a scripting language or 100 pages in lower level languages like C or Java. Here's the most concise explanation for how JPEG/MPEG compression works that I've ever found:


If you can't afford MATLAB right now then you can at least play around with the language in GNU Octave:


The claim that vertical and horizontal lines in the transform's output are caused by natural tendency to have vertical and horizontal edges seems doubtful. I suspect they are caused by discontinuities between the left and right image borders. DFT is nearly unchanged (only phases change) under cyclic shifts of the input. Thus a discontinuity inside the image is "just as bad" as a discontinuity between the left and right border of the image.

If you want to DFT an image for analysis purposes (ie. if you don't need the transform to be invertible), you can use the trick from the "Periodic plus Smooth Image Decomposition"[1] paper that extracts a "periodic approximation" of the image. It's used e.g. for DFT-based texture enlargement/periodization.

[1] https://hal.archives-ouvertes.fr/hal-00388020v1/document

Looks a bit like these video grading tools: https://digitalfilms.files.wordpress.com/2015/06/df2215_reso... (but I never understood what they actually show)

edit: ahhh, it encodes x position, but luminance distribution on y https://youtu.be/0ghTPnf89ME?t=166

If you need a coder-friendly & graphical refresher on Fourier transforms, I highly recommend this: https://jackschaedler.github.io/circles-sines-signals/

The 1D-case for edge filtering is already quite impressive, when done interactively. http://david.li/filtering/

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact