Let's work in one dimension rather than two dimensions. It's easy enough to extend later.
You know that any (reasonably well-behaved) signal is the sum of a (potentially infinite) number of sine waves. For example, a square wave is the sum of sine waves at ever higher frequencies (the odd harmonics) but ever smaller amplitudes.
The higher frequencies are necessary to get the sharp edges.
If you strip the high frequencies, the sharp edges disappear, leaving only the larger motions of the lower-frequency (yet bigger-amplitude) waves.
So the low frequencies are the hill, and the high frequencies are the grass.
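To make the hill-and-grass picture concrete, here is a minimal sketch that builds a square wave from its odd sine harmonics and evaluates it with few and with many terms. The function name `square_partial_sum`, the period of 1, and the sample points are my own choices for illustration, not anything from the posts above.

```python
import math

def square_partial_sum(t, n_harmonics):
    """Approximate a unit square wave (period 1) at time t
    using only the first n_harmonics odd sine harmonics."""
    total = 0.0
    for k in range(n_harmonics):
        n = 2 * k + 1  # odd harmonics only: 1, 3, 5, ...
        # each harmonic is higher in frequency and smaller in amplitude (1/n)
        total += math.sin(2 * math.pi * n * t) / n
    return 4 / math.pi * total

# Just after the jump at t = 0, the ideal square wave is already at +1.
near_edge_low = square_partial_sum(0.01, 1)    # lowest frequency only: the "hill"
near_edge_high = square_partial_sum(0.01, 50)  # plus 49 higher harmonics: the "grass"

# In the middle of the flat top, both should be in the neighbourhood of +1.
plateau_low = square_partial_sum(0.25, 1)
plateau_high = square_partial_sum(0.25, 50)

print(near_edge_low, near_edge_high)
print(plateau_low, plateau_high)
```

With only the lowest frequency, the value just after the edge is still close to zero, so the edge is a gentle slope. With 50 harmonics it has already climbed most of the way to 1, which is the sharp edge the high frequencies are needed for.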
Another way I like to think of it (someone please correct me if I'm wrong) is that high frequency means high detail (the signal changes often, so you need a lot of information to specify how it looks), whereas low frequency means low detail.
(is that completely off or is it an analogous transform?)
What is misleading when one talks about the Fourier transform for pictures is that it has nothing to do with the light waves emitted by the colored particles and received by our eyes. It is about the spatial distribution of intensities: how quickly brightness varies as you move across the image.
Applied to sound, this "frequency view" is much more natural: we hear a sound, and it has a low part and a high part. That's because our ears really do perform real-time frequency analysis, a kind of biological Fast Fourier transform.
From what I remember, doing this transformation is just a matter of taking the original signal s, measuring the amplitude n of its lowest-frequency component f, computing s - n × f, and recursing on the result with the next frequency. The theorem proves that in the limit you get two equivalent representations of the signal: the wave itself, s(t), a function of time, and its "spectrum", S(freq), a function of frequency.
For many purposes, S(freq) is much more convenient than s(t), including comparisons, frequency shifting, extraction, compression, etc.
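The subtract-and-recurse idea described above can be sketched roughly as follows for a sampled signal. This is only an illustration under my own assumptions: the helper name `peel_frequencies` is hypothetical, the signal is sampled over exactly one period, it has zero mean, and every frequency stays below half the number of samples. Real code would normally use an FFT routine instead of this slow loop.

```python
import math

def peel_frequencies(signal, max_freq):
    """For each frequency k = 1..max_freq, measure how much of it is in the
    residual, record its amplitude, and subtract it before moving on."""
    n = len(signal)
    residual = list(signal)
    spectrum = {}
    for k in range(1, max_freq + 1):
        cos_k = [math.cos(2 * math.pi * k * i / n) for i in range(n)]
        sin_k = [math.sin(2 * math.pi * k * i / n) for i in range(n)]
        # "level" of frequency k: projection onto its cosine and sine parts
        a = 2 / n * sum(r * c for r, c in zip(residual, cos_k))
        b = 2 / n * sum(r * s for r, s in zip(residual, sin_k))
        spectrum[k] = math.hypot(a, b)  # amplitude, ignoring phase
        # remove that component, then continue with the next frequency
        residual = [r - a * c - b * s for r, c, s in zip(residual, cos_k, sin_k)]
    return spectrum, residual

# Test signal: a big slow wave (frequency 2) plus a small fast one (frequency 9).
n = 64
signal = [3.0 * math.sin(2 * math.pi * 2 * i / n)
          + 0.5 * math.sin(2 * math.pi * 9 * i / n) for i in range(n)]

spectrum, residual = peel_frequencies(signal, max_freq=12)
print(spectrum[2], spectrum[9])          # amplitudes of the two components
print(max(abs(r) for r in residual))     # what is left after peeling
```

The spectrum dictionary is the "function of the frequencies" view of the same signal: two entries carry essentially all the amplitude and the rest are near zero, which is why comparison and compression are easier in that representation.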
It applies equally well to images, but for me the frequency representation of a picture is not perceptually useful, maybe because our eyes do not Fourier transform what we see.
All of that is an old story for me (I studied acoustics at IRCAM), so please correct me if my memory is wrong.
Does that make sense?
Edit: Here's an image: http://cnx.org/content/m0041/latest/fourier4.png