I have a vague idea of what this means but can someone please explain it in a bit more detail?
The Discrete Cosine Transform is a variant of the Fourier Transform; for images, you use the 2-dimensional version. The 1-D version of a Fourier Transform is what we use to break a signal, like a sound wave, into its constituent frequencies. It takes as input the wave amplitude at various times, and returns amplitudes for various frequencies. If you were to take waves of those frequencies and amplitudes, and add them together, you would get back the original sound wave you started with. (I'm hand-waving away a bunch of details like phase, boundary conditions, undersampling, and overtones--but this is the general idea.)
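To make the "amplitudes at various times in, amplitudes at various frequencies out" idea concrete, here is a minimal sketch of a naive discrete Fourier transform in plain Python. The test signal (two sine waves over 32 samples) and the function names `dft` and `inverse_dft` are made up for illustration; real code would use an FFT library rather than this O(n²) loop.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: time samples -> complex coefficients."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def inverse_dft(coeffs):
    """Add the waves back together: complex coefficients -> real time samples."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

# Hypothetical test signal: 3 cycles at amplitude 1.0 plus 7 cycles at amplitude 0.5.
N = 32
signal = [
    math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 7 * t / N)
    for t in range(N)
]

spectrum = dft(signal)
# Amplitude of each frequency from 0 to N/2 - 1 cycles (this scaling is not valid for bin 0).
amplitudes = [2 * abs(c) / N for c in spectrum[: N // 2]]
rebuilt = inverse_dft(spectrum)
```

Here `amplitudes[3]` and `amplitudes[7]` come out at roughly 1.0 and 0.5 with everything else near zero, and `rebuilt` matches `signal` up to rounding error.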
You can make the Fourier Transform and its relatives deal with images the same way as sound, by pretending that the image is periodic, i.e. that you are tiling an infinite wall with copies of that image. You could create this same wall by overlaying waves of color on top of each other. The Fourier Transform will find these waves, the same way it found the frequencies for the sound.
With sound, low frequency = slow vibration = long wavelength (imagine an oscilloscope). High frequency = rapid vibration = short wavelength. So if you were to try to draw a picture using waves instead of a brush, you would use low frequencies for large things like a head. You would use medium frequencies to add smaller objects like eyes. You would use high frequencies to give small details, like hair or freckles, or the specific shape of a specific person's head.
Let's work in one dimension rather than two dimensions. It's easy enough to extend later.
You know that any signal is the sum of a (potentially infinite) number of sine waves. For example, a square wave is the sum of ever higher-frequency (but smaller-amplitude) sine waves.
The higher frequencies are necessary to get the sharp edges.
If you strip the high frequencies, the sharp edges disappear, leaving only the larger motions of the lower-frequency (yet bigger-amplitude) waves.
So the low frequencies are the hill, and the high frequencies are the grass.
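A rough sketch of the square-wave example, using the standard series (4/π)·(sin x + sin 3x / 3 + sin 5x / 5 + ...). The helper names `square_wave_partial` and `edge_rise` are invented for this illustration; `edge_rise` just measures how far the wave climbs in a small window around the jump at x = 0.

```python
import math

def square_wave_partial(x, max_harmonic):
    """Sum the odd harmonics 1, 3, 5, ... up to max_harmonic at position x."""
    return (4 / math.pi) * sum(
        math.sin(k * x) / k for k in range(1, max_harmonic + 1, 2)
    )

def edge_rise(max_harmonic, dx=0.01):
    """How far the wave climbs between -dx and +dx around the jump at x = 0."""
    return square_wave_partial(dx, max_harmonic) - square_wave_partial(-dx, max_harmonic)

soft_edge = edge_rise(3)     # only the two lowest harmonics: the "hill"
sharp_edge = edge_rise(99)   # many harmonics: the "grass" is back
plateau = square_wave_partial(math.pi / 2, 99)  # close to 1.0 on the flat top
```

With only the low harmonics the wave barely moves across that window; with harmonics up to 99 it covers most of the jump, which is the sharp edge coming back.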
Does that make sense?
Edit: Here's an image: http://cnx.org/content/m0041/latest/fourier4.png
(is that completely off or is it an analogous transform?)
Applied to sound, this "frequency view" is much more natural: we hear a sound, and there is a low and a high part of it. That's because our ears really do real-time frequency analysis, a kind of biological Fast Fourier Transform.
From what I remember, doing this transformation is just a matter of taking the original signal s, measuring the level n of its lowest frequency f, computing s - n × f, and recursing on the result with the next frequency. The theorem proves that, in the limit, you get two equivalent representations of the signal: one being the wave itself, s = f(t), and one being its "spectrum", s1 = f(freq) (a function of the frequencies).
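A minimal sketch of that "measure the level, subtract, move to the next frequency" procedure on a sampled signal. It assumes an even number of samples and uses cosines and sines with whole numbers of cycles, which are mutually orthogonal, so one pass is enough. The names `sinusoid_basis` and `peel_frequencies` and the test signal are invented for illustration; in practice the transform is computed from a closed-form sum (or an FFT), not by repeated subtraction.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sinusoid_basis(n):
    """Cosines and sines with 0..n/2 whole cycles over n samples (n even)."""
    basis = []
    for k in range(n // 2 + 1):
        basis.append((k, "cos", [math.cos(2 * math.pi * k * t / n) for t in range(n)]))
        # The sine is identically zero for k = 0 and k = n/2, so skip those.
        if 0 < k < n // 2:
            basis.append((k, "sin", [math.sin(2 * math.pi * k * t / n) for t in range(n)]))
    return basis

def peel_frequencies(signal):
    """Measure the level of each wave in what is left, then subtract it."""
    residual = list(signal)
    levels = {}
    for k, kind, wave in sinusoid_basis(len(signal)):
        level = dot(residual, wave) / dot(wave, wave)   # the "n" for this "f"
        levels[(k, kind)] = level
        residual = [r - level * w for r, w in zip(residual, wave)]  # s - n * f
    return levels, residual

# Hypothetical signal: an offset of 2.0, a cosine at 2 cycles, a sine at 5 cycles.
N = 16
signal = [
    2.0 + 1.5 * math.cos(2 * math.pi * 2 * t / N) + 0.25 * math.sin(2 * math.pi * 5 * t / N)
    for t in range(N)
]
levels, residual = peel_frequencies(signal)
```

The recovered `levels` contain 2.0, 1.5 and 0.25 for the three ingredients and roughly zero everywhere else, and `residual` ends up at zero up to rounding: the list of levels is the "spectrum" view of the same signal.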
For many purposes, including comparison, frequency shifting, extraction, and compression, f(freq) is much more convenient than f(t).
It applies equally well to images, but for me the frequency representation of a picture is not perceptually useful, maybe because our eyes are not Fourier transforming what we see.
All of that is an old story for me (I studied acoustics at IRCAM), so please correct me if my memory is wrong.
One thing you can do is read how JPEG works; the DCT is a lot like a generalized FFT.
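For a feel of what the DCT does, here is a small sketch of a 1-D orthonormal DCT-II and its inverse applied to a row of 8 made-up pixel values. This is a simplification: JPEG applies the transform in two dimensions on 8×8 blocks and quantizes the coefficients rather than simply zeroing them. The function names and the sample row are assumptions for illustration.

```python
import math

def dct(block):
    """Orthonormal DCT-II: samples -> cosine coefficients (low to high frequency)."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            block[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)
        ))
    return out

def idct(coeffs):
    """Inverse of dct(): cosine coefficients -> samples."""
    n = len(coeffs)
    out = []
    for t in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * coeffs[k] * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(total)
    return out

# Hypothetical row of 8 pixels: a smooth brightness ramp.
pixels = [50, 60, 70, 80, 90, 100, 110, 120]
coeffs = dct(pixels)

# Keep only the two lowest-frequency coefficients and throw the rest away.
kept = coeffs[:2] + [0.0] * (len(coeffs) - 2)
approx = idct(kept)

energy_total = sum(c * c for c in coeffs)
energy_kept = sum(c * c for c in kept)
```

For a smooth row like this, almost all of the energy sits in the first couple of coefficients, so `approx` stays close to `pixels` even though six of the eight numbers were discarded. That is the property compression takes advantage of.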
I'm not a big fan of the periodic transforms, but they do have that nice perceptual interpretation.