

Let's Build an MP3 Decoder - petercooper
http://blog.bjrn.se/2008/10/lets-build-mp3-decoder.html

======
psykotic
In the same genre and likewise in Haskell, I recommend Exploring JPEG:

<http://www.imperialviolet.org/binary/jpeg/>

On the decoder side, for someone who has basic familiarity with compression
and signal processing, the obstacles in understanding MP3 are polyphase filter
banks and the modified discrete cosine transform (MDCT), which are not much explained in
this article. They both take advantage of alias cancelation properties (in the
frequency domain for the filter bank and in the temporal domain for the MDCT)
that in my opinion are pretty subtle. The rest of the components in an MP3
decoder have close analogues in image compression (e.g. joint stereo is a
decorrelation transform that is analogous to going from RGB to a luminance-
chrominance color space), so they shouldn't be difficult to grasp.
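
To make the time-domain alias cancellation a bit more concrete, here is a minimal sketch in Python. It is not how a real MP3 decoder is written: it uses the direct O(N^2) MDCT formula, a sine window, a 2/N inverse scaling (one convention among several), and a toy block size of 8. The function names (`sine_window`, `mdct`, `imdct`, `overlap_add_roundtrip`) are made up for illustration. The point is that a single inverse-transformed block is contaminated by aliasing, but the aliasing terms of two adjacent, 50%-overlapping blocks cancel when added.

```python
import math

def sine_window(N):
    # Sine window of length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1,
    # the condition needed for the aliasing to cancel on overlap-add.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block):
    # Direct MDCT: 2N input samples -> N coefficients.
    N = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]

def imdct(coeffs):
    # Direct inverse MDCT: N coefficients -> 2N aliased samples.
    # The 2/N factor is the scaling that pairs with windowing twice.
    N = len(coeffs)
    return [
        (2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                        for k in range(N))
        for n in range(2 * N)
    ]

def overlap_add_roundtrip(signal, N):
    # Window, transform, inverse-transform, window again, then overlap-add
    # blocks of length 2N that advance by N samples each time.
    w = sine_window(N)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * N + 1, N):
        block = [signal[start + n] * w[n] for n in range(2 * N)]
        rec = imdct(mdct(block))
        for n in range(2 * N):
            out[start + n] += rec[n] * w[n]
    return out

N = 8
signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(4 * N)]
out = overlap_add_roundtrip(signal, N)

# Samples N .. 3N-1 are covered by two blocks, so their aliasing cancels.
interior_error = max(abs(out[i] - signal[i]) for i in range(N, 3 * N))
# Samples 0 .. N-1 are covered by one block only and stay aliased.
edge_error = max(abs(out[i] - signal[i]) for i in range(0, N))
print(interior_error, edge_error)
```

Note that each block turns 2N samples into only N coefficients, so a single block cannot be inverted on its own. Only the overlap with its neighbours recovers the signal.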

For those trickier signal processing parts, Pan's IEEE article A Tutorial on
MPEG/Audio Compression is a great place to start:

<http://www.ee.columbia.edu/~dpwe/e6820/papers/Pan95-mpega.pdf>

------
noonespecial
Wow, that takes me back. One of the first "serious" programs I wrote was an
mp3 decoder in Pascal. It took about 30 minutes per minute of audio to do the
decode on my p90. I felt like king of the whole world.

~~~
jensnockert
Yeah, the good thing is that you can do it in essentially any language and
it will be fast enough for realtime today.

At least as long as you design it reasonably well.

~~~
sliverstorm
Bubble sort!

~~~
jensnockert
Javascript is the new bubble sort?

------
darxius
Great read! (although I admit I'm not completely done with it)

I think the whole process of taking something audible and "real" (like sound)
and digitizing it is simply amazing.

------
Qweef
I don't think the "A Haskell tutorial" part of the submission title is
accurate. It's an mp3 tutorial... which happens to use Haskell. Not everything
which uses Haskell is specifically about doing so. :-)

~~~
petercooper
Was that actually in the submission title here at HN? I didn't put that into
it and it's not there now, so if it was.. we know an admin was fidgeting with
it! :-)

------
DarkMeld
What does it mean when a codec is "lossless"?

~~~
Caerus
This is in no way rigorous, but hopefully the analogy helps.

Imagine you want to encode something like y = 2 * sin(2 * x) + .1 * sin(10 *
x) to save space.

<http://www.wolframalpha.com/input/?i=sin%28x%29+%2B+.1*sin%2810x%29>

That is composed of two different sine waves added together - one low
frequency, large amplitude and one high frequency, low amplitude. That can be
encoded two different ways - lossy or lossless.

A lossy encoder gets rid of unimportant data and keeps a "close enough"
representation of the original. In this example, .1 * sin(10 * x) is a minor
component of the overall signal (plot them separately if you need to compare
the difference) so the encoder chooses to delete it and only save 2 * sin(2 *
x). In a real world sound, this would be like throwing away noise that is so
high pitched we can't hear it, or is so quiet we can't detect it. A real
encoder has to decide what the "small enough it can be deleted" threshold is,
and it's rarely black and white.

A lossless encoder looks at that signal and thinks "there has got to be a more
efficient way to store that data". Both terms are sine functions, so "sin"
doesn't need to be written twice. "x" is completely unnecessary, because the
decoder knows the sine functions are dependent on some variable. So, it could
write out something like "sin (2,2) (.1,10)". All of the original data is still
there, if whoever receives the data (the decoder) knows how to interpret it.
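
The analogy above can be sketched in a few lines of Python. This is a toy model of the analogy only, not of how any real codec works: the `(amplitude, frequency)` pair representation, the function names, and the 0.5 threshold are all assumptions made for illustration.

```python
import math

# The signal y = 2 * sin(2 * x) + .1 * sin(10 * x), stored compactly as
# (amplitude, frequency) pairs, much like "sin (2,2) (.1,10)".
COMPONENTS = [(2.0, 2.0), (0.1, 10.0)]

def synthesize(components, xs):
    # The "decoder": rebuild sample values from the compact description.
    return [sum(a * math.sin(f * x) for a, f in components) for x in xs]

def lossless_encode(components):
    # Keeps every component, so decoding reproduces the signal exactly.
    return list(components)

def lossy_encode(components, threshold):
    # Drops components whose amplitude is below the (hypothetical) threshold.
    return [(a, f) for a, f in components if abs(a) >= threshold]

xs = [i * 0.01 for i in range(629)]  # roughly one period of sin(x)
original = synthesize(COMPONENTS, xs)
lossless = synthesize(lossless_encode(COMPONENTS), xs)
lossy = synthesize(lossy_encode(COMPONENTS, 0.5), xs)

max_lossless_error = max(abs(a - b) for a, b in zip(original, lossless))
max_lossy_error = max(abs(a - b) for a, b in zip(original, lossy))
print(max_lossless_error, max_lossy_error)
```

The lossless path has zero error, while the lossy path is off by at most the amplitude of the dropped component (0.1 here) in exchange for storing one pair instead of two.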

~~~
bwarp
Note: A Fourier transform is never lossless - it's an approximation (usually a
fairly good one though).

~~~
Bou
The Fourier transform itself is an exact mathematical transformation with an
exact inverse transform. There's nothing lossy about it.
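
As a rough illustration, here is a naive discrete Fourier transform and its inverse in Python, written straight from the textbook definitions (the function names and sample values are arbitrary choices for this sketch). In exact arithmetic the round trip returns the input unchanged. In floating point, whatever residue remains is rounding error from the arithmetic, not something the transform discards.

```python
import cmath

def dft(x):
    # Naive O(N^2) discrete Fourier transform.
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

def idft(X):
    # Inverse transform: conjugate kernel, scaled by 1/N.
    N = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
        for n in range(N)
    ]

samples = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.1, -0.6]
restored = idft(dft(samples))

# Largest deviation between the input and the round-tripped output.
max_error = max(abs(r - s) for r, s in zip(restored, samples))
print(max_error)
```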

~~~
bwarp
Mathematically, you are correct. Practically, you are not, if you consider the
source of the data being transformed.

~~~
psykotic
What do you mean? If the original source is analog and sampled below its
Nyquist rate in the analog-to-digital conversion, the process is indeed
irreversible. But that all happens before any transforms from the time domain
to the frequency domain are in play, so it's a separate issue.
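
A small Python sketch of that irreversibility, with made-up numbers: a 7 Hz tone needs a sampling rate above 14 Hz, so sampling it at 10 Hz produces exactly the same samples as an inverted 3 Hz tone. Once sampled, nothing downstream can tell the two apart. The helper name `sample_sine` and the specific frequencies are assumptions for the example.

```python
import math

def sample_sine(freq_hz, rate_hz, count):
    # Sample sin(2*pi*f*t) at t = n / rate for n = 0 .. count-1.
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

RATE_HZ = 10.0  # too low for a 7 Hz tone, which needs more than 14 Hz

high_tone = sample_sine(7.0, RATE_HZ, 20)
# The alias: a 3 Hz tone (10 - 7) with inverted sign.
alias = [-s for s in sample_sine(3.0, RATE_HZ, 20)]

# The two sample sequences agree up to floating-point rounding.
max_diff = max(abs(h - a) for h, a in zip(high_tone, alias))
print(max_diff)
```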

Beyond that, discrete Fourier and cosine transforms as usually implemented are
not fully reversible because of loss in precision. A colleague of mine blogged
about the issue in the context of Haar transforms a few years ago:
<http://cbloomrants.blogspot.com/2008/09/09-08-08-1.html>. By decomposing an
orthogonal transform into shears as explained by Charles, you can design
reversible fixed-precision variants of the DCT like binDCT:
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.8531>
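
To give a flavour of the shear idea, here is a minimal Python sketch of a two-point integer Haar-like transform (often called the S-transform) built from two shear steps. It is not taken from the linked post or from the binDCT paper, and the function names are invented for this example. Each step changes only one value using a rounded function of the other, so the inverse can recompute exactly the same rounded quantity and undo it with no loss.

```python
def forward(a, b):
    # Shear 1: replace a with the difference. b is left untouched.
    h = a - b
    # Shear 2: add floor(h / 2) to b, giving floor((a + b) / 2).
    l = b + (h >> 1)
    return l, h

def inverse(l, h):
    # Undo shear 2 by subtracting the same rounded quantity.
    b = l - (h >> 1)
    # Undo shear 1.
    a = b + h
    return a, b

# Exhaustive round-trip check over a small range, including negatives.
all_reversible = all(
    inverse(*forward(a, b)) == (a, b)
    for a in range(-20, 21)
    for b in range(-20, 21)
)
print(forward(7, 4), inverse(*forward(7, 4)), all_reversible)
```

The rounding in `h >> 1` throws information away locally, yet the transform as a whole stays invertible because `h` itself is kept intact and the same rounding is replayed in the inverse.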

~~~
bwarp
Sorry, it is a separate issue - I should have been clearer.

The original point the OP made was a bad example. Unfortunately I made a poor
attempt at explaining that.

------
rplnt
Using this codec would probably be illegal in some countries. I guess it is
all right to play with it, but don't incorporate it in your applications.

