
Explained: The Shannon Limit (2010) - rkaplan
http://web.mit.edu/newsoffice/2010/explained-shannon-0115.html
======
mturmon
This little article failed to say the really surprising thing about channel
coding, which is that good channel codes exist at all. There's a simple
argument that seems to indicate that they should not exist. If I remember
correctly, this argument was commonly accepted by engineers at the time
Shannon's paper came out.

The argument is: you have a communications channel that flips bits with some
probability, say 1/4. Suppose you want to transmit one bit. The first bit
might have an error, so you add a parity bit -- now P(error) is 1/16. But that
parity bit might have an error, so you add a second parity bit, and you're
down to P(error) of 1/64. So you are driving P(error) to zero, but at the
expense of ever-increasing overhead.

If you keep on adding parity bits, it's not clear that you can ever make the
error probability arbitrarily small, while still transmitting some
information.

That is, in the simple approach above, P(error) -> 0, but Overhead ->
Infinity.
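
As a back-of-the-envelope sketch (mine, using simple majority-vote repetition rather than the parity scheme above, so the exact numbers differ slightly), you can watch P(error) fall only as the overhead blows up:

    # Sketch: repeat each data bit n times over a channel that flips bits
    # with probability 1/4, and decode by majority vote. The error
    # probability goes to zero, but so does the rate (useful bits per
    # transmitted bit), i.e. the overhead per data bit grows without bound.
    from math import comb

    p = 0.25  # channel flip probability

    def majority_error(n, p):
        """P(majority vote is wrong) when one data bit is sent n times (n odd)."""
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(n // 2 + 1, n + 1))

    for n in (1, 3, 5, 11, 21):
        print(f"n={n}: P(error)={majority_error(n, p):.6f}, rate={1/n:.3f}")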

It is "intuitively obvious" that's a fundamental problem, or at least it was,
until Shannon's results showed that intuition was wrong, or at least needed
correction. In fact, the amount of overhead-per-bit is not infinite, but is a
finite number related to channel capacity.

~~~
ordinary
Maybe it's because of the way you put it, but I can't help but see an
algorithm that exponentially increases the reliability of the message by
linearly adding bits: 1 bit for an error rate of 1/4, 2 bits for 1/16, 3 bits
for 1/64. Am I wrong, or was this really lost on everyone back in the day?

~~~
mturmon
But then, to decrease error probability arbitrarily close to zero, you need
arbitrarily many parity bits. That is undesirable.

Shannon showed you can decrease it arbitrarily close to zero using a finite
number of parity bits. Surprised?

Not content with that, Shannon showed how to compute the ratio of overhead you
needed. He also showed that, if you used even slightly less overhead than his
formula required, then no matter how clever your code was, your error
probability could not be made small.
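
In symbols (my loose paraphrase of the two halves of the theorem, not Shannon's exact wording):

    % Channel capacity (maximum mutual information over input distributions):
    C = \max_{p(x)} I(X;Y)
    % Noisy-channel coding theorem, paraphrased:
    %   for any rate R < C and any \epsilon > 0, there exists a code of
    %   rate R with block error probability P_e < \epsilon;
    %   for any rate R > C, P_e is bounded away from zero, no matter the code.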

Then, the reader felt humble and rather amazed.

About fifty years later, people started to figure out how to make codes (in
general situations) that got very close to the best-possible ratio Shannon
found.

~~~
Dylan16807
It took me a minute to digest this, but I see where the problem was. It's not
exactly a finite number of parity bits, but a finite percentage of parity
bits. The first step is to calculate the parity:data ratio needed to just
barely overwhelm the effect of noise. Then to increase reliability you combine
messages, costing no additional bits. Combining messages dilutes spikes in
noise so that the parity is not overwhelmed, to an arbitrary certainty.
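
To put a number on that (my arithmetic, using the bit-flip probability of 1/4 from upthread): for a binary symmetric channel the best possible ratio comes from the capacity C = 1 - H(p), which gives a finite overhead ratio:

    # Sketch: capacity of a binary symmetric channel with flip probability p,
    # and the finite channel-bits-per-data-bit ratio it implies.
    from math import log2

    def bsc_capacity(p):
        """C = 1 - H(p) data bits per channel use."""
        h = -p * log2(p) - (1 - p) * log2(1 - p)  # binary entropy H(p)
        return 1 - h

    p = 0.25
    c = bsc_capacity(p)
    print("capacity (data bits per channel bit):", c)  # ~0.19
    print("channel bits needed per data bit:", 1 / c)  # ~5.3, a finite ratio

So roughly 5.3 transmitted bits per data bit is enough, no matter how small you want the error probability, provided the code and block size are good enough.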

This also goes to show that the direction you approach a problem from makes a
big difference.

It's easy to get the impression that you can't make efficient very-low-error
codes by looking at the effect/cost of parity bits.

But it's _also_ easy to realize that fusing messages can provide statistical
certainty that noise levels don't spike above redundancy levels.

Also, on a completely different point:

Adding standard parity technically consumes arbitrarily many bits, but log(n)
grows very slowly. This has to be considered along with the fact that Shannon
pulled a bit of a hat trick: to make an efficient code safer, you have to
increase the block size, and therefore the latency. You can't get these
improvements for free.

------
alister
Since we're talking about the Shannon limit, why does a 56 Kbps modem not
violate it?

The common wisdom in the early '90s was that the fastest possible modem for a
dial-up connection would be ~35 Kbps. I even remember reading this "fact" in a
communications theory book from that era. Analog modems could _never_ be
faster. (It's assumed that US and Canadian analog telephones have a 3 kHz
channel.)

Then why does a 56 Kbps modem not violate the Shannon limit for data
transmission on a 3 kHz channel? How is it possible to break a fundamental
principle of information theory?

This question has bugged me a lot. Based on some research on the web, I tried
to put together an intuitive answer to what happened. Here's my answer, but I
welcome insight or corrections from people knowledgeable in telephony or
information theory.

At the time the book was written--when people believed that dial-up modems
could never be faster than 35 Kbps--the telephone network was _perceived_ as
being analog end to end (though lots of it was already becoming digital), like
this:

    
    
        User<------------{ telco network }------------------------>ISP
               (analog)        (analog)      (analog)
    

A critical assumption in calculating the Shannon limit was the signal-to-noise
ratio of the network, which was taken to be 35 dB, a figure based on an
all-analog network. (A higher dB number is better for transmission quality.)
The Shannon limit would be about 35 Kbps based on a 3 kHz bandwidth and that
particular signal-to-noise ratio of 35 dB. ( bps = BW log2 (1 + S/N) = 3000
log2 (1 + 10^(35/10)) = 34881 bps )

However, by the mid-'90s, most of the telco network had become digital. Only
the customer loop--the connection between the user and the telephone company--
remained analog. So now, with respect to the Shannon limit, only the noise on
the customer loop matters, since the rest is digital:

    
    
        User<------------{ telco network }------------------------>ISP
               (analog)        (digital)     (also digital)
    

It turns out that it is possible to achieve a much better S/N ratio when only
the customer loop is analog. The Shannon limit would then be based on a 3 kHz
bandwidth (the same bandwidth as before) but a much higher signal-to-noise
ratio of 98 dB for the customer loop. The Shannon limit would now be ~97 Kbps.
( bps = BW log2 (1 + S/N) = 3000 log2 (1 + 10^(98/10)) = 97664 bps )
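
For anyone who wants to redo the arithmetic, here's a small sketch (mine) plugging both figures into the Shannon-Hartley formula:

    # Rough check of the two capacity figures above.
    from math import log2

    def capacity_bps(bandwidth_hz, snr_db):
        """Shannon-Hartley: C = B * log2(1 + S/N), with S/N given in dB."""
        snr_linear = 10 ** (snr_db / 10)
        return bandwidth_hz * log2(1 + snr_linear)

    print(capacity_bps(3000, 35))  # ~34.9 Kbps: the old "35 Kbps" ceiling
    print(capacity_bps(3000, 98))  # ~97.7 Kbps: with only the customer loop analog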

The Shannon limit _still_ exists, but other factors related to the network
would limit the data rate before this bigger Shannon limit takes effect.

So the key thing that the experts and the communications theory book got wrong
was to neglect the hidden assumption about S/N ratio. The bandwidth matters,
but the S/N ratio matters too.

~~~
est
This bugs me too. They used to say that ADSL had a max speed of 8 Mbps; now
there are faster ADSL variants all around the world.

~~~
alister
When they used to say that 35 Kbps was the maximum for an analog modem, it was
purportedly due to a _fundamental principle_ of information theory.

When they used to say that ADSL had a maximum of 8 Mbps, it might just have
been because the spec for the first version of ADSL was written that way. It's
probably not due to any limit of physics or math.

------
wickedchicken
The codes which the article hints at near the end are LDPC[1] codes, which are
being rapidly deployed in nearly anything requiring storage or transmission of
information. I recall speaking to a grad student working on integrating them
into flash memory to reduce its cost.

[1] <http://en.wikipedia.org/wiki/Low-density_parity-check_code>

~~~
epistasis
In addition to being discovered twice, by Gallager and by Glavieux & Berrou,
LDPC codes have deep connections to the belief propagation algorithm, which
was discovered in machine learning, and to algorithms for spin glasses, which
were discovered in physics.

There's a great temptation here to say that this type of algorithm sits at a
sweet spot for understanding communication, learning, and some physical
systems. The behavior of belief propagation at state transitions is quite
fascinating, and it really feels as though there is a great discovery to be
made there.

I have the utmost respect for coding-theory researchers; every time I read
their papers I get new ideas. If you're a programmer interested in writing
fast code or coming up with new algorithms, I'd recommend occasionally
slipping a coding-theory paper into your reading list.

------
Cogito
Some of the best-known codes that approach the Shannon Limit are the Gallager
codes, or low-density parity-check (LDPC) codes [1], invented by Robert G.
Gallager at MIT in 1960.

The article goes on to mention Gallager, but doesn't provide many references.
It's pretty interesting work so check it out!

[1] <https://en.wikipedia.org/wiki/Low_Density_Parity_Check_Codes>

[2] <https://en.wikipedia.org/wiki/Robert_G._Gallager>

------
imurray
It's been linked to many times on HN, but not yet in these comments: An
accessible textbook (free online) on information theory, and its links to
machine learning: <http://www.inference.phy.cam.ac.uk/mackay/itila/book.html>

Recently video lectures that go with that book also went online:
<http://www.inference.phy.cam.ac.uk/itprnn_lectures/>

------
daviddaviddavid
The resemblance between Claude Shannon and Thomas Vidick (off to the left) is
positively uncanny. They're nearly indistinguishable.

