Fifty Years of Shannon Theory (1998) [pdf] (princeton.edu)
126 points by avastmick 6 months ago | 22 comments

For anyone interested in a quick visual overview of some important ideas in information theory, I'd like to give a shout out to colah's "Visual Information Theory" post: http://colah.github.io/posts/2015-09-Visual-Information/

I read Shannon's original papers and have been trying to grok information theory off and on, with only limited success, for several years. The book "Information Theory: A Tutorial Introduction" by Stone really helped the principles sink in. For anyone who'd like a textual introduction, I highly recommend it. It's semi-technical; high-school-level math is enough to follow most of the foundational concepts.

If you're feeling more adventurous, Cover and Thomas's textbook is the information theory bible. It's very dense but absolutely packed with insight.

Recently, I read a great book that goes into Shannon a bit. It's called "The Information: A History, A Theory, A Flood" by James Gleick.

It starts with African drum languages: how they carry a very simplified version of speech, and how they handle error correction over great distances. From there it moves into a lot of great historical explanations, including Shannon.

Richard Hamming developed error-correcting codes while working alongside Shannon at Bell Labs. His book "The Art of Doing Science and Engineering: Learning to Learn" and the associated lectures he gave at the Naval Postgraduate School are tremendous.


> I read a great book that goes into Shannon a bit

Ha! I noticed what you did there. Was that intentional?

I think grad students might be able to relate to the following: if one's master's thesis is as world-changing as Shannon's was, how does one even muster up the courage to join a PhD program after that? What could one possibly do that would have even a stab at token respectability compared to the foundation of digital circuitry, the foundation of something that has really changed the world? For grad students who are mere mortals, that would be terrifying.

I did not :) good catch!

I just finished this book; I learned a lot, but overall it was disappointing. It felt really disjointed, skipping around a lot and not tying things together at all.

I was similarly underwhelmed.

Tip: Claude Shannon's new biography, "A Mind at Play" [https://www.amazon.com/Mind-Play-Shannon-Invented-Informatio...], is an enjoyable read.

I wasn't a huge fan. It was more about his life and less about his ideas.

If you want a little bit of both, I highly recommend Seife's Decoding the Universe: How the New Science of Information Is Explaining Everything in the Cosmos, from Our Brains to Black Holes or Gleick's The Information: A History, A Theory, A Flood.

Are there any online tutorials or courses that guide us step by step through creating a non-trivial, fun project that uses Shannon's theory? For example, a program that simulates encoding a message on one end and decoding it on the other.

This is my favourite channel on YouTube; it focuses a lot on information theory and cryptography.


I really like the use of music in these videos (or at least the main one presented on the channel's homepage). There's something about simple atmospheric 'textural' music behind people talking about stuff that gives me enjoyment.

Shannon's original theory does not result in an optimal encoding. Page 2059 of the linked-to text (the third display page) says:

> But, is that the best we can do? Shannon’s Theorem 3 does not address that question, since it only suggests a suboptimal code. (The optimal code of rate R simply disregards all but the 2^(NR) most probable sequences of length N.) Shannon finds the answer in Theorem 4: as long as we require probability of error strictly less than 1, asymptotically, we cannot encode at rates below the entropy. This statement is commonly known as the strong converse source coding theorem. The converse (or weak converse) source coding theorem asserts that error probability cannot vanish if the compression rate is below the entropy.
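The Theorem 3 construction ("disregard all but the 2^(NR) most probable sequences") is easy to try numerically. Here's a small Python sketch for a biased coin; the source and all parameters are my own choices for illustration:

```python
from math import comb, log2

# A biased coin source: P(heads) = 0.8, blocks of N = 20 symbols.
p, N = 0.8, 20
H = -(p * log2(p) + (1 - p) * log2(1 - p))  # entropy, about 0.72 bits/symbol

def error_probability(rate):
    """P(error) when keeping only the 2^(N*rate) most probable sequences."""
    # A sequence's probability depends only on its number of heads,
    # so enumerate by head count instead of listing all 2^N strings.
    probs = []
    for k in range(N + 1):
        probs.extend([p**k * (1 - p)**(N - k)] * comb(N, k))
    probs.sort(reverse=True)
    kept = int(2 ** (N * rate))
    return 1.0 - sum(probs[:kept])

# Coding above the entropy leaves a small error; coding below it leaves
# a large one (and, per Theorem 4, the error tends to 1 as N grows).
print(error_probability(0.9))  # rate > H: small
print(error_probability(0.5))  # rate < H: large
```

Increasing N sharpens the threshold around the entropy, which is exactly what the converse theorems assert.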

Later on the same page:

> The variable-length source code that minimizes average length was obtained by D. Huffman

So what you want is to learn how Huffman coding works. There are many examples; it's common in data-structures courses and textbooks. One is in the classic "Structure and Interpretation of Computer Programs" at https://mitpress.mit.edu/sicp/full-text/sicp/book/node41.htm... . Look around and you'll find plenty more.
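To get a feel for the algorithm itself, here is a minimal sketch in Python (the input string is arbitrary):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # Priority queue of (frequency, tiebreaker, tree) entries; a tree is
    # either a symbol (leaf) or a pair of subtrees. Repeatedly merge the
    # two least frequent trees until one remains.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    # Read the codes off the tree: left edge is '0', right edge is '1'.
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # degenerate one-symbol source
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("abracadabra")
encoded = "".join(codes[c] for c in "abracadabra")
# 'a' (5 of 11 symbols) gets the shortest code; the encoded string is
# 23 bits, versus 33 with a flat 3-bit code for the 5 distinct symbols.
```

Because codes only sit at the leaves of the tree, no codeword is a prefix of another, so the bit string decodes unambiguously.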

There are also many code examples at https://rosettacode.org/wiki/Huffman_coding , and worked-out examples at StackOverflow, like https://stackoverflow.com/questions/11587044/how-can-i-creat... .

The Wikipedia page at https://en.wikipedia.org/wiki/Huffman_coding seems unhelpful for a novice.

There are better options than Huffman coding now; see range coding [1] or arithmetic coding [2], as used in Zstandard [3] for example. In a nutshell, Huffman coding is optimal under the constraint that one uses an integer number of bits per coded symbol. This constraint is natural enough that it was never stated explicitly, but it's not a necessity. Range/arithmetic coding use a continuous encoding covering a full sequence of symbols. In the end, this allows a fractional number of bits per coded symbol, matching the actual probability of each symbol, which leads to a more compact encoding.
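The integer-bit constraint is easy to see on a skewed source; a small sketch (the distribution is made up for illustration):

```python
from math import log2

# A heavily skewed binary source: P(a) = 0.99, P(b) = 0.01.
p = {"a": 0.99, "b": 0.01}

# Huffman must spend at least one whole bit per symbol here, since the
# best binary code is simply a -> 0, b -> 1.
huffman_bits_per_symbol = 1.0

# The entropy says about 0.08 bits/symbol would suffice:
entropy = -sum(q * log2(q) for q in p.values())
print(entropy)  # roughly 0.081

# Arithmetic/range coding approaches this bound by encoding the whole
# sequence as one number, effectively spending a fractional number of
# bits on each symbol.
```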

The patents have now expired, which is why the technique is starting to appear in open-source and royalty-free implementations like Zstd.

[1] https://en.wikipedia.org/wiki/Range_encoding [2] https://en.wikipedia.org/wiki/Arithmetic_coding [3] https://facebook.github.io/zstd/

Arithmetic coding is also mentioned in the text:

> The optimality of the Huffman code must have seemed at the time to leave little room for further work in fixed-to-variable source coding.[11] That, however, proved not to be the case, because of two major difficulties: 1) the distribution of the source may not be known [12] when the code is designed (Section II-E), and 2) although the Huffman algorithm need not operate symbol by symbol, its complexity grows very rapidly with the length of the source block. ... The second shortcoming is circumvented by the arithmetic coding method of J. Rissanen [60] (generalized in [61] and [62] and popularized in [63]), whose philosophy is related to that of the Shannon–Fano code.

I made two assumptions in my comment: 1) s-c-h knows nothing about compression, so would like to start with the easiest code developed along information-theoretic lines, and 2) the author of the IEEE paper knows enough about the topic that I could quote the text directly.

To clarify, the author is https://en.wikipedia.org/wiki/Sergio_Verd%C3%BA and is "the Eugene Higgins Professor of Electrical Engineering at Princeton University, where he teaches and conducts research on Information Theory in the Information Sciences and Systems Group."

I do not have enough knowledge of the field to tell if he included the correct qualifiers. The Huffman code is a "variable-length source code that minimizes average length" while arithmetic codes appear after mentioning the goal of "minimum average length in terms of the distribution of the source."

That appears correct to me.

You might want to check out "Turbo code". It approaches the channel's maximum capacity and is the basis for modern cellular internet (3G/4G).

It is also used for deep-space communications, where squeezing out every bit is essential.

The wikipedia article is quite well crafted. https://en.wikipedia.org/wiki/Turbo_code

A sample C++ implementation can be found here: http://the-art-of-ecc.com/topics.html

One fun place to start is to solve cryptograms by searching for a substitution that minimizes perplexity.
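As a sketch of that idea, here is a toy version in Python: it scores candidate substitutions with a unigram English model and hill-climbs on the key. The frequency table is a rough approximation, and a serious solver would score with bigram/trigram statistics instead of single letters:

```python
import math
import random
import string

# Approximate English letter frequencies (fractions); figures are rough.
ENGLISH = {
    'e': .127, 't': .091, 'a': .082, 'o': .075, 'i': .070, 'n': .067,
    's': .063, 'h': .061, 'r': .060, 'd': .043, 'l': .040, 'c': .028,
    'u': .028, 'm': .024, 'w': .024, 'f': .022, 'g': .020, 'y': .020,
    'p': .019, 'b': .015, 'v': .010, 'k': .008, 'j': .002, 'x': .002,
    'q': .001, 'z': .001,
}

def perplexity(text):
    # Per-letter perplexity under the unigram model: 2^(-avg log2 P).
    letters = [c for c in text.lower() if c in ENGLISH]
    logprob = sum(math.log2(ENGLISH[c]) for c in letters)
    return 2 ** (-logprob / len(letters))

def hill_climb(ciphertext, steps=2000, seed=0):
    # Randomly swap two letters in the key; keep swaps that lower
    # the perplexity of the resulting plaintext, undo the rest.
    rng = random.Random(seed)
    key = list(string.ascii_lowercase)
    def apply(k):
        table = str.maketrans(string.ascii_lowercase, "".join(k))
        return ciphertext.lower().translate(table)
    best = perplexity(apply(key))
    for _ in range(steps):
        i, j = rng.randrange(26), rng.randrange(26)
        key[i], key[j] = key[j], key[i]
        score = perplexity(apply(key))
        if score < best:
            best = score
        else:
            key[i], key[j] = key[j], key[i]  # undo the swap
    return apply(key), best
```

With only unigram scoring this converges to a frequency-matching substitution rather than a guaranteed decryption, which is exactly why upgrading the model (lowering its perplexity on English) makes the search work better.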

Claude Shannon's "A Mathematical Theory of Communication" is available online, and in bookstores.


Khan Academy has done an information theory course.


I focused on information theory in grad school at Stony Brook in 1996/97 and enjoyed Nam Phamdo's courses. Sadly, after reading 8-9 pages I feel like much of this review is over my head now. The words make sense, but the concepts are pretty dense.

I do remember the algorithms (Huffman coding), but the high-level concepts are harder to recall: information being directly related to randomness (entropy). Information is the unknown (random?) part of the message, which sounds more like noise. Channel capacity makes sense: given the noise, there's an upper limit on how fast you can reliably communicate bits and groups of bits.

Re: information is the unknown message. Think of it this way.

You read the newspaper each day. Every day it has the same news: "Today the sun shines." That carries no useful information (no entropy), and you can predict it perfectly.

Second scenario: you read the newspaper, but each day it says either "Today the sun shines" or "Today it is raining" with 50/50 probability. That's maximum entropy, and that's useful information, which you can't reliably predict.
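In Shannon's terms, the two newspapers differ in entropy; a quick check in Python:

```python
from math import log2

def entropy(probs):
    # Shannon entropy in bits: H = -sum p * log2(p), with 0*log(0) = 0.
    return -sum(p * log2(p) for p in probs if p > 0)

h_sunny = entropy([1.0])        # zero bits: fully predictable
h_coin = entropy([0.5, 0.5])    # 1 bit: the maximum for two outcomes
h_skewed = entropy([0.9, 0.1])  # about 0.47 bits: mostly predictable
```

A 90/10 forecast sits in between the two scenarios: you can usually guess the headline, so each issue carries less than half a bit.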
