A Mathematical Theory of Communication (1948) [pdf] (worrydream.com)
169 points by maverick_iceman on July 12, 2016 | 53 comments

Meaning no disparagement of Shannon and Information Theory, I would urge you to study Cybernetics as well.

Information Theory can be understood as an attempt to make real-world phenomena act like digital symbols. Cybernetics, in contrast, attempts to make digital symbols act like real-world phenomena.

In logic, "A = not A" is a contradiction that jams the works, but in cybernetics it's just an oscillator. Our systems are more properly modelled by the formalisms of Cybernetics than Information Theory.
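A toy illustration of the oscillator point (a sketch, not taken from Ashby's book): treat "A = not A" as a state-update rule applied over time rather than a static logical assertion, and iteration produces a square wave instead of a contradiction.

```python
# Treat "A = not A" as a discrete-time update rule, not a static assertion.
# Iterating it yields an oscillation rather than a contradiction.
def oscillator(initial: bool, steps: int) -> list[bool]:
    history = [initial]
    for _ in range(steps):
        history.append(not history[-1])  # A(t+1) = not A(t)
    return history

print(oscillator(True, 5))  # alternates True/False each step
```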

(You can get a PDF copy of "Introduction to Cybernetics" from http://pespmc1.vub.ac.be/ASHBBOOK.html)

Cybernetics was highly influenced by Shannon's IT. Shannon was also an active member of the cybernetics community. I would not consider the theories antithetical but rather synthetical.

Thanks for the link. I've been struggling recently in trying to explain a lot of ideas that seem to coincide very tightly with cybernetics. This gives me so much to read on! :)

For those interested in the nature of information as such and as it relates to computation, you might find "From Aristotle to John Searle and Back Again: Formal Causes, Teleology, and Computation in Nature" by E. Feser interesting.

Feser's blog post about his paper, and some interesting comments:


Don't think it's going to be of much interest to HN readers:

>Yet, a powerful objection has been raised by John Searle, who argues that computational features are observer-relative, rather than intrinsic to natural processes. If Searle is right, then computation is not a natural kind, but rather a kind of human artifact, and is therefore unavailable for purposes of scientific explanation.

>In this paper, I argue that Searle’s objection has not been, and cannot be, successfully rebutted by his naturalist critics.

Computation inaccessible to science? Searle being correct about computation? Waterfalls play chess, as viewed by the right observers?

Is there anything in Searle's argument that is specific to information and computing? I mean, chairs, for example, are just piles of atoms, and don't really exist without some observer interpreting them as chairs.

(This, by the way, is a non-rhetorical question. I disagree with Searle on this point, but he is too smart not to have a good response to this line of criticism.)

Searle's conclusions are certainly shockingly counterintuitive, which challenges us to do a good enough job of formalizing what we mean by computation, or more specifically universal computation, that we can rebut it.

Do you remember the controversy over Matthew Cook's proof† of the computational universality of elementary CA rule 110? There wasn't consensus on whether the infinitely-repeating background pattern 00010011011111 was a valid precondition.

The entire field of fully homomorphic encryption involves programming a computation to be executed on a machine whose owner is (conjectured to be) computationally incapable of understanding what the computation does without the decryption key. If the decryption key is erased, does the computation still take place? What if you just erase 16 bits of the decryption key? Different answers are obviously correct to different people.
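A toy sketch of the flavor, using one-time-pad XOR, which is homomorphic under XOR (nothing like real lattice-based FHE schemes, purely illustrative): the machine holding only ciphertexts can combine them into a valid ciphertext of the XOR of the plaintexts without ever being able to say what it computed.

```python
import secrets

# Toy XOR "encryption": homomorphic under XOR, i.e. XOR of ciphertexts
# decrypts (under the XOR of the keys) to the XOR of the plaintexts.
# Illustrative only -- real FHE supports arbitrary circuits; this does not.
def encrypt(m: int, key: int) -> int:
    return m ^ key

def decrypt(c: int, key: int) -> int:
    return c ^ key

k1, k2 = secrets.randbits(8), secrets.randbits(8)
c1, c2 = encrypt(0b1010, k1), encrypt(0b0110, k2)

# The untrusted machine combines ciphertexts blindly...
c3 = c1 ^ c2
# ...and only the key holder can interpret the result.
assert decrypt(c3, k1 ^ k2) == 0b1010 ^ 0b0110
```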

At the most basic level, you can view an N-bit AND operation as an N-bit OR operation by inverting your interpretation of the voltage levels, or an M-bit operation where M < N by ignoring some of the bits as noise. And both of these reinterpretations are actually useful and used on a daily basis by programmers and hardware designers.
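The AND/OR reinterpretation is just De Morgan's law applied to the wire labels: relabel low voltage as 1 and high as 0 on every input and output, and the same gate computes OR. A quick exhaustive check over 4-bit words:

```python
# De Morgan on wire labels: reinterpret every bit inverted (0 <-> 1),
# and a bitwise AND gate behaves as a bitwise OR gate.
MASK = 0b1111  # 4-bit words

def reinterpret(x: int) -> int:
    """Flip the voltage-level convention: invert all 4 bits."""
    return x ^ MASK

for a in range(16):
    for b in range(16):
        # AND under the inverted interpretation == OR under the original one
        assert reinterpret(reinterpret(a) & reinterpret(b)) == a | b
print("AND is OR, seen through inverted labels")
```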

You can solve any NP-complete problem with a SAT solver, by transforming the problem instance to a SAT instance, solving it with the SAT solver, and then transforming it back. With the dramatic recent progress in SAT solvers, this is actually the most efficient way to solve some hard computational problems, although SMT solvers are more popular for most problems. If you are observing just the SAT-solving part of the computation, you can't tell whether the problem being solved is a Hamiltonian cycle problem, a TSP problem, or what. At the end of the process, the interpretation of the SAT solver result is in some sense "in the eye of the beholder", even if the beholder uses another computer program to transform it into a form they prefer beholding.
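A minimal sketch of that reduce-solve-interpret loop (helper names are mine, and a brute-force "SAT solver" stands in for a real CDCL solver): 3-coloring a graph encoded as CNF, where nothing in the clauses themselves says "graph coloring".

```python
from itertools import product

def brute_force_sat(num_vars, clauses):
    """Toy SAT solver: try every assignment. A real solver is vastly
    faster, but the interface is the same: CNF in, model out.
    Literals are nonzero ints; -3 means 'variable 3 is false'."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            return bits
    return None

def color_graph(n, edges, k=3):
    """Reduce k-coloring to SAT, solve, and map the model back."""
    var = lambda v, c: v * k + c + 1                 # 1-based variable index
    clauses = [[var(v, c) for c in range(k)] for v in range(n)]  # >=1 color each
    for (u, w), c in product(edges, range(k)):       # neighbors never share c
        clauses.append([-var(u, c), -var(w, c)])
    model = brute_force_sat(n * k, clauses)
    if model is None:
        return None
    # Interpret the model back into the original problem's terms.
    return [next(c for c in range(k) if model[var(v, c) - 1]) for v in range(n)]

coloring = color_graph(3, [(0, 1), (1, 2), (0, 2)])  # triangle: needs 3 colors
assert len(set(coloring)) == 3
```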

Certainly it is possible to do computation with vortices interacting in water, and as you see above, there's legitimately some amount of interpretive latitude about exactly which computation is being carried out, at least in some cases. How wide should we consider that latitude to be? Should it be so wide that we should consider a natural waterfall to be playing a game of chess, or indeed any possible game of chess? That seems shocking and counterintuitive — but if not, we should come up with a good explanation of why not.

†Note that Stephen Wolfram obtained a court order providing prior restraint of the publication of Cook's proof, citing a nondisclosure agreement. Never sign a contract with Wolfram!

Do you refute Searle by just asking? Waterfalls play chess regardless of the observer? Do my 4 apples compute their sum just by virtue of being 4?

The quote a poster made, referencing another party's paper on Searle (that computation is not natural, but an artifact of the observer), raises some cautions for me, even if I get the drift.

I think man is part of nature, is nature, and therefore arguments couched from the 'supernatural' perspective are faulty from the get-go.

Take the hexagonal pattern of honeycombs. Nobody really knows yet how bees actually pull it off. There are hypotheses about pheromone or chemical trails fading, and the circle-walking the bees carry out, with the hexagonal shape being an emergent property. The hexagonal shape also happens to be one of the 'Close Packing' methods, HCP, or Hexagonal Close Packing.

I am re-reading Cellular Automata: A Discrete Universe by Andrew Ilachinski, published in 2001. I am on the chapter about possibly defining a new physics, "Information Physics", section 12.4 to be exact. Paraphrasing a bit: 'primordial information' is the notion that information exists independently of the semantics used to ascribe meaning to it. That "all observables found in nature are essentially 'data structures' that the universe uses to encode information with."

It goes on to remark that all computations are inherently physical processes (slide rule, electrons in computers) and that information is possibly more than just a concept, but one of the fundamental units woven into the very fabric of the universe.

I have become less of a reductionist as I get older, and I have been following Complexity Science since the 1980s/90s. I am starting to lean towards 'information' as not being just a concept, but something real and physical no matter how it is encoded or contained. I guess from a superficial point this puts me at odds with Searle, or the indirect quote of his works.

I'll have to read Searle; I've only learned of him via second-hand quotes, and his exchanges with Daniel Dennett.

I deeply appreciate your detailed response. Back to bees, they are natural and honeymaking is natural too, but only bees make honey. Something being natural does not imply some sort of universality in the form of natural law. Computation has a universal interpretation appeal to us, but it might also do so in the context of our observational reference frame.

If he's grounding his argument in an attempted revival of teleology (and it sounds like he is, via Aristotle's 'final cause'), the project sounds pretty doomed. Of course, if he does manage to pull it off well, it would be /very/ interesting. Considering it's paywalled, though, I won't be looking into it myself.

I'm not sure how you justify calling a "revival" of teleology doomed. It may have been discarded by the early moderns, but it was discarded without a genuine rebuttal ("new" metaphysical theories were simply proposed where teleology was dismissed). However, as Aristotelians and Thomists argue, efficient causality cannot be understood without final causality. Without final causality, it cannot be explained why certain causes produce certain effects and not others. Furthermore, biology is saturated with functional accounts of organisms. Dismissing these functional accounts as merely "useful fictions", as it is often done when the challenge is posed, undermines far more than the dismisser seems to realize, and raises more questions than it answers.

In the case of information as it is discussed in the paper, Feser first discusses the technical sense of "information", which is exactly the one Shannon is concerned with, i.e., the syntactic sense. The semantic content is provided by the observer, which is to say it is not inherent in the physical states that are being used to represent bits (a representation which is itself observer-dependent). Following this line of thought, Searle argues that computational states are assigned to physical states and are thus as artifactual and conventional as chairs. The computationalist naturalists cannot rebut Searle's charge without invoking notions which are broadly Aristotelian in nature.

It's worth getting a hold of a copy of the paper if you can. In lieu of or in addition to the paper, some of his blog posts delve into related subjects that clarify these points. I also recommend Feser's books on related subjects which are known for their lucid prose and clear explanations. Good places to start are "Aquinas: A Beginner's Guide" and then "Scholastic Metaphysics: A Contemporary Introduction". Aquinas was decidedly Aristotelian and the scholastic philosophers broadly speaking were heavily influenced by Aristotle.

Three years prior, Shannon worked out many of the issues for his theory of information in his work on cryptography:

Shannon, Claude. “A Mathematical Theory of Cryptography.” Murray Hill, NJ: Bell Labs, September 1, 1945. http://www.cs.bell-labs.com/who/dmr/pdfs/shannoncryptshrt.pd....

I've just finished the history book "The Theory That Would Not Die", as mentioned here a couple of weeks ago [1]. It's got some interesting stuff to say about how Shannon's work emerged from the cryptography scene. According to the book, Information Theory was the culmination of all sorts of pragmatic techniques that were developed during code breaking. The terminology and expression were different (e.g., information was measured in bans rather than bits), but it was really interesting to read about how practice developed into theory.

[1] https://news.ycombinator.com/item?id=11983200

Not found. Did you mispaste the url?

Working alternative link with PDF here: https://www.iacr.org/museum/shannon45.html

Great to see this classic. But when I saw it was on worrydream.com I got my hopes up that it would be a link to a Bret Victor explanatory web experiment...

Looks like he's not quite ready to kill math?

There is a great Computerphile video on YouTube about this topic that I can recommend:


(Link to playlist, as all of those are great.)

If this really interests you, you might also like Christopher Olah's blog post on Visual Information Theory [1]

[1]: https://colah.github.io/posts/2015-09-Visual-Information/

It's somewhat depressing how much smarter people seem to have been in the past than they seem to be now.

Well this is Claude Shannon, the "father of information theory" and "founder of digital circuit design theory", not your average grad student of back then, or today.


And Shannon himself might have been somewhat depressed by how much “smarter” his young self was. Richard Hamming:

“When you are famous it is hard to work on small problems. This is what did Shannon in. After information theory, what do you do for an encore? The great scientists often make this error. They fail to continue to plant the little acorns from which the mighty oak trees grow. They try to get the big thing right off. And that isn't the way things go. So that is another reason why you find that when you get early recognition it seems to sterilize you. In fact I will give you my favorite quotation of many years. The Institute for Advanced Study in Princeton, in my opinion, has ruined more good scientists than any institution has created, judged by what they did before they came and judged by what they did after. Not that they weren't good afterwards, but they were superb before they got there and were only good afterwards.”

> The Institute for Advanced Study in Princeton, in my opinion, has ruined more good scientists than any institution has created, judged by what they did before they came and judged by what they did after. Not that they weren't good afterwards, but they were superb before they got there and were only good afterwards.

Regression to the mean guarantees this effect regardless of what goes on at the Institute for Advanced Study.

Stupid comment. The quality of work of a very good scientist is not a random variable with an average being the same as the population average. Some scientists are actually better than average. Stop trying to sound smart.

Now you could argue that IAS scientists were actually average, and they were just lucky enough to hit upon quality results before they went to IAS. In this case your comment could logically make sense. If that is what you believe, I will just assume that you know nothing about the world of research in the past 40 or so years (at least not in an area where IAS is world's best, like theoretical physics) during which the IAS produced some of the world's best research, as opposed to just having the scientists with the world's biggest reputations.

You have soundly demonstrated that you don't understand regression to the mean. The quality of work of a very good scientist is not a random variable with the average being the same as the population average. It is, however, a random variable with an idiosyncratic average, and that is sufficient to observe regression to the (individual) mean.
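The effect is easy to simulate (a sketch with made-up parameters): give each scientist a fixed individual mean plus per-observation noise, select the top decile on one observation, and their next observation regresses toward their individual means even though those means never change; it stays well above the population average.

```python
import random

random.seed(0)

# Each scientist has a fixed individual mean; observed output = mean + noise.
N = 10_000
means = [random.gauss(0, 1) for _ in range(N)]
obs1 = [m + random.gauss(0, 1) for m in means]
obs2 = [m + random.gauss(0, 1) for m in means]

# Select the top decile on observation 1, then look at observation 2.
top = sorted(range(N), key=lambda i: obs1[i], reverse=True)[: N // 10]
avg = lambda xs: sum(xs) / len(xs)

print(avg([obs1[i] for i in top]))  # inflated by selection on lucky noise
print(avg([obs2[i] for i in top]))  # lower, yet still far above the population mean of ~0
```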

The line of reasoning only attempts to make sense if you're talking about the population mean. If you're talking about an individual mean, it could still be a win to hire such an individual if he's still above the population average.

People of today are just as smart. It was only simpler then. Math has grown in complexity and has caused mathematicians to become more specialized. So, in order to see the good stuff and the smart people of today, it requires going deep in to those specialized fields. It's a matter of accessibility.

> People of today are just as smart. It was only simpler then.

An additional effect is that when people learn the major past discoveries (e.g., calculus, mechanics), the version that is taught is often significantly different from how the discoverer cast it or how the idea was originally formulated. Years of teaching, refinement, and finding other techniques to solve problems or prove theorems make it easier to learn. One example is "Maxwell's Equations" for electromagnetism; the most commonly used modern form, consisting of 4 equations, was condensed from Maxwell's 20 equations by Oliver Heaviside:


If the original 20-equation form were more conducive to use and teaching, I suspect those equations would be more widely used.
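For reference, the Heaviside form in question: the four vector-calculus equations (SI units, differential form) that replaced Maxwell's original twenty component equations:

```latex
% Maxwell's equations in Heaviside's vector form (SI units, differential form)
\nabla \cdot \mathbf{E} = \frac{\rho}{\varepsilon_0}
\qquad % Gauss's law
\nabla \cdot \mathbf{B} = 0
\qquad % no magnetic monopoles
\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}
\qquad % Faraday's law
\nabla \times \mathbf{B} = \mu_0 \mathbf{J}
  + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
% Ampère's law with Maxwell's correction
```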

I think all is equal. Yes, you learn calculus in high school now, but we also have Geometric Algebra coming back since the 80s, which allows for succinct symbolic expression of geometrical calculations. This and the field of computation, which for the mathematician is like the telescope was for the astronomers of old.

I think the room for discovery, great discoveries, is still within reach as before. New tools, new concepts, and new challenges. I am still amazed at what could be hand calc'ed or graphed by some of these incredible mathematicians in the past. Amazing.

The fact that we have institutions and foundations like the Santa Fe Institute and The Long Now, among many others, shows that the big questions are still being tackled. But of course, many of us while away in the trenches of technology, and it's hard to see what the brightest are up to sometimes.

Could you give some examples in the field of Computer Science then perhaps?

Utter nonsense. People aren't, in general, dumber nowadays than they were in the past. It's just that the cutting edge keeps getting harder and harder. The stuff Newton and Einstein became famous for is now standard high school/undergraduate curriculum, while the stuff being done on the cutting edge nowadays needs 20 years of study before one can contribute anything new, often ridiculously small and niche contributions.

Or is it just easier to identify the most important thinkers retrospectively, than in the moment?

For example let's ask the HN masses: who do people think is the smartest person alive today? What work have they done / are they doing?

I agree. In his own time, Bach wasn't very well known. Handel on the other hand was an international superstar. Now the tables have turned.

What gives you that impression? When Newton was alive, no one knew calculus. He had to invent it (along with Leibniz).

Now we learn calculus in high school.

Everything we take for granted today must by definition have been invented previously, so any list of who-invented-what contains the Great People Of The Past but not the rank and file who worked in obscurity and achieved little, leaving the impression that folks in the past were smarter. It is a form of survivorship bias.

That's because back then, academic people (i.e. university professors and researchers) REALLY cared about science per se, and not just about putting out papers every few months to get promoted.

A must read! Don't despair if it takes you a while to read it, my guess is at least one month for someone with undergrad math understanding. It will be gratifying in the end.

It really is a great paper. I had an entire semester-long course on just this paper in undergrad. So many hours of lessons and instruction and studying and tutorials. And it wasn't clicking. I flunked the exam 3 times.

Then I tossed out my professor's material and sat down with Shannon's original paper for a weekend. Got an A.

Maybe that says more about my professor than about this paper, but it's a great paper.

This paper and lots of others are mentioned in the book The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal.

I'm reading it now, and it's wonderful. Thanks HN.

Edit: It's like a very entertaining index.

Also, the OpenCourseWare course 6.450 by Prof. Gallager on YouTube can be helpful for digesting the main ideas.

Shameless self-promotion: by chance, I happened to write a blog post today that's all about information theory:

"Measuring Information in Millibytes" https://engineering.vena.io/2016/07/12/measuring-information...

This is a pretty clever way to generate random bigrams without a computer:

To construct (3) for example, one opens a book at random and selects a letter at random on the page. This letter is recorded. The book is then opened to another page and one reads until this letter is encountered. The succeeding letter is then recorded. Turning to another page this second letter is searched for and the succeeding letter recorded, etc. A similar process was used for (4), (5) and (6). It would be interesting if further approximations could be constructed, but the labor involved becomes enormous at the next stage.
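Shannon's book procedure is easy to simulate with any corpus (a sketch of my own; the `corpus` string is a stand-in for the book):

```python
import random

def shannon_bigram_text(corpus: str, length: int, seed: int = 0) -> str:
    """Simulate Shannon's 'open the book at random' procedure for the
    second-order (bigram) approximation: after emitting a letter, jump to
    a random occurrence of that letter and emit whatever follows it."""
    rng = random.Random(seed)
    out = [rng.choice(corpus)]
    while len(out) < length:
        # every position where the last emitted letter occurs (not at the end)
        hits = [i for i, ch in enumerate(corpus[:-1]) if ch == out[-1]]
        if not hits:
            out.append(rng.choice(corpus))  # letter never followed: restart
            continue
        out.append(corpus[rng.choice(hits) + 1])
    return "".join(out)

sample = "the quick brown fox jumps over the lazy dog and then some more text"
print(shannon_bigram_text(sample, 40))
```

With a large enough corpus the output starts to show English-like letter pair statistics, as in Shannon's approximation (3).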

More neat stuff -- the redundancy of English text happens to be just right for making crossword puzzles!

The redundancy of a language is related to the existence of crossword puzzles. If the redundancy is zero any sequence of letters is a reasonable text in the language and any two-dimensional array of letters forms a crossword puzzle. If the redundancy is too high the language imposes too many constraints for large crossword puzzles to be possible. A more detailed analysis shows that if we assume the constraints imposed by the language are of a rather chaotic and random nature, large crossword puzzles are just possible when the redundancy is 50%. If the redundancy is 33%, three-dimensional crossword puzzles should be possible, etc.
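A back-of-the-envelope counting argument for the 50% figure (not Shannon's exact analysis; it treats the line constraints as independent): with redundancy $D$, roughly $26^{(1-D)n}$ of the $26^n$ strings of length $n$ read as valid text, so a random line is valid with probability about $26^{-Dn}$. An $n \times n$ array has $n^2$ letters and $2n$ line constraints, so

```latex
\mathbb{E}[\#\text{valid crosswords}]
  \approx 26^{\,n^2} \cdot \left(26^{-Dn}\right)^{2n}
  = 26^{\,n^2 (1 - 2D)},
```

which stays at least 1 as $n \to \infty$ precisely when $D \le 1/2$. In three dimensions each letter lies on three lines ($3n^2$ constraints of length $n$ on $n^3$ letters), giving $26^{\,n^3(1-3D)}$ and the threshold $D \le 1/3$.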

Completely random, but I had just googled this before coming on Hacker News so that I could cite the original article. Great read if you haven't done so yet.

Trivia: this paper was renamed The Mathematical Theory of Communication the following year, after people realized just how significant it was.

And the year after that, simply _The Theory_.

> The significant aspect is that the actual message is one selected from a set of possible messages.

this is an amazing opener

does anyone know of the history of simultaneous possibility theory?

Is there a pedestrian explanation of this? My math is, shall we say, lacking.

Read James Gleick's The Information.

One of the best nonfiction books I've ever read and an accessible explanation of Shannon's work is a major part of the book.

No, there is no royal road to Geometry. Learn the math. Have patience with yourself; it will take a while. You'll be glad you did!
