
Visual Information Theory - benkuhn
http://colah.github.io/posts/2015-09-Visual-Information/
======
sebastos
I really enjoyed this article, it was very accessible. Recently I've been
studying some literature on stochastic optimal control, and I've bumped into
the KL-divergence concept a number of times, but never really understood it. I
expected this article to be a fun read, but I never expected to learn something
so directly useful! Information theory really does show up everywhere.

~~~
diego898
I am currently learning about stochastic optimal control, and am finding the
lecture notes [1] for this course [2] to be extremely helpful. The notes are
by a probabilist, Ramon van Handel, at Princeton. I hope you find them useful!

[1]
[https://www.princeton.edu/~rvan/acm217/ACM217.pdf](https://www.princeton.edu/~rvan/acm217/ACM217.pdf)
[2]
[https://www.princeton.edu/~rvan/acm217/acm217.html](https://www.princeton.edu/~rvan/acm217/acm217.html)

~~~
sebastos
Thanks for the link, it looks great!

------
tlb
I often avoid these sorts of visual methods, because:

\- they only work with 2 variables, but many interesting problems require
digging into more variables

\- they only work with medium-sized numbers, and aren't readable when P<0.01
or P>0.99

So they're great for gaining intuition (like the Simpson's Paradox example),
but when you try to solve a real problem you find yourself boxed in.

~~~
colah3
Yep. The visualization tricks in this article are for building understanding
of basic ideas in probability theory and information theory.

In most real situations, they wouldn't be very practical. As you note, the
core trick in this essay only works for 2 or 3 variables, assumes they're
discrete, and doesn't scale to the variables having lots of values or really
improbable values.

There are visualization techniques which are useful in the real world, at
least some of the time -- a lot of my blog explores this in the context of
neural networks -- but that wasn't my goal in this article.

------
escherize
After careful consideration I've always enjoyed how:

    
    
        p(rain,coat) = p(rain) * p(coat | rain)
    

Can be pronounced: "the probability of rain, and coat (wearing) is the
probability of rain times the probability of (my wearing a) coat given rain".
This intuitively showcases how the order of independent events doesn't affect
the outcome, since after all:

    
    
        p(coat,rain) = p(rain,coat) = p(rain) * p(coat | rain)
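
A small sketch of the same identity with numbers plugged in. The probabilities below are hypothetical, picked only so the arithmetic is easy to follow, and the variable names are mine. It factors the joint both ways, through rain first and through coat first, and the two products agree. Nothing in the calculation relies on an independence assumption.

```python
from fractions import Fraction

# Hypothetical probabilities, chosen only for illustration.
p_rain = Fraction(1, 4)
p_coat_given_rain = Fraction(3, 4)
p_coat_given_sunny = Fraction(1, 4)

# Factor the joint through rain: p(rain, coat) = p(rain) * p(coat | rain)
p_rain_and_coat = p_rain * p_coat_given_rain

# Marginal p(coat): sum the joint over both kinds of weather.
p_coat = p_rain_and_coat + (1 - p_rain) * p_coat_given_sunny

# Conditional the other way round: p(rain | coat) = p(rain, coat) / p(coat)
p_rain_given_coat = p_rain_and_coat / p_coat

# Factor the joint through coat: p(coat, rain) = p(coat) * p(rain | coat)
p_coat_and_rain = p_coat * p_rain_given_coat

print(p_rain_and_coat, p_coat_and_rain)  # 3/16 3/16
```

Using `Fraction` instead of floats keeps the comparison exact, so the two factorizations can be checked with `==`.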

~~~
jsprogrammer
The problem with that example is that there is no reason to assume that coat
wearing and rain are independent (in fact, you have even modeled that wearing
a coat is partially dependent on it raining).

Maybe I missed your point?

~~~
escherize
No, you're right, those events shouldn't be called independent!

A definition of independent events is:

    
    
        A and B are independent events iff P(A|B) = P(A) and P(B|A) = P(B)
    

So that was just plain wrong.
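
For what it's worth, that definition can be checked mechanically. Here is a rough sketch, assuming made-up numbers for the rain/coat case (the function name `is_independent` and all the values are hypothetical, and both P(A) and P(B) are assumed to be nonzero):

```python
from fractions import Fraction

def is_independent(p_a, p_b, p_a_and_b):
    """True iff P(A|B) = P(A) and P(B|A) = P(B); assumes P(A), P(B) > 0."""
    p_a_given_b = p_a_and_b / p_b
    p_b_given_a = p_a_and_b / p_a
    return p_a_given_b == p_a and p_b_given_a == p_b

# Hypothetical rain/coat numbers: P(coat) = 3/8, P(rain) = 1/4,
# P(coat, rain) = 3/16, so P(coat | rain) = 3/4, which differs from P(coat).
print(is_independent(Fraction(3, 8), Fraction(1, 4), Fraction(3, 16)))  # False

# Two fair coin flips: P(A) = P(B) = 1/2 and P(A, B) = 1/4.
print(is_independent(Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)))  # True
```

With these numbers, wearing a coat is twice as likely when it rains as it is overall, so the check fails for rain and coat and passes for the coin flips.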

------
FrankenPC
"I love the feeling of having a new way to think about the world. I especially
love when there’s some vague idea that gets formalized into a concrete
concept. Information theory is a prime example of this."

THIS! THIS is why I love programming and electronics/mechanical engineering. I
live for the new ways to think.

~~~
colah3
I feel like 90% of my motivation for writing blog posts is vicariously
reliving this feeling. :)

~~~
FrankenPC
Personally, I feel like a moron most of the time when I'm here on HN. The
caliber of engineers here is astounding. So, I comment here hoping for
clarification. I'm nowhere near the level of engineering necessary to feel
confident to blog.

EDIT: I understand optimizing search algorithms to create better P=NP O(n)
solutions. So, I guess I'm not totally stupid.

~~~
nazka
It shouldn't be a problem, if you want to write something just do it!

Just avoid writing statements about things you don't know. Start your article
by saying that you are a beginner and for things you don't know, give open
questions instead. You can also learn one simple thing well and do a post
about it, or you can try something new and write a tutorial about it with a
few conclusions of your experience. For instance how to do a simple web app
with React, Flux, and Node.js.

So there are still things you can write, just be open about your level and
what you don't know, and write about what you learned. And even trivial things
can be useful for others (like simple stats for devs, or simple python code
for data scientists).

------
ljk
Off-topic, but does anyone know what was used to draw the graphs? They look
really clean.

I'm guessing LaTeX? The font looked like LaTeX's font.

~~~
colah3
I drew the graphs in Inkscape. It has a plugin for LaTeX equations.

------
misiti3780
I look forward to every article colah writes. The way he explained
backpropagation a few weeks ago was really interesting. I had never thought
about it that way, and it was very helpful!

------
incompatible
It rains 25% of the time in California? Sounds like an unpleasant place.

~~~
colah3
We wish! We're actually in a drought where I live. I also don't think I wear a
coat 75% of the time when it is sunny. :)

But I wanted to have nice numbers and it felt like a nice example.

------
hackaflocka
The visual cortex is one of the largest and most powerful cortices in the
human brain.

But it may be that vision's supposed to work in conjunction with the other
senses.

I think visual explanations work well for very simple visuals. As soon as
higher-order factors need to be factored in, visual explanations are only
sensible to the highly trained expert (think Feynman diagrams).

Nice essay, nevertheless. A lot of time and work went into it, and I can
appreciate that.

