
Entropy: An Introduction - aishwarya_m
https://homes.cs.washington.edu/~ewein/blog/2020/07/14/entropy/
======
ethanweinberger
Hi HN, I'm the author of this piece (Ethan Weinberger). I wrote this
originally as a set of notes for myself while brushing up on concepts in
information theory over the past couple of weeks. I found the presentations of
the material I was reading to be a little dry for my taste, so I tried to
incorporate more visuals and really emphasize the intuition behind the
concepts. Glad to see others are finding it useful/interesting! :)

~~~
spinningslate
Thanks, I enjoyed reading. As an electronic engineering student, I remember
grappling with information theory in the abstract: it was a weather example
very similar to yours that gave me the intuition I was missing.

An observation/suggestion. The intro is accessible to many people; that drops
off a steep cliff when you hit the maths. Now, I'm not complaining about that:
it's instructive and necessary to formalise things. Where I struggle is in
reading the equations in my head when I don't know what words to use for the
symbols. For example, that very first `X ~ p(x)`. I didn't know what to say
for the tilde character, so I couldn't verbalise the statement. I do know that
$\in$ (the rounded 'E') means 'is a member of' so I could read the next
statement. The problem gets even more confusing for a non-mathematician, as
the same symbol is used with different meanings in different branches of
maths/science (e.g. $\Pi$).

I get that writing out every equation in English isn't feasible (or, at least,
is asking a lot of the writer). But I wonder if there's a middle way, e.g.
through hyperlinking?

As I say: not a criticism and I don't have a good solution. Just an
observation from a non-mathematician. Enjoyed the piece anyway.

~~~
jessriedel
"X ~ p(x)" means "X is a random variable drawn from the probability
distribution p(x)" or maybe "X is drawn from p(x)" for short.

Are you sure it's a matter of knowing what to _say_ (in your head) vs knowing
the definition of the notation in the first place? I am pretty familiar with
this notation, but I rarely verbalize it mentally. I can tell because I read
and understand it quickly without a problem, but on the rare occasion when I
have to read it aloud, I realize I'm not sure how I should pronounce it.

~~~
spinningslate
Thanks for the explanation.

Agree it's more "say in my head" than "speak out loud". But I still need to
know what to say - internally or externally. Without knowing that ~ denotes
"drawn from", all I can say is "X tilde p of x". That has no semantics; no
intuition. Whereas knowing that $\in$ means "is a member of", I can read "x
\in X" as "x is a member of X".

> but I rarely verbalize it mentally

Neither do I when I know something well. For example, I don't explicitly
verbalise "is a member of" now, even internally. There's a shortcut hard-wired
in that understands it without needing to pronounce it explicitly. In fact
that shortcut goes beyond the syntax: it goes straight to the intuition of "x
represents any member of the set X". But I had to go through the process of
saying it on the way to the shortcut.

~~~
jessriedel
OK, but if you know the formal definition, and you're not reading it out loud,
why not just make something up? I actually don't know whether "is drawn from"
is the "correct" way to pronounce the tilde. I think maybe other people say
"is distributed as".

------
Analog24
Might be worth clarifying in the title that this is about entropy in the
context of information theory.

~~~
jbay808
In which context is it a different concept?

~~~
hexxiiiz
In thermodynamics there are two other formulations of entropy: the Clausius
one in terms of temperature and heat, and the Boltzmann one. The latter
defines entropy as the log of the number of microstates a system could occupy
in a particular macrostate.
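
Roughly in symbols (my notation, with W the count of microstates compatible
with the macrostate and k_B Boltzmann's constant):

    S_Boltzmann = k_B ln W

    H_Shannon   = -sum_x p(x) log p(x)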

The Shannon definition is equivalent to the Boltzmann one only in the limit
where the system consists of infinitely many identical subsystems. If there
are only finitely many, for instance, the log of that count does not reduce to
the same "-p log p" form.

The Clausius def can be derived from the Boltzmann one, but they are
nevertheless also distinct formulations.

~~~
jbay808
[https://en.wikipedia.org/wiki/Boltzmann%27s_entropy_formula#...](https://en.wikipedia.org/wiki/Boltzmann%27s_entropy_formula#Generalization)

According to Wikipedia, if you start with the Gibbs entropy (which is the same
as Shannon entropy), and then assume all microstate probabilities are equal
(which Boltzmann does), you get the Boltzmann entropy formula. It also says
Boltzmann himself used a p ln(p) formulation.
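
The one-step version, if I have it right: with W equally likely microstates,
each p_i = 1/W, so

    S = -k sum_i p_i ln p_i = -k * W * (1/W) ln(1/W) = k ln W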

So aren't they the same, perhaps up to a constant factor?

~~~
hexxiiiz
If you count the number of microstates for a given macrostate you get a
multinomial coefficient, N!/(n_1! n_2! ...). The log of this is the Boltzmann
entropy. However, if you consider N to be very large or infinite, you can show
using the Stirling approximation that this ends up being the Gibbs/Shannon
entropy. So, in general, no.
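
Sketching the large-N step (with p_i = n_i/N, and using log N! ~ N log N - N):

    log( N!/(n_1! n_2! ...) ) ~ N log N - sum_i n_i log n_i
                              = -N sum_i (n_i/N) log(n_i/N)
                              = -N sum_i p_i log p_i

so per subsystem you recover -sum p log p, with correction terms that only
vanish as N grows.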

------
cmehdy
I imagine the timing of this post is correlated with the release of the
documentary The Bit Player, about Claude Shannon. Haven't seen it yet but
looking forward to it.

The article does a decent job of graphing and laying out some of the concepts
of entropy for information theory, but I'm not sure who the target reader is,
since the prerequisites are perhaps only slightly narrower than what one needs
to read Shannon's paper[0], and the article really illustrates only a fraction
of the concept.

It can perhaps work as a primer for what shows up starting on pages 10-11 of
the original document. In any case, provided you grasp the mathematical
definition of entropy through thermodynamics, the microstates-based definition
through Boltzmann, and "basic probabilities" (expected value, typical discrete
distributions, terms like "i.i.d."), you should be good to go. But then you
might already know all this...

And if you do, and you like what you read, then the full original thing by
Shannon is a delight to explore to truly grasp what has been so foundational
to a lot of things since 1948.

[0]
[http://people.math.harvard.edu/~ctm/home/text/others/shannon...](http://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf)

------
abetusk
Not sure if it's appropriate, but here's my own attempt at a very terse
restatement of Shannon's original paper [1]:

    
    
        https://mechaelephant.com/dev/Shannon-Entropy/
    

I recommend everyone that's interested to read Shannon's original paper. It's
one of the few examples of an original paper that's both clear and readable.

[1]
[https://homes.cs.washington.edu/~ewein/blog/2020/07/14/entro...](https://homes.cs.washington.edu/~ewein/blog/2020/07/14/entropy/)

------
wsowens
For the example of H(X) where X ~ Geom(p), shouldn't the second term be
multiplied by (k-1) after breaking up the logarithm? That is, shouldn't

    
    
      pq^(k-1)log(q)
    

be

    
    
      (k-1)pq^(k-1)log(q) 
    

Apologies if I'm off base here, I'm pretty rusty on infinite series.
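
To spell out what I mean, with q = 1 - p, I get

    H(X) = -sum_{k>=1} p q^(k-1) log( p q^(k-1) )
         = -sum_{k>=1} p q^(k-1) [ log(p) + (k-1) log(q) ]

which, unless I've slipped up somewhere, sums to (-p log p - q log q)/p.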

Great piece, this is really making this concept more intuitive for me.

~~~
ethanweinberger
You're 100% right. Fixed!

------
danielrk
Love the post. Just FYI, your post is not mobile-friendly. When scrolling down
on iPhone, it's impossible not to accidentally shift the viewport away from
the left margin, making the left side hard to read.

------
yters
Neat fact: entropy and expected algorithmic information are asymptotically
equivalent. Two quite different approaches to information theory converge.
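
If I'm remembering the precise statement correctly, for a computable
distribution p it's something along the lines of

    H(X) <= sum_x p(x) K(x) <= H(X) + K(p) + c

i.e. the expected Kolmogorov complexity of a draw matches the Shannon entropy
up to a constant tied to the description length of the distribution itself.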

------
narayan_s
Possibly unrelated, but I love the simple and clean layout of the website. Does
anyone know how to create something similar to this?

~~~
ethanweinberger
I (shamelessly) stole the layout from Gregory Gundersen's wonderful research
blog
([https://gregorygundersen.com/blog/](https://gregorygundersen.com/blog/)). He
has instructions on how to replicate it here:
[https://gregorygundersen.com/blog/2020/06/21/blog-theme/](https://gregorygundersen.com/blog/2020/06/21/blog-theme/)

