
TherML – Thermodynamics of Machine Learning - selimthegrim
https://arxiv.org/abs/1807.04162
======
henripal
Self plug: If you enjoy this and want some (less technical) insight on
connections between ML and thermodynamics, you might enjoy my series of blog
posts:
[http://henripal.github.io/blog/stochasticdynamics](http://henripal.github.io/blog/stochasticdynamics)

~~~
SmooL
Oh I'm going to enjoy this, thank you!

~~~
henripal
Happy to talk some more about this :)

Another self plug (would have put it in first post but the video was just
uploaded):

Here's my SciPy 2018 talk on the subject (from last week!!)
[https://m.youtube.com/watch?v=WUs0u2PJ2UU&index=46&t=0s&list...](https://m.youtube.com/watch?v=WUs0u2PJ2UU&index=46&t=0s&list=PLYx7XA2nY5Gd-tNhm79CNMe_qvi35PgUR)

------
I-like-food
Related paper/blog post by Baez! It's quite interesting and very readable
(assuming a background in undergrad thermo). He ends up creating algorithmic
analogs to measures like temperature, energy, etc., much like this paper does.

[https://johncarlosbaez.wordpress.com/2010/10/12/algorithmic-...](https://johncarlosbaez.wordpress.com/2010/10/12/algorithmic-thermodynamics/)

------
mlthoughts2018
Does anyone really find it surprising that you can do contortions around KL
divergence among candidate probabilistic models to the point of shoehorning
some analogue of thermodynamic laws and equilibria into machine learning? I
don’t find this surprising, illuminating, or interesting.

It reminds me of the “information geometry of boosting” section of [0].

[0]:
[https://pdfs.semanticscholar.org/2fad/679058e465fc07f942cfed...](https://pdfs.semanticscholar.org/2fad/679058e465fc07f942cfedd215cedbe09c6d.pdf)
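For what it's worth, the KL divergence being contorted here is just the expected log-ratio between two distributions. A minimal sketch (the two candidate distributions are made up for illustration):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two hypothetical candidate models over the same three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(kl_divergence(p, q))  # > 0, and asymmetric: KL(p||q) != KL(q||p)
print(kl_divergence(q, p))
```

The asymmetry is part of why it takes contortions to read KL as a physical, symmetric quantity like a distance or an energy difference.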

~~~
danielmorozoff
Fundamentally, thermodynamics and information theory share a slew of similar
ideas, because in many ways information theory came out of a thermodynamic
thought process applied to signals. Even more recent work in ML, like
Boltzmann machines, is taken directly from this overlap. That being said,
stronger theoretical connections are well worth the effort: these fields do
have some distinct ideas that, if shown to be similar, could yield discoveries
and push our understanding forward. Compression and model
generalizability/convergence come to mind.
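The overlap is visible right in the formulas: Shannon entropy has the same form as Gibbs entropy (up to Boltzmann's constant), and a Boltzmann distribution over energies is exactly a softmax at temperature T. A toy sketch (the energy levels and temperatures are made up):

```python
import math

def shannon_entropy(p):
    """H(p) = -sum_i p_i log p_i -- same form as Gibbs entropy S = -k_B sum_i p_i ln p_i."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def boltzmann(energies, T):
    """p_i proportional to exp(-E_i / T): a softmax over -E/T, with partition function Z."""
    weights = [math.exp(-e / T) for e in energies]
    z = sum(weights)  # partition function
    return [w / z for w in weights]

energies = [0.0, 1.0, 2.0]  # hypothetical energy levels
print(shannon_entropy(boltzmann(energies, 0.1)))   # low T: near-deterministic, low entropy
print(shannon_entropy(boltzmann(energies, 10.0)))  # high T: near-uniform, entropy near log(3)
```

This is the same low-temperature/high-temperature trade-off that shows up in ML as softmax temperature or simulated annealing.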

~~~
kgwgk
Minor nitpick: Boltzmann machines are hardly recent. They were introduced in
the eighties.

~~~
danielmorozoff
Definitely. I meant more recent than the foundations of information theory

~~~
kgwgk
My bad, I understood "more recent work in ml" as being recent in ml terms, not
as work in ml which is more recent than the initial use (and abuse [1]) of
thermodynamic concepts in information theory.

[1] Shannon, according to Tribus and McIrvine: "My greatest concern was what
to call it. I thought of calling it ‘information’, but the word was overly
used, so I decided to call it ‘uncertainty’. When I discussed it with John von
Neumann, he had a better idea. Von Neumann told me, ‘You should call it
entropy, for two reasons: In the first place your uncertainty function has
been used in statistical mechanics under that name, so it already has a name.
In the second place, and more important, nobody knows what entropy really is,
so in a debate you will always have the advantage.’"

------
mlazos
This is why I enjoyed a computational probability class in college - the
professor introduced all of the concepts starting with a problem in physics.
A lot of methods in modern stats were built to explain the physical behavior
of particles under certain constraints.

------
mrcactu5
Google's data centres literally generate heat, so anything we can do in this
direction is helpful.

[https://deepmind.com/blog/deepmind-ai-reduces-google-data-ce...](https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/)

~~~
selimthegrim
I’m not sure this is what the authors meant by their title.

~~~
mrcactu5
One of the authors is at Google, so it wouldn't surprise me. Thermodynamics
here is clearly a metaphor, but the information entropy they are studying
could be bounded by the literal heat generated by their computations.
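The bound presumably being gestured at here is Landauer's principle: erasing one bit of information dissipates at least k_B · T · ln 2 of heat. A quick sketch of the numbers (the gigabyte figure is arbitrary):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, in J/K

def landauer_heat(bits_erased, temperature_kelvin):
    """Minimum heat dissipated to erase the given number of bits: k_B * T * ln(2) per bit."""
    return bits_erased * K_B * temperature_kelvin * math.log(2)

# Erasing one gigabyte (8e9 bits) at room temperature (300 K): the bound is
# tens of picojoules, many orders of magnitude below what real hardware dissipates.
print(landauer_heat(8e9, 300.0))
```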

~~~
selimthegrim
That author was my classmate in undergrad; I used to run into him in the
computer lab. The point I was making was that I think he was interested more
in the physics parallels than in any practical effects.

