TherML – Thermodynamics of Machine Learning (arxiv.org)
154 points by selimthegrim on July 15, 2018 | 24 comments


Self plug: If you enjoy this and want some (less technical) insight on connections between ML and thermodynamics, you might enjoy my series of blog posts: http://henripal.github.io/blog/stochasticdynamics


Oh I'm going to enjoy this, thank you!


Happy to talk some more about this :)

Another self plug (would have put it in first post but the video was just uploaded):

Here's my SciPy 2018 talk on the subject (from last week!!) https://m.youtube.com/watch?v=WUs0u2PJ2UU&index=46&t=0s&list...


Related paper/blog post by Baez! It's quite interesting and very readable (assuming a background in undergrad thermo). He ends up creating algorithmic analogs to measures like temperature, energy, etc., much like this paper does.

https://johncarlosbaez.wordpress.com/2010/10/12/algorithmic-...
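
For the flavor of those analogs, here's a minimal sketch (my own gloss on the general shape of such constructions, not Baez's actual definitions): pick an "energy" E(x) over the objects you care about, and the maximum-entropy distribution with fixed expected energy is the Gibbs one, with an inverse temperature falling out as the Lagrange multiplier:

    import numpy as np

    def gibbs(energies, beta):
        # p(x) proportional to exp(-beta * E(x)); beta plays the role of
        # inverse temperature, and this is also the "softmax with
        # temperature" familiar from ML.
        w = np.exp(-beta * np.asarray(energies, dtype=float))
        return w / w.sum()

    gibbs([0.0, 1.0, 2.0], beta=1.0)  # -> roughly [0.665, 0.245, 0.090]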


Does anyone really find it surprising that you can do contortions around KL divergence among candidate probabilistic models to the point of shoe-horning some analogue of thermodynamic laws and equilibria into machine learning? I find this neither surprising, illuminating, nor interesting.

It reminds me of the “information geometry of boosting” section of [0].

[0]: https://pdfs.semanticscholar.org/2fad/679058e465fc07f942cfed...


Fundamentally, thermodynamics and information theory share a slew of similar ideas, because in a lot of ways information theory came out of a thermodynamic thought process applied to signals. Even more recent work in ML, like Boltzmann machines, is directly taken from this overlap. That being said, stronger theoretical connections are well worth the effort: these fields do have some distinct ideas that, if shown to be similar, could yield discoveries and push our understanding forward. Some candidates that come to mind are compression and model generalizability/convergence.
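
To make the "similar ideas" concrete, a minimal sketch (illustrative only): Shannon's entropy and Gibbs's entropy are the same functional, differing only by Boltzmann's constant and the base of the logarithm:

    import numpy as np

    def shannon_entropy(p):
        # Shannon: H = -sum_i p_i * log2(p_i), measured in bits
        p = np.asarray(p, dtype=float)
        return -(p * np.log2(p)).sum()

    def gibbs_entropy(p, k=1.380649e-23):
        # Gibbs: S = -k * sum_i p_i * ln(p_i), measured in J/K
        p = np.asarray(p, dtype=float)
        return -k * (p * np.log(p)).sum()

    shannon_entropy([0.5, 0.25, 0.25])  # 1.5 bits
    gibbs_entropy([0.5, 0.25, 0.25])    # same sum, rescaled by k * ln(2) per bit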


Minor nitpick: Boltzmann machines are hardly recent. They were introduced in the eighties.


Definitely. I meant more recent than the foundations of information theory


My bad, I understood "more recent work in ml" as being recent in ml terms, not as work in ml which is more recent than the initial use (and abuse [1]) of thermodynamic concepts in information theory.

[1] Shannon, according to Tribus and McIrvine: "My greatest concern was what to call it. I thought of calling it ‘information’, but the word was overly used, so I decided to call it ‘uncertainty’. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, ‘You should call it entropy, for two reasons: In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.’"


Does anyone? What % of people make any connection at all between the fields?

Firstly, this is not a CS- or physics-only forum, so I would guess the subject could easily never have come up for many (if so, for the simpler/non-ML connections see https://en.wikipedia.org/wiki/Entropy_in_thermodynamics_and_...).

Secondly, I think general awareness beyond formal education has increased in a non-linear way over the last 20 years, largely because there has been so much science press around the celebrity of Stephen Hawking.

His celebrity alone raises awareness, but it was compounded by the drama depicted between him and others, and by some really interesting observations along the way (https://en.wikipedia.org/wiki/Holographic_principle).

Since all of this publicity has various inevitable Kevin Bacon trails back to the fundamental connections, I'd say the population making the connection is not only far from everyone, but could have been much smaller if things had unfolded a little differently.


I meant it more from a Joel Spolsky too-much-abstraction point of view, e.g. from [0]:

> “When great thinkers think about problems, they start to see patterns. They look at the problem of people sending each other word-processor files, and then they look at the problem of people sending each other spreadsheets, and they realize that there’s a general pattern: sending files. That’s one level of abstraction already. Then they go up one more level: people send files, but web browsers also “send” requests for web pages. Those are both sending operations, so our clever thinker invents a new, higher, broader abstraction called messaging, but now it’s getting really vague and nobody really knows what they’re talking about any more.

> And if you go too far up, abstraction-wise, you run out of oxygen. Sometimes smart thinkers just don’t know when to stop, and they create these absurd, all-encompassing, high-level pictures of the universe that are all good and fine, but don’t actually mean anything at all.”

[0]: https://www.joelonsoftware.com/2000/07/22/microsoft-goes-bon...


I can relate to your point more in that way. Joel has a gift for explaining intangibles. I agree with his point too, having had to spend time debating whether an abstraction choice adds value.

Still worth noting, I think, that he does say “sometimes”, because no one is going anywhere without abstractions or generalizations. The problem is that they’re hard to do really well, and we know historically that really smart people can make small mistakes and fall into the not-cogent/no-value/Joel trap #234.


That seems a poor analogy for this case. I’m not going to claim that the parent article is factual or even useful, but at first glance it seems interesting.

In general the difference between a tinkerer and an engineer or scientist is that the latter uses appropriate mathematical models to garner deeper insight into problems than is available to general tinkerers and "smart thinkers". If the linked article succeeds in linking any of the concepts of thermodynamics — which is one of the core building blocks of modern physics and relevant to almost every field of practical engineering — to DL models, then that’d be a huge boon. There are a lot of well-honed techniques and tools available in the thermodynamics and statistical physics toolboxes which could vastly improve the ability of DL to be tuned for useful tasks in a scientific manner. Most application of DL appears to be in a "dark arts" or alchemy phase, where only intuition is used to decide what problems can and should be tackled using DL. It seems every other paper published in DL is just happenstance.

Joel Spolsky’s comment really fits smart "tinkerers", but not serious attempts at modeling the energy-minimization aspects of restricted Boltzmann machines (ahem, DL), which really do borrow a lot from thermodynamics [0] (sketched after the references below). I’m glad "smart thinkers" like Geoffrey Hinton didn’t know when to stop [1], and kept generalizing and tweaking Boltzmann machines. Heck, even the idea of applying energy-minimization techniques from the phase changes of spin glasses is one of those applications of abstract high-level models that really don’t mean anything. Except when eventually they provide entirely new insights and, well, whole fields of study. Of course there’s a difference between the serious, hard work of establishing mathematical and conceptual correspondences between fields and just quackery or pseudo-sophistication. Given the amount of formula development in the linked paper, I’ll take it as a good-faith attempt to seriously link the fields.

[0]: https://en.m.wikipedia.org/wiki/Boltzmann_machine#History (especially the first paper on spin glasses)

[1]: https://medium.com/@andreykurenkov/a-brief-history-of-neural...
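
To be concrete about what gets borrowed, a minimal sketch (notation assumed by me, not taken from the linked paper): the RBM energy is structurally an Ising/spin-glass Hamiltonian on a bipartite graph, with probabilities given by the Boltzmann distribution p(v, h) ~ exp(-E(v, h)):

    import numpy as np

    def rbm_energy(v, h, W, a, b):
        # v: visible units, h: hidden units, W: visible-hidden couplings,
        # a, b: biases. Same shape as a spin-glass Hamiltonian restricted to
        # bipartite couplings; training lowers the energy of observed data.
        return -(a @ v + b @ h + v @ W @ h)

    rng = np.random.default_rng(0)
    v, h = rng.integers(0, 2, 4), rng.integers(0, 2, 3)
    W, a, b = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=3)
    rbm_energy(v, h, W, a, b)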


> “If the linked article succeeds in linking any of the concepts of thermodynamics — which is one of the core building blocks of modern physics and relevant to almost every field of practical engineering — to DL models, then that’d be a huge boon. There are a lot of well-honed techniques and tools available in the thermodynamics and statistical physics toolboxes which could vastly improve the ability of DL to be tuned for useful tasks in a scientific manner.”

I think this is just wrong. Firstly, statistics has been linked with thermodynamics for a long, long time, and people have already been using ideas about thermodynamic laws to describe potential functions and algorithms like Hybrid Monte Carlo and simulated annealing, even borrowing convergence metrics for things like parallel tempering.

When that connection is practical, it is about explicit, case by case algorithms, and not at all about recasting the generic interpretation of models into some other thermodynamic framework, which is just a type of cute toy analysis.
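
As one concrete instance of that explicit, case-by-case borrowing, a bare-bones simulated annealing loop (a sketch, not any particular library's implementation):

    import math, random

    def anneal(f, x, steps=10_000, t0=1.0):
        # Metropolis rule: always accept downhill moves; accept uphill moves
        # with probability exp(-delta / t), where t is an annealed "temperature".
        best = x
        for k in range(steps):
            t = t0 * (1 - k / steps) + 1e-12    # linear cooling schedule
            x_new = x + random.gauss(0.0, 1.0)  # propose a nearby state
            delta = f(x_new) - f(x)
            if delta < 0 or random.random() < math.exp(-delta / t):
                x = x_new
                if f(x) < f(best):
                    best = x
        return best

    anneal(lambda x: (x - 3.0) ** 2, x=0.0)  # ends up near 3.0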

I would say the example of Boltzmann machines specifically highlights this exact distinction between a result that has pragmatic implications and a result that is just playing around with definitions.


Whether you find something like this surprising largely depends on your prior knowledge. There are quite different sets of people who work with statistics, and theoretical physicists are one of them. So a recent advance in one field (machine learning) will inevitably draw in the hyenas from other fields, especially when one of their favorite toys (building increasingly complicated high-energy particle physics models) has largely been taken away from them by the recent null results at the LHC. Which is how you get review articles like this one (https://arxiv.org/abs/1803.08823).


Thanks for the link, it seems a nice review.


I doubt even 1% of the population can decode that sentence, let alone find it so blindingly obvious it need not be said.


It seems clear that this paper (and many papers like it that try to produce higher level abstractions casting one field’s concepts into theorems of some other field) is only meant for that 1% you mention. In fact this is why I don’t get excited about these sorts of results. For people outside the main fields, it ends up sounding like some dramatic philosophical discovery, but for people in the field it’s borderline meaningless. If you play around with definitions long enough, you can always find some notion of X that looks like “equilibrium” or “convergence” or “entropy” or “orthogonality” or some decomposition or some transformation for some other concept Y. Finding these results is almost always just about rejiggering definitions so that something mundane can be reinterpreted as something significant “in the right space.”


...yes? Why is it not surprising?


This is why I enjoyed a computational probability class in college: the professor introduced all of the concepts starting from a problem in physics. A lot of methods in modern stats were built to explain the physical behavior of particles under certain constraints.


Google's data centres literally generate heat, so anything we can do in this direction is helpful.

https://deepmind.com/blog/deepmind-ai-reduces-google-data-ce...


I’m not sure this is what the authors meant by their title.


One of the authors is at Google, so it wouldn't surprise me. Thermodynamics is clearly a metaphor here, but the information entropy they are studying may be bounded by the literal heat generated by their computations.
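
For what it's worth, that literal bound exists: Landauer's principle says erasing one bit of information dissipates at least kT ln 2 of heat (whether the paper engages with it is another question):

    import math

    k = 1.380649e-23     # Boltzmann constant, J/K
    T = 300.0            # roughly room temperature, K
    k * T * math.log(2)  # ~2.87e-21 J minimum heat per erased bit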


That author was my classmate in undergrad; I used to run into him in the computer lab. The point I was making was that I think he was interested more in the physics parallels than in any practical effects.



