They weren't. They were a generalization of Hopfield networks. Boltzmann machines are a stochastic version of the Hopfield network, and the training algorithm simply tries to minimize the KL divergence between the network's activity and real data. So it was quite surprising when it turned out that the algorithm needed a "dream phase", as they call it. Francis Crick was inspired by this and proposed a theory of sleep.

 Unpacking a comment like this is one of the quieter pleasures of reading HN.
 Haha I'm not sure if you're being sarcastic, so I'll try to unpack the comment. Hopfield networks were one of the first models of associative memory. They were themselves based on a (generalized) model of simple magnets called the Ising model: basically a group of binary units, each connected to its nearest neighbors with a coupling strength, where each unit prefers to match its neighbors. Hopfield developed a clever method of changing the couplings so that the network can store and retrieve patterns of activity. In the Hopfield network everything is deterministic, but Hopfield himself realized that if this constraint was relaxed the model could become a very powerful computational machine: if instead of being always on or off the units had a probability of being on or off, the networks could perform very general computations [1].

 Unfortunately, training these general stochastic systems was not easy. With their Boltzmann machines, Sejnowski and Hinton proposed a possible solution. The activity of stochastic binary units effectively encodes a probability distribution, so all they had to do was make sure that the distribution encoded by the activity of the units matched that of the input. They did this by changing the connection strengths between the units so as to minimize something called the Kullback-Leibler (KL) divergence, a measure of how different two probability distributions are (here, the one encoded by the network's spontaneous "dream" activity, and the distribution of the real data, e.g. a set of natural images). If the two distributions match exactly the KL divergence is zero; the worse the match, the larger it gets.
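 To make the store-and-retrieve idea concrete, here's a minimal sketch of a classic Hopfield network using the Hebbian rule (the function names, network size, and number of flipped bits are my own illustrative choices, not from the comment):

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian learning: couplings W[i,j] sum co-activations over stored patterns."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)  # no self-coupling
    return W

def retrieve(W, state, steps=10):
    """Deterministic updates: each unit aligns with the sign of its total input."""
    s = state.copy().astype(float)
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1  # break ties
    return s

# Store one binary (+1/-1) pattern, then recover it from a corrupted cue.
rng = np.random.default_rng(0)
pattern = rng.choice([-1, 1], size=32)
W = train_hopfield(pattern[None, :])

cue = pattern.copy()
cue[:5] *= -1                              # flip a few units
recovered = retrieve(W, cue)
print(np.array_equal(recovered, pattern))  # True: the pattern is restored
```

 The stochastic version the comment describes would replace the hard `np.sign` update with a probabilistic one (each unit turns on with a sigmoid probability of its input), which is exactly the step from Hopfield networks to Boltzmann machines.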
 When they wrote out the math it turned out that the algorithm required two phases: a wake phase, where the connections were changed according to the real data, and a sleep phase, where the connections were pruned by the spontaneous activity of the network without any input (its dreams). This analogy got a lot of people excited, including Francis Crick, and several groups tried to test the idea in real brains, but we are still waiting for a convincing result.
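 The two-phase rule can be sketched on a tiny, fully visible Boltzmann machine, small enough to enumerate every state exactly (the network size, data, learning rate, and step count below are my own illustrative choices): the weight update is the data correlations ("wake") minus the model's own correlations ("sleep"), which performs gradient descent on the KL divergence.

```python
import itertools
import numpy as np

states = np.array(list(itertools.product([-1, 1], repeat=3)))  # all 8 states

def model_probs(W):
    """Boltzmann distribution p(s) proportional to exp(0.5 * s^T W s)."""
    energies = -0.5 * np.einsum('si,ij,sj->s', states, W, states)
    p = np.exp(-energies)
    return p / p.sum()

# Target data: two patterns, each seen half the time.
data = np.array([[1, 1, -1], [-1, -1, 1]])
data_corr = data.T @ data / len(data)      # <s_i s_j> under the data (wake)

W = np.zeros((3, 3))
for _ in range(200):
    p = model_probs(W)
    model_corr = np.einsum('s,si,sj->ij', p, states, states)  # dream phase
    dW = 0.1 * (data_corr - model_corr)    # wake minus sleep
    np.fill_diagonal(dW, 0.0)
    W += dW

# KL divergence between the data distribution and the trained model.
p = model_probs(W)
data_p = np.array([0.5 if any((s == d).all() for d in data) else 0.0
                   for s in states])
kl = sum(dp * np.log(dp / ps) for dp, ps in zip(data_p, p) if dp > 0)
print(round(kl, 3))  # near zero: the model's "dreams" match the data
```

 Real Boltzmann machines also have hidden units and estimate the sleep-phase correlations by Gibbs sampling rather than exact enumeration, but the wake-minus-sleep structure of the update is the same.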
 Wow, this is a really good explanation! Thank you. I wonder though, are RBMs being used in practice? Are they worth any future research?
 In applied machine learning, not so much. They feel ancient! But some use them to study the physics of computation. They were used to make a connection between the renormalization group (RG) and machine learning. RG is one of the main workhorses of quantum field theory and condensed matter physics. The fact that there's a mapping from RG to RBMs means we can try to understand how deep learning works using the same techniques modern physicists use to understand the world! Here's a nice article on this topic if you're interested: https://www.quantamagazine.org/20141204-a-common-logic-to-se...
 Couldn't agree more
 I don't disagree with this, but I also think that part of the interest in Hopfield networks, and in generalizing them to have stochastic outputs, was that they seemed like a good model for how neurons might store memories or compute.