A few comments have mentioned neural nets in this post. adamnemecek mentions in this thread that PGMs are a superset of neural networks, and Thomas Wiecki has a few excellent blog posts on creating Bayesian neural networks using PyMC3.[0][1][2] If you're curious about how these two concepts can be brought together, I highly recommend reading through these three posts.
No, back-propagation would not give full Bayesian inference (although there are some tricks [0]). They instead use variational inference [1], which allows for fast approximate inference in continuous PGMs.
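To get a concrete feel for what those posts do, here is a minimal sketch of a Bayesian neural network fit with variational inference (ADVI) in PyMC3. The toy data, layer sizes, and prior scales are made up for illustration; the linked posts go into much more detail.

```python
import numpy as np
import pymc3 as pm

# Toy binary-classification data (purely illustrative)
X = np.random.randn(200, 2)
y = (X[:, 0] * X[:, 1] > 0).astype(int)

n_hidden = 5
with pm.Model() as bnn:
    # Priors over the weights instead of point estimates
    w_in = pm.Normal('w_in', mu=0, sd=1, shape=(2, n_hidden))
    w_out = pm.Normal('w_out', mu=0, sd=1, shape=(n_hidden,))

    # One hidden layer with tanh activation
    act = pm.math.tanh(pm.math.dot(X, w_in))
    p = pm.math.sigmoid(pm.math.dot(act, w_out))

    out = pm.Bernoulli('out', p=p, observed=y)

    # Variational inference (ADVI) rather than exact posterior sampling
    approx = pm.fit(n=20000, method='advi')
    trace = approx.sample(1000)
```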
A heuristic guide: if you have more data than knowledge about the domain, and you want to do classification/prediction, then NNs are a good choice.
PGMs are a good choice if you have domain knowledge that is important to encode; you need modularity, i.e. you don't want to embed priors in the model; you have strong causal relationships, e.g. diagnostic models with 'explaining away' (see the sketch after this comment); or you want to integrate value of information, e.g. what test should I do next to resolve uncertainty.
You can mix ML with knowledge in PGMs. The downside is the computational complexity of inference, which is NP-hard in general for both exact and approximate inference, although you can identify the complexity of the model at design time and make the appropriate modelling trade-offs.
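As a small illustration of the 'explaining away' point, here is a sketch of a two-cause diagnostic network using pgmpy. The CPT numbers are made up, and older pgmpy versions name the model class BayesianModel rather than BayesianNetwork.

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Two independent causes, one common effect: Burglary -> Alarm <- Earthquake
model = BayesianNetwork([('Burglary', 'Alarm'), ('Earthquake', 'Alarm')])

cpd_b = TabularCPD('Burglary', 2, [[0.99], [0.01]])
cpd_e = TabularCPD('Earthquake', 2, [[0.98], [0.02]])
# P(Alarm | Burglary, Earthquake); columns run over (B, E) state combinations
cpd_a = TabularCPD('Alarm', 2,
                   [[0.999, 0.1, 0.1, 0.01],   # Alarm = 0
                    [0.001, 0.9, 0.9, 0.99]],  # Alarm = 1
                   evidence=['Burglary', 'Earthquake'],
                   evidence_card=[2, 2])

model.add_cpds(cpd_b, cpd_e, cpd_a)
assert model.check_model()

infer = VariableElimination(model)
# Observing the alarm raises belief in a burglary...
print(infer.query(['Burglary'], evidence={'Alarm': 1}))
# ...but also observing an earthquake "explains away" the burglary.
print(infer.query(['Burglary'], evidence={'Alarm': 1, 'Earthquake': 1}))
```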
By complexity of inference, do you mean the complexity of learning the structure of a model? Because inference on an existing PGM is linear in the number of edges with belief propagation, isn't it?
Cycles can be handled in two ways: if you are happy with approximate solutions, loopy BP can give you that (still linear per iteration, but it may take longer to converge and there is parameter tuning); for exact solutions you can rewrite the graph to "carry" dependencies (latest paper by Frey).
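Right, on a tree (here a three-node chain) the sum-product messages can be computed in one forward and one backward sweep, so the cost is linear in the number of edges. A rough numpy sketch with made-up potentials:

```python
import numpy as np

# Sum-product on a 3-node chain X1 - X2 - X3 (binary variables).
# Unary and pairwise potentials are made up for illustration.
unary = [np.array([0.7, 0.3]), np.array([0.5, 0.5]), np.array([0.2, 0.8])]
pairwise = np.array([[0.9, 0.1],
                     [0.1, 0.9]])  # same potential on both edges, for brevity

# Forward messages: m_{i->i+1}(x_{i+1}) = sum_{x_i} unary_i(x_i) * pairwise(x_i, x_{i+1}) * m_{i-1->i}(x_i)
fwd = [np.ones(2)]                 # no message flows into the first node
for i in range(2):
    fwd.append(pairwise.T @ (unary[i] * fwd[i]))

# Backward messages: same recursion in the other direction
bwd = [np.ones(2)]                 # no message flows into the last node
for i in range(2, 0, -1):
    bwd.insert(0, pairwise @ (unary[i] * bwd[0]))

# Node marginals: local potential times both incoming messages, normalised
for i in range(3):
    b = unary[i] * fwd[i] * bwd[i]
    print(f"P(X{i+1}) =", b / b.sum())
```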
There's an excellent course on PGMs by Koller on Coursera. My friend took it and now he's a PGM evangelist. If you are wondering where PGMs lie in the spectrum of machine learning, you should research the difference between generative and discriminative modeling. We were driven to PGMs to solve an ML problem that was hard to frame as a NN, mainly because we had some priors we needed to encode to make the problem tractable. It reminds me a little of heuristics in search.
The hot use case for the PGM approach has often been the discriminative setting (see M^3 and latent SVM), which is good because discriminative classifiers work well with fewer data points (see Ng, Russell).
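If you want a quick, hands-on feel for the generative/discriminative split mentioned above, fitting a generative model (Gaussian naive Bayes) and a discriminative one (logistic regression) on the same data is a five-line experiment. This uses scikit-learn with a synthetic dataset, purely as an illustration, not the M^3/latent-SVM setting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gen = GaussianNB().fit(X_tr, y_tr)                          # models P(x, y)
disc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)    # models P(y | x)

print("generative:    ", gen.score(X_te, y_te))
print("discriminative:", disc.score(X_te, y_te))
```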
What is the connection other than "they both have graphs somewhere in them"?
I sort of see what you mean, since NNs transport information across a graph in a straightforward (non-loopy) way, and PGMs can propagate information in crazy (never-ending loops) ways when doing belief propagation...
From my limited understanding, pure NNs are not able to express confidence in their predictions. By adding probability to the NN, we can get both predictions and confidence scores. In practice, noise is injected or connections are dropped out randomly, and predictions are averaged over multiple runs.
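The "drop connections randomly and average over runs" idea is usually called MC dropout. A rough PyTorch sketch (network size and dropout rate are arbitrary): keep dropout active at prediction time and treat the spread of the stochastic forward passes as a crude uncertainty estimate.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 1),
)

x = torch.randn(1, 10)           # a single (fake) input
model.train()                    # keep dropout stochastic at prediction time
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])

mean = samples.mean(dim=0)       # the prediction
std = samples.std(dim=0)         # a crude confidence/uncertainty estimate
print(mean.item(), std.item())
```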
Jordan: Well, humans are able to deal with cluttered scenes. They are able to deal with huge numbers of categories. They can deal with inferences about the scene: “What if I sit down on that?” “What if I put something on top of something?” These are far beyond the capability of today’s machines. Deep learning is good at certain kinds of image classification. “What object is in this scene?”
I think Jordan refers here to Bayesian models that incorporate gravity, occlusion, and other such concepts.
[0] http://twiecki.github.io/blog/2016/06/01/bayesian-deep-learn...
[1] http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learn...
[2] http://twiecki.github.io/blog/2017/03/14/random-walk-deep-ne...