An Introduction to Probabilistic Graphical Models (2003) [pdf] (eecs.berkeley.edu)
214 points by scvalencia on March 29, 2017 | 35 comments



A few comments have mentioned neural nets in this post. adamnemecek mentions in this thread that PGMs are a superset of neural networks, and Thomas Wiecki has a few excellent blog posts on creating Bayesian neural networks using pymc3.[0][1][2] If you're curious about how these two concepts can be brought together, I highly recommend reading through these three posts.

[0] http://twiecki.github.io/blog/2016/06/01/bayesian-deep-learn...

[1] http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learn...

[2] http://twiecki.github.io/blog/2017/03/14/random-walk-deep-ne...


Do they train with backpropagation efficiently?


No, back-propagation would not give full Bayesian inference (although there are some tricks [0]). They instead use variational inference[1], which allows for fast inference of continuous PGMs.

[0] http://mlg.eng.cam.ac.uk/yarin/blog_2248.html

[1] https://arxiv.org/abs/1603.00788


Most variational inference methods are not fully Bayesian either...


PGMs are great, but my experience from Koller's course is that it is very hard to identify cases where they can be used.

Part of the reason is that you need a priori knowledge of the causal relationships (coarse-grained, i.e. direction) between your variables.

Presumably if you're doing ML you don't know those causal relationships to begin with.

Particularly good fits are things like physics where laws are known.


A heuristic guide: if you have more data than knowledge about the domain, and you want to do classification/prediction, then NNs are a good choice.

PGMs are good if you have knowledge that is important to encode; you need modularity i.e. you don't want to embed priors in the model; you have strong causal relationships e.g. diagnostic models with 'explaining away'; you want to integrate value of information e.g. what test should I do next to resolve uncertainty.

You can mix ML with knowledge in PGMs. The downside is the computational complexity of inference, which in the worst case is NP-hard for both exact and approximate inference, although you can identify the complexity of the model at design time and make the appropriate modelling trade-offs.
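
To make 'explaining away' concrete, here is a minimal sketch using the textbook burglary/earthquake/alarm network. All probabilities are hypothetical and chosen only for illustration, and the posterior is computed by brute-force enumeration rather than by any particular PGM library.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_B = {True: 0.01, False: 0.99}          # prior on burglary
P_E = {True: 0.02, False: 0.98}          # prior on earthquake
P_A_GIVEN = {                            # P(alarm=True | burglary, earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}


def joint(b, e, a):
    """Joint probability of one full assignment (burglary, earthquake, alarm)."""
    p_a = P_A_GIVEN[(b, e)]
    return P_B[b] * P_E[e] * (p_a if a else 1.0 - p_a)


def posterior_burglary(evidence):
    """P(burglary=True | evidence), summing over all consistent assignments."""
    num = den = 0.0
    for b, e, a in product([True, False], repeat=3):
        world = {"b": b, "e": e, "a": a}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # skip assignments that contradict the evidence
        p = joint(b, e, a)
        den += p
        if b:
            num += p
    return num / den


p_alarm = posterior_burglary({"a": True})
p_alarm_quake = posterior_burglary({"a": True, "e": True})
print(f"P(burglary | alarm)             = {p_alarm:.3f}")
print(f"P(burglary | alarm, earthquake) = {p_alarm_quake:.3f}")
```

With these made-up numbers, hearing the alarm raises the belief in a burglary well above its prior, and then learning about an earthquake pulls it back down: the earthquake "explains away" the alarm.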


by complexity of inference do you mean the complexity of learning the structure of a model? because inference on an existing PGM is linear in the number of edges with belief propagation, isn't it?


It's linear in singly connected networks, not for multiply connected graphs. However NP-hard is the worst case as I mentioned (p288 Koller & Friedman)
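
As a rough illustration of the singly connected case, the sketch below runs sum-product on a chain, the simplest singly connected graph, and checks the result against brute-force enumeration. The potentials and the function names are assumptions made up for this example.

```python
from itertools import product


def chain_marginals(unary, pairwise):
    """Exact marginals on a chain x0 - x1 - ... - x(n-1) via sum-product.

    unary[i][s]       : potential of node i in state s
    pairwise[i][s][t] : potential on edge (i, i+1) for states (s, t)
    One forward and one backward sweep, so the work grows linearly
    with the number of edges (times k*k for k states per node).
    """
    n, k = len(unary), len(unary[0])
    fwd = [[1.0] * k for _ in range(n)]  # message arriving from the left
    bwd = [[1.0] * k for _ in range(n)]  # message arriving from the right
    for i in range(1, n):
        for t in range(k):
            fwd[i][t] = sum(
                fwd[i - 1][s] * unary[i - 1][s] * pairwise[i - 1][s][t]
                for s in range(k)
            )
    for i in range(n - 2, -1, -1):
        for s in range(k):
            bwd[i][s] = sum(
                pairwise[i][s][t] * unary[i + 1][t] * bwd[i + 1][t]
                for t in range(k)
            )
    marginals = []
    for i in range(n):
        belief = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in range(k)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """Same marginals by summing over every joint state (exponential in n)."""
    n, k = len(unary), len(unary[0])
    totals = [[0.0] * k for _ in range(n)]
    for states in product(range(k), repeat=n):
        w = 1.0
        for i, s in enumerate(states):
            w *= unary[i][s]
        for i in range(n - 1):
            w *= pairwise[i][states[i]][states[i + 1]]
        for i, s in enumerate(states):
            totals[i][s] += w
    return [[v / sum(row) for v in row] for row in totals]


# Hypothetical potentials for a 4-node binary chain.
unary = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8], [0.6, 0.4]]
pairwise = [[[2.0, 1.0], [1.0, 2.0]]] * 3  # neighbours prefer to agree
bp = chain_marginals(unary, pairwise)
exact = brute_force_marginals(unary, pairwise)
```

The two-sweep schedule is what gives the linear cost. On a graph with loops there is no such ordering of messages, which is where the exact case gets expensive.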


by multiply connected graphs do you mean graphs with cycles?


Yes, although in directed acyclic graphs the 'cycles' manifest as multiple paths to a node.
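
One way to check this for a given DAG is to ignore edge directions and look for a loop in the resulting skeleton. The sketch below does that with a small union-find. The function name and both example graphs are hypothetical, and it assumes a simple graph with no duplicate edges.

```python
def is_singly_connected(edges):
    """True if the undirected skeleton of a directed graph has no loop.

    edges is a list of (parent, child) pairs. Assumes no duplicate edges
    and no self-loops; either would be reported as a loop.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            # u and v were already linked, so this edge adds a second path.
            return False
        parent[ru] = rv
    return True


# A polytree: no node is reachable from another by two different routes.
polytree = [("burglary", "alarm"), ("earthquake", "alarm"), ("alarm", "call")]
# A "diamond": a reaches d via b and via c, with no directed cycle anywhere.
diamond = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

print(is_singly_connected(polytree))
print(is_singly_connected(diamond))
```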


cycles can be handled in two ways: if you are happy with approximate solutions, loopy BP can give that (still linear, but it may take longer and there's parameter tuning); for exact solutions, you can rewrite the graph to "carry" dependencies (latest paper by Frey)


Could you link to that paper? And does it have anything to do with the junction tree algorithm?


http://www.psi.toronto.edu/~psi/pubs2/1999%20and%20before/13...

I don't know about junction trees but it probably connects as junction trees are a generalization of factor graphs


Thanks!


the Christopher Bishop chapter on graphical models has a good section on junction trees IIRC


> Part of the reason is that you need a priori knowledge of the causal relationships (coarse-grained, i.e. direction) between your variables.

Isn't there a whole book, titled Causality by Dr. Pearl, on teasing causal relationships out of data, explicitly for this purpose?


This is the best textbook on graphical models, also from Jordan but later (2008): https://people.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_F.... It also covers some general theory of variational inference. Source: I worked on PGMs in grad school.


This course also refers to M. I. Jordan's book: http://imagine.enpc.fr/~obozinsg/teaching/mva_gm/fall2016/

One of the best courses I have ever taken; F. Bach and G. Obozinski are incredible teachers.


I couldn't agree more! This course has slides, which many will prefer.


There's an excellent course on PGMs by Koller on Coursera. My friend took it and now he's a PGM evangelist. If you are wondering where PGMs lie in the spectrum of machine learning, you should research the difference between generative and discriminative modeling. We were driven to PGMs to solve an ML problem that was hard to frame as an NN, mainly because we had some priors we needed to encode to make the problem tractable. It reminds me a little of heuristics in search.

The person I'm talking to: an early ML student.


The hot use-case of the PGMs approach has often been in the discriminative setting (see M^3 and latent SVM) - which is good because discriminative classifiers work well with fewer data points (see Ng, Russel).


Good article: "Big-data boondoggles and brain-inspired chips are just two of the things we’re really getting wrong" - Michael I. Jordan ref: http://spectrum.ieee.org/robotics/artificial-intelligence/ma...


PGM seems to me harder than neural nets, but the trend in the last couple of years is to include probabilities in neural nets, so they're hot.


I used "Probabilistic Graphical Models" by Koller/Friedman [0]

[0] https://www.amazon.com/Probabilistic-Graphical-Models-Princi...


Not an expert but PGMs are mostly a superset of NN, so it's kinda understandable.


What is the connection other than "they both have graphs somewhere in them"?

I sort of see what you mean since NNs transport information across a graph in a straightforward (non-loopy) way, and PGMs can propagate information in crazy (neverending loops) ways when doing belief propagation...


From my limited understanding, pure NNs are not able to express confidence in predictions. By adding probability to the NN, we can have both predictions and confidence scores. In practice, noise is being injected, or connections dropped out randomly, then predictions averaged over multiple runs.
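
The "drop connections randomly, then average over multiple runs" idea can be sketched roughly as follows. This is a toy with hypothetical fixed weights and no training, assuming dropout on the hidden units is simply left switched on at prediction time. It is not taken from any of the linked posts.

```python
import math
import random
import statistics

# Hypothetical fixed weights for a 1-input, 4-hidden-unit, 1-output network.
W1 = [0.8, -1.2, 0.5, 1.5]
B1 = [0.1, 0.3, -0.2, 0.0]
W2 = [1.0, -0.7, 0.4, 0.9]


def forward(x, rng, drop_prob=0.5):
    """One stochastic forward pass with dropout kept on at prediction time."""
    keep = 1.0 - drop_prob
    out = 0.0
    for w1, b1, w2 in zip(W1, B1, W2):
        h = math.tanh(w1 * x + b1)
        if rng.random() < keep:       # this hidden unit survives the pass
            out += w2 * h / keep      # inverted-dropout rescaling
    return out


def predict_with_uncertainty(x, runs=200, seed=0):
    """Average many stochastic passes; the spread is a crude confidence signal."""
    rng = random.Random(seed)
    samples = [forward(x, rng) for _ in range(runs)]
    return statistics.mean(samples), statistics.stdev(samples)


mean, spread = predict_with_uncertainty(1.0)
print(f"prediction ~ {mean:.3f}, spread ~ {spread:.3f}")
```

The mean plays the role of the prediction and the standard deviation across runs plays the role of the confidence score. How well that spread reflects real uncertainty depends on the model and is not something this toy can show.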


Sure they can, many output probabilities: http://deeplearning.net/tutorial/mlp.html#mlp
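
For reference, the usual way a classifier network outputs probabilities is a softmax over its final-layer scores. A minimal sketch with made-up scores:

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical final-layer scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The largest score gets the largest probability, and the outputs can be read as a distribution over classes.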


Not just graphs but also probability.


They are not.


From: http://spectrum.ieee.org/robotics/artificial-intelligence/ma...

Jordan: Well, humans are able to deal with cluttered scenes. They are able to deal with huge numbers of categories. They can deal with inferences about the scene: “What if I sit down on that?” “What if I put something on top of something?” These are far beyond the capability of today’s machines. Deep learning is good at certain kinds of image classification. “What object is in this scene?”

I think Jordan refers here to Bayesian models that incorporate gravity, occlusion, and other such concepts.

http://www.cv-foundation.org/openaccess/content_cvpr_2013/ht... e.g. postulates entire humans to improve scene understanding.

What I get out of this: Deep learning has to be enriched with progress from other machine learning fields


Nice high-level talk on Statistical Inference for Big Data by Jordan. Been one of my favorites since his LDA/PLSA papers in 2003 with Andrew Ng. http://videolectures.net/colt2014_jordan_bigdata/?q=jordan



Note there's an exploding literature that reads these models as causal models: http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf


Michael I. Jordan was a mentor to Andrew Ng. Probabilistic Graphical Models are the next frontier in AI after deep learning.



