
An Introduction to Probabilistic Graphical Models (2003) [pdf] - scvalencia
https://people.eecs.berkeley.edu/~jordan/prelims/
======
diab0lic
A few comments have mentioned neural nets in this post. adamnemecek mentions
in this thread that PGMs are a superset of neural networks, and Thomas Wiecki
has a few excellent blog posts on creating Bayesian neural networks using
PyMC3.[0][1][2] If you're curious about how these two concepts can be brought
together, I highly recommend reading through these three posts.

[0] [http://twiecki.github.io/blog/2016/06/01/bayesian-deep-
learn...](http://twiecki.github.io/blog/2016/06/01/bayesian-deep-learning/)

[1] [http://twiecki.github.io/blog/2016/07/05/bayesian-deep-
learn...](http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learning/)

[2] [http://twiecki.github.io/blog/2017/03/14/random-walk-deep-
ne...](http://twiecki.github.io/blog/2017/03/14/random-walk-deep-net/)
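
For a flavor of what those posts do, here is a minimal sketch (mine, not taken
from them; the toy data and layer sizes are placeholders) of a tiny Bayesian
neural network in PyMC3: Normal priors on the weights, a Bernoulli likelihood,
and ADVI for the approximate posterior.

    import numpy as np
    import pymc3 as pm
    import theano.tensor as tt

    X = np.random.randn(100, 2)                # toy inputs
    y = (X[:, 0] * X[:, 1] > 0).astype(int)    # toy binary labels

    n_hidden = 5
    with pm.Model() as bnn:
        # Priors over the weights replace the usual point estimates
        w_in = pm.Normal('w_in', 0., sd=1., shape=(2, n_hidden))
        w_out = pm.Normal('w_out', 0., sd=1., shape=(n_hidden,))
        act = tt.tanh(tt.dot(X, w_in))
        p = pm.math.sigmoid(tt.dot(act, w_out))
        pm.Bernoulli('obs', p=p, observed=y)
        # ADVI fits an approximate posterior over all weights at once
        approx = pm.fit(n=10000, method='advi')
        trace = approx.sample(500)             # posterior samples -> predictive uncertainty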

~~~
ganfortran
Do they train with backpropagation efficiently?

~~~
twiecki
No, back-propagation would not give full Bayesian inference (although there
are some tricks [0]). They instead use variational inference[1], which allows
for fast inference of continuous PGMs.

[0]
[http://mlg.eng.cam.ac.uk/yarin/blog_2248.html](http://mlg.eng.cam.ac.uk/yarin/blog_2248.html)

[1] [https://arxiv.org/abs/1603.00788](https://arxiv.org/abs/1603.00788)
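
(Roughly: instead of backpropagating to a single weight setting, variational
inference fits an approximate posterior q(w) by maximizing the evidence lower
bound

    ELBO(q) = E_{q(w)}[log p(D | w)] - KL(q(w) || p(w))  <=  log p(D)

and ADVI [1] does this with a Gaussian q and stochastic gradients, which is
what makes it fast for continuous PGMs.)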

~~~
cttet
Most variational inference methods aren't fully Bayesian either...

------
platz
PGMs are great, but my experience from Koller's course is that it is very
hard to identify cases where they can be used.

Part of the reason is that you need a priori knowledge of the causal
relationships (at least coarse-grained, i.e. the direction) between your
variables.

Presumably if you're doing ML you don't know those causal relationships to
begin with.

Particularly good fits are things like physics, where the laws are known.

~~~
mamp
A heuristic guide... If you have more data than knowledge about the domain,
and you want to do classification/prediction, then NNs are a good fit.

PGMs are good if you have knowledge that is important to encode; you need
modularity, i.e. you don't want to embed priors in the model; you have strong
causal relationships, e.g. diagnostic models with 'explaining away' (sketched
below); or you want to integrate value of information, e.g. what test should I
do next to resolve uncertainty.

You can mix ML with knowledge in PGMs. The downside is the computational
complexity of inference, which is NP-hard for both exact and approximate
inference, although you can identify the complexity of the model at design
time and make the appropriate modelling trade-offs.
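
As a concrete illustration of 'explaining away', here is a minimal sketch
using pgmpy (my own choice of library and numbers, purely illustrative): a
two-cause alarm network where observing an earthquake lowers the posterior
probability of a burglary even though the alarm has gone off.

    # Burglary, Earthquake -> Alarm: evidence for one cause "explains away" the other.
    from pgmpy.models import BayesianNetwork  # BayesianModel in older pgmpy versions
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    model = BayesianNetwork([('Burglary', 'Alarm'), ('Earthquake', 'Alarm')])
    cpd_b = TabularCPD('Burglary', 2, [[0.99], [0.01]])
    cpd_e = TabularCPD('Earthquake', 2, [[0.98], [0.02]])
    # Columns enumerate (Burglary, Earthquake) = (0,0), (0,1), (1,0), (1,1)
    cpd_a = TabularCPD('Alarm', 2,
                       [[0.999, 0.71, 0.06, 0.05],   # P(Alarm=0 | B, E)
                        [0.001, 0.29, 0.94, 0.95]],  # P(Alarm=1 | B, E)
                       evidence=['Burglary', 'Earthquake'], evidence_card=[2, 2])
    model.add_cpds(cpd_b, cpd_e, cpd_a)

    infer = VariableElimination(model)
    # The alarm alone makes a burglary fairly likely (~0.58)...
    print(infer.query(['Burglary'], evidence={'Alarm': 1}))
    # ...but also observing an earthquake explains it away (~0.03).
    print(infer.query(['Burglary'], evidence={'Alarm': 1, 'Earthquake': 1}))

The structure (the two arrows) is exactly the kind of domain knowledge that a
plain NN has no natural slot for.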

~~~
wvbeaaoo
By complexity of inference do you mean the complexity of learning the
structure of a model? Because inference on an existing PGM is linear in the
number of edges with belief propagation, isn't it?

~~~
mamp
It's linear for singly connected networks, but not for multiply connected
graphs. However, NP-hard is the worst case, as I mentioned (p. 288, Koller &
Friedman).
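
(To put a rough number on it: exact inference with the junction tree algorithm
costs on the order of

    O(n * d^(w+1))

where n is the number of variables, d the number of states per variable, and w
the treewidth of the moralized, triangulated graph. A singly connected network
has w = 1, which gives the linear-time case; dense dependencies push w up and
the d^(w+1) factor blows up.)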

~~~
wvbeaaoo
By multiply connected graphs do you mean graphs with cycles?

~~~
mamp
Yes, although in directed acyclic graphs the 'cycles' manifest as multiple
paths to a node.

~~~
wvbeaaoo
Cycles can be handled in two ways: if you are happy with approximate
solutions, loopy BP can give that (still linear per pass, but it may take
longer to converge and there's parameter tuning); for exact solutions you can
rewrite the graph to "carry" dependencies (latest paper by Frey).

~~~
Xcelerate
Could you link to that paper? And does it have anything to do with the
junction tree algorithm?

~~~
wvbeaaoo
[http://www.psi.toronto.edu/~psi/pubs2/1999%20and%20before/13...](http://www.psi.toronto.edu/~psi/pubs2/1999%20and%20before/134.pdf)

I don't know about junction trees, but it probably connects, as junction
trees are a generalization of factor graphs.

~~~
Xcelerate
Thanks!

------
tachim
This is the best textbook on graphical models, also from Jordan but later
(2008):
[https://people.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_F...](https://people.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_FTML.pdf).
It also covers some general theory of variational inference. Source: I worked
on PGMs in grad school.

------
JustFinishedBSG
This course also refers to M. I. Jordan's book:
[http://imagine.enpc.fr/~obozinsg/teaching/mva_gm/fall2016/](http://imagine.enpc.fr/~obozinsg/teaching/mva_gm/fall2016/)

One of the best courses I have ever taken; F. Bach and G. Obozinski are
incredible teachers.

~~~
aflam
I couldn't agree more! This course also has slides, which many will prefer.

------
BucketSort
There's an excellent course on PGMs by Koller on Coursera. My friend took it
and now he's a PGM evangelist. If you are wondering where PGMs lie in the
spectrum of machine learning, you should research the difference between
generative and discriminative modeling. We were driven to PGMs to solve an ML
problem that was hard to frame as an NN, mainly because we had some priors we
needed to encode to make the problem tractable. It reminds me a little of
heuristics in search.

(I'm addressing this to someone early in their ML studies.)

~~~
throwaway18974
The hot use-case for the PGM approach has often been in the discriminative
setting (see M^3 nets and latent SVMs), which is good because discriminative
classifiers work well with fewer data points (see Ng, Russell).

------
KasianFranks
Good article: "Big-data boondoggles and brain-inspired chips are just two of
the things we’re really getting wrong" \- Michael I. Jordan ref:
[http://spectrum.ieee.org/robotics/artificial-
intelligence/ma...](http://spectrum.ieee.org/robotics/artificial-
intelligence/machinelearning-maestro-michael-jordan-on-the-delusions-of-big-
data-and-other-huge-engineering-efforts)

------
visarga
PGMs seem to me harder than neural nets, but the trend in the last couple of
years is to include probabilities in neural nets, so they're hot.

~~~
adamnemecek
Not an expert, but PGMs are mostly a superset of NNs, so it's kinda
understandable.

~~~
ilzmastr
What is the connection other than "they both have graphs somewhere in them"?

I sort of see what you mean since NNs transport information across a graph in
a straightforward (non-loopy) way, and PGMs can propagate information in crazy
(neverending loops) ways when doing belief propagation...

~~~
visarga
From my limited understanding, pure NNs are not able to express confidence in
their predictions. By adding probability to the NN, we can get both predictions
and confidence scores. In practice, noise is injected, or connections are
dropped out randomly, and the predictions are averaged over multiple runs.
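
A minimal sketch of that averaging trick (Monte Carlo dropout), assuming
tf.keras; the model and data below are placeholders. Dropout is kept active at
prediction time and many stochastic forward passes are averaged to get both a
prediction and a spread.

    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])

    x = np.random.randn(1, 10).astype('float32')  # toy input
    # training=True keeps dropout on, so each pass samples a different sub-network
    preds = np.stack([model(x, training=True).numpy() for _ in range(100)])
    print(preds.mean(axis=0), preds.std(axis=0))  # mean prediction and its spread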

~~~
ilzmastr
Sure they can; many output probabilities:
[http://deeplearning.net/tutorial/mlp.html#mlp](http://deeplearning.net/tutorial/mlp.html#mlp)

------
MrQuincle
From: [http://spectrum.ieee.org/robotics/artificial-
intelligence/ma...](http://spectrum.ieee.org/robotics/artificial-
intelligence/machinelearning-maestro-michael-jordan-on-the-delusions-of-big-
data-and-other-huge-engineering-efforts)

Jordan: Well, humans are able to deal with cluttered scenes. They are able to
deal with huge numbers of categories. They can deal with inferences about the
scene: “What if I sit down on that?” “What if I put something on top of
something?” These are far beyond the capability of today’s machines. Deep
learning is good at certain kinds of image classification. “What object is in
this scene?”

I think Jordan refers here to Bayesian models that incorporate gravity,
occlusion, and other such concepts.

[http://www.cv-
foundation.org/openaccess/content_cvpr_2013/ht...](http://www.cv-
foundation.org/openaccess/content_cvpr_2013/html/Jiang_Hallucinated_Humans_as_2013_CVPR_paper.html)
e.g. postulates entire humans to improve scene understanding.

What I get out of this: deep learning has to be enriched with progress from
other machine learning fields.

------
leecarraher
Nice high-level talk by Jordan on statistical inference for big data. He's
been one of my favorites since his LDA/pLSA papers in 2003 with Andrew Ng.
[http://videolectures.net/colt2014_jordan_bigdata/?q=jordan](http://videolectures.net/colt2014_jordan_bigdata/?q=jordan)

------
Chris2048
Related?:

[https://www.coursera.org/specializations/probabilistic-
graph...](https://www.coursera.org/specializations/probabilistic-graphical-
models)

[http://openclassroom.stanford.edu/MainFolder/CoursePage.php?...](http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=ProbabilisticGraphicalModels)

------
shurtler
Note there's an exploding literature that reads these models as causal models:
[http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf](http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf)

------
iconvalleysil
Michael I. Jordan was a mentor to Andrew Ng. Probabilistic Graphical Models
are the next frontier in AI after deep learning.

