
Edward – A Turing-Complete Language for Deep Probabilistic Programming - xtacy
https://arxiv.org/abs/1701.03757v1
======
KasianFranks
Speaking of which, check out Michael I. Jordan's work on Probabilistic
Graphical Models:
[https://www.google.com/search?q=michael+i+jordan+probalistic...](https://www.google.com/search?q=michael+i+jordan+probalistic+prrogramming&oq=michael+i+jordan+probalistic+prrogramming&aqs=chrome..69i57.17982j0j4&sourceid=chrome&ie=UTF-8)

Jordan was a mentor to Andrew Ng, who formerly headed AI groups at Google and Baidu, among other things.
[https://en.wikipedia.org/wiki/Michael_I._Jordan](https://en.wikipedia.org/wiki/Michael_I._Jordan)

Saira Mian and David Blei worked on some interesting stuff a while back
related to using ML/AI to extend human life span, in nematodes: "Statistical
modeling of biomedical corpora: mining the Caenorhabditis Genetic Center
Bibliography for genes related to life span" (Blei DM, Franks K, Jordan MI, Mian IS).
[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1533868](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1533868)

------
kuwze
There is also ZhuSuan, which likewise builds on TensorFlow.

[https://arxiv.org/abs/1709.05870v1](https://arxiv.org/abs/1709.05870v1)

[https://github.com/thu-ml/zhusuan](https://github.com/thu-ml/zhusuan)

[https://i.imgur.com/gzfhS29.png](https://i.imgur.com/gzfhS29.png)

------
carbocation
The software library is located here:
[http://edwardlib.org/](http://edwardlib.org/) . Notably, Edward is layered on
TensorFlow.

Regarding the significance of the authors, David Blei first described latent
Dirichlet allocation (LDA), an important algorithm for generative topic
modeling, in ~2003. Interestingly, the last I checked, LDA couldn't be done in
Edward (yet).

~~~
shmageggy
I also briefly tried it out, drawn by the claim of Turing completeness,
but I wasn't able to get inference working on any model with interesting
control flow (e.g. loops). It seemed to have about the same expressive power
as PyMC3, albeit running on TensorFlow, which seemed neat. It would be very
cool to see something with the expressive power of, say, Church running on TensorFlow.
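For concreteness, the kind of model that needs real control flow is one whose trace length is itself random, e.g. a geometric-style model written as a `while` loop. A hypothetical sketch in plain Python (forward sampling only, not inference):

```python
import random

def flips_until_heads(p, rng):
    """Sample how many tails occur before the first heads.

    The number of loop iterations is itself random, so the trace length
    varies from run to run -- exactly the control flow that fixed-graph
    PPLs have trouble running inference over.
    """
    n = 0
    while rng.random() >= p:  # tails: keep flipping
        n += 1
    return n

# Forward simulation is trivial; conditioning a model like this on
# observed data is where a Turing-complete PPL (e.g. Church) earns its keep.
rng = random.Random(0)
samples = [flips_until_heads(0.5, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)  # expectation is (1 - p) / p = 1.0
```

Graph-based frameworks have to know the shape of the computation up front, which is why loops whose length depends on random draws are the sticking point.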

~~~
eli_gottlieb
In complete sincerity, I think that speeding up Turing-complete probabilistic
programming to the kinds of inference speed we can get in the gradient-descent
training of deep neural networks would be a "change the world"-level advance
for ML/AI.

~~~
RJTrolo
We already have that: variational-inference-based algorithms like BBVI
(black-box variational inference) use gradient descent for training.
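For reference, the score-function (REINFORCE) estimator at the heart of BBVI uses the identity grad_mu E_q[f(z)] = E_q[f(z) * grad_mu log q(z)]. A minimal NumPy sketch (toy choice of q and f, not from the paper) for q = N(mu, 1) and f(z) = z^2, where E_q[f] = mu^2 + 1 so the exact gradient is 2 * mu:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 1.5
n = 500_000

# Samples from the variational distribution q(z) = N(mu, 1)
z = rng.normal(mu, 1.0, size=n)

# Score function: grad_mu log q(z) = (z - mu) for a unit-variance Gaussian
score = z - mu

# BBVI-style Monte Carlo gradient estimate of grad_mu E_q[z^2];
# the exact value is 2 * mu, since E_q[z^2] = mu^2 + 1.
grad_est = np.mean((z ** 2) * score)
grad_true = 2 * mu
```

The appeal is that this only needs samples from q and pointwise evaluations of f, no differentiability of the model, which is what makes it "black box"; the price is high estimator variance, hence the large `n` here.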

~~~
eli_gottlieb
Variational inference also only works for continuous probability models, so it
can't be used for most interesting use-cases of probabilistic programming.

------
jmh530
I'll have to read the paper to see what makes it "deep"...

A cursory skim suggests that it is much faster than Stan, but I suppose the
more significant question is whether it provides correct results. Stan might
take longer, but I'm usually pretty confident that with some simple
diagnostics I can see whether the results are what I really need.

~~~
jmh530
One thing that looks cool is the tutorial for probabilistic PCA. That is a
b___ of a thing to do in Stan; it really only works under some very limited
conditions. Edward has the ability to fold a KL-divergence minimization into
the model. I'm not exactly sure how that works; I should look into it more. I
don't really have a good sense of it just from reading the paper and a
tutorial or two.
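For anyone unfamiliar, the probabilistic PCA generative model is x = W z + noise, with z ~ N(0, I) and noise ~ N(0, sigma^2 I), so the marginal covariance of x is W W^T + sigma^2 I. A small NumPy sanity check (hypothetical dimensions and parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50_000, 4, 2        # samples, observed dim, latent dim
W = rng.normal(size=(d, k))   # loading matrix
sigma = 0.5

# Generative model: x = W z + noise, z ~ N(0, I_k), noise ~ N(0, sigma^2 I_d)
z = rng.normal(size=(n, k))
x = z @ W.T + sigma * rng.normal(size=(n, d))

# The implied marginal covariance of x is W W^T + sigma^2 I;
# the sample covariance should match it closely for large n.
cov_model = W @ W.T + sigma**2 * np.eye(d)
cov_sample = np.cov(x, rowvar=False)
```

Fitting this model means inferring W and sigma from x alone, and the sign/rotation freedom in W is exactly what makes the Stan version fiddly.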

~~~
groceryheist
As someone who just implemented hierarchical probabilistic PCA in Stan, I
agree that it takes finesse, but it is by no means impossible. Doing this sort
of work efficiently in Stan seems to require some degree of understanding of
how the sampler works, and it may require really thinking through your model.
Stan saves you from deriving your own conditional distributions and writing a
Gibbs sampler, but you're going to have to do some analysis if you want to fit
models of a certain complexity.

KL-divergence minimization (variational inference) is typically a weak
approximation to the model you specified. I have seen it produce inferences on
simulated data that are just plain wrong. These "wrong" models are still often
good predictors, so whether variational inference will work well for you
depends on whether you care about making valid inferences or just doing
prediction.
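A classic illustration of how VI can be "wrong" about uncertainty: for a correlated Gaussian target, the optimal mean-field Gaussian factor matches the diagonal of the precision matrix, not of the covariance, so it underestimates the marginal variances. A small NumPy check (toy numbers, not from the thread):

```python
import numpy as np

# Target: zero-mean bivariate Gaussian with strong correlation
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
Lambda = np.linalg.inv(Sigma)  # precision matrix

# Standard VI result for a Gaussian target: the optimal mean-field
# factor q_i is Gaussian with variance 1 / Lambda_ii, whereas the
# true marginal variance is Sigma_ii.
var_q = 1.0 / np.diag(Lambda)
var_true = np.diag(Sigma)
# Here var_q is [0.19, 0.19] against a true [1.0, 1.0]: the mean is
# exact, but the reported uncertainty is far too small.
```

The point-predictions can still be fine (the mean is recovered exactly here), which matches the "good predictors, bad inferences" pattern described above.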

~~~
jmh530
I would be very interested in seeing how you implemented the hierarchical
PPCA.

My problem was that I couldn't identify the coefficients. For instance, the
first principal component could be [x, x, x, ...] or [-x, -x, -x, ...], and the
result would be some bimodal distribution. If you placed restrictions on the
first PC (e.g. positive entries only) it would work, but those restrictions may
not make sense for the next PCs.

~~~
groceryheist
Yes, multimodality is often a problem for MCMC clustering or dimensionality
reduction. However, if you use the SVD method to estimate PCA you only have a
bimodal distribution, since the SVD is identified up to sign. Asymmetric
initialization is usually enough to solve the problem.
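Outside the sampler, the sign ambiguity can also be resolved with a deterministic convention, e.g. flipping each singular vector so its largest-magnitude entry is positive. A NumPy sketch (one common convention among several):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)  # center before PCA

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Each (U[:, i], Vt[i, :]) pair is only identified up to a joint sign
# flip. Fix a convention: make the largest-magnitude entry of each
# right singular vector positive, flipping U's columns to match.
signs = np.sign(Vt[np.arange(len(s)), np.abs(Vt).argmax(axis=1)])
Vt_fixed = Vt * signs[:, None]
U_fixed = U * signs[None, :]

# The reconstruction is unchanged by the sign convention.
X_rec = (U_fixed * s) @ Vt_fixed
```

Because the flips are applied jointly to matching left and right vectors, the product (and hence the model) is untouched; only the labeling of the modes changes.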

This thread has some good examples of PCA implementations in Stan:
[https://groups.google.com/forum/#!topic/stan-users/5R2-QUDiyME](https://groups.google.com/forum/#!topic/stan-users/5R2-QUDiyME)

------
flor1s
A nice, beginner-friendly book about probabilistic programming is Avi
Pfeffer's "Practical Probabilistic Programming" (published by Manning). The
only downside is that it uses Pfeffer's own Scala library, Figaro, which does
not seem to get as much attention as projects such as Stan and Edward.

------
frabcus
Anyone recommend any good resources for learning to use Edward?

Are the tutorials on the main site good?
[http://edwardlib.org/tutorials/](http://edwardlib.org/tutorials/)

~~~
AlexCoventry
Yes. I would start there.

------
hardbyte
There is another TensorFlow Bayesian programming library called Aboleth:
[https://github.com/data61/aboleth](https://github.com/data61/aboleth)

------
foxfired
#not related but:

Maybe nobody else cares, but the name does matter. Edward, Stan, Cassandra.
Have we run out of computer (or programming) sounding names?

This is Computatrum Anthropomorphicus.

~~~
yen223
This is bikeshedding at its finest

~~~
visarga
I know people who got stuck at picking a name and gave up writing the program.
What can you do, when there is no name that makes you happy?

------
gigatexal
I like the syntax

