
The State of Probabilistic Programming - superfx
https://moalquraishi.wordpress.com/2015/03/29/the-state-of-probabilistic-programming/
======
mizzao
I've tried to use probabilistic programming for building a model of real data,
and it seems that there is a long way to go before it's practical and fast.

On one hand, there are the Monte Carlo-based methods that can model almost
any distribution, but are slow to use for large amounts of data.

On the other hand, there are interesting cases like Infer.NET that use a
completely different technique (approximate, deterministic inference) but are
brittle for many real-world use cases.

Then, there is the general issue that one has to be familiar with
probabilistic models and the inner workings of the inference algorithms to
have any hope of debugging the inevitable errors and convergence issues that
arise. That realistically seems to require a machine learning or statistics
PhD, and that population is very small.

~~~
stdbrouw
Well, MCMC is pretty close to brute force, so it's never going to be very fast.

But the diagnostic tools are really quite good these days – tools like Stan
and PyMC make it easy to see diagnostic plots that allow you to check for
convergence and such. Additionally, more and more fine-tuning is happening
automatically, like setting the right jump sizes and a plausible initial value
for the random walk.
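
For instance, here's a minimal sketch of what that looks like in PyMC3 (the
API as of its early releases; the data here is made up):

    import numpy as np
    import pymc3 as pm

    data = np.random.normal(1.0, 2.0, size=500)  # made-up observations

    with pm.Model():
        mu = pm.Normal("mu", mu=0, sd=10)
        sigma = pm.HalfNormal("sigma", sd=10)
        pm.Normal("obs", mu=mu, sd=sigma, observed=data)
        trace = pm.sample(2000)  # NUTS tunes step sizes and init automatically

    pm.traceplot(trace)   # eyeball mixing and stationarity of the chains
    pm.summary(trace)     # posterior summaries and convergence statistics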

Note that traditional statistics, like regression, requires similar
diagnostic steps: Q-Q plots to check the normality of your residuals,
Levene's test for heteroskedasticity, and so on. They often get ignored, but
they're generally not hard to teach or learn.
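
For comparison, those classical checks are one-liners with SciPy (a sketch;
the residuals here are simulated):

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    residuals = np.random.normal(size=200)  # stand-in for regression residuals
    group_a, group_b = residuals[:100], residuals[100:]

    stats.probplot(residuals, dist="norm", plot=plt)  # Q-Q plot vs. the normal
    plt.show()

    stat, p = stats.levene(group_a, group_b)  # tests equality of variances
    print(p)  # a small p-value hints at heteroskedasticity across groups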

We're really not that far from having this stuff be usable by the broader
public. Maybe not "build any model you like and it will just work out of the
box" but definitely as a replacement for much frequentist statistics. See
"Doing Bayesian Data Analysis" by Kruschke, for example.

~~~
x0x0
I'm more pessimistic. My experience has been that inference often benefits
from reparameterization or analytic marginalization, plus a bag of tricks
that many people don't know.
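
To give a concrete example of the kind of trick I mean, here's the classic
non-centered reparameterization of a hierarchical model, sketched in PyMC3
with made-up names:

    import pymc3 as pm

    with pm.Model():
        mu = pm.Normal("mu", mu=0, sd=5)
        tau = pm.HalfNormal("tau", sd=2)
        # centered form, which samplers often struggle with when tau is small:
        #   theta = pm.Normal("theta", mu=mu, sd=tau, shape=8)
        # non-centered form: the same model, but with much friendlier geometry
        theta_raw = pm.Normal("theta_raw", mu=0, sd=1, shape=8)
        theta = pm.Deterministic("theta", mu + tau * theta_raw)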

It's unfortunately still a long way from being a technology that you can use
fairly blindly, like OLS or logistic regression. In particular, if you only
care about prediction rather than parameter inference, logistic regression
with a penalty search is pretty straightforward.

May I ask what, if anything, you use Bayesian stats for professionally in
your news work?

Also, the link to your GitHub from your site goes to stdbrouw instead of
debrouwere.

~~~
stdbrouw
So do you feel the pitfalls of probabilistic programming / Bayesian data
analysis are potentially more severe than those of traditional methods?

I don't get to use that much Bayesian analysis in my work, but I've been
toying around with a hierarchical model of content popularity recently: are
some topics / authors / genres more popular than others? Frankly finding it
hard to separate the signal ("this is what people actually like to read") from
the noise ("this is what just happened to be trending and will not be
reproducible").

------
cf
What's interesting about most complaints about these systems is that people
talk about their poor performance or scalability. That is usually more a
consequence of using MCMC or another inference algorithm than of the language
itself.

MCMC is a very slow inference algorithm. Its primary advantage was that, for
well-known models, it could be coded up much more simply than a fancier
inference technique. When you consider variational methods and newer
streaming methods based on things like Assumed Density Filtering, you can get
really great, scalable performance. The point of probabilistic programming is
to write inference algorithms once for a large class of models and be done.
So the advantage of using a fancier method is amplified.
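
As a toy illustration of the streaming idea, a conjugate Gaussian posterior
can be refined one observation at a time in constant memory (plain NumPy,
made-up numbers); ADF generalizes this by projecting back onto the assumed
family whenever the exact update leaves it:

    import numpy as np

    # Streaming posterior for the mean of a Gaussian with known noise variance.
    prior_mu, prior_var = 0.0, 10.0
    noise_var = 1.0

    for y in np.random.normal(2.0, 1.0, size=10000):
        post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)  # O(1) per datum
        post_mu = post_var * (prior_mu / prior_var + y / noise_var)
        prior_mu, prior_var = post_mu, post_var

    print(prior_mu, prior_var)  # concentrates near the true mean of 2.0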

This means, paradoxically, that probabilistic programming should eventually
be faster than existing methods rather than slower, since you can reuse these
fancier inference methods for new models. This is a very active field, so
this progress is only starting to appear in existing systems.

~~~
x0x0
Ease of coding is not the primary advantage of MCMC. The (giant) disadvantage
of variational inference is that the user needs to derive equations and
understand quite a bit of math, whereas Gibbs and other samplers like those
in BUGS/OpenBUGS/JAGS/Stan can work with the factored distributions and
require much less mathematical sophistication from users.
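
For instance, a toy Gibbs sampler for a standard bivariate Gaussian with
correlation rho needs only the two full conditionals, which BUGS-style
systems read off the factored model automatically (written out by hand here
in Python):

    import numpy as np

    rho, n_iter = 0.9, 5000
    x = y = 0.0
    samples = np.empty((n_iter, 2))
    for i in range(n_iter):
        # each full conditional of a standard bivariate normal is Gaussian
        x = np.random.normal(rho * y, np.sqrt(1 - rho ** 2))
        y = np.random.normal(rho * x, np.sqrt(1 - rho ** 2))
        samples[i] = x, y

    print(np.corrcoef(samples[1000:].T))  # off-diagonal ~ rho after burn-in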

~~~
mjw
I know they've traditionally been quite fiddly, but I'm pretty sure computers
can be persuaded to help derive the maths for variational methods these days.

Perhaps a more important difference is that MCMC, while slow, is exact in the
limit. Variational methods won't converge to the true posterior no matter how
long you run them. You'll converge to an approximate answer which depends on
the particular variational form you choose to use.
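
A concrete illustration (a NumPy sketch): if the true posterior is a
correlated bivariate Gaussian, the best factorized Gaussian that mean-field
variational inference can ever find has variances 1/Lambda_ii, far narrower
than the true marginals, no matter how long you optimize:

    import numpy as np

    # True posterior: bivariate Gaussian with strong correlation.
    Sigma = np.array([[1.0, 0.9],
                      [0.9, 1.0]])
    Lambda = np.linalg.inv(Sigma)  # precision matrix

    print(np.diag(Sigma))          # true marginal variances: [1.0, 1.0]
    print(1.0 / np.diag(Lambda))   # mean-field optimum: [0.19, 0.19]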

~~~
x0x0
Are you aware of any work (or researchers) in that direction? I would be
very interested.

And I think your second sentence reinforces my point -- it means variational
methods are either more fiddly or require more understanding to use well.

edit: here's one such (limited but nice) effort:
[http://ebonilla.github.io/papers/nguyen-bonilla-nips-2014.pdf](http://ebonilla.github.io/papers/nguyen-bonilla-nips-2014.pdf)

~~~
davmre
Microsoft's Infer.NET is essentially an automated system for variational
(/expectation-propagation) inference. It implements primitives for common
operations and distributions, and then uses the local structure of mean-field
inference ("variational message passing":
[http://www.jmlr.org/papers/volume6/winn05a/winn05a.pdf](http://www.jmlr.org/papers/volume6/winn05a/winn05a.pdf))
to build up variational inference algorithms for arbitrary factor graphs. It's
not infinitely flexible and doesn't solve all problems related to variational
inference, but once you get used to it you can iterate quite quickly on model
refinements, inference tweaks, etc. without any tedious derivations.

~~~
x0x0
thank you!

------
capnrefsmmat
DrBayes
([https://github.com/ntoronto/drbayes](https://github.com/ntoronto/drbayes))
is another interesting probabilistic programming language. It implements
measure-theoretic probability in Racket, so it's not limited to what you can
express with probability density functions and standard distributions -- you
can condition a random variable on any measurable subset of the probability
space. For example, you could condition Y on 0 < X < 3, which most
density-based systems can't evaluate directly. Or you can construct random
variables which have no density function at all.
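
That particular interval condition can at least be approximated by
brute-force rejection sampling (a Python sketch, nothing to do with DrBayes
itself); DrBayes handles it, and the genuinely density-free cases, directly:

    import numpy as np

    # Approximate E[Y | 0 < X < 3] for correlated (X, Y) by rejection.
    n = 100000
    x = np.random.normal(0, 2, size=n)
    y = x + np.random.normal(0, 1, size=n)

    keep = (0 < x) & (x < 3)  # keep only the draws where the event holds
    print(y[keep].mean())     # Monte Carlo estimate of E[Y | 0 < X < 3]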

I'm not sure this has great practical use, but it's a very interesting system.

------
pjungwir
This strikes me as a perfect application for quantum computers---but I'm just
an amateur, so I'd love to hear an expert opinion. My understanding is this
1982 talk by Feynman [1] more or less launched the study of quantum computers,
and it's all about how they can carry a probabilistic value through their
computations rather than a definite one. And one of the lessons from that
paper (if I'm reading & remembering right---maybe it's another paper) is that
simulating quantum behavior with a non-quantum computer turns a lot of
polynomial-time problems into exponential-time problems, so that having a real
quantum computer would be very helpful for solving the performance &
scalability issues described in the OP. Thoughts?

[1]
[http://www.cs.berkeley.edu/~christos/classics/Feynman.pdf](http://www.cs.berkeley.edu/~christos/classics/Feynman.pdf)

~~~
ibab
You have to take into account that quantum processes are governed by a kind of
"randomness" that is different from the one in stochastic processes. Scott
Aaronson explains the basics of quantum mechanics very succinctly in one of
his lectures [1].

The exponential->polynomial speedup we might get from a quantum computer
doesn't really apply to stochastic processes, because we can already execute
them in polynomial time on a classical computer (just use a random number
generator).
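
(Concretely: simulating a stochastic process classically costs one RNG draw
per step, as in this NumPy random walk; quantum amplitudes, which can
interfere and cancel, admit no such sampling shortcut.)

    import numpy as np

    steps = np.random.choice([-1, 1], size=1000)  # one RNG draw per step
    walk = np.cumsum(steps)  # a 1000-step random walk, simulated in O(n)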

Having a quantum computer would still be truly amazing, though. Even just
being able to simulate quantum mechanics efficiently would be revolutionary.

[1] -
[http://www.scottaaronson.com/democritus/lec9.html](http://www.scottaaronson.com/democritus/lec9.html)

~~~
pjungwir
I upvoted this comment as soon as I read it, but just to follow up: that link
is amazing. I read lecture 9, then 10, 10.5, 11, . . . Eventually I bought the
book and am reading it from the beginning. Gödel and the halting problem have
been a pet interest for ~17 years, and I've often wondered about implications
similar to Penrose's argument. The recent death of a young, promising friend
has made me want to research these things more seriously, and Aaronson seems
like a wonderful guide. Thank you so much!

------
nextos
It seems Markov logic networks can scale to really big problems:

* [https://code.google.com/p/rockit/](https://code.google.com/p/rockit/)

* [http://i.stanford.edu/hazy/hazy/tuffy/](http://i.stanford.edu/hazy/hazy/tuffy/)

But they seem to be getting less and less coverage in probabilistic
programming. Perhaps they are considered too inexpressive?

------
typedweb
Probabilistic programming in Common Lisp:

[https://github.com/nikodemus/screamer](https://github.com/nikodemus/screamer)

~~~
kgwgk
Not really. Screamer gives you "nondeterministic" programming. It can be used
for example to solve sudokus:
[http://nikodemus.github.io/screamer/sudoku.lisp.html](http://nikodemus.github.io/screamer/sudoku.lisp.html)

------
nmrm2
_" Having been rescued from the clutches of object-orientation by early
exposure to Mathematica"_

I kind of laughed at this line, because this seems like a situation where the
author came to the correct conclusion for all the wrong reasons.

Maybe things have changed, but the Mathematica programming language never
struck me as useful for anything over a couple thousand lines.

edit: this is completely beside the purpose and message of this article, but
it's worth noting that modularity is a pretty significant design criterion
that none of these languages have a particularly strong story about.

~~~
clementine
Mathematica itself and Wolfram|Alpha have quite substantial codebases
written in Mathematica (now the Wolfram Language). Also, there was an article
recently about probabilistic programming with WL:

[http://www.mathematica-journal.com/data/uploads/2013/11/Cassel.pdf](http://www.mathematica-journal.com/data/uploads/2013/11/Cassel.pdf)

My guess is that WL would be an interesting and powerful starting point for
probabilistic programming at some point down the line (if not already).

------
enupten
I wonder if kids 30 years from now will look upon this as the new Prolog.

~~~
vezzy-fnord
I was pretty much thinking about logic programming throughout most of the
article. Seems like PPS is a continuation of these ideas, in many ways.

~~~
enupten
In more ways than one,
[http://alchemy.cs.washington.edu/](http://alchemy.cs.washington.edu/)

