Hacker News new | past | comments | ask | show | jobs | submit login
The State of Probabilistic Programming (moalquraishi.wordpress.com)
109 points by superfx on Apr 12, 2015 | hide | past | favorite | 30 comments

I've tried to use probabilistic programming for building a model of real data, and it seems that there is a long way to go before it's practical and fast.

On one hand, there are the Monte Carlo-based methods that will support modeling almost any distributions, but are slow to use for large amounts of data.

On the other hand, there are interesting cases like Infer.NET that use a completely different technique (approximate, deterministic inference) but are brittle for many real-world use cases.

Then, there is the general issue that one has to be familiar with probabilistic models and the inner workings of the inference algorithms to have any hope of debugging the inevitable errors and convergence issues that arise. That seems to realistically require a machine learning or statistics PhD and the population of those is very small.

Well, MCMC is pretty close to brute force so it's never going to be very fast.

But the diagnostic tools are really quite good these days – tools like Stan and PyMC make it easy to see diagnostic plots that allow you to check for convergence and such. Additionally, more and more fine-tuning is happening automatically, like setting the right jump sizes and a plausible initial value for the random walk.

Note that traditional statistics, like regression, requires similar diagnostic steps: qqplots to check for the normality of your data, the Levene test for heteroskedasticity and so on. They often get ignored, but they're generally not hard to teach or learn.

We're really not that far from having this stuff be usable by the broader public. Maybe not "build any model you like and it will just work out of the box" but definitely as a replacement for much frequentist statistics. See "Doing Bayesian Data Analysis" by Kruschke, for example.

I'm more pessimistic. My experience has been that inferring often benefits from reparameterization or analytic marginalizing, plus a bag of tricks that many people don't know.

It's unfortunately still a long way from being a technology that you can pretty blindly use like ols or logistic regression. Particularly if you only care about prediction, not parameter inference, lr with a penalty search is pretty straightforward.

May I ask what / if you use bayesian stats for professionally in your news work?

Also, the link to your github from your site goes to stdbrouw instead of debrouwere

So do you feel the pitfalls of probabilistic programming / Bayesian data analysis are potentially more severe than those of traditional methods?

I don't get to use that much Bayesian analysis in my work, but I've been toying around with a hierarchical model of content popularity recently: are some topics / authors / genres more popular than others? Frankly finding it hard to separate the signal ("this is what people actually like to read") from the noise ("this is what just happened to be trending and will not be reproducible").

The performance of most vanilla Metropolis-Hastings MCMC methods is pretty awful, but there are some good improvements. I use DREAM (a non-Markovian and parallelizable extension of M-H) with realistic problems and its convergence rate is much much much better, to the point that I can use it for what I'm working on while regular M-H methods are just too slow.

What's interesting about most complaints of these systems is people talk about their poor performance or scalability? That is usually more a consequence of using MCMC or other inference algorithm than the language itself.

MCMC is a very slow inference algorithm. Its primary advantage was that for well-known models it could be coded up much more simply than a fancier inference technique. When you consider variational methods and newer streaming methods based on things like Assumed Density Filtering you can get really great scalable performance. The point of probabilistic programming is write inference algorithms once for a large class of models and be done. So the advantage of using a fancier method is amplified.

This means paradoxically probabilistic programming should eventually be faster than existing methods rather than slower, since you can reuse these fancier inference methods for new models. This is a very active field so this progress is only starting to be appear in the existing systems.

> The point of probabilistic programming is write inference algorithms once for a large class of models and be done. So the advantage of using a fancier method is amplified.

So that should be something where a few standard libraries or toolsets should emerge that push out the "easy-to-implement" default choice? Are there any contenders yet? (As you might be able to tell, I don't really know anything about the field)

I know STAN and Figaro are going to push out inference methods like I mention, but my hope is eventually all of the ones mentioned in the article do this. I like thinking of this in terms of the standard library that needs to be built out. All the systems are making great progress in this regard.

Ease of coding is not the primary advantage of mcmc. The (giant) disadvantage of variational inference is the user needs to derive equations and understand quite a bit of math, where gibbs or other samplers like those in bugs/openbugs/jags/stan can work with the factored distributions and require much less mathematical sophistication from users.

I know they've traditionally been quite fiddly, but I'm pretty sure computers can be persuaded to help derive the maths for variational methods these days.

Perhaps a more important difference is that MCMC, while slow, is exact in the limit. Variational methods won't converge to the true posterior no matter how long you run them. You'll converge to an approximate answer which depends on the particular variational form you choose to use.

Are you aware of any work (or researchers) working on that? I would be very interested.

And I think your second sentence reinforces my point -- it makes variational methods either more fiddly, or require more understanding to use well.

edit: here's one such (limited but nice) effort: http://ebonilla.github.io/papers/nguyen-bonilla-nips-2014.pd...

Microsoft's Infer.NET is essentially an automated system for variational (/expectation-propagation) inference. It implements primitives for common operations and distributions, and then uses the local structure of mean-field inference ("variational message passing": http://www.jmlr.org/papers/volume6/winn05a/winn05a.pdf) to build up variational inference algorithms for arbitrary factor graphs. It's not infinitely flexible and doesn't solve all problems related to variational inference, but once you get used to it you can iterate quite quickly on model refinements, inference tweaks, etc. without any tedious derivations.

thank you!

I think we are broadly in agreement. When I say coding up inference for a particular model, that includes deriving the equations and updates needed. This effort is nonzero for all inference methods, but is much lower for MCMC.

You're totally right; I should have read it that way.

DrBayes (https://github.com/ntoronto/drbayes) is another interesting probabilistic programming language. It implements measure-theoretic probability in Racket, so it's not limited to what you can express with probability density functions and standard distributions -- you can condition a random variable on any measurable subset of the probability space, for example, so you could condition Y on 0 < X < 3, which wouldn't be possible to evaluate with density functions. Or you can construct random variables which have no density function.

I'm not sure this has great practical use, but it's a very interesting system.

This strikes me as a perfect application for quantum computers---but I'm just an amateur, so I'd love to hear an expert opinion. My understanding is this 1982 talk by Feynman [1] more or less launched the study of quantum computers, and it's all about how they can carry a probabilistic value through their computations rather than a definite one. And one of the lessons from that paper (if I'm reading & remembering right---maybe it's another paper) is that simulating quantum behavior with a non-quantum computer turns a lot of polynomial-time problems into exponential-time problems, so that having a real quantum computer would be very helpful for solving the performance & scalability issues described in the OP. Thoughts?

[1] http://www.cs.berkeley.edu/~christos/classics/Feynman.pdf

You have to take into account that quantum processes are governed by a kind of "randomness" that is different from the one in stochastic processes. Scott Aaronson explains the basics of quantum mechanics very succinctly in one of his lectures [1].

The exponential->polynomial speedup we might get from a quantum computer doesn't really apply to stochastic processes, because we can already execute them in polynomial time on a classical computer (just use a random number generator).

Having a quantum computer would still be truly amazing, though. Even just being able to simulate quantum mechanics efficiently would be revolutionary.

[1] - http://www.scottaaronson.com/democritus/lec9.html

I upvoted this comment as soon as I read it, but just to follow up: that link is amazing. I read lecture 9, then 10, 10.5, 11, . . . Eventually I bought the book and am reading it from the beginning. Godel and the Halting Problem has been a pet interest for ~17 years, and I've often wondered about implications similar to Penrose's argument. The recent death of a young promising friend has made me want to research these things more seriously, and Aaronson seems like a wonderful guide. Thank you so much!

It seems Markov Logic nets can scale to really big problems:

* https://code.google.com/p/rockit/

* http://i.stanford.edu/hazy/hazy/tuffy/

But they seem to be getting less and less coverage in probabilistic programming. Perhaps they are considered too inexpressive?

Probabilistic programming in Common Lisp:


Not really. Screamer gives you "nondeterministic" programming. It can be used for example to solve sudokus: http://nikodemus.github.io/screamer/sudoku.lisp.html

"Having been rescued from the clutches of object-orientation by early exposure to Mathematica"

I kind of laughed at this line, because this seems like a situation where the author came to the correct conclusion for all the wrong reasons.

Maybe things have changed, but the Mathematica programming language never struck me as useful for anything over a couple thousand lines.

edit: this is completely beside the purpose and message of this article, but it's worth noting that modularity is a pretty significant design criteria that none of these languages have a particularly strong story about.

Mathematica itself and Wolfram | Alpha have quite substantial code-bases written in Mathematica (now the Wolfram Language). Also, there was an article recently about probabilistic programming with WL:


My guess is that WL would be an interesting and powerful starting point for probabilistic programming at some point down the line (if not already).

I wonder if kids 30y from now look upon this as the new Prolog.

I think the difference is that Prolog was aimed to solve all kinds of programming problems. PPs on the other hand is very domain specific from the start.

Like Erlang?

I was pretty much thinking about logic programming throughout most of the article. Seems like PPS is a continuation of these ideas, in many ways.

This paper comes to mind as an interesting link between logic and probabilistic ideas: http://okmij.org/ftp/kakuritu/rethinking.pdf

In more ways than one, http://alchemy.cs.washington.edu/

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact