On one hand, there are the Monte Carlo-based methods that can model almost any distribution, but are slow once you have a lot of data.
On the other hand, there are interesting systems like Infer.NET that use a completely different technique (deterministic approximate inference) but turn out to be brittle in many real-world use cases.
Then, there is the general issue that one has to be familiar with probabilistic models and the inner workings of the inference algorithms to have any hope of debugging the inevitable errors and convergence issues that arise. That seems to realistically require a machine learning or statistics PhD and the population of those is very small.
But the diagnostic tools are really quite good these days – tools like Stan and PyMC make it easy to produce diagnostic plots that let you check for convergence and the like. Additionally, more and more of the fine-tuning is happening automatically, like choosing sensible step sizes and plausible initial values for the random walk.
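To make the convergence-checking idea concrete, here's a toy pure-Python version of the Gelman–Rubin R-hat statistic that Stan and PyMC report automatically (theirs is the more robust split-R-hat; the function name and the simulated chains below are just for illustration):

```python
import random
import statistics

def r_hat(chains):
    """Crude Gelman-Rubin statistic: compares between-chain to within-chain variance.
    Values near 1.0 suggest the chains have mixed; values well above 1.0 do not."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    W = statistics.fmean(statistics.variance(c) for c in chains)  # within-chain variance
    B = n * statistics.variance(means)                            # between-chain variance
    var_plus = (n - 1) / n * W + B / n                            # pooled variance estimate
    return (var_plus / W) ** 0.5

random.seed(0)
# four well-mixed chains from the same distribution
good = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# chains stuck in two different modes -- the classic failure R-hat catches
bad = [[random.gauss(m, 1) for _ in range(1000)] for m in (0, 3, 0, 3)]

print(r_hat(good))  # close to 1.0
print(r_hat(bad))   # well above 1.1 -> no convergence
```

Stan and PyMC compute this (and effective sample size, divergence warnings, etc.) for you on every fit, which is a big part of why the tooling feels usable now.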
Note that traditional statistics, like regression, requires similar diagnostic steps: q-q plots to check the normality of your residuals, Levene's test for homogeneity of variance, and so on. They often get ignored, but they're generally not hard to teach or learn.
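A minimal sketch of those diagnostic steps after an ordinary regression fit, using SciPy (the variable names and the low-x/high-x grouping are just illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 200)   # homoskedastic linear data

# ordinary least-squares fit, then residuals
slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)

# normality of residuals (the thing a q-q plot shows visually)
_, p_norm = stats.shapiro(resid)

# Levene's test for equal residual variance between two halves of the x-range
low, high = resid[x < 5], resid[x >= 5]
_, p_levene = stats.levene(low, high)

print(f"normality p={p_norm:.3f}, equal-variance p={p_levene:.3f}")
```

Nothing exotic: a couple of lines each, which is roughly the level of effort the MCMC diagnostics above require too.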
We're really not that far from having this stuff be usable by the broader public. Maybe not "build any model you like and it will just work out of the box", but definitely as a replacement for much of frequentist statistics. See "Doing Bayesian Data Analysis" by Kruschke, for example.
It's unfortunately still a long way from being a technology that you can use pretty blindly, like OLS or logistic regression. In particular, if you only care about prediction rather than parameter inference, logistic regression with a penalty search is pretty straightforward.
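For what it's worth, "logistic regression with a penalty search" can be sketched in a few lines. This is a toy from-scratch version with a crude holdout search over the L2 penalty (not any particular library's API; the data and penalty grid are made up for illustration):

```python
import numpy as np

def fit_logistic(X, y, l2=1.0, lr=0.1, steps=2000):
    """L2-penalised logistic regression via plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
        grad = X.T @ (p - y) / len(y) + l2 * w    # log-loss gradient + ridge term
        w -= lr * grad
    return w

def accuracy(X, y, w):
    return np.mean((X @ w > 0) == (y == 1))

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 1, 400) > 0).astype(float)

# crude penalty search on a holdout split -- the "penalty search" in question
X_tr, X_val, y_tr, y_val = X[:300], X[300:], y[:300], y[300:]
best = max((accuracy(X_val, y_val, fit_logistic(X_tr, y_tr, l2=l2)), l2)
           for l2 in [0.001, 0.01, 0.1, 1.0])
print("validation accuracy %.2f at l2=%g" % best)
```

The point being: the whole procedure is mechanical, with no convergence diagnostics or model-specific judgment required, which is the bar probabilistic programming hasn't cleared yet.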
May I ask what / if you use bayesian stats for professionally in your news work?
Also, the link to your github from your site goes to stdbrouw instead of debrouwere
I don't get to use that much Bayesian analysis in my work, but I've been toying around with a hierarchical model of content popularity recently: are some topics / authors / genres more popular than others? Frankly, I'm finding it hard to separate the signal ("this is what people actually like to read") from the noise ("this is what just happened to be trending and won't be reproducible").
MCMC is a very slow inference algorithm. Its primary advantage was that, for well-known models, it could be coded up much more simply than a fancier inference technique. When you consider variational methods and newer streaming methods based on things like Assumed Density Filtering, you can get really great scalable performance. The point of probabilistic programming is to write inference algorithms once for a large class of models and be done. So the advantage of using a fancier method is amplified.
This means that, paradoxically, probabilistic programming should eventually be faster than existing methods rather than slower, since you can reuse these fancier inference methods for new models. This is a very active field, so the progress is only starting to appear in existing systems.
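To illustrate the "simple to code but slow" point about MCMC: a complete random-walk Metropolis sampler really is about ten lines, assuming only that you can evaluate the target log-density (the standard-normal target here is just a toy check):

```python
import random
import math

def metropolis(log_p, x0, step, n):
    """Random-walk Metropolis: works for almost any log-density you can
    evaluate pointwise, at the cost of many correlated, slow iterations."""
    x, samples = x0, []
    for _ in range(n):
        prop = x + random.gauss(0, step)              # propose a local move
        if math.log(random.random()) < log_p(prop) - log_p(x):
            x = prop                                  # accept, else stay put
        samples.append(x)
    return samples

random.seed(1)
# target: standard normal, so the true mean is 0 and variance is 1
draws = metropolis(lambda z: -0.5 * z * z, x0=0.0, step=1.0, n=50_000)
mean = sum(draws) / len(draws)
print(round(mean, 2))
```

Fifty thousand likelihood evaluations to estimate one mean is exactly the cost profile that variational and streaming methods try to avoid.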
So that should be something where a few standard libraries or toolsets should emerge that push out the "easy-to-implement" default choice? Are there any contenders yet? (As you might be able to tell, I don't really know anything about the field)
Perhaps a more important difference is that MCMC, while slow, is exact in the limit. Variational methods won't converge to the true posterior no matter how long you run them. You'll converge to an approximate answer which depends on the particular variational form you choose to use.
And I think your second sentence reinforces my point -- it makes variational methods either more fiddly, or require more understanding to use well.
edit: here's one such (limited but nice) effort: http://ebonilla.github.io/papers/nguyen-bonilla-nips-2014.pd...
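The "converges to an approximate answer" point has a textbook closed-form illustration: mean-field VI on a correlated bivariate Gaussian recovers the means exactly but underestimates the marginal variances no matter how long you optimize (this is the classic example from Bishop's PRML, ch. 10):

```python
# Mean-field VI applied to a correlated 2-D Gaussian target, where the
# variational optimum is known in closed form.
rho = 0.9                       # correlation of the target [[1, rho], [rho, 1]]
true_marginal_var = 1.0         # Sigma_11 of the exact posterior
# The optimal factorised q sets each variance to 1 / Lambda_ii, the inverse of
# the *precision* matrix diagonal -- here Lambda_11 = 1 / (1 - rho^2).
mean_field_var = 1.0 - rho ** 2
print(true_marginal_var, mean_field_var)   # ~0.19 vs 1.0: badly underestimated
```

No amount of extra optimization fixes this; the gap is baked into the factorised variational family, which is the fiddliness being described.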
I'm not sure this has great practical use, but it's a very interesting system.
The exponential->polynomial speedup we might get from a quantum computer doesn't really apply to stochastic processes, because we can already execute them in polynomial time on a classical computer (just use a random number generator).
Having a quantum computer would still be truly amazing, though. Even just being able to simulate quantum mechanics efficiently would be revolutionary.
 - http://www.scottaaronson.com/democritus/lec9.html
But they seem to be getting less and less coverage in probabilistic programming. Perhaps they are considered too inexpressive?
I kind of laughed at this line, because this seems like a situation where the author came to the correct conclusion for all the wrong reasons.
Maybe things have changed, but the Mathematica programming language never struck me as useful for anything over a couple thousand lines.
edit: this is completely beside the purpose and message of this article, but it's worth noting that modularity is a pretty significant design criterion that none of these languages has a particularly strong story for.
My guess is that WL would be an interesting and powerful starting point for probabilistic programming at some point down the line (if not already).