Hacker News new | past | comments | ask | show | jobs | submit login
Doing Bayesian Data Analysis On the GPU: Examples (github.com/uncomplicate)
112 points by dragandj on July 30, 2016 | hide | past | favorite | 28 comments

Characteristic examples from the book Doing Bayesian Data Analysis 2nd edition [1] programmed in Clojure and OpenCL to run on the GPU. Much, much faster than Stan or JAGS!

The library used (Bayadera) is still pre-release, so much polishing is still needed, so this can be considered a preview. But, it is still very useful, and not more complex for programmers than the mainstream Bayesian tools.

[1] https://www.amazon.com/Doing-Bayesian-Data-Analysis-Second/d...

Any hint how much faster? Not looking for defensible benchmarks, but are you talking an order of magnitude? Multiple orders?

Does it make the same probabilistic guarantees as the methods used in Stan etc? Or is it trading validity for speed?

Nothing is universal and guaranteed, of course. YMMV, and all that.

For example, robust linear regression from chapter 17, that fits 300 points over 4 parameters (easy, but far from trivial) runs in 180 seconds in JAGS and 485 in Stan, in parallel with 4 chains, taking 20,000 samples.

Bayadera takes 276,297,912 samples in 300 milliseconds, giving much fine-grained estimations.

So, depending on how you count the difference, it would be 500-1000 times faster for this particular analysis, while per-sample ratio is something like 7,000,000 (compared to JAGS).

Of course, JAGS and Stan are mature software packages, while Bayadera is still pre-release...

Thanks. About the second part of my question - are you doing much the same stuff as JAGS/Stan? Like, they do a lot of work to make sure that their MCMC is validly converging to the posterior - does Bayadera make similar guarantees?

Is the speedup coming from a better implementation, or because GPUs are just way faster, or because it cuts statistical corners? If its cutting corners, are they sensible?

It uses different MCMC algorithm - affine invariant ensemble MCMC. The difference comes from the fact that this algorithm is parallelizable, while JAGS/Stan's isn't. So, many GPU cores are the main factor. But, the algorithm is also a factor, in a sense that parallel chains always mutually inform each other.

They may do a lot of work to make sure that MCMC is validly converging, and Bayadera also does its stuff on that front, but the truth is, and you'll find it in any book on MCMC (Gelman included) that you can never guarantee MCMC convergence.

Looks very nice. I wonder if the upcoming Xeon Phi will make the task of parallel sampling simpler. Or at least compiling and optimising automatic parallel samplers on the fly. Macros might be great for this. That's the ultimate probabilistic programming goal. Write the model and get efficient sampling for free.

Thanks. I doubt that XeonPhi would be any faster than my old AMD R9 290X, and the 10x price tag is also not inviting.

Can you point me to any good documentation on parallel MCMC algorithms and any info you might have written down on how you parallelized it? This sounds extremely worth porting over to some other probabilistic programming languages.

I'll be glad to send you the paper once it gets accepted.


Validity probably depends on whether you use a professional GPU or gaming devices. I am currently running NMF on a GTX980 and from time to time the algorithm totally fails, probably due to missing EC. I hope our new Tesla server will solve the issue

I can not guarantee, but this should not be a problem for this particular algorithm (MCMC).

It's not really comparable to stan if you're not computing gradients for HMC. Of course maybe for these models you don't need to.

Please point me to models that absolutely need HMC, and I can try to see how Bayadera fares.

Stan 's developers in particular use it for hierarchical models, but it general anything with highly correlated parameters works better with HMC than MCMC, IIRC.

Michael Betancourt (à stan dev working on the HMC parts) has a pair of YouTube videos which go into details.

That said, I switched to pymc3 so that I could compute logp via opencl more easily and if there were better ways to do this, I'm happy to see them.

But there ARE hierarchical models in the examples. One is 158-dimensional. With highly correlated parameters. Works like a charm in Bayadera.

My point was mainly that comparing speed between an algorithm that doesn't require gradients and HMC in Stan is apples and oranges.

How's that? The algorithms have the same goal - find the posterior distribution. The time to get to that distribution is what is important and what is compared, provided that both algorithms get proper results. How they do it underneath is irrelevant for the user who waits.

That's like saying that comparing a horse cart and an automobile is comparing apples and oranges.

That being said, there are other things where Stan might fare better. User familiarity, or maturity, or personal taste...

Looks cool. Would love to see the "much, much faster" claim quantified. Both including and excluding compile times. Stan is neat, but the recompile time after every tweak to the model really got to be a drag. If you can improve on that, it would be a real win for me.

Recompile time is under a second for most models that I tried. Let's say 150 - 700 ms. Once you compile it, you can use it many times.

The diference for the inference time is in the post below (but YMMV).

This looks great, and very tastefully implemented on the Clojure side. Have you considered splitting out things like the quil utils?

Thanks! About the quil utils - you mean get rid of them, or you think that they would be useful on their own? Btw, I only use quil for the sketch setup, the plots are done in low-level Processing. Quil does some(un)boxing, so It would be a drag when plotting millions of sample points.

Ultimately I've been disappointed with Incanter, which seems effectively abandoned at this point, and I'm always on the lookout for alternatives that might one day rival the tools available in R. I use quil for various visualisations, but it (and I guess Processing beneath it) is jarringly mutable. Despite that, it would be interesting to me to see more plotting libraries spring up for Clojure, I suppose that was my point.

Bayadera does not aim to rival or to mimic R tools - it aims to be better! But, it is strongly opinionated; either you take the bayesian path, or you use something else :)

Currently, I do not have resources to take on the task of creating a general-purpose plotting library. My idea with the plotting toolbox available in Bayadera is to provide a tool that would do one job - provide fast, easy to use, plotting for exactly those types of plots that are needed for bayesian stuff - and do that job well.

Plotting in general is not that hard technically, but the problem is that each user in each situation wants something unique. The level of customizability found in ggplot, matplotlib and the likes is rather high; that's something I do not have motivation to pursue.

Shameless self promotion: https://github.com/sbelak/huri has a half-decent plotting DSL that uses ggplot underneath.

Just of out curiosity, how hard would it be to write a compiler that takes JAGS or STAN model files and compiles it to the s-exps needed to use the library?

I must be missing something. Is there any output that we are supposed to be seeing? Plots of the fits or something?

Yes. Run the tests in the REPL. Call (analysis) to only invoke the fits and get the timings, or call (display-sketch) and then (reset! all-data (analysis)) to see the plots. It is only done like that in these examples; for actual work, you are free to use any meaning combination of functions that you like.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact