
Doing Bayesian Data Analysis On the GPU: Examples - dragandj
https://github.com/uncomplicate/bayadera/tree/master/test/clojure/uncomplicate/bayadera/examples/dbda
======
dragandj
Characteristic examples from the book Doing Bayesian Data Analysis 2nd edition
[1] programmed in Clojure and OpenCL to run on the GPU. Much, much faster than
Stan or JAGS!

The library used (Bayadera) is still pre-release and needs a lot of polishing,
so consider this a preview. It is still very useful, though, and no more
complex for programmers than the mainstream Bayesian tools.

[1] https://www.amazon.com/Doing-Bayesian-Data-Analysis-Second/dp/0124058884/ref=sr_1_1?ie=UTF8&qid=1469897741&sr=8-1&keywords=doing+bayesian+data+analysis

~~~
feral
Any hint how much faster? Not looking for defensible benchmarks, but are you
talking an order of magnitude? Multiple orders?

Does it make the same probabilistic guarantees as the methods used in Stan
etc? Or is it trading validity for speed?

~~~
dragandj
Nothing is universal and guaranteed, of course. YMMV, and all that.

For example, the robust linear regression from chapter 17, which fits 300
points over 4 parameters (easy, but far from trivial), runs in 180 seconds in
JAGS and 485 seconds in Stan, with 4 chains in parallel taking 20,000 samples.

Bayadera takes 276,297,912 samples in 300 milliseconds, giving much finer-
grained estimates.

So, depending on how you count the difference, it would be 500-1000 times
faster for this particular analysis, while the per-sample ratio is something
like 7,000,000 (compared to JAGS).
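For the curious, here is a quick back-of-the-envelope check of those ratios, using only the JAGS figures quoted above; the results land in the same ballpark as the numbers in the comment:

```python
# Sanity-check the speedup arithmetic using the figures quoted above.
jags_seconds, jags_samples = 180.0, 20_000
bayadera_seconds, bayadera_samples = 0.3, 276_297_912

# Wall-clock speedup: how much sooner the analysis finishes.
wall_clock_ratio = jags_seconds / bayadera_seconds

# Per-sample throughput ratio: samples per second, divided out.
per_sample_ratio = (bayadera_samples / bayadera_seconds) / (jags_samples / jags_seconds)

print(f"wall-clock: {wall_clock_ratio:.0f}x")    # prints wall-clock: 600x
print(f"per-sample: {per_sample_ratio:,.0f}x")   # prints per-sample: 8,288,937x
```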

Of course, JAGS and Stan are mature software packages, while Bayadera is still
pre-release...

~~~
feral
Thanks. About the second part of my question - are you doing much the same
stuff as JAGS/Stan? Like, they do a lot of work to make sure that their MCMC
is validly converging to the posterior - does Bayadera make similar
guarantees?

Is the speedup coming from a better implementation, or because GPUs are just
way faster, or because it cuts statistical corners? If it's cutting corners,
are they sensible ones?

~~~
dragandj
It uses a different MCMC algorithm: affine invariant ensemble MCMC. The
difference comes from the fact that this algorithm is parallelizable, while
JAGS's and Stan's aren't. So, many GPU cores are the main factor. But the
algorithm is also a factor, in the sense that the parallel chains continually
inform one another.
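This is not Bayadera's actual kernel code, but the Goodman-Weare "stretch move" underlying this family of samplers is simple enough to sketch in a few lines of NumPy. Each walker proposes along the line through a randomly chosen partner, which is why the chains mutually inform each other and why the ensemble parallelizes well:

```python
import numpy as np

def stretch_move_step(walkers, log_prob, a=2.0, rng=None):
    """One sweep of the Goodman-Weare affine-invariant stretch move.

    walkers: (K, d) array of walker positions; log_prob: log target density.
    This sketch updates walkers one by one; GPU implementations update
    half the ensemble at a time, using the other half as partners.
    """
    rng = rng or np.random.default_rng()
    K, d = walkers.shape
    out = walkers.copy()
    for k in range(K):
        # Pick a partner walker j != k from the rest of the ensemble.
        j = rng.integers(K - 1)
        j = j + (j >= k)
        # Draw the stretch factor z with density proportional to 1/sqrt(z)
        # on [1/a, a], via the standard inverse-CDF formula.
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        # Propose along the line through the partner.
        proposal = out[j] + z * (out[k] - out[j])
        # Accept with probability min(1, z^(d-1) * p(Y)/p(X_k)).
        log_accept = (d - 1) * np.log(z) + log_prob(proposal) - log_prob(out[k])
        if np.log(rng.random()) < log_accept:
            out[k] = proposal
    return out

# Tiny demo: 64 walkers sampling a 2-D standard normal.
log_prob = lambda x: -0.5 * np.dot(x, x)
rng = np.random.default_rng(42)
walkers = rng.normal(size=(64, 2))
for _ in range(500):
    walkers = stretch_move_step(walkers, log_prob, rng=rng)
print(walkers.shape)  # (64, 2)
```

The move is affine invariant: a linear change of variables rescales the proposals identically, so the sampler needs no per-problem tuning of step sizes, which is part of what makes it attractive for a general-purpose GPU library.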

They do a lot of work to make sure that MCMC is validly converging, and
Bayadera also does its part on that front, but the truth is, as you'll find in
any book on MCMC (Gelman's included), that you can never guarantee MCMC
convergence.

~~~
nextos
Looks very nice. I wonder if the upcoming Xeon Phi will make the task of
parallel sampling simpler. Or at least compiling and optimising automatic
parallel samplers on the fly. Macros might be great for this. That's the
ultimate probabilistic programming goal. Write the model and get efficient
sampling for free.

~~~
dragandj
Thanks. I doubt that Xeon Phi would be any faster than my old AMD R9 290X, and
the 10x price tag isn't inviting either.

------
anon1253
Just out of curiosity, how hard would it be to write a compiler that takes
JAGS or Stan model files and compiles them to the s-exps needed to use the
library?

------
jsweojtj
I must be missing something. Is there any output that we are supposed to be
seeing? Plots of the fits or something?

~~~
dragandj
Yes. Run the tests in the REPL. Call (analysis) to invoke only the fits and
get the timings, or call (display-sketch) and then (reset! all-data
(analysis)) to see the plots. It is only done like that in these examples; for
actual work, you are free to use any meaningful combination of functions that
you like.

