

NumbaPro CUDA Python speeds up a Monte Carlo pricer 14x - corinna
http://continuum.io/blog/monte-carlo-pricer

======
apaprocki
We've been using CUDA to speed up Monte Carlo pricing for years now. :)
Writing in Python does make it more accessible to more programmers. I'm
curious how much overhead is added.

[http://www.wallstreetandtech.com/it-
infrastructure/bloomberg...](http://www.wallstreetandtech.com/it-
infrastructure/bloomberg-uses-gpus-to-speed-up-bond-pri/220200055)

------
rck
It would be more interesting to see a visualization of throughput vs. problem
size. That would almost certainly give a better sense of how good this
approach is, and it would let us compare to other CUDA monte carlo
implementations, like the one Nvidia has had on their CUDA site for years:

[http://developer.download.nvidia.com/compute/cuda/2_2/sdk/we...](http://developer.download.nvidia.com/compute/cuda/2_2/sdk/website/projects/MonteCarlo/doc/MonteCarlo.pdf)

------
pavanky
This particular problem is so embarrassingly parallel that I'm surprised the
speedup is only 14x.

~~~
tdees40
Second that. On these sorts of problems it's not uncommon to see 100x+
speedups. My general experience is that just rewriting this in C++ instead of
Numpy would get you a >14x speedup, so I'm pretty unimpressed at this point.

~~~
jofer
I'd be surprised if there was that large of a speedup going from a well-
written numpy implementation to C/C++ (assuming a non-parallel implementation
in both)...

In my experience, it's usually more like 2x-5x for a straight translation of
the algorithm from numpy to C. (Though I have to confess that I often write
rather naive C...)

Numpy performs quite well if you avoid a few common pitfalls. It can certainly
be beaten, but it's often faster than people think it is.
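For what it's worth, a minimal sketch of the kind of pitfall I mean, using a hypothetical option-payoff function (not code from the article): the two versions below compute the same thing, but the first loops over elements in Python while the second lets numpy run the loop in C.

```python
import numpy as np

def payoff_loop(paths, strike):
    # Pitfall: a Python-level loop over array elements.
    out = np.empty(len(paths))
    for i, s in enumerate(paths):
        out[i] = max(s - strike, 0.0)
    return out

def payoff_vectorized(paths, strike):
    # Idiomatic numpy: one vectorized call, loop runs in C.
    return np.maximum(paths - strike, 0.0)

paths = np.random.lognormal(mean=0.0, sigma=0.2, size=100_000)
assert np.allclose(payoff_loop(paths, 1.0), payoff_vectorized(paths, 1.0))
```

On arrays this size the vectorized version is typically an order of magnitude faster, which is most of the gap people attribute to "numpy being slow".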

~~~
dagw
That matches my experience as well. I've never gotten more than a 3x speedup
from a straight numpy->C rewrite on actual real-world code (I'm actually doing
one right now, and I've gotten a 2x speedup after my first pass).

------
viraptor
I wonder if there's a good way to select random indices from a bitmap using
CUDA. If so, this could be very nice for Monte Carlo poker hand scoring.

~~~
CountHackulus
That's something the demoscene loves to do. There's a few ways to do it, but
what's usually done is to have a large bitmap filled with random values that's
accessed by a function of threadID and current pixel (or whatever other info
you have handy). Essentially, the large bitmap acts as a precalculated PRNG.
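A rough host-side sketch of the idea, with made-up constants (the real thing would index the table from inside a kernel): fill a table with random values once, then turn (threadID, pixel) into a table index with an arbitrary mixing function.

```python
import numpy as np

# Precompute the "bitmap" of random values once, up front.
TABLE_SIZE = 4096
rng = np.random.default_rng(42)
random_table = rng.random(TABLE_SIZE)

def table_random(thread_id, pixel):
    # Mix thread ID and pixel into a table index. The odd prime
    # multipliers are arbitrary; they just scatter the lookups so
    # neighboring threads don't read neighboring entries.
    idx = (thread_id * 7919 + pixel * 104729) % TABLE_SIZE
    return random_table[idx]
```

Same (thread_id, pixel) pair always gives the same value, which is exactly what you want for reproducible per-pixel noise but means the table has to be big enough that the repetition isn't visible.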

~~~
incision
>There's a few ways to do it, but what's usually done is to have a large
bitmap filled with random values that's accessed by a function of threadID and
current pixel (or whatever other info you have handy).

Do you have any good links that elaborate on this kind of technique? I once
had an idea that sounds a lot like this, but being fairly ignorant about the
topic I had no idea what to call it or how to implement it.

Thanks.

~~~
CountHackulus
This is probably the simplest article about the subject:
[http://www.reedbeta.com/blog/2013/01/12/quick-and-easy-
gpu-r...](http://www.reedbeta.com/blog/2013/01/12/quick-and-easy-gpu-random-
numbers-in-d3d11/)
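The core of that article is a per-thread integer hash (it uses Wang's hash, in HLSL); here's a Python transcription for anyone who wants to play with it without a GPU. The `& 0xFFFFFFFF` masks emulate 32-bit unsigned overflow.

```python
def wang_hash(seed):
    # Wang's 32-bit integer hash: mixes a seed (e.g. a thread ID)
    # into a well-scrambled 32-bit value usable as a PRNG state.
    seed = ((seed ^ 61) ^ (seed >> 16)) & 0xFFFFFFFF
    seed = (seed * 9) & 0xFFFFFFFF
    seed = (seed ^ (seed >> 4)) & 0xFFFFFFFF
    seed = (seed * 0x27D4EB2D) & 0xFFFFFFFF
    seed = (seed ^ (seed >> 15)) & 0xFFFFFFFF
    return seed
```

On the GPU you'd seed it with something like `threadID` and then run a cheap PRNG (LCG, xorshift) from the hashed state, which is the article's recipe.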

------
dbecker
Anyone outside of Continuum who can share their experiences with Numba and/or
NumbaPro?

