I used RDRAND to seed an SSE Fractal Flame generator (a stochastic system that needs 4 random numbers per thread per loop iteration, sometimes more depending on the variations used). RDRAND has a maximum throughput of something like 500MB/s, and it takes approximately 150 clocks per invocation. I was never able to hit that performance wall with my fractal program. My fractal program is also significantly faster than other CPU implementations, though that is probably due to the intrinsic vectorization more than the random number source. I also got significantly better-looking fractals than I would get with most of the PRNGs that I tried.
For more, check the "performance" section of this article.
Note: if you need more than 500MB/s, you can use RDRAND (or RDSEED in Broadwell, when it comes out) to seed a PRNG. I was doing this at first, but the performance of the system didn't improve enough for it to be worth the added complexity.
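If you do go the seeding route, a rough sketch might look like the following. It assumes x86-64 with GCC or Clang. xorshift64 is only a stand-in PRNG chosen for brevity, the cap of 10 attempts is an assumption, and the names `xorshift64_t`, `xorshift64_seed`, and `xorshift64_next` are made up for illustration.

```c
#include <immintrin.h>  /* _rdrand64_step */
#include <stdint.h>

/* Hypothetical PRNG state; xorshift64 is only a stand-in generator. */
typedef struct { uint64_t s; } xorshift64_t;

/* Seed the generator from RDRAND, retrying a few times on failure.
 * Returns 1 on success, 0 if no usable value was delivered.
 * The target attribute lets GCC/Clang use the intrinsic without -mrdrnd.
 * Assumes the CPU actually supports RDRAND. */
__attribute__((target("rdrnd")))
static int xorshift64_seed(xorshift64_t *g)
{
    unsigned long long v = 0;
    for (int tries = 0; tries < 10; tries++) {
        /* _rdrand64_step returns 1 when a valid value was written.
         * A xorshift state must be nonzero, so reject 0 as well. */
        if (_rdrand64_step(&v) && v != 0) {
            g->s = v;
            return 1;
        }
    }
    return 0;
}

/* One xorshift64 step (shift triple 13, 7, 17). */
static uint64_t xorshift64_next(xorshift64_t *g)
{
    uint64_t x = g->s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    g->s = x;
    return x;
}
```

Each thread would call `xorshift64_seed` once and then pull values from `xorshift64_next` in its inner loop, so RDRAND is only hit at startup (or whenever you decide to reseed).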
And if you want to access it on Linux from C, you can either:
* Use inline assembly. Just be sure to check the carry flag (CF) after calling RDRAND: if RDRAND fails (for example, because you're exceeding the 500MB/s), CF is cleared, so you have to keep calling RDRAND until it is set.
* Use intrinsics (easier; they live in immintrin.h). Here's how I did it in my program (bug reports welcome, I'm only a freshman in college who had lots of free time and a fascination with fractals)
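For the inline-assembly option in the first bullet, the retry loop could be sketched roughly like this. It assumes x86-64 with GCC or Clang. The helper names (`has_rdrand`, `rdrand64_once`, `rdrand64_retry`) and the idea of capping the number of attempts are illustrative, not code from the fractal program.

```c
#include <cpuid.h>   /* __get_cpuid (GCC/Clang helper) */
#include <stdint.h>

/* CPUID leaf 1, ECX bit 30 reports RDRAND support. */
static int has_rdrand(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 30) & 1;
}

/* One RDRAND attempt.
 * Returns 1 if the carry flag was set (value is valid), 0 otherwise. */
static int rdrand64_once(uint64_t *out)
{
    uint64_t v;
    unsigned char ok;
    __asm__ volatile("rdrand %0\n\t"
                     "setc %1"
                     : "=r"(v), "=qm"(ok)
                     :
                     : "cc");
    *out = v;
    return ok;
}

/* Keep calling RDRAND until CF is set, giving up after max_tries.
 * Returns 1 on success, 0 if every attempt failed. */
static int rdrand64_retry(uint64_t *out, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (rdrand64_once(out))
            return 1;
    }
    return 0;
}
```

Calling `has_rdrand()` first avoids an illegal-instruction fault on CPUs without the instruction. The cap on attempts is a design choice so that a persistently failing generator cannot hang the program; something like `rdrand64_retry(&x, 10)` is a reasonable starting point.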