

The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software (2005) - bontoJR
http://www.gotw.ca/publications/concurrency-ddj.htm

======
varelse
Wow, that's a 10-year-old article, and most of the code I see is still
single-threaded Java and Python. I rarely even see C code anymore.

Methinks that explains the popularity of Hadoop: just break your task into
manageable subchunks, throw lots of independent workers at them, and reduce
the mess at the end. The MapReduce abstraction does all your thinking for you.
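
To make that concrete, here is a minimal sketch of the split/map/reduce pattern
in plain Python. It is not Hadoop's API; the word-count task and the names
split_into_chunks, map_chunk, merge_counts and word_count are all made up for
illustration. It uses a thread pool to keep the example short, and in CPython
you would likely swap in ProcessPoolExecutor for CPU-bound work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(items, chunk_size):
    """Break the input into independent subchunks."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def map_chunk(lines):
    """Map step: count words in one chunk, touching no shared state."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(lines, chunk_size=2, workers=4):
    """Hypothetical driver: split, map in parallel, then reduce."""
    chunks = split_into_chunks(lines, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())
```

The only thing the caller has to get right is that map_chunk is independent
per chunk and merge_counts is associative; the rest is plumbing.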

But as mscharrer said, GPUs (and CPUs) are continually improving in terms of
core counts and SIMD widths while processor clocks are mostly static. That
said, a ~$1000 GPU can give you up to 6.7 TFLOPS while a ~$7000 CPU can give
you up to 1.4 TFLOPS if you program them correctly. The difference is that
it's a lot easier IMO to use OpenCL/CUDA to tap that 6.7 TFLOPS than it is to
hand-code AVX inner loops to get anywhere close to peak performance out of the
CPU.
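
For a rough sense of scale, the arithmetic behind those numbers can be sketched
as below. The inputs are just the approximate peak figures and prices quoted
above, and gflops_per_dollar is a made-up helper name, so treat the result as
a ballpark, not a benchmark.

```python
def gflops_per_dollar(peak_tflops, price_usd):
    """Peak GFLOPS bought per dollar (1 TFLOPS = 1000 GFLOPS)."""
    return peak_tflops * 1000.0 / price_usd


# Approximate figures taken from the comment above, not measured.
gpu = gflops_per_dollar(6.7, 1000)  # about 6.7 GFLOPS per dollar
cpu = gflops_per_dollar(1.4, 7000)  # about 0.2 GFLOPS per dollar

# Roughly a 33x gap in peak throughput per dollar.
ratio = gpu / cpu
print(f"GPU {gpu:.1f} vs CPU {cpu:.1f} GFLOPS/$, ratio ~{ratio:.0f}x")
```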

------
mscharrer
GPUs still seem to be improving rather quickly.

