Have you looked into Xeon Phi? They have up to 288 threads. With a 256 thread Xe...

m_mueller · on Feb 14, 2017

I haven't tried MICs personally, but AFAIK you need vectorization to match current Pascal-generation GPU performance with Knights Landing, which is where my comment applies. I don't doubt that you can get good speedups if your code is already vectorized, but if you start from naive CPU code you'll have a lot of work to do, which IMO is similar to the work needed to port to GPGPU.

jcoffland · on Feb 14, 2017

That is true. CPUs are catching up to GPUs in some ways. Intel is doing its best to take this market from NVIDIA. Time will tell.