Hacker News

Some thoughts on my experiments with SIMD Programming:

1. AVX2 is good, but tedious to use manually. The core difficulty with AVX2 is that it is SIMD of SIMD: it's 2-way SIMD over 128-bit lanes. Moving data "across" the two 128-bit halves can only be done with a handful of cross-lane instructions (vperm2i128, vpermd, and the like)... or through L1 cache (write to memory, then read it back into another register).

2. "#pragma omp simd" seems to be the most portable way to attempt to "force" auto-vectorization. It's supported by GCC, Clang, ICC, and other compilers. Visual Studio unfortunately does NOT support this pragma, but Visual C++ has well-documented auto-vectorization features of its own.

3. If you are sticking with Visual C++, its auto-vectorization capabilities are pretty good. Enable the relevant compiler warnings so that you know which loops fail to auto-vectorize, and read those warnings carefully. https://docs.microsoft.com/en-us/cpp/parallel/auto-paralleli...
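Concretely, the switch to turn on is /Qvec-report (this is the documented MSVC flag; the filename here is just a placeholder):

```shell
# Ask MSVC to report on auto-vectorization. Level 1 lists the loops
# that were vectorized; level 2 also prints a reason code for every
# loop that was NOT vectorized.
cl /O2 /Qvec-report:2 kernel.cpp
```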

4. If you keep reaching for the SIMD button, the GPU programming model seems superior. If you must stay on a CPU, try ISPC, Intel's SPMD Program Compiler (https://ispc.github.io/), so that your "programming model" is at least correct.

5. If a huge portion of your code is SIMD-style, a dedicated GPU is better. GPUs have more FLOPS and more memory bandwidth. GPUs have faster local memory (aka "shared memory" on NVidia, or "LDS" on AMD) and faster thread-group communication than a CPU. Know how amazing vpshufb is? Well, GPUs will knock your socks off with ballot(), CUDA's __shfl(), AMD's cross-lane operations, and more (https://gpuopen.com/amd-gcn-assembly-cross-lane-operations/).

6. To elaborate on point #5: GPUs simply have better gather/scatter capabilities. The GPU's "shared" or "LDS" memory (only 64 kB or so) is very small, but it provides arbitrary gather/scatter across the "lanes" of a GPU SIMD unit. GPUs even support relatively efficient atomic operations. Yes, even vpshufb looks limited compared to what is available on GPUs.

7. Raw AVX2 code seems "easy" if all you need is a single 128-bit or 256-bit register. For example, if you are working with complex doubles (a real plus an imaginary component), it is very straightforward to write 128-bit SIMD code for the arithmetic. But if you are writing "true SIMD" code, in the style of the seminal 1986 paper "Data Parallel Algorithms" (https://dl.acm.org/citation.cfm?id=7903), then stick with ISPC or GPU-style coding instead.

8. Be sure to read that paper, "Data Parallel Algorithms", to get insight into true SIMD programming. GPU programmers already know what is in there, but it's still cool to read one of the first papers on the subject (from 1986, no less!).




I can't comment on the GPU points, but you may be better off leaving the vectorization to GCC than using the simd pragma. On Haswell, the pragma gets you AVX2 but not FMA, so you lose a factor of two on GEMM, for instance. The GCC manual also gives an example of the ivdep pragma.


Wouldn't that just be a question of a missing optimization?

If I recall correctly, OpenJDK can use FMA thanks to Intel's contributions.


I don't see how OpenJDK is related to the OpenMP pragma. GCC has no problem using FMA if you just let it, i.e. avoid the pragma, which simply says "simd".


I understood that GCC's auto-vectorization currently wouldn't use FMA, and hence gave an example of a compiler whose auto-vectorization does make use of it, assuming I remember Intel's session at Oracle Code One correctly.



