And the "how" behind Octavian.jl is basically LoopVectorization.jl [1], which helps make optimal use of your CPU's SIMD instructions.
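For anyone who hasn't tried it, a minimal usage sketch of Octavian's matrix-multiply API (assumes Octavian.jl is installed; `matmul`/`matmul!` are its exported entry points):

```julia
using Octavian

# Pure-Julia matrix multiply, built on LoopVectorization.jl under the hood
A = rand(200, 300)
B = rand(300, 100)
C = Octavian.matmul(A, B)        # allocates and returns A * B

# In-place variant, writing into a preallocated buffer
Cbuf = similar(A, 200, 100)
Octavian.matmul!(Cbuf, A, B)
```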
Currently there can be some nontrivial compilation latency with this approach, but since LV ultimately emits custom LLVM IR, it's actually perfectly compatible with StaticCompiler.jl [2] following Mason's rewrite, so stay tuned on that front.
Thanks. But how does LoopVectorization.jl help here, say, compared to C/Fortran optimized for the CPU? Is this mentioned anywhere in their docs?
The basic answer is that LLVM doesn't do as good a job with some types of vectorization because it is working on a lower-level representation. There are several causes of this. One is that LoopVectorization has permission to replace elementary functions (e.g. `exp`, `log`) with hand-written vectorized equivalents; another is that it does a better job of using gather/scatter instructions.
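As a concrete illustration of the first point, here is a sketch of the kind of loop LoopVectorization's `@turbo` macro vectorizes, where the scalar `exp` call gets swapped for a hand-written SIMD implementation rather than being left to LLVM's autovectorizer (assumes LoopVectorization.jl is installed):

```julia
using LoopVectorization

# Sum of exp over a vector; @turbo rewrites the loop body,
# replacing `exp` with a vectorized equivalent and emitting SIMD code.
function vexp_sum(x::AbstractVector{Float64})
    s = 0.0
    @turbo for i in eachindex(x)
        s += exp(x[i])
    end
    return s
end
```

A plain `for` loop calling `exp` typically won't vectorize under LLVM alone, because LLVM sees only an opaque scalar function call by the time it runs its loop vectorizer.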
[1] https://github.com/JuliaSIMD/LoopVectorization.jl
[2] https://github.com/tshort/StaticCompiler.jl