This isn't really apples-to-apples comparing with FFTW. 1. It's been my experien...

O3marchnative · 2024-05-02T14:50:13 1714661413

Hi,

One of the authors of PhastFT here. Thank you for your interest.

We went out of our way to configure FFTW for AVX-512. The Rust bindings don't enable it, but the FFTW build used in the benchmark does.

It's worth noting that with FFTW you have to choose between building it for your CPU, which makes the binary non-portable, and targeting the lowest common denominator of CPU features, which runs everywhere but much slower. Meanwhile, PhastFT detects the available CPU features at runtime and uses the fastest ones without sacrificing portability.
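For anyone unfamiliar with how runtime detection looks in Rust, here is a minimal sketch of the general pattern using the standard library's `is_x86_feature_detected!` macro. This is not PhastFT's actual code; `sum`, `sum_scalar` and `sum_avx2` are hypothetical names picked for illustration, and a plain sum stands in for an FFT kernel.

```rust
/// Portable fallback, compiled for the baseline target.
fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Same source, but the compiler is allowed to emit AVX2 instructions here.
/// Hypothetical helper: it must only be called on CPUs that support AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Picks an implementation at runtime, so one binary runs on any CPU.
pub fn sum(xs: &[f32]) -> f32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: we just checked that the running CPU supports AVX2.
            return unsafe { sum_avx2(xs) };
        }
    }
    // Non-x86 targets, or x86 CPUs without AVX2, take the portable path.
    sum_scalar(xs)
}
```

Calling `sum(&[1.0, 2.0, 3.0, 4.0])` gives the same result on either path; only the instructions used to compute it differ.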

Lastly, we are currently working on support for interleaved format [1]. That should ship in the next release.

[1] https://github.com/QuState/PhastFT/pull/27

gct · 2024-05-02T21:03:03 1714683783

FFTW will definitely query cpuid at runtime too. Since it's piecing together kernels anyway, it's not much more work for it to choose to ignore AVX, etc. If you use the [guru interface](https://www.fftw.org/fftw3_doc/Guru-vector-and-transform-siz...) to configure it to work with split arrays (and maybe use FFTW_MEASURE when planning), I think the benchmarks will be a lot closer to 1:1.
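To make the layout difference concrete: "interleaved" stores each complex value as an adjacent (re, im) pair in one buffer, while "split" keeps all real parts in one array and all imaginary parts in another. Below is a rough sketch of converting between the two, assuming `f64` samples. The function names are made up for illustration and belong to neither library. A benchmark that hands one library its native layout and makes the other convert pays for this extra pass.

```rust
/// Converts [re0, im0, re1, im1, ...] into separate real and imaginary arrays.
/// Hypothetical helper, not part of PhastFT or FFTW.
pub fn interleaved_to_split(data: &[f64]) -> (Vec<f64>, Vec<f64>) {
    assert!(data.len() % 2 == 0, "interleaved data needs an even length");
    let mut re = Vec::with_capacity(data.len() / 2);
    let mut im = Vec::with_capacity(data.len() / 2);
    for pair in data.chunks_exact(2) {
        re.push(pair[0]);
        im.push(pair[1]);
    }
    (re, im)
}

/// Converts separate real and imaginary arrays back into [re0, im0, re1, im1, ...].
/// Hypothetical helper, not part of PhastFT or FFTW.
pub fn split_to_interleaved(re: &[f64], im: &[f64]) -> Vec<f64> {
    assert_eq!(re.len(), im.len(), "real and imaginary parts must match in length");
    re.iter().zip(im).flat_map(|(&r, &i)| [r, i]).collect()
}
```

Round-tripping a buffer through both functions returns the original data, so the conversion itself is lossless; the cost is only the copy.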