I doubt it would compare favorably at the moment. I don't think it's particularly well optimized beyond using rayon for CPU parallelism and wide for a bit of SIMD.
It's good enough to get pretty good performance for little effort, but I don't think it would win a benchmark race.