Did you do a benchmark with `--threads 1`? I suspect most of the speedup is coming from the threading, as listing the files is likely to be IO bound (or at least bound by calling into the kernel to read the directory listings).
That has certainly been my experience in the past when experimenting with this sort of thing: more threads make a lot of difference.
I did. You are right, multi-threading does not give a linear speedup, but it makes fd about a factor of three faster on my machine (8 virtual cores). With `--threads 1`, fd is on par with 'find -iname', but still faster than 'find -iregex'.
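For anyone who wants to try the single-thread versus multi-thread comparison on their own machine, here is a rough Python sketch of a parallel directory walk. It is not how fd is implemented; the names `list_dir`, `walk`, and `benchmark` are made up for illustration, and whether extra threads help at all will depend on the file system, the cache state, and the runtime.

```python
import os
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def list_dir(path):
    """List one directory and return (files, subdirs); unreadable dirs yield nothing."""
    files, subdirs = [], []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                else:
                    files.append(entry.path)
    except OSError:
        pass  # e.g. permission denied, or the directory vanished mid-walk
    return files, subdirs


def walk(root, threads):
    """Collect all non-directory paths under root, one pool task per directory."""
    found = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        pending = {pool.submit(list_dir, root)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                files, subdirs = future.result()
                found.extend(files)
                # Each newly discovered subdirectory becomes its own task.
                pending |= {pool.submit(list_dir, d) for d in subdirs}
    return found


def benchmark(root, thread_counts=(1, 8)):
    """Time walk() for each thread count; returns {threads: seconds}."""
    timings = {}
    for n in thread_counts:
        start = time.perf_counter()
        walk(root, n)
        timings[n] = time.perf_counter() - start
    return timings
```

Calling `benchmark(".")` a few times gives a crude comparison. The first run also warms the file system cache, so it is worth discarding it or alternating the order of the thread counts before reading anything into the numbers.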