Awesome comparison! I'm super curious how this would scale on the current generation of big CPU machines. E.g. on an Epyc with 64 cores, would threads still perform that well?
Would be lovely to get my hands on such metal :)
I wonder how Linux thread scheduling scales on multi-CPU machines. To keep things simple, I specifically chose to go with a single-core machine to benchmark all architectures.