Try running the workload pinned to specific CPUs, and keep those CPUs on a single NUMA node rather than spread across nodes.
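I've run into the same weirdness with other workloads, and pinning has fixed it every time. Not all cores behave the same for a given job: a core on a different NUMA node than the workload's memory, for example, pays extra latency to reach that memory.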
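
If it helps, here is a rough sketch of what that pinning could look like from inside a Python process on Linux. It assumes the standard sysfs layout (`/sys/devices/system/node/nodeN/cpulist`) and uses `os.sched_setaffinity`, which is only available on Linux. The helper names (`parse_cpulist`, `cpus_on_node`, `pin_to_node`) and the choice of node 0 are made up for illustration, not taken from your setup.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_on_node(node):
    """Read the CPUs that belong to one NUMA node from sysfs (Linux only)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())


def pin_to_node(node, pid=0):
    """Restrict a process (pid=0 means the caller) to the CPUs of a single NUMA node."""
    allowed = os.sched_getaffinity(pid)          # CPUs we are currently allowed to use
    target = set(cpus_on_node(node)) & allowed   # stay inside any existing cgroup/cpuset limit
    if not target:
        raise RuntimeError(f"no usable CPUs on NUMA node {node}")
    os.sched_setaffinity(pid, target)
    return target


if __name__ == "__main__":
    # Hypothetical choice: node 0. Pick whichever node the workload's memory lives on.
    pinned = pin_to_node(0)
    print("pinned to CPUs:", sorted(pinned))
    # ... start the actual workload here, after pinning ...
```

Note this only sets CPU affinity, not memory placement. With the default Linux policy, memory is normally allocated on the node of the CPU that first touches it, so pin before the workload allocates its big buffers. If you would rather not touch the code, `taskset -c <cpus> <command>` or `numactl --cpunodebind=0 --membind=0 <command>` should get you the same effect from the shell.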