That's an interesting point. The bias might or might not be intentional. From the benchmarks I have seen, a lot of tools solve slightly different problems, target slightly different data distributions, and end up building the best solution around those.
Which is why publishing open benchmarks is a first step: it invites public scrutiny of whether the benchmark itself, irrespective of the results, is fair. In the end, users will choose the benchmark that best fits their use case, or, more often, create a variation of their own and run their own unbiased evaluations.