All you are saying here is that you got it wrong in the same way for all parts of the test.

That may remove the bias (I'm not sure about this), but it doesn't inspire confidence.

Yes, I agree it is not a rigorous statistical analysis, but I tried to make a fair comparison (based on assumptions/heuristics clearly described in the post). The blog post claims only what is described there.

