This really depends on where you set the baseline standard.
You may see the worst-performing variants of a given model as defining the baseline, with the others sitting above it.
But it's perfectly legitimate to define the baseline by the best-performing variants of a given model, with some unfortunately falling below it.
Personally, I prefer to set expectations high, and define the baseline based on the best that has been achieved so far. The worst-performing variant from 2014 should still be expected to exceed, or at the very worst be equal to, the best-performing variant from 2013.