"Model A 13B", "Model B 20B", etc. are pretty vapid labels. Which actual models? There are plenty of terrible high-param-count models from a year or two ago, so the benchmark is meaningless unless it says which models are actually being compared. And "13B" in particular is pretty sketchy: are they comparing against Llama 2 13B? Even an untuned Llama 3.1 8B would destroy that on pretty much any benchmark.

Smells a little grifty to me...