Hacker News
No News Is Good News: A Critique of the One Billion Word Benchmark (arxiv.org)
15 points by CarrieLab on Nov 23, 2021 | 1 comment



No Paper Is Good Paper: A Critique of Long Titles

The Arxiv One Billion Paper Benchmark was released in 2011 and is commonly used as a benchmark for writing academic papers. Analysis of this dataset shows that it contains several examples of sarcastic papers, as well as outdated references to current events, such as Support Vector Machines. We suggest that the temporal nature of science makes this benchmark poorly suited to writing academic papers, and we discuss the potential impact and considerations for researchers building language models and evaluation datasets.

Conclusions

Papers written on top of other papers snapshotted in time will display the inherent social bias and structural issues of that time. Therefore, people creating and using benchmarks should realize that such a thing as drift exists, and we suggest they find ways around it. We encourage other paper writers to actively avoid benchmarks whose training samples never change: this is a poor way to measure the perplexity of language models and of science. For better comparison, we suggest that the training samples always change to reflect the current anti-bias Zeitgeist, and that you cite our paper when doing so.




