Incidentally, it's a mystery to me why so many Gutenberg/Internet Archive books have such ludicrously bad plain-text renderings: utterly unusable and unreadable, often with barely one word per page correct. Neural nets have been scoring very highly on MNIST handwritten-digit recognition (for example) for a long time now, maybe since the 90s. It's a shame.
In any case, the data represents the average output of human visual processing. A simple use case would be comparing it with the output of machine vision, so that researchers can better understand and optimize the systems they are designing.
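One common way to make that comparison concrete (my choice here, not something the data itself prescribes) is the character error rate: treat the human transcription as the reference and divide the edit distance between it and the machine's output by the reference length. A minimal sketch follows; the sample strings are invented for illustration.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions separating ref and hyp (classic two-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # skip a reference character
                          curr[j - 1] + 1,     # skip a hypothesis character
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / length of the human (reference) transcription."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    human = "It was the best of times"   # hypothetical proofread text
    ocr = "It wns tbe best of tirnes"    # hypothetical raw OCR output
    print(f"CER: {character_error_rate(human, ocr):.3f}")
```

Averaging this over many pages would give a rough, comparable figure for how far a given OCR pipeline falls short of the human baseline.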