The contents of that hardcoded dictionary are really weird, too. It includes a lot of bizarrely specific entries like:
in popular culture
Holy Roman Emperor
It is important to
examples include the
have speculated that
aria-hidden="true">·<
(new Date).getTime()}
:url" content="http://
Many of the English phrases (like "in popular culture"!) are highly characteristic of English Wikipedia content. I've speculated before that this may come from using a compression benchmark that included (or consisted entirely of) text from that site, such as https://cs.fit.edu/~mmahoney/compression/textdata.html
If you're curious, here's a full dump of the dictionary:
https://gist.github.com/duskwuff/8a75e1b5e5a06d768336c8c7c37...
However, the idea seems sound, doesn't it? I can't help wondering why Google, which can fund GP-3, couldn't come up with a better initial dictionary, especially since it's a big win for small responses.
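The "big win for small responses" point can be illustrated with zlib's preset-dictionary support (the `zdict` argument in Python's standard `zlib` module), used here as a stand-in for Brotli's built-in dictionary, which the standard library does not expose. This is a minimal sketch, assuming a toy dictionary made of the entries quoted above and a made-up HTML fragment; `PRESET`, `RESPONSE`, `deflate` and `inflate` are hypothetical names, and none of this is how Brotli itself is implemented. Note that the decoder must be given the same dictionary, which is presumably why a built-in dictionary has to be fixed and shipped with every decoder.

```python
import zlib

# Hypothetical toy dictionary: the entries quoted above, concatenated.
# zlib's zdict stands in for Brotli's built-in dictionary (illustration only).
PRESET = (
    b"in popular culture"
    b"Holy Roman Emperor"
    b"It is important to"
    b"examples include the"
    b"have speculated that"
    b'aria-hidden="true">\xc2\xb7<'
    b"(new Date).getTime()}"
    b':url" content="http://'
)

# A small, made-up HTML fragment: the size range where a dictionary matters most,
# because the compressor has almost no earlier text of its own to refer back to.
RESPONSE = (
    b'<meta property="og:url" content="http://example.com/">'
    b'<span aria-hidden="true">\xc2\xb7</span>'
    b"<p>It is important to note its role in popular culture.</p>"
)


def deflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress with zlib at level 9, optionally priming the window with zdict."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: bytes = b"") -> bytes:
    """Decompress; a primed stream needs the same dictionary on this side."""
    if zdict:
        decomp = zlib.decompressobj(zdict=zdict)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    plain = deflate(RESPONSE)
    primed = deflate(RESPONSE, PRESET)
    assert inflate(primed, PRESET) == RESPONSE
    print(f"raw:                     {len(RESPONSE)} bytes")
    print(f"zlib, no dictionary:     {len(plain)} bytes")
    print(f"zlib, preset dictionary: {len(primed)} bytes")
```

Running it prints the raw size and both compressed sizes. The primed stream should come out smaller because each dictionary phrase that appears in the fragment can be encoded as a single back-reference into the dictionary instead of as literals, although the exact byte counts depend on the zlib build.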