
The contents of that hardcoded dictionary are really weird, too. It includes a lot of bizarrely specific entries like:

    in popular culture
    Holy Roman Emperor
    It is important to
    examples include the
    have speculated that
    aria-hidden="true">·<
    (new Date).getTime()}
    :url" content="http://
Many of the English phrases (like "in popular culture"!) are highly characteristic of content from the English Wikipedia. I've speculated before that this may be the result of the use of a compression benchmark which included (or consisted entirely of) content from that site, such as https://cs.fit.edu/~mmahoney/compression/textdata.html

If you're curious, here's a full dump of the dictionary:

https://gist.github.com/duskwuff/8a75e1b5e5a06d768336c8c7c37...




The idea seems sound, though? I can't help wondering why Google, which has the resources to fund a project on the scale of GPT-3, couldn't come up with a better initial dictionary. A built-in dictionary is a big win, especially for small responses.
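To make the small-response point concrete, here's a rough sketch. It does not use Brotli or its real dictionary; it assumes zlib's preset-dictionary support (`zdict`) from the Python standard library as a stand-in, and the `PRESET` dictionary and `sample` payload are made up for illustration from the phrases quoted above.

```python
import zlib
from typing import Optional

# Hypothetical preset dictionary: a handful of the phrases quoted above.
# Brotli's real built-in dictionary is far larger and structured differently.
PRESET = b"".join([
    b"in popular culture",
    b"Holy Roman Emperor",
    b"It is important to",
    b"examples include the",
    b"have speculated that",
])


def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Deflate `data`, optionally priming the compressor with `zdict`."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Inflate `blob`; the same `zdict` must be supplied if one was used."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# A made-up short "response" that happens to reuse dictionary phrases.
sample = (
    b"It is important to note that the Holy Roman Emperor "
    b"appears in popular culture."
)

plain = compress(sample)
primed = compress(sample, PRESET)

print("raw bytes:            ", len(sample))
print("deflate, no dictionary:", len(plain))
print("deflate, with preset:  ", len(primed))
```

A payload this short has almost no internal repetition for the compressor to exploit, so nearly all of the savings come from matches against the preset dictionary. The catch is the same as with Brotli: both sides have to agree on the dictionary ahead of time, which is why it ends up hardcoded.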



