I'm not sure I believe anyone in the field has ever said "AI token". I also don't buy that the term "training data" implies the existence of labeled input/output pairs. Unlabeled data is still training data.
Seems useful at first glance, but after doing some reading it looks like much of the list consists of proprietary technologies, platforms, and companies rather than helpful definitions.
(How'd I notice this? I have a little HN reader app I maintain at https://www.thnr.net/ , and I got some error messages in my logs when my word-count function, which estimates how long it would take a person to read the article, was processing this web page's HTML. Part of that function compares the text encoding the web server self-reports (in the HTTP headers) with the encoding the web page itself declares. The HTTP headers correctly said "UTF-8", fwiw.)
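For the curious, here's a minimal sketch of that kind of encoding check, assuming requests and BeautifulSoup; the function name is illustrative and this is not the actual thnr.net code:

```python
import requests
from bs4 import BeautifulSoup

def get_declared_encodings(url: str) -> tuple[str | None, str | None]:
    """Return (header_charset, page_charset) as each side self-reports it."""
    resp = requests.get(url, timeout=10)

    # Charset per the HTTP Content-Type header, e.g. "text/html; charset=UTF-8".
    # requests parses this from the header (and falls back to a default if absent).
    header_charset = resp.encoding

    # Charset the page itself declares, via <meta charset="..."> or the older
    # <meta http-equiv="Content-Type" content="text/html; charset=...">.
    soup = BeautifulSoup(resp.content, "html.parser")
    page_charset = None
    meta = soup.find("meta", charset=True)
    if meta:
        page_charset = meta["charset"]
    else:
        meta = soup.find(
            "meta",
            attrs={"http-equiv": lambda v: v and v.lower() == "content-type"},
        )
        if meta and "charset=" in meta.get("content", "").lower():
            # Naive parse; real code would also strip quotes and whitespace.
            page_charset = meta["content"].lower().split("charset=")[-1].strip()

    return header_charset, page_charset

header_charset, page_charset = get_declared_encodings("https://example.com/")
if (header_charset or "").lower() != (page_charset or "").lower():
    print(f"mismatch: header says {header_charset!r}, page says {page_charset!r}")
```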