Hi HN,
We've just published a set of original, visual, and intuitive explanations to introduce people to the concepts behind large language models.
It's available for free with no sign-up needed, and it includes text articles, video explanations, and code examples/notebooks. We're also available to answer your questions in a dedicated Discord channel.
You can find it here: https://llm.university/
Having written https://jalammar.github.io/illustrated-transformer/, I've been thinking about these topics, and how best to communicate them, for half a decade. But this project is extra special to me because I got to collaborate on it with two people I consider among the best ML educators out there: Luis Serrano of https://www.youtube.com/@SerranoAcademy and Meor Amer, author of "A Visual Introduction to Deep Learning" (https://kdimensions.gumroad.com/l/visualdl).
We're planning to roll out more content (let us know what concepts interest you). But as of now, it has the following structure, with links to some highlighted articles for you to audit:
---
Module 1: What are Large Language Models?
- Text Embeddings (https://docs.cohere.com/docs/text-embeddings)
- Similarity between words and sentences (https://docs.cohere.com/docs/similarity-between-words-and-sentences)
- The attention mechanism
- Transformer models (https://docs.cohere.com/docs/transformer-models; HN discussion: https://news.ycombinator.com/item?id=35576918)
- Semantic search
---
Module 2: Text representation
- Classification models (https://docs.cohere.com/docs/classification-models)
- Classification evaluation metrics (https://docs.cohere.com/docs/evaluation-metrics)
- Classification / Embedding API endpoints
- Semantic search
- Text clustering
- Topic modeling (goes over clustering Ask HN posts https://docs.cohere.com/docs/clustering-hacker-news-posts)
- Multilingual semantic search
- Multilingual sentiment analysis
---
Module 3: Text generation
- Prompt engineering (https://docs.cohere.com/docs/model-prompting)
- Use case ideation
- Chaining prompts
---
A lot of the content originates from common questions we get from users of the LLMs we serve at Cohere, so the focus is more on applying LLMs than on theory or on training them.
Hope you enjoy it, open to all feedback and suggestions!
---
Kinda frustrating that the main link dumps me onto what reads like a university syllabus: nothing original, visual, or intuitive.
If I click through the sections in order, there are 5 "preamble" sections describing logistical and other meta-information about the course. All text.
The first pedagogical image I see is this, which tbh doesn't make any sense to me: https://files.readme.io/329efd5-image.png
"Where would you put the word apple?"
The image alone doesn't work without reading the supporting text very closely. And I already need a pretty sophisticated understanding to get the idea that I can represent words as points in a plane.
Representing the words as icons is fundamentally confusing, too, I think. After all, maybe I say the word "apple" should go in "d" because it has at least two senses: a fruit and a machine.
Oh, sorry, you failed your first quiz!
"You can't fail the quiz, you're not being graded." Then why call it a quiz? Why use classroom metaphors unless you want students to fall back on classroom behaviors?
Of course, you know the #1 student classroom behavior: not reading the syllabus.
But if I have no trouble with that level of abstraction, what's with the cutesy way of describing the problem?
Get rid of all this chocolate-covered broccoli. Just say and show what you mean.
Computers like numbers. Vectors are lists of numbers. Vectors come with concepts like length and distance. We want to transform words into vectors so that words we think of as similar are close together as vectors.
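Something like this would do it (a minimal sketch with made-up 2-D vectors, purely to illustrate the idea; not from the course):

    import math

    # Made-up 2-D "embeddings": similar words get nearby coordinates.
    embeddings = {
        "apple":  [1.0, 0.9],
        "banana": [0.9, 1.0],
        "car":    [-0.8, 0.1],
    }

    def euclidean(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    def cosine_similarity(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norms

    # "apple" lands near "banana" and far from "car":
    print(euclidean(embeddings["apple"], embeddings["banana"]))  # ~0.14, small
    print(euclidean(embeddings["apple"], embeddings["car"]))     # ~1.97, large
    print(cosine_similarity(embeddings["apple"], embeddings["banana"]))  # ~0.99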
There are many ways to translate words into vectors. Here are 5-10 examples of how we might do that. What are some pros/cons? What relationship(s) do they make clear or obscure?
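To make that concrete, here are two such ways sketched out (again a toy example of mine, not the course's; the comments note the trade-offs):

    from collections import Counter

    corpus = ["i like apples", "i like bananas", "cars need fuel"]
    vocab = sorted({w for sent in corpus for w in sent.split()})

    # Way 1: one-hot vectors. Dead simple, but every pair of distinct
    # words is equally far apart, so similarity is completely obscured.
    def one_hot(word):
        return [1.0 if w == word else 0.0 for w in vocab]

    # Way 2: co-occurrence counts. A word is described by the words it
    # appears with, so "apples" and "bananas" (both seen near "like")
    # come out closer to each other than to "fuel".
    def cooccurrence(word):
        counts = Counter()
        for sent in corpus:
            words = sent.split()
            if word in words:
                counts.update(w for w in words if w != word)
        return [float(counts[w]) for w in vocab]

    print(vocab)
    print(one_hot("apples"))       # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(cooccurrence("apples"))  # nonzero only at "i" and "like"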
Get them thinking about what it means to embed things and why we'd want to embed words one way vs. another. That'll pay dividends. Having them remember "where the apple icon goes" isn't something they'll benefit from reflecting on in any future experience.