I’ve been using SRS on and off since 2006, and this post addresses one of the key failure modes I’ve seen in SRS usage. These tools tend to be fantastic for discrete pieces of information, like the names of capital cities, the pronunciation of symbols or the values of physical constants.
Most of my usage of SRS in the early days was for languages. In that domain, SRS is useful for learning an alphabet (or syllabary), for scaffolding enough common words to start understanding graded readers, and for a few other tasks.
They’re terrible as a primary learning strategy, though! Too much of language is highly nuanced and context dependent and the only way to absorb all the unwritten rules, the common collocations and the precise boundaries of word meanings is through extensive input. This means encountering the same words and structures in countless different variations and contexts, not the same few sentences drilled repeatedly (at least past the beginning stages).
By paying attention to the specifics of learning math and tailoring the SRS to match it, it’s possible to make a far more efficient system than anyone could with an Anki deck. I’d love to see similar efforts in other subjects!
For this reason, I’ve seen it recommended in various language learning guides to use Anki in a manner that’s complementary to consumption of media in the target language. The general idea is to do Anki-only for the earliest bootstrapping stages (super basic vocab) and switch to media+Anki as early as possible.
The media gives both context and real experience with the language while Anki serves to commit the words you’ve read/heard (even those not used quite as commonly) to memory. I’ve only just dipped my toe into this method but have seen many reports of success with it from others.
You might find, as you get deeper into the language learning journey, that this approach to practice is most popular among an extremely vocal group of intermediate language learners. But people who've been doing it for a long time tend to gravitate toward the parent poster's opinion.
I've also noticed that assiduously sentence mining your input and then drilling it in Anki is much more popular among people whose media consumption is almost exclusively video content. This resonates with the work of SLA researchers such as Paul Nation, who've found that extensive reading is easily the fastest way to grow vocabulary in a new language. There's even some (scant) evidence that SRS combined with extensive reading is less effective than extensive reading alone for long-term vocabulary growth and retention.
It suggests, to me at least, that heavy reliance on SRS for vocabulary retention past the beginner stages of learning a language may not be a universal optimal path, so much as a backup option for people who don't enjoy reading. Like AlchemistCamp, I do still like it a lot for specific things, and I'm not here to wag fingers at people who enjoy it (above all else, make sure you're enjoying your study routine!), but I do still worry that many Anki users have a serious case of golden hammer syndrome.
For sure, by no means am I married to the method and am very flexible and open to trying various approaches. I do believe however that a little bit of structure can be helpful in the beginner to intermediate levels, where it's less likely that one has formed a habit/urges to watch/listen to foreign media without subtitles or read untranslated print media. It's a real slog when you don't have a decent arsenal of words to work with.
Perhaps I should define terms. In SLA research, "extensive reading" specifically means reading material where you know at least 98% of the words in the text. At that level you can generally follow the text without a dictionary - including figuring out the meaning of new words and expressions from context - so almost by definition it's not a slog.
It's not just a second language acquisition thing, either. At least in the USA, many educators recommend younger readers do the same thing when reading in their native language, in part to save them having to repeatedly refer to a dictionary.
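The 98% extensive-reading threshold mentioned above is easy to check mechanically for a given text and vocabulary list. A naive word-level sketch (a real coverage count would work on lemmas rather than raw tokens, but it shows the idea):

```python
def lexical_coverage(text_words, known_words):
    """Fraction of running words in a text the reader already knows."""
    hits = sum(1 for w in text_words if w.lower() in known_words)
    return hits / len(text_words)

# A text is "extensive reading" material for you when coverage >= 0.98.
```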
Anki is great for building vocabulary. It's not suitable for actually learning the language. I use it constantly, averaging over 200 cards a day for over fifteen years.
If you have an Android device, go install AnkiDroid. It's absolutely terrific, letting you fill in those wasted seconds when you're waiting in line, sitting on the toilet, or waiting for the pot to boil. Two or three cards at a time, dozens of times per day, adds up.
You could treat yourself to three cards after every git commit, for instance. Takes literally less than ten seconds to review three cards.
A couple of problems I found with Anki for languages:
1. The words aren't in context, which leads to "flashcard blindness", where you see a word in context, know that you recognize it, but can't remember what it means.
2. The theory behind spaced repetition is that you learn most efficiently when you try to remember something just when you're about to forget it. But that means that if the algorithm is working properly, every single card is hard. This makes motivation a problem, because you know studying is going to be a grind, not fun.
3. While "being able to remember it after thinking a second or two" might be fine when studying for an exam, or in many other contexts where memorization might be important, that's too slow for languages. What you want for a language is "know it immediately without having to think about it".
4. The "scheduled review" system is too inflexible. Some days you get only a handful of cards to review, some days you get dozens. It's hard to tell when you're starting out how many cards you should be adding each day such that the number of cards match the amount of time / effort you have to study. Furthermore, if you skip a single day, you have twice as much the next day; and if life happens and you end up missing a week or a month, you come back with a giant jumble of cards, half of which you've forgotten, and it's really difficult to dig your way out of it.
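The theory in point 2 is usually modeled with an exponential forgetting curve: recall probability decays over time, and the scheduler picks the moment it drops to some target. A toy sketch of the general idea (not Anki's actual algorithm):

```python
import math

def recall_probability(elapsed_days, stability):
    """Exponential forgetting curve: R(t) = exp(-t / S).

    'stability' is how many days it takes recall to fall to 1/e.
    """
    return math.exp(-elapsed_days / stability)

def next_interval(stability, target_recall=0.9):
    """Days until predicted recall decays to the target level.

    Solving exp(-t / S) = target for t gives t = -S * ln(target).
    """
    return -stability * math.log(target_recall)
```

With `target_recall=0.9`, every review lands when you have a 10% chance of having forgotten, which is exactly why every card feels at least a little hard.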
By a strange coincidence, in 2019 I also started on an alternative to Anki to help myself study Mandarin. It has some similarities to the OP's system, in that there's multi-level knowledge; but it's different in that instead of having a fixed schedule, it has the concept of "difficulty" and "study value" for each word / grammar concept, and the algorithm tries to give you a full "readunit" of "native input" which balances the two. "Spaced repetition" emerges naturally from the model, and if you go away for a week (or 6 months), it knows you've forgotten some things, so it gradually refreshes your memory. And because you're reading actual native text, there's something which pulls you in.
It's in closed beta now; the first public language (MVP) will be in Biblical Greek, but the second one (if it happens, maybe in a year or two) will be in Mandarin; and hopefully there will be other ones after that. There's a sign-up form you can use to be notified for updates.
> Furthermore, if you skip a single day, you have twice as much the next day; and if life happens and you end up missing a week or a month, you come back with a giant jumble of cards, half of which you've forgotten, and it's really difficult to dig your way out of it.
The Anki system is actually really good at dealing with missed days. You end up recalling most of the cards that you skipped review for, and the system gives you extra credit for the increased interval before review, in that subsequent repeats for the same card will be spaced out even further. Cards that you outright fail to recall due to the missed reviews are a problem, but the best way to 'dig your way out' of that hole is to keep reviewing on a regular schedule and not to overwork or cram. The backlog gets cleared rather quickly.
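A simplified sketch of that "extra credit" behaviour, assuming an SM-2-style ease multiplier (the real Anki scheduler is more involved, but the principle is the same: a successful recall after a longer-than-scheduled gap earns a longer next interval):

```python
def next_interval_after_late_review(scheduled_interval, actual_elapsed, ease=2.5):
    """Next interval after a successful review, in days.

    If the card was recalled after a longer-than-scheduled gap, base the
    next interval on the actual elapsed time rather than the scheduled one.
    """
    return max(scheduled_interval, actual_elapsed) * ease
```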
Sure, a single missed day isn't bad; but have you ever skipped a month?
I'd been using Anki for Mandarin flash cards for a couple of years, and decided I wanted to memorize the "outs" (probabilities) of various hands for Texas Hold 'Em Poker. So I made a separate Anki deck and used it for a few months. Then I got busy, and stopped studying the poker deck (while maintaining the Mandarin deck). When I tried to pick it up again, it was just impossible -- I'd forgotten so much, and the card list was so long, that if I said "I forgot this", it would be scheduled for the next day, but because it was behind 100 other cards, I wouldn't actually be shown it for a week or two. The system was just completely broken.
Contrast that to the system of my own that I developed, based on the "study value" (effect of studying now on the difficulty) rather than fixed timeouts. After working on and using my own system for about 4 years, I got a bit tired of it (and I was also in the middle of redesigning the database from the ground up), and so decided to give Duolingo a go for a bit, just to see what it was like. Six months later, I came back to my own system, and it was great -- just slowly eased me back into the vocab I had before. The same is true after missing a day, or a week: it's always welcoming to get back into, rather than terrifying to get behind.
It's not broken, it's just doing its best to cope with your situation. Missing lots of days means that there will be lots of cards where it's just not clear if you're going to recall them or not. Figuring that out becomes the priority, then the system is effectively back to normal - possibly with some missed cards that will have to be learned again. I'm not sure how one could do better than that.
The comment you replied to described an edge case and explained why it's broken in that particular case. You haven't actually responded to the example provided.
> it's just not clear if you're going to recall them or not. Figuring that out becomes the priority
Presumably the priority ought to be (re)starting with a small subset of cards and gradually trickling the others back in. The algorithm needs to account for time spent by the given individual and adapt to changes in that over time.
I haven't used Anki for about a decade so I'm not familiar with the current state of things. At the time a major factor in my dropping it was that I found the algorithm to be more of a hindrance than a help.
It's not at all clear that "restarting with a small subset of cards" is better. It means that you're very likely to fail to recall almost every card outside the small subset, which increases the work you must do to memorize the deck again. Any card that you verify immediately is going to save you a lot of work down the line.
> It's not at all clear that "restarting with a small subset of cards" is better.
Sure it is. It could grow as quickly as allowed for given the time invested by the user. That could mean a return to the full subset within the span of a single day, or it could mean many months. Perhaps even never. It all depends on time invested by the user going forward. Starving regular review for the sake of verification is an example of an algorithm failing when faced with the real world.
At minimum it is clear from what was said that (better) prioritization between conflicting goals is needed. That somewhat matches my own experience with it from years ago. The algorithm was simply not flexible enough to fit my own usage patterns. In other words I was not part of the target audience, which I found frustrating because I very easily could have been.
Keep in mind that any unverified cards have been "starved of regular review" for longer than any of the cards that have been shown at least once already. It makes sense to prioritize them, at least once they've become "due" for review. The fact that some users might find this unfamiliar or even confusing (because it only happens after you've taken a break and then resumed using Anki, so quite rarely) doesn't make it broken.
> Missing lots of days means that there will be lots of cards where it's just not clear if you're going to recall them or not. ...I'm not sure how one could do better than that.
It's not clear to Anki because its model ("due" or "not due") and algorithm ("study the card that's been due the longest") are too simplistic. I know it can be done better because I built a system that does better with the same data by having a better model ("expected difficulty") with a better algorithm ("study the card which will have the highest long-term impact on difficulty").
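I don't know the details of that commenter's system, but a greedy "highest impact" selection over an expected-difficulty model could look roughly like this (the field names and the value formula are hypothetical illustrations, not the actual implementation):

```python
import math

def study_value(card, now):
    """Hypothetical 'study value' of reviewing this card right now.

    Cards closer to being forgotten (low predicted recall) benefit
    more from a review, weighted by how much the card matters.
    """
    elapsed = now - card["last_seen"]
    p_recall = math.exp(-elapsed / card["stability"])
    return (1 - p_recall) * card["importance"]

def pick_next(cards, now):
    """Greedy policy: study the card with the highest study value."""
    return max(cards, key=lambda c: study_value(c, now))
```

Note there are no due dates here: after a six-month gap, every card simply has a high study value, and the system works through them in impact order instead of presenting a "backlog".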
I've seen studies that show that the optimal learning rate occurs when you are 70-75% likely to succeed in a challenge. If it is much easier, you tend to downgrade your effort, which leads to learning to half-ass at worst, and slow learning at best. If it is much harder, the stress of the challenge actually inhibits learning. The exact optimum is thought to be person dependent, as a lot has to do with how you respond to challenges, but for the average person a 70% success rate is the sweet spot.
As I said in a parallel thread, if you're going for "most facts you can recite per hour of time spent studying", I can well believe flashcards with a 30% failure rate are "optimum". But if you're trying to have a conversation, watch a movie, or read a newspaper article, and there's a 30% chance you won't recognize any given word, you're going to have a hard time.
What you really want is an appropriate level of difficulty for an entire thing you're trying to understand. This could be either because you have one completely new word per paragraph, or because you have 5 moderately difficult words, or 10 not-too-hard words. The fact that the rest of the words might already be super easy for you doesn't mean you aren't still reinforcing them.
That's basically what my algorithm is trying to do: hand you something to read (a sentence, paragraph, section, chapter, whatever) that's at the "right level" of effort for you.
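A toy version of that passage selection, assuming each shaky word carries a difficulty score (the function names, the scores, and the target value are all illustrative, not the poster's actual algorithm):

```python
def passage_effort(words, difficulty):
    """Total effort of a passage: sum of per-word difficulty.

    Words absent from the difficulty map are treated as fully known (0.0);
    known words still get reinforced just by being read.
    """
    return sum(difficulty.get(w, 0.0) for w in words)

def pick_passage(passages, difficulty, target=3.0):
    """Pick the passage whose estimated effort is closest to the target:
    e.g. one brand-new word, or several moderately hard ones."""
    return min(passages, key=lambda p: abs(passage_effort(p, difficulty) - target))
```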
> 1. The words aren't in context, which leads to "flashcard blindness", where you see a word in context, know that you recognize it, but can't remember what it means.
Clozes should help with this. If anything, I sometimes find it easier to remember words when they're presented in a sentence, though obviously you have to be cautious of overfitting.
In my experience, it's actually easier to memorize a sentence than to actually learn the principles behind something. (Turns out this is also true for neural networks, and there are loads of techniques for counteracting it.)
So OK, to counteract memorization and lack of contextualization, you have 5-6 sentences with the same word. But now Anki doesn't know that they're related, so the SRS system can't actually space out the learning the way it wants to.
With the system I developed, you're given a full phrase / sentence / paragraph / section / chapter, and it separately tracks the words or grammar elements you've seen, in a way similar to that described by OP. So you're always actually reading native content, of which much of the content will be new even if the words are already known to you.
I have been quite enjoying Clozemaster[0] for Anki-style language learning. Actually, I think I got the recommendation from HN. I'm still on the free version right now, but I find the pre-curated lists, the ChatGPT integration that breaks down the grammatical translation of the cloze, and the convenient links to Wiktionary and native pronunciation examples all extremely helpful.
So let's separate out "effective" from "efficient".
I'm perfectly happy to accept that "every single card is hard" is the most efficient system for memorizing facts: i.e., that if you measure the number of facts you can recite and divide it by the amount of time spent studying, SRS will come out on top.
But is that the most "effective" -- will it actually result in you learning more facts at the end of some time frame?
For that you need to know not only the amount learned per unit of studying, but the amount that you actually study; and the amount you study depends in part on your motivation; and your motivation depends on how hard / engaging the study is.
Suppose that with every card being hard, you study on average 20 cards a day; but that with every card being only moderate, you average 60 cards a day. Even if the "effectiveness" of moderate card study is only one half of difficult card study, you still end up learning more, because you've studied three times as much.
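Plugging in those numbers (the 50% efficiency figure is just the illustration's assumption, not a measured value):

```python
hard_cards_per_day = 20       # every review is a grind, so you do fewer
moderate_cards_per_day = 60   # easier reviews, so you do more of them
moderate_efficiency = 0.5     # assume each moderate review teaches half as much

hard_total = hard_cards_per_day * 1.0                          # 20 units/day
moderate_total = moderate_cards_per_day * moderate_efficiency  # 30 units/day
```

Even with moderate reviews worth half as much each, the higher volume wins.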
The idea that after seeing a given card 30 times over the course of a year, you'd somehow end up knowing it less well than if you'd seen it only 10 times, because it wasn't "hard enough" when you did see it, seems really unlikely to me.
More than a decade ago I worked as a research programmer on a similar AI tutoring product for hierarchical skills like mathematics at Carnegie Mellon (in conjunction with the Pittsburgh Science of Learning Center and Carnegie Learning).
The system would prompt students with problems that incorporated dozens of sub-skills, each of which incorporated other sub-skills and so on. If the student gets the problem right, all requisite sub-skills are marked as reasonably strong. If they get it wrong then the statistical model concludes that at least one sub-skill must be weak/faulty/missing, and when selecting subsequent problems the system would incorporate problems with different subsets of the original sub-skills to identify the missing sub-skill(s), with the goal being to spend time reinforcing only those skills that the student has not mastered.
I no longer work in that domain, but I recall it being amazingly efficient -- it often only took a dozen questions to assess a student's understanding of thousands of skills.
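I don't know the exact statistical model that system used, but a minimal Bayesian-style evidence update over sub-skills captures the core idea (the slip/guess parameters and the independence assumption here are illustrative simplifications):

```python
def update_subskills(skills, used, correct, slip=0.1, guess=0.2):
    """Update belief that each sub-skill is mastered, given one answer.

    A correct answer raises our belief in every sub-skill the problem
    used; a wrong answer lowers them all. Follow-up problems using
    different subsets of sub-skills then isolate the weak one(s).
    'slip' = chance of a wrong answer despite mastery;
    'guess' = chance of a right answer without mastery.
    """
    for s in used:
        p = skills[s]
        if correct:
            num = p * (1 - slip)
            den = num + (1 - p) * guess
        else:
            num = p * slip
            den = num + (1 - p) * (1 - guess)
        skills[s] = num / den
    return skills
```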
Seems quite close to the 'active learning' ML topic: let the model choose the question that will bring the best information return.
And this kind of decomposition into simple knowledge pieces (logical axioms?) is IMHO what we have to do to bring LLMs to their senses. These pieces should light up in the intermediate embeddings inside the LLM. It won't really be more intelligent, but it'll model reality better.
There are lots of knowledge types where Anki might not be the ideal choice. For example,
- Itemized lists of information, e.g. "the 5 reasons for the Civil War" type of questions. What do you do if you miss one item? Your card as a whole gets overexposed, while the missed item might get underexposed.
- Related information, e.g., history dates. When using Anki, the student tends to latch on to incidental facts: e.g., Treaty of Paris, ah, I remember it has two repeating digits, yes, 1899. While it would be more useful for the student to think: ah, that was after the Cuban independence struggles in the 1890s.
Any user will recognize that this effect is very strong. You tend to remember cards by the most trivial of things.
- Hierarchical knowledge. Things like chess openings. How do you put that in Anki cards? It's all a kludge. Where do you put the variants, etc.?
- Knowledge networks. Things like medical information (where Anki is hugely popular). But typically, you get cards with massive amounts of information, because you have lots of linked information (symptoms, causes, treatment, pharmaceuticals, etc.).
- Even in language learning. We have the useful fiction that there is a 1-1 relationship between words in two languages, but there almost never is.
- Learned knowledge tends to be linked to Anki itself. It happens often that you can remember things when doing Anki, while being at a loss when you need the information in the real world.
Add to that that Anki does nothing for conceptual understanding. You really need to learn a subject before memorizing it, and in the learning phase Anki is not helpful.
So in short, yes, Anki is the best tool there is to help learning, but I'm sure better tools can be made, especially when targeted to a particular knowledge field.
> - Even in language learning. We have the useful fiction that there is a 1-1 relationship between words in two languages, but there almost never is.
It's not really a problem: spaced repetition is a way to build familiarity with vocabulary. Once that's done, finer uses can be inferred from context. The value lies in breadth, not depth.
It's a very bad way to build familiarity with vocabulary for two reasons—it doesn't teach word boundaries and it doesn't teach collocations.
For example, even simple words like "nose" differ a bit from language to language. Can "nose" refer to the thing that smells on a pig or an elephant? In English, no. In Chinese, yes. In Malay, people and pigs have noses but an elephant's trunk is a different word.
In English if someone asks you how you are, it's reasonable to reply "absolutely fantastic" or "pretty good", while it would be a bit odd to answer "pretty fantastic" and very strange to answer "absolutely good". This isn't because of a grammatical issue that can be memorized. It's just that certain words tend to be used together and the only way to really consistently get it right is to get a lot of input and develop a feel for the language.
Extensive reading is a better way of building and maintaining vocabulary than anything you can do with flash cards and at the same time you'll be gaining understanding about the common stories, beliefs and culture of the speakers of the target language.
Hmm, sort of agree for the later stages of learning, but learning by flashcards is incredibly helpful at the beginning and to get started.
Using the wrong word for a nose on an elephant is better than not having any word ready at all.
Saying "absolutely good" is totally fine for most people if you are a beginner in the English language. (Also, you can put phrases on flashcards.)
And sure, reading extensively is a great way to build vocabulary. But it takes a while (depending on the language quite a long while) before you can begin to read extensively.
A lack of suitable resources is often a problem at the lower levels. FWIW, I experienced this first hand, as the first foreign language I learned to a relatively fluent level (~B2) was Japanese, and that was from 2000-2002.
Graded readers are ideal early on, but if the language you're learning just doesn't have them, then going through multiple textbooks aimed at your level (e.g. an intro textbook from publisher A and another intro textbook from publisher B) is a good strategy. Materials aimed at 5 or 6 year-old native learners are also often a decent path. It's not exciting, but there's a lot of repetition, there are a lot of pictures, and if it's a character-based language, there will be a phonetic guide like hiragana/zhuyin/pinyin to help you.
Depends on learner. I’ve seen people really love graded readers from an early stage. I personally can’t stand them until the 1000-2000 word or more range. Note that I am a big fan of extensive reading, but the lower level stuff just makes my eyes bleed, and in some cases is just bad (e.g., graded English readers that use lower frequency meanings of common words to meet a word limit).
> going through multiple textbooks aimed at your level (e.g. an intro textbook from publisher A and another intro textbook from publisher B) is a good strategy
Very underrated, imho. I wish more folks would do this.
> Materials aimed at 5 or 6 year-old native learners are also often a decent path
Again, this makes my eyes bleed. There are often quite a few words aimed at children that maybe aren't the best for beginning learners. I can't remember what they were with Japanese, but I'm pretty sure a lot of them were onomatopoeic words that every Japanese kid knows but non-native speakers do not… and probably should not learn that early in the process.
IIRC, third grade is when the language becomes more "standard" and less kid-talk; middle school stuff is godly for beginning mid-range fluency, and high school content fleshes out mid-range fluency (as one might expect).
Books and online content aimed at teenaged readers can be surprisingly accessible, with the main challenge being to find material with substance (imho).
> Extensive reading is a better way of building and maintaining vocabulary than anything you can do with flash cards
Eh, I think that this is only true at certain ranges of fluency development.
Note that I am a huge fan of extensive reading, and I don’t think folks use it enough, but…
1. It’s usually prudent to brute force the first few hundred words, maybe up to 1000, depending on one’s access to quality graded readers. The theory says that you want 95-98% lexical coverage for extensive reading to reach its highest potential.
2. To maintain general fluency, extensive reading just can’t be beat.
3. That said, for domain specific vocabulary and/or low frequency vocabulary, cards are almost necessary since the space in between exposures can be incredibly wide. For reference, as a native speaker of English, I still add new words that I run across to a vocab memorization list — recent additions are petard and malapert. Frankly, I'm not sure I will ever run across these words again in a text, but I want to know them and (in the case of malapert) use them. For specialists in a field, things like "nuclear non-proliferation treaty" or "bilateral negotiations" might be worthy of flashcard study for folks in politics/political science.
> "I still add new words that I run across to a vocab memorization list — recent additions are petard and malapert."
I'm familiar with those only due to Shakespeare. I don't think I've ever heard anyone say "malapert", but have definitely heard friends say "<person> was hoisted with <his/her/their/my> own petard" occasionally while playing board games as a teenager. Those friends must have picked it up from Hamlet. None of us used SRS for vocabulary building back then. We just read a lot.
ad 1: have you tried using Cloze cards for that? I like to group e.g. 1-2 reasons, then I will see the list with 1 or 2 items missing.
ad 3: I had success for similar problems by simply creating a lot of cards that give enough context and just ask for the next step. In chess openings, couldn't you just display the current position and ask something like "In opening X, variant Y, what are the next moves for white here"? In some cases I've written scripts to create cards for all variations of a question I want to ask
ad 4: I think Cloze deletions can help to some extent here (I've basically made Cloze my new default card type), but you are probably running into the limitations of Anki there
ad 5: language learning, specifically vocab lists, has always baffled me as a use case for SRS. there is so much context that you need in order to use words proficiently (in what kind of medium was the word used? what register was it (formal/scientific/informal/...)? was it used ironically, emphatically, ...?). the only way to learn a language imho is to immerse yourself as much as possible, through channels where it actually gets used, not such artificial environments
the one thing that I'd like to see changed about Anki would be to have more options for changing the scheduler, or making it easier to use custom schedulers on multiple device types. I simply don't like the logic of SM-2/FSRS of hiding a card from you until a specified date and assuming that you'll be reviewing it then. if you don't open the app for a while, the review dates get completely messed up (I've had new cards scheduled for review sometime in the 2040s). I love the interface, and that you can use HTML to enter cards, but I just want to put knowledge in there and get exposed to it from time to time.

I wish there was a scheduler that just randomly shows you cards, with probability roughly proportional to how urgently you need to see them. and don't read too much into the fact that I haven't opened the app for half a year but still remember some of the cards well: I don't mind seeing those cards "too often", but I do mind if Anki hides the knowledge that I put in there from me, for years or decades even
It's not a limitation of FSRS. In fact, FSRS knows the probability of recalling any card in your collection, which maps directly to how urgently you need to see it. So this could be implemented via an add-on.
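The core loop of such an add-on could be as simple as weighted random sampling on predicted forgetting. A sketch using a toy exponential forgetting curve in place of FSRS's actual retrievability model:

```python
import math
import random

def urgency(card, now):
    """Urgency = predicted probability the card has been forgotten,
    from a toy forgetting curve R(t) = exp(-t / stability)."""
    elapsed = now - card["last_seen"]
    return 1 - math.exp(-elapsed / card["stability"])

def sample_card(cards, now, rng=random):
    """Show cards at random, weighted by how urgently each needs review.

    No card is ever hidden until a due date; stale cards just come up
    far more often than fresh ones.
    """
    weights = [urgency(c, now) for c in cards]
    return rng.choices(cards, weights=weights, k=1)[0]
```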
1. The best card-creation practice is to divide these itemized lists into separate cards. So ideally, you shouldn't have 5 items on one card anyway.
2. Dates are pretty important to know, so I'm not sure what the issue is here.
3. With language learning, there is no reason why you need to make cards word-word. Personally I use image-word much more often, which correlates more to how we learn our first language.
> It happens often that you can remember things when doing Anki, while being at a loss when needing the information in the real world.
I haven't had that experience at all. But if you have, there is a simple fix: create cards that mimic the situation you'd use the phrase in. For example, have the phrase "a cappuccino, please" matched with a photo of a barista in a coffee shop.
> Anki does nothing for conceptual understanding. You really need to learn a subject before memorizing it, but in the learning phase Anki is not helpful.
I also don't agree with this. Learning the "foundational" aspects of something is critical for understanding it conceptually. For example, knowing the boring geographical details of European borders circa 1900 is key to understanding the subsequent 45 years of geopolitical conflicts.
In general, I think most of the critiques of Anki/spaced repetition/flashcards are mostly just critiques of the "typical" way people make cards and use the apps. If you get a little more creative with creating cards, most of these issues go away.
That all said, I do agree that Anki is definitely not perfect and could be improved – but more in the sense of making better-designed cards and better practices the default, not something you have to implement manually yourself.
> In general, I think most of the critiques of Anki/spaced repetition/flashcards are mostly just critiques of the "typical" way people make cards and use the apps.
Somewhere, there's gotta be an expert/course/training on how to best use SR, that is to say, how to best ingest knowledge into the system and prepare the repetition items themselves. Kind of like a best-practices approach ...
Who knows, right, maybe AI-assisted SRS can become so effective ...
... that the "singularity" ends up being us :)
(we become the superhuman intelligence that we were seeking to create ...
... by "partnering" with increasingly intelligent systems).-
In a sense it'd make much sense: We are as good a starting point as any, if not better.-
> We have the useful fiction that there is a 1-1 relationship between words in two languages, but there almost never is.
But you aren't forced to drill single-word L1->L2/L2->L1 cards. You can drill sentences; I've never found single vocabulary-word cards or translation cards useful (at least not until more advanced stages, B2/C1, rather than early on). What I found useful were full L2 sentences, often embedded in paragraphs, with words missing ("clozes"). There's also an L1 version of the complete sentence in small print on the front of the card, to use as a prompt to figure out the L2 word being asked for, but it may be phrased completely differently.
They allow you to just learn languages as they are. You can also drill synonyms and antonyms: a single L2 word on the front, multiple L2 synonyms and antonyms on the back (and maybe a terse definition, also in L2). When the card comes up, name as many as you can out loud, and say the definition if you can remember it. If you got a lot of them, and/or could repeat the definition, you pass the card.
Also, an author (David Parlett) who studied the process of learning languages from written grammars and radio broadcasts/ethnographic recordings (i.e. without an instructor or specialized recordings, usually for very small languages) advised a long time ago that one tackle the hard part(s) first: coming from one language to another, there are features that have no parallel in the languages one already speaks but are very important to be able to use. For example, if you're going from English to a Romance language: verbs and their conjugations. Anki can just let you learn those by rote and figure out how to use them later; if you separate each conjugation onto its own card, it will eventually be effortless. Then all you have left is vocabulary and set phrases, and all of the vocabulary and set phrases you read in native material will sandwich yet another verb you can conjugate, reinforcing that conjugation.
What I'm saying is that you can't dismiss Anki because you think that translating word by word between languages is a dead end for learning languages. There are any number of ways to use Anki; ingenuity doesn't stop at the existence of a spaced repetition effect, you can subject that to a little further engineering. People are doing all of the above, I certainly am.
> You really need to learn a subject before memorizing it, but in the learning phase Anki is not helpful.
I disagree with this for similar reasons. A lot of the learning phase is memorizing a bunch of vocabulary and units and remembering basic checklists. Having that done before you show up to do the learning will accelerate that learning significantly.
I don't think spaced repetition and Anki are limited by not being all-encompassing. It's a tool for remembering largely atomic things, and we have to figure out how to apply it.
But overall I probably have to agree with you on some level, because I think that Anki itself promotes a particular style of usage that may not always be ideal. Anki encourages a usage that reflects its simple model of spaced repetition, which is largely borrowed from Supermemo, and they make it difficult or impossible to change that behavior to the point of actively discouraging users from experimenting. I find it annoying that Anki is very opinionated, and I think that its decisions about how it should work were partially shaped by the fact that it started very amateurishly put together, and adjusting one's opinions to match the interface is easier done than the opposite. There's not a lot of good, exacting science around spaced repetition, and all the papers people cite are old, of very small size, and not very systematic or adventurous. It's too early to be opinionated.
Say you have concepts/items/cards A, B and C, with
A -> B -> C (C encompasses B, B encompasses A, keeping the notation from the article).
As I understand it, the article advocates for showing C first, then you can assume that you also know B and A to at least some part, and save yourself the repetitions for these.
Intuitively, I would have guessed the opposite approach to be the best: Show A first, suspend B until A is learned (by some measure), then show B, etc.
That means no repetitions to skip, but also fewer failures (and thus additional repetitions) of the kind that occurs as follows: you are shown C, but don't know B anymore, and thus cannot answer and have to repeat C.
If you are shown C before B, you kinda make C less atomic (you might have to actively recall both B and C to answer it); showing B before C makes C more atomic, as you will have B more mentally present/internalized and can focus on what C adds to B.
1. First want to clarify that the learner is first introduced to the topics through mastery learning (i.e., not given a topic until they've seen and mastered the prereqs). So, they would explicitly learn A before learning B, and explicitly learn B before learning C. It's only in the review phase when we do all this stuff with "knocking out" repetitions implicitly.
2. When you say "then you can assume that you also know B and A to at least some part," I want to emphasize that if C encompasses B and B encompasses A in the sense of a full encompassing that would account for a full repetition, then doing C fully exercises B and A as component skills. Not just exercises them "to some part." For instance, topic C might be solving equations of the form "ax+b=cx+d," topic B might be solving equations "ax+b=c," and topic A might be solving equations "ax=b."
3. This scenario should never happen: "you are shown C, but don't know B anymore, and thus cannot answer and have to repeat C." There are both theoretical and practical safeguards.
3a-- Theoretical: if you are at risk of forgetting B in the near future, then you'll have a repetition due on B right now, which means you're going to review it right now (by "knocking it out" with some more advanced topic if possible, but if that's not possible, we're going to give you an explicit review of B itself). In general, if a repetition is due, we're not going to wait for an "implicit knock-out" opportunity to open up and let you forget it while we wait. We'll just say "okay, guess we can't knock this one out implicitly, so we'll give it to you explicitly."
3b-- Practical: suppose that for whatever reason, the review timing is a little miscalibrated and a student ends up having forgotten more of B than we'd like when they're shown C. Even then, they haven't forgotten B completely, and they can refresh on B pretty easily. Often, that refresher is within C itself: for instance, if you're learning to solve equations of the form "ax+b=cx+d," then the explanation is going to include a thorough reminder of how to solve "ax+b=c." And even in other cases where that reminder might not be as thorough, if you're too fuzzy on B to follow the explanation in C, then you can just refer back to the content where you learned B and freshen up: "Huh, that thing in C is familiar but it involves B and I forgot how you do some part of B... okay, look back at B's lesson... ah yeah, that's right, that's how you do it. Okay, back to C." And then the act of solving problems in C solidifies your refreshed memory on B.
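The "knocking out" of repetitions described above can be sketched as credit trickling down an encompassing graph. This is only my sketch of the idea using the equation-solving topics from the example, not Math Academy's actual implementation; the fractions and function names are hypothetical:

```python
from collections import defaultdict

# Hypothetical encompassing graph: an edge topic -> {subtopic: fraction}
# means one repetition of `topic` implicitly counts as `fraction` of a
# repetition of `subtopic` (1.0 = full credit). Assumed to be acyclic.
ENCOMPASSES = {
    "ax+b=cx+d": {"ax+b=c": 1.0},
    "ax+b=c": {"ax=b": 1.0},
}

def implicit_credit(topic, fraction=1.0, credit=None):
    # Trickle repetition credit down through all encompassed topics,
    # multiplying fractions along each path.
    if credit is None:
        credit = defaultdict(float)
    for child, f in ENCOMPASSES.get(topic, {}).items():
        credit[child] += fraction * f
        implicit_credit(child, fraction * f, credit)
    return credit
```

With full (1.0) encompassings, a single review of "ax+b=cx+d" knocks out a full repetition of both "ax+b=c" and "ax=b", which is exactly the scenario in point 2 above.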
Anyway, I think I've clarified all your questions? But please do let me know if you have any follow-up questions or I've misinterpreted anything about what you're asking. Happy to discuss further.
I guess math is uniquely suited for this kind of strategy, but would you say it translates to learning concepts in other domains too?
I was thinking about whether something like "what is X?" -> "what field is X used in?", which seems to form a hierarchy to me, would benefit from this technique. Personally, I found that for something like the preceding example, I could answer the second question without thinking about what X is at all, just by rote memorization of the wording. This happened to me quite a lot when I was using Anki. And actually, I guess this is even acceptable in some way, since the question is not about activating "what X is" but "what X is used in". What I'm trying to express: I feel like I would not necessarily activate a parent concept by answering a child concept, and I think that might be true for a lot of questions outside math problems, even though they form a hierarchy. So I'm wondering what you think about the general applicability of this technique...
Please don't take all of this questioning wrong, I think you are doing pretty cool stuff, and I am grateful for everyone trying to push the boundaries of current SRS approaches :-)!
Yeah, you're right that the power of this strategy comes from leveraging the hierarchical / highly-encompassed nature of the structure of mathematical knowledge. If you have a knowledge domain that lacks a serious density of encompassings, there's just a hard limit to how much review you can "knock out" implicitly.
> I feel like I would not necessarily activate a parent concept by answering a child concept, and I think that might be true for a lot of questions outside math problems, although they form a hierarchy.
This is where it's really important to distinguish between "prerequisite" vs "encompassing." Admittedly I probably should have explained this better in the article, but you are right, prerequisites are not necessarily activated. If you do FIRe on a prerequisite graph, pretending prerequisites are the same as encompassings, then you're going to get a lot of incorrect repetition credit trickling down.
We actually faced that issue early on, and the solution was that I just had to go through and manually construct an "encompassing graph" by encoding my domain-expert knowledge, which was a ton of work, just like manually constructing the prerequisite graph. You can kind of think of the prerequisite graph as a "forwards" graph, showing what you're ready to learn next, and the encompassing graph as a "backwards" graph, showing you how your work on later topics should trickle back to award credit to earlier topics.
Manually constructing the encompassing graph was a real pain in the butt and I spent lots of time just looking at topics asking myself "if a student solves problems in the 'post'-requisite topic, does that mean we can be reasonably sure they truly know the prerequisite topic? Like, sure, it makes sense that a student needs to learn the prerequisite beforehand in order for the learning experience to be smooth, but is the prerequisite really a component skill here that we're sure the student is practicing?" Turns out there are many cases where the answer is "no" -- but there are also many cases where the answer is "yes," and there are enough of those cases to make a huge impact on learning efficiency if you leverage them.
I still have to make updates to the encompassing graph every time we roll out a new topic, or tweak an existing topic. Having domain expertise about the knowledge represented in the graph is absolutely vital to pull this off. (In general, our curriculum director manages the prerequisite graph, and I manage the encompassing graph.)
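The "forwards" vs "backwards" distinction above can be made concrete with two adjacency maps. This is a toy sketch, not the real system; the topic "D" and the function name are hypothetical, chosen to show a prerequisite edge that is not an encompassing:

```python
# "Forwards" prerequisite graph: what must be learned before each topic.
PREREQS = {
    "C": {"B"},
    "B": {"A"},
    "D": {"B"},  # hypothetical: D requires B but doesn't exercise it
}

# "Backwards" encompassing graph: what each topic fully exercises as a
# component skill. Hand-curated by a domain expert; a subset of PREREQS.
ENCOMPASSINGS = {
    "C": {"B"},
    "B": {"A"},
}

def edges_needing_explicit_review(prereqs, encompassings):
    # Prerequisite edges that carry no implicit repetition credit:
    # work on the later topic tells us nothing about retention of the
    # earlier one, so the earlier topic must still be reviewed explicitly.
    return {(topic, pre)
            for topic, pres in prereqs.items()
            for pre in pres
            if pre not in encompassings.get(topic, set())}
```

Here practicing D would not count as a repetition of B, which is precisely the "the answer is no" case described above.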
Happy to answer any more questions if you've got any! :)
I've been wanting to request an Anki feature for a while that is in this vein (and would enable this style): allowing cards within a deck to be tagged with a trigger, which would activate when all (or some portion of) cards within a deck also tagged with that trigger reach "maturity." When activated, it can add a tagged batch of cards from the deck as new cards for review, or more generally it could change their state (from/to Young, Mature, Suspended, etc.).
This would allow cards to be automatically introduced in groups, rather than the "just give me 10 new cards every day" thing. It could also allow a default suspended deck style, where all but a few cards begin suspended, and they are unsuspended in an orderly way based on one's progress with the active cards. That could be a pattern that works for drilling hierarchies of knowledge.
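The trigger mechanism proposed above could be sketched roughly as follows. To be clear, nothing like this exists in Anki today; the states, data shapes, and function name are all hypothetical:

```python
def fire_triggers(cards, triggers, threshold=1.0):
    # cards: {card_id: state}, state in {"new", "young", "mature", "suspended"}.
    # triggers: list of (gate_ids, target_ids) pairs. When at least
    # `threshold` (a fraction) of a trigger's gate cards are mature,
    # its suspended targets are unsuspended (moved to "new" so they
    # enter the review queue as a batch).
    for gate_ids, target_ids in triggers:
        mature = sum(cards[c] == "mature" for c in gate_ids)
        if mature >= threshold * len(gate_ids):
            for t in target_ids:
                if cards[t] == "suspended":
                    cards[t] = "new"
    return cards
```

Chaining such triggers (the targets of one trigger are the gates of the next) would give exactly the "default suspended" deck style: cards unlock in orderly batches driven by progress rather than a fixed daily quota.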
There's a deck style called the KOFI (conjugation first) system for learning Romance conjugations that would really benefit from that, which now depends on people manipulating the decks manually with the aid of an abused deck description field describing that manipulation.
Anki is pretty hostile to order, because spaced repetition is all about moving individual cards optimally and without regard to other cards, but the order in which information is introduced can be important. At this point you can only choose between introducing cards in a linear order, or completely at random. There's no way to deviate from that without the user having to learn to juggle things manually. A deck creator can't just suggest a schedule programmatically, even if the deck was designed around that schedule.
The other half of this, where cards that duplicate combinations of other cards are automatically strengthened when their related cards are strengthened (or vice versa) is interesting. Seems like a neuron model.
How do you feel about hierarchical tags? No, I don't mean in the way that Anki has a "file tree/directory" kind of structure for tags and decks, but like, a separate "concept" hierarchical graph that organizes the tags.
For example, consider two tags, "Fractions" and "Prime Numbers". One should know Fractions before studying Prime Numbers, and one could represent that using a drag/drop UI like <insert generic mindmap tool>. This "concept hierarchy" would be organized using tags. This way you could still syntactically tag cards with, say, "Prealgebra/Fractions" and "Prealgebra/Prime Numbers". In this way one could have a "syntax" tree and a "semantic" tree/graph. (File trees can't represent graphs, as you may know.)
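The syntax/semantics split above can be sketched as two separate structures: path-style tags on cards, plus a concept graph over the tags themselves. This is a toy sketch of the idea (all names hypothetical, tag names following the example above), assuming the concept graph is acyclic:

```python
# Path-style tags give the "syntax" tree (what Anki already has).
CARD_TAGS = {
    "card-1": ["Prealgebra::Fractions"],
    "card-2": ["Prealgebra::PrimeNumbers"],
}

# A separate concept graph gives the "semantics": which tags should be
# studied before which. Unlike a file tree, this can be a full DAG.
CONCEPT_GRAPH = {
    "Prealgebra::PrimeNumbers": ["Prealgebra::Fractions"],
}

def study_order(tag, graph, order=None):
    # Depth-first: list a tag's prerequisite tags before the tag itself.
    # Assumes the graph is acyclic.
    if order is None:
        order = []
    for pre in graph.get(tag, []):
        study_order(pre, graph, order)
    if tag not in order:
        order.append(tag)
    return order
```

Because the ordering lives on tags rather than individual cards, the per-card organization burden mentioned below stays manageable: you only curate edges between a few dozen tags, not thousands of cards.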
One problem with making cards directly hierarchical with each other is that people have many cards, and organizing them individually can be a pain (and questionable usage of time).
I really hope to see more research like this. I feel like LLMs have insanely high pedagogic value, and that's because I've been using them to teach myself difficult subjects and review my understanding.
The issue with the kind of system that the author is proposing for curious self-driven learning is that the SRS is optimized for a given curriculum.
Many people, including myself, use flashcards to guide (and sort of "create") their own curriculum. Flashcards with SRS are really good for many things, but it's difficult to generalize them for this use case.
I'd really like to see some prototypes of people integrating LLM intelligence in creating and adjusting cards on the fly. It's clearly something LLMs excel at, especially the larger closed-source ones like Claude 3.5 sonnet
> I really hope to see more research like this. I feel like LLMs have insanely high pedagogic value, and that's because I've been using them to teach myself difficult subjects and review my understanding.
For mathy subjects you usually have to verify things yourself so you'd detect any mistakes as you try to work things out or makes sense of the information. You can't learn math, physics or theory for engineering by remembering and believing things. It would be different for "bag of random facts" subjects though.
I don't use an LLM in isolation. I use it alongside reading material, books, papers etc.
Since I'm using it in involved studying sessions, it is totally sensible to cross-check most of what the LLM outputs, since that's part of the learning process anyway!
In my experience learning design, some math, basic android dev with local and private models, they are very reliable if you use them "in the loop" (ie, as part of a process).
During programming, it's extremely easy to verify what the LLM says is correct by running code.
For math, this is a bit more difficult, but so far it's only been surface-level math. I know that LLMs are good at teaching concepts and ideas in math, but not at _doing_ math. I don't fully trust an LLM to teach me more advanced math because I have no way of verifying that what I learn is:
* Correct
* Relevant
* The "right" way of learning it
For design, which I'm currently studying a lot for work and personal projects, it does well. I'm using Claude to help me solidify my understanding by asking it to critique my summaries and quiz me on topics. I know Claude is correct because I'm using it to solidify my understanding and how topics interrelate, but not to learn new topics. I already sort of "intuitively understand" these topics, but am training myself to go a bit deeper.
They work really well, and I really do understand the skepticism, but in practice it's unwarranted.
Hallucination isn’t really a problem when you can verify the ground truth. For example if you’re using it as a companion to a Khan Academy math problem with immediate feedback.
This fear that LLMs are hallucinating constantly, and therefore entirely unreliable, is unfounded.
Sure, that'll happen if you're pushing into the extreme limits of an area of knowledge - but if you're dipping your toe into the shallow waters of well-known areas, it's reasonably fine (and perhaps even more sound than the hotter areas of Wikipedia).
E.g. if you're programming in one of the most well-known languages (say Python), and you're only asking beginner questions (which is where the high pedagogic value lies), it's unlikely that the model will hallucinate deeply enough to mislead you. Yes, you won't know for sure, but simple/mid-level questions are easy to check.
You shouldn't of course head to LLMs for the most arcane scientific/mathematical knowledge and conclusions; if you do that, that's on you, because offering expertise is not what LLMs are designed for, and the disclaimers all warn against it. But you shouldn't throw out the baby with the bathwater.
This is why my initial flashcards on a topic would cover a breadth of individual details, but later I'd start creating concept-based flashcards where I'd ask things like "how does it make sense that...?"
After somewhat memorizing foundational knowledge, understanding concepts and having frameworks of thinking for a topic will further increase critical thinking while reducing the amount of cognitive effort required for critical thinking about unfamiliar situations.
I hope one day these more efficient models that "clean up" the review schedule by linking/identifying related flashcards become publicly available.
Now these are the insights I'm looking for when it comes to "second generation" SRS software suites. When you focus your attention on a specific niche, like mathematics here, you can get some serious improvements in retention just by using the shape of the subject to your advantage like this.
Math pedagogy in the US leans toward treating understanding as more important than practice. This is wrong. As you point out, understanding more often comes from practice. To correct this imbalance, software like your company is developing is useful, as are platforms like ALEKS, iReady, etc.
The challenge is addressing the social aspect of learning -- the benefits students get from working and learning together. Is it still a classroom if all the students are doing different things?
Of course, students come in with different abilities and levels of preparation. Differentiation that works in a class setting is very hard. Software is the easier part. Students who are motivated to learn and can grow with guidance from a computerized tutoring environment -- helping them is also the easier part.
The initial version of the Math Academy system has been primarily geared toward individual learners, which for a variety of technical and business reasons was the only realistic way to get off the ground.
However, we're implementing a variety of features that will give teachers the ability to direct their class's learning progression while still allowing the system to adapt to and meet each student's individual needs. In some cases, it will provide critical remediation while for others it will offer topic extensions and challenge problems. Additionally, the system will offer a variety of differentiated group and class projects that are unlocked as students demonstrate mastery of the requisite skills.
While this article presents a system, it doesn't present any results. Does this modification to SRS help? Which types of students does it help? How large is the effect, if there is one?
I haven't read the larger pdf of which this is a part so perhaps someone who has can provide a pointer to some results.
The main idea is that this approach makes spaced repetition feasible in something like mathematics. Without this approach, spaced repetition wouldn't even be feasible because after a short while, you'd be continually overloaded with too many reviews to really make any progress learning new material.
Moreover, in addition to making spaced repetition feasible, it minimizes the amount of review (subject to the condition that you're getting sufficient review) which allows you to make really fast progress.
We (Math Academy) don't have any official academic studies out at the moment, but if you want some kind of more concrete evidence of learning efficiency, you can read more online about our original in-school program in Pasadena where we have 6th graders start in Prealgebra, and then learn the entirety of high school math (Algebra 1, Geometry, Algebra 2, Precalculus) by 8th grade, and then in 8th grade they learn AP Calculus BC and take the AP exam.
The AP scores started off decent while doing manual teaching, but the year we started using our automated system (of which the SRS described here is a component), the AP Calculus BC exam scores rose, with most students passing the exam and most students who passed receiving the maximum score possible (5 out of 5). Four other students took AP Calculus BC on our system that year, unaffiliated with our Pasadena school program, completely independent of a classroom, and all but one of them scored a perfect 5 on the AP exam (the other one received a 4).
Even some seemingly impossible things started happening like some highly motivated 6th graders (who started midway through Prealgebra) completing all of what is typically high school math (Algebra I, Geometry, Algebra II, Precalculus) and starting AP Calculus BC within a single school year. Funny enough, the first time Jason & Sandy (MA founders) saw a 6th grader receiving AP Calculus BC tasks, Jason's reaction was "WTF is happening with the model, why is this kid getting calculus tasks, he placed into Prealgebra last fall, this doesn't make any sense," but I looked into it only to find that it was legit -- this kid completed all of what is typically high school math (Algebra I, Geometry, Algebra II, Precalculus) within a single school year.
Again, I realize these are not official academic studies, but we're completely overloaded in startup grind mode right now and have so many fish to fry with the product that we just don't have the time for academic pursuits at the moment, let alone much sleep. Happy to answer any follow-up questions that you might have, though.
I'm excited about learning DAG + spaced repetition - fully sold this is an improvement over Anki
> Without this approach, spaced repetition wouldn't even be feasible because after a short while, you'd be continually overloaded with too many reviews to really make any progress learning new material.
I think this may be an overstatement. I've used Anki to learn and retain a lot of math, including linear algebra (e.g., I worked through several chapters of Strang's books and their exercises). While it's not perfect, and I would love to have a DAG, what ends up happening in my experience is that the more basic topics I understand, along with related topics, end up being proven understood well enough that they are backed off from review for a long time. So I might not be asked about a basic topic/problem for a year. This seems to prevent being overloaded with duplicate/repeat cards.
However, if I do find that I am faced with a more advanced card that I have forgotten some of the foundations on, it is frustrating to not be able to easily be challenged smartly up through its ancestors. If I keep up with anki every day (10-20 minutes) this doesn't happen, but if I take 2-3 months off and come back to my deck, I can be faced with this problem and have to sometimes go manually digging for relevant background topics. So that's why I would love to have all this stuff get smarter, and am now reading / following you and math academy's work.
These are fair points; I guess the feasibility of spaced repetition comes down to what you consider a repetition. If you're considering a single quick problem on a topic to be a repetition, and you don't have too many topics, and you're not plowing through them too quickly, then I can see it being feasible as you're saying.
Math Academy is built with a different context in mind:
-- 1. we have tons of different topics (for instance, over 300 in our AP Calculus BC course -- and that's just one course; students who stick with us past their first course and continue taking more courses on our system can easily accumulate thousands of topics on which they need to maintain their knowledge)
-- 2. each repetition consists of multiple questions on a topic (after an initial lesson task consisting of somewhere in the ballpark of 10 questions, future repetitions are review tasks in the range of 3-5 review questions)
-- 3. students are getting through a new topic every 20 minutes or so on average (our AP Calculus BC course is estimated to take about 6000 minutes, and that includes time spent on review / quizzes / etc., and 6000 minutes / 300 topics = 20 minutes/topic)
Just some back-of-the-envelope math in our context: say you learn 3 new topics per day (an hour-long session), and each review consists of about 4 questions. Then, as a rough estimate, pretty quickly you reach a point where you've got
12 review questions based on yesterday's lessons,
+ another 12 review questions on topics you reviewed for the first time last week,
+ another 12 review questions on topics you reviewed for the second time a couple weeks ago,
+ another 12 review questions on topics you reviewed for the third time a month ago,
+ another 12 review questions on topics you reviewed for the fourth time a couple months ago,
...
That's already 60 review questions/day, well past the point where you're spending every hour-long session entirely on review, which means your progress grinds to a halt in terms of learning new material.
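The back-of-the-envelope numbers above can be checked with a quick simulation. The interval lengths here are my own assumption (an expanding schedule whose five repetitions span a few months, matching the timeline in the list above), not Math Academy's actual schedule:

```python
from collections import Counter

INTERVALS = [1, 6, 14, 30, 66]  # assumed expanding review schedule, in days
TOPICS_PER_DAY = 3
QUESTIONS_PER_REVIEW = 4

def daily_review_load(horizon_days):
    # Learn 3 topics every day; each cohort of topics comes back for
    # review after each successive interval. Count how many review
    # questions land on each calendar day.
    due = Counter()
    for learn_day in range(horizon_days):
        day = learn_day
        for interval in INTERVALS:
            day += interval
            if day < horizon_days:
                due[day] += TOPICS_PER_DAY * QUESTIONS_PER_REVIEW
    return due

load = daily_review_load(200)
# Once all five review streams overlap (after the longest cumulative
# offset, 1+6+14+30+66 = 117 days), the load settles at
# 3 topics * 4 questions * 5 streams = 60 review questions per day.
```

Longer intervals don't change the steady state much: with any five-repetition schedule, each day eventually carries one review batch per stream, so the 60-questions/day figure is robust to the exact interval choice.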
So, in our context at least, raw spaced repetition just doesn't work out in a way that's feasible – students will get absolutely crushed by a tsunami of review unless we take measures to drastically cut down the amount of review (i.e., fractional implicit repetition + repetition compression).
I hear your point about needing some refresher if you take 2-3 months off and come back, though. Similar thing happens with Math Academy students if they take time off and come back, but the way we solve that problem is we just recommend they take a new diagnostic to refresh their knowledge profile. Basically, it just peels back their "knowledge frontier" to a point where they can pick up and continue learning smoothly with our standard approach.
(In my experience, that’s what you’d ideally do as a teacher after summer vacation – you know your students have forgotten a lot of material, so you have them take a beginning-of-the-year knowledge evaluation and go from there.)
Cool, thanks for these details - perhaps I'm underestimating how much faster I could be learning with a tool like this - I also think how much time one is spending makes a difference - I would be chipping away a few hours a week max, had I spent more like full time I probably would have run into what you are describing.
Thanks for the detailed response! Fwiw I bet you could get some free academic labor just by offering to let a local PhD student have access to your internal data. (Obviously still not zero effort)
This patent is just for the business owner's vanity. It's clearly not defensible in court. Even if it were, they probably don't have the money required to pursue enforcement of the patent. More so when any reasonable lawyer will tell them their case will get dismissed anyway.
For example, Skinner's Learning Machine, as well as any automatic implementation of Pimsleur's method, both described in the SIXTIES, would fall under this (according to my reading; I'm not a patent lawyer, hamdulillah).
Seems like a reasonable way to solve the packing problem. You could override to set priority levels for people, but the background process makes sure no one person goes too long without a meetup.