
Failing 15% of the time is the best way to learn, say scientists - laurex
https://www.independent.co.uk/news/science/failing-study-success-machine-learning-a9186051.html
======
yorwba
The paper is interesting because they calculate the theoretically optimal
difficulty for a specific class of learning algorithms:
[https://dx.doi.org/10.1038/s41467-019-12552-4](https://dx.doi.org/10.1038/s41467-019-12552-4)
(I think the method might be applicable for scheduling flashcards better than
the rule-of-thumb spacing of Anki et al.)

The Independent article is unfounded speculation about this applying to the
way humans learn, without any discussion of whether the model is actually
applicable. (Most things humans are trying to learn aren't binary
classification tasks.)
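If one did want to steer a flashcard scheduler toward a fixed success rate, a minimal sketch might look like this (a hypothetical adjustment rule of my own, not the paper's method and not Anki's actual algorithm; the 85% target and 0.05 step are assumptions):

```python
# Hypothetical: nudge an Anki-style ease factor so observed recall
# drifts toward a target success rate. Not Anki's real scheduler.
def adjust_ease(ease, successes, reviews, target=0.85, step=0.05):
    rate = successes / reviews
    if rate > target:
        return ease + step        # deck too easy: lengthen intervals
    return max(1.3, ease - step)  # deck too hard: shorten intervals
                                  # (1.3 is Anki's ease floor)

print(adjust_ease(2.5, 9, 10))  # 90% recall is above target, so ease rises
```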

~~~
zadokshi
Yes, Anki’s goal of “let’s quickly hide all the cards you know and only show
you cards you are forgetting or on the edge of forgetting” has never seemed
overly useful. The end result is the following algorithm: “let’s present you
with the toughest things you can’t remember over and over again, and ignore
all the facts you’ve done so well at learning (until you have almost, or
completely, forgotten those facts)”

Throwing up cards you can answer in 1 second is not wasting anyone’s time. It
is more likely to encourage people by reminding them of what they already know.


~~~
echelon
Is there an easy adjustment to the algorithm parameters to make things better?

Wanikani does an amazing job at SRS. I wish Anki followed its model.

~~~
rajlego
I haven’t used either Anki or WaniKani (I use SuperMemo), but from what I’ve
heard WaniKani has no leech management, meaning you end up stuck repeating the
kanji you keep failing, making the experience miserable. Leech management is
really important for any decent SRS.

~~~
grep_name
Could you expand on this? I've used both but am not sure what you mean by
leech management

~~~
echelon
Sometimes you'll have much more difficulty with certain items than others. No
matter what you do, even after repeated attempts to learn, you just can't get
them to stick.

These are "leeches". They burn productive study and review time.

I have a theory that these items have lower adjacency to past experience or
knowledge and it's difficult to form mnemonics or other connections. Or
they're less novel and don't cause our brain to take interest. That's where
all of my leeches lie -- in the realm of things I don't particularly care
about.

A good leech management algorithm will back-burner unproductive items so you
can focus on the rest of the concept population. There are different types of
leeches too -- things you don't get during introduction, or things that you
can commit to short term memory but won't stick for long. A good algorithm
will identify all of them and block them.
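The back-burner idea above can be sketched in a few lines (the thresholds are illustrative, not SuperMemo's or Anki's actual implementation; 8 lapses happens to be Anki's default leech threshold):

```python
LEECH_LAPSES = 8  # Anki's default leech threshold; illustrative here

def triage(cards):
    """Split cards into active study material and suspended leeches."""
    active, suspended = [], []
    for card in cards:
        bucket = suspended if card["lapses"] >= LEECH_LAPSES else active
        bucket.append(card)
    return active, suspended

cards = [{"front": "kanji A", "lapses": 2},
         {"front": "kanji B", "lapses": 11}]
active, leeches = triage(cards)
print(len(active), len(leeches))  # 1 1
```

A real system would also distinguish the two leech types mentioned above (never-acquired vs. quickly-forgotten), e.g. by tracking whether the card ever survived a long interval.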

~~~
grep_name
That's really interesting, I'll have to give supermemo a try. I definitely
remember those items in WaniKani; a lot of the times they were mnemonics based
on pop culture references that I just didn't get, or the mnemonic was just
kind of a stretch, or too many similar concepts had been introduced at once.
When I stopped I had definitely hit a wall where I just didn't feel like I
could keep learning.

------
geewee
Unless this article is leaving out some major points, the whole thing seems
flawed. So their machine-learning models learn best if they fail 15% of the
time. Fair enough - but trying to discern anything meaningful about how often
people should fail based on that seems like quite a stretch.

~~~
Dumblydorr
Anecdotally, I'd say I fail most things on 95% or more of the attempts. That's
why we rewrite, debug, practice, drill, and google; avoiding failure takes a
lot of human effort.

~~~
infogulch
Humans get way more information out of each failure than ML systems. When you
or I fail just a few times we can analyse the failures and often discover huge
classes of wrong behaviors, and never repeat any of them. We're also good at
differentiating what _parts_ of the failure caused it and can even learn what
parts were successful. We might even test dozens of hypotheses at once in a
single attempt, even if we're focusing on just one of them. A computer often
only gets one bit of learning from a failure or a success: this single
behavior in particular either did or did not work.

My hypothesis is that we model the system we're studying and simulate many
'attempts' for every real world attempt. I.e. we grow a low-fidelity, but much
faster, model of the system in our brain that we can use to make medium-low
confidence predictions about the real system many times for each time we test
against the real system.

So when you say you fail 95% of the time, I'm saying each of those failures
actually has 200 mini-successes embedded that you can still use to train your
mental model.

~~~
mike_hock
> and often discover huge classes of wrong behaviors, and never repeat any of
> them

Once burned, twice shy. And often that results in irrational aversion to huge
classes of behaviors just because they appeared in the larger context of the
failure of an endeavor as a whole, which I'd say is not a good way to learn
from failures.

------
omarhaneef
On a binary classification task, it is a priori true that you would likely not
learn if you were right 50% or 100% of the time. This is not a function of any
particular learning algorithm.

If you got 100% right, you already know everything that was being tested.

If you got 50% right, you don't know if you are guessing or if you should be
picking up on any features.

So you would expect that the optimal rate would not be close to either
extreme: 50.1% would be similar to 50% for most intents and purposes, and
similarly 99.99% to 100%.

So you might expect that the optimal learning rate would be close to 75%/25%
in general. This would apply to humans too because it is a statement of the
information you need to solve the problem, not a statement about the algo.

This paper finds it to be 85%/15% for a particular algorithm. Perhaps humans
learn similarly, perhaps not. However, you might expect the optimal success
rate to be somewhere in the 65-85% range for any particular algorithm.

------
beefman
The portion 15% seems to crop up suspiciously in optimization contexts... this
was noted by Gell-Mann in _The Quark and the Jaguar_. It's roughly the portion
of false warnings sounded by certain tropical birds to gain uncontested access
to food. Gell-Mann speculates that it is close to 1/(2pi)...

~~~
jacques_chester
> _The portion 15% seems to crop up suspiciously in optimization contexts_

That's approximately the value of the area under one tail of a normal
distribution, from one standard deviation above the mean to infinity.

I'm not statistically mature enough to say whether it's just coincidence. For
one thing, oodles of natural phenomena in no way follow the normal
distribution.
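For what it's worth, the tail area is easy to check with the standard library, and the 1/(2pi) figure Gell-Mann mentions is close to it but not identical:

```python
import math

# P(Z > z) for a standard normal, via the complementary error function.
def upper_tail(z):
    return 0.5 * math.erfc(z / math.sqrt(2))

print(upper_tail(1.0))    # ≈ 0.1587, the one-sigma tail
print(1 / (2 * math.pi))  # ≈ 0.1592, Gell-Mann's conjectured value
```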

------
solicode
Not too sure about the case in the article and whether it relates exactly, but
it made me think about spaced-repetition systems. I aim for an 80-90% success
rate, as I've found that to be the optimal range (arrived at after doing this
for 10+ years now with varying settings).

I found other articles about this:
[https://eshapard.github.io/anki/target-an-80-90-percent-succ...](https://eshapard.github.io/anki/target-an-80-90-percent-success-rate-in-anki.html)
[https://vladsperspective.wordpress.com/2017/03/14/optimize-y...](https://vladsperspective.wordpress.com/2017/03/14/optimize-your-anki-youre-overtesting-yourself-on-too-few-cards-make-huge-gains/)

------
alexeichemenda
Interestingly, this correlates well with what is happening in the ad-tech
world.

Specifically in performance marketing spend, 15% of the budget is very often
allocated to "new initiatives & new partners", with the thought process that
it will either let us find a previously unidentified improvement, or let us
learn what to avoid in the future on the 85% of spend.

------
orasis
Shouldn’t maximum information gain be found at maximal uncertainty: 50%?

How is this different from information theory?

~~~
jacques_chester
Learning in this case is really about recall -- ensuring that information
_already_ captured is successfully retrieved.

That's a different sense from learning as _discovery_ , or at least, learning
as search. In searching a graph of possible hypotheses, yes, it is a better
rule to look for opportunities to halve the search space.
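The two senses can be separated with a quick calculation: per-question Shannon entropy really does peak at 50%, which is why 50% is right for learning-as-search even if it isn't for recall training:

```python
import math

def binary_entropy(p):
    """Shannon entropy (bits) of a yes/no outcome with success probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(binary_entropy(0.5))   # 1.0 bit: each question halves the search space
print(binary_entropy(0.85))  # ≈ 0.61 bits: less informative per question
```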

------
dmix
Wasn't that "10,000 hrs of practice to master anything" paper discredited?
This one sounds very similar in trying to quantify a very chaotic and
qualitative process. The usefulness of such a stat on any one person is
probably nil.

~~~
dragonwriter
> Wasn't that "10,000 hrs of practice to master anything" paper discredited?

No, AFAIK it literally never existed. That was, as I recall, a popular
misinterpretation of what was itself an unwarranted generalization made by
Malcolm Gladwell based on a paper with much more limited scope and
conclusions.

> This one sounds very similar in trying to quantify a very chaotic and
> qualitative process.

The actual direct conclusion—that this error rate is optimal for a variety of
machine-learning processes—does not seem to have the problem you describe. The
suggestion in the paper that this extends to “biologically plausible” neural
networks that may model animal learning also does not seem problematic in the
way you describe. The news article’s claim that this is a finding of a sweet
spot for human learning is, while it is a possibility suggested by the paper,
simply unwarranted as a conclusion.

It's certainly plausible that a quantifiable sweet spot of this type exists
for some kinds of human learning, and that the optimization of effectiveness
in a curriculum that can be dynamically scaled to individual learners could
effectively be guided by it. But there is no strong reason, without actually
testing in concrete human learning scenarios, to believe that the particular
number here is a guide to that.

------
anonytrary
I have a data point/anecdote! I like to play certain sports. After about a
year of intense focus and determination, you can get good at pretty much any
sport. What I've noticed though is that on a good day, I'm messing up about
15% of the time. If I mess up significantly more than that, I get discouraged
and want to go home and try again the next day. If I'm not messing up enough,
I feel like I'm overfitting a particular technique and should probably be
messing up more to become more well-rounded.

~~~
blankaccount
Any tips for picking up soccer? I.e. do you have a Tim Ferriss-style 80/20
split of what you would work on in the first year?

------
viig99
BERT has a 15% masking rate, which seems correlated. Also, 90% is what works
well when you are trying to do label smoothing using entropy minimisation.
What's going on!

------
dr_dshiv
As though getting 100% on my calculus quizzes indicated that I wasn't
learning. Or, that I would learn more if I didn't study as hard and got 85%
correct.

~~~
detaro
Neither of those two examples are about what the article says.

~~~
dr_dshiv
But they are presenting their Goldilocks theory as something to generalize,
no?

~~~
detaro
Your quiz example is not about choosing the difficulty of the quiz, which is
the variable being considered.

> _As though getting 100% on my calculus quizzes indicated that I wasn't
> learning._

It doesn't say that. It says you would probably be learning faster if the test
would be more difficult.

Same category error with the second.

~~~
dr_dshiv
"you get 100 per cent right all the time and there’s nothing left to learn."

~~~
yorwba
The learning process considered in the paper is more like a pretest before
each class, and the lesson just reveals the correct answers.

If you get 100% on every pretest, you're probably not learning anything new
from being told the answers.

~~~
dr_dshiv
In that case, I'd be surprised that 85% is optimal, if it is testing me on
stuff I hadn't learned yet. 85% seems still too easy for material I haven't
had a lesson on yet.

------
sova
1/0.15 means that the optimal failure point is every 6th or 7th attempt. Agree?

~~~
mkl
No, 1/7 < 0.15 < 1/6. Whether that's an optimal failure level for human
learning is a different question that this research doesn't seem to really
answer.
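The arithmetic is easy to verify:

```python
# 0.15 sits strictly between 1/7 and 1/6, so "every 6th or 7th attempt"
# is only an approximation; 1/0.15 is about 6.67 attempts per failure.
print(1 / 7, 0.15, 1 / 6)  # 0.1428... 0.15 0.1666...
print(1 / 0.15)            # 6.666...
assert 1 / 7 < 0.15 < 1 / 6
```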

------
stjohnswarts
I'm sure there's some way we can tie this to the golden ratio...

------
kd3
It is painful to admit, but you learn best from mistakes and failure.

~~~
atomicity
Just because you learn more from failures doesn't mean that you will learn
more in aggregate over time with more failures. If you fail too much, your
brain will tell you that it's better to quit and spend your attention/time in
another way.

~~~
kd3
Elon Musk recently said it well: something along the lines of always assuming
you are wrong, with the goal of trying to be less wrong all the time. Which is
basically what good science is, and it guarantees long-term improvement and
success. So you need to fail a little to learn and get better. If you always
succeeded, I think you'd never really know _why_ you succeeded, which is
important knowledge and guarantees, to a certain degree, not making those
mistakes again in the future.

------
joker3
In mice.

Well, you get the idea.

~~~
kyshoc
@justsaysinmice[0] would like to have a word...

[0]: [https://twitter.com/justsaysinmice](https://twitter.com/justsaysinmice)

~~~
huherto
or "en ratones" if the language of the article is Spanish.

------
stjohnswarts
Dude I wish it was only 15% of the time :)

------
cagenut
see mom those F's on my report card show I was learning optimally

~~~
Merrill
So B students are learning fastest, and A students are just idling.

OTOH, I graded for a Physics prof where 50/100 on his test was about average.

~~~
kiba
B and A are just proxies for effectiveness in scoring.

However, you do want to be a "B" student when learning, just not in scoring.

~~~
jfk13
This rings so true to me! At school (high school, for you Americans -- I don't
mean university), I was an "A" student without really trying. Unfortunately,
this meant that I didn't learn to try.

Eventually (at a prestigious university) I found I could no longer "coast",
and studying required real work. It wasn't easy to come to grips with that
reality, and I wish I'd learned earlier.

