
Ask HN: Can we adapt AlphaZero's self-play technique for better human learning? - stenecdote
Since I lack the ML background to debunk this suspicion, I figured I'd let HN debunk it for me. Seeing AlphaZero's success at learning Chess, Shogi, and Go, I was immediately struck with the intuition that the fact that AlphaZero could learn so much from "self-play" should provide some insight into improving human teaching and learning strategies. With the caveat that humans lack AlphaZero's ability to separate themselves into two versions, I can imagine a teaching paradigm that emphasizes simulating competitive activities but playing as both sides. Is something like this at all related to what AlphaZero's doing, and are there chess training paradigms that emphasize this type of simulation?
======
jdoliner
Short answer: No, there's nothing new here that can inform better human
learning.

Longer answer: The concept of self-play isn't new in any sense. All chess
players use this technique to some degree; none use only this technique. The
advantage of self-play is that there's no risk of accidentally picking up
someone else's incorrect assumptions, since you're deriving everything from
scratch. Some people take this to extremes: there's a math professor who
doesn't read any math papers so that he derives everything from first
principles without "contaminating his mind." It works quite well for him, but
unfortunately I'm blanking on his name. However, commitment to this technique
removes one of the major advantages that humans have, which is their ability
to communicate knowledge amongst themselves in a compact, abstract way with
language. Humans also have a pretty good way to mitigate the faulty-assumption
risk: skepticism. We can reevaluate our assumptions and, if we deem it
necessary, excise them from our mental model. AlphaZero could in theory do the
same thing, but in reality there's not much point: it has no use for the sum
total of human knowledge on chess, since it's capable of recreating that and
much more in a few hours.

If there is something to be learned from AlphaZero's training, it's that you
should always be skeptical of your assumptions. That's not anything new, but
it's always worth reiterating. It's pretty obviously not feasible to take this
to the extremes of AlphaZero, though; humans need other humans to learn. Even
the math professor who doesn't read papers needed a lot of interaction with
other humans to get to the point where he could derive things from first
principles.

~~~
markussss
>unfortunately I'm blanking on his name

John Nash (supposedly) had this mindset? Is that who you're thinking about?

~~~
jdoliner
That wasn't who I had in mind, but thanks for sharing that example. I think
the guy I'm thinking of is at Cornell and still alive. He also might actually
be in CS instead of Math. I tried googling it but, unfortunately, "math
professor who doesn't read papers" didn't come up with any results.

~~~
sabalaba
Shinichi Mochizuki developed a new theory (IUT) which eventually yielded a
proof of the abc conjecture. I believe he largely developed it in isolation,
and thus when it was published it took years to bring the rest of the
community up to speed. Not sure if he doesn't read others' papers, but maybe
this is who you were thinking of?

1\. [https://en.wikipedia.org/wiki/Shinichi_Mochizuki](https://en.wikipedia.org/wiki/Shinichi_Mochizuki)

2\. [https://en.wikipedia.org/wiki/Inter-universal_Teichm%C3%BCller_theory](https://en.wikipedia.org/wiki/Inter-universal_Teichm%C3%BCller_theory)

3\. [https://en.wikipedia.org/wiki/Abc_conjecture](https://en.wikipedia.org/wiki/Abc_conjecture)

------
conistonwater
Don't humans already do this, in a way? Instead of playing against yourself,
you take somebody stronger and play them. You only need on the order of 100
games of chess against decent opposition, with some verbal explanations, to
reach amateur level. Per game, this is much more efficient than AlphaZero,
which requires millions of games as well as tons of computing power. Surely
the main reason AlphaZero uses that particular technique is that nobody can
figure out something better? You'd really want it to copy learning techniques
from humans (especially learning from many fewer examples), not the other way
around.

~~~
stenecdote
Thanks for being the first to reply. I was worried I'd just get upvotes and no
replies!

I think you have two separate points, one with which I agree and one with
which I disagree.

First, I agree (and other commentators about AlphaZero seem to as well) that
human learning "algorithms" still beat AlphaZero's on per-game ROI.

On the other hand, I disagree that AlphaZero's self-play is no more
interesting than a human playing someone better and learning from them.
AlphaGo, AlphaZero's predecessor, followed a strategy more like what you
described, learning from a large corpus of existing expert Go games.
AlphaZero, on the other hand, requires no training beyond an encoding of the
basic rules of chess that it can understand. From there, it bootstraps its
understanding of chess without input from experts.
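The bootstrapping loop itself is easy to sketch. Below is a toy illustration of the idea, not AlphaZero's actual algorithm (which pairs a neural network with Monte Carlo tree search): a single value table, given only the rules of tic-tac-toe, plays both sides against itself and improves from nothing but game outcomes. All names here are made up for the example.

```python
import random

# Toy self-play learner for tic-tac-toe. NOT AlphaZero's algorithm
# (no neural net, no tree search); just a tabular sketch of the same
# idea: given only the rules, improve by playing both sides yourself.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    return [i for i, sq in enumerate(board) if sq == " "]

def after(board, move, player):
    return board[:move] + player + board[move + 1:]

def self_play_game(values, epsilon=0.1, alpha=0.5):
    """Play one game in which the same value table controls both X and
    O, then update every visited position toward the final outcome."""
    board, player, history = " " * 9, "X", []
    while True:
        legal = moves(board)
        if random.random() < epsilon:           # occasionally explore
            move = random.choice(legal)
        else:                                   # otherwise pick the move
            sign = 1 if player == "X" else -1   # whose resulting position
            move = max(legal, key=lambda m:     # the table likes best
                       sign * values.get(after(board, m, player), 0.0))
        board = after(board, move, player)
        history.append(board)
        w = winner(board)
        if w or not moves(board):
            outcome = {"X": 1.0, "O": -1.0, None: 0.0}[w]
            for pos in history:                 # credit every position
                values[pos] = values.get(pos, 0.0) + \
                    alpha * (outcome - values.get(pos, 0.0))
            return outcome
        player = "O" if player == "X" else "X"

values = {}
for _ in range(5000):
    self_play_game(values)
```

After a few thousand self-play games the table has learned position values purely from the rules and its own results, which is the "no expert input" property being pointed at here.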

This is the piece I find most interesting, see as potentially useful for the
future of human learning, and believe differs from practice with an expert
teacher. And so I wonder, can we design learning environments where the
learner bootstraps their own understanding from a limited input without
continuous feedback from an expert or teacher?

~~~
Cookingboy
> And so I wonder, can we design learning environments where the learner
> bootstraps their own understanding from a limited input without continuous
> feedback from an expert or teacher?

Why would you remove continuous feedback from an expert or teacher? Would that
make human learning "faster" and more "efficient"? That approach works for AI
because, unlike a human, an AI remembers every single data point with 100%
accuracy and can iterate repeatedly without fatigue. It also does not suffer
from issues such as boredom, and it doesn't require motivation either.

By the way, humans already learn from experience by bootstrapping their own
understanding; teachers and experts exist to fast-track the beginning phase so
a kid doesn't have to play ten thousand games just to reach beginner skill
level.

~~~
stenecdote
> By the way, humans already learn from experience by bootstrapping their own
> understanding; teachers and experts exist to fast-track the beginning phase
> so a kid doesn't have to play ten thousand games just to reach beginner
> skill level.

Yeah, I came part of the way to this realization in my reply to your other
message.

------
infinity0
AlphaZero plays games with (1) _perfect information_ and (2) _well-defined
winning conditions_. Neither of these hold for most human-learning scenarios.

I can imagine that a healthy dose of probability theory (and probably more
advanced stuff I don't know about[1]) might improve (1), but (2) is going to
keep computer scientists and philosophers and ethicists arguing for quite a
long time. :)

[1] get the joke, eh? eh? eh?

~~~
paulcole
> AlphaZero plays games with (1) perfect information

I'm not sure why this matters? Everyone plays chess with perfect information.
Both players see the entire board and all possibilities unlike, say, Scrabble
or poker.

~~~
theptip
I think GP meant that in the sense of "AlphaZero can only play games that have
perfect information". It's a restriction of the algorithm, not a statement
about how AlphaZero approaches the games it plays.

This is why AlphaGo leveled up into AlphaZero playing Chess, and didn't learn
to play Starcraft (yet).

~~~
paulcole
ah yeah, i gotcha. my bad

------
ThrustVectoring
Human brain architecture already does "self-play" during REM sleep. So yeah,
but the implementation details are "get more sleep" rather than some sort of
novel technique.

------
stenecdote
It only now occurs to me that the line of thinking I follow here is
subconsciously inspired by section 3 of this Marvin Minsky talk
([https://web.media.mit.edu/~minsky/papers/TuringLecture/Turin...](https://web.media.mit.edu/~minsky/papers/TuringLecture/TuringLecture.html)).
If you're at all interested in the intersection of learning and computer
science, I highly recommend taking a look.

------
Cookingboy
Are we sure AlphaZero has better learning efficiency than a human?

Sure, it reached peak skill after 4 hours of learning, but how many games did
it play during those 4 hours? How many moves did it memorize perfectly and
analyze? Are those numbers even achievable by a human in one's lifetime?

Even with AlphaZero's efficiency, it still evaluates 80,000 positions per
second, which is far more than a human grandmaster evaluates in an entire
game. If we cut AlphaZero's "processing power" to that of a human, could it
still beat a top-level human player, let alone other AIs?

To me it seems like there is still a long way to go to improve in this space.

~~~
stenecdote
I agree that AlphaZero's per-game learning efficiency is much lower than a
human's (as mentioned in my other reply). The part that interested me more was
the fact that it bootstrapped its learning from the basic rules of each game.

Now that I think about it though, one might argue that human learning in a
given discipline starts as isolated with feedback only coming from the outside
world. This is what we typically call research. But the magic of our education
system, when it works, is that we compress the output of this slow process
into a faster one and feed it to learners, allowing them to build
understanding of knowledge which originally took generations to discover.
Riffing off Matt Might's illustrated depiction of a PhD
([http://matt.might.net/articles/phd-school-in-pictures/](http://matt.might.net/articles/phd-school-in-pictures/)),
expanding the circle of knowledge is exponentially slower than getting close
to the edge.

------
forgot-my-pw
I don't think we can learn much from how an engine learns, but we certainly
can learn from its results.

For example, there's this interesting discussion:
[https://www.reddit.com/r/chess/comments/7ibzq4/stockfish_vs_...](https://www.reddit.com/r/chess/comments/7ibzq4/stockfish_vs_alphazero_jerrys_analysis_of/dqy4yzc/)

Because AlphaZero did not learn from human games, it looks at the different
pieces without attaching values like we do. It has no problem sacrificing a
higher-"valued" piece for the sake of its strategy.

------
egypturnash
I would submit that we already have an example of self-play being used as part
of a strategy to learn chess: chess problems.

Something like "Here's a board position. It looks utterly hopeless, but the
problem says 'Black to mate in 7 moves.' How can you get there from here
without relying on White making any beginner's mistakes?" is pretty much
self-play.
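The discipline this describes, assuming the defending side makes no mistakes, is exactly minimax search: to solve the problem you must play both sides at their strongest. Here is a minimal sketch on Nim rather than chess, purely to keep it tiny (take 1 to 3 stones per turn; taking the last stone wins):

```python
from functools import lru_cache

# Solving a "mate in N"-style problem means playing both sides: a move
# only counts as winning if it still wins against the opponent's best
# reply. Nim stands in for chess here to keep the sketch short.
@lru_cache(maxsize=None)
def can_win(stones):
    """True if the player to move can force a win (take 1-3 stones,
    taking the last stone wins)."""
    return any(not can_win(stones - take)   # my move wins only if the
               for take in (1, 2, 3)        # opponent then has no
               if take <= stones)           # winning reply of their own
```

For this Nim variant the solver recovers the classic result that the player to move loses exactly when the pile size is a multiple of 4, i.e., the win holds even against perfect defense, which is the same standard a chess problem holds you to.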

------
ararar
I think there is a possibility of applying machine learning to teaching humans
in the sense of continuous, algorithmic tuning/personalization of lesson
plans/teaching strategies to accelerate human learning ... as a teacher's aide
in other words.

------
canadaduane
My impression of Plato is that he channeled different people/characters in his
writing in order to create adversarial conditions in which he could improve
his rhetoric. Perhaps this is similar to AlphaZero's technique?

------
eutropia
Remember that AlphaZero played 44 million games of chess, whereas your average
professional chess player has played somewhere on the order of 10,000-100,000.
Self-play works, but rather slowly.

~~~
pixl97
How many years did it take the professional to play 100,000 games? How many
minutes did it take AGZ to play 44M? It sounds like self play is rather fast
to me.
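Taking the thread's figures at face value (44 million games in roughly 4 hours of training, versus a human career of perhaps 100,000 games over 30 years), both senses of "fast" can be made concrete. The numbers are rough, so treat this as a back-of-envelope sketch only:

```python
# Back-of-envelope using the numbers quoted in this thread; both are
# approximate, so only the orders of magnitude matter.
az_games, az_hours = 44_000_000, 4
human_games, human_years = 100_000, 30

az_games_per_sec = az_games / (az_hours * 3600)          # ~3,000 games/sec
human_games_per_day = human_games / (human_years * 365)  # ~9 games/day
```

So both comments are right about different things: per unit of wall-clock time, self-play is blindingly fast; per game, the human is vastly more sample-efficient.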

