
Finding surprising moves in chess games - mdp
https://github.com/CYHSM/chess-surprise-analysis
======
Matetricks
If I understand your code correctly in analyse_evaluations, you're defining a
"surprising move" as a move that has a large change in valuation when it's
considered at a higher depth. So if a "human" (really Stockfish at depth 5)
evaluates a move as +1 and a "computer" (Stockfish now at depth 11) evaluates
the move as +5, the move is surprising.

This is pretty interesting, but I'm not sure if it fully captures all the
nuances of what a surprising move is. You might be able to classify a move as
tactically surprising if it becomes clear after depth 7 that the ending
position is favorable. However, in my opinion truly surprising moves are ones
that carry plans that I haven't even considered. Hence, this methodology
doesn't capture moves that are positionally surprising as there wouldn't be
such a drastic change in evaluation at different depths. I'm not sure where
you would start to figure that one out though :)

That being said this is really cool work!

~~~
yakult
You can filter for this:

Take a database of games from pro players; take the set of all moves where
Stockfish-5 agrees that the move actually played is the optimal one. Filter
for all the moves where Stockfish-11 has a different opinion that results in a
big gain in position. What you get is a list of moves that would surprise pro
players under time pressure.
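
A sketch of that filter (the field names and the gain threshold are my assumptions; the evaluations would come from actually running Stockfish at the two depths):

```python
def surprise_candidates(moves, min_gain=1.5):
    """moves: one dict per position, holding the move actually played,
    the best move and eval from a shallow search (e.g. Stockfish depth 5),
    and the best move and eval from a deep search (e.g. depth 11)."""
    hits = []
    for m in moves:
        pro_agrees = m["played"] == m["best_shallow"]        # SF-5 likes the pro's move
        deep_disagrees = m["best_deep"] != m["best_shallow"]  # SF-11 prefers another move
        big_gain = m["eval_deep"] - m["eval_shallow"] >= min_gain
        if pro_agrees and deep_disagrees and big_gain:
            hits.append(m["best_deep"])  # the move that would surprise the pro
    return hits
```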

I wouldn't be surprised if professional chess players are all running a
version of this against individual known opponents before a tournament to
probe for weaknesses.

A harder problem would be to cross-reference this final list with the post-
game opinions published by professional commentators and identify major
discrepancies. This would be the "wouldn't have thought of it in a million
years" list.

~~~
Matetricks
The list that's returned still contains mainly tactical surprises where
Stockfish inaccurately evaluated the position at the end of depth 5. I think
what I'm trying to say is there are some moves in a position that aren't
tactically surprising (a piece sacrifice, a crazy attacking move, etc.) but
positionally surprising (a long maneuver to get a piece to a certain square
that I didn't think of). These positionally surprising moves aren't captured
by this methodology because they don't involve large fluctuations in valuation
when the depth changes.

As to your second point, an issue with how computer chess affects the modern
scene is that playing the "best" move in any given position isn't
representative of how humans play. Humans carry out plans and evaluate
positions to the best of their ability, but the heuristics and procedure they
use aren't the same as a computer's. For example, Karjakin didn't prepare for
his match against Carlsen last month by playing a bunch of games against
Stockfish. Rather he probably analyzed Carlsen's past games and opening
choices to come up with a strategy.

I do think you can come up with a way to prepare against individually known
opponents by identifying weaknesses programmatically. You can model a human's
approach to playing chess as a distribution of parameters (material, king
safety, pawn structure, etc.) that take in the current position and return the
best move. You also have Stockfish's evaluation which returns the "best" move.
With this, it's possible that you could build a neural network that learns
to play very similarly to a certain player by using their past games as a
training set and comparing the chosen move to Stockfish's move. The network
could learn to mimic the heuristics that the human individual uses to make
decisions and playing against this new AI would be great practice for
preparing against specific opponents.
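
As a toy illustration of the "distribution of parameters" idea (the feature names and weights below are invented for illustration; a real model would be learned from the player's past games):

```python
# Score each candidate move by a weighted sum of positional features,
# with the weights fit per player. Hand-picked placeholder weights here.

PLAYER_WEIGHTS = {"material": 1.0, "king_safety": 0.6, "pawn_structure": 0.3}

def pick_move(candidates, weights=PLAYER_WEIGHTS):
    """candidates: list of (move, features), where features maps each
    parameter to a score for the position after that move."""
    def score(entry):
        _, feats = entry
        return sum(w * feats.get(name, 0.0) for name, w in weights.items())
    return max(candidates, key=score)[0]
```

Under this model, a player whose fitted weights favor king safety over material would prefer a quiet developing move to a risky pawn grab, and that preference is exactly what the training against their past games would capture.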

~~~
yakult
I'm not sure I follow your point about tactical vs positional surprise. Surely
the ultimate goal of the positional surprise is the same as the tactical
surprise - you get an advantage at the end of an expected series of moves.
Otherwise what's the point of getting into a surprising position that's not
better than the conventional one?

My question is, is there any difference here that can't be solved by, say,
upping the ply-number?

On humanlike chess-AI: have an adversarial network that works to classify
human vs machine players, and optimize for humanness * strength-of-play in the
AI?

~~~
Isofarro
The difference is that the positional sacrifice is less tangible. A space
advantage, a tempo advantage, more mobile pieces, improved
cohesion/coordination of pieces (Kasparov was legendary for taking this last
kind of advantage and turning it into a lethal attack). It's a dynamic
advantage rather than a static/permanent advantage, which also means there's a
risk of that advantage dissipating as the game drags on.

These advantages aren't the kind where you can sit back and let the game play
out confident of winning. It's a deliberate unbalancing of the equilibrium of
the position, and one where this temporary dynamic advantage needs to be used
to create a longer-lasting and static advantage.

~~~
yakult
Would it be fair to say you are trying to optimize for future positions where
you aren't sure you will win, but the positions resemble certain archetypal
positions/ share certain features that are advantageous (i.e. has a high
probability of transforming into conventionally advantageous situations)?

I'm sure the chess AIs are full of this sort of knowledge internally, though,
in the form of computation optimization algorithms. Perhaps the issue is to
translate it to a human-usable format.

~~~
schoen
Indeed, chess engines do have heuristics to include positional advantage in
their evaluation of a board, so they "know" in some way that a doubled pawn is
disadvantageous or that development of pieces or attacking central squares is
beneficial, much as humans know these things.

I've never heard experts discuss this, but I bet it's true that human beings
still succeed in appreciating many of these benefits at a higher level of
abstraction than machines do. An argument for this is that computers needed an
_extremely_ large advantage in explicit search depth to be able to beat human
grandmasters. So the humans had other kinds of advantages going for them and
most likely still do. One of those advantages that seems plausible is more
sophisticated evaluation of why a position is strong or weak, without explicit
game tree searches.

I looked at the Stockfish code very briefly during TCEC and it looks like a
number of the evaluation heuristics that are not based on material (captures)
are manually coded based on human reasoning about chess positions. But if I
understood correctly, they are also running machine learning with huge numbers
of simulated games in order to empirically reweight these heuristics, so if a
particular heuristic turns out to help win games, it can be assessed as more
valid/higher priority.

You could imagine that there are some things that human players know tacitly
or explicitly that Stockfish or other engines still have no representation of
at all, and they might contribute quite a bit to the humans' strength.

------
al_biglan
This seems a perfect candidate to test with some well known test games.

Fischer's 17...Be6, for example, in the Game of the Century.

Or 15...Nxf2 (again Byrne and Fischer):
[http://www.chessgames.com/perl/chessgame?gid=1008419&kpage=1](http://www.chessgames.com/perl/chessgame?gid=1008419&kpage=1)

It might be interesting to see how many blunders this identifies compared with
traditional analysis. (Looking at the "surprising bad move" vs "surprising
good move")

In any case, very interesting work. Others have some good ideas for additional
checks/considerations, and it would be interesting to see how this evolves!

~~~
zzazzdsa
A good game to test this on is
[http://www.chessgames.com/perl/chessgame?gid=1139126](http://www.chessgames.com/perl/chessgame?gid=1139126)
.

Tal played the move 19. Rf6!!, which instantly wins the game. However (at
least when I tested this a few months ago), Stockfish takes a fairly long time
to recognize this; it prefers the more conservative 19. c4 instead.

~~~
Rainymood
I just went through the game and man, 19. Rf6!! took me by surprise! What an
amazing move!

I also would love to see a chess engine that plays ridiculous moves in the
opening to throw people off guard (i.e. greatly increase the variance in
possible evaluations) and then crushes the opponent by playing all the
computer moves, while the human player has to tiptoe through a minefield of
possible blunders because the position is so wacky.

------
schoen
Note that Stockfish, which this project uses, just won the TCEC computer chess
championship.

[https://en.wikipedia.org/wiki/TCEC](https://en.wikipedia.org/wiki/TCEC)

There are other championships out there, but this is an indication that
Stockfish is among the very strongest chess-playing entities in the world.

------
CYHSM
Hi! I'm the author of this little library. Somebody apparently took it from my
reddit thread
([https://www.reddit.com/r/chess/comments/5i54lu/how_to_quantify_surprising_chess_moves/](https://www.reddit.com/r/chess/comments/5i54lu/how_to_quantify_surprising_chess_moves/))
and linked it here. Thank you for doing that; I didn't think this would
interest so many people!

I hope I can answer some of your questions given more time, but I have to say
there are some very cool suggestions here and in the reddit thread which I
have to keep in mind.

------
billforsternz
I've been thinking about adding a game-analysis feature to my Tarrasch Chess
GUI [https://triplehappy.com](https://triplehappy.com). I was toying with the
idea of using Stockfish plus a poor engine, as opposed to Stockfish deep plus
Stockfish shallow. I was actually going to use my own Tarrasch Toy engine
(about 1500 Elo) as a reasonable proxy for a poor human player. My engine has
fixed depth, so when set to very low depth it will routinely just do the
obvious thing and fall into simple fork traps etc., which would make it
possible to auto-generate simple explanatory variations like "Not
$bad_move_played_by_my_engine because of
$refutation_variation_pointed_out_by_stockfish."
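
Filling in that template is then mechanical; a sketch (the names are just the placeholders from the comment above):

```python
def annotate(bad_move, refutation_line):
    """Build the explanatory note: bad_move is the weak engine's choice,
    refutation_line a list of SAN moves from Stockfish refuting it."""
    return "Not {} because of {}.".format(bad_move, " ".join(refutation_line))
```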

Like you I've noticed that often when looking at Engine Analysis the obvious
human move doesn't appear as one of the first choices, which usually means
there is an opportunity to get the machine to show you some tactics.

So essentially it's your idea, modulated by my assumption that it would be
difficult to get Stockfish to play sufficiently badly for the
Stockfish-deep-plus-Stockfish-shallow approach to work.

But anyway, congrats to you for actually getting there first and making the
idea work.

------
dsjoerg
Really cool idea and interesting visualization! I am working on similar /
related stuff if you'd like to collaborate further. dsjoerg at gmail.com. I
participated in
[https://www.kaggle.com/c/finding-elo](https://www.kaggle.com/c/finding-elo)

------
pk2200
You might also add the condition that best-move(deep-search) !=
best-move(shallow-search). IMO this is what really makes a move surprising -
it looks bad, but is good.
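
Combined with the evaluation-jump test, that condition might be sketched as (the cutoffs are illustrative assumptions):

```python
def looks_bad_but_is_good(best_shallow, best_deep, eval_shallow, eval_deep,
                          bad_cutoff=0.0, good_cutoff=1.0):
    """The two searches must disagree on the best move, and the deep
    search's choice must look bad shallow but good deep."""
    return (best_shallow != best_deep
            and eval_shallow <= bad_cutoff
            and eval_deep >= good_cutoff)
```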

Btw, I've never seen Rxd4!! from Kasparov-Topalov described as a "natural"
move. I'm guessing the author meant "unnatural". :)

~~~
azag0
I read <irony>.

------
vinchuco
I wish it had examples! I'd be more interested in a move that complicates the
game. In fact, what is the most complicated position in chess? (Legal or not)

~~~
JoeDaDude
The "Legal or not" part of your question is interesting. I am guessing you
mean a "legal" board is one which can be reached from the starting position by
a sequence of legal moves. How could this be determined, short of doing a
brute force search from the starting position?

This is something I've thought about when looking at chess problems, whether
such problems could actually occur in a real game.

~~~
schoen
I wonder if existing computer chess engines can be modified to do retrograde
analysis, and then find some heuristic for positions that are "obviously"
possible via legal moves, and then see if the retrograde analysis can reach
one of those.

I have a feeling this is often tractable, because we could probably construct
a cooperative scenario involving sacrifice of arbitrary pieces at the
beginning of the game (maybe via a series of knight captures?), and then the
main question is confirming how particular pawns moved past each other, and
finding a way of ensuring that no stalemates occurred while positioning the
kings.

~~~
vinchuco
Sounds computationally intensive. It reminds me of an experiment I read about
some time ago, tangentially:
[http://theinvisiblegorilla.com/blog/2012/02/15/how-experts-recall-chess-positions/](http://theinvisiblegorilla.com/blog/2012/02/15/how-experts-recall-chess-positions/)

~~~
schoen
Thanks for the link! That in turn reminds me of how Fischer advocated
randomizing starting positions in order to eliminate the role of opening
theory.

[https://en.wikipedia.org/wiki/Chess960](https://en.wikipedia.org/wiki/Chess960)

(because apart from the structure part that the link you gave mentions,
experts seem to know a huge amount of explicit opening theory, in terms of
memorized opening lines and particular chess masters' view of their tactical
consequences)

I'm getting more curious about the reachability question.

There are some discussions at

[https://chess.stackexchange.com/questions/4830/how-many-legal-but-unreachable-positions-exist](https://chess.stackexchange.com/questions/4830/how-many-legal-but-unreachable-positions-exist)

[https://mathoverflow.net/questions/138133/what-proportion-of-chess-positions-that-one-can-set-up-on-the-board-using-a-leg](https://mathoverflow.net/questions/138133/what-proportion-of-chess-positions-that-one-can-set-up-on-the-board-using-a-leg)

[https://www.chess.com/forum/view/general/how-many-different-chess-positions-are-there?page=1](https://www.chess.com/forum/view/general/how-many-different-chess-positions-are-there?page=1)

[https://en.wikipedia.org/wiki/Chess#Combinatorics_of_chess_and_chess_puzzles](https://en.wikipedia.org/wiki/Chess#Combinatorics_of_chess_and_chess_puzzles)

Also related:

[https://en.wikipedia.org/wiki/Proof_game](https://en.wikipedia.org/wiki/Proof_game)

That includes a retrograde solver for openings, but that's not exactly what I
was envisioning -- I was thinking of a retrograde solver for _endgames_, which
then needs something kind of akin to an inverse tablebase ("yes, this family
of positions is known to be legal!" instead of "this position is a win for
black").

