
When Grandmasters Blunder: Even the best make mistakes - jaz46
https://medium.com/pachyderm-data/when-grandmasters-blunder-a819860b883d
======
mcherm
It seems to me that the article overlooks one glaringly obvious issue: that
the two blunders may not be independent events.

In this case, it seems quite likely that the second player's blunder was made
much more likely by the fact that the first player had just blundered. To be
more specific, white moved the king which appeared (at first glance) to
prevent black from using a check threat to attack white's rook. The blunder
was in not realizing that the check threat could still be used to attack
white's rook, albeit in a more complicated fashion.

Black responded to this with another "blunder" -- failing to attack the rook
and moving elsewhere instead. But this blunder was NOT independent of the
first -- it is quite likely (I believe) that black saw the move and assumed
white had successfully prevented the attack on the rook. He assumed that such
a top-level player would never make such a mistake, and that caused him to not
look closely enough at it. The first blunder helped cause the second.

(Thanks to stolio for linking to the game analysis I used here.)

One could test this hypothesis of mine using the same data set. Instead of
looking just at single errors, look at error pairs (one error occurring in the
move following another error). If the probability of a blunder is
significantly higher on the move immediately after a blunder than it is at any
other time, then my hypothesis (that the events are not independent, but
correlated) is supported.
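In code, the proposed test might look something like this (just a sketch; it assumes each game has already been reduced to a list of per-move booleans marking engine-flagged blunders):

```python
# Hypothetical sketch of the proposed independence test: compare
# P(blunder | previous move was a blunder) against the overall blunder
# rate. Each game is assumed to be a list of booleans, one per move.

def blunder_rates(games):
    """Return (overall_rate, rate_immediately_after_a_blunder)."""
    total_moves = blunders = 0
    follow_ups = follow_up_blunders = 0
    for moves in games:
        for i, is_blunder in enumerate(moves):
            total_moves += 1
            blunders += is_blunder
            if i > 0 and moves[i - 1]:  # previous move was a blunder
                follow_ups += 1
                follow_up_blunders += is_blunder
    overall = blunders / total_moves if total_moves else 0.0
    after = follow_up_blunders / follow_ups if follow_ups else 0.0
    return overall, after
```

If the second rate comes out significantly higher than the first (by, say, a binomial test), the independence assumption fails and the correlation hypothesis is supported.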

~~~
sireat
I was expecting the article to focus on this and prove or disprove the
double-blunder hypothesis.

Most top players would comment (as Anand and Carlsen did after the game) that
double blunders are relatively common in top level play.

Here is one famous case: [http://www.chess.com/article/view/the-amazing-chess-
illusion](http://www.chess.com/article/view/the-amazing-chess-illusion)

In my personal experience this has been common too, when I mix playing against
2500 players and 1900 players in blitz (I am 2350fide), it is relatively easy
to skip over simple hanging pieces for a move or two.

In a regular tournament game it has happened a few times as well (one player
committing a gross blunder and the other not noticing).

The big question is whether it is out of the ordinary, statistically speaking.

------
dfan
Using "number of pawns of evaluation lost" as a proxy for the severity of the
blunder has some fundamental problems. The main one is that the relationship
between evaluation in "pawns" and expected result (expected value of the game
result, from 0 to 1) is not linear. (It couldn't be, since one of them maxes
out at one.) It's actually more of a sigmoid curve.

This means that a player may easily make a horrific "3-pawn blunder" reducing
his evaluation from +8 to +5, but in fact all he's done is reduce his chance
of winning from 99% to 98%. Actually, the +5 move may even be better in
practice, in that it might lead to a sure safe win rather than a tricky
blowout.
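To make the sigmoid point concrete, here's an illustrative mapping from evaluation in pawns to expected score. The logistic form and the slope constant are my assumptions, not anything from the article; the constant would need tuning against real game results.

```python
import math

# Illustrative only: a logistic mapping from engine evaluation (in
# pawns, positive = advantage) to expected score in [0, 1]. The slope
# constant K is an assumption chosen for illustration.
K = 0.7

def expected_score(eval_pawns):
    """Expected game result (0 to 1) for a given evaluation."""
    return 1.0 / (1.0 + math.exp(-K * eval_pawns))
```

With a curve like this, the difference between +8 and +5 is only a couple of percentage points of expected score, while the same three-pawn drop around 0.0 swings the expected result enormously.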

Even if you changed the definition of blunder from "reduces the evaluation by
n pawns" to "reduces the expected result by x", I would have an issue in that
it ignores any of the human aspects of blunders. If someone drops a pawn
outright for no reason (eval change -1), that is a blunder because it was so
trivial to avoid. But if someone, even a grandmaster, makes a move that causes
a large drop in eval due to allowing a sacrifice that no human could calculate
all the ramifications of, because as far as he (and probably his opponent)
could humanly calculate it didn't lose, it is hard to call that a blunder.
(Conversely, failing to see some immensely complicated non-forcing winning
move may be unfortunate but it's not a blunder.) But that's more a cavil with
terminology than a methodological error; the study is still measuring
something interesting, just not quite what I think it is claiming to measure.

~~~
iwwr
Nope, "number of pawns" is only a notional number: it's a score calculated by
a chess engine. Being 1 pawn ahead may just mean a particular position where
one side has an equivalent advantage though not necessarily being a physical
pawn ahead. Another aspect is sometimes you're a physical pawn short, but the
position evaluation may only show -0.3 pawns against you, meaning you've got
positional or counter-play advantages to compensate. Often players will
sacrifice pieces for counter-play and activity.

Chess engines also implement a heuristic called 'contempt' where they may make
a sacrifice in order to avoid a drawn position, when faced with an inferior
opponent.

~~~
rockdoe
Your response has absolutely nothing to do with the point the parent poster
is making; it completely misses it.

He is arguing that "percentage of winning" is not linearly related to "pawn or
equivalent advantage". That has got nothing to do with whether those pawns are
physical ones or positional advantages that have equivalent value.

------
verteu
> Due to cost limitations we had to limit crafty to 2 seconds of analysis time
> per move

A grandmaster with standard time controls could defeat a 2-second limited
Crafty. So how do you know you're finding true blunders, and not simply
positions that the engine evaluates incorrectly?

~~~
jdoliner
This is definitely the biggest limitation of our approach right now and there
are certainly some things that we counted as blunders that aren't true
blunders. We're working on rectifying this by doing another pass with a better
engine and more time to analyze.

That said, we tested this on a smaller set of games by comparing it to results
from better engines and found that only a very small number of moves tricked
crafty. It's still generally quite reliable for the majority of moves.

~~~
logicallee
You could just rewrite your article to call these "obvious blunders" - i.e.
ones that you define as those Crafty identifies in 2 seconds or less. Redefine
what you're doing so your methodology is correct :) Plus it's still
interesting. Probably more interesting than blunders that take longer to
identify!

Once you _have_ found the blunders, you can verify them by analyzing the found
positions more deeply. (Of course you should also report the number of false
positives - ones that appear to be blunders after 2 seconds but turn out not
to be on slightly longer analysis.)
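The two-pass scheme could be sketched like this. `cheap_eval` and `deep_eval` are hypothetical stand-ins for the fast 2-second Crafty pass and a slower, stronger pass; the threshold is an assumed cutoff, not the article's actual definition:

```python
# Sketch of a two-pass blunder verification. cheap_eval() and
# deep_eval() are hypothetical callables returning the evaluation drop
# (in pawns) a move causes, at low and high analysis effort.

BLUNDER_THRESHOLD = 1.0  # assumed cutoff, in pawns of evaluation lost

def verify_blunders(candidate_moves, cheap_eval, deep_eval):
    """Re-check cheaply flagged blunders with deeper analysis.

    Returns (confirmed, false_positives), so the false-positive rate
    can be reported alongside the results."""
    confirmed, false_positives = [], []
    for move in candidate_moves:
        if cheap_eval(move) >= BLUNDER_THRESHOLD:      # pass 1: fast screen
            if deep_eval(move) >= BLUNDER_THRESHOLD:   # pass 2: confirm
                confirmed.append(move)
            else:
                false_positives.append(move)
    return confirmed, false_positives
```

The expensive second pass only runs on the (presumably small) set of candidates that the fast screen flags, which keeps the cost manageable.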

------
jdoliner
Hi guys, author here. I'll be monitoring this thread for the rest of the
evening. Happy to answer any questions.

~~~
cjbprime
Hi! Did you consider modeling the fact that a blunder in the first ten move-
pairs seems far less likely in expert play, given that they're usually playing
from book and there are fewer pieces in motion to consider?

(I'm wondering if blunders after move 15 are in fact far more common than your
model suggests, and they're just being extremely diluted in your stats by
correct opening play almost every game.)

~~~
jdoliner
You're totally right about this. In general the efficacy of engines is
different for different parts of the game. In the beginning, players (and
engines) are mostly playing from books, so there are far fewer mistakes, and
what the engine thinks of as a mistake is often just someone going outside its
predefined book. In the midgame the analysis is very effective. In the endgame
the engine can actually completely solve the game, so it really just judges
moves as winning or losing.

Spoiler: we're planning to address a lot of this in an upcoming followup I'm
working on right now. We're going to take this into account in our analysis,
as well as giving people the raw dataset with info about when the blunders
occurred so that they can learn from it themselves.

------
fsk
One thing I know from playing serious Bridge: An expert player makes far fewer
mistakes than the average player. He does not make zero mistakes.

I read one tournament report, where an expert player revoked.

When an expert plays good or average players, he does not need to be brilliant
to win. He just has to play competently, and wait for his opponents to make
mistakes.

~~~
hudibras
A revoke at a bridge tournament is the closest thing to a crime scene that
I've ever seen. I half-expected the tournament directors to cordon off the
table in yellow plastic tape while they recreated what happened.

~~~
fsk
Some of the rules are awkward and confusing to someone who doesn't understand
them.

For example, you can convey information to your partner, based on how long you
take to bid. Technically, you aren't allowed to have that information.

Me - Left Opponent - Partner - Right Opponent

1NT - x - xx - pass

pass - 2C (after long pause) - x - pass (after long pause)

pass - 2D

Explaining:

I played a weak 12-14 HCP NoTrump opening

opponent on my left doubled, showing a good hand

my partner redoubled, saying he also has a good hand (i.e., we got them now)

Rather than letting us make 1NT redoubled, the opponent on the left ran out to
2 Clubs.

My partner doubled, because he had good clubs (i.e., we got them).

The opponent on the right passed, but he waited a long time before passing,
illegally conveying to his partner that he was not sure if they should stay in
2 Clubs or run.

Taking advantage of that (illegally obtained) information, the opponent on the
left decided to run to 2 Diamonds.

So, someone who does not understand the concept of unauthorized information
would not understand why I would be disadvantaged.

At a local club game, I would let it slide. At a regional tournament, I'd
expect the director to get it right.

------
rozim
See also the papers by IM & PhD Kenneth Regan on "Intrinsic Chess Ratings"
such as
[http://www.cse.buffalo.edu/~regan/papers/pdf/ReHa11c.pdf](http://www.cse.buffalo.edu/~regan/papers/pdf/ReHa11c.pdf).

------
stolio
Here's commentary on the Carlsen/Anand double-blunder:
[http://youtu.be/6K86f27uuP0?t=14m36s](http://youtu.be/6K86f27uuP0?t=14m36s)

It's not easy to see.

------
pk2200
These results surprised me. I expected a much wider gap in correct move %
between a 1500-player and grandmaster. It'd be interesting to see if the slope
of the graph is steeper for minor blunders that reduce the evaluation by less
than a pawn. These are the more subtle positional errors - weakening a square,
not maximizing piece activity, wrecking your pawn structure, etc. Amateur
games are filled with these mistakes, but they are much rarer in GM games, and
I'd expect the difference to be more than just a few percentage points. But
Crafty's not the right engine for this job. You'd want something with a more
sophisticated evaluation function, like Stockfish (several hundred Elo points
stronger than Crafty).

~~~
jdoliner
Give us a few days :p. We'll have exactly the dataset you need to answer these
questions. (And we'll be releasing it publicly.)

~~~
pk2200
Great! After thinking about it some more, I think I understand why the graphs
are flatter than I expected. There are differing degrees of difficulty in
tactical mistakes. When a 1500-player blunders a pawn or piece, it's often
resolved by a trivial one-move sequence. GM blunders are more subtle, often
requiring a lengthy (say 5-10 ply) sequence to resolve. You could prove this
by recording the minimum search depth the engine needs to recognize the
blunder. (This is tricky, because search extensions result in many sub-
variations being analyzed much deeper than the nominal search depth, but I
seem to recall that Crafty has an option for disabling extensions.)
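The minimum-depth measurement could be sketched as a simple iterative-deepening loop. `analyse_at_depth` here is a hypothetical stand-in for a real engine query (e.g. a UCI `go depth N` call); the threshold and depth cap are assumptions:

```python
# Sketch of the minimum-depth measurement: step the search depth upward
# until the engine's reported evaluation drop for the move crosses the
# blunder threshold. analyse_at_depth(position, depth) is a hypothetical
# callable wrapping a real engine search at a fixed nominal depth.

BLUNDER_THRESHOLD = 1.0  # pawns; assumed cutoff
MAX_DEPTH = 30           # assumed cap on the search

def min_depth_to_see_blunder(position, analyse_at_depth):
    """Smallest search depth at which the blunder is detected, or None."""
    for depth in range(1, MAX_DEPTH + 1):
        eval_drop = analyse_at_depth(position, depth)
        if eval_drop >= BLUNDER_THRESHOLD:
            return depth
    return None
```

If the hypothesis is right, amateur blunders would mostly register at depth 1-2 while GM blunders would need a noticeably deeper search (subject to the search-extension caveat above).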

------
grizzles
Grandmasters blunder more often than this. I would venture to say that what
correlates with blunders more so than rating is time. Error rate goes way up
in Blitz and Rapid.

IMO the more interesting thing about chess skill at the top is how much way
way better GMs are than everyone else.

To me, ratings at the top feel more like an exponential scale than a linear
one. For example, I have beaten International Masters at chess lots of times
but have never once beaten a GM.

If I studied or cared (which I don't), I think maybe it would be possible to
squeeze out a lucky win once in a while. Aspiring to be a punching bag isn't a
very appealing notion though, so you can understand my lack of motivation. GMs
are crazy good.

~~~
slm_HN
> To me, ratings at the top feel more like an exponential scale than a linear
> one. For example, I have beaten International Masters at chess lots of times
> but have never once beaten a GM.

If true, this is purely psychological. You are unable to beat a GM because
he's a GM and you think you're unable to beat GMs.

The strength difference between IMs and GMs simply isn't that great. Because
the GM title is based on results and not ratings there are frequently IMs who
are higher rated than GMs.

~~~
grizzles
I think you are citing the exception(s) to the rule. Most GMs are stronger
than IMs imo. I don't think it's psychological. I have played players (GM and
otherwise, including other untitled players like myself) that I know are so
much better than me, because they win and I can't even comprehend how they
arrived at the moves that they made.

As an aside, this is kind of an issue I have with chess analysis. A computer
can 'verify' that a certain move is good or bad. That's fair enough. But in
the past I have seen players (of lower skill level than me) discuss analysis
in, for example, a battle between two big-name players.

I have sometimes wondered if these discussions are truly honest, because I
have seen moves made by top players where I don't even understand the process
by which they arrived at the decision that it was the correct move versus the
others. Excluding GMs, a human simply cannot prune the game tree at depth like
a computer can. So discussing a few tiny branches of the game tree as if one
is correct and the others aren't just seems really silly for the rest of us.

------
paulftw
OK, so they've analyzed 4.9 million moves. How many double blunders did they
find in the set? If the hypothesis of independent events is true there should
be about (4.9e6 / 10,000) = 490 doubles in the data set. An obvious way to
test accuracy of the model is to compare that to the actual number.

Why hasn't that comparison been done/mentioned?
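The back-of-envelope expectation above is simple to compute. The per-move blunder rate of 1% is my reading of the 1-in-10,000 figure implied by the comment (0.01² = 1/10,000); the observed count would come from the actual dataset:

```python
# Expected number of consecutive-blunder pairs if blunders were
# independent events. The 1% per-move rate is an assumption inferred
# from the 4.9e6 / 10,000 = 490 figure in the comment above.

def expected_double_blunders(total_moves, per_move_rate):
    """Expected blunder-followed-by-blunder pairs under independence."""
    # Each move pair blunders together with probability rate**2;
    # with rate = 0.01 that's 1 in 10,000 pairs.
    return total_moves * per_move_rate ** 2
```

Comparing this expectation against the observed count of consecutive blunders in the 4.9 million analyzed moves would directly test the independence model.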

------
sogjis
What exactly is a blunder? Watch this game:
[http://www.chessgames.com/perl/chessgame?gid=1032537](http://www.chessgames.com/perl/chessgame?gid=1032537)

Tal sacrificed a knight and his queen.

------
sprkyco
Not relevant to chess, but in Japanese there is a saying: saru mo ki kara
ochiru. Not sure if I spaced that correctly, but it amounts to "Even monkeys
fall from trees."

