Hacker News
Positions chess engines don't understand (chess.com)
407 points by diplodocusaur 27 days ago | 193 comments

When OpenAI trained their Dota model to beat people 1v1 mid SF, they had a contest at The International (the big yearly dota tournament, RIP) like "can you beat OpenAI's new bot?" and had a big bucket of prizes for anyone that could.

By the end of the night, the bucket was empty. People learned to cheese the bot by running "out of bounds", so to speak -- normally in a 1v1, you're supposed to stay close to your opponent, since they'll be getting stronger if you leave. But the bot didn't know how to deal with it when you snuck behind it (normally an insane maneuver, almost guaranteed to cost you the game in normal play) and prevented its army (called "creeps") from running to the middle lane. (All you have to do is wave your hand at them, and they mindlessly chase you.)

People did that over and over, and their own creeps would eventually overwhelm the bot and win. :)

That's a great story! It definitely speaks to human beings' knack for finding edge cases.

Do you know what year that was? I'm wondering if AI has advanced and patched those weaknesses.

I am of the surprisingly-controversial opinion that ML’s future is going to be as a human empowerment tool — a bicycle for the mind, yet so much more. Listen to how cool my gamedev music sounds: https://soundcloud.com/theshawwn/sets/ai-generated-videogame...

I feel comfortable saying that it’s awesome and being immodest, because I didn’t write it! None of those notes are mine. “Crossing the Channel” has one of the strangest tempos I’ve heard, yet it sounds so cool because I think the model made a mistake early on, and decided it wasn’t a mistake — instead, it came up with the most likely not-mistake it could think of, and it happened to sound great.

But I “wrote” all ~20 songs on that list in one night. Took about six hours, aka a standard workday. I don’t think it’s production-grade music - far from it - but it also wasn’t a production-grade model. (On the other hand, it was trained by gwern, so I’m skeptical the production models of the future will capture all the little magic that he seems to imbue into his.)

I chose the instruments, I decided what sounded good, but I didn’t write a single note. It felt a lot like listening to someone jam out, and asking them to play different things. I was just the fella who was there to listen.

That’s where ML is going to shine. The future is going to be so exciting with the things you’ll be able to do.

But not in the current direction, I think. Currently, the goal is to factor the human out of the equation entirely. So the answer to your question is: even if the AI has “advanced”, it’s only because they added this case to the training set. If a human were at the helm, they would have annihilated you, because they’d see what you were doing and pilot the AI over to you.

I am skeptical this case is solvable, in a fully automatic way. IMO human empowerment will happen by arming actual humans with these models. (Depressingly, “arming” might have a few meanings depending on how AI war turns out, but that’s a different conversation.)

Interesting songs. "Crossing the Channel" doesn't seem to have a strange tempo so much as the piano chord on the offbeat (and perhaps some swing on it as well, I can't quite tell).

The problem I think AI-generated music has to overcome is that it kind of meanders around without coming to any "conclusions". That said, perhaps this is the perfect music for wandering around a town in an RPG!

This is where the human element comes in though I would think: a guiding hand to say "start ramping up to more of that, okay now you want a lull here..."

The best music I’ve heard from AI by far is AIVA[0], the “AI Composer” with a YouTube channel uploading a couple new songs every week. I’d argue it’s the only generated music I’ve heard that could pass as “good”.

I don’t think it will be long before AI composers are better than all but the most talented humans.

[0] https://youtube.com/channel/UCykVChITx5kqBoGkzfz8iZg

> I am of the surprisingly-controversial opinion that ML’s future is going to be as a human empowerment tool — a bicycle for the mind, yet so much more.

In chess, these players are called centaurs. Per my limited knowledge of the current chess meta, they are considered to be much better than either people or AI in isolation.


Centaur chess did reign for a while. For AlphaZero, which has a strong ability to perform pattern recognition based on experience in a limited domain, humans may not be able to add value to its play.

That said, most board games are a very limited domain. Even more complex computer games prove more challenging than the current AI approach can handle without weaknesses (see comments about Dota on this page).

In complex real-world domains, humans can definitely still add much value relative to computers working by themselves.

  "considered to be much better than either people or AI in isolation."
Is this really shown to be true?

I went down this rabbithole a few years ago. At the time, I looked around but couldn't find any more details about centaur chess, or evidence that centaurs could outplay AIs.

Then, last year, I stumbled across the following pair of articles. It's about the correspondence chess world championship, where both players are allowed to use engines, and (crucially) have a very long amount of time to analyse each position in-depth and consider long-range strategic implications of each move. The chap interviewed learned to exploit his opponent's over-reliance on the engine, and played in such a way that he was able to gradually accumulate small but compounding positional advantages that eventually gave him an edge. The whole time he used his own engine to catch tactical weaknesses in potential move sequences. A fascinating read.



There's correspondence play which allows engines (or any resource) and certain players which are consistently better at the game under those conditions.

Some of the other players surely try just running Stockfish overnight and taking whatever it says is the best move, and apparently that strategy isn't equally good.

Note that I think the "centaurs" are kind of playing a pretty different game than regular chess: they might have a knack for knowing when a particular engine is weak or strong and then trusting the corresponding engines lines more, or something like that.

I don't think it's that controversial at this point. Given the growing sentiment that there are limits to where ML/DL can take us (limits we may be reaching), and the general lack of success getting practical results from cognitive science, etc., there seems to be growing interest in augmented-AI types of approaches.

Heck yeah, thank you! It's delightful that there's a name for it, because it seems like the first people to become centaurs in their day to day lives will have a massive advantage over those that want the purity of unguided AI.

I was eagerly hoping for a video of a centaur vs centaur tournament, but there seem to be none. https://www.youtube.com/results?search_query=centaur+chess

At least, not yet. Maybe someone could do a tournament with https://www.twitch.tv/gmnaroditsky or https://www.twitch.tv/gmhikaru? Actually, the Gotham Chess guy might: https://www.youtube.com/watch?v=O1b-cuPDBZo&list=PLBRObSmbZl... He's always looking for new angles for his Youtube channel.

Centaurism isn't applicable to every situation -- Starcraft's AI kicks the butts of every pro player, because you can scale up and overwhelm your opponent. But there are often already humans in the mix, in these models. The humans are just designing the loss functions or deciding what to model, rather than using the results. So it's a "delayed" mechanical turk in that sense.

When done right, it's so effective that it feels like cheating^Wthe future: https://twitter.com/theshawwn/status/1182208124117307392

Those are some of the nicest Stylegan outputs I've ever made. (Uh, do me a favor and ignore the Zeus one...)

Each of them was crafted. The process was to start with something, and that "something" often didn't need to be anything close. For example, if the photo is an old man, but stylegan's showing a kid, you turn up the "age" slider. Do that for every feature; it felt like the character creation screen in Fallout.

Amazed it isn't an app yet. Artbreeder is nice, but it's not the same -- the key component was to have Peter Baylies' (follow him! https://twitter.com/pbaylies) reverse encoder as a button you can press. Whenever the model gets too far from what you're thinking, you press it, and it morphs the face back closer to the target photo. In the process, it might distort the age slightly, or make the chin a little bigger, but it's like an anchor at sea; it's why you can nail your final result, every time.
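The slider-plus-anchor workflow can be sketched abstractly. Everything below (the tiny 3-D vectors, the `nudge`/`anchor` helpers) is a toy stand-in for the real StyleGAN latents and reverse encoder, just to show the two operations being described:

```python
# A toy sketch of the two latent-space operations described above.
# (Illustrative only -- the real workflow uses StyleGAN latents and a
# learned reverse encoder, not these made-up 3-D vectors.)

def nudge(latent, direction, amount):
    """Move along an attribute direction, e.g. the 'age' slider."""
    return [l + amount * d for l, d in zip(latent, direction)]

def anchor(latent, target, strength=0.5):
    """Pull the current latent partway back toward the encoded target
    photo -- the 'anchor at sea' button."""
    return [l + strength * (t - l) for l, t in zip(latent, target)]

z = [0.0, 0.0, 0.0]          # current latent (toy 3-D stand-in)
age_dir = [1.0, 0.0, 0.0]    # hypothetical 'age' direction
target = [0.2, 0.8, -0.4]    # latent of the reference photo

z = nudge(z, age_dir, 2.0)   # crank the age slider
z = anchor(z, target)        # drifted too far? pull back toward the photo
print(z)
```

The anchor step is what keeps repeated slider edits from wandering away from the reference face: each press trades a little of the edit back for closeness to the target.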

I predict Centaurism might be popularized by gamedev. It's going to be pretty neat when some studio trains an RL algorithm vs someone's heart rate. Higher heart rate = more enjoyment, lots of the time, so you'd end up with either the funnest game or the scariest game you've ever seen.

Probably a decade away from that though.

Please see my comment to another post above, and read the following pair of links!



> Starcraft's AI kicks the butts of every pro player, because you can scale up and overwhelm your opponent

Nah, it's because it has superhuman mechanics. If AlphaStar didn't rely so heavily on its mechanics it might have produced interesting insights.

Everyone in the SC2 community was hoping for that, but it didn't really happen.

Was there any Starcraft tournament where the AI was limited to 200 apm and disabled maphack?

AlphaStar doesn't have maphack. It has more APM than 200 but so do the top players (and so do I on good days btw, and I'm only in Diamond 3). It does have insane mechanics that no human can match, like pixel perfect clicks every time and incredible reaction times.

One streamer (Lowko I think) analyzed a game he played against AlphaStar where it would "macro" by going to some position where you could just see all the production structures (some by only a few pixels on the edge of the screen) and just click all the button needed for unit production in a few milliseconds.

I am aware that Nada could have 400 apm in a 30-minute game, but most of it was spam.

Those Brood War bots had 36 thousand actions per minute, mainly due to technical limitations (the game couldn't accept more). And those were "perfect" clicks - not spam.

Limiting a bot to 200 actions (or even less) makes it more comparable to a human, since at some point every click becomes a resource too. Even the best players have to stop macro for some (relatively short) time, while an AI with 36,000 clicks per minute is not limited by time in this way.

I don't remember that Lowko made that video. Beastyqt made a video where AlphaStar clicked on a Barracks that was barely visible at the top of the screen, but IIRC it was only once, not a systematic trick. In my opinion, if the AlphaStar team had just reduced the clickable area of the screen, it would have seemed natural.

>I am of the surprisingly-controversial opinion that ML’s future is going to be as a human empowerment tool

After a recent article I saw on HN about an AI-driven text adventure game, AI Dungeon, I decided to give it a try. The article was about some issues with the company and content filters or something, but my number one impression from trying it out was that it would make a great base for writing short fiction.

At least that was what it seemed like to me. The first few times I tried it, playing it like a game, it was like wandering through a dream that kept changing. After that I figured out the different command modes and realized it was far more useful as something of an idea generator, more like a collaborative writing partner than a dungeon master, and I found it more enjoyable that way.

I think it would be a great tool for a stuck writer who has a vague idea and a concept but isn't sure where to take it; as a game by itself, though, the human element is definitely missing. It seems to work best when you're the one controlling the narrative, bouncing ideas off the AI, as opposed to human-crafted text adventures where the game is more in control.

Tom Gruber (one of the cofounders of Siri) had a talk a while back where he spoke about this. https://www.ted.com/talks/tom_gruber_how_ai_can_enhance_our_...

> Tom Gruber, co-creator of Siri, wants to make "humanistic AI" that augments and collaborates with us instead of competing with (or replacing) us. He shares his vision for a future where AI helps us achieve superhuman performance in perception, creativity and cognitive function -- from turbocharging our design skills to helping us remember everything we've ever read and the name of everyone we've ever met. "We are in the middle of a renaissance in AI," Gruber says. "Every time a machine gets smarter, we get smarter."

I'm a big believer in Intelligence Augmentation, in the Engelbart tradition, over completely independent agents. I don’t think my ideal AI is one that has its own sense of agency and autonomy, rather I think of technology as an extension of myself, one that enhances my autonomy.

Kind of like Star Trek AI (besides Lt. Cmd. Data that is).

The reason your opinion is controversial is because it assumes an ill-defined transient state as a stable equilibrium. What exactly is it about the human brain that makes it so special? Why couldn't other systems given many, many orders of magnitude more computational resources be arbitrarily better at everything it does?

Humans do things for humans; an AI, no matter how powerful, is not a human mind. So a human may be required to make its output creative in a way that isn't alien.

Suppose I can emulate a human brain neuron for neuron. How exactly is it different in capabilities from an actual human brain? What quantifiable property can you assign to the human mind that makes you so certain a machine can never match it?

Your argument is basically good old god of the gaps. You look at what the state of the art in technology cannot do right now, and base assumptions on that without really delving into the issue. 10 years ago you would've included stuff like image classification and natural language generation in the "only humans can do this" bin.

Nice playlist! Have you written about your process anywhere? I'm sure there are a lot of people who would love to follow in your footsteps here.

The closest writeup is probably Gwern's: https://www.gwern.net/GPT-2-music

It won't get you quite as far, since I had to discover that you can prompt it with chords, and how to set the instruments. But it's pretty good, and maybe someone will discover some new way to make it better.

>> I am of the surprisingly-controversial opinion that ML’s future is going to be as a human empowerment tool — a bicycle for the mind, yet so much more.

I don't think this is controversial at all, rather it's what Donald Michie called Ultra-Strong Machine Learning back in the 1980's [1].

Briefly (and with modern interpretations) Michie defined three "levels" of machine learning system: Weak, Strong and Ultra Strong.

A machine learning system exhibits Weak machine learning ability when it is only capable of improving its predictive accuracy when trained on data.

A machine learning system exhibits Strong machine learning ability when it satisfies the Weak machine learning criterion and can additionally output its model in symbolic form that is readily understandable by humans.

And a machine learning system exhibits Ultra Strong machine learning ability when it satisfies the Strong machine learning criterion and can additionally instruct the human user so as to improve the human user's performance.
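To make the Weak/Strong distinction concrete, here's a deliberately trivial toy (my own illustration, not from Michie): a one-feature threshold "learner" that can both predict (Weak) and emit its learned model as a human-readable symbolic rule (the extra Strong requirement):

```python
# Toy illustration of Michie's "Strong" criterion: a learner that not
# only predicts (Weak) but can also emit its model as a human-readable
# symbolic rule. Deliberately minimal -- a single-feature threshold.

def fit_threshold(xs, labels):
    """Pick the midpoint threshold that best separates the two classes."""
    best_t, best_acc = None, -1.0
    candidates = sorted(set(xs))
    for a, b in zip(candidates, candidates[1:]):
        t = (a + b) / 2
        acc = sum((x > t) == y for x, y in zip(xs, labels)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def explain(t):
    """The 'Strong' part: output the learned model in symbolic form."""
    return f"IF feature > {t} THEN positive ELSE negative"

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [False, False, False, True, True, True]
t = fit_threshold(xs, ys)
print(explain(t))   # → IF feature > 6.5 THEN positive ELSE negative
```

A neural network, by contrast, would stop at the prediction step: it has no `explain()` it can offer a human, which is exactly the gap between Weak and Strong in Michie's taxonomy.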

In a sense, not a bicycle, but more like a jetpack, for the mind. Unfortunately we don't have anything like that today. Yes, I'm aware of claims that Go players have learned a lot by observing AlphaGo play. But AlphaGo/Zero/blah is not capable of instructing a human directly. Michie envisioned Ultra Strong Machine learning as a kind of coach basically, or teacher, an AI teacher.


[1] Michie, D. (1988). Machine learning in the next five years. In Proceedings of the third European working session on learning (pp. 107–122). Pitman.

Online here (but behind a paywall):


Full disclosure: I'm one of Donald Michie's grand-students; he was my thesis advisor's thesis advisor. Additionally, Michie was besties with Alan Turing, so I have the privilege of being a grand-nothing of Turing's :P

Did the model also choose the song names?

Surprisingly, yes. I want to say “yes” with no caveats, but I am ethically bound to point out that I may have changed a few of them. But if I did, it was inspired directly from what it came up with.

The way it works is, there’s something called ABC notation, and Gwern found a big ass-database of Irish folk songs. It has a “title” field along with tempo, ID, etc. I would fill in what key I wanted, give it the first chord, and let it go. (If you don’t give it the chord, it generates ABC songs that are all kind of boring piano pieces with no background chords. It was amazing how much difference that made.)

So, not only did it choose the song names, but it couldn’t not choose a song name. It’s the only thing it knew. Its whole world was Irish ABC folk music, and as far as it was concerned, the title was as important as the notes. It couldn’t know it wasn’t.
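For anyone curious what such a seed looks like, here's a minimal sketch of the kind of ABC prompt described above. The field values and the `make_abc_prompt` helper are my own illustration, not the actual prompts used:

```python
# Sketch of the kind of seed prompt described above: a partial ABC tune
# header plus an opening chord, which the model then continues.
# Standard ABC header fields: X = tune index, T = title, M = meter,
# L = unit note length, K = key. The chord (e.g. "[DFA]") is the nudge
# that reportedly keeps the model from generating bare single-note
# piano lines.

def make_abc_prompt(key: str, first_chord: str) -> str:
    return "\n".join([
        "X: 1",
        "T:",            # left blank so the model invents the title too
        "M: 4/4",
        "L: 1/8",
        f"K: {key}",
        first_chord,     # e.g. "[DFA]" -- a D minor triad
    ])

prompt = make_abc_prompt("Dmin", "[DFA]")
print(prompt)
```

The model then completes everything after the seed, including the blank title field, which is why the titles come "for free": in its training data a title is just another line of the tune.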

Ah, I figured out what had been bothering me. I’m pretty sure I chose “For Ireland!” and possibly Blackbird, but e.g. Crossing the Channel was a GPT original, I believe. I was aiming to make it feel like an oldschool FF3 type game, so For Ireland was the battle song, Blackbird was the name of the airship you’d make your daring escape on at the end, Marco’s shadow was the assassin team hired by the empire to take Marco out, etc.

The whole point of running such a stand is to get high quality data from players and make the AI learn from them.

You don't 'patch' a reinforcement learning model, you just give it data to train on.

"AI" will probably never be good in games like Dota 2 or League of Legends.

They're way too complex; even the above-mentioned example was 1v1, while in general those games are 5v5 (LoL at least).

I'm sorry but this is just incorrect - OpenAI beat the previous year's champions in a 5v5 match the year after the 1v1 debut: https://www.youtube.com/watch?v=pkGa8ICQJS8

There were restrictions / rules on the game (only a pool of 18 heroes was allowed), but they won both games in a best of 3. You can see the progression / evolution of the training of the AI here: https://openai.com/projects/five/

It's definitely doable to train them to be good at extremely complex games like Dota / League, it's just that the resource requirements to train the engine are significant. After the bots were opened to the public, they had a 99.4% win rate against pubs, even accounting for cheese strats.

>There were restrictions / rules on the game (only a pool of 18 heroes were allowed), but they they won both games in a best of 3. You can see the progression / evolution of the training of the AI here: https://openai.com/projects/five/

In League of Legends there are almost 150 heroes.

Not only does the whole draft phase create a giant number of possibilities if you apply combinatorics; the number of in-game possibilities that result from any given draft is itself difficult to imagine.

I didn't play much Dota, I'm LoLer, so I don't understand those limitations:

Pool of 18 heroes (Axe, Crystal Maiden, Death Prophet, Earthshaker, Gyrocopter, Lich, Lion, Necrophos, Queen of Pain, Razor, Riki, Shadow Fiend, Slark, Sniper, Sven, Tidehunter, Viper, or Witch Doctor)

No Divine Rapier, Bottle

No summons/illusions

5 invulnerable couriers, no exploiting them by scouting or tanking

No Scan


But good to know that I'm relatively close to being 100% wrong here, thanks.

This comment is very late, but maybe you'll see it in your threads. They discovered that growing the hero pool didn't take exponential effort, even though the number of pick combinations grows exponentially. That was one of the results of their blog - training each new hero was a linear increase in difficulty. They stopped at 18 because that's how many they had trained when the competition start date hit.

For the limitations, Divine Rapier and Bottle are items with unique interactions. Divine Rapier drops on death, and Bottle has unique interactions with elements on the map that require interacting with the environment in a specific way.

Illusions are the same as LeBlanc's Mirror Image skill: just copies of the hero that deal no or limited damage, but can't cast spells. Likely they'd have to train the model to include an evaluator on the likelihood of a unit being an illusion rather than a hero.

Couriers bring items from the shop in base to your hero, so you don't have to return to shop. They are killable however, and if they are holding items when they are killed those items will be inaccessible for 3 minutes. They made them invulnerable because the bots would re-buy items that were inaccessible. Invulnerable couriers can be exploited however, thus the rule.

Scan is a global ward that only tells you whether or not someone is in the area (doesn't give you vision), and lasts 5 seconds.

It's worth mentioning that the restrictions they placed on the game were enormous, to the point that the human players were almost playing a different game.

It certainly beat OG in many aspects, but the beauty of dota is the ability to adapt in the game with different strategies, strategies which weren't possible with the restrictions.

Never is a long time (...hopefully)

Not too long ago, I think some people would have said the same about go.

The difference is that e.g. LoL constantly changes, and you'd have to relearn a significant part of the game every patch (IIRC every 2 weeks).

Also, those games are waaaaaaaaay more complex than e.g. chess.

I guess we'd need giant advance in computational power in order to analyze hundreds of thousands of matches

Kinda reminds me of this old folklore.org story: https://www.folklore.org/StoryView.py?story=Make_a_Mess,_Cle...

Is there a video of this?

Yes, here's an example of the "cheese" strategy: https://www.youtube.com/watch?v=R85WVTFPmRE

Plays "normally" (cautiously since the bot is so good) until 7:00 accumulating gold for enough items to more safely eliminate the creeps: https://www.youtube.com/watch?v=R85WVTFPmRE&t=420s

Thank you so much! I couldn’t find a video anywhere I looked, and started wondering whether I imagined the whole thing.

I think these kinds of adversarial examples will be extremely common in production models. People won’t be crafting images that fool the model into thinking you’re a stop sign; they’ll discover that when the human isn’t paying attention, you can run in front of a Tesla with a group of friends and it veers into oncoming traffic. (Terrible made-up example, but I’m pretty sure that it’s a losing game to play “can we think of all possible cases we need to train for ahead of time?”)

that's such a weird behavior, the bot completely ignores the opponent when in that particular spot?!

Sadly not. I'm not sure anyone expected the bot to lose. I've been searching for where I saw it, but 2017 feels like a lifetime ago. I'll link it here if I do.

You can watch Black beat the bot in a fair game here, though, which I find immensely satisfying: https://youtu.be/qov1NXsTSbs?t=88 IIRC he was one of only like... five? (less?) who managed to win one game.

I don't think this would work for chess. Leela uses AlphaZero-like self-training, and the strategy is to not be biased by human moves.

Just look at this https://www.youtube.com/watch?v=8dT6CR9_6l4 chess engines improved very much lately with regards to "understanding" chess positions.

Likewise, fancy AIs playing StarCraft also make unpredictable moves that well-trained human players would never suspect.

An astonishing excerpt:

>A good example of human exploitation of the engine's failings is GM Hikaru Nakamura's defeat of Rybka in the following three-minute blitz game from the Internet Chess Club. Nakamura cleverly locks the position so that progress is impossible for Rybka, then he offers two exchange sacrifices to the engine. With the position locked, the engine's rooks have no value, but the engine thinks it has a winning material advantage. With a draw by the 50-move rule approaching, the engine sacrifices a pawn to avoid a draw, but this proves a huge mistake as Nakamura is then able to win the game, an incredible achievement!

There is a video of this on YouTube


Thanks for the link. The commentary notes that this was possible because Rybka is an algorithmic engine with hard constraints like "must try to win in 50 moves to avoid tie", and this is what Nakamura exploited to win. DeepMind and other deep learning/neural network approaches don't have hard rules like this.

> because Rybka is an algorithmic engine with hard constraints like "must try to win in 50 moves to avoid tie"

That rule isn't the engine's constraint, that's a rule of chess. 50 moves without a pawn move or piece capture (basically "irrevocable progression of the game") is a draw.

The chess engine thought it was winning, so it wanted to avoid a draw, even if it meant putting itself in a still winning but otherwise worse situation. Unfortunately for the chess engine it was mistaken, in the original position it wasn't winning, but drawing, and in the new position it wasn't still winning, but losing.

> 50 moves without a pawn move or piece capture (basically "irrevocable progression of the game") is a draw.

A player may call a draw if they wish.
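For reference, this is typically tracked as a "halfmove clock" that resets on any pawn move or capture, with a draw claimable (not forced) at 100 halfmoves, i.e. 50 moves by each side. A minimal sketch, not any particular engine's code:

```python
# Minimal sketch of how the 50-move rule is usually tracked: a
# "halfmove clock" that resets on any pawn move or capture, with a
# draw *claimable* (by either player, not automatic) once it reaches
# 100 halfmoves -- 50 full moves by each side.

class HalfmoveClock:
    def __init__(self):
        self.count = 0

    def record(self, is_pawn_move: bool, is_capture: bool):
        if is_pawn_move or is_capture:
            self.count = 0          # "irrevocable progress" resets the clock
        else:
            self.count += 1

    def draw_claimable(self) -> bool:
        return self.count >= 100

clock = HalfmoveClock()
for _ in range(100):                # 100 quiet halfmoves in a row
    clock.record(is_pawn_move=False, is_capture=False)
print(clock.draw_claimable())       # → True
```

This is why Rybka's pawn sacrifice reset the situation: a single pawn move zeroes the clock, postponing the claimable draw by another 50 moves.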

This game was also in 2008, I don't think cell phones could even beat all humans beings back then.

These days a human couldn't draw against a cellphone, contrary to the article.

And Rybka thought, materially, it was winning, so didn’t want to allow Nakamura to be able to call the game a draw.

So it lost a “small” amount of material which transposed the game from one that was an obvious draw to one that was obviously winning for Nakamura. That’s the “contempt” effect that Nakamura kept referring to: the computer would rather continue the game in a slightly-less-good position than allow a draw from a better position.

Nakamura exploited the fact that it was wrong about its assessment due to the horizon effect. A traditional chess engine evaluates a position by starting from the current position and evaluating promising moves downward along the tree. In a locked-up position like the one linked, the computer has to evaluate an enormous number of positions, none of which will materially affect the game due to the inability for either side to force meaningful progress.

A human can understand the themes in a position and understand where they want the pieces to go, then work backward from there to see a path to achieve it. This is particularly easy when the opponent has no achievable setup of their pieces where they could counter you once you’ve achieved that position. In such a situation, you can (almost) completely discount any moves they make and only consider the path for you to get to your ideal position. Even a beginner-intermediate chess player (1600+) could analyze this type of position easily.

This was such a situation. Rybka had no moves to progress things (other than the game-losing sacrifice Nakamura wanted to induce). So Nakamura didn’t have to evaluate any of Rybka’s moves, he just had to put his pieces in the right place to be ready for the inevitable blunder when Rybka decided to sacrifice a pawn to keep its “winning” position thanks to contempt.

> A human can understand the themes in a position and understand where they want the pieces to go

So can a computer though. This is an interesting edge case, but it seems to me that this particular bot mostly evaluated odds of winning by the pieces, not by the position. A more modern algorithm will actually use the positions on the board.

What would a human do if told "you're up by two rooks, do you want to allow a forced draw?" without being allowed to look at the board?

Sure a computer can, conceptually. But they don’t, at least not in the way we generally do.

Traditional engines have been designed under the premise that it’s better to have an extremely fast evaluation function you can apply deeply down the tree than it is to have a slower, more thorough evaluation function you can’t apply to as many nodes. Any extra processing budget is spent on more intelligent pruning of the tree of moves that don’t appear likely to yield anything promising.

So an engine will look at a position like this, see that it has a sizable material advantage, and doesn’t give as much weight to the obvious positional conclusion that it has no productive path forward. It also won’t understand that “just” sacrificing a pawn turns the position into one that’s completely lost due to the chain of passed pawns it eventually results in.

Yes, the engine could be improved to understand this type of position, but without extreme care that easily results in it taking longer to evaluate every other position. The net effect is an overall weaker engine, unless you find a way to improve the evaluation without slowing down the evaluation function.
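The tradeoff above can be sketched in a few lines: a negamax search whose leaf evaluation is a bare material count. This is illustrative only (the "game" is an abstract tree of positions, not real chess, and no real engine is this simple), but it shows why a fast material-only eval reports "winning" forever in a locked position:

```python
# Bare-bones negamax with a cheap material-count evaluation, the
# tradeoff described above: a fast leaf eval applied deep in the tree,
# blind to positional facts like a locked pawn structure. The "game"
# here is an abstract tree of positions, not real chess.

PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_eval(position):
    """Fast, shallow evaluation: material balance for the side to move."""
    return (sum(PIECE_VALUES[p] for p in position["own"])
            - sum(PIECE_VALUES[p] for p in position["opp"]))

def negamax(position, depth, children):
    """children(position) -> successor positions with sides swapped."""
    kids = children(position)
    if depth == 0 or not kids:
        return material_eval(position)
    return max(-negamax(k, depth - 1, children) for k in kids)

# Toy "locked" position: up two rooks, but every line just shuffles
# back and forth -- material_eval happily reports +10 at any depth.
locked = {"own": ["R", "R", "P"], "opp": ["P"]}

def passive(pos):
    return [{"own": pos["opp"], "opp": pos["own"]}]

print(negamax(locked, 2, passive))   # → 10
```

No matter how deep you search, the evaluation never learns that the +10 is unrealizable; that insight would need positional terms in the eval, which is exactly the slowdown the comment above describes.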

> Traditional engines

That's the crux of it though isn't it? These are complaints about a specific approach, not generally of computers, and not even of the best demonstrated models.

I'm also not entirely sure this is the correct way to think about it. If the function that evaluates the accept-a-draw decision is separate from the next-move decision, then once the draw is refused, the poor pawn sacrifice is, without question, the best available move. My interpretation is that the accept-a-draw decision is made in a vacuum here.

Thank you.

Was it actually configured with a "no draw" policy? The chess.com article explains that Rybka thinks it's winning because of its material advantage.

Looks to me like a modern AlphaZero-based engine like Leela would understand this position easily. Even SF with its new neural-net-based eval function might understand it. My Leela reports a close-to-0 evaluation after Nakamura's first sacrifice.

Why did Rybka sacrifice the pawn to avoid the draw if the resulting position was bad? Was its contempt setting really high or something?

Edit: looks like in the video Nakamura says the contempt factor was too high, and later it was changed.

Correct. Rybka believes it has the piece superiority and doesn't consider sacrificing a pawn to be an issue but the only pawn that it can sacrifice opens up that side of the board for black. Nakamura explains a little before the sacrifice that if they exchanged queens it'd be almost certain that Rybka would have to sacrifice that pawn to avoid draw.

Might need a glitchless chess league!

When watching the WC games, I've seen it happen that a move wasn't considered as a top move by the engine, but once played the engine realizes it's actually crushing. Something about the heuristics used to prune the vast search space can make it miss sacrifices or seemingly sub-optimal moves that temporarily weakens the perceived position but has a huge payoff in the end. But humans find them. Of course, given enough time and depth the engine will eventually circle back and try the move. But it has no intuition.

Also, an engine without an endgame tablebase can be pretty stupid. There are certain rules one can deduce when there are few pieces left, but a min/max engine will search forever, not knowing the patterns.

> I've seen it happen that a move wasn't considered as a top move by the engine, but once played the engine realizes it's actually crushing. Something about the heuristics used to prune the vast search space can make it miss sacrifices or seemingly sub-optimal moves that temporarily weakens the perceived position but has a huge payoff in the end. But humans find them.

Just to observe, humans display exactly the same phenomenon of ignoring a move before it's made while still being able to realize, after it's made, that it was very strong and ignoring it was a mistake.

The word I would use for a move like that is “counterintuitive”

Which... suggests the computer displays intuition

Well played.

OP's choice of words is inexact. The engine has a different intuition, not none. Humans will usually explore trades and overlook seemingly "slow" moves that the computer usually finds.

What the engine does lack is what chess players call "theory," hence the need for endgame DBs. Humans can also beat engines with "logic," like the Naka match. That one was extraordinary. A simpler & more typical example is isolating a pawn so that it can't ever be defended, but also can't be attacked immediately. For a human, it's easy to understand that this is a long-term vulnerability even if the pawn survives for 20+ more moves.

Anti-computer strategies, ironically, force you to "play the player, not the board."

Exactly. In fact you could argue that intuition, ‘arriving at a conclusion without a conscious understanding of your reasoning’, is the only thing a neural net displays.

Isn't a heuristic the exact definition of an intuition (i.e. a way to evaluate a complex situation using simpler rules that work most of the time)?

Intuitions often do not deliver the correct response; many puzzles and paradoxes (not to mention a few philosophical arguments) hang on misleading intuitions.

Furthermore, heuristics are rules, while intuitions are not. Sometimes people use their intuition to go against what a heuristic suggests!

Yes, and they display the converse. It's the existence of the converse behaviour which is distinguishing.

A machine which is just pruning a search space with a low-quality valuation strategy isn't doing anything very impressive; it's only very fast.

A human with a very high quality valuation technique, but a slow search pace, is relevantly impressive.

This is actually a category of moves on sites like Chess.com.

They categorize moves into things like blunder, mistake, inaccuracy, good, best.

But the top move is 'Brilliant' and the second best is 'Best Move.' Best move is the move the chess engine thinks is the best, and brilliant is the category for moves that weren't considered, but turn out to be the best. They're uncommon, but if you play enough, you'll hit them occasionally.


Did a little reading and it seems that chess.com are a little ambiguous as to exactly what they're doing here.

Two plausible theories are

1) They run an evaluation of the position to a certain depth and come up with the Best move. Then, when evaluating your opponent's position, if you didn't make the Best move but the resulting position evaluates as even more in your favour than the Best move would have given, they call your previous move Brilliant.

2) They run an evaluation of the position up to a certain depth and note the Best move as the most advantageous one. They then search a bit deeper and if the move you actually took wasn't the best in the original eval but now is, they call it Brilliant.

There would of course be a few caveats to both ideas. In 1 do they go and rerun the evaluation after the Best move to compare? Are they running the whole analysis backwards anyway (this seems to cause problems with both systems)?

My suspicion would be number 2 given the way it's phrased in your link, but I've seen other descriptions that make it sound more like 1. My overall takeaway is that it's simply a bad idea to have the category to begin with, as it just confuses matters for little gain, and at the least they should provide a more technical description. I'd wager that close to 0% of Brilliant moves were intentionally calculated, so there was no brilliancy as the term is usually used in chess.
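For what it's worth, theory 2 is simple to sketch. Everything here is hypothetical — chess.com hasn't published their actual criteria — and the `shallow_best`/`deep_best` arguments stand in for whatever engine call returns the top move at a given search depth:

```python
def classify(played_move, shallow_best, deep_best):
    """Toy version of theory 2: a move is 'Brilliant' if it was not the
    engine's top choice at shallow depth but becomes the top choice when
    the engine searches deeper."""
    if played_move == shallow_best:
        return "Best"
    if played_move == deep_best:
        return "Brilliant"
    return "Other"  # real sites grade these further: good/inaccuracy/etc.

# A queen sacrifice the engine only appreciates at greater depth:
print(classify("Qxf7+", shallow_best="Nf3", deep_best="Qxf7+"))  # Brilliant
print(classify("Nf3", shallow_best="Nf3", deep_best="Qxf7+"))    # Best
```

Theory 1 would instead compare evaluations before and after the reply, but either way the label falls out of a disagreement between two engine searches, not any notion of human brilliancy.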

In the chess literature this categorization has been established as !!, !, !?, ?!, ?, ?? for over a century now. Everybody loves the !! and ?! moves, which evade not only chess engines but also humans.

> But it has no intuition.

On the contrary, by your own description:

> Something about the heuristics used to prune the vast search space can make it miss [these].

it does have an intuition. And in these cases, its intuition is wrong.

It's not clear it's helpful in understanding intelligence to simply redefine terms to be "whatever the computer happens to be doing". In this way it will always appear that a computer is intelligent; trivially.

Rather we should start with what known intelligent systems (ie., animals) are actually doing, and then compare.

I think it's extremely unlikely the mechanism of intuition in relevant animals (esp. humans) has anything to do with heuristics for pruning a search space.

How are you defining intelligence so that that animals are "known" to have it but computers aren't? And what do we know about what animals are actually doing?

I think it's very likely that intuition in animals will have similar mechanisms to the things that produce the same results in computer reasoning, since the problems are the same and so optimisation processes (whether evolutionary or designed) will likely converge on the same solutions.

Computers don't reason. Computation isn't a property of any physical system, it's a subjective property attributed by an observer.

For example, consider a circuit. Suppose we attribute TRUE to 3v, and FALSE to 0v. Then two simultaneous input of 3v, 3v giving 3v (and so on: 33->3, 03->0, 30->0, 00->0) can be said to compute AND.

But the same physical system is at the same time "computing" OR. Here we attribute TRUE to the 0 signal.
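The duality is easy to check concretely. A minimal sketch (Python here, purely illustrative — the gate's physical behaviour is fixed; only the labeling of voltages as TRUE/FALSE changes):

```python
# Physical behaviour of the gate: output is 3v only when both inputs are 3v.
def gate(a_volts, b_volts):
    return 3 if (a_volts == 3 and b_volts == 3) else 0

volt_pairs = [(0, 0), (0, 3), (3, 0), (3, 3)]

# Labeling 1: TRUE = 3v. Under this reading the gate computes AND.
and_view = {(a == 3, b == 3): gate(a, b) == 3 for a, b in volt_pairs}
assert and_view == {(False, False): False, (False, True): False,
                    (True, False): False, (True, True): True}

# Labeling 2: TRUE = 0v. The very same voltage table now computes OR.
or_view = {(a == 0, b == 0): gate(a, b) == 0 for a, b in volt_pairs}
assert or_view == {(False, False): False, (False, True): True,
                   (True, False): True, (True, True): True}
```

This is just De Morgan duality: inverting the labeling of every signal turns AND into OR, with no change to the physics.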

Any "computer" is actually simultaneously running many, quite radically different, programs. We only attribute one program to it in how we rig devices (screens, keyboards, etc.) to interpret signals to mean something device-relative. If you "plug a speaker in a gfx socket" the output isn't visual. It means something completely different, and the GPU is computing audio now.

So "computational" algorithms in being merely purely formal specifications do not describe any unique physical system, and only apply to physical systems in an observer-relative way. Computation is explanatorily bankrupt; it is only a game people play in rigging physical systems to behave as they wish.

Animals however have the intrinsic property of intelligence: it isn't observer-relative. We are, quite literally, the observer imparting meaning to our environment. Our brain states resolve to unique thoughts, not to "many radically different ones simultaneously". Gold isn't both gold AND something else, and neither is thinking.

The category of explanation for these abilities which are intrinsic to animals cannot be, and is not, computational. It's scientific. Involving characterising the properties of materials, and their causal effects.

> Any "computer" is actually simultaneously running many, quite radically different, programs. We only attribute one program to it in how we rig devices (screens, keyboards, etc.) to interpret signals to mean something device-relative. If you "plug a speaker in a gfx socket" the output isn't visual. It means something completely different, and the GPU is computing audio now.

I'm sure if you spliced an animal's nerves to different places, that would make them "mean" something different. In fact we know that this kind of thing happens in nature, e.g. the star-nosed mole's visual cortex is connected to the parts of its nose that sense touch rather than to its eyes.

> Animals however have the intrinsic property of intelligence: it isn't observer-relative. We are, quite literally, the observer imparting meaning to our environment. Our brain states resolve to unique thoughts, not to "many radically different ones simultaneously". Gold isn't both gold AND something else, and neither is thinking.

How do you know? What would you expect to be different if it was? Often two people do observe the same piece of animal behaviour but disagree about whether it shows the animal was thinking/intelligent.

> I'm sure if you spliced an animal's nerves to different places,

That's sort of right, yes. But if you put their nerves in the wrong devices, animals wouldn't be intelligent at all. And these nerves are themselves plastic, ie., they aren't like a CPU with a limited set of physical operations; nerves are part of the body and grow.

So intelligence is really bodily. Intelligence is something that occurs because your brain-motor-body system is highly plastic (at the cellular and whole-body level) and is capable of adapting itself to its environment competently in ways that we call "intelligent".

It isn't any computational property of the brain which does this (if we impart any, we cannot impart them uniquely anyway, ie., they can't explain much). Rather it's the plasticity of the body (including the brain) which provides "the right device" for intelligence.

> Often two people do observe the same piece

Well, suppose I showed a CPU running a "program" to several alien races; none could tell me what program it was running. Each could provide an essentially infinite number of different answers, depending on how they interpret the electrical signals. A program is a purely formal thing that describes no physical system uniquely.

Whereas if I showed animals to these aliens, they could actually describe what processes constituted their intelligence.

And likewise, if i showed them Gold; they could tell it apart from Silver.

Insofar as a really-existing digital computer has anything akin to intelligence, it's because of how its devices behave. Ie., if you showed aliens the LCD, then they could say "well, the LCD output is particular, and is interpretable 'this way'".

We are easily fooled by the devices we attach to CPUs into thinking that the CPU is "computing" the device output. It isn't. We have just created an illusion: the keyboard buttons say '1', '+', '2' and the LCD says '3', and we think the CPU has computed 3.

But, looking at the CPU itself, it has actually "computed" an infinite number of things. The '1' signal could be interpreted arbitrarily.

My point here is that talking about "computation" is a deeply unhelpful way to understand anything. It's a profoundly non-explanatory category (it cannot explain any physical system; it is just a formal specification we follow).

Once we throw that away, we can see more clearly that what enables animal intelligence is the character of their "devices": the physical plasticity of their tissues (their growth and adaptation), their motor skills, and so on.

It is possible to describe a tiny subset of intelligence ("reasoning") by a formal system; but that misses why animals can reason in the first place. You don't get the capacity for reasoning by just making electrons dance in an observer-relative way. That requires the observer to already have that capacity.

The capacity itself is a property of the physical system, "the animal".

> We are easily fooled by the devices we attach to CPUs into thinking that the CPU is "computing" the device output. It isnt. We have just created an illusion: the keyboard button says '1', '+', '2' and the LCD says '3' and we think the CPU has computed 3.

> But, looking at the CPU itself, it has actually "computed" an infinite number of things. The '1' signal could be interpreted abitaribly.

That's not really true. We could very easily connect up the CPU to flash a light 3 times, or use a speech synthesizer to say "three". The reason we call these systems "computers" is that we built them to compute like (human) computers, that is, to evaluate mathematical expressions. Understanding computation is useful, even essential, in understanding the behaviour of these systems, just as understanding language and grammar is useful in understanding the pattern of sounds that a person might make under particular circumstances, even though the relationship between particular sounds and particular meanings is arbitrary or at least underdetermined.

If computer chess has taught us anything, it is that "the problems are the same" does not translate to "so are the solutions".

To clarify, computers play chess by searching exhaustively large game trees. Humans, on the other hand, do not. "The problem is the same", but the solutions are not. So we have no reason to assume that "the problems are the same" means that "so are the solutions", for animals and machines, and we have evidence to the contrary.

> To clarify, computers play chess by searching exhaustively large game trees.

That's less and less true with more recent chess engines.

> So we have no reason to assume that "the problems are the same" means that "so are the solutions", for animals and machines

The optimal solutions are the same. Human chessplaying has not had millions of years of evolution go into it.

>> That's less and less true with more recent chess engines.

Search is still the basis of AI chess playing. It's either alpha-beta minimax or Monte Carlo Tree Search (used by Stockfish and Leela Chess, respectively).

Game-tree search algorithms like minimax and MCTS need an evaluation function to select the move that leads to the best board position, and some modern engines train neural networks to estimate these evaluation functions. For example, Leela Chess is based on AlphaZero (I think), which popularised this approach, and Stockfish has recently adopted it. But chess engines still use a good old game-tree search algorithm.
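That division of labour — a generic tree search plus a swappable evaluation function — can be sketched in a few lines (a toy example, not any real engine; production engines add alpha-beta, iterative deepening, transposition tables, etc.):

```python
def minimax(state, depth, maximizing, children, evaluate):
    """Depth-limited minimax: the search skeleton is the same whether
    `evaluate` is a handcrafted material count or a trained neural net."""
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)
    values = (minimax(k, depth - 1, not maximizing, children, evaluate)
              for k in kids)
    return max(values) if maximizing else min(values)

# Tiny two-ply game tree with leaf scores standing in for an eval function.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
score = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}
best = minimax("root", 2, True,
               children=lambda s: tree.get(s, []),
               evaluate=lambda s: score.get(s, 0))
print(best)  # 3: max over min(3, 5) = 3 and min(2, 9) = 2
```

Swapping the `evaluate` callable for a network's output is essentially what the NNUE/AlphaZero-style engines do; the tree search itself stays.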

>> The optimal solutions are the same.

We don't know that. We have programs that can beat humans at chess, but whether they play optimally, or what constitutes "optimal" play in chess, that's difficult to say.

> Game-tree search algorithms like minimax and MCTS need an evaluation function to select the move that leads to the best board position, and some modern engines train neural networks to estimate these evaluation functions. For example, Leela Chess is based on AlphaZero (I think), which popularised this approach, and Stockfish has recently adopted it. But chess engines still use a good old game-tree search algorithm.

Humans will definitely consider lists of possible moves and countermoves and what kind of boardstates will result, if you define that as "search" then it seems fair to say that humans search too. A human searches less deeply and gains more of their playing ability from being good at evaluating the resulting positions - but as you say, that's also the direction that computer chess engines are moving in.

I don't know everything about how humans play chess and I don't think anyone knows, either. Maybe they do what you say, maybe they don't. Most of what they do is not open to our introspection so it's hard to know for sure.

Computers, as I said earlier, play by searching "exhaustively". "Exhaustive" can mean different things and I apologise for introducing such vagueness, but let's say that computer players spend all their resources, during play, searching a game tree. Learning evaluation functions, or memorising good board positions during self-play, is done at an earlier, training stage, but it is also a form of search. So both for training and playing it is not "less and less true" that "computers play chess by searching exhaustively large game trees".

I agree that it's a bit confusing, because if you read press announcements by e.g. DeepMind after AlphaGo's win against Lee Sedol, you would indeed get the impression that the field is moving away from search, towards something magical that can only be performed by deep neural networks and that obviates the need for good, old-fashioned search. But this is just marketing, though of course very unfortunate marketing. In truth, search still reigns supreme in game AI.

One point of confusion is what we call "search". There is one kind of search that is sometimes called "classical search" and that includes algorithms that search explicitly or implicitly defined search trees with nodes that represent search candidates. Minimax and MCTS are this kind of "classical search", as is e.g. Dijkstra's algorithm or binary search, etc. When I say that "search still reigns supreme in game AI", I mean classical search. In machine learning, under PAC-Learning assumptions, a "search" is anything that selects a generalisation of a set of examples, i.e. a hypothesis, from a set of candidate hypotheses (a "hypothesis search space", or sometimes just "hypothesis space"). So gradient optimisation-based methods, like neural networks, do also use search - they search for the model that minimises error on the training or testing set, etc.

For an early discussion of search in machine learning see Tom Mitchell's "Generalization as Search":


And for machine learning without search, stand by for my upcoming thesis :)

> I don't know everything about how humans play chess and I don't think anyone knows, either. Maybe they do what you say, maybe they don't. Most of what they do is not open to our introspection so it's hard to know for sure.

We don't know everything about how humans play chess, but as a chess player I can say that at least some of the time a human player will consciously enumerate possible moves (or particular subsets of possible moves) and countermoves and think about the position after each one in turn. At least I do.

> Learning evaluation functions, or memorising good board positions during self-play is done at an earlier, training stage, but it is also a form of search.

If you define "search" this broadly then essentially any way of playing chess would constitute search?

>> It's not clear it's helpful in understanding intelligence to simply redefine terms to be "whatever the computer happens to be doing".

Famously warned against years ago by Drew McDermott:

However, in AI, our programs to a great degree are problems rather than solutions. If a researcher tries to write an "understanding" program, it isn't because he has thought of a better way of implementing this well-understood task, but because he thinks he can come closer to writing the first implementation. If he calls the main loop of his program "UNDERSTAND", he is (until proven innocent) merely begging the question. He may mislead a lot of people, most prominently himself, and enrage a lot of others.

Artificial Intelligence meets Natural Stupidity, Drew McDermott, MIT AI Lab, Cambridge, Mass, 1981.


I'm not redefining anything; heuristics to decide what to focus on and how are what intuition is. The fact that it's implemented differently is irrelevant; by that logic LAPACK isn't doing linear algebra because it's not using jury-rigged neural networks augmented with pencil and paper.

The machine isn't using a heuristic. The machine is just a system of electric field oscillations.

A rock rolling down a hill isn't using the "geodesic heuristic" either.

We impart to physical systems we have designed our intentions in designing them: we interpret so-and-so field oscillation as a "heuristic". But this is an observer-relative designation.

By the same observer-relative gesture, the rock has its heuristics too.

Animals however actually have heuristics; ie., dynamical interior models of their environment which are analysed to prescribe action.

Animals, in being present within environments and having genuine interior reasoning/imagination/intuition etc., exist in a different relationship to "the chess game".

They aren't like rocks just "following the geodesic" (they do that at the atomic level, sure). But relevant heuristics here are genuine interior processes which are dynamically attached to environments.

You have the _horizon effect_: you do something, but the repercussions are too far in the future to see what the effect will be. This can cause the machine (or human) to do bad things. For example, trying to postpone the inevitable loss of a queen which is at the edge of the horizon by giving up a pawn: this "wins" 2 tempi, pushing the loss of the queen behind the horizon. The game moves on, the machine sees the loss of the queen again, and this time it can give up a knight to push it behind the horizon... you get the idea.
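The effect is easy to reproduce on a toy tree. In this sketch (all names and numbers invented), "keep" loses the queen (-9) two plies in, while "sac" gives up a pawn now and pushes the queen loss to the third ply — just past a depth-2 horizon:

```python
# Forced lines in an abstract game, scored from the engine's perspective.
children = {"root": ["keep", "sac"], "keep": ["keep2"],
            "sac": ["sac2"], "sac2": ["sac3"]}
material = {"root": 0, "keep": 0, "keep2": -9,   # queen falls here
            "sac": -1, "sac2": -1, "sac3": -10}  # pawn down, then queen falls

def lookahead(state, depth):
    """Follow the line, scoring by static material at the horizon."""
    kids = children.get(state, [])
    if depth == 0 or not kids:
        return material[state]
    return min(lookahead(k, depth - 1) for k in kids)

def pick(depth):
    return max(children["root"], key=lambda s: lookahead(s, depth - 1))

print(pick(2))  # 'sac': the queen loss after the sacrifice sits beyond the horizon
print(pick(3))  # 'keep': one ply deeper reveals the sac only made things worse
```

At depth 2 the engine happily sheds the pawn because the delayed queen loss is invisible; at depth 3 it sees the sacrifice just loses the queen anyway, a pawn down.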

There's also the positive horizon effect: you do the right thing, but for the wrong reason, e.g. because there's a refutation that you didn't see. However, the refutation is wrong, and a few forced moves later you do spot why.

Huh? The only thing I can think of is that the engine analysis can swing immediately once the move is played because it hasn’t thought about the new position for very long. Obviously if you just watch a stream the evaluation changes immediately.

It would be very rare in the first place for the computer not to have a strong move among its candidate moves. Obviously the evaluation can shift, but the difference shouldn't be apparent to a human player, and it would be even more unlikely that a computer would reject a top move but a human would find it.

This usually happens in very open positions, with many options within half a pawn of each other. The engines tend to explore totally different paths than humans. They don't care if a move is slow, forcing, etc.

Humans will tend to explore exchanges, faster & more forceful lines. Engines treat these like any other line, without prejudice. I'm a pretty average player. My positional moves aren't generally reasoned beyond 2-3 moves ahead. I go here. You do something, then I go there. Where I do reason deeper, it's usually forceful lines where a lot of material is traded. I go here. We trade 3 pieces. I attack this. Opponent defends, then I move there.

In a slower position, I can't reason ahead as much because I have no idea what the opponent might do. He has too many options.

> Engines treat these like any other line, without prejudice.

I'm not sure if I'm understanding 100% of what you're saying, but don't most engine heuristics try more potentially favorable moves first, such as moving a pawn? I think most modern engines have pretty aggressive move ordering.

Kind of, but they don't know which one is more favourable and resources are limited. If there are lots of seemingly OK options, an engine will have to spread those resources and explore lots of moves.

Chess engines tend to find good moves that seem really passive to humans. They don't create a threat or force an exchange. The purpose of the move may not be that complex, the payoff will become obvious 2-3 moves later. A human just wouldn't have thought to even explore it. There are no hints (like threats and exchanges) that it is worth exploring.

That's what I meant by prejudice.

Conversely, even in very option-rich positions... humans will explore the more active lines, and calculate material exchanges more deeply than other moves. If a normal player is calculating 7 moves deep in the mid game, that probably means a lot of material is being exchanged.

Remember that an equally matched engine is exploring far more moves than you are. The human is better at deciding which moves to explore.

You’re correct. The more aggressive version of this is called “move pruning”, effectively cutting off entire branches of the search tree. Techniques like this are essential for the strength of modern chess engines, but like every heuristic, they can be wrong. Sometimes, the engine makes a mistake because it literally didn’t consider a certain response.
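A toy illustration of why the ordering matters (a textbook alpha-beta, not any particular engine's code): searching the strongest branch first lets later branches be cut off after a single reply.

```python
def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate, visited):
    visited.append(state)  # record every node we actually examine
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for k in kids:
            value = max(value, alphabeta(k, depth - 1, alpha, beta, False,
                                         children, evaluate, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # cutoff: remaining siblings are never searched
        return value
    value = float("inf")
    for k in kids:
        value = min(value, alphabeta(k, depth - 1, alpha, beta, True,
                                     children, evaluate, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

leaf = {"a1": 3, "a2": 12, "a3": 8, "b1": 2, "b2": 4, "b3": 6}

def nodes_searched(order):
    tree = {"root": order, "a": ["a1", "a2", "a3"], "b": ["b1", "b2", "b3"]}
    visited = []
    alphabeta("root", 2, float("-inf"), float("inf"), True,
              lambda s: tree.get(s, []), lambda s: leaf.get(s, 0), visited)
    return len(visited)

print(nodes_searched(["a", "b"]))  # 7 nodes: best branch first, 'b' cut after one leaf
print(nodes_searched(["b", "a"]))  # 9 nodes: worst branch first, no cutoffs at all
```

Alpha-beta itself is exact; the risk the parent comment describes comes from the *additional* speculative pruning (null-move, late-move reductions, etc.) layered on top of this skeleton.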

> Also, an engine without an endgame tablebase can be pretty stupid.

What is an example of a realistic endgame (not something with 9 white pawns, 5 black knights and such) that chess engines are easily beatable in?

stupid != easily beatable

the AI will still avoid losing, but it also won't be able to find the endgame strategy to win

Your second paragraph vaguely reminds me that in Magic: The Gathering there are some decks that focus on attaining a certain combination of cards which would put you in a position where you cannot lose if you play them right. Some play those decks while being a bit fuzzy on the details. They couldn't actually finish it if pressed, but rely on opponents recognising the combo and conceding.

One of my favourite Magic stories is Luis Scott-Vargas splitting the top 4 of a Vintage tournament with Storm having forgotten to put Tendrils of Agony, one of his two win conditions, in his sideboard to Burning Wish for. Most of his opponents scooped with Burning Wish on the stack.


> When watching the WC games, I've seen it happen that a move wasn't considered as a top move by the engine, but once played the engine realizes it's actually crushing.

You don't need to watch WC games to witness that. Any position sharp enough will have swinging evaluations at the search horizon, and even amateurs can find a twist in their favor if they're lucky enough (to have their 8-moves-ahead flash of genius actually work 20 moves ahead by complete chance).

In decent engines the heuristics used to prune the search tree range from exact (e.g. alpha-beta pruning) to conservatively inexact (low chance of actually missing something), and unless you find a systematic winning or drawing pattern (like the examples in OP's article), you can't beat the engine with your "intuition".

What is intuition? To me it is just very good pattern matching. I recommend everyone play against the Leela chess engine with a very low number of search nodes. Start with 1 node and you will find that it rarely makes bad moves. The estimated Elo is around 2400-2500 for Leela with one node, which to me means that Leela does not have to look at many more positions than a human does. How is this not intuition?

Well, because it's Monte Carlo Tree Search with a learned evaluation function?

Once there are only a few pieces left, chess seems like a solvable game.

With three pieces there are about 250,000 positions; can't a modern computer just check them all?

Indeed; endgame tablebases have been constructed for all positions with up to 7 pieces (including kings), and at some point in the not too distant future, we expect to have all 8 piece ones as well.
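The ~250,000 figure above is roughly the number of ways to place 3 distinct pieces on 64 squares — a quick back-of-envelope check:

```python
# Upper bound on 3-piece placements: 3 distinct pieces on distinct squares.
placements = 64 * 63 * 62
print(placements)  # 249984
# Real tablebases also track side to move, discard illegal positions
# (adjacent kings, side not to move in check), and exploit board
# symmetries, so the stored count ends up smaller.
```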


3 pieces is two kings and one other piece. That's easy (I think it can be summed up as "draw if that piece is a knight or a bishop, won otherwise, unless the piece can be taken on move 1 or the starting position is a stalemate").

It gets harder fairly rapidly as you add pieces.

What you describe is a tablebase. https://en.wikipedia.org/wiki/Endgame_tablebase:

“By 2005, all chess positions with up to six pieces (including the two kings) had been solved. By August 2012, tablebases had solved chess for every position with up to seven pieces (the positions with a lone king versus a king and five pieces were omitted because they were considered to be "rather obvious")”

Oh yes I forgot the kings are 2 already.

No intuition? What is intuition other than a position assessment heuristic? Game-tree AI must limit the game tree search at some depth. In order to exploit this, the opponent would need to identify the position strength at a deeper level than the AI’s search depth, which seems unlikely.

A game like Go is a better example of this type of endless tree depth, where human intuition can outperform game-tree searches.

I'm fine with labelling it "intuition" if you are able to avoid searching the game tree to sift potentially good moves and positions from bad, but in this case machines have intuition too now.

Which is why AlphaZero plays Go with similar performance to the best human players.

Is it really similar performance? Last I heard it was crushing all competition, human or otherwise?

It has crushed all human competition. They haven't made AlphaGo play against other computers (apart from versions of itself). And it's neither publicly available nor playing humans anymore.

However, a few other engines have re-implemented ideas from AlphaGo. Leela Zero is one of them.

AlphaZero and the other open-source deep learning Go engines seem to be able to give 2 or 3 stones handicap to the top human players.

In my experience, between two strong players:

2 stones means "I can lose some even games against you but not often"

3 stones means "I won't lose a single even game unless I'm drunk"

4 stones means "You really didn't understand this game yet"

Here's a particularly extreme example:


It's a mate-in-93 puzzle that's fairly accessible to humans, using abstract reasoning. But not chess engines. Comparing against the OP article, the main "technique"/"trick" is zugzwang (#7), but on a dramatic scale.

I think this is the kind of position you could use to stress-test a candidate puzzle solver, just because of the sheer size of the solution.

Does anybody know if advanced chess/centaur chess (chess play where a human uses a computer for assistance) is still a thing/whether a human+computer combo is a meaningful improvement these days (i.e. last couple of years) over just a computer.

I can't find any recent advanced chess tournaments and though I see quotes of people saying that the combo is stronger than a computer alone, I haven't found any recent examples of a top tier engine by itself losing to a human + engine (e.g. Stockfish + human vs Stockfish).

I don't know about chess, but in the similar game Go, the very best centaur teams were at a similar or maybe even slightly higher level than engines until recently. This was due to cheese strategies, details of the rulesets and better extrapolation of intermediate results.

However, this changed a few years ago, when engines learned many of the tricks that the human could contribute. Since then, I believe pure engines are stronger in all practical applications.

Source: am national champion in centaur Go and worked on modern Go engines

Fully automated engines are now probably even with centaur teams.

See, for example, this great write-up: https://www.gwern.net/Notes#advanced-chess-obituary

A quick glance doesn't seem to give conclusive evidence that pure engine play strictly dominates centaurs (the footnotes only have tournaments where centaurs still win, but these tournaments are also getting a bit old).

The usual messaging I see around centaur-based styles such as certain correspondence chess tournaments is that you will lose if you just do "push-button play," that is just blindly do what the computer tells you to do.

I'm curious if that's no longer true with the new crop of ML engines.

You're absolutely right! Edited the gp from "strictly dominate" to "are probably even with". My memory of the piece was playing tricks on me. That was a serious error, thanks for catching it.

My best guess based on a rereading of the footnotes is that the performance ceiling for chess is probably low enough that it has been ~reached by both centaur teams and pure engines. So the two would have been operating neck-and-neck as of ~2017, with win rates largely determined by random-ish factors like human mis-clicks (also mentioned in the article).

I feel like I remember even in 2017 correspondence chess players in centaur-allowed matches had centaurs beating people who expected to just be able to set up a cluster and run an engine and copy its moves. And gwern's article seems to me pretty strong evidence that the best centaur players still held an edge in all the tournaments he listed.

I am very curious to hear how that's still possible (or to learn that in fact it is now impossible), especially in the post Leela/Alpha world.

There's a similar situation with Go where some positions utterly confuse bots trained on playing mostly normal games. There was this interesting research blog post on training a bot specifically to become good at solving one of these weird problems (Igo Hatsuyoron 120, the "hardest go problem ever")


It is worth noting that that problem is only understood by humans (as well as it is) after _centuries_ of study by many professional players. My understanding is that even very strong human players must study that problem for days/months/years to really understand it well.

So katago being unable to handle it without special training doesn't seem _quite_ as blind of a blindspot as the chess examples from the article seem to be (I suck at chess and I was able to understand one or two).

I'm not trying to undermine your mentioning this, in case it comes off like that; on the contrary, I think the comparison is quite interesting. I'm curious whether this is just a difference between go and chess, or in the relative abilities of specific kinds of AIs to handle these, or maybe just a difference in humans' ability to craft and/or understand problems of different difficulty in the two games.

Agadmator covered Kramnik v Leko 2002 (in 2018) titled "Invisible to Engines | One Of The Greatest Moves Ever Played" [0] which is worth a watch.

[0] https://www.youtube.com/watch?v=yGnpewUKP88

It seems dubious to show that engines are sometimes bad at evaluating positions by giving a position with three black bishops on black squares.

While unheard of, this is not illegal.

You could theoretically promote two pawns to bishops and have three black bishops. Nobody would do that, as it is usual to promote pawns to queens, but it is within the rules for you to choose.

So if you plan to write a chess engine, it would be pretty stupid not to prepare it to face multiple black bishops. If I knew an engine couldn't handle that, it would give me a lot of advantage.

Underpromotions are one of those "toy problem" things. A human might finish a whole professional career having never once promoted a pawn to anything but a queen, but in theoretical problem positions - like the one you're talking about - they happen "all the time" because it's a fun twist.

So it's understandable for a machine not to even bother modelling these weird cases I think.

The statistics are irrelevant.

If I know a player is bad at certain types of situations I will be trying to force the player into those positions.

And chess engines are vulnerable because they don't learn and can be repeatedly exploited if you can find a way.

In poker this approach can cost you a lot of money. "I will try to force the player into those positions" actually means you're now doing things that aren't optimal for you, based on a possibly false assumption about the world.

I suspect but can't prove that for the positions we're talking about here (opponent's pawn should be promoted, but promoting to a Queen is terrible) in Chess you're putting yourself in a very similar situation, ie. to pull this off you're obliged to deviate so far from good strategy that you invariably lose anyway.

For chess engines playing each other we already know it doesn't matter because the chess engines play far more games than it would take to notice, some of them have exactly this "weakness" and others do not, and of course for chess engines playing humans it's a joke, the humans aren't competitive in that scenario.

edit: This comment is wrong and useless; I'm leaving it up to preserve the logic of the replies.

To quantify, here's raw counts from a representative sample of serious games:

    $ rg -Io '=\w' twic*.pgn | sort | uniq -c
        365 =B
       1551 =N
     122386 =Q
       1393 =R
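
One caveat on that one-liner: `rg -Io '=\w'` will also match promotions (and things like "=D") inside `{comments}` and `(variations)` if any of the PGNs are annotated. A stdlib-only sketch that strips annotations before counting -- the `twic*.pgn` files are whatever collection you have on hand:

```python
import re
from collections import Counter

def promotion_counts(pgn_text):
    """Count promotions in PGN movetext, ignoring comments and variations."""
    # Strip {comments} first, then repeatedly strip (variations) to handle nesting.
    text = re.sub(r'\{[^}]*\}', '', pgn_text)
    prev = None
    while prev != text:
        prev = text
        text = re.sub(r'\([^()]*\)', '', text)
    # Only real promotion suffixes, only from what's left of the mainline.
    return Counter(m.group(0) for m in re.finditer(r'=[QRBN]', text))
```

That still won't tell you whether a promotion was forced or frivolous, of course -- just whether it was actually played.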
I see knight (N) under-promotions pretty regularly on chess streams. The common pattern is that there isn't a free tempo to promote the pawn unless the promotion creates an immediate threat that keeps the attacker's initiative, and a knight at that particular moment creates exactly that threat.

This obviously can't be trivially gleaned from those statistics - but would you say most of the bishop/rook promotions are just showboating, or are they done to avoid stalemates?

I, of course, understand why people would promote to a knight.

I'm an 1850 player and when I am playing under extreme time pressure (e.g. < 2 seconds remaining), I occasionally underpromote to a rook because I can execute a rook+king endgame mechanically faster than a queen+king without an accidental misclick causing a stalemate.

That's interesting. Is it just easier because you can't accidentally move to diagonals, or it's easier in your mind because you've seen it more? If you can even tell the difference.

Not OP, but with rook and king you make a box and shrink it while keeping your king next to your rook, and stalemate is all but impossible. With queen and king you have to get the queen a knight's distance from your opponent's king and mirror their moves until the edge of the board, and at the end you need a slightly bigger L (4 squares) to avoid stalemate while you bring your king around. In the former the two pieces work together; in the latter they work somewhat independently (with a quirk at the end) until the final sequence of moves, so there is a slightly larger risk of error. In practice you should drill this regularly to mitigate the risk.

>or it's easier in your mind because you've seen it more

I think the point OP is making is that if you need to use your mind at all with <2 seconds left, you're going to lose. Pre-moving safe legal moves without hitting a stalemate is definitely easier without a queen.

A rook promotion is something regular chess players will need to make to prevent stalemate, but positions where it is strictly required aren't that common (as opposed to positions where a queen also wins but is trickier, so beginners should go for the rook). A knight is the better promotion often enough that you should study tactics that lead to it, while for a rook it is enough to do a stalemate check before promoting. There are tactics where only a bishop wins, but those are things you study for fun, not because you expect them.

Ahh you're right; I should have thought of that possibility! It turns out my data is more or less garbage; mea culpa.

In fact: all of the =R promotions I inspected were frivolous, and 4/5 of the =N promotions were frivolous. I'm surprised it's that bad. One scenario where this happens is "capture-promote, followed by immediate re-capture".

Props for trying it out even though the data was garbage.

I've seen a game by Alireza where he delivers mate by promoting to a knight. Arguably, the position was winning anyhow and standard promotion would also win, but the knight made it immediate checkmate.

(also this https://www.youtube.com/watch?v=3COS0_p3sfo)

Knight instead of queen offers different moves. Bishop instead of queen offers fewer moves. It's easy to imagine (and demonstrate, as you have) positions where a promotion to knight is advantageous, but it's hard to imagine positions where a promotion to bishop is advantageous except that it may confuse the opponent (computer or otherwise).

Well, on a little further reflection, you might want to promote to a bishop instead of a queen if it helped you force a stalemate when you were confident you couldn't win. Or, as others said, to deny the opponent an opportunity to get a stalemate.

Avoiding stalemate is a reason to underpromote, but it seems rather improbable that such a situation would come up twice in a single game.

I think it's worth noting the value here. The reason these positions are interesting is that they're so obvious to humans, even amateur humans, and yet difficult for even very strong computers. This implies that the way computers "understand" the game is radically different; an engine can't reason logically about a chess puzzle the way a human can, for instance.

I would say the computers operate in a binary way; they don't have the "spectrum brain" that humans have, with various components for analyzing and seeing different scenarios. It's as if humans have spatial awareness whereas computers have a one-track mind.

The linked article includes a concrete example of such a case wherein a GM underpromotes to a rook to avoid a stalemate against another GM.


I could imagine such a move making sense with the king on the back rank, such that promotion to a queen would trigger a stalemate.

If it is in a tournament, it could even happen for reasons not related to the position itself. Let's say you have a winning position, but you are in severe time pressure. You really need a few moments to stop and think to figure out possible stalemate swindles your opponent might have.

You are about to promote to a queen. You realize that both of your bishops are still on the board, and promote to a bishop instead of a queen.

That lets you stop the clock and summon an arbiter to go find another bishop of the right color. While the arbiter is off dealing with that, you get your much-needed thinking time.

In rare cases this might happen; twice in a single game is very unlikely.

The only situation where this might be worthwhile is one that leads to a forced mate. Otherwise the queen is valuable enough that you can afford the tempo loss.

> So if you plan to write chess engine it would be pretty stupid of you to not prepare it to face multiple black bishops. If I knew that it would give me a lot of advantage.

It would not.

Rybka (formerly the #1 chess engine) doesn't support promoting to bishops. The author chose an efficient move representation that couldn't handle more than four promotion types. Not quite the same situation, but I can be sure the difference is small even against an opponent trying to exploit it.

Isn't that a big advantage... if you can promote to bishop and the other player crashes, wouldn't you win by default or something?

So long as it doesn't crash, the engine just evaluates the position when it happens. Bishop promotions are rare.

Wait, there are only four promotions! Queen, rook, bishop, and knight – you're not allowed to keep a pawn on the back rank.

Good point. If I remember right, the engine used one state to represent "no promotion happening this turn". Of course it's redundant, because you can determine that from the rest of the representation, but apparently it was worth being able to determine that with a single bitwise operation.


> each move is easily encoded as a 16 bit value: 6 bits for origin square, 6 for destination square, 1 for en passant, 1 for castling, and 2 for promotion. The 2 promotion bits can only represent 4 options... and since the move encoding is for ANY move, not just promotions, one of the options is "no promotion"[0]

And another tidbit from that page: the same encoding is used for making moves and retracting them (going back up the search tree), so efficient encoding of "no promotion" is important there. E.g. If there's a white rook on e8 and the last move was e7-e8, when retracting it you need to know whether to put a pawn or a rook on e7.

[0] http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?tid=1...
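
The layout quoted above is easy to sketch in code (field names are mine; this illustrates the scheme, not Rybka's actual source):

```python
# Bit layout from the quote: 6 bits origin, 6 bits destination,
# 1 bit en passant, 1 bit castling, 2 bits promotion.
# With code 0 meaning "no promotion", only three real pieces fit -- hence no bishop.
PROMO_CODE = {None: 0, 'N': 1, 'R': 2, 'Q': 3}
PROMO_PIECE = {v: k for k, v in PROMO_CODE.items()}

def pack_move(origin, dest, promo=None, ep=False, castle=False):
    return (origin | (dest << 6) | (int(ep) << 12)
            | (int(castle) << 13) | (PROMO_CODE[promo] << 14))

def unpack_move(m):
    return (m & 0x3F, (m >> 6) & 0x3F, PROMO_PIECE[(m >> 14) & 0x3],
            bool((m >> 12) & 1), bool((m >> 13) & 1))
```

Extracting the promotion is a single shift-and-mask, which is the efficiency the author was after.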

Actually, now that I think about it, promoting to a second king would be pretty sick. Then you have a spare, so losing the original doesn't lose you the game immediately. I could see an incredible game where someone uses this to salvage an otherwise inescapable checkmate.

Yes. It’s possible some engines represent bishop-on-black as a different thing than bishop-on-white, which would make 5 with bishops and 3 without.

There are other, more realistic examples, like the Nakamura game sacrificing 2 useless rooks to give the engine a false sense of winning. Then the engine sacrifices a valuable pawn to avoid a 50-move draw. Turns out the pawn was much more valuable than the 2 rooks in that particular closed position, and Nakamura goes on to win. Basically, the only way to beat an engine is to try to create an unusual situation that the AI hasn't practiced before.

But as others point out, AI suffers when it doesn't have enough experience in a particular situation. So the author is really just pointing out the extreme edge cases AI hasn't mastered yet. Over time, with these examples fed into the training process, there's no reason to believe the AI couldn't learn these situations as well.

Once you tell a human 'there's a tactic, go', given enough time we find it.

Is it though? If humans instantly realise that the three black bishops have some special significance and a chess engine does not then apparently the chess engine lacks some understanding that humans do have.

Sure, but it's 'meta' knowledge.

If, for instance, the name of a chess problem includes the solution in it, written in plain English, no current chess engine is going to pick up on this bit of 'meta' information - yet any English-speaking human trying to solve the problem would be trivially able to use it.

Does that make humans 'better' at chess than chess engines? Not really. Does it make humans 'better' at solving a given chess position, given some metacontextual clues? It does, but it is a bit contrived.

It's not exactly 'meta' knowledge, it's something that anyone with a few hours of experience in chess would be able to pick up on. It doesn't require knowledge of anything other than chess.

If an AI can't pick up on it then that suggests it lacks parts of what to humans would be a basic understanding of chess.

It does, however, seem to be the ‘secret sauce’ to humans’ ability to generalise our problem solving across domains.

It's not 'meta' knowledge, it's a basic rule of the game.

Promoting to a bishop is a rule of the game.

Solving a chess puzzle where there are for some reason three black-square bishops - recognizing that the bishops are important to the puzzle is meta-knowledge.

I misunderstood your point initially, but yes I agree. In a chess puzzle that was constructed yes it is meta-knowledge. But in the course of a normal game it is not. Importantly this meta-knowledge is not strictly required in either situation.

The distinction in my mind, was that the meta-knowledge here is not about the game of chess, but about the opponent. In the case of the puzzle this is meta-knowledge that the puzzle designer gave you a hint. In the case of Nakamura's game, his meta-knowledge is information about his opponent (the chess engine) and the heuristics it uses.

Chess engines only see the board state (at least to my knowledge), while humans have some kind of information about their opponent, or about the history that led to that board state. Humans have extra information, and sometimes that information is exploitable.

I agree. I’m shit at chess and even I can see the solution to the first problem presented.

If we’re just going to say, “no one will ever do that in real life” then the humans will just start finding AI weaknesses and how to maximally exploit them.

I play board games and I definitely “play my opponent”.

If you can force 2 surviving bishop promotions, you can force 2 queen promotions. There's no need for finesse when you can utterly crush instead.

The Hasek study in #2 is bugging me: Why can't Black simply play 1. ... Rh8 ? It looks to me like this would gain the one crucial tempo to win the game. None of the discussion of the study I've found seems to consider this move (And the online analysis engines are useless, as they don't properly understand that the original line is a draw, as the article notes).

ETA: Corrected typo, I originally wrote 1. ... Re8, which does not accomplish anything.

I think white just continues moving the king toward the center -- if black plays waiting moves, white can bring the king to c1-d1-e1 and then shuffle the king between e1 and f1. Eventually white will sacrifice the rook in the same pattern as the main line, and then play Ke2 -- there's no "wrong" square for the white king to be stuck on such that Ke2 would be unreachable, so black doesn't gain anything from the extra waiting moves.

Ah, sorry, I had a typo, I meant 1. ... Rh8, after which the rook sacrifice does not quite work the same way anymore. I agree that 1. ... Re8 does not help.

Ah, understood -- I spent some time trying to figure it out after seeing your clarification. You are right that if you try the same rook sac technique, white won't be able to get the white king over in time to defend g2.

It turns out there is another very impressive move white needs to spot: after 1...Rh8, white needs to find 2. Rf8! and now black has to either trade into the drawn king and pawn ending, or take the rook and give back the tempo that's just been saved. Now the white king is in time again. Quite a move, that should be listed as a sideline in the puzzle for sure!

Brilliant! That's the move I was missing.

Do you mean the Hasek puzzle which starts Kb1 for white? Rh8 doesn't change the approach for white here, in fact the move order is exactly the same. You want to block the rook with the king for one move so that the white king has enough time to protect g2 before the rook arrives.

So move order (as I see it) would start:

1. Kb1 Rh8, 2. Rh6 Kxh6, 3. Kc1... etc.

No, if you try that line, you will find that black gets to h2 one move earlier because its king did not move to g7 first, and therefore the white position collapses. But trykondev found a correct answer 2. Rf8!

Ah you're correct, my bad!

So I actually checked the problems listed.

In the "IQ Test #52" position (FEN: 8/1p1q1k2/1Pp5/p1Pp4/P2Pp1p1/4PpPp/1N3P1P/3B2K1 w - - 0 1) listed in #1 both LC0 and Stockfish play the correct line on my computer in seconds.

Both of them also play the right move Ba4+ in the second position of #1 "William Rudolph vs." (FEN: 8/1p1q1k2/1Pp5/p1Pp4/P2Pp1p1/4PpPp/1N3P1P/3B2K1 w - - 0 1) but they take quite a bit longer. Stockfish variants get it quicker.

Stockfish solves "Hasek vs." (FEN: r7/7k/5R2/p3p3/Pp1pPp2/1PpP1Pp1/K1P3P1/8 w - - 0 1) listed in #2 quickly. Both Stockfish and LC0 solve "Lazard=F vs." (FEN: q7/8/2p5/B2p2pp/5pp1/2N3k1/6P1/7K w - - 0 1) quickly.

Stockfish gets Bh3 from #3 "Veselin Topalov (?) vs. Alexey Shirov" (FEN: 8/8/4kpp1/3p1b2/p6P/2B5/6P1/6K1 b - - 2 47) in seconds (7-piece end game tablebases installed).

The next position from #3 "Spassky, Boris V vs. Byrne, R." (FEN: 3B4/1r2p3/r2p1p2/bkp1P1p1/1p1P1PPp/p1P4P/PPB1K3/8 w - - 0 1) is easy for both Stockfish and LC0. They both get 50. c5!! right away.

Stockfish also gets the last position from #3 "Stefan Brzozka vs. David Bronstein" in seconds (FEN: 1r6/4k3/r2p2p1/2pR1p1p/2P1pP1P/pPK1P1P1/P7/1B6 b - - 0 48) Rxb3+.

Stockfish and LC0 see Kd1 in "Lamford=P vs." from #4 (FEN: 8/8/8/1k3p2/p1p1pPp1/PpPpP1Pp/1P1P3P/QNK2NRR w - - 0 1) but believe Rg2 is also winning.

Stockfish gets c8N from "Randviir=J vs." (FEN: 5nr1/2Pp2pk/3Pp1p1/4P1P1/6P1/5K2/8/7n w - - 0 1) in #4 in about 2 minutes on my computer.

Stockfish gets Bf5 from "Simkhovich=F vs." (FEN: 8/8/2pk4/8/p1p3B1/PpP5/1P6/r1NK4 w - - 2 2) in #5 in seconds. LC0 also gets it.

The next two positions are mentioned as easier for programs and they are:

Qe3 from "Deep Blue vs. Garry Kasparov" in #6 (FEN: 1r6/5kp1/RqQb1p1p/1p1PpP2/1Pp1B3/2P4P/6P1/5K2 b - - 14 45) is very easy for both Stockfish and LC0.

Both also get "Vladimir Kramnik vs. Peter Leko" (FEN: 6k1/5p1p/P1pb1nq1/6p1/3P4/1BP2PP1/1P1Nb2P/R1B3K1 b - - 0 25) in #6 quickly.

"Matous=M vs." (FEN: n2Bqk2/5p1p/Q4KP1/p7/8/8/8/8 w - - 0 1) is indeed harder for Stockfish and LC0 than expected. I've confirmed mate in 13 in my mate solver.

In "Nigel Short vs. Vladimir Kramnik" (FEN: r3r1k1/1bp1Bppp/pb1p4/1p6/1P6/1BP2P2/P1P2PKP/R3R3 b - - 6 19) from #9, the engines like ...a5 more than ...c6. Hard to say that this move doesn't also win without more analysis.

Stockfish wants to play c6! and b4! from "Marwitz=J vs." from #10 (FEN: 2K3k1/1p6/R3p1p1/1rB1P1P1/8/8/1Pb5/8 w - - 0 1) right from the start. LC0 takes longer but gets it as well.

In "Anish Giri vs. Maxim Rodshtein" (FEN: 8/5pkp/3p1np1/Rpr5/8/6P1/PB3PKP/8 w - - 6 34) from #10 both Stockfish and LC0 like 34. h4 over 34. a4. More analysis would be needed to see if h4 is not also winning.

Last position "IQ Test #16" (FEN: 5k2/4bp2/2B3p1/1P4p1/3R4/3P2PP/2r2PK1/8 b - - 0 1) takes around 10 seconds for Stockfish.

In summary, chess engines might not really "understand" these positions, but they solve them pretty well.

I don't think your analysis makes sense, because seeing the first move is not relevant in most of these problems. For example, in the Lamford study:


If you spot the computer the first twelve moves and create this position:


My Stockfish still fails to see the key idea of Rc2 followed by Ka2 (at least for a few minutes until I got bored) -- it just wants to shuffle the rooks around forever.

I'm sure that some engines given enough time will solve some of these, but your check isn't accurately assessing their capabilities.

No, my Stockfish 13 definitely sees the right solution to 8/8/8/1k3p2/p1p1pPp1/PpPpP1Pp/1P1P3P/QNRKRN2 w - - 0 1 in less than 10 minutes. The score shoots up to over +20 in 6 minutes and it sees both Rc2 and Qa2. I generally looked at the winning lines, not just at the first moves. Any other positions where you think I was wrong?


Maybe a more realistic scene than the fanbase gives it credit for. :)

If it weren't a well-studied gambit perhaps...

I think there are some engines (Crystal is the one I'm thinking of) which do well in fortresses; these come at the cost of play strength.

Presumably the next step is a synthesizer which can choose the appropriate engine to delegate to based on its own reading of the board.
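
As a toy illustration of that kind of dispatcher (the threshold, engine names, and "blocked pawns" heuristic are all invented for the example):

```python
def count_blocked_pawns(fen):
    """Count white pawns with a black pawn directly in front of them."""
    rows = fen.split()[0].split('/')          # FEN lists rank 8 first
    board = []
    for row in rows:
        expanded = []
        for ch in row:
            expanded.extend(['.'] * int(ch) if ch.isdigit() else [ch])
        board.append(expanded)
    blocked = 0
    for r in range(1, 8):                     # board[r][f]: r=0 is rank 8
        for f in range(8):
            if board[r][f] == 'P' and board[r - 1][f] == 'p':
                blocked += 1
    return blocked

def pick_engine(fen):
    # Invented rule: a heavily locked pawn structure suggests a fortress-aware
    # engine (e.g. Crystal); otherwise delegate to the strongest engine.
    return 'crystal' if count_blocked_pawns(fen) >= 4 else 'stockfish'
```

In practice you'd want much richer "closedness" features, but the delegation shell itself really is this simple.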

Fortresses and the types of closed positions that confuse engines are exceedingly rare. The benefit of fortress detection is necessarily overtaken by the runtime cost.

Has anyone actually checked that a modern chess engine believes black is winning in the Penrose position? I find it very hard to believe.

Lichess cloud compute has that position. It still thinks black is winning at depth 59 (Stockfish 11 HCE).


Indeed, and same with Stockfish 13 running locally.

Stockfish 12 (280920) on my phone rates black as up 19.71

It looks like Stockfish still thinks black is winning. I'd be curious to see if Leela Chess Zero also thinks black is winning.

Is that 59 ply? 59 moves without a pawn move or capture means the game was drawn nine moves ago.

Here's what's confusing me: if the chess engine has an initial numeric value for the current position, and the positions going out to the end of its search all have the same value or worse, shouldn't it be pretty trivial to conclude it is not making progress?

Yes, I guess Stockfish doesn't bother to consider a draw by the 50-move rule. It does, iirc, go for a draw via threefold repetition if it has nothing better.
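
One way an engine can fold this in, rather than detecting "no progress" explicitly, is to damp the evaluation toward zero as the 50-move counter climbs; Stockfish's evaluation does something in this spirit, though the formula below is a made-up toy:

```python
def damp_by_rule50(eval_cp, halfmove_clock):
    # Scale a centipawn score linearly toward 0 (the draw score) as the
    # counter approaches the 100-ply limit of the 50-move rule.
    return eval_cp * (100 - min(halfmove_clock, 100)) // 100
```

A +2.00 score with 90 quiet plies on the clock then reads as only +0.20, so lines that shuffle pieces without making progress stop looking attractive.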

Stockfish does plan around the 50-move rule. Here's a quick demo (it is silly):


It's not visible on the board itself, but this position's 50-move rule counter is set to 98 ply -- two ply short of a draw. (That's the 2nd-to-last field in the FEN representation -- the editable text field just below the board, and also in the URL.) The only way to avoid that draw, as Stockfish does in fact suggest, is to sacrifice a piece in a specific and ridiculous way, to compel black to re-capture on the 100th ply (captures reset the 50-move rule). If you edit the 50-move counter down, Stockfish instead shows you the simple mate-in-3.
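
For anyone who wants to check or edit that counter programmatically, it's the fifth whitespace-separated field of the FEN (a trivial helper):

```python
def halfmove_clock(fen):
    # FEN fields: placement, side to move, castling rights, en passant
    # square, halfmove clock (plies since last capture or pawn move),
    # and fullmove number.
    return int(fen.split()[4])
```

For example, the Topalov-Shirov FEN quoted elsewhere in this thread, "8/8/4kpp1/3p1b2/p6P/2B5/6P1/6K1 b - - 2 47", has a halfmove clock of 2.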

Nakamura famously tricked a different chess engine, relying on this 50-move draw avoidance (as one component of the hack) -- a sibling thread is discussing that example:


Politics is about making up rules that others have to follow. Maybe AI is still not good at playing this level of politics.

Yeah, I would have thought these engines would be a lot smarter than this, especially the "AI" ones.

Nice try, chess.com. I'm not going to let you use all those cycles stolen by your site to win chess games :P

Quote: "Since IBM's Deep Blue defeated World Chess Champion Garry Kasparov in their 1997 match..."

Deep Blue lost in 1996. Its upgrade, called Deeper Blue is the one that won the famous match in 1997. Please SamCopland, do your homework.

Do you have a source? Deeper Blue was only ever an unofficial nickname. The upgraded Deep Blue was still Deep Blue.

Per Wikipedia, it was a "heavy upgrade". It's the same as how today's Google shares a name with the one launched at the beginning of the 00s: do you really think that, from a technological perspective, it is the same search engine rather than a heavy upgrade? The only things Deep Blue and Deeper Blue have in common are that they were put in the same case and created by IBM; everything else is different.
