By the end of the night, the bucket was empty. People learned to cheese the bot by running "out of bounds", so to speak -- normally in a 1v1, you're supposed to stay close to your opponent, since they'll be getting stronger if you leave. But the bot didn't know how to deal with it when you snuck behind it (normally an insane maneuver, almost guaranteed to cost you the game in normal play) and prevented its army (called "creeps") from running to the middle lane. (All you have to do is go wave your hand at them, and they mindlessly chase you.)
People did that over and over, and your own army would eventually overwhelm the bot and win. :)
Do you know what year that was? I'm wondering if AI has advanced and patched those weaknesses.
I feel comfortable saying that it’s awesome and being immodest, because I didn’t write it! None of those notes are mine. “Crossing the Channel” has one of the strangest tempos I’ve heard, yet it sounds so cool because I think the model made a mistake early on, and decided it wasn’t a mistake — instead, it came up with the most likely not-mistake it could think of, and it happened to sound great.
But I “wrote” all ~20 songs on that list in one night. Took about six hours, aka a standard workday. I don’t think it’s production-grade music - far from it - but it also wasn’t a production-grade model. (On the other hand, it was trained by gwern, so I’m skeptical the production models of the future will capture all the little magic that he seems to imbue into his.)
I chose the instruments, I decided what sounded good, but I didn’t write a single note. It felt a lot like listening to someone jam out, and asking them to play different things. I was just the fella who was there to listen.
That’s where ML is going to shine. The future is going to be so exciting with the things you’ll be able to do.
But not in the current direction, I think. Currently, the goal is to factor the human out of the equation entirely. So the answer to your question is: even if the AI has “advanced”, it's only because they added this case to the training set. If a human were at the helm, they would have annihilated you, because they’d see what you were doing and pilot the AI over to you.
I am skeptical this case is solvable, in a fully automatic way. IMO human empowerment will happen by arming actual humans with these models. (Depressingly, “arming” might have a few meanings depending on how AI war turns out, but that’s a different conversation.)
The problem I think AI-generated music has to overcome is that it kind of meanders around without coming to any "conclusions". That said, perhaps this is the perfect music for wandering around a town in an RPG!
I don’t think it will be long before AI composers are better than all but the most talented humans.
In chess, these players are called centaurs. Per my limited knowledge of the current chess meta, they are considered to be much better than either people or AI in isolation.
That said, most board games are a very limited domain. Even more complex computer games prove more challenging than the current AI approach can handle without weaknesses (see comments about Dota on this page).
In complex real-world domains, humans can definitely still add much value relative to computers working by themselves.
"considered to be much better than either people or AI in isolation."
Then, last year, I stumbled across the following pair of articles. It's about the correspondence chess world championship, where both players are allowed to use engines, and (crucially) have a very long amount of time to analyse each position in-depth and consider long-range strategic implications of each move. The chap interviewed learned to exploit his opponent's over-reliance on the engine, and played in such a way that he was able to gradually accumulate small but compounding positional advantages that eventually gave him an edge. The whole time he used his own engine to catch tactical weaknesses in potential move sequences. A fascinating read.
Some of the other players surely try just running Stockfish overnight and taking whatever it says is the best move, and apparently that strategy isn't equally good.
Note that I think the "centaurs" are kind of playing a pretty different game than regular chess: they might have a knack for knowing when a particular engine is weak or strong and then trusting the corresponding engines lines more, or something like that.
I was eagerly hoping for a video of a centaur vs centaur tournament, but there seem to be none. https://www.youtube.com/results?search_query=centaur+chess
At least, not yet. Maybe someone could do a tournament with https://www.twitch.tv/gmnaroditsky or https://www.twitch.tv/gmhikaru? Actually, the Gotham Chess guy might: https://www.youtube.com/watch?v=O1b-cuPDBZo&list=PLBRObSmbZl... He's always looking for new angles for his Youtube channel.
Centaurism isn't applicable to every situation -- Starcraft's AI kicks the butt of every pro player, because it can scale up and overwhelm its opponent. But there are often already humans in the mix, in these models. The humans are just designing the loss functions or deciding what to model, rather than using the results. So it's a "delayed" mechanical turk in that sense.
When done right, it's so effective that it feels like cheating^Wthe future: https://twitter.com/theshawwn/status/1182208124117307392
Those are some of the nicest Stylegan outputs I've ever made. (Uh, do me a favor and ignore the Zeus one...)
Each of them was crafted. The process was to start with something, and that "something" often didn't need to be anything close. For example, if the photo is an old man, but StyleGAN's showing a kid, you turn up the "age" slider. Do that for every feature; it felt like the character creation screen in Fallout.
Amazed it isn't an app yet. Artbreeder is nice, but it's not the same -- the key component was to have Peter Baylies' (follow him! https://twitter.com/pbaylies) reverse encoder as a button you can press. Whenever the model gets too far from what you're thinking, you press it, and it morphs the face back closer to the target photo. In the process, it might distort the age slightly, or make the chin a little bigger, but it's an anchor at sea; it's why you can nail your final result, every time.
I predict Centaurism might be popularized by gamedev. It's going to be pretty neat when some studio trains an RL algorithm vs someone's heart rate. Higher heart rate = more enjoyment, lots of the time, so you'd end up with either the funnest game or the scariest game you've ever seen.
Probably a decade away from that though.
Nah, it's because it has superhuman mechanics. If AlphaStar didn't rely so heavily on its mechanics, it might have produced interesting insights.
Everyone in the SC2 community was hoping for that, but it didn't really happen.
One streamer (Lowko I think) analyzed a game he played against AlphaStar where it would "macro" by going to some position where you could just see all the production structures (some by only a few pixels on the edge of the screen) and just click all the buttons needed for unit production in a few milliseconds.
Those Brood War bots hit 36 thousand actions per minute, and only because of technical limitations (the game couldn't accept more). And those were "perfect" clicks - not spam.
Limiting a bot to 200 actions per minute (or even less) makes it more comparable to a human, since at some point every click becomes a resource too. Even the best players have to stop macroing for some (relatively short) time, while an AI with 36,000 clicks per minute is not limited by time.
After a recent article I saw on HN about an AI-driven text adventure game, AI Dungeon, I decided to give it a try. The article was about some issues with the company and content filters or something, but my number one impression from trying it out was that it would make a great base for writing short fiction.
At least that was what it seemed like to me. The first few times I tried it, playing it like a game, it was like wandering through a dream that kept changing. After that I figured out the different command modes and realized it was far more useful as something of an idea generator, where it's more like a collaborative writing partner than a dungeon master, and I found it more enjoyable.
I think it would be a great tool for a writer who's stuck with a vague idea and a concept but isn't sure where to take it. As a game by itself, though, the human element is definitely missing. It seems to work best when you're the one controlling the narrative, bouncing ideas off the AI, as opposed to human-crafted text adventures where the game is more in control.
> Tom Gruber, co-creator of Siri, wants to make "humanistic AI" that augments and collaborates with us instead of competing with (or replacing) us. He shares his vision for a future where AI helps us achieve superhuman performance in perception, creativity and cognitive function -- from turbocharging our design skills to helping us remember everything we've ever read and the name of everyone we've ever met. "We are in the middle of a renaissance in AI," Gruber says. "Every time a machine gets smarter, we get smarter."
I'm a big believer in Intelligence Augmentation, in the Engelbart tradition, over completely independent agents. I don’t think my ideal AI is one that has its own sense of agency and autonomy, rather I think of technology as an extension of myself, one that enhances my autonomy.
Your argument is basically good old god of the gaps. You look at what the state of the art in technology cannot do right now, and base assumptions on that without really delving into the issue. 10 years ago you would've included stuff like image classification and natural language generation in the "only humans can do this" bin.
It won't get you quite as far, since I had to discover that you can prompt it with chords, and how to set the instruments. But it's pretty good, and maybe someone will discover some new way to make it better.
I don't think this is controversial at all, rather it's what Donald Michie called Ultra-Strong Machine Learning back in the 1980's .
Briefly (and with modern interpretations) Michie defined three "levels" of machine learning system: Weak, Strong and Ultra Strong.
A machine learning system exhibits Weak machine learning ability when it is only capable of improving its predictive accuracy when trained on data.
A machine learning system exhibits Strong machine learning ability when it satisfies the Weak machine learning criterion and can additionally output its model in symbolic form that is readily understandable by humans.
And a machine learning system exhibits Ultra Strong machine learning ability when it satisfies the Strong machine learning criterion and can additionally instruct the human user so as to improve the human user's performance.
In a sense, not a bicycle, but more like a jetpack, for the mind. Unfortunately we don't have anything like that today. Yes, I'm aware of claims that Go players have learned a lot by observing AlphaGo play. But AlphaGo/Zero/blah is not capable of instructing a human directly. Michie envisioned Ultra Strong Machine learning as a kind of coach basically, or teacher, an AI teacher.
 Michie, D. (1988). Machine learning in the next five years. In Proceedings of the third European working session on learning (pp. 107–122). Pitman.
Online here (but behind a paywall):
Full disclosure: I'm one of Donald Michie's grand-students; he was my thesis advisor's thesis advisor. Additionally, Michie was besties with Alan Turing, so I have the privilege of being a grand-nothing of Turing's :P
The way it works is, there’s something called ABC notation, and Gwern found a big ass-database of Irish folk songs. It has a “title” field along with tempo, ID, etc. I would fill in what key I wanted, give it the first chord, and let it go. (If you don’t give it the chord, it generates ABC songs that are all kind of boring piano pieces with no background chords. It was amazing how much difference that made.)
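To make the workflow above concrete, here is a hypothetical sketch of the kind of ABC-notation prompt being described. The title and key are my own examples, not the actual prompts used; X/T/M/L/K are standard ABC header fields, and a chord is written in quotes before the notes it accompanies.

```python
# Hypothetical sketch of an ABC prompt: fill in the header fields and the
# first chord, then let the model continue generating the tune.
prompt = "\n".join([
    "X: 1",             # tune number
    "T: For Ireland!",  # title field -- the model treats it like any other token
    "M: 4/4",           # meter
    "L: 1/8",           # default note length
    "K: Dmaj",          # the key you want the tune in
    '"D"',              # the first chord, which the model continues from
])
print(prompt)
```

Leaving out that final quoted chord is what (per the comment above) tends to produce sparse, chord-free piano pieces.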
So, not only did it choose the song names, but it couldn’t not choose a song name. It’s the only thing it knew. Its whole world was Irish ABC folk music, and as far as it was concerned, the title was as important as the notes. It couldn’t know it wasn’t.
Ah, I figured out what had been bothering me. I’m pretty sure I chose “For Ireland!” and possibly Blackbird, but e.g. Crossing the Channel was a GPT original, I believe. I was aiming to make it feel like an oldschool FF3 type game, so For Ireland was the battle song, Blackbird was the name of the airship you’d make your daring escape on at the end, Marco’s shadow was the assassin team hired by the empire to take Marco out, etc.
You don't 'patch' a reinforcement learning model, you just give it data to train on.
They're way too complex. Even the above-mentioned example was 1v1, while in general those games are 5v5 (in LoL, at least).
There were restrictions / rules on the game (only a pool of 18 heroes was allowed), but they won both games in a best of 3. You can see the progression / evolution of the training of the AI here: https://openai.com/projects/five/
It's definitely doable to train them to be good at extremely complex games like Dota / League, it's just that the resource requirements to train the engine are significant. After the bots were opened to the public, they had a 99.4% win rate against pubs, even accounting for cheese strats.
In League of Legends there are almost 150 heroes.
Not only does the whole draft phase produce a giant number of possibilities if you apply combinatorics; the number of in-game possibilities that result from any given draft is also difficult to imagine.
I didn't play much Dota (I'm a LoL player), so I don't understand those limitations:
Pool of 18 heroes (Axe, Crystal Maiden, Death Prophet, Earthshaker, Gyrocopter, Lich, Lion, Necrophos, Queen of Pain, Razor, Riki, Shadow Fiend, Slark, Sniper, Sven, Tidehunter, Viper, or Witch Doctor)
No Divine Rapier, Bottle
5 invulnerable couriers, no exploiting them by scouting or tanking
But good to know that I'm relatively close to being 100% wrong here, thanks.
For the limitations: Divine Rapier and Bottle are items with unique interactions. Divine Rapier drops on death, and Bottle has unique interactions with elements on the map that require interacting with the environment in a specific way.
Illusions are the same as LeBlanc's Mirror Image skill: just copies of the hero that deal either no or limited damage, but can't cast spells. Likely they'd have to train the model to include an evaluator of the likelihood of a unit being an illusion rather than a hero.
Couriers bring items from the shop in base to your hero, so you don't have to return to shop. They are killable however, and if they are holding items when they are killed those items will be inaccessible for 3 minutes. They made them invulnerable because the bots would re-buy items that were inaccessible. Invulnerable couriers can be exploited however, thus the rule.
Scan is a global ward that only tells you whether or not someone is in the area (doesn't give you vision), and lasts 5 seconds.
It certainly beat OG in many aspects, but the beauty of dota is the ability to adapt in the game with different strategies, strategies which weren't possible with the restrictions.
Not too long ago, I think some people would have said the same about go.
Also, those games are waaaaaaaaay more complex than e.g. chess.
I guess we'd need a giant advance in computational power in order to analyze hundreds of thousands of matches.
Plays "normally" (cautiously since the bot is so good) until 7:00 accumulating gold for enough items to more safely eliminate the creeps: https://www.youtube.com/watch?v=R85WVTFPmRE&t=420s
I think these kinds of adversarial examples will be extremely common in production models. People won’t be crafting images that fool the model into thinking you’re a stop sign; they’ll discover that when the human isn’t paying attention, you can run in front of a Tesla with a group of friends and it veers into oncoming traffic. (Terrible made-up example, but I’m pretty sure that it’s a losing game to play “can we think of all possible cases we need to train for ahead of time?”)
You can watch Black beat the bot in a fair game here, though, which I find immensely satisfying: https://youtu.be/qov1NXsTSbs?t=88 IIRC he was one of only like... five? (less?) who managed to win one game.
>A good example of human exploitation of the engine's failings is GM Hikaru Nakamura's defeat of Rybka in the following three-minute blitz game from the Internet Chess Club. Nakamura cleverly locks the position so that progress is impossible for Rybka, then he offers two exchange sacrifices to the engine. With the position locked, the engine's rooks have no value, but the engine thinks it has a winning material advantage. With a draw by the 50-move rule approaching, the engine sacrifices a pawn to avoid a draw, but this proves a huge mistake as Nakamura is then able to win the game, an incredible achievement!
That rule isn't the engine's constraint, that's a rule of chess. 50 moves without a pawn move or piece capture (basically "irrevocable progression of the game") is a draw.
The chess engine thought it was winning, so it wanted to avoid a draw, even if it meant putting itself in a still winning but otherwise worse situation. Unfortunately for the chess engine it was mistaken, in the original position it wasn't winning, but drawing, and in the new position it wasn't still winning, but losing.
A player may claim the draw if they wish.
This game was also in 2008, I don't think cell phones could even beat all human beings back then.
These days a human couldn't draw against a cellphone, contrary to the article.
So it lost a “small” amount of material which transposed the game from one that was an obvious draw to one that was obviously winning for Nakamura. That’s the “contempt” effect that Nakamura kept referring to: the computer would rather continue the game in a slightly-less-good position than allow a draw from a better position.
Nakamura exploited the fact that it was wrong about its assessment due to the horizon effect. A traditional chess engine evaluates a position by starting from the current position and evaluating promising moves downward along the tree. In a locked-up position like the one linked, the computer has to evaluate an enormous number of positions, none of which will materially affect the game due to the inability for either side to force meaningful progress.
A human can understand the themes in a position and understand where they want the pieces to go, then work backward from there to see a path to achieve it. This is particularly easy when the opponent has no achievable setup of their pieces where they could counter you once you’ve achieved that position. In such a situation, you can (almost) completely discount any moves they make and only consider the path for you to get to your ideal position. Even a beginner-intermediate chess player (1600+) could analyze this type of position easily.
This was such a situation. Rybka had no moves to progress things (other than the game-losing sacrifice Nakamura wanted to induce). So Nakamura didn’t have to evaluate any of Rybka’s moves, he just had to put his pieces in the right place to be ready for the inevitable blunder when Rybka decided to sacrifice a pawn to keep its “winning” position thanks to contempt.
So can a computer though. This is an interesting edge case, but it seems to me that this particular bot mostly evaluated odds of winning by the pieces, not by the position. A more modern algorithm will actually use the positions on the board.
What would a human do if told "you're up by two rooks, do you want to allow a forced draw?" without being allowed to look at the board?
Traditional engines have been designed under the premise that it’s better to have an extremely fast evaluation function you can apply deeply down the tree than it is to have a slower, more thorough evaluation function you can’t apply to as many nodes. Any extra processing budget is spent on more intelligent pruning of the tree of moves that don’t appear likely to yield anything promising.
So an engine will look at a position like this, see that it has a sizable material advantage, and doesn’t give as much weight to the obvious positional conclusion that it has no productive path forward. It also won’t understand that “just” sacrificing a pawn turns the position into one that’s completely lost due to the chain of passed pawns it eventually results in.
Yes, the engine could be improved to understand this type of position, but without extreme care that easily results in it taking longer to evaluate every other position. The net effect is an overall weaker engine, unless you find a way to improve the evaluation without slowing down the evaluation function.
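The fast-but-shallow evaluation discussed above can be sketched in a few lines. This is a toy material-only function with the standard 1/3/3/5/9 piece values, not any real engine's code: because it sees only material, a locked position where the extra material can never do anything still scores as a big advantage.

```python
# Toy material-only evaluation: the kind of very cheap leaf function a
# classical engine applies millions of times per second.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material_eval(white_pieces, black_pieces):
    """Material balance in pawns; positive means 'good for White'."""
    return (sum(PIECE_VALUES[p] for p in white_pieces)
            - sum(PIECE_VALUES[p] for p in black_pieces))

# Up two exchanges (rooks vs knights): looks like +4 pawns of advantage,
# even if those rooks are permanently entombed behind a locked pawn chain.
print(material_eval("k" + "rr" + "p" * 6, "k" + "nn" + "p" * 6))  # -> 4
```

Rybka's "winning material advantage" in the Nakamura game is exactly this number being large while the positional truth (no productive moves exist) is invisible at the leaf.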
That's the crux of it though isn't it? These are complaints about a specific approach, not generally of computers, and not even of the best demonstrated models.
I'm also not entirely sure this is the correct way to think about it. If the function that evaluates the accept-a-draw decision is separate from the next-move decision, then once the draw is refused, making the poor pawn sacrifice is, without question, the best available move. My interpretation is that the accept-a-draw decision is made in a vacuum here.
Edit: looks like in the video Nakamura says the contempt factor was too high, and later it was changed.
Also, an engine without an endgame tablebase can be pretty stupid. There are certain rules one can deduce when there are few pieces left, but a min/max engine will search forever, not knowing the patterns.
Just to observe, humans display exactly the same phenomenon of ignoring a move before it's made while still being able to realize, after it's made, that it was very strong and ignoring it was a mistake.
Which... suggests the computer displays intuition
OP's choice of words is inexact. The engine has a different intuition, not none. Humans will usually explore trades and overlook seemingly "slow" moves that the computer usually finds.
What the engine does lack is what chess players call "theory," hence the need for endgame DBs. Humans can also beat engines with "logic," like the Naka match. That one was extraordinary. A simpler & more typical example is isolating a pawn so that it can't ever be defended, but also can't be attacked immediately. For a human, it's easy to understand that this is a long-term vulnerability even if the pawn survives for 20+ more moves.
Anti-computer strategies, ironically, force you to "play the player, not the board."
Furthermore, heuristics are rules, while intuitions are not. Sometimes people use their intuition to go against what a heuristic suggests!
A machine which is just pruning a search space with a low-quality valuation strategy isn't doing anything very impressive; only very fast.
A human with a very high quality valuation technique, but a slow search pace, is relevantly impressive.
They categorize moves into things like blunder, mistake, inaccuracy, good, best.
But the top move is 'Brilliant' and the second best is 'Best Move.' Best move is the move the chess engine thinks is the best, and brilliant is the category for moves that weren't considered, but turn out to be the best. They're uncommon, but if you play enough, you'll hit them occasionally.
Two plausible theories are
1) They run an evaluation of the position to a certain depth and come up with the Best move. Then, when evaluating the position after your move, if you didn't make the Best move but the resulting position evaluates as even more in your favour than the Best move would have given, it calls your previous move Brilliant.
2) They run an evaluation of the position up to a certain depth and note the Best move as the most advantageous one. They then search a bit deeper and if the move you actually took wasn't the best in the original eval but now is, they call it Brilliant.
There would of course be a few caveats to both ideas. In 1 do they go and rerun the evaluation after the Best move to compare? Are they running the whole analysis backwards anyway (this seems to cause problems with both systems)?
My suspicion would be number 2 given the way it's phrased in your link, but I've seen other descriptions that make it sound more like 1. My overall takeaway is that it's simply a bad idea to have the category to begin with, as it just confuses matters for little gain; or at least they should provide a more technical description. I'd wager that close to 0% of Brilliant moves were intentionally calculated, so there was no brilliancy as the term is usually used in chess.
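Theory 2 is simple enough to sketch. This is purely an assumption about how such a feature might work, not any site's actual implementation, and the "engine" here is a hypothetical stand-in:

```python
# Sketch of "theory 2": a move is Brilliant if it wasn't the top choice at
# a shallow search depth but becomes the top choice at a deeper one.
def classify_move(move, best_move_fn, position, shallow=12, deep=20):
    """Label a move by comparing shallow vs deep engine searches."""
    if move == best_move_fn(position, shallow):
        return "Best Move"
    if move == best_move_fn(position, deep):
        # Missed by the shallow search, best once we look deeper.
        return "Brilliant"
    return "Other"

# Hypothetical stand-in engine: a deeper search uncovers a different top move.
def stub_best_move(position, depth):
    return "Qd5" if depth >= 20 else "Rxe5"

print(classify_move("Qd5", stub_best_move, "some locked position"))
```

Theory 1 would instead compare the post-move evaluation against the evaluation the shallow Best move would have produced; both schemes would produce occasional "Brilliant" labels without any human-style brilliancy being involved.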
On the contrary, by your own description:
> Something about the heuristics used to prune the vast search space can make it miss [these].
it does have an intuition. And in these cases, its intuition is wrong.
Rather we should start with what known intelligent systems (ie., animals) are actually doing, and then compare.
I think it's extremely unlikely the mechanism of intuition in relevant animals (esp. humans) has anything to do with heuristics for pruning a search space.
I think it's very likely that intuition in animals will have similar mechanisms to the things that produce the same results in computer reasoning, since the problems are the same and so optimisation processes (whether evolutionary or designed) will likely converge on the same solutions.
For example, consider a circuit. Suppose we attribute TRUE to 3v, and FALSE to 0v. Then two simultaneous inputs of 3v, 3v giving 3v (and so on: 33->3, 03->0, 30->0, 00->0) can be said to compute AND.
But the same physical system is at the same time "computing" OR. Here we attribute TRUE to the 0 signal.
Any "computer" is actually simultaneously running many, quite radically different, programs. We only attribute one program to it in how we rig devices (screens, keyboards, etc.) to interpret signals to mean something device-relative. If you "plug a speaker in a gfx socket" the output isn't visual. It means something completely different, and the GPU is computing audio now.
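The AND/OR claim above can be checked mechanically. Here is a minimal sketch (voltages as plain integers) showing that the exact same input/output table is AND under one labelling of the voltage levels and OR under the opposite one:

```python
# The physical circuit: a fixed mapping from input voltages to output voltage.
gate = {(3, 3): 3, (3, 0): 0, (0, 3): 0, (0, 0): 0}

def as_bool(volts, true_level):
    """Interpret a voltage as a truth value, relative to a chosen labelling."""
    return volts == true_level

for a in (0, 3):
    for b in (0, 3):
        out = gate[(a, b)]
        # Reading 3v as TRUE: the gate computes AND.
        assert as_bool(out, 3) == (as_bool(a, 3) and as_bool(b, 3))
        # Reading 0v as TRUE: the very same gate computes OR.
        assert as_bool(out, 0) == (as_bool(a, 0) or as_bool(b, 0))

print("same circuit: AND under one reading, OR under the other")
```

Nothing about the circuit changed between the two assertions; only the observer's labelling of the voltage levels did, which is the point being argued.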
So "computational" algorithms in being merely purely formal specifications do not describe any unique physical system, and only apply to physical systems in an observer-relative way. Computation is explanatorily bankrupt; it is only a game people play in rigging physical systems to behave as they wish.
Animals however have the intrinsic property of intelligence: it isn't observer-relative. We are, quite literally, the observer imparting meaning to our environment. Our brain states resolve to unique thoughts, not to "many radically different ones simultaneously". Gold isn't both gold AND something else, and neither is thinking.
The category of explanation for these abilities, which are intrinsic to animals, cannot be, and is not, computational. It's scientific, involving characterising the properties of materials and their causal effects.
I'm sure if you spliced an animal's nerves to different places, that would make them "mean" something different. In fact we know that this kind of thing happens in nature, e.g. the star-nosed mole's visual cortex is connected to the parts of its nose that sense touch rather than to its eyes.
> Animals however have the intrinsic property of intelligence: it isn't observer-relative. We are, quite literally, the observer imparting meaning to our environment. Our brain states resolve to unique thoughts, not to "many radically different ones simultaneously". Gold isn't both gold AND something else, and neither is thinking.
How do you know? What would you expect to be different if it was? Often two people do observe the same piece of animal behaviour but disagree about whether it shows the animal was thinking/intelligent.
That's sort of right, yes. But if you put their nerves in the wrong devices, animals wouldn't be intelligent at all. And these nerves are themselves plastic, i.e., they aren't like a CPU with a limited set of physical operations; nerves are part of the body and grow.
So intelligence is really bodily. Intelligence is something that occurs because your brain-motor-body system is highly plastic (at the cellular and whole-body level) and is capable of adapting itself to its environment competently in ways that we call "intelligent".
It isn't any computational properties of the brain which do this (if we impart any, we cannot impart them uniquely anyway, i.e., they can't explain much). Rather it's the plasticity of the body (including the brain) which provides "the right device" for intelligence.
> Often two people do observe the same piece
Well, it is literally the case that if I showed a CPU running a "program" to several alien races, none could tell me what program it was running. Each could provide an essentially infinite number of different answers, depending on how they interpret the electrical signals. A program is a purely formal thing that describes no physical system uniquely.
Whereas if I showed animals to these aliens, they could actually describe what processes constituted their intelligence.
And likewise, if i showed them Gold; they could tell it apart from Silver.
Insofar as a really-existing digital computer has anything akin to intelligence, it's because of how its devices behave. I.e., if you showed aliens the LCD, then they could say "well, the LCD output is particular, and is interpretable 'this way'".
We are easily fooled by the devices we attach to CPUs into thinking that the CPU is "computing" the device output. It isn't. We have just created an illusion: the keyboard buttons say '1', '+', '2' and the LCD says '3', and we think the CPU has computed 3.
But, looking at the CPU itself, it has actually "computed" an infinite number of things. The '1' signal could be interpreted arbitrarily.
My point here is that talking about "computation" is a deeply unhelpful way to understand anything. It's a profoundly non-explanatory category (it cannot explain any physical system; it is just a formal specification we follow).
Once we throw that away, we can see more clearly that what enables animal intelligence is the character of their "devices": the physical plasticity of their tissues (their growth and adaptation), their motor skills, and so on.
It is possible to describe a tiny subset of intelligence ("reasoning") by a formal system; but that misses why animals can reason in the first place. You don't get the capacity for reasoning by just making electrons dance in an observer-relative way. That requires the observer to already have that capacity.
The capacity itself is a property of the physical system, "the animal".
> But, looking at the CPU itself, it has actually "computed" an infinite number of things. The '1' signal could be interpreted arbitrarily.
That's not really true. We could very easily connect up the CPU to flash a light 3 times, or use a speech synthesizer to say "three". The reason we call these systems "computers" is that we built them to compute like (human) computers, that is, to evaluate mathematical expressions. Understanding computation is useful, even essential, in understanding the behaviour of these systems, just as understanding language and grammar is useful in understanding the pattern of sounds that a person might make under particular circumstances, even though the relationship between particular sounds and particular meanings is arbitrary or at least underdetermined.
To clarify, computers play chess by exhaustively searching large game trees. Humans, on the other hand, do not. "The problem is the same", but the solutions are not. So we have no reason to assume that "the problems are the same" means that "so are the solutions" for animals and machines, and we have evidence to the contrary.
That's less and less true with more recent chess engines.
> So we have no reason to assume that "the problems are the same" means that "so are the solutions", for animals and machines
The optimal solutions are the same. Human chess playing has not had millions of years of evolution go into it.
Search is still the basis of AI chess playing. It's either alpha-beta minimax or Monte Carlo Tree Search (used by Stockfish and Leela Chess, respectively).
Game-tree search algorithms like minimax and MCTS need an evaluation function to select the move that leads to the best board position, and some modern engines train neural networks to estimate these evaluation functions. For example, Leela Chess is based on AlphaZero (I think), which popularised this approach, and Stockfish has recently adopted it. But chess engines still use a good old game-tree search algorithm.
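To make the above concrete, here is a minimal sketch of the alpha-beta minimax pattern. The game tree is a toy example given explicitly as nested lists; the leaf numbers stand in for whatever an evaluation function (hand-written or a neural network) would return at the search horizon. Real engines generate moves lazily instead of storing the tree.

```python
# Alpha-beta minimax over an explicit toy tree.
# Leaves are evaluation scores at the search horizon.

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):   # leaf: evaluation score
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:            # beta cutoff: prune remaining children
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:            # alpha cutoff
                break
        return value

# Small example tree; its minimax value is 6.
tree = [[[5, 6], [7, 4, 5]], [[3]], [[6], [6, 9]]]
print(alphabeta(tree, float("-inf"), float("inf"), True))   # 6
```

The pruning is what makes "exhaustive" search affordable: whole subtrees are skipped once they provably cannot change the result, without changing the value returned.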
>> The optimal solutions are the same.
We don't know that. We have programs that can beat humans at chess, but whether they play optimally, or what constitutes "optimal" play in chess, that's difficult to say.
Humans will definitely consider lists of possible moves and countermoves and what kind of boardstates will result, if you define that as "search" then it seems fair to say that humans search too. A human searches less deeply and gains more of their playing ability from being good at evaluating the resulting positions - but as you say, that's also the direction that computer chess engines are moving in.
Computers, as I said earlier, play by searching "exhaustively". "Exhaustive" can mean different things and I apologise for introducing such vagueness, but let's say that computer players spend all their resources, during play, searching a game tree. Learning evaluation functions, or memorising good board positions during self-play, is done at an earlier, training stage, but it is also a form of search. So both for training and playing it is not "less and less true" that "computers play chess by searching exhaustively large game trees".
I agree that it's a bit confusing because if you read press announcements by e.g. DeepMind after AlphaGo's win against Lee Sedol, you would indeed get the impression that the field is moving away from search, towards something magickal that can only be performed by deep neural networks and that obviates the need for a good, old-fashioned search. But this is just marketing, though of course a very unfortunate kind. In truth, search still reigns supreme in game AI.
One point of confusion is what we call "search". There is one kind of search that is sometimes called "classical search" and that includes algorithms that search explicitly or implicitly defined search trees with nodes that represent search candidates. Minimax and MCTS are this kind of "classical search", as is e.g. Dijkstra's algorithm or binary search, etc. When I say that "search still reigns supreme in game AI", I mean classical search. In machine learning, under PAC-Learning assumptions, a "search" is anything that selects a generalisation of a set of examples, i.e. a hypothesis, from a set of candidate hypotheses (a "hypothesis search space", or sometimes just "hypothesis space"). So gradient optimisation-based methods, like neural networks, do also use search - they search for the model that minimises error on the training or testing set, etc.
For an early discussion of search in machine learning see Tom Mitchell's "Generalisation as search":
And for machine learning without search, stand by for my upcoming thesis :)
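The "learning as search" framing above can be shown in a few lines: pick, from a finite hypothesis space, the hypothesis that minimises training error. The threshold classifiers and the tiny training set here are my own toy example, not anything from Mitchell's paper.

```python
# Learning as search over a hypothesis space (toy example).
# Training set: (x, label) pairs; the target concept is "1 iff x >= 5".
examples = [(1, 0), (2, 0), (4, 0), (5, 1), (7, 1), (9, 1)]

# Hypothesis space: all classifiers of the form "1 iff x >= t", t in 0..10.
hypotheses = [lambda x, t=t: int(x >= t) for t in range(11)]

def training_error(h):
    return sum(h(x) != y for x, y in examples)

# "Search" = scan the space for the error-minimising hypothesis.
best = min(hypotheses, key=training_error)
print(training_error(best))   # 0: a consistent hypothesis exists
print(best(4), best(5))       # 0 1
```

Gradient descent plays exactly the same role in a continuous hypothesis space; it just replaces the brute-force scan with local steps downhill on the error.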
We don't know everything about how humans play chess, but speaking as a chess player, at least some of the time a human player will consciously enumerate possible moves (or particular subsets of possible moves) and countermoves and think about the position after each one in turn. At least I do.
> Learning evaluation functions, or memorising good board positions during self-play is done at an earlier, training stage, but it is also a form of search.
If you define "search" this broadly then essentially any way of playing chess would constitute search?
Famously warned against years ago by Drew McDermott:
However, in AI, our programs to a great degree are problems rather than solutions. If a researcher tries to write an "understanding" program, it isn't because he has thought of a better way of implementing this well-understood task, but because he thinks he can come closer to writing the first implementation. If he calls the main loop of his program "UNDERSTAND", he is (until proven innocent) merely begging the question. He may mislead a lot of people, most prominently himself, and enrage a lot of others.
Artificial Intelligence meets Natural Stupidity, Drew McDermott, MIT AI Lab, Cambridge, Mass, 1981.
A rock rolling down a hill isn't using the "geodesic heuristic" either.
We impart to physical systems we have designed our intentions in designing them: we interpret so-and-so field oscillation as a "heuristic". But this is an observer-relative designation.
By the same observer-relative gesture, the rock has its heuristics too.
Animals however actually have heuristics; i.e., dynamical interior models of their environment which are analysed to prescribe action.
Animals, in being present within environments and having genuine interior reasoning/imagination/intuition etc., exist in a different relationship to "the chess game".
They aren't like rocks just "following the geodesic" (they do that at the atomic level, sure). But relevant heuristics here are genuine interior processes which are dynamically attached to environments.
There's also the positive horizon effect: you do the right thing, but for the wrong reason, e.g. because there's a refutation that you didn't see. However, the refutation is wrong, and a few forced moves later you do spot why.
It would be very rare for the computer in the first place not to have a strong move as one of its candidate entries. Obviously the evaluation can shift but the difference shouldn’t be apparent to a human player and it would be even more unlikely that a computer would reject a top move but a human would find it.
Humans will tend to explore exchanges, faster & more forceful lines. Engines treat these like any other line, without prejudice. I'm a pretty average player. My positional moves aren't generally reasoned beyond 2-3 moves ahead. I go here. You do something, then I go there. Where I do reason deeper, it's usually forceful lines where a lot of material is traded. I go here. We trade 3 pieces. I attack this. Opponent defends, then I move there.
In a slower position, I can't reason ahead as much because I have no idea what the opponent might do. He has too many options.
i'm not sure if i'm understanding 100% what you're saying but don't most engine heuristics try more potentially favorable first, such as moving a pawn? i think most modern engines have pretty aggressive move ordering
Chess engines tend to find good moves that seem really passive to humans. They don't create a threat or force an exchange. The purpose of the move may not be that complex, the payoff will become obvious 2-3 moves later. A human just wouldn't have thought to even explore it. There are no hints (like threats and exchanges) that it is worth exploring.
That's what I meant by prejudice.
Conversely, even in very option-rich positions... humans will explore the more active lines, and calculate material exchanges more deeply than other moves. If a normal player is calculating 7 moves deep in the mid game, that probably means a lot of material is being exchanged.
Remember that an equally matched engine is exploring far more moves than you are. The human is better at deciding which moves to explore.
What is an example of a realistic endgame (not something with 9 white pawns, 5 black knights and such) that chess engines are easily beatable in?
the AI will still avoid losing, but it also won't be able to find the endgame strategy to win
You don't need to watch WC game to witness that. Any position sharp enough will have swinging evaluations at the search horizon, and even amateurs can find a twist in their favor if they're lucky enough (to have their 8-moves ahead flash of genius actually work 20 moves ahead by complete chance).
In decent engines the heuristics used to prune the search tree range from exact (e.g. alpha-beta pruning) to conservatively inexact (low chance to actually miss something), and unless you find a systematic winning or drawing pattern (like the examples in OP's article), you can't beat the engine with your "intuition".
With three pieces there are about 250,000 positions; can't a modern computer just check them all?
It gets harder fairly rapidly as you add pieces.
What you describe is a tablebase. https://en.wikipedia.org/wiki/Endgame_tablebase:
“By 2005, all chess positions with up to six pieces (including the two kings) had been solved. By August 2012, tablebases had solved chess for every position with up to seven pieces (the positions with a lone king versus a king and five pieces were omitted because they were considered to be "rather obvious")”
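For what it's worth, the "about 250,000" figure above is just the number of ordered placements of 3 distinct pieces on 64 squares. It's an upper bound: it ignores illegal positions (e.g. adjacent kings), and side-to-move would double it. A tablebase is built by retrograde analysis over exactly this kind of enumerated position set.

```python
# Ordered placements of 3 distinct pieces on a 64-square board:
# 64 choices for the first piece, 63 for the second, 62 for the third.
positions = 64 * 63 * 62
print(positions)   # 249984
```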
Which is why AlphaZero plays Go with similar performance to the best human players.
However, a few other engines have re-implemented ideas from AlphaGo. Leela Zero is one of them.
In my experience, between two strong players:
2 stones means "I can lose some even games against you but not often"
3 stones means "I won't lose a single even game unless I'm drunk"
4 stones means "You really didn't understand this game yet"
It's a mate-in-93 puzzle that's fairly accessible to humans, using abstract reasoning. But not chess engines. Comparing against the OP article, the main "technique"/"trick" is zugzwang (#7), but on a dramatic scale.
I think this is the kind of position you could use to stress-test a candidate puzzle solver, just because of the sheer size of the solution.
I can't find any recent advanced chess tournaments and though I see quotes of people saying that the combo is stronger than a computer alone, I haven't found any recent examples of a top tier engine by itself losing to a human + engine (e.g. Stockfish + human vs Stockfish).
However, this changed a few years ago, when engines learned many of the tricks that the human could contribute. Since then, I believe pure engines are stronger in all practical applications.
Source: am national champion in centaur Go and worked on modern Go engines
See, for example, this great write-up: https://www.gwern.net/Notes#advanced-chess-obituary
The usual messaging I see around centaur-based styles such as certain correspondence chess tournaments is that you will lose if you just do "push-button play," that is just blindly do what the computer tells you to do.
I'm curious if that's no longer true with the new crop of ML engines.
My best guess based on a rereading of the footnotes is that the performance ceiling for chess is probably low enough that it has been ~reached by both centaur teams and pure engines. So the two would have been operating neck-and-neck as of ~2017, with win rates largely determined by random-ish factors like human mis-clicks (also mentioned in the article).
I am very curious to hear how that's still possible (or to learn that in fact it is now impossible), especially in the post Leela/Alpha world.
So katago being unable to handle it without special training doesn't seem _quite_ as blind of a blindspot as the chess examples from the article seem to be (I suck at chess and I was able to understand one or two).
I'm not trying to undermine you mentioning this, in case it comes off like that, on the contrary I think the comparison is quite interesting. I'm curious if this is just a difference in go vs chess, or in the relative abilities of specific kinds of AIs to handle these, or maybe just differences in human ability to craft and/or understand different problem difficulties between the games.
You could theoretically promote two pawns to bishops and have three black bishops. Nobody would do that, as it is usual to promote pawns to queens, but it is within the rules for you to choose.
So if you plan to write a chess engine, it would be pretty stupid of you not to prepare it to face multiple black bishops. If I knew about that weakness, it would give me a big advantage.
So it's understandable for a machine not to even bother modelling these weird cases I think.
If I know a player is bad at certain types of situations I will be trying to force the player into those positions.
And chess engines are vulnerable because they don't learn and can be repeatedly exploited if you can find a way.
I suspect but can't prove that for the positions we're talking about here (opponent's pawn should be promoted, but promoting to a Queen is terrible) in Chess you're putting yourself in a very similar situation, i.e. to pull this off you're obliged to deviate so far from good strategy that you invariably lose anyway.
For chess engines playing each other we already know it doesn't matter because the chess engines play far more games than it would take to notice, some of them have exactly this "weakness" and others do not, and of course for chess engines playing humans it's a joke, the humans aren't competitive in that scenario.
To quantify, here's raw counts from a representative sample of serious games:
$ rg -Io '=\w' twic*.pgn | sort | uniq -c
I, of course, understand why people would promote to a knight.
I think the point OP is making is that if you need to use your mind at all with <2 seconds left you're going to lose. Pre moving safe legal moves without hitting a stalemate is definitely easier without a queen.
In fact: all of the =R promotions I inspected were frivolous, and 4/5 of the =N promotions were frivolous. I'm surprised it's that bad. One scenario where this happens is "capture-promote, followed by immediate re-capture".
(also this https://www.youtube.com/watch?v=3COS0_p3sfo)
Well, on a little bit further reflection, possibly you may want to promote to bishop instead of queen if it helped you get to stalemate when you were confident you wouldn't win. Or as others said, to avoid the opponent having an opportunity to get stalemate.
You are about to promote to a queen. You realize that both of your bishops are still on the board, and promote to a bishop instead of a queen.
That lets you stop the clock and summon an arbiter to go find another bishop of the right color. While the arbiter is off dealing with that, you get your much needed thinking time.
The only situation where this might be worthwhile would be a forced mate in consequence. Otherwise the Queen is worthy enough that you can afford the tempo loss.
It would not.
Rybka (formerly the #1 chess engine) doesn't support promoting to bishops. The author chose an efficient move representation that couldn't handle more than four promotion types. Not quite the same situation, but I can be sure the difference is small even against an opponent trying to exploit it.
> each move is easily encoded as a 16 bit value:
6 bits for origin square, 6 for destination square, 1 for en passant, 1 for castling, and 2 for promotion. The 2 promotion bits can only represent 4 options... and since the move encoding is for ANY move, not just promotions, one of the options is "no promotion"
And another tidbit from that page: the same encoding is used for making moves and retracting them (going back up the search tree), so efficient encoding of "no promotion" is important there. E.g. If there's a white rook on e8 and the last move was e7-e8, when retracting it you need to know whether to put a pawn or a rook on e7.
But as others point out, AI suffers when it doesn’t have enough experience in a particular situation. So, the author is really just pointing out the extreme edge cases AI hasn’t mastered yet. But over time, by getting these examples into the training process, there’s no reason to believe that the AI couldn’t learn these situations as well.
If, for instance, the name of a chess problem includes the solution in it, written in plain English, no current chess engine is going to pick up on this bit of 'meta' information - yet any English-speaking human trying to solve the problem would be trivially able to use it.
Does that make humans 'better' at chess than chess engines? Not really. Does it make humans 'better' at solving a given chess position, given some metacontextual clues? It does, but it is a bit contrived.
If an AI can't pick up on it then that suggests it lacks parts of what to humans would be a basic understanding of chess.
Solving a chess puzzle where there are for some reason three black-square bishops - recognizing that the bishops are important to the puzzle is meta-knowledge.
The distinction in my mind, was that the meta-knowledge here is not about the game of chess, but about the opponent. In the case of the puzzle this is meta-knowledge that the puzzle designer gave you a hint. In the case of Nakamura's game, his meta-knowledge is information about his opponent (the chess engine) and the heuristics it uses.
Chess engines only see the board state (at least to my knowledge), while humans have some kind of information about their opponent, or the history that lead to that board state. Humans have extra information, and sometimes that information is exploitable.
If we’re just going to say, “no one will ever do that in real life” then the humans will just start finding AI weaknesses and how to maximally exploit them.
I play board games and I definitely “play my opponent”.
ETA: Corrected typo, I originally wrote 1. ... Re8, which does not accomplish anything.
It turns out there is another very impressive move white needs to spot: after 1...Rh8, white needs to find 2. Rf8! and now black has to either trade into the drawn king and pawn ending, or take the rook and give back the tempo that's just been saved. Now the white king is in time again. Quite a move, that should be listed as a sideline in the puzzle for sure!
So move order (as I see it) would start:
1. Kb1 Rh8, 2. Rh6 Kxh6, 3. Kc1... etc.
In the "IQ Test #52" position (FEN: 8/1p1q1k2/1Pp5/p1Pp4/P2Pp1p1/4PpPp/1N3P1P/3B2K1 w - - 0 1) listed in #1 both LC0 and Stockfish play the correct line on my computer in seconds.
Both of them also play the right move Ba4+ in the second position of #1 "William Rudolph vs." (FEN: 8/1p1q1k2/1Pp5/p1Pp4/P2Pp1p1/4PpPp/1N3P1P/3B2K1 w - - 0 1) but they take quite a bit longer. Stockfish variants get it quicker.
Stockfish solves "Hasek vs." (FEN: r7/7k/5R2/p3p3/Pp1pPp2/1PpP1Pp1/K1P3P1/8 w - - 0 1) listed in #2 quickly. Both Stockfish and LC0 solve "Lazard=F vs." (FEN: q7/8/2p5/B2p2pp/5pp1/2N3k1/6P1/7K w - - 0 1) quickly.
Stockfish gets Bh3 from #3 "Veselin Topalov (?) vs. Alexey Shirov" (FEN: 8/8/4kpp1/3p1b2/p6P/2B5/6P1/6K1 b - - 2 47) in seconds (7-piece end game tablebases installed).
The next position from #3 "Spassky, Boris V vs. Byrne, R." (FEN: 3B4/1r2p3/r2p1p2/bkp1P1p1/1p1P1PPp/p1P4P/PPB1K3/8 w - - 0 1) is easy for both Stockfish and LC0. They both get 50. c5!! right away.
Stockfish also gets the last position from #3 "Stefan Brzozka vs. David Bronstein" (FEN: 1r6/4k3/r2p2p1/2pR1p1p/2P1pP1P/pPK1P1P1/P7/1B6 b - - 0 48) in seconds: Rxb3+.
Stockfish and LC0 see Kd1 in "Lamford=P vs." from #4 (FEN: 8/8/8/1k3p2/p1p1pPp1/PpPpP1Pp/1P1P3P/QNK2NRR w - - 0 1) but believe Rg2 is also winning.
Stockfish gets c8N from "Randviir=J vs." (FEN: 5nr1/2Pp2pk/3Pp1p1/4P1P1/6P1/5K2/8/7n w - - 0 1) in #4 in about 2 minutes on my computer.
Stockfish gets Bf5 from "Simkhovich=F vs." (FEN: 8/8/2pk4/8/p1p3B1/PpP5/1P6/r1NK4 w - - 2 2) in #5 in seconds. LC0 also gets it.
The next two positions are mentioned as easier for programs and they are:
Qe3 from "Deep Blue vs. Garry Kasparov" in #6 (FEN: 1r6/5kp1/RqQb1p1p/1p1PpP2/1Pp1B3/2P4P/6P1/5K2 b - - 14 45) is very easy for both Stockfish and LC0.
Both also get "Vladimir Kramnik vs. Peter Leko" (FEN: 6k1/5p1p/P1pb1nq1/6p1/3P4/1BP2PP1/1P1Nb2P/R1B3K1 b - - 0 25) in #6 quickly.
"Matous=M vs." (FEN: n2Bqk2/5p1p/Q4KP1/p7/8/8/8/8 w - - 0 1) is indeed harder for Stockfish and LC0 than expected. I've confirmed mate in 13 in my mate solver.
In "Nigel Short vs. Vladimir Kramnik" (FEN: r3r1k1/1bp1Bppp/pb1p4/1p6/1P6/1BP2P2/P1P2PKP/R3R3 b - - 6 19) from #9, the engines like ...a5 more than ...c6. Hard to say that this move doesn't also win without more analysis.
Stockfish wants to play c6! and b4! from "Marwitz=J vs." from #10 (FEN: 2K3k1/1p6/R3p1p1/1rB1P1P1/8/8/1Pb5/8 w - - 0 1) right from the start. LC0 takes longer but gets it as well.
In "Anish Giri vs. Maxim Rodshtein" (FEN: 8/5pkp/3p1np1/Rpr5/8/6P1/PB3PKP/8 w - - 6 34) from #10 both Stockfish and LC0 like 34. h4 over 34. a4. More analysis would be needed to see if h4 is not also winning.
Last position "IQ Test #16" (FEN: 5k2/4bp2/2B3p1/1P4p1/3R4/3P2PP/2r2PK1/8 b - - 0 1) takes around 10 seconds for Stockfish.
In summary, chess engines might not really "understand" these positions, but they solve them pretty well.
If you spot the computer the first twelve moves and create this position:
My Stockfish still fails to see the key idea of Rc2 followed by Ka2 (at least for a few minutes until I got bored) -- it just wants to shuffle the rooks around forever.
I'm sure that some engines given enough time will solve some of these, but your check isn't accurately assessing their capabilities.
Maybe a more realistic scene than the fanbase gives it credit for. :)
Here's what's confusing me: if the chess engine has an initial numeric value for the current position, and the positions going out to the end of its search all have the same value or worse, shouldn't it be pretty trivial to conclude it is not making progress?
It's not visible on the board itself, but this position's 50-move rule counter is set to 98 ply -- two ply short of a draw. (That's the 2nd-to-last field in the FEN representation -- the editable text field just below the board, and also in the URL.) The only way to avoid that draw, as Stockfish does in fact suggest, is to sacrifice a piece in a specific and ridiculous way, to compel black to re-capture on the 100th ply (captures reset the 50-move rule). If you edit the 50-move counter down, Stockfish instead shows you the simple mate-in-3.
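If you want to check that counter yourself: the halfmove ("50-move") clock is the second-to-last of the six space-separated FEN fields, and at 100 ply with no capture or pawn move the draw can be claimed. The FEN below is a made-up illustrative position, not the one from the comment above.

```python
# Read the halfmove clock (second-to-last field) from a FEN string.
def halfmove_clock(fen):
    return int(fen.split()[-2])

fen = "8/8/8/8/8/8/8/K6k w - - 98 120"   # made-up position, clock at 98
print(halfmove_clock(fen))                # 98
print(100 - halfmove_clock(fen), "ply until the draw can be claimed")
```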
Nakamura famously tricked a different chess engine, relying on this 50-move draw avoidance (as one component of the hack) -- a sibling thread is discussing that example:
Deep Blue lost in 1996. Its upgrade, called Deeper Blue is the one that won the famous match in 1997. Please SamCopland, do your homework.