It really is a bit scary to see. I would not have guessed that human strategy was that deficient. Computer chess programs still tend to play with human-like strategy (partially because humans have coded their evaluation functions!) but godlike tactics. Master is not really playing like a human at all.
In a nutshell, positional play in chess is simply heuristics we humans use to be able to evaluate a position in lieu of being able to calculate deep non-forced lines. Computers do use this to an extent (as you point out, we coded their evaluation functions) but positional play matters less when you see all the outcomes of every possible tactic with 100% accuracy. So computers tend to play reasonably human-like in the openings, but by the time you reach the middle game they'll happily enter lines where their pawn structures are shattered, pieces appear superficially to have little coordination, and where their king safety appears compromised (all things humans rarely intentionally do), all because they've seen that it works out 25+ moves in advance.
Makes you wonder if computers will essentially ruin all the novelty of innovating and discovering new tricks/strategies in age-old games. No more will geniuses have their moments of brilliance and improvisation that lead to a stunning victory or upset. No more will commentators debate and marvel over phenomenal victories in ages past.
With a computer, it's just the optimal play strategy: you were playing a god who knew the quickest path to terminating your king. When a human runs with non-standard play strategies in almost any game, it's considered brilliant, innovative, the game of the century, etc.
Beating human players in go is like finding a good heuristic for an NP-complete problem. Solving 19×19 go is like proving P ≠ NP, i.e. we don't even have tools that can approach the problem.
I don't know if there are theorems, or even informal arguments, about whether this means it can't be solved, but at the least it must be extremely difficult.
There may be other ways to solve the game, but we don't know what they are. Because we know it is theoretically solvable, we cannot rule out a practical approach to solving it by some mathematical magic even if we have no idea what that would look like.
By analogy, we can prove many things about infinitely many integers by mathematical induction, but if we didn't have that technique, such proofs might seem impossible.
Two players each write a Turing machine with at most n states and two symbols. The player who produced a halting Turing machine that leaves the most 1 symbols on the tape wins.
The optimal strategy for this game is producing a busy beaver, a feat known not to be computable.
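For concreteness, here is a tiny simulator running the known 2-state, 2-symbol busy beaver champion, which halts after 6 steps leaving four 1s on the tape (the machine and its Σ(2) = 4 record are well established; the simulator itself is just a sketch):

```python
from collections import defaultdict

# The 2-state busy beaver champion: (state, symbol) -> (write, move, next_state).
BB2 = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, "HALT"),
}

def run(machine, max_steps=10_000):
    """Run a 2-symbol Turing machine on a blank tape; return (steps, ones)."""
    tape, pos, state, steps = defaultdict(int), 0, "A", 0
    while state != "HALT" and steps < max_steps:
        write, move, state = machine[(state, tape[pos])]
        tape[pos] = write
        pos += move
        steps += 1
    return steps, sum(tape.values())

print(run(BB2))  # (6, 4)
```

Finding the n-state champion in general requires solving the halting problem, which is what makes the game's optimal strategy uncomputable.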
I was thinking that for a fixed n that doesn't really work, because there are only finitely many options, but I guess if n ≳ 2000, ZFC cannot prove the winning strategy to actually be the winning strategy? Is that what you meant?
Given any two machines which both halt, finding the one that ends with more ones is computable. Assuming at least one of the two machines halts, which one wins can be computed in the limit. By which I mean: allow the process to output a running "who is currently winning" verdict (the one that already halted if only one has, the one that wrote more ones if both have, or neither if neither has), and take whatever the verdict eventually never switches from. I guess that works even if neither halts.
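That limit idea can be sketched concretely. In this toy version the machines are stubs described by (halting step or None, number of 1s left), standing in for real Turing machines; we emit a running verdict after each simulated step, and the true answer is whatever the verdict eventually stops changing from:

```python
# 0 = no verdict yet, 1 = machine 1 currently winning, 2 = machine 2.
def verdicts(m1, m2, max_steps):
    out = []
    for t in range(1, max_steps + 1):
        h1 = m1[0] is not None and t >= m1[0]  # has machine 1 halted by step t?
        h2 = m2[0] is not None and t >= m2[0]
        if h1 and h2:
            out.append(1 if m1[1] >= m2[1] else 2)  # both halted: compare ones
        elif h1:
            out.append(1)
        elif h2:
            out.append(2)
        else:
            out.append(0)
    return out

# Machine 1 halts at step 3 with two 1s; machine 2 at step 5 with four 1s.
print(verdicts((3, 2), (5, 4), 7))  # [0, 0, 1, 1, 2, 2, 2]
```

The verdict flips at most twice and then stabilizes on the true winner, which is exactly what "computable in the limit" means here.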
Uh... I'm just saying stuff you already know to try to think through it myself.
Edit: I guess the question is then, what exactly do we mean by solvable?
Do we mean that there is an algorithm that outputs an optimal move on every turn? For any n, there is such an algorithm: the one that has the correct moves hard-coded. Maybe we mean that there is an algorithm that provably always outputs an optimal move? In this case, well, I suppose it depends on the axiom system. Hm.
The trivial case of a Turing machine that can be proven to halt is one with only one state: halted.
Compare this to lines that humans have memorized after deep analysis, like the Poisoned Pawn Najdorf. The play there certainly doesn't look humanlike either.
What you say sounds plausible, but I do not believe it is backed up by any analysis of the games.
When playing against humans, modern engines can accurately evaluate so much deeper than we can that their play is frequently indistinguishable from the purely tactical play of such a perfect computer (at least once out of the opening, where the number of "good" moves is constrained enough to let computers search to significantly deeper ply).
You say that computers exhibit a different positional judgment, which is true in the "technically correct" sense (their evaluation function is pretty much the literal definition of their positional judgment). But at depths of 25 ply, 30 ply, or even greater, the simple truth is that they are highly willing to enter lines where their king is exposed, their pawns are doubled, or they trade away a good bishop and keep their bad one (all things we'd consider anti-positional play), just because they can see a concrete outcome (a won piece, a strong attack, etc.) that we can't.
So yeah, by some definition they play positionally. But it's really by a definition that's only perhaps useful to other computers and not us humans; to us, it's significantly closer in practice to what we consider highly tactical play.
I don't think you can "see 25+ moves ahead" in Go. The branching factor is just too big.
If I recall correctly, the version that beat Lee Sedol was trained on amateur games plus self-play. My guess would be that this new version relies more heavily on pro games.
Unlikely, since AlphaGo can now generate large numbers of "pro quality" games from scratch. I think it's far more likely it is an autodidact at this point.
No-limit is far more difficult than limit due to the risk of catastrophic failure. A Nash equilibrium robot won't make any money. A robot must identify a weakness in you, then deviate from equilibrium to exploit your weakness. So long as you're playing deep stack, you could simply play the Bertrand Russell chicken story (echoing David Hume): The farmer feeds it every day, so the chicken assumes that this will continue indefinitely. One day, though, the chicken has its neck wrung and is killed. It's the "maniac" style. Pretend to be an idiot that plays too many hands. Don't lose your shirt. The robot will learn that you're always bluffing. Eventually you have the nuts and you take everything.
This especially doesn't work against multiple opponents.
That's a good strategy against a bad robot, not the latest batch.
It is possible that they fed it some pro games after the Fan Hui games but before the Lee Sedol games, but that would be weird; at that point it was already learning from self-play rather than trying to match human moves.
That said, I don't think that Master's better performance comes from being trained on pro games. The AlphaGo version that played Lee Sedol played much more like a human pro than Master does.
I'm confused. I thought 9-dan players were considered pro? That's the highest ranking you can get, right?
Even the abbreviations differ: 9d (amateur dan) vs 9p (pro dan).
Edit: I'm not actually that sure. I'm asking around right now (with go players, not DeepMind people).
This is pretty similar to what chess engines do.
Now, with perfect white play there may be moves an imperfect black player makes which cause white to attack. But, perfect play on both sides probably means any white stone gets captured so white plays zero stones.
That is true. The same, however, doesn't hold for larger boards (such as 19x19).
> But, perfect play on both sides probably means any white stone gets captured so white plays zero stones.
For boards larger than the small boards you mentioned above, this is completely untrue.
Is this accurate? Does Master(P) also use this style? Are there humans that can play this way?
(I'm asking you because you seem to know what you are talking about here.)
What AlphaGo showed, however, is that once in the lead it made mistakes: small mistakes that didn't jeopardize the game, but shrank the lead. If all paths lead to Rome, it doesn't matter which is shorter. Humans, however, always look for the best move, not just a safe one.
What was terribly cruel to see as a Go player was the computer playing poorly early on: showing that it already knew it was going to win.
Humans using the engine don't like this, however, so the authors built in heuristics to make it play more "humanly". Things like using scores for positions, rather than "chance of winning", are also largely for the sake of the users.
Humans do the same thing, playing conservatively in a situation where they're far enough ahead. The difference is that a human sometimes looks at a move and says "This move works, and gains points. There is no risk." For a bot using MCTS, everything is a probability.
However, you are correct insofar as it doesn't care about winning by large margins: it prefers winning by a smaller margin if it can achieve this with a higher win probability.
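A toy illustration of that preference (the candidate moves and their numbers are invented): a win-rate maximizer picks the move with the highest estimated win probability and ignores the expected margin entirely.

```python
# Each candidate move carries (estimated win probability, expected margin).
# A win-rate maximizer ignores the margin, so it happily takes a half-point
# win at 98% over a 20-point win at 90%.
candidates = {
    "big-territory move": (0.90, 20.0),
    "safe half-point move": (0.98, 0.5),
}

best = max(candidates, key=lambda m: candidates[m][0])
print(best)  # safe half-point move
```

This is also why such a bot's "100% won" moves can look pointless: once every move has the same win probability, the objective no longer distinguishes between them.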
Is this not done in go?
To become a pro you have to go through insane levels of competition; you need to be strong, not find a weakness in the hundreds of players you will face, just to have a shot at becoming the lowest level of professional.
No matter how I've played against it to get to where we are, it'll play the same from that point on. It won't identify me as a risky player from my pattern, nor will it try and classify me as "unpredictable" in some way. It'll play each move as though it has sat down at an already in-progress game between two random opponents.
> It's possible for an AI to evaluate a move based solely on the resulting board position, but it wouldn't be very good. Pretty much all AIs play many turns out to see if the move is any good.
I would strongly argue that these are identical situations. Playing out scenarios in your head but taking into account no history is the same as "evaluating a move based solely on the board position".
In the end of a well-played game the losing player will find that they perhaps get more points than they expect, but each time they win a tradeoff with a slight margin the position on the board becomes much more solid and fixed than it ought to be. A 10 point lead with a large variance consolidates toward an unfaltering 1/2 point lead.
Indeed, commentary on Master(P)'s games seemed to suggest this was exactly how endgames went.
Early in the game, confidence in winning is usually correlated exactly with point margin... but at the same time, score in the early parts of the game is very hard to estimate. It's repeatedly noted that AlphaGo plays a very "influence oriented" game which means that it eschews confidently having many points for having lots of "power" on the board which will later translate to points.
So, all together I'd say that AlphaGo plays very well here and doesn't have too much of an obvious computational bias.
The one true oddity is once AlphaGo's internal win probability reaches 100% it starts playing idiotic moves. The reason is simple: it's just searching for moves which have the highest expected win rate and at this point nothing it can do is bad. It won't lose points or anything, but instead just plays moves that are obviously pointless.
I just watched a couple of games, and I can't agree. Master's fuseki looks a lot like human fuseki. It plays some original, unusual, confusing, non-human looking moves, yes. It also plays some moves at what would normally be considered the "wrong time" conventionally. But the fuseki still looks highly derived from human games to me -- human with some (sometimes major) tweaks.
When AlphaGo is trained fully from first principles alone, I expect another Shin Fuseki -- a revolution in opening strategy. I would be surprised if Go Seigen, Kitani Minoru, et al. had finally discovered optimal opening strategy at the beginning of the 20th century (developments since then have mostly been small refinements of what was started in the Shin Fuseki era, not revolutionary).
In chess, computers certainly seem to have supported the idea that basically any opening move is playable with tight enough play.
It would be very interesting to have pro players comment on published records of AlphaGo self-play.
Maybe AlphaGo has discovered a new balance between black and white (that is, a new optimal value of komi), but when playing with the human-defined value of komi, its optimal style is also different from what it would be otherwise.
(analysis by Gu Li and Zhou Ruiyang, two top pros; standard komi)
I'm a mere 1 dan, but I fail to see this marked "difference in normality".
What specifically looks "much less human" to you in these newer games?
In the games described here, the floating reduction of the wall looks very odd to me: http://lifein19x19.com/forum/viewtopic.php?f=15&t=13929&p=21.... I also thought the first game against kiss88 featured a lot of very unorthodox moves: the double tenuki in the top right and the shoulder hit against the 4-4 + knight's move in the lower left especially seemed novel to me. The two-space extension from the 3-4 point is also uncommon, but less surprising.
Another move that didn't seem familiar to me was the shoulder hit and large knight's reduction in the lower right, but that may just be a tactical variation that's unfamiliar.
P.S. I'm only 3k AGA, 4k OGS, so take my personal opinion with a grain of salt! I do think I have a decent feeling for what other people are thinking. Ke Jie made a comment about no human having scratched the surface of go.
But the question was whether these Master games look less human than the previously published self-play games by AlphaGo. And I just don't see it.
The first and third self-play games in particular look crazier (to me) than anything I've seen in this Master collection.
But again, just a measly 1d amateur commenting on unworldly 9p vs ?11?p battles :) People at my level can barely glimpse the tip of the iceberg.
"I am sketching out territory while attacking an opponent group."
"I am making my group safe so that I will not have to worry about its life while I accomplish other strategic goals."
"I am making a very strong group so that I can use it to make it harder for my opponent to accomplish anything."
Master, on the other hand, will sometimes seemingly just plop stones down in the middle of the board in a way that is hard to assign a narrative to. One can list a bunch of ways that the stone might come in handy in various futures, but it's too "vague" a move to be played with confidence by a human professional go player.
Of course, human strategies may change in response...
When AI gets strong enough (and it seems like it has already), it will just tenuki everyone all the time, while winning. Sounds like exactly what's happening already. It's past the event horizon for human understanding.
AlphaGo's creators could "rewind" the whole program state to that move and inspect the tree-search probabilities across the board states it looks through, to find the list of board states that generate the highest cumulative probability of winning by playing in that exact odd spot.
My guess would be that while humans tend to place stones with a single, double, or sometimes triple "reason" (in AI words, a "high probability of local effectiveness in the upcoming several turns"), AlphaGo, with its ability to see further, can see past the local effectiveness into more global effectiveness and a higher probability of winning further down the road.
In other words, those tenuki moves might actually be past the event horizon of human understanding, but only if inspected by looking at them and thinking ourselves. If we use AlphaGo itself, it should be possible to find out the reason for every single tenuki it will ever do.
I think this phrase is going to pop up more and more frequently.
It is a misunderstanding that has been breathlessly repeated by futurists and transhumanists for years.
I'm not a Go player, but could it not be that if the AI played its moves in a more human order you could see what was going on and assign a narrative to them, but the AI can see that the order of the moves sometimes doesn't matter, so it seems to play more vaguely/randomly to observers? For example, say the AI played a set of moves at the top of the board that you could give an attacking narrative to, and then a set of moves at the bottom of the board that you could give a defensive narrative to. If you intermixed the move orders of the two narratives, it would seem like the AI is playing in a nonhuman way, when really its moves have a narrative but humans are too fixated on the move ordering.
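That intuition can be made concrete: if two sequences contain the same moves, they reach the same final position (a transposition), even though a human narrator perceives them very differently. A toy sketch (ignoring captures and ko, where order genuinely matters):

```python
# Two different move orders that reach the identical final position.
# A position is modeled as just the set of (player, point) placements,
# so the order of arrival is irrelevant to it.
narrative_order = [("B", "top-1"), ("W", "top-2"), ("B", "bottom-1"), ("W", "bottom-2")]
mixed_order     = [("B", "bottom-1"), ("W", "top-2"), ("B", "top-1"), ("W", "bottom-2")]

position = lambda moves: frozenset(moves)
print(position(narrative_order) == position(mixed_order))  # True
```

A search that evaluates positions rather than sequences treats both orderings as the same node, which is one reason its move timing can look arbitrary to a human.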
Given that it trains against a copy of itself, with equivalent predictive powers, it makes sense that it would pick moves that have lots of branching possibilities because that would increase its effectiveness against itself.
It'd be very scary to watch a "sibling" to AlphaGo play a 4X game.
There is the question of what your AI takes as its input though. If you feed it the rendered frames of the games there are a lot of open challenges in comparison to feeding it a friendlier representation of the game state.
There is also the question of teaching the AI a strategy vs only giving it the rules of the game and letting it develop strategies by itself.
In comparison to Chess or Go, StarCraft has a lot more game elements to understand and conceptualize. StarCraft is played with incomplete information, because of the fog of war - so for an AI to play it properly it has to have persistent knowledge of what it has seen of the game state and be able to draw conclusions from that about the probable overall state of the game.
You call it chaos when it's far above your head.
If we're far enough from "perfect play", then the answer is no.
A lot of go comes down to timing. AlphaGo doesn't seem to play moves which are inhuman in that they just make zero sense as much as plays moves which are more daring than humans would like to try. Then, infallibly, it turns out that AlphaGo's daring move had the mark of being amazingly well-timed.
With as much freedom as Go allows I think it would be surprising if humans had stumbled upon an optimal strategy (or Master for that matter, I'm sure there is still much to be improved!).
So if it requires some experience just to recognize a winner, and as far as I know sometimes even professionals can't tell for sure who is winning, it's pretty safe to say Go is just too complex a game for humans to arrive at optimal strategies in any reasonable amount of time.
I like your idea of maximising the 'alienness' of play though probably not to the exclusion of playing the strongest moves.
My guess has always been that the real gap is much higher, and a perfect player could give 6 or even 9 stones to the world's top players.
In perfect play there would be no such thing as joseki. I don't think it would be a game we would even recognize.
At amateur level, say a 2d vs a 2k player would perhaps have a 99.9% winrate. To make it an even game, would take 4 handicap stones. But at pro level, I think it's possible for player A to have a 99.9% winrate against player B, but on two handicap stones, for player B to be the favorite.
Even an engine that is enormously successful against top human players would struggle at high handicaps vs them.
What I really wanted to express is the amount of headroom available between top players and perfect play. When I say nine stones, I mean, whatever the win rate is between an idealized 9p and 8p player, there would be nine more such steps between the 9p player and god. Probably more. I don't necessarily mean that god would have even odds giving nine stones to top players, because that's a different game.
However, I have seen enough games where strong amateurs are taken apart with shockingly high handicaps by top pros, especially in faster games, to wonder. I think we simply fail to imagine how strange perfect play would be. Even if a pro spent the rest of their life thinking about the next move, they are unlikely to find the one true best move. A player that always played that move would be so far ahead of anything we've seen that we just can't imagine how much better it would be. Imagine knowing at move 10 that the best move, given perfect play by both, leads to a 3.5 point win in 256 more moves, while the second-best move leads to a 2.5 point win after 310 moves. Just stating it this way shows the amount of headroom there is above human play.
What I would find truly fascinating is if Master could divide the moves that it plays, which professionals wouldn't, into groups based on a similar internal categorization of the moves. And then see if human minds can look at any of the groups and come up with a human-understandable principle that humans had been missing about the game.
The point here is not so much to improve human play (though it presumably would do so), but as a step towards having an AI that can break down its internal model into principles that can be used to train another AI to learn those principles. Just like how a human expert can learn to turn expertise into something that can be taught to other humans.
This has several potential benefits. The first is that human experience suggests that this type of introspection tends to improve our own competency. The second is that we could have a single AI trained by multiple specialist AIs to get a compact "generalist". And the third is that this is a path towards having AIs that can discover things then teach them to humans.
The whole idea might fail horribly. But I'd like to see it given a shot.
At some point, if this method were used, it would probably require a second AI to help understand the main AI, because the primary "explanation" would still be too complex and/or subtle for us to comprehend.
However there is hope that cluster analysis on data about the internal reasoning process can successfully identify groups of positions that "seem to share a common principle". Success in that is a first step towards lots of interesting things.
There's a livestream of bots facing off here: https://www.twitch.tv/certicky
I think at the end of 2017, we may have to say that computers have conquered Go.
Interesting that the timing is 20 years after mastering chess.
I can already barely follow what is going on on the board, so I'm sure watching slower paced games against pros would be interesting.
I am not sure if slower time settings favoring the human still holds -- AlphaGo works a little bit differently from traditional computer go programs.
With the earlier version of AlphaGo that played Fan Hui, the slower official matches went 5-0 in favor of AlphaGo, whereas in the unofficial blitz games Fan Hui managed to win some.
You can find all the games at http://tieba.baidu.com/p/4922688212?pn=1
It's Chinese, but the pictures are universal and the comments don't really matter much.
By the way, Google Translate does a pretty horrible job of translating those pages. I think they need to add some more DeepMind to it :-)
To be fair, they probably don't have much training data for the jargon of the game of go or of any other game.
I think they made an announcement about that.
> Found in translation: More accurate, fluent sentences in Google Translate
So it doesn't seem to work so well.
One area Google Translate could look at is identifying names from context and using romanization instead of literal translation.
Also experimenting with a new ranking system beyond 9 dan.
Then, professional dans in some Go associations (e.g. Japan) are granted by total wins over a career rather than by relative strength to another rank.
Fernando Aguilar, a 6-dan amateur from Argentina (http://senseis.xmp.net/?FernandoAguilar), defeated two Nihon Ki-in 9-dan professionals (Hasegawa Sunao and Yo Kagen), which is unexpected given the substantial rank difference.
1) The two 9p professionals you mentioned achieved 9p status before the new promotion system you linked to. The new standards are much more stringent (though still based on lifetime achievement).
2) The standards differ by country. All the major countries now give out 9p ranks sparingly, but I think China especially may be quite difficult.
As such, the old system was, roughly speaking, based on your historical performance, not just wins.
But the bigger point is that it was very permissive. There were something like 70 9p professionals in Japan, and there will be far fewer in the coming years (I believe Iyama Yuta is the only 9p aged under 30).
This game is mirror go up to move 72. The black player is https://en.wikipedia.org/wiki/Chou_Chun-hsun
I would also like to know if two perfect players would always end up with a draw, or if they would each win 50% of the games.
Go rules prevent draws by giving White a non-integer score bonus (this is called komi, http://senseis.xmp.net/?Komi). By definition of "perfect play", either a perfect Black player always wins, or a perfect White player does.
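A minimal sketch of why a non-integer komi rules out draws (7.5 is a common value under Chinese rules; komi varies by rule set):

```python
KOMI = 7.5  # added to White's score; non-integer, so ties are impossible

def result(black_points: int, white_points: int) -> str:
    """Final result string given integer board scores for each side."""
    margin = black_points - (white_points + KOMI)
    # margin is always a half-integer offset from an integer, never zero
    return f"B+{margin}" if margin > 0 else f"W+{-margin}"

print(result(185, 176))  # B+1.5
print(result(180, 181))  # W+8.5
```

Since both board scores are integers, the final margin always ends in .5, so one side strictly wins every game.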
Then, if something better than MCTS is found, that will provide an advantage too. Remember that those bots are hybrid solutions.
For example Leela (a free program) does this, and can display it with a "heatmap" style: https://sjeng.org/leela.html
Crazy Stone Deep Learning ($80 USD) has some similar analysis/hint features: http://www.unbalance.co.jp/igo/eng/
The way I recall building simple game AIs is to create a board evaluator, then build a game tree, prune as necessary, and minimax as system constraints allow. I'm sure "real" AIs are much more sophisticated, but I don't know how much of what they do can be taught to a human.
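That recipe (board evaluator at the leaves, game tree, alpha-beta pruning, minimax) can be sketched as follows; `evaluate`, `moves`, and `apply_move` are placeholders for a real game's logic:

```python
import math

def minimax(board, depth, alpha, beta, maximizing, evaluate, moves, apply_move):
    """Depth-limited minimax with alpha-beta pruning."""
    legal = moves(board)
    if depth == 0 or not legal:
        return evaluate(board)  # leaf: fall back to the board evaluator
    if maximizing:
        best = -math.inf
        for m in legal:
            best = max(best, minimax(apply_move(board, m), depth - 1,
                                     alpha, beta, False, evaluate, moves, apply_move))
            alpha = max(alpha, best)
            if beta <= alpha:  # prune: the opponent will avoid this branch
                break
        return best
    else:
        best = math.inf
        for m in legal:
            best = min(best, minimax(apply_move(board, m), depth - 1,
                                     alpha, beta, True, evaluate, moves, apply_move))
            beta = min(beta, best)
            if beta <= alpha:
                break
        return best
```

On a toy tree like `[[3, 5], [2, 9]]` (leaves are scores, the maximizer moves first), this returns 3: the maximizer picks the branch whose worst case is best.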
I'm not sure that's an accurate statement.
"There is no widely played game for which an Elo rating system shows a larger measurable range of skill than Go."
This is a precise and measurable statement. The standard deviation of the Elo system used in Chess is 200, and the range from the best humans to rank amateurs is 14 times this standard deviation. When the Elo system was adapted to Go, the standard deviation was set to 100, and the range is 29 times the standard deviation.
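Under the standard Elo expected-score formula those ranges are easy to play with (the 14x and 29x figures are taken from the comment above, not independently verified; the 400-point logistic is the standard chess convention):

```python
# Expected score of player A against player B under the standard Elo model.
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Ranges claimed above: 14 standard deviations of 200 in chess,
# 29 standard deviations of 100 on the Go scale.
chess_range = 14 * 200
go_range = 29 * 100
print(chess_range, go_range)            # 2800 2900
print(round(expected_score(400, 0), 3)) # one 400-point gap: ~0.909
```

The raw point totals are close, but the Go scale packs twice as many distinguishable skill steps into the same span, which is the substance of the claim.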
The possible skill range, of course, measures something how complex the possible strategies are. And Go trounces every other game by this measure.
I think that gives Go a pretty defensible claim to "most complex strategy of any popular board game".
Even if you use the larger range of inactive players, that's a range of 7.5 times the standard deviation. Which is considerably less than chess.
Chess has a floor of 1200 for novices, so I'm not sure what your point is.
Chess ratings from the USCF have a floor of 100. What you are thinking of is that they start people at a provisional rating of 1200 and then let them drift to where they belong. But a person who just knows how the pieces move will quickly head down towards that floor.
(Question from someone who does not know anything about board games.)
I ask because a natural question arises: can one engineer a game specifically so that it is extremely difficult for a machine to master compared to a human -- due to some strategic complexity, or some approach that does not lend itself well for current AI models.
edit: looks like there's at least one http://arimaa.com/arimaa/
Even if a game was more potentially complex than Go, it would be hard to establish the dynamic range of Elo ratings unless a lot of people played it, and hard for skill variation to be that high unless a lot of people studied very hard to become very good at it.
Go is "most complex" only when measured by the difficulty of implementing an AI to beat the best humans.
(To be specific, my guess is that we'd have been at the amateur-dan level, and the initial MCTS revolution would have been enough for the computers to win.)
~60 wins atm.
I wonder what the best way of coordination would be. Perhaps they can identify several promising lines and each player chooses one variation to calculate more deeply.
From my own experience playing (as a weak amateur), I feel I'm rarely able to think so systematically -- often ideas I discover while contemplating one line are tried in entirely different variations, so it would perhaps take some getting used to before a team can function optimally.
They would have to use software to divide the search tree between them and collect their judgments.
I'm wondering how would a mixed team of AIs and humans fare? For example, a team of 1 AI and 4 humans against a team of similar setup. Humans could be assigned to judge parts of the search tree and their inputs aggregated by the AI.
ps: The Japanese term for a game record for the game of Go is kifu: https://en.wikipedia.org/wiki/Kifu
Hence http://gokifu.com/ which tracks all pro games (and then some, I think)
And here's another public attempt, not sure about the progress.
If their goal was to train the best go bot, they could have had it play not to win but to go down to the wire with very strong players and frequently lose.* Experienced players might have identified it as a bot, but would have dismissed it as "not good enough yet".
*especially as per the recent story that children don't learn when they win, so you train yourself without training your opponents :)
AlphaGo is the best player in the world, so a name like Master fits it :)
They also probably didn't want to choose a human looking name which could mislead people, and instead chose one which could presumably be either of a human or an AI.