AlphaGo plays some unusual moves that clearly go against how any classically trained Go player would play. These moves simply don't fit into the current theories of Go, and the world's top players are struggling to explain the purpose or strategy behind them.
I've been giving it some thought. When I was learning to play Go as a teenager in China, I followed a fairly standard, classical learning path. First I learned the rules, then progressively I learned the more abstract theories and tactics. Many of these theories, as I see them now, draw analogies from the physical world, and are used as tools to hide the underlying complexity (chunking) and enable the players to think at a higher level.
For example, we're taught to consider connected stones as one unit, and to give this unit attributes like dead, alive, strong, weak, or projecting influence over the surrounding areas. In other words, much like a standalone army unit.
These abstractions all made a lot of sense, felt natural, and certainly helped game play -- no player can consider the dozens (sometimes over 100) of stones as individuals and come up with coherent game play. Chunking is such a natural and useful way of thinking.
But watching AlphaGo, I am not sure that's how it thinks of the game. Maybe it simply doesn't do chunking at all, or maybe it does chunking its own way, not influenced by the physical world as we humans invariably do. AlphaGo's moves are sometimes strange, and couldn't be explained by the way humans chunk the game.
It's both exciting and eerie. It's like another intelligent species opening up a new way of looking at the world (at least for this very specific domain). And much to our surprise, it's a new way that's more powerful than ours.
I have been watching Myungwan Kim's commentary for the games - and it seems notable that a few moves he finds very peculiar the moment they are made, he later points to as achieving very good results some 20 moves on. So it also seems quite possible that AlphaGo is actually reading this far ahead, and finds that those peculiar moves achieve better results than the more standard approaches.
Whether these constitute a 'new way' or not, I think, depends highly on whether these kinds of moves can fit into some general heuristics useful for considering positions, or whether the ability to make them is limited to intelligences with extremely high computational power for reading ahead.
This. It's a fairly common feature of any AI that uses some form of tree search/minimax, and the effect is very pronounced in chess. Even the best human players can only think 6-8 plies into the future versus ~18 for a computer. What we can (could?) do is apply smarter evaluation functions to the board states resulting from candidate plays and stop considering moves that look problematic earlier in the search (game tree pruning). AI tends to use very simple evaluation functions that can be computed quickly. They do so given that 1) it allows for deeper search, and a weak heuristic evaluated far in the future often beats a strong one evaluated a few plies prior and 2) for some games (like Go) it's really hard to codify the "intuitions" that human players speak of.
Because search based AI considers board states __very__ far in the future, the results are often completely counterintuitive in a game with an established theory of play. Those theories are born of humans, for humans.
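To make the "deep search plus cheap evaluation" idea concrete, here is a minimal sketch of depth-limited negamax with alpha-beta pruning. It runs on a toy Nim game (take 1 to 3 stones, whoever takes the last stone wins) rather than chess or Go, and the `evaluate` heuristic, the score constants and all function names are assumptions made up for illustration, not how any real engine is written.

```python
from typing import List

def moves(stones: int) -> List[int]:
    """Legal moves in a toy Nim game: take 1, 2 or 3 stones."""
    return [k for k in (1, 2, 3) if k <= stones]

def evaluate(stones: int) -> int:
    """Cheap, hypothetical heuristic used when the search horizon is reached.
    The score is from the point of view of the player to move."""
    return 1 if stones % 4 != 0 else -1

def negamax(stones: int, depth: int, alpha: int, beta: int) -> int:
    """Depth-limited negamax with alpha-beta pruning.
    Returns a score from the point of view of the player to move."""
    if stones == 0:
        return -100  # the previous player took the last stone and won
    if depth == 0:
        return evaluate(stones)  # horizon reached: fall back on the heuristic
    best = -1000
    for k in moves(stones):
        score = -negamax(stones - k, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # the opponent would never allow this line: prune
            break
    return best

def best_move(stones: int, depth: int) -> int:
    """Pick the move whose resulting position scores worst for the opponent."""
    return max(moves(stones),
               key=lambda k: -negamax(stones - k, depth - 1, -1000, 1000))
```

With enough depth, `best_move(5, 6)` takes one stone and leaves the opponent on a multiple of four. The search finds that by looking ahead to the end of the game, not because anyone told it the rule.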
The introduction of MCTS some years back was the first leap towards a human level Go AI (incidentally, MCTS is more human-like than exhaustive tree search in that it prunes aggressively by making early judgement calls as to what merits further consideration). AlphaGo's use of deep policy and evaluation networks to score the board is very cool, and the next step in that journey. What's interesting to me is that, unlike chess AI, AlphaGo might actually advance the human theory of Go. It's possible that these "strange moves" will lead to some very interesting insights if DeepMind traces them through the eval and policy networks and manages to back out a more general theory of play.
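For anyone who hasn't seen how MCTS makes those "early judgement calls", here is a rough sketch of the classic UCB1 selection step. This is plain textbook UCT, not AlphaGo's actual selection rule (which also uses its networks), and the exploration constant `c = 1.4` and the statistics in the usage note are made-up values.

```python
import math
from typing import Dict, Tuple

def ucb1(wins: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    """UCB1 score: average result plus an exploration bonus.
    Unvisited moves get infinite priority so each is tried at least once."""
    if visits == 0:
        return math.inf
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select(children: Dict[str, Tuple[float, int]]) -> str:
    """children maps a move name to (wins, visits); returns the move to explore next."""
    parent_visits = sum(visits for _, visits in children.values())
    return max(children, key=lambda move: ucb1(*children[move], parent_visits))
```

With `{"a": (60, 100), "b": (5, 10)}` the rule picks `b`: its win rate is lower, but it has been sampled so rarely that the bonus dominates. As `b` collects more visits the bonus shrinks, and the search concentrates on whichever move keeps winning.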
I think that Chess machines play perfectly for the next 8 moves, but don't necessarily sense the importance of a Knight Outpost (which may have relevance 20 moves ahead. A proper Knight Outpost will remain a fork threat for the rest of the game).
It is far easier for a Human to beat a Chess Machine at positional play (ex: a backwards pawn shape will probably be a problem at endgame, 30+ moves from now) than to beat a Chess Machine at tactical play (3 moves from now, I can force a fork between two minor pieces)
Do some reading on Stockfish for example if you doubt the veracity of my statement.
But it's just as you say: it's weighting parameters and heuristics. When Stockfish recognizes a backwards pawn, it deducts a point value. When Stockfish recognizes "pawn on 6th row", it adds a point value to that pawn.
But that's a heuristic. A heuristic trained on games, but it still comes down to what I understand to be a +/- point value (like... +35 centipawns).
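As a rough illustration of what "+/- point value" means here, a positional term can be sketched as a weighted sum over recognized features. The feature names and weights below are invented for the example and are not Stockfish's real terms or values.

```python
from typing import Dict

# Hypothetical weights in centipawns (100 = one pawn); NOT Stockfish's real values.
WEIGHTS: Dict[str, int] = {
    "backward_pawn": -20,
    "pawn_on_6th_rank": 35,
    "knight_outpost": 30,
}

def positional_score(feature_counts: Dict[str, int]) -> int:
    """Sum weight * count over the features recognized in a position."""
    return sum(WEIGHTS[name] * count for name, count in feature_counts.items())
```

For instance, `positional_score({"backward_pawn": 2, "pawn_on_6th_rank": 1})` comes out to -5 centipawns. However the weights were tuned, the engine's positional "opinion" ends up as a number like this, added on top of whatever the search proves tactically.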
In contrast, a chess engine truly knows that if you do X move, it will force a Rook / Minor piece exchange in 8 moves.
When you play positionally vs Stockfish, you're arguing with a heuristic (a heuristic which has been refined over many cycles of machine learning, but a heuristic nonetheless that comes down to "+/- centipawns"). When you play tactically vs Stockfish, it is evaluating positions more than a dozen moves ahead of what is humanly possible.
When you play against Stockfish in endgame tablebase mode, it plays utterly, and provably, perfectly.
Take a pick of what game you want to play against it. IMO, I'd bet on its positional "weakness" (yes, it is still very strong at positional play, but it is the most "heuristical" part of the engine)
My experience is with Chess and Chess AI, but in my experience, the more positional knowledge built into the evaluation function, the better the search performs, even if you have to sacrifice some speed for more thorough evaluation. A significant positional weakness may never be discovered within the search horizon of a chess engine because it may take 50 moves for the weakness to create a material loss, so while it's certainly possible that a deep, but carefully pruned search is being utilized, I suspect that some of the Value Network's evaluation is helping to create some of these seemingly odd moves.
For AlphaGo to recognize a position that doesn't achieve a good result for 20 moves, it would often have to search much deeper than those 20 moves (I'm not sure if you're using the term moves to mean ply or both players moving, but if it takes 20 AlphaGo moves for the advantage to materialize, that would be a minimum 40 ply search) to quiesce the search to the point that material exchanges have stopped (again, this is how chess typically does it, I don't know about Go), so the evaluation at the end of the 20 move sequence is arguably more important than a deep search. The sooner you can recognize that a position is good or bad for you, the more time you have to improve the position.
I fixed this behavior by scoring earlier wins higher than later wins. Now it will actually finish games (and win), but almost invariably its edge is very small, no matter how well or poorly I play. Because of the new win scoring, it willingly sacrifices its own advantage if it means securing a win even one turn earlier. (And since scoring is symmetrical, this has the added advantage of working to delay any win it sees for me, thus increasing the possibility of me making a mistake!)
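A minimal sketch of that kind of scoring, assuming the search knows how many plies away each terminal position is. The constant `WIN` and the function names are hypothetical, and this is only one way to implement "earlier wins score higher".

```python
from typing import Dict, Tuple

WIN = 1000  # hypothetical magnitude, larger than any possible search depth

def outcome_score(result: str, ply: int) -> int:
    """Score a terminal position reached after `ply` plies.
    Wins are worth more the sooner they come; losses hurt less the later they come."""
    if result == "win":
        return WIN - ply
    if result == "loss":
        return -WIN + ply
    return 0  # draw

def pick(lines: Dict[str, Tuple[str, int]]) -> str:
    """lines maps a line's name to (result, ply); returns the preferred line."""
    return max(lines, key=lambda name: outcome_score(*lines[name]))
```

Given `{"slow_win": ("win", 9), "fast_win": ("win", 3)}`, `pick` prefers `fast_win`. Because the scoring is symmetric, given two forced losses it prefers the one that takes longer, which is the "delay any win it sees for me" behaviour.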
I suppose I could try modifying the scoring rules again, to weight them by positional advantage. A "show off" mode if you like :) And again, with the flip side of working to create the least humiliating losses for itself.
Humans, I think, have the natural instinct to "hedge" themselves in games like go and chess, by creating positional/material advantages now to offset unknowns later. Of course, that advantage becomes useless in the end game, when all that matters is the binary win/lose.
An AI, which may have a deeper/broader view of the game tree than its human opponent (despite evaluating individual position strength in roughly the same manner), may see less of a need to "hedge" now, and instead spend moves creating more of a guaranteed advantage later (as you suggest). And indeed, my experience with my AI is that during the endgame (in which an AI generally knows with certainty the eventual outcome of each of its moves), it tends to retain the smallest advantage possible to win, preferring instead to spend moves to win sooner.
That's actually an excellent way to win chess games. Keep your eye on the mate while the other person is focusing on position and material.
Absolutely. Also worth noting that it may be simply unable to distinguish between good and bad moves if both outcomes lead to a win, since it has no conception of the margin of victory being important.
So it might not be that it increased win probability, but that both paths led to 100% win probability and it started playing "stupidly" due to lacking a score-maximizing bias.
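Here is a toy sketch of the difference between the two objectives. The move names and the sampled final margins are invented numbers, standing in for whatever a search or rollout would estimate.

```python
from typing import Dict, List

def win_probability(margins: List[float]) -> float:
    """Fraction of sampled outcomes that are wins (margin above zero)."""
    return sum(m > 0 for m in margins) / len(margins)

def expected_margin(margins: List[float]) -> float:
    """Average final margin over the sampled outcomes."""
    return sum(margins) / len(margins)

# Hypothetical sampled final margins (in points) for two candidate moves.
candidates: Dict[str, List[float]] = {
    "solid": [1.5, 0.5, 2.5, 1.5],        # always wins, but narrowly
    "greedy": [20.5, 15.5, -3.5, 18.5],   # bigger on average, sometimes loses
}

by_win_prob = max(candidates, key=lambda m: win_probability(candidates[m]))
by_margin = max(candidates, key=lambda m: expected_margin(candidates[m]))
```

Here `by_win_prob` is `"solid"` and `by_margin` is `"greedy"`. A player that only maximizes win probability has no reason to prefer a 20-point win over a half-point win, so its moves can look slack to someone who is counting the score.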
I'm confused. Why would 'make the winning move' not be the way to maximise probability of winning?
I suppose that, in Hive, it is more likely that a path to a win is longer rather than shorter. Hence, when my AI was arbitrarily choosing "winning" moves, it statistically chose those that drew the game out.
Your post should be required reading in this discussion.
People forget how literal computers are.
Humans play that way too. Everyone wants to maximize the chance of leading by >=1 stone.
The difference is that AlphaGo is better at calculating a precise value of a position, so that when uncertainty comes into play, AlphaGo can play for, say, a "1-3 stone lead", while a human can only get confidence in a "1-7 stone lead", and thus needs to play excessively aggressively to overcome the uncertainty.
That's called programming
if you have fully autonomous robots which can fight your war, you'd be able to launch a massive offensive within hours. properly mobilizing defenses and responding to that invasion would take too long, as any command centers would've already been wiped out by the first attack.
Some examples of 5th line early shoulder hits in recent professional play - these situations are not the same as the one seen in today's game, but something like a 5th line shoulder hit is always going to be highly contextual and creative.
http://ps.waltheri.net/database/game/26929/ (move 23)
http://ps.waltheri.net/database/game/69545/ (move 22)
http://ps.waltheri.net/database/game/71408/ (move 22)
http://ps.waltheri.net/database/game/4663/ (move 9)
The only one I can't parse is the last one. There are a lot of variations where I want to know what black's plan is.
For instance, I developed a system that used machine learning and linear solver models to spit out a series of actions to take in response to some events. The actions were to be acted on by humans who were experts in the field. In fact, they were the ones from whom we inferred the relevant initial heuristics.
Every day, I would get a support call from one of the users. They'd be like, 'this output is completely wrong. You have a bug in your code.'
I'd then have to spend several hours walking through each of the actions with them and recording the results. In every case, the machine would produce recommended actions that were optimal. However, they were rarely intuitive.
In the end, it took months of this back and forth until the experts began to trust the machine outputs.
This is the frightening thing about AI - not only can an AI outperform experts, but it often makes decisions that are incomprehensible.
Later, he did admit that the "overextension" on the north side of the board was more solid than he originally thought, and called it a good move.
He never explicitly said that a move was "good" or "bad", and always emphasized that as he was talking, his analysis of the game was relatively shallow compared to the players'. But in hindsight, whenever he pointed out a "bad-juju feel" about one of Lee's moves, AlphaGo managed to find a way to attack the position.
Overall, you knew when either player made a good move, because the commentator would stop talking and just stare at the board for minutes, at least until the other commentator (an amateur player) would force a conversation, so that the feed wouldn't be quiet.
The vast, vast majority of the time, the English-speaking 9-dan was predicting the moves of both players, in positions more complicated than I could read. (Oh, but it was obvious both players would move there. There were clearly times when the commentator would veer off into a deep distant conversation with the predicted moves still on the demonstration board, because he KNEW both players were going to play out a sequence of maybe 6 or 7 moves forward).
They really got a world-class commentator on the English live feed. If you've got 4 hours to spare, I suggest watching the game live.
> I sense a change in the announcer's attitude towards AlphaGo. Yesterday there were a few strange moves from AlphaGo that were called mistakes; today, similar moves were called "interesting".
If I'm an expert in some domain and a computer is telling me to do something completely different ("Trust me--just drive over the river!") I'm certainly going to question the result.
Could AlphaGo be winning in a way similar to left-handed fencers having an advantage over right-handers, by wrong-footing them rather than simply being better? Would giving Lee more chance to see this style give him a chance to catch up?
Think Bruce Lee and the creation of Jeet Kune Do. Before him everyone concentrated on improving one style by following it classically, rather than just thinking of 'how do I defeat someone'.
IMHO Lee is the best at the current style of Go. AlphaGo is the best at playing Go. Maybe humans can devise a better style and defeat AlphaGo, but I'm sure AlphaGo can adapt easily if another style exists.
Ke Jie is an arrogant 18 year old, and he's been saying on social networks over the past couple of days how he will defeat AlphaGo.
Swimming. It used to be that swimmers were supposed to be streamlined and avoid bulky muscles. Then a weightlifter decided he wanted to swim. Swimmers today all lift weights.
Programming. It used to be that people built programs in a very top down, heavily planned way. Think waterfall. We now understand that a highly iterative process is more appropriate in most areas of programming.
Expert systems. It used to be that we would develop expert systems (machine translation, competitive games, etc) through building large sets of explicit rules based on what human experts thought would work. Today we start with simple systems, large data sets, and use a variety of machine learning algorithms to let the program figure out its own rules. (One of the giant turning points there was when Google Translate completely demolished all existing translation software.)
Nowadays, top players slug it out baseline-to-baseline.
In terms of stance, we were taught to hit from a rotated position where your shoulder faces the net, and a normal vector from your chest points to either the left or right side of the court.
Nowadays, it's much more common to hit from an "open" position, where your body is facing the net, not turned. This would have been considered "unprepared" or poor footwork in my day, but it actually allows for greater reach. It does make it more difficult to hit a hard shot, but that's made up for by racquet technology and generally stronger players.
Although it takes a few paragraphs until it gets into the details of "today's power-baseline game."
Which is a curious point. The gripes about early brute force search algorithms (e.g. Deep Blue?) were that they felt unnatural.
However, as the searches get more nuanced and finely grained, is there a point at which a fast machine begins doing fast stupid machine things quickly enough to feel smart?
Are there any chess / Go analogs of the Turing test? Or are computer players always still recognizable at a high level?
A Turing test for game players is an interesting idea. It would be useful for designing game players that are good sparring partners rather than brutes that can wipe the floor with you.
As for JKD, people are drawn in by its oriental esotericism, but there's no evidence it is an especially effective fighting style, or that it has something that (kick)boxing does not.
Remember that AlphaGo has spent months developing its own style and theory of the game in a way that no human has ever seen. Its style is sure to have weaknesses, but humans will have a hard time figuring them out on first sight.
Similarly chess computers do better in some positions than others (they love open tactics!) and one of the games that Kasparov won against Deep Blue he won by playing an extreme anti-silicon style that took advantage of computer weaknesses. However Kasparov didn't have to figure out what that style was because there was a lot of knowledge floating around about how to do that.
Therefore I'd expect that Lee Sedol a year from now could beat the AlphaGo of today. And human Go will improve in general from trying to figure out what AlphaGo has discovered.
However that won't help humans going forward. AlphaGo is not done figuring out the game. At its current rate of improvement, AlphaGo a year from now, running on a single PC, should be able to beat the full distributed version of AlphaGo that is playing today. Now the march of progress is not whether computers can beat professionals. It is going to be how small a computing device can be and still beat the best player in the world.
Weaknesses are only relative to capabilities of the opponent to exploit them. If a tank has a weak spot that rockets can hit, but it's being opposed by humans on horseback, is it really a weakness in that context?
Additionally AlphaGo has the advantage that it started with a database of human play, so it has some ideas what kinds of positions humans miscalculate.
As for your tank vs horseback analogy, that's flawed at the moment. AlphaGo is probably reasonably close in strength to the human facing him. Improved human knowledge could tip the balance.
However in the future it will become an apt analogy. Computers are going to become so good that knowing the relative weaknesses in their style of play may reduce the handicap you need against them, but won't give you a chance of becoming even with them. That happened close to 20 years ago in chess, and is now only a question of time in Go.
Yes. A representation of ladders is among the input features of its neural networks.
Feature               Planes  Description
Stone colour          3       Player stone / opponent stone / empty
Ones                  1       A constant plane filled with 1
Turns since           8       How many turns since a move was played
Liberties             8       Number of liberties (empty adjacent points)
Capture size          8       How many opponent stones would be captured
Self-atari size       8       How many of own stones would be captured
Liberties after move  8       Number of liberties after this move is played
Ladder capture        1       Whether a move at this point is a successful ladder capture
Ladder escape         1       Whether a move at this point is a successful ladder escape
Sensibleness          1       Whether a move is legal and does not fill its own eyes
Zeros                 1       A constant plane filled with 0
Player color          1       Whether current player is black
(The number is how many 19x19 planes the feature consists of.)
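To get a feel for the size of that input, here is a small sketch that adds up the plane counts listed above and works out where each feature would start in a stacked input. The stacking order, and the names `FEATURE_PLANES` and `plane_offsets`, are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

BOARD_SIZE = 19

# (feature name, number of 19x19 planes), copied from the list above.
FEATURE_PLANES: List[Tuple[str, int]] = [
    ("Stone colour", 3),
    ("Ones", 1),
    ("Turns since", 8),
    ("Liberties", 8),
    ("Capture size", 8),
    ("Self-atari size", 8),
    ("Liberties after move", 8),
    ("Ladder capture", 1),
    ("Ladder escape", 1),
    ("Sensibleness", 1),
    ("Zeros", 1),
    ("Player color", 1),
]

def plane_offsets(features: List[Tuple[str, int]]) -> Dict[str, int]:
    """Index of each feature's first plane, assuming planes are stacked in list order."""
    offsets: Dict[str, int] = {}
    start = 0
    for name, count in features:
        offsets[name] = start
        start += count
    return offsets

total_planes = sum(count for _, count in FEATURE_PLANES)
input_values = total_planes * BOARD_SIZE * BOARD_SIZE
```

The listed features add up to 49 planes, so one position is described by 49 * 19 * 19 = 17689 numbers. Only two of those planes are about ladders.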
I could easily see the difference in tournaments with other clubs that were not used to left handed players.
(This also applies without a 100% chance of winning, as long as its chances of winning hover near the highest percent it's able to distinguish.)
Even if it did output an actual 100% chance, AlphaGo would still end up picking moves favored by the policy network, so it would probably just revert to playing like it predicts a human pro would.
It's similar to how ray tracing renderers start to return weird speckle patterns when the room is dark enough.
And the policy network chooses branches to investigate, not which one to choose. It adds sample resolution to places pros might play, but doesn't add to the estimated probability of winning.
Edit: Actually, since places pros might play have higher sample resolution, they're less random. So worse moves get worse evaluation, and a higher chance of leading the pack. This might actually bias AlphaGo to play some pretty bad moves - but, again, this is all assuming it's going to win anyway.
The excellent point you're making applies in general to nearly every type of human thinking.
The way we think about other people, our intuitions about probabilities, our predictions about politics, and so on -- all are based on our peculiarly effective, yet woefully approximate, analogy based reasoning.
It shouldn't be surprising in the least when commonly accepted "expert" heuristics are proved wrong by AIs that actually search the space of possibilities with orders of magnitude more depth than we can. What's surprising -- and I think still a mystery -- is how human heuristics are able to perform so well to begin with.
I'm not a Go player, but I saw this same phenomenon as poker bots have surpassed humans in ability. As with AlphaGo, they make plays that fly in the face of years of "expert" wisdom. Of course, as with any revolutionary thinking, some of the new strategies are "obvious" in hindsight, and experts now use them. Others seem to require the computational precision of a computer to be effective in practice, and so can't be co-opted. That is, we can't extract a new human-compatible "heuristic" from them -- the complexity is just irreducible.
They are peculiarly effective only because of a lack of comparison. Humans have been the most intelligent species on this planet for millennia, with no other species coming even close. We don't know how ineffective those strategies would look to a more advanced species. Well, until now.
Of course, the counterpoint could be that it's only the case because humans, with their laughable reasoning abilities, are the ones programming those computers.
AlphaGo can’t decide that it’s bored and go skydiving. Humans aren’t merely capable of playing Go. And when they do it, they can also pace around the table, and drink something, all at the same time, on a ridiculously low energy budget. Or they can decide never to learn Go in the first place but to master an equally difficult other discipline. They continuously decide what out of all of this to do at any given moment.
AlphaGo was built by humans, for a purpose selected by humans, out of algorithms designed by humans. It is not a more advanced species. It’s not even a general intelligence.
Your own original point was much better than the one made in response.
"oh what if the machine suddenly came alive!?" has been done 1000 times. But concepts like a computer that can detect and act on patterns which we cannot, in ways that are almost, if not actually, intelligent, are orders of magnitude more believable, and therefore more compelling.
Of course, those fools underestimated it. They should have known better...
Pretty much the same thing happened with TD-Gammon, which played unconventional moves. In the longer term humans ended up adopting some of TD-Gammon's tactics once they understood how they played out. It wouldn't be surprising to see the same happen with Go.
“Top competitors who once relied on particular styles of play are now forced to mix up their strategies, for fear that powerful analysis engines will be used to reveal fatal weaknesses in favoured openings....Anything unusual that you can produce has quadruple, quintuple the value, precisely because your opponent is likely to do the predictable stuff, which is on a computer” 
Anand isn't really talking about strategy here, he's just talking about choice of opening. Players with narrow opening repertoires, like Fischer, have always been easier to prepare for than players who play a wide variety of openings.
As far as actual changes to strategy, the most obvious one is that computers tend to value material more highly than humans. So a computer will take a risky pawn if it looks sound, while a human will see that taking the pawn is very complicated and prefer a simpler move.
(1) Online game databases have made it easier for players to track developments in opening theory and prepare to play specific opponents
(2) Chess engines add to this by being used to search for antidotes to complicated opening systems
(3) Young players have greater access to high-quality sparring partners - either engines or fellow humans on online servers.
This has led to the best players becoming younger, and players playing more varied and less 'sharp' openings.
It uses MCTS, which is unlike minimax. It doesn't use temporal difference learning, although they say that the policy somewhat resembles TD.
That doesn't sound like 'essentially built on'; it sounds maybe like 'slightly influenced by'.
Tesauro's work on TD-Gammon was pioneering at the high level, i.e. combining reinforcement learning + self-play + neural networks.
Looks like citation 46 is the relevant one here.
Until someone got better weapons and suddenly the "rules" of the battlefield that dictated standing in lines across each other made no sense to follow anymore because the original principles that dictated those rules to be good were not valid anymore.
I think this will be the theme of our future interactions with AIs. We simply can't imagine in advance how they will see and interact with the world. There will be many surprises.
It's not like this at all; let's not do this sort of thing. Humans are inveterate myth makers (viz. your description of how people conceive the Go board as army units), and our impositions on the world are easily confused for reality.
In this case, there's no "intelligent species" at work other than humans. We made this, and it is not an intelligence, it is a series of mathematical optimization functions. We have been doing this for decades, and these systems, while sophisticated, are mathematical toys that we have applied. We built and trained this thing to do exactly this.
As a student of AI you know that convolutional neural networks are black boxes and are hard to interpret. A different choice of machine would have yielded more insight about how it is operating (for example, decision trees are easier to interpret). The inscrutability of the system is not a product of its complexity; even a simple neural network is hard to understand.
This, actually, is my primary objection to using CNNs as the basic unit of machine learning - they don't help US learn, they require us to put our faith in machines that are trained to operate in ways that are resistant to inspection. In the future I hope that this research will move more towards models that provide interpretable results, so they ARE actually a tool for improved understanding.
You can say the same about your mind too which is a bunch of optimization nodes. If something is intelligent, does it matter if it's evolved in nature or created by a species who is evolved in nature?
> In the future I hope that this research will move more towards models that provide interpretable results
I think it's not really possible to understand in detail how these networks operate on the level of nodes, because emergent behavior is necessarily more complex than the sum of its parts.
A CNN is a pure mathematical function - if you want, you could write it down that way. Given a set of inputs, it will always produce the same output. We don't call a linear regression model an "intelligence", a CNN is no different.
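To show what "write it down that way" could look like, here is a minimal pure-Python sketch of a single convolution step followed by a ReLU. It is one tiny layer with hand-picked numbers, not AlphaGo's network, and the name `conv2d_relu` is made up for the example.

```python
from typing import List

Matrix = List[List[float]]

def conv2d_relu(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (no padding, stride 1), take the weighted sum
    at each position, then clamp negatives to zero (ReLU).
    This is cross-correlation, which is what most deep learning libraries
    call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            max(0.0, sum(image[i + a][j + b] * kernel[a][b]
                         for a in range(kh) for b in range(kw)))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]
```

Calling it twice with the same `image` and `kernel` gives the same result both times. There is no state and no randomness, only multiplications, additions and a max. A full network is many of these composed together.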
Of course I agree that humans are built up of billions of tiny machines like this, but let's appreciate the vast difference in scale.
> A CNN is a pure mathematical function
That's their basic property, but who are we to say that our cell based neural network is superior? Cells are just compositions of atoms and they are defined by quantum mechanics, which is... "just" math and information.
I also think that Go might be a great communication tool between AI and humans. If you look at the commentary from this angle, it's fun to think about it like this.
I think the answer must be in figuring out how to decompose the black box of a CNN - it is, after all, just a set of simple algebraic operations at work, and we should be able to get something out of inspection.
I have to imagine Hinton et al. have done work in this regard, but this is far afield for me, so if it exists I don't know it.
Human intuition and, to a certain extent, creativity are like this as well.
And this is just the beginning with AlphaGo. As we keep on training Deep Learning systems for other domains, we'll realise how differently they approach problems and solve them. It'll, in turn, help us in adapting these different perspectives and applying them to solve other problems as well.
.. that we'll be probably unable to comprehend ourselves.
Which attempts to visualize machine areas of attention that look like: