
As someone who studied AI in college and is a reasonably good amateur player, I have been following the matches between Lee and AlphaGo.

AlphaGo plays some unusual moves that clearly go against what any classically trained Go player would play. These moves simply don't fit into the current theories of Go, and the world's top players are struggling to explain the purpose or strategy behind them.

I've been giving it some thought. When I was learning to play Go as a teenager in China, I followed a fairly standard, classical learning path. First I learned the rules, then progressively I learned the more abstract theories and tactics. Many of these theories, as I see them now, draw analogies from the physical world, and are used as tools to hide the underlying complexity (chunking) and enable the players to think at a higher level.

For example, we're taught to consider connected stones as one unit, and to give this unit attributes like dead, alive, strong, weak, or projecting influence over the surrounding areas. In other words, much like a standalone army unit.

These abstractions all make a lot of sense, feel natural, and certainly help game play -- no player can consider dozens of stones (sometimes over 100) as individuals and come up with a coherent plan. Chunking is such a natural and useful way of thinking.

But watching AlphaGo, I am not sure that's how it thinks of the game. Maybe it simply doesn't do chunking at all, or maybe it does chunking its own way, not influenced by the physical world as we humans invariably are. AlphaGo's moves are sometimes strange, and can't be explained by the way humans chunk the game.

It's both exciting and eerie. It's like another intelligent species opening up a new way of looking at the world (at least for this very specific domain). And much to our surprise, it's a new way that's more powerful than ours.




> It's both exciting and eerie. It's like another intelligent species opening up a new way of looking at the world (at least for this very specific domain). And much to our surprise, it's a new way that's more powerful than ours.

I have been watching Myungwan Kim's commentary for the games, and it seems notable that a few moves he finds very peculiar when they are made, he will later point out as achieving very good results some 20 moves later. So it also seems quite possible that AlphaGo is actually reading this far ahead and finding that those peculiar moves achieve better results than the more standard approaches.

Whether these constitute a 'new way' or not, I think, depends highly on whether these kinds of moves can fit into some general heuristics useful for considering positions, or whether the ability to make them is limited to intelligences with extremely high computational power for reading ahead.


> he will later point out as achieving very good results some 20 moves later

This. It's a fairly common feature of any AI that uses some form of tree search/minimax, and the effect is very pronounced in chess. Even the best human players can only think 6-8 plies into the future versus ~18 for a computer. What we can (could?) do is apply smarter evaluation functions to the board states resulting from candidate plays and stop considering moves that look problematic earlier in the search (game tree pruning). AI tends to use very simple evaluation functions that can be computed quickly. They do so because 1) it allows for deeper search, and a weak heuristic evaluated far in the future often beats a strong one evaluated a few plies prior, and 2) for some games (like Go) it's really hard to codify the "intuitions" that human players speak of.
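
To make the pruning idea above concrete, here is a minimal sketch of minimax with alpha-beta pruning over a hand-made game tree. The tree, its leaf scores, and the name `alphabeta` are invented for illustration; the leaves stand in for whatever cheap evaluation function an engine would compute, and this is not how any particular engine is implemented.

```python
from math import inf

def alphabeta(node, alpha=-inf, beta=inf, maximizing=True, visited=None):
    """Return the minimax value of `node`, skipping branches that cannot change the result."""
    if isinstance(node, (int, float)):       # leaf: a static evaluation score
        if visited is not None:
            visited.append(node)             # record which leaves were actually evaluated
        return node
    best = -inf if maximizing else inf
    for child in node:
        value = alphabeta(child, alpha, beta, not maximizing, visited)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:                    # the other side would never allow this line
            break
    return best

# Toy tree (assumption): three candidate moves, each with two opponent replies.
tree = [[3, 5], [2, 9], [0, 1]]
visited = []
value = alphabeta(tree, visited=visited)
print(value, visited)  # 3 [3, 5, 2, 0]
```

On this toy tree the search returns 3 after evaluating only four of the six leaves; the two skipped leaves sit in branches that were cut off as soon as they were shown to be worse than the first option.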

Because search based AI considers board states __very__ far in the future, the results are often completely counterintuitive in a game with an established theory of play. Those theories are born of humans, for humans.

The introduction of MCTS some years back was the first leap towards a human-level Go AI (incidentally, MCTS is more human-like than exhaustive tree search in that it prunes aggressively by making early judgement calls as to what merits further consideration). AlphaGo's use of deep policy and value networks to score the board is very cool, and the next step in that journey. What's interesting to me is that, unlike chess AI, AlphaGo might actually advance the human theory of Go. It's possible that these "strange moves" will lead to some very interesting insights if DeepMind traces them through the value and policy networks and manages to back out a more general theory of play.


Wasn't the breakthrough with AlphaGo that it doesn't consider every board combination in the future? Because there are too many combinations?


Yes, but pruning (not considering everything) is as old as game tree search. Previous Go AIs used MCTS as well. What's new in AlphaGo is a more sophisticated approach to scoring game boards - policy networks that help the AI prune even more aggressively, and a value network that's used to "guess" the winner in lieu of searching to endgame. Note that guessing the winner is just a special case of an evaluation function. For any game, if you could consistently search to the end, your evaluation function is always a -1/1 corresponding to lose/win. AlphaGo is still using MCTS - just a more sophisticated form.
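
As a rough illustration of how a policy prior can steer the search away from unpromising moves, here is a small sketch of a prior-weighted selection step of the kind used in policy-guided MCTS. It is loosely modeled on that idea only: the function name `select_child`, the constant `c_puct=1.0`, and all the numbers are assumptions for the example, not DeepMind's code or parameters.

```python
import math

def select_child(children, c_puct=1.0):
    """Pick the move to explore next: mean value so far plus a prior-weighted exploration bonus."""
    total_visits = sum(ch["visits"] for ch in children.values())

    def score(ch):
        # Mean outcome of the simulations that went through this move (0 if unvisited).
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        # Bonus is large for moves the policy likes and shrinks as the move gets visited.
        u = c_puct * ch["prior"] * math.sqrt(total_visits) / (1 + ch["visits"])
        return q + u

    return max(children, key=lambda move: score(children[move]))

# Hypothetical statistics for three candidate moves at one node.
children = {
    "a": {"prior": 0.70, "visits": 10, "value_sum": 4.0},
    "b": {"prior": 0.25, "visits": 2, "value_sum": 1.2},
    "c": {"prior": 0.05, "visits": 0, "value_sum": 0.0},
}
print(select_child(children))  # b
```

With these made-up numbers, move "b" is explored next because it has done well in few visits, while "c" stays unexplored for now because its prior is tiny. That is the sense in which the policy prunes: low-prior moves are not forbidden, they just rarely get search time.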


On the contrary.

I think that Chess machines play perfectly for the next 8 moves, but don't necessarily sense the importance of a Knight Outpost (which may have relevance 20 moves ahead. A proper Knight Outpost will remain a fork threat for the rest of the game).

It is far easier for a Human to beat a Chess Machine at positional play (ex: a backwards pawn shape will probably be a problem at endgame, 30+ moves from now) than to beat a Chess Machine at tactical play (3 moves from now, I can force a fork between two minor pieces)


This was true 10-15 years ago. It is no longer true. Chess engines have positional evaluation algorithms that have been trained using many millions of games, and the weighting parameters for different kinds of positional features have been adjusted accordingly.

Do some reading on Stockfish for example if you doubt the veracity of my statement.


Yes, I do realize that.

But it's just as you say: it's weighting parameters and heuristics. When Stockfish recognizes a backwards pawn, it deducts a point value. When Stockfish recognizes "pawn on 6th row", it adds a point value to that pawn.

But that's a heuristic. A heuristic trained on games, but it still comes down to what I understand to be a +/- point value (like... +35 centipawns).

In contrast, a chess engine truly knows that if you do X move, it will force a Rook / Minor piece exchange in 8 moves.

When you play positionally vs Stockfish, you're arguing with a heuristic (a heuristic which has been refined over many cycles of machine learning, but a heuristic nonetheless that comes down to "+/- centipawns"). When you play tactically vs Stockfish, it is evaluating positions more than a dozen moves ahead of what is humanly possible.

When you play against Stockfish in endgame tablebase mode, it plays utterly, and provably, perfectly.

Take your pick of which game you want to play against it. IMO, I'd bet on its positional "weakness" (yes, it is still very strong at positional play, but it is the most heuristic part of the engine).
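
For anyone unfamiliar with what a "+/- centipawns" heuristic looks like, here is a minimal sketch of a static evaluation as a weighted sum of positional features. The feature names and weights are hypothetical values picked for the example; they are not Stockfish's actual terms or numbers.

```python
# Hypothetical weights in centipawns (100 = one pawn); not Stockfish's actual values.
WEIGHTS_CP = {
    "material_pawns": 100,
    "backward_pawn": -15,
    "pawn_on_6th_rank": 35,
    "knight_outpost": 25,
}

def evaluate(features):
    """Static evaluation: a weighted sum of counted positional features."""
    return sum(WEIGHTS_CP[name] * count for name, count in features.items())

# A made-up position: up one pawn, two backward pawns, one pawn on the 6th rank.
position = {"material_pawns": 1, "backward_pawn": 2, "pawn_on_6th_rank": 1}
print(evaluate(position))  # 100 - 30 + 35 = 105
```

However well the weights are tuned, the result is still a single number summarizing a guess about the position, which is the contrast being drawn with a forced tactical line the search has actually read out.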


If this is true, why do computers regularly and consistently defeat even the best humans in full games of chess that last dozens of moves?


Because they beat humans at tactical play, like he said. Just because something has a weakness, doesn't mean it isn't better.


It's also possible that the positional evaluation is strong enough that AlphaGo can see the value in a position before the human because of the complexity involved in determining the "value" of a given position.

My experience is with Chess and Chess AI, but in my experience, the more positional knowledge built into the evaluation function, the better the search performs, even if you have to sacrifice some speed for more thorough evaluation. A significant positional weakness may never be discovered within the search horizon of a chess engine because it may take 50 moves for the weakness to create a material loss, so while it's certainly possible that a deep, but carefully pruned search is being utilized, I suspect that some of the Value Network's evaluation is helping to create some of these seemingly odd moves.

For AlphaGo to recognize a position that doesn't achieve a good result for 20 moves, it would often have to search much deeper than those 20 moves (I'm not sure if you're using the term moves to mean ply or both players moving, but if it takes 20 AlphaGo moves for the advantage to materialize, that would be a minimum 40 ply search) to quiesce the search to the point that material exchanges have stopped (again, this is how chess typically does it, I don't know about Go), so the evaluation at the end of the 20 move sequence is arguably more important than a deep search. The sooner you can recognize that a position is good or bad for you, the more time you have to improve the position.


It seems that you are trying to coin a new word to describe this new way of looking at the world. If humans are able to decode the information contained in those unexpected moves, perhaps by creating a new heuristic, that could be viewed as a way of understanding the features the machine uses internally, that is, reading the machine's brain. If humans can decode that information by creating new heuristics, we could say that we are in a new stage of AI, in which learning between different intelligent species should be studied.


I would imagine it's absolutely thinking that far ahead. That said, it can't possibly search every possible solution, just needs to find an adequate one


There's also the fact that some of the unexpected moves were apparently more about solidifying against a loss than increasing the magnitude of a win. Which has its own kind of eerie implication: since AIs (like all computer programs) do what you say, not what you mean, the "intelligent species" can sometimes work really intelligently towards a goal that wasn't quite what you had in mind. (Gets especially interesting for any AlphaHuman/AlphaCEO/AlphaPresident successors that are given goals more complicated & nuanced than "maximize Go win probability regardless of ending score". BTW, if you haven't already read the Wait But Why series on the future of AI, I recommend it: http://waitbutwhy.com/2015/01/artificial-intelligence-revolu...)


About a year ago I wrote an AI to play the board game "Hive" (shares some similarities with chess). Because I scored all wins equally, it behaved almost exactly like this. It would simply try to minimize my advantage while always keeping open the possibility for it to win, almost like a cat toying with prey. It never actually would make the winning move – however obvious – until it had no other options!

I fixed this behavior by scoring earlier wins higher than later wins. Now it will actually finish games (and win), but almost invariably its edge is very small, no matter how well or poorly I play. Because of the new win scoring, it willingly sacrifices its own advantage if it means securing a win even one turn earlier. (And since scoring is symmetrical, this has the added advantage of working to delay any win it sees for me, thus increasing the possibility of me making a mistake!)
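
The before/after behavior described here can be reproduced in a few lines. This is only a sketch on a hand-built tree, not the Hive AI itself: the tree shape, the constant `WIN`, and the names `minimax` and `best_move` are all made up for the example.

```python
WIN = 1000  # arbitrary large score for a decided game (assumption)

def minimax(node, depth, maximizing, prefer_fast_wins):
    """Score a node; leaves are 'win' or 'loss' from the root player's point of view."""
    if node == "win":
        return WIN - depth if prefer_fast_wins else WIN
    if node == "loss":
        # Symmetric scoring: later losses are less bad, so the AI also delays defeat.
        return -(WIN - depth) if prefer_fast_wins else -WIN
    values = [minimax(child, depth + 1, not maximizing, prefer_fast_wins) for child in node]
    return max(values) if maximizing else min(values)

def best_move(root, prefer_fast_wins):
    """Return (index of chosen move, score of every move) for the player to move at the root."""
    scores = [minimax(child, 1, False, prefer_fast_wins) for child in root]
    return scores.index(max(scores)), scores

# Move 0 is a forced win that takes two more turns; move 1 wins immediately.
root = [[["win"]], "win"]
print(best_move(root, prefer_fast_wins=False))  # (0, [1000, 1000])
print(best_move(root, prefer_fast_wins=True))   # (1, [997, 999])
```

With flat scoring both moves look identical, so the tie is broken arbitrarily (here, by list order) and the slow win gets picked. Subtracting the depth makes the immediate win strictly better.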

I suppose I could try modifying the scoring rules again, to weight them by positional advantage. A "show off" mode if you like :) And again, with the flip side of working to create the least humiliating losses for itself.


In go, the purpose is to have more territory than the opponent. There is no point in humiliating the opponent by having a big advantage. I think the aim of the strange moves was to increase the confidence of the program in its advantage, not to increase the advantage.


Sorry, I didn't mean the intent would be to humiliate, just the appearance.

Humans, I think, have the natural instinct to "hedge" themselves in games like go and chess, by creating positional/material advantages now to offset unknowns later. Of course, that advantage becomes useless in the end game, when all that matters is the binary win/lose.

An AI, which may have a deeper/broader view of the game tree than its human opponent (despite evaluating individual position strength in roughly the same manner), may see less of a need to "hedge" now, and instead spend moves creating more of a guaranteed advantage later (as you suggest). And indeed, my experience with my AI is that during the endgame (in which an AI generally knows with certainty the eventual outcome of each of its moves), it tends to retain the smallest advantage possible to win, preferring instead to spend moves to win sooner.


> Humans, I think, have the natural instinct to "hedge" themselves in games like go and chess, by creating positional/material advantages now to offset unknowns later. Of course, that advantage becomes useless in the end game, when all that matters is the binary win/lose.

That's actually an excellent way to win chess games. Keep your eye on the mate while the other person is focusing on position and material.


> I think the aim of the strange moves was to increase the confidence of the program in its advantage, not to increase the advantage.

Absolutely. Also worth noting that it may be simply unable to distinguish between good and bad moves if both outcomes lead to a win, since it has no conception of the margin of victory being important.

So it might not be that it increased win probability, but that both paths led to 100% win probability and it started playing "stupidly" due to lacking a score-maximizing bias.
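
A toy comparison shows the difference between the two objectives. The move names and the lists of possible final margins are invented for the example, and treating each margin as equally likely is an assumption; nothing here reflects how AlphaGo actually estimates outcomes.

```python
# Hypothetical final margins (in points) for three candidate moves, equally likely.
moves = {
    "sloppy": [1, 1, 1, 1, 1],         # always wins, but only barely
    "solid": [3, 4, 5, 5, 6],          # always wins, by a bit more
    "greedy": [15, 20, 25, -5, -10],   # usually wins big, sometimes loses
}

def win_probability(margins):
    """Fraction of outcomes in which the margin is positive."""
    return sum(m > 0 for m in margins) / len(margins)

def expected_margin(margins):
    """Average final margin over all outcomes."""
    return sum(margins) / len(margins)

by_win_probability = max(moves, key=lambda name: win_probability(moves[name]))
by_margin = max(moves, key=lambda name: expected_margin(moves[name]))
print(by_win_probability, by_margin)  # sloppy greedy
```

A chooser that only looks at win probability sees "sloppy" and "solid" as identical (both 100%), so it takes whichever it happens to see first, which can look like playing "stupidly". A margin maximizer would instead pick "greedy" and accept a real chance of losing.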


But you could indeed humiliate the opponent by actually capturing ALL of his stones. But that won't happen if the opponent knows at least the basic concepts ... Still, if you play well, you cover a lot of ground while trying to suppress the opponent's area and even crushing him. But classic go is nice in that it gives weaker players a starting bonus of some stones, so the game is balanced and domination usually won't happen ...


My brother once played the (then) British Youth Go Champion on a 13x13 board, and lost by around 180 points - literally scoring worse than if he hadn't played at all.


> It never actually would make the winning move – however obvious – until it had no other options!

I'm confused. Why would 'make the winning move' not be the way to maximise probability of winning?


The AI is based on the minimax algorithm [1]. Because of the way Minimax works, the only way for a possible next move to be designated a "win" is if it is a guaranteed win. (The tree is (effectively) fully explored, and the opponent is given the benefit of the doubt in the face of incomplete information.) So, if there are multiple such winning moves, and care is not taken to distinguish the "magnitude" of the win, the AI will choose one arbitrarily.

I suppose that, in Hive, it is more likely that a path to a win is longer rather than shorter. Hence, when my AI was arbitrarily choosing "winning" moves, it statistically chose those that drew the game out.

[1] https://en.wikipedia.org/wiki/Minimax


But once you have guaranteed winning moves, why not pick the shortest one available (in terms of turns)?


Yes, that's what I did after I found the design flaw which effectively threw that information away.


Usually it's because as it searches the move tree, it finds ways for the opponent to maximize their own winning probability and so has to hedge against that. In minimax games sometimes the evaluator finds a long chain of moves that leads to a win, and once it finds that, doesn't necessarily bother trying to find a shorter one. It can be frustrating to tune that out.


That happens if the winning probability of the other move is 1 as well.


maybe he's defining a "winning move" as something with > 50% chance of winning


Thank you for this.

Your post should be required reading in this discussion.

People forget how literal computers are.


> There's also the fact that some of the unexpected moves were apparently more about solidifying against a loss than increasing the magnitude of a win.

Humans play that way too. Everyone wants to maximize the chance of leading by >=1 stone. The difference is that AlphaGo is better at calculating a precise value of a position, so that when uncertainty comes into play, AlphaGo can play for, say, a "1-3 stone lead", while a human can only get confidence in a "1-7 stone lead", and thus needs to play excessively aggressively to overcome the uncertainty.
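
One way to read the "1-3" versus "1-7" example is as a mean lead plus or minus two standard deviations. The sketch below works that out under a loud assumption: that the error in a player's estimate of the final lead is normally distributed. The function names and the sigma values are made up to match the numbers in the comment.

```python
from statistics import NormalDist

def required_mean_lead(sigma, target=1.0, z=2.0):
    """Mean lead to aim for so that `target` sits z standard deviations below it."""
    return target + z * sigma

def p_lead_at_least(mean, sigma, target=1.0):
    """Probability that the final lead is at least `target`, assuming normal error."""
    return 1.0 - NormalDist(mu=mean, sigma=sigma).cdf(target)

precise = required_mean_lead(sigma=0.5)  # estimate spans roughly 1 to 3 stones
rough = required_mean_lead(sigma=1.5)    # estimate spans roughly 1 to 7 stones
print(precise, rough)  # 2.0 4.0
print(round(p_lead_at_least(precise, 0.5), 3), round(p_lead_at_least(rough, 1.5), 3))
```

Under this model both players end up with the same confidence of finishing at least one stone ahead, but the one with the noisier estimate has to aim for twice the average lead to get it, which fits the point about needing to play more aggressively.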


> the "intelligent species" can sometimes work really intelligently towards a goal that wasn't quite what you had in mind.

That's called programming


Right. Skynet and Terminator are science fiction, but the slippery, unpredictable reality of how computers actually behave is right in front of your eyes as a programmer every day. Sometimes I wonder if science fiction writers do more harm than good: once they make a movie about some possible future, people feel free to dismiss it as "just science fiction", even if they have easily available empirical evidence that something vaguely like the scenarios described actually kinda has the potential to occur.


Not unlike the Simpsons episode where the military school graduation speech tells them the wars of the future will be fought with robots and that their jobs will be to maintain those robots.


that's .. unlikely. this could only happen if two wealthy and highly developed nations want to make a spectacle out of a war.

if you have fully autonomous robots which can fight your war, you'd be able to launch a massive offensive within hours. properly mobilizing defenses and responding to that invasion would take too long, as any command centers would've already been wiped out by the first attack.


I wasn't saying it will literally happen exactly as a Simpsons episode predicted, just that it is interestingly relevant for a joke from 20 years ago.


I think AlphaGo is playing very natural go! The 5th move shoulder hit that is the subject of so much commentary would fit into the theory of go that players like Takemiya espouse. It has chosen to emphasize influence and speed and has not been afraid to give solid territory early in the games so far. It's very exciting play but not inhuman play, and if professionals are allowed to train with AlphaGo it will surely usher in the next decade's style of play. Don't forget that the game has changed every 10 years for the past 100 years, it should not be surprising that it is continuing to change now!


It didn't look like a Takemiya-style move to me. Takemiya tends to play for a huge moyo in the center. AlphaGo had no such moyo. It wasn't only a strange move; it was also a strange time to play it, and it definitely went against conventional wisdom.


The result of the shoulder hit coordinated with black's bottom formation, and the extension on the 4th line that threatened to cut white's stones off was flexible and could have easily formed an impressive moyo on the bottom. It did not play out that way, but I think that black's strategy was as cosmic as anything Takemiya might have played. His games did not always end with a giant moyo, he was also very flexible. I hope to see written reactions from professional players, and maybe Takemiya will give AlphaGo's style his endorsement :)

Some examples of 5th line early shoulder hits in recent professional play - these situations are not the same as the one seen in today's game, but something like a 5th line shoulder hit is always going to be highly contextual and creative.

http://ps.waltheri.net/database/game/26929/ (move 23) http://ps.waltheri.net/database/game/69545/ (move 22) http://ps.waltheri.net/database/game/71408/ (move 22) http://ps.waltheri.net/database/game/4663/ (move 9)


Those games are really interesting. In the first two, they are both ladder-breakers played by stronger players; my guess is the weaker players set up the ladders assuming that the stronger players wouldn't play a fifth line shoulder hit to break them, and the stronger player didn't back down. In the third game, the fifth line shoulder hits aren't that surprising; they're reductions against frameworks that were allowed to get big in exchange for growing an opposing framework; they're locally bad moves but the global benefits are clear; you'll note that both players play a fifth line shoulder hit.

The only one I can't parse is the last one. There are a lot of variations where I want to know what black's plan is.


Thanks for linking to the examples! That is interesting indeed.


There's an interesting angle to this phrase "intelligent species opening up a new way of looking at the world", which is that we (humans) designed go as a game - a subset of the real world we interact with. Go is "reality" to AlphaGo: the superset of all possible sense data it could have, in principle. Whatever "chunks" AlphaGo uses, if it does use them, all of its policies are built only from subsets of the sense data that is the interactions (self-plays) and inferences from past games. There's nothing outside the game to bring into its decision process. With humans, however, our policies are noisy and are rife with what, for lack of a better term, I would call leaky abstractions.


I think it's more metaphor than leaky abstraction in this case, except to the extent that metaphor is mapping an abstraction of a domain we are trying to understand to an abstraction of one we are better able to understand.


that's an absolutely fascinating way to think about it.


Sometimes optimal solutions don't make sense to the human mind because they're not intuitive.

For instance, I developed a system that used machine learning and linear solver models to spit out a series of actions to take in response to some events. The actions were to be acted on by humans who were experts in the field. In fact, they were the ones from whom we inferred the relevant initial heuristics.

Every day, I would get a support call from one of the users. They'd be like, 'this output is completely wrong. You have a bug in your code.'

I'd then have to spend several hours walking through each of the actions with them and recording the results. In every case, the machine would produce recommended actions that were optimal. However, they were rarely intuitive.

In the end, it took months of this back and forth until the experts began to trust the machine outputs.

This is the frightening thing about AI - not only can an AI outperform experts, but it often makes decisions that are incomprehensible.


What you said about the expert calling something a bug reminded me of how the commentator in the first game would see a move by AlphaGo and say that it was wrong. He did this multiple times for AlphaGo but never once questioned the human's move. Yet even with all those "wrong" moves AlphaGo won. Didn't watch the second game, so not sure if he kept doing that.


The English-speaking human 9-dan only did this once for AlphaGo yesterday (when AlphaGo made an "overextension" which eventually won the AI the game), but did it maybe 3 or 4 times for Lee (Hmm, that position looks a bit weak. I think AlphaGo will push his advantage here and... oh, look at that. AlphaGo moved here).

Later, he did admit that the "overextension" on the north side of the board was more solid than he originally thought, and called it a good move.

He never explicitly said that a move was "good" or "bad", and always emphasized that as he was talking, his analysis of the game was relatively shallow compared to the players'. But in hindsight, whenever he pointed out a "bad-juju feel" about one of Lee's moves, AlphaGo managed to find a way to attack the position.

Overall, you knew when either player made a good move, because the commentator would stop talking and just stare at the board for minutes, at least until the other commentator (an amateur player) would force a conversation, so that the feed wouldn't be quiet.

The vast, vast majority of the time, the English-speaking 9-dan was predicting the moves of both players, in positions more complicated than I could read. (Oh, but it was obvious both players would move there. There were clearly times when the commentator would veer off into a deep distant conversation with the predicted moves still on the demonstration board, because he KNEW both players were going to play out a sequence of maybe 6 or 7 moves forward).

They really got a world-class commentator on the English live feed. If you've got 4 hours to spare, I suggest watching the game live.


Elsewhere in this thread, IvyMike pointed out [1]:

> I sense a change in the announcer's attitude towards AlphaGo. Yesterday there were a few strange moves from AlphaGo that were called mistakes; today, similar moves were called "interesting".

[1] https://news.ycombinator.com/item?id=11257997


The only frightening part of your story is the insecurity of the human experts.


Or, maybe, there could have been bugs in the code.

If I'm an expert in some domain and a computer is telling me to do something completely different ("Trust me--just drive over the river!") I'm certainly going to question the result.


Not really. The alternative is like driving your car into a lake because the GPS told you to.


As a competitive speedcuber (Rubik's Cubes) this makes sense. If I watch a fellow cuber solve a cube, I understand their process even if it's a different method than the one I'd use. But a robot solving it? To my brain it looks like random turns until...oh shit it's finished.


Have you ever managed to learn the human Thistlethwaite algo? It basically lets you solve the cube like a robot would. I'm pretty rusty at cubing now, but I always wanted to learn it.


I have not. It's just not something I'm very interested in.


> AlphaGo plays some unusual moves that clearly go against what any classically trained Go player would play. These moves simply don't fit into the current theories of Go, and the world's top players are struggling to explain the purpose or strategy behind them.

Could AlphaGo be winning in a way similar to left-handed fencers having an advantage over right-handers, by wrong-footing them rather than simply being better? Would giving Lee more chance to see this style give him a chance to catch up?


I'm not a Go player but play other competitive sports. Humans have a herd mentality... as OP mentioned, there are certain styles of playing, each with its own strengths and weaknesses. Sometimes people will not examine other styles that may have better strengths and just focus on the existing one. Then someone comes along who 'thinks outside the box' with a new style and revolutionizes the playing field.

Think Bruce Lee and the creation of Jeet Kune Do. Before him everyone concentrated on improving one style by following it classically, rather than just thinking of 'how do I defeat someone'.

IMHO Lee is the best at the current style of Go. AlphaGo is the best at playing Go. Maybe humans can devise a better style and defeat AlphaGo, but I'm sure AlphaGo can adapt easily if another style exists.


Lee isn't even the best human player at the moment, he has a 2-8 loss record against Ke Jie, who's actually ranked number 1 at the moment.

Ke Jie is an arrogant 18 year old and he's been saying on social networks over the past couple of days how he will defeat AlphaGo.


He seems to have backed off that claim after the second game.


Exponential progress is going to bear down on Ke Jie like a ton of bricks soon.


I've seen this happen with "modern tennis" versus how I was taught to play.


This is interesting. Could you (or someone else who's had this experience) elaborate?


Here are three examples for you.

Swimming. It used to be that swimmers were supposed to be streamlined and avoid bulky muscles. Then a weightlifter decided he wanted to swim. Swimmers today all lift weights.

Programming. It used to be that people built programs in a very top down, heavily planned way. Think waterfall. We now understand that a highly iterative process is more appropriate in most areas of programming.

Expert systems. It used to be that we would develop expert systems (machine translation, competitive games, etc) through building large sets of explicit rules based on what human experts thought would work. Today we start with simple systems, large data sets, and use a variety of machine learning algorithms to let the program figure out its own rules. (One of the giant turning points there was when Google Translate completely demolished all existing translation software.)


Serve-and-volley is pretty much non-existent in modern professional singles tennis. We were always taught to attack the net, and every action was basically laying the groundwork to move forwards and attack.

Nowadays, top players slug it out baseline-to-baseline.

In terms of stance, we were taught to hit from a rotated position where your shoulder faces the net, and a normal vector from your chest points to either the left or right side of the court.

Nowadays, it's much more common to hit from an "open" position, where your body is facing the net, not turned. This would have been considered "unprepared" or poor footwork in my day, but it actually allows for greater reach. It does make it more difficult to hit a hard shot, but that's made up for by racquet technology and generally stronger players.


If you're in the mood for some long form literary tennis journalism about this subject, check out David Foster Wallace's Federer as Religious Experience from 2006.

http://www.nytimes.com/2006/08/20/sports/playmagazine/20fede...

Although it takes a few paragraphs until it gets into the details of "today's power-baseline game."


> AlphaGo is the best at playing Go. Maybe humans can devise a better style and defeat AlphaGo, but I'm sure AlphaGo can adapt easily if another style exists.

Which is a curious point. The gripes about early brute force search algorithms (e.g. Deep Blue?) were that they felt unnatural.

However, as the searches get more nuanced and finely grained, is there a point at which a fast machine begins doing fast stupid machine things quickly enough to feel smart?

Are there any chess / Go analogs of the Turing test? Or are computer players always still recognizable at a high level?


It has been said that a game of Go is a conversation, with moves in different areas showing disagreement. The game is also known as 'shou tan' (hand talk). From the commentary, AlphaGo is currently passing the Go Turing Test in almost all cases. There are some moves which some say are uncharacteristic but later play out well, or so-called mistakes that don't affect the outcome of the match. One explanation given was that AlphaGo optimizes for a win, not a win by the greatest margin, which is valid for human or machine.


Computer players will be recognizable as long as they are designed to win, and not to play the way a human plays.

A Turing test for game players is an interesting idea. It would be useful for designing game players that are good sparring partners rather than brutes that can wipe the floor with you.


Bruce Lee played it very smart and attained a guru status in the West, but there's no evidence he was a world-class fighter, only unsubstantiated claims by his entourage.

As for JKD, people are drawn in by its oriental esotericism, but there's no evidence it is an especially effective fighting style, or that it has something that (kick)boxing does not.


Absolutely! And it doesn't matter in the end...

Remember that AlphaGo has spent months developing its own style and theory of the game in a way that no human has ever seen. Its style is sure to have weaknesses, but humans will have a hard time figuring them out on first sight.

Similarly chess computers do better in some positions than others (they love open tactics!) and one of the games that Kasparov won against Deep Blue he won by playing an extreme anti-silicon style that took advantage of computer weaknesses. However Kasparov didn't have to figure out what that style was because there was a lot of knowledge floating around about how to do that.

Therefore I'd expect that Lee Sedol a year from now could beat AlphaGo from today. And human Go will improve in general from trying to figure out what AlphaGo has discovered.

However that won't help humans going forward. AlphaGo is not done figuring out the game. At its current rate of improvement, AlphaGo a year from now, running on a single PC, should be able to beat the full distributed version of AlphaGo that is playing today. Now the march of progress is not whether computers can beat professionals. It is going to be how small a computing device can be and still beat the best player in the world.


But when the weaknesses it has require looking 20 ply into the game, can anyone exploit those weaknesses? And furthermore, if the computer itself is able to see 20 ply into the game, then it can spot its own weaknesses and you need to look even further, which raises the question of whether it's really a weakness.

Weaknesses are only relative to capabilities of the opponent to exploit them. If a tank has a weak spot that rockets can hit, but it's being opposed by humans on horseback, is it really a weakness in that context?


The weaknesses that it has will be of the form that it has wrong opinions about certain kinds of positions. In the case of chess, those weaknesses showed up in closed positions where the program owned the center and large amounts of space. In the case of AlphaGo, the weaknesses will be much more subtle, but will be discoverable and exploitable in time.

Additionally AlphaGo has the advantage that it started with a database of human play, so it has some ideas what kinds of positions humans miscalculate.

As for your tank vs horseback analogy, that's flawed at the moment. AlphaGo is probably reasonably close in strength to the human facing him. Improved human knowledge could tip the balance.

However in the future it will become an apt analogy. Computers are going to become so good that knowing the relative weaknesses in their style of play may reduce the handicap you need against them, but won't give you a chance of becoming even with them. That happened close to 20 years ago in chess, and is now only a question of time in Go.


I wonder if AlphaGo has some specialized knowledge to handle ladders, where stones can have an effect at a distance that might only come into play after 20 moves.


>I wonder if AlphaGo has some specialized knowledge to handle ladders

Yes. A representation of ladders is among the input features of its neural networks.

https://gogameguru.com/i/2016/03/deepmind-mastering-go.pdf

  Feature               # of planes  Description
  Stone colour          3            Player stone / opponent stone / empty
  Ones                  1            A constant plane filled with 1
  Turns since           8            How many turns since a move was played
  Liberties             8            Number of liberties (empty adjacent points)
  Capture size          8            How many opponent stones would be captured
  Self-atari size       8            How many of own stones would be captured
  Liberties after move  8            Number of liberties after this move is played
  Ladder capture        1            Whether a move at this point is a successful ladder capture
  Ladder escape         1            Whether a move at this point is a successful ladder escape
  Sensibleness          1            Whether a move is legal and does not fill its own eyes
  Zeros                 1            A constant plane filled with 0
  Player color          1            Whether current player is black

(The number is how many 19x19 planes the feature consists of.)
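
To make the "planes" idea concrete, here is a minimal Python sketch that builds a few of the planes listed above (stone colour, ones, liberties) from a board. The names `encode` and `liberties` are hypothetical, and the one-hot liberty scheme (planes for 1, 2, ..., 7, and 8 or more liberties) is an assumption about the encoding, not code from the paper.

```python
# Hypothetical sketch: encode a Go position into some of the planes listed above.
EMPTY, BLACK, WHITE = 0, 1, 2
SIZE = 19

def empty_plane(fill=0):
    return [[fill] * SIZE for _ in range(SIZE)]

def liberties(board, row, col):
    """Count the liberties of the group containing (row, col) by flood fill."""
    colour = board[row][col]
    seen, libs, stack = {(row, col)}, set(), [(row, col)]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < SIZE and 0 <= nc < SIZE:
                if board[nr][nc] == EMPTY:
                    libs.add((nr, nc))
                elif board[nr][nc] == colour and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    stack.append((nr, nc))
    return len(libs)

def encode(board, to_play):
    """Return 12 binary planes: 3 stone colour + 1 ones + 8 liberties."""
    opponent = WHITE if to_play == BLACK else BLACK
    planes = []
    # Stone colour: player stone / opponent stone / empty.
    for target in (to_play, opponent, EMPTY):
        planes.append([[1 if board[r][c] == target else 0
                        for c in range(SIZE)] for r in range(SIZE)])
    # Ones: a constant plane filled with 1.
    planes.append(empty_plane(1))
    # Liberties: assumed one-hot over 1..7 and "8 or more".
    lib_planes = [empty_plane() for _ in range(8)]
    for r in range(SIZE):
        for c in range(SIZE):
            if board[r][c] != EMPTY:
                n = liberties(board, r, c)
                if n > 0:
                    lib_planes[min(n, 8) - 1][r][c] = 1
    planes.extend(lib_planes)
    return planes

# Example position: one black stone in the open, one white stone in the corner.
board = [[EMPTY] * SIZE for _ in range(SIZE)]
board[3][3] = BLACK
board[0][0] = WHITE
planes = encode(board, BLACK)
```

The ladder planes are the interesting ones for this subthread, but they need a ladder reader and are left out of the sketch.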


20 moves may sound like a huge number of variations but when you prune things early, it can be quite manageable. The alpha-beta algorithm used in chess engines prunes quite a lot.


I would also posit that lefties' advantage basically disappears once you get to a certain level in fencing. Past some point, it's basically all just footwork anyways, and your orientation doesn't change the distance of your target (foil and saber at least, can't comment on epee as they seem to just kinda bounce in place a lot even at Olympic level).


Can confirm. My brother fenced at a club that had a lot of lefties. All the righties got used to it quickly, and had no real disadvantage when playing against lefties.

I could easily see the difference in tournaments with other clubs that were not used to left handed players.


Seems unlikely. Training was partly from human games, and partly from self play; if there's some new, off book heuristics at play, there's no way to know that humans would respond poorly to them. Though I suppose it's possible it would notice that humans do poorly on simply off book moves generally.


Why does this seem unlikely? Humans do poorly with "off book" moves in general in sports and other games; it's why new styles of play or management work really well until others get used to them. Why would it be unlikely in Go?


I think an important point was brought up by the Google engineer in the beginning of the game: Humans usually consider moves that put them ahead by a greater margin and base their strategies on that, while computers don't have that bias.
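
A toy Python illustration of that difference, with made-up numbers: each hypothetical move leads to a small distribution over final margins, and the two objectives pick different moves.

```python
# Each hypothetical move maps to a list of (probability, final margin in points).
# The numbers are invented purely to illustrate the two objectives.
moves = {
    "aggressive": [(0.80, 10.5), (0.20, -5.5)],
    "solid": [(0.95, 1.5), (0.05, -0.5)],
}

def expected_margin(outcomes):
    """Average final margin, the quantity a margin-maximizer cares about."""
    return sum(p * margin for p, margin in outcomes)

def win_probability(outcomes):
    """Total probability of finishing ahead, regardless of by how much."""
    return sum(p for p, margin in outcomes if margin > 0)

best_by_margin = max(moves, key=lambda name: expected_margin(moves[name]))
best_by_win = max(moves, key=lambda name: win_probability(moves[name]))
```

Here the margin-maximizer prefers "aggressive" (bigger average lead), while the win-maximizer prefers "solid" (95% vs 80% chance of winning, even if only by a point or two).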


Building on that, I suspect that if AlphaGo thinks it has a 100% chance of winning with any of several moves, it has no way of distinguishing between them and chooses effectively at random. The longer that goes on - and once it hits 100% chance of winning, it will be that way for the rest of the game - the more chances it has to pick bad moves. As long as the move isn't bad enough to ruin its 100% chance of winning, it can't tell the difference between that and a good move.

(This also applies without a 100% chance of winning, as long as its chances of winning hover near the highest percent it's able to distinguish.)


I doubt the value network ever outputs a literal 100% chance of winning, it would at most be a lot of nines.

Even if it did output an actual 100% chance, AlphaGo would still end up picking moves favored by the policy network, so it would probably just revert to playing like it predicts a human pro would.


Once it gets to enough nines, its Monte Carlo trees will run out of sample resolution. If it can resolve to three nines, then a 99.93% win branch has a 70% chance of being reported as 99.9% and a 30% chance of being reported as 100%. When all the branches here get rolled up, they report some average around 99.93% but not necessarily exactly it. This propagates upwards in the tree, adding more meaningless digits. Adding the evaluation network in increases the number of decimals, but doesn't really change the effect.

It's similar to how ray tracing renderers start to return weird speckle patterns when the room is dark enough.

And the policy network chooses branches to investigate, not which one to choose. It adds sample resolution to places pros might play, but doesn't add to the estimated probability of winning.

Edit: Actually, since places pros might play have higher sample resolution, they're less random. So worse moves get worse evaluation, and a higher chance of leading the pack. This might actually bias AlphaGo to play some pretty bad moves - but, again, this is all assuming it's going to win anyway.
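
A rough Python sketch of the resolution argument, assuming plain independent rollouts (which is a simplification, not how AlphaGo's search actually works): with N rollouts, an estimate can only land on a multiple of 1/N, and its standard error near 99.9% is about as large as the differences being compared.

```python
import math
import random

def rollout_estimate(true_win_rate, n_rollouts, rng):
    """Estimate a win rate from n_rollouts independent simulated games."""
    wins = sum(1 for _ in range(n_rollouts) if rng.random() < true_win_rate)
    return wins / n_rollouts

def standard_error(p, n):
    """Standard error of a win-rate estimate from n independent rollouts."""
    return math.sqrt(p * (1 - p) / n)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000                # "three nines" of resolution
estimates = [rollout_estimate(0.9993, n, rng) for _ in range(20)]

# Every estimate is a multiple of 1/1000; none of them can equal 0.9993.
distinct_values = sorted(set(estimates))
noise = standard_error(0.9993, n)
```

With these numbers the standard error comes out around 0.0008, which is larger than the gap between 99.93% and either 99.9% or 100%, so branches that close together can't be ranked reliably from the samples alone.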


Was that in the official livestream, or is there an interview somewhere, where things like these are discussed?



> I've been giving it some thought. When I was learning to play Go as a teenager in China, I followed a fairly standard, classical learning path. First I learned the rules, then progressively I learn the more abstract theories and tactics. Many of these theories, as I see them now, draw analogies from the physical world, and are used as tools to hide the underlying complexity (chunking), and enable the players to think at a higher level.

The excellent point you're making applies in general to nearly every type of human thinking.

The way we think about other people, our intuitions about probabilities, our predictions about politics, and so on -- all are based on our peculiarly effective, yet woefully approximate, analogy based reasoning.

It shouldn't be surprising in the least when commonly accepted "expert" heuristics are proved wrong by AIs that actually search the space of possibilities with orders of magnitude more depth than we can. What's surprising -- and I think still a mystery -- is how human heuristics are able to perform so well to begin with.

I'm not a Go player, but I saw this same phenomenon as poker bots have surpassed humans in ability. As with AlphaGo, they make plays that fly in the face of years of "expert" wisdom. Of course, as with any revolutionary thinking, some of the new strategies are "obvious" in hindsight, and experts now use them. Others seem to require the computational precision of a computer to be effective in practice, and so can't be co-opted. That is, we can't extract a new human-compatible "heuristic" from them -- the complexity is just irreducible.


> all are based on our peculiarly effective

They are peculiarly effective only because of a lack of comparison. Humans have been the most intelligent species on this planet for millennia, and no other species comes even close. We don't know how ineffective those strategies would look to a more advanced species. Well, until now.


This is a good point. I was coming from the point of view that we've had powerful computers for a while, and yet humans were still dominating them, at least until recently, in games like Go, poker, and many visual and language tasks.

Of course, the counterpoint could be that it's only the case because humans, with their laughable reasoning abilities, are the ones programming those computers.


It was not a good point.

AlphaGo can’t decide that it’s bored and go skydiving. Humans aren’t merely capable of playing Go. And when they do it, they can also pace around the table, and drink something, all at the same time, on a ridiculously low energy budget. Or they can decide never to learn Go in the first place but to master an equally difficult other discipline. They continuously decide what out of all of this to do at any given moment.

AlphaGo was built by humans, for a purpose selected by humans, out of algorithms designed by humans. It is not a more advanced species. It’s not even a general intelligence.

Your own original point was much better than the one made in response.


I want to thank you for this comment. It's this kind of subtle, low-key, informed speculation that generates good, hard sci-fi concepts, which are absolutely relevant to my WIP novel.

"oh what if the machine suddenly came alive!?" has been done 1000 times. But such concepts like: a computer can detect and act patterns which we cannot, in ways that are almost, if not possibly intelligence, are magnitudes more believable, and therefore, compelling.

Thanks! :-)


Is it about a tyrannical super-AI that maintains power over the human race by strategically releasing butterflies into the wild at specific times and locations?


Actually, it's a Soviet knock-off of a PDP-10. Constructed in 1970s India, the machine has a 12 MHz clock rate, 4M of RAM, and a directive to bring about "World Peace".

Of course, those fools underestimated it. They should have known better...


Why would it bother if it can just convince people to do the thing it wants done by talking to them?


It is now.


AlphaGo is essentially built on the work that IBM did on TD-Gammon (a reinforcement learning backgammon player) in the 90s.

Pretty much the same thing happened with TD-Gammon, which played unconventional moves. In the longer term, humans ended up adopting some of TD-Gammon's tactics once they understood how they played out. It wouldn't be surprising to see the same happen with Go.


From my understanding, computers have also had this effect on chess. The play styles of younger champions have evolved to the point where unpredictability is actually part of the strategy. I'm not a chess expert by any means, but this quote by Viswanathan Anand (former World Chess Champion) describes it.

  “Top competitors who once relied on particular styles of play are now forced to mix up their strategies, for fear that powerful analysis engines will be used to reveal fatal weaknesses in favoured openings....Anything unusual that you can produce has quadruple, quintuple the value, precisely because your opponent is likely to do the predictable stuff, which is on a computer” [1]
[1] http://www.businessinsider.com/anand-on-how-computers-have-c...


>powerful analysis engines will be used to reveal fatal weaknesses in favoured openings...

Anand isn't really talking about strategy here, he's just talking about choice of opening. Players with narrow opening repertoires, like Fischer, have always been easier to prepare for than players who play a wide variety of openings.

As far as actual changes to strategy, the most obvious one is that computers tend to value material more highly than humans. So a computer will take a risky pawn if it looks sound, while a human will see that taking the pawn is very complicated and prefer a simpler move.


Computers and the internet have changed chess in several ways:

(1) Online game databases have made it easier for players to track developments in opening theory and prepare to play specific opponents

(2) Chess engines add to this by being used to search for antidotes to complicated opening systems

(3) Young players have greater access to high-quality sparring partners - either engines or fellow humans on online servers.

This has led to the best players becoming younger, and players playing more varied and less 'sharp' openings.


Reading the paper, it doesn't at all sound like AlphaGo uses anything that TD-Gammon used.

It uses MCTS, which is unlike minimax. It doesn't use temporal difference learning, although they say that the policy somewhat resembles TD.

That doesn't sound like 'essentially built on'; it sounds maybe like 'slightly influenced by'.


You're missing the forest for the trees.

Tesauro's work on TD-Gammon was pioneering at the high level, i.e. combining reinforcement learning + self-play + neural networks.


> AlphaGo is essentially built on the work that IBM did on TD-Gammon (a reinforcement learning backgammon player) in the 90s.

Citation needed.


And you'll find it in the AlphaGo paper. It's not a contentious claim.


Citation still needed.


He just gave you a citation. "The AlphaGO paper".


This one, I assume? http://www.nature.com/nature/journal/v529/n7587/full/nature1...

Looks like citation 46 is the relevant one here.


I wonder if this is similar to how musket battles were fought in the American Civil War era, with soldiers lining up across from each other on a battlefield and taking turns shooting at each other. I hear they did this because the rifles were very inaccurate, so it made sense to use a bunch of them at the same time as an area-effect weapon, in effect like a gigantic shotgun.

Until someone got better weapons and suddenly the "rules" of the battlefield that dictated standing in lines across each other made no sense to follow anymore because the original principles that dictated those rules to be good were not valid anymore.


I like your statement: "It's both exciting and eerie. It's like another intelligent species opening up a new way of looking at the world (at least for this very specific domain). and much to our surprise, it's a new way that's more powerful than ours."

I think this will be the theme of our future interactions with AIs. We simply can't imagine in advance how they will see and interact with the world. There will be many surprises.


That quote reminds me of "The Two Faces of Tomorrow" by James P. Hogan. One of the subplots is that humans can communicate because of shared experience. We all must eat, sleep, breathe, seek shelter, etc. Communication with an alien or artificial intelligence may be difficult or even impossible without this shared framework.


>It's like another intelligent species opening up a new way of looking at the world (at least for this very specific domain). and much to our surprise, it's a new way that's more powerful than ours.

It's not like this at all; let's not do this sort of thing. Humans are inveterate myth makers (viz. your description of how people conceive the Go board as army units), and our impositions on the world are easily confused for reality.

In this case, there's no "intelligent species" at work other than humans. We made this, and it is not an intelligence, it is a series of mathematical optimization functions. We have been doing this for decades, and these systems, while sophisticated, are mathematical toys that we have applied. We built and trained this thing to do exactly this.

As a student of AI you know that convolutional neural networks are black boxes and are hard to interpret. A different choice of machine would have yielded more insight about how it is operating (for example, decision trees are easier to interpret). The inscrutability of the system is not a product of its complexity; even a simple neural network is hard to understand.

This, actually, is my primary objection to using CNNs as the basic unit of machine learning - they don't help US learn, they require us to put our faith in machines that are trained to operate in ways that are resistant to inspection. In the future I hope that this research will move more towards models that provide interpretable results, so they ARE actually a tool for improved understanding.


> We made this, and it is not an intelligence, it is a series of mathematical optimization functions

You can say the same about your mind too, which is a bunch of optimization nodes. If something is intelligent, does it matter whether it evolved in nature or was created by a species that evolved in nature?

> In the future I hope that this research will move more towards models that provide interpretable results

I think it's not really possible to understand in detail how these networks operate on the level of nodes, because emergent behavior is necessarily more complex than the sum of its parts.


It's a bit precious I think to say that a human is a "bunch of optimization nodes". I can write code to create a CNN, and I can draw a graph of how it operates on a piece of paper. We can't even decode a few million rat neurons the same way.

A CNN is a pure mathematical function - if you want, you could write it down that way. Given a set of inputs, it will always produce the same output. We don't call a linear regression model an "intelligence", a CNN is no different.
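
To illustrate the "pure function" point, here is a minimal Python sketch of a single convolution plus ReLU step written out on nested lists. The function name `conv2d_relu` and the tiny input and kernel are made up for the example; a real CNN stacks many such layers with learned weights.

```python
def conv2d_relu(image, kernel, bias=0.0):
    """One 'valid' 2D convolution (cross-correlation) followed by ReLU.

    Pure function: no randomness and no hidden state, so the same inputs
    always give the same output.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = bias
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(max(0.0, total))  # ReLU: clamp negatives to zero
        output.append(row)
    return output

image = [[1, 0, 2],
         [0, 1, 0],
         [2, 0, 1]]
kernel = [[1, -1],
          [-1, 1]]
first = conv2d_relu(image, kernel)
second = conv2d_relu(image, kernel)  # identical to `first`
```

Calling it twice with the same arguments gives the same 2x2 result both times, which is all "deterministic mathematical function" means here.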

Of course I agree that humans are built up of billions of tiny machines like this, but let's appreciate the vast difference in scale.


My exaggeration was intentional to point out that if you scale up NN based systems, we are not that different :) I do appreciate it, but let's not forget that we have finite nodes, so at one point a machine can surpass us with "just mathematical functions".

> A CNN is a pure mathematical function

That's their basic property, but who are we to say that our cell based neural network is superior? Cells are just compositions of atoms and they are defined by quantum mechanics, which is... "just" math and information.

I also think that Go might be a great communication tool between AI and humans. If you look at the commentary from this angle, it's fun to think about it like this.


As a follow up to your idea, we should explore two paths: first create the most powerful AI, second create subsystems devised to be interpretable. The powerful method could be used to train the interpretable method, that is we need an interpreter to translate from machine AI to human AI, and interpretable systems provide a middle ground.


I think training one function to approximate another function wouldn't help much; we'd inevitably lose the subtleties of the higher-order function and any insights that come with it. If we could train a decision tree to do what a CNN does and then interpret the outcome, why not use decision trees in the first place?

I think the answer must be in figuring out how to decompose the black box of a CNN - it is, after all, just a set of simple algebraic operations at work, and we should be able to get something out of inspection.

I have to imagine Hinton et al. have done work in this regard, but this is far afield for me, so if it exists I don't know it.


Having a machine that gives you feedback in the middle of the game perhaps could be used to describe what is the weak point of a decision tree, and in which situations the method is good. It could detect some situations in which decision trees are good, then use that decision tree to understand what is happening and with that new understanding devise a new method in the middle. We could train a decision tree using new very powerful information about the value of the game in the middle of the game, that is new and powerful.


> they don't help US learn, they require us to put our faith in machines that are trained to operate in ways that are resistant to inspection.

Human intuition and, to a certain extent, creativity are like this as well.


The same thing happened in chess. Computers play in a very "computerish" way that was initially mocked, but became hugely influential on how humans play chess. Computer analysis opened up new approaches to the game.

http://www.nybooks.com/articles/2010/02/11/the-chess-master-...


> It's like another intelligent species opening up a new way of looking at the world.

And this is just the beginning with AlphaGo. As we keep on training Deep Learning systems for other domains, we'll realise how differently they approach problems and solve them. It'll, in turn, help us in adapting these different perspectives and applying them to solve other problems as well.


> It's like another intelligent species opening up a new way of looking at the world

.. that we'll be probably unable to comprehend ourselves.


I believe that when Google talked last year about DeepMind playing those 70's Atari games, it also surprised the team with some of the tricks that it learned to be more effective in the game. So this is quite interesting stuff.


The analogy I can come up with, based on your post, is of something like addition. We don't know how we add numbers in our heads; but we somehow do it. Some people can do it very, very quickly[1], but won't be able to explain how they did it. On the other hand: a computer doesn't look at digits and numbers; it just looks at bits and shifts them around as appropriate.

[1] https://en.wikipedia.org/wiki/Shakuntala_Devi
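
As a concrete version of "looks at bits and shifts them around", here is a short Python sketch that adds two non-negative integers using only XOR, AND, and shifts. It mirrors what an adder circuit does in spirit; the function name is made up, and the sketch is restricted to non-negative inputs.

```python
def add_with_bits(a, b):
    """Add two non-negative integers using only XOR, AND, and left shift."""
    while b:
        carry = (a & b) << 1  # positions where both bits are 1 carry leftwards
        a = a ^ b             # sum of the bits, ignoring carries
        b = carry             # feed the carries back in until none remain
    return a

total = add_with_bits(13, 29)
```

No digits, no "carry the one" in the way we learn it at school, just the same two bit operations repeated until there is nothing left to carry.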


Abstraction is the domain we need to research before we understand intelligence in general, the ways our abstraction is determined by nature and more importantly the ways that will become possible when we surpass it.


Can you give an example of an "unusual" move? I'm a (very) novice Go player, and I think it'd be really interesting to see some specific commentary on how the machine is playing the game.


Your metaphor about army units has got me thinking: When are we going to see the next generation of AlphaGo, but applied to a real world army?


So, is it not possible to get a log of its thinking and look later at why it took a certain step?!


It might look something like attention detailed in Show, Attend, Tell: http://arxiv.org/abs/1502.03044

Which attempts to visualize machine areas of attention that look like: http://www.wildml.com/wp-content/uploads/2015/12/Screen-Shot...


A great breakthrough could be to decode the information contained in the feature space of the NN or the RNN: a topological language in which shapes and chains are explained by analogies with real world situations and actions. Being able to share our vision and communicate our intentions (the weight given to the distinct features and the links among the several layers of the NN, i.e. the overall plan) should transform the concept of AI into one of CAI (communication between intelligent agents to create a synergistic approach).



