Google reveals secret test of AI bot to beat top Go players (nature.com)
495 points by cpeterso on Jan 4, 2017 | 205 comments

One thing that isn't made clear in this writeup is that Master plays in a very nonhuman style, as opposed to the version of AlphaGo that beat Lee Sedol, which mostly played like a strong human except for a few surprising moves. My first guess when I saw Master's games was that it was a program like AlphaGo that had its policy network trained from scratch rather than being bootstrapped by being given the goal of imitating the moves of strong humans. I'm eager to find out whether that was the case or whether it just moved away from human-like play during a very long self-play phase.

It really is a bit scary to see. I would not have guessed that human strategy was that deficient. Computer chess programs still tend to play with human-like strategy (partially because humans have coded their evaluation functions!) but godlike tactics. Master is not really playing like a human at all.

As computers are able to evaluate positions faster (and therefore deeper), the "godlike" tactics are dominating over human-style strategy. It used to be that computers played "computer-like" moves because they didn't understand the position. Now, they play computer-like moves because "understanding" the position isn't as important as just being able to see 25+ moves ahead.

In a nutshell, positional play in chess is simply heuristics we humans use to be able to evaluate a position in lieu of being able to calculate deep non-forced lines. Computers do use this to an extent (as you point out, we coded their evaluation functions) but positional play matters less when you see all the outcomes of every possible tactic with 100% accuracy. So computers tend to play reasonably human-like in the openings, but by the time you reach the middle game they'll happily enter lines where their pawn structures are shattered, pieces appear superficially to have little coordination, and where their king safety appears compromised (all things humans rarely intentionally do), all because they've seen that it works out 25+ moves in advance.

they'll happily enter lines where their pawn structures are shattered...

Makes you wonder if computers will essentially ruin all the novelty of innovating and discovering new tricks/strategies in age-old games. No more will geniuses have their moments of brilliance and improvisation that lead to a stunning victory or upset. No more will commentators debate and marvel over phenomenal victories in ages past.

With a computer, it's just the optimal play strategy and you were just playing a god who knew the quickest path to terminating your king. When a human runs with non-standard play strategies in almost any game, it's considered brilliant, innovative, game of the century, etc...

Probably. I think it basically proves that any rule structured game without randomness is solvable.

Games like go and chess are necessarily solvable by definition, right? If you mean solvable in our lifetimes, by current technology, that's something else entirely, and much less likely.

Beating human players in go is like finding a good heuristic for an NP-complete problem. Solving 19*19 go is like proving P!=NP, i.e. we don't even have tools that can approach the problem.

Not sure if this is true for Go. It seems to me that at some point a problem becomes "too complex" to solve. Go has about 2.08168199382×10^170 legal positions [1], while the number of atoms in the observable universe is only around ~4×10^81.

I don't know if there are some theorems, thoughts, philosophies about whether this means it can't be solved, but at least it must be extremely difficult.

[1] https://en.wikipedia.org/wiki/Go_and_mathematics#Legal_posit...
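Out of curiosity, the comparison is easy to make explicit (just arithmetic on the two figures quoted above):

```python
# Figures from the comment above: ~2.08e170 legal 19x19 Go positions
# versus an upper estimate of ~4e81 atoms in the observable universe.
positions = 2.08168199382e170  # legal positions on a 19x19 board
atoms = 4e81                   # atoms in the observable universe (upper estimate)
print(f"{positions / atoms:.2e}")  # 5.20e+88 legal positions *per atom*
```

So even assigning one position per atom overshoots by almost 89 orders of magnitude.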

In a theoretical sense, Go is certainly solvable, as we can easily construct a Turing machine that plays out every possible move, and provably terminates (given an appropriate ruleset). Practically, the universe will certainly be expected to terminate first.

There may be other ways to solve the game, but we don't know what they are. Because we know it is theoretically solvable, we cannot rule out a practical approach to solving it by some mathematical magic even if we have no idea what that would look like.

By analogy, we can prove many things about infinitely many integers by mathematical induction, but if we didn't have that technique, such proofs might seem impossible.

While mind-boggling, I'm not sure it's generally meaningful to compare counts of actual things (even atoms) with counts of possible arrangements. The latter are in a world of their own[1].

[1] https://en.m.wikipedia.org/wiki/Orders_of_magnitude_(numbers...

There is a trivial disproof of your claim.

Each of two players writes a Turing machine with at most n states and two symbols. The player who produced a terminating Turing machine that writes the most 1 symbols on the tape before halting wins.

The optimal strategy for this game is producing a busy beaver, a feat shown not to be computable.

This comment is mainly me thinking out loud. It is not insightful:

I was thinking that for a fixed n that doesn't really work, because there are only finitely many options, but I guess if n > ~2000, ZFC cannot show the winning strategy to be the winning strategy? Is that what you meant?

Given any two machines which halt, finding the one that ends with more ones is computable. Assuming at least one of the two machines halts, which one wins can be computed in the limit. By which I mean: the process keeps a running "who is currently winning" verdict (the one that already halted if only one has, the one that halted later if both have, or neither if both haven't), and the answer is whatever the verdict eventually never switches away from. I guess that works even if neither halts.

Uh... I'm just saying stuff you already know to try to think through it myself.

Edit: I guess the question is then, what exactly do we mean by solvable?

Do we mean that there is an algorithm that outputs an optimal move on every turn? For any n, there is such an algorithm: the one that has the correct move hard-coded. Maybe we mean that there is an algorithm that provably always outputs an optimal move? In this case, well, I suppose it depends on the axiom system. Hm.
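The "computed in the limit" process from upthread can be sketched with toy machines. This is just my own illustration: a machine is modelled as a pair (steps until it halts, ones written), with None meaning it never halts.

```python
def leader_after(machines, t):
    """Return the index of the machine 'currently winning' after t steps
    of lockstep simulation, or None if no machine has halted yet.
    Among halted machines, the one with the most ones written leads."""
    halted = [(ones, steps, i)
              for i, (steps, ones) in enumerate(machines)
              if steps is not None and steps <= t]
    if not halted:
        return None  # no verdict yet; any verdict may be revised later anyway
    return max(halted)[2]

# Machine 1 halts much later but writes more ones: the running verdict
# switches once and then never again -- that final value is the answer.
machines = [(5, 3), (100, 7)]
print(leader_after(machines, 10))   # 0: only machine 0 has halted so far
print(leader_after(machines, 200))  # 1: verdict revised, now stable forever
```

Of course the hard part (knowing whether "steps until it halts" is None at all) is exactly the uncomputable bit; the toy only shows why the limit verdict stabilizes once both halting machines have halted.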

Any rule structured game with fixed finite bounds is solvable.

But no one could play because you can't create a Turing machine that provably halts, no?

You can prove that any given Turing machine halts or does not halt, but there's no single algorithm that can prove that for every Turing machine.

The trivial case of a Turing machine that can be proven to halt is one with only one state: halted.

The trivial disproof still doesn't disprove anything about go and/or chess. Both chess and go have rules preventing repetition of moves (threefold repetition, and rule 8, respectively), and have a limited pool of possible future states. Therefore, there is no game of go or chess that does not halt. Thus, "Games like go and chess are necessarily solvable by definition, right?"

Yes but this case requires Turing machines that produce very very large (BB large) outputs and then halt. Is that provable for a specific machine?

Any specific machine can be proved to terminate or not terminate - albeit not always in ZFC. For instance, BB(10) can be computed, just as BB(10000) can, just that the latter cannot be computed in ZFC.

Probably still solvable with randomness. Add a few more layers of convolution networks.

I believe you nailed it, yes. Notably, computer games do look reasonably humanlike when they're even, for example when playing another equally strong computer. But once they can outcalculate the opponent, this disappears.

Compare this to lines that the humans have memorized after deep analysis, like poisoned pawn Najdorf. The play there certainly doesn't look humanlike either.

I would not describe the moves dfan is talking about as tactical. Rather, they appear to exhibit a very different positional judgment from what humans use/different strategic aims.

What you say sounds plausible, but I do not believe it is backed up by any analysis of the games.

Positional play is nothing more than our human attempt to perform short-circuit evaluation of positions by using heuristics that are easier than calculating non-forced lines precisely to 20+ ply. If you imagine a perfect chess-playing computer, it would have no need of any sort of positional evaluation — literally every move it chooses would be based upon the pure tactical outcome of having evaluated every subsequent move in advance.
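To make the point concrete, here is a toy sketch (my own, not anyone's engine) of such a "perfect player": it solves Nim by searching every line to the very end, and its only evaluation is win/loss at terminal positions — no heuristics or positional judgment anywhere.

```python
# Pure tactical "perfect play" on a game small enough to search fully:
# normal-play Nim, where taking the last object wins.
from functools import lru_cache

@lru_cache(maxsize=None)
def wins(heaps):
    """True iff the player to move wins with perfect play. The only
    'evaluation' is the terminal win/loss test; everything else is search."""
    if all(h == 0 for h in heaps):
        return False  # the previous player took the last object; we lose
    for i, h in enumerate(heaps):
        for take in range(1, h + 1):
            child = list(heaps); child[i] -= take
            if not wins(tuple(sorted(child))):
                return True  # found a move that leaves the opponent lost
    return False

print(wins((1, 2, 3)))  # False: a known losing position for the mover
print(wins((1, 2, 4)))  # True
```

Real chess is astronomically too big for this exhaustive search, which is exactly why human-style positional heuristics exist at all.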

When playing against humans, modern engines can accurately evaluate so much deeper than we can that their play is frequently indistinguishable from the purely tactical play of some such perfect computer (at least, once out of the opening, once the number of "good" moves is constrained a bit allowing computers to evaluate to significantly deeper ply).

You say that computers exhibit a different positional judgment, which is true in the "technically correct" sense (their evaluation function is pretty much the literal definition of their positional judgment) but at depths of 25 ply, 30 ply, or even greater, the simple truth is that they are highly willing to enter lines where their king is exposed, they have pawns doubled, they trade away a good bishop and keep their bad bishop, or all of the above (all what we'd consider anti-positional play) just because they can see a concrete outcome (a won piece, a strong attack, etc.) that we can't.

So yeah, by some definition they play positionally. But it's really by a definition that's only perhaps useful to other computers and not us humans; to us, it's significantly closer in practice to what we consider highly tactical play.

dfan is talking about go. What you're saying is accurate about chess, but go programs aren't the same as chess programs.

> Now, they play computer-like moves because "understanding" the position isn't as important as just being able to see 25+ moves ahead.

I don't think you can "see 25+ moves ahead" in Go. The branching factor is just too big.

Well, stouset is talking about chess, but in Go, Monte Carlo Tree Search plays out all the way to the end of the game; it just does so in a much less exhaustive way, for exactly the reason you mention.
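The playouts-to-the-end idea is easy to demo on a toy game. This is a flat Monte Carlo estimator, far simpler than AlphaGo's actual MCTS and purely my own sketch: players alternately take 1 or 2 stones from a pile, taking the last stone wins, and random playouts from each candidate move still rank the theoretically correct move highest.

```python
import random

def playout(n_left, rng):
    """Random playout with the opponent to move and n_left stones left;
    the player who takes the last stone wins. True iff *we* win."""
    our_turn = False
    while n_left > 0:
        n_left -= rng.randint(1, min(2, n_left))
        if n_left == 0:
            return our_turn
        our_turn = not our_turn

def estimate(stones_left, trials=5000, seed=0):
    """Estimated win rate after leaving stones_left to the opponent."""
    rng = random.Random(seed)
    return sum(playout(stones_left, rng) for _ in range(trials)) / trials

# From a pile of 10: taking 1 leaves 9 (a theoretical loss for the
# opponent, since 9 is a multiple of 3); taking 2 leaves 8 (a win for them).
print(estimate(9) > estimate(8))  # the playouts prefer the correct move
```

The estimates are noisy and nothing like exhaustive, but the ranking of moves is already useful — which is the whole trick.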

One of the nice things about Go is that branches rejoin quite a bit, mitigating some of the branching factor concerns.

You can prune more effectively. Download a program like Leela and let it analyze some positions. You'll get very deep mainlines just like in chess.

My comment was in a subthread about computers playing chess in a more human-like manner.

If we had unlimited computing power, couldn't we solve any known problem in a way that will ALWAYS beat any other way to solve it?

I would call its style unorthodox rather than nonhuman. It still plays common josekis (standard opening sequences) but often chooses uncommon variations. Its mid game is full of startling moves backed by VERY good reading. There's definitely still discernible strategy that us mortals can learn from.

If I recall correctly, the version that beat Lee Sedol was trained on amateur games plus self-play. My guess would be that this new version relies more heavily on pro games.

> My guess would be that this new version relies more heavily on pro games.

Unlikely, since AlphaGo can now generate large numbers of "pro quality" games from scratch. I think it's far more likely it is an autodidact at this point.

They solved heads up poker in this manner recently. They claim that the chances anyone can beat this computer in the long run are now infinitesimal.


Still a good way to go to beat no-limit hold'em, I'd assume.

A group from CMU appears to have solved no-limit heads-up hold-em. It's only a matter of time (and compute power) for a full ring game.

No-limit is far more difficult than limit due to the risk of catastrophic failure. A Nash equilibrium robot won't make any money. A robot must identify a weakness in you, then deviate from equilibrium to exploit your weakness. So long as you're playing deep stack, you could simply play the Bertrand Russell chicken story (echoing David Hume): The farmer feeds the chicken every day, so the chicken assumes that this will continue indefinitely. One day, though, the chicken has its neck wrung and is killed. It's the "maniac" style. Pretend to be an idiot that plays too many hands. Don't lose your shirt. The robot will learn that you're always bluffing. Eventually you have the nuts and you take everything.

If you have a deep stack you can bluff, but your chances of winning aren't high if you don't have the nuts after all. You can only lose so many times before it becomes the martingale strategy.

This especially doesn't work against multiple opponents.

Well, I said you should be careful :-)

That's a good strategy against a bad robot, not the latest batch.

The version that beat Lee Sedol was trained on pro games.

Their Nature paper says "We trained the policy network p_sigma to classify positions according to expert moves played in the KGS data set. This data set contains 29.4 million positions from 160,000 games played by KGS 6 to 9 dan human players; 35.4% of the games are handicap games."
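Schematically, the supervised step quoted above ("classify positions according to expert moves") is just multiclass classification over board points. Here's a minimal stand-in with synthetic data and a linear "network" — nothing like the real architecture or data, just the shape of the training objective:

```python
# Toy sketch of supervised policy training: a softmax over board points,
# fit by cross-entropy on (position, expert move) pairs. Synthetic data.
import numpy as np

rng = np.random.default_rng(0)
N, BOARD = 500, 9 * 9                 # toy: 9x9 "positions" as flat features
X = rng.normal(size=(N, BOARD))       # stand-in position features
y = rng.integers(0, BOARD, size=N)    # stand-in "expert move" labels
W = np.zeros((BOARD, BOARD))          # a linear policy "network"

def loss_and_probs(W):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(N), y]).mean(), p

before, _ = loss_and_probs(W)
for _ in range(200):                  # plain gradient descent on cross-entropy
    _, p = loss_and_probs(W)
    W -= 0.5 * (X.T @ (p - np.eye(BOARD)[y]) / N)
after, _ = loss_and_probs(W)
print(after < before)  # the policy now assigns expert moves higher probability
```

The real thing uses a deep convolutional network over 19x19 board features and 29.4 million positions, but the objective — cross-entropy against the expert's move — is the same idea.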

It is possible that they fed it some pro games after the Fan Hui games but before the Lee Sedol games, but that would be weird; at that point it was already learning from self-play rather than trying to match human moves.

That said, I don't think that Master's better performance comes from being trained on pro games. The AlphaGo version that played Lee Sedol played much more like a human pro than Master does.

> games played by KGS 6 to 9 dan human players

I'm confused. I thought 9-dan players were considered pro? That's the highest ranking you can get, right?

There are multiple dan scales. The KGS scale is an amateur dan scale. I don't know how much the scales overlap generally, but I'd imagine a 9-dan professional to be somewhere around 12 dan on the amateur scale (pro scales are also more densely graded). However, both scales cap at 9 dan by convention.

Even the abbreviations differ: 9d (amateur dan) vs 9p (pro dan).

KGS 9 dan players are pros, or amateurs at professional level such as former insei. The highest rank is almost 11d (it still says 9d, but the graph goes even higher):


There's an Elo rating table on this Wikipedia page which pretty much corroborates/agrees with this:


I think those servers have their own ranking systems that do not match the "official" rankings (which I think cap amateurs at some lower dan rank).

This is correct. A player's rank will typically differ between servers and go associations. Sensei's Library holds more information: http://senseis.xmp.net/?RankWorldwideComparison

Got a source? I can only find references to training on amateur games (e.g. https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol#Alpha...)

No, but I'm pretty sure the parent post is right. Pro games were included.

Edit: I'm not actually that sure. I'm asking around right now (with go players, not DeepMind people).

> Its mid game is full of startling moves backed by VERY good reading.

This is pretty similar to what chess engines do.

Perfect play is likely inhumanly aggressive on Black's part, with White making zero moves. Compared to that, this is a very human style of gameplay, simply based on a different strategy culture, as it were.

I don't see why perfect black play should be any more aggressive than perfect white play. Care to elaborate?

Black moves first; on 3x3 and 5x5 boards, perfect play ends with the board 100% black and any white stone captured. Many other board sizes may not work out that way, and Go is played on a 19x19 board. We don't know about 9x9 or even 7x7, so the pattern is hardly set in stone. Still, it seems likely.

Now, with perfect white play there may be moves an imperfect black player makes which cause white to attack. But perfect play on both sides probably means any white stone gets captured, so white plays zero stones.

> Black moves first, on a 3x3 and 5x5 both end up 100% black with any white piece being captured.

That is true. The same, however, doesn't hold for larger boards (such as 19x19).

> But, perfect play on both sides probably means any white stone gets captured so white plays zero stones.

For boards larger than the small boards you mentioned above, this is completely untrue.

I don't know much about Go, but I thought a comment about AlphaGo was interesting: that it played in a way to marginally beat the player, which is different from how most masters play, which is to beat the opponent by as wide a margin as possible.

Is this accurate? Does MasterP also use this style? Are there humans that can play this way?

(I'm asking you because you seem to know what you are talking about here.)

That's not true. Go is a game where the safety of the lead is more important than its size at all times, and it's taken into account strategically all the time.

What AlphaGo showed, however, is that once in the lead it made mistakes: small mistakes that didn't jeopardize the game, but lost part of the lead. If all paths lead to Rome, it doesn't matter which is shorter. Humans, however, always think of the best move after safety.

What was terribly cruel to see as a Go player was the computer playing poorly early on: showing that it already knew it was going to win.

Modern chess engines take a lot of effort to hide this problem. For example, an engine might know from a table that it can win a king+rook vs king+pawn endgame, and then throws away its second rook to reach such a position.

Humans using the engine don't like this however, and so the authors build in heuristics to make it play more "humanly". Things like using scores for positions, rather than "chance of winning" is also largely for the sake of the users.
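On scores vs. "chance of winning": one common way to relate the two (an approximation borrowed from the Elo expected-score formula, not any particular engine's internal model) is a logistic curve over the centipawn score:

```python
import math  # not strictly needed here; the mapping is pure arithmetic

def win_probability(centipawns, scale=400.0):
    """A common logistic approximation mapping a centipawn score to an
    expected score (win probability). The scale constant is a modelling
    choice, not a universal standard."""
    return 1.0 / (1.0 + 10.0 ** (-centipawns / scale))

print(round(win_probability(0), 2))    # 0.5: an equal position
print(round(win_probability(400), 2))  # 0.91: roughly a decisive advantage
```

Users generally find a score in pawns more actionable than a bare percentage, which is presumably part of why engines report scores.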

Humans play for a large lead because they don't have enough memory/power to accurately estimate the value of their positions, so they play for a buffer -- AlphaGo has higher confidence in its valuation, so it can play closer: ~85% confidence of winning by 5 stones (with room for error) vs. a 99% chance of winning by 2 stones.
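In other words (a toy rendering of the numbers above): the move chosen maximizes win probability, not expected margin.

```python
# The two candidate "plans" from the comment above, as illustrative numbers.
plans = {"win by 5 (sharp)": {"p_win": 0.85, "margin": 5},
         "win by 2 (safe)":  {"p_win": 0.99, "margin": 2}}

by_margin = max(plans, key=lambda m: plans[m]["p_win"] * plans[m]["margin"])
by_p_win  = max(plans, key=lambda m: plans[m]["p_win"])
print(by_margin)  # win by 5 (sharp): what margin-maximizing would pick
print(by_p_win)   # win by 2 (safe): what win-rate-maximizing play picks
```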

I believe that accuracy/"self-confidence" is part of it. However, I think it's also the case that AlphaGo has a monte carlo tree search in addition to the neural net, so it sometimes plays more conservatively than it needs to because it overweights obscure possibilities ("defending here is not necessary, but by doing so, I prevent some number of playouts where I play a dumb move and lose, and I can still win even if I defend").

Humans do the same thing, playing conservatively in a situation where they're far enough ahead. The difference is that a human sometimes looks at a move and says "This move works, and gains points. There is no risk." For a bot using MCTS, everything is a probability.

Another factor is that many micro-endgame sequences simply have a "correct answer" that loses the least points. Any human who's played Go for more than a few months knows these sequences, and if they choose to answer a certain move, will prefer the answer which is "always correct" to another move which would also win the game. This naturally leads to the winning player preserving their margin even when they could throw it away, while the machine has no such bias, and will just as happily throw away the margin as preserve it.

I think this is somewhat incorrect. The creators of AlphaGo made it clear that their system does not take the opponent into account at all; it just answers the question "What is the strongest move right now?" and plays that move. In other words, it does not have any mental model of the opponent.

However, you are correct insofar that it doesn't care about winning by large margins, it prefers winning by smaller margins if it can achieve this with a higher win probability.

A perhaps more accurate way to say it is that AlphaGo models its opponents as a copy of itself.

So do human players, with rare exceptions.

Playing trap moves is not that rare and it shows that you expect the opponent not to know a complicated variation.

Strong players don't play "trap moves".

Who said anything about strong players?

> AlphaGo

human players are not alphago

The context of this thread is AlphaGo. Why would we bother discussing what bad human players do when the topic is clearly about play at the highest levels?

The highest level players play special variations against each other hoping that the other player doesn't know it. They CLEARLY play against what they think the other player knows, not what they know (since they have studied this variation on purpose to prepare)

As a semi-pro player, I can assure you that's not how you play Go.

Interesting. From chess it seems that, for important matches, players will deeply study each other's games and try to get the other player into positions that they may be less used to playing and less comfortable with.

Is this not done in go?

It's more important to smooth out your own weaknesses than to look for your opponents'. A weakness in your style of play is going to make you lose way more games than your ability to find some weakness in some other player will win you.

To become a pro you have to go through insane levels of competition; you need to be strong, not find a weakness in the hundreds of players you will face just to have a shot at becoming the lowest level of professional.

That's just appeal to authority. Lee Sedol's move against AlphaGo in game 4 could be considered a trap because it actually didn't work, but it was complicated enough to trick AlphaGo.

Appeal to authority is only deductively invalid, not inductively invalid.

But it's not even a good authority, since strong amateurs (semi-pros) have many opponents (amateur tournaments are played in the same day or weekend and have many matches), while top players prepare specifically for one opponent in tournament finals (each match played on a different day).

You never study your upcoming opponents' previous games?

As a pro you study everyone's games as a means to get all the latest information possible, but there isn't really much you can do as a top pro to play against your opponent's likings: the era when that could bear fruit ended decades ago.

For example, the US Go Congress has a lot of matches in a day, so you don't even know who you're going to face. Only top players study the games of their opponents, because they have time to prepare in the finals of big tournaments.

All competitive-game-playing AIs ask, "What will my opponent play in response to this move?" It's possible for an AI to evaluate a move based solely on the resulting board position, but it wouldn't be very good. Pretty much all AIs play many turns out to see if the move is any good. In the case of AlphaGo and Monte Carlo tree search, they actually play to the end of the game many times. To do this, they must of course play moves for each player.

Ah, but I think the key here is that it doesn't ask "how will this player respond" but "how would a player respond".

No matter how I've played against it to get to where we are, it'll play the same from that point on. It won't identify me as a risky player from my pattern, nor will it try and classify me as "unpredictable" in some way. It'll play each move as though it has sat down at an already in-progress game between two random opponents.

> It's possible for an AI to evaluate a move based solely on the resulting board position, but it wouldn't be very good. Pretty much all AIs play many turns out to see if the move is any good.

I would strongly argue that these are identical situations. Playing out scenarios in your head but taking into account no history is the same as "evaluating a move based solely on the board position".

All players play to marginally beat their opponent. Playing too aggressively when you're in the lead is "losing a won game" because it gives opportunities for your opponent to create turnarounds.

At the end of a well-played game the losing player will find that they perhaps get more points than they expected, but each time they win a tradeoff with a slight margin, the position on the board becomes much more solid and fixed than it ought to be. A 10-point lead with large variance consolidates toward an unfaltering half-point lead.

Indeed, commentary on Master (P)'s games seemed to suggest this was exactly how endgames went.

Early in the game, confidence in winning is usually correlated exactly with point margin... but at the same time, score in the early parts of the game is very hard to estimate. It's repeatedly noted that AlphaGo plays a very "influence oriented" game which means that it eschews confidently having many points for having lots of "power" on the board which will later translate to points.

So, all together I'd say that AlphaGo plays very well here and doesn't have too much of an obvious computational bias.

The one true oddity is that once AlphaGo's internal win probability reaches 100%, it starts playing idiotic moves. The reason is simple: it's just searching for moves with the highest expected win rate, and at this point nothing it can do is bad. It won't lose the game or anything; it just plays moves that are obviously pointless.
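The mechanism is easy to see in miniature: once every candidate's estimated win rate saturates, the argmax has no reason to prefer the "pointful" move (a toy illustration, obviously not AlphaGo's code):

```python
# All candidate moves already "certainly" win, so the objective
# (maximize win rate) can no longer tell them apart.
win_rate = {"sharp move that keeps points": 1.0,
            "pointless-looking move":       1.0,
            "another slack move":           1.0}

best = max(win_rate, key=win_rate.get)  # tie-break is effectively arbitrary
print(win_rate[best])  # 1.0 either way: margin never enters the objective
```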

> had its policy network trained from scratch rather than being bootstrapped by being given the goal of imitating the moves of strong humans

I just watched a couple of games, and I can't agree. Master's fuseki looks a lot like human fuseki. It plays some original, unusual, confusing, non-human looking moves, yes. It also plays some moves at what would normally be considered the "wrong time" conventionally. But the fuseki still looks highly derived from human games to me -- human with some (sometimes major) tweaks.

When AlphaGo is trained fully from first principles alone, I expect another Shin Fuseki -- a revolution in opening strategy. I would be surprised if Go Seigen, Kitani Minoru, et al had finally discovered optimal opening strategy at the beginning of the 20th century (developments since then have mostly been small refinements of what was started in the Shin Fuseki era, not revolutionary).

Some people were discussing that it seemed like Master just didn't care that much about the opening, since "it knows it'll win anyway".

In chess, computers certainly seem to have supported the idea that basically any opening move is playable with tight enough play.

Maybe it does not use the same opening style when it plays against a version of itself.

It would be very interesting to have pro players comment on published records of AlphaGo self-play.

Maybe AlphaGo has discovered a new balance between black and white (that is, a new optimal value of komi), but when playing with the human-defined value of komi its optimal style is also different from what it would be otherwise.

You mean, like these three AlphaGo self-play games from September? ;)


(analysis by Gu Li and Zhou Ruiyang, two top pros; standard komi)

Note that these games look much more human than the ones dfan was describing. There are surprising ideas, but they are still much more normal.

I don't know about that.

I'm a mere 1 dan, but I fail to see this marked "difference in normality".

What specifically looks "much less human" to you in these newer games?

It's certainly a point of contention, but I think it's the general tenor of comments by other players. Though in hindsight, I wouldn't call it inhuman per se. Humans do play strange moves. It's just a big deviation from pro orthodoxy.

In the games described here, the floating reduction of the wall looks very odd to me: http://lifein19x19.com/forum/viewtopic.php?f=15&t=13929&p=21.... I also thought the first game against kiss88 featured a lot of very unorthodox moves: the double tenuki in the top right and the shoulder hit against the 4-4 point plus knight's move in the lower left especially seemed novel to me. The two-space extension from the 3-4 point is also uncommon, but less surprising.

Another move that didn't seem familiar to me was the shoulder hit and large knight's reduction in the lower right, but that may just be a tactical variation that's unfamiliar.

P.S. I'm only 3k AGA, 4k OGS, so take my personal opinion with a grain of salt! I do think I have a decent feeling for what other people are thinking. Ke Jie made a comment about no human having scratched the surface of go.

Agreed, AlphaGo's moves are definitely unorthodox. And yes, humans with their "narrative" style of playing and "intuition" can barely scratch the surface of reading-heavy perfect-information zero-sum games like Go. No contention there.

But the question was whether these Master games look less human than the previously published self-play games by AlphaGo. And I just don't see it.

The first and third self-play games in particular look crazier (to me) than anything I've seen in this Master collection.

But again, just a measly 1d amateur commenting on unworldly 9p vs ?11?p battles :) People at my level can barely glimpse the tip of the iceberg.

Can you give some examples of what you consider non-human? Really curious since I don't know much about common strategies in Go.

Human moves tend to fit a narrative and be explainable, although often for very concrete reasons. For example:

"I am sketching out territory while attacking an opponent group."

"I am making my group safe so that I will not have to worry about its life while I accomplish other strategic goals."

"I am making a very strong group so that I can use it to make it harder for my opponent to accomplish anything."

Master, on the other hand, will sometimes seemingly just plop stones down in the middle of the board in a way that is hard to assign a narrative to. One can list a bunch of ways that the stone might come in handy in various futures, but it's too "vague" a move to be played with confidence by a human professional go player.

Of course, human strategies may change in response...

When playing against a much stronger player, it's always very hard to figure out why they play tenuki (make a move somewhere else on the board, apparently uncorrelated to the current fight).

When AI gets strong enough (and it seems like it has already), it will just tenuki everyone all the time, while winning. Sounds like exactly what's happening already. It's past the event horizon for human understanding.

> It's past the event horizon for human understanding.

AlphaGo's creators could "rewind" the whole program state to that move and inspect the tree search probabilities for the board states it looks through, to find the list of board states that generate the highest cumulative probability of winning by playing in that exact odd spot.

My guess would be that while humans tend to put stones with a single, double, or sometimes triple "reason", or in AI words, "high probability of local effectiveness in upcoming several turns", AlphaGo, with its ability to see further, can see past local effectiveness into more global effectiveness and a higher probability of winning further down the road.

In other words, those tenuki moves might actually be past the event horizon of human understanding, but only if inspected by looking at them and thinking ourselves. If we use AlphaGo itself, it should be possible to find out the reason for every single tenuki it will ever do.

> past the event horizon for human understanding

I think this phrase is going to pop up more and more frequently.

This thread is literally the only Google search result for this phrase (for me)...

I searched for it in incognito mode and saw the same. Pretty interesting; it's such a nice, catchy phrase.

"Outside the light cone" is more on point -- it's far enough away or moving fast enough that we'll never be able catch up.

To be fair, humans haven't had a lot of time to analyze AlphaGo's play. It's possible that we could understand these moves after some study.

"past event horizon for human understanding" I think you are too pessimistic and that "centaur Go" will evolve in the same way that "centaur chess" is, where a player (or team of players) with access to the software can outperform the stand-alone software. I guess my vote is for "intelligence augmentation" over AI, if only because opaque models/algorithms cannot benefit from human creativity.

Centaur chess players do not outperform the best chess engines. That claim is based on a misrepresentation of one of the first centaur chess tournaments. Human-modified moves now only reduce the strength of the best chess engines. The benefit that was reported was a tactical edge one centaur competitor had over other centaur competitors, not an edge of centaur chess over a stand-alone engine.

It is a misunderstanding that has been breathlessly repeated by futurists and transhumanists for years.

And it's such a strange claim, you'd think it would be obvious that it makes no sense. Nobody would ever try to claim "I think that a team of a top pro and a mediocre amateur would be stronger than the top pro alone", but that's basically what they're doing with centaur chess.

Why do startups staffed by amateurs sometimes outperform teams of experts at established firms? Creativity and new perspectives can yield surprising results.

> Human moves tend to fit a narrative and be explainable, although often for very concrete reasons.

I'm not a Go player, but could it not be that if the AI played its moves in a more human order you could see what was going on and assign a narrative to the moves, but the AI can see that the order of the moves sometimes doesn't matter, so it seems to play more vaguely/randomly to observers? For example, say the AI played a set of moves at the top of the board that you could give an attacking narrative to, and then a set of moves at the bottom of the board that you could give a defensive narrative to. If you intermixed the move order of both narratives, it would seem like the AI is playing in a nonhuman way when really its moves have a narrative, but humans are too fixated on the move ordering.

More likely, playing it in order commits to a certain approach too early, and makes that approach predictable. Starting from the middle leaves other options open.

Given that it trains against a copy of itself, with equivalent predictive powers, it makes sense that it would pick moves that have lots of branching possibilities because that would increase its effectiveness against itself.

I've heard a similar approach described in military strategy at all levels: rather than looking for a single dominant tactic, you try with each "move" to create so many potentially-viable future positionings at once that your opponent cannot predict you in order to effectively concentrate their effort.

It'd be very scary to watch a "sibling" to AlphaGo play a 4X game.

DeepMind has set their sights on StarCraft II next: https://deepmind.com/blog/deepmind-and-blizzard-release-star...

Directly thought of StarCraft while reading this comment: never revealing your strategy, constantly attacking while defending and investing in economy to keep your opponent on the defensive. Like another commenter said, StarCraft 2 is DeepMind's next bet.

I am not too familiar with StarCraft, but from what I can gather it relies deeply on micromanagement. It seems to me that any half-competent AI, strategy-wise, would destroy any human player just with raw clicks per second during fights.

That's true, with StarCraft you have to basically limit the actions per minute the AI can perform to human levels, otherwise it can effortlessly out-control a human player. The game was only balanced to be fair for players with human capabilities, if you take that assumption away the unit and race balance goes totally off the rails.

There is the question of what your AI takes as its input though. If you feed it the rendered frames of the games there are a lot of open challenges in comparison to feeding it a friendlier representation of the game state.

There is also the question of teaching the AI a strategy vs only giving it the rules of the game and letting it develop strategies by itself.

In comparison to Chess or Go, StarCraft has a lot more game elements to understand and conceptualize. StarCraft is played with incomplete information, because of the fog of war - so for an AI to play it properly it has to have persistent knowledge of what it has seen of the game state and be able to draw conclusions from that about the probable overall state of the game.

As far as I'm aware, they force the AI to interact only at a certain frequency, limiting this kind of advantage.

You call it narrative when you understand what's going on.

You call it chaos when it's far above your head.

Good insight. I think there are implications for the Cynefin framework, a model that offers four possible state spaces: simple, complicated, complex, and chaotic. This classification may have as much to do with our state of information (or depth of understanding) as with an objectively "chaotic state."

This is usually called the metagame, and it evolves over time. My guess is that due to these unseen tactics the metagame will change and human playstyles will become more AI-like, adapting to the changing circumstances. The real question is whether people will be able to keep pace with rapidly changing AI playstyles.

Depends on whether people can figure out how to evaluate the more random-seeming 'AI style' moves. With the complexity of Go, it's possible the only way to really tackle it is to wrap groups of moves together in a local narrative.

> The real question is whether people will be able to keep pace with rapidly changing AI playstyles.

If we're far enough from "perfect play", then the answer is no.

AlphaGo often plays surprising moves. Typically high-level commentary describes the horizon of "surprising" here as playing wider, faster, and more influence-oriented than current professionals think is appropriate. AlphaGo is repeatedly praised for identifying moves which seem too unconcerned with the opponent's threats and instead take more power on the board in a leisurely fashion. The first big surprising move it played was a "4th line shoulder hit", which was thought to give too much territory to the opponent (a "3rd line shoulder hit" is considered a very fine and popular move). AlphaGo showed that at the right time even the 4th-line variant is good, since it gave AlphaGo large influence at the right time and position.

A lot of go comes down to timing. AlphaGo doesn't seem to play moves which are inhuman in that they just make zero sense as much as plays moves which are more daring than humans would like to try. Then, infallibly, it turns out that AlphaGo's daring move had the mark of being amazingly well-timed.

> I would not have guessed that human strategy was that deficient.

With as much freedom as Go allows I think it would be surprising if humans had stumbled upon an optimal strategy (or Master for that matter, I'm sure there is still much to be improved!).

The fact that Go commentators talk in terms of local strategy and narrative and anything other than the end-game from the very beginning made me feel fairly confident that Go was not a game that humans would ever reach optimal strategy levels at.

And in addition to that, the winning condition is also extremely fuzzy. Looking at the Master games, I couldn't tell why it was winning at all; granted, I don't know much about Go.

So if it takes some experience just to recognize the winner, and as far as I know sometimes even professionals can't tell for sure who is winning, it's pretty safe to say Go is just too complex a game for humans to arrive at optimal strategies in any reasonable amount of time.

You only need to do the math and look at the huge exponent to figure out that this is indeed the case.

Since AlphaGo's original training was to predict human moves, it should know how surprising a move is in addition to knowing how strong it is. My thought at the time was it could improve its game against humans by giving surprise value some weight -- when it can pick a slightly weaker but highly surprising move, pick the surprise.

It definitely does know how surprising a move is - it generates 'surprisingness' numbers for both itself and its opponent based on how likely it is a human pro would have come up with the move.

I like your idea of maximising the 'alienness' of play though probably not to the exclusion of playing the strongest moves.

You're probably right, if they were optimizing for short-term win rate vs humans. The real goal is to advance the state of the art in AI though, in which case you wouldn't want to use 'cheats' like that to make the problem easier.

Another interesting aspect is that AlphaGo gets a larger advantage over human players if it plays novel moves and forces rare situations that the human players are not familiar with. So if AlphaGo is allowed to self-play a long time and "drift away" in strategy space from human games that could help a lot. This is even more the case in time-limited matches where the humans are forced to play intuitively in out of sample states.

Professional go player Otake Hideo supposedly said he would ask for three stones if playing against God. That was before AlphaGo.

My guess has always been that the real gap is much higher, and a perfect player could give 6 or even 9 stones to the world's top players.

In perfect play there would be no such thing as joseki. I don't think it would be a game we would even recognize.

As play gets closer to optimal it gets more and more difficult to play more efficiently than your opponent. To play so much more efficiently than your opponent to overcome a handicap of 6 stones strikes me as extremely unlikely at pro level.

At amateur level, say a 2d vs a 2k player would perhaps have a 99.9% winrate. To make it an even game, would take 4 handicap stones. But at pro level, I think it's possible for player A to have a 99.9% winrate against player B, but on two handicap stones, for player B to be the favorite.
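The winrate figures above can be made concrete with the standard Elo logistic formula (assuming chess's 400-point scale; Go servers use similar curves). Under it, a 99.9% winrate corresponds to a rating gap of roughly 1200 points:

```python
import math

def elo_expected(ra, rb):
    # Standard Elo expected score for player A (rating ra) vs player B (rb).
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def gap_for_winrate(p):
    # Inverse: the rating gap that corresponds to win probability p.
    return 400.0 * math.log10(p / (1.0 - p))

gap = gap_for_winrate(0.999)   # roughly 1200 Elo points
```

The point about handicap stones is that they are not captured by this one-dimensional scale: two players can be separated by a huge Elo gap in even games while the handicap needed to equalize them stays small.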

Even an engine that is enormously successful against top human players would struggle at high handicaps vs them.

In fact I agree, and if a perfect player were available, pro players would quickly get better at taking handicap!

What I really wanted to express is the amount of headroom available between top players and perfect play. When I say nine stones, I mean, whatever the win rate is between an idealized 9p and 8p player, there would be nine more such steps between the 9p player and god. Probably more. I don't necessarily mean that god would have even odds giving nine stones to top players, because that's a different game.

However, I have seen enough games where strong amateurs are taken apart with shockingly high handicaps by top pros, especially in faster games, to wonder. I think we simply fail to imagine how strange perfect play would be. Even if a pro spent the rest of their life thinking about the next move, they are unlikely to find the one true best move. A player that always played that move would be so far ahead of anything we've seen that we just can't imagine how much better it would be. Imagine knowing at move 10 that the best move, given perfect play by both, leads to a 3.5 point win in 256 more moves, while the second-best move leads to a 2.5 point win after 310 moves. Just stating it this way shows the amount of headroom there is above human play.

The same thing has happened in poker. The computer plays moves that are "obviously bad" according to human heuristics -- frequent limp opening and donk(ey) betting -- yet the computer is able to incorporate those moves into its strategy successfully.

Interesting reddit discussion before reveal. Assesses bot identity and gameplay: https://www.reddit.com/r/baduk/comments/5l3l7e/chinese_ai_cr...

The reddit thread also links to more interesting discussion on the board lifein19x19:


We know that Master can figure out what it would play. We also know that its predecessor had a model for what moves a human professional would be likely to play.

What I would find truly fascinating is if Master could divide moves that it plays, which professionals wouldn't, into groups based on a similar internal categorization of the moves. And then see if human minds can look at any of groups and come up with a human understandable principle that humans had been missing about the game.

The point here is not so much to improve human play (though it presumably would do so), but as a step towards having an AI that can break down its internal model into principles that can be used to train another AI to learn those principles. Just like how a human expert can learn to turn expertise into something that can be taught to other humans.

This has several potential benefits. The first is that human experience suggests that this type of introspection tends to improve our own competency. The second is that we could have a single AI trained by multiple specialist AIs to get a compact "generalist". And the third is that this is a path towards having AIs that can discover things then teach them to humans.

The whole idea might fail horribly. But I'd like to see it given a shot.

Your described method of understanding its moves might work for some moves, but there will inevitably be moves that are the sum of so many different probabilities, so far down the road, that even looking at the result nobody would be able to recognize such a move on a slightly different board.

At some point, if this method would be used, it would probably require a second AI that would help understand the main AI, because the primary "explanation" would still be too complex and/or subtle for us to comprehend.

Obviously "automatically extract the principles in an understandable format" is a long term and probably unachievable goal.

However there is hope that cluster analysis on data about the internal reasoning process can successfully identify groups of positions that "seem to share a common principle". Success in that is a first step towards lots of interesting things.

This seems like the beginning of the plot of an anime I'd want to watch. Season two would probably start with the developers getting complacent and a Chinese AI entering the scene.

Really hope to see one machine against another. In the future we'd probably have game tournaments between AI machines, e.g. my TensorFlow-built machine against your OpenAI-built machine in a StarCraft game.

Exactly that has already been going on since 2010 :)

There's a livestream of bots facing off here: https://www.twitch.tv/certicky

There's a pretty interesting Starcraft AI tournament going on right now actually, though I'm not certain whether any of the bots are using deep learning techniques: http://sscaitournament.com/

Corewars, where the program is written by a neural net.

Sounds like RoboCode.

Season three would follow "Colossus: The Forbin Project" with the two bots demanding to be linked together, joining forces, and taking over the world.

No Game No Life[0] is somewhat related, it starts out with a mysterious and anonymous player beating everyone at online games.

[0]: https://en.wikipedia.org/wiki/No_Game_No_Life

This is impressive. According to the article, the bot has not yet lost.

I think at the end of 2017, we may have to say that computers have conquered Go.

Interesting that the timing is 20 years after mastering chess.

It's important to note that the games were played on a fast-paced "blitz" time setting where AIs are particularly strong compared to humans.

I can already barely follow what is going on on the board, so I'm sure watching slower paced games against pros would be interesting.

The games were played with 5 seconds per move for the computer and 30 seconds per move for the human. (Where unused time is lost, not saved).

I am not sure if slower time settings favoring the human is still true -- AlphaGo works a little bit differently from traditional computer go programs.

In the earlier AlphaGo series against Fan Hui, the slower official matches went 5-0 in favor of AlphaGo, whereas in the unofficial blitz games Fan Hui managed to win some.

The series is over with a score of 60-0.

You can find all the games at http://tieba.baidu.com/p/4922688212?pn=1

It's Chinese, but the pictures are universal and the comments don't really matter much.

By the way, Google Translate does a pretty horrible job of translating those pages. I think they need to add some more DeepMind to it :-)

To be fair, they probably don't have much training data for the jargon of the game of go or of any other game.

> By the way, Google Translate does a pretty horrible job of translating those pages. I think they need to add some more DeepMind to it :-)

I think they made an announcement about that.

> Found in translation: More accurate, fluent sentences in Google Translate


So it doesn't seem to work so well.

One example: 구리 (romanized: guli, for the player Gu Li) is translated to "Copper" (the literal word meaning).

One area Google Translate could improve is identifying names from context and romanizing them instead of translating them literally.

How to know if Master is white or black in each game?

After the diagram there is the SGF file with the moves. PB is the black player and PW is the white player. AlphaGo is Magister in the first 20 games and Master in the last 40: two accounts on two different servers.

It would be good to start experimenting with handicap stones to understand how many stones stronger the bot is.


Also experimenting with a new ranking system beyond 9 dan.

Elo Ratings are unbounded.

Yes but players are usually presented by their rank, e.g: Ke Jie 9P, Lee Sedol 9P.

Then, professional dan ranks in some Go associations are granted by total wins over a career rather than by strength relative to other ranks (e.g. Japan).

There is already a wide range of strength among 9p players. In addition, once a player reaches 9p, they never lose it, so it includes players in their prime as well as players who have declined.

Yes, the ranking system you refer to is the one based on total wins over a player's career, and it doesn't consider losses like Elo or other rating systems do.


Fernando Aguilar, a 6-dan amateur from Argentina (http://senseis.xmp.net/?FernandoAguilar), defeated two Nihon Ki-in 9-dan professionals (Hasegawa Sunao and Yo Kagen), which is unexpected given the substantial rank difference.

Two things:

1) The two 9p professionals you mentioned achieved 9p status before the new promotion system you linked to. The new standards are much more stringent (though still based on lifetime achievement).

2) The standards differ by country. All the major countries now give out 9p ranks sparingly, but I think China especially may be quite difficult.

Sure, but #1 is not so relevant since the old promotion system was also based on the total number of wins rather than strength.

My understanding is that it was based on Oteai performance: there was an annual tournament and you had to score well enough against players of roughly your strength during a single year to improve. Winning half your games for 5 years in a row would not result in a promotion. Whereas today, you can grind out the requisite number of wins over the course of 2 years, 5 years or 10 years.

As such, the old system was, roughly speaking, based on your historical performance, not just wins.

But the bigger point is that it was very permissive. There were something like 70 9p professionals in Japan, and there will be far fewer in the coming years (I believe Iyama Yuta is the only 9p aged less than 30).


This game is mirror Go up to move 72. The black player is https://en.wikipedia.org/wiki/Chou_Chun-hsun

I would also like to see how the strength is affected by computing resources. For example, one server, versus 10 servers, versus a room full of servers, versus an entire datacenter. I wonder how close this is to playing a perfect game.

I would also like to know if two perfect players would always end up with a draw, or if they would each win 50% of the games.

Either the white player would win in 100% of matches, or the black player would win 100% of matches.

Go rules prevent draws by giving Black a non-integer score bonus (this is called Komi, http://senseis.xmp.net/?Komi); By definition of "perfect play", a Black perfect player either always wins games, or a White perfect player always beats him.

Your reasoning is correct but the bonus score (komi) is given to white, which is second to move and is behind in the race at building territory (the thing that gets scored).
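The komi mechanics in this exchange fit in a few lines: White (second to move) gets a fractional bonus, so a drawn score is impossible. A toy scorer, with the standard 7.5-point komi of Chinese rules taken as an assumption (function and parameter names are illustrative):

```python
def winner(black_territory, white_territory, komi=7.5):
    # White's total is their territory plus komi, the compensation for
    # moving second. A fractional komi means the totals can never be equal,
    # which is exactly why perfect play has a guaranteed winner by color.
    white_total = white_territory + komi
    return "Black" if black_territory > white_total else "White"
```

So with 7.5 komi, Black must out-score White by at least 8 points on the board to win.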

It's not only about the hardware but also about the training data sets, the neural network topology...

Then, if something better than MCTS is found, that will provide an advantage too. Remember that those bots are hybrid solutions.

Wonder when we will see AI tutors appear? Seems like there is a massive potential to train an AI super player then have it teach you. Does anyone know of such tools yet? Guess there are probably many AI augmented things we use all the time (Google Search) but just don't know it. Fascinating to think.

There was a recent post here about a new LiChess feature that identifies mistakes in your (or someone else's) games, prompts you to look for a better move, and tells you the best move if you can't figure it out. I think this is a form of AI tutoring.
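At its core, that kind of feature compares the engine's evaluation before and after each move and flags large swings against the side that just moved. A minimal sketch of the idea (the function name, threshold, and move-numbering convention are illustrative assumptions, not LiChess's actual implementation):

```python
def find_mistakes(evaluations, threshold=1.0):
    # evaluations[i]: engine score (in pawns, from White's point of view)
    # after move i; evaluations[0] is the starting position.
    # Assumes move 1 is White's, move 2 is Black's, and so on.
    mistakes = []
    for i in range(1, len(evaluations)):
        white_moved = (i % 2 == 1)
        swing = evaluations[i] - evaluations[i - 1]
        # A mistake shows up as a big swing against the mover:
        # negative for White's moves, positive for Black's.
        if (white_moved and swing < -threshold) or \
           (not white_moved and swing > threshold):
            mistakes.append(i)
    return mistakes
```

The tutoring layer then asks you to find a better move at each flagged position before revealing the engine's choice.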


The same feature has been in GNU Backgammon (and other programs, I guess) for a long time. Also, it allows you to request you or the AI get good or bad dice rolls, which is an interesting training option.

Very cool. Thanks!

Some go programs already have features going in that direction, where they can look at a particular board state and show you which moves they think are best from that position. People have been starting to use them to review their games to see how an AI stronger than them would have played differently.

For example Leela (a free program) does this, and can display it with a "heatmap" style: https://sjeng.org/leela.html

Crazy Stone Deep Learning ($80 USD) has some similar analysis/hint features: http://www.unbalance.co.jp/igo/eng/

I'm not an expert, but I imagine these AIs attack the game using methods that just don't work with humans - predicting long deep branches of the game that are impossible to keep in your (well, at least my) head.

The way I recall building simple game AIs is to create a board evaluator, then build a game tree, prune as necessary, and min-max as system constraints allow. I'm sure "real" AIs are much more sophisticated but I don't know how much of what it does can be taught to a human.
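The recipe described above (board evaluator, game tree, pruning, min-max) fits in a few lines. In this sketch the tree is just nested lists, with leaf numbers standing in for the board evaluator's output; alpha-beta pruning skips branches that cannot change the result:

```python
def minimax(tree, maximizing=True, alpha=float("-inf"), beta=float("inf")):
    # 'tree' is either a leaf score (the evaluator's verdict on a position)
    # or a list of subtrees (the candidate moves from that position).
    if not isinstance(tree, list):
        return tree
    if maximizing:
        best = float("-inf")
        for child in tree:
            best = max(best, minimax(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # opponent would never allow this line: prune it
        return best
    best = float("inf")
    for child in tree:
        best = min(best, minimax(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best
```

AlphaGo replaces both pieces: the hand-written evaluator becomes a learned value network, and exhaustive min-max becomes Monte Carlo tree search guided by a policy network, which is a large part of why its choices are harder to narrate.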

It seems that AlphaGo has gotten even better since it defeated Lee Sedol. This time, AlphaGo is undefeated against other top players. However, it also gives other players a chance to practice against AlphaGo and find weaknesses in it, which is super important for any competition.

It would be super-weird if AlphaGo became worse than the previous version.

Neural networks have a bad habit of regressing. When training continues they tend to replace earlier skill with newer skill.

The team would surely detect that. In the original paper they even mention this issue, and as a safe-guard they also test later networks against previous versions.

True, they are aware and monitor for it. Just pointing out that it's not a super weird thing to happen with a NN. In fact it's a super normal thing that has to be continuously monitored.

> Go is regarded as the most complex board game ever invented

I'm not sure that's an accurate statement.

How about the following statement?

"There is no widely played game for which an Elo rating system shows a larger measurable range of skill than Go."

This is a precise and measurable statement. The standard deviation of the Elo system used in Chess is 200, and the range from the best humans to rank amateurs is 14 times this standard deviation. When the Elo system was adapted to Go, the standard deviation was set to 100, and the range is 29 times the standard deviation.

The possible skill range, of course, measures something about how complex the possible strategies are. And Go trounces every other game by this measure.

I think that gives Go a pretty defensible claim to "most complex strategy of any popular board game".

Complex is being used in a different context here though. You are meaning complex as in strategic complexity. The parent (I think) is using it in terms of the games rules. Go is pretty simple in principle: place stones on the board one at a time, and remove enemy pieces completely surrounded by your own. But it's fairly complex to know what exactly constitutes being surrounded. Chess has 6 different kinds of pieces that have different movements, and different kinds of rules depending on the state, and promotion at the end, castling, and en passant. Checkers is a much simpler game because there's only 2 kinds of pieces, and 2 kinds of moves. Strategically, Chess isn't leaps and bounds more complex than Checkers since Carlsen has an Elo rating of 2800 in Chess, and Tinsley had an Elo rating of 2700 in Checkers and I believe it's the same deviation of 200.

Checkers has adjusted its Elo system over time. I believe that it now matches chess except with a floor of a 1000 rating. From http://icheckers.net/ratings/, the current ratings range goes from 1000 to 2297 among active players, and up to 2510 among inactive players.

Even if you use the larger range of inactive players, that's a range of 7.5 times the standard deviation. Which is considerably less than chess.
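The arithmetic behind these comparisons is just range divided by standard deviation, using the figures quoted in this thread (not independently verified):

```python
def range_in_sds(top, bottom, sd):
    # How many rating standard deviations separate the strongest
    # and weakest rated players: the comparison used in this thread.
    return (top - bottom) / sd

checkers = range_in_sds(2510, 1000, 200)  # inactive-player ceiling: ~7.5 SDs
chess = range_in_sds(2800, 0, 200)        # quoted as 14 SDs above
```
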

Tinsley isn't on that list, and I saw it claimed that his rating was 2700: https://en.wikipedia.org/wiki/Marion_Tinsley

Chess has a floor of 1200 for novices. So I'm not sure what your point is.

The Elo system for Checkers has changed over time. Tinsley had a rating around 2700, but he would not get that rating today.

Chess ratings from the USCF have a floor of 100. What you are thinking of is that they start people at a provisional rating of 1200 and then let them drift to where they belong. But a person who just knows how the pieces move will quickly head down towards that floor.

Know of any resources that help figure out what the Elo SD of a particular game is, and the range? Also, know of anything written about various ways of measuring the complexity of a game? I'd never thought about the number of SDs thing.

By this measure, is there a more complex game than Go, but which is not as widely played?

(Question from someone who does not know anything about board games.)

I ask because a natural question arises: can one engineer a game specifically so that it is extremely difficult for a machine to master compared to a human -- due to some strategic complexity, or some approach that does not lend itself well for current AI models.

edit: looks like there's at least one http://arimaa.com/arimaa/

Probably not.

Even if a game was more potentially complex than Go, it would be hard to establish the dynamic range of Elo ratings unless a lot of people played it, and hard for skill variation to be that high unless a lot of people studied very hard to become very good at it.

Yeah, "deep" is probably a better term than "complex" to describe Go.

For example, Arimaa, which has a more complex ruleset and a bigger game tree. It was designed to be difficult for AI, but in practice the complexity was difficult for humans too, and AI was developed to beat the best humans using classical chess AI techniques and some clever heuristics.

Go is "most complex" only when measured by difficulty of implementing a AI to beat the best humans.

If the human race had collectively had only a decade of Go-playing experience (as was the case with Arimaa), it might have been possible for an AI coded with "classical techniques" to defeat them.

(To be specific, my guess is that we'd have been at the amateur-dan level, and the initial MCTS revolution would have been enough for the computers to win.)

A counter example would be the easiest way to disprove it.

Cones of Dunshire? :)

I haven't met a formidable Cones of Dunshire computer player, AI or otherwise yet, so for now I remain unconvinced :)

what about "most complex board game that is more than a few centuries old"?

The Master bot was discussed a bit in on Google Deepmind 2016 round up: https://news.ycombinator.com/item?id=13312219#13314592

~60 wins atm.

I'd like to see if a small team of the worlds best go players could beat AG.

How much stronger is a team of top players than its strongest member?

I wonder what the best way of coordination would be. Perhaps they can identify several promising lines and each player chooses one variation to calculate more deeply.

From my own experience playing (as a weak amateur), I feel I'm rarely able to think so systematically -- often ideas I discover while contemplating one line are tried in entirely different variations, so it would perhaps take some getting used to before a team can function optimally.

There's a very interesting book called The Go Consultants (http://www.slateandshell.com/SSJF003.html) which describes how professionals worked together during an extended-time consultation game (2 v 2).

> I wonder what the best way of coordination would be.

They would have to use software to divide the search tree between them and collect their judgements.

I'm wondering how would a mixed team of AIs and humans fare? For example, a team of 1 AI and 4 humans against a team of similar setup. Humans could be assigned to judge parts of the search tree and their inputs aggregated by the AI.

I'm pretty sure this[1] is the archive of the bot's play. You can download the SGF files of the matches from there and view them online at EidoGo[2].

[1] http://www.gokgs.com/gameArchives.jsp?user=Master

[2] http://eidogo.com/upload

Marcel Grünauer over on LifeIn19x19 has made a compilation of all 60 of Master's games. Found it just then! (A zip archive of 60 sgf[1] files) I _think_ that's all of them as of now.


[1] https://en.wikipedia.org/wiki/Smart_Game_Format

ps: The Japanese term for a game record for the game of Go is kifu: https://en.wikipedia.org/wiki/Kifu

Hence http://gokifu.com/ which tracks all pro games (and then some, I think)

Ah! Thanks for your evidence in support of Cunningham's law[1]. I'm delighted to have access to the real deal.

[1] https://meta.wikimedia.org/wiki/Cunningham%27s_Law

It was not on KGS, but Tygem and Fox, which are Asian servers.

Did any team reproduce the AlphaGo technology yet?

Here is the facebook version: https://github.com/facebookresearch/darkforestGo

And here's another public attempt, not sure about the progress. https://github.com/Rochester-NRT/RocAlphaGo

if their goal was to reveal what they were doing to the world, this is a fine way to go about it and build some anticipation and get a PR splash.

if their goal was to train the best go-bot, they could have had it play not to win but to go down to the wire with very strong players but frequently lose.* Experienced players might have identified it as a bot, but would have dismissed it as "not good enough yet"

*especially as per the recent story that children don't learn when they win, so you train yourself without training your opponents :)

Can AI beat Fed?

Anyone know the origin of the name they chose? Did they name their bot after the rapper Percy Miller?

I seriously doubt the Percy Miller connection.

AlphaGo is the best player in the world, so a name like Master fits it :)

They also probably didn't want to choose a human looking name which could mislead people, and instead chose one which could presumably be either of a human or an AI.

We will need a Kobayashi Maru exam for this.

..wonder when they're gonna "reveal" their secret orwellian its-all-for-the-good-of-humanity retina-scan-ml-trainingcamp(yes, reads almost like german) project from that UK-ish part of the world..
