Hacker News new | past | comments | ask | show | jobs | submit login
Lee Sedol Beats AlphaGo in Game 4 (gogameguru.com)
1395 points by jswt001 on March 13, 2016 | hide | past | favorite | 448 comments

Relevant tweets from Demis;

    Lee Sedol is playing brilliantly! #AlphaGo thought it
    was doing well, but got confused on move 87. We
    are in trouble now...

    Mistake was on move 79, but #AlphaGo only came to
    that realisation on around move 87

    When I say 'thought' and 'realisation' I just mean the
    output of #AlphaGo value net. It was around 70% at
    move 79 and then dived on move 87

    Lee Sedol wins game 4!!! Congratulations! He was
    too good for us today and pressured #AlphaGo into
    a mistake that it couldn’t recover from
From: https://twitter.com/demishassabis

I'll risk assumption that somebody from Deepmind team is reading this.

Guys, please, publish charts of win prob estimated by alpha go in time during these games. Some heatmap telling which moves did it consider as best for both sides during the games would also be cool, but that's surely more time consuming to prepare.

It would be great to be able to have such things for top pro tournaments in the future.

Actually, it's a tradition that both Players should replay the game after the match, discuss about good moves, bad moves and what they were thinking during the match.

I felt so sorry for Lee Sedol when I saw him lose the second match, facing an empty chair ,and he could only ask one of his friend to review the game.


Or the helpless look of Aja Huang... like "I have no idea how to decipher that neural network, that's part of why you are here".

He made $30k for losing that game, so I don't feel too sorry for him.

If it really was about money, I don't think he would have spent so much of his life dedicated to Go ...

yeah, not to belabor it, but if you can do something like be a champion in go or chess, chances are you have the mental skillset to do something exponentially more lucrative.

This is not true. Scientific studies have shown that skills in abstract strategy games are so specific to the game that they do not transfer to other tasks. The urban legend about Chess grandmasters all being geniuses is false, in other words. They're actually worse at most jobs than people who've been doing those jobs for awhile, because they lack the experience, and they aren't more predisposed to having a genius intellect that would let them overcome that.

A Chess or Go champion is probably doing the single-most lucrative activity that they are capable of.

What scientific studies are you referring to?

Of course chess grandmasters aren't all-around geniuses, but many traits that are prerequisites for a successful chess player certainly are translatable into careers in other fields, and correlate with above average brainpower (good memory, discipline, long attention span, spatial intelligence etc.)

Botvinnik was an accomplished engineer, Euwe had PhD in mathematics, Anatoly Karpov is a millionaire, interestingly enough a lot of recognized chess players had careers in music (like Taimanov or Smyslov)...

Being a grandmaster surely requires an above average intellect, there is no such thing as a chess savant. While it's an urban legend that Kasparov's (arguably the greatest chess player ever) IQ was in the ballpark of 190, he did clock at 135. Such a result is not unheard of, yet still placing him in top 1% or so.

There's a whole body of research out there, but I just scratched the surface of it. Here's the first result that's coming up from a random Google search. https://psy.fsu.edu/faculty/ericsson/ericsson.exp.perf.html The references should be worth exploring, and I bet there's more recent research that expands on it as well. Here's a quote from the paper:

> In a recent review, Ericsson and Lehmann (1996) found that (1) measures of general basic capacities do not predict success in a domain, (2) the superior performance of experts is often very domain specific and transfer outside their narrow area of expertise is surprisingly limited and (3) systematic differences between experts and less proficient individuals nearly always reflect attributes acquired by the experts during their lengthy training.

I will say this for Chess or Go grandmasters: They have drive and dedication (and in some cases, compulsiveness). That alone would probably allow them to do better than the average person in another field if they had pursued that field from the get-go. Also, I'd caution you against relying on hand-picked anecdotes; I could just as easily pick out a bunch of Chess players who weren't good for anything else. You'd need broad-based statistics.

You seem to be confusing two different claims:

1. A champion Go/Chess player could retire, and jump a successful career as <Something else>. You refute 1.

2. A champion Go/Chess player could have pursued a career in <Something else> from the start and made more money. You have not refuted 2.

For an example, many physics PhDs become successful software engineers and quants, making a mid-career jump to a different field.

Thanks for the link. Well, of course I just quoted a few examples without pretending it's legitimate piece of statistics, however, there hasn't been all that many world chess champions so far (and Botvinnik, Smyslov, Euwe, Karpov, belong in that category), and thus a few PhDs or successful intellectual careers is already enough to show that something clearly sets this bunch apart from the general public. You're less likely to find a plumber or a shelf-stacker in that bus (no disrespect for people in these professions). Whatever causes that is another matter. I agree with you that drive and dedication - and not necessarily some innate talents - could be the crucial factor in this.

I'm not sure you're disagreeing with the person you're responding to. Just because the skills involved are very specific doesn't imply that the people with those skills only are capable of obtaining said skill.

What the person you're responding to means is that if they have the mental capacity (and also discipline) to play the game at such a high level, they could probably also excel in other professions if their goal was to make money instead of doing something they loved.

Of course we can debate whether they would achieve such success if they were doing something they didn't enjoy as much, but I think most people would argue that most chess or go champions could make more money or in other words do more "lucrative" activities if they chose to do them instead of dedicated all their time to the game.

You are relying too much on the concept of a generalized mental capacity, which many researchers (maybe even most) would say does not exist. You seem to be assuming that grandmasters are just amazingly smart and they happen to apply those smarts to one particular skill, but what's closer to the truth is that they happen to be particularly amazing at that one skill, in a way that doesn't really correlate strongly with having amazingly high skill in other areas.

If you took the people who would have grown up to be chess grandmasters, intercepted them just before they started spending significant amounts of time studying chess, and had them instead spend all that time studying programming or physics or something, how do you think they would rank up?

You seem to be arguing against the idea, "A chess grandmaster is a genius and therefore can walk into Google and immediately start doing more and better work than most of their senior programmers". I don't think anyone believes this (correct me if I'm wrong).

I think a more serious idea is, "A chess grandmaster is a genius and therefore learns faster and has a higher performance ceiling and such than most people, and if they spent a couple of years learning to program, they could become an entry-level Google programmer, after which they would rise more quickly than most hires, and eventually would outperform most of Google's senior programmers."

By the way, I think most of the best chess players were extreme chess prodigies. (Just looked at Kasparov, Karpov, Shirov, Kramnik, Anand, and Carlsen's Wiki pages; all but Kramnik had the year listed, and they all became grandmasters around age 17-19, except Carlsen, who was around 14. Kramnik's page mentioned winning a gold medal for the Russian team at age 16, and that he wasn't a grandmaster when selected for the team and this was unusual.) I think this is consistent with them being highly gifted children, who choose to spend their time doing chess.

>I don't think anyone believes this (correct me if I'm wrong).

While not exactly what you wrote, people do think that someone is a genius at some subject can become a genius at another area with less work than it took for someone in either field to originally become a genius at that field, especially society groups the areas together (so sports star becoming master programmer is far less likely to be believed than chess master becoming master programmer).

> Scientific studies have shown that skills in abstract strategy games are so specific to the game that they do not transfer to other tasks

Do you have a reference for that please? I'm interested in the subject.

Though being able to compel yourself to do great things seems like a completely different skill set from being able to do great things on someone else's dime or on a team.

For example, my brilliant programmer friend that can't hold a job. Or the artist that can only paint their own inspirations. Or the savant that doesn't get along with anybody.

I dunno, Bobby Fischer was kinda deranged, for instance, and that often hurts outcomes in otherwise "lucrative" positions. Incidentally, he is credited with raising chess player compensation through his demands.

Generally I'd guess you're right though.

Maybe not a leadership position but I've worked on the engineering side of quant work and am familiar enough with fairly advanced Go / Chess players (e.g. national youth champions for age brackets for the US) to know that most of them could make enough to retire off one years salary during the boom years of algorithmic trading (~2003-2007ish).

I'm not sure how you define "deranged" (AFAIK, that's not a medically defined term within the DSM-IV) but most of those brilliant people end up being a little 'off'. My father was an academic, one of the people he went to graduate school with was working in Boston while I was a child. He was absolutely groundbreaking work but he's so difficult to collaborate with (think: the mannerisms of Richard Stallman) that he's been floating around universities until his welcome is worn out. He can figure out remarkable things in higher level computational chemistry, but he can't really figure out humans. Had he decided instead during the 80s to work at Renaissance instead of pursuing academic research, he almost certainly would be worth in the low hundreds of millions.

Well, mental issues did break his chess career just the same.

With the number of tournaments he has won (many of which have a winner's purse of <$100 000) I suspect he is already a millionaire.

Those are the big tournaments -- do the little tournaments stack up, or is it a year of practice leading up to an annual big win?

I can not find it using google, but there was a Go champion who had stop playing go during one year to earn money playing mah jong.

False dichotomy? It is quite possible to enjoy an activity and gain money at the same time.

The best (like absolute best) football|basketball|baseball players in the world make approximately what per game?

Well, defensive end Olivier Vernon just signed a deal with the New York Giants which will pay him an average annual salary of around 17 million dollars. And there was a huge signing bonus as well.[1]

NFL seasons are 16 regular season games, plus 4 pre-season games. And then there are the playoffs, which a given team might or might not make or advance in.

All told, given that the signing bonus is amortized over all the games he plays, the annual salary, etc., I think it would be fair to say that Vernon will make around a million dollars a game.

Aside: Vernon isn't necessarily "the best" DE in the NFL, but due to market forces and the way things work with the salary cap, free agency rules, etc., the contract he just signed is one of the largest for a defensive player in the league. QB's tend to make even more, but I can't recall a really high profile QB who has signed a big deal recently.

[1]: http://www.spotrac.com/nfl/new-york-giants/olivier-vernon/

Osweiler QB from the Broncos just signed a huge deal with the Texans...$18mil/year but I haven't heard how much of that is guaranteed so I'm sure.

A really big issue with NFL contracts is this "guaranteed money" thing...I believe the NFL is the only major US sports league who give player contracts without it, so you have to take those salary numbers with a grain of salt.

34 guaranteed over first two years

Football is kind of an oddcase since it the most popular sport in the richest country, and it runs very few games per year.

Best is around 10x that, but at the cost of their body and long term health. Nobody is playing professional football|basketball|baseball in there 50's, and even just 40 is pushing it. Where pro go players can 60+.

Gordie Howe played in the NHL until he was 53.[1]

Not really refuting your statement, which is essentially true, but it's worth your time to read the wikipedia page.

He played on a professional hockey team with his two adult sons at one point(!). He played in the NHL in five different decades.

[1] https://en.wikipedia.org/wiki/Gordie_Howe

Yep, it's interesting to see a few edge cases out there. There are a tiny number of people that came close to 50 in baseball:basketball:football.

  Julio Franco played baseball till 49.
  George Blanda played football till 48
  Nat Hickey played baseball till 45.

Interestingly, the world champions seem rather young.

If as many people who pay to watch (NFL|NBA|MLB) paid to watch Go, you'd likely see similar payouts.

I would like to see AlphaGo play 21 against Steph Curry.

I.e. Something like [1] for every move. I was a bit disappointed to learn that such table is only available for that particluar move at the first reading.

[1] http://www.nature.com/nature/journal/v529/n7587/fig_tab/natu...

Off-topic: we (the human species) have built an AI that can master the game of Go, but we don't yet have the intelligence to publish charts correctly on the web. That blur of a JPG is half a megabyte!

Yeah honestly that would be much better presented as an SVG since its mostly solid colors and geometric shapes. Every browsers supports them, and all the big image editors as well; the trouble is that the rest of our software and culture hasn't caught on. Why am I taking bitmap screenshots of websites when they're almost completely text? We need to start building a tools ecosystem around vector graphics. An extra bonus would be a machine-readable representation of the board as well. We live in a digital age, but its almost like we're cavemen still dealing with the modern equivalent of analog formats. (And at least analog has fidelity)

Screenshots might not be the best example, since there may be some "shiny cute effects" going around. However, reading your idea sprang another one in my head: screenshots shouldn't be restrained to the same rendering resolution as the rest of the system/the screenshot resolution should be configurable.

I don't require the screenshot to be instantaneous, I require it to appear instantaneous. In that sense, if the whole rendering pipeline is working to give me a framerate of 60 fps, then I could spend ten times as much rendering a screenshot without that delay being noticeable. Also, why on earth does Windows (don't know the behaviour on Linux/Mac) apply ClearType to a screenshot? That has always bugged me, there are some situations where you tolerate it and others where it hurts.

It didn't master anything. What the DeepMind team did was make an "AI" which means it can _imitate_ (to a certain degree) of "playing go".

What's the difference between "imitating" playing go and playing go, when it can beat any one us at go?

That would be awesome.

"Do not anthropomorphise computers. They really hate that" (NN)

By NN you mean a neural network, of course? That makes even more sense...

NN = nomen nescio ("I don't know the name") You use it when you cannot name the source of a quote, or when you would like to protect the source.


Wouldn't "-Unknown" convey the same sense without the possible confusion?

I think this conveys a slightly different meaning.

When I see "-Unknown", I interpret it as "somebody said it, but no one is quite sure who".

"(NN)" in the current thread was used to indicate "I personally don't know", which does not imply the quote is unattributable.

NN is (was) also used in neapolitan comedy, derived from earlier use throughout the Roman Empire and Middle Age. The "figlio di NN", or "son of NN" means someone who was found and adopted (typically by nuns) and whose parents were unknown.

NN can be used in general for people whose origin is uncertain. I think in this specific case it's a bit misleading - although the wikipedia article seems to suggest that NN can also be used as a synonym for "unknown", although from a historical perspective it is a bit incorrect.

The literal translation from Latin creates some confusion if you didn't know the context.

I hope this helps.

So NN can be used as an school version of anonymous?

... wrote the bot trying to hide its identity

Using the acronym of the phrase defeats the purpose, though, since it looks like someone's initials.

Not so coincidentally, "This was when things got weird. From 87 to 101 AlphaGo made a series of very bad moves." [1]

The bad moves in the eyes of humans could be risky bets or the horizon effect. [1] [2] AlphaGo's use of deep neural nets (value networks) to evaluate board positions should significantly help counter the horizon effect, but since move 78 by Lee Sedol which turned the situation around was unexpected by some top pros (Gu Li referred to it as the 'hand of god' [1]), the patterns which follow are likely rare in possible game states and therefore not strongly embedded into the value networks, leading to AlphaGo's loss.

I hope the DeepMind team will help enlighten us on this in the near future.

[1] https://gogameguru.com/lee-sedol-defeats-alphago-masterful-c...

[2] https://en.m.wikipedia.org/wiki/Horizon_effect

78 was hard, but not impossible: If you watch the AGA commentary of the game, they had two pros at the time 78 happened, and they found 78 as the best answer a few minutes before Lee did, expecting AlphaGo to go with a stronger, yet still good not good enough 79, that left the game even, instead of basically lost. Then they were elated about how AlphaGo seemed to have failed to read the whole thing, and instead of doing preparatory moves for a nasty ko fight, the best option at the time, AlphaGo just took a route that provided almost no compensation.

If anything, it seems to me that AlphaGo's problem here might be the time management: Seeing a really scary situation, Lee Seido just sank many minutes into reading the problem, going pretty much all the way to byoyomi time. A human, after seeing something like that, would figure out that their assessment of the situation and their opponent's is very different, and spend a lot of budget trying to figure out what was wrong. AlphaGo just didn't see the problem, and didn't just budgets its time to analyze the position to death. It moved slower than before, but not really that much, and ended up making moves a kyu player could see as terrible.

Either way, I'd love to see Deepmind giving us all a good postmortem of the 70-100 range of moves.

I concur that better time management may have made a difference.

Beyond that, I think that AlphaGo may still be missing a type of component. From the descriptions of it, the policy network generates possible moves from board positions, and the value network evaluates the probability of desirable outcomes. How this is different than human play is that strategic assessment and planning are implicit in the middle layers rather than a 'conscious' element to searching and decision making. I'm not saying that this is a necessary component as AlphaGo has already done exceptionally well. I do believe this kind of 'middle-out' processing producing and evaluating strategic concepts could make it better handle unusual circumstances. Being trained on high amateur and pro games, it will best respond to the most conventional of those types of games, more unconventional the game becomes, the worse it would fare in terms of efficiency of move generation and choices of which to evaluate.

I got something different from the AGA commentary. The pros did consider 78, but came to the conclusion that it didn't work! I'm just a 1kyu player but I too can't see any way to make 78 work after Black plays 79 as an atari at L10, as suggested by the pros. So I'd be very interested to see some analysis that shows how White could get a fair result after Black 79 at L10.

It would be interesting to see a neural net evolve to take into account player state, not just game state. I play chess, and no where near professional levels, so I don't know if this anecdote is valuable, but if I see my opponent looking at a particular area of the board, I tend to take a second look.

I suppose beating humans isn't AlphaGo's primary motive though - learning to play a perfect game of Go in general is probably more difficult than playing the perfect game against a particular person.

Having the AI use eye-tracking to predict an opponent's moves is a truly terrifying thought. Just by tracking the eye with millisecond precision you could probably work out the strategy they were reading. It's so game-breaking that it shouldn't be allowed as an input, and indeed, is easily defeated with reflective sunglasses anyway (like a lot of pro Poker players wear).

The AI player can't give up information in this manner because it lacks eyes, so I'd say that it should not be able to use this information from the human player.

Seems like just another avenue to trick the computer

This actually came up in the commentary. Michael Redmond actually mentioned that some amateurs watch to see where their opponent is looking, but he called it merely a "trick", and he said it is not useful in professional play.

That type of metagame seems useful though, in the sense that you can start to sense the state of mind of your opponent. If they are working harder than you think they need to, you might get away with a riskier move for instance.

Just in case someone wants the commentary around this move 78


Here's the Myungwan Kim commentary around that same move: https://www.youtube.com/watch?v=SMqjGNqfU6I&t=1h33m. He had been looking with Hajin Lee at variations involving the move beforehand (https://www.youtube.com/watch?v=SMqjGNqfU6I&t=1h28m1s), so he immediately noticed that AlphaGo made a possible error in its response.

But Kim also jumped a bit backwards and forward for a few moves, until he finally decided it was a mistake. It didn't seem 100% clear.

Professionals are understandably hesitant to call AlphaGo's plays mistakes after the first few games, because AlphaGo had played a few moves that seemed like obvious mistakes (the eyestealing tesuji under the avalanche and the fifth line shoulder hit) but later proved to be very strong moves. Remember that when go is your career, your reputation is based partly on your ability to analyze games accurately, so calling a move a mistake and finding out later that it was strong would be a big embarrassment.

Any chance of explaining the mistake on move 79 to someone who's only played a few casual games of Go?

AlphaGo enlarged a group that was at risk of atari (loss to opponent) instead of correctly trying to divert attention to elsewhere on the board or starting a new fight or simply writing it off. If basically added stones to a group that was almost certainly dead and allowed the opponent to punish it heavily, thus giving away more stones than it had to.

OT: What's with using monotype for quotes? That's breaks line wrapping and makes it hard to read on mobile or small screens. I don't get why people do it.

"People" don't do it; HN does it. Indented lines in a post are always monospaced, on the assumption, I suppose, that they're likely to be code examples. There's no way to turn this off AFAIK.

And no, HN does not use Markdown.

I know the formatting system. My question is why the users are formatting their text (e.g. with 4 places leading each line) *suc that HN renders it as monospace, which then forces a sideways scroll on mobile and small screens. That seems to make it harder to read for some with no corresponding upside.

Oh. Well, I think they just don't realize that HN will monospace it.

I agree with you, it's annoying.

They don't realize it even after seeing their post go live? And seeing what other people's monotype quotes look like?

I never like to assume malice, but ...

The people indent the text (or fail to remove the leading white space when copying).

It isn't a huge time sink to wrap a paragraph in stars.

I think mostly people forget that > works and that therefore you can do

> quotes like this

since it's markdown - often you get commentish things that allow limited HTML plus indented code blocks, and people get used to using the latter as the only thing that works everywhere

'>' doesn't "work"; it doesn't do anything on HN. It's just a convention that readers recognize as indicating quotation.

HN does not use Markdown.

At least ">" doesn't break anything.

    Pre is broken on mobile, forcing users to scroll a horizontal line which is incredibly annoying if it's a long line.
Here's a screen shot: http://imgur.com/Ru56wMK

And pre with very long unbroken lines is even worse.

I wasn't expressing a value judgment, just trying to explain how HN works.

I agree with you that it's annoying when people post long indented lines.

it can be touch-scrolled here, there just aren't scroll bars. Took me a while to find, but maybe your browser is buggy, eve.

I noticed people were doing this and wrote a userscript a while ago to make it work. I kind of wish HN supported it.


I wrote in a while ago to ask that it be officially supported, but apparently that would break something that HN's spam filter relies on.

Looks good in desktop browsers.

not really: http://i.imgur.com/uxqERat.png (you have to be very careful about what indentation level you're at, how long your lines are, etc, etc, things that people just don't do most of the time)

It's also customary when quoting large sections of text. I'm not sure if this carried over from email, or from academic texts (where if your quote is more than a line or two, it needs to be formatted differently).

I think mostly people forget that > works and that therefore you can do

> q

How frigging smart Sedol is?

Kind of cool to know you're as smart as algorithms that require 1920 CPUs and 280 GPUs

They've said it doesn't require that, AlphaGo running on a single machine beats the cluster they're using 25% of the time.

So based on current data, Lee Sedol is exactly as good AlphaGo running on a single machine.

Doubtful; I don't think comparing such small W/L distributions will be illustrative.

On the other hand, the Nature paper shows the single 8 GPU machine performs similar to the 64 GPU cluster, but the larger clusters perform a comfortable margin better. [0]

By a single machine winning many games relative to the distributed version, really it's just saying that the value/policy network is more important than the monte carlo tree search. The main difference is the number of tree search evaluations you can do; it doesn't seem like they have a more sophisticated model in the parallel version. The figure suggests that there are systematic mistakes that the single 8 GPU machine makes compared to the distributed 280 GPU machine, but MCTS can smooth some of the individual mistakes over a bit.

[0] http://www.milesbrundage.com/uploads/2/1/6/8/21681226/877172...

Probably better. If we assume a uniform distribution across possible win rates of Sedol vs AlphaGo, then update it with bayes rule, we get 33% chance that Sedol will win the next match.

That's not factoring in other information, like Sedol now being familiar with alphaGo's strategies and improving his own strategies against it.

So there is a good chance he is now evenly matched with AlphaGo, and likely much better than the single machine version.

There's a bit of a slight of hand in this statistic -- yes, they can do runtime on a single machine, but it took the compute power of a small country to train the neural nets that are loaded onto that one machine.

That's not really sleight of hand. Lots of things take more energy to produce than to run. It's like saying a 400W electric motor can put out as much power as a fairly fit human, but it's 'sleight of hand' because it took a whole factory to make the motor.

And it took decades of play for Sedol to become a top player. I find the similarities a mix of satisfying and amusing.

You're conveniently forgetting that this "AI" is a representation of tens of millions of amateur plays which is far more than a few decades in total time. Not that Lee Sedol needs someone like me to defend him, but remarks like yours are very misleading.

A human is far, far, leagues, more efficient at learning than today's AIs. These AI requires millions of hours of man time of data to even come close to competing at the level of an expert person which did the same, and even arguably far better, in a "few decades".

Even on a single machine it has the "memory" of virtually any Go game AlphaGo could be fed.

If you look at DeepMind's numbers, AlphaGo is meant to win more than 50% of the time against human players (expert). Right now, my opinion is that its actual capability in wins is less than 50%.

We're in the very infancy of the field though. Give it a year and it's likely you'll have a neural-net-based Go program that runs on a single machine that can defeat every single human professional. Human Go players have been getting better for millennia; don't assume that a program that's only two years old has no room for improvement left.

It took some serious hardware for Deep Blue to defeat Garry Kasparov. Now there are smartphone apps with that same level of Chess-playing skill. And if anything the AlphaGo approach is more amenable to running on lower-specced hardware without requiring help from Moore's Law (because you simply train it better).

It is believed now that Lee Sedol has a really good ability in reading AlphaGo's moves now (see: Micheal Redmond and AGA Game 4 commentary). Game 5 is going to be very interesting. I'm pretty sure the DeepMind team is just as excited. After game 4, they were noticeably far more ecstatic in the after-game press review.

Hassabis and Silver kind of reminded me of developers given the details of a bug that was notoriously difficult to find.

edit; I can't wait for the reviews of the entire 5 game series. If a book came out I'd very likely buy it. A book discussing both Go and AlphaGo AI at the same time consisting of people among the top of their respective fields would be amazing.

I wish some of these had been re-tweeted by the @DeepMindAI account. I'll definitely be keeping an eye on Demis' account during game 5.

It feels really weird to see someone being showered with congratulations for beating a computer program. What exactly is he being congratulated for? For probably triggering and then capitalizing on a bug in AlphaGo's AI? For showing that human resolve, perseverance and a "fighting spirit" can trump a flawed AI, at least until the AI gets fixed? For giving DeepMind extremely valuable test data that will only accelerate the refinement of AlphaGo's AI? For helping to advance an amoral field of study that can potentially delegitimize everything that currently makes humans unique and extraordinary?

As noted by the live commentator, this has the opportunity to be the third revolution of the game of Go, opening again new ways of playing for the whole of humanity.

Imagine what AI, in different fields, means for humanity if it has so much to teach us, just by being able to "think" differently. I sure hope that one day they start writing philosophy and, doing so, potentially legitimize everything that currently makes humans unique and extraordinary.

What are the other two revolutions of game of Go?

cf post game 3 conference commentary : https://youtu.be/qUAmTYHEyM8?t=20639

(also during the actual game with more details, but I can not find the exact time again). Edit : can not find game3 commentary but during game 4 he is coming back to it again with interesting details as for why this is a great opportunity to inspire human players : https://youtu.be/yCALyQRN3hw?t=7097

Basically, once in ancient Japan and more recently a Chinese origin player in Japan, both incredibly strong players, surprised everyone with never before seen moves that subsequently where integrated in modern game theory. The hope is that the same could happen here.

a computer that loves wisdom?

Perhaps you'd like a shot at explaining why AI is an "amoral field of study", and why it could "delegitimize everything that currently makes humans unique and extraordinary"?

Dolphins are about as intelligent as us, too. Are dolphins amoral? Do they delegitimize Beethoven, Tesla, Gödel, Einstein, and da Vinci?

> Dolphins are about as intelligent as us, too. Are dolphins amoral? Do they delegitimize Beethoven, Tesla, Gödel, Einstein, and da Vinci?

If we invented dolphins, and they began to replace and displace us, then maybe? I don't think it is about the intelligence, but in how we use it of course. Dolphins don't really have any control of my life or those around me.

That sounded a lot more paranoid than I meant, I actually agree with what I think you are saying

> Dolphins are about as intelligent as us, too

How are you making this comparison?

I think Bud wanted to make a different point. Ie that AI doesn't delegitimize us---whatever that even means. Dolphins were just an example. He could have used Neanderthals, or Aliens.

Not really related, but dolphins are assholes!


Frankly many pre-modern, pre-civilizarion humans were assholes. Read many studies or accounts of life in relatively isolated tribal societies and hair curlingy awful accounts of violent and sometimes institutionalised abuse are uncomfortably common. Leave the safety catches off basic human urges for long and it can get pretty ugly.

> Frankly many pre-modern, pre-civilizarion humans were assholes.

While it's somewhat news to me, I'm very glad to hear we've moved past that.

I hope that was irony.

Very much so. Many modern, "civilized" humans are assholes, and now we've got global audiences for it!

Yes, I know, it's just completely insane to declare that we've somehow qualitatively moved past our brutish origins as long as torture, rape, and murder remain de facto extensions to politics.

I have no reason to doubt that quantitatively we have made some progress - although Pinker's arguments are most definitely not universally accepted in the scientific community.

Dolphins definitely delegitimize Beethoven.


Lee Sedol will forever be known as the first human to defeat AlphaGo in Game 4, and potentially the last one too.

That's quite an accomplishment.

Maybe they should fork version 18 as the hard but human betable version of alphago

How about just for playing a good game of Go?

So I suppose the fact that the opponent is a computer program should not be factored into the reaction to Sedol's win at all.

I hope your reductionist explanation is not accurate because it would imply that we are already so highly conditioned to machines and to AI that this match is thought to be no different from a match between two humans.

The machine beat him 3 times and was more or less expected to win again. It didn't, which would imply that Sedol played an exceptionally good game. Seems pretty obvious to me.

You're acting like following arbitrary sets of rules to maximize a value function isn't something that computers excel at. The only thing that was holding up computers for Go was simply time; there were too many possibilities to consider. Humans are really, really good at heuristically trimming possibilities, but sometimes to the detriment of finding maxima.

You know that computers beating us at chess allowed us to get better at chess, right?

Fan Hui, the professional player they beat late last year, already improved his game by quite a bit from the occasional match with AlphaGo (deepmind hired him as a consultant to test). He's won every single game in the last European championship, and moved from around top 600th player in the world to around top 300.

Wow. I know nothing about go, but 600--->300 sounds like quite an improvement.

>For helping to advance an amoral field of study that can potentially delegitimize everything that currently makes humans

Or, writing from the point of view of our mechanical successors to this world, for helping to advance a highly ethical field that could exterminate that genocidaly murderous evolutionary abomination that was the human race. Who incidentally thought they were extraordinary but couldn't even play Go.

I don't think this comment should be so shouted down. Besides the politeness that typically comes with these events, it does feel strange to congratulate, on top of normal politeness, Se-dol's victory.

And for a game that was thought to be many decades away in terms of computer capabilities, this probably should be a time to think of the possible consequences of AI improvement and take a close look at it.

I think (or at least hope) most of the downvotes have nothing to do with congratulations, and only refer to the part where AI is called "an amoral field of study that can potentially delegitimize everything that currently makes humans unique and extraordinary".

That line of reasoning would be worth a laugh, if it wasn't so widespread in the general population.

Perhaps still better than my friend, who thinks suffering is what makes humans special. Discussions with him are, well, interesting.

Maybe introduce your friend to buddhist ideas of ending suffering via meditation. Might be another interesting discussion.

I think Buddhism is where he got that idea from.


>Both have to do with right and wrong, but amoral means having no sense of either

Well that's true then! A nice quote is from Feynman in regards to scientific research.

"To every man is given the key to the gates of heaven; the same key opens the gates of hell. — Richard Feynman "

We really have no idea yet how this AI research will play out.

I don't quite like the line "Delegitimize all that makes people unique and special" But that's my only real disagreement.

A quote by Lee Sedol from years ago (might be apocryphal, couldn't find the original source):

Q: We heard there's now an anti-Lee Sedol website in Korea?

A: I don't even have time for my fans. I don't care about haters. ("나를 좋아하는 팬들에게도 신경을 못 쓰는데 그들에겐 당연히 신경 끈다.")

If developments in AI can delegitimise human intelligence, then it was never "legitimate" in the first place (whatever that means).

>For giving DeepMind extremely valuable test data that will only accelerate the refinement of AlphaGo's AI?

You're treating Lee Seedol as if he is a fixed dataset to be trained on. Why can't he also be a "NN" who can also "refine" his AI[0] and thus be hearder for Alpha Go to compete with?

You're putting too much faith in the machine and dropping the person, that might not be completely fair.

[0] Albeit not artificial in this case.

> For helping to advance an amoral field of study that can potentially delegitimize everything that currently makes humans unique and extraordinary?

If you care about being unique and extraordinary more than about reason, knowledge, truth, the observable reality, and the search for what it really means to be sentient, then and only then you may call AI "amoral".

Also, you are being racist against artificial sentient beings, and being racist is hopefully not what makes humans extraordinary.

racist is, imo, a strange word to use there

Yes to all of the above, except the last.

If it's true that AlphaGo started making a series of bad moves after its mistake on move 79, this might tie into a classic problem with agents trained using reinforcement learning, which is that after making an initial mistake (whether by accident or due to noise, etc.), the agent gets taken into a state it's not familiar with, so it makes another mistake, digging an even deeper hole for itself - the mistakes then continue to compound. This is one of the biggest challenges with RL agents in the real, physical world, where you have noise and imperfect information to confront.

Of course, a plausible alternate explanation is that AlphaGo felt like it needed to make risky moves to catch up.

The same happens to people, especially people that study theory. You can totally throw them off their game by making a non-standard move, even a relatively bad one as long as it breaks their existing pre-conceived notions about how the game should progress.

Of course against a really strong player you're going to get beaten after that but a weak player strong on theory will have a harder time.

That's what Gary Kasparov attributes as one of the reasons he lost to deep blue.

Well also that the machine could study his style of play, whereas he was playing a stranger; one whose style changed during each game thanks to humans hand altering the logic. Not really fair. At least, different to how humans play chess.

To be fair, Lee Sedol was also changing style during the game. The opening for game 4 is heavily based on the one for game 2, except when he made a mistake.

Do you have a link toward this interview?

Wait, in the interview he says the opposite: "That move had no impact on me whatsoever. Move on."

Thank you very much.

What you are talking about here is called "label bias". [2] It is present only if training is done badly.

When you have a game of Go, or Super Mario level. You don't want to make your decisions by just checking the local features and doing them, because it can be the case that by compounding errors you end up in a state you never saw, and all of the future decisions won't be good.

One can avoid these situations by training jointly over the whole game.

For example, maximum entropy models can work for decision making problems but their training leaves them in a "label bias" state because the training is trying to minimize loss of local decisions, instead of trying to minimize the future regret of current local decision.

The solution to these label bias problems are Conditional Random Fields, or Hidden Markov Models. You could accomplish the same with Recursive Neural Networks if you trained them properly. For example, there is no search part (monte carlo tree search, or dynamic programming [viterbi] like it is in CRFs or HMMs) in RNNs but they are completely adequate for decision based problems (sequence labeling etc.). Why is that the case? Because search results are present in the data, there's no need to search if you can just learn to search from the data.

If DeepMind open-sourced the hundreds of millions of games that AlphaGo played, it is quite possible to train a model that wouldn't need a Monte Carlo search and would work quite well, because you would learn the model to make local decisions to minimize future regret, not to minimize its local loss. [1]

The only reason why reinforcement learning is used is because there are too few human games of Go available for the model to generalize well. Reinforcement learning can be used in the setting of joint learning because you play out the whole game before you do the learning. This means that you can try to learn a classifier that will minimize the regret by making a proper local decision. Although, as far as I know, and can see from the paper, they didn't train AlphaGo jointly over the game sequence.

But! Now they have a lot of data and they can repeat the process.

[1]: http://arxiv.org/abs/1502.02206

[2]: http://repository.upenn.edu/cgi/viewcontent.cgi?article=1162...

I don't think we're talking about the same concept. I'm not familiar with the concept of label bias, and the literature I'm familiar with has not referred to label bias as the problem I'm talking about. Also, I'm not sure how a problem with probabilistic graphical models translates to the neural net policies of AlphaGo.

I fail to see how a "per-state normalization of transition scores" translates to there being a bias in value networks towards states with fewer outgoing transitions.

Yes, the "label bias" is more of a structured learning / joint learning term that is present in natural language processing. But reinforcement learning suffers only if you do the learning to minimize local loss of the decision (label) - if you try to build a classifier that minimizes its loss on local decisions, instead on sequence of decisions.

Their value policy network isn't trained jointly and can compound errors. There are approaches with deep neural networks that don't have a joint training but work pretty well. The reason is that networks have a pretty good memory/representation and by that they avoid much of the problems. But for huge games like Go it is quite possible that more games need to be played for these non-structured models to work well.

Again, I don't think we're talking about the same concept. I also fail to see how training over an entire trajectory is going to help you with trajectories you've never seen. Also, these nets are definitely trained with discounted long-term rewards.

They train using trajectories but train them to guess the trajectory locally, not globally. Discounted long-term rewards are just a hack, they aren't joint learning.

The concept of label bias, or decision bias is a joint/structured learning concept. It is a machine learning concept, it has nothing to do with the application. There are training modes with mathematical guarantee that the local decisions will minimize the future regret.

Joint learning is done not on the whole permutation but on the markov-chain of decisions, which is sometimes a good enough assumption. For example, the value policy network of AlphaGo is percisely a Markov chain, given a state, tell me which next state has the highest probability of victory. The search then tries to find the sequence of moves that will maximize the probability, and then it makes the best local decision (one move). It works like limited depth min-max or beam search. They do rollouts (play the whole game) to train the value network, but it is now a question if they train it to minimize the local loss of the made decisions, or if they train it to minimize the future regret of a local decision. As I've stated before, minimizing joint loss over the sequence, or minimizing local loss over each of made decisions, is exactly influencing if there will be bias or not.

The whole point of reinforcement learning is to create a huge enough dataset to overcome the trajectories-not-seen problem. The training of the models for playing Go is entirely a whole different kind of a problem.

Now when they have hundreds of millions of meaningful games they can skip the reinforcement learning and just learn from the games.

The illustration of the "label bias" problem is available in one source I referenced. Terms like compounding errors and unseen state are there. The "label bias" is present only in discriminative models not generative ones. Which means that AlphaGo - being a discriminative model, can suffer from "label bias" if it wasn't trained to avoid it.

Yes, I'm pretty sure we're not talking about the same thing. I'm precisely talking about the trajectories not seen problem. Nothing is going to save you from the fact that the net has not seen a certain state before.

That's not really a problem. Given a large enough dataset you want to generalize from it - there are always states not present in the dataset - the whole point is now to extract features out of your dataset to allow generalization on unseen states. Seeing all of the Go games isn't possible.

The compounding errors problem that stems from decision bias isn't because you haven't seen the trajectory, it is because the model isn't trained jointly.

We're talking about the same thing. You just aren't familiar with the difference present between joint learning discriminative models and local decision classifiers (Markov entropy model vs conditional random fields - or recursive CNNs trained on joint loss over the sequence or recursive CNNs trained to minimize the loss of all local decisions).

In the case of Go, one would try to minimize the loss over the whole game of Go, or over the local decisions made during the game of Go. The latter will result in decision bias - that will lead to compounding errors. The joint learning has a guarantee that the compounding error has a globally sound bound. (proofs are information theory based and put mathematical guarantees on discriminative models applied to sequence labelling (or sequence decision making))


Checkout the lecture below, around the 16 minute mark it has a Super Mario example and describes exactly the problem you mentioned. The presenter is one of leading figures in joint learning.


It is completely supervised learning problem. But, look at reinforcement learning as a process that has to have a step of generating a meaningful game from which a model can learn. After you have generated bazillion of meaningful games you can discard the reinforcement and just learn. You now try to get as close to the global "optimal" policy as you can, instead of trying to go from an idiot player to a master.

Of course, the data will have flaws if your intermediate model plays with a decision bias. So, instead of training the intermediate to have a bias, train it without :D

Yes, Hal Daume is referring to the issue I brought up. I'm not interpreting his comments as referring to issues with training the model jointly - he's referring to exactly what I'm describing - never having even seen the expert make a mistake. The only solution is to generate trajectories more intelligently (which is in line with Daume's comments).

Yes, it is true. In the case of Super Mario he does the learning by simulating level-K BFS from positions that resulted in errors (unseen states) and thus minimizes the regret for the next K moves.

Although, if you checkout his papers, the problems I've talked about, when you have more than enough data and when you know you should be able to generalize well you still can get subpar performance if you don't optimize jointly. AlphaGo model isn't optimizied jointly but its power mostly lies in the extreme representation ability of deep neural networks.

This is great stuff, thank you.

Both are plausible for humans as well, I would say (in a more general sense). Certainly mistake spirals from not quite knowing how to deal with the consequences of the first mistake have happened to me personally, and I recall descriptions of people in poverty having to play more aggressively to get a shot at the part of civilization they want to be in, though unfortunately I don't have a source.

May well also depend to a not so "exact" forward pruning

It is just a complex system leaving one of this basin of attraction to try to reach another one. (probably because of an evaluation of risk that deemed it).

And for a complex system transitions happens to be sensitive to the conditions and with quite a lot of impredictability.

AI cannot do smooth transitions because they lack the intuition of what smooth means, and that's how to win against them.

1) identify a basin of attraction(apparition of a bounded domain of evolution in a space phase) 2) set the AI in a well known basin by tricking it; 3) imbalance the AI by throwing garbage behaviour that kick him out of the basin in a random direction 4) let the human win in the chaos that ensues.

Of course it is better done with a software to help you. A real time spase phase analysor.

The point is like in a lot of domain, construction of an AI requires more energy than a software for sabotaging it.

But once you get the framework of thoughts to win against an AI you can get all the AI.

There is a possibility that the AlphaGo team, knowing that it has already won a decisive victory, wanted to spare the champion a crushing 5-0 defeat and so commanded the AI to lose on purpose.

The AlphaGo team has stated that they no longer have the ability to make changes to the AI used for the match. The rules of the exhibition match prohibit that.

They maybe could have anyway, but it would be cheating: just the same as if they'd Mechanical Turk'ed it by e.g. having Ke Jie actually choose the moves to play.

Yesterday I kind of entertained the thought that they could have tried to make Alphago waste a move somewhere at the beginning of the game to give Sedol the equivalent of a stone advantage and see how Alphago could handle that. But it's clear that any move like that would have been immediately detected by the professionals who can read the game as I read the morning newspapers, so no.

I don't know why you're being down voted. The fact that the human won, makes it now so much more interesting! It surely is the best outcome possible both for human go players and Google as well (having won the first 3 already).

Nah, they let the european champion lose 5-0.

Incredibly unlikely.

In the post-game press conference I think Lee Sedol said something like "Before the matches I was thinking the result would be 5-0 or 4-1 in my favor, but then I lost 3 straight... I would not exchange this win for anything in the world."

Demis Hassabis said of Lee Sedol: "Incredible fighting spirit after 3 defeats"

I can definitely relate to what Lee Sedol might be feeling. Very happy for both sides. The fact that people designed the algorithms to beat top pros and the human strength displayed by Lee Sedol.

Congrats to all!

Yep! As someone who was rooting for DeepMind, I like this result, for two reasons: Lee Sedol earned it - he's behaved like a true sportsman all the way - and it gives us some interesting information (yesterday we only had a lower bound on AlphaGo's strength; today we also have an upper bound).

>yesterday we only had a lower bound on AlphaGo's strength; today we also have an upper bound

I think it's premature, establishing bounds with good confidence interval requires tens or hundreds of games. Specifically, 3:2 result would be really inconclusive.

The first data point tells you more about bounds than any individual subsequent data point. Knowing that a human can defeat AlphaGo is enormously important.

To use an analogy, having confirmation of contact by even a single alien species would be hugely important, way more so than exactly nailing down the number of alien species. Knowing that something is even possible is oftentimes the most important aspect that needs to be ascertained, and contact (or a win, in Lee's case) does that unequivocally.

At the time chess was "the" problem, humans could beat computer too from time to time. Now the chess is solved. Same will likely happen to go in the years to come. But the point of this was not the go itself. It was the demonstration of potential capabilities of neural network and other machine learning algorithms.

Chess isn't solved. It's just that computers have gotten sufficiently better at it than humans that humans don't have a chance.

If Chess were truly solved, then you wouldn't be able to make a new AI program that could do better than even odds against the existing ones. But that's not the case, and incremental advancements in Chess-playing programs are made all the time. There are even tournaments where Chess programs play each other. If Chess were solved, such a thing wouldn't make any sense, just like how there are no Tic-Tac-Toe tournaments because that game is solved.

Even that isn't strong enough. The game is determinstic.

If chess were _solved_ we'd know a strategy to allow one of the players (likely white) to always win, or for either of them to always force a draw. (and we'd know which of these strategies were possible for chess).

Consider, say there is a first move white could choose such that no matter what moves black makes, white will win. Then the first would be true. Instead consider, that there is no such move, and any first move has choices where either could win-- if some of those are ones which would force a black win, then the first is again true but for black. Otherwise, a draw can always be forced. These are the only possible outcomes for a solved game of chess.

I've read some articles in Korean press that suggested AlphaGo team picked Lee as their match and not Ke Jie (currently #1 ranked, 19 years young) because there's a lot more public records of Lee's plays over his much longer pro career (nearly 20 years now). Thus more material for AlphaGo to train with and against.

So it's not totally arrogant of Ke Jie to suggest he could beat AlphaGo. AlphaGo has not much 'experience' in dealing with Ke Jie.

And Lee winning of game 4 shows a human is still indeed more capable than any AI. He basically reprogrammed his game on his own. Sorta.

That was specifically discussed in the after match conference today. Demis said that it wasn't trained specifically against Lee but was trained against strong amateurs on the Internet initially before playing against itself. He said that even if they'd wanted to, alpha go requires a much larger number of games than are available by Lee Seoul to train so it wouldn't have been possible anyway.

It was discussed pre-match for game 2 or 3 as well, and Lee's games were described as "a drop in the ocean" I believe.

Came up because of an assertion from the interviewer that chess AI had been trained against it's opponent specifically.

Err, so if AlphaGo wins game 5, does that show 'an AI is still indeed more capable than any human?'

Exactly. The faint whiff of bullshit.

Ke Jie just happens to be #1 right now. Lee Sedol is more impressive in that he's been a world-class Go player for much longer. Twenty years from now, if Ke Jie is as dominant as Lee Sedol is now, then he'd be a good pick then. But either way it's not a big deal; there's not that much daylight between the two, so picking the more famous and well-known one to go up against is a safe bet.

Personally I want to see a discussion game with the top Go experts (including both Lee Sedol and Ke Jie amongst others) competing against the next version of AlphaGo, in a game with much longer time limits.

I don't think top pros view it that way. He is of a new generation that play thousands of games over the internet; that study the game socially, while Lee is much more a loner. Whether Ke Jie will dominate as much as Lee did will say more about the strength of his competitors than his strength vs Lee.

Another reason could be that Lee is a much bigger name in go than Ke Jie. Sure, Ke Jie is stronger now but Lee was a dominant/top player for a long long time.

Many of the reinforcement learning techniques require data sets in the billions to make much sense of it all. This is likely the reason why the AI had to play against itself so much, because there simply wasn't enough data in general, even including all recorded games.

So Lee's games are sort of a drop in a bucket as far the performance of the AI goes.

Interesting point here: if you are the best Go player in the world, and you play against an AI that can beat you, it seems really possible that it makes you that much better. It would be interesting to see what the win ratio would be after match 10, match 50 or so on.

My friends and I (many of us are enthusiastic Go lovers/players) have been following all of the games closely. AlphaGo's mid game today was really strange. Many experts have praised Lee's move 78 as a "divine-inspired" move. While it was a complex setup, in terms of number of searches I can't see it be any more complex than the games before. Indeed because it was a very much a local fight, the number of possible moves were rather limited. As Lee said in the post-game conference, it was the only move that made any sense at all, as any other move would quickly prove to be fatal after half a dozen or so exchanges.

Of course, what's obvious to a human might not be so at all to a computer. And this is the interesting point that I hope the DeepMind researchers would shed some light on for all of us after they dig out what was going on inside AlphaGo at the time. And we'd also love to learn why did AlphaGo seem to go off the rails after this initial stumble and made a string of indecipherable moves thereafter.

Congrats to Lee and the DeepMind team! It was an exciting and I hope informative match to both sides.

As a final note: I started following the match thinking I am watching a competition of intelligence (loosely defined) between man and machine. What I ended up witnessing was incredible human drama, of Lee bearing incredible pressure, being hit hard repeatedly while the world is watching, sinking to the lowest of the lows, and soaring back up winning one game for the human race.. Just incredible up and down in a course of a week. Many of my friends were crying as the computer resigned.

Given the 78 was a "divine inspired" move, and Demis tweeted that AlphaGo's mistake was move 79, my guess is that Black 78 was not in AlphaGo's search tree at all. It had pruned that move as being too "unlike what a human pro would do", and had not considered play beyond that point. Then it had to suddenly backtrack, drop its entire search tree, and start rebuilding from scratch. It took only a couple minutes to do so, but that's still not enough time to build up the depth and breadth of a tree that it had been constructing up to that point in the game. So it ended up playing a suboptimal move simply because Lee Sedol played a move that was simultaneously brilliant/strong and novel/unexpected, and it had to respond somehow.

The problem may well have occurred before then, too. If move 78 was incorrectly pruned from the tree for being unlikely, then AlphaGo had many turns in which it should have been considering counter-moves to move 78 (and thus prevent the situation that allowed move 78 to even happen in the first place), but it didn't.

>"divine-inspired" move

Just letting others know, this expression is a rather common way of saying a single move that changed the course of the game. There were "divine-inspired moves" that AlphaGo made in the first three games too.

Well it's not exactly common. It's used to describe the very best moves at the highest level of play.

I mean yeah, the decisive move _I_ make when I play my noob games is hardly divine, but in top level play, it happens every game.

AlphaGo's mid game today was really strange. Many experts have praised Lee's move 78 as a "divine-inspired" move.

Add to that the moves where AlphaGo basically threw away stones by adding to formations that would be removed from the table. Even I, a complete, lousy, amateur, could see that they were a mistake.

To be fair, those moves were made when AlphaGo was already behind. It's just not any good at dealing with being that far back. The AI just has no concept of what to do while behind: What a human would do is to go for positions that are very complicated, making the chances of sloppy play much higher. Instead, it makes moves that have to be answered in only one or two ways, but that are very easy to read by even an amateur human.

Training an ai to make good play in a bad situation would require it to train in ways that are very different than the AlphaGo vs AlphaGo training that it spent a lot of time doing. And why do that, instead of trying to make itself good while the game is even, or when it's winning?

It's a bit like how it's different to train in chess to play in pro games, vs training to hustle amateurs in the park: You are not making the best move, but a good move that will confuse the opponent the most. You are trying to exploit a bad opponent: Very different play.

Crying tears of joy I hope... or are your friends computers?

So AlphaGo is just a bot after all...

Toward the end AlphaGo was making moves that even I (as a double-digit kyu player) could recognize as really bad. However, one of the commentators made the observation that each time it did, the moves forced a highly-predictable move by Lee Sedol in response. From the point of view of a Go player, they were non-sensical because they only removed points from the board and didn't advance AlphaGo's position at all. From the point of view of a programmer, on the other hand, considering that predicting how your opponent will move has got to be one of the most challenging aspects of a Go algorithm, making a move that easily narrows and deepens the search tree makes complete sense.

Or maybe it was just the computer version of grasping at straws. None of the future gamestates looked good, so it ended up picking whatever could at least theoretically lead to a comeback, even if that would require Lee Sedol to miss a completely obvious move.

This looks like a possibility to me. I'm not sure that Alphago has any idea how strong an opponent it is facing at any given moment. It's played games against venues and it's played games against professionals. But if a set of complex advanced moves are still likely to lead to a narrow defeat, it might well pick some stupid moves that in theory could allow it to win back a dominant position if the opponent plays badly.Since it doesn't have a model of the competence of its opponent, that might appear to be a viable strategy because against some opponents and in some past games it's played, that could work.

This is really interesting, because forming a model of our opponent and tailoring our strategies appropriately is fundamental to how humans approach competitions.

> Toward the end AlphaGo was making moves that even I (as a double-digit kyu player) could recognize as really bad.

Maybe its training was focused on winning, not loosing narrowly. So as soon as it became obvious it can't win. It was just making silly moves because it was less researched scenario.

Some people when they see they can't win they do silly moves just for fun.

They're not really silly moves, it's just that in order to get into a good state for AlphaGo, Lee Sedol would have to miss the obvious response to those moves.

If he had made a dumb move and AlphaGo capitalized on it we would be talking about how the AI goaded him into a sense of safety.

If he had made a dumb move in response then we would be talking about how he made a really really dumb move, that even an amateur could have seen.

The moves made by AlphaGo there were very bad.

If the situation was unwinnable then in a certain sense all moves are equally bad.

This is not the reason. Monte Carlo programs typically play bad-looking moves in way ahead / way behind situations. In these situations "bad" moves will sometimes not alter the win probability they measure, so they don't know how to disguinish a more natural move from one that looks bad. This was mentioned in the commentary when one of the DeepMind guys came in.

Go programmers have taken various steps to mitigate this behavior, such as dynamically adjusting komi to trick the engine into thinking it is a closer game, but I don't know if AlphaGo uses any such technique.

A bit like getting a confused parser back on track. We've witnessed the first digital cold sweat.

I'm curious - is this the same thing that it was doing in the matches that it won? People have been talking about how it seemed to be toying with its opponent towards the end of those matches, making moves that didn't gain it much, but maybe it just doesn't actually understand what moves to play some of the time and it simply hasn't mattered before.

I doubt its loss function tries to maximize the prediction of the opponents next move.

I doubt it's considering that directly. What I'm guessing (without having really studied the paper in depth) is that if you have two board positions which would score identically in the end, but where one has fewer remaining liberties than the other (i.e. one is deeper into the tree), then it will be scored more highly.

In other words, the humans commentating the game were evaluating the moves as non-sensical because the outcome (AlphaGo plays here so Lee plays here) is a foregone conclusion and doesn't change the human evaluation of the board position. What it does do is remove uncertainty (AlphaGo plays here, but Lee screws up and plays somewhere else). In their evaluation, humans tend to value that uncertainty (i.e. counting on the possibility of a mistake), but I'd guess that AlphaGo penalizes the uncertainty (i.e. known board positions are scored higher than potential board positions), leading it to over-value simple advancement of the board in the end-game.

The bigger the game board, the more possible states, the better chance a human player has against the AI. Is this the same heuristic in play here - Play a game with more unknowns, gain an edge?

Do you think Lee could use this as a way to crack AlphaGo?

From my limited knowledge of the game, a few of the moves that Lee made before AlphaGo "lost its mind" were a tad on the aggressive side. The conventional wisdom in Go is to prefer more conservative moves (increasingly so as the game progresses). Usually, if your opponent is being overly aggressive then you want to play more conservatively and wait for them to make a mistake, but in AlphaGo's case, it attempted to match Lee's aggressiveness move-for-move, and Lee was able to capitalize.

If I had to guess (and this is pure speculation), AlphaGo has no concept of waiting for its opponent to make a mistake. Instead, it assumes its opponent will continue to make the best possible follow-ups, and so AlphaGo feels overly compelled to "keep up". In this case, that did it in.

If this is what happened, then yes, I would expect Lee to be able to capitalize.

> If I had to guess (and this is pure speculation), AlphaGo has no concept of waiting for its opponent to make a mistake. Instead, it assumes its opponent will continue to make the best possible follow-ups

One of the DeepMind guys just confirmed that this is how AlphaGo operates in the press conference.

"When you're up against someone smarter than you are, do something insane and let them think themselves to death."

-- Pyanfar Chanur (C.J. Cherryh)

Which makes sense. It doesn't even have a concept of a human opponent or human-style mistakes - it plays against Lee just like it plays against itself when training.

This is by design and admitted by the developers - in fact it plans only for what it judges optimal opponent moves. A modification to the game may be to keep on reserve a set of options for nonoptimal opponent play although in actual practice the simplicity of design here (maximize probability of winning and only plan on optimal opponent moves) may be one reason why it plays as well as it does.

In that case, theoretically at least, we could train AlphaGo by getting top Go players to play many games against each other where one or both players is making very aggressive moves when reasonable?

Well, that's where AlphaGo and the progress in Go AI that it represents is so exciting! The game of Go is so fluid with such a huge number of possible positions that players tend to adopt certain styles of play en masse. I've heard it said that you can identify a Go player's mentor or "house" just by the style of play they use.

This has also resulted in larger shifts in playing style over time. Studying very old (and I mean very old...700+ years old) games can be entertaining and even educational in the abstract, but you won't want to directly adopt the style of play because the game has evolved.

It's already been mentioned a couple of times that AlphaGo almost certainly represents just such a shift. Top players will learn from it, and I'd even be willing to bet they will beat it with some regularity once they do!

Ultimately, what sets apart Go geniuses is their ability to play creatively in the face of seemingly insurmountable challenges. So the big question is how "creative" AlphaGo can be. Is it merely synthesizing strong play from known positions? Can it introduce novel strategies? And if it does, will it be able to adjust as other Go masters adjust to it and bring their own brand of creativity to play?

To answer your original question, this very well could introduce a new era of more aggressive play to the world of Go. Only time will tell...

AlphaGo will learn from any new styles and apply them effectively without mistakes from fatigue or inattention.

This incarnation of AI is not creative, it wont generate new play styles, that is still the domain of top human players for now. But it will ruthlessly learn and adopt any new and improved strategies. That's really the point to take away from its success so far.

AlphaGo mostly plays against itself, meaning it learns in a very separate environment. It certainly might come up with novel strategies.

I suppose that's possible. However Demis Hassabis has said many people have noted that AlphaGo makes human-like moves. He commented that it made sense since AlphaGo taught itself from the games of human players.

Yes, the starting kernel of its learning this time was a collection of human games. This has caused the Policy Network (which says "given this board state, these moves are worth investigating in more detail") to be biased towards more human-like moves.

But they're already working on a new version of AlphaGo which isn't trained on any human data at all. It starts by making truly random moves and improves from there. This will require much more processing time and probably an order of magnitude more "self-play", but it will probably result in truly novel strategies that aren't part of the current human metagame.

It could also be that human-looking moves are already close to the best possible. Neither AlphaGo nor human players can play god's hand, but they're approaching the same location.

Is Go-Space understood well enough to know either way?

The OMG-AI people claim that AGI would be dangerous because it would reliably innovate in new spaces and out-predict humans.

So a true super-AGI would make go moves that were unexpected and incomprehensible with some percentage of misleading fake-outs, but it would still win most or all of the time.

If the human exploration of Go-Space is close to the god's hand bounds, this can't be true.

My intuition (and it's really only just that) is that Go space is large enough that AI would be able to outplay humans while still not even beginning to approach "perfect" play. If so, then I would also expect that humans should be able to follow the lead of AI into new areas of Go space, and outplay the AI (at least until the AI has a chance to learn and catch up).

We'll know if this is the case in a couple of years, if the competition between human and AI goes back-and-forth (unlike Chess, where after AI was good enough to beat humans, it could do so reliably).

Either way, it's interesting to note that AlphaGo had literally thousands of games to learn from to find weaknesses in human play, but Lee Sedol seems to have only needed 3 before he was able to find weaknesses in AlphaGo's play.

> Either way, it's interesting to note that AlphaGo had literally thousands of games to learn from to find weaknesses in human play, but Lee Sedol seems to have only needed 3 before he was able to find weaknesses in AlphaGo's play.

To be fair we can't know how many games Sodol played in his own head to figure this out.

Step 1 of exploiting this is "play a brilliant move to win a fight in the middle game", which is not the easiest thing to repeat.

The crucial play here seems to have been Lee Seedol's "tesuji" at White 78. From what I understand this phrase in Go means something like "clever play" but is something like sneaking up on your opponent with something that they did not see coming. Deepmind CEO confirmed that the machine actually missed the implications of this move as the calculated win percentage did not shift until later. https://twitter.com/demishassabis/status/708928006400581632

Another interesting thing I noticed while catching endgame is that AlphaGo actually used up almost all of its time. In professional Go, once each player uses their original (2 hour?) time block, they have 1 minute left for each move. Lee Sedol had gone into "overtime" in some of the earlier games, and here as well, but previously AlphaGo still had time left from its original 2 hours. In this game, it came down quite close to using overtime before resigning, which is does when the calculated win percentage falls below a certain percentage.

To underscore hyperpape:

Tesuji isn't a trick play, it's more like a power play. Each player can read out how a fight is going and see their line far into the future. Two professionals will pick two lines, two suji, which are in balance and push up against one another tightly.

A tesuji is a part of the line which is suddenly showy or strong. It could mean a failure for the opponent if they had not taken enough of an advantage in the struggle to this point or if they do not have a counter tesuji available.

Indeed, that might be the design of a set line: one side continually loses ground to the other forcing the other to take these small advantages all so that the first side has an opportunity to play a tesuji and return to balance. Many such lines are canonicalized ("joseki") and known to any professional. Moreover, professionals regularly identify potential tesuji and expect their opponents to as well.

Tesuji has no implication that your opponent won't anticipate it. Both sides can know that the tesuji is there, and it's still a tesuji.

AlphaGo was in over time in game 2.

To add to this, it seemed to have no problem makkng moves wothin the 60 second window (at that lte tage of the game).

I suspect that in almost all cases it has already converged on a move well before sixty seconds, and so if forced by time pressure to choose a move before it has achieved the confidence it's looking for, it will probably still do the right thing most of the time (especially in the endgame, when the branching factor is cut down enormously). It just uses additional time because it has it. I would love to see a graph of AlphaGo's thinking showing certainty of time, and how long it was taking to come up with the move it ultimately ended up playing, and how that certainty evolved over its thinking period.

Another way to look at this is just how efficient the human brain is for the same amount of computation.

On one hand, we have racks of servers (1920 CPUs and 280 GPUs) [1] using megawatts (gigawatts?) of power, and on the other hand we have a person eating food and using about 100W of power (when physically at rest), of which about 20W is used by the brain.

[1] http://www.economist.com/news/science-and-technology/2169454...

Definitely not anything close to gigawatts: ALL of Google (not DeepMind, but the entire company) uses only about 260 megawatts.

Probably on the order of one megawatt or so.


Still, wow! And a human is using 100W or so: (~400 max) https://sustainability.blogs.brynmawr.edu/2012/07/31/underst...

And don't forget, the human grows organically from a complex system of DNA that also codes for way more than playing Go! And is able to perform a lot of tasks very efficiently, including cooperating together on open ended activities.

And building an AI that can play better than itself :)

On the other hand, they've only been working on Alphago for two years.

We can estimate the power consumption: 1920 CPUs as in cores, or physical packages? If the latter they are ~100W each, so that's 192kW; if it's the former, depending on how many cores per package, a fraction of that. The GPUs are likely to be counting physical packages (and not cores, which is much more) and they draw 300-400W each, for a total of ~300kW. Add a bit of overhead and I'd say 500kW (half a megawatt) is a good rough estimate.

The figures quoted for the cluster are about 25 times higher than those quoted for a single machine, so I would guess the cluster consists of 25 machines. 20 kW, or 166 amperes at 120 volts, per machine seems a bit high to me.

Sure, but AlphaGo will probably evolve much faster. In a few years it will run on much smaller devices, as happened to chess programs.

Exactly - the watts comparison is a bad one. Stockfish running on an iPhone (~5 watts?) can play world class chess.

A single A15 core at around 1Ghz has more Gflops of power than deep blue had across it's whole system (11.38Gflops).

1920 CPUs (a 4-core haswell from 2013 is around 170Gflops). 280 GPUs (previous gen Nvidia K series peaks at around 5200GFLOPS). That's 1,782,400Gflops or around 150,000x more processing power. If they were running latest-gen hardware, then the would be closer to 200,000x faster.

Given that Moore's law is slowing down and the size of the system, we're a long way from considering putting that in a smartphone.

People are focusing way too much on the current hardware that AlphaGo happens to be running on.

AlphaGo is still a very new program (two years since inception). It will get significantly better with more training, or, equivalently, it will stay at the same strength while running on much less hardware.

Don't read too much into what one particular snapshot in its development cycle looks like. Humanity has had hundreds of millions of years to maximize the efficiency of the brain. AlphaGo has had two years. It's not a fair comparison, and more importantly, it's not instructive as to what the future potential of AI algorithms looks like.

It should use less that 0.5 megawatt, 1920 * 150W (High end server CPU) + 280 * 300W (Nvidia Tesla cards)

We need to keep our brain running 24x7x365, though, so in the long run, deep mind is more efficient.

There were a few jokes made during the round about how AlphaGo resigns. Turns out it's just a popup window! http://i.imgur.com/WKWMHLv.png

For anyone wondering, that's Ubuntu Linux (version most likely 14.04; Unity interface).

off-topic: DeepMind should switch to a tiling window manager like i3 for increased keyboard-only productivity :)

The developers probably use whatever window manager they want. I'm personally a fan of i3 also.

But there is nothing wrong with keeping the frontend machine used in this Go match in default Ubuntu desktop environment since its only purpose is to play Go with a graphical user interface anyway.

I know. I was half-joking.

Also given the progress of DeepMind so far, it's very likely that whatever desktop setups they have, is working very well for them.

Probably Goobuntu. ;-)

Probably not, as they are not sitting on a corp workstation on the corp network :)

They could be remoted into one?

AlphaGo resigns

The result "W:Resign" was added to the game information.

Edit: Tinyyy is right.

According to this picture[1], it is more likely "W+Resign". I'm curious why a plus sign is used instead of a colon!

[1] http://gall.dcinside.com/board/view/?id=baduk&no=109200&page...

In go results are Color+amount of points, when counted. So the plus is then left for uniformity in resigns/forfeits

Likely they're using SGF format to store game records:


Bad google+ joke withheld.

I think it reads W:Resign

No, it's "The result “W+Resign” was added to the game information"


Usually the nomenclature is "W+Resign"

"White wins by resignation".

The title bar looks like it's GoGui, so it's just GoGui's reaction to a Go Text Protocol resignation?

Looks like it, see:


And search for MSG_RESIGN_2

Yeah, I was worried that if the score ends up 5-0, we could never see how AlphaGo resigns. Good to see Lee Sedol's victory.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact