Hacker News

> The hardware requirements were stunning (280 GPUs and 1920 CPUs for the largest variant) and were an integral part to how well AlphaGo performed

Is that really true? Demis stated that the distributed version of AlphaGo beats a single machine version only 75% of the time. That's still stronger than virtually all human players, and probably would still beat Lee Sedol at least once.
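For context, a 75% head-to-head win rate corresponds to roughly a 190-point Elo gap under the standard logistic model — a real but not overwhelming difference. A quick back-of-the-envelope check (the generic Elo formula, nothing AlphaGo-specific):

```python
import math

def elo_gap(p):
    # Elo gap implied by win probability p under the logistic model:
    # p = 1 / (1 + 10 ** (-gap / 400))  =>  gap = 400 * log10(p / (1 - p))
    return 400 * math.log10(p / (1 - p))

print(round(elo_gap(0.75), 1))  # distributed vs. single-machine AlphaGo
```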




The best discussion of hardware-adjusted algorithmic progress I've seen is by Miles Brundage[1]. The additional hardware was the difference between amateur- and pro-level skill. It's also important to note that the non-distributed AlphaGo still had quite considerable compute power.

Now was a sweet spot in time. This algorithm would likely have been pathetic ten years ago, and in ten years' time it will likely be superhuman, in the same manner that chess is.

None of these constitutes a general advance in AI, however.

[1]: http://www.milesbrundage.com/blog-posts/alphago-and-ai-progr...


I don't really buy his argument. Lots of other companies with plenty of resources have been attacking this problem, including Facebook and Baidu. People have been talking about Go AIs for decades. If it was just a matter of throwing a few servers at Crazy Stone or another known Go algorithm, it would have been done already.


The companies may have plenty of resources, but those resources were not solely dedicated to this problem. You mention Facebook, and they were indeed on the verge of putting time into this, though their team is far smaller (1-2 people) and they still used fewer compute resources. From the linked Miles article:

"Facebook’s darkfmcts3 is the only version I know of that definitely uses GPUs, and it uses 64 GPUs in the biggest version and 8 CPUs (so, more GPUs than single machine AlphaGo, but fewer CPUs). ... Darkfmcts3 achieved a solid 5d ranking, a 2-3 dan improvement over where it was just a few months earlier..."


That can't be right. Amateurs do not beat pro players 25% of the time, yet single machine AlphaGo beats distributed AlphaGo 25% of the time.


I don't think comparing win/loss distributions is particularly insightful.

The fact that a single machine wins so many games against the distributed version really just says that the value/policy networks matter more than the Monte Carlo tree search. The main difference is the number of tree-search evaluations you can run; it doesn't seem like the parallel version has a more sophisticated model.

This suggests that the single 8-GPU machine makes systematic mistakes relative to the distributed 280-GPU machine, but that MCTS can smooth some of the individual mistakes over a bit.
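To illustrate the smoothing point with a toy model (purely hypothetical numbers, not AlphaGo's actual evaluator): if each position evaluation is the true value plus independent noise, averaging more evaluations drives down the rate of random blunders — but it cannot remove a bias shared by every evaluation:

```python
import random

random.seed(0)

def blunder_rate(n_evals, trials=20000, noise=0.5):
    """Toy model: choose between a move worth 0.6 and one worth 0.4,
    using the average of n_evals noisy evaluations per move.
    Returns how often the worse move gets picked."""
    errors = 0
    for _ in range(trials):
        good = sum(0.6 + random.gauss(0, noise) for _ in range(n_evals)) / n_evals
        bad = sum(0.4 + random.gauss(0, noise) for _ in range(n_evals)) / n_evals
        if bad > good:
            errors += 1
    return errors / trials

print(blunder_rate(1), blunder_rate(100))
# More evaluations -> far fewer random blunders. A systematic bias
# (e.g. always undervaluing a move type by 0.3) would survive the averaging.
```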

I would suspect that the general Go-playing human population does not share those systematic mistakes, so you likely can't project these win/loss distributions onto games against humans.
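One way to see why the projection is dubious (a hand-rolled Elo illustration, not anything from the paper): Elo-style models assume win rates compose transitively, so chaining two 75% match-ups predicts a ~90% win rate between the endpoints — and systematic, exploitable mistakes are exactly what breaks that assumption against a different opponent pool:

```python
import math

def win_prob(gap):
    # Logistic Elo model: probability the stronger side wins, given an Elo gap.
    return 1 / (1 + 10 ** (-gap / 400))

gap = 400 * math.log10(0.75 / 0.25)  # Elo gap implied by a 75% win rate
print(round(win_prob(2 * gap), 2))   # transitive prediction for the chained match-up
```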


Then it would seem that AlphaGo on a single machine isn't equivalent to an amateur. Presumably it's still very strong even when running on a single machine, and the marginal improvement from each additional machine tails off quite quickly.

But when you're pitting it against a top-class opponent in a match getting global coverage, you want every incremental improvement you can get.



