
Not surprising.

However, I am not buying the claim that the algorithm is going to transform many other domains:

" The system could help optimize power grids, he says, or streamline shipping routes, or refine scientific research."

by Wired https://www.wired.com/2017/05/googles-alphago-levels-board-g...

The main concern is data efficiency. AlphaGo essentially baked good moves into its value and policy networks by playing millions of games. In reality, unless you have a really good simulator, deep reinforcement learning is almost never applicable.



You can think of this as five dimensions/concerns, each with a large scaling factor, that need to be solved for the approach to be generally applicable.

1. How to search
2. How to resolve breadth concerns
3. How to resolve depth concerns
4. How to resolve large training requirements
5. How to form and resolve recursive goals

With AlphaGo, it looks like they may have solved the first three:

1. Search: Monte Carlo tree search will likely be used from here on out - much more efficient than full brute-force search (chess style).
2. Breadth: the policy network reduced the branching factor from ~250 to between 5 and 10.
3. Depth: the value network reduced the search depth from roughly 75 to under 20 (a toy sketch of both ideas follows below).
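As a rough illustration of those two cuts, here's a toy sketch on a trivial take-1-2-or-3 stones game (the "networks" are hand-written stand-ins, nothing like AlphaGo's actual learned models):

    # Toy sketch: how a policy prior (breadth) and a value estimate (depth) prune search.
    # policy_net and value_net are hand-coded stand-ins, not AlphaGo's learned networks.

    def legal_moves(stones):
        return [m for m in (1, 2, 3) if m <= stones]

    def policy_net(stones):
        # Breadth: a prior over moves. Here it simply "knows" the toy game's pattern
        # (leave the opponent a multiple of 4); AlphaGo learns this kind of knowledge
        # from data rather than having it hand-coded.
        return {m: (0.7 if (stones - m) % 4 == 0 else 0.1) for m in legal_moves(stones)}

    def value_net(stones):
        # Depth: an evaluation of the position for the player to move, used instead of
        # searching all the way to the end of the game.
        return 1.0 if stones % 4 != 0 else -1.0

    def search(stones, depth, top_k=2, max_depth=3):
        if stones == 0:
            return -1.0                      # the opponent took the last stone: we lost
        if depth >= max_depth:
            return value_net(stones)         # depth cut: trust the value estimate here
        priors = policy_net(stones)
        candidates = sorted(priors, key=priors.get, reverse=True)[:top_k]   # breadth cut
        return max(-search(stones - m, depth + 1, top_k, max_depth) for m in candidates)

    print(search(10, depth=0))   # 1.0: the player to move wins from 10 (take 2)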

Next they're working on applying reinforcement learning to their two NNs, i.e. seeing how far they can push performance and how quickly the networks can learn. Maybe there are large gains there.

If so, the final task would be finding a game or situation that required nesting of sub-goals like everyday life does.

They've made some impressive gains.

> In reality, unless you have a really good simulator, deep reinforcement learning is almost never applicable.

I could see DeepMind applying this to something like catching a ball in the near future, just to show its generality. It's not that different a problem space from learning to play Breakout.


Chess AI does not use brute force search. It evaluates board positions with heuristics (eg. pieces captured, blocked pawns, possibility of castling, etc.), and uses the estimated chance of winning to prune the search tree with techniques such as alpha-beta pruning[0].
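For reference, a minimal sketch of alpha-beta pruning over a tiny hand-made game tree (the leaf scores stand in for the heuristic evaluation a real chess engine would compute):

    import math

    # Minimal alpha-beta over a toy game tree. Leaves hold heuristic scores for the
    # maximizing player; a chess engine would generate children from the position and
    # score leaves with material, pawn structure, king safety, etc.
    TREE = {
        "root": ["a", "b", "c"],
        "a": [3, 5], "b": [6, 9], "c": [1, 2],
    }

    def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
        if isinstance(node, (int, float)):
            return node                          # leaf: heuristic evaluation
        best = -math.inf if maximizing else math.inf
        for child in TREE[node]:
            score = alphabeta(child, not maximizing, alpha, beta)
            if maximizing:
                best, alpha = max(best, score), max(alpha, score)
            else:
                best, beta = min(best, score), min(beta, score)
            if alpha >= beta:                    # this branch can no longer affect
                break                            # the result, so prune the rest
        return best

    print(alphabeta("root", maximizing=True))    # 6 - and the last branch gets pruned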

Both board evaluation and tree pruning are more difficult in Go. In Chess, capturing material is very important, but capturing stones in Go is often useless because the stones are in a bad position ("dead"). You can't decide who's winning just by counting captures. It's very difficult to hand code a board evaluation function, but AlphaGo's neural network solved the problem.

However, the neural network alone, with no tree search, was only enough to reach strong amateur level. Unlike Chess, where moves usually have consequences in the near future, a move in Go can be important tens or even hundreds of moves later. Alpha-beta pruning doesn't handle very distant consequences well, but Monte Carlo tree search does much better. Combined with the neural network it was enough to beat humans.

[0] https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning


Minimax as used in chess is nothing more than a depth-first search constrained to the space of moves of agents acting in their own self-interest. It searches as many plies as it can (you can think of a ply as a move by a single player in chess[0]) within the time and memory it is allowed.

This approach is brute-forcing the search space within computational limits. Alpha-beta is just an optimization to minimax, not a fundamental change in the search approach.

Chess and Go are very different games and need to be treated as such. Much of the recent excitement about Go comes from innovations in machine learning, specifically deep learning, and in particular their performance on such a complex game - significantly more complex than chess. It was predicted that Go programs wouldn't be able to match top-strength professionals for another 5 to 10 years.

AlphaGo played Lee Sedol (considered one of the top players of the past decade) last year and won.[1] AlphaGo beating that prediction by so much was a large part of the excitement around last year's event.

The reason for the excitement was demonstrating that Monte Carlo search, which was originally dismissed as inferior, was a significantly better approach for the game of Go. Monte Carlo is a probabilistic approach that uses random sampling - very different from the DFS approach used in chess.[2] It has proven to be very effective and, without another breakthrough, will almost certainly be the approach used going forward.
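To make the contrast concrete, here's a flat Monte Carlo sketch on a toy game (pure random playouts and nothing else; real MCTS grows a tree and uses a selection rule such as UCT, and AlphaGo additionally biases playouts with its networks - this is only the random-sampling core of the idea):

    import random

    # Toy game: take 1, 2, or 3 stones; whoever takes the last stone wins.

    def playout(stones):
        """Play uniformly random moves to the end; +1 if the player to move now wins."""
        first_player_wins = False
        turn = 0                                   # 0 = the player to move now
        while stones > 0:
            stones -= random.choice([m for m in (1, 2, 3) if m <= stones])
            if stones == 0:
                first_player_wins = (turn == 0)    # whoever just moved took the last stone
            turn ^= 1
        return 1.0 if first_player_wins else -1.0

    def monte_carlo_move(stones, n_playouts=5000):
        """Pick the first move whose random playouts look best for us."""
        scores = {}
        for m in (1, 2, 3):
            if m > stones:
                continue
            # After we play m it is the opponent's turn, so negate their result.
            scores[m] = sum(-playout(stones - m) for _ in range(n_playouts)) / n_playouts
        return max(scores, key=scores.get)

    print(monte_carlo_move(10))   # almost always 2: leave the opponent a multiple of 4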

[0] - https://en.wikipedia.org/wiki/Ply_(game_theory)

[1] - https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol

[2] - https://en.wikipedia.org/wiki/Monte_Carlo_method


> AlphaGo essentially baked good moves into its value and policy networks by playing millions of games.

I don't think that's a very good description of how AlphaGo was trained at all; you're essentially saying it merely overfits the training set, yet it clearly generalizes rather well to unseen board situations and still evaluates them successfully. No machine learning system would be found useful if all it could do were memorize the training data.

Re the use of deep reinforcement learning: for one, the role of reinforcement learning in the first version of AlphaGo, the one described in the Nature paper, was rather limited and only a small part of its training; it just turned a ~3d KGS policy network into a ~5d KGS bot and was used to generate the training sample for the value net. If we had enough recorded human games to train the value net directly, that step would be unnecessary anyway. And you could create such a training set without reinforcement learning, since there are pure Monte Carlo bots stronger than 5d KGS - but that would be far more computationally expensive.

But it's still not really true that there aren't obvious applications of deep reinforcement learning - robotics is one promising application, and that seems rather relevant. This paper demonstrated an impressive improvement on manipulation tasks, and you can probably follow its numerous citations for newer work: http://arxiv.org/abs/1504.00702

I do agree that AlphaGo's exact architecture probably doesn't have applications beyond teaching us how to play go better; it seems too specialized. I believe they mean it in the vaguest possible sense: that the kind of deep learning algorithms demonstrating incredible performance in AlphaGo have diverse applications - but that shouldn't come as a surprise to anyone even loosely following what people have done with deep learning over the past couple of years.


Go works precisely because it is a small closed system. An interesting match (from an AI perspective) would be a pro playing AlphaGo on an unusual board (e.g., one in the shape of a cat). The pro would take everything he knows about the game and apply it to the odd situation. AlphaGo is so specifically tuned that it cannot even handle any case except 19x19 (and maybe 9x9). Another interesting question would be small rules changes like "you may not play on any star points or any point directly touching them until turn 30".

Go has deep strategy, but it is very well defined in terms of what can and cannot be done and those rules are not particularly complex. Power grids in contrast are far more complex. There are thousands of rules, but also many more thousands of unwritten assumptions and case-by-case analysis. A final issue is that there exist unsolved and unrecognized problems.

The last AI winter (deep learning is just the latest rebrand) came from researchers overstating their accomplishments and making promises about general intelligence that could not be kept. Any claim about anything that requires general intelligence in the near future is undoubtedly overpromising.


> AlphaGo is so specifically tuned that it cannot even handle any case except 19x19 (and maybe 9x9).

Do you have any sources to back this assertion? It sounds unintuitive, as I know object recognition systems are usually trained on small images but generalize well to arbitrary image sizes. What you are describing sounds like overfitting.


The paper itself says the networks' input is a stack of 48 feature planes, each a 19x19 matrix. To make the point, though: they initially train AlphaGo on actual human games. Only after a hundred thousand or so training games is it ready to start playing and learning on its own. There are fewer than a couple dozen recorded games on larger boards.

If you haven't played go very much, you may think that "it's just a bigger board". 19x19 is commonly used because it has an even balance of edge and center influence (in reality, edge influence seems to be slightly higher). With 13x13, corner plays have overwhelming influence on the center. At 9x9, there is basically no center strategy at all. Normal strategies starting in the corners and expanding influence toward the center don't work as effectively with larger boards (the larger the board, the more this becomes true).

This is a very different issue from image recognition, in that strategy doesn't scale the way images do.
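As a purely illustrative sketch of that coupling (this is not AlphaGo's real architecture - its networks are mostly convolutional - but any layer whose weights are sized for a flattened 19x19 input simply cannot accept another board size; the 362-way output below is an invented example):

    import numpy as np

    # Illustrative only: a single layer whose weight matrix is sized for a 19x19 board.
    BOARD = 19
    rng = np.random.default_rng(0)
    W = rng.normal(size=(BOARD * BOARD, 362))   # 361 board points + a pass move (made up)

    def predict(board_planes):
        return board_planes.reshape(-1) @ W     # flatten the board, apply the weights

    predict(np.zeros((19, 19)))                 # fine
    try:
        predict(np.zeros((13, 13)))             # a 13x13 board just doesn't fit
    except ValueError as err:
        print("13x13 rejected:", err)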


I'm a huge proponent of AlphaGo and I think it is a revolutionary leap.

The key I think is,

> yet it clearly generalizes rather well to unseen board situations and still evaluates them successfully

I'm not sure this has been proven to be meaningful in a general sense, as you seem to also imply. Extrapolation can be a tricky and subtle business. What about unusual board sizes, for which no training data exists? Or if you changed a rule? I'm sure DeepMind would say the adversarial approach would work for these cases, but I'm not sure it would. It would be very interesting to see whether humans could 'learn' a new state more quickly than the algorithm.

That might provide a hint as to whether the algorithm is 'just' fitting the data well (with appropriate baked-in regularization, of course), or whether it can more generally 'learn' the rules of a given system.


Hm, well you are no doubt right that it doesn't generalize well to a change of rules. Reminds me of that game DeepZen played. It was trained with a komi of 7.5, and it played too soft and lost when the actual match komi was 6.5 (or maybe it was the other way around?). A human does not have much trouble adapting to such small rules variations, but at least the version of DeepZen that played that match was hard-coded for that exact komi value, because that's what was used in all of its training examples, and it wasn't given as a parameter. It shouldn't be a hard limit of the approach - indeed I think AyaMC was said to have been trained with some flexibility in its komi.

Still, I think AlphaGo does demonstrate amazing positional judgement in unseen board states, and this is visible in the details of how it plays out particular situations. No two games are exactly alike - the difficulty of go for computers lies precisely in its extreme combinatorial explosion - and in particular tactical situations every detail matters. Yet you can see AlphaGo judging the correct sequences of moves, "knowing" how to make a particular group live, for example, even when some other move seems more natural. And probably the most amazing thing about how it plays is how early it becomes completely sure that it's got an advantage on the board, and how precisely it judges how much of that advantage it needs to keep to the end. Every detail of the board is again relevant here, and basically no human would be so confident so soon. A go bot that couldn't adapt its tactics to unseen situations would be easy to beat; just ensnare it in a large complicated fight, and you'll kill a big group and guarantee a win. Of course people tried this in some of the Master(P) games, and it turns out AlphaGo is tactically just as strong.

So it's basically like other generalizations you get from machine learning: a net trained on, say, ImageNet will generalize to different poses, occlusions, contexts and variations of objects similar to what it was exposed to in training, and still do a superhuman job of classifying such pictures, but will naturally be quite hopeless with completely unseen items. So too AlphaGo seems to know the game of go, generalizing from seen examples to correct judgements in other states, but would be quite hopeless if tested on even a slight variation of the game rules.


'Data efficiency' -- that's a good candidate for the next buzzword imo. People talk a lot about how massive amounts of data regressed over giant networks of dumb functions happen to perform well on many tests, but they ignore the fact that the resulting systems are brittle. They fail if you tweak the input by small amounts that would never confuse a human. You shouldn't NEED that much data. You need better priors. Current ML systems rely on arbitrary coincidences to succeed at multiple-choice tests. I did that in courses where I didn't go to class. They are primitive and childish and won't scale with pure data/processing. We are still missing something.
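A toy illustration of that brittleness (a hand-built linear "classifier" plus a small targeted nudge, in the spirit of adversarial-example constructions like FGSM; everything here is made up and describes no real deployed system):

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=1000)                  # "learned" weights of a linear classifier
    x = 0.5 + 0.01 * np.sign(w)                # an input (pixels in [0, 1]) it calls positive

    def score(img):
        return w @ (img - 0.5)                 # decision: positive class if score > 0

    x_adv = x - 0.03 * np.sign(w)              # move each pixel 3% of its range against w

    print(score(x) > 0, score(x_adv) > 0)      # True False: the decision flips
    print(np.abs(x_adv - x).max())             # 0.03: yet each pixel barely changed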


A prior is, in some sense, the fruit of data.

Humans develop priors by accumulating experience. Laws of physics are discovered through experimentation and introspection over many data points.

So what we really need is a life-long learner - a machine learning algorithm that can extract knowledge from many tasks and store it in long-term memory. The fact that we don't have an architecture for long-term memory is the main roadblock toward this goal.


DeepMind claims that AlphaGo has already paid for itself through its use in cutting Google's data center cooling costs, so it's plausible that many other domains could benefit.


Where did you read that? Are you sure they weren't talking about machine learning in general?


It has been cited in a number of articles quoting Demis Hassabis; Hassabis repeated the claim in his recent talk on AG (specifically, that deep RL related to AlphaGo has reduced Google datacenter cooling costs by 40% and is saving Google hundreds of millions of dollars a year) and added the additional interesting detail that Google has changed its datacenter design plans to add more sensors & controls for the deep RL agents.

They haven't gone into detail about how exactly this is done, but my guess is that they use their historical data series on cooling/electricity consumption to bootstrap, and then the agent learns online with policy gradients (cooling and temperature control seem like something with continuous actions rather than a few discrete ones, so an actor-critic rather than a DQN).
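To make that guess concrete, here is a bare-bones sketch of the kind of continuous-action policy-gradient loop being described - REINFORCE with a learned baseline on a completely invented one-step "cooling" problem. The state, reward, and numbers are all assumptions for illustration, not anything DeepMind has published:

    import numpy as np

    rng = np.random.default_rng(0)

    def ideal_cooling(load):
        return 0.2 + 0.5 * load            # the "true" best setting the agent must discover

    theta = np.zeros(2)                    # actor: mean cooling level = theta @ [1, load]
    w = np.zeros(2)                        # critic/baseline: predicted reward = w @ [1, load]
    sigma = 0.1                            # fixed Gaussian exploration noise on the action
    lr_actor, lr_critic = 0.005, 0.05

    for step in range(50_000):
        load = rng.uniform()                             # observed state: server load in [0, 1]
        feats = np.array([1.0, load])
        mu = theta @ feats
        action = mu + sigma * rng.normal()               # sample from the Gaussian policy
        reward = -(action - ideal_cooling(load)) ** 2    # penalty for over/under-cooling

        advantage = reward - w @ feats
        w += lr_critic * advantage * feats               # critic: regress toward the reward
        # actor: Gaussian policy gradient, d log pi / d mu = (action - mu) / sigma^2
        theta += lr_actor * advantage * (action - mu) / sigma**2 * feats

    for load in (0.0, 0.5, 1.0):
        print(load, round(theta @ np.array([1.0, load]), 2), ideal_cooling(load))
        # learned mean action vs. the ideal setting; they should roughly agree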


If you look more closely at the details here (beyond just DeepMind's blog post) you'll see that: 1. It has not yet been deployed to all data centers. They just turned the system off for a period of time, looked at how much extra it cost to run the data center that way, and extrapolated.

2. There was already a Google engineer back in 2012 who applied neural networks to this problem and saw huge gains (there is a blog post about this somewhere). In the DeepMind blog post they don't compare to that system, so for all we know it could just be a small refinement of it. It is actually not clear from the DeepMind blog post whether deep RL was used at all: it is only kind of implied.


https://www.theverge.com/2017/5/30/15712300/alphago-ai-human...

"Say you’re a data center architect working at Google. It’s your job to make sure everything runs efficiently and coolly. To date, you’ve achieved that by designing the system so that you’re running as few pieces of cooling equipment at once as possible — you turn on the second piece only after the first is maxed out, and so on. This makes sense, right? Well, a variant of AlphaGo named Dr. Data disagreed.

“What Dr. Data decided to do was actually turn on as many units as possible and run them at a very low level,” Hassabis says. “Because of the switching and the pumps and the other things, that turned out to be better — and I think they’re now taking that into new data center designs, potentially. They’re taking some of those ideas and reincorporating them into the new designs, which obviously the AI system can’t do. So the human designers are looking at what the AlphaGo variant was doing, and then that’s informing their next decisions.” Dr. Data is at work right now in Google’s data centers, saving the company 40 percent in electricity required for cooling and resulting in 15 percent overall less energy usage."


At the very least I think this is deceptive, because the source (if you keep clicking through links) for the 40 percent savings is the original blog post, and there's no other information saying things have been rolled out fully, yet the Verge seems to imply that here.

I am somewhat convinced that something resembling RL was used based on your argument in the other thread, but I think even you would agree that calling it "a variant of alphago" is a pretty big stretch.


> If you look more closely at the details here (beyond just deepminds blog post)

Where would that be? As far as I know, the DM blog post and Hassabis's occasional discussions are the most detailed public information available. And they don't mention that it's just a brief demo.

> 2. There was already a Google engineer back in 2012 who applied neural networks to this problem and saw huge gains (There is a blog post about this somewhere)

I don't remember this.



So, that was two years deeper into the deep learning revolution, post-DeepMind acquisition; it doesn't actually say it was a NN (the diagram could be literally any ML model, from a linear model to a random forest), doesn't say they reduced costs by anything approaching 40%, and doesn't say they are using it in production at all aside from the one instance where they patched around some downtime.


"Today we’re releasing a white paper (PDF) on how we’re using neural networks to optimize data center operations and drive our energy use to new lows."


Ah, missed that. In any case, the paper confirms what I said: they haven't used it in practice, and the only time they did was the brief instance mentioned in the post, where it resulted in a small PUE saving (it quotes 0.02, off an unspecified reduced load, but note for comparison the average PUE of ~1.12, so saving anything remotely like 40% is unlikely).


Here is a follow-up from the lead author of the paper referred to in that first blog post (Jim Gao), who apparently was involved in DeepMind's project. Note the conspicuous lack of any sort of reference to deep reinforcement learning:

https://blog.google/topics/environment/deepmind-ai-reduces-e...


Using forecasting for 'control' doesn't make too much sense (why the need to train a second ensemble to prevent overshoot if it's just supervised learning?), and the first author on that post is not Gao but Richard Evans, who is a DeepMind deep RL researcher (most recent publications: "Deep Reinforcement Learning in Large Discrete Action Spaces", "Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions", "Reinforcement Learning in a Neurally Controlled Robot Using Dopamine Modulated STDP").


Misremembered the year. Actually 2014


I can believe "deep RL related to AlphaGo" but that was somehow transmuted to "[AlphaGo] was used to cut Google’s data center cooling costs", which doesn't make sense.

(Just nitpicking here.)


FYI, I heard this firsthand from a DeepMind employee.


There are a lot of problems for which you effectively have limitless data.

For example, optimizing the shape of an aerofoil, wing, wind turbine, ship, pump, etc. The "real" data is the result of computationally costly fluid dynamics simulations. The network can use as many of those as it likes to improve the design.

There are also a lot of problems where there is nearly limitless data (ie. > 10^12 bytes of data). Examples would be MRI/CT scans, all internet video, all TV channels, etc.


>There are a lot of problems for which you effectively have limitless data.

>For example, optimizing the shape of an aerofoil, wing, wind turbine, ship, pump, etc. The "real" data is the result of computationally costly fluid dynamics simulations.

I'm doing something similar at the moment. The data is not limitless. It's in fact quite expensive.

In most cases you still need to generate the large number of simulations in your training set, and the simulations are computationally expensive. If you're doing it on a metered platform you can tie the cost of the data set directly to the number of CPU hours needed to generate it, and from there to a dollar value.
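Back-of-the-envelope, with entirely made-up numbers, that accounting looks something like:

    # The "free" data from a CFD simulator still has a concrete price tag.
    n_simulations = 10_000          # training examples you want
    core_hours_each = 50            # hypothetical cost of one CFD run
    price_per_core_hour = 0.05      # hypothetical metered rate in dollars

    total_core_hours = n_simulations * core_hours_each
    print(total_core_hours, "core-hours ->", f"${total_core_hours * price_per_core_hour:,.0f}")
    # 500000 core-hours -> $25,000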



