The linked article says "The bot received incentives for winning and basic metri...

mannykannot · on Aug 17, 2017

It also says "We also separately trained the initial creep block using traditional RL techniques." I have no idea how significant that is, but it seems to be getting a fair amount of attention.

gcp · on Aug 17, 2017

It's a highly specific procedure that happens before there is any interaction with the opponent, so without "handing over" the understanding that having creeps on your high ground is good, it's very hard for the learner to see through the noise and discover this.

damnfine · on Aug 17, 2017

Exactly why this is not impressive to me. The point is to be able to learn the rules, but what I see is that not only some of the rules but, in many cases, the actions themselves are already prespecified. Yes, it's hard, and that's why humans still rule the roost.

sp332 · on Aug 17, 2017

Humans learn from each other and don't puzzle everything out from first principles. So I wouldn't put that restriction on a computer intelligence either.

mannykannot · on Aug 17, 2017

That is fine, so long as we are not expected to hold both that it is not a big deal when it is not achieved, and is a big deal when it is achieved. Personally, I incline towards the latter.

Twirrim · on Aug 17, 2017

Surely all this RL process did was speed up, by a large stretch, what the computer would have learned anyway? The "cost" factors they chose would have hit them ultimately, regardless of whether the bot stood still in the base or wandered off elsewhere.

mannykannot · on Aug 18, 2017

My guess is that, in general, many complex strategies are effectively unreachable without something like analysis, because the intermediate states are disfavored, which leaves algorithms trapped in local minima of the cost function. (I don't know whether that would be an issue for this game specifically.)
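
To make the local-minimum point concrete, here is a toy sketch. It is not the bot's actual cost function: `cost`, `grad`, and `descend` are hypothetical names, and the one-dimensional polynomial is chosen only because it has a shallow minimum and a deeper one separated by a hump of "disfavored" intermediate states.

```python
# Toy illustration only: a 1-D cost with two minima separated by a hump.
# Nothing here comes from the Dota bot; the function is an assumption
# picked to show how plain gradient descent can get trapped.

def cost(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    # Derivative of cost(x).
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=1000):
    # Plain gradient descent: always step downhill from where you are.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

right = descend(2.0)   # settles in the shallow minimum on the right
left = descend(-2.0)   # settles in the deeper minimum on the left

print(f"start  2.0 -> x = {right:.3f}, cost = {cost(right):.3f}")
print(f"start -2.0 -> x = {left:.3f}, cost = {cost(left):.3f}")
```

Starting on the right, the descent never crosses the hump, because every step toward the better minimum first makes the cost worse. That is the sense in which a strategy can be effectively unreachable by purely incremental improvement.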

aoeuasdf1 · on Aug 17, 2017

Not necessarily - maybe a mixed strategy of

(1) Not creepblocking at all, and letting your opponent have your creeps under their tower

(2) Modest creepblocking to punish (1)

(3) Severe creepblocking to punish (2)

would be better than some hand-trained RL creepblocking which is divorced from game outcomes.
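
A rough sketch of why mixing could matter, under loud assumptions: the payoff numbers below are invented, and the entry that closes the cycle (no creepblocking beating severe creepblocking) is an assumption made purely for illustration, not something established about the game. `PAYOFF` and `worst_case` are hypothetical names.

```python
# Hypothetical zero-sum payoff table, NOT measured from Dota.
# Index 0 = no creepblock, 1 = modest creepblock, 2 = severe creepblock.
# PAYOFF[i][j] is the payoff to a player using i against an opponent using j.
# Modest punishes none and severe punishes modest (as in the list above);
# "none beats severe" is an ASSUMPTION added to close the cycle.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def worst_case(mix):
    # Expected payoff of playing `mix` against the opponent's best reply.
    return min(
        sum(mix[i] * PAYOFF[i][j] for i in range(3))
        for j in range(3)
    )

pure_modest = [0.0, 1.0, 0.0]
uniform = [1 / 3, 1 / 3, 1 / 3]

print("always modest, worst case:", worst_case(pure_modest))
print("uniform mix,   worst case:", worst_case(uniform))
```

In this made-up table, any fixed level of creepblocking can be exploited by the opponent, while the even mix cannot. Whether the real game has this cyclic structure is exactly the open question.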