
The linked article says "The bot received incentives for winning and basic metrics like health and last hits". So apart from losing/winning, losing health is bad, last hitting is good. You could add more, but apparently that's all OpenAI used.
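To make the quoted reward description concrete, here is a minimal sketch of a shaped reward that combines win/loss with health and last hits. The `StepInfo` fields and all three weights are hypothetical, picked only for illustration; the article does not give OpenAI's actual values or formula.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step observations; field names are assumptions."""
    health_delta: float  # change in hero health this step (negative = damage taken)
    last_hits: int       # creeps last-hit during this step
    won: bool = False    # True on the step the game is won
    lost: bool = False   # True on the step the game is lost


# Illustrative weights only, not taken from the article.
W_HEALTH = 0.01
W_LAST_HIT = 0.5
W_OUTCOME = 10.0


def shaped_reward(info: StepInfo) -> float:
    """Sum the small per-step shaping terms and the large terminal outcome term."""
    reward = W_HEALTH * info.health_delta + W_LAST_HIT * info.last_hits
    if info.won:
        reward += W_OUTCOME
    if info.lost:
        reward -= W_OUTCOME
    return reward
```

With these made-up weights, losing 100 health costs as much as two missed last hits, and the outcome term dominates both.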


It also says "We also separately trained the initial creep block using traditional RL techniques." I have no idea how significant that is, but it seems to be getting a fair amount of attention.


It's a highly specific procedure that happens before there is interaction with the opponent, so without "handing over" the understanding that having creeps on your high-ground is good, it's very hard for the learning to see through the noise and discover this.


This is exactly why it is not impressive to me. The point is to be able to learn the rules, but what I see is that not only some of the rules but, in many cases, the actions themselves are prespecified. Yes, it's hard, and that's why humans still rule the roost.


Humans learn from each other and don't puzzle everything out from first principles. So I wouldn't put that restriction on a computer intelligence either.


That is fine, so long as we are not expected to hold both that it is not a big deal when it is not achieved, and is a big deal when it is achieved. Personally, I incline towards the latter.


Surely all this RL process did was speed up, by a large stretch, what the computer would have learned anyway? The "cost" factors they chose would have hit them ultimately, regardless of whether the bot stood still in the base or wandered off elsewhere.


My guess is that in general, many complex strategies are effectively unreachable without something like analysis, on account of intermediate states being disfavored, leading to algorithms being trapped by local minima in the cost function (I don't know whether that would be an issue for this game, specifically.)
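A toy illustration of the local-minimum point, assuming a made-up one-dimensional cost landscape (nothing here is specific to Dota): a greedy learner that only accepts strictly better neighbouring states never crosses the "disfavored" intermediate states that separate it from the global minimum.

```python
def greedy_descent(costs, start):
    """Move to the cheaper adjacent index until no neighbour is strictly cheaper."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(costs)]
        best = min(neighbors, key=lambda j: costs[j])
        if costs[best] >= costs[i]:
            return i  # no improving move left: a (possibly local) minimum
        i = best


# Hypothetical landscape: index 1 is a local minimum, index 5 the global one.
# Indices 2 and 3 are the costly intermediate states between them.
COSTS = [5, 3, 4, 6, 2, 0]

stuck_at = greedy_descent(COSTS, start=0)    # stops at index 1
reached = greedy_descent(COSTS, start=3)     # reaches index 5
```

Starting from index 0 the search halts at cost 3, even though cost 0 exists further along; it only finds the global minimum when started on the far side of the barrier.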


Not necessarily - maybe a mixed strategy of

(1) Not creepblocking at all, and letting your opponent have your creeps under their tower

(2) Modest creepblocking to punish (1)

(3) Severe creepblocking to punish (2)

would be better than some hand-trained RL creepblocking which is divorced from game outcomes.
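A rough sketch of why a mix could matter here, under a big assumption: that the three options form a full cycle, i.e. on top of (2) punishing (1) and (3) punishing (2), option (1) also punishes (3). That last link and the +1/-1 payoffs are invented for illustration, not taken from the game.

```python
# Row player's payoff; rows and columns are options (1), (2), (3) in order.
# +1 means the row option punishes the column option. Values are hypothetical.
PAYOFF = [
    [0, -1, 1],   # (1) no block: loses to (2), assumed to beat (3)
    [1, 0, -1],   # (2) modest block: beats (1), loses to (3)
    [-1, 1, 0],   # (3) severe block: beats (2), assumed to lose to (1)
]


def best_response_value(opponent_mix):
    """Best expected payoff any pure option can get against the given mix."""
    return max(
        sum(PAYOFF[i][j] * opponent_mix[j] for j in range(3))
        for i in range(3)
    )


always_severe = [0.0, 0.0, 1.0]     # a fixed, predictable creep block
uniform_mix = [1 / 3, 1 / 3, 1 / 3]

exploit_fixed = best_response_value(always_severe)
exploit_mixed = best_response_value(uniform_mix)
```

In this toy matrix a fixed creep block can be exploited for the full payoff, while the uniform mix leaves the opponent nothing to gain from any pure reply.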



