I am embarrassed to say that I am confused about what the outputs are from the "Rapid" RL training system. Do you end up with an executable that then drives the game inputs/api? Does it produce a "bot script" that is used by the game to drive the logic? I understand that thousands of CPUs/GPUs are used for the training, but then what is actually playing the game at the end of the day?
Hi Gdb, next week I am giving a presentation on your awesome Dota work to the local data science community in Vancouver, BC. I have reviewed the info your team has released so far and I have a few questions:
- I saw no mention of CNNs; is it true that CNNs are not used even for the 8x8 terrain grid input?
- do you have any comments about Rapid+PPO vs., say, IMPALA+V-trace? Would the ability to use more off-policy data be very helpful here?
- any comments on how you selected the reward constants?
- was the teamwork tau something your team came up with, or was it a known approach?
- the attention keys are most interesting; can you comment on why they don't flow through the LSTM? Does it make it easier for the network to quickly change unit attention, or is there some other reason?
- any comment on the choice of a single-layer LSTM vs. a multilayer one, ostensibly for operating on longer timescales?
- does this result mean that HRL is less critical than some people thought?
- any comment on the magnitude of compute, like in the post from May?
Could you go into some more detail on the actual engineering mechanics? Does each bot have an instance of the neural net model that it runs on a separate PC? How often do you feed game state into the net? What's the output of the network (a bunch of movement / item / spell commands?) that is fed back in through the game driver?
Oh, good question, I didn't think of that either. Is there one NN that consumes the state for each of the bot players and then returns the "next action" for that bot, or is there a separate NN for each of the bots? And does that NN run on the LAN machine, or is the LAN machine just running the game code and a Python agent mediating between the game code and the NN?
We dump state from the bot API each tick and send it over gRPC to a Python agent, which formats the state into a tuple of NumPy arrays. That tuple is passed into 5 neural networks (one per agent), each of which returns a tuple of NumPy arrays. Each tuple is decoded into a semantic action, which is then returned to the game via gRPC.
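To make the data flow concrete, here's a minimal sketch of that per-tick loop. Everything here is hypothetical: the real system sends state over gRPC and runs large LSTM policies, whereas this stub replaces the transport with plain function calls and each "network" with a random linear map, just to show the shape of state-in / five-semantic-actions-out.

```python
import numpy as np

N_AGENTS = 5     # one network per hero
STATE_DIM = 16   # placeholder observation size
ACTION_DIM = 4   # placeholder action-head size

rng = np.random.default_rng(0)
# Stand-in "networks": one random linear map per agent.
weights = [rng.standard_normal((STATE_DIM, ACTION_DIM)) for _ in range(N_AGENTS)]

def format_state(raw_tick):
    """Format raw per-agent game state into a tuple of NumPy arrays."""
    return tuple(np.asarray(obs, dtype=np.float32) for obs in raw_tick)

def forward(agent_idx, obs):
    """Placeholder for one agent's neural net: takes an observation array,
    returns a tuple of output arrays (here, a single vector of logits)."""
    return (obs @ weights[agent_idx],)

def decode(outputs):
    """Decode a network-output tuple into a semantic action the game
    understands (here, just an argmax action id)."""
    (logits,) = outputs
    return {"action_id": int(np.argmax(logits))}

def step(raw_tick):
    """One tick: raw state in, one semantic action per agent out.
    In the real system the input and output legs are gRPC calls."""
    states = format_state(raw_tick)
    return [decode(forward(i, s)) for i, s in enumerate(states)]

raw_tick = [rng.standard_normal(STATE_DIM) for _ in range(N_AGENTS)]
actions = step(raw_tick)
print(len(actions))  # 5: one semantic action per agent
```

The point of the sketch is the separation of concerns: the game side only ever sees raw state and semantic actions, while the Python agent owns the array formatting and the per-agent forward passes.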