
As someone who was once semi-pro in dota (4400 MMR, get rekt), it's freaky watching these bots play. It's uncanny. Little things... Like, when the bots are taking a tower, one of them will stand in front of the tower and tank the creep wave, so that their creeps do more damage on the tower. They had to learn this.

Insta-TPing right when an enemy wastes their stun and can't cancel their TP.

Grouping up as 5 at the beginning of the game and pushing into the enemy jungle. Pubs never do this.

The most interesting part is that OpenAI appears to be discovering new knowledge in the dota scene. For example, they always take the ranged barracks first, never the melee. This is exactly the opposite of what the pro scene does. Therefore, the smartest pro team should study what the bot is doing and trust that on average it's a better idea to always focus on the ranged barracks first. After all, if it was a bad idea, they probably wouldn't do that.

The most hilarious part was when OpenAI paused the game, then resumed it. This illustrates that there is still some unexplainable randomness.

Question for OpenAI: Is it more accurate to think of the bots as 5 separate minds, or a single mind controlling 5 heroes?

EDIT: By the way, TI is going on right now! https://www.twitch.tv/dota2ti If you're new to the scene, take a peek. TI is always so high energy -- even if it's hard to follow what's going on, listening to Tobi (the shoutcaster) go nuts during the game is always a highlight.

And of course, /r/dota2 has the best memes anywhere, hands-down. https://www.reddit.com/r/DotA2/




> Question for OpenAI: Is it more accurate to think of the bots as 5 separate minds, or a single mind controlling 5 heroes?

They released their architecture: https://www.reddit.com/r/MachineLearning/comments/9533g8/n_o...

In the above thread, a reddit user noted that 512 of the 2048 units in each bot's LSTM input are shared (max pooled across players).

This means they are telepathically linked and never need to worry about communicating, disagreeing, etc. They know how the others are interpreting their local inputs because it's explicitly shared. So it's not really fair to call it 5 separate minds.

You can't really call it a single mind either: if you take the LSTM as the "mind" (it's the only place that keeps memory of previous state), each bot's LSTM state isn't read by any of the others.
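To make that concrete, here's a rough sketch of what the shared slice might look like (the 1536/512 split follows the reddit comment above; the names and everything else are my own guesses for illustration, not OpenAI's actual code). Each bot's 2048-unit LSTM input is its private features concatenated with a 512-unit block that is max pooled across all five heroes:

    import torch

    def build_lstm_inputs(private_embed, shareable_embed):
        # private_embed:   (5, 1536) per-hero features only that hero sees
        # shareable_embed: (5, 512)  per-hero features that get pooled across the team
        # Returns (5, 2048): one LSTM input row per bot, where the last 512 units
        # are identical for every bot (elementwise max over the team).
        shared = shareable_embed.max(dim=0).values.expand(5, -1)
        return torch.cat([private_embed, shared], dim=1)

    # e.g. inputs = build_lstm_inputs(torch.randn(5, 1536), torch.randn(5, 512))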


This is fascinating. Thank you!


I'm also very excited about what these bots can tell us about DOTA, but I think a certain degree of skepticism is necessary because the bots weren't actually trained truly tabula rasa. The training contained lots of little "nudges" to reward the bots for things that lead to winning: e.g. there are nudges to get them to go to lane at the start, and nudges to get them to buy the right items. I don't know for sure, but I wouldn't be surprised if they kill the ranged barracks first because there's a built-in reward for killing barracks that's the same for both melee and ranged, and the ranged ones have less health, so the bots value them a little higher as a more consistent reward than going for the melee one. I hope that someday they get to a point where they can truly train from scratch with no human nudges. I'm not sure how much more impressive an ML result that would be (I'd imagine meaningfully so, but I'm not an AI researcher), but it would certainly be a much more interesting result in terms of our understanding of DOTA.
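To illustrate what I mean by "nudges", here's a toy shaped-reward sketch. The event names and weights are entirely made up, not OpenAI's actual values; the point is just that if barracks kills get the same bonus regardless of type, the lower-HP ranged rax becomes the cheaper reward:

    # Toy shaped reward: hand-picked weights for intermediate events ("nudges").
    # All names and numbers below are invented for illustration.
    SHAPED_WEIGHTS = {
        "win":           5.0,
        "kill":          0.6,
        "death":        -0.6,
        "last_hit":      0.16,
        "tower_kill":    1.0,
        "barracks_kill": 2.0,  # same bonus for melee and ranged rax, but the
                               # ranged rax has less HP, so it's the easier reward
    }

    def shaped_reward(events):
        # events: dict mapping event name -> count this tick
        return sum(SHAPED_WEIGHTS.get(name, 0.0) * count
                   for name, count in events.items())

    # shaped_reward({"last_hit": 3, "tower_kill": 1}) -> 1.48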


As an admittedly lower-mid level player: when it comes to the ranged barracks at least, I suspect there's real value in attacking them over the melee racks, simply because the ranged ones don't regenerate their HP and the melee racks do.

Whenever my team is pushing high ground and we're not certain we have the time and strength to completely 100-0 the melee racks, we always try to hit the ranged one first; otherwise we'll just get pushed back off the high ground and all the damage we did will regenerate, negating the value of the resources we spent on the push.

Ideally, we prefer to take the melee racks if we know we can (since there are more melee creeps in a wave, the buff is worth more), but if we're not sure, we'd rather do damage we know won't evaporate in a few minutes.


> since there are more melee creeps in a wave, the buff is worth more

This is true - but not the whole story. Since ranged creeps do significantly more damage, taking the ranged rax means your lane still pushes a bit and, importantly, accumulates ranged creeps as it goes. So if the lane is left alone, when it hits their base it will do hella damage to buildings if they don't address it. (Bonus melee supers would last longer and do more base damage over their lifespan, but what you want is damage in the time it takes them to TP.)

(also trash tier player, so just my thoughts)


That's exactly why pros do it too. If you aren't sure you'll take it, you go for the ranged. The strange thing is that the AI always goes for the ranged, even if it's sure it can take it.


Keep in mind that it's not just pure learning from zero, with these crazy emergent behaviors coming out. There is a fair amount of "coaching" involved. The OpenAI team explained this last year when they revealed the SF mid 1v1 version of their Dota bot.

People were astounded that the SF mid was creep blocking at an extremely high level, since wave positioning/management is a very complex behavior/process that doesn't pay off directly right away. The OpenAI team said that they basically taught the SF specifically how to creep block, and they've done this with other behaviors as well, like denying allied creeps, as well as a lot of the meta-level strategic play like warding/vision and item builds/usage.


The thing is, the OpenAI team themselves are terrible at the game. If they try to coach too much, they'll ruin their chances at beating the top Dota team. They have to let the bot discover its own winning strategies.

For example: there are an infinite number of places to put a ward. One way to train the bot is to preselect a set of possible ward locations, reducing it to 30 or so common ones. Another way is to let the bot place wards wherever it wants and have it optimize some measure of "strategic vision" (if it's even possible to come up with such a measure); after hundreds of years of self-play, it should figure out the best places and times.
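A toy version of the first option (the spot names and coordinates here are made up): reduce ward placement to picking one entry from a fixed list, so it becomes a small classification problem instead of a continuous (x, y) output.

    # Sketch of "preselect common ward locations"; spots and coordinates are invented.
    WARD_SPOTS = [
        ("radiant_rune_top",  (-1800,  900)),
        ("dire_rune_bottom",  ( 1800, -900)),
        ("roshan_pit",        ( 2600, -500)),
        # ... ~30 hand-picked spots in total
    ]

    def pick_ward_spot(policy_scores):
        # policy_scores: one score per entry in WARD_SPOTS, e.g. from the policy head
        best = max(range(len(WARD_SPOTS)), key=lambda i: policy_scores[i])
        return WARD_SPOTS[best]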

As I write this out, I think you're probably right. There are too many aspects of the game for a purely-random algorithm to be effective... E.g. item builds. But I'm holding out hope that's just because they haven't figured out a good way to encode all dota items into a distance measure.

AI has proven over and over that humans aren't so special. And humans know how to adapt to the game.

That said, I wish OpenAI would be completely transparent as to what's emergent behavior and what's not. :)

Oh, one last interesting thing: Icefrog is going to roll out a big patch after this TI, just like he always does. I wonder how much of the bots' knowledge will transfer over? Or if they'd be better off training from a clean slate?


During the OpenAI stream where they first played against 5 other players, they said that the AI isn't picking items at all; it's just using pre-made recommended item lists. They also said that the match was being played on an older patch, so they may just not update if the upcoming patch makes large changes.


>I wish OpenAI would be completely transparent as to what's emergent behavior and what's not.

Totally agree with you here. This would also help actual DOTA players to analyze what is truly emergent behavior that might open new doors to high level play styles, and what is just weird coaching/niche optimization choices by the OpenAI team.


For what it's worth, they have been very clear about that kind of stuff in their blog posts and interviews. It just happens that the information is a bit disjointed due to that format.


>Oh, one last interesting thing: Icefrog is going to roll out a big patch after this TI, just like he always does. I wonder how much of the bots' knowledge will transfer over? Or if they'd be better off training from a clean slate?

The bots aren't even playing real Dota, let alone current patch.


That's not really true. There are restrictions, but it's real dota. The fact that people haven't had time to adapt to the rules is irrelevant; the bots' strategy is very similar to TI4, when the deathball meta was in fashion and games would end in 13 minutes.

OpenAI are here to compete on equal footing. They aren't going to stick with the old patch, I would guess.


Watching the showmatch as a dota player, it was clear it's not "real" dota. I played Turbo when it was new, and the game the bots are playing is much more similar to that. Having _almost_ instant regen whenever you need it is a game changer.

It's sort of a "Theseus's paradox" situation.


Until the AI is thinking about how to deal with a possible 5th-pick Brood/Huskar/Meepo, it's not real drafting. Until it's coping properly with courier snipes mid, courier use distribution, etc., it's just playing a variant of dota.

It's still really fucking good. But it's not dota _yet_


re "That said, I wish OpenAI would be completely transparent as to what's emergent behavior and what's not. :)"

This is a pretty good summary - the main bit to know is reward shaping https://medium.com/@evanthebouncy/understanding-openai-five-...


> AI has proven over and over that humans aren't so special.

I disagree. I think AI has shown that we are very special.


Perhaps one day they’ll be able to create the analogue of what AlphaGo Zero (learning from scratch) was to the original AlphaGo (learning from human play). That would be very impressive.


Imo, going from OpenAI Five to some kind of OpenAI FiveZero or whatever would be like 1000x the progress of going from AlphaGo to AlphaGo Zero, just based on the complexity of the game mechanics and the fact that it's 5v5 and not 1v1.


and perfect vs imperfect information - the policy network has to forecast enemy positions and deduce enemy goals. At least in Go you know the full game state.


I think you are putting too much trust in how good OpenAI is at bigger-picture strategy right now. They have a very push-focused strategy that fits the limitations they have put on the game.

When OpenAI wins against human opponents, it seems to be mostly because it is so much better at cooperating and can jump the enemy so quickly, combined with constant ferrying of regen that would not work in a normal game.


Mm, everyone said the same kind of thing last year. It's worth being skeptical, but I'd rather be a true believer and be proven wrong. It seems like betting against technology to say that they can't beat the top dota team in a fair fight within a couple years.


I think the argument people used last year is still the same argument though. The 1v1 SF mid was all about computer-precision mechanical skill with hitting creeps and HP management. People said the AI would never be able to match pro-level meta/team strategies. While OpenAI Five is insanely impressive, it's still very glaringly obvious that its range of play styles is very restricted. Many of the pros, even ones who have played against (read: got crushed by) OpenAI Five, still think the bot is very far away from out-drafting a pro team with the full ranked game rules and hero pool.

The push heavy death ball strategy is pretty much optimal for the high-precision, perfectly coordinated, mechanical fighting skill the bots have. They get a few kills using the mechanical skill and then they group up and press their advantage as hard as they can.

It seems like all the rules/mechanics they are still working on are the more abstract out-of-the-box stuff that evolved to deal with these types of 5-man "teamfight" hero lineups... (warding, game-contextual item builds, courier management, full hero pool with all the "rat" split pushing heroes, etc.)


> Many of the pros, even ones who have played against (read: got crushed by) OpenAI Five, still think the bot is very far away from out-drafting a pro team with the full ranked game rules and hero pool.

Neither AI experts nor DotA experts have the necessary background to predict how close anything is to any level of future performance. DotA players are probably the worst to ask, because the AI is already stronger than they are at a subset of the game and they have no insight into how quickly that subset could be generalised into other aspects.

OpenAI has a list of something like 7 restrictions. Nobody has any real idea of how quickly these restrictions can be lifted once the OpenAI team has an AI that has mastered the game with those restrictions.

E.g., 5 invulnerable couriers is obviously a huge restriction - but once OpenAI knows it gains a benefit by ordering items, how difficult is it to lift that restriction? Nobody knows. Might be easy, might be hard.


> OpenAI has a list of something like 7 restrictions.

Some of those are very major, though, and I think the word "restriction" is a bit misleading. Having 5 invulnerable couriers is not really a "restriction" in that it limits or simplifies parts of the game -- it's just a fundamentally different mechanic that changes the way the game can be played.

> DotA players are probably the worst to ask, because the AI is already stronger than they are at a subset of the game and they have no insight into how quickly that subset could be generalised into other aspects.

I think that's a little unfair. Most folks have been pointing out that OpenAI's current momentum-based "deathball" strategy seems to fall apart without infinite regen and a limited hero pool, both facilitated by the current set of restrictions.

I'd agree that nobody really knows how well OpenAI will adapt to the full game, but I disagree that the criticisms I've seen are meritless. OpenAI's current level of play is definitely impressive, but I think there's still room for skepticism given the current restrictions. I (and I think a lot of others) would be pretty disappointed if the TI showmatch happened with the turbo mode couriers still enabled.


> I think that's a little unfair. Most folks have been pointing out that OpenAI's current momentum-based "deathball" strategy seems to fall apart without infinite regen and a limited hero pool, both facilitated by the current set of restrictions.

People said similar things about Go AIs and ko fights. And in the end it turned out that neural networks handled kos fine but ladders were a challenge.

On the deathball strategy in particular, consider that we expect a superhuman DotA AI to change the DotA metagame, so playing off-meta doesn't tell us anything. AlphaGo would invade the 3-3 point a lot more enthusiastically than a human player. This was considered a classic beginner mistake for many years; now the theory has been readjusted to cope with the fact that AlphaGo stuck with it and just considered it a good move.

We can safely say that the courier change has made a deathball strategy more powerful, and it seems quite likely deathball is not an optimal strategy under normal rules. But we can't be sure until OpenAI tests it, and we absolutely can't be sure that OpenAI won't just learn a new style when the conditions change.

The criticisms have merit, but nobody has enough data to predict anything about the future. Particularly a professional DotA player.


>I (and I think a lot of others) would be pretty disappointed if the TI showmatch happened with the turbo mode couriers still enabled.

Totally agree. This one change alone is _so central_ to both the bots' laning strategies and their meta-game team strategies. They couldn't just leave heroes in lanes forever no matter what, with all 5 heroes literally never going back to the fountain, if there weren't 5 couriers. Not to mention their initial item builds and stats-only-4-man-the-lane-for-first-blood bullshit wouldn't work at all without constant ferrying of regen on the couriers.


>semi-pro in dota

>4400 MMR

OP is being sarcastic here, by the way. He means he's not great but knows how to play (and is definitely not semi-pro); that context might be lost if you don't play Dota!

>Is it more accurate to think of the bots as 5 separate minds, or a single mind controlling 5 heroes?

They answered this on the last stream, iirc it's 5 identical clones with the same goals, but not sharing any knowledge, info, or decisions with each other.


RE: Semi-pro

He also mentions that this was in the past. 4 or so years ago 4400 MMR was in the top 1% of dota players. MMR creep has happened significantly since then.


1v1 me


I think 4400 MMR places OP in at least Ancient-1 ranking which is approximately the 95th percentile. I'd call that at least semi-pro.


Errrrr, at best a dedicated amateur.

I'm just 3.5k (I think that's 70th percentile), but I know lots of 4.5k players. To describe the average 4.5k player: probably has regular groups of people they play with at different skill levels (anywhere from 2.5-5k+), regularly plays battle cup on Saturdays, maybe played amateur JoinDota league, maybe had a laugh and played open qualifiers only to lose in the first couple of rounds, and probably logs between 10-20 hours per week in the game.

4.5k players know how to play to a very good standard and beat the vast majority of other players, but are miles away from the weakest of the professional scene. 4.5k doesn't even appear on the leaderboards.


I definitely agree with your statement, however...

Solo, the captain of Virtus Pro, was at 4k for the longest time. There's more to dota than just MMR.

There are players in the 5.5-6k range who still don't understand the basics of team play, but are extremely mechanically gifted and in great gaming shape.


4400 isn't even good enough for amateur tournaments.


> The most hilarious part was when OpenAI paused the game, then resumed it. This illustrates that there is still some unexplainable randomness.

I asked about this in a previous thread[1] and received a response that a network blip caused all the players to drop from the game, in which case OpenAI Five was programmed to pause.

[1] https://news.ycombinator.com/item?id=17700001

EDIT: Fixed thread link.


From reading their website, each bot gets its own neural network and its own reward function, so in that sense they are 5 separate agents.

When training starts out, the bots solely focus on their own reward functions. This is so they can learn very basic things like how to move around, how to attack, and so on.

Over time, the combined team reward function gradually gets weighted more and more heavily, so that teamwork is encouraged.
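I believe OpenAI calls that weighting "team spirit" in their blog post. Roughly (my paraphrase; the actual annealing schedule isn't something I know), each bot's effective reward is a blend of its own reward and the team average, shifted toward the team as training goes on:

    def blended_reward(own_reward, team_rewards, team_spirit):
        # team_spirit = 0.0 -> purely selfish (early training: learn to move, farm, fight)
        # team_spirit = 1.0 -> purely team-average (late training: encourage cooperation)
        team_avg = sum(team_rewards) / len(team_rewards)
        return (1.0 - team_spirit) * own_reward + team_spirit * team_avg

    # blended_reward(2.0, [2.0, 0.0, 0.0, 0.0, 0.5], 0.3) -> 0.7*2.0 + 0.3*0.5 = 1.55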


The pause was because human players disconnected. They covered that in the post match session. Looks like no one heard that.


> Therefore, the smartest pro team should study what the bot is doing and trust that on average it's a better idea to always focus on the ranged barracks first. After all, if it was a bad idea, they probably wouldn't do that.

This is exactly what is happening with Go right now. Many pros are emulating and learning from AlphaGo (Zero) and are starting to play moves that were always thought to be suboptimal until now.


As someone who has no knowledge whatsoever about Go, is there some way I could learn more about what you mention? I’m really curious to see how people are trying to learn from the bot, but I’m not too sure how to find out.


> The most hilarious part was when OpenAI paused the game, then resumed it. This illustrates that there is still some unexplainable randomness.

This was done by the game coordinator -- the humans' machines DC'ed during the game. https://news.ycombinator.com/item?id=17700233


I heard that the pause was a behavior scripted by the developers, which happened in response to a network connection issue.


Yeah there's no way they're explicitly integrating pauses-in-time into the winrate calculation... There are so many other central/meta game mechanics that they haven't solved yet, it wouldn't make any sense to incorporate a behavior like strategic pausing.


I wonder if AI self-play has any implications for e-sports balance. Could balancing be aided, or at least improved, by ultra-strong AI players, reducing the chance of broken mechanics making it to live?


Sure seems like it would. A big part of DOTA balance these days is Icefrog (DOTA's BDFL) talking to pro players about the meta-game and observing pro play to figure out what should change. Having pro-level AIs that you could set loose to play 100,000 games and then come back to you with statistics about the current patch would probably help a lot.

I think it would also raise the level of play that we see from pros. That's what happened with chess when bots got good enough to beat the best human players.
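As a sketch of what "come back with statistics" could look like (names and thresholds are my own invention): aggregate per-hero win rates over a big batch of self-play games and flag the outliers for Icefrog to look at.

    from collections import defaultdict

    def hero_winrates(games):
        # games: iterable of (winning_heroes, losing_heroes), each a list of hero names
        wins, plays = defaultdict(int), defaultdict(int)
        for winners, losers in games:
            for h in winners:
                wins[h] += 1
                plays[h] += 1
            for h in losers:
                plays[h] += 1
        return {h: wins[h] / plays[h] for h in plays}

    def flag_outliers(winrates, low=0.45, high=0.55):
        # heroes outside the band might need a balance pass (thresholds are arbitrary)
        return {h: w for h, w in winrates.items() if w < low or w > high}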


One specific thing that bots could help Icefrog solve: Many games are draft wins, i.e. a >90% chance of winning solely due to the heroes you've picked. There's almost no point in playing these games out for 20-40 minutes; it's boring for everyone involved.

This will always exist in dota, but being able to simulate hundreds of years of drafting + games could help reduce this.


I really doubt that at TI level the drafting advantage is significant enough to give one team a 90% chance of winning. My guess would be closer to 2/3, granted that's still just a complete guess. I think analysts often use the draft to justify the outcome of the game in retrospect, but that doesn't necessarily bear on the reality of the situation. Though, I would be really interested to see some data on the significance of drafting.


They've already made great strides in expanding the pro-level meta hero pool over the last several major patches. As an observer, it feels like the viable pro hero pool is bigger than ever this year at TI8.


>Many games are draft wins, i.e. a >90% chance of winning solely due to the heroes you've picked.

Can you explain?


Each team of five picks the heroes they will play the game with. This is like an extra-complicated game of rock-paper-scissors. At the end of the picking, one team's composition may be so set up to exploit the weaknesses of the other that the actual 30-40 minute game is almost pointless.


You pick back and forth though, right? How does one screw up the draft so badly as to have only 10% chance of winning?


Not mentioned was that each team gets to ban out 5 heroes as well.

This is a bigger factor in my opinion. Each team alternates two bans, then two picks, then there is another round of two alternating bans followed by two picks and finally a single ban and pick round for the 5th hero.

Various in-meta heroes are usually "first pick/ban worthy", which means they tend to get picked or banned in the first phase and to shape the rest of the draft, as teams build the core of their strategy around their first-phase heroes or around countering the opposition's.

Another strategy is to avoid "showing your hand" during the first phase by picking strong but generic heroes that can fit into many potential lineups, to keep the opponent guessing. This leads to a lot of mind games where even commentators don't know what role a hero is going to be played in until the culminating 5th pick, when the draft comes together.

Some teams are very good at specific strategies, or have certain players exceptionally skilled at individual heroes, which necessitates certain first-phase bans against them lest they have an advantage.

For instance, if a team is known for having a player good at the hero "Wisp", it will often force out a first-phase Wisp ban from opponents, because it's the kind of hero that, when played well, can be an absolute nightmare to play against.

In some ways I find the draft mini-game just as interesting as the main game, especially in the longer tournaments where you can see new metagames emerging as captains adjust their pick strategies.


There are so many drafting combinations that it's not quite obvious it's a bad idea until you hit the 3 minute mark.

Take OpenAI game 3 as an example. In the first two games, OpenAI wiped the floor with the humans and taunted them with its >90% chance of winning. In the third game, OpenAI was saying the bots had a >80% chance of losing by the 5-minute mark. The sole difference was the draft.


Training costs would have to drop significantly. It's much cheaper to take a PR hit and apply a hotfix than to spend $100k+ on building a self-play AI.


If it were really just $100k, it would be a huge win to have the AI. That's one developer for three to six months.

(I'd handicap the crossover somewhere between one and two orders of magnitude more expensive than that.)


The AI cannot pause the game. That's not one of its available actions.


Yes. Yes it is.


No. No it isn't.

Go and look how it works: https://blog.openai.com/openai-five/


I'm sorry, I must be missing the part where your link says that pause is not a valid action available to the bots just as it is available to the humans.



