The International 2018: Results (openai.com)
106 points by legatus 7 months ago | 36 comments

It's clear that there's still a long way to go. Too much manual feature engineering and hand-tuning is needed right now because of the extremely weak reward signal in current reinforcement learning algorithms, which cripples OpenAI Five in the later stages of the game.

The cool thing is that brute-force computational power seems to get us decently close. I'm optimistic that with renewed interest in the reinforcement learning field, breakthroughs on the algorithmic side are only a matter of time.

> We don’t believe that the courier change was responsible for the losses. We think we need more training, bugfixes, and to remove the last pieces of scripted logic in our model.

It is interesting to consider that manually scripted behavior may have been responsible for some of the stranger moments in the matchups; possibly they had hardcoded place-ward() routines that resulted in all of that clumped-together warding... or maybe even "check-roshan-pit()" without adequate training data to identify Roshan's respawn patterns?

The item purchasing is still scripted, which means that Five receives wards whether it wants them or not. One explanation for the ward dumping is that Five is just trying to free up inventory space.

Use the backpack? A hard five rarely owns 9 items and wards stack.

Roshan checking behavior evolved more naturally than the warding. At the start of training, Roshan was changed to have very low health so the AI could stumble into the pit and defeat Roshan 1v1. Eventually the AI learned to check Roshan and to group up to defeat it, but for some reason it never managed to learn about the respawn cooldown.

While there are certainly broader insights to be made, one tidbit I think I noticed that interested me had to do with Axe.

Between the two games, we saw Axe played by both the human team and the AI team. When played by the humans, blink-calls were completely shut down by the AI's superhuman counterinitiations. That made enough sense. When Axe was played by the AI though, I don't recall Axe ever even attempting any blink-calls. I'm curious if this might be the result of the AI overfitting to itself -- at AI reaction speeds, blink-calls are not a very useful maneuver, and thus the AI learns not to perform them.

Against a group of humans though, Axe's blink-call initiations are arguably the hero's biggest selling point.

We didn't get to see most of the hero pool, but I wonder how much the AI overfitting to AI playstyles will hinder the bots against humans in the future.

Of course, the bots have many other issues which loom larger atm imo, but I was interested enough in this tidbit to point it out.

The reaction speed should be random during training (80-200ms or more, maybe with quite a long latency tail) instead of fixed. That would be more consistent with human behaviour and would let abilities sometimes work, so the bots are more likely to try abilities when the payoff makes sense...

Dota is balanced around human reaction speeds so it makes sense to have an equivalent reaction distribution, otherwise, it gets arbitrary where a bot trained on 200 ms may behave differently to a 20ms bot.
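To make the idea above concrete, here is a minimal sketch of sampling a per-action reaction delay instead of using a fixed one. The 80-200 ms range comes from the comment; the tail probability, the exponential tail, and the function name are assumptions for illustration, not anything OpenAI has described.

```python
import random


def sample_reaction_ms(rng, low=80.0, high=200.0, tail_prob=0.1, tail_mean_ms=150.0):
    """Sample one reaction delay in milliseconds.

    Most samples are uniform in [low, high]. With probability tail_prob an
    exponentially distributed extra delay is added to model an occasional
    slow reaction. All default values are illustrative assumptions.
    """
    delay = rng.uniform(low, high)
    if rng.random() < tail_prob:
        # expovariate takes the rate, i.e. 1 / mean.
        delay += rng.expovariate(1.0 / tail_mean_ms)
    return delay


# Draw a fresh delay for every decision rather than fixing one value.
rng = random.Random(0)
delays = [sample_reaction_ms(rng) for _ in range(10_000)]
```

A bot trained this way would sometimes face an opponent that reacts late, so an initiation would occasionally land and could be reinforced.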

Axe also seemed to have no idea how its ult worked. It would simply use Culling Blade for the small amount of damage each time rather than waiting for the hero to get below the health threshold.

Or maybe the blink-then-call sequence was not tried by the bot in training. Therefore it never got rated.

That's the thing I tried to understand. The blink-to-call is a very specific sequence. You'd have to blink into the middle of several enemy units and call them. What are the chances of the bot doing that randomly and therefore understanding its value?

Especially if the bot realizes random blinks are a waste of resources, or that jumping into the middle of the enemy is a bad strategy, the chances of it trying that with Axe and therefore learning it get lower.

Blink-call seems to be a very predictable strategy for Axe. If there are more than 2 enemy heroes next to each other, with X number of teammates within close proximity to deal supporting DPS, blink-call.
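The rule above could be hand-coded roughly like this. It is only a sketch of the heuristic as stated: the `Unit` type, the function names, and every distance and threshold are hypothetical placeholders, not values from the game API or from OpenAI Five.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Unit:
    x: float
    y: float


def count_within(center, units, radius):
    """Count the units within `radius` of `center` (inclusive)."""
    return sum(1 for u in units if hypot(u.x - center.x, u.y - center.y) <= radius)


def should_blink_call(axe, enemies, allies, blink_range=1200.0, call_radius=300.0,
                      follow_up_radius=900.0, min_enemies=3, min_allies=1):
    """Return True if blinking onto some enemy would catch enough clumped
    enemies while enough teammates are close enough to follow up.

    All numeric defaults are assumed placeholders.
    """
    for target in enemies:
        # Skip targets that are out of blink range.
        if hypot(target.x - axe.x, target.y - axe.y) > blink_range:
            continue
        caught = count_within(target, enemies, call_radius)      # includes the target itself
        backup = count_within(target, allies, follow_up_radius)  # teammates able to deal damage
        if caught >= min_enemies and backup >= min_allies:
            return True
    return False


axe = Unit(0, 0)
clumped = [Unit(1000, 0), Unit(1100, 0), Unit(1000, 100)]
nearby_ally = [Unit(500, 0)]
decision = should_blink_call(axe, clumped, nearby_ally)
```

A learned policy would have to discover the same condition through exploration, which is the difficulty raised in the comments above.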

How much training data were these bots given, though?

I wish they’d use a better name for their engine than “Five”. Half the time I get confused about whether they’re talking about five players or the Five engine or 5 of something else.

Use a unique name like “Galaxy” which doesn’t represent anything remotely close in the game - Spell names, skills, etc. There is a huge amount of stuff going on in the game and it’s such a heavy cognitive load for an outsider who doesn’t play Dota, it was annoying to keep checking if they meant the name of the engine or number five in a game of Five vs Five. Or Five vs 5!!!? I’m so damn confused.

Same thing here: https://openai.com/five/

Number #2 bullet point says “Defeat five of the world’s top professionals

Five will attempt this live at The International in Vancouver’s Rogers Arena this week!”

It is such a poor choice.

The capitalization is obvious enough for me.

I was listening to the commentary on Twitch and it was a mess.

Here are some insights into how OpenAI fine-tuned the rewards and short-term actions of the bots https://gist.github.com/dfarhi/66ec9d760ae0c49a5c492c9fae939...

The numbers seem pretty arbitrary to me, that's probably what this blog post is talking about when it mentions why it lost.

Since these are hyperparameters, some of which are annealed over the entire training period, and given the fact that the training required ungodly amounts of computing time, I think it is just impractical for them to have fully checked whether they were set optimally. They probably went with what seemed good, and trusted deep networks to pick up the slack. (this is total speculation on my part).

I do think if they'd used some more sophisticated RL algorithms, perhaps with intrinsic curiosity or some kind of hierarchical task learning, they might have been able to reduce their training time and maybe tune their hyperparameters a bit more.

One of the clear weaknesses of the current OpenAI Five is its warding, with wards oftentimes being placed in bases. Perhaps the amount of vision that the team has could be a short-term reward. Likewise, it currently does not engage in much counter-warding with sentries.

The game vs. Pain clearly demonstrated how humans can use wards to gain an information advantage over the bots that otherwise had a great chance of winning the game.
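As a rough illustration of the vision-as-reward suggestion, a shaping term could add a small bonus proportional to the fraction of the map the team can currently see. Everything here is an assumption made for the sketch (the grid-cell representation, the function names, the weight); it does not describe OpenAI Five's actual reward.

```python
def vision_bonus(visible_cells, total_cells, weight=0.05):
    """Bonus proportional to the fraction of map cells the team can see.

    `weight` is an assumed hyperparameter that would need tuning.
    """
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return weight * (visible_cells / total_cells)


def shaped_reward(base_reward, visible_cells, total_cells, weight=0.05):
    """Add the vision bonus to whatever reward the agent already receives."""
    return base_reward + vision_bonus(visible_cells, total_cells, weight)


# Example: half the map is visible and the bonus weight is 0.1.
example = shaped_reward(1.0, visible_cells=50, total_cells=100, weight=0.1)
```

The weight matters: if it is too large, the bots might farm vision instead of trying to win.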

Idk anything about any of it. But what if OpenAI is "bad" at ward placement because it plays itself so much that it already knows exactly where it would be if it were the other team?

The developers talked about the bad warding, and it's a combination of them actually being bad at it, and not knowing how to manage their inventory so they'll just plant wards in order to free up inventory space.

I hope they continue the project over the next year. I am really curious if they will be able to teach the AI to be better at late game strategies and long term planning. (Item builds might be an interesting challenge in this area as well)

Someone asked me, "How does OpenAI know the MMR of its bots?"

I don't know. I assume it's similar to how AlphaGo measures its Elo rating. But the strange thing is, this is hundreds of years of self-play, not a public pool of humans playing against each other. How does MMR in simulation translate to MMR in real matches?

Before pointing out that it's possible with an Elo rating, consider that Dota MMR is a bit different: every game you win, you get +25. Every game you lose, you get -25. This changes at the very high / very low levels, or if the matchups are wildly imbalanced, but that's the general setup. Or it was, a few years ago.

Does anyone have a guess?

We estimate it based on when we start being evenly matched against teams with a given average MMR. (As you might expect, our usual pattern is to lose to a test team consistently, then start being evenly matched, then consistently beat them.) The version of this diagram on our Five page also includes bars indicating when we started and stopped testing against various teams: https://openai.com/five/.

We don't claim this is perfect, which is why we label the chart "estimated MMR".
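One way to picture the "evenly matched" estimate: take one bot version's results against test teams of known average MMR and interpolate the MMR at which its win rate crosses 50%. This is only a sketch under that assumption. The function name, the data layout, and the linear interpolation are hypothetical and are not OpenAI's actual procedure.

```python
def estimate_mmr(results):
    """Estimate MMR as the opponent MMR at which the win rate crosses 50%.

    `results` is a list of (team_avg_mmr, wins, games) tuples for a single
    bot version. Returns None if the results never bracket a 50% win rate.
    """
    points = sorted((mmr, wins / games) for mmr, wins, games in results if games > 0)
    for (lo_mmr, lo_rate), (hi_mmr, hi_rate) in zip(points, points[1:]):
        # Win rate should fall as opponents get stronger; find the bracket
        # around 50% and interpolate linearly inside it.
        if lo_rate >= 0.5 >= hi_rate and lo_rate != hi_rate:
            frac = (lo_rate - 0.5) / (lo_rate - hi_rate)
            return lo_mmr + frac * (hi_mmr - lo_mmr)
    return None


# Made-up numbers purely for illustration.
estimate = estimate_mmr([(3000, 8, 10), (5000, 2, 10)])
```

With only a handful of games per test team the estimate is noisy, which fits the "estimated MMR" label on the chart.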

Ah, that seems like an excellent system. It removes all of the guesswork of some formula-based solution. Empirical observations tend to be more accurate.

It might be worth hiring one of the top dota pros to coach OpenAI's bots: they could point out each small mistake that the bots made which pros wouldn't make. That might make it more tractable to beat the top team next year. (The actual dota coaches might be even better for this purpose than the pros, too.)

Looking forward to the outcome! The goal of "beat the top dota team" is one of those ambitious ones that few companies take on. It's really interesting to see the incremental progress.

How would the pro point out the mistakes to the bot? The bot cannot understand human language.

Not to the bot, but rather write down the mistakes and decide (as an AI programmer) how best to address them.

The solutions would probably take the form of adjusting weights, adding additional dimensions, removing or refactoring existing signals, or deciding that extra training would naturally solve the problem.

Last time (like 2 years ago, haven't played much since) I checked it wasn't exactly 25 MMR per match. It does depend which team has higher overall MMR to begin with, and values adjust from that 25 value.

I had an MMR of around 4.4k solo/team. I think the average player had 2.25k MMR, and the standard above-average player had around 3.5k to 4k MMR. Anything past 4k MMR made matches exponentially more intense. It's always been this way as far as I remember, even back in the DoTA 1 days.

4k MMR in DoTA skillwise is about the equivalent of Platinum in Rocket League, or Diamond in Overwatch.

The constant +-25 is just a consequence of the matchmaking enforcing balanced MMR matchups. It's not like chess where you're more likely to have some kind of imbalance. That's why it breaks at the extremes.
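The point about balanced matchups can be checked with the standard Elo update. If the expected score is 0.5, the change is always plus or minus half of K, so K = 50 reproduces the +-25. Valve's real MMR formula isn't public as far as I know, so K = 50 and the 400-point scale below are assumptions chosen only to match that number.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo expected score for side A against side B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def rating_change(rating_a, rating_b, won, k=50.0):
    """Elo rating change for side A. k=50 is an assumption, not Valve's value."""
    actual = 1.0 if won else 0.0
    return k * (actual - expected_score(rating_a, rating_b))


# Evenly matched teams: expected score is 0.5, so the swing is k / 2.
even_win = rating_change(4000, 4000, won=True)
even_loss = rating_change(4000, 4000, won=False)

# A 400-point favourite gains much less for a win and loses much more for a loss.
favourite_win = rating_change(4400, 4000, won=True)
favourite_loss = rating_change(4400, 4000, won=False)
```

Under these assumptions the swing only drifts away from 25 when the two sides' ratings differ, which is why the constant breaks at the extremes where balanced matches are hard to find.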

Progress is much farther along than I expected. At the same time I wonder how well the bots adjust to the meta. Dota 2 changes every time a new patch rolls along. I also wonder if bots could make some of the more unconventional picks work. As an example, I love Medusa, and she's one of the least picked heroes in pro Dota.

>Progress is much farther along than I expected.

I had thought this as well, but the more I think about it the more I'm questioning how far along the bots actually are.

They're excellent at teamfighting. No doubt about it. But they seem to be inferior to humans at nearly everything else. In basically anything that requires strategic thinking, the bots pale in comparison to even a 3-4k MMR player. I'm really starting to suspect that the bots have been successful purely due to their tactical abilities. And I see very little reassuring progress in strategic abilities tbh. For example, the highground push with no buybacks on the AI side that ended in a 5-man buyback and teamwipe from the humans was a glaring strategic error. The warding is bad. The Rosh play is terrible. Item choices (another enormous strategic element) are still scripted. Nearly all heroes with abilities that require significant strategic thinking are left out of the AI's hero pool. Axe was pushing Radiant top without a TP while his base was getting destroyed and all of his other teammates were dead.

On one hand, it's easy to feel like OpenAI Five is making good progress because it's legitimately challenging pros. Upon deeper analysis though, the bots haven't actually demonstrated ability at more than a 2.5k-mmr level on anything other than laning, teamfighting, and perhaps defensive map movement. Given that I haven't seen much evidence that the AI is making progress on the truly strategic elements of the game, I'm not entirely convinced that what we are seeing isn't close to a plateau already.

More breakthroughs are certainly possible, but I've seen enough to make me more skeptical than I was before I'd seen OpenAI Five play with my own eyes.

I agree with you though I still think what the OpenAI team has achieved is impressive. Dota has two levels: the strategic and the tactical. The AI is very good at the tactical level (teamfighting) but poor at the strategic level. Perhaps they could train a "Captain AI" through replay analysis that would direct the player AIs?

I could see the Captain AI having various modules to help it decide what should be done next. For example, a drafting module, a module to predict where the enemy players currently are (some kind of threat heatmap?), what the enemy players are likely to do next (i.e., where they will farm, smoking, roshing, etc.), and what items the team needs (i.e., centralized item purchase decisions). I imagine these modules would be trained in an entirely different way from the way the AIs are trained in-game.

Technically, I can't help but wonder if some kind of hybrid ANN-expert system architecture wouldn't work better. ANNs are rightfully popular due to their effectiveness but they seem so inefficient in this case. A few hand-coded rules could eliminate the basic mistakes the AI makes, and for the rest they could evolve the rules using some type of genetic programming. The latter would still allow the AI to come up with its own novel approach to the game.

I feel the bots' greatest strength is their "aimbot" abilities. They have such great precision from their "targetEnemy(obj)". They know perfectly whether they are in range or not. They are never looking at the other side of the map.

Without this they would get stomped.

I think you're right. The mechanics of the game are HUGE. From misclicking, miscasting, mistiming a spell, last hitting, etc.

Because Medusa has many counters, and she herself counters only a few heroes (Terrorblade, for example).

Question for the OpenAI team: have you ever thought about applying Five to other games, as it is?

I'd suggest Total War: Warhammer II (it has an interesting competitive scene, and has both tactical combat, and strategy gameplay), which is very different than DotA 2, but it would be super intriguing to see how it performed, and how fast it could learn.

If you built Five in the right way, it should be able to learn almost any other game.

I can also imagine you offering this to gaming companies in the near future, so that they could provide a decent computer AI instead of the crap they usually offer now :)

(source: I used to be a videogamer in my teens, and occasionally still play some strategy games, not as often as I would like to :D)

IIRC some of the main reasons OpenAI chose Dota2 were that it ran on Linux and had a bot API. Does Total War fit these requirements?

Starcraft is what you are thinking of, specifically sentdex makes a lot of tutorials regarding this.


DoTA has always been much more micro-focused, whereas Starcraft is both macro- and micro-focused. In fact, the popularity of the game stemmed from this. MOBA (the genre DoTA is based on) is essentially the "fun" parts of Starcraft: controlling and leveling up your hero.

MOBA is from a game called Aeon of Strife, a custom Starcraft 1 modded map. I played it a few times growing up.

When I hear about AI and video games, I get slightly confused though with this terminology. We've had hardcoded "AI" for 20+ years already that's actually been fairly decent. "AI" was also the official term used in many of these games.

When I hear of AI and video games now, I assume everyone is talking about neural networks, machine learning, etc.

Starcraft. But I think it’s far more challenging to build AI for.
