How to Train Your OpenAI Five (openai.com)
87 points by dsr12 on April 17, 2019 | 44 comments

I'm still kind of disappointed with how much hubbub came out of these games. I watched most of them and in general it felt like mediocre Dota, even against the pro team. I'm not saying I am amazing, but I wish this article had come out sooner to put a reasonable contrast on the stuff that was coming out. While I enjoyed the idea of the drafting phase, it is hard not to look at the available hero pool and see only two or three reasonable lineups (both of OG's lineups tried to take advantage of invis heroes and were weak in general in the current version), and I think the shown drafts display that. It's hard not to feel that OpenAI solved the problem of that limited hero pool and ended up with a superior team to humans who weren't familiar with that version of the game.

I don't think that's a wrong outcome, but framing it as OpenAI beating OG at their own game, along the same lines as Go, chess, or checkers, seems misleading. I think I am just still sensitive to the hype around, and eventual outcome of, systems that got on the press pipeline, like Watson. I wish the coverage was more about celebrating how they were able to create this ensemble learning environment, so I'm hoping for a more detailed analysis, especially of how they were able to adapt (and hopefully not throw away) the learning they did after some of the really large patches.

There are so many ways in which I feel this argument breaks down.

* OG would absolutely handily win against any non-professionally-contracted dota team under the exact same challenge conditions. (i.e.: OpenAI Five's test was about as good as they could give it at this stage.)

* OG's play within the scope of the game was novel, in that they were deliberately picking strategies that OpenAI Five didn't "understand" (i.e.: creep skipping) and still lost.

* OpenAI's play was novel, even within the constraints of the pool: No human team would pick the 4-core + CM lineup that OpenAI favoured. Even if you played with that limited hero meta, 4-core would likely not end up being a deliberate strategy picked by human teams with any frequency.

* OpenAI's play in all team fights was demonstrably better, playing exactly up to the limits of the heroes they were using and no further. (It is, in fact, OpenAI's teamfight which won them the game.)

* A number of the reductions and restrictions in hero-pool were to allow for OG to have a fighting chance: All heroes with controllable illusions or clones were banned because OpenAI had too much of an advantage in "micro".

If you sum all of those things up, I think it looks painfully obvious that OpenAI Five, if trained against the full hero pool, would completely and utterly dominate gameplay at the professional human level. It wouldn't even be close.

> OG would absolutely handily win against any non-professionally-contracted dota team under the exact same challenge conditions.

Because OG are mechanically better than them.

Flawless reactions make for a big difference (see euls/hex scripters for the impact of that on a game). Throw in perfect global state and zero mistakes, it becomes very very hard to out-skill.

The question that people actually care about is whether the AI can be strategically better. If you watch pro dota you'll see that there are often teams which flash, briefly super strong and then fall off as better teams study them and learn to counter their play. OG themselves have had serious problems with CIS teams that play fast (making them arguably the worst team vs a fast playing AI).

The hero pool is also important, as currently it's very limited in offering counters to deathball.

> OpenAI Five, if trained against the full hero pool, would completely and utterly dominate gameplay at the professional human level. It wouldn't even be close.

Fundamentally disagree. The mechanical advantage increases but the strategic disadvantage also increases. Time will hopefully tell.

> The question that people actually care about is whether the AI can be strategically better.

We already know the answer to that; computers dominate humans strategically. There are two types of strategic decisions - those made using maths, and those made by guessing. Computers are perfectly capable of doing both, and better at at least one. Besides, Go is far more strategic than DotA and computers are better than us at Go.

The reason we are seeing research in videogames is purely tactical. Historically computers couldn't compete tactically with humans because there were too many variables and decisions to write up a set of if-then rules that made sense. DotA is 95% a game of mechanical domination, 5% strategic concerns. The strategic concerns could be modelled with some pretty basic Bayesian models and probably be world class; it is item-hero combinations, a few rules about location on the map and guesswork.

All the challenge is taking that strategy then implementing it mechanically - doing all the stuff OpenAI 5 is demonstrating regarding jukes, tactical positioning, spell timings and judging relative strength with incomplete information.

DotA isn't as fast as it looks either, it is quite a slow game in terms of reflexes. Reflexes tip close games, but positioning tips the average game. OG was losing because computers are just better at estimating margins of safety.

> A number of the reductions and restrictions in hero-pool were to allow for OG to have a fighting chance: All heroes with controllable illusions or clones were banned because OpenAI had too much of an advantage in "micro".

This one is not so certain. One of the OG players (Notail) mentioned after the game that the AI definitely had a few weaknesses such as:

* AI didn't bother checking the trees after the splitpushing hero disappeared from the map.

* AI doesn't handle invisible units well.

* AI has poor warding.

He said he regretted not exploiting these weaknesses more. There are certain splitpush ("rat Dota") heroes that potentially could have helped OG to the detriment of the AI, e.g. Antimage, Furion, Morphling, Juggernaut, etc. Instead, what we saw was OG using a non-traditional split push hero (Viper) from the limited hero pool to try to split push, which failed because the hero got ganked too often (Viper does not have a natural escape mechanism, unlike many of the splitpush heroes above, and he didn't itemize for splitpush, e.g. Shadowblade as first item) and because it doesn't push very fast. With these splitpush heroes and well-placed observer wards, I think a splitpush-based lineup could possibly take the game.

OpenAI's play was very impressive. I did not see this coming after OpenAI lost to some of the weaker pro teams at last year's TI. However, I would give OG even chances at winning a longer series (Bo5 or Bo7) over 5 or 7 days where they could learn from OpenAI's playstyle and figure out strategies on how to exploit them similar to how in longer Dota 2 tournaments, we see initially successful strategies on Day 1 get countered by the tournament final. Another analogy is the 1v1 bot that OpenAI released. When released, even pro players would lose to it. However shortly after, the combined intelligence of the Dota 2 community discovered an exploit where if you lead the enemy creepwave into the jungle, the AI bot glitches out and the human player eventually wins [1].

[1] https://www.reddit.com/r/DotA2/comments/6t8qvs/openai_bots_w...

> AI didn't bother checking the trees after the splitpushing hero disappeared from the map.

This isn't always true; the AI did check trees near radiant bot T1 in the co-op match.

Also, I am hoping OG competes against OpenAI during this weekend and streams it. It would be exciting to watch.

Not sure I buy the argument that humans can learn to beat an AI by exploiting its weaknesses. This argument has been made many times. Kasparov said something similar when he lost to Deep Blue, yet not too long after that chess engines became unbeatable.

That argument fails once the game is “solved”, but OpenAI with its current limited, same-kind-of-heroes pool is far from that. The current hero pool is mostly about exact execution and coordination, which is amazing in and of itself, but strategic heroes like Ember Spirit, Arc Warden, and Nature's Prophet are the ones that bring Dota 2 to a much higher strategic level.

Well maybe, but so far we have seen that AI improves quickly enough that other factors are irrelevant.

What I am generally pointing out is that if you had a professional team or a high level amateur team that spent a year practicing a fundamentally different version of the game and they beat OG that really doesn't say a lot. It's like if a pro team played an odd custom game mode against people who played it all the time and already knew some specific gimmicks. Dota just in general is much less a hard input mechanics game compared to most which makes it interesting as a project for AI research but makes dropping in a bunch of players even harder. In my opinion if an FPS game played competitively is 80% raw input skill and practice/20% knowledge of the game, dota is closer to 30%/70%.

Putting a Riki mid in the first game and picking Slark in the second doesn't really tell me a whole lot, since OpenAI's team bought dust against Riki, and the hero pool says 'early push and win or lose', which generally defeats a Slark that needs items this patch (47% win rate at this point). They clearly indicated in prior posts that buying dust and dealing with invisibility are part of the bots' skills, and it's really not hard to end up with a rule like: while in an attack/aggressive state, if you see Riki, use dust if available.

Many teams below the top tier use a 4 core/hard 5 strategy and it's reasonably effective. The 4-defend-1/3 core strategy generally only works because Roshan/Aegis is in the game, letting the min/maxed core that's ahead take additional chances with less risk, and also when there are clear heroes that are just a tier above the rest in the current patch or meta. If you take that out or remove Aegis planning from your team, 4 core is almost trivially the way to go. Even in my casual experience this is generally how I've ended up playing, so maybe it's less surprising to me. Once again I ask: given the format, was a professional team built around a 3 core strategy/mindset really the right team to play in this mode?

The two drafts for OpenAI were Sniper, Gyro, CM, DP, Sven and CM, Gyro, Sven, WD, Viper. Both of these are push-to-win-at-20-minutes drafts and they were played as such. I did not feel like OG used enough ordinary, basic strategies to carry them to the mid-late game, which is where their drafts indicated they wanted to go (ES, WD, Viper, Riki, SF and Sniper, ES, DP, Slark, Lion). They tried some clear gimmick strategies just to see if the bots were totally broken; if they had intended to do creep skipping they would have picked Axe, who was in the pool. It's hard for me to feel like they played better when their draft hit its natural power spike. If OpenAI had been held off, or if a more reasonable hero than Riki/Slark had been chosen along with a more coherent strategy for the heroes provided, it would have been a better game by the time it hit 30-40 minutes. Instead their experiments with gimmicks didn't really pay off, they lost to the hard push team, and they didn't have a strong hero set to push out the lanes and create an opportunity if they failed.

Also, the OpenAI drafts are ones that I feel like I was constantly playing in 2012-2013ish, until a lot of new heroes were added to the pool and made them less popular and relevant by introducing movement/displacement mechanics that make the original hero pool less practical. Since those new heroes are taken out, this old form got put back in.

I really doubt that illusion control/micro control would have really made a difference and if it did I would have actually preferred to see that because that is new and novel. Dota is very different to previous RTS game AI's where every unit if controlled properly was of equal strength and could be maximized. Dota illusions generally are just worse in every dimension and have value in disjointing spells, doing rat things, vision, and maximizing some items/abilities that have fixed attack modifiers but otherwise are not a primary source of damage (100 damage with a 33% damage illusion after 15 armor is like 17 damage which is not a big swing when heroes regenerate constantly). Compare this to mutalisk micro where a perfect division of an attack from a group can mean two or three units instantly killed instead of one which generally would be the limit of human control. Divide and conquer with illusions for attacking purposes I would say is generally a bad strategy because I still believe killing the hero in front of you before attacking the next guy is still probably the best option. I find it hard to believe that perfectly controlled illusions would affect the game in a meaningful way compared to highly skilled players, and would have loved to see that. Sure some really great Meepo players can shine, but the game inherently has solutions to that problem and most others (echo slam, burst magic damage). In fact that's the kind of thing that I enjoy the most out of watching AI driven gameplay is seeing when they push the limits of what humans can even accomplish and to see if I can learn anything from it.
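The "about 17 damage" figure above can be reproduced with a short calculation. This is only a sketch: it assumes the armor formula Dota 2 used around patch 7.20, reduction = 0.052 * armor / (0.9 + 0.048 * |armor|), and the function names are made up for illustration.

```python
def armor_damage_multiplier(armor: float) -> float:
    """Fraction of physical damage that gets through a given armor value.

    Assumption: the armor formula in use around Dota 2 patch 7.20.
    """
    reduction = (0.052 * armor) / (0.9 + 0.048 * abs(armor))
    return 1.0 - reduction


def illusion_hit(base_damage: float, illusion_fraction: float, armor: float) -> float:
    """Damage one illusion attack deals after its damage penalty and target armor."""
    return base_damage * illusion_fraction * armor_damage_multiplier(armor)


# 100 base damage, illusion dealing 33% damage, target with 15 armor.
hit = illusion_hit(100, 0.33, 15)
print(round(hit, 1))
```

Under that assumed formula, 15 armor lets roughly 52% of physical damage through, so the 33 damage from the illusion ends up close to the 17 quoted above.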

There are also plenty of real-world equivalents where changing the ruleset means other people do better, like speed limiting in NASCAR at Talladega, a football team that is better in the snow, or a hockey team that plays in a rink with different dimensions. I just wish we had ended up with a more reasoned view of what happened and could celebrate their actual achievements in context, rather than saying they beat a pro team at their own game, which is kind of where the headlines have been going. This is just fundamentally different from what AlphaGo or Deep Blue did, where they soundly and completely conquered on the same playing field.

I was making slides to discuss this model in my deep learning class tomorrow. From what I can find on the model, it isn't that complex -- https://d4mucfpksywv.cloudfront.net/research-covers/openai-f....

The real trick seems to be just massive amounts of self-play. This model had the human equivalent of 45,000 years of experience playing DOTA 2, with 250 years of experience per day.

Please help me by correcting any misunderstanding I have:

So the large section is run for each unit that is known to the agent. For heroes, each modifier / ability / item that hero possesses is processed. Since it’s max-pooled, only one of the modifiers / abilities / items would actually be considered by the agent. Later, the known units are grouped by their relation to the agent and max-pooled again, so the agent would only consider a single enemy non-hero and single enemy hero at a time. Also, what activation function is used for the FC (fully-connected) layers that aren’t relu?
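One point that may help with the max-pooling question: if the pooling is element-wise over embedding vectors (the usual meaning of max-pooling over a set, but an assumption here since the linked diagram is not reproduced), then different output dimensions can come from different items, so more than one item can influence the result. A minimal sketch with made-up 3-dimensional embeddings:

```python
def max_pool(vectors):
    """Element-wise max over a list of equal-length embedding vectors."""
    return [max(column) for column in zip(*vectors)]


# Hypothetical embeddings for three items carried by one hero.
item_embeddings = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.0, 0.3, 0.7],
]

pooled = max_pool(item_embeddings)

# For each output dimension, record which item supplied the maximum.
sources = [
    max(range(len(item_embeddings)), key=lambda i: item_embeddings[i][d])
    for d in range(len(pooled))
]

print(pooled)
print(sources)
```

In this toy case each pooled dimension comes from a different item, so the pooled vector carries information about all three, not just one.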

Outsider here. How do you measure the equivalent experience of a model? Tens of thousands of years... how is that possible? Is some timing trick involved in the training/evaluation phase?

The game has an in-game clock, and there are regular "timed" events that take place throughout the game that are basically a hard cap on when you can realistically win the game. (i.e.: To earn gold & experience, you must kill 'creeps', and waves of creeps spawn every 1 minute in game. You'll need a certain amount of gold & experience to have the in-game strength to take down defensive towers.)

An average human game lasts 30 to 45 minutes. If OpenAI simulated 1,000,000 games, that'd be roughly 57 to 86 years of non-stop gameplay for a human.

It makes me wonder, what to do then if a game of interest does not have such a timing mechanism? To put it another way, how to train the agents as fast as AlphaGo (no timing issues at all) or OpenAI Five (in-game overclocking) if the environment can only be simulated at the same time scale?

Any thoughts?

Speculating: the game has a clock. I assume they are running the game sped up, at a faster tick rate than humans play at, and also many instances, but if you could sum how many game-seconds elapsed in all the training games you could come up with a number that would make sense.
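The accounting described above is just a sum of in-game time converted to years. A rough sketch using figures quoted in this thread (1,000,000 games of 30 to 45 minutes; 45,000 years at 250 years per day); the function names are made up, and a constant training rate is an assumption:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_of_play(num_games: int, minutes_per_game: float) -> float:
    """Total in-game time across all games, expressed in years."""
    return num_games * minutes_per_game * 60 / SECONDS_PER_YEAR


def wall_clock_days(total_years: float, years_per_day: float) -> float:
    """Real days needed to accumulate total_years at a constant rate."""
    return total_years / years_per_day


low = years_of_play(1_000_000, 30)
high = years_of_play(1_000_000, 45)
print(f"{low:.1f} to {high:.1f} years for 1,000,000 games")

days = wall_clock_days(45_000, 250)
print(f"{days:.0f} days at a constant 250 years per day")
```

The same bookkeeping works whether the experience comes from running the game faster than real time, from many parallel instances, or both, since only the summed game-seconds matter.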

It's not full scale DotA (limited hero pool makes a huge difference imo) but it's still incredible.

Obviously, this limited rule set is not pure DotA, and so the humans were at a slight disadvantage.

The results of letting anyone play against it will be the huge test - AI often fails to "cheese" strategies, and this will be a good test of whether any exist. I'm excited to see the results.

They said they didn’t include all heroes because the AI was already much better at micromanaging the creeps. Limiting the selection apparently gave humans a better chance to win.

(edited for more info) That's incorrect. Each team has strategies and preferred heroes, and OG won The International (a sort of world championship) by ignoring trends and sticking to a limited preferred hero pool. Since then, they've been consistently underperforming vs. other human teams.

So by limiting the number of heroes, humans are clearly at a disadvantage. In the last International, from the hero pool, 110 heroes were picked and 98 were banned.

The micromanagement claim seems fair when it comes to illusions.

There's a line between puff-up marketing and being actively misleading and OpenAI continually crosses it.

With only 25 heroes allowed (of > 100) they were unable to get >5k MMR...nowhere near pro level. What are they at with the full hero pool? 3k?

I'm not trying to cast aspersions on their AI work - it's obviously an incredibly hard problem. But instead of admitting that they failed to make a competitive team, they instead play a stripped down game that's easy for their bots' limited strategies, and then publish triumphant marketing releases claiming mission accomplished.

This seems to be a common occurrence whenever new AI achievements pop up. A lot of "now that we've beaten xy" comments were also uttered around the StarCraft match, for example, but from the one game that was played without giving the AI a camera advantage it was clear that the play was still subpar.

The industry and researchers in particular should really be more careful with their claims, and honestly also try way harder to actually completely solve the challenges in question.

This is such a pessimistic view of things. As the hero pool expands, the training challenge increases at the rate of n^10. So going from 16 to 25, we have a ~100x blowup in hero combinations, likely requiring significantly more compute to train.
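The scaling claim above can be checked two ways: with the n^10 rule of thumb, and by counting lineups directly. The sketch below assumes a "hero combination" means one 5-hero team facing a disjoint 5-hero team from the same pool; that definition and the function names are illustrative, not taken from OpenAI's post.

```python
from math import comb


def lineup_matchups(pool_size: int) -> int:
    """Ways to pick one 5-hero team, then a second disjoint 5-hero team."""
    return comb(pool_size, 5) * comb(pool_size - 5, 5)


def power_law_ratio(small: int, large: int, exponent: int = 10) -> float:
    """Growth factor under the n^exponent rule of thumb."""
    return (large / small) ** exponent


approx = power_law_ratio(16, 25)
exact = lineup_matchups(25) / lineup_matchups(16)

print(f"n^10 estimate: about {approx:.0f}x")
print(f"direct count:  about {exact:.0f}x")
```

The n^10 estimate gives a bit under 100x, in line with the comment. The direct count grows faster, because C(n, 5) climbs more steeply than n^5 while n is still small.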

> OpenAI Five is the first AI to beat the world champions in an esports game, having won two back-to-back games versus the world champion Dota 2 team, OG, at Finals this weekend.

That's their opening sentence, and it's actively misleading.

It's pretty sad people excuse this. Every single press release OpenAI have made in the history of this project has been actively misleading.

What they've achieved is actually impressive, too, so the deception is doubly annoying.

... what do you suppose will happen as OpenAI transitions to being a more commercially oriented company with a fiduciary duty to its investors?

A lot of pros don't have a hero pool that large. This sounds like the self driving car problem, AI has to be perfect despite the fact the humans are worse.

The bigger issue is the diversity of the pool, more so than the size.

True, any given team will tend to strongly favor a subset of heroes, but that subset by necessity must allow for a wide range of play styles and strategies.

The limited pool for OpenAI largely favors 5-man "deathball" tactics, which OpenAI plays fantastically well (an amazing achievement in its own right), but also excludes all the heroes best suited to countering that strategy. Sniper, for example, is an extremely strong hero in just the right circumstances, but is generally considered fairly weak and readily countered (in the current patch) by heroes such as Phantom Assassin or Spirit Breaker, neither of whom are present in the OpenAI pool.

Aside from anti-deathball specific counter picks, heroes that favor a wildly different approach to the overall strategy (split push/"rat" dota) such as Tinker are also excluded.

Most importantly, being able to work out what kind of strategy the enemy team is going for during the draft and adapting your own picks and bans accordingly (and luring your enemy into useless bans or counter-picks) is a huge part of the game. Once the draft is over, teams then have to be able to adapt their desired strategy to the reality of the team composition they landed on, and the best approach may change several times over the course of the game. I really want to see an AI that can realize that something isn't working in its current deathball strategy and that it needs to switch to playing both sides of the map at once by keeping the enemy occupied in one place while taking objectives in another (or vice versa).

To reiterate, I am incredibly impressed by what OpenAI has already shown, and I look forward to seeing where they go next, but I do think there are some very interesting problems in the full game that OpenAI has yet to demonstrate a solution for. I have similar slight disappointment with the recent AlphaStar StarCraft II project, which was a similarly amazing piece of work but also seemed to indicate that each agent (swapped out after each match with the human player) was only really able to execute one strategy and had no ability to recognize that something wasn't working and adapt (as seen in the final (and only) loss to the human player).

So, do I think that human pro players would defeat an OpenAI trained on the full unrestricted game in a best-of-three match? No, almost certainly not, but I do suspect that the humans would win in, say, a best-of-seven, if they had the full suite of tools available to allow them to discover and adapt to what I see as fundamental shortcomings of the AI.

>So, do I think that human pro players would defeat an OpenAI trained on the full unrestricted game in a best-of-three match? No, almost certainly not, but I do suspect that the humans would win in, say, a best-of-seven, if they had the full suite of tools available to allow them to discover and adapt to what I see as fundamental shortcomings of the AI.

For now.

I'm optimistic that we'll figure out a solution in the long run, but I have yet to be convinced that any modern machine learning architecture will be able to get us there.

In essence, all of these game-playing models are playing by "instinct" (and playing very, very well), but still lack the ability to "jump out of the loop" to critique and adjust their own behavior on the fly. This is why the AlphaStar agent was vulnerable to being manipulated by the human player's drop harassment, and I believe OpenAI will demonstrate similar weaknesses once it opens to the public this week.

Machine learning has opened up all kinds of incredible advances that we can apply to the real world (see Boston Dynamics locomotion, or OpenAI applying the approach in their Dactyl robot hand project), but machine learning alone can't take us to general AI and I suspect it will always remain susceptible to this kind of deliberate manipulation.

They WERE at 5k. But this week they beat OG.

This is incorrect. From their post - they beat OG at 17 heroes in the pool. At 25 heroes in the pool they max'd at ~5k.

I think what people don’t get when they want to celebrate this is that there is no dominant strategy for a game like this. To play optimally you need to adapt your plan according to what your opponent does. If anything, there is evidence that this AI, as well as the StarCraft one, doesn’t do this. They just have a world view of what works best and try to do that. And then it kind of works because the AI is much better at micro.

It would be very ineffective if this AI had to input everything through a human body somehow. Now to be fair, it would be very difficult to not win on micro. There’s no real set of constraints I can think of that would make it a fair match.

Ideally they should probably be shifting to a game where micro just isn’t as big of a differentiator. Or, consider moving to a game where a strategic player issues commands and human players execute them. Off the top of my head I can’t think of anything that really fits that build.

Alternatively, spy party (an incredible game) could be an excellent candidate as it requires deception, and a strategic awareness of what the other player is thinking. And, at a high level of play, micro is practically flawless.

Dominant strategies in the game theoretic sense work regardless of what your opponent does.
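To make that definition concrete, here is a small sketch that searches a two-player payoff table for a (weakly) dominant row strategy, meaning one that does at least as well as every alternative against every opponent choice. The two example games and their payoff numbers are standard textbook illustrations, not anything from Dota.

```python
def dominant_rows(payoff):
    """Indices of row strategies that are at least as good as every
    other row against every column (weak dominance)."""
    n_rows = len(payoff)
    n_cols = len(payoff[0])
    return [
        r for r in range(n_rows)
        if all(
            payoff[r][c] >= payoff[other][c]
            for other in range(n_rows)
            for c in range(n_cols)
        )
    ]


# Rock-paper-scissors, row player's payoff (rows and columns: R, P, S).
rps = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

# Prisoner's dilemma, row player's payoff as negative years in prison
# (rows and columns: cooperate, defect).
dilemma = [
    [-1, -3],
    [0, -2],
]

print(dominant_rows(rps))
print(dominant_rows(dilemma))
```

Rock-paper-scissors has no dominant strategy because the best choice always depends on the opponent, while in the prisoner's dilemma defecting is best whatever the other player does.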

Chess is a strategy game with no micro. So is anything turn based.

And that’s why I said there’s no dominant strategy. It does not exist.

Chess has no micro, but also has no hidden state. You don’t really need any concept of what your opponent is thinking to play optimally.

Poker ... has dominant strategies. Has hidden state.

They should use a pair of robotic arms

I’m intrigued by their “5k” with 25 heroes mention. Are the bots worse because they have less proportional training time, or because the extra hero abilities are making it more difficult to rely on their deathball strat? Considering how every public performance of OpenAI has shown nothing but relentless aggression, I’d speculate that it’s the latter.

Continuing the trend, people constantly write off the next improvement in AI as "not that big of a deal" or "not actually AI because X" or "underwhelming". I suspect they'll be doing this all the way up until AGI.

It's underwhelming because the titles are misleading, not because the achievements aren't impressive. Beating humans by outthinking them while still using human-like reflexes and all game resources available is like 1000-cool-points, and beating them as they are now is 500-cool-points. Yes, it's an incredible achievement, but they get you into the article with a misleading line, and so it's to be expected that all of your hype is deflated as you read, rather than building up your excitement. I just wish they'd be honest about the shortcomings in their AI.

It's like the joke from Futurama about why humans don't watch the Robot Blernsball League (basically future-baseball).

Bender: Now Wireless Joe Jackson - there was a blern-hitting machine.

Leela: Exactly! He was a machine designed to hit blerns. Wireless Joe Jackson was nothing but a programmable bat on wheels.

Bender: Oh, and I suppose Pitch-o-Mat 5000 was just a modified howitzer!

Leela: Yep.

you realize this is qualitatively, fundamentally different from that, right...

how about this prediction: people will be suggesting that their machine learning models are a big step towards AGI, forever.

“In total, the current version of OpenAI Five has consumed 800 petaflop/s-days and experienced about 45,000 years of Dota self-play over 10 realtime months...”

Clearly very exciting and a remarkable achievement but does anyone else marvel all the less when such colossal resources are needed?

Obviously it’s not entirely brute force but it seems much too brute force to be considered “intelligent”.

Given how many years of brute forcing through the solution space it took for evolution to pre-train a human brain, I'm guessing we are not intelligent either.

Well less than that for one and to do a whole lot more, at a far lower wattage :)

Has OpenAI released the code for how they are training the models?
