OpenAI Five Benchmark: Results (openai.com)
281 points by runesoerensen 43 days ago | 122 comments

I'm really excited to see the limits get lifted, in particular around items and couriers.

The team of semi-pros/ex-pros/pros that played against the AI commented that the AI was using the highly unusual 5 invincible couriers to enable a style of play that isn't possible in normal dota. The AI's solution to dota was unrelenting aggression once a small early game lead was established after ~10 minutes. This early aggression was possible because the AI was able to ferry items (healing consumables) with such frequency that once the human team made one mistake in a team fight it wasn't possible to recover.

Normally after winning an objective you are forced to reset, as you have expended a lot of resources in order to achieve it, and you are actually most vulnerable to a counterplay right after you win something big. With a constant stream of healing this risk was significantly reduced.

Also, the drafting is a clear limitation that needs to get lifted. The AI was essentially pursuing a "death-ball" strategy of grouping as 4 or 5 and pushing right down a lane, which can be countered by picking mobile and fast heroes that can put pressure on different parts of the map and slow down the "death-ball" by forcing reactions. However, none of those heroes were draft-able, and so the humans were forced to play the AI's strategy. The AI's strategy favors team fight coordination (laying stuns correctly, and correctly calculating whether damage through nukes is sufficient to kill any particular target) and reaction times, at which the AI was clearly superior to the human team.

Great summary.

I'll add:

- The coordination between the AI bots is clearly beyond human level, or at least beyond what the humans demonstrated on a similar style of heroes.

(It might not appear too different from the show match, but based on my 2-3k hours watching pro games, the coordination is noticeably better than the best team in history, aka Wings Gaming, the TI 2016 champions.)

I am not sure how such coordination is modeled in the DNN, which itself seems the most valuable part of this research.

- In general, I think this show match pretty much sealed the doom of human players in Dota 2.

It shows that the general approach is scalable and capable of handling the problem itself. From laning to team fights to item building, the AI did not show weakness at all.

I was worried about the AI's general inefficiency in deriving winning strategies, laning, and team-fight coordination, all of which turned out to be obviously superior to human players'.

Drafting will probably be even more favorable to AIs. The challenge would be whether they can train faster by observing the change log, i.e. find winning strategies without training from scratch after each patch release.

I see no reason AIs would lose to VP/Liquid/LGD (the top 3 going into TI8). The idea that split-pushing heroes can deal with the team fights seems to underestimate the AI's discipline, which is clearly superior to the best humans'.

- Last is how much computing resource was used in training and playing. Hopefully Valve can team up with OpenAI to release a benchmark bot team for calibration, and a different ladder system for playing against different AI strength levels.

The AI did show lots of areas of weakness, which still need better training. Their warding was very awkward, they never used smoke ganks, and they didn't play optimally against invisibility. The human team got a lot of mileage out of rushing a Shadow Blade, which is a cheesy strategy that a coordinated team should be able to easily counter.

That said, the AI still won despite these limitations, which shows that it is very strong in other areas of the game.

> As from laning to team fight, and item building.

I was disappointed to hear one of the devs say that item selection was hard-coded following a popular guide (don't think they communicated this before).

This is OK, because what matters the most is that it won. It was previously impossible to build hard-coded AIs that would even beat decent players, and now the AI has beaten some pro players (albeit not the best of the best); but it'll still be nice to see superior item buying strategies being learned.

It's not as bad as it sounds. The bots are hard-coded to follow an item guide published by TorteDeLini, but at the same time many human players do the exact same thing. I'd say that most players in the 1.5k to low-3k MMR brackets blindly follow those guides most games.

As someone who plays Dota, the matches were a clear demonstration of sloppy snowballing and brute-force cheesing. Coordination 'appears' to be beyond human level because the bots are collectively synced on a team goal value. If a true 'pro' or pro team were allowed to observe and play multiple games against this bot, I'm more than certain one could find an exploit of such an unintelligent mathematical approach to action selection at the individual or group level. In fact, this is what you would use Dark Seer for strategically, or a whole host of other characters and functionality that is currently banned.

Not everything can be calculated especially when tricks are intentionally done to throw a bot off.

> I am not sure how such coordinations are modeled in dnn, which itself seems the most valuable from this research.

There's a tuned group/individual driver function centered on various calculations. This is not actually a valuable part of the research, as it's dynamic and game-dependent and can't cover all of the possibilities, which is why someone broke their 1v1 bot (a corner case).

> In general I think with this show match, it pretty much sealed the doom of human players in dota2.

If you are indeed a player and have viewed that many hours of Dota 2, I question the nature of such a comment. A great player wouldn't look to see how to 'beat' their bot; as it is a bot with no intelligence, the strategy would instead be to try to break it and shove it into corner cases. It's not playing the full range of characters that were intended to disrupt cheesy snowballing, so I wonder why you're making such an optimistic statement, given that you claim to have watched so many Dota matches. Do you play much yourself? Maybe that would change your opinion.

> As it shows that the general approach is scalable and capable to handle the problem itself. As from laning to team fight, and item building, the AI did not show weakness at all.

I'm starting to see a pattern in your commentary. The gameplay looks like your typical "south" players'. Hardly anything impressive: aggressive, boneheaded tower diving and cheesy snowballing. If you can last past 30 minutes, you can outwit and outplay such people in the mid/late game.

> superior to humans > superior to humans > superior to humans

More than 50% of the Dota 2 dynamics aren't even present and are restricted at the moment. Are you getting paid for this post?

> A great player wouldn't look to see how to 'beat' their bot, as it is a bot w/ no intelligence, the strategy would instead be to try to break it and shove it into corner cases.

I cannot see any difference between "beating opponents" and "breaking it and shoving it into corner cases"; that's always how Dota games play out.

Unless we formed different views on how Dota is played through our thousands of hours watching games (I sensed from your statement that you had similar gaming experience :) )

I have thousands of hours logged playing various Dota variants. I rarely watch, as it's not a good way to retain skills or understand what's going on. If you play enough, you should understand exactly what I'm saying. I watch on rare occasions to get exposed to something I haven't thought of or tried, but that's about it. Beating an opponent, especially a good one, is far different than 'breaking a bot'. What I mean by 'breaking the bot' is literally doing things to confuse an unintelligent mathematical algorithm into constant miscalculations and mispredictions, with bonus points for finding a bug in the way it individually or collectively performs actions or interprets data. It's how their first 1v1 bot was defeated. This doesn't happen to a human being because we have intelligence and aren't just optimization algorithms purring along on game-state data.

This is what Strong AI is centered on.

Your point regarding the fact that the bots ‘may’ not be adaptive to surprising strategies is a good one. We do not know for sure in the case of OpenAI Five as there are too few public games to look at.

AlphaGo Lee (the version which won 4-1 against Lee Sedol) did seem to get thrown off track by Lee’s surprising move and lost that game.

However, AlphaGo Zero, which is based on some of the same principles/sets of algorithms, was much stronger than AlphaGo Lee (more than 3 stones according to DeepMind; three stones is about the difference between top pros and top amateurs/beginning pros) and seemed like it would be impervious to any surprises thrown its way by human experts.

The difference was that AlphaGo Lee learned from play records of human Go experts while AlphaGo Zero did not, learning only via self-play. Dota 2 is clearly more complex than Go, but if the same principles apply then an AI trained from pure self-play would be adaptive to most surprises in the domain, provided the system had explored those edge cases before (which depends in turn on how the self-play was conducted during training).

(As a side note: OpenAI Five probably chose the "simple-minded" snowballing-cheesing strategy because it determined from extensive experience that the strategy is most likely to yield a win given its capabilities, which are superior to humans' in some respects, like instantaneous global information observation, great coordination, consistency, etc. This is very different from the reason some human players choose the strategy. Perhaps it is precisely because the Five bots don't get sloppy that the strategy is so effective for them.)

My feeling is that humans are not adaptive to changing game flows either.

Most pro games are played with a strategy that is settled once the draft is finalized. If the strategy turns out not to work, humans do not show noticeably better adaptivity.

Occasionally, a versatile team can transition from a late-game oriented lineup to playing a split-push game. But usually such a transition is based on a suitable draft, which requires the team members to be versatile enough to play their heroes in slightly different styles, plus well-oiled team coordination to transition from one style to another.

> the “simple-minded” snowballing-cheesing strategy

In the show matches, there was no cheesing. It was plain team fight + push; the AIs executed the plan with ruthless precision.

TBH, a typical pub game is best described as strategy-less gameplay. And pro games probably have 3 styles of play:

- Team fight

- Stick-together push

- Split push

The team that came closest to showing vastly better versatility is Wings Gaming, which would run pretty much any lineup they felt fit.

Sadly the team disbanded after TI6; otherwise, their match against OpenAI would be the most interesting thing I can imagine.

> My feeling is that humans are not adaptive to changing game flows either.

They are. I'd actually argue that this occurs much more in a pub game than with pros. I'm largely against the concept of a pro for this reason, as it amounts more to having settled on lock-in strategies than to intelligent/active/dynamic exchanges. I play a lot of pub games for this reason: to enjoy the heightened dynamics. Tons of rotations and adjustments. Tons of punishments for a great player hot-dogging, to break their psyche. Lots of very intense examples of dynamic human intelligence.

> Most pro games are played with a strategy that is settled once draft is finalized. If the strategy turns out not working, Humans did not show noticeably different adaptivity.

You're speaking more of 'pro games'. I encounter a great deal of dynamics outside of this grouping; it's where a lot of intelligence comes into play. I think a lot of people who watch others play a lot w/o actually playing themselves are completely uninformed about the game. Pro games are literal theatre for the masses, like in a large number of professional leagues. The real stuff happens outside of the spotlight.

> Occasionally, a versatile team can transition from a late-game oriented line up to play a split push game.

This happens in just about every game I play... Tons of rotations and readjustments when things aren't working out [sometimes with no communication]. Tons of split pushes, strategic ganks. If people have good emotional stability, there will be a pronounced reflection/change after a massive team-death incident... The point of these games is highly intellectual battles. It's why it's a disservice to restrict any features of the game. It's how they maintain balance and keep the game from devolving into idiotic bot-like cheesing. Games live and die based on how much cheese is present.

> But usually such transition is based on a suiting draft, which requires the team members to be versatile in playing their heroes in slightly different styles;

I'd expect pros to have these skill sets, yet I don't see it much, because in such showcases it's more about optimization and lock-in strategies than dynamics.

> and a well-oiled team coordination to transition from one to another style.

Happens in regular pub and ranked matchups all the time, many times with little to no communication. As long as someone is not an emotional child, it can sometimes be stressed and instantiated through prolonged swearing and yelling at various players. This is what's maybe missing from the pro league: someone getting in your ass openly for doing something stupid, like continuing to battle 3 well-organized bots 3v1.

> In the show matches, there is no cheesing. It's plain team fight + push; the AIs executed the plan with ruthless precisions.

One of the games opened with 4 bots diving the bottom tower to get a kill and persistently pushing bot. The human 'pro' sat there hashing it out even though he could have run to safety and avoided another death, and no one from top or mid TP'd to bot on the human team. This hamfisted, cheesy snowballing occurred in every match on the bots' behalf because OpenAI restricted the gameplay to favor it. Even so, in a pub someone would have been swearing at the top of their lungs on a mic, telling the $@(#@(%* at top/mid to immediately TP and punish such a brazen exchange, especially with creeps all over them. Absolutely nothing was precise about the gameplay from the humans or bots. It was the kind of slop I see on servers from the southern portions of the world, punished heavily by any seasoned players. I guess this is where 'pros' are a meme; I've served a good number of them up with gameplay outside of their carefully scripted comfort zones.

> TBH, a typical pub game is best described as strategy-less game play. And pro games probably have 3 styles of play:

A typical pub is chaos, which is why I've seen a number of pros get their behinds handed to them in it. They're sort of like bots in that they think they have the game completely figured out and have a golden strategy no one can defeat. It's a flaw, not a good trait. In ranked, you're going to see some amazing gameplay even with random non-party individuals. Anyone who plays knows about the games where it's like a symphony playing: limited talking, tons of rotations/ganks/readjustments/team fights/split pushes/team pushes/ratting/baiting/etc.

"Pros" are not Pros in my book. They're a group of players who center on a optimal echelon of gameplay that everyone at that level tends to agree upon. Throw some dynamics in and they fall apart.

What I saw across all of the OpenAI bot games is nothing to write home about. If they were true to things, they'd show how these bots play in all-pick with no restrictions. They claim to be after strong AI, not weak game bots. It's not about winning/losing... It's how you play.

This is enough of my personal commentary on this issue. People are unable to see past these approaches and what they truly are and that's fine with me at this point.

Catch you on the flip side.

Really curious: if pub-style play is so much more adaptive/superior to pro-league style as you say, why don't top pub players simply team up (and perhaps sharpen some 'simple' micro skills) and take on pros at TI to win the >$10 million prizes and live really well?

What would really happen if pub-style play were actually used in pro scenes for million-dollar prizes? If it is actually superior, why have none of the pro teams caught on and tried using it to win?

> We do not know for sure in the case of OpenAI Five as there are too few public games to look at.

Thus the nature of a canned showcase demo. We do know they have a slew of restrictions. As an avid player, I know exactly why: because such combos require much deeper and truer intellect to play efficiently. Even so, given that I know I'd be up against an optimization algorithm, my strategy would be to create as much chaos and uncertainty as possible. Information theory is clear as to the impact this would have: it would be unstructured noise that is hard to optimize against, likely not seen before or significantly reflected in the AI's weights. This is the basis of adversarial attacks. I'm sure with a decent number of games I'd be able to figure out a suitable one for 5 linked bots.

The perspective on what's going on with this demo is much different if you actually play the game. I've actually seen a number of games like the ones this bot exhibited. It's a strategy low-skilled players engage in with the hope of overwhelming opponents with brute force. The character restrictions favor it. So it's not by accident that this all converged into a demo that favors an unintelligent brute-force optimization bot.

It favors something that can do range/hit-point calculations quickly and accurately. Snowballing is required because there is no broader intelligence among the bots. When the bots snowball, it's essentially just one big optimization function. When they're stretched apart, the calculations are much harder.

Knowing what I know about the game, and the fact that I'd be up against a weak-AI bot with an optimized model, I'd know exactly how to screw it up with an adversarial attack. I'd train a team of people on that and show everyone exactly what human intelligence is capable of and why it's superior. This happens in your average Dota 2 match constantly: low-skill players attempt brute-force strategies just like these bots, and you essentially wait them out and pick them apart. This isn't a new and amazing style of gameplay or something. There are already names for it.

When I used the term 'sloppy' I meant against the spirit and nature of the game, and without consideration of the 'way in which one wins'... Ambushing towers in open 4v1s or 4v2s is some very hamfisted foolishness. Even in regular pub games with upper-average players, there'd be a sharp punishment for such bro-tier gameplay; it usually results in an equally massive 'gank'. The way the human players responded in these pressure scenarios really has me questioning the whole event, as I see average random players make far better decisions every day in Dota.

That's just my unfavorable two cents. I'm not impressed because I understand how their bots are doing what they're doing, where the advantages lie, and I'm aware of what restrictions they placed on the game in favor of their bot.

Elon claims he's worried about a dark future with AI; it's actually solutions like this that are the most scary, because there is zero intelligence and a [by any means possible, so long as you achieve the objective] steering function. If you want to unleash chaos and destruction on the world and see a darker side of human intelligence you've never seen before, start releasing such 'weak AI' to manipulate people from the shadows. This is not strong AI or a path to it. It's more of the same weak AI, provided with exclusive and insane amounts of computing power/data and an objective to optimize for by any means necessary. In the cases where it dominates, it's almost certainly relying on finding loopholes/flaws in a particular game, not actual intelligence. You should see the danger in this right away.

Funny, because OpenAI originally opened with the spooky, Terminator-like dangers of AI being so destructive that we needed a group like them to save us... to now openly sharing such unintelligent and dangerous weak-AI optimization platforms with the mainstream. Sort of like the 'Don't Be Evil' mantra that was just a slogan.

I think this is a great engineering accomplishment that no doubt taught them a lot. I don't see any broader 'safety' ideology underlying it; just another great team of people trying to achieve AI like everybody else, utilizing popularized approaches. It's better to just come out and say that. We can drop the 'save the world from AI'/'safety' superman talk and get to the brass tacks of what they are doing and how, if at all, it's different from what anyone else is doing in the space.

> The perspective as to what's going on with this demo is much different if you actually play the game. It's a strategy low skilled players engage in with the hope of overwhelming opponents with brute force. The character restrictions favor it.

As a Dota player who has been in the 99.5th percentile MMR on several occasions (right now at 98th), I disagree with this and a lot of the stuff you're trying to say. Dota is a strategy game, and the meta dictates what strategies are strong at a given point in time. The death-ball strategy that the bots played is a result of it being the best strategy in the bot meta. So, in contrast to what you said, it's not low-skilled players who play these strategies, but rather high-skilled players who play whatever strategy is popular in the meta (regardless of how 'intelligent' it is) in order to increase their chances of winning.

My impression is that when rich and powerful people talk about "the dangers of AI" what they really mean is "the dangers of AI (to me when it's not controlled by me)"

It is nothing new, or particularly bad. If we (good guys) will not have (insert powerful technology), then bad guys will have it and everyone will be worse off.

"not everything can be calculated" that's not how neural networks work. it develops a super complex model of the game by itself by playing a huge amount of game, and optimizing to progressively learn to win more

Why wouldn't it have good coordination? A bot has access to a perfect model of how the other bot would act - itself.

Also, computer engines didn't seal the doom of human players in chess and in go, so I don't get why it would do so in dota.

I think by 'seal the doom' he just means that this result shows that OpenAI is almost definitely going to be able to defeat a pro team in an unrestricted game of DotA.

Which I'm still not completely sold on. It's likely, but the remaining restrictions aren't trivial by any means. There's an outside chance that removing one or more of them is going to brickwall their progress.

One should keep in mind that some of the restrictions were in place to prevent the bots from having too easy of a time. For example, the anti micro/illusion rule was intended to limit the obviously superior micro coordination of the bots.

I'm not sure that's true? I can see the bots being utterly terrifying with Meepo in a teamfight, but it would need supports stacking, proper farm prioritisation (much more use of jungling and ancients), etc.

I genuinely believe the bot would win a game of turbo against any team in the world. But remove _all_ of the restrictions and it's not clear that it doesn't just lose at the moment.

They specifically said they would have to implement a special case for heroes that control more than one unit in the future.

So you're saying that they set up a rule about microing illusions to protect humans from a feature that they haven't even implemented yet, nor, I assume, trained the model on?

Not only that, but let's also not forget that humans learn as well, meaning the more games players play against the bot, the better they would become at understanding and defeating it.

> Why wouldn't it have good coordination? A bot has access to a perfect model of how the other bot would act - itself.

As far as I know, it is five (hence the name) individual AI instances controlling each character, with basically no AI-to-AI communication.

It is not one overriding AI controlling all five.

I have no idea if the AI instance controlling each character is identical, though; if so, then your statement still holds true, I guess (assuming each AI has the exact same information to work with, which might be the case). It would be interesting to see if the AIs specialised.

There's a presiding team value function that impacts and steers team play. The bots 'communicate' through this. There's nothing magical going on.

As a counter bot strategy, I'd work on how to break and trick it using multiple-stepped logic that an optimization function would be unable to see beyond. I'd also use varying tactics of chaotic/sporadic configurations. The bot isn't 'playing fair' nor should a human w/ intelligence. The advantage being that a human can think along a multitude of strategies and adapt. The bot is only optimizing some steps ahead.

Their 1v1 bot was defeated in this manner, and it just goes to show what true intellect and superiority are. I've played random pub games with little to no communication and had all 4 other players converge on different strategies based on a perception of what's going on. If someone decides to cheese/snowball, you simply wait it out and let them push themselves into a nightmare. I saw little to none of this in the games I watched, which leads me to question the intelligence of said 'pros'.

The team value function is just a hyperparameter that describes how greedy the individual agents are. At the start of training the team spirit is 0 and the bots are only rewarded for their own actions. This encourages them to learn basic micro skills, like last hitting. As training progresses the team spirit is increased. When it finally reaches 1, the bots value a reward for a teammate as highly as a reward for themselves.

The actual source of the "communication" is not the team spirit parameter, but the basic fact that the bots have been trained together and receive the same inputs when making decisions. Unlike humans, who have a limited focus of attention, the bots can look at the whole map at once. They don't need to communicate because they already "know" what their allies will do when given the same input.
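As a rough illustration of the team-spirit idea described above (a minimal sketch with hypothetical function and parameter names; OpenAI Five's actual reward shaping is more involved), each agent's reward can be blended with the team-average reward by a single coefficient that is annealed from 0 to 1 over training:

```python
import numpy as np

def blend_rewards(individual_rewards, team_spirit):
    """Blend each agent's own reward with the team-average reward.

    team_spirit = 0 -> purely selfish reward (early training, learn micro)
    team_spirit = 1 -> a teammate's reward counts as much as one's own
    """
    r = np.asarray(individual_rewards, dtype=float)
    team_mean = r.mean()
    return (1.0 - team_spirit) * r + team_spirit * team_mean

# Example: one agent earned a last-hit reward, the others earned nothing.
print(blend_rewards([1.0, 0.0, 0.0, 0.0, 0.0], team_spirit=0.0))  # selfish: [1, 0, 0, 0, 0]
print(blend_rewards([1.0, 0.0, 0.0, 0.0, 0.0], team_spirit=1.0))  # shared: [0.2, 0.2, 0.2, 0.2, 0.2]
```

With full team spirit, a kill secured by one bot raises every teammate's reward equally, which is one plausible mechanism behind the "collectively synced on a team goal value" behavior discussed upthread.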

I think it's because the organizers wanted to make sure the bots would perform well here. Of course OpenAI is awesome, but it's impossible to cover such a complicated game as DotA within just one year. What they have achieved is still awesome, though.

I'm just slightly bugged by the fact that the developers didn't take action-execution delay into consideration. The response time is 200ms, but humans also need extra time to drag the mouse and click to perform the action. The bots' insane reactions actually make me less impressed.

Dota is a bit deceptive like that; it is secretly a slow and deliberate game. IMO most of the deaths occur 2-5 seconds before the action begins when a hero gets out of position.

There was a moment in game 1 that was the exception proving the rule for me. The bot playing Lion successfully disabled the human initiator on Earthshaker at the end of the game. It looked like a superhuman reaction, but it was also a bit different from all the other fights of the game, where it was usually the fundamental position being too far in the AI's favour: they had a gold advantage and had been developing a lead through the entire game by consistently trading deaths 1-0, 2-1 or 3-2 in engagements.

The potentially superhuman reaction took the game from "looks like the bots are winning" to "the humans resign now", but the vast bulk of the advantage was that the bots simply had a better understanding of which team enjoyed the superior position. I would not be surprised if higher bot reaction times (in the +100-300ms range) weren't all that impactful on the results.

It'll be really interesting when the courier distortion is removed and the AI has to play more defensively. Also, I suppose the actual, harder-to-articulate part of the "reaction time" complaint is that the bot teammates can chain abilities more accurately and have played so many hours together that there is an advantage there that isn't 'fair'. It'll be a fun milestone when they can drop a single bot into a pub game where its teammates aren't all that coordinated.

If we had the compute luxury, I would love to see more AIs trained with faults like a longer reaction time or deficiencies to make them more human.

But yes, very exciting next stages when there are item builds, more heroes, and standard couriering.

I got the impression that they weren't particularly looking for a hero pool with a "deathball" meta, and that the hero pool they selected had more to do with those being some of the simplest heroes to implement at first. There is a lot of overlap between the heroes they implemented and the set of introductory heroes recommended for new players playing their first games.

Regarding the insane reactions, I wonder if there is a natural way to handicap the AI to more human-like reaction times. Reaction time is not a good measure on its own, because human reaction time can vary a lot depending on the level of surprise.
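One way the surprise-dependent handicap above could be sketched (entirely hypothetical; OpenAI has not described such a mechanism, and the function and parameter names here are invented for illustration) is to make the imposed delay grow with the information content of the event being reacted to:

```python
import math

def reaction_delay_ms(event_prob, base_ms=200.0, scale_ms=80.0):
    """Hypothetical handicap: imposed delay grows with surprise.

    Surprise is measured as information content, -log2(p), so a fully
    expected event (p = 1) gets only the base delay, while a 1-in-16
    event adds four times scale_ms on top of it.
    """
    surprise_bits = -math.log2(max(event_prob, 1e-9))  # guard against log(0)
    return base_ms + scale_ms * surprise_bits

print(reaction_delay_ms(1.0))     # expected event -> 200.0 ms
print(reaction_delay_ms(0.0625))  # 1-in-16 event  -> 520.0 ms
```

The event probability would itself have to come from some model of what the bot expects, which is exactly where the difficulty lies; the sketch only shows the shape such a handicap might take.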

That said, my impression from watching the game was that the power of the AI had less to do with perfect reaction time and more to do with their "hive mind" coordination. If an enemy ever gets in the wrong position, they are immediately punished by a concerted attack. Humans have a harder time doing this because they need to communicate their intentions first. Sometimes each player will be focusing their attention on a different target.

Yep. It is clear even to a novice player that this is a strategically chosen death-ball/snowball configuration. Some of the least skilled players in Dota shoot for it, but it is undone by a broader range of unironically named 'intelligence' characters. It also becomes unwound in longer matches. Snowball/death-ball cheese is easily bot-able, and a human being would have a hard time winning against it because you'd have to do range and hit-point calculations and act on them faster than a computer, which a human being cannot.

> That said, my impression from watching the game was that the power of the AI had less to do with perfect reaction time and more to do with their "hive mind" coordination.

Yes, which is why they restricted character selection to favor it. There's no reason to call it a hive mind, as it doesn't possess one. There's a global steering function presiding over 5 bots that need to act swiftly based on global damage/etc. This is why they restricted the game to snowball cheese. A human can't beat this, just as a human can't beat a TI-89. This is why, if you look closely, the humans absolutely destroy the bots when the team is separated and the human players aren't playing like greedy noobs diving towers.

> If an enemy ever gets in the wrong position, they are immediately punished by a concerted attack.

> bot : (10-9) =1 (I win) Strong AI right around the corner

Has there been any official information on how they came to choose their hero pool?

Not that I can recall.

The drafting limitation is a joke.

Game 3, OpenAI was forced to reveal its carry pick (HARD CARRY SLARK) as first pick. What a farce.

Watch again. OpenAI picks in game 3 came from the audience and twitch chat. They were purposely making all the bad picks for the OpenAI team.

If anyone from OpenAI sees this:

The one thing that really surprised me yesterday was when OpenAI Five seemed to pause the game. The commentators speculated that “it was learning” because the humans had paused in game 1 due to lag.

I assume that’s not right, as OpenAI five is not training itself as it’s playing (wouldn’t make much sense to add one more game sample to the billions it has already trained on).

I thought it was interesting that the commentators had this misconception, and was wondering what led to OpenAI Five hitting pause.

A network blip caused all the players to drop from the game.

Incidentally, we'd just changed the code a few days earlier from "automatically surrender when a human disconnects" to "do nothing if a human disconnects; automatically pause if all humans disconnect". Had done a lot of advance planning for what might break!

Off-tangent question here: Do you have any plans to pit OpenAI Five against the winners of this year's 'The International 8' tournament scheduled for later this month?

Folks in the DotA 2 community are going crazy over this possibility, would be incredible if this happens.

Looks like they are at least playing pros from TI8[0].

[0]: "These results give us confidence in moving to the next phase of this project: playing a team of professionals at The International later this month." from https://blog.openai.com/openai-five-benchmark-results/ at the bottom

Regarding the commentators, I saw that part and took it as them having fun with the situation, not really making a statement about how the technology works. Esports commentators tend to be more jokey/entertaining than traditional sports in my experience.

Exactly, the joke was that openai learned "bad manners".

I think the bot was set up to pause if it ran into any lag or other minor issues (I was at the event and heard this from someone there)

The bots did not pause the game. An observer did.

In case anyone would like to watch a VOD of the event, it can be found here: https://www.twitch.tv/openai/video/293517383

It was a really wonderfully done event overall and there's several interviews throughout with different folks from the OpenAI team.

(These are unofficial highlights btw, not mirrors)

I wonder how long it will be before they build a version that actually uses the screen images as opposed to all the data from the Bot API. It seems like it would be a lot harder once you add an image processing step and restrict it from processing information across the map without actually scrolling over there.

The point isn't to see if the model can process images. OpenAI's goal is to see if they can recreate the ability to plan and strategize over a partial information continuous long time horizon environment.

You wouldn't want AlphaGo to have to input its commands using robotic hands, right? It's the same thing in that, sure, it might be interesting, but that isn't what we care about. Image processing and robotics controls are largely solved. Showcasing that a model can gain the ability to plan and think is the novel stuff here, and is the path to wherever "artificial intelligence", if any, appears. That's the ultimate goal in playing any of these games.

>> Image processing and robotics controls are largely solved

Image processing and robotic control are very far from being solved problems. I guess you are saying that in the case of AlphaGo it would not be a super difficult step to have a camera and robotic hand physically move pieces around, and that's probably true. But I think in the DOTA case there are new image processing challenges that interact with the AI in interesting ways.

I'm mostly talking about the need to move the game's camera around to gain more information. If you don't see your ally on your screen and need to see how they are handling a gank or something (full disclosure: I don't play DOTA at all, this could be a silly scenario), then the AI would have to recognize this and move the camera to the ally's location in order to gain that information. So really the novelty here would be in the network somehow realizing what information it needs, and then further learning how to gather that information. I honestly think that sounds like an extremely difficult next step.

Human beings are restricted to what's on the screen, which includes not only the camera's perspective of the playfield but also, crucially, the minimap in the corner of the screen. Plus there's weird stuff like how you don't have perfect information about the health/mana of your teammates unless you hold down Alt... so yeah OpenAI is "cheating" somewhat and it would be really cool to see, once it evolves further, restrictions that allow it to better mimic human player capabilities.

That said, everything they've done so far is absolutely incredible (especially now that the AI can draft!!)

This is something I noticed; a human initiator would get counter-initiated almost instantly, every single time, by OpenAI. The blink dagger is much less effective. Pro humans do this too, but not every single time with perfect timing.

Humans don't concentrate on the whole screen; attention is directed...

It would be interesting to take this project up a few levels later on and see how it compares to direct API interaction.

I would love to see camera/mechanical interface like mentioned by others. Similarly, like you said humans don't focus on the whole screen. I would love to see how well the AI could perform if it was given something like blinders where only a small portion of the screen is in focus at any one time much like how human eyes work.

I believe human counter-initiation is only frame-perfect when the human is anticipating the initiation to happen (baiting).

Otherwise, you still have to add time to react plus mouse travel time.

There is actually an interesting paper on that; you can find it on the YouTube channel Two Minute Papers.

Basically, they let an AI loose on a simplified version of Quake's Capture the Flag. The AI processes game video output only and has learned several key strategies. The latest update has the AI with a winrate of 71% against top humans. Unlike the DOTA match the AI has no restricted reaction time.

The AI seems to be jittering the camera left and right to reliably reconstruct a 3D image from the screen, which is quite an interesting way to compensate for the lack of 3D vision (and for the compensation our brain is capable of naturally, getting a 3D intuition from a 2D image).

> OpenAI's goal is to see if they can recreate the ability to plan and strategize over a partial information continuous long time horizon environment.

If this was actually the goal, they would add further mechanical restrictions beyond the 200ms delay to simulate the way humans players play. That way, human and AI would be on a roughly even mechanical playing field, leaving the differentiating factor only strategy/tactics.

As it stands, it looks like their victory is as much based on raw mechanical superiority as it is strategic/tactical superiority. Computers being able to be pixel-perfect accurate at all times, issue commands at ludicrous speeds, etc. is kind of an uninteresting advantage in the context of building strategic AI.

What in your opinion would be a sufficient delay? 200 milliseconds is well within the average reaction time of a human, especially where at the pro level, I'm betting it's much lower than that.

That doesn't account for focusing on other things. There are a multitude of things to take into account while playing dota that can pull your attention; you can't always directly focus on your character in expectation of a blink initiation.

but that's moving the goal post: why is attention a factor here at all?

Because ~200ms reaction time isn't exactly accurate when comparing focusing on one action versus focusing on many things at once. Reaction time is going to be delayed for humans (unless they happen to be expecting something at that particular moment), but for bots that delay doesn't happen.

So this current ai is uninteresting because bots can always instantaneously begin to react on any feedback, whereas humans have to pan and drag the camera around to look at different feedback in the first place, let alone react. Mechanically, humans also have to move the mouse all over the place and think of key combinations, in addition to reacting. Not just clicking a static box on cue.

It -would- be interesting if bots were limited just like humans to the camera view, -not- an API that continuously feeds them information. The bot would then have to learn how to prioritize working the camera, and it would be limited to only what the camera sees, etc.

I think the ai isn't winning on superior speed and reaction time alone (but it is indeed a factor).

Computers already have perfect memory and recall, so when the image recognition tech becomes good enough to only rely on the visual input, are you then going to say the bot must now limit its recall to "human" levels?

No, but that at least would be interesting, since it would be playing using the same mechanics as a human, and with the same limitations (of the camera, etc). Not using an API.

Reaction time is usually measured as time from stimulus to a button press.

For many things in Dota you also need to move the mouse cursor to a specific point on the screen which obviously takes longer than just pressing a button.

Per the architecture, the model does use a CNN to process minimap data.

Yes but the comment you're replying to is saying that it would be interesting to only rely on information from visual input on the screen, rather than on getting (for example) the absolute XY position of every player as a direct input to the network.

It's like writing a bot that would play chess using a video feed from a camera. Yes, it's doable and an interesting problem on its own, but completely unrelated to what OpenAI is doing.

Chess is a poor example because it's a turn-based game whereas Dota is real-time and incredibly fast-paced. Visually parsing a chessboard to know which piece is which is trivial, and also you have all the time in the world to do it (between turns). In Dota things happen so quickly, particle effects pop off all over the place, and through all of it, you have to constantly manually re-place your camera in the optimal position.

In chess, both players always have perfect information about the game-state, and this is far from the case in Dota. OpenAI does account for fog of war, so it's not COMPLETELY omniscient, but it is still more omniscient than human players ever have the ability to be, without having to fiddle with the camera etc.

yes I see what you are saying with the chess example, but I think in this case adding the visual layer actually adds interesting problems related to what openAI is trying to do. See my reply to lawrenceyan above.

According to the Q&A at the end of the event, one of the main obstacles for using the regular game as output instead of the bot API is that self play would become prohibitively expensive. The AI plays thousands and thousands of games every day and you would need an enormous amount of GPU resources to render all of those.

That's a bullshit argument. You can separate the CV portion from the analysis portion and just train the analysis portion by giving it realistically limited information.

That's just equal footing, I say.

To be pedantic, equal footing here would be about GPU resources needed to interpret the images. The point I mentioned was about the GPU resources needed to render them to the screen in the first place.

I'd say the technology for building a bot with computer vision is already here or close to it. So building a working one would not take long.

The problem, as they stated in the Q&A, is that the image processing would take much more hardware and computing power, thus increasing both cost and training time, as you would not be able to run games as quickly anymore.

During the thread from the competition there were a few comments suggesting that the neural network could easily be run on a personal computer.

Given the massive network architecture linked in the post (https://s3-us-west-2.amazonaws.com/openai-assets/dota_benchm...), I am rather curious what hardware was used to make predictions for the Benchmark match. Especially due to the 2048 unit LSTM.

The model outputs are also interesting; they're all discrete actions (even movements), no continuous outputs (aside from win probability).

I think it just comes down to the fact that discrete action spaces are easier to work with for most RL frameworks (not sure what exactly open AI is using here though).

not sure I follow; what's so interesting there? Wouldn't a continuous output ultimately have to be translated into a discrete action anyway?

The big ones are the X and Y offsets/moves; you'd expect the output to be an explicit coordinate to move to/act upon (e.g. a specific value in the -400 to 400 space), but per the original announcement (https://blog.openai.com/openai-five/), there's a 9x9 grid space (-400 to 400 in increments of 100 on each axis).

Although making discrete choices 2-3 times a second is indistinguishable from continuous movement anyways.
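That 9x9 grid can be made concrete with a small sketch. Assuming an action id is simply a row-major index into the grid (the indexing scheme is my assumption, not something stated in the announcement):

```python
# Sketch of the discretized move space described above: offsets from
# -400 to 400 in steps of 100 on each axis, i.e. a 9x9 grid of 81
# possible move targets relative to the hero's current position.

STEP = 100
GRID = [i * STEP for i in range(-4, 5)]  # [-400, -300, ..., 300, 400]

def action_to_offset(action_id):
    """Map a discrete action id in [0, 80] to a (dx, dy) offset,
    treating the id as a row-major index into the 9x9 grid."""
    assert 0 <= action_id < len(GRID) ** 2
    return GRID[action_id // len(GRID)], GRID[action_id % len(GRID)]

def snap(dx, dy):
    """Snap an arbitrary continuous offset to the nearest grid point."""
    clamp = lambda v: max(-400, min(400, v))
    return tuple(round(clamp(v) / STEP) * STEP for v in (dx, dy))
```

Since one of these 81 ids is emitted a few times per second, the resulting movement looks effectively continuous in practice.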

The included video of "growing pains" - aberrant behaviors found during training - is interesting. This kind of thing is fairly common when training RL agents: some discussion of how the OpenAI team solved these issues would be nice.

Side comment: Open AI's design is excellent and it doesn't get in the way of reading their articles. They've taken a Stripe-like design approach without getting too overboard. Well done!

There’s a reason their site is Stripe-like: they hired Stripe’s creative director:) https://dribbble.com/luddep

and Stripe's former CTO: https://www.linkedin.com/in/thegdb/

>> These results show that Five is a step towards advanced AI systems which can handle the complexity and uncertainty of the real world.

Great results that deserve congratulations- but they show nothing of the sort. The distance between a game and the real world is about twice as big as the distance between a game and a simulation of the real world (which a game isn't even). Winning at dota is nothing like negotiating the real world.

I can see your comments going up to "Weak AIs designing, producing and controlling agile robots is a great step, but it is a long way before they learn enough human psychology to perform as well as a door-to-door salesperson."

I'm not sure what will follow.

Related discussion from yesterday's match: https://news.ycombinator.com/item?id=17693169

Prediction and hope: within 1-2 years, Open.AI will be able to offer a GREAT AI-powered challenge to human players across many games.

I occasionally play strategy games and I am always frustrated by a very stupid AI that gets challenging only by beefing up its stats. I would love to play against a smart AI.

I don't do multiplayer because: I like to play offline and/or whenever I want, and I play so rarely that most other human players wouldn't enjoy playing with me.

Open.AI team: please listen!

Yes an actual good AI in Civ would vastly improve that game for me.

Can someone ELI5 what the significance of this is?

Why was Dota chosen as the game for an AI to get good at?

Is Dota more difficult than Go? Why or why not?

DotA is substantially different compared to Go in the following main senses:

1) In Go, you are allowed to see the entire board and pieces at all times -- it is a complete information game. On the other hand, games like DotA have partial information because you are not able to see where your opponents are and what they are doing at all times.

2) In Go, Chess, and many of the Atari games, the AI only has to control a single agent. OpenAI wanted to see if machine learning can be applied in a multiplayer setting where the problem needs to be solved at a global / team level.

3) DotA has so many mechanics and strategies, with a lot of choices to be made. One of the challenges is whether one can look at the overall outcome of the game and reason about which particular choices contributed to winning or losing it. This (long time horizon) makes it extremely difficult to learn such models.

OpenAI interfaced with a complicated game like DotA through an API provided by Valve, which made it a lot easier. Instead of seeing the game screen, they got snapshots of data of around 35KB per observation (co-ordinates of heroes, creeps etc). In the absence of this API, they would have had to use substantially more computational resources to render the in-game graphics, and this would also make the training process extremely slow.

This benchmark was an experiment that demonstrated that tackling such a class of problems is indeed possible (given a lot of computational resources and an environment to train in). During the interviews in between and after the games, they mentioned that the algorithms they have used can have many applications in all fields; @crsv's comment describes the example they talked about.

Perhaps an ELI15, but hope this helps.

In short, it's significant progress in the OpenAI team's quest to build really effective self trained artificial intelligence. Conquest in a problem space like this has real world applications that could translate to things like extremely effective artificial limb dexterity (according to the talking points from the event).

Dota 2 was chosen because it's a nice combination of pre-determined rules (to an extent) and an extremely complex problem set of possible moves and actions. Dota's development team also was really supportive to this effort and they have an API that suits the Open AI team's purposes of interacting with the game really well. Great fit all around.

Dota 2, especially when played 5v5, is orders of magnitude more complex than Go in terms of possible decisions and moves.

Two major reasons:

The first being that Dota is a team game which requires teamwork to win. This presents a new challenge of having actors work towards both personal and team goals in a balance (much like real players) to be able to win, which is very difficult to train.

The next is that Dota and Go are two very different kinds of games. Go is an "information complete" game, where all players have access to the entire game state at any given time. Dota, on the other hand, is an "information incomplete" game, as teams are vision restricted: there's no guarantee on the state of anything out of vision, meaning that the AI has to develop what most players call "game sense" in order to be effective.

On a tangential note, it's also an interesting problem from a state space perspective. Go is technically "solvable" to a point where you could (with a currently unobtainable amount of computing power) find an optimal move, but Dota is almost unfathomably more complex: 10 heroes picked from 115, each with the ability to hold any combination of 9 items from about 150, with arbitrary health and mana values, at arbitrary positions on a large map, not even mentioning the non-player units (creeps and neutrals). If Go's state space is our solar system, already a difficult scale to comprehend, Dota's is the whole galaxy.

[Typed on mobile, apologies for any typos.]
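A rough back-of-the-envelope check on the draft and inventory figures above (the pool of 115 heroes, ~150 items, and 9-slot inventory are the parent comment's numbers, not exact game constants):

```python
from math import comb

# Back-of-the-envelope sizes for two slices of Dota's state space:
# 10 heroes drawn from a pool of 115, and a single hero carrying
# some combination of 9 items out of roughly 150.

hero_pool_size = 115
team_drafts = comb(hero_pool_size, 10)  # unordered 10-hero selections
item_loadouts = comb(150, 9)            # one hero's 9-slot inventory

print(f"possible 10-hero pools: {team_drafts:.2e}")
print(f"inventories per hero:   {item_loadouts:.2e}")
```

Both numbers land around 10^13, and that's before multiplying in positions, health/mana values, and the other nine heroes' inventories, which is the point: the raw state space dwarfs a board game's.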

> The first being that Dota is a team game which requires teamwork to win. This presents a new challenge of having actors work towards both personal and team goals in a balance (much like real players) to be able to win, which is very difficult to train.

I don't think this is actually a significant problem for an AI, because each AI 'player' will be the same copy of code, thinking the same way. They don't need to explicitly communicate if they have the same thoughts and expectations.

They wrote about it here (The Problem section): https://blog.openai.com/openai-five/

> Is Dota more difficult than Go? Why or why not?

For AI, yes. It's real-time and there's a fog of war (no perfect information). The former means that the possible action/choice space is vastly larger at any given moment in time, and you have to draw a line between small/meaningless differences in actions (e.g. a one-pixel difference in movement target generally won't be significant) and meaningful differences. The latter means that the AI needs some kind of mental predictive model where it guesses at what the opponent has likely done in the time that their actions have not been visible.

To add to what the others have already said:

One interesting thing about Dota is that it is a slower-paced game which isn't heavily reliant on twitch reflexes. To win the bot needed to show good teamwork.

Dota2 also has a developer-friendly "bot API" and replay system. I'm certain these affected their choice of game as well :)

How much of an AI’s advantage is ”I/O” related and how much is actual strength in strategy/gameplay?

Are the AI's actions rate-limited, delayed, and perhaps time-randomized a bit, so that the AI doesn't get an advantage from being able to perform actions faster, better synchronized to exact times, or more responsively (say, to avoid any action under a certain reaction time, all actions are scheduled 50+rand(5) ms after the AI queues them)?
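That scheduling idea could look something like this in Python. The `ActionScheduler` class, the 100ms minimum gap between executed actions, and all other specifics are illustrative assumptions; the 50+rand(5) delay is taken from the question above:

```python
import heapq
import random

# Sketch of the proposed rate limit: every action the AI queues is
# executed only after a base latency plus random jitter, and no two
# actions may execute closer together than a minimum gap.

class ActionScheduler:
    def __init__(self, base_ms=50, jitter_ms=5, min_gap_ms=100):
        self.base = base_ms
        self.jitter = jitter_ms
        self.min_gap = min_gap_ms     # rate limit between executed actions
        self.heap = []                # (execute_at_ms, seq, action)
        self.seq = 0
        self.last_exec = -float("inf")

    def queue(self, now_ms, action):
        execute_at = now_ms + self.base + random.uniform(0, self.jitter)
        heapq.heappush(self.heap, (execute_at, self.seq, action))
        self.seq += 1

    def due(self, now_ms):
        # Pop actions whose delay has elapsed, enforcing the rate limit;
        # actions blocked by the gap are re-queued for later.
        out = []
        while self.heap and self.heap[0][0] <= now_ms:
            t, _, action = heapq.heappop(self.heap)
            start = max(t, self.last_exec + self.min_gap)
            if start > now_ms:
                heapq.heappush(self.heap, (start, self.seq, action))
                self.seq += 1
                break
            self.last_exec = start
            out.append(action)
        return out
```

The jitter prevents perfectly synchronized timing, and the minimum gap caps how fast a single bot can chain commands, which gets closer to the mechanical limits a human plays under.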

Five thousand lifetimes of play versus 0.45 lifetimes; I think the outcome was predictable!

Yeah, but you've got to add hundreds of millions of years of evolutionary learning to model the physical world and the actions of our peers, built into our (mammal) brains.

Why did I miss it?! :(

I think at first it was planned for the 28th; now I read it was one day later :/

It was originally scheduled for July 28th, but was rescheduled to yesterday. Well, at least you can still watch the twitch vods :/

That said, you still have a chance to catch the match against a professional team that is going to happen sometime between the 20th and 25th.

Is the ai forced to use voice communication and chat to coordinate bots?

There is no communication between the AI bots, afaik.

Would you be interested in developing an agent for Hearthstone?

Already done, sort of. Blizzard was not amused, as botting makes the game less fun, so it wasn't released.


There's a substantial community around Hearthstone AI: https://hearthsim.info/

Did you think the AI would win despite the adversarial draft?

I'm curious to know if the AI can win with a 50-50 draft. It can be argued that the main reason the humans lost was because they didn't have a feel for the meta of this 18-hero version of Dota, while the bots did. What if you give the humans an equal chance at the game wrt the draft? That would have made for a much more interesting game than having the humans play a game they've never played before in their lives.

When they demoed this over a year ago, they gave the human player a laundry list of instructions, and the bot was still defeated once the keen human player used their intellect to analyze what was going on and 'break the bot'. The fact that the bot didn't recover or understand how it was being broken demonstrates the true nature of what's going on: it's just a dumb bot with lots of training time. If intelligence were present, it too would understand, like the human being, what was going on and change things up. This is actually the crucial thing that needs to be demonstrated and targeted, but you see, this gets into the realm of the class of 'hard problems' that a number of these well-funded and popularized groups aren't attacking.

As for Dota, indeed the broader game was designed and is constantly updated to be an intellectual challenge. Cheese strategies exists. When discovered, game rebalances are conducted to ensure players don't settle in on brute force exploits.

In just 5-10 minutes of watching some of the matches, I already know a handful of characters that would 'break the bot' on this 5-bot ensemble. This is powered by human intellect (the truly significant part). The bots trained across time horizons greater than a human lifetime in this unorthodox and slanted arrangement. Human beings seasoned on a more well-rounded and balanced game are then thrown to the bots and showcased. I know who would be sold on this, and it's not your average Dota player.

Again, the fact that it is possible to 'break the bot' demonstrates that the whole thing is a charade. I have no doubt they might have gained some experience/understanding from engineering this solution, but the actual result is lipstick on a 'backpropaganda' pig.

More fundamental deep research and less theatrics is in order. I'd also advise becoming seasoned in the thinking behind a particular game if you're trying to test and develop an artificial form of intelligence to navigate it and beyond.

It seems likely that actual pros, instead of a mishmash of commentators and community stars, would have been able to win game 2 even with the draft favoring the AI.

For what it's worth, those commentators are actually very good players. Every year they team up in the qualifiers for the world championship, and they usually make it very far.

I'm an active player with enough hours logged to interpret the game. That being said, the observations should be obvious to even the most novice player. I don't need a commercialized, obviously favorable and biased stream of commentary clouding a much sounder analytical capability. I'm pretty sure they're not going to invite commentators that point out the obvious reality of what is going on. Per the commentators, bot-like behavior gets extra positive, humanized characterization and is marked as intelligent, whereas snowball cheese is known as the lowest-tier and least intelligent strategy in the game. There's even a set of memes in the community for people who attempt such strategies.

People need to start doing more critical analysis of their own and stop relying on commercialized and biased information and commentary when settling on a viewpoint. Watching others play only gets you so far in understanding. When you play and see what I'm saying for yourself, you can skim through the provided clips and understand exactly what's going on.

The fact that this continues to get hyped up vs someone stating what's going on is plain sad.

This was my first reaction as well. It would have been more interesting if the AI5 bots drafted to minimize the distance from 50/50 odds, not to maximize their odds of winning. This puts more emphasis on the ability of the bots to play the actual game, not the draft "game."

For a research organization openai sure seems to be putting a lot of effort into PR. Beating players using a subset of the game is impressive but there is no mention whatsoever of the fact that only a subset of the game is allowed. The research is cool by itself, why do they feel the need to promote so heavily that the AI is beating human players "in the first 17 minutes of" rather than giving a realistic picture of their progress. This is how you create an AI winter.

I think the third game would probably have been a lot more tense, if they'd let the OAI5 make the last one or two picks by itself.

I bet they weren't quite sure what they had to do to make the game even. The previous rounds were hugely in favor of the ai and it sounds like they don't have much experience playing against humans at this level.

I think they wanted to show that the humans could win one

I like how their priority is not removing the significant restrictions on couriers, items, and heroes, but rather playing a team of professionals on a "custom game" based on Dota heroes.

If you watch the match or read their blog, you’ll see they frequently say removing those restrictions is their top priority. They simply wanted to see what level the AI is already at by playing top-level humans for the first time. Hence the name “benchmark”. In fact, they have been removing restrictions at a fairly rapid pace already.

the goal is scientific pursuit, are you saying this is not meaningful?

If that were the goal, why would they do so little science and so much PR?

As a seasoned player of Dota watching the replays, I can see exactly what's going on. There's lots of snowballing/cheese, which is the least common denominator of gameplay and the least intellectual. Such dynamics would no doubt be dominated by a bot-like program, which is why they chose them. I could imagine that people who aren't seasoned players are impressed by this, but this reads like an average wood-league "south" game. People who play Dota will know what I'm referring to in reference to "south". In such games, the strategy is simple: you beat out the brute-force cheese snowballing with intellect. However, this is where they no doubt decided to put up the artificial red tape.

You're 100% right and being down voted for it as I no doubt will. I am not impressed because I've played a ton of games and observed a ton of dynamics. You can tell in the first 10min what level and type of game play is occurring and it isn't of the intellectual variant. It's of the bot variant. Broader Dota was designed and constantly updated to combat such gameplay. When people settle on cheese strategies or exploits, they intentionally change the game balance to prevent it.

I'm more interested in how this is achieved than what it has been tuned to try to trick people into believing. Looking under the hood, I see the same thing when I look across all Weak AI solutions : Lots of hand coded steering functions with dynamic weights populated by incredible amounts of brute-force random searching (past a human lifetime of average gameplay).

While this may secure headlines and funding, this is miles away from the real direction you have to go towards Strong AI. It doesn't seem many well funded/popularized groups understand this thus will ultimately end up fooling themselves. Point out this truth and you get down votes. Essentially the broader community plugging their ears which is why a large number of these efforts are due for a significant failure.

I downvoted the parent for saying that removing these restrictions is not OpenAI’s priority. That’s incorrect and shows the poster didn’t watch the video or read their blog posts. Nobody’s claiming OpenAI has created the perfect Dota bot. Again, “benchmark”.

As for your point, we’ll have to see what things look like as OpenAI continues to rapidly remove these restrictions! Or you could provide us some literature on the broad strokes you’re making.

If it were a priority vs other items, it would have been done. It's not a priority and one of the last things on the list because doing so exposes their Bot to a much broader range of intellectual challenge as per the game design. Now, without sound rebuttals, I too have been downvoted. This is why a certain aspect of the tech community will be forever resigned to Weak AI masquerading as strong AI. You aren't open to new ideas or challenges and mask the glaring issues of a particular approach behind canned demos.

My points stand on their own. I take downvotes with no sound rebuttals as a signal that I'm saying something right, an inconvenient truth.

> Or you could provide us some literature on the broad strokes you’re making.

I began my work by not pretending that weak AI is something other than what it is, and prioritized the more challenging aspects of AI.
