Each paper should start with an unambiguous description of:
1. What are the inputs of the model.
2. What are the outputs of the model.
3. What is the overall size of the model. Size, not parameter count.
4. What part of the domain has been manually encoded into the architecture and what has been learned over the training period.
5. What are the restrictions on the domain compared to real life.
6. How the performance is evaluated.
This should be in the first few pages; i.e., the description of what the model does should precede the description of how it does it.
> OpenAI Five won 99.4% of over 7000 games.
The players making up the remaining 0.6% played repeated rounds against the AI and eventually started winning more often than not. The AI had only one strategy (deathball), and once top-skill-tier players learned how to play against it, they had a >50% chance of winning.
You don’t say something is misleading if it isn’t true. You say it is untrue.
Like, if you took a world-class team and had them play against random opponents I'd be surprised if they lost more than one game in a hundred.
In comparison to chess: going by the Elo guidelines, where a beginner has a rating of ~800, an average player ~1500, a professional ~2200, and only four people have a rating of ~2800, we'd expect Magnus Carlsen to almost never lose against an average player, and to win around 99% of games against a low-tier professional player.
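For what it's worth, a quick sanity check with the standard Elo expected-score formula roughly bears this out. (Python sketch; the 2850 rating is my rough approximation of Carlsen's peak, not an official figure.)

```python
# Sanity-checking the win-rate claim with the standard Elo
# expected-score formula: E = 1 / (1 + 10**((R_opp - R) / 400)).

def expected_score(rating: float, opponent: float) -> float:
    """Expected score (win probability, ignoring draws) for `rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

CARLSEN = 2850  # rough assumption, around his peak classical rating
for label, opp in [("average player", 1500), ("low-tier professional", 2200)]:
    print(f"vs {label} ({opp}): {expected_score(CARLSEN, opp):.2%}")

# vs average player (1500): 99.96%
# vs low-tier professional (2200): 97.68%
```

So "almost never loses to an average player" holds, and ~98% against a low-tier pro is in the same ballpark as the 99% claim.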
> 3. What is the overall size of the model. Size, not parameter count.
What's the difference?
1. They should play the full game without restrictions.
2. They should have the same inputs and outputs as humans: a direct link to the graphics and sound card outputs as input, and a direct link to USB as output. (Let the AI be a mouse and keyboard driver; see the sketch below.)
I don't think the bot should have any artificial delay under these circumstances.
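To illustrate what I mean (purely hypothetically; the class and method names here are mine, not any real OpenAI interface), a human-equivalent harness might look like:

```python
# A minimal sketch of the "AI as a mouse-and-keyboard driver" idea.
# Everything here is hypothetical scaffolding, not an existing API.

from dataclasses import dataclass
from typing import Protocol
import numpy as np

@dataclass
class HIDEvent:
    """One emulated input event, e.g. ('mouse_move', (dx, dy)) or ('key', 'q')."""
    kind: str
    payload: tuple

class HumanEquivalentAgent(Protocol):
    def step(self, frame: np.ndarray, audio: np.ndarray) -> list[HIDEvent]:
        """Consume raw pixels and audio samples; emit only HID events.

        No game-state API, no direct unit handles -- the agent sees and
        acts exactly through the channels a human player would.
        """
        ...
```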
Isn't this a bit of a leap, though? It came with massive caveats to the game:
1. Drafted from a 17-hero pool. 17 out of 115??
2. No summons or illusions. Again, this drastically reduces the possibilities in game.
3. This AI trained against real players for years, so it has enormous experience against this type of opponent. The opposite applies to the humans, who never compete against bots and so have no experience against this type of opponent. If I recall correctly, the more people played against the bots, the better the humans performed in successive games. Winning two games with all these restrictions and caveats is still impressive, but it feels like they overstate things. Not to mention the flawless mechanics and communication between the bots...
Drafting from that pool, item and skill builds, last-hitting, creep aggro, laning in general, jungling, item and spell usage, ganking, team-fight positioning, pushing objectives, warding, map control, farm priority, and when to retreat vs. engage. All of these require an understanding of micro vs. macro goals and how they relate.
Surely this qualifies as "a difficult task."
And that's being said by someone who is on the worse end of the bell curve :)
AIUI the bulk of the training was self-play.
> Not to mention the flawless mechanics and communication between the bots...
The bots had no communication channel.
> The opposite applies to humans who never compete against bots
That was covered by the OpenAI open matches: humans could play against the AI for several days and look for exploitable flaws. Most didn't find any; a handful did. That is pretty impressive, considering that humans can learn while the AI is frozen during those matches.
Maybe, but it has also played against humans many times over a span of years.
> The bots had no communication channel.
The execution between them was flawless, though: they acted as one mind, whereas a team of humans has to communicate ideas. It's a clear advantage, but it feels hacky in a way, since it's not really comparing the same thing: 5 humans are a diverse group of people. Maybe it's not fair to knock the bots for this behavior, though.
> That was covered by the OpenAI open matches: humans could play against the AI for several days and look for exploitable flaws. Most didn't find any; a handful did. That is pretty impressive, considering that humans can learn while the AI is frozen during those matches.
I'm talking about playing against OG, who didn't spend days playing against bots. Beating regular players is great, but not the accomplishment they are portraying. The bots had a pretty specific playstyle, and if pros had dedicated time to beating them I think there would be different results.
Those are evaluation matches, they don't feed into the training data.
> The bots had a pretty specific playstyle, and if pros had dedicated time to beating them I think there would be different results.
Well, the pros could have joined the open games too, I don't know if any did.
A team of humans that has played together extensively will often intuitively know what the other human will do in a given situation without explicit communication. The fact that humans can also coordinate in unusual circumstances where intuition is insufficient is an advantage that the current version of the AI lacks.
In any case, simplifying the game is usually done to make training far cheaper.
"We removed these to avoid the added technical complexity of enabling the agent to control multiple units."
So maybe it could be to their advantage, but it's also not something that could easily be accomplished technically at this time?
Well, anyway, I am OK with some degree of PR additions...
The two main measurable parameters of performance are:
1 - reaction time
2 - rate/volume of actions (i.e. Actions Per Minute)
And I would argue there should be an additional consideration of some form of:
3 - mouse-click accuracy
I read through the details of the implementation, and they did a decent job on 1 and 2, but overall they need to do better.
Their reaction times end up as a random draw between 170-270 ms. I think raw, simple visual reaction time for a pro gamer could be ~200 ms, BUT that's just for a simple "click this button when the light changes" type of test. There are "complex reaction time" tests where you sometimes click but other times don't (e.g. a red or green light), and reaction times in that case are around ~400 ms. I think if a pro is in a game situation where they anticipate their opponent will take some action and are ready to respond immediately, 200 ms is a fair reaction time. But that's not the usual state throughout a game, and the bot effectively has that perfect anticipation mindset at all times. So not crazy, superhuman reactions, but definitely not completely realistic/fair either.
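As a toy illustration of that asymmetry (using my rough estimates above, not measured data): the bot samples one context-free delay, while a human's delay depends on whether the situation was anticipated.

```python
# Toy comparison of the bot's fixed reaction-time draw vs. a human's
# anticipation-dependent reaction time. Numbers are the rough estimates
# from this comment, not measurements.

import random

def bot_reaction_ms() -> float:
    return random.uniform(170, 270)  # effectively always "anticipating"

def human_reaction_ms(anticipated: bool) -> float:
    # ~200 ms for a primed simple reaction, ~400 ms when the player must
    # first recognize *whether* to react (complex reaction time).
    return random.gauss(200 if anticipated else 400, 30)

trials = 10_000
print("bot:                  ", sum(bot_reaction_ms() for _ in range(trials)) / trials)
print("human, anticipated:   ", sum(human_reaction_ms(True) for _ in range(trials)) / trials)
print("human, unanticipated: ", sum(human_reaction_ms(False) for _ in range(trials)) / trials)
```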
In regard to action rate, they allow the model to take one action every ~133 ms (7.5 actions per second), which translates to 450 APM. The very best pro gamers are in the 300-350 APM range. And I think a human's actions include various thoughtless click spamming (which an AI doesn't need to do), as well as visual map movement/unit examination that an AI would need much less of, given a direct, comprehensive feed of available information. So the sustained 450 APM seems pretty superhuman to me - BUT Dota 2 is much less of an APM-intensive game, and certainly sustained APM isn't as important. And humans can get higher APM in important burst moments, whereas this AI is at an exact fixed rate of 450 APM. So, all in all, the APM is maybe fair (at least close to fair).
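The arithmetic, for reference (thread numbers, not official figures):

```python
# Quick arithmetic behind the APM comparison above.

actions_per_second = 7.5              # one action every ~133 ms
apm_bot = actions_per_second * 60     # = 450 APM, sustained exactly
apm_pro_range = (300, 350)            # rough top-pro sustained range

print(f"bot:  {apm_bot:.0f} APM, perfectly sustained")
print(f"pros: {apm_pro_range[0]}-{apm_pro_range[1]} APM sustained, "
      "inflated by click spam and camera movement, with higher bursts")
```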
The mouse-click accuracy piece, however, is pretty unfair if the AI can make precise clicks across the screen with no effect on reaction time. This factor isn't considered at all by the AI team. I feel they should either add some randomization to simulate inaccuracy, or delay the reaction based on how far the mouse would have to move.
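One standard way to impose that distance-based delay would be Fitts's law, a common model of human pointing time. A sketch (my own, with illustrative constants; nothing like this is in the OpenAI implementation):

```python
# Fitts's-law-style movement penalty: MT = a + b * log2(D / W + 1),
# where D is the distance to the target and W is the target width.
# The a/b constants below are illustrative, not fitted to real data.

import math

def fitts_delay_ms(distance_px: float, target_width_px: float,
                   a: float = 100.0, b: float = 120.0) -> float:
    """Extra milliseconds charged for moving the cursor to a target."""
    return a + b * math.log2(distance_px / target_width_px + 1)

print(round(fitts_delay_ms(50, 30)))    # nearby target:    ~270 ms
print(round(fitts_delay_ms(1200, 30)))  # across the screen: ~743 ms
```

Charging the bot a delay like this per click would make far-away precision clicks cost something, the way they do for a human.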
With all these factors combined, I still feel this is not quite a fair test. But it's closer than others I've seen, and it's still a very impressive overall achievement! I'd love to see them go the small extra distance of constraining these mechanical performance parameters just a bit more. I feel that would make a BIG difference in the level of strategy required to beat the best humans. They're SOOO close to amazing me!