and one history of the project
- OpenAI's Dota 2 system wasn't playing the full game. I think the final version could play 17 of the 117 heroes, and the opposing human players were restricted to the same subset of the game.
- DeepMind's StarCraft II system reached a level above 99.8% of officially ranked human players, so it isn't trivial to argue that this amounts to defeating top players.
The bigger issue in my eyes was that while OpenAI Five defeated the world champion team OG, when they let anyone in the world fight it, some ingenious players figured out a pretty robust method to consistently exploit and defeat the bot. As I haven't heard any buzz about OpenAI Five since then, I think it was more or less unsuccessful, unless they can show that their training method produces unexploitable bots rather than bots that are merely very good against certain strategies.
I am not sure whether the training is done live. That is, does the algorithm learn from each game against a real, live player? Or do they just train the model offline, then let players play against the static model?
When it was first demonstrated, it really looked like it was doing very smart things (while also taking advantage of the fact that it doesn't have attention lapses and hand fatigue) and reacting well to different strategies on a level that was freaky to me.
Something humans hadn't thought of yet, but were also physically capable of doing.
To me, that will be the true moment that AI has really surpassed us.
But still really damn handy!
The question, I think, is whether it can then partially apply learned techniques to create unique offenses and defenses in novel situations.
I guess you could phrase it that way, but that's essentially the problem statement for developing a strategy for an imperfect-information game. So I would say it is a flaw in the principle if their final output is exploitable.
In both cases it was indeed a static model, but more recent work, called MuZero, is not static and achieves great results in board games and Atari.
And a few games of being exploited is nowhere near enough data for the AI to be re-trained.
Also, did the StarCraft AI have to move the camera around? I remember watching the show matches: the AI lost the one match where it had to handle the camera and couldn't just give impossible orders that a player's interface would not allow.
As another commenter remarks, there are holes to plug in terms of exploitable behaviours that are locked into the model, but I'm confident they will find a general method of preventing this too. On the other hand, it's not as if humans aren't susceptible to similar exploits by competitors in situations where they decide to stop innovating/learning.
The problem is, I don't think there is a "general method of [prevention]" because that's not how neural networks work.
It's not easy to fix things like this because you can't just say "yeah, just don't do that dumb thing anymore"; the network has to be re-trained to correct for the exploit.
The way DeepMind tried to get around this is by having a league of AIs playing against each other which try to exploit each other and expose their weaknesses. It worked pretty damn well, but people still found ways to exploit the AI.
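As a toy illustration of that league idea (this is a sketch, not DeepMind's actual setup): in rock-paper-scissors, repeatedly pitting the main agent against an "exploiter" that best-responds to its current strategy, while the main agent updates via regret matching, drives the main agent's average strategy toward the unexploitable (Nash) strategy:

```python
import numpy as np

# Row player's payoff in rock-paper-scissors: +1 win, -1 loss, 0 tie.
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]])

def best_response(mix):
    """Pure strategy maximizing expected payoff vs. the mixture `mix`."""
    br = np.zeros(3)
    br[np.argmax(PAYOFF @ mix)] = 1.0
    return br

regrets = np.zeros(3)
strategy_sum = np.zeros(3)
for _ in range(20000):
    pos = np.maximum(regrets, 0)
    strategy = pos / pos.sum() if pos.sum() > 0 else np.ones(3) / 3
    strategy_sum += strategy
    exploiter = best_response(strategy)   # the exploiter attacks the main agent
    utils = PAYOFF @ exploiter            # main agent's payoff per action
    regrets += utils - strategy @ utils   # regret matching update

avg = strategy_sum / strategy_sum.sum()
print(np.round(avg, 2))  # near the uniform (unexploitable) strategy
```

The point of the toy: any fixed weakness in the main agent gets punished immediately by the exploiter, so the only stable average behaviour is one with no exploitable weakness. StarCraft's strategy space is of course astronomically larger, which is exactly why the same idea only partially worked there.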
If there's an exploit that's sufficiently rare and unpredictable, then that seems like the only way (and indeed it should be a sufficient way, if done right) to address it.
It hasn't worked yet in Starcraft because the strategy space is so much larger and the action space is also much larger. The networks are too small relative to this space, and humans can still put the bot into a situation it can't handle.
I'm going to guess that StarCraft will end up like these other games once the hardware etc. advances another 5-10 years, and we'll have an unexploitable bot. The main reason I think this is that we have unlimited training data, unlike with self-driving. We can make the models arbitrarily good.
The bot still won't have an ounce of common sense beyond what it's trained to do. It's just that it will have been so exhaustively exposed to every nook of the search space that a human won't be able to find any exploits.
However, it doesn't follow that it's easy to extend to beating top players consistently. If that was the case, it probably would have been done.
AlphaStar was nerfed quite heavily to achieve near APM parity with humans. The early version that beat Liquid-TLO had superhuman spikes in APM (despite having the same mean APM) but they addressed that later. The bot's APM is now significantly less than the best humans' typical APM, which makes it roughly fair since the bot never misclicks.
AlphaStar is legitimately good at strategic reasoning, strategic planning, responsive build orders, responsive scouting, long macro games, etc. It's not the best in the world at these things yet (only top 1 percent+ level) and it does still rely on cheese build orders a lot, but still, what has been achieved is incredible.
This isn't really true, it still has a heavy mechanical advantage.
I'd ballpark it at low or mid masters for overall strategy and tactics, top tier for 'reading' an incoming fight and deciding whether or not it should take the fight, and superhuman on mechanics/control.
Also, I don't agree that it was low Masters for strats or tactics. You can't beat GMs with worse or equal mechanics unless you have good strats. Besides, low Masters players are pretty bad, and the replays show that this bot had super tight and highly optimized builds; it was a big fan of, and good at, early-to-midgame cheeses.
The main valid criticism that I remember is that it wasn't the best at long games when the utility of its sharp build wore off and it needed to think on its feet, but I don't recall the details.
That AIs can beat human players via superior interface control is obvious, of course, and uninteresting from an AI standpoint. StarCraft has had AIs with perfect roach/stalker/marine/etc. control for a while. The problem was that the overall strategies weren't good enough.
AlphaStar did make massive improvements there, to be sure, very impressive ones. But it still relied on out-controlling human players to get the edge against pros.
Unlike Alphastar, these AIs are intended primarily to play each other because as you say inhuman perfection in execution is not an interesting difference. This means it makes sense for them to exploit behaviour in the game itself that would be inaccessible to humans (e.g. "speed mining" by individually controlling every worker) as well as executing ludicrously multi-pronged mid-game attacks since they can just as easily manage six individual small battles as one larger frontal assault.
That site links a Twitch channel which automatically plays random games between bots with auto-camera, but if you prefer human commentators (who, as a bonus, speed through the period when the game is clearly lost, since bots rudely never resign — it's not as though politeness scores points), there's https://www.youtube.com/watch?v=oLpEzq_6_go which is the next ESChamps tournament cast later on Thursday.
Since there is hidden information, you could always miss a corner of the map where the enemy hid some units, and lose the game.
Is AlphaStar "perfect"? No.
Is it better than 99.9% of all humans? Absolutely.
You don't need to create a perfect agent in most cases; self-driving is a classic example.
If you were to deploy an agent that drives better than 95% of humans, the effects would be huge.
It would still fail in some scenarios where professional drivers wouldn't, but that doesn't really matter, because most people are not professional drivers.
Let me give you an example. When AlphaStar was playing on the ladder, a player in Diamond league (~70-80th percentile) beat AlphaStar easily using mass Ravens. If you're not aware of the strategy, it's a turtle strategy where the player masses air units, and it's generally considered terrible.
But AlphaStar was confused by the strategy, and so it lost by a large margin.
Deploying an AI which can be exploited like this is asking for trouble.
It may seem weird to you, but the same agent probably wins against GMs most of the time. Humans have weaknesses too.
The AI simply leans on its strengths just like humans do.
This is the whole problem though. AlphaStar beats GMs but can lose to weird strategies.
On the other hand, GMs will almost never lose (Most likely >99% winrate) to a Diamond player no matter how weird their strategies are.
The AI has strengths, but it also has glaring weaknesses. Imagine if you had an AI flying a plane and 99% of the time it was far better than a human pilot but 1% of the time it crashed and killed everyone. I would not fly on that plane.
Maybe a bunch more training data and time would solve this type of problem, but I'm skeptical.
First off, no human player achieves a 99% winrate against Diamond players. There are many cheeses; one misstep and you lose. GMs can lose to Diamond players.
Now for the main part. You're saying, and I'm rephrasing here:
"Even if the AI is statistically better than humans, I'm going to prefer the human because the AI has some weaknesses."
But still at the end of the day, the AI does a better job on average and will be safer to use than human pilots!
We already rely heavily on software/algorithms for our most important things. All modern vehicles use electronic systems that monitor/manage several key components, and the stock market is heavily managed by bots.
If AI can do a significantly better job than a human, I would choose the AI, even if it behaves strangely in that 0.1% of cases. Humans are not as reliable as you think.
They definitely would. You underestimate the difference in skill. Top players almost always beat other GM players and maintain very high winrates in top GM.
See for yourself: https://www.nephest.com/sc2/?season=46&queue=LOTV_1V1&team-t...
> But still at the end of the day, the AI does a better job on average and will be safer to use than human pilots!
I agree, but only if that 1% or 0.1% or whatever is not exploitable by someone malicious.
We need sufficient sample sizes to claim a 99% winrate. Even with 200 games (which is still a low number, since a single loss can massively affect the results), highly ranked players are not even close to an 80% winrate; with enough games it would probably be even lower.
Maintaining a 99% winrate is extremely hard, as you can only lose a single game out of 100. People get tired, try new stuff, simply don't pay attention, or just get caught off guard by something new.
As for "malicious exploitation", it does pose a risk in some environments, but the question then becomes exactly the same:
Is the AI less exploitable than the average person?
If so, it doesn't matter.
People are generally not exploitable in the same way an AI is because we can subjectively assess situations and learn on the fly.
This is a good example of why I think your argument doesn't hold water: https://twitter.com/nikitabier/status/1372726911105855488
On the 99% winrate, I feel like you're either being purposefully obtuse or have no experience with competitive games.
The majority of the winrates are >70%, but even 60% is insane for a competitive game, especially at the very highest level. It is ridiculously hard to maintain a winrate that high even over 30 games.
You seem to be thinking about this from a purely statistical perspective (i.e. moar samples) without realizing that this is baked into MMR (you're matched with opponents as close to your skill level as possible). These players have to maintain high winrates just to stay at this MMR, because they can earn as little as literally 0 MMR for a win and lose up to 60 MMR for a loss.
These players are also around 3000 MMR higher than Diamond players. Using the Elo model, this equates to a 99.998% winrate.
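For reference, the standard Elo expected-score formula behind this kind of estimate looks like this. Note the exact mapping from SC2 MMR to an Elo-style scale is an assumption, so the precise percentage depends on how you translate the 3000-point gap:

```python
def expected_score(rating_diff, scale=400):
    """Standard Elo expected score for the higher-rated player.

    `scale` is the classic Elo constant (400); how SC2 MMR maps onto
    this scale is an assumption, so treat the output as a rough estimate.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))

print(expected_score(0))    # evenly matched: 0.5
print(expected_score(800))  # ~0.99 for the stronger player
```

Whatever the exact scaling, a gap of thousands of points puts the expected winrate vanishingly close to 1, which is the point being made.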
100 games in a row is also not feasible. That's ~20 hours of playtime assuming 12min games.
But it's not as noteworthy as implied on the path to AGI.
A diamond player that has mastered one weird cheese will absolutely take more than one in a hundred games off a GM - even off a tournament pro. Even Serral chokes in way more than one in a hundred matches.
Complete bullshit. You don't know the game at all if you believe this. The person you're arguing with is well known over on the Starcraft subreddits. Maybe listen to them.
In practice, hidden information cheeses simply aren't enough for a diamond to take a game off a top pro, even one time in a hundred. They'll sniff it out sufficiently and then just outcontrol the fight every time.
The worry isn't about perfect winrate, it's about finding strategies that can consistently cause the AI to lose over and over.
In a cooperative environment, a high percentage is great.
In a competitive environment, that .1% of scenarios where it's really weak will suddenly become the majority of games it faces.
I'm not sure it's been proven that the most successful overall strategy is unbeatable. Besides perfect skill, you still need to worry about all-in strategies. In my eyes, losing to cheese could still be possible even if you're the best overall player.
I do think it's fair to say these AIs should be able to grow after losing to a strategy once.
Of course it is. But do the same cheese to a top player over and over and it will rapidly become ineffective, usually within a game or two. Each AlphaStar agent can just be exploited endlessly.
AlphaStar basically got to 'good enough' strategy, then won on control, which computers obviously have a massive advantage with.
Limiting the number of playable heroes in Dota 2 really isn't important when it comes to evaluating the skill of the AI. Most real players trying hard to win already play with a limited hero pool dictated by the current patch version.
Drafting can be massive when deciding the outcome of games even at the lower skill levels. Opening up the entire hero pool just means you're largely evaluating the ability to draft in the current patch more than actual playing ability.
It would not surprise me if OpenAI Five would have lost against OG had the entire hero pool been available, had the series been long enough, or had the prize pool been big enough for OG to take the games seriously enough to warrant picking a split-pushing strategy (which is considered cowardly in some circles).
Tinker is an example of a hero that took (and still takes) a lot of gold to be useful. If you're drafting Tinker to stop early-game push strats, you're going to have a bad time. If anything, Tinker is actually an example of a hero that the AI would probably excel at; the same goes for any micro-dependent hero.
In general including heroes that excel with great mechanics is a poor way to evaluate how well the AI actually plays the game. No one doubts an AI can send inputs faster than humans.
The game is balanced around the entire roster. In a pool of 17 heroes, the answers to the locally optimal strategy likely don't even exist within that subset. Drafting proper compositions, adapting to the opponent's heroes, and dynamically changing your game plan during the match are all part of actual playing ability.
No, the rebuttal is: how can someone who has never played the game claim that x, y, z are important or not?
I found some similarities with what occurred with DeepMind's AlphaStar AI.
One of the weaknesses that seems to manifest in this piece too is the handling of unfamiliar scenarios.
The AI is very confused once it experiences something that was rarely seen in its training data. Destroyer's big drones confused the bot quite a bit.
DeepMind addressed this by intentionally creating agents that introduce different/bizarre strategies (which they called exploiters) in order to develop robustness against such strategies.
Completely agree that adding something like the "League" used by AlphaStar would be one of the top priorities if you wanted to push this project further. I don't think CodeCraft is sufficiently complex to really allow for several very distinct strategies in the same way as StarCraft II, but I would still expect training against a larger pool of more diverse agents to increase robustness quite a bit.
Trying random stuff just sounds stupid, but with enough compute and data, I guess it could overpower smart creatures like us.
I agree that CodeCraft is vastly simpler than StarCraft, but the idea is the same: just try random stuff (sometimes with better logic behind it) until something works, and then optimize it to perfection.
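That loop ("try random stuff, keep what works, refine it") can be sketched in a few lines. This is a generic random-search/hill-climbing sketch on a made-up fitness function, not CodeCraft's actual training code:

```python
import random

def fitness(params):
    # Hypothetical stand-in for "win rate"; peaks at params = (3, -2).
    x, y = params
    return -((x - 3) ** 2 + (y + 2) ** 2)

random.seed(0)
best = (0.0, 0.0)
for _ in range(5000):
    # "Try random stuff": perturb the current best parameters a little...
    cand = (best[0] + random.gauss(0, 0.5), best[1] + random.gauss(0, 0.5))
    # ...then "optimize it to perfection": keep the change only if it helps.
    if fitness(cand) > fitness(best):
        best = cand

print(best)  # converges close to (3, -2)
```

Real RL replaces the blind perturbation with gradient estimates from many games, but the keep-what-wins structure is the same.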
>Seeing as my policies are currently the world’s best CodeCraft players, we’ll just have to take their word for it for the time being.
I really hope this inspires some competition! How long until there is a leaderboard? :)
Do you have recommendations to learn more about RL? Is CodeCraft a game?
I quite like https://karpathy.github.io/2016/05/31/rl/ as an introduction to some of the ideas behind modern RL. Beyond that, I just recently found out about https://github.com/andyljones/reinforcement-learning-discord... which lists a lot of other high-quality resources.
CodeCraft is a programming game which you can "play" by writing a Scala/Java program that controls the game units. It's not actively developed anymore but still functional: http://codecraftgame.org/