DeepMind’s StarCraft Bot Has a 191-Year Head Start on Humanity
AlphaStar learned from half a million human games, then played itself 120 million times to master its technique
DeepMind, Alphabet’s A.I. research firm, has built an artificial intelligence system capable of defeating the vast majority of the world’s StarCraft II players, according to research published Wednesday in the journal Nature.
The DeepMind team debuted AlphaStar, its StarCraft II-playing bot, earlier this year in show matches against top esports professionals. But the new research details secret matches held this July with players who opted into being randomly matched against the program.
DeepMind deployed three versions of AlphaStar, each of which learned the game in a slightly different way. StarCraft II players climb a ladder in the game’s highly competitive multiplayer modes, attaining different ranks depending on their skill level. The first two versions of AlphaStar were good enough to reach the highest tier of play, Grandmaster. After 30 games as each of the game’s playable races — the insectoid Zerg, the advanced alien Protoss, and the scrappy human Terran — AlphaStar placed in the top 0.15% of players in the European region.
StarCraft II is a competitive video game defined by its complexity. Each player is tasked with growing an army and building structures to further their offensive, defensive, or productive capacity, with the ultimate goal of exploring their surroundings, finding their enemy, and destroying them. Hundreds of units must be independently orchestrated, like workers who mine resources or soldiers who target opponents with special abilities. That’s why it took AlphaStar more than 120 million games played against itself, and hundreds of years of accelerated game time, to master StarCraft II.
Every time AlphaStar makes a move in the game — selecting a unit to gather valuable vespene gas, directing an air fleet to target an invading Colossus, and so on — it’s choosing from roughly 10 to the 26th power possible actions, according to DeepMind. Written out, that’s 100,000,000,000,000,000,000,000,000 potential moves.
Strategies in StarCraft II are typically sorted into two categories: micro and macro. Micro strategies are how players maneuver individual units, and macro strategies have to do with how players spend resources and upgrade their armies. For AlphaStar to master this gameplay, DeepMind couldn’t just rely on the system’s ability to learn on its own — it needed human help.
For those micro strategies, researchers trained AlphaStar on recordings of previous games played by humans, which Blizzard, the maker of StarCraft II, had released to the A.I. research community.
“[Human data] basically gives a diverse base of strategies of roughly what humans do,” says David Silver, principal research scientist at DeepMind, in a press call earlier this week. “That might include some basic ideas of how to use ground units or air units, then the system starts playing games against itself and starts to change those strategies, evolve them, and adapt them to become better and better.”
That means the A.I.’s strategy didn’t emerge from randomness, and it didn’t have to learn the basics of how to move each unit. It’s similar to a human player sitting behind a more experienced friend and watching them play the game about 500,000 times.
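To make that idea concrete, here’s a toy sketch of this kind of imitation step, written in Python with PyTorch. It is not DeepMind’s code: the network, the sizes, and the randomly generated “replay” data are all placeholder assumptions, standing in for a policy network learning to predict the action a human took in each recorded game state.

```python
# Toy behavioral-cloning sketch (illustrative, not DeepMind's code): a policy
# network learns to predict the action a human took in each recorded state.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 128, 32  # placeholder sizes; the real system is far larger

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, N_ACTIONS),  # logits over a simplified action set
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder "replay" data: game states paired with the human's chosen action.
# In the real pipeline these would come from Blizzard's released replays.
observations = torch.randn(4096, OBS_DIM)
human_actions = torch.randint(0, N_ACTIONS, (4096,))

for epoch in range(10):
    logits = policy(observations)
    loss = loss_fn(logits, human_actions)  # penalize diverging from the human move
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Only after this supervised stage does reinforcement learning take over, refining the imitated strategies through self-play.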
AlphaStar’s ability to learn from and improve on human micro strategies was demonstrated earlier this year when DeepMind first showed off the program. In one match, AlphaStar broke its units into small divisions and flanked its opponent in multiple places on the map, a move that required intense precision and coordination.
“If I play against any human player, they’re not going to be microing [units] this nicely,” the game announcer, RotterdaM, said. “What we saw there, that’s not human.”
As you might expect, training on human data helped AlphaStar anticipate how its human opponents would play, but it also allowed the program to play like a person and use recognizable strategies.
For AlphaStar to master all three playable races in the game, DeepMind created a “league” in which the A.I. system played against itself. It played 120 million games in a labyrinthine ranking-and-matchmaking system that pitted the strongest current bot for each race against “exploiter” bots, which tried to poke holes in the dominant strategies AlphaStar preferred.
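Here’s a toy sketch, in plain Python, of how that kind of league matchmaking might be organized. The agent names, roles, and scheduling are illustrative assumptions, not DeepMind’s implementation; the point is the structure: exploiters train against the main agents, and the main agents must hold up against everyone.

```python
# Toy sketch of a self-play "league" (a simplification, not the paper's setup).
import random

class Agent:
    def __init__(self, name, role):
        self.name = name
        self.role = role  # "main" or "exploiter"

# One main agent per race, plus exploiters that hunt for weaknesses.
league = [Agent(f"main_{race}", "main") for race in ("terran", "zerg", "protoss")]
league += [Agent(f"exploiter_{i}", "exploiter") for i in range(3)]

def pick_opponent(agent):
    if agent.role == "exploiter":
        # Exploiters train only against the main agents, trying to find
        # counters to whatever strategies those agents currently prefer.
        return random.choice([a for a in league if a.role == "main"])
    # Main agents face everyone, including their exploiters, so a dominant
    # strategy survives only if it holds up against targeted counters.
    return random.choice([a for a in league if a is not agent])

def train_step(agent):
    opponent = pick_opponent(agent)
    # In the real system, a game would be played here and the agent's
    # policy updated by reinforcement learning.
    print(f"{agent.name} plays {opponent.name}")

for _ in range(3):
    for agent in league:
        train_step(agent)
```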
In that way, the research nods to a recently popular machine learning technique called generative adversarial networks (GANs). One network tries to generate a piece of data, like an image, while the other tries to tell whether it’s real or A.I.-generated. The two networks “argue” back and forth until, eventually, the “critic” network is reliably fooled because the generated data is so close to an image taken by a real camera. If you’ve heard of deepfakes or seen incredible A.I.-generated imagery, you’ve seen the work of one of these GANs.
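For illustration, here is a minimal GAN training loop in PyTorch, with toy one-dimensional data standing in for images. The architectures and numbers are arbitrary assumptions; what matters is the back-and-forth: the critic learns to separate real samples from generated ones, and the generator learns to fool it.

```python
# Minimal GAN sketch (illustrative; toy 1-D data stands in for images).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # "critic"
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 1) * 2 + 5  # "real" data: samples from N(5, 2)
    fake = G(torch.randn(64, 8))       # generated data from random noise

    # Critic step: label real samples 1, generated samples 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the critic label its output as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```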
One of the reasons StarCraft II has been so difficult to master is that it’s an imperfect-information game. That means that, unlike in Go or chess, players can see only the parts of the map their units have explored. They don’t know what strategies their opponents are using. In this way, it’s a lot like poker, another imperfect-information game, which was effectively mastered by an A.I. system created earlier this year by Facebook researchers.
DeepMind sees tackling these kinds of games as a crucial step in moving A.I. research from the realm of toy problems, simplified versions of real-world scenarios, to the real situations those toy problems are meant to mimic. In StarCraft II, the imperfect information is an obscured battlefield, but in the real world, the blind spot of a robot’s camera might function the same way.
In moving toward realistic uses of the technology, there’s also the matter of how much data is needed to teach AlphaStar, the researchers say. In addition to the human data, AlphaStar needed to play 200 years’ worth of the game against itself to reach its current level. Given that StarCraft II was released in 2010, humans pitted against the machine are at a 191-year disadvantage.
Update: A previous version of this article misidentified the journal featuring DeepMind’s new research. It is Nature.