Sometimes, poker is all about the bluff. Make the table believe you have a full house when you really have a low pair, and it can pay off big time. Read your opponents — a grimace here, a smirk there — and bet accordingly.
It’s not a skill you’d think computers would be particularly good at. But new research published in Science today shows that A.I. can learn to respond to fibs without needing to even see anyone else at the table, and outwit the best human poker players. It’s a development that may have implications far beyond the casino.
A poker-playing bot called Pluribus recently crushed a dozen top poker professionals at six-player, no-limit Texas Hold ’em over a 12-day marathon of 10,000 poker hands. Pluribus was created by Noam Brown, an A.I. researcher who now works at Facebook, and Tuomas Sandholm, a computer science professor at Carnegie Mellon University in Pittsburgh. (The two co-authored the paper in Science.)
If each chip in the experiment were worth a dollar, Pluribus would have made $1,000 an hour against the pros, according to Facebook, which published its own blog post on the research. (That haul greatly exceeds what experienced pros could expect, even playing at a table that included some amateurs.) Brown conducted most of his poker research while earning his master’s and PhD at Carnegie Mellon from 2012 to 2019, but he’s worked at Facebook for the last nine months and joined the company full-time in June — part of a wave of A.I. academics being hoovered up by tech companies.
Cleaning up at the poker table isn’t the ultimate goal of Brown and Sandholm’s research, though. The game is really a simulator for how an algorithm could master a situation with multiple deceptive adversaries that hide information and are each trying to pressure the other to quit. A.I. can already calculate probability far better and far faster than any human being. But poker is as much about coping with how humans lie as it is about reading the cards, which is exactly why it’s a useful game for A.I. to learn.
“I think this is really going to be essential for developing A.I.s that are deployed in the real world,” Brown told OneZero, “because most real-world, strategic interactions involve multiple agents, or involve hidden information.”
This isn’t Brown’s first time bringing an A.I. to the poker table. While working toward his PhD at Carnegie Mellon in 2017 under Sandholm’s tutelage, he debuted Libratus, an earlier poker-playing bot, which handily defeated human professionals in no-limit Texas Hold ’em games played one-on-one.
The new bot, Pluribus, doesn’t adapt to other players at the table: it won’t try to understand how John and Jane play the game differently. It doesn’t have a tell (a sign that a player might be bluffing or might in fact have a good hand), and it only bluffs when it has calculated that bluffing is a sound strategy, statistically speaking.
“People have this notion that bluffing is this very human thing where you’re looking at the other person and the other person’s eyes, and trying to read their soul, and trying to tell if they’re going to fold or if they’re bluffing right now,” Brown told OneZero. “That’s not really what it’s about. It’s really a mathematical thing. Bluffing is all about balancing betting with good hands with betting with bad hands, so that you’re unpredictable to your opponents.”
While most games that A.I. has mastered so far — like Go and chess — can be endlessly complex, what they have in common is that all the information about the state of the game and the players is visible for everyone. Poker differs because you don’t know what your opponents have in their hands. It’s as if your opponent’s king and queen could be placed anywhere on the chessboard and be made invisible. Since you don’t know what your opponents know, you can’t easily predict how they’re going to act, or why they’re making certain decisions.
A.I. typically thrives when it not only has all the information necessary, but has seen a certain situation before. Google’s self-driving cars can operate because Google has thoroughly mapped the locations they’re driving in. Image recognition software like Facebook’s photo-tagging A.I. learns how to tell dogs and cats apart by looking at millions of images of each animal.
But poker is a game of edge cases and hidden information — rare situations that are statistically improbable, all lined up in a row. Any of the five other players at the table could have nearly any combination of cards at the beginning of the hand, and each player can bet nearly any amount of money. There are so many combinations of potential bets that Brown and Sandholm had to make tweaks to reduce the complexity of the game the bot can perceive. For example, they “bucketed” similar bets, like $200 and $201, to make the bot more efficient.
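The bucketing idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not the researchers’ code: the bucket sizes and the function name `bucket_bet` are assumptions made up for the example, since the article does not say which bet sizes Pluribus actually considers.

```python
from bisect import bisect_left

# Hypothetical representative bet sizes, in chips (assumed for illustration).
BUCKETS = [50, 100, 200, 400, 800]

def bucket_bet(amount, buckets=BUCKETS):
    """Map a raw bet to the nearest representative size (ties go to the smaller one)."""
    i = bisect_left(buckets, amount)
    if i == 0:
        return buckets[0]       # smaller than every bucket
    if i == len(buckets):
        return buckets[-1]      # larger than every bucket
    lower, upper = buckets[i - 1], buckets[i]
    return lower if amount - lower <= upper - amount else upper

# Bets of $200 and $201 land in the same bucket, so the bot treats them alike.
same_bucket = bucket_bet(200) == bucket_bet(201)
```

Collapsing near-identical bets this way shrinks the number of distinct situations the bot has to reason about, at the cost of ignoring small differences between them.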
The way Pluribus was trained, however, was much like many other game-playing A.I.s. It played against itself millions of times, making moves completely randomly at first, until it slowly figured out which moves would result in positive outcomes. It does this by tracking what the researchers term “regret,” meaning it traces other potential outcomes in a hand and comes up with a score of how much it “regrets” not taking another specific action. These regret scores are additive, so the more often the algorithm fails to take the right action, the more it regrets it. The scores are then used to make the bot choose the action it “regretted” not taking more often in future games.
Facebook gives an example of a training hand where the bot has two jacks. The opponent checks, so it checks. Then the opponent bets. The bot calls, or matches, and it turns out the opponent has two kings. The bot loses. After the hand, the bot simulates what would have happened in variations of the same hand.
In the replay, if the bot had raised the bet instead of matching it, the opponent would have folded, and the bot would have won. It “regrets” not taking this action, which raises the regret score for raising and means that in a similar situation it will raise more often in the future.
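The regret bookkeeping in the jacks-versus-kings hand can be sketched with regret matching, the update rule behind counterfactual regret minimization. This is a simplified illustration rather than Pluribus’s training code: the chip payoffs are invented, the function names are hypothetical, and regret is measured against the single action the bot actually took.

```python
ACTIONS = ["fold", "call", "raise"]

def strategy_from_regret(cum_regret):
    """Regret matching: play each action in proportion to its positive cumulative regret."""
    positive = {a: max(r, 0.0) for a, r in cum_regret.items()}
    total = sum(positive.values())
    if total == 0:
        # No regrets yet: choose uniformly at random.
        return {a: 1.0 / len(cum_regret) for a in cum_regret}
    return {a: p / total for a, p in positive.items()}

def update_regret(cum_regret, chosen, payoffs):
    """Add, for every action, how much better it would have done than the chosen one."""
    for a in cum_regret:
        cum_regret[a] += payoffs[a] - payoffs[chosen]

cum_regret = {a: 0.0 for a in ACTIONS}

# Hypothetical outcomes in chips: calling lost to the kings, raising would
# have made the opponent fold, and folding would have lost less than calling.
payoffs = {"fold": -100.0, "call": -300.0, "raise": 200.0}

update_regret(cum_regret, "call", payoffs)
strategy = strategy_from_regret(cum_regret)
print(strategy)
```

After this one hand the sketch shifts most of its probability toward raising and stops calling in that spot. Over many self-play hands the accumulated regrets pull the strategy toward the actions that would have paid off.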
When the bot is actually playing the game, it uses a series of other mechanics to balance its style of play. That includes considering how it would act if it had every other potential variation of a hand.
This is all useful for A.I. well beyond the poker table, because people in the real world can and do lie, just like they do at cards. They can behave irrationally. They can make mistakes. Imagine a near future with self-driving cars on the road. Google’s car might approach an intersection, where it stops to let a human driver through. That human driver could start, then accidentally spill coffee on their lap and come to a sudden stop to frantically wipe it up. Distracted, they start driving again before realizing — whoops — they’re in an intersection, so they suddenly brake again. That’s a lot of mixed signals for the A.I. behind the self-driving car: It’s like a bluff.
In this instance, Google’s car has to operate in a situation where it can’t trust another driver on the road. It doesn’t know what’s happening in the person’s car (why it stopped, when it will go again, whether it will stop and go again in the future), but it has to take some action. The same problems could arise when the self-driving car is going around blind turns or driving in heavy rain, both situations that would degrade the information from which it can draw.
A similar example could be drawn with Facebook’s own News Feed, where the company’s myriad bots trawl user content to tag, categorize, translate, and prioritize it. You can imagine how it might be useful for a content moderation bot to make better decisions with limited information if a user is trying to bypass anti-spam filters or upload banned images, for example. A moderation bot might also have to contend with other bots on the platform that are trying to post problematic content.
“If you’re deploying an A.I. system in the real world, it’s interacting with other A.I.s or humans,” Brown said. “There might be situations where [another] A.I. might be trying to behave in a deceptive way or dishonest way. And the A.I. has to be able to cope with that if it’s going to be effective.”
When the ability to distinguish between truth and lies is a fundamental enough issue to bring tech executives to Capitol Hill, an algorithm that doesn’t accept everything it sees as truth might be helpful.
Of course, this isn’t a solution to fake news or a promise of a new day at Facebook. But it might be a tool the company could use in the complex, never-ending war to understand and manage the unprecedented amount of information its users generate.
With this tool tested to the limits in poker, Brown will now move on to other problems that can be solved by game-theory-inspired algorithms. “I think this is really the final major challenge in poker A.I.,” he said. “We don’t plan to work on poker going forward. I think we’re really focused on generalizing beyond.”