Games like poker and StarCraft are Bayesian: they involve imperfect information, forcing AI to reason over probabilities instead of certainties, which is a huge leap in complexity. Add bluffing to the mix and things get even trickier.
We often think bluffing is a human specialty, but John von Neumann, the absolute GOAT of mathematics, showed in his game-theoretic analysis of simplified poker that it's fundamentally strategic. Bluffing emerges naturally as part of optimal play in games where unpredictability is rewarded.
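You can see this in a toy half-street bluffing game, written here in the spirit of von Neumann's poker model rather than as his exact construction (the pot and bet sizes, and the rule that the bettor always bets winning hands, are my assumptions):

```python
import numpy as np

# Toy model: player A holds a winning or losing hand with equal probability.
# A always bets winners and bluffs losers with probability b; player B,
# facing the bet, calls with probability c. Pot and bet sizes are assumed.
POT, BET = 2.0, 1.0

def ev_bettor(b, c):
    """A's expected value (the game is zero-sum, so B's EV is the negative)."""
    # Winning hand (prob 1/2): B folds -> A takes the pot; B calls -> pot + bet.
    ev_win = (1 - c) * POT + c * (POT + BET)
    # Losing hand (prob 1/2): checking earns 0; a bluff steals the pot
    # when B folds and loses the bet when B calls.
    ev_lose = b * ((1 - c) * POT - c * BET)
    return 0.5 * (ev_win + ev_lose)

# Equilibrium from the indifference conditions:
b_star = BET / (POT + BET)  # bluff 1/3 of losing hands
c_star = POT / (POT + BET)  # call 2/3 of the time

# Numerical check: neither player gains by deviating unilaterally.
grid = np.linspace(0, 1, 101)
assert max(ev_bettor(b, c_star) for b in grid) <= ev_bettor(b_star, c_star) + 1e-9
assert min(ev_bettor(b_star, c) for c in grid) >= ev_bettor(b_star, c_star) - 1e-9
print(f"equilibrium bluff frequency: {b_star:.3f}")  # → 0.333
```

The point is that the equilibrium bluffing frequency is strictly positive: a bettor who never bluffs gets exploited (opponents simply fold to every bet) and earns less than the equilibrium value. Deception falls out of the math.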
Pluribus, the poker bot built by Facebook AI (now Meta) and Carnegie Mellon, took this idea further. Instead of solving poker upfront, its creators trained it through millions of games of self-play using a regret-minimization algorithm (Monte Carlo counterfactual regret minimization, or CFR). In essence, Pluribus tracked how much it regretted not taking each possible action and shifted its play toward the moves it most regretted missing.
And incredibly, this mathematical approach naturally led Pluribus to bluff. It bluffed so convincingly it made professional players fold winning hands.
It’s not just poker, either. In StarCraft II, DeepMind’s AlphaStar pulled the same trick, faking out pro players with feints and fake units. Nobody coded “bluffing” into it. Those deceptive plays just emerged as AlphaStar figured out how to win under the fog of war.
Meta’s CICERO bot, designed for the negotiation game Diplomacy, learned to sweet-talk, backstab, and outright lie to win—even when its creators tried to make it play nice. Turns out, if deception helps an AI win, it’ll find a way—even beyond what we intend.