Implementation Guide
Table of Contents
- Overview
- Choose Your Path
- Option A: Build From Scratch
- Option B: Use a Library
- Complete Kuhn Poker Implementation
- Testing Your Implementation
- Progression Path
- Resources
Overview
This guide walks you through building a working CFR (Counterfactual Regret Minimization) poker AI. By the end, you’ll have a trained agent that plays near-optimal Kuhn poker, and you’ll understand how to scale to larger games.
Prerequisites:
- Python 3.8+
- Basic understanding of game trees and probability
- Familiarity with recursion
Time estimates:
- Kuhn Poker from scratch: 2-4 hours
- Using OpenSpiel: 30 minutes
- Full Texas Hold’em: Weeks to months
Choose Your Path
| Approach | Pros | Cons | Best For |
|---|---|---|---|
| From Scratch | Deep understanding, full control | More work, easy to introduce bugs | Learning, research |
| OpenSpiel | Production-ready, many games | Heavier dependency, less intuitive | Research, benchmarking |
| pyCFR | Pure Python, readable | Not optimized, small games only | Learning, toy problems |
| RLCard | Easy API, multiple card games | Less game-theory focused | RL research |
Recommendation: Start from scratch with Kuhn Poker, then move to OpenSpiel for larger games.
Option A: Build From Scratch
Step 1: Environment Setup
```bash
# Create project directory
mkdir poker-cfr && cd poker-cfr

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install numpy
```
Step 2: Define the Game
First, understand Kuhn Poker:
- 3 cards: Jack (0), Queen (1), King (2)
- 2 players, each antes 1 chip
- One card dealt to each player
- Actions: pass (p) and bet (b); a pass checks, or folds when facing a bet, and a bet opens the betting, or calls when facing a bet
```python
from enum import IntEnum

class Card(IntEnum):
    JACK = 0
    QUEEN = 1
    KING = 2

class Action:
    PASS = 'p'  # Check, or fold when facing a bet
    BET = 'b'   # Bet 1 chip, or call when facing a bet

# Game constants
NUM_PLAYERS = 2
NUM_CARDS = 3
ANTE = 1
BET_SIZE = 1
```
Step 3: Implement Information Sets
```python
import numpy as np

class InformationSet:
    """Tracks regrets and strategies for a game state."""

    def __init__(self, num_actions: int):
        self.num_actions = num_actions
        self.regret_sum = np.zeros(num_actions)
        self.strategy_sum = np.zeros(num_actions)

    def get_strategy(self) -> np.ndarray:
        """Get current strategy via regret matching."""
        # Only consider positive regrets
        positive_regrets = np.maximum(self.regret_sum, 0)
        total = positive_regrets.sum()

        if total > 0:
            return positive_regrets / total
        # Uniform random if no positive regret
        return np.ones(self.num_actions) / self.num_actions

    def get_average_strategy(self) -> np.ndarray:
        """Get Nash equilibrium approximation."""
        total = self.strategy_sum.sum()
        if total > 0:
            return self.strategy_sum / total
        return np.ones(self.num_actions) / self.num_actions
```
Step 4: Implement CFR Traversal
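Before implementing the traversal, a quick sanity check (a throwaway snippet, not part of the project files) confirms that regret matching shifts probability toward the action with more accumulated positive regret and falls back to uniform when no regret is positive:

```python
info = InformationSet(num_actions=2)

info.regret_sum[:] = [3.0, 1.0]   # action 0 has more accumulated regret
print(info.get_strategy())        # [0.75 0.25]

info.regret_sum[:] = [-2.0, 0.0]  # no positive regret anywhere
print(info.get_strategy())        # [0.5 0.5] - uniform fallback
```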
```python
import numpy as np
from itertools import permutations
from information_set import InformationSet

NUM_PLAYERS = 2

class KuhnCFR:
    def __init__(self):
        self.info_sets = {}   # key -> InformationSet
        self.num_actions = 2  # check/fold or bet/call

    def get_info_set(self, card: int, history: str) -> InformationSet:
        """Get or create information set for card + history."""
        key = f"{card}:{history}"
        if key not in self.info_sets:
            self.info_sets[key] = InformationSet(self.num_actions)
        return self.info_sets[key]

    def is_terminal(self, history: str) -> bool:
        """Check if game is over."""
        if len(history) < 2:
            return False
        # Terminal states: pp (check-check), bb (bet-call),
        # bp (bet-fold), pbp (check-bet-fold), pbb (check-bet-call)
        return history in ['pp', 'bb', 'bp', 'pbp', 'pbb']

    def get_payoff(self, cards: list, history: str, player: int) -> float:
        """Get terminal payoff for player."""
        opponent = 1 - player

        if history == 'pp':
            # Both checked - higher card wins the ante
            return 1 if cards[player] > cards[opponent] else -1
        elif history == 'bp':
            # Player 0 bet, player 1 folded
            return 1 if player == 0 else -1
        elif history == 'pbp':
            # Check, bet, fold - player 1 wins the ante
            return 1 if player == 1 else -1
        else:
            # Showdown after betting (bb or pbb)
            pot = 2  # ante plus bet from each player
            return pot if cards[player] > cards[opponent] else -pot

    def cfr(self, cards: list, history: str, reach_probs: list, player: int) -> float:
        """
        CFR recursive traversal.

        Args:
            cards: [player0_card, player1_card]
            history: string of actions taken
            reach_probs: [p0_reach_prob, p1_reach_prob]
            player: player to update regrets for

        Returns:
            Expected value for the traversing player
        """
        current_player = len(history) % 2

        # Terminal node
        if self.is_terminal(history):
            return self.get_payoff(cards, history, player)

        info_set = self.get_info_set(cards[current_player], history)
        strategy = info_set.get_strategy()

        # Both actions are always available: 'p' means check (or fold
        # when facing a bet), 'b' means bet (or call when facing a bet)
        actions = ['p', 'b']
        action_values = np.zeros(self.num_actions)

        for i, action in enumerate(actions):
            new_history = history + action
            new_reach = reach_probs.copy()
            new_reach[current_player] *= strategy[i]
            action_values[i] = self.cfr(cards, new_history, new_reach, player)

        # Expected value under the current strategy
        node_value = np.dot(strategy, action_values)

        # Update regrets and average strategy for the traversing player
        if current_player == player:
            opponent = 1 - player
            for i in range(self.num_actions):
                regret = action_values[i] - node_value
                # Regrets are weighted by the opponent's reach probability
                info_set.regret_sum[i] += reach_probs[opponent] * regret
            # Average strategy is weighted by the player's own reach
            info_set.strategy_sum += reach_probs[player] * strategy

        return node_value

    def train(self, iterations: int) -> float:
        """Run CFR training. Returns the average game value for player 0."""
        cards = [0, 1, 2]  # Jack, Queen, King
        expected_value = 0

        for _ in range(iterations):
            # Iterate over all possible card deals
            for deal in permutations(cards, 2):
                deal = list(deal)
                # One traversal per player so both players' regrets update
                for player in range(NUM_PLAYERS):
                    value = self.cfr(deal, '', [1.0, 1.0], player)
                    if player == 0:
                        expected_value += value

        return expected_value / (iterations * 6)  # 6 equally likely deals
```
Step 5: Train and Evaluate
```python
from cfr import KuhnCFR

def main():
    cfr = KuhnCFR()

    # Train incrementally, printing strategies at each milestone
    print("Training CFR...")
    total = 0
    for i in [100, 1000, 10000, 100000]:
        cfr.train(i - total)  # bring the total up to i iterations
        total = i

        # Print strategies
        print(f"\n=== After {i} iterations ===")
        for key, info_set in sorted(cfr.info_sets.items()):
            strategy = info_set.get_average_strategy()
            print(f"{key}: Check/Fold={strategy[0]:.3f}, Bet/Call={strategy[1]:.3f}")

if __name__ == "__main__":
    main()
```
Expected Results
After 100,000 iterations, you should see strategies close to Nash equilibrium:
Player 1 with Jack:
- Initial: Check ~79%, Bet ~21% (bluff frequency)
- After check-bet: Fold 100%

Player 1 with Queen:
- Initial: Check 100%
- After check-bet: Fold ~67%, Call ~33%

Player 1 with King:
- Initial: Check ~67%, Bet ~33%
- After check-bet: Call 100%

The game value should converge to approximately -0.0556 (= -1/18). Note that Kuhn poker has a one-parameter family of first-player equilibria, so exact bet frequencies can vary slightly between runs; the game value is -1/18 in all of them.
Option B: Use a Library
OpenSpiel (Recommended for Research)
```bash
# Install
pip install open_spiel

# Or from source for latest features
git clone https://github.com/google-deepmind/open_spiel.git
cd open_spiel
./install.sh
```
Quick Start:
```python
import pyspiel
from open_spiel.python.algorithms import cfr

# Create Kuhn Poker game
game = pyspiel.load_game("kuhn_poker")

# Create CFR solver
cfr_solver = cfr.CFRSolver(game)

# Train
for i in range(10000):
    cfr_solver.evaluate_and_update_policy()

# Get average policy (Nash equilibrium approximation)
average_policy = cfr_solver.average_policy()

# Print strategy for info state "0" (Player 0 with Jack, no history)
print(average_policy.action_probabilities("0"))
```
Available Poker Games:
- `kuhn_poker` - 3-card simplified poker
- `leduc_poker` - 6 cards, 2 betting rounds
- `liars_dice` - a related imperfect-information game
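OpenSpiel also ships an exploitability metric that is convenient for tracking convergence. A minimal sketch, assuming the `game` and `cfr_solver` objects from the quick start above (check the signature against your installed version):

```python
from open_spiel.python.algorithms import exploitability

# Exploitability of the average policy; approaches 0 at a Nash equilibrium
expl = exploitability.exploitability(game, cfr_solver.average_policy())
print(f"Exploitability: {expl:.6f}")
```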
pyCFR (Learning-Focused)
```bash
git clone https://github.com/tansey/pycfr.git
cd pycfr
pip install -e .
```

```python
from pokergames import *
from pokercfr import *

# Create Kuhn poker
game = kuhn_poker()

# Create CFR solver
cfr = CounterfactualRegretMinimizer(game)

# Train
for i in range(10000):
    cfr.run()

# Get strategy
strategy = cfr.profile
```
RLCard (RL + Game Theory)
```bash
pip install rlcard
```

```python
import rlcard
from rlcard.agents import CFRAgent

# Create environment (CFR needs the ability to step back through the tree)
env = rlcard.make('leduc-holdem', config={'allow_step_back': True})

# Create CFR agent
agent = CFRAgent(env, model_path='./cfr_model')

# Train
for episode in range(10000):
    agent.train()

# Save model
agent.save()
```
Complete Kuhn Poker Implementation
Here’s a complete, self-contained single-file implementation:
```python
#!/usr/bin/env python3
"""
Complete Kuhn Poker CFR Implementation
Run: python kuhn_cfr_complete.py
"""

import numpy as np
from itertools import permutations
from typing import Dict, List, Tuple


class KuhnPokerCFR:
    """CFR solver for Kuhn Poker."""

    PASS = 0
    BET = 1
    NUM_ACTIONS = 2

    def __init__(self):
        self.regret_sum: Dict[str, np.ndarray] = {}
        self.strategy_sum: Dict[str, np.ndarray] = {}
        self.iterations = 0

    def _get_strategy(self, info_set: str) -> np.ndarray:
        """Get current strategy via regret matching."""
        if info_set not in self.regret_sum:
            self.regret_sum[info_set] = np.zeros(self.NUM_ACTIONS)
            self.strategy_sum[info_set] = np.zeros(self.NUM_ACTIONS)

        regrets = np.maximum(self.regret_sum[info_set], 0)
        total = regrets.sum()

        if total > 0:
            return regrets / total
        return np.ones(self.NUM_ACTIONS) / self.NUM_ACTIONS

    def _get_average_strategy(self, info_set: str) -> np.ndarray:
        """Get Nash equilibrium approximation."""
        if info_set not in self.strategy_sum:
            return np.ones(self.NUM_ACTIONS) / self.NUM_ACTIONS

        total = self.strategy_sum[info_set].sum()
        if total > 0:
            return self.strategy_sum[info_set] / total
        return np.ones(self.NUM_ACTIONS) / self.NUM_ACTIONS

    def _cfr(self, cards: List[int], history: str, p0: float, p1: float) -> float:
        """
        CFR recursive traversal (negamax form).
        Returns the expected value for the player to act at `history`.
        """
        plays = len(history)
        player = plays % 2
        opponent = 1 - player

        # Check for terminal states
        if plays >= 2:
            terminal_pass = history[-1] == 'p'
            double_bet = history[-2:] == 'bb'
            is_player_card_higher = cards[player] > cards[opponent]

            if terminal_pass:
                if history == 'pp':
                    # Both passed - higher card wins the ante
                    return 1 if is_player_card_higher else -1
                # Opponent folded to a bet - current player wins the ante
                return 1
            if double_bet:
                # Showdown after bet and call - winner nets ante + bet
                return 2 if is_player_card_higher else -2

        info_set = str(cards[player]) + history
        strategy = self._get_strategy(info_set)

        # Calculate action utilities
        util = np.zeros(self.NUM_ACTIONS)
        node_util = 0

        for a in range(self.NUM_ACTIONS):
            next_history = history + ('p' if a == 0 else 'b')

            # Negate: the child value is from the opponent's perspective
            if player == 0:
                util[a] = -self._cfr(cards, next_history, p0 * strategy[a], p1)
            else:
                util[a] = -self._cfr(cards, next_history, p0, p1 * strategy[a])

            node_util += strategy[a] * util[a]

        # Update regrets, weighted by the opponent's reach probability
        for a in range(self.NUM_ACTIONS):
            regret = util[a] - node_util
            if player == 0:
                self.regret_sum[info_set][a] += p1 * regret
            else:
                self.regret_sum[info_set][a] += p0 * regret

        # Update strategy sum, weighted by the player's own reach probability
        reach_prob = p0 if player == 0 else p1
        self.strategy_sum[info_set] += reach_prob * strategy

        return node_util

    def train(self, iterations: int) -> float:
        """Train for specified iterations. Returns game value for player 0."""
        cards = [0, 1, 2]  # J, Q, K
        util = 0

        for _ in range(iterations):
            for deal in permutations(cards, 2):
                util += self._cfr(list(deal), '', 1, 1)
            self.iterations += 1

        return util / (iterations * 6)  # 6 possible deals

    def get_strategy_profile(self) -> Dict[str, Tuple[float, float]]:
        """Get all Nash equilibrium strategies."""
        profile = {}
        for info_set in sorted(self.strategy_sum.keys()):
            strategy = self._get_average_strategy(info_set)
            profile[info_set] = (strategy[0], strategy[1])
        return profile

    def play_vs_human(self):
        """Interactive game against the trained bot."""
        import random

        cards = [0, 1, 2]
        card_names = ['Jack', 'Queen', 'King']

        while True:
            random.shuffle(cards)
            human_card, bot_card = cards[0], cards[1]

            print(f"\nYour card: {card_names[human_card]}")
            print("You go first. Enter 'p' to pass or 'b' to bet:")

            history = ''

            while True:
                player = len(history) % 2

                if player == 0:  # Human
                    action = input("> ").strip().lower()
                    if action not in ['p', 'b']:
                        print("Invalid. Enter 'p' or 'b'")
                        continue
                else:  # Bot
                    info_set = str(bot_card) + history
                    strategy = self._get_average_strategy(info_set)
                    action = 'p' if random.random() < strategy[0] else 'b'
                    print(f"Bot: {action}")

                history += action

                # Check terminal
                if len(history) >= 2:
                    if history[-1] == 'p':
                        if history == 'pp':
                            winner = 'You' if human_card > bot_card else 'Bot'
                            print(f"Showdown! Bot had {card_names[bot_card]}. {winner} wins!")
                        else:
                            # The player who just passed is the one who folded
                            folder = (len(history) - 1) % 2
                            winner = 'Bot' if folder == 0 else 'You'
                            print(f"Fold! {winner} wins!")
                        break
                    elif history[-2:] == 'bb':
                        winner = 'You' if human_card > bot_card else 'Bot'
                        print(f"Showdown! Bot had {card_names[bot_card]}. {winner} wins the pot of 4!")
                        break

            if input("\nPlay again? (y/n): ").lower() != 'y':
                break


def main():
    print("=== Kuhn Poker CFR Training ===\n")

    cfr = KuhnPokerCFR()

    # Training with progress
    for i in [1000, 10000, 100000]:
        print(f"Training to {i} iterations...")
        cfr.train(i - cfr.iterations)

    # Display results
    print("\n=== Nash Equilibrium Strategies ===")
    print("Info Set | Pass/Fold | Bet/Call")
    print("-" * 35)

    for info_set, (pass_prob, bet_prob) in cfr.get_strategy_profile().items():
        card = ['J', 'Q', 'K'][int(info_set[0])]
        history = info_set[1:] if len(info_set) > 1 else 'start'
        print(f"{card}:{history:6} | {pass_prob:.3f} | {bet_prob:.3f}")

    # Theoretical game value
    print(f"\nExpected game value: ~-0.0556 (= -1/18)")
    print("Player 2 has a slight advantage from position.\n")

    # Offer to play
    if input("Play against the bot? (y/n): ").lower() == 'y':
        cfr.play_vs_human()


if __name__ == "__main__":
    main()
```
Testing Your Implementation
1. Verify Convergence
Track exploitability over iterations; it should decrease toward zero as the average strategy approaches equilibrium:
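For Kuhn specifically, exploitability can be computed by brute force. The sketch below assumes the `KuhnPokerCFR` class from the complete implementation above; the `_ev` helper is introduced here for illustration. It enumerates all 2^6 pure strategies per player, which is feasible only because each player has just six information sets:

```python
from itertools import permutations, product

def _ev(cfr, cards, history, br_player, pure):
    """Expected value for player 0 when br_player follows the pure
    strategy `pure` (info set -> action index) and the other player
    follows cfr's average strategy."""
    plays = len(history)
    if plays >= 2:
        if history[-1] == 'p':
            if history == 'pp':
                return 1 if cards[0] > cards[1] else -1
            # A fold: the player who just passed loses the ante
            return -1 if (plays - 1) % 2 == 0 else 1
        if history[-2:] == 'bb':
            return 2 if cards[0] > cards[1] else -2
    player = plays % 2
    info_set = str(cards[player]) + history
    if player == br_player:
        a = pure[info_set]
        return _ev(cfr, cards, history + 'pb'[a], br_player, pure)
    strategy = cfr._get_average_strategy(info_set)
    return sum(strategy[a] * _ev(cfr, cards, history + 'pb'[a], br_player, pure)
               for a in range(2))

def compute_exploitability(cfr) -> float:
    """Compute how exploitable the average strategy is (0 at equilibrium)."""
    deals = [list(d) for d in permutations([0, 1, 2], 2)]
    # Player 0 acts at histories '' and 'pb'; player 1 at 'p' and 'b'
    info_sets = {0: [f"{c}{h}" for c in range(3) for h in ('', 'pb')],
                 1: [f"{c}{h}" for c in range(3) for h in ('p', 'b')]}
    br_value = {}
    for br_player in (0, 1):
        keys = info_sets[br_player]
        best = float('-inf')
        # Some pure strategy always attains the best-response value
        for choice in product(range(2), repeat=len(keys)):
            pure = dict(zip(keys, choice))
            ev0 = sum(_ev(cfr, d, '', br_player, pure) for d in deals) / len(deals)
            best = max(best, ev0 if br_player == 0 else -ev0)
        br_value[br_player] = best
    return (br_value[0] + br_value[1]) / 2
```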
A full best-response oracle is needed to do this in larger games; the brute-force enumeration above works only because Kuhn's strategy space is tiny.
2. Check Known Equilibrium Values
For Kuhn Poker:
- Game value: -1/18 ≈ -0.0556
- P1 with Jack bets ~21% (bluff)
- P1 with King bets ~33% (value)
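A quick check of the game value, assuming the `KuhnPokerCFR` class above (its `train()` returns the average value for player 0):

```python
cfr = KuhnPokerCFR()
game_value = cfr.train(100000)

# Should approach -1/18 for player 0
assert abs(game_value - (-1 / 18)) < 0.01
```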
3. Unit Tests
```python
def test_terminal_payoffs():
    cfr = KuhnPokerCFR()

    # King vs Jack, both pass -> King wins 1
    assert cfr._cfr([2, 0], 'pp', 1, 1) == 1

    # Jack vs King, bet then fold -> Jack wins 1
    assert cfr._cfr([0, 2], 'bp', 1, 1) == 1


def test_convergence():
    cfr = KuhnPokerCFR()
    cfr.train(100000)

    # Jack's bluff frequency should land in the equilibrium range
    strategy = cfr._get_average_strategy('0')  # Jack, no history
    assert 0.15 < strategy[1] < 0.35  # Bet probability
```
Progression Path
Level 1: Kuhn Poker (Hours)
- 12 information sets
- 3 cards
- Full traversal feasible
- Goal: Understand CFR mechanics
Level 2: Leduc Poker (Days)
- ~300 information sets
- 6 cards (2 suits × 3 ranks)
- 2 betting rounds
- Goal: Scale basic CFR
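The OpenSpiel loop from Option B carries over unchanged; only the game name differs:

```python
import pyspiel
from open_spiel.python.algorithms import cfr

# Same API as Kuhn, just a much larger game tree
game = pyspiel.load_game("leduc_poker")
solver = cfr.CFRSolver(game)
for _ in range(1000):
    solver.evaluate_and_update_policy()
```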
Level 3: Limit Texas Hold’em (Weeks)
- ~10^12 information sets
- Requires abstraction
- Use MCCFR (sampling); see the sketch after this list
- Goal: Learn game abstraction
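The simplest sampling variant is chance sampling: draw one random deal per iteration instead of enumerating all six. A sketch, assuming the `KuhnPokerCFR` class above (external sampling additionally samples the opponent's actions):

```python
import random

def train_chance_sampled(cfr, iterations):
    """Chance-sampling MCCFR: one random deal per iteration."""
    deck = [0, 1, 2]
    util = 0.0
    for _ in range(iterations):
        random.shuffle(deck)
        util += cfr._cfr(deck[:2], '', 1, 1)
        cfr.iterations += 1
    return util / iterations  # noisy estimate of the game value
```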
Level 4: No-Limit Texas Hold’em (Months)
- ~10^160 information sets
- Requires:
- Card abstraction (hand bucketing)
- Action abstraction (limit bet sizes; see the sketch after this list)
- MCCFR with external sampling
- Blueprint + real-time solving
- Goal: Research-level implementation
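To make the action-abstraction bullet concrete, here is a hypothetical helper (the function name and size choices are illustrative assumptions, not from any published system) that collapses the continuous no-limit bet space to a few discrete sizes:

```python
from typing import List

def candidate_bets(pot: int, stack: int) -> List[int]:
    """Restrict bets to half pot, full pot, and all-in."""
    sizes = {pot // 2, pot, stack}
    return sorted(s for s in sizes if 0 < s <= stack)
```

Off-tree opponent bets are then mapped to the nearest abstract size at solve time (action translation).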
Resources
Tutorials
- AI Poker Tutorial - Comprehensive course
- Vanilla CFR for Engineers - Clear Python walkthrough
- labml.ai CFR - Annotated implementation
Libraries
- OpenSpiel - Google DeepMind’s game research framework
- pyCFR - Pure Python CFR for learning
- RLCard - RL toolkit for card games
- PokerRL - Deep RL for poker
Papers
- Zinkevich et al. (2007) - Original CFR paper
- Lanctot et al. (2009) - Monte Carlo CFR
- Tammelin (2014) - CFR+
- Brown & Sandholm (2019) - Pluribus
GitHub Implementations
- simple-poker-cfr - C++ simplified Hold’em
- poker-cfr - Rust Hold’em implementation
- KuhnPoker - Python Kuhn with exploitability