Table of Contents

  1. Overview
  2. Choose Your Path
  3. Option A: Build From Scratch
  4. Option B: Use a Library
  5. Complete Kuhn Poker Implementation
  6. Testing Your Implementation
  7. Progression Path
  8. Resources

Overview

This guide walks you through building a working CFR (Counterfactual Regret Minimization) poker AI. By the end, you’ll have a trained agent that plays a near-optimal Kuhn Poker strategy and understand how to scale to larger games.

Prerequisites:

  • Python 3.8+
  • Basic understanding of game trees and probability
  • Familiarity with recursion

Time estimates:

  • Kuhn Poker from scratch: 2-4 hours
  • Using OpenSpiel: 30 minutes
  • Full Texas Hold’em: Weeks to months

Choose Your Path

Approach     | Pros                             | Cons                               | Best For
-------------|----------------------------------|------------------------------------|-----------------------
From Scratch | Deep understanding, full control | More work, easy to introduce bugs  | Learning, research
OpenSpiel    | Production-ready, many games     | Heavier dependency, less intuitive | Research, benchmarking
pyCFR        | Pure Python, readable            | Not optimized, small games only    | Learning, toy problems
RLCard       | Easy API, multiple card games    | Less game-theory focused           | RL research

Recommendation: Start from scratch with Kuhn Poker, then move to OpenSpiel for larger games.

Option A: Build From Scratch

Step 1: Environment Setup

Terminal window
# Create project directory
mkdir poker-cfr && cd poker-cfr
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or: venv\Scripts\activate # Windows
# Install dependencies
pip install numpy

Step 2: Define the Game

First, understand Kuhn Poker:

  • 3 cards: Jack (0), Queen (1), King (2)
  • 2 players, each antes 1 chip
  • One card dealt to each player
  • Actions: pass (p) or bet (b). A pass facing a bet is a fold; a bet facing a bet is a call
kuhn_poker.py

from enum import IntEnum


class Card(IntEnum):
    JACK = 0
    QUEEN = 1
    KING = 2


class Action:
    PASS = 'p'  # Check, or fold when facing a bet
    BET = 'b'   # Bet 1 chip, or call when facing a bet


# Game constants
NUM_PLAYERS = 2
NUM_CARDS = 3
ANTE = 1
BET_SIZE = 1

Step 3: Implement Information Sets

information_set.py

import numpy as np


class InformationSet:
    """Tracks regrets and strategies for a game state."""

    def __init__(self, num_actions: int):
        self.num_actions = num_actions
        self.regret_sum = np.zeros(num_actions)
        self.strategy_sum = np.zeros(num_actions)

    def get_strategy(self) -> np.ndarray:
        """Get current strategy via regret matching."""
        # Only consider positive regrets
        positive_regrets = np.maximum(self.regret_sum, 0)
        total = positive_regrets.sum()
        if total > 0:
            return positive_regrets / total
        # Uniform random if no positive regret
        return np.ones(self.num_actions) / self.num_actions

    def get_average_strategy(self) -> np.ndarray:
        """Get Nash equilibrium approximation."""
        total = self.strategy_sum.sum()
        if total > 0:
            return self.strategy_sum / total
        return np.ones(self.num_actions) / self.num_actions
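
A quick sanity check of regret matching, using made-up regret values: positive regret concentrates the strategy on the regretted action, and all-negative regret falls back to uniform.

>>> info = InformationSet(2)
>>> info.regret_sum[:] = [3.0, -1.0]
>>> info.get_strategy()   # only positive regrets count
array([1., 0.])
>>> info.regret_sum[:] = [-2.0, -1.0]
>>> info.get_strategy()   # no positive regret -> uniform
array([0.5, 0.5])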

Step 4: Implement CFR Traversal

cfr.py

import numpy as np
from itertools import permutations

from information_set import InformationSet

NUM_PLAYERS = 2


class KuhnCFR:
    def __init__(self):
        self.info_sets = {}   # key -> InformationSet
        self.num_actions = 2  # pass (check/fold) or bet (bet/call)

    def get_info_set(self, card: int, history: str) -> InformationSet:
        """Get or create information set for card + history."""
        key = f"{card}:{history}"
        if key not in self.info_sets:
            self.info_sets[key] = InformationSet(self.num_actions)
        return self.info_sets[key]

    def is_terminal(self, history: str) -> bool:
        """Check if game is over."""
        if len(history) < 2:
            return False
        # Terminal states: pp (check-check), bb (bet-call),
        # bp (bet-fold), pbp (check-bet-fold), pbb (check-bet-call)
        return history in ['pp', 'bb', 'bp', 'pbp', 'pbb']

    def get_payoff(self, cards: list, history: str, player: int) -> float:
        """Get terminal payoff for player."""
        opponent = 1 - player
        if history == 'pp':  # Both checked - higher card wins the ante
            return 1 if cards[player] > cards[opponent] else -1
        elif history == 'bp':  # Player 0 bet, player 1 folded
            return 1 if player == 0 else -1
        elif history == 'pbp':  # Check, bet, fold
            return 1 if player == 1 else -1
        else:  # Showdown after a bet and a call (bb or pbb)
            pot = 2  # ante + bet
            return pot if cards[player] > cards[opponent] else -pot

    def cfr(self, cards: list, history: str, reach_probs: list, player: int) -> float:
        """
        CFR recursive traversal.
        Args:
            cards: [player0_card, player1_card]
            history: string of actions taken
            reach_probs: [p0_reach_prob, p1_reach_prob]
            player: player to update regrets for
        Returns:
            Expected value for the traversing player
        """
        current_player = len(history) % 2

        # Terminal node
        if self.is_terminal(history):
            return self.get_payoff(cards, history, player)

        info_set = self.get_info_set(cards[current_player], history)
        strategy = info_set.get_strategy()

        # Both actions are always available: 'p' means check (or fold
        # facing a bet), 'b' means bet (or call facing a bet).
        actions = ['p', 'b']
        action_values = np.zeros(self.num_actions)
        for i, action in enumerate(actions):
            # Scale the acting player's reach probability by the
            # probability of taking this action
            new_reach = reach_probs.copy()
            new_reach[current_player] *= strategy[i]
            action_values[i] = self.cfr(cards, history + action, new_reach, player)

        # Expected value under the current strategy
        node_value = np.dot(strategy, action_values)

        # Update regrets only on the traversing player's turns
        if current_player == player:
            opponent = 1 - player
            for i in range(self.num_actions):
                regret = action_values[i] - node_value
                # Regrets are weighted by the opponent's (counterfactual) reach
                info_set.regret_sum[i] += reach_probs[opponent] * regret
            # The average strategy is weighted by our own reach
            info_set.strategy_sum += reach_probs[player] * strategy

        return node_value

    def train(self, iterations: int) -> float:
        """Run CFR training. Returns the average game value for player 0."""
        cards = [0, 1, 2]  # Jack, Queen, King
        expected_value = 0.0
        for _ in range(iterations):
            # Iterate over all 6 possible card deals
            for deal in permutations(cards, 2):
                deal = list(deal)
                # Traverse once per player so both players' regrets update
                for player in range(NUM_PLAYERS):
                    value = self.cfr(deal, '', [1.0, 1.0], player)
                    if player == 0:
                        # Accumulate player 0's value only; summing both
                        # players' values would cancel to ~0 (zero-sum)
                        expected_value += value
        return expected_value / (iterations * 6)  # average over 6 deals
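
A quick smoke test of the trainer (assuming the two files above sit in the same directory): the returned average value for player 0 should approach -1/18 ≈ -0.0556.

from cfr import KuhnCFR

trainer = KuhnCFR()
print(trainer.train(10000))  # roughly -0.05 to -0.06 after enough iterations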

Step 5: Train and Evaluate

main.py

from cfr import KuhnCFR


def main():
    cfr = KuhnCFR()
    print("Training CFR...")
    trained = 0
    for i in [100, 1000, 10000, 100000]:
        cfr.train(i - trained)  # top up to i total iterations
        trained = i
        # Print strategies
        print(f"\n=== After {i} iterations ===")
        for key, info_set in sorted(cfr.info_sets.items()):
            strategy = info_set.get_average_strategy()
            print(f"{key}: Check/Fold={strategy[0]:.3f}, Bet/Call={strategy[1]:.3f}")


if __name__ == "__main__":
    main()

Expected Results

After 100,000 iterations, you should see strategies close to a Nash equilibrium. Kuhn Poker has a one-parameter family of equilibria, indexed by player 1's bluff frequency α ∈ [0, 1/3] with the Jack; the other frequencies are tied to it, and CFR may converge to any member of the family, so exact numbers vary between runs:

Player 1 with Jack:
  • Initial: Bet (bluff) with probability α, otherwise check
  • After check-bet: Fold 100%

Player 1 with Queen:
  • Initial: Check 100%
  • After check-bet: Call with probability α + 1/3, otherwise fold

Player 1 with King:
  • Initial: Bet with probability 3α, otherwise check
  • After check-bet: Call 100%

For example, a run that converges to α ≈ 0.2 bets the Jack ~20% and the King ~60% of the time. The game value for player 1 should converge to approximately -0.0556 (= -1/18) regardless of which equilibrium is reached.
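
Because the equilibrium is a family rather than a single point, a more robust check than exact percentages is to test the α relationships on your trained strategies. A minimal sketch, assuming the KuhnCFR trainer from Step 4 and its "card:history" info-set keys (the 0.05 tolerances are arbitrary):

from cfr import KuhnCFR

cfr = KuhnCFR()
cfr.train(100000)

alpha = cfr.info_sets['0:'].get_average_strategy()[1]         # Jack bet frequency
king_bet = cfr.info_sets['2:'].get_average_strategy()[1]      # King bet frequency
queen_call = cfr.info_sets['1:pb'].get_average_strategy()[1]  # Queen call frequency

assert 0.0 <= alpha <= 1 / 3 + 0.05
assert abs(king_bet - 3 * alpha) < 0.05
assert abs(queen_call - (alpha + 1 / 3)) < 0.05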

Option B: Use a Library

OpenSpiel (Recommended)

Terminal window
# Install
pip install open_spiel
# Or from source for latest features
git clone https://github.com/google-deepmind/open_spiel.git
cd open_spiel
./install.sh

Quick Start:

import pyspiel
from open_spiel.python.algorithms import cfr

# Create Kuhn Poker game
game = pyspiel.load_game("kuhn_poker")

# Create CFR solver
cfr_solver = cfr.CFRSolver(game)

# Train
for i in range(10000):
    cfr_solver.evaluate_and_update_policy()

# Get average policy (Nash equilibrium approximation)
average_policy = cfr_solver.average_policy()

# Inspect player 0's strategy with the Jack and no betting history.
# action_probabilities expects a state, so play out the two chance
# deals first (chance action 0 = Jack, 1 = Queen, 2 = King).
state = game.new_initial_state()
state.apply_action(0)  # deal Jack to player 0
state.apply_action(1)  # deal Queen to player 1
print(average_policy.action_probabilities(state))

Available Poker Games:

  • kuhn_poker - 3-card simplified poker
  • leduc_poker - 6-card, 2 betting rounds
  • liars_dice - Related imperfect info game
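
You can also query the full registry at runtime to discover other imperfect-information games; a quick sketch:

import pyspiel

# Print the short name of every game OpenSpiel registers
for game_type in pyspiel.registered_games():
    print(game_type.short_name)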

pyCFR (Learning-Focused)

Terminal window
git clone https://github.com/tansey/pycfr.git
cd pycfr
pip install -e .
from pokergames import *
from pokercfr import *
# Create Kuhn poker
game = kuhn_poker()
# Create CFR solver
cfr = CounterfactualRegretMinimizer(game)
# Train
for i in range(10000):
cfr.run()
# Get strategy
strategy = cfr.profile

RLCard (RL + Game Theory)

Terminal window
pip install rlcard

Quick Start:

import rlcard
from rlcard.agents import CFRAgent

# Create environment (CFR's tree traversal needs step_back enabled)
env = rlcard.make('leduc-holdem', config={'allow_step_back': True})

# Create CFR agent
agent = CFRAgent(env, model_path='./cfr_model')

# Train
for episode in range(10000):
    agent.train()

# Save model
agent.save()
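
To sanity-check training, you can pit the agent against a random baseline with rlcard's tournament utility. A sketch, assuming the agent trained above (1,000 evaluation games is an arbitrary choice):

from rlcard.agents import RandomAgent
from rlcard.utils import tournament

# Separate evaluation environment; step_back is not needed for play
eval_env = rlcard.make('leduc-holdem')
eval_env.set_agents([agent, RandomAgent(num_actions=eval_env.num_actions)])

# Prints the average payoff per game for each seat;
# the CFR agent's should be positive
print(tournament(eval_env, 1000))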

Complete Kuhn Poker Implementation

Here’s a complete, self-contained single-file implementation:

#!/usr/bin/env python3
"""
Complete Kuhn Poker CFR Implementation
Run: python kuhn_cfr_complete.py
"""
import numpy as np
from itertools import permutations
from typing import Dict, List, Tuple


class KuhnPokerCFR:
    """CFR solver for Kuhn Poker."""

    PASS = 0
    BET = 1
    NUM_ACTIONS = 2

    def __init__(self):
        self.regret_sum: Dict[str, np.ndarray] = {}
        self.strategy_sum: Dict[str, np.ndarray] = {}
        self.iterations = 0

    def _get_strategy(self, info_set: str) -> np.ndarray:
        """Get current strategy via regret matching."""
        if info_set not in self.regret_sum:
            self.regret_sum[info_set] = np.zeros(self.NUM_ACTIONS)
            self.strategy_sum[info_set] = np.zeros(self.NUM_ACTIONS)
        regrets = np.maximum(self.regret_sum[info_set], 0)
        total = regrets.sum()
        if total > 0:
            return regrets / total
        return np.ones(self.NUM_ACTIONS) / self.NUM_ACTIONS

    def _get_average_strategy(self, info_set: str) -> np.ndarray:
        """Get Nash equilibrium approximation."""
        if info_set not in self.strategy_sum:
            return np.ones(self.NUM_ACTIONS) / self.NUM_ACTIONS
        total = self.strategy_sum[info_set].sum()
        if total > 0:
            return self.strategy_sum[info_set] / total
        return np.ones(self.NUM_ACTIONS) / self.NUM_ACTIONS

    def _cfr(self, cards: List[int], history: str,
             p0: float, p1: float) -> float:
        """
        CFR recursive traversal in negamax form: the return value is
        the expected payoff from the perspective of the player whose
        turn it is at `history`.
        """
        plays = len(history)
        player = plays % 2
        opponent = 1 - player

        # Check for terminal states
        if plays >= 2:
            terminal_pass = history[-1] == 'p'
            double_bet = history[-2:] == 'bb'
            player_card_higher = cards[player] > cards[opponent]
            if terminal_pass:
                if history == 'pp':
                    # Both passed - higher card wins the ante
                    return 1 if player_card_higher else -1
                # Opponent folded to a bet - current player wins the ante
                return 1
            elif double_bet:
                # Showdown after bet and call - winner takes ante + bet
                return 2 if player_card_higher else -2

        info_set = str(cards[player]) + history
        strategy = self._get_strategy(info_set)

        # Calculate action utilities (negated: each child value is from
        # the opponent's perspective)
        util = np.zeros(self.NUM_ACTIONS)
        node_util = 0.0
        for a in range(self.NUM_ACTIONS):
            next_history = history + ('p' if a == self.PASS else 'b')
            if player == 0:
                util[a] = -self._cfr(cards, next_history,
                                     p0 * strategy[a], p1)
            else:
                util[a] = -self._cfr(cards, next_history,
                                     p0, p1 * strategy[a])
            node_util += strategy[a] * util[a]

        # Update regrets, weighted by the opponent's reach probability
        for a in range(self.NUM_ACTIONS):
            regret = util[a] - node_util
            if player == 0:
                self.regret_sum[info_set][a] += p1 * regret
            else:
                self.regret_sum[info_set][a] += p0 * regret

        # Update strategy sum, weighted by our own reach probability
        reach_prob = p0 if player == 0 else p1
        self.strategy_sum[info_set] += reach_prob * strategy

        return node_util

    def train(self, iterations: int) -> float:
        """Train for specified iterations. Returns average game value."""
        cards = [0, 1, 2]  # J, Q, K
        util = 0.0
        for _ in range(iterations):
            for deal in permutations(cards, 2):
                util += self._cfr(list(deal), '', 1, 1)
            self.iterations += 1
        return util / (iterations * 6)  # 6 possible deals

    def get_strategy_profile(self) -> Dict[str, Tuple[float, float]]:
        """Get all Nash equilibrium strategies."""
        profile = {}
        for info_set in sorted(self.strategy_sum.keys()):
            strategy = self._get_average_strategy(info_set)
            profile[info_set] = (strategy[0], strategy[1])
        return profile

    def play_vs_human(self):
        """Interactive game against the trained bot."""
        import random
        cards = [0, 1, 2]
        card_names = ['Jack', 'Queen', 'King']
        while True:
            random.shuffle(cards)
            human_card, bot_card = cards[0], cards[1]
            print(f"\nYour card: {card_names[human_card]}")
            print("You go first. Enter 'p' to pass or 'b' to bet:")
            history = ''
            while True:
                player = len(history) % 2
                if player == 0:  # Human
                    action = input("> ").strip().lower()
                    if action not in ['p', 'b']:
                        print("Invalid. Enter 'p' or 'b'")
                        continue
                else:  # Bot
                    info_set = str(bot_card) + history
                    strategy = self._get_average_strategy(info_set)
                    action = 'p' if random.random() < strategy[0] else 'b'
                    print(f"Bot: {action}")
                history += action
                # Check terminal
                if len(history) >= 2:
                    if history[-1] == 'p':
                        if history == 'pp':
                            winner = 'You' if human_card > bot_card else 'Bot'
                            print(f"Showdown! Bot had {card_names[bot_card]}. {winner} wins!")
                        else:
                            # 'bp': the bot folded; 'pbp': you folded
                            winner = 'You' if history == 'bp' else 'Bot'
                            print(f"Fold! {winner} wins!")
                        break
                    elif history[-2:] == 'bb':
                        winner = 'You' if human_card > bot_card else 'Bot'
                        print(f"Showdown! Bot had {card_names[bot_card]}. "
                              f"{winner} wins the pot of 4!")
                        break
            if input("\nPlay again? (y/n): ").lower() != 'y':
                break


def main():
    print("=== Kuhn Poker CFR Training ===\n")
    cfr = KuhnPokerCFR()

    # Training with progress
    for i in [1000, 10000, 100000]:
        print(f"Training to {i} iterations...")
        cfr.train(i - cfr.iterations)

    # Display results
    print("\n=== Nash Equilibrium Strategies ===")
    print("Info Set | Pass/Fold | Bet/Call")
    print("-" * 35)
    for info_set, (pass_prob, bet_prob) in cfr.get_strategy_profile().items():
        card = ['J', 'Q', 'K'][int(info_set[0])]
        history = info_set[1:] if len(info_set) > 1 else 'start'
        print(f"{card}:{history:6} | {pass_prob:.3f} | {bet_prob:.3f}")

    # Theoretical game value
    print("\nExpected game value: ~-0.0556 (= -1/18)")
    print("Player 2 has a slight advantage from position.\n")

    # Offer to play
    if input("Play against the bot? (y/n): ").lower() == 'y':
        cfr.play_vs_human()


if __name__ == "__main__":
    main()

Testing Your Implementation

1. Verify Convergence

Track exploitability over iterations; it should decrease toward zero:

def compute_exploitability(cfr):
    """Compute how exploitable the strategy is."""
    # A full implementation requires a best-response oracle:
    # compute each player's best response to the average strategy
    # and sum the gains (NashConv).
    pass
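
If you'd rather not build the oracle yourself, OpenSpiel ships one. A sketch that tracks exploitability of its CFR solver during training (the print interval is arbitrary):

import pyspiel
from open_spiel.python.algorithms import cfr, exploitability

game = pyspiel.load_game("kuhn_poker")
solver = cfr.CFRSolver(game)

for i in range(1, 1001):
    solver.evaluate_and_update_policy()
    if i % 100 == 0:
        # Sum of best-response gains; approaches 0 at equilibrium
        expl = exploitability.exploitability(game, solver.average_policy())
        print(f"iter {i}: exploitability = {expl:.6f}")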

2. Check Known Equilibrium Values

For Kuhn Poker:

  • Game value for player 1: -1/18 ≈ -0.0556
  • P1 bluffs with the Jack at some rate α ∈ [0, 1/3]
  • P1 value-bets the King at rate 3α and calls with the Queen at rate α + 1/3

3. Unit Tests

def test_terminal_payoffs():
    cfr = KuhnPokerCFR()
    # King vs Jack, both pass -> King (player 0) wins 1
    assert cfr._cfr([2, 0], 'pp', 1, 1) == 1
    # Jack bets, King folds -> Jack (player 0) wins 1
    assert cfr._cfr([0, 2], 'bp', 1, 1) == 1


def test_convergence():
    cfr = KuhnPokerCFR()
    cfr.train(100000)
    # Jack's equilibrium bluff frequency is some alpha in [0, 1/3]
    strategy = cfr._get_average_strategy('0')  # Jack, no history
    assert strategy[1] <= 1 / 3 + 0.05  # Bet probability
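
These tests assume the complete implementation above is importable from the test file (test_kuhn.py is a hypothetical name). With pytest installed, they run directly:

Terminal window
pip install pytest
pytest test_kuhn.py -v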

Progression Path

Level 1: Kuhn Poker (Hours)

  • 12 information sets
  • 3 cards
  • Full traversal feasible
  • Goal: Understand CFR mechanics

Level 2: Leduc Poker (Days)

  • ~300 information sets
  • 6 cards (2 suits × 3 ranks)
  • 2 betting rounds
  • Goal: Scale basic CFR

Level 3: Limit Texas Hold’em (Weeks)

  • ~10^13 to 10^14 information sets
  • Requires abstraction
  • Use MCCFR (sampling; see the sketch after Level 4)
  • Goal: Learn game abstraction

Level 4: No-Limit Texas Hold’em (Months)

  • ~10^160 information sets
  • Requires:
    • Card abstraction (hand bucketing)
    • Action abstraction (limit bet sizes)
    • MCCFR with external sampling
    • Blueprint + real-time solving
  • Goal: Research-level implementation
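
Levels 3 and 4 both rely on Monte Carlo CFR (Lanctot et al., 2009). The gentlest first step is chance sampling: sample one deal per iteration instead of enumerating all of them. A minimal sketch against the KuhnPokerCFR class above (this is chance sampling, not the external sampling used at Level 4):

import random

def train_chance_sampling(solver: 'KuhnPokerCFR', iterations: int) -> float:
    """Chance-sampling MCCFR: visit one sampled deal per iteration.
    Noisier per iteration than full enumeration, but 6x cheaper here
    and vastly cheaper in games with many chance outcomes."""
    cards = [0, 1, 2]
    total = 0.0
    for _ in range(iterations):
        random.shuffle(cards)  # sample a deal uniformly
        total += solver._cfr(cards[:2], '', 1, 1)
    return total / iterations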

Resources

Libraries

  • OpenSpiel - Google DeepMind’s game research framework
  • pyCFR - Pure Python CFR for learning
  • RLCard - RL toolkit for card games
  • PokerRL - Deep RL for poker

Papers

  • Zinkevich et al. (2007) - "Regret Minimization in Games with Incomplete Information" (original CFR paper)
  • Lanctot et al. (2009) - "Monte Carlo Sampling for Regret Minimization in Extensive Games" (MCCFR)
  • Tammelin (2014) - "Solving Large Imperfect Information Games Using CFR+" (CFR+)
  • Brown & Sandholm (2019) - "Superhuman AI for Multiplayer Poker" (Pluribus)
