Implementation Guide
Table of Contents
- Overview
- Choose Your Path
- Option A: Build From Scratch
- Option B: Use a Library
- Complete Kuhn Poker Implementation
- Testing Your Implementation
- Progression Path
- Resources
Overview
This guide walks you through building a working CFR (Counterfactual Regret Minimization) poker AI. By the end, you’ll have a trained agent that plays near-optimal Kuhn poker, and you’ll understand how to scale to larger games.
Prerequisites:
- Python 3.8+
- Basic understanding of game trees and probability
- Familiarity with recursion
Time estimates:
- Kuhn Poker from scratch: 2-4 hours
- Using OpenSpiel: 30 minutes
- Full Texas Hold’em: Weeks to months
Choose Your Path
| Approach | Pros | Cons | Best For |
|---|---|---|---|
| From Scratch | Deep understanding, full control | More work, easy to introduce bugs | Learning, research |
| OpenSpiel | Production-ready, many games | Heavier dependency, less intuitive | Research, benchmarking |
| pyCFR | Pure Python, readable | Not optimized, small games only | Learning, toy problems |
| RLCard | Easy API, multiple card games | Less game-theory focused | RL research |
Recommendation: Start from scratch with Kuhn Poker, then move to OpenSpiel for larger games.
Option A: Build From Scratch
Step 1: Environment Setup
```bash
# Create project directory
mkdir poker-cfr && cd poker-cfr

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install numpy
```
Step 2: Define the Game
First, understand Kuhn Poker:
- 3 cards: Jack (0), Queen (1), King (2)
- 2 players, each antes 1 chip
- One card dealt to each player
- Actions: pass (p) and bet (b); a pass checks, or folds when facing a bet, and a bet opens the betting, or calls when facing a bet
```python
from enum import IntEnum

class Card(IntEnum):
    JACK = 0
    QUEEN = 1
    KING = 2

class Action:
    PASS = 'p'  # Check, or fold when facing a bet
    BET = 'b'   # Bet 1 chip, or call when facing a bet

# Game constants
NUM_PLAYERS = 2
NUM_CARDS = 3
ANTE = 1
BET_SIZE = 1
```
Step 3: Implement Information Sets
```python
import numpy as np

class InformationSet:
    """Tracks regrets and strategies for a game state."""

    def __init__(self, num_actions: int):
        self.num_actions = num_actions
        self.regret_sum = np.zeros(num_actions)
        self.strategy_sum = np.zeros(num_actions)

    def get_strategy(self) -> np.ndarray:
        """Get current strategy via regret matching."""
        # Only consider positive regrets
        positive_regrets = np.maximum(self.regret_sum, 0)
        total = positive_regrets.sum()

        if total > 0:
            return positive_regrets / total
        # Uniform random if no positive regret
        return np.ones(self.num_actions) / self.num_actions

    def get_average_strategy(self) -> np.ndarray:
        """Get Nash equilibrium approximation."""
        total = self.strategy_sum.sum()
        if total > 0:
            return self.strategy_sum / total
        return np.ones(self.num_actions) / self.num_actions
```
Step 4: Implement CFR Traversal
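Before implementing the traversal, a quick sanity check (a throwaway snippet, not part of the project files) confirms that regret matching shifts probability toward the action with more accumulated positive regret and falls back to uniform when no regret is positive:

```python
info = InformationSet(num_actions=2)

info.regret_sum[:] = [3.0, 1.0]   # action 0 has more accumulated regret
print(info.get_strategy())        # [0.75 0.25]

info.regret_sum[:] = [-2.0, 0.0]  # no positive regret anywhere
print(info.get_strategy())        # [0.5 0.5] - uniform fallback
```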
```python
import numpy as np
from itertools import permutations
from information_set import InformationSet

NUM_PLAYERS = 2

class KuhnCFR:
    def __init__(self):
        self.info_sets = {}   # key -> InformationSet
        self.num_actions = 2  # check/fold or bet/call

    def get_info_set(self, card: int, history: str) -> InformationSet:
        """Get or create information set for card + history."""
        key = f"{card}:{history}"
        if key not in self.info_sets:
            self.info_sets[key] = InformationSet(self.num_actions)
        return self.info_sets[key]

    def is_terminal(self, history: str) -> bool:
        """Check if game is over."""
        if len(history) < 2:
            return False
        # Terminal states: pp (check-check), bb (bet-call),
        # bp (bet-fold), pbp (check-bet-fold), pbb (check-bet-call)
        return history in ['pp', 'bb', 'bp', 'pbp', 'pbb']

    def get_payoff(self, cards: list, history: str, player: int) -> float:
        """Get terminal payoff for player."""
        opponent = 1 - player

        if history == 'pp':
            # Both checked - higher card wins the ante
            return 1 if cards[player] > cards[opponent] else -1
        elif history == 'bp':
            # Player 0 bet, player 1 folded
            return 1 if player == 0 else -1
        elif history == 'pbp':
            # Check, bet, fold - player 1 wins the ante
            return 1 if player == 1 else -1
        else:
            # Showdown after betting (bb or pbb)
            pot = 2  # ante plus bet from each player
            return pot if cards[player] > cards[opponent] else -pot

    def cfr(self, cards: list, history: str, reach_probs: list, player: int) -> float:
        """
        CFR recursive traversal.

        Args:
            cards: [player0_card, player1_card]
            history: string of actions taken
            reach_probs: [p0_reach_prob, p1_reach_prob]
            player: player to update regrets for

        Returns:
            Expected value for the traversing player
        """
        current_player = len(history) % 2

        # Terminal node
        if self.is_terminal(history):
            return self.get_payoff(cards, history, player)

        info_set = self.get_info_set(cards[current_player], history)
        strategy = info_set.get_strategy()

        # Both actions are always available: 'p' means check (or fold
        # when facing a bet), 'b' means bet (or call when facing a bet)
        actions = ['p', 'b']
        action_values = np.zeros(self.num_actions)

        for i, action in enumerate(actions):
            new_history = history + action
            new_reach = reach_probs.copy()
            new_reach[current_player] *= strategy[i]
            action_values[i] = self.cfr(cards, new_history, new_reach, player)

        # Expected value under the current strategy
        node_value = np.dot(strategy, action_values)

        # Update regrets and average strategy for the traversing player
        if current_player == player:
            opponent = 1 - player
            for i in range(self.num_actions):
                regret = action_values[i] - node_value
                # Regrets are weighted by the opponent's reach probability
                info_set.regret_sum[i] += reach_probs[opponent] * regret
            # Average strategy is weighted by the player's own reach
            info_set.strategy_sum += reach_probs[player] * strategy

        return node_value

    def train(self, iterations: int) -> float:
        """Run CFR training. Returns the average game value for player 0."""
        cards = [0, 1, 2]  # Jack, Queen, King
        expected_value = 0

        for _ in range(iterations):
            # Iterate over all possible card deals
            for deal in permutations(cards, 2):
                deal = list(deal)
                # One traversal per player so both players' regrets update
                for player in range(NUM_PLAYERS):
                    value = self.cfr(deal, '', [1.0, 1.0], player)
                    if player == 0:
                        expected_value += value

        return expected_value / (iterations * 6)  # 6 equally likely deals
```
Step 5: Train and Evaluate
```python
from cfr import KuhnCFR

def main():
    cfr = KuhnCFR()

    # Train incrementally, printing strategies at each milestone
    print("Training CFR...")
    total = 0
    for i in [100, 1000, 10000, 100000]:
        cfr.train(i - total)  # bring the total up to i iterations
        total = i

        # Print strategies
        print(f"\n=== After {i} iterations ===")
        for key, info_set in sorted(cfr.info_sets.items()):
            strategy = info_set.get_average_strategy()
            print(f"{key}: Check/Fold={strategy[0]:.3f}, Bet/Call={strategy[1]:.3f}")

if __name__ == "__main__":
    main()
```
Expected Results
After 100,000 iterations, you should see strategies close to Nash equilibrium:
Player 1 with Jack:
- Initial: Check ~79%, Bet ~21% (bluff frequency)
- After check-bet: Fold 100%

Player 1 with Queen:
- Initial: Check 100%
- After check-bet: Fold ~67%, Call ~33%

Player 1 with King:
- Initial: Check ~67%, Bet ~33%
- After check-bet: Call 100%

The game value should converge to approximately -0.0556 (= -1/18). Note that Kuhn poker has a one-parameter family of first-player equilibria, so exact bet frequencies can vary slightly between runs; the game value is -1/18 in all of them.
Option B: Use a Library
OpenSpiel (Recommended for Research)
```bash
# Install
pip install open_spiel

# Or from source for latest features
git clone https://github.com/google-deepmind/open_spiel.git
cd open_spiel
./install.sh
```
Quick Start:
```python
import pyspiel
from open_spiel.python.algorithms import cfr

# Create Kuhn Poker game
game = pyspiel.load_game("kuhn_poker")

# Create CFR solver
cfr_solver = cfr.CFRSolver(game)

# Train
for i in range(10000):
    cfr_solver.evaluate_and_update_policy()

# Get average policy (Nash equilibrium approximation)
average_policy = cfr_solver.average_policy()

# Print strategy for info state "0" (Player 0 with Jack, no history)
print(average_policy.action_probabilities("0"))
```
Available Poker Games:
- `kuhn_poker` - 3-card simplified poker
- `leduc_poker` - 6 cards, 2 betting rounds
- `liars_dice` - a related imperfect-information game
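OpenSpiel also ships an exploitability metric that is convenient for tracking convergence. A minimal sketch, assuming the `game` and `cfr_solver` objects from the quick start above (check the signature against your installed version):

```python
from open_spiel.python.algorithms import exploitability

# Exploitability of the average policy; approaches 0 at a Nash equilibrium
expl = exploitability.exploitability(game, cfr_solver.average_policy())
print(f"Exploitability: {expl:.6f}")
```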
pyCFR (Learning-Focused)
```bash
git clone https://github.com/tansey/pycfr.git
cd pycfr
pip install -e .
```

```python
from pokergames import *
from pokercfr import *

# Create Kuhn poker
game = kuhn_poker()

# Create CFR solver
cfr = CounterfactualRegretMinimizer(game)

# Train
for i in range(10000):
    cfr.run()

# Get strategy
strategy = cfr.profile
```
RLCard (RL + Game Theory)
```bash
pip install rlcard
```

```python
import rlcard
from rlcard.agents import CFRAgent

# Create environment (CFR needs the ability to step back through the tree)
env = rlcard.make('leduc-holdem', config={'allow_step_back': True})

# Create CFR agent
agent = CFRAgent(env, model_path='./cfr_model')

# Train
for episode in range(10000):
    agent.train()

# Save model
agent.save()
```
Complete Kuhn Poker Implementation
Here’s a complete, self-contained single-file implementation:
```python
#!/usr/bin/env python3
"""
Complete Kuhn Poker CFR Implementation
Run: python kuhn_cfr_complete.py
"""

import numpy as np
from itertools import permutations
from typing import Dict, List, Tuple


class KuhnPokerCFR:
    """CFR solver for Kuhn Poker."""

    PASS = 0
    BET = 1
    NUM_ACTIONS = 2

    def __init__(self):
        self.regret_sum: Dict[str, np.ndarray] = {}
        self.strategy_sum: Dict[str, np.ndarray] = {}
        self.iterations = 0

    def _get_strategy(self, info_set: str) -> np.ndarray:
        """Get current strategy via regret matching."""
        if info_set not in self.regret_sum:
            self.regret_sum[info_set] = np.zeros(self.NUM_ACTIONS)
            self.strategy_sum[info_set] = np.zeros(self.NUM_ACTIONS)

        regrets = np.maximum(self.regret_sum[info_set], 0)
        total = regrets.sum()

        if total > 0:
            return regrets / total
        return np.ones(self.NUM_ACTIONS) / self.NUM_ACTIONS

    def _get_average_strategy(self, info_set: str) -> np.ndarray:
        """Get Nash equilibrium approximation."""
        if info_set not in self.strategy_sum:
            return np.ones(self.NUM_ACTIONS) / self.NUM_ACTIONS

        total = self.strategy_sum[info_set].sum()
        if total > 0:
            return self.strategy_sum[info_set] / total
        return np.ones(self.NUM_ACTIONS) / self.NUM_ACTIONS

    def _cfr(self, cards: List[int], history: str, p0: float, p1: float) -> float:
        """
        CFR recursive traversal (negamax form).
        Returns the expected value for the player to act at `history`.
        """
        plays = len(history)
        player = plays % 2
        opponent = 1 - player

        # Check for terminal states
        if plays >= 2:
            terminal_pass = history[-1] == 'p'
            double_bet = history[-2:] == 'bb'
            is_player_card_higher = cards[player] > cards[opponent]

            if terminal_pass:
                if history == 'pp':
                    # Both passed - higher card wins the ante
                    return 1 if is_player_card_higher else -1
                # Opponent folded to a bet - current player wins the ante
                return 1
            if double_bet:
                # Showdown after bet and call - winner nets ante + bet
                return 2 if is_player_card_higher else -2

        info_set = str(cards[player]) + history
        strategy = self._get_strategy(info_set)

        # Calculate action utilities
        util = np.zeros(self.NUM_ACTIONS)
        node_util = 0

        for a in range(self.NUM_ACTIONS):
            next_history = history + ('p' if a == 0 else 'b')

            # Negate: the child value is from the opponent's perspective
            if player == 0:
                util[a] = -self._cfr(cards, next_history, p0 * strategy[a], p1)
            else:
                util[a] = -self._cfr(cards, next_history, p0, p1 * strategy[a])

            node_util += strategy[a] * util[a]

        # Update regrets, weighted by the opponent's reach probability
        for a in range(self.NUM_ACTIONS):
            regret = util[a] - node_util
            if player == 0:
                self.regret_sum[info_set][a] += p1 * regret
            else:
                self.regret_sum[info_set][a] += p0 * regret

        # Update strategy sum, weighted by the player's own reach probability
        reach_prob = p0 if player == 0 else p1
        self.strategy_sum[info_set] += reach_prob * strategy

        return node_util

    def train(self, iterations: int) -> float:
        """Train for specified iterations. Returns game value for player 0."""
        cards = [0, 1, 2]  # J, Q, K
        util = 0

        for _ in range(iterations):
            for deal in permutations(cards, 2):
                util += self._cfr(list(deal), '', 1, 1)
            self.iterations += 1

        return util / (iterations * 6)  # 6 possible deals

    def get_strategy_profile(self) -> Dict[str, Tuple[float, float]]:
        """Get all Nash equilibrium strategies."""
        profile = {}
        for info_set in sorted(self.strategy_sum.keys()):
            strategy = self._get_average_strategy(info_set)
            profile[info_set] = (strategy[0], strategy[1])
        return profile

    def play_vs_human(self):
        """Interactive game against the trained bot."""
        import random

        cards = [0, 1, 2]
        card_names = ['Jack', 'Queen', 'King']

        while True:
            random.shuffle(cards)
            human_card, bot_card = cards[0], cards[1]

            print(f"\nYour card: {card_names[human_card]}")
            print("You go first. Enter 'p' to pass or 'b' to bet:")

            history = ''

            while True:
                player = len(history) % 2

                if player == 0:  # Human
                    action = input("> ").strip().lower()
                    if action not in ['p', 'b']:
                        print("Invalid. Enter 'p' or 'b'")
                        continue
                else:  # Bot
                    info_set = str(bot_card) + history
                    strategy = self._get_average_strategy(info_set)
                    action = 'p' if random.random() < strategy[0] else 'b'
                    print(f"Bot: {action}")

                history += action

                # Check terminal
                if len(history) >= 2:
                    if history[-1] == 'p':
                        if history == 'pp':
                            winner = 'You' if human_card > bot_card else 'Bot'
                            print(f"Showdown! Bot had {card_names[bot_card]}. {winner} wins!")
                        else:
                            # The player who just passed is the one who folded
                            folder = (len(history) - 1) % 2
                            winner = 'Bot' if folder == 0 else 'You'
                            print(f"Fold! {winner} wins!")
                        break
                    elif history[-2:] == 'bb':
                        winner = 'You' if human_card > bot_card else 'Bot'
                        print(f"Showdown! Bot had {card_names[bot_card]}. {winner} wins the pot of 4!")
                        break

            if input("\nPlay again? (y/n): ").lower() != 'y':
                break


def main():
    print("=== Kuhn Poker CFR Training ===\n")

    cfr = KuhnPokerCFR()

    # Training with progress
    for i in [1000, 10000, 100000]:
        print(f"Training to {i} iterations...")
        cfr.train(i - cfr.iterations)

    # Display results
    print("\n=== Nash Equilibrium Strategies ===")
    print("Info Set | Pass/Fold | Bet/Call")
    print("-" * 35)

    for info_set, (pass_prob, bet_prob) in cfr.get_strategy_profile().items():
        card = ['J', 'Q', 'K'][int(info_set[0])]
        history = info_set[1:] if len(info_set) > 1 else 'start'
        print(f"{card}:{history:6} | {pass_prob:.3f} | {bet_prob:.3f}")

    # Theoretical game value
    print(f"\nExpected game value: ~-0.0556 (= -1/18)")
    print("Player 2 has a slight advantage from position.\n")

    # Offer to play
    if input("Play against the bot? (y/n): ").lower() == 'y':
        cfr.play_vs_human()


if __name__ == "__main__":
    main()
```
Testing Your Implementation
1. Verify Convergence
Track exploitability over iterations; it should decrease toward zero as the average strategy approaches equilibrium:
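For Kuhn specifically, exploitability can be computed by brute force. The sketch below assumes the `KuhnPokerCFR` class from the complete implementation above; the `_ev` helper is introduced here for illustration. It enumerates all 2^6 pure strategies per player, which is feasible only because each player has just six information sets:

```python
from itertools import permutations, product

def _ev(cfr, cards, history, br_player, pure):
    """Expected value for player 0 when br_player follows the pure
    strategy `pure` (info set -> action index) and the other player
    follows cfr's average strategy."""
    plays = len(history)
    if plays >= 2:
        if history[-1] == 'p':
            if history == 'pp':
                return 1 if cards[0] > cards[1] else -1
            # A fold: the player who just passed loses the ante
            return -1 if (plays - 1) % 2 == 0 else 1
        if history[-2:] == 'bb':
            return 2 if cards[0] > cards[1] else -2
    player = plays % 2
    info_set = str(cards[player]) + history
    if player == br_player:
        a = pure[info_set]
        return _ev(cfr, cards, history + 'pb'[a], br_player, pure)
    strategy = cfr._get_average_strategy(info_set)
    return sum(strategy[a] * _ev(cfr, cards, history + 'pb'[a], br_player, pure)
               for a in range(2))

def compute_exploitability(cfr) -> float:
    """Compute how exploitable the average strategy is (0 at equilibrium)."""
    deals = [list(d) for d in permutations([0, 1, 2], 2)]
    # Player 0 acts at histories '' and 'pb'; player 1 at 'p' and 'b'
    info_sets = {0: [f"{c}{h}" for c in range(3) for h in ('', 'pb')],
                 1: [f"{c}{h}" for c in range(3) for h in ('p', 'b')]}
    br_value = {}
    for br_player in (0, 1):
        keys = info_sets[br_player]
        best = float('-inf')
        # Some pure strategy always attains the best-response value
        for choice in product(range(2), repeat=len(keys)):
            pure = dict(zip(keys, choice))
            ev0 = sum(_ev(cfr, d, '', br_player, pure) for d in deals) / len(deals)
            best = max(best, ev0 if br_player == 0 else -ev0)
        br_value[br_player] = best
    return (br_value[0] + br_value[1]) / 2
```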
A full best-response oracle is needed to do this in larger games; the brute-force enumeration above works only because Kuhn's strategy space is tiny.
2. Check Known Equilibrium Values
For Kuhn Poker:
- Game value: -1/18 ≈ -0.0556
- P1 with Jack bets ~21% (bluff)
- P1 with King bets ~33% (value)
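A quick check of the game value, assuming the `KuhnPokerCFR` class above (its `train()` returns the average value for player 0):

```python
cfr = KuhnPokerCFR()
game_value = cfr.train(100000)

# Should approach -1/18 for player 0
assert abs(game_value - (-1 / 18)) < 0.01
```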
3. Unit Tests
```python
def test_terminal_payoffs():
    cfr = KuhnPokerCFR()

    # King vs Jack, both pass -> King wins 1
    assert cfr._cfr([2, 0], 'pp', 1, 1) == 1

    # Jack vs King, bet then fold -> Jack wins 1
    assert cfr._cfr([0, 2], 'bp', 1, 1) == 1


def test_convergence():
    cfr = KuhnPokerCFR()
    cfr.train(100000)

    # Jack's bluff frequency should land in the equilibrium range
    strategy = cfr._get_average_strategy('0')  # Jack, no history
    assert 0.15 < strategy[1] < 0.35  # Bet probability
```
Progression Path
Level 1: Kuhn Poker (Hours)
- 12 information sets
- 3 cards
- Full traversal feasible
- Goal: Understand CFR mechanics
Level 2: Leduc Poker (Days)
- ~300 information sets
- 6 cards (2 suits × 3 ranks)
- 2 betting rounds
- Goal: Scale basic CFR
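The OpenSpiel loop from Option B carries over unchanged; only the game name differs:

```python
import pyspiel
from open_spiel.python.algorithms import cfr

# Same API as Kuhn, just a much larger game tree
game = pyspiel.load_game("leduc_poker")
solver = cfr.CFRSolver(game)
for _ in range(1000):
    solver.evaluate_and_update_policy()
```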
Level 3: Limit Texas Hold’em (Weeks)
- ~10^12 information sets
- Requires abstraction
- Use MCCFR (sampling); see the sketch after this list
- Goal: Learn game abstraction
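The simplest sampling variant is chance sampling: draw one random deal per iteration instead of enumerating all six. A sketch, assuming the `KuhnPokerCFR` class above (external sampling additionally samples the opponent's actions):

```python
import random

def train_chance_sampled(cfr, iterations):
    """Chance-sampling MCCFR: one random deal per iteration."""
    deck = [0, 1, 2]
    util = 0.0
    for _ in range(iterations):
        random.shuffle(deck)
        util += cfr._cfr(deck[:2], '', 1, 1)
        cfr.iterations += 1
    return util / iterations  # noisy estimate of the game value
```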
Level 4: No-Limit Texas Hold’em (Months)
- ~10^160 information sets
- Requires:
- Card abstraction (hand bucketing)
- Action abstraction (limit bet sizes; see the sketch after this list)
- MCCFR with external sampling
- Blueprint + real-time solving
- Goal: Research-level implementation
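To make the action-abstraction bullet concrete, here is a hypothetical helper (the function name and size choices are illustrative assumptions, not from any published system) that collapses the continuous no-limit bet space to a few discrete sizes:

```python
from typing import List

def candidate_bets(pot: int, stack: int) -> List[int]:
    """Restrict bets to half pot, full pot, and all-in."""
    sizes = {pot // 2, pot, stack}
    return sorted(s for s in sizes if 0 < s <= stack)
```

Off-tree opponent bets are then mapped to the nearest abstract size at solve time (action translation).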
Resources
Tutorials
- AI Poker Tutorial - Comprehensive course
- Vanilla CFR for Engineers - Clear Python walkthrough
- labml.ai CFR - Annotated implementation
Libraries
- OpenSpiel - Google DeepMind’s game research framework
- pyCFR - Pure Python CFR for learning
- RLCard - RL toolkit for card games
- PokerRL - Deep RL for poker
Papers
- Zinkevich et al. (2007) - Original CFR paper
- Lanctot et al. (2009) - Monte Carlo CFR
- Tammelin (2014) - CFR+
- Brown & Sandholm (2019) - Pluribus
GitHub Implementations
- simple-poker-cfr - C++ simplified Hold’em
- poker-cfr - Rust Hold’em implementation
- KuhnPoker - Python Kuhn with exploitability