Overview

This document compares two approaches to caching Expected Hand Strength (EHS) values during CFR poker training: random pre-computed caching vs. bucket-based caching. The analysis argues that random pre-computation is ineffective and that bucket-based abstraction dramatically improves cache hit rates; a profiling update at the end revises several of these conclusions.

1. State Space Complexity

State Space Sizes by Street

Poker State Space Growth:

  • Preflop: 1,326 hole card combinations
  • Flop: 1,326 × C(50,3) = ~26 million (1,326 holes × 19,600 boards)
  • Turn: 1,326 × C(50,4) = ~305 million (1,326 holes × 230,300 boards)
  • River: 1,326 × C(50,5) = ~2.8 billion (1,326 holes × 2,118,760 boards)
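
These counts are easy to check; a quick verification with Python's `math.comb`:

```python
# Verify the per-street state counts from the list above.
from math import comb

holes = comb(52, 2)             # 1,326 hole-card combinations
print(holes * comb(50, 3))      # flop:  25,989,600    (~26 million)
print(holes * comb(50, 4))      # turn:  305,377,800   (~305 million)
print(holes * comb(50, 5))      # river: 2,809,475,760 (~2.8 billion)
```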

Coverage Analysis

| Street | Total Combinations | 100k Random Cache | Coverage % |
|--------|--------------------|-------------------|------------|
| Flop   | ~26 million        | 100,000           | 0.38%      |
| Turn   | ~305 million       | 100,000           | 0.033%     |
| River  | ~2.8 billion       | 100,000           | 0.0036%    |

Key Problem: Pre-computing 100k random samples covers only 0.38% of flop states, 0.033% of turn states, and 0.0036% of river states, leading to a >99% cache miss rate during training.
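
The coverage column follows directly from the state counts; a short sketch reproducing it:

```python
# Coverage of a 100k-entry random cache per street (matches the table above).
from math import comb

cache_size = 100_000
holes = comb(52, 2)  # 1,326 hole-card combinations
for street, n_board in [("flop", 3), ("turn", 4), ("river", 5)]:
    states = holes * comb(50, n_board)
    print(f"{street}: {cache_size / states:.4%} coverage")
```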

2. Problem: Random Pre-computed EHS Approach

Why It Doesn’t Work

Random Pre-computed Cache

Across CFR training iterations 1-3, training visits different random game paths, none of which appear among the cache's 100k random samples:

  • Path A (hole: 7h8h, board: AsKdQc2h5s) → ❌ not in cache → cache MISS → compute EHS
  • Path B (hole: AhKh, board: 7s8d9c3h2c) → ❌ not in cache → cache MISS → compute EHS
  • Path C (hole: QsJs, board: 6d4h3c9sKc) → ❌ not in cache → cache MISS → compute EHS

Flow: Current Random Pre-computation

  1. Training encounters a state (hole + board).
  2. Hash lookup: is the exact state in the cache?
     • Hit (<0.1%): return the cached EHS.
     • Miss (>99.9%): compute EHS (~1000 simulations), then cache the result (likely never reused).
  3. Continue training.
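
For concreteness, a minimal sketch of the exact-state lookup described above; the class and the injected estimator are illustrative, not the project's actual code:

```python
# Minimal sketch of exact-state EHS caching (the failing approach).
class ExactEHSCache:
    def __init__(self, precomputed, estimator):
        # precomputed: {(hole, board): ehs} for ~100k random samples
        # estimator: callable(hole, board) -> EHS (~1000 simulations)
        self.cache = dict(precomputed)
        self.estimator = estimator

    def get_ehs(self, hole_cards, board):
        # Exact cards form the key, so a hit requires an exact state match
        key = (tuple(sorted(hole_cards)), tuple(sorted(board)))
        if key in self.cache:                    # <0.1% of lookups in practice
            return self.cache[key]
        ehs = self.estimator(hole_cards, board)  # expensive miss path
        self.cache[key] = ehs                    # cached, but likely never reused
        return ehs
```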

Why cache misses are so high:

  1. Random paths: Each CFR iteration explores different random game trajectories
  2. Massive state space: 2.8B river combinations, only 100k cached
  3. No locality: Random pre-computation doesn’t align with training access patterns
  4. Single-use entries: Cached values are rarely visited again in subsequent iterations

3. Solution: Bucket-Based Caching

Abstraction Strategy

Instead of caching exact (hole, board) combinations, cache by abstracted buckets:

Many Exact States → Single Bucket

Three different exact states map to the same bucket:

  • 7h8h + AsKdQc → EHS: 0.34
  • 7s8s + AhKcQd → EHS: 0.35
  • 7d8d + AcKhQs → EHS: 0.33

All three share Preflop Bucket 3 (mid suited connector) and Flop Bucket 2 (gutshot + weak draws), so the cache stores a single entry: Bucket EHS 0.34 (average).

Result: 3 different exact states → 1 cached bucket value → ✅ high reuse across iterations.

Bucket Space Reduction

| Strategy           | Preflop | Flop  | Turn   | River | Total Combinations |
|--------------------|---------|-------|--------|-------|--------------------|
| Exact States       | 1,326   | 26M   | 305M   | 2.8B  | ~2.8 billion       |
| Bucket Abstraction | 5       | 10    | 10     | 10    | 5,000              |
| Reduction Factor   | 265x    | 2.6Mx | 30.5Mx | 280Mx | 560,000x           |

Example bucket scheme:

  • 5 preflop buckets (pairs, suited, offsuit, etc.)
  • 10 flop buckets (made hands, draws, etc.)
  • 10 turn buckets (strength categories)
  • 10 river buckets (final strength)
  • Total: 5 × 10 × 10 × 10 = 5,000 bucket combinations

Flow: Bucket-Based Caching

  1. Training encounters a state (hole + board).
  2. Get the abstraction buckets: preflop bucket + street bucket.
  3. Lookup: is the bucket EHS in the cache?
     • Hit (>90%): return the bucket EHS.
     • Miss (<10%): compute the bucket EHS by sampling multiple hands in the bucket, then cache the bucket value (reused across many states).
  4. Continue training.

Why cache hits are high:

  1. Locality: Many exact states map to same bucket
  2. Small space: Only 5,000 buckets vs. billions of states
  3. Reuse: Same buckets visited repeatedly across iterations
  4. Coverage: 100% of state space maps to some bucket

4. Comparison: Random vs. Bucket-Based

Trade-off Decision

Random Pre-computation Approach:

  • ❌ Coverage: 0.38% (flop) down to 0.0036% (river)
  • ❌ Hit Rate: <1%
  • ❌ Wasted Computation: 99%+
  • ✅ Precision: Exact per state

Bucket-Based Approach:

  • ✅ Coverage: 100%
  • ✅ Hit Rate: >90%
  • ✅ Efficient: Shared computation
  • ⚠️ Precision: Averaged per bucket

Decision: accept a slight precision loss for a massive speedup and full coverage.

Performance Impact

| Metric                         | Random Pre-computation          | Bucket-Based                 | Improvement   |
|--------------------------------|---------------------------------|------------------------------|---------------|
| Cache Hit Rate                 | <1%                             | >90%                         | 90-100x       |
| EHS Computations per Iteration | ~99% of visited states          | ~10% of visited states       | 10x reduction |
| Memory Footprint               | 100k entries × 8 bytes = 800 KB | 5k entries × 8 bytes = 40 KB | 20x smaller   |
| Training Speed                 | Baseline                        | 5-10x faster                 | Significant   |

5. Implementation Considerations

Bucket-Based Caching Implementation

```python
class BucketEHSCache:
    """Caches EHS values by abstraction bucket rather than exact state.

    Relies on bucket helpers (get_preflop_bucket, get_street_bucket,
    compute_bucket_average_ehs) defined elsewhere in the abstraction layer.
    """

    def __init__(self):
        # Cache: (preflop_bucket, street_bucket) -> EHS value
        self.cache = {}

    def get_ehs(self, hole_cards, board):
        # Map the exact (hole, board) state to its abstraction buckets
        preflop_bucket = get_preflop_bucket(hole_cards)
        street_bucket = get_street_bucket(hole_cards, board)
        key = (preflop_bucket, street_bucket)

        if key in self.cache:
            return self.cache[key]  # Cache hit

        # Cache miss: compute the average EHS over hands in this bucket
        bucket_ehs = compute_bucket_average_ehs(
            preflop_bucket, street_bucket, board
        )
        self.cache[key] = bucket_ehs
        return bucket_ehs
```
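
A hypothetical usage sketch; the card encodings are illustrative and the bucket helpers are assumed to be importable from the abstraction layer:

```python
# Hypothetical usage; card encoding and bucket helpers are assumed.
cache = BucketEHSCache()

# First lookup for this bucket: a miss that computes the bucket-average EHS.
ehs = cache.get_ehs(hole_cards=["7h", "8h"], board=["As", "Kd", "Qc"])

# A different exact state in the same bucket (7s8s on AhKcQd) now hits
# the cached bucket value instead of recomputing.
ehs_again = cache.get_ehs(hole_cards=["7s", "8s"], board=["Ah", "Kc", "Qd"])
```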

Key Design Decisions

  1. Bucket Granularity: Balance between precision and cache size

    • Fewer buckets = higher hit rate, lower precision
    • More buckets = lower hit rate, higher precision
    • Sweet spot: 5-10 buckets per street
  2. Precomputation Strategy:

    • Pre-compute all 5,000 bucket EHS values before training
    • Or compute lazily during training (cold start penalty)
  3. Bucket Definition (a percentile-based sketch follows this list):

    • Hand strength percentiles
    • Equity distributions
    • Potential (made hand + draws)
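
As an illustration of the first option, a minimal percentile-bucketing sketch; the strength score and decile thresholds are assumptions, not the project's actual abstraction:

```python
import bisect

# Hypothetical percentile-based street bucketing. `strength` is an assumed
# hand-strength score in [0, 1]; `thresholds` are decile boundaries that
# would be estimated offline from sampled hands.
def percentile_bucket(strength, thresholds=(0.1, 0.2, 0.3, 0.4, 0.5,
                                            0.6, 0.7, 0.8, 0.9)):
    # bisect_right counts thresholds <= strength, giving a bucket in 0..9
    return bisect.bisect_right(thresholds, strength)

assert percentile_bucket(0.05) == 0   # weakest decile
assert percentile_bucket(0.34) == 3
assert percentile_bucket(0.95) == 9   # strongest decile
```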

Profiling Update: The Real Story

UPDATE (2025-12-03): After profiling, we discovered the analysis above is partially incorrect.

What Profiling Revealed

Total _compute_ehs calls: 27,398
Actual GPU compute calls: 720
In-memory cache hits: 26,678 (97.4% hit rate!)

The existing in-memory _ehs_cache in PostflopAbstraction already achieves a 97.4% hit rate!
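
A sketch of the kind of hit/miss instrumentation that yields numbers like these; the wrapper below is illustrative, not the project's actual profiler:

```python
from collections import Counter

class InstrumentedEHS:
    """Illustrative hit/miss counter around a cached EHS computation."""

    def __init__(self, compute_fn):
        self._ehs_cache = {}        # mirrors the existing in-memory cache
        self._compute_fn = compute_fn
        self.stats = Counter()

    def compute_ehs(self, key):
        self.stats["calls"] += 1
        if key in self._ehs_cache:
            self.stats["hits"] += 1           # served from memory
        else:
            self.stats["gpu_computes"] += 1   # falls through to the GPU path
            self._ehs_cache[key] = self._compute_fn(key)
        return self._ehs_cache[key]

# Hit rate = stats["hits"] / stats["calls"]; the profile above showed
# 26,678 / 27,398 ≈ 97.4%.
```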

Why Pre-computed Tables Don’t Help

| Factor                     | Reality                             |
|----------------------------|-------------------------------------|
| In-memory cache hit rate   | 97.4% already                       |
| Pre-computed table benefit | Marginal (<0.1% improvement)        |
| Real bottleneck            | Per-call GPU overhead (23.8ms each) |

Corrected Understanding

What we thought: pre-computed tables would provide the missing cache coverage.

What profiling actually showed:

  • The in-memory cache already has a 97.4% hit rate.
  • Only 720 EHS computations actually run per 200 iterations.
  • The bottleneck is per-call GPU overhead (23.8ms).

Actual Optimization Opportunities

| Optimization                   | Expected Impact | Effort |
|--------------------------------|-----------------|--------|
| Reduce EHS samples (100→50)    | 2x faster       | Low    |
| Batch EHS across tree branches | 3-5x faster     | High   |
| Optimize _compute_indices      | 1.2x faster     | Medium |
| Async GPU overlap              | 1.5x faster     | High   |

Conclusion

The original analysis claimed random pre-computed EHS caching is ineffective because:

  • ~~Minuscule coverage of massive state space~~ Actually: the in-memory cache already has a 97.4% hit rate
  • ~~Random access patterns don't align with training~~ Actually: the cache grows organically with training paths
  • ~~>99% cache miss rate~~ Actually: only a 2.6% miss rate

Bucket-based caching would NOT help significantly because:

  • In-memory cache already provides excellent coverage
  • The bottleneck is per-call GPU overhead, not cache misses
  • ~720 GPU calls × 23.8ms = 17.1s regardless of caching strategy

Actual recommendation: Focus on reducing per-call GPU overhead:

  1. Reduce Monte Carlo samples from 100 to 50
  2. Batch EHS queries across multiple tree branches (see the sketch below)
  3. Optimize combinatorial index computation
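
A minimal sketch of recommendation 2, assuming a hypothetical batched_ehs(states) estimator that evaluates many states in one GPU call to amortize the ~23.8ms per-call overhead:

```python
# Minimal sketch of batching EHS queries across tree branches. `batched_ehs`
# is a hypothetical estimator that evaluates many (hole, board) states in a
# single GPU call, amortizing the per-call overhead.
class EHSBatcher:
    def __init__(self, batched_ehs, batch_size=64):
        self.batched_ehs = batched_ehs
        self.batch_size = batch_size
        self.pending = []  # (state, callback) pairs awaiting evaluation

    def request(self, state, callback):
        # Queue a query; `callback` receives the EHS when the batch runs.
        self.pending.append((state, callback))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Issue one batched call for all pending queries.
        if not self.pending:
            return
        states, callbacks = zip(*self.pending)
        self.pending = []
        for callback, ehs in zip(callbacks, self.batched_ehs(list(states))):
            callback(ehs)
```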