Date: 2025-11-21
Iteration: 8 (parallel multi-approach test)
Goal: Determine which prompting strategy is most effective for complex agent tasks

Background

After 7 iterations achieving 0% validation, we discovered the issue wasn’t that the agent couldn’t implement logic - it was that we gave it the wrong rule. Iteration 7 proved the agent CAN implement complex conditional logic when properly instructed.

The real challenge: Finding the right way to communicate requirements to the agent.

The Hypothesis

Different prompting styles may have vastly different effectiveness rates. Instead of iterating sequentially to find the best approach, we’re testing 4 different strategies in parallel.

The Fix (Common to All Approaches)

All 4 variations include the corrected weight-splitting logic:

OLD RULE (Iteration 7):

IF service has "SUB 1LB" key → SPLIT

NEW RULE (All approaches):

IF service has "SUB 1LB" key AND has_multiple_keys → SPLIT
ELSE → SINGLE sheet

Additional fix: Use multiple data sources

  • SUB1 sheet: From “SUB 1LB” source
  • 1LB/6LB/10LB sheets: From “ECONOMY”/“GROUND” sources
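
Taken together, the corrected rule and the source routing can be sketched in a few lines of Python. This is a minimal reference sketch: the predicate mirrors the rule above (and reappears verbatim in Approach D's prompt), while WEIGHT_SHEET_SOURCES is an illustrative name, not something the agents are given.

# Minimal sketch of the corrected split rule and source routing.
# WEIGHT_SHEET_SOURCES is illustrative; the agents only see the prose rule.
def should_split_by_weight(service_keys):
    """SPLIT only when a SUB 1LB key appears alongside at least one other key."""
    has_sub_1lb = any("SUB 1LB" in key.upper() for key in service_keys)
    return has_sub_1lb and len(service_keys) > 1

WEIGHT_SHEET_SOURCES = {
    "SUB1": ["SUB 1LB"],            # SUB1 sheet built from the SUB 1LB source
    "1LB":  ["ECONOMY", "GROUND"],  # heavier tiers built from ECONOMY/GROUND
    "6LB":  ["ECONOMY", "GROUND"],
    "10LB": ["ECONOMY", "GROUND"],
}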

The 4 Approaches

Approach A: Algorithmic Rule Refinement

Strategy: Present logic as pseudocode/algorithm

Key Characteristics:

  • Step-by-step algorithm with explicit INPUT/OUTPUT
  • Python-like pseudocode format
  • Concrete examples with execution trace

Hypothesis: Agents respond well to structured, algorithmic thinking

Prompt Style:

ALGORITHM: Determine if service should split by weight
INPUT: service_keys (list of strings)

STEP 1: Check if "SUB 1LB" exists
    has_sub_1lb = False
    for each key in service_keys:
        if "SUB 1LB" in key.upper():
            has_sub_1lb = True

STEP 2: Check if multiple keys
    has_multiple_keys = (len(service_keys) > 1)

STEP 3: Make decision
    IF has_sub_1lb AND has_multiple_keys:
        RETURN: SPLIT
    ELSE:
        RETURN: SINGLE
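
For comparison, here is a runnable Python rendering of the same algorithm with a short execution trace; the example key lists are illustrative, not taken from the reference workbook.

# Runnable rendering of the algorithm above, with an execution trace.
def classify_service(service_keys):
    has_sub_1lb = any("SUB 1LB" in key.upper() for key in service_keys)  # STEP 1
    has_multiple_keys = len(service_keys) > 1                            # STEP 2
    return "SPLIT" if has_sub_1lb and has_multiple_keys else "SINGLE"    # STEP 3

print(classify_service(["SUB 1LB 2025"]))                                 # SINGLE: one key
print(classify_service(["SUB 1LB 2025", "ECONOMY 2025"]))                 # SPLIT: SUB 1LB plus another key
print(classify_service(["ECONOMY 2025", "GROUND RESIDENTIAL"]))           # SINGLE: no SUB 1LB key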

Approach B: Example-Driven Learning

Strategy: Show concrete reference examples, let agent infer pattern

Key Characteristics:

  • 4 detailed examples (2 split, 2 non-split)
  • Shows exact mapping data → expected output
  • Includes “why” explanation for each

Hypothesis: Agents learn patterns better from examples than rules

Prompt Style:

EXAMPLE 1: Service that should NOT split
    Mapping: "DHL SM PARCEL GROUND"
    Keys: ["SUB 1LB 2025"]
    Output: SINGLE sheet "01_DHL_SMP_Ground_2025"
    Why: Only 1 key, keep all weights together

EXAMPLE 2: Service that SHOULD split
    Mapping: "DHL SM PARCEL PLUS GROUND"
    Keys: ["SUB 1LB 2025", "ECONOMY 2025", "GROUND RESIDENTIAL"]
    Output: 4 sheets (SUB1, 1LB, 6LB, 10LB)
    Why: Has 3 keys including "SUB 1LB"
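
One way to keep the example set maintainable is to hold the reference examples as data and render them into the prompt, so the same records can later drive validation. A rough sketch; render_examples and the record fields are hypothetical names, not part of the actual agent definition.

# Hypothetical: reference examples held as data, rendered into the prompt text.
EXAMPLES = [
    {
        "mapping": "DHL SM PARCEL GROUND",
        "keys": ["SUB 1LB 2025"],
        "output": 'SINGLE sheet "01_DHL_SMP_Ground_2025"',
        "why": "Only 1 key, keep all weights together",
    },
    {
        "mapping": "DHL SM PARCEL PLUS GROUND",
        "keys": ["SUB 1LB 2025", "ECONOMY 2025", "GROUND RESIDENTIAL"],
        "output": "4 sheets (SUB1, 1LB, 6LB, 10LB)",
        "why": 'Has 3 keys including "SUB 1LB"',
    },
]

def render_examples(examples):
    """Render the example records into a prompt block like the one above."""
    blocks = []
    for i, ex in enumerate(examples, 1):
        blocks.append(
            f"EXAMPLE {i}:\n"
            f'Mapping: "{ex["mapping"]}"\n'
            f"Keys: {ex['keys']}\n"
            f"Output: {ex['output']}\n"
            f"Why: {ex['why']}"
        )
    return "\n\n".join(blocks)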

Approach C: Multi-Phase Agent

Strategy: Analysis phase first, then execution

Key Characteristics:

  • Phase 1: Output classification plan for ALL mapping entries
  • Phase 2: Execute the plan
  • Forces agent to think before coding

Hypothesis: Explicit planning reduces implementation errors

Prompt Style:

PHASE 1: ANALYSIS (Do this first, output findings)
List all mapping entries with classification:
    - Entry 1: DHL SM PARCEL GROUND
      Keys: ["SUB 1LB 2025"]
      Classification: SINGLE
      Expected: 1 sheet

PHASE 2: EXECUTION (After analysis complete)
Implement the classifications from Phase 1
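
A minimal sketch of the two-phase flow, assuming the mapping is available as a dict of service name to key list; build_plan and expected_sheet_count are illustrative names, not part of the agent prompt.

# Hypothetical two-phase driver: classify everything first, then execute.
def build_plan(mapping):
    """Phase 1: classify every mapping entry before generating anything."""
    plan = []
    for service, keys in mapping.items():
        split = any("SUB 1LB" in k.upper() for k in keys) and len(keys) > 1
        plan.append({"service": service, "keys": keys,
                     "classification": "SPLIT" if split else "SINGLE"})
    return plan

def expected_sheet_count(plan):
    """Phase 2 helper: sheets the plan implies (4 per split service, 1 otherwise)."""
    return sum(4 if entry["classification"] == "SPLIT" else 1 for entry in plan)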

Approach D: Validation-Driven Iteration

Strategy: Built-in self-validation and error correction

Key Characteristics:

  • Same corrected rule as others
  • Includes validation step in prompt
  • Instructions to check and fix output

Hypothesis: Self-validation catches errors before final output

Prompt Style:

def should_split_by_weight(service_keys):
    has_sub_1lb = any("SUB 1LB" in key.upper() for key in service_keys)
    has_multiple_keys = len(service_keys) > 1
    return has_sub_1lb and has_multiple_keys

VALIDATION STEP (After generation):
1. Count total sheets (should be ~60)
2. Check split services have 4 sheets
3. Check single services have 1 sheet
4. If issues found: regenerate with fixes
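
The validation step could look roughly like the sketch below, assuming the generated output is available as a dict of service name to generated sheet names and the classification plan records each service's expected shape (both assumptions; the real check runs inside the agent).

# Hypothetical post-generation check mirroring the VALIDATION STEP above.
def validate_output(sheets_by_service, plan, expected_total=60):
    issues = []
    total = sum(len(names) for names in sheets_by_service.values()) + 1  # +1 assumes a separate summary sheet
    if total != expected_total:
        issues.append(f"total sheets {total}, expected ~{expected_total}")
    for entry in plan:
        n = len(sheets_by_service.get(entry["service"], []))
        want = 4 if entry["classification"] == "SPLIT" else 1
        if n != want:
            issues.append(f"{entry['service']}: {n} sheets, expected {want}")
    return issues  # empty means pass; otherwise regenerate with fixes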

Success Metrics

Each approach will be measured on:

  1. Validation Score: X/59 sheets matching reference (target: 53+/59 = 90%)
  2. Sheet Count: Should generate 60 total (59 + summary)
  3. Correct Splits: 14 services × 4 sheets = 56 sheets for split services
  4. Correct Singles: ~3-5 single sheets for non-split services
  5. Sheet Naming: Follows #[WEIGHT]_2025 format
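
For the validation score itself, a sketch of the comparison, assuming both the generated output and the reference are loaded as dicts of sheet name to row data (loading and cell-level diffing are out of scope here):

# Hypothetical scorer: count generated sheets that exactly match the reference.
def validation_score(generated, reference):
    matched = sum(1 for name, rows in reference.items() if generated.get(name) == rows)
    return matched, len(reference)  # e.g. (53, 59) meets the 90% target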

Expected Outcomes

  • Best case: One approach achieves 90%+ validation
  • Good case: Multiple approaches achieve 50%+, showing the rule fix is correct
  • Learning case: All fail similarly, revealing deeper issues

What This Tells Us

  • Which prompting style works best for complex conditional logic
  • Whether explicit examples beat algorithmic rules
  • If planning phases reduce implementation errors
  • Whether self-validation catches issues agents miss

This data will inform:

  • Future agent development best practices
  • How to structure prompts for maximum effectiveness
  • Which strategies generalize to other complex extraction tasks

Timeline

  • Start: 19:13 (all 4 agents launched in parallel)
  • Expected completion: 19:18-19:23 (5-10 minutes)
  • Results analysis: Immediately after completion

Files

  • Agent definitions: .claude/agents/rate-card-approach-{a,b,c,d}.md
  • Test script: test-parallel-approaches.sh
  • Results directory: Rate cards/runs/parallel-test-<timestamp>/
  • Validation logs: Per-approach subdirectories with validation.log