PARALLEL_APPROACH_EXPERIMENT
Date: 2025-11-21
Iteration: 8 (parallel multi-approach test)
Goal: Determine which prompting strategy is most effective for complex agent tasks
Background
After 7 iterations achieving 0% validation, we discovered the issue wasn’t that the agent couldn’t implement logic - it was that we gave it the wrong rule. Iteration 7 proved the agent CAN implement complex conditional logic when properly instructed.
The real challenge: Finding the right way to communicate requirements to the agent.
The Hypothesis
Different prompting styles may have vastly different effectiveness rates. Instead of iterating sequentially to find the best approach, we’re testing 4 different strategies in parallel.
The Fix (Common to All Approaches)
All 4 variations include the corrected weight-splitting logic:
OLD RULE (Iteration 7):
IF service has "SUB 1LB" key → SPLIT

NEW RULE (All approaches):

IF service has "SUB 1LB" key AND has_multiple_keys → SPLIT
ELSE → SINGLE sheet

Additional fix: Use multiple data sources
- SUB1 sheet: From “SUB 1LB” source
- 1LB/6LB/10LB sheets: From “ECONOMY”/“GROUND” sources
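As a minimal Python sketch, assuming each service is represented by its list of mapping keys (function names and the exact sheet labels are illustrative, not the agents' actual code), the combined fix looks like this:

```python
def should_split_by_weight(service_keys):
    # Corrected rule: split only when "SUB 1LB" is present AND the service has more than one key.
    has_sub_1lb = any("SUB 1LB" in key.upper() for key in service_keys)
    return has_sub_1lb and len(service_keys) > 1

def sheet_sources(service_keys):
    # Illustrative routing: the SUB1 sheet comes from the "SUB 1LB" source,
    # the 1LB/6LB/10LB sheets come from the "ECONOMY"/"GROUND" sources.
    if not should_split_by_weight(service_keys):
        return {"SINGLE": list(service_keys)}
    return {
        "SUB1": [k for k in service_keys if "SUB 1LB" in k.upper()],
        "1LB/6LB/10LB": [k for k in service_keys
                         if "ECONOMY" in k.upper() or "GROUND" in k.upper()],
    }
```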
The 4 Approaches
Approach A: Algorithmic Rule Refinement
Strategy: Present logic as pseudocode/algorithm
Key Characteristics:
- Step-by-step algorithm with explicit INPUT/OUTPUT
- Python-like pseudocode format
- Concrete examples with execution trace
Hypothesis: Agents respond well to structured, algorithmic thinking
Prompt Style:
ALGORITHM: Determine if service should split by weight
INPUT: service_keys (list of strings)
STEP 1: Check if "SUB 1LB" exists
    has_sub_1lb = False
    for each key in service_keys:
        if "SUB 1LB" in key.upper():
            has_sub_1lb = True

STEP 2: Check if multiple keys
    has_multiple_keys = (len(service_keys) > 1)

STEP 3: Make decision
    IF has_sub_1lb AND has_multiple_keys:
        RETURN: SPLIT
    ELSE:
        RETURN: SINGLE
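For illustration, here is a hypothetical trace of that algorithm over the two mappings quoted as reference examples in Approach B below (the execution traces in the actual prompt may differ):

```python
# Hypothetical trace of the algorithm on two mappings from the reference examples.
examples = [
    ("DHL SM PARCEL GROUND", ["SUB 1LB 2025"]),                    # expect SINGLE
    ("DHL SM PARCEL PLUS GROUND",
     ["SUB 1LB 2025", "ECONOMY 2025", "GROUND RESIDENTIAL"]),      # expect SPLIT
]
for service, keys in examples:
    has_sub_1lb = any("SUB 1LB" in key.upper() for key in keys)    # STEP 1
    has_multiple_keys = len(keys) > 1                              # STEP 2
    decision = "SPLIT" if has_sub_1lb and has_multiple_keys else "SINGLE"  # STEP 3
    print(f"{service}: {len(keys)} key(s) -> {decision}")
```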
Approach B: Example-Driven Learning
Strategy: Show concrete reference examples, let agent infer pattern
Key Characteristics:
- 4 detailed examples (2 split, 2 non-split)
- Shows exact mapping data → expected output
- Includes “why” explanation for each
Hypothesis: Agents learn patterns better from examples than rules
Prompt Style:
EXAMPLE 1: Service that should NOT split
Mapping: "DHL SM PARCEL GROUND"
Keys: ["SUB 1LB 2025"]
Output: SINGLE sheet "01_DHL_SMP_Ground_2025"
Why: Only 1 key, keep all weights together

EXAMPLE 2: Service that SHOULD split
Mapping: "DHL SM PARCEL PLUS GROUND"
Keys: ["SUB 1LB 2025", "ECONOMY 2025", "GROUND RESIDENTIAL"]
Output: 4 sheets (SUB1, 1LB, 6LB, 10LB)
Why: Has 3 keys including "SUB 1LB"
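One way to encode these reference examples if the prompt is built programmatically (a hypothetical fixture, not part of the current agent definitions):

```python
# Hypothetical fixture pairing the mapping data the agent sees with the expected
# output and rationale, mirroring the two examples above.
REFERENCE_EXAMPLES = [
    {
        "mapping": "DHL SM PARCEL GROUND",
        "keys": ["SUB 1LB 2025"],
        "expected_sheets": ["01_DHL_SMP_Ground_2025"],      # SINGLE
        "why": "Only 1 key, keep all weights together",
    },
    {
        "mapping": "DHL SM PARCEL PLUS GROUND",
        "keys": ["SUB 1LB 2025", "ECONOMY 2025", "GROUND RESIDENTIAL"],
        "expected_sheets": ["SUB1", "1LB", "6LB", "10LB"],   # SPLIT into 4 weight sheets
        "why": "Has 3 keys including 'SUB 1LB'",
    },
]
```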
Approach C: Multi-Phase Agent
Strategy: Analysis phase first, then execution
Key Characteristics:
- Phase 1: Output classification plan for ALL mapping entries
- Phase 2: Execute the plan
- Forces agent to think before coding
Hypothesis: Explicit planning reduces implementation errors
Prompt Style:
PHASE 1: ANALYSIS (Do this first, output findings)
List all mapping entries with classification:

- Entry 1: DHL SM PARCEL GROUND
  Keys: ["SUB 1LB 2025"]
  Classification: SINGLE
  Expected: 1 sheet

PHASE 2: EXECUTION (After analysis complete)
Implement the classifications from Phase 1
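A rough sketch of what the Phase 1 output could look like as data, so that Phase 2 only executes a fixed plan (the record fields are assumptions for illustration, not the agent's actual format):

```python
from dataclasses import dataclass

@dataclass
class PlanEntry:
    # Hypothetical Phase 1 classification record for one mapping entry.
    mapping: str          # e.g. "DHL SM PARCEL GROUND"
    keys: list            # mapping keys for the service
    classification: str   # "SINGLE" or "SPLIT"
    expected_sheets: int  # 1 for SINGLE, 4 for SPLIT

def execute(plan):
    # Phase 2 consumes the plan verbatim; it never re-decides classifications.
    return sum(entry.expected_sheets for entry in plan)
```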
Approach D: Validation-Driven Iteration
Strategy: Built-in self-validation and error correction
Key Characteristics:
- Same corrected rule as others
- Includes validation step in prompt
- Instructions to check and fix output
Hypothesis: Self-validation catches errors before final output
Prompt Style:
def should_split_by_weight(service_keys):
    has_sub_1lb = any("SUB 1LB" in key.upper() for key in service_keys)
    has_multiple_keys = len(service_keys) > 1
    return has_sub_1lb and has_multiple_keys

VALIDATION STEP (After generation):
1. Count total sheets (should be ~60)
2. Check split services have 4 sheets
3. Check single services have 1 sheet
4. If issues found: regenerate with fixes
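In code, the validation step could look roughly like this (a sketch assuming sheet names contain the service name; not the agent's actual implementation):

```python
def validate(sheet_names, split_services, single_services):
    # Mirrors the four checks in the prompt; a non-empty result means "regenerate with fixes".
    issues = []
    if not 59 <= len(sheet_names) <= 61:          # ~60 sheets including the summary
        issues.append(f"unexpected sheet count: {len(sheet_names)}")
    for svc in split_services:
        if sum(svc in name for name in sheet_names) != 4:
            issues.append(f"{svc}: expected 4 weight-split sheets")
    for svc in single_services:
        if sum(svc in name for name in sheet_names) != 1:
            issues.append(f"{svc}: expected 1 sheet")
    return issues
```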
Success Metrics
Each approach will be measured on:
- Validation Score: X/59 sheets matching reference (target: 53+/59 = 90%)
- Sheet Count: Should generate 60 total (59 + summary)
- Correct Splits: 14 services × 4 sheets = 56 sheets for split services
- Correct Singles: ~3-5 single sheets for non-split services
- Sheet Naming: Follows the ##_CARRIER_SERVICE_[WEIGHT]_2025 format (see the sketch after this list)
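For the naming metric, a hedged sketch of a format check (the exact segment rules and the split-sheet example name are assumptions; adjust to the reference workbook):

```python
import re

# Hypothetical check of the ##_CARRIER_SERVICE_[WEIGHT]_2025 pattern; the WEIGHT
# segment (SUB1/1LB/6LB/10LB) only appears on weight-split sheets.
SHEET_NAME = re.compile(r"^\d{2}_[A-Z]+_[A-Za-z_]+?(_(SUB1|1LB|6LB|10LB))?_2025$")

assert SHEET_NAME.match("01_DHL_SMP_Ground_2025")            # single-sheet service (from Example 1)
assert SHEET_NAME.match("02_DHL_SMP_Plus_Ground_SUB1_2025")  # split-sheet example (hypothetical name)
```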
Expected Outcomes
Best case: One approach achieves 90%+ validation
Good case: Multiple approaches achieve 50%+, showing the rule fix is correct
Learning case: All fail similarly, revealing deeper issues
What This Tells Us
- Which prompting style works best for complex conditional logic
- Whether explicit examples beat algorithmic rules
- If planning phases reduce implementation errors
- Whether self-validation catches issues agents miss
This data will inform:
- Future agent development best practices
- How to structure prompts for maximum effectiveness
- Which strategies generalize to other complex extraction tasks
Timeline
- Start: 19:13 (all 4 agents launched in parallel)
- Expected completion: 19:18-19:23 (5-10 minutes)
- Results analysis: Immediately after completion
Files
- Agent definitions: .claude/agents/rate-card-approach-{a,b,c,d}.md
- Test script: test-parallel-approaches.sh
- Results directory: Rate cards/runs/parallel-test-<timestamp>/
- Validation logs: Per-approach subdirectories with validation.log