Date: 2025-11-21
Iteration: 8 (parallel multi-approach test)
Goal: Determine which prompting strategy is most effective for complex agent tasks

Background

After 7 iterations achieving 0% validation, we discovered the issue wasn’t that the agent couldn’t implement logic - it was that we gave it the wrong rule. Iteration 7 proved the agent CAN implement complex conditional logic when properly instructed.

The real challenge: Finding the right way to communicate requirements to the agent.

The Hypothesis

Different prompting styles may have vastly different effectiveness rates. Instead of iterating sequentially to find the best approach, we’re testing 4 different strategies in parallel.

The Fix (Common to All Approaches)

All 4 variations include the corrected weight-splitting logic:

OLD RULE (Iteration 7):

IF service has "SUB 1LB" key → SPLIT

NEW RULE (All approaches):

IF service has "SUB 1LB" key AND has_multiple_keys → SPLIT
ELSE → SINGLE sheet

Additional fix: Use multiple data sources

  • SUB1 sheet: From “SUB 1LB” source
  • 1LB/6LB/10LB sheets: From “ECONOMY”/“GROUND” sources
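
Taken together, the corrected rule and the source routing can be sketched in a few lines of Python. This is a minimal reference sketch: the predicate mirrors the rule above (and reappears verbatim in Approach D's prompt), while WEIGHT_SHEET_SOURCES is an illustrative name, not something the agents are given.

# Minimal sketch of the corrected split rule and source routing.
# WEIGHT_SHEET_SOURCES is illustrative; the agents only see the prose rule.
def should_split_by_weight(service_keys):
    """SPLIT only when a SUB 1LB key appears alongside at least one other key."""
    has_sub_1lb = any("SUB 1LB" in key.upper() for key in service_keys)
    return has_sub_1lb and len(service_keys) > 1

WEIGHT_SHEET_SOURCES = {
    "SUB1": ["SUB 1LB"],            # SUB1 sheet built from the SUB 1LB source
    "1LB":  ["ECONOMY", "GROUND"],  # heavier tiers built from ECONOMY/GROUND
    "6LB":  ["ECONOMY", "GROUND"],
    "10LB": ["ECONOMY", "GROUND"],
}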

The 4 Approaches

Approach A: Algorithmic Rule Refinement

Strategy: Present logic as pseudocode/algorithm

Key Characteristics:

  • Step-by-step algorithm with explicit INPUT/OUTPUT
  • Python-like pseudocode format
  • Concrete examples with execution trace

Hypothesis: Agents respond well to structured, algorithmic thinking

Prompt Style:

ALGORITHM: Determine if service should split by weight
INPUT: service_keys (list of strings)

STEP 1: Check if "SUB 1LB" exists
    has_sub_1lb = False
    for each key in service_keys:
        if "SUB 1LB" in key.upper():
            has_sub_1lb = True

STEP 2: Check if multiple keys
    has_multiple_keys = (len(service_keys) > 1)

STEP 3: Make decision
    IF has_sub_1lb AND has_multiple_keys:
        RETURN: SPLIT
    ELSE:
        RETURN: SINGLE
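
For comparison, here is a runnable Python rendering of the same algorithm with a short execution trace; the example key lists are illustrative, not taken from the reference workbook.

# Runnable rendering of the algorithm above, with an execution trace.
def classify_service(service_keys):
    has_sub_1lb = any("SUB 1LB" in key.upper() for key in service_keys)  # STEP 1
    has_multiple_keys = len(service_keys) > 1                            # STEP 2
    return "SPLIT" if has_sub_1lb and has_multiple_keys else "SINGLE"    # STEP 3

print(classify_service(["SUB 1LB 2025"]))                                 # SINGLE: one key
print(classify_service(["SUB 1LB 2025", "ECONOMY 2025"]))                 # SPLIT: SUB 1LB plus another key
print(classify_service(["ECONOMY 2025", "GROUND RESIDENTIAL"]))           # SINGLE: no SUB 1LB key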

Approach B: Example-Driven Learning

Strategy: Show concrete reference examples, let agent infer pattern

Key Characteristics:

  • 4 detailed examples (2 split, 2 non-split)
  • Shows exact mapping data → expected output
  • Includes “why” explanation for each

Hypothesis: Agents learn patterns better from examples than rules

Prompt Style:

EXAMPLE 1: Service that should NOT split
    Mapping: "DHL SM PARCEL GROUND"
    Keys: ["SUB 1LB 2025"]
    Output: SINGLE sheet "01_DHL_SMP_Ground_2025"
    Why: Only 1 key, keep all weights together

EXAMPLE 2: Service that SHOULD split
    Mapping: "DHL SM PARCEL PLUS GROUND"
    Keys: ["SUB 1LB 2025", "ECONOMY 2025", "GROUND RESIDENTIAL"]
    Output: 4 sheets (SUB1, 1LB, 6LB, 10LB)
    Why: Has 3 keys including "SUB 1LB"
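
One way to keep the example set maintainable is to hold the reference examples as data and render them into the prompt, so the same records can later drive validation. A rough sketch; render_examples and the record fields are hypothetical names, not part of the actual agent definition.

# Hypothetical: reference examples held as data, rendered into the prompt text.
EXAMPLES = [
    {
        "mapping": "DHL SM PARCEL GROUND",
        "keys": ["SUB 1LB 2025"],
        "output": 'SINGLE sheet "01_DHL_SMP_Ground_2025"',
        "why": "Only 1 key, keep all weights together",
    },
    {
        "mapping": "DHL SM PARCEL PLUS GROUND",
        "keys": ["SUB 1LB 2025", "ECONOMY 2025", "GROUND RESIDENTIAL"],
        "output": "4 sheets (SUB1, 1LB, 6LB, 10LB)",
        "why": 'Has 3 keys including "SUB 1LB"',
    },
]

def render_examples(examples):
    """Render the example records into a prompt block like the one above."""
    blocks = []
    for i, ex in enumerate(examples, 1):
        blocks.append(
            f"EXAMPLE {i}:\n"
            f'Mapping: "{ex["mapping"]}"\n'
            f"Keys: {ex['keys']}\n"
            f"Output: {ex['output']}\n"
            f"Why: {ex['why']}"
        )
    return "\n\n".join(blocks)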

Approach C: Multi-Phase Agent

Strategy: Analysis phase first, then execution

Key Characteristics:

  • Phase 1: Output classification plan for ALL mapping entries
  • Phase 2: Execute the plan
  • Forces agent to think before coding

Hypothesis: Explicit planning reduces implementation errors

Prompt Style:

PHASE 1: ANALYSIS (Do this first, output findings)
List all mapping entries with classification:
    - Entry 1: DHL SM PARCEL GROUND
      Keys: ["SUB 1LB 2025"]
      Classification: SINGLE
      Expected: 1 sheet

PHASE 2: EXECUTION (After analysis complete)
Implement the classifications from Phase 1
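
A minimal sketch of the two-phase flow, assuming the mapping is available as a dict of service name to key list; build_plan and expected_sheet_count are illustrative names, not part of the agent prompt.

# Hypothetical two-phase driver: classify everything first, then execute.
def build_plan(mapping):
    """Phase 1: classify every mapping entry before generating anything."""
    plan = []
    for service, keys in mapping.items():
        split = any("SUB 1LB" in k.upper() for k in keys) and len(keys) > 1
        plan.append({"service": service, "keys": keys,
                     "classification": "SPLIT" if split else "SINGLE"})
    return plan

def expected_sheet_count(plan):
    """Phase 2 helper: sheets the plan implies (4 per split service, 1 otherwise)."""
    return sum(4 if entry["classification"] == "SPLIT" else 1 for entry in plan)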

Approach D: Validation-Driven Iteration

Strategy: Built-in self-validation and error correction

Key Characteristics:

  • Same corrected rule as others
  • Includes validation step in prompt
  • Instructions to check and fix output

Hypothesis: Self-validation catches errors before final output

Prompt Style:

def should_split_by_weight(service_keys):
    has_sub_1lb = any("SUB 1LB" in key.upper() for key in service_keys)
    has_multiple_keys = len(service_keys) > 1
    return has_sub_1lb and has_multiple_keys

VALIDATION STEP (After generation):
1. Count total sheets (should be ~60)
2. Check split services have 4 sheets
3. Check single services have 1 sheet
4. If issues found: regenerate with fixes
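
The validation step could look roughly like the sketch below, assuming the generated output is available as a dict of service name to generated sheet names and the classification plan records each service's expected shape (both assumptions; the real check runs inside the agent).

# Hypothetical post-generation check mirroring the VALIDATION STEP above.
def validate_output(sheets_by_service, plan, expected_total=60):
    issues = []
    total = sum(len(names) for names in sheets_by_service.values()) + 1  # +1 assumes a separate summary sheet
    if total != expected_total:
        issues.append(f"total sheets {total}, expected ~{expected_total}")
    for entry in plan:
        n = len(sheets_by_service.get(entry["service"], []))
        want = 4 if entry["classification"] == "SPLIT" else 1
        if n != want:
            issues.append(f"{entry['service']}: {n} sheets, expected {want}")
    return issues  # empty means pass; otherwise regenerate with fixes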

Success Metrics

Each approach will be measured on:

  1. Validation Score: X/59 sheets matching reference (target: 53+/59 = 90%)
  2. Sheet Count: Should generate 60 total (59 + summary)
  3. Correct Splits: 14 services × 4 sheets = 56 sheets for split services
  4. Correct Singles: ~3-5 single sheets for non-split services
  5. Sheet Naming: Follows #[WEIGHT]_2025 format
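
For the validation score itself, a sketch of the comparison, assuming both the generated output and the reference are loaded as dicts of sheet name to row data (loading and cell-level diffing are out of scope here):

# Hypothetical scorer: count generated sheets that exactly match the reference.
def validation_score(generated, reference):
    matched = sum(1 for name, rows in reference.items() if generated.get(name) == rows)
    return matched, len(reference)  # e.g. (53, 59) meets the 90% target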

Expected Outcomes

  • Best case: One approach achieves 90%+ validation
  • Good case: Multiple approaches achieve 50%+, showing the rule fix is correct
  • Learning case: All fail similarly, revealing deeper issues

What This Tells Us

  • Which prompting style works best for complex conditional logic
  • Whether explicit examples beat algorithmic rules
  • If planning phases reduce implementation errors
  • Whether self-validation catches issues agents miss

This data will inform:

  • Future agent development best practices
  • How to structure prompts for maximum effectiveness
  • Which strategies generalize to other complex extraction tasks

Timeline

  • Start: 19:13 (all 4 agents launched in parallel)
  • Expected completion: 19:18-19:23 (5-10 minutes)
  • Results analysis: Immediately after completion

Files

  • Agent definitions: .claude/agents/rate-card-approach-{a,b,c,d}.md
  • Test script: test-parallel-approaches.sh
  • Results directory: Rate cards/runs/parallel-test-<timestamp>/
  • Validation logs: Per-approach subdirectories with validation.log