automated-reasoning-implementation-guide

Amazon’s Approach Breakdown & Replication Guide

Part 1: What Amazon Implemented
- System Architecture
Part 2: Core Components
Part 3: How to Replicate It
Part 4: Technical Deep Dive
- How Claims Are Extracted from Natural Language
- SMT Encoding Patterns
Part 5: Open Source Tools Stack
- Required Tools
- Recommended Stack for LLM Verification
Part 6: Complete Working Example
- Scenario: E-commerce Price Verification
Part 7: Scaling to Production
- Challenges
- Best Practices
Part 8: Resources & Next Steps
Summary: How to Get Started Today

Part 1: What Amazon Implemented

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                   User Query                                 │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│            LLM (Claude, GPT, etc.)                           │
│            Generates Response                                │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│         Automated Reasoning Checks                           │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  1. Extract Logical Representation                   │  │
│  │  2. Encode as Mathematical Formula (SMT)             │  │
│  │  3. Check Against Policy Rules (Z3/SMT Solver)       │  │
│  │  4. Return: Valid / Invalid / No Data                │  │
│  └──────────────────────────────────────────────────────┘  │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
                    ┌───────┴────────┐
                    │                │
                ✅ Valid        ❌ Invalid
                    │                │
            Return Response    Block/Regenerate

Part 2: Core Components

1. Zelkova (AWS Internal Tool)

Purpose: Policy analysis using automated reasoning

What it does:

Analyzes IAM policies, S3 policies, VPC security rules
Reduces policies to mathematical formulas
Uses SMT solvers (Z3, cvc5) for verification

Technical Approach:

Policy → Mathematical Formula → SMT Solver → Proof/Counterexample

Encoding Theories Used:

String theory (for text matching)
Regular expressions
Bit vectors (for permissions)
Integer comparisons
Set theory

2. Automated Reasoning Checks (Bedrock Guardrails)

Purpose: Prevent LLM hallucinations with 99% verification accuracy

Technical Implementation:

Step 1: Create Automated Reasoning Policy

Source Documents (Knowledge Base)
         ↓
Extract Variables & Logical Rules
         ↓
Encode as Formal Logic (SMT Format)
         ↓
Automated Reasoning Policy

Example Policy Structure:

; Example: Verify product pricing rules
(declare-const price Real)
(declare-const discount Real)
(declare-const final_price Real)

; Rules
(assert (>= price 0))
(assert (and (>= discount 0) (<= discount 100)))
(assert (= final_price (* price (- 1 (/ discount 100)))))

; Verification query
(check-sat)

Step 2: Integration Flow

1. User Query → LLM
2. LLM Response → Extract Claims
3. Claims → Convert to Logical Formulas
4. Formulas + Policy → SMT Solver (Z3)
5. Solver Output → Valid/Invalid/No Data
6. Result → Allow/Block/Request Clarification

Step 3: Validation Process

# Pseudocode for validation
def validate_llm_response(response, policy):
    # Extract claims from natural language
    claims = extract_claims(response)

    # Convert to logical formulas
    formulas = []
    for claim in claims:
        formula = nl_to_logic(claim)  # Natural language → Logic
        formulas.append(formula)

    # Check against policy using SMT solver
    for formula in formulas:
        result = z3_solver.check(formula, policy)
        if result == "UNSAT":
            return {"valid": False, "claim": claim, "reason": "Contradicts policy"}
        elif result == "UNKNOWN":
            return {"valid": "uncertain", "claim": claim, "reason": "Insufficient data"}

    return {"valid": True}

3. Dafny (Verification-Aware Programming)

Purpose: Write provably correct code with mathematical guarantees

What it does:

Programming language with built-in verification
Automatically generates proofs of correctness
Used for AWS access management, storage, cryptography

Example:

method Max(a: int, b: int) returns (m: int)
    ensures m >= a && m >= b
    ensures m == a || m == b
{
    if a > b {
        return a;
    } else {
        return b;
    }
}

Dafny automatically proves that this method satisfies its postconditions.

Part 3: How to Replicate It

Option 1: Minimal Python Implementation (Using Z3)

Tools Needed:

Z3 SMT Solver (open source, Microsoft)
Python 3.8+
LangChain or similar LLM framework

Installation:

pip install z3-solver langchain openai

Implementation:

from z3 import *
import openai
from typing import List, Dict, Any

class AutomatedReasoningChecker:
    def __init__(self, knowledge_base: Dict[str, Any]):
        """
        knowledge_base: Dictionary of facts and rules
        Example:
        {
            "capitals": {"France": "Paris", "UK": "London"},
            "pricing_rules": {"min_price": 0, "max_discount": 100}
        }
        """
        self.kb = knowledge_base
        self.solver = Solver()
        self._encode_knowledge_base()

    def _encode_knowledge_base(self):
        """Convert knowledge base to SMT constraints"""
        # Example: Encode pricing rules
        if "pricing_rules" in self.kb:
            price = Real('price')
            discount = Real('discount')

            self.solver.add(price >= self.kb["pricing_rules"]["min_price"])
            self.solver.add(discount >= 0)
            self.solver.add(discount <= self.kb["pricing_rules"]["max_discount"])

    def extract_claim(self, llm_response: str) -> List[str]:
        """
        Extract verifiable claims from LLM response
        In production, this would use NLP/LLM to extract structured claims
        """
        # Simplified: Use another LLM call to extract claims
        extraction_prompt = f"""
        Extract factual claims from this text as JSON:
        Text: {llm_response}

        Format:
        {{"claims": [{{"type": "fact", "subject": "...", "predicate": "...", "object": "..."}}]}}
        """

        # Call LLM to extract structured claims
        # (In practice, use GPT-4 or Claude with function calling)
        return self._call_llm_for_extraction(extraction_prompt)

    def verify_claim(self, claim: Dict[str, str]) -> Dict[str, Any]:
        """
        Verify a single claim against knowledge base
        """
        claim_type = claim.get("type")

        if claim_type == "capital":
            # Example: "Paris is the capital of France"
            country = claim["subject"]
            capital = claim["object"]

            if country in self.kb.get("capitals", {}):
                expected = self.kb["capitals"][country]
                if capital == expected:
                    return {"valid": True, "proof": f"KB confirms {capital} is capital of {country}"}
                else:
                    return {"valid": False, "reason": f"KB says capital is {expected}, not {capital}"}
            else:
                return {"valid": "unknown", "reason": "No data for this country"}

        elif claim_type == "pricing":
            # Example: "The final price is $80 after 20% discount on $100"
            original = Real('original_price')
            disc = Real('discount_pct')
            final = Real('final_price')

            # Add claim constraints
            self.solver.push()
            self.solver.add(original == claim["original_price"])
            self.solver.add(disc == claim["discount_pct"])
            self.solver.add(final == claim["final_price"])

            # Add correctness constraint
            self.solver.add(final == original * (1 - disc / 100))

            # Check satisfiability
            result = self.solver.check()
            self.solver.pop()

            if result == sat:
                return {"valid": True, "proof": "Calculation verified by SMT solver"}
            else:
                return {"valid": False, "reason": "Math doesn't check out"}

        return {"valid": "unknown", "reason": "Unsupported claim type"}

    def validate_llm_response(self, llm_response: str) -> Dict[str, Any]:
        """
        Main validation function
        """
        claims = self.extract_claim(llm_response)
        results = []

        for claim in claims:
            verification = self.verify_claim(claim)
            results.append({
                "claim": claim,
                "verification": verification
            })

        # Aggregate results
        all_valid = all(r["verification"]["valid"] == True for r in results)

        return {
            "overall_valid": all_valid,
            "claim_results": results
        }

# Usage Example
kb = {
    "capitals": {"France": "Paris", "UK": "London", "Spain": "Madrid"},
    "pricing_rules": {"min_price": 0, "max_discount": 100}
}

checker = AutomatedReasoningChecker(kb)

# LLM response to verify
llm_response = "The capital of France is Paris."

# Validate
result = checker.validate_llm_response(llm_response)
print(result)

Option 2: Using Existing Neurosymbolic Frameworks

Logic-LM (Research Implementation)

Combines LLMs with multiple symbolic solvers:

# Using Logic-LM framework
from logic_lm import LogicLM

# Initialize with solvers
llm = LogicLM(
    llm_backend="gpt-4",
    solvers={
        "prolog": "pyke",
        "smt": "z3",
        "fol": "prover9"
    }
)

# Query with verification
response = llm.query(
    prompt="What is the capital of France?",
    verify=True,
    knowledge_base="geography_facts.pl"
)

print(response.answer)        # "Paris"
print(response.verified)      # True
print(response.proof)         # SMT proof or Prolog derivation

Imandra (Commercial Tool)

Neurosymbolic AI platform with LLM integration:

from imandra import verify

@verify
def calculate_discount(price, discount_pct):
    """
    Calculates final price after discount

    Requires:
    - price >= 0
    - 0 <= discount_pct <= 100

    Ensures:
    - result >= 0
    - result <= price
    """
    return price * (1 - discount_pct / 100)

# Imandra automatically generates formal proofs
# that this function satisfies its specification

Option 3: AWS Bedrock Integration (Production-Ready)

If you’re using AWS, integrate directly with Bedrock Guardrails:

import boto3

bedrock = boto3.client('bedrock-runtime')

# Create Automated Reasoning policy
policy_response = bedrock.create_automated_reasoning_policy(
    name="product-pricing-policy",
    description="Validates product pricing calculations",
    sourceDocuments=[
        {
            "s3Uri": "s3://my-bucket/pricing-rules.pdf",
            "documentType": "PDF"
        }
    ]
)

policy_id = policy_response['policyId']

# Apply policy to guardrail
guardrail_response = bedrock.create_guardrail(
    name="my-llm-guardrail",
    automatedReasoningChecks=[
        {
            "policyId": policy_id,
            "action": "BLOCK"  # or "WARN"
        }
    ]
)

# Use with LLM
response = bedrock.invoke_model(
    modelId="anthropic.claude-v3",
    body={
        "prompt": "Calculate 20% discount on $100",
        "guardrailIdentifier": guardrail_response['guardrailId']
    }
)

# Response includes validation results
print(response['automatedReasoningResults'])

Part 4: Technical Deep Dive

How Claims Are Extracted from Natural Language

Pipeline:

Natural Language → Parse → Extract Entities → Build Logical Form → Encode as SMT

Example:

Input: “After applying a 20% discount on a 80.”

Step 1: Parse & Extract

{
  "entities": {
    "original_price": 100,
    "discount_percentage": 20,
    "final_price": 80
  },
  "relations": [
    {"type": "applies_to", "source": "discount", "target": "original_price"},
    {"type": "results_in", "source": "discount", "target": "final_price"}
  ]
}

Step 2: Convert to Logical Form

final_price = original_price × (1 - discount_percentage / 100)

Step 3: Encode as SMT

(declare-const original_price Real)
(declare-const discount_percentage Real)
(declare-const final_price Real)

(assert (= original_price 100.0))
(assert (= discount_percentage 20.0))
(assert (= final_price 80.0))

; Verification constraint
(assert (= final_price (* original_price (- 1.0 (/ discount_percentage 100.0)))))

(check-sat)  ; Returns: sat (valid) or unsat (invalid)

Step 4: Run Z3

from z3 import *

original = Real('original_price')
discount = Real('discount_pct')
final = Real('final_price')

solver = Solver()
solver.add(original == 100)
solver.add(discount == 20)
solver.add(final == 80)
solver.add(final == original * (1 - discount / 100))

if solver.check() == sat:
    print("✅ Claim is mathematically valid")
    print(solver.model())
else:
    print("❌ Claim is mathematically invalid")

SMT Encoding Patterns

Pattern 1: Factual Claims (Lookup)

# Knowledge base as Z3 constraints
capitals = {
    "France": "Paris",
    "UK": "London"
}

def verify_capital(country, claimed_capital):
    if country in capitals:
        return capitals[country] == claimed_capital
    else:
        return "UNKNOWN"

Pattern 2: Numerical Calculations

def verify_math(expression, claimed_result):
    solver = Solver()

    # Parse expression and encode
    vars = extract_variables(expression)
    for var in vars:
        solver.add(var.constraint)

    result = Real('result')
    solver.add(result == claimed_result)
    solver.add(result == eval_expression(expression))

    return solver.check() == sat

Pattern 3: Logical Consistency

def verify_logical_consistency(statements):
    """
    Check if a set of statements is logically consistent
    """
    solver = Solver()

    for stmt in statements:
        solver.add(encode_statement(stmt))

    # If SAT, statements are consistent
    # If UNSAT, there's a contradiction
    return solver.check() == sat

Part 5: Open Source Tools Stack

Required Tools

Tool	Purpose	Installation
Z3	SMT solver (core verification)	`pip install z3-solver`
Dafny	Verification-aware programming	Download from GitHub
Coq	Theorem prover	`opam install coq`
Lean 4	Modern proof assistant	Download from lean-lang.org
cvc5	Alternative SMT solver	Build from source

Recommended Stack for LLM Verification

LLM Layer:        GPT-4 / Claude / Llama
                         ↓
Extraction:       LangChain + Function Calling
                         ↓
Logic Encoding:   Python + Z3/cvc5
                         ↓
Verification:     SMT Solver (Z3)
                         ↓
Result:           Valid / Invalid / Unknown

Part 6: Complete Working Example

Scenario: E-commerce Price Verification

Goal: Verify LLM-generated product pricing is mathematically correct

from z3 import *
from openai import OpenAI

class PriceVerifier:
    def __init__(self):
        self.solver = Solver()
        self.llm = OpenAI()

    def ask_llm(self, question):
        """Get LLM response"""
        response = self.llm.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": question}]
        )
        return response.choices[0].message.content

    def extract_pricing_claim(self, llm_response):
        """Extract structured pricing data from LLM response"""
        extraction_prompt = f"""
        Extract pricing information as JSON:

        Text: {llm_response}

        Return JSON with these fields:
        {{
            "original_price": <number>,
            "discount_percentage": <number>,
            "final_price": <number>
        }}
        """

        result = self.llm.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": extraction_prompt}],
            response_format={"type": "json_object"}
        )

        import json
        return json.loads(result.choices[0].message.content)

    def verify_pricing(self, claim):
        """Verify pricing calculation using Z3"""
        original = Real('original_price')
        discount = Real('discount_pct')
        final = Real('final_price')

        solver = Solver()

        # Add claim values
        solver.add(original == claim['original_price'])
        solver.add(discount == claim['discount_percentage'])
        solver.add(final == claim['final_price'])

        # Add correctness constraint
        solver.add(final == original * (1 - discount / 100))

        # Add business rules
        solver.add(original >= 0)
        solver.add(discount >= 0)
        solver.add(discount <= 100)
        solver.add(final >= 0)

        if solver.check() == sat:
            return {
                "valid": True,
                "proof": "Mathematical verification successful",
                "model": solver.model()
            }
        else:
            return {
                "valid": False,
                "reason": "Calculation violates pricing rules or math is incorrect"
            }

    def verified_query(self, question):
        """Ask LLM and verify the response"""
        # Get LLM response
        llm_answer = self.ask_llm(question)

        # Extract claim
        claim = self.extract_pricing_claim(llm_answer)

        # Verify
        verification = self.verify_pricing(claim)

        return {
            "question": question,
            "llm_response": llm_answer,
            "extracted_claim": claim,
            "verification": verification,
            "final_answer": llm_answer if verification["valid"] else "BLOCKED: Failed verification"
        }

# Usage
verifier = PriceVerifier()

result = verifier.verified_query(
    "A product costs $100. After a 20% discount, what is the final price?"
)

print("LLM Answer:", result["llm_response"])
print("Verified:", result["verification"]["valid"])
print("Final Answer:", result["final_answer"])

Output:

LLM Answer: After applying a 20% discount on a $100 product, the final price is $80.
Verified: True
Final Answer: After applying a 20% discount on a $100 product, the final price is $80.

If LLM made a mistake:

LLM Answer: After applying a 20% discount on a $100 product, the final price is $75.
Verified: False
Final Answer: BLOCKED: Failed verification

Part 7: Scaling to Production

Challenges

Latency: SMT solving can be slow for complex formulas
- Solution: Cache common queries, use incremental solving
Natural Language → Logic Gap: Hard to extract perfect logical forms
- Solution: Use LLMs for extraction, validate with simpler claims first
Knowledge Base Maintenance: Keeping KB up to date
- Solution: Auto-extract rules from documentation, version control policies

Best Practices

Start Simple: Begin with numeric/factual verification
Iterative Refinement: Build up complexity gradually
Hybrid Approach: Use probabilistic + symbolic together
Human in Loop: Flag uncertain cases for human review

Part 8: Resources & Next Steps

Learning Path

Week 1: Learn Z3 basics
- Tutorial: https://ericpony.github.io/z3py-tutorial/guide-examples.htm
Week 2: Build simple fact checker
- Verify arithmetic claims from LLM
Week 3: Add natural language extraction
- Use LLM to extract structured claims
Week 4: Integrate with production LLM
- Build API wrapper with verification layer

Code Repositories

Z3 Python: https://github.com/Z3Prover/z3
Logic-LM: https://github.com/teacherpeterpan/Logic-LLM
AWS Dafny Tools: https://github.com/awslabs/aws-verification-model-for-libcrypto
Neurosymbolic Examples: https://github.com/LAMDASZ-ML/Awesome-Neuro-Symbolic-Learning-with-LLM

Papers

“Semantic-based Automated Reasoning for AWS Access Policies using SMT” (Amazon, 2018)
“Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach” (2025)
“MATH-VF: Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving” (2025)

Summary: How to Get Started Today

# 1. Install Z3
pip install z3-solver

# 2. Copy the PriceVerifier example above

# 3. Test with your LLM API

# 4. Extend to your domain (e.g., legal, medical, financial)

# 5. Iterate and improve

Key Insight: You don’t need Amazon’s full infrastructure. Start with:

Z3 for verification
LLM for natural language understanding
Python to glue them together

This gives you 80% of the value with 20% of the complexity!

Last Updated: October 31, 2025