automated-reasoning-implementation-guide
Amazon’s Approach Breakdown & Replication Guide
Table of Contents
- Part 1: What Amazon Implemented
- Part 2: Core Components
- Part 3: How to Replicate It
- Part 4: Technical Deep Dive
- Part 5: Open Source Tools Stack
- Part 6: Complete Working Example
- Part 7: Scaling to Production
- Part 8: Resources & Next Steps
- Summary: How to Get Started Today
Part 1: What Amazon Implemented
System Architecture
┌─────────────────────────────────────────────────────────────┐│ User Query │└───────────────────────────┬─────────────────────────────────┘ │ ▼┌─────────────────────────────────────────────────────────────┐│ LLM (Claude, GPT, etc.) ││ Generates Response │└───────────────────────────┬─────────────────────────────────┘ │ ▼┌─────────────────────────────────────────────────────────────┐│ Automated Reasoning Checks ││ ┌──────────────────────────────────────────────────────┐ ││ │ 1. Extract Logical Representation │ ││ │ 2. Encode as Mathematical Formula (SMT) │ ││ │ 3. Check Against Policy Rules (Z3/SMT Solver) │ ││ │ 4. Return: Valid / Invalid / No Data │ ││ └──────────────────────────────────────────────────────┘ │└───────────────────────────┬─────────────────────────────────┘ │ ▼ ┌───────┴────────┐ │ │ ✅ Valid ❌ Invalid │ │ Return Response Block/RegeneratePart 2: Core Components
1. Zelkova (AWS Internal Tool)
Purpose: Policy analysis using automated reasoning
What it does:
- Analyzes IAM policies, S3 policies, VPC security rules
- Reduces policies to mathematical formulas
- Uses SMT solvers (Z3, cvc5) for verification
Technical Approach:
Policy → Mathematical Formula → SMT Solver → Proof/CounterexampleEncoding Theories Used:
- String theory (for text matching)
- Regular expressions
- Bit vectors (for permissions)
- Integer comparisons
- Set theory
2. Automated Reasoning Checks (Bedrock Guardrails)
Purpose: Prevent LLM hallucinations with 99% verification accuracy
Technical Implementation:
Step 1: Create Automated Reasoning Policy
Source Documents (Knowledge Base) ↓Extract Variables & Logical Rules ↓Encode as Formal Logic (SMT Format) ↓Automated Reasoning PolicyExample Policy Structure:
; Example: Verify product pricing rules(declare-const price Real)(declare-const discount Real)(declare-const final_price Real)
; Rules(assert (>= price 0))(assert (and (>= discount 0) (<= discount 100)))(assert (= final_price (* price (- 1 (/ discount 100)))))
; Verification query(check-sat)Step 2: Integration Flow
1. User Query → LLM2. LLM Response → Extract Claims3. Claims → Convert to Logical Formulas4. Formulas + Policy → SMT Solver (Z3)5. Solver Output → Valid/Invalid/No Data6. Result → Allow/Block/Request ClarificationStep 3: Validation Process
# Pseudocode for validationdef validate_llm_response(response, policy): # Extract claims from natural language claims = extract_claims(response)
# Convert to logical formulas formulas = [] for claim in claims: formula = nl_to_logic(claim) # Natural language → Logic formulas.append(formula)
# Check against policy using SMT solver for formula in formulas: result = z3_solver.check(formula, policy) if result == "UNSAT": return {"valid": False, "claim": claim, "reason": "Contradicts policy"} elif result == "UNKNOWN": return {"valid": "uncertain", "claim": claim, "reason": "Insufficient data"}
return {"valid": True}3. Dafny (Verification-Aware Programming)
Purpose: Write provably correct code with mathematical guarantees
What it does:
- Programming language with built-in verification
- Automatically generates proofs of correctness
- Used for AWS access management, storage, cryptography
Example:
method Max(a: int, b: int) returns (m: int) ensures m >= a && m >= b ensures m == a || m == b{ if a > b { return a; } else { return b; }}Dafny automatically proves that this method satisfies its postconditions.
Part 3: How to Replicate It
Option 1: Minimal Python Implementation (Using Z3)
Tools Needed:
- Z3 SMT Solver (open source, Microsoft)
- Python 3.8+
- LangChain or similar LLM framework
Installation:
pip install z3-solver langchain openaiImplementation:
from z3 import *import openaifrom typing import List, Dict, Any
class AutomatedReasoningChecker: def __init__(self, knowledge_base: Dict[str, Any]): """ knowledge_base: Dictionary of facts and rules Example: { "capitals": {"France": "Paris", "UK": "London"}, "pricing_rules": {"min_price": 0, "max_discount": 100} } """ self.kb = knowledge_base self.solver = Solver() self._encode_knowledge_base()
def _encode_knowledge_base(self): """Convert knowledge base to SMT constraints""" # Example: Encode pricing rules if "pricing_rules" in self.kb: price = Real('price') discount = Real('discount')
self.solver.add(price >= self.kb["pricing_rules"]["min_price"]) self.solver.add(discount >= 0) self.solver.add(discount <= self.kb["pricing_rules"]["max_discount"])
def extract_claim(self, llm_response: str) -> List[str]: """ Extract verifiable claims from LLM response In production, this would use NLP/LLM to extract structured claims """ # Simplified: Use another LLM call to extract claims extraction_prompt = f""" Extract factual claims from this text as JSON: Text: {llm_response}
Format: {{"claims": [{{"type": "fact", "subject": "...", "predicate": "...", "object": "..."}}]}} """
# Call LLM to extract structured claims # (In practice, use GPT-4 or Claude with function calling) return self._call_llm_for_extraction(extraction_prompt)
def verify_claim(self, claim: Dict[str, str]) -> Dict[str, Any]: """ Verify a single claim against knowledge base """ claim_type = claim.get("type")
if claim_type == "capital": # Example: "Paris is the capital of France" country = claim["subject"] capital = claim["object"]
if country in self.kb.get("capitals", {}): expected = self.kb["capitals"][country] if capital == expected: return {"valid": True, "proof": f"KB confirms {capital} is capital of {country}"} else: return {"valid": False, "reason": f"KB says capital is {expected}, not {capital}"} else: return {"valid": "unknown", "reason": "No data for this country"}
elif claim_type == "pricing": # Example: "The final price is $80 after 20% discount on $100" original = Real('original_price') disc = Real('discount_pct') final = Real('final_price')
# Add claim constraints self.solver.push() self.solver.add(original == claim["original_price"]) self.solver.add(disc == claim["discount_pct"]) self.solver.add(final == claim["final_price"])
# Add correctness constraint self.solver.add(final == original * (1 - disc / 100))
# Check satisfiability result = self.solver.check() self.solver.pop()
if result == sat: return {"valid": True, "proof": "Calculation verified by SMT solver"} else: return {"valid": False, "reason": "Math doesn't check out"}
return {"valid": "unknown", "reason": "Unsupported claim type"}
def validate_llm_response(self, llm_response: str) -> Dict[str, Any]: """ Main validation function """ claims = self.extract_claim(llm_response) results = []
for claim in claims: verification = self.verify_claim(claim) results.append({ "claim": claim, "verification": verification })
# Aggregate results all_valid = all(r["verification"]["valid"] == True for r in results)
return { "overall_valid": all_valid, "claim_results": results }
# Usage Examplekb = { "capitals": {"France": "Paris", "UK": "London", "Spain": "Madrid"}, "pricing_rules": {"min_price": 0, "max_discount": 100}}
checker = AutomatedReasoningChecker(kb)
# LLM response to verifyllm_response = "The capital of France is Paris."
# Validateresult = checker.validate_llm_response(llm_response)print(result)Option 2: Using Existing Neurosymbolic Frameworks
Logic-LM (Research Implementation)
Combines LLMs with multiple symbolic solvers:
# Using Logic-LM frameworkfrom logic_lm import LogicLM
# Initialize with solversllm = LogicLM( llm_backend="gpt-4", solvers={ "prolog": "pyke", "smt": "z3", "fol": "prover9" })
# Query with verificationresponse = llm.query( prompt="What is the capital of France?", verify=True, knowledge_base="geography_facts.pl")
print(response.answer) # "Paris"print(response.verified) # Trueprint(response.proof) # SMT proof or Prolog derivationImandra (Commercial Tool)
Neurosymbolic AI platform with LLM integration:
from imandra import verify
@verifydef calculate_discount(price, discount_pct): """ Calculates final price after discount
Requires: - price >= 0 - 0 <= discount_pct <= 100
Ensures: - result >= 0 - result <= price """ return price * (1 - discount_pct / 100)
# Imandra automatically generates formal proofs# that this function satisfies its specificationOption 3: AWS Bedrock Integration (Production-Ready)
If you’re using AWS, integrate directly with Bedrock Guardrails:
import boto3
bedrock = boto3.client('bedrock-runtime')
# Create Automated Reasoning policypolicy_response = bedrock.create_automated_reasoning_policy( name="product-pricing-policy", description="Validates product pricing calculations", sourceDocuments=[ { "s3Uri": "s3://my-bucket/pricing-rules.pdf", "documentType": "PDF" } ])
policy_id = policy_response['policyId']
# Apply policy to guardrailguardrail_response = bedrock.create_guardrail( name="my-llm-guardrail", automatedReasoningChecks=[ { "policyId": policy_id, "action": "BLOCK" # or "WARN" } ])
# Use with LLMresponse = bedrock.invoke_model( modelId="anthropic.claude-v3", body={ "prompt": "Calculate 20% discount on $100", "guardrailIdentifier": guardrail_response['guardrailId'] })
# Response includes validation resultsprint(response['automatedReasoningResults'])Part 4: Technical Deep Dive
How Claims Are Extracted from Natural Language
Pipeline:
Natural Language → Parse → Extract Entities → Build Logical Form → Encode as SMTExample:
Input: “After applying a 20% discount on a
Step 1: Parse & Extract
{ "entities": { "original_price": 100, "discount_percentage": 20, "final_price": 80 }, "relations": [ {"type": "applies_to", "source": "discount", "target": "original_price"}, {"type": "results_in", "source": "discount", "target": "final_price"} ]}Step 2: Convert to Logical Form
final_price = original_price × (1 - discount_percentage / 100)Step 3: Encode as SMT
(declare-const original_price Real)(declare-const discount_percentage Real)(declare-const final_price Real)
(assert (= original_price 100.0))(assert (= discount_percentage 20.0))(assert (= final_price 80.0))
; Verification constraint(assert (= final_price (* original_price (- 1.0 (/ discount_percentage 100.0)))))
(check-sat) ; Returns: sat (valid) or unsat (invalid)Step 4: Run Z3
from z3 import *
original = Real('original_price')discount = Real('discount_pct')final = Real('final_price')
solver = Solver()solver.add(original == 100)solver.add(discount == 20)solver.add(final == 80)solver.add(final == original * (1 - discount / 100))
if solver.check() == sat: print("✅ Claim is mathematically valid") print(solver.model())else: print("❌ Claim is mathematically invalid")SMT Encoding Patterns
Pattern 1: Factual Claims (Lookup)
# Knowledge base as Z3 constraintscapitals = { "France": "Paris", "UK": "London"}
def verify_capital(country, claimed_capital): if country in capitals: return capitals[country] == claimed_capital else: return "UNKNOWN"Pattern 2: Numerical Calculations
def verify_math(expression, claimed_result): solver = Solver()
# Parse expression and encode vars = extract_variables(expression) for var in vars: solver.add(var.constraint)
result = Real('result') solver.add(result == claimed_result) solver.add(result == eval_expression(expression))
return solver.check() == satPattern 3: Logical Consistency
def verify_logical_consistency(statements): """ Check if a set of statements is logically consistent """ solver = Solver()
for stmt in statements: solver.add(encode_statement(stmt))
# If SAT, statements are consistent # If UNSAT, there's a contradiction return solver.check() == satPart 5: Open Source Tools Stack
Required Tools
| Tool | Purpose | Installation |
|---|---|---|
| Z3 | SMT solver (core verification) | pip install z3-solver |
| Dafny | Verification-aware programming | Download from GitHub |
| Coq | Theorem prover | opam install coq |
| Lean 4 | Modern proof assistant | Download from lean-lang.org |
| cvc5 | Alternative SMT solver | Build from source |
Recommended Stack for LLM Verification
LLM Layer: GPT-4 / Claude / Llama ↓Extraction: LangChain + Function Calling ↓Logic Encoding: Python + Z3/cvc5 ↓Verification: SMT Solver (Z3) ↓Result: Valid / Invalid / UnknownPart 6: Complete Working Example
Scenario: E-commerce Price Verification
Goal: Verify LLM-generated product pricing is mathematically correct
from z3 import *from openai import OpenAI
class PriceVerifier: def __init__(self): self.solver = Solver() self.llm = OpenAI()
def ask_llm(self, question): """Get LLM response""" response = self.llm.chat.completions.create( model="gpt-4", messages=[{"role": "user", "content": question}] ) return response.choices[0].message.content
def extract_pricing_claim(self, llm_response): """Extract structured pricing data from LLM response""" extraction_prompt = f""" Extract pricing information as JSON:
Text: {llm_response}
Return JSON with these fields: {{ "original_price": <number>, "discount_percentage": <number>, "final_price": <number> }} """
result = self.llm.chat.completions.create( model="gpt-4", messages=[{"role": "user", "content": extraction_prompt}], response_format={"type": "json_object"} )
import json return json.loads(result.choices[0].message.content)
def verify_pricing(self, claim): """Verify pricing calculation using Z3""" original = Real('original_price') discount = Real('discount_pct') final = Real('final_price')
solver = Solver()
# Add claim values solver.add(original == claim['original_price']) solver.add(discount == claim['discount_percentage']) solver.add(final == claim['final_price'])
# Add correctness constraint solver.add(final == original * (1 - discount / 100))
# Add business rules solver.add(original >= 0) solver.add(discount >= 0) solver.add(discount <= 100) solver.add(final >= 0)
if solver.check() == sat: return { "valid": True, "proof": "Mathematical verification successful", "model": solver.model() } else: return { "valid": False, "reason": "Calculation violates pricing rules or math is incorrect" }
def verified_query(self, question): """Ask LLM and verify the response""" # Get LLM response llm_answer = self.ask_llm(question)
# Extract claim claim = self.extract_pricing_claim(llm_answer)
# Verify verification = self.verify_pricing(claim)
return { "question": question, "llm_response": llm_answer, "extracted_claim": claim, "verification": verification, "final_answer": llm_answer if verification["valid"] else "BLOCKED: Failed verification" }
# Usageverifier = PriceVerifier()
result = verifier.verified_query( "A product costs $100. After a 20% discount, what is the final price?")
print("LLM Answer:", result["llm_response"])print("Verified:", result["verification"]["valid"])print("Final Answer:", result["final_answer"])Output:
LLM Answer: After applying a 20% discount on a $100 product, the final price is $80.Verified: TrueFinal Answer: After applying a 20% discount on a $100 product, the final price is $80.If LLM made a mistake:
LLM Answer: After applying a 20% discount on a $100 product, the final price is $75.Verified: FalseFinal Answer: BLOCKED: Failed verificationPart 7: Scaling to Production
Challenges
-
Latency: SMT solving can be slow for complex formulas
- Solution: Cache common queries, use incremental solving
-
Natural Language → Logic Gap: Hard to extract perfect logical forms
- Solution: Use LLMs for extraction, validate with simpler claims first
-
Knowledge Base Maintenance: Keeping KB up to date
- Solution: Auto-extract rules from documentation, version control policies
Best Practices
- Start Simple: Begin with numeric/factual verification
- Iterative Refinement: Build up complexity gradually
- Hybrid Approach: Use probabilistic + symbolic together
- Human in Loop: Flag uncertain cases for human review
Part 8: Resources & Next Steps
Learning Path
-
Week 1: Learn Z3 basics
-
Week 2: Build simple fact checker
- Verify arithmetic claims from LLM
-
Week 3: Add natural language extraction
- Use LLM to extract structured claims
-
Week 4: Integrate with production LLM
- Build API wrapper with verification layer
Code Repositories
- Z3 Python: https://github.com/Z3Prover/z3
- Logic-LM: https://github.com/teacherpeterpan/Logic-LLM
- AWS Dafny Tools: https://github.com/awslabs/aws-verification-model-for-libcrypto
- Neurosymbolic Examples: https://github.com/LAMDASZ-ML/Awesome-Neuro-Symbolic-Learning-with-LLM
Papers
- “Semantic-based Automated Reasoning for AWS Access Policies using SMT” (Amazon, 2018)
- “Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach” (2025)
- “MATH-VF: Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving” (2025)
Summary: How to Get Started Today
# 1. Install Z3pip install z3-solver
# 2. Copy the PriceVerifier example above
# 3. Test with your LLM API
# 4. Extend to your domain (e.g., legal, medical, financial)
# 5. Iterate and improveKey Insight: You don’t need Amazon’s full infrastructure. Start with:
- Z3 for verification
- LLM for natural language understanding
- Python to glue them together
This gives you 80% of the value with 20% of the complexity!
Last Updated: October 31, 2025