Amazon’s Approach Breakdown & Replication Guide

Part 1: What Amazon Implemented

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                         User Query                          │
└───────────────────────────┬─────────────────────────────────┘
┌───────────────────────────┴─────────────────────────────────┐
│                   LLM (Claude, GPT, etc.)                   │
│                     Generates Response                      │
└───────────────────────────┬─────────────────────────────────┘
┌───────────────────────────┴─────────────────────────────────┐
│                 Automated Reasoning Checks                  │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ 1. Extract Logical Representation                    │   │
│  │ 2. Encode as Mathematical Formula (SMT)              │   │
│  │ 3. Check Against Policy Rules (Z3/SMT Solver)        │   │
│  │ 4. Return: Valid / Invalid / No Data                 │   │
│  └──────────────────────────────────────────────────────┘   │
└───────────────────────────┬─────────────────────────────────┘
                    ┌───────┴────────┐
                    │                │
                ✅ Valid        ❌ Invalid
                    │                │
             Return Response   Block/Regenerate
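
The gate at the bottom of the diagram amounts to a retry loop around generation and verification. A minimal sketch (all function names here are placeholders, not a real API):

```python
def guarded_query(prompt, llm, check, max_retries=2):
    """Run the LLM, verify the output, and retry or block on failure."""
    for _ in range(max_retries + 1):
        response = llm(prompt)       # generate a candidate answer
        verdict = check(response)    # "valid", "invalid", or "no_data"
        if verdict == "valid":
            return response          # ✅ pass the verified answer through
        if verdict == "no_data":
            return "I don't have enough information to answer reliably."
    return None                      # ❌ every candidate failed the check


# Toy usage with stub components standing in for a real LLM and checker
answer = guarded_query(
    "What is 2+2?",
    llm=lambda p: "4",
    check=lambda r: "valid" if r == "4" else "invalid",
)
print(answer)  # 4
```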

Part 2: Core Components

1. Zelkova (AWS Internal Tool)

Purpose: Policy analysis using automated reasoning

What it does:

  • Analyzes IAM policies, S3 policies, VPC security rules
  • Reduces policies to mathematical formulas
  • Uses SMT solvers (Z3, cvc5) for verification

Technical Approach:

Policy → Mathematical Formula → SMT Solver → Proof/Counterexample

Encoding Theories Used:

  • String theory (for text matching)
  • Regular expressions
  • Bit vectors (for permissions)
  • Integer comparisons
  • Set theory

2. Automated Reasoning Checks (Bedrock Guardrails)

Purpose: Prevent LLM hallucinations; AWS reports up to 99% verification accuracy for these checks

Technical Implementation:

Step 1: Create Automated Reasoning Policy

Source Documents (Knowledge Base)
  → Extract Variables & Logical Rules
  → Encode as Formal Logic (SMT Format)
  → Automated Reasoning Policy

Example Policy Structure:

; Example: Verify product pricing rules
(declare-const price Real)
(declare-const discount Real)
(declare-const final_price Real)
; Rules
(assert (>= price 0))
(assert (and (>= discount 0) (<= discount 100)))
(assert (= final_price (* price (- 1 (/ discount 100)))))
; Verification query
(check-sat)

Step 2: Integration Flow

1. User Query → LLM
2. LLM Response → Extract Claims
3. Claims → Convert to Logical Formulas
4. Formulas + Policy → SMT Solver (Z3)
5. Solver Output → Valid/Invalid/No Data
6. Result → Allow/Block/Request Clarification

Step 3: Validation Process

# Pseudocode for validation
def validate_llm_response(response, policy):
    # Extract claims from natural language
    claims = extract_claims(response)
    # Convert each claim to a logical formula and check it
    for claim in claims:
        formula = nl_to_logic(claim)  # Natural language → logic
        # Check against policy using SMT solver
        result = z3_solver.check(formula, policy)
        if result == "UNSAT":
            return {"valid": False, "claim": claim, "reason": "Contradicts policy"}
        elif result == "UNKNOWN":
            return {"valid": "uncertain", "claim": claim, "reason": "Insufficient data"}
    return {"valid": True}

3. Dafny (Verification-Aware Programming)

Purpose: Write provably correct code with mathematical guarantees

What it does:

  • Programming language with built-in verification
  • Automatically generates proofs of correctness
  • Used for AWS access management, storage, cryptography

Example:

method Max(a: int, b: int) returns (m: int)
  ensures m >= a && m >= b
  ensures m == a || m == b
{
  if a > b {
    return a;
  } else {
    return b;
  }
}

Dafny automatically proves that this method satisfies its postconditions.
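
In Python there is no static counterpart to Dafny's proof, but the postconditions can at least be checked at runtime. A contract-style sketch (runtime assertions check each call; they do not prove correctness for all inputs the way Dafny does):

```python
def max_checked(a: int, b: int) -> int:
    """Runtime analogue of the Dafny Max: assert the postconditions per call."""
    m = a if a > b else b
    assert m >= a and m >= b   # ensures m >= a && m >= b
    assert m == a or m == b    # ensures m == a || m == b
    return m


print(max_checked(3, 7))  # 7
```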


Part 3: How to Replicate It

Option 1: Minimal Python Implementation (Using Z3)

Tools Needed:

  • Z3 SMT Solver (open source, Microsoft)
  • Python 3.8+
  • LangChain or similar LLM framework

Installation:

pip install z3-solver langchain openai

Implementation:

from z3 import *
import openai
from typing import List, Dict, Any


class AutomatedReasoningChecker:
    def __init__(self, knowledge_base: Dict[str, Any]):
        """
        knowledge_base: Dictionary of facts and rules
        Example:
        {
            "capitals": {"France": "Paris", "UK": "London"},
            "pricing_rules": {"min_price": 0, "max_discount": 100}
        }
        """
        self.kb = knowledge_base
        self.solver = Solver()
        self._encode_knowledge_base()

    def _encode_knowledge_base(self):
        """Convert knowledge base to SMT constraints"""
        # Example: Encode pricing rules
        if "pricing_rules" in self.kb:
            price = Real('price')
            discount = Real('discount')
            self.solver.add(price >= self.kb["pricing_rules"]["min_price"])
            self.solver.add(discount >= 0)
            self.solver.add(discount <= self.kb["pricing_rules"]["max_discount"])

    def extract_claims(self, llm_response: str) -> List[Dict[str, Any]]:
        """
        Extract verifiable claims from the LLM response.
        In production, this would use NLP/LLM to extract structured claims.
        """
        # Simplified: Use another LLM call to extract claims
        extraction_prompt = f"""
        Extract factual claims from this text as JSON:
        Text: {llm_response}
        Format:
        {{"claims": [{{"type": "fact", "subject": "...", "predicate": "...", "object": "..."}}]}}
        """
        # Call an LLM to extract structured claims; _call_llm_for_extraction
        # is left to implement (in practice, use GPT-4 or Claude with
        # function calling)
        return self._call_llm_for_extraction(extraction_prompt)

    def verify_claim(self, claim: Dict[str, Any]) -> Dict[str, Any]:
        """Verify a single claim against the knowledge base"""
        claim_type = claim.get("type")
        if claim_type == "capital":
            # Example: "Paris is the capital of France"
            country = claim["subject"]
            capital = claim["object"]
            if country in self.kb.get("capitals", {}):
                expected = self.kb["capitals"][country]
                if capital == expected:
                    return {"valid": True, "proof": f"KB confirms {capital} is capital of {country}"}
                return {"valid": False, "reason": f"KB says capital is {expected}, not {capital}"}
            return {"valid": "unknown", "reason": "No data for this country"}
        elif claim_type == "pricing":
            # Example: "The final price is $80 after 20% discount on $100"
            original = Real('original_price')
            disc = Real('discount_pct')
            final = Real('final_price')
            # Add claim constraints in a fresh solver scope
            self.solver.push()
            self.solver.add(original == claim["original_price"])
            self.solver.add(disc == claim["discount_pct"])
            self.solver.add(final == claim["final_price"])
            # Add correctness constraint
            self.solver.add(final == original * (1 - disc / 100))
            # Check satisfiability, then discard the claim constraints
            result = self.solver.check()
            self.solver.pop()
            if result == sat:
                return {"valid": True, "proof": "Calculation verified by SMT solver"}
            return {"valid": False, "reason": "Math doesn't check out"}
        return {"valid": "unknown", "reason": "Unsupported claim type"}

    def validate_llm_response(self, llm_response: str) -> Dict[str, Any]:
        """Main validation function"""
        claims = self.extract_claims(llm_response)
        results = [
            {"claim": claim, "verification": self.verify_claim(claim)}
            for claim in claims
        ]
        # Aggregate: every claim must verify as strictly True
        # ("unknown" claims do not count as valid)
        all_valid = all(r["verification"]["valid"] is True for r in results)
        return {
            "overall_valid": all_valid,
            "claim_results": results
        }


# Usage Example
kb = {
    "capitals": {"France": "Paris", "UK": "London", "Spain": "Madrid"},
    "pricing_rules": {"min_price": 0, "max_discount": 100}
}
checker = AutomatedReasoningChecker(kb)

# LLM response to verify
llm_response = "The capital of France is Paris."

# Validate
result = checker.validate_llm_response(llm_response)
print(result)

Option 2: Using Existing Neurosymbolic Frameworks

Logic-LM (Research Implementation)

Combines LLMs with multiple symbolic solvers:

# Using the Logic-LM framework
# (API shown is illustrative; check the research repo for the exact interface)
from logic_lm import LogicLM

# Initialize with solvers
llm = LogicLM(
    llm_backend="gpt-4",
    solvers={
        "prolog": "pyke",
        "smt": "z3",
        "fol": "prover9"
    }
)

# Query with verification
response = llm.query(
    prompt="What is the capital of France?",
    verify=True,
    knowledge_base="geography_facts.pl"
)

print(response.answer)    # "Paris"
print(response.verified)  # True
print(response.proof)     # SMT proof or Prolog derivation

Imandra (Commercial Tool)

Neurosymbolic AI platform with LLM integration:

# Illustrative usage; see Imandra's documentation for the actual API
from imandra import verify

@verify
def calculate_discount(price, discount_pct):
    """
    Calculates final price after discount
    Requires:
      - price >= 0
      - 0 <= discount_pct <= 100
    Ensures:
      - result >= 0
      - result <= price
    """
    return price * (1 - discount_pct / 100)

# Imandra automatically generates formal proofs
# that this function satisfies its specification

Option 3: AWS Bedrock Integration (Production-Ready)

If you’re using AWS, integrate directly with Bedrock Guardrails:

import boto3
import json

# Control-plane client for creating policies and guardrails;
# runtime client for model invocation.
# (API and parameter names below are illustrative -- check the current
#  Bedrock Guardrails documentation for the exact request shapes.)
bedrock = boto3.client('bedrock')
bedrock_runtime = boto3.client('bedrock-runtime')

# Create Automated Reasoning policy
policy_response = bedrock.create_automated_reasoning_policy(
    name="product-pricing-policy",
    description="Validates product pricing calculations",
    sourceDocuments=[
        {
            "s3Uri": "s3://my-bucket/pricing-rules.pdf",
            "documentType": "PDF"
        }
    ]
)
policy_id = policy_response['policyId']

# Apply the policy to a guardrail
guardrail_response = bedrock.create_guardrail(
    name="my-llm-guardrail",
    automatedReasoningChecks=[
        {
            "policyId": policy_id,
            "action": "BLOCK"  # or "WARN"
        }
    ]
)

# Use with an LLM; the guardrail is referenced at invocation time
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    guardrailIdentifier=guardrail_response['guardrailId'],
    guardrailVersion="DRAFT",
    body=json.dumps({"prompt": "Calculate 20% discount on $100"})
)

# Validation results are returned alongside the model output
# (field name illustrative)
print(response['automatedReasoningResults'])

Part 4: Technical Deep Dive

How Claims Are Extracted from Natural Language

Pipeline:

Natural Language → Parse → Extract Entities → Build Logical Form → Encode as SMT

Example:

Input: “After applying a 20% discount on a $100 product, the final price is $80.”

Step 1: Parse & Extract

{
"entities": {
"original_price": 100,
"discount_percentage": 20,
"final_price": 80
},
"relations": [
{"type": "applies_to", "source": "discount", "target": "original_price"},
{"type": "results_in", "source": "discount", "target": "final_price"}
]
}

Step 2: Convert to Logical Form

final_price = original_price × (1 - discount_percentage / 100)

Step 3: Encode as SMT

(declare-const original_price Real)
(declare-const discount_percentage Real)
(declare-const final_price Real)
(assert (= original_price 100.0))
(assert (= discount_percentage 20.0))
(assert (= final_price 80.0))
; Verification constraint
(assert (= final_price (* original_price (- 1.0 (/ discount_percentage 100.0)))))
(check-sat) ; Returns: sat (valid) or unsat (invalid)

Step 4: Run Z3

from z3 import *

original = Real('original_price')
discount = Real('discount_pct')
final = Real('final_price')

solver = Solver()
solver.add(original == 100)
solver.add(discount == 20)
solver.add(final == 80)
solver.add(final == original * (1 - discount / 100))

if solver.check() == sat:
    print("✅ Claim is mathematically valid")
    print(solver.model())
else:
    print("❌ Claim is mathematically invalid")

SMT Encoding Patterns

Pattern 1: Factual Claims (Lookup)

# Knowledge base as a plain lookup table (no solver needed for simple facts)
capitals = {
    "France": "Paris",
    "UK": "London"
}

def verify_capital(country, claimed_capital):
    if country in capitals:
        return capitals[country] == claimed_capital
    return "UNKNOWN"

Pattern 2: Numerical Calculations

def verify_math(expression, claimed_result):
    """
    Sketch: extract_variables and eval_expression are placeholders
    for an expression parser/encoder you would supply.
    """
    solver = Solver()
    # Parse the expression and encode each variable's constraints
    vars = extract_variables(expression)
    for var in vars:
        solver.add(var.constraint)
    result = Real('result')
    solver.add(result == claimed_result)
    solver.add(result == eval_expression(expression))
    return solver.check() == sat

Pattern 3: Logical Consistency

def verify_logical_consistency(statements):
    """
    Check whether a set of statements is logically consistent.
    encode_statement is a placeholder mapping each statement to a Z3 formula.
    """
    solver = Solver()
    for stmt in statements:
        solver.add(encode_statement(stmt))
    # If SAT, the statements are consistent;
    # if UNSAT, there's a contradiction
    return solver.check() == sat

Part 5: Open Source Tools Stack

Required Tools

| Tool   | Purpose                         | Installation            |
|--------|---------------------------------|-------------------------|
| Z3     | SMT solver (core verification)  | `pip install z3-solver` |
| Dafny  | Verification-aware programming  | Download from GitHub    |
| Coq    | Theorem prover                  | `opam install coq`      |
| Lean 4 | Modern proof assistant          | Download from lean-lang.org |
| cvc5   | Alternative SMT solver          | Build from source       |
Pipeline Stack:

  • LLM Layer: GPT-4 / Claude / Llama
  • Extraction: LangChain + Function Calling
  • Logic Encoding: Python + Z3/cvc5
  • Verification: SMT Solver (Z3)
  • Result: Valid / Invalid / Unknown

Part 6: Complete Working Example

Scenario: E-commerce Price Verification

Goal: Verify LLM-generated product pricing is mathematically correct

import json
from z3 import *
from openai import OpenAI


class PriceVerifier:
    def __init__(self):
        self.llm = OpenAI()

    def ask_llm(self, question):
        """Get LLM response"""
        response = self.llm.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": question}]
        )
        return response.choices[0].message.content

    def extract_pricing_claim(self, llm_response):
        """Extract structured pricing data from the LLM response"""
        extraction_prompt = f"""
        Extract pricing information as JSON:
        Text: {llm_response}
        Return JSON with these fields:
        {{
            "original_price": <number>,
            "discount_percentage": <number>,
            "final_price": <number>
        }}
        """
        result = self.llm.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": extraction_prompt}],
            response_format={"type": "json_object"}
        )
        return json.loads(result.choices[0].message.content)

    def verify_pricing(self, claim):
        """Verify the pricing calculation using Z3"""
        original = Real('original_price')
        discount = Real('discount_pct')
        final = Real('final_price')
        solver = Solver()
        # Add claim values
        solver.add(original == claim['original_price'])
        solver.add(discount == claim['discount_percentage'])
        solver.add(final == claim['final_price'])
        # Add correctness constraint
        solver.add(final == original * (1 - discount / 100))
        # Add business rules
        solver.add(original >= 0)
        solver.add(discount >= 0)
        solver.add(discount <= 100)
        solver.add(final >= 0)
        if solver.check() == sat:
            return {
                "valid": True,
                "proof": "Mathematical verification successful",
                "model": solver.model()
            }
        return {
            "valid": False,
            "reason": "Calculation violates pricing rules or math is incorrect"
        }

    def verified_query(self, question):
        """Ask the LLM and verify its response"""
        llm_answer = self.ask_llm(question)
        claim = self.extract_pricing_claim(llm_answer)
        verification = self.verify_pricing(claim)
        return {
            "question": question,
            "llm_response": llm_answer,
            "extracted_claim": claim,
            "verification": verification,
            "final_answer": llm_answer if verification["valid"] else "BLOCKED: Failed verification"
        }


# Usage
verifier = PriceVerifier()
result = verifier.verified_query(
    "A product costs $100. After a 20% discount, what is the final price?"
)
print("LLM Answer:", result["llm_response"])
print("Verified:", result["verification"]["valid"])
print("Final Answer:", result["final_answer"])

Output:

LLM Answer: After applying a 20% discount on a $100 product, the final price is $80.
Verified: True
Final Answer: After applying a 20% discount on a $100 product, the final price is $80.

If LLM made a mistake:

LLM Answer: After applying a 20% discount on a $100 product, the final price is $75.
Verified: False
Final Answer: BLOCKED: Failed verification

Part 7: Scaling to Production

Challenges

  1. Latency: SMT solving can be slow for complex formulas

    • Solution: Cache common queries, use incremental solving
  2. Natural Language → Logic Gap: Hard to extract perfect logical forms

    • Solution: Use LLMs for extraction, validate with simpler claims first
  3. Knowledge Base Maintenance: Keeping KB up to date

    • Solution: Auto-extract rules from documentation, version control policies

Best Practices

  1. Start Simple: Begin with numeric/factual verification
  2. Iterative Refinement: Build up complexity gradually
  3. Hybrid Approach: Use probabilistic + symbolic together
  4. Human in Loop: Flag uncertain cases for human review

Part 8: Resources & Next Steps

Learning Path

  1. Week 1: Learn Z3 basics

  2. Week 2: Build simple fact checker

    • Verify arithmetic claims from LLM
  3. Week 3: Add natural language extraction

    • Use LLM to extract structured claims
  4. Week 4: Integrate with production LLM

    • Build API wrapper with verification layer

Papers

  1. “Semantic-based Automated Reasoning for AWS Access Policies using SMT” (Amazon, 2018)
  2. “Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach” (2025)
  3. “MATH-VF: Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving” (2025)

Summary: How to Get Started Today

# 1. Install Z3
pip install z3-solver
# 2. Copy the PriceVerifier example above
# 3. Test with your LLM API
# 4. Extend to your domain (e.g., legal, medical, financial)
# 5. Iterate and improve

Key Insight: You don’t need Amazon’s full infrastructure. Start with:

  • Z3 for verification
  • LLM for natural language understanding
  • Python to glue them together

This gives you 80% of the value with 20% of the complexity!


Last Updated: October 31, 2025