README
Research and implementation of Amazon’s Automated Reasoning approach for verifying LLM outputs using formal methods and SMT solvers.
=� Project Structure
automated-reasoning/�� amazon-automated-reasoning.md # Amazon AR research & overview�� automated-reasoning-implementation-guide.md # Complete implementation guide�� restaurant-riddle-smt-solution.py # Working SMT solver example�� pyproject.toml # uv project configuration�� README.md # This file=� Quick Start
Setup
# Install dependencies with uvuv sync
# Or manually install Z3uv add z3-solverRun the Restaurant Riddle Solver
uv run python restaurant-riddle-smt-solution.pyExpected Output:
SOLUTION:Alice � Quinoa BowlBob � Veggie BurgerCarol � Veggie BurgerDavid � Chicken Caesar SaladEve � Grilled Salmon
SMT SOLVER FOUND PROVABLY CORRECT SOLUTION!=� Documentation
1. Amazon Automated Reasoning Overview
See: amazon-automated-reasoning.md
Key Topics:
- What is Automated Reasoning?
- Amazon’s implementation (Zelkova, Bedrock Guardrails, Dafny)
- 2025 breakthrough: 99% verification accuracy for LLM outputs
- Real-world applications (AWS IAM, S3, VPC, EU AI Act compliance)
2. Implementation Guide
See: automated-reasoning-implementation-guide.md
Includes:
- Complete breakdown of Amazon’s technical approach
- How to replicate with open-source tools (Z3, Python)
- Working code examples (PriceVerifier, restaurant riddle)
- Three implementation options (DIY, AWS Bedrock, Imandra)
- Production deployment best practices
3. Working Example: Restaurant Riddle
See: restaurant-riddle-smt-solution.py
Demonstrates:
- Constraint satisfaction with SMT
- Why LLMs fail at logical consistency
- How automated reasoning prevents errors
- Z3 solver in action
>� Core Concept
Traditional LLM Approach (Probabilistic)
User Query � LLM � Response (maybe correct, maybe hallucination)Automated Reasoning Approach (Provable)
User Query � LLM � Response � SMT Verification (Z3) � Valid / Invalid / Unknown � Return or Block with Proof=, The Restaurant Riddle Example
Problem: 5 people, 5 dishes, 5 complex constraints
LLM Result (Qwen3-0.6B):
- L Made logical error (inconsistent reasoning)
- Said: “Carol: Quinoa Bowl (same as Bob’s choice)”
- But Bob chose: Veggie Burger
- Logical contradiction!
SMT Solver Result:
- Provably correct solution
- Cannot make logical errors
- Mathematical proof of correctness
=� Key Insights
Why SMT > LLMs for Constraint Problems
- Coverage: Verifies ALL possible behaviors, not sample cases
- Accuracy: Mathematical proof, not probability
- Consistency: Cannot produce contradictions
- Explainability: Provides proof or counterexample
When to Use SMT
Good for:
- Constraint satisfaction problems
- Mathematical calculations
- Policy verification
- Logical consistency checking
- Security properties
L Not ideal for:
- Creative writing
- Subjective opinions
- Open-ended generation
- Ambiguous natural language
=� Tech Stack
- Python 3.11+: Core language
- uv: Fast Python package manager
- Z3: Microsoft’s SMT solver (open source)
- SMT-LIB: Standard language for SMT solvers
=� Learning Path
- Start here: Run
restaurant-riddle-smt-solution.py - Understand basics: Read
amazon-automated-reasoning.md - Build your own: Follow
automated-reasoning-implementation-guide.md - Extend: Adapt to your domain (finance, legal, medical)
= Resources
Papers
- “Semantic-based Automated Reasoning for AWS Access Policies using SMT” (Amazon, 2018)
- “Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach” (2025)
Tools
- Z3: https://github.com/Z3Prover/z3
- Dafny: https://github.com/dafny-lang/dafny
- AWS Bedrock Guardrails: https://aws.amazon.com/bedrock/guardrails/
Tutorials
- Z3 Python Tutorial: https://ericpony.github.io/z3py-tutorial/guide-examples.htm
- SMT-LIB Standard: https://smtlib.cs.uiowa.edu/
<� Next Steps
- Modify
restaurant-riddle-smt-solution.pyto add new constraints - Build a price calculator with SMT verification
- Create an LLM + SMT hybrid system
- Deploy to production with AWS Bedrock
=� Comparison
| Approach | Accuracy | Speed | Coverage | Explainable |
|---|---|---|---|---|
| LLM Only | ~70-90% | Fast | Samples | L |
| LLM + SMT | 99% | Medium | All cases | |
| SMT Only | 100% | Slow | All cases |
> Contributing
This is a research project. Feel free to:
- Add new examples
- Improve documentation
- Share findings
Last Updated: October 31, 2025 Research Date: October 31, 2025