README

Research and implementation of Amazon’s Automated Reasoning approach for verifying LLM outputs using formal methods and SMT solvers.

=� Project Structure

automated-reasoning/
�� amazon-automated-reasoning.md              # Amazon AR research & overview
�� automated-reasoning-implementation-guide.md # Complete implementation guide
�� restaurant-riddle-smt-solution.py          # Working SMT solver example
�� pyproject.toml                             # uv project configuration
�� README.md                                  # This file

=� Quick Start

Setup

# Install dependencies with uv
uv sync

# Or manually install Z3
uv add z3-solver

Run the Restaurant Riddle Solver

uv run python restaurant-riddle-smt-solution.py

Expected Output:

SOLUTION:
Alice      � Quinoa Bowl
Bob        � Veggie Burger
Carol      � Veggie Burger
David      � Chicken Caesar Salad
Eve        � Grilled Salmon

 SMT SOLVER FOUND PROVABLY CORRECT SOLUTION!

=� Documentation

1. Amazon Automated Reasoning Overview

See: amazon-automated-reasoning.md

Key Topics:

What is Automated Reasoning?
Amazon’s implementation (Zelkova, Bedrock Guardrails, Dafny)
2025 breakthrough: 99% verification accuracy for LLM outputs
Real-world applications (AWS IAM, S3, VPC, EU AI Act compliance)

2. Implementation Guide

See: automated-reasoning-implementation-guide.md

Includes:

Complete breakdown of Amazon’s technical approach
How to replicate with open-source tools (Z3, Python)
Working code examples (PriceVerifier, restaurant riddle)
Three implementation options (DIY, AWS Bedrock, Imandra)
Production deployment best practices

3. Working Example: Restaurant Riddle

See: restaurant-riddle-smt-solution.py

Demonstrates:

Constraint satisfaction with SMT
Why LLMs fail at logical consistency
How automated reasoning prevents errors
Z3 solver in action

>� Core Concept

Traditional LLM Approach (Probabilistic)

User Query � LLM � Response (maybe correct, maybe hallucination)

Automated Reasoning Approach (Provable)

User Query � LLM � Response
                      �
             SMT Verification (Z3)
                      �
            Valid / Invalid / Unknown
                      �
       Return or Block with Proof

=, The Restaurant Riddle Example

Problem: 5 people, 5 dishes, 5 complex constraints

LLM Result (Qwen3-0.6B):

L Made logical error (inconsistent reasoning)
Said: “Carol: Quinoa Bowl (same as Bob’s choice)”
But Bob chose: Veggie Burger
Logical contradiction!

SMT Solver Result:

Provably correct solution
Cannot make logical errors
Mathematical proof of correctness

=� Key Insights

Why SMT > LLMs for Constraint Problems

Coverage: Verifies ALL possible behaviors, not sample cases
Accuracy: Mathematical proof, not probability
Consistency: Cannot produce contradictions
Explainability: Provides proof or counterexample

When to Use SMT

Good for:

Constraint satisfaction problems
Mathematical calculations
Policy verification
Logical consistency checking
Security properties

L Not ideal for:

Creative writing
Subjective opinions
Open-ended generation
Ambiguous natural language

=� Tech Stack

Python 3.11+: Core language
uv: Fast Python package manager
Z3: Microsoft’s SMT solver (open source)
SMT-LIB: Standard language for SMT solvers

=� Learning Path

Start here: Run restaurant-riddle-smt-solution.py
Understand basics: Read amazon-automated-reasoning.md
Build your own: Follow automated-reasoning-implementation-guide.md
Extend: Adapt to your domain (finance, legal, medical)

= Resources

Papers

“Semantic-based Automated Reasoning for AWS Access Policies using SMT” (Amazon, 2018)
“Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach” (2025)

Tools

Z3: https://github.com/Z3Prover/z3
Dafny: https://github.com/dafny-lang/dafny
AWS Bedrock Guardrails: https://aws.amazon.com/bedrock/guardrails/

Tutorials

Z3 Python Tutorial: https://ericpony.github.io/z3py-tutorial/guide-examples.htm
SMT-LIB Standard: https://smtlib.cs.uiowa.edu/

<� Next Steps

Modify restaurant-riddle-smt-solution.py to add new constraints
Build a price calculator with SMT verification
Create an LLM + SMT hybrid system
Deploy to production with AWS Bedrock

=� Comparison

Approach	Accuracy	Speed	Coverage	Explainable
LLM Only	~70-90%	Fast	Samples	L
LLM + SMT	99%	Medium	All cases
SMT Only	100%	Slow	All cases

> Contributing

This is a research project. Feel free to:

Add new examples
Improve documentation
Share findings

Last Updated: October 31, 2025 Research Date: October 31, 2025