Research and implementation of Amazon’s Automated Reasoning approach for verifying LLM outputs using formal methods and SMT solvers.

=� Project Structure

automated-reasoning/
�� amazon-automated-reasoning.md # Amazon AR research & overview
�� automated-reasoning-implementation-guide.md # Complete implementation guide
�� restaurant-riddle-smt-solution.py # Working SMT solver example
�� pyproject.toml # uv project configuration
�� README.md # This file

=� Quick Start

Setup

Terminal window
# Install dependencies with uv
uv sync
# Or manually install Z3
uv add z3-solver

Run the Restaurant Riddle Solver

Terminal window
uv run python restaurant-riddle-smt-solution.py

Expected Output:

SOLUTION:
Alice � Quinoa Bowl
Bob � Veggie Burger
Carol � Veggie Burger
David � Chicken Caesar Salad
Eve � Grilled Salmon
 SMT SOLVER FOUND PROVABLY CORRECT SOLUTION!

=� Documentation

1. Amazon Automated Reasoning Overview

See: amazon-automated-reasoning.md

Key Topics:

  • What is Automated Reasoning?
  • Amazon’s implementation (Zelkova, Bedrock Guardrails, Dafny)
  • 2025 breakthrough: 99% verification accuracy for LLM outputs
  • Real-world applications (AWS IAM, S3, VPC, EU AI Act compliance)

2. Implementation Guide

See: automated-reasoning-implementation-guide.md

Includes:

  • Complete breakdown of Amazon’s technical approach
  • How to replicate with open-source tools (Z3, Python)
  • Working code examples (PriceVerifier, restaurant riddle)
  • Three implementation options (DIY, AWS Bedrock, Imandra)
  • Production deployment best practices

3. Working Example: Restaurant Riddle

See: restaurant-riddle-smt-solution.py

Demonstrates:

  • Constraint satisfaction with SMT
  • Why LLMs fail at logical consistency
  • How automated reasoning prevents errors
  • Z3 solver in action

>� Core Concept

Traditional LLM Approach (Probabilistic)

User Query � LLM � Response (maybe correct, maybe hallucination)

Automated Reasoning Approach (Provable)

User Query � LLM � Response
SMT Verification (Z3)
Valid / Invalid / Unknown
Return or Block with Proof

=, The Restaurant Riddle Example

Problem: 5 people, 5 dishes, 5 complex constraints

LLM Result (Qwen3-0.6B):

  • L Made logical error (inconsistent reasoning)
  • Said: “Carol: Quinoa Bowl (same as Bob’s choice)”
  • But Bob chose: Veggie Burger
  • Logical contradiction!

SMT Solver Result:

  •  Provably correct solution
  • Cannot make logical errors
  • Mathematical proof of correctness

=� Key Insights

Why SMT > LLMs for Constraint Problems

  1. Coverage: Verifies ALL possible behaviors, not sample cases
  2. Accuracy: Mathematical proof, not probability
  3. Consistency: Cannot produce contradictions
  4. Explainability: Provides proof or counterexample

When to Use SMT

 Good for:

  • Constraint satisfaction problems
  • Mathematical calculations
  • Policy verification
  • Logical consistency checking
  • Security properties

L Not ideal for:

  • Creative writing
  • Subjective opinions
  • Open-ended generation
  • Ambiguous natural language

=� Tech Stack

  • Python 3.11+: Core language
  • uv: Fast Python package manager
  • Z3: Microsoft’s SMT solver (open source)
  • SMT-LIB: Standard language for SMT solvers

=� Learning Path

  1. Start here: Run restaurant-riddle-smt-solution.py
  2. Understand basics: Read amazon-automated-reasoning.md
  3. Build your own: Follow automated-reasoning-implementation-guide.md
  4. Extend: Adapt to your domain (finance, legal, medical)

= Resources

Papers

  • “Semantic-based Automated Reasoning for AWS Access Policies using SMT” (Amazon, 2018)
  • “Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach” (2025)

Tools

Tutorials

<� Next Steps

  1. Modify restaurant-riddle-smt-solution.py to add new constraints
  2. Build a price calculator with SMT verification
  3. Create an LLM + SMT hybrid system
  4. Deploy to production with AWS Bedrock

=� Comparison

ApproachAccuracySpeedCoverageExplainable
LLM Only~70-90%FastSamplesL
LLM + SMT99%MediumAll cases
SMT Only100%SlowAll cases

> Contributing

This is a research project. Feel free to:

  • Add new examples
  • Improve documentation
  • Share findings

Last Updated: October 31, 2025 Research Date: October 31, 2025