Building a self-hosted knowledge graph to enable intelligent search across research documentation.

Goals

  1. Ask questions about research topics
  2. Search knowledge graph for answers
  3. If not available, research online and update docs
  4. Auto-extract entities and relationships as documents are created/updated

Vector RAG vs Graph RAG

When Vector RAG is Sufficient

| Corpus Size | Interconnections | Recommendation |
|---|---|---|
| < 50 docs | Low | Vector RAG |
| 50-200 docs | Medium | Vector + Tags |
| 200+ docs | High | Graph RAG |

Current Research Repo Profile

  • 142 markdown files
  • ~1.4 MB total content
  • 25 topic directories
  • 42 cross-links (low connectivity)
  • Independent topics (Tesla, Claude, taxes, etc.)

Current state: Vector RAG would work fine. Future state: as the corpus grows and topics interconnect, Graph RAG becomes valuable.

Key Differences

| Aspect | Vector RAG | Graph RAG |
|---|---|---|
| Data Model | Flat chunks + embeddings | Knowledge graph + embeddings |
| Retrieval | Cosine similarity only | Vector search + graph traversal |
| Query Types | "Find similar content" | "What connects X to Y?" |
| Scaling | O(n) similarity search | O(relationships) traversal |
| Best For | Independent documents | Interconnected knowledge |

The Network Effect

Small corpus (now):        Large corpus (future):

o   o   o                  o───o───o───o
                           │ ╲ │ ╱ │ ╲ │
o   o   o     ───▶         o───o───o───o
                           │ ╱ │ ╲ │ ╱ │
o   o   o                  o───o───o───o

Vector sufficient          Graph becomes valuable

Potential connections scale non-linearly:

  • 25 docs → ~300 potential pairs
  • 100 docs → ~5,000 potential pairs
  • 500 docs → ~125,000 potential pairs
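
(Each pair of documents is a potential connection, so n docs give n(n-1)/2 pairs; for example, 100 × 99 / 2 = 4,950, roughly 5,000.)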

Solution: FalkorDB + GraphRAG-SDK

Why FalkorDB?

| Requirement | FalkorDB |
|---|---|
| Self-hosted | ✅ Docker one-liner |
| Graph database | ✅ Property graph (Cypher) |
| Vector search | ✅ Built-in (cosine, Euclidean) |
| GraphRAG optimized | ✅ Designed for this |
| Low latency | ✅ 140ms p99 (vs Neo4j 40s+) |
| LangChain integration | ✅ Direct support |
| Cost | ✅ Free (SSPLv1) |

FalkorDB vs Alternatives

| vs | Why FalkorDB Wins |
|---|---|
| Neo4j | 280x faster, built for GraphRAG |
| SQLite | Native graph traversal, vector search |
| Zep Cloud | Self-hosted, no subscription fees |
| Kuzu | More mature, better LLM integrations |

Quick Start

docker run -p 6379:6379 -p 3000:3000 -it --rm \
  -v ./data:/var/lib/falkordb/data \
  falkordb/falkordb

UI available at http://localhost:3000
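
To confirm the container is reachable before wiring up the SDK, a minimal check with the falkordb Python client works (a sketch; assumes pip install falkordb):

from falkordb import FalkorDB

# Connect to the Docker instance started above
db = FalkorDB(host="localhost", port=6379)
graph = db.select_graph("sanity_check")

# A trivial query; result_set should be [[1]]
print(graph.query("RETURN 1").result_set)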


GraphRAG-SDK: Simplified Implementation

Why Use the SDK?

| Task | SDK | Custom Build |
|---|---|---|
| Entity extraction | Automatic | Build prompts + parsing |
| Ontology detection | Automatic | Define schema manually |
| Graph construction | Automatic | Write Cypher |
| NL query interface | Built-in | Build query layer |
| Lines of code | ~20 | ~200-500 |

SDK Features

  • LiteLLM integration - supports OpenAI, Anthropic, Google, Ollama
  • Auto-ontology - detects entity types from documents
  • Multi-format - PDF, JSONL, CSV, HTML, TEXT, URLs
  • Natural language queries - converts questions to Cypher

Working Example

from graphrag_sdk import KnowledgeGraph, Ontology, Source
from graphrag_sdk.models.litellm import LiteModel
from graphrag_sdk.model_config import KnowledgeGraphModelConfig
import os

# Configure (choose your model)
os.environ["OPENAI_API_KEY"] = "your-key"  # or ANTHROPIC_API_KEY
os.environ["FALKORDB_HOST"] = "localhost"

# 1. Select model
model = LiteModel(model_name="openai/gpt-4o-mini")
# or: model = LiteModel(model_name="anthropic/claude-haiku-4-5-20251001")

# 2. Point to documents
sources = [Source("./docs/google-rag-file-search/how-it-works.md")]

# 3. Auto-detect ontology
ontology = Ontology.from_sources(sources=sources, model=model)

# 4. Build knowledge graph
kg = KnowledgeGraph(
    name="research_kb",
    ontology=ontology,
    model_config=KnowledgeGraphModelConfig.with_model(model),
)
kg.process_sources(sources)

# 5. Query in natural language
chat = kg.chat_session()
response = chat.send_message("What's the difference between vector and graph RAG?")
print(response["response"])

Entity Extraction: Model Cost Analysis

Current Corpus Stats

| Metric | Value |
|---|---|
| Files | 142 markdown files |
| Total size | ~1.4 MB (1,417,859 chars) |
| Est. input tokens | ~355K |
| System prompt overhead | ~140K (1K per doc) |
| Est. output tokens | ~70K (entity JSON) |
| Total | ~500K input + 70K output |

Model Comparison

| Model | Input | Output | Your Corpus |
|---|---|---|---|
| GPT-4o-mini | $0.15/1M | $0.60/1M | $0.12 |
| Claude Haiku 4.5 | $1.00/1M | $5.00/1M | $0.85 |
| GPT-4o | $2.50/1M | $10.00/1M | $1.95 |
| Claude Sonnet 4.5 | $3.00/1M | $15.00/1M | $2.55 |
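
The per-corpus figures follow directly from the token estimates above; a quick sanity-check sketch (prices are the per-million rates from the table):

input_tokens = 500_000   # ~355K document content + ~140K system prompt overhead
output_tokens = 70_000   # entity JSON

def corpus_cost(input_per_m: float, output_per_m: float) -> float:
    """Dollar cost for the whole corpus at the given per-million-token prices."""
    return input_tokens / 1e6 * input_per_m + output_tokens / 1e6 * output_per_m

print(f"GPT-4o-mini:       ${corpus_cost(0.15, 0.60):.2f}")   # ~$0.12
print(f"Claude Haiku 4.5:  ${corpus_cost(1.00, 5.00):.2f}")   # ~$0.85
print(f"GPT-4o:            ${corpus_cost(2.50, 10.00):.2f}")  # ~$1.95
print(f"Claude Sonnet 4.5: ${corpus_cost(3.00, 15.00):.2f}")  # ~$2.55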

Model Selection Guide

| Model | Best For | Quality |
|---|---|---|
| GPT-4o-mini | Cost-sensitive, standard entities | 85-90% |
| Claude Haiku 4.5 | Better reasoning, fast | 90% |
| GPT-4o | High accuracy needs | 95% |
| Claude Sonnet 4.5 | Novel entity discovery | 95%+ |

Recommendation

  1. Start: GPT-4o-mini ($0.12)
  2. Test quality on 10-20 docs
  3. If good → done
  4. If not → upgrade to Haiku ($0.85)

For $0.12, GPT-4o-mini is worth trying first.
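
For the quality test, a random sample of docs is enough; a small sketch, assuming the ./docs layout used in the implementation plan below:

import glob
import random

# Pick 15 docs at random for a quality spot-check before full ingestion
md_files = glob.glob("./docs/**/*.md", recursive=True)
test_files = random.sample(md_files, k=min(15, len(md_files)))
print("\n".join(test_files))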


Architecture

┌────────────────────────────────────────────────────────────────┐
│                          Local Machine                          │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────┐     ┌──────────────┐     ┌────────────────┐  │
│  │  Markdown    │────▶│  GraphRAG    │────▶│   FalkorDB     │  │
│  │  Docs        │     │  SDK         │     │   (Docker)     │  │
│  │  ./docs/     │     │              │     │                │  │
│  └──────────────┘     │ - Ontology   │     │ - Graph store  │  │
│                       │ - Extract    │     │ - Vector index │  │
│                       │ - Ingest     │     │ - Cypher query │  │
│                       └──────────────┘     └───────┬────────┘  │
│                              │                     │           │
│                              ▼                     │           │
│                       ┌──────────────┐             │           │
│                       │  LLM API     │◀────────────┘           │
│                       │ (GPT-4o-mini │     Query               │
│                       │  or Haiku)   │                         │
│                       └──────────────┘                         │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Data Flow

  1. Ingest: Markdown docs → GraphRAG-SDK → Entity extraction → FalkorDB
  2. Query: Natural language → LLM → Cypher → FalkorDB → Results → LLM → Answer

Implementation Plan

Phase 1: Setup (30 min)

# 1. Start FalkorDB
docker run -p 6379:6379 -p 3000:3000 -it --rm \
  -v ./falkordb-data:/var/lib/falkordb/data \
  falkordb/falkordb

# 2. Install SDK
pip install graphrag_sdk

# 3. Set API key
export OPENAI_API_KEY="your-key"

Phase 2: Test Ingestion

# Test with a few docs first
sources = [
    Source("./docs/google-rag-file-search/how-it-works.md"),
    Source("./docs/claude-code/haiku-vs-sonnet-performance.md"),
]
ontology = Ontology.from_sources(sources=sources, model=model)
print(ontology)  # Review detected entity types

kg = KnowledgeGraph(name="research_test", ontology=ontology, ...)
kg.process_sources(sources)
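
After the test run, you can inspect what actually landed in the graph with a direct Cypher query; a sketch using the falkordb Python client (pip install falkordb), where the node labels you see depend on the auto-detected ontology:

from falkordb import FalkorDB

db = FalkorDB(host="localhost", port=6379)
graph = db.select_graph("research_test")

# Count nodes per label to gauge extraction volume and entity types
result = graph.query(
    "MATCH (n) RETURN labels(n)[0] AS label, count(*) AS count ORDER BY count DESC"
)
for label, count in result.result_set:
    print(label, count)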

Phase 3: Full Ingestion

import glob
# All docs
md_files = glob.glob("./docs/**/*.md", recursive=True)
sources = [Source(f) for f in md_files]
kg.process_sources(sources) # ~$0.12 with GPT-4o-mini

Phase 4: Automation (Optional)

Add a pre-commit hook or file watcher to auto-ingest new or changed docs (see the watcher sketch below).
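
A minimal file-watcher sketch using the watchdog package (an extra dependency, pip install watchdog; kg and Source come from the earlier setup):

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class DocsHandler(FileSystemEventHandler):
    """Re-ingest a markdown file whenever it is created or modified."""

    def on_modified(self, event):
        if not event.is_directory and event.src_path.endswith(".md"):
            kg.process_sources([Source(event.src_path)])

    on_created = on_modified

observer = Observer()
observer.schedule(DocsHandler(), "./docs", recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()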


Cost Summary

| Component | One-Time | Ongoing |
|---|---|---|
| FalkorDB (Docker) | Free | Free |
| GraphRAG-SDK | Free | Free |
| Initial ingestion (142 docs) | $0.12-0.85 | - |
| Per-doc re-ingestion | ~$0.001 | Per change |
| Queries | ~$0.001 | Per query |

Total monthly estimate: < $1 for moderate usage


Comparison with Zep/Temporal-Bridge

| Aspect | Zep Cloud | FalkorDB + SDK |
|---|---|---|
| Hosting | Cloud (paid) | Self-hosted (free) |
| Entity extraction | Automatic (Zep) | LLM API ($0.12) |
| Graph storage | Managed | Local Docker |
| Pricing trend | Increasing (4x) | Free |
| Control | Limited | Full |
| Conversation memory | Built-in | Separate concern |

Trade-off: Zep provides conversation memory + knowledge graph in one. FalkorDB is just the graph - you’d need separate conversation storage.


Document Chunking Strategy

Why Chunk Size Matters for Knowledge Graphs

Unlike whole-document embeddings, chunked embeddings provide:

  • Entity precision - Smaller chunks create cleaner entity-to-chunk mappings
  • Relationship clarity - Focused chunks preserve entity relationships without noise
  • Retrieval accuracy - Prevents dilution of relevance signals

Optimal Chunk Sizes

| Use Case | Chunk Size | Overlap | Strategy |
|---|---|---|---|
| Entity-rich documents | 256-512 tokens | 50-100 tokens | RecursiveCharacterTextSplitter |
| Technical documentation | 400-500 tokens | 10-20% | Semantic boundaries |
| Analytical content | 1024+ tokens | 10-20% | Page-level or semantic |
| Factoid queries | 256-512 tokens | 10-20% | Smaller chunks |

Recommendation for This Project

Given the FalkorDB setup with research documentation:

# Recommended chunking configuration
#   chunk_size: 400-512 tokens (~1,600-2,000 characters)
#   overlap:    50-100 tokens  (~200-400 characters)
#   strategy:   RecursiveCharacterTextSplitter
#   apply to documents > 1,000 tokens
# Better retrieval precision vs whole-document embeddings

Performance Benchmarks

| Strategy | Recall | Use Case |
|---|---|---|
| RecursiveCharacterTextSplitter (400 tokens) | 88-89% | General purpose |
| LLMSemanticChunker | 91.9% | High accuracy (higher cost) |
| ClusterSemanticChunker | 91.3% | Semantic coherence |
| Page-level chunking | 64.8% | Consistent across doc types |

Embedding Model Considerations

| Model | Context Window | Chunking Impact |
|---|---|---|
| OpenAI text-embedding-3-small | 8,191 tokens | Max chunk: 8K |
| OpenAI text-embedding-3-large | 8,191 tokens | Max chunk: 8K |
| Gemini embedding-001 | 2,048 tokens | Max chunk: 2K |

Note: Most embedding models max out at 512-2048 tokens, making chunking mandatory for longer documents.

Implementation with GraphRAG-SDK

The GraphRAG-SDK handles chunking automatically during ingestion. For custom control:

from graphrag_sdk import Source
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Custom chunking before ingestion
splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=50,
    length_function=len,  # with len, chunk_size/overlap are measured in characters
)

# Process each markdown file
sources = []
with open("docs/topic/file.md") as f:
    chunks = splitter.split_text(f.read())

# Each chunk becomes a document node in FalkorDB
for chunk in chunks:
    sources.append(Source(chunk))

Migration Path

Current state: Whole-document embeddings

  • Works fine for current 142-doc corpus
  • May have precision issues on large docs (e.g., architecture-and-costs.md)

Future optimization:

  1. Identify docs > 1000 tokens
  2. Re-ingest with 400-token chunks
  3. Measure retrieval improvement
  4. Apply chunking to all docs if beneficial

Cost: Re-ingesting with chunks ~$0.12-0.85 (same as initial ingestion)
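
A rough way to do step 1 above (flag documents over ~1,000 tokens), assuming the tiktoken package as a token counter:

import glob
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Flag docs that exceed ~1,000 tokens and would benefit from chunking
for path in glob.glob("./docs/**/*.md", recursive=True):
    with open(path) as f:
        tokens = len(enc.encode(f.read()))
    if tokens > 1000:
        print(f"{tokens:>6}  {path}")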
