Local Knowledge Graph - Architecture and Cost Analysis
Building a self-hosted knowledge graph to enable intelligent search across research documentation.
Goals
- Ask questions about research topics
- Search knowledge graph for answers
- If not available, research online and update docs
- Auto-extract entities and relationships as documents are created/updated
Vector RAG vs Graph RAG
When Vector RAG is Sufficient
| Corpus Size | Interconnections | Recommendation |
|---|---|---|
| < 50 docs | Low | Vector RAG |
| 50-200 docs | Medium | Vector + Tags |
| 200+ docs | High | Graph RAG |
Current Research Repo Profile
- 142 markdown files
- ~1.4 MB total content
- 25 topic directories
- 42 cross-links (low connectivity)
- Independent topics (Tesla, Claude, taxes, etc.)
Current state: Vector RAG would work fine. Future state: as the corpus grows and topics interconnect, Graph RAG becomes valuable.
Key Differences
| Aspect | Vector RAG | Graph RAG |
|---|---|---|
| Data Model | Flat chunks + embeddings | Knowledge graph + embeddings |
| Retrieval | Cosine similarity only | Vector search + graph traversal |
| Query Types | "Find similar content" | "What connects X to Y?" |
| Scaling | O(n) similarity search | O(relationships) traversal |
| Best For | Independent documents | Interconnected knowledge |
The Network Effect
```
Small corpus (now):          Large corpus (future):

 o   o   o                    o───o───o───o
                              │ ╲ │ ╱ │ ╲ │
 o   o   o         ───▶       o───o───o───o
                              │ ╱ │ ╲ │ ╱ │
 o   o   o                    o───o───o───o

 Vector sufficient            Graph becomes valuable
```
Potential connections scale non-linearly, roughly n(n-1)/2 pairs for n documents (see the sketch after this list):
- 25 docs → ~300 potential pairs
- 100 docs → ~5,000 potential pairs
- 500 docs → ~125,000 potential pairs
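A quick sanity check on those counts (a minimal sketch; the exact values are 300, 4,950, and 124,750 before rounding):

```python
def potential_pairs(n_docs: int) -> int:
    """Number of distinct document pairs that could be linked: n choose 2."""
    return n_docs * (n_docs - 1) // 2

for n in (25, 100, 500):
    print(f"{n} docs -> {potential_pairs(n):,} potential pairs")
# 25 docs -> 300 potential pairs
# 100 docs -> 4,950 potential pairs
# 500 docs -> 124,750 potential pairs
```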
Solution: FalkorDB + GraphRAG-SDK
Why FalkorDB?
| Requirement | FalkorDB |
|---|---|
| Self-hosted | ✅ Docker one-liner |
| Graph database | ✅ Property graph (Cypher) |
| Vector search | ✅ Built-in (cosine, Euclidean) |
| GraphRAG optimized | ✅ Designed for this |
| Low latency | ✅ 140ms p99 (vs Neo4j 40s+) |
| LangChain integration | ✅ Direct support |
| Cost | ✅ Free (SSPLv1) |
FalkorDB vs Alternatives
| vs | Why FalkorDB Wins |
|---|---|
| Neo4j | 280x faster, built for GraphRAG |
| SQLite | Native graph traversal, vector search |
| Zep Cloud | Self-hosted, no subscription fees |
| Kuzu | More mature, better LLM integrations |
Quick Start
```bash
docker run -p 6379:6379 -p 3000:3000 -it --rm \
  -v ./data:/var/lib/falkordb/data \
  falkordb/falkordb
```
UI available at http://localhost:3000
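To confirm the container is reachable before wiring up the SDK, here is a minimal sanity check using the `falkordb` Python client (assumes `pip install falkordb`; the graph name `sanity_check` is arbitrary):

```python
from falkordb import FalkorDB

# Connect to the Docker container started above
db = FalkorDB(host="localhost", port=6379)

# Create a throwaway graph, run a trivial query, then clean up
g = db.select_graph("sanity_check")
result = g.query("RETURN 1")
print(result.result_set)  # [[1]]
g.delete()
```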
GraphRAG-SDK: Simplified Implementation
Why Use the SDK?
| Task | SDK | Custom Build |
|---|---|---|
| Entity extraction | Automatic | Build prompts + parsing |
| Ontology detection | Automatic | Define schema manually |
| Graph construction | Automatic | Write Cypher |
| NL query interface | Built-in | Build query layer |
| Lines of code | ~20 | ~200-500 |
SDK Features
- LiteLLM integration - supports OpenAI, Anthropic, Google, Ollama
- Auto-ontology - detects entity types from documents
- Multi-format - PDF, JSONL, CSV, HTML, TEXT, URLs
- Natural language queries - converts questions to Cypher
Working Example
```python
from graphrag_sdk import KnowledgeGraph, Ontology, Source
from graphrag_sdk.models.litellm import LiteModel
from graphrag_sdk.model_config import KnowledgeGraphModelConfig
import os

# Configure (choose your model)
os.environ["OPENAI_API_KEY"] = "your-key"  # or ANTHROPIC_API_KEY
os.environ["FALKORDB_HOST"] = "localhost"

# 1. Select model
model = LiteModel(model_name="openai/gpt-4o-mini")
# or: model = LiteModel(model_name="anthropic/claude-haiku-4-5-20251001")

# 2. Point to documents
sources = [Source("./docs/google-rag-file-search/how-it-works.md")]

# 3. Auto-detect ontology
ontology = Ontology.from_sources(sources=sources, model=model)

# 4. Build knowledge graph
kg = KnowledgeGraph(
    name="research_kb",
    ontology=ontology,
    model_config=KnowledgeGraphModelConfig.with_model(model),
)
kg.process_sources(sources)

# 5. Query in natural language
chat = kg.chat_session()
response = chat.send_message("What's the difference between vector and graph RAG?")
print(response["response"])
```
Entity Extraction: Model Cost Analysis
Current Corpus Stats
| Metric | Value |
|---|---|
| Files | 142 markdown files |
| Total size | ~1.4 MB (1,417,859 chars) |
| Est. input tokens | ~355K |
| System prompt overhead | ~140K (1K per doc) |
| Est. output tokens | ~70K (entity JSON) |
| Total | ~500K input + 70K output |
Model Comparison
| Model | Input | Output | Your Corpus |
|---|---|---|---|
| GPT-4o-mini | $0.15/1M | $0.60/1M | $0.12 |
| Claude Haiku 4.5 | $1.00/1M | $5.00/1M | $0.85 |
| GPT-4o | $2.50/1M | $10.00/1M | $1.95 |
| Claude Sonnet 4.5 | $3.00/1M | $15.00/1M | $2.55 |
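How the per-corpus figures above are derived (a minimal sketch using the rounded token totals from the corpus stats table; pricing values are the ones listed above):

```python
# Token estimates from the corpus stats table (rounded)
input_tokens = 500_000   # ~355K content + ~140K system-prompt overhead
output_tokens = 70_000   # entity JSON

# (input $/1M tokens, output $/1M tokens)
prices = {
    "GPT-4o-mini":       (0.15, 0.60),
    "Claude Haiku 4.5":  (1.00, 5.00),
    "GPT-4o":            (2.50, 10.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
}

for name, (p_in, p_out) in prices.items():
    cost = input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out
    print(f"{name}: ${cost:.2f}")
# GPT-4o-mini: $0.12, Claude Haiku 4.5: $0.85, GPT-4o: $1.95, Claude Sonnet 4.5: $2.55
```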
Model Selection Guide
| Model | Best For | Quality |
|---|---|---|
| GPT-4o-mini | Cost-sensitive, standard entities | 85-90% |
| Claude Haiku 4.5 | Better reasoning, fast | 90% |
| GPT-4o | High accuracy needs | 95% |
| Claude Sonnet 4.5 | Novel entity discovery | 95%+ |
Recommendation
```
Start: GPT-4o-mini ($0.12)
        ↓
Test quality on 10-20 docs
        ↓
If good → done
If not  → upgrade to Haiku ($0.85)
```
For $0.12, GPT-4o-mini is worth trying first.
Architecture
```
Local Machine

┌────────────┐     ┌──────────────┐     ┌────────────────┐
│  Markdown  │────▶│  GraphRAG    │────▶│   FalkorDB     │
│  Docs      │     │  SDK         │     │   (Docker)     │
│  ./docs/   │     │  - Ontology  │     │ - Graph store  │
└────────────┘     │  - Extract   │     │ - Vector index │
                   │  - Ingest    │     │ - Cypher query │
                   └──────┬───────┘     └───────┬────────┘
                          │                     │
                          ▼                     │
                   ┌──────────────┐             │
                   │   LLM API    │◀────────────┘
                   │ (GPT-4o-mini │    Query
                   │   or Haiku)  │
                   └──────────────┘
```
Data Flow
- Ingest: Markdown docs → GraphRAG-SDK → Entity extraction → FalkorDB
- Query: Natural language → LLM → Cypher → FalkorDB → Results → LLM → Answer
Implementation Plan
Phase 1: Setup (30 min)
```bash
# 1. Start FalkorDB
docker run -p 6379:6379 -p 3000:3000 -it --rm \
  -v ./falkordb-data:/var/lib/falkordb/data \
  falkordb/falkordb

# 2. Install SDK
pip install graphrag_sdk

# 3. Set API key
export OPENAI_API_KEY="your-key"
```
Phase 2: Test Ingestion
```python
# Test with a few docs first
sources = [
    Source("./docs/google-rag-file-search/how-it-works.md"),
    Source("./docs/claude-code/haiku-vs-sonnet-performance.md"),
]

ontology = Ontology.from_sources(sources=sources, model=model)
print(ontology)  # Review detected entity types

kg = KnowledgeGraph(name="research_test", ontology=ontology, ...)
kg.process_sources(sources)
```
Phase 3: Full Ingestion
```python
import glob

# All docs
md_files = glob.glob("./docs/**/*.md", recursive=True)
sources = [Source(f) for f in md_files]

kg.process_sources(sources)  # ~$0.12 with GPT-4o-mini
```
Phase 4: Automation (Optional)
Add a pre-commit hook or file watcher to auto-ingest new/changed docs.
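One possible approach, sketched below: re-ingest only the markdown files modified since the last run. The `kg` object comes from the earlier phases; the `.last_ingest` marker file and helper function are illustrative, not part of the SDK.

```python
import glob
import os
import time

from graphrag_sdk import Source

STAMP_FILE = ".last_ingest"  # hypothetical marker file tracking the last run

def changed_markdown_since_last_run(docs_dir: str = "./docs") -> list[str]:
    """Return markdown files modified after the previous ingestion run."""
    last_run = os.path.getmtime(STAMP_FILE) if os.path.exists(STAMP_FILE) else 0.0
    files = glob.glob(f"{docs_dir}/**/*.md", recursive=True)
    return [f for f in files if os.path.getmtime(f) > last_run]

changed = changed_markdown_since_last_run()
if changed:
    kg.process_sources([Source(f) for f in changed])  # kg built in Phase 2/3

# Record this run so the next invocation only picks up newer changes
with open(STAMP_FILE, "w") as fh:
    fh.write(str(time.time()))
```

The same script can be invoked from a pre-commit hook or a cron job.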
Cost Summary
| Component | One-Time | Ongoing |
|---|---|---|
| FalkorDB (Docker) | Free | Free |
| GraphRAG-SDK | Free | Free |
| Initial ingestion (142 docs) | ~$0.12-0.85 | - |
| Per-doc re-ingestion | - | ~$0.001 per change |
| Queries | - | ~$0.001 per query |
Total monthly estimate: < $1 for moderate usage
Comparison with Zep/Temporal-Bridge
| Aspect | Zep Cloud | FalkorDB + SDK |
|---|---|---|
| Hosting | Cloud (paid) | Self-hosted (free) |
| Entity extraction | Automatic (Zep) | LLM API ($0.12) |
| Graph storage | Managed | Local Docker |
| Pricing trend | Increasing (4x) | Free |
| Control | Limited | Full |
| Conversation memory | Built-in | Separate concern |
Trade-off: Zep provides conversation memory + knowledge graph in one. FalkorDB is just the graph - you’d need separate conversation storage.
Document Chunking Strategy
Why Chunk Size Matters for Knowledge Graphs
Unlike whole-document embeddings, chunked embeddings provide:
- Entity precision - Smaller chunks create cleaner entity-to-chunk mappings
- Relationship clarity - Focused chunks preserve entity relationships without noise
- Retrieval accuracy - Prevents dilution of relevance signals
Optimal Chunk Sizes
| Use Case | Chunk Size | Overlap | Strategy |
|---|---|---|---|
| Entity-rich documents | 256-512 tokens | 50-100 tokens | RecursiveCharacterTextSplitter |
| Technical documentation | 400-500 tokens | 10-20% | Semantic boundaries |
| Analytical content | 1024+ tokens | 10-20% | Page-level or semantic |
| Factoid queries | 256-512 tokens | 10-20% | Smaller chunks |
Recommendation for This Project
Given the FalkorDB setup with research documentation:
```
# Recommended chunking configuration
chunk_size = 400-512 tokens     # ~1600-2000 characters
overlap    = 50-100 tokens      # ~200-400 characters
strategy   = RecursiveCharacterTextSplitter

# For documents > 1000 tokens
# Better retrieval precision vs whole-document embeddings
```
Performance Benchmarks
| Strategy | Recall | Use Case |
|---|---|---|
| RecursiveCharacterTextSplitter (400 tokens) | 88-89% | General purpose |
| LLMSemanticChunker | 91.9% | High accuracy (higher cost) |
| ClusterSemanticChunker | 91.3% | Semantic coherence |
| Page-level chunking | 64.8% | Consistent across doc types |
Embedding Model Considerations
| Model | Context Window | Chunking Impact |
|---|---|---|
| OpenAI text-embedding-3-small | 8,191 tokens | Max chunk: 8K |
| OpenAI text-embedding-3-large | 8,191 tokens | Max chunk: 8K |
| Gemini embedding-001 | 2,048 tokens | Max chunk: 2K |
Note: Most embedding models max out at 512-2048 tokens, making chunking mandatory for longer documents.
Implementation with GraphRAG-SDK
The GraphRAG-SDK handles chunking automatically during ingestion. For custom control:
```python
from graphrag_sdk import Source
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Custom chunking before ingestion
splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=50,
    length_function=len,
)

# Process each markdown file
with open("docs/topic/file.md") as f:
    chunks = splitter.split_text(f.read())

# Each chunk becomes a document node in FalkorDB
for chunk in chunks:
    sources.append(Source(chunk))
```
Migration Path
Current state: Whole-document embeddings
- Works fine for current 142-doc corpus
- May have precision issues on large docs (e.g., architecture-and-costs.md)
Future optimization:
1. Identify docs > 1000 tokens
2. Re-ingest with 400-token chunks
3. Measure retrieval improvement
4. Apply chunking to all docs if beneficial
Cost: Re-ingesting with chunks ~$0.12-0.85 (same as initial ingestion)
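For step 1 of the optimization above, a minimal sketch that flags long documents using the rough 4-chars-per-token heuristic (the threshold and heuristic are assumptions, not SDK behavior):

```python
import glob
import os

TOKEN_THRESHOLD = 1_000   # docs above this are candidates for 400-token chunking
CHARS_PER_TOKEN = 4       # rough heuristic used elsewhere in this doc

for path in glob.glob("./docs/**/*.md", recursive=True):
    est_tokens = os.path.getsize(path) // CHARS_PER_TOKEN
    if est_tokens > TOKEN_THRESHOLD:
        print(f"{path}: ~{est_tokens:,} tokens")
```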
References
- Best Chunking Strategies for RAG 2025
- Chunking for RAG Best Practices - Unstructured
- Optimal Chunk Size for RAG - Milvus
- Evaluating Ideal Chunk Size - LlamaIndex