Building a self-hosted knowledge graph to enable intelligent search across research documentation.

Goals

  1. Ask questions about research topics
  2. Search knowledge graph for answers
  3. If not available, research online and update docs
  4. Auto-extract entities and relationships as documents are created/updated

Vector RAG vs Graph RAG

When Vector RAG is Sufficient

| Corpus Size | Interconnections | Recommendation |
|---|---|---|
| < 50 docs | Low | Vector RAG |
| 50-200 docs | Medium | Vector + Tags |
| 200+ docs | High | Graph RAG |

Current Research Repo Profile

  • 142 markdown files
  • ~1.4 MB total content
  • 25 topic directories
  • 42 cross-links (low connectivity)
  • Independent topics (Tesla, Claude, taxes, etc.)

Current state: Vector RAG would work fine. Future state: as the corpus grows and topics interconnect, Graph RAG becomes valuable.

Key Differences

| Aspect | Vector RAG | Graph RAG |
|---|---|---|
| Data Model | Flat chunks + embeddings | Knowledge graph + embeddings |
| Retrieval | Cosine similarity only | Vector search + graph traversal |
| Query Types | "Find similar content" | "What connects X to Y?" |
| Scaling | O(n) similarity search | O(relationships) traversal |
| Best For | Independent documents | Interconnected knowledge |

The Network Effect

Small corpus (now):        Large corpus (future):

o   o   o                  o───o───o───o
                           │ ╲ │ ╱ │ ╲ │
o   o   o     ───▶         o───o───o───o
                           │ ╱ │ ╲ │ ╱ │
o   o   o                  o───o───o───o

Vector sufficient          Graph becomes valuable

Potential connections scale non-linearly:

  • 25 docs → ~300 potential pairs
  • 100 docs → ~5,000 potential pairs
  • 500 docs → ~125,000 potential pairs
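
(Each pair of documents is a potential connection, so n docs give n(n-1)/2 pairs; for example, 100 × 99 / 2 = 4,950, roughly 5,000.)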

Solution: FalkorDB + GraphRAG-SDK

Why FalkorDB?

| Requirement | FalkorDB |
|---|---|
| Self-hosted | ✅ Docker one-liner |
| Graph database | ✅ Property graph (Cypher) |
| Vector search | ✅ Built-in (cosine, Euclidean) |
| GraphRAG optimized | ✅ Designed for this |
| Low latency | ✅ 140ms p99 (vs Neo4j 40s+) |
| LangChain integration | ✅ Direct support |
| Cost | ✅ Free (SSPLv1) |

FalkorDB vs Alternatives

| vs | Why FalkorDB Wins |
|---|---|
| Neo4j | 280x faster, built for GraphRAG |
| SQLite | Native graph traversal, vector search |
| Zep Cloud | Self-hosted, no subscription fees |
| Kuzu | More mature, better LLM integrations |

Quick Start

docker run -p 6379:6379 -p 3000:3000 -it --rm \
  -v ./data:/var/lib/falkordb/data \
  falkordb/falkordb

UI available at http://localhost:3000
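
To confirm the container is reachable before wiring up the SDK, a minimal check with the falkordb Python client works (a sketch; assumes pip install falkordb):

from falkordb import FalkorDB

# Connect to the Docker instance started above
db = FalkorDB(host="localhost", port=6379)
graph = db.select_graph("sanity_check")

# A trivial query; result_set should be [[1]]
print(graph.query("RETURN 1").result_set)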


GraphRAG-SDK: Simplified Implementation

Why Use the SDK?

| Task | SDK | Custom Build |
|---|---|---|
| Entity extraction | Automatic | Build prompts + parsing |
| Ontology detection | Automatic | Define schema manually |
| Graph construction | Automatic | Write Cypher |
| NL query interface | Built-in | Build query layer |
| Lines of code | ~20 | ~200-500 |

SDK Features

  • LiteLLM integration - supports OpenAI, Anthropic, Google, Ollama
  • Auto-ontology - detects entity types from documents
  • Multi-format - PDF, JSONL, CSV, HTML, TEXT, URLs
  • Natural language queries - converts questions to Cypher

Working Example

from graphrag_sdk import KnowledgeGraph, Ontology, Source
from graphrag_sdk.models.litellm import LiteModel
from graphrag_sdk.model_config import KnowledgeGraphModelConfig
import os

# Configure (choose your model)
os.environ["OPENAI_API_KEY"] = "your-key"  # or ANTHROPIC_API_KEY
os.environ["FALKORDB_HOST"] = "localhost"

# 1. Select model
model = LiteModel(model_name="openai/gpt-4o-mini")
# or: model = LiteModel(model_name="anthropic/claude-haiku-4-5-20251001")

# 2. Point to documents
sources = [Source("./docs/google-rag-file-search/how-it-works.md")]

# 3. Auto-detect ontology
ontology = Ontology.from_sources(sources=sources, model=model)

# 4. Build knowledge graph
kg = KnowledgeGraph(
    name="research_kb",
    ontology=ontology,
    model_config=KnowledgeGraphModelConfig.with_model(model),
)
kg.process_sources(sources)

# 5. Query in natural language
chat = kg.chat_session()
response = chat.send_message("What's the difference between vector and graph RAG?")
print(response["response"])

Entity Extraction: Model Cost Analysis

Current Corpus Stats

| Metric | Value |
|---|---|
| Files | 142 markdown files |
| Total size | ~1.4 MB (1,417,859 chars) |
| Est. input tokens | ~355K |
| System prompt overhead | ~140K (1K per doc) |
| Est. output tokens | ~70K (entity JSON) |
| Total | ~500K input + 70K output |

Model Comparison

| Model | Input | Output | Your Corpus |
|---|---|---|---|
| GPT-4o-mini | $0.15/1M | $0.60/1M | $0.12 |
| Claude Haiku 4.5 | $1.00/1M | $5.00/1M | $0.85 |
| GPT-4o | $2.50/1M | $10.00/1M | $1.95 |
| Claude Sonnet 4.5 | $3.00/1M | $15.00/1M | $2.55 |
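
The per-corpus figures follow directly from the token estimates above; a quick sanity-check sketch (prices are the per-million rates from the table):

input_tokens = 500_000   # ~355K document content + ~140K system prompt overhead
output_tokens = 70_000   # entity JSON

def corpus_cost(input_per_m: float, output_per_m: float) -> float:
    """Dollar cost for the whole corpus at the given per-million-token prices."""
    return input_tokens / 1e6 * input_per_m + output_tokens / 1e6 * output_per_m

print(f"GPT-4o-mini:       ${corpus_cost(0.15, 0.60):.2f}")   # ~$0.12
print(f"Claude Haiku 4.5:  ${corpus_cost(1.00, 5.00):.2f}")   # ~$0.85
print(f"GPT-4o:            ${corpus_cost(2.50, 10.00):.2f}")  # ~$1.95
print(f"Claude Sonnet 4.5: ${corpus_cost(3.00, 15.00):.2f}")  # ~$2.55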

Model Selection Guide

| Model | Best For | Quality |
|---|---|---|
| GPT-4o-mini | Cost-sensitive, standard entities | 85-90% |
| Claude Haiku 4.5 | Better reasoning, fast | 90% |
| GPT-4o | High accuracy needs | 95% |
| Claude Sonnet 4.5 | Novel entity discovery | 95%+ |

Recommendation

  1. Start: GPT-4o-mini ($0.12)
  2. Test quality on 10-20 docs
  3. If good → done
  4. If not → upgrade to Haiku ($0.85)

For $0.12, GPT-4o-mini is worth trying first.
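
For the quality test, a random sample of docs is enough; a small sketch, assuming the ./docs layout used in the implementation plan below:

import glob
import random

# Pick 15 docs at random for a quality spot-check before full ingestion
md_files = glob.glob("./docs/**/*.md", recursive=True)
test_files = random.sample(md_files, k=min(15, len(md_files)))
print("\n".join(test_files))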


Architecture

┌────────────────────────────────────────────────────────────────┐
│                          Local Machine                          │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────┐     ┌──────────────┐     ┌────────────────┐  │
│  │  Markdown    │────▶│  GraphRAG    │────▶│   FalkorDB     │  │
│  │  Docs        │     │  SDK         │     │   (Docker)     │  │
│  │  ./docs/     │     │              │     │                │  │
│  └──────────────┘     │ - Ontology   │     │ - Graph store  │  │
│                       │ - Extract    │     │ - Vector index │  │
│                       │ - Ingest     │     │ - Cypher query │  │
│                       └──────────────┘     └───────┬────────┘  │
│                              │                     │           │
│                              ▼                     │           │
│                       ┌──────────────┐             │           │
│                       │  LLM API     │◀────────────┘           │
│                       │ (GPT-4o-mini │     Query               │
│                       │  or Haiku)   │                         │
│                       └──────────────┘                         │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Data Flow

  1. Ingest: Markdown docs → GraphRAG-SDK → Entity extraction → FalkorDB
  2. Query: Natural language → LLM → Cypher → FalkorDB → Results → LLM → Answer

Implementation Plan

Phase 1: Setup (30 min)

# 1. Start FalkorDB
docker run -p 6379:6379 -p 3000:3000 -it --rm \
  -v ./falkordb-data:/var/lib/falkordb/data \
  falkordb/falkordb

# 2. Install SDK
pip install graphrag_sdk

# 3. Set API key
export OPENAI_API_KEY="your-key"

Phase 2: Test Ingestion

# Test with a few docs first
sources = [
    Source("./docs/google-rag-file-search/how-it-works.md"),
    Source("./docs/claude-code/haiku-vs-sonnet-performance.md"),
]
ontology = Ontology.from_sources(sources=sources, model=model)
print(ontology)  # Review detected entity types

kg = KnowledgeGraph(name="research_test", ontology=ontology, ...)
kg.process_sources(sources)
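
After the test run, you can inspect what actually landed in the graph with a direct Cypher query; a sketch using the falkordb Python client (pip install falkordb), where the node labels you see depend on the auto-detected ontology:

from falkordb import FalkorDB

db = FalkorDB(host="localhost", port=6379)
graph = db.select_graph("research_test")

# Count nodes per label to gauge extraction volume and entity types
result = graph.query(
    "MATCH (n) RETURN labels(n)[0] AS label, count(*) AS count ORDER BY count DESC"
)
for label, count in result.result_set:
    print(label, count)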

Phase 3: Full Ingestion

import glob
# All docs
md_files = glob.glob("./docs/**/*.md", recursive=True)
sources = [Source(f) for f in md_files]
kg.process_sources(sources) # ~$0.12 with GPT-4o-mini

Phase 4: Automation (Optional)

Add a pre-commit hook or file watcher to auto-ingest new or changed docs (see the watcher sketch below).
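
A minimal file-watcher sketch using the watchdog package (an extra dependency, pip install watchdog; kg and Source come from the earlier setup):

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class DocsHandler(FileSystemEventHandler):
    """Re-ingest a markdown file whenever it is created or modified."""

    def on_modified(self, event):
        if not event.is_directory and event.src_path.endswith(".md"):
            kg.process_sources([Source(event.src_path)])

    on_created = on_modified

observer = Observer()
observer.schedule(DocsHandler(), "./docs", recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()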


Cost Summary

| Component | One-Time | Ongoing |
|---|---|---|
| FalkorDB (Docker) | Free | Free |
| GraphRAG-SDK | Free | Free |
| Initial ingestion (142 docs) | $0.12-0.85 | - |
| Per-doc re-ingestion | ~$0.001 | Per change |
| Queries | ~$0.001 | Per query |

Total monthly estimate: < $1 for moderate usage


Comparison with Zep/Temporal-Bridge

| Aspect | Zep Cloud | FalkorDB + SDK |
|---|---|---|
| Hosting | Cloud (paid) | Self-hosted (free) |
| Entity extraction | Automatic (Zep) | LLM API ($0.12) |
| Graph storage | Managed | Local Docker |
| Pricing trend | Increasing (4x) | Free |
| Control | Limited | Full |
| Conversation memory | Built-in | Separate concern |

Trade-off: Zep provides conversation memory + knowledge graph in one. FalkorDB is just the graph - you’d need separate conversation storage.


Document Chunking Strategy

Why Chunk Size Matters for Knowledge Graphs

Unlike whole-document embeddings, chunked embeddings provide:

  • Entity precision - Smaller chunks create cleaner entity-to-chunk mappings
  • Relationship clarity - Focused chunks preserve entity relationships without noise
  • Retrieval accuracy - Prevents dilution of relevance signals

Optimal Chunk Sizes

| Use Case | Chunk Size | Overlap | Strategy |
|---|---|---|---|
| Entity-rich documents | 256-512 tokens | 50-100 tokens | RecursiveCharacterTextSplitter |
| Technical documentation | 400-500 tokens | 10-20% | Semantic boundaries |
| Analytical content | 1024+ tokens | 10-20% | Page-level or semantic |
| Factoid queries | 256-512 tokens | 10-20% | Smaller chunks |

Recommendation for This Project

Given the FalkorDB setup with research documentation:

# Recommended chunking configuration
#   chunk_size: 400-512 tokens (~1,600-2,000 characters)
#   overlap:    50-100 tokens  (~200-400 characters)
#   strategy:   RecursiveCharacterTextSplitter
#   apply to documents > 1,000 tokens
# Better retrieval precision vs whole-document embeddings

Performance Benchmarks

| Strategy | Recall | Use Case |
|---|---|---|
| RecursiveCharacterTextSplitter (400 tokens) | 88-89% | General purpose |
| LLMSemanticChunker | 91.9% | High accuracy (higher cost) |
| ClusterSemanticChunker | 91.3% | Semantic coherence |
| Page-level chunking | 64.8% | Consistent across doc types |

Embedding Model Considerations

| Model | Context Window | Chunking Impact |
|---|---|---|
| OpenAI text-embedding-3-small | 8,191 tokens | Max chunk: 8K |
| OpenAI text-embedding-3-large | 8,191 tokens | Max chunk: 8K |
| Gemini embedding-001 | 2,048 tokens | Max chunk: 2K |

Note: Most embedding models max out at 512-2048 tokens, making chunking mandatory for longer documents.

Implementation with GraphRAG-SDK

The GraphRAG-SDK handles chunking automatically during ingestion. For custom control:

from graphrag_sdk import Source
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Custom chunking before ingestion
splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=50,
    length_function=len,  # with len, chunk_size/overlap are measured in characters
)

# Process each markdown file
sources = []
with open("docs/topic/file.md") as f:
    chunks = splitter.split_text(f.read())

# Each chunk becomes a document node in FalkorDB
for chunk in chunks:
    sources.append(Source(chunk))

Migration Path

Current state: Whole-document embeddings

  • Works fine for current 142-doc corpus
  • May have precision issues on large docs (e.g., architecture-and-costs.md)

Future optimization:

  1. Identify docs > 1000 tokens
  2. Re-ingest with 400-token chunks
  3. Measure retrieval improvement
  4. Apply chunking to all docs if beneficial

Cost: Re-ingesting with chunks ~$0.12-0.85 (same as initial ingestion)
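
A rough way to do step 1 above (flag documents over ~1,000 tokens), assuming the tiktoken package as a token counter:

import glob
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Flag docs that exceed ~1,000 tokens and would benefit from chunking
for path in glob.glob("./docs/**/*.md", recursive=True):
    with open(path) as f:
        tokens = len(enc.encode(f.read()))
    if tokens > 1000:
        print(f"{tokens:>6}  {path}")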
