graph-database-comparison
Comparison of approaches for adding graph query capabilities to different database foundations.
The Core Question
Why did FalkorDB choose Redis instead of SQLite or DuckDB?
Redis was chosen for specific architectural reasons:
| Requirement | Redis | SQLite | DuckDB |
|---|---|---|---|
| Memory-first | Native in-memory | Disk-first (WAL helps) | Hybrid (can spill) |
| Extension API | Redis Module API (C) | Virtual tables, loadable extensions | Community extensions |
| Low latency | Sub-millisecond | Slower for traversals | OLAP-optimized |
| Parallelism | OpenMP for GraphBLAS | Single-writer lock | Vectorized, parallel |
| Built-in networking | Client protocol included | Embedded only | Embedded only |
From FalkorDB Design Docs:
“FalkorDB is going to target cases which require low latency and as such is going to be in memory.”
How Each Adds Graph Capabilities
FalkorDB (Redis)
Native implementation via Redis Module API:
- Registers custom commands (GRAPH.QUERY, GRAPH.DELETE)
- GraphBLAS sparse matrix operations run in C inside the Redis process
- CSC (Compressed Sparse Columns) for adjacency matrices
- Not a wrapper - graph algorithms execute natively
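A minimal sketch of issuing these commands from a client, assuming a local FalkorDB instance and the redis-py package (the graph name and Cypher query are illustrative, not taken from the FalkorDB docs):

```python
import redis

# FalkorDB listens on the standard Redis port; any Redis client can talk to it.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# GRAPH.QUERY <graph> <cypher> is the command registered by the module.
result = r.execute_command(
    "GRAPH.QUERY",
    "social",
    "MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person) RETURN b.name",
)
print(result)  # raw module reply: header, result rows, and query statistics
```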
```
Client → Redis Protocol → Redis Module API → FalkorDB (GraphBLAS)
                                                  ↓
                                          Sparse matrix math
```

DuckDB (DuckPGQ)
Community extension with SQL/PGQ standard:
```sql
INSTALL duckpgq FROM community;
LOAD duckpgq;

-- Create property graph over existing tables
CREATE PROPERTY GRAPH ...

-- Query with Cypher-like syntax
SELECT *
FROM GRAPH_TABLE (pg
    MATCH (a:Person)-[e:KNOWS]->(b:Person)
    WHERE a.name = 'Alice'
    COLUMNS (b.name AS friend)
);
```

- SIMD-friendly bulk path-finding
- CSR (Compressed Sparse Row) in-memory representation
- Runs on DuckDB’s vectorized engine
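The same flow driven from Python, as a sketch; it assumes the duckpgq community extension installs cleanly and that a property graph named pg has already been created over Person/KNOWS tables:

```python
import duckdb

con = duckdb.connect("graph.duckdb")  # on-disk database; can spill beyond RAM
con.execute("INSTALL duckpgq FROM community")
con.execute("LOAD duckpgq")

# Assumes CREATE PROPERTY GRAPH pg ... has been run over existing tables.
rows = con.execute("""
    SELECT *
    FROM GRAPH_TABLE (pg
        MATCH (a:Person)-[e:KNOWS]->(b:Person)
        WHERE a.name = 'Alice'
        COLUMNS (b.name AS friend)
    )
""").fetchall()
print(rows)
```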
SQLite (Various)
Recursive CTEs (built-in):
```sql
WITH RECURSIVE path AS (
    SELECT id, name, 0 AS depth FROM nodes WHERE id = 1
    UNION ALL
    SELECT n.id, n.name, p.depth + 1
    FROM nodes n
    JOIN edges e ON n.id = e.target_id
    JOIN path p ON e.source_id = p.id
    WHERE p.depth < 5
)
SELECT * FROM path;
```

Limitation: “Very slow for larger graphs because the same nodes are visited multiple times”
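A self-contained sketch of the same approach using Python's built-in sqlite3 module (the toy schema and data are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (source_id INTEGER, target_id INTEGER);
    INSERT INTO nodes VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO edges VALUES (1, 2), (2, 3);
""")

# Expand outward from node 1 via the recursive CTE, capped at depth 5 as above.
rows = con.execute("""
    WITH RECURSIVE path AS (
        SELECT id, name, 0 AS depth FROM nodes WHERE id = 1
        UNION ALL
        SELECT n.id, n.name, p.depth + 1
        FROM nodes n
        JOIN edges e ON n.id = e.target_id
        JOIN path p ON e.source_id = p.id
        WHERE p.depth < 5
    )
    SELECT * FROM path
""").fetchall()
print(rows)  # [(1, 'a', 0), (2, 'b', 1), (3, 'c', 2)]
```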
Extensions:
- simple-graph - JSON nodes, recursive CTE traversal
- sqlite-graph - Cypher support (alpha)
Performance Comparison
FalkorDB Strengths
| Aspect | Performance | Why |
|---|---|---|
| 2-3 hop traversals | Sub-millisecond | GraphBLAS matrix multiplication |
| GraphRAG queries | 280x faster than Neo4j | Optimized for this workload |
| Concurrent reads | High throughput | Redis networking layer |
DuckDB/DuckPGQ Strengths
| Aspect | Performance | Why |
|---|---|---|
| Batch pathfinding | Fast | SIMD multi-source algorithms |
| Aggregations | Excellent | OLAP-native, columnar |
| Large datasets | Unbounded | Spills to disk |
| Disk performance | 8x faster than in-memory (compressed) | Columnar compression |
From DuckDB Memory Management:
“Counter-intuitively, using a disk-based DuckDB instance can be faster than an in-memory instance due to compression.”
Scale Limits
FalkorDB
| Constraint | Reality |
|---|---|
| Hard limit | RAM size (Redis constraint) |
| Practical limit | Memory fragmentation at scale |
| Observed issue | OOM with 18.49x fragmentation |
| Team statement | “Scaling without memory bloat is the big problem we’re working on” |
Memory optimizations (v4.8-4.10):
- 42% memory reduction
- String interning for deduplication
- 7x more efficient than competitors
DuckDB
| Constraint | Reality |
|---|---|
| Hard limit | Disk space |
| Memory behavior | Spills to disk when exceeding memory_limit (default 80% RAM) |
| Performance | Slower with disk spilling, but still works |
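The relevant knobs can be set per connection; a sketch assuming the duckdb Python package (the values shown are examples, not defaults):

```python
import duckdb

con = duckdb.connect("analytics.duckdb")

# Cap the buffer pool; operators that exceed it spill to temporary files on disk.
con.execute("SET memory_limit = '4GB'")

# Control where spill files are written.
con.execute("SET temp_directory = '/tmp/duckdb_spill'")

print(con.execute("SELECT current_setting('memory_limit')").fetchone())
```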
When to Use Each
Approximate Scale Guidelines
| Graph Size | Recommendation | Reasoning |
|---|---|---|
| < 10M edges | FalkorDB | Sub-ms latency wins |
| 10M - 100M edges | Depends on RAM & pattern | Test both |
| 100M+ edges | DuckPGQ or distributed | Likely exceeds available RAM for FalkorDB |
| Larger-than-RAM | DuckPGQ | No choice - FalkorDB OOMs |
Workload Pattern Guidelines
| Query Type | Winner | Why |
|---|---|---|
| Real-time traversals (2-3 hops) | FalkorDB | GraphBLAS optimization |
| Batch pathfinding (all shortest paths) | DuckPGQ | SIMD bulk algorithms |
| Graph + analytics (PageRank → aggregate → join) | DuckDB | OLAP-native |
| GraphRAG for LLMs (latency-critical) | FalkorDB | Designed for this |
| Ad-hoc exploration (large datasets) | DuckDB | Disk spillover |
Use Case Matrix
| Scenario | FalkorDB | DuckPGQ | SQLite |
|---|---|---|---|
| Knowledge graph for RAG | Optimal | Overkill | Too slow |
| Social network analytics | Good (if fits RAM) | Better at scale | No |
| Fraud detection (real-time) | Optimal | Too slow | No |
| Fraud detection (batch) | OK | Optimal | No |
| Small embedded graph | Overkill | Overkill | Simple |
Lattice Recommendation
For Lattice (research documentation knowledge graph):
Current Profile
| Metric | Value |
|---|---|
| Documents | ~150 markdown files |
| Total size | ~1.5 MB |
| Entity count | ~500-1000 |
| Relationship count | ~1000-3000 |
| Growth projection | Maybe 500-1000 docs |
Why FalkorDB is Correct
| Requirement | FalkorDB | DuckPGQ |
|---|---|---|
| Scale | 1000 docs = trivial | Massive overkill |
| Query pattern | Real-time Q&A | Would work but slower |
| GraphRAG optimization | Native | Not designed for this |
| Latency | Sub-ms | Tens of ms |
| LangChain integration | Direct | Would need custom |
| SDK | GraphRAG-SDK ready | No equivalent |
Verdict: FalkorDB is the clear choice for Lattice.
DuckPGQ would only make sense if:
- Corpus grew to millions of documents
- You needed batch analytics over the graph
- Real-time response wasn’t required
When to Reconsider
| Trigger | Action |
|---|---|
| RAM > 50% used by FalkorDB | Monitor fragmentation |
| OOM errors | Consider sharding or DuckPGQ |
| Need batch analytics | Export to DuckDB for analysis |
| > 1M entities | Evaluate distributed options |
For a research knowledge graph with hundreds to low thousands of documents, FalkorDB will remain optimal for years.
Memory Gotchas (Lessons from Lattice)
Relationship Type Count is a Hidden Memory Multiplier
Problem: FalkorDB pre-allocates a sparse matrix for each relationship type, not just each edge.
Memory ≈ NODE_CREATION_BUFFER × relationship_types × matrix_overhead ≈ 16,384 × N types × overhead

| Relationship Types | Pre-allocated Slots | Observed Impact |
|---|---|---|
| 15-20 types | 327,680 | OOM at 2GB limit |
| 2 types | 32,768 | Fits in ~200MB |
Real-world case: Lattice hit OOM with only 200 documents when using many relationship types (REFERENCES, MENTIONS, AUTHORED_BY, CITES, etc.). Reducing to 2 types resolved the issue.
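A quick worked check of the estimate above, in slot counts only (the per-slot overhead term is left out because it depends on the matrix format):

```python
NODE_CREATION_BUFFER = 16_384  # FalkorDB default, per the table below

def preallocated_slots(relationship_types: int) -> int:
    # Memory ≈ NODE_CREATION_BUFFER × relationship_types × matrix_overhead
    return NODE_CREATION_BUFFER * relationship_types

print(preallocated_slots(20))  # 327680 -- the OOM case in the table above
print(preallocated_slots(2))   # 32768  -- fits comfortably
```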
Design Pattern: Coarse Types + Properties
Instead of:
```cypher
(doc)-[:REFERENCES]->(entity)
(doc)-[:MENTIONS]->(entity)
(doc)-[:CITES]->(entity)
(doc)-[:DEPENDS_ON]->(entity)
```

Use:

```cypher
(doc)-[:RELATES_TO {type: "reference"}]->(entity)
(doc)-[:RELATES_TO {type: "mention"}]->(entity)
```

Trade-off: Slightly more verbose queries, but 10x less memory.
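For instance, writing coarse-typed edges from Python might look like the following sketch (the graph name, node labels, and redis-py call pattern are illustrative assumptions):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# One RELATES_TO matrix regardless of how many semantic edge kinds exist;
# the fine-grained kind lives in the "type" property instead.
for doc_id, entity_id, kind in [("doc-1", "ent-7", "reference"),
                                ("doc-1", "ent-9", "mention")]:
    # For brevity this interpolates values; in production prefer query parameters.
    r.execute_command(
        "GRAPH.QUERY",
        "lattice",
        f"MATCH (d:Document {{id: '{doc_id}'}}), (e:Entity {{id: '{entity_id}'}}) "
        f"CREATE (d)-[:RELATES_TO {{type: '{kind}'}}]->(e)",
    )
```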
Other Memory Considerations
| Factor | Impact | Mitigation |
|---|---|---|
| NODE_CREATION_BUFFER | 16,384 default | Reduce to 1024-2048 for small graphs |
| Full-text indices | Can be 70% of total memory | Only index what you query |
| Vector dimensions | 1536-dim = 6KB/vector | Use 512-dim models if possible |
| Redis fragmentation | Up to 18x reported | Monitor mem_fragmentation_ratio |
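A small monitoring sketch for the last row, assuming redis-py against a local FalkorDB instance (the threshold is illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379)
mem = r.info("memory")  # INFO MEMORY section as a dict

used_mb = mem["used_memory"] / 1024 / 1024
frag = mem["mem_fragmentation_ratio"]

print(f"used_memory: {used_mb:.1f} MB, fragmentation: {frag:.2f}")
if frag > 1.5:  # illustrative threshold; 18x was the pathological case above
    print("warning: high fragmentation, consider defragmentation or a restart")
```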
Technical Deep Dive
FalkorDB: GraphBLAS Performance
GraphBLAS uses linear algebra for graph operations:
```
Adjacency Matrix A:

        1  2  3  4
      ┌────────────┐
    1 │ 0  1  0  1 │   (node 1 → nodes 2, 4)
    2 │ 1  0  1  0 │   (node 2 → nodes 1, 3)
    3 │ 0  1  0  1 │
    4 │ 1  0  1  0 │
      └────────────┘

2-hop neighbors = A × A        (matrix multiplication)
3-hop neighbors = A × A × A
```

Why this is fast:
- Matrix ops are highly optimized (BLAS libraries)
- Sparse matrices only store non-zero entries (CSC format)
- OpenMP parallelization
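A toy illustration of the math with SciPy sparse matrices (FalkorDB uses SuiteSparse:GraphBLAS in C, not SciPy; this only mirrors the idea):

```python
import numpy as np
from scipy.sparse import csc_matrix

# Adjacency matrix from the diagram above (rows = source, columns = destination).
A = csc_matrix(np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]))

two_hop = A @ A          # entry (i, j) counts 2-hop walks from i to j
three_hop = A @ A @ A    # likewise for 3 hops

print(two_hop.toarray())
# The non-zero pattern of A^k gives k-hop reachability (including walks back to self).
```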
DuckPGQ: Vectorized Execution
DuckDB processes data in vectors (1024-2048 items):
```
Traditional (row-by-row):        Vectorized:
for each row:                    for each vector of 1024 rows:
    process row                      process all at once (SIMD)

Overhead: N function calls       Overhead: N/1024 function calls
```

Why this matters for graphs:
- Bulk path-finding across many start nodes simultaneously
- Better CPU cache utilization (L1 cache fits vectors)
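A rough Python/NumPy analogy for the per-row versus per-vector overhead (this illustrates the general principle, not DuckDB's executor):

```python
import time
import numpy as np

values = np.random.rand(1_000_000)

# Row-at-a-time: one interpreted call per element.
start = time.perf_counter()
doubled_rows = [v * 2 for v in values]
row_time = time.perf_counter() - start

# Vectorized: one call over the whole array, with a SIMD-friendly inner loop in C.
start = time.perf_counter()
doubled_vec = values * 2
vec_time = time.perf_counter() - start

print(f"row-by-row: {row_time:.3f}s  vectorized: {vec_time:.4f}s")
```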