embedded-duckdb-alternative

Analysis of replacing FalkorDB with embedded DuckDB VSS+PGQ for Lattice’s local knowledge graph use case.

The Problem

Current: Lattice requires users to:

Install Docker
Run FalkorDB container (docker run -p 6379:6379 falkordb/falkordb)
Manage container lifecycle
Understand Redis port binding

Barrier to entry: Docker + container management = friction for local knowledge graphs.

Proposed: Embedded DuckDB

Zero-setup alternative:

bun add -g @zabaca/lattice    # That's it - no Docker!
lattice init
lattice sync

No Docker. No containers. No Redis. Just a single .duckdb file.

Architecture Comparison

Aspect	FalkorDB (Current)	DuckDB Embedded
Setup	Docker required	Zero dependencies
Deployment	Container + port binding	In-process library
Data storage	Redis in-memory + RDB snapshots	Single `.duckdb` file
Vector search	Custom Cypher extension	VSS extension (HNSW)
Graph queries	Native (GraphBLAS)	DuckPGQ extension (SQL/PGQ)
Performance	Sub-ms traversals	Tens of ms
Scale limit	RAM size (~100M edges)	Disk space

Ease-of-Use Analysis

Current Setup (FalkorDB)

# User needs to:
1. Install Docker Desktop
2. Run container
   docker run -d -p 6379:6379 falkordb/falkordb
3. Ensure container starts on boot
4. Install lattice
   bun add -g @zabaca/lattice
5. Configure connection
   export FALKORDB_HOST=localhost
   export FALKORDB_PORT=6379

Steps: 5 | External deps: 1 (Docker)

Proposed Setup (DuckDB)

# User needs to:
1. Install lattice (DuckDB bundled)
   bun add -g @zabaca/lattice

Steps: 1 | External deps: 0

DuckDB Embedded: Technical Details

Node.js Integration

Using the DuckDB Node Neo client (@duckdb/node-api, new in 2024):

import { DuckDBInstance } from '@duckdb/node-api';

// Open (or create) the embedded database file, then connect to it
const instance = await DuckDBInstance.create('./lattice-kb.duckdb');
const db = await instance.connect();

// Load extensions: vss is a core extension, duckpgq comes from the community repository
await db.run('INSTALL vss');
await db.run('LOAD vss');
await db.run('INSTALL duckpgq FROM community');
await db.run('LOAD duckpgq');

// Enable persistence for HNSW indexes (DuckDB 1.0.0+)
await db.run('SET GLOBAL hnsw_enable_experimental_persistence = true');

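Key insight: The DuckDB Node Neo client replaces the deprecated callback-based duckdb package and provides native TypeScript support with a promise-based API. Per its deprecation notice, the old package is not expected to receive a release for DuckDB 1.5.x (~early 2026), so new code should target Neo.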

HNSW Vector Indexes

// Create table with embeddings
await db.run(`
  CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    content VARCHAR,
    embedding FLOAT[512]  -- Voyage voyage-3-lite
  )
`);

// Create HNSW index
await db.run(`
  CREATE INDEX doc_embedding_idx
  ON documents
  USING HNSW (embedding)
  WITH (metric = 'cosine')
`);

Persistence: With hnsw_enable_experimental_persistence = true, indexes save to the .duckdb file. No rebuild on restart.

Property Graph Queries

// Create property graph over existing tables
await db.run(`
  CREATE PROPERTY GRAPH kg
  VERTEX TABLES (documents, entities)
  EDGE TABLES (
    doc_mentions_entity
      SOURCE KEY (doc_id) REFERENCES documents (id)
      DESTINATION KEY (entity_id) REFERENCES entities (id)
      LABEL MENTIONS
  )
`);

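// Query with SQL/PGQ syntax (Neo returns a reader; rows are materialized explicitly)
const reader = await db.runAndReadAll(`
  SELECT * FROM GRAPH_TABLE (kg
    MATCH (d:documents)-[m:MENTIONS]->(e:entities)
    WHERE d.id = 1
    COLUMNS (d.id, d.content, e.name)
  )
`);
const results = reader.getRowObjects();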

Performance Trade-offs

Validated from packages/duckpgq-vss (2025-12-06)

Operation	FalkorDB	DuckDB VSS+PGQ
2-3 hop traversals	Sub-ms (GraphBLAS)	~10-50ms
Vector search (HNSW)	Custom impl	~5-10ms
Hybrid query (separate)	~5-10ms	~100ms
Hybrid query (single CTE)	N/A	❌ Crashes (DuckPGQ bug #276)

For Lattice scale (~500-1000 entities):

FalkorDB: ~5ms
DuckDB: ~100ms

Reality check: For a local research knowledge base, the difference between ~100ms and ~5ms per query is unlikely to be noticeable in interactive use.

Implementation Complexity

Refactoring GraphService

Current (graph.service.ts):

@Injectable()
export class GraphService {
  private redis: Redis;

  async query(cypher: string): Promise<CypherResult> {
    return await this.redis.call('GRAPH.QUERY', graphName, cypher);
  }
}

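Proposed (DuckDB):

import { DuckDBConnection, listValue } from '@duckdb/node-api';

@Injectable()
export class GraphService {
  constructor(
    private readonly db: DuckDBConnection,
    private readonly embeddingService: EmbeddingService,
  ) {}

  async query(sql: string) {
    const reader = await this.db.runAndReadAll(sql);
    return reader.getRowObjects();
  }

  async vectorSearch(query: string, limit: number) {
    const embedding = await this.embeddingService.embed(query);

    // Bind the embedding as a LIST, then cast it to the fixed-size array type
    const reader = await this.db.runAndReadAll(`
      SELECT id, content,
             array_cosine_distance(embedding, $1::FLOAT[512]) AS distance
      FROM documents
      ORDER BY distance LIMIT CAST($2 AS INTEGER)
    `, [listValue(embedding), limit]);
    return reader.getRowObjects();
  }

  async graphExpand(docIds: number[]) {
    // GRAPH_TABLE exposes only the columns named in COLUMNS, so the outer
    // query refers to id/content/name rather than the d and e aliases
    const reader = await this.db.runAndReadAll(`
      SELECT id, content, name AS via_entity
      FROM GRAPH_TABLE (kg
        MATCH (d:documents)-[:MENTIONS]->(e:entities)
        COLUMNS (d.id, d.content, e.name)
      ) WHERE list_contains($1::INTEGER[], id)
    `, [listValue(docIds)]);
    return reader.getRowObjects();
  }
}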

Refactor scope: ~200-300 LOC in GraphService + query methods.

Deployment Advantages

Serverless-Ready

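DuckDB can run inside function runtimes that allow native Node.js modules, such as AWS Lambda and Google Cloud Functions. Edge runtimes that restrict native addons (for example Vercel Edge) have not been validated here and would likely need a different build:

import { DuckDBInstance } from '@duckdb/node-api';

// No Docker! Just bundle the .duckdb file and open it read-only,
// since function filesystems are typically read-only outside /tmp
export async function handler(event) {
  const instance = await DuckDBInstance.create('./lattice-kb.duckdb', { access_mode: 'READ_ONLY' });
  const db = await instance.connect();
  const reader = await db.runAndReadAll('SELECT * FROM documents');
  return { statusCode: 200, body: JSON.stringify(reader.getRowObjectsJson()) };
}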

FalkorDB: Requires container infrastructure (ECS, Cloud Run, etc.)

Single-File Distribution

# Ship your knowledge base
lattice export my-research.duckdb

# Share with others
# They get: data + embeddings + graph + indexes in ONE file

FalkorDB: Requires Redis RDB export + import, coordination with Redis instance.

Limitations to Consider

1. DuckPGQ Maturity

Issue	Impact	Mitigation
CTE crash bug #276	Can’t do multi-GRAPH CTEs	Use separate queries (~100ms)
DuckDB version pinned to 1.3.1	DuckPGQ not available for 1.4.x	Wait for upstream build
Limited pathfinding	No GraphBLAS-style matrix ops	Sufficient for Lattice scale
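
The "separate queries" mitigation can be sketched as a two-step flow: run the vector search on its own, then pass the resulting ids into a second statement that contains the GRAPH_TABLE clause. This is a minimal sketch, not Lattice's implementation. The SqlRunner interface and the hybridSearch function are hypothetical names, and a real adapter would wrap the DuckDB connection and handle parameter binding.

```typescript
// Hypothetical minimal query interface; a real adapter would wrap the DuckDB connection.
type Row = Record<string, unknown>;

interface SqlRunner {
  rows(sql: string, params?: unknown[]): Promise<Row[]>;
}

// Two-step hybrid search: vector search first, graph expansion second.
// Keeping GRAPH_TABLE out of a shared CTE sidesteps the crash described above.
async function hybridSearch(
  db: SqlRunner,
  queryEmbedding: number[],
  limit: number,
): Promise<Row[]> {
  // Step 1: nearest documents by cosine distance, no graph clause involved.
  const hits = await db.rows(
    `SELECT id FROM documents
     ORDER BY array_cosine_distance(embedding, $1::FLOAT[512])
     LIMIT CAST($2 AS INTEGER)`,
    [queryEmbedding, limit],
  );
  const ids = hits.map((r) => Number(r.id));
  if (ids.length === 0) return [];

  // Step 2: expand only the documents found in step 1, in a separate statement.
  return db.rows(
    `SELECT id, content, name AS via_entity
     FROM GRAPH_TABLE (kg
       MATCH (d:documents)-[:MENTIONS]->(e:entities)
       COLUMNS (d.id, d.content, e.name)
     )
     WHERE list_contains($1::INTEGER[], id)`,
    [ids],
  );
}
```

The cost of this approach is one extra round trip inside the same process, which is consistent with the ~100ms figure measured for separate queries.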

2. Memory vs Disk

FalkorDB	DuckDB
All-in-RAM	Hybrid (spills to disk)
Faster for hot data	Slower for cold data
RAM limit = hard limit	Disk limit = soft limit

For Lattice: DuckDB’s hybrid approach is a good fit, since the HNSW index stays in RAM while bulk data lives on disk.

3. HNSW Index Persistence (Experimental)

From DuckDB VSS docs:

With the hnsw_enable_experimental_persistence option enabled, the index will be persisted… However, this is an experimental feature and may not be stable.

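Risk: The VSS docs note that WAL recovery is not yet implemented for custom indexes, so a crash with uncommitted changes can lose or corrupt the index. Persistence behavior could also change in future DuckDB versions.

Mitigation: Lattice can rebuild indexes on first run if persistence fails (adds ~10s startup cost).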
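
One way the rebuild fallback might look is sketched below. It assumes that a failed persistence shows up either as a missing entry in the duckdb_indexes() catalog function or as an error while querying it; that assumption has not been verified against a corrupted index. IndexDb and ensureHnswIndex are hypothetical names, and the index definition is the one from the HNSW section.

```typescript
// Hypothetical minimal interface; a real adapter would wrap the DuckDB connection.
type Row = Record<string, unknown>;

interface IndexDb {
  rows(sql: string): Promise<Row[]>;
}

const INDEX_NAME = 'doc_embedding_idx';

// Checks whether the HNSW index survived the restart and recreates it if not.
// Assumption: a broken index appears as a missing catalog row or a failed lookup.
async function ensureHnswIndex(db: IndexDb): Promise<'present' | 'rebuilt'> {
  let present = false;
  try {
    const found = await db.rows(
      `SELECT index_name FROM duckdb_indexes() WHERE index_name = '${INDEX_NAME}'`,
    );
    present = found.length > 0;
  } catch {
    present = false; // treat a failed catalog lookup as "needs rebuild"
  }
  if (present) return 'present';

  // Drop any leftover definition, then rebuild from the embeddings column.
  await db.rows(`DROP INDEX IF EXISTS ${INDEX_NAME}`);
  await db.rows(
    `CREATE INDEX ${INDEX_NAME} ON documents USING HNSW (embedding) WITH (metric = 'cosine')`,
  );
  return 'rebuilt';
}
```

Calling this once at startup, before the first vector search, keeps the rebuild cost out of the query path.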

Migration Path

Phase 1: Proof of Concept

Goal: Validate DuckDB works for Lattice’s queries

Create new GraphServiceDuckDB implementation
Run test suite against both FalkorDB and DuckDB
Compare query results and performance
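
A comparison harness for this phase could look roughly like the following. It is a sketch under assumptions: Backend, sameRows and compareBackends are hypothetical names, each backend is expected to translate a logical query into its own dialect (Cypher or SQL/PGQ), and row values are assumed to be JSON-serializable.

```typescript
type Row = Record<string, unknown>;

// Serialize a row with its keys sorted so column order does not matter.
function canonical(row: Row): string {
  return JSON.stringify(Object.keys(row).sort().map((k) => [k, row[k]]));
}

// True when both result sets contain the same rows, ignoring row order.
function sameRows(a: Row[], b: Row[]): boolean {
  if (a.length !== b.length) return false;
  const left = a.map(canonical).sort();
  const right = b.map(canonical).sort();
  return left.every((row, i) => row === right[i]);
}

// Hypothetical backend adapter: each one maps the logical query to its own dialect.
interface Backend {
  name: string;
  query(logicalQuery: string): Promise<Row[]>;
}

// Runs the same logical query on both backends and reports equality plus timings.
async function compareBackends(x: Backend, y: Backend, logicalQuery: string) {
  const timed = async (b: Backend) => {
    const start = performance.now();
    const rows = await b.query(logicalQuery);
    return { name: b.name, rows, ms: performance.now() - start };
  };
  const rx = await timed(x);
  const ry = await timed(y);
  return {
    match: sameRows(rx.rows, ry.rows),
    timings: { [rx.name]: rx.ms, [ry.name]: ry.ms },
  };
}
```

Single-run timings are noisy, so a real benchmark would repeat each query and report a median rather than one measurement.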

Phase 2: Dual-Backend Support

Goal: Let users choose backend

{
  "backend": "duckdb",  // or "falkordb"
  "duckdb": {
    "path": "./lattice-kb.duckdb"
  },
  "falkordb": {
    "host": "localhost",
    "port": 6379
  }
}
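
A config loader for this shape might be sketched as follows. The LatticeConfig type and the resolveConfig function are hypothetical names, and the defaults are taken from the example values above rather than from an existing Lattice config schema.

```typescript
// Hypothetical config shape mirroring the example above.
type BackendName = 'duckdb' | 'falkordb';

interface LatticeConfig {
  backend: BackendName;
  duckdb: { path: string };
  falkordb: { host: string; port: number };
}

// Assumed defaults: embedded DuckDB unless the user opts in to FalkorDB.
const DEFAULTS: LatticeConfig = {
  backend: 'duckdb',
  duckdb: { path: './lattice-kb.duckdb' },
  falkordb: { host: 'localhost', port: 6379 },
};

// Merges user-supplied values over the defaults and rejects unknown backends.
function resolveConfig(raw: unknown): LatticeConfig {
  const input = (raw ?? {}) as Partial<LatticeConfig>;
  const backend = input.backend ?? DEFAULTS.backend;
  if (backend !== 'duckdb' && backend !== 'falkordb') {
    throw new Error(`Unknown backend: ${String(backend)}`);
  }
  return {
    backend,
    duckdb: { ...DEFAULTS.duckdb, ...input.duckdb },
    falkordb: { ...DEFAULTS.falkordb, ...input.falkordb },
  };
}
```

With this shape, an empty config file resolves to the embedded backend, which matches the Phase 3 goal of making DuckDB the default.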

Phase 3: Default to Embedded

Goal: Make DuckDB the default, keep FalkorDB as opt-in for power users

# Default: zero setup
lattice init
# → Creates lattice-kb.duckdb

# Advanced: use FalkorDB
lattice init --backend=falkordb
# → Prompts to start Docker container

Recommendation for Lattice

For Local Knowledge Graphs: DuckDB Wins

Factor	Weight	FalkorDB	DuckDB
Setup simplicity	⭐⭐⭐⭐⭐	❌ Docker	✅ Zero deps
Performance	⭐⭐⭐	✅ Sub-ms	⚠️ ~100ms
Portability	⭐⭐⭐⭐	❌ Container	✅ Single file
Scale limit	⭐⭐	⚠️ RAM	✅ Disk
Serverless support	⭐⭐⭐⭐	❌ No	✅ Yes
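
The table can be turned into a rough weighted score. The mapping below is an assumption made for illustration only: each star counts as one point of weight, ✅ scores 1, ⚠️ scores 0.5, and ❌ scores 0. The names Mark, factors and score are hypothetical.

```typescript
// Assumed scoring: yes = ✅, partial = ⚠️, no = ❌.
type Mark = 'yes' | 'partial' | 'no';
const MARK_VALUE: Record<Mark, number> = { yes: 1, partial: 0.5, no: 0 };

interface Factor {
  name: string;
  weight: number; // number of stars in the table
  falkordb: Mark;
  duckdb: Mark;
}

const factors: Factor[] = [
  { name: 'Setup simplicity', weight: 5, falkordb: 'no', duckdb: 'yes' },
  { name: 'Performance', weight: 3, falkordb: 'yes', duckdb: 'partial' },
  { name: 'Portability', weight: 4, falkordb: 'no', duckdb: 'yes' },
  { name: 'Scale limit', weight: 2, falkordb: 'partial', duckdb: 'yes' },
  { name: 'Serverless support', weight: 4, falkordb: 'no', duckdb: 'yes' },
];

// Sum of weight times mark value across all factors for one backend.
function score(backend: 'falkordb' | 'duckdb'): number {
  return factors.reduce((sum, f) => sum + f.weight * MARK_VALUE[f[backend]], 0);
}

const maxScore = factors.reduce((sum, f) => sum + f.weight, 0);
console.log(`FalkorDB: ${score('falkordb')} / ${maxScore}`); // FalkorDB: 4 / 18
console.log(`DuckDB: ${score('duckdb')} / ${maxScore}`);     // DuckDB: 16.5 / 18
```

Under this mapping the result is dominated by the setup, portability and serverless rows; a reader who weights performance more heavily would get a narrower gap.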

Conclusion: For Lattice’s target audience (developers building local knowledge bases), DuckDB’s ease-of-use outweighs FalkorDB’s performance advantage.

Keep FalkorDB as an option for:

Production GraphRAG services (latency-critical)
Large teams (already have Docker infra)
Real-time applications (<10ms response time required)

Next Steps

Prototype DuckDB backend in feature branch
Benchmark against FalkorDB with Lattice’s actual queries
Test HNSW persistence stability over time
Document migration guide for existing Lattice users
Release dual-backend support (let community validate)