Hybrid Memory / Persistent Tables
Research Date: 2025-11-04
Question
Can DuckDB use a hybrid mode where some tables are in-memory and some are persisted to disk within the same database session?
Answer
Yes, DuckDB supports hybrid storage through multiple approaches:
Approach 1: ATTACH Multiple Databases
You can attach both in-memory and persistent databases in the same session:
```sql
-- Attach persistent database
ATTACH 'my_data.duckdb' AS persistent_db;

-- Attach in-memory database
ATTACH ':memory:' AS memory_db;

-- Create tables in each
CREATE TABLE persistent_db.customers (id INT, name VARCHAR);
CREATE TABLE memory_db.temp_results (id INT, score FLOAT);

-- Query across both databases
SELECT c.*, t.score
FROM persistent_db.customers c
JOIN memory_db.temp_results t ON c.id = t.id;
```
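To confirm what is attached in the current session, `SHOW DATABASES` lists the catalogs:

```sql
-- Lists attached databases (memory_db, persistent_db, ...)
SHOW DATABASES;
```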
Switching Between Databases

```sql
-- Switch active database
USE memory_db;

-- Now tables are created here by default
CREATE TABLE working_data (x INT);

-- Switch back to persistent
USE persistent_db;
```

Search Path
Simplify queries by setting a search path:
```sql
SET search_path = 'persistent_db,memory_db';

-- Now can omit the database prefix
SELECT * FROM customers; -- Uses persistent_db.customers
```

Approach 2: Temporary Tables
Create temporary tables within a persistent database:
```sql
-- Regular table (persists to disk)
CREATE TABLE persistent_data (id INT, value VARCHAR);

-- Temporary table (stays in memory, spills to disk only if needed)
CREATE TEMP TABLE temp_calculations (id INT, result FLOAT);

-- Or use the full syntax
CREATE TEMPORARY TABLE staging_data AS
SELECT * FROM persistent_data WHERE value LIKE 'temp%';
```

Temporary Table Behavior
- Storage: Resides in memory by default
- Spilling: Automatically spills to disk if memory is constrained
- Location: Uses the `temp_directory` setting (defaults to `<database file>.tmp`, e.g. `database.db.tmp`)
- Lifecycle: Automatically dropped when the connection closes
- Configuration: `SET temp_directory = '/tmp/duckdb_swap';`
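To see the spilling behavior in practice, you can cap DuckDB's memory and build a temp table larger than the limit; a minimal sketch (the 1 GB limit, table name, and row count are illustrative):

```sql
-- Cap memory so large intermediates must spill
SET memory_limit = '1GB';
SET temp_directory = '/tmp/duckdb_swap';

-- A temp table larger than the limit spills to temp_directory
CREATE TEMP TABLE big_scratch AS
SELECT range AS id, random() AS noise
FROM range(100000000);
```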
Approach 3: Cross-Database Joins
Yes, you can join across attached databases! This is a core feature of DuckDB’s multi-database support.
Basic Cross-Database Join
```sql
-- Attach databases
ATTACH 'sales.duckdb' AS sales;
ATTACH ':memory:' AS analytics;

-- Join across them
SELECT c.customer_name, c.email, a.total_spend, a.visit_count
FROM sales.customers c
JOIN analytics.customer_metrics a ON c.id = a.customer_id
WHERE a.total_spend > 1000;
```

Multi-Way Joins Across Databases
Join data from multiple databases simultaneously:
```sql
ATTACH 'transactions.duckdb' AS txn;
ATTACH 'reference.duckdb' AS ref;
ATTACH ':memory:' AS workspace;

SELECT t.transaction_id, t.amount, c.category_name, w.calculated_score
FROM txn.transactions t
JOIN ref.categories c ON t.category_id = c.id
JOIN workspace.scores w ON t.id = w.transaction_id
WHERE t.date >= '2024-01-01';
```

Cross-Database Engine Joins
DuckDB can join across different database engines in a single query:
```sql
-- Attach different database types
ATTACH 'postgres://localhost/production' AS pg (TYPE POSTGRES);
ATTACH 'legacy.sqlite' AS sqlite (TYPE SQLITE);
ATTACH 'warehouse.duckdb' AS duck;
ATTACH ':memory:' AS mem; -- 'temp' is a reserved catalog name, so use another alias

-- Join them all together
SELECT u.username, o.order_date, p.product_name, t.enrichment_data
FROM pg.users u
JOIN sqlite.orders o ON u.id = o.user_id
JOIN duck.products p ON o.product_id = p.id
JOIN mem.enrichments t ON o.id = t.order_id
WHERE o.status = 'completed';
```
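Note that attaching Postgres and SQLite databases relies on the corresponding DuckDB extensions, which may need to be installed and loaded first (recent versions can often autoload them):

```sql
-- One-time install, then load per session (often autoloaded)
INSTALL postgres;
LOAD postgres;
INSTALL sqlite;
LOAD sqlite;
```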
Performance Characteristics

How DuckDB Optimizes Cross-Database Joins:
- Filter Pushdown: Predicates are pushed down to the source database when possible
  - A filter like `WHERE title = 'ACE GOLDFINGER'` gets executed at the source
  - Reduces data transfer by filtering early
- Scan Operations: Uses specialized scanners (`POSTGRES_SCAN`, `SQLITE_SCAN`, etc.)
  - Efficiently pulls data from each source
  - Minimizes network overhead
- Hash Joins: Combines results using optimized hash join algorithms
  - Fast in-memory combining of data from different sources
- Cost-Based Optimization: The query optimizer uses statistics to determine the optimal join order
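You can inspect these decisions yourself with `EXPLAIN`; a sketch reusing the attached databases from the cross-engine example above:

```sql
-- Show the physical plan, including the source scanners and join order
EXPLAIN
SELECT u.username, o.order_date
FROM pg.users u
JOIN sqlite.orders o ON u.id = o.user_id;
```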
Example Query Plan:
```
┌─────────────────────────────────┐
│            HASH_JOIN            │
│       ┌────────┴────────┐       │
│  POSTGRES_SCAN    SQLITE_SCAN   │
└─────────────────────────────────┘
```

Read vs Write Constraints
- Reading: A single query can read from any number of attached databases ✅
- Writing: Each transaction can write to only ONE database ❌
```sql
-- This WORKS: read from multiple databases
SELECT * FROM db1.table1 JOIN db2.table2 USING (id);

-- This FAILS: write to multiple databases in one transaction
BEGIN;
INSERT INTO db1.table1 VALUES (1);
INSERT INTO db2.table2 VALUES (2); -- ERROR!
COMMIT;
```

Approach 4: Copy Between Databases
Transfer data between in-memory and persistent storage:
```sql
-- Save in-memory database to disk
ATTACH 'backup.db' AS backup;
COPY FROM DATABASE memory TO backup;
DETACH backup;

-- Load persistent data into memory
ATTACH 'source.db' AS source;
COPY FROM DATABASE source TO memory;
```
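For a single table rather than a whole database, a plain `CREATE TABLE ... AS SELECT` across the attached catalogs works too (the table names here are illustrative):

```sql
-- Persist one in-memory table into the attached file-backed database
ATTACH 'backup.db' AS backup;
CREATE TABLE backup.results AS
SELECT * FROM memory.working_data;
```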
Key Limitations & Considerations

Transaction Constraint
Critical: You can only write to ONE attached database per transaction:
```sql
-- This WORKS (single database)
BEGIN TRANSACTION;
INSERT INTO persistent_db.table1 VALUES (1);
INSERT INTO persistent_db.table2 VALUES (2);
COMMIT;

-- This FAILS (multiple databases)
BEGIN TRANSACTION;
INSERT INTO persistent_db.table1 VALUES (1);
INSERT INTO memory_db.table2 VALUES (2); -- ERROR!
COMMIT;
```
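The usual workaround is to split the writes into one transaction per target database:

```sql
-- WORKS: separate transactions, one target database each
BEGIN TRANSACTION;
INSERT INTO persistent_db.table1 VALUES (1);
COMMIT;

BEGIN TRANSACTION;
INSERT INTO memory_db.table2 VALUES (2);
COMMIT;
```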
Session Persistence

- ATTACH definitions are not persistent between sessions
- Databases must be re-attached when starting a new session (see the sketch after this list)
- Temporary tables are automatically dropped when the connection closes
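A common pattern is a small setup script replayed at the start of every session; a minimal sketch (file and alias names are illustrative):

```sql
-- setup.sql: run at session start, e.g. via the CLI's -init flag
ATTACH 'my_data.duckdb' AS persistent_db;
ATTACH ':memory:' AS memory_db;
SET search_path = 'persistent_db,memory_db';
```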
Memory Management
- Both persistent and in-memory databases can spill to disk for larger-than-memory workloads
- Temporary directory defaults to `{database_name}.tmp`
- Configure with `SET temp_directory = '/path/to/temp'`
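Both settings can be adjusted and inspected at runtime; for example:

```sql
SET memory_limit = '4GB';
SET temp_directory = '/tmp/duckdb_swap';

-- Verify the active values
SELECT current_setting('memory_limit'), current_setting('temp_directory');
```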
Use Cases
Development/Testing
```sql
-- Keep test data in memory for speed
ATTACH ':memory:' AS test_db;
ATTACH 'production.duckdb' AS prod;

-- Copy subset to test
CREATE TABLE test_db.customers AS
SELECT * FROM prod.customers LIMIT 1000;
```

ETL Pipelines
```sql
-- Stage transformations in memory
CREATE TEMP TABLE staging AS
SELECT * FROM source_data WHERE date >= '2024-01-01';

-- Heavy transformations in memory
CREATE TEMP TABLE transformed AS
SELECT * /* complex aggregations */ FROM staging;

-- Write final results to persistent storage
INSERT INTO final_table SELECT * FROM transformed;
```

Analytics Workspace
```sql
-- Persistent: historical data
CREATE TABLE historical_sales (date DATE, amount DECIMAL);

-- In-memory: session calculations
CREATE TEMP TABLE daily_metrics AS
SELECT date, SUM(amount) AS total
FROM historical_sales
GROUP BY date;
```

Performance Characteristics
| Aspect | In-Memory | Persistent | Temporary |
|---|---|---|---|
| Speed | Fastest | Moderate | Fast (memory) + spillable |
| Durability | Lost on close | Survives sessions | Lost on close |
| Memory usage | High | Low (cached) | High (spillable) |
| Best for | Temp calculations | Long-term storage | Session working data |