Research Date: 2025-11-04

Question

Can DuckDB use a hybrid mode where some tables are in-memory and some are persisted to disk within the same database session?

Answer

Yes, DuckDB supports hybrid storage through multiple approaches:

Approach 1: ATTACH Multiple Databases

You can attach both in-memory and persistent databases in the same session:

-- Attach persistent database
ATTACH 'my_data.duckdb' AS persistent_db;
-- Attach in-memory database
ATTACH ':memory:' AS memory_db;
-- Create tables in each
CREATE TABLE persistent_db.customers (id INT, name VARCHAR);
CREATE TABLE memory_db.temp_results (id INT, score FLOAT);
-- Query across both databases
SELECT c.*, t.score
FROM persistent_db.customers c
JOIN memory_db.temp_results t ON c.id = t.id;

Switching Between Databases

-- Switch active database
USE memory_db;
-- Now tables are created here by default
CREATE TABLE working_data (x INT);
-- Switch back to persistent
USE persistent_db;
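
To confirm which database is active at any point, DuckDB's current_database() function can be queried (a quick check for the session above):

SELECT current_database(); -- persistent_db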

Search Path

Simplify queries by setting a search path:

SET search_path = 'persistent_db,memory_db';
-- Now can omit database prefix
SELECT * FROM customers; -- Uses persistent_db.customers

Approach 2: Temporary Tables

Create temporary tables within a persistent database:

-- Regular table (persists to disk)
CREATE TABLE persistent_data (id INT, value VARCHAR);
-- Temporary table (stays in memory, spills to disk only if needed)
CREATE TEMP TABLE temp_calculations (id INT, result FLOAT);
-- Or use full syntax
CREATE TEMPORARY TABLE staging_data AS
SELECT * FROM persistent_data WHERE value LIKE 'temp%';

Temporary Table Behavior

  • Storage: Resides in memory by default
  • Spilling: Automatically spills to disk if memory constrained
  • Location: Uses the temp_directory setting (defaults to ⟨database_name⟩.tmp alongside the database file)
  • Lifecycle: Automatically dropped when connection closes
  • Configuration:
    SET temp_directory = '/tmp/duckdb_swap';
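
As a sketch of how spilling is triggered, capping the memory limit forces oversized temporary tables into the configured directory (the limit and path below are illustrative):

SET memory_limit = '2GB';
SET temp_directory = '/tmp/duckdb_swap';
-- Larger-than-limit temp data now spills to /tmp/duckdb_swap
CREATE TEMP TABLE big_scratch AS
SELECT * FROM range(100000000) t(i);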

Approach 3: Cross-Database Joins

Yes, you can join across attached databases! This is a core feature of DuckDB’s multi-database support.

Basic Cross-Database Join

-- Attach databases
ATTACH 'sales.duckdb' AS sales;
ATTACH ':memory:' AS analytics;
-- Join across them
SELECT
c.customer_name,
c.email,
a.total_spend,
a.visit_count
FROM sales.customers c
JOIN analytics.customer_metrics a ON c.id = a.customer_id
WHERE a.total_spend > 1000;

Multi-Way Joins Across Databases

Join data from multiple databases simultaneously:

ATTACH 'transactions.duckdb' AS txn;
ATTACH 'reference.duckdb' AS ref;
ATTACH ':memory:' AS workspace;
SELECT
t.transaction_id,
t.amount,
c.category_name,
w.calculated_score
FROM txn.transactions t
JOIN ref.categories c ON t.category_id = c.id
JOIN workspace.scores w ON t.id = w.transaction_id
WHERE t.date >= '2024-01-01';

Cross-Database Engine Joins

DuckDB can join across different database engines in a single query:

-- Attach different database types
ATTACH 'postgres://localhost/production' AS pg (TYPE POSTGRES);
ATTACH 'legacy.sqlite' AS sqlite (TYPE SQLITE);
ATTACH 'warehouse.duckdb' AS duck;
ATTACH ':memory:' AS mem; -- 'temp' is reserved for temporary objects, so use another alias
-- Join them all together
SELECT
u.username,
o.order_date,
p.product_name,
t.enrichment_data
FROM pg.users u
JOIN sqlite.orders o ON u.id = o.user_id
JOIN duck.products p ON o.product_id = p.id
JOIN mem.enrichments t ON o.id = t.order_id
WHERE o.status = 'completed';
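
Note that the Postgres and SQLite scanners ship as extensions, so they must be installed and loaded before the ATTACH ... (TYPE ...) statements above will succeed:

INSTALL postgres;
LOAD postgres;
INSTALL sqlite;
LOAD sqlite;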

Performance Characteristics

How DuckDB Optimizes Cross-Database Joins:

  1. Filter Pushdown: Predicates are pushed down to the source databases when possible
    • WHERE title = 'ACE GOLDFINGER' gets executed at the source
    • Reduces data transfer by filtering early
  2. Scan Operations: Uses specialized scanners (POSTGRES_SCAN, SQLITE_SCAN, etc.)
    • Efficiently pulls data from each source
    • Minimizes network overhead
  3. Hash Joins: Combines results using DuckDB's hash join operators
    • Fast in-memory combining of data from different sources
  4. Cost-Based Optimization: The query optimizer uses table statistics to choose a join order
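
To check where a filter actually executes, run EXPLAIN on the cross-database query; the plan shows which predicates reach the scan operators (the tables and columns below are illustrative):

EXPLAIN
SELECT f.title
FROM pg.film f
JOIN sqlite.inventory i ON f.film_id = i.film_id
WHERE f.title = 'ACE GOLDFINGER';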

Example Query Plan:

HASH_JOIN
├── POSTGRES_SCAN
└── SQLITE_SCAN

Read vs Write Constraints

  • Reading: A single query can read from any number of attached databases ✅
  • Writing: A single transaction can write to only ONE attached database ❌

-- This WORKS - read from multiple databases
SELECT * FROM db1.table1 JOIN db2.table2 USING (id);
-- This FAILS - write to multiple databases
BEGIN;
INSERT INTO db1.table1 VALUES (1);
INSERT INTO db2.table2 VALUES (2); -- ERROR!
COMMIT;
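
If both databases must be updated, the usual workaround is one transaction per target; this is a sketch without cross-database atomicity, since each COMMIT is independent:

-- Works: separate transactions, one write target each
BEGIN;
INSERT INTO db1.table1 VALUES (1);
COMMIT;
BEGIN;
INSERT INTO db2.table2 VALUES (2);
COMMIT;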

Approach 4: Copy Between Databases

Transfer data between in-memory and persistent storage:

-- Save in-memory database to disk
ATTACH 'backup.db' AS backup;
COPY FROM DATABASE memory TO backup;
DETACH backup;
-- Load persistent data into memory
ATTACH 'source.db' AS source;
COPY FROM DATABASE source TO memory;
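
For individual tables rather than a whole catalog, a plain CREATE TABLE ... AS SELECT across databases works as well (table names here are illustrative, and both databases must still be attached):

-- Persist a single in-memory table
CREATE TABLE source.results AS SELECT * FROM memory.working_set;
-- Cache a single persistent table in memory
CREATE TABLE memory.hot_cache AS SELECT * FROM source.big_table;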

Key Limitations & Considerations

Transaction Constraint

Critical: You can only write to ONE attached database per transaction:

-- This WORKS (single database)
BEGIN TRANSACTION;
INSERT INTO persistent_db.table1 VALUES (1);
INSERT INTO persistent_db.table2 VALUES (2);
COMMIT;
-- This FAILS (multiple databases)
BEGIN TRANSACTION;
INSERT INTO persistent_db.table1 VALUES (1);
INSERT INTO memory_db.table2 VALUES (2); -- ERROR!
COMMIT;

Session Persistence

  • ATTACH definitions are not persistent between sessions
  • Must re-attach databases when starting a new session
  • Temporary tables are automatically dropped when connection closes
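
A common pattern is to keep the ATTACH statements in a small setup script and replay it at session start, for example via the CLI's -init flag (the file name is illustrative):

-- attach_all.sql
ATTACH 'my_data.duckdb' AS persistent_db;
ATTACH ':memory:' AS memory_db;

-- then launch with: duckdb -init attach_all.sql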

Memory Management

  • Both persistent and in-memory databases can spill to disk for larger-than-memory workloads
  • Temporary directory defaults to ⟨database_name⟩.tmp (or .tmp for in-memory databases)
  • Configure with SET temp_directory = '/path/to/temp'
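
The effective values can be verified at runtime through the duckdb_settings() table function:

SELECT name, value
FROM duckdb_settings()
WHERE name IN ('memory_limit', 'temp_directory');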

Use Cases

Development/Testing

-- Keep test data in memory for speed
ATTACH ':memory:' AS test_db;
ATTACH 'production.duckdb' AS prod;
-- Copy subset to test
CREATE TABLE test_db.customers AS
SELECT * FROM prod.customers LIMIT 1000;

ETL Pipelines

-- Stage transformations in memory
CREATE TEMP TABLE staging AS
SELECT * FROM source_data WHERE date >= '2024-01-01';
-- Heavy transformations in memory
CREATE TEMP TABLE transformed AS
SELECT date, COUNT(*) AS row_count -- stand-in for the heavy aggregations
FROM staging
GROUP BY date;
-- Write final results to persistent storage
INSERT INTO final_table SELECT * FROM transformed;

Analytics Workspace

-- Persistent: historical data
CREATE TABLE historical_sales (date DATE, amount DECIMAL);
-- In-memory: session calculations
CREATE TEMP TABLE daily_metrics AS
SELECT date, SUM(amount) as total
FROM historical_sales
GROUP BY date;

Storage Mode Comparison

Aspect       | In-Memory          | Persistent         | Temporary
-------------|--------------------|--------------------|---------------------------
Speed        | Fastest            | Moderate           | Fast (memory) + spillable
Durability   | Lost on close      | Survives sessions  | Lost on close
Memory usage | High               | Low (cached)       | High (spillable)
Best for     | Temp calculations  | Long-term storage  | Session working data
