Research Date: 2025-11-04

Question

Can DuckDB use a hybrid mode where some tables are in-memory and some are persisted to disk within the same database session?

Answer

Yes, DuckDB supports hybrid storage through multiple approaches:

Approach 1: ATTACH Multiple Databases

You can attach both in-memory and persistent databases in the same session:

-- Attach persistent database
ATTACH 'my_data.duckdb' AS persistent_db;
-- Attach in-memory database
ATTACH ':memory:' AS memory_db;
-- Create tables in each
CREATE TABLE persistent_db.customers (id INT, name VARCHAR);
CREATE TABLE memory_db.temp_results (id INT, score FLOAT);
-- Query across both databases
SELECT c.*, t.score
FROM persistent_db.customers c
JOIN memory_db.temp_results t ON c.id = t.id;

Switching Between Databases

-- Switch active database
USE memory_db;
-- Now tables are created here by default
CREATE TABLE working_data (x INT);
-- Switch back to persistent
USE persistent_db;
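
To confirm which database is active at any point, DuckDB's current_database() function can be queried (a quick check for the session above):

SELECT current_database(); -- persistent_db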

Search Path

Simplify queries by setting a search path:

SET search_path = 'persistent_db,memory_db';
-- Now can omit database prefix
SELECT * FROM customers; -- Uses persistent_db.customers

Approach 2: Temporary Tables

Create temporary tables within a persistent database:

-- Regular table (persists to disk)
CREATE TABLE persistent_data (id INT, value VARCHAR);
-- Temporary table (stays in memory, spills to disk only if needed)
CREATE TEMP TABLE temp_calculations (id INT, result FLOAT);
-- Or use full syntax
CREATE TEMPORARY TABLE staging_data AS
SELECT * FROM persistent_data WHERE value LIKE 'temp%';

Temporary Table Behavior

  • Storage: Resides in memory by default
  • Spilling: Automatically spills to disk if memory constrained
  • Location: Uses the temp_directory setting (defaults to ⟨database_name⟩.tmp alongside the database file)
  • Lifecycle: Automatically dropped when connection closes
  • Configuration:
    SET temp_directory = '/tmp/duckdb_swap';
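
As a sketch of how spilling is triggered, capping the memory limit forces oversized temporary tables into the configured directory (the limit and path below are illustrative):

SET memory_limit = '2GB';
SET temp_directory = '/tmp/duckdb_swap';
-- Larger-than-limit temp data now spills to /tmp/duckdb_swap
CREATE TEMP TABLE big_scratch AS
SELECT * FROM range(100000000) t(i);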

Approach 3: Cross-Database Joins

Yes, you can join across attached databases! This is a core feature of DuckDB’s multi-database support.

Basic Cross-Database Join

-- Attach databases
ATTACH 'sales.duckdb' AS sales;
ATTACH ':memory:' AS analytics;
-- Join across them
SELECT
c.customer_name,
c.email,
a.total_spend,
a.visit_count
FROM sales.customers c
JOIN analytics.customer_metrics a ON c.id = a.customer_id
WHERE a.total_spend > 1000;

Multi-Way Joins Across Databases

Join data from multiple databases simultaneously:

ATTACH 'transactions.duckdb' AS txn;
ATTACH 'reference.duckdb' AS ref;
ATTACH ':memory:' AS workspace;
SELECT
t.transaction_id,
t.amount,
c.category_name,
w.calculated_score
FROM txn.transactions t
JOIN ref.categories c ON t.category_id = c.id
JOIN workspace.scores w ON t.id = w.transaction_id
WHERE t.date >= '2024-01-01';

Cross-Database Engine Joins

DuckDB can join across different database engines in a single query:

-- Attach different database types
ATTACH 'postgres://localhost/production' AS pg (TYPE POSTGRES);
ATTACH 'legacy.sqlite' AS sqlite (TYPE SQLITE);
ATTACH 'warehouse.duckdb' AS duck;
ATTACH ':memory:' AS mem; -- 'temp' is reserved for temporary objects, so use another alias
-- Join them all together
SELECT
u.username,
o.order_date,
p.product_name,
t.enrichment_data
FROM pg.users u
JOIN sqlite.orders o ON u.id = o.user_id
JOIN duck.products p ON o.product_id = p.id
JOIN mem.enrichments t ON o.id = t.order_id
WHERE o.status = 'completed';
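
Note that the Postgres and SQLite scanners ship as extensions, so they must be installed and loaded before the ATTACH ... (TYPE ...) statements above will succeed:

INSTALL postgres;
LOAD postgres;
INSTALL sqlite;
LOAD sqlite;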

Performance Characteristics

How DuckDB Optimizes Cross-Database Joins:

  1. Filter Pushdown: Predicates are pushed down to the source databases when possible
    • WHERE title = 'ACE GOLDFINGER' gets executed at the source
    • Reduces data transfer by filtering early
  2. Scan Operations: Uses specialized scanners (POSTGRES_SCAN, SQLITE_SCAN, etc.)
    • Efficiently pulls data from each source
    • Minimizes network overhead
  3. Hash Joins: Combines results using DuckDB's hash join operators
    • Fast in-memory combining of data from different sources
  4. Cost-Based Optimization: The query optimizer uses table statistics to choose a join order
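
To check where a filter actually executes, run EXPLAIN on the cross-database query; the plan shows which predicates reach the scan operators (the tables and columns below are illustrative):

EXPLAIN
SELECT f.title
FROM pg.film f
JOIN sqlite.inventory i ON f.film_id = i.film_id
WHERE f.title = 'ACE GOLDFINGER';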

Example Query Plan:

HASH_JOIN
├── POSTGRES_SCAN
└── SQLITE_SCAN

Read vs Write Constraints

  • Reading: A single query can read from any number of attached databases ✅
  • Writing: A single transaction can write to only ONE attached database ❌

-- This WORKS - read from multiple databases
SELECT * FROM db1.table1 JOIN db2.table2 USING (id);
-- This FAILS - write to multiple databases
BEGIN;
INSERT INTO db1.table1 VALUES (1);
INSERT INTO db2.table2 VALUES (2); -- ERROR!
COMMIT;
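
If both databases must be updated, the usual workaround is one transaction per target; this is a sketch without cross-database atomicity, since each COMMIT is independent:

-- Works: separate transactions, one write target each
BEGIN;
INSERT INTO db1.table1 VALUES (1);
COMMIT;
BEGIN;
INSERT INTO db2.table2 VALUES (2);
COMMIT;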

Approach 4: Copy Between Databases

Transfer data between in-memory and persistent storage:

-- Save in-memory database to disk
ATTACH 'backup.db' AS backup;
COPY FROM DATABASE memory TO backup;
DETACH backup;
-- Load persistent data into memory
ATTACH 'source.db' AS source;
COPY FROM DATABASE source TO memory;
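
For individual tables rather than a whole catalog, a plain CREATE TABLE ... AS SELECT across databases works as well (table names here are illustrative, and both databases must still be attached):

-- Persist a single in-memory table
CREATE TABLE source.results AS SELECT * FROM memory.working_set;
-- Cache a single persistent table in memory
CREATE TABLE memory.hot_cache AS SELECT * FROM source.big_table;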

Key Limitations & Considerations

Transaction Constraint

Critical: You can only write to ONE attached database per transaction:

-- This WORKS (single database)
BEGIN TRANSACTION;
INSERT INTO persistent_db.table1 VALUES (1);
INSERT INTO persistent_db.table2 VALUES (2);
COMMIT;
-- This FAILS (multiple databases)
BEGIN TRANSACTION;
INSERT INTO persistent_db.table1 VALUES (1);
INSERT INTO memory_db.table2 VALUES (2); -- ERROR!
COMMIT;

Session Persistence

  • ATTACH definitions are not persistent between sessions
  • Must re-attach databases when starting a new session
  • Temporary tables are automatically dropped when connection closes
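
A common pattern is to keep the ATTACH statements in a small setup script and replay it at session start, for example via the CLI's -init flag (the file name is illustrative):

-- attach_all.sql
ATTACH 'my_data.duckdb' AS persistent_db;
ATTACH ':memory:' AS memory_db;

-- then launch with: duckdb -init attach_all.sql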

Memory Management

  • Both persistent and in-memory databases can spill to disk for larger-than-memory workloads
  • Temporary directory defaults to ⟨database_name⟩.tmp (or .tmp for in-memory databases)
  • Configure with SET temp_directory = '/path/to/temp'
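
The effective values can be verified at runtime through the duckdb_settings() table function:

SELECT name, value
FROM duckdb_settings()
WHERE name IN ('memory_limit', 'temp_directory');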

Use Cases

Development/Testing

-- Keep test data in memory for speed
ATTACH ':memory:' AS test_db;
ATTACH 'production.duckdb' AS prod;
-- Copy subset to test
CREATE TABLE test_db.customers AS
SELECT * FROM prod.customers LIMIT 1000;

ETL Pipelines

-- Stage transformations in memory
CREATE TEMP TABLE staging AS
SELECT * FROM source_data WHERE date >= '2024-01-01';
-- Heavy transformations in memory
CREATE TEMP TABLE transformed AS
SELECT date, COUNT(*) AS row_count -- stand-in for the heavy aggregations
FROM staging
GROUP BY date;
-- Write final results to persistent storage
INSERT INTO final_table SELECT * FROM transformed;

Analytics Workspace

-- Persistent: historical data
CREATE TABLE historical_sales (date DATE, amount DECIMAL);
-- In-memory: session calculations
CREATE TEMP TABLE daily_metrics AS
SELECT date, SUM(amount) as total
FROM historical_sales
GROUP BY date;

Storage Mode Comparison

Aspect       | In-Memory          | Persistent         | Temporary
-------------|--------------------|--------------------|---------------------------
Speed        | Fastest            | Moderate           | Fast (memory) + spillable
Durability   | Lost on close      | Survives sessions  | Lost on close
Memory usage | High               | Low (cached)       | High (spillable)
Best for     | Temp calculations  | Long-term storage  | Session working data
