postgres-scan-memory-accumulation
Research Date: 2026-02-05
Problem
When using DuckDB with the postgres_scan extension to persist large datasets from DuckDB to PostgreSQL in batches, memory usage continuously increases without being released between batch operations. This can lead to:
- Out-of-memory (OOM) crashes on large datasets
- Unstable connections due to prolonged processing times
- Failed operations that are difficult to recover
Root Cause
This is a known memory leak in the DuckDB postgres_scan extension, documented in multiple GitHub issues:
Primary Issue
The postgres extension fails to call PQclear() on PGresult pointers in certain cases, particularly when executing COPY FROM operations. This means PostgreSQL query results are not properly cleaned up, causing memory to accumulate.
Related Issues
- Issue #274: Memory Leak in PostgreSQL Extension While Exporting Tables to Parquet
- Issue #158: postgres_scanner causes memory leak
- Issue #15171: Memory Leak in PostgreSQL Extension
- Issue #15176: Memory leak on repeated INSERT into persistent database
Why Batching Doesn’t Help
Expected behavior: Batching should process data in chunks, releasing memory between batches.
Actual behavior: Memory accumulates continuously regardless of batch size because:
- Query results from previous batches are retained in memory
- DuckDB appears to keep all previous query results that were inserted
- The postgres extension doesn’t properly release PostgreSQL result objects
As one user reported: “DuckDB runs out of memory anyway no matter how small the batches are.”
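Because the accumulation can be gradual, it helps to instrument the batch loop and watch resident memory directly rather than waiting for an OOM. A minimal stdlib sketch (`insert_fn` is a placeholder for whatever performs the actual batch insert; `ru_maxrss` is a Unix-only peak-RSS counter, reported in KB on Linux):

```python
import resource

def rss_mb() -> float:
    # ru_maxrss is KB on Linux (bytes on macOS); this assumes Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

def run_batches(batches, insert_fn):
    """Run insert_fn over each batch, recording peak RSS after each one.

    A roughly flat series suggests memory is released between batches;
    a steadily climbing series is the accumulation pattern described above.
    """
    readings = []
    for i, batch in enumerate(batches):
        insert_fn(batch)
        readings.append(rss_mb())
        print(f"batch {i}: ~{readings[-1]:.0f} MB peak resident")
    return readings

# Demo with a no-op insert; swap in your real postgres_scan batch insert
readings = run_batches([[1, 2], [3, 4], [5, 6]], insert_fn=lambda b: None)
```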
Workarounds
1. Subprocess Isolation (Most Reliable)
Run each batch in a separate subprocess so the OS cleans up memory when the process exits:
```python
import subprocess
import json
import os

def batch_insert_subprocess(batch_data, batch_num):
    """Execute DuckDB batch insert in an isolated subprocess."""
    sql = f"""
    ATTACH 'postgresql://user:pass@host/db' AS pg (TYPE POSTGRES);
    INSERT INTO pg.target_table
    SELECT * FROM read_json_auto('batch_{batch_num}.json');
    """

    # Write batch data to a temp file
    with open(f'batch_{batch_num}.json', 'w') as f:
        json.dump(batch_data, f)

    # Run the DuckDB CLI in a subprocess
    result = subprocess.run(
        ['duckdb', '-c', sql],
        capture_output=True, text=True
    )

    # Clean up the temp file
    os.remove(f'batch_{batch_num}.json')

    return result.returncode == 0

# Process batches (data_batches is your iterable of batches)
for i, batch in enumerate(data_batches):
    success = batch_insert_subprocess(batch, i)
    if not success:
        print(f"Batch {i} failed")
```
Pros:
- OS guarantees complete memory cleanup
- Isolates failures to individual batches
- Most reliable workaround
Cons:
- Higher overhead from process creation
- Slower than in-process batching
- Requires file-based data passing
2. Memory Limit Configuration
Set a conservative memory limit (50-60% of system RAM) to force earlier disk spilling:
```sql
-- Set at connection start
SET memory_limit = '8GB';  -- For 16GB system
SET temp_directory = '/fast/ssd/duckdb_tmp';
```
Rationale: Some DuckDB operations bypass the buffer manager and can exceed the configured limit. Setting a lower limit provides headroom.
Pros:
- Simple configuration change
- Prevents complete OOM crashes
- Works with existing code
Cons:
- Doesn’t fix the leak, just delays it
- May cause disk spilling, slowing operations
- Will still fail on a sufficiently large dataset
3. Clear Postgres Cache
Call pg_clear_cache() between batches to clear the caches the postgres extension keeps internally:
```sql
ATTACH 'postgresql://host/db' AS pg (TYPE POSTGRES);

-- Batch 1
INSERT INTO pg.target_table SELECT * FROM batch_1;
SELECT pg_clear_cache();

-- Batch 2
INSERT INTO pg.target_table SELECT * FROM batch_2;
SELECT pg_clear_cache();
```
Pros:
- May help with cache-related memory growth
- Simple to add to existing batch loops
Cons:
- Unclear if this addresses the core leak
- Limited documentation on effectiveness
- May not help with PGresult pointer leaks
4. Copy Data Locally First (Recommended for Read Operations)
Instead of repeatedly querying PostgreSQL, create a local DuckDB copy first:
```sql
-- One-time copy to local DuckDB
ATTACH 'postgresql://host/db' AS pg (TYPE POSTGRES);
ATTACH 'local_copy.duckdb' AS local;

CREATE TABLE local.staging AS
SELECT * FROM pg.source_table;

DETACH pg;

-- Now work with local data
-- Generate/transform records in DuckDB
CREATE TABLE local.generated_data AS
SELECT /* your transformations */ FROM local.staging;

-- Batch insert back to Postgres (memory still accumulates, but fewer operations)
ATTACH 'postgresql://host/db' AS pg (TYPE POSTGRES);
INSERT INTO pg.target_table SELECT * FROM local.generated_data;
```
Pros:
- Reduces postgres_scan usage
- Faster transformations on local data
- More stable (no network issues)
Cons:
- Requires disk space for local copy
- Initial copy still subject to leak
- Not suitable if source data changes frequently
5. Explicit Checkpointing (May Help with DuckDB-side Memory)
Force checkpoints to flush WAL and potentially release some memory:
```sql
-- After each batch
INSERT INTO target_table SELECT * FROM batch_data;
CHECKPOINT;  -- or FORCE CHECKPOINT
```
Configuration:
```sql
SET checkpoint_threshold = '16MB';  -- Default, adjust as needed
```
Note: This addresses DuckDB's internal memory management but does not fix the postgres_scan leak. It may help with write-heavy operations to persistent DuckDB databases.
Pros:
- Flushes WAL to disk
- May reclaim some buffer memory
- Useful for persistent DuckDB operations
Cons:
- Doesn’t fix postgres extension memory leak
- Adds I/O overhead
- Limited benefit for the core issue
6. Use Direct PostgreSQL Connection (Alternative Approach)
Bypass DuckDB’s postgres_scan entirely for large write operations:
```python
import duckdb
import psycopg2

# Generate data in DuckDB
conn_duck = duckdb.connect()
conn_duck.execute("CREATE TABLE generated AS SELECT /* ... */")

# Export to CSV (PostgreSQL's COPY cannot read Parquet directly)
conn_duck.execute("COPY generated TO 'batch.csv' (FORMAT CSV, HEADER)")

# Use the native PostgreSQL client to import
conn_pg = psycopg2.connect("postgresql://host/db")
cursor = conn_pg.cursor()

# Bulk load via COPY FROM STDIN
with open('batch.csv', 'r') as f:
    cursor.copy_expert("COPY target_table FROM STDIN WITH (FORMAT csv, HEADER)", f)

conn_pg.commit()
```
Pros:
- Completely avoids postgres_scan memory leak
- Uses native PostgreSQL drivers
- More control over connection management
Cons:
- More complex code
- Requires intermediate files
- Loses convenience of DuckDB’s integrated approach
Expected Behavior vs Reality
| Aspect | Expected | Reality |
|---|---|---|
| Batch memory | Released after each batch | Accumulates continuously |
| Batch size impact | Smaller = less memory | No significant effect |
| Long operations | Stable with proper batching | Becomes unstable, OOMs |
| Connection stability | Improved with batching | Still vulnerable due to duration |
Status and Fix Timeline
Current Status (as of Feb 2026):
- Issue is acknowledged by DuckDB team
- Multiple related GitHub issues open
- Some fixes merged for pg_duckdb (reverse direction: Postgres → DuckDB)
- Core postgres_scan leak remains unfixed
Recent Fixes (pg_duckdb, different from postgres_scan):
- Memory leak on query failure
- Memory leak when reading LIST/JSON/JSONB columns from Postgres
Recommendation: Monitor the GitHub issues listed under Related Issues and Sources for updates.
Best Practice Recommendations
For Large Dataset Migrations
- Use subprocess isolation (Workaround #1) - most reliable
- Set conservative memory limits (Workaround #2) - safety net
- Monitor memory usage - detect issues early
- Implement retry logic - handle failures gracefully
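The retry-logic point can be combined with subprocess isolation in one helper: each attempt gets a fresh process, so a failed or leaky attempt cannot poison the next one. A sketch (the DuckDB usage shown, e.g. `['duckdb', '-c', sql]`, assumes the `duckdb` CLI is on PATH):

```python
import subprocess
import time

def run_with_retry(argv, max_attempts=3, backoff_s=2.0):
    """Run a command (e.g. ['duckdb', '-c', sql]) with retries.

    Each attempt is a brand-new process, so the OS reclaims all memory
    between attempts; stderr is surfaced for diagnosis.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}/{max_attempts} failed: {result.stderr.strip()}")
        if attempt < max_attempts:
            time.sleep(backoff_s * attempt)  # linear backoff between retries
    return False
```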
For Ongoing Operations
- Copy data locally first (Workaround #4) - reduce postgres_scan usage
- Consider direct PostgreSQL connection (Workaround #6) for writes
- Use postgres_scan primarily for read operations
- Keep operations short-lived when using postgres_scan
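The "keep operations short-lived" guideline can be packaged as a small context manager that attaches the Postgres database only for the duration of one operation (the helper name is ours, not a DuckDB API; narrowing the attach scope bounds per-attachment state but does not guarantee leaked PGresult memory is reclaimed):

```python
from contextlib import contextmanager

@contextmanager
def short_lived_attach(conn, dsn, alias="pg"):
    """Attach a Postgres database for one operation, then detach.

    conn is a DuckDB connection (anything with an execute(sql) method).
    """
    conn.execute(f"ATTACH '{dsn}' AS {alias} (TYPE POSTGRES)")
    try:
        yield alias
    finally:
        conn.execute(f"DETACH {alias}")
```

Usage would look like `with short_lived_attach(duckdb.connect(), dsn) as pg: ...`, with the attached catalog released as soon as the block exits.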
General Guidelines
```python
# Example: Robust batch processing with subprocess isolation
import os
import subprocess
import time

import duckdb

def process_large_dataset():
    # Generate batches in the main process
    conn = duckdb.connect()
    conn.execute("CREATE TABLE source AS SELECT /* large dataset */")

    total_rows = conn.execute("SELECT COUNT(*) FROM source").fetchone()[0]
    batch_size = 10000
    num_batches = (total_rows + batch_size - 1) // batch_size

    print(f"Processing {total_rows} rows in {num_batches} batches")

    for i in range(num_batches):
        offset = i * batch_size

        # Export the batch to a temp file
        conn.execute(f"""
            COPY (SELECT * FROM source LIMIT {batch_size} OFFSET {offset})
            TO 'batch_{i}.parquet'
        """)

        # Process in a subprocess
        result = subprocess.run([
            'duckdb', '-c', f"""
            ATTACH 'postgresql://host/db' AS pg (TYPE POSTGRES);
            INSERT INTO pg.target_table
            SELECT * FROM read_parquet('batch_{i}.parquet');
            """
        ], capture_output=True, text=True)

        # Clean up
        os.remove(f'batch_{i}.parquet')

        if result.returncode != 0:
            print(f"Batch {i} failed: {result.stderr}")
            # Implement retry logic here
        else:
            print(f"Batch {i}/{num_batches} completed")

        time.sleep(0.1)  # Brief pause to avoid overwhelming PostgreSQL
```
Related Issues
DuckDB Memory Management in General
For background on DuckDB's memory management system, see the Memory Management in DuckDB and Out of Memory Errors guides listed under Sources.
Other Known Memory Issues
- Issue #10034: OOM error after repeated queries
- Issue #8061: Memory leak on table insert from parquet
- Issue #3610: Memory leak with multiple queries using same JDBC Statement
Summary
The postgres_scan memory accumulation issue is a known bug, not expected behavior. Batching alone will not solve it because the underlying memory leak prevents proper cleanup between batches.
Most effective workaround: Use subprocess isolation to guarantee OS-level memory cleanup.
Alternative: Minimize postgres_scan usage by copying data locally first, or use native PostgreSQL clients for large write operations.
Timeline: Monitor GitHub issues for official fixes. In the meantime, implement workarounds based on your use case.
Sources
- Memory Leak in PostgreSQL Extension While Exporting Tables to Parquet - Issue #274
- postgres_scanner causes memory leak - Issue #158
- Memory Leak in PostgreSQL Extension - Issue #15171
- Memory leak on repeated INSERT - Issue #15176
- Out of Memory Errors Guide
- Memory Management in DuckDB
- CHECKPOINT Statement Documentation
- Pragmas Documentation
- Tuning Workloads Guide
- pg_duckdb Releases