Research Date: 2026-02-05

Problem

When using DuckDB's postgres_scan extension to persist large datasets to PostgreSQL in batches, memory usage increases continuously and is not released between batch operations. This can lead to:

  • Out-of-memory (OOM) crashes on large datasets
  • Unstable connections due to prolonged processing times
  • Failed operations that are difficult to recover from

Root Cause

This is a known memory leak in the DuckDB postgres_scan extension, documented in multiple GitHub issues:

Primary Issue

The postgres extension fails to call PQclear() on PGresult pointers in certain cases, particularly when executing COPY FROM operations. This means PostgreSQL query results are not properly cleaned up, causing memory to accumulate.

  • Issue 74: Memory Leak in PostgreSQL Extension While Exporting Tables to Parquet
  • Issue 58: postgres_scanner causes memory leak
  • Issue 5171: Memory Leak in PostgreSQL Extension
  • Issue 5176: Memory leak on repeated INSERT into persistent database

Why Batching Doesn’t Help

Expected behavior: Batching should process data in chunks, releasing memory between batches.

Actual behavior: Memory accumulates continuously regardless of batch size because:

  1. Query results from previous batches are retained in memory
  2. DuckDB appears to keep all previous query results that were inserted
  3. The postgres extension doesn’t properly release PostgreSQL result objects

As one user reported: “DuckDB runs out of memory anyway no matter how small the batches are.”
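One way to confirm the accumulation on your own workload is to log the process's peak resident set size after each batch: if the peak climbs steadily no matter how small the batches are, you are hitting this leak. A minimal stdlib sketch (the helper names `rss_kib` and `run_batches` are illustrative; the `resource` module is Unix-only, and `ru_maxrss` is reported in KiB on Linux but bytes on macOS):

```python
import resource

def rss_kib() -> int:
    """Peak resident set size so far (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def run_batches(batches, insert_fn, log=print):
    """Run insert_fn over each batch, recording peak RSS after each one.

    A steadily climbing peak regardless of batch size is the signature
    of the leak described above.
    """
    readings = []
    for i, batch in enumerate(batches):
        insert_fn(batch)
        readings.append(rss_kib())
        log(f"batch {i}: peak RSS ~{readings[-1]}")
    return readings
```

In real use, `insert_fn` would be the function that runs the `INSERT INTO pg.target_table ...` statement for one batch.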

Workarounds

1. Subprocess Isolation (Most Reliable)

Run each batch in a separate subprocess so the OS cleans up memory when the process exits:

#2965A8;--1:#80BBFF">import#2A2E3A;--1:#DADDE5"> subprocess
#2965A8;--1:#80BBFF">import#2A2E3A;--1:#DADDE5"> json
def#2A2E3A;--1:#DADDE5"> batch_insert_subprocess#2965A8;--1:#80BBFF">(batch_data#2965A8;--1:#80BBFF">,#2A2E3A;--1:#DADDE5"> batch_num#2965A8;--1:#80BBFF">):
#2965A8;--1:#80BBFF">"""#25723F;--1:#82D99F">Execute DuckDB batch insert in isolated subprocess#2965A8;--1:#80BBFF">"""
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">sql =#2A2E3A;--1:#DADDE5"> f#25723F;--1:#82D99F">"""
#25723F;--1:#82D99F"> #25723F;--1:#82D99F">ATTACH 'postgresql://user:pass@host/db' AS pg (TYPE POSTGRES);
#25723F;--1:#82D99F"> #25723F;--1:#82D99F">INSERT INTO pg.target_table
#25723F;--1:#82D99F"> #25723F;--1:#82D99F">SELECT * FROM read_json_auto('batch_#2965A8;--1:#80BBFF">{#2A2E3A;--1:#DADDE5">batch_num#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F">.json');
#25723F;--1:#82D99F"> #25723F;--1:#82D99F">"""
#606369;--1:#8E9299"># Write batch data to temp file
#2965A8;--1:#80BBFF">with#2A2E3A;--1:#DADDE5"> open#2965A8;--1:#80BBFF">(f#25723F;--1:#82D99F">'batch_#2965A8;--1:#80BBFF">{batch_num#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F">.json'#2965A8;--1:#80BBFF">, #2965A8;--1:#80BBFF">'#25723F;--1:#82D99F">w#2965A8;--1:#80BBFF">'#2965A8;--1:#80BBFF">)#2A2E3A;--1:#DADDE5"> #2965A8;--1:#80BBFF">as#2A2E3A;--1:#DADDE5"> f#2965A8;--1:#80BBFF">:
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">json#2965A8;--1:#80BBFF">.dump#2965A8;--1:#80BBFF">(batch_data#2965A8;--1:#80BBFF">, f#2965A8;--1:#80BBFF">)
#606369;--1:#8E9299"># Run DuckDB CLI in subprocess
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">result =#2A2E3A;--1:#DADDE5"> subprocess#2965A8;--1:#80BBFF">.run#2965A8;--1:#80BBFF">(
#2965A8;--1:#80BBFF">[#2965A8;--1:#80BBFF">'#25723F;--1:#82D99F">duckdb#2965A8;--1:#80BBFF">'#2965A8;--1:#80BBFF">, #2965A8;--1:#80BBFF">'#25723F;--1:#82D99F">-c#2965A8;--1:#80BBFF">'#2965A8;--1:#80BBFF">, sql#2965A8;--1:#80BBFF">],
capture_output=#2965A8;--1:#80BBFF">True,
text=#2965A8;--1:#80BBFF">True
#2965A8;--1:#80BBFF">)
#606369;--1:#8E9299"># Clean up temp file
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">os#2965A8;--1:#80BBFF">.remove#2965A8;--1:#80BBFF">(f#25723F;--1:#82D99F">'batch_#2965A8;--1:#80BBFF">{batch_num#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F">.json'#2965A8;--1:#80BBFF">)
#2965A8;--1:#80BBFF">return#2A2E3A;--1:#DADDE5"> result#2965A8;--1:#80BBFF">.#2A2E3A;--1:#DADDE5">returncode ==#2A2E3A;--1:#DADDE5"> 0
#606369;--1:#8E9299"># Process batches
#2965A8;--1:#80BBFF">for#2A2E3A;--1:#DADDE5"> i#2965A8;--1:#80BBFF">,#2A2E3A;--1:#DADDE5"> batch #2965A8;--1:#80BBFF">in#2A2E3A;--1:#DADDE5"> enumerate#2965A8;--1:#80BBFF">(data_batches#2965A8;--1:#80BBFF">):
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">success =#2A2E3A;--1:#DADDE5"> batch_insert_subprocess#2965A8;--1:#80BBFF">(batch#2965A8;--1:#80BBFF">, i#2965A8;--1:#80BBFF">)
#2965A8;--1:#80BBFF">if#2A2E3A;--1:#DADDE5"> not#2A2E3A;--1:#DADDE5"> success#2965A8;--1:#80BBFF">:
print#2965A8;--1:#80BBFF">(f#25723F;--1:#82D99F">"Batch #2965A8;--1:#80BBFF">{i#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F"> failed"#2965A8;--1:#80BBFF">)

Pros:

  • OS guarantees complete memory cleanup
  • Isolates failures to individual batches
  • Most reliable workaround

Cons:

  • Higher overhead from process creation
  • Slower than in-process batching
  • Requires file-based data passing
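If shelling out to the DuckDB CLI with JSON temp files is awkward, the same OS-level cleanup can be had by launching a fresh Python interpreter per batch. A sketch under the same assumptions as above (the connection string and table names are placeholders, and the `duckdb` Python package must be installed for the child process):

```python
import subprocess
import sys
import textwrap

def insert_batch_isolated(batch_file: str) -> bool:
    """Run one DuckDB-to-Postgres insert in a throwaway Python interpreter.

    The child exits after a single batch, so any memory leaked by the
    postgres extension is returned to the OS. Returns True on success.
    """
    code = textwrap.dedent(f"""
        import duckdb
        con = duckdb.connect()
        con.execute("ATTACH 'postgresql://user:pass@host/db' AS pg (TYPE POSTGRES)")
        con.execute("INSERT INTO pg.target_table "
                    "SELECT * FROM read_parquet('{batch_file}')")
    """)
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True)
    return result.returncode == 0
```

A non-zero exit code from the child surfaces as `False`, so the caller can retry or abort the failed batch.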

2. Memory Limit Configuration

Set a conservative memory limit (50-60% of system RAM) to force earlier disk spilling:

#606369;--1:#8E9299">-- Set at connection start
SET#2A2E3A;--1:#DADDE5"> memory_limit =#2A2E3A;--1:#DADDE5"> #2965A8;--1:#80BBFF">'#25723F;--1:#82D99F">8GB#2965A8;--1:#80BBFF">'#2A2E3A;--1:#DADDE5">; #606369;--1:#8E9299">-- For 16GB system
SET#2A2E3A;--1:#DADDE5"> temp_directory =#2A2E3A;--1:#DADDE5"> #2965A8;--1:#80BBFF">'#25723F;--1:#82D99F">/fast/ssd/duckdb_tmp#2965A8;--1:#80BBFF">'#2A2E3A;--1:#DADDE5">;

Rationale: Some DuckDB operations bypass the buffer manager and can exceed the configured limit. Setting a lower limit provides headroom.

Pros:

  • Simple configuration change
  • Prevents complete OOM crashes
  • Works with existing code

Cons:

  • Doesn’t fix the leak, just delays it
  • May cause disk spilling, slowing operations
  • Still will fail on large enough datasets
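Rather than hard-coding '8GB', a small helper can derive the conservative limit from the machine's actual RAM. A stdlib-only sketch (`os.sysconf` is POSIX-only; the 0.5 default mirrors the 50-60% guidance above, and the helper name is illustrative):

```python
import os

def conservative_memory_limit(fraction: float = 0.5) -> str:
    """Return a DuckDB memory_limit string as a fraction of physical RAM.

    Leaving 40-50% headroom matters because some operations bypass
    DuckDB's buffer manager and can exceed the configured limit.
    """
    total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    total_gb = total_bytes / 1024**3
    return f"{max(1, int(total_gb * fraction))}GB"

# e.g. con.execute(f"SET memory_limit = '{conservative_memory_limit()}'")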

3. Clear Postgres Cache

Use pg_clear_cache() between batches to clear the postgres extension's internal caches:

ATTACH 'postgresql://host/db' AS pg (TYPE POSTGRES);
-- Batch 1
INSERT INTO pg.target_table SELECT * FROM batch_1;
SELECT pg_clear_cache();
-- Batch 2
INSERT INTO pg.target_table SELECT * FROM batch_2;
SELECT pg_clear_cache();

Pros:

  • May help with cache-related memory growth
  • Simple to add to existing batch loops

Cons:

  • Unclear if this addresses the core leak
  • Limited documentation on effectiveness
  • May not help with PGresult pointer leaks
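If you do try this workaround, keeping the interleaving in one helper makes it easy to remove later once the leak is fixed. A sketch: `execute` stands for any callable that runs a SQL string (such as a DuckDB connection's `.execute`), and the table names are placeholders:

```python
def insert_batches_with_cache_clear(execute, batch_tables):
    """Insert each staged batch table, clearing the postgres extension's
    caches between batches. Whether this mitigates the leak is uncertain."""
    for table in batch_tables:
        execute(f"INSERT INTO pg.target_table SELECT * FROM {table}")
        execute("SELECT pg_clear_cache();")
```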

4. Copy Data Locally First

Instead of repeatedly querying PostgreSQL, create a local DuckDB copy first:

#606369;--1:#8E9299">-- One-time copy to local DuckDB
ATTACH#2A2E3A;--1:#DADDE5"> #2965A8;--1:#80BBFF">'#25723F;--1:#82D99F">postgresql://host/db#2965A8;--1:#80BBFF">'#2A2E3A;--1:#DADDE5"> AS#2A2E3A;--1:#DADDE5"> pg (TYPE#2A2E3A;--1:#DADDE5"> POSTGRES);
ATTACH#2A2E3A;--1:#DADDE5"> #2965A8;--1:#80BBFF">'#25723F;--1:#82D99F">local_copy.duckdb#2965A8;--1:#80BBFF">'#2A2E3A;--1:#DADDE5"> AS#2A2E3A;--1:#DADDE5"> local#2A2E3A;--1:#DADDE5">;
CREATE#2A2E3A;--1:#DADDE5"> TABLE#2A2E3A;--1:#DADDE5"> local#2A2E3A;--1:#DADDE5">.staging AS
SELECT#2A2E3A;--1:#DADDE5"> *#2A2E3A;--1:#DADDE5"> FROM#2A2E3A;--1:#DADDE5"> pg.source_table;
#2A2E3A;--1:#DADDE5">DETACH pg;
#606369;--1:#8E9299">-- Now work with local data
#606369;--1:#8E9299">-- Generate/transform records in DuckDB
CREATE#2A2E3A;--1:#DADDE5"> TABLE#2A2E3A;--1:#DADDE5"> local#2A2E3A;--1:#DADDE5">.generated_data AS
SELECT#2A2E3A;--1:#DADDE5"> #606369;--1:#8E9299">/* your transformations */#2A2E3A;--1:#DADDE5"> FROM#2A2E3A;--1:#DADDE5"> local.staging;
#606369;--1:#8E9299">-- Batch insert back to Postgres (memory still accumulates, but fewer operations)
ATTACH#2A2E3A;--1:#DADDE5"> #2965A8;--1:#80BBFF">'#25723F;--1:#82D99F">postgresql://host/db#2965A8;--1:#80BBFF">'#2A2E3A;--1:#DADDE5"> AS#2A2E3A;--1:#DADDE5"> pg (TYPE#2A2E3A;--1:#DADDE5"> POSTGRES);
INSERT INTO#2A2E3A;--1:#DADDE5"> pg.target_table SELECT#2A2E3A;--1:#DADDE5"> *#2A2E3A;--1:#DADDE5"> FROM#2A2E3A;--1:#DADDE5"> local.generated_data;

Pros:

  • Reduces postgres_scan usage
  • Faster transformations on local data
  • More stable (no network issues)

Cons:

  • Requires disk space for local copy
  • Initial copy still subject to leak
  • Not suitable if source data changes frequently

5. Explicit Checkpointing (May Help with DuckDB-side Memory)

Force checkpoints to flush WAL and potentially release some memory:

#606369;--1:#8E9299">-- After each batch
INSERT INTO#2A2E3A;--1:#DADDE5"> target_table SELECT#2A2E3A;--1:#DADDE5"> *#2A2E3A;--1:#DADDE5"> FROM#2A2E3A;--1:#DADDE5"> batch_data;
CHECKPOINT#2A2E3A;--1:#DADDE5">; #606369;--1:#8E9299">-- or FORCE CHECKPOINT

Configuration:

SET checkpoint_threshold = '16MB';  -- default; adjust as needed

Note: This addresses DuckDB’s internal memory management but does not fix the postgres_scan leak. It may help with write-heavy operations to persistent DuckDB databases.

Pros:

  • Flushes WAL to disk
  • May reclaim some buffer memory
  • Useful for persistent DuckDB operations

Cons:

  • Doesn’t fix postgres extension memory leak
  • Adds I/O overhead
  • Limited benefit for the core issue

6. Use Direct PostgreSQL Connection (Alternative Approach)

Bypass DuckDB’s postgres_scan entirely for large write operations:

#2965A8;--1:#80BBFF">import#2A2E3A;--1:#DADDE5"> duckdb
#2965A8;--1:#80BBFF">import#2A2E3A;--1:#DADDE5"> psycopg2
#606369;--1:#8E9299"># Generate data in DuckDB
#2A2E3A;--1:#DADDE5">conn_duck =#2A2E3A;--1:#DADDE5"> duckdb#2965A8;--1:#80BBFF">.connect#2965A8;--1:#80BBFF">()
#2A2E3A;--1:#DADDE5">conn_duck#2965A8;--1:#80BBFF">.execute#2965A8;--1:#80BBFF">(#2965A8;--1:#80BBFF">"#25723F;--1:#82D99F">CREATE TABLE generated AS SELECT /* ... */#2965A8;--1:#80BBFF">"#2965A8;--1:#80BBFF">)
#606369;--1:#8E9299"># Export to Parquet
#2A2E3A;--1:#DADDE5">conn_duck#2965A8;--1:#80BBFF">.execute#2965A8;--1:#80BBFF">(#2965A8;--1:#80BBFF">"#25723F;--1:#82D99F">COPY generated TO 'batch.parquet'#2965A8;--1:#80BBFF">"#2965A8;--1:#80BBFF">)
#606369;--1:#8E9299"># Use native PostgreSQL client to import
#2A2E3A;--1:#DADDE5">conn_pg =#2A2E3A;--1:#DADDE5"> psycopg2#2965A8;--1:#80BBFF">.connect#2965A8;--1:#80BBFF">(#2965A8;--1:#80BBFF">"#25723F;--1:#82D99F">postgresql://host/db#2965A8;--1:#80BBFF">"#2965A8;--1:#80BBFF">)
#2A2E3A;--1:#DADDE5">cursor =#2A2E3A;--1:#DADDE5"> conn_pg#2965A8;--1:#80BBFF">.cursor#2965A8;--1:#80BBFF">()
#606369;--1:#8E9299"># Use COPY FROM or bulk insert
#2965A8;--1:#80BBFF">with#2A2E3A;--1:#DADDE5"> open#2965A8;--1:#80BBFF">(#2965A8;--1:#80BBFF">'#25723F;--1:#82D99F">batch.parquet#2965A8;--1:#80BBFF">'#2965A8;--1:#80BBFF">, #2965A8;--1:#80BBFF">'#25723F;--1:#82D99F">rb#2965A8;--1:#80BBFF">'#2965A8;--1:#80BBFF">)#2A2E3A;--1:#DADDE5"> #2965A8;--1:#80BBFF">as#2A2E3A;--1:#DADDE5"> f#2965A8;--1:#80BBFF">:
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">cursor#2965A8;--1:#80BBFF">.copy_expert#2965A8;--1:#80BBFF">(#2965A8;--1:#80BBFF">"#25723F;--1:#82D99F">COPY target_table FROM STDIN WITH (FORMAT parquet)#2965A8;--1:#80BBFF">"#2965A8;--1:#80BBFF">, f#2965A8;--1:#80BBFF">)
#2A2E3A;--1:#DADDE5">conn_pg#2965A8;--1:#80BBFF">.commit#2965A8;--1:#80BBFF">()

Pros:

  • Completely avoids postgres_scan memory leak
  • Uses native PostgreSQL drivers
  • More control over connection management

Cons:

  • More complex code
  • Requires intermediate files
  • Loses convenience of DuckDB’s integrated approach

Expected Behavior vs Reality

Aspect                 Expected                       Reality
Batch memory           Released after each batch      Accumulates continuously
Batch size impact      Smaller = less memory          No significant effect
Long operations        Stable with proper batching    Becomes unstable, OOMs
Connection stability   Improved with batching         Still vulnerable due to duration

Status and Fix Timeline

Current Status (as of Feb 2026):

  • Issue is acknowledged by DuckDB team
  • Multiple related GitHub issues open
  • Some fixes merged for pg_duckdb (reverse direction: Postgres → DuckDB)
  • Core postgres_scan leak remains unfixed

Recent Fixes (pg_duckdb, different from postgres_scan):

  • Memory leak on query failure
  • Memory leak when reading LIST/JSON/JSONB columns from Postgres

Recommendation: Monitor the GitHub issues listed above (74, 58, 5171, 5176) for updates.

Best Practice Recommendations

For Large Dataset Migrations

  1. Use subprocess isolation (Workaround 1) - most reliable
  2. Set conservative memory limits (Workaround 2) - safety net
  3. Monitor memory usage - detect issues early
  4. Implement retry logic - handle failures gracefully
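The retry logic in item 4 can be as simple as an exponential-backoff wrapper around whichever batch function you use (the helper name `with_retries` is illustrative):

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(); on exception, retry with exponential backoff.

    Re-raises the last exception once all attempts are exhausted,
    so callers still see permanent failures.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

For example, `with_retries(lambda: batch_insert_subprocess(batch, i))` retries a transiently failing batch before giving up.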

For Ongoing Operations

  1. Copy data locally first (Workaround 4) - reduce postgres_scan usage
  2. Consider direct PostgreSQL connection (Workaround 6) for writes
  3. Use postgres_scan primarily for read operations
  4. Keep operations short-lived when using postgres_scan

General Guidelines

#606369;--1:#8E9299"># Example: Robust batch processing with subprocess isolation
#2965A8;--1:#80BBFF">import#2A2E3A;--1:#DADDE5"> subprocess
#2965A8;--1:#80BBFF">import#2A2E3A;--1:#DADDE5"> duckdb
#2965A8;--1:#80BBFF">import#2A2E3A;--1:#DADDE5"> time
def#2A2E3A;--1:#DADDE5"> process_large_dataset#2965A8;--1:#80BBFF">():
#606369;--1:#8E9299"># Generate batches in main process
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">conn =#2A2E3A;--1:#DADDE5"> duckdb#2965A8;--1:#80BBFF">.connect#2965A8;--1:#80BBFF">()
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">conn#2965A8;--1:#80BBFF">.execute#2965A8;--1:#80BBFF">(#2965A8;--1:#80BBFF">"#25723F;--1:#82D99F">CREATE TABLE source AS SELECT /* large dataset */#2965A8;--1:#80BBFF">"#2965A8;--1:#80BBFF">)
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">total_rows =#2A2E3A;--1:#DADDE5"> conn#2965A8;--1:#80BBFF">.execute#2965A8;--1:#80BBFF">(#2965A8;--1:#80BBFF">"#25723F;--1:#82D99F">SELECT COUNT(*) FROM source#2965A8;--1:#80BBFF">"#2965A8;--1:#80BBFF">).fetchone#2965A8;--1:#80BBFF">()[0#2965A8;--1:#80BBFF">]
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">batch_size =#2A2E3A;--1:#DADDE5"> 10000
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">num_batches =#2A2E3A;--1:#DADDE5"> #2965A8;--1:#80BBFF">(#2A2E3A;--1:#DADDE5">total_rows +#2A2E3A;--1:#DADDE5"> batch_size -#2A2E3A;--1:#DADDE5"> 1#2965A8;--1:#80BBFF">)#2A2E3A;--1:#DADDE5"> //#2A2E3A;--1:#DADDE5"> batch_size
print#2965A8;--1:#80BBFF">(f#25723F;--1:#82D99F">"Processing #2965A8;--1:#80BBFF">{total_rows#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F"> rows in #2965A8;--1:#80BBFF">{num_batches#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F"> batches"#2965A8;--1:#80BBFF">)
#2965A8;--1:#80BBFF">for#2A2E3A;--1:#DADDE5"> i #2965A8;--1:#80BBFF">in#2A2E3A;--1:#DADDE5"> range#2965A8;--1:#80BBFF">(num_batches#2965A8;--1:#80BBFF">):
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">offset =#2A2E3A;--1:#DADDE5"> i *#2A2E3A;--1:#DADDE5"> batch_size
#606369;--1:#8E9299"># Export batch to temp file
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">conn#2965A8;--1:#80BBFF">.execute#2965A8;--1:#80BBFF">(f#25723F;--1:#82D99F">"""
#25723F;--1:#82D99F"> #25723F;--1:#82D99F">COPY (SELECT * FROM source LIMIT #2965A8;--1:#80BBFF">{batch_size#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F"> OFFSET #2965A8;--1:#80BBFF">{offset#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F">)
#25723F;--1:#82D99F"> #25723F;--1:#82D99F">TO 'batch_#2965A8;--1:#80BBFF">{i#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F">.parquet'
#25723F;--1:#82D99F"> #25723F;--1:#82D99F">"""#2965A8;--1:#80BBFF">)
#606369;--1:#8E9299"># Process in subprocess
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">result =#2A2E3A;--1:#DADDE5"> subprocess#2965A8;--1:#80BBFF">.run#2965A8;--1:#80BBFF">([
#2965A8;--1:#80BBFF">'#25723F;--1:#82D99F">duckdb#2965A8;--1:#80BBFF">'#2965A8;--1:#80BBFF">, #2965A8;--1:#80BBFF">'#25723F;--1:#82D99F">-c#2965A8;--1:#80BBFF">'#2965A8;--1:#80BBFF">, f#25723F;--1:#82D99F">"""
#25723F;--1:#82D99F"> #25723F;--1:#82D99F">ATTACH 'postgresql://host/db' AS pg (TYPE POSTGRES);
#25723F;--1:#82D99F"> #25723F;--1:#82D99F">INSERT INTO pg.target_table
#25723F;--1:#82D99F"> #25723F;--1:#82D99F">SELECT * FROM read_parquet('batch_#2965A8;--1:#80BBFF">{i#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F">.parquet');
#25723F;--1:#82D99F"> #25723F;--1:#82D99F">"""
#2965A8;--1:#80BBFF">], capture_output=#2965A8;--1:#80BBFF">True)
#606369;--1:#8E9299"># Clean up
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">os#2965A8;--1:#80BBFF">.remove#2965A8;--1:#80BBFF">(f#25723F;--1:#82D99F">'batch_#2965A8;--1:#80BBFF">{i#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F">.parquet'#2965A8;--1:#80BBFF">)
#2965A8;--1:#80BBFF">if#2A2E3A;--1:#DADDE5"> result#2965A8;--1:#80BBFF">.#2A2E3A;--1:#DADDE5">returncode !=#2A2E3A;--1:#DADDE5"> 0#2965A8;--1:#80BBFF">:
print#2965A8;--1:#80BBFF">(f#25723F;--1:#82D99F">"Batch #2965A8;--1:#80BBFF">{i#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F"> failed: #2965A8;--1:#80BBFF">{result#2965A8;--1:#80BBFF">.stderr#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F">"#2965A8;--1:#80BBFF">)
#606369;--1:#8E9299"># Implement retry logic here
#2965A8;--1:#80BBFF">else:
print#2965A8;--1:#80BBFF">(f#25723F;--1:#82D99F">"Batch #2965A8;--1:#80BBFF">{i#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F">/#2965A8;--1:#80BBFF">{num_batches#2965A8;--1:#80BBFF">}#25723F;--1:#82D99F"> completed"#2965A8;--1:#80BBFF">)
#2A2E3A;--1:#DADDE5"> #2A2E3A;--1:#DADDE5">time#2965A8;--1:#80BBFF">.sleep#2965A8;--1:#80BBFF">(0.1#2965A8;--1:#80BBFF">)#2A2E3A;--1:#DADDE5"> #606369;--1:#8E9299"># Brief pause to avoid overwhelming PostgreSQL

DuckDB Memory Management in General

For background on DuckDB's memory management, see the Memory Management and Out of Memory Errors guides listed under Sources.

Other Known Memory Issues

  • Issue 0034: OOM error after repeated queries
  • Issue : Memory leak on table insert from parquet
  • Issue : Memory leak with multiple queries using same JDBC Statement

Summary

The postgres_scan memory accumulation issue is a known bug, not expected behavior. Batching alone will not solve it because the underlying memory leak prevents proper cleanup between batches.

Most effective workaround: Use subprocess isolation to guarantee OS-level memory cleanup.

Alternative: Minimize postgres_scan usage by copying data locally first, or use native PostgreSQL clients for large write operations.

Timeline: Monitor GitHub issues for official fixes. In the meantime, implement workarounds based on your use case.

Sources

  1. Memory Leak in PostgreSQL Extension While Exporting Tables to Parquet - Issue 74
  2. postgres_scanner causes memory leak - Issue 58
  3. Memory Leak in PostgreSQL Extension - Issue 5171
  4. Memory leak on repeated INSERT - Issue 5176
  5. Out of Memory Errors Guide
  6. Memory Management in DuckDB
  7. CHECKPOINT Statement Documentation
  8. Pragmas Documentation
  9. Tuning Workloads Guide
  10. pg_duckdb Releases