postgres-scan-memory-accumulation
Research Date: 2026-02-05
Problem
When using DuckDB with the postgres_scan extension to persist large datasets from DuckDB to PostgreSQL in batches, memory usage continuously increases without being released between batch operations. This can lead to:
- Out-of-memory (OOM) crashes on large datasets
- Unstable connections due to prolonged processing times
- Failed operations that are difficult to recover
Root Cause
This is a known memory leak in the DuckDB postgres_scan extension, documented in multiple GitHub issues:
Primary Issue
The postgres extension fails to call PQclear() on PGresult pointers in certain cases, particularly when executing COPY FROM operations. This means PostgreSQL query results are not properly cleaned up, causing memory to accumulate.
Related Issues
- Issue #274: Memory Leak in PostgreSQL Extension While Exporting Tables to Parquet
- Issue #158: postgres_scanner causes memory leak
- Issue #15171: Memory Leak in PostgreSQL Extension
- Issue #15176: Memory leak on repeated INSERT into persistent database
Why Batching Doesn’t Help
Expected behavior: Batching should process data in chunks, releasing memory between batches.
Actual behavior: Memory accumulates continuously regardless of batch size because:
- Query results from previous batches are retained in memory
- DuckDB appears to keep all previous query results that were inserted
- The postgres extension doesn’t properly release PostgreSQL result objects
As one user reported: “DuckDB runs out of memory anyway no matter how small the batches are.”
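Because the accumulation can be gradual, it helps to instrument the batch loop and watch resident memory directly rather than waiting for an OOM. A minimal stdlib sketch (`insert_fn` is a placeholder for whatever performs the actual batch insert; `ru_maxrss` is a Unix-only peak-RSS counter, reported in KB on Linux):

```python
import resource

def rss_mb() -> float:
    # ru_maxrss is KB on Linux (bytes on macOS); this assumes Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

def run_batches(batches, insert_fn):
    """Run insert_fn over each batch, recording peak RSS after each one.

    A roughly flat series suggests memory is released between batches;
    a steadily climbing series is the accumulation pattern described above.
    """
    readings = []
    for i, batch in enumerate(batches):
        insert_fn(batch)
        readings.append(rss_mb())
        print(f"batch {i}: ~{readings[-1]:.0f} MB peak resident")
    return readings

# Demo with a no-op insert; swap in your real postgres_scan batch insert
readings = run_batches([[1, 2], [3, 4], [5, 6]], insert_fn=lambda b: None)
```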
Workarounds
1. Subprocess Isolation (Most Reliable)
Run each batch in a separate subprocess so the OS cleans up memory when the process exits:
```python
import subprocess
import json
import os

def batch_insert_subprocess(batch_data, batch_num):
    """Execute DuckDB batch insert in an isolated subprocess."""
    sql = f"""
    ATTACH 'postgresql://user:pass@host/db' AS pg (TYPE POSTGRES);
    INSERT INTO pg.target_table
    SELECT * FROM read_json_auto('batch_{batch_num}.json');
    """

    # Write batch data to a temp file
    with open(f'batch_{batch_num}.json', 'w') as f:
        json.dump(batch_data, f)

    # Run the DuckDB CLI in a subprocess
    result = subprocess.run(
        ['duckdb', '-c', sql],
        capture_output=True, text=True
    )

    # Clean up the temp file
    os.remove(f'batch_{batch_num}.json')

    return result.returncode == 0

# Process batches (data_batches is your iterable of batches)
for i, batch in enumerate(data_batches):
    success = batch_insert_subprocess(batch, i)
    if not success:
        print(f"Batch {i} failed")
```
Pros:
- OS guarantees complete memory cleanup
- Isolates failures to individual batches
- Most reliable workaround
Cons:
- Higher overhead from process creation
- Slower than in-process batching
- Requires file-based data passing
2. Memory Limit Configuration
Set a conservative memory limit (50-60% of system RAM) to force earlier disk spilling:
```sql
-- Set at connection start
SET memory_limit = '8GB';  -- For 16GB system
SET temp_directory = '/fast/ssd/duckdb_tmp';
```
Rationale: Some DuckDB operations bypass the buffer manager and can exceed the configured limit. Setting a lower limit provides headroom.
Pros:
- Simple configuration change
- Prevents complete OOM crashes
- Works with existing code
Cons:
- Doesn’t fix the leak, just delays it
- May cause disk spilling, slowing operations
- Will still fail on a sufficiently large dataset
3. Clear Postgres Cache
Call pg_clear_cache() between batches to clear the caches the postgres extension keeps internally:
```sql
ATTACH 'postgresql://host/db' AS pg (TYPE POSTGRES);

-- Batch 1
INSERT INTO pg.target_table SELECT * FROM batch_1;
SELECT pg_clear_cache();

-- Batch 2
INSERT INTO pg.target_table SELECT * FROM batch_2;
SELECT pg_clear_cache();
```
Pros:
- May help with cache-related memory growth
- Simple to add to existing batch loops
Cons:
- Unclear if this addresses the core leak
- Limited documentation on effectiveness
- May not help with PGresult pointer leaks
4. Copy Data Locally First (Recommended for Read Operations)
Instead of repeatedly querying PostgreSQL, create a local DuckDB copy first:
```sql
-- One-time copy to local DuckDB
ATTACH 'postgresql://host/db' AS pg (TYPE POSTGRES);
ATTACH 'local_copy.duckdb' AS local;

CREATE TABLE local.staging AS
SELECT * FROM pg.source_table;

DETACH pg;

-- Now work with local data
-- Generate/transform records in DuckDB
CREATE TABLE local.generated_data AS
SELECT /* your transformations */ FROM local.staging;

-- Batch insert back to Postgres (memory still accumulates, but fewer operations)
ATTACH 'postgresql://host/db' AS pg (TYPE POSTGRES);
INSERT INTO pg.target_table SELECT * FROM local.generated_data;
```
Pros:
- Reduces postgres_scan usage
- Faster transformations on local data
- More stable (no network issues)
Cons:
- Requires disk space for local copy
- Initial copy still subject to leak
- Not suitable if source data changes frequently
5. Explicit Checkpointing (May Help with DuckDB-side Memory)
Force checkpoints to flush WAL and potentially release some memory:
```sql
-- After each batch
INSERT INTO target_table SELECT * FROM batch_data;
CHECKPOINT;  -- or FORCE CHECKPOINT
```
Configuration:
```sql
SET checkpoint_threshold = '16MB';  -- Default, adjust as needed
```
Note: This addresses DuckDB's internal memory management but does not fix the postgres_scan leak. It may help with write-heavy operations to persistent DuckDB databases.
Pros:
- Flushes WAL to disk
- May reclaim some buffer memory
- Useful for persistent DuckDB operations
Cons:
- Doesn’t fix postgres extension memory leak
- Adds I/O overhead
- Limited benefit for the core issue
6. Use Direct PostgreSQL Connection (Alternative Approach)
Bypass DuckDB’s postgres_scan entirely for large write operations:
```python
import duckdb
import psycopg2

# Generate data in DuckDB
conn_duck = duckdb.connect()
conn_duck.execute("CREATE TABLE generated AS SELECT /* ... */")

# Export to CSV (PostgreSQL's COPY cannot read Parquet directly)
conn_duck.execute("COPY generated TO 'batch.csv' (FORMAT CSV, HEADER)")

# Use the native PostgreSQL client to import
conn_pg = psycopg2.connect("postgresql://host/db")
cursor = conn_pg.cursor()

# Bulk load via COPY FROM STDIN
with open('batch.csv', 'r') as f:
    cursor.copy_expert("COPY target_table FROM STDIN WITH (FORMAT csv, HEADER)", f)

conn_pg.commit()
```
Pros:
- Completely avoids postgres_scan memory leak
- Uses native PostgreSQL drivers
- More control over connection management
Cons:
- More complex code
- Requires intermediate files
- Loses convenience of DuckDB’s integrated approach
Expected Behavior vs Reality
| Aspect | Expected | Reality |
|---|---|---|
| Batch memory | Released after each batch | Accumulates continuously |
| Batch size impact | Smaller = less memory | No significant effect |
| Long operations | Stable with proper batching | Becomes unstable, OOMs |
| Connection stability | Improved with batching | Still vulnerable due to duration |
Status and Fix Timeline
Current Status (as of Feb 2026):
- Issue is acknowledged by DuckDB team
- Multiple related GitHub issues open
- Some fixes merged for pg_duckdb (reverse direction: Postgres → DuckDB)
- Core postgres_scan leak remains unfixed
Recent Fixes (pg_duckdb, different from postgres_scan):
- Memory leak on query failure
- Memory leak when reading LIST/JSON/JSONB columns from Postgres
Recommendation: Monitor the GitHub issues listed under Related Issues and Sources for updates.
Best Practice Recommendations
For Large Dataset Migrations
- Use subprocess isolation (Workaround #1) - most reliable
- Set conservative memory limits (Workaround #2) - safety net
- Monitor memory usage - detect issues early
- Implement retry logic - handle failures gracefully
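The retry-logic point can be combined with subprocess isolation in one helper: each attempt gets a fresh process, so a failed or leaky attempt cannot poison the next one. A sketch (the DuckDB usage shown, e.g. `['duckdb', '-c', sql]`, assumes the `duckdb` CLI is on PATH):

```python
import subprocess
import time

def run_with_retry(argv, max_attempts=3, backoff_s=2.0):
    """Run a command (e.g. ['duckdb', '-c', sql]) with retries.

    Each attempt is a brand-new process, so the OS reclaims all memory
    between attempts; stderr is surfaced for diagnosis.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}/{max_attempts} failed: {result.stderr.strip()}")
        if attempt < max_attempts:
            time.sleep(backoff_s * attempt)  # linear backoff between retries
    return False
```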
For Ongoing Operations
- Copy data locally first (Workaround #4) - reduce postgres_scan usage
- Consider direct PostgreSQL connection (Workaround #6) for writes
- Use postgres_scan primarily for read operations
- Keep operations short-lived when using postgres_scan
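The "keep operations short-lived" guideline can be packaged as a small context manager that attaches the Postgres database only for the duration of one operation (the helper name is ours, not a DuckDB API; narrowing the attach scope bounds per-attachment state but does not guarantee leaked PGresult memory is reclaimed):

```python
from contextlib import contextmanager

@contextmanager
def short_lived_attach(conn, dsn, alias="pg"):
    """Attach a Postgres database for one operation, then detach.

    conn is a DuckDB connection (anything with an execute(sql) method).
    """
    conn.execute(f"ATTACH '{dsn}' AS {alias} (TYPE POSTGRES)")
    try:
        yield alias
    finally:
        conn.execute(f"DETACH {alias}")
```

Usage would look like `with short_lived_attach(duckdb.connect(), dsn) as pg: ...`, with the attached catalog released as soon as the block exits.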
General Guidelines
```python
# Example: Robust batch processing with subprocess isolation
import os
import subprocess
import time

import duckdb

def process_large_dataset():
    # Generate batches in the main process
    conn = duckdb.connect()
    conn.execute("CREATE TABLE source AS SELECT /* large dataset */")

    total_rows = conn.execute("SELECT COUNT(*) FROM source").fetchone()[0]
    batch_size = 10000
    num_batches = (total_rows + batch_size - 1) // batch_size

    print(f"Processing {total_rows} rows in {num_batches} batches")

    for i in range(num_batches):
        offset = i * batch_size

        # Export the batch to a temp file
        conn.execute(f"""
            COPY (SELECT * FROM source LIMIT {batch_size} OFFSET {offset})
            TO 'batch_{i}.parquet'
        """)

        # Process in a subprocess
        result = subprocess.run([
            'duckdb', '-c', f"""
            ATTACH 'postgresql://host/db' AS pg (TYPE POSTGRES);
            INSERT INTO pg.target_table
            SELECT * FROM read_parquet('batch_{i}.parquet');
            """
        ], capture_output=True, text=True)

        # Clean up
        os.remove(f'batch_{i}.parquet')

        if result.returncode != 0:
            print(f"Batch {i} failed: {result.stderr}")
            # Implement retry logic here
        else:
            print(f"Batch {i}/{num_batches} completed")

        time.sleep(0.1)  # Brief pause to avoid overwhelming PostgreSQL
```
Related Issues
DuckDB Memory Management in General
For background on DuckDB's memory management system, see the Memory Management in DuckDB and Out of Memory Errors guides listed under Sources.
Other Known Memory Issues
- Issue #10034: OOM error after repeated queries
- Issue #8061: Memory leak on table insert from parquet
- Issue #3610: Memory leak with multiple queries using same JDBC Statement
Summary
The postgres_scan memory accumulation issue is a known bug, not expected behavior. Batching alone will not solve it because the underlying memory leak prevents proper cleanup between batches.
Most effective workaround: Use subprocess isolation to guarantee OS-level memory cleanup.
Alternative: Minimize postgres_scan usage by copying data locally first, or use native PostgreSQL clients for large write operations.
Timeline: Monitor GitHub issues for official fixes. In the meantime, implement workarounds based on your use case.
Sources
- Memory Leak in PostgreSQL Extension While Exporting Tables to Parquet - Issue #274
- postgres_scanner causes memory leak - Issue #158
- Memory Leak in PostgreSQL Extension - Issue #15171
- Memory leak on repeated INSERT - Issue #15176
- Out of Memory Errors Guide
- Memory Management in DuckDB
- CHECKPOINT Statement Documentation
- Pragmas Documentation
- Tuning Workloads Guide
- pg_duckdb Releases