Ephemeral File Context Proposal
Problem Statement
Claude Code sessions fill their 200k-token context window rapidly, with file reads accounting for roughly 40% of context (80,000+ tokens). Files are read repeatedly throughout a session, creating massive duplication:
```
Turn 1:  Read app.py (2,000 tokens) → injected into conversation
Turn 5:  Read app.py (2,000 tokens) → DUPLICATE
Turn 10: Read app.py (2,000 tokens) → DUPLICATE
Turn 15: Read app.py (2,000 tokens) → DUPLICATE
Total: 8,000 tokens for one file
```

This is the single biggest source of context bloat in coding sessions.
Proposed Solution
Implement ephemeral file context - a session-level cache that:
- Stores file contents separately from conversation history
- Updates files in-place rather than appending
- Automatically detects external file changes
- Leverages Anthropic’s prompt caching for 90% cost reduction
Architecture
Current Flow
```
[System Prompt]
[Conversation Thread]:
  - User: "Edit app.py"
  - Assistant: "Let me read it"
  - Tool Call: Read(app.py)
  - Tool Result: <2000 tokens of app.py>   ← injected into main thread
  - Assistant: "Here's the edit"
  - User: "Add another function"
  - Assistant: "Let me read it again"
  - Tool Call: Read(app.py)
  - Tool Result: <2000 tokens of app.py>   ← DUPLICATE
```

Proposed Flow

```
[System Prompt]
[Ephemeral File Context - CACHED]:
  app.py:   <2000 tokens, current state>
  utils.py: <1500 tokens, current state>
[Conversation Thread]:
  - User: "Edit app.py"
  - Assistant: "Let me read it" (cache reference only)
  - Tool Call: Read(app.py) → cached: true
  - Assistant: "Here's the edit"
  - User: "Add another function"
  - Assistant: (already has app.py from cache)
```

Implementation
Core Data Structure
```typescript
import * as fs from 'fs';
import { createHash } from 'crypto';

// MD5 helper used for change detection throughout this proposal
const md5 = (text: string) => createHash('md5').update(text).digest('hex');

interface FileCache {
  content: string;
  hash: string;      // MD5 hash for change detection
  lastRead: number;  // Timestamp
  path: string;      // Absolute path
  dirty?: boolean;   // Set by the optional file watcher (see below)
}

// Session-level cache
const fileCache = new Map<string, FileCache>();
```

File Tool Modifications
Read Tool
```typescript
function handleRead(filePath: string) {
  const content = fs.readFileSync(filePath, 'utf-8');
  const hash = md5(content);

  // Update cache
  fileCache.set(filePath, {
    content,
    hash,
    lastRead: Date.now(),
    path: filePath
  });

  // Return reference only (don't inject content into conversation)
  return { ref: filePath, cached: true, hash };
}
```

Write/Edit Tools
```typescript
function handleWrite(filePath: string, newContent: string) {
  // Perform file write
  fs.writeFileSync(filePath, newContent);

  // Update cache IN-PLACE (not appended)
  const hash = md5(newContent);
  fileCache.set(filePath, {
    content: newContent,
    hash,
    lastRead: Date.now(),
    path: filePath
  });

  return { ref: filePath, updated: true, hash };
}
```

Prompt Construction
```typescript
function buildPrompt() {
  // Validate cache before constructing prompt
  validateFileCache();

  return [
    // System prompt
    systemPrompt,

    // Ephemeral file context (cacheable)
    {
      type: "text",
      text: buildFileContext(fileCache),
      cache_control: { type: "ephemeral" }  // Anthropic prompt caching
    },

    // Conversation thread (no file contents)
    ...conversationHistory
  ];
}
```
```typescript
function buildFileContext(cache: Map<string, FileCache>): string {
  if (cache.size === 0) return "";

  let context = "=== Session File Context ===\n\n";

  for (const [path, { content, hash }] of cache) {
    context += `--- ${path} (${hash.slice(0, 8)}) ---\n`;
    context += content;
    context += "\n\n";
  }

  return context;
}
```
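For concreteness, here is a minimal sketch of how the prompt built above could be sent with the official @anthropic-ai/sdk, passing the file context as a cacheable system block. The model name, max_tokens value, and the sendTurn wrapper are illustrative assumptions, not part of the existing Claude Code implementation:

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Hypothetical wrapper: the ephemeral file context goes into a cacheable
// system block; the conversation thread stays free of file contents.
async function sendTurn(
  systemPrompt: string,
  fileContext: string,
  messages: Anthropic.Messages.MessageParam[]
) {
  return client.messages.create({
    model: 'claude-sonnet-4-5', // illustrative model name
    max_tokens: 4096,
    system: [
      { type: 'text', text: systemPrompt },
      {
        type: 'text',
        text: fileContext, // output of buildFileContext(fileCache)
        cache_control: { type: 'ephemeral' } // Anthropic prompt caching
      }
    ],
    messages
  });
}
```

Because prompt caching covers the prefix up to the cache_control breakpoint, only the system prompt and file context need to stay byte-stable between calls; the growing conversation after the breakpoint does not invalidate the cache.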
Auto-Update on External Changes
Approach: Check-on-Use (Recommended)
Validate cache right before each API call:
```typescript
function validateFileCache() {
  for (const [path, cached] of fileCache) {
    try {
      // Check if file still exists
      if (!fs.existsSync(path)) {
        console.log(`[Cache] ${path} deleted, removing from cache`);
        fileCache.delete(path);
        continue;
      }

      // Check if content changed
      const currentContent = fs.readFileSync(path, 'utf-8');
      const currentHash = md5(currentContent);

      if (currentHash !== cached.hash) {
        console.log(`[Cache] ${path} changed externally, refreshing...`);
        fileCache.set(path, {
          content: currentContent,
          hash: currentHash,
          lastRead: Date.now(),
          path
        });
      }
    } catch (error) {
      console.error(`[Cache] Error validating ${path}:`, (error as Error).message);
      fileCache.delete(path);
    }
  }
}
```

Why Check-on-Use?
- Simple - No file watchers, no event handling complexity
- Reliable - Guarantees correctness at the moment that matters (before API call)
- Low overhead - Hashing 10-20 files takes <50ms, versus 1-5 seconds for an API call (see the timing sketch after this list)
- No cleanup - No watchers to close, no memory leaks
- Cross-platform - Avoids fs.watch quirks on different OSes
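To sanity-check the "<50ms" overhead claim above, a rough timing harness could look like the following; the helper name and the file list are hypothetical and only meant for local measurement:

```typescript
import * as fs from 'fs';
import { createHash } from 'crypto';

// Rough, illustrative timing harness for check-on-use validation.
function timeValidationMs(paths: string[]): number {
  const start = process.hrtime.bigint();
  for (const p of paths) {
    if (!fs.existsSync(p)) continue;
    const content = fs.readFileSync(p, 'utf-8');
    createHash('md5').update(content).digest('hex');
  }
  return Number(process.hrtime.bigint() - start) / 1e6; // → milliseconds
}

// Example: time the files currently in the session cache
// console.log(`validated in ${timeValidationMs([...fileCache.keys()]).toFixed(1)}ms`);
```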
Alternative: Active File Watching (Optional Enhancement)
For real-time user feedback, add file watchers:
```typescript
import chokidar, { FSWatcher } from 'chokidar';

const watchers = new Map<string, FSWatcher>();

function handleRead(filePath: string) {
  // ... existing cache logic ...

  // Start watching if not already watching
  if (!watchers.has(filePath)) {
    const watcher = chokidar.watch(filePath, {
      ignoreInitial: true,
      awaitWriteFinish: { stabilityThreshold: 100, pollInterval: 50 }
    });

    watcher.on('change', () => {
      console.log(`⚠️ ${filePath} changed externally`);
      // Mark for refresh (actual refresh happens on next prompt build)
      const cached = fileCache.get(filePath);
      if (cached) {
        cached.dirty = true;
      }
    });

    watchers.set(filePath, watcher);
  }
}

function endSession() {
  // Clean up watchers
  for (const watcher of watchers.values()) {
    watcher.close();
  }
  watchers.clear();
  fileCache.clear();
}
```

Hybrid approach: Watchers provide immediate feedback, but check-on-use guarantees correctness.
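The watcher above only marks entries dirty; it never refreshes them itself. A sketch of how that flag could be consumed at prompt-build time, assuming the optional `dirty` field on `FileCache` and a hypothetical `buildPromptHybrid` wrapper:

```typescript
// Hybrid validation sketch: entries flagged by the watcher are refreshed
// unconditionally; validateFileCache() still runs afterwards, so correctness
// never depends on the watcher firing.
function refreshDirtyEntries() {
  for (const [path, cached] of fileCache) {
    if (!cached.dirty) continue;
    try {
      const content = fs.readFileSync(path, 'utf-8');
      fileCache.set(path, {
        content,
        hash: md5(content),
        lastRead: Date.now(),
        path,
        dirty: false
      });
    } catch {
      fileCache.delete(path); // file vanished between the watch event and now
    }
  }
}

// Hypothetical wrapper around the buildPrompt() defined earlier
function buildPromptHybrid() {
  refreshDirtyEntries(); // fast path driven by watcher events
  return buildPrompt();  // buildPrompt() already calls validateFileCache()
}
```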
Use Cases
1. User Runs Formatter Externally
Session:
- Claude edits app.py
- User runs: prettier --write app.py
- Claude references app.py again
- Cache auto-refreshes, Claude sees formatted version ✓

2. Git Operations

Session:
- Claude reviews files on feature branch
- User runs: git checkout main
- Claude references files again
- Cache auto-refreshes, Claude sees main branch versions ✓

3. Hot Reload / Build Tools

Session:
- Claude edits component.tsx
- Next.js rebuild adds generated code
- Claude edits component again
- Cache auto-refreshes, Claude sees generated additions ✓

4. IDE Auto-saves

Session:
- Claude suggests changes
- User manually edits in IDE (auto-saves)
- Claude needs to reference file
- Cache auto-refreshes, sees user's manual edits ✓

Benefits & ROI
Context Reduction
Current session breakdown:
```
File reads (repeated):  80,000 tokens  ← TARGET
Bash/grep outputs:      30,000 tokens
System prompts:         30,000 tokens
Assistant responses:    30,000 tokens
User messages:          20,000 tokens
--------------------------------------
Total:                 190,000 tokens
```

With ephemeral file context:

```
File states (current only):  20,000 tokens  ← 75% reduction
Bash/grep outputs:           30,000 tokens
System prompts:              30,000 tokens
Assistant responses:         30,000 tokens
User messages:               20,000 tokens
--------------------------------------------
Total:                      130,000 tokens
```

Result: 30-40% total context reduction
Session Length
- Current: Hit 200k limit after ~30-50 turns
- With ephemeral context: Hit 200k limit after ~50-80 turns
- Effective increase: 1.5-2x longer sessions
Cost Reduction
With Anthropic prompt caching:
- Cached input tokens: 10x cheaper than regular input tokens
- File context marked as cacheable
- 5-minute cache TTL (or 1-hour extended)
Cost calculation (assuming file context is 40% of input):
Current cost per API call:
- 100k input tokens × $3/1M = $0.30

With caching (after first call):
- 40k cached file tokens × $0.30/1M = $0.012
- 60k regular tokens × $3/1M = $0.18
- Total: $0.192 (36% reduction)

Latency Reduction
Anthropic reports up to 85% latency reduction for cached content. With ~40% of context cached:
- Approximate latency improvement: 30-40%
Implementation Effort
Phase 1: Core Implementation (1-2 weeks)
- File cache data structure: 1 day
- Modify Read/Write/Edit tools: 2 days
- Prompt construction refactor: 2 days
- Check-on-use validation: 1 day
- Prompt caching integration: 1 day
- Testing & edge cases: 2-3 days
Phase 2: Enhancements (Optional, 3-5 days)
- File watching for real-time feedback: 2 days
- Cache size limits and eviction policy: 1 day
- Cache analytics/debugging tools: 1 day
- Performance optimization: 1 day
Total: 2-3 weeks for full implementation
Edge Cases & Considerations
1. Large Files
Problem: 10MB file in cache
Solution:
- Set max file size for caching (e.g., 100KB)
- Fall back to current behavior for oversized files
- Or: Cache file but use chunking/windowing
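A minimal sketch of the size guard described above; the 100KB threshold and the handleReadWithSizeGuard wrapper are illustrative, and oversized files simply keep today's read behavior:

```typescript
const MAX_CACHEABLE_SIZE = 100 * 1024; // e.g. 100KB; would come from config

// Hypothetical wrapper around the handleRead() defined in the implementation
function handleReadWithSizeGuard(filePath: string) {
  const { size } = fs.statSync(filePath);
  if (size > MAX_CACHEABLE_SIZE) {
    // Fall back to current behavior: return content directly,
    // without adding the file to the ephemeral cache.
    return { content: fs.readFileSync(filePath, 'utf-8'), cached: false };
  }
  return handleRead(filePath); // normal cached path
}
```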
2. Binary Files
Problem: Images, PDFs in cache
Solution:
- Only cache text files
- Detect via extension or content-type
- Binary files use current flow
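One possible text/binary heuristic: treat a NUL byte in the first few kilobytes as binary. This is an assumption for illustration, not an existing Claude Code rule, and could be combined with an extension allowlist:

```typescript
// Heuristic: a NUL byte within the first 8KB → treat as binary, skip caching.
function looksBinary(filePath: string): boolean {
  const fd = fs.openSync(filePath, 'r');
  try {
    const buf = Buffer.alloc(8192);
    const bytesRead = fs.readSync(fd, buf, 0, buf.length, 0);
    return buf.subarray(0, bytesRead).includes(0);
  } finally {
    fs.closeSync(fd);
  }
}
```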
3. Cache Eviction
Problem: 100 files in cache = too much context
Solution:
- LRU eviction policy
- Max cache size (e.g., 50 files or 500KB total)
- Explicitly evict files not accessed in last N turns
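A sketch of LRU eviction keyed on lastRead, assuming the maxFiles limit from the Configuration section below; evictIfNeeded is a hypothetical helper that would be called after each cache insert:

```typescript
// Evict the least recently read entries once the cache exceeds maxFiles.
function evictIfNeeded(maxFiles = 50) {
  if (fileCache.size <= maxFiles) return;
  const byAge = [...fileCache.values()].sort((a, b) => a.lastRead - b.lastRead);
  for (const entry of byAge.slice(0, fileCache.size - maxFiles)) {
    fileCache.delete(entry.path);
  }
}
```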
4. Deleted Files
Problem: File in cache gets deleted
Solution:
- Check fs.existsSync() during validation
- Remove from cache if deleted
- Handle gracefully in buildFileContext()
5. Renamed/Moved Files
Problem: File moves, cache has old path
Solution:
- Treat as delete + new file
- Cache keyed by absolute path
- New read creates new cache entry
6. Symlinks
Problem: Symlink resolution and caching
Solution:
- Resolve symlinks to real paths
- Cache by resolved path
- Handle broken symlinks gracefully
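Symlink resolution could be as small as a cache-key helper; the fallback behavior for broken links shown here is an assumption:

```typescript
// Key cache entries on the resolved real path so a file and its symlink
// share one entry; broken links are simply not cached.
function cacheKeyFor(filePath: string): string | null {
  try {
    return fs.realpathSync(filePath);
  } catch {
    return null; // broken symlink or missing file
  }
}
```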
Configuration
```typescript
interface EphemeralCacheConfig {
  enabled: boolean;              // Default: true
  maxFiles: number;              // Default: 50
  maxTotalSize: number;          // Default: 5MB (in bytes)
  maxFileSize: number;           // Default: 1MB (in bytes)
  fileWatching: boolean;         // Default: false (opt-in)
  cacheValidation: 'always' | 'on-change' | 'manual';  // Default: 'always'
  cacheTTL: number;              // Default: 300000 (5 min, matches Anthropic default)
  allowedExtensions?: string[];  // Default: all text files
  excludedPaths?: string[];      // Default: node_modules, .git, etc.
}
```
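For reference, the defaults listed in the comments above, collected into a single (illustrative) constant:

```typescript
const defaultCacheConfig: EphemeralCacheConfig = {
  enabled: true,
  maxFiles: 50,
  maxTotalSize: 5 * 1024 * 1024, // 5MB
  maxFileSize: 1 * 1024 * 1024,  // 1MB
  fileWatching: false,           // opt-in
  cacheValidation: 'always',
  cacheTTL: 300_000,             // 5 minutes, matches Anthropic's default TTL
  excludedPaths: ['node_modules', '.git']
};
```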
Debugging & Observability

```typescript
// Cache statistics
interface CacheStats {
  totalFiles: number;
  totalSize: number;
  hits: number;             // Files served from cache
  misses: number;           // Files read fresh
  updates: number;          // External change detections
  evictions: number;        // LRU evictions
  cacheEfficiency: number;  // hits / (hits + misses)
}

// Logging
console.log('[Cache] Statistics:', getCacheStats());
console.log('[Cache] Current files:', Array.from(fileCache.keys()));
```
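The logging above calls getCacheStats(), which is not defined elsewhere in this proposal. A minimal sketch, assuming hit/miss/update/eviction counters incremented by the tool handlers and the eviction logic:

```typescript
// Counters assumed to be incremented by handleRead (hits/misses),
// validateFileCache (updates), and the eviction logic (evictions).
const counters = { hits: 0, misses: 0, updates: 0, evictions: 0 };

function getCacheStats(): CacheStats {
  let totalSize = 0;
  for (const entry of fileCache.values()) totalSize += entry.content.length;
  const { hits, misses, updates, evictions } = counters;
  return {
    totalFiles: fileCache.size,
    totalSize,
    hits,
    misses,
    updates,
    evictions,
    cacheEfficiency: hits + misses === 0 ? 0 : hits / (hits + misses)
  };
}
```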
Testing Strategy
Unit Tests
- File read/write/edit with cache updates
- Hash change detection
- Cache validation logic
- Eviction policy
- Edge cases (deleted files, large files, etc.)
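As one concrete example of the hash change detection case, a test sketch using Vitest-style test/expect (the framework choice is an assumption); handleRead, validateFileCache, and fileCache come from the implementation above:

```typescript
import { test, expect } from 'vitest'; // or Jest equivalents
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

test('external edit is picked up by validateFileCache', () => {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'efc-'));
  const file = path.join(dir, 'app.py');
  fs.writeFileSync(file, 'print("v1")\n');

  handleRead(file);                        // populate the cache
  fs.writeFileSync(file, 'print("v2")\n'); // simulate an external change
  validateFileCache();                     // should detect the new hash

  expect(fileCache.get(file)?.content).toContain('v2');
});
```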
Integration Tests
- Full session with file operations
- External file modifications
- Git operations during session
- Build tool interactions
- Prompt construction with cache
Performance Tests
- Cache validation overhead (should be <50ms)
- Large cache performance (50+ files)
- Memory usage profiling
- Prompt caching effectiveness
Migration Path
Phase 1: Beta Flag
```typescript
const USE_EPHEMERAL_CACHE = process.env.CLAUDE_CODE_EPHEMERAL_CACHE === 'true';
```

- Opt-in via environment variable
- Gather feedback from beta users
- Monitor cache hit rates and performance
Phase 2: Default Enabled
- Enable by default after validation
- Keep flag for opt-out
- Monitor error rates and user feedback
Phase 3: Remove Old Path
- After several versions, remove non-cached flow
- Keep configuration for cache tuning
Success Metrics
Target Metrics (measure after 1 month):
- Average session length: +50% increase
- Context usage per session: -30% reduction
- Cache hit rate: >80%
- User-reported “hit context limit”: -60% reduction
- API cost per session: -30% reduction (with caching)
Monitoring:
- Track cache statistics per session
- A/B test with control group (old behavior)
- User surveys on session experience
Related Work
Anthropic Prompt Caching
This proposal builds directly on Anthropic's prompt caching (cache_control: { type: "ephemeral" }), which serves cached input tokens at a fraction of the regular price with a 5-minute default TTL.
Similar Patterns
- Letta Memory Blocks - Mutable context management
- Cursor IDE - Similar file caching approach
- Continue.dev - Context management for coding assistants
Conclusion
Ephemeral file context is a high-impact, medium-effort feature that addresses the single biggest source of context bloat in Claude Code sessions.
Key benefits:
- 30-40% context reduction
- 1.5-2x longer sessions
- 90% cost reduction on file reads (with caching)
- Better handling of external file changes
- Minimal performance overhead
Recommendation: Implement Phase 1 (core + check-on-use) immediately. Add file watching in Phase 2 based on user feedback.
This is the 80/20 optimization for Claude Code context efficiency.