Problem Statement

Claude Code sessions can exhaust their 200k-token context window rapidly, with repeated file reads accounting for roughly 40% of context (80,000+ tokens). Files are re-read throughout a session, creating massive duplication:

Turn 1: Read app.py (2,000 tokens) → injected into conversation
Turn 5: Read app.py (2,000 tokens) → DUPLICATE
Turn 10: Read app.py (2,000 tokens) → DUPLICATE
Turn 15: Read app.py (2,000 tokens) → DUPLICATE
Total: 8,000 tokens for one file

This is the single biggest source of context bloat in coding sessions.

Proposed Solution

Implement ephemeral file context - a session-level cache that:

  1. Stores file contents separately from conversation history
  2. Updates files in-place rather than appending
  3. Automatically detects external file changes
  4. Leverages Anthropic’s prompt caching for a ~90% cost reduction on cached file content

Architecture

Current Flow

[System Prompt]
[Conversation Thread]:
- User: "Edit app.py"
- Assistant: "Let me read it"
- Tool Call: Read(app.py)
- Tool Result: <2000 tokens of app.py> ← Injected into main thread
- Assistant: "Here's the edit"
- User: "Add another function"
- Assistant: "Let me read it again"
- Tool Call: Read(app.py)
- Tool Result: <2000 tokens of app.py> ← DUPLICATE

Proposed Flow

[System Prompt]
[Ephemeral File Context - CACHED]:
app.py: <2000 tokens, current state>
utils.py: <1500 tokens, current state>
[Conversation Thread]:
- User: "Edit app.py"
- Assistant: "Let me read it" (cache reference only)
- Tool Call: Read(app.py) → cached: true
- Assistant: "Here's the edit"
- User: "Add another function"
- Assistant: (already has app.py from cache)

Implementation

Core Data Structure

import * as fs from 'fs';
import { createHash } from 'crypto';

// Content hash for change detection (not a security use of MD5)
const md5 = (s: string) => createHash('md5').update(s).digest('hex');

interface FileCache {
  content: string;  // Current file contents
  hash: string;     // MD5 hash for change detection
  lastRead: number; // Timestamp
  path: string;     // Absolute path
  dirty?: boolean;  // Set by the optional file watcher
}

// Session-level cache
const fileCache = new Map<string, FileCache>();

File Tool Modifications

Read Tool

function handleRead(filePath: string) {
  const content = fs.readFileSync(filePath, 'utf-8');
  const hash = md5(content);

  // Update cache
  fileCache.set(filePath, {
    content,
    hash,
    lastRead: Date.now(),
    path: filePath
  });

  // Return reference only (don't inject content into conversation)
  return {
    ref: filePath,
    cached: true,
    hash
  };
}

Write/Edit Tools

function handleWrite(filePath: string, newContent: string) {
  // Perform file write
  fs.writeFileSync(filePath, newContent);

  // Update cache IN-PLACE (not appended)
  const hash = md5(newContent);
  fileCache.set(filePath, {
    content: newContent,
    hash,
    lastRead: Date.now(),
    path: filePath
  });

  return {
    ref: filePath,
    updated: true,
    hash
  };
}

Prompt Construction

function buildPrompt() {
  // Validate cache before constructing prompt
  validateFileCache();
  return [
    // System prompt
    systemPrompt,
    // Ephemeral file context (cacheable)
    {
      type: "text",
      text: buildFileContext(fileCache),
      cache_control: { type: "ephemeral" } // Anthropic prompt caching
    },
    // Conversation thread (no file contents)
    ...conversationHistory
  ];
}

function buildFileContext(cache: Map<string, FileCache>): string {
  if (cache.size === 0) return "";
  let context = "=== Session File Context ===\n\n";
  for (const [path, { content, hash }] of cache) {
    context += `--- ${path} (${hash.slice(0, 8)}) ---\n`;
    context += content;
    context += "\n\n";
  }
  return context;
}
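
For concreteness, here is a sketch of how the assembled blocks map onto an Anthropic Messages API request body: the `system` field accepts an array of text blocks, and `cache_control` marks the cached prefix boundary. The prompt text and model ID are placeholders; no network call is made.

```typescript
// Sketch: the cached file context as a system block with cache_control.
// Everything up to and including the marked block is served from cache;
// the conversation below it is re-processed each turn.
type TextBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

const systemPrompt = "You are a coding assistant."; // placeholder
const fileContextText =
  "=== Session File Context ===\n\n--- /src/app.py (a1b2c3d4) ---\nprint('hi')\n";

const requestBody = {
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  system: [
    { type: "text", text: systemPrompt },
    // Cache breakpoint: file context is the last cached block.
    { type: "text", text: fileContextText, cache_control: { type: "ephemeral" } },
  ] as TextBlock[],
  messages: [{ role: "user", content: "Edit app.py" }],
};

console.log(requestBody.system[1].cache_control?.type); // "ephemeral"
```

Because the file context sits between the stable system prompt and the growing conversation, edits to files invalidate only the file-context block onward, not the system prompt prefix.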

Auto-Update on External Changes

Validate cache right before each API call:

function validateFileCache() {
  for (const [path, cached] of fileCache) {
    try {
      // Check if file still exists
      if (!fs.existsSync(path)) {
        console.log(`[Cache] ${path} deleted, removing from cache`);
        fileCache.delete(path);
        continue;
      }

      // Check if content changed
      const currentContent = fs.readFileSync(path, 'utf-8');
      const currentHash = md5(currentContent);
      if (currentHash !== cached.hash) {
        console.log(`[Cache] ${path} changed externally, refreshing...`);
        fileCache.set(path, {
          content: currentContent,
          hash: currentHash,
          lastRead: Date.now(),
          path
        });
      }
    } catch (error) {
      console.error(`[Cache] Error validating ${path}:`, (error as Error).message);
      fileCache.delete(path);
    }
  }
}

Why Check-on-Use?

  1. Simple - No file watchers, no event handling complexity
  2. Reliable - Guarantees correctness at the moment that matters (before API call)
  3. Low overhead - Hashing 10-20 files takes <50ms, API calls take 1-5 seconds
  4. No cleanup - No watchers to close, no memory leaks
  5. Cross-platform - Avoids fs.watch quirks on different OSes
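
If the rehash-every-call overhead ever matters, a cheap fast path is to compare `fs.statSync` metadata first and only hash when the stat changes. A sketch under that assumption (the `mtimeMs`/`size` fields are extra bookkeeping, not part of the proposal's `FileCache` interface):

```typescript
import * as fs from "fs";
import * as path from "path";
import * as os from "os";
import { createHash } from "crypto";

// Stat-based fast path for check-on-use validation. mtimeMs and size are
// recorded at read time; hashing only happens when they change.
interface StatCacheEntry {
  hash: string;
  mtimeMs: number;
  size: number;
}

const md5 = (s: string) => createHash("md5").update(s).digest("hex");

function needsRefresh(filePath: string, entry: StatCacheEntry): boolean {
  const stat = fs.statSync(filePath);
  // Same mtime and size: almost certainly unchanged; skip the hash.
  if (stat.mtimeMs === entry.mtimeMs && stat.size === entry.size) return false;
  // Stat changed: confirm with a content hash, since editors and git can
  // touch mtime without altering bytes.
  return md5(fs.readFileSync(filePath, "utf-8")) !== entry.hash;
}
```

This keeps the common case (nothing changed) to one `stat` per file instead of a full read-plus-hash.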

Alternative: Active File Watching (Optional Enhancement)

For real-time user feedback, add file watchers:

import chokidar from 'chokidar';

const watchers = new Map<string, chokidar.FSWatcher>();

function handleRead(filePath: string) {
  // ... existing cache logic ...

  // Start watching if not already watching
  if (!watchers.has(filePath)) {
    const watcher = chokidar.watch(filePath, {
      ignoreInitial: true,
      awaitWriteFinish: {
        stabilityThreshold: 100,
        pollInterval: 50
      }
    });
    watcher.on('change', () => {
      console.log(`⚠️ ${filePath} changed externally`);
      // Mark for refresh (actual refresh happens on next prompt build)
      const cached = fileCache.get(filePath);
      if (cached) {
        cached.dirty = true;
      }
    });
    watchers.set(filePath, watcher);
  }
}

function endSession() {
  // Clean up watchers
  for (const watcher of watchers.values()) {
    watcher.close();
  }
  watchers.clear();
  fileCache.clear();
}

Hybrid approach: Watchers provide immediate feedback, but check-on-use guarantees correctness.

Use Cases

1. User Runs Formatter Externally

Session:
- Claude edits app.py
- User runs: prettier --write app.py
- Claude references app.py again
- Cache auto-refreshes, Claude sees formatted version ✓

2. Git Operations

Session:
- Claude reviews files on feature branch
- User runs: git checkout main
- Claude references files again
- Cache auto-refreshes, Claude sees main branch versions ✓

3. Hot Reload / Build Tools

Session:
- Claude edits component.tsx
- Next.js rebuild adds generated code
- Claude edits component again
- Cache auto-refreshes, Claude sees generated additions ✓

4. IDE Auto-saves

Session:
- Claude suggests changes
- User manually edits in IDE (auto-saves)
- Claude needs to reference file
- Cache auto-refreshes, sees user's manual edits ✓

Benefits & ROI

Context Reduction

Current session breakdown:

File reads (repeated): 80,000 tokens ← TARGET
Bash/grep outputs: 30,000 tokens
System prompts: 30,000 tokens
Assistant responses: 30,000 tokens
User messages: 20,000 tokens
--------------------------------------------
Total: 190,000 tokens

With ephemeral file context:

File states (current only): 20,000 tokens ← 75% reduction
Bash/grep outputs: 30,000 tokens
System prompts: 30,000 tokens
Assistant responses: 30,000 tokens
User messages: 20,000 tokens
--------------------------------------------
Total: 130,000 tokens

Result: ~32% total context reduction (190k → 130k tokens)

Session Length

  • Current: Hit 200k limit after ~30-50 turns
  • With ephemeral context: Hit 200k limit after ~50-80 turns
  • Effective increase: 1.5-2x longer sessions

Cost Reduction

With Anthropic prompt caching:

  • Cached input tokens: 10x cheaper than regular input tokens
  • File context marked as cacheable
  • 5-minute cache TTL (or 1-hour extended)

Cost calculation (assuming file context is 40% of input):

Current cost per API call:
- 100k input tokens × $3/1M = $0.30
With caching (after first call):
- 40k cached file tokens × $0.30/1M = $0.012
- 60k regular tokens × $3/1M = $0.18
- Total: $0.192 (36% reduction)
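
The arithmetic above can be expressed as a small helper. Rates mirror the text's assumptions: $3 per 1M input tokens, with cache reads at 10% of that.

```typescript
// Per-call input cost with and without prompt caching.
const INPUT_PER_M = 3.0;        // $/1M regular input tokens (assumed rate)
const CACHED_READ_PER_M = 0.3;  // $/1M cache-read tokens (10% of regular)

function callCost(regularTokens: number, cachedTokens: number): number {
  return (
    (regularTokens * INPUT_PER_M + cachedTokens * CACHED_READ_PER_M) / 1_000_000
  );
}

const before = callCost(100_000, 0);      // all 100k tokens at the full rate
const after = callCost(60_000, 40_000);   // 40k file tokens served from cache
const savings = 1 - after / before;       // fraction saved per call
console.log(before, after, savings.toFixed(2));
```

Note this ignores the one-time cache-write surcharge on the first call; over a multi-turn session the steady-state saving dominates.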

Latency Reduction

Anthropic reports up to 85% latency reduction for long cached prompts. With ~40% of context cached:

  • Approximate latency improvement: 30-40%

Implementation Effort

Phase 1: Core Implementation (1-2 weeks)

  • File cache data structure: 1 day
  • Modify Read/Write/Edit tools: 2 days
  • Prompt construction refactor: 2 days
  • Check-on-use validation: 1 day
  • Prompt caching integration: 1 day
  • Testing & edge cases: 2-3 days

Phase 2: Enhancements (Optional, 3-5 days)

  • File watching for real-time feedback: 2 days
  • Cache size limits and eviction policy: 1 day
  • Cache analytics/debugging tools: 1 day
  • Performance optimization: 1 day

Total: 2-3 weeks for full implementation

Edge Cases & Considerations

1. Large Files

Problem: 10MB file in cache

Solution:

  • Set max file size for caching (e.g., 100KB)
  • Fall back to current behavior for oversized files
  • Or: Cache file but use chunking/windowing

2. Binary Files

Problem: Images, PDFs in cache

Solution:

  • Only cache text files
  • Detect via extension or content-type
  • Binary files use current flow
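
One common heuristic for the text/binary decision, sketched here (the extension list is illustrative, not from the proposal): combine an extension denylist with a NUL-byte sniff of the first few kilobytes, since most binary formats contain NUL bytes and text files essentially never do.

```typescript
import * as fs from "fs";
import * as path from "path";
import * as os from "os";

// Illustrative denylist; real configs would be longer.
const BINARY_EXTENSIONS = new Set([".png", ".jpg", ".gif", ".pdf", ".zip", ".ico"]);

function isProbablyText(filePath: string): boolean {
  if (BINARY_EXTENSIONS.has(path.extname(filePath).toLowerCase())) return false;
  const fd = fs.openSync(filePath, "r");
  try {
    const buf = Buffer.alloc(8192);
    const bytesRead = fs.readSync(fd, buf, 0, buf.length, 0);
    // A NUL byte in the first 8KB is a strong binary signal.
    return !buf.subarray(0, bytesRead).includes(0);
  } finally {
    fs.closeSync(fd);
  }
}
```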

3. Cache Eviction

Problem: 100 files in cache = too much context

Solution:

  • LRU eviction policy
  • Max cache size (e.g., 50 files or 500KB total)
  • Explicitly evict files not accessed in last N turns
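
An LRU policy with both limits falls out naturally from `Map`'s insertion-order guarantee: touching a key re-inserts it at the end, so the first key is always least recently used. A minimal sketch (the class name and defaults are this sketch's, not the proposal's):

```typescript
// LRU cache bounded by file count and total content bytes.
class LruFileCache<V extends { content: string } = { content: string }> {
  private map = new Map<string, V>();
  constructor(private maxFiles = 50, private maxBytes = 500_000) {}

  get(key: string): V | undefined {
    const val = this.map.get(key);
    if (val !== undefined) {
      this.map.delete(key); // move to most-recently-used position
      this.map.set(key, val);
    }
    return val;
  }

  set(key: string, val: V): void {
    this.map.delete(key);
    this.map.set(key, val);
    this.evict();
  }

  private totalBytes(): number {
    let n = 0;
    for (const v of this.map.values()) n += v.content.length;
    return n;
  }

  private evict(): void {
    // Drop least-recently-used entries until both limits are satisfied.
    while (this.map.size > this.maxFiles || this.totalBytes() > this.maxBytes) {
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }

  get size() { return this.map.size; }
  keys() { return Array.from(this.map.keys()); }
}
```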

4. Deleted Files

Problem: File in cache gets deleted

Solution:

  • Check fs.existsSync() during validation
  • Remove from cache if deleted
  • Handle gracefully in buildFileContext()

5. Renamed/Moved Files

Problem: File moves, cache has old path

Solution:

  • Treat as delete + new file
  • Cache keyed by absolute path
  • New read creates new cache entry

6. Symlinks

Problem: Symlink resolution and caching

Solution:

  • Resolve symlinks to real paths
  • Cache by resolved path
  • Handle broken symlinks gracefully
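
The resolution step can be a one-line wrapper around `fs.realpathSync`, with the broken-link case handled by falling back to the literal path (`realpathSync` throws when the target is missing):

```typescript
import * as fs from "fs";
import * as path from "path";
import * as os from "os";

// Key the cache by the resolved real path so a symlink and its target
// share one entry; broken links fall back to the literal path.
function cacheKeyFor(filePath: string): string {
  try {
    return fs.realpathSync(filePath);
  } catch {
    return filePath; // broken symlink or nonexistent file
  }
}
```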

Configuration

interface EphemeralCacheConfig {
  enabled: boolean;       // Default: true
  maxFiles: number;       // Default: 50
  maxTotalSize: number;   // Default: 5MB (in bytes)
  maxFileSize: number;    // Default: 1MB (in bytes)
  fileWatching: boolean;  // Default: false (opt-in)
  cacheValidation: 'always' | 'on-change' | 'manual'; // Default: 'always'
  cacheTTL: number;       // Default: 300000 (5 min, matches Anthropic default)
  allowedExtensions?: string[]; // Default: all text files
  excludedPaths?: string[];     // Default: node_modules, .git, etc.
}

Debugging & Observability

// Cache statistics
interface CacheStats {
  totalFiles: number;
  totalSize: number;
  hits: number;        // Files served from cache
  misses: number;      // Files read fresh
  updates: number;     // External change detections
  evictions: number;   // LRU evictions
  cacheEfficiency: number; // hits / (hits + misses)
}

// Logging
console.log('[Cache] Statistics:', getCacheStats());
console.log('[Cache] Current files:', Array.from(fileCache.keys()));
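
A possible shape for the `getCacheStats()` call above; where the counters live is this sketch's assumption (plain module state bumped by the Read/Write handlers), since the proposal doesn't specify it:

```typescript
interface CacheStats {
  totalFiles: number;
  totalSize: number;
  hits: number;
  misses: number;
  updates: number;
  evictions: number;
  cacheEfficiency: number;
}

// Module-level counters, incremented by the tool handlers (assumed).
const counters = { hits: 0, misses: 0, updates: 0, evictions: 0 };
const fileCache = new Map<string, { content: string }>();

function getCacheStats(): CacheStats {
  const lookups = counters.hits + counters.misses;
  let totalSize = 0;
  for (const { content } of fileCache.values()) totalSize += content.length;
  return {
    totalFiles: fileCache.size,
    totalSize,
    ...counters,
    cacheEfficiency: lookups === 0 ? 0 : counters.hits / lookups,
  };
}
```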

Testing Strategy

Unit Tests

  • File read/write/edit with cache updates
  • Hash change detection
  • Cache validation logic
  • Eviction policy
  • Edge cases (deleted files, large files, etc.)

Integration Tests

  • Full session with file operations
  • External file modifications
  • Git operations during session
  • Build tool interactions
  • Prompt construction with cache

Performance Tests

  • Cache validation overhead (should be <50ms)
  • Large cache performance (50+ files)
  • Memory usage profiling
  • Prompt caching effectiveness

Migration Path

Phase 1: Beta Flag

const USE_EPHEMERAL_CACHE = process.env.CLAUDE_CODE_EPHEMERAL_CACHE === 'true';

  • Opt-in via environment variable
  • Gather feedback from beta users
  • Monitor cache hit rates and performance

Phase 2: Default Enabled

  • Enable by default after validation
  • Keep flag for opt-out
  • Monitor error rates and user feedback

Phase 3: Remove Old Path

  • After several versions, remove non-cached flow
  • Keep configuration for cache tuning

Success Metrics

Target Metrics (measure after 1 month):

  • Average session length: +50% increase
  • Context usage per session: -30% reduction
  • Cache hit rate: >80%
  • User-reported “hit context limit”: -60% reduction
  • API cost per session: -30% reduction (with caching)

Monitoring:

  • Track cache statistics per session
  • A/B test with control group (old behavior)
  • User surveys on session experience

Prior Art & Similar Patterns

  • Anthropic Prompt Caching - cache_control breakpoints with a 5-minute default TTL
  • Letta Memory Blocks - Mutable context management
  • Cursor IDE - Similar file caching approach
  • Continue.dev - Context management for coding assistants

Conclusion

Ephemeral file context is a high-impact, medium-effort feature that addresses the single biggest source of context bloat in Claude Code sessions.

Key benefits:

  • 30-40% context reduction
  • 1.5-2x longer sessions
  • 90% cost reduction on file reads (with caching)
  • Better handling of external file changes
  • Minimal performance overhead

Recommendation: Implement Phase 1 (core + check-on-use) immediately. Add file watching in Phase 2 based on user feedback.

This is the 80/20 optimization for Claude Code context efficiency.