Problem Statement

Claude Code sessions can exhaust their 200k-token context window rapidly, with repeated file reads accounting for roughly 40% of context (80,000+ tokens). Files are re-read throughout a session, creating massive duplication:

Turn 1: Read app.py (2,000 tokens) → injected into conversation
Turn 5: Read app.py (2,000 tokens) → DUPLICATE
Turn 10: Read app.py (2,000 tokens) → DUPLICATE
Turn 15: Read app.py (2,000 tokens) → DUPLICATE
Total: 8,000 tokens for one file

This is the single biggest source of context bloat in coding sessions.

Proposed Solution

Implement ephemeral file context - a session-level cache that:

  1. Stores file contents separately from conversation history
  2. Updates files in-place rather than appending
  3. Automatically detects external file changes
  4. Leverages Anthropic’s prompt caching for a ~90% cost reduction on cached file content

Architecture

Current Flow

[System Prompt]
[Conversation Thread]:
- User: "Edit app.py"
- Assistant: "Let me read it"
- Tool Call: Read(app.py)
- Tool Result: <2000 tokens of app.py> ← Injected into main thread
- Assistant: "Here's the edit"
- User: "Add another function"
- Assistant: "Let me read it again"
- Tool Call: Read(app.py)
- Tool Result: <2000 tokens of app.py> ← DUPLICATE

Proposed Flow

[System Prompt]
[Ephemeral File Context - CACHED]:
app.py: <2000 tokens, current state>
utils.py: <1500 tokens, current state>
[Conversation Thread]:
- User: "Edit app.py"
- Assistant: "Let me read it" (cache reference only)
- Tool Call: Read(app.py) → cached: true
- Assistant: "Here's the edit"
- User: "Add another function"
- Assistant: (already has app.py from cache)

Implementation

Core Data Structure

import * as fs from 'fs';
import { createHash } from 'crypto';

// Content hash for change detection (not a security use of MD5)
const md5 = (s: string) => createHash('md5').update(s).digest('hex');

interface FileCache {
  content: string;  // Current file contents
  hash: string;     // MD5 hash for change detection
  lastRead: number; // Timestamp
  path: string;     // Absolute path
  dirty?: boolean;  // Set by the optional file watcher
}

// Session-level cache
const fileCache = new Map<string, FileCache>();

File Tool Modifications

Read Tool

function handleRead(filePath: string) {
  const content = fs.readFileSync(filePath, 'utf-8');
  const hash = md5(content);

  // Update cache
  fileCache.set(filePath, {
    content,
    hash,
    lastRead: Date.now(),
    path: filePath
  });

  // Return reference only (don't inject content into conversation)
  return {
    ref: filePath,
    cached: true,
    hash
  };
}

Write/Edit Tools

function handleWrite(filePath: string, newContent: string) {
  // Perform file write
  fs.writeFileSync(filePath, newContent);

  // Update cache IN-PLACE (not appended)
  const hash = md5(newContent);
  fileCache.set(filePath, {
    content: newContent,
    hash,
    lastRead: Date.now(),
    path: filePath
  });

  return {
    ref: filePath,
    updated: true,
    hash
  };
}

Prompt Construction

function buildPrompt() {
  // Validate cache before constructing prompt
  validateFileCache();
  return [
    // System prompt
    systemPrompt,
    // Ephemeral file context (cacheable)
    {
      type: "text",
      text: buildFileContext(fileCache),
      cache_control: { type: "ephemeral" } // Anthropic prompt caching
    },
    // Conversation thread (no file contents)
    ...conversationHistory
  ];
}

function buildFileContext(cache: Map<string, FileCache>): string {
  if (cache.size === 0) return "";
  let context = "=== Session File Context ===\n\n";
  for (const [path, { content, hash }] of cache) {
    context += `--- ${path} (${hash.slice(0, 8)}) ---\n`;
    context += content;
    context += "\n\n";
  }
  return context;
}
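
For concreteness, here is a sketch of how the assembled blocks map onto an Anthropic Messages API request body: the `system` field accepts an array of text blocks, and `cache_control` marks the cached prefix boundary. The prompt text and model ID are placeholders; no network call is made.

```typescript
// Sketch: the cached file context as a system block with cache_control.
// Everything up to and including the marked block is served from cache;
// the conversation below it is re-processed each turn.
type TextBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

const systemPrompt = "You are a coding assistant."; // placeholder
const fileContextText =
  "=== Session File Context ===\n\n--- /src/app.py (a1b2c3d4) ---\nprint('hi')\n";

const requestBody = {
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  system: [
    { type: "text", text: systemPrompt },
    // Cache breakpoint: file context is the last cached block.
    { type: "text", text: fileContextText, cache_control: { type: "ephemeral" } },
  ] as TextBlock[],
  messages: [{ role: "user", content: "Edit app.py" }],
};

console.log(requestBody.system[1].cache_control?.type); // "ephemeral"
```

Because the file context sits between the stable system prompt and the growing conversation, edits to files invalidate only the file-context block onward, not the system prompt prefix.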

Auto-Update on External Changes

Validate cache right before each API call:

function validateFileCache() {
  for (const [path, cached] of fileCache) {
    try {
      // Check if file still exists
      if (!fs.existsSync(path)) {
        console.log(`[Cache] ${path} deleted, removing from cache`);
        fileCache.delete(path);
        continue;
      }

      // Check if content changed
      const currentContent = fs.readFileSync(path, 'utf-8');
      const currentHash = md5(currentContent);
      if (currentHash !== cached.hash) {
        console.log(`[Cache] ${path} changed externally, refreshing...`);
        fileCache.set(path, {
          content: currentContent,
          hash: currentHash,
          lastRead: Date.now(),
          path
        });
      }
    } catch (error) {
      console.error(`[Cache] Error validating ${path}:`, (error as Error).message);
      fileCache.delete(path);
    }
  }
}

Why Check-on-Use?

  1. Simple - No file watchers, no event handling complexity
  2. Reliable - Guarantees correctness at the moment that matters (before API call)
  3. Low overhead - Hashing 10-20 files takes <50ms, API calls take 1-5 seconds
  4. No cleanup - No watchers to close, no memory leaks
  5. Cross-platform - Avoids fs.watch quirks on different OSes
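
If the rehash-every-call overhead ever matters, a cheap fast path is to compare `fs.statSync` metadata first and only hash when the stat changes. A sketch under that assumption (the `mtimeMs`/`size` fields are extra bookkeeping, not part of the proposal's `FileCache` interface):

```typescript
import * as fs from "fs";
import * as path from "path";
import * as os from "os";
import { createHash } from "crypto";

// Stat-based fast path for check-on-use validation. mtimeMs and size are
// recorded at read time; hashing only happens when they change.
interface StatCacheEntry {
  hash: string;
  mtimeMs: number;
  size: number;
}

const md5 = (s: string) => createHash("md5").update(s).digest("hex");

function needsRefresh(filePath: string, entry: StatCacheEntry): boolean {
  const stat = fs.statSync(filePath);
  // Same mtime and size: almost certainly unchanged; skip the hash.
  if (stat.mtimeMs === entry.mtimeMs && stat.size === entry.size) return false;
  // Stat changed: confirm with a content hash, since editors and git can
  // touch mtime without altering bytes.
  return md5(fs.readFileSync(filePath, "utf-8")) !== entry.hash;
}
```

This keeps the common case (nothing changed) to one `stat` per file instead of a full read-plus-hash.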

Alternative: Active File Watching (Optional Enhancement)

For real-time user feedback, add file watchers:

import chokidar from 'chokidar';

const watchers = new Map<string, chokidar.FSWatcher>();

function handleRead(filePath: string) {
  // ... existing cache logic ...

  // Start watching if not already watching
  if (!watchers.has(filePath)) {
    const watcher = chokidar.watch(filePath, {
      ignoreInitial: true,
      awaitWriteFinish: {
        stabilityThreshold: 100,
        pollInterval: 50
      }
    });
    watcher.on('change', () => {
      console.log(`⚠️ ${filePath} changed externally`);
      // Mark for refresh (actual refresh happens on next prompt build)
      const cached = fileCache.get(filePath);
      if (cached) {
        cached.dirty = true;
      }
    });
    watchers.set(filePath, watcher);
  }
}

function endSession() {
  // Clean up watchers
  for (const watcher of watchers.values()) {
    watcher.close();
  }
  watchers.clear();
  fileCache.clear();
}

Hybrid approach: Watchers provide immediate feedback, but check-on-use guarantees correctness.

Use Cases

1. User Runs Formatter Externally

Session:
- Claude edits app.py
- User runs: prettier --write app.py
- Claude references app.py again
- Cache auto-refreshes, Claude sees formatted version ✓

2. Git Operations

Session:
- Claude reviews files on feature branch
- User runs: git checkout main
- Claude references files again
- Cache auto-refreshes, Claude sees main branch versions ✓

3. Hot Reload / Build Tools

Session:
- Claude edits component.tsx
- Next.js rebuild adds generated code
- Claude edits component again
- Cache auto-refreshes, Claude sees generated additions ✓

4. IDE Auto-saves

Session:
- Claude suggests changes
- User manually edits in IDE (auto-saves)
- Claude needs to reference file
- Cache auto-refreshes, sees user's manual edits ✓

Benefits & ROI

Context Reduction

Current session breakdown:

File reads (repeated): 80,000 tokens ← TARGET
Bash/grep outputs: 30,000 tokens
System prompts: 30,000 tokens
Assistant responses: 30,000 tokens
User messages: 20,000 tokens
--------------------------------------------
Total: 190,000 tokens

With ephemeral file context:

File states (current only): 20,000 tokens ← 75% reduction
Bash/grep outputs: 30,000 tokens
System prompts: 30,000 tokens
Assistant responses: 30,000 tokens
User messages: 20,000 tokens
--------------------------------------------
Total: 130,000 tokens

Result: ~32% total context reduction (190k → 130k tokens)

Session Length

  • Current: Hit 200k limit after ~30-50 turns
  • With ephemeral context: Hit 200k limit after ~50-80 turns
  • Effective increase: 1.5-2x longer sessions

Cost Reduction

With Anthropic prompt caching:

  • Cached input tokens: 10x cheaper than regular input tokens
  • File context marked as cacheable
  • 5-minute cache TTL (or 1-hour extended)

Cost calculation (assuming file context is 40% of input):

Current cost per API call:
- 100k input tokens × $3/1M = $0.30
With caching (after first call):
- 40k cached file tokens × $0.30/1M = $0.012
- 60k regular tokens × $3/1M = $0.18
- Total: $0.192 (36% reduction)
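
The arithmetic above can be expressed as a small helper. Rates mirror the text's assumptions: $3 per 1M input tokens, with cache reads at 10% of that.

```typescript
// Per-call input cost with and without prompt caching.
const INPUT_PER_M = 3.0;        // $/1M regular input tokens (assumed rate)
const CACHED_READ_PER_M = 0.3;  // $/1M cache-read tokens (10% of regular)

function callCost(regularTokens: number, cachedTokens: number): number {
  return (
    (regularTokens * INPUT_PER_M + cachedTokens * CACHED_READ_PER_M) / 1_000_000
  );
}

const before = callCost(100_000, 0);      // all 100k tokens at the full rate
const after = callCost(60_000, 40_000);   // 40k file tokens served from cache
const savings = 1 - after / before;       // fraction saved per call
console.log(before, after, savings.toFixed(2));
```

Note this ignores the one-time cache-write surcharge on the first call; over a multi-turn session the steady-state saving dominates.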

Latency Reduction

Anthropic reports up to 85% latency reduction for long cached prompts. With ~40% of context cached:

  • Approximate latency improvement: 30-40%

Implementation Effort

Phase 1: Core Implementation (1-2 weeks)

  • File cache data structure: 1 day
  • Modify Read/Write/Edit tools: 2 days
  • Prompt construction refactor: 2 days
  • Check-on-use validation: 1 day
  • Prompt caching integration: 1 day
  • Testing & edge cases: 2-3 days

Phase 2: Enhancements (Optional, 3-5 days)

  • File watching for real-time feedback: 2 days
  • Cache size limits and eviction policy: 1 day
  • Cache analytics/debugging tools: 1 day
  • Performance optimization: 1 day

Total: 2-3 weeks for full implementation

Edge Cases & Considerations

1. Large Files

Problem: 10MB file in cache

Solution:

  • Set max file size for caching (e.g., 100KB)
  • Fall back to current behavior for oversized files
  • Or: Cache file but use chunking/windowing

2. Binary Files

Problem: Images, PDFs in cache

Solution:

  • Only cache text files
  • Detect via extension or content-type
  • Binary files use current flow
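
One common heuristic for the text/binary decision, sketched here (the extension list is illustrative, not from the proposal): combine an extension denylist with a NUL-byte sniff of the first few kilobytes, since most binary formats contain NUL bytes and text files essentially never do.

```typescript
import * as fs from "fs";
import * as path from "path";
import * as os from "os";

// Illustrative denylist; real configs would be longer.
const BINARY_EXTENSIONS = new Set([".png", ".jpg", ".gif", ".pdf", ".zip", ".ico"]);

function isProbablyText(filePath: string): boolean {
  if (BINARY_EXTENSIONS.has(path.extname(filePath).toLowerCase())) return false;
  const fd = fs.openSync(filePath, "r");
  try {
    const buf = Buffer.alloc(8192);
    const bytesRead = fs.readSync(fd, buf, 0, buf.length, 0);
    // A NUL byte in the first 8KB is a strong binary signal.
    return !buf.subarray(0, bytesRead).includes(0);
  } finally {
    fs.closeSync(fd);
  }
}
```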

3. Cache Eviction

Problem: 100 files in cache = too much context

Solution:

  • LRU eviction policy
  • Max cache size (e.g., 50 files or 500KB total)
  • Explicitly evict files not accessed in last N turns
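
An LRU policy with both limits falls out naturally from `Map`'s insertion-order guarantee: touching a key re-inserts it at the end, so the first key is always least recently used. A minimal sketch (the class name and defaults are this sketch's, not the proposal's):

```typescript
// LRU cache bounded by file count and total content bytes.
class LruFileCache<V extends { content: string } = { content: string }> {
  private map = new Map<string, V>();
  constructor(private maxFiles = 50, private maxBytes = 500_000) {}

  get(key: string): V | undefined {
    const val = this.map.get(key);
    if (val !== undefined) {
      this.map.delete(key); // move to most-recently-used position
      this.map.set(key, val);
    }
    return val;
  }

  set(key: string, val: V): void {
    this.map.delete(key);
    this.map.set(key, val);
    this.evict();
  }

  private totalBytes(): number {
    let n = 0;
    for (const v of this.map.values()) n += v.content.length;
    return n;
  }

  private evict(): void {
    // Drop least-recently-used entries until both limits are satisfied.
    while (this.map.size > this.maxFiles || this.totalBytes() > this.maxBytes) {
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }

  get size() { return this.map.size; }
  keys() { return Array.from(this.map.keys()); }
}
```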

4. Deleted Files

Problem: File in cache gets deleted

Solution:

  • Check fs.existsSync() during validation
  • Remove from cache if deleted
  • Handle gracefully in buildFileContext()

5. Renamed/Moved Files

Problem: File moves, cache has old path

Solution:

  • Treat as delete + new file
  • Cache keyed by absolute path
  • New read creates new cache entry

6. Symlinks

Problem: Symlink resolution and caching

Solution:

  • Resolve symlinks to real paths
  • Cache by resolved path
  • Handle broken symlinks gracefully
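
The resolution step can be a one-line wrapper around `fs.realpathSync`, with the broken-link case handled by falling back to the literal path (`realpathSync` throws when the target is missing):

```typescript
import * as fs from "fs";
import * as path from "path";
import * as os from "os";

// Key the cache by the resolved real path so a symlink and its target
// share one entry; broken links fall back to the literal path.
function cacheKeyFor(filePath: string): string {
  try {
    return fs.realpathSync(filePath);
  } catch {
    return filePath; // broken symlink or nonexistent file
  }
}
```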

Configuration

interface EphemeralCacheConfig {
  enabled: boolean;       // Default: true
  maxFiles: number;       // Default: 50
  maxTotalSize: number;   // Default: 5MB (in bytes)
  maxFileSize: number;    // Default: 1MB (in bytes)
  fileWatching: boolean;  // Default: false (opt-in)
  cacheValidation: 'always' | 'on-change' | 'manual'; // Default: 'always'
  cacheTTL: number;       // Default: 300000 (5 min, matches Anthropic default)
  allowedExtensions?: string[]; // Default: all text files
  excludedPaths?: string[];     // Default: node_modules, .git, etc.
}

Debugging & Observability

// Cache statistics
interface CacheStats {
  totalFiles: number;
  totalSize: number;
  hits: number;        // Files served from cache
  misses: number;      // Files read fresh
  updates: number;     // External change detections
  evictions: number;   // LRU evictions
  cacheEfficiency: number; // hits / (hits + misses)
}

// Logging
console.log('[Cache] Statistics:', getCacheStats());
console.log('[Cache] Current files:', Array.from(fileCache.keys()));
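
A possible shape for the `getCacheStats()` call above; where the counters live is this sketch's assumption (plain module state bumped by the Read/Write handlers), since the proposal doesn't specify it:

```typescript
interface CacheStats {
  totalFiles: number;
  totalSize: number;
  hits: number;
  misses: number;
  updates: number;
  evictions: number;
  cacheEfficiency: number;
}

// Module-level counters, incremented by the tool handlers (assumed).
const counters = { hits: 0, misses: 0, updates: 0, evictions: 0 };
const fileCache = new Map<string, { content: string }>();

function getCacheStats(): CacheStats {
  const lookups = counters.hits + counters.misses;
  let totalSize = 0;
  for (const { content } of fileCache.values()) totalSize += content.length;
  return {
    totalFiles: fileCache.size,
    totalSize,
    ...counters,
    cacheEfficiency: lookups === 0 ? 0 : counters.hits / lookups,
  };
}
```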

Testing Strategy

Unit Tests

  • File read/write/edit with cache updates
  • Hash change detection
  • Cache validation logic
  • Eviction policy
  • Edge cases (deleted files, large files, etc.)

Integration Tests

  • Full session with file operations
  • External file modifications
  • Git operations during session
  • Build tool interactions
  • Prompt construction with cache

Performance Tests

  • Cache validation overhead (should be <50ms)
  • Large cache performance (50+ files)
  • Memory usage profiling
  • Prompt caching effectiveness

Migration Path

Phase 1: Beta Flag

const USE_EPHEMERAL_CACHE = process.env.CLAUDE_CODE_EPHEMERAL_CACHE === 'true';

  • Opt-in via environment variable
  • Gather feedback from beta users
  • Monitor cache hit rates and performance

Phase 2: Default Enabled

  • Enable by default after validation
  • Keep flag for opt-out
  • Monitor error rates and user feedback

Phase 3: Remove Old Path

  • After several versions, remove non-cached flow
  • Keep configuration for cache tuning

Success Metrics

Target Metrics (measure after 1 month):

  • Average session length: +50% increase
  • Context usage per session: -30% reduction
  • Cache hit rate: >80%
  • User-reported “hit context limit”: -60% reduction
  • API cost per session: -30% reduction (with caching)

Monitoring:

  • Track cache statistics per session
  • A/B test with control group (old behavior)
  • User surveys on session experience

Prior Art & Similar Patterns

  • Anthropic Prompt Caching - cache_control breakpoints with a 5-minute default TTL
  • Letta Memory Blocks - Mutable context management
  • Cursor IDE - Similar file caching approach
  • Continue.dev - Context management for coding assistants

Conclusion

Ephemeral file context is a high-impact, medium-effort feature that addresses the single biggest source of context bloat in Claude Code sessions.

Key benefits:

  • 30-40% context reduction
  • 1.5-2x longer sessions
  • 90% cost reduction on file reads (with caching)
  • Better handling of external file changes
  • Minimal performance overhead

Recommendation: Implement Phase 1 (core + check-on-use) immediately. Add file watching in Phase 2 based on user feedback.

This is the 80/20 optimization for Claude Code context efficiency.