Purpose

This document provides practical examples and visualizations to help understand what 100,000 tokens (100k) of context means in terms of articles, codebases, documentation, and other content types.

Key Finding: The 75,000 Word Rule

100,000 tokens ≈ 75,000 words

This is the most reliable conversion ratio for English text, as confirmed by Anthropic’s original 100K context window announcement.

Token-to-Text Fundamentals

Basic Ratios

  • 1 token ≈ 0.75 words (or 75 words per 100 tokens)
  • 1 token ≈ 4-5 characters in English text
  • Token counts vary by language, format, and content type
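These ratios can be wrapped in a rough estimator. A minimal sketch, assuming English prose and the ~0.75 words-per-token ratio above (the function names are illustrative, not from any library; use a model's real tokenizer when accuracy matters):

```python
# Rough token estimators based on the ratios above.
# These are heuristics for English prose, not a real tokenizer.

WORDS_PER_TOKEN = 0.75   # 1 token ~= 0.75 English words
CHARS_PER_TOKEN = 4      # 1 token ~= 4-5 characters; 4 is the conservative end

def tokens_from_words(word_count: int) -> int:
    """Estimate token count from a word count."""
    return round(word_count / WORDS_PER_TOKEN)

def tokens_from_chars(char_count: int) -> int:
    """Estimate token count from a character count."""
    return round(char_count / CHARS_PER_TOKEN)

print(tokens_from_words(75_000))   # ~100,000 tokens: the 75,000-word rule
```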

Why Tokens Aren’t Words

Tokens are subword units that LLMs use for processing. Common words might be single tokens, while uncommon words or technical terms may be split into multiple tokens.

Practical Size Comparisons

100k Tokens as Text

  • Words: ~75,000 words
  • Pages: ~150 pages (single-spaced, at ~500 words per page)
  • Books: ~13% of “War and Peace”
  • Harry Potter: roughly the length of the first Harry Potter book (76,944 words)
  • Characters: ~400,000-500,000 characters
  • Audio: ~6-9 hours of transcribed speech

100k Tokens as Code

Code token density varies significantly by language:

  • Python: ~10,000 lines (more verbose, clearer syntax)
  • JavaScript: ~14,285 lines (shorter tokens, compact syntax)
  • SQL: ~8,695 lines (denser, more keywords)
  • Average: ~5,000-10,000 lines (conservative estimate across languages)

Rule of Thumb:

  • 100 lines of Python ≈ 1,000 tokens
  • 100 lines of JavaScript ≈ 700 tokens
  • 100 lines of SQL ≈ 1,150 tokens
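The per-language ratios above can be turned into a quick budget check. A sketch using the rule-of-thumb densities (treat the numbers as rough averages, not measured values):

```python
# Approximate token density per 100 lines of code, from the
# rule-of-thumb ratios above. These are rough averages.
TOKENS_PER_100_LINES = {
    "python": 1_000,
    "javascript": 700,
    "sql": 1_150,
}

def estimated_tokens(lines: int, language: str) -> int:
    """Estimate how many tokens `lines` lines of code will consume."""
    return lines * TOKENS_PER_100_LINES[language.lower()] // 100

def fits_in_context(lines: int, language: str, budget: int = 100_000) -> bool:
    """Check whether a file or codebase fits in a token budget."""
    return estimated_tokens(lines, language) <= budget

print(estimated_tokens(10_000, "python"))  # 100,000 -> exactly one 100k window
```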

100k Tokens as Documentation

  • API Documentation: 10-15 large API reference docs
  • Technical Specifications: 3-5 comprehensive specs
  • Research Papers: 15-20 academic papers (assuming ~5,000 words each)
  • README Files: 50-100 detailed README files
  • Corporate Reports: 1-2 full annual reports

Context Window Comparisons

To put 100k in perspective:

  • 8,000 tokens (~6,000 words): detailed conversation, single code file
  • 32,000 tokens (~24,000 words): multiple related files, a documentation section
  • 100,000 tokens (~75,000 words): entire codebase, multiple research papers
  • 128,000 tokens (~96,000 words): full codebases, hour-long meeting transcripts
  • 200,000 tokens (~150,000 words): hundreds of pages, multiple books
  • 1,000,000 tokens (~750,000 words): longer than “War and Peace” (~1.3x its length)
  • 100,000,000 tokens (~75M words): 10M+ lines of code, ~750 novels

Real-World Context Consumption

Claude Code Sessions: Processing Novels Daily

200k token context limit ≈ 2 Harry Potter books

In a typical Claude Code coding session that hits the 200k token limit, you’re collectively reading and generating roughly two novels’ worth of text:

  • Reading codebase files (often repeatedly)
  • Tool outputs (grep, bash, git diffs)
  • Generated code and explanations
  • Conversation back-and-forth
  • System prompts and instructions

Multiple sessions per day? You’re easily processing 5-10+ Harry Potter books worth of information daily. That’s an incredible amount of information throughput when you think about it as literature.

The key insight: context consumption ≠ content creation. Most of the 200k is reading existing code repeatedly, not generating new content. You might only generate 20-30k tokens (~15,000-22,000 words) of truly new text per session, but you’re processing 200k tokens total.

Think of it like a researcher who reads 10 books to write 1 paper - the context window is all the reading, not just the writing.

Real-World Use Cases for 100k Context

With 100,000 token context windows, you can:

Business & Finance

  • Analyze full annual reports for strategic risks and opportunities
  • Digest and summarize dense financial statements
  • Process complete quarterly earnings calls with Q&A
  • Assess pros and cons of legislation
  • Identify risks and themes across multiple legal documents
  • Compare different forms of legal argument

Development

  • Process medium-sized codebases in a single context
  • Include entire API documentation sets
  • Analyze dependencies and relationships across many files

Research

  • Summarize and synthesize multiple research papers
  • Extract themes across large document collections
  • Cross-reference findings from different sources

Visualization Tools

Interactive Visualizations

  1. One Million Tokens Visualized

    • Interactive tool showing token size and meaning
    • Visualizes words, characters, and pages at scale
    • Helpful for understanding larger context windows
  2. GitHub 128k Tokens Visualization

    • Visual comparison of different LLM context window sizes
    • Shows relative scale between models
  3. Token Translator

    • Convert between tokens and different content types
    • Practical calculator for planning context usage


Quick Reference Examples

Example 1: A Medium-Sized Web Application

Frontend:
- React components: ~3,000 lines
- CSS/styling: ~2,000 lines
- TypeScript types: ~1,000 lines
Backend:
- API routes: ~2,000 lines
- Database models: ~1,000 lines
- Business logic: ~3,000 lines
Total: ~12,000 lines ≈ 100,000 tokens

Example 2: Technical Documentation Set

- Installation guide: ~5,000 words
- API reference: ~30,000 words
- Architecture overview: ~10,000 words
- Tutorial series: ~20,000 words
- FAQ: ~10,000 words
Total: ~75,000 words ≈ 100,000 tokens

Example 3: Research Compilation

- 15 academic papers × ~5,000 words each = 75,000 words
OR
- 5 comprehensive research reports × ~15,000 words each = 75,000 words
Total: ~75,000 words ≈ 100,000 tokens

The Context Inefficiency Problem

Why Files Are Read Repeatedly

When Claude Code reads a file multiple times, each read adds to the conversation history:

Turn 1: Read app.py (1,000 tokens) → added to conversation
Turn 5: Read app.py again (1,000 tokens) → added to conversation again
Turn 10: Read app.py again (1,000 tokens) → added to conversation again
Turn 15: Read app.py again (1,000 tokens) → added to conversation again
Total: 4,000 tokens for the same file

This happens because tool results are appended to the main conversation thread.
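The cost of this append-only pattern is easy to model. A toy simulation (hypothetical, not Claude Code’s actual implementation):

```python
# Toy model of an append-only conversation thread: every tool result
# is appended to history, so re-reading a file pays its full token
# cost every time.

def naive_context_cost(reads: list[tuple[str, int]]) -> int:
    """Total tokens added to history; reads = [(path, file_tokens), ...]."""
    return sum(tokens for _, tokens in reads)

# Reading the same 1,000-token file on turns 1, 5, 10, and 15:
session = [("app.py", 1_000)] * 4
print(naive_context_cost(session))  # 4,000 tokens for one file
```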

The Simple Solution: Ephemeral File Context

The architecture fix is straightforward:

Instead of:

[System Prompt]
[Conversation Thread]:
- User message
- Assistant message
- Tool call: Read(app.py)
- Tool result: <1000 tokens of app.py> ← Goes into main thread
- User message
- Assistant message
- Tool call: Read(app.py)
- Tool result: <1000 tokens of app.py> ← Duplicate in main thread

Do this:

[System Prompt]
[Ephemeral File Context - CACHED]:
app.py: <current state - 1000 tokens>
database.py: <current state - 800 tokens>
[Conversation Thread]:
- User message
- Assistant message
- Tool call: Read(app.py) (reference only, no full result)
- User message
- Assistant message (already has app.py from ephemeral context)

When sending to the API, construct:

  1. System prompt
  2. Ephemeral file context with current file states only
  3. Conversation thread (just messages, not tool result contents)

File updates: When a file is edited, update it in-place in the ephemeral context.

Result: Reading a 1000-token file 10 times = 1000 tokens total (not 10,000).

This is a major source of potential context savings in coding sessions.
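The ephemeral-context idea can be sketched as a side-store keyed by file path, updated in place, so each file is charged once regardless of how often it is read. A hypothetical illustration, not Claude Code’s actual code:

```python
# Sketch of an ephemeral file context: a side-store of current file
# states, keyed by path. Re-reads and edits overwrite in place, so
# each file costs its size once, no matter how often it is touched.

class EphemeralFileContext:
    def __init__(self) -> None:
        self.files: dict[str, str] = {}  # path -> current contents

    def read(self, path: str, contents: str) -> None:
        self.files[path] = contents       # re-read: overwrite, no duplicate

    def edit(self, path: str, new_contents: str) -> None:
        self.files[path] = new_contents   # file update: replace in place

    def token_cost(self, tokens_per_char: float = 0.25) -> int:
        """Rough token cost of the whole store (~4 chars per token)."""
        return round(sum(len(c) for c in self.files.values()) * tokens_per_char)

ctx = EphemeralFileContext()
for _ in range(10):                   # read the same file ten times
    ctx.read("app.py", "x" * 4_000)   # ~1,000 tokens of content
print(ctx.token_cost())               # ~1,000 tokens total, not 10,000
```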

Why Claude Code Doesn’t Do This Yet

The fix is architecturally straightforward, but requires:

  1. Refactoring tool result handling - Tool results currently flow directly into conversation history
  2. State tracking - Need to maintain a side-store of current file states keyed by path
  3. Cache invalidation logic - Determine when to update ephemeral context vs. use cached version
  4. Prompt construction changes - Build API calls with separate ephemeral + conversation sections

Prompt caching enables this: Anthropic’s prompt caching is designed exactly for this pattern:

  • Mark ephemeral file context as cacheable
  • 90% cost reduction for cached tokens
  • 85% latency reduction
  • Cache invalidates when content changes (5-min or 1-hour TTL)
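In the Messages API, this pattern maps onto a cacheable content block placed ahead of the conversation. A minimal sketch of the request shape (model name and file contents are illustrative; check Anthropic’s prompt-caching docs for current details):

```python
# Sketch of a Messages API request that caches the file context.
# The cache_control marker asks the API to cache everything up to and
# including that block, so unchanged file state is far cheaper to
# resend on subsequent requests.

request = {
    "model": "claude-sonnet-4-5",   # illustrative model name
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "You are a coding assistant."},
        {
            "type": "text",
            "text": "Current files:\napp.py:\n<file contents>",
            "cache_control": {"type": "ephemeral"},  # cache breakpoint
        },
    ],
    "messages": [
        {"role": "user", "content": "Refactor the handler in app.py."}
    ],
}
```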

The implementation just hasn’t been prioritized, likely because:

  • Claude Code was built before prompt caching existed
  • Engineering effort vs. benefit tradeoff
  • Edge cases around non-file tool results (git diffs, grep output, etc.)

Impact of Ephemeral File Context

If Claude Code used ephemeral file context:

Current 200k session breakdown:

  • 80k tokens: Repeated file reads (same files, multiple times)
  • 40k tokens: Tool outputs (grep, bash, git)
  • 30k tokens: System prompts
  • 30k tokens: Assistant responses
  • 20k tokens: User messages

With ephemeral file context:

  • 20k tokens: File states (current only, not repeated)
  • 40k tokens: Tool outputs
  • 30k tokens: System prompts
  • 30k tokens: Assistant responses
  • 20k tokens: User messages

Result: 200k session → ~140k session (30% reduction), or equivalently, much longer sessions before hitting limits.
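The arithmetic behind that estimate, as a quick check (the breakdown figures are the illustrative ones above):

```python
# Sanity check on the session breakdowns above.
current = {"file_reads": 80_000, "tool_output": 40_000,
           "system": 30_000, "assistant": 30_000, "user": 20_000}
ephemeral = {**current, "file_reads": 20_000}  # files charged once

before, after = sum(current.values()), sum(ephemeral.values())
print(before, after, f"{(before - after) / before:.0%}")  # 200000 140000 30%
```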

This doesn’t require architectural changes to transformers - just a different way of organizing the prompt sent to the API.

Important Considerations

Token Count Variability

Token counts are not fixed - they depend on:

  1. Language: Non-English text often requires more tokens
  2. Technical Terms: Specialized vocabulary may be split into multiple tokens
  3. Code vs Prose: Code typically uses tokens differently than natural language
  4. Format: JSON, XML, and structured data have different token patterns

Context Window ≠ Usable Context

While models may have 100k token windows:

  • Input + Output share the window
  • Reserve tokens for the model’s response
  • Some tasks need buffer space for reasoning
  • Quality may degrade at maximum capacity

The Cost Factor

Larger contexts cost more:

  • Most APIs charge per token (input + output)
  • 100k token requests are expensive at scale
  • Consider if you need the full context or can use RAG/chunking strategies
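As a concrete example, assume an input price of $3 per million tokens - an assumption for illustration only, since prices vary by model and change over time:

```python
# Hypothetical cost check; $3 per million input tokens is an assumed
# price for illustration, not any provider's current rate.
PRICE_PER_MTOK = 3.00

def request_cost(input_tokens: int) -> float:
    """Input-side cost of a single request, in dollars."""
    return input_tokens / 1_000_000 * PRICE_PER_MTOK

print(f"${request_cost(100_000):.2f}")  # $0.30 per full-context request
```

At thousands of requests per day, that per-request cost compounds quickly, which is why RAG or chunking is worth considering.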

Historical Context

Claude’s 100K Breakthrough (May 2023)

Anthropic’s introduction of 100K context windows was a major milestone, representing a ~5-10x increase over previous models. This enabled entirely new use cases like:

  • Full codebase analysis
  • Multi-document synthesis
  • Long-form document generation
  • Complex conversation threads

Today, context windows have grown even larger (200k, 1M, even 100M tokens), but 100k remains a practical sweet spot for many applications balancing capability and cost.

Sources

  1. Introducing 100K Context Windows - Anthropic
  2. One Million Tokens Visualized
  3. GitHub - 128k Tokens Visualization
  4. Understanding LLM Token Counts - Medium
  5. Visualizing Token Limits in LLMs - Galecia Group
  6. Calculating LLM Token Counts: A Practical Guide - Winder AI
  7. Code to Tokens Conversion: A Developer’s Guide - 16x Prompt
  8. Sebastian Raschka on X: Harry Potter Token Count
  9. Codebase Token Counter - GitHub
  10. Token Translator Tool