ultrathink-vs-thinking-mode

Executive Summary

Ultrathink was a Claude Code v1-specific keyword that triggered extended thinking with a budget of ~32K tokens. Claude Code v2 replaced it with a simple “Thinking On/Off” toggle (TAB key). The Extended Thinking API remains the programmatic way to control thinking budgets when calling Claude through the API.

Three Distinct Concepts

Feature	Where It Works	How to Activate	Token Budget
Ultrathink (deprecated)	Claude Code v1 only	Keywords: “think”, “megathink”, “ultrathink”	4K → 10K → 32K
Thinking On/Off	Claude Code v2	TAB key toggle	Configurable via settings
Extended Thinking API	API, Claude App	`thinking: {type: "enabled", budget_tokens: N}`	User-specified

Ultrathink (Claude Code v1 - Deprecated)

What It Was

In Claude Code v1, specific phrases triggered increasing levels of thinking budget:

"think"        → 4,000 tokens   (routine debugging)
"megathink"    → 10,000 tokens  (architectural decisions)
"ultrathink"   → 31,999 tokens  (deep sustained reasoning)

Trigger Phrases

The system recognized various phrasings:

Low: “think”, “think about this”
Medium: “think hard”, “think deeply”, “think more”, “megathink”
High: “ultrathink”, “think harder”, “think intensely”, “think longer”, “think really hard”, “think super hard”
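The three headline keywords under "What It Was" amount to a lookup from keyword to budget. A minimal sketch, assuming whole-word matching and covering only those three keywords (the longer phrasings are omitted); the function name and the matching rule are illustrative, not Claude Code's actual implementation:

```python
import re

# Budgets from the keyword list above, checked from the largest tier down so
# that a prompt containing several keywords gets the highest matching budget.
KEYWORD_BUDGETS = [
    ("ultrathink", 31_999),
    ("megathink", 10_000),
    ("think", 4_000),
]


def thinking_budget(prompt: str) -> int:
    """Return the thinking budget implied by a prompt, or 0 if none applies."""
    text = prompt.lower()
    for keyword, budget in KEYWORD_BUDGETS:
        # \b keeps "think" from matching inside words such as "rethinking".
        if re.search(rf"\b{re.escape(keyword)}\b", text):
            return budget
    return 0
```

Checking the largest tier first is a design choice of this sketch: a prompt that mentions both "think" and "ultrathink" resolves to the larger budget.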

Why It Was Removed

Claude Code v2 simplified the UX by replacing keyword-based levels with a simple toggle, providing more transparent control.

Claude Code v2: Thinking On/Off

Current Approach

TAB key toggles thinking mode on/off in Claude Code v2:

Thinking On: Claude performs extended reasoning before responding
Thinking Off: Claude responds immediately without extended thinking

Configuration

Token budget is now configurable in Claude Code settings rather than keyword-based.

Migration Guide

Claude Code v1	Claude Code v2
“ultrathink this problem”	Press TAB → Thinking On
“think about X”	Press TAB → Thinking On
Normal prompts	Thinking Off (default)

Extended Thinking API

API Usage

{
  "model": "claude-opus-4-5-20251101",
  "max_tokens": 16000,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  },
  "messages": [
    {
      "role": "user",
      "content": "Solve this complex problem..."
    }
  ]
}
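The same request body can be assembled and sanity-checked in code before it is sent. A sketch under two constraints the API enforces for standard (non-interleaved) requests: `budget_tokens` must be at least 1,024 and must be smaller than `max_tokens`. The helper name `build_thinking_request` is hypothetical, and nothing here performs a network call:

```python
MIN_BUDGET_TOKENS = 1_024  # smallest thinking budget the API accepts


def build_thinking_request(prompt: str, budget_tokens: int, max_tokens: int,
                           model: str = "claude-opus-4-5-20251101") -> dict:
    """Build a Messages API request body with extended thinking enabled."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        # max_tokens covers the thinking tokens plus the visible answer.
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }
```

Calling `build_thinking_request("Solve this complex problem...", 10_000, 16_000)` returns a dictionary that can be serialized with `json.dumps` and posted to the Messages endpoint.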

Token Budget Recommendations

Task Complexity	Recommended Budget	Use Case
Simple	Skip extended thinking	Syntax fixes, formatting
Moderate	2,000 - 5,000 tokens	Code review, debugging
Complex	5,000 - 15,000 tokens	Architecture design, refactoring
Very Complex	15,000 - 32,000 tokens	Novel algorithms, research
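The table can be encoded directly so that callers pick a starting budget by task category. A small sketch; the category keys and the choice to start at the low end of each range are assumptions of this example:

```python
from typing import Optional

# Ranges copied from the table above; None means "skip extended thinking".
BUDGET_RANGES = {
    "simple": None,
    "moderate": (2_000, 5_000),
    "complex": (5_000, 15_000),
    "very_complex": (15_000, 32_000),
}


def starting_budget(complexity: str) -> Optional[int]:
    """Return the low end of the recommended range, or None to skip thinking."""
    budget_range = BUDGET_RANGES[complexity]
    return None if budget_range is None else budget_range[0]
```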

Cost Implications

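Extended thinking tokens are billed as output tokens, not input tokens. The budget is a ceiling rather than a fixed charge, so the figures below are a worst case.

Example: Opus 4.5 with 10K thinking budget

Rates: $5/M input tokens, $25/M output tokens (confirm against current pricing)
Thinking: 10,000 tokens × $25/M = $0.25
Output: 1,000 tokens × $25/M = $0.025
Total: $0.275 per request, plus input tokens at the input rate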
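The arithmetic generalizes to a small helper. A sketch, assuming thinking tokens are billed at the output rate alongside the visible output; the function name is hypothetical and the rates are parameters so they can be replaced with current pricing:

```python
def request_cost_usd(input_tokens: int, thinking_tokens: int, output_tokens: int,
                     input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Estimate the cost of one request in US dollars."""
    # Thinking tokens are counted together with the visible output tokens.
    billed_output = thinking_tokens + output_tokens
    total = input_tokens * input_rate_per_m + billed_output * output_rate_per_m
    return total / 1_000_000
```

For example, `request_cost_usd(2_000, 10_000, 1_000, 5.0, 25.0)` prices a request with 2,000 input tokens, a fully used 10K thinking budget, and 1,000 output tokens at assumed rates of $5/M and $25/M.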

Interleaved Thinking (Claude 4+)

What It Enables

With interleaved thinking, Claude can think between tool calls rather than only before the first response.

API Usage

curl https://api.anthropic.com/v1/messages \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: interleaved-thinking-2025-05-14" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 16000,
    "thinking": {
      "type": "enabled",
      "budget_tokens": 10000
    },
    "tools": [...],
    "messages": [...]
  }'

Benefits

More sophisticated reasoning after receiving tool results
Better decision-making in multi-step workflows
Adaptive strategy based on intermediate findings
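One way to see the difference is to look at the order of content blocks in a tool-using exchange. The sketch below assumes the exchange has been flattened into a list of block type names (`thinking`, `tool_use`, `tool_result`, `text`); the flattening and the function name are assumptions of this example, since the API spreads these blocks across separate messages:

```python
from typing import Iterable


def thinks_after_tool_result(block_types: Iterable[str]) -> bool:
    """Return True if a "thinking" block appears after any "tool_result" block."""
    seen_tool_result = False
    for block_type in block_types:
        if block_type == "tool_result":
            seen_tool_result = True
        elif block_type == "thinking" and seen_tool_result:
            return True
    return False


# Without interleaving: one thinking block up front, then tool calls.
upfront_only = ["thinking", "tool_use", "tool_result", "tool_use", "tool_result", "text"]
# With interleaving: Claude reasons again after each tool result.
interleaved = ["thinking", "tool_use", "tool_result", "thinking", "tool_use",
               "tool_result", "thinking", "text"]
```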

The “Think” Tool (Different Concept)

Key Distinction

Feature	When It Happens	Purpose
Extended Thinking	BEFORE generating response	Deep reasoning on the problem
Think Tool	DURING response generation	Pause to check if more info needed

Think Tool Use Case

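# Schematic, not literal syntax: Claude drafts code, then calls the think tool.
def process_data(data):
    # ... some code ...

# tool call → think(thought="Wait, I should check if the user wants error
#                   handling for edge cases. Let me ask before continuing.")

The think tool lets Claude pause mid-response and reason about whether it has the information it needs. The tool call only records the thought; it does not fetch anything new by itself.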
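In practice a think tool is declared like any other tool, with a handler that does nothing except record the thought. A sketch in the Messages API `tools` format; the description wording, the `thought` field name, and the `handle_think` helper are assumptions chosen for illustration:

```python
from typing import Dict, List

# Tool definition passed in the "tools" array of a request.
THINK_TOOL = {
    "name": "think",
    "description": (
        "Use this tool to think about something. It does not fetch new "
        "information or change any state; it only records the thought."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to think about."}
        },
        "required": ["thought"],
    },
}


def handle_think(tool_input: Dict[str, str], log: List[str]) -> str:
    """Record the thought and return a short acknowledgement as the tool result."""
    log.append(tool_input["thought"])
    return "Thought recorded."
```

Because the handler has no side effects beyond the log, the tool is cheap to offer; its value comes from giving the model a designated place to reason between steps.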

When to Use Each Mode

Use Extended Thinking (API/App) When:

✅ Solving novel problems without clear solutions
✅ Complex architectural decisions
✅ Multi-step mathematical reasoning
✅ Research and analysis tasks
✅ Code that requires exploring multiple approaches

Skip Extended Thinking When:

❌ Simple syntax fixes
❌ Formatting tasks
❌ Well-specified problems with clear instructions
❌ Iteration speed is critical
❌ Budget-conscious applications

Warning: Extended thinking can make Claude MORE verbose and LESS accurate on basic tasks while adding latency and cost.

Performance Characteristics

Accuracy vs Thinking Tokens

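Claude’s accuracy tends to improve roughly logarithmically with thinking tokens, so each increase in budget buys less than the last. The figures below are rough estimates for illustration, not measurements:

1,000 tokens  → Baseline
5,000 tokens  → +10% accuracy (estimated)
10,000 tokens → +15% accuracy (estimated)
32,000 tokens → +20% accuracy (estimated)

Expect diminishing returns after ~15K tokens for most tasks.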
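The diminishing returns are easier to see as gain per extra 1,000 thinking tokens. A short sketch that uses only the estimated figures listed above; the function name is hypothetical:

```python
# (thinking tokens, estimated accuracy gain in percent) from the list above.
POINTS = [(1_000, 0.0), (5_000, 10.0), (10_000, 15.0), (32_000, 20.0)]


def marginal_gain_per_1k(points):
    """Accuracy gain per extra 1,000 thinking tokens, for each interval."""
    gains = []
    for (tokens_lo, gain_lo), (tokens_hi, gain_hi) in zip(points, points[1:]):
        extra_thousands = (tokens_hi - tokens_lo) / 1_000
        gains.append((gain_hi - gain_lo) / extra_thousands)
    return gains
```

On these figures the gain per 1,000 tokens falls from 2.5 in the first interval to 1.0 in the second and roughly 0.23 in the third.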

Latency Impact

Thinking Budget	Added Latency	Total Time (estimate)
No thinking	0s	2-5s
5,000 tokens	+2-4s	4-9s
10,000 tokens	+4-8s	6-13s
32,000 tokens	+10-20s	12-25s

Common Misconceptions

❌ Myth: “Ultrathink” works in the API

Reality: The API requires an explicit budget_tokens parameter. Keywords like “ultrathink” have no effect outside Claude Code v1.

// ❌ This doesn't work
{"thinking": {"type": "ultrathink"}}

// ✅ This works
{"thinking": {"type": "enabled", "budget_tokens": 30000}}

❌ Myth: More thinking always = better results

Reality: Extended thinking is counterproductive for:

Simple, well-defined tasks
Tasks requiring quick iteration
Tasks where Claude already has clear instructions

❌ Myth: Extended thinking = different model

Reality: It’s the same model spending more time reasoning before responding, not a different model architecture.

Best Practices

1. Start Conservative

Begin with lower budgets (5K-10K) and increase only if:

Responses lack depth
Claude makes preventable mistakes
Task clearly benefits from more reasoning
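The escalation rule can be expressed as a loop that retries with a larger budget only when the answer falls short. A sketch in which `run` and `is_good_enough` are hypothetical callables supplied by the caller (for example, an API call and a test suite); the doubling step and the 32K cap are assumptions of this example:

```python
from typing import Callable, Optional, Tuple


def escalate_budget(run: Callable[[int], str],
                    is_good_enough: Callable[[str], bool],
                    start: int = 5_000,
                    cap: int = 32_000) -> Tuple[Optional[str], int]:
    """Retry with a doubled thinking budget until the answer passes or the cap is hit."""
    budget = start
    while True:
        answer = run(budget)
        if is_good_enough(answer):
            return answer, budget
        if budget >= cap:
            # Even the largest budget was not enough; report failure.
            return None, budget
        budget = min(budget * 2, cap)
```

With the defaults the loop tries 5,000, 10,000, 20,000, and finally 32,000 tokens, so a task that never passes costs four attempts at most.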

2. Match Budget to Task

Simple debugging     → Skip thinking
Code review          → 2K-5K tokens
Architecture design  → 10K-15K tokens
Research problems    → 15K-30K tokens

3. Monitor Cost vs Benefit

Track:

Success rate improvement vs baseline
Cost increase vs value gained
Time-to-solution vs thinking budget
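These three measures can be computed from a simple log of runs. A sketch that groups runs by thinking budget; the `RunRecord` fields and the `summarize` function are hypothetical names, and how success is judged is left to the caller:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class RunRecord:
    budget_tokens: int   # thinking budget used for the run (0 = thinking off)
    succeeded: bool      # whether the result met the caller's success criterion
    cost_usd: float      # total cost of the run
    seconds: float       # wall-clock time to a solution


def summarize(records: Iterable[RunRecord]) -> Dict[int, Dict[str, float]]:
    """Return success rate, mean time, and cost per success for each budget."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.budget_tokens].append(record)
    summary = {}
    for budget, runs in sorted(grouped.items()):
        successes = sum(r.succeeded for r in runs)
        total_cost = sum(r.cost_usd for r in runs)
        summary[budget] = {
            "success_rate": successes / len(runs),
            "mean_seconds": sum(r.seconds for r in runs) / len(runs),
            "cost_per_success": total_cost / successes if successes else float("inf"),
        }
    return summary
```

Comparing `cost_per_success` across budgets, rather than raw cost per request, shows whether a larger budget pays for itself through fewer failed attempts.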

4. Use in Agent Loops

Extended thinking is most valuable in agent loops where:

Single attempt must be highly accurate
Retry cost is high (time/money)
Wrong decisions compound over multiple steps
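The compounding effect is simple to quantify. A sketch, assuming each step succeeds independently with the same probability (a simplification; real agent steps are rarely independent):

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain is correct, assuming independence."""
    return per_step_accuracy ** steps
```

Under that assumption a 10-step loop at 95% per-step accuracy finishes correctly about 60% of the time, while 99% per step gives about 90%, which is why a modest per-step gain from extra thinking can matter more in long loops than in single-shot requests.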

Evolution Timeline

Date	Change	Impact
2025 Q1	Extended Thinking API launched (with Claude 3.7 Sonnet)	Programmable thinking budgets
2025 Q1	Claude Code v1: Ultrathink keywords	Easy access via “ultrathink” trigger
2025 Q1	“Think” tool introduced	Mid-response reasoning
2025 Q2	Claude 4 + Interleaved Thinking	Think between tool calls
2025 Q3	Claude Code v2: TAB toggle	Deprecated keyword-based levels
