Purpose

Guide to using Voyage AI’s reranker API (rerank-2.5 series) for improving retrieval quality and implementing multi-label search strategies across knowledge graph entity types.

Voyage AI Reranker (August 2025)

Latest Models

Voyage AI’s rerank-2.5 series introduces instruction-following capabilities and 8x longer context than competitors:

| Model | Context | Best For | Performance vs Cohere v3.5 |
|---|---|---|---|
| rerank-2.5 | 32K tokens | Quality-critical retrieval | +7.94% (standard), +12.70% (MAIR) |
| rerank-2.5-lite | 32K tokens | Latency-sensitive applications | +7.16% (standard), +10.36% (MAIR) |
| rerank-2 | 16K tokens | Legacy | |
| rerank-2-lite | 8K tokens | Legacy | |

Context Advantage: 32K tokens = 8x Cohere Rerank v3.5, 2x rerank-2, enabling accurate retrieval across longer documents.

How It Works

Voyage’s reranker is a cross-encoder that:

  1. Takes a query and list of candidate documents
  2. Jointly processes each query-document pair
  3. Outputs relevance scores for precise ranking
  4. Refines results from fast embedding-based retrieval

Unlike bi-encoders (embeddings), cross-encoders see both query and document together, providing higher accuracy at the cost of higher latency.
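The trade-off can be sketched with toy scoring functions (not real models; the dot product and token-overlap scorers below only illustrate where each architecture does its work):

```python
def bi_encoder_score(query_vec, doc_vec):
    """Bi-encoder: query and document are embedded independently, then
    compared with a cheap dot product. Document vectors can be precomputed."""
    return sum(q * d for q, d in zip(query_vec, doc_vec))

def cross_encoder_score(query, document):
    """Cross-encoder: the model reads query and document together, so it can
    weigh term interactions. Here we fake that with simple token overlap."""
    q_tokens = set(query.lower().split())
    d_tokens = set(document.lower().split())
    return len(q_tokens & d_tokens) / max(len(q_tokens), 1)

# Bi-encoder: fast, works on precomputed vectors
print(round(bi_encoder_score([0.1, 0.9], [0.2, 0.8]), 4))

# Cross-encoder: must run per query-document pair at query time
print(cross_encoder_score("bun link command", "Tool: bun link for local development"))
```

This is why the two-stage pattern works: the bi-encoder narrows thousands of documents to a shortlist cheaply, and the cross-encoder spends its latency budget only on that shortlist.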

API Usage

Python SDK:

```python
import voyageai

client = voyageai.Client(api_key="your-api-key")

query = "bun link command"
documents = [
    "Tool: bun link. Bun's package linking command for local development workflows",
    "Tool: Pluribus. Multi-player poker AI system developed by Facebook AI",
    "Technology: Bun Package Manager. Fast package manager included in Bun runtime",
]

result = client.rerank(
    query=query,
    documents=documents,
    model="rerank-2.5-lite",
    top_k=3,
)

for doc in result.results:
    print(f"{doc.index}: {doc.relevance_score:.4f}")
```

REST API:

```sh
curl https://api.voyageai.com/v1/rerank \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $VOYAGE_API_KEY" \
  -d '{
    "query": "bun link command",
    "documents": ["...", "...", "..."],
    "model": "rerank-2.5-lite",
    "top_k": 3
  }'
```

API Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Search query (max 8K tokens for rerank-2.5) |
| documents | List[str] | Yes | Up to 1,000 documents to rerank |
| model | string | Yes | rerank-2.5, rerank-2.5-lite, etc. |
| top_k | int | No | Return only the top-k most relevant results |
| truncation | bool | No | Auto-truncate oversized inputs (default: true) |

Token Limits:

  • Total tokens: (query tokens × doc count) + sum(document tokens)
  • rerank-2.5/lite: 600K max
  • rerank-1/lite: 300K max
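A rough pre-flight check against these limits could look like the sketch below. The ~4 characters/token heuristic is only an approximation, and `estimate_rerank_tokens`/`fits_limit` are hypothetical helpers; an accurate count would use Voyage's own tokenizer.

```python
def estimate_rerank_tokens(query: str, documents: list[str]) -> int:
    """Total billable tokens = (query tokens x doc count) + sum(document tokens),
    estimated at roughly 4 characters per token."""
    est = lambda text: max(1, len(text) // 4)
    return est(query) * len(documents) + sum(est(d) for d in documents)

def fits_limit(query: str, documents: list[str], limit: int = 600_000) -> bool:
    """Check a candidate batch against the rerank-2.5 total-token limit."""
    return estimate_rerank_tokens(query, documents) <= limit

docs = ["Tool: bun link. Bun's package linking command"] * 100
print(estimate_rerank_tokens("bun link command", docs))
```

Note that the query is counted once per document, so large candidate batches with long queries consume the budget faster than the document text alone suggests.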

Instruction-Following Capabilities

New in rerank-2.5: Steer relevance scoring with natural language:

```python
# Domain-specific relevance
query = "Instruction: Prioritize recent research papers over blog posts. Query: transformers architecture"

# Custom criteria
query = "Instruction: Focus on code examples with TypeScript. Query: react hooks"
```

This allows fine-grained control over what the reranker considers “relevant” without retraining.

Multi-Label Querying Strategy

The Challenge

Knowledge graphs contain multiple entity types (labels):

  • Document: Full research documents with comprehensive content
  • Technology: Technical tools/frameworks with descriptions
  • Tool: Specific utilities and commands
  • Concept: Abstract ideas and patterns

Searching across all labels requires a strategy to combine and rank heterogeneous results.

Architecture: Two-Stage Retrieval + Reranking

Stage 1: Vector Search Across Labels

```ts
// Lattice's current approach (src/graph/graph.service.ts)
async vectorSearchAll(
  queryEmbedding: number[],
  limit: number = 20
): Promise<SearchResult[]> {
  const labels = ["Document", "Technology", "Tool", "Concept", "Person", /* ... */];

  // Query each label separately
  const labelPromises = labels.map(label =>
    this.vectorSearch(label, queryEmbedding, limit)
  );
  const allResults = await Promise.all(labelPromises);

  // Flatten and sort by similarity score
  return allResults
    .flat()
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Problem: Similarity scores aren’t directly comparable across entity types with different text lengths/structures.

Stage 2: Cross-Encoder Reranking

Apply Voyage reranker to refine combined results:

```ts
async searchWithReranking(
  query: string,
  limit: number = 20
): Promise<SearchResult[]> {
  // Stage 1: Fast vector search (top 100)
  const queryEmbedding = await this.embedding.generateQueryEmbedding(query);
  const candidates = await this.graph.vectorSearchAll(queryEmbedding, 100);

  // Stage 2: Precise reranking (top 20)
  const documents = candidates.map(c =>
    `${c.label}: ${c.name}. ${c.description || ''}`
  );
  const reranked = await voyageClient.rerank({
    query: query,
    documents: documents,
    model: "rerank-2.5-lite",
    top_k: limit
  });

  // Map back to original entities
  return reranked.results.map(r => ({
    ...candidates[r.index],
    rerankScore: r.relevance_score
  }));
}
```

When to Use Reranking

| Scenario | Approach | Rationale |
|---|---|---|
| Simple entity search | Vector search only | Fast, good enough for single-label queries |
| Multi-label search | Vector + reranking | Essential for comparing heterogeneous entities |
| Long documents | Vector + reranking | Cross-encoder handles context better |
| Domain-specific | Reranking with instructions | Steer relevance criteria |
| Latency-critical | Vector only or rerank-2.5-lite | Balance speed vs accuracy |
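The table's decision logic can be encoded as a small helper (hypothetical; the strategy labels and precedence of the checks are illustrative choices, not Lattice code):

```python
def choose_strategy(multi_label: bool, latency_critical: bool,
                    has_instruction: bool = False) -> str:
    """Pick a retrieval strategy from simple query characteristics."""
    if latency_critical and not multi_label:
        return "vector-only"                   # fast path for single-label queries
    if has_instruction:
        return "rerank-2.5 + instruction"      # steer relevance criteria
    if multi_label:
        return "vector + rerank-2.5-lite"      # compare heterogeneous entities
    return "vector-only"

print(choose_strategy(multi_label=True, latency_critical=False))
```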

Example: Multi-Label Search in Practice

Before (vector search only):

Search: "bun link"
1. Pluribus (poker AI) - 84.21% ❌ High score, wrong domain
2. Libratus (poker AI) - 82.57% ❌ High score, wrong domain
3. bun link - 82.07% ✅ Correct but buried

After (query embeddings):

1. bun link - 69.90% ✅ Correct
2. bun-link.md - 65.06% ✅ Correct
3. Bun Package Manager - 57.25% ✅ Correct

With reranking (theoretical):

1. bun link - 0.95 ✅ Cross-encoder confirms relevance
2. Bun Package Manager - 0.89 ✅ Semantic connection
3. bun-link.md - 0.87 ✅ Document with full context

Implementation Considerations

Cost vs Benefit

Voyage reranker pricing (as of 2025):

  • rerank-2.5: ~$2.00 per 1M tokens
  • rerank-2.5-lite: ~$0.50 per 1M tokens

For Lattice with ~1K queries/day:

  • Vector search only: $0 (local computation)
  • Vector + reranking (lite): ~$5-10/month
  • Worth it? Yes for production, optional for personal use

Latency Impact

| Stage | Latency |
|---|---|
| Vector search (100 candidates) | ~10-50ms |
| Reranking (20 results) | ~100-300ms |
| Total | ~110-350ms |

Still well within acceptable range for interactive search.

Integration with Lattice

Proposed enhancement to src/commands/query.command.ts:

```ts
@Option({
  flags: "--rerank",
  description: "Use Voyage reranker for improved accuracy"
})
parseRerank(value: boolean): boolean {
  return value;
}

async run(inputs: string[], options: SearchCommandOptions): Promise<void> {
  const query = inputs[0];
  const queryEmbedding = await this.embeddingService.generateQueryEmbedding(query);

  let results: SearchResult[];
  if (options.rerank) {
    // Two-stage: vector + reranking
    const candidates = await this.graphService.vectorSearchAll(queryEmbedding, 100);
    results = await this.rerankResults(query, candidates, options.limit);
  } else {
    // Fast vector search only
    results = await this.graphService.vectorSearchAll(queryEmbedding, options.limit);
  }

  // Display results...
}
```

Research Findings: Multi-Label KG Querying

Recent research (2025) on knowledge graph retrieval with reranking:

Knowledge Graph-Guided RAG

  • Multi-path subgraph construction: Incorporate one-hop, multi-hop, and importance-based relations
  • Query-aware attention: Reward models score subgraph triples by semantic relevance
  • Key insight: embedding chunk metadata together with the text outperforms relying on powerful rerankers alone

ReranKGC Framework

  • Retrieve-and-rerank pipeline for multi-modal knowledge graph completion
  • Uses KGC-CLIP to extract multi-modal knowledge for candidate re-ranking
  • Published April 2025 in Neural Networks journal

AR-Align

  • Unsupervised multi-view contrastive learning for entity alignment
  • Attention-based reranking: Reranks hard entities by weighted similarity across different structures
  • Improves precision for ambiguous entity matching

GraphRAG Best Practices

From Neo4j Advanced RAG Techniques (2025):

  1. Knowledge graphs unify scattered data (docs, tables, APIs)
  2. GraphRAG retrieves along entity connections
  3. Improves disambiguation and multi-hop answers
  4. Keeps sources traceable

Recommendation for Lattice: Implement relationship-aware reranking that considers entity connections, not just text similarity.
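One way such relationship-aware reranking could look is sketched below. This is not Lattice code: the 80/20 weighting, the adjacency representation, and the neighbor-boost formula are all illustrative assumptions.

```python
def relationship_aware_scores(reranked, adjacency, alpha=0.8):
    """Blend cross-encoder relevance with a graph signal so connected
    entities rank together.

    reranked:  list of (entity_id, rerank_score) pairs
    adjacency: entity_id -> set of neighbor entity ids
    alpha:     weight on the text-relevance score (1 - alpha on the graph boost)
    """
    base = dict(reranked)
    combined = []
    for entity_id, score in reranked:
        # Average rerank score of neighbors that also appear in the result set
        neighbor_scores = [base[n] for n in adjacency.get(entity_id, set()) if n in base]
        boost = sum(neighbor_scores) / len(neighbor_scores) if neighbor_scores else 0.0
        combined.append((entity_id, alpha * score + (1 - alpha) * boost))
    return sorted(combined, key=lambda item: item[1], reverse=True)

reranked = [("bun-link", 0.95), ("bun-pm", 0.89), ("pluribus", 0.40)]
adjacency = {"bun-link": {"bun-pm"}, "bun-pm": {"bun-link"}, "pluribus": set()}
print(relationship_aware_scores(reranked, adjacency))
```

Here the two connected Bun entities reinforce each other, while the disconnected poker entity keeps only its (already low) text score.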

Best Practices

1. Use Query Input Type + Reranking

```ts
// Best: Both optimizations
const queryEmbed = await embedding.generateQueryEmbedding(query); // input_type="query"
const candidates = await graph.vectorSearchAll(queryEmbed, 100);
const results = await voyageClient.rerank({ query, documents: candidates, top_k: 20 });
```

2. Batch Reranking for Efficiency

```ts
// Efficient: Rerank 100 candidates → 20 results (1 API call)
// Inefficient: Rerank each label separately (multiple API calls)
```

3. Cache Reranking Results

For common queries, cache reranking results to avoid redundant API calls.
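A minimal in-process cache could look like this sketch; the key scheme and helper names are illustrative, and a production version would add a TTL and an eviction policy (e.g. LRU) plus invalidation when the graph changes.

```python
import hashlib
import json

_rerank_cache: dict[str, list] = {}

def cache_key(query: str, documents: list[str]) -> str:
    """Key on both the query and the exact candidate set, since reranking
    the same query over different candidates gives different results."""
    payload = json.dumps({"q": query, "docs": documents}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def rerank_cached(query, documents, rerank_fn):
    """Call rerank_fn (e.g. a wrapper around the Voyage API) at most once
    per (query, documents) pair."""
    key = cache_key(query, documents)
    if key not in _rerank_cache:
        _rerank_cache[key] = rerank_fn(query, documents)  # one API call
    return _rerank_cache[key]
```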

4. Use Instructions for Domain-Specific Queries

```ts
const domainQuery = `Instruction: Prioritize technical documentation over blog posts. Query: ${userQuery}`;
```

5. Monitor Token Usage

```ts
// Estimate total tokens before calling rerank:
// (query tokens × doc count) + sum(document tokens), at ~4 characters/token
const estimatedTokens =
  (query.length * documents.length + documents.reduce((sum, d) => sum + d.length, 0)) / 4;
if (estimatedTokens > 600_000) {
  // Reduce candidate count or truncate documents
}
```

Comparison: Rerankers (2025)

| Provider | Model | Context | Price/1M tokens | Performance |
|---|---|---|---|---|
| Voyage AI | rerank-2.5 | 32K | ~$2.00 | Best (MAIR +12.70%) |
| Voyage AI | rerank-2.5-lite | 32K | ~$0.50 | Good (MAIR +10.36%) |
| Cohere | Rerank v3.5 | 4K | ~$2.00 | Baseline |
| Jina AI | jina-reranker-v2 | 8K | $0.70 | Competitive |

Voyage offers the longest context and best performance in 2025.

Last updated: 2025-12-07