Purpose

Guide to using Voyage AI’s reranker API (rerank-2.5 series) for improving retrieval quality and implementing multi-label search strategies across knowledge graph entity types.

Voyage AI Reranker (August 2025)

Latest Models

Voyage AI’s rerank-2.5 series introduces instruction-following capabilities and 8x longer context than competitors:

| Model | Context | Best For | Performance vs Cohere v3.5 |
|---|---|---|---|
| rerank-2.5 | 32K tokens | Quality-critical retrieval | +7.94% (standard), +12.70% (MAIR) |
| rerank-2.5-lite | 32K tokens | Latency-sensitive applications | +7.16% (standard), +10.36% (MAIR) |
| rerank-2 | 16K tokens | Legacy | |
| rerank-2-lite | 8K tokens | Legacy | |

Context Advantage: 32K tokens = 8x Cohere Rerank v3.5, 2x rerank-2, enabling accurate retrieval across longer documents.

How It Works

Voyage’s reranker is a cross-encoder that:

  1. Takes a query and list of candidate documents
  2. Jointly processes each query-document pair
  3. Outputs relevance scores for precise ranking
  4. Refines results from fast embedding-based retrieval

Unlike bi-encoders (embeddings), cross-encoders see both query and document together, providing higher accuracy at the cost of higher latency.
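The trade-off can be sketched with toy scoring functions (not real models; the dot product and token-overlap scorers below only illustrate where each architecture does its work):

```python
def bi_encoder_score(query_vec, doc_vec):
    """Bi-encoder: query and document are embedded independently, then
    compared with a cheap dot product. Document vectors can be precomputed."""
    return sum(q * d for q, d in zip(query_vec, doc_vec))

def cross_encoder_score(query, document):
    """Cross-encoder: the model reads query and document together, so it can
    weigh term interactions. Here we fake that with simple token overlap."""
    q_tokens = set(query.lower().split())
    d_tokens = set(document.lower().split())
    return len(q_tokens & d_tokens) / max(len(q_tokens), 1)

# Bi-encoder: fast, works on precomputed vectors
print(round(bi_encoder_score([0.1, 0.9], [0.2, 0.8]), 4))

# Cross-encoder: must run per query-document pair at query time
print(cross_encoder_score("bun link command", "Tool: bun link for local development"))
```

This is why the two-stage pattern works: the bi-encoder narrows thousands of documents to a shortlist cheaply, and the cross-encoder spends its latency budget only on that shortlist.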

API Usage

Python SDK:

```python
import voyageai

client = voyageai.Client(api_key="your-api-key")

query = "bun link command"
documents = [
    "Tool: bun link. Bun's package linking command for local development workflows",
    "Tool: Pluribus. Multi-player poker AI system developed by Facebook AI",
    "Technology: Bun Package Manager. Fast package manager included in Bun runtime",
]

result = client.rerank(
    query=query,
    documents=documents,
    model="rerank-2.5-lite",
    top_k=3,
)

for doc in result.results:
    print(f"{doc.index}: {doc.relevance_score:.4f}")
```

REST API:

```sh
curl https://api.voyageai.com/v1/rerank \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $VOYAGE_API_KEY" \
  -d '{
    "query": "bun link command",
    "documents": ["...", "...", "..."],
    "model": "rerank-2.5-lite",
    "top_k": 3
  }'
```

API Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Search query (max 8K tokens for rerank-2.5) |
| documents | List[str] | Yes | Up to 1,000 documents to rerank |
| model | string | Yes | rerank-2.5, rerank-2.5-lite, etc. |
| top_k | int | No | Return only the top-k most relevant results |
| truncation | bool | No | Auto-truncate oversized inputs (default: true) |

Token Limits:

  • Total tokens: (query tokens × doc count) + sum(document tokens)
  • rerank-2.5/lite: 600K max
  • rerank-1/lite: 300K max
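A rough pre-flight check against these limits could look like the sketch below. The ~4 characters/token heuristic is only an approximation, and `estimate_rerank_tokens`/`fits_limit` are hypothetical helpers; an accurate count would use Voyage's own tokenizer.

```python
def estimate_rerank_tokens(query: str, documents: list[str]) -> int:
    """Total billable tokens = (query tokens x doc count) + sum(document tokens),
    estimated at roughly 4 characters per token."""
    est = lambda text: max(1, len(text) // 4)
    return est(query) * len(documents) + sum(est(d) for d in documents)

def fits_limit(query: str, documents: list[str], limit: int = 600_000) -> bool:
    """Check a candidate batch against the rerank-2.5 total-token limit."""
    return estimate_rerank_tokens(query, documents) <= limit

docs = ["Tool: bun link. Bun's package linking command"] * 100
print(estimate_rerank_tokens("bun link command", docs))
```

Note that the query is counted once per document, so large candidate batches with long queries consume the budget faster than the document text alone suggests.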

Instruction-Following Capabilities

New in rerank-2.5: Steer relevance scoring with natural language:

```python
# Domain-specific relevance
query = "Instruction: Prioritize recent research papers over blog posts. Query: transformers architecture"

# Custom criteria
query = "Instruction: Focus on code examples with TypeScript. Query: react hooks"
```

This allows fine-grained control over what the reranker considers “relevant” without retraining.

Multi-Label Querying Strategy

The Challenge

Knowledge graphs contain multiple entity types (labels):

  • Document: Full research documents with comprehensive content
  • Technology: Technical tools/frameworks with descriptions
  • Tool: Specific utilities and commands
  • Concept: Abstract ideas and patterns

Searching across all labels requires a strategy to combine and rank heterogeneous results.

Architecture: Two-Stage Retrieval + Reranking

Stage 1: Vector Search Across Labels

```ts
// Lattice's current approach (src/graph/graph.service.ts)
async vectorSearchAll(
  queryEmbedding: number[],
  limit: number = 20
): Promise<SearchResult[]> {
  const labels = ["Document", "Technology", "Tool", "Concept", "Person", /* ... */];

  // Query each label separately
  const labelPromises = labels.map(label =>
    this.vectorSearch(label, queryEmbedding, limit)
  );
  const allResults = await Promise.all(labelPromises);

  // Flatten and sort by similarity score
  return allResults
    .flat()
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Problem: Similarity scores aren’t directly comparable across entity types with different text lengths/structures.

Stage 2: Cross-Encoder Reranking

Apply Voyage reranker to refine combined results:

```ts
async searchWithReranking(
  query: string,
  limit: number = 20
): Promise<SearchResult[]> {
  // Stage 1: Fast vector search (top 100)
  const queryEmbedding = await this.embedding.generateQueryEmbedding(query);
  const candidates = await this.graph.vectorSearchAll(queryEmbedding, 100);

  // Stage 2: Precise reranking (top 20)
  const documents = candidates.map(c =>
    `${c.label}: ${c.name}. ${c.description || ''}`
  );
  const reranked = await voyageClient.rerank({
    query: query,
    documents: documents,
    model: "rerank-2.5-lite",
    top_k: limit
  });

  // Map back to original entities
  return reranked.results.map(r => ({
    ...candidates[r.index],
    rerankScore: r.relevance_score
  }));
}
```

When to Use Reranking

| Scenario | Approach | Rationale |
|---|---|---|
| Simple entity search | Vector search only | Fast, good enough for single-label queries |
| Multi-label search | Vector + reranking | Essential for comparing heterogeneous entities |
| Long documents | Vector + reranking | Cross-encoder handles context better |
| Domain-specific | Reranking with instructions | Steer relevance criteria |
| Latency-critical | Vector only or rerank-2.5-lite | Balance speed vs accuracy |
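The table's decision logic can be encoded as a small helper (hypothetical; the strategy labels and precedence of the checks are illustrative choices, not Lattice code):

```python
def choose_strategy(multi_label: bool, latency_critical: bool,
                    has_instruction: bool = False) -> str:
    """Pick a retrieval strategy from simple query characteristics."""
    if latency_critical and not multi_label:
        return "vector-only"                   # fast path for single-label queries
    if has_instruction:
        return "rerank-2.5 + instruction"      # steer relevance criteria
    if multi_label:
        return "vector + rerank-2.5-lite"      # compare heterogeneous entities
    return "vector-only"

print(choose_strategy(multi_label=True, latency_critical=False))
```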

Example: Multi-Label Search in Practice

Before (vector search only):

Search: "bun link"
1. Pluribus (poker AI) - 84.21% ❌ High score, wrong domain
2. Libratus (poker AI) - 82.57% ❌ High score, wrong domain
3. bun link - 82.07% ✅ Correct but buried

After (query embeddings):

1. bun link - 69.90% ✅ Correct
2. bun-link.md - 65.06% ✅ Correct
3. Bun Package Manager - 57.25% ✅ Correct

With reranking (theoretical):

1. bun link - 0.95 ✅ Cross-encoder confirms relevance
2. Bun Package Manager - 0.89 ✅ Semantic connection
3. bun-link.md - 0.87 ✅ Document with full context

Implementation Considerations

Cost vs Benefit

Voyage reranker pricing (as of 2025):

  • rerank-2.5: ~$2.00 per 1M tokens
  • rerank-2.5-lite: ~$0.50 per 1M tokens

For Lattice with ~1K queries/day:

  • Vector search only: $0 (local computation)
  • Vector + reranking (lite): ~$5-10/month
  • Worth it? Yes for production, optional for personal use

Latency Impact

| Stage | Latency |
|---|---|
| Vector search (100 candidates) | ~10-50ms |
| Reranking (20 results) | ~100-300ms |
| Total | ~110-350ms |

Still well within acceptable range for interactive search.

Integration with Lattice

Proposed enhancement to src/commands/query.command.ts:

```ts
@Option({
  flags: "--rerank",
  description: "Use Voyage reranker for improved accuracy"
})
parseRerank(value: boolean): boolean {
  return value;
}

async run(inputs: string[], options: SearchCommandOptions): Promise<void> {
  const query = inputs[0];
  const queryEmbedding = await this.embeddingService.generateQueryEmbedding(query);

  let results: SearchResult[];
  if (options.rerank) {
    // Two-stage: vector + reranking
    const candidates = await this.graphService.vectorSearchAll(queryEmbedding, 100);
    results = await this.rerankResults(query, candidates, options.limit);
  } else {
    // Fast vector search only
    results = await this.graphService.vectorSearchAll(queryEmbedding, options.limit);
  }

  // Display results...
}
```

Research Findings: Multi-Label KG Querying

Recent research (2025) on knowledge graph retrieval with reranking:

Knowledge Graph-Guided RAG

  • Multi-path subgraph construction: Incorporate one-hop, multi-hop, and importance-based relations
  • Query-aware attention: Reward models score subgraph triples by semantic relevance
  • Key insight: embedding chunk metadata together with the text outperforms relying on powerful rerankers alone

ReranKGC Framework

  • Retrieve-and-rerank pipeline for multi-modal knowledge graph completion
  • Uses KGC-CLIP to extract multi-modal knowledge for candidate re-ranking
  • Published April 2025 in Neural Networks journal

AR-Align

  • Unsupervised multi-view contrastive learning for entity alignment
  • Attention-based reranking: Reranks hard entities by weighted similarity across different structures
  • Improves precision for ambiguous entity matching

GraphRAG Best Practices

From Neo4j Advanced RAG Techniques (2025):

  1. Knowledge graphs unify scattered data (docs, tables, APIs)
  2. GraphRAG retrieves along entity connections
  3. Improves disambiguation and multi-hop answers
  4. Keeps sources traceable

Recommendation for Lattice: Implement relationship-aware reranking that considers entity connections, not just text similarity.
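One way such relationship-aware reranking could look is sketched below. This is not Lattice code: the 80/20 weighting, the adjacency representation, and the neighbor-boost formula are all illustrative assumptions.

```python
def relationship_aware_scores(reranked, adjacency, alpha=0.8):
    """Blend cross-encoder relevance with a graph signal so connected
    entities rank together.

    reranked:  list of (entity_id, rerank_score) pairs
    adjacency: entity_id -> set of neighbor entity ids
    alpha:     weight on the text-relevance score (1 - alpha on the graph boost)
    """
    base = dict(reranked)
    combined = []
    for entity_id, score in reranked:
        # Average rerank score of neighbors that also appear in the result set
        neighbor_scores = [base[n] for n in adjacency.get(entity_id, set()) if n in base]
        boost = sum(neighbor_scores) / len(neighbor_scores) if neighbor_scores else 0.0
        combined.append((entity_id, alpha * score + (1 - alpha) * boost))
    return sorted(combined, key=lambda item: item[1], reverse=True)

reranked = [("bun-link", 0.95), ("bun-pm", 0.89), ("pluribus", 0.40)]
adjacency = {"bun-link": {"bun-pm"}, "bun-pm": {"bun-link"}, "pluribus": set()}
print(relationship_aware_scores(reranked, adjacency))
```

Here the two connected Bun entities reinforce each other, while the disconnected poker entity keeps only its (already low) text score.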

Best Practices

1. Use Query Input Type + Reranking

```ts
// Best: Both optimizations
const queryEmbed = await embedding.generateQueryEmbedding(query); // input_type="query"
const candidates = await graph.vectorSearchAll(queryEmbed, 100);
const results = await voyageClient.rerank({ query, documents: candidates, top_k: 20 });
```

2. Batch Reranking for Efficiency

```ts
// Efficient: Rerank 100 candidates → 20 results (1 API call)
// Inefficient: Rerank each label separately (multiple API calls)
```

3. Cache Reranking Results

For common queries, cache reranking results to avoid redundant API calls.
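A minimal in-process cache could look like this sketch; the key scheme and helper names are illustrative, and a production version would add a TTL and an eviction policy (e.g. LRU) plus invalidation when the graph changes.

```python
import hashlib
import json

_rerank_cache: dict[str, list] = {}

def cache_key(query: str, documents: list[str]) -> str:
    """Key on both the query and the exact candidate set, since reranking
    the same query over different candidates gives different results."""
    payload = json.dumps({"q": query, "docs": documents}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def rerank_cached(query, documents, rerank_fn):
    """Call rerank_fn (e.g. a wrapper around the Voyage API) at most once
    per (query, documents) pair."""
    key = cache_key(query, documents)
    if key not in _rerank_cache:
        _rerank_cache[key] = rerank_fn(query, documents)  # one API call
    return _rerank_cache[key]
```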

4. Use Instructions for Domain-Specific Queries

```ts
const domainQuery = `Instruction: Prioritize technical documentation over blog posts. Query: ${userQuery}`;
```

5. Monitor Token Usage

```ts
// Estimate total tokens before calling rerank:
// (query tokens × doc count) + sum(document tokens), at ~4 characters/token
const estimatedTokens =
  (query.length * documents.length + documents.reduce((sum, d) => sum + d.length, 0)) / 4;
if (estimatedTokens > 600_000) {
  // Reduce candidate count or truncate documents
}
```

Comparison: Rerankers (2025)

| Provider | Model | Context | Price/1M tokens | Performance |
|---|---|---|---|---|
| Voyage AI | rerank-2.5 | 32K | ~$2.00 | Best (MAIR +12.70%) |
| Voyage AI | rerank-2.5-lite | 32K | ~$0.50 | Good (MAIR +10.36%) |
| Cohere | Rerank v3.5 | 4K | ~$2.00 | Baseline |
| Jina AI | jina-reranker-v2 | 8K | $0.70 | Competitive |

Voyage offers the longest context and best performance in 2025.

Last updated: 2025-12-07