# README

## Documents
| Document | Description |
|---|---|
| Voyage Reranker & Multi-Label Querying | Guide to rerank-2.5 series with 32K context, instruction-following, and multi-label knowledge graph search strategies |
## Quick Recommendation
| Use Case | Recommended Model | Cost | Why |
|---|---|---|---|
| Best Quality (API) | Voyage voyage-3-large | $0.18/1M tokens | #1 on MTEB, 32K context |
| Best Value (API) | Voyage voyage-3.5-lite | $0.02/1M tokens | +7% better than OpenAI 3-small at same price |
| Cheapest (API) | OpenAI text-embedding-3-small (batch) | $0.01/1M tokens | Half price via batch API, good ecosystem |
| Free/Local | Nomic Embed v2 | Free | Fully open-source, runs via Ollama |
| Multilingual | Cohere Embed v3 | ~$0.40/1M tokens | 100+ languages, handles noisy data |
## Performance Rankings (MTEB Benchmark)

### Top Commercial Models
| Rank | Model | MTEB Score | Context | Notes |
|---|---|---|---|---|
| 1 | Voyage voyage-3-large | Best | 32K tokens | Outperforms OpenAI by ~10% |
| 2 | NVIDIA NV-Embed-v2 | 72.31 | - | Highest raw MTEB score |
| 3 | OpenAI text-embedding-3-large | Good | 8K tokens | Industry standard |
| 4 | Mistral-embed | 77.8% accuracy | - | Best accuracy in some benchmarks |
### Top Open-Source Models
| Model | Performance | License | Local Deployment |
|---|---|---|---|
| Nomic Embed v2 | Competitive with 2x larger models | Open (MIT) | Ollama, HuggingFace |
| BGE-M3 | 84.7% accuracy | MIT | HuggingFace TEI |
| E5-Base-v2 | 83-85% accuracy | MIT | Fast, no prefix needed |
| Jina v3 | Good multilingual | CC-BY-NC-4.0 | Restricted commercial use |
## Detailed Pricing (November 2025)

### Commercial APIs
| Provider | Model | Price per 1M Tokens | Dimensions | Context |
|---|---|---|---|---|
| OpenAI | text-embedding-3-small | $0.02 | 1536 | 8K |
| OpenAI | text-embedding-3-large | $0.13 | 3072 | 8K |
| OpenAI | text-embedding-ada-002 | $0.10 | 1536 | 8K |
| Voyage AI | voyage-3-large | $0.18 | Variable (Matryoshka) | 32K |
| Voyage AI | voyage-3.5 | $0.06 | 256-2048 (Matryoshka) | 32K |
| Voyage AI | voyage-3.5-lite | $0.02 | 256-2048 (Matryoshka) | 32K |
| Voyage AI | voyage-code-3 | $0.18 | Variable | 32K |
| Cohere | embed-v3/v4 | ~$0.40 | 1024 | 512 |
| Google | gemini-embedding-001 | Free (1,500 RPM limit) | 3072 | - |
### Free/Open-Source Options
| Model | Hosting Cost | Hardware Required |
|---|---|---|
| Nomic Embed v2 | Free | 4GB+ RAM, CPU or GPU |
| BGE-M3 | Free | 8GB+ RAM recommended |
| E5-Base-v2 | Free | 4GB+ RAM |
## Key Findings

### Voyage AI: Current Leader
- voyage-3-large ranks #1 across 8 domains (law, finance, code, etc.)
- Outperforms OpenAI-v3-large by 9.74% average
- Outperforms Cohere-v3-English by 20.71% average
- 32K context window (vs OpenAI’s 8K, Cohere’s 512)
- Supports Matryoshka learning: use smaller dimensions (256, 512, 1024) with minimal quality loss
- At 1/24 the storage cost (int8 512 dims), still beats OpenAI by 8.56%
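The Matryoshka truncation step is simple enough to sketch. A minimal, provider-agnostic illustration of the slice-and-renormalize approach (this is not Voyage SDK code; the dimension choices are just examples):

```typescript
// Matryoshka-trained embeddings can be truncated to a prefix of their
// dimensions; renormalizing afterwards keeps cosine similarity well-behaved.
function truncateEmbedding(vec: number[], dims: number): number[] {
  const prefix = vec.slice(0, dims);
  const norm = Math.sqrt(prefix.reduce((s, x) => s + x * x, 0));
  return prefix.map((x) => x / norm);
}

// Example: shrink a 2048-dim vector to 512 dims.
const full = Array.from({ length: 2048 }, (_, i) => Math.sin(i + 1));
const small = truncateEmbedding(full, 512);
```

The truncated vector is a drop-in replacement for the full one in any cosine-similarity index, at a quarter of the storage.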
### OpenAI: Best Value for Quality
- text-embedding-3-small at $0.02/1M tokens is exceptional value
- Batch API cuts cost in half ($0.01/1M tokens)
- Well-documented, reliable, wide ecosystem support
- Not the absolute best quality, but “good enough” for most use cases
### Open-Source: Nomic Embed v2
- First general-purpose Mixture of Experts (MoE) embedding model
- Outperforms models 2x its size on multilingual benchmarks
- Fully open: weights, training data, and code all available
- Supports ~100 languages
- Flexible dimensions (768 to 256) via Matryoshka representation
- Run locally via Ollama: `ollama pull nomic-embed-text`
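Once the model is pulled, embeddings can be fetched over Ollama's local HTTP API. A minimal TypeScript sketch, assuming Ollama's default port (11434) and its `/api/embeddings` endpoint, which returns `{ embedding: number[] }` (`embedLocal` is an illustrative name):

```typescript
// Sketch: embedding text via a locally running Ollama server.
const OLLAMA_URL = "http://localhost:11434/api/embeddings";

function buildEmbeddingRequest(model: string, prompt: string) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt }),
  };
}

async function embedLocal(prompt: string): Promise<number[]> {
  const res = await fetch(OLLAMA_URL, buildEmbeddingRequest("nomic-embed-text", prompt));
  const data = await res.json();
  return data.embedding; // Ollama responds with { embedding: number[] }
}
```

No API key is needed; the data never leaves the machine.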
### Cohere: Best for Multilingual/Noisy Data
- Handles 100+ languages with consistent quality
- Robust to noisy, real-world data (typos, OCR errors)
- Higher price point but reliable for enterprise
### Warning: MTEB Benchmark Gaming

Many open-source models appear to be fine-tuned specifically on MTEB benchmarks, producing inflated scores that don't reflect real-world performance. Best practice is downstream evaluation: measure actual retrieval accuracy in your own application.
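A downstream evaluation can be as small as recall@k over a handful of labeled queries. A generic sketch (names and data are illustrative):

```typescript
// recall@k: fraction of queries whose relevant document appears in the
// top-k retrieved results. Provider-agnostic downstream metric.
function recallAtK(
  results: Record<string, string[]>, // query -> ranked doc ids from your index
  relevant: Record<string, string>,  // query -> the known-relevant doc id
  k: number,
): number {
  const queries = Object.keys(relevant);
  const hits = queries.filter((q) =>
    (results[q] ?? []).slice(0, k).includes(relevant[q]),
  ).length;
  return hits / queries.length;
}

// Example with two queries: one hit in the top 2, one miss.
const score = recallAtK(
  { q1: ["d3", "d1", "d7"], q2: ["d9", "d4"] },
  { q1: "d1", q2: "d2" },
  2,
);
```

Running the same labeled set against two embedding models gives a direct, application-specific comparison that MTEB scores cannot.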
## Detailed Performance: voyage-3.5-lite vs OpenAI Models

### MTEB Benchmark Scores
| Model | MTEB Score | vs voyage-3.5-lite | Price/1M |
|---|---|---|---|
| voyage-3.5-lite | ~66.1% | — (baseline) | $0.02 |
| text-embedding-3-large | ~62% | -6.34% | $0.13 |
| text-embedding-3-small | 62.3% | -7.58% | $0.02 |
| text-embedding-ada-002 | 61.0% | ~-8% | $0.10 |
### Performance Analysis
voyage-3.5-lite advantages over OpenAI:
- Outperforms text-embedding-3-large by 6.34% on average across domains
- Outperforms text-embedding-3-small by ~7.58% at the same price point
- Achieves retrieval quality within 0.3% of Cohere-v4 at 1/6 the cost
- 83% reduction in vector database costs vs OpenAI 3-large (using int8 2048-dim vs float 3072-dim)
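The 83% figure is straightforward arithmetic, assuming 1 byte per int8 value and 4 bytes per float32:

```typescript
// Bytes per vector = dimensions x bytes per element.
const bytesInt8_2048 = 2048 * 1;    // voyage-3.5-lite, int8, 2048 dims
const bytesFloat32_3072 = 3072 * 4; // text-embedding-3-large, float32, 3072 dims

// 1 - 2048/12288 ≈ 0.833, i.e. ~83% less vector storage.
const reduction = 1 - bytesInt8_2048 / bytesFloat32_3072;
```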
### Storage Efficiency Comparison
| Model | Dimensions | Storage per Vector | Relative Size |
|---|---|---|---|
| voyage-3.5-lite (int8) | 512-2048 | 512-2048 bytes | 1x (baseline) |
| text-embedding-3-small | 1536 | 6144 bytes (float32) | 3-12x larger |
| text-embedding-3-large | 3072 | 12288 bytes (float32) | 6-24x larger |
### Real-World Migration Case Study (MyClone)
MyClone migrated from OpenAI text-embedding-3-small to voyage-3.5-lite:
- 3× storage savings (1536-dim → 512-dim via Matryoshka)
- 2× faster retrieval due to smaller vectors
- 15-20% latency reduction in voice applications
- No quality loss - Matryoshka training preserves semantic signal in lower dimensions
### Why voyage-3.5-lite Wins
- Same price, better quality: At $0.02/1M tokens, matches OpenAI 3-small pricing but delivers 7-8% better accuracy
- Flexible dimensions: Matryoshka learning allows 256/512/1024/2048 dimensions with minimal quality loss
- Larger context: 32K tokens vs OpenAI’s 8K (4x improvement)
- Storage optimization: int8 quantization + smaller dimensions = 83% cost reduction vs OpenAI 3-large
- Free tier: 200M tokens free vs no free tier for OpenAI
### When to Still Use OpenAI
- Batch API pricing: $0.01/1M tokens is 50% cheaper than voyage-3.5-lite (only advantage on cost)
- Ecosystem integration: Better SDK/library support in some frameworks (LangChain, LlamaIndex)
- Existing infrastructure: Already using OpenAI APIs extensively
- Vendor consolidation: Want all AI services from one provider
- ada-002 migration: Direct drop-in replacement with 3-small (same 1536 dims)
Note: At standard pricing ($0.02/1M), there’s no cost advantage - voyage-3.5-lite wins on both price AND quality.
## Cost Analysis: 1 Million Documents
Assuming 1,000 tokens average per document (1B tokens total):
| Model | Cost to Embed | Annual Re-embed | Notes |
|---|---|---|---|
| Voyage voyage-3.5-lite | $20 | $20 | Best value + quality |
| OpenAI 3-small (batch) | $10 | $10 | Batch API pricing |
| OpenAI 3-small (standard) | $20 | $20 | Same as voyage-3.5-lite |
| Voyage voyage-3.5 | $60 | $60 | Higher quality than lite |
| Voyage voyage-3-large | $180 | $180 | Top MTEB performance |
| Cohere embed-v3 | $400 | $400 | Best multilingual |
| Nomic (local) | $0 (+ compute) | $0 | Privacy-first option |
Note: Voyage AI offers 200M free tokens for new accounts, reducing initial costs.
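The table's costs are simple to reproduce; a quick sketch of the arithmetic:

```typescript
// Cost to embed = (total tokens / 1M) x price per 1M tokens.
function embeddingCost(totalTokens: number, pricePer1M: number): number {
  return (totalTokens / 1_000_000) * pricePer1M;
}

// 1M documents x 1,000 tokens each = 1B tokens.
const tokens = 1_000_000 * 1_000;
const liteCost = embeddingCost(tokens, 0.02);  // voyage-3.5-lite
const largeCost = embeddingCost(tokens, 0.18); // voyage-3-large
```

The same function gives re-embedding cost for any corpus size or price tier.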
## Recommendations by Scenario

### For @research/graph (Our Use Case)
**Option 1: Voyage voyage-3.5-lite (Recommended)**
- $0.02/1M tokens - same price as OpenAI 3-small
- 7-8% better accuracy than OpenAI at same cost
- 32K context window handles full documents
- 200M free tokens for new accounts
- Flexible dimensions (256-2048) for storage optimization
**Option 2: OpenAI text-embedding-3-small (If Already Using OpenAI)**
- Batch API: $0.01/1M tokens (50% cheaper than voyage-3.5-lite)
- Well-supported in TypeScript/Node.js ecosystem
- Easy API integration, wide library support
- Only choose over Voyage if: using batch API OR already on OpenAI infrastructure
**Option 3: Nomic Embed via Ollama (Cost-Free)**
- Zero API costs
- Privacy: data never leaves your machine
- Slightly more complex setup
- May need GPU for speed
**Option 4: Voyage voyage-3-large (Best Quality)**
- If semantic search quality is critical
- 32K context handles full documents
- Higher cost ($0.18/1M) justified for production RAG
### For Production RAG Systems
- Best overall value: Voyage voyage-3.5-lite ($0.02/1M, best accuracy at this price)
- Absolute cheapest API: OpenAI batch API ($0.01/1M) - 50% cheaper but 7% less accurate
- Cost-free: Nomic local via Ollama or HuggingFace TEI
- Quality-critical: Voyage voyage-3-large ($0.18/1M, top MTEB scores)
- Multilingual enterprise: Cohere embed-v3 (~$0.40/1M, 100+ languages)
- Air-gapped/private: Nomic, BGE, or E5 via Hugging Face TEI
Bottom line: Don’t use OpenAI standard pricing - either use batch API (cheaper) or switch to Voyage (better).
## Running Local Embeddings

### Ollama (Simplest)
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull embedding model
ollama pull nomic-embed-text

# Generate embeddings
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "Hello world"}'
```

### Hugging Face Text Embeddings Inference (Production)

```bash
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-base-en-v1.5
```

## Voyage AI: Query vs Document Input Types
Voyage AI’s embedding API includes an input_type parameter that significantly impacts search quality for asymmetric retrieval tasks.
### The `input_type` Parameter
| Value | Prepended Prompt | Use Case |
|---|---|---|
| `None` (default) | (none) | Direct vector conversion |
| `"query"` | "Represent the query for retrieving supporting documents: " | Search queries |
| `"document"` | "Represent the document for retrieval: " | Documents being indexed |
### Why It Matters
For asymmetric retrieval (short queries matching longer documents):
- Documents should be embedded with `input_type: "document"`
- Search queries should be embedded with `input_type: "query"`

Embedding short queries with `input_type: "document"` optimizes them for the wrong side of the asymmetric retrieval task, causing poor search results.
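Conceptually, the parameter just prepends the task-specific prompt from the table above before embedding. A sketch of the effect (the prompt strings are from the table; the function itself is illustrative, not the Voyage SDK):

```typescript
// What input_type does under the hood: prepend a task-specific prompt
// before the text is embedded.
const PROMPTS: Record<string, string> = {
  query: "Represent the query for retrieving supporting documents: ",
  document: "Represent the document for retrieval: ",
};

function withInputType(text: string, inputType?: "query" | "document"): string {
  return inputType ? PROMPTS[inputType] + text : text;
}

const q = withInputType("bun link", "query");
```

Because queries and documents receive different prompts, the model can represent each side of the asymmetric task appropriately.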
### Impact on Search Quality
Before (using `"document"` for queries):

```
Search: "bun link"
1. Pluribus (poker AI) - 84.21% ❌ Irrelevant
2. Libratus (poker AI) - 82.57% ❌ Irrelevant
3. bun link - 82.07%
```

After (using `"query"` for queries):

```
Search: "bun link"
1. bun link - 69.90% ✅ Correct
2. bun-link.md - 65.06% ✅ Correct
3. Bun Package Manager - 57.25% ✅ Correct
```

The absolute similarity scores are lower, but the ranking is correct, which is what matters for search.
### Compatibility
Voyage states that “embeddings generated with and without the input_type argument are compatible” - they can be compared via cosine similarity even when generated with different input types. However, using the appropriate type for each use case produces better retrieval results.
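For reference, a self-contained implementation of the cosine similarity used to compare such embeddings:

```typescript
// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1] for real-valued embeddings.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```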
### Implementation
```typescript
// For storing documents
const docEmbedding = await voyageAPI.embed(text, { input_type: "document" });

// For search queries
const queryEmbedding = await voyageAPI.embed(query, { input_type: "query" });
```

## Sources
- Voyage AI: Text Embeddings Documentation
- Voyage AI: voyage-3.5 and voyage-3.5-lite Announcement
- Voyage AI: voyage-3-large Announcement
- Voyage AI: Pricing
- OpenAI: New Embedding Models and API Updates
- OpenAI Pricing
- MongoDB: Introducing voyage-3.5 and voyage-3.5-lite
- MyClone: How We Cut RAG Latency by 50% with Voyage 3.5 Lite
- Elephas: 13 Best Embedding Models in 2025
- Pinecone: Choosing an Embedding Model
- Document360: Text Embedding Models Compared
- Nomic AI: Introducing Nomic Embed
- BentoML: Guide to Open-Source Embedding Models
- Supermemory: Best Open-Source Embedding Models Benchmarked
- Hugging Face: Text Embeddings Inference
Last updated: 2025-12-07