
Quick Recommendation

| Use Case | Recommended Model | Cost | Why |
| --- | --- | --- | --- |
| Best Quality (API) | Voyage voyage-3-large | $0.18/1M tokens | Best on MTEB, 32K context |
| Best Value (API) | Voyage voyage-3.5-lite | $0.02/1M tokens | ~7% better than OpenAI 3-small at same price |
| Cheapest (API) | OpenAI text-embedding-3-small (batch) | $0.01/1M tokens | Half price via batch API, good ecosystem |
| Free/Local | Nomic Embed v2 | Free | Fully open-source, runs via Ollama |
| Multilingual | Cohere Embed v3 | ~$0.40/1M tokens | 100+ languages, handles noisy data |

Performance Rankings (MTEB Benchmark)

Top Commercial Models

| Rank | Model | MTEB Score | Context | Notes |
| --- | --- | --- | --- | --- |
| 1 | Voyage voyage-3-large | Best | 32K tokens | Outperforms OpenAI by ~10% |
| 2 | NVIDIA NV-Embed-v2 | 72.31 | - | Highest raw MTEB score |
| 3 | OpenAI text-embedding-3-large | Good | 8K tokens | Industry standard |
| 4 | Mistral-embed | 77.8% accuracy | - | Best accuracy in some benchmarks |

Top Open-Source Models

| Model | Performance | License | Local Deployment |
| --- | --- | --- | --- |
| Nomic Embed v2 | Competitive with 2x larger models | Open (MIT) | Ollama, HuggingFace |
| BGE-M3 | 84.7% accuracy | MIT | HuggingFace TEI |
| E5-Base-v2 | 83-85% accuracy | MIT | Fast, no prefix needed |
| Jina v3 | Good multilingual | CC-BY-NC-4.0 | Restricted commercial use |

Detailed Pricing (November 2025)

Commercial APIs

| Provider | Model | Price per 1M Tokens | Dimensions | Context |
| --- | --- | --- | --- | --- |
| OpenAI | text-embedding-3-small | $0.01 (batch) | 1536 | 8K |
| OpenAI | text-embedding-3-large | $0.065 (batch) | 3072 | 8K |
| OpenAI | text-embedding-ada-002 | $0.05 (batch) | 1536 | 8K |
| Voyage AI | voyage-3-large | $0.18 | Variable (Matryoshka) | 32K |
| Voyage AI | voyage-3.5 | $0.06 | 256-2048 (Matryoshka) | 32K |
| Voyage AI | voyage-3.5-lite | $0.02 | 256-2048 (Matryoshka) | 32K |
| Voyage AI | voyage-code-3 | $0.18 | Variable | 32K |
| Cohere | embed-v3/v4 | ~$0.40 | 1024 | 512 |
| Google | gemini-embedding-001 | Free (1500 RPM limit) | 3072 | - |

Free/Open-Source Options

| Model | Hosting Cost | Hardware Required |
| --- | --- | --- |
| Nomic Embed v2 | Free | 4GB+ RAM, CPU or GPU |
| BGE-M3 | Free | 8GB+ RAM recommended |
| E5-Base-v2 | Free | 4GB+ RAM |

Key Findings

Voyage AI: Current Leader

  • voyage-3-large ranks first across 8 evaluated domains (law, finance, code, etc.)
  • Outperforms OpenAI-v3-large by 9.74% average
  • Outperforms Cohere-v3-English by 20.71% average
  • 32K context window (vs OpenAI’s 8K, Cohere’s 512)
  • Supports Matryoshka learning: use smaller dimensions (256, 512, 1024) with minimal quality loss
  • At 1/24 the storage cost (int8 512 dims), still beats OpenAI by 8.56%
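
The Matryoshka property above can be exercised directly: a minimal sketch of truncating a full-length embedding to fewer dimensions and re-normalizing. The helper name and sample vector are illustrative, not Voyage SDK calls.

```typescript
// Matryoshka-trained embeddings concentrate the most important information
// in the leading dimensions, so a vector can be truncated and re-normalized.
function truncateEmbedding(embedding: number[], dims: number): number[] {
  const sliced = embedding.slice(0, dims);
  // Re-normalize to unit length so cosine similarity stays meaningful.
  const norm = Math.sqrt(sliced.reduce((sum, x) => sum + x * x, 0));
  return sliced.map((x) => x / norm);
}

// Stand-in for a 2048-dim Voyage vector:
const fullEmbedding = Array.from({ length: 2048 }, (_, i) => Math.sin(i + 1));
const small = truncateEmbedding(fullEmbedding, 512);
console.log(small.length); // 512
```

Truncating to 512 dimensions this way is what enables the 1/24 storage figure when combined with int8 quantization.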

OpenAI: Best Value for Quality

  • text-embedding-3-small at $0.02/1M tokens is exceptional value
  • Batch API cuts cost in half ($0.01/1M tokens)
  • Well-documented, reliable, wide ecosystem support
  • Not the absolute best quality, but “good enough” for most use cases

Open-Source: Nomic Embed v2

  • First general-purpose Mixture of Experts (MoE) embedding model
  • Outperforms models 2x its size on multilingual benchmarks
  • Fully open: weights, training data, and code all available
  • Supports ~100 languages
  • Flexible dimensions (768 down to 256) via Matryoshka representation
  • Run locally via Ollama: ollama pull nomic-embed-text

Cohere: Best for Multilingual/Noisy Data

  • Handles 100+ languages with consistent quality
  • Robust to noisy, real-world data (typos, OCR errors)
  • Higher price point but reliable for enterprise

Warning: MTEB Benchmark Gaming

Many open-source models appear to be fine-tuned specifically on MTEB benchmarks, producing inflated scores that don't reflect real-world performance. Best practice is downstream evaluation: measure actual retrieval accuracy in your own application.

Detailed Performance: voyage-3.5-lite vs OpenAI Models

MTEB Benchmark Scores

| Model | MTEB Score | vs voyage-3.5-lite | Price/1M |
| --- | --- | --- | --- |
| voyage-3.5-lite | ~66.1% | (baseline) | $0.02 |
| text-embedding-3-large | ~62% | -6.34% | $0.13 |
| text-embedding-3-small | 62.3% | -7.58% | $0.02 |
| text-embedding-ada-002 | 61.0% | ~-8% | $0.10 |

Performance Analysis

voyage-3.5-lite advantages over OpenAI:

  • Outperforms text-embedding-3-large by 6.34% on average across domains
  • Outperforms text-embedding-3-small by ~7.58% at the same price point
  • Achieves retrieval quality within 0.3% of Cohere-v4 at 1/6 the cost
  • 83% reduction in vector database costs vs OpenAI 3-large (using int8 2048-dim vs float 3072-dim)

Storage Efficiency Comparison

| Model | Dimensions | Storage per Vector | Relative Size |
| --- | --- | --- | --- |
| voyage-3.5-lite (int8) | 512-2048 | 512-2048 bytes | 1x (baseline) |
| text-embedding-3-small | 1536 | 6144 bytes (float32) | 3-12x larger |
| text-embedding-3-large | 3072 | 12288 bytes (float32) | 6-24x larger |
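
The storage figures follow directly from dimensions times bytes per value; a quick sketch of the arithmetic (the helper name is illustrative):

```typescript
// Storage per vector = dimensions * bytes per value
// (float32 = 4 bytes, int8 = 1 byte).
function bytesPerVector(dims: number, dtype: "float32" | "int8"): number {
  return dims * (dtype === "float32" ? 4 : 1);
}

console.log(bytesPerVector(1536, "float32")); // 6144  (text-embedding-3-small)
console.log(bytesPerVector(3072, "float32")); // 12288 (text-embedding-3-large)
console.log(bytesPerVector(512, "int8"));     // 512   (voyage-3.5-lite, int8, 512 dims)

// Relative size of 3-large vs the smallest voyage-3.5-lite int8 setting:
console.log(bytesPerVector(3072, "float32") / bytesPerVector(512, "int8")); // 24
```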

Real-World Migration Case Study (MyClone)

MyClone migrated from OpenAI text-embedding-3-small to voyage-3.5-lite:

  • 3× storage savings (1536-dim → 512-dim via Matryoshka)
  • 2× faster retrieval due to smaller vectors
  • 15-20% latency reduction in voice applications
  • No quality loss - Matryoshka training preserves semantic signal in lower dimensions

Why voyage-3.5-lite Wins

  1. Same price, better quality: At $0.02/1M tokens, matches OpenAI 3-small pricing but delivers 7-8% better accuracy
  2. Flexible dimensions: Matryoshka learning allows 256/512/1024/2048 dimensions with minimal quality loss
  3. Larger context: 32K tokens vs OpenAI’s 8K (4x improvement)
  4. Storage optimization: int8 quantization + smaller dimensions = 83% cost reduction vs OpenAI 3-large
  5. Free tier: 200M tokens free vs no free tier for OpenAI
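
The int8 quantization mentioned in point 4 can be sketched as follows, assuming unit-norm embeddings whose components fall within [-1, 1]. Production pipelines usually calibrate the scale per dataset; the helper names here are illustrative.

```typescript
// Symmetric int8 quantization sketch: map floats in [-1, 1] to [-127, 127].
function quantizeInt8(embedding: number[]): Int8Array {
  const out = new Int8Array(embedding.length);
  for (let i = 0; i < embedding.length; i++) {
    out[i] = Math.max(-127, Math.min(127, Math.round(embedding[i] * 127)));
  }
  return out;
}

function dequantizeInt8(q: Int8Array): number[] {
  return Array.from(q, (x) => x / 127);
}

const q = quantizeInt8([0.5, -1, 0.02]);
console.log(q[0], q[1], q[2]); // 64 -127 3
```

Each value shrinks from 4 bytes (float32) to 1 byte, which is where the 4x part of the storage reduction comes from; smaller Matryoshka dimensions supply the rest.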

When to Still Use OpenAI

  • Batch API pricing: $0.01/1M tokens is 50% cheaper than voyage-3.5-lite (only advantage on cost)
  • Ecosystem integration: Better SDK/library support in some frameworks (LangChain, LlamaIndex)
  • Existing infrastructure: Already using OpenAI APIs extensively
  • Vendor consolidation: Want all AI services from one provider
  • ada-002 migration: Direct drop-in replacement with 3-small (same 1536 dims)

Note: At standard pricing ($0.02/1M), there’s no cost advantage - voyage-3.5-lite wins on both price AND quality.

Cost Analysis: 1 Million Documents

Assuming 1,000 tokens average per document (1B tokens total):

| Model | Cost to Embed | Annual Re-embed | Notes |
| --- | --- | --- | --- |
| Voyage voyage-3.5-lite | $20 | $20 | Best value + quality |
| OpenAI 3-small (batch) | $10 | $10 | Batch API pricing |
| OpenAI 3-small (standard) | $20 | $20 | Same price as voyage-3.5-lite |
| Voyage voyage-3.5 | $60 | $60 | Higher quality than lite |
| Voyage voyage-3-large | $180 | $180 | Top MTEB performance |
| Cohere embed-v3 | $400 | $400 | Best multilingual |
| Nomic (local) | $0 (+ compute) | $0 | Privacy-first option |

Note: Voyage AI offers 200M free tokens for new accounts, reducing initial costs.
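
The table's figures reduce to simple arithmetic; a small sketch (the helper name is illustrative):

```typescript
// Embedding cost = (total tokens / 1,000,000) * price per 1M tokens.
function embedCostUSD(totalTokens: number, pricePerMillion: number): number {
  return (totalTokens / 1_000_000) * pricePerMillion;
}

// 1M documents * 1,000 tokens each = 1B tokens:
const corpusTokens = 1_000_000 * 1_000;
console.log(embedCostUSD(corpusTokens, 0.02)); // 20  (voyage-3.5-lite)
console.log(embedCostUSD(corpusTokens, 0.18)); // 180 (voyage-3-large)
```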

Recommendations by Scenario

For @research/graph (Our Use Case)

Option 1: Voyage voyage-3.5-lite (Recommended)

  • $0.02/1M tokens - same price as OpenAI 3-small
  • 7-8% better accuracy than OpenAI at same cost
  • 32K context window handles full documents
  • 200M free tokens for new accounts
  • Flexible dimensions (256-2048) for storage optimization

Option 2: OpenAI text-embedding-3-small (If Already Using OpenAI)

  • Batch API: $0.01/1M tokens (50% cheaper than voyage-3.5-lite)
  • Well-supported in TypeScript/Node.js ecosystem
  • Easy API integration, wide library support
  • Only choose over Voyage if: using batch API OR already on OpenAI infrastructure

Option 3: Nomic Embed via Ollama (Cost-Free)

  • Zero API costs
  • Privacy: data never leaves your machine
  • Slightly more complex setup
  • May need GPU for speed

Option 4: Voyage voyage-3-large (Best Quality)

  • If semantic search quality is critical
  • 32K context handles full documents
  • Higher cost ($0.18/1M) justified for production RAG

For Production RAG Systems

  1. Best overall value: Voyage voyage-3.5-lite ($0.02/1M, best accuracy at this price)
  2. Absolute cheapest API: OpenAI batch API ($0.01/1M) - 50% cheaper but 7% less accurate
  3. Cost-free: Nomic local via Ollama or HuggingFace TEI
  4. Quality-critical: Voyage voyage-3-large ($0.18/1M, top MTEB scores)
  5. Multilingual enterprise: Cohere embed-v3 (~$0.40/1M, 100+ languages)
  6. Air-gapped/private: Nomic, BGE, or E5 via Hugging Face TEI

Bottom line: Don’t use OpenAI standard pricing - either use batch API (cheaper) or switch to Voyage (better).

Running Local Embeddings

Ollama (Simplest)

```sh
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull embedding model
ollama pull nomic-embed-text

# Generate embeddings
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "Hello world"}'
```
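
The same endpoint can be called from Node. A sketch that assembles the request; the `buildRequest` helper is illustrative, and the commented-out call assumes a running Ollama daemon with the model pulled:

```typescript
// Sketch: calling the Ollama embeddings endpoint from Node 18+ (built-in fetch).
// `buildRequest` only assembles the HTTP request so the payload shape is clear.
function buildRequest(prompt: string, model = "nomic-embed-text") {
  return {
    url: "http://localhost:11434/api/embeddings",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt }),
    },
  };
}

// Usage (requires a running Ollama daemon):
// const { url, init } = buildRequest("Hello world");
// const { embedding } = await (await fetch(url, init)).json();
```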

Hugging Face Text Embeddings Inference (Production)

```sh
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-base-en-v1.5
```

Voyage AI: Query vs Document Input Types

Voyage AI’s embedding API includes an input_type parameter that significantly impacts search quality for asymmetric retrieval tasks.

The input_type Parameter

| Value | Prepended Prompt | Use Case |
| --- | --- | --- |
| None (default) | (none) | Direct vector conversion |
| "query" | "Represent the query for retrieving supporting documents: " | Search queries |
| "document" | "Represent the document for retrieval: " | Documents being indexed |

Why It Matters

For asymmetric retrieval (short queries matching longer documents):

  • Documents should be embedded with input_type: "document"
  • Search queries should be embedded with input_type: "query"

Using "document" for both skips the query-specific prompt, so short queries are not mapped toward the documents that answer them, causing poor search results.

Impact on Search Quality

Before (using “document” for queries):

Search: "bun link"
1. Pluribus (poker AI) - 84.21% ❌ Irrelevant
2. Libratus (poker AI) - 82.57% ❌ Irrelevant
3. bun link - 82.07% ✅ Correct result, buried at #3

After (using “query” for queries):

Search: "bun link"
1. bun link - 69.90% ✅ Correct
2. bun-link.md - 65.06% ✅ Correct
3. Bun Package Manager - 57.25% ✅ Correct

The absolute similarity scores are lower, but the ranking is correct - which is what matters for search.
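
Ranking by cosine similarity can be sketched as follows; the vectors here are toy three-dimensional values, not real Voyage embeddings:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank candidate documents against a query vector. Only the ordering matters
// for retrieval, not the absolute score values.
const query = [0.1, 0.9, 0.2];
const docs: Record<string, number[]> = {
  "bun link": [0.2, 0.8, 0.1],
  "Pluribus": [0.9, 0.1, 0.4],
};
const ranked = Object.entries(docs)
  .map(([name, vec]) => ({ name, score: cosine(query, vec) }))
  .sort((a, b) => b.score - a.score);
console.log(ranked[0].name); // bun link
```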

Compatibility

Voyage states that “embeddings generated with and without the input_type argument are compatible” - they can be compared via cosine similarity even when generated with different input types. However, using the appropriate type for each use case produces better retrieval results.

Implementation

```typescript
// `voyageAPI` is an illustrative client wrapper, not the official SDK.

// For storing documents
const docEmbedding = await voyageAPI.embed(text, { input_type: "document" });

// For search queries
const queryEmbedding = await voyageAPI.embed(query, { input_type: "query" });
```

Last updated: 2025-12-07