# README

## Documents
| Document | Description |
|---|---|
| Voyage Reranker & Multi-Label Querying | Guide to rerank-2.5 series with 32K context, instruction-following, and multi-label knowledge graph search strategies |
## Quick Recommendation
| Use Case | Recommended Model | Cost | Why |
|---|---|---|---|
| Best Quality (API) | Voyage voyage-3-large | $0.18/1M tokens | #1 on MTEB, 32K context |
| Best Value (API) | Voyage voyage-3.5-lite | $0.02/1M tokens | +7% better than OpenAI 3-small at same price |
| Cheapest (API) | OpenAI text-embedding-3-small (batch) | $0.01/1M tokens | Half price via batch API, good ecosystem |
| Free/Local | Nomic Embed v2 | Free | Fully open-source, runs via Ollama |
| Multilingual | Cohere Embed v3 | ~$0.40/1M tokens | 100+ languages, handles noisy data |
## Performance Rankings (MTEB Benchmark)

### Top Commercial Models
| Rank | Model | MTEB Score | Context | Notes |
|---|---|---|---|---|
| 1 | Voyage voyage-3-large | Best | 32K tokens | Outperforms OpenAI by ~10% |
| 2 | NVIDIA NV-Embed-v2 | 72.31 | - | Highest raw MTEB score |
| 3 | OpenAI text-embedding-3-large | Good | 8K tokens | Industry standard |
| 4 | Mistral-embed | 77.8% accuracy | - | Best accuracy in some benchmarks |
### Top Open-Source Models
| Model | Performance | License | Local Deployment |
|---|---|---|---|
| Nomic Embed v2 | Competitive with 2x larger models | Open (MIT) | Ollama, HuggingFace |
| BGE-M3 | 84.7% accuracy | MIT | HuggingFace TEI |
| E5-Base-v2 | 83-85% accuracy | MIT | Fast, no prefix needed |
| Jina v3 | Good multilingual | CC-BY-NC-4.0 | Restricted commercial use |
## Detailed Pricing (November 2025)

### Commercial APIs
| Provider | Model | Price per 1M Tokens | Dimensions | Context |
|---|---|---|---|---|
| OpenAI | text-embedding-3-small | $0.02 | 1536 | 8K |
| OpenAI | text-embedding-3-large | $0.13 | 3072 | 8K |
| OpenAI | text-embedding-ada-002 | $0.10 | 1536 | 8K |
| Voyage AI | voyage-3-large | $0.18 | Variable (Matryoshka) | 32K |
| Voyage AI | voyage-3.5 | $0.06 | 256-2048 (Matryoshka) | 32K |
| Voyage AI | voyage-3.5-lite | $0.02 | 256-2048 (Matryoshka) | 32K |
| Voyage AI | voyage-code-3 | $0.18 | Variable | 32K |
| Cohere | embed-v3/v4 | ~$0.40 | 1024 | 512 |
| Google | gemini-embedding-001 | Free (1,500 RPM limit) | 3072 | - |
### Free/Open-Source Options
| Model | Hosting Cost | Hardware Required |
|---|---|---|
| Nomic Embed v2 | Free | 4GB+ RAM, CPU or GPU |
| BGE-M3 | Free | 8GB+ RAM recommended |
| E5-Base-v2 | Free | 4GB+ RAM |
## Key Findings

### Voyage AI: Current Leader
- voyage-3-large ranks #1 across 8 domains (law, finance, code, etc.)
- Outperforms OpenAI-v3-large by 9.74% average
- Outperforms Cohere-v3-English by 20.71% average
- 32K context window (vs OpenAI’s 8K, Cohere’s 512)
- Supports Matryoshka learning: use smaller dimensions (256, 512, 1024) with minimal quality loss
- At 1/24 the storage cost (int8 512 dims), still beats OpenAI by 8.56%
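The Matryoshka truncation step is simple enough to sketch. A minimal, provider-agnostic illustration of the slice-and-renormalize approach (this is not Voyage SDK code; the dimension choices are just examples):

```typescript
// Matryoshka-trained embeddings can be truncated to a prefix of their
// dimensions; renormalizing afterwards keeps cosine similarity well-behaved.
function truncateEmbedding(vec: number[], dims: number): number[] {
  const prefix = vec.slice(0, dims);
  const norm = Math.sqrt(prefix.reduce((s, x) => s + x * x, 0));
  return prefix.map((x) => x / norm);
}

// Example: shrink a 2048-dim vector to 512 dims.
const full = Array.from({ length: 2048 }, (_, i) => Math.sin(i + 1));
const small = truncateEmbedding(full, 512);
```

The truncated vector is a drop-in replacement for the full one in any cosine-similarity index, at a quarter of the storage.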
### OpenAI: Best Value for Quality
- text-embedding-3-small at $0.02/1M tokens is exceptional value
- Batch API cuts cost in half ($0.01/1M tokens)
- Well-documented, reliable, wide ecosystem support
- Not the absolute best quality, but “good enough” for most use cases
### Open-Source: Nomic Embed v2
- First general-purpose Mixture of Experts (MoE) embedding model
- Outperforms models 2x its size on multilingual benchmarks
- Fully open: weights, training data, and code all available
- Supports ~100 languages
- Flexible dimensions (768 to 256) via Matryoshka representation
- Run locally via Ollama: `ollama pull nomic-embed-text`
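Once the model is pulled, embeddings can be fetched over Ollama's local HTTP API. A minimal TypeScript sketch, assuming Ollama's default port (11434) and its `/api/embeddings` endpoint, which returns `{ embedding: number[] }` (`embedLocal` is an illustrative name):

```typescript
// Sketch: embedding text via a locally running Ollama server.
const OLLAMA_URL = "http://localhost:11434/api/embeddings";

function buildEmbeddingRequest(model: string, prompt: string) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt }),
  };
}

async function embedLocal(prompt: string): Promise<number[]> {
  const res = await fetch(OLLAMA_URL, buildEmbeddingRequest("nomic-embed-text", prompt));
  const data = await res.json();
  return data.embedding; // Ollama responds with { embedding: number[] }
}
```

No API key is needed; the data never leaves the machine.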
### Cohere: Best for Multilingual/Noisy Data
- Handles 100+ languages with consistent quality
- Robust to noisy, real-world data (typos, OCR errors)
- Higher price point but reliable for enterprise
### Warning: MTEB Benchmark Gaming

Many open-source models appear to be fine-tuned specifically on MTEB benchmarks, producing inflated scores that don't reflect real-world performance. Best practice is downstream evaluation: measure actual retrieval accuracy in your own application.
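A downstream evaluation can be as small as recall@k over a handful of labeled queries. A generic sketch (names and data are illustrative):

```typescript
// recall@k: fraction of queries whose relevant document appears in the
// top-k retrieved results. Provider-agnostic downstream metric.
function recallAtK(
  results: Record<string, string[]>, // query -> ranked doc ids from your index
  relevant: Record<string, string>,  // query -> the known-relevant doc id
  k: number,
): number {
  const queries = Object.keys(relevant);
  const hits = queries.filter((q) =>
    (results[q] ?? []).slice(0, k).includes(relevant[q]),
  ).length;
  return hits / queries.length;
}

// Example with two queries: one hit in the top 2, one miss.
const score = recallAtK(
  { q1: ["d3", "d1", "d7"], q2: ["d9", "d4"] },
  { q1: "d1", q2: "d2" },
  2,
);
```

Running the same labeled set against two embedding models gives a direct, application-specific comparison that MTEB scores cannot.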
## Detailed Performance: voyage-3.5-lite vs OpenAI Models

### MTEB Benchmark Scores
| Model | MTEB Score | vs voyage-3.5-lite | Price/1M |
|---|---|---|---|
| voyage-3.5-lite | ~66.1% | — (baseline) | $0.02 |
| text-embedding-3-large | ~62% | -6.34% | $0.13 |
| text-embedding-3-small | 62.3% | -7.58% | $0.02 |
| text-embedding-ada-002 | 61.0% | ~-8% | $0.10 |
### Performance Analysis
voyage-3.5-lite advantages over OpenAI:
- Outperforms text-embedding-3-large by 6.34% on average across domains
- Outperforms text-embedding-3-small by ~7.58% at the same price point
- Achieves retrieval quality within 0.3% of Cohere-v4 at 1/6 the cost
- 83% reduction in vector database costs vs OpenAI 3-large (using int8 2048-dim vs float 3072-dim)
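The 83% figure is straightforward arithmetic, assuming 1 byte per int8 value and 4 bytes per float32:

```typescript
// Bytes per vector = dimensions x bytes per element.
const bytesInt8_2048 = 2048 * 1;    // voyage-3.5-lite, int8, 2048 dims
const bytesFloat32_3072 = 3072 * 4; // text-embedding-3-large, float32, 3072 dims

// 1 - 2048/12288 ≈ 0.833, i.e. ~83% less vector storage.
const reduction = 1 - bytesInt8_2048 / bytesFloat32_3072;
```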
### Storage Efficiency Comparison
| Model | Dimensions | Storage per Vector | Relative Size |
|---|---|---|---|
| voyage-3.5-lite (int8) | 512-2048 | 512-2048 bytes | 1x (baseline) |
| text-embedding-3-small | 1536 | 6144 bytes (float32) | 3-12x larger |
| text-embedding-3-large | 3072 | 12288 bytes (float32) | 6-24x larger |
### Real-World Migration Case Study (MyClone)
MyClone migrated from OpenAI text-embedding-3-small to voyage-3.5-lite:
- 3× storage savings (1536-dim → 512-dim via Matryoshka)
- 2× faster retrieval due to smaller vectors
- 15-20% latency reduction in voice applications
- No quality loss - Matryoshka training preserves semantic signal in lower dimensions
### Why voyage-3.5-lite Wins
- Same price, better quality: At $0.02/1M tokens, matches OpenAI 3-small pricing but delivers 7-8% better accuracy
- Flexible dimensions: Matryoshka learning allows 256/512/1024/2048 dimensions with minimal quality loss
- Larger context: 32K tokens vs OpenAI’s 8K (4x improvement)
- Storage optimization: int8 quantization + smaller dimensions = 83% cost reduction vs OpenAI 3-large
- Free tier: 200M tokens free vs no free tier for OpenAI
### When to Still Use OpenAI
- Batch API pricing: $0.01/1M tokens is 50% cheaper than voyage-3.5-lite (only advantage on cost)
- Ecosystem integration: Better SDK/library support in some frameworks (LangChain, LlamaIndex)
- Existing infrastructure: Already using OpenAI APIs extensively
- Vendor consolidation: Want all AI services from one provider
- ada-002 migration: Direct drop-in replacement with 3-small (same 1536 dims)
Note: At standard pricing ($0.02/1M), there’s no cost advantage - voyage-3.5-lite wins on both price AND quality.
## Cost Analysis: 1 Million Documents
Assuming 1,000 tokens average per document (1B tokens total):
| Model | Cost to Embed | Annual Re-embed | Notes |
|---|---|---|---|
| Voyage voyage-3.5-lite | $20 | $20 | Best value + quality |
| OpenAI 3-small (batch) | $10 | $10 | Batch API pricing |
| OpenAI 3-small (standard) | $20 | $20 | Same as voyage-3.5-lite |
| Voyage voyage-3.5 | $60 | $60 | Higher quality than lite |
| Voyage voyage-3-large | $180 | $180 | Top MTEB performance |
| Cohere embed-v3 | $400 | $400 | Best multilingual |
| Nomic (local) | $0 (+ compute) | $0 | Privacy-first option |
Note: Voyage AI offers 200M free tokens for new accounts, reducing initial costs.
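The table's costs are simple to reproduce; a quick sketch of the arithmetic:

```typescript
// Cost to embed = (total tokens / 1M) x price per 1M tokens.
function embeddingCost(totalTokens: number, pricePer1M: number): number {
  return (totalTokens / 1_000_000) * pricePer1M;
}

// 1M documents x 1,000 tokens each = 1B tokens.
const tokens = 1_000_000 * 1_000;
const liteCost = embeddingCost(tokens, 0.02);  // voyage-3.5-lite
const largeCost = embeddingCost(tokens, 0.18); // voyage-3-large
```

The same function gives re-embedding cost for any corpus size or price tier.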
## Recommendations by Scenario

### For @research/graph (Our Use Case)
**Option 1: Voyage voyage-3.5-lite (Recommended)**
- $0.02/1M tokens - same price as OpenAI 3-small
- 7-8% better accuracy than OpenAI at same cost
- 32K context window handles full documents
- 200M free tokens for new accounts
- Flexible dimensions (256-2048) for storage optimization
**Option 2: OpenAI text-embedding-3-small (If Already Using OpenAI)**
- Batch API: $0.01/1M tokens (50% cheaper than voyage-3.5-lite)
- Well-supported in TypeScript/Node.js ecosystem
- Easy API integration, wide library support
- Only choose over Voyage if: using batch API OR already on OpenAI infrastructure
**Option 3: Nomic Embed via Ollama (Cost-Free)**
- Zero API costs
- Privacy: data never leaves your machine
- Slightly more complex setup
- May need GPU for speed
**Option 4: Voyage voyage-3-large (Best Quality)**
- If semantic search quality is critical
- 32K context handles full documents
- Higher cost ($0.18/1M) justified for production RAG
### For Production RAG Systems
- Best overall value: Voyage voyage-3.5-lite ($0.02/1M, best accuracy at this price)
- Absolute cheapest API: OpenAI batch API ($0.01/1M) - 50% cheaper but 7% less accurate
- Cost-free: Nomic local via Ollama or HuggingFace TEI
- Quality-critical: Voyage voyage-3-large ($0.18/1M, top MTEB scores)
- Multilingual enterprise: Cohere embed-v3 (~$0.40/1M, 100+ languages)
- Air-gapped/private: Nomic, BGE, or E5 via Hugging Face TEI
Bottom line: Don’t use OpenAI standard pricing - either use batch API (cheaper) or switch to Voyage (better).
## Running Local Embeddings

### Ollama (Simplest)
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull embedding model
ollama pull nomic-embed-text

# Generate embeddings
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "Hello world"}'
```

### Hugging Face Text Embeddings Inference (Production)

```bash
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-base-en-v1.5
```

## Voyage AI: Query vs Document Input Types
Voyage AI’s embedding API includes an input_type parameter that significantly impacts search quality for asymmetric retrieval tasks.
### The `input_type` Parameter
| Value | Prepended Prompt | Use Case |
|---|---|---|
| `None` (default) | (none) | Direct vector conversion |
| `"query"` | "Represent the query for retrieving supporting documents: " | Search queries |
| `"document"` | "Represent the document for retrieval: " | Documents being indexed |
### Why It Matters
For asymmetric retrieval (short queries matching longer documents):
- Documents should be embedded with `input_type: "document"`
- Search queries should be embedded with `input_type: "query"`

Embedding short queries with `input_type: "document"` optimizes them for the wrong side of the asymmetric retrieval task, causing poor search results.
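Conceptually, the parameter just prepends the task-specific prompt from the table above before embedding. A sketch of the effect (the prompt strings are from the table; the function itself is illustrative, not the Voyage SDK):

```typescript
// What input_type does under the hood: prepend a task-specific prompt
// before the text is embedded.
const PROMPTS: Record<string, string> = {
  query: "Represent the query for retrieving supporting documents: ",
  document: "Represent the document for retrieval: ",
};

function withInputType(text: string, inputType?: "query" | "document"): string {
  return inputType ? PROMPTS[inputType] + text : text;
}

const q = withInputType("bun link", "query");
```

Because queries and documents receive different prompts, the model can represent each side of the asymmetric task appropriately.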
### Impact on Search Quality
Before (using `"document"` for queries):

```
Search: "bun link"
1. Pluribus (poker AI) - 84.21% ❌ Irrelevant
2. Libratus (poker AI) - 82.57% ❌ Irrelevant
3. bun link - 82.07%
```

After (using `"query"` for queries):

```
Search: "bun link"
1. bun link - 69.90% ✅ Correct
2. bun-link.md - 65.06% ✅ Correct
3. Bun Package Manager - 57.25% ✅ Correct
```

The absolute similarity scores are lower, but the ranking is correct, which is what matters for search.
### Compatibility
Voyage states that “embeddings generated with and without the input_type argument are compatible” - they can be compared via cosine similarity even when generated with different input types. However, using the appropriate type for each use case produces better retrieval results.
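For reference, a self-contained implementation of the cosine similarity used to compare such embeddings:

```typescript
// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1] for real-valued embeddings.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```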
### Implementation
```typescript
// For storing documents
const docEmbedding = await voyageAPI.embed(text, { input_type: "document" });

// For search queries
const queryEmbedding = await voyageAPI.embed(query, { input_type: "query" });
```

## Sources
- Voyage AI: Text Embeddings Documentation
- Voyage AI: voyage-3.5 and voyage-3.5-lite Announcement
- Voyage AI: voyage-3-large Announcement
- Voyage AI: Pricing
- OpenAI: New Embedding Models and API Updates
- OpenAI Pricing
- MongoDB: Introducing voyage-3.5 and voyage-3.5-lite
- MyClone: How We Cut RAG Latency by 50% with Voyage 3.5 Lite
- Elephas: 13 Best Embedding Models in 2025
- Pinecone: Choosing an Embedding Model
- Document360: Text Embedding Models Compared
- Nomic AI: Introducing Nomic Embed
- BentoML: Guide to Open-Source Embedding Models
- Supermemory: Best Open-Source Embedding Models Benchmarked
- Hugging Face: Text Embeddings Inference
Last updated: 2025-12-07