Google RAG File Search - How It Works
Google offers two RAG file search solutions: Gemini API File Search (developer-friendly) and Vertex AI RAG Engine (enterprise-grade).
Gemini API File Search Tool
Announced November 7, 2025, this is a fully managed RAG system built directly into the Gemini API.
Architecture Overview
```
┌─────────────┐    ┌──────────────┐    ┌─────────────────┐    ┌────────────┐
│   Upload    │───▶│   Chunking   │───▶│    Embedding    │───▶│  Storage   │
│   Files     │    │   Strategy   │    │ gemini-001-emb  │    │ (3x size)  │
└─────────────┘    └──────────────┘    └─────────────────┘    └────────────┘
                                                                    │
┌─────────────┐    ┌──────────────┐    ┌─────────────────┐          │
│  Response   │◀───│   Generate   │◀───│    Retrieve     │◀─────────┘
│ + Citations │    │   (Gemini)   │    │ (Vector Search) │
└─────────────┘    └──────────────┘    └─────────────────┘
```
Step-by-Step Process
1. Data Ingestion
- Upload files to a File Search Store (corpus)
- Supported formats: PDF, DOCX, XLSX, PPTX, JSON, XML, TXT, Markdown, HTML, 100+ code file types
- Max file size: 100 MB per document
- Storage recommendation: Keep stores under 20 GB for optimal latency
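A minimal ingestion sketch with the google-genai Python SDK, assuming a `GEMINI_API_KEY` in the environment; the method names follow the File Search launch docs, but verify against the current SDK reference, and the file name is a placeholder:

```python
import time
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Create a File Search Store (the corpus)
store = client.file_search_stores.create(config={"display_name": "my-docs"})

# Upload a file; chunking, embedding, and indexing happen server-side
operation = client.file_search_stores.upload_to_file_search_store(
    file="q3_report.pdf",
    file_search_store_name=store.name,
)

# Import runs asynchronously; poll until it finishes
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)
```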
2. Chunking
Documents are automatically broken into smaller pieces:
```python
# Customizable chunking config
chunking_config = {
    "max_tokens_per_chunk": 200,  # Size of each chunk
    "max_overlap_tokens": 20      # Overlap between chunks
}
```
- Why chunking? LLMs have context limits; smaller chunks enable precise retrieval
- Overlap ensures context isn’t lost at chunk boundaries
- Default strategy is optimized, but customizable for specific use cases
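As a toy illustration of how size and overlap interact, here is a minimal word-based chunker (plain Python; the service's real tokenizer and strategy differ):

```python
def chunk_tokens(tokens, max_tokens=200, overlap=20):
    """Split a token list into overlapping chunks."""
    step = max_tokens - overlap  # each new chunk starts this far after the last
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), step)]

# Word-splitting stands in for real tokenization here
chunks = chunk_tokens(open("doc.txt").read().split())
```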
3. Embedding Generation
Each chunk is converted to a 3072-dimensional vector using:
- Model: `gemini-embedding-001`
- Performance: Leads the MTEB (Massive Text Embedding Benchmark) Multilingual leaderboard
- Embeddings capture semantic meaning, not just keywords
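The same model is exposed directly through the Gemini API, so you can inspect what a 3072-dimensional embedding looks like yourself (the sample sentence is illustrative):

```python
from google import genai

client = genai.Client()

result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="Q3 revenue was $2.5B, up 12% year over year.",
)
print(len(result.embeddings[0].values))  # 3072-dimensional vector
```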
4. Indexing & Storage
- Embeddings are stored in an optimized vector database
- Storage uses ~3x original file size (embeddings + metadata)
- Supports metadata filtering (key-value pairs for selective search)
5. Retrieval (at Query Time)
When a user asks a question:
- Query → Embedding: User’s question converted to vector using same embedding model
- Similarity Search: System finds chunks with most similar embeddings (cosine similarity)
- Semantic matching: Finds relevant info even with different wording than source
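Conceptually, the similarity search ranks chunks like this (a sketch; `query_vec` and `chunk_vecs` stand in for embeddings produced in step 3):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank every chunk embedding against the query embedding, keep the best 5
scores = np.array([cosine_similarity(query_vec, c) for c in chunk_vecs])
top_k = np.argsort(scores)[::-1][:5]
```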
Query: "What are the quarterly earnings?"Matches: "Q3 revenue was $2.5B..." (even though "earnings" ≠ "revenue")6. Generation
- Retrieved chunks injected as context into the prompt
- Gemini generates grounded response
- Built-in citations point to specific document sections used
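Putting it together, a grounded query looks roughly like this with the google-genai SDK (`store` carries over from the ingestion sketch above; model choice is illustrative):

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What are the quarterly earnings?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[store.name]
            )
        )]
    ),
)
print(response.text)
# Grounding citations: response.candidates[0].grounding_metadata
```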
Configuration Options
| Parameter | Description | Example |
|---|---|---|
| `max_tokens_per_chunk` | Chunk size | 200-1000 tokens |
| `max_overlap_tokens` | Overlap between chunks | 20-100 tokens |
| `file_search_store_names` | Which stores to search | `["legal-docs", "policies"]` |
| Metadata filters | Filter by key-value pairs | `category: "finance"` |
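A sketch of metadata filtering, continuing the snippets above; the `custom_metadata` and `metadata_filter` field names follow the announced API, but the exact filter syntax should be verified against current docs:

```python
# Attach custom metadata at upload time (keys and values are illustrative)
operation = client.file_search_stores.upload_to_file_search_store(
    file="q3_report.pdf",
    file_search_store_name=store.name,
    config={
        "custom_metadata": [
            {"key": "category", "string_value": "finance"},
        ]
    },
)

# Restrict retrieval to documents whose metadata matches
tool = types.Tool(file_search=types.FileSearch(
    file_search_store_names=[store.name],
    metadata_filter='category="finance"',
))
```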
Pricing (Gemini API)
| Component | Cost |
|---|---|
| Storage | Free |
| Query-time embedding | Free |
| Initial indexing | $0.15 / 1M tokens |
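Because storage and query-time embeddings are free, the only billed step is one-time indexing. For a hypothetical 50M-token corpus:

```python
tokens = 50_000_000               # hypothetical corpus size
cost = tokens / 1_000_000 * 0.15  # $0.15 per 1M tokens
print(f"${cost:.2f}")             # one-time indexing cost: $7.50
```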
Limitations
- Cannot choose/tune embedding models
- Limited chunking strategies (no custom parsers)
- Cannot inspect embeddings or similarity scores
- Cannot customize ranking/reranking
- No fine-grained control over retrieval algorithm
Vertex AI RAG Engine (Enterprise)
For production enterprise workloads with more control.
Key Differences from Gemini File Search
| Feature | Gemini File Search | Vertex AI RAG Engine |
|---|---|---|
| Target | Developers | Enterprise |
| Vector DB | Managed (hidden) | Choose: Pinecone, Weaviate, Vertex AI Vector Search |
| Embedding Model | Fixed (gemini-embedding-001) | Configurable |
| LLM | Gemini only | Gemini, Llama, Claude, etc. |
| Data Sources | File upload | GCS, Drive, databases |
| Security | Basic | VPC-SC, CMEK |
Architecture Components
- Data Ingestion: Local files, Cloud Storage, Google Drive
- Data Transformation: Customizable chunking (size, overlap, strategy)
- Embedding: Multiple model options
- Indexing: Creates a “corpus” optimized for search
- Retrieval: Configurable backends (Vertex AI Search, Pinecone, etc.)
- Generation: Choose from 100+ LLMs in Model Garden
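A minimal sketch of that flow with the Vertex AI RAG Engine Python API; the project ID, bucket path, and query are placeholders, and the preview API surface changes between releases:

```python
import vertexai
from vertexai import rag

vertexai.init(project="my-project", location="us-central1")

# Create a corpus (embedding model and vector DB backend are configurable here)
corpus = rag.create_corpus(display_name="contracts")

# Ingest directly from Cloud Storage (Google Drive paths also work)
rag.import_files(corpus.name, ["gs://my-bucket/contracts/"])

# Standalone retrieval, independent of any particular LLM
response = rag.retrieval_query(
    rag_resources=[rag.RagResource(rag_corpus=corpus.name)],
    text="What is the termination clause?",
)
```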
Vertex AI Search Integration
For high-volume applications:
- Handles large data volumes with low latency
- Improved performance and scalability
- Native integration with RAG Engine
When to Use Which
| Use Case | Recommendation |
|---|---|
| Quick prototyping | Gemini File Search |
| Simple Q&A over docs | Gemini File Search |
| Custom embedding models needed | Vertex AI RAG Engine |
| Existing vector DB (Pinecone, etc.) | Vertex AI RAG Engine |
| Enterprise security requirements | Vertex AI RAG Engine |
| Need ranking control | Vertex AI RAG Engine |
Graph RAG Support
Does Google File Search Support Graph RAG?
No. Gemini File Search is purely vector-based RAG. It does not build or traverse knowledge graphs.
Google’s GraphRAG Solution: Spanner Graph
Google provides a separate reference architecture for GraphRAG using Spanner Graph + Vertex AI.
Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                         DATA INGESTION                          │
├─────────────────────────────────────────────────────────────────┤
│  Cloud Storage → Pub/Sub → Cloud Run → LLMGraphTransformer      │
│                                               ↓                 │
│                                    Extract entities &           │
│                                    relationships                │
│                                               ↓                 │
│                                      Spanner Graph              │
│                                  (graph + embeddings)           │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                          QUERY SERVING                          │
├─────────────────────────────────────────────────────────────────┤
│  User Query → Embedding → Vector Search → Graph Traversal       │
│                                                  ↓               │
│                                     Retrieve connected          │
│                                     entities & relationships    │
│                                                  ↓               │
│                                     Vertex AI Ranking API       │
│                                                  ↓               │
│                                     Gemini summarizes           │
└─────────────────────────────────────────────────────────────────┘
```
Key Components
- LLMGraphTransformer (LangChain): Extracts entities and relationships from text using Gemini
- Spanner Graph: Stores both knowledge graph and vector embeddings
- Hybrid Retrieval: Combines vector similarity + graph traversal
- Vertex AI Ranking API: Filters results by semantic relevance
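A minimal sketch of the entity-extraction step using LangChain's LLMGraphTransformer with a Gemini model (the model name and sample text are illustrative; in the reference architecture this runs inside Cloud Run):

```python
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(model_name="gemini-2.5-flash")
transformer = LLMGraphTransformer(llm=llm)

docs = [Document(page_content=(
    "Ada Lovelace worked with Charles Babbage on the Analytical Engine."
))]
graph_docs = transformer.convert_to_graph_documents(docs)

print(graph_docs[0].nodes)          # extracted entities
print(graph_docs[0].relationships)  # extracted relationships
```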
Vector RAG vs GraphRAG
| Aspect | Vector RAG | GraphRAG |
|---|---|---|
| Data Model | Flat chunks + embeddings | Knowledge graph + embeddings |
| Retrieval | Cosine similarity only | Vector search + graph traversal |
| Strengths | Simple, fast, scalable | Captures relationships, better for connected data |
| Use Cases | Q&A, document search | Complex reasoning, multi-hop queries |
| Google Product | Gemini File Search | Spanner Graph + Vertex AI |
When to Use GraphRAG
- Data has rich relationships (org charts, supply chains, research papers)
- Queries require multi-hop reasoning (“Who manages the person who wrote X?”)
- Need to understand entity connections across documents
- Standard RAG returns isolated facts without context
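To make the multi-hop example concrete, here is a sketch of how such a query might look as Spanner Graph GQL issued through the Python client; the graph name, labels, and schema are entirely illustrative:

```python
from google.cloud import spanner

db = spanner.Client().instance("my-instance").database("my-db")

# Multi-hop GQL: "Who manages the person who wrote document X?"
query = """
GRAPH DocsGraph
MATCH (m:Person)-[:MANAGES]->(a:Person)-[:WROTE]->(d:Document)
WHERE d.title = @title
RETURN m.name AS manager
"""
with db.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        query,
        params={"title": "X"},
        param_types={"title": spanner.param_types.STRING},
    )
    for row in rows:
        print(row[0])
```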