Google RAG File Search - How It Works
Google offers two RAG file search solutions: Gemini API File Search (developer-friendly) and Vertex AI RAG Engine (enterprise-grade).
Gemini API File Search Tool
Announced November 7, 2025, this is a fully managed RAG system built directly into the Gemini API.
Architecture Overview
```
┌─────────────┐    ┌──────────────┐    ┌─────────────────┐    ┌────────────┐
│   Upload    │───▶│   Chunking   │───▶│    Embedding    │───▶│  Storage   │
│   Files     │    │   Strategy   │    │ gemini-001-emb  │    │ (3x size)  │
└─────────────┘    └──────────────┘    └─────────────────┘    └────────────┘
                                                                    │
┌─────────────┐    ┌──────────────┐    ┌─────────────────┐          │
│  Response   │◀───│   Generate   │◀───│    Retrieve     │◀─────────┘
│ + Citations │    │   (Gemini)   │    │ (Vector Search) │
└─────────────┘    └──────────────┘    └─────────────────┘
```
Step-by-Step Process
1. Data Ingestion
- Upload files to a File Search Store (corpus)
- Supported formats: PDF, DOCX, XLSX, PPTX, JSON, XML, TXT, Markdown, HTML, 100+ code file types
- Max file size: 100 MB per document
- Storage recommendation: Keep stores under 20 GB for optimal latency
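A minimal ingestion sketch with the google-genai Python SDK, assuming a `GEMINI_API_KEY` in the environment; the method names follow the File Search launch docs, but verify against the current SDK reference, and the file name is a placeholder:

```python
import time
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Create a File Search Store (the corpus)
store = client.file_search_stores.create(config={"display_name": "my-docs"})

# Upload a file; chunking, embedding, and indexing happen server-side
operation = client.file_search_stores.upload_to_file_search_store(
    file="q3_report.pdf",
    file_search_store_name=store.name,
)

# Import runs asynchronously; poll until it finishes
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)
```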
2. Chunking
Documents are automatically broken into smaller pieces:
```python
# Customizable chunking config
chunking_config = {
    "max_tokens_per_chunk": 200,  # Size of each chunk
    "max_overlap_tokens": 20      # Overlap between chunks
}
```
- Why chunking? LLMs have context limits; smaller chunks enable precise retrieval
- Overlap ensures context isn’t lost at chunk boundaries
- Default strategy is optimized, but customizable for specific use cases
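As a toy illustration of how size and overlap interact, here is a minimal word-based chunker (plain Python; the service's real tokenizer and strategy differ):

```python
def chunk_tokens(tokens, max_tokens=200, overlap=20):
    """Split a token list into overlapping chunks."""
    step = max_tokens - overlap  # each new chunk starts this far after the last
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), step)]

# Word-splitting stands in for real tokenization here
chunks = chunk_tokens(open("doc.txt").read().split())
```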
3. Embedding Generation
Each chunk is converted to a 3072-dimensional vector using:
- Model: `gemini-embedding-001`
- Performance: Leads the MTEB (Massive Text Embedding Benchmark) Multilingual leaderboard
- Embeddings capture semantic meaning, not just keywords
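The same model is exposed directly through the Gemini API, so you can inspect what a 3072-dimensional embedding looks like yourself (the sample sentence is illustrative):

```python
from google import genai

client = genai.Client()

result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="Q3 revenue was $2.5B, up 12% year over year.",
)
print(len(result.embeddings[0].values))  # 3072-dimensional vector
```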
4. Indexing & Storage
- Embeddings are stored in an optimized vector database
- Storage uses ~3x original file size (embeddings + metadata)
- Supports metadata filtering (key-value pairs for selective search)
5. Retrieval (at Query Time)
When a user asks a question:
- Query → Embedding: User’s question converted to vector using same embedding model
- Similarity Search: System finds chunks with most similar embeddings (cosine similarity)
- Semantic matching: Finds relevant info even with different wording than source
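Conceptually, the similarity search ranks chunks like this (a sketch; `query_vec` and `chunk_vecs` stand in for embeddings produced in step 3):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank every chunk embedding against the query embedding, keep the best 5
scores = np.array([cosine_similarity(query_vec, c) for c in chunk_vecs])
top_k = np.argsort(scores)[::-1][:5]
```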
Query: "What are the quarterly earnings?"Matches: "Q3 revenue was $2.5B..." (even though "earnings" ≠ "revenue")6. Generation
- Retrieved chunks injected as context into the prompt
- Gemini generates grounded response
- Built-in citations point to specific document sections used
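Putting it together, a grounded query looks roughly like this with the google-genai SDK (`store` carries over from the ingestion sketch above; model choice is illustrative):

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What are the quarterly earnings?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[store.name]
            )
        )]
    ),
)
print(response.text)
# Grounding citations: response.candidates[0].grounding_metadata
```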
Configuration Options
| Parameter | Description | Example |
|---|---|---|
| `max_tokens_per_chunk` | Chunk size | 200-1000 tokens |
| `max_overlap_tokens` | Overlap between chunks | 20-100 tokens |
| `file_search_store_names` | Which stores to search | `["legal-docs", "policies"]` |
| Metadata filters | Filter by key-value pairs | `category: "finance"` |
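A sketch of metadata filtering, continuing the snippets above; the `custom_metadata` and `metadata_filter` field names follow the announced API, but the exact filter syntax should be verified against current docs:

```python
# Attach custom metadata at upload time (keys and values are illustrative)
operation = client.file_search_stores.upload_to_file_search_store(
    file="q3_report.pdf",
    file_search_store_name=store.name,
    config={
        "custom_metadata": [
            {"key": "category", "string_value": "finance"},
        ]
    },
)

# Restrict retrieval to documents whose metadata matches
tool = types.Tool(file_search=types.FileSearch(
    file_search_store_names=[store.name],
    metadata_filter='category="finance"',
))
```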
Pricing (Gemini API)
| Component | Cost |
|---|---|
| Storage | Free |
| Query-time embedding | Free |
| Initial indexing | $0.15 / 1M tokens |
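Because storage and query-time embeddings are free, the only billed step is one-time indexing. For a hypothetical 50M-token corpus:

```python
tokens = 50_000_000               # hypothetical corpus size
cost = tokens / 1_000_000 * 0.15  # $0.15 per 1M tokens
print(f"${cost:.2f}")             # one-time indexing cost: $7.50
```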
Limitations
- Cannot choose/tune embedding models
- Limited chunking strategies (no custom parsers)
- Cannot inspect embeddings or similarity scores
- Cannot customize ranking/reranking
- No fine-grained control over retrieval algorithm
Vertex AI RAG Engine (Enterprise)
For production enterprise workloads with more control.
Key Differences from Gemini File Search
| Feature | Gemini File Search | Vertex AI RAG Engine |
|---|---|---|
| Target | Developers | Enterprise |
| Vector DB | Managed (hidden) | Choose: Pinecone, Weaviate, Vertex AI Vector Search |
| Embedding Model | Fixed (gemini-embedding-001) | Configurable |
| LLM | Gemini only | Gemini, Llama, Claude, etc. |
| Data Sources | File upload | GCS, Drive, databases |
| Security | Basic | VPC-SC, CMEK |
Architecture Components
- Data Ingestion: Local files, Cloud Storage, Google Drive
- Data Transformation: Customizable chunking (size, overlap, strategy)
- Embedding: Multiple model options
- Indexing: Creates a “corpus” optimized for search
- Retrieval: Configurable backends (Vertex AI Search, Pinecone, etc.)
- Generation: Choose from 100+ LLMs in Model Garden
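A minimal sketch of that flow with the Vertex AI RAG Engine Python API; the project ID, bucket path, and query are placeholders, and the preview API surface changes between releases:

```python
import vertexai
from vertexai import rag

vertexai.init(project="my-project", location="us-central1")

# Create a corpus (embedding model and vector DB backend are configurable here)
corpus = rag.create_corpus(display_name="contracts")

# Ingest directly from Cloud Storage (Google Drive paths also work)
rag.import_files(corpus.name, ["gs://my-bucket/contracts/"])

# Standalone retrieval, independent of any particular LLM
response = rag.retrieval_query(
    rag_resources=[rag.RagResource(rag_corpus=corpus.name)],
    text="What is the termination clause?",
)
```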
Vertex AI Search Integration
For high-volume applications:
- Handles large data volumes with low latency
- Improved performance and scalability
- Native integration with RAG Engine
When to Use Which
| Use Case | Recommendation |
|---|---|
| Quick prototyping | Gemini File Search |
| Simple Q&A over docs | Gemini File Search |
| Custom embedding models needed | Vertex AI RAG Engine |
| Existing vector DB (Pinecone, etc.) | Vertex AI RAG Engine |
| Enterprise security requirements | Vertex AI RAG Engine |
| Need ranking control | Vertex AI RAG Engine |
Graph RAG Support
Does Google File Search Support Graph RAG?
No. Gemini File Search is purely vector-based RAG. It does not build or traverse knowledge graphs.
Google’s GraphRAG Solution: Spanner Graph
Google provides a separate reference architecture for GraphRAG using Spanner Graph + Vertex AI.
Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                         DATA INGESTION                          │
├─────────────────────────────────────────────────────────────────┤
│  Cloud Storage → Pub/Sub → Cloud Run → LLMGraphTransformer      │
│                                               ↓                 │
│                                    Extract entities &           │
│                                    relationships                │
│                                               ↓                 │
│                                      Spanner Graph              │
│                                  (graph + embeddings)           │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                          QUERY SERVING                          │
├─────────────────────────────────────────────────────────────────┤
│  User Query → Embedding → Vector Search → Graph Traversal       │
│                                                  ↓               │
│                                     Retrieve connected          │
│                                     entities & relationships    │
│                                                  ↓               │
│                                     Vertex AI Ranking API       │
│                                                  ↓               │
│                                     Gemini summarizes           │
└─────────────────────────────────────────────────────────────────┘
```
Key Components
- LLMGraphTransformer (LangChain): Extracts entities and relationships from text using Gemini
- Spanner Graph: Stores both knowledge graph and vector embeddings
- Hybrid Retrieval: Combines vector similarity + graph traversal
- Vertex AI Ranking API: Filters results by semantic relevance
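A minimal sketch of the entity-extraction step using LangChain's LLMGraphTransformer with a Gemini model (the model name and sample text are illustrative; in the reference architecture this runs inside Cloud Run):

```python
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(model_name="gemini-2.5-flash")
transformer = LLMGraphTransformer(llm=llm)

docs = [Document(page_content=(
    "Ada Lovelace worked with Charles Babbage on the Analytical Engine."
))]
graph_docs = transformer.convert_to_graph_documents(docs)

print(graph_docs[0].nodes)          # extracted entities
print(graph_docs[0].relationships)  # extracted relationships
```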
Vector RAG vs GraphRAG
| Aspect | Vector RAG | GraphRAG |
|---|---|---|
| Data Model | Flat chunks + embeddings | Knowledge graph + embeddings |
| Retrieval | Cosine similarity only | Vector search + graph traversal |
| Strengths | Simple, fast, scalable | Captures relationships, better for connected data |
| Use Cases | Q&A, document search | Complex reasoning, multi-hop queries |
| Google Product | Gemini File Search | Spanner Graph + Vertex AI |
When to Use GraphRAG
- Data has rich relationships (org charts, supply chains, research papers)
- Queries require multi-hop reasoning (“Who manages the person who wrote X?”)
- Need to understand entity connections across documents
- Standard RAG returns isolated facts without context
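To make the multi-hop example concrete, here is a sketch of how such a query might look as Spanner Graph GQL issued through the Python client; the graph name, labels, and schema are entirely illustrative:

```python
from google.cloud import spanner

db = spanner.Client().instance("my-instance").database("my-db")

# Multi-hop GQL: "Who manages the person who wrote document X?"
query = """
GRAPH DocsGraph
MATCH (m:Person)-[:MANAGES]->(a:Person)-[:WROTE]->(d:Document)
WHERE d.title = @title
RETURN m.name AS manager
"""
with db.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        query,
        params={"title": "X"},
        param_types={"title": spanner.param_types.STRING},
    )
    for row in rows:
        print(row[0])
```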