Google offers two RAG file search solutions: Gemini API File Search (developer-friendly) and Vertex AI RAG Engine (enterprise-grade).

Gemini API File Search Tool

Announced November 7, 2025, this is a fully managed RAG system built directly into the Gemini API.

Architecture Overview

┌─────────────┐    ┌──────────────┐    ┌──────────────────────┐    ┌────────────┐
│   Upload    │───▶│   Chunking   │───▶│      Embedding       │───▶│  Storage   │
│   Files     │    │   Strategy   │    │ gemini-embedding-001 │    │ (~3x size) │
└─────────────┘    └──────────────┘    └──────────────────────┘    └────────────┘
                                                                         │
┌─────────────┐    ┌──────────────┐    ┌──────────────────────┐          │
│  Response   │◀───│   Generate   │◀───│       Retrieve       │◀─────────┘
│ + Citations │    │   (Gemini)   │    │   (Vector Search)    │
└─────────────┘    └──────────────┘    └──────────────────────┘

Step-by-Step Process

1. Data Ingestion

  • Upload files to a File Search Store (corpus); see the sketch after this list
  • Supported formats: PDF, DOCX, XLSX, PPTX, JSON, XML, TXT, Markdown, HTML, 100+ code file types
  • Max file size: 100 MB per document
  • Storage recommendation: Keep stores under 20 GB for optimal latency
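
Here is a minimal ingestion sketch using the google-genai Python SDK, following the documented File Search quickstart (the store name and file path are placeholders, and method names may shift between SDK versions):

import time
from google import genai

# Assumes GEMINI_API_KEY is set in the environment
client = genai.Client()

# 1. Create a File Search Store (the corpus)
store = client.file_search_stores.create(
    config={"display_name": "quarterly-reports"}  # placeholder name
)

# 2. Upload a file directly into the store; chunking, embedding,
#    and indexing happen automatically on import
operation = client.file_search_stores.upload_to_file_search_store(
    file="q3_report.pdf",  # placeholder path
    file_search_store_name=store.name,
)

# 3. Import is asynchronous; poll until indexing finishes
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)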

2. Chunking

Documents are automatically broken into smaller pieces:

# Customizable chunking config
chunking_config = {
    "max_tokens_per_chunk": 200,  # size of each chunk
    "max_overlap_tokens": 20,     # overlap between chunks
}
  • Why chunking? LLMs have context limits; smaller chunks enable precise retrieval
  • Overlap ensures context isn’t lost at chunk boundaries
  • Default strategy is optimized, but customizable for specific use cases; a sketch of passing this config at import time follows
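
In the current API this config is supplied when a file is imported into a store. A hedged sketch, reusing the client and store from the ingestion example (the white_space_config wrapper follows the documented whitespace-based chunking strategy, but verify field names against the current SDK):

# Chunking is configured per import, not per store
operation = client.file_search_stores.upload_to_file_search_store(
    file="q3_report.pdf",
    file_search_store_name=store.name,
    config={
        "chunking_config": {
            "white_space_config": {
                "max_tokens_per_chunk": 200,  # size of each chunk
                "max_overlap_tokens": 20,     # overlap between chunks
            }
        }
    },
)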

3. Embedding Generation

Each chunk is converted to a 3072-dimensional vector using:

  • Model: gemini-embedding-001
  • Performance: Leads the MTEB (Massive Text Embedding Benchmark) Multilingual leaderboard
  • Embeddings capture semantic meaning, not just keywords (see the sketch below)
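
File Search's internal embedding step is not directly observable, but the same model is exposed through the embeddings endpoint, so you can sketch what happens to each chunk:

# Embed a stored chunk and a user query with the same model
# File Search uses internally
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents=[
        "Q3 revenue was $2.5B",              # a document chunk
        "What are the quarterly earnings?",  # a user query
    ],
)
vectors = [e.values for e in result.embeddings]
print(len(vectors[0]))  # 3072 dimensions by default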

4. Indexing & Storage

  • Embeddings are stored in an optimized vector database
  • Storage uses ~3x the original file size (embeddings + metadata)
  • Supports metadata filtering (key-value pairs for selective search); a sketch follows this list
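
Custom metadata is attached at import time; queries can then be scoped to matching documents via the tool's metadata filter. A sketch reusing the earlier client and store (the key names are illustrative; check the docs for the exact filter syntax):

# Attach key-value metadata when importing, so queries can filter on it
operation = client.file_search_stores.upload_to_file_search_store(
    file="q3_report.pdf",
    file_search_store_name=store.name,
    config={
        "custom_metadata": [
            {"key": "category", "string_value": "finance"},
            {"key": "year", "numeric_value": 2025},
        ]
    },
)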

5. Retrieval (at Query Time)

When a user asks a question:

  1. Query → Embedding: The user's question is converted to a vector using the same embedding model
  2. Similarity Search: The system finds the chunks whose embeddings are most similar (cosine similarity)
  3. Semantic matching: Relevant information is found even when its wording differs from the query, as in this example (and the sketch that follows it):

  Query: "What are the quarterly earnings?"
  Matches: "Q3 revenue was $2.5B..." (even though "earnings" ≠ "revenue")
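
The similarity scores themselves are hidden, but the computation is plain cosine similarity, easy to illustrate with the two vectors embedded earlier:

import numpy as np

def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their norms
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The "revenue" chunk scores high against the "earnings" query
# despite having no keywords in common
chunk_vec, query_vec = vectors
print(cosine_similarity(query_vec, chunk_vec))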

6. Generation

  • Retrieved chunks are injected as context into the prompt
  • Gemini generates a grounded response
  • Built-in citations point to the specific document sections used (see the end-to-end sketch below)
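
Putting it together at query time: the FileSearch tool handles retrieval and context injection, and citations come back in the response's grounding metadata. A sketch per the documented quickstart, reusing the earlier client and store (the model id is illustrative):

from google.genai import types

response = client.models.generate_content(
    model="gemini-2.5-flash",  # any File Search-compatible Gemini model
    contents="What are the quarterly earnings?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[store.name],
            )
        )]
    ),
)

print(response.text)
# Citations pointing at the retrieved document sections
print(response.candidates[0].grounding_metadata)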

Configuration Options

Parameter               | Description               | Example
------------------------|---------------------------|----------------------------
max_tokens_per_chunk    | Chunk size                | 200-1000 tokens
max_overlap_tokens      | Overlap between chunks    | 20-100 tokens
file_search_store_names | Which stores to search    | ["legal-docs", "policies"]
Metadata filters        | Filter by key-value pairs | category: "finance"

Pricing (Gemini API)

Component            | Cost
---------------------|-------------------
Storage              | Free
Query-time embedding | Free
Initial indexing     | $0.15 / 1M tokens

Limitations

  • Cannot choose/tune embedding models
  • Limited chunking strategies (no custom parsers)
  • Cannot inspect embeddings or similarity scores
  • Cannot customize ranking/reranking
  • No fine-grained control over retrieval algorithm

Vertex AI RAG Engine (Enterprise)

For production enterprise workloads that need more control (a code sketch follows the architecture components below).

Feature         | Gemini File Search           | Vertex AI RAG Engine
----------------|------------------------------|----------------------------------------------------
Target          | Developers                   | Enterprise
Vector DB       | Managed (hidden)             | Choose: Pinecone, Weaviate, Vertex AI Vector Search
Embedding Model | Fixed (gemini-embedding-001) | Configurable
LLM             | Gemini only                  | Gemini, Llama, Claude, etc.
Data Sources    | File upload                  | GCS, Drive, databases
Security        | Basic                        | VPC-SC, CMEK

Architecture Components

  1. Data Ingestion: Local files, Cloud Storage, Google Drive
  2. Data Transformation: Customizable chunking (size, overlap, strategy)
  3. Embedding: Multiple model options
  4. Indexing: Creates a “corpus” optimized for search
  5. Retrieval: Configurable backends (Vertex AI Search, Pinecone, etc.)
  6. Generation: Choose from 100+ LLMs in Model Garden
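
A hedged sketch of the same pipeline on Vertex AI using the vertexai RAG module (project, location, bucket, and the embedding model id are placeholders; the module has moved between vertexai.preview.rag and vertexai.rag across releases, so verify the imports):

import vertexai
from vertexai import rag

vertexai.init(project="my-project", location="us-central1")  # placeholders

# Create a corpus; unlike File Search, the embedding model is configurable
corpus = rag.create_corpus(
    display_name="policies-corpus",
    backend_config=rag.RagVectorDbConfig(
        rag_embedding_model_config=rag.RagEmbeddingModelConfig(
            vertex_prediction_endpoint=rag.VertexPredictionEndpoint(
                publisher_model="publishers/google/models/text-embedding-005"
            )
        )
    ),
)

# Import from Cloud Storage with custom chunking
rag.import_files(
    corpus.name,
    paths=["gs://my-bucket/policies/"],  # placeholder bucket
    transformation_config=rag.TransformationConfig(
        chunking_config=rag.ChunkingConfig(chunk_size=512, chunk_overlap=100)
    ),
)

# Query the corpus directly to inspect the retrieved chunks
response = rag.retrieval_query(
    rag_resources=[rag.RagResource(rag_corpus=corpus.name)],
    text="What is the travel reimbursement policy?",
)
print(response)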

Vertex AI Search Integration

For high-volume applications:

  • Handles large data volumes with low latency
  • Improved performance and scalability
  • Native integration with RAG Engine

When to Use Which

Use Case                            | Recommendation
------------------------------------|----------------------
Quick prototyping                   | Gemini File Search
Simple Q&A over docs                | Gemini File Search
Custom embedding models needed      | Vertex AI RAG Engine
Existing vector DB (Pinecone, etc.) | Vertex AI RAG Engine
Enterprise security requirements    | Vertex AI RAG Engine
Need ranking control                | Vertex AI RAG Engine

GraphRAG Support

Does Google File Search Support GraphRAG?

No. Gemini File Search is purely vector-based RAG. It does not build or traverse knowledge graphs.

Google’s GraphRAG Solution: Spanner Graph

Google provides a separate reference architecture for GraphRAG using Spanner Graph + Vertex AI.

Architecture

┌──────────────────────────────────────────────────────────────┐
│                        DATA INGESTION                        │
├──────────────────────────────────────────────────────────────┤
│  Cloud Storage → Pub/Sub → Cloud Run → LLMGraphTransformer   │
│                            ↓                                 │
│              Extract entities & relationships                │
│                            ↓                                 │
│              Spanner Graph (graph + embeddings)              │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                        QUERY SERVING                         │
├──────────────────────────────────────────────────────────────┤
│  User Query → Embedding → Vector Search → Graph Traversal    │
│                            ↓                                 │
│         Retrieve connected entities & relationships          │
│                            ↓                                 │
│                    Vertex AI Ranking API                     │
│                            ↓                                 │
│                      Gemini summarizes                       │
└──────────────────────────────────────────────────────────────┘

Key Components

  1. LLMGraphTransformer (LangChain): Extracts entities and relationships from text using Gemini (see the sketch after this list)
  2. Spanner Graph: Stores both knowledge graph and vector embeddings
  3. Hybrid Retrieval: Combines vector similarity + graph traversal
  4. Vertex AI Ranking API: Filters results by semantic relevance
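
A sketch of the extraction step with LangChain's LLMGraphTransformer driven by Gemini (package names follow LangChain's docs; the model id and sample text are illustrative):

from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_google_vertexai import ChatVertexAI

# Gemini performs the entity/relationship extraction
llm = ChatVertexAI(model_name="gemini-2.0-flash")  # illustrative model id
transformer = LLMGraphTransformer(llm=llm)

docs = [Document(page_content=(
    "Ada Lovelace worked with Charles Babbage on the Analytical Engine."
))]

# Each GraphDocument carries the extracted nodes and relationships,
# ready to be written into Spanner Graph alongside embeddings
graph_docs = transformer.convert_to_graph_documents(docs)
print(graph_docs[0].nodes)
print(graph_docs[0].relationships)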

Vector RAG vs GraphRAG

Aspect         | Vector RAG               | GraphRAG
---------------|--------------------------|---------------------------------------------------
Data Model     | Flat chunks + embeddings | Knowledge graph + embeddings
Retrieval      | Cosine similarity only   | Vector search + graph traversal
Strengths      | Simple, fast, scalable   | Captures relationships, better for connected data
Use Cases      | Q&A, document search     | Complex reasoning, multi-hop queries
Google Product | Gemini File Search       | Spanner Graph + Vertex AI

When to Use GraphRAG

  • Data has rich relationships (org charts, supply chains, research papers)
  • Queries require multi-hop reasoning (“Who manages the person who wrote X?”); see the sketch after this list
  • Need to understand entity connections across documents
  • Standard RAG returns isolated facts without context
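
A multi-hop question like the one above maps naturally onto Spanner Graph's GQL. A hedged sketch via the Python client (the instance, database, graph name, labels, and properties are all hypothetical):

from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-db")  # placeholders

# "Who manages the person who wrote X?" as a two-hop graph match
gql = """
GRAPH DocsGraph
MATCH (m:Person)-[:MANAGES]->(p:Person)-[:WROTE]->(d:Document {title: @title})
RETURN m.name AS manager
"""

with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        gql,
        params={"title": "X"},
        param_types={"title": spanner.param_types.STRING},
    )
    for row in rows:
        print(row[0])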
