# website-page-indexing
## The Problem
Google indexes your homepage but not internal pages/subpages. This is a common issue with several potential causes.
## Diagnostic Steps
### 1. Check Google Search Console
First, verify your site in Google Search Console and check:
- **Page Indexing report**: shows which pages are indexed and why others aren't
- **URL Inspection tool**: checks specific URLs and reports their status
- Look for these common statuses:
  - “Crawled – currently not indexed”
  - “Discovered – currently not indexed”
  - “Blocked by robots.txt”
  - “Excluded by ‘noindex’ tag”
### 2. Common Causes
#### Technical Issues
| Cause | How to Check | Fix |
|---|---|---|
| robots.txt blocking | Visit `yoursite.com/robots.txt` | Remove/modify `Disallow` rules |
| noindex meta tag | View page source, look for `<meta name="robots" content="noindex">` | Remove the noindex tag |
| noindex HTTP header | Check response headers for `X-Robots-Tag: noindex` | Remove the header |
| JavaScript rendering | Use URL Inspection → “View Crawled Page” | Implement SSR/SSG |
| Canonical issues | Check for improper `<link rel="canonical">` tags | Fix canonical URLs |
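The two noindex rows can be spot-checked from the command line. A minimal sketch, assuming Node 18+ (for the built-in `fetch`); the file name is illustrative and the regex scan is a rough heuristic, not a real HTML parser:

```ts
// check-noindex.ts: spot-check one URL for noindex signals.
// Usage: npx tsx check-noindex.ts https://yoursite.com/some-page

async function checkNoindex(url: string): Promise<void> {
  const res = await fetch(url, { redirect: "follow" });

  // 1. noindex delivered as an HTTP response header
  const xRobots = res.headers.get("x-robots-tag") ?? "";
  if (xRobots.includes("noindex")) {
    console.log(`Blocked by header: X-Robots-Tag: ${xRobots}`);
  }

  // 2. noindex delivered as a meta tag (assumes name= appears before content=)
  const html = await res.text();
  const meta = html.match(
    /<meta[^>]*name=["']robots["'][^>]*content=["']([^"']*)["']/i,
  );
  if (meta && meta[1].includes("noindex")) {
    console.log(`Blocked by meta tag: content="${meta[1]}"`);
  }

  if (!xRobots.includes("noindex") && !(meta && meta[1].includes("noindex"))) {
    console.log(`No noindex signals found on ${url}`);
  }
}

checkNoindex(process.argv[2]);
```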
#### Content Issues
| Cause | Symptoms | Fix |
|---|---|---|
| Thin content | Pages with very little text | Add substantial, valuable content |
| Duplicate content | Multiple pages with same/similar content | Consolidate or differentiate |
| Low quality | Generic, non-unique content | Improve quality and uniqueness |
| Search intent mismatch | Content doesn’t match what users search for | Align with user intent |
#### Site Architecture Issues
| Cause | How to Check | Fix |
|---|---|---|
| Orphan pages | Pages with no internal links | Add internal links |
| Poor navigation | Important pages buried deep | Improve site structure |
| Missing sitemap | No `sitemap.xml` submitted | Create and submit a sitemap |
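Orphan pages are easy to miss by hand. A rough automated check is sketched below, assuming Node 18+, a `sitemap.xml` at the site root, and server-rendered HTML; links injected by client-side JavaScript won't be seen:

```ts
// find-orphans.ts: flag sitemap URLs that no other page links to.
const SITE = "https://yoursite.com"; // placeholder site root

async function findOrphans(): Promise<void> {
  // 1. Every URL the sitemap claims exists
  const xml = await (await fetch(`${SITE}/sitemap.xml`)).text();
  const pages = [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map((m) => m[1]);

  // 2. Every internal link found on those pages
  const linked = new Set<string>();
  for (const page of pages) {
    const html = await (await fetch(page)).text();
    for (const m of html.matchAll(/href=["']([^"'#]+)["']/g)) {
      const href = new URL(m[1], page).href; // resolve relative links
      if (href.startsWith(SITE)) linked.add(href);
    }
  }

  // 3. Sitemap URLs that nothing links to are orphan candidates
  for (const page of pages) {
    if (!linked.has(page)) console.log(`Possible orphan: ${page}`);
  }
}

findOrphans();
```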
### 3. SPA/JavaScript-Heavy Sites
Single-page applications pose special challenges:
- **Problem**: Googlebot may not see JavaScript-rendered content
- **Diagnostic**: Use URL Inspection → “View Crawled Page” → “Screenshot” tab
- **Solutions**:
  - Server-side rendering (SSR)
  - Static site generation (SSG)
  - Pre-rendering for important pages
  - Dynamic rendering (serve pre-rendered HTML to bots; see the sketch below)
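A minimal sketch of the last option, assuming an Express server and a `prerendered/` directory of HTML snapshots built ahead of time (e.g. with a headless browser); the bot list is illustrative:

```ts
// dynamic-render.ts: serve static snapshots to crawlers, the SPA to everyone else.
import express from "express";
import path from "node:path";

const app = express();
const BOTS = /googlebot|bingbot|duckduckbot|yandex/i;

app.use((req, res, next) => {
  if (BOTS.test(req.get("user-agent") ?? "")) {
    // Crawler: send the pre-rendered HTML snapshot for this route
    const snapshot = path.join(process.cwd(), "prerendered", req.path, "index.html");
    res.sendFile(snapshot, (err) => err && next()); // fall through if no snapshot exists
  } else {
    next(); // normal visitor: continue to the SPA handler below
  }
});

app.use(express.static("dist")); // the client-side bundle
app.listen(3000);
```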
## Fixes by Priority
### High Priority (Check First)
1. **Verify nothing important is blocked in robots.txt**

   ```
   # Check yoursite.com/robots.txt
   # Ensure important paths aren't disallowed
   ```

2. **Check for noindex tags** in the page source:

   ```html
   <!-- This will prevent indexing -->
   <meta name="robots" content="noindex">
   ```

3. **Submit a sitemap to Google Search Console**
   - Create a `sitemap.xml` listing all pages (see the example below)
   - Submit it via Search Console → Sitemaps
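For reference, a minimal hand-written `sitemap.xml` (URLs and date are placeholders; generators such as `@astrojs/sitemap` produce this automatically):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/</loc>
    <lastmod>2025-11-28</lastmod>
  </url>
  <url>
    <loc>https://yoursite.com/products/example</loc>
  </url>
</urlset>
```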
### Medium Priority
1. **Improve internal linking**
   - Link to important pages from the homepage/navigation
   - Add contextual links within content
   - Ensure there are no orphan pages
2. **Check canonical tags**
   - Each page should have a proper `<link rel="canonical" href="...">`
   - Ensure canonicals point to the correct URLs
3. **For JavaScript sites, implement SSR/SSG** with a framework such as one of these (see the sketch after this list):
   - Next.js (React)
   - Nuxt (Vue)
   - SvelteKit
   - Astro
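As one example of the SSG route, a minimal Next.js page (pages router) that is rendered to plain HTML at build time, so Googlebot sees the content without executing JavaScript; the slug list is a placeholder:

```tsx
// pages/products/[slug].tsx: minimal Next.js SSG sketch
import type { GetStaticPaths, GetStaticProps } from "next";

export const getStaticPaths: GetStaticPaths = async () => ({
  // In a real app these slugs would come from your CMS or filesystem
  paths: [{ params: { slug: "example-product" } }],
  fallback: false, // unknown slugs 404 instead of rendering client-side
});

export const getStaticProps: GetStaticProps = async ({ params }) => ({
  props: { slug: String(params?.slug ?? "") },
});

export default function Product({ slug }: { slug: string }) {
  // Rendered to static HTML at build time, fully crawlable
  return <h1>Product: {slug}</h1>;
}
```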
### Lower Priority
1. **Improve content quality**
   - Add more substantial content to thin pages
   - Make content unique and valuable
   - Match search intent
2. **Request manual indexing**
   - Use the URL Inspection tool → “Request Indexing”
   - Create a temporary sitemap containing the unindexed URLs
## The robots.txt vs noindex Trap
**Important**: These work differently and can conflict:
- **robots.txt `Disallow`**: prevents crawling but NOT indexing
  - Google may still index the URL if it is linked from elsewhere
  - Google can't see noindex tags on blocked pages
- **noindex tag/header**: prevents indexing but requires crawling
  - The page must be crawlable for Google to see the noindex directive

**Common mistake**: blocking a page in robots.txt AND adding noindex. Google can't see the noindex tag, so the page may still appear in results!
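Concretely, the self-defeating combination looks like this (the `/private/` path is a placeholder):

```
# robots.txt: WRONG when combined with an on-page noindex.
# Googlebot never fetches /private/, so it never sees the noindex tag there,
# and the bare URL can still be indexed if other sites link to it.
User-agent: *
Disallow: /private/
```

The fix is to remove the `Disallow` rule and let the `noindex` directive do its job; once Google has dropped the pages, the `Disallow` can optionally be restored to save crawl budget.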
## Timeline Expectations
- New sites: 1-4 weeks for initial crawling
- New pages: Days to weeks
- After fixes: Request re-indexing, wait days to weeks
- Large sites: May take longer for full crawl
## Case Study: zabaca.com (2025-11-28)
### Problem
Only the homepage `zabaca.com` was indexed by Google. All subpages (`/products/*`, `/careers`, `/press/*`) were missing from search results.
### Investigation
The site was built with Astro SSG (static site generation), so this was NOT an SPA rendering issue.
Issues found:
| Issue | Status |
|---|---|
| `robots.txt` | 404 - missing |
| `sitemap.xml` | 404 - missing |
| Canonical tags | Not set |
| Meta descriptions | Same on all pages (outdated copy) |
| Open Graph tags | Missing |
### Solution Implemented
1. Created `robots.txt`:

   ```
   User-agent: *
   Allow: /

   Sitemap: https://www.zabaca.com/sitemap-index.xml
   ```

2. Added the `@astrojs/sitemap` integration with the site URL config
3. Updated `layout.astro`:
   - Added canonical tags using `Astro.url.href`
   - Added Open Graph meta tags
   - Made the description a prop with a fallback
4. Updated all pages with unique meta descriptions
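A sketch of what the config and layout changes look like (simplified; the actual files in the repo may differ):

```js
// apps/web/astro.config.mjs: the `site` URL is required for sitemap generation
import { defineConfig } from "astro/config";
import sitemap from "@astrojs/sitemap";

export default defineConfig({
  site: "https://www.zabaca.com",
  integrations: [sitemap()],
});
```

And the relevant excerpt of the layout:

```astro
---
// apps/web/src/layouts/layout.astro (head excerpt; fallback text is a placeholder)
const { title, description = "Default site description" } = Astro.props;
const canonical = Astro.url.href;
---
<head>
  <title>{title}</title>
  <meta name="description" content={description} />
  <link rel="canonical" href={canonical} />
  <meta property="og:title" content={title} />
  <meta property="og:description" content={description} />
  <meta property="og:url" content={canonical} />
</head>
```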
### Files Modified
- `apps/web/public/robots.txt` (created)
- `apps/web/astro.config.mjs` (sitemap integration + site URL)
- `apps/web/src/layouts/layout.astro` (SEO meta tags)
- All page files (unique descriptions)
### Post-Deployment Steps
- Submit sitemap to Google Search Console
- Request re-indexing of key pages via URL Inspection Tool
- Monitor indexing status over 1-2 weeks
## Sources
- Why Pages Aren’t Indexed - Break The Web
- Page Indexing Report - Google Search Console Help
- How To Fix Crawled Currently Not Indexed - Onely
- Crawled Currently Not Indexed - Rank Math
- Block Search Indexing with noindex - Google Developers
- Robots.txt Guide - Google Developers
- Website Not Indexed - Entail AI