The Problem

Google indexes your homepage but not your internal pages or subpages. This is a common issue with several potential causes.

Diagnostic Steps

1. Check Google Search Console

First, verify your site in Google Search Console and check:

  • Page Indexing Report: Shows which pages are indexed and why others aren’t
  • URL Inspection Tool: Check specific URLs to see their status
  • Look for these common statuses:
    • “Crawled – currently not indexed”
    • “Discovered – currently not indexed”
    • “Blocked by robots.txt”
    • “Excluded by noindex tag”

2. Common Causes

Technical Issues

| Cause | How to Check | Fix |
| --- | --- | --- |
| robots.txt blocking | Visit yoursite.com/robots.txt | Remove/modify Disallow rules |
| noindex meta tags | View page source, look for <meta name="robots" content="noindex"> | Remove the noindex tag |
| noindex HTTP header | Check response headers for X-Robots-Tag: noindex | Remove the header |
| JavaScript rendering | Use URL Inspection “View Crawled Page” | Implement SSR/SSG |
| Canonical issues | Check for improper <link rel="canonical"> tags | Fix canonical URLs |
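
For the noindex HTTP header check, you can inspect response headers from the command line (a sketch assuming curl is available; the URL is a placeholder):

    # HEAD request; any X-Robots-Tag header will appear in the output
    curl -sI https://yoursite.com/some-page | grep -i "x-robots-tag"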

Content Issues

| Cause | Symptoms | Fix |
| --- | --- | --- |
| Thin content | Pages with very little text | Add substantial, valuable content |
| Duplicate content | Multiple pages with the same or similar content | Consolidate or differentiate |
| Low quality | Generic, non-unique content | Improve quality and uniqueness |
| Search intent mismatch | Content doesn’t match what users search for | Align content with user intent |

Site Architecture Issues

| Cause | How to Check | Fix |
| --- | --- | --- |
| Orphan pages | Pages with no internal links | Add internal links |
| Poor navigation | Important pages buried deep | Improve site structure |
| Missing sitemap | No sitemap.xml submitted | Create and submit a sitemap |

3. SPA/JavaScript-Heavy Sites

Single-page applications pose special indexing challenges:

  • Problem: Googlebot may not see JavaScript-rendered content
  • Diagnostic: Use URL Inspection → “View Crawled Page” → “Screenshot” tab
  • Solutions:
    • Server-Side Rendering (SSR)
    • Static Site Generation (SSG)
    • Pre-rendering for important pages
    • Dynamic rendering (serve pre-rendered HTML to bots; sketched below)
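
A minimal sketch of the dynamic-rendering idea, written as Express-style middleware (the bot pattern and pre-render cache are illustrative assumptions, not a production setup):

    import express from "express";

    const app = express();
    const BOT_UA = /googlebot|bingbot/i;

    // Hypothetical cache of pre-rendered HTML, keyed by URL path
    const prerendered = new Map([
      ["/products/widget", "<html><!-- pre-rendered markup --></html>"],
    ]);

    app.use((req, res, next) => {
      const ua = req.get("user-agent") || "";
      if (BOT_UA.test(ua) && prerendered.has(req.path)) {
        return res.send(prerendered.get(req.path)); // bots get static HTML
      }
      next(); // everyone else gets the normal SPA bundle
    });

    app.use(express.static("dist"));
    app.listen(3000);

SSR/SSG is generally the more robust choice; dynamic rendering adds a second serving path that must be kept in sync.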

Fixes by Priority

High Priority (Check First)

  1. Verify no blocking in robots.txt

    # Fetch yoursite.com/robots.txt and look for overly broad rules, e.g.:
    #   User-agent: *
    #   Disallow: /products/
    # Remove or narrow any Disallow that covers pages you want indexed
  2. Check for noindex tags in page source:

    <!-- This will prevent indexing -->
    <meta name="robots" content="noindex">
  3. Submit sitemap to Google Search Console

    • Create sitemap.xml listing all pages
    • Submit via Search Console → Sitemaps
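
    A minimal sitemap.xml sketch (URLs are placeholders; generators such as @astrojs/sitemap produce this automatically):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://yoursite.com/</loc></url>
      <url><loc>https://yoursite.com/products/example</loc></url>
      <url><loc>https://yoursite.com/careers</loc></url>
    </urlset>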

Medium Priority

  1. Improve internal linking

    • Link to important pages from homepage/navigation
    • Add contextual links within content
    • Ensure no orphan pages
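
    For example, a contextual link from one page to another (paths are placeholders):

    <!-- In the body copy of a related page -->
    <a href="/products/widget">Learn more about our widget</a>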
  2. Check canonical tags

    • Each page should have a proper <link rel="canonical" href="..."> tag
    • Ensure canonicals point to the correct URLs
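
    For example, a self-referencing canonical in the page’s <head> (URL is a placeholder):

    <link rel="canonical" href="https://yoursite.com/products/widget">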
  3. For JavaScript sites, implement SSR/SSG:

    • Next.js (React)
    • Nuxt (Vue)
    • SvelteKit
    • Astro

Lower Priority

  1. Improve content quality

    • Add more substantial content to thin pages
    • Make content unique and valuable
    • Match search intent
  2. Request manual indexing

    • Use URL Inspection Tool → “Request Indexing”
    • Create a temporary sitemap containing the unindexed URLs

The robots.txt vs noindex Trap

Important: These work differently and can conflict:

  • robots.txt Disallow: Prevents crawling but NOT indexing

    • Google may still index the URL if it is linked from elsewhere
    • Google can’t see noindex tags on blocked pages
  • noindex tag/header: Prevents indexing but requires crawling

    • Page must be crawlable for Google to see the noindex directive

Common mistake: Blocking pages in robots.txt AND adding noindex. Google can’t see the noindex tag, so the page may still appear in results!
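
To make the conflict concrete (paths are illustrative):

    # robots.txt: crawling of /private/ is blocked
    User-agent: *
    Disallow: /private/

    <!-- /private/page.html: Googlebot never fetches this page,
         so it never sees the directive below -->
    <meta name="robots" content="noindex">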

Timeline Expectations

  • New sites: 1-4 weeks for initial crawling
  • New pages: Days to weeks
  • After fixes: Request re-indexing, wait days to weeks
  • Large sites: May take longer for full crawl

Case Study: zabaca.com (2025-11-28)

Problem

Only the homepage, zabaca.com, was indexed by Google. The subpages (/products/*, /careers, /press/*) did not appear in search results.

Investigation

The site was built with Astro SSG (static site generation), so this was not an SPA rendering issue.

Issues Found:

| Issue | Status |
| --- | --- |
| robots.txt | 404 - Missing |
| sitemap.xml | 404 - Missing |
| Canonical tags | Not set |
| Meta descriptions | Same on all pages (outdated copy) |
| Open Graph tags | Missing |

Solution Implemented

  1. Created robots.txt:

    User-agent: *
    Allow: /
    Sitemap: https://www.zabaca.com/sitemap-index.xml
  2. Added @astrojs/sitemap integration with site URL config
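
    A sketch of the resulting config (the repo’s actual file may differ slightly):

    // apps/web/astro.config.mjs
    import { defineConfig } from "astro/config";
    import sitemap from "@astrojs/sitemap";

    export default defineConfig({
      // site is required for @astrojs/sitemap to generate absolute URLs
      site: "https://www.zabaca.com",
      integrations: [sitemap()],
    });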

  3. Updated layout.astro:

    • Added canonical tags using Astro.url.href
    • Added Open Graph meta tags
    • Made description a prop with fallback
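
    A sketch of the relevant part of layout.astro (prop names and fallback text are assumptions):

    ---
    // src/layouts/layout.astro
    const { title, description = "Fallback site description" } = Astro.props;
    const canonical = Astro.url.href;
    ---
    <head>
      <title>{title}</title>
      <meta name="description" content={description} />
      <link rel="canonical" href={canonical} />
      <meta property="og:title" content={title} />
      <meta property="og:description" content={description} />
      <meta property="og:url" content={canonical} />
    </head>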
  4. Updated all pages with unique meta descriptions

Files Modified

  • apps/web/public/robots.txt (created)
  • apps/web/astro.config.mjs (sitemap integration + site URL)
  • apps/web/src/layouts/layout.astro (SEO meta tags)
  • All page files (unique descriptions)

Post-Deployment Steps

  1. Submit sitemap to Google Search Console
  2. Request re-indexing of key pages via URL Inspection Tool
  3. Monitor indexing status over 1-2 weeks
