Google crawls primarily through links. A sitemap entry helps, but an orphaned page — one with zero internal links pointing at it — lives at the bottom of the crawl-priority pile. ClaudeBot and GPTBot weight links even more heavily; they'll often never visit a sitemap orphan at all.
The Orphan Page Detector crawls your site breadth-first (BFS) from the homepage, up to N hops deep, then diffs the set of discovered URLs against your sitemap to surface the orphans.
How it works
- Fetch the sitemap, extract all `<loc>` URLs, and normalize them (strip the trailing slash, lowercase the host).
- Start a BFS from the homepage, up to a configurable cap (default 100 pages, 3 hops).
- At each page, extract every same-origin `<a href>`.
- Track the depth of each discovered URL.
- After the crawl, compute `sitemap_set - discovered_set` = orphans.
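The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the detector's actual code: every function name here (`normalize`, `sitemap_urls`, `crawl`, `http_fetch`, `find_orphans`) is hypothetical, the page fetcher is passed in so the crawl can be tested offline, sitemap index files are not followed, and the page cap is assumed to count fetched pages, while pages sitting exactly at the hop limit count as reached but are not expanded.

```python
import xml.etree.ElementTree as ET
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit, urlunsplit
from urllib.request import urlopen


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop the #fragment, strip trailing slashes from the path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def sitemap_urls(xml_text: str) -> set[str]:
    """Collect every <loc> value (assumes a single <urlset>, not a sitemap index)."""
    root = ET.fromstring(xml_text)
    return {
        normalize(el.text.strip())
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text  # ignore the XML namespace prefix
    }


class LinkExtractor(HTMLParser):
    """Collects the raw href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def crawl(home: str, fetch: Callable[[str], Optional[str]],
          max_pages: int = 100, max_hops: int = 3) -> dict[str, int]:
    """BFS from the homepage; returns {normalized URL: click depth}."""
    home = normalize(home)
    origin = urlsplit(home)[:2]  # (scheme, host[:port]) defines "same-origin"
    depth = {home: 0}
    queue = deque([home])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if depth[url] >= max_hops:
            continue  # reached at the hop limit: counted, but not expanded
        html = fetch(url)
        fetched += 1
        if html is None:
            continue  # fetch failed or the response was not HTML
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.hrefs:
            target = normalize(urljoin(url, href))
            if urlsplit(target)[:2] != origin:
                continue  # skip external, mailto:, javascript: and similar links
            if target not in depth:  # first sighting in BFS order is the shortest click depth
                depth[target] = depth[url] + 1
                queue.append(target)
    return depth


def http_fetch(url: str) -> Optional[str]:
    """Real-network fetcher: returns HTML text, or None on errors and non-HTML responses."""
    try:
        with urlopen(url, timeout=10) as resp:
            if "html" not in resp.headers.get("Content-Type", ""):
                return None
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return None


def find_orphans(sitemap_xml: str, home: str, fetch: Callable[[str], Optional[str]],
                 **limits) -> tuple[set[str], dict[str, int]]:
    """Orphans = sitemap_set - discovered_set; also returns the depth map."""
    depth = crawl(home, fetch, **limits)
    return sitemap_urls(sitemap_xml) - set(depth), depth
```

Against a live site you would call something like `find_orphans(sitemap_xml, "https://example.com/", http_fetch)`, with the placeholder URL standing in for your homepage. Keeping `fetch` injectable separates the BFS logic from networking, so the diff can be checked against a small hand-built fake site. One caveat of this sketch: `urlopen` follows redirects silently, and the crawl records the URL it requested rather than the final one.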
What the output shows
- Coverage % — what fraction of sitemap URLs were reachable via links from the homepage.
- Click-depth distribution — how many pages live at depth 0, 1, 2, 3, 4.
- Orphan list — URLs that are in the sitemap but were never reached by the crawl. A one-click AI fix prompt proposes where to add links to each.
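All three outputs are simple set arithmetic: coverage is reached sitemap URLs divided by total sitemap URLs, times 100, so 3 reached out of 4 gives 75%. A hypothetical helper, assuming the sitemap URLs and the crawl's `{url: depth}` map are already normalized the same way. The histogram here counts every page the crawl reached, whether or not it appears in the sitemap; that is an assumption about how the distribution is defined.

```python
from collections import Counter


def coverage_report(sitemap: set[str], depth: dict[str, int]):
    """Return (coverage %, {click depth: page count}, sorted orphan list)."""
    discovered = set(depth)
    reached = sitemap & discovered
    # An empty sitemap has nothing to cover; report 100% rather than divide by zero.
    coverage = 100.0 * len(reached) / len(sitemap) if sitemap else 100.0
    histogram = dict(sorted(Counter(depth.values()).items()))
    orphans = sorted(sitemap - discovered)
    return coverage, histogram, orphans
```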
Typical patterns
- Tag archive pages are often orphaned: they're auto-generated from content but not linked from anywhere prominent.
- Legacy landing pages become orphans when a homepage redesign drops the link to them.
- Old blog posts become orphans once the blog's pagination stops showing them.
- Category pages that were replaced by collection pages but are still listed in the sitemap.
The fix workflow
- Run the detector.
- For each orphan, ask: "Does this page deserve to exist?" If not, return 410 Gone and remove it from the sitemap. If so, link to it from somewhere prominent.
- Best link-from sources: hub pages, category archives, in-content references from high-authority pages, footer (last resort).
- Rerun in a month. Coverage should move toward 95%+.
The goal isn't 100% — some pages are intentionally hidden (thank-you pages, legacy redirects). Aim for zero accidental orphans.
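One way to separate intentional from accidental orphans is to keep an explicit list of path patterns for pages that are meant to stay unlinked, and filter those out before reviewing the rest. The pattern list and the `accidental_orphans` helper below are purely illustrative assumptions, not features of the detector. Note that `*` in `fnmatchcase` also matches `/`, so `/legacy/*` covers nested paths too.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical, illustrative patterns: paths you deliberately keep unlinked.
INTENTIONALLY_HIDDEN = ("/thank-you*", "/legacy/*")


def accidental_orphans(orphans, hidden_patterns=INTENTIONALLY_HIDDEN):
    """Keep only orphans whose path matches none of the intentionally hidden patterns."""
    return sorted(
        url
        for url in orphans
        if not any(fnmatchcase(urlsplit(url).path, p) for p in hidden_patterns)
    )
```

Whatever survives the filter is the list the fix workflow applies to: link it from somewhere prominent, or retire it with a 410.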
Related reading
- Link Graph — visualizes internal topology; orphan detection at larger scale
- Index Coverage Delta — diffs live crawl vs sitemap with indexable flag
- Internal Link Auditor — finds 404 internal links
- Slug Rename Helper — when you change a slug, find every link to update
Fact-check notes and sources
- Google crawl priority signals: developers.google.com/search/docs/crawling-indexing
- Screaming Frog orphan detection: screamingfrog.co.uk/seo-spider
- BFS (breadth-first search) algorithm: en.wikipedia.org/wiki/Breadth-first_search
The $100 Network covers keeping link graphs healthy across site networks where orphans accumulate invisibly. The detector runs in minutes; the fixes are manual but targeted.