Orphan Page Detector — Sitemap Entries Unreachable From...

Google discovers pages primarily by following links. A sitemap entry helps, but an orphaned page, one with zero internal links pointing at it, sits at the bottom of the crawl-priority pile. AI crawlers such as ClaudeBot and GPTBot appear to lean on links even more heavily, and may never visit a sitemap orphan at all.

The Orphan Page Detector crawls your site from the homepage via BFS up to N hops, then diffs the discovered set against your sitemap to surface the orphans.

How it works

Fetch the sitemap, extract all <loc> URLs, and normalize them (strip the trailing slash, lowercase the host).
Start a BFS from the homepage and run it up to a configurable cap (default 100 pages, 3 hops).
At each page, extract every same-origin <a href>.
Track depth per discovered URL.
After the crawl, compute orphans = sitemap_set - discovered_set.
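
The steps above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the detector's actual code: `normalize`, `sitemap_urls`, `same_origin_links`, `crawl`, and `find_orphans` are hypothetical names, and `fetch` is any callable you supply that returns a page's HTML for a URL. The sketch assumes the page cap counts fetched pages and that "3 hops" means pages up to depth 3 are fetched, so links found on them are recorded at depth 4.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin, urlsplit, urlunsplit
import xml.etree.ElementTree as ET


def normalize(url: str) -> str:
    """Lowercase the host, drop the fragment, strip the trailing slash."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, ""))


def sitemap_urls(xml_text: str) -> set[str]:
    """Collect every <loc> value, ignoring the XML namespace prefix."""
    root = ET.fromstring(xml_text)
    return {normalize(el.text) for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "loc" and el.text}


class LinkParser(HTMLParser):
    """Gathers the raw href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def same_origin_links(page_url: str, html: str) -> set[str]:
    """Resolve hrefs against page_url and keep those on the same scheme and host.

    Caveat: page_url is already normalized, so document-relative hrefs
    ("post-1" rather than "/blog/post-1") may resolve differently than in a
    browser. A real crawler would resolve against the final fetched URL.
    """
    parser = LinkParser()
    parser.feed(html)
    base = urlsplit(page_url)
    links = set()
    for href in parser.hrefs:
        url = normalize(urljoin(page_url, href))
        target = urlsplit(url)
        if (target.scheme, target.netloc) == (base.scheme, base.netloc):
            links.add(url)
    return links


def crawl(homepage: str, fetch: Callable[[str], str],
          max_pages: int = 100, max_hops: int = 3) -> dict[str, int]:
    """Breadth-first crawl. Returns {url: click depth} for each discovered URL."""
    start = normalize(homepage)
    depth = {start: 0}
    queue = deque([start])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if depth[url] > max_hops:
            continue  # discovered, but too deep to fetch
        html = fetch(url)
        fetched += 1
        for link in same_origin_links(url, html):
            if link not in depth:  # first visit is the shortest path in BFS
                depth[link] = depth[url] + 1
                queue.append(link)
    return depth


def find_orphans(sitemap_xml: str, homepage: str,
                 fetch: Callable[[str], str], **caps: int) -> list[str]:
    """Sitemap URLs the crawl never discovered."""
    discovered = set(crawl(homepage, fetch, **caps))
    return sorted(sitemap_urls(sitemap_xml) - discovered)
```

Because BFS visits pages in order of distance from the homepage, the first depth recorded for a URL is its minimum click depth. Passing `fetch` in as a parameter keeps the sketch testable against an in-memory dictionary of pages before pointing it at a live site.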

What the output shows

Coverage % — what fraction of sitemap URLs were reachable via links from the homepage.
Click-depth distribution — how many pages live at depth 0, 1, 2, 3, 4.
Orphan list — URLs in the sitemap but not reached. One-click AI fix prompt proposes where to add links.
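
As a rough sketch of how these three figures could be derived, assume the crawl yields a mapping of URL to click depth and the sitemap is a set of normalized URLs. The function name `coverage_report` and the output keys are hypothetical, and counting every discovered URL in the depth distribution (not only sitemap URLs) is an assumption about what the tool reports.

```python
from collections import Counter


def coverage_report(sitemap: set[str], depth: dict[str, int]) -> dict:
    """Summarize a crawl as coverage %, pages per click depth, and orphans."""
    discovered = set(depth)
    reached = sitemap & discovered
    # Coverage: share of sitemap URLs that the link crawl reached.
    coverage = round(100 * len(reached) / len(sitemap), 1) if sitemap else 0.0
    # Depth distribution over every discovered URL (assumption, see above).
    by_depth = Counter(depth.values())
    return {
        "coverage_pct": coverage,
        "depth_counts": dict(sorted(by_depth.items())),
        "orphans": sorted(sitemap - discovered),
    }
```

For example, a sitemap of four URLs where the crawl reached three would report 75.0% coverage and list the one unreached URL as an orphan.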

Typical patterns

Tag archive pages often end up orphaned: they are auto-generated from content but not linked from anywhere prominent.
Legacy landing pages become orphans when a homepage redesign drops the link.
Old blog posts become orphans once the blog pagination stops showing them.
Category pages that were replaced by collection pages often stay in the sitemap with nothing linking to them.

The fix workflow

Run the detector.
For each orphan, ask: "Does this page deserve to exist?" If not, return 410 Gone and remove it from the sitemap. If so, link to it from somewhere prominent.
Best link-from sources: hub pages, category archives, in-content references from high-authority pages, footer (last resort).
Rerun in a month. Coverage should move toward 95%+.

The goal isn't 100% — some pages are intentionally hidden (thank-you pages, legacy redirects). Aim for zero accidental orphans.

Fact-check notes and sources

Google crawl priority signals: developers.google.com/search/docs/crawling-indexing
Screaming Frog orphan detection: screamingfrog.co.uk/seo-spider
BFS (breadth-first search) algorithm: en.wikipedia.org/wiki/Breadth-first_search

The $100 Network covers keeping link graphs healthy across site networks where orphans accumulate invisibly. The detector runs in minutes; the fixes are manual but targeted.
