# Pages in Your Sitemap With Zero Internal Links — Orphan Detection

A page in your sitemap that no internal link reaches from the homepage is an orphan. Googlebot discovers pages primarily through links — sitemap-only pages get deprioritized or never crawled. The detector BFS-crawls your site from the homepage up to N hops, then diffs against your sitemap.

Author: J.A. Watte
Published: April 30, 2026
Source: https://jwatte.com/blog/blog-tool-orphan-page-detector/

---

Google crawls primarily through links. A sitemap entry helps, but an orphaned page — one with zero internal links pointing at it — lives at the bottom of the crawl-priority pile. ClaudeBot and GPTBot weight links even more heavily; they'll often never visit a sitemap orphan at all.

The [Orphan Page Detector](/tools/orphan-page-detector/) crawls your site from the homepage via BFS up to N hops, then diffs the discovered set against your sitemap to surface the orphans.

## How it works

1. Fetch the sitemap, extract all `<loc>` URLs, and normalize them (strip the trailing slash, lowercase the host).
2. Start a BFS from the homepage, bounded by configurable caps (default: 100 pages, 3 hops).
3. At each page, extract every same-origin `<a href>`.
4. Track depth per discovered URL.
5. After crawl, compute `sitemap_set - discovered_set` = orphans.
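The steps above can be sketched in a few dozen lines of standard-library Python. This is a minimal illustration, not the detector's actual source. The function names (`normalize`, `sitemap_urls`, `crawl`, `find_orphans`) and the injected `fetch` callable are assumptions made for the example, as are dropping URL fragments during normalization and treating the hop cap as "do not expand pages at depth `max_hops`".

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit, urlunsplit
import xml.etree.ElementTree as ET


def normalize(url):
    """Lowercase the host and strip the trailing slash (fragment dropped: assumption)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, ""))


def sitemap_urls(xml_text):
    """Return the normalized set of every <loc> value, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    return {normalize(el.text.strip()) for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "loc" and el.text and el.text.strip()}


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(home, fetch, max_pages=100, max_hops=3):
    """BFS from the homepage. Returns {normalized_url: click depth}.

    `fetch(url)` is a hypothetical callable returning HTML text or None.
    """
    origin = urlsplit(normalize(home))[:2]      # (scheme, host)
    depth = {normalize(home): 0}
    queue = deque([home])
    fetched = 0
    while queue and fetched < max_pages:
        raw = queue.popleft()
        d = depth[normalize(raw)]
        if d >= max_hops:
            continue                            # discovered, but too deep to expand
        html = fetch(raw)
        fetched += 1
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            target = urldefrag(urljoin(raw, href)).url
            key = normalize(target)
            # Keep same-origin links only, and record the first (shortest) depth.
            if urlsplit(key)[:2] == origin and key not in depth:
                depth[key] = d + 1
                queue.append(target)
    return depth


def find_orphans(sitemap_xml, home, fetch, **caps):
    """Orphans are sitemap URLs the link crawl never reached."""
    discovered = crawl(home, fetch, **caps)
    return sorted(sitemap_urls(sitemap_xml) - set(discovered))
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; in practice it could wrap any HTTP client. Because BFS visits pages in order of distance from the homepage, the first depth recorded for a URL is also its minimum click depth.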

## What the output shows

- **Coverage %** — what fraction of sitemap URLs were reachable via links from the homepage.
- **Click-depth distribution** — how many pages live at depth 0, 1, 2, 3, 4.
- **Orphan list** — URLs in the sitemap but not reached. One-click AI fix prompt proposes where to add links.
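As a rough illustration of how those three outputs relate to the crawl data, the sketch below derives them from a sitemap set and a depth map. The function name `coverage_report` and the report keys are hypothetical, and counting every discovered page (not only sitemap pages) in the depth distribution is an assumption.

```python
from collections import Counter


def coverage_report(sitemap_set, depth_by_url):
    """Summarize a crawl: coverage %, click-depth histogram, orphan list.

    sitemap_set  -- normalized URLs listed in the sitemap
    depth_by_url -- {normalized_url: click depth} from the link crawl
    """
    discovered = set(depth_by_url)
    reached = sitemap_set & discovered
    coverage = 100.0 * len(reached) / len(sitemap_set) if sitemap_set else 0.0
    histogram = Counter(depth_by_url.values())
    return {
        "coverage_pct": round(coverage, 1),
        "depth_distribution": dict(sorted(histogram.items())),
        "orphans": sorted(sitemap_set - discovered),
    }


report = coverage_report(
    {"/", "/about", "/pricing", "/old-landing"},
    {"/": 0, "/about": 1, "/pricing": 1, "/blog": 2},
)
print(report)
```

Here three of the four sitemap URLs are reachable, so coverage is 75%, and `/old-landing` is the single orphan.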

## Typical patterns

- **Tag archive pages** are often orphaned: they are auto-generated from content but not linked from anywhere prominent.
- **Legacy landing pages** become orphans when a homepage redesign drops the link.
- **Old blog posts** become orphans once blog pagination stops showing them.
- **Category pages** that were replaced by collection pages often stay in the sitemap with nothing linking to them.

## The fix workflow

1. Run the detector.
2. For each orphan, ask: "Does this page deserve to exist?" If no, return `410 Gone` and remove it from the sitemap. If yes, link to it from somewhere prominent.
3. Best link-from sources: hub pages, category archives, in-content references from high-authority pages, footer (last resort).
4. Rerun in a month. Coverage should move toward 95%+.

The goal isn't 100% — some pages are intentionally hidden (thank-you pages, legacy redirects). Aim for zero *accidental* orphans.
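One way to separate intentional orphans from accidental ones is a small allowlist of path patterns applied to the orphan list. The sketch below is only an illustration: the patterns in `INTENTIONAL` and the function name `accidental_orphans` are hypothetical and would need to match your own site's URL scheme.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical patterns for pages that are hidden on purpose.
INTENTIONAL = ["/thank-you*", "/legacy/*"]


def accidental_orphans(orphans, patterns=INTENTIONAL):
    """Drop orphans whose path matches an allowlisted glob pattern."""
    def is_intentional(url):
        path = urlsplit(url).path or "/"
        return any(fnmatchcase(path, pattern) for pattern in patterns)

    return [url for url in orphans if not is_intentional(url)]


remaining = accidental_orphans([
    "https://example.com/thank-you",
    "https://example.com/legacy/old-promo",
    "https://example.com/blog/tag/seo",
])
print(remaining)
```

Only the tag archive page survives the filter, so it is the one that needs a link or a removal decision.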

## Related reading

- [Link Graph](/tools/link-graph/) — visualizes internal topology; orphan detection at larger scale
- [Index Coverage Delta](/tools/index-coverage-delta/) — diffs live crawl vs sitemap with indexable flag
- [Internal Link Auditor](/tools/internal-link-auditor/) — finds 404 internal links
- [Slug Rename Helper](/tools/slug-rename-helper/) — when you change a slug, find every link to update

## Fact-check notes and sources

- Google crawl priority signals: [developers.google.com/search/docs/crawling-indexing](https://developers.google.com/search/docs/crawling-indexing/)
- Screaming Frog orphan detection: [screamingfrog.co.uk/seo-spider](https://www.screamingfrog.co.uk/seo-spider/)
- BFS (breadth-first search) algorithm: [en.wikipedia.org/wiki/Breadth-first_search](https://en.wikipedia.org/wiki/Breadth-first_search)

---

*The $100 Network covers keeping link graphs healthy across site networks where orphans accumulate invisibly. The detector runs in minutes; the fixes are manual but targeted.*


