# Orphans Are The Silent Indexation Killers On Most SMB Sites

Orphan pages (pages with zero inbound internal links) are the most common reason SMB content stays in "Discovered – currently not indexed" purgatory. The fix is internal links from topically relevant sources, not content rewrites.

Author: J.A. Watte
Published: April 23, 2026
Source: https://jwatte.com/blog/blog-tool-internal-link-orphan-rescue-planner/

---

The most common pattern in an SMB site audit:

- 80-150 pages total
- 40-60% indexed by Google
- The un-indexed ones all share one property: nothing on the site links to them

These are orphans. They exist (in the sitemap, at live URLs, returning 200 OK), but from a crawler's perspective they're islands in the site graph: no chain of internal links leads to them. Google discovers them via the sitemap, but with zero internal-link signal it often leaves them in "Discovered – currently not indexed", meaning found but not yet crawled.

The fix isn't rewriting the content. The fix is giving Google a reason to believe the orphan matters to the rest of the site, which you do by linking to it from pages that already have authority.
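A minimal sketch of the orphan check, assuming you already have the sitemap's URL list and, for each crawled page, the raw `href` values found on it. Because a crawl that only follows links can never reach a true orphan, the sitemap supplies the set of pages to test. The names `normalize` and `find_orphans` are hypothetical, not the tool's code, and the host comparison is exact, so `www.` and bare-domain variants would need to be unified first.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize(url: str) -> str:
    """Drop #fragments and trailing slashes so /a and /a/ count as one page."""
    url, _ = urldefrag(url)
    return url.rstrip("/") or url


def find_orphans(site: str, sitemap_urls: list[str], links: dict[str, list[str]]) -> list[str]:
    """Return sitemap URLs that no other on-domain page links to.

    `links` maps each crawled page URL to the raw href values found on it.
    """
    host = urlparse(site).netloc
    inbound: dict[str, set[str]] = {}  # target URL -> set of source pages
    for source, hrefs in links.items():
        src = normalize(source)
        for href in hrefs:
            target = normalize(urljoin(source, href))  # resolve relative hrefs
            if urlparse(target).netloc != host or target == src:
                continue  # skip off-domain links and self-links
            inbound.setdefault(target, set()).add(src)
    return sorted(url for url in map(normalize, sitemap_urls) if not inbound.get(url))
```

Self-links and `#fragment` links to the same page are ignored, since a page linking to itself does nothing to connect it to the rest of the site.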

## What the [Internal Link Orphan Rescue Planner](/tools/internal-link-orphan-rescue-planner/) does

You paste a homepage URL. The tool:

1. Crawls up to 80 pages from the homepage (a configurable cap; larger sites should pre-filter with the [Orphan Page Detector](/tools/orphan-page-detector/)).
2. Builds the internal-link graph. Every `<a href>` that stays on-domain counts.
3. Identifies orphans (crawled pages with zero inbound internal links).
4. For each orphan, tokenizes title + H1 + meta description + main body text.
5. Computes TF cosine similarity between the orphan and every other crawled page.
6. Returns the top 5 candidate source pages per orphan, sorted by similarity score.
7. Proposes anchor text taken from the orphan's own title, stripped of pipe/dash suffixes.
8. Emits an AI prompt for drafting the exact 12-25-word sentence to insert into each source page.
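Steps 4-6 can be sketched as plain term-frequency vectors compared by cosine similarity. This is a minimal illustration, not the tool's implementation: the tokenizer, the small stopword list, and applying the 0.08 floor before taking the top 5 are all assumptions, and `rescue_candidates` is a hypothetical name. Each page's text is assumed to already be its title, H1, meta description, and body joined into one string.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; the tool's real list is not documented.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to", "with", "your"}


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector: lowercase alphanumeric tokens, stopwords removed."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse TF vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())  # Counter yields 0 for missing terms
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rescue_candidates(orphan: str, pages: dict[str, str],
                      top_n: int = 5, floor: float = 0.08) -> list[tuple[str, float]]:
    """Score every other page as a link source for `orphan`, best first.

    `pages` maps URL -> title + H1 + meta description + body text.
    """
    target = tf_vector(pages[orphan])
    scored = [(url, cosine(target, tf_vector(text)))
              for url, text in pages.items() if url != orphan]
    scored = [(url, score) for url, score in scored if score >= floor]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

With raw TF and no IDF weighting, frequent words dominate the vectors, so the stopword list has a large effect on the scores.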

The output is a specific action list: "Insert link from /blog/roofing-tips/ to /services/emergency-roof-repair/ with anchor 'emergency roof repair.' The candidate cosine score is 0.22 (topical-adjacent)."
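Step 7's anchor derivation might look like the sketch below. It assumes a title suffix starts at a pipe, hyphen, en dash, or em dash with whitespace on both sides (the tool's exact rule is not documented), and it lowercases the result to match the example anchor; `anchor_from_title` is a hypothetical name.

```python
import re

# Assumption: a suffix begins at a pipe, hyphen, en dash or em dash surrounded by whitespace.
SUFFIX_SEPARATOR = re.compile(r"\s+[|\u2013\u2014-]\s+")


def anchor_from_title(title: str) -> str:
    """Return the title text before the first suffix separator, lowercased."""
    head = SUFFIX_SEPARATOR.split(title.strip(), maxsplit=1)[0]
    return head.strip().lower()


print(anchor_from_title("Emergency Roof Repair | Acme Roofing"))  # emergency roof repair
print(anchor_from_title("24-Hour Roof Tarping - Acme Roofing"))   # 24-hour roof tarping
```

Requiring whitespace around the separator keeps hyphenated words such as "24-hour" intact while still removing " - Brand" style suffixes.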

## What makes a good rescue candidate

The cosine score threshold is about 0.08. Above it, the candidate and orphan share enough vocabulary for an inserted link to read naturally, and the topical signal strengthens as the score rises.

The three candidate tiers, plus the below-threshold case:

**0.25+ cosine (strongly topical).** These are the best rescue candidates. The source page is already talking about the orphan's subject. A sentence linking to the orphan flows naturally. Usually takes 5 minutes to add.

**0.15-0.25 (topical-adjacent).** Good candidates, but they require a more deliberate link insertion, usually a new sentence rather than an anchor in an existing one. Still worth the effort.

**0.08-0.15 (loosely related).** Acceptable for generic "related topics" link blocks in sidebars, but they won't provide strong topical lift. Use only if nothing better exists.

**Below 0.08.** Not topically related. Don't force a link; it looks manipulative. Either promote the orphan via the main nav or a hub page, or reconsider whether it belongs on the site at all.
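The tiers reduce to a simple lookup. The ranges in the prose share their endpoints (0.15 and 0.25 each appear in two tiers), so this sketch assumes each boundary belongs to the higher tier; `rescue_tier` is a hypothetical helper, not part of the tool.

```python
# (inclusive lower bound, label) pairs, highest first; inclusive bounds are an assumption.
TIERS = [
    (0.25, "strongly topical"),
    (0.15, "topical-adjacent"),
    (0.08, "loosely related"),
]


def rescue_tier(score: float) -> str:
    """Map a TF cosine score to the rescue tiers."""
    for lower_bound, label in TIERS:
        if score >= lower_bound:
            return label
    return "below threshold: use nav or a hub page"
```

For example, `rescue_tier(0.22)` returns `"topical-adjacent"`.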

## The 30-day rescue sprint

**Week 1:** Run the audit. Note the orphan count.

**Week 2:** For the top 10 orphans (ranked by their best candidate's cosine score), implement the rescue link: one sentence per source page, drafted with the AI prompt.

**Week 3:** For the next 10 orphans, do the same.

**Week 4:** Re-audit. Orphans should be down 80-90%. In another 2-3 weeks, check the GSC Page indexing report (formerly Coverage) for indexation recovery.

Most sites see 60-80% of rescued orphans move from "Discovered – currently not indexed" to "Indexed" within 30 days. The rest usually have a second issue (thin content, a noindex tag, or a canonical conflict) that the orphan status was masking.
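Picking each week's batch is a sort and a slice. A sketch, assuming you've recorded each orphan's best candidate score from the planner output; `sprint_batches` is hypothetical, and leaving out orphans below 0.08 (which need the nav or hub-page route instead) is this sketch's own choice.

```python
def sprint_batches(best_scores: dict[str, float], batch_size: int = 10,
                   weeks: int = 2, floor: float = 0.08) -> list[list[str]]:
    """Rank orphans by their top candidate's cosine score and cut weekly batches.

    `best_scores` maps orphan URL -> score of its best rescue candidate.
    Returns up to `weeks` non-empty batches of at most `batch_size` URLs.
    """
    ranked = sorted(
        (url for url, score in best_scores.items() if score >= floor),
        key=lambda url: best_scores[url],
        reverse=True,
    )
    batches = [ranked[i:i + batch_size] for i in range(0, batch_size * weeks, batch_size)]
    return [batch for batch in batches if batch]
```

With the defaults, the first batch is Week 2's list and the second is Week 3's.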

## The orphan-deserves-to-die case

Not every orphan is worth rescuing. Three signals that suggest unpublishing:

**1. Content is genuinely obsolete.** A 2019 product announcement, a discontinued service, a hiring page for a closed role. 301 redirect to the closest living page, or return 410 Gone.

**2. Content is auto-generated filler.** Tag archive pages with one post. Deep paginated archive pages (page 23+). Author archive pages for one-post authors. Remove them from the CMS and 301-redirect to the parent category.

**3. Content duplicates another page.** Run it through the [Vector Embedding Similarity](/blog/blog-tool-vector-embedding-similarity/) tool to confirm. If it is 65%+ similar to an indexed page, consolidate.

Rescuing every orphan is over-investment. Rescuing the ones that have genuine content depth is the high-ROI move.
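The three unpublish signals can be checked before any rescue work with a small decision function. Everything here is illustrative: the `Orphan` fields are hypothetical flags you would set by hand during the audit, and reading "65%+ similar" as an embedding similarity of at least 0.65 is an assumption about how the similarity tool reports scores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Orphan:
    url: str
    obsolete: bool = False              # old announcement, discontinued service, closed role
    auto_generated: bool = False        # one-post tag/author archive, deep pagination
    max_similarity: float = 0.0         # best embedding similarity to an indexed page, 0-1
    replacement: Optional[str] = None   # closest living page or parent category, if known


def triage(orphan: Orphan) -> str:
    """Return the action for one orphan, checking the unpublish signals first."""
    if orphan.obsolete:
        return f"301 -> {orphan.replacement}" if orphan.replacement else "410 Gone"
    if orphan.auto_generated:
        return f"remove from CMS, 301 -> {orphan.replacement or 'parent category'}"
    if orphan.max_similarity >= 0.65:
        return "consolidate with the similar indexed page"
    return "rescue with internal links"
```

Only orphans that fall through to the last line go on to the rescue sprint.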

## The hub-page trick for orphans with no topical match

Sometimes an orphan is a legitimate, unique page that has no topically-similar source to link from. Examples: a standalone landing page, a unique case study, a one-off resource.

For those, the rescue path is:

1. Add the orphan to the main navigation hub (category page, /services/, /resources/).
2. Feature it on the homepage for 2-4 weeks until the initial crawl picks it up.
3. Drop the homepage feature once it's indexed; keep the nav link.

This is brute-force visibility: a weaker topical signal, but enough link equity to prompt Google to crawl and evaluate the page.

## Related reading

- [Orphan Page Detector](/tools/orphan-page-detector/) — broader detection tool (companion)
- [Link Graph](/tools/link-graph/) — visualize the whole internal-link structure
- [Link Graph Depth Audit](/tools/link-graph-depth-audit/) — depth-specific metrics
- [Internal Link Equity Flow](/tools/internal-link-equity-flow/) — PageRank-style flow analysis

## Fact-check notes and sources

- Internal-link signal as a ranking factor: [Google Search Central — Links](https://developers.google.com/search/docs/crawling-indexing/links-crawlable)
- "Discovered – currently not indexed" often correlates with weak internal-link equity, per community observations; Google's Search Console documentation defines the status as pages Google has found but not yet crawled
- Token cosine similarity is a standard information-retrieval heuristic; see Salton & McGill (1983)

*This post is informational, not SEO-consulting advice. Mentions of Link Whisper, Ahrefs, Semrush, and Screaming Frog are nominative fair use. No affiliation is implied.*


---

Canonical HTML: https://jwatte.com/blog/blog-tool-internal-link-orphan-rescue-planner/
RSS: https://jwatte.com/feed.xml
JSON Feed: https://jwatte.com/feed.json
Hero image: https://jwatte.com/images/blog-tool-internal-link-orphan-rescue-planner.webp
