The most common pattern in an SMB site audit:
- 80-150 pages total
- 40-60% indexed by Google
- The un-indexed ones all share one property: nothing on the site links to them
These are orphans. They exist — in the sitemap, at URLs, returning 200 OK — but from a crawler's perspective they're dead ends in the site graph. Google discovers them via the sitemap, considers them, and files them under "Discovered – currently not indexed" because the internal-link signal is zero.
The fix isn't rewriting the content. The fix is giving Google reason to believe the orphan matters to the rest of the site — which you do by linking to it from pages that already have authority.
What the Internal Link Orphan Rescue Planner does
You paste a homepage URL. The tool:
- Crawls up to 80 pages from the homepage (configurable cap; larger sites should pre-filter with orphan-page-detector).
- Builds the internal-link graph. Every <a href> that stays on-domain counts.
- Identifies orphans (crawled pages with zero inbound internal links).
- For each orphan, tokenizes title + H1 + meta description + main body text.
- Computes TF cosine similarity between the orphan and every other crawled page.
- Returns the top 5 candidate source pages per orphan, sorted by similarity score.
- Proposes an anchor text pulled from the orphan's own title (stripped of pipe/dash suffixes).
- Emits an AI prompt that writes the exact 12-25 word sentence to insert into each source page.
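The graph, orphan, and similarity steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names, the `links` and `texts` input shapes, and the simple regex tokenizer are all assumptions made for the example. It uses raw term frequency (no IDF weighting), as the step list describes.

```python
import math
import re
from collections import Counter


def find_orphans(links: dict[str, set[str]]) -> set[str]:
    """Return crawled pages with zero inbound internal links.

    `links` (hypothetical shape) maps each crawled URL to the set of
    on-domain URLs it links to. Self-links are ignored.
    """
    linked_to: set[str] = set()
    for source, targets in links.items():
        linked_to |= {t for t in targets if t != source}
    return set(links) - linked_to


def tokenize(text: str) -> Counter:
    """Raw term-frequency counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_candidates(orphan: str, texts: dict[str, str], k: int = 5) -> list[tuple[str, float]]:
    """Rank every other crawled page by similarity to the orphan.

    `texts` (hypothetical shape) maps URL to title + H1 + meta
    description + body text, already concatenated.
    """
    orphan_tf = tokenize(texts[orphan])
    scored = [
        (url, cosine(orphan_tf, tokenize(text)))
        for url, text in texts.items()
        if url != orphan
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

In practice the crawl seed (the homepage) may need to be excluded from the orphan set by hand if nothing links back to it, and a real tokenizer would likely drop stop words before scoring.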
The output is a specific action list: "Insert link from /blog/roofing-tips/ to /services/emergency-roof-repair/ with anchor 'emergency roof repair.' The candidate cosine score is 0.22 (topical-adjacent)."
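The anchor-text step (the orphan's title with pipe/dash suffixes stripped) might look like the sketch below. The function name is hypothetical, and two choices are assumptions rather than documented behavior: separators only count when surrounded by whitespace, so hyphenated words survive, and the result is lowercased to match the example anchor above.

```python
import re


def anchor_from_title(title: str) -> str:
    """Keep the part of a page title before the first ' | ' or ' - ' style suffix."""
    # Separator = pipe, hyphen, en dash (U+2013), or em dash (U+2014),
    # with whitespace on both sides.
    head = re.split(r"\s+[|\-\u2013\u2014]\s+", title.strip(), maxsplit=1)[0]
    return head.strip().lower()


print(anchor_from_title("Emergency Roof Repair | Acme Roofing"))
```

Splitting on the first separator is the simplest rule. Titles that put the brand first ("Acme Roofing | Emergency Roof Repair") would need the opposite rule, so a proposed anchor is worth a quick manual check before it goes live.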
What makes a good rescue candidate
The cosine score threshold is ~0.08. Above that, the candidate and orphan share enough vocabulary that an inserted link reads naturally.
The three candidate tiers:
0.25+ cosine (strongly topical). These are the best rescue candidates. The source page is already talking about the orphan's subject. A sentence linking to the orphan flows naturally. Usually takes 5 minutes to add.
0.15-0.25 (topical-adjacent). Good candidates, but they require a more deliberate link insertion: usually a new sentence, not just an anchor in an existing sentence. Still worth the effort.
0.08-0.15 (loosely related). Acceptable for a generic "related topics" link block in a sidebar, but won't provide strong topical lift. Use only if nothing better exists.
Below 0.08. Not topically related. Don't force a link; it looks manipulative. Either promote the orphan via main nav / hub page, or reconsider whether the orphan belongs on the site at all.
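The tiers above map directly onto a small lookup function. A sketch, with one assumption flagged: the ranges as written overlap at exactly 0.15 and 0.25, so this version assigns a boundary score to the higher tier. The function name is made up for the example.

```python
def rescue_tier(score: float) -> str:
    """Map a cosine score to the candidate tiers described above.

    Boundary scores (exactly 0.25, 0.15, 0.08) go to the higher tier;
    that tie-break is an assumption, not something the tiers specify.
    """
    if score >= 0.25:
        return "strongly topical"
    if score >= 0.15:
        return "topical-adjacent"
    if score >= 0.08:
        return "loosely related"
    return "not topically related"


for s in (0.31, 0.22, 0.10, 0.04):
    print(s, rescue_tier(s))
```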
The 30-day rescue sprint
Week 1: Run the audit. Note the orphan count.
Week 2: For the top 10 orphans (by cosine score of top candidate), implement the rescue link. One sentence per source page. Use the AI prompt to write the sentence.
Week 3: For the next 10 orphans, do the same.
Week 4: Re-audit. Orphans should be down 80-90%. Check the Page indexing report in GSC (formerly Coverage) in another 2-3 weeks to see indexation recovery.
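Picking the Week 2 and Week 3 batches is a sort-and-chunk over each orphan's best candidate score. The sketch below assumes a hypothetical `top_scores` mapping (orphan URL to the cosine score of its top candidate) and, as a further assumption, skips orphans whose best candidate falls under the 0.08 threshold, since those need a different rescue path.

```python
def sprint_batches(
    top_scores: dict[str, float],
    batch_size: int = 10,
    threshold: float = 0.08,
) -> list[list[str]]:
    """Order orphans by their best candidate score and split into weekly batches.

    `top_scores` maps orphan URL -> cosine score of its top candidate.
    Index 0 of the result is the Week 2 batch, index 1 is Week 3, and so on.
    """
    eligible = [url for url, score in top_scores.items() if score >= threshold]
    ranked = sorted(eligible, key=lambda url: top_scores[url], reverse=True)
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]
```

Anything left over after the first two batches simply rolls into the next sprint.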
Most sites see 60-80% of rescued orphans move from "Discovered – currently not indexed" to "Indexed" within 30 days. The rest usually have a second issue (thin content, noindex tag, or canonical conflict) that the orphan status was masking.
The orphan-deserves-to-die case
Not every orphan is worth rescuing. Three signals that suggest unpublishing:
1. Content is genuinely obsolete. A 2019 product announcement, a discontinued service, a hiring page for a closed role. 301 redirect to the closest living page, or return 410 Gone.
2. Content is auto-generated filler. Tag archive pages with one post. Deep paginated archive pages (page 23 and beyond). Author-archive pages for one-post authors. Remove them from the CMS and 301 to the category parent.
3. Content is a duplicate of another page. Run it through the Vector Embedding Similarity tool to confirm. If it is 65%+ similar to an indexed page, consolidate.
Rescuing every orphan is over-investment. Rescuing the ones that have genuine content depth is the high-ROI move.
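The three signals can be folded into a simple triage helper. This is only a sketch: the function name, the boolean flags (which a human reviewer would set), and the order in which the signals are checked are all assumptions. `duplicate_similarity` is expected as a 0-1 fraction from whatever similarity check you run, with 0.65 standing in for the 65% threshold above.

```python
def triage_orphan(
    is_obsolete: bool,
    is_autogenerated: bool,
    duplicate_similarity: float,
) -> str:
    """Suggest an action for an orphan based on the three unpublish signals."""
    if is_obsolete:
        return "301 to closest living page, or 410 Gone"
    if is_autogenerated:
        return "remove from CMS, 301 to category parent"
    if duplicate_similarity >= 0.65:
        return "consolidate into the indexed page"
    return "rescue with internal links"
```

Orphans that fall through to the last branch are the ones with enough content depth to justify the rescue work.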
The hub-page trick for orphans with no topical match
Sometimes an orphan is a legitimate, unique page that has no topically-similar source to link from. Examples: a standalone landing page, a unique case study, a one-off resource.
For those, the rescue path is:
- Add the orphan to the main navigation hub (category page, /services/, /resources/).
- Feature it on the homepage for 2-4 weeks until initial crawl picks it up.
- Drop the homepage feature once indexed; keep the nav link.
This is brute-force visibility — lower topical signal but enough link equity to force Google to consider the page.
Related reading
- Orphan Page Detector — broader detection tool (companion)
- Link Graph — visualize the whole internal-link structure
- Link Graph Depth Audit — depth-specific metrics
- Internal Link Equity Flow — PageRank-style flow analysis
Fact-check notes and sources
- Internal-link signal as a ranking factor: Google Search Central — Links
- "Discovered – currently not indexed" GSC status often correlates with weak internal-link equity per community observations and Google's own Search Console documentation
- Token cosine similarity is a standard information-retrieval heuristic; see Salton & McGill (1983)
This post is informational, not SEO-consulting advice. Mentions of Link Whisper, Ahrefs, Semrush, and Screaming Frog are nominative fair use. No affiliation is implied.