Orphans Are The Silent Indexation Killers On Most SMB Sites

The most common pattern in an SMB site audit:

  • 80-150 pages total
  • 40-60% indexed by Google
  • The un-indexed ones all share one property: nothing on the site links to them

These are orphans. They exist (in the sitemap, at URLs, returning 200 OK), but from a crawler's perspective they are unreachable nodes in the site graph: no internal link leads to them. Google discovers them via the sitemap, considers them, and files them under "Discovered – currently not indexed" because the internal-link signal is zero.

The fix isn't rewriting the content. The fix is giving Google reason to believe the orphan matters to the rest of the site — which you do by linking to it from pages that already have authority.

What the Internal Link Orphan Rescue Planner does

You paste a homepage URL. The tool:

  1. Crawls up to 80 pages from the homepage (configurable cap; larger sites should pre-filter with orphan-page-detector).
  2. Builds the internal-link graph. Every <a href> that stays on-domain counts.
  3. Identifies orphans (crawled pages with zero inbound internal links).
  4. For each orphan, tokenizes title + H1 + meta description + main body text.
  5. Computes TF cosine similarity between the orphan and every other crawled page.
  6. Returns the top 5 candidate source pages per orphan, sorted by similarity score.
  7. Proposes an anchor text pulled from the orphan's own title (stripped of pipe/dash suffixes).
  8. Emits an AI prompt that asks for the exact 12-25 word sentence to insert into each source page.
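
Steps 2 through 6 can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the tool's actual code: the function names are hypothetical, and it assumes the page set was already gathered (for example from the crawl plus the sitemap) into a dict mapping each URL to the raw hrefs found on that page, plus a second dict mapping each URL to its concatenated title, H1, meta description, and body text.

```python
import math
import re
from collections import Counter
from urllib.parse import urljoin, urlparse


def internal_links(page_url, hrefs):
    """Resolve hrefs against page_url and keep only same-host targets."""
    host = urlparse(page_url).netloc
    targets = set()
    for href in hrefs:
        absolute = urljoin(page_url, href).split("#")[0]  # drop fragments
        # Self-links are ignored: they carry no signal from another page.
        if urlparse(absolute).netloc == host and absolute != page_url:
            targets.add(absolute)
    return targets


def find_orphans(pages, homepage):
    """Return known pages with zero inbound internal links.

    pages: dict of URL -> list of raw hrefs on that page (assumed input).
    The homepage is excluded because it is the crawl entry point.
    """
    linked = set()
    for url, hrefs in pages.items():
        linked |= internal_links(url, hrefs)
    return sorted(u for u in pages if u not in linked and u != homepage)


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_cosine(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_candidates(orphan, texts, k=5):
    """Rank every other page by similarity to the orphan; keep the top k."""
    scored = [(tf_cosine(texts[orphan], texts[u]), u)
              for u in texts if u != orphan]
    return sorted(scored, reverse=True)[:k]
```

Raw term frequency with no stop-word removal or IDF weighting is the simplest reading of "TF cosine similarity"; a production tool may well normalize differently, which would shift the absolute scores.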

The output is a specific action list: "Insert link from /blog/roofing-tips/ to /services/emergency-roof-repair/ with anchor 'emergency roof repair.' The candidate cosine score is 0.22, which is topical-adjacent."
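
The anchor extraction in step 7 and the action line above might look roughly like this. The helper names are hypothetical, and lowercasing the anchor is an assumption made only to match the example anchor shown above.

```python
import re


def anchor_from_title(title):
    """Keep the part of a title before the first ' | ' or ' - ' style suffix."""
    # Separators must have whitespace on both sides, so hyphenated words
    # such as "24-Hour" are left intact. \u2013 and \u2014 are en and em dashes.
    head = re.split(r"\s+[|\-\u2013\u2014]\s+", title.strip(), maxsplit=1)[0]
    return head.strip().lower()  # lowercasing is an assumption, see lead-in


def action_line(source, orphan, title, score):
    """Format one rescue action for the output list."""
    anchor = anchor_from_title(title)
    return (f"Insert link from {source} to {orphan} "
            f"with anchor '{anchor}' (cosine {score:.2f}).")
```

Splitting only on the first separator means a title like "Roof Repair | Services | Acme" yields "roof repair", which is usually the most specific part.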

What makes a good rescue candidate

The cosine score threshold is ~0.08. Above that, the candidate and orphan share enough vocabulary that an inserted link reads naturally.

The three candidate tiers:

0.25+ cosine (strongly topical). These are the best rescue candidates. The source page is already talking about the orphan's subject. A sentence linking to the orphan flows naturally. Usually takes 5 minutes to add.

0.15-0.25 (topical-adjacent). Good candidates, but they require a more deliberate link insertion: usually a new sentence, not just an anchor in an existing sentence. Still worth the effort.

0.08-0.15 (loosely related). Acceptable for generic "related topics" link blocks in sidebars, but won't provide strong topical lift. Use only if nothing better exists.

Below 0.08. Not topically related. Don't force a link; it looks manipulative. Either promote the orphan via main nav / hub page, or reconsider whether the orphan belongs on the site at all.
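
The tiers above reduce to a small lookup. In this sketch the function name is hypothetical, and treating each lower bound as inclusive (so exactly 0.25 counts as strongly topical) is an assumption, since the ranges as written share their endpoints.

```python
def candidate_tier(score):
    """Map a cosine score to the rescue-candidate tier described above."""
    if score >= 0.25:
        return "strongly topical"
    if score >= 0.15:
        return "topical-adjacent"
    if score >= 0.08:
        return "loosely related"
    return "not related"  # below threshold: do not force a link
```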

The 30-day rescue sprint

Week 1: Run the audit. Note the orphan count.

Week 2: For the top 10 orphans (by cosine score of top candidate), implement the rescue link. One sentence per source page. Use the AI prompt to write the sentence.

Week 3: For the next 10 orphans, do the same.

Week 4: Re-audit. Orphans should be down 80-90%. Check the GSC Page indexing report (formerly Coverage) in another 2-3 weeks to see indexation recovery.
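
One way to turn the audit output into the week 2 and week 3 work lists is to rank orphans by the score of their best candidate and slice in batches of 10. This is a sketch under stated assumptions: the function name and the `best_scores` dict (orphan URL to top candidate score) are hypothetical, and routing orphans below the 0.08 threshold to a separate list is an interpretation of the tier guidance, not part of the sprint as written.

```python
def sprint_batches(best_scores, batch_size=10, min_score=0.08):
    """Split orphans into weekly batches by their best candidate score.

    best_scores: dict of orphan URL -> cosine score of its top candidate.
    """
    ranked = sorted(
        (url for url, score in best_scores.items() if score >= min_score),
        key=lambda url: best_scores[url],
        reverse=True,
    )
    return {
        "week_2": ranked[:batch_size],
        "week_3": ranked[batch_size:2 * batch_size],
        "backlog": ranked[2 * batch_size:],
        # No usable topical match: candidates for a hub page or for removal.
        "no_match": sorted(u for u, s in best_scores.items() if s < min_score),
    }
```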

Most sites see 60-80% of rescued orphans move from "Discovered – currently not indexed" to "Indexed" within 30 days. The rest usually have a second issue (thin content, noindex tag, or canonical conflict) that the orphan status was masking.
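
Two of those masked issues can be checked directly from a page's HTML. The sketch below uses only Python's built-in `html.parser`; the class and function names are hypothetical. It assumes you already have the HTML as a string, and it does not cover thin content or a noindex sent via the `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IndexSignals(HTMLParser):
    """Collect the robots meta directive and the canonical link, if any."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in a.get("content", "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None


def masked_issues(url, html):
    """Return the indexation blockers found in the page's own markup."""
    signals = IndexSignals()
    signals.feed(html)
    issues = []
    if signals.noindex:
        issues.append("noindex")
    if signals.canonical:
        # Resolve relative canonicals, then ignore trailing-slash differences.
        target = urljoin(url, signals.canonical)
        if target.rstrip("/") != url.rstrip("/"):
            issues.append("canonical points elsewhere")
    return issues
```

Running this on each rescued orphan that is still unindexed after 30 days gives a quick first pass before a manual content review.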

The orphan-deserves-to-die case

Not every orphan is worth rescuing. Three signals that suggest unpublishing:

1. Content is genuinely obsolete. A 2019 product announcement, a discontinued service, a hiring page for a closed role. 301 redirect to the closest living page, or return 410 Gone.

2. Content is auto-generated filler. Tag archive pages with one post. Deep paginated archive pages (page 23 and beyond). Author-archive pages for one-post authors. Strip from the CMS, 301 to category parent.

3. Content is a duplicate of another page. Run both pages through the Vector Embedding Similarity tool to confirm. If the orphan is 65%+ similar to an indexed page, consolidate.

Rescuing every orphan is over-investment. Rescuing the ones that have genuine content depth is the high-ROI move.

The hub-page trick for orphans with no topical match

Sometimes an orphan is a legitimate, unique page that has no topically-similar source to link from. Examples: a standalone landing page, a unique case study, a one-off resource.

For those, the rescue path is:

  1. Add the orphan to the main navigation hub (category page, /services/, /resources/).
  2. Feature it on the homepage for 2-4 weeks until initial crawl picks it up.
  3. Drop the homepage feature once indexed; keep the nav link.

This is brute-force visibility — lower topical signal but enough link equity to force Google to consider the page.

Fact-check notes and sources

  • Internal-link signal as a ranking factor: Google Search Central — Links
  • "Discovered – currently not indexed" GSC status often correlates with weak internal-link equity per community observations and Google's own Search Console documentation
  • Token cosine similarity is a standard information-retrieval heuristic; see Salton & McGill (1983)

This post is informational, not SEO-consulting advice. Mentions of Link Whisper, Ahrefs, Semrush, and Screaming Frog are nominative fair use. No affiliation is implied.

Last updated: April 2026