The Query Fan-Out Generator produces 30-60 sub-queries per seed keyword. Now what?
The honest answer is "you publish a landing page for each query that doesn't already have one." But that assumes you know which queries already have pages, and for anything larger than a 20-page site that is a research problem nobody wants to do manually.
The Fan-Out Coverage Scorer does it automatically. Paste your fan-out queries, paste your sitemap URL, and the tool fetches every page listed in the sitemap (capped at 100 to stay fast), extracts each page's title, H1, and meta description, scores each fan-out query against every page with Jaccard similarity, keeps the closest page for each query, and returns:
- Coverage % — what fraction of fan-out queries have a matching page
- Orphan list — queries with no matching page, ranked by how orphaned they are
- Match detail — for queries that did match, which page won and how strong the match was
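The fetch-and-extract step can be sketched with the Python standard library alone. This is a minimal illustration, not the tool's actual code: the names `sitemap_urls`, `PageSignals`, and `page_text` are hypothetical, the sitemap is assumed to be a plain `urlset` (not a sitemap index), and the HTTP fetch itself is left out so the sketch works on strings.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 100  # the per-run cap described in the article


def sitemap_urls(sitemap_xml: str, limit: int = MAX_URLS) -> list:
    """Return up to `limit` <loc> values from a urlset sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return locs[:limit]


class PageSignals(HTMLParser):
    """Collects the title, the first H1, and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = ""
        self.description = ""
        self._current = None   # tag whose text is currently being collected
        self._h1_done = False  # only the first H1 is kept

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" or (tag == "h1" and not self._h1_done):
            self._current = tag
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == self._current:
            if tag == "h1":
                self._h1_done = True
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1 += data


def page_text(html: str) -> str:
    """Concatenate title + H1 + meta description into one string for matching."""
    parser = PageSignals()
    parser.feed(html)
    parser.close()
    parts = [parser.title, parser.h1, parser.description]
    return " ".join(p.strip() for p in parts if p.strip())
```

In a real run each URL from `sitemap_urls` would be downloaded first (for example with `urllib.request.urlopen`) and the response body passed to `page_text`.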
How matching works
Token overlap via Jaccard similarity. Each query tokenizes to 3-8 content words after stopword removal. Each page tokenizes its title + H1 + meta description to 15-30 tokens. Match score = intersection / union. Anything above 0.35 is counted as a match.
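A minimal sketch of that scoring, assuming a small illustrative stopword list (the tool's real list is not given here) and hypothetical helper names `tokenize`, `jaccard`, and `best_match`:

```python
import re
from typing import Optional

# Illustrative stopword list only; an assumption, not the tool's actual list.
STOPWORDS = frozenset({
    "a", "an", "the", "and", "or", "of", "for", "to", "in", "on", "with",
    "is", "are", "do", "does", "how", "what", "i", "you", "me", "my",
    "your", "can", "about",
})
MATCH_THRESHOLD = 0.35  # scores above this count as a match


def tokenize(text: str) -> set:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    """Intersection over union; defined as 0.0 when either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(query: str, pages: dict) -> tuple:
    """Return (url, score) for the page whose text overlaps the query most.

    `pages` maps URL -> concatenated title + H1 + meta description.
    Returns (None, 0.0) when no page shares a token with the query.
    """
    q_tokens = tokenize(query)
    best_url: Optional[str] = None
    best_score = 0.0
    for url, text in pages.items():
        score = jaccard(q_tokens, tokenize(text))
        if score > best_score:
            best_url, best_score = url, score
    return best_url, best_score
```

For example, "cold email pricing" against a page whose text is just "Cold Email Pricing Guide" shares 3 of 4 distinct tokens, so the score is 0.75. One property worth keeping in mind: Jaccard can never exceed the ratio of the smaller set to the larger one, so in this sketch a 3-token query scored against a 15-token page tops out at 3/15 = 0.2. How the real tool handles that interaction with the 0.35 threshold is not stated here.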
This is deliberately simple. Semantic matching via embeddings would be more accurate but requires an API and adds cost. Token overlap catches 70-80% of real matches correctly in testing against labeled data, which is enough for the use case: surface the orphan list for content planning.
False positives do happen. A query like "cold email pricing" will match a page titled "Cold Email Pricing Guide" correctly, but might also weakly match "Email Pricing for Enterprises" even though that page targets a different audience. Review the best-match column in the output to spot these.
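The three outputs (coverage %, orphan list, match detail) can all be derived from the per-query best match. The sketch below assumes that "how orphaned" a query is means how low its best score is, which is an interpretation rather than something the tool documents; `BestMatch` and `coverage_report` are hypothetical names.

```python
from typing import NamedTuple, Optional

MATCH_THRESHOLD = 0.35  # scores above this count as a match


class BestMatch(NamedTuple):
    url: Optional[str]  # closest page, or None if nothing overlapped
    score: float        # Jaccard score against that page


def coverage_report(best: dict, threshold: float = MATCH_THRESHOLD) -> dict:
    """Summarise per-query best matches into coverage %, orphans, and match detail.

    `best` maps each fan-out query to its BestMatch.
    Orphans are sorted lowest score first (assumed meaning of "most orphaned").
    """
    matches = {q: m for q, m in best.items() if m.score > threshold}
    orphans = sorted(
        (q for q in best if q not in matches),
        key=lambda q: (best[q].score, q),
    )
    coverage = 100.0 * len(matches) / len(best) if best else 0.0
    return {
        "coverage_pct": round(coverage, 1),
        "orphans": orphans,
        "matches": matches,  # the best-match column to review for false positives
    }
```

Keeping the winning URL and score next to each matched query is what makes the false-positive review practical: a match that barely clears the threshold is the first place to look.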
What orphan queries look like
Typical first run: 40-60% of fan-out queries have a match; the remaining 40-60% are orphans.
The orphan queries cluster by intent bucket. Commonly missing:
- Comparison queries ("brand X vs competitor Y", "alternatives to brand X"): most sites don't have dedicated comparison pages.
- Voice / question-shaped queries ("can you tell me about X", "how do I get started with Y"): most sites don't have FAQ-shaped content at question depth.
- Follow-up / deep-dive queries ("case study on X", "advanced techniques for Y"): most sites don't have long-tail content at that depth.
These are the content gaps AI engines notice. When your site has no answer for the fan-out sub-queries, the engine blends answers from other sources, and your citation slot goes to whoever does have the content.
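To see which intent bucket dominates an orphan list, the queries can be sorted with a few keyword patterns. The patterns below are purely illustrative assumptions based on the example queries above; the scorer itself is not described as doing this, and `bucket` and `group_by_bucket` are hypothetical names.

```python
import re
from collections import defaultdict

# Hypothetical patterns, checked in order; the first hit wins.
BUCKET_PATTERNS = [
    ("comparison", re.compile(r"\bvs\.?\b|\bversus\b|\balternatives?\b|\bcompared?\b")),
    ("question", re.compile(
        r"^(how|what|why|when|where|who|which|can|do|does|is|are|should)\b")),
    ("deep-dive", re.compile(
        r"\bcase stud(y|ies)\b|\badvanced\b|\bin[- ]depth\b|\bbenchmarks?\b")),
]


def bucket(query: str) -> str:
    """Assign a query to the first matching intent bucket, else 'other'."""
    q = query.strip().lower()
    for name, pattern in BUCKET_PATTERNS:
        if pattern.search(q):
            return name
    return "other"


def group_by_bucket(queries: list) -> dict:
    """Group orphan queries by bucket, preserving their input order."""
    groups = defaultdict(list)
    for query in queries:
        groups[bucket(query)].append(query)
    return dict(groups)
```

Anything that lands in "other" is worth reading by hand; keyword rules like these are only a first pass.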
The fix workflow
- Run the Query Fan-Out Generator against your seed keyword.
- Run this scorer against your sitemap.
- Look at the orphan list.
- Pick 3-5 high-value orphan queries (comparison + deep-dive usually win).
- Publish pages targeting each — or extend existing pages with H2 sections that answer them.
- Re-run the scorer in a month. Coverage should have lifted.
The long-term target isn't 100% coverage; chasing that usually produces content bloat. Aim for 60-80% coverage with the high-intent buckets fully covered. Follow-up queries matter most for category authority; voice queries matter most for long-tail traffic.
Why 100 URLs and not all of them
The scorer samples up to 100 URLs per run for speed. On a 2000-page site, fetching every page would take 10-20 minutes. A 100-URL sample captures the coverage shape for roughly 80% of queries. If your site has more than 100 pages and you want full coverage, run the tool multiple times with different sitemap slices.
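Slicing a large sitemap and combining the per-slice results could look like the sketch below. It assumes you keep each run's best score per query yourself; `sitemap_slices` and `merge_best_scores` are hypothetical helpers, not features of the tool.

```python
def sitemap_slices(urls: list, size: int = 100) -> list:
    """Split a URL list into consecutive slices of at most `size` URLs."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def merge_best_scores(runs: list) -> dict:
    """Combine per-slice results by keeping each query's highest score.

    Each run maps query -> best Jaccard score found within that slice.
    A query that looks orphaned in one slice may be covered in another,
    so only the merged scores should be compared against the threshold.
    """
    merged = {}
    for run in runs:
        for query, score in run.items():
            merged[query] = max(score, merged.get(query, 0.0))
    return merged
```

By the figures above, 2000 pages in 10-20 minutes works out to roughly 0.3-0.6 seconds per page, so a 100-URL slice should finish in about 30-60 seconds and a 2000-page site needs 20 slices.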
For 11ty / Jekyll / Hugo sites the sitemap typically lists the most important pages first, so the sample is biased toward high-value pages — which is exactly what you want when checking coverage.
Related reading
- Query Fan-Out Generator — upstream tool; produces the query list this scorer consumes.
- Index Coverage Delta — sibling tool; diffs your live crawl against sitemap for indexing orphans.
- Passage Retrievability Scorer — after you publish, verify the new page has retrieval-ready passages.
- Keyword Inspection — pre-planning companion; designs IA from SERP top-10.
Fact-check notes and sources
- Jaccard index: en.wikipedia.org/wiki/Jaccard_index
- Google AI Mode query fan-out: blog.google/products/search/google-search-ai-mode-update
- Sitemap protocol (xml sitemap structure): sitemaps.org/protocol.html
The $100 Network covers content-planning for site networks where each site owns a slice of a fan-out space. The scorer is how you verify each site actually covers its slice without gaps.