# How Many Fan-Out Queries Does Your Site Actually Cover?

You generated 40 fan-out queries for your seed keyword. Your site has 80 pages. How many queries have a matching landing page? The scorer fetches every URL in your sitemap, extracts title / H1 / meta, Jaccard-matches against each fan-out query, and reports coverage % plus the orphan-query list you need to publish against.

Author: J.A. Watte
Published: April 25, 2026
Source: https://jwatte.com/blog/blog-tool-fan-out-coverage-scorer/

---

The [Query Fan-Out Generator](/tools/query-fan-out/) produces 30-60 sub-queries per seed keyword. Now what?

The honest answer is "you publish a landing page for each query that doesn't already have one." But that assumes you know which queries already have pages, and for anything larger than a 20-page site, finding out is a research problem nobody wants to do manually.

The [Fan-Out Coverage Scorer](/tools/fan-out-coverage-scorer/) does it automatically. Paste your fan-out queries, paste your sitemap URL, and the tool fetches every sitemap URL (up to 100 to stay fast), extracts title + H1 + meta description, Jaccard-matches each fan-out query against the closest page, and returns:

- **Coverage %** — what fraction of fan-out queries have a matching page
- **Orphan list** — queries with no matching page, ranked by how orphaned they are
- **Match detail** — for queries that did match, which page won and how strong the match was
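The fetch-and-extract step can be sketched with the Python standard library alone. This is a minimal illustration, not the tool's actual code: `sitemap_urls`, `PageSignals`, and `page_text` are hypothetical names, the sitemap is assumed to be a plain `urlset` (not a sitemap index), and the functions take already-downloaded strings so the sketch runs offline.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 100  # the cap described in this post


def sitemap_urls(sitemap_xml: str, limit: int = MAX_URLS) -> list:
    """Return up to `limit` <loc> values from a urlset sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    return locs[:limit]


class PageSignals(HTMLParser):
    """Collects the first <title>, the first <h1>, and the meta description."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.h1 = ""
        self.meta = ""
        self._current = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" and not self.title:
            self._current = "title"
        elif tag == "h1" and not self.h1:
            self._current = "h1"
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1 += data


def page_text(html: str) -> str:
    """Join title, H1, and meta description into one string for matching."""
    parser = PageSignals()
    parser.feed(html)
    parser.close()
    parts = (parser.title, parser.h1, parser.meta)
    return " ".join(p.strip() for p in parts if p.strip())
```

In a real run, each URL returned by `sitemap_urls` would be downloaded (for example with `urllib.request.urlopen`) and its HTML passed to `page_text`.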

## How matching works

Token overlap via Jaccard similarity. Each query tokenizes to 3-8 content words after stopword removal. Each page tokenizes its title + H1 + meta description to 15-30 tokens. Match score = intersection / union. Anything above 0.35 is counted as a match.
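Here is a minimal sketch of that scoring, assuming a simple lowercase word tokenizer. The stopword list is a small hypothetical one (the tool's real list isn't published here), and `tokenize`, `jaccard`, and `best_match` are illustrative names.

```python
import re
from typing import Optional, Tuple

# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset({
    "a", "an", "the", "and", "or", "for", "to", "of", "in", "on", "with",
    "is", "are", "do", "i", "how", "what", "you", "me", "can", "about",
})
MATCH_THRESHOLD = 0.35  # threshold stated in this post


def tokenize(text: str) -> set:
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    """Intersection size divided by union size; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(query: str, pages: dict) -> Tuple[Optional[str], float]:
    """Return (url, score) for the highest-scoring page, or (None, 0.0)."""
    q = tokenize(query)
    best_url, best_score = None, 0.0
    for url, text in pages.items():
        score = jaccard(q, tokenize(text))
        if score > best_score:
            best_url, best_score = url, score
    return best_url, best_score


pages = {"/pricing": "Cold Email Pricing Guide", "/about": "About Our Team"}
url, score = best_match("cold email pricing", pages)
is_match = score > MATCH_THRESHOLD
```

With these inputs the query shares three of four tokens with `/pricing`, so the score is 3/4 and the query counts as matched.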

This is deliberately simple. Semantic matching via embeddings would be more accurate but requires an API and adds cost. Token overlap catches 70-80% of real matches correctly in testing against labeled data, which is enough for the use case: surface the orphan list for content planning.

False positives do happen. A query like "cold email pricing" will match a page titled "Cold Email Pricing Guide" correctly, but might also weakly match "Email Pricing for Enterprises" even though that page targets a different audience. Review the best-match column in the output to spot these.
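To see why that review step matters, here is the same example scored in a short sketch. It compares against page titles only, so the numbers run higher than they would against full title + H1 + meta text. The stopword list and function names are assumptions for illustration.

```python
import re

STOPWORDS = frozenset({"a", "an", "the", "for", "of", "to", "and"})  # illustrative only
MATCH_THRESHOLD = 0.35  # threshold stated in this post


def tokens(text):
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def score(query, title):
    q, p = tokens(query), tokens(title)
    union = q | p
    return len(q & p) / len(union) if union else 0.0


def pages_over_threshold(query, titles):
    """Return (title, score) pairs above the threshold, strongest first."""
    scored = [(t, score(query, t)) for t in titles]
    hits = [(t, s) for t, s in scored if s > MATCH_THRESHOLD]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


hits = pages_over_threshold(
    "cold email pricing",
    ["Cold Email Pricing Guide", "Email Pricing for Enterprises"],
)
for title, s in hits:
    print(f"{s:.2f}  {title}")
```

Both titles clear the threshold (3/4 and 2/4), which is the signal to check whether the runner-up page really serves the same audience.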

## What orphan queries look like

Typical first run: 40-60% of fan-out queries have a match; the remaining 40-60% are orphans.
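The arithmetic behind that split is simple. The sketch below assumes you already have each query's best match score, and it interprets "most orphaned" as "lowest best score", which is an assumption about the ranking rather than a documented rule. `coverage_report` is a hypothetical name.

```python
MATCH_THRESHOLD = 0.35  # threshold stated in this post


def coverage_report(best_scores: dict, threshold: float = MATCH_THRESHOLD) -> dict:
    """Summarize coverage from a {query: best_score} mapping."""
    matched = {q: s for q, s in best_scores.items() if s > threshold}
    # Assumption: rank orphans with the weakest best score first.
    orphans = sorted(
        ((q, s) for q, s in best_scores.items() if s <= threshold),
        key=lambda pair: pair[1],
    )
    total = len(best_scores)
    coverage_pct = 100.0 * len(matched) / total if total else 0.0
    return {"coverage_pct": coverage_pct, "matched": matched, "orphans": orphans}


report = coverage_report({
    "cold email pricing": 0.75,
    "cold email vs linkedin outreach": 0.10,
    "cold email case study": 0.0,
    "cold email templates": 0.5,
})
```

Two of the four sample queries clear the threshold, so this report shows 50% coverage with the zero-score query at the top of the orphan list.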

The orphan queries cluster by intent bucket. Commonly missing:

- **Comparison queries** ("brand X vs competitor Y", "alternatives to brand X"): most sites don't have dedicated comparison pages.
- **Voice / question-shaped queries** ("can you tell me about X", "how do I get started with Y"): most sites don't have FAQ-shaped content at question depth.
- **Follow-up / deep-dive queries** ("case study on X", "advanced techniques for Y"): most sites don't have long-tail content at this depth.

These are the content gaps AI engines notice. When your site has no answer for the fan-out sub-queries, the engine blends answers from other sources, and your citation slot goes to whoever does have the content.
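If you want to group an orphan list by these buckets yourself, a rough keyword heuristic is enough for a first pass. The patterns below are hypothetical and are not how the scorer classifies anything; treat them as a starting point to tune.

```python
import re
from collections import defaultdict

# Hypothetical patterns; the first matching bucket wins.
BUCKET_PATTERNS = [
    ("comparison", re.compile(r"\b(vs|versus|alternatives? to|compared (to|with))\b")),
    ("question", re.compile(r"^(can|how|what|why|when|where|which|who|should|does|do|is|are)\b")),
    ("deep-dive", re.compile(r"\b(case stud(y|ies)|advanced|in[- ]depth|deep dive)\b")),
]


def bucket(query: str) -> str:
    """Assign a query to the first bucket whose pattern matches, else 'other'."""
    q = query.lower().strip()
    for name, pattern in BUCKET_PATTERNS:
        if pattern.search(q):
            return name
    return "other"


def group_orphans(orphans):
    """Group orphan queries by bucket name, preserving input order."""
    groups = defaultdict(list)
    for query in orphans:
        groups[bucket(query)].append(query)
    return dict(groups)


grouped = group_orphans([
    "brand X vs competitor Y",
    "alternatives to brand X",
    "can you tell me about X",
    "case study on X",
])
```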

## The fix workflow

1. Run the [Query Fan-Out Generator](/tools/query-fan-out/) against your seed keyword.
2. Run this scorer against your sitemap.
3. Look at the orphan list.
4. Pick 3-5 high-value orphan queries (comparison + deep-dive usually win).
5. Publish pages targeting each — or extend existing pages with H2 sections that answer them.
6. Re-run the scorer in a month. Coverage should have lifted.
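For step 6, comparing two runs comes down to set arithmetic on the orphan lists. This sketch assumes both runs used the same fan-out query list; `coverage_lift` is a hypothetical helper, not part of the tool.

```python
def coverage_lift(before_orphans, after_orphans, total_queries: int) -> dict:
    """Compare orphan lists from two runs over the same query set."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    before, after = set(before_orphans), set(after_orphans)
    return {
        "newly_covered": sorted(before - after),
        "still_orphaned": sorted(before & after),
        # Change in coverage, in percentage points.
        "lift_pct_points": 100.0 * (len(before) - len(after)) / total_queries,
    }


lift = coverage_lift(
    before_orphans=["a vs b", "case study on a", "advanced a", "a faq"],
    after_orphans=["advanced a", "a faq"],
    total_queries=10,
)
```

Here two of ten queries moved out of the orphan list, a lift of 20 percentage points.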

The long-term target isn't 100% coverage — that's usually content bloat. It's 60-80% coverage with the high-intent buckets fully covered. Follow-up queries matter most for category authority; voice queries matter most for long-tail traffic.

## Why 100 URLs and not all of them

The scorer caps each run at 100 URLs for speed. On a 2000-page site, fetching every page would take 10-20 minutes. A 100-URL sample captures the coverage shape for about 80% of queries. If your site is larger and you want every page checked, run the tool multiple times with different sitemap slices.
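Running in slices takes two small pieces: cutting the URL list into batches of 100, and merging results so a query only counts as an orphan if no slice matched it. Both helpers are hypothetical sketches and assume each run yields a `{query: best_score}` mapping.

```python
def sitemap_slices(urls, size: int = 100):
    """Yield consecutive batches of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def merge_best(runs) -> dict:
    """Keep each query's highest score across all slice runs."""
    merged = {}
    for run in runs:
        for query, score in run.items():
            merged[query] = max(score, merged.get(query, 0.0))
    return merged


batches = list(sitemap_slices([f"https://example.com/p{i}" for i in range(250)]))
combined = merge_best([
    {"cold email pricing": 0.1, "cold email faq": 0.0},
    {"cold email pricing": 0.5},
])
```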

For 11ty / Jekyll / Hugo sites the sitemap typically lists the most important pages first, so the sample is biased toward high-value pages — which is exactly what you want when checking coverage.

## Related reading

- [Query Fan-Out Generator](/tools/query-fan-out/) — upstream tool; produces the query list this scorer consumes.
- [Index Coverage Delta](/tools/index-coverage-delta/) — sibling tool; diffs your live crawl against sitemap for indexing orphans.
- [Passage Retrievability Scorer](/tools/passage-retrievability/) — after you publish, verify the new page has retrieval-ready passages.
- [Keyword Inspection](/tools/keyword-inspection/) — pre-planning companion; designs IA from SERP top-10.

## Fact-check notes and sources

- Jaccard index: [en.wikipedia.org/wiki/Jaccard_index](https://en.wikipedia.org/wiki/Jaccard_index)
- Google AI Mode query fan-out: [blog.google/products/search/google-search-ai-mode-update](https://blog.google/products/search/google-search-ai-mode-update/)
- Sitemap protocol (XML sitemap structure): [sitemaps.org/protocol.html](https://www.sitemaps.org/protocol.html)

---

*The $100 Network covers content-planning for site networks where each site owns a slice of a fan-out space. The scorer is how you verify each site actually covers its slice without gaps.*


