Source Diversity — The AI-Answer Metric Nobody Talks About

Perplexity typically cites 5-10 sources per answer. Copilot cites 3-7. Gemini cites 2-5 in AI Overview mode. ChatGPT with browsing cites however many it feels like.

The important question isn't "how many sources?" It's "which sources?" If Perplexity cites the same three Reddit threads and one Wikipedia article for every query in your category, the source bucket is saturated and your odds of breaking in are low. If it cites a scattered mix of blogs, news sites, and forum posts, the bucket is open and you can win a slot.

The Citation URL Extractor parses the source list from any pasted AI response, classifies each domain, and scores diversity. Low diversity = concentrated citation market. High diversity = competitive citation market.

The diversity index

The score is the Shannon diversity index from ecology, normalized to 0-100%:

  • 0-30%. One or two sources dominate. The AI engine trusts them as the canonical voice for this category. To break in you need to match or exceed their authority: E-E-A-T signals, Wikipedia presence, citation density on high-authority neighbors.
  • 30-60%. Moderate spread, 4-6 sources in rotation. Most competitive categories land here. You can win a slot with consistent publishing + on-page AEO signals.
  • 60-100%. High diversity, many sources cited across queries. The engine is uncertain which source to anchor on. Easy to break in; easy to be pushed out.
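As a rough illustration of how such a score could be computed, here is a minimal Python sketch. The function names are hypothetical, and the normalizer is an assumption: the entropy is divided by ln(total citations), the value reached when every citation points at a different domain. The extractor itself may normalize differently and may treat the band boundaries differently.

```python
import math
from collections import Counter


def shannon_diversity_pct(domains):
    """Return a normalized Shannon index (0-100) for a list of cited domains.

    ASSUMPTION: normalized by ln(total citations), the entropy reached when
    every citation points at a different domain. The real tool may differ.
    """
    total = len(domains)
    counts = Counter(domains)
    if total < 2 or len(counts) == 1:
        return 0.0  # nothing to spread, or one domain took every slot
    # H = -sum(p_i * ln p_i), where p_i is each domain's share of citations
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 100.0 * entropy / math.log(total)


def diversity_band(pct):
    """Map a score to the three bands above (boundary handling is assumed)."""
    if pct < 30:
        return "concentrated"
    if pct < 60:
        return "moderate"
    return "open"
```

Under this normalization, ten citations split evenly between two domains score about 30%, and ten citations to ten different domains score 100%.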

Pair the index with own-domain citation share: the percentage of citations (in the one response you pasted, or across many) that come from your own domain. Zero is the baseline. Above 10% across a broad sample means you've won the category.
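The share calculation is simple enough to sketch. The function name below is hypothetical, and the subdomain rule (anything ending in "." plus your domain counts as yours) is an assumption about how "own" is matched.

```python
def own_domain_share(cited_domains, own_domain):
    """Percentage of cited domains that are own_domain or one of its subdomains."""
    if not cited_domains:
        return 0.0
    own = own_domain.lower()
    hits = sum(
        1
        for d in cited_domains
        # exact match, or a subdomain such as blog.example.com
        if d.lower() == own or d.lower().endswith("." + own)
    )
    return 100.0 * hits / len(cited_domains)
```

Matching on the "." boundary keeps a lookalike such as notexample.com from being counted as example.com.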

The seven source buckets

The extractor classifies domains into seven categories:

  • Own. Your domain (plus subdomains). Provided as a form input.
  • Competitor. Explicitly-listed competitor domains.
  • Wikipedia / Wikidata. The encyclopedic anchor. Frequently cited because it's been audited.
  • Community / forum. Reddit, Stack Overflow, Hacker News, Quora, Medium, Substack.
  • Video. YouTube, Vimeo, TikTok, TED.
  • News / media. NYT, WaPo, Bloomberg, Reuters, WSJ, BBC, Forbes, TechCrunch, etc.
  • Government / research. .gov, .edu, .ac.uk, NIH, CDC, NIST, arxiv.org, nature.com, science.org.

Everything else lands in "Other," which is usually the long tail of niche blogs, vendor docs, and smaller sites. If a category you care about (say, fintech-specific publications) isn't well represented, extend the classifier lexicon in the tool source.
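To make the bucket logic concrete, here is a minimal sketch of a lexicon-based classifier. Everything in it is illustrative: the names LEXICON and classify are hypothetical, the domain lists are a small assumed subset built from the examples above, and the tool's real lexicon and matching order may differ.

```python
# ASSUMPTION: a tiny illustrative lexicon, not the tool's actual list.
LEXICON = {
    "wikipedia": ["wikipedia.org", "wikidata.org"],
    "community": ["reddit.com", "stackoverflow.com", "news.ycombinator.com",
                  "quora.com", "medium.com", "substack.com"],
    "video": ["youtube.com", "vimeo.com", "tiktok.com", "ted.com"],
    "news": ["nytimes.com", "washingtonpost.com", "bloomberg.com", "reuters.com",
             "wsj.com", "bbc.com", "forbes.com", "techcrunch.com"],
    "government_research": ["arxiv.org", "nature.com", "science.org"],
}
RESEARCH_SUFFIXES = (".gov", ".edu", ".ac.uk")


def _matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify(host, own_domain, competitors=()):
    """Return the bucket name for a cited host; unknown hosts fall into 'other'."""
    host = host.lower()
    if _matches(host, own_domain.lower()):
        return "own"
    if any(_matches(host, c.lower()) for c in competitors):
        return "competitor"
    for bucket, domains in LEXICON.items():
        if any(_matches(host, d) for d in domains):
            return bucket
    if host.endswith(RESEARCH_SUFFIXES):
        return "government_research"
    return "other"
```

In a structure like this, covering a new category such as fintech publications would mean adding one more key and domain list to the lexicon.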

Why Perplexity responses are ideal input

Perplexity shows its source list. Copy the whole answer, including the source footer, and paste it into the extractor. The extractor finds every URL, including those in the inline-citation form [1] that Perplexity uses.

Copilot, Gemini, and ChatGPT-with-browsing also cite sources, but with less structure. The extractor still pulls any URL that appears anywhere in the pasted text, so it works on all four. You'll miss a few sources that were mentioned in prose but never rendered as links; there's not much you can do about that without semantic parsing.
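The "pull any URL from pasted text" step can be approximated with a regular expression. This is a minimal sketch rather than the extractor's actual parser: the pattern, the function names, and the choice to deduplicate repeated URLs are all assumptions.

```python
import re
from urllib.parse import urlsplit

# ASSUMPTION: a deliberately simple pattern. It stops at whitespace, quotes,
# angle brackets, "]" and ")", so URLs that contain parentheses get truncated.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'\])]+")


def extract_urls(text):
    """Return unique URLs in order of first appearance."""
    # Trailing sentence punctuation is almost never part of the URL itself.
    found = (m.rstrip(".,;:!?") for m in URL_PATTERN.findall(text))
    return list(dict.fromkeys(found))


def domain_of(url):
    """Lowercased hostname with a leading 'www.' removed."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host
```

For example, `[domain_of(u) for u in extract_urls(pasted_answer)]` gives a list of cited domains ready for classification and scoring, where `pasted_answer` is whatever text you copied from the AI response.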

What to do with low own-domain citation share

Typical first run: own-domain citations are zero or one. The fix is upstream — AEO signals, passage retrievability per article, entity consistency across pages, llms.txt structure.

The extractor is a measurement tool, not a fix. Use it to set a baseline, then rerun it monthly to see whether your upstream work moved the number.

Related reading

The $100 Network covers building citation-worthy content across site networks so that own-domain citation share grows as a function of network size. The extractor is how you verify each network is contributing to the citation pool.


Last updated: April 2026