IBM published a GEO playbook — I turned it into a page audit

IBM's framing is clearest on the difference: "SEO focuses on keywords, links, website traffic, page rankings. GEO focuses on prompts, citations, ecosystems, answer eligibility."

SEO optimized a page for a crawler that returned ten blue links. GEO optimizes a page for a retriever that extracts chunks, embeds them as vectors, and composes answers — often without any click-through to the source at all.

The GEO Content Extractability Scorer rates your page against ten signals drawn from IBM's playbook.

The ten signals

1. Question-shaped headings (15 points)

AI retrievers prefer content structured as "question → answer." The tool counts H2 and H3 headings that either end in a question mark or start with how/what/when/where/why/who/which/can/does/is/are/should/do. Three or more = full credit.
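
Here is a minimal sketch of that heading check, assuming the H2/H3 texts have already been pulled out of the page as plain strings. The function names and the whole-word match on the opening word are my own assumptions, not the tool's actual code.

```python
import re

# Opening words listed above; matched as whole words, case-insensitively.
QUESTION_OPENERS = (
    "how", "what", "when", "where", "why", "who", "which",
    "can", "does", "is", "are", "should", "do",
)
_OPENER_RE = re.compile(r"^(?:%s)\b" % "|".join(QUESTION_OPENERS), re.IGNORECASE)


def is_question_heading(text: str) -> bool:
    """True if the heading ends in '?' or starts with a question word."""
    stripped = text.strip()
    return stripped.endswith("?") or bool(_OPENER_RE.match(stripped))


def count_question_headings(headings: list[str]) -> int:
    """Count heading texts that look question-shaped."""
    return sum(is_question_heading(h) for h in headings)


# Hypothetical heading texts for illustration.
headings = [
    "How does chunking work?",
    "What the tool doesn't measure",
    "Pricing",
    "Is FAQPage schema required",
]
count = count_question_headings(headings)
print(count, "full credit" if count >= 3 else "below the full-credit threshold")
```

Note that a prefix rule like this also counts headings such as "What the tool doesn't measure", which start with a question word without actually asking anything.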

2. Short-direct-answer paragraphs (15 points)

The ideal extract unit is a 30-80 word answer paragraph directly under a heading. Not a 300-word exposition — a crisp, extractable answer. The tool counts paragraphs in that word range immediately after H2/H3.
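
A rough sketch of the "paragraph directly under a heading" check might look like this. It assumes the page has already been flattened into a document-order list of `(tag, text)` pairs; that input shape and the name `direct_answer_headings` are hypothetical.

```python
def direct_answer_headings(blocks, low=30, high=80):
    """Return headings whose very next block is a paragraph of low..high words.

    blocks: document-order list of (tag, text) tuples, e.g. ("h2", "What is GEO?").
    """
    hits = []
    for (tag, text), (next_tag, next_text) in zip(blocks, blocks[1:]):
        if tag in ("h2", "h3") and next_tag == "p":
            if low <= len(next_text.split()) <= high:
                hits.append(text)
    return hits


# Hypothetical page outline with filler paragraphs of known length.
blocks = [
    ("h2", "What is GEO?"),
    ("p", " ".join(["word"] * 45)),    # 45 words: in range
    ("h2", "Why does it matter?"),
    ("p", " ".join(["word"] * 300)),   # 300 words: too long
    ("h3", "Details"),
    ("ul", "a list, not a paragraph"),  # not a paragraph at all
]
print(direct_answer_headings(blocks))
```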

3. Conversational tone (10 points)

IBM: "Use direct, conversational language matching how users phrase questions in AI tools." The tool measures the ratio of conversational markers (you, your, let's, imagine) to formal markers (we, our, the company). Conversational-heavy content extracts better.
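
The post does not spell out the exact formula, so the sketch below makes one up for illustration: the share of conversational markers among all markers found. The marker lists come from the paragraph above; the function names and the share formula are assumptions.

```python
import re

CONVERSATIONAL = ("you", "your", "let's", "imagine")
FORMAL = ("we", "our", "the company")


def count_markers(text: str, markers) -> int:
    """Count whole-word, case-insensitive occurrences of each marker."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", lowered))
        for marker in markers
    )


def conversational_share(text: str):
    """Conversational markers as a fraction of all markers; None if none found."""
    conv = count_markers(text, CONVERSATIONAL)
    formal = count_markers(text, FORMAL)
    if conv + formal == 0:
        return None
    return conv / (conv + formal)


sample = "You can paste your URL and let's see what we find. Our tool does the rest."
print(conversational_share(sample))
```

The word-boundary match keeps "our" from firing inside "your", which a plain substring count would get wrong.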

4. Chunk-ideal paragraph length (10 points)

RAG systems split content into fixed-size chunks, often on the order of a hundred to a few hundred tokens depending on the pipeline. Paragraphs of 30-120 words usually fit inside a single chunk and stay self-contained when retrieved. Longer paragraphs get split mid-thought; shorter ones don't carry enough context.
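
To see how a page's paragraphs are distributed against that 30-120 word window, a small tally like the following is enough. The bucket labels and the function name are illustrative assumptions.

```python
from collections import Counter


def length_bucket(paragraph: str, low: int = 30, high: int = 120) -> str:
    """Classify a paragraph by whitespace-separated word count."""
    n = len(paragraph.split())
    if n < low:
        return "too short"
    if n > high:
        return "too long"
    return "chunk-ideal"


# Filler paragraphs of known length, for illustration only.
paragraphs = [
    " ".join(["word"] * 12),
    " ".join(["word"] * 64),
    " ".join(["word"] * 118),
    " ".join(["word"] * 240),
]
buckets = Counter(length_bucket(p) for p in paragraphs)
print(dict(buckets))
```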

5. Schema coverage (15 points)

FAQPage JSON-LD is the highest-value markup for question-shaped content. Article / BlogPosting schema provides metadata. The tool scores both.
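
For reference, FAQPage markup follows the schema.org shape of `Question` items with an `acceptedAnswer`. The sketch below builds it from question/answer pairs; the helper name and the sample pairs are placeholders, not content from IBM's playbook.

```python
import json


def faq_page_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Placeholder Q&A pairs for illustration.
markup = faq_page_jsonld([
    ("What is GEO?", "GEO optimizes a page for retrievers that compose answers."),
    ("How is it scored?", "Ten signals add up to 100 points."),
])

# Escape "</" so answer text can never close the script element early.
body = json.dumps(markup, indent=2).replace("</", "<\\/")
print('<script type="application/ld+json">')
print(body)
print("</script>")
```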

6. DOM simplicity (10 points)

Complex nested DOM is harder for retrievers to parse cleanly. The tool measures maximum DOM depth and inline-style count. Shallower + fewer inline styles = better extraction.
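
Both measurements can be approximated with Python's standard-library `html.parser`. This is a rough sketch that assumes reasonably well-formed markup (an unclosed `<p>` or `<li>` would inflate the depth); the class name `DomStats` is hypothetical, and the tool's own parser may count differently.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class DomStats(HTMLParser):
    """Track maximum nesting depth and the number of inline style attributes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.inline_styles = 0

    def handle_starttag(self, tag, attrs):
        if any(name == "style" for name, _ in attrs):
            self.inline_styles += 1
        if tag in VOID_ELEMENTS:
            # A void element is a leaf one level below the current element.
            self.max_depth = max(self.max_depth, self.depth + 1)
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.depth > 0:
            self.depth -= 1


html = (
    '<html><body><div style="margin:0"><section>'
    '<p>Hi<br><span style="color:red">there</span></p>'
    "</section></div></body></html>"
)
stats = DomStats()
stats.feed(html)
print(stats.max_depth, stats.inline_styles)
```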

7. Cite markup (5 points)

<cite> tags and blockquote[cite] attributes. IBM calls this "citation qualification" — content that cites sources is itself more likely to be cited by AI retrievers.

8. Author apparatus (10 points)

Per-article author schema, rel="author" link, visible byline. Full credit requires all three.

9. Freshness signals (5 points)

datePublished + dateModified in schema or meta. AI retrievers down-rank stale content heavily.

10. Heading-level consistency (5 points)

Exactly one H1. Multiple H1s dilute the topic signal; zero H1s break the structure.

Total: 100 points. Scored into HIGH (70+), PARTIAL (40-69), LOW EXTRACTABILITY (<40).
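
The weights and bands above can be written down as a small table plus a lookup. The dictionary keys and the per-signal capping are my own naming and assumptions; only the point values and band thresholds come from this post.

```python
# Maximum points per signal, as listed above (key names are hypothetical).
MAX_POINTS = {
    "question_headings": 15,
    "direct_answers": 15,
    "conversational_tone": 10,
    "chunk_length": 10,
    "schema_coverage": 15,
    "dom_simplicity": 10,
    "cite_markup": 5,
    "author_apparatus": 10,
    "freshness": 5,
    "heading_consistency": 5,
}
assert sum(MAX_POINTS.values()) == 100


def total_score(earned: dict) -> int:
    """Sum earned points, capping each signal at its maximum."""
    return sum(min(earned.get(name, 0), cap) for name, cap in MAX_POINTS.items())


def band(total: int) -> str:
    """Map a 0-100 total onto the three bands."""
    if total >= 70:
        return "HIGH"
    if total >= 40:
        return "PARTIAL"
    return "LOW EXTRACTABILITY"


# Made-up per-signal results for one page.
earned = {
    "question_headings": 15, "direct_answers": 8, "conversational_tone": 10,
    "chunk_length": 10, "schema_coverage": 0, "dom_simplicity": 10,
    "cite_markup": 0, "author_apparatus": 5, "freshness": 5,
    "heading_consistency": 5,
}
score = total_score(earned)
print(score, band(score))
```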

Plus: /llms.txt origin check

The tool also fetches your /llms.txt to confirm you've published a content index AI retrievers can use for canonical navigation. Not counted in the 100 but flagged.
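
A bare-bones version of that origin check, using only the standard library, could look like the sketch below. The function names, the User-Agent string, and the "HTTP 200 means present" rule are assumptions; the real tool may validate the file's contents as well.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def llms_txt_url(page_url: str) -> str:
    """Resolve /llms.txt against the origin of the given page URL."""
    return urljoin(page_url, "/llms.txt")


def has_llms_txt(page_url: str, timeout: float = 5.0) -> bool:
    """Return True if the origin answers /llms.txt with HTTP 200."""
    request = Request(
        llms_txt_url(page_url),
        headers={"User-Agent": "llms-txt-check/0.1"},  # hypothetical UA string
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors, DNS failures, refused connections, and timeouts.
        return False


print(llms_txt_url("https://example.com/blog/some-post/"))
# To run the live check: has_llms_txt("https://example.com/blog/some-post/")
```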

IBM's "85% of visibility comes from outside your site"

The tool doesn't score this because it measures a single page's structure, but the playbook's core takeaway is that GEO visibility is 85% an ecosystem-consistency problem and 15% an on-page-structure problem. The on-page 15% is what this tool covers. For the ecosystem 85%:

  • Reddit / Quora / Stack Overflow — active answers with your brand name
  • Wikipedia / Wikidata — entity presence
  • G2 / Capterra / Trustpilot / BBB — review platform coverage
  • Media PR — third-party coverage with consistent claims

Run Citation Surface Probe for the ecosystem side. Run this tool for the on-page side. Together they cover both sides of the GEO problem.

What the scorecard tells you to do

The fix prompt the tool emits is structured as the IBM playbook translated into executable steps:

  1. Reformat content as Q&A pairs
  2. Use conversational language
  3. Keep paragraphs 30-120 words
  4. Publish FAQPage + Article JSON-LD
  5. Simplify DOM, remove inline-style sprawl
  6. Use <cite> for every factual claim
  7. Publish /llms.txt
  8. Cross-check brand consistency across external channels
  9. Track monthly via LLM Answer Citation Tracker

Paste the fix prompt into Claude or ChatGPT; ask it to rewrite your page to match. The model treats this like a structured brief.

How to use it

  1. Go to /tools/geo-content-extractability/
  2. Paste the URL of a content page you want AI retrievers to cite
  3. Click Run
  4. Read the score + detailed findings
  5. Copy the fix prompt; paste into your LLM of choice for a rewrite

Typical runtime: ~3-5 seconds per page.

What the tool doesn't measure

  • Factual accuracy. A score of 100 doesn't mean your claims are correct. AI retrievers increasingly down-rank low-factuality content; accuracy is still yours to verify.
  • Topical authority. A well-extracted paragraph still needs the source to be seen as authoritative. Entity signals (Wikipedia, sameAs) matter.
  • Competitive cohort. You might score 90 and still lose to a competitor scoring 95. For cohort comparison, use SERP Cohort Audit.
  • User-intent match. High extractability + wrong topic = no citations. The tool audits structure, not relevance.

Fact-check notes and sources

This post is informational, not SEO-consulting or GEO-consulting advice. Mentions of IBM, Google, Perplexity, OpenAI, ChatGPT, Anthropic, Claude, Microsoft, Bing, Reddit, Quora, Stack Overflow, Wikipedia, and similar products / institutions are nominative fair use. No affiliation is implied.
