Two sites rank for "how to repair a leaking roof." The top result is 2,400 words, 12 H2 sections, 8 images, an embedded YouTube video, a FAQ schema block, and 14 internal links. The sixth result is 3,100 words with only 3 H2s, no images, no schema, and two outbound links.
A TF-IDF audit will tell you both pages use the same vocabulary. The shape difference is the whole story.
Why shape matters more than word-level match
Google's ranking signals in 2026 are heavily structural:
- Heading depth correlates with coverage breadth. A page with 12 H2s answers 12 distinct sub-questions a reader might have. Three H2s answer three.
- Schema types tell Google what kind of content this is. HowTo schema on a repair guide describes the page more precisely than a plain Article schema does.
- Media ratio signals authenticity. Text-only pages pattern-match thin content; pages with real photos and embedded video pattern-match first-hand experience.
- Internal link density signals authority depth. A page that references ten other pages on the same site belongs to a topical cluster; a page with zero internal links is an island.
- Word-count band is a range, not a target. Top-ranking guides for most queries cluster in a surprisingly narrow band (±30% of the median). Being way under or way over signals a mismatch.
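A minimal sketch of that band, assuming it is computed as the median of the competitors' word counts plus or minus 30% (the figure in the last bullet). The function names and the sample counts are hypothetical, not data from a real SERP.

```python
from statistics import median


def word_count_band(counts, tolerance=0.30):
    """Return (low, high): the median word count +/- `tolerance` (assumed 30%)."""
    if not counts:
        raise ValueError("need at least one competitor word count")
    mid = median(counts)
    return round(mid * (1 - tolerance)), round(mid * (1 + tolerance))


def classify(word_count, band):
    """Label a draft as 'under', 'within', or 'over' the band."""
    low, high = band
    if word_count < low:
        return "under"
    if word_count > high:
        return "over"
    return "within"


# Hypothetical word counts for seven ranking pages.
band = word_count_band([2400, 2100, 1950, 2600, 3100, 1800, 2250])
print(band, classify(900, band))  # (1575, 2925) under
```

The median rather than the mean anchors the band so that one unusually long pillar page does not drag the target upward.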
Word-level TF-IDF audits (what Surfer, Clearscope, and Frase sell) are built around vocabulary, and vocabulary misses most of this. They tell you "use these 47 terms" and little about the container shape.
What the Canonical Winning Shape extracts
You paste up to 10 ranking URLs. The tool fetches each, profiles them, and aggregates:
- Word count: mean, median, min-max range
- Heading distribution: H1, H2, H3, H4 counts
- Media: images, videos/embeds
- Structure: paragraphs, lists, tables
- Link density: internal vs external link counts
- Shared schema @types (types present in 50%+ of competitors)
- Shared H2 topic tokens (words appearing in H2s across 50%+ of competitors)
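This is not the tool's code; it is a sketch of what per-page profiling could look like using Python's standard-library `html.parser`. Everything in it is an assumption: the class and function names are hypothetical, "videos/embeds" is approximated as `<video>` plus `<iframe>` tags, a link counts as internal when it resolves to the page's own host, and schema types are read only from JSON-LD blocks (not microdata or RDFa). Fetching the URLs is left out.

```python
import json
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags counted directly. Videos/embeds = <video> + <iframe> (an assumption).
COUNTED = {"h1", "h2", "h3", "h4", "img", "video", "iframe", "p", "ul", "ol", "table"}


class ShapeProfiler(HTMLParser):
    """Collects structural counts from one page's HTML (a rough sketch)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.tags = Counter()
        self.words = 0
        self.internal_links = 0
        self.external_links = 0
        self.h2_texts = []
        self.schema_types = set()
        self._skip = 0        # > 0 while inside <script> or <style>
        self._jsonld = None   # text buffer while inside a JSON-LD <script>
        self._h2 = None       # text buffer while inside an <h2>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in COUNTED:
            self.tags[tag] += 1
        if tag in ("script", "style"):
            self._skip += 1
            if tag == "script" and attrs.get("type") == "application/ld+json":
                self._jsonld = []
        elif tag == "h2":
            self._h2 = []
        elif tag == "a" and attrs.get("href") and not attrs["href"].startswith("#"):
            # Resolve relative hrefs; same host as the page counts as internal.
            target = urlparse(urljoin(self.page_url, attrs["href"]))
            if target.scheme in ("http", "https"):
                if target.netloc == self.host:
                    self.internal_links += 1
                else:
                    self.external_links += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
            if self._jsonld is not None:
                self._collect_types("".join(self._jsonld))
                self._jsonld = None
        elif tag == "h2" and self._h2 is not None:
            self.h2_texts.append(" ".join("".join(self._h2).split()))
            self._h2 = None

    def handle_data(self, data):
        if self._jsonld is not None:
            self._jsonld.append(data)
        elif not self._skip:
            self.words += len(data.split())
            if self._h2 is not None:
                self._h2.append(data)

    def _collect_types(self, raw):
        # Top-level @type values, plus nodes inside an @graph array.
        try:
            stack = [json.loads(raw)]
        except json.JSONDecodeError:
            return
        while stack:
            node = stack.pop()
            if isinstance(node, list):
                stack.extend(node)
            elif isinstance(node, dict):
                kind = node.get("@type")
                kinds = kind if isinstance(kind, list) else [kind]
                self.schema_types.update(k for k in kinds if isinstance(k, str))
                if "@graph" in node:
                    stack.append(node["@graph"])


def profile(html, page_url):
    """Return one competitor's shape as a flat dict (fetching is left out)."""
    p = ShapeProfiler(page_url)
    p.feed(html)
    p.close()
    return {
        "words": p.words,
        **{h: p.tags[h] for h in ("h1", "h2", "h3", "h4")},
        "images": p.tags["img"],
        "videos": p.tags["video"] + p.tags["iframe"],
        "paragraphs": p.tags["p"],
        "lists": p.tags["ul"] + p.tags["ol"],
        "tables": p.tags["table"],
        "internal_links": p.internal_links,
        "external_links": p.external_links,
        "schema_types": sorted(p.schema_types),
        "h2_texts": p.h2_texts,
    }
```

Call `profile(html, url)` once per fetched page; the resulting list of dicts is what gets aggregated into means, medians, ranges, and the 50%+ shared sets.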
The output is a canonical template ("the shape a new contender needs to ship") plus an AI brief-generator prompt that takes those numbers and writes a content brief with a proposed H1, 8-12 H2 section headings, media requirements, schema types, and internal-link targets.
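A sketch of the aggregation and prompt step, assuming each competitor has already been reduced to a dict of counts plus its schema types and H2 texts. The field names, the tiny stopword list, the 3-letter minimum for H2 tokens, and the prompt wording are all assumptions; only the 50% threshold and the list of brief contents come from the description above.

```python
import re
from collections import Counter
from statistics import mean, median

# Numeric fields assumed on each profile dict (missing ones count as 0).
NUMERIC = ("words", "h1", "h2", "h3", "h4", "images", "videos", "paragraphs",
           "lists", "tables", "internal_links", "external_links")
# A tiny illustrative stopword list; a real one would be longer.
STOPWORDS = {"the", "and", "for", "how", "what", "your", "with", "why"}


def shared(sets_per_page, threshold=0.5):
    """Items that appear on at least `threshold` of the pages (once per page)."""
    counts = Counter(item for items in sets_per_page for item in set(items))
    need = threshold * len(sets_per_page)
    return sorted(item for item, n in counts.items() if n >= need)


def h2_tokens(h2_texts):
    """Lowercase H2 words of 3+ letters, minus stopwords."""
    words = re.findall(r"[a-z]+", " ".join(h2_texts).lower())
    return {w for w in words if len(w) >= 3 and w not in STOPWORDS}


def canonical_template(profiles):
    template = {}
    for key in NUMERIC:
        values = [p.get(key, 0) for p in profiles]
        template[key] = {"mean": round(mean(values), 1), "median": median(values),
                         "min": min(values), "max": max(values)}
    template["schema_types"] = shared([p.get("schema_types", []) for p in profiles])
    template["h2_topics"] = shared([h2_tokens(p.get("h2_texts", [])) for p in profiles])
    return template


def brief_prompt(keyword, t):
    """Render the template as an instruction for an LLM brief generator."""
    w = t["words"]
    mid_h2 = round(t["h2"]["median"])
    return "\n".join([
        f"Write a content brief for the query '{keyword}'.",
        f"Length: {w['min']}-{w['max']} words (median {w['median']}).",
        f"Propose one H1 and {max(mid_h2 - 2, 1)}-{mid_h2 + 2} H2 headings.",
        f"Cover these recurring H2 topics: {', '.join(t['h2_topics']) or 'none'}.",
        f"Media: about {t['images']['median']} images, {t['videos']['median']} videos.",
        f"Schema types: {', '.join(t['schema_types']) or 'none shared'}.",
        "Suggest internal-link targets. Flag features that look correlational, "
        "not causal (for example, newsletter opt-in forms).",
    ])
```

Keeping the numbers and the prompt separate means the numbers can be inspected before anything is sent to a model.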
What to do with the output
Match the word-count band. If the range is 1,800-2,400, don't ship 900 words and don't ship 4,000. Below the 1,800 floor, the page likely reads as incomplete coverage; above the 2,400 ceiling, you're probably padding.
Match the H2 section count within ±2. 12 H2s is the canonical count? Ship 10-14. Not 4. Not 20.
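The two rules above reduce to a simple check. This sketch is hypothetical (the `check_draft` name and its arguments are illustrative): it treats the competitors' min-max range as the word-count band and allows ±2 H2s around the canonical count.

```python
def check_draft(draft_words, draft_h2, word_range, canonical_h2, h2_slack=2):
    """Return a list of shape problems; an empty list means the draft fits."""
    problems = []
    low, high = word_range
    if draft_words < low:
        problems.append(f"{draft_words} words is under the {low} floor")
    elif draft_words > high:
        problems.append(f"{draft_words} words is over the {high} ceiling")
    # H2 count must land within canonical +/- slack (10-14 for a canonical 12).
    if abs(draft_h2 - canonical_h2) > h2_slack:
        problems.append(
            f"{draft_h2} H2s is outside {canonical_h2 - h2_slack}-{canonical_h2 + h2_slack}")
    return problems


print(check_draft(900, 4, (1800, 2400), canonical_h2=12))
```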
Ship the shared schema types. If 7 of 10 competitors have HowTo schema, treat it as part of the canonical shape, not an option. Don't count on it for a SERP feature, though: Google stopped showing HowTo rich results in 2023, so its value is in telling Google what kind of page this is.
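To see which shared types a draft is missing, read the `@type` values out of its JSON-LD. A minimal sketch with hypothetical helper names; it handles JSON-LD only (not microdata), and `howto_jsonld` emits a bare-bones schema.org `HowTo` with `HowToStep` items, not a complete markup set.

```python
import json


def declared_types(jsonld):
    """@type values declared at the top level or inside @graph."""
    data = json.loads(jsonld)
    nodes = data.get("@graph", [data]) if isinstance(data, dict) else data
    types = set()
    for node in nodes:
        kind = node.get("@type", [])
        types.update([kind] if isinstance(kind, str) else kind)
    return types


def missing_schema(jsonld, shared_types):
    """Shared competitor types that the draft's JSON-LD does not declare."""
    return sorted(set(shared_types) - declared_types(jsonld))


def howto_jsonld(name, steps):
    """Minimal schema.org HowTo markup: a name plus one HowToStep per step."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [{"@type": "HowToStep", "text": s} for s in steps],
    }, indent=2)


draft = '{"@context": "https://schema.org", "@type": "Article", "headline": "Roof leaks"}'
print(missing_schema(draft, ["HowTo", "Article"]))
```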
Pick ONE dimension to over-achieve. The audit tells you the canonical shape. The page that wins the SERP is usually one that matches the canonical shape on 90% of dimensions and stands out on ONE: the most media, the deepest FAQ, or the deepest internal linking. Only one; standing out on several reads as keyword-stuffing, not excellence.
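One way to make the "only one" rule checkable, assuming "over-achieve" means beating the highest competitor value on that dimension (an interpretation, not a rule from the audit). The function names and dimension keys such as `faq_questions` are hypothetical.

```python
def standout_dimensions(draft, competitor_max):
    """Dimensions where the draft beats every competitor."""
    return sorted(k for k, v in draft.items()
                  if k in competitor_max and v > competitor_max[k])


def standout_verdict(draft, competitor_max):
    wins = standout_dimensions(draft, competitor_max)
    if not wins:
        return "no standout dimension: pick one to over-achieve"
    if len(wins) > 1:
        return f"over-achieving on {len(wins)} dimensions ({', '.join(wins)}): keep one"
    return f"standout: {wins[0]}"


competitor_max = {"images": 10, "faq_questions": 8, "internal_links": 14}
print(standout_verdict({"images": 14, "faq_questions": 6, "internal_links": 9},
                       competitor_max))
```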
What the tool can't do
The audit is descriptive, not causal. Some canonical-shape features are correlated with ranking without being causes. Example: if 8 of 10 competitors have a newsletter opt-in form, that's probably stylistic — adding a form to your page won't move ranking. The AI brief-generator prompt is where the causal vs correlational filtering happens; it's the reason the audit returns both numbers AND a prompt, not just numbers.
Another limit: the audit can't tell you whether the whole top-10 is a bad fit for your site. If every ranker is a 5,000-word pillar page and you're running a 400-word product page, the gap isn't fixable by matching shape — you probably can't rank there. In that case the audit's real value is telling you to drop the target and pick a different keyword.
The strategic play
Run this audit on the top 5-10 queries you WANT to rank for but currently don't. For each, the canonical shape defines the content brief. Write the brief. Ship the page. Measure the rank lift at 60 days.
Typically, a page that matches the canonical shape, is published on a site with existing topical authority, and is indexed and linked from a hub page moves into the top 20 within 30-60 days and the top 10 within 90-120 days. The shape isn't sufficient on its own (domain authority, topic clusters, and internal linking all matter), but it is necessary.
Related reading
- Content Gap TF-IDF — lexical companion to this structural audit
- E-E-A-T Generator — once the shape is right, audit the trust signals
- Heading Gap Audit — drill into the heading-structure dimension
- Search Intent Classifier — upstream: is the query one you should target at all?
Fact-check notes and sources
- Heading-depth correlation with ranking: replicable in any manual sample of a commercial SERP; most third-party correlation studies (Ahrefs, Semrush, Moz annual ranking-factor studies) include this signal
- Schema-type rich-result eligibility: Google Search Central — Understand how structured data works
- Word-count bands: not a direct ranking factor per Google, but emerges empirically from thorough-coverage signals
This post is informational, not SEO-consulting advice. Mentions of Surfer, Clearscope, Frase, MarketMuse, Ahrefs, Semrush, and Moz are nominative fair use. No affiliation is implied.