The Content Gap TF-IDF Analyzer is the audit you reach for when you already suspect a content-coverage gap and need a fast, copy-paste-able fix list. It reuses the same chrome as every other jwatte.com tool (deep-links from the mega analyzers, AI-prompt export, CSV/PDF/HTML download), but the checks it runs are narrow and specific.
Enter a target keyword and your URL. The analyzer fetches the top 10 SERP results, tokenizes each page, computes TF-IDF, and reports the 20 phrases that top-ranking pages use and your page does not. It is the same core approach as Clearscope and SurferSEO, but free and running in your browser.
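The tokenize, score, and compare steps can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names, the single-word tokens (the tool reports phrases), the smoothed IDF formula, and the lack of stopword filtering are all assumptions made to keep the example short.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return TOKEN_RE.findall(text.lower())


def tf_idf_gaps(competitor_pages, own_page, top_n=20):
    """Rank terms that competitor pages use and own_page does not.

    Each term's score is its TF-IDF summed over all competitor pages.
    Hypothetical helper: the real analyzer may weight terms differently.
    """
    docs = [tokenize(page) for page in competitor_pages]
    own_terms = set(tokenize(own_page))
    n_docs = len(docs)

    # Document frequency: how many competitor pages contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    scores = Counter()
    for tokens in docs:
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            # Smoothed IDF (an assumption) so terms found on every page still score above zero.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] += tf * idf

    gaps = [(term, score) for term, score in scores.items() if term not in own_terms]
    # Highest score first; ties are broken alphabetically for stable output.
    gaps.sort(key=lambda item: (-item[1], item[0]))
    return gaps[:top_n]
```

Calling tf_idf_gaps(list_of_competitor_texts, your_page_text) returns (term, score) pairs. In practice you would strip navigation and boilerplate from each page and filter stopwords before scoring, otherwise words like "and" show up as gaps.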
What it actually checks
This is a partial extract of the audit's real findings — the same strings the tool prints when a check trips. Use it as a quick sanity check before you run the audit live:
Why this dimension matters
Google's Helpful Content Update (HCU) and the March 2024 Core Update together penalized sites with high ratios of "written-for-SEO" or "written-for-AI" content. Recovery paths are slow (6–12 months) and require removing or rewriting the offending pages, not adding more. Content that reads as helpful to a human also reads as helpful to the retrieval step in AI search.
Common failure patterns
- Title vs content mismatch — a title that promises "2026 pricing" over content that references 2022 numbers. The audit flags when the title's key terms don't appear in the first 300 words of content.
- Keyword cannibalization — two pages ranking for the same query, each diluting the other. Consolidate with a 301 redirect from the weaker URL to the stronger; keep the unique value in the winning page.
- Content decay — pages that ranked position 3 two years ago and now rank position 12. The fix is usually a content refresh (new year in the title, new examples, updated screenshots) + a re-submission via IndexNow.
- Author bylines missing or generic — "Written by Staff" or "Admin" signals low E-E-A-T. Every post should carry a real author byline with a Person schema and a bio link.
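The title-vs-content check from the first pattern above is easy to approximate. The sketch below assumes that "key terms" means the title's words minus a small stopword list, and that "first 300 words" is a plain word count; the audit's real definitions are not documented here, so treat both as placeholders.

```python
import re

# Illustrative stopword list (an assumption, not the audit's list).
STOPWORDS = {"a", "an", "and", "the", "for", "of", "to", "in", "on", "with", "vs", "your", "how", "is"}

WORD_RE = re.compile(r"[a-z0-9]+")


def missing_title_terms(title, body, window=300):
    """Return the title's key terms that do not appear in the first `window` words of body."""
    opening_words = set(WORD_RE.findall(body.lower())[:window])
    key_terms = [t for t in WORD_RE.findall(title.lower()) if t not in STOPWORDS]
    return [t for t in key_terms if t not in opening_words]
```

For the "2026 pricing" example, a title of "2026 Pricing Guide" over a body that only mentions 2022 numbers would return ["2026"], which is the mismatch worth flagging.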
How to fix it at the source
Build an editorial refresh cadence: every published piece gets a review at 6, 12, and 24 months. Add dateModified + a visible "Updated on" stamp. Wire real author schema via author.url → a bio page with Person schema + sameAs to LinkedIn / Wikidata / ORCID. For cannibalization, use the tool's consolidation plan; don't try to rank two pages for one query.
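As a rough illustration of the author wiring described above, the following Python builds the JSON-LD for an article with dateModified and a Person author. The schema.org property names are real; the function name, the example.com URLs, and the sample author are hypothetical placeholders to swap for your own.

```python
import json


def article_jsonld(headline, date_published, date_modified, author_name, author_url, same_as):
    """Build Article JSON-LD with a Person author that links to a bio page and external profiles."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        "dateModified": date_modified,
        "author": {
            "@type": "Person",
            "name": author_name,
            "url": author_url,          # the on-site bio page
            "sameAs": list(same_as),    # LinkedIn / Wikidata / ORCID profile URLs
        },
    }


if __name__ == "__main__":
    doc = article_jsonld(
        headline="Solar pricing guide",
        date_published="2024-01-10",
        date_modified="2025-01-10",
        author_name="Jane Example",                      # placeholder author
        author_url="https://example.com/authors/jane",   # placeholder bio URL
        same_as=["https://www.linkedin.com/in/jane-example"],
    )
    print(json.dumps(doc, indent=2))
```

The printed JSON goes inside a script tag of type application/ld+json in the page head. Keep the dateModified value in sync with the visible "Updated on" stamp so the two never disagree.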
When to run the audit
- After a major site change — redesign, CMS migration, DNS change, hosting platform swap.
- Quarterly as part of routine technical hygiene; the checks are cheap to run repeatedly.
- Before an investor / client review, a PCI scan, a SOC 2 audit, or an accessibility-compliance review.
- When a downstream metric drops (rankings, conversion, AI citations) and you need to rule out this dimension as the cause.
Reading the output
Every finding is severity-classified. The playbook is the same across tools:
- Critical / red: same-week fixes. These block the primary signal and cascade into downstream dimensions.
- Warning / amber: same-month fixes. Drag the score, usually don't block.
- Info / blue: context-only. Often the kind of thing a PR reviewer would flag without blocking the merge.
- Pass / green: confirmation — keep the control in place.
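If you export findings and want to work them in priority order, the severity playbook maps onto a small lookup table. The Finding shape and field names below are hypothetical (the export format is not documented here), and "same-week" and "same-month" are approximated as 7 and 30 days.

```python
from dataclasses import dataclass

# Approximate fix windows in days; None means no action is required.
FIX_WINDOW_DAYS = {"critical": 7, "warning": 30, "info": None, "pass": None}
SEVERITY_ORDER = ["critical", "warning", "info", "pass"]


@dataclass
class Finding:
    severity: str  # one of SEVERITY_ORDER
    message: str


def triage(findings):
    """Return findings sorted from most to least urgent, keeping input order within a severity."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(f.severity))
```

Because Python's sort is stable, findings of the same severity stay in the order the audit emitted them.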
Every audit also emits an "AI fix prompt": paste it into ChatGPT / Claude / Gemini to get copy-paste code patches tied to your stack.
Related tools
- HCU Pattern Detector — Scans a URL for Google Helpful Content Update demotion signals: thin content, missing first-party experience, affiliate-without-insight density, AI-boilerplate patterns, missing author + date apparatus.
- Cannibalization Audit — Runs a search for a target query + site: operator and identifies when multiple own-site URLs compete for the same query.
- Content Decay Audit — Paste URLs + last-modified dates (or GSC Performance CSV).
- E-E-A-T Audit — Scores the four E-E-A-T pillars from Person/Org schema + sameAs depth.
- Author Authority per Article — Scores an article on 8 authorship signals: byline, Person schema, author link, photo, bio, rel=author, datePublished, dateModified.
Fact-check notes and sources
- Google: Creating helpful, reliable, people-first content
- Google: March 2024 Core Update release notes
- Google: E-E-A-T in Search Quality Rater Guidelines
- Google: Search Central - Author identity in structured data
This post is informational and not a substitute for professional consulting. Mentions of third-party platforms in the tool itself are nominative fair use. No affiliation is implied.