City pages, "X vs Y" comparison pages, service-area pages: the programmatic SEO (pSEO) playbook is one template with one slot swapped per variant. When done well, it captures long-tail demand efficiently. When done poorly, it trips the filter associated with Google's Helpful Content Update (HCU), and the whole template can drop out of the index in one update.
The line between well and poorly isn't word count. It's pairwise shingle similarity across the set. pSEO Thinness Audit runs the kind of shingle-based near-duplicate detection that a filter like Google's is likely to rely on; Google does not publish the exact method.
How shingle similarity works
Every page gets tokenized and sliced into overlapping 5-word sequences (shingles). For two pages, Jaccard similarity = intersection / union of their shingle sets. Identical pages = 1.0. Completely different = 0.0.
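The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the audit's actual code: the function names `shingles` and `jaccard`, the lowercase word tokenizer, and the handling of very short or empty pages are all assumptions made for the example.

```python
import re

def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word shingles in text."""
    # Assumption: lowercase word tokens; a real tokenizer may differ.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if len(tokens) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection / size of union."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty pages count as identical
    return len(a & b) / len(a | b)

page_a = "emergency plumber in austin available day and night call now"
page_b = "emergency plumber in dallas available day and night call now"
print(jaccard(shingles(page_a), shingles(page_b)))  # 0.2
```

A single swapped word changes every shingle that overlaps it, which is up to five shingles at k = 5. In this ten-word pair that leaves 2 shared shingles out of 10 distinct ones, hence 0.2. On a full-length page the same swap still touches only five shingles, so the score stays close to 1.0, which is why one-slot templates score as near-duplicates.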
Real pSEO numbers:
- 0.7+ — near-duplicate. HCU will filter the set. Fail.
- 0.5-0.7 — heavy templating. Some filtering likely.
- 0.35-0.5 — moderate templating. Usually survives.
- Under 0.35 — differentiated. Safe.
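The bands above map directly onto a small lookup function. The sketch below is illustrative only: the name `similarity_band` and the label strings are made up for the example, and it assumes a score that sits exactly on a boundary (0.7, 0.5, 0.35) belongs to the higher-risk band, which the list does not specify.

```python
def similarity_band(score: float) -> str:
    """Map a pairwise Jaccard score to one of the four bands listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("Jaccard similarity must be between 0.0 and 1.0")
    # Assumption: boundary values fall into the higher-risk band.
    if score >= 0.7:
        return "near-duplicate"
    if score >= 0.5:
        return "heavy templating"
    if score >= 0.35:
        return "moderate templating"
    return "differentiated"

for s in (0.82, 0.61, 0.40, 0.12):
    print(s, similarity_band(s))
```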
What the audit reports
- Pairwise matrix — similarity between every pair of pages in the set.
- Per-page average — each page's mean similarity to the others. Pages with the lowest per-page average are your most differentiated; pages with the highest are your biggest risk.
- Title + H1 duplication — clusters where multiple pages share the exact same title. Classic pSEO mistake.
- Thin-page count — pages under 300 words are thin regardless of similarity.
- HCU risk tier — combined score flagging LOW / MEDIUM / HIGH risk.
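As a rough sketch of how the first four report items could be computed, the function below takes a hypothetical `pages` mapping of URL to title and body text. The input shape, the `audit` name, and the returned keys are assumptions for illustration. The H1 check and the combined HCU risk tier are left out because the scoring formula is not described here.

```python
import re
from collections import defaultdict
from itertools import combinations

def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(tokens[i:i + k]) for i in range(max(len(tokens) - k + 1, 0))}

def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def audit(pages: dict[str, dict[str, str]], thin_words: int = 300) -> dict:
    """pages maps URL -> {"title": ..., "text": ...} (hypothetical shape)."""
    sets = {url: shingles(p["text"]) for url, p in pages.items()}

    # Pairwise matrix: one score per unordered pair of pages.
    matrix = {(u, v): jaccard(sets[u], sets[v]) for u, v in combinations(pages, 2)}

    # Per-page average: mean similarity of each page to every other page.
    per_page = {}
    for url in pages:
        scores = [s for pair, s in matrix.items() if url in pair]
        per_page[url] = sum(scores) / len(scores) if scores else 0.0

    # Title duplication: group URLs by normalized title, keep groups of 2+.
    by_title = defaultdict(list)
    for url, p in pages.items():
        by_title[p["title"].strip().lower()].append(url)
    duplicate_titles = {t: urls for t, urls in by_title.items() if len(urls) > 1}

    # Thin pages: under the word threshold regardless of similarity.
    thin = [url for url, p in pages.items() if len(p["text"].split()) < thin_words]

    return {
        "matrix": matrix,
        "per_page_avg": per_page,
        "duplicate_titles": duplicate_titles,
        "thin_pages": thin,
    }
```

The pairwise step grows quadratically: 1,000 pages means 499,500 comparisons, so a larger set would likely need sampling or MinHash-style approximation rather than this exhaustive loop.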
The fix pattern
The AI prompt emits a content-diff plan: which sections to delete (pure boilerplate), which to rewrite uniquely per variant (local landmarks, regulations, pricing examples, author quotes), which to keep as shared scaffold.
The specific advice depends on the page type. City pages: rewrite the regulations section for each city's actual laws, add local-pricing examples with real-dollar figures, include a quote from a local author/employee. "X vs Y" pages: rewrite the tradeoffs section uniquely for each pair because the tradeoffs really are different.
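Before handing pages to a prompt, the same similarity measure can pre-sort sections into rough buckets. The sketch below is one possible heuristic, not the tool's method: it assumes each page is already split into named sections, and the `triage_sections` name, the three labels, and the 0.7 cutoff are hypothetical. Deciding whether a shared section is pure boilerplate to delete or scaffold worth keeping still needs a human or the prompt.

```python
import re
from itertools import combinations
from statistics import mean

def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(tokens[i:i + k]) for i in range(max(len(tokens) - k + 1, 0))}

def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def triage_sections(pages: dict[str, dict[str, str]], high: float = 0.7) -> dict[str, str]:
    """pages maps URL -> {section name: section text} (hypothetical shape)."""
    if len(pages) < 2:
        raise ValueError("need at least two pages to compare")
    # Only compare sections that every page in the set has.
    common = set.intersection(*(set(p) for p in pages.values()))
    labels = {}
    for name in sorted(common):
        sets = [shingles(p[name]) for p in pages.values()]
        score = mean(jaccard(a, b) for a, b in combinations(sets, 2))
        if score >= 1.0:
            labels[name] = "shared"          # delete, or keep as scaffold
        elif score >= high:
            labels[name] = "templated"       # candidate for a per-variant rewrite
        else:
            labels[name] = "differentiated"  # already unique, leave alone
    return labels
```

On a city-page set, a regulations section that differs only by the city name would typically land in "templated", which matches the rewrite advice above.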
Related reading
- Chunk Retrievability — per-page passage quality
- Heading Gap Audit — competitive H2 coverage
- Voice Cleanup — de-slop content after rewriting
Fact-check notes and sources
- Google Helpful Content Update: developers.google.com/search/updates/helpful-content-update
- W-shingling near-duplicate detection: en.wikipedia.org/wiki/W-shingling
- Jaccard index: en.wikipedia.org/wiki/Jaccard_index
The $100 Network covers scaling pSEO without triggering HCU. The detector is the gate before scaling from 10 pages to 1000.