When Programmatic SEO Becomes Thin-Content Spam — Shingle Detection

City pages, "X vs Y" comparison pages, service-area pages: the programmatic SEO (pSEO) playbook is one template with one slot swapped per variant. Done well, it captures long-tail demand efficiently. Done poorly, it trips Google's Helpful Content filter (introduced with the Helpful Content Update, HCU), and the whole template can drop out of the index in a single update.

The line between done well and done poorly isn't word count. It's pairwise shingle similarity across the set. pSEO Thinness Audit runs the same kind of near-duplicate detection that Google's filter is presumed to perform.

How shingle similarity works

Every page gets tokenized and sliced into overlapping 5-word sequences (shingles). For two pages, Jaccard similarity is the number of shingles they share divided by the number of distinct shingles across both: |A ∩ B| / |A ∪ B|. Identical pages score 1.0. Pages with no shingle in common score 0.0.
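
To make the definition concrete, here is a minimal Python sketch of 5-word shingling and Jaccard similarity. The tokenizer (lowercased runs of letters, digits, and apostrophes) and the function names are assumptions for illustration; the audit's actual tokenization is not described here and may differ.

```python
import re

def shingles(text: str, k: int = 5) -> set:
    """Return the set of overlapping k-word shingles in text."""
    # Assumption: lowercase word tokens; the real tokenizer may differ.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Texts shorter than k words produce no shingles.
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    # Assumption: two pages with no shingles at all count as identical.
    return len(a & b) / len(union) if union else 1.0

page_a = "Our licensed plumbers serve Austin homes and businesses every day"
page_b = "Our licensed plumbers serve Dallas homes and businesses every day"

# 1 shared shingle out of 11 distinct ones
print(round(jaccard(shingles(page_a), shingles(page_b)), 3))  # 0.091
```

In a ten-word sentence, one swapped word touches five of the six shingles, so the score collapses. On a full-length page the same single swap leaves almost every shingle shared, which is why slot-swapped templates score so high.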

Real pSEO numbers:

  • 0.7+ — near-duplicate. HCU will filter the set. Fail.
  • 0.5-0.7 — heavy templating. Some filtering likely.
  • 0.35-0.5 — moderate templating. Usually survives.
  • Under 0.35 — differentiated. Safe.
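
The bands above can be expressed as a small lookup. This is a sketch only: the function name and labels are hypothetical, and the handling of exact boundary values (0.7 counted as near-duplicate, 0.5 as heavy templating, 0.35 as moderate) is an assumption, since the list does not say which side a boundary falls on.

```python
def similarity_band(score: float) -> str:
    """Map a pairwise Jaccard score to the bands listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("Jaccard similarity must be between 0 and 1")
    # Assumption: each lower bound is inclusive.
    if score >= 0.7:
        return "near-duplicate"
    if score >= 0.5:
        return "heavy templating"
    if score >= 0.35:
        return "moderate templating"
    return "differentiated"

for s in (0.82, 0.61, 0.40, 0.12):
    print(s, similarity_band(s))
```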

What the audit reports

  • Pairwise matrix — similarity between every pair of pages in the set.
  • Per-page average — each page's mean similarity to the others. Pages with the lowest per-page average are your most differentiated; pages with the highest are your biggest risk.
  • Title + H1 duplication — clusters where multiple pages share the exact same title. Classic pSEO mistake.
  • Thin-page count — pages under 300 words are thin regardless of similarity.
  • HCU risk tier — combined score flagging LOW / MEDIUM / HIGH risk.
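
As a rough illustration of how the first four report items could be computed, the sketch below takes a hypothetical input shape (a dict of URL to title and body) and redefines small shingle and Jaccard helpers so it runs on its own. The title normalization (trimmed, lowercased) is an assumption. The combined risk tier is left out because its scoring formula is not given.

```python
import re
from collections import defaultdict
from itertools import combinations

def shingles(text, k=5):
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def audit(pages, thin_words=300):
    """pages: {url: {"title": str, "body": str}} (hypothetical input shape)."""
    sets = {url: shingles(p["body"]) for url, p in pages.items()}

    # Pairwise matrix, stored once per unordered pair of pages.
    pairwise = {(u, v): jaccard(sets[u], sets[v])
                for u, v in combinations(pages, 2)}

    # Per-page average: mean similarity of each page to all the others.
    totals = defaultdict(float)
    for (u, v), score in pairwise.items():
        totals[u] += score
        totals[v] += score
    others = len(pages) - 1
    per_page_avg = {u: (totals[u] / others if others else 0.0) for u in pages}

    # Title clusters: pages whose normalized title is identical.
    by_title = defaultdict(list)
    for url, p in pages.items():
        by_title[p["title"].strip().lower()].append(url)
    duplicate_titles = {t: urls for t, urls in by_title.items() if len(urls) > 1}

    # Thin pages: fewer than thin_words whitespace-separated words.
    thin_pages = [u for u, p in pages.items() if len(p["body"].split()) < thin_words]

    return {"pairwise": pairwise, "per_page_avg": per_page_avg,
            "duplicate_titles": duplicate_titles, "thin_pages": thin_pages}
```

Sorting `per_page_avg` in descending order puts the highest-risk pages first, which is usually the most useful view when deciding what to rewrite.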

The fix pattern

The audit's AI prompt emits a content-diff plan: which sections to delete (pure boilerplate), which to rewrite uniquely per variant (local landmarks, regulations, pricing examples, author quotes), and which to keep as shared scaffold.

The specific advice depends on the page type. City pages: rewrite the regulations section for each city's actual laws, add local-pricing examples with real-dollar figures, include a quote from a local author/employee. "X vs Y" pages: rewrite the tradeoffs section uniquely for each pair because the tradeoffs really are different.

Related reading

The $100 Network covers scaling pSEO without triggering HCU. The detector is the gate before scaling from 10 pages to 1000.
