Your blog has 12 pages of content. Only page 1 shows up in Google. Pages 2 through 12 exist, are linked, have sitemap entries, and never appear in search. The usual culprit isn't a crawl problem — it's the canonical-to-page-1 anti-pattern.
A template ships `<link rel="canonical" href="/blog/">` on every paginated page. Pages 2-12 all declare "my canonical is page 1." Google believes them, merges signals into page 1, and doesn't index the rest. The template author thought they were "deduplicating." They were deleting 90% of their blog from the index.
Pagination Sanity Check follows the rel=next chain from page 1 up to 10 pages and validates every pagination rule.
Six checks it runs
- Chain follows from page 1. A `rel=next` link exists and points at a live next page.
- No canonical-to-page-1 anti-pattern. Each page should canonical to itself, not to page 1 (unless pages 2+ are deliberately noindexed).
- Pages self-canonical. `<link rel="canonical" href="<this-page-url>">`, the most common correct pattern.
- No `?page=` inside canonical URLs. A canonical with `?page=` is usually a bug; canonicals should be clean.
- rel=prev symmetric with rel=next. Page 3's `prev` should point at page 2.
- No noindex on paginated pages. Unless you specifically want deep pages out of the index.
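The six checks above can be sketched as a small audit pass over already-fetched pages. This is an illustrative sketch, not the tool's actual implementation: the `audit_series`/`check_page` names and the page-dict shape (one entry per URL with its canonical, prev, next, and noindex flags) are assumptions.

```python
from urllib.parse import urlparse

def check_page(url, page, page1_url):
    """Apply the per-page rules to one page's extracted <link>/<meta> data."""
    issues = []
    canonical = page.get("canonical")
    if canonical is None:
        issues.append("missing canonical")
    else:
        if url != page1_url and canonical == page1_url:
            issues.append("canonical-to-page-1 anti-pattern")
        elif canonical != url:
            issues.append("canonical is not self-referential")
        if "page=" in urlparse(canonical).query:
            issues.append("?page= inside canonical URL")
    if page.get("noindex"):
        issues.append("noindex on a paginated page")
    return issues

def audit_series(pages, page1_url, max_pages=10):
    """Follow the rel=next chain from page 1, up to max_pages pages deep."""
    report = {}
    url, prev_url = page1_url, None
    for _ in range(max_pages):
        page = pages.get(url)
        if page is None:
            report[url] = ["rel=next points at a missing page"]
            break
        issues = check_page(url, page, page1_url)
        # Symmetry: this page's rel=prev must point at the page we arrived from.
        if page.get("prev") != prev_url:
            issues.append("rel=prev not symmetric with rel=next")
        report[url] = issues
        if not page.get("next"):
            break
        prev_url, url = url, page["next"]
    return report
```

Running it on a series where page 2 canonicals to page 1 flags exactly the anti-pattern described above, while a correct series comes back clean.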
What the fix looks like
For a correct paginated series:
```html
<!-- Page 1 -->
<link rel="canonical" href="https://site.com/blog/">
<link rel="next" href="https://site.com/blog/page/2/">

<!-- Page 2 -->
<link rel="canonical" href="https://site.com/blog/page/2/">
<link rel="prev" href="https://site.com/blog/">
<link rel="next" href="https://site.com/blog/page/3/">
```
Every page canonical to itself. rel=next/prev forming a chain. No noindex. That's it.
Why not just noindex the deep pages
Some SEO advice says "just noindex pages 2+." That works if the deep pages have no search value and only serve as archive navigation. For most blogs, deep pages do have search value: old posts that each rank individually for long-tail keywords. Noindex kills that entirely.
The right default is: let paginated pages be indexable, each self-canonical, connected by rel=next/prev.
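Auditing whether deep pages carry a noindex directive is straightforward with the standard-library HTML parser. A minimal sketch; the `NoindexSniffer` class name is an assumption for illustration:

```python
from html.parser import HTMLParser

class NoindexSniffer(HTMLParser):
    """Flag any <meta name="robots"> tag whose content includes noindex."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            if "noindex" in (a.get("content") or "").lower():
                self.noindex = True

def has_noindex(html):
    sniffer = NoindexSniffer()
    sniffer.feed(html)
    return sniffer.noindex
```

Note this only covers the meta tag; a full check would also inspect the `X-Robots-Tag` HTTP header.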
Related reading
- Index Coverage Delta — what's actually indexed vs sitemap
- Sitemap Audit — XML structure
- Param Crawl-Waste — `?page=` as crawl-waste
Fact-check notes and sources
- Google on rel=next/prev deprecation: developers.google.com/search/blog/2019/03/rel-next-and-prev
- Google canonicalization docs: developers.google.com/search/docs/crawling-indexing/canonicalization
The $100 Network covers blog archive architecture at scale. Pagination sanity is the one check that prevents silent deep-page de-indexing.