# How One E-Commerce Category Page Spawns 10,000 Crawl-Waste URLs

Param-crawl-waste counts single-parameter duplicates. Facet-trap does different math: on a category page with 5 facets averaging 4 values each (plus "not set" for each facet), the combinatorial space is 5 × 5 × 5 × 5 × 5 = 3,125 URLs. For 12 products. Well over 99% of those URLs are redundant views of the same 12 products, and every crawl Googlebot spends on them is wasted.

Author: J.A. Watte
Published: May 4, 2026
Source: https://jwatte.com/blog/blog-tool-facet-trap-audit/

---

The math: a category page with N facet filters, each offering K values, can produce `(K+1)^N` unique URLs, because each facet is either unset or set to one of its K values. Add sort, page, and `utm_*` parameters and the count climbs into five digits fast. On many e-commerce sites, a single popular category can hold more combinatorial URLs than Googlebot will crawl in a month.
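
A quick way to check the formula: the sketch below assumes single-select facets (each facet is absent or set to exactly one value) and multiplies `K + 1` across facets. The sort and page counts in the second call are hypothetical. Multi-select facets, where a shopper can tick several values at once, grow faster still: `2^K` subsets per facet instead of `K + 1`.

```python
from math import prod


def facet_url_count(values_per_facet: list[int]) -> int:
    """Upper bound on distinct URLs for single-select facets.

    Each facet contributes K + 1 choices: unset, or one of its K values.
    """
    return prod(k + 1 for k in values_per_facet)


# The intro example: 5 facets with 4 values each.
print(facet_url_count([4, 4, 4, 4, 4]))  # 3125

# Hypothetical extras on the same page: 3 sort orders and 2 extra pages.
# Each parameter multiplies the space again.
print(facet_url_count([4, 4, 4, 4, 4, 3, 2]))  # 37500
```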

[Facet Trap Detector](/tools/facet-trap-audit/) scans a sample category URL, extracts every facet link, and does the combo math.
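
The detector's own code isn't shown here, so the following is only a minimal sketch of the same three steps under stated assumptions: every `<a href>` that points back to the same category path counts as a facet link, every query key counts as a facet, and the count reuses the single-select `(K+1)` product. The function names and sample markup are hypothetical.

```python
from html.parser import HTMLParser
from math import prod
from urllib.parse import parse_qsl, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def facet_values(html: str, category_url: str) -> dict[str, set[str]]:
    """Map each query key on same-path links to the distinct values seen."""
    parser = LinkCollector()
    parser.feed(html)
    base_path = urlsplit(category_url).path
    facets: dict[str, set[str]] = {}
    for href in parser.hrefs:
        parts = urlsplit(urljoin(category_url, href))
        if parts.path != base_path:
            continue  # not a filtered view of this category page
        for key, value in parse_qsl(parts.query):
            facets.setdefault(key, set()).add(value)
    return facets


def combo_count(facets: dict[str, set[str]]) -> int:
    """Single-select combinations: product of (values + 1) per facet."""
    return prod(len(values) + 1 for values in facets.values())


SAMPLE_HTML = """
<a href="/shirts?color=red">Red</a>
<a href="/shirts?color=blue">Blue</a>
<a href="/shirts?size=M">M</a>
<a href="/shirts?size=L">L</a>
<a href="/shirts?sort=price">Price</a>
<a href="/about">About</a>
"""
found = facet_values(SAMPLE_HTML, "https://shop.example/shirts")
print({key: sorted(values) for key, values in found.items()})
print(combo_count(found))  # 18
```

Here color (2 values), size (2), and sort (1) give 3 × 3 × 2 = 18 URLs from six links; a real page with more facets scales the same way.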

## Why single-param crawl-waste auditing misses this

[Param Crawl-Waste](/tools/param-crawl-waste/) already counts parameter frequency across a sitemap. It catches `?sort=price` appearing on 500 URLs. What it doesn't do is compute what happens when `?sort=price` combines with `?color=red` and `?size=M` and `?page=2` and `?utm_source=email`. The math is different: single-param is additive, combinatorial is multiplicative.
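
To make additive versus multiplicative concrete, here is a sketch with hypothetical value counts for five parameters. The "single-param" figure is a simplification (one URL per parameter value, each parameter considered on its own), not necessarily what Param Crawl-Waste computes internally.

```python
from math import prod

# Hypothetical number of values each parameter takes on one category page.
params = {"sort": 3, "color": 6, "size": 5, "page": 4, "utm_source": 3}

# Single-param view: each parameter's URLs are counted independently.
additive = sum(params.values())

# Combinatorial view: each parameter is absent or takes one value,
# and the choices multiply.
multiplicative = prod(k + 1 for k in params.values())

print(additive, multiplicative)  # 21 3360
```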

For one small e-commerce site I tested, param-crawl-waste reported 18% crawl waste. Facet-trap reported that one category page alone could generate 4,200 URLs for 65 products, a ratio of roughly 65:1. Both are right; they measure different axes.

## The classification algorithm

Every facet parameter needs one of three dispositions:

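1. **Block.** `utm_*`, `fbclid`, `gclid`, `ref`, session tokens, sort, page, view-mode. These should never be indexed. Keep Googlebot from crawling them with robots.txt rules such as `Disallow: /*?*utm_`, which also matches when `utm_` is not the first parameter, plus similar rules for the others.
2. **Canonicalize.** `color`, `size`, `brand`, `material` facets. The variants should exist (users filter them), but the canonical tag on the variant page should point to the unfiltered category. Google will still crawl them, but usually consolidates them into the canonical instead of indexing each variant (`rel=canonical` is a strong hint, not a directive).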
3. **Index.** A specific facet combination that targets a real search query. "Red T-Shirts" is a query people search for, so `?color=red` should be its own indexable page with a unique title + H1.
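
Wildcard placement in a robots.txt rule decides what it catches. Google documents `*` as matching any sequence of characters, a trailing `$` as anchoring the end of the URL, and a trailing `*` as ignored. The sketch below is a simplified matcher built on those rules, not Google's parser: it ignores `Allow` precedence and percent-encoding normalization.

```python
import re


def robots_pattern_matches(pattern: str, path_and_query: str) -> bool:
    """Simplified Google-style robots.txt matching.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and the pattern is matched from the start of the path (query included).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path_and_query) is not None


urls = [
    "/shirts?utm_source=email",
    "/shirts?color=red&utm_source=email",
    "/shirts?color=red",
]
for rule in ("/*?utm_", "/*?*utm_"):
    print(rule, [robots_pattern_matches(rule, u) for u in urls])
# /*?utm_ [True, False, False]
# /*?*utm_ [True, True, False]
```

`/*?utm_` only catches URLs where a `utm_` parameter comes first in the query string; `/*?*utm_` catches it in any position.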

Most e-commerce sites ship 100% of facets as indexable. A better target is roughly 80% block, 15% canonicalize, 5% index.
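
The three dispositions can be sketched as a small classifier plus a canonical-URL helper. The parameter sets mirror the list above; `sessionid` and `view` are stand-in names for session-token and view-mode parameters, `INDEXABLE` is a hypothetical allowlist, and sending unknown parameters to "block" is an assumption of this sketch rather than a rule from the text.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter sets based on the three dispositions above.
BLOCK_EXACT = {"fbclid", "gclid", "ref", "sessionid", "sort", "page", "view"}
BLOCK_PREFIXES = ("utm_",)
CANONICALIZE = {"color", "size", "brand", "material"}
INDEXABLE = {("color", "red")}  # facet values confirmed to match real queries


def disposition(key: str, value: str) -> str:
    """Return 'block', 'canonicalize', or 'index' for one query parameter."""
    if key in BLOCK_EXACT or key.startswith(BLOCK_PREFIXES):
        return "block"
    if (key, value) in INDEXABLE:
        return "index"
    if key in CANONICALIZE:
        return "canonicalize"
    return "block"  # assumption: unknown params stay out until reviewed


def canonical_for(url: str) -> str:
    """Canonical target: drop every parameter except allowlisted index facets."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if disposition(k, v) == "index"]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Parameters observed on one category's facet links (hypothetical sample).
observed = [("utm_source", "email"), ("sort", "price"), ("page", "2"),
            ("gclid", "abc"), ("color", "blue"), ("size", "M"), ("color", "red")]
counts = Counter(disposition(k, v) for k, v in observed)
for label in ("block", "canonicalize", "index"):
    print(label, f"{counts[label] / len(observed):.0%}")

print(canonical_for("https://shop.example/shirts?color=red&size=M&utm_source=email"))
# https://shop.example/shirts?color=red
```

`canonical_for` keeps an allowlisted facet, so a combination such as red plus size M points at the indexable red page rather than the bare category. That is a design choice of the sketch; a variant with no allowlisted facet falls back to the unfiltered category.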

## What the fix buys you

On a mid-sized catalog site, fixing the facet trap typically recovers 20-40% of crawl budget. That budget gets redirected to actual product pages, new arrivals, and seasonal landing pages that weren't getting crawled before. A typical result is 15-30% more organic traffic within 60 days, simply because more of those pages get indexed.
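
One way to sanity-check the crawl-budget side on your own site is to measure what share of Googlebot's requests land on parameterized URLs you plan to block or canonicalize, before and after the fix. The sketch assumes you have already extracted Googlebot request paths from your access logs; the parameter list and sample hits are hypothetical. It overcounts slightly when a request carries only an allowlisted indexable facet.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical parameter names slated for block or canonicalize.
WASTE_PARAMS = {"sort", "page", "view", "ref", "fbclid", "gclid",
                "color", "size", "brand", "material"}


def is_waste(path: str) -> bool:
    """True if the request carries any block/canonicalize parameter."""
    keys = {k for k, _ in parse_qsl(urlsplit(path).query)}
    return any(k in WASTE_PARAMS or k.startswith("utm_") for k in keys)


def waste_share(crawled: list[str]) -> float:
    """Fraction of crawled paths that hit parameterized duplicates."""
    if not crawled:
        return 0.0
    return sum(is_waste(p) for p in crawled) / len(crawled)


googlebot_hits = [
    "/shirts",
    "/shirts?color=red&sort=price",
    "/shirts?page=2",
    "/products/tee-01",
    "/shirts?utm_source=email",
]
print(f"{waste_share(googlebot_hits):.0%}")  # 60%
```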

## Related reading

- [Param Crawl-Waste](/tools/param-crawl-waste/) — single-param frequency
- [Index Coverage Delta](/tools/index-coverage-delta/) — what's actually indexed
- [Sitemap Audit](/tools/sitemap-audit/) — sitemap hygiene

## Fact-check notes and sources

- Google faceted navigation guidelines: [developers.google.com/search/docs/specialty/ecommerce/faceted-navigation](https://developers.google.com/search/docs/specialty/ecommerce/faceted-navigation)
- Consolidate duplicate URLs: [developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls](https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls)
- Search Central "crawl budget" guidance: [developers.google.com/search/blog/2017/01/what-crawl-budget-means-for-googlebot](https://developers.google.com/search/blog/2017/01/what-crawl-budget-means-for-googlebot)

---

*The $20 Dollar Agency covers e-commerce client audits where facet-trap is the most common crawl-budget win. The detector is the first-pass diagnostic.*

