Part of the AEO / GEO / AI-search audit tool stack. See the pillar post for the full catalog of sibling audits and where this one fits in the lineup.
A soft 404 is the worst kind of technical-SEO bug: it's invisible to tools that only check status codes. The URL returns HTTP 200. The sitemap contains it. Everything looks clean. But when a human lands on the page, there's no content — just a "No results found" or "This product is no longer available" or a near-empty template.
Google is very good at detecting these. Their quality models classify soft 404s and demote or deindex them silently. You don't get a notification. The URL just... stops ranking.
The Soft-404 Detector crawls your sitemap and flags every URL that looks like a soft 404 regardless of HTTP status.
The seven soft-404 patterns
1. Status 200 with "not found" text
Explicit <title>, <h1>, or body content containing: "not found", "no results", "sorry, that page", "page doesn't exist", "404", "does not exist". Flagged as SOFT-404.
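The phrase check above can be sketched as a small matcher. This is a minimal illustration, not the tool's actual implementation; the pattern list and the `looks_like_soft_404` helper are hypothetical names:

```python
import re

# Hypothetical phrase list mirroring the patterns described above.
NOT_FOUND_PATTERNS = [
    r"not found",
    r"no results",
    r"sorry, that page",
    r"page doesn'?t exist",
    r"\b404\b",
    r"does not exist",
]
_NOT_FOUND_RE = re.compile("|".join(NOT_FOUND_PATTERNS), re.IGNORECASE)

def looks_like_soft_404(title: str, h1: str, body: str) -> bool:
    """True if the title, h1, or body text matches any not-found phrase."""
    return any(_NOT_FOUND_RE.search(text or "") for text in (title, h1, body))
```

A real crawler would run this only against extracted text, not raw HTML, to avoid matching phrases buried in scripts or comments.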
2. Status 200 with "out of stock" / "no longer available"
Product pages that went out of stock but kept the URL live instead of redirecting or 404-ing. Flagged as SOFT-404 if schema still says @type: Product but offers.availability is OutOfStock with no alternative.
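Checking the structured data for this pattern amounts to parsing the page's JSON-LD blocks and looking for a Product whose offers are all OutOfStock. A minimal sketch, assuming the `<script type="application/ld+json">` contents have already been extracted (the `is_dead_product` helper is a hypothetical name):

```python
import json

def is_dead_product(jsonld_blocks: list[str]) -> bool:
    """True if any JSON-LD block declares a Product whose every offer
    has an OutOfStock availability (schema.org/ItemAvailability URL)."""
    for block in jsonld_blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or item.get("@type") != "Product":
                continue
            offers = item.get("offers") or {}
            if isinstance(offers, dict):
                offers = [offers]
            availabilities = [o.get("availability", "") for o in offers]
            if availabilities and all(a.endswith("OutOfStock") for a in availabilities):
                return True
    return False
```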
3. Status 200 with <100 words of content
Main content under roughly 100 words. Google doesn't publish a hard word-count threshold, so 100 is a heuristic cutoff; very thin pages are a documented quality problem either way. Flagged as THIN.
4. Status 200 with empty main element
A <main>, <article>, or #content element exists but is empty or contains only whitespace/boilerplate. Flagged as EMPTY.
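Detecting an empty container can be done with the standard-library HTML parser. A simplified sketch that handles only `<main>` (a real check would also cover `<article>` and `#content`, and distinguish "element missing" from "element empty"):

```python
from html.parser import HTMLParser

class MainTextExtractor(HTMLParser):
    """Collect text inside <main> elements (simplified sketch)."""
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "main" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

def main_is_empty(html: str) -> bool:
    """True if <main> yields no non-whitespace text.

    Caveat: also returns True when no <main> exists at all; a real
    detector would treat that as a separate case.
    """
    parser = MainTextExtractor()
    parser.feed(html)
    return not "".join(parser.chunks).strip()
```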
5. Status 200 with boilerplate-only
Page is all header + footer + sidebar, no unique body. Detected by comparing to the site's template (if crawled). Flagged as BOILERPLATE.
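One simple way to implement the template comparison: split each page into text blocks and measure what fraction also appears on known-good pages. A hypothetical sketch (block extraction not shown; the function name and the "near 1.0 means boilerplate" reading are assumptions):

```python
def boilerplate_ratio(page_blocks: list[str], template_blocks: list[str]) -> float:
    """Fraction of this page's text blocks that also appear in the
    site template. A ratio near 1.0 means almost no unique body.

    `template_blocks` would come from crawling a couple of known-good
    pages and keeping the blocks they share.
    """
    if not page_blocks:
        return 1.0
    template = set(template_blocks)
    shared = sum(1 for block in page_blocks if block in template)
    return shared / len(page_blocks)
```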
6. Status 200 with redirect-via-JS
Server returns 200 but a JS window.location.href = … fires in <head>. Flagged as JS-REDIRECT (tool also tests the resulting URL).
7. Status 200 with meta-refresh
Server returns 200 but a <meta http-equiv="refresh"> fires immediately. Flagged as META-REFRESH; as with pattern 6, the tool also tests the destination URL.
Why this matters
Google's crawler mostly handles these correctly on its end — it deindexes them. The problem for you is:
- Crawl budget — Googlebot still wastes time fetching soft 404s. Large sites lose significant crawl capacity.
- Sitemap quality — soft 404s in your sitemap signal to Google that your sitemap isn't trustworthy, which affects how aggressively the rest of your URLs get crawled.
- Internal linking — pages still linking to soft-404 URLs waste internal PageRank.
- User trust — users arriving from older search results hit empty pages, bounce, and cost you.
How to use it
- Go to /tools/soft-404-detector/
- Paste your sitemap URL (or site root — tool auto-discovers)
- Tool crawls up to 100 URLs, rate-limited
- Report groups findings by pattern
- Export CSV for prioritization
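The crawl loop behind those steps can be sketched as: parse the sitemap, cap the URL list, and fetch with a polite delay. The cap and delay values are assumptions; the real tool's limits may differ, and sitemap discovery from a site root is not shown:

```python
import time
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 100       # the tool's documented crawl cap
DELAY_SECONDS = 1.0  # assumed polite delay between fetches

def urls_from_sitemap(sitemap_xml: str, limit: int = MAX_URLS) -> list[str]:
    """Extract <loc> values from sitemap XML, capped at `limit`."""
    root = ET.fromstring(sitemap_xml)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text][:limit]

def crawl(sitemap_xml: str, fetch, limit: int = MAX_URLS, delay: float = DELAY_SECONDS) -> dict:
    """Fetch each sitemap URL via the caller-supplied `fetch` callable,
    pausing between requests. Returns {url: fetch_result}."""
    results = {}
    for url in urls_from_sitemap(sitemap_xml, limit):
        results[url] = fetch(url)
        time.sleep(delay)
    return results
```

Injecting `fetch` keeps the sketch testable and lets the caller plug in any HTTP client.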
For product pages that went out of stock, the right move is:
- Short-term out-of-stock (weeks): keep live, update schema availability, add a "notify me" signup
- Permanent discontinuation: 301 redirect to the category or closest alternative
- SKU retirement where no alternative exists: 410 (Gone) or 404, not soft-404
Related reading
- Broken Link + Decay Scanner — outbound link rot
- Sitewide Crawl Sampler — broader crawl visibility
- 404 Page Quality Audit — audit your actual 404 page
Fact-check notes and sources
- Soft 404 definition (Google): Google Search Central — Soft 404 errors.
- Thin content (quality signal): Google Search Essentials — Spam policies (Thin content).
- 410 Gone vs 404 Not Found: RFC 7231 section 6.5.9.
- Availability property (out of stock): schema.org/Offer.
This post is informational, not SEO-consulting advice. Mentions of Google, Googlebot, and similar products are nominative fair use. No affiliation is implied.