# Soft-404 Detector: the pages that return 200 but signal 'not found'

A soft 404 is a URL that returns HTTP 200 while the body says 'page not found' or 'no results', or is thin to empty. Google demotes and eventually deindexes them. They're invisible to naive crawlers. This tool crawls your sitemap and surfaces them.

Author: J.A. Watte
Published: April 22, 2026
Source: https://jwatte.com/blog/blog-tool-soft-404-detector/

---

_Part of the [AEO / GEO / AI-search audit tool stack](/blog/blog-new-aeo-audit-tools-2026/).  See the pillar post for the full catalog of sibling audits and where this one fits in the lineup._

A soft 404 is the worst kind of technical-SEO bug: it's invisible to tools that only check status codes. The URL returns HTTP 200. The sitemap contains it. Everything looks clean. But when a human lands on the page, there's no content — just a "No results found" or "This product is no longer available" or a near-empty template.

Google is very good at detecting these. Their quality models classify soft 404s and demote or deindex them silently. You don't get a notification. The URL just... stops ranking.

[The Soft-404 Detector](/tools/soft-404-detector/) crawls your sitemap and flags every URL that looks like a soft 404 regardless of HTTP status.

## The seven soft-404 patterns

### 1. Status 200 with "not found" text
Explicit `<title>`, `<h1>`, or body content containing: "not found", "no results", "sorry, that page", "page doesn't exist", "404", "does not exist". Flagged as SOFT-404.
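
As a rough illustration of this check (not the tool's actual code), the sketch below scans only the `<title>` and `<h1>` text for the phrases listed above. Skipping the body, and matching "404" only as a standalone token, are assumptions made here to limit false positives; the names `NOT_FOUND_PHRASES` and `looks_not_found` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Phrases from the list above; the matching rules are an assumption.
NOT_FOUND_PHRASES = (
    "not found", "no results", "sorry, that page",
    "page doesn't exist", "does not exist",
)


class _HeadlineParser(HTMLParser):
    """Collects the text found inside <title> and <h1> elements."""

    def __init__(self):
        super().__init__()
        self._open = []   # stack of currently open title/h1 tags
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        if self._open:
            self.texts.append(data)


def looks_not_found(html: str) -> bool:
    """True if the title or h1 reads like an error page."""
    parser = _HeadlineParser()
    parser.feed(html)
    parser.close()
    headline = " ".join(parser.texts).lower()
    if any(phrase in headline for phrase in NOT_FOUND_PHRASES):
        return True
    # "404" counts only as a standalone token, not inside a longer number.
    return re.search(r"\b404\b", headline) is not None
```

A page whose body merely mentions "404 units sold" would pass this check, because only the headline text is inspected.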

### 2. Status 200 with "out of stock" / "no longer available"
Product pages that went out of stock but kept the URL live instead of redirecting or 404-ing. Flagged as SOFT-404 if schema still says `@type: Product` but `offers.availability` is `OutOfStock` with no alternative.
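
One way to approximate this check is to read the page's JSON-LD and look at the offers on each `Product` node. The sketch below is an assumption-laden reading of the rule: "no alternative" is interpreted as "every offer is `OutOfStock`", `@graph` wrappers are not unpacked, and `product_out_of_stock` is a hypothetical name.

```python
import json
import re

# Matches the body of each JSON-LD script block.
LD_JSON_RE = re.compile(
    r"""<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>""",
    re.IGNORECASE | re.DOTALL,
)


def product_out_of_stock(html: str) -> bool:
    """True if a JSON-LD Product on the page has only OutOfStock offers."""
    for raw in LD_JSON_RE.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is ignored in this sketch
        nodes = data if isinstance(data, list) else [data]
        for node in nodes:
            if not isinstance(node, dict) or node.get("@type") != "Product":
                continue
            offers = node.get("offers") or []
            if isinstance(offers, dict):
                offers = [offers]
            availability = [
                str(offer.get("availability", ""))
                for offer in offers
                if isinstance(offer, dict)
            ]
            # Assumption: "no alternative" means no offer is still in stock.
            if availability and all(a.endswith("OutOfStock") for a in availability):
                return True
    return False
```

Using `endswith` lets the check accept both `https://schema.org/OutOfStock` and the bare `OutOfStock` form.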

### 3. Status 200 with <100 words of content
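Fewer than 100 words of visible content. Google publishes no fixed word count for thin content, so treat 100 words as a heuristic cutoff rather than an official threshold. Flagged as THIN.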
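
A minimal sketch of a word-count check, assuming "content" means visible text with `<script>`, `<style>`, `<noscript>`, and `<template>` stripped out. The names `visible_word_count` and `is_thin` are hypothetical, and the 100-word default simply mirrors the cutoff named above.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Counts whitespace-separated words outside non-visible elements."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def visible_word_count(html: str) -> int:
    parser = _VisibleText()
    parser.feed(html)
    parser.close()  # flush any text buffered at the end of the input
    return parser.words


def is_thin(html: str, min_words: int = 100) -> bool:
    """True if the page has fewer than min_words visible words."""
    return visible_word_count(html) < min_words
```

Note that this counts header and footer words too, so a page with a long navigation menu can clear the cutoff while still having no real body.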

### 4. Status 200 with empty main element
Main, article, or #content element exists but is empty or has only whitespace/boilerplate. Flagged as EMPTY.
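
The whitespace-only case can be sketched with the standard-library HTML parser. This version assumes reasonably well-nested markup (unclosed `<p>` tags would throw off the depth counter) and does not attempt the "boilerplate" part of the rule; `main_is_empty` is a hypothetical name.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _MainText(HTMLParser):
    """Counts non-whitespace characters inside main, article, or #content."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # > 0 while inside a main container
        self.found = False      # whether any container was seen
        self.chars = 0
        self._in_code = False   # inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag in ("script", "style"):
            self._in_code = True
        if self.depth:
            self.depth += 1
        elif tag in ("main", "article") or dict(attrs).get("id") == "content":
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in ("script", "style"):
            self._in_code = False
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and not self._in_code:
            self.chars += len(data.strip())


def main_is_empty(html: str) -> bool:
    """True if a main container exists but holds no visible text."""
    parser = _MainText()
    parser.feed(html)
    parser.close()
    return parser.found and parser.chars == 0
```

Pages with no main container at all return `False` here; they are a different problem from an empty one.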

### 5. Status 200 with boilerplate-only
Page is all header + footer + sidebar, no unique body. Detected by comparing to the site's template (if crawled). Flagged as BOILERPLATE.
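
The post does not say how the template comparison works, so the following is only one plausible approach: treat any text fragment that appears on most crawled pages as template, then count how many words on a page fall outside it. The fragment representation (one string per text block), the 80% share, and the 20-word floor are all assumptions.

```python
from collections import Counter


def boilerplate_fragments(pages: list[list[str]], min_share: float = 0.8) -> set[str]:
    """Fragments that appear on at least min_share of the crawled pages."""
    counts = Counter(frag for page in pages for frag in set(page))
    cutoff = min_share * len(pages)
    return {frag for frag, seen in counts.items() if seen >= cutoff}


def is_boilerplate_only(page: list[str], template: set[str],
                        min_unique_words: int = 20) -> bool:
    """True if the page has almost no words outside the shared template."""
    unique_words = sum(len(frag.split()) for frag in page if frag not in template)
    return unique_words < min_unique_words
```

This needs several crawled pages to be meaningful; with a single page, every fragment looks like template.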

### 6. Status 200 with redirect-via-JS
Server returns 200 but a JS `window.location.href = …` fires in `<head>`. Flagged as JS-REDIRECT (tool also tests the resulting URL).
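
A static scan can catch the common forms of this without executing any JavaScript. The sketch below looks only at scripts inside `<head>` and only at string-literal targets. It also matches `location.replace(...)` and `location.assign(...)`, which the post does not mention, and it cannot tell whether the redirect sits behind a condition. `js_redirect_target` is a hypothetical name.

```python
import re

HEAD_RE = re.compile(r"<head\b[^>]*>(.*?)</head>", re.IGNORECASE | re.DOTALL)
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)

# Group 1: assignment form (location = "...", window.location.href = "...").
# Group 2: call form (location.replace("..."), location.assign("...")).
JS_REDIRECT_RE = re.compile(
    r"""\b(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""
    r"""|\b(?:window\.|document\.)?location\.(?:replace|assign)\(\s*["']([^"']+)["']"""
)


def js_redirect_target(html: str):
    """Return the redirect URL set by a script in <head>, or None."""
    head = HEAD_RE.search(html)
    if not head:
        return None
    for script in SCRIPT_RE.findall(head.group(1)):
        match = JS_REDIRECT_RE.search(script)
        if match:
            return match.group(1) or match.group(2)
    return None
```

Comparisons such as `window.location.href == "/a"` are not matched, because the pattern requires a quote right after a single `=`.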

### 7. Status 200 with meta-refresh
Server returns 200 but `<meta http-equiv="refresh">` fires immediately. Handled the same way as the JS redirect in pattern 6.
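
Detecting a meta refresh comes down to parsing the `content` attribute, which has the shape `delay; url=target`. A minimal sketch, assuming the first refresh tag on the page is the one that matters; `meta_refresh` is a hypothetical name.

```python
from html.parser import HTMLParser


class _MetaRefresh(HTMLParser):
    """Records the first <meta http-equiv="refresh"> as (seconds, url)."""

    def __init__(self):
        super().__init__()
        self.refresh = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.refresh is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        delay, _, rest = attr.get("content", "").partition(";")
        try:
            seconds = float(delay.strip())
        except ValueError:
            return  # no usable delay, so not a refresh we can interpret
        url = rest.strip()
        if url.lower().startswith("url="):
            url = url[4:].strip().strip("'\"")
        self.refresh = (seconds, url)


def meta_refresh(html: str):
    """Return (delay_seconds, target_url) or None if there is no refresh."""
    parser = _MetaRefresh()
    parser.feed(html)
    parser.close()
    return parser.refresh
```

A delay of `0` is the "fires immediately" case described above; an empty URL means the page reloads itself.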

## Why this matters

Google's crawler mostly handles these correctly *on its end* — it deindexes them. The problem for you is:

1. **Crawl budget** — Googlebot still wastes time fetching soft 404s. Large sites lose significant crawl capacity.
2. **Sitemap quality** — soft 404s in your sitemap signal to Google that your sitemap isn't trustworthy, which affects how aggressively the rest of your URLs get crawled.
3. **Internal linking** — pages still linking to soft-404 URLs waste internal PageRank.
4. **User trust** — users arriving from older search results hit empty pages, bounce, and cost you.

## How to use it

1. Go to [/tools/soft-404-detector/](/tools/soft-404-detector/)
2. Paste your sitemap URL (or site root — tool auto-discovers)
3. Tool crawls up to 100 URLs, rate-limited
4. Report groups findings by pattern
5. Export CSV for prioritization
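
If you want to reproduce the same flow in a script, the steps above can be sketched as follows. This is not the tool's code: `check` is a hypothetical callable you supply (it fetches a URL and returns a pattern label or `None`), the one-second delay is an assumed rate limit, and sitemap index files are not followed.

```python
import csv
import io
import time
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str, limit: int = 100) -> list[str]:
    """Extract up to `limit` <loc> values from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", SITEMAP_NS) if el.text]
    return locs[:limit]


def crawl(urls, check, delay_seconds: float = 1.0):
    """Run check(url) on each URL, pausing between requests."""
    findings = []
    for index, url in enumerate(urls):
        if index:
            time.sleep(delay_seconds)  # simple fixed-delay rate limit
        label = check(url)
        if label:
            findings.append((label, url))
    return sorted(findings)  # sorting groups rows by pattern label


def to_csv(findings) -> str:
    """Serialize (pattern, url) rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["pattern", "url"])
    writer.writerows(findings)
    return buffer.getvalue()
```

Passing the checker in as a function keeps the crawl loop independent of how pages are fetched, which also makes it easy to dry-run against a stub.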

For product pages that went out of stock, the right move is:
- **Short-term out-of-stock (weeks):** keep live, update schema availability, add a "notify me" signup
- **Permanent discontinuation:** 301 redirect to the category or closest alternative
- **SKU retirement where no alternative exists:** 410 (Gone) or 404, not soft-404
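
The three cases above reduce to a small decision function. The sketch below is illustrative only; `retirement_response` and its parameters are hypothetical, and it picks 410 for the last case even though 404 is equally acceptable per the list above.

```python
from typing import Optional, Tuple


def retirement_response(permanent: bool,
                        alternative_url: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Return (status_code, redirect_target) for an unavailable product page."""
    if not permanent:
        # Short-term out of stock: keep the page live and update its schema.
        return (200, None)
    if alternative_url:
        # Discontinued with a close substitute: permanent redirect.
        return (301, alternative_url)
    # Retired with nothing to point to: say so explicitly.
    return (410, None)
```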

## Related reading

- [Broken Link + Decay Scanner](/tools/broken-link-decay-scanner/) — outbound link rot
- [Sitewide Crawl Sampler](/tools/sitewide-crawl-sampler/) — broader crawl visibility
- [404 Page Quality Audit](/tools/404-page-quality-audit/) — audit your actual 404 page

## Fact-check notes and sources

- **Soft 404 definition (Google):** [Google Search Central — Soft 404 errors](https://developers.google.com/search/docs/crawling-indexing/http-network-errors#soft-404-errors).
- **Thin content (quality signal):** [Google Search Essentials — Spam policies (Thin content)](https://developers.google.com/search/docs/essentials/spam-policies).
- **410 Gone vs 404 Not Found:** [RFC 7231 section 6.5.9](https://www.rfc-editor.org/rfc/rfc7231#section-6.5.9).
- **Availability property (out of stock):** [schema.org/Offer](https://schema.org/Offer).

_This post is informational, not SEO-consulting advice. Mentions of Google, Googlebot, and similar products are nominative fair use. No affiliation is implied._


