I audited a public Netlify-hosted page recently. Pretty site. React SPA, Vite-built, single-bundle, deployed via Netlify's default static config. I ran the checks I always run, starting with a probe of the AI-discovery aux files. The numbers came back perfect.
Then I read the response bodies. Every "file" was the same 1,921 bytes. Every one was the SPA shell HTML, served with text/html and a 200 status. There was no real llms.txt, no signed .well-known/security.txt, no agent card. Just the index page, repeated, with a 200 OK rubber-stamp on each.
This is the SPA shell trap. It is everywhere on Netlify and Vercel and Cloudflare Pages, because the default catch-all rule for an SPA reads /* → /index.html 200. That rule exists so client-side routing works. It also tells the world you have files you don't have.
Why this is worse than a 404
A 404 is a clean signal. A crawler that gets a 404 for /llms.txt knows you don't publish one and moves on. The catch-all returns 200 with HTML, which is a different message: "this file exists, here is its body, treat it as authoritative."
Three groups read that message and act on it:
- AI crawlers. GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider. They check for llms.txt, ai.txt, and .well-known/ai-plugin.json to learn how you want them to behave. When the body is HTML, they either drop it (best case) or log it as malformed and skip your site for the next ingestion window (worst case).
- Audit tooling. Lighthouse, Mega Analyzer, the Web Almanac scraper, agent.dev. They surface "200 OK" as "file present" unless they sniff content-type and body content. Some don't.
- The platform. Netlify and Vercel both ingest your manifest.webmanifest and site.webmanifest for PWA-related dashboard signals. When those return HTML with a 200, the platform's own integrations occasionally emit weird states.
The site I audited had clean intentions. The owner clearly cared about AI discoverability. The Vite build produced a beautiful frontend. The catch-all rule, which they didn't write because Netlify infers it, made everything they cared about invisible.
How to detect it on your own site
Two minutes with curl, no tooling needed. Probe a path that absolutely cannot exist:
curl -sS -o /dev/null -w "%{http_code} %{size_download}b %{content_type}\n" \
"https://your-site.example/this-file-cannot-possibly-exist-9876543210"
If you get back 200 1921b text/html (or any HTML body with status 200), you have the SPA catch-all problem. A correctly configured site returns a 404 for that same path; the body can still be HTML (your 404 page), and that's fine for nonexistent content. The status code is what matters.
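Concretely, the two outcomes look like this (the 404's byte count is illustrative):

# SPA-trapped site: the shell comes back for a path that cannot exist.
200 1921b text/html
# Correctly configured site: a real not-found, even if the body is an HTML 404 page.
404 1220b text/html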
For the aux files specifically, repeat the probe for each:
for p in robots.txt llms.txt llms-full.txt ai.txt humans.txt feed.json \
.well-known/security.txt .well-known/ai.txt .well-known/llms.txt \
.well-known/agent-card.json .well-known/ai-plugin.json \
manifest.webmanifest site.webmanifest sitemap.xml; do
echo "$(curl -sS -o /dev/null -w '%{http_code} %{size_download}b' "https://your-site.example/$p") /$p"
done
Any path you don't actually publish should come back 404. Any path you do publish should come back 200 with the right content-type and a body that isn't your index HTML. Same body bytes across ten different paths is the smoking gun.
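If you want the duplicate-body check to be mechanical rather than eyeball-based, hash each response. A minimal sketch, using the same placeholder hostname as above and assuming sha256sum is on your PATH:

# Identical digests across different paths = every path is serving index.html.
for p in robots.txt llms.txt ai.txt .well-known/security.txt \
manifest.webmanifest site.webmanifest; do
echo "$(curl -sS "https://your-site.example/$p" | sha256sum | cut -d' ' -f1) /$p"
done

If every line carries the same digest, you're looking at the SPA shell on every path.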
The Mega Analyzer's gate
The Mega Analyzer and the Well-Known Audit both run this gate. It looks like this in the analyzer code:
// `body` is the fetched response text for the aux file being scored.
const head = body.slice(0, 400).toLowerCase();
const firstTag = body.trim().slice(0, 40);
const looksLikeHtmlErrorPage =
// Opens like an HTML document...
/<!doctype\s+html|<html[\s>]/.test(head) &&
// ...and does not begin with any token a legitimate aux file would use.
!/^(#|user-agent|sitemap|<\?xml|\{|<rss|<feed|<urlset)/i.test(firstTag);
if (looksLikeHtmlErrorPage) return null; // treat as missing, not present
If the body opens with a doctype or <html> tag and does not start with one of the legitimate first-tokens we expect (a comment, a User-agent line, a sitemap reference, an XML declaration, a JSON object, an RSS or Atom or sitemap root), it's the SPA shell, and we score it as missing. That's the only way to get a truthful score on sites that 200-everything.
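To see the gate behave on both cases, here is a quick sanity check that re-implements the same two regexes inline; it assumes node is on your PATH:

node -e '
const gate = (body) => {
  const head = body.slice(0, 400).toLowerCase();
  const firstTag = body.trim().slice(0, 40);
  return /<!doctype\s+html|<html[\s>]/.test(head) &&
    !/^(#|user-agent|sitemap|<\?xml|\{|<rss|<feed|<urlset)/i.test(firstTag);
};
console.log(gate("<!doctype html><html><body></body></html>")); // true: SPA shell
console.log(gate("User-agent: *\nDisallow:"));                  // false: real robots.txt
'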
The Netlify config that fixes it
For a Vite or Create-React-App SPA on Netlify, the default _redirects looks like:
/* /index.html 200
The fix is to NOT catch the aux files in that wildcard. Either let them 404 cleanly (if you don't publish them), or carve them out and serve their real content first. In _redirects:
# Aux files: serve real content with the right content-type and a 200,
# OR let Netlify return a real 404 if the file does not exist.
# These rules only fire when the file does not exist on disk.
/llms.txt /llms.txt 404
/llms-full.txt /llms-full.txt 404
/ai.txt /ai.txt 404
/.well-known/security.txt /.well-known/security.txt 404
/.well-known/ai.txt /.well-known/ai.txt 404
/.well-known/llms.txt /.well-known/llms.txt 404
/.well-known/agent-card.json /.well-known/agent-card.json 404
/.well-known/ai-plugin.json /.well-known/ai-plugin.json 404
/manifest.webmanifest /manifest.webmanifest 404
/site.webmanifest /site.webmanifest 404
# Then your SPA fallback for app routes.
/* /index.html 200
The 404 status on the carve-out lines means "if this path is not a real file on disk, return 404, not the SPA shell." The order matters; specific rules above the wildcard take precedence.
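After deploying, re-run the probe from earlier to confirm the carve-outs took. An unpublished path should now return a real 404:

curl -sS -o /dev/null -w "%{http_code} %{size_download}b %{content_type}\n" \
"https://your-site.example/llms.txt"
# Expect: 404 ... text/html (Netlify's 404 page), not 200 1921b text/html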
If you do publish the aux files, set the right content-type in netlify.toml so crawlers and validators get text/plain or application/json outright instead of having to sniff the body:
[[headers]]
  for = "/llms.txt"
  [headers.values]
    Content-Type = "text/plain; charset=utf-8"
    Cache-Control = "public, max-age=86400"

[[headers]]
  for = "/.well-known/security.txt"
  [headers.values]
    Content-Type = "text/plain; charset=utf-8"

[[headers]]
  for = "/.well-known/agent-card.json"
  [headers.values]
    Content-Type = "application/json; charset=utf-8"

[[headers]]
  for = "/.well-known/ai-plugin.json"
  [headers.values]
    Content-Type = "application/json; charset=utf-8"
The same pattern applies on Vercel (vercel.json headers and rewrites), Cloudflare Pages (_headers and _redirects), and Render (render.yaml routes).
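As one illustration, a minimal vercel.json headers sketch for two of the same content-types; the source paths here are examples, so adapt them to the files you actually publish and pair them with your own rewrite rules:

{
  "headers": [
    {
      "source": "/llms.txt",
      "headers": [
        { "key": "Content-Type", "value": "text/plain; charset=utf-8" }
      ]
    },
    {
      "source": "/.well-known/agent-card.json",
      "headers": [
        { "key": "Content-Type", "value": "application/json; charset=utf-8" }
      ]
    }
  ]
}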
Why this matters more in 2026 than it did in 2022
In 2022, the only thing reading /.well-known/security.txt was a researcher with a curl one-liner. Today the same path is read by AI ingestion pipelines that build supplier-trust scoring, by agent frameworks that look for agent-card.json to discover capabilities, and by browser extensions that surface ai.txt as a privacy signal. The cost of returning HTML with a 200 has moved from "minor confusion" to "actively misleading three different categories of automated reader."
The fix is fifteen lines of _redirects. The audit is one curl loop. The reason most sites are still wrong is that the default config makes the wrong behavior invisible.
I wrote about the AI-aux-file ecosystem and how publishers should think about it in The $100 Network, the third book in the Digital Empire trilogy. Chapter 17 covers the indexing-vs-ingestion split that makes the SPA shell trap so consequential right now.
Related reading
- The Well-Known Audit probes 12 standard .well-known/ paths and validates each one's content-type and body shape.
- Soft 404 Detection covers the related case where a 200 response carries thin or empty content.
- AI Crawler Access Auditor walks through the bot list to allow and the bot list to deny.
- The Mega Analyzer runs every aux-file probe in one pass and scores the gaps.
- The Hosting Indexing Health Checker covers the broader pattern: 404-template bleed, canonical bleed, noindex inheritance.
Fact-check notes and sources
- HTTP status code semantics for missing resources: RFC 9110 §15.5.5 (404 Not Found).
- security.txt format and discovery: RFC 9116.
- Netlify _redirects syntax and 404 carve-out behavior: Netlify documentation, Redirects and rewrites.
- llms.txt proposed format: llmstxt.org.
- Source site identifying details have been redacted; the technical pattern described here is generic to any SPA built with Vite, Next, Astro, Nuxt, or Remix on a static host with a default catch-all rule.
This post is informational, not security-consulting advice. Probe your own site only, or sites you have written authorization to assess.