When 200 OK Lies: The SPA Shell Trap That Hides /llms.txt and /.well-known/ From AI Crawlers

I audited a public Netlify-hosted site recently. Pretty site: React SPA, Vite-built, single bundle, deployed with Netlify's default static config. I ran the checks I always run, starting with a probe of the AI-discovery aux files. The numbers came back perfect: 200s across the board.

Then I read the response bodies. Every "file" was the same 1,921 bytes. Every one was the SPA shell HTML, served with text/html and a 200 status. There was no robots.txt-aware llms.txt, no signed .well-known/security.txt, no agent card. Just the index page, repeated, with a 200 OK rubber-stamp on each.

This is the SPA shell trap. It is everywhere on Netlify and Vercel and Cloudflare Pages, because the default catch-all rule for an SPA reads /* → /index.html 200. That rule exists so client-side routing works. It also tells the world you have files you don't have.

Why this is worse than a 404

A 404 is a clean signal. A crawler that gets a 404 for /llms.txt knows you don't publish one and moves on. The catch-all returns 200 with HTML, which is a different message: "this file exists, here is its body, treat it as authoritative."

Three groups read that message and act on it:

  1. AI crawlers. GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider. They check for llms.txt, ai.txt, .well-known/ai-plugin.json to learn how you want them to behave. When the body is HTML, they either drop it (best case) or log it as malformed and skip your site for the next ingestion window (worst case).
  2. Audit tooling. Lighthouse, Mega Analyzer, the Web Almanac scraper, agent.dev. They surface "200 OK" as "file present" unless they sniff content-type and body content. Some don't.
  3. The platform. Netlify and Vercel both ingest your manifest.webmanifest and site.webmanifest for PWA-related dashboard signals. When those return HTML with a 200, the platform's own integrations occasionally emit weird states.

The site I audited had clean intentions. The owner clearly cared about AI discoverability. The Vite build produced a beautiful frontend. The catch-all rule, which they didn't write because Netlify infers it, made everything they cared about invisible.

How to detect it on your own site

Two minutes with curl, no tooling needed. Probe a path that absolutely cannot exist:

curl -sS -o /dev/null -w "%{http_code} %{size_download}b %{content_type}\n" \
  "https://your-site.example/this-file-cannot-possibly-exist-9876543210"

If you get back 200 1921b text/html (or any HTML body with status 200), you have the SPA catch-all problem: the site just claimed that a file exists at a path that cannot exist. A correctly configured site returns 404 ... text/html for that same path. The 404 body can be HTML or plain text; the status code is what matters.
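
By way of illustration, here are the two outcomes side by side; the byte counts are invented and will vary:

# SPA catch-all in place: the impossible path "exists"
200 1921b text/html

# Honest static hosting: the impossible path is missing
404 279b text/html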

For the aux files specifically, repeat the probe for each:

for p in robots.txt llms.txt llms-full.txt ai.txt humans.txt feed.json \
         .well-known/security.txt .well-known/ai.txt .well-known/llms.txt \
         .well-known/agent-card.json .well-known/ai-plugin.json \
         manifest.webmanifest site.webmanifest sitemap.xml; do
  echo "$(curl -sS -o /dev/null -w '%{http_code} %{size_download}b' "https://your-site.example/$p") /$p"
done

Any path you don't actually publish should come back 404. Any path you do publish should come back 200 with the right content-type and a body that isn't your index HTML. Identical body bytes across all fourteen paths are the smoking gun.
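
On the site that prompted this post, the loop printed one byte count for every path. The shape of the output (abbreviated) was:

200 1921b /robots.txt
200 1921b /llms.txt
200 1921b /.well-known/security.txt
200 1921b /.well-known/agent-card.json
200 1921b /sitemap.xml
...

Fourteen paths, one size: the shell answering for all of them.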

The Mega Analyzer's gate

The Mega Analyzer and the Well-Known Audit both run this gate. It looks like this in the analyzer code:

const head = body.slice(0, 400).toLowerCase();  // enough bytes to catch a doctype or <html> opener
const firstTag = body.trim().slice(0, 40);      // the body's first token, leading whitespace stripped
const looksLikeHtmlErrorPage =
  /<!doctype\s+html|<html[\s>]/.test(head) &&
  !/^(#|user-agent|sitemap|<\?xml|\{|<rss|<feed|<urlset)/i.test(firstTag);
if (looksLikeHtmlErrorPage) return null;  // treat as missing, not present

If the body opens with a doctype or <html> tag and does not start with one of the legitimate first-tokens we expect (a comment, a User-agent line, a sitemap reference, an XML declaration, a JSON object, an RSS or Atom or sitemap root), it's the SPA shell, and we score it as missing. That's the only way to get a truthful score on sites that 200-everything.
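
If you want to run the same gate from a script, here is a minimal standalone sketch. The function name and shape are mine, not the analyzer's API, and it assumes Node 18+ or any runtime with a global fetch:

async function probeAuxFile(origin: string, path: string): Promise<string | null> {
  const res = await fetch(new URL(path, origin));
  if (res.status !== 200) return null;  // an honest 404: genuinely missing
  const body = await res.text();
  const head = body.slice(0, 400).toLowerCase();
  const firstTag = body.trim().slice(0, 40);
  const isSpaShell =
    /<!doctype\s+html|<html[\s>]/.test(head) &&
    !/^(#|user-agent|sitemap|<\?xml|\{|<rss|<feed|<urlset)/i.test(firstTag);
  return isSpaShell ? null : body;  // the shell scores as missing, not present
}

// null means missing or shell-masked; a string is real content.
const llms = await probeAuxFile("https://your-site.example", "/llms.txt");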

The Netlify config that fixes it

For a Vite or Create-React-App SPA on Netlify, the default _redirects looks like:

/*    /index.html   200

The fix is to keep the aux files out of that wildcard. Either let them 404 cleanly (if you don't publish them), or carve them out and serve their real content first. In _redirects:

# Aux files: serve real content with the right content-type and a 200,
# OR let Netlify return a real 404 if the file does not exist.
# These rules only fire when the file does not exist on disk.
/llms.txt                       /llms.txt                       404
/llms-full.txt                  /llms-full.txt                  404
/ai.txt                         /ai.txt                         404
/.well-known/security.txt       /.well-known/security.txt       404
/.well-known/ai.txt             /.well-known/ai.txt             404
/.well-known/llms.txt           /.well-known/llms.txt           404
/.well-known/agent-card.json    /.well-known/agent-card.json    404
/.well-known/ai-plugin.json     /.well-known/ai-plugin.json     404
/manifest.webmanifest           /manifest.webmanifest           404
/site.webmanifest               /site.webmanifest               404

# Then your SPA fallback for app routes.
/*                              /index.html                     200

The 404 status on the carve-out lines means "if this path is not a real file on disk, return 404, not the SPA shell." Order matters: Netlify processes _redirects top to bottom and stops at the first match, so the specific rules must sit above the wildcard. The wildcard itself still answers 200 for unknown app routes, and that is intentional, because client-side routing needs it; the carve-outs fix the machine-read paths, which is where the false 200 does the damage.

If you do publish the aux files, set the right content-type in netlify.toml so readers get an unambiguous MIME type instead of having to sniff the body:

[[headers]]
  for = "/llms.txt"
  [headers.values]
    Content-Type = "text/plain; charset=utf-8"
    Cache-Control = "public, max-age=86400"

[[headers]]
  for = "/.well-known/security.txt"
  [headers.values]
    Content-Type = "text/plain; charset=utf-8"

[[headers]]
  for = "/.well-known/agent-card.json"
  [headers.values]
    Content-Type = "application/json; charset=utf-8"

[[headers]]
  for = "/.well-known/ai-plugin.json"
  [headers.values]
    Content-Type = "application/json; charset=utf-8"

The same pattern applies on Vercel (vercel.json headers and rewrites), Cloudflare Pages (_headers and _redirects), and Render (render.yaml routes).
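
For Vercel specifically, a hedged vercel.json sketch of the same carve-out looks like this. The negative-lookahead source pattern is one common way to keep the fallback rewrite off the aux paths; treat it as a starting point and test it on a preview deploy, since I have not verified it against every revision of Vercel's routing:

{
  "rewrites": [
    {
      "source": "/((?!llms\\.txt|llms-full\\.txt|ai\\.txt|\\.well-known/.*|manifest\\.webmanifest|site\\.webmanifest|sitemap\\.xml).*)",
      "destination": "/index.html"
    }
  ],
  "headers": [
    {
      "source": "/llms.txt",
      "headers": [
        { "key": "Content-Type", "value": "text/plain; charset=utf-8" },
        { "key": "Cache-Control", "value": "public, max-age=86400" }
      ]
    }
  ]
}

On current Vercel behavior, static files are served before rewrites are considered, so published aux files come through untouched, and the excluded paths that don't exist fall through to a real 404 instead of the shell.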

Why this matters more in 2026 than it did in 2022

In 2022, the only thing reading /.well-known/security.txt was a researcher with a curl one-liner. Today the same path is read by AI ingestion pipelines that build supplier-trust scoring, by agent frameworks that look for agent-card.json to discover capabilities, and by browser extensions that surface ai.txt as a privacy signal. The cost of returning HTML with a 200 has moved from "minor confusion" to "actively misleading three different categories of automated reader."

The fix is fifteen lines of _redirects. The audit is one curl loop. The reason most sites are still wrong is that the default config makes the wrong behavior invisible.

I wrote about the AI-aux-file ecosystem and how publishers should think about it in The $100 Network, the third book in the Digital Empire trilogy. Chapter 17 covers the indexing-vs-ingestion split that makes the SPA shell trap so consequential right now.

This post is informational, not security-consulting advice. Probe your own site only, or sites you have written authorization to assess.
