The AI Posture Audit originally did one thing: fetch robots.txt, ai.txt, meta robots, and the X-Robots-Tag header, then cross-reference them per bot to flag crawl-directive disagreements. Useful, narrow, clear.
Then the question kept coming up: "okay, robots.txt is correct, but is my llms.txt present? Does it follow the spec? Am I serving humans.txt? Does my security.txt point somewhere? And now that I've run four separate tools and have four separate AI fix prompts — how do I merge them into a single plan for the LLM?"
The answer turned out to be: expand the tool's scope to audit the whole AI-discovery surface, and emit one master prompt that consolidates every finding into a single paste.
What the tool now fetches
Thirteen files in one run, concurrent:
| File | Purpose |
|---|---|
| /robots.txt | Crawl directives per bot (baseline) |
| /ai.txt | Training-data-use policy (Spawning spec) |
| /.well-known/ai.txt | RFC 8615 mirror of /ai.txt |
| /llms.txt | llmstxt.org — curated index of retrievable pages |
| /llms-full.txt | Expanded llms.txt with chapter summaries |
| /.well-known/llms.txt | RFC 8615 mirror of llms.txt |
| /humans.txt | humanstxt.org — team + authorship identity |
| /.well-known/security.txt | RFC 9116 — security contact (AEO trust signal) |
| /.well-known/agent-card.json | Machine-readable identity for AI agents |
| /.well-known/ai-plugin.json | OpenAI plugin manifest |
| /sitemap.xml | XML sitemap — core discovery file |
| /feed.xml | RSS/Atom feed — syndication + AI content-freshness |
| /feed.json | JSON Feed — modern syndication |
Plus the original per-page signals (meta robots, X-Robots-Tag header), so 15 signals in total per audit.
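The tool's internals aren't shown in this post, but the fetch stage is simple to sketch. Here's a minimal Python version (the names `DISCOVERY_PATHS` and `audit_discovery_surface` are mine, not the tool's): the 13 paths are fetched concurrently, with the fetcher injected so the logic is testable offline.

```python
from concurrent.futures import ThreadPoolExecutor

# The 13 discovery paths the audit fetches in one run.
DISCOVERY_PATHS = [
    "/robots.txt",
    "/ai.txt",
    "/.well-known/ai.txt",
    "/llms.txt",
    "/llms-full.txt",
    "/.well-known/llms.txt",
    "/humans.txt",
    "/.well-known/security.txt",
    "/.well-known/agent-card.json",
    "/.well-known/ai-plugin.json",
    "/sitemap.xml",
    "/feed.xml",
    "/feed.json",
]

def audit_discovery_surface(origin, fetch):
    """Fetch every discovery path concurrently.

    `fetch` is injected (e.g. a thin wrapper around an HTTP client),
    so the audit logic itself needs no network access to test.
    """
    with ThreadPoolExecutor(max_workers=len(DISCOVERY_PATHS)) as pool:
        results = pool.map(lambda p: (p, fetch(origin + p)), DISCOVERY_PATHS)
    return dict(results)
```

One fetcher call per path, all in flight at once; a cold origin answers the whole surface in roughly one round-trip's worth of latency.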
The Mega Analyzer now fetches the same discovery surface (I backported the three newest files — security.txt, agent-card.json, ai-plugin.json — once the AI Posture Audit proved they were worth tracking). Running either tool gives you the same file-presence matrix; the AI Posture Audit focuses on the directive layer while the Mega Analyzer folds it into a broader single-URL SEO + schema + E-E-A-T audit.
Each file gets three things in the results:
- HTTP status — present (green), missing (red), or weird (amber for 403/500/redirects)
- File size — useful for spotting empty or placeholder files
- Purpose — a one-liner reminding you what this file's role is, so you know why missing matters
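The traffic-light bucketing can be captured in a few lines. This is a sketch of the rule as described above, not the tool's actual code:

```python
def classify_status(status, is_redirect=False):
    """Map an HTTP response to the audit's traffic-light buckets."""
    if is_redirect or status in (403, 500):
        return "amber"   # present-ish but weird: blocked, erroring, or redirecting
    if 200 <= status < 300:
        return "green"   # present
    return "red"         # missing (404 and friends)
```

The amber bucket matters most in practice: a 403 on /llms.txt usually means a WAF rule is blocking the very retrievers the file exists to serve.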
For /llms.txt specifically, the tool also runs the llmstxt.org structural validator and flags issues inline — "3 structural issue(s) — see master prompt." A file that's present but malformed is worse than a missing file because the retrieval engine tries to use it and fails silently.
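For a sense of what the structural checks cover, here is a rough Python approximation of the llmstxt.org rules named in this post (H1 title, blockquote description, H2 sections, absolute link URLs). The real validator is more thorough; this is illustrative only:

```python
import re

def validate_llms_txt(text):
    """Rough structural checks in the spirit of the llmstxt.org format."""
    issues = []
    lines = [l for l in text.splitlines() if l.strip()]
    if not lines or not lines[0].startswith("# "):
        issues.append("missing H1 title on the first line")
    if not any(l.startswith("> ") for l in lines):
        issues.append("missing blockquote description")
    if not any(l.startswith("## ") for l in lines):
        issues.append("no H2 sections (flat URL dump)")
    # Markdown links should use absolute URLs so retrievers can fetch them.
    for url in re.findall(r"\]\(([^)]+)\)", text):
        if not url.startswith(("http://", "https://")):
            issues.append(f"relative URL: {url}")
    return issues
```

A bare list of paths with no title fails three of these checks at once, which is exactly the "3 structural issue(s)" case the UI flags.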
Why "master prompt" beats six separate prompts
Before this change, a thorough AI-surface audit required:
- Run AI Posture Audit → copy its fix prompt
- Run .well-known audit → copy that prompt
- Run llms.txt structural validator → copy that prompt
- Run Mega Analyzer → copy the indexing hygiene bits
- Paste all four into Claude separately, or concatenate them by hand
Each separate prompt arrived with its own preamble, its own constraints, its own site-context declaration. Claude would reason about each in isolation. You'd get four fix plans that sometimes conflicted — one would recommend adding llms.txt, another would recommend prioritizing ai.txt, and you'd have to pick an order yourself.
A consolidated prompt gives the LLM the full picture on a single pass:
- What crawl directives are live
- What training-data directive is live
- Whether the llms.txt structure is spec-compliant
- Which identity/discovery files are missing
- Any files the user has already regenerated (from the new regenerate panel)
- Explicit "do not flag X as a conflict" guidance for the common false-positive (robots.txt allow + ai.txt disallow is the AEO-friendly pattern, not a mistake)
Claude or ChatGPT then produces one fix plan with internal prioritization: critical disagreements first, then missing files that meaningfully move AEO citation probability, then nice-to-haves. No merging or reconciling by hand.
The prompt structure
The master prompt has six sections, assembled from the same data the tool displays in the UI:
- Crawl directives — robots.txt + meta robots + X-Robots-Tag verbatim
- Training directive — ai.txt verbatim (plus a one-liner reminding the LLM these govern different things)
- llms.txt — file contents + structural validation results
- Identity + discovery files — presence/size for all 13 files
- Per-bot posture summary — references back to the detailed conflict matrix in the UI
- Regenerated files (if user produced them via the regenerate panel) — the NEW robots.txt + ai.txt, so the LLM validates them against the audit findings rather than reinventing from scratch
Followed by a TASK section that declares constraints (no code-only guidance, curl verification per fix, don't flag the AEO pattern as a conflict) and priority grouping (critical → important → nice).
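Assembly is just ordered concatenation with empty sections dropped. A minimal sketch (function name and section headings are mine, assuming the structure described above):

```python
def build_master_prompt(sections, task):
    """Assemble the audit sections plus a TASK block into one prompt.

    `sections` is an ordered mapping of heading -> verbatim content;
    empty sections are skipped (e.g. no regenerated files yet).
    """
    parts = []
    for heading, body in sections.items():
        if body:
            parts.append(f"## {heading}\n{body}")
    parts.append(f"## TASK\n{task}")
    return "\n\n".join(parts)
```

Skipping empty sections is deliberate: a "Regenerated files: (none)" stub invites the LLM to speculate about files that don't exist.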
A 2,100-byte audit context on a typical site produces a ~6-8 KB master prompt. Easily fits into any chat window. Claude and ChatGPT both handle it in one turn.
The regenerate panel feeds the master prompt too
The AI Posture Audit also has a regenerate panel: audit a site, see what's currently configured, pick a new stance (allow all / AEO-friendly / disallow all / custom), generate fresh robots.txt + ai.txt with the exact same UX as the standalone AI Bot Policy Generator.
The "Use current state" preset (the default after an audit runs) pre-fills every bot with whatever your site currently tells it — so you can see the entire observed-directive picture in one matrix, edit the bots you want to change, and emit a new file that preserves everything else. No more silent-drops where a regenerated file accidentally omits a bot your canonical set had.
When you regenerate, the master prompt automatically updates to include both the audit findings AND the new files. The LLM then validates: do the regenerated files resolve every disagreement from the audit? Are any edge cases the user missed?
So the full round-trip becomes:
- Audit → see disagreements + missing files
- Regenerate → pick stance + customize (starts from observed reality, not a generic preset)
- Copy master prompt → LLM produces the plan
- Deploy → re-audit to confirm alignment
Four clicks, one paste, one validation. The tool that used to surface problems now also closes the loop with recommended fixes it produced itself and validates them via the LLM.
The bot catalog: 22 named crawlers
The regenerate panel and the standalone AI Bot Policy Generator both cover 22 explicitly-named AI / answer-engine crawlers:
- OpenAI — GPTBot, ChatGPT-User, OAI-SearchBot
- Anthropic — ClaudeBot, Claude-Web, anthropic-ai
- Google — Google-Extended, Googlebot
- Perplexity — PerplexityBot
- Meta — Meta-ExternalAgent, FacebookBot
- Apple — Applebot-Extended, Applebot
- ByteDance — Bytespider
- Amazon — Amazonbot
- Common Crawl — CCBot
- Cohere — cohere-ai
- You.com — YouBot
- DuckDuckGo — DuckAssistBot
- Mistral — MistralAI-User
- Diffbot — Diffbot (knowledge-graph extraction, widely used in LLM pipelines)
- Timpi — Timpibot (decentralized search index)
That's the minimum AI-crawler surface as of 2026. Any site running the "allow all" or "AEO-friendly / train-hostile" preset emits files naming all 22 bots. Earlier versions of the catalog dropped several bots; the current catalog is an explicit superset. If a new bot launches, adding one entry to the bot catalog puts it in every future regeneration.
The earlier version of the regenerate panel dropped four bots (YouBot, FacebookBot, Diffbot, Timpibot) from its catalog, which meant a regenerated file could silently lose coverage versus a hand-curated robots.txt. That was a real bug: I caught it by running the panel against jwatte.com's own robots.txt and seeing that the new output was narrower than the file I'd spent months curating. Fix: expanded the catalog, un-gated FacebookBot from the legacy-only flag, and added MistralAI-User. Every regeneration now produces a file with at least the same coverage as a careful hand-written version, and typically more.
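That regression class is easy to test for mechanically. A sketch of the check I now run (hypothetical helper, comparing User-agent coverage between the old and regenerated files):

```python
def coverage_regression(old_robots, new_robots):
    """Return user-agents named in the old robots.txt but absent from
    the regenerated one: the silent-drop bug described above."""
    def agents(text):
        return {
            line.split(":", 1)[1].strip().lower()
            for line in text.splitlines()
            if line.lower().startswith("user-agent:")
        }
    return agents(old_robots) - agents(new_robots)
```

An empty set means the new file is a coverage superset; anything else names exactly the bots that fell out.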
What the tool still won't tell you
Audit scope is the AI-discovery surface at the ORIGIN level. It doesn't tell you:
- Whether individual pages have per-page meta robots disagreeing with the site-wide policy
- Whether specific URLs are 404ing, soft-404ing, or redirecting — that's the Mega Analyzer's Indexing Hygiene tab
- Which URLs in your sitemap are currently not indexed by Google — that's the Search Console Importer
- Whether your pages actually meet WCAG — that's the WCAG Audit
So "master prompt" means "master of the AI-posture surface," not "audit the entire site in one call." For a full-site audit, run the Mega Analyzer.
Common findings I see on real sites
Patterns from audits on my own properties and the 40+ other sites I've run this against:
1. robots.txt present, ai.txt missing. Most common. Site has a reasonable robots.txt stance but has never declared a training-data policy. Default interpretation by training crawlers is "unspecified = permitted." Fix: generate an ai.txt with the AI Bot Policy Generator or use the regenerate panel on the Posture Audit directly.
2. llms.txt present, structure broken. Site published /llms.txt thinking presence was enough. Missing H1 title, no blockquote description, flat URL dump with no sections, relative URLs. Retrievers parse the file and silently deprioritize the site because the structure is malformed. Fix: reorganize into the canonical structure.
3. llms.txt present, /.well-known/llms.txt mirror missing. Retrievers checking RFC 8615 well-known locations first miss the file. Fix: copy the file to both paths or configure a server alias.
4. security.txt missing. Not directly an AI-posture signal, but AEO source-quality scoring weights it — it's a proxy for "this site takes itself seriously." Adding a 4-line security.txt is 5 minutes of work and lifts trust signals measurably.
5. humans.txt missing. Similarly boring, similarly weighted. Authorship-identity files help LLMs attribute the site to a verifiable team, which matters for E-E-A-T.
6. agent-card.json missing. Newer file, less critical, but some AI agent frameworks (including MCP-adjacent tooling) prefer sites with a declared agent card. Worth shipping on a personal/author site.
The master prompt surfaces all of these in one call. No hunting through six separate audit reports.
Why this evolution took longer than it should have
Honestly, I should have built the tool with the wider discovery-file scope from day one. The reason I didn't: when I shipped the first version, the crawl-parity problem was what I was personally fighting through (robots.txt vs ai.txt disagreements on my own sites), so I scoped the tool to that.
Three weeks later the question "what about my llms.txt" came up for a client audit and I realized the tool needed to grow. Another week after that I hit the same "copy six prompts" friction myself, and the master prompt became obvious.
The lesson: audit tools drift toward comprehensiveness because real use cases are comprehensive. Starting narrow is fine — but treat the narrow version as a pilot, not the final shape.
Related reading
- Mark It N/A: dismiss audit checks that don't apply — companion UX pattern for filtering findings you don't care about
- Export your audit — scans as JSON for reproducibility
- AI-posture consistency explained — the crawl-vs-training distinction and why robots.txt + ai.txt say different things
- llms.txt structural spec — H1 / blockquote / H2 sections / link-list format
- AI Bot Policy Generator — standalone tool for building the aligned files from scratch
Fact-check notes and sources
- Spawning ai.txt spec: site.spawning.ai/spaces/ai-txt
- llmstxt.org specification: llmstxt.org
- humanstxt.org: humanstxt.org
- RFC 8615 ("Well-Known URIs"): datatracker.ietf.org/doc/html/rfc8615
- RFC 9116 (security.txt): datatracker.ietf.org/doc/html/rfc9116
- JSON Feed specification: jsonfeed.org
- AI Agent Card / agent-card.json emerging convention: documented in the Anthropic MCP ecosystem
Run the AI Posture Audit on your site. Copy the master prompt at the top. Paste into Claude or ChatGPT. Get one consolidated fix plan instead of six.