The AI Posture Audit originally did one thing: fetch robots.txt, ai.txt, meta robots, and the X-Robots-Tag header, then cross-reference them per bot to flag crawl-directive disagreements. Useful, narrow, clear.
Then the question kept coming up: "okay, robots.txt is correct, but is my llms.txt present? Does it follow the spec? Am I serving humans.txt? Does my security.txt point somewhere? And now that I've run four separate tools and have four separate AI fix prompts — how do I merge them into a single plan for the LLM?"
The answer turned out to be: expand the tool's scope to audit the whole AI-discovery surface, and emit one master prompt that consolidates every finding into a single paste.
What the tool now fetches
Thirteen files in one run, concurrent:
| File | Purpose |
|---|---|
| /robots.txt | Crawl directives per bot (baseline) |
| /ai.txt | Training-data-use policy (Spawning spec) |
| /.well-known/ai.txt | RFC 8615 mirror of /ai.txt |
| /llms.txt | llmstxt.org — curated index of retrievable pages |
| /llms-full.txt | Expanded llms.txt with chapter summaries |
| /.well-known/llms.txt | RFC 8615 mirror of llms.txt |
| /humans.txt | humanstxt.org — team + authorship identity |
| /.well-known/security.txt | RFC 9116 — security contact (AEO trust signal) |
| /.well-known/agent-card.json | Machine-readable identity for AI agents |
| /.well-known/ai-plugin.json | OpenAI plugin manifest |
| /sitemap.xml | XML sitemap — core discovery file |
| /feed.xml | RSS/Atom feed — syndication + AI content-freshness |
| /feed.json | JSON Feed — modern syndication |
Plus the original per-page signals (meta robots, X-Robots-Tag header), so 15 signals in total per audit.
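The tool's internals aren't shown in this post, but the fetch stage is simple to sketch. Here's a minimal Python version (the names `DISCOVERY_PATHS` and `audit_discovery_surface` are mine, not the tool's): the 13 paths are fetched concurrently, with the fetcher injected so the logic is testable offline.

```python
from concurrent.futures import ThreadPoolExecutor

# The 13 discovery paths the audit fetches in one run.
DISCOVERY_PATHS = [
    "/robots.txt",
    "/ai.txt",
    "/.well-known/ai.txt",
    "/llms.txt",
    "/llms-full.txt",
    "/.well-known/llms.txt",
    "/humans.txt",
    "/.well-known/security.txt",
    "/.well-known/agent-card.json",
    "/.well-known/ai-plugin.json",
    "/sitemap.xml",
    "/feed.xml",
    "/feed.json",
]

def audit_discovery_surface(origin, fetch):
    """Fetch every discovery path concurrently.

    `fetch` is injected (e.g. a thin wrapper around an HTTP client),
    so the audit logic itself needs no network access to test.
    """
    with ThreadPoolExecutor(max_workers=len(DISCOVERY_PATHS)) as pool:
        results = pool.map(lambda p: (p, fetch(origin + p)), DISCOVERY_PATHS)
    return dict(results)
```

One fetcher call per path, all in flight at once; a cold origin answers the whole surface in roughly one round-trip's worth of latency.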
The Mega Analyzer now fetches the same discovery surface (I backported the three newest files — security.txt, agent-card.json, ai-plugin.json — once the AI Posture Audit proved they were worth tracking). Running either tool gives you the same file-presence matrix; the AI Posture Audit focuses on the directive layer while the Mega Analyzer folds it into a broader single-URL SEO + schema + E-E-A-T audit.
Each file gets three things in the results:
- HTTP status — present (green), missing (red), or weird (amber for 403/500/redirects)
- File size — useful for spotting empty or placeholder files
- Purpose — a one-liner reminding you what this file's role is, so you know why missing matters
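The traffic-light bucketing can be captured in a few lines. This is a sketch of the rule as described above, not the tool's actual code:

```python
def classify_status(status, is_redirect=False):
    """Map an HTTP response to the audit's traffic-light buckets."""
    if is_redirect or status in (403, 500):
        return "amber"   # present-ish but weird: blocked, erroring, or redirecting
    if 200 <= status < 300:
        return "green"   # present
    return "red"         # missing (404 and friends)
```

The amber bucket matters most in practice: a 403 on /llms.txt usually means a WAF rule is blocking the very retrievers the file exists to serve.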
For /llms.txt specifically, the tool also runs the llmstxt.org structural validator and flags issues inline — "3 structural issue(s) — see master prompt." A file that's present but malformed is worse than a missing file because the retrieval engine tries to use it and fails silently.
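For a sense of what the structural checks cover, here is a rough Python approximation of the llmstxt.org rules named in this post (H1 title, blockquote description, H2 sections, absolute link URLs). The real validator is more thorough; this is illustrative only:

```python
import re

def validate_llms_txt(text):
    """Rough structural checks in the spirit of the llmstxt.org format."""
    issues = []
    lines = [l for l in text.splitlines() if l.strip()]
    if not lines or not lines[0].startswith("# "):
        issues.append("missing H1 title on the first line")
    if not any(l.startswith("> ") for l in lines):
        issues.append("missing blockquote description")
    if not any(l.startswith("## ") for l in lines):
        issues.append("no H2 sections (flat URL dump)")
    # Markdown links should use absolute URLs so retrievers can fetch them.
    for url in re.findall(r"\]\(([^)]+)\)", text):
        if not url.startswith(("http://", "https://")):
            issues.append(f"relative URL: {url}")
    return issues
```

A bare list of paths with no title fails three of these checks at once, which is exactly the "3 structural issue(s)" case the UI flags.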
Why "master prompt" beats six separate prompts
Before this change, a thorough AI-surface audit required:
- Run AI Posture Audit → copy its fix prompt
- Run .well-known audit → copy that prompt
- Run llms.txt structural validator → copy that prompt
- Run Mega Analyzer → copy the indexing hygiene bits
- Paste all four into Claude separately, or concatenate them by hand
Each separate prompt arrived with its own preamble, its own constraints, its own site-context declaration. Claude would reason about each in isolation. You'd get four fix plans that sometimes conflicted — one would recommend adding llms.txt, another would recommend prioritizing ai.txt, and you'd have to pick an order yourself.
A consolidated prompt gives the LLM the full picture on a single pass:
- What crawl directives are live
- What training-data directive is live
- Whether the llms.txt structure is spec-compliant
- Which identity/discovery files are missing
- Any files the user has already regenerated (from the new regenerate panel)
- Explicit "do not flag X as a conflict" guidance for the common false-positive (robots.txt allow + ai.txt disallow is the AEO-friendly pattern, not a mistake)
Claude or ChatGPT then produces one fix plan with internal prioritization: critical disagreements first, then missing files that meaningfully move AEO citation probability, then nice-to-haves. No merging or reconciling by hand.
The prompt structure
The master prompt has six sections, assembled from the same data the tool displays in the UI:
- Crawl directives — robots.txt + meta robots + X-Robots-Tag verbatim
- Training directive — ai.txt verbatim (plus a one-liner reminding the LLM these govern different things)
- llms.txt — file contents + structural validation results
- Identity + discovery files — presence/size for all 13 files
- Per-bot posture summary — references back to the detailed conflict matrix in the UI
- Regenerated files (if user produced them via the regenerate panel) — the NEW robots.txt + ai.txt, so the LLM validates them against the audit findings rather than reinventing from scratch
Followed by a TASK section that declares constraints (no code-only guidance, curl verification per fix, don't flag the AEO pattern as a conflict) and priority grouping (critical → important → nice).
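Assembly is just ordered concatenation with empty sections dropped. A minimal sketch (function name and section headings are mine, assuming the structure described above):

```python
def build_master_prompt(sections, task):
    """Assemble the audit sections plus a TASK block into one prompt.

    `sections` is an ordered mapping of heading -> verbatim content;
    empty sections are skipped (e.g. no regenerated files yet).
    """
    parts = []
    for heading, body in sections.items():
        if body:
            parts.append(f"## {heading}\n{body}")
    parts.append(f"## TASK\n{task}")
    return "\n\n".join(parts)
```

Skipping empty sections is deliberate: a "Regenerated files: (none)" stub invites the LLM to speculate about files that don't exist.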
A 2,100-byte audit context on a typical site produces a ~6-8 KB master prompt. Easily fits into any chat window. Claude and ChatGPT both handle it in one turn.
The regenerate panel feeds the master prompt too
The AI Posture Audit also has a regenerate panel: audit a site, see what's currently configured, pick a new stance (allow all / AEO-friendly / disallow all / custom), generate fresh robots.txt + ai.txt with the exact same UX as the standalone AI Bot Policy Generator.
The "Use current state" preset (the default after an audit runs) pre-fills every bot with whatever your site currently tells it — so you can see the entire observed-directive picture in one matrix, edit the bots you want to change, and emit a new file that preserves everything else. No more silent-drops where a regenerated file accidentally omits a bot your canonical set had.
When you regenerate, the master prompt automatically updates to include both the audit findings AND the new files. The LLM then validates: do the regenerated files resolve every disagreement from the audit? Are any edge cases the user missed?
So the full round-trip becomes:
- Audit → see disagreements + missing files
- Regenerate → pick stance + customize (starts from observed reality, not a generic preset)
- Copy master prompt → LLM produces the plan
- Deploy → re-audit to confirm alignment
Four clicks, one paste, one validation. The tool that used to surface problems now also closes the loop with recommended fixes it produced itself and validates them via the LLM.
The bot catalog: 22 named crawlers
The regenerate panel and the standalone AI Bot Policy Generator both cover 22 explicitly-named AI / answer-engine crawlers:
- OpenAI — GPTBot, ChatGPT-User, OAI-SearchBot
- Anthropic — ClaudeBot, Claude-Web, anthropic-ai
- Google — Google-Extended, Googlebot
- Perplexity — PerplexityBot
- Meta — Meta-ExternalAgent, FacebookBot
- Apple — Applebot-Extended, Applebot
- ByteDance — Bytespider
- Amazon — Amazonbot
- Common Crawl — CCBot
- Cohere — cohere-ai
- You.com — YouBot
- DuckDuckGo — DuckAssistBot
- Mistral — MistralAI-User
- Diffbot — Diffbot (knowledge-graph extraction, widely used in LLM pipelines)
- Timpi — Timpibot (decentralized search index)
That's the minimum AI-crawler surface as of 2026. Any site running the "allow all" or "AEO-friendly / train-hostile" preset emits files naming all 22 bots. Earlier versions of the catalog dropped several bots; the current catalog is an explicit superset. If a new bot launches, adding one entry to the bot catalog puts it in every future regeneration.
The earlier version of the regenerate panel dropped four bots (YouBot, FacebookBot, Diffbot, Timpibot) from its catalog, which meant a regenerated file could silently lose coverage versus a hand-curated robots.txt. That was a real bug: I caught it by running the panel against jwatte.com's own robots.txt and seeing that the new output was narrower than the file I'd spent months curating. Fix: expanded the catalog, un-gated FacebookBot from the legacy-only flag, and added MistralAI-User. Every regeneration now produces a file with at least the same coverage as a careful hand-written version, and typically more.
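That regression class is easy to test for mechanically. A sketch of the check I now run (hypothetical helper, comparing User-agent coverage between the old and regenerated files):

```python
def coverage_regression(old_robots, new_robots):
    """Return user-agents named in the old robots.txt but absent from
    the regenerated one: the silent-drop bug described above."""
    def agents(text):
        return {
            line.split(":", 1)[1].strip().lower()
            for line in text.splitlines()
            if line.lower().startswith("user-agent:")
        }
    return agents(old_robots) - agents(new_robots)
```

An empty set means the new file is a coverage superset; anything else names exactly the bots that fell out.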
What the tool still won't tell you
Audit scope is the AI-discovery surface at the ORIGIN level. It doesn't tell you:
- Whether individual pages have per-page meta robots disagreeing with the site-wide policy
- Whether specific URLs are 404ing, soft-404ing, or redirecting — that's the Mega Analyzer's Indexing Hygiene tab
- Which URLs in your sitemap are currently not indexed by Google — that's the Search Console Importer
- Whether your pages actually meet WCAG — that's the WCAG Audit
So "master prompt" means "master of the AI-posture surface," not "audit the entire site in one call." For a full-site audit, run the Mega Analyzer.
Common findings I see on real sites
Patterns from audits on my own properties and the 40+ other sites I've run this against:
1. robots.txt present, ai.txt missing. Most common. Site has a reasonable robots.txt stance but has never declared a training-data policy. Default interpretation by training crawlers is "unspecified = permitted." Fix: generate an ai.txt with the AI Bot Policy Generator or use the regenerate panel on the Posture Audit directly.
2. llms.txt present, structure broken. Site published /llms.txt thinking presence was enough. Missing H1 title, no blockquote description, flat URL dump with no sections, relative URLs. Retrievers parse the file and silently deprioritize the site because the structure is malformed. Fix: reorganize into the canonical structure.
3. llms.txt present, /.well-known/llms.txt mirror missing. Retrievers checking RFC 8615 well-known locations first miss the file. Fix: copy the file to both paths or configure a server alias.
4. security.txt missing. Not directly an AI-posture signal, but AEO source-quality scoring weights it — it's a proxy for "this site takes itself seriously." Adding a 4-line security.txt is 5 minutes of work and lifts trust signals measurably.
5. humans.txt missing. Similarly boring, similarly weighted. Authorship-identity files help LLMs attribute the site to a verifiable team, which matters for E-E-A-T.
6. agent-card.json missing. Newer file, less critical, but some AI agent frameworks (including MCP-adjacent tooling) prefer sites with a declared agent card. Worth shipping on a personal/author site.
The master prompt surfaces all of these in one call. No hunting through six separate audit reports.
Why this evolution took longer than it should have
Honestly, I should have built the tool with the wider discovery-file scope from day one. The reason I didn't: when I shipped the first version, the crawl-parity problem was what I was personally fighting through (robots.txt vs ai.txt disagreements on my own sites), so I scoped the tool to that.
Three weeks later the question "what about my llms.txt" came up for a client audit and I realized the tool needed to grow. Another week after that I hit the same "copy six prompts" friction myself, and the master prompt became obvious.
The lesson: audit tools drift toward comprehensiveness because real use cases are comprehensive. Starting narrow is fine — but treat the narrow version as a pilot, not the final shape.
Related reading
- Mark It N/A: dismiss audit checks that don't apply — companion UX pattern for filtering findings you don't care about
- Export your audit — scans as JSON for reproducibility
- AI-posture consistency explained — the crawl-vs-training distinction and why robots.txt + ai.txt say different things
- llms.txt structural spec — H1 / blockquote / H2 sections / link-list format
- AI Bot Policy Generator — standalone tool for building the aligned files from scratch
Fact-check notes and sources
- Spawning ai.txt spec: site.spawning.ai/spaces/ai-txt
- llmstxt.org specification: llmstxt.org
- humanstxt.org: humanstxt.org
- RFC 8615 ("Well-Known URIs"): datatracker.ietf.org/doc/html/rfc8615
- RFC 9116 (security.txt): datatracker.ietf.org/doc/html/rfc9116
- JSON Feed specification: jsonfeed.org
- AI Agent Card / agent-card.json emerging convention: documented in the Anthropic MCP ecosystem
Run the AI Posture Audit on your site. Copy the master prompt at the top. Paste into Claude or ChatGPT. Get one consolidated fix plan instead of six.