Most sites that think they have an "AI bot policy" actually have three half-configured files that contradict each other. A common real-world state:
- robots.txt disallows GPTBot (copy-pasted from a blog post in 2023)
- ai.txt does not exist, so the default is "opt in to everything"
- /.well-known/ai.txt also does not exist
- <meta name="robots" content="index,follow"> has no AI-specific directives
A crawler hitting the site reads the most permissive signal. You think you blocked OpenAI training. You did not.
AI Bot Policy Generator emits all three files from one set of per-bot decisions, so they agree after deploy.
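As a sketch of the idea — not the tool's actual internals — suppose the per-bot decisions live in a dict mapping each user-agent token to a crawl stance and a training stance (this input format is an assumption). The robots.txt half of the emission might look like:

```python
# Sketch only: the decision format below is hypothetical, not the tool's real input.
# Training-only tokens such as Google-Extended are still expressed as ordinary
# robots.txt groups; the "train" stance would feed the ai.txt emitters (not shown).

def emit_robots_txt(decisions: dict[str, dict[str, bool]]) -> str:
    """decisions maps a user-agent token to {"crawl": bool, "train": bool}."""
    blocks = []
    for bot, stance in decisions.items():
        rule = "Allow: /" if stance["crawl"] else "Disallow: /"
        blocks.append(f"User-agent: {bot}\n{rule}")
    return "\n\n".join(blocks) + "\n"

policy = {
    "GPTBot": {"crawl": False, "train": False},           # block OpenAI training crawler
    "Google-Extended": {"crawl": False, "train": False},  # training opt-out token
    "PerplexityBot": {"crawl": True, "train": True},
}
print(emit_robots_txt(policy))
```

Because all three files derive from the one `policy` dict, they cannot drift apart at generation time; drift can only be introduced later, at deploy.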
What the tool covers
The tool covers 22+ named AI crawlers. Each gets an explicit crawl stance (is the bot allowed to fetch pages?) and an explicit training stance (is the content allowed to be used for model training?). The two are different — Google-Extended lets you opt out of training while still letting Googlebot crawl for search. Separating them is the whole point.
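The split shows up directly in robots.txt. A minimal example (the blanket `Disallow: /` and `Allow: /` paths are illustrative; a real policy may scope them):

```
# Opt out of Google AI training via the Google-Extended token,
# while leaving ordinary search crawling untouched.
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
```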
The named bots include: GPTBot (OpenAI), ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-Web, ClaudeBot-User, PerplexityBot, Perplexity-User, Google-Extended, Applebot-Extended, Bytespider, Amazonbot, CCBot, MistralAI-User, Diffbot, FacebookBot, Meta-ExternalAgent, YouBot, Bingbot (with IndexNow), Yandex, DuckAssistBot, and more.
Three files, one source of truth
robots.txt — the classic, standardized as the Robots Exclusion Protocol (RFC 9309). Written as User-agent: <bot> / Disallow: <path> blocks. Honored by well-behaved crawlers, which includes most major AI bots.
ai.txt — the site-ai.org spec. A plain-text declaration of training preferences, scoped to AI training use. Emerging standard; site-ai.org is the source.
/.well-known/ai.txt — the same content as /ai.txt, but at the RFC 8615 well-known URIs location. Some crawlers check one path, some the other, and some both. Emitting at both eliminates the guesswork.
The tool also emits <meta name="robots"> and X-Robots-Tag hints so server-level headers agree with the text files.
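As a server-level sketch of that hint, using nginx syntax for illustration (note that the `noai` and `noimageai` tokens are emerging conventions, not universally honored; the exact token set the tool emits follows your per-bot decisions):

```
# nginx example: send the header alongside the text files
location / {
    add_header X-Robots-Tag "noai, noimageai";
}
```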
Opt-out semantics are not uniform
An important gotcha: some AI companies honor the Disallow: directive as both crawl and training opt-out (OpenAI's GPTBot). Others treat training opt-out as a separate signal (Google-Extended is explicitly "training-only"). Reading each company's docs is the only way to know which mode applies. The tool's output links to each company's official bot documentation so you can verify.
After deploy, audit for drift
Emit the three files, deploy, then run AI Posture Audit to confirm all three agree. If your CDN rewrites robots.txt or injects an X-Robots-Tag header, the audit catches the disagreement before a crawler does.
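A minimal sketch of one such drift check, assuming you have already fetched the file bodies (HTTP fetching and CDN challenge handling omitted): parse robots.txt just far enough to answer "is this bot fully disallowed?", then compare the verdict between the origin copy and the CDN-served copy.

```python
def robots_disallows(robots_txt: str, bot: str) -> bool:
    """True if robots.txt contains a 'Disallow: /' rule in a group naming `bot`.
    Deliberately minimal: a real parser must also handle Allow precedence,
    path wildcards, and the '*' group per RFC 9309."""
    bot = bot.lower()
    agents: list[str] = []
    in_rules = False
    disallowed = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a rule line ended the previous group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and bot in agents:
                disallowed = True
    return disallowed

origin = "User-agent: GPTBot\nDisallow: /\n"
cdn_copy = "User-agent: GPTBot\nAllow: /\n"   # hypothetical CDN rewrite
drift = robots_disallows(origin, "GPTBot") != robots_disallows(cdn_copy, "GPTBot")
print("drift detected" if drift else "copies agree")
```

The same comparison applies to the two ai.txt locations, where the check is even simpler: the bodies should be byte-identical.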
Quarterly re-runs are the right cadence. New major AI bots ship every few months; your policy should cover them before they start crawling a site that has stated no stance for them to honor.
The "I'm a solo publisher, do I actually need this" answer
If you publish anything with revenue implications (product pages, book pages, paid service pages, a newsletter signup), yes. AI-generated summaries of your pages can reduce downstream click-through. You decide which crawlers may contribute to that summarization, and the three files are how that decision is communicated.
If you're pure ad-supported and every eyeball is valuable, lean permissive. If you sell a book or a product where the sale requires the buyer landing on your page, lean restrictive on the training axis while staying permissive on the crawl axis. The tool separates those two decisions per bot.
Related reading
- AI Posture Audit — verify the three files agree after deploy
- AI Crawler Access Auditor — per-bot allow/block verdict with CDN challenge detection
- ai.txt Generator — the simpler single-file predecessor
- Robots/LLM Drift Diff — before/after diff on your three files
Fact-check notes and sources
- OpenAI bot docs: platform.openai.com/docs/bots
- Anthropic ClaudeBot docs: docs.anthropic.com
- Google crawlers reference: developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
- ai.txt spec: site-ai.org
- RFC 8615 — Well-Known URIs: datatracker.ietf.org/doc/html/rfc8615
- RFC 9309 — Robots Exclusion Protocol: rfc-editor.org/rfc/rfc9309.html
The $100 Network treats AI bot policy as part of publisher infrastructure, not an afterthought. The three-file policy is the bedrock; everything downstream (Perplexity citations, Google AIO appearance, LLM training contribution) branches from it.