# Generate a Spec-Compliant /llms.txt From Your Sitemap in One Click

Point the generator at your sitemap. It fetches each URL's title and meta description, groups the URLs by path pattern into H2 sections, and emits a spec-compliant /llms.txt with an H1 title, blockquote description, and link-list formatting. It also diffs the result against any existing /llms.txt.

Author: J.A. Watte
Published: April 29, 2026
Source: https://jwatte.com/blog/blog-tool-llms-txt-generator/

---

_Part of the [AEO / GEO / AI-search audit tool stack](/blog/blog-new-aeo-audit-tools-2026/). See the pillar post for the full catalog of sibling audits and where this one fits in the lineup._

The [llms.txt Validator](/tools/llms-txt-validator/) tells you your file is broken. The [LLMs.txt Generator](/tools/llms-txt-generator/) gives you a correct one — built automatically from your sitemap, titles, and meta descriptions.

## What the generator does

1. Fetches your `sitemap.xml` via the serverless proxy.
2. Caps the crawl at the first N URLs (default 80, configurable up to 150) to keep the run fast.
3. Fetches each URL and extracts `<title>` and `<meta name="description">`.
4. Groups URLs by first path segment (`/docs/*` → "Docs", `/blog/*` → "Blog", `/` → "Core").
5. Emits a spec-compliant `/llms.txt` with H1 (your site name), blockquote description, H2 per group, and markdown link rows with title + description.
6. If a live `/llms.txt` exists at your domain, shows a diff: URLs to add, URLs to remove.
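
Steps 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration, not the generator's actual code: the names `sitemap_urls` and `extract_meta` are hypothetical, the functions take already-fetched text rather than doing the network calls through the proxy, and sitemap index files are not handled.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace used by <urlset> documents in the sitemap protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str, cap: int = 80) -> list[str]:
    """Return the first `cap` <loc> values from a sitemap document."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    return locs[:cap]


class _MetaParser(HTMLParser):
    """Collects the <title> text and the meta description of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.description = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_meta(html: str) -> tuple[str, str]:
    """Return (title, description); either may be empty if the tag is missing."""
    parser = _MetaParser()
    parser.feed(html)
    return parser.title.strip(), parser.description
```

Pages with no `<title>` or no meta description come back with empty strings, so the caller can decide whether to skip them or emit a link row without a description.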

## Why auto-generation beats hand-writing

Hand-written `/llms.txt` files drift. You publish new content, forget to update the file, and retrievers see yesterday's catalog. Auto-generation from the sitemap means the file stays in sync with what you're actually publishing — as long as you rerun the generator periodically (monthly is plenty for most sites).

For sites with a build pipeline (Eleventy, Next.js, Astro), the pattern is: run the generator, save the output to a file in your source tree (for example `src/llms.txt.njk` in an Eleventy project), commit, and ship on the next deploy. Automate it as a CI step if the site updates daily.
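
For the CI variant, one possible shape is a helper that overwrites the checked-in file only when the generated text differs, so the pipeline commits only on real changes. This is a sketch under assumptions: `write_if_changed` is a hypothetical name, and how the generated text reaches the script is left open.

```python
from pathlib import Path


def write_if_changed(path: str, generated: str) -> bool:
    """Write `generated` to `path`; return True only if the file changed."""
    target = Path(path)
    if target.exists() and target.read_text(encoding="utf-8") == generated:
        return False  # nothing new to commit
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(generated, encoding="utf-8")
    return True
```

A CI job could call `write_if_changed("src/llms.txt.njk", text)` and run its commit step only when the result is `True`.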

## The diff mode

If `/llms.txt` already exists at your domain, the diff tab shows:

- **URLs to add.** In sitemap, not in current file. New content you haven't published to llms.txt yet.
- **URLs to remove.** In current file, not in sitemap. Deleted or deprecated content still referenced.

Copy the generated file over the live one and push. The validator should now pass 12/12 checks.
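
The add/remove comparison in this section amounts to two set differences. The sketch below assumes the URLs in the current file appear as markdown links and that matching is by exact string, so a trailing-slash mismatch would show up in both lists; `linked_urls` and `diff_urls` are hypothetical names, not the tool's internals.

```python
import re

# Captures the target of a markdown link: [text](url)
# Assumption: URLs contain no whitespace and no closing parenthesis.
LINK_RE = re.compile(r"\[[^\]]*\]\((\S+?)\)")


def linked_urls(llms_txt: str) -> set[str]:
    """Collect every URL referenced by a markdown link in the file."""
    return set(LINK_RE.findall(llms_txt))


def diff_urls(sitemap: list[str], current_llms_txt: str) -> dict[str, list[str]]:
    """Return URLs to add (sitemap only) and to remove (file only)."""
    listed = linked_urls(current_llms_txt)
    in_sitemap = set(sitemap)
    return {
        "add": sorted(in_sitemap - listed),
        "remove": sorted(listed - in_sitemap),
    }
```

Normalizing URLs before comparing (lowercasing the host, settling on one trailing-slash convention) would reduce false positives, at the cost of hiding genuine mismatches between the sitemap and the file.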

## Grouping choice

The default "group by first path segment" works for most sites. If your IA doesn't match URL paths (e.g. `/posts/foo` and `/posts/bar` are actually in different conceptual categories), you'll want to edit the output H2 section names manually before shipping.
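
To make the default concrete, here is a rough sketch of first-segment grouping and the emitted layout. The function names and the `pages` dictionary shape are assumptions for illustration. The sketch also assumes that every first segment becomes its own H2 and that hyphens in a segment turn into spaces, which the tool may handle differently.

```python
from urllib.parse import urlparse


def group_name(url: str) -> str:
    """Map a URL to its H2 section: first path segment, or "Core" for the root."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return "Core"
    return segments[0].replace("-", " ").title()


def render_llms_txt(site_name: str, summary: str, pages: list[dict]) -> str:
    """Render H1, blockquote, then one H2 per group with markdown link rows."""
    groups: dict[str, list[dict]] = {}
    for page in pages:
        groups.setdefault(group_name(page["url"]), []).append(page)

    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for name, members in groups.items():
        lines.append(f"## {name}")
        lines.append("")
        for page in members:
            row = f"- [{page['title']}]({page['url']})"
            if page.get("description"):
                row += f": {page['description']}"
            lines.append(row)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Because the H2 names come straight from `group_name`, renaming a section by hand after generation is a one-line edit in the output.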

Future improvement: read the Open Graph `article:section` property or a `<meta name="category">` tag if present, so authors can self-categorize without relying on the path. This is not yet implemented; add it as a template override if you need it.
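
Since this is not implemented in the tool, the following is only one way such an override might look. It assumes the category lives in a `<meta>` tag keyed by `article:section` (the property the Open Graph protocol defines for article sections), the non-standard `og:section` spelling, or `name="category"`, and falls back to the first path segment otherwise. `section_for` is a hypothetical name.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Meta keys treated as an author-supplied category (assumed, not from the tool).
CATEGORY_KEYS = {"article:section", "og:section", "category"}


class _CategoryParser(HTMLParser):
    """Records the first non-empty author-supplied category found in <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.category = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.category:
            return
        attr = dict(attrs)
        key = (attr.get("property") or attr.get("name") or "").lower()
        if key in CATEGORY_KEYS:
            self.category = (attr.get("content") or "").strip() or None


def section_for(url: str, html: str) -> str:
    """Prefer the page's own category; otherwise use the first path segment."""
    parser = _CategoryParser()
    parser.feed(html)
    if parser.category:
        return parser.category
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0].title() if segments else "Core"
```

With this in place, `/posts/foo` and `/posts/bar` can land in different sections as long as each page declares its own category.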

## Related reading

- [LLMs.txt Validator](/tools/llms-txt-validator/) — 12 structural checks on the output
- [AI Posture Audit](/tools/ai-posture-audit/) — broader discovery-surface audit
- [ai.txt Generator](/tools/ai-txt-gen/) — companion training-policy file
- [llms.txt structural spec](/blog/blog-llms-txt-structure-spec/) — format reference

## Fact-check notes and sources

- llmstxt.org specification: [llmstxt.org](https://llmstxt.org)
- Sitemap protocol: [sitemaps.org/protocol.html](https://www.sitemaps.org/protocol.html)
- RFC 8615 (Well-Known URIs): [datatracker.ietf.org/doc/html/rfc8615](https://datatracker.ietf.org/doc/html/rfc8615)

---

*The $100 Network covers llms.txt as a site-network deliverable — one template, per-site fills. The generator is the template; the validator is the gate.*


