# Why llms-full.txt Matters More Than llms.txt For Actual LLM Retrieval

llms.txt is a short pointer file. llms-full.txt is the long-form content map LLMs actually consume. Sites that publish both are ingested more reliably and cited more accurately than sites that publish only the short form — or neither.

Author: J.A. Watte
Published: April 23, 2026
Source: https://jwatte.com/blog/blog-tool-llms-full-txt-coverage-audit/

---

The llms.txt proposal (from Jeremy Howard, llmstxt.org) and the conventions that grew up around it give sites two files to publish:

- **llms.txt** — short-form discovery file. Points to the most important URLs. A map.
- **llms-full.txt** — long-form content map. Includes full content summaries, inline narrative, the actual substance.

Most sites that adopt the convention publish only llms.txt. That's a map with no terrain. The LLM that fetches it gets a table of contents; if it wants substance, it has to crawl the linked pages individually.

**llms-full.txt is where actual retrieval happens.** A well-written llms-full.txt is a 20-100KB text file containing summaries of every major content area, editorial voice, business description, author context — everything an LLM needs to understand your site without a full crawl.

Sites that publish both files are the ones best positioned to be retrieved and cited in LLM-generated answers.

## What the [llms-full.txt Coverage Audit](/tools/llms-full-txt-coverage-audit/) does

Paste a domain. The tool:

1. Fetches `/llms.txt`, `/llms-full.txt`, and `/.well-known/llms.txt`.
2. Checks presence + size of each.
3. Validates llms-full.txt structure: H1, H2 sections, content summaries, outbound links.
4. Cross-references llms.txt URLs against llms-full.txt content.
5. Emits a starter scaffold if llms-full.txt is missing.
6. Emits an AI fix prompt that recommends edits based on actual findings.
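
Steps 1, 2, and 4 can be sketched in a few lines of Python. This is a minimal illustration, not the audit tool's actual implementation: the function names (`check_presence`, `missing_from_full`), the injectable `fetch` parameter, and the URL-matching regex are all assumptions made for the example.

```python
import re
import urllib.request
from typing import Callable, Optional

# The three locations named in step 1.
PATHS = ("/llms.txt", "/llms-full.txt", "/.well-known/llms.txt")

# Hypothetical, deliberately loose URL pattern for this sketch.
URL_RE = re.compile(r"https?://[^\s)\]>]+")


def http_fetch(url: str) -> Optional[str]:
    """Return the body as text, or None if the file is missing or unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # covers URLError, HTTPError (404 etc.), and timeouts
        return None


def check_presence(origin: str,
                   fetch: Callable[[str], Optional[str]] = http_fetch) -> dict:
    """Steps 1-2: record presence and byte size for each candidate path."""
    report = {}
    for path in PATHS:
        body = fetch(origin.rstrip("/") + path)
        size = len(body.encode("utf-8")) if body is not None else 0
        report[path] = {"present": body is not None, "bytes": size}
    return report


def urls_in(text: str) -> set:
    """Collect URLs, ignoring trailing slashes and sentence punctuation."""
    return {u.rstrip("/.,") for u in URL_RE.findall(text)}


def missing_from_full(llms_txt: str, llms_full_txt: str) -> list:
    """Step 4: URLs listed in llms.txt that never appear in llms-full.txt."""
    return sorted(urls_in(llms_txt) - urls_in(llms_full_txt))
```

Because `fetch` is a parameter, the check can be exercised offline by passing a dictionary's `.get` method in place of `http_fetch`. Comparing normalized URL sets, rather than doing a substring search, avoids a false match when one URL is a prefix of another.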

## The structure of a strong llms-full.txt

```markdown
# [Site Name]

> One-sentence description of the site's purpose and audience.

## About

> 50-100 word paragraph describing the business / publication / resource. Include founding year, scope, editorial slant.

## Key Content

- [Guide to Topic A](https://yoursite.com/guide-a/) — One-sentence summary of the page's value proposition.
- [Service Overview](https://yoursite.com/services/) — Pricing model, service area, expectations.

## Primary Services

### Service Name
> 3-5 sentence description: problem solved, who it's for, what's included, timeline, pricing.

## Case Studies

- [Case 1](...) — Outcome summary.

## Author / Publisher

> Who runs the site. Credentials, experience, external profiles.

## Updated

Last updated: YYYY-MM-DD
```
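
A structural check against this template might look like the sketch below. It is an assumption about how such validation could work, not the audit tool's code: `validate_structure` and the keys it returns are hypothetical names, and it only counts the elements the template uses (one H1, H2 sections, blockquote summaries, Markdown links).

```python
import re


def validate_structure(text: str) -> dict:
    """Count the structural elements the template above relies on."""
    lines = text.splitlines()
    # "# Title" is an H1; "## Section" is an H2. The \s+ after the hashes
    # keeps "##" from matching as H1 and "###" from matching as H2.
    h1 = [l for l in lines if re.match(r"#\s+\S", l)]
    h2 = [l for l in lines if re.match(r"##\s+\S", l)]
    # Blockquote lines ("> ...") carry the summaries in the template.
    summaries = [l for l in lines if re.match(r">\s*\S", l)]
    # Markdown links with an absolute http(s) target.
    links = re.findall(r"\[[^\]]+\]\(https?://[^)\s]+\)", text)
    return {
        "has_single_h1": len(h1) == 1,
        "h2_sections": [l[2:].strip() for l in h2],
        "summary_lines": len(summaries),
        "link_count": len(links),
    }
```

A file with no H1, no H2 sections, or zero links is a reasonable candidate for a "fails structure" flag, though where exactly to draw that line is a judgment call.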

Target size: 20-100KB. Under 3KB, the file is a stub. Over 200KB, it is probably too verbose and should be trimmed.
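
These thresholds translate into a simple classifier. The sketch below is illustrative: the function name and the labels are made up for the example, 1KB is assumed to mean 1024 bytes, and the two in-between bands (3-20KB and 100-200KB) are given neutral labels because the guidance above does not name them.

```python
def classify_size(num_bytes: int) -> str:
    """Map a file size in bytes onto the size bands described above."""
    kb = num_bytes / 1024  # assumption: 1KB = 1024 bytes
    if kb < 3:
        return "stub"
    if kb < 20:
        return "below target"   # label is an assumption for the 3-20KB gap
    if kb <= 100:
        return "target"
    if kb <= 200:
        return "above target"   # label is an assumption for the 100-200KB gap
    return "too verbose"
```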

## Why this specifically helps LLM retrieval

When an LLM (or an agent using the LLM) decides whether to retrieve from your site, it often fetches `/llms-full.txt` first to understand the site's scope. The decision to dig deeper is made based on what's in that file.

A thin llms.txt-only site looks like "this site exists, has some pages" — the LLM might or might not invest retrieval budget there.

A comprehensive llms-full.txt site looks like "this site's domain is well-mapped, here's exactly what it covers and where to look for what" — the LLM retrieves confidently.

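In the community benchmarks and AEO-monitoring studies listed in the notes below, the delta shows up in citation rates: sites that published llms-full.txt 3-6 months before the observation window typically see 2-3x the AI-citation rate of comparable sites publishing only llms.txt. These figures are observational, not controlled.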

## The maintenance cadence

- **New content shipped**: add an entry to llms-full.txt in the relevant section + link to it.
- **Monthly**: audit + refresh any section that references specific prices, service areas, or dates that might have changed.
- **Annually**: comprehensive revision. Rewrite intro to reflect current positioning. Update author bios. Prune obsolete entries.

A site that hasn't touched llms-full.txt in 18 months looks stale to retrievers. Freshness discipline applies here too.
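
A staleness check can key off the `Last updated: YYYY-MM-DD` line from the template. The following is a rough sketch under stated assumptions: the function names are hypothetical, the 18-month cutoff comes from the paragraph above, and a file with no parseable date is treated as stale, which is a choice made for this example rather than an established rule.

```python
import re
from datetime import date
from typing import Optional

# Matches the "Last updated: YYYY-MM-DD" line used in the template.
UPDATED_RE = re.compile(r"^Last updated:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def last_updated(text: str) -> Optional[date]:
    """Return the declared update date, or None if absent or invalid."""
    match = UPDATED_RE.search(text)
    if match is None:
        return None
    try:
        return date.fromisoformat(match.group(1))
    except ValueError:  # e.g. month 13
        return None


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def is_stale(text: str, today: date, max_months: int = 18) -> bool:
    """True if the file declares no valid date or is max_months or more old."""
    updated = last_updated(text)
    if updated is None:
        return True  # assumption: an undated file counts as stale
    return months_between(updated, today) >= max_months
```

Passing `today` explicitly, instead of calling `date.today()` inside the function, keeps the check deterministic when it runs in CI.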

## Who should prioritize this

**Should publish llms-full.txt:**
- Sites with 30+ pages
- Sites with clear editorial / service structure worth mapping
- Sites competing for AI-mediated queries where getting cited accurately matters

**Should probably skip:**
- Single-page landing sites
- Sites with <15 pages (llms.txt alone is sufficient)
- Sites that are genuinely agent-hostile by design (don't want to be retrieved)

Default, including for sites that fall between those page-count thresholds: if you're publishing llms.txt, also publish llms-full.txt. The marginal effort is small; the marginal benefit is meaningful.

## Related reading

- [llms.txt Validator](/tools/llms-txt-validator/) — short-form companion
- [llms.txt Quality Scorer](/tools/llms-txt-quality-scorer/) — quality rubric
- [llms.txt Generator](/tools/llms-txt-generator/) — generate the short-form file
- [RAG Readiness Audit](/blog/blog-tool-rag-readiness-audit/) — adjacent retrieval-readiness

## Fact-check notes and sources

- llms.txt proposal: [llmstxt.org](https://llmstxt.org) — Jeremy Howard + Answer.AI
- llms-full.txt convention: longer-form companion file that emerged alongside the llms.txt proposal
- Retrieval advantage observational: community benchmarks and AEO-monitoring studies (2025-2026)

*This post is informational, not AEO-consulting advice. Mentions of llmstxt.org, Answer.AI, OpenAI, Anthropic, Google, Perplexity are nominative fair use. No affiliation is implied.*

