llms.txt is the new robots.txt and most sites are doing it wrong

The llms.txt specification is barely a year old and it's already one of the most misimplemented files on the web. The idea is simple: give large language models a structured summary of your site so they know what to cite and how to describe you. In practice, most implementations are either missing entirely, structurally broken, or so stale they describe a version of the site that hasn't existed for months.

Why llms.txt matters now

When someone asks ChatGPT, Perplexity, or Gemini about your industry, the model decides in milliseconds which sources to trust and cite. Part of that decision comes from training data. But increasingly, AI systems are checking live files that sites publish specifically for machine consumption. That's what llms.txt is for.

Think of it as the difference between hoping a journalist finds your press kit and putting the press kit in their hands. Without llms.txt, an LLM has to infer what your site is about from crawled pages, which might be outdated, incomplete, or focused on the wrong content. With a well-structured llms.txt, you're telling the model directly: here's who we are, here's our best content, here's what we're authoritative on.

The structure most sites get wrong

The specification defines a clean format: an H1 title, a blockquote description, then H2 sections grouping your important URLs. Simple enough. But the quality scorer finds consistent problems:
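For reference, a minimal file that follows that shape looks like this (the company name, URLs, and page titles are invented for illustration):

```markdown
# Example Co

> Example Co builds widget-testing tools for embedded developers.

## Docs

- [Getting started](https://example.com/docs/start): Installation and first test run
- [API reference](https://example.com/docs/api): Full endpoint documentation

## Blog

- [Why we test widgets](https://example.com/blog/why): Our testing philosophy
```

That's the whole format: one H1, one blockquote, and H2 sections whose bullet lists pair a link with a short description.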

Missing H1. The title line gets omitted or formatted as plain text instead of a proper heading. Without it, parsers can't reliably identify the document.

Broken links. Sites publish llms.txt once and never update it. Pages get renamed, products get discontinued, blog posts get consolidated. Six months later, half the URLs in the file return 404s. An LLM that follows a broken link learns nothing except that your site doesn't maintain its references.

Wrong location. The spec supports both /llms.txt at the root and /.well-known/llms.txt. Some sites put it in neither location, or put it somewhere creative like /docs/llms.txt where no parser will find it.
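A checker only needs to probe those two locations. A small sketch of that logic (the helper name is mine, not from any particular tool):

```python
from urllib.parse import urljoin

# The two spec-sanctioned locations a parser will try.
# Anything else (e.g. /docs/llms.txt) is effectively invisible.
STANDARD_PATHS = ["/llms.txt", "/.well-known/llms.txt"]

def candidate_urls(site: str) -> list[str]:
    """Return the URLs a checker should probe, in priority order."""
    return [urljoin(site, path) for path in STANDARD_PATHS]

print(candidate_urls("https://example.com"))
# ['https://example.com/llms.txt', 'https://example.com/.well-known/llms.txt']
```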

Stale descriptions. The blockquote description says "We're a startup building X" when the company pivoted to Y eighteen months ago. The description is the first thing an LLM reads. If it's wrong, everything downstream is wrong.

What the quality scorer checks

The LLMs.txt Quality Scorer fetches all three standard locations and scores what it finds:

Structural compliance checks whether the document follows the H1 / blockquote / H2 / link-list format the spec requires. Malformed documents get partially parsed or ignored entirely.
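A rough version of that structural check fits in a few lines of Python. This is a simplification of the spec's rules, not the scorer's actual implementation:

```python
def check_structure(text: str) -> list[str]:
    """Flag the basic structural problems described above. Simplified sketch."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    # The spec requires the document to open with an H1 title.
    if not lines or not lines[0].startswith("# "):
        problems.append("missing H1 title")
    # A blockquote description should appear right after the title.
    if not any(line.startswith("> ") for line in lines[:3]):
        problems.append("missing blockquote description")
    # At least one H2 section grouping links is expected.
    if not any(line.startswith("## ") for line in lines):
        problems.append("no H2 sections")
    return problems

print(check_structure("Plain text, no heading"))  # flags all three problems
```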

Link validity follows every URL in the file and confirms it returns a 200. Dead links drag your score down and waste the LLM's context window on errors.
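Link checking is straightforward to script yourself. A hedged sketch using only the standard library, sequential and without retries or redirect policy for brevity:

```python
import re
import urllib.request

# Matches markdown links of the form [label](https://...)
LINK_RE = re.compile(r"\[([^\]]*)\]\((https?://[^)\s]+)\)")

def extract_links(text: str) -> list[str]:
    """Pull every markdown link target out of an llms.txt body."""
    return [m.group(2) for m in LINK_RE.finditer(text)]

def dead_links(text: str, timeout: float = 5.0) -> list[str]:
    """Return the URLs that do not answer with HTTP 200."""
    dead = []
    for url in extract_links(text):
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                if resp.status != 200:
                    dead.append(url)
        except Exception:
            dead.append(url)  # timeouts and 4xx/5xx count as dead
    return dead
```

Running `dead_links` over your own file every deploy catches the 404s before an LLM does.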

Freshness compares the document's apparent content against what's actually on the linked pages. If your llms.txt describes services you no longer offer, the tool flags it.

Completeness checks whether you're missing key sections that would help an LLM understand your site, like a section for your most important pages or your area of expertise.

The llms-full.txt companion

Some sites also publish /llms-full.txt, a longer version that includes more detailed content for LLMs that can handle larger context. The quality scorer checks for this too and evaluates whether the full version actually adds value beyond the summary or just duplicates it.

The distinction matters because different AI systems have different context budgets. A terse llms.txt that points to your five best pages serves a constrained model well. A comprehensive llms-full.txt that includes full article text serves a model with a large context window. Publishing both covers the range.

If you're building a site that needs to be visible to AI answer engines from day one, The $97 Launch ($9.99 on Kindle) walks through the full setup including machine-readable discovery files.

Fact-check notes and sources

  • The llms.txt specification was proposed by Jeremy Howard in September 2024 and published at llmstxt.org.
  • Perplexity, Anthropic Claude, and other AI systems have documented support for checking llms.txt and similar discovery files.
  • The .well-known URI registry is maintained by IANA under RFC 8615.


This post is informational, not SEO-consulting advice. Mentions of ChatGPT, Perplexity, Gemini, and other third parties are nominative fair use. No affiliation is implied.
