
llms.txt Explained: The 'robots.txt for AI' and How to Ship a Good One

17 min read · by MDflow · view as .md
[Figure: A glowing emerald wireframe document acting as a map, with light pathways branching to page nodes and a geometric AI agent core following one path, on a dark terminal-grid background]

Type "robots.txt for AI" into any search box and you land on llms.txt, the proposed standard everyone is suddenly shipping and almost nobody is shipping well. The pitch is seductive: drop one Markdown file at the root of your site and AI assistants will finally understand you. The reality is messier. Most llms.txt files are a copy-pasted sitemap with no descriptions, Google has said plainly that it will not read them, and a 2026 study found that 97% of the files in the wild received no requests at all during the study window.

So is it snake oil, or is it worth your time? The honest answer is that it depends entirely on who you expect to read it, and on whether you ship a good one. This is the third post in our machine-readable-web series, after the posts on Google's Open Knowledge Format and on MCP and A2A. Here is what llms.txt actually is, what a good one looks like (using ours as the worked example), the mistakes that make most of them useless, and how it differs from the files it keeps getting confused with.

TL;DR: llms.txt is a curated Markdown file at your site root that hands an AI agent a short, ranked map of your most useful pages. It is a map, not a gate: closer to a sitemap than to robots.txt, with no enforcement. Google does not use it and most files in the wild are never read, so it is not an SEO trick. Its real value is the agents you deliberately point at your site (IDE assistants, MCP clients, in-product copilots) and the discipline of pairing it with clean .md versions of your pages. MDflow ships llms.txt plus .md twins of every shared document natively.

What is llms.txt?

llms.txt is a Markdown file, served at the root of a domain (yoursite.com/llms.txt), that gives large language models a curated, human-readable guide to the content that matters on your site. It was proposed by Jeremy Howard, co-founder of Answer.AI, on September 3, 2024, and the spec lives at llmstxt.org.

The motivation is a context problem. An LLM trying to understand your site at inference time has to wade through HTML full of navigation, ads, and JavaScript — and even then, most sites are far too large to fit in a context window. As Howard put it, "constructing the right context for LLMs based on a website is ambiguous." The person who owns the site already knows which handful of pages answer the question; llms.txt is a place to write it down.

The format is deliberately tiny. There is exactly one required element — an H1 — and a small, predictable shape after it:

# Project Name

> A short summary of what this project is and who it is for.

A few optional sentences of context, in plain Markdown.

## Docs
- [Quickstart](https://example.com/quickstart.md): Get running in five minutes.
- [API reference](https://example.com/api.md): Every endpoint and field.

## Optional
- [Changelog](https://example.com/changelog.md): Version history.

The rule, verbatim from the spec: each section under an ## heading is a "file list" — "a markdown list, containing a required markdown hyperlink [name](url), then optionally a : and notes about the file." One section name is reserved: ## Optional, whose links "can be skipped if a shorter context is needed." That single convention turns the file into a priority list — an agent on a tight budget reads everything except Optional.
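
To make that shape concrete, here is a minimal sketch of how a cooperating agent might read it. The `LlmsTxt` class and the `parse_llms_txt` and `links` helpers are illustrative names of ours, not part of the spec or of any library, and the parser assumes each link sits on a single line (wrapped descriptions are ignored).

```python
import re
from dataclasses import dataclass, field

# One file-list entry: "- [name](url)" with optional ": notes" after it.
LINK_RE = re.compile(
    r"^\s*[-*]\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)"
    r"(?:\s*:\s*(?P<notes>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # section name -> list of (name, url, notes), in file order
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None  # name of the "## " section we are inside, if any
    for line in text.splitlines():
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and current is None:
            doc.summary = f"{doc.summary} {line[1:].strip()}".strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["name"], match["url"], match["notes"] or "")
                )
    return doc


def links(doc: LlmsTxt, include_optional: bool = True) -> list:
    """Flatten the file lists, dropping "Optional" for a tight context."""
    return [
        link
        for section, entries in doc.sections.items()
        if include_optional or section != "Optional"
        for link in entries
    ]


SAMPLE = """\
# Project Name

> A short summary of what this project is and who it is for.

## Docs
- [Quickstart](https://example.com/quickstart.md): Get running in five minutes.
- [API reference](https://example.com/api.md): Every endpoint and field.

## Optional
- [Changelog](https://example.com/changelog.md): Version history.
"""

doc = parse_llms_txt(SAMPLE)
print(doc.title, [url for _, url, _ in links(doc, include_optional=False)])
```

Calling `links(doc, include_optional=False)` is the "tight budget" path: the Changelog entry drops out and the two Docs links remain.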

Two companions round out the proposal:

  • .md twins. The spec asks you to serve a clean Markdown version of each page by appending .md to its URL (so /pricing also answers at /pricing.md). The links in llms.txt should point at those, so an agent gets prose, not a DOM.
  • /llms-full.txt. An optional single file that inlines the full Markdown of your key pages, so a model can swallow everything in one request instead of following links.
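
The .md-twin convention is simple enough to sketch as a URL rewrite. The `md_twin` helper below is a hypothetical name, and the `index.html.md` rule for URLs that end in a slash is our reading of the proposal at llmstxt.org, so check it against the spec before relying on it.

```python
from urllib.parse import urlsplit, urlunsplit


def md_twin(url: str) -> str:
    """Return the Markdown-twin URL for a page URL (illustrative helper)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Assumption: URLs without a file name get index.html.md appended.
        path += "index.html.md"
    elif not path.endswith(".md"):
        path += ".md"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


print(md_twin("https://example.com/pricing"))
print(md_twin("https://example.com/docs/"))
```

A link that already ends in .md is returned unchanged, so the helper is safe to run over a mixed list.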

The crucial mental model: llms.txt is an invitation, not an instruction. It has no teeth. It does not block anyone, grant anyone access, or compel any crawler to behave. It simply makes it easy for a cooperating agent to find your best material. Hold on to that — it is the source of most of the confusion below.

What a good llms.txt looks like

A good llms.txt is short, curated, and described. It answers exactly one question for an agent — "where do I look to understand this product?" — and nothing else. The fastest way to show it is a real one, so here is how MDflow's own /llms.txt is built.

It opens the way the spec wants — an H1 and a one-blockquote summary that states what the product is and who (humans and agents) it serves:

# MDflow

> A clean, no-noise markdown editor in the browser... MDflow exposes an
> HTTP API and a Model Context Protocol (MCP) server... so AI agents can
> read, create, update, organize, and share markdown documents.

Then it groups links into labelled ## sections, each link carrying a description that tells the agent why it would click — not just where it goes:

## Docs
- [MCP server](https://mdflow.cz/docs/mcp): Connect to the hosted remote MCP
  server or install the local stdio server...
- [Public API](https://mdflow.cz/docs/api): HTTP API for folders, markdown
  documents, public sharing, and private email sharing...
- [OpenAPI specification](https://mdflow.cz/openapi.json): Machine-readable
  OpenAPI 3.1 description of the public API.

A few properties make it a good file rather than just a long one:

  • It is a "best of," not an inventory. It lists the docs, the API, the comparisons, the blog, and the app (the pages that explain MDflow), not the several hundred marketing and share URLs. The sitemap is the inventory; llms.txt is the brochure.
  • Every link has a description. The : notes after each link are the part that does the work. "Machine-readable OpenAPI 3.1 description of the public API" tells an agent what it will get before it spends a fetch on it.
  • It points at machine-readable endpoints. The links resolve to things an agent can actually use — an OpenAPI JSON, a self-contained docs.md manual, raw .md twins — not just more HTML.
  • It mirrors the site's real structure. The sections (Docs, Comparisons, Blog, API summary) map to how the product is actually organized, so the map matches the territory.

That is the whole craft: pick the pages that matter, describe each in a sentence, point at clean Markdown, and stop.

llms.txt vs robots.txt, sitemap.xml, and AGENTS.md

The "robots.txt for AI" nickname is catchy and slightly wrong. robots.txt is a gate; llms.txt is a map. They sit at different layers — and so do the other files they get bundled with:

| File | Job | Format | Audience | Enforces? |
| --- | --- | --- | --- | --- |
| robots.txt | Access control: what may be crawled | Directives | Crawlers | By convention |
| sitemap.xml | Exhaustive list of every URL | XML | Search engines | No |
| llms.txt | Curated map of your best pages | Markdown | AI agents | No |
| AGENTS.md | Build/test/convention instructions | Markdown | Coding agents (in-repo) | No |

  • robots.txt (a 1990s convention, later formalized as RFC 9309) tells crawlers what they are allowed to fetch. It restricts. llms.txt grants nothing and blocks nothing. If you want to keep AI bots out, that is a robots.txt or firewall job, not an llms.txt one.
  • sitemap.xml lists every indexable URL so a search engine can discover them all. It is comprehensive by design. llms.txt is the opposite: deliberately short, opinionated, and curated. One is an index; the other is a recommendation.
  • AGENTS.md is a different animal entirely — an "open-format README for agents" that lives inside a code repo and tells a coding agent (OpenAI Codex, Cursor, Jules, Aider) how to build, test, and follow your conventions. It is instructions for an agent working on your code, not a discovery file for an agent reading your website.

In the framing from our OKF post: sitemap.xml and llms.txt are discovery, robots.txt is access, MCP is transport, AGENTS.md is instructions, and OKF is knowledge. They stack. You want all of them; none replaces another.
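
The gate-versus-map split shows up in code as well. A well-behaved crawler answers "may I fetch this?" from robots.txt, and nothing in llms.txt changes that answer. The sketch below uses Python's standard `urllib.robotparser`; the bot name `ExampleAIBot` and the rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep one AI crawler out, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent: str, url: str) -> bool:
    """Access is decided by robots.txt alone; llms.txt is never consulted."""
    return parser.can_fetch(agent, url)


print(may_fetch("ExampleAIBot", "https://example.com/docs/quickstart.md"))
print(may_fetch("SomeOtherBot", "https://example.com/docs/quickstart.md"))
```

Listing `/docs/quickstart.md` in llms.txt would not open it to `ExampleAIBot`, and leaving it out would not close it to anyone else.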

Common mistakes

Most llms.txt files in the wild make at least one of these errors:

  1. Pasting the sitemap. A 400-line dump of every URL is just a sitemap with the wrong extension. The whole point is curation — if an agent still has to guess which links matter, you have not helped it.
  2. No descriptions. A bare list of links throws away the most useful half of the format. The : one-line description is what lets an agent choose without fetching — skipping it is the single most common failure.
  3. Linking to HTML, not Markdown. If your links resolve to ad-and-nav-laden HTML, you have reintroduced the exact problem the file exists to solve. Pair llms.txt with .md twins and point at those.
  4. Treating it as an SEO hack. Google says it does not read the file (more below), and keyword-stuffing just makes it spam an agent has to second-guess. Write it for a reader, not a ranker.
  5. Confusing it with access control. llms.txt keeps no one out. If you want to govern AI training or crawling, that is a robots.txt and firewall concern, not an llms.txt one.
  6. Letting it rot. A map that points at moved or deleted pages is worse than none — the fix is to generate it, not hand-maintain it (see below).
  7. Malformed structure. No H1, no summary blockquote, headings that are not ## file-lists — the format is small, so follow it exactly.
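
Several of these mistakes are mechanical enough to catch in CI. The linter below is a rough sketch, not a validator for the spec: `lint_llms_txt` is a hypothetical helper, the 50-link ceiling is an arbitrary threshold we picked, and the suffix check is only a heuristic for "machine-readable" that will flag clean extensionless URLs too.

```python
import re

# "- [name](url)" followed by whatever trails the link on that line.
LINK = re.compile(r"^\s*[-*]\s*\[[^\]]+\]\(([^)\s]+)\)(.*)$")
MACHINE_SUFFIXES = (".md", ".txt", ".json")  # heuristic, adjust to taste


def lint_llms_txt(text: str, max_links: int = 50) -> list:
    problems = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("missing H1 on the first line")
    if not any(ln.startswith(">") for ln in lines):
        problems.append("missing blockquote summary")
    if not any(ln.startswith("## ") for ln in lines):
        problems.append("no ## file-list sections")
    links = [m for ln in lines if (m := LINK.match(ln))]
    if len(links) > max_links:
        problems.append(f"{len(links)} links: looks like a pasted sitemap")
    for m in links:
        url, rest = m.group(1), m.group(2).strip()
        if not (rest.startswith(":") and rest[1:].strip()):
            problems.append(f"no description: {url}")
        if not url.endswith(MACHINE_SUFFIXES):
            problems.append(f"not Markdown or machine-readable: {url}")
    return problems


BAD = "## Docs\n- [Home](https://example.com/)\n"
for problem in lint_llms_txt(BAD):
    print(problem)
```

An empty list means none of these checks fired. It does not mean the file is well curated, which still takes a human.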

Why it's still worth shipping — for developers and for agents

Here is the part the hype skips. Start with the uncomfortable data, then the genuine value.

The honest reality. An Ahrefs study of 137,210 domains in May 2026 found that 97% of published llms.txt files received zero requests during the study window — and that 96% of the requests that did land came from generic bots, not AI assistants. As the study put it, "Slackbot alone fetched llms.txt files more often than PerplexityBot did." Google has been explicit too: at Search Central Live in July 2025, Gary Illyes said Google does not support llms.txt and will not crawl it, and John Mueller likened it to the discredited keywords meta tag — ignored precisely because the site owner controls it and can game it. If your goal is ranking in AI Overviews or earning ChatGPT citations, llms.txt is, in Ahrefs' word, "largely decoration."

So why ship one at all? Because passive search crawlers are not the only readers — and for the readers that matter, the file earns its keep.

For developers, it is cheap, honest infrastructure.

  • The front door for agents you point at your site. When you tell Cursor, Claude Code, or an in-house copilot to "read the docs at example.com," llms.txt turns "the docs" into a precise, ranked list instead of a blind crawl — a deliberate, high-intent fetch, and exactly the case that works today.
  • The discipline is the deliverable. Curating your ten most important pages and describing each in a line is worth doing even if no bot ever reads it. It forces you to know what your site is for.
  • It is nearly free. One Markdown file, generated in CI — a few kilobytes for being ready the day the agent ecosystem standardizes on it.

For AI agents, it is the difference between a map and a maze.

  • Ranked, not exhaustive. An agent spends its limited context on the pages you marked important — reading Optional last — instead of crawling and guessing.
  • Clean content, one hop away. Links that resolve to .md give an agent prose it can quote and cite without stripping a DOM.
  • Discovery it can trust. Paired with an agent card and an OpenAPI spec, llms.txt is the human-readable entry point to a machine-readable surface.
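
"Ranked, not exhaustive" can be read as a small budgeting problem. The sketch below assumes the agent already has a token estimate per page, which llms.txt itself does not provide; `pick_pages` and the numbers are hypothetical. It walks sections in file order, leaves Optional for last, and skips any page that would overflow the budget.

```python
def pick_pages(sections: list, budget_tokens: int) -> list:
    """Greedy selection: required sections first, "Optional" last.

    sections is a list of (section_name, [(url, estimated_tokens), ...]).
    """
    required = [s for s in sections if s[0] != "Optional"]
    optional = [s for s in sections if s[0] == "Optional"]
    chosen, used = [], 0
    for _name, entries in required + optional:
        for url, estimated_tokens in entries:
            if used + estimated_tokens <= budget_tokens:
                chosen.append(url)
                used += estimated_tokens
    return chosen


SECTIONS = [
    ("Optional", [("https://example.com/changelog.md", 500)]),
    ("Docs", [
        ("https://example.com/quickstart.md", 800),
        ("https://example.com/api.md", 3000),
    ]),
]

print(pick_pages(SECTIONS, budget_tokens=1500))
print(pick_pages(SECTIONS, budget_tokens=4000))
```

Skipping an oversized page rather than stopping is a design choice. An agent that prefers strict ordering could break out of the loop at the first page that does not fit.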

The verdict: do not ship llms.txt for SEO. Ship it because the agents you and your users deliberately invoke will read it — and because curating it makes your site sharper regardless.

Which sites benefit most

llms.txt pays off most where agents are already being pointed at your content:

  1. Developer documentation. The original use case and still the strongest: docs are read by IDE assistants and coding agents constantly, so a curated map plus .md twins is a direct upgrade. (Stripe, Anthropic, Cursor, and Mintlify were early adopters.)
  2. API products and SDKs. When someone is wiring an agent into your API, an llms.txt that points straight at the reference and an OpenAPI spec shortens the path from "curious" to "calling."
  3. Agentic and AI-native apps. If your product is something agents connect to over MCP, llms.txt is the natural discovery beacon for the rest of your surface.
  4. Knowledge bases and docs-heavy SaaS. Support copilots and "ask our docs" assistants benefit from a curated entry point rather than a brittle crawl.
  5. Open-source projects. A docs site with a clean llms.txt lets a contributor's coding agent load the right context instantly.

The common thread: value tracks deliberate agent traffic, not passive search crawlers. If nobody points an agent at your site, your llms.txt sits unread — which is exactly what the adoption studies measure.

How to keep it in sync

The fastest way to make llms.txt useless is to write it by hand and let the site move out from under it. Two habits keep it honest:

  • Generate it, do not author it. Treat llms.txt as a build artifact derived from the same source as your nav, docs index, or sitemap — so a new page shows up in all of them at once. Hand-edited files drift within a sprint.
  • Serve real .md twins, automatically. The links are only as good as what they resolve to. If each page can emit a clean Markdown version of itself, the map and the territory never disagree.

This is exactly where a Markdown-native system has an unfair advantage: if your content is already Markdown, the .md twin is the source, not a lossy export, and the index is a query, not a copy-paste.
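
"Generate it, do not author it" might look like the sketch below in a build step. The `Page` record and `render_llms_txt` are hypothetical. In practice the page list would come from whatever already drives your nav or sitemap, and this is not how MDflow's own file is produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    section: str
    title: str
    url: str
    description: str


def render_llms_txt(name: str, summary: str, pages: list) -> str:
    """Render an llms.txt body from page records, with Optional last."""
    grouped = {}
    for page in pages:
        grouped.setdefault(page.section, []).append(page)
    ordered = [s for s in grouped if s != "Optional"]
    if "Optional" in grouped:
        ordered.append("Optional")

    out = [f"# {name}", "", f"> {summary}", ""]
    for section in ordered:
        out.append(f"## {section}")
        out += [f"- [{p.title}]({p.url}): {p.description}" for p in grouped[section]]
        out.append("")
    return "\n".join(out).rstrip() + "\n"


PAGES = [
    Page("Optional", "Changelog", "https://example.com/changelog.md",
         "Version history."),
    Page("Docs", "Quickstart", "https://example.com/quickstart.md",
         "Get running in five minutes."),
]

print(render_llms_txt("Example", "A short summary.", PAGES))
```

Because the description is a required field on the record, a page cannot reach the file without one, which removes the most common failure by construction.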

How MDflow fits

We did not bolt llms.txt on as a growth tactic. MDflow was built on the same bet the whole machine-readable-web stack rests on — that content should be portable Markdown that people and agents can both read — so the discovery layer fell out naturally.

What lines up today

  • A real, curated llms.txt. It follows the spec — an H1, a blockquote summary, and ## file-lists (Docs, Comparisons, Blog, API summary), each link described. It is the worked example in this very post.
  • .md twins of every shared document. Append .md to any shared MDflow link and you get clean Markdown with YAML frontmatter (title, canonical_url, md_url, visibility) and open CORS — the precise convention llms.txt asks you to pair with. (This post has one; see the link at the top.)
  • A full discovery surface, not just one file. Alongside llms.txt, MDflow ships a self-contained docs.md agent manual, an A2A agent card, and an OpenAPI 3.1 spec — completing the trilogy with our OKF (knowledge) and MCP/A2A (transport) posts.
  • Agents that read and write. The pages llms.txt advertises are backed by a live MCP server and HTTP API — the high-intent, deliberate traffic where llms.txt genuinely pays off.

Where we are headed

This is direction, not a dated commitment:

  • Per-workspace llms.txt and llms-full.txt. As workspaces mature, letting a published workspace or collection emit its own curated llms.txt — and an inlined llms-full.txt — so an agent can pull a whole knowledge set through one front door.
  • Auto-curated indexes. Generating the link list and its descriptions from folder descriptions, so the map stays in sync with the content by construction rather than by discipline.
  • Richer raw output. Continuing to make the .md twins the canonical, agent-ready representation of everything you publish.

If the agent web standardizes on llms.txt, the sites that win are the ones where Markdown was the substrate all along — not a file someone remembers to update by hand.

The bottom line

llms.txt is a good idea wrapped in bad marketing. It is not the "robots.txt for AI" (it controls nothing), and it is not an SEO lever, because the search engines that dominate today's traffic do not read it. It is a clean, curated, nearly free map for the agents you and your users deliberately point at your site, and the act of writing a good one makes your content sharper for everyone. So ship one, but ship a good one: short, described, pointed at real Markdown, and generated so it never goes stale.

MDflow gives you that for free: write Markdown in the browser, and every document is already agent-ready — raw .md, a discovery index, and a live MCP server, all in sync.


Frequently asked questions

What is llms.txt?

llms.txt is a Markdown file at the root of your domain (yoursite.com/llms.txt) that gives AI agents a curated, plain-text map of your most important pages. Jeremy Howard of Answer.AI proposed it in September 2024. The format is simple: an H1 with your project name, a blockquote summary, then sections that list links as a name, a URL, and a short description. It is an invitation an agent can read at inference time, not an access-control file.

Does Google use llms.txt?

No. At Search Central Live in July 2025, Google's Gary Illyes said Google does not support llms.txt and has no plans to crawl it, and John Mueller compared it to the long-ignored keywords meta tag — because the site owner controls it and can game it. llms.txt is not a Google ranking signal, and there is no evidence it boosts AI Overviews or ChatGPT citations.

How is llms.txt different from robots.txt and sitemap.xml?

They do different jobs. robots.txt is an access-control gate that tells crawlers what they may fetch. sitemap.xml is an exhaustive XML inventory of every URL, built for search-engine indexing. llms.txt is a curated Markdown "best of" that points an AI agent at the few pages that actually explain your product. robots.txt restricts, sitemap.xml lists everything, and llms.txt curates — so you want all three.

Is llms.txt worth shipping if AI search ignores it?

For most sites, yes, but for the right reason. An Ahrefs study of 137,210 domains in May 2026 found that 97% of published llms.txt files received zero requests during the study window, so do not expect an SEO or citation boost. The real payoff is the agents you deliberately point at your site (IDE assistants like Cursor and Claude Code, MCP clients, and in-product copilots), which do fetch it, plus the discipline of curating your key pages and serving clean .md versions. It is cheap insurance, not a growth hack.

What is the difference between llms.txt and llms-full.txt?

llms.txt is a short index: links to your key pages with one-line descriptions, so an agent can choose what to load. llms-full.txt is an optional companion that inlines the full Markdown of those pages in a single file, so an agent can ingest everything at once without following links. Use llms.txt as the map and llms-full.txt as the whole book — the latter only when your content is small enough to fit a context window.
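
A rough sketch of the "whole book" side, with a size check attached. The layout (one heading per page) and the four-characters-per-token estimate are both assumptions of ours. The proposal does not prescribe a structure for llms-full.txt, and real token counts depend on the model's tokenizer.

```python
def build_llms_full(site_name: str, pages: list, max_tokens: int = 100_000):
    """Inline each page's Markdown under its own heading, then size-check.

    pages is a list of (title, markdown) pairs. Returns the assembled text,
    a crude token estimate, and whether it fits within max_tokens.
    """
    parts = [f"# {site_name}", ""]
    for title, markdown in pages:
        parts += [f"## {title}", "", markdown.strip(), ""]
    text = "\n".join(parts)
    estimated_tokens = len(text) // 4  # crude: about four characters per token
    return text, estimated_tokens, estimated_tokens <= max_tokens


text, estimate, fits = build_llms_full(
    "Example",
    [("Quickstart", "Install, then run.\n"), ("API", "GET /things")],
)
print(estimate, fits)
```

If `fits` comes back false for the context window you care about, that is the signal to publish only llms.txt and let agents follow links.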

Further reading