llms.txt Explained: The 'robots.txt for AI' and How to Ship a Good One

Type "robots.txt for AI" into any search box and you land on llms.txt — the proposed standard everyone is suddenly shipping and almost nobody is shipping well. The pitch is seductive: drop one Markdown file at the root of your site and AI assistants will finally understand you. The reality is messier. Most llms.txt files are a copy-pasted sitemap with no descriptions, Google has said plainly that it will not read them, and a 2026 study found that 97% of the files in the wild were never fetched by anything at all.
So is it snake oil, or is it worth your time? The honest answer is that it depends entirely on who you expect to read it, and on whether you ship a good one. This is the third post in our machine-readable-web series, after the posts on Google's Open Knowledge Format (OKF) and on MCP and A2A. Here is what llms.txt actually is, what a good one looks like (using ours as the worked example), the mistakes that make most of them useless, and how it differs from the files it keeps getting confused with.
TL;DR: llms.txt is a curated Markdown file at your site root that hands an AI agent a short, ranked map of your most useful pages. It is a map, not a gate: closer to a sitemap than to robots.txt, and it has no enforcement. Google does not use it and most files in the wild are never read, so it is not an SEO trick. Its real value is the agents you deliberately point at your site (IDE assistants, MCP clients, in-product copilots) and the discipline of pairing it with clean .md versions of your pages. MDflow ships llms.txt plus .md twins of every shared document natively.
What is llms.txt?
llms.txt is a Markdown file, served at the root of a domain (yoursite.com/llms.txt), that gives large language models a curated, human-readable guide to the content that matters on your site. It was proposed by Jeremy Howard, co-founder of Answer.AI, on September 3, 2024, and the spec lives at llmstxt.org.
The motivation is a context problem. An LLM trying to understand your site at inference time has to wade through HTML full of navigation, ads, and JavaScript — and even then, most sites are far too large to fit in a context window. As Howard put it, "constructing the right context for LLMs based on a website is ambiguous." The person who owns the site already knows which handful of pages answer the question; llms.txt is a place to write it down.
The format is deliberately tiny. There is exactly one required element — an H1 — and a small, predictable shape after it:
# Project Name
> A short summary of what this project is and who it is for.
A few optional sentences of context, in plain Markdown.
## Docs
- [Quickstart](https://example.com/quickstart.md): Get running in five minutes.
- [API reference](https://example.com/api.md): Every endpoint and field.
## Optional
- [Changelog](https://example.com/changelog.md): Version history.
The rule, verbatim from the spec: each section under an ## heading is a "file list" — "a markdown list, containing a required markdown hyperlink [name](url), then optionally a : and notes about the file." One section name is reserved: ## Optional, whose links "can be skipped if a shorter context is needed." That single convention turns the file into a priority list — an agent on a tight budget reads everything except Optional.
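To make the shape concrete, here is a minimal sketch of how a reader might parse that structure. It is an illustration under simplifying assumptions, not a reference parser: the function name `parse_llms_txt`, the regular expression, and the returned dictionary layout are all hypothetical, and the sketch ignores nested lists and other edge cases.

```python
import re

# Matches "- [name](url)" with an optional ": notes" tail.
LINK_RE = re.compile(
    r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$"
)

SAMPLE = """\
# Project Name
> A short summary of what this project is and who it is for.
A few optional sentences of context, in plain Markdown.
## Docs
- [Quickstart](https://example.com/quickstart.md): Get running in five minutes.
- [API reference](https://example.com/api.md): Every endpoint and field.
## Optional
- [Changelog](https://example.com/changelog.md): Version history.
"""


def parse_llms_txt(text):
    """Parse llms.txt text into a title, a summary, and per-section link lists."""
    title = None
    summary_parts = []
    sections = {}
    current = None  # name of the "##" section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith(">") and current is None:
            # Blockquote lines before the first section form the summary.
            summary_parts.append(line[1:].strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(
                    {
                        "name": match["name"],
                        "url": match["url"],
                        "notes": match["notes"] or "",
                    }
                )
    summary = " ".join(summary_parts) or None
    return {"title": title, "summary": summary, "sections": sections}


doc = parse_llms_txt(SAMPLE)
print(doc["title"], list(doc["sections"]))
```

Keeping sections in a plain dictionary preserves the order the author wrote them in, which matters because the file is meant to be read as a ranked list.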
Two companions round out the proposal:
- .md twins. The spec asks you to serve a clean Markdown version of each page by appending .md to its URL (so /pricing also answers at /pricing.md). The links in llms.txt should point at those, so an agent gets prose, not a DOM.
- /llms-full.txt. An optional single file that inlines the full Markdown of your key pages, so a model can swallow everything in one request instead of following links.
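The two companions can be sketched in a few lines. This is a hedged illustration, not a required implementation: `md_twin_url` and `build_llms_full` are hypothetical helper names, the `index.html.md` fallback for URLs without a file name reflects our reading of the spec, and the HTML-comment separator in the combined file is purely an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def md_twin_url(page_url):
    """Return the URL of a page's Markdown twin by appending .md to its path."""
    parts = urlsplit(page_url)
    path = parts.path
    if path.endswith(".md"):
        return page_url  # already a Markdown URL
    if path in ("", "/"):
        path = "/index.html.md"  # assumed fallback for URLs without a file name
    elif path.endswith("/"):
        path = path + "index.html.md"
    else:
        path = path + ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def build_llms_full(pages):
    """Inline already-fetched (url, markdown) pairs into one llms-full.txt body."""
    chunks = []
    for url, markdown in pages:
        # The separator comment is an illustrative choice, not part of any spec.
        chunks.append(f"<!-- source: {url} -->\n{markdown.strip()}\n")
    return "\n".join(chunks)


print(md_twin_url("https://example.com/pricing"))
```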
The crucial mental model: llms.txt is an invitation, not an instruction. It has no teeth. It does not block anyone, grant anyone access, or compel any crawler to behave. It simply makes it easy for a cooperating agent to find your best material. Hold on to that — it is the source of most of the confusion below.
What a good llms.txt looks like
A good llms.txt is short, curated, and described. It answers exactly one question for an agent — "where do I look to understand this product?" — and nothing else. The fastest way to show it is a real one, so here is how MDflow's own /llms.txt is built.
It opens the way the spec wants — an H1 and a one-blockquote summary that states what the product is and who (humans and agents) it serves:
# MDflow
> A clean, no-noise markdown editor in the browser... MDflow exposes an
> HTTP API and a Model Context Protocol (MCP) server... so AI agents can
> read, create, update, organize, and share markdown documents.
Then it groups links into labelled ## sections, each link carrying a description that tells the agent why it would click — not just where it goes:
## Docs
- [MCP server](https://mdflow.cz/docs/mcp): Connect to the hosted remote MCP
server or install the local stdio server...
- [Public API](https://mdflow.cz/docs/api): HTTP API for folders, markdown
documents, public sharing, and private email sharing...
- [OpenAPI specification](https://mdflow.cz/openapi.json): Machine-readable
OpenAPI 3.1 description of the public API.
A few properties make it a good file rather than just a long one:
- It is a "best of," not an inventory. It lists the docs, the API, the comparisons, the blog, and the app (the pages that explain MDflow), not the several-hundred marketing and share URLs. The sitemap is the inventory; llms.txt is the brochure.
- Every link has a description. The ": notes" after each link is the part that does the work. "Machine-readable OpenAPI 3.1 description of the public API" tells an agent what it will get before it spends a fetch on it.
- It points at machine-readable endpoints. The links resolve to things an agent can actually use (an OpenAPI JSON, a self-contained docs.md manual, raw .md twins), not just more HTML.
- It mirrors the site's real structure. The sections (Docs, Comparisons, Blog, API summary) map to how the product is actually organized, so the map matches the territory.
That is the whole craft: pick the pages that matter, describe each in a sentence, point at clean Markdown, and stop.
llms.txt vs robots.txt, sitemap.xml, and AGENTS.md
The "robots.txt for AI" nickname is catchy and slightly wrong. robots.txt is a gate; llms.txt is a map. They sit at different layers — and so do the other files they get bundled with:
| File | Job | Format | Audience | Enforces? |
|---|---|---|---|---|
| robots.txt | Access control — what may be crawled | Directives | Crawlers | By convention |
| sitemap.xml | Exhaustive list of every URL | XML | Search engines | No |
| llms.txt | Curated map of your best pages | Markdown | AI agents | No |
| AGENTS.md | Build/test/convention instructions | Markdown | Coding agents (in-repo) | No |
- robots.txt (a 1990s convention, later formalized as RFC 9309) tells crawlers what they are allowed to fetch. It restricts. llms.txt grants nothing and blocks nothing; if you want to keep AI bots out, that is a robots.txt or firewall job, not an llms.txt one.
- sitemap.xml lists every indexable URL so a search engine can discover them all. It is comprehensive by design. llms.txt is the opposite: deliberately short, opinionated, and curated. One is an index; the other is a recommendation.
- AGENTS.md is a different animal entirely: an "open-format README for agents" that lives inside a code repo and tells a coding agent (OpenAI Codex, Cursor, Jules, Aider) how to build, test, and follow your conventions. It is instructions for an agent working on your code, not a discovery file for an agent reading your website.
In the framing from our OKF post: sitemap.xml and llms.txt are discovery, robots.txt is access, MCP is transport, AGENTS.md is instructions, and OKF is knowledge. They stack. You want all of them; none replaces another.
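The gate-versus-map distinction shows up directly in code. The sketch below uses Python's standard `urllib.robotparser` to answer the access question. The robots.txt body and the second user-agent name are made-up examples, and `allowed` is a hypothetical helper. The answer comes entirely from robots.txt; nothing in an llms.txt file changes it.

```python
from urllib.robotparser import RobotFileParser

# Example policy (an assumption for illustration): block one AI crawler, allow the rest.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_body, user_agent, url):
    """Ask robots.txt, the access layer, whether a crawler may fetch a URL."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL that an llms.txt file might list; being listed grants no access.
url = "https://example.com/docs/quickstart.md"
print(allowed(ROBOTS_TXT, "GPTBot", url))
print(allowed(ROBOTS_TXT, "ExampleBrowserBot", url))
```

Even this check only binds crawlers that choose to honor it, which is why the table marks robots.txt as enforced "by convention."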
Common mistakes
Most llms.txt files in the wild make at least one of these errors:
- Pasting the sitemap. A 400-line dump of every URL is just a sitemap with the wrong extension. The whole point is curation; if an agent still has to guess which links matter, you have not helped it.
- No descriptions. A bare list of links throws away the most useful half of the format. The ": one-line description" is what lets an agent choose without fetching, and skipping it is the single most common failure.
- Linking to HTML, not Markdown. If your links resolve to ad-and-nav-laden HTML, you have reintroduced the exact problem the file exists to solve. Pair llms.txt with .md twins and point at those.
- Treating it as an SEO hack. Google says it does not read the file (more below), and keyword-stuffing just makes it spam an agent has to second-guess. Write it for a reader, not a ranker.
- Confusing it with access control. llms.txt keeps no one out. If you want to govern AI training or crawling, that is a robots.txt and firewall concern, not an llms.txt one.
- Letting it rot. A map that points at moved or deleted pages is worse than none. The fix is to generate it, not hand-maintain it (see below).
- Malformed structure. No H1, no summary blockquote, headings that are not ## file-lists. The format is small, so follow it exactly.
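Several of these mistakes are mechanical enough to catch in CI. The following is a rough sketch of such a check, not a validator for the spec: `lint_llms_txt` is a hypothetical name, the 50-link threshold is an arbitrary assumption, and the file-extension test is only a heuristic for "probably not HTML."

```python
import re

# Matches "- [name](url)" with an optional ": notes" tail.
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def lint_llms_txt(text, max_links=50):
    """Return a list of human-readable warnings for an llms.txt body."""
    warnings = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("# "):
        warnings.append("missing H1 on the first line")
    if not any(ln.startswith(">") for ln in lines):
        warnings.append("missing blockquote summary")
    links = [m for m in (LINK_RE.match(ln) for ln in lines) if m]
    if len(links) > max_links:
        # Threshold is an assumption; tune it to your own site.
        warnings.append(f"{len(links)} links: looks like a sitemap, not a curated list")
    for match in links:
        name, url, notes = match.group(1), match.group(2), match.group(3)
        if not notes:
            warnings.append(f"no description: {name}")
        if not url.endswith((".md", ".txt", ".json")):
            # Heuristic only: an extensionless URL may still serve Markdown.
            warnings.append(f"may resolve to HTML: {url}")
    return warnings


BAD = "- [Home](https://example.com/)\n- [Pricing](https://example.com/pricing)\n"
for warning in lint_llms_txt(BAD):
    print(warning)
```

A check like this cannot judge whether the curation is any good; it only flags the failures that have a recognizable shape.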
Why it's still worth shipping — for developers and for agents
Here is the part the hype skips. Start with the uncomfortable data, then the genuine value.
The honest reality. An Ahrefs study of 137,210 domains in May 2026 found that 97% of published llms.txt files received zero requests during the study window — and that 96% of the requests that did land came from generic bots, not AI assistants. As the study put it, "Slackbot alone fetched llms.txt files more often than PerplexityBot did." Google has been explicit too: at Search Central Live in July 2025, Gary Illyes said Google does not support llms.txt and will not crawl it, and John Mueller likened it to the discredited keywords meta tag — ignored precisely because the site owner controls it and can game it. If your goal is ranking in AI Overviews or earning ChatGPT citations, llms.txt is, in Ahrefs' word, "largely decoration."
So why ship one at all? Because passive search crawlers are not the only readers — and for the readers that matter, the file earns its keep.
For developers, it is cheap, honest infrastructure.
- The front door for agents you point at your site. When you tell Cursor, Claude Code, or an in-house copilot to "read the docs at example.com," llms.txt turns "the docs" into a precise, ranked list instead of a blind crawl: a deliberate, high-intent fetch, and exactly the case that works today.
- The discipline is the deliverable. Curating your ten most important pages and describing each in a line is worth doing even if no bot ever reads it. It forces you to know what your site is for.
- It is nearly free. One Markdown file, generated in CI: a few kilobytes for being ready the day the agent ecosystem standardizes on it.
For AI agents, it is the difference between a map and a maze.
- Ranked, not exhaustive. An agent spends its limited context on the pages you marked important, reading Optional last, instead of crawling and guessing.
- Clean content, one hop away. Links that resolve to .md give an agent prose it can quote and cite without stripping a DOM.
- Discovery it can trust. Paired with an agent card and an OpenAPI spec, llms.txt is the human-readable entry point to a machine-readable surface.
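The "ranked, not exhaustive" behavior can be sketched as a small ordering function. This is one plausible way an agent might spend a fixed fetch budget, not a description of how any particular agent works. `fetch_order` is a hypothetical name, and the input is assumed to be a mapping from section name to a list of URLs in file order.

```python
def fetch_order(sections, max_links):
    """Order links for fetching: regular sections first, Optional last, then cut to budget."""
    ordered = []
    for name, links in sections.items():
        if name != "Optional":
            ordered.extend(links)
    # The reserved Optional section is only reached if budget remains.
    ordered.extend(sections.get("Optional", []))
    return ordered[:max_links]


sections = {
    "Optional": ["https://example.com/changelog.md"],
    "Docs": [
        "https://example.com/quickstart.md",
        "https://example.com/api.md",
    ],
}
print(fetch_order(sections, 2))
```

With a budget of two, the changelog is never fetched even though its section appears first in the mapping.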
The verdict: do not ship llms.txt for SEO. Ship it because the agents you and your users deliberately invoke will read it — and because curating it makes your site sharper regardless.
Which sites benefit most
llms.txt pays off most where agents are already being pointed at your content:
- Developer documentation. The original use case and still the strongest: docs are read by IDE assistants and coding agents constantly, so a curated map plus .md twins is a direct upgrade. (Stripe, Anthropic, Cursor, and Mintlify were early adopters.)
- API products and SDKs. When someone is wiring an agent into your API, an llms.txt that points straight at the reference and an OpenAPI spec shortens the path from "curious" to "calling."
- Agentic and AI-native apps. If your product is something agents connect to over MCP, llms.txt is the natural discovery beacon for the rest of your surface.
- Knowledge bases and docs-heavy SaaS. Support copilots and "ask our docs" assistants benefit from a curated entry point rather than a brittle crawl.
- Open-source projects. A docs site with a clean llms.txt lets a contributor's coding agent load the right context instantly.
The common thread: value tracks deliberate agent traffic, not passive search crawlers. If nobody points an agent at your site, your llms.txt sits unread — which is exactly what the adoption studies measure.
How to keep it in sync
The fastest way to make llms.txt useless is to write it by hand and let the site move out from under it. Two habits keep it honest:
- Generate it, do not author it. Treat llms.txt as a build artifact derived from the same source as your nav, docs index, or sitemap, so a new page shows up in all of them at once. Hand-edited files drift within a sprint.
- Serve real .md twins, automatically. The links are only as good as what they resolve to. If each page can emit a clean Markdown version of itself, the map and the territory never disagree.
This is exactly where a Markdown-native system has an unfair advantage: if your content is already Markdown, the .md twin is the source, not a lossy export, and the index is a query, not a copy-paste.
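As a rough sketch of the "generate it" habit, the function below renders an llms.txt body from a page index that a build step could load from the same source as the nav. Everything here is illustrative: the `Page` dataclass, its fields, and `render_llms_txt` are hypothetical names, and a real build would read the index from your own content source rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str
    section: str = "Docs"


def render_llms_txt(site_name, summary, pages):
    """Render an llms.txt body from a page index kept next to the site source."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    sections = {}
    for page in pages:
        sections.setdefault(page.section, []).append(page)
    # Stable sort: keeps the author's section order but moves Optional to the end.
    names = sorted(sections, key=lambda name: name == "Optional")
    for name in names:
        lines.append(f"## {name}")
        lines.append("")
        for page in sections[name]:
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


pages = [
    Page("Changelog", "https://example.com/changelog.md", "Version history.", "Optional"),
    Page("Quickstart", "https://example.com/quickstart.md", "Get running in five minutes."),
]
print(render_llms_txt("Project Name", "A short summary.", pages))
```

Because the description is a required field on each page record, a page cannot enter the index without one, which removes the most common failure by construction.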
How MDflow fits
We did not bolt llms.txt on as a growth tactic. MDflow was built on the same bet the whole machine-readable-web stack rests on — that content should be portable Markdown that people and agents can both read — so the discovery layer fell out naturally.
What lines up today
- A real, curated llms.txt. It follows the spec: an H1, a blockquote summary, and ## file-lists (Docs, Comparisons, Blog, API summary), each link described. It is the worked example in this very post.
- .md twins of every shared document. Append .md to any shared MDflow link and you get clean Markdown with YAML frontmatter (title, canonical_url, md_url, visibility) and open CORS, the precise convention llms.txt asks you to pair with. (This post has one; see the link at the top.)
- A full discovery surface, not just one file. Alongside llms.txt, MDflow ships a self-contained docs.md agent manual, an A2A agent card, and an OpenAPI 3.1 spec, completing the trilogy with our OKF (knowledge) and MCP/A2A (transport) posts.
- Agents that read and write. The pages llms.txt advertises are backed by a live MCP server and HTTP API: the high-intent, deliberate traffic where llms.txt genuinely pays off.
Where we are headed
This is direction, not a dated commitment:
- Per-workspace llms.txt and llms-full.txt. As workspaces mature, letting a published workspace or collection emit its own curated llms.txt, and an inlined llms-full.txt, so an agent can pull a whole knowledge set through one front door.
- Auto-curated indexes. Generating the link list and its descriptions from folder descriptions, so the map stays in sync with the content by construction rather than by discipline.
- Richer raw output. Continuing to make the .md twins the canonical, agent-ready representation of everything you publish.
If the agent web standardizes on llms.txt, the sites that win are the ones where Markdown was the substrate all along — not a file someone remembers to update by hand.
The bottom line
llms.txt is a good idea wrapped in bad marketing. It is not the "robots.txt for AI" — it controls nothing — and it is not an SEO lever, because the search engines that dominate today's traffic do not read it. It is a clean, curated, nearly-free map for the agents you and your users deliberately point at your site, and the act of writing a good one makes your content sharper for everyone. So ship one — just ship a good one: short, described, pointed at real Markdown, and generated so it never goes stale.
MDflow gives you that for free: write Markdown in the browser, and every document is already agent-ready — raw .md, a discovery index, and a live MCP server, all in sync.
Frequently asked questions
What is llms.txt?
llms.txt is a Markdown file at the root of your domain (yoursite.com/llms.txt) that gives AI agents a curated, plain-text map of your most important pages. Jeremy Howard of Answer.AI proposed it in September 2024. The format is simple: an H1 with your project name, a blockquote summary, then sections that list links as a name, a URL, and a short description. It is an invitation an agent can read at inference time, not an access-control file.
Does Google use llms.txt?
No. At Search Central Live in July 2025, Google's Gary Illyes said Google does not support llms.txt and has no plans to crawl it, and John Mueller compared it to the long-ignored keywords meta tag — because the site owner controls it and can game it. llms.txt is not a Google ranking signal, and there is no evidence it boosts AI Overviews or ChatGPT citations.
How is llms.txt different from robots.txt and sitemap.xml?
They do different jobs. robots.txt is an access-control gate that tells crawlers what they may fetch. sitemap.xml is an exhaustive XML inventory of every URL, built for search-engine indexing. llms.txt is a curated Markdown "best of" that points an AI agent at the few pages that actually explain your product. robots.txt restricts, sitemap.xml lists everything, and llms.txt curates — so you want all three.
Is llms.txt worth shipping if AI search ignores it?
For most sites, yes — but for the right reason. An Ahrefs study of 137,000 domains in 2026 found 97% of published llms.txt files were never requested, so do not expect an SEO or citation boost. The real payoff is the agents you deliberately point at your site — IDE assistants like Cursor and Claude Code, MCP clients, and in-product copilots — which do fetch it, plus the discipline of curating your key pages and serving clean .md versions. It is cheap insurance, not a growth hack.
What is the difference between llms.txt and llms-full.txt?
llms.txt is a short index: links to your key pages with one-line descriptions, so an agent can choose what to load. llms-full.txt is an optional companion that inlines the full Markdown of those pages in a single file, so an agent can ingest everything at once without following links. Use llms.txt as the map and llms-full.txt as the whole book — the latter only when your content is small enough to fit a context window.
Further reading
- llms.txt — The /llms.txt specification
- Answer.AI — The original /llms.txt proposal
- Ahrefs — We analyzed 137K sites: 97% of llms.txt files never get read
- Search Engine Journal — Google says llms.txt comparable to keywords meta tag
- AGENTS.md — The open format for agent instructions
- MDflow — Markdown for AI agents · MCP documentation · API documentation · FAQ