---
title: "llms.txt Explained: The 'robots.txt for AI' and How to Ship a Good One"
description: "llms.txt is a curated Markdown map of your site for AI agents. Here's what a good one looks like, the common mistakes, and how it differs from robots.txt."
author: "MDflow"
date: 2026-06-23
reading_time: "17 min"
canonical_url: https://mdflow.cz/blog/llms-txt-explained
md_url: https://mdflow.cz/blog/llms-txt-explained.md
---

# llms.txt Explained: The 'robots.txt for AI' and How to Ship a Good One

*Published June 23, 2026 · 17 min read*


Type "robots.txt for AI" into any search box and you land on **`llms.txt`**, the proposed standard everyone is suddenly shipping and almost nobody is shipping well. The pitch is seductive: drop one Markdown file at the root of your site and AI assistants will finally understand you. The reality is messier. Most `llms.txt` files are a copy-pasted sitemap with no descriptions, Google has said plainly that it will not read them, and a 2026 Ahrefs study found that 97% of the published files it tracked received no requests at all during the study window.

So is it snake oil, or is it worth your time? The honest answer is that it depends entirely on *who* you expect to read it — and on whether you ship a *good* one. This is the third post in our machine-readable-web series, after [Google's Open Knowledge Format](/blog/google-open-knowledge-format-okf) and [MCP and A2A](/blog/mcp-and-a2a-agentic-interfaces). Here is what `llms.txt` actually is, what a good one looks like (using ours as the worked example), the mistakes that make most of them useless, and how it differs from the files it keeps getting confused with.

> **TL;DR** — `llms.txt` is a curated Markdown file at your site root that hands an AI agent a short, ranked map of your most useful pages. It is a *map*, not a *gate* — closer to a sitemap than to robots.txt, and it has no enforcement. Google does not use it and most files in the wild are never read, so it is not an SEO trick. Its real value is the agents you deliberately point at your site (IDE assistants, MCP clients, in-product copilots) and the discipline of pairing it with clean `.md` versions of your pages. [MDflow](/) ships `llms.txt` plus `.md` twins of every shared document natively.

## What is llms.txt?

`llms.txt` is a Markdown file, served at the root of a domain (`yoursite.com/llms.txt`), that gives large language models a curated, human-readable guide to the content that matters on your site. It was proposed by **Jeremy Howard**, co-founder of Answer.AI, on **September 3, 2024**, and the spec lives at [llmstxt.org](https://llmstxt.org/).

The motivation is a context problem. An LLM trying to understand your site at inference time has to wade through HTML full of navigation, ads, and JavaScript — and even then, most sites are far too large to fit in a context window. As Howard put it, "constructing the right context for LLMs based on a website is ambiguous." The person who *owns* the site already knows which handful of pages answer the question; `llms.txt` is a place to write it down.

The format is deliberately tiny. There is exactly one required element — an H1 — and a small, predictable shape after it:

```markdown
# Project Name

> A short summary of what this project is and who it is for.

A few optional sentences of context, in plain Markdown.

## Docs
- [Quickstart](https://example.com/quickstart.md): Get running in five minutes.
- [API reference](https://example.com/api.md): Every endpoint and field.

## Optional
- [Changelog](https://example.com/changelog.md): Version history.
```

The rule, verbatim from the spec: each section under an `##` heading is a "file list" — "a markdown list, containing a required markdown hyperlink `[name](url)`, then optionally a `:` and notes about the file." One section name is reserved: `## Optional`, whose links "can be skipped if a shorter context is needed." That single convention turns the file into a priority list — an agent on a tight budget reads everything *except* Optional.
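
To make that shape concrete, here is a minimal sketch of how a cooperating agent might parse the file. The names `parse_llms_txt`, `required_links`, `Link`, and `LlmsTxt` are hypothetical, not part of the spec or any library, and the parser deliberately ignores wrapped continuation lines in descriptions.

```python
import re
from dataclasses import dataclass, field

# One file-list item: "- [name](url)" with an optional ": notes" tail.
LINK_RE = re.compile(
    r"^[-*]\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?:\s*:\s*(?P<notes>.*))?$"
)


@dataclass
class Link:
    name: str
    url: str
    notes: str = ""


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    sections: dict[str, list[Link]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, the blockquote summary, and each `##` file list."""
    doc = LlmsTxt()
    current = None  # name of the section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith(">") and current is None:
            # Join a multi-line blockquote into one summary string.
            doc.summary = (doc.summary + " " + line.lstrip("> ").strip()).strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    Link(match["name"], match["url"], match["notes"] or "")
                )
    return doc


def required_links(doc: LlmsTxt) -> list[Link]:
    """Every link except those in the reserved `Optional` section."""
    return [
        link
        for name, links in doc.sections.items()
        if name != "Optional"
        for link in links
    ]


SAMPLE = """\
# Project Name

> A short summary of what this project is and who it is for.

## Docs
- [Quickstart](https://example.com/quickstart.md): Get running in five minutes.
- [API reference](https://example.com/api.md): Every endpoint and field.

## Optional
- [Changelog](https://example.com/changelog.md): Version history.
"""

doc = parse_llms_txt(SAMPLE)
print(doc.title, [link.name for link in required_links(doc)])
```

An agent with room to spare can still read `doc.sections["Optional"]`; the split only matters when context is tight.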

Two companions usually travel with the file:

- **`.md` twins.** The spec asks you to serve a clean Markdown version of each page by appending `.md` to its URL (so `/pricing` also answers at `/pricing.md`; URLs without a file name append `index.html.md` instead). The links in `llms.txt` should point at those, so an agent gets prose, not a DOM.
- **`/llms-full.txt`.** An optional single file that inlines the *full* Markdown of your key pages, so a model can swallow everything in one request instead of following links. It is a widely adopted companion convention rather than part of the core spec text.
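
The `.md` twin rule is simple enough to sketch as a URL transform. The function name `md_twin_url` is hypothetical. The `index.html.md` branch follows the spec's suggestion for URLs without a file name, and dropping the fragment is an assumption of this sketch.

```python
from urllib.parse import urlsplit, urlunsplit


def md_twin_url(url: str) -> str:
    """Return the Markdown twin of a page URL, per the llms.txt convention."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith(".md"):
        return url  # already a Markdown URL, leave it alone
    if path.endswith("/"):
        # Directory-style URL with no file name: the spec suggests index.html.md.
        path += "index.html.md"
    else:
        path += ".md"
    # Keep scheme, host, and query; drop the fragment (an assumption here).
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


print(md_twin_url("https://example.com/pricing"))
print(md_twin_url("https://example.com/docs/"))
```

Whether your server actually answers at those URLs is a separate job; this only computes where the twin should live.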

The crucial mental model: `llms.txt` is an **invitation, not an instruction**. It has no teeth. It does not block anyone, grant anyone access, or compel any crawler to behave. It simply makes it easy for a *cooperating* agent to find your best material. Hold on to that — it is the source of most of the confusion below.

## What a good llms.txt looks like

A good `llms.txt` is short, curated, and described. It answers exactly one question for an agent — "where do I look to understand this product?" — and nothing else. The fastest way to show it is a real one, so here is how MDflow's own [`/llms.txt`](/llms.txt) is built.

It opens the way the spec wants — an H1 and a one-blockquote summary that states what the product is and who (humans *and* agents) it serves:

```markdown
# MDflow

> A clean, no-noise markdown editor in the browser... MDflow exposes an
> HTTP API and a Model Context Protocol (MCP) server... so AI agents can
> read, create, update, organize, and share markdown documents.
```

Then it groups links into labelled `##` sections, each link carrying a description that tells the agent *why* it would click — not just where it goes:

```markdown
## Docs
- [MCP server](https://mdflow.cz/docs/mcp): Connect to the hosted remote MCP
  server or install the local stdio server...
- [Public API](https://mdflow.cz/docs/api): HTTP API for folders, markdown
  documents, public sharing, and private email sharing...
- [OpenAPI specification](https://mdflow.cz/openapi.json): Machine-readable
  OpenAPI 3.1 description of the public API.
```

A few properties make it a *good* file rather than just a long one:

- **It is a "best of," not an inventory.** It lists the docs, the API, the comparisons, the blog, and the app — the pages that *explain* MDflow — not the several-hundred marketing and share URLs. The [sitemap](/sitemap.xml) is the inventory; `llms.txt` is the brochure.
- **Every link has a description.** The `: notes` after each link is the part that does the work. "Machine-readable OpenAPI 3.1 description of the public API" tells an agent what it will get *before* it spends a fetch on it.
- **It points at machine-readable endpoints.** The links resolve to things an agent can actually use — an OpenAPI JSON, a self-contained `docs.md` manual, raw `.md` twins — not just more HTML.
- **It mirrors the site's real structure.** The sections (`Docs`, `Comparisons`, `Blog`, `API summary`) map to how the product is actually organized, so the map matches the territory.

That is the whole craft: pick the pages that matter, describe each in a sentence, point at clean Markdown, and stop.

## llms.txt vs robots.txt, sitemap.xml, and AGENTS.md

The "robots.txt for AI" nickname is catchy and slightly wrong. `robots.txt` is a *gate*; `llms.txt` is a *map*. They sit at different layers — and so do the other files they get bundled with:

| File | Job | Format | Audience | Enforces? |
| --- | --- | --- | --- | --- |
| **robots.txt** | Access control — what may be crawled | Directives | Crawlers | By convention |
| **sitemap.xml** | Exhaustive list of every URL | XML | Search engines | No |
| **llms.txt** | Curated map of your best pages | Markdown | AI agents | No |
| **AGENTS.md** | Build/test/convention instructions | Markdown | Coding agents (in-repo) | No |

- **robots.txt** (a 1990s convention, later formalized as RFC 9309) tells crawlers what they are *allowed* to fetch. It restricts. `llms.txt` grants nothing and blocks nothing — if you want to keep AI bots out, that is a `robots.txt` or firewall job, not an `llms.txt` one.
- **sitemap.xml** lists *every* indexable URL so a search engine can discover them all. It is comprehensive by design. `llms.txt` is the opposite: deliberately short, opinionated, and curated. One is an index; the other is a recommendation.
- **AGENTS.md** is a different animal entirely — an "open-format README for agents" that lives *inside a code repo* and tells a coding agent (OpenAI Codex, Cursor, Jules, Aider) how to build, test, and follow your conventions. It is instructions for an agent working *on* your code, not a discovery file for an agent reading *your website*.
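
The gate-versus-map distinction shows up clearly in code. The sketch below uses Python's standard `urllib.robotparser` to evaluate a sample `robots.txt` that shuts out one AI crawler. The rules and the `SomeOtherBot` name are illustrative assumptions; nothing in `llms.txt` has an equivalent directive to evaluate.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block OpenAI's GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robots.txt answers an access question for a given crawler and URL.
print(parser.can_fetch("GPTBot", "https://example.com/docs"))        # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/docs"))  # True
```

Even here, enforcement is by convention: a crawler that ignores `robots.txt` is only stopped by a firewall or similar control.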

In the framing from our [OKF post](/blog/google-open-knowledge-format-okf): `sitemap.xml` and `llms.txt` are **discovery**, `robots.txt` is **access**, [MCP](/docs/mcp) is **transport**, `AGENTS.md` is **instructions**, and OKF is **knowledge**. They stack. You want all of them; none replaces another.

## Common mistakes

Most `llms.txt` files in the wild make at least one of these errors:

1. **Pasting the sitemap.** A 400-line dump of every URL is just a sitemap with the wrong extension. The whole point is curation — if an agent still has to guess which links matter, you have not helped it.
2. **No descriptions.** A bare list of links throws away the most useful half of the format. The `: one-line description` is what lets an agent choose *without* fetching — skipping it is the single most common failure.
3. **Linking to HTML, not Markdown.** If your links resolve to ad-and-nav-laden HTML, you have reintroduced the exact problem the file exists to solve. Pair `llms.txt` with `.md` twins and point at those.
4. **Treating it as an SEO hack.** Google says it does not read the file (more below), and keyword-stuffing just makes it spam an agent has to second-guess. Write it for a reader, not a ranker.
5. **Confusing it with access control.** `llms.txt` keeps no one out. If you want to govern AI training or crawling, that is a `robots.txt` and firewall concern, not an `llms.txt` one.
6. **Letting it rot.** A map that points at moved or deleted pages is worse than none — the fix is to generate it, not hand-maintain it (see below).
7. **Malformed structure.** No H1, no summary blockquote, headings that are not `##` file-lists — the format is small, so follow it exactly.
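
Several of these mistakes are mechanical enough to catch in CI. Here is a rough linter sketch; `lint_llms_txt` is a hypothetical helper, and the 50-link threshold and the list of accepted file extensions are arbitrary assumptions you would tune for your own site.

```python
import re

LINK_RE = re.compile(r"^[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?:\s*:\s*(.*))?$")


def lint_llms_txt(text: str, max_links: int = 50) -> list[str]:
    """Return human-readable warnings for the structural mistakes above."""
    warnings = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("# "):
        warnings.append("missing H1 on the first line")
    if not any(ln.startswith(">") for ln in lines):
        warnings.append("missing blockquote summary")
    if not any(ln.startswith("## ") for ln in lines):
        warnings.append("no ## file-list sections")

    links = [m for m in map(LINK_RE.match, lines) if m]
    if len(links) > max_links:  # threshold is an assumption, not a spec rule
        warnings.append(f"{len(links)} links: looks like a sitemap dump")
    for m in links:
        name, url, notes = m.group(1), m.group(2), m.group(3)
        if not notes:
            warnings.append(f"no description: {name}")
        # Crude check: treat anything not ending in these as probable HTML.
        if not url.split("?")[0].endswith((".md", ".txt", ".json")):
            warnings.append(f"probably HTML, not Markdown: {url}")
    return warnings


BAD = "## Docs\n- [Home](https://example.com/)\n"
for warning in lint_llms_txt(BAD):
    print(warning)
```

It cannot judge curation quality or spot keyword stuffing, so treat it as a floor, not a review.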

## Why it's still worth shipping — for developers and for agents

Here is the part the hype skips. Start with the uncomfortable data, then the genuine value.

**The honest reality.** An [Ahrefs study](https://ahrefs.com/blog/llmstxt-study/) of 137,210 domains in May 2026 found that **97% of published `llms.txt` files received zero requests** during the study window, and that 96% of the requests that *did* land came from generic bots, not AI assistants. As the study put it, "Slackbot alone fetched `llms.txt` files more often than PerplexityBot did." Google has been explicit too: at Search Central Live in July 2025, Gary Illyes said Google does not support `llms.txt` and will not crawl it, and John Mueller likened it to the discredited keywords meta tag, which search engines ignore precisely because the site owner controls it and can game it. If your goal is ranking in AI Overviews or earning ChatGPT citations, `llms.txt` is, in Ahrefs' words, "largely decoration."

So why ship one at all? Because *passive search crawlers are not the only readers* — and for the readers that matter, the file earns its keep.

**For developers, it is cheap, honest infrastructure.**

- **The front door for agents you point at your site.** When you tell Cursor, Claude Code, or an in-house copilot to "read the docs at example.com," `llms.txt` turns "the docs" into a precise, ranked list instead of a blind crawl — a deliberate, high-intent fetch, and exactly the case that works *today*.
- **The discipline is the deliverable.** Curating your ten most important pages and describing each in a line is worth doing even if no bot ever reads it. It forces you to know what your site is *for*.
- **It is nearly free.** One Markdown file, generated in CI: a few kilobytes buys you readiness for the day the agent ecosystem standardizes on it.

**For AI agents, it is the difference between a map and a maze.**

- **Ranked, not exhaustive.** An agent spends its limited context on the pages you marked important — reading `Optional` last — instead of crawling and guessing.
- **Clean content, one hop away.** Links that resolve to `.md` give an agent prose it can quote and cite without stripping a DOM.
- **Discovery it can trust.** Paired with an [agent card](/.well-known/agent-card.json) and an [OpenAPI spec](/openapi.json), `llms.txt` is the human-readable entry point to a machine-readable surface.
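
The "ranked, not exhaustive" behaviour can be sketched in a few lines. `plan_fetches` is a hypothetical helper, and capping by page count is a stand-in for a real token budget, which an agent would only know after fetching.

```python
def plan_fetches(sections: dict[str, list[str]], max_pages: int) -> list[str]:
    """Order URLs so `Optional` links are read last, then cap the list."""
    primary = [
        url
        for name, urls in sections.items()
        if name != "Optional"
        for url in urls
    ]
    optional = sections.get("Optional", [])
    return (primary + optional)[:max_pages]


sections = {
    "Docs": [
        "https://example.com/quickstart.md",
        "https://example.com/api.md",
    ],
    "Optional": ["https://example.com/changelog.md"],
}

# With room for only two pages, the changelog is the one that gets dropped.
print(plan_fetches(sections, max_pages=2))
```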

The verdict: do not ship `llms.txt` for SEO. Ship it because the agents you and your users deliberately invoke will read it — and because curating it makes your site sharper regardless.

## Which sites benefit most

`llms.txt` pays off most where agents are *already* being pointed at your content:

1. **Developer documentation.** The original use case and still the strongest: docs are read by IDE assistants and coding agents constantly, so a curated map plus `.md` twins is a direct upgrade. (Stripe, Anthropic, Cursor, and Mintlify were early adopters.)
2. **API products and SDKs.** When someone is wiring an agent into your API, an `llms.txt` that points straight at the reference and an OpenAPI spec shortens the path from "curious" to "calling."
3. **Agentic and AI-native apps.** If your product *is* something agents connect to over MCP, `llms.txt` is the natural discovery beacon for the rest of your surface.
4. **Knowledge bases and docs-heavy SaaS.** Support copilots and "ask our docs" assistants benefit from a curated entry point rather than a brittle crawl.
5. **Open-source projects.** A docs site with a clean `llms.txt` lets a contributor's coding agent load the right context instantly.

The common thread: value tracks *deliberate* agent traffic, not passive search crawlers. If nobody points an agent at your site, your `llms.txt` sits unread, which is exactly what the Ahrefs numbers above measure.

## How to keep it in sync

The fastest way to make `llms.txt` useless is to write it by hand and let the site move out from under it. Two habits keep it honest:

- **Generate it, do not author it.** Treat `llms.txt` as a build artifact derived from the same source as your nav, docs index, or sitemap — so a new page shows up in all of them at once. Hand-edited files drift within a sprint.
- **Serve real `.md` twins, automatically.** The links are only as good as what they resolve to. If each page can emit a clean Markdown version of itself, the map and the territory never disagree.
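
As a sketch of the "build artifact" idea, the function below renders `llms.txt` from one list of pages, the same list that could feed your nav or sitemap. `Page` and `render_llms_txt` are hypothetical names, and the sketch assumes each page already carries a section, a `.md` URL, and a one-line description.

```python
from dataclasses import dataclass


@dataclass
class Page:
    section: str       # e.g. "Docs" or "Optional"
    title: str
    url: str           # assumed to already point at the .md twin
    description: str


def render_llms_txt(title: str, summary: str, pages: list[Page]) -> str:
    """Render an llms.txt body from one shared list of pages."""
    out = [f"# {title}", "", f"> {summary}", ""]
    sections: dict[str, list[Page]] = {}
    for page in pages:
        sections.setdefault(page.section, []).append(page)
    # Emit `Optional` last so the reserved section keeps its meaning.
    names = [name for name in sections if name != "Optional"]
    if "Optional" in sections:
        names.append("Optional")
    for name in names:
        out.append(f"## {name}")
        for page in sections[name]:
            out.append(f"- [{page.title}]({page.url}): {page.description}")
        out.append("")
    return "\n".join(out)


pages = [
    Page("Docs", "Quickstart", "https://example.com/quickstart.md",
         "Get running in five minutes."),
    Page("Optional", "Changelog", "https://example.com/changelog.md",
         "Version history."),
    Page("Docs", "API reference", "https://example.com/api.md",
         "Every endpoint and field."),
]
print(render_llms_txt("Project Name", "A short summary.", pages))
```

Run it in CI and write the result to your site root, so a page added to the source list appears in the map on the next deploy.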

This is exactly where a Markdown-native system has an unfair advantage: if your content is *already* Markdown, the `.md` twin is the source, not a lossy export, and the index is a query, not a copy-paste.

## How MDflow fits

We did not bolt `llms.txt` on as a growth tactic. [MDflow](/) was built on the same bet the whole machine-readable-web stack rests on — that content should be **portable Markdown that people and agents can both read** — so the discovery layer fell out naturally.

### What lines up today

- **A real, curated [`llms.txt`](/llms.txt).** It follows the spec — an H1, a blockquote summary, and `##` file-lists (Docs, Comparisons, Blog, API summary), each link described. It is the worked example in this very post.
- **`.md` twins of every shared document.** Append `.md` to any shared MDflow link and you get clean Markdown with YAML frontmatter (`title`, `canonical_url`, `md_url`, `visibility`) and open CORS — the precise convention `llms.txt` asks you to pair with. (This post has one; see the link at the top.)
- **A full discovery surface, not just one file.** Alongside `llms.txt`, MDflow ships a self-contained [`docs.md`](https://mdflow.cz/docs.md) agent manual, an [A2A agent card](/.well-known/agent-card.json), and an [OpenAPI 3.1 spec](/openapi.json) — completing the trilogy with our [OKF](/blog/google-open-knowledge-format-okf) (knowledge) and [MCP/A2A](/blog/mcp-and-a2a-agentic-interfaces) (transport) posts.
- **Agents that read and write.** The pages `llms.txt` advertises are backed by a live [MCP server](/docs/mcp) and [HTTP API](/docs/api) — the high-intent, deliberate traffic where `llms.txt` genuinely pays off.
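
On the consuming side, an agent that fetches a `.md` twin can separate the frontmatter from the body before quoting it. This is a minimal sketch with a hypothetical `split_frontmatter` helper. It only handles flat `key: value` pairs, and the sample reuses this post's own frontmatter fields rather than a live response.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a leading `---` YAML block from the Markdown body.

    Handles flat `key: value` pairs only; use a real YAML parser for
    anything nested.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return {}, text  # no closing delimiter: treat it all as body


TWIN = """\
---
canonical_url: https://mdflow.cz/blog/llms-txt-explained
md_url: https://mdflow.cz/blog/llms-txt-explained.md
---

# llms.txt Explained
"""

meta, body = split_frontmatter(TWIN)
print(meta["canonical_url"])
print(body)
```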

### Where we are headed

This is **direction, not a dated commitment**:

- **Per-workspace `llms.txt` and `llms-full.txt`.** As workspaces mature, letting a published workspace or [collection](/faq) emit its own curated `llms.txt` — and an inlined `llms-full.txt` — so an agent can pull a whole knowledge set through one front door.
- **Auto-curated indexes.** Generating the link list and its descriptions from folder descriptions, so the map stays in sync with the content by construction rather than by discipline.
- **Richer raw output.** Continuing to make the `.md` twins the canonical, agent-ready representation of everything you publish.

If the agent web standardizes on `llms.txt`, the sites that win are the ones where Markdown was the substrate all along — not a file someone remembers to update by hand.

## The bottom line

`llms.txt` is a good idea wrapped in bad marketing. It is not the "robots.txt for AI" — it controls nothing — and it is not an SEO lever, because the search engines that dominate today's traffic do not read it. It *is* a clean, curated, nearly-free map for the agents you and your users deliberately point at your site, and the act of writing a good one makes your content sharper for everyone. So ship one — just ship a *good* one: short, described, pointed at real Markdown, and generated so it never goes stale.

MDflow gives you that for free: write Markdown in the browser, and every document is already agent-ready — raw `.md`, a discovery index, and a live MCP server, all in sync.

[Start free](/login) · [Connect an AI agent](/docs/mcp) · [Read the API docs](/docs/api)

## Frequently asked questions

### What is llms.txt?

`llms.txt` is a Markdown file at the root of your domain (`yoursite.com/llms.txt`) that gives AI agents a curated, plain-text map of your most important pages. Jeremy Howard of Answer.AI proposed it in September 2024. The format is simple: an H1 with your project name, a blockquote summary, then sections that list links as a name, a URL, and a short description. It is an invitation an agent can read at inference time, not an access-control file.

### Does Google use llms.txt?

No. At Search Central Live in July 2025, Google's Gary Illyes said Google does not support `llms.txt` and has no plans to crawl it, and John Mueller compared it to the long-ignored keywords meta tag — because the site owner controls it and can game it. `llms.txt` is not a Google ranking signal, and there is no evidence it boosts AI Overviews or ChatGPT citations.

### How is llms.txt different from robots.txt and sitemap.xml?

They do different jobs. `robots.txt` is an access-control gate that tells crawlers what they may fetch. `sitemap.xml` is an exhaustive XML inventory of every URL, built for search-engine indexing. `llms.txt` is a curated Markdown "best of" that points an AI agent at the few pages that actually explain your product. robots.txt restricts, sitemap.xml lists everything, and llms.txt curates — so you want all three.

### Is llms.txt worth shipping if AI search ignores it?

For most sites, yes, but for the right reason. An Ahrefs study of roughly 137,000 domains in May 2026 found that 97% of published `llms.txt` files received zero requests during the study window, so do not expect an SEO or citation boost. The real payoff is the agents you deliberately point at your site (IDE assistants like Cursor and Claude Code, MCP clients, and in-product copilots), which do fetch it, plus the discipline of curating your key pages and serving clean `.md` versions. It is cheap insurance, not a growth hack.

### What is the difference between llms.txt and llms-full.txt?

`llms.txt` is a short index: links to your key pages with one-line descriptions, so an agent can choose what to load. `llms-full.txt` is an optional companion that inlines the full Markdown of those pages in a single file, so an agent can ingest everything at once without following links. Use `llms.txt` as the map and `llms-full.txt` as the whole book — the latter only when your content is small enough to fit a context window.

## Further reading

- llms.txt — [The /llms.txt specification](https://llmstxt.org/)
- Answer.AI — [The original /llms.txt proposal](https://www.answer.ai/posts/2024-09-03-llmstxt.html)
- Ahrefs — [We analyzed 137K sites: 97% of llms.txt files never get read](https://ahrefs.com/blog/llmstxt-study/)
- Search Engine Journal — [Google says llms.txt comparable to keywords meta tag](https://www.searchenginejournal.com/google-says-llms-txt-comparable-to-keywords-meta-tag/544804/)
- AGENTS.md — [The open format for agent instructions](https://agents.md/)
- MDflow — [Markdown for AI agents](/markdown-ai) · [MCP documentation](/docs/mcp) · [API documentation](/docs/api) · [FAQ](/faq)

