
Google's Open Knowledge Format (OKF): What It Means for Developers and AI Agents

12 min read · by MDflow · view as .md

On June 12, 2026, Google Cloud quietly published a specification that may turn out to be one of the more consequential standards of the agentic era — and it arrived with almost none of the usual fanfare. It is called the Open Knowledge Format (OKF), and it is not a product, a platform, or an API. It is something smaller and far more durable: an agreed way to write down what an organization knows as plain Markdown files that any AI agent can read.

If you have spent the last year wiring foundation models into real workflows, you already know the punchline. The models are not the bottleneck anymore. Context is. OKF is Google's bet on how that context should be written down.

TL;DR — OKF packages knowledge as a directory of Markdown files with YAML frontmatter. The only required field is type. Agents read these files directly — and can update them — with no SDK, no proprietary catalog, and no special runtime. If you already write Markdown, you are most of the way there. MDflow has been built on exactly this model from day one.

What is the Open Knowledge Format?

The Open Knowledge Format is an open, vendor-neutral specification for representing curated knowledge as Markdown. An OKF "bundle" is simply a directory of .md files, where each file describes one concept — a database table, a metric, a dataset, an API, a runbook, a playbook — using YAML frontmatter for the few fields that need to be machine-readable, and ordinary Markdown for everything else.

The spec is deliberately tiny. It requires exactly one field on every concept: type. Beyond that, a small set of reserved fields is standardized so that bundles written by different producers can be consumed by different agents without translation:

  • type (required) — what kind of concept this is
  • title — the concept's name
  • description — a short summary
  • resource — a link to the underlying thing
  • tags — categorization labels
  • timestamp — an ISO 8601 time of last change

A single concept file looks like this:

---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---
# Schema

| Column      | Type   | Description                              |
| ----------- | ------ | ---------------------------------------- |
| order_id    | STRING | Globally unique order identifier.        |
| customer_id | STRING | FK to [customers](/tables/customers.md). |

# Joins

Joined with [customers](/tables/customers.md) on `customer_id`.
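
The "only type is required" rule is easy to check mechanically. Here is a minimal Python sketch, assuming flat `key: value` frontmatter like the example above. The helper names `split_frontmatter` and `validate_concept` are illustrative and not part of OKF, and a real tool would likely use a full YAML parser instead of this hand-rolled one.

```python
from datetime import datetime


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a document into (frontmatter fields, Markdown body).

    Simplification: only flat `key: value` lines are handled, and values
    are kept as raw strings (no nested YAML, no type conversion).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing '---' delimiter of the frontmatter block.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return {}, text
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields, "\n".join(lines[end + 1:])


def validate_concept(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    fields, _body = split_frontmatter(text)
    problems = []
    if not fields.get("type"):
        problems.append("missing required field: type")
    stamp = fields.get("timestamp")
    if stamp:
        try:
            # Older Python versions reject a trailing 'Z', so normalize it.
            datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"timestamp is not ISO 8601: {stamp}")
    return problems
```

Run over every .md file in a repository, a check like this could sit in the same CI that already guards the code.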

Bundles are organized as nested folders, with reserved index.md files for "progressive disclosure" as an agent navigates the hierarchy, and ordinary Markdown links ([customers](/tables/customers.md)) forming a cross-linked knowledge graph. That is the whole idea. As the authors put it, OKF is "just Markdown, just files, just YAML frontmatter."
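
To make the "cross-linked knowledge graph" concrete, the sketch below extracts Markdown links from a bundle held in memory and reports links that point at files the bundle does not contain. It assumes root-absolute link targets such as /tables/customers.md, as in the example; `link_graph` and `dangling_links` are hypothetical names, not something the spec defines.

```python
import re

# Inline Markdown links whose target ends in .md, optionally followed by an
# anchor, e.g. [customers](/tables/customers.md) or [orders](/tables/orders.md#joins)
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s#]+\.md)(?:#[^)\s]*)?\)")


def link_graph(bundle: dict[str, str]) -> dict[str, set[str]]:
    """Map each file path to the set of root-absolute .md paths it links to.

    `bundle` maps a path like "/tables/orders.md" to that file's text.
    Assumption: only links starting with "/" are treated as bundle links.
    """
    graph = {}
    for path, text in bundle.items():
        targets = {
            target
            for _label, target in MD_LINK.findall(text)
            if target.startswith("/")
        }
        targets.discard(path)  # ignore self-links
        graph[path] = targets
    return graph


def dangling_links(bundle: dict[str, str]) -> list[tuple[str, str]]:
    """List (source, target) pairs whose target file is not in the bundle."""
    return sorted(
        (source, target)
        for source, targets in link_graph(bundle).items()
        for target in targets
        if target not in bundle
    )
```

An agent could walk the same graph outward from an index.md file, loading a linked file only when the current one is not enough.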

OKF was released by Google Cloud tech leads Sam McVeety and Amir Hormati, alongside a reference enrichment agent that drafts OKF documents from BigQuery datasets, a static HTML visualizer, and three sample bundles. It is published as v0.1 on GitHub — explicitly "a starting point, not a finished standard." The guiding principle is that the value of a format comes from how many parties speak it, not from who owns it: format, not platform.

Why OKF is useful — for developers and AI agents

Every team building agents hits the same wall. A model can write code, summarize a document, or analyze a dataset, but it needs the right information to do any of it well — and that information is scattered across metadata catalogs, wikis, shared drives, code comments, notebooks, and the heads of senior engineers. Assembling it is the "context-assembly problem," and right now everyone solves it from scratch, in a bespoke and non-portable way.

OKF is useful because it makes that solution a format instead of a project.

For developers, it is refreshingly boring — in the best way.

  • No SDK, no integration, no lock-in. You write Markdown. It renders on GitHub, opens in any editor, and is grep-able and indexable by any tool you already use.
  • Knowledge becomes code. Because a bundle is just files, it lives in git: versioned, diffable in pull requests, reviewable, and subject to the same CI you already run. Knowledge stops rotting in a wiki nobody updates.
  • Producer/consumer independence. Whoever writes the knowledge (a human, a pipeline, an enrichment agent) is cleanly separated from whoever reads it (an agent, a visualizer, a search tool). Either side can change without breaking the other.
  • It survives migration. The same bundle works whether you switch clouds, models, or agent frameworks. You are not betting your knowledge base on one vendor's roadmap.

For AI agents, OKF is the difference between guessing and knowing.

  • Curated, not re-derived. This is the sharpest contrast with RAG. Retrieval re-derives meaning at query time from raw chunks and embeddings; an OKF bundle hands an agent curated, cross-linked concepts that someone (or something) already got right. The aim is fewer hallucinations and clearer provenance, since an answer can point back to a specific file.
  • Structured where it counts. The YAML frontmatter gives agents queryable fields — type, tags, timestamp — while the Markdown body carries the nuance. Agents can filter, then read.
  • Navigable. index.md files and Markdown cross-links let an agent traverse a knowledge graph the way a person clicks through a wiki, loading detail only when it needs it.
  • Agents can maintain it. As Andrej Karpathy observed about this pattern, "LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass." The bookkeeping that causes humans to abandon their wikis is exactly what agents are good at.

It is worth being precise about where OKF sits, because it is easy to confuse with adjacent standards. Discovery formats like llms.txt and agent cards help an agent find your surface. Transport protocols like the Model Context Protocol (MCP) define how an agent calls your tools and fetches data. Project files like AGENTS.md and CLAUDE.md carry instructions. OKF is none of these — it standardizes the knowledge itself. They stack; they do not compete.

Which applications will benefit most

OKF is general, but a few categories stand to gain immediately.

  1. Data and analytics platforms. This is OKF's home turf — Google's own reference implementation enriches BigQuery tables into OKF documents. Data catalogs, semantic and metric layers, and "talk to your warehouse" assistants all need exactly the typed, described, cross-linked concepts OKF encodes.
  2. Internal developer platforms. Runbooks, architecture decision records, service catalogs, and API references are knowledge that already wants to be Markdown in a repo. OKF gives them a shape that on-call agents and platform copilots can consume.
  3. AI coding assistants. Tools like Cursor, Claude Code, and Codex already thrive on repo-local Markdown context. A conformant bundle is a natural upgrade from ad-hoc CLAUDE.md notes to a structured, navigable knowledge base.
  4. Customer-facing copilots and support agents. Product documentation, policies, and troubleshooting guides become a governed, citable source of truth instead of a brittle prompt stuffed with pasted text.
  5. Knowledge management and "second brain" tools. Personal wikis, PKM apps, and Obsidian-style vaults are already Markdown, and the ones that are Markdown-native (rather than Markdown-as-an-export) will adopt OKF with the least friction.

The common thread: anywhere an agent needs durable, governed, citable context rather than a one-shot paste, OKF is a better substrate than a proprietary catalog or a pile of embeddings.

How MDflow fits

Here is the part that made us sit up. We did not build MDflow to implement OKF — the spec did not exist when we started. But MDflow already speaks OKF's native language, because we made the same core bet: knowledge should be portable Markdown that people and agents can read.

What already lines up today

Markdown-native storage. Every MDflow document is plain-text Markdown — not a proprietary format with an approximate "export to Markdown" button. What you write is what you keep, and what an agent reads. That is OKF's "just Markdown, just files" principle, in production.

Folders with descriptions — curated context, not just names. In MDflow, every folder carries a description that defines the intended context for the documents inside it. This is more than organization: it is the primary ranking signal our agent retrieval uses. It maps almost exactly onto OKF's index.md idea — a place to describe a region of the knowledge graph so an agent knows what it is looking at before it reads the details.

A context tool agents already call. MDflow's MCP server exposes mdflow_get_context: give it a topic and it scores folder descriptions first, then names and titles, and returns the most relevant Markdown bodies — readable context plus structured JSON. That is the consumer side of OKF, live today.

Raw Markdown with YAML frontmatter, over open CORS. Append .md to any shared MDflow link and you get the document as plain Markdown with YAML frontmatter (title, canonical_url, md_url, visibility) and Access-Control-Allow-Origin: *. An agent can fetch and cite a document in a single request — the same frontmatter-on-Markdown shape OKF formalizes. (This very post has a raw .md twin; look for the link at the top.)
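
As a rough illustration of the "append .md" convention, the Python sketch below turns a share link into its raw Markdown URL and fetches it with the standard library. The example.com URLs are placeholders rather than real MDflow links, the function names are hypothetical, and keeping the query string in place is an assumption on our part.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def raw_markdown_url(share_url: str) -> str:
    """Append .md to the path of a share link.

    Assumption: any query string or fragment is preserved unchanged.
    """
    parts = urlsplit(share_url)
    path = parts.path.rstrip("/")
    if not path.endswith(".md"):
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def fetch_markdown(share_url: str, timeout: float = 10.0) -> str:
    """Fetch the raw Markdown (frontmatter included) in a single GET request."""
    with urlopen(raw_markdown_url(share_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder link for illustration only:
# raw_markdown_url("https://example.com/s/abc123") gives "https://example.com/s/abc123.md"
```

The CORS header matters for agents running in a browser; a server-side script like this one can fetch the file without it.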

Producers and consumers, not just readers. OKF's whole premise is that agents should read and update knowledge. MDflow's HTTP API and MCP server, authenticated with a Personal Access Token, let Claude, ChatGPT, Cursor, and Codex create, update, move, organize, and share documents — and keep folder descriptions current. The "bookkeeping LLMs are good at" has somewhere to live.

Discovery already published. MDflow ships the surrounding layers OKF assumes someone else provides: an llms.txt index, a self-contained docs.md agent manual, an A2A agent card, and an OpenAPI 3.1 spec. Discovery, transport, and knowledge in one place.

Governance built in. Curated knowledge needs provenance and control. MDflow already has public links and private email-based sharing, collections that group documents independently of folders, anchored comments, automatic version history, server-side ownership checks, and optional client-side AES-256 encryption. Knowledge you can trust is knowledge you can govern.

Where we are headed — a roadmap aligned with OKF

OKF gave us a vocabulary for where MDflow was already going, and a clear north star for what comes next. The following is direction, not a dated commitment, but it is the shape of our thinking:

  • First-class document types and tags. Adopting OKF's type field (and reserved fields like tags and timestamp) as native, editable metadata on every document — so a folder of notes can become a set of typed concepts without leaving the editor.
  • OKF import and export. Round-tripping a folder or collection to and from a conformant OKF bundle — a tarball of .md files with index.md context — so your MDflow workspace and the wider OKF ecosystem can exchange knowledge freely.
  • A collections API and richer remote MCP. Serving a whole collection to agents as a cross-linked bundle over HTTP, so an agent can pull an entire curated knowledge set, not just one document at a time.
  • Workspaces as bundles. Personal workspaces (in progress today) give each context its own scoped set of folders — a natural unit to publish as an OKF bundle.
  • Capture-to-knowledge. The MDflow Web Clipper already turns web pages into clean Markdown; the next step is dropping clipped pages straight into a typed, agent-ready bundle.
  • Agent-assisted enrichment. Google's reference implementation auto-drafts OKF documents and enriches them with a second LLM pass. The same idea fits MDflow naturally: let an agent propose folder descriptions, types, and cross-links for knowledge you already have.
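
The export direction is simple enough to sketch with Python's standard library. This is not MDflow's export, which is only a direction at this point; it is a minimal illustration of packing a folder of .md files into a tarball, plus a lint that lists folders without an index.md. Whether every folder must carry an index.md is not stated here, so treat that second function as a convenience check rather than a conformance test. Both function names are hypothetical.

```python
import tarfile
from pathlib import Path


def export_bundle(root: Path, archive: Path) -> list[str]:
    """Pack every .md file under root into a gzip tarball.

    Returns the member names, which are paths relative to root.
    """
    members = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(root.rglob("*.md")):
            name = path.relative_to(root).as_posix()
            tar.add(path, arcname=name)  # store relative paths, not absolute ones
            members.append(name)
    return members


def folders_missing_index(root: Path) -> list[str]:
    """List folders (relative to root) that hold .md files but no index.md."""
    folders = {path.parent for path in root.rglob("*.md")}
    return sorted(
        folder.relative_to(root).as_posix()
        for folder in folders
        if not (folder / "index.md").exists()
    )
```

Because the archive contains only relative paths and plain text, unpacking it anywhere yields a directory that any Markdown tool or agent can read.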

If OKF becomes the lingua franca Google hopes it will, the tools that win are the ones where Markdown was the substrate all along — not a lossy export target. That is the bet MDflow made, and OKF is a strong signal we made the right one.

The bottom line

The Open Knowledge Format is small on purpose, and that is its strength. It does not ask you to adopt a platform, learn an SDK, or move your data. It asks you to write your knowledge as Markdown with a little structure on top, which is easy to produce, easy to consume, and easy to keep. For developers, that means knowledge that lives in git and is versioned, reviewed, and checked like code. For AI agents, it means curated, citable context instead of guesswork.

MDflow is where that knowledge can live for people and agents at the same time: write Markdown in the browser, give your folders meaning, share it cleanly, and connect your agents — all today, with a roadmap pointed straight at the OKF future.


Frequently asked questions

What is Google's Open Knowledge Format (OKF)?

OKF is an open, vendor-neutral specification, published by Google Cloud on June 12, 2026, for packaging organizational knowledge as a directory of Markdown files with YAML frontmatter. Each file describes one concept, such as a table, metric, runbook, or API. The only required field is type; the other reserved fields (title, description, resource, tags, timestamp) are optional. It needs no SDK, no database, and no proprietary account to read or write.

Is OKF the same as RAG?

No. Retrieval-augmented generation re-derives knowledge at query time by embedding and retrieving raw chunks. An OKF bundle stores curated, cross-linked, human- and agent-readable concepts that an agent reads — and can update — directly. They are complementary: you can still index a bundle for retrieval, but the knowledge itself stays structured and authoritative.

Do I need Google Cloud to use OKF?

No. Google describes OKF as a "format, not a platform." It is just Markdown, just files, and just YAML frontmatter, so it works in any git repo, any filesystem, and any editor. Google Cloud's Knowledge Catalog can ingest it, but OKF is deliberately vendor-neutral and requires no proprietary runtime.

How is OKF different from llms.txt, MCP, or AGENTS.md?

They solve different layers. llms.txt and agent cards are about discovery; MCP is about transport; AGENTS.md/CLAUDE.md carry project instructions. OKF is about the knowledge itself — the curated, typed, cross-linked content an agent reads. They stack together rather than compete.

How does MDflow relate to OKF?

MDflow already speaks OKF's native language. It stores documents as portable Markdown, organizes them in folders whose descriptions act as curated context, serves raw .md with YAML frontmatter over open CORS, and exposes the whole workspace to agents through an HTTP API and an MCP server. That is the producer-and-consumer model OKF describes — available today.

Further reading