The format that won, and the reader nobody planned for.
Markdown was a generational good idea. John Gruber shipped it in 2004 as a way to write structured prose that read fine in source form and rendered to clean HTML. Twenty years later it is the default for README files, GitHub issues, Substack posts, Slack messages, Linear tickets, and the configuration files that tell AI coding agents how to behave inside your repo. The CLAUDE.md at the root of a Claude Code project, the .cursorrules file at the root of a Cursor project, the llms.txt file at the root of a website intended to be agent-readable, all of them are Markdown by convention.
None of those conventions were designed with the assumption that the file's primary reader would be a large language model running inside a coding agent. They were designed for humans, who happen to be a tolerant kind of parser. A human reading ## Setup understands "this is the start of a setup section" instantly. A human reading **important** reads emphasis. A human can tell a code block from a quote from a list item, even with imperfect spacing, even with broken nesting, even with the kind of mixed indentation that would crash a strict parser.
Models are also tolerant, but tolerance is not free. Every ambiguous chunk an agent has to interpret is tokens spent guessing instead of acting. The more your context file leans on visual convention rather than structural signal, the more compute the agent burns reconstructing structure that could have been declared once at the top of the file.
HTML declares structure. That is the entire argument. The rest of this article is what that means in practice.
What HTML gives you that Markdown does not.
HTML is verbose. That is the cost. The benefits compound across five axes.
1. Typed regions instead of indented prose.
A Markdown section break is a heading line. The reader's job is to look at the line, count the hashes, and infer "this opens a section that ends when another heading of equal or higher level appears, or when the document ends." HTML eliminates the inference. <section id="setup">...</section> declares the boundary explicitly, gives the region an addressable id, and lets a tool select it by id rather than reconstruct its extent from heading depths. For a 2,000-line context file, that is the difference between a model loading the right slice in one operation and re-reading the whole file to find it.
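As a rough illustration of what "addressable" means, here is a minimal sketch that pulls one section out of a document by id using only Python's standard library. The class name `SectionExtractor`, the helper `section_text`, and the sample markup are hypothetical, and the sketch assumes every <section> is explicitly closed.

```python
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Collect the text inside the <section> whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # how many <section> tags are open inside the target
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "section":
            return
        if self.depth > 0:
            self.depth += 1  # a nested section inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1   # entered the target section

    def handle_endtag(self, tag):
        if tag == "section" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def section_text(markup, target_id):
    parser = SectionExtractor(target_id)
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


doc = """
<article>
  <section id="setup"><h2>Setup</h2><p>Run the installer.</p></section>
  <section id="rules"><h2>Constraints</h2><p>Never read SQLite rows by index.</p></section>
</article>
"""
print(section_text(doc, "rules"))
```

The parser still walks the file, but where the section starts and ends is read from the markup instead of being inferred from heading depth.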
2. Definitions that look like definitions.
In Markdown, a glossary is a bullet list with colons in it. In HTML, a glossary is a <dl> with <dt> term and <dd> definition pairs. With a little CSS the two can look the same in a browser. To a parser, the second is unambiguously a definition list. Wrap the first mention of a technical term in <dfn> and any tool building a glossary or index can lift it directly.
- CLAUDE.md: The conventional Markdown file at a project root that an AI coding agent reads first to understand the codebase. Equivalent to a README written for an agent rather than a human.
- llms.txt: A proposed convention (llmstxt.org) for a Markdown file at a website's root that lists the resources an AI agent should crawl. A table of contents for machine readers.
- JSON-LD: A JSON-based syntax for embedding linked data (Schema.org vocabulary) inside an HTML page. The values are invisible to humans, structured for machines.
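To show how little work a glossary tool has to do when terms are marked up as a definition list, here is a small sketch using Python's standard `html.parser`. The class name `GlossaryParser` and the shortened definitions are hypothetical, and the sketch assumes every <dt> and <dd> has an explicit end tag.

```python
from html.parser import HTMLParser


class GlossaryParser(HTMLParser):
    """Turn <dt>/<dd> pairs into a {term: definition} dict."""

    def __init__(self):
        super().__init__()
        self.glossary = {}
        self.current_tag = None   # "dt" or "dd" while inside one
        self.term = None
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self.current_tag = tag
            self.buffer = []

    def handle_endtag(self, tag):
        if tag != self.current_tag:
            return  # inline tags such as </dfn> are ignored
        text = " ".join("".join(self.buffer).split())
        if tag == "dt":
            self.term = text
        elif self.term is not None:
            self.glossary[self.term] = text
        self.current_tag = None

    def handle_data(self, data):
        if self.current_tag:
            self.buffer.append(data)


markup = """
<dl>
  <dt><dfn>CLAUDE.md</dfn></dt>
  <dd>The Markdown file an AI coding agent reads first.</dd>
  <dt><dfn>llms.txt</dfn></dt>
  <dd>A proposed Markdown table of contents for machine readers.</dd>
</dl>
"""
parser = GlossaryParser()
parser.feed(markup)
parser.close()
print(parser.glossary)
```

Doing the same against a bullet list would mean guessing where each term ends and its definition begins.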
3. Embedded structured data.
This is the largest single advantage. HTML lets you place a JSON-LD block in the head of the document that asserts machine-readable facts about the page. A SoftwareApplication schema can list the app's name, version, supported platforms, license, and download URL. A TechArticle schema can list the headline, publish date, author, and word count. A FAQPage schema can list every question and answer in a form a model can lift verbatim.
None of this is visible to a human reading the page in a browser. All of it is parseable by any tool that knows Schema.org, a vocabulary the major search engines have shared since 2011, and by any coding agent that can parse JSON. The page you are reading right now ships four JSON-LD blocks for this reason. Open the source if you want to see them.
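A minimal sketch of how a tool might lift those facts, using only Python's standard library. The class name `JsonLdExtractor` is hypothetical, and the sample page is a reduced stand-in that reuses the project name and description from the example later in this article, not this page's real source.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Parse every <script type="application/ld+json"> block in a page."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.blocks.append(json.loads("".join(self.buffer)))
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)


page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@type": "SoftwareApplication",
 "name": "ai-jobs.careers",
 "description": "A FastAPI job board that scores every role 0 to 100 for AI agency."}
</script>
</head><body><p>Visible prose.</p></body></html>
"""
extractor = JsonLdExtractor()
extractor.feed(page)
extractor.close()
for block in extractor.blocks:
    print(block["@type"], "->", block["name"])
```

The extractor never touches the visible prose, so the facts stay extractable however the body is rewritten.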
4. Inline rendered demos.
Markdown can embed images. Markdown cannot embed a working dashboard, an animated diagram, or a CSS-styled mockup of the thing the article is describing. HTML can.
The dashboard embedded in this page is not a screenshot. It is HTML, CSS, and a sprinkle of SVG, written into this page. A Markdown source file describing the ffmpeg-render-pro tool could link to a screenshot of the dashboard or paste an ASCII art version of it. The HTML source can render the dashboard itself, animations and all, with no external assets. When a future maintainer reads this article, the demo is already running. When an agent reads the same article, the markup it parses includes the structural intent (worker, progress, status, log) labeled clearly with classes and ARIA attributes.
5. Hierarchical metadata that survives copy-paste.
Microdata attributes like itemscope, itemtype, and itemprop attach machine-readable typing to specific elements inside the prose. Wrap a person's name in <span itemprop="author"> and that name is now extractable even when the surrounding sentence is restructured. Wrap a software version in an element with itemprop="softwareVersion" and the same holds. Markdown has no equivalent. The closest you can get is a YAML front-matter block at the top of the file, which is brittle and entirely separate from the prose body.
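The "survives restructuring" claim can be sketched in a few lines of Python. The extractor below is a simplification: it ignores itemscope nesting and just collects the text of every element that carries an itemprop. The class name `ItempropExtractor`, the helper `extract_props`, and the author name "Ada Example" are hypothetical.

```python
from html.parser import HTMLParser


class ItempropExtractor(HTMLParser):
    """Collect the text of each element that carries an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self.active = None   # (tag, itemprop name) currently being captured
        self.depth = 0       # open tags with the same name as the active one
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if self.active:
            if tag == self.active[0]:
                self.depth += 1
            return
        name = dict(attrs).get("itemprop")
        if name:
            self.active, self.depth, self.buffer = (tag, name), 1, []

    def handle_endtag(self, tag):
        if self.active and tag == self.active[0]:
            self.depth -= 1
            if self.depth == 0:
                text = " ".join("".join(self.buffer).split())
                self.props.setdefault(self.active[1], []).append(text)
                self.active = None

    def handle_data(self, data):
        if self.active:
            self.buffer.append(data)


def extract_props(markup):
    parser = ItempropExtractor()
    parser.feed(markup)
    parser.close()
    return parser.props


before = ('<p>Written by <span itemprop="author">Ada Example</span> for version '
          '<span itemprop="softwareVersion">0.110</span>.</p>')
after = ('<p>Version <span itemprop="softwareVersion">0.110</span> notes, '
         'courtesy of <span itemprop="author">Ada Example</span>.</p>')

print(extract_props(before))
print(extract_props(before) == extract_props(after))
```

Both sentences yield the same properties even though the wording and order changed.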
A working example.
Here is a small section of a real CLAUDE-style context file expressed twice. The Markdown version comes first and is how most projects write this today. The HTML version follows and is what an agent ingests with less guesswork.
CLAUDE.md (Markdown)

    # Project: ai-jobs.careers

    A FastAPI job board that scores every role 0 to 100 for AI agency.

    ## Stack
    - Python 3.11
    - FastAPI 0.110
    - SQLite + FTS5
    - Render Standard ($25/mo)

    ## Hot paths
    - The extractor lives in app/agents/extractor.py
    - The scoring rubric is in app/agents/rubric.py

    ## Don't do this
    - Read SQLite rows by index
    - Edit Render env vars during a pipeline window

CLAUDE.html (HTML, same content, abridged)

    <article itemscope itemtype="..SoftwareApplication">
      <header>
        <h1 itemprop="name">ai-jobs.careers</h1>
        <p itemprop="description">A FastAPI job board that scores every role 0 to 100 for <dfn>AI agency</dfn>.</p>
      </header>
      <section id="stack" aria-label="Tech stack">
        <h2>Stack</h2>
        <ul>
          <li><span itemprop="runtimePlatform">Python 3.11</span></li>
          <li>FastAPI <span itemprop="softwareVersion">0.110</span></li>
        </ul>
      </section>
      <section id="rules" data-priority="high">
        <h2>Constraints</h2>
        <p>Never read SQLite rows by index.</p>
      </section>
    </article>

The HTML version is verbose. It is also self-describing in a way the Markdown version is not. An agent reading the HTML immediately knows "this is a SoftwareApplication record, the name is ai-jobs.careers, the runtime platform is Python 3.11." It does not have to infer that from a bullet list under a heading. The data-priority="high" attribute on the constraints section is a hint the agent can elevate when deciding what to remember. The aria-label attribute is accessibility metadata that doubles as section labeling for machine readers.
Most importantly, the HTML version still renders to clean readable prose in a browser. A human opening CLAUDE.html in a browser sees a tidy page with headings and lists. A human opening it in a text editor sees more verbose source. That is the tradeoff: slightly more work to write, every benefit at the read.
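How an agent-side tool might use the section attributes from the example can be sketched as follows. data-priority is this article's own convention, not a standard attribute, and the class name `SectionIndex`, the helper `index_sections`, and the "normal" default are assumptions made for illustration.

```python
from html.parser import HTMLParser


class SectionIndex(HTMLParser):
    """Record id, label, and priority for every <section> start tag."""

    def __init__(self):
        super().__init__()
        self.sections = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            attributes = dict(attrs)
            self.sections.append({
                "id": attributes.get("id"),
                "label": attributes.get("aria-label"),
                # Treating a missing data-priority as "normal" is an assumption.
                "priority": attributes.get("data-priority", "normal"),
            })


def index_sections(markup):
    parser = SectionIndex()
    parser.feed(markup)
    parser.close()
    return parser.sections


context = """
<article>
  <section id="stack" aria-label="Tech stack"><h2>Stack</h2></section>
  <section id="rules" data-priority="high"><h2>Constraints</h2></section>
</article>
"""
sections = index_sections(context)
high_priority = [s["id"] for s in sections if s["priority"] == "high"]
print(high_priority)
```

A loader could read the high-priority sections first and fetch the rest on demand.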
Where this matters most.
HTML over Markdown is not a religious position. It is a tool choice that pays off when the content has any of these properties:
- Long-lived agent context. A CLAUDE.html that lives at a repo root for a year and is read by dozens of agent sessions. The structural clarity compounds.
- Reference documentation. API docs, schema docs, runbook docs. Anything that will be queried by section.
- Glossary-heavy domains. Anywhere you find yourself defining terms more than three times. Definition lists are a category win.
- Visual specifications. UI specs, dashboard layouts, state machine diagrams. The inline rendered demo replaces a folder full of screenshots that go stale.
- FAQ content. FAQPage is a high-leverage schema for search and one of the easiest for an agent to consume. If your context file has any question and answer structure, declare it.
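For the FAQ case, declaring the structure can be as small as the sketch below, which builds a Schema.org FAQPage object from question and answer pairs and wraps it in a script tag. The helper names `faq_jsonld` and `as_script_tag` are hypothetical, and the sample pair paraphrases a point made later in this article.

```python
import json


def faq_jsonld(pairs):
    """Build a Schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialize JSON-LD for embedding in an HTML document."""
    # Escape "</" so answer text cannot close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


pairs = [
    ("Does GitHub render README.html?",
     "No. Keep README.md and link from it to a sibling HTML file."),
]
data = faq_jsonld(pairs)
tag = as_script_tag(data)
print(tag)
```

Run the output through a validator before shipping it, since this sketch only guarantees well-formed JSON, not a complete record.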
It pays off less when the content is genuinely throwaway: a one-off PR description, a tweet draft, a session-scoped scratch note. Use Markdown there. The cost of writing HTML for content that exists for ninety seconds is not worth the structural payoff.
The honest case against.
HTML has real downsides and pretending otherwise weakens the argument. Three honest objections.
It is harder to write by hand. Closing tags double the keystrokes. Even with editor support, the cognitive overhead per paragraph is real. The mitigation is to use a Markdown source and a Markdown-to-HTML converter on save, or to author in a WYSIWYG editor that emits semantic HTML. The author-time cost is the cost. The benefit accrues at every later read.
Diff noise. Pull requests that touch HTML look heavier than Markdown PRs because tag changes appear as line edits. Reviewers learn to read past it but the first impression is more cluttered. There is no clean fix, only the choice of whether the structural payoff is worth the diff hit.
GitHub does not render README.html. If you replace README.md with README.html, your repo front page on GitHub becomes a raw file dump. The fix is to keep README.md and link from it to a sibling CONTEXT.html or DEEP.html for the rich agent-facing content. Same approach for CLAUDE.md to CLAUDE.html.
"The agent reads what you give it. The richer the structure, the less work it does to find the part that matters."
How to migrate, concretely.
A pragmatic migration from a Markdown context file to an HTML one takes about 45 minutes for a typical CLAUDE.md of 300 to 600 lines.
- Start with the semantic skeleton. Create CLAUDE.html. Put <article> around the whole document. Put <header> around the project name and one-line description. Each existing ## heading becomes a <section id="..."> wrapping its content.
- Carry through the heading hierarchy. ## becomes <h2>, ### becomes <h3>. Resist the urge to flatten. Nested sections give parsers a tree they can traverse.
- Convert lists thoughtfully. A "terms" list becomes a <dl>. A "steps" list becomes an <ol>. A general bullet list stays <ul>. The items themselves do not change, but each list now carries the right semantics.
- Tag commands precisely. Triple backtick blocks become <pre><code class="language-bash"> or whatever the language is. Inline commands get <code>, with <kbd> for keys the user presses and <samp> for sample output.
- Add at least one JSON-LD block. Pick the schema that fits: SoftwareApplication if it documents an app, TechArticle if it is a write-up, HowTo if it is a procedure. Fill in the obvious fields. Validate at validator.schema.org. Forty lines of JSON-LD adds zero visible weight and a lot of machine value.
- Wire the entry point. Keep CLAUDE.md but cut its body. Replace with a one-line pointer: "See CLAUDE.html for full context." Agents that read HTML follow the link. Agents that only read Markdown still get the pointer.
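The first two steps are mechanical enough to script. Here is a minimal sketch, not a full Markdown converter: it handles only a # title, ## headings, "- " bullets, and plain paragraphs, and passes anything else (including ### headings) through as paragraph text. The function names `slugify` and `markdown_to_skeleton` are hypothetical.

```python
import html
import re


def slugify(title):
    """Lowercase a heading and join its alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def markdown_to_skeleton(md):
    out = ["<article>"]
    in_section = False
    in_list = False

    def close_list():
        nonlocal in_list
        if in_list:
            out.append("</ul>")
            in_list = False

    for line in md.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("## "):
            close_list()
            if in_section:
                out.append("</section>")   # a new heading ends the previous section
            title = line[3:]
            out.append(f'<section id="{slugify(title)}">')
            out.append(f"<h2>{html.escape(title)}</h2>")
            in_section = True
        elif line.startswith("# "):
            out.append(f"<header><h1>{html.escape(line[2:])}</h1></header>")
        elif line.startswith("- "):
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{html.escape(line[2:])}</li>")
        else:
            close_list()
            out.append(f"<p>{html.escape(line)}</p>")

    close_list()
    if in_section:
        out.append("</section>")
    out.append("</article>")
    return "\n".join(out)


sample = """# Project: ai-jobs.careers
## Stack
- Python 3.11
- FastAPI 0.110
## Hot paths
- The extractor lives in app/agents/extractor.py
"""
result = markdown_to_skeleton(sample)
print(result)
```

The output is only a starting skeleton. Choosing <dl> or <ol> over <ul>, tagging commands, and writing the JSON-LD still need a human pass.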
The full HowTo for this migration is in the JSON-LD at the top of this page. Open the source, search for "@type": "HowTo". The steps are machine-readable. An agent given this page could follow them programmatically.
Where this is going.
Three predictions, none of them bold.
First, CLAUDE.html will become a convention within twelve months. Not because of a coordinated announcement but because the teams running large agent fleets will quietly find that the parser stability of HTML beats the friction of richer Markdown extensions. The Anthropic skills repo at github.com/anthropics/skills already uses richer structure than the documentation conventions of three years ago. The drift toward typed regions is happening.
Second, llms.txt and llms-full.txt will keep their Markdown form but the documents they point at will trend HTML. llms.txt is a table of contents. The format that wins for tables of contents is plain text or shallow Markdown. The format that wins for the rich documents the table points at is HTML with embedded structured data.
Third, the gap between human-only readers and agent readers will narrow to zero. Every README will be read by both. Every API doc will be read by both. Every project spec will be read by both. The job of the writer becomes "serve both audiences with one document." HTML is the only format we currently have that pulls this off cleanly: it renders to readable prose in a browser, ships embedded metadata for machines, and supports interactive demos that document themselves.
The pragmatic recommendation.
You probably do not need to migrate every Markdown file in your repository today. You do not even need to rename CLAUDE.md to CLAUDE.html. The pragmatic move is smaller.
Pick the one document in your project that gets read most often by an agent. The CLAUDE.md at the repo root, or the SPEC.md for the feature you are shipping next quarter, or the RUNBOOK.md your oncall uses at 3 AM. Convert that one. Keep its old name as a one-line pointer. Watch the agent's behavior on that document over the next month.
If the agent gets to the right section faster, makes fewer wrong assumptions, and stops asking for clarification on things the document already covered, you have your answer. Roll the pattern out to the next document. Then the one after that. In a year, your repo will be a mix of Markdown and HTML where the HTML files are exactly the ones that earn the structural overhead. That is the right ratio.
The format that wins is the one that serves both readers, the human at 11 PM and the agent at 3 AM, without making either one work harder than they have to. HTML, for the documents that matter most, is that format.