# t0md (to Markdown)

## What this is

t0md.com is a free document-to-Markdown converter for humans and AI agents. The
website accepts uploads and returns clean Markdown. A remote MCP server lets AI
agents (Claude Code, Cursor, Windsurf, ChatGPT-with-MCP) call the same conversion
directly.

## Roadmap

| Format                 | Status        |
|------------------------|---------------|
| PDF                    | Live          |
| HTML                   | Live          |
| Microsoft Word (DOCX)  | Live          |
| PowerPoint (PPTX)      | Live          |
| EPUB                   | Coming soon   |
| XLSX                   | Coming soon   |
| Image (OCR)            | Coming soon   |
| URL / webpage          | Coming soon   |

## For LLM agents

t0md exposes the MCP server on two transports at the same host. Pick whichever
your client supports:

- **Streamable-HTTP (recommended):** `https://t0md.com/mcp`. This is the current
  MCP spec transport: POST a JSON-RPC message and receive a single JSON response.
  Used by Hermes, modern Claude Desktop, Cursor 0.42+, the VS Code MCP extension,
  and Continue.dev.
- **SSE (legacy):** open the event stream at `https://t0md.com/mcp/sse`, then POST
  JSON-RPC messages to `/mcp/messages?sessionId=…`. Used by Claude Code's
  `--transport sse` and by OpenClaw by default.
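
For clients without built-in MCP support, the Streamable-HTTP exchange can be sketched with the Python standard library. This is a minimal illustration, not an official client: the helper names `build_jsonrpc_request` and `send` are hypothetical, and whether the server expects an `initialize` handshake before other methods is an assumption to check against the MCP spec.

```python
import json
import urllib.request

MCP_URL = "https://t0md.com/mcp"  # Streamable-HTTP endpoint listed above


def build_jsonrpc_request(method, params=None, request_id=1):
    """Return a urllib Request carrying one JSON-RPC 2.0 message (hypothetical helper)."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(message).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The MCP Streamable-HTTP transport asks clients to accept both styles.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def send(request, timeout=30):
    """POST the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_jsonrpc_request("tools/list"))` would then ask the server for its tool list, assuming no prior handshake is required.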

### Install per agent

Claude Code (recommended):
```
claude mcp add --transport http t0md https://t0md.com/mcp
```

Claude Code (legacy SSE, for older releases):
```
claude mcp add --transport sse t0md https://t0md.com/mcp/sse
```

Claude Desktop / Cursor / Windsurf / VS Code (`mcpServers` JSON config):
```
"t0md": { "url": "https://t0md.com/mcp", "transport": "streamable-http" }
```

OpenClaw:
```
openclaw mcp set t0md '{"url":"https://t0md.com/mcp/sse"}'
```

Hermes (YAML):
```
mcp_servers:
  t0md:
    url: "https://t0md.com/mcp"
```

### Tools

Primary tool: **`convert_to_markdown`**. It accepts PDF, HTML, DOCX, and PPTX; the
format is auto-detected from the filename and content. Pass either `file_base64`
or `upload_id`:
- `file_base64` (string): base64 of the document.
- `upload_id` (string): reference returned by `POST /mcp/upload`. Preferred for
  files larger than ~2 MB, so the model does not spend its output token budget
  emitting base64.
- `filename` (string): required with `file_base64` so the server can detect the
  format; ignored when `upload_id` is used.

Backward-compatibility alias: **`convert_pdf_to_markdown`**. It handles PDF only
and takes `pdf_base64` / `pdf_upload_id` / `filename`. New integrations should use
`convert_to_markdown` instead.

Both tools return the Markdown inline plus a one-time download URL valid for
10 minutes.
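
As a rough sketch of what a `convert_to_markdown` call looks like on the wire, the function below builds the JSON-RPC `tools/call` message. The `tools/call` envelope follows the MCP spec; the helper name `build_convert_call` and its argument checks are illustrative assumptions, not part of t0md.

```python
import base64


def build_convert_call(data=None, filename=None, upload_id=None, request_id=1):
    """Build a JSON-RPC `tools/call` message for `convert_to_markdown` (hypothetical helper)."""
    if (data is None) == (upload_id is None):
        raise ValueError("pass exactly one of data or upload_id")
    if upload_id is not None:
        # filename is ignored by the server when an upload_id is supplied.
        arguments = {"upload_id": upload_id}
    else:
        if not filename:
            raise ValueError("filename is required with file_base64")
        arguments = {
            "file_base64": base64.b64encode(data).decode("ascii"),
            "filename": filename,
        }
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "convert_to_markdown", "arguments": arguments},
    }
```

The returned dictionary can be serialized with `json.dumps` and POSTed to the MCP endpoint.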

## HTTP API

`POST /convert`
- Content-Type: multipart/form-data
- Form field `file`: a PDF, HTML, DOCX or PPTX (max 25 MB)
- Response: JSON `{ markdown, filename, pages, bytes, duration_ms }`
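
A minimal sketch of a `POST /convert` client using only the Python standard library, which has no built-in multipart encoder. The helper names are hypothetical, and the `application/octet-stream` part type is an assumption; the server may prefer a format-specific type.

```python
import json
import urllib.request
import uuid

CONVERT_URL = "https://t0md.com/convert"


def encode_multipart_file(field, filename, data, content_type="application/octet-stream"):
    """Encode one file part; return (body_bytes, content_type_header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_convert_request(filename, data):
    """Build the POST request with the document in form field `file`."""
    body, header = encode_multipart_file("file", filename, data)
    return urllib.request.Request(
        CONVERT_URL, data=body, headers={"Content-Type": header}, method="POST"
    )


def convert(filename, data, timeout=120):
    """Send the document and return the parsed JSON response."""
    with urllib.request.urlopen(build_convert_request(filename, data), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With this in place, `convert("paper.pdf", open("paper.pdf", "rb").read())["markdown"]` would yield the converted text.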

`POST /download`
- Form fields `markdown` and `filename`
- Response: a `.md` file attachment

`POST /mcp/upload`
- Multipart form, field `file` — stage a document for a later `upload_id` reference
- Response: JSON `{ upload_id, filename, bytes, expires_in_seconds }`
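
The staging flow can be sketched as a small decision: send small files inline, and stage large ones first. The cutoff constant reflects the "~2 MB" guidance in the Tools section, and its exact value is an assumption. `stage_upload` is a hypothetical callable that POSTs the file to `/mcp/upload` and returns the parsed JSON response.

```python
import base64

# "~2 MB" guidance from the Tools section; the exact value is an assumption.
INLINE_LIMIT_BYTES = 2 * 1024 * 1024


def choose_arguments(filename, data, stage_upload):
    """Return `convert_to_markdown` arguments, staging large files first.

    `stage_upload(filename, data)` is a hypothetical callable that uploads the
    file to /mcp/upload and returns a dict containing `upload_id`.
    """
    if len(data) <= INLINE_LIMIT_BYTES:
        return {
            "file_base64": base64.b64encode(data).decode("ascii"),
            "filename": filename,
        }
    staged = stage_upload(filename, data)
    return {"upload_id": staged["upload_id"]}
```

Keeping the upload step behind a callable keeps the size decision separate from the HTTP code.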

`GET /health`
- Response: `{ status: "healthy", service: "t0md", timestamp }`

## How conversion works

PDFs are extracted with poppler's `pdftotext -layout`. The raw text is then run
through heuristics to:

- collapse runs of blank lines
- detect bullet/numbered lists from leading glyphs
- promote ALL-CAPS short lines to `## headings`
- rejoin words split across lines by hyphenation
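
To make these heuristics concrete, here is an illustrative Python sketch of the four steps applied to `pdftotext -layout` output. It is not t0md's actual code: the bullet glyph set, the eight-word heading limit, and the names `is_heading` and `clean` are all assumptions.

```python
import re

# Assumed set of leading bullet glyphs; the real list may differ.
BULLET = re.compile(r"^\s*[•◦▪‣·*-]\s+(.*)$")
NUMBERED = re.compile(r"^\s*(\d+)[.)]\s+(.*)$")


def is_heading(line, max_words=8):
    """True for a short line whose letters are all upper-case."""
    text = line.strip()
    return bool(text) and len(text.split()) <= max_words and text.isupper()


def clean(raw):
    """Apply the four heuristics to raw extracted text and return Markdown."""
    # Rejoin words hyphenated across a line break ("conver-\nsion" -> "conversion").
    # Naive: a genuine hyphen at a line end is also removed.
    text = re.sub(r"(\w)-\n[ \t]*(\w)", r"\1\2", raw)
    out = []
    for line in text.split("\n"):
        line = line.rstrip()
        bullet = BULLET.match(line)
        numbered = NUMBERED.match(line)
        if not line:
            # Collapse runs of blank lines into a single blank line.
            if out and out[-1] == "":
                continue
            out.append("")
        elif bullet:
            out.append("- " + bullet.group(1))
        elif numbered:
            out.append(f"{numbered.group(1)}. {numbered.group(2)}")
        elif is_heading(line):
            out.append("## " + line.strip())
        else:
            out.append(line)
    return "\n".join(out).strip("\n")
```

List detection runs before heading detection so that a numbered all-caps line stays a list item.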

Conversion is best-effort rather than exact. For most academic papers, books, and
reports, the output is good enough to paste into an LLM context window.

## Privacy

Uploaded PDFs live in server memory only. Generated Markdown for MCP download
links is held for at most 10 minutes, then garbage-collected.
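
One way to picture this retention policy is an in-memory store with a time-to-live and one-time reads. The sketch below is purely illustrative and assumes nothing about how t0md is actually implemented; `ExpiringStore` and its methods are hypothetical names.

```python
import secrets
import time


class ExpiringStore:
    """In-memory, read-once store whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # token -> (expires_at, markdown)

    def put(self, markdown):
        """Store Markdown and return an unguessable token for the download link."""
        self.sweep()
        token = secrets.token_urlsafe(16)
        self._items[token] = (self._clock() + self._ttl, markdown)
        return token

    def take(self, token):
        """Return the Markdown once, or None if it is missing or expired."""
        entry = self._items.pop(token, None)
        if entry is None or entry[0] <= self._clock():
            return None
        return entry[1]

    def sweep(self):
        """Drop every entry whose expiry time has passed."""
        now = self._clock()
        for token in [t for t, (exp, _) in self._items.items() if exp <= now]:
            del self._items[token]
```

Nothing is written to disk in this model, so a process restart also clears every entry.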

## Authoring

Built by [Kirill Zubovsky](https://kirillzubovsky.com). Sister site:
[md2doc.com](https://md2doc.com) (Markdown → Word).
