Convert Research Paper PDF to Markdown

Academic PDFs are some of the worst documents to feed an LLM directly — multi-column layouts, footnotes, equations, citation noise. Convert to Markdown first and you get something every model and notes app can read properly.

Why convert PDFs to Markdown for this?

Research papers carry a lot of structure that matters: the abstract, the sections, the methodology, the results, the references. A PDF buries that structure in a fixed two-column layout that plain text extraction shreds: sentences interleave across columns, footnotes get spliced into body text, and tables flatten beyond recognition. Converting to Markdown puts the structure back. The abstract is its own section, methodology and results have clear headings, and the LLM (or your notes app, or your literature-review database) can address each section directly. Where the PDF had hyperlinked references, citations come through as Markdown links.

How to use t0md

Drop the paper PDF on t0md, then copy or download the Markdown. For a single-paper Q&A session, paste it into Claude or ChatGPT so the model can cite its answers by section heading. For literature review at scale, save the .md files into Obsidian, Notion or a Zettelkasten and link papers to each other by topic. For RAG over a paper corpus, convert all papers at ingest and embed by section, so retrieval can pull the relevant methodology or results chunk for a question.
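Embedding by section means splitting the Markdown on its headings first. The sketch below is one minimal way to do that, assuming the converted file uses ATX-style headings (lines starting with one to six `#` characters); that heading style and the function name `split_by_heading` are illustrative assumptions, not documented t0md behaviour.

```python
import re
from typing import List, Tuple

# Assumption: headings are ATX-style ("# Title", "## Methods", ...).
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_heading(markdown: str) -> List[Tuple[str, str]]:
    """Return (heading, body) pairs in document order.

    Text before the first heading is returned with an empty heading.
    Limitation: a line starting with '#' inside a fenced code block
    is also treated as a heading.
    """
    sections: List[Tuple[str, str]] = []
    title, body = "", []

    def flush() -> None:
        # Skip the empty preamble that precedes the first heading.
        if title or any(line.strip() for line in body):
            sections.append((title, "\n".join(body).strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, body = match.group(2), []
        else:
            body.append(line)
    flush()
    return sections


paper = "# Title\n\nIntro text.\n\n## Methods\n\nWe did X.\n\n## Results\n\nY improved.\n"
for heading, text in split_by_heading(paper):
    print(f"{heading!r}: {len(text)} characters")
```

Each (heading, body) pair can then be embedded as one chunk, with the heading stored as metadata so a retrieved passage can be traced back to its section.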

Frequently asked questions

Do equations survive the conversion?

Inline equations come through as text; LaTeX-rendered display equations may flatten depending on how the PDF embedded them. If you need the LaTeX itself preserved, work from the paper's arXiv .tex source rather than the compiled PDF.

What about citations and references?

References sections are preserved as bulleted lists. In-text citation markers (e.g. "[12]") come through as text. Hyperlinks in the PDF become Markdown links in the output.
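Because citation markers stay as plain text and hyperlinks become Markdown links, both can be pulled out of the converted file with regular expressions. The following is a rough sketch under the assumption that markers are numeric, such as "[12]" or "[3, 7]"; the helper names `cited_numbers` and `markdown_links` are hypothetical, and author-year citation styles would need a different pattern.

```python
import re
from typing import List, Tuple

# Numeric markers like [12] or [3, 7]; the lookahead skips link text such as [12](https://...).
CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\](?!\()")
# Inline Markdown links: [text](http://... or https://...)
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def cited_numbers(text: str) -> List[int]:
    """Collect the distinct reference numbers cited in the text, sorted."""
    numbers = set()
    for match in CITATION.finditer(text):
        numbers.update(int(part) for part in match.group(1).split(","))
    return sorted(numbers)


def markdown_links(text: str) -> List[Tuple[str, str]]:
    """Return (link text, URL) pairs for inline Markdown links."""
    return LINK.findall(text)


sample = "Prior work [12] and [3, 7] used this; see [the dataset](https://example.org/data)."
print(cited_numbers(sample))
print(markdown_links(sample))
```

Matching the extracted numbers against the bulleted references list is then a matter of indexing into it, assuming the list keeps the paper's original order.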

Will the two-column layout cause problems?

No. t0md merges the columns during extraction, so the Markdown reads top to bottom in the paper's reading order rather than as interleaved column fragments.