How to convert a PDF to LaTeX with equations preserved
By ScoutMyTool Editorial Team · Last updated: 2026-05-22
Introduction
“Convert this PDF to LaTeX with the equations intact” runs into a hard fact: the LaTeX source is not in the PDF. Compiling LaTeX to PDF turns the equations into rendered glyphs and throws the markup away, so there is nothing to extract back. Converting to LaTeX therefore means re-creating source that reproduces the page, and for equations that means math OCR (recognising the math from the image), which is impressive but imperfect. This guide gives the realistic workflow: extract text, recognise equations, rebuild structure, and, non-negotiably, verify every equation, because a misrecognised equation is a substantive error, not a cosmetic one. The best move, when it is available, is to get the original .tex file.
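As a small illustration of what compiling discards, consider the sketch below. The formula is an arbitrary example chosen for this guide, not taken from any particular paper, and the comments describe the typical situation in simplified terms.

```latex
% What the author wrote: source markup that carries the structure.
\[
  \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\]
% What the compiled PDF typically keeps: positioned glyphs and a drawn rule
% (an "x" with a bar, "=", "1", a horizontal line, "n", a summation sign, ...).
% Nothing on the page records that the line was a \frac, or that "i=1" was
% the lower limit of a \sum. That structure has to be inferred again.
```

Reading the structure back from the glyph layout is exactly the job math OCR attempts.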
What converts well, what does not
| Element | Recoverability |
|---|---|
| Body text | Good — extract/OCR to text |
| Equations | Hard — must be re-recognised (math OCR), then verified |
| Section structure | Approximate — rebuild headings |
| Tables | Approximate — reconstruct as LaTeX tables |
| Original LaTeX source | Not in the PDF — gone at export |
Step by step — reconstruct LaTeX from a PDF
- Look for the original .tex first. If the author has the source, use it — it has the real equations and beats any conversion.
- Extract the body text. For a born-digital PDF, pull the text with PDF to Word; for a scan, run it through PDF OCR (see OCR + reformat).
- Recognise equations with math OCR. Run each equation through a dedicated math-OCR tool to get candidate LaTeX — recognition, not recovery, so expect imperfections.
- Rebuild the structure. Re-create sections, lists, and tables as LaTeX around the text and equations.
- Verify every equation. Proofread each recognised equation against the PDF — a wrong symbol changes the math; this is the essential step.
- Compile and compare. Build the LaTeX and visually compare the output to the original PDF to catch remaining errors.
- Decide if LaTeX is even needed. If you only need editable text or searchability, a simpler route (Word, or just OCR) may suffice — see academic-document workflows.
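The rebuild and verify steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a converter: it assumes you have already extracted the text and obtained candidate LaTeX for each equation from a math-OCR tool of your choice. The names `Block`, `build_latex`, and `unverified` are hypothetical helpers invented for this example.

```python
from dataclasses import dataclass

# Characters with special meaning in LaTeX text mode, and their escapes.
_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


@dataclass
class Block:
    kind: str               # "section", "text", or "equation"
    content: str            # heading title, paragraph text, or candidate LaTeX
    verified: bool = False  # only used for equations: proofread against the PDF?


def escape_text(text: str) -> str:
    """Escape extracted plain text so it is safe in LaTeX text mode."""
    return "".join(_SPECIALS.get(ch, ch) for ch in text)


def build_latex(blocks: list[Block]) -> str:
    """Assemble a LaTeX document, flagging equations not yet verified."""
    lines = [r"\documentclass{article}", r"\usepackage{amsmath}", r"\begin{document}"]
    for block in blocks:
        if block.kind == "section":
            lines.append(r"\section{" + escape_text(block.content) + "}")
        elif block.kind == "text":
            lines.append(escape_text(block.content))
            lines.append("")  # blank line ends the paragraph
        elif block.kind == "equation":
            if not block.verified:
                lines.append("% TODO: verify this equation against the PDF")
            lines.append(r"\begin{equation}")
            lines.append(block.content)  # math-OCR output is inserted as-is
            lines.append(r"\end{equation}")
        else:
            raise ValueError(f"unknown block kind: {block.kind!r}")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


def unverified(blocks: list[Block]) -> list[Block]:
    """Return the equations that still need a human proofreading pass."""
    return [b for b in blocks if b.kind == "equation" and not b.verified]


if __name__ == "__main__":
    doc = [
        Block("section", "Results"),
        Block("text", "The sample mean is 5% lower & stable."),
        Block("equation", r"\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i"),
    ]
    print(build_latex(doc))
    print(f"{len(unverified(doc))} equation(s) still to verify")
```

Equations default to unverified on purpose: the `% TODO` comment stays in the source until someone has compared that equation with the PDF and set `verified=True`, which keeps the verification step from being skipped silently.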
Related reading and tools
- PDF to Word: when editable text (not LaTeX) is enough.
- OCR + reformat: recovering text from scans.
- Academic research workflow: managing scholarly documents.
- Paper management & citations: the research-document side.
- Extract footnotes: another inference-based recovery task.
- PDF OCR tool: recover text in your browser.
- All ScoutMyTool PDF tools: the full toolkit.
FAQ
- Can I get the original LaTeX back from a PDF?
- No — the LaTeX source is not stored in the PDF. When a LaTeX document is compiled to PDF, the equations and text become rendered glyphs positioned on the page; the source markup (the \frac, \sum, environments) is discarded. So there is nothing to "recover" — converting a PDF to LaTeX means re-creating LaTeX that reproduces what the PDF shows, not extracting the original. For body text that is straightforward (extract the text, wrap in LaTeX). For equations it is genuinely hard, because you have to recognise the mathematical meaning from the rendered image, which is what math-OCR tools attempt. If the author still has the original .tex file, that is vastly better than any conversion — ask for it first.
- How are equations actually "converted" then?
- Through math OCR (mathematical expression recognition): a specialised recogniser looks at the rendered equation and tries to produce LaTeX (or MathML) that would render to the same thing. Dedicated tools (such as the well-known Mathpix, among others) are built for this and do a reasonable job on clean, standard notation. But it is recognition, not recovery, so it is imperfect — complex, unusual, or low-quality equations get misrecognised, and a single wrong symbol changes the math. So expect to feed equations through a math-OCR tool and then carefully verify each result against the original, because an unverified misrecognised equation is a real error, not a cosmetic one.
- How accurate is math OCR?
- For clean, standard, reasonably sized equations from a good-quality PDF, modern math OCR is often quite good; for dense, multi-line, unusual-notation, or low-resolution equations it degrades, and it can confuse similar symbols, sub/superscripts, and grouping. Because mathematics is unforgiving (a misplaced subscript or a wrong operator changes the meaning entirely), you cannot treat the output as automatically correct. The honest stance: math OCR gives you a strong LaTeX starting point that you then proofread equation by equation against the source, fixing what it got wrong. It saves enormous transcription effort versus typing every equation by hand, but it does not remove the need to verify.
- What is the reliable workflow for PDF to LaTeX?
- Get the text and the math separately, then assemble. Extract the body text (or OCR it if the PDF is scanned), run the equations through a math-OCR recogniser to get candidate LaTeX, rebuild the document structure (sections, lists, tables as LaTeX), and then proofread — especially every equation — against the original PDF before trusting it. If the source .tex exists, use that instead and skip all of this. Think of it as reconstruction with verification, not a one-click convert: the tools do the heavy lifting on text and on recognising equations, and you do the correctness pass that mathematics demands.
- What about a scanned PDF of a math document?
- Harder still, but the same shape. A scan is images, so you OCR the text and math-OCR the equations; both are less accurate on scanned input than on born-digital input, so the equation-by-equation verification pass against the scan matters even more. Low-resolution or skewed scans of dense equations are the worst case for math recognition. For an important document, budget real time for that verification: recovering correct LaTeX from a scanned math paper is achievable, but it is a reconstruct-and-check job, not an automatic one.
- When is converting to LaTeX worth it versus other formats?
- Convert to LaTeX when you genuinely need editable LaTeX source — to revise a paper, reuse equations, or re-typeset — and accept the recognition-and-verification effort. If you only need editable text (not LaTeX specifically), converting to a word processor format may be simpler, though equations there have their own recognition issues. If you only need to read or search the document, OCR for searchability is enough and you skip LaTeX entirely. So match the effort to the need: LaTeX reconstruction is worthwhile for real LaTeX editing of equation-heavy content, and overkill if you just need the text.
- Is it safe to process an unpublished paper online?
- Unpublished or under-review papers are confidential, so prefer tools that process files locally where possible. ScoutMyTool extracts text and OCRs entirely in your browser tab, so the document never leaves your machine for those steps; dedicated math-OCR is a separate specialised tool, so check its handling before uploading confidential equations. For anything pre-publication, confirm any tool you use does not retain or expose your content.
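To make the FAQ's warning about similar symbols, sub/superscripts, and grouping concrete, here are a few near-miss pairs. They are illustrative examples written for this guide, not output captured from any specific tool, and the snippet assumes the `amsmath` package is loaded.

```latex
% Each row differs by one plausible recognition slip,
% yet the two sides state different mathematics.
\begin{align*}
  \sum_{i=1}^{n} x_i^2 \quad &\text{vs.} \quad \sum_{i=1}^{n} x_{i^2}
    && \text{(superscript attached to the wrong base)} \\
  e^{-x^2} \quad &\text{vs.} \quad e^{-x}2
    && \text{(exponent dropped to the baseline)} \\
  \frac{a+b}{c} \quad &\text{vs.} \quad a + \frac{b}{c}
    && \text{(numerator grouping lost)} \\
  \nu \quad &\text{vs.} \quad v
    && \text{(look-alike symbols: Greek nu, Latin v)}
\end{align*}
```

Every right-hand side compiles without complaint, so a clean build proves nothing about correctness. Only a side-by-side comparison with the original PDF catches these.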
Citations
- Wikipedia — “LaTeX,” the typesetting system and its source markup. en.wikipedia.org/wiki/LaTeX
- Wikipedia — “MathML,” a math markup target alongside LaTeX. en.wikipedia.org/wiki/MathML
- Wikipedia — “Optical character recognition,” the basis of text and math recognition. en.wikipedia.org — OCR
Reconstruct it — then verify the math
Recover text and OCR scans with ScoutMyTool’s in-browser tools, recognise equations with a math-OCR tool, and verify every one against the original. Your document stays on your machine for the text steps.