PDF OCR

Run OCR on a scanned PDF — get a searchable PDF, plain text, or Word document. Pick from English / French / Spanish / German. Free, server-side, files deleted after processing.

Drop a scanned PDF (or click to choose) — up to 50 MB.

Output format

Searchable PDF

Same look, invisible text layer added.

Word (.docx)

Editable Word document.

Plain text (.txt)

Just the recognised words.

Language

Applies to all output modes. Non-English passes go through the subprocess fallback, which is slower than the warm daemon path (expect 5-15 s of extra latency on the first non-English page).

About this tool

Drop a scanned PDF (a photo of a contract, a scanned receipt, an old book chapter, anything where the page is an image instead of text) and choose what you want back: a searchable PDF (same look, with an invisible text layer added so you can copy and search), a Word DOCX (editable in Microsoft Word, Google Docs, or LibreOffice), or a plain .txt file (just the recognised words).

The pipeline is OCRmyPDF + Tesseract running server-side on our infrastructure, the same open-source stack that powers paid services. We render each page at 300 DPI for accuracy, recognise the characters with Tesseract, and then either layer the text back onto the original PDF (searchable PDF), pass the result through pdf2docx (Word), or extract the words directly (plain text).

Files are processed and deleted as soon as the response has been sent. All three output modes support English, French, Spanish, and German; English is the default and the fastest path.
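As a rough illustration of the pipeline described above, the sketch below builds an OCRmyPDF command line for each output mode. The helper name `build_ocr_command`, the output file naming, and the choice of flags are assumptions made for illustration, not the service's actual invocation. The 300 DPI rendering step and the pdf2docx conversion are left out.

```python
from pathlib import Path

# Tesseract language codes for the four languages the page lists.
TESSERACT_CODES = {
    "English": "eng",
    "French": "fra",
    "Spanish": "spa",
    "German": "deu",
}


def build_ocr_command(src: Path, out_dir: Path, mode: str,
                      language: str = "English") -> list[str]:
    """Return an ocrmypdf argv for one output mode (hypothetical helper)."""
    if mode not in {"pdf", "docx", "txt"}:
        raise ValueError(f"unknown output mode: {mode}")
    if language not in TESSERACT_CODES:
        raise ValueError(f"unsupported language: {language}")

    # Every mode first produces a searchable PDF.
    searchable = out_dir / f"{src.stem}.ocr.pdf"
    cmd = ["ocrmypdf", "-l", TESSERACT_CODES[language]]

    if mode == "txt":
        # --sidecar asks ocrmypdf to also write the recognised text to a file.
        cmd += ["--sidecar", str(out_dir / f"{src.stem}.txt")]

    # For "docx", the searchable PDF would then be handed to pdf2docx
    # in a separate step that is not shown here.
    cmd += [str(src), str(searchable)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where OCRmyPDF and the relevant Tesseract language packs are installed.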

FAQ

When do I need OCR?
When your PDF is a scan or photo of paper rather than a generated PDF (Word → Save As PDF, browser → Print to PDF, etc.). If you can't select or copy the text in your reader, the file needs OCR.
How long does it take?
About 10 seconds per page on average. A 5-page scanned contract finishes in under a minute; a 50-page document in 2-3 minutes.
How accurate is it?
90-99% on clean printed scans. Handwriting, low-DPI phone photos, very small fonts, and unusual scripts produce more errors. Always verify mission-critical text against the original.
What languages?
English, French, Spanish, and German — applies to all three output modes (searchable PDF, DOCX, and plain text). Non-English passes go through the subprocess fallback because our warm pdf-daemon is English-only; expect 5-15 s of extra latency on the first non-English page. Need another language? Drop a note via the contact form.
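The routing between the warm daemon and the subprocess fallback might look roughly like the sketch below. The function name and the returned labels are hypothetical; only the English-only daemon and the four supported languages come from the answer above.

```python
# Tesseract codes for English, French, Spanish, and German.
SUPPORTED_LANGUAGES = {"eng", "fra", "spa", "deu"}

# Per the FAQ, the warm daemon handles English only.
WARM_DAEMON_LANGUAGES = {"eng"}


def choose_ocr_path(tesseract_code: str) -> str:
    """Return 'daemon' for English, 'subprocess' for the other languages."""
    if tesseract_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {tesseract_code}")
    if tesseract_code in WARM_DAEMON_LANGUAGES:
        return "daemon"
    # Slower path: the FAQ quotes 5-15 s extra on the first non-English page.
    return "subprocess"
```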
How is OCR accuracy reported?
For plain-text output, every recognised word carries a Tesseract confidence score (0-100). The summary shows the document-wide mean (weighted by word count) and flags any page below 60% — those are usually low-DPI scans, glare, or fonts Tesseract struggles with. 90%+ across the document is typical for clean printed scans; below 80% suggests you should re-scan at higher DPI before trusting the output.
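A minimal sketch of how such a summary could be computed from per-word scores follows. It assumes a page is flagged when its own mean confidence falls below the threshold, and the `PageConfidence` type and `summarise` function are illustrative names, not the tool's actual code.

```python
from dataclasses import dataclass


@dataclass
class PageConfidence:
    page: int                      # 1-based page number
    word_confidences: list[float]  # one Tesseract score (0-100) per word


def summarise(pages: list[PageConfidence], flag_below: float = 60.0) -> dict:
    """Return the word-weighted document mean and the pages to flag."""
    total_words = sum(len(p.word_confidences) for p in pages)
    if total_words == 0:
        return {"mean": None, "flagged_pages": []}

    # Weighting page means by word count equals averaging over all words.
    total_confidence = sum(sum(p.word_confidences) for p in pages)

    flagged = []
    for p in pages:
        if not p.word_confidences:
            continue  # a page with no recognised words has no mean to test
        page_mean = sum(p.word_confidences) / len(p.word_confidences)
        if page_mean < flag_below:
            flagged.append(p.page)

    return {"mean": total_confidence / total_words, "flagged_pages": flagged}
```

For example, a two-word page scoring 90 and 100 plus a one-word page scoring 50 gives a document mean of 80 and flags only the second page.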
Is my file uploaded?
Yes. The OCRmyPDF + Tesseract pipeline runs on our servers rather than in your browser, so the file has to be uploaded. Files are processed in a temp directory and deleted immediately after the response is sent. We don't keep copies, train models on them, or share them.
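The temp-directory handling can be sketched with Python's standard library as below. This is an assumption about the general pattern, not the service's code: `process_upload` and its `convert` callback are hypothetical, and here the directory is removed once the result has been read into memory, slightly earlier than "after the response is sent".

```python
import tempfile
from pathlib import Path
from typing import Callable


def process_upload(data: bytes, convert: Callable[[Path, Path], Path]) -> bytes:
    """Run `convert` on an upload inside a temp dir that is removed on exit."""
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = Path(tmp)
        src = work_dir / "upload.pdf"
        src.write_bytes(data)

        # `convert` writes its output somewhere under work_dir and returns it.
        result_path = convert(src, work_dir)
        result = result_path.read_bytes()

    # Leaving the `with` block deletes the directory, the upload, and the output.
    return result
```

Using a context manager means the cleanup also happens when the conversion raises an exception.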
How is this different from "OCR PDF (Make Searchable)"?
That tool is searchable-PDF only. PDF OCR (this one) is the unified version that also offers DOCX and plain-text outputs from the same upload.
Can I use this on a non-scanned PDF?
Yes, but it's wasted work — the existing text is already extractable. Use PDF to Text or PDF to Word directly for native PDFs; OCR is only useful when the page is an image.

More tools you might like

Hand-picked tools that pair well with this one — same audience, same intent.

PDF to Word

Convert any PDF (including scanned ones) into an editable Word document. Pick "Preserve formatting" for layout + tables or "Plain text" for the fastest possible cleanup.

OCR Extract Text Only

Extract plain text from a scanned PDF — runs OCR server-side, then pulls the recognised text out as a .txt file (no PDF output).

PDF to Excel

Convert a PDF into an Excel spreadsheet — choose .xlsx or legacy .xls. Scanned PDFs are supported.

OCR PDF (Make Searchable)

Run OCR on a scanned PDF to add a searchable, copy-able text layer — the file looks identical but you can now select text.

PDF to Text

Extract all text from a PDF as a plain .txt file — great for search-and-replace, archiving, or copy-paste. Or output Markdown with detected headings (#/##/###).

PDF Editor

Edit any PDF in your browser — add text, highlight, whiteout, sticky notes, freehand drawing, shapes, and images. Click to download. No uploads.

Word to PDF

Convert a Word document to PDF — preserves formatting, fonts, headers, tables, and images.
