PDF OCR

Run OCR on a scanned PDF — get a searchable PDF, plain text, or Word document. Free, server-side, files deleted after processing.

About this tool

Drop a scanned PDF, meaning anything where the page is an image instead of text (a photo of a contract, a scanned receipt, an old book chapter), and choose what you want back: a searchable PDF (same look, with an invisible text layer added so you can copy and search), a Word DOCX (editable in Microsoft Word, Google Docs, or LibreOffice), or a plain .txt file (just the recognised words).

The pipeline is OCRmyPDF + Tesseract running server-side on our infrastructure, the same open-source stack that powers many paid services. We render each page at 300 DPI for accuracy and recognise the characters with Tesseract. Depending on the format you chose, we then layer the text back onto the original PDF (searchable PDF), pass the result through pdf2docx (Word), or extract the words directly (plain text).

Files are processed and deleted right after you download. Plain-text mode supports English, French, Spanish, and German; searchable-PDF and DOCX modes always use English.
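If you want to reproduce a similar pipeline on your own machine, the three output modes map roughly onto the open-source command-line tools below. This is a minimal Python sketch, not this service's actual code: the helper name `plan_ocr_commands`, the output file names, and the mapping to Tesseract language codes (`eng`, `fra`, `spa`, `deu`) are assumptions, and the 300 DPI rendering setting is not reproduced. It only builds the argument lists; with `ocrmypdf` and `pdf2docx` installed, each list can be run with `subprocess.run(argv, check=True)`.

```python
import shlex

# Tesseract language codes for the four languages listed on this page.
# Assumption: the service maps its language choices to these codes.
TESSERACT_LANGS = {
    "english": "eng",
    "french": "fra",
    "spanish": "spa",
    "german": "deu",
}


def plan_ocr_commands(src: str, mode: str, language: str = "english") -> list[list[str]]:
    """Return the command lines (as argv lists) for one OCR job.

    mode is "pdf" (searchable PDF), "docx" (Word) or "txt" (plain text).
    Hypothetical helper: it only builds the commands, it does not run them.
    """
    if mode not in ("pdf", "docx", "txt"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Mirror the page: only plain-text output honours the language choice.
    lang = TESSERACT_LANGS[language.lower()] if mode == "txt" else "eng"

    if mode == "pdf":
        # Add an invisible text layer on top of the original page images.
        return [["ocrmypdf", "-l", lang, src, "searchable.pdf"]]
    if mode == "docx":
        # OCR first, then convert the searchable PDF to Word with pdf2docx's CLI.
        return [
            ["ocrmypdf", "-l", lang, src, "searchable.pdf"],
            ["pdf2docx", "convert", "searchable.pdf", "output.docx"],
        ]
    # Plain text: --sidecar writes the recognised text to a separate file;
    # the PDF that ocrmypdf also produces can simply be discarded.
    return [["ocrmypdf", "-l", lang, "--sidecar", "output.txt", src, "searchable.pdf"]]


if __name__ == "__main__":
    for argv in plan_ocr_commands("scan.pdf", "docx"):
        print(shlex.join(argv))
```

Printing the DOCX plan for `scan.pdf` gives two commands: `ocrmypdf -l eng scan.pdf searchable.pdf`, then `pdf2docx convert searchable.pdf output.docx`. Keeping command construction separate from execution makes the plan easy to inspect or log before anything touches the file.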

FAQ

  • When do I need OCR?
    When your PDF is a scan or photo of paper rather than a generated PDF (Word → Save As PDF, browser → Print to PDF, etc.). If you can't select or copy the text in your reader, the file needs OCR.
  • How long does it take?
    About 10 seconds per page on average, so a 5-page scanned contract finishes in under a minute. Time grows roughly in proportion to page count: at that rate, a 50-page document takes around 8 minutes.
  • How accurate is it?
    90-99% on clean printed scans. Handwriting, low-DPI phone photos, very small fonts, and unusual scripts produce more errors. Always verify mission-critical text against the original.
  • What languages?
    English, French, Spanish, and German for plain-text output. Searchable PDF and DOCX use English. Need another language? Drop a note via the contact form.
  • Is my file uploaded?
    Yes. The OCRmyPDF + Tesseract pipeline runs on our servers, not in your browser, so the file has to be uploaded. Files are processed in a temp directory and deleted immediately after the response is sent. We don't keep copies, don't train models on them, and don't share them.
  • How is this different from "OCR PDF (Make Searchable)"?
    That tool is searchable-PDF only. PDF OCR (this one) is the unified version that also offers DOCX and plain-text outputs from the same drop.
  • Can I use this on a non-scanned PDF?
    Yes, but it's wasted work — the existing text is already extractable. Use PDF to Text or PDF to Word directly for native PDFs; OCR is only useful when the page is an image.
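The rule of thumb from the FAQ (if you can't select or copy the text, the file needs OCR) can also be checked programmatically before uploading. A minimal sketch, assuming Python with the pypdf library installed; `pages_without_text`, `page_texts_from_pdf`, and the 20-character threshold are hypothetical choices for illustration, not how this tool decides.

```python
def pages_without_text(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 0-based indices of pages whose extractable text is (almost) empty.

    Heuristic (assumption): a page with fewer than min_chars non-whitespace
    characters is treated as an image-only scan. Nearly blank pages, such as a
    cover with only a short title, will also be flagged.
    """
    return [
        i
        for i, text in enumerate(page_texts)
        if len("".join(text.split())) < min_chars
    ]


def page_texts_from_pdf(path: str) -> list[str]:
    """Extract the existing text layer of each page with pypdf (pip install pypdf)."""
    from pypdf import PdfReader  # imported here so pages_without_text() works without pypdf

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

For example, `pages_without_text(page_texts_from_pdf("contract.pdf"))` returning a non-empty list suggests those pages are images and worth running through OCR; an empty list means the text is already extractable, so PDF to Text or PDF to Word is the better fit.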
