How to fix corrupt PDF files (free tools that actually work)
By ScoutMyTool Editorial Team · Last updated: 2026-05-20
After working with hundreds of users on document-recovery cases, we have found that the corrupt-PDF problem comes in three flavours that look identical from the user's side ("the file will not open") but require different tools to fix. Knowing which flavour you have determines what will work. Below are the diagnostic, the recovery workflow that handles each flavour, and an honest assessment of when a file is truly past saving.
Diagnose the corruption first
Three failure modes, three different fixes:
- Will not open at all. Acrobat / Preview throws "this file is damaged and could not be repaired" or similar. Cause: damaged header, missing trailer, or corrupted cross-reference table. Fix: structural rebuild (pass 1 of the repair tool).
- Opens but pages are blank, scrambled, or missing. The reader finds the page tree but cannot render some or all pages. Cause: damaged content streams or damaged embedded fonts. Fix: content-stream recovery (pass 2 of the repair tool).
- Opens with text, missing images / fonts. Page structure intact but specific resources are damaged. Cause: damaged object streams for fonts or images, usually a partial-download truncation. Fix: re-embed standard fonts; rasterise from preview where possible.
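As a rough illustration of the first check, the structural markers can be inspected directly in the raw bytes. This is a minimal sketch, not the ScoutMyTool implementation; the function names `triage_pdf` and `likely_structural_damage` are hypothetical, and the 1 KB / 2 KB search windows are assumptions chosen for illustration.

```python
import re

def triage_pdf(data: bytes) -> dict:
    """Report which structural markers are present in the raw bytes of a PDF."""
    tail = data[-2048:]  # the EOF marker and startxref normally sit near the end
    return {
        # Look for the header near the start of the file.
        "has_header": b"%PDF-" in data[:1024],
        "has_eof_marker": b"%%EOF" in tail,
        "has_startxref": b"startxref" in tail,
        # Count "N G obj" markers to see how much of the body survives.
        "object_count": len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data)),
    }

def likely_structural_damage(report: dict) -> bool:
    """True when any marker a reader needs to open the file is missing."""
    return not (
        report["has_header"]
        and report["has_eof_marker"]
        and report["has_startxref"]
    )
```

A file that fails this check most likely belongs to the "will not open at all" flavour. A file that passes can still have damaged content streams or resources, which this sketch does not inspect.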
Step-by-step: repair a corrupt PDF
The ScoutMyTool repair tool lives at scoutmytool.com/pdf/repair-corrupted-pdf. It runs client-side: no upload, no signup, no quota.
- Drop your corrupt PDF. The tool runs an initial scan and reports what it found: file size, parse-ability, suspected corruption type. Confirm the diagnosis matches your symptoms.
- Pick the repair mode.
  - Structural rebuild (default). Scans the body for object boundaries, rebuilds the cross-reference table, and restores the trailer. Handles the "will not open at all" mode.
  - Content-stream recovery. Walks page content streams and skips unrecoverable operations. Handles the "opens but pages blank" mode.
  - Text-only extraction. Last-resort fallback that produces a plain-text dump of recoverable content.
- Click Repair. Live progress shows which recovery pass is running. Most files repair in 10 to 30 seconds; very large or deeply damaged files can take several minutes.
- Review the repair report. The tool shows what it recovered and what could not be saved. Typical successful report: "31 pages found, 30 rendered, 1 page partially recovered (text only)". Page-level breakdown shows exactly which pages need re-checking.
- Download and verify. Open the repaired PDF in Acrobat or Preview. The structural-rebuild pass is mostly transparent: the file should look the same as before corruption. The content-stream recovery may show some "ghost" artefacts on the most-damaged pages.
- If repair fails entirely. Try the text-only extraction as a final fallback to recover at least the readable words. For images, run a separate pass with Extract Images, which bypasses the page tree and walks embedded image objects directly.
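The structural-rebuild idea (scan the body for object boundaries, then rebuild the cross-reference table) can be sketched as follows. This is an illustrative simplification, not the tool's actual code: the names `scan_object_offsets` and `build_xref_section` are hypothetical, the scan cannot see objects packed inside compressed object streams, it can be fooled by binary stream data that happens to look like an object header, and missing object numbers are simply marked free rather than linked into a proper free list.

```python
import re

# Matches an indirect-object header such as "12 0 obj".
OBJ_RE = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")

def scan_object_offsets(data: bytes) -> dict:
    """Map object number -> (byte offset, generation); later definitions win."""
    offsets = {}
    for m in OBJ_RE.finditer(data):
        offsets[int(m.group(1))] = (m.start(), int(m.group(2)))
    return offsets

def build_xref_section(offsets: dict) -> bytes:
    """Serialise a classic cross-reference table from the scanned offsets."""
    size = max(offsets, default=0) + 1
    # Entry 0 is always the head of the free list.
    lines = [b"xref", b"0 %d" % size, b"0000000000 65535 f "]
    for num in range(1, size):
        if num in offsets:
            off, gen = offsets[num]
            lines.append(b"%010d %05d n " % (off, gen))
        else:
            # Simplification: mark gaps as free without linking them.
            lines.append(b"0000000000 65535 f ")
    return b"\n".join(lines) + b"\n"
```

A complete rebuild would also write a new trailer (with `/Size` and `/Root`) and a `startxref` pointer to the rebuilt table; those steps are omitted here.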
When the file is genuinely unrecoverable
Three honestly-fatal modes:
- Truncation that lost data. A 10 MB PDF cut to 4 MB mid-download has lost 6 MB of content. Repair tools can rebuild structure around what remains, but they cannot conjure back missing bytes. Verify the expected file size against the source; if it does not match, restore from backup before spending time on repair.
- Encrypted with a lost key. A properly encrypted PDF cannot be recovered without brute force. If the document was protected with a user password, Unlock PDF can remove it only when you supply that password; there is no shortcut.
- Multi-structure simultaneous damage. When the trailer, cross-reference table, content streams, and embedded resources are all corrupted, the file no longer has a self-consistent base for recovery heuristics to build from. Text-only extraction may still salvage words; structural recovery will not.
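The size check for the truncation case is simple enough to script. A minimal sketch, assuming you know the expected size from the source (for example a download page or a server listing); the function names `missing_bytes` and `ends_cleanly` are hypothetical.

```python
import os

def missing_bytes(path: str, expected_size: int) -> int:
    """Return how many bytes are missing relative to the size the source reports."""
    return max(expected_size - os.path.getsize(path), 0)

def ends_cleanly(path: str, window: int = 1024) -> bool:
    """True if the %%EOF marker appears within the last `window` bytes."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(size - window, 0))
        return b"%%EOF" in f.read()
```

If `missing_bytes` is non-zero, the data is gone and restoring from backup is the better first move. A missing `%%EOF` marker on its own is weaker evidence, since the trailer can be damaged without the file being truncated.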
The PDF specification (ISO 32000-1, §7) describes the file structure that repair tools work against; understanding which part is damaged is what determines recoverability. veraPDF and Adobe's preflight engine can both report structural integrity in detail if you want a forensic-grade diagnosis before attempting repair.
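For a quick check short of a full validator, one useful test is whether the `startxref` pointer at the end of the file actually lands on a cross-reference section. The sketch below is a hedged illustration; `check_startxref` and its return strings are hypothetical, and it only looks at the first few bytes at the target offset rather than parsing the table.

```python
import re

def check_startxref(data: bytes) -> str:
    """Check that the last startxref offset points at a plausible xref section."""
    matches = list(re.finditer(rb"startxref\s+(\d+)", data))
    if not matches:
        return "no startxref"
    offset = int(matches[-1].group(1))  # the last one belongs to the newest revision
    if offset >= len(data):
        return "startxref points past end of file"
    target = data[offset:offset + 32]
    if target.startswith(b"xref"):
        return "ok: classic xref table"
    if re.match(rb"\d+\s+\d+\s+obj", target):
        # PDF 1.5+ files may store the cross-reference in a stream object.
        return "ok: object (possibly a cross-reference stream)"
    return "startxref points at unexpected data"
```

Any result other than an "ok" string suggests damage in the trailer or cross-reference area, which is the case the structural rebuild targets.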
Related ScoutMyTool articles and tools
- Repair Corrupted PDF tool
- PDF to text - recover plain text from a working PDF.
- Extract Images tool - for image-only recovery from partly-damaged PDFs.
- Why does my PDF look different on different devices? - sometimes "corrupt" is actually a rendering difference.
- Unlock PDF - for password-protected (not corrupt) files.
- PDF Editor - re-author selected pages after recovery.
- Embedded fonts in PDF - missing-font symptoms can mimic corruption.
Frequently asked questions
- What does it mean when a PDF "is corrupt"?
- PDF is a structured binary format with a header ("%PDF-1.x"), a body of indirect objects, a cross-reference table mapping object IDs to byte offsets, and a trailer pointing to the cross-reference. "Corrupt" means one of these structural pieces is damaged: a truncated file (download interrupted, disk full mid-write), a damaged cross-reference table (most common), missing or mismatched trailer, broken object streams, or content-stream parse errors. Different repair tools handle different corruption modes; no single tool handles all of them.
- Why did Acrobat refuse to open the file at all?
- Acrobat fails fast if the header is wrong, the trailer is missing, or the cross-reference table cannot be located. The fix in most cases is to rebuild the cross-reference table by scanning the file body for object boundaries, which is what PDF repair tools typically do as their first pass. If Acrobat would not even attempt to open the file, the repair pass usually succeeds; if Acrobat opens but reports "errors found", the corruption is deeper inside object streams and the repair pass may not be enough.
- My PDF opens but pages are blank, scrambled, or show "could not be loaded" errors.
- Two common causes. (a) The page content streams are damaged but the page tree is intact: the reader knows there are pages but cannot render them. Fix: run the repair tool with "Re-render pages from preserved content" enabled; this recovers most cases. (b) Embedded fonts are damaged: pages render with the right text positions but no visible glyphs. Fix: re-embed standard fonts during repair (the repaired PDF will show in standard Times / Helvetica rather than the original typeface, but the content becomes legible).
- Can I recover anything from a truly broken PDF?
- Often, yes: text extraction can work even when the structural repair fails. Run the file through a text extractor that tolerates a damaged cross-reference table (pdftotext with the -raw flag, which emits text in content-stream order, or the ScoutMyTool repair tool's "Extract recoverable text" mode). The result is a plain-text dump of whatever could be read; layout is lost but the words come through. For partial recovery of images, a dedicated image extractor such as the Extract Images tool can pull out embedded image objects independent of the page structure.
- Is my PDF uploaded to your servers during repair?
- No. The repair operation runs entirely in your browser, using pdf.js for parsing and a structural-recovery layer that rebuilds the cross-reference table and re-validates object boundaries. Your file is loaded into a sandboxed memory buffer, the rebuilt file is written to a new buffer, and the result is delivered as a download. You can verify this in the DevTools Network tab: there should be zero outbound requests during repair. This matters when the PDF was corrupted by a sync conflict or storage failure and the original copy is the only one.
- Are there PDF corruption modes that no tool can fix?
- Yes. (a) Truncation that lost data: if a 10 MB PDF was cut to 5 MB mid-download, the missing 5 MB are not recoverable. (b) Encryption with a lost key: the cipher-protected content streams cannot be decrypted without the key, full stop. (c) Damage to multiple critical structures simultaneously: when the cross-reference table, trailer, and content streams are all corrupted, even the best heuristics cannot rebuild from a self-consistent base. For these cases, the realistic options are restoring from backup or accepting partial text-only recovery.
- What is the difference between PDF repair and PDF password recovery?
- Different operations entirely. Repair fixes structural damage; the document is intact, the file format is broken. Password recovery removes an access-control password from a file that is structurally fine; the document was deliberately locked, not damaged. The Unlock PDF tool handles password removal when you know the password; if you do not know the password, no tool can recover the contents without it, and any tool claiming to do so without a brute-force pass is either scamming or running a brute force you have not consented to.
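To tell a locked file from a damaged one before picking a tool, one rough heuristic is to look for an `/Encrypt` entry, which an encrypted PDF carries in its trailer or cross-reference stream dictionary. This is a sketch under that assumption; the function name `looks_encrypted` is hypothetical, and a raw byte search can produce false positives if the same bytes occur inside stream data.

```python
import re

def looks_encrypted(data: bytes) -> bool:
    """Heuristic: True if the raw bytes contain an /Encrypt dictionary key."""
    # The word boundary excludes the related key /EncryptMetadata.
    return re.search(rb"/Encrypt\b", data) is not None
```

A file that is both encrypted and structurally damaged will still need repair first, and then the password.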
Repair your corrupt PDF now: free, no signup
Three recovery passes (structural, content-stream, text-only) cover almost every fixable corruption mode. The tool runs entirely in your browser, so your file never leaves your device.
Open the PDF Repair tool at scoutmytool.com/pdf/repair-corrupted-pdf