PDF redaction: properly remove sensitive info (not just black bars)
By ScoutMyTool Editorial Team · Last updated: 2026-05-20
After working with hundreds of users on document-sharing workflows, we have found that the single most consequential failure mode in the entire PDF universe is "I drew a black rectangle over the sensitive part". The underlying text is still in the file, anyone reading the file can copy-paste it out, and the "redacted" content is published as plainly as if no redaction had been attempted. There is no excuse for this failure mode anymore; the tools to do it correctly are free and have been for years. Below is the workflow that actually works, with a clear test for verifying that it did.
Step-by-step: redact a PDF so the redacted text cannot be recovered
The ScoutMyTool tool lives at scoutmytool.com/pdf/redact-pdf. It runs client-side: no upload, no signup, no quota.
- Drop your PDF. Loads into a sandboxed memory buffer; nothing is uploaded. This step is the first time the file leaves its original location, and it leaves to your own browser memory, not to a server.
- Mark regions to redact. Click-and-drag a rectangle over each sensitive area. Multiple rectangles per page are fine. For find-and-redact ("redact every instance of this person's name"), use the search box to find every occurrence and bulk-mark them.
- Optional: pattern redaction. Type a regex (e.g. \d{3}-\d{2}-\d{4} for US SSNs) and the tool finds every match across the document and proposes them for redaction. Useful for bulk-removing SSNs, credit card numbers, email addresses, account numbers.
- Pick the cover style. The default is an opaque black rectangle, the convention for legal and government redaction. Optional: a labelled rectangle ("[REDACTED]", "[FOIA b(6)]", "[ATTORNEY-CLIENT PRIVILEGE]") for cases where the reader needs to know a redaction happened and why.
- Click Apply Redactions. The tool finds every text object that intersects a redaction rectangle, removes those text objects from the page content stream, removes the corresponding glyphs from any embedded font subset that becomes unused, draws the cover rectangle, and rebuilds the page.
- Verify the redaction worked. The tool runs an automatic post-check: it extracts text from the redacted output and searches for any text that intersected a redaction region. If anything survived, you see a warning with the specific text, almost always due to a rectangle that did not fully cover the text bounding box. Adjust the rectangle and re-apply.
- Optional: sanitise metadata. Click "Sanitise document". This removes document-level metadata (author, title, history) and strips incremental update entries that may contain a pre-redaction copy of the file. Recommended for any redacted file that will be published widely.
- Download. The redacted, sanitised PDF is delivered as a download. Verify externally by opening it in a different tool (Acrobat, Preview, pdftotext) and attempting to copy-paste from a redacted region; this should yield nothing.
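The mark, apply, and verify steps above can be modelled in a few lines. The following is a minimal Python sketch on a simplified page model, not the tool's actual implementation: `Box`, `TextSpan`, and the three helper functions are hypothetical names, each text span is treated as one indivisible object with a bounding box, and the sample page data is invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical model)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Box") -> bool:
        # Two rectangles overlap when they overlap on both axes.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass(frozen=True)
class TextSpan:
    """One text object on the page, with its bounding box."""
    text: str
    box: Box


SSN = re.compile(r"\d{3}-\d{2}-\d{4}")  # the US SSN pattern from the steps


def propose_boxes(spans, pattern):
    """Pattern redaction: propose the box of every span that matches."""
    return [s.box for s in spans if pattern.search(s.text)]


def apply_redactions(spans, redactions):
    """Remove every span that intersects any redaction rectangle."""
    return [s for s in spans
            if not any(s.box.intersects(r) for r in redactions)]


def surviving_matches(spans, pattern):
    """Post-check: search the remaining text for the sensitive pattern."""
    return [m.group() for s in spans for m in pattern.finditer(s.text)]


# Invented sample page: three text spans, two of which contain an SSN.
page = [
    TextSpan("Name: Jane Doe", Box(72, 700, 200, 712)),
    TextSpan("SSN: 123-45-6789", Box(72, 680, 210, 692)),
    TextSpan("Backup SSN 987-65-4321", Box(72, 660, 240, 672)),
]

# A hand-drawn rectangle that covers only the first SSN line.
manual = [Box(70, 678, 215, 694)]
redacted_manual = apply_redactions(page, manual)
leaks_manual = surviving_matches(redacted_manual, SSN)

# Pattern-based proposals cover every matching span.
redacted_pattern = apply_redactions(page, propose_boxes(page, SSN))
leaks_pattern = surviving_matches(redacted_pattern, SSN)

print("manual leaks:", leaks_manual)
print("pattern leaks:", leaks_pattern)
```

In this model the hand-drawn rectangle misses the second SSN, so the post-check reports it, while the pattern-based proposals leave nothing behind. A real PDF needs glyph-level geometry and content-stream rewriting, which this sketch deliberately leaves out.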
The well-documented failure modes you do NOT want to repeat
- Drawing a black rectangle in any PDF editor. Common in Preview, Acrobat's comment tool, online "PDF editors". Visual only. Underlying text remains in the content stream.
- Setting text colour to black on a black background. Slightly more sophisticated, same flaw. Copy-paste still returns the text; structural parsers still extract it.
- Highlighting with a black marker in a PDF annotation tool. Same problem at a different layer. The annotation lives above the page; the page content is unchanged.
- Printing to PDF after "redacting" with rectangles. Sometimes works (when it rasterises everything, which destroys the text) but is unreliable: it depends on the printer driver respecting the annotation layer.
- Forgetting to sanitise metadata after redaction. The redacted content is gone from the page but lives on in document-level Author / Subject / Keywords fields, or in the file's incremental-update history. Sanitise.
The U.S. National Security Agency's declassified Redacting with Confidence guide [1] documents these failure modes in detail and remains the most accessible plain-English reference for the topic. The PDF specification (ISO 32000-1) defines redaction structurally as the removal of content-stream objects, distinct from drawing opaque shapes on top [2].
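To make the difference between covering and removing concrete, here is a toy Python sketch. It assumes a hand-written, uncompressed content stream and a deliberately naive extractor (`extract_text` is a hypothetical helper that only understands literal strings shown with the `Tj` operator). Real content streams are usually compressed and use more operators, and a cover drawn as an annotation lives outside the content stream altogether, but the principle is the same.

```python
import re

# A hand-written, uncompressed page content stream (illustrative only).
original = (
    "BT /F1 12 Tf 72 700 Td (Case notes for review) Tj ET\n"
    "BT /F1 12 Tf 72 680 Td (SSN: 123-45-6789) Tj ET\n"
)

# Operators that paint a filled black rectangle: set fill colour to black,
# define a rectangle (x, y, width, height), then fill it.
black_bar = "0 0 0 rg 70 676 150 16 re f\n"

# Visual-only "redaction": the bar is painted on top, the text stays.
covered = original + black_bar

# Structural redaction: delete the text object, then paint the bar.
removed = re.sub(r"BT[^\n]*\(SSN: [^)]*\) Tj ET\n", "", original) + black_bar


def extract_text(stream: str) -> list:
    """Naive extractor: literal strings shown with Tj (no escapes, no TJ)."""
    return re.findall(r"\(([^)]*)\)\s*Tj", stream)


print(extract_text(covered))
print(extract_text(removed))
```

The covered stream still yields the SSN line to the extractor, because the rectangle is just one more drawing instruction after the text. Only the stream with the text object deleted comes back clean.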
Related ScoutMyTool articles and tools
- PDF Redact tool
- PDF Redaction Permanent tool: sanitises history alongside the redaction.
- HIPAA-aware redaction: pre-configured patterns for healthcare PHI.
- Delete pages from PDF: when whole pages should be removed rather than regions.
- Broader PDF redaction guide
- PDF security guide: combined redaction + encryption + access control.
- Unlock PDF: required first if the source is password-protected.
Frequently asked questions
- What is the difference between a real redaction and a black rectangle on top of text?
- A real redaction removes the underlying text from the PDF's content stream: the bytes that encoded the redacted words are deleted, and a black rectangle is drawn in their place. A fake redaction (drawing a black rectangle over the text in any PDF editor) leaves the underlying text completely intact: anyone who copy-pastes the page gets the redacted text back, anyone who uses Ctrl-F still finds it, and any structural PDF parser still extracts it. Real redaction edits the content stream and is irreversible; fake redaction is only visual and has caused several high-profile information leaks where redacted court filings were trivially recovered.
- Examples of fake-redaction leaks that hit the news?
- Notable cases include the U.S. v. Manafort 2019 court filing, where defence lawyers redacted client cooperation details with black rectangles; reporters at multiple outlets recovered the text within minutes by copy-pasting. The 2010 TSA full-body scanner study had its identifying personnel names "redacted" by an image overlay that left the original PDF text underneath. The pattern repeats: a sensitive document is "redacted" with what looks like a black bar, the file is published, and the redacted text is recovered by the next person to read it carefully. Every example shares the same root cause: the redaction was visual only, not structural.
- How does ScoutMyTool ensure real redaction?
- The tool finds the text objects in the PDF content stream that intersect each redaction rectangle, removes those text objects from the stream entirely, and then draws an opaque black rectangle in their place. The resulting PDF has no extractable text in the redacted regions: copy-paste yields nothing, Ctrl-F finds nothing, and every PDF parser returns empty. Verify by opening the redacted output in a text-extraction tool: the redacted words must not appear in the output. ScoutMyTool runs a built-in extraction check after redaction and warns if any redacted-region text survived.
- What about images that contain sensitive text (a scan of a contract with someone's name)?
- Image redaction is a different operation: there is no underlying text to remove; the sensitivity is in the pixels of the image. The tool replaces the redacted region of the image with opaque black pixels, then re-encodes the image. To prevent recovery via image metadata or thumbnail caches, the tool also strips any EXIF/XMP metadata and re-generates the image's display stream from scratch. For scans where the entire document is image-only, run OCR first to identify the regions to redact, then redact at the pixel level.
- Is my PDF uploaded to your servers during redaction?
- No. Redaction runs entirely in your browser using pdf-lib. Your file is loaded into a sandboxed memory buffer, content streams are edited in place, and the modified file is delivered as a download. Verify in the DevTools Network tab: there should be zero outbound requests. This is non-negotiable for redaction specifically: uploading a sensitive document to a third-party server in order to redact it defeats the entire purpose, because the unredacted document is sent to the third party as the first step.
- Can the redaction be reversed?
- Not from the output file. Real redaction physically removes the underlying text: the redacted bytes are not stored anywhere in the output. The only way to "recover" redacted content is from the original unredacted source file (which you should treat as sensitive accordingly). Some assume the PDF specification must have a mechanism for "hidden text underneath an opaque overlay"; it does not. Either the text is in the file or it is not.
- What about metadata: author name, edit history, embedded font names?
- A complete redaction workflow removes structural metadata too. ScoutMyTool strips: document-level author / title / subject / keywords; XMP metadata; document outline entries that may reference redacted text; the original filename if it appeared in document properties. Embedded font names are technically not sensitive (they identify the font, not the user) and are preserved. For audit-grade redaction (court filings, FOIA responses), also run the redacted file through a "sanitise document" pass (the Acrobat Pro feature, or the ScoutMyTool sanitise mode), which removes deleted content from the file's incremental update history.
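As a rough external check on the metadata and history points above, the raw bytes of a file can be scanned for tell-tale markers. The Python sketch below is a heuristic under stated assumptions, not a substitute for a sanitise pass: `audit_pdf_bytes` is a hypothetical helper, it only sees uncompressed objects, more than one `%%EOF` marker suggests but does not prove incremental updates (linearised files can also contain more than one), and the sample bytes are a hand-made fragment rather than a valid PDF.

```python
import re


def audit_pdf_bytes(data: bytes) -> dict:
    """Heuristic scan of raw PDF bytes for leftover metadata and history."""
    eof_count = data.count(b"%%EOF")
    # Info-dictionary style keys followed by a literal or hex string.
    keys = set(re.findall(rb"/(Author|Title|Subject|Keywords)\s*[(<]", data))
    return {
        "eof_markers": eof_count,
        "possible_incremental_updates": eof_count > 1,
        "info_keys_present": sorted(k.decode("ascii") for k in keys),
        "has_xmp_packet": b"<x:xmpmeta" in data,
    }


# Hand-made fragment: one info dictionary and two end-of-file markers,
# mimicking an original revision followed by an appended update.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Author (Jane Doe) /Title (Draft) >>\nendobj\n"
    b"trailer\n<< /Info 1 0 R >>\n%%EOF\n"
    b"2 0 obj\n<< /Length 0 >>\nendobj\n"
    b"trailer\n<< /Prev 0 >>\n%%EOF\n"
)

report = audit_pdf_bytes(sample)
print(report)
```

To try it on a real file, pass the file's bytes (for example, the result of `open(path, "rb").read()` for a path of your choosing). A clean report is only weak evidence, since metadata can also sit inside compressed object streams that this scan cannot read.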
Redact your PDF properly: free, no signup, no upload
Real redaction (text removed from content stream, not just covered), pattern-based bulk redaction, automatic verification pass. Runs entirely in your browser, so your sensitive document never leaves your device.
Open the PDF Redact tool at scoutmytool.com/pdf/redact-pdf