How to redact a PDF properly: real removal vs. a black bar

A black box over text is not redaction: the words are still in the file. How to truly remove content, where data leaks, and how to verify it is gone.

By ScoutMyTool Editorial Team · Last updated: 2026-05-21

Introduction

There is a recurring news story that makes every security person wince: an organization releases a "redacted" PDF, a reporter selects the text under the black bars, copies it into a document, and the secret names, numbers, or settlement figures are right there. It has happened to courts, governments, and big companies. The cause is always the same misunderstanding, the belief that drawing a black rectangle over text hides it. It does not. The text is still in the file. This guide is about the difference between covering and removing: why a black box leaks, what true redaction actually does, the hidden places where sensitive data survives, and how to verify that what you redacted is genuinely, unrecoverably gone.

What actually removes content, and what just hides it

Several techniques look identical on screen (a black bar where text used to be), but only some actually delete the underlying content. The difference is everything.

Method | Looks hidden? | Actually gone? | Verdict
Black rectangle or highlight over text | Yes | No: text remains underneath | NOT redaction; leaks
Black box, then flatten only | Yes | Sometimes: text may persist | Unreliable
Change text color to white | Yes | No: still selectable and copyable | NOT redaction; leaks
Delete the text, then add a redaction mark | Yes | Yes | True redaction
Redaction tool (remove + cover + flatten) | Yes | Yes | True redaction; verify the output
Rasterize the page after covering | Yes | Yes, but the text layer is lost | Works; heavier file

Step by step โ€” redact so the data is actually gone

  1. Never start with a drawing tool. Do not reach for the rectangle or highlighter; those only cover. Use a real redaction tool such as Redact PDF or Permanent Redaction.
  2. Mark every instance. Redact each occurrence of the sensitive value in body text, headers, footers, and captions. For repeated identifiers such as SSNs or account numbers, a pattern redaction catches every match.
  3. Remove, then cover, then flatten. Apply the redaction so the underlying text/image data is deleted, the mark is placed, and the document is flattened; see flattening a PDF.
  4. Strip metadata and secondary locations. Clear document properties, and check bookmarks, form fields, comments, and attachments, because sensitive values can hide outside the visible page.
  5. Verify by extraction and search. In the output, try to select/copy the redacted text, search for the exact terms, and run a full text extraction; all should come up empty. A redaction comparison demo shows the difference between covered and removed.
  6. Apply the minimum-necessary principle. When sharing with anyone beyond the audience that needs the full document, redact everything outside that recipient's need; see the privacy practices in PDF for healthcare and therapy records.
  7. Distribute only the verified copy. Keep the unredacted original secure; share only the flattened, verified file. Never send the working copy.
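Step 2's pattern redaction is, at heart, a regular-expression search. The sketch below is hypothetical (`SSN_PATTERN` and `find_matches` are illustrative names, not part of any redaction tool) and assumes US Social Security numbers written as 123-45-6789, 123 45 6789, or 123456789. It returns the position of every hit so each occurrence can be marked, not just the first.

```python
import re

# Hypothetical pattern for US SSNs: three digits, two digits, four digits, with the
# same separator ("-", " ", or none) in both places. The lookarounds stop it from
# matching a slice of a longer digit run.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}([- ]?)\d{2}\1\d{4}(?!\d)")


def find_matches(text: str, pattern: re.Pattern) -> list[tuple[int, int, str]]:
    """Return (start, end, matched text) for every match, so each can be marked."""
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


sample = "Employee 123-45-6789 replaced ID 987654321; ref 1234-567-890."
for start, end, value in find_matches(sample, SSN_PATTERN):
    print(f"redact characters {start}-{end}: {value!r}")
```

The bare nine-digit ID matches too. For redaction, over-matching is usually the safer mistake, but review the hits before applying them; a real tool's pattern syntax and options may differ from Python's `re`.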

FAQ

Why is a black box over text not real redaction?
Because a PDF stores the text and the black box as two separate things stacked on top of each other. The rectangle is just an opaque shape drawn above the words; the words themselves are still in the file, fully intact. Anyone can select and copy the text right out from under the box, delete the box, or extract the raw text with a script or even a screen reader, and the supposedly hidden information is revealed. This exact mistake has caused famous leaks of court filings, government documents, and contracts where reporters simply copy-pasted the text out of a "redacted" PDF. Visually covering and actually removing are completely different operations.
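To make the layering concrete, here is a hand-written, uncompressed page content stream with hypothetical values. Real PDFs usually compress their streams and may encode text differently, so treat this as a toy. The text is drawn by `Tj` operators and the black box by a separate `re` (rectangle) plus `f` (fill) pair painted afterwards, on top. A naive extractor (the hypothetical `naive_extract`) reads only the text operators, never looks at the rectangle, and so recovers the "hidden" value.

```python
import re

# Toy, uncompressed page content stream (hypothetical values).
# BT ... ET is a text object and Tj shows a string. "0 0 0 rg" sets a black fill,
# "x y w h re" adds a rectangle and "f" fills it, painted after (on top of) the text.
CONTENT_STREAM = b"""
BT /F1 12 Tf 72 700 Td (Name: Jane Doe) Tj ET
BT /F1 12 Tf 72 680 Td (SSN: 123-45-6789) Tj ET
0 0 0 rg 110 676 90 16 re f
"""


def naive_extract(stream: bytes) -> list[str]:
    """Return the literal strings shown by Tj operators, ignoring all graphics.

    A toy: it does not handle escapes, nested parentheses, TJ arrays,
    hex strings or font encodings, which a real extractor must.
    """
    return [s.decode("latin-1") for s in re.findall(rb"\(([^()]*)\)\s*Tj", stream)]


print(naive_extract(CONTENT_STREAM))
```

The rectangle changes what is painted, not what is stored. The white-text trick from the table fails the same way: a `1 1 1 rg` before the `Tj` only changes the fill color, and the string is still there.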
What does true redaction do differently?
True redaction removes the underlying content: the actual text characters and any image data in the redacted area are deleted from the file, not merely hidden. It then places a mark (usually a black or white box) where that content was. After removal, the document is flattened so the change is permanent and the structure no longer carries the original content. The result: there is nothing left underneath to copy, extract, or recover. A proper redaction tool does these steps together (identify, remove, cover, flatten) and lets you verify afterward. The test is simple: in the output, try to select or search the text you redacted; if it is truly gone, you find nothing.
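The order of operations is the whole point, so here is a toy sketch of it against a hand-written, uncompressed content stream (hypothetical values; `redact` and `shown_strings` are illustrative names, not a real library's API). It deletes the secret from the stored text first and only then paints the box. Real redaction tools work from glyph positions inside the marked area, handle compressed streams, fonts, and images, and then rewrite and flatten the file; this sketch attempts none of that.

```python
import re

# Toy, uncompressed content stream standing in for a real page (hypothetical values).
STREAM = b"BT /F1 12 Tf 72 680 Td (SSN: 123-45-6789) Tj ET\n"


def shown_strings(stream: bytes) -> list[bytes]:
    """Literal strings shown by Tj (toy parser: no escapes, TJ arrays or hex strings)."""
    return re.findall(rb"\(([^()]*)\)\s*Tj", stream)


def redact(stream: bytes, secret: bytes, box: tuple[int, int, int, int]) -> bytes:
    """Toy redaction: delete the secret from every Tj string, THEN paint a black box.

    Step 1 (remove) changes what is stored; step 2 (cover) only changes what is
    painted. This only illustrates the order of the two steps.
    """
    removed = re.sub(
        rb"\(([^()]*)\)(\s*Tj)",
        lambda m: b"(" + m.group(1).replace(secret, b"") + b")" + m.group(2),
        stream,
    )
    x, y, w, h = box
    cover = b"0 0 0 rg %d %d %d %d re f\n" % (x, y, w, h)
    return removed + cover


out = redact(STREAM, b"123-45-6789", (110, 676, 90, 16))
print(shown_strings(out))  # the number is no longer stored, only the label remains
```

Only the value is removed, so the "SSN: " label stays real, selectable text, which is the advantage targeted redaction keeps over rasterizing the whole page.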
Does flattening alone make a black box safe?
Not reliably. Flattening merges layers so they cannot be separated in a viewer, which stops the "delete the box" attack, but depending on how it is done the underlying text can still exist in the content stream and remain extractable by text-extraction tools or search. Flattening is a useful final step after true removal, but it is not a substitute for removing the content in the first place. Treat "I put a box and flattened it" as not good enough for anything sensitive; the only safe approach is to remove the content, then cover and flatten, then verify the text is unrecoverable.
What hidden places do I also need to check?
Redaction is not just the visible page. The same sensitive value can survive in the document metadata (title, author, keywords), in bookmarks or the table of contents, in form-field values, in comments and annotations, in layers, and in attached files. Text can also linger in an earlier copy of the page that was edited rather than removed, for example when a PDF is saved with incremental updates, which append the changes and leave the original objects in the file. A thorough redaction strips metadata and checks these secondary locations, not only the page body. For highly sensitive documents, after redacting, run a final extraction/search of the whole file for the terms you removed to confirm none survive anywhere.
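A rough first pass over those secondary locations is to scan the file's raw bytes for each term, both as single-byte text and as UTF-16BE (the two forms PDF text strings most commonly take), and to inflate every Flate-compressed stream before scanning it too. This is a hedged sketch using only the Python standard library; `scan_pdf_bytes` is a hypothetical helper, not a real tool, and the sample is a hand-built fragment, not a valid PDF. It misses hex-encoded strings, escaped characters, other stream filters, and page text stored through font encodings, so an empty result is necessary but not sufficient.

```python
import re
import zlib


def encodings_of(term: str) -> list[bytes]:
    """Byte forms a PDF text string can take for this term (not exhaustive)."""
    forms = [term.encode("utf-16-be")]  # used for non-Latin text, after a FE FF BOM
    try:
        forms.append(term.encode("latin-1"))  # matches PDFDocEncoding for plain ASCII
    except UnicodeEncodeError:
        pass  # the term cannot be stored as single-byte text
    return forms


def scan_pdf_bytes(data: bytes, term: str) -> list[str]:
    """First-pass leak check: look for the term in the raw file bytes and inside
    every Flate-compressed stream. Returns a description of each place it was found."""
    if not term:
        return []
    needles = encodings_of(term)
    hits = []
    if any(n in data for n in needles):
        hits.append("uncompressed bytes (e.g. metadata, bookmarks, form values, plain page streams)")
    streams = re.finditer(rb"(?<!end)stream\r?\n(.*?)endstream", data, re.S)
    for i, m in enumerate(streams):
        try:
            # decompressobj tolerates the end-of-line bytes that precede "endstream"
            body = zlib.decompressobj().decompress(m.group(1))
        except zlib.error:
            continue  # not Flate data (or damaged); a real tool would handle it
        if any(n in body for n in needles):
            hits.append(f"compressed stream #{i}")
    return hits


# Hand-built fragment with a metadata entry and one compressed page stream.
pdf = (
    b"1 0 obj << /Author (Jane Doe) >> endobj\n"
    b"2 0 obj << /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(b"BT /F1 12 Tf 72 680 Td (SSN: 123-45-6789) Tj ET")
    + b"\nendstream\nendobj\n"
)
for term in ["Jane Doe", "123-45-6789"]:
    print(term, "->", scan_pdf_bytes(pdf, term))
```

The SSN is invisible to a plain byte search here because it sits inside a compressed stream, which is why the sketch inflates streams before searching.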
Is rasterizing the page (turning it into an image) a safe shortcut?
It works for hiding text, since converting the page to a flat image after covering leaves no text layer to extract, but it has costs. The page becomes an image, so it is no longer searchable or selectable, accessibility suffers, and the file gets larger. It also does not help with metadata or other-page leaks. Rasterizing can be a pragmatic last resort when you cannot trust a redaction tool, but a proper redaction that removes only the sensitive content while keeping the rest of the page as real text is better in almost every way. If you do rasterize, still strip metadata.
How do I verify a redaction actually worked?
Test the output as an adversary would. Open the redacted file and try to select the text under each mark and copy it elsewhere; you should get nothing. Use the find function to search for the exact words, names, or numbers you redacted; they should not be found. Run a text extraction over the whole document and scan the result for the sensitive terms. Check the document properties/metadata. Only distribute the file once all of these come up clean. "Looks hidden on screen" is precisely the trap; verification by extraction and search is the only thing that confirms the information is genuinely gone.
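Scanning extraction output by eye is easy to get wrong, because an extractor may emit "123 45 6789" instead of "123-45-6789" or split a name across lines. A minimal sketch, assuming you already have the document's extracted text as a string from whatever extraction tool you use (`find_leaks` is a hypothetical helper), compares terms after stripping case, spacing, and punctuation:

```python
import re


def squash(s: str) -> str:
    """Case-fold and keep only letters and digits, so '123-45-6789' and
    '123 45 6789' both become '123456789'."""
    return re.sub(r"[\W_]+", "", s.casefold())


def find_leaks(extracted_text: str, terms: list[str]) -> list[str]:
    """Return every redacted term that still appears in the extracted text.
    An empty list is the pass condition."""
    haystack = squash(extracted_text)
    return [t for t in terms if squash(t) and squash(t) in haystack]


extracted = "Claimant: J. Smith\nSSN 123 45 6789\nAmount: [REDACTED]"
print(find_leaks(extracted, ["123-45-6789", "Jane Doe", "$250,000"]))
```

Squashing trades precision for recall: a very short term can match inside unrelated words, so treat each hit as something to inspect rather than proof of a leak.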
Is it safe to redact a confidential PDF with an online tool?
Redaction is usually applied to the most sensitive documents you have, so the tool's data handling matters enormously. Prefer a tool that performs redaction locally in your browser and never uploads the file: uploading a document to a server to redact it exposes the very content you are trying to protect. ScoutMyTool performs redaction client-side in your browser tab, so the document never leaves your machine, and lets you verify removal. For anything sensitive, never use a redaction tool that uploads.

Redact so it is actually gone

ScoutMyTool removes the underlying content, flattens, and lets you verify, entirely in your browser tab, so the sensitive document never leaves your machine.