PDF for journalists — source protection and secure sharing
By ScoutMyTool Editorial Team · Last updated: 2026-05-20
Journalists handling sensitive PDFs operate at the intersection of two very different concerns: protecting sources from identification, and protecting the journalist's own digital security from adversaries who may want to compromise the work. The PDF format carries metadata that can leak source identity, supports redactions that look complete but are routinely defeated, and serves as a potential malware vector when documents arrive from unknown sources. This article maps the operational habits that protect both sides, the tool choices that make those habits sustainable, and the threats each habit addresses.
This article is general information. For specific threat scenarios, consult your newsroom's security lead or organisations like Freedom of the Press Foundation.
Operational habits and the threats they address
| Habit | Threat addressed | Tool |
|---|---|---|
| Strip metadata from received documents | Source identification via author / producer / timestamps | ScoutMyTool Metadata Editor; exiftool -all= |
| Destructive redaction (not rectangle annotation) | Source name / address visible behind "redacted" rectangle | ScoutMyTool Redact PDF; Acrobat Pro Redact |
| Secure transmission with E2E encryption | Email interception by adversary | Signal, ProtonMail, SecureDrop |
| Air-gapped review for highest-sensitivity docs | Malware embedded in source documents | Dedicated review machine; OS like Qubes / Tails |
| Verify document authenticity | Fabricated "leaked" documents from bad-faith sources | Compare metadata, fonts, structure to known authentic samples |
| Limit who has access on the newsroom side | Internal leak of source material | Access-controlled shared storage; need-to-know basis |
| Document chain of custody | Legal challenge to document provenance | Per-receipt log: source / date / channel / who reviewed when |
Step by step — process a sensitive source document
- Receive on a dedicated channel. SecureDrop, Signal, ProtonMail, or in-person hand-off. Note source identity in the chain-of-custody log (kept separately from the document itself, in a newsroom-controlled secure log).
- Open in an air-gapped or isolated environment if the threat model warrants. For high-stakes documents, a dedicated machine not connected to your normal network reduces malware risk. For routine sensitivity, your regular newsroom machine with updated antivirus is acceptable.
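- Strip metadata immediately. ScoutMyTool Metadata Editor client-side, or `exiftool -all= file.pdf` on a local machine. Be aware that ExifTool edits PDFs by appending an incremental update, so the old metadata stays recoverable until the file is rewritten (for example with `qpdf --linearize in.pdf out.pdf`), and that ExifTool leaves a `file.pdf_original` backup next to the cleaned file. Verify with Properties → Description that no identifying fields remain. The cleaned file is the only version you work from going forward; archive the original separately under access control.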
- Verify document authenticity. Compare structure, fonts, and metadata patterns to known-authentic documents from the same source organisation. For high-stakes documents, the verification step can take hours or days; do not publish unverified material regardless of time pressure.
- Apply destructive redactions before any publication. Redact source-identifying details, third-party names, and any information that would not survive editorial review. Use Acrobat Pro or ScoutMyTool Redact PDF; verify by attempting to select-copy redacted regions. Re-strip metadata after redaction (Acrobat's redaction can add metadata fields). Final pass: flatten the PDF to remove any annotation layer that might survive review.
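The metadata check in the steps above can be spot-checked with a short script. What follows is a minimal sketch, assuming a raw byte scan is acceptable as a first pass; the names `scan_pdf_bytes` and `looks_clean` and the marker list are illustrative and do not come from any tool named in this article. The scan cannot see inside compressed object streams, so a clean result is a hint, not proof.

```python
# Heuristic raw-byte check for leftover PDF metadata (illustrative sketch).
# Keys from the PDF Info dictionary plus the XMP packet marker.
MARKERS = [
    b"/Author", b"/Creator", b"/Producer",
    b"/CreationDate", b"/ModDate", b"/Subject", b"/Keywords",
    b"<x:xmpmeta",
]


def scan_pdf_bytes(data: bytes) -> dict:
    """Count metadata markers and %%EOF markers in the raw file bytes."""
    markers = {m.decode("ascii"): data.count(m) for m in MARKERS if m in data}
    return {
        "markers": markers,
        # More than one %%EOF can indicate incremental updates (earlier
        # revisions still present in the file) or a linearized file.
        "eof_markers": data.count(b"%%EOF"),
        # Object streams are compressed, so this scan cannot see inside them.
        "has_object_streams": b"/ObjStm" in data,
    }


def looks_clean(report: dict) -> bool:
    """Conservative verdict: anything the scan cannot rule out counts as not clean."""
    return (
        not report["markers"]
        and report["eof_markers"] <= 1
        and not report["has_object_streams"]
    )


# Synthetic example: a file whose metadata was "removed" by an appended update.
# The placeholder name below is invented for the demo.
original = b"%PDF-1.4\n1 0 obj\n<< /Author (Jane Doe) /Producer (Word) >>\nendobj\n%%EOF\n"
update = b"1 0 obj\n<< >>\nendobj\n%%EOF\n"
report = scan_pdf_bytes(original + update)
print(report)
print("looks clean:", looks_clean(report))
```

To check a real file, read it with `pathlib.Path("cleaned.pdf").read_bytes()` and pass the bytes to `scan_pdf_bytes`. Any hit deserves a manual look in a proper PDF inspector before the document moves further down the workflow.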
The threat model matters more than the tools
Every operational habit above is tied to a specific threat. Stripping metadata addresses the source-identification threat; destructive redaction addresses the visible-redaction-leak threat; secure transmission addresses interception; air-gapped review addresses malware. For routine source documents where the threat model is low (public records, lightly-sensitive corporate documents), heavy operational security adds friction without proportionate benefit. For high-stakes documents where the threat model is severe (whistleblower documents, national-security leaks, criminal-source material), the full operational stack is necessary and the friction is part of the cost of protecting the source.
Develop the habit of consciously evaluating the threat model per document. Source identities are often exposed not by sophisticated attacks but by over-confident handling of documents the journalist misjudged as low-stakes. The judgment is the skill; the tools are secondary.
Related reading
- PDF redaction guide: destructive vs annotation redaction.
- PDF metadata editor: strip metadata before publication.
- Share PDFs securely: password and expiry layers.
- PDF security audit: checklist for sensitive document handling.
- Searchable PDF: OCR locally to avoid uploading source documents.
FAQ
- What metadata in a PDF could identify my source?
- Several fields. Author name (often set automatically to the source's real name by their authoring tool). Producer / Creator (identifies the specific software and version used, narrowing the possible sender pool). Creation and modification timestamps (can be correlated with the source's known activity patterns or work schedule). XMP custom fields (sometimes contain organisation-specific identifiers like internal document IDs, department codes, or workflow markers). EXIF data in embedded photos (camera serial number, GPS coordinates if the source photographed something). Before publishing any source-derived PDF, run a complete metadata strip; consult colleagues trained in source protection if available.
- How do I redact a document so the source cannot be re-identified?
- Use destructive redaction. Acrobat Pro: Tools → Redact → mark redaction areas → Apply Redactions (must explicitly apply, not just draw rectangles). ScoutMyTool Redact PDF runs the same destructive redaction client-side. Then strip metadata and flatten the file. Verify by trying to select-copy the redacted region; none of the original text should come out. A well-known failure is the Manafort court filing of January 2019, where the "redactions" were black boxes drawn over live text and were defeated in seconds by copy-paste. The Reality Winner case (2017) teaches a different lesson: the published scan showed signs of having been printed, which pointed investigators to print audit logs, and researchers later found printer tracking dots in it. That was an artefact leak, not a failed redaction. Never rely on visual obscurity alone.
- Should I scan source documents to PDF rather than handle the originals?
- For source-protection reasons, often yes. Scanning produces an image-PDF that does not carry the metadata of the source's authoring tools; the scan's metadata reflects your scanner, not the source. Then OCR the scan locally (ScoutMyTool Make PDF Searchable, client-side) to make it searchable without re-introducing source-side processing. The scan path adds a buffer between source artefacts and the document you eventually publish. Trade-off: scanning loses some fidelity, because text becomes pixels and OCR recovers it imperfectly. For high-stakes documents the trade-off is worth it; for routine source documents the original PDF is fine.
- How do I share sensitive PDFs with newsroom colleagues securely?
- Layered controls. Storage: access-controlled internal repository (most newsrooms have a sensitive-documents storage platform; if not, an encrypted volume on a dedicated machine). Transmission to colleagues: Signal or Wire for messages; SecureDrop for source-to-newsroom; never plain email with PDF attachments for sensitive content. Within the newsroom, limit who sees the document on a need-to-know basis: small teams of reporters and editors, not broad distribution. Document the access list. After publication, retain the source documents per newsroom policy; some publications archive permanently, some destroy after a set period.
- What is "chain of custody" for source documents?
- A written record of every person who handled the document and when. For a journalism workflow: source sends document to reporter (logged with date, channel); reporter receives, verifies authenticity (logged with date, verification method); reporter shares with editor (logged); legal review (logged); publication (logged). The log lives separate from the document itself, typically in a dedicated secure workflow log. Useful for two reasons. First, if the document is challenged in court, you can demonstrate the provenance and handling. Second, if there is an internal leak, the access log narrows the investigation. Maintaining the log adds 30 seconds per handoff; the audit benefit is meaningful for high-stakes stories.
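The chain-of-custody log described above can be kept as a simple append-only record. The following is a minimal sketch, assuming an in-memory list of entries; the field names and the functions `append_entry` and `verify_log` are hypothetical. Linking each entry to the previous one by hash is an addition for illustration, so that later edits to the log become detectable; it is not something the guides cited in this article prescribe.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def append_entry(log, *, document_sha256, action, actor, channel="", when=None):
    """Append one handoff record, linked to the previous entry by its hash."""
    entry = {
        "timestamp": when or datetime.now(timezone.utc).isoformat(),
        "document_sha256": document_sha256,  # identifies the file without storing it
        "action": action,                    # e.g. "received", "verified", "shared"
        "actor": actor,                      # a role or pseudonym, per newsroom policy
        "channel": channel,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["entry_hash"] = sha256_hex(payload)
    log.append(entry)
    return entry


def verify_log(log) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != sha256_hex(payload):
            return False
        prev_hash = entry["entry_hash"]
    return True


# Demo with placeholder bytes standing in for a received document.
doc_hash = sha256_hex(b"placeholder document bytes")
log = []
append_entry(log, document_sha256=doc_hash, action="received",
             actor="reporter-A", channel="SecureDrop")
append_entry(log, document_sha256=doc_hash, action="shared with editor",
             actor="reporter-A", channel="in person")
print("log intact:", verify_log(log))
```

Recording the document's SHA-256 rather than its filename lets the log prove which exact file was handled without revealing anything about its contents. The log itself still belongs in the separate, access-controlled location the answer above describes.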
Citations
- Freedom of the Press Foundation — Source-protection guides and SecureDrop documentation.
- Committee to Protect Journalists — Digital safety kit.
- Citizen Lab — Source-protection and adversary-defence research.
- Reporters Without Borders — Online survival kit for journalists.
- ISO 32000-1:2008 — PDF metadata and encryption specifications.
Client-side PDF tools for journalism workflows
ScoutMyTool Metadata Editor and Redact PDF run entirely in the browser tab. Source documents never transit through a third-party server during processing.