PDF/A conversion — archive-quality PDF for long-term storage

What PDF/A actually is, which conformance level to pick, and how to convert + validate without paying for Acrobat.


By ScoutMyTool Editorial Team · Last updated: 2026-05-20

Introduction

A library archivist explained PDF/A to me by holding up a 12-year-old PDF that no longer rendered correctly in any modern viewer: the fonts had been substituted, the JavaScript that drew a chart relied on a long-dead Flash dependency, and the encryption password was lost. "This is why PDF/A exists," she said. "When a court asks for the 2018 filing in 2034, we have to be able to open it." PDF/A is the boring, unglamorous part of the PDF specification that keeps documents readable on a decades-long timeline. This article covers what each level (A-1, A-2, A-3, A-4) actually means, what gets stripped during conversion, and which free tools handle the conversion-and-validation pipeline without paying for Acrobat.

Why PDF/A is a different beast

A regular PDF is allowed to be lazy. It can reference fonts installed on the reader's machine, embed JavaScript that calls back to the original software, link to external documents on the open web, and encrypt content with a password that only the original sender knows. Today that all works. In fifty years, some of those references will rot and the document will degrade into something that does not render correctly.

PDF/A solves the problem by forbidding anything that introduces an external dependency. Defined by the ISO 19005 family of standards, PDF/A documents must embed every font they use, must not include JavaScript or audio/video content (PDF/A-4e relaxes this for RichMedia and 3D in engineering documents), must not be encrypted, and must not depend on external resources to render [1]. The result is a self-contained file that any future PDF viewer can render the same way the original did, at the cost of being slightly larger and losing dynamic features.

The four PDF/A levels and their conformance tiers

Level | Base PDF spec | ISO standard | What it adds | Typical use case
PDF/A-1a | PDF 1.4 | ISO 19005-1:2005 | Full accessibility: tagged PDF structure tree required; Unicode-mapped text required. | Public-sector and accessibility-mandated archival; the strictest A-1 conformance.
PDF/A-1b | PDF 1.4 | ISO 19005-1:2005 | Visual reproduction only: guarantees the page looks the same in 50 years, no tagging requirement. | When you need archival fidelity but not accessibility (mostly legacy back-files).
PDF/A-2a / A-2b / A-2u | PDF 1.7 | ISO 19005-2:2011 | JPEG 2000, transparency, layers, embedded files (other PDF/A only), digital signatures. "u" requires Unicode mapping for text; "a" requires full tagging + accessibility. | Most common archival level today; supports modern PDFs without losing fidelity. PDF/A-2u is a popular compromise (Unicode-mapped text, no tagging requirement).
PDF/A-3a / A-3b / A-3u | PDF 1.7 | ISO 19005-3:2012 | Same as A-2 plus the ability to embed any file format (CSV, XML, source spreadsheet). | E-invoicing standards (ZUGFeRD, Factur-X): embed a machine-readable invoice as XML alongside the human-readable PDF.
PDF/A-4 | PDF 2.0 | ISO 19005-4:2020 | PDF 2.0 features; A-4e for engineering (3D, RichMedia), A-4f for embedded files. | Long-term archival for documents produced after 2020; the modern default for greenfield workflows.

The trailing letter — a, b, u — indicates the conformance tier:

  • "a" (accessible) — requires the full tagged-PDF structure tree plus all the other constraints. The strictest tier; produces both archival and accessible output. Required where regulations cite both ISO 19005 and accessibility law (some EU public-sector mandates).
  • "b" (basic) — guarantees visual reproduction only. The page will look the same in fifty years, but no tagging is required.
  • "u" (Unicode-mapped text) — sits between "a" and "b". Requires every character to have a Unicode mapping so text can be reliably extracted and searched, but no tagging requirement. Available from PDF/A-2 onwards.
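
The naming scheme above can be turned into a small lookup. The following is a minimal sketch, assuming level labels are written as "PDF/A-2u"; the `PdfaLevel` class and `parse_level` function are hypothetical names introduced for illustration, and the tagging and Unicode flags are left unset for PDF/A-4 because the a/b/u scheme described here does not apply to it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Base PDF version for each PDF/A part, taken from the levels table.
BASE_PDF = {1: "1.4", 2: "1.7", 3: "1.7", 4: "2.0"}

# Suffix letters accepted for each part. "u" starts with part 2;
# part 4 uses an optional "e" or "f" suffix instead of a/b/u.
VALID_TIERS = {
    1: {"a", "b"},
    2: {"a", "b", "u"},
    3: {"a", "b", "u"},
    4: {"", "e", "f"},
}


@dataclass(frozen=True)
class PdfaLevel:
    part: int
    tier: str
    base_pdf: str
    # None means "not expressed by the a/b/u scheme" (used for part 4).
    requires_tagging: Optional[bool]
    requires_unicode: Optional[bool]


def parse_level(label: str) -> PdfaLevel:
    """Split a label such as 'PDF/A-2u' into its part and tier."""
    match = re.fullmatch(r"PDF/A-([1-4])([a-z]?)", label.strip(), flags=re.IGNORECASE)
    if not match:
        raise ValueError(f"not a PDF/A level label: {label!r}")
    part, tier = int(match.group(1)), match.group(2).lower()
    if tier not in VALID_TIERS[part]:
        raise ValueError(f"tier {tier!r} is not defined for PDF/A-{part}")
    if part == 4:
        return PdfaLevel(part, tier, BASE_PDF[part], None, None)
    return PdfaLevel(
        part,
        tier,
        BASE_PDF[part],
        requires_tagging=(tier == "a"),          # only "a" demands the structure tree
        requires_unicode=(tier in {"a", "u"}),   # "a" and "u" demand Unicode mapping
    )
```

For example, `parse_level("PDF/A-2u")` reports part 2, base PDF 1.7, Unicode mapping required, and tagging not required, while `parse_level("PDF/A-1u")` raises `ValueError` because the "u" tier only exists from PDF/A-2 onwards.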

What PDF/A bans, and why

The constraints below are what make PDF/A different from a normal PDF. Knowing what each one bans (and why) helps you understand what your conversion tool may strip out of your source document.

Feature | Status | Why
JavaScript | Banned in all PDF/A levels | Future PDF viewers may not implement the same JavaScript engine. Archival demands fully self-contained rendering.
Audio / video | Banned in A-1, A-2, and A-3; A-4e permits RichMedia and 3D content for engineering documents | Multimedia codecs change; long-term preservation cannot guarantee playback.
Encryption | Banned | An archived document must be readable without keys that may be lost.
External font references | Banned; all fonts must be embedded | The exact font used at creation must travel with the document; no relying on the reader's machine to provide it.
LZW compression | Banned | Historical patent uncertainty (now expired); excluded from PDF/A for safety.
Transparency | Banned in A-1; allowed in A-2/A-3/A-4 | Transparency required PDF 1.4+ rendering and was unreliable in early viewers; modern levels accept it.
External cross-document links | Banned | Links to other documents may rot; the archive must be self-contained.
Embedded files | Banned in A-1; A-2 allows only embedded PDF/A files; A-3 and A-4f allow any format | A-3 was introduced specifically to allow ZUGFeRD-style embedded XML invoices.
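
Before converting, it can help to get a rough idea of which banned features a source file uses. The sketch below is a deliberately naive heuristic, not a validator: it searches the raw bytes for the PDF name tokens `/JavaScript`, `/JS`, `/Encrypt`, and `/LZWDecode`. It will miss anything stored inside compressed object streams and can produce false positives, so treat it as a pre-check only. The function and dictionary names are hypothetical.

```python
import re

# PDF name tokens that signal three of the banned features in the table.
# \b keeps "/JS" from matching longer names such as "/JSON".
BANNED_MARKERS = {
    "JavaScript": re.compile(rb"/(?:JavaScript|JS)\b"),
    "Encryption": re.compile(rb"/Encrypt\b"),
    "LZW compression": re.compile(rb"/LZWDecode\b"),
}


def flag_banned_features(pdf_bytes: bytes) -> list[str]:
    """Return the names of banned features whose markers appear in the raw bytes.

    Heuristic only: markers hidden inside compressed streams are not seen.
    """
    return [name for name, pattern in BANNED_MARKERS.items() if pattern.search(pdf_bytes)]
```

A typical call would be `flag_banned_features(open("source.pdf", "rb").read())`, where `source.pdf` is a placeholder path. A real validator remains the authority on conformance.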

When PDF/A is the right format

  • Government records and regulatory filings. US federal records under NARA guidelines, EU public-sector documents under EN 301 549 and the EAA, and court e-filings in many jurisdictions all expect PDF/A.
  • Library, museum, and academic archival. Anything that goes into a long-term institutional repository (DSpace, Fedora Commons, Islandora) should be PDF/A, typically PDF/A-1 or PDF/A-2, to meet preservation standards.
  • Legal e-discovery production. Producing parties in litigation often deliver document sets as PDF/A to ensure the production set is preserved identically through the entire litigation lifecycle.
  • E-invoicing standards. ZUGFeRD (Germany) and Factur-X (France) are based on PDF/A-3 specifically because A-3 allows embedded XML invoice data alongside the human-readable PDF.
  • Any "this document must be readable in twenty years" workflow. Personal record-keeping (tax returns, signed contracts, medical scans) is a valid non-institutional use of PDF/A. The cost of converting is small; the cost of a document degrading is potentially high.

Converting a regular PDF to PDF/A — five steps

  1. Pick a target level. For new archival in 2026, PDF/A-2u or PDF/A-4 is the right default. PDF/A-1 only if a specific regulation cites it. PDF/A-3 only if you need embedded XML (invoicing).
  2. Open the converter. Use ScoutMyTool's PDF/A converter in your browser; the conversion runs client-side. Alternatives: Adobe Acrobat Pro (paid), the free open-source Ghostscript CLI for conversion (paired with veraPDF for validation), or LibreOffice Writer / Draw (Export as PDF → select "PDF/A-1b" or "PDF/A-2b" in options).
  3. Run the conversion. The tool inspects the source PDF, embeds any missing fonts (or subsets them for size), strips JavaScript, removes encryption, flattens transparency if targeting A-1, and writes the result. ScoutMyTool reports what was modified during the conversion.
  4. Validate the output. Run ScoutMyTool's PDF/A Compliance Validator or veraPDF to confirm the file actually conforms to the target level. The conversion process is best-effort and edge cases sometimes need manual fixes; validation catches them.
  5. Store with metadata. Save the PDF/A file with descriptive metadata (title, author, subject, keywords, date). Archival systems index against this metadata; a PDF/A file with no title and no author is harder to find later. Most tools set this from the source PDF's metadata, which means it is worth setting good metadata before conversion.
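
For the command-line route in step 2, the Ghostscript invocation can be assembled programmatically. This is a minimal sketch, assuming Ghostscript is installed as `gs` and that a copy of its `PDFA_def.ps` prologue has been edited to point at an ICC profile; the function name and file paths are placeholders. It covers parts 1 to 3 only and selects the part number rather than a specific a/b/u tier, so the output still needs validating.

```python
def ghostscript_pdfa_args(source: str, output: str, part: int = 2,
                          pdfa_def: str = "PDFA_def.ps") -> list[str]:
    """Build an argument list for a Ghostscript PDF/A conversion run.

    Only builds the command; pass the result to subprocess.run to execute it.
    """
    if part not in (1, 2, 3):
        raise ValueError("this sketch only covers PDF/A parts 1, 2, and 3")
    return [
        "gs",
        f"-dPDFA={part}",                  # target PDF/A part
        "-dBATCH",                         # exit when the job is done
        "-dNOPAUSE",                       # do not prompt between pages
        "-sDEVICE=pdfwrite",               # write a PDF
        "-sColorConversionStrategy=RGB",   # convert colours to match the output intent
        "-dPDFACompatibilityPolicy=1",     # drop features that would break conformance
        f"-sOutputFile={output}",
        pdfa_def,                          # prologue that declares the output intent
        source,
    ]
```

To run it, something like `subprocess.run(ghostscript_pdfa_args("in.pdf", "out.pdf"), check=True)` would do. Depending on the Ghostscript version, reading the ICC profile may also require an explicit `--permit-file-read=` option. Step 4 still applies: validate the result rather than trusting the converter.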

Free PDF/A validators worth knowing

  • veraPDF: Open-source reference validator developed by the veraPDF Consortium, which is led by the Open Preservation Foundation and the PDF Association. Produces detailed reports against ISO 19005-1 through 19005-4. Available as a CLI tool, a Java GUI, and as a Maven library for integration into archival pipelines. The reference implementation cited by most archival institutions.
  • ScoutMyTool PDF/A Compliance Validator: Browser-based; reports conformance against the target level or the specific reasons for non-conformance. Convenient for one-off validation without installing veraPDF.
  • Adobe Acrobat Pro Preflight: Built into Acrobat Pro. The "Preflight" tool includes PDF/A validation profiles and can both report and auto-fix common issues.
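
For scripted validation, the veraPDF CLI takes the target level as a "flavour" string such as `2u`. The sketch below only builds the command line; it assumes the launcher is on the PATH as `verapdf`, and the function name and flavour set are illustrative rather than taken from veraPDF's own code.

```python
import re

# Flavour strings for the PDF/A levels discussed in this article.
PDFA_FLAVOURS = {"1a", "1b", "2a", "2b", "2u", "3a", "3b", "3u", "4", "4e", "4f"}


def verapdf_command(pdf_path: str, level: str) -> list[str]:
    """Translate a label such as 'PDF/A-2u' into a veraPDF command line."""
    match = re.fullmatch(r"PDF/A-(\d[a-z]?)", level.strip(), flags=re.IGNORECASE)
    flavour = match.group(1).lower() if match else ""
    if flavour not in PDFA_FLAVOURS:
        raise ValueError(f"unsupported PDF/A level: {level!r}")
    return ["verapdf", "--flavour", flavour, pdf_path]
```

Running the returned list with `subprocess.run(..., capture_output=True, text=True)` gives the validation report on standard output; how to parse that report depends on the output format chosen, which this sketch leaves at the tool's default.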

Frequently asked questions

What is PDF/A and why is it different from regular PDF?
PDF/A is a family of ISO standards (19005-1 through 19005-4) for the long-term preservation of electronic documents. Where a regular PDF can rely on external resources (fonts installed on the reader's machine, JavaScript engines, network-resolved links), a PDF/A document must be entirely self-contained — every font embedded, no scripts, no encryption, no external dependencies. The promise is that a PDF/A file created in 2026 will render the same way in 2076, even if the original software is long gone.
Which PDF/A level should I pick — A-1, A-2, A-3, or A-4?
For new documents being archived today, PDF/A-2u or PDF/A-4 is the right default. PDF/A-1 is restrictive and based on PDF 1.4 (no transparency, no JPEG 2000); pick A-1 only if a specific regulation cites it. PDF/A-2u allows transparency and modern features without requiring tagging. PDF/A-3 is identical to A-2 except that it allows embedded files of any format, which is useful for invoicing (ZUGFeRD/Factur-X embed XML alongside the PDF). PDF/A-4 is the newest, based on PDF 2.0, and is the right choice for greenfield archival pipelines starting in 2026.
What is the difference between PDF/A-2a, PDF/A-2b, and PDF/A-2u?
The trailing letter is the conformance tier. "a" (accessible) requires full tagged-PDF structure plus all the other PDF/A constraints: strict, but it produces accessible-and-archival output in one file. "b" (basic) requires visual reproduction only: the page looks the same in fifty years but is not tagged. "u" (Unicode) sits between: visual reproduction plus a requirement that every character has a Unicode mapping (so the text can be reliably extracted and searched even in 2076). "u" is a popular choice in practice because it adds searchability without the full accessibility burden of "a".
Why does PDF/A ban JavaScript?
Because future PDF viewers may not implement the same JavaScript engine, or may implement it differently. The whole promise of PDF/A is that the document renders the same way in fifty years; JavaScript is a moving target. The same logic applies to encryption (keys may be lost), external font references (the reader's system may not have the font), audio/video (codecs change), and external links (linked documents may disappear). PDF/A is the subset of PDF that can survive without those external dependencies.
Can I convert a regular PDF to PDF/A?
Yes, but the conversion has to fix, or accept the loss of, any feature the source uses that PDF/A bans. Common transformations during conversion: fonts that were only referenced get embedded (usually as subsets), JavaScript is stripped, external links are converted to plain text or page-internal links, encryption is removed, and transparency is flattened if targeting A-1. ScoutMyTool's PDF/A converter performs these transformations automatically and reports which features were modified during conversion.
How do I verify a PDF is actually PDF/A-compliant?
Run it through a validator. The reference implementation is veraPDF, an open-source validator developed by the veraPDF Consortium (led by the Open Preservation Foundation and the PDF Association); it produces a detailed report against the ISO 19005-x conformance criteria. Adobe Acrobat Pro has a built-in Preflight tool that includes PDF/A validation. ScoutMyTool's PDF/A Compliance Validator runs in your browser and reports the conformance level (or the specific reasons for non-conformance). Always validate after conversion: the conversion process is best-effort, and edge cases (unusual fonts, ICC colour profiles, complex page content) sometimes produce output that does not pass.
Are PDF/A files larger than regular PDFs?
Usually slightly larger, because every font must be embedded (fully or as a subset) rather than left to the reader's machine. The size increase is typically 5–15% over a regular PDF of the same content. For documents that already embed their fonts, the size difference is negligible. The trade-off is small; the long-term-readability guarantee is large. For massive archival projects (millions of documents), per-file size matters at scale, but for individual documents it almost never does.
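
To see what the 5–15% figure means at archive scale, a back-of-the-envelope calculation helps. The collection size and average file size below are made-up inputs for illustration, not measurements, and the function name is hypothetical.

```python
def extra_storage_gb(doc_count: int, avg_size_mb: float, overhead: float) -> float:
    """Extra storage in GB (1 GB = 1024 MB) for a given fractional size overhead."""
    return doc_count * avg_size_mb * overhead / 1024


# Hypothetical archive: one million documents averaging 0.5 MB each.
low = extra_storage_gb(1_000_000, 0.5, 0.05)    # 5% overhead
high = extra_storage_gb(1_000_000, 0.5, 0.15)   # 15% overhead
print(f"extra storage: {low:.1f} to {high:.1f} GB")
```

Under those assumptions the overhead comes to roughly 24 to 73 GB on top of a base of about 488 GB: noticeable for a large archive, irrelevant for a single document.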

Convert to PDF/A in your browser, free

Browser-based PDF/A converter. No upload — your document stays on your machine while we embed fonts, strip non-archival features, and write the archival output.

Open the free PDF/A converter →

References

  1. ISO 19005-1:2005, Document management — Electronic document file format for long-term preservation — Part 1: Use of PDF 1.4 (PDF/A-1). ISO catalogue: iso.org standard 38920 (accessed May 2026). Parts 2 (A-2), 3 (A-3), and 4 (A-4) extend the standard.
  2. PDF Association, PDF/A in a Nutshell. pdfa.org/pdfa-in-a-nutshell (accessed May 2026). Industry-standard plain-English overview of the four PDF/A parts and the conformance tiers.
  3. veraPDF Consortium, veraPDF — open-source PDF/A validation. verapdf.org (accessed May 2026). The reference implementation of PDF/A validation.
  4. ISO 32000-1:2008, Document management — Portable document format — Part 1: PDF 1.7. Public reference copy: opensource.adobe.com PDF32000_2008. The underlying PDF specification that PDF/A-2 and A-3 build on.