PDF translation — how to keep formatting across languages

Translation changes text length (some languages expand, others contract), which overflows boxes and breaks a PDF's layout. Add RTL scripts and font gaps, and 'keep the formatting' gets hard. Here is how to do it well.



By ScoutMyTool Editorial Team · Last updated: 2026-05-21

The first time I had a tidy one-page PDF translated into German, it came back spilling onto a second page with text bursting out of every box. The translation was not wrong; German simply takes more room than English, and the layout had no slack to absorb it. That is the heart of keeping formatting across languages: translation changes the length of the text (some languages expand, some contract), and a fixed PDF layout was built for the original length. Add right-to-left scripts that mirror the whole page and target scripts that need fonts your document lacks, and "just keep the formatting" becomes real work. This guide explains why translation breaks layout and how to preserve a professional appearance anyway.

The challenges — and how to handle each

| Challenge | Effect on layout | How to handle |
| --- | --- | --- |
| Text expansion | Some languages run noticeably longer; text overflows | Leave expansion room; re-fit after translating |
| Text contraction | Some languages are much shorter; gaps and loose layout | Re-balance spacing; do not leave big voids |
| Right-to-left scripts | Reading direction and layout mirror | Use RTL/bidi-aware layout, not just translated words |
| Script / font support | Glyphs missing; tofu boxes | Use and embed a font covering the target script |
| Fixed boxes / tables | Translated text no longer fits the cell | Allow cells to grow or restyle for the new length |
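
For the "Fixed boxes / tables" row, a common first attempt is to shrink the type until the translation fits, with a floor below which you grow or restyle the cell instead. The Python sketch below shows that loop under loud assumptions: it estimates width from an average glyph width of half the font size and a line height of 1.2 times the size rather than reading real font metrics, and the function names, cell size, and default sizes are hypothetical, not taken from any particular tool.

```python
import textwrap
from typing import Optional

# Rough heuristics (assumptions, not real font metrics):
AVG_CHAR_WIDTH_EM = 0.5  # average glyph width as a fraction of the font size
LINE_HEIGHT_EM = 1.2     # line spacing as a multiple of the font size


def lines_needed(text: str, box_width_pt: float, font_size: float) -> int:
    """Estimate how many wrapped lines `text` needs in a box of the given width."""
    chars_per_line = max(1, int(box_width_pt / (font_size * AVG_CHAR_WIDTH_EM)))
    return len(textwrap.wrap(text, width=chars_per_line))


def fit_font_size(text: str, box_width_pt: float, box_height_pt: float,
                  start_size: float = 11.0, min_size: float = 8.0,
                  step: float = 0.5) -> Optional[float]:
    """Step the font size down from start_size and return the first size at which
    `text` fits the box, or None if it still overflows at min_size."""
    size = start_size
    while size >= min_size:
        max_lines = int(box_height_pt // (size * LINE_HEIGHT_EM))
        if lines_needed(text, box_width_pt, size) <= max_lines:
            return size
        size -= step
    return None  # too long even at the floor: grow the box or restyle instead


# Usage: a hypothetical 200 x 30 pt table cell.
print(fit_font_size("Download the annual report", 200, 30))  # 11.0: fits as set
print(fit_font_size("word " * 200, 200, 30))                  # None: needs a re-layout
```

Shrinking is a stopgap: once the size would drop below what reads comfortably, the table's own advice (let the cell grow, or restyle it) is the real fix.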

Step by step — translate and preserve the layout

  1. Work from the editable source if you can. Translate the original document the PDF was made from and re-export — adjusting layout in the source beats fighting a fixed PDF.
  2. If PDF-only, extract text cleanly. If the PDF is a scan, OCR it first; then pull the text out and clean it so you are translating real prose, not fragments.
  3. Translate, expecting length to change. Anticipate expansion or contraction; the translated text will not occupy the same space as the original.
  4. Use and embed a font for the target script. Apply a font that covers the language’s script and embed it, so glyphs display everywhere and nothing renders as boxes.
  5. Handle RTL with bidi-aware layout. For right-to-left languages, mirror the layout and use bidirectional support, then have a native reader check it.
  6. Do a layout re-fit pass. Review every page for overflow, clipping, gaps, and broken tables; grow or rebalance boxes to fit the new text before finalising.
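
Part of the step 6 re-fit pass can be screened automatically before the human page-by-page review. This Python sketch assumes you have noted, for each text box in the original layout, roughly how many characters fit on a line and how many lines it holds; the `TextBox` structure and the 50% "loose" threshold are hypothetical. Character count is only a crude proxy for width (CJK characters, for instance, are usually full-width), so treat the result as a list of boxes to look at, not a verdict.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class TextBox:
    """A text box from the original layout (hypothetical structure for illustration)."""
    page: int
    name: str
    chars_per_line: int  # capacity estimated from the original layout
    max_lines: int


def review_box(box: TextBox, translated: str, min_fill: float = 0.5) -> str:
    """Classify one box as 'overflow', 'loose' (big void left behind) or 'ok'."""
    lines = len(textwrap.wrap(translated, width=box.chars_per_line))
    if lines > box.max_lines:
        return "overflow"
    if lines < box.max_lines * min_fill:
        return "loose"
    return "ok"


def refit_report(boxes: list[TextBox], translations: dict[str, str],
                 min_fill: float = 0.5) -> list[tuple[int, str, str]]:
    """List (page, box name, status) for every box that needs attention, page by page."""
    issues = []
    for box in sorted(boxes, key=lambda b: (b.page, b.name)):
        status = review_box(box, translations[box.name], min_fill)
        if status != "ok":
            issues.append((box.page, box.name, status))
    return issues


boxes = [TextBox(1, "heading", 30, 1), TextBox(1, "intro", 40, 4), TextBox(2, "caption", 20, 2)]
translations = {"heading": "Jahresbericht 2025", "intro": "Kurz.", "caption": "word " * 10}
for issue in refit_report(boxes, translations):
    print(issue)
```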

The principle: re-fit, don’t drop in

The mindset that preserves formatting across languages is to treat translation as "translate, then re-lay-out," never as a drop-in word swap. Because the translated text is a different length — and sometimes a different direction and script — the original layout cannot simply hold it; preserving the look means actively adapting the layout to the new text. That is exactly how professional localisation works: design with expansion room, translate, then re-fit and review. For a PDF specifically, work from the editable source where possible, embed fonts that cover the target script, give right-to-left languages a properly mirrored bidi layout, and always finish with a page-by-page re-fit pass. Skip that pass and you ship the burst-box, second-page mess; include it and the translated document looks as deliberate and polished as the original — which, after a real layout adaptation rather than a hopeful paste, is what it now is.


FAQ

Why does translating a PDF break its formatting?
Mostly because translation changes the length of the text, and a PDF’s layout was built around the original length. Languages differ substantially in how much space the same meaning takes — translations into languages like German or Finnish often run noticeably longer than English, while translations into Chinese or Japanese are often much shorter. When the new text is longer, it overflows the boxes, lines, and pages designed for the original; when it is shorter, you get awkward gaps and loose, unbalanced layout. Add to this that some target languages are written right-to-left (which mirrors the whole reading direction and layout), and that the target script may need a font the document does not contain, and you can see why "just translate it and keep the formatting" is harder than it sounds. Preserving formatting means actively re-fitting the layout to the translated text, not expecting it to drop in unchanged.
How much do languages really expand or contract?
Enough to break tight layouts, and it varies by language and by how short the source text is. As a rough, widely cited guide, translating from English into many European languages tends to lengthen the text, with shorter strings (like a button label or heading) expanding proportionally more than long paragraphs; translating into some Asian languages can shorten it considerably. The practical point is not a precise percentage, since that depends on the specific languages and content, but the direction and the implication: you cannot assume the translated text occupies the same space as the original. This is exactly why software and document localisation professionals design with "expansion room" built in. For a PDF, it means buttons, table cells, captions, and tightly set headings are the danger spots, because they have the least slack to absorb a longer translation.
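
Because the real figures depend on the language pair and the content, it is more useful to measure expansion in your own strings than to trust a single percentage. A minimal Python sketch, assuming character count is an acceptable stand-in for width; the 1.3 budget is a placeholder, so pick one that matches the slack your layout actually has.

```python
def expansion_report(pairs: list[tuple[str, str]],
                     budget: float = 1.3) -> list[tuple[str, str, float]]:
    """Return (source, translation, length ratio) for every string whose translation
    is more than `budget` times as long as the source, worst offenders first."""
    flagged = []
    for source, translation in pairs:
        if not source:
            continue  # avoid dividing by zero on empty strings
        ratio = len(translation) / len(source)
        if ratio > budget:
            flagged.append((source, translation, ratio))
    flagged.sort(key=lambda row: row[2], reverse=True)
    return flagged


# Short UI strings are the classic danger spots.
ui_strings = [("Save", "Speichern"), ("Settings", "Einstellungen"), ("OK", "OK")]
for source, translation, ratio in expansion_report(ui_strings):
    print(f"{source!r} -> {translation!r}: {ratio:.2f}x")
```

Sorting by ratio puts the strings with the least room to spare at the top, which is where a re-fit pass should start.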
What is the right workflow to translate a PDF and keep the layout?
Translate the text away from the fixed layout, then re-fit the layout to the result — ideally working from the editable source, not the PDF. If you have the original editable file (the document the PDF was made from), translate that and re-export, because adjusting layout in the source is far easier than fighting a fixed PDF. If all you have is the PDF, extract the text cleanly, translate it, and then place it back into a layout you adjust for the new length — growing or shrinking boxes, re-balancing spacing, fixing table cells. Either way, budget a layout pass after translation: review every page for overflow, clipped text, awkward gaps, and broken tables, and fix them. The mistake is treating translation as a drop-in replacement; treating it as "translate, then re-lay-out" is what actually preserves a professional appearance.
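
Extracting the text cleanly usually means undoing the hard line breaks a PDF bakes into its text, so the translator works on whole sentences rather than line fragments. A rough Python sketch, assuming the extracted text separates paragraphs with blank lines (real extractors vary) and that a hyphen at the end of a line is a word split. That second rule will wrongly glue together genuine hyphenated compounds that happen to break at the hyphen, so review the output.

```python
import re


def clean_extracted_text(raw: str) -> list[str]:
    """Turn hard-wrapped extracted PDF text into a list of paragraphs for translation."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", raw):  # blank lines separate paragraphs
        # Rejoin words hyphenated across a line break: "transla-\ntion" -> "translation".
        # Caveat: this also merges real hyphenated compounds that break at the hyphen.
        text = re.sub(r"(\w)-\n(\w)", r"\1\2", block)
        # Collapse the remaining line breaks and runs of whitespace into single spaces.
        text = re.sub(r"\s+", " ", text).strip()
        if text:
            paragraphs.append(text)
    return paragraphs


raw = "PDF transla-\ntion changes the\nlength of text.\n\nSecond   para-\ngraph here.\n"
for paragraph in clean_extracted_text(raw):
    print(paragraph)
```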
How do I handle right-to-left languages like Arabic or Hebrew?
You need genuine right-to-left (and bidirectional) support, not just the translated words poured into a left-to-right layout. RTL languages reverse the reading direction, which means text aligns to the right, the reading order of mixed content flips, and many layout elements — columns, navigation, the visual flow of the page — should mirror to feel correct to a native reader. Simply typing Arabic into an LTR-designed box often produces wrong alignment and broken ordering, especially where numbers or Latin terms are mixed in (the bidi algorithm has to handle the direction changes). So for RTL targets, use tools and layouts that are bidi-aware, mirror the layout where appropriate, and have the result checked by someone who reads the language. RTL is the case where "keep the formatting" most clearly means "adapt the formatting," because a faithful mirror, not a copy, is what reads correctly.
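
Two small checks capture the bidi issues described here. Under the Unicode Bidirectional Algorithm (UAX #9, rules P2 and P3), a paragraph's base direction comes from its first strong character, and paragraphs that mix right-to-left letters with Latin words or digits are where ordering most often goes wrong. The Python sketch below uses only the standard library's `unicodedata.bidirectional`; it simplifies rule P2 by not skipping text inside directional isolates, and the helper names are hypothetical.

```python
import unicodedata


def base_direction(paragraph: str, default: str = "ltr") -> str:
    """Simplified UAX #9 rules P2-P3: the first strong character sets the direction.
    Unlike the full rule P2, this does not skip text inside directional isolates."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
    return default  # no strong character at all (e.g. only digits or punctuation)


def needs_bidi_review(paragraph: str) -> bool:
    """Flag paragraphs that mix right-to-left letters with Latin text or European digits."""
    classes = {unicodedata.bidirectional(ch) for ch in paragraph}
    has_rtl = bool(classes & {"R", "AL"})
    has_ltr_or_digits = bool(classes & {"L", "EN"})
    return has_rtl and has_ltr_or_digits


for text in ["Hello", "שלום", "السعر 250 دولار", "2025"]:
    print(repr(text), base_direction(text), needs_bidi_review(text))
```

Flagged paragraphs are the ones to hand to a native reader first, since numbers and Latin terms inside RTL text are exactly where a naive layout scrambles the order.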
Why do some translated characters show as boxes or question marks?
Because the font in use does not contain glyphs for the target script, so the viewer has nothing to draw and shows tofu boxes or question marks. A document set in a font that only covers Latin characters cannot display Arabic, Chinese, Cyrillic, Devanagari, or other scripts — the characters are present as text, but the font lacks the shapes. The fix is to use a font that actually supports the target language’s script and to embed that font in the PDF, so the glyphs both exist and travel with the file. This is the same font-embedding discipline that matters for cross-device viewing, applied to scripts: pick a font with the coverage you need, apply it to the translated text, and embed it. Test on a device that does not have the font installed to confirm the glyphs really are embedded and not just rendering from your local fonts.
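
Before exporting, you can list exactly which characters a font cannot draw. The Python sketch below takes the font's coverage as a set of code points. If you use the fontTools library, the keys of `TTFont(path).getBestCmap()` give you that set for a real font file; here a hypothetical Latin-only coverage (printable ASCII plus Latin-1) stands in so the example runs with the standard library alone.

```python
import unicodedata


def missing_glyphs(text: str, covered: set[int]) -> dict[str, str]:
    """Return {character: Unicode name} for each character the font cannot draw.
    `covered` is the set of code points the font's cmap maps to glyphs."""
    missing = {}
    for ch in text:
        if ch.isspace():
            continue  # skip spaces and line breaks; this check is about visible glyphs
        if ord(ch) not in covered:
            # Fall back to a U+XXXX label for characters without a Unicode name.
            missing[ch] = unicodedata.name(ch, f"U+{ord(ch):04X}")
    return missing


# Hypothetical coverage of a Latin-only font: printable ASCII plus Latin-1 Supplement.
LATIN_ONLY = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

print(missing_glyphs("Grüße", LATIN_ONLY))   # {} because these German letters are covered
print(missing_glyphs("Привет", LATIN_ONLY))  # every Cyrillic letter is reported missing
```

An empty result means the font can draw the text; it still has to be embedded in the PDF for that to hold on other devices.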
Is it safe to translate a confidential PDF online?
Be cautious, because the document content goes to whatever service does the translating. Many online PDF-translation tools upload your file to a third-party server, and machine-translation services process your text on their infrastructure. Both are concerns for confidential material, and sending personal or regulated data to a third party can also have legal or contractual implications. For sensitive documents, use client-side tools for the PDF steps (extraction, re-export) so the file itself is not uploaded (ScoutMyTool’s PDF tools work client-side), and choose a translation approach whose data handling you are comfortable with, or a professional human translator under confidentiality for high-stakes content. As always, confirm how each tool in the chain treats your data before sending a confidential document through it.


Get editable text to translate — in your browser

Convert your PDF to editable text or Word with ScoutMyTool so you can translate and re-fit the layout — client-side, so a confidential document never leaves your computer.
