How to make a multi-language PDF with embedded fonts

The fix for boxes and missing glyphs in multi-language PDFs is embedding fonts that cover your scripts. This guide covers why rendering breaks, how embedding and subsetting fix it, and how to verify the result.

By ScoutMyTool Editorial Team · Last updated: 2026-05-22

Introduction

The classic multi-language PDF failure (boxes, question marks, or wrong characters where the non-Latin text should be) is almost always a font problem: either the font lacks the glyphs, or it is not embedded, so the reader substitutes another font that lacks them. The fix is to use fonts that cover all your scripts and embed them, so the glyphs travel inside the file and render identically everywhere. This guide explains why multi-language PDFs break, what embedding and subsetting do, how to choose fonts with the right coverage, how to verify the result (don't assume: your machine may render the file correctly even when it is broken for others), and the separate matter of right-to-left and complex-script layout.

Symptoms and their causes

| Symptom | Cause |
| --- | --- |
| Boxes (□□□) or "tofu" | Font lacks the glyphs / not embedded |
| Wrong characters / question marks | Encoding or missing-glyph fallback |
| Renders for you, not for others | Font on your system but not embedded |
| RTL text out of order | Right-to-left handling, separate from fonts |

Step by step: fonts that render everywhere

  1. Choose fonts that cover your scripts. Ensure the font includes every script/character you use (broad-Unicode families, or one font per script).
  2. Embed (subset) the fonts. Store the glyphs in the PDF so it renders regardless of the reader's installed fonts; subset to keep size down for large (e.g. CJK) fonts.
  3. Author RTL/complex scripts properly. Use a tool that handles right-to-left direction and shaping: fonts supply glyphs, layout supplies order (see multilingual PDFs).
  4. Verify embedding. Confirm every font is embedded with Font Embedding Check; one non-embedded font breaks it elsewhere.
  5. Proof on another device. Open it somewhere that lacks your fonts and confirm the scripts display correctly (not boxes).
  6. Validate for archival if needed. Embedding is required for PDF/A, so validate (see PDF/A).
  7. Mind translation accuracy too. Fonts are rendering; the translation is separate (see bilingual PDFs and translation workflows).
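
As a starting point for step 1, it can help to list which scripts a document's text actually contains. The sketch below is a rough heuristic, not a definitive method: Python's standard library does not expose the Unicode Script property, so it assumes the first word of each character's Unicode name (for example CYRILLIC or CJK) is a good enough stand-in. The function name `scripts_in` and the sample string are illustrative only.

```python
import unicodedata

def scripts_in(text: str) -> set[str]:
    """Return a rough set of script labels for the letters in `text`.

    Heuristic (an assumption, not the Unicode Script property): the first
    word of each character's Unicode name stands in for its script.
    """
    labels = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, and punctuation
        name = unicodedata.name(ch, "")
        if name:
            labels.add(name.split()[0])
    return labels

sample = "Hello Привет مرحبا 你好"
print(sorted(scripts_in(sample)))
```

Each label in the result is a script that some embedded font must cover.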

FAQ

Why do multi-language PDFs show boxes or missing characters?
Almost always a font problem: the characters render as boxes (sometimes called "tofu"), question marks, or wrong glyphs because the font does not contain those characters, or because the font is not embedded in the PDF so the reader substitutes another font that lacks them. Different scripts (Cyrillic, Greek, Arabic, Hebrew, CJK, Indic, etc.) need fonts that actually include those glyphs. So a multi-language PDF breaks when it relies on a font missing the needed characters, or when it depends on a font being present on the reader's system (it might not be). The fix is using fonts that cover your scripts and embedding them, so the glyphs travel with the file.
What does embedding fonts do?
Embedding stores the actual font (or the needed subset of it) inside the PDF, so the document carries its own glyphs and renders identically on any device, regardless of what fonts the reader has installed. Without embedding, the PDF references a font by name and hopes the reader has it; if not, substitution kicks in and characters can render wrong or as boxes, which is especially likely for non-Latin scripts the reader's default fonts do not cover. So embedding is what makes a multi-language PDF reliable everywhere: the fonts are in the file. This is also why embedding is required for archival PDF/A. Embed the fonts that cover all your languages and the document looks the same for everyone.
What is font subsetting and should I use it?
Subsetting embeds only the glyphs the document actually uses, rather than the entire font. This keeps file size down, which matters because full fonts for large scripts (CJK fonts especially) are big. So subsetting gives you the reliability of embedding without bloating the file. The trade-off: a subsetted font cannot render characters you did not include, so if the document might be edited later to add new characters, full embedding is safer; for a final document, subsetting is ideal. For most finished multi-language PDFs, subset-embedding is the right choice: full glyph coverage for what is in the document, at a reasonable file size. Either way the glyphs you use are embedded and render everywhere.
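
To make the idea concrete, here is a minimal sketch of the input a subsetter works from: the unique characters a document uses. It is a simplification under a stated assumption: it counts code points, whereas a real subsetter works on glyphs, which can differ for ligatures and shaped scripts. The name `glyphs_needed` and the sample text are illustrative.

```python
def glyphs_needed(text: str) -> list[int]:
    """Sorted unique code points in `text`, ignoring whitespace."""
    return sorted({ord(ch) for ch in text if not ch.isspace()})

document_text = "你好，世界。你好，PDF。"
needed = glyphs_needed(document_text)

# Repeated characters are stored once, so the subset stays small.
print(len(document_text), "characters,", len(needed), "unique code points")
print(" ".join(f"U+{cp:04X}" for cp in needed))
```

A full CJK font carries many thousands of glyphs, while this text needs only the handful listed, which is why subsetting matters most for large scripts.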
How do I choose fonts that cover my languages?
Use fonts whose character sets actually include all the scripts and characters you need. Some fonts cover many scripts (large Unicode-coverage font families exist for exactly this), while many common fonts cover only Latin (plus a little). So check that your chosen font includes, say, the Cyrillic, Arabic, or CJK glyphs your document uses, or use a font family designed for broad coverage, or mix fonts per script (a Latin font plus a CJK font). The key is that for every character in the document, an embedded font supplies the glyph. Picking fonts with the right coverage up front avoids the missing-glyph problem at the source, and embedding then guarantees they render.
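
The rule that every character needs a font supplying its glyph can be sketched as a coverage check over an ordered list of fonts. Everything here is hypothetical: the font names, the coverage sets (written by hand as code point ranges), and the function `assign_fonts`. In practice the coverage would be read from each font's character map.

```python
def assign_fonts(text, fonts):
    """Map each character to the first font whose coverage includes it.

    `fonts` is an ordered list of (name, set_of_code_points) pairs.
    Returns (assignments, missing): a dict of char -> font name, and the
    sorted list of characters that no font covers.
    """
    assignments = {}
    missing = set()
    for ch in text:
        if ch.isspace() or ch in assignments:
            continue
        for name, coverage in fonts:
            if ord(ch) in coverage:
                assignments[ch] = name
                break
        else:
            missing.add(ch)  # no font in the list has this character
    return assignments, sorted(missing)

# Hypothetical coverage sets, for illustration only.
latin_font = ("ExampleLatin", set(range(0x20, 0x7F)))
cyrillic_font = ("ExampleCyrillic", set(range(0x0400, 0x0500)))

assigned, missing = assign_fonts("PDF файл 文件", [latin_font, cyrillic_font])
print(assigned)
print("No glyph for:", missing)
```

Any character reported as missing would render as a box, so a non-empty result means another font (here, one with CJK coverage) is needed.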
How do I verify the fonts are embedded?
Do not assume; check. Inspect the PDF's fonts to confirm every font is embedded (or subset-embedded), since a single non-embedded font is the thing that breaks rendering on another machine. A font-embedding check lists the fonts and their embedding status, flagging any that are not embedded. Also visually proof the document: open it and confirm the non-Latin text actually displays correctly (not boxes), ideally on a different device than you authored on, since your machine has the fonts installed and may render fine even when they are not embedded. So verify both ways, the embedding status and the visual result on another device, before trusting a multi-language PDF.
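
For readers curious what an embedding check looks at, the sketch below classifies a font from its PDF font dictionary. It assumes the dictionaries have already been parsed into plain Python dicts with keys written without the leading slash; that data model and the function `embedding_status` are illustrative, not any particular tool's API. The key names themselves (`FontDescriptor`, `FontFile`, `FontFile2`, `FontFile3`, `DescendantFonts`) and the six-letter subset tag come from the PDF format.

```python
import re

FONT_FILE_KEYS = ("FontFile", "FontFile2", "FontFile3")
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")  # e.g. "ABCDEF+ExampleSans"

def embedding_status(font: dict) -> str:
    """Classify a font dictionary as 'subset', 'embedded', or 'not embedded'."""
    subtype = font.get("Subtype")
    if subtype == "Type3":
        return "embedded"  # Type 3 glyphs are defined inside the PDF itself
    if subtype == "Type0":
        # Composite fonts keep their descriptor on the descendant CIDFont.
        descendants = font.get("DescendantFonts", [])
        if not descendants:
            return "not embedded"
        descriptor = descendants[0].get("FontDescriptor", {})
    else:
        descriptor = font.get("FontDescriptor", {})
    if not any(key in descriptor for key in FONT_FILE_KEYS):
        return "not embedded"  # referenced by name only
    if SUBSET_PREFIX.match(font.get("BaseFont", "")):
        return "subset"
    return "embedded"

# Hypothetical parsed font dictionaries.
fonts = [
    {"Subtype": "TrueType", "BaseFont": "ABCDEF+ExampleSans",
     "FontDescriptor": {"FontFile2": b"<font program bytes>"}},
    {"Subtype": "Type1", "BaseFont": "Helvetica"},
]
for f in fonts:
    print(f["BaseFont"], "->", embedding_status(f))
```

A single "not embedded" result is enough to make the file depend on the reader's installed fonts, so it still needs the visual proof on another device.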
What about right-to-left or complex scripts?
Fonts and text layout are separate issues. Embedding the right font ensures the glyphs exist; correct handling of right-to-left scripts (Arabic, Hebrew) and complex shaping (Arabic letter joining, Indic scripts) is about how the text is laid out, which your authoring tool must support. So a PDF can have the right embedded font yet still show RTL text in the wrong order if it was authored without proper RTL support. Embed the fonts AND author with a tool that handles the script's direction and shaping, then verify the result reads correctly. For complex or RTL scripts, getting both the glyphs (fonts) and the layout (shaping/direction) right is necessary.
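
Since layout problems are separate from font problems, it can be useful to know which parts of a text need RTL-aware authoring at all. The sketch below flags lines containing strongly right-to-left characters using the bidirectional class from Python's `unicodedata` module. It only detects where such text occurs; it does not reorder or shape anything. The helper names `has_rtl` and `rtl_lines` are illustrative.

```python
import unicodedata

# "R" covers Hebrew-type letters, "AL" covers Arabic-type letters.
RTL_CLASSES = {"R", "AL"}

def has_rtl(text: str) -> bool:
    """True if any character in `text` is strongly right-to-left."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)

def rtl_lines(text: str) -> list[int]:
    """1-based numbers of the lines that contain right-to-left characters."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if has_rtl(line)]

page = "Invoice\nשלום\nTotal: 42\nمرحبا"
print(rtl_lines(page))
```

The flagged lines are the ones to read carefully when proofing, because a correct embedded font will not fix text that was laid out in the wrong order.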
Is it safe to do this online?
For confidential documents, prefer a tool that processes files locally. ScoutMyTool checks font embedding and validates PDFs in your browser tab, so the document never leaves your machine. For anything sensitive, confirm the tool does not upload before using it, and always verify the scripts render correctly on another device.
