How to make a multi-language PDF with embedded fonts
By ScoutMyTool Editorial Team · Last updated: 2026-05-22
Introduction
The classic multi-language PDF failure (boxes, question marks, or wrong characters where the non-Latin text should be) is almost always a font problem: either the font lacks the glyphs, or it is not embedded and the reader substitutes one that lacks them. The fix is to use fonts that cover all your scripts and embed them, so the glyphs travel inside the file and render identically everywhere. This guide explains why multi-language PDFs break, what embedding and subsetting do, how to choose fonts with the right coverage, how to verify the result (don't assume: your machine may render the file correctly even when it is broken for others), and the separate matter of right-to-left and complex-script layout.
Symptoms and their causes
| Symptom | Cause |
|---|---|
| Boxes (□□□) or "tofu" | Font lacks the glyphs / not embedded |
| Wrong characters / question marks | Encoding or missing-glyph fallback |
| Renders for you, not for others | Font on your system but not embedded |
| RTL text out of order | Right-to-left handling, separate from fonts |
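The table separates glyph problems from layout problems. As a rough first pass, the sketch below inventories which scripts a piece of text uses and flags right-to-left characters, using only Python's standard `unicodedata` module. The function names (`script_hint`, `inventory`, `has_rtl`) are illustrative, and taking the first word of the Unicode character name is a simplifying assumption, not a real script classifier.

```python
import unicodedata
from collections import Counter


def script_hint(ch: str) -> str:
    """Rough script label: the first word of the Unicode character name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def inventory(text: str) -> Counter:
    """Count letters per rough script label, skipping spaces, digits, punctuation."""
    return Counter(script_hint(ch) for ch in text if ch.isalpha())


def has_rtl(text: str) -> bool:
    """True if any character has a right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


# "Hello", a Russian greeting, and a Hebrew greeting, written as escapes
sample = "Hello \u041f\u0440\u0438\u0432\u0435\u0442 \u05e9\u05dc\u05d5\u05dd"
print(inventory(sample))
print("needs RTL-aware layout:", has_rtl(sample))
```

Each label in the inventory is a script your fonts must cover. A `True` from `has_rtl` means the authoring tool also has to handle direction and shaping, which no font choice can fix.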
Step by step: fonts that render everywhere
- Choose fonts that cover your scripts. Ensure the font includes every script/character you use (broad-Unicode families, or one font per script).
- Embed (subset) the fonts. Store the glyphs in the PDF so it renders regardless of the reader's installed fonts; subset to keep size down for large (e.g. CJK) fonts.
- Author RTL/complex scripts properly. Use a tool that handles right-to-left direction and shaping (fonts supply glyphs, layout supplies order); see multilingual PDFs.
- Verify embedding. Confirm every font is embedded with Font Embedding Check; a single non-embedded font is enough to break rendering on another machine.
- Proof on another device. Open it somewhere that lacks your fonts and confirm the scripts display correctly (not boxes).
- Validate for archival if needed. Embedding is required for PDF/A, so validate the file (see PDF/A).
- Mind translation accuracy too. Fonts only govern rendering; translation quality is a separate matter (see bilingual PDFs and translation workflows).
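The first two steps (coverage, then subsetting) can be sketched in a few lines. This is a minimal illustration, not a font tool. The font names and coverage sets are hypothetical stand-ins for what you would read from each font's character map, and `assign_fonts` is an illustrative helper name.

```python
def needed_codepoints(text: str) -> set:
    """Code points the document actually uses (whitespace needs no visible glyph)."""
    return {ord(ch) for ch in text if not ch.isspace()}


def assign_fonts(text: str, fonts: dict) -> tuple:
    """Give each needed code point to the first font that covers it.

    fonts maps a font name to the set of code points it covers, in
    priority order. Returns (subset per font, code points no font covers).
    """
    subsets = {name: set() for name in fonts}
    missing = set()
    for cp in needed_codepoints(text):
        for name, coverage in fonts.items():
            if cp in coverage:
                subsets[name].add(cp)
                break
        else:
            missing.add(cp)  # would render as a box ("tofu")
    return subsets, missing


# Hypothetical coverage data: one Latin font, one Cyrillic font, no CJK font
fonts = {
    "LatinSans": set(range(0x20, 0x7F)),
    "CyrillicSans": set(range(0x400, 0x500)),
}
text = "Hi \u041f\u0440\u0438 \u4e2d"  # Latin, Cyrillic, and one CJK ideograph
subsets, missing = assign_fonts(text, fonts)
for name, cps in subsets.items():
    print(name, "subset size:", len(cps))
print("uncovered:", [hex(cp) for cp in sorted(missing)])
```

The per-font sets are what a subset would contain. Anything in `missing` means the font choice has to change before embedding can help.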
Related reading and tools
- Multilingual PDFs: scripts, fonts, and layout.
- Bilingual PDF (EN+ES): a two-language build.
- Translation workflows: accurate multilingual delivery.
- Validate PDF/A: embedding required for archival.
- PDF fonts: how fonts work in PDFs.
- Font Embedding Check: verify embedding in your browser.
- All ScoutMyTool PDF tools: the full toolkit.
FAQ
- Why do multi-language PDFs show boxes or missing characters?
- Almost always a font problem: the characters render as boxes (sometimes called "tofu"), question marks, or wrong glyphs because the font does not contain those characters, or because the font is not embedded in the PDF so the reader substitutes another font that lacks them. Different scripts (Cyrillic, Greek, Arabic, Hebrew, CJK, Indic, etc.) need fonts that actually include those glyphs. So a multi-language PDF breaks when it relies on a font missing the needed characters, or when it depends on a font being present on the reader's system (it might not be). The fix is using fonts that cover your scripts and embedding them, so the glyphs travel with the file.
- What does embedding fonts do?
- Embedding stores the actual font (or the needed subset of it) inside the PDF, so the document carries its own glyphs and renders identically on any device, regardless of what fonts the reader has installed. Without embedding, the PDF references a font by name and hopes the reader has it; if not, substitution kicks in and characters can render wrong or as boxes, which is especially likely for non-Latin scripts that the reader's default fonts do not cover. So embedding is what makes a multi-language PDF reliable everywhere: the fonts are in the file. This is also why embedding is required for archival PDF/A. Embed the fonts that cover all your languages and the document looks the same for everyone.
- What is font subsetting and should I use it?
- Subsetting embeds only the glyphs the document actually uses, rather than the entire font. That keeps file size down, which matters because full fonts for large scripts (CJK fonts especially) are big. So subsetting gives you the reliability of embedding without bloating the file. The trade-off: a subsetted font cannot render characters you did not include, so if the document might be edited later to add new characters, full embedding is safer; for a final document, subsetting is ideal. For most finished multi-language PDFs, subset-embedding is the right choice: full glyph coverage for what is in the document, at a reasonable file size. Either way the glyphs you use are embedded and render everywhere.
- How do I choose fonts that cover my languages?
- Use fonts whose character sets actually include all the scripts and characters you need. Some fonts cover many scripts (large Unicode-coverage font families exist for exactly this), while many common fonts cover only Latin (plus a little). So check that your chosen font includes, say, the Cyrillic, Arabic, or CJK glyphs your document uses, or use a font family designed for broad coverage, or mix fonts per script (a Latin font plus a CJK font). The key is that for every character in the document, an embedded font supplies the glyph. Picking fonts with the right coverage up front avoids the missing-glyph problem at the source, and embedding then guarantees they render.
- How do I verify the fonts are embedded?
- Do not assume; check. Inspect the PDF's fonts to confirm every font is embedded (or subset-embedded), since a single non-embedded font is the thing that breaks rendering on another machine. A font-embedding check lists the fonts and their embedding status, flagging any that are not embedded. Also visually proof the document: open it and confirm the non-Latin text actually displays correctly (not boxes), ideally on a different device than you authored on, since your machine has the fonts installed and may render fine even when they are not embedded. So verify both ways (the embedding status and the visual result on another device) before trusting a multi-language PDF.
- What about right-to-left or complex scripts?
- Fonts and text layout are separate issues. Embedding the right font ensures the glyphs exist; correct handling of right-to-left scripts (Arabic, Hebrew) and complex shaping (Arabic letter joining, Indic scripts) is about how the text is laid out, which your authoring tool must support. So a PDF can have the right embedded font yet still show RTL text in the wrong order if it was authored without proper RTL support. Embed the fonts AND author with a tool that handles the script's direction and shaping, then verify the result reads correctly. For complex or RTL scripts, getting both the glyphs (fonts) and the layout (shaping/direction) right is necessary.
- Is it safe to do this online?
- For confidential documents, prefer a tool that processes files locally. ScoutMyTool checks font embedding and validates PDFs in your browser tab, so the document never leaves your machine. For anything sensitive, confirm the tool does not upload before using it, and always verify the scripts render correctly on another device.
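For the verification question above, the logic of an embedding check can be sketched as follows. It assumes the PDF's font dictionaries have already been parsed into plain Python dicts by a PDF library of your choice, which is a hypothetical input shape. The key names (`/FontDescriptor`, `/FontFile`, `/FontFile2`, `/FontFile3`, `/DescendantFonts`) and the six-letter subset tag come from the PDF format itself. The helper names are illustrative, and this is not how any particular tool, including Font Embedding Check, is implemented.

```python
import re

# A font descriptor carries the embedded font program under one of these keys
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")
# Subset fonts are named with a six-uppercase-letter tag, e.g. ABCDEF+NotoSans
SUBSET_TAG = re.compile(r"^/?[A-Z]{6}\+")


def embedding_status(font: dict) -> str:
    """Classify one font dictionary as 'subset', 'embedded', or 'not embedded'."""
    if font.get("/Subtype") == "/Type3":
        return "embedded"  # Type 3 glyphs are procedures stored in the PDF itself
    # Composite (Type0) fonts keep the descriptor on their descendant font
    holders = font.get("/DescendantFonts") or [font]
    descriptor = holders[0].get("/FontDescriptor", {})
    if not any(key in descriptor for key in FONT_FILE_KEYS):
        return "not embedded"
    name = str(font.get("/BaseFont", ""))
    return "subset" if SUBSET_TAG.match(name) else "embedded"


def report(fonts: list) -> dict:
    """Map each font's name to its embedding status."""
    return {str(f.get("/BaseFont", "?")): embedding_status(f) for f in fonts}


# Hypothetical parsed fonts: a subset TrueType, a bare reference, a composite font
fonts = [
    {"/Subtype": "/TrueType", "/BaseFont": "/ABCDEF+NotoSans",
     "/FontDescriptor": {"/FontFile2": b"..."}},
    {"/Subtype": "/Type1", "/BaseFont": "/Helvetica"},
    {"/Subtype": "/Type0", "/BaseFont": "/NotoSansCJK",
     "/DescendantFonts": [{"/FontDescriptor": {"/FontFile3": b"..."}}]},
]
for name, status in report(fonts).items():
    print(name, "->", status)
```

Any font reported as "not embedded" is the kind that renders on the authoring machine and breaks elsewhere. A status check like this complements proofing on a second device; it does not replace it.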