Fix ReportLab Unicode Font Errors
ReportLab's built-in Helvetica and Times fonts are Type 1 core fonts. They use a single-byte encoding, so each can address at most 256 characters, roughly the Latin-1 and Windows-1252 repertoire. Characters the active encoding cannot represent (depending on the code path, symbols such as € and ™, letters outside the Western European alphabets, Arabic, Chinese) produce one of two failures: a UnicodeEncodeError that crashes the script, or a silent substitution that renders as a small empty box (the "tofu" glyph) in the PDF.
This page shows the root cause, a diagnostic snippet to reproduce it, and the fix: registering a TrueType font via pdfmetrics.registerFont(TTFont(...)) so ReportLab can encode every character the font has a glyph for.
This error is common in the pipelines described in Generating PDF Reports Dynamically and is especially frequent in invoice generation; see Create Dynamic Invoice PDFs Automatically for the broader invoice pattern.
Root cause
ReportLab maps Python strings to PDF glyph indices using an internal encoding table. For the built-in core fonts, that encoding is WinAnsiEncoding, a single-byte encoding modelled on Windows-1252, which extends Latin-1 with a handful of extra symbols. A single-byte encoding has only 256 slots (0x00–0xFF), so most of Unicode has no slot at all. The euro sign € is code point U+20AC; when a code path encodes the string as plain Latin-1, where byte values map directly to U+0000–U+00FF, Python raises:
UnicodeEncodeError: 'latin-1' codec can't encode character '€' in position 3: ordinal not in range(256)
When the string goes through a different internal path (e.g. inside Paragraph with an XML-escaped entity), the error is suppressed but the glyph is missing from the output stream — the PDF viewer substitutes an empty box.
Both symptoms share the same root: the active font has no glyph table entry for the requested code point.
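The 256-slot limit can be checked with Python's own codecs, independent of ReportLab. The sketch below uses a hypothetical helper, can_encode, to show which sample characters fit Latin-1 and which fit Windows-1252 (the closest standard codec to WinAnsiEncoding). It only illustrates the encoding limit and makes no claim about which internal path a given ReportLab version takes.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in text has a byte in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Sample characters chosen for illustration only.
SAMPLES = ["é", "©", "€", "™", "Ω", "日"]

if __name__ == "__main__":
    for ch in SAMPLES:
        print(
            f"U+{ord(ch):04X} {ch}: "
            f"latin-1={can_encode(ch, 'latin-1')} "
            f"cp1252={can_encode(ch, 'cp1252')}"
        )
```

Characters that fit neither codec, such as Ω or 日, can only be drawn with a font that actually carries a glyph for them.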
Minimal reproducible diagnostic
Run this to reproduce the error before applying the fix:
# pip install reportlab
from reportlab.pdfgen import canvas as rl_canvas
from reportlab.lib.pagesizes import A4
from pathlib import Path
OUT = Path("/tmp/rl_unicode_broken.pdf")
try:
    c = rl_canvas.Canvas(str(OUT), pagesize=A4)
    c.setFont("Helvetica", 12)  # built-in core font with a single-byte encoding
    c.drawString(50, 700, "Price: €49.99")  # U+20AC, above the Latin-1 range
    c.save()
    print("Script completed: check the PDF for boxes or missing glyphs")
except UnicodeEncodeError as exc:
    print(f"Reproduced: {exc}")
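Depending on the ReportLab version, this either raises UnicodeEncodeError or completes and leaves the result to visual inspection. A box in place of the euro sign confirms the root cause. If the euro sign renders correctly, that character is covered by the core font's encoding in your version; repeat the test with a character from your failing data, such as Ω or a CJK character.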
Fix: register a TrueType font
Install a TrueType font with broad Unicode coverage. No single font covers all of Unicode; DejaVu Sans is a widely available free option and ships with many Linux distributions.
# Linux (Debian/Ubuntu)
sudo apt install fonts-dejavu-core
# macOS
brew install font-dejavu
# Or download manually:
# https://github.com/dejavu-fonts/dejavu-fonts/releases
# Extract DejaVuSans.ttf from the archive.
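Because the install location differs by platform, it can help to probe a few likely paths instead of hard-coding one. The following is a minimal sketch: find_font and CANDIDATE_FONT_PATHS are hypothetical names, and the macOS and project-local paths are assumptions to adjust for your own systems.

```python
from pathlib import Path
from typing import Iterable, Optional

# Candidate locations are assumptions; adjust them to match your systems.
CANDIDATE_FONT_PATHS = [
    Path("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"),  # Debian/Ubuntu package
    Path.home() / "Library/Fonts/DejaVuSans.ttf",             # macOS per-user fonts
    Path("fonts/DejaVuSans.ttf"),                             # copy bundled with the project
]


def find_font(candidates: Iterable[Path] = CANDIDATE_FONT_PATHS) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None if none do."""
    for path in candidates:
        if path.is_file():
            return path
    return None
```

A caller would treat a None result as a configuration error and stop before any PDF is generated.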
Register and use the font before any canvas or Paragraph call:
# pip install reportlab
from pathlib import Path
from reportlab.pdfgen import canvas as rl_canvas
from reportlab.pdfbase import pdfmetrics # font registry
from reportlab.pdfbase.ttfonts import TTFont # TrueType loader
from reportlab.lib.pagesizes import A4
# --- Register once at module / script startup ---
FONT_PATH = Path("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf")
if not FONT_PATH.exists():
    raise FileNotFoundError(
        f"Font not found at {FONT_PATH}. Install fonts-dejavu-core or adjust FONT_PATH."
    )
pdfmetrics.registerFont(TTFont("DejaVuSans", str(FONT_PATH))) # name + path
OUT = Path("/tmp/rl_unicode_fixed.pdf")
try:
    c = rl_canvas.Canvas(str(OUT), pagesize=A4)
    c.setFont("DejaVuSans", 12)  # use the registered name, not "Helvetica"
    c.drawString(50, 750, "Price: €49.99")  # euro sign renders correctly
    c.drawString(50, 730, "Trademark: ReportLab™")  # trademark symbol
    c.drawString(50, 710, "Name: Ångström Müller")  # accented characters
    c.save()
    print(f"Written: {OUT}")
except Exception as exc:
    raise RuntimeError(f"PDF generation failed: {exc}") from exc
Key changes on each modified line:
- pdfmetrics.registerFont(TTFont("DejaVuSans", str(FONT_PATH))): loads the TTF glyph table into ReportLab's registry; do this once before any draw call.
- c.setFont("DejaVuSans", 12): switches the active font to the registered TrueType font; the string "DejaVuSans" must match the first argument to TTFont(...).
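Registering a TrueType font only helps for characters the font actually contains, so it can be worth checking the text against the font's coverage before drawing. This is a sketch under one stated assumption: that the face object created by TTFont exposes its character map as a dict attribute named charToGlyph, keyed by code point. The helper names are hypothetical.

```python
from typing import List, Set


def missing_chars(text: str, covered: Set[int]) -> List[str]:
    """Return the distinct non-space characters in text whose code points are not covered."""
    return sorted({ch for ch in text if not ch.isspace() and ord(ch) not in covered})


def covered_code_points(ttf_path: str) -> Set[int]:
    """Collect the code points a TTF file maps to glyphs.

    Assumption: the parsed face exposes its character map as a dict
    attribute named charToGlyph, keyed by Unicode code point.
    """
    from reportlab.pdfbase.ttfonts import TTFont  # lazy import: only needed here

    face = TTFont("CoverageProbe", ttf_path).face
    return set(face.charToGlyph)
```

For example, missing_chars("Ångström €", covered_code_points(str(FONT_PATH))) returns an empty list when the font covers every character, and the offending characters otherwise.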
Figure: core font vs. TrueType font glyph lookup.
Variant fix 1 — Platypus Paragraph styles
When using Paragraph flowables (the usual path in SimpleDocTemplate), set the font name on the ParagraphStyle, not on the canvas:
# pip install reportlab
from pathlib import Path
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.pagesizes import A4
FONT_PATH = Path("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf")
pdfmetrics.registerFont(TTFont("DejaVuSans", str(FONT_PATH)))
styles = getSampleStyleSheet()
# Override the font on a custom style — do NOT mutate the built-in "Normal" style
unicode_style = ParagraphStyle(
    "unicode_body",
    parent=styles["Normal"],
    fontName="DejaVuSans",  # changed from "Helvetica"
    fontSize=11,
    leading=15,
)
OUT = Path("/tmp/rl_para_unicode.pdf")
try:
    doc = SimpleDocTemplate(str(OUT), pagesize=A4)
    story = [
        Paragraph("Invoice total: €1,249.00", unicode_style),
        Paragraph("Trademark: Python™", unicode_style),
        Paragraph("Contact: Ångström Müller", unicode_style),
    ]
    doc.build(story)
    print(f"Written: {OUT}")
except Exception as exc:
    raise RuntimeError(f"Build failed: {exc}") from exc
Changed line: fontName="DejaVuSans" in the ParagraphStyle constructor — this propagates to all Paragraph flowables that use this style.
Variant fix 2 — CID fonts for CJK characters
For Chinese, Japanese, or Korean text, DejaVu Sans is not enough: it does not include CJK ideographs. Use a CID (composite) font instead:
# pip install reportlab
from pathlib import Path
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.cidfonts import UnicodeCIDFont
from reportlab.pdfgen import canvas as rl_canvas
# ReportLab ships CID support for CJK fonts; no external file needed.
# The font is referenced rather than embedded, so the PDF viewer must supply it.
pdfmetrics.registerFont(UnicodeCIDFont("HeiseiKakuGo-W5"))  # Japanese sans-serif
OUT = Path("/tmp/rl_cid_unicode.pdf")
try:
    c = rl_canvas.Canvas(str(OUT), pagesize=A4)
    c.setFont("HeiseiKakuGo-W5", 14)  # use the CID font name
    c.drawString(50, 700, "日本語テスト")  # Japanese text
    c.save()
    print(f"Written: {OUT}")
except Exception as exc:
    raise RuntimeError(f"CID font render failed: {exc}") from exc
Built-in CID fonts include HeiseiKakuGo-W5 (Japanese sans-serif), HeiseiMin-W3 (Japanese serif), HYSMyeongJo-Medium (Korean), and STSong-Light (Simplified Chinese).
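When one script serves several languages, the font names above can be kept in a small lookup so the choice is made in one place. This is an illustrative sketch: the language tags and the cid_font_for helper are hypothetical, and only the font names come from the list above.

```python
# Font names are the ones listed above; the language tags are illustrative keys.
CID_FONT_BY_LANGUAGE = {
    "ja": "HeiseiKakuGo-W5",     # Japanese sans-serif
    "ko": "HYSMyeongJo-Medium",  # Korean
    "zh-Hans": "STSong-Light",   # Simplified Chinese
}


def cid_font_for(language: str) -> str:
    """Return the CID font name for a language tag, or raise KeyError with a hint."""
    try:
        return CID_FONT_BY_LANGUAGE[language]
    except KeyError:
        known = ", ".join(sorted(CID_FONT_BY_LANGUAGE))
        raise KeyError(
            f"No CID font configured for {language!r}; known tags: {known}"
        ) from None
```

The returned name can be passed to UnicodeCIDFont(...) for registration and then to setFont(...), in the same way as the Japanese example.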
Variant fix 3 — encoding='utf-8' on data sources
If the crash happens before any PDF call, the issue is in data loading, not in ReportLab. The fix is in the CSV/file read:
# pip install pandas
import pandas as pd
from pathlib import Path
# Implicit: pandas defaults to UTF-8 here, but the built-in open() falls back to
# the locale encoding (often cp1252 on Windows), so the expectation is easy to lose
# df = pd.read_csv(Path("invoices.csv"))
# Explicit: states the expected encoding, so a mismatched file fails at load time
df = pd.read_csv(Path("invoices.csv"), encoding="utf-8")  # added encoding='utf-8'
If the source file was saved in Windows-1252 (common from Excel), use encoding="cp1252". If it is UTF-8 with a byte-order mark (BOM), use encoding="utf-8-sig".
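When the encoding of incoming files is not known in advance, one option is to try the likely encodings in order and record which one worked. The sketch below is an assumption-laden illustration rather than a general detector: read_text_with_fallback is a hypothetical helper, and the default order (UTF-8 with optional BOM first, then Windows-1252) reflects only the two cases named above.

```python
from pathlib import Path
from typing import Sequence, Tuple


def read_text_with_fallback(
    path: Path, encodings: Sequence[str] = ("utf-8-sig", "cp1252")
) -> Tuple[str, str]:
    """Return (text, encoding_used), trying each encoding in order."""
    raw = path.read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this encoding does not fit; try the next one
    raise ValueError(f"{path} could not be decoded with any of {list(encodings)}")
```

Order matters: cp1252 accepts almost any byte sequence, so it belongs last. A wrong guess there produces garbled characters rather than an error, which is why logging the encoding that was used is worthwhile.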
Verification
# pip install reportlab pypdf
from pathlib import Path
from pypdf import PdfReader
def verify_unicode_in_pdf(path: Path, expected_chars: list[str]) -> None:
    reader = PdfReader(str(path))
    text = " ".join(p.extract_text() or "" for p in reader.pages)
    for char in expected_chars:
        # Note: pypdf's text extraction may not round-trip all glyphs perfectly,
        # but the absence of UnicodeEncodeError during build is the primary signal.
        if char not in text:
            print(f" Warning: '{char}' not found in extracted text (may be a pypdf limitation)")
        else:
            print(f" OK: '{char}' present in extracted text")
    print(f"PDF built successfully: {path.name} ({len(reader.pages)} page(s))")
verify_unicode_in_pdf(Path("/tmp/rl_unicode_fixed.pdf"), ["€", "™", "Å"])
The primary verification signal is that pdfmetrics.registerFont and the final c.save() or doc.build() call complete without raising UnicodeEncodeError. Because a missing glyph can also fail silently, open the PDF in a viewer to confirm that the glyphs render.
Related
- Generating PDF Reports Dynamically: the parent guide where this error commonly appears; ReportLab canvas and Platypus patterns
- Create Dynamic Invoice PDFs Automatically: invoice pipeline where € and accented customer names trigger this error
- Fixing Encoding Errors in CSV Files: fix encoding issues in the data source before they reach the PDF renderer
- Automating PDF Extraction & Generation: full PDF automation overview