Fix ReportLab Unicode Font Errors

ReportLab's built-in Helvetica and Times fonts are Type 1 core fonts. They cover only the Latin-1 subset (roughly 256 code points). Any character outside that range (€, ™, accented letters beyond Latin-1 such as ő or ł, Arabic, Chinese) produces one of two failures: a UnicodeEncodeError that crashes the script, or a silent substitution that renders as a small empty box (the "tofu" glyph) in the PDF.

This page shows the root cause, a diagnostic snippet to reproduce it, and the fix: registering a TrueType font via pdfmetrics.registerFont(TTFont(...)) so ReportLab can encode any Unicode character the font contains.

This error is common in pipelines like those in Generating PDF Reports Dynamically and is especially frequent in invoice generation, where currency symbols appear in almost every document; see Create Dynamic Invoice PDFs Automatically for the broader invoice pattern.

Root cause

ReportLab maps Python strings to PDF glyph codes using the active font's encoding. For the built-in core fonts, that encoding is WinAnsiEncoding, a single-byte encoding modelled on Windows-1252 with at most 256 slots. The euro sign is at code point U+20AC, far above the 0x00–0xFF range that a single Latin-1 byte can represent. When a code path tries to encode the string as Latin-1, Python raises:

UnicodeEncodeError: 'latin-1' codec can't encode character '€' in position 3: ordinal not in range(256)

When the string goes through a different internal path (e.g. inside Paragraph with an XML-escaped entity), the error is suppressed but the glyph is missing from the output stream — the PDF viewer substitutes an empty box.

Both symptoms share the same root: the active font has no glyph table entry for the requested code point.
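
The encoding failure can be checked without ReportLab at all. The sketch below is a hypothetical pre-flight helper (the name chars_outside_latin1 is an assumption, not a ReportLab function) that lists the characters in a string that Python's latin-1 codec cannot encode. These are the characters the error message above complains about.

```python
def chars_outside_latin1(text: str) -> list[str]:
    """Return the distinct characters above U+00FF, in order of first appearance."""
    found: list[str] = []
    for ch in text:
        # latin-1 can only represent code points 0x00 through 0xFF
        if ord(ch) > 0xFF and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    sample = "Price: €49.99 ™"
    print("Not encodable as latin-1:", chars_outside_latin1(sample))
    try:
        sample.encode("latin-1")
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first offending character
        print(f"Encode failed at position {exc.start}: {exc.reason}")
```

Running a check like this over invoice data before building the PDF shows which strings need a TrueType font, independent of how a given ReportLab version reacts.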

Minimal reproducible diagnostic

Run this to reproduce the error before applying the fix:

# pip install reportlab
from reportlab.pdfgen import canvas as rl_canvas
from reportlab.lib.pagesizes import A4
from pathlib import Path

OUT = Path("/tmp/rl_unicode_broken.pdf")

try:
    c = rl_canvas.Canvas(str(OUT), pagesize=A4)
    c.setFont("Helvetica", 12)               # built-in core font — no Unicode
    c.drawString(50, 700, "Price: €49.99")   # U+20AC, above the Latin-1 range (0x00-0xFF)
    c.save()
    print("Script completed — check the PDF for boxes or missing glyphs")
except UnicodeEncodeError as exc:
    print(f"Reproduced: {exc}")

Depending on the ReportLab version and code path, this either raises UnicodeEncodeError or completes silently and writes a box in place of the character. Either outcome confirms the root cause.

Fix: register a TrueType font

Download a TTF with broad Unicode coverage. DejaVu Sans is the most portable free option and ships with most Linux distributions.

# Linux (Debian/Ubuntu)
sudo apt install fonts-dejavu-core
# macOS
brew install font-dejavu

# Or download manually:
# https://github.com/dejavu-fonts/dejavu-fonts/releases
# Extract DejaVuSans.ttf from the archive.

Register the font before any canvas or Paragraph call, then select it by its registered name:

# pip install reportlab
from pathlib import Path
from reportlab.pdfgen import canvas as rl_canvas
from reportlab.pdfbase import pdfmetrics          # font registry
from reportlab.pdfbase.ttfonts import TTFont       # TrueType loader
from reportlab.lib.pagesizes import A4

# --- Register once at module / script startup ---
FONT_PATH = Path("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf")
if not FONT_PATH.exists():
    raise FileNotFoundError(
        f"Font not found at {FONT_PATH}. Install fonts-dejavu-core or adjust FONT_PATH."
    )
pdfmetrics.registerFont(TTFont("DejaVuSans", str(FONT_PATH)))  # name + path

OUT = Path("/tmp/rl_unicode_fixed.pdf")

try:
    c = rl_canvas.Canvas(str(OUT), pagesize=A4)
    c.setFont("DejaVuSans", 12)                    # use the registered name, not "Helvetica"
    c.drawString(50, 750, "Price: €49.99")         # euro sign renders correctly
    c.drawString(50, 730, "Trademark: ReportLab™") # trademark symbol
    c.drawString(50, 710, "Name: Ångström Müller") # accented characters
    c.save()
    print(f"Written: {OUT}")
except Exception as exc:
    raise RuntimeError(f"PDF generation failed: {exc}") from exc

Key changes on each modified line:

  • pdfmetrics.registerFont(TTFont("DejaVuSans", str(FONT_PATH))) — loads the TTF glyph table into ReportLab's registry; do this once before any draw call.
  • c.setFont("DejaVuSans", 12) — switches the active font to the registered TrueType font; the string "DejaVuSans" must match the first argument to TTFont(...).
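
The hardcoded FONT_PATH above matches the Debian/Ubuntu package layout only. As a minimal sketch, the lookup below tries several locations and returns the first file it finds. The helper name find_font and every candidate path except the Debian one are assumptions for illustration; adjust them to match where the font actually lives on your system.

```python
from pathlib import Path
from typing import Iterable, Optional

# Candidate locations are assumptions; extend or reorder them for your setup.
CANDIDATE_FONT_PATHS = [
    Path("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"),  # Debian/Ubuntu package
    Path.home() / "Library" / "Fonts" / "DejaVuSans.ttf",     # macOS per-user fonts
    Path("/Library/Fonts/DejaVuSans.ttf"),                    # macOS system-wide fonts
    Path("C:/Windows/Fonts/DejaVuSans.ttf"),                  # Windows, manual install
    Path("fonts") / "DejaVuSans.ttf",                         # copy bundled with the project
]


def find_font(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists as a regular file, or None."""
    for path in candidates:
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    font_path = find_font(CANDIDATE_FONT_PATHS)
    if font_path is None:
        print("DejaVuSans.ttf not found; install it or extend CANDIDATE_FONT_PATHS.")
    else:
        print(f"Using font: {font_path}")
```

The returned path can be passed to TTFont("DejaVuSans", str(font_path)) in place of the fixed FONT_PATH. Bundling the TTF with the project is usually the most predictable option for deployments.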

Diagram: core font vs. TrueType font glyph lookup

[Diagram: the string "€49.99" takes two paths. With Helvetica (WinAnsiEncoding), € is not found and the result is UnicodeEncodeError. With TTFont (DejaVuSans) and its Unicode cmap, € maps to a glyph and renders correctly. Fix: pdfmetrics.registerFont(TTFont("DejaVuSans", path)), then c.setFont("DejaVuSans", 12).]

Variant fix 1 — Platypus Paragraph styles

When using Paragraph flowables (the usual path in SimpleDocTemplate), set the font name on the ParagraphStyle, not on the canvas:

# pip install reportlab
from pathlib import Path
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.pagesizes import A4

FONT_PATH = Path("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf")
pdfmetrics.registerFont(TTFont("DejaVuSans", str(FONT_PATH)))

styles = getSampleStyleSheet()
# Override the font on a custom style — do NOT mutate the built-in "Normal" style
unicode_style = ParagraphStyle(
    "unicode_body",
    parent=styles["Normal"],
    fontName="DejaVuSans",   # changed from "Helvetica"
    fontSize=11,
    leading=15,
)

OUT = Path("/tmp/rl_para_unicode.pdf")
try:
    doc = SimpleDocTemplate(str(OUT), pagesize=A4)
    story = [
        Paragraph("Invoice total: €1,249.00", unicode_style),
        Paragraph("Trademark: Python™", unicode_style),
        Paragraph("Contact: Ångström Müller", unicode_style),
    ]
    doc.build(story)
    print(f"Written: {OUT}")
except Exception as exc:
    raise RuntimeError(f"Build failed: {exc}") from exc

Changed line: fontName="DejaVuSans" in the ParagraphStyle constructor — this propagates to all Paragraph flowables that use this style.
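
Paragraph also accepts inline <font name="..."> markup, so a document can keep Helvetica for body text and switch fonts only where needed. The sketch below is one possible approach, not the only one: a hypothetical helper (wrap_non_latin1 is an assumed name) that escapes the text for Paragraph markup and wraps each run of characters above U+00FF in a font tag. It assumes a font named "DejaVuSans" has already been registered as shown above.

```python
import re
from xml.sax.saxutils import escape

# One or more consecutive characters above U+00FF (outside Latin-1).
_NON_LATIN1_RUN = re.compile(r"[^\x00-\xff]+")


def wrap_non_latin1(text: str, font_name: str = "DejaVuSans") -> str:
    """Escape text for Paragraph markup and wrap non-Latin-1 runs in a <font> tag."""
    escaped = escape(text)  # turns &, <, > into entities so they are not read as markup
    return _NON_LATIN1_RUN.sub(
        lambda match: f'<font name="{font_name}">{match.group(0)}</font>',
        escaped,
    )


if __name__ == "__main__":
    print(wrap_non_latin1("Invoice total: €1,249.00 (Python™ & friends)"))
```

The returned string can be passed as the first argument to Paragraph with any style. Setting fontName on the style, as in the example above, remains the simpler choice when the whole paragraph can use the TrueType font.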

Variant fix 2 — CID fonts for CJK characters

For Chinese, Japanese, or Korean text, DejaVu Sans is not enough: it does not include CJK ideographs. Use a CID (Composite) font instead:

# pip install reportlab
from pathlib import Path
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.cidfonts import UnicodeCIDFont
from reportlab.pdfgen import canvas as rl_canvas

# ReportLab ships CID support for CJK fonts; no external file needed
pdfmetrics.registerFont(UnicodeCIDFont("HeiseiKakuGo-W5"))  # Japanese sans-serif

OUT = Path("/tmp/rl_cid_unicode.pdf")
try:
    c = rl_canvas.Canvas(str(OUT), pagesize=A4)
    c.setFont("HeiseiKakuGo-W5", 14)          # use the CID font name
    c.drawString(50, 700, "日本語テスト")       # Japanese text
    c.save()
    print(f"Written: {OUT}")
except Exception as exc:
    raise RuntimeError(f"CID font render failed: {exc}") from exc

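Built-in CID fonts include HeiseiKakuGo-W5 (Japanese sans-serif), HeiseiMin-W3 (Japanese serif), HYSMyeongJo-Medium (Korean), and STSong-Light (Simplified Chinese). These fonts are referenced by name rather than embedded in the PDF, so the viewer must have a matching CJK font available.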
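
When a pipeline handles mixed-language data, the font name can be chosen per string. The following is a rough sketch under stated assumptions: pick_font_name is a hypothetical helper, the code point ranges cover only the most common blocks, and text made up solely of ideographs is ambiguous between Chinese and Japanese, so the fallback for that case is a parameter rather than a detection.

```python
def pick_font_name(
    text: str,
    default: str = "DejaVuSans",
    ideograph_font: str = "STSong-Light",  # assumption: treat bare ideographs as Chinese
) -> str:
    """Pick a registered font name based on the scripts found in text."""
    has_ideographs = False
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:    # Hiragana and Katakana: Japanese
            return "HeiseiKakuGo-W5"
        if 0xAC00 <= cp <= 0xD7AF:    # Hangul syllables: Korean
            return "HYSMyeongJo-Medium"
        if 0x4E00 <= cp <= 0x9FFF:    # CJK Unified Ideographs: language unknown
            has_ideographs = True
    return ideograph_font if has_ideographs else default


if __name__ == "__main__":
    for sample in ["Price: €49.99", "日本語テスト", "한국어", "中文"]:
        print(sample, "->", pick_font_name(sample))
```

Each name returned must still be registered first, with TTFont for DejaVuSans and UnicodeCIDFont for the CID fonts.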

Variant fix 3 — encoding='utf-8' on data sources

If the crash happens before any PDF call, the issue is in data loading, not in ReportLab. The fix is in the CSV/file read:

# pip install pandas
import pandas as pd
from pathlib import Path

# Implicit: pandas.read_csv already decodes as UTF-8 by default, but the assumption is hidden.
# (The built-in open() is riskier: without encoding= it uses the locale default, often cp1252 on Windows.)
# df = pd.read_csv(Path("invoices.csv"))

# Explicit: states the expected encoding, so a mismatched file is easy to diagnose
df = pd.read_csv(Path("invoices.csv"), encoding="utf-8")

If the source file was saved in Windows-1252 (common from Excel), use encoding="cp1252". If it is UTF-8 with a byte-order mark (BOM), use encoding="utf-8-sig".
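
When the encoding of incoming files is not known in advance, one option is to try the likely candidates in order. The sketch below uses only the standard library. The function name read_text_with_fallback and the default candidate order are assumptions. The order matters because cp1252 accepts almost any byte sequence, so it must come last.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8-sig", "cp1252")):
    """Decode a file with the first encoding that succeeds.

    Returns (text, encoding_used). "utf-8-sig" also reads plain UTF-8
    and strips a leading BOM if one is present.
    """
    data = Path(path).read_bytes()
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"Could not decode {path} with any of {encodings}")


if __name__ == "__main__":
    sample = Path("invoices.csv")  # file name taken from the example above
    if sample.exists():
        text, used = read_text_with_fallback(sample)
        print(f"Decoded {sample} as {used}: {len(text)} characters")
    else:
        print(f"{sample} not found; nothing to decode")
```

The detected encoding can then be passed on, for example as pd.read_csv(path, encoding=used).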

Verification

# pip install reportlab pypdf
from pathlib import Path
from pypdf import PdfReader

def verify_unicode_in_pdf(path: Path, expected_chars: list[str]) -> None:
    reader = PdfReader(str(path))
    text = " ".join(p.extract_text() or "" for p in reader.pages)
    for char in expected_chars:
        # Note: pypdf's text extraction may not round-trip all glyphs perfectly,
        # but the absence of UnicodeEncodeError during build is the primary signal.
        if char not in text:
            print(f"  Warning: '{char}' not found in extracted text (may be a pypdf limitation)")
        else:
            print(f"  OK: '{char}' present in extracted text")
    print(f"PDF built successfully: {path.name} ({len(reader.pages)} page(s))")

verify_unicode_in_pdf(Path("/tmp/rl_unicode_fixed.pdf"), ["€", "™", "Å"])

The primary verification signal is that c.save() or doc.build() completes without raising UnicodeEncodeError. Visual inspection in a PDF viewer then confirms that the glyphs render instead of appearing as boxes.
