Automating Word Document Creation

Generating Word documents by hand, or by recording macros, breaks the moment the data changes or the volume grows past a handful of files. Macro-recorded VBA is brittle across Office versions, COM automation is Windows-only, and copying content into templates by hand does not scale. python-docx lets you construct .docx files entirely from Python: no Word installation required, no COM automation, no platform restriction. The library writes the underlying OOXML parts directly and produces standard .docx packages that Word and other OOXML-compatible editors open normally.

This guide covers the core building blocks in depth (paragraphs, headings, runs, tables, page and section breaks, headers and footers), then assembles them into a complete command-line script. For template-driven generation (Jinja2 placeholders, conditional blocks, row loops over datasets), see Dynamic Mail Merge with Python. For embedding photographs, logos, or programmatically generated charts, see Inserting Images into Word Documents.

Prerequisites

# system deps: none beyond Python 3.9+
pip install python-docx pandas

Confirm the library version before starting:

# pip install python-docx
import docx
print(docx.__version__)  # e.g. 1.1.2

Create an output directory for generated files:

mkdir -p output/word

python-docx ships with a built-in default template (default.docx) bundled inside the package. Calling Document() with no arguments starts from that template, which provides sensible margin defaults and the standard Word style set (Normal, Heading 1–9, Title, Body Text, Table Grid, etc.).


1. Inspect Before You Build

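Before writing generation code against an existing corporate template, enumerate the styles it defines. Passing a style name the document does not define raises a KeyError, so one misspelled or missing name can abort a batch partway through. A style that Word shows in its gallery may also exist only as a latent style, with no definition in styles.xml that python-docx can apply.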

# pip install python-docx
from pathlib import Path
from docx import Document
from docx.enum.style import WD_STYLE_TYPE

REFERENCE = Path("reference.docx")

# No reference file? Start from the built-in default template instead.
doc = Document(REFERENCE) if REFERENCE.exists() else Document()

# Print every paragraph and character style the document defines
for style in doc.styles:
    if style.type in (WD_STYLE_TYPE.PARAGRAPH, WD_STYLE_TYPE.CHARACTER):
        kind = "paragraph" if style.type == WD_STYLE_TYPE.PARAGRAPH else "character"
        print(f"{kind:12s}  {style.name}")

Run this once and capture the output. The style names you see — exactly as printed, with correct capitalisation and spacing — are the strings you pass to doc.add_paragraph(style=...) or run.style = doc.styles[name] later.


2. Core Building Blocks

Step 1 — Create a Document and Configure Page Layout

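Document() opens an existing .docx (or, when called with no argument, the built-in default template) and returns the root Document object. Each subsequent add_* call appends a new element to the end of the document body, so content appears in the order you add it. Set page geometry via the default section before adding content: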

# pip install python-docx
from pathlib import Path
from docx import Document
from docx.shared import Inches, Pt
from docx.enum.section import WD_ORIENT

OUTPUT = Path("output/word/report.docx")
OUTPUT.parent.mkdir(parents=True, exist_ok=True)

doc = Document()

# Default section — first (and only) section in a new document
section = doc.sections[0]
section.page_width    = Inches(8.5)
section.page_height   = Inches(11)
section.orientation   = WD_ORIENT.PORTRAIT
section.top_margin    = Inches(1)
section.bottom_margin = Inches(1)
section.left_margin   = Inches(1.25)
section.right_margin  = Inches(1.25)

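python-docx stores every length as an integer count of EMUs (English Metric Units, 914,400 per inch) and converts it to the unit each XML attribute expects (twips for page size and margins) when writing. Inches() handles the conversion from inches for you; Pt(), Cm(), and Mm() from docx.shared work the same way.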

Step 2 — Headings

doc.add_heading("Quarterly Sales Report", level=0)   # Title style
doc.add_heading("Executive Summary",      level=1)   # Heading 1
doc.add_heading("Revenue by Region",      level=2)   # Heading 2
doc.add_heading("APAC Detail",            level=3)   # Heading 3

level=0 maps to Word's "Title" style. level=1 through level=9 map to Heading 1–9. Headings appear in the document's navigation pane and are picked up by automated table-of-contents generation.

Step 3 — Paragraphs and Runs

A paragraph holds one or more runs; a run is a contiguous span of text that shares identical character-level formatting. Understanding this distinction is essential — it is the same model Word uses internally.

# pip install python-docx
from docx.shared import Pt, RGBColor

# Simple paragraph: one call adds text and returns a Paragraph
plain = doc.add_paragraph("This paragraph uses the Normal style baseline.")

# Mixed-format paragraph: build it run by run
para  = doc.add_paragraph()                        # empty paragraph
run1  = para.add_run("Revenue grew ")
run2  = para.add_run("14 %")
run2.bold            = True
run2.font.color.rgb  = RGBColor(0x16, 0x65, 0x34)  # dark green

run3  = para.add_run(" year-over-year, driven by APAC expansion.")

# Paragraph-level spacing
para.paragraph_format.space_after  = Pt(6)
para.paragraph_format.space_before = Pt(0)

For detailed control of font family, size, and East-Asian character rendering, see Set Fonts and Styles with python-docx — that page covers the w:eastAsia oxml workaround and how to define reusable named character styles.

Step 4 — Tables

Tables are the most complex element. python-docx provides a clean API for creating tables and populating them row by row, and it exposes column and cell widths directly. Other formatting tasks, such as cell shading and repeating header rows, require dropping to the underlying XML.

# pip install python-docx
from docx.shared import Inches
from docx.oxml.ns import qn
from docx.oxml   import OxmlElement

headers = ["Region",    "Q3 Revenue", "Q4 Revenue", "YoY %"]
rows    = [
    ["APAC",            "$4.2 M",     "$5.1 M",     "+21 %"],
    ["EMEA",            "$3.8 M",     "$4.0 M",     "+5 %"],
    ["Americas",        "$6.1 M",     "$6.9 M",     "+13 %"],
    ["Middle East",     "$1.1 M",     "$1.4 M",     "+27 %"],
]

table = doc.add_table(rows=1, cols=len(headers))
table.style = "Table Grid"

# Set explicit column widths
for i, width in enumerate([Inches(1.6), Inches(1.4), Inches(1.4), Inches(1.0)]):
    for cell in table.columns[i].cells:
        cell.width = width

# Header row — bold text
hdr_cells = table.rows[0].cells
for i, text in enumerate(headers):
    hdr_cells[i].text = text
    hdr_cells[i].paragraphs[0].runs[0].bold = True

# Data rows
for row_data in rows:
    cells = table.add_row().cells
    for i, text in enumerate(row_data):
        cells[i].text = text

To repeat the header row on each printed page (important for long tables):

# Mark the header row as a repeating header via OOXML
tr    = table.rows[0]._tr
trPr  = tr.get_or_add_trPr()
tblHeader = OxmlElement("w:tblHeader")
trPr.append(tblHeader)

Step 5 — Page Breaks and Section Breaks

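doc.add_page_break() appends a paragraph containing a manual page break (<w:br w:type="page"/>). The content that follows starts on a new page but stays in the current section, sharing its margins, headers, footers, and orientation. A section break instead ends the current section and starts a new one with its own <w:sectPr>, so margins, orientation, and headers/footers can differ. A new section's headers and footers stay linked to the previous section until you unlink them.

# pip install python-docx
from docx.enum.section import WD_SECTION

# Manual page break: new page, same section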
doc.add_page_break()

# New section starting on the next page (landscape orientation)
landscape_section = doc.add_section(WD_SECTION.NEW_PAGE)
landscape_section.orientation  = WD_ORIENT.LANDSCAPE
landscape_section.page_width   = Inches(11)
landscape_section.page_height  = Inches(8.5)
landscape_section.left_margin  = Inches(0.75)
landscape_section.right_margin = Inches(0.75)

doc.add_heading("Wide Data Table", level=1)

WD_SECTION values: NEW_PAGE (most common), EVEN_PAGE, ODD_PAGE, CONTINUOUS (no page break), NEW_COLUMN.

Step 6 — Headers and Footers

# pip install python-docx
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

section  = doc.sections[0]
header   = section.header
footer   = section.footer

# Clear any default content
header.paragraphs[0].clear()
footer.paragraphs[0].clear()

header.paragraphs[0].add_run("Acme Corp — Confidential").bold = True

footer_para = footer.paragraphs[0]
footer_para.add_run("Generated by reporting pipeline  |  ")

# Add a PAGE field for automatic page numbering:
# fldChar(begin) + instrText("PAGE") + fldChar(end) inside one run
fldChar1 = OxmlElement("w:fldChar")
fldChar1.set(qn("w:fldCharType"), "begin")
instrText = OxmlElement("w:instrText")
instrText.text = "PAGE"
fldChar2 = OxmlElement("w:fldChar")
fldChar2.set(qn("w:fldCharType"), "end")

run = footer_para.add_run()
run._r.append(fldChar1)
run._r.append(instrText)
run._r.append(fldChar2)

Step 7 — Save

try:
    doc.save(OUTPUT)
    print(f"Saved: {OUTPUT}")
except PermissionError:
    print(f"File is open in Word — close it and retry: {OUTPUT}")
except OSError as exc:
    print(f"Save failed: {exc}")

3. Document() → Elements → .docx: How It Fits Together

The diagram traces the call sequence from Document() through the OOXML element tree to the saved .docx ZIP archive.

Figure: python-docx document build flow. Document() (new or from template) → add_heading() (level 0-9 → style name), add_paragraph() + add_run() (one run per formatted span), add_table() (rows, cols, style), add_section() (margins, orientation) → OOXML element tree (w:document › w:body › w:p paragraphs, w:tbl tables, w:sectPr) → doc.save() → report.docx ZIP (word/document.xml, word/styles.xml, word/media/*).

The .docx format is a ZIP archive. doc.save() serialises the in-memory element tree to word/document.xml, writes styles to word/styles.xml, and bundles embedded media into word/media/. Unzip any .docx with unzip -d out report.docx to inspect the raw XML when debugging.


4. Edge Cases and Variants

Variant A — Continuing a Paragraph's Formatting Across Multiple Runs

When you need a single paragraph with mixed inline styles, build it from multiple runs on the same Paragraph object. Calling doc.add_paragraph() repeatedly creates separate paragraphs, each with its own spacing and style.

# pip install python-docx
from pathlib import Path
from docx import Document
from docx.shared import Pt, RGBColor

OUTPUT = Path("output/word/mixed_runs.docx")
OUTPUT.parent.mkdir(parents=True, exist_ok=True)

doc  = Document()
para = doc.add_paragraph(style="Body Text")
para.paragraph_format.space_after = Pt(8)

run_a = para.add_run("Status: ")
run_b = para.add_run("APPROVED")
run_b.bold            = True
run_b.font.color.rgb  = RGBColor(0x16, 0x65, 0x34)  # dark green
run_c = para.add_run(" — routed to finance on 2026-06-15.")

try:
    doc.save(OUTPUT)
    print(f"Saved: {OUTPUT}")
except OSError as exc:
    print(f"Save failed: {exc}")

Variant B — Multi-Column Section Layout

Some reports need a two-column layout for dense reference material or side-by-side comparisons. The w:cols element is not exposed through the python-docx high-level API and must be set via oxml:

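# pip install python-docx
from pathlib import Path
from docx import Document
from docx.oxml.ns import qn
from docx.oxml   import OxmlElement

OUTPUT = Path("output/word/two_columns.docx")
OUTPUT.parent.mkdir(parents=True, exist_ok=True)

doc    = Document()
sectPr = doc.sections[0]._sectPr

# Word usually writes a w:cols element already; reuse it so the section
# never ends up with two. Create one only if it is missing.
cols = sectPr.find(qn("w:cols"))
if cols is None:
    cols = OxmlElement("w:cols")
    # Schema order: w:cols must precede these siblings when present
    successors = ("w:formProt", "w:vAlign", "w:noEndnote", "w:titlePg",
                  "w:textDirection", "w:bidi", "w:rtlGutter", "w:docGrid",
                  "w:printerSettings", "w:sectPrChange")
    nxt = next((el for el in (sectPr.find(qn(t)) for t in successors)
                if el is not None), None)
    if nxt is not None:
        nxt.addprevious(cols)
    else:
        sectPr.append(cols)

cols.set(qn("w:num"),   "2")      # 2 equal columns
cols.set(qn("w:space"), "720")    # 0.5-inch gutter (720 twips)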

doc.add_paragraph(
    "Text in a two-column section flows automatically from the bottom of the "
    "first column to the top of the second."
)

try:
    doc.save(OUTPUT)
    print(f"Saved: {OUTPUT}")
except OSError as exc:
    print(f"Save failed: {exc}")

Variant C — Merging Table Cells

Merged cells are common in report headers (spanning a label across multiple columns) and in structured forms:

# pip install python-docx
from pathlib import Path
from docx import Document

OUTPUT = Path("output/word/merged_cells.docx")
OUTPUT.parent.mkdir(parents=True, exist_ok=True)

doc   = Document()
table = doc.add_table(rows=3, cols=4)
table.style = "Table Grid"

# Merge the first two cells in row 0 (spans columns 0 and 1)
a = table.cell(0, 0)
b = table.cell(0, 1)
merged = a.merge(b)
merged.text = "Merged header"

# Standard cells for the remainder
for col_idx in range(2, 4):
    table.cell(0, col_idx).text = f"Col {col_idx}"

for row_idx in range(1, 3):
    for col_idx in range(4):
        table.cell(row_idx, col_idx).text = f"R{row_idx}C{col_idx}"

try:
    doc.save(OUTPUT)
    print(f"Saved: {OUTPUT}")
except OSError as exc:
    print(f"Save failed: {exc}")

5. Validation

After generating a file, parse it back and assert structural integrity before delivering it:

# pip install python-docx
from pathlib import Path
from docx import Document

def validate_docx(path: Path, min_paragraphs: int = 3, min_tables: int = 0) -> bool:
    """Return True if the file opens cleanly, has enough content, and no unrendered placeholders."""
    try:
        doc = Document(path)
    except Exception as exc:
        print(f"[FAIL] Cannot open {path.name}: {exc}")
        return False

    n_paragraphs = len(doc.paragraphs)
    n_tables     = len(doc.tables)

    if n_paragraphs < min_paragraphs:
        print(f"[WARN] {path.name}: only {n_paragraphs} paragraphs — possible truncation")
        return False

    if n_tables < min_tables:
        print(f"[WARN] {path.name}: expected {min_tables} table(s), found {n_tables}")
        return False

    # Scan for unrendered Jinja2 or placeholder syntax
    full_text = " ".join(p.text for p in doc.paragraphs)
    if "{{" in full_text or "}}" in full_text:
        print(f"[WARN] {path.name}: unrendered placeholder detected")
        return False

    print(f"[OK] {path.name}: {n_paragraphs} paragraphs, {n_tables} tables")
    return True

validate_docx(Path("output/word/report.docx"), min_paragraphs=5, min_tables=1)

For stricter validation, spot-check cell content by index:

from docx import Document
from pathlib import Path

doc  = Document(Path("output/word/report.docx"))
tbl  = doc.tables[0]
assert tbl.cell(0, 0).text == "Region", f"Unexpected header: {tbl.cell(0,0).text}"
assert tbl.cell(1, 0).text == "APAC",   f"Unexpected first row: {tbl.cell(1,0).text}"
print("Table spot-check passed.")

6. Performance and Scale

Memory: each Document() object holds the full XML tree in RAM. For batches of 1 000 or more documents, instantiate a fresh Document() per record and let it go out of scope after saving — do not accumulate open document objects in a list.

Parallelism: a Document object is not safe to share between threads, and because building the XML tree is CPU-bound Python work, threads usually gain little under the GIL anyway. Use concurrent.futures.ProcessPoolExecutor for generation, with each worker creating its own Document objects. Reserve ThreadPoolExecutor for I/O-bound bottlenecks such as writing to a slow network share or uploading completed files to object storage.

Atomic writes: write to a tempfile.NamedTemporaryFile in the same directory as the target, then move it over the target with os.replace() (Path.replace()). This avoids delivering a partial (unreadable) .docx if the process is interrupted mid-save:

# pip install python-docx
import tempfile
from pathlib import Path

from docx.document import Document as DocxDocument  # the class, for type hints


def save_atomic(doc: DocxDocument, final_path: Path) -> None:
    """Write doc to a temp file in the same directory, then replace final_path."""
    final_path.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile(
        suffix=".docx", dir=final_path.parent, delete=False
    ) as tmp:
        tmp_path = Path(tmp.name)
    try:
        doc.save(tmp_path)
        # os.replace semantics: atomic on POSIX, and it overwrites an
        # existing target on Windows too (unlike a plain rename)
        tmp_path.replace(final_path)
    except Exception:
        tmp_path.unlink(missing_ok=True)
        raise

Chunking large datasets: if the source data is a large CSV or database query, stream it in chunks rather than loading the full dataset into memory before generation begins:

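import pandas as pd
from pathlib import Path
from docx import Document

CHUNK_SIZE = 500
OUT_DIR    = Path("output/word")  # created in Prerequisites
for i, chunk in enumerate(pd.read_csv("data/records.csv", chunksize=CHUNK_SIZE)):
    doc = Document()  # fresh document per chunk; released after saving
    for record in chunk.itertuples(index=False):
        doc.add_paragraph(" | ".join(str(v) for v in record))  # placeholder layout
    doc.save(OUT_DIR / f"records_{i:04d}.docx")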

7. Troubleshooting

| Symptom | Root cause | Fix |
|---|---|---|
| PackageNotFoundError: Package not found at ... | A .doc file or a corrupt ZIP was passed to Document() | Convert .doc to .docx first (LibreOffice headless); check the extension and zipfile.is_zipfile() before opening |
| Font name set but has no effect in Word | run.font.name sets only the w:ascii and w:hAnsi font slots, so CJK text (which uses w:eastAsia) keeps the style's font; a theme font (w:asciiTheme) in the style may also take precedence | See Set Fonts and Styles with python-docx |
| Table row count is wrong at runtime | table.add_row() called outside the data loop or skipped on some iterations | Call table.add_row() exactly once per data record |
| InvalidSpanError on cell.merge() | The two corner cells do not define a rectangle, typically because one already belongs to a merged region that extends past the requested span (out-of-range indices raise IndexError from table.cell() instead) | Merge from the outer corners of the full rectangle; inspect cell._tc.tcPr for existing w:gridSpan or w:vMerge |
| PermissionError on doc.save() | The target file is open in Word, which locks it on Windows | Close the file in Word and retry, or save under a different filename |
| Section orientation not applied | page_width and page_height were not swapped after setting WD_ORIENT.LANDSCAPE | Swap them explicitly, e.g. page_width = Inches(11) and page_height = Inches(8.5) for US Letter |

8. Complete Working Script

# pip install python-docx pandas
"""
build_report.py — generate a sales report .docx from a CSV.

Usage:
    python build_report.py --data sales.csv --out output/report.docx --title "Q4 Report"
"""
import argparse
import sys
import tempfile
from pathlib import Path

import pandas as pd
from docx import Document
from docx.document import Document as DocxDocument  # the class, for type hints
from docx.oxml import OxmlElement
from docx.shared import Inches


def build_document(df: pd.DataFrame, title: str) -> DocxDocument:
    doc = Document()

    # Margins
    section = doc.sections[0]
    for attr in ("top_margin", "bottom_margin", "left_margin", "right_margin"):
        setattr(section, attr, Inches(1))

    doc.add_heading(title, level=0)
    doc.add_heading("Summary", level=1)

    intro     = doc.add_paragraph()
    intro.add_run("Records in dataset: ")
    count_run = intro.add_run(str(len(df)))
    count_run.bold = True

    if df.empty:
        doc.add_paragraph("No records to display.")
        return doc

    doc.add_heading("Data", level=1)

    table = doc.add_table(rows=1, cols=len(df.columns))
    table.style = "Table Grid"

    # Mark header row as repeating
    tr   = table.rows[0]._tr
    trPr = tr.get_or_add_trPr()
    trPr.append(OxmlElement("w:tblHeader"))

    hdr = table.rows[0].cells
    for i, col in enumerate(df.columns):
        hdr[i].text = str(col)
        hdr[i].paragraphs[0].runs[0].bold = True

    # itertuples() is faster than iterrows() and does not upcast row dtypes
    for record in df.itertuples(index=False):
        cells = table.add_row().cells
        for i, val in enumerate(record):
            cells[i].text = "" if pd.isna(val) else str(val)  # blank, not "nan"

    return doc


def main() -> None:
    parser = argparse.ArgumentParser(description="Generate a .docx sales report from CSV.")
    parser.add_argument("--data",  required=True, type=Path, help="Input CSV path")
    parser.add_argument("--out",   required=True, type=Path, help="Output .docx path")
    parser.add_argument("--title", default="Report",          help="Document title")
    args = parser.parse_args()

    try:
        df = pd.read_csv(args.data)
    except FileNotFoundError:
        sys.exit(f"[ERROR] CSV not found: {args.data}")
    except pd.errors.EmptyDataError:
        sys.exit("[ERROR] CSV is empty.")

    doc      = build_document(df, args.title)
    out_path = args.out
    out_path.parent.mkdir(parents=True, exist_ok=True)

    # Write beside the target, then swap it into place with os.replace semantics
    with tempfile.NamedTemporaryFile(
        suffix=".docx", dir=out_path.parent, delete=False
    ) as tmp:
        tmp_path = Path(tmp.name)

    try:
        doc.save(tmp_path)
        tmp_path.replace(out_path)
        print(f"[OK] Report saved: {out_path}")
    except Exception as exc:
        tmp_path.unlink(missing_ok=True)
        sys.exit(f"[ERROR] Save failed: {exc}")


if __name__ == "__main__":
    main()

Frequently Asked Questions

Does python-docx require Microsoft Word to be installed? No. It reads and writes the OOXML (.docx) ZIP format entirely in Python — no COM automation, no Office installation, no Windows dependency. The library works identically on Linux, macOS, and Windows.

How do I add page numbers to the footer? Page numbers in Word are implemented as a PAGE field using the w:fldChar / w:instrText XML pattern; there is no high-level python-docx method for it. Insert the three-element XML sequence (fldChar begin, instrText PAGE, fldChar end) into a run in section.footer.paragraphs[0] as shown in the Headers and Footers step above.

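Can I copy a styled paragraph from one document into another? Not via the high-level API. Clone the underlying _p XML element and insert it before the body's closing w:sectPr: from copy import deepcopy; target_doc.element.body.insert_element_before(deepcopy(src_para._p), "w:sectPr"). A plain body.append() would place the paragraph after w:sectPr, which is not valid OOXML. The copy carries the element structure only, not style definitions: any styles it references must already exist in the target document's styles.xml, and images or hyperlinks, which point to relationship IDs in the source package, will not resolve.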

How do I set font family, size, and color on runs? Use run.font.name, run.font.size = Pt(12), and run.font.color.rgb = RGBColor(r, g, b). East-Asian font names require an extra w:eastAsia XML attribute not exposed by the API — the oxml workaround and named style approach are covered in detail in Set Fonts and Styles with python-docx.

What is the difference between python-docx and docxtpl? python-docx builds documents element by element in code; docxtpl renders a Jinja2-annotated .docx template against a data dictionary. Use python-docx when the document structure itself is data-driven (variable number of sections, tables whose shape changes per record); use docxtpl when the layout is fixed and only the values change. See Dynamic Mail Merge with Python.

How do I generate a table of contents? python-docx cannot build a populated TOC, because the entries and page numbers are computed by Word's field engine. You can insert a TOC field (w:fldChar + w:instrText with the field code TOC \o "1-3") that Word fills in when the user updates fields (F9, or Update Field from the context menu). Word prompts to update fields on open only if the document's settings request it (w:updateFields in settings.xml).


