Automating Excel Report Generation

Generating a formatted Excel report by hand once is manageable. Generating one every Monday, for every region, with conditional highlights, charts, and a summary row, is not. Python automates the entire sequence: pull data, aggregate, write styled cells, embed charts, and save — with no manual post-processing.

This guide covers the full workflow: library selection, pandas ExcelWriter integration, openpyxl and xlsxwriter styling, multiple sheets, summary rows, conditional formatting, and embedded charts. The techniques here complement the broader Python for Excel & CSV Data Processing guide.

Prerequisites

Python 3.9+, a virtual environment, and the following packages:

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install pandas openpyxl xlsxwriter

requirements.txt for a locked environment:

pandas==2.2.2
openpyxl==3.1.2
xlsxwriter==3.2.0

Create a sample CSV to follow along:

mkdir -p data output
python - <<'EOF'
import pandas as pd, numpy as np
rng = np.random.default_rng(42)
regions = ["North", "South", "East", "West"]
products = ["Widget A", "Widget B", "Gadget X"]
rows = []
for month in range(1, 7):
    for _ in range(40):
        rows.append({
            "month": f"2026-{month:02d}",
            "region": rng.choice(regions),
            "product": rng.choice(products),
            "units": int(rng.integers(10, 200)),
            "revenue": round(float(rng.uniform(500, 8000)), 2),
        })
pd.DataFrame(rows).to_csv("data/sales.csv", index=False)
print("data/sales.csv written")
EOF

Inspect the Input Before Writing

Always validate the incoming DataFrame before you write anything to Excel. Silent type coercions produce wrong number formats.

# pip install pandas
from pathlib import Path
import pandas as pd

INPUT = Path("data/sales.csv")

try:
    df = pd.read_csv(INPUT)
    print(df.dtypes)         # confirm numeric cols are not object
    print(df.isnull().sum()) # find gaps before aggregation
    print(df.head(3))
except FileNotFoundError:
    raise SystemExit(f"Missing input file: {INPUT}")

Expected: revenue and units as float64/int64. If they show as object, the CSV contains currency symbols or commas — strip them with pd.to_numeric(df["revenue"].str.replace(r"[$,]", "", regex=True), errors="coerce") before proceeding.

Data → Styler → Workbook → Sheets: The Pipeline

The diagram below shows how raw data flows through the four-stage pipeline this guide implements.

Excel report generation pipeline Shows four stages: CSV/DB Input feeds pandas Aggregation, which feeds ExcelWriter + Styler, which produces a Multi-Sheet Workbook with Summary, Detail, and Chart sheets. CSV / DB raw sales data mixed types pandas groupby / pivot type coercion ExcelWriter + Styler formats / charts Workbook Summary sheet Detail sheet Chart sheet ① Ingest ② Transform ③ Style ④ Output

Step 1 — Aggregate the Data

# pip install pandas
from pathlib import Path
import pandas as pd

INPUT = Path("data/sales.csv")

try:
    df = pd.read_csv(INPUT, parse_dates=["month"])
except FileNotFoundError:
    raise SystemExit(f"Missing: {INPUT}")

df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")
df["units"] = pd.to_numeric(df["units"], errors="coerce")
df.dropna(subset=["revenue", "units"], inplace=True)

# Regional summary
by_region = (
    df.groupby("region", as_index=False)
    .agg(total_revenue=("revenue", "sum"),
         total_units=("units", "sum"),
         transactions=("revenue", "count"))
)
by_region["avg_order"] = by_region["total_revenue"] / by_region["transactions"]

# Monthly trend
by_month = (
    df.groupby("month", as_index=False)
    .agg(total_revenue=("revenue", "sum"))
    .sort_values("month")
)

print(by_region)
print(by_month)

Step 2 — Write Multiple Sheets with Styled Headers

pandas.ExcelWriter delegates to xlsxwriter or openpyxl as the engine. Use xlsxwriter when creating a new file with heavy formatting; use openpyxl when you need to modify an existing file (see Reading Excel Files with Python for that path).

# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd

OUTPUT = Path("output/sales_report.xlsx")
OUTPUT.parent.mkdir(parents=True, exist_ok=True)

# Re-use by_region and by_month from Step 1
try:
    with pd.ExcelWriter(OUTPUT, engine="xlsxwriter") as writer:
        by_region.to_excel(writer, sheet_name="Summary", index=False, startrow=1)
        by_month.to_excel(writer, sheet_name="Monthly Trend", index=False, startrow=1)

        wb = writer.book

        # --- define reusable formats ---
        hdr_fmt = wb.add_format({
            "bold": True, "bg_color": "#2563eb", "font_color": "#ffffff",
            "border": 1, "font_size": 12,
        })
        currency_fmt = wb.add_format({"num_format": "#,##0.00", "border": 1})
        int_fmt      = wb.add_format({"num_format": "#,##0",    "border": 1})
        text_fmt     = wb.add_format({"border": 1})

        for sheet_name, frame in [("Summary", by_region), ("Monthly Trend", by_month)]:
            ws = writer.sheets[sheet_name]
            # Write styled header row (row 0, overwriting the blank startrow)
            for col_idx, col_name in enumerate(frame.columns):
                ws.write(1, col_idx, col_name, hdr_fmt)
            ws.set_row(1, 20)

        # Column widths for Summary sheet
        ws_sum = writer.sheets["Summary"]
        ws_sum.set_column("A:A", 14, text_fmt)
        ws_sum.set_column("B:D", 16, currency_fmt)

except PermissionError:
    raise SystemExit(f"Close {OUTPUT} in Excel before running this script")
except Exception as exc:
    raise SystemExit(f"Export failed: {exc}")

print(f"Written: {OUTPUT}")

Step 3 — Number Formats and Summary Rows

A summary row at the bottom of the data gives stakeholders a quick total without opening a formula bar.

# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd

OUTPUT = Path("output/sales_report.xlsx")

try:
    with pd.ExcelWriter(OUTPUT, engine="xlsxwriter") as writer:
        by_region.to_excel(writer, sheet_name="Summary", index=False, startrow=1)
        wb = writer.book
        ws = writer.sheets["Summary"]

        currency_fmt = wb.add_format({"num_format": "#,##0.00", "bold": False})
        total_fmt    = wb.add_format({"num_format": "#,##0.00", "bold": True,
                                      "top": 2, "bg_color": "#f1f5f9"})
        label_fmt    = wb.add_format({"bold": True, "bg_color": "#f1f5f9"})

        n_rows = len(by_region)
        # data starts at Excel row 3 (0-indexed row 2) because startrow=1 + 1 header
        data_start = 3
        data_end   = data_start + n_rows - 1

        # Apply number format to revenue and avg_order columns
        for r in range(data_start, data_end + 1):
            ws.write(r - 1, 1, by_region.iloc[r - data_start]["total_revenue"], currency_fmt)
            ws.write(r - 1, 3, by_region.iloc[r - data_start]["avg_order"],     currency_fmt)

        # Summary / totals row
        summary_row = data_end  # 0-indexed
        ws.write(summary_row, 0, "TOTAL", label_fmt)
        ws.write_formula(summary_row, 1,
                         f"=SUM(B{data_start}:B{data_end})", total_fmt)
        ws.write_formula(summary_row, 2,
                         f"=SUM(C{data_start}:C{data_end})", total_fmt)
        ws.write(summary_row, 3, "", total_fmt)

except Exception as exc:
    raise SystemExit(f"Export failed: {exc}")

print(f"Written with summary row: {OUTPUT}")

Step 4 — Conditional Formatting

Highlight underperforming regions (revenue below a threshold) in red and top performers in green — without touching a single cell manually.

# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd

OUTPUT = Path("output/sales_report_cf.xlsx")

try:
    with pd.ExcelWriter(OUTPUT, engine="xlsxwriter") as writer:
        by_region.to_excel(writer, sheet_name="Summary", index=False)
        wb = writer.book
        ws = writer.sheets["Summary"]

        green_fmt = wb.add_format({"bg_color": "#C6EFCE", "font_color": "#006100"})
        red_fmt   = wb.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})

        n = len(by_region)
        rev_range = f"B2:B{n + 1}"   # revenue column, skip header

        # Values above 10 000 → green
        ws.conditional_format(rev_range, {
            "type": "cell", "criteria": ">=", "value": 10_000, "format": green_fmt
        })
        # Values below 5 000 → red
        ws.conditional_format(rev_range, {
            "type": "cell", "criteria": "<", "value": 5_000, "format": red_fmt
        })

        # 3-color scale on units column
        ws.conditional_format(f"C2:C{n + 1}", {
            "type": "3_color_scale",
            "min_color": "#FFC7CE",
            "mid_color": "#FFEB9C",
            "max_color": "#C6EFCE",
        })

except Exception as exc:
    raise SystemExit(f"Export failed: {exc}")

print(f"Written: {OUTPUT}")

Step 5 — Embed a Bar Chart

Charts embedded in the workbook are generated from the data already on the sheet — no separate image file required.

# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd

OUTPUT = Path("output/sales_report_chart.xlsx")

try:
    with pd.ExcelWriter(OUTPUT, engine="xlsxwriter") as writer:
        by_region.to_excel(writer, sheet_name="Summary", index=False)
        wb = writer.book
        ws = writer.sheets["Summary"]

        n = len(by_region)

        chart = wb.add_chart({"type": "column"})
        chart.add_series({
            "name":       "Total Revenue",
            "categories": ["Summary", 1, 0, n, 0],   # region labels (col A)
            "values":     ["Summary", 1, 1, n, 1],   # revenue values (col B)
            "fill":       {"color": "#2563eb"},
            "gap":        50,
        })
        chart.set_title({"name": "Revenue by Region"})
        chart.set_x_axis({"name": "Region"})
        chart.set_y_axis({"name": "Revenue ($)", "num_format": "#,##0"})
        chart.set_legend({"none": True})
        chart.set_size({"width": 480, "height": 288})

        ws.insert_chart("F2", chart)

except Exception as exc:
    raise SystemExit(f"Export failed: {exc}")

print(f"Written: {OUTPUT}")

Edge Cases and Variants

Modifying an Existing Template with openpyxl

When you need to inject data into a pre-branded template, openpyxl is the right tool. Note that opening in read_only mode prevents writes — see Fix openpyxl Read-Only Mode Error if you hit that error.

# pip install openpyxl pandas
from pathlib import Path
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.styles import Font, PatternFill, numbers

TEMPLATE = Path("templates/report_template.xlsx")
OUTPUT   = Path("output/report_filled.xlsx")

try:
    wb = load_workbook(TEMPLATE)           # NOT read_only — we need to write
    ws = wb["Data"]

    # Clear old data below header
    for row in ws.iter_rows(min_row=2, max_row=ws.max_row):
        for cell in row:
            cell.value = None

    # Write new data
    for r_idx, row in enumerate(dataframe_to_rows(by_region, index=False, header=False), start=2):
        for c_idx, value in enumerate(row, start=1):
            ws.cell(row=r_idx, column=c_idx, value=value)

    # Apply currency format to column B
    for cell in ws["B"][1:]:
        cell.number_format = numbers.FORMAT_NUMBER_COMMA_SEPARATED2

    wb.save(OUTPUT)
except FileNotFoundError as exc:
    raise SystemExit(f"Template not found: {exc}")

print(f"Template filled: {OUTPUT}")

Writing Excel Formulas and Charts Directly

For advanced formula patterns (array formulas, named ranges, dynamic arrays), see the Writing Excel Formulas and Charts with openpyxl guide.

Memory-Efficient Writes with use_constant_memory

For DataFrames with more than 500 000 rows, xlsxwriter's constant-memory mode streams rows without buffering the whole sheet in RAM. The trade-off: you cannot go back and set cell formats after writing.

# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd

OUTPUT = Path("output/large_report.xlsx")

options = {"constant_memory": True}

try:
    with pd.ExcelWriter(OUTPUT, engine="xlsxwriter",
                        engine_kwargs={"options": options}) as writer:
        # Write in chunks if the DataFrame is chunked from a database
        big_df.to_excel(writer, sheet_name="Data", index=False)
except Exception as exc:
    raise SystemExit(f"Export failed: {exc}")

Validation

Assert the output is structurally correct before treating the run as successful:

# pip install openpyxl
from pathlib import Path
from openpyxl import load_workbook

OUTPUT = Path("output/sales_report.xlsx")

wb = load_workbook(OUTPUT, read_only=True)
assert "Summary" in wb.sheetnames, "Summary sheet missing"
ws = wb["Summary"]
rows = list(ws.iter_rows(values_only=True))
assert len(rows) >= 2, "No data rows written"
assert rows[0][1] is not None, "Column B header missing"
wb.close()
print("Validation passed")

Performance and Scale

  • Row limit: Excel supports ~1 048 576 rows per sheet. For datasets beyond 100 000 rows consider writing to CSV and referencing it via Power Query instead of embedding all data in the workbook.
  • Multiple sheets: Each to_excel(..., sheet_name=...) call within the same ExcelWriter context adds a sheet without re-opening the file.
  • Formula recalculation: xlsxwriter writes formulas as strings; Excel recalculates on open. If you need values pre-calculated, write the computed value with xlsxwriter's write() instead of write_formula(), or use openpyxl with data_only=True after the file has been opened and saved by Excel once.
  • Parquet intermediary: If the ETL step is expensive, save the aggregated DataFrames to Parquet and load from there for the styling step — decouples compute from presentation.

Troubleshooting

ErrorRoot causeFix
zipfile.BadZipFileOutput file was left open by Excel when the script ranClose the file in Excel first, or write to a temp path then rename
KeyError: 'Summary' when accessing writer.sheetsSheet name not yet created — to_excel must be called before accessing writer.sheets[name]Confirm to_excel(..., sheet_name="Summary") executes before the writer.sheets lookup
IllegalCharacterErrorCell value contains control characters (e.g. \x00) from a databaseStrip with df.replace(r"[\x00-\x1f]", "", regex=True) before writing
PermissionError on saveTarget .xlsx is open in another applicationWrite to a new timestamped filename or close the file
AttributeError: 'Worksheet' object has no attribute 'add_format'Mixed openpyxl/xlsxwriter API — add_format is xlsxwriter-onlyUse openpyxl's PatternFill/Font objects when the engine is openpyxl

Complete Working Script

#!/usr/bin/env python3
# pip install pandas xlsxwriter openpyxl
"""
generate_sales_report.py — produce a formatted multi-sheet Excel report.
Usage: python generate_sales_report.py --input data/sales.csv --output output/report.xlsx
"""
import argparse
from pathlib import Path
import pandas as pd


def load_and_aggregate(csv_path: Path):
    df = pd.read_csv(csv_path)
    df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")
    df["units"]   = pd.to_numeric(df["units"],   errors="coerce")
    df.dropna(subset=["revenue", "units"], inplace=True)

    by_region = (
        df.groupby("region", as_index=False)
        .agg(total_revenue=("revenue", "sum"),
             total_units=("units", "sum"),
             transactions=("revenue", "count"))
    )
    by_region["avg_order"] = (
        by_region["total_revenue"] / by_region["transactions"]
    ).round(2)

    by_month = (
        df.groupby("month", as_index=False)
        .agg(total_revenue=("revenue", "sum"))
        .sort_values("month")
    )
    return by_region, by_month


def write_report(by_region: "pd.DataFrame", by_month: "pd.DataFrame",
                 output: Path) -> None:
    output.parent.mkdir(parents=True, exist_ok=True)

    with pd.ExcelWriter(output, engine="xlsxwriter") as writer:
        by_region.to_excel(writer, sheet_name="Summary",       index=False)
        by_month.to_excel( writer, sheet_name="Monthly Trend", index=False)

        wb = writer.book

        hdr_fmt      = wb.add_format({"bold": True, "bg_color": "#2563eb",
                                      "font_color": "#ffffff", "font_size": 12})
        currency_fmt = wb.add_format({"num_format": "#,##0.00"})
        green_fmt    = wb.add_format({"bg_color": "#C6EFCE", "font_color": "#006100"})
        red_fmt      = wb.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})

        # --- Summary sheet ---
        ws = writer.sheets["Summary"]
        for c, name in enumerate(by_region.columns):
            ws.write(0, c, name, hdr_fmt)
        n = len(by_region)
        ws.set_column("A:A", 14)
        ws.set_column("B:D", 16, currency_fmt)
        ws.conditional_format(f"B2:B{n+1}", {
            "type": "cell", "criteria": ">=", "value": 10_000, "format": green_fmt
        })
        ws.conditional_format(f"B2:B{n+1}", {
            "type": "cell", "criteria": "<",  "value": 5_000,  "format": red_fmt
        })
        # Total row
        ws.write(n + 1, 0, "TOTAL",
                 wb.add_format({"bold": True, "bg_color": "#f1f5f9"}))
        ws.write_formula(n + 1, 1, f"=SUM(B2:B{n+1})",
                         wb.add_format({"num_format": "#,##0.00", "bold": True,
                                        "bg_color": "#f1f5f9"}))

        # Bar chart
        chart = wb.add_chart({"type": "column"})
        chart.add_series({
            "name":       "Revenue",
            "categories": ["Summary", 1, 0, n, 0],
            "values":     ["Summary", 1, 1, n, 1],
            "fill":       {"color": "#2563eb"},
        })
        chart.set_title({"name": "Revenue by Region"})
        chart.set_y_axis({"num_format": "#,##0"})
        chart.set_legend({"none": True})
        chart.set_size({"width": 400, "height": 240})
        ws.insert_chart("F2", chart)

        # --- Monthly Trend sheet ---
        ws2 = writer.sheets["Monthly Trend"]
        for c, name in enumerate(by_month.columns):
            ws2.write(0, c, name, hdr_fmt)
        ws2.set_column("A:B", 18, currency_fmt)


def main():
    parser = argparse.ArgumentParser(description="Generate formatted Excel sales report")
    parser.add_argument("--input",  default="data/sales.csv",     help="Input CSV path")
    parser.add_argument("--output", default="output/report.xlsx", help="Output .xlsx path")
    args = parser.parse_args()

    csv_path = Path(args.input)
    out_path = Path(args.output)

    if not csv_path.exists():
        raise SystemExit(f"Input not found: {csv_path}")

    try:
        by_region, by_month = load_and_aggregate(csv_path)
        write_report(by_region, by_month, out_path)
        print(f"Report written to {out_path}")
    except PermissionError:
        raise SystemExit(f"Cannot write {out_path} — close it in Excel first")
    except Exception as exc:
        raise SystemExit(f"Report generation failed: {exc}")


if __name__ == "__main__":
    main()

FAQ

Which engine should I use — openpyxl or xlsxwriter? Use xlsxwriter when creating a new file with heavy formatting and charts — it is faster and has a richer format API. Use openpyxl when loading and modifying an existing .xlsx file. You cannot use xlsxwriter to read or modify existing files.

How do I add a second header row (e.g. a report title above the column headers)? Pass startrow=2 to to_excel, then use ws.merge_range("A1:D1", "Monthly Report", title_fmt) before writing the data. The column header row lands at row 2 (0-indexed row 1).

Why do my formulas show as 0 when I open the file?xlsxwriter writes formulas as strings and relies on Excel to calculate on open. If you need pre-calculated values, write the Python-computed value directly with ws.write(row, col, value, fmt). Alternatively, open the file in openpyxl with data_only=False, then save — Excel will recalculate on next open.

Can I password-protect the generated report?xlsxwriter supports workbook-level protection: wb.set_properties(...) and ws.protect("password"). For PDF output with password protection, see the Watermarking and Securing PDFs guide.

How do I schedule this to run automatically? On Linux/macOS, add a cron entry: 0 7 1 * * /path/.venv/bin/python /path/generate_sales_report.py. See Automating Monthly Sales Reports in Excel for a full scheduling and delivery walkthrough.

Part of Python for Excel & CSV Data Processing.

Explore next