Automating Excel Report Generation
Generating a formatted Excel report by hand once is manageable. Generating one every Monday, for every region, with conditional highlights, charts, and a summary row, is not. Python automates the entire sequence: pull data, aggregate, write styled cells, embed charts, and save — with no manual post-processing.
This guide covers the full workflow: library selection, pandas ExcelWriter integration, openpyxl and xlsxwriter styling, multiple sheets, summary rows, conditional formatting, and embedded charts. The techniques here complement the broader Python for Excel & CSV Data Processing guide.
Prerequisites
Python 3.9+, a virtual environment, and the following packages:
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install pandas openpyxl xlsxwriter
requirements.txt for a locked environment:
pandas==2.2.2
openpyxl==3.1.2
xlsxwriter==3.2.0
Create a sample CSV to follow along:
mkdir -p data output
python - <<'EOF'
import pandas as pd, numpy as np
rng = np.random.default_rng(42)
regions = ["North", "South", "East", "West"]
products = ["Widget A", "Widget B", "Gadget X"]
rows = []
for month in range(1, 7):
for _ in range(40):
rows.append({
"month": f"2026-{month:02d}",
"region": rng.choice(regions),
"product": rng.choice(products),
"units": int(rng.integers(10, 200)),
"revenue": round(float(rng.uniform(500, 8000)), 2),
})
pd.DataFrame(rows).to_csv("data/sales.csv", index=False)
print("data/sales.csv written")
EOF
Inspect the Input Before Writing
Always validate the incoming DataFrame before you write anything to Excel. Silent type coercions produce wrong number formats.
# pip install pandas
from pathlib import Path
import pandas as pd
INPUT = Path("data/sales.csv")
try:
df = pd.read_csv(INPUT)
print(df.dtypes) # confirm numeric cols are not object
print(df.isnull().sum()) # find gaps before aggregation
print(df.head(3))
except FileNotFoundError:
raise SystemExit(f"Missing input file: {INPUT}")
Expected: revenue and units as float64/int64. If they show as object, the CSV contains currency symbols or commas — strip them with pd.to_numeric(df["revenue"].str.replace(r"[$,]", "", regex=True), errors="coerce") before proceeding.
Data → Styler → Workbook → Sheets: The Pipeline
The diagram below shows how raw data flows through the four-stage pipeline this guide implements.
Step 1 — Aggregate the Data
# pip install pandas
from pathlib import Path
import pandas as pd
INPUT = Path("data/sales.csv")
try:
df = pd.read_csv(INPUT, parse_dates=["month"])
except FileNotFoundError:
raise SystemExit(f"Missing: {INPUT}")
df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")
df["units"] = pd.to_numeric(df["units"], errors="coerce")
df.dropna(subset=["revenue", "units"], inplace=True)
# Regional summary
by_region = (
df.groupby("region", as_index=False)
.agg(total_revenue=("revenue", "sum"),
total_units=("units", "sum"),
transactions=("revenue", "count"))
)
by_region["avg_order"] = by_region["total_revenue"] / by_region["transactions"]
# Monthly trend
by_month = (
df.groupby("month", as_index=False)
.agg(total_revenue=("revenue", "sum"))
.sort_values("month")
)
print(by_region)
print(by_month)
Step 2 — Write Multiple Sheets with Styled Headers
pandas.ExcelWriter delegates to xlsxwriter or openpyxl as the engine. Use xlsxwriter when creating a new file with heavy formatting; use openpyxl when you need to modify an existing file (see Reading Excel Files with Python for that path).
# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd
OUTPUT = Path("output/sales_report.xlsx")
OUTPUT.parent.mkdir(parents=True, exist_ok=True)
# Re-use by_region and by_month from Step 1
try:
with pd.ExcelWriter(OUTPUT, engine="xlsxwriter") as writer:
by_region.to_excel(writer, sheet_name="Summary", index=False, startrow=1)
by_month.to_excel(writer, sheet_name="Monthly Trend", index=False, startrow=1)
wb = writer.book
# --- define reusable formats ---
hdr_fmt = wb.add_format({
"bold": True, "bg_color": "#2563eb", "font_color": "#ffffff",
"border": 1, "font_size": 12,
})
currency_fmt = wb.add_format({"num_format": "#,##0.00", "border": 1})
int_fmt = wb.add_format({"num_format": "#,##0", "border": 1})
text_fmt = wb.add_format({"border": 1})
for sheet_name, frame in [("Summary", by_region), ("Monthly Trend", by_month)]:
ws = writer.sheets[sheet_name]
# Write styled header row (row 0, overwriting the blank startrow)
for col_idx, col_name in enumerate(frame.columns):
ws.write(1, col_idx, col_name, hdr_fmt)
ws.set_row(1, 20)
# Column widths for Summary sheet
ws_sum = writer.sheets["Summary"]
ws_sum.set_column("A:A", 14, text_fmt)
ws_sum.set_column("B:D", 16, currency_fmt)
except PermissionError:
raise SystemExit(f"Close {OUTPUT} in Excel before running this script")
except Exception as exc:
raise SystemExit(f"Export failed: {exc}")
print(f"Written: {OUTPUT}")
Step 3 — Number Formats and Summary Rows
A summary row at the bottom of the data gives stakeholders a quick total without opening a formula bar.
# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd
OUTPUT = Path("output/sales_report.xlsx")
try:
with pd.ExcelWriter(OUTPUT, engine="xlsxwriter") as writer:
by_region.to_excel(writer, sheet_name="Summary", index=False, startrow=1)
wb = writer.book
ws = writer.sheets["Summary"]
currency_fmt = wb.add_format({"num_format": "#,##0.00", "bold": False})
total_fmt = wb.add_format({"num_format": "#,##0.00", "bold": True,
"top": 2, "bg_color": "#f1f5f9"})
label_fmt = wb.add_format({"bold": True, "bg_color": "#f1f5f9"})
n_rows = len(by_region)
# data starts at Excel row 3 (0-indexed row 2) because startrow=1 + 1 header
data_start = 3
data_end = data_start + n_rows - 1
# Apply number format to revenue and avg_order columns
for r in range(data_start, data_end + 1):
ws.write(r - 1, 1, by_region.iloc[r - data_start]["total_revenue"], currency_fmt)
ws.write(r - 1, 3, by_region.iloc[r - data_start]["avg_order"], currency_fmt)
# Summary / totals row
summary_row = data_end # 0-indexed
ws.write(summary_row, 0, "TOTAL", label_fmt)
ws.write_formula(summary_row, 1,
f"=SUM(B{data_start}:B{data_end})", total_fmt)
ws.write_formula(summary_row, 2,
f"=SUM(C{data_start}:C{data_end})", total_fmt)
ws.write(summary_row, 3, "", total_fmt)
except Exception as exc:
raise SystemExit(f"Export failed: {exc}")
print(f"Written with summary row: {OUTPUT}")
Step 4 — Conditional Formatting
Highlight underperforming regions (revenue below a threshold) in red and top performers in green — without touching a single cell manually.
# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd
OUTPUT = Path("output/sales_report_cf.xlsx")
try:
with pd.ExcelWriter(OUTPUT, engine="xlsxwriter") as writer:
by_region.to_excel(writer, sheet_name="Summary", index=False)
wb = writer.book
ws = writer.sheets["Summary"]
green_fmt = wb.add_format({"bg_color": "#C6EFCE", "font_color": "#006100"})
red_fmt = wb.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})
n = len(by_region)
rev_range = f"B2:B{n + 1}" # revenue column, skip header
# Values above 10 000 → green
ws.conditional_format(rev_range, {
"type": "cell", "criteria": ">=", "value": 10_000, "format": green_fmt
})
# Values below 5 000 → red
ws.conditional_format(rev_range, {
"type": "cell", "criteria": "<", "value": 5_000, "format": red_fmt
})
# 3-color scale on units column
ws.conditional_format(f"C2:C{n + 1}", {
"type": "3_color_scale",
"min_color": "#FFC7CE",
"mid_color": "#FFEB9C",
"max_color": "#C6EFCE",
})
except Exception as exc:
raise SystemExit(f"Export failed: {exc}")
print(f"Written: {OUTPUT}")
Step 5 — Embed a Bar Chart
Charts embedded in the workbook are generated from the data already on the sheet — no separate image file required.
# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd
OUTPUT = Path("output/sales_report_chart.xlsx")
try:
with pd.ExcelWriter(OUTPUT, engine="xlsxwriter") as writer:
by_region.to_excel(writer, sheet_name="Summary", index=False)
wb = writer.book
ws = writer.sheets["Summary"]
n = len(by_region)
chart = wb.add_chart({"type": "column"})
chart.add_series({
"name": "Total Revenue",
"categories": ["Summary", 1, 0, n, 0], # region labels (col A)
"values": ["Summary", 1, 1, n, 1], # revenue values (col B)
"fill": {"color": "#2563eb"},
"gap": 50,
})
chart.set_title({"name": "Revenue by Region"})
chart.set_x_axis({"name": "Region"})
chart.set_y_axis({"name": "Revenue ($)", "num_format": "#,##0"})
chart.set_legend({"none": True})
chart.set_size({"width": 480, "height": 288})
ws.insert_chart("F2", chart)
except Exception as exc:
raise SystemExit(f"Export failed: {exc}")
print(f"Written: {OUTPUT}")
Edge Cases and Variants
Modifying an Existing Template with openpyxl
When you need to inject data into a pre-branded template, openpyxl is the right tool. Note that opening in read_only mode prevents writes — see Fix openpyxl Read-Only Mode Error if you hit that error.
# pip install openpyxl pandas
from pathlib import Path
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.styles import Font, PatternFill, numbers
TEMPLATE = Path("templates/report_template.xlsx")
OUTPUT = Path("output/report_filled.xlsx")
try:
wb = load_workbook(TEMPLATE) # NOT read_only — we need to write
ws = wb["Data"]
# Clear old data below header
for row in ws.iter_rows(min_row=2, max_row=ws.max_row):
for cell in row:
cell.value = None
# Write new data
for r_idx, row in enumerate(dataframe_to_rows(by_region, index=False, header=False), start=2):
for c_idx, value in enumerate(row, start=1):
ws.cell(row=r_idx, column=c_idx, value=value)
# Apply currency format to column B
for cell in ws["B"][1:]:
cell.number_format = numbers.FORMAT_NUMBER_COMMA_SEPARATED2
wb.save(OUTPUT)
except FileNotFoundError as exc:
raise SystemExit(f"Template not found: {exc}")
print(f"Template filled: {OUTPUT}")
Writing Excel Formulas and Charts Directly
For advanced formula patterns (array formulas, named ranges, dynamic arrays), see the Writing Excel Formulas and Charts with openpyxl guide.
Memory-Efficient Writes with use_constant_memory
For DataFrames with more than 500 000 rows, xlsxwriter's constant-memory mode streams rows without buffering the whole sheet in RAM. The trade-off: you cannot go back and set cell formats after writing.
# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd
OUTPUT = Path("output/large_report.xlsx")
options = {"constant_memory": True}
try:
with pd.ExcelWriter(OUTPUT, engine="xlsxwriter",
engine_kwargs={"options": options}) as writer:
# Write in chunks if the DataFrame is chunked from a database
big_df.to_excel(writer, sheet_name="Data", index=False)
except Exception as exc:
raise SystemExit(f"Export failed: {exc}")
Validation
Assert the output is structurally correct before treating the run as successful:
# pip install openpyxl
from pathlib import Path
from openpyxl import load_workbook
OUTPUT = Path("output/sales_report.xlsx")
wb = load_workbook(OUTPUT, read_only=True)
assert "Summary" in wb.sheetnames, "Summary sheet missing"
ws = wb["Summary"]
rows = list(ws.iter_rows(values_only=True))
assert len(rows) >= 2, "No data rows written"
assert rows[0][1] is not None, "Column B header missing"
wb.close()
print("Validation passed")
Performance and Scale
- Row limit: Excel supports ~1 048 576 rows per sheet. For datasets beyond 100 000 rows consider writing to CSV and referencing it via Power Query instead of embedding all data in the workbook.
- Multiple sheets: Each
to_excel(..., sheet_name=...)call within the sameExcelWritercontext adds a sheet without re-opening the file. - Formula recalculation:
xlsxwriterwrites formulas as strings; Excel recalculates on open. If you need values pre-calculated, write the computed value withxlsxwriter'swrite()instead ofwrite_formula(), or useopenpyxlwithdata_only=Trueafter the file has been opened and saved by Excel once. - Parquet intermediary: If the ETL step is expensive, save the aggregated DataFrames to Parquet and load from there for the styling step — decouples compute from presentation.
Troubleshooting
| Error | Root cause | Fix |
|---|---|---|
zipfile.BadZipFile | Output file was left open by Excel when the script ran | Close the file in Excel first, or write to a temp path then rename |
KeyError: 'Summary' when accessing writer.sheets | Sheet name not yet created — to_excel must be called before accessing writer.sheets[name] | Confirm to_excel(..., sheet_name="Summary") executes before the writer.sheets lookup |
IllegalCharacterError | Cell value contains control characters (e.g. \x00) from a database | Strip with df.replace(r"[\x00-\x1f]", "", regex=True) before writing |
PermissionError on save | Target .xlsx is open in another application | Write to a new timestamped filename or close the file |
AttributeError: 'Worksheet' object has no attribute 'add_format' | Mixed openpyxl/xlsxwriter API — add_format is xlsxwriter-only | Use openpyxl's PatternFill/Font objects when the engine is openpyxl |
Complete Working Script
#!/usr/bin/env python3
# pip install pandas xlsxwriter openpyxl
"""
generate_sales_report.py — produce a formatted multi-sheet Excel report.
Usage: python generate_sales_report.py --input data/sales.csv --output output/report.xlsx
"""
import argparse
from pathlib import Path
import pandas as pd
def load_and_aggregate(csv_path: Path):
df = pd.read_csv(csv_path)
df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")
df["units"] = pd.to_numeric(df["units"], errors="coerce")
df.dropna(subset=["revenue", "units"], inplace=True)
by_region = (
df.groupby("region", as_index=False)
.agg(total_revenue=("revenue", "sum"),
total_units=("units", "sum"),
transactions=("revenue", "count"))
)
by_region["avg_order"] = (
by_region["total_revenue"] / by_region["transactions"]
).round(2)
by_month = (
df.groupby("month", as_index=False)
.agg(total_revenue=("revenue", "sum"))
.sort_values("month")
)
return by_region, by_month
def write_report(by_region: "pd.DataFrame", by_month: "pd.DataFrame",
output: Path) -> None:
output.parent.mkdir(parents=True, exist_ok=True)
with pd.ExcelWriter(output, engine="xlsxwriter") as writer:
by_region.to_excel(writer, sheet_name="Summary", index=False)
by_month.to_excel( writer, sheet_name="Monthly Trend", index=False)
wb = writer.book
hdr_fmt = wb.add_format({"bold": True, "bg_color": "#2563eb",
"font_color": "#ffffff", "font_size": 12})
currency_fmt = wb.add_format({"num_format": "#,##0.00"})
green_fmt = wb.add_format({"bg_color": "#C6EFCE", "font_color": "#006100"})
red_fmt = wb.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})
# --- Summary sheet ---
ws = writer.sheets["Summary"]
for c, name in enumerate(by_region.columns):
ws.write(0, c, name, hdr_fmt)
n = len(by_region)
ws.set_column("A:A", 14)
ws.set_column("B:D", 16, currency_fmt)
ws.conditional_format(f"B2:B{n+1}", {
"type": "cell", "criteria": ">=", "value": 10_000, "format": green_fmt
})
ws.conditional_format(f"B2:B{n+1}", {
"type": "cell", "criteria": "<", "value": 5_000, "format": red_fmt
})
# Total row
ws.write(n + 1, 0, "TOTAL",
wb.add_format({"bold": True, "bg_color": "#f1f5f9"}))
ws.write_formula(n + 1, 1, f"=SUM(B2:B{n+1})",
wb.add_format({"num_format": "#,##0.00", "bold": True,
"bg_color": "#f1f5f9"}))
# Bar chart
chart = wb.add_chart({"type": "column"})
chart.add_series({
"name": "Revenue",
"categories": ["Summary", 1, 0, n, 0],
"values": ["Summary", 1, 1, n, 1],
"fill": {"color": "#2563eb"},
})
chart.set_title({"name": "Revenue by Region"})
chart.set_y_axis({"num_format": "#,##0"})
chart.set_legend({"none": True})
chart.set_size({"width": 400, "height": 240})
ws.insert_chart("F2", chart)
# --- Monthly Trend sheet ---
ws2 = writer.sheets["Monthly Trend"]
for c, name in enumerate(by_month.columns):
ws2.write(0, c, name, hdr_fmt)
ws2.set_column("A:B", 18, currency_fmt)
def main():
parser = argparse.ArgumentParser(description="Generate formatted Excel sales report")
parser.add_argument("--input", default="data/sales.csv", help="Input CSV path")
parser.add_argument("--output", default="output/report.xlsx", help="Output .xlsx path")
args = parser.parse_args()
csv_path = Path(args.input)
out_path = Path(args.output)
if not csv_path.exists():
raise SystemExit(f"Input not found: {csv_path}")
try:
by_region, by_month = load_and_aggregate(csv_path)
write_report(by_region, by_month, out_path)
print(f"Report written to {out_path}")
except PermissionError:
raise SystemExit(f"Cannot write {out_path} — close it in Excel first")
except Exception as exc:
raise SystemExit(f"Report generation failed: {exc}")
if __name__ == "__main__":
main()
FAQ
Which engine should I use — openpyxl or xlsxwriter?
Use xlsxwriter when creating a new file with heavy formatting and charts — it is faster and has a richer format API. Use openpyxl when loading and modifying an existing .xlsx file. You cannot use xlsxwriter to read or modify existing files.
How do I add a second header row (e.g. a report title above the column headers)?
Pass startrow=2 to to_excel, then use ws.merge_range("A1:D1", "Monthly Report", title_fmt) before writing the data. The column header row lands at row 2 (0-indexed row 1).
Why do my formulas show as 0 when I open the file?xlsxwriter writes formulas as strings and relies on Excel to calculate on open. If you need pre-calculated values, write the Python-computed value directly with ws.write(row, col, value, fmt). Alternatively, open the file in openpyxl with data_only=False, then save — Excel will recalculate on next open.
Can I password-protect the generated report?xlsxwriter supports workbook-level protection: wb.set_properties(...) and ws.protect("password"). For PDF output with password protection, see the Watermarking and Securing PDFs guide.
How do I schedule this to run automatically?
On Linux/macOS, add a cron entry: 0 7 1 * * /path/.venv/bin/python /path/generate_sales_report.py. See Automating Monthly Sales Reports in Excel for a full scheduling and delivery walkthrough.
Related
- Automating Monthly Sales Reports in Excel — end-to-end scheduled report with multi-sheet groupby output
- Writing Excel Formulas and Charts with openpyxl — advanced formula patterns and chart types
- Reading Excel Files with Python — ingestion patterns before the report generation step
- Fix openpyxl Read-Only Mode Error — resolve write errors when loading an existing workbook