Automating Monthly Sales Reports in Excel
Manual compilation of monthly sales data introduces VLOOKUP failures, inconsistent date parsing, and formatting drift between contributors. This guide replaces that workflow with a deterministic Python pipeline: ingest multiple CSV exports, normalise the schema, aggregate with pandas groupby, and write a styled multi-sheet .xlsx file with openpyxl. The result is a reproducible script you schedule once and never touch again.
For the broader library-selection and ExcelWriter context, see Automating Excel Report Generation.
Root Cause of Manual-Report Failures
Monthly reports break because the process is ad hoc:
- Different analysts export CSVs with different date formats (
2026-01-05vs01/05/2026). - Column names shift month to month (
Amount→Revenue→Total_Amount). - Someone forgets to run the VLOOKUP, or runs it on stale data.
- Formatting is re-applied by hand, introducing colour inconsistencies.
A Python script enforces a fixed schema, fails loudly on unexpected input, and applies formatting programmatically — the same way, every time.
Environment Setup
Python 3.9+ required.
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install pandas openpyxl
Place raw monthly CSVs in data/ with names matching monthly_sales_*.csv. Each file must contain at minimum: date, region, product, revenue, units.
Step 1 — Confirm the Input Schema
Before any aggregation, assert that the files contain the expected columns.
# pip install pandas
from pathlib import Path
import glob
import pandas as pd
DATA_DIR = Path("data")
REQUIRED = {"date", "region", "product", "revenue", "units"}
files = sorted(DATA_DIR.glob("monthly_sales_*.csv"))
if not files:
raise SystemExit(f"No CSVs found in {DATA_DIR}")
frames = []
for f in files:
df = pd.read_csv(f)
missing = REQUIRED - set(df.columns.str.lower())
if missing:
raise ValueError(f"{f.name} is missing columns: {missing}")
df.columns = df.columns.str.lower() # normalise to lowercase
frames.append(df)
raw = pd.concat(frames, ignore_index=True)
print(f"Loaded {len(raw):,} rows from {len(files)} files")
print(raw.dtypes)
Step 2 — Normalise Types
Date and currency columns arriving as strings will silently produce wrong aggregations. Coerce them explicitly and drop rows that cannot be parsed.
# pip install pandas
# Continues from Step 1 — raw DataFrame already in scope
raw["date"] = pd.to_datetime(raw["date"], format="mixed", errors="coerce")
raw["revenue"] = pd.to_numeric(raw["revenue"].astype(str)
.str.replace(r"[$,]", "", regex=True),
errors="coerce")
raw["units"] = pd.to_numeric(raw["units"], errors="coerce")
n_before = len(raw)
raw.dropna(subset=["date", "revenue", "units"], inplace=True)
n_dropped = n_before - len(raw)
if n_dropped:
print(f"Warning: dropped {n_dropped} rows with unparseable values")
raw["month"] = raw["date"].dt.to_period("M").astype(str) # "2026-01" etc.
Step 3 — Aggregate with groupby
Two aggregations feed two separate sheets: a regional summary and a month-over-month trend.
# pip install pandas
# Continues from Step 2
# Regional summary
by_region = (
raw.groupby("region", as_index=False)
.agg(
total_revenue =("revenue", "sum"),
total_units =("units", "sum"),
transactions =("revenue", "count"),
)
.sort_values("total_revenue", ascending=False)
)
by_region["avg_order"] = (
by_region["total_revenue"] / by_region["transactions"]
).round(2)
# Monthly trend
by_month = (
raw.groupby("month", as_index=False)
.agg(total_revenue=("revenue", "sum"),
total_units =("units", "sum"))
.sort_values("month")
)
# Month-on-month revenue change
by_month["mom_change"] = by_month["total_revenue"].pct_change().round(4)
print(by_region)
print(by_month)
Always call .reset_index() — or pass as_index=False — after groupby. Failing to do so produces a MultiIndex that serialises to Excel with a blank header row.
Step 4 — Write the Multi-Sheet Workbook
pandas.ExcelWriter with the openpyxl engine gives direct access to the workbook object for styling after to_excel writes the data.
# pip install pandas openpyxl
from pathlib import Path
from datetime import datetime
import pandas as pd
from openpyxl.styles import Font, PatternFill, Alignment, numbers
OUTPUT_DIR = Path("reports")
OUTPUT_DIR.mkdir(exist_ok=True)
timestamp = datetime.now().strftime("%Y%m")
OUTPUT = OUTPUT_DIR / f"monthly_sales_{timestamp}.xlsx"
HEADER_COLOR = "2563eb"
HEADER_FONT = "ffffff"
try:
with pd.ExcelWriter(OUTPUT, engine="openpyxl") as writer:
by_region.to_excel(writer, sheet_name="Regional Summary", index=False)
by_month.to_excel( writer, sheet_name="Monthly Trend", index=False)
wb = writer.book
for sheet_name, frame in [("Regional Summary", by_region),
("Monthly Trend", by_month)]:
ws = wb[sheet_name]
# Style header row
hdr_fill = PatternFill("solid", fgColor=HEADER_COLOR)
hdr_font = Font(bold=True, color=HEADER_FONT, size=12)
for cell in ws[1]:
cell.fill = hdr_fill
cell.font = hdr_font
cell.alignment = Alignment(horizontal="center")
# Freeze header
ws.freeze_panes = "A2"
# Auto-width columns (approximate)
for col in ws.columns:
max_len = max(
len(str(cell.value)) if cell.value is not None else 0
for cell in col
)
ws.column_dimensions[col[0].column_letter].width = min(max_len + 4, 30)
# Currency format for revenue columns
ws_r = wb["Regional Summary"]
n = len(by_region)
for row in ws_r.iter_rows(min_row=2, max_row=n + 1, min_col=2, max_col=2):
for cell in row:
cell.number_format = "#,##0.00"
for row in ws_r.iter_rows(min_row=2, max_row=n + 1, min_col=5, max_col=5):
for cell in row:
cell.number_format = "#,##0.00"
# Totals row
total_row = n + 2
ws_r.cell(total_row, 1).value = "TOTAL"
ws_r.cell(total_row, 1).font = Font(bold=True)
ws_r.cell(total_row, 2).value = by_region["total_revenue"].sum()
ws_r.cell(total_row, 2).font = Font(bold=True)
ws_r.cell(total_row, 2).number_format = "#,##0.00"
ws_r.cell(total_row, 3).value = int(by_region["total_units"].sum())
ws_r.cell(total_row, 3).font = Font(bold=True)
except PermissionError:
raise SystemExit(f"Close {OUTPUT} in Excel and retry")
except Exception as exc:
raise SystemExit(f"Report failed: {exc}")
print(f"Saved: {OUTPUT}")
Variant Fixes
Variant A — Mixed Date Formats Across Files
If some files use MM/DD/YYYY and others YYYY-MM-DD, format="mixed" handles the detection automatically in pandas 2.0+. For older pandas, iterate with dateutil:
# pip install pandas python-dateutil
from dateutil import parser as du
raw["date"] = raw["date"].apply(
lambda v: du.parse(str(v)) if pd.notna(v) else pd.NaT
)
Variant B — Preserving a Corporate Template
Load the template with openpyxl.load_workbook (not read_only=True) and write into it. If you encounter AttributeError: read-only errors, the workbook was opened in the wrong mode — the Fix openpyxl Read-Only Mode Error guide covers this exactly.
# pip install openpyxl pandas
from pathlib import Path
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
TEMPLATE = Path("templates/sales_template.xlsx")
OUTPUT = Path("reports/sales_filled.xlsx")
try:
wb = load_workbook(TEMPLATE) # editable mode — no read_only kwarg
ws = wb["Data"]
# Clear previous data rows, keep header
for row in ws.iter_rows(min_row=2, max_row=ws.max_row):
for cell in row:
cell.value = None
for r_idx, row_data in enumerate(
dataframe_to_rows(by_region, index=False, header=False), start=2
):
for c_idx, val in enumerate(row_data, start=1):
ws.cell(row=r_idx, column=c_idx, value=val)
wb.save(OUTPUT)
except FileNotFoundError as exc:
raise SystemExit(f"Template missing: {exc}")
print(f"Template filled: {OUTPUT}")
Verification
Open the file programmatically to confirm structural correctness before treating the run as done:
# pip install openpyxl
from pathlib import Path
from openpyxl import load_workbook
OUTPUT = Path("reports") / f"monthly_sales_{datetime.now().strftime('%Y%m')}.xlsx"
wb = load_workbook(OUTPUT, read_only=True)
assert "Regional Summary" in wb.sheetnames, "Missing Regional Summary sheet"
assert "Monthly Trend" in wb.sheetnames, "Missing Monthly Trend sheet"
ws = wb["Regional Summary"]
rows = list(ws.iter_rows(values_only=True))
assert len(rows) >= 2, "No data rows"
assert rows[0][0] == "region", f"Unexpected header: {rows[0][0]}"
assert isinstance(rows[1][1], (int, float)), "Revenue not numeric"
wb.close()
print("Verification passed")
Adding Conditional Formatting to the Sales Sheet
Once the data is written, applying conditional formatting flags anomalies without requiring readers to sort manually. The openpyxl ConditionalFormattingList API accepts standard Excel rule types.
# pip install openpyxl
from pathlib import Path
from openpyxl import load_workbook
from openpyxl.styles import PatternFill, Font
from openpyxl.formatting.rule import CellIsRule
OUTPUT = Path("reports/monthly_sales_latest.xlsx")
try:
wb = load_workbook(OUTPUT) # open for editing — NOT read_only
ws = wb["Regional Summary"]
n_data = ws.max_row - 1 # rows below the header
rev_range = f"B2:B{n_data + 1}"
green_fill = PatternFill("solid", fgColor="C6EFCE")
red_fill = PatternFill("solid", fgColor="FFC7CE")
green_font = Font(color="006100")
red_font = Font(color="9C0006")
ws.conditional_formatting.add(rev_range,
CellIsRule(operator="greaterThanOrEqual", formula=["10000"],
fill=green_fill, font=green_font))
ws.conditional_formatting.add(rev_range,
CellIsRule(operator="lessThan", formula=["5000"],
fill=red_fill, font=red_font))
wb.save(OUTPUT)
except FileNotFoundError as exc:
raise SystemExit(f"Run the export step first: {exc}")
print("Conditional formatting applied")
The CellIsRule approach mirrors the Excel "Cell Value Is" rule. For data-bar or icon-set rules, use DataBarRule and IconSetRule from the same module.
Scheduling and Delivery
A report that runs on demand is still a manual process. The value comes from automating delivery on a schedule.
Linux/macOS cron — run at 07:00 on the first day of each month:
0 7 1 * * /path/to/.venv/bin/python /path/to/generate_report.py >> /var/log/sales_report.log 2>&1
GitHub Actions — on a cron trigger, useful when the team already uses GitHub for the data pipeline:
name: Monthly Sales Report
on:
schedule:
- cron: "0 7 1 * *"
jobs:
report:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with: { python-version: "3.12" }
- run: pip install pandas openpyxl
- run: python generate_report.py
- uses: actions/upload-artifact@v4
with:
name: monthly-sales-report
path: reports/*.xlsx
Email delivery — attach the generated file and send via SMTP:
# pip install pandas openpyxl
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email import encoders
from pathlib import Path
def email_report(report_path: Path, recipient: str, smtp_host: str,
smtp_port: int, sender: str, password: str) -> None:
msg = MIMEMultipart()
msg["From"] = sender
msg["To"] = recipient
msg["Subject"] = f"Monthly Sales Report — {report_path.stem}"
msg.attach(MIMEText("Please find the monthly sales report attached.", "plain"))
with report_path.open("rb") as f:
part = MIMEBase("application", "octet-stream")
part.set_payload(f.read())
encoders.encode_base64(part)
part.add_header("Content-Disposition",
f'attachment; filename="{report_path.name}"')
msg.attach(part)
with smtplib.SMTP_SSL(smtp_host, smtp_port) as server:
server.login(sender, password)
server.sendmail(sender, recipient, msg.as_string())
print(f"Report emailed to {recipient}")
Store credentials in environment variables, not in the script itself.
Troubleshooting
| Error | Root cause | Fix |
|---|---|---|
ValueError: cannot reindex from a duplicate axis | Overlapping row indices after concat | Call pd.concat(frames, ignore_index=True) |
ValueError: time data ... does not match format | CSV mixes date formats | Use format="mixed" or errors="coerce" with a null drop |
openpyxl.utils.exceptions.IllegalCharacterError | Control characters (\x00) in string cells | df.replace(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", regex=True) before export |
SettingWithCopyWarning | Chained indexing on a DataFrame slice | Replace df['col'] = val with df.loc[:, 'col'] = val |
Totals row shows 0 | Formula written to wrong row index (off-by-one between Excel 1-indexed and Python 0-indexed) | Verify total_row = len(df) + 2 (header is row 1, data starts row 2) |
FAQ
How do I schedule this script to run on the first business day of each month?
On Linux/macOS use cron: 0 8 1 * * /path/.venv/bin/python /path/generate_report.py. Add a guard at the top of the script: from datetime import date; assert date.today().weekday() < 5, "Weekend — skipping".
Why does my pivot lose the header when exported to Excel?DataFrame.pivot_table returns a DataFrame with a named index. Pass index=False to to_excel, or call .reset_index() on the pivot result first.
Can I email the report automatically after generation?
Yes. After saving the file, use Python's smtplib with email.mime.multipart to attach the .xlsx and send via SMTP. For a full scheduling-and-delivery pattern, see Automating Excel Report Generation.
Related
- Automating Excel Report Generation — full pipeline with conditional formatting, charts, and multi-engine walkthrough
- Fix openpyxl Read-Only Mode Error — resolve write errors when loading templates
- Cleaning Messy CSV Data with pandas — schema normalisation before aggregation
- Reading Excel Files with Python — ingestion patterns when source data is already
.xlsx
Part of Automating Excel Report Generation.