Automating Monthly Sales Reports in Excel

Manual compilation of monthly sales data introduces VLOOKUP failures, inconsistent date parsing, and formatting drift between contributors. This guide replaces that workflow with a deterministic Python pipeline: ingest multiple CSV exports, normalise the schema, aggregate with pandas groupby, and write a styled multi-sheet .xlsx file with openpyxl. The result is a reproducible script you schedule once and never touch again.

For the broader library-selection and ExcelWriter context, see Automating Excel Report Generation.

Root Cause of Manual-Report Failures

Monthly reports break because the process is ad hoc:

  • Different analysts export CSVs with different date formats (2026-01-05 vs 01/05/2026).
  • Column names shift month to month (AmountRevenueTotal_Amount).
  • Someone forgets to run the VLOOKUP, or runs it on stale data.
  • Formatting is re-applied by hand, introducing colour inconsistencies.

A Python script enforces a fixed schema, fails loudly on unexpected input, and applies formatting programmatically — the same way, every time.

Environment Setup

Python 3.9+ required.

python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install pandas openpyxl

Place raw monthly CSVs in data/ with names matching monthly_sales_*.csv. Each file must contain at minimum: date, region, product, revenue, units.

Step 1 — Confirm the Input Schema

Before any aggregation, assert that the files contain the expected columns.

# pip install pandas
from pathlib import Path
import glob
import pandas as pd

DATA_DIR = Path("data")
REQUIRED = {"date", "region", "product", "revenue", "units"}

files = sorted(DATA_DIR.glob("monthly_sales_*.csv"))
if not files:
    raise SystemExit(f"No CSVs found in {DATA_DIR}")

frames = []
for f in files:
    df = pd.read_csv(f)
    missing = REQUIRED - set(df.columns.str.lower())
    if missing:
        raise ValueError(f"{f.name} is missing columns: {missing}")
    df.columns = df.columns.str.lower()   # normalise to lowercase
    frames.append(df)

raw = pd.concat(frames, ignore_index=True)
print(f"Loaded {len(raw):,} rows from {len(files)} files")
print(raw.dtypes)

Step 2 — Normalise Types

Date and currency columns arriving as strings will silently produce wrong aggregations. Coerce them explicitly and drop rows that cannot be parsed.

# pip install pandas
# Continues from Step 1 — raw DataFrame already in scope

raw["date"]    = pd.to_datetime(raw["date"], format="mixed", errors="coerce")
raw["revenue"] = pd.to_numeric(raw["revenue"].astype(str)
                                .str.replace(r"[$,]", "", regex=True),
                                errors="coerce")
raw["units"]   = pd.to_numeric(raw["units"], errors="coerce")

n_before = len(raw)
raw.dropna(subset=["date", "revenue", "units"], inplace=True)
n_dropped = n_before - len(raw)
if n_dropped:
    print(f"Warning: dropped {n_dropped} rows with unparseable values")

raw["month"] = raw["date"].dt.to_period("M").astype(str)  # "2026-01" etc.

Step 3 — Aggregate with groupby

Two aggregations feed two separate sheets: a regional summary and a month-over-month trend.

# pip install pandas
# Continues from Step 2

# Regional summary
by_region = (
    raw.groupby("region", as_index=False)
    .agg(
        total_revenue  =("revenue", "sum"),
        total_units    =("units",   "sum"),
        transactions   =("revenue", "count"),
    )
    .sort_values("total_revenue", ascending=False)
)
by_region["avg_order"] = (
    by_region["total_revenue"] / by_region["transactions"]
).round(2)

# Monthly trend
by_month = (
    raw.groupby("month", as_index=False)
    .agg(total_revenue=("revenue", "sum"),
         total_units  =("units",   "sum"))
    .sort_values("month")
)

# Month-on-month revenue change
by_month["mom_change"] = by_month["total_revenue"].pct_change().round(4)

print(by_region)
print(by_month)

Always call .reset_index() — or pass as_index=False — after groupby. Failing to do so produces a MultiIndex that serialises to Excel with a blank header row.

Step 4 — Write the Multi-Sheet Workbook

pandas.ExcelWriter with the openpyxl engine gives direct access to the workbook object for styling after to_excel writes the data.

# pip install pandas openpyxl
from pathlib import Path
from datetime import datetime
import pandas as pd
from openpyxl.styles import Font, PatternFill, Alignment, numbers

OUTPUT_DIR = Path("reports")
OUTPUT_DIR.mkdir(exist_ok=True)
timestamp  = datetime.now().strftime("%Y%m")
OUTPUT     = OUTPUT_DIR / f"monthly_sales_{timestamp}.xlsx"

HEADER_COLOR = "2563eb"
HEADER_FONT  = "ffffff"

try:
    with pd.ExcelWriter(OUTPUT, engine="openpyxl") as writer:
        by_region.to_excel(writer, sheet_name="Regional Summary", index=False)
        by_month.to_excel( writer, sheet_name="Monthly Trend",    index=False)

        wb = writer.book

        for sheet_name, frame in [("Regional Summary", by_region),
                                   ("Monthly Trend",    by_month)]:
            ws = wb[sheet_name]

            # Style header row
            hdr_fill = PatternFill("solid", fgColor=HEADER_COLOR)
            hdr_font = Font(bold=True, color=HEADER_FONT, size=12)
            for cell in ws[1]:
                cell.fill      = hdr_fill
                cell.font      = hdr_font
                cell.alignment = Alignment(horizontal="center")

            # Freeze header
            ws.freeze_panes = "A2"

            # Auto-width columns (approximate)
            for col in ws.columns:
                max_len = max(
                    len(str(cell.value)) if cell.value is not None else 0
                    for cell in col
                )
                ws.column_dimensions[col[0].column_letter].width = min(max_len + 4, 30)

        # Currency format for revenue columns
        ws_r = wb["Regional Summary"]
        n    = len(by_region)
        for row in ws_r.iter_rows(min_row=2, max_row=n + 1, min_col=2, max_col=2):
            for cell in row:
                cell.number_format = "#,##0.00"
        for row in ws_r.iter_rows(min_row=2, max_row=n + 1, min_col=5, max_col=5):
            for cell in row:
                cell.number_format = "#,##0.00"

        # Totals row
        total_row = n + 2
        ws_r.cell(total_row, 1).value = "TOTAL"
        ws_r.cell(total_row, 1).font  = Font(bold=True)
        ws_r.cell(total_row, 2).value = by_region["total_revenue"].sum()
        ws_r.cell(total_row, 2).font  = Font(bold=True)
        ws_r.cell(total_row, 2).number_format = "#,##0.00"
        ws_r.cell(total_row, 3).value = int(by_region["total_units"].sum())
        ws_r.cell(total_row, 3).font  = Font(bold=True)

except PermissionError:
    raise SystemExit(f"Close {OUTPUT} in Excel and retry")
except Exception as exc:
    raise SystemExit(f"Report failed: {exc}")

print(f"Saved: {OUTPUT}")

Variant Fixes

Variant A — Mixed Date Formats Across Files

If some files use MM/DD/YYYY and others YYYY-MM-DD, format="mixed" handles the detection automatically in pandas 2.0+. For older pandas, iterate with dateutil:

# pip install pandas python-dateutil
from dateutil import parser as du

raw["date"] = raw["date"].apply(
    lambda v: du.parse(str(v)) if pd.notna(v) else pd.NaT
)

Variant B — Preserving a Corporate Template

Load the template with openpyxl.load_workbook (not read_only=True) and write into it. If you encounter AttributeError: read-only errors, the workbook was opened in the wrong mode — the Fix openpyxl Read-Only Mode Error guide covers this exactly.

# pip install openpyxl pandas
from pathlib import Path
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

TEMPLATE = Path("templates/sales_template.xlsx")
OUTPUT   = Path("reports/sales_filled.xlsx")

try:
    wb = load_workbook(TEMPLATE)   # editable mode — no read_only kwarg
    ws = wb["Data"]

    # Clear previous data rows, keep header
    for row in ws.iter_rows(min_row=2, max_row=ws.max_row):
        for cell in row:
            cell.value = None

    for r_idx, row_data in enumerate(
        dataframe_to_rows(by_region, index=False, header=False), start=2
    ):
        for c_idx, val in enumerate(row_data, start=1):
            ws.cell(row=r_idx, column=c_idx, value=val)

    wb.save(OUTPUT)
except FileNotFoundError as exc:
    raise SystemExit(f"Template missing: {exc}")

print(f"Template filled: {OUTPUT}")

Verification

Open the file programmatically to confirm structural correctness before treating the run as done:

# pip install openpyxl
from pathlib import Path
from openpyxl import load_workbook

OUTPUT = Path("reports") / f"monthly_sales_{datetime.now().strftime('%Y%m')}.xlsx"

wb = load_workbook(OUTPUT, read_only=True)
assert "Regional Summary" in wb.sheetnames, "Missing Regional Summary sheet"
assert "Monthly Trend"    in wb.sheetnames, "Missing Monthly Trend sheet"

ws = wb["Regional Summary"]
rows = list(ws.iter_rows(values_only=True))
assert len(rows) >= 2,          "No data rows"
assert rows[0][0] == "region",  f"Unexpected header: {rows[0][0]}"
assert isinstance(rows[1][1], (int, float)), "Revenue not numeric"
wb.close()
print("Verification passed")

Adding Conditional Formatting to the Sales Sheet

Once the data is written, applying conditional formatting flags anomalies without requiring readers to sort manually. The openpyxl ConditionalFormattingList API accepts standard Excel rule types.

# pip install openpyxl
from pathlib import Path
from openpyxl import load_workbook
from openpyxl.styles import PatternFill, Font
from openpyxl.formatting.rule import CellIsRule

OUTPUT = Path("reports/monthly_sales_latest.xlsx")

try:
    wb = load_workbook(OUTPUT)       # open for editing — NOT read_only
    ws = wb["Regional Summary"]

    n_data = ws.max_row - 1         # rows below the header
    rev_range = f"B2:B{n_data + 1}"

    green_fill = PatternFill("solid", fgColor="C6EFCE")
    red_fill   = PatternFill("solid", fgColor="FFC7CE")
    green_font = Font(color="006100")
    red_font   = Font(color="9C0006")

    ws.conditional_formatting.add(rev_range,
        CellIsRule(operator="greaterThanOrEqual", formula=["10000"],
                   fill=green_fill, font=green_font))
    ws.conditional_formatting.add(rev_range,
        CellIsRule(operator="lessThan", formula=["5000"],
                   fill=red_fill, font=red_font))

    wb.save(OUTPUT)
except FileNotFoundError as exc:
    raise SystemExit(f"Run the export step first: {exc}")

print("Conditional formatting applied")

The CellIsRule approach mirrors the Excel "Cell Value Is" rule. For data-bar or icon-set rules, use DataBarRule and IconSetRule from the same module.

Scheduling and Delivery

A report that runs on demand is still a manual process. The value comes from automating delivery on a schedule.

Linux/macOS cron — run at 07:00 on the first day of each month:

0 7 1 * * /path/to/.venv/bin/python /path/to/generate_report.py >> /var/log/sales_report.log 2>&1

GitHub Actions — on a cron trigger, useful when the team already uses GitHub for the data pipeline:

name: Monthly Sales Report
on:
  schedule:
    - cron: "0 7 1 * *"
jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install pandas openpyxl
      - run: python generate_report.py
      - uses: actions/upload-artifact@v4
        with:
          name: monthly-sales-report
          path: reports/*.xlsx

Email delivery — attach the generated file and send via SMTP:

# pip install pandas openpyxl
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email import encoders
from pathlib import Path

def email_report(report_path: Path, recipient: str, smtp_host: str,
                 smtp_port: int, sender: str, password: str) -> None:
    msg = MIMEMultipart()
    msg["From"]    = sender
    msg["To"]      = recipient
    msg["Subject"] = f"Monthly Sales Report — {report_path.stem}"
    msg.attach(MIMEText("Please find the monthly sales report attached.", "plain"))

    with report_path.open("rb") as f:
        part = MIMEBase("application", "octet-stream")
        part.set_payload(f.read())
    encoders.encode_base64(part)
    part.add_header("Content-Disposition",
                    f'attachment; filename="{report_path.name}"')
    msg.attach(part)

    with smtplib.SMTP_SSL(smtp_host, smtp_port) as server:
        server.login(sender, password)
        server.sendmail(sender, recipient, msg.as_string())
    print(f"Report emailed to {recipient}")

Store credentials in environment variables, not in the script itself.

Troubleshooting

ErrorRoot causeFix
ValueError: cannot reindex from a duplicate axisOverlapping row indices after concatCall pd.concat(frames, ignore_index=True)
ValueError: time data ... does not match formatCSV mixes date formatsUse format="mixed" or errors="coerce" with a null drop
openpyxl.utils.exceptions.IllegalCharacterErrorControl characters (\x00) in string cellsdf.replace(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", regex=True) before export
SettingWithCopyWarningChained indexing on a DataFrame sliceReplace df['col'] = val with df.loc[:, 'col'] = val
Totals row shows 0Formula written to wrong row index (off-by-one between Excel 1-indexed and Python 0-indexed)Verify total_row = len(df) + 2 (header is row 1, data starts row 2)

FAQ

How do I schedule this script to run on the first business day of each month? On Linux/macOS use cron: 0 8 1 * * /path/.venv/bin/python /path/generate_report.py. Add a guard at the top of the script: from datetime import date; assert date.today().weekday() < 5, "Weekend — skipping".

Why does my pivot lose the header when exported to Excel?DataFrame.pivot_table returns a DataFrame with a named index. Pass index=False to to_excel, or call .reset_index() on the pivot result first.

Can I email the report automatically after generation? Yes. After saving the file, use Python's smtplib with email.mime.multipart to attach the .xlsx and send via SMTP. For a full scheduling-and-delivery pattern, see Automating Excel Report Generation.

Part of Automating Excel Report Generation.