Converting Excel to JSON with Python

Passing pd.read_excel output directly to Python's json.dumps raises TypeError: Object of type Timestamp is not JSON serializable because the standard json module has no handler for NumPy or pandas types. This guide shows how to read an Excel workbook with pandas, pick the right orient option, fix serialization errors, and handle nested structures and multi-sheet workbooks. For consolidating several workbooks before exporting, see Merging Multiple Spreadsheets.

Root Cause

pandas auto-infers column types on read_excel. Date columns become datetime64[ns]; empty cells become numpy.nan or pandas.NaT. Neither type is recognized by the RFC 8259 JSON spec, so json.dumps raises immediately.

# pip install pandas openpyxl
import pandas as pd
import json

df = pd.read_excel("report.xlsx", engine="openpyxl")
json.dumps(df.to_dict(orient="records"))
# TypeError: Object of type Timestamp is not JSON serializable

Run df.dtypes and df.isna().sum() first to locate offending columns before writing a conversion script.

The three types most likely to break serialization are:

  • datetime64[ns] — pandas timestamp, no JSON equivalent without conversion
  • float64 containing numpy.nan — RFC 8259 does not allow NaN as a JSON value
  • object columns with mixed types — could contain pd.Timestamp, numpy.int64, or plain None

Prerequisites

# pip install pandas openpyxl
pip install pandas openpyxl

Create a test workbook to follow along:

# pip install pandas openpyxl
import pandas as pd
from pathlib import Path
from datetime import date

pd.DataFrame({
    "order_id": [1001, 1002, 1003],
    "customer":  ["Acme", "Globex", None],
    "order_date": pd.to_datetime(["2026-01-15", "2026-02-03", "2026-03-22"]),
    "amount":    [1250.50, 980.00, 2100.75],
}).to_excel(Path("report.xlsx"), index=False, engine="openpyxl")

Step 1 — Read and Inspect

Always inspect dtypes and null counts before attempting serialization. It takes two lines and prevents mysterious TypeError failures later.

# pip install pandas openpyxl
from pathlib import Path
import pandas as pd

SOURCE = Path("report.xlsx")

try:
    df = pd.read_excel(SOURCE, engine="openpyxl")
except FileNotFoundError as exc:
    raise SystemExit(f"File not found: {exc}")

print(df.dtypes)
# order_id               int64
# customer              object
# order_date    datetime64[ns]   ← will fail json.dumps
# amount               float64

print(df.isna().sum())
# customer    1  ← one null → NaN in a float column raises ValueError in json.dumps

The engine="openpyxl" parameter is required for .xlsx files. Using the old xlrd engine for .xlsx raises XLRDError — xlrd 2.x dropped .xlsx support. If you work with .xls (old binary format), use engine="xlrd". For more on engine selection and sheet options, see Reading Excel Files with Python.

Step 2 — Choose an orient

DataFrame.to_json(orient=...) and DataFrame.to_dict(orient=...) both take an orient argument that controls the JSON shape.

orientJSON shapeBest for
"records"[{col: val, ...}, ...]REST APIs, most consumers
"split"{columns: [...], data: [[...]]}Compact transfer, reconstruct with pd.read_json
"index"{row_index: {col: val}}Keyed lookup by row index
"columns"{col: {row_index: val}}Column-oriented stores
"values"[[val, ...], ...]Minimal size, no header
"table"JSON Table Schema + dataSelf-describing, pandas round-trip

"records" is the right default for API output. Use "table" if you need to reconstruct exact dtypes with pd.read_json(orient="table").

The orient also affects how NaN and None appear in the output. With orient="records" and df.to_json(), pandas renders NaN as null automatically — no manual replacement needed. With json.dumps(df.to_dict()), pandas NaN values survive as Python float('nan'), which the json module rejects with ValueError: Out of range float values are not JSON compliant. The fix is df.where(df.notna(), other=None) before calling to_dict.

Step 3 — Fix Serialization Errors

Option A — use to_json() (simplest)

DataFrame.to_json handles NumPy and pandas types automatically. The date_format="iso" argument converts Timestamp to ISO 8601 strings.

# pip install pandas openpyxl
from pathlib import Path
import pandas as pd

SOURCE = Path("report.xlsx")
DEST   = Path("output.json")

try:
    df = pd.read_excel(SOURCE, engine="openpyxl")
except FileNotFoundError as exc:
    raise SystemExit(f"File not found: {exc}")

# Replace NaN with None so JSON gets null, not NaN
df = df.where(df.notna(), other=None)

df.to_json(
    DEST,
    orient="records",
    date_format="iso",   # Timestamps → "2026-01-15T00:00:00"
    indent=2,
    force_ascii=False,   # preserve non-ASCII characters
)
print(f"Wrote {len(df)} records to {DEST}")

Option B — custom encoder for json.dumps()

When you need json.dumps (e.g., to embed the JSON in a larger dict), write a small encoder:

# pip install pandas openpyxl
from pathlib import Path
import json
import pandas as pd

SOURCE = Path("report.xlsx")

df = pd.read_excel(SOURCE, engine="openpyxl")
df = df.where(df.notna(), other=None)   # NaN → None → JSON null

class PandasEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (pd.Timestamp,)):
            return obj.isoformat()       # "2026-01-15T00:00:00"
        if hasattr(obj, "item"):         # numpy scalar (int64, float64, etc.)
            return obj.item()
        return super().default(obj)

records = df.to_dict(orient="records")
payload = json.dumps({"data": records, "count": len(records)}, cls=PandasEncoder, indent=2)
print(payload[:200])

obj.item() converts any NumPy scalar (int64, float64) to its native Python equivalent, which the standard json module can serialize.

When you control the read step, a simpler alternative is to force all columns to Python-native types at load time:

# pip install pandas openpyxl
from pathlib import Path
import pandas as pd
import json

SOURCE = Path("report.xlsx")

df = pd.read_excel(SOURCE, engine="openpyxl")

# Convert datetime cols to ISO strings; convert NaN → None
for col in df.select_dtypes(include=["datetime64[ns]", "datetimetz"]).columns:
    df[col] = df[col].dt.strftime("%Y-%m-%dT%H:%M:%S")   # ISO 8601

df = df.where(df.notna(), other=None)

# Now to_dict produces only native Python types: str, int, float, None
records = df.to_dict(orient="records")
print(json.dumps(records[:1], indent=2))   # no encoder needed

This approach avoids the custom encoder entirely and is easier to debug.

Step 4 — Nested Structures

Sometimes the flat records shape is wrong — the consumer expects grouped or hierarchical JSON. Build it from groupby:

# pip install pandas openpyxl
from pathlib import Path
import json
import pandas as pd

SOURCE = Path("report.xlsx")

df = pd.read_excel(SOURCE, engine="openpyxl")
df["order_date"] = df["order_date"].dt.strftime("%Y-%m-%d")  # date → string up front
df = df.where(df.notna(), other=None)

# Group orders by customer → {customer: [order, ...]}
nested = (
    df.groupby("customer", dropna=False)
      .apply(lambda g: g.drop(columns="customer").to_dict(orient="records"), include_groups=False)
      .to_dict()
)

print(json.dumps(nested, indent=2)[:400])

dt.strftime converts the date column to a plain string before the groupby, removing any remaining Timestamp objects.

A common variant is to produce a two-level nested structure: one object per customer, with a nested array of orders. The same groupby pattern applies — just choose the grouping key to match the consumer's expected shape.

Variant — Convert All Sheets

pd.read_excel(sheet_name=None) returns a dict of DataFrames — one key per sheet. This pattern, also used in Merging Multiple Spreadsheets, maps directly to a multi-sheet JSON structure:

# pip install pandas openpyxl
from pathlib import Path
import json
import pandas as pd

SOURCE = Path("annual_report.xlsx")
DEST   = Path("annual_report.json")

try:
    workbook: dict[str, pd.DataFrame] = pd.read_excel(
        SOURCE, sheet_name=None, engine="openpyxl"
    )
except FileNotFoundError as exc:
    raise SystemExit(f"File not found: {exc}")

result: dict[str, list] = {}
for sheet_name, df in workbook.items():
    df["order_date"] = pd.to_datetime(df.get("order_date", pd.Series(dtype="object")),
                                       errors="coerce").dt.strftime("%Y-%m-%d")
    df = df.where(df.notna(), other=None)
    result[sheet_name] = df.to_dict(orient="records")

DEST.write_text(json.dumps(result, indent=2, ensure_ascii=False))
print(f"Exported {len(result)} sheets to {DEST}")

Variant — Stream Large Workbooks via CSV

For very large Excel files (tens of thousands of rows), writing to JSON can exhaust memory. Convert to CSV first, then stream-process the CSV into JSON lines format, which most log-ingestion and analytics platforms accept:

# pip install pandas openpyxl
from pathlib import Path
import pandas as pd

SOURCE = Path("large_report.xlsx")
DEST   = Path("large_report.jsonl")   # one JSON object per line

try:
    # Read only needed columns to reduce memory footprint
    df = pd.read_excel(SOURCE, engine="openpyxl", usecols=["order_id", "customer", "order_date", "amount"])
except FileNotFoundError as exc:
    raise SystemExit(f"File not found: {exc}")

df["order_date"] = df["order_date"].dt.strftime("%Y-%m-%d")
df = df.where(df.notna(), other=None)

with DEST.open("w", encoding="utf-8") as fh:
    for record in df.to_dict(orient="records"):
        import json
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")

print(f"Wrote {len(df)} JSON lines to {DEST}")

JSON Lines (.jsonl) lets consumers read one record at a time without loading the entire file into memory.

Verification

# pip install pandas openpyxl
import json
from pathlib import Path

raw = Path("output.json").read_text()

try:
    data = json.loads(raw)
except json.JSONDecodeError as exc:
    raise SystemExit(f"Invalid JSON: {exc}")

assert isinstance(data, list),             "Expected a JSON array at top level"
assert all("order_id" in r for r in data), "Missing order_id in at least one record"

# Confirm no NaN survived serialization
raw_text = Path("output.json").read_text()
assert "NaN" not in raw_text, "NaN found in JSON output — replace before serializing"

print(f"Valid JSON — {len(data)} records, keys: {list(data[0].keys())}")

Round-trip back to a DataFrame to confirm no data was lost:

# pip install pandas openpyxl
import pandas as pd

df_check = pd.read_json("output.json", orient="records")
print(df_check.dtypes)
print(df_check.head())

If date columns come back as integers (millisecond epoch), pass convert_dates=["order_date"] to pd.read_json to restore them as datetime64.

FAQ

Why does df.to_json() produce epoch integers for dates instead of ISO strings? The default date_format for to_json is "epoch" (milliseconds since 1970-01-01). Pass date_format="iso" to get "2026-01-15T00:00:00".

How do I keep integer columns as integers instead of floats in the JSON output? pandas promotes integer columns that contain NaN to float64 (because integers cannot represent NaN). Replace NaN with a sentinel (e.g., 0 or -1) before serializing, or use pd.Int64Dtype (nullable integer dtype) so the column stays integer-typed even with missing values.

Can I convert a DataFrame to JSON without writing a file? Yes. df.to_json(orient="records") returns a string when no path is passed. You can embed it directly in an API response or pass it to json.loads.

What is the fastest way to convert a large Excel file to JSON? Read only the columns you need with usecols=, convert date columns up front with dt.strftime, replace NaN with None, then write JSON Lines with a streaming loop. Avoid loading the entire output string into memory before writing.

Common Mistakes

MistakeResultFix
json.dumps(df.to_dict()) without whereValueError: Out of range float values are not JSON compliant for NaNReplace NaN with None before serialization
df.to_json() without date_format="iso"Dates serialize as millisecond epoch integersPass date_format="iso"
orient="columns" for API outputNested {col: {idx: val}} structure breaks most REST consumersUse orient="records"
Keeping index=True in to_jsonExtra numeric index key in every recordDefault is index=True; pass index=False with orient="records"

Part of Merging Multiple Spreadsheets.

/html>