Converting Excel to JSON with Python
Passing pd.read_excel output directly to Python's json.dumps raises TypeError: Object of type Timestamp is not JSON serializable because the standard json module has no handler for NumPy or pandas types. This guide shows how to read an Excel workbook with pandas, pick the right orient option, fix serialization errors, and handle nested structures and multi-sheet workbooks. For consolidating several workbooks before exporting, see Merging Multiple Spreadsheets.
Root Cause
pandas auto-infers column types on read_excel. Date columns become datetime64[ns]; empty cells become numpy.nan or pandas.NaT. Neither type is recognized by the RFC 8259 JSON spec, so json.dumps raises immediately.
# pip install pandas openpyxl
import pandas as pd
import json
df = pd.read_excel("report.xlsx", engine="openpyxl")
json.dumps(df.to_dict(orient="records"))
# TypeError: Object of type Timestamp is not JSON serializable
Run df.dtypes and df.isna().sum() first to locate offending columns before writing a conversion script.
The three types most likely to break serialization are:
datetime64[ns]— pandas timestamp, no JSON equivalent without conversionfloat64containingnumpy.nan— RFC 8259 does not allowNaNas a JSON valueobjectcolumns with mixed types — could containpd.Timestamp,numpy.int64, or plainNone
Prerequisites
# pip install pandas openpyxl
pip install pandas openpyxl
Create a test workbook to follow along:
# pip install pandas openpyxl
import pandas as pd
from pathlib import Path
from datetime import date
pd.DataFrame({
"order_id": [1001, 1002, 1003],
"customer": ["Acme", "Globex", None],
"order_date": pd.to_datetime(["2026-01-15", "2026-02-03", "2026-03-22"]),
"amount": [1250.50, 980.00, 2100.75],
}).to_excel(Path("report.xlsx"), index=False, engine="openpyxl")
Step 1 — Read and Inspect
Always inspect dtypes and null counts before attempting serialization. It takes two lines and prevents mysterious TypeError failures later.
# pip install pandas openpyxl
from pathlib import Path
import pandas as pd
SOURCE = Path("report.xlsx")
try:
df = pd.read_excel(SOURCE, engine="openpyxl")
except FileNotFoundError as exc:
raise SystemExit(f"File not found: {exc}")
print(df.dtypes)
# order_id int64
# customer object
# order_date datetime64[ns] ← will fail json.dumps
# amount float64
print(df.isna().sum())
# customer 1 ← one null → NaN in a float column raises ValueError in json.dumps
The engine="openpyxl" parameter is required for .xlsx files. Using the old xlrd engine for .xlsx raises XLRDError — xlrd 2.x dropped .xlsx support. If you work with .xls (old binary format), use engine="xlrd". For more on engine selection and sheet options, see Reading Excel Files with Python.
Step 2 — Choose an orient
DataFrame.to_json(orient=...) and DataFrame.to_dict(orient=...) both take an orient argument that controls the JSON shape.
| orient | JSON shape | Best for |
|---|---|---|
"records" | [{col: val, ...}, ...] | REST APIs, most consumers |
"split" | {columns: [...], data: [[...]]} | Compact transfer, reconstruct with pd.read_json |
"index" | {row_index: {col: val}} | Keyed lookup by row index |
"columns" | {col: {row_index: val}} | Column-oriented stores |
"values" | [[val, ...], ...] | Minimal size, no header |
"table" | JSON Table Schema + data | Self-describing, pandas round-trip |
"records" is the right default for API output. Use "table" if you need to reconstruct exact dtypes with pd.read_json(orient="table").
The orient also affects how NaN and None appear in the output. With orient="records" and df.to_json(), pandas renders NaN as null automatically — no manual replacement needed. With json.dumps(df.to_dict()), pandas NaN values survive as Python float('nan'), which the json module rejects with ValueError: Out of range float values are not JSON compliant. The fix is df.where(df.notna(), other=None) before calling to_dict.
Step 3 — Fix Serialization Errors
Option A — use to_json() (simplest)
DataFrame.to_json handles NumPy and pandas types automatically. The date_format="iso" argument converts Timestamp to ISO 8601 strings.
# pip install pandas openpyxl
from pathlib import Path
import pandas as pd
SOURCE = Path("report.xlsx")
DEST = Path("output.json")
try:
df = pd.read_excel(SOURCE, engine="openpyxl")
except FileNotFoundError as exc:
raise SystemExit(f"File not found: {exc}")
# Replace NaN with None so JSON gets null, not NaN
df = df.where(df.notna(), other=None)
df.to_json(
DEST,
orient="records",
date_format="iso", # Timestamps → "2026-01-15T00:00:00"
indent=2,
force_ascii=False, # preserve non-ASCII characters
)
print(f"Wrote {len(df)} records to {DEST}")
Option B — custom encoder for json.dumps()
When you need json.dumps (e.g., to embed the JSON in a larger dict), write a small encoder:
# pip install pandas openpyxl
from pathlib import Path
import json
import pandas as pd
SOURCE = Path("report.xlsx")
df = pd.read_excel(SOURCE, engine="openpyxl")
df = df.where(df.notna(), other=None) # NaN → None → JSON null
class PandasEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, (pd.Timestamp,)):
return obj.isoformat() # "2026-01-15T00:00:00"
if hasattr(obj, "item"): # numpy scalar (int64, float64, etc.)
return obj.item()
return super().default(obj)
records = df.to_dict(orient="records")
payload = json.dumps({"data": records, "count": len(records)}, cls=PandasEncoder, indent=2)
print(payload[:200])
obj.item() converts any NumPy scalar (int64, float64) to its native Python equivalent, which the standard json module can serialize.
When you control the read step, a simpler alternative is to force all columns to Python-native types at load time:
# pip install pandas openpyxl
from pathlib import Path
import pandas as pd
import json
SOURCE = Path("report.xlsx")
df = pd.read_excel(SOURCE, engine="openpyxl")
# Convert datetime cols to ISO strings; convert NaN → None
for col in df.select_dtypes(include=["datetime64[ns]", "datetimetz"]).columns:
df[col] = df[col].dt.strftime("%Y-%m-%dT%H:%M:%S") # ISO 8601
df = df.where(df.notna(), other=None)
# Now to_dict produces only native Python types: str, int, float, None
records = df.to_dict(orient="records")
print(json.dumps(records[:1], indent=2)) # no encoder needed
This approach avoids the custom encoder entirely and is easier to debug.
Step 4 — Nested Structures
Sometimes the flat records shape is wrong — the consumer expects grouped or hierarchical JSON. Build it from groupby:
# pip install pandas openpyxl
from pathlib import Path
import json
import pandas as pd
SOURCE = Path("report.xlsx")
df = pd.read_excel(SOURCE, engine="openpyxl")
df["order_date"] = df["order_date"].dt.strftime("%Y-%m-%d") # date → string up front
df = df.where(df.notna(), other=None)
# Group orders by customer → {customer: [order, ...]}
nested = (
df.groupby("customer", dropna=False)
.apply(lambda g: g.drop(columns="customer").to_dict(orient="records"), include_groups=False)
.to_dict()
)
print(json.dumps(nested, indent=2)[:400])
dt.strftime converts the date column to a plain string before the groupby, removing any remaining Timestamp objects.
A common variant is to produce a two-level nested structure: one object per customer, with a nested array of orders. The same groupby pattern applies — just choose the grouping key to match the consumer's expected shape.
Variant — Convert All Sheets
pd.read_excel(sheet_name=None) returns a dict of DataFrames — one key per sheet. This pattern, also used in Merging Multiple Spreadsheets, maps directly to a multi-sheet JSON structure:
# pip install pandas openpyxl
from pathlib import Path
import json
import pandas as pd
SOURCE = Path("annual_report.xlsx")
DEST = Path("annual_report.json")
try:
workbook: dict[str, pd.DataFrame] = pd.read_excel(
SOURCE, sheet_name=None, engine="openpyxl"
)
except FileNotFoundError as exc:
raise SystemExit(f"File not found: {exc}")
result: dict[str, list] = {}
for sheet_name, df in workbook.items():
df["order_date"] = pd.to_datetime(df.get("order_date", pd.Series(dtype="object")),
errors="coerce").dt.strftime("%Y-%m-%d")
df = df.where(df.notna(), other=None)
result[sheet_name] = df.to_dict(orient="records")
DEST.write_text(json.dumps(result, indent=2, ensure_ascii=False))
print(f"Exported {len(result)} sheets to {DEST}")
Variant — Stream Large Workbooks via CSV
For very large Excel files (tens of thousands of rows), writing to JSON can exhaust memory. Convert to CSV first, then stream-process the CSV into JSON lines format, which most log-ingestion and analytics platforms accept:
# pip install pandas openpyxl
from pathlib import Path
import pandas as pd
SOURCE = Path("large_report.xlsx")
DEST = Path("large_report.jsonl") # one JSON object per line
try:
# Read only needed columns to reduce memory footprint
df = pd.read_excel(SOURCE, engine="openpyxl", usecols=["order_id", "customer", "order_date", "amount"])
except FileNotFoundError as exc:
raise SystemExit(f"File not found: {exc}")
df["order_date"] = df["order_date"].dt.strftime("%Y-%m-%d")
df = df.where(df.notna(), other=None)
with DEST.open("w", encoding="utf-8") as fh:
for record in df.to_dict(orient="records"):
import json
fh.write(json.dumps(record, ensure_ascii=False) + "\n")
print(f"Wrote {len(df)} JSON lines to {DEST}")
JSON Lines (.jsonl) lets consumers read one record at a time without loading the entire file into memory.
Verification
# pip install pandas openpyxl
import json
from pathlib import Path
raw = Path("output.json").read_text()
try:
data = json.loads(raw)
except json.JSONDecodeError as exc:
raise SystemExit(f"Invalid JSON: {exc}")
assert isinstance(data, list), "Expected a JSON array at top level"
assert all("order_id" in r for r in data), "Missing order_id in at least one record"
# Confirm no NaN survived serialization
raw_text = Path("output.json").read_text()
assert "NaN" not in raw_text, "NaN found in JSON output — replace before serializing"
print(f"Valid JSON — {len(data)} records, keys: {list(data[0].keys())}")
Round-trip back to a DataFrame to confirm no data was lost:
# pip install pandas openpyxl
import pandas as pd
df_check = pd.read_json("output.json", orient="records")
print(df_check.dtypes)
print(df_check.head())
If date columns come back as integers (millisecond epoch), pass convert_dates=["order_date"] to pd.read_json to restore them as datetime64.
FAQ
Why does df.to_json() produce epoch integers for dates instead of ISO strings?
The default date_format for to_json is "epoch" (milliseconds since 1970-01-01). Pass date_format="iso" to get "2026-01-15T00:00:00".
How do I keep integer columns as integers instead of floats in the JSON output?
pandas promotes integer columns that contain NaN to float64 (because integers cannot represent NaN). Replace NaN with a sentinel (e.g., 0 or -1) before serializing, or use pd.Int64Dtype (nullable integer dtype) so the column stays integer-typed even with missing values.
Can I convert a DataFrame to JSON without writing a file?
Yes. df.to_json(orient="records") returns a string when no path is passed. You can embed it directly in an API response or pass it to json.loads.
What is the fastest way to convert a large Excel file to JSON?
Read only the columns you need with usecols=, convert date columns up front with dt.strftime, replace NaN with None, then write JSON Lines with a streaming loop. Avoid loading the entire output string into memory before writing.
Common Mistakes
| Mistake | Result | Fix |
|---|---|---|
json.dumps(df.to_dict()) without where | ValueError: Out of range float values are not JSON compliant for NaN | Replace NaN with None before serialization |
df.to_json() without date_format="iso" | Dates serialize as millisecond epoch integers | Pass date_format="iso" |
orient="columns" for API output | Nested {col: {idx: val}} structure breaks most REST consumers | Use orient="records" |
Keeping index=True in to_json | Extra numeric index key in every record | Default is index=True; pass index=False with orient="records" |
Related
- Merging Multiple Spreadsheets — consolidate files before exporting to JSON
- Reading Excel Files with Python — engine selection and sheet targeting with
read_excel - Cleaning Messy CSV Data with pandas — fix encoding and type issues before serialization
- Fix pandas merge Overlapping Column Suffixes — clean up merged DataFrames before converting to JSON
Part of Merging Multiple Spreadsheets.