Converting Excel to JSON with Python
This guide resolves the "TypeError: Object of type Timestamp is not JSON serializable" error and the NaN/NaT failures that occur when converting Excel files to JSON. Analysts and developers frequently encounter these errors when preparing datasets for API ingestion or web integration. This workflow focuses on single-file conversion, but it fits into broader Python for Excel & CSV Data Processing pipelines. For multi-workbook consolidation prior to export, refer to Merging Multiple Spreadsheets before applying the serialization fix below.
Diagnosing the Serialization Error
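JSON as defined by RFC 8259 has no datetime type and no NaN literal, and the native Python json module only encodes dicts, lists, strings, numbers, booleans, and None. When pandas reads an Excel file, it auto-infers column types, often converting date columns to datetime64[ns] and empty cells to NaN (float) or NaT. Passing these values directly to json.dumps() fails in two ways: Timestamp and NaT objects raise TypeError, while NaN is written as the bare token NaN by default, which strict JSON parsers reject, or raises ValueError when allow_nan=False is set.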
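The failure modes described above can be reproduced with the standard library alone. The following is a minimal sketch; the helper name try_dumps and the sample values are illustrative and not part of the original workflow.

```python
import json
from datetime import datetime

def try_dumps(value, **kwargs):
    """Return the JSON text, or the name of the exception json.dumps raised."""
    try:
        return json.dumps(value, **kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__

# A datetime has no JSON representation, so the encoder raises TypeError.
datetime_result = try_dumps({"when": datetime(2024, 1, 15)})

# By default NaN is written as the bare token NaN, which is not valid JSON.
nan_default = try_dumps({"amount": float("nan")})

# With allow_nan=False the encoder raises ValueError instead.
nan_strict = try_dumps({"amount": float("nan")}, allow_nan=False)

print(datetime_result, nan_default, nan_strict)
```

Because the default NaN behavior fails silently, the problem often surfaces only when a stricter consumer parses the file.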
Root Cause Identification:
- Run print(df.dtypes) to locate datetime64[ns] or object columns containing mixed types.
- Run print(df.isna().sum()) to identify columns with NaN or NaT values.
- Map Excel formatting artifacts (e.g., trailing spaces, currency symbols) to Python strings that require sanitization.
Diagnostic Snippet:
import pandas as pd
df = pd.read_excel('input.xlsx')
print(df.dtypes)
print(df.isna().sum())
# Output will show datetime64[ns] and float64 (for NaN) columns
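For wider sheets, the same checks can be collected into a single report. This is a sketch under assumptions: find_risky_columns is a hypothetical helper, and the in-memory frame stands in for the result of pd.read_excel('input.xlsx').

```python
import pandas as pd

def find_risky_columns(df):
    """Group column names by the serialization problem they may cause."""
    return {
        # Columns pandas inferred as datetimes (Timestamp / NaT values)
        "datetime": list(df.select_dtypes(include=["datetime"]).columns),
        # Columns containing at least one NaN or NaT
        "has_missing": [col for col in df.columns if df[col].isna().any()],
        # Object columns, which may hold mixed Python types
        "object": list(df.select_dtypes(include=["object"]).columns),
    }

# Hypothetical sample data standing in for a real workbook
sample = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-15", None]),
    "amount": [10.5, None],
    "note": ["ok", 3],
})

report = find_risky_columns(sample)
print(report)
```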
Implementing the Type-Safe Conversion Script
Execute the following reproducible workflow to bypass native serialization limits. The script forces string parsing on load, replaces missing values with JSON-compliant null, and deploys a fallback encoder for residual datetime objects.
Prerequisites:
pip install pandas openpyxl
Execution Script:
import pandas as pd
import json
from datetime import datetime

def excel_to_json_safe(filepath, output_path):
    # Read with explicit string parsing to preserve raw values
    df = pd.read_excel(filepath, dtype=str)
    # Replace NaN/None with JSON-compatible null
    df = df.where(pd.notnull(df), None)
    # Convert to list of dictionaries
    records = df.to_dict(orient='records')

    # Custom encoder for residual datetime/decimal objects
    def custom_serializer(obj):
        if isinstance(obj, (datetime, pd.Timestamp)):
            return obj.isoformat()
        raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

    with open(output_path, 'w') as f:
        json.dump(records, f, indent=2, default=custom_serializer)

# Execute conversion
excel_to_json_safe('input.xlsx', 'output.json')
How It Works:
- dtype=str overrides pandas auto-inference, preventing premature datetime64 conversion.
- df.where(pd.notnull(df), None) converts pandas NaN to Python None, which serializes to JSON null.
- custom_serializer acts as a safety net for any datetime objects that bypass string parsing, converting them to ISO 8601 format.
- orient='records' outputs a flat array of objects, matching standard REST API expectations.
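If NaN values or datetime objects could still reach the record list (for example, when a column is loaded without dtype=str), a record-level pass offers a second line of defense. This is an alternative sketch using only the standard library, not the script's own method; sanitize_value and sanitize_records are hypothetical names.

```python
import json
import math
from datetime import date, datetime

def sanitize_value(value):
    """Map values JSON cannot represent onto safe equivalents."""
    if isinstance(value, float) and math.isnan(value):
        return None                  # NaN -> JSON null
    if isinstance(value, (datetime, date)):
        return value.isoformat()     # dates -> ISO 8601 strings
    return value

def sanitize_records(records):
    """Apply sanitize_value to every cell of a list of row dictionaries."""
    return [{key: sanitize_value(val) for key, val in row.items()} for row in records]

# Hypothetical record as it might come out of df.to_dict(orient='records')
records = [{"name": "Widget", "price": float("nan"), "shipped": datetime(2024, 1, 15, 9, 30)}]
clean = sanitize_records(records)

# allow_nan=False makes the encoder raise if any NaN slipped through
payload = json.dumps(clean, allow_nan=False)
print(payload)
```

Passing allow_nan=False turns a silent invalid-JSON output into an immediate error, which is usually easier to debug.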
Validating Output and Handling Edge Cases
After generation, verify the JSON structure to prevent downstream parsing failures.
- Syntax Validation: Parse the output file in Python to catch malformed syntax. Note that json.load() accepts the non-standard NaN token by default, so this check alone does not catch unsanitized NaN values:
import json
with open('output.json', 'r') as f:
    data = json.load(f)
print(f"Valid JSON. Records: {len(data)}")
- Whitespace Stripping: Excel cells often contain invisible trailing spaces. Apply .str.strip() to string columns before serialization if strict string matching is required.
- Key Casing Consistency: Downstream consumers often expect camelCase or snake_case. For snake_case, assign df.columns.str.replace(' ', '_').str.lower() back to df.columns before calling to_dict().
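A stricter syntax check can reject the NaN and Infinity tokens that Python's parser tolerates by default. The sketch below relies on the parse_constant hook of json.loads; the function names are hypothetical.

```python
import json

def reject_constant(token):
    """Called by json.loads for the non-standard tokens NaN, Infinity, -Infinity."""
    raise ValueError(f"Non-standard JSON constant: {token}")

def loads_strict(text):
    """Parse JSON text, failing on tokens that RFC 8259 does not allow."""
    return json.loads(text, parse_constant=reject_constant)

# Valid JSON with a null passes through unchanged
good = loads_strict('[{"amount": null}]')

# A bare NaN token is rejected instead of being read as float('nan')
try:
    loads_strict('[{"amount": NaN}]')
    nan_rejected = False
except ValueError:
    nan_rejected = True

print(good, nan_rejected)
```

To validate a file rather than a string, read its contents and pass them to loads_strict, or pass the same parse_constant argument to json.load.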
Common Mistakes
- Using df.to_json() without orient='records': The default orient='columns' output is a nested {column: {index: value}} object, which breaks REST API parsers expecting a flat array of objects.
- Ignoring NaN values before serialization: By default json.dumps() writes numpy.nan as the bare token NaN, which is not valid JSON and is rejected by strict parsers; with allow_nan=False it raises ValueError. pandas.NaT raises TypeError.
- Relying on the xlrd engine for .xlsx files: pandas uses openpyxl for .xlsx. xlrd 2.0 and later read only legacy .xls files, so forcing engine='xlrd' on an .xlsx file raises an error instead of loading the data.
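The difference between the two orientations is easiest to see side by side. This is a small sketch with a hypothetical in-memory frame; date_format='iso' is used here so that dates come out as ISO 8601 strings rather than epoch milliseconds.

```python
import json
import pandas as pd

# Hypothetical sample data with one missing date
df = pd.DataFrame({
    "sku": ["A1", "B2"],
    "shipped": pd.to_datetime(["2024-01-15", None]),
})

# Default orient='columns': {column: {index: value}}
by_column = json.loads(df.to_json())

# orient='records': a flat list with one object per row
as_records = json.loads(df.to_json(orient="records", date_format="iso"))

print(by_column)
print(as_records)
```

Unlike json.dumps(), DataFrame.to_json() writes missing values as null on its own, so it can be a shorter route when the inferred types are acceptable.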
Frequently Asked Questions
Why does pandas throw TypeError: Object of type Timestamp is not JSON serializable?
JSON natively lacks a datetime type. Pandas preserves Excel dates as Timestamp objects, requiring explicit .isoformat() conversion or a custom encoder before json.dump().
How do I handle Excel cells with mixed data types?
Force dtype=str in pd.read_excel(), then apply targeted regex or .astype() conversions post-load to standardize columns before serialization.
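As an illustration of such post-load conversions, the sketch below cleans a currency column and casts a quantity column. The column names and values are hypothetical, and the frame stands in for data loaded with dtype=str.

```python
import pandas as pd

# Hypothetical frame as it might look after pd.read_excel(..., dtype=str)
df = pd.DataFrame({
    "price": ["$1,200.50", " 75 ", "n/a"],
    "qty": ["3", "4", "5"],
})

# Drop everything except digits, dot, and minus; cells that cannot be parsed become NaN
cleaned = df["price"].str.replace(r"[^0-9.\-]", "", regex=True)
df["price"] = pd.to_numeric(cleaned, errors="coerce")

# A column known to hold only integers can be cast directly
df["qty"] = df["qty"].astype(int)

print(df)
```

Any NaN introduced by errors='coerce' still needs to be replaced with None before serialization.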
Can I convert multiple sheets to a single JSON file?
Yes. Iterate through pd.ExcelFile(filepath).sheet_names, extend a master list with each sheet's to_dict('records') output, and serialize the combined array.
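One way to sketch this uses pd.read_excel() with sheet_name=None, which returns a dictionary of DataFrames keyed by sheet name, in place of iterating over sheet_names. The function names and the added "sheet" key are assumptions made for illustration; the demo uses in-memory frames rather than a real workbook.

```python
import json
import pandas as pd

def combine_sheets(frames):
    """Flatten {sheet_name: DataFrame} into one list of records tagged with the sheet."""
    combined = []
    for sheet_name, frame in frames.items():
        # Replace missing values with None so they serialize as null
        frame = frame.astype(object).where(frame.notna(), None)
        for record in frame.to_dict(orient="records"):
            # "sheet" is a hypothetical key recording where the row came from
            combined.append({"sheet": sheet_name, **record})
    return combined

def workbook_to_records(filepath):
    """Read every sheet of a workbook as strings and combine the rows."""
    # sheet_name=None returns a dict of DataFrames keyed by sheet name
    frames = pd.read_excel(filepath, sheet_name=None, dtype=str)
    return combine_sheets(frames)

# In-memory frames standing in for a two-sheet workbook
demo = {
    "Q1": pd.DataFrame({"region": ["North"], "total": ["10"]}),
    "Q2": pd.DataFrame({"region": ["South"], "total": [None]}),
}
records = combine_sheets(demo)
print(json.dumps(records, indent=2))
```

Tagging each record with its sheet name is optional, but it keeps rows traceable once they share one array.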