Converting Excel to JSON with Python

This guide resolves the TypeError: Object of type Timestamp is not JSON serializable error and the NaN/NaT problems (invalid NaN tokens in the output, or ValueError: Out of range float values are not JSON compliant when allow_nan=False) that occur when converting Excel files to JSON. Analysts and developers frequently hit these errors when preparing datasets for API ingestion or web integration. While this workflow focuses on single-file conversion, it fits into broader Python for Excel & CSV Data Processing pipelines. For multi-workbook consolidation prior to export, refer to Merging Multiple Spreadsheets before applying the serialization fix below.

Diagnosing the Serialization Error

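JSON (RFC 8259) has no datetime type and no representation for NaN, and Python's built-in json module has no encoder for datetime, pandas.Timestamp, or pandas.NaT objects. When pandas reads an Excel file, it infers column types, typically converting date columns to datetime64[ns] and empty cells to NaN (a float) or NaT. Passing the resulting records to json.dumps() fails in two ways: Timestamp and NaT values raise TypeError, while NaN is written as a bare NaN token that strict JSON parsers reject (or raises ValueError if you pass allow_nan=False).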
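Both failure modes can be reproduced without an Excel file. The sketch below assumes a small in-memory DataFrame standing in for pd.read_excel output; the column names order_date and amount are hypothetical.

```python
import json

import pandas as pd

# Stand-in for pd.read_excel('input.xlsx'): a date column and a numeric column, each with a blank
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-15", None]),
    "amount": [19.99, None],
})
records = df.to_dict(orient="records")

# Failure 1: json has no encoder for pandas Timestamp (or NaT) values
try:
    json.dumps(records)
    timestamp_error = None
except TypeError as exc:
    timestamp_error = str(exc)

# Failure 2: NaN is written as a bare NaN token, which is not valid JSON
amounts = [{"amount": row["amount"]} for row in records]
nan_output = json.dumps(amounts)

# With allow_nan=False, the same data raises ValueError instead
try:
    json.dumps(amounts, allow_nan=False)
    nan_error = None
except ValueError as exc:
    nan_error = str(exc)

print(timestamp_error)  # Object of type Timestamp is not JSON serializable
print(nan_output)
print(nan_error)
```

The second case is the more dangerous one: the default call succeeds and silently produces a file that JavaScript's JSON.parse and most API gateways will refuse.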

Root Cause Identification:

  1. Run print(df.dtypes) to locate datetime64[ns] or object columns containing mixed types.
  2. Run print(df.isna().sum()) to identify columns with NaN or NaT values.
  3. Check object (text) columns for Excel formatting artifacts, such as trailing spaces or currency symbols, that were read as strings and need sanitizing before export.

Diagnostic Snippet:

import pandas as pd
df = pd.read_excel('input.xlsx')
print(df.dtypes)
print(df.isna().sum())
# Date columns typically show as datetime64[ns]; numeric columns with blank cells show as float64 (blanks become NaN)
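The three checks in the list above can be rolled into a single report. This is a minimal sketch; report_problem_columns is a hypothetical helper name, and the sample DataFrame stands in for pd.read_excel output.

```python
import pandas as pd


def report_problem_columns(df: pd.DataFrame) -> dict:
    """Summarize columns likely to break json.dumps (hypothetical helper)."""
    report = {
        # datetime64 columns hold Timestamp/NaT values that json cannot encode
        "datetime_columns": [
            c for c in df.columns if pd.api.types.is_datetime64_any_dtype(df[c])
        ],
        # any column with NaN/NaT/None needs a null-replacement step
        "missing_counts": {c: int(n) for c, n in df.isna().sum().items() if n > 0},
        # columns whose string cells carry leading/trailing spaces
        "padded_text_columns": [],
    }
    for c in df.columns:
        # Only real str values are inspected; numbers, dates and blanks map to False
        padded = df[c].map(lambda v: isinstance(v, str) and v != v.strip())
        if padded.any():
            report["padded_text_columns"].append(c)
    return report


# Stand-in for pd.read_excel('input.xlsx'); column names are hypothetical
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-15", None]),
    "customer": ["Acme ", "Globex"],
    "amount": [19.99, None],
})
print(report_problem_columns(df))
```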

Implementing the Type-Safe Conversion Script

The following script works around these limits in three steps: it reads every cell as a string, replaces missing values with None (written as JSON null), and registers a fallback encoder for any datetime objects that remain.

Prerequisites:

pip install pandas openpyxl

Execution Script:

import json
from datetime import datetime

import pandas as pd


def excel_to_json_safe(filepath, output_path):
    # Read every cell as a string so pandas does not infer datetime64 or float columns
    df = pd.read_excel(filepath, dtype=str)

    # Cast to object dtype so cells can hold None, then replace NaN with None (serialized as null)
    df = df.astype(object)
    df = df.where(pd.notnull(df), None)

    # Convert to a list of dictionaries, one per row
    records = df.to_dict(orient='records')

    # Fallback encoder for any residual datetime objects
    def custom_serializer(obj):
        if isinstance(obj, (datetime, pd.Timestamp)):
            return obj.isoformat()
        raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

    # Write UTF-8 so non-ASCII cell text survives without \u escapes
    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(records, f, indent=2, ensure_ascii=False, default=custom_serializer)


# Execute conversion
excel_to_json_safe('input.xlsx', 'output.json')

How It Works:

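  • dtype=str overrides pandas type inference, preventing premature datetime64 conversion. Date cells typically arrive as plain strings such as '2024-01-15 00:00:00' rather than the T-separated ISO 8601 form, so convert them after loading if consumers expect a specific date format.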
  • df.where(pd.notnull(df), None) converts pandas NaN to Python None, which serializes to JSON null.
  • custom_serializer acts as a safety net for any datetime objects that bypass string parsing, converting them to ISO 8601 format.
  • orient='records' outputs a flat array of objects, matching standard REST API expectations.
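If you would rather keep pandas' inferred types (numbers as numbers) than force every cell to a string, pandas' own DataFrame.to_json handles both failure modes: it writes NaN and NaT as null, and with date_format='iso' it writes Timestamps as ISO 8601 strings. This is an alternative to the script above, not a replacement for it; the sample frame below is a hypothetical stand-in for pd.read_excel output read without dtype=str.

```python
import json

import pandas as pd

# Stand-in for pd.read_excel('input.xlsx') without dtype=str, so types are inferred
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-15", None]),
    "amount": [19.99, None],
})

# NaN/NaT become null; Timestamps become ISO 8601 strings (millisecond precision by default)
payload = df.to_json(orient="records", date_format="iso")
print(payload)

# Round-trip to confirm the structure consumers will see
records = json.loads(payload)
```

To write a file directly, pass a path as the first argument, for example df.to_json('output.json', orient='records', date_format='iso', indent=2). The trade-off is that you accept pandas' type inference, including any mis-typed mixed columns.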

Validating Output and Handling Edge Cases

After generation, verify the JSON structure to prevent downstream parsing failures.

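  1. Syntax Validation: Parse the output file in Python to catch malformed syntax:
import json
with open('output.json', 'r', encoding='utf-8') as f:
    data = json.load(f)
print(f"Valid JSON. Records: {len(data)}")
     Note that json.load accepts the non-standard NaN and Infinity tokens, so this check passes even if such values leaked into the file.
  2. Whitespace Stripping: Excel cells often contain invisible trailing spaces. Strip string values (for example with .str.strip() on text columns) before serialization if strict string matching is required.
  3. Key Casing Consistency: Downstream consumers often expect a consistent key style such as snake_case or camelCase. To produce snake_case, normalize the headers with df.columns = df.columns.str.strip().str.replace(' ', '_').str.lower() before calling to_dict().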
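The whitespace and header steps can be combined into one pass before serialization. This is a minimal sketch: tidy_for_json is a hypothetical helper, it assumes every header is a string, and it strips only str values so numbers and blanks pass through unchanged (plain .str.strip() would turn non-string values in a mixed object column into NaN).

```python
import pandas as pd


def tidy_for_json(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: strip cell whitespace and snake_case the headers."""
    out = df.copy()
    # Headers: trim, replace inner spaces with underscores, lowercase (snake_case)
    out.columns = out.columns.str.strip().str.replace(" ", "_").str.lower()
    # Cells: strip only real strings, leaving NaN/None and non-text values untouched
    for col in out.columns:
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    return out


# Stand-in for pd.read_excel('input.xlsx', dtype=str); headers are hypothetical
raw = pd.DataFrame({"Customer Name ": ["Acme  ", None], "Order Total": ["19.99", " 5"]})
clean = tidy_for_json(raw)

# Same null handling as the main script, so blanks serialize as null
clean = clean.astype(object).where(clean.notna(), None)
print(clean.to_dict(orient="records"))
```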

Common Mistakes

  • Using df.to_json() without orient='records': The default orient='columns' output nests values under column names and then row index labels, so REST consumers expecting a flat array of row objects cannot use it directly.
  • Ignoring NaN values before serialization: Python's json module raises TypeError on pandas.NaT and, by default, writes numpy.nan as a bare NaN token that is not valid JSON (with allow_nan=False it raises ValueError instead).
  • Relying on the xlrd engine for .xlsx files: Since pandas 1.2, .xlsx files are read with openpyxl by default. xlrd 2.0 and later support only the legacy .xls format, so forcing engine='xlrd' on an .xlsx file fails.
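Because json.load accepts NaN and Infinity, a plain round-trip check will not catch leaked NaN values. A stricter check can pass the json module's parse_constant hook, which is called only for the tokens NaN, Infinity, and -Infinity. This is a minimal sketch; validate_strict and its top-level-array requirement are assumptions matching the orient='records' output used in this guide.

```python
import json


def _reject_constant(token: str):
    # json calls parse_constant only for 'NaN', 'Infinity' and '-Infinity'
    raise ValueError(f"Non-standard JSON token found: {token}")


def validate_strict(text: str) -> int:
    """Parse JSON text, rejecting NaN/Infinity; return the record count (hypothetical helper)."""
    data = json.loads(text, parse_constant=_reject_constant)
    if not isinstance(data, list):
        raise ValueError("Expected a top-level array of records")
    return len(data)


print(validate_strict('[{"amount": null}, {"amount": 19.99}]'))  # 2

try:
    validate_strict('[{"amount": NaN}]')
except ValueError as exc:
    print(exc)  # Non-standard JSON token found: NaN
```

To check a generated file, read it first: with open('output.json', encoding='utf-8') as f: validate_strict(f.read()).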

Frequently Asked Questions

Why does pandas throw TypeError: Object of type Timestamp is not JSON serializable? The error is raised by Python's json module, not by pandas itself. JSON natively lacks a datetime type, and pandas represents Excel dates as Timestamp objects, so they need explicit .isoformat() conversion or a custom encoder (passed as default=) before json.dump().

How do I handle Excel cells with mixed data types? Force dtype=str in pd.read_excel(), then apply targeted regex or .astype() conversions post-load to standardize columns before serialization.
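A sketch of such targeted post-load conversions, assuming two hypothetical columns: order_date is parsed and reformatted as ISO 8601, and amount has currency symbols and thousands separators removed with a regex before numeric coercion. Cells that cannot be parsed become missing and are then written as null.

```python
import pandas as pd

# Stand-in for pd.read_excel('input.xlsx', dtype=str); column names are hypothetical
df = pd.DataFrame({
    "order_date": ["2024-01-15 00:00:00", None, "not a date"],
    "amount": ["$1,200.50", "19.99", "n/a"],
})

# Dates: parse, then format as ISO 8601 strings; unparseable cells become NaT
dates = pd.to_datetime(df["order_date"], errors="coerce")
df["order_date"] = dates.dt.strftime("%Y-%m-%dT%H:%M:%S")

# Numbers: strip "$" and "," with a regex, then coerce; "n/a" becomes NaN
cleaned = df["amount"].str.replace(r"[$,]", "", regex=True)
df["amount"] = pd.to_numeric(cleaned, errors="coerce")

# Missing or unparseable values -> None so they serialize as JSON null
df = df.astype(object).where(df.notna(), None)
print(df.to_dict(orient="records"))
```

Using errors="coerce" trades loud failures for silent nulls; if bad cells should stop the export, drop that argument and let pandas raise.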

Can I convert multiple sheets to a single JSON file? Yes. Iterate through pd.ExcelFile().sheet_names, append each sheet's to_dict('records') to a master list, and serialize the combined array.
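A minimal sketch of that loop, written against a dict of DataFrames so it runs without a workbook. pd.read_excel(path, sheet_name=None, dtype=str) returns exactly such a dict (sheet name to DataFrame), which is equivalent to iterating pd.ExcelFile(path).sheet_names. The helper name and the added "sheet" key are assumptions; drop the key if consumers do not need provenance or if it collides with an existing column.

```python
import json

import pandas as pd


def sheets_to_records(sheets: dict) -> list:
    """Combine {sheet_name: DataFrame} into one list of row dicts (hypothetical helper)."""
    master = []
    for name, frame in sheets.items():
        # None instead of NaN so blank cells serialize as null
        frame = frame.astype(object).where(frame.notna(), None)
        rows = frame.to_dict(orient="records")
        # Assumption: tag each row with its source sheet name
        for row in rows:
            row["sheet"] = name
        master.extend(rows)
    return master


# Stand-in for pd.read_excel('input.xlsx', sheet_name=None, dtype=str)
sheets = {
    "Q1": pd.DataFrame({"region": ["North"], "total": ["100"]}),
    "Q2": pd.DataFrame({"region": ["South"], "total": [None]}),
}
combined = sheets_to_records(sheets)
print(json.dumps(combined, indent=2))
```

With a real workbook, the explicit ExcelFile form reads: with pd.ExcelFile('input.xlsx') as xls: sheets = {n: pd.read_excel(xls, sheet_name=n, dtype=str) for n in xls.sheet_names}.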