Remove a Password from a PDF with Python
When you open a password-protected PDF programmatically without calling reader.decrypt(), pypdf raises pypdf.errors.FileNotDecryptedError before you can read, merge, or process any content. The fix is to decrypt the document in memory and write a clean, unencrypted copy; the original file is left untouched.
Authorized use only. This technique is for removing passwords from documents you own or have explicit permission to modify — files you encrypted yourself, documents issued to you with a known password, or organizational PDFs where your role grants access. Bypassing password protection on documents you do not own is a legal and ethical violation.
Root Cause
pypdf does not automatically prompt for a password. Opening an encrypted file succeeds (the PdfReader object is created), but accessing .pages before calling .decrypt() raises:
pypdf.errors.FileNotDecryptedError: File has not been decrypted
If you call .decrypt() with the wrong password it returns 0 without raising an exception — subsequent .pages access then raises FileNotDecryptedError anyway. The silent return value is the common trap.
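One way to close this trap is to wrap the call so that a return value of 0 becomes an exception. The following is a minimal sketch, not part of pypdf: `unlock` is a hypothetical helper name, and it assumes only that the reader exposes `is_encrypted` and `decrypt()` as described above.

```python
def unlock(reader, password: str) -> str:
    """Decrypt `reader` and report which password matched.

    `unlock` is a hypothetical helper, not a pypdf function. It relies only
    on the reader's `is_encrypted` attribute and the value decrypt() returns.
    """
    if not reader.is_encrypted:
        return "none"  # nothing to decrypt
    # decrypt() returns 0 (no match), 1 (user password) or 2 (owner password)
    result = int(reader.decrypt(password))
    if result == 0:
        raise ValueError("Incorrect password: decrypt() returned 0")
    return "owner" if result == 2 else "user"


# Intended use with a real file (requires pypdf):
#   from pypdf import PdfReader
#   reader = PdfReader("protected.pdf")
#   matched = unlock(reader, password)
```

Raising at the point of the failed match keeps the error next to its cause, instead of surfacing later as FileNotDecryptedError on the first .pages access.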
Minimal Diagnostic
# pip install pypdf
from pathlib import Path
from pypdf import PdfReader
ENCRYPTED = Path("protected.pdf")
try:
    reader = PdfReader(ENCRYPTED)
    print(f"Encrypted: {reader.is_encrypted}")
    # Accessing .pages before decrypt() raises FileNotDecryptedError
    # print(len(reader.pages))  # would raise if encrypted
except FileNotFoundError:
    print(f"File not found: {ENCRYPTED}")
If is_encrypted is True, the file needs decrypt() before any page-level operation. If it is False, no password removal is required: the pages are already readable.
Fix: Decrypt and Save a Clean Copy
# pip install "pypdf>=3.17"
import os
from pathlib import Path
from pypdf import PdfReader, PdfWriter
from pypdf.errors import FileNotDecryptedError
ENCRYPTED = Path("protected.pdf")
DECRYPTED = Path("decrypted.pdf")
def remove_pdf_password(
    source: Path,
    output: Path,
    password: str,
) -> None:
    """
    Open an encrypted PDF, decrypt it, and write an unencrypted copy.
    Only use this on documents you own or are authorized to access.
    """
    if not source.exists():
        raise FileNotFoundError(f"Source not found: {source}")
    try:
        reader = PdfReader(source)
        if not reader.is_encrypted:
            # Already unencrypted: copy the pages as-is
            print(f"{source.name} is not encrypted; copying without changes")
            writer = PdfWriter()
            for page in reader.pages:
                writer.add_page(page)
        else:
            # decrypt() returns: 0 = wrong password, 1 = user pw, 2 = owner pw
            result = reader.decrypt(password)  # call decrypt() before .pages
            if result == 0:
                raise ValueError(f"Incorrect password for {source.name}")
            # Copy pages into a fresh writer; a new writer has no encryption dict
            writer = PdfWriter()
            for page in reader.pages:
                writer.add_page(page)  # pages are now accessible
        output.parent.mkdir(parents=True, exist_ok=True)
        with open(output, "wb") as fh:
            writer.write(fh)  # output file has no encryption
        print(f"Decrypted: {output}")
    except FileNotDecryptedError:
        # Raised if .pages is accessed before a successful decrypt()
        print("FileNotDecryptedError: call reader.decrypt(password) before reading pages")
        raise
    except ValueError:
        # Wrong password: let the caller see the original error
        raise
    except Exception as exc:
        raise RuntimeError(f"Failed to remove password from {source.name}: {exc}") from exc
if __name__ == "__main__":
    remove_pdf_password(
        ENCRYPTED,
        DECRYPTED,
        password=os.environ["PDF_USER_PW"],  # pull from environment, never hardcode
    )
The key line is reader.decrypt(password) called immediately after opening. The returned integer tells you which type of password matched — always check it before proceeding. Copying pages into a fresh PdfWriter (one that has never had .encrypt() called on it) produces a file with no encryption dictionary.
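The script above reads the password with os.environ["PDF_USER_PW"], which fails with a bare KeyError when the variable is unset. A small wrapper can turn that into a clearer message. This is only a sketch: `password_from_env` is a hypothetical helper, and the optional `environ` parameter exists purely so the function can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def password_from_env(
    name: str = "PDF_USER_PW",
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the PDF password stored in an environment variable.

    Hypothetical helper: raises RuntimeError with a readable message when
    the variable is missing or empty, instead of a bare KeyError.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value:
        raise RuntimeError(f"Set the {name} environment variable to the PDF password")
    return value
```

Called as `password_from_env()`, it reads the same PDF_USER_PW variable used throughout this page.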
Variant: Check Encryption Algorithm Before Decrypting
pypdf parses the encryption dictionary before decryption, which lets you log the cipher in use or branch on AES vs RC4. The attribute used here is internal, so treat this as a diagnostic aid rather than a stable API:
# pip install "pypdf>=3.17"
from pathlib import Path
from pypdf import PdfReader
ENCRYPTED = Path("protected.pdf")
def inspect_encryption(source: Path) -> None:
    reader = PdfReader(source)
    if not reader.is_encrypted:
        print("Not encrypted")
        return
    # ._encryption is an internal attribute: read-only inspection is fine,
    # but its layout (including .entry) may differ between pypdf releases
    enc = getattr(reader, "_encryption", None)
    entry = getattr(enc, "entry", None)
    if entry is not None:
        print(f"Filter     : {entry.get('/Filter', 'unknown')}")
        print(f"V (version): {entry.get('/V', '?')}")
        print(f"Length     : {entry.get('/Length', '?')} bits")
        # V=1 or V=2 means RC4; V=5 means AES-256.
        # V=4 uses crypt filters: usually AES-128, but RC4 is also possible.
        v = entry.get("/V", 0)
        cipher = "AES (likely)" if v >= 4 else "RC4"
        print(f"Cipher     : {cipher} (V={v})")
    else:
        print("Encryption metadata not exposed by this pypdf version")
if __name__ == "__main__":
    inspect_encryption(ENCRYPTED)
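RC4-encrypted files (V=1 or V=2) decrypt with the same reader.decrypt() call; pypdf handles both cipher types transparently, although AES-encrypted files additionally need a crypto backend installed (for example, pip install "pypdf[crypto]"). The distinction matters if you are auditing legacy documents for compliance.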
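For an audit, the /V number alone can be too coarse, because a V=4 dictionary selects its method through a crypt filter. The sketch below classifies a plain mapping that mirrors the /Encrypt dictionary. `describe_cipher` is a hypothetical helper, and the key names (/StmF, /CF, /CFM) and method names (/V2, /AESV2, /AESV3) follow the PDF specification as I understand it. Passing a pypdf dictionary object directly is an assumption; nested values may first need resolving to plain dictionaries.

```python
from typing import Any, Mapping


def describe_cipher(enc: Mapping[str, Any]) -> str:
    """Best-effort cipher name from a mapping shaped like a PDF /Encrypt dict.

    Hypothetical helper for auditing; it does not decrypt anything.
    """
    v = int(enc.get("/V", 0))
    if v in (1, 2):
        return "RC4"
    if v in (4, 5):
        # /StmF names the crypt filter applied to streams; that filter's
        # /CFM entry names the actual method.
        stream_filter = enc.get("/StmF", "/Identity")
        method = enc.get("/CF", {}).get(stream_filter, {}).get("/CFM", "")
        names = {"/V2": "RC4", "/AESV2": "AES-128", "/AESV3": "AES-256"}
        return names.get(str(method), f"unknown (V={v})")
    return f"unknown (V={v})"


# Example input shaped like an AES-128 encryption dictionary
example = {"/V": 4, "/StmF": "/StdCF", "/CF": {"/StdCF": {"/CFM": "/AESV2"}}}
print(describe_cipher(example))
```

Anything the function cannot classify is reported as unknown rather than guessed, which is the safer default for a compliance report.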
Variant: Batch Decrypt a Directory
# pip install "pypdf>=3.17"
import os
from pathlib import Path
from pypdf import PdfReader, PdfWriter
from pypdf.errors import FileNotDecryptedError
INPUT_DIR = Path("./locked")
OUTPUT_DIR = Path("./unlocked")
def batch_decrypt(source_dir: Path, output_dir: Path, password: str) -> None:
    output_dir.mkdir(parents=True, exist_ok=True)
    pdfs = sorted(source_dir.glob("*.pdf"))
    if not pdfs:
        print(f"No PDFs in {source_dir}")
        return
    ok, skipped, failed = 0, 0, 0
    for pdf in pdfs:
        out = output_dir / pdf.name
        try:
            reader = PdfReader(pdf)
            if not reader.is_encrypted:
                skipped += 1
                print(f"  SKIP (not encrypted): {pdf.name}")
                continue
            result = reader.decrypt(password)
            if result == 0:
                failed += 1
                print(f"  FAIL (wrong password): {pdf.name}")
                continue
            writer = PdfWriter()
            for page in reader.pages:
                writer.add_page(page)
            with open(out, "wb") as fh:
                writer.write(fh)
            ok += 1
            print(f"  OK: {pdf.name}")
        except FileNotDecryptedError:
            failed += 1
            print(f"  FAIL (FileNotDecryptedError): {pdf.name}")
        except Exception as exc:
            failed += 1
            print(f"  ERR {pdf.name}: {exc}")
    print(f"\nDone: {ok} decrypted, {skipped} skipped, {failed} failed")
if __name__ == "__main__":
    batch_decrypt(INPUT_DIR, OUTPUT_DIR, password=os.environ["PDF_USER_PW"])
Log failures to a CSV for manual review rather than halting mid-batch — one corrupt or wrongly-passworded file should not block the rest.
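The batch function above only prints failures. A CSV log could be produced along these lines, using the standard library only. `write_failure_log` and the two-column layout (file, reason) are assumptions made for illustration, not something the batch script already provides.

```python
import csv
from pathlib import Path
from typing import Iterable, Tuple


def write_failure_log(failures: Iterable[Tuple[str, str]], log_path: Path) -> int:
    """Write (file name, reason) pairs to a CSV file and return the row count.

    Hypothetical helper: the column names are an illustrative choice.
    """
    rows = list(failures)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings on every platform
    with open(log_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "reason"])
        writer.writerows(rows)
    return len(rows)
```

To use it, append a tuple such as `(pdf.name, "wrong password")` to a list in each failure branch of the loop, then call `write_failure_log(failures, Path("failures.csv"))` once after the loop finishes.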
Verification
After writing the decrypted file, assert it is no longer encrypted and page count matches:
# pip install "pypdf>=3.17"
import os
from pathlib import Path
from pypdf import PdfReader
def verify_decrypted(original: Path, decrypted: Path) -> bool:
    """Confirm the output is not encrypted and has the same page count."""
    try:
        orig_reader = PdfReader(original)
        if orig_reader.is_encrypted:
            # Provide the password only to count pages from the encrypted source
            orig_reader.decrypt(os.environ.get("PDF_USER_PW", ""))
        original_pages = len(orig_reader.pages)
        dec_reader = PdfReader(decrypted)
        if dec_reader.is_encrypted:
            print(f"FAIL: {decrypted.name} is still encrypted")
            return False
        if len(dec_reader.pages) != original_pages:
            print(f"FAIL: page count mismatch ({len(dec_reader.pages)} vs {original_pages})")
            return False
        print(f"PASS: {decrypted.name}: not encrypted, {len(dec_reader.pages)} pages")
        return True
    except Exception as exc:
        # Includes FileNotDecryptedError when the password above is wrong
        print(f"ERROR: {exc}")
        return False
For pipelines that feed decrypted output into table extraction (see Extracting Tables from PDFs) or text parsers, run this check before handing off, so that a still-encrypted or truncated output is caught before a downstream tool fails on it.
Preserving Metadata When Decrypting
Copying pages with add_page() transfers visual content but not the /Info metadata dictionary (author, title, subject, keywords) or the document outline (bookmarks). Preserve them explicitly when they matter:
# pip install "pypdf>=3.17"
import os
from pathlib import Path
from pypdf import PdfReader, PdfWriter
def decrypt_preserve_metadata(source: Path, output: Path, password: str) -> None:
    """Decrypt and write a clean copy, preserving the /Info metadata."""
    reader = PdfReader(source)
    if not reader.is_encrypted:
        raise ValueError(f"{source.name} is not encrypted")
    result = reader.decrypt(password)
    if result == 0:
        raise ValueError(f"Wrong password for {source.name}")
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    # Copy /Info metadata if present (bookmarks are not copied by add_page)
    if reader.metadata:
        writer.add_metadata(dict(reader.metadata))
    output.parent.mkdir(parents=True, exist_ok=True)
    with open(output, "wb") as fh:
        writer.write(fh)
    print(f"Decrypted (metadata preserved): {output}")
if __name__ == "__main__":
    decrypt_preserve_metadata(
        Path("protected.pdf"),
        Path("decrypted.pdf"),
        password=os.environ["PDF_USER_PW"],
    )
If the encrypted source has a complex outline (nested bookmarks, named destinations), use writer.clone_document_from_reader(reader) instead of iterating add_page() — it copies the full document tree including the outline.
Downstream Use: Feeding Decrypted PDFs into Parsers
The most common reason to remove a password is to make the file readable by extraction tools. Table-extraction libraries such as pdfplumber and camelot (see Extracting Tables from PDFs) do accept a password argument, but decrypting once up front means no later stage of the pipeline has to handle the password. After decrypting to a clean file or an io.BytesIO buffer, pass the result to the parser as you would any unencrypted PDF:
# pip install "pypdf>=3.17" pdfplumber
import io
import os
from pathlib import Path
from pypdf import PdfReader, PdfWriter
import pdfplumber
def decrypt_to_buffer(source: Path, password: str) -> io.BytesIO:
    """Decrypt a PDF into a BytesIO buffer (no intermediate file on disk)."""
    reader = PdfReader(source)
    if reader.is_encrypted:
        if reader.decrypt(password) == 0:
            raise ValueError("Wrong password")
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    buf = io.BytesIO()
    writer.write(buf)
    buf.seek(0)
    return buf
if __name__ == "__main__":
    buf = decrypt_to_buffer(Path("protected.pdf"), os.environ["PDF_USER_PW"])
    with pdfplumber.open(buf) as pdf:
        for i, page in enumerate(pdf.pages):
            tables = page.extract_tables()
            print(f"Page {i+1}: {len(tables)} table(s)")
Using io.BytesIO avoids writing a temporary unencrypted file to disk, which matters in environments where the working directory is logged or audited.
Common Mistakes
| Symptom | Root cause | Fix |
|---|---|---|
| FileNotDecryptedError when accessing .pages | .decrypt() not called, or called and returned 0 | Check reader.is_encrypted; call reader.decrypt(pw) and verify return value ≠ 0 |
| decrypt() returns 0 silently | Wrong password supplied | Double-check password; catch the 0 return explicitly and raise ValueError |
| Output file is still encrypted | PdfWriter was created but .encrypt() was called on it inadvertently, or you re-opened the input by mistake | Ensure the writer is freshly instantiated with no .encrypt() call |
| Metadata stripped from output | Copying pages with add_page() does not carry /Info dict | Use writer.add_metadata(reader.metadata) before writing if you need to preserve author, title, etc. |
| Page count changes after decrypt | Some encrypted PDFs embed additional pages as annotations | Compare using len(reader.pages) on original (after decrypt) vs output |
Frequently Asked Questions
Does removing a password change the visual content of the PDF?
No. Decryption is a pure cryptographic operation on the stream encoding. Text, images, fonts, and layout are unchanged.
What if I only have the owner password, not the user password?
Pass the owner password to reader.decrypt(). It succeeds with return value 2 (owner match), which grants full access. The same function accepts either password type.
Can I re-encrypt with a new password immediately after decrypting?
Yes — after copying pages to a fresh PdfWriter, call .encrypt() on that writer before saving. See Add Password Protection to PDF Files for the full re-encryption pattern.
What about PDF files encrypted with certificate-based (public-key) security rather than a password?
pypdf does not support certificate-based decryption. Use pikepdf with the appropriate private key PEM for those cases.
Related
- Watermarking and Securing PDFs — full overview of overlays, AES-256 encryption, and permission flags
- Add Password Protection to PDF Files — encrypt a PDF with user and owner passwords
- Merging and Splitting PDF Documents — decrypt before merging encrypted source files