Remove a Password from a PDF with Python

When you open a password-protected PDF programmatically and touch its pages without first calling reader.decrypt(), pypdf raises pypdf.errors.FileNotDecryptedError, so you cannot read, merge, or process any content. The fix is to decrypt the document in memory and write an unencrypted copy to a new file.

Authorized use only. This technique is for removing passwords from documents you own or have explicit permission to modify — files you encrypted yourself, documents issued to you with a known password, or organizational PDFs where your role grants access. Bypassing password protection on documents you do not own is a legal and ethical violation.

Root Cause

pypdf does not automatically prompt for a password. Opening an encrypted file succeeds (the PdfReader object is created), but accessing .pages before calling .decrypt() raises:

pypdf.errors.FileNotDecryptedError: File has not been decrypted

If you call .decrypt() with the wrong password it returns 0 without raising an exception — subsequent .pages access then raises FileNotDecryptedError anyway. The silent return value is the common trap.
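
One way to make the silent 0 hard to miss is to wrap the call in a small guard. The sketch below is illustrative only: DecryptResult and decrypt_or_raise are hypothetical names, not part of pypdf, and the helper assumes nothing about the reader beyond a decrypt() method that returns the 0/1/2 codes described above.

```python
from enum import IntEnum


class DecryptResult(IntEnum):
    """Hypothetical labels for the integer codes decrypt() returns."""
    NO_MATCH = 0
    USER_PASSWORD = 1
    OWNER_PASSWORD = 2


def decrypt_or_raise(reader, password: str) -> DecryptResult:
    """Call reader.decrypt() and turn a 0 result into an exception."""
    # int() keeps this working whether decrypt() returns a plain int or an int-like enum
    result = DecryptResult(int(reader.decrypt(password)))
    if result is DecryptResult.NO_MATCH:
        raise ValueError("Incorrect password: decrypt() returned 0")
    return result
```

Because the helper raises on 0, callers can go straight to reader.pages after it returns, and the returned label still says whether the user or the owner password matched.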

Minimal Diagnostic

# pip install pypdf
from pathlib import Path
from pypdf import PdfReader

ENCRYPTED = Path("protected.pdf")

try:
    reader = PdfReader(ENCRYPTED)
    print(f"Encrypted: {reader.is_encrypted}")
    # Accessing .pages before decrypt() raises FileNotDecryptedError
    # print(len(reader.pages))  # would raise if encrypted
except FileNotFoundError:
    print(f"File not found: {ENCRYPTED}")

If is_encrypted is True, the file needs decrypt() before any page-level operation. If it is False, no password removal is required and the pages are readable as they are.

Fix: Decrypt and Save a Clean Copy

# pip install "pypdf>=3.17"
from pathlib import Path
from pypdf import PdfReader, PdfWriter
from pypdf.errors import FileNotDecryptedError

ENCRYPTED = Path("protected.pdf")
DECRYPTED = Path("decrypted.pdf")
# The password is read from the PDF_USER_PW environment variable in __main__ below


def remove_pdf_password(
    source: Path,
    output: Path,
    password: str,
) -> None:
    """
    Open an encrypted PDF, decrypt it, and write an unencrypted copy.
    Only use this on documents you own or are authorized to access.
    """
    if not source.exists():
        raise FileNotFoundError(f"Source not found: {source}")

    try:
        reader = PdfReader(source)

        if not reader.is_encrypted:
            # Already unencrypted — copy as-is
            print(f"{source.name} is not encrypted; copying without changes")
            writer = PdfWriter()
            for page in reader.pages:
                writer.add_page(page)
        else:
            # Decrypt returns: 0 = wrong password, 1 = user pw, 2 = owner pw
            result = reader.decrypt(password)   # call decrypt() before .pages
            if result == 0:
                raise ValueError(f"Incorrect password for {source.name}")

            # Copy pages into a fresh writer — new writer has no encryption dict
            writer = PdfWriter()
            for page in reader.pages:
                writer.add_page(page)           # pages are now accessible

        output.parent.mkdir(parents=True, exist_ok=True)
        with open(output, "wb") as fh:
            writer.write(fh)                    # output file has no encryption
        print(f"Decrypted: {output}")

    except FileNotDecryptedError:
        # Raised if .pages is accessed before a successful decrypt()
        print("FileNotDecryptedError: call reader.decrypt(password) before reading pages")
        raise
    except ValueError:
        raise
    except Exception as exc:
        raise RuntimeError(f"Failed to remove password from {source.name}: {exc}") from exc


if __name__ == "__main__":
    import os
    remove_pdf_password(
        ENCRYPTED,
        DECRYPTED,
        password=os.environ["PDF_USER_PW"],   # pull from environment, never hardcode
    )

The key line is reader.decrypt(password) called immediately after opening. The returned integer tells you which type of password matched — always check it before proceeding. Copying pages into a fresh PdfWriter (one that has never had .encrypt() called on it) produces a file with no encryption dictionary.
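
The __main__ block above reads the password with os.environ["PDF_USER_PW"], which fails with a bare KeyError when the variable is unset. A minimal sketch of a friendlier lookup follows; get_pdf_password and its environ parameter are hypothetical names introduced for illustration, and rejecting an empty value is an assumption that suits scripts where a real password is always expected.

```python
import os
from typing import Mapping, Optional


def get_pdf_password(
    var_name: str = "PDF_USER_PW",
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the password stored in an environment variable, or fail clearly."""
    # Accept an explicit mapping so the lookup can be exercised without touching os.environ
    env = os.environ if environ is None else environ
    password = env.get(var_name, "")
    if not password:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running this script"
        )
    return password
```

With this in place the call becomes remove_pdf_password(ENCRYPTED, DECRYPTED, password=get_pdf_password()).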

Variant: Check Encryption Algorithm Before Decrypting

pypdf exposes the encryption metadata before decryption, which lets you log the cipher in use or branch on AES vs RC4:

# pip install "pypdf>=3.17"
from pathlib import Path
from pypdf import PdfReader

ENCRYPTED = Path("protected.pdf")


def inspect_encryption(source: Path) -> None:
    reader = PdfReader(source)
    if not reader.is_encrypted:
        print("Not encrypted")
        return

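    # ._encryption and .entry are pypdf internals, not public API; they can
    # change between releases, so look them up defensively and only read them
    enc = getattr(reader, "_encryption", None)
    entry = getattr(enc, "entry", None)
    if entry is not None:
        print(f"Filter     : {entry.get('/Filter', 'unknown')}")
        print(f"V (version): {entry.get('/V', '?')}")
        print(f"Length     : {entry.get('/Length', '?')} bits")
        # V=5 → AES-256; V=4 → crypt filters (AES-128 or RC4); V=1 or V=2 → RC4
        v = entry.get("/V", 0)
        if v >= 5:
            cipher = "AES-256"
        elif v == 4:
            cipher = "AES-128 or RC4 (set by the /CF crypt filter)"
        else:
            cipher = "RC4"
        print(f"Cipher     : {cipher} (V={v})")
    else:
        print("Encryption metadata not exposed by this pypdf version")

RC4-encrypted files (V=1 or V=2, and some V=4 files) decrypt with the same reader.decrypt() call as AES files; pypdf handles both cipher types transparently, although AES files need an AES backend such as the cryptography package installed. The distinction matters if you are auditing legacy documents for compliance.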
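
The /V value alone does not always identify the cipher: in the PDF specification, V=4 hands the choice to a crypt filter whose /CFM entry is /V2 for RC4 or /AESV2 for AES-128, while V=5 means AES-256. The sketch below encodes that mapping as a pure function. describe_cipher is a hypothetical helper that takes plain values, so it does not depend on any pypdf internals; how you obtain the /CFM value from a given file is left open.

```python
from typing import Optional


def describe_cipher(v: int, cfm: Optional[str] = None) -> str:
    """Map the /V value (and, for V=4, the crypt filter method) to a cipher name."""
    if v in (1, 2):
        return "RC4"
    if v == 4:
        # V=4 delegates the choice to a crypt filter: /V2 is RC4, /AESV2 is AES-128
        return {"/V2": "RC4", "/AESV2": "AES-128"}.get(cfm or "", "unknown (check /CFM)")
    if v == 5:
        return "AES-256"
    return "unknown"
```

For an audit report, a call such as describe_cipher(4, "/AESV2") gives a label that is safe to log, whereas a bare V=4 is deliberately reported as unknown rather than guessed.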

Variant: Batch Decrypt a Directory

# pip install "pypdf>=3.17"
import os
from pathlib import Path
from pypdf import PdfReader, PdfWriter
from pypdf.errors import FileNotDecryptedError

INPUT_DIR  = Path("./locked")
OUTPUT_DIR = Path("./unlocked")


def batch_decrypt(source_dir: Path, output_dir: Path, password: str) -> None:
    output_dir.mkdir(parents=True, exist_ok=True)
    pdfs = sorted(source_dir.glob("*.pdf"))
    if not pdfs:
        print(f"No PDFs in {source_dir}")
        return

    ok, skipped, failed = 0, 0, 0
    for pdf in pdfs:
        out = output_dir / pdf.name
        try:
            reader = PdfReader(pdf)
            if not reader.is_encrypted:
                skipped += 1
                print(f"  SKIP (not encrypted): {pdf.name}")
                continue

            result = reader.decrypt(password)
            if result == 0:
                failed += 1
                print(f"  FAIL (wrong password): {pdf.name}")
                continue

            writer = PdfWriter()
            for page in reader.pages:
                writer.add_page(page)
            with open(out, "wb") as fh:
                writer.write(fh)
            ok += 1
            print(f"  OK: {pdf.name}")

        except FileNotDecryptedError:
            failed += 1
            print(f"  FAIL (FileNotDecryptedError): {pdf.name}")
        except Exception as exc:
            failed += 1
            print(f"  ERR {pdf.name}: {exc}")

    print(f"\nDone: {ok} decrypted, {skipped} skipped, {failed} failed")


if __name__ == "__main__":
    batch_decrypt(INPUT_DIR, OUTPUT_DIR, password=os.environ["PDF_USER_PW"])

Log failures to a CSV for manual review rather than halting mid-batch — one corrupt or wrongly-passworded file should not block the rest.
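
The batch function above only prints failures. A minimal sketch of the CSV log follows, using only the standard library; write_failure_log and the two column names are assumptions made for illustration, not part of the original script.

```python
import csv
from pathlib import Path
from typing import Iterable, Tuple


def write_failure_log(failures: Iterable[Tuple[str, str]], log_path: Path) -> int:
    """Write (filename, reason) pairs to a CSV file and return the row count."""
    rows = list(failures)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings on every platform
    with open(log_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["filename", "reason"])
        writer.writerows(rows)
    return len(rows)
```

To wire it in, collect a list inside batch_decrypt, append a pair such as (pdf.name, "wrong password") in each failure branch, and call write_failure_log once after the loop finishes.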

Verification

After writing the decrypted file, assert it is no longer encrypted and page count matches:

# pip install "pypdf>=3.17"
import os
from pathlib import Path
from pypdf import PdfReader


def verify_decrypted(original: Path, decrypted: Path) -> bool:
    """Confirm the output is not encrypted and has the same page count."""
    try:
        orig_reader = PdfReader(original)
        if orig_reader.is_encrypted:
            # The password is needed only to count pages in the encrypted source
            if orig_reader.decrypt(os.environ.get("PDF_USER_PW", "")) == 0:
                print(f"FAIL: wrong password for {original.name}; cannot compare page counts")
                return False
        original_pages = len(orig_reader.pages)

        dec_reader = PdfReader(decrypted)
        if dec_reader.is_encrypted:
            print(f"FAIL: {decrypted.name} is still encrypted")
            return False

        decrypted_pages = len(dec_reader.pages)
        if decrypted_pages != original_pages:
            print(f"FAIL: page count mismatch ({decrypted_pages} vs {original_pages})")
            return False

        print(f"PASS: {decrypted.name} is not encrypted, {decrypted_pages} pages")
        return True
    except Exception as exc:
        print(f"ERROR: {exc}")
        return False

For pipelines that feed decrypted output into table extraction (see Extracting Tables from PDFs) or text parsers, run this check before handing off; it confirms that the copy is unencrypted and complete before downstream tools see it.

Preserving Metadata When Decrypting

Copying pages with add_page() transfers visual content but not the /Info metadata dictionary (author, title, subject, keywords) or the document outline (bookmarks). Preserve them explicitly when they matter:

# pip install "pypdf>=3.17"
from pathlib import Path
from pypdf import PdfReader, PdfWriter
import os


def decrypt_preserve_metadata(source: Path, output: Path, password: str) -> None:
    """Decrypt and write a clean copy, preserving /Info metadata (bookmarks are not copied)."""
    reader = PdfReader(source)
    if not reader.is_encrypted:
        raise ValueError(f"{source.name} is not encrypted")

    result = reader.decrypt(password)
    if result == 0:
        raise ValueError(f"Wrong password for {source.name}")

    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)

    # Copy /Info metadata if present
    if reader.metadata:
        writer.add_metadata(dict(reader.metadata))

    output.parent.mkdir(parents=True, exist_ok=True)
    with open(output, "wb") as fh:
        writer.write(fh)
    print(f"Decrypted (metadata preserved): {output}")


if __name__ == "__main__":
    decrypt_preserve_metadata(
        Path("protected.pdf"),
        Path("decrypted.pdf"),
        password=os.environ["PDF_USER_PW"],
    )

If the encrypted source has a complex outline (nested bookmarks, named destinations), use writer.clone_document_from_reader(reader) instead of iterating add_page() — it copies the full document tree including the outline.

Downstream Use: Feeding Decrypted PDFs into Parsers

The most common reason to remove a password is to make the file readable by extraction tools. Table-extraction libraries such as pdfplumber and camelot (see Extracting Tables from PDFs) accept a password argument of their own, but decrypting once up front means every downstream tool sees a plain file and the password is handled in one place. After decrypting to a clean file (or io.BytesIO buffer), pass the output path or buffer normally:

# pip install "pypdf>=3.17" pdfplumber
import io, os
from pathlib import Path
from pypdf import PdfReader, PdfWriter
import pdfplumber


def decrypt_to_buffer(source: Path, password: str) -> io.BytesIO:
    """Decrypt a PDF into a BytesIO buffer — no intermediate file on disk."""
    reader = PdfReader(source)
    if reader.is_encrypted:
        if reader.decrypt(password) == 0:
            raise ValueError("Wrong password")
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    buf = io.BytesIO()
    writer.write(buf)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    buf = decrypt_to_buffer(Path("protected.pdf"), os.environ["PDF_USER_PW"])
    with pdfplumber.open(buf) as pdf:
        for i, page in enumerate(pdf.pages):
            tables = page.extract_tables()
            print(f"Page {i+1}: {len(tables)} table(s)")

Using io.BytesIO avoids writing a temporary unencrypted file to disk, which matters in environments where the working directory is logged or audited.

Common Mistakes

Symptom	Root cause	Fix
`FileNotDecryptedError` when accessing `.pages`	`.decrypt()` not called, or called and returned `0`	Check `reader.is_encrypted`; call `reader.decrypt(pw)` and verify return value ≠ 0
`decrypt()` returns `0` silently	Wrong password supplied	Double-check password; catch the `0` return explicitly and raise `ValueError`
Output file is still encrypted	`PdfWriter` was created but `.encrypt()` was called on it inadvertently, or you re-opened the input by mistake	Ensure the writer is freshly instantiated with no `.encrypt()` call
Metadata stripped from output	Copying pages with `add_page()` does not carry `/Info` dict	Use `writer.add_metadata(reader.metadata)` before writing if you need to preserve author, title, etc.
Page count changes after decrypt	Some encrypted PDFs embed additional pages as annotations	Compare using `len(reader.pages)` on original (after decrypt) vs output

Frequently Asked Questions

Does removing a password change the visual content of the PDF? No. Decryption is a pure cryptographic operation on the stream encoding. Text, images, fonts, and layout are unchanged.

What if I only have the owner password, not the user password? Pass the owner password to reader.decrypt(). It succeeds with return value 2 (owner match), which grants full access. The same function accepts either password type.

Can I re-encrypt with a new password immediately after decrypting? Yes — after copying pages to a fresh PdfWriter, call .encrypt() on that writer before saving. See Add Password Protection to PDF Files for the full re-encryption pattern.

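What about PDF files encrypted with certificate-based (public-key) security rather than a password? reader.decrypt() in pypdf handles only password-based encryption, so it cannot open these files. They need a tool that implements the public-key security handler and has access to the recipient's private key; pyHanko is one Python library that documents support for this.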

Watermarking and Securing PDFs — full overview of overlays, AES-256 encryption, and permission flags
Add Password Protection to PDF Files — encrypt a PDF with user and owner passwords
Merging and Splitting PDF Documents — decrypt before merging encrypted source files
