Remove a Password from a PDF with Python

When you open a password-protected PDF with pypdf and access its pages without first calling reader.decrypt(), pypdf raises pypdf.errors.FileNotDecryptedError before you can read, merge, or process any content. The fix is to decrypt the document in memory and write an unencrypted copy; the original file is left untouched.

Authorized use only. This technique is for removing passwords from documents you own or have explicit permission to modify — files you encrypted yourself, documents issued to you with a known password, or organizational PDFs where your role grants access. Bypassing password protection on documents you do not own is a legal and ethical violation.

Root Cause

pypdf does not automatically prompt for a password. Opening an encrypted file succeeds (the PdfReader object is created), but accessing .pages before calling .decrypt() raises:

pypdf.errors.FileNotDecryptedError: File has not been decrypted

If you call .decrypt() with the wrong password, it returns 0 (PasswordType.NOT_DECRYPTED) without raising an exception, and the next .pages access raises FileNotDecryptedError anyway. That silent return value is the common trap.
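One way to make that trap hard to miss is to wrap the call so that a 0 result raises immediately. The helper below is a minimal sketch: decrypt_or_raise is a hypothetical name, not part of pypdf, and it assumes only that the reader object has a decrypt(password) method returning 0, 1, or 2 as described above.

```python
def decrypt_or_raise(reader, password: str) -> str:
    """Decrypt `reader` and report which password matched.

    `reader` is assumed to behave like pypdf.PdfReader: decrypt() returns
    0 for no match, 1 for the user password, 2 for the owner password.
    Raises ValueError instead of letting a 0 pass silently.
    """
    result = int(reader.decrypt(password))
    if result == 0:
        raise ValueError("decrypt() returned 0: the password did not match")
    return "owner" if result == 2 else "user"
```

With a real file this would be called as decrypt_or_raise(PdfReader("protected.pdf"), password) before any .pages access.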

Minimal Diagnostic

# pip install pypdf
from pathlib import Path
from pypdf import PdfReader

ENCRYPTED = Path("protected.pdf")

try:
    reader = PdfReader(ENCRYPTED)
    print(f"Encrypted: {reader.is_encrypted}")
    # Accessing .pages before decrypt() raises FileNotDecryptedError
    # print(len(reader.pages))  # would raise if encrypted
except FileNotFoundError:
    print(f"File not found: {ENCRYPTED}")

If is_encrypted is True, the file needs decrypt() before any page-level operation. If it is False, no password removal is required and the pages can be read directly.

Fix: Decrypt and Save a Clean Copy

# pip install "pypdf>=3.17"
from pathlib import Path
from pypdf import PdfReader, PdfWriter
from pypdf.errors import FileNotDecryptedError

ENCRYPTED = Path("protected.pdf")
DECRYPTED = Path("decrypted.pdf")
# The password is read from the PDF_USER_PW environment variable in __main__.


def remove_pdf_password(
    source: Path,
    output: Path,
    password: str,
) -> None:
    """
    Open an encrypted PDF, decrypt it, and write an unencrypted copy.
    Only use this on documents you own or are authorized to access.
    """
    if not source.exists():
        raise FileNotFoundError(f"Source not found: {source}")

    try:
        reader = PdfReader(source)

        if not reader.is_encrypted:
            # Already unencrypted — copy as-is
            print(f"{source.name} is not encrypted; copying without changes")
            writer = PdfWriter()
            for page in reader.pages:
                writer.add_page(page)
        else:
            # Decrypt returns: 0 = wrong password, 1 = user pw, 2 = owner pw
            result = reader.decrypt(password)   # call decrypt() before .pages
            if result == 0:
                raise ValueError(f"Incorrect password for {source.name}")

            # Copy pages into a fresh writer — new writer has no encryption dict
            writer = PdfWriter()
            for page in reader.pages:
                writer.add_page(page)           # pages are now accessible

        output.parent.mkdir(parents=True, exist_ok=True)
        with open(output, "wb") as fh:
            writer.write(fh)                    # output file has no encryption
        print(f"Decrypted: {output}")

    except FileNotDecryptedError:
        # Raised if .pages is accessed before a successful decrypt()
        print("FileNotDecryptedError: call reader.decrypt(password) before reading pages")
        raise
    except ValueError:
        raise
    except Exception as exc:
        raise RuntimeError(f"Failed to remove password from {source.name}: {exc}") from exc


if __name__ == "__main__":
    import os
    remove_pdf_password(
        ENCRYPTED,
        DECRYPTED,
        password=os.environ["PDF_USER_PW"],   # pull from environment, never hardcode
    )

The key line is reader.decrypt(password) called immediately after opening. The returned integer tells you which type of password matched — always check it before proceeding. Copying pages into a fresh PdfWriter (one that has never had .encrypt() called on it) produces a file with no encryption dictionary.

Variant: Check Encryption Algorithm Before Decrypting

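pypdf parses the encryption dictionary when it opens the file, so you can log the cipher in use or branch on AES vs RC4 before decrypting. The attribute read here is internal rather than public API and may change between releases: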

# pip install "pypdf>=3.17"
from pathlib import Path
from pypdf import PdfReader

ENCRYPTED = Path("protected.pdf")


def inspect_encryption(source: Path) -> None:
    reader = PdfReader(source)
    if not reader.is_encrypted:
        print("Not encrypted")
        return

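    # _encryption is internal; its layout is not guaranteed across releases.
    # Assumption: it exposes either the raw /Encrypt dict as .entry or the
    # parsed values as .V / .Length, so try both and fall back to defaults.
    enc = getattr(reader, "_encryption", None)
    if enc is None:
        print("Encryption metadata unavailable without decryption")
        return
    entry = getattr(enc, "entry", None) or {}
    v = getattr(enc, "V", entry.get("/V", 0))
    length = getattr(enc, "Length", entry.get("/Length", "?"))
    print(f"Filter     : {entry.get('/Filter', 'unknown')}")
    print(f"V (version): {v}")
    print(f"Length     : {length} bits")
    # V=1 or V=2 is RC4; V=5 is AES-256; V=4 is usually AES-128 but can
    # still select RC4 through its crypt filter.
    cipher = "RC4" if v < 4 else "AES"
    print(f"Cipher     : {cipher} (V={v})")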

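RC4-encrypted files (V=1 or V=2) decrypt with the same reader.decrypt() call as AES-encrypted ones. AES files additionally need an optional crypto backend, which pip install "pypdf[crypto]" provides; without it pypdf raises a DependencyError. The distinction also matters if you are auditing legacy documents for compliance.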
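For an audit, the /V value alone is a coarse signal, because V=4 documents choose their cipher through a crypt filter method (/CFM). The sketch below maps the two values to a cipher name following the PDF specification's definitions. classify_cipher is a hypothetical helper, and it assumes you have already read /V and /CFM out of the encryption dictionary yourself.

```python
from typing import Optional


def classify_cipher(v: int, cfm: Optional[str] = None) -> str:
    """Map /V (and, for V=4 or V=5, the crypt filter method) to a cipher name.

    v   -- the /V entry of the encryption dictionary
    cfm -- the /CFM entry of the active crypt filter, e.g. "/AESV2"
    """
    if v in (1, 2):
        return "RC4"  # V=1 is 40-bit RC4, V=2 allows longer RC4 keys
    if v == 4:
        # /V2 selects RC4, /AESV2 selects AES-128
        return {"/V2": "RC4", "/AESV2": "AES-128"}.get(cfm, "unknown")
    if v == 5 and cfm == "/AESV3":
        return "AES-256"
    return "unknown"
```

Returning "unknown" rather than guessing keeps unusual or malformed documents visible in an audit report instead of silently filing them under a cipher.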

Variant: Batch Decrypt a Directory

# pip install "pypdf>=3.17"
import os
from pathlib import Path
from pypdf import PdfReader, PdfWriter
from pypdf.errors import FileNotDecryptedError

INPUT_DIR  = Path("./locked")
OUTPUT_DIR = Path("./unlocked")


def batch_decrypt(source_dir: Path, output_dir: Path, password: str) -> None:
    output_dir.mkdir(parents=True, exist_ok=True)
    pdfs = sorted(source_dir.glob("*.pdf"))
    if not pdfs:
        print(f"No PDFs in {source_dir}")
        return

    ok, skipped, failed = 0, 0, 0
    for pdf in pdfs:
        out = output_dir / pdf.name
        try:
            reader = PdfReader(pdf)
            if not reader.is_encrypted:
                skipped += 1
                print(f"  SKIP (not encrypted): {pdf.name}")
                continue

            result = reader.decrypt(password)
            if result == 0:
                failed += 1
                print(f"  FAIL (wrong password): {pdf.name}")
                continue

            writer = PdfWriter()
            for page in reader.pages:
                writer.add_page(page)
            with open(out, "wb") as fh:
                writer.write(fh)
            ok += 1
            print(f"  OK: {pdf.name}")

        except FileNotDecryptedError:
            failed += 1
            print(f"  FAIL (FileNotDecryptedError): {pdf.name}")
        except Exception as exc:
            failed += 1
            print(f"  ERR {pdf.name}: {exc}")

    print(f"\nDone: {ok} decrypted, {skipped} skipped, {failed} failed")


if __name__ == "__main__":
    batch_decrypt(INPUT_DIR, OUTPUT_DIR, password=os.environ["PDF_USER_PW"])

Log failures to a CSV for manual review rather than halting mid-batch — one corrupt or wrongly-passworded file should not block the rest.
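The batch function above only prints failures. A minimal sketch of the CSV log could look like this; write_failure_log and the two column names are illustrative choices, not something pypdf provides. It assumes you collect (filename, reason) pairs in a list inside each failure branch of the loop.

```python
import csv
from typing import Iterable, TextIO, Tuple


def write_failure_log(failures: Iterable[Tuple[str, str]], stream: TextIO) -> int:
    """Write (filename, reason) pairs as CSV rows and return the row count.

    `stream` is any text stream opened for writing; open real files with
    newline="" so the csv module controls line endings.
    """
    rows = list(failures)
    writer = csv.writer(stream)
    writer.writerow(["file", "reason"])  # header row
    writer.writerows(rows)
    return len(rows)
```

In batch_decrypt this would mean appending, for example, (pdf.name, "wrong password") to a list wherever failed is incremented, then calling write_failure_log(failures, fh) once after the loop with fh opened via open("failures.csv", "w", newline="").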

Verification

After writing the decrypted file, assert it is no longer encrypted and page count matches:

# pip install "pypdf>=3.17"
import os
from pathlib import Path
from typing import Optional
from pypdf import PdfReader


def verify_decrypted(original: Path, decrypted: Path, password: Optional[str] = None) -> bool:
    """Confirm the output is not encrypted and has the same page count."""
    try:
        orig_reader = PdfReader(original)
        if orig_reader.is_encrypted:
            # The password is needed only to count pages in the encrypted source
            pw = password if password is not None else os.environ.get("PDF_USER_PW", "")
            if orig_reader.decrypt(pw) == 0:
                print(f"FAIL: wrong password for {original.name}")
                return False
        original_pages = len(orig_reader.pages)

        dec_reader = PdfReader(decrypted)
        if dec_reader.is_encrypted:
            print(f"FAIL: {decrypted.name} is still encrypted")
            return False

        if len(dec_reader.pages) != original_pages:
            print(f"FAIL: page count mismatch ({len(dec_reader.pages)} vs {original_pages})")
            return False

        print(f"PASS: {decrypted.name} — not encrypted, {len(dec_reader.pages)} pages")
        return True
    except Exception as exc:
        print(f"ERROR: {exc}")
        return False

For pipelines that feed decrypted output into table extraction (see Extracting Tables from PDFs) or text parsers, run this check before handing off. It catches the silent wrong-password case, which can otherwise leave you with an apparently valid but empty file.

Preserving Metadata When Decrypting

Copying pages with add_page() transfers visual content but not the /Info metadata dictionary (author, title, subject, keywords) or the document outline (bookmarks). Preserve them explicitly when they matter:

# pip install "pypdf>=3.17"
from pathlib import Path
from pypdf import PdfReader, PdfWriter
import os


def decrypt_preserve_metadata(source: Path, output: Path, password: str) -> None:
    """Decrypt and write a clean copy, preserving /Info metadata (bookmarks are not copied)."""
    reader = PdfReader(source)
    if not reader.is_encrypted:
        raise ValueError(f"{source.name} is not encrypted")

    result = reader.decrypt(password)
    if result == 0:
        raise ValueError(f"Wrong password for {source.name}")

    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)

    # Copy /Info metadata if present
    if reader.metadata:
        writer.add_metadata(dict(reader.metadata))

    output.parent.mkdir(parents=True, exist_ok=True)
    with open(output, "wb") as fh:
        writer.write(fh)
    print(f"Decrypted (metadata preserved): {output}")


if __name__ == "__main__":
    decrypt_preserve_metadata(
        Path("protected.pdf"),
        Path("decrypted.pdf"),
        password=os.environ["PDF_USER_PW"],
    )

If the encrypted source has a complex outline (nested bookmarks, named destinations), use writer.clone_document_from_reader(reader) instead of iterating add_page() — it copies the full document tree including the outline.

Downstream Use: Feeding Decrypted PDFs into Parsers

The most common reason to remove a password is to make the file readable by extraction tools. Table-extraction libraries such as pdfplumber and camelot (see Extracting Tables from PDFs) accept a password argument of their own, but not every downstream tool does, so decrypting once up front gives each stage the same unencrypted input. After decrypting to a clean file or an io.BytesIO buffer, pass the path or buffer as you would for any unencrypted PDF:

# pip install "pypdf>=3.17" pdfplumber
import io, os
from pathlib import Path
from pypdf import PdfReader, PdfWriter
import pdfplumber


def decrypt_to_buffer(source: Path, password: str) -> io.BytesIO:
    """Decrypt a PDF into a BytesIO buffer — no intermediate file on disk."""
    reader = PdfReader(source)
    if reader.is_encrypted:
        if reader.decrypt(password) == 0:
            raise ValueError("Wrong password")
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    buf = io.BytesIO()
    writer.write(buf)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    buf = decrypt_to_buffer(Path("protected.pdf"), os.environ["PDF_USER_PW"])
    with pdfplumber.open(buf) as pdf:
        for i, page in enumerate(pdf.pages):
            tables = page.extract_tables()
            print(f"Page {i+1}: {len(tables)} table(s)")

Using io.BytesIO avoids writing a temporary unencrypted file to disk, which matters in environments where the working directory is logged or audited.

Common Mistakes

Symptom: FileNotDecryptedError when accessing .pages
Root cause: .decrypt() not called, or called and returned 0
Fix: Check reader.is_encrypted; call reader.decrypt(pw) and verify the return value is not 0

Symptom: decrypt() returns 0 silently
Root cause: Wrong password supplied
Fix: Double-check the password; catch the 0 return explicitly and raise ValueError

Symptom: Output file is still encrypted
Root cause: PdfWriter was created but .encrypt() was called on it inadvertently, or you re-opened the input by mistake
Fix: Ensure the writer is freshly instantiated with no .encrypt() call

Symptom: Metadata stripped from output
Root cause: Copying pages with add_page() does not carry the /Info dict
Fix: Use writer.add_metadata(reader.metadata) before writing if you need to preserve author, title, etc.

Symptom: Page count changes after decrypt
Root cause: Some encrypted PDFs embed additional pages as annotations
Fix: Compare using len(reader.pages) on the original (after decrypt) vs the output

Frequently Asked Questions

Does removing a password change the visual content of the PDF? No. Decryption is a pure cryptographic operation on the stream encoding. Text, images, fonts, and layout are unchanged.

What if I only have the owner password, not the user password? Pass the owner password to reader.decrypt(). It succeeds with return value 2 (owner match), which grants full access. The same function accepts either password type.

Can I re-encrypt with a new password immediately after decrypting? Yes — after copying pages to a fresh PdfWriter, call .encrypt() on that writer before saving. See Add Password Protection to PDF Files for the full re-encryption pattern.

What about PDF files encrypted with certificate-based (public-key) security rather than a password? pypdf does not support certificate-based decryption. Use pikepdf with the appropriate private key PEM for those cases.
