Extracting PDF pages with Python

I wanted a quick way to provide sample pages from my Command Line Handbook to potential customers. It turns out that Python makes this easy to do in an automated and repeatable way.

The task at hand is then:

Given a PDF with multiple pages
Extract a given page or a collection of pages as a separate PDF

PyPDF2

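Personally, I have used a Python PDF library called PyPDF2. The class and method names used here come from PyPDF2 releases before 3.0; from 3.0 on they no longer work and are replaced by PdfReader, PdfWriter, reader.pages and writer.add_page. With PyPDF2, we just need to: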

Install PyPDF2 via pip install pypdf2 or use a dependency manager of our choice
Open the original PDF file in binary mode with Python's open() function
Use a PdfFileReader object to read the page or pages to extract
Use a PdfFileWriter object to collect those pages in a new in-memory PDF
Write the collected pages to a new file

Example

To see this in action, look at my own example of extracting multiple pages using a range to produce a new PDF file:

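from PyPDF2 import PdfFileReader, PdfFileWriter

writer = PdfFileWriter()
with open("full.pdf", "rb") as infile:
    # Create the reader once, not on every loop iteration
    reader = PdfFileReader(infile)
    # Page indices are zero-based: range(11, 17) selects pages 12 to 17
    for page in range(11, 17):
        writer.addPage(reader.getPage(page))
    # Write once, after all pages are added, while the input is still open
    with open("example.pdf", "wb") as outfile:
        writer.write(outfile)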

It shouldn't be difficult to alter the example to your own needs as necessary.
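
One possible alteration is to accept page numbers the way a reader would write them. The following is a minimal sketch and not part of my original script: parse_page_spec is a hypothetical helper that turns a string such as "12-17,20" into the zero-based indices that getPage() expects. It uses only the standard library and assumes the spec contains plain page numbers and ranges separated by commas.

```python
def parse_page_spec(spec):
    """Convert a 1-based page spec like "12-17,20" to zero-based indices.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
        else:
            first = last = int(part)
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        # Pages are 1-based for humans, indices are 0-based for the library
        indices.extend(range(first - 1, last))
    return indices


if __name__ == "__main__":
    # Same pages as range(11, 17) in the example above
    print(parse_page_spec("12-17"))
```

The returned list can stand in for range(11, 17) in the loop of the example, so the pages to extract can come from a command-line argument instead of being hard-coded.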

Last updated on 18.12.2022.