Extracting PDF pages with Python

I wanted a quick way to provide sample pages from the PDF of my Command Line Handbook to potential customers. It turns out that Python makes this easy to do in an automated and repeatable way.

The task at hand is then to extract a range of pages from an existing PDF file and save them as a new PDF file.

PyPDF2

I used a Python PDF library called PyPDF2. With PyPDF2, we just need to:

  1. Install PyPDF2 via pip install pypdf2, or use a dependency manager of our choice
  2. Open the original PDF file in binary mode with Python's open() function
  3. Use a PdfFileReader object to read the page or pages to extract
  4. Use a PdfFileWriter object to add those pages to a new in-memory PDF
  5. Save the new pages as a new file
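
The steps above can be sketched as a small reusable function. This is only a sketch under a few assumptions: the names extract_pages and zero_based_indices are my own illustration, page numbers are taken as 1-based and inclusive (the way a PDF viewer shows them), and it uses the PdfReader and PdfWriter names that newer PyPDF2 releases (2.0 and later) provide in place of PdfFileReader and PdfFileWriter.

```python
from typing import List


def zero_based_indices(first_page: int, last_page: int) -> List[int]:
    """Convert an inclusive, 1-based page range into 0-based page indices."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return list(range(first_page - 1, last_page))


def extract_pages(src_path: str, dst_path: str,
                  first_page: int, last_page: int) -> None:
    """Copy pages first_page..last_page (1-based, inclusive) into a new PDF."""
    # Imported here so the index helper above works without PyPDF2 installed.
    # Assumption: PyPDF2 2.0 or later, which provides PdfReader and PdfWriter.
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    with open(src_path, "rb") as infile:              # step 2: open the source
        reader = PdfReader(infile)                    # step 3: read its pages
        for index in zero_based_indices(first_page, last_page):
            writer.add_page(reader.pages[index])      # step 4: collect pages
        # Step 5: save while the source file is still open, because the
        # collected pages still refer to data in it.
        with open(dst_path, "wb") as outfile:
            writer.write(outfile)
```

With these assumptions, extract_pages("full.pdf", "example.pdf", 12, 17) would copy pages 12 to 17 of full.pdf, which correspond to the zero-based indices 11 to 16.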

Example

To see this in action, here is my own example, which extracts a range of pages into a new PDF file:

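# These names are from PyPDF2 1.x/2.x; PyPDF2 3.0 removed them in favour of
# PdfReader, PdfWriter, add_page() and reader.pages.
from PyPDF2 import PdfFileReader, PdfFileWriter

writer = PdfFileWriter()
with open("full.pdf", "rb") as infile:
    # Create the reader once, not on every loop iteration
    reader = PdfFileReader(infile)
    # Page indices are zero-based: range(11, 17) selects pages 12 to 17
    for page in range(11, 17):
        writer.addPage(reader.getPage(page))
    # Write while the input file is still open, since the pages refer to it
    with open("example.pdf", "wb") as outfile:
        writer.write(outfile)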

It shouldn't be difficult to adapt the example to your own needs.
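
One possible adaptation is selecting several pages or ranges at once instead of a single range. The helper below is a minimal sketch of that idea and not part of PyPDF2: the name parse_page_spec and the "12-17,20" spec format are assumptions of mine. It turns 1-based, inclusive page numbers into the zero-based indices that the loop in the example iterates over.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Turn a spec such as "1-3,7" (1-based, inclusive) into 0-based indices."""
    indices: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start  # a single page is a range of one
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        indices.extend(range(start - 1, end))
    return indices
```

The returned list could then replace range(11, 17) in the loop, for example for page in parse_page_spec("12-17,20").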

Last updated on 18.12.2022.