Software development and beyond

Extracting PDF pages with Python

I wanted a quick way to provide sample pages from my book, Command Line: A Modern Introduction, to potential customers. It turns out that with Python this is very easy to do in an automated and repeatable way.

The task at hand is then to extract a chosen range of pages from the full PDF and save them as a new, smaller PDF file.

PyPDF2

Personally, I have used a Python PDF library called PyPDF2. With PyPDF2, we just need to:

  1. Install PyPDF2 via pip install pypdf2 or use a dependency manager of our choice
  2. Open the original PDF file in binary mode with Python's built-in open() function
  3. Use a PdfFileReader object to read the page or pages to extract
  4. Use a PdfFileWriter object to collect those pages in a new in-memory PDF
  5. Write the collected pages to a new file

Example

To see this in action, look at my own example of extracting multiple pages using a range to produce a new PDF file:

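# These names are from PyPDF2 before 3.0; later versions rename them to
# PdfReader, PdfWriter, reader.pages[i] and writer.add_page().
from PyPDF2 import PdfFileReader, PdfFileWriter

writer = PdfFileWriter()
with open("full.pdf", "rb") as infile:
    # Create the reader once, outside the loop.
    reader = PdfFileReader(infile)
    # Page indices are zero-based: range(11, 17) selects the 12th to 17th pages.
    for page in range(11, 17):
        writer.addPage(reader.getPage(page))
    # Write while the input file is still open, because pages are read lazily.
    with open("example.pdf", "wb") as outfile:
        writer.write(outfile)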

It shouldn't be difficult to adapt the example to your own needs.
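One adjustment that may come in handy is choosing pages by the numbers a reader sees (starting at 1) rather than by zero-based index. The following is only a minimal sketch: parse_page_spec is a hypothetical helper of my own naming, not part of PyPDF2, and the "1-3,12-17" spec format is an assumption made for illustration.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Turn a 1-based page spec such as "1-3,12-17" into zero-based indices.

    Hypothetical helper for illustration; it is not part of PyPDF2.
    """
    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start  # a single page is a one-page range
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Page numbers are 1-based and inclusive; indices are zero-based.
        indices.extend(range(start - 1, end))
    return indices


print(parse_page_spec("12-17"))  # [11, 12, 13, 14, 15, 16]
```

The resulting indices can then replace the hard-coded range(11, 17) in the loop from the example, with each index passed to reader.getPage().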

Last updated on 18.12.2022.
