Extracting PDF pages with Python
I wanted to have a quick solution to provide sample PDF pages from my book Command Line: A Modern Introduction to potential customers. It turns out that with Python this is very easy to do in an automated and repeatable way.
The task at hand is then:
- Given a PDF with multiple pages
- Extract a given page or a collection of pages as a separate PDF
Personally, I have used a Python PDF library called PyPDF2. With PyPDF2, we just need to:
- Install PyPDF2 via pip install pypdf2 or use a dependency manager of our choice
- Read the original PDF file with a PdfFileReader object to get the page or pages to extract
- Use a PdfFileWriter object to add those pages to a new virtual PDF file
- Save the new pages as a new file
To see this in action, look at my own example of extracting multiple pages using a range to produce a new PDF file:
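from PyPDF2 import PdfFileReader, PdfFileWriter

writer = PdfFileWriter()
with open("full.pdf", "rb") as infile:
    reader = PdfFileReader(infile)
    # Page indices are zero-based: range(11, 17) selects pages 12 through 17
    for page in range(11, 17):
        writer.addPage(reader.getPage(page))
    # Write while the input file is still open; page data may be read on write
    with open("example.pdf", "wb") as outfile:
        writer.write(outfile)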
It shouldn't be difficult to adapt the example to your own needs.
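As one possible adaptation, here is a minimal sketch that wraps the same steps in a function taking page numbers as they appear in a PDF viewer (starting at 1) instead of zero-based indices. The names to_zero_based and extract_pages are my own placeholders, not part of PyPDF2, and the sketch assumes a PyPDF2 release that still ships PdfFileReader and PdfFileWriter (they were removed in 3.0, where the equivalents are PdfReader and PdfWriter).

```python
def to_zero_based(first_page, last_page):
    """Convert an inclusive, one-based page range to zero-based indices."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return range(first_page - 1, last_page)


def extract_pages(source, target, first_page, last_page):
    """Copy pages first_page..last_page (one-based, inclusive) into target."""
    # Imported here so to_zero_based stays usable without PyPDF2 installed.
    # Assumption: a PyPDF2 release older than 3.0, which has these classes.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    writer = PdfFileWriter()
    with open(source, "rb") as infile:
        reader = PdfFileReader(infile)
        for index in to_zero_based(first_page, last_page):
            writer.addPage(reader.getPage(index))
        # Write before infile closes, since PyPDF2 may still read from it.
        with open(target, "wb") as outfile:
            writer.write(outfile)
```

With this sketch, extract_pages("full.pdf", "example.pdf", 12, 17) would select the same pages as range(11, 17) in the example above.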
Last updated on 18.12.2022.