Extracting PDF pages with Python
I wanted a quick way to provide sample pages from my Command Line Handbook to potential customers. It turns out that Python makes this very easy to do in an automated and repeatable way.
The task at hand is then:
- Given a PDF with multiple pages
- Extract a given page or a collection of pages as a separate PDF
PyPDF2
Personally, I have used a Python PDF library called PyPDF2. With PyPDF2, we just need to:
- Install PyPDF2 via `pip install pypdf2` or use a dependency manager of our choice
- Read the original PDF file with the `open()` Python function
- Use a `PdfFileReader` object to read a page or multiple pages to extract
- Use a `PdfFileWriter` object to add those pages to a new virtual PDF file
- Save the new pages as a new file
Example
To see this in action, here is my own example that extracts a range of pages into a new PDF file:
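# These are the PyPDF2 1.x/2.x names; PyPDF2 3.0 renamed them
# (PdfReader, PdfWriter, add_page, reader.pages)
from PyPDF2 import PdfFileReader, PdfFileWriter

writer = PdfFileWriter()

with open("full.pdf", "rb") as infile:
    # Create the reader once instead of on every loop iteration
    reader = PdfFileReader(infile)
    # Page indices are zero-based, so this copies pages 12 to 17
    for page in range(11, 17):
        writer.addPage(reader.getPage(page))
    # Write while the input file is still open, since page data is read lazily
    with open("example.pdf", "wb") as outfile:
        writer.write(outfile)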
It shouldn't be difficult to adapt the example to your own needs.
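As one possible adaptation, the sketch below accepts page numbers the way a reader would write them (1-based, for example "12-17,20") and converts them into the zero-based indices that PyPDF2 expects. The helper names parse_page_spec and extract_pages are made up for illustration and are not part of PyPDF2. The sketch also assumes PyPDF2 2.0 or later, where the classes are called PdfReader and PdfWriter and pages are added with add_page.

```python
# Hypothetical helpers for illustration; neither function is part of PyPDF2.

def parse_page_spec(spec):
    """Turn a 1-based page spec such as "12-17,20" into zero-based indices."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        # "12-17" splits into ("12", "-", "17"); "20" into ("20", "", "")
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Convert 1-based inclusive page numbers to zero-based indices
        indices.extend(range(start - 1, end))
    return indices


def extract_pages(src_path, dst_path, spec):
    """Copy the pages named by spec from src_path into a new PDF at dst_path."""
    # Imported here so that parse_page_spec works without PyPDF2 installed.
    # Assumption: PyPDF2 2.0 or later (PdfReader, PdfWriter, add_page).
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    with open(src_path, "rb") as infile:
        reader = PdfReader(infile)
        for index in parse_page_spec(spec):
            writer.add_page(reader.pages[index])
        # Write before the input file is closed, as in the example above
        with open(dst_path, "wb") as outfile:
            writer.write(outfile)
```

With these helpers, extract_pages("full.pdf", "example.pdf", "12-17") should select the same pages as range(11, 17) in the example above, and a spec like "12-17,20" adds a single extra page to the selection.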
Last updated on 18.12.2022.