Creating website screenshots with Python and pyppeteer

pyppeteer is a Python port of puppeteer, a library for automating a headless Chromium browser. It is a very useful tool in general and can be used for a number of tasks such as web scraping, browser automation and taking website screenshots! The advantage of using an actual browser to create website screenshots is clear: it gives us the same result people will see in their own browsers.

Once the screenshot is taken, we will use the Python imaging library Pillow to resize the resulting image to the size we want. We can call our code in a number of ways, but for this example I will use the CLI library Click. You can learn more about Click in my other blog post, Building command-line interfaces in Python.

So let’s begin by installing our dependencies:

pip install pyppeteer
pip install pillow
pip install click

Pyppeteer also needs a working copy of Chromium, which can be downloaded by running the pyppeteer-install command that becomes available after installation.

Once this is done, we can start creating our program. The main code will start headless Chromium, navigate to the page we want to capture and save its screenshot to a file of our choosing. pyppeteer has an asynchronous API, so we will define our function as asynchronous too and later use Python’s asyncio event loop to run it:

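from pathlib import Path

from pyppeteer import launch

async def capture_screenshot(url: str, path: Path, viewport_width: int, viewport_height: int) -> None:
    browser = await launch()
    try:
        page = await browser.newPage()
        await page.setViewport({'width': viewport_width, 'height': viewport_height})
        await page.goto(url)
        await page.screenshot({'path': str(path)})  # pass the path as a plain string
    finally:
        await browser.close()  # close Chromium even if navigation or capture fails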

The code is very straightforward thanks to pyppeteer’s simple API. The important decision here is that we want to be able to set the browser’s viewport dimensions. Unless the viewport is set to a proper screen size, Chromium renders the page in a small default window and the screenshot would be of little use.
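
The viewport only controls how much of the page is visible, so a long page is cut off at the bottom of the screenshot. As a minimal sketch, assuming we want the whole page and want to wait for network activity to settle first, pyppeteer’s fullPage screenshot option and waitUntil navigation option can be combined as below. The names screenshot_options and capture_full_page are made up for this illustration, and pyppeteer is imported inside the function only so that the small helper can be tried without a browser installed.

```python
from pathlib import Path


def screenshot_options(path: Path, full_page: bool = True) -> dict:
    """Build the options dict passed to page.screenshot() (hypothetical helper)."""
    # The path is passed as a plain string; fullPage captures the whole scrollable page
    return {'path': str(path), 'fullPage': full_page}


async def capture_full_page(url: str, path: Path, viewport_width: int, viewport_height: int) -> None:
    # Imported here (a choice made for this sketch) so the helper above works without pyppeteer
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.setViewport({'width': viewport_width, 'height': viewport_height})
        # Treat navigation as finished once at most two network connections remain for 500 ms
        await page.goto(url, {'waitUntil': 'networkidle2'})
        await page.screenshot(screenshot_options(path))
    finally:
        # Close Chromium even if navigation or capture fails
        await browser.close()
```

capture_full_page takes the same arguments as capture_screenshot, so it can be run through the event loop in the same way.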

After the screenshot is saved, we want to resize it to a size suitable for publishing. For that, let’s implement a resize_screenshot function:

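from PIL import Image

def resize_screenshot(original_path: Path, resized_path: Path, width: int, height: int) -> None:
    with Image.open(original_path) as im:
        # thumbnail() resizes in place, keeps the aspect ratio and never enlarges the image
        im.thumbnail((width, height))
        im.save(resized_path)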
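
Note that thumbnail() keeps the aspect ratio, so the resized image fits inside the requested box rather than matching it exactly. The arithmetic can be sketched in plain Python. fit_within is a hypothetical helper written for this illustration, and Pillow’s own rounding may differ slightly in edge cases.

```python
def fit_within(width: int, height: int, max_width: int, max_height: int) -> tuple[int, int]:
    """Return the size of a width x height image scaled down to fit the given box."""
    # Use the smaller of the two ratios so both sides fit; cap at 1.0 so we never enlarge
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 1200x800 screenshot resized into a 700x466 box
print(fit_within(1200, 800, 700, 466))  # prints (699, 466)
```

So with a 1200x800 viewport and a 700x466 target, the resized image comes out one pixel narrower than the requested width.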

To tie it all together, we will use Click to implement a command. This command will allow us to change all of the parameters we need: the URL, viewport size, final screenshot size and the filenames used for saving the images:

import asyncio

import click

@click.group()
def cli():
    """Command group that the screenshot command is attached to."""

@cli.command()
@click.option('--url', required=True)
@click.option('--viewport_width', default=1200)
@click.option('--viewport_height', default=800)
@click.option('--width', default=700)
@click.option('--height', default=466)
@click.option('--filename', default='screenshot.png')
@click.option('--resized_filename', default='screenshot_resized.png')
def screenshot(url, viewport_width, viewport_height, width, height, filename, resized_filename):
    click.echo("Capturing screenshot…")
    original_path = Path(filename)
    resized_path = Path(resized_filename)
    # asyncio.run() creates an event loop, runs the coroutine and closes the loop again
    asyncio.run(capture_screenshot(url, original_path, viewport_width, viewport_height))
    resize_screenshot(original_path, resized_path, width, height)
    click.echo("Done")

if __name__ == '__main__':
    cli()

As our capture_screenshot function is async, we need an asyncio event loop to run it. You can see the whole working example, including the imports, in my python-screenshots repository.
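
The pattern of calling a coroutine from a synchronous function can be tried without a browser. The sketch below uses a stand-in coroutine, fake_capture, which is hypothetical and only sleeps briefly instead of driving Chromium. It relies on asyncio.run(), available since Python 3.7, which creates the event loop, runs the coroutine to completion and closes the loop again.

```python
import asyncio


async def fake_capture(url: str) -> str:
    """Stand-in for capture_screenshot(): waits briefly instead of opening a browser."""
    await asyncio.sleep(0.01)
    return f"captured {url}"


def run_capture(url: str) -> str:
    """Synchronous entry point, like a Click command, that drives the coroutine."""
    return asyncio.run(fake_capture(url))


print(run_capture("https://stribny.name"))
```

The synchronous caller simply blocks until the coroutine finishes, which is all a one-shot command-line tool needs.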

To run the program, simply invoke our screenshot command:

python main.py screenshot --url=https://stribny.name

And that is all! By default, the screenshot.png and screenshot_resized.png files will be saved in the directory the command is run from, ready for further processing.

Last updated on 1.7.2020.