Getting Started with Pyppeteer: A Python Guide to Puppeteer
Discover how to use Pyppeteer for browser automation and web scraping. Learn about setup, scripting, headless browsers, user agents, and more.

Justas Vitaitis
Key Takeaways
- Pyppeteer is an unofficial Python port of Puppeteer, allowing Python developers to automate Chromium browsers for tasks like web scraping and testing.
- To install, run pip install pyppeteer. On first use, Pyppeteer automatically downloads a compatible version of Chromium.
- Use await page commands to interact with the DOM, manage user agents, and handle headless browsers.
Browser automation lets you control web pages with code. You can click buttons, fill out forms, run automated tests, and scrape data. One popular tool for this is Puppeteer, a Chromium browser automation library built for Node.js.
If you prefer Python to Node.js, however, you can use Pyppeteer instead. It’s the unofficial Python port of Puppeteer. With it, Python developers can take full advantage of browser automation.
After following this Pyppeteer tutorial, you'll understand how to run Pyppeteer, interact with pages, and handle dynamically loaded content.
What Is Pyppeteer and Why Use It?
Pyppeteer is an unofficial Python port of Puppeteer, replicating its main functionality for Chromium browser automation. The goal is to give Python developers the same browser automation capabilities without having to learn JavaScript.
Compared to Selenium, Pyppeteer can feel faster on JavaScript-heavy pages, since it controls Chromium directly through the DevTools Protocol and offers finer-grained control, but Selenium remains more stable and widely supported. Playwright is also powerful, but Pyppeteer remains lighter and simpler for beginners.
This makes Pyppeteer a strong choice for small to mid-size web scraping or web automation projects.
Installing Pyppeteer
Getting started with Pyppeteer is quick. First, make sure your system meets the basic requirements. You’ll need:
- Python 3.6-3.10, since Pyppeteer may not work properly on newer versions.
- Windows, macOS, or Linux.
- An internet connection (Pyppeteer downloads Chromium on first use).
To install it, open your IDE and its terminal and run:
pip install pyppeteer
The command installs the Pyppeteer library. Before moving on, make sure you always use a virtual environment when working on Python projects. It keeps your dependencies clean and avoids version clashes.
Here’s how you can do it:
python -m venv venv
source venv/bin/activate # For macOS/Linux
# OR
venv\Scripts\activate # For Windows
pip install pyppeteer
Some IDEs, such as PyCharm, will automatically create a virtual environment for you, so you may skip that step if yours does.
After that, you’re ready to run Pyppeteer. The first time you use it, Pyppeteer will automatically download a headless version of Chromium, which is the browser the library controls during automation.
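If you'd rather fetch Chromium ahead of time instead of waiting for the first run, Pyppeteer ships a small helper command you can run from the same terminal:
pyppeteer-install
Since Pyppeteer is no longer actively maintained, this download can fail on some systems; in that case, pointing executablePath at an existing Chrome installation (shown in the next section) is the safer route.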
Setting Up Your First Script
For your first script, we’ll keep it simple. This script will:
- Launch a Chromium browser.
- Open a web page.
- Take a screenshot.
We’ll also explain each part step by step. Here’s the complete code to begin with:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(
        executablePath='C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe',
        headless=True
    )
    page = await browser.newPage()
    await page.goto('https://iproyal.com')
    await page.screenshot({'path': 'example.png'})
    await browser.close()

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())
- import asyncio. Needed to handle asynchronous functions, since Pyppeteer uses async/await for every action.
- from pyppeteer import launch. Loads the main Pyppeteer function used to run Pyppeteer and start browser automation.
- browser = await launch(...). Since Pyppeteer is no longer actively maintained, its bundled Chromium download may break. Setting executablePath points it at your installed Google Chrome instead, and headless=True enables headless mode.
- page = await browser.newPage(). Opens a fresh tab to interact with.
- await page.goto('https://iproyal.com'). Loads the page URL.
- await page.screenshot(...). Captures the page as an image.
- await browser.close(). Shuts down the browser instance.
You’ll notice that every action includes await page or await browser. That’s because Pyppeteer relies heavily on async functions: Python needs to pause and wait for the browser to complete each action. Without await, your script might break or skip steps.
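As a minimal sketch of what that looks like in practice, here's a sequence of awaited actions. The URL and selectors are hypothetical placeholders, and the lines assume they run inside an async function where page already exists:
await page.goto('https://example.com/login')
await page.type('#username', 'my_user')  # type into an input field
await page.click('#submit')  # click the submit button
await page.waitForNavigation()  # pause until the next page finishes loading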
In general, the script shows how web automation with Pyppeteer works. It’s also a perfect base for doing more, like web scraping or controlling user agents later.
Scraping a Website With Pyppeteer
Now we can take the script further and use it for web scraping. Pyppeteer can handle both static and JavaScript-rendered pages, though for highly dynamic sites, newer tools may provide better compatibility.
In this example, you’ll extract:
- The page title.
- The main heading.
- The first paragraph text.
You’ll also see how to wait for elements and deal with pages rendered by JS function calls.
Here’s the script:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(
        executablePath='C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe',
        headless=True
    )
    page = await browser.newPage()
    await page.goto('https://iproyal.com')
    await page.waitForSelector('h1')

    title = await page.title()
    heading = await page.Jeval('h1', 'el => el.textContent')
    paragraph = await page.Jeval('p', 'el => el.textContent')

    print("Title:", title)
    print("Heading:", heading)
    print("Paragraph:", paragraph)

    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
- await page.goto(...) loads the page.
- await page.waitForSelector(...) waits for elements to appear. This is crucial for pages with dynamic content.
- await page.title() pulls the title.
- await page.Jeval(...) runs a small JavaScript function to grab element text.
You can change 'h1' or 'p' to any CSS selector you want to target. That's how web scraping works: you select DOM elements and read their values.
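If you need every matching element rather than just the first, Pyppeteer also provides page.JJeval (shorthand for querySelectorAllEval), which runs a JavaScript function over the full array of matches. A minimal sketch:
# Collect the text of every link on the page into a Python list
links = await page.JJeval('a', 'els => els.map(el => el.textContent)')
print(links)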
Dealing With JavaScript-Rendered Content
Sometimes content won’t show up right away. It loads after the page’s scripts run. Pyppeteer handles that easily.
For example, if you’re scraping a page that loads extra data with JavaScript, you can wait like this:
await page.waitForSelector('.dynamic-data')
data = await page.Jeval('.dynamic-data', 'el => el.textContent')
Here, await page.waitForSelector(...) pauses the script until the element exists. You're letting the browser do the heavy lifting, just like a human visiting the site would.
Many traditional scraping tools struggle with JavaScript-rendered content, but modern browser automation libraries like Pyppeteer, Playwright, and Selenium can all handle it well.
Pyppeteer runs a real Chromium instance, so it can fully render pages, execute JavaScript, and interact with dynamically loaded content, just like a real browser.
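Keep in mind that waitForSelector doesn't wait forever: by default, it times out after 30 seconds and raises an exception you can catch. A minimal sketch of handling that, reusing the hypothetical .dynamic-data selector from above:
from pyppeteer.errors import TimeoutError

try:
    # Wait up to 10 seconds (10,000 ms) for the element to appear
    await page.waitForSelector('.dynamic-data', {'timeout': 10000})
except TimeoutError:
    print('Element never appeared - the page may have changed or failed to load')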
Handling Headless Browsers and Advanced Options
By default, Pyppeteer runs in headless browser mode, which means it opens Chromium in the background with no visible window. It's faster, uses less memory, and is perfect for browser automation tasks like web scraping, Pyppeteer tests, and data collection.
But sometimes, you need to see what the browser is doing. That’s where non-headless mode comes in.
Here’s how you can switch easily:
browser = await launch(headless=False)
It opens a real, visible browser window, so you can watch your script in action. It’s helpful for debugging or when working with complex dynamic content.
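If even a visible window moves too fast to follow, Pyppeteer also accepts Puppeteer's slowMo launch option, which delays every browser operation. A minimal sketch, assuming 250 ms is slow enough to watch:
browser = await launch(
    headless=False,
    slowMo=250  # delay each browser operation by 250 milliseconds
)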
Custom Viewport and Window Size
You can control the size of the browser window with options like this:
browser = await launch(
    headless=False,
    args=['--window-size=1280,800'],
    defaultViewport={
        'width': 1280,
        'height': 800
    }
)
It gives you a complete view, helpful for layout-based web automation or when saving full-page screenshots.
Setting Custom User Agents
Some websites check your user agent to see what browser you’re using. If it looks fake, they may block you. Changing it can help you blend in:
await page.setUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) CustomAgent")
You can change the user agent to match a real browser. It’s essential for web scraping, avoiding bot detection, loading mobile versions of pages, and more.
You’ll likely use await page.setUserAgent(...) often in your scripts. It helps every part of your browser automation look more real.
Use it together with viewport settings to simulate real user behavior:
await page.setViewport({
    'width': 375,
    'height': 667,
    'isMobile': True
})
This specific configuration, for example, mimics a mobile phone screen.
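Putting the two together, a hedged sketch of full mobile emulation might look like this (the iPhone user agent string is an illustrative example, not a required value):
# Hypothetical mobile emulation: mobile user agent plus a phone-sized viewport
await page.setUserAgent(
    'Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) '
    'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1'
)
await page.setViewport({'width': 375, 'height': 667, 'isMobile': True})
await page.goto('https://iproyal.com')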
Intercepting and Blocking Requests
If you want faster scraping, you can block images, styles, or ads with request interception:
# Enable interception so every request fires a 'request' event we can act on
await page.setRequestInterception(True)

# Abort images and stylesheets; let everything else continue
page.on('request', lambda req: asyncio.ensure_future(
    req.abort() if req.resourceType in ['image', 'stylesheet'] else req.continue_()
))
It keeps your script light. You’ll save time and bandwidth during Pyppeteer tests or large-scale web scraping projects.
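If the lambda above feels cramped, the same logic reads more clearly as a named async handler. A minimal sketch:
async def block_heavy_resources(req):
    # Skip resource types that text scraping doesn't need
    if req.resourceType in ['image', 'stylesheet', 'font']:
        await req.abort()
    else:
        await req.continue_()

await page.setRequestInterception(True)
page.on('request', lambda req: asyncio.ensure_future(block_heavy_resources(req)))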
Taking Screenshots With Custom Options
You can customize screenshots, too:
await page.screenshot({
    'path': 'screenshot.png',
    'fullPage': True
})
Use it during Pyppeteer tests to track what your script sees. You can also compare results when testing across different user agents.
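You can also capture a single element instead of the whole page. A minimal sketch, assuming the page has an h1 element:
# page.J is the shorthand for querySelector; element handles have their own screenshot method
element = await page.J('h1')
await element.screenshot({'path': 'heading.png'})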
These options give you full control over the Chromium browser automation library. Whether you're working with headless browsers, managing dynamic content, or faking user agents, Pyppeteer handles it smoothly.
Conclusion
You’ve now learned how to set up and run Pyppeteer, build scripts, and use browser automation to scrape content and control pages. The Pyppeteer library lets Python users control Chromium browsers through a Python interface, without needing to write JavaScript for most tasks, though JavaScript snippets can still be used when needed.