
How to Use Crawlee for Web Scraping: A Step-by-Step Tutorial

Vilius Dumcius


Web scraping is a popular method to gather data from websites automatically, whether it's product prices, company information, reviews, or something else. Instead of collecting all of that by hand, which would take ages, a tool like Crawlee can do the heavy lifting.

Crawlee makes web scraping and browser automation fast, intuitive, and reliable. In this tutorial, you will learn how to install Crawlee, set it up, and build a simple web scraper step by step.

What Is Crawlee?

Crawlee is an open-source browser automation and scraping library for Node.js. It lets you control headless browsers or plain HTTP sessions, helping you handle both web scraping and browser automation with ease.

Unlike many other popular tools, Crawlee combines plain HTTP crawling with full browser rendering, giving you flexible options for fetching and processing data. That means you're not limited to static HTML pages, since Crawlee can also scrape dynamic, JavaScript-heavy pages.

Key Features of Crawlee

  • Easy crawler setup with both headless browsers and HTTP clients.
  • Built-in support for proxy rotation and concurrency (see the sketch after this list).
  • Automatic retries and error handling.
  • Data export to formats like JSON or CSV.
  • Integration with popular browser automation libraries such as Puppeteer and Playwright.
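
To give a feel for how these features surface in code, here's a minimal sketch of a crawler with a few of those options set. The option names follow Crawlee v3's crawler options, and example.com is just a placeholder:

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
  maxConcurrency: 10,       // run up to 10 requests in parallel
  maxRequestRetries: 3,     // retry failed requests automatically
  maxRequestsPerCrawl: 100, // stop after 100 requests in total
  async requestHandler({ request, log }) {
    log.info(`Processing ${request.url}`);
  },
});

await crawler.run(['https://example.com']);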

Why Use Crawlee for Web Scraping

Crawlee speeds up web scraping by automating tasks you’d otherwise have to do manually. It manages requests for you, runs JavaScript in pages when needed, and offers simple tools for proxy rotation, which helps prevent IP blocks.

The combined support for web scraping and browser automation makes it more versatile than tools that cover only one method.

How to Install Crawlee

Installing Crawlee is a straightforward process and only takes a few minutes if you already have Node.js set up. If you don’t, download it from the Node.js website. After that, follow these steps.

1. Initialize a New Node.js Project

If you’re starting a new project, run:

mkdir my-crawlee-project
cd my-crawlee-project
npm init -y

This sets up a basic package.json file for dependency management.

If using an IDE such as Visual Studio Code, simply create a new folder and make it your project directory.

2. Install Crawlee via npm

Now, install Crawlee using the following command:

npm install crawlee

This command downloads Crawlee and adds it to your project’s dependencies.

Make sure to also create a file in your project directory named crawler.js.

3. Verify the Installation

Once installed, you can verify everything is working by creating a basic script like this:

// crawler.js
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
  async requestHandler({ request, $, log }) {
    log.info(`Crawling: ${request.url}`);
    const title = $('title').text();
    console.log(`Title of ${request.url}: ${title}`);
  },
});

await crawler.run(['https://example.com']);

Run it using:

node crawler.js

If you get an error complaining about the import statement, it’s because Node.js treats .js files as CommonJS by default. Create (or edit) the package.json file so it looks like this:

{
  "type": "module",
  "dependencies": {
    "crawlee": "^3.13.9"
  }
}
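
If you’d rather not switch the whole project to ES modules, recent Crawlee 3.x releases also ship a CommonJS build, so a require-based variant along these lines should work as well (treat it as a sketch and check the docs for the version you installed). Since top-level await isn’t available in CommonJS, the run call is wrapped in an async function:

// crawler.js (CommonJS variant)
const { CheerioCrawler } = require('crawlee');

(async () => {
  const crawler = new CheerioCrawler({
    async requestHandler({ request, $, log }) {
      log.info(`Crawling: ${request.url}`);
      console.log(`Title of ${request.url}: ${$('title').text()}`);
    },
  });

  await crawler.run(['https://example.com']);
})();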

Basic Structure of a Crawlee Scraper

1. Crawler Setup

You can choose between two main types:

  • HTTP-based crawling. Uses lightweight requests; fast and simple.
  • Browser-based crawling. Loads pages in a real (headless) browser; great for pages that rely on JavaScript to render their content.

For simple pages, HTTP is faster. But when pages rely on JavaScript, use a headless (or even a headful) browser. You can pick based on what the website needs.
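
As an illustration, here’s a rough sketch of the browser-based option using Crawlee’s PlaywrightCrawler. Note that Playwright itself is a separate dependency (npm install playwright), and example.com is just a placeholder:

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
  async requestHandler({ page, request, log }) {
    // page is a full Playwright page, so any JavaScript on the site has already run
    const title = await page.title();
    log.info(`${request.url}: ${title}`);
  },
});

await crawler.run(['https://example.com']);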

2. Handling Requests and Responses

The request handler is where you write what happens when the crawler visits a page. You’re basically giving the crawler instructions: it runs for every page that gets visited, and it’s where you control what data is extracted, which links are followed, and how failed requests are retried.

3. Extracting Data from Web Pages

Inside the request handler, you use tools like Cheerio to extract text, links, images, or anything else you need. For web scraping, this is the most important part: your selectors read the page and pick out exactly what’s needed. Then, you can store the data in JSON or CSV.
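
For storage, Crawlee ships with a built-in Dataset. The sketch below shows one way that could look; Dataset.pushData, exportToJSON, and exportToCSV follow the Crawlee v3 API, but double-check them against the docs for your installed version:

import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
  async requestHandler({ request, $ }) {
    // Save one record per visited page into the default dataset
    await Dataset.pushData({ url: request.url, title: $('title').text() });
  },
});

await crawler.run(['https://example.com']);

// Export everything collected during the crawl
await Dataset.exportToJSON('results');
await Dataset.exportToCSV('results');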

Here is a simple example of what the project script could look like using HTTP crawling:

import { CheerioCrawler } from 'crawlee';


(async () => {
  const crawler = new CheerioCrawler({
    async requestHandler({ $, request }) {
      console.log(`Scraping: ${request.url}`);
      const titles = [];
      $('h2.article-title').each((index, el) => {
        titles.push($(el).text());
      });
      console.log(titles);
    },
  });

  await crawler.run(['https://example-blog.com']);
})();

What the script does is this:

  • Uses CheerioCrawler for HTTP-based crawling.
  • Visits each URL and grabs all <h2> elements with the class article-title.
  • Logs the titles to the console.

Web scraping can be quite simple when using Crawlee. Just make sure you tailor it to your specific project and you’re good to go.

Simple Web Scraping Example Using Crawlee

Here’s a basic example to help you get started with web scraping using Crawlee. The code below will scrape the titles of blog posts from a website of your choosing. It’s simple, clear, and easy to copy for your own projects.

First, make sure you’ve already run this:

npm install crawlee

Then, create a file named scrape.js and add the following code:

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
  async requestHandler({ request, $ }) {
    console.log(`Visiting: ${request.url}`);

    const titles = [];

    // Adjust the selector if needed
    $('h2.post-title, h2.entry-title, h2.tp-headline-s').each((_, el) => {
      const title = $(el).text().trim();
      if (title) titles.push(title);
    });

    console.log('Captured titles:', titles);
  },
});

await crawler.run(['https://iproyal.com/blog/']);

Let’s break it down. The code:

  • Uses CheerioCrawler for fast HTTP requests.
  • Visits the start URL and grabs the text of every <h2> element matching h2.post-title, h2.entry-title, or h2.tp-headline-s.
  • Logs each title to the console.
  • Doesn’t follow pagination on its own; a sketch of how to add that is shown after this list.
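
If you do want the crawler to continue through paginated listings, Crawlee exposes an enqueueLinks helper inside the request handler. Here’s a hedged sketch; the pagination selector is a placeholder you’d adjust to the site’s actual markup:

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
  async requestHandler({ request, $, enqueueLinks }) {
    console.log(`Visiting: ${request.url}`);

    // Same title extraction as before (selectors are site-specific guesses)
    $('h2.post-title, h2.entry-title, h2.tp-headline-s').each((_, el) => {
      const title = $(el).text().trim();
      if (title) console.log(title);
    });

    // Queue the 'Next Page' link for crawling, if the site has one
    await enqueueLinks({ selector: 'a.next, a[rel="next"]' });
  },
});

await crawler.run(['https://iproyal.com/blog/']);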

Managing Proxies in Crawlee

When performing web scraping on a small scale, one IP address might be enough. However, if you’re collecting data from multiple pages or websites, you’ll likely encounter blocks.

Websites can detect lots of traffic from one IP and block or slow you down. Proxies help here, since Crawlee can use them to rotate IPs automatically. Here’s what a proxy setup looks like using Crawlee’s ProxyConfiguration:

import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfig = new ProxyConfiguration({
  proxyUrls: [
    'http://username:password@proxy1.example.com:8000',
    'http://proxy2.example.com:8000',
  ],
});

const crawler = new CheerioCrawler({
  proxyConfiguration: proxyConfig,
  async requestHandler({ request, response }) {
    console.log(`URL: ${request.url} - Status: ${response.statusCode}`);
  },
});

await crawler.run(['https://iproyal.com']);

This kind of setup allows you to scrape more safely and reliably at scale. With Crawlee, you don’t need to manage the proxy logic all by yourself - it handles that behind the scenes.

Conclusion

Now you know how to install Crawlee, set up a basic web scraping and browser automation script, and add proxy rotation.

Crawlee handles both headless browsers and HTTP modes, includes JavaScript execution, and simplifies management of requests and proxies.

If you’d like to learn more, check out our guides on free web crawling tools and scraping with Python.
