Web Scraping with Cheerio and Node.js: A Beginner-Friendly Guide


Nerijus Kriaučiūnas
Key Takeaways
- Cheerio web scraping is great for fast projects that don’t need JavaScript rendering.
- Always handle errors properly, respect robots.txt, and space out your HTTP requests.
- Run npm install cheerio and use the Cheerio object to start scraping today.
If you’re thinking about getting into web scraping, that’s a good call. It’s one of the best ways to gather data from websites without copy-pasting everything by hand. And with the help of Cheerio in Node.js, you can start small and learn quickly without many headaches.
In this guide, we will walk you through everything you need to know: setting up your tools, building a web scraper, and handling some common errors.
What Is Cheerio in Node.js?
Cheerio is a lightweight HTML parsing library for Node.js. It’s like jQuery, but for the server: you don’t get the browser part, but you do get the ability to select and change HTML using a syntax you might already know. It helps you move through the DOM (Document Object Model) without needing a browser.
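If you’ve used jQuery, the API will feel familiar right away. Here’s a quick taste (the markup is just an inline example):
const cheerio = require('cheerio');

// Load a snippet of HTML and query it with jQuery-style selectors.
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

console.log($('h2.title').text()); // "Hello world"
$('h2.title').text('Hello there!'); // change the text, just like jQuery
console.log($.html()); // serialized HTML with the change applied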
People love it because it’s fast and doesn’t rely on visual rendering. Cheerio gets the job done faster than most web scraping libraries if you don’t need to execute JavaScript. It’s been specifically built by developers who wanted a fast tool for web scraping without the heavy browser parts.
Cheerio is at its best when you only need static content and not interactive or dynamic data like drop-downs or pop-ups. You should use Cheerio when you need speed and don’t care about animations or dynamic scripts.
However, if you do need to execute JavaScript or scrape from pages that change after each load, Puppeteer could be a better choice. We’ll provide a brief comparison between the two later in the article.
The best part is that Cheerio is completely open source, so anyone can contribute to it. It’s also free to use under the MIT license.
Setting Up Node.js and Cheerio
Before you do anything else, make sure Node.js is installed. Then, run this command:
npm install cheerio axios
You need axios, a promise-based HTTP client, to make the HTTP requests. Here’s a simple project structure to keep things clean:
/web-scraper
│
├── index.js
├── package.json
└── /data
Keep your entire scraping logic in index.js for now. You can organize later when your projects grow.
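Before building the full scraper, you can sanity-check the setup with a few lines in index.js (the target URL here is just an example):
// index.js: quick check that axios and cheerio are wired up
const axios = require('axios');
const cheerio = require('cheerio');

axios.get('https://example.com')
  .then(({ data }) => {
    const $ = cheerio.load(data);
    console.log('Page title:', $('title').text());
  })
  .catch((err) => console.error('Request failed:', err.message));
If this prints the page title, both packages are installed and working.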
Building Your First Web Scraper
Now, let’s build a basic scraper. To make it easier to understand, let’s create a script that will grab news headlines from a public site.
#!/usr/bin/env node
const axios = require('axios');
const cheerio = require('cheerio');

const URL = 'https://www.theguardian.com/europe';
const selectors = [
  'h3.card-headline span.show-underline', // main cards
  'h3.card-sublink-headline span.show-underline', // sub-links under cards
  'a.js-headline-text' // standard news list items
];

async function fetchHeadlinesVerbose() {
  try {
    // 1) Fetch page
    console.log('>> Fetching:', URL);
    const { data: html } = await axios.get(URL, {
      headers: { 'User-Agent': 'Mozilla/5.0' }
    });
    console.log(`>> Loaded HTML (length ${html.length} chars)`);
    console.log(html.slice(0, 200).replace(/\n/g, ' ') + '…\n');

    // 2) Load into Cheerio
    const $ = cheerio.load(html);
    const allFound = new Set();

    // 3) Try each selector
    for (const sel of selectors) {
      const elems = $(sel);
      console.log(`>> Selector "${sel}" matched ${elems.length} elements`);
      elems.slice(0, 5).each((i, el) => {
        const txt = $(el).text().trim();
        console.log(`   [${i + 1}] "${txt}"`);
        if (txt) allFound.add(txt);
      });
    }

    // 4) Fall back if nothing matched
    if (allFound.size === 0) {
      console.warn('⚠️ No headlines found with the specific selectors—falling back to any /2025/ links');
      const fallback = $('a[href*="/2025/"]');
      console.log(`>> Fallback selector matched ${fallback.length} links`);
      fallback.slice(0, 5).each((i, el) => {
        const txt = $(el).text().trim();
        console.log(`   [${i + 1}] "${txt}"`);
        if (txt) allFound.add(txt);
      });
    }

    // 5) Final output
    const list = Array.from(allFound);
    console.log(`\n>> Total unique headlines collected: ${list.length}\n`);
    list.forEach((h, i) => console.log(`${i + 1}. ${h}`));
  } catch (err) {
    console.error('❌ Error fetching/parsing page:', err.message);
  }
}

fetchHeadlinesVerbose();
This function pulls the HTML, loads it into the Cheerio object, and finds headline tags. You can tailor this script to pull any other static data points you need; just keep in mind that you’ll have to adjust the selectors accordingly.
Cheerio web scraping is light, fast, and simple. There’s no need to mess with page loads or visual rendering.
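If you want to persist the results instead of just logging them, you can write them into the /data folder from the structure above. Here’s a minimal sketch (it assumes the `list` array built in the script above):
const fs = require('fs');
const path = require('path');

// Write the collected headlines to /data as JSON.
// Assumes `list` is the array of headlines from the script above.
function saveHeadlines(list) {
  const outDir = path.join(__dirname, 'data');
  fs.mkdirSync(outDir, { recursive: true }); // create /data if it's missing
  const outFile = path.join(outDir, 'headlines.json');
  fs.writeFileSync(outFile, JSON.stringify(list, null, 2));
  console.log(`>> Saved ${list.length} headlines to ${outFile}`);
}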
Error Handling & Best Practices
Sometimes code breaks, and it happens more often than you may think. You should always use try/catch when making HTTP requests. Also, check for missing elements. Not every page has what you’re looking for.
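For example, a selector that worked yesterday can match nothing today after a site redesign. Cheerio won’t throw in that case; it just returns an empty selection, so it’s worth checking explicitly. A minimal sketch (the selector and sample HTML are hypothetical):
const cheerio = require('cheerio');

// `html` would normally come from an HTTP response; a tiny inline sample is used here.
const html = '<div><p class="intro">Welcome</p></div>';
const $ = cheerio.load(html);

const headline = $('h1.article-title'); // hypothetical selector; matches nothing here

// Cheerio returns an empty selection rather than throwing,
// so check .length before using the result.
if (headline.length === 0) {
  console.warn('Selector matched nothing; the page layout may have changed.');
} else {
  console.log(headline.first().text().trim());
}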
Here are some more tips for you:
- Respect robots.txt. If a site doesn’t allow web scraping, you should respect that.
- Set delays. Don’t hit a server with hundreds of requests per second (see the sketch after this list).
- Use headers. Set a user-agent so your scraper looks like a regular browser visit.
- Use a VPN. If you’re scraping from a blocked region, a VPN helps you access geo-blocked content.
- Use a proxy. If you’d like to minimize your chances of getting blocked, try rotating proxies.
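Here’s a minimal sketch of how delays, headers, and a proxy might fit together with axios (the proxy host, port, and credentials are placeholders, not a real endpoint):
const axios = require('axios');

// Simple promise-based delay between requests.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function politeFetch(urls) {
  const results = [];
  for (const url of urls) {
    const { data } = await axios.get(url, {
      headers: { 'User-Agent': 'Mozilla/5.0' }, // look like a regular browser
      // Hypothetical proxy settings; replace with your provider's details.
      proxy: {
        protocol: 'http',
        host: 'proxy.example.com',
        port: 8080,
        auth: { username: 'user', password: 'pass' }
      }
    });
    results.push(data);
    await sleep(2000); // wait two seconds before the next request
  }
  return results;
}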
Cheerio vs Puppeteer: Which Should You Use?
If you’re trying to choose between scraping with Puppeteer or Cheerio, here’s a quick comparison to help you decide:
| Feature | Cheerio | Puppeteer |
|---|---|---|
| Speed | Fast | Slower |
| Loads JavaScript | No | Yes |
| Good for static HTML | Yes | Yes |
| Needs browser | No | Yes |
| Easy to learn | Very easy | A bit more complex |
In short, if you don’t need any interactivity, Cheerio web scraping is the better pick. You’ll avoid overhead and finish your tasks faster.
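For contrast, here’s roughly what the same headline grab looks like in Puppeteer (a sketch reusing the same Guardian selector as above):
const puppeteer = require('puppeteer');

async function fetchHeadlinesWithBrowser(url) {
  // Launch a full headless browser: this is the overhead Cheerio avoids.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' }); // waits for JS to settle
  // Extract text from every matching element in the rendered page.
  const headlines = await page.$$eval('a.js-headline-text', (els) =>
    els.map((el) => el.textContent.trim())
  );
  await browser.close();
  return headlines;
}
The extra lines buy you a real rendering engine, which is exactly what you pay for in speed and memory.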
Conclusion
If you’re new to web scraping, the Cheerio library makes it easy to start your web data extraction journey. It has a simple syntax and fast HTML parsing, which makes it great for scraping websites that don’t rely on dynamic scripts.
Pair it with a promise-based HTTP client like axios, and you’ve got a solid toolset for scraping static pages. If you’d like to learn more about web scraping, check out our article on the best programming languages for scraping.

Author
Nerijus Kriaučiūnas
Head of DevOps
With a strong background in system administration, Nerijus has honed his expertise in web hosting and infrastructure management through roles at various companies. As the Head of DevOps at IPRoyal, he oversees product administration while playing a key role in managing residential and ISP proxies. His vast technical expertise ensures streamlined operations across all IPRoyal’s services. When he’s not focused on work, Nerijus enjoys cycling, playing basketball, and hitting the slopes for a ski session.