Web Scraping with Cheerio and Node.js: A Beginner-Friendly Guide


Nerijus Kriaučiūnas
Key Takeaways
- Cheerio web scraping is great for fast projects that don’t need JavaScript rendering.
- Always handle errors properly, respect robots.txt, and space out your HTTP requests.
- Run npm install cheerio and use the Cheerio object to start scraping today.
If you’re thinking about getting into web scraping, that’s a good call. It’s one of the best ways to gather data from websites without copy-pasting everything by hand. And with the help of Cheerio in Node.js, you can start small and learn quickly without many headaches.
In this guide, we will walk you through everything you need to know: setting up your tools, building a web scraper, and handling some common errors.
What Is Cheerio in Node.js?
Cheerio is a lightweight HTML parsing library for Node.js. It’s like jQuery, but for the server: you don’t get the browser part, but you do get the ability to select and change HTML using a syntax you might already know. It helps you move through the DOM (Document Object Model) without needing a browser.
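If you’ve used jQuery, the API will feel familiar right away. Here’s a quick taste (the markup is just an inline example):
const cheerio = require('cheerio');

// Load a snippet of HTML and query it with jQuery-style selectors.
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

console.log($('h2.title').text()); // "Hello world"
$('h2.title').text('Hello there!'); // change the text, just like jQuery
console.log($.html()); // serialized HTML with the change applied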
People love it because it’s fast and doesn’t rely on visual rendering. Cheerio gets the job done faster than most web scraping libraries if you don’t need to execute JavaScript. It’s been specifically built by developers who wanted a fast tool for web scraping without the heavy browser parts.
Cheerio is at its best when you only need static content and not interactive or dynamic data like drop-downs or pop-ups. You should use Cheerio when you need speed and don’t care about animations or dynamic scripts.
However, if you do need to execute JavaScript or scrape from pages that change after each load, Puppeteer could be a better choice. We’ll provide a brief comparison between the two later in the article.
The best part is that Cheerio is completely open source, so anyone can contribute to it. It’s also free to use under the MIT license.
Setting Up Node.js and Cheerio
Before you do anything else, make sure Node.js is installed. Then, run this command:
npm install cheerio axios
You need axios, a promise-based HTTP client, to make the HTTP requests. Here’s a simple project structure to keep things clean:
/web-scraper
│
├── index.js
├── package.json
└── /data
Keep your entire scraping logic in index.js for now. You can organize later when your projects grow.
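Before building the full scraper, you can sanity-check the setup with a few lines in index.js (the target URL here is just an example):
// index.js: quick check that axios and cheerio are wired up
const axios = require('axios');
const cheerio = require('cheerio');

axios.get('https://example.com')
  .then(({ data }) => {
    const $ = cheerio.load(data);
    console.log('Page title:', $('title').text());
  })
  .catch((err) => console.error('Request failed:', err.message));
If this prints the page title, both packages are installed and working.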
Building Your First Web Scraper
Now, let’s build a basic scraper. To make it easier to understand, let’s create a script that will grab news headlines from a public site.
#!/usr/bin/env node
const axios = require('axios');
const cheerio = require('cheerio');

const URL = 'https://www.theguardian.com/europe';
const selectors = [
  'h3.card-headline span.show-underline', // main cards
  'h3.card-sublink-headline span.show-underline', // sub-links under cards
  'a.js-headline-text' // standard news list items
];

async function fetchHeadlinesVerbose() {
  try {
    // 1) Fetch page
    console.log('>> Fetching:', URL);
    const { data: html } = await axios.get(URL, {
      headers: { 'User-Agent': 'Mozilla/5.0' }
    });
    console.log(`>> Loaded HTML (length ${html.length} chars)`);
    console.log(html.slice(0, 200).replace(/\n/g, ' ') + '…\n');

    // 2) Load into Cheerio
    const $ = cheerio.load(html);
    const allFound = new Set();

    // 3) Try each selector
    for (const sel of selectors) {
      const elems = $(sel);
      console.log(`>> Selector "${sel}" matched ${elems.length} elements`);
      elems.slice(0, 5).each((i, el) => {
        const txt = $(el).text().trim();
        console.log(`   [${i + 1}] "${txt}"`);
        if (txt) allFound.add(txt);
      });
    }

    // 4) Fall back if nothing matched
    if (allFound.size === 0) {
      console.warn('⚠️ No headlines found with the specific selectors—falling back to any /2025/ links');
      const fallback = $('a[href*="/2025/"]');
      console.log(`>> Fallback selector matched ${fallback.length} links`);
      fallback.slice(0, 5).each((i, el) => {
        const txt = $(el).text().trim();
        console.log(`   [${i + 1}] "${txt}"`);
        if (txt) allFound.add(txt);
      });
    }

    // 5) Final output
    const list = Array.from(allFound);
    console.log(`\n>> Total unique headlines collected: ${list.length}\n`);
    list.forEach((h, i) => console.log(`${i + 1}. ${h}`));
  } catch (err) {
    console.error('❌ Error fetching/parsing page:', err.message);
  }
}

fetchHeadlinesVerbose();
This function pulls the HTML, loads it into the Cheerio object, and finds headline tags. You can tailor this script to pull any other static data points you need; just keep in mind that you’ll have to adjust the selectors accordingly.
Cheerio web scraping is light, fast, and simple. There’s no need to mess with page loads or visual rendering.
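If you want to persist the results instead of just logging them, you can write them into the /data folder from the structure above. Here’s a minimal sketch (it assumes the `list` array built in the script above):
const fs = require('fs');
const path = require('path');

// Write the collected headlines to /data as JSON.
// Assumes `list` is the array of headlines from the script above.
function saveHeadlines(list) {
  const outDir = path.join(__dirname, 'data');
  fs.mkdirSync(outDir, { recursive: true }); // create /data if it's missing
  const outFile = path.join(outDir, 'headlines.json');
  fs.writeFileSync(outFile, JSON.stringify(list, null, 2));
  console.log(`>> Saved ${list.length} headlines to ${outFile}`);
}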
Error Handling & Best Practices
Sometimes code breaks, and it happens more often than you may think. You should always use try/catch when making HTTP requests. Also, check for missing elements. Not every page has what you’re looking for.
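For example, a selector that worked yesterday can match nothing today after a site redesign. Cheerio won’t throw in that case; it just returns an empty selection, so it’s worth checking explicitly. A minimal sketch (the selector and sample HTML are hypothetical):
const cheerio = require('cheerio');

// `html` would normally come from an HTTP response; a tiny inline sample is used here.
const html = '<div><p class="intro">Welcome</p></div>';
const $ = cheerio.load(html);

const headline = $('h1.article-title'); // hypothetical selector; matches nothing here

// Cheerio returns an empty selection rather than throwing,
// so check .length before using the result.
if (headline.length === 0) {
  console.warn('Selector matched nothing; the page layout may have changed.');
} else {
  console.log(headline.first().text().trim());
}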
Here are some more tips for you:
- Respect robots.txt. If a site doesn’t allow web scraping, you should respect that.
- Set delays. Don’t hit a server with hundreds of requests per second (see the sketch after this list).
- Use headers. Set a user-agent so your scraper looks like a regular browser visit.
- Use a VPN. If you’re scraping from a blocked region, a VPN helps you access geo-blocked content.
- Use a proxy. If you’d like to minimize your chances of getting blocked, try rotating proxies.
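Here’s a minimal sketch of how delays, headers, and a proxy might fit together with axios (the proxy host, port, and credentials are placeholders, not a real endpoint):
const axios = require('axios');

// Simple promise-based delay between requests.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function politeFetch(urls) {
  const results = [];
  for (const url of urls) {
    const { data } = await axios.get(url, {
      headers: { 'User-Agent': 'Mozilla/5.0' }, // look like a regular browser
      // Hypothetical proxy settings; replace with your provider's details.
      proxy: {
        protocol: 'http',
        host: 'proxy.example.com',
        port: 8080,
        auth: { username: 'user', password: 'pass' }
      }
    });
    results.push(data);
    await sleep(2000); // wait two seconds before the next request
  }
  return results;
}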
Cheerio vs Puppeteer: Which Should You Use?
If you’re trying to choose between scraping with Puppeteer or Cheerio, here’s a quick comparison to help you decide:
| Feature | Cheerio | Puppeteer |
|---|---|---|
| Speed | Fast | Slower |
| Loads JavaScript | No | Yes |
| Good for static HTML | Yes | Yes |
| Needs browser | No | Yes |
| Easy to learn | Very easy | A bit more complex |
In short, if you don’t need any interactivity, Cheerio web scraping is the better pick. You’ll avoid overhead and finish your tasks faster.
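For contrast, here’s roughly what the same headline grab looks like in Puppeteer (a sketch reusing the same Guardian selector as above):
const puppeteer = require('puppeteer');

async function fetchHeadlinesWithBrowser(url) {
  // Launch a full headless browser: this is the overhead Cheerio avoids.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' }); // waits for JS to settle
  // Extract text from every matching element in the rendered page.
  const headlines = await page.$$eval('a.js-headline-text', (els) =>
    els.map((el) => el.textContent.trim())
  );
  await browser.close();
  return headlines;
}
The extra lines buy you a real rendering engine, which is exactly what you pay for in speed and memory.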
Conclusion
If you’re new to web scraping, the Cheerio library makes it easy to start your web data extraction journey. It has a simple syntax and fast HTML parsing, which makes it great for scraping websites that don’t rely on dynamic scripts.
Pair it with a promise-based HTTP client like axios, and you’ve got a solid toolset for scraping static pages. If you’d like to learn more about web scraping, check out our article on the best programming languages for scraping.

Author
Nerijus Kriaučiūnas
Head of DevOps
With a strong background in system administration, Nerijus has honed his expertise in web hosting and infrastructure management through roles at various companies. As the Head of DevOps at IPRoyal, he oversees product administration while playing a key role in managing residential and ISP proxies. His vast technical expertise ensures streamlined operations across all IPRoyal’s services. When he’s not focused on work, Nerijus enjoys cycling, playing basketball, and hitting the slopes for a ski session.