The Ultimate CSS Selector Cheat Sheet for Web Scraping


Master CSS selectors for web scraping with examples in Scrapy & Selenium. Extract data efficiently using classes, IDs, attributes & pseudo-classes.

Justas Palekas

13 min read

Have you tried to scrape a website but couldn’t find a way to grab the exact data you wanted? The simple truth about web scraping is that it isn’t about the tool you use, but how you target the elements you need to extract. That’s where CSS selectors come in.

These selectors are like GPS coordinates, helping you locate the data you’re interested in. They tell the web scraper precisely where the text, images, or links are hiding on the page. The best part is that they work the same way across popular scraping tools like Scrapy, Puppeteer, and Selenium.

This blog post is the ultimate CSS selector cheat sheet. Along the way, we’ll cover how to scrape smarter and faster with short explanations, code snippets, and practical examples.

What Are CSS Selectors?

So, what exactly are CSS selectors? CSS selectors are patterns used to target specific elements within a web page’s HTML. If you’ve ever written HTML and styled a webpage, then you’ve already written selectors, such as .button for buttons or #header for the page header.

However, here’s the key difference: in web scraping, we’re not using selectors to style or decorate the page - we’re using them as a precise way of pointing to the exact data we want to extract.

You can think of a web page as a large tree of HTML elements or tags (<div>, <p>, <a>). Without specific selectors, your scraper would extract the entire tree, which is far from optimal.

With selectors, you can specify that you want only the links in the nav bar, or just the product names. In other words, selectors are like a search filter for the raw code of the page.

Here is how it works in Scrapy and Selenium:

  • Scrapy (Python)
# Scrapy example: extract all menu link texts
response.css('.menu-link::text').getall()
# Output: ['Shop', 'About']
  • Selenium (JavaScript)
// Get all elements with class "menu-link"
let elements = await driver.findElements(By.css(".menu-link"));

// Extract the text from each element
let texts = await Promise.all(elements.map(el => el.getText()));

console.log(texts); // ['Shop', 'About']

In just a couple of lines, we've pulled only the Shop and About text, ignoring everything else. That's the real power of selectors. They allow you to think in terms of the exact data you need, rather than the entire page.
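
The Selenium snippets in this article assume you already have a driver instance and the By helper available. Here is a minimal setup sketch for Node.js using the selenium-webdriver package (the URL is only a placeholder, and a browser driver such as chromedriver must be installed):

// Minimal Selenium setup assumed by the JavaScript examples in this article
const { Builder, By } = require('selenium-webdriver');

(async () => {
// Build a Chrome driver (requires a matching chromedriver on your system)
const driver = await new Builder().forBrowser('chrome').build();
try {
await driver.get('https://example.com'); // placeholder URL - replace with the page you are scraping
let elements = await driver.findElements(By.css('.menu-link'));
let texts = await Promise.all(elements.map(el => el.getText()));
console.log(texts);
} finally {
await driver.quit();
}
})();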

Types of Selectors in CSS (With Scraping Examples)

Let's go over the different types of selectors you will use in practice. It is important to note that not all selectors are created equal. We'll walk through the six main types of selectors and demonstrate how each one works with real HTML elements through examples and quick scraping snippets.

Universal Selector

The universal selector (*) matches every element on a page. It is rarely used for the actual data extraction step, since it simply returns every matching element rather than the specific ones you care about.

However, it is useful for quick debugging, seeing which nodes contain text, or listing all tags on a page when you don't know the structure. You can use it to quickly scan where the content is located before narrowing down to a more specific selector.

<div>
<p>First</p>
<p>Second</p>
</div>
  • Scrapy (Python)
response.css('div *::text').getall()
  • Selenium (JavaScript)
let texts = await Promise.all((await driver.findElements(By.css('div *'))).map(el => el.getText()));

The universal selector is helpful as a starting point when you need to apply or scrape something across all of the elements on a page. For example, you can use it to count or extract every node (grabbing all links or all text nodes before further filtering), or to quickly see everything that’s being targeted while debugging a page.
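
For instance, sticking with the Selenium setup sketched earlier, a quick way to get a feel for an unknown structure is to list the tag name of every element inside a container:

// List the tag names of everything inside the <div> to see what the structure holds
let tagNames = await Promise.all(
(await driver.findElements(By.css('div *'))).map(el => el.getTagName())
);
console.log([...new Set(tagNames)]); // ['p'] for the sample HTML above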

Type (Element/Tag) Selector

Type selectors target a specific HTML tag. They are your best choice when you want all instances of a kind of element. For example, if you need all links (a) or all images (img).

This approach is beneficial when the page uses semantic tags consistently, like all article titles are h2. However, it can be tricky if the site mixes tags. A good practice is to use type selectors to grab basic collections like links, paragraphs, and images before refining.

<div>
<a href="https://example.com/home">Home</a>
<a href="https://example.com/shop">Shop</a>
<a href="https://example.com/about">About</a>
<img src="https://example.com/img1.jpg" alt="Image 1">
<img src="https://example.com/img2.png" alt="Image 2">
<img src="https://example.com/img3.jpg" alt="Image 3">
</div>
  • Scrapy (Python)
# Get all link URLs
response.css('a::attr(href)').getall()
# Get all image sources
response.css('img::attr(src)').getall()
  • Selenium (JavaScript)
// All link URLs
let links = await Promise.all((await driver.findElements(By.css('a'))).map(el => el.getAttribute('href')));
// All image sources
let imgs = await Promise.all((await driver.findElements(By.css('img'))).map(el => el.getAttribute('src')));

Class Selector

Class selectors, which target elements by their class attribute (.classname), are the most common in scraping because classes are often applied to groups of elements you’re likely interested in, such as product names, article headings, and prices.

Although they may not be unique, they are usually descriptive and stable. If a site has semantic class names, such as .product-title, it might be best to look at them first before starting the scraping process.

<div class="product">
<div class="product-title">Laptop</div>
<div class="price">$1200</div>
</div>
<div class="product">
<div class="product-title">Phone</div>
<div class="price">$800</div>
</div>
<div class="product">
<div class="product-title">Headphones</div>
<div class="price">$150</div>
</div>
  • Scrapy (Python)
# Get all product titles
response.css('.product-title::text').getall()
# ['Laptop', 'Phone', 'Headphones']
# Get all product prices
response.css('.price::text').getall()
# ['$1200', '$800', '$150']
  • Selenium (JavaScript)
let titles = await Promise.all(
(await driver.findElements(By.css('.product-title'))).map(el => el.getText())
);
// Product prices
let prices = await Promise.all(
(await driver.findElements(By.css('.price'))).map(el => el.getText())
);

ID Selector (#id)

This is the most specific selector for a single element, because IDs should be unique on a page. You can use ID attributes when you need exactly one element and when the IDs appear consistently.

A caveat to using IDs is that some sites generate dynamic IDs on every load. In this case, IDs are not reliable for web scraping.

<h1 id="main-title">The Ultimate CSS Selector Guide</h1>
  • Scrapy (Python)
# Get headline text by ID
response.css('#main-title::text').get()
# 'The Ultimate CSS Selector Guide'
  • Selenium (JavaScript)
// Locate the element by its ID and get its text
let headline = await driver.findElement(By.css('#main-title')).getText();
// 'The Ultimate CSS Selector Guide'

The thing to remember about IDs is that they are the best choice when you want just one element, like a main header.
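
If a site generates IDs with a random suffix on every load (say, a hypothetical id="main-title-8f3a"), one common workaround is to fall back on a prefix match using an attribute selector, which the next section covers:

// Match an element whose ID merely starts with "main-title" instead of requiring an exact value
let headline = await driver.findElement(By.css('[id^="main-title"]')).getText();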

Attribute Selectors

Attribute selectors are powerful for web scraping. They let you match elements by any attribute that has been set in the HTML.

They are useful for targeting links with specific patterns, images with specific file types, or specific data attributes used by JavaScript. They allow exact matches ([attr="x"]), contains (*=), starts with (^=), and ends with ($=).

Use attributes when classes/IDs are missing or when you want to filter data by specific attribute values.

<a href="/shop/laptops">Shop Laptops</a> # href attribute value
<a href="/shop/phones">Shop Phones</a>
<a href="/about">About Us</a>
<img src="banner.jpg" alt="Banner">
<img src="logo.png" alt="Logo">
<img src="product.jpg" alt="Product">
  • Scrapy (Python)
# Get all links containing 'shop' in their href attribute value
shop_links = response.css('a[href*="shop"]::attr(href)').getall()
# ['/shop/laptops', '/shop/phones']
# Get all images ending with .jpg
jpg_images = response.css('img[src$=".jpg"]::attr(src)').getall()
# ['banner.jpg', 'product.jpg']
  • Selenium (JavaScript)
// Get all links containing 'shop'
let shop_links = await Promise.all(
(await driver.findElements(By.css('a[href*="shop"]'))).map(el => el.getAttribute("href"))
);
// ['http://example.com/shop/laptops', 'http://example.com/shop/phones']
// Get all images ending with .jpg
let jpg_images = await Promise.all(
(await driver.findElements(By.css('img[src$=".jpg"]'))).map(el => el.getAttribute("src"))
);
// ['http://example.com/banner.jpg', 'http://example.com/product.jpg']
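
The snippets above demonstrate contains (*=) and ends with ($=). For completeness, here is a short sketch of an exact match and a starts with (^=) match against the same sample HTML:

// Exact match: only the link whose href is exactly "/about"
let about_link = await driver.findElement(By.css('a[href="/about"]')).getAttribute('href');
// Starts with: every link whose href begins with "/shop"
let shop_only = await Promise.all(
(await driver.findElements(By.css('a[href^="/shop"]'))).map(el => el.getAttribute('href'))
);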

Pseudo-Classes

Pseudo-classes (also called pseudo-selectors) are used when you need positional selection. They enable you to select elements by position or state rather than by name or attribute values.

Common positional pseudo-classes include:

  • :first-child selects the first child of its parent element.
  • :last-child selects the last child of its parent element.
  • :first-of-type selects the first element of a given type among its siblings.
  • :last-of-type selects the last element of a given type among its siblings.
  • :nth-child(n) selects the nth child of its parent (can use numbers, odd, even).
  • :nth-last-child(n) selects the nth child from the end of the parent.
  • The adjacent sibling combinator (+) selects an element that immediately follows another element.
  • The general sibling combinator (~) selects all elements that share the same parent and come after a specific element (a short sketch using these combinators appears after the snippets below).

Strictly speaking, the last two are combinators rather than pseudo-classes, but they are typically used together with them. To use positional selectors effectively, you should also understand how child (>) and descendant combinators work.

One problem with positional selectors is that they break whenever the page layout changes. Use pseudo-selectors with extra caution when working with sites that are redesigned or reordered frequently.

<ul>
<li>Laptop</li>
<li>Phone</li>
<li>Headphones</li>
<li>Monitor</li>
</ul>
  • Scrapy (Python)
# Get the first item in the list
first_item = response.css('ul li:first-child::text').get()
# 'Laptop'
# Get all odd items in the list
odd_items = response.css('ul li:nth-child(odd)::text').getall()  # :nth-child(odd) keeps the 1st, 3rd, 5th... items
# ['Laptop', 'Headphones']
  • Selenium (JavaScript)
// First item in the list
let first_item = await driver.findElement(By.css('ul li:first-child')).getText();
// 'Laptop'
// All odd items in the list
let odd_items = await Promise.all(
(await driver.findElements(By.css('ul li:nth-child(odd)'))).map(el => el.getText()) // every odd-numbered <li>
);
// ['Laptop', 'Headphones']
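
And here is a short sketch of the child (>) and sibling (+, ~) combinators mentioned above, run against the same list:

// Child combinator: the last <li> that is a direct child of the <ul>
let last_item = await driver.findElement(By.css('ul > li:last-child')).getText(); // 'Monitor'
// Adjacent sibling combinator: the <li> immediately after the first one
let second_item = await driver.findElement(By.css('li:first-child + li')).getText(); // 'Phone'
// General sibling combinator: every <li> after the first one
let later_items = await Promise.all(
(await driver.findElements(By.css('li:first-child ~ li'))).map(el => el.getText())
);
// ['Phone', 'Headphones', 'Monitor']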

For more robust targets, you can combine pseudo-classes, like the nth child pseudo-selector, with classes or tags. This approach enables you to target elements based on both their position and other attributes, keeping your scraping logic organized and efficient.

Note: While these classes target elements based on their state or position, pseudo-elements let you target portions of an element, such as the first letter or content inserted before or after the element.


Combining & Nesting Selectors

When scraping real-world websites, you'll often need to be much more specific than grabbing IDs or classes. That's where combining and nesting selectors come in.

Instead of targeting a single element type, you can join selectors to describe exactly where the data you want lives on the page.

For instance, you might want only the product title inside a .product container, or the price that belongs to the same product. This approach makes your scraping projects much cleaner and avoids gathering unrelated elements from other parts of the page.

<div class="product">
<h2 class="title">Laptop</h2>
<span class="price">$1200</span>
</div>
<div class="product">
<h2 class="title">Phone</h2>
<span class="price">$800</span>
</div>
<div class="product">
<h2 class="title">Headphones</h2>
<span class="price">$150</span>
</div>
  • Scrapy (Python)
# Get all product titles nested inside .product
titles = response.css('.product .title::text').getall()
# ['Laptop', 'Phone', 'Headphones']
# Get all prices nested inside .product
prices = response.css('.product .price::text').getall()
# ['$1200', '$800', '$150']
  • Selenium (JavaScript)
// Get all product titles
let titles = await Promise.all(
(await driver.findElements(By.css('.product .title'))).map(el => el.getText())
);
// ['Laptop', 'Phone', 'Headphones']
// Get all product prices
let prices = await Promise.all(
(await driver.findElements(By.css('.product .price'))).map(el => el.getText())
);
// ['$1200', '$800', '$150']

Notice how .product .title means the scraper finds any element with the class title that lives anywhere inside an element with the class product. This is called a descendant selector. Combining selectors this way keeps your scraping projects precise and organized.

This same logic applies to .product .price. By combining and nesting selectors, you can keep your scraping logic clean and straightforward and get the exact result you need.

CSS Selectors in Scrapy and Selenium

When it comes to scraping, there are two very popular tools you'll likely come across - Scrapy and Selenium. Both let you query a web page with CSS selectors, but they work in slightly different ways.

Scrapy is built for speed. It first downloads the raw HTML of a page and lets you query it directly with CSS selectors. This makes it extremely fast. A significant caveat is that it doesn't execute the JavaScript on the page. In other words, if the site loads content dynamically by executing JavaScript code, Scrapy won't see the dynamic data.

Selenium, on the other hand, automates a real browser, such as Chrome or Firefox. It actually executes JavaScript, can click buttons, and scroll the page, just like a human would. The downside to this, compared to Scrapy, is that it is relatively slower.

While CSS selectors cover most use cases, sometimes they are insufficient. Some cases where CSS selectors might be inadequate include the following scenarios:

  • You need to navigate upwards in the DOM - CSS is not capable of this.
  • You need very complex conditions to pick the correct elements.
  • You are dealing with wrongly or improperly structured HTML.

That is where XPath comes in. XPath is much more powerful, but it is also more verbose.

XPath is short for XML Path Language. It is a query language originally built for navigating XML documents, but it works for HTML documents too. Unlike CSS selectors, XPath gives you a much richer toolset: it lets you move in any direction in the DOM, filter by conditions, and even match elements by their text content.

CSS selectors only move downwards. XPath can climb back up the DOM tree. For instance, if you find a price node but need the product title in the parent element, XPath can accomplish this in a single expression.

<div class="product">
<h2 class="title">Laptop</h2>
<span class="price">$1200</span>
</div>
  • Scrapy (Python)
response.xpath('//span[@class="price"]/../h2[@class="title"]/text()').get()
  • Selenium (JavaScript)
let title = await driver.findElement(By.xpath("//span[@class='price']/../h2[@class='title']")).getText();

XPath can apply logic, such as filters or comparisons, while scraping. In the example below, we remove the “$”, convert to a number, and select products under $500.

  • Scrapy (Python)
# Titles of products priced under $500
response.xpath('//div[@class="product"][number(translate(.//span[@class="price"]/text(), "$", "")) < 500]//h2[@class="title"]/text()').getall()
  • Selenium (JavaScript)
// Titles of products under $500
let cheapTitles = await Promise.all(
(await driver.findElements(By.xpath("//div[@class='product'][number(translate(.//span[@class='price']/text(), '$', '')) < 500]//h2[@class='title']")))
.map(el => el.getText())
);

Many sites have an inconsistent structure. XPath’s // descendant search and normalize-space() can help you pull text even when tags are messy.

  • Scrapy (Python)
# Get all text inside a product block
response.xpath('//div[contains(@class, "product")]//text()').getall()
# or get the title text with surrounding whitespace normalized
response.xpath('normalize-space(//h2[@class="title"])').get()
  • Selenium (JavaScript)
// Selenium: grab the whole product block text
let productTexts = await Promise.all(
(await driver.findElements(By.xpath("//div[contains(@class,'product')]"))).map(el => el.getText())
);

Another powerful feature of XPath is that it allows you to select elements by text, a capability that CSS selectors cannot provide. For instance, if you need the exact link that says “About Us”, XPath can match by the inner text.

  • Scrapy (Python)
# Exact text match
response.xpath('//a[text()="About Us"]/@href').get()
# or a tolerant contains/normalize match
response.xpath('//a[contains(normalize-space(.), "About Us")]/@href').get()
  • Selenium (JavaScript)
let aboutHref = await driver.findElement(By.xpath("//a[text()='About Us']")).getAttribute('href');

You can think of CSS selectors as a sharp knife. They are fast and efficient for most jobs. XPath, on the other hand, is like a Swiss Army knife. It's bulkier, but it provides all the additional tools you need for scraping.

Debugging Selectors in Browser DevTools

One of the most valuable tools for web scraping is the browser DevTools. It comes built into Chrome, Firefox, and Edge. It is the best environment to test and debug your selectors before you use them in Scrapy or Selenium.

To open DevTools, right-click anything on the webpage and select 'Inspect', or press Ctrl + Shift + I. This opens the HTML tree in the Elements panel.

Hovering over the various tags highlights their location on the page, so you can see precisely what you are targeting. You can also try your selectors in DevTools itself. For CSS selectors, open the Console tab and type:

document.querySelectorAll('.classname')

This will instantly list all elements that match the .classname selector. If the right elements come back (hover an entry to highlight it on the page), your selector works.
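
You can go a step further in the Console and pull out the text a selector would hand to your scraper. For example, using the .product-title class from the earlier sample markup:

// Turn the NodeList into an array and extract the visible text of each match
Array.from(document.querySelectorAll('.product-title')).map(el => el.textContent.trim());
// ['Laptop', 'Phone', 'Headphones']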

You can even do the same for XPath with:

$x('//a[text()="About Us"]')

Here is another shortcut: in the 'Elements' panel, press Ctrl + F (or Cmd + F on Mac). A small search box appears at the bottom. Paste CSS selectors or an XPath, and DevTools will highlight corresponding elements directly in the HTML.

For example, paste a CSS selector such as .product .price, or an XPath such as

//div[@class="product"]//span[@class="price"]

Both will highlight all price elements inside product blocks.

This is a time-saving step because instead of writing code, running it, and debugging later on, you can check your selector logic right in the browser. Once it works in DevTools, you can safely copy and paste it into your scraping script.

In summary, always test selectors first in DevTools. It is time-saving, reduces errors, and simplifies your scraping process a great deal.

Conclusion

CSS selectors are a fundamental tool of web scraping, so it’s essential to fully understand how they work.

When you grasp the art of using CSS selectors for performance, XPath for some flexibility, and DevTools for troubleshooting, the process of web scraping becomes less like trial and error. It becomes more like piecing together a puzzle with the proper tool sets at your disposal.
