How to Use Scrapy for Web Scraping: A Step-by-Step Tutorial


Nerijus Kriaučiūnas
Key Takeaways
- Scrapy is best for large-scale web scraping projects.
- Use CSS selectors in your Scrapy spiders to extract the data you need.
- If Scrapy can’t handle JavaScript, use Selenium or Splash.
Scrapy is a fast and powerful web scraping framework made with Python that you can use to launch your own web scraping project. It’s good at web crawling and collecting lots of data, and it works with different types of websites.
Scrapy web scraping works well for structured websites and can handle big loads easily. A Scrapy spider goes to web pages, grabs data, and brings it back to you.
With this framework, you can extract structured data, such as names, prices, images, and more. You can also do all that automatically, eliminating the need for manual copying.
You can even run many spiders at once and schedule them. It’s one of the best frameworks for web scraping tasks.
Is Scrapy Better Than BeautifulSoup or Selenium for Web Scraping?
Here’s how Scrapy compares to some other solutions:
| | Scrapy | BeautifulSoup | Selenium |
|---|---|---|---|
| Speed | Very fast | Slow | Slow |
| JavaScript support | No | No | Yes |
| Most useful for | Large-scale web scraping | Simple HTML pages | JavaScript-heavy sites |
| Learning curve | Medium | Easy | Hard |
| Automation | High | Low | High |
To sum up, you should use Scrapy when you need to perform web crawling or collect large amounts of data from multiple pages. If you’re going for a single page and it’s a simple, static HTML page, consider using BeautifulSoup, a great tool for small web scraping tasks.
However, if the website is filled with dynamic JavaScript content, only Selenium out of the three will be able to handle it properly. If you need more information, here’s a detailed article about the differences between Scrapy, BeautifulSoup, and Selenium.
Steps to Creating Scrapy Projects
Setting up your Scrapy project is the first step to successful web scraping. Scrapy works best with Python 3.7 to 3.11, so make sure your environment is set up accordingly, and follow this Scrapy tutorial to build your scraper. Have an IDE, such as PyCharm or Visual Studio Code, at the ready.
First, install Scrapy using the terminal:
pip install scrapy
To verify the installation, use this:
scrapy version
You should see the Scrapy version printed if everything is correctly installed. Bonus tip: it’s best to use a virtual environment (venv) to keep dependencies clean. Some IDEs will set one up automatically, though!
After you’ve finished installing Scrapy, navigate to the folder where you want your Scrapy project to live and run it:
scrapy startproject my_scraper
It will generate a folder structured like this:
my_scraper/
├── my_scraper/
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders/
└── scrapy.cfg
You’ll be working mainly in the spiders/ directory. Create your first spider here using this:
cd my_scraper
scrapy genspider demo iproyal.com
This creates a demo.py file where your scraping logic will go.
Writing and Running Your Scrapy Spiders
Once the structure is ready, let’s write and run a working spider. This is where most of the web scraping logic happens. Here’s a simple spider that scrapes key feature blocks from IPRoyal’s residential proxy page:
import scrapy


class ResidentialProxiesSpider(scrapy.Spider):
    name = "residential_proxies"
    start_urls = ['https://iproyal.com/residential-proxies/']

    def parse(self, response):
        # Each feature block on the page sits inside a flex column container
        features = response.css('section div.flex.flex-col')
        for feature in features:
            title = feature.css('h2::text, h3::text').get()
            description = feature.css('p::text').get()
            if title and description:
                yield {
                    'feature_title': title.strip(),
                    'feature_description': description.strip(),
                }
        # Also collect any bullet-point text on the page
        for li in response.css('li::text'):
            text = li.get().strip()
            if text:
                yield {
                    'bullet_point': text
                }
Now, save this as residential_proxies.py inside your spiders/ folder. You can run the spider with this command:
scrapy crawl residential_proxies
If you want to export your data into JSON or CSV, use one of these commands, respectively:
scrapy crawl residential_proxies -o iproyal_features.json
scrapy crawl residential_proxies -o iproyal_features.csv
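Instead of passing -o on every run, you can configure feed exports once in settings.py. A minimal sketch (the file names here are just examples):

```python
# settings.py: write items to JSON and CSV feeds on every crawl
FEEDS = {
    "iproyal_features.json": {"format": "json", "overwrite": True},
    "iproyal_features.csv": {"format": "csv"},
}
```

With this in place, a plain scrapy crawl residential_proxies produces both files.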
You can also use the Scrapy shell to test CSS selectors and prep your web scraping code before final runs:
scrapy shell 'https://iproyal.com/residential-proxies/'
This opens an interactive Scrapy shell in your terminal where you can experiment with selectors. If you’d like to learn more, we have a complete guide on web scraping with Python that will help you out.
Before proceeding with any type of scraping, make sure you respect robots.txt. You can check a site’s rules by adding /robots.txt to its domain. For example, ours would be https://iproyal.com/robots.txt.
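Scrapy’s project template sets ROBOTSTXT_OBEY = True for you, but you can also check rules yourself with Python’s standard library. A minimal sketch with made-up rules (not IPRoyal’s actual robots.txt):

```python
from urllib import robotparser

# Parse a hypothetical robots.txt instead of fetching a real one
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "https://example.com/residential-proxies/"))  # True
print(rp.can_fetch("*", "https://example.com/private/page"))          # False
```

can_fetch() tells you whether a given user agent is allowed to request a given URL under those rules.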
Can Scrapy Scrape JavaScript Websites?
Not on its own. For JavaScript-heavy sites, web scraping becomes trickier: the Scrapy engine doesn’t run JavaScript, so if a website loads data using JS, your spider won’t see that content.
But there are ways around it. You can either add a headless browser, such as Splash, or switch to Selenium, which is great for handling JavaScript content.
This Python web scraping tool is more advanced, more complex, and way slower, but it works. However, for most normal websites, you don’t really need it.
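If you go the Splash route, the scrapy-splash plugin wires a Splash instance into your crawl through settings.py. A sketch, assuming you have installed the plugin (pip install scrapy-splash) and Splash is running locally on its default port:

```python
# settings.py additions for scrapy-splash
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

In your spider, you then yield scrapy_splash.SplashRequest instead of a plain scrapy.Request, so pages are rendered before parse() sees them.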
Common Problems and How to Solve Them
Scrapy might not work if:
- You have the wrong Python version.
- You didn’t install Scrapy the right way.
- You used a bad CSS selector.
- The site blocks bots.
If you encounter a blocking issue and your Scrapy spiders can’t retrieve information, try using a Scrapy proxy and realistic user agents. They help you avoid bans and blocks by rotating IPs and making the website believe you’re a genuine user.
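In practice, that means setting a browser-like User-Agent in settings.py and attaching a proxy to your requests. A sketch; the proxy URL and credentials below are placeholders, not real endpoints:

```python
# settings.py: identify as a common browser instead of the default Scrapy UA
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# In a spider: Scrapy's built-in HttpProxyMiddleware reads request.meta["proxy"]
# yield scrapy.Request(
#     url,
#     meta={"proxy": "http://username:password@proxy.example.com:8080"},
# )
```

For full rotation, you would vary both the proxy and the User-Agent per request rather than fixing them globally.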
If your web scraping efforts fail for any other reason, be sure to read the logs carefully. They will most likely show what went wrong.
Remember that Scrapy also has its limits. It’s not the best option if you’re scraping pages that load content slowly or websites that are filled with dynamic JavaScript functions. For those, you might want to consider Selenium.
Conclusion
After reading this Scrapy tutorial, you now know that Scrapy makes web scraping faster and cleaner. It’s great for big projects and repetitive tasks. Even though it has limitations and doesn’t run JavaScript, you can still work your way around that with other tools.
Scrapy web scraping tools let you scale your data extraction without writing too much code. Not only is this web scraping framework great for large-scale scraping, it’s also efficient at web crawling if that’s what you need.
Creating a Scrapy project is easy if you follow this Scrapy tutorial. If necessary, go back through the steps again and launch your first Scrapy spider out into the digital world.

Author
Nerijus Kriaučiūnas
Head of DevOps
With a strong background in system administration, Nerijus has honed his expertise in web hosting and infrastructure management through roles at various companies. As the Head of DevOps at IPRoyal, he oversees product administration while playing a key role in managing residential and ISP proxies. His vast technical expertise ensures streamlined operations across all IPRoyal’s services. When he’s not focused on work, Nerijus enjoys cycling, playing basketball, and hitting the slopes for a ski session.