
How to Scrape Zillow: A Step-by-Step Guide (Using Python)

Vilius Dumcius


Key Takeaways

  • Zillow data is valuable, but some of it is protected. Always read the site’s rules.

  • You can scrape Zillow using Python, but web scraping often runs into errors and blocks, so be prepared to handle them.

  • For large web scraping projects, use proxies or browser automation tools like Selenium.


Do you think Zillow data would be invaluable for your project? You’ve probably thought about web scraping Zillow at some point so you don’t have to collect all the real estate data manually.

You’re in luck, because in this guide we’ll show you, step by step, how to extract real estate data straight from Zillow’s site using Python.

You’ll learn how to scrape Zillow real estate listings, deal with multiple pages, avoid errors, and stay on the safer side of the law. By the end, you’ll have your own working Zillow scraper that grabs the information that you want, such as prices, addresses, and pictures from Zillow properties.

Please keep in mind that this data scraping article is for educational purposes only. You’re 100% responsible for how you use the code and approach web scraping.

Is It Legal to Scrape Zillow?

Before we jump into coding, it’s important to look at the rules. Zillow’s Terms of Service state that you can’t use bots or scrape their real estate information without permission. That includes automated tools that access or collect Zillow data.

However, if you haven’t registered or logged in, you may technically claim that you haven’t accepted those rules. The key here is knowing how to differentiate between public data and private data.

Even though you can view Zillow properties freely, that doesn’t mean you can scrape all of their real estate info with automated tools like Zillow proxies. Some of the property data is protected, especially anything behind a login, since logging in means you explicitly agree to their Terms of Service. Public, non-personal data is safe to collect; private and/or personal content is off-limits.

Remember, this guide is for learning only. You shouldn’t use it on live websites unless you have permission or you’re certain you’re not breaking the Terms of Service.

Python Code: Scraping Zillow Listings

Let’s start with a simple Python script that shows how to scrape Zillow listings using requests, BeautifulSoup, and pandas. Before trying this code on Zillow, remember that it may violate their rules. Also, the site actively uses JavaScript rendering and bot protection.

Zillow’s data is embedded in JSON inside <script> tags, so this Zillow scraper example shows how to extract and parse that JSON:

import requests
from bs4 import BeautifulSoup
import json
import pandas as pd

# Browser-like headers reduce the chance of an immediate block.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9"
}
url = "https://www.zillow.com/homes/for_sale/New-York,-NY_rb/"

resp = requests.get(url, headers=headers)
resp.raise_for_status()

# Zillow embeds the listing data as JSON inside the __NEXT_DATA__ script tag.
soup = BeautifulSoup(resp.text, "html.parser")
next_data_tag = soup.find("script", {"id": "__NEXT_DATA__"})
if not next_data_tag:
    raise RuntimeError("Could not find the __NEXT_DATA__ script block.")

# Parse the embedded JSON and walk down to the list of search results.
# Zillow changes these keys from time to time, so adjust the path if it breaks.
payload = json.loads(next_data_tag.string)
listings = (
    payload
    .get("props", {})
    .get("pageProps", {})
    .get("searchPageState", {})
    .get("listResults", [])
)

# Flatten each listing into a simple dictionary of the fields we care about.
results = []
for item in listings:
    results.append({
        "Title":   item.get("statusText", "N/A"),
        "Price":   item.get("price",      "N/A"),
        "Address": item.get("address",    "N/A"),
        "Beds":    item.get("beds",       "N/A"),
        "Baths":   item.get("baths",      "N/A"),
        "Image":   item.get("imgSrc",     "N/A"),
        "URL":     item.get("detailUrl",  "N/A"),
    })

df = pd.DataFrame(results)
print(df.head())

For web scraping to work, you may need to rotate proxies or handle bot detection (CAPTCHAs, JavaScript rendering), since Zillow is likely to block plain requests traffic. If that happens, you’ll get an error message like this:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://www.zillow.com/homes/for_sale/New-York,-NY_rb/

If you’re into high-scale Zillow web scraping, you may want to use tools like Selenium, Playwright, or the Zillow API to extract real estate data efficiently.
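
One common way to reduce 403 errors without switching to a full browser is to retry blocked requests through different proxies with a short back-off. Below is a minimal sketch of that pattern. The proxy URLs and the fetch_with_retries helper are illustrative placeholders, not part of Zillow or the requests library, and rotating proxies still won’t get past every protection:

import random
import time

import requests

# Hypothetical proxy endpoints; replace with credentials from your provider.
PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch_with_retries(url, max_attempts=3):
    """Try the request through a random proxy, backing off on 403/429 responses."""
    for attempt in range(1, max_attempts + 1):
        proxy = random.choice(PROXIES)
        try:
            resp = requests.get(
                url,
                headers=headers,
                proxies={"http": proxy, "https": proxy},
                timeout=15,
            )
            if resp.status_code in (403, 429):
                # Blocked or rate limited: wait, then retry with another proxy.
                time.sleep(2 * attempt)
                continue
            resp.raise_for_status()
            return resp
        except requests.RequestException as e:
            print(f"Attempt {attempt} failed: {e}")
            time.sleep(2 * attempt)
    raise RuntimeError(f"All {max_attempts} attempts failed for {url}")

# Usage:
# resp = fetch_with_retries("https://www.zillow.com/homes/for_sale/New-York,-NY_rb/")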

Scraping Multiple Pages (Pagination)

Zillow encodes the page number directly in the URL. So, to scrape Zillow across several pages, you just change the number in the _p/ segment (for example, 2_p/ for page two). However, Zillow uses dynamic content rendering and has anti-bot protections in place to prevent data scraping.

The following code may work on static HTML versions, but for full access to real estate data, you should consider tools like Selenium. Here’s how it works:

import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
import time

headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Language": "en-US,en;q=0.9"
}

base_url = "https://www.zillow.com/homes/for_sale/New-York,-NY_rb/{page}_p/"

all_results = []

# Scrape the first five result pages.
for page in range(1, 6):
    url = base_url.format(page=page)
    print(f"Scraping page {page}...")

    response = requests.get(url, headers=headers)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    try:
        # The listing data is embedded as JSON; find the script block that
        # contains the search results (the "cat1" section).
        for script in soup.find_all("script", {"type": "application/json"}):
            if "cat1" in script.text:
                data = json.loads(script.contents[0])
                listings = data["props"]["pageProps"]["searchPageState"]["cat1"]["searchResults"]["listResults"]

                for item in listings:
                    all_results.append({
                        "Title": item.get("statusText", "N/A"),
                        "Price": item.get("price", "N/A"),
                        "Address": item.get("address", "N/A"),
                        "Beds": item.get("beds", "N/A"),
                        "Baths": item.get("baths", "N/A"),
                        "Image URL": item.get("imgSrc", "N/A"),
                        "Listing URL": item.get("detailUrl", "N/A")
                    })
                break
    except Exception as e:
        print(f"Error on page {page}: {e}")

    # Pause between pages to reduce the chance of getting blocked.
    time.sleep(2)

df = pd.DataFrame(all_results)
print(df.head())

This code lets your Zillow scraper crawl through multiple pages of Zillow real estate listings. If you’re building an app or site that aggregates prices from different platforms, you can check out our guide on aggregator websites.
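
If you want to keep the scraped listings for later analysis or for an aggregator, you can export the DataFrame once the loop finishes. A minimal example (the file name is just a placeholder):

# Save the collected listings to a CSV file for later analysis.
df.to_csv("zillow_listings.csv", index=False)
print(f"Saved {len(df)} listings to zillow_listings.csv")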

Troubleshooting Common Issues

If you scrape Zillow property data often, you’re bound to hit some walls at some point. Here’s how to fix some common issues:

  • 403 errors. This means Zillow has blocked your bot. Try changing your headers or using a proxy to continue scraping Zillow data.
  • Empty responses. This is usually caused by JavaScript rendering. Try a browser automation tool like Selenium (see the sketch after this list) to scrape Zillow data effectively.
  • Missing data. Not all listings have the same information. Always use .get() when pulling Zillow data from the parsed JSON.
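
If the blocks are caused by JavaScript rendering rather than your headers, driving a real browser is the usual fallback. Here’s a minimal Selenium sketch, assuming Chrome and the selenium package are installed; it reuses the same __NEXT_DATA__ parsing idea from earlier and makes no promises against Zillow’s bot protection:

import json
import time

from bs4 import BeautifulSoup
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")  # run Chrome without opening a window
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://www.zillow.com/homes/for_sale/New-York,-NY_rb/")
    time.sleep(5)  # crude wait for JavaScript to finish; a WebDriverWait is more robust

    # Reuse the __NEXT_DATA__ parsing approach on the fully rendered page.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    tag = soup.find("script", {"id": "__NEXT_DATA__"})
    if tag:
        payload = json.loads(tag.string)
        print("Rendered page parsed successfully.")
    else:
        print("Listing data not found; the page may be showing a CAPTCHA.")
finally:
    driver.quit()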

Remember, web scraping Zillow isn’t perfect, and you will face issues. And since Zillow doesn’t offer a free Zillow API, you’ll have to work around it. If you need full access, consider applying for the Zillow Search API, although approval can unfortunately take a long time.

Conclusion

Now you know how to scrape Zillow data, handle multiple pages of real estate data, fix common errors, and choose tools for large-scale Zillow real estate data scraping. Just make sure you do it ethically and respect the website’s Terms of Service.

With tools like Python, you can collect Zillow real estate data for analysis, investment, or learning. And as long as you stay smart and respectful, you can scrape Zillow data with fewer interruptions.

