
How to Scrape Walmart Product Data Using Python: A Step-by-Step Guide


Key Takeaways

  • Use proper headers and respect robots.txt to avoid anti-bot measures when you scrape Walmart product data.

  • Requests, BeautifulSoup, and the JSON library make it easy to pull and save all the JSON data from Walmart product pages.

  • Exporting to a CSV file or JSON file gives you flexible options for using the Walmart data you collected.


If you want to learn how to scrape Walmart and extract useful information, such as prices and review data, from a Walmart product page, this guide will show you exactly how to do it using simple Python code.

You’ll learn how to set up tools, find product URLs, and extract JSON data from script tag blocks on Walmart. If you’re doing product data collection, building a tool, or trying to scrape Walmart data for research, we’ve got you covered.

Let’s see how you can scrape Walmart without constantly getting blocked by anti-bot measures.

What is Web Scraping and Why It Matters

Web scraping means writing code to extract data from a web page automatically. It retrieves useful product data like price data and review data. Common use cases include price tracking, inventory analysis, and spotting market trends.

For example, you might want to scrape Walmart product pages to monitor Walmart prices daily. Or, if you’re building an external Walmart database, you’ll need to commit to a large-scale project scraping Walmart search pages.

Either way, web scraping Walmart can be a daunting project, and your scraping expertise will be the decisive factor in how well you do it.

Is It Legal to Scrape Walmart Data?

Yes, with some caveats. The data must be publicly available, must not be personal, must not require a login, and must not be subject to copyright. Luckily, Walmart product information is publicly available data. However, none of this is legal advice - always consult a legal professional before engaging in any type of web scraping.

How to Avoid Getting Blocked

First, check robots.txt on Walmart’s website. Use a proper user-agent header and follow the site rules to avoid anti-bot measures.

Take pauses and avoid overloading the server to prevent triggering anti-bot measures. Set up your Walmart scraper to respect Walmart’s rate limits, and consider using proxy services to minimize the chances of getting blocked while scraping Walmart’s search pages.
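Here’s a minimal sketch of what that looks like in practice, using Python’s built-in urllib.robotparser to honor robots.txt and a short pause between requests. The product URL and two-second delay are illustrative assumptions, not values Walmart publishes:

import time
from urllib import robotparser

import requests

# Check Walmart's robots.txt before fetching anything.
rp = robotparser.RobotFileParser()
rp.set_url("https://www.walmart.com/robots.txt")
rp.read()

url = "https://www.walmart.com/ip/123456789"  # hypothetical product URL
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

if rp.can_fetch("*", url):
    response = requests.get(url, headers=headers)
    time.sleep(2)  # pause between requests to stay under rate limits
else:
    print("robots.txt disallows fetching this URL")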

Specifically for this case, you may want to get a Walmart proxy to help you scrape data from product pages, Walmart product reviews, Walmart search results, and more.
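Routing requests through a proxy with the requests library looks something like this. The endpoint and credentials below are placeholders for whatever your proxy provider gives you, and url and headers are reused from the sketch above:

proxies = {
    "http": "http://username:password@proxy.example.com:8080",
    "https": "http://username:password@proxy.example.com:8080",
}
response = requests.get(url, headers=headers, proxies=proxies, timeout=30)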

Tools You Need to Get Started

Before you scrape Walmart prices, Walmart product information, or other desired data points, you need a few tools in your Python environment. These help send requests, pull HTML content, and read JSON data inside script tag blocks on a product page.

Here are the main Python libraries you need:

  • requests sends HTTP requests to Walmart’s website.
  • beautifulsoup4 parses HTML content and finds elements like span and script tags.
  • selenium is useful if Walmart hides product details behind JavaScript.
  • json is included with Python and is used to parse JSON data and extract product details.

Install them using pip:

pip install requests beautifulsoup4 selenium

Next, open Chrome or Firefox. Right-click on any Walmart product page and choose ‘Inspect’. You’ll use these developer tools to locate where the product ID and review data are hidden.

Look in the script tag sections for the Walmart product info. This helps you understand what’s available to extract and where it lives, so you’ll be ready for smooth data extraction.

Finding Walmart Product URLs and SKUs

Every Walmart product page has a unique ID in the URL or page’s script tag. SKUs show up near span tag elements labeled ‘SKU’.

Visit https://www.walmart.com/ip/ followed by the product ID. View the HTML content via developer tools and search for the script tag that contains the JSON data for product details.
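If you already have product URLs, you can pull the product ID out with a short regular expression. This is a minimal sketch; the example URL is just for illustration:

import re

# Extract the numeric product ID from a Walmart product URL.
url = "https://www.walmart.com/ip/onn-20-LED-Soundbar-with-2-Internal-Speakers/226407466"
match = re.search(r"/ip/(?:[^/]+/)?(\d+)", url)
if match:
    print("Product ID:", match.group(1))  # 226407466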

Writing Your First Python Scraper

Now, let’s see how you can build a working Python script to scrape data from Walmart. This script pulls product details like name, price, and review data from a Walmart product page. We’ll break it into parts so it’s easier to follow.

Step 1: Import Libraries and Set Up

import requests
from bs4 import BeautifulSoup
import json

These imports give you access to the requests library for making HTTP requests, BeautifulSoup for parsing HTML content, and the json library for reading JSON data.

Step 2: Set URL and User-Agent

url = "https://www.walmart.com/ip/123456789"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}

Use a real user-agent header to prevent triggering anti-bot measures when scraping data from web pages. Set a Walmart product page URL that includes a real product ID.

Step 3: Send GET Request

response = requests.get(url, headers=headers)
print(response.status_code)

This part sends a GET request. If the status code is 200, the page loaded successfully. If not, something’s wrong - it could be an IP block or a bad header.

Additionally, check the response text. You may be getting a CAPTCHA challenge, which returns a successful HTTP code (200) but won’t contain the required data - you can spot this by adding print(response.text) and inspecting the output.
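A quick sanity check might look like this. It reuses the response from the previous step, and the “Robot or human” marker is an assumption about the text on Walmart’s challenge page:

if response.status_code == 200 and "Robot or human" not in response.text:
    print("Page fetched successfully")
else:
    print("Possible block or CAPTCHA challenge")
    print(response.text[:500])  # inspect the start of the response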

Step 4: Parse HTML and Find JSON

soup = BeautifulSoup(response.text, "html.parser")
script = soup.find("script", {"type": "application/ld+json"})

We search for a script tag that holds the JSON data. This tag has the product data we want, clean and structured.

Step 5: Load and Extract Product Info

if script is None:
    raise SystemExit("No JSON-LD script tag found - you may have been blocked")

data = json.loads(script.string)
print("Name:", data["name"])
print("Price:", data["offers"]["price"])
print("Rating:", data["aggregateRating"]["ratingValue"])

We use the json library to parse the JSON data from the tag, after checking that the tag was actually found. Then, we grab and print key product details. This is basic data parsing, which is essential for Walmart scraping.

You may need a more complex setup, as Walmart invests heavily in its anti-bot systems. Using a browser automation library should work better if you keep running into challenges:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import json
import time
from typing import Dict, Optional


def scrape_walmart_product_debug(product_url: str, headless: bool = False) -> Optional[Dict]:
    """
    Enhanced Walmart scraper with debugging and fallback methods.

    Args:
        product_url: Full URL to the Walmart product
        headless: Whether to run Chrome in headless mode (set to False for debugging)

    Returns:
        Dictionary with product data or None if failed
    """
    chrome_opts = Options()
    if headless:
        chrome_opts.add_argument("--headless=new")

    # Essential Chrome options
    chrome_opts.add_argument("--disable-gpu")
    chrome_opts.add_argument("--no-sandbox")
    chrome_opts.add_argument("--disable-dev-shm-usage")
    chrome_opts.add_argument("--disable-blink-features=AutomationControlled")
    chrome_opts.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_opts.add_experimental_option('useAutomationExtension', False)

    # More realistic user agent
    chrome_opts.add_argument(
        "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )

    driver = webdriver.Chrome(service=Service(), options=chrome_opts)

    # Execute script to hide webdriver property
    driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")

    try:
        print(f"Loading URL: {product_url}")
        driver.get(product_url)

        # Wait a bit for page to load
        time.sleep(3)

        # Check if we got blocked or redirected
        current_url = driver.current_url
        print(f"Current URL: {current_url}")

        if "blocked" in current_url.lower() or "captcha" in current_url.lower():
            print("Detected blocking or captcha page")
            return None

        # Method 1: Try JSON-LD scripts
        print("\n=== Method 1: Searching for JSON-LD scripts ===")
        try:
            script_elements = driver.find_elements(By.CSS_SELECTOR, 'script[type="application/ld+json"]')
            print(f"Found {len(script_elements)} JSON-LD script elements")

            for i, script_el in enumerate(script_elements):
                try:
                    raw_json = script_el.get_attribute("innerHTML")
                    print(f"Script {i + 1} content preview: {raw_json[:200]}...")

                    data = json.loads(raw_json)

                    # Check if this script contains product data
                    if isinstance(data, dict) and any(key in data for key in ["name", "@type"]):
                        if data.get("@type") == "Product" or "name" in data:
                            print(f"Found potential product data in script {i + 1}")
                            return extract_product_data(data)

                    elif isinstance(data, list):
                        for item in data:
                            if isinstance(item, dict) and (item.get("@type") == "Product" or "name" in item):
                                print(f"Found product data in script {i + 1} (array format)")
                                return extract_product_data(item)

                except json.JSONDecodeError as e:
                    print(f"JSON decode error in script {i + 1}: {e}")
                    continue
                except Exception as e:
                    print(f"Error processing script {i + 1}: {e}")
                    continue

        except Exception as e:
            print(f"Error finding JSON-LD scripts: {e}")

        # Method 2: Try CSS selectors as fallback
        print("\n=== Method 2: Trying CSS selectors ===")
        result = {}

        # Try to find product name
        name_selectors = [
            'h1[data-automation="product-title"]',
            'h1[id*="main-title"]',
            'h1.prod-ProductTitle',
            'h1',
            '[data-testid="product-title"]'
        ]

        for selector in name_selectors:
            try:
                name_element = driver.find_element(By.CSS_SELECTOR, selector)
                result["name"] = name_element.text.strip()
                print(f"Found name using selector '{selector}': {result['name']}")
                break
            except Exception:
                continue

        # Try to find price
        price_selectors = [
            '[data-testid="price-current"]',
            '.price-current',
            '[itemprop="price"]',
            '.price .visuallyhidden',
            'span[data-automation="price"]',
            '.price-group .price-characteristic'
        ]

        for selector in price_selectors:
            try:
                price_element = driver.find_element(By.CSS_SELECTOR, selector)
                price_text = price_element.text.strip()
                if price_text and '$' in price_text:
                    result["price"] = price_text
                    print(f"Found price using selector '{selector}': {result['price']}")
                    break
            except Exception:
                continue

        # Try to find rating
        rating_selectors = [
            '[data-testid="reviews-section"] [data-testid="average-rating"]',
            '.average-rating .rating-number',
            '[data-automation="reviews-rating"]',
            '.stars-reviews-count-node .f6'
        ]

        for selector in rating_selectors:
            try:
                rating_element = driver.find_element(By.CSS_SELECTOR, selector)
                rating_text = rating_element.text.strip()
                if rating_text:
                    result["rating"] = rating_text
                    print(f"Found rating using selector '{selector}': {result['rating']}")
                    break
            except Exception:
                continue

        if result:
            print(f"\n=== Final result from CSS selectors ===")
            return {
                "name": result.get("name", "N/A"),
                "price": result.get("price", "N/A"),
                "rating": result.get("rating", "N/A"),
                "review_count": "N/A"
            }

        print("\n=== Debug: Page source preview ===")
        page_source = driver.page_source
        print(f"Page source length: {len(page_source)}")
        print("First 500 characters:")
        print(page_source[:500])

        return None

    except Exception as e:
        print(f"Error during scraping: {e}")
        return None

    finally:
        driver.quit()


def extract_product_data(data: dict) -> dict:
    """Extract product information from JSON-LD data"""
    result = {
        "name": "N/A",
        "price": "N/A",
        "rating": "N/A",
        "review_count": "N/A"
    }

    # Extract name
    result["name"] = data.get("name", "N/A")

    # Extract price
    offers = data.get("offers", {})
    if isinstance(offers, list) and offers:
        offers = offers[0]
    if isinstance(offers, dict):
        price = offers.get("price") or offers.get("lowPrice") or offers.get("highPrice")
        if price:
            result["price"] = f"${price}" if not str(price).startswith('$') else str(price)

    # Extract rating
    rating_data = data.get("aggregateRating", {})
    if rating_data:
        rating_value = rating_data.get("ratingValue")
        review_count = rating_data.get("reviewCount")
        if rating_value:
            result["rating"] = str(rating_value)
        if review_count:
            result["review_count"] = str(review_count)

    return result


def print_product_info(product_data: Dict) -> None:
    """Pretty print product information"""
    if not product_data:
        print("No product data available")
        return

    print(f"\n=== PRODUCT INFORMATION ===")
    print(f"Name:         {product_data['name']}")
    print(f"Price:        {product_data['price']}")
    print(f"Rating:       {product_data['rating']}")
    print(f"Review Count: {product_data['review_count']}")


if __name__ == "__main__":
    # Test with the provided URL
    url = "https://www.walmart.com/ip/onn-20-LED-Soundbar-with-2-Internal-Speakers/226407466"

    print("Starting Walmart product scraper with debugging...")
    print("Running in non-headless mode for debugging (you'll see the browser)")

    product_info = scrape_walmart_product_debug(url, headless=False)
    print_product_info(product_info)

Saving the Data

Exporting scraped product data to a file helps with data collection and analysis.

To save the data to a CSV file:

import csv

# Reuses the url and the data dict parsed from the JSON-LD tag above.
product_id = url.rstrip("/").split("/")[-1]
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["product id", "name", "price"])
    writer.writerow([product_id, data["name"], data["offers"]["price"]])

To save the data to JSON:

with open("output.json", "w") as f:
    json.dump(data, f)

This saves the complete JSON data to a JSON file. Use it for later analysis, especially if you’re building a Walmart Scraper API or want to export data from several Walmart product pages.
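If you want to export data from several product pages at once, a loop like the following works. It’s a sketch that reuses the headers defined earlier, and the product IDs are hypothetical:

import json
import time

import requests
from bs4 import BeautifulSoup

product_ids = ["123456789", "987654321"]  # hypothetical product IDs
results = []

for pid in product_ids:
    resp = requests.get(f"https://www.walmart.com/ip/{pid}", headers=headers)
    soup = BeautifulSoup(resp.text, "html.parser")
    tag = soup.find("script", {"type": "application/ld+json"})
    if tag:
        results.append(json.loads(tag.string))
    time.sleep(2)  # pause between requests

with open("products.json", "w") as f:
    json.dump(results, f, indent=2)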

Now you’ve got a complete code example for a basic method of web scraping Walmart. It’s a solid first step to scrape Walmart product pages, gather valuable data, and analyze it your way.

Can You Make Money Scraping Walmart Data?

Yes, if you scrape Walmart data, you can:

  • Do freelance gigs to track price changes or reviews on Walmart pages.
  • Build a Walmart Scraper API or deliver Walmart web scraping services as SaaS.
  • Work in data research jobs analyzing Walmart data or market trends.

There are several ways you could monetize scraping Walmart data. Just make sure you keep things ethical, legal, and clean.

Conclusion

You now know how to scrape Walmart, from selecting a Walmart product page to writing a Python script, data parsing, and exporting to a CSV file or JSON file.

The web scraping process is not the easiest, but it is manageable with the right solutions: setting a real user-agent header, using proxies, and managing rate limits.

With this knowledge of Walmart product data collection, you can build a Walmart scraper, avoid anti-scraping measures, and unlock all the data from Walmart’s website you need for solo projects or business ventures.

If you’re interested in learning more about web scraping, check out how to use a no-code web scraper to extract data without writing much code.


Author

Marijus Narbutas

Senior Software Engineer

With more than seven years of experience, Marijus has contributed to developing systems in various industries, including healthcare, finance, and logistics. As a backend programmer who specializes in PHP and MySQL, Marijus develops and maintains server-side applications and databases, ensuring our website works smoothly and securely, providing a seamless experience for our clients. In his free time, he enjoys gaming on his PS5 and stays active with sports like tricking, running, and weight lifting.
