How to Scrape Yellow Pages Without Getting Blocked
Build a reliable Yellow Pages scraper with undetected-chromedriver, residential proxies, and other tools for adaptive business data scraping.

Justas Vitaitis
Key Takeaways
- Scraping Yellow Pages is not inherently illegal if you stay within public listings and use the data responsibly.
- To scrape Yellow Pages reliably, you need rotating residential proxies, spoofed browser identities, and human-like behavior profiles.
- Company names, addresses, and phone numbers may need cleaning due to inconsistent formatting.
Scraping Yellow Pages is never a plug-and-play process, but setting up a scraper is worth the effort. A reliable Yellow Pages data scraper provides access to business names, phone numbers, addresses, and other business details that fuel lead generation, outreach, or competitive tracking.
This article shows you how to scrape Yellow Pages data in 2025 without getting blocked and what to prepare before writing a single line of code.
Is It Legal and Worth It to Scrape Yellow Pages?
The truth is, scraping always resides in a gray area, but a case called hiQ Labs v. LinkedIn provided us with a solid foundation to work with. At one point, the courts pushed back against LinkedIn’s aggressive anti-scraping stance, suggesting that collecting data from public profiles isn’t inherently unlawful.
The outcome didn’t fully settle the debate, but it confirmed something important: just because your scraper is automated doesn’t mean it’s a hacker. What matters most is what you access and whether you’re honoring reasonable boundaries.
Most websites, including Yellow Pages, do everything they can to block you. Still, for the right use case, extracting Yellow Pages business listing data is worthwhile, especially if what you need is raw, structured, local business information at scale.
Think about what this directory holds: business names, telephone numbers, addresses, categories, and hours. It’s the online version of the traditional printed phone book. You can:
- Generate leads for SEO or cold outreach: Lead scraping on this website gives you a fresh pool of geo-targeted phone numbers and business details for lead campaigns.
- Track competitor listings: Yellow Pages data gives you a clear view of how your competitors show up in the directory.
But, as with everything these days, it takes some effort.
Challenges When Scraping Yellow Pages
Say you go ahead and write a simple script to scrape Yellow Pages business data. Chances are, it’s going to run into session tracking, aggressive bot detection, rotating JavaScript layouts, and hard blocks. If your scraper doesn’t account for all that, you won’t even get the chance to extract business details, let alone build anything useful.
- Rate limits and bans
You won’t get a neat little pop-up saying you’ve been banned for trying to scrape Yellow Pages. That would be way too easy. What you’re more likely to hit is a soft ban.
One minute, your web scraper is pulling business names and telephone numbers just fine. Next, you’re getting redirected to splash pages, the HTML starts shifting for no reason, or you’re served a vague “something went wrong” message that doesn’t go away.
- CAPTCHAs
Sometimes, the Yellow Pages just comes out and says it: prove you’re not a bot. That usually happens when you’re scraping too fast, your requests are too neat and linear, or you’re recycling sessions or cookies recklessly.
It can also kick in if your scraper keeps revisiting pages like it’s stuck in a loop. It’s their polite way of calling you out, but let’s be honest, they know you’re not human.
- Data consistency and cleaning
If you do manage to get past the blocks, the next thing you’ll run into is messy data. You may see small businesses with slightly different names, telephone numbers in different formats, and business addresses split or jumbled, depending on the page.
Sometimes the business information may be duplicated with subtle changes, and sometimes it’s missing key fields entirely.
Now let’s discuss how you’ll actually overcome the challenges we just walked through. First, you get your tools in place. Then we’ll break down the logic that powers a scraper that doesn’t get blocked or trip over inaccurate data.
- Residential proxies
Start with a small pool of residential proxies. Make sure the proxy location matches your target. If you're scraping roofing contractors in Los Angeles, you don't want your IP address showing up in Nepal.
Yellow Pages typically displays around 50 listings per page. For light scraping, a single proxy might work. But if you encounter rate limits or soft blocks, you'll need to rotate. As a rule of thumb, running two to five rotating proxies per active page is usually sufficient to stay under the radar (see the rotation sketch after this list).
- Undetected-chromedriver
You’re using undetected-chromedriver because it removes the fingerprint giveaways that trigger bot detection on Yellow Pages. It doesn’t make you invisible, but it keeps your scraper from revealing itself before it even loads the first listing.
- Python
You’ll need Python installed, preferably the latest version. Most scraping scripts today are written in Python due to its ease of integration with Selenium, proxy rotation, and data processing.
- Code editor
Finally, you’ll need a code editor. It doesn’t matter if it’s VS Code, Sublime, or anything else. You’ll use it to write, test, and iterate your scraper until it performs as needed.
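About that rotation: the tutorial later simply picks a random proxy from the pool on every attempt, but if you prefer strict round-robin rotation across sticky sessions, a minimal sketch looks like this (the URLs below are placeholders, not real endpoints):
from itertools import cycle

# Placeholder sticky-session proxy URLs - swap in your own pool
proxy_pool = cycle([
    "http://proxy.example.com:12321/session-a",
    "http://proxy.example.com:12321/session-b",
    "http://proxy.example.com:12321/session-c",
])

def next_proxy():
    # Hands out proxies round-robin, one per page request
    return next(proxy_pool)

print(next_proxy(), next_proxy())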
What Data Can You Extract From Yellow Pages?
Your scraper can gather the following data when properly configured:
- Business name
This is the anchor of your scraping stack. But Yellow Pages listings don’t follow strict naming conventions. You’ll see slight variations in abbreviations, legal suffixes, or typos that make it hard to treat names as unique identifiers. You’ll need to normalize these if you’re building anything off company names.
- Business address
You’ll get the full address, but the structure isn’t reliable. Sometimes it’s broken into a street, city, and state. Other times, it’s a single long string with inconsistent separators. Yellow Pages doesn’t enforce a single format, so your scraper needs logic to detect and clean addresses that come jumbled or incomplete.
- Telephone number
This is the data you probably need the most. However, the formatting is often inconsistent. Some Yellow Pages business numbers use dashes, others parentheses, and some include country codes, while others do not.
- Website
Not every listing has one, but when it’s there, it’s usually in a clean anchor tag. Sometimes you’ll find third-party pages instead of official domains. If your goal is to scrape business websites for deeper crawling, plan for some noise.
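If you plan to reuse these fields downstream, it pays to normalize them as you collect them. Here's a minimal sketch of what that cleanup could look like; the helper names clean_phone and clean_name are our own, and the sample values are made up:
import re

def clean_phone(raw_phone):
    # Keep digits only, and drop a leading US country code if present
    digits = re.sub(r"\D", "", raw_phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits

def clean_name(raw_name):
    # Collapse repeated whitespace and normalize casing
    return " ".join(raw_name.split()).title()

# Both variants below normalize to the same phone and name
print(clean_phone("(213) 555-0142"), clean_name("  ACME   roofing co. "))
print(clean_phone("+1 213-555-0142"), clean_name("Acme Roofing Co."))
Running the cleanup at write time keeps your output consistent and makes deduplication much easier later.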
Step-By-Step Guide to Scraping Yellow Pages
So, here’s what really brought you here: we’re going to build a scraper that collects company names and telephone numbers for roofing contractors in Los Angeles.
Step 1: Install Dependencies
You are going to need just a few tools to get this scraper running:
- Python: This is the language you’ll write everything in. Use the latest version (3.9 or higher is recommended) to avoid compatibility issues.
- Selenium: Even though we're using undetected-chromedriver, it's built on top of Selenium, so this is required.
- Undetected-chromedriver: This is the stealth layer. It patches ChromeDriver to hide automation giveaways such as navigator.webdriver; later in the script, we'll layer WebGL, AudioBuffer, and canvas fingerprint spoofing on top of it to bypass bot detection.
Installing Python
Step 1: Open Command Prompt or Terminal and type:
python --version
If it returns something like Python 3.11.5, you're good to go.
If not, download the latest version from Python.org. During setup, check the box that says “Add Python to PATH.” That part matters.
Installing Selenium
Still in your Command Prompt or Terminal, write this command:
pip install selenium
It should take a minute or two to download and install.
Installing Undetected-Chromedriver
Run the following command in your terminal:
pip install undetected-chromedriver
It should take a minute or two to download and install.
That’s it, you now have all the prerequisites installed. Now let’s create the Python file.
Step 2: Create and Store Your Yellow Pages Scraper
Let’s keep things simple and store your scraper on the Desktop. In your Command Prompt, type:
cd %USERPROFILE%\Desktop & mkdir yellowpages-scraper & cd yellowpages-scraper
This opens your Desktop folder, creates a subfolder called yellowpages-scraper, and puts you inside it. Let's name our file 'YellowPageExtractor'. Use the following command to create it:
type nul > YellowPageExtractor.py
Now, open the file we just created in your favorite code editor so we can start writing code.
Step 3: Write Your Import Statements
The first step is to import the modules that we need. Here they are:
#Browser Control & Stealth Automation
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import undetected_chromedriver as uc
#Proxy, Fingerprint & Spoofing Logic
import time
import random
import os
import sys
import uuid
import logging
import re
# Data Logging & Output
import csv
from datetime import datetime
Browser Control and Stealth Imports
This first group of imports enables your scraper to behave like a real person using a browser. Each one serves a specific role in making that behavior believable.
- from selenium.webdriver.common.by import By
This tells the scraper what kind of locator it will use to find elements. Whether it's an ID, class, or CSS selector, this line lays the groundwork for grabbing names and phone numbers from Yellow Pages.
- from selenium.webdriver.common.keys import Keys
We will use this to simulate keyboard actions such as typing, pressing Enter, or correcting a typo. This is part of what makes the bot look human.
- from selenium.webdriver.common.action_chains import ActionChains
This allows your scraper to move the mouse, hover over elements, or pause briefly before clicking. These tiny actions help it avoid standing out.
- from selenium.webdriver.support.ui import WebDriverWait and from selenium.webdriver.support import expected_conditions as EC
Together, these give the bot the ability to wait until something is actually on the page before interacting with it. That means it doesn't click on a search button before the page is ready.
- from selenium.webdriver.chrome.options import Options
This exposes the configuration object that controls how Chrome behaves. You will use it to set the language, screen size, and turn off features that might trigger detection.
- import undetected_chromedriver as uc
This is the core of your stealth setup. It launches a modified version of Chrome that hides most signs of automation. Without it, Yellow Pages would likely block you before the first result even loads.
Anti-bot and Spoofing Utilities
The next set of imports provides your scraper with adaptive intelligence. With these tools, it can behave unpredictably and respond to what it sees in real time.
- import time and import random
Grouped because they always work as a pair. time.sleep() is our delay engine. It creates pauses between actions so the bot doesn't move at machine speed, and random adds jitter to those delays, making them feel more human and less predictable.
- import os and import sys
These give the scraper control over the environment it's running in. os handles file-level tasks, such as checking whether our output CSV exists, while sys lets us exit the script cleanly if stealth checks fail or scraping hits a dead end.
- import uuid
We will use this to generate unique values, such as session IDs, cookie markers, or temp filenames. It keeps each run distinct and harder to fingerprint.
- import logging
Replaces messy print() calls with structured, timestamped logs. This lets us track what the scraper is doing, when it fails, and why.
- import re
Brings in Python’s regular expression engine. You’ll use it in extract_timezone() to extract the city or state from proxy strings and map them to time zones.
Data Logging and Output
This set of imports powers our output logging system:
- import csv
Used to write the scraped company data into a structured CSV file.
- from datetime import datetime
Adds a timestamp to each entry so we know exactly when each record was logged.
Step 4: Configure Logging
We already imported logging to track what the scraper is doing. Now we configure it with this line:
logging.basicConfig(level=logging.INFO, format='[%(asctime)s] %(message)s')
This sets up Python’s built-in logging system to only display messages at INFO level or higher. It also formats each log entry with a timestamp, so you can see exactly when each action occurs.
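For example, once the scraper starts selecting proxies, a call such as logging.info(f"Selected proxy: {proxy}") will print a line roughly like this (the timestamp shown here is only a placeholder):
[2025-01-01 12:00:00,000] Selected proxy: http://geo.iproyal.com:12321/country-us_city-losangeles_session-TCDPEKCM_lifetime-30m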
Step 5: Set Up Global Settings
We’re almost done laying the groundwork. This step defines the key configuration variables that control how your scraper behaves during execution. Add the following lines to your script:
# Global Settings
HEADLESS = False # Set to True later if you spoof it correctly
MAX_RETRIES = 5
BLOCK_BACKOFF_STEPS = [5, 10, 20, 30] # seconds
MAX_LISTINGS_PER_PAGE = 30
- HEADLESS
Set to False so you can see Chrome as it runs. Once everything is set up and you’re ready to go stealth, switch it to True to run headlessly in the background.
- MAX_RETRIES
Defines the number of times the scraper should retry after encountering a block or error. Right now, it’s set to five attempts before giving up.
- BLOCK_BACKOFF_STEPS
Controls how long the scraper waits between retries. It starts with 5 seconds and increases with each failure to reduce detection risk.
- MAX_LISTINGS_PER_PAGE
Caps how many listings the scraper extracts from a single results page, so each session stays short and less conspicuous.
Step 6: Format Your Proxy String
Because we’re scraping listings for roofing contractors in Los Angeles, it makes sense to route all traffic through Los Angeles-based proxies. Yellow Pages listings are geo-targeted, meaning your search results depend heavily on where your IP address appears to be located.
We're using Residential Proxies for this purpose. Here is our pool:
#Proxy pool
PROXIES = [
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-TCDPEKCM_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-ZbgnB6u4_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-wrAACjQS_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-EE8Rhsih_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-fGo0pPHi_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-7JrqLZN2_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-mCRJTcks_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-51dDpfgf_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-Ve2WXTEV_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-jurD1lGK_lifetime-30m"
]
Each proxy follows this format:
http://[ADDRESS]:[PORT]/[LOCATION_AND_SESSION_PARAMETERS]
We don’t pass usernames or passwords directly in the proxy string. This approach simplifies authentication and avoids compatibility issues with undetected-chromedriver, which doesn’t always play well with proxies requiring inline credentials.
Instead, we use IP whitelisting. This means we pre-authorize the IP address of our scraping machine directly in the IPRoyal dashboard. For more information, read our IP whitelisting authentication quick-start guide.
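Since authentication is handled by whitelisting, the only pieces the browser needs from each proxy string are the host and port. Later, the launch function pulls them out with proxy.split("/")[2]; the short sketch below shows the same extraction with urlparse, in case you prefer something more explicit (the variable names are just illustrative):
from urllib.parse import urlparse

proxy = "http://geo.iproyal.com:12321/country-us_city-losangeles_session-TCDPEKCM_lifetime-30m"
parsed = urlparse(proxy)

proxy_address = parsed.netloc            # 'geo.iproyal.com:12321' - what Chrome needs
location_tags = parsed.path.lstrip("/")  # the location, session, and lifetime parameters
print(proxy_address)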
Step 7: Create a Time Zone Lookup Table
This is part of our stealth layer. We've already defined a pool of 10 unique Los Angeles residential proxies, but that alone isn’t enough.
If your proxy indicates you're in LA but your browser's time zone shows India, a site like Yellow Pages can flag the discrepancy. So we need a way to translate the proxy’s location into the correct time zone.
That’s what the following lookup table does:
TIMEZONE_MAP = {
"city-losangeles": "America/Los_Angeles"
}
It maps the city tag from your proxy to a real time zone. Later, when we extract "city-losangeles" from the proxy string, this lookup informs the scraper to set the browser clock to "America/Los_Angeles", ensuring everything remains aligned.
Step 8: Match Proxy Location to the Right Time Zone
We’ve already defined our proxy pool and the time zone lookup table. Now we need a translator that connects the two, so our scraper knows what time zone to assign based on the proxy.
That’s where this function comes in:
def extract_timezone(proxy_string):
    match = re.search(r"(city|state)-([a-z]+)", proxy_string)
    if not match:
        raise ValueError(f"Could not extract location from: {proxy_string}")
    loc_key = f"{match.group(1)}-{match.group(2)}"
    timezone = TIMEZONE_MAP.get(loc_key)
    if not timezone:
        raise ValueError(f"No timezone mapped for: {loc_key}")
    return timezone
Here’s what it does:
- Takes the whole proxy string as input.
- Uses a regular expression to extract the city or state, like "city-losangeles."
- If it doesn’t find a valid location, it raises an error, so you can check your proxy formatting.
- It then builds the key for our lookup table and checks if there’s a matching time zone.
- If there’s no match, it raises another error to help you debug.
- If all goes well, it returns the correct time zone string, ready to be used to set the browser clock.
This is a small but essential step in making sure your browser and proxy match.
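As a quick sanity check before moving on, you can run the function against one of the proxies defined earlier; with the pool and TIMEZONE_MAP above, it should return the Los Angeles time zone:
# Quick check: every LA proxy in the pool should resolve to America/Los_Angeles
print(extract_timezone(PROXIES[0]))  # expected output: America/Los_Angeles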
Step 9: Define Your Identity Pool
We’re still building stealth. It’s all about closing tiny leaks that could reveal we’re not human. You won’t patch everything, but you’ll get close enough for your scraper to remain under the radar.
The identity pool below gives your scraper multiple browser “costumes” to wear. Each one looks, feels, and behaves like a real person using a real device in Los Angeles. You can always tweak or expand it to fit your project.
# Identities to use
IDENTITY_POOL = [
{
"identity_id": "la_win_edge_01",
"device_group": "desktop-windows",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.113 Safari/537.36 Edg/125.0.2535.67",
"viewport": (1920, 1080),
"platform": "Win32",
"hardware_concurrency": 8,
"max_touch_points": 0,
"webgl_vendor": "Intel Inc.",
"webgl_renderer": "Intel UHD Graphics 770"
},
{
"identity_id": "la_mac_safari_02",
"device_group": "desktop-mac",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_5_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
"viewport": (1680, 1050),
"platform": "MacIntel",
"hardware_concurrency": 8,
"max_touch_points": 0,
"webgl_vendor": "Apple Inc.",
"webgl_renderer": "Apple M2 Pro GPU"
},
{
"identity_id": "la_linux_chrome_03",
"device_group": "desktop-linux",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.6367.119 Safari/537.36",
"viewport": (1600, 900),
"platform": "Linux x86_64",
"hardware_concurrency": 4,
"max_touch_points": 0,
"webgl_vendor": "Mesa/X.org",
"webgl_renderer": "Mesa Intel(R) Arc A380 Graphics (DG2)"
},
{
"identity_id": "la_win_chrome_04",
"device_group": "desktop-windows",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.6312.86 Safari/537.36",
"viewport": (1366, 768),
"platform": "Win32",
"hardware_concurrency": 6,
"max_touch_points": 0,
"webgl_vendor": "NVIDIA Corporation",
"webgl_renderer": "NVIDIA GeForce GTX 1660"
},
{
"identity_id": "la_mac_firefox_05",
"device_group": "desktop-mac",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 12.6) Gecko/20100101 Firefox/117.0",
"viewport": (1440, 900),
"platform": "MacIntel",
"hardware_concurrency": 4,
"max_touch_points": 0,
"webgl_vendor": "ATI Technologies Inc.",
"webgl_renderer": "AMD Radeon Pro 560 OpenGL Engine"
},
{
"identity_id": "la_linux_firefox_06",
"device_group": "desktop-linux",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0",
"viewport": (1920, 1080),
"platform": "Linux x86_64",
"hardware_concurrency": 8,
"max_touch_points": 0,
"webgl_vendor": "X.Org",
"webgl_renderer": "Radeon RX 6600 (DRM 3.40.0)"
},
{
"identity_id": "la_win_ie_compat_07",
"device_group": "desktop-windows",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)",
"viewport": (1024, 768),
"platform": "Win32",
"hardware_concurrency": 2,
"max_touch_points": 0,
"webgl_vendor": "Microsoft",
"webgl_renderer": "Microsoft Basic Render Driver"
},
{
"identity_id": "la_mac_chrome_08",
"device_group": "desktop-mac",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.110 Safari/537.36",
"viewport": (1280, 800),
"platform": "MacIntel",
"hardware_concurrency": 4,
"max_touch_points": 0,
"webgl_vendor": "Apple Inc.",
"webgl_renderer": "Apple M1 GPU"
},
{
"identity_id": "la_win_firefox_09",
"device_group": "desktop-windows",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:115.0) Gecko/20100101 Firefox/115.0",
"viewport": (1360, 768),
"platform": "Win32",
"hardware_concurrency": 4,
"max_touch_points": 0,
"webgl_vendor": "Google Inc.",
"webgl_renderer": "ANGLE (Intel, Intel HD Graphics 630, Direct3D11 vs_5_0 ps_5_0)"
},
{
"identity_id": "la_linux_edge_10",
"device_group": "desktop-linux",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.113 Safari/537.36 Edg/125.0.2535.67",
"viewport": (1536, 864),
"platform": "Linux x86_64",
"hardware_concurrency": 4,
"max_touch_points": 0,
"webgl_vendor": "Google Inc.",
"webgl_renderer": "ANGLE (NVIDIA, GeForce RTX 3070, Direct3D11 vs_5_0 ps_5_0)"
}
]
Let's pick the first one apart:
- identity_id: Just a nickname for debugging. It helps you track which persona is active.
- device_group: The type of device we’re simulating. In this case, a Windows desktop.
- timezone: Tied to your proxy’s location. Here, it’s set to match a Los Angeles IP.
- user_agent: What the browser claims to be. This one mimics Edge on Windows.
- viewport: The screen resolution. This identity uses full HD: 1920x1080.
- platform: Another JavaScript fingerprint - this tells the site we’re on Win32.
- hardware_concurrency: How many CPU cores the browser exposes (JavaScript level).
- max_touch_points: Tells sites how many fingers the device supports. 0 = no touchscreen.
- webgl_vendor + webgl_renderer: GPU fingerprints. These must match your virtual setup to avoid flags.
You can use this as a starting point to build your own identity list. Just make sure to mix in a variety of OS, browsers, and hardware to help your scraper look like dozens of different real people.
Step 10: Define a Function to Pair a Proxy With a Matching Identity
The next step is to define a function that selects a random proxy from your proxy pool and pairs it with a browser identity from your identity pool, but only after verifying that both share the same time zone. Use the following function:
# Proxy assignment and identity logic
def assign_proxy_and_identity():
    proxy = random.choice(PROXIES)
    logging.info(f"Selected proxy: {proxy}")
    try:
        timezone = extract_timezone(proxy)
    except ValueError as e:
        logging.error(f"Failed to extract timezone: {e}")
        raise
    # Filter identities by matching timezone
    matching_identities = [idn for idn in IDENTITY_POOL if idn["timezone"] == timezone]
    if not matching_identities:
        logging.error(f"No identities found for timezone: {timezone}")
        raise Exception(f"No identities match timezone: {timezone}")
    identity = random.choice(matching_identities)
    logging.info(f"Assigned identity with user agent: {identity['user_agent']}")
    return proxy, identity
Here’s what it does:
- Selects a random proxy from your PROXIES list.
- Logs which proxy was chosen.
- Extracts the time zone from the proxy string using extract_timezone().
- Filters your identity pool for matches with the same time zone.
- If no match is found, it raises an error.
- Otherwise, it picks one identity at random.
- Logs the assigned identity’s user agent.
- Returns the matched proxy and identity pair.
This makes sure your scraper presents a consistent and believable browser profile that aligns with the geographic location of your proxy.
Step 11: Define Behavior Profiles for Stealth Scraping
To help your scraper stay undetected, define a list of behavior profiles that control how it interacts with the page. Each profile simulates a different type of user by tweaking things such as scroll speed, typing style, and mouse movement. Here’s the full list. You can customize or expand it as needed:
# Each profile defines timing, interaction, and navigation tendencies
BEHAVIOR_PROFILES = [
{
"name": "Fast Clicker",
"base_delay": 0.5,
"scroll_pattern": "none",
"hover_before_click": False,
"re_click_probability": 0.05,
"slow_typing": False,
"move_mouse_between_actions": True,
},
{
"name": "Deliberate Reader",
"base_delay": 2.5,
"scroll_pattern": "linear",
"hover_before_click": True,
"re_click_probability": 0.02,
"slow_typing": True,
"move_mouse_between_actions": True,
},
{
"name": "Back-and-Forth Browser",
"base_delay": 1.2,
"scroll_pattern": "jittery",
"hover_before_click": True,
"re_click_probability": 0.1,
"slow_typing": True,
"move_mouse_between_actions": True,
},
{
"name": "Precise Shopper",
"base_delay": 1.0,
"scroll_pattern": "none",
"hover_before_click": False,
"re_click_probability": 0.01,
"slow_typing": True,
"move_mouse_between_actions": True,
},
{
"name": "Researcher",
"base_delay": 2.0,
"scroll_pattern": "linear",
"hover_before_click": True,
"re_click_probability": 0.03,
"slow_typing": True,
"move_mouse_between_actions": True,
},
{
"name": "Curious Clicker",
"base_delay": 1.7,
"scroll_pattern": "jittery",
"hover_before_click": True,
"re_click_probability": 0.2,
"slow_typing": False,
"move_mouse_between_actions": True,
},
{
"name": "Hesitant User",
"base_delay": 3.0,
"scroll_pattern": "none",
"hover_before_click": True,
"re_click_probability": 0.0,
"slow_typing": True,
"move_mouse_between_actions": True,
},
]
Each profile includes these core elements:
- name: A simple nickname to identify the profile in logs or during tests.
- base_delay: The base wait time (in seconds) between actions like typing, clicking, or scrolling. A value like 0.5 simulates a fast user, while higher values mimic slower, more deliberate interaction.
- scroll_pattern: Controls whether and how the scraper scrolls the page. Options like "linear" or "jittery" simulate browsing, while "none" skips scrolling entirely.
- hover_before_click: If set to True, the bot will briefly hover over elements before clicking, mimicking natural user behavior.
- re_click_probability: Adds randomness by allowing the bot to occasionally click the same button twice, just like a real user might.
- slow_typing: If enabled, simulates human-like keystroke delays instead of typing everything instantly.
- move_mouse_between_actions: When True, the bot moves the mouse cursor between actions to avoid looking robotic.
These profiles add unpredictability and variation, which helps your scraper bypass detection.
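To make the timing fields concrete, here's a small helper sketch showing how a profile's base_delay can be combined with jitter. The function name human_pause is our own shorthand; the scraper later applies the same pattern inline with time.sleep() and random.uniform():
import time
import random

def human_pause(behavior, jitter=0.3):
    # Wait for the profile's base delay plus a little random jitter
    time.sleep(behavior["base_delay"] + random.uniform(0, jitter))

# A 'Deliberate Reader' pauses roughly 2.5-2.8 seconds between actions
human_pause({"name": "Deliberate Reader", "base_delay": 2.5})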
Step 12: Define a Function to Pick a Random Behavior Profile
To make each session act like a unique user, we define a function that randomly selects one of the behavior profiles we created earlier.
# Function to randomly select a behavior profile for a session
def assign_behavior_profile():
return random.choice(BEHAVIOR_PROFILES)
This function pulls a random user behavior style, such as ‘Fast Clicker’ or ‘Deliberate Reader’, ensuring that every scraping run behaves slightly differently.
Step 13: Prepare Your CSV File for Output
Let’s set up a clean CSV file to store your scraping results. This function checks if yellowpages_data.csv already exists. If not, it creates it and adds column headers for timestamp, name, and phone.
def initialize_csv():
    filename = "yellowpages_data.csv"
    if not os.path.exists(filename):
        with open(filename, mode="w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp", "name", "phone"])
That’s it, groundwork’s done. Now it’s time to start building the scraping logic.
Step 14: Launch Your Stealth Browser
Time to build the heart of your bot: the undetected browser launcher. This function spins up a stealth Chrome instance patched against major fingerprint leaks, helping you stay invisible to anti-bot systems.
def launch_stealth_browser(proxy: str, identity: dict):
"""
Launch a stealthy undetected Chrome browser instance with proxy and identity settings applied.
"""
# Extract the proxy host and port (credentials are handled via IP whitelisting)
proxy_address = proxy.split("/")[2] # Gets 'geo.iproyal.com:12321' from the whitelisted proxy string
# Stealthy browser options
options = uc.ChromeOptions()
options.add_argument(f'--proxy-server=http://{proxy_address}')
options.add_argument(f"--window-size={identity['viewport'][0]},{identity['viewport'][1]}")
options.add_argument(f"--lang=en-US,en;q=0.9")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_experimental_option("prefs", {
"credentials_enable_service": False,
"profile.password_manager_enabled": False
})
if HEADLESS:
options.add_argument("--headless=new")
# Start undetected Chrome driver
driver = uc.Chrome(options=options)
driver.execute_script("window.localStorage.setItem('bb_test_key', 'value');")
driver.execute_script("window.sessionStorage.setItem('bb_test_key', 'value');")
driver.execute_script("document.cookie = 'bb_user_sim=1234567; path=/';")
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": f"""
Object.defineProperty(navigator, 'webdriver', {{get: () => undefined}});
Object.defineProperty(navigator, 'userAgent', {{get: () => "{identity['user_agent']}" }});
Object.defineProperty(navigator, 'platform', {{get: () => "{identity['platform']}" }});
Object.defineProperty(navigator, 'hardwareConcurrency', {{get: () => {identity['hardware_concurrency']} }});
Object.defineProperty(navigator, 'maxTouchPoints', {{get: () => {identity['max_touch_points']} }});
Object.defineProperty(navigator, 'languages', {{get: () => ['en-US', 'en'] }});
Object.defineProperty(navigator, 'language', {{get: () => 'en-US' }});
"""
})
# Enable DevTools protocol
driver.execute_cdp_cmd("Page.enable", {})
# Add fake chrome.runtime support
driver.execute_cdp_cmd("Runtime.evaluate", {
"expression": "Object.defineProperty(navigator, 'chrome', { get: () => ({ runtime: {} }) });"
})
driver.execute_script("""
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5]
});
Object.defineProperty(navigator, 'mimeTypes', {
get: () => [1, 2, 3]
});
navigator.mediaDevices = {
enumerateDevices: () => Promise.resolve([
{ kind: 'audioinput', label: 'Built-in Microphone' },
{ kind: 'videoinput', label: 'Integrated Camera' }
])
};
Object.defineProperty(screen, 'colorDepth', {
get: () => 24
});
""")
# Proxy authentication via DevTools Protocol
driver.execute_cdp_cmd('Network.enable', {})
# Set timezone via DevTools
driver.execute_cdp_cmd("Emulation.setTimezoneOverride", {"timezoneId": identity["timezone"]})
# Inject WebGL vendor/renderer spoofing if needed
driver.execute_script("""
Object.defineProperty(navigator, 'platform', {get: () => '%s'});
Object.defineProperty(navigator, 'hardwareConcurrency', {get: () => %d});
Object.defineProperty(navigator, 'maxTouchPoints', {get: () => %d});
Object.defineProperty(navigator, 'languages', {get: () => ['en-US', 'en']});
Object.defineProperty(navigator, 'language', {get: () => 'en-US'});
Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
""" % (
identity['platform'],
identity['hardware_concurrency'],
identity['max_touch_points']
))
behavior = assign_behavior_profile()
webgl_vendor = identity['webgl_vendor']
webgl_renderer = identity['webgl_renderer']
driver.execute_script(f'''
const getParameter = WebGLRenderingContext.prototype.getParameter;
WebGLRenderingContext.prototype.getParameter = function(parameter) {{
if (parameter === 37445) return "{webgl_vendor}";
if (parameter === 37446) return "{webgl_renderer}";
return getParameter(parameter);
}};
''')
logging.info(f"WebGL spoof applied: {webgl_vendor} / {webgl_renderer}")
# Canvas fingerprint spoofing and AudioBuffer
driver.execute_script('''
const origToDataURL = HTMLCanvasElement.prototype.toDataURL;
HTMLCanvasElement.prototype.toDataURL = function() {
return "data:image/png;base64,canvasfakestring==";
};
const origGetChannelData = AudioBuffer.prototype.getChannelData;
AudioBuffer.prototype.getChannelData = function() {
const results = origGetChannelData.apply(this, arguments);
for (let i = 0; i < results.length; i++) {
results[i] = results[i] + Math.random() * 0.0000001;
}
return results;
};
''')
logging.info("Stealth browser launched successfully.")
return driver, behavior
We can't go line by line to see what each code block does, so let's look at it from a high-level perspective. You can modify this code as needed to suit your specific setup.
What you’ll notice right off the bat is that this function takes input from most of the functions we designed above, specifically the proxy and identity objects that define how the browser should behave.
The first thing it does is launch the stealth browser using undetected-chromedriver, with the proxy and identity as the two main arguments. It extracts the proxy address from the whole string, then builds a custom Chrome session.
During this setup, it applies the proxy, sets the screen size based on your identity profile, sets the language headers, turns off automation features, and patches things such as localStorage, sessionStorage, and cookies.
What comes next is a full-on fingerprint override. The function spoofs everything from user agent, platform, WebGL renderer/vendor, to navigator flags like webdriver, plugins, and even time zone. It also injects fake audio and canvas fingerprints to avoid canvas/audio-based tracking.
All of this is what makes the browser “undetectable.” Once spoofing is in place, the function returns the driver and a behavior profile, allowing your scraper to blend in like a real human user.
Step 15: Verify Your Stealth Setup
Let’s make sure everything we’ve built so far is actually working. To do that, we’ll use SannySoft, a website designed to show exactly what your browser reveals to the outside world. Add the following function to your code to verify that your spoofed identity and stealth patches are sticking:
def verify_stealth_setup(driver, identity):
try:
# Go to a tool that reflects fingerprint info
driver.get("https://bot.sannysoft.com")
time.sleep(3)
# Step 1: Basic navigator checks using JS
navigator_checks = driver.execute_script("""
return {
webdriver: navigator.webdriver === undefined,
platform: navigator.platform,
hardwareConcurrency: navigator.hardwareConcurrency,
languages: navigator.languages,
userAgent: navigator.userAgent
};
""")
if not navigator_checks['webdriver']:
logging.warning("webdriver flag detected — browser likely flagged as bot")
return False
if navigator_checks['platform'] != identity['platform']:
logging.warning(f"Platform mismatch: expected {identity['platform']}, got {navigator_checks['platform']}")
return False
if navigator_checks['userAgent'] != identity['user_agent']:
logging.warning(f"User-Agent mismatch: expected {identity['user_agent']}, got {navigator_checks['userAgent']}")
return False
# Step 2: Timezone check via JS
browser_tz = driver.execute_script("return Intl.DateTimeFormat().resolvedOptions().timeZone")
if browser_tz != identity['timezone']:
logging.warning(f"Timezone mismatch: expected {identity['timezone']}, got {browser_tz}")
return False
# Passed all checks
logging.info("Stealth verification passed.")
return True
except Exception as e:
logging.error(f"Error during stealth verification: {e}")
return False
Since we set HEADLESS = False, you’ll be able to watch the verification run in real time. The function first retrieves the spoofed fingerprints and then compares them against the identity you assigned. This step simply confirms that the browser appears exactly how you told it to.
Step 16: Searching Yellow Pages Like a Human
We’re now ready to write a function that can search Yellow Pages. For this tutorial, remember we’re scraping roofing contractors in Los Angeles. The Yellow Pages homepage has two input fields: a 'Find a business' search box and a location box.
Now, we could go straight to the results page that lists roofing contractors in LA. But that shortcut increases our chances of getting flagged. Bots that skip the natural flow of a site tend to stand out.
Instead, the safest path is to start like a clueless user:
- Land on the homepage
- Manually type in the location
- Type in the business type
- And click 'Find'
That way, we blend in with expected user behavior and drastically reduce our odds of getting caught.
On the homepage, right-click the 'Find a business' search box and click 'Inspect'. In the highlighted HTML, you will see something like this:
<input id="query" type="text" value="" placeholder="Find a business" autocomplete="off" data-onempty="recent-searches" name="search_terms">
Take note of the id="query" - that’s what we’ll use in our code. Then do the same for the location box. You'll see something like:
<input id="location" type="text" value="Cleveland, OH" placeholder="Where?" autocomplete="off" data-onempty="menu-location" name="geo_location_terms" data-gtm-form-interact-field-id="0">
Take id="location" as your second target.
Now we can add these elements to our code below:
def search_yellowpages_homepage(driver, behavior, query="roofing contractors", location="Los Angeles, CA"):
try:
logging.info("Navigating to YellowPages homepage...")
driver.get("https://www.yellowpages.com")
search_input = WebDriverWait(driver, 20).until(
EC.presence_of_element_located((By.ID, "query"))
)
location_input = driver.find_element(By.ID, "location")
# Type location first + accept autosuggest
location_input.clear()
for ch in location:
location_input.send_keys(ch)
time.sleep(behavior["base_delay"] + random.uniform(0.05, 0.18))
location_input.send_keys(Keys.TAB)
time.sleep(random.uniform(0.2, 0.5))
# Type query with tiny typo chance
typo_chance = 0.15 if behavior.get("slow_typing") else 0.05
base_delay = behavior["base_delay"]
search_input.clear()
for ch in query:
if random.random() < typo_chance:
wrong = random.choice("abcdefghijklmnopqrstuvwxyz")
search_input.send_keys(wrong)
time.sleep(base_delay + random.uniform(0.08, 0.22))
search_input.send_keys(Keys.BACKSPACE)
time.sleep(base_delay + random.uniform(0.08, 0.22))
search_input.send_keys(ch)
time.sleep(base_delay + random.uniform(0.08, 0.22))
# mild human wiggle
try:
ActionChains(driver).move_by_offset(random.randint(5,30), random.randint(5,30)).pause(random.uniform(0.2,0.6)).perform()
except Exception:
pass
search_input.send_keys(Keys.RETURN)
# Wait for results + URL to settle
WebDriverWait(driver, 30).until(
EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.srp-listing, .result"))
)
time.sleep(base_delay + random.uniform(1.5, 3.0))
WebDriverWait(driver, 10).until(lambda d: "/search" in d.current_url or "?search_terms=" in d.current_url)
url = driver.current_url
logging.info(f"YellowPages search completed. URL captured: {url}")
return url
except Exception as e:
logging.error(f"Error during YellowPages homepage search: {e}")
return None
Let’s break down exactly what’s happening here:
- def search_yellowpages_homepage(driver, behavior, query="roofing contractors", location="Los Angeles, CA")
This function instructs the bot to visit Yellow Pages, enter a location and business type, and submit the search. It takes the customized stealth browser (driver), a random behavior profile (behavior), and default search terms (query and location).
- driver.get("https://www.yellowpages.com")
This line opens the Yellow Pages homepage. It’s inside a try block to prevent the scraper from crashing if anything goes wrong.
- WebDriverWait(driver, 20).until(...)
Waits up to 20 seconds for the search input box to appear. Specifically, it targets the input field with the ID "query".
- location_input = driver.find_element(By.ID, "location")
After the page loads, this grabs the location input field using the ID selector.
- location_input.clear() + typing the location
Clears any pre-filled text, then types the location character by character.
Delays between keystrokes are randomized based on the selected behavior profile to mimic human typing.
- typo_chance = 0.15 if behavior.get("slow_typing") else 0.05
This sets how likely the bot is to make a typo. Slow typists (in the profile) have a 15% typo chance, otherwise, it's 5% - helping the bot appear more human.
- search_input.clear() + typing the business query
Just like the location field, we clear the search box and type the business name letter by letter. We simulate occasional typos and correct them using the backspace key, just like a human would.
- search_input.send_keys(Keys.RETURN)
Once typing is done, this line mimics pressing Enter to submit the search.
- WebDriverWait(...) + time.sleep(...)
After submitting, we wait for the results section to load (using .srp-listing as a signal), then add a random delay to mimic natural idle time before scraping.
- url = driver.current_url
Captures the results page URL. This is essential for the next scraping step.
If an error arises, we catch it and log it using the except block.
Step 17: Define a Function That Scrapes Business Listings
Now that we’ve captured a clean, fingerprint-free URL, let’s use it to extract listings for roofing contractors. Add the following function:
def scrape_yellowpages(driver, behavior, url=None):
results = []
try:
# IMPORTANT: if we're already on the results page, do NOT reload it
if url:
current = driver.current_url
# Only navigate if we're not already there (avoid losing referrer/cookies)
if not current or current != url:
driver.get(url)
WebDriverWait(driver, 30).until(
EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.srp-listing, .result"))
)
# tiny scroll & idle
driver.execute_script("window.scrollBy(0, arguments[0]);", random.randint(150, 600))
time.sleep(random.uniform(1.2, 2.5))
listings = driver.find_elements(By.CSS_SELECTOR, "div.srp-listing, .result")
for idx, listing in enumerate(listings[:MAX_LISTINGS_PER_PAGE]):
try:
name_el = listing.find_element(By.CSS_SELECTOR, "h2.n a.business-name span, a.business-name span, a.business-name")
name = name_el.text.strip()
# phone can be missing or behind different templates
phone = ""
try:
phone_el = listing.find_element(By.CSS_SELECTOR, "div.phones.phone.primary, .phones")
phone = phone_el.text.strip()
except Exception:
pass
logging.info(f"Found listing: {name} {phone}")
results.append((name, phone))
if idx and idx % 5 == 0:
time.sleep(random.uniform(2.0, 4.5))
except Exception as e:
logging.debug(f"Skipping one listing: {e}")
continue
# light de-robotize movement
if behavior.get("move_mouse_between_actions"):
try:
header = driver.find_element(By.CSS_SELECTOR, "header, .header, body")
ActionChains(driver).move_to_element(header).pause(random.uniform(0.5, 1.1)).perform()
except Exception:
pass
time.sleep(random.uniform(1.8, 3.8))
return results
except Exception as e:
logging.error(f"YellowPages scraping failed: {e}")
return []
Here's what's happening:
- def scrape_yellowpages(driver, behavior, url=None)
We define a function that takes three arguments: the stealth Chrome browser we launched earlier, a behavior profile, and the search results URL we captured earlier.
- results = []
Initializes an empty list where we'll store (name, phone) tuples for each business found.
- if url:
If a URL was passed from the previous function, the bot gets the current page URL the driver is sitting on.
- if not current or current != url:
Avoids reloading the sample page to preserve referrers, cookies, and session tokens.
- Logging and WebDriverWait
Begins a try block to catch errors gracefully. It then logs the start of the scraping process and loads the search results page using the captured URL.
- driver.execute_script(...)
Scrolls the window a little bit downward to mimic a small human scroll.
- time.sleep(random.uniform(1.2, 2.5))
Adds a randomized pause to simulate human “reading” time before interacting with the page.
- listings = driver.find_elements(By.CSS_SELECTOR, "div.srp-listing, .result")
Collects all visible business listing elements into a list for processing.
- for idx, listing in enumerate(listings[:MAX_LISTINGS_PER_PAGE]) + try block
Iterates over each business listing and extracts the company name and phone number using CSS selectors. Cleans the text to remove whitespace, then logs the result and appends it to the results list.
Step 18: Define a Function to Store the Results in a CSV File
Now that we’ve collected the business listings and created an empty CSV file, we use the function below to transfer the data into that file. Each listing is written as a new row, including the name, phone number, and a timestamp.
def log_business_to_csv(name, phone):
    with open("yellowpages_data.csv", mode="a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([datetime.now().isoformat(), name, phone])
Step 19: Define a Function to Detect Bot Blocks
Before we initialize the CSV, we need to check whether the bot has been soft-blocked. If we skip this step and a block occurs, the script could exit with a blank CSV file.
Use the following function to detect blocks like CAPTCHAs, rate limits, or access denials:
def detect_block(driver):
    html = driver.page_source.lower()
    if "unusual traffic" in html or "are you a robot" in html or "captcha" in html:
        return "captcha"
    if "access denied" in html or "403 forbidden" in html:
        return "forbidden"
    if "automated queries" in html or "too many requests" in html:
        return "rate_limit"
    if len(html.strip()) < 1000:
        return "empty"
    return None
initialize_csv()
Step 20: Write the Main Execution Block
Now that all the functions are defined, it’s time to put them to work. This step creates the main execution loop that ties everything together with smart retry logic. Without it, our script would just be a collection of unused functions - this is what actually runs the scraper.
Add the code below to execute your full scraping workflow:
for attempt in range(MAX_RETRIES):
    try:
        proxy, identity = assign_proxy_and_identity()
        driver, behavior = launch_stealth_browser(proxy, identity)
        logging.info(f" Behavior profile: {behavior['name']}")
        # Optional health check — keep or remove
        if not verify_stealth_setup(driver, identity):
            raise Exception("Stealth verification failed.")
        block = detect_block(driver)
        if block:
            raise Exception(f"Pre-check block: {block}")
        url = search_yellowpages_homepage(driver, behavior,
                                          query="roofing contractors",
                                          location="Los Angeles, CA")
        if not url:
            raise Exception("Search submission failed or listings did not load.")
        results = scrape_yellowpages(driver, behavior, url)
        block = detect_block(driver)
        if block:
            raise Exception(f"Blocked while scraping YP: {block}")
        if results:
            for name, phone in results:
                print(f" {name} — {phone}")
                log_business_to_csv(name, phone)
            driver.quit()
            sys.exit(0)
        else:
            logging.warning("No YellowPages listings found on scanned pages.")
            driver.quit()
            # fall through to retry with new proxy
    except Exception as e:
        logging.error(f"Attempt {attempt + 1} failed: {e}")
        backoff = BLOCK_BACKOFF_STEPS[min(attempt, len(BLOCK_BACKOFF_STEPS) - 1)]
        logging.info(f"Waiting {backoff}s before retry with new proxy/identity.")
        try:
            driver.quit()
        except:
            pass
        time.sleep(backoff)
else:
    logging.critical(" All retry attempts failed. Exiting.")
    sys.exit(1)
This block begins by starting a retry loop: for attempt in range(MAX_RETRIES). It runs the scraping logic multiple times in case of failure, based on the MAX_RETRIES global setting.
Inside the try block, it first calls the assign_proxy_and_identity() function to pick a fresh rotating proxy and a spoofed identity profile. It then launches a stealth browser session using launch_stealth_browser(), which launches an undetected-chromedriver instance configured with the assigned identity. Immediately after, it logs the behavior profile being used.
Next, it verifies whether the stealth setup was successful by calling verify_stealth_setup(), then checks the page source for any soft blocks using detect_block(). If no blocks are found, it calls the search_yellowpages_homepage() function to simulate a human search and capture the resulting URL.
That URL is then passed into the scrape_yellowpages() function, which handles the actual data extraction. After scraping, it performs one more block check. If the results are clean and contain business listings, it logs them to a CSV file using log_business_to_csv(). If no data is found or any part of the process fails, it exits or retries using the next proxy.
Step 21: Run Your Yellow Pages Scraper
Here is the full Yellow Pages scraper code:
#Browser Control & Stealth Automation
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import undetected_chromedriver as uc
#Proxy, Fingerprint & Spoofing Logic
import time
import random
import os
import sys
import uuid
import logging
import re
# Data Logging & Output
import csv
from datetime import datetime
# Configure logging
logging.basicConfig(level=logging.INFO, format='[%(asctime)s] %(message)s')
# Global Settings
HEADLESS = False # Set to True later if you spoof it correctly
MAX_RETRIES = 5
BLOCK_BACKOFF_STEPS = [5, 10, 20, 30] # seconds
MAX_LISTINGS_PER_PAGE = 30
#Proxy pool
PROXIES = [
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-TCDPEKCM_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-ZbgnB6u4_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-wrAACjQS_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-EE8Rhsih_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-fGo0pPHi_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-7JrqLZN2_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-mCRJTcks_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-51dDpfgf_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-Ve2WXTEV_lifetime-30m",
"http://geo.iproyal.com:12321/country-us_city-losangeles_session-jurD1lGK_lifetime-30m"
]
TIMEZONE_MAP = {
"city-losangeles": "America/Los_Angeles"
}
def extract_timezone(proxy_string):
match = re.search(r"(city|state)-([a-z]+)", proxy_string)
if not match:
raise ValueError(f"Could not extract location from: {proxy_string}")
loc_key = f"{match.group(1)}-{match.group(2)}"
timezone = TIMEZONE_MAP.get(loc_key)
if not timezone:
raise ValueError(f"No timezone mapped for: {loc_key}")
return timezone
IDENTITY_POOL = [
{
"identity_id": "la_win_chrome_11",
"device_group": "desktop-windows",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.62 Safari/537.36",
"viewport": (1920, 1080),
"platform": "Win32",
"hardware_concurrency": 12,
"max_touch_points": 0,
"webgl_vendor": "NVIDIA Corporation",
"webgl_renderer": "NVIDIA GeForce RTX 4060"
},
{
"identity_id": "la_mac_safari_12",
"device_group": "desktop-mac",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15",
"viewport": (1728, 1117),
"platform": "MacIntel",
"hardware_concurrency": 10,
"max_touch_points": 0,
"webgl_vendor": "Apple Inc.",
"webgl_renderer": "Apple M3 Pro GPU"
},
{
"identity_id": "la_linux_firefox_13",
"device_group": "desktop-linux",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0",
"viewport": (1600, 900),
"platform": "Linux x86_64",
"hardware_concurrency": 6,
"max_touch_points": 0,
"webgl_vendor": "X.Org",
"webgl_renderer": "Mesa Intel(R) UHD Graphics 730"
},
{
"identity_id": "la_win_edge_14",
"device_group": "desktop-windows",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.62 Safari/537.36 Edg/126.0.2592.87",
"viewport": (1366, 768),
"platform": "Win32",
"hardware_concurrency": 8,
"max_touch_points": 0,
"webgl_vendor": "Intel Inc.",
"webgl_renderer": "Intel Iris Xe Graphics"
},
{
"identity_id": "la_mac_chrome_15",
"device_group": "desktop-mac",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.141 Safari/537.36",
"viewport": (1440, 900),
"platform": "MacIntel",
"hardware_concurrency": 8,
"max_touch_points": 0,
"webgl_vendor": "Apple Inc.",
"webgl_renderer": "Apple M2 GPU"
},
{
"identity_id": "la_linux_edge_16",
"device_group": "desktop-linux",
"timezone": "America/Los_Angeles",
"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.62 Safari/537.36 Edg/126.0.2592.87",
"viewport": (1536, 864),
"platform": "Linux x86_64",
"hardware_concurrency": 4,
"max_touch_points": 0,
"webgl_vendor": "AMD",
"webgl_renderer": "Radeon RX 6700 XT"
}
]
# Proxy assignment and identity logic
def assign_proxy_and_identity():
proxy = random.choice(PROXIES)
logging.info(f"Selected proxy: {proxy}")
try:
timezone = extract_timezone(proxy)
except ValueError as e:
logging.error(f"Failed to extract timezone: {e}")
raise
# Filter identities by matching timezone
matching_identities = [idn for idn in IDENTITY_POOL if idn["timezone"] == timezone]
if not matching_identities:
logging.error(f"No identities found for timezone: {timezone}")
raise Exception(f"No identities match timezone: {timezone}")
identity = random.choice(matching_identities)
logging.info(f"Assigned identity with user agent: {identity['user_agent']}")
return proxy, identity
# Each profile defines timing, interaction, and navigation tendencies
BEHAVIOR_PROFILES = [
{
"name": "Fast Clicker",
"base_delay": 0.5,
"scroll_pattern": "none",
"hover_before_click": False,
"re_click_probability": 0.05,
"slow_typing": False,
"move_mouse_between_actions": True,
},
{
"name": "Deliberate Reader",
"base_delay": 2.5,
"scroll_pattern": "linear",
"hover_before_click": True,
"re_click_probability": 0.02,
"slow_typing": True,
"move_mouse_between_actions": True,
},
{
"name": "Back-and-Forth Browser",
"base_delay": 1.2,
"scroll_pattern": "jittery",
"hover_before_click": True,
"re_click_probability": 0.1,
"slow_typing": True,
"move_mouse_between_actions": True,
},
{
"name": "Precise Shopper",
"base_delay": 1.0,
"scroll_pattern": "none",
"hover_before_click": False,
"re_click_probability": 0.01,
"slow_typing": True,
"move_mouse_between_actions": True,
},
{
"name": "Researcher",
"base_delay": 2.0,
"scroll_pattern": "linear",
"hover_before_click": True,
"re_click_probability": 0.03,
"slow_typing": True,
"move_mouse_between_actions": True,
},
{
"name": "Curious Clicker",
"base_delay": 1.7,
"scroll_pattern": "jittery",
"hover_before_click": True,
"re_click_probability": 0.2,
"slow_typing": False,
"move_mouse_between_actions": True,
},
{
"name": "Hesitant User",
"base_delay": 3.0,
"scroll_pattern": "none",
"hover_before_click": True,
"re_click_probability": 0.0,
"slow_typing": True,
"move_mouse_between_actions": True,
},
]
# Function to randomly select a behavior profile for a session
def assign_behavior_profile():
return random.choice(BEHAVIOR_PROFILES)
def initialize_csv():
filename = "yellowpages_data.csv"
if not os.path.exists(filename):
with open(filename, mode="w", newline="", encoding="utf-8") as f:
writer = csv.writer(f)
writer.writerow(["timestamp", "name", "phone"])
# Stealth browser launch function
def launch_stealth_browser(proxy: str, identity: dict):
"""
Launch a stealthy undetected Chrome browser instance with proxy and identity settings applied.
"""
    # Extract the proxy host:port from the proxy URL
    proxy_address = proxy.split("/")[2]  # e.g. 'geo.iproyal.com:12321' from a whitelisted proxy string
# Stealthy browser options
options = uc.ChromeOptions()
options.add_argument(f'--proxy-server=http://{proxy_address}')
options.add_argument(f"--window-size={identity['viewport'][0]},{identity['viewport'][1]}")
options.add_argument(f"--lang=en-US,en;q=0.9")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_experimental_option("prefs", {
"credentials_enable_service": False,
"profile.password_manager_enabled": False
})
if HEADLESS:
options.add_argument("--headless=new")
# Start undetected Chrome driver
driver = uc.Chrome(options=options)
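    # Seed storage and a cookie so the fresh session doesn't look completely blank
    # (these writes only apply to whatever page is currently loaded in the new browser)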
driver.execute_script("window.localStorage.setItem('bb_test_key', 'value');")
driver.execute_script("window.sessionStorage.setItem('bb_test_key', 'value');")
driver.execute_script("document.cookie = 'bb_user_sim=1234567; path=/';")
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": f"""
Object.defineProperty(navigator, 'webdriver', {{get: () => undefined}});
Object.defineProperty(navigator, 'userAgent', {{get: () => "{identity['user_agent']}" }});
Object.defineProperty(navigator, 'platform', {{get: () => "{identity['platform']}" }});
Object.defineProperty(navigator, 'hardwareConcurrency', {{get: () => {identity['hardware_concurrency']} }});
Object.defineProperty(navigator, 'maxTouchPoints', {{get: () => {identity['max_touch_points']} }});
Object.defineProperty(navigator, 'languages', {{get: () => ['en-US', 'en'] }});
Object.defineProperty(navigator, 'language', {{get: () => 'en-US' }});
"""
})
# Enable DevTools protocol
driver.execute_cdp_cmd("Page.enable", {})
    # Add a fake chrome.runtime object to the current document (headless Chrome lacks window.chrome)
    driver.execute_cdp_cmd("Runtime.evaluate", {
        "expression": "window.chrome = window.chrome || { runtime: {} };"
    })
driver.execute_script("""
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5]
});
Object.defineProperty(navigator, 'mimeTypes', {
get: () => [1, 2, 3]
});
        // navigator.mediaDevices only exposes a getter, so a plain assignment is silently ignored;
        // define an own property instead so the fake device list actually takes effect
        Object.defineProperty(navigator, 'mediaDevices', {
            get: () => ({
                enumerateDevices: () => Promise.resolve([
                    { kind: 'audioinput', label: 'Built-in Microphone' },
                    { kind: 'videoinput', label: 'Integrated Camera' }
                ])
            })
        });
Object.defineProperty(screen, 'colorDepth', {
get: () => 24
});
""")
    # Enable the Network domain (the proxy is IP-whitelisted, so no credentials need to be injected)
    driver.execute_cdp_cmd('Network.enable', {})
# Set timezone via DevTools
driver.execute_cdp_cmd("Emulation.setTimezoneOverride", {"timezoneId": identity["timezone"]})
    # Re-apply the navigator property overrides to the current document
driver.execute_script("""
Object.defineProperty(navigator, 'platform', {get: () => '%s'});
Object.defineProperty(navigator, 'hardwareConcurrency', {get: () => %d});
Object.defineProperty(navigator, 'maxTouchPoints', {get: () => %d});
Object.defineProperty(navigator, 'languages', {get: () => ['en-US', 'en']});
Object.defineProperty(navigator, 'language', {get: () => 'en-US'});
Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
""" % (
identity['platform'],
identity['hardware_concurrency'],
identity['max_touch_points']
))
behavior = assign_behavior_profile()
webgl_vendor = identity['webgl_vendor']
webgl_renderer = identity['webgl_renderer']
driver.execute_script(f'''
const getParameter = WebGLRenderingContext.prototype.getParameter;
WebGLRenderingContext.prototype.getParameter = function(parameter) {{
if (parameter === 37445) return "{webgl_vendor}";
if (parameter === 37446) return "{webgl_renderer}";
return getParameter(parameter);
}};
''')
logging.info(f"WebGL spoof applied: {webgl_vendor} / {webgl_renderer}")
# Canvas fingerprint spoofing and AudioBuffer
driver.execute_script('''
const origToDataURL = HTMLCanvasElement.prototype.toDataURL;
HTMLCanvasElement.prototype.toDataURL = function() {
return "data:image/png;base64,canvasfakestring==";
};
const origGetChannelData = AudioBuffer.prototype.getChannelData;
AudioBuffer.prototype.getChannelData = function() {
const results = origGetChannelData.apply(this, arguments);
for (let i = 0; i < results.length; i++) {
results[i] = results[i] + Math.random() * 0.0000001;
}
return results;
};
''')
logging.info("Stealth browser launched successfully.")
return driver, behavior
def verify_stealth_setup(driver, identity):
try:
# Go to a tool that reflects fingerprint info
driver.get("https://bot.sannysoft.com")
time.sleep(3)
# Step 1: Basic navigator checks using JS
navigator_checks = driver.execute_script("""
return {
webdriver: navigator.webdriver === undefined,
platform: navigator.platform,
hardwareConcurrency: navigator.hardwareConcurrency,
languages: navigator.languages,
userAgent: navigator.userAgent
};
""")
if not navigator_checks['webdriver']:
logging.warning("webdriver flag detected — browser likely flagged as bot")
return False
if navigator_checks['platform'] != identity['platform']:
logging.warning(f"Platform mismatch: expected {identity['platform']}, got {navigator_checks['platform']}")
return False
if navigator_checks['userAgent'] != identity['user_agent']:
logging.warning(f"User-Agent mismatch: expected {identity['user_agent']}, got {navigator_checks['userAgent']}")
return False
# Step 2: Timezone check via JS
browser_tz = driver.execute_script("return Intl.DateTimeFormat().resolvedOptions().timeZone")
if browser_tz != identity['timezone']:
logging.warning(f"Timezone mismatch: expected {identity['timezone']}, got {browser_tz}")
return False
# Passed all checks
logging.info("Stealth verification passed.")
return True
except Exception as e:
logging.error(f"Error during stealth verification: {e}")
return False
def search_yellowpages_homepage(driver, behavior, query="roofing contractors", location="Los Angeles, CA"):
try:
logging.info("Navigating to YellowPages homepage...")
driver.get("https://www.yellowpages.com")
search_input = WebDriverWait(driver, 20).until(
EC.presence_of_element_located((By.ID, "query"))
)
location_input = driver.find_element(By.ID, "location")
# Type location first + accept autosuggest
location_input.clear()
for ch in location:
location_input.send_keys(ch)
time.sleep(behavior["base_delay"] + random.uniform(0.05, 0.18))
location_input.send_keys(Keys.TAB)
time.sleep(random.uniform(0.2, 0.5))
# Type query with tiny typo chance
typo_chance = 0.15 if behavior.get("slow_typing") else 0.05
base_delay = behavior["base_delay"]
search_input.clear()
for ch in query:
if random.random() < typo_chance:
wrong = random.choice("abcdefghijklmnopqrstuvwxyz")
search_input.send_keys(wrong)
time.sleep(base_delay + random.uniform(0.08, 0.22))
search_input.send_keys(Keys.BACKSPACE)
time.sleep(base_delay + random.uniform(0.08, 0.22))
search_input.send_keys(ch)
time.sleep(base_delay + random.uniform(0.08, 0.22))
# mild human wiggle
try:
ActionChains(driver).move_by_offset(random.randint(5,30), random.randint(5,30)).pause(random.uniform(0.2,0.6)).perform()
except Exception:
pass
search_input.send_keys(Keys.RETURN)
# Wait for results + URL to settle
WebDriverWait(driver, 30).until(
EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.srp-listing, .result"))
)
time.sleep(base_delay + random.uniform(1.5, 3.0))
WebDriverWait(driver, 10).until(lambda d: "/search" in d.current_url or "?search_terms=" in d.current_url)
url = driver.current_url
logging.info(f"YellowPages search completed. URL captured: {url}")
return url
except Exception as e:
logging.error(f"Error during YellowPages homepage search: {e}")
return None
def scrape_yellowpages(driver, behavior, url=None):
results = []
try:
# IMPORTANT: if we're already on the results page, do NOT reload it
if url:
current = driver.current_url
# Only navigate if we're not already there (avoid losing referrer/cookies)
if not current or current != url:
driver.get(url)
WebDriverWait(driver, 30).until(
EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.srp-listing, .result"))
)
# tiny scroll & idle
driver.execute_script("window.scrollBy(0, arguments[0]);", random.randint(150, 600))
time.sleep(random.uniform(1.2, 2.5))
listings = driver.find_elements(By.CSS_SELECTOR, "div.srp-listing, .result")
for idx, listing in enumerate(listings[:MAX_LISTINGS_PER_PAGE]):
try:
name_el = listing.find_element(By.CSS_SELECTOR, "h2.n a.business-name span, a.business-name span, a.business-name")
name = name_el.text.strip()
# phone can be missing or behind different templates
phone = ""
try:
phone_el = listing.find_element(By.CSS_SELECTOR, "div.phones.phone.primary, .phones")
phone = phone_el.text.strip()
except Exception:
pass
logging.info(f"Found listing: {name} {phone}")
results.append((name, phone))
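                # Pause every few listings so the reading rhythm looks human rather than machine-paced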
if idx and idx % 5 == 0:
time.sleep(random.uniform(2.0, 4.5))
except Exception as e:
logging.debug(f"Skipping one listing: {e}")
continue
# light de-robotize movement
if behavior.get("move_mouse_between_actions"):
try:
header = driver.find_element(By.CSS_SELECTOR, "header, .header, body")
ActionChains(driver).move_to_element(header).pause(random.uniform(0.5, 1.1)).perform()
except Exception:
pass
time.sleep(random.uniform(1.8, 3.8))
return results
except Exception as e:
logging.error(f"YellowPages scraping failed: {e}")
return []
def log_business_to_csv(name, phone):
with open("yellowpages_data.csv", mode="a", newline="", encoding="utf-8") as f:
writer = csv.writer(f)
writer.writerow([datetime.now().isoformat(), name, phone])
def detect_block(driver):
html = driver.page_source.lower()
if "unusual traffic" in html or "are you a robot" in html or "captcha" in html:
return "captcha"
if "access denied" in html or "403 forbidden" in html:
return "forbidden"
if "automated queries" in html or "too many requests" in html:
return "rate_limit"
if len(html.strip()) < 1000:
return "empty"
return None
initialize_csv()
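# Main retry loop: each attempt gets a fresh proxy, identity, and behavior profile,
# and backs off according to BLOCK_BACKOFF_STEPS whenever a block or failure is detected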
for attempt in range(MAX_RETRIES):
try:
proxy, identity = assign_proxy_and_identity()
driver, behavior = launch_stealth_browser(proxy, identity)
logging.info(f"Behavior profile: {behavior['name']}")
# Optional health check — keep or remove
if not verify_stealth_setup(driver, identity):
raise Exception("Stealth verification failed.")
block = detect_block(driver)
if block:
raise Exception(f"Pre-check block: {block}")
url = search_yellowpages_homepage(driver, behavior,
query="roofing contractors",
location="Los Angeles, CA")
if not url:
raise Exception("Search submission failed or listings did not load.")
results = scrape_yellowpages(driver, behavior, url)
block = detect_block(driver)
if block:
raise Exception(f"Blocked while scraping YP: {block}")
if results:
for name, phone in results:
print(f"{name} — {phone}")
log_business_to_csv(name, phone)
driver.quit()
sys.exit(0)
else:
logging.warning("No YellowPages listings found on scanned pages.")
driver.quit()
# fall through to retry with new proxy
except Exception as e:
logging.error(f"Attempt {attempt + 1} failed: {e}")
backoff = BLOCK_BACKOFF_STEPS[min(attempt, len(BLOCK_BACKOFF_STEPS) - 1)]
logging.info(f"Waiting {backoff}s before retry with new proxy/identity.")
        try:
            driver.quit()
        except Exception:
            pass
time.sleep(backoff)
else:
logging.critical("All retry attempts failed. Exiting.")
sys.exit(1)
Save the code, then return to your Command Prompt (assuming you're still inside the yellowpages-scraper folder) and run:
python YellowPagesExtractor.py
It’s best to keep the Command Prompt and browser windows separate so you can monitor the scraper’s progress in real time. If you prefer to run it silently, just set HEADLESS to True.
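That flag is the module-level constant assumed to be defined in the configuration section near the top of the script, so flipping it is a one-line change:
HEADLESS = True  # launch_stealth_browser() then adds the --headless=new flag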
Once you press Enter, a new Chrome window will open. The bot will navigate to SannySoft to run the fingerprint test we built into the code. Here’s what that looks like in action:
After passing all checks, the bot navigates to Yellow Pages. Give Cloudflare a moment to complete its verification process, as shown below:
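If that check occasionally hangs on your runs, you could add a small guard before the search step. Here is a minimal sketch that waits for the interstitial to clear; the "just a moment" title text is an assumption about Cloudflare's challenge page, so adjust it if your runs show something different. You would call it right after driver.get("https://www.yellowpages.com") in search_yellowpages_homepage():
from selenium.webdriver.support.ui import WebDriverWait

def wait_for_cloudflare(driver, timeout=30):
    # Poll until the challenge title disappears; returns False if it never clears
    try:
        WebDriverWait(driver, timeout).until(
            lambda d: "just a moment" not in d.title.lower()
        )
        return True
    except Exception:
        return False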
Once that’s done, the bot lands on the homepage and begins typing the target location:
Next, it types the business keyword, “roofing contractors”:
It then hits Enter and waits for the page to load. This is the page where the extraction happens:
Once that's done, it will exit. Now for the moment of truth: Did our bot actually export the roofing contractor listings to a CSV file in our folder? Let's see. Navigate to our working folder and open the CSV file. You'll now see the extracted data below:
Cleaning Up Yellow Pages Business Data
While the phone numbers we scraped this time were mostly clean, that won’t always be the case. Yellow Pages listings often come with phone numbers in inconsistent formats, such as:
(310) 555-1234
310.555.1234
+1-310-555-1234
3105551234
1 (310) 555 1234
What you really want is a clean, standardized format:
+13105551234
Here’s a quick Python script that cleans them up:
import pandas as pd
import re
def normalize_us_phone(phone):
digits = re.sub(r"\D", "", phone)
if digits.startswith("1") and len(digits) == 11:
digits = digits[1:]
return f"+1{digits}" if len(digits) == 10 else None
df = pd.read_csv("yellowpages_data.csv")
df["CleanPhone"] = df["Phone"].apply(normalize_us_phone)
df = df[df["CleanPhone"].notnull()]
df.to_csv("yellowpages_cleaned.csv", index=False)
This script strips out symbols, standardizes all numbers to the +1XXXXXXXXXX format, and drops anything invalid.
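Business names can be messy too, with stray whitespace or punctuation carried over from the listing template. Here is a minimal sketch you could append to the cleaning script above, assuming the same df is still loaded:
def clean_name(name):
    name = re.sub(r"\s+", " ", str(name)).strip()  # collapse repeated whitespace
    return name.strip(".,|- ")                     # drop stray punctuation at either end

df["CleanName"] = df["name"].apply(clean_name)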
Using Scraped Data for Lead Generation
Once you’ve scraped and cleaned your data, the next step is enrichment. Start by appending missing email addresses using scraping tools or third-party platforms. You can also expand your script to capture business websites or social media profiles where available.
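Inside the listing loop of scrape_yellowpages(), for instance, you could also pull the website link. This is a minimal sketch; the a.track-visit-website selector is an assumption about the current YellowPages markup, so confirm it in DevTools first, and remember to add a matching column in initialize_csv() and log_business_to_csv():
# Inside the per-listing try block, next to the phone extraction:
website = ""
try:
    # Selector is an assumption about YellowPages' markup; verify it before relying on it
    website_el = listing.find_element(By.CSS_SELECTOR, "a.track-visit-website")
    website = website_el.get_attribute("href") or ""
except Exception:
    pass
# ...then extend results.append((name, phone)) to results.append((name, phone, website))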
With a fully enriched dataset, import everything into your CRM. Segment by location, industry, or business size, and build cold outreach campaigns that feel tailored. Personalized messaging drives better response rates, and when done right, this workflow turns scraped listings into a high-conversion lead generation pipeline.
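For example, a quick way to pre-segment the cleaned export by area code before a CRM import, assuming the yellowpages_cleaned.csv produced above:
import pandas as pd

df = pd.read_csv("yellowpages_cleaned.csv")
df["area_code"] = df["CleanPhone"].str[2:5]  # +1XXXXXXXXXX -> the three-digit area code
for area_code, group in df.groupby("area_code"):
    group.to_csv(f"leads_{area_code}.csv", index=False)  # one file per market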
Conclusion
You now have a working Yellow Pages scraper. Just remember: every time you run your scraper, it’s best to use a fresh batch of user agents, identity profiles, and proxies. The newer your fingerprint, the lower your chances of getting flagged. That first run usually gives you the best yield. Better yet, scale it. Use a pool of 100 proxies and 100 identities to minimize detection and maintain uninterrupted data flow.
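A minimal sketch of how you might feed in a bigger pool without hard-coding it; proxies.txt is a hypothetical file with one http://host:port entry per line, exported from your proxy provider:
def load_proxies(path="proxies.txt"):
    # One proxy URL per line; blank lines are skipped
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

PROXIES = load_proxies()  # assign_proxy_and_identity() then samples from the larger pool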