How to Scrape Dynamic Websites with Python (and Avoid Getting Blocked)


Vilius Dumcius
Key Takeaways
- Dynamic web pages require JavaScript rendering, so use tools like Selenium together with BeautifulSoup.
- Rotate user agents and headers, add random delays, and use residential proxies to avoid blocks.
- Write modular code so you can reuse scraper parts across different projects.
Scraping dynamic websites with Python can be extremely useful when you need to extract data that’s hidden behind JavaScript. Dynamic web pages are harder for web scrapers to manage because their content changes after the initial load.
After reading this, you’ll understand the difference between static and dynamic content and how to extract data safely. You’ll also get free code examples to scrape dynamic pages and tips to avoid IP bans.
What Is Dynamic Web Scraping (vs Static)?
A static web page sends all its content in the initial HTML response. You fetch the page, parse it, extract the data, and that’s all. Scraping static websites is relatively fast and easy.
Dynamic web pages, on the other hand, are more difficult to scrape since they change after the initial load. They rely on JavaScript to render new content, load new elements, or react to user interactions. As a result, you need special tools to render JavaScript and mimic those interactions.
Static websites are easy to scrape and have light network traffic, but they’re limited to pages without heavy interactivity.
Dynamic web pages might contain more valuable information that you need, but they’re also more difficult to scrape.
Understanding the Document Object Model (DOM) helps when working with both. It’s basically the map of the page that your script will explore.
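To see the difference in practice, here’s a minimal sketch; the URL and selector are placeholders. A plain HTTP fetch only returns the initial HTML, so elements that JavaScript injects later simply aren’t there:
import requests
from bs4 import BeautifulSoup

# Fetch the raw HTML without running any JavaScript (hypothetical URL)
response = requests.get("https://example.com/products", timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# On a static page this finds the product cards; on a dynamic page the list is
# often empty because JavaScript injects the cards after the initial load
cards = soup.select(".product-card")  # hypothetical selector
print(f"Product cards visible in the raw HTML: {len(cards)}")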
Challenges of Scraping Dynamic Content
When a page uses JavaScript to change parts of its content after it loads, your web scraper won’t see that dynamic content in the raw HTML. Scripts might call APIs or respond to clicks, sending new network requests.
That’s why scraping dynamic web pages is trickier. Those dynamic elements might not exist right away, and they won’t be visible to a tool that only makes plain HTTP requests.
You may need to wait, detect, or simulate user interactions, such as scrolling or clicking. Without doing that, you risk missing key information or getting blocked by anti-scraping measures.
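As a small illustration, here’s a minimal Selenium sketch of waiting for and simulating such interactions; the URL and the “Load more” selector are placeholders:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/products")  # hypothetical page

# Wait until the "Load more" button is clickable, then click it to trigger
# the network request that loads additional items (hypothetical selector)
button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, ".load-more"))
)
button.click()

# Scroll to the bottom to trigger lazy-loaded content
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
driver.quit()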
Tools You Need for Dynamic Web Scraping in Python
Here are some beginner-level tools you should use:
- Selenium mimics a real browser, so it can render JavaScript and handle complex dynamic web interactions.
- BeautifulSoup parses the final HTML once it’s fully loaded.
- The Selenium WebDriver API lets Python control a browser session in Chrome or Firefox.
You might also consider browser automation libraries like Playwright or Splash. However, for a classic approach, you may want to stick to Selenium first. You may also be interested in Rust web scraping with Selenium.
Here’s a quick install guide for all these tools, assuming you have an IDE ready:
pip install selenium beautifulsoup4 webdriver-manager
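If you want to confirm the setup works before writing the full scraper, a short smoke test like this minimal sketch, using the same libraries, should open a headless Chrome session and print the page title:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument("--headless")
# webdriver-manager downloads a ChromeDriver that matches your Chrome version
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get("https://example.com")
print(driver.title)  # should print "Example Domain"
driver.quit()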
How to Scrape Dynamic Sites: Step-by-Step Code
Here’s a complete example of how to scrape dynamic sites using Python:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def init_driver(headless=True, wait_timeout=10):
"""
Initialize Chrome WebDriver with optimized settings for dynamic content scraping.
Configures Chrome with performance optimizations including headless mode,
sandboxing disabled, and hardware acceleration disabled for stability.
Sets implicit wait timeout for element location.
Args:
headless (bool): Run browser in headless mode (no GUI). Defaults to True.
wait_timeout (int): Implicit wait timeout in seconds for element location. Defaults to 10.
Returns:
webdriver.Chrome: Configured Chrome WebDriver instance ready for scraping.
Example:
driver = init_driver(headless=False, wait_timeout=15)
"""
options = webdriver.ChromeOptions()
if headless:
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--disable-gpu")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(
service=Service(ChromeDriverManager().install()),
options=options
)
driver.implicitly_wait(wait_timeout)
return driver
def wait_for_element(driver, selector, timeout=10, by=By.CSS_SELECTOR):
"""
Wait for a specific element to be present in the DOM.
Uses Selenium's WebDriverWait with expected conditions to wait for element
presence. Logs warning if element is not found within timeout period.
Args:
driver (webdriver.Chrome): Chrome WebDriver instance.
selector (str): CSS selector or other selector string for target element.
timeout (int): Maximum time to wait in seconds. Defaults to 10.
by (By): Selenium By locator strategy. Defaults to By.CSS_SELECTOR.
Returns:
WebElement or None: Found element or None if timeout exceeded.
Example:
element = wait_for_element(driver, ".product-list", timeout=15)
"""
try:
element = WebDriverWait(driver, timeout).until(
EC.presence_of_element_located((by, selector))
)
return element
except TimeoutException:
logger.warning(f"Element {selector} not found within {timeout} seconds")
return None
def wait_for_dynamic_content(driver, content_selector=".item", timeout=15):
"""
Wait for dynamic content to load using multiple detection strategies.
Employs three strategies in sequence:
1. Wait for specific content elements to appear
2. Wait for document ready state completion
3. Wait for jQuery AJAX requests to complete (if jQuery present)
Includes additional buffer time for remaining async operations.
Args:
driver (webdriver.Chrome): Chrome WebDriver instance.
content_selector (str): CSS selector for main content elements. Defaults to ".item".
timeout (int): Maximum wait time in seconds. Defaults to 15.
Returns:
bool: True when content loading strategies complete successfully.
Example:
wait_for_dynamic_content(driver, ".product-card", timeout=20)
"""
if wait_for_element(driver, content_selector, timeout):
logger.info("Content loaded via element detection")
return True
WebDriverWait(driver, timeout).until(
lambda d: d.execute_script("return document.readyState") == "complete"
)
WebDriverWait(driver, timeout).until(
lambda d: d.execute_script("return jQuery.active == 0") if
d.execute_script("return typeof jQuery !== 'undefined'") else True
)
time.sleep(2)
return True
def handle_infinite_scroll(driver, max_scrolls=5, scroll_pause=2):
"""
Handle infinite scroll pages by automatically scrolling and detecting new content.
Scrolls to bottom of page repeatedly, waiting for new content to load after each scroll.
Stops when no new content is detected or maximum scroll limit is reached.
Tracks page height changes to determine when new content has loaded.
Args:
driver (webdriver.Chrome): Chrome WebDriver instance.
max_scrolls (int): Maximum number of scroll attempts. Defaults to 5.
scroll_pause (float): Time to wait between scrolls in seconds. Defaults to 2.
Returns:
int: Number of successful scrolls that loaded new content.
Example:
scrolls_completed = handle_infinite_scroll(driver, max_scrolls=10, scroll_pause=3)
"""
last_height = driver.execute_script("return document.body.scrollHeight")
scrolls = 0
while scrolls < max_scrolls:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(scroll_pause)
new_height = driver.execute_script("return document.body.scrollHeight")
if new_height == last_height:
break
last_height = new_height
scrolls += 1
logger.info(f"Scroll {scrolls}: New content loaded")
return scrolls
def fetch_dynamic_page(driver, url, content_selector=".item",
handle_scroll=False, max_scrolls=5):
"""
Fetch webpage with comprehensive dynamic content handling and optional scroll support.
Loads the specified URL and waits for dynamic content using multiple strategies.
Optionally handles infinite scroll scenarios by automatically scrolling and
waiting for new content. Includes error handling and logging for debugging.
Args:
driver (webdriver.Chrome): Chrome WebDriver instance.
url (str): Target URL to fetch and scrape.
content_selector (str): CSS selector for main content elements. Defaults to ".item".
handle_scroll (bool): Enable infinite scroll handling. Defaults to False.
max_scrolls (int): Maximum scroll attempts if handle_scroll is True. Defaults to 5.
Returns:
str or None: HTML page source after dynamic content loads, or None if error occurs.
Example:
html = fetch_dynamic_page(driver, "https://site.com", ".products", handle_scroll=True)
"""
try:
logger.info(f"Fetching: {url}")
driver.get(url)
wait_for_dynamic_content(driver, content_selector)
if handle_scroll:
scrolls = handle_infinite_scroll(driver, max_scrolls)
logger.info(f"Completed {scrolls} scrolls")
time.sleep(1)
return driver.page_source
except Exception as e:
logger.error(f"Error fetching page: {e}")
return None
def parse_data_robust(html, selectors=None):
"""
Parse HTML content with flexible selectors and comprehensive error handling.
Extracts structured data from HTML using BeautifulSoup with fallback selector
strategies. Attempts multiple selector patterns for each field to handle
varying website structures. Gracefully handles missing elements and parsing errors.
Args:
html (str or None): HTML content to parse. Returns empty list if None.
selectors (dict, optional): Custom selector configuration. Defaults to standard e-commerce patterns.
Expected format: {
'container': str, # Container element selector
'field_name': [str, ...], # List of selectors to try for each field
...
}
Returns:
list[dict]: List of dictionaries containing extracted data. Each dict represents one item
with keys corresponding to successfully extracted fields.
Example:
selectors = {
'container': '.product',
'title': ['h2.title', '.product-name'],
'price': ['.price', '.cost']
}
items = parse_data_robust(html, selectors)
"""
if not html:
return []
if selectors is None:
selectors = {
'container': '.item',
'title': ['h2', 'h3', '.title', '[data-title]'],
'price': ['.price', '.cost', '[data-price]', '.amount']
}
soup = BeautifulSoup(html, "html.parser")
items = []
containers = soup.select(selectors['container'])
logger.info(f"Found {len(containers)} items")
for container in containers:
item = {}
for field, field_selectors in selectors.items():
if field == 'container':
continue
if isinstance(field_selectors, str):
field_selectors = [field_selectors]
for selector in field_selectors:
try:
element = container.select_one(selector)
if element:
text = element.get_text(strip=True)
if text:
item[field] = text
break
except Exception as e:
logger.debug(f"Selector {selector} failed: {e}")
continue
if item:
items.append(item)
return items
def scrape_spa_with_navigation(driver, base_url, pages_to_scrape=None):
"""Handle Single Page Applications with client-side routing"""
    all_data = []
    if pages_to_scrape is None:
        pages_to_scrape = ['/', '/page/1', '/page/2']
    # Load the base URL first so the SPA's JavaScript router is running
    driver.get(base_url)
    wait_for_dynamic_content(driver)
    for page in pages_to_scrape:
url = base_url.rstrip('/') + page
# Navigate using JavaScript for SPA
driver.execute_script(f"window.history.pushState('', '', '{page}');")
# Trigger route change event if needed
driver.execute_script("window.dispatchEvent(new PopStateEvent('popstate'));")
# Wait for content
wait_for_dynamic_content(driver)
html = driver.page_source
data = parse_data_robust(html)
all_data.extend(data)
logger.info(f"Scraped {len(data)} items from {page}")
return all_data
def main_advanced(url, config=None):
"""Main function with advanced configuration options"""
# Default configuration
default_config = {
'headless': True,
'timeout': 15,
'content_selector': '.item',
'handle_scroll': False,
'max_scrolls': 5,
'custom_selectors': None,
'is_spa': False,
'spa_pages': None
}
if config:
default_config.update(config)
driver = None
try:
# Initialize driver
driver = init_driver(
headless=default_config['headless'],
wait_timeout=default_config['timeout']
)
if default_config['is_spa']:
# Handle SPA
data = scrape_spa_with_navigation(
driver, url, default_config['spa_pages']
)
else:
# Handle regular dynamic site
html = fetch_dynamic_page(
driver, url,
content_selector=default_config['content_selector'],
handle_scroll=default_config['handle_scroll'],
max_scrolls=default_config['max_scrolls']
)
data = parse_data_robust(html, default_config['custom_selectors'])
logger.info(f"Successfully scraped {len(data)} total items")
return data
except Exception as e:
logger.error(f"Scraping failed: {e}")
return []
finally:
if driver:
driver.quit()
# Example usage
if __name__ == "__main__":
# Basic usage
url = "https://example.com/dynamic"
results = main_advanced(url)
print(f"Basic scraping: {len(results)} items")
# Advanced configuration for infinite scroll site
scroll_config = {
'handle_scroll': True,
'max_scrolls': 10,
'content_selector': '.product-card',
'custom_selectors': {
'container': '.product-card',
'title': ['.product-title', 'h3'],
'price': ['.price', '.cost'],
'rating': ['.rating', '.stars']
}
}
results_scroll = main_advanced(url, scroll_config)
print(f"Scroll scraping: {len(results_scroll)} items")
# SPA configuration
spa_config = {
'is_spa': True,
'spa_pages': ['/products', '/products/page/2', '/products/page/3'],
'content_selector': '.item'
}
spa_results = main_advanced("https://spa-example.com", spa_config)
print(f"SPA scraping: {len(spa_results)} items")
Here’s what’s happening in this web scraping process:
- init_driver() sets up Selenium with Chrome WebDriver, configured with performance optimizations for dynamic content scraping, including headless mode and timeout settings.
- wait_for_dynamic_content() implements multiple strategies to detect when JavaScript has finished loading content - checking for specific elements, document ready state, and AJAX completion.
- fetch_dynamic_page() opens dynamic web pages, waits for all content to load using the detection strategies, and optionally handles infinite scroll scenarios by automatically scrolling and loading more content.
- parse_data_robust() uses BeautifulSoup to extract structured data from the fully loaded web page with flexible selectors that try multiple patterns and gracefully handle missing elements.
- main_advanced() ties everything together with configurable options for different site types (regular dynamic sites, infinite scroll pages, Single Page Applications) and returns structured results that can be exported as CSV or JSON, or processed further (see the export sketch after these lists).
Additional helper functions handle specific scenarios:
- handle_infinite_scroll() automatically scrolls pages and detects new content loading.
- scrape_spa_with_navigation() manages Single Page Applications with client-side routing.
- wait_for_element() provides reliable element detection with timeout handling.
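If you want to save the output, here’s a minimal export sketch. It assumes items is the list of dictionaries returned by main_advanced(); the file names are arbitrary.
import csv
import json

def export_results(items, csv_path="results.csv", json_path="results.json"):
    if not items:
        return
    # JSON keeps the item dictionaries as-is
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(items, f, ensure_ascii=False, indent=2)
    # CSV needs a consistent set of columns across all items
    fieldnames = sorted({key for item in items for key in item})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(items)
Call it as export_results(results) after a scraping run to get both files at once.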
This approach is ideal for scraping dynamic websites. For static websites, you don’t always need Selenium; in most cases, requests and BeautifulSoup are enough. If you want to learn more, you should check out how to use web scraping across different industries and what it requires.
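For comparison, scraping a static site can be as short as the following sketch; the URL and selectors are placeholders for illustration:
import requests
from bs4 import BeautifulSoup

response = requests.get(
    "https://example.com/static-catalog",  # hypothetical static page
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=10,
)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
items = [
    {
        "title": card.select_one("h2").get_text(strip=True),
        "price": card.select_one(".price").get_text(strip=True),
    }
    for card in soup.select(".item")
    if card.select_one("h2") and card.select_one(".price")
]
print(f"Scraped {len(items)} items without a browser")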
How to Avoid Getting Blocked While Scraping
IP bans occur when websites detect too many requests in a short period. Dynamic websites may watch for repeated network requests, identical patterns, or missing browser headers. Here’s how to avoid that while you extract data from both static and dynamic pages:
- Rotate headers and randomize user agents to mimic real browsers.
- Throttle requests to the web page by adding random delays.
- Use residential proxies while scraping dynamic websites to spread requests across many IP addresses.
- Handle cookies and session IDs so anti-bot systems see consistent, browser-like sessions.
Follow these web scraping best practices and you will minimize your chances of getting banned while scraping dynamic pages.
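As an illustration, here’s a minimal sketch of those measures applied to plain HTTP requests; the user agent strings and proxy URL are placeholders you’d replace with your own values:
import random
import time
import requests

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]
# Hypothetical residential proxy endpoint; credentials come from your provider
PROXIES = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080",
}

session = requests.Session()  # keeps cookies between requests

for url in ["https://example.com/page/1", "https://example.com/page/2"]:
    headers = {"User-Agent": random.choice(USER_AGENTS)}  # rotate user agents
    response = session.get(url, headers=headers, proxies=PROXIES, timeout=10)
    print(url, response.status_code)
    time.sleep(random.uniform(2, 5))  # random delay between requests
With Selenium, you can route traffic through a proxy by adding options.add_argument("--proxy-server=http://host:port") to your ChromeOptions, though authenticated residential proxies usually require a provider-specific setup.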
Conclusion
Dynamic web scraping with Python involves more work than scraping static web pages. You need tools like Selenium to handle JavaScript rendering, whereas static pages can usually be scraped with requests and BeautifulSoup alone.
Now you know how to fetch, parse, and export data safely. You’ve also learned tips that can help you dodge IP bans and scrape dynamic websites more consistently.
Dynamic web scraping is not that difficult once you get the hang of it. While it’s more challenging to scrape dynamic pages than static ones, it’s still highly doable with the right tools and information.

Author
Vilius Dumcius
Product Owner
With six years of programming experience, Vilius specializes in full-stack web development with PHP (Laravel), MySQL, Docker, Vue.js, and Typescript. Managing a skilled team at IPRoyal for years, he excels in overseeing diverse web projects and custom solutions. Vilius plays a critical role in managing proxy-related tasks for the company, serving as the lead programmer involved in every aspect of the business. Outside of his professional duties, Vilius channels his passion for personal and professional growth, balancing his tech expertise with a commitment to continuous improvement.
Learn More About Vilius Dumcius