
How to Scrape Dynamic Websites with Python (and Avoid Getting Blocked)

Vilius Dumcius


Key Takeaways

  • Dynamic web pages require JavaScript rendering, so you should use Selenium and BeautifulSoup.

  • Rotate user agents and headers, add request delays, and use residential proxies to avoid blocks.

  • Use modular code, which allows you to reuse scraper parts across different projects.


Scraping dynamic websites with Python can be extremely useful when you need to extract data that’s hidden behind JavaScript. These pages are harder for web scrapers to handle because their content changes after the initial load.

After reading this, you’ll understand the difference between static and dynamic content and how to extract data safely. You’ll also get free code examples to scrape dynamic pages and tips to avoid IP bans.

What Is Dynamic Web Scraping (vs Static)?

A static web page sends all its content in the initial HTML response. You fetch the page, parse it, extract the data, and that’s all. Scraping static websites is relatively fast and easy.

Dynamic web pages, on the other hand, are more difficult to scrape since they change after the initial load. They rely on JavaScript to render new content, load new elements, or react to user interactions. As a result, you need special tools to render JavaScript and mimic those interactions.

Static websites are easy to scrape and have light network traffic, but they’re limited to pages without heavy interactivity.

Dynamic web pages might contain more valuable information that you need, but they’re also more difficult to scrape.

Understanding the Document Object Model (DOM) helps when working with both. It’s basically the map of the page that your script will explore.
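
For example, a static page can usually be scraped with a single HTTP request and no browser at all. Here’s a minimal sketch, assuming a placeholder URL and a simple h2-based layout:

import requests
from bs4 import BeautifulSoup

# One HTTP request returns the full HTML - no JavaScript is executed
response = requests.get("https://example.com/static-page", timeout=10)
response.raise_for_status()

# Parse the HTML we received and pull out the elements we care about
soup = BeautifulSoup(response.text, "html.parser")
titles = [h2.get_text(strip=True) for h2 in soup.select("h2")]
print(titles)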

Challenges of Scraping Dynamic Content

When a page uses JavaScript to modify its content after it loads, your web scraper won’t see that dynamic content in the raw HTML. A script might call APIs or respond to clicks, which triggers new network requests.

That’s why scraping dynamic web pages is trickier. Those dynamic elements might not exist right away, and they won’t be visible to a tool that only makes plain HTTP requests.

You may need to wait, detect, or simulate user interactions, such as scrolling or clicking. Without doing that, you risk missing key information or getting blocked by anti-scraping measures.
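
To see the difference in practice, you can compare what a plain HTTP request returns with what a real browser renders. The sketch below uses a placeholder URL and a hypothetical .item selector:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://example.com/dynamic-page"  # placeholder URL

# 1. Raw HTML only - JavaScript never runs, so late-loading elements are missing
raw_soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
print("Items in raw HTML:", len(raw_soup.select(".item")))

# 2. Rendered DOM - the browser executes JavaScript before we read the page
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get(url)
rendered_soup = BeautifulSoup(driver.page_source, "html.parser")
print("Items after rendering:", len(rendered_soup.select(".item")))
driver.quit()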

Tools You Need for Dynamic Web Scraping in Python

Here are some beginner-level tools you should use:

  • Selenium mimics a real browser, so it can render JavaScript and handle complex dynamic web interactions.
  • BeautifulSoup parses the final HTML once it’s fully loaded.
  • Python WebDriver API lets Python control a browser session for Chrome or Firefox.

You might also consider browser automation libraries like Playwright or Splash. However, for a classic approach, you may want to stick to Selenium first. You may also be interested in Rust web scraping with Selenium.

Here’s a quick install guide for all these tools, assuming you have an IDE ready:

pip install selenium beautifulsoup4 webdriver-manager
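
If you want to confirm the setup works before moving on, here’s a minimal smoke test. It simply opens a page in headless Chrome and prints its title:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Launch a headless Chrome session with a driver managed by webdriver-manager
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

driver.get("https://example.com")
print(driver.title)  # should print "Example Domain"
driver.quit()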

How to Scrape Dynamic Sites: Step-by-Step Code

Here’s a complete example of how to scrape dynamic sites using Python:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def init_driver(headless=True, wait_timeout=10):
    """
    Initialize Chrome WebDriver with optimized settings for dynamic content scraping.
    
    Configures Chrome with performance optimizations including headless mode,
    sandboxing disabled, and hardware acceleration disabled for stability.
    Sets implicit wait timeout for element location.
    
    Args:
        headless (bool): Run browser in headless mode (no GUI). Defaults to True.
        wait_timeout (int): Implicit wait timeout in seconds for element location. Defaults to 10.
    
    Returns:
        webdriver.Chrome: Configured Chrome WebDriver instance ready for scraping.
    
    Example:
        driver = init_driver(headless=False, wait_timeout=15)
    """
    options = webdriver.ChromeOptions()
    
    if headless:
        options.add_argument("--headless")
    
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--disable-gpu")
    options.add_argument("--disable-extensions")
    
    driver = webdriver.Chrome(
        service=Service(ChromeDriverManager().install()), 
        options=options
    )
    
    driver.implicitly_wait(wait_timeout)
    
    return driver

def wait_for_element(driver, selector, timeout=10, by=By.CSS_SELECTOR):
    """
    Wait for a specific element to be present in the DOM.
    
    Uses Selenium's WebDriverWait with expected conditions to wait for element
    presence. Logs warning if element is not found within timeout period.
    
    Args:
        driver (webdriver.Chrome): Chrome WebDriver instance.
        selector (str): CSS selector or other selector string for target element.
        timeout (int): Maximum time to wait in seconds. Defaults to 10.
        by (By): Selenium By locator strategy. Defaults to By.CSS_SELECTOR.
    
    Returns:
        WebElement or None: Found element or None if timeout exceeded.
    
    Example:
        element = wait_for_element(driver, ".product-list", timeout=15)
    """
    try:
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((by, selector))
        )
        return element
    except TimeoutException:
        logger.warning(f"Element {selector} not found within {timeout} seconds")
        return None

def wait_for_dynamic_content(driver, content_selector=".item", timeout=15):
    """
    Wait for dynamic content to load using multiple detection strategies.
    
    Employs three strategies in sequence:
    1. Wait for specific content elements to appear
    2. Wait for document ready state completion
    3. Wait for jQuery AJAX requests to complete (if jQuery present)
    
    Includes additional buffer time for remaining async operations.
    
    Args:
        driver (webdriver.Chrome): Chrome WebDriver instance.
        content_selector (str): CSS selector for main content elements. Defaults to ".item".
        timeout (int): Maximum wait time in seconds. Defaults to 15.
    
    Returns:
        bool: True when content loading strategies complete successfully.
    
    Example:
        wait_for_dynamic_content(driver, ".product-card", timeout=20)
    """
    if wait_for_element(driver, content_selector, timeout):
        logger.info("Content loaded via element detection")
        return True
    
    WebDriverWait(driver, timeout).until(
        lambda d: d.execute_script("return document.readyState") == "complete"
    )
    
    WebDriverWait(driver, timeout).until(
        lambda d: d.execute_script("return jQuery.active == 0") if 
        d.execute_script("return typeof jQuery !== 'undefined'") else True
    )
    
    time.sleep(2)
    
    return True

def handle_infinite_scroll(driver, max_scrolls=5, scroll_pause=2):
    """
    Handle infinite scroll pages by automatically scrolling and detecting new content.
    
    Scrolls to bottom of page repeatedly, waiting for new content to load after each scroll.
    Stops when no new content is detected or maximum scroll limit is reached.
    Tracks page height changes to determine when new content has loaded.
    
    Args:
        driver (webdriver.Chrome): Chrome WebDriver instance.
        max_scrolls (int): Maximum number of scroll attempts. Defaults to 5.
        scroll_pause (float): Time to wait between scrolls in seconds. Defaults to 2.
    
    Returns:
        int: Number of successful scrolls that loaded new content.
    
    Example:
        scrolls_completed = handle_infinite_scroll(driver, max_scrolls=10, scroll_pause=3)
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        
        time.sleep(scroll_pause)
        
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
            
        last_height = new_height
        scrolls += 1
        logger.info(f"Scroll {scrolls}: New content loaded")
    
    return scrolls

def fetch_dynamic_page(driver, url, content_selector=".item", 
                      handle_scroll=False, max_scrolls=5):
    """
    Fetch webpage with comprehensive dynamic content handling and optional scroll support.
    
    Loads the specified URL and waits for dynamic content using multiple strategies.
    Optionally handles infinite scroll scenarios by automatically scrolling and
    waiting for new content. Includes error handling and logging for debugging.
    
    Args:
        driver (webdriver.Chrome): Chrome WebDriver instance.
        url (str): Target URL to fetch and scrape.
        content_selector (str): CSS selector for main content elements. Defaults to ".item".
        handle_scroll (bool): Enable infinite scroll handling. Defaults to False.
        max_scrolls (int): Maximum scroll attempts if handle_scroll is True. Defaults to 5.
    
    Returns:
        str or None: HTML page source after dynamic content loads, or None if error occurs.
    
    Example:
        html = fetch_dynamic_page(driver, "https://site.com", ".products", handle_scroll=True)
    """
    try:
        logger.info(f"Fetching: {url}")
        driver.get(url)
        
        wait_for_dynamic_content(driver, content_selector)
        
        if handle_scroll:
            scrolls = handle_infinite_scroll(driver, max_scrolls)
            logger.info(f"Completed {scrolls} scrolls")
        
        time.sleep(1)
        
        return driver.page_source
        
    except Exception as e:
        logger.error(f"Error fetching page: {e}")
        return None

def parse_data_robust(html, selectors=None):
    """
    Parse HTML content with flexible selectors and comprehensive error handling.
    
    Extracts structured data from HTML using BeautifulSoup with fallback selector
    strategies. Attempts multiple selector patterns for each field to handle
    varying website structures. Gracefully handles missing elements and parsing errors.
    
    Args:
        html (str or None): HTML content to parse. Returns empty list if None.
        selectors (dict, optional): Custom selector configuration. Defaults to standard e-commerce patterns.
            Expected format: {
                'container': str,  # Container element selector
                'field_name': [str, ...],  # List of selectors to try for each field
                ...
            }
    
    Returns:
        list[dict]: List of dictionaries containing extracted data. Each dict represents one item
                   with keys corresponding to successfully extracted fields.
    
    Example:
        selectors = {
            'container': '.product',
            'title': ['h2.title', '.product-name'],
            'price': ['.price', '.cost']
        }
        items = parse_data_robust(html, selectors)
    """
    if not html:
        return []
    
    if selectors is None:
        selectors = {
            'container': '.item',
            'title': ['h2', 'h3', '.title', '[data-title]'],
            'price': ['.price', '.cost', '[data-price]', '.amount']
        }
    
    soup = BeautifulSoup(html, "html.parser")
    items = []
    
    containers = soup.select(selectors['container'])
    logger.info(f"Found {len(containers)} items")
    
    for container in containers:
        item = {}
        
        for field, field_selectors in selectors.items():
            if field == 'container':
                continue
                
            if isinstance(field_selectors, str):
                field_selectors = [field_selectors]
            
            for selector in field_selectors:
                try:
                    element = container.select_one(selector)
                    if element:
                        text = element.get_text(strip=True)
                        if text:
                            item[field] = text
                            break
                except Exception as e:
                    logger.debug(f"Selector {selector} failed: {e}")
                    continue
        
        if item:
            items.append(item)
    
    return items

def scrape_spa_with_navigation(driver, base_url, pages_to_scrape=None):
    """Handle Single Page Applications with client-side routing"""
    all_data = []
    
    if pages_to_scrape is None:
        pages_to_scrape = ['/', '/page/1', '/page/2']
    
    # Load the SPA shell first so client-side routing has something to attach to
    driver.get(base_url.rstrip('/'))
    wait_for_dynamic_content(driver)
    
    for page in pages_to_scrape:
        # Navigate using JavaScript for SPA (client-side route change, no full reload)
        driver.execute_script(f"window.history.pushState('', '', '{page}');")
        
        # Trigger route change event if needed
        driver.execute_script("window.dispatchEvent(new PopStateEvent('popstate'));")
        
        # Wait for content
        wait_for_dynamic_content(driver)
        
        html = driver.page_source
        data = parse_data_robust(html)
        all_data.extend(data)
        
        logger.info(f"Scraped {len(data)} items from {page}")
    
    return all_data

def main_advanced(url, config=None):
    """Main function with advanced configuration options"""
    
    # Default configuration
    default_config = {
        'headless': True,
        'timeout': 15,
        'content_selector': '.item',
        'handle_scroll': False,
        'max_scrolls': 5,
        'custom_selectors': None,
        'is_spa': False,
        'spa_pages': None
    }
    
    if config:
        default_config.update(config)
    
    driver = None
    try:
        # Initialize driver
        driver = init_driver(
            headless=default_config['headless'],
            wait_timeout=default_config['timeout']
        )
        
        if default_config['is_spa']:
            # Handle SPA
            data = scrape_spa_with_navigation(
                driver, url, default_config['spa_pages']
            )
        else:
            # Handle regular dynamic site
            html = fetch_dynamic_page(
                driver, url,
                content_selector=default_config['content_selector'],
                handle_scroll=default_config['handle_scroll'],
                max_scrolls=default_config['max_scrolls']
            )
            
            data = parse_data_robust(html, default_config['custom_selectors'])
        
        logger.info(f"Successfully scraped {len(data)} total items")
        return data
        
    except Exception as e:
        logger.error(f"Scraping failed: {e}")
        return []
        
    finally:
        if driver:
            driver.quit()

# Example usage
if __name__ == "__main__":
    # Basic usage
    url = "https://example.com/dynamic"
    results = main_advanced(url)
    print(f"Basic scraping: {len(results)} items")
    
    # Advanced configuration for infinite scroll site
    scroll_config = {
        'handle_scroll': True,
        'max_scrolls': 10,
        'content_selector': '.product-card',
        'custom_selectors': {
            'container': '.product-card',
            'title': ['.product-title', 'h3'],
            'price': ['.price', '.cost'],
            'rating': ['.rating', '.stars']
        }
    }
    
    results_scroll = main_advanced(url, scroll_config)
    print(f"Scroll scraping: {len(results_scroll)} items")
    
    # SPA configuration
    spa_config = {
        'is_spa': True,
        'spa_pages': ['/products', '/products/page/2', '/products/page/3'],
        'content_selector': '.item'
    }
    
    spa_results = main_advanced("https://spa-example.com", spa_config)
    print(f"SPA scraping: {len(spa_results)} items")

Here’s what’s happening in this web scraping process:

  • init_driver() sets up Selenium with Chrome WebDriver, configured with performance optimizations for dynamic content scraping, including headless mode and timeout settings.
  • wait_for_dynamic_content() implements multiple strategies to detect when JavaScript has finished loading content - checking for specific elements, document ready state, and AJAX completion.
  • fetch_dynamic_page() opens dynamic web pages, waits for all content to load using the detection strategies, and optionally handles infinite scroll scenarios by automatically scrolling and loading more content.
  • parse_data_robust() uses BeautifulSoup to extract structured data from the fully loaded web page with flexible selectors that try multiple patterns and gracefully handle missing elements.
  • main_advanced() ties everything together with configurable options for different site types (regular dynamic sites, infinite scroll pages, Single Page Applications) and returns structured results that can be exported as CSV, JSON, or processed further.

Additional helper functions handle specific scenarios:

  • handle_infinite_scroll() automatically scrolls pages and detects new content loading.
  • scrape_spa_with_navigation() manages Single Page Applications with client-side routing.
  • wait_for_element() provides reliable element detection with timeout handling.
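
The structured results returned by main_advanced() are just a list of dictionaries, so exporting them is straightforward. Here’s a minimal sketch, assuming output file names of your choosing:

import csv
import json

results = main_advanced("https://example.com/dynamic")

# Save as JSON - preserves the scraped structure exactly
with open("results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2, ensure_ascii=False)

# Save as CSV - collect every field name across items so no column is dropped
if results:
    fieldnames = sorted({key for item in results for key in item})
    with open("results.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(results)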

This approach is ideal for scraping dynamic websites. For static websites, you won’t always need Selenium; in most cases, requests and BeautifulSoup are enough. If you want to learn more, check out how web scraping is used across different industries and what it requires.

How to Avoid Getting Blocked While Scraping

IP bans occur when a website detects too many requests from the same IP in a short period. Dynamic websites may watch for repeated network requests, identical patterns, or missing browser headers. Here’s how to avoid that while you extract data from both static and dynamic pages:

  • Rotate headers and randomize user agents to mimic real browsers.
  • Throttle requests to the web page by adding random delays.
  • Use residential proxies while web scraping dynamic websites to spread across IPs.
  • Handle cookies and session IDs so anti-bot systems see a consistent, realistic session.

Follow these web scraping best practices and you’ll minimize your chances of getting banned while scraping dynamic pages. The sketch below shows how a few of them fit together.
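
This is a minimal sketch, assuming you already have residential proxy credentials from your provider; the user agent list, proxy endpoint, and target URL are placeholders:

import random
import time
import requests

# Placeholder values - swap in your own user agents and proxy credentials
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]
PROXY = "http://username:password@proxy.example.com:12345"  # hypothetical proxy endpoint

def polite_get(url):
    """Send one request with a rotated user agent, a proxy, and a random delay."""
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }
    proxies = {"http": PROXY, "https": PROXY}
    time.sleep(random.uniform(2, 6))  # throttle so requests don't arrive in a burst
    return requests.get(url, headers=headers, proxies=proxies, timeout=15)

response = polite_get("https://example.com/dynamic")
print(response.status_code)

If you run the Selenium-based scraper instead, you can point the browser at the same proxy with options.add_argument("--proxy-server=..."); keep in mind that Chrome doesn’t accept embedded proxy credentials, so IP-whitelisted proxy access is usually simpler there.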

Conclusion

Dynamic web scraping with Python involves more work than static web pages. You need tools like Selenium to handle JavaScript rendering, whereas web scraping static pages can be done using only requests.

Now you know how to fetch, parse, and export data safely. You’ve also picked up tips to help you dodge IP bans and scrape dynamic websites more consistently.

Dynamic web scraping is not that difficult once you get the hang of it. While scraping dynamic pages is more challenging than scraping static ones, it’s still highly doable with the right tools and information.

Author

Vilius Dumcius

Product Owner

With six years of programming experience, Vilius specializes in full-stack web development with PHP (Laravel), MySQL, Docker, Vue.js, and Typescript. Managing a skilled team at IPRoyal for years, he excels in overseeing diverse web projects and custom solutions. Vilius plays a critical role in managing proxy-related tasks for the company, serving as the lead programmer involved in every aspect of the business. Outside of his professional duties, Vilius channels his passion for personal and professional growth, balancing his tech expertise with a commitment to continuous improvement.
