How to Scrape Job Postings from Indeed: A Step-by-Step Guide

Marijus Narbutas
Key Takeaways
- Scrape ethically to avoid overloading or crashing servers.
- Use advanced scraping setups with tools like Selenium, Playwright, or specialized platforms like Octoparse to gather job posting data.
- Stay hidden with residential proxies to avoid IP bans while you scrape.
Web scraping is the process of pulling information from websites automatically without spending hours reading each page manually.
One major use case is tracking job details to identify hiring trends and gather other HR-related insights. Companies and researchers watch job listings to study job markets. Students sometimes check job listings to find internships. All of that could be valuable data to someone working in the hiring field.
Indeed is one of the most popular places for such data collection. It’s full of job details from everywhere. While free for human browsing, Indeed strictly protects its data against automated bots, meaning you will need strong stealth techniques or access to their official API to collect this data successfully.
Is It Legal to Scrape Indeed Job Postings?
In the U.S., web scraping lives in a grey area. It’s not downright illegal, but it can get you in some trouble. When you scrape Indeed, you might face legal issues if you break the site’s rules.
Indeed’s Terms of Service have anti-scraping rules in place. They don’t want bots extracting or aggregating job data without permission in violation of their user agreements. If you ignore that, you risk bans or worse.
Websites like Indeed use specific tools to spot scraping attempts. If they catch your bot sifting through their job details, they might block your IP very quickly. Other times, they may quietly flag your account without any warning. It’s only a matter of time.
While residential proxies are essential for solving the IP banning problem, you’ll also need specialized anti-bot bypass tools or fortified headless browsers to handle the advanced Cloudflare and DataDome protections that Indeed employs.
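As a rough sketch, routing traffic through a residential proxy usually comes down to building a proxies mapping and passing it with each request. The host, port, and credentials below are placeholders, not a real provider endpoint:

```python
def build_proxy(user, password, host, port):
    """Return a proxies mapping usable by requests-style HTTP clients."""
    proxy_url = f"http://{user}:{password}@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}

# Hypothetical residential proxy credentials; replace with your provider's.
proxies = build_proxy("proxy_user", "proxy_pass", "resi.example-provider.com", 8000)

# With curl_cffi you would then pass this mapping on each call:
#   session = creq.Session(impersonate="chrome")
#   resp = session.get(url, proxies=proxies)
```

Most providers rotate the exit IP behind a single gateway hostname, so the same mapping can serve an entire scraping session.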
Tools and Methods to Scrape Indeed Job Listings
There are several ways to do Indeed web scraping and collect job listing data. Here’s a quick overview:
- BeautifulSoup and requests are great for simple websites, but they will fail on Indeed because they cannot execute JavaScript or bypass its advanced anti-bot protections.
- Headless browsers like Selenium or Playwright are necessary to render Indeed’s JavaScript, but you must use stealth modifications (like undetected-chromedriver) so the site doesn’t instantly recognize your browser as a bot.
- Octoparse is a visual scraper tool that’s great if you can’t code but still want to extract job data. It also has a free and paid version.
- Third-party scraper APIs can bypass the security blocks for you and deliver clean job details without you having to build the crawler yourself.
- Purchasing datasets is another option if scraping is not a necessity for you.
Big companies usually use smart bots, multiple IP addresses, and a handful of tools to scrape Indeed and gather or update thousands of job details simultaneously.
Comparing Extraction Methods: APIs, No-Code Tools, and Custom Scripts
Choosing the right approach for your project depends heavily on both your technical expertise and the scale of your operation. Evaluating the available methods therefore ensures you pick the most efficient route for collecting the job data you need.
Writing your own custom scripts gives you complete flexibility over how you parse and store the data, though your extraction success will still depend entirely on your ability to bypass Indeed’s evolving anti-bot defenses continuously. It demands significant programming knowledge and forces you to handle maintenance manually whenever the target website updates its internal structure.
Platforms like Octoparse provide a highly visual interface that simplifies the entire workflow, and they’re great for users with zero coding experience who want to gather information quickly. On the downside, these no-code solutions frequently come with restrictive paywalls, and they might lack the deep customization necessary for highly complex Indeed scraping tasks.
Utilizing a dedicated scraping API streamlines the process by delivering pre-structured data directly to your database. It typically handles all the proxy rotation and CAPTCHA solving behind the scenes. But these premium services can become quite expensive at scale, so they might not fit within the budget of smaller projects.
Evaluating these tradeoffs allows you to align the solutions better with your business objectives.
Step-by-Step Guide to Scraping Indeed Jobs
Now, let’s get down to brass tacks. Here’s a simple guide on how to scrape Indeed using Python.
Step 1: Set Up Your Environment
Install these libraries before you start coding:
pip install requests
pip install beautifulsoup4
pip install pandas
pip install curl-cffi
Step 2: Write a Basic Indeed Scraper
Here’s a basic script that demonstrates the fundamental logic of parsing job data, though you will quickly find that Indeed’s security blocks standard requests like this, meaning you'll need to upgrade to stealth tools to run it successfully. This specific code snippet will scrape software engineer jobs in New York City from Indeed’s search page:
import pandas as pd
import time
import random
from bs4 import BeautifulSoup
from curl_cffi import requests as creq

QUERIES = [
    "software+engineer",
    "backend+developer",
    "frontend+developer",
    "python+developer",
    "fullstack+engineer",
]
LOCATION = "New+York%2C+NY"

def random_headers():
    chrome_versions = ["123.0.0.0", "124.0.0.0", "125.0.0.0", "126.0.0.0"]
    v = random.choice(chrome_versions)
    return {
        "User-Agent": f"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      f"AppleWebKit/537.36 (KHTML, like Gecko) "
                      f"Chrome/{v} Safari/537.36",
        "Accept-Language": random.choice(["en-US,en;q=0.9", "en-US,en;q=0.8"]),
    }

def scrape_indeed_jobs():
    job_list = []
    seen_urls = set()
    session = creq.Session(impersonate="chrome")
    for query in QUERIES:
        url = f"https://www.indeed.com/jobs?q={query}&l={LOCATION}&start=0"
        print(f"Scraping [{query}]: {url}")
        resp = session.get(url, headers=random_headers())
        if resp.status_code != 200:
            print(f"  Got status {resp.status_code}, skipping")
            continue
        soup = BeautifulSoup(resp.text, "html.parser")
        cards = soup.find_all("div", class_="job_seen_beacon")
        print(f"  Found {len(cards)} cards")
        if not cards:
            print("  No cards found — likely blocked or selectors changed.")
            print(f"  Response length: {len(resp.text)} chars")
        for card in cards:
            h2 = card.find("h2", class_="jobTitle")
            a = h2.find("a", href=True) if h2 else None
            job_title = a.get_text(strip=True) if a else None
            job_url = f"https://www.indeed.com{a['href']}" if a else None
            if job_url in seen_urls:
                continue
            seen_urls.add(job_url)
            comp = card.find("span", {"data-testid": "company-name"})
            company = comp.get_text(strip=True) if comp else None
            loc = card.find("div", {"data-testid": "text-location"})
            location = loc.get_text(strip=True) if loc else None
            snippet = card.find("div", {"data-testid": "jobsnippet_footer"})
            summary = snippet.get_text(" ", strip=True) if snippet else None
            job_list.append({
                "Job Title": job_title,
                "Company": company,
                "Location": location,
                "Summary": summary,
                "Query": query.replace("+", " "),
                "URL": job_url,
            })
        # Randomized pause between queries to crawl politely.
        time.sleep(random.uniform(5, 10))
    return job_list

if __name__ == "__main__":
    jobs = scrape_indeed_jobs()
    df = pd.DataFrame(jobs)
    df.to_csv("indeed_job_postings.csv", index=False)
    print(f"Scraped {len(jobs)} unique jobs and saved to indeed_job_postings.csv")
If you need different job descriptions or job positions in some other locations, you will have to adjust the code to fit your scraping needs. If you want to scrape other platforms like Glassdoor, you cannot simply reuse this code; you’ll need to write an entirely new script tailored to that site's unique HTML structure and specific anti-bot defenses.
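To adapt the script to other keywords and locations without hand-encoding query strings, you can build the search URL with the standard library. `urlencode` applies the same `+` and `%2C` escaping used in the hardcoded constants above:

```python
from urllib.parse import urlencode

def build_search_url(query, location, start=0):
    """Build an Indeed search URL from plain-text query and location."""
    params = urlencode({"q": query, "l": location, "start": start})
    return f"https://www.indeed.com/jobs?{params}"

url = build_search_url("software engineer", "New York, NY")
print(url)  # https://www.indeed.com/jobs?q=software+engineer&l=New+York%2C+NY&start=0
```

This keeps the QUERIES list readable ("software engineer" instead of "software+engineer") and removes a whole class of manual encoding mistakes.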
Extracting Data From Embedded JSON
When inspecting the page source, you'll often find that modern websites load their content through embedded JSON objects inside the HTML. Extracting that JSON directly is highly efficient, and parsing it is significantly more reliable than navigating complex HTML trees.
Once you've successfully used stealth tools to bypass the anti-bot protections and retrieve the actual page source, you can locate the script tag containing the application state and load it into a structured format for immediate access to the data. Here's a quick example demonstrating how you can parse embedded data:
import json
import re
import pandas as pd
import time
import random
from bs4 import BeautifulSoup
from curl_cffi import requests as creq

QUERIES = [
    "software+engineer",
    "backend+developer",
    "frontend+developer",
    "python+developer",
    "fullstack+engineer",
]
LOCATION = "New+York%2C+NY"

def random_headers():
    chrome_versions = ["123.0.0.0", "124.0.0.0", "125.0.0.0", "126.0.0.0"]
    v = random.choice(chrome_versions)
    return {
        "User-Agent": f"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      f"AppleWebKit/537.36 (KHTML, like Gecko) "
                      f"Chrome/{v} Safari/537.36",
        "Accept-Language": random.choice(["en-US,en;q=0.9", "en-US,en;q=0.8"]),
    }

def extract_jobs_from_json(html):
    """Extract job data from embedded JSON in script tags."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for script in soup.find_all("script"):
        if not script.string:
            continue
        # Look for common Indeed data variable patterns.
        match = re.search(
            r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]\s*=\s*({.+?})\s*;',
            script.string,
            re.DOTALL,
        )
        if not match:
            continue
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue
        # Navigate to the job results — structure may vary.
        results = (
            data.get("metaData", {})
            .get("mosaicProviderJobCardsModel", {})
            .get("results", [])
        )
        for r in results:
            jobs.append({
                "Job Title": r.get("title"),
                "Company": r.get("company"),
                "Location": r.get("formattedLocation"),
                "Summary": r.get("snippet"),
                "URL": f"https://www.indeed.com/viewjob?jk={r.get('jobkey')}" if r.get("jobkey") else None,
            })
        break  # found our data, no need to check more scripts
    return jobs

def extract_jobs_fallback(html):
    """Fallback: dump all script tags to find the right JSON blob."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        if not script.string or len(script.string) < 5000:
            continue
        # Search for any large JSON object assigned to a window variable.
        match = re.search(r'window\.[a-zA-Z_.]+\s*=\s*({.+})\s*;', script.string, re.DOTALL)
        if match:
            try:
                data = json.loads(match.group(1))
                # Save for inspection so you can map the correct keys.
                with open("indeed_raw_json.json", "w", encoding="utf-8") as f:
                    json.dump(data, f, indent=2, ensure_ascii=False)
                print("  Dumped raw JSON to indeed_raw_json.json — inspect to find job keys")
                return data
            except json.JSONDecodeError:
                continue
    return None

def scrape_indeed_jobs():
    job_list = []
    seen_urls = set()
    session = creq.Session(impersonate="chrome")
    for query in QUERIES:
        url = f"https://www.indeed.com/jobs?q={query}&l={LOCATION}&start=0"
        print(f"Scraping [{query}]: {url}")
        resp = session.get(url, headers=random_headers())
        if resp.status_code != 200:
            print(f"  Got status {resp.status_code}, skipping")
            continue
        jobs = extract_jobs_from_json(resp.text)
        if not jobs:
            print("  Primary JSON extraction found nothing, trying fallback...")
            extract_jobs_fallback(resp.text)
            print("  Check indeed_raw_json.json and update the key paths.")
            continue
        print(f"  Extracted {len(jobs)} jobs from JSON")
        for job in jobs:
            if job["URL"] in seen_urls:
                continue
            seen_urls.add(job["URL"])
            job["Query"] = query.replace("+", " ")
            job_list.append(job)
        time.sleep(random.uniform(5, 10))
    return job_list

if __name__ == "__main__":
    jobs = scrape_indeed_jobs()
    df = pd.DataFrame(jobs)
    df.to_csv("indeed_job_postings_json.csv", index=False)
    print(f"Scraped {len(jobs)} unique jobs and saved to indeed_job_postings_json.csv")
While it provides a cleaner path to the job data and avoids the extreme fragility of parsing changing HTML classes, keep in mind that websites periodically rename these internal JavaScript variables, meaning your regular expressions will still require occasional maintenance.
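To see the mechanics in isolation, here is the same regex applied to a synthetic page. The embedded blob below is invented for illustration; real Indeed payloads are far larger and their key structure may differ:

```python
import json
import re

# Synthetic HTML imitating the embedded-state pattern; not a real Indeed page.
html = """
<script>
window.mosaic.providerData["mosaic-provider-jobcards"] = {"metaData":
  {"mosaicProviderJobCardsModel": {"results": [{"title": "Data Engineer",
  "company": "Acme Corp", "jobkey": "abc123"}]}}};
</script>
"""

# The lazy quantifier stops at the first closing brace followed by a semicolon.
match = re.search(
    r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]\s*=\s*({.+?})\s*;',
    html,
    re.DOTALL,
)
data = json.loads(match.group(1))
results = data["metaData"]["mosaicProviderJobCardsModel"]["results"]
print(results[0]["title"])  # Data Engineer
```

Once the blob parses cleanly, every field is a plain dictionary lookup rather than a fragile CSS selector.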
Step 3: Handle Pagination
Each results page holds around 15 job positions. You can continue scraping through the pages by incrementing the start parameter in the request URL, but keep in mind that Indeed caps search results at 1,000 jobs (about 66 pages) per query, meaning you'll need to use narrower search filters to scrape larger datasets.
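A pagination loop can be sketched by stepping the start offset. The step of 10 and the 1,000-result ceiling below reflect Indeed's commonly observed behavior and may change:

```python
def page_offsets(max_results=1000, step=10):
    """Return the start offsets for paginated search requests."""
    return list(range(0, max_results, step))

offsets = page_offsets(50)
print(offsets)  # [0, 10, 20, 30, 40]

# Each offset then plugs into the search URL:
#   f"https://www.indeed.com/jobs?q={query}&l={LOCATION}&start={offset}"
```

In practice you would also stop early once a page returns no cards, since hammering empty pages only increases your chance of getting flagged.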
At some point, Indeed also began showing a login screen from page 2 onwards. Scraping behind a login requires accepting the Terms of Service, which means doing so would breach them. We recommend consulting a legal professional before engaging in any scraping.
Step 4: Extract Key Fields
Here you should include the data fields you need. You’ll only get as much data as you requested. You can pull job details like:
- Job position title
- Company name
- Location
- Short summary snippet
Extracting the full job description, however, requires writing additional code to visit each job's specific URL.
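A follow-up parser for a job detail page might look like this. The `jobDescriptionText` container id is commonly seen on Indeed's view-job pages, but treat it as an assumption and verify it against the live markup; the HTML below is a made-up sample:

```python
from bs4 import BeautifulSoup

def parse_description(html):
    """Pull the full description text out of a job detail page."""
    soup = BeautifulSoup(html, "html.parser")
    box = soup.find("div", id="jobDescriptionText")
    return box.get_text(" ", strip=True) if box else None

# Made-up detail-page snippet for illustration.
sample = '<div id="jobDescriptionText"><p>Build data pipelines.</p></div>'
print(parse_description(sample))  # Build data pipelines.
```

You would call this once per job URL collected earlier, keeping the same randomized delays between detail-page requests.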
Step 5: Export Results
Use CSV, JSON, or any other format that works for you to save your job listing data cleanly. It's worth the extra effort, because your data is only as useful as it is readable.
Exporting your gathered data into a structured JSON format preserves the hierarchical nature of the information, which makes it highly compatible with modern databases or web applications. You can accomplish it easily with pandas by using the df.to_json('indeed_job_postings.json', orient='records') command, and you'll have a neatly organized file ready for further processing.
Furthermore, cleaning your dataset involves removing duplicate entries and standardizing text fields, so you guarantee the accuracy of analytics. For example, you might normalize salary ranges into a consistent currency format, or you could filter out postings that lack critical information.
As a result, it becomes a powerful asset for visualizing hiring trends across different regions, and it allows HR teams to identify the most frequently requested skills within their specific industry.
Taking the time to structure your output properly ultimately maximizes the value of your web scraping efforts.
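The cleaning step described above can be sketched with the standard library alone. The field names match the scraper output earlier, and the normalization shown (URL dedup plus whitespace collapsing) is deliberately minimal:

```python
def clean_jobs(rows):
    """Drop duplicate URLs and collapse stray whitespace in text fields."""
    seen, cleaned = set(), []
    for row in rows:
        url = row.get("URL")
        if url in seen:
            continue
        seen.add(url)
        cleaned.append({
            k: " ".join(v.split()) if isinstance(v, str) else v
            for k, v in row.items()
        })
    return cleaned

rows = [
    {"Job Title": "Dev  Ops  Engineer", "URL": "https://example.com/1"},
    {"Job Title": "DevOps Engineer", "URL": "https://example.com/1"},  # duplicate
]
print(clean_jobs(rows))  # one row, title normalized to "Dev Ops Engineer"
```

Richer passes, such as normalizing salary ranges into one currency, would slot into the same loop.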
Tips for Staying Undetected
If you scrape Indeed too fast and too much, you’ll get banned. There are some tricks, however, to help you stay under the radar:
- Use residential proxies. They make your bot look like a normal user since the traffic comes from a legitimate home network.
- Crawl politely. Slow down between requests when collecting job details to prevent server overload.
- Rotate user agents and IPs. Professionals who scrape thousands of job positions daily do that, along with advanced stealth tools, to avoid getting flagged or banned.
Some more advanced Indeed scrapers even randomize patterns to look more human when they scrape Indeed for massive job detail gathering operations.
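Polite crawling largely comes down to bounded random delays. A tiny helper like this keeps request timing irregular; the bounds are arbitrary examples, not recommended values:

```python
import random

def polite_delay(base=5.0, jitter=5.0):
    """Return a randomized wait time in seconds between requests."""
    return base + random.uniform(0, jitter)

delay = polite_delay()
print(round(delay, 1))  # somewhere between 5.0 and 10.0
# In the scraper loop you would call time.sleep(polite_delay()).
```

Fixed intervals are one of the easiest bot signatures to detect, which is why even this small amount of jitter matters.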
Conclusion
Scraping job positions from Indeed can be useful for gathering job market intelligence. It’s great for market research, trend tracking, and HR teams that need to spot promising openings before anyone else.
But you have to be smart about it. Stay hidden, use the right tools, and innovate to overcome new anti-scraping measures that are constantly being deployed by the targets. When you set up a good system, you can scrape Indeed and other platforms smoothly without getting slammed by bans.
FAQ
How do I handle CAPTCHAs when scraping Indeed?
You'll inevitably encounter automated security checks, but instead of relying on traditional CAPTCHA-solving services, the most effective workaround is using premium residential proxies or heavily fortified stealth browsers that prevent the challenges from triggering in the first place.
Additionally, mimicking human behavior by adding random delays between your requests significantly reduces the likelihood of tripping these defensive mechanisms.
High-quality proxy infrastructure is practically mandatory for uninterrupted access when scraping Indeed at any scale.
Why is my IP getting blocked or rate-limited?
Websites monitor incoming traffic for unnaturally rapid request patterns, and they'll immediately restrict access if they detect a single IP address making hundreds of connections simultaneously.
You must route your traffic through a reliable proxy network, since rotating your IP hides your automated activities effectively. Proper rate limiting on your end prevents these temporary bans from derailing your project completely.
How can I detect blocked or partial responses?
While monitoring HTTP status codes is helpful, advanced firewalls often return a 200 OK even for a block page. You must explicitly check the HTML for challenge keywords (like “Verify you are human”) or verify the presence of core data elements to confirm a successful scrape.
Furthermore, implementing specific checks for known error messages or missing core HTML elements helps you identify when the server is serving a restricted version of the site.
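A block check following that advice might scan the body both for challenge keywords and for the data container you expect. The keyword list is illustrative, not exhaustive:

```python
CHALLENGE_MARKERS = ["verify you are human", "captcha", "access denied"]

def looks_blocked(html):
    """Heuristic: flag challenge pages even when the status code is 200."""
    lowered = html.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return True
    # A real results page should contain the job-card container class.
    return "job_seen_beacon" not in lowered

print(looks_blocked("<h1>Verify you are human</h1>"))  # True
print(looks_blocked('<div class="job_seen_beacon">...</div>'))  # False
```

Running this check on every response lets you pause, rotate proxies, or alert yourself before silently saving empty data.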
What should I do if my CSS selectors stop working?
Frontend developers frequently update their website's layout, which breaks the hardcoded selectors within your extraction script. Because Indeed uses dynamically generated class names that change with every site update, you should update your code to rely on stable data-testid attributes or bypass the HTML entirely by extracting the embedded JSON data.
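Selecting on stable data-testid attributes instead of generated class names looks like this; the sample markup is invented for illustration:

```python
from bs4 import BeautifulSoup

html = '<span class="css-x1y2z3" data-testid="company-name">Acme Corp</span>'
soup = BeautifulSoup(html, "html.parser")

# The random-looking class changes between deploys; the testid tends not to.
node = soup.find("span", {"data-testid": "company-name"})
print(node.get_text(strip=True))  # Acme Corp
```

The same attribute-filter pattern is already used in the scrapers above for company name, location, and snippet fields.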
How can I scrape JavaScript-loaded content?
Many modern platforms rely heavily on client-side rendering, so utilizing a headless browser like Selenium allows you to execute the necessary JavaScript before parsing the HTML. You can also intercept the background network requests to find the direct JSON data source, which often yields faster results than simulating a full browser environment.
How can I stay updated when Indeed changes its structure?
Maintaining an automated testing suite that runs your script against a known static page alerts you instantly whenever the extraction logic fails. Furthermore, checking developer communities and web scraping forums provides valuable early warnings about major platform updates, so you can adapt your code before your Indeed scraping project grinds to a halt.