
How to Scrape LinkedIn Data in 2025: A Complete Guide

Justas Vitaitis


Key Takeaways

  • Know the terms and legal boundaries. Public pages may be okay, but private data is off-limits.

  • Use smart scraping logic: proxies, delays, user-agent rotation, and mimicking regular user behavior.

  • Focus on ethical data gathering. Use job listings and public LinkedIn profile info, not private personal data.


Scraping LinkedIn in 2025 isn’t as simple as it once was. The platform now uses advanced defenses against bots, making it challenging to collect data without getting blocked.

On top of that, frequent interface updates mean the site's structure keeps changing, so developers have to adjust their scrapers all the time.

But with the right approach, you can still gather valuable information relatively safely. In this article, we’ll teach you how.

What is LinkedIn Scraping?

LinkedIn scraping is the use of automated tools to collect data from LinkedIn (profiles, job pages, or company information) without copying everything by hand. A LinkedIn web scraper can gather data for recruitment, research, sales, market intelligence, and more.

Industries such as HR, marketing, investment, and lead generation, to name a few, often use LinkedIn scraper tools to build targeted lead lists, monitor competitors, hire qualified personnel, and analyze trends.

People use LinkedIn profile insights to verify skills, education, and work history. Companies employ it to track job listings, watch hiring trends, and spot growing unicorns.

Scraping LinkedIn data sits in a legal gray area. As long as you collect only public data that isn't gated behind a login or sign-up, you're more in the right than in the wrong. However, don't treat this blog post as legal advice; consult a professional first.

Many companies out there use LinkedIn scraper solutions to compile bulk LinkedIn data for recruiting, analysis, and more.

LinkedIn's Terms of Service forbid the use of unauthorized bots or LinkedIn scraper tools to gather LinkedIn data. Violations can lead to account suspension or legal action.

The landmark case hiQ Labs v. LinkedIn suggested that scraping public LinkedIn profile pages may not violate the Computer Fraud and Abuse Act: courts initially sided with hiQ Labs' right to collect public LinkedIn data, though the long-running dispute ultimately ended in a settlement.

In Europe, GDPR imposes additional restrictions: scraping LinkedIn data (especially personal data) without explicit consent can be a violation, even if profiles are public.

Does LinkedIn Block Scraping?

Yes. LinkedIn employs multiple protection layers, continuously introducing new measures to prevent bot traffic. For example:

  • Rate limits on job listings and profile page requests.
  • IP address bans for excessive requests.
  • Browser fingerprinting to detect automation and ban evasion.
  • CAPTCHA tests when suspicious behavior is detected.

Signs you’re blocked include sudden logouts, “access denied” errors, or missing LinkedIn profile content. Even stealthed headless browsers can be flagged by LinkedIn’s detection tools.

How to Scrape LinkedIn Without Getting Blocked

  • Use proxies or IP rotation. These will help you by switching IP addresses periodically to make the anti-scraping measures believe you’re a genuine user.
  • Don’t go directly to specific URLs. If possible, try to start sessions by “browsing around” a page or two before heading to what you want to scrape.
  • Mimic human browsing. Add random delays, vary scrolling behavior, and simulate mouse movement.
  • Use popular user agents. It will make it much more difficult for LinkedIn to notice that you’re using a LinkedIn web scraper.
  • Keep request rates low. It’s essential for scraping LinkedIn data, such as job listings or public profile data.
  • Avoid aggressive crawls. If you scrape company data too fiercely, it may trigger LinkedIn’s detection system, and you’ll most likely be shut down.
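The tactics above can be sketched in Python with `requests`. Everything here is illustrative: the proxy endpoints and user-agent strings are placeholders for whatever your provider and testing give you, not real credentials or a definitive setup.

```python
import random
import time

import requests

# Hypothetical proxy pool -- swap in your provider's real endpoints.
PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]

# A small pool of popular desktop user agents to rotate through.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]


def polite_get(url: str) -> requests.Response:
    """Fetch a URL with a random proxy, user agent, and human-like delay."""
    proxy = random.choice(PROXIES)
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    time.sleep(random.uniform(1.0, 4.0))  # keep the request rate low
    return requests.get(
        url,
        headers=headers,
        proxies={"http": proxy, "https": proxy},
        timeout=15,
    )
```

Rotating both the proxy and the user agent per request, with jittered delays in between, covers most of the list above; session "warm-up" browsing has to be added on top for stricter targets.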

Check out more of the best scraping practices just to be safe, and then consider using a LinkedIn proxy to help you scrape its data.

What Data Can You Scrape From LinkedIn?

You can only scrape data that is publicly visible on LinkedIn pages. Note that some of it may still count as personal data protected under other laws, not just LinkedIn's ToS:

  • Name
  • Headline
  • Current company
  • Location
  • Work experience
  • Education
  • Public posts
  • And more data that is visible without logging in.
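For bookkeeping, the public fields above could be modeled as a small record type. The field names here are our own choice for illustration, not an official schema:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PublicProfile:
    """One record of publicly visible LinkedIn profile fields."""
    name: str
    headline: str = ""
    current_company: str = ""
    location: str = ""
    experience: List[str] = field(default_factory=list)
    education: List[str] = field(default_factory=list)
    public_posts: List[str] = field(default_factory=list)


# Example record (made-up data).
profile = PublicProfile(name="Jane Doe", headline="Copywriter at Acme")
```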

Without logging in, a LinkedIn scraper gets you basic public data. Logging in exposes far more information, but it also takes scraping off the table: creating an account means agreeing to the Terms of Service, which prohibit scraping outright.

Therefore, before you scrape data, make sure you're not signed up or logged in. Once you've accepted the Terms of Service, you're no longer in any gray area.

Can You Scrape LinkedIn Jobs?

Yes, many tools focus on job listings when scraping data from LinkedIn. You can gather job titles, companies, locations, posted dates, job descriptions, and other available information.

You can use keyword filters to pull only tech job listings or sales roles. Remember to follow rate limits and avoid abusing non-public endpoints.
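A keyword filter over already-scraped job dicts might look like this; the dict keys mirror the script later in this article, and the sample records are made up:

```python
from typing import Dict, List


def filter_jobs(
    jobs: List[Dict[str, str]], keywords: List[str]
) -> List[Dict[str, str]]:
    """Keep only jobs whose title contains any keyword (case-insensitive)."""
    lowered = [kw.lower() for kw in keywords]
    return [
        job for job in jobs
        if any(kw in job.get("title", "").lower() for kw in lowered)
    ]


sample = [
    {"title": "Senior Python Engineer", "company": "Acme"},
    {"title": "Sales Manager", "company": "Globex"},
]
tech_only = filter_jobs(sample, ["python", "engineer"])  # keeps the first job
```

Filtering after the fetch like this keeps your request volume the same regardless of how narrow the search is, which matters for staying under rate limits.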

While many businesses often scrape job postings from LinkedIn, keep in mind that it’s still something LinkedIn discourages. You should proceed with caution and remain within the legal boundaries, even if they may not seem so clear.

Scraping With Python: A Starter Guide

Scraping LinkedIn or any other website with Python involves using specific libraries. They enable your script to behave like a browser, extract the content you need, and respect the site’s limits to prevent being blocked.

Here’s a beginner-friendly starter guide to scraping public LinkedIn-like data with Python.

Step 1: Install Dependencies

You’ll need a few essential Python libraries:

pip install requests beautifulsoup4 fake-useragent

If you’re dealing with JavaScript-heavy content, add:

pip install selenium webdriver-manager
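For JavaScript-heavy pages, a headless Chrome session along these lines can return the fully rendered HTML. This is a generic sketch, not a LinkedIn-specific recipe; the imports are kept inside the function so Selenium is only required when you actually take this path:

```python
def fetch_rendered(url: str) -> str:
    """Return fully rendered HTML for a JavaScript-heavy page."""
    # Imports live inside the function so the rest of a script runs
    # without Selenium installed unless this code path is used.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    options.add_argument("--headless=new")        # no visible browser window
    options.add_argument("--window-size=1280,900")
    driver = webdriver.Chrome(
        service=Service(ChromeDriverManager().install()),
        options=options,
    )
    try:
        driver.get(url)
        return driver.page_source  # rendered DOM, ready for BeautifulSoup
    finally:
        driver.quit()
```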

Step 2: Basic Scraping Flow (Non-JavaScript Page)

Here’s a simplified pseudo script to scrape public job postings from a job board:

"""
Scrape public LinkedIn job cards (no login, no JavaScript)
Python 3.7 – 3.12
──────────────────────────────────────────────────────────
"""
from __future__ import annotations      
from typing import List, Dict
import re, time, urllib.parse, requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# --- Global variables: change these to scrape different job listings ---
KEYWORDS       = "Copywriter"
LOCATION       = "United States"
PAGES_TO_GRAB  = 2       # each page = 25 jobs
REQUEST_DELAY  = 1.5     # seconds between requests
DEBUG          = False   # True → print HTTP size & first 300 chars
# --- End of global variables ---

ua = UserAgent()
HEADERS = {"User-Agent": ua.random}

BASE_GUEST = (
    "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search"
    "?keywords={kw}&location={loc}&start={start}"
)

JOB_URL_RE = re.compile(r"https?://www\.linkedin\.com/jobs/view/\d+")


def fetch_html(keywords: str, location: str, start: int) -> str | None:
    url = BASE_GUEST.format(
        kw=urllib.parse.quote_plus(keywords),
        loc=urllib.parse.quote_plus(location),
        start=start,
    )
    r = requests.get(url, headers=HEADERS, timeout=15)

    if r.status_code in (999, 429):          # LinkedIn soft-block / rate-limit
        print(f"Blocked (HTTP {r.status_code}) @ start={start}")
        return None

    html = r.text
    if "<title>LinkedIn: Log In" in html:    # sign-in wall sneakily returned
        print("Received a login page → blocked.")
        return None

    if DEBUG:
        print(
            f"[start={start}] {r.status_code}, {len(html):,} bytes\n"
            + html[:300].replace("\n", " ") + "…\n"
        )
    return html


def parse_cards(html: str) -> List[Dict[str, str]]:
    """Return a list[dict] even if LinkedIn tweaks class names again."""
    soup = BeautifulSoup(html, "html.parser")
    jobs: List[Dict[str, str]] = []

    
    # Primary path: current job-card markup.
    for card in soup.select("div.base-card"):
        title_el   = card.select_one("h3.base-search-card__title")
        company_el = card.select_one("h4.base-search-card__subtitle")
        loc_el     = card.select_one("span.job-search-card__location")
        link_el    = card.select_one("a.base-card__full-link")
        if title_el and company_el and loc_el and link_el:
            jobs.append(
                {
                    "title":   title_el.get_text(strip=True),
                    "company": company_el.get_text(strip=True),
                    "location": loc_el.get_text(strip=True),
                    "link":    link_el["href"].split("?")[0],
                }
            )

    
    # Fallback: match job links directly in case class names changed.
    if not jobs:                                 # only if primary selector failed
        for a in soup.find_all("a", href=JOB_URL_RE):
            parent = a.find_parent(["div", "li"])
            if parent is None:                   # orphan link, nothing around it
                continue
            title = (a.get_text(strip=True) or a["href"]).strip()
            company = parent.find(
                ["h4", "span"], class_=re.compile(r"(subtitle|company)", re.I)
            )
            loc = parent.find("span", class_=re.compile(r"location", re.I))
            jobs.append(
                {
                    "title": title,
                    "company": company.get_text(strip=True) if company else "",
                    "location": loc.get_text(strip=True) if loc else "",
                    "link": a["href"].split("?")[0],
                }
            )
    return jobs


def scrape_jobs(
    keywords: str,
    location: str,
    pages: int = 1,
    delay: float = 1.5,
) -> List[Dict[str, str]]:
    all_jobs: List[Dict[str, str]] = []
    for page in range(pages):
        html = fetch_html(keywords, location, start=page * 25)
        if html is None:               # blocked
            break
        all_jobs.extend(parse_cards(html))
        time.sleep(delay)
    return all_jobs


if __name__ == "__main__":
    jobs = scrape_jobs(KEYWORDS, LOCATION, PAGES_TO_GRAB, REQUEST_DELAY)
    print(f"\nFound {len(jobs)} jobs\n")
    for j in jobs[:10]:
        print("{title} — {company} ({location})".format(**j))
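To keep what you collect, the `__main__` block could be extended to write the results to CSV. A minimal sketch reusing the same dict keys that `parse_cards` produces (the sample row is made up):

```python
import csv
from typing import Dict, List


def save_jobs_csv(jobs: List[Dict[str, str]], path: str) -> None:
    """Write the scraped job dicts to a CSV file."""
    fieldnames = ["title", "company", "location", "link"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(jobs)


# Example usage with a single made-up record.
save_jobs_csv(
    [{
        "title": "Copywriter",
        "company": "Acme",
        "location": "United States",
        "link": "https://www.linkedin.com/jobs/view/123",
    }],
    "jobs.csv",
)
```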

Step 3: Handling Rate Limits and Ethics

Whether you’re scraping LinkedIn or any other website, here are a few best practices to keep in mind:

  • Add delays between requests.
  • Rotate user agents to mimic different browsers.
  • Avoid scraping logged-in or gated content.
  • Use proxies when scaling to avoid IP bans.
  • Respect robots.txt of the target domain.
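The robots.txt check in the last point can be done with Python's built-in `urllib.robotparser`. This offline sketch parses a made-up robots.txt directly; in practice you would fetch the target domain's real file first:

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; in practice, fetch it from
# https://<target-domain>/robots.txt before scraping.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("*", "https://example.com/jobs"))       # True
print(rp.can_fetch("*", "https://example.com/private/x"))  # False
```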

If you want to learn more, we've got a step-by-step guide on web scraping with Python. It will help you get more familiar with web scraping and might even give you some valuable tips.

Conclusion

Scraping data from LinkedIn provides insights for recruiters, marketers, and analysts, but it comes with legal and technical risks. Public LinkedIn profile scraping may be allowed, but pulling private LinkedIn profile data or ignoring rate limits can get you blocked or sued. Use proxies, delay tactics, and ethical boundaries when scraping.

Whether you want to scrape company data, people data, or job listings, always follow best practices and avoid collecting private information.


Author

Justas Vitaitis

Senior Software Engineer

Justas is a Senior Software Engineer with over a decade of proven expertise. He currently holds a crucial role in IPRoyal’s development team, regularly demonstrating his profound expertise in the Go programming language, contributing significantly to the company’s technological evolution. Justas is pivotal in maintaining our proxy network, serving as the authority on all aspects of proxies. Beyond coding, Justas is a passionate travel enthusiast and automotive aficionado, seamlessly blending his tech finesse with a passion for exploration.
