How to Scrape Google Search Results Using Python

Web scraping is the process of sending bots, usually programmed in Python or another programming language, to download the content of a website as an HTML file. The file is then parsed and searched through to uncover the valuable information hidden in it.

Technically, any website can be scraped, including Google Search Engine Result Pages (Google SERPs), though doing so might not always be ethical or legal. Scraping Google search results is nevertheless common, as Google SERP data is immensely valuable.

In fact, a lot of Google SERP data is used to understand how the search engine ranks websites and to derive search engine optimization (SEO) practices. So, if you want to get better at SEO, knowing how to scrape Google search results could be valuable.

What Is Google SERP?

A Google SERP is the page users get when they enter a query into the search engine. The entire page that’s displayed (including all the ads, images, and AI Overviews) is considered a part of the Google Search Engine Result Page.

There are plenty of elements you can encounter in a Google SERP:

1. Organic results - regular links that are not paid for and are ranked in a specific order.

2. AI overview - Google’s attempt to answer a query using generative AI.

3. Videos, images, and carousels - multimedia results that present the answer in richer, more visual formats.

4. Featured snippets - a short excerpt, usually one or two sentences long, pulled from a specific website and displayed directly in Google results.

5. Ads - links that appear above organic results and are paid to be displayed in the Google SERP.

All of the above is just a short list of the elements that can appear in Google search results. There are well over 30 different features and elements users can encounter; however, many of them appear only for niche or rare queries.

Google itself uses web scraping and crawling to build its own search engine index, so every Google search results page you see is founded on web scraping practices. While that changes nothing in the legal sense, the company isn’t too keen on filing lawsuits against scrapers.

Additionally, all Google search results are publicly available data, which is generally considered legal to scrape. The main restrictions apply to copyrighted and personal information, if it appears in Google search data.

Take note, however, that the company can still freely block your Google scraper without providing a reason. As such, using residential proxies is practically mandatory when developing a Google scraper, as IP bans are otherwise all but inevitable.

Finally, if you want to scrape Google search results, it’s highly recommended to consult with a legal professional. Only then can you be sure that there will be no issues with your activities.

Common Challenges When Scraping Google SERPs

There are three common challenges when it comes to collecting Google data – changing page elements, CAPTCHAs, and IP bans.

Google search results are constantly evolving: new features and elements appear and disappear, and the structure of the page may change as well. Each time that happens, you may need to rewrite parts of your Google search scraper, as it may break or fail to collect the data you need.

Two of the other challenges are closely related. When Google suspects that someone is using a Google search scraper, it may first serve a CAPTCHA. Bots have trouble solving these, so most scrapers will stop their activities and fail to collect any Google data.

Additionally, even if you use methods to bypass CAPTCHAs, the search engine may still ban your IP address if it suspects you’re collecting too much Google data.

Luckily, you can mitigate both with proxies, as they let you switch your IP address. Bans become far less effective, while CAPTCHAs pose a smaller challenge. If you still need to solve them, there are plenty of third-party solving tools available.

Finally, you can implement web scraping best practices such as rotating user agents to improve the efficiency of your Google scraper. Doing so will reduce the likelihood of both CAPTCHAs and IP bans.
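
Here’s a minimal sketch of both ideas using requests. The user agent strings are real browser values, but the proxy endpoint and credentials are placeholders you’d swap for your provider’s details:

import random
import requests

# A small pool of common browser user agents to rotate through.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5 Safari/605.1.15",
]

# Placeholder proxy endpoint - substitute your provider's host and credentials.
PROXIES = {
    "http": "http://username:password@proxy.example.com:12345",
    "https": "http://username:password@proxy.example.com:12345",
}

def fetch(url):
    # Pick a fresh user agent for each request to vary the browser fingerprint.
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return requests.get(url, headers=headers, proxies=PROXIES, timeout=10)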

Step-by-Step Guide: Scraping Google Results with Python

There are a few ways you can scrape Google search results. Some people use third-party Google search API solutions. That’s one of the easiest routes, as it usually also gives you access to data from other Google services, such as Google Maps.

On the other hand, you can build your own Google search data scraper with Python. You’ll have to build a separate scraper for each service (e.g., Google Maps), however. So, if you need data from several of their search tools, such as combining regular search with Google Maps, you’ll likely make great use of a search API.

If you want to build one, the basic foundations are quite simple. You’ll need just a few Python libraries:

pip install requests beautifulsoup4

Requests will be used to send a GET request that downloads the Google search results HTML locally, while BeautifulSoup4 will parse the file to extract specific Google search data.

import requests
from bs4 import BeautifulSoup
import csv


def scrape_google(query, num_results=10):
    # Use a common browser user agent; the default requests one gets blocked instantly.
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    # Google expects spaces in the query string to be encoded as '+'.
    query = query.replace(' ', '+')
    url = f"https://www.google.com/search?q={query}&num={num_results}"

    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        print("Failed to retrieve data")
        return []

    soup = BeautifulSoup(response.text, 'html.parser')
    results = []

    # 'tF2Cxc' is the class Google currently uses for organic result containers.
    # It changes periodically, so update it if the scraper stops finding results.
    for g in soup.find_all('div', class_='tF2Cxc'):
        title_tag = g.find('h3')
        link_tag = g.find('a')
        # Skip result blocks that are missing a title or a link.
        if title_tag is None or link_tag is None:
            continue
        results.append({'Title': title_tag.text, 'Link': link_tag['href']})

    return results


def save_to_csv(data, filename='results.csv'):
    # Nothing to write if the scrape came back empty.
    if not data:
        return
    # Use the first row's keys ('Title', 'Link') as the CSV header.
    keys = data[0].keys()
    with open(filename, 'w', newline='', encoding='utf-8') as output_file:
        dict_writer = csv.DictWriter(output_file, fieldnames=keys)
        dict_writer.writeheader()
        dict_writer.writerows(data)


# Usage
query = "adidas"
results = scrape_google(query)
save_to_csv(results)
print(f"Scraped {len(results)} results and saved to 'results.csv'.")

There are just a few important functions and a couple of trickier parts. Let’s start with the two main functions.

Our first function accepts a query (which will be entered at the bottom of our script) and a number of results. You can modify the latter to control how much Google search data is returned on a single SERP.

We then create a headers dictionary to set a specific user agent. The default requests user agent gets instantly blocked by Google, so we pick a common browser one to reduce the likelihood of getting blocked.

We tinker slightly with our query parameter, since most users will type the query as regular text. When the URL is created, however, every space must be replaced with a ‘+’, so our .replace() method makes sure every query is converted correctly.
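
As a side note, if your queries can contain special characters beyond spaces, the standard library’s quote_plus() is a more robust alternative (a small sketch, not part of the original script) – it also converts spaces to ‘+’ while percent-encoding everything else:

from urllib.parse import quote_plus

# quote_plus() turns spaces into '+' and percent-encodes special characters.
print(quote_plus("running shoes & gear"))  # running+shoes+%26+gear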

After that, a GET request is sent by our Google search scraper to retrieve the HTML we need, which is then turned into a soup object that can be easily searched with BeautifulSoup4.

We then run a loop to find two important things – the title and the URL of each search result – skipping any result block that’s missing either. Each pair is added to the results list and, after the loop finishes, the list is returned. Note that each entry in the list is a dictionary.

Our second function exports everything to a CSV file. After checking that there’s data to write, we extract the keys of the first element in the list, which become the header row. Finally, all of the values are written in order.

Analyzing Scraped Data

Once you have enough Google search results scraped and stored, you can use them for various use cases. To get the most mileage out of your scraper, however, you may need to supplement the results with the content stored behind each link.

Once you do, you can compare various pieces of website content with how they rank in Google search results. You can glean insights into important keywords and rankings while also tracking the changes your competitors are making to climb higher on Google.

It’s also great if you want to conduct a competitor analysis for SEO purposes to find out where they’re ranking well and what their strong and weak points are. Google search result data can be widely used for various SEO applications – as long as you can analyze what you’ve acquired.
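
As a rough illustration, here’s a minimal sketch of one such analysis. It assumes the results.csv file produced by the scraper above and tallies how many results each domain holds for a query:

import csv
from collections import Counter
from urllib.parse import urlparse

# Read the CSV produced by save_to_csv() and count results per domain.
with open("results.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

domains = Counter(urlparse(row["Link"]).netloc for row in rows)

# Print domains in descending order of how many results they occupy.
for domain, count in domains.most_common():
    print(f"{domain}: {count} result(s)")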
