How To Scrape Data From Expedia
Simona Lamsodyte
Last updated -
In This Article
Expedia is a website that aggregates information about hotel listings, events, and travel options.
By scraping Expedia, you can access valuable data for market research, competitor analysis, or simply to find the best hotel deal for yourself.
In this article, you’ll read about the best methods that you can use to scrape Expedia, including a complete example on how to scrape hotel price data from Expedia using Python.
Are There Any Specific Challenges or Complexities Involved in Scraping Data From Expedia?
Like most modern websites, Expedia is a dynamic website that uses plenty of JavaScript for rendering content and handling user interactions. For this reason, it’s hard to scrape using conventional, HTML-based web scrapers like Beautiful Soup and Cheerio since they cannot execute JavaScript.
Are There Any Tools or Libraries Available for Efficient Data Scraping From Expedia?
To effectively scrape data from Expedia, you need to use more advanced web scraping tools, such as headless browsers like Puppeteer or Playwright , which can render the web page and execute JavaScript, allowing you to access the dynamically generated content. These tools provide the flexibility needed to interact with Expedia’s dynamic elements and extract the information you require.
One of the best tools for web scraping dynamic websites like Expedia is Playwright. It has official bindings for multiple languages such as JavaScript, Python, Java and C#, enabling a wide variety of developers to use it no matter the language they are experienced with.
Web Scraping Expedia using Python and Playwright
In the following section, you’ll learn how to use Playwright via Python to scrape hotel information for a city.
Setup
To follow the tutorial, you’ll need to have Python installed on your computer. If you don’t already have it, you can use the official instructions to download and install it.
You’ll also need to install Playwright. Use the following commands to install the library and its supported browsers.
pip install playwright
playwright install
Scraping Hotels
To scrape hotel data for a given location and dates, you first need to open the website in Playwright.
The following boilerplate launches a browser and opens an Expedia page for a given location (Milan, Italy) and given dates (1st-7th June, 2024).
import time
from playwright.sync_api import sync_playwright
with sync_playwright() as pw:
browser = pw.firefox.launch(
headless=False,
)
page = browser.new_page()
page.goto( 'https://www.expedia.com/Hotel-Search?adults=2&d1=2024-06-01&d2=2024-06-07&destination=Milan%20%28and%20vicinity%29%2C%20Lombardy%2C%20Italy&endDate=2024-06-07&latLong=45.47179%2C9.18617®ionId=180012&rooms=1&semdtl=&sort=RECOMMENDED&startDate=2024-06-01&theme=&useRewards=false&userIntent=')
time.sleep(2)
browser.close()
It’s possible to create a script that enters the location and dates you wish in the web interface and clicks the search button to acquire the URL. While this tutorial won’t cover how to do that, it’s a fun exercise to do after you finish the tutorial.
After you have arrived at the page, you need to scrape the information from the cards that appear in the search results.
First, select all the cards.
cards = page.locator('[data-stid="lodging-card-responsive"]').all()
Then, iterate over the cards to accumulate information about the hotels. In this example, you’ll scrape the title, rating, and price for the night of the hotels.
hotels = []
for card in cards:
content = card.locator('div.uitk-card-content-section')
title = content.locator('h3').text_content()
if content.locator('span.uitk-badge-base-text').is_visible():
rating = content.locator('span.uitk-badge-base-text').text_content()
else:
rating = False
if content.locator('div.uitk-type-500').is_visible():
price = content.locator('div.uitk-type-500').text_content()
else:
price = False
hotel = {
'title': title,
'rating': rating,
'price': price}
hotels.append(hotel)
Since there are some hotels that don’t yet have a rating or a price (this happens if they are fully booked or closed for the dates), you need to handle the missing values without crashing the script. For this reason, the code above checks if these elements are visible before selecting them. If they are not, it puts False
as the value of the element.
Finally, you can print out the hotel list:
print(hotels)
Here’s the full code for this section:
import time
from playwright.sync_api import sync_playwright
with sync_playwright() as pw:
browser = pw.firefox.launch(
headless=False,
)
page = browser.new_page()
page.goto(
'https://www.expedia.com/Hotel-Search?adults=2&d1=2024-06-01&d2=2024-06-07&destination=Milan%20%28and%20vicinity%29%2C%20Lombardy%2C%20Italy&endDate=2024-06-07&latLong=45.47179%2C9.18617®ionId=180012&rooms=1&semdtl=&sort=RECOMMENDED&startDate=2024-06-01&theme=&useRewards=false&userIntent=')
time.sleep(2)
#scrape hotels
cards = page.locator('[data-stid="lodging-card-responsive"]').all()
hotels = []
for card in cards:
content = card.locator('div.uitk-card-content-section')
title = content.locator('h3').text_content()
if content.locator('span.uitk-badge-base-text').is_visible():
rating = content.locator('span.uitk-badge-base-text').text_content()
else:
rating = False
if content.locator('div.uitk-type-500').is_visible():
price = content.locator('div.uitk-type-500').text_content()
else:
price = False
hotel = {
'title': title,
'rating': rating,
'price': price}
hotels.append(hotel)
print(hotels)
browser.close()
Once run, it should return a list of hotels:
[{'title': 'Milano Verticale | UNA Esperienze', 'rating': '9.2', 'price': '$379'}, {'title': 'Hyatt Centric Milan Centrale', 'rating': '8.8', 'price': '$326'}, {'title': 'UNAHOTELS Galles Milano', 'rating': '8.6', 'price': '$258'}, {'title': 'Residence de la Gare', 'rating': '8.8', 'price': '$137'}...
But this list doesn’t have all of the hotels listed. To access all of them, you need to expand the list by clicking the “Show More” button at the bottom of the list.
This is where scraping with a web automation library comes in handy—you can click, write, and do any other action a regular user could do!
Expanding Search Results With Playwright
To get the full search results, you need to repeatedly expand the list until the “Show More” button disappears before scraping the results.
Add the following code in the middle of the script, before you start scraping the cards. It locates the button, clicks it, waits for the results to load, and then repeats this process if the button is still there.
#scroll to the bottom of page
show_more = page.locator("button", has_text="Show More")
while show_more.is_visible() is True:
show_more.click()
time.sleep(5)
This pattern can be useful to handle any kind of repetitive loading of elements like in dynamic websites.
Here’s the full code for the script:
import time
from playwright.sync_api import sync_playwright
with sync_playwright() as pw:
browser = pw.firefox.launch(
headless=False,
)
page = browser.new_page()
page.goto(
'https://www.expedia.com/Hotel-Search?adults=2&d1=2024-06-01&d2=2024-06-07&destination=Milan%20%28and%20vicinity%29%2C%20Lombardy%2C%20Italy&endDate=2024-06-07&latLong=45.47179%2C9.18617®ionId=180012&rooms=1&semdtl=&sort=RECOMMENDED&startDate=2024-06-01&theme=&useRewards=false&userIntent=')
time.sleep(2)
#scroll to the bottom of page
show_more = page.locator("button", has_text="Show More")
while show_more.is_visible() is True:
show_more.click()
time.sleep(5)
#scrape hotels
cards = page.locator('[data-stid="lodging-card-responsive"]').all()
hotels = []
for card in cards:
content = card.locator('div.uitk-card-content-section')
title = content.locator('h3').text_content()
if content.locator('span.uitk-badge-base-text').is_visible():
rating = content.locator('span.uitk-badge-base-text').text_content()
else:
rating = False
if content.locator('div.uitk-type-500').is_visible():
price = content.locator('div.uitk-type-500').text_content()
else:
price = False
hotel = {
'title': title,
'rating': rating,
'price': price}
hotels.append(hotel)
print(hotels)
browser.close()
How Can Using Proxies Help in Overcoming Restrictions and Anti-scraping Measures on Expedia?
Scraping the hotels from just one city and one set of dates won’t trigger any anti-scraping measures from Expedia. Because the website receives so much traffic, it won’t even put a blip on the radar. But if you want to gather a large-scale dataset that includes multiple cities, dates, and perhaps even the historical changes in the prices of hotels, your actions might get detected and your IP address might get denied service from Expedia.
For this reason, proxies are used for large-scale web scraping activities. Proxies act as middlemen between the client and the server, forwarding the request to the server but changing the IP address from where the request comes from. With a service like IPRoyal residential proxies , you can change your IP on every request, picking at random from a pool of ethically-sourced IPs all around the world. This will help avoid detection: your requests will look like they originate from many different users instead of just one.
Here’s how you can add a proxy to your Playwright script.
First, you will need to find the host, port, username, and password information for your proxy server of choice. If you’re using IPRoyal proxies, you can find this information in your dashboard.
Then, update the pw.chromium.launch()
function with the data.
browser = pw.chromium.launch(
headless=False,
proxy={
'server':'host:port',
'username':'username',
'password': 'password123',
}
)
Now all the requests from the Playwright session will be funneled through your chosen proxy server.
What Are the Potential Applications and Use Cases for Scraping Data From Expedia?
Scraping data from Expedia can provide valuable information for a variety of applications and use cases. Here are some potential applications and use cases for scraping data from Expedia:
Price comparison: Travelers often use price comparison websites to find the best deals. Scraping Expedia data can be used to create a price comparison tool that helps users find the lowest prices for flights, hotels, and vacation packages.
Market research: Expedia data can be used to analyze travel trends, pricing strategies, and customer preferences. This information is valuable for businesses looking to enter or optimize their presence in the travel industry.
Competitor analysis: By scraping Expedia data, a hotel can gain insights into what their competitors are offering in terms of pricing, availability, and amenities. This can help them make informed decisions and stay competitive.
Customer review analysis: Scraping reviews and ratings from Expedia can provide valuable feedback on hotels, airlines, and other travel services. This data can be used by businesses to assess their performance and make improvements.
Predictive analytics: Data scraped from Expedia can be used to build predictive models for travel demand, pricing trends, and seasonal variations. This can help businesses optimize their operations and marketing strategies.
Conclusion
By using common web scraping tools such as Python and Playwright, you can easily scrape hotel information from Expedia. In a similar manner, you can also scrape the individual hotel pages to get more fine-grained data on what’s available on the market.
Once you increase the depth of your scraping activities, it’s important to start using proxies to protect your IP address from detection and IP bans. To solve this, a common solution is to get a reliable proxy provider like IPRoyal that provides a pool of proxies that can be rotated between requests.
FAQ
Is scraping data from Expedia legal for personal or commercial use?
Since data available from Expedia is public (accessible to everyone), it’s completely legal to scrape it if you’re not logged in. But if you make an account and use it while scraping, you fall under Expedia’s Terms of Service , which prohibit to “access, monitor or copy any content on our Service using any robot, spider, scraper or other automated means or any manual process”. Therefore, it’s suggested to access only the public part of the website with your scripts.
Are there any limitations on the number of requests or concurrent connections when scraping data from Expedia?
There is no public information of any limitations, but most requests tend to act once they receive a large amount of requests (especially concurrent) from the same user. For this reason, it’s important to use proxies if you plan to make many requests. This will hide the fact that large-scale scraping is happening and help protect your IP address from being blacklisted. Read our Expedia scraping example above to see how you can add proxies to your script.
Can I scrape hotel prices and availability from Expedia using proxies?
Yes, it’s possible and even suggested to use proxies for scraping Expedia, no matter how much you plan to scrape. It helps protect your IP address from being banned from accessing the website for scraping. If you want to learn how to do it, check out the final section of our Expedia web scraping code example.
Author
Simona Lamsodyte
Content Manager
Equally known for her brutal honesty and meticulous planning, Simona has established herself as a true professional with a keen eye for detail. Her experience in project management, social media, and SEO content marketing has helped her constantly deliver outstanding results across various projects. Simona is passionate about the intricacies of technology and cybersecurity, keeping a close eye on proxy advancements and collaborating with other businesses in the industry.
Learn More About Simona Lamsodyte