How to Find All Webpages on a Website: A Complete Guide


Kazys Toleikis
Key Takeaways
- SEO, website administration, web scraping, and research are the main reasons to find all pages of a site.
- You can find some website pages using search engine consoles and other tricks.
- Usually, all web pages of a website are also listed in the sitemap.
- Additionally, SEO tools can help you collect a list of all website pages.
- Building a custom Python script is the most comprehensive method, but it might prove technically challenging.
The difficulty of finding all the pages of a website can vary greatly, which makes it all the more frustrating when you need the list quickly. In some cases, it may take a couple of minutes; in others, you may need to hire a programmer.
The level of complexity in getting a full website’s URL list depends on your access level and the site’s design, among other things. Four methods can cover most cases, but the choice also depends on why you need to see all the pages.
Why You’d Want to See All Site Pages
Generally, the reasons for gathering a complete list of website pages fall into four broad categories. Your motivation gives a rough idea of which method to use, as it often hints at the level of access and the tools available to you.
SEO-related use cases
Search Engine Optimization (SEO) use cases involve finding a full website’s URL list to examine its structure while attempting to improve its ranking in search engines. Here are some of the things an SEO specialist might look for on a website’s pages:
- Website indexability
- Crawl depth and orphan pages
- Broken links
- Duplicated content
- Internal linking strategy
- Planning content additions
- Sitemap validation
Additionally, an SEO professional may need a full page list of competitors’ sites to see exactly why they rank better or worse than you do. This might require different methods than analyzing a website you have full access to.
Website administration
A complete URL page list is a common tool for web admins and other professionals who administer websites. Common reasons to acquire and periodically inspect it might include:
- Improving website navigation
- Performing accessibility audits
- Preparing for website UI/UX redesign
- Maintaining website content
- Looking for security vulnerabilities
Web Scraping
Collecting data online in bulk requires some preparation. Quality proxies and scraping software come first, but gathering a complete list of the website’s pages is just as important, and crawling the site is usually how you get it.
A full URL list helps you find hidden or unlinked pages and saves resources, since you can check page content before scraping it. It also makes it easier to archive pages and track their changes over time.
Research Purposes
Website page data is used in datasets for training AI, in cybersecurity research, in archiving historical web changes, and in many other academic applications. These use cases are worth mentioning because of their growing popularity.
How to Find All Webpages on a Website: 4 Methods
Unless you are an SEO specialist or a website administrator with full access, method #2 and parts of method #1 won’t work for you. Without that access, you will need to use no-code SEO tools (#3) or write your own web scraping script (#4).
You should also consider your technical ability, as some parts, especially in the last method, can get quite advanced. The other options, such as Google Search Console or a no-code crawler, can be used without investing time into learning programming.
Method #1: Use Google and Manual Search Tricks
A simple method for finding all the pages of a website is to use Google Search. Simply type in site: followed by your website’s URL, like this: site:IPRoyal.com. This method will display all the indexed pages, but it may not provide a complete picture of what the website actually consists of.
Google may not have indexed all of the website’s pages yet, and some may have already been removed or purposefully excluded using robots.txt. So, the site: operator is best used for a rough approximation.
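If a rough list is enough, you can also combine site: with other common search operators to slice the indexed results. Operator support and results can change over time, so treat these as approximations:
- site:iproyal.com lists every page Google has indexed for the domain.
- site:iproyal.com inurl:blog narrows the results to indexed URLs containing "blog".
- site:iproyal.com -inurl:blog excludes the blog section instead.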
Accessing cached versions of websites might be useful if you need historical data. Unfortunately, Google Search has retired the cache: operator, but you can still use alternative tools. The Internet Archive’s Wayback Machine is ideal for this purpose and offers a dedicated site map feature.
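If you’re comfortable with a little scripting, the Internet Archive also exposes a CDX API that returns the archived URLs it has captured for a domain. Here’s a minimal sketch; the query parameters reflect the publicly documented CDX endpoint, but double-check the current API documentation before relying on it:
import requests

# Query the Wayback Machine CDX API for archived URLs of a domain
params = {
    "url": "iproyal.com/*",  # match everything under the domain
    "output": "json",
    "fl": "original",        # return only the original URL field
    "collapse": "urlkey",    # de-duplicate repeated captures of the same URL
    "limit": 100,            # keep the example small
}
response = requests.get("https://web.archive.org/cdx/search/cdx", params=params)
rows = response.json()

# The first row is the header; the remaining rows hold the archived URLs
for row in rows[1:]:
    print(row[0])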
If you have access to Google Search Console or Bing’s webmaster tools, you can check all indexed pages there. It’s quite straightforward in both cases.
- In Google Search Console, you can find indexed pages and sitemap on the left-hand menu under Indexing.
- In Bing’s Webmaster Tools, you’ll find a section labeled Sitemaps on the left sidebar menu.
Method #2: Check the Site’s Sitemap File
A sitemap is a file that provides a structured list of a website’s pages, images, videos, and other elements. It also gives information about the relationships between different parts.
Most modern websites have an HTML or XML sitemap to help users and search engines understand and navigate the website’s structure. Generally, a sitemap will include a list of URLs for every important page, their hierarchy, and metadata, such as the last modification date.
If you have access, a sitemap will be one of the first places to look for all pages of a website. Keep in mind that large websites might have multiple sitemaps.
Usually, they are segmented based on content type (different sitemaps for blog posts, landing pages, etc.), sections (a sitemap for each product category), hierarchy, or other ways.
The easiest place to find a website’s XML sitemap is the root directory. Simply type the root URL followed by a common sitemap file name:
https://www.iproyal.com/sitemap.xml
https://www.iproyal.com/sitemap_index.xml
https://www.iproyal.com/sitemap1.xml
If you can’t find the exact URL in the root directory, try checking the robots.txt file, which usually refers to the XML sitemap (look for a line starting with Sitemap:). Depending on your website’s setup, it might also be accessible from the content management system (CMS) menu.
Platforms also tend to have different default sitemap generation URLs. Here’s the default one for WordPress:
https://www.iproyal.com/wp-sitemap.xml
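If you’d rather not check these locations by hand, a few lines of Python can read robots.txt for a Sitemap: entry and then pull every URL out of the XML file. A minimal sketch, assuming Requests, Beautiful Soup, and the lxml parser are installed (note that a sitemap index nests further sitemaps you would fetch in turn):
import requests
from bs4 import BeautifulSoup

base_url = "https://iproyal.com"

# Look for Sitemap: lines in robots.txt first
robots = requests.get(base_url + "/robots.txt").text
sitemap_urls = [
    line.split(":", 1)[1].strip()
    for line in robots.splitlines()
    if line.lower().startswith("sitemap:")
]

# Fall back to the most common default location
if not sitemap_urls:
    sitemap_urls = [base_url + "/sitemap.xml"]

for sitemap_url in sitemap_urls:
    xml = requests.get(sitemap_url).text
    soup = BeautifulSoup(xml, "xml")
    # <loc> tags hold page URLs; in a sitemap index they point to more sitemaps
    for loc in soup.find_all("loc"):
        print(loc.text.strip())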
Some websites provide user-facing sitemaps located in the footer or header of the main page. To find one, scroll down to the bottom and look for a link named Site Map, Sitemap, Website Map, or something similar.
While these sitemaps are easy to read, they may exclude some pages. Then again, the sitemap files accessible to web admins might also omit pages, either intentionally or due to poor website maintenance.
Method #3: Use SEO Tools to List All URLs
If you don’t have access to the XML sitemap file or you know that it doesn’t include all the pages, you can use one of the popular crawling tools to create a sitemap for you.
Tools such as Screaming Frog, Ahrefs, Semrush, or Sitebulb work by sending web crawler bots to given URLs. The bot then scans the page, retrieves its data, and identifies links to other pages to visit.
Search engines use crawlers to make indexes, so SEO specialists commonly use them to audit websites. Each tool has its pros and cons for finding all web pages on a website.
Screaming Frog
Screaming Frog’s SEO spider is used for identifying potential SEO issues within a website. The free version of SEO Spider can crawl up to 500 URLs and create an XML sitemap of small websites.
For larger projects, you might need a paid version and some SEO knowledge. Yet, even the free version of SEO Spider stands out with its customizable URL inclusion and user-friendly interface.
Ahrefs
Ahrefs is an online SEO software suite, so it doesn’t directly generate sitemaps of websites. Instead, you can use various other functions that will allow you to find all pages on a website.
For example, Ahrefs’ Site Explorer gives top pages, site structure, and content gap reports of websites. Unfortunately, there is no free trial. Check with your marketing department to see if they already have a subscription.
Semrush
Semrush utilizes its own bot to crawl websites and identify SEO improvements. The Domain Overview and SEO Audit tools can be used to visit needed websites and provide you with a rough idea of their pages and structure.
Unlike Ahrefs, Semrush offers a free trial with thousands of pages included per audit. It can’t be considered a replacement for a sitemap, but it’s a good starting point for many smaller use cases.
Sitebulb
Sitebulb is another SEO site auditing tool that provides a list of web pages through its web crawling capabilities. The tool excels at large-scale crawling for SEO use cases, boasting one of the largest URL limits per audit.
There’s also a free trial that allows you to test Sitebulb’s capabilities. If you have powerful enough hardware, the paid version will allow crawling up to 2 million URLs.
Method #4: Build a Custom Scraper to Find Pages
SEO tools are quite effective for pulling data from any site without needing to code. Yet, you may encounter limitations with large or complex websites due to crawl depth limits, technical challenges such as robots.txt blocks, and other issues.
To get around them and ensure you find a complete list of pages on a website, you may need to build your own custom scraper. We wrote a complete Step-by-Step Guide for Python Web Scraping for that.
Python is well-suited for web scraping projects because it has libraries for handling requests (Requests) and parsing data (Beautiful Soup). Even though they are convenient, you’ll still need at least an intermediate understanding of Python to make it work.
Here’s how a simple Python script for finding all pages of a website might look:
import requests
from bs4 import BeautifulSoup

base_url = "https://iproyal.com"
visited = set()
max_pages = 10  # Limit to 10 pages

def is_internal(link):
    # Treat relative links and links starting with the base URL as internal
    return link.startswith("/") or link.startswith(base_url)

def normalize_link(link):
    # Convert relative links to absolute URLs; ignore everything else
    if link.startswith("/"):
        return base_url + link
    elif link.startswith(base_url):
        return link
    else:
        return None

def crawl(url):
    if len(visited) >= max_pages:
        return
    try:
        response = requests.get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
    except requests.RequestException as e:
        print(f"Failed to retrieve {url}: {e}")
        return
    # Follow every internal link found on the page
    for a_tag in soup.find_all("a", href=True):
        if len(visited) >= max_pages:
            return
        href = a_tag["href"]
        if not is_internal(href):
            continue
        full_url = normalize_link(href)
        if full_url and full_url not in visited:
            visited.add(full_url)
            crawl(full_url)

# Start crawling
visited.add(base_url)
crawl(base_url)

# Print results
print(f"\nCrawled {len(visited)} pages:")
for url in visited:
    print(url)
This simple script demonstrates how to find all pages with a custom scraper. It takes the base URL https://iproyal.com and visits every internal page linked from there (/residential-proxies, /datacenter-proxies, etc.), recording each discovered internal URL once. We have limited the crawl to 10 pages to keep the example manageable, but this is easily adjustable.
In a real-life setting, you will likely need some adjustments to make it work, as sketched after this list:
- A crawling queue (to avoid Python’s recursion depth limit)
- Custom user agent headers
- Rate limiting
- Proxy servers
- Error handling for non-200 HTTP status codes
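Here is a minimal sketch of how those adjustments might come together, using an explicit queue instead of recursion. The user agent string, delay, and proxy settings are placeholders you would tune for your own setup:
import time
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

base_url = "https://iproyal.com"
headers = {"User-Agent": "Mozilla/5.0 (compatible; MyCrawler/1.0)"}  # placeholder user agent
proxies = {}        # e.g. {"https": "http://user:pass@proxy:port"} if you use proxies
delay_seconds = 1   # simple rate limiting between requests
max_pages = 50

visited = set()
queue = deque([base_url])

while queue and len(visited) < max_pages:
    url = queue.popleft()
    if url in visited:
        continue
    try:
        response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
    except requests.RequestException as e:
        print(f"Failed to retrieve {url}: {e}")
        continue
    if response.status_code != 200:
        print(f"Skipping {url} (status {response.status_code})")
        continue
    visited.add(url)

    soup = BeautifulSoup(response.text, "html.parser")
    for a_tag in soup.find_all("a", href=True):
        full_url = urljoin(url, a_tag["href"])
        # Keep only internal links and drop URL fragments
        if urlparse(full_url).netloc == urlparse(base_url).netloc:
            queue.append(full_url.split("#")[0])

    time.sleep(delay_seconds)  # be polite to the server

print(f"\nCrawled {len(visited)} pages:")
for url in visited:
    print(url)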
Using more advanced Python tools will help solve these issues. Scrapy is a free and open-source crawling framework that handles request scheduling, link following, and data export for you, making it much easier to extract a full page list.
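For illustration, a minimal Scrapy spider that follows internal links and exports the discovered URLs might look like this; the page limit, delay, and output file name are arbitrary example values:
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class PageSpider(CrawlSpider):
    name = "page_spider"
    allowed_domains = ["iproyal.com"]
    start_urls = ["https://iproyal.com"]
    # Follow every internal link and record each visited page
    rules = (Rule(LinkExtractor(), callback="parse_page", follow=True),)

    def parse_page(self, response):
        yield {"url": response.url}

process = CrawlerProcess(settings={
    "FEEDS": {"pages.json": {"format": "json"}},  # export discovered URLs
    "CLOSESPIDER_PAGECOUNT": 100,                 # stop after 100 pages
    "DOWNLOAD_DELAY": 1,                          # basic rate limiting
})
process.crawl(PageSpider)
process.start()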
You might also need browser automation tools, such as Selenium and Playwright. They render JavaScript-heavy pages in a real browser and let you change your user agent and other browser specifications to collect data more accurately and with fewer disruptions.
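For pages that only reveal their links after JavaScript runs, a short Playwright sketch can load the page in a headless browser and collect the rendered links; the user agent string below is just an example value:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    # A custom user agent is one of the browser specifications you can adjust
    context = browser.new_context(user_agent="Mozilla/5.0 (compatible; MyCrawler/1.0)")
    page = context.new_page()
    page.goto("https://iproyal.com", wait_until="networkidle")

    # Collect href attributes of all anchor tags after JavaScript has run
    links = page.eval_on_selector_all("a[href]", "els => els.map(e => e.href)")
    for link in sorted(set(links)):
        print(link)

    browser.close()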
While scraping publicly available data is generally legal, certain laws, such as the EU’s GDPR or California's CCPA, regulate the collection of private or personally identifiable data. Be sure to check applicable laws to ensure you avoid legal issues.
Other best scraping practices, such as following robots.txt, using quality proxies, implementing rate limits, inspecting website elements, and parsing data, are also important for efficient scraping.
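Following robots.txt, for example, can be automated with Python’s standard library before your crawler requests a page. A small sketch; the crawler name is a hypothetical example:
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://iproyal.com/robots.txt")
robots.read()

url = "https://iproyal.com/blog"
user_agent = "MyCrawler"  # hypothetical crawler name

# Only fetch the page if robots.txt allows this user agent to do so
if robots.can_fetch(user_agent, url):
    print(f"Allowed to crawl {url}")
else:
    print(f"robots.txt disallows {url}")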
Each website is built differently, so there’s no universal script for finding all pages of all websites. If your budget allows, consider hiring a programmer as a last resort.
Conclusion
Different options are available for finding all pages of a website, depending on your goals and access level. If you have administrator access, look for the sitemap in your CMS or in Google Search Console. SEO tools and custom scrapers can also assist you.

Author
Kazys Toleikis
Head of Client Support
Kazys brings a strategic and disciplined approach to client support thanks to his leadership background, as well as vast experience in tactical planning and crisis management. He focuses on team leadership, customer satisfaction, and process improvement, ensuring efficient collaboration across departments. Known for his sharp decision-making and ability to stay calm under pressure, he is dedicated to delivering top-tier support no matter the challenge. After hours, Kazys enjoys staying active and exploring new opportunities for growth, both personal and professional.