How to Retry Failed Python Requests
Eugenijus Denisov
Python’s requests library is widely used for many purposes, chief among them web scraping. It provides a much simpler interface than the built-in urllib modules, making it easier to connect to servers and websites.
Web scraping projects often use requests due to its simplicity and effectiveness. It’s also easier to troubleshoot, which is immensely useful since failed requests come up frequently during scraping.
Getting Started With the Requests Library
Our guide assumes you have some basic knowledge of Python and an IDE. Once you have these, you’ll need to install the requests library:
pip install requests
Running this command downloads, unpacks, and installs the requests library, which you can then use to send requests.
You’ll first need to import it, as with every other Python library.
import requests
Sending a request is as simple as calling the “get” method (or whichever other HTTP method you need).
import requests

def send_get_request(URL):
    r = requests.get(URL)
    print(r.status_code)

send_get_request('https://iproyal.com')
Printing the status code of the response will be important later, as status codes are your indicators of which failed requests are worth retrying. You can test the script by clicking the green arrow at the top right (in PyCharm).
Types of Failed Requests Responses
All successful and unsuccessful attempts to connect to a server will return some form of HTTP status code. We’ll avoid the successful ones as you don’t need to retry them.
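As a quick illustration, the response object’s “ok” attribute is an easy way to tell whether a retry is even worth considering. This is a minimal sketch; the URL is just an example:

```python
import requests

try:
    r = requests.get('https://iproyal.com', timeout=10)
    # r.ok is True for any status code below 400, so only
    # failed requests (4xx and 5xx) are candidates for a retry
    needs_retry = not r.ok
except requests.exceptions.RequestException:
    needs_retry = True  # no response at all, so retry as well
print(needs_retry)
```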
403 Forbidden
Your destination server understood the request but refused to fulfill it, as you are not allowed to access that document (or the entire server). These errors are usually hard to solve, as 403 is most often returned when you need credentials or have been banned.
If you have credentials, they may be included in your GET request.
import requests

def send_get_request(URL, credentials):
    r = requests.get(URL, auth=credentials)
    print(r.status_code)

login_details = ('username', 'password')
send_get_request('https://iproyal.com', login_details)
Replacing the values in the “login_details” tuple with your username and password should allow you to access a protected document.
Note that this will only work on a select few websites. Most now use a more complicated login flow.
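As a rough sketch of that more complicated flow, form-based logins usually mean POSTing the credentials with a session object so the login cookies carry over to later requests. The URL paths and form field names below are hypothetical and vary per website:

```python
import requests

def login_and_fetch(login_url, protected_url, username, password):
    # a session keeps the login cookies for subsequent requests
    sess = requests.Session()
    # the field names ('username', 'password') are hypothetical --
    # inspect the site's actual login form for the real ones
    payload = {'username': username, 'password': password}
    login = sess.post(login_url, data=payload)
    login.raise_for_status()
    # the session now carries the authentication cookies
    return sess.get(protected_url)
```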
429 Too Many Requests
One of the most frequent HTTP error responses when web scraping, 429 states that you’ve been sending too many requests to the same endpoint.
Switching proxies or implementing a strategy to retry failed requests is your best option.
500 Internal Server Error
Something failed on the server’s end. A simple retry will often work, either instantly or within a few minutes.
502 Bad Gateway
Nearly identical to the 500 Internal Server Error: something went wrong with the upstream server, causing failed requests. Retrying in a short while will likely fix the issue.
503 Service Unavailable
Indicates that the server is likely completely down or otherwise unavailable. While you can retry failed requests, it’ll only resolve by itself once the administrator fixes the issue.
504 Gateway Timeout
Indicates networking issues, which may be caused by either end. Retrying with increasing delays could fix the issue.
Implementing a Failed Requests Retry Strategy
Python requests give you all the tools you need to effectively do away with most failed requests. Out of the list of HTTP status codes above, only 403 and 429 have unique approaches, although 429 can also be solved like the rest.
There are two ways to create a Python requests retry strategy itself, one being a simple loop at set intervals and the other using increasing delays. The former has the benefit of resolving faster, but it’s also more easily detectable.
Retry Failed Requests With a Loop
import requests
import time

def send_get_request(URL, retry):
    r = None
    for i in range(retry):
        try:
            r = requests.get(URL)
            if r.status_code not in [200, 404]:
                time.sleep(5)
            else:
                break
        except requests.exceptions.ConnectionError:
            pass
    if r is not None:
        print(r.status_code)

send_get_request('https://dashboard.iproyal.com/login', 5)
Since we’ll be using sleep to create a delay, we have to import the time library, which is done right after the Python requests library, although the ordering doesn’t matter.
In our function, we now include a “retry” argument, which will be used to state how many times we’ll retry the failed requests.
Additionally, a for loop is included, which takes the retry number and uses it as a range. An “if” statement is included to verify if 200 or 404 is received. If neither, then the function sleeps for 5 seconds and repeats the process.
If 200 or 404 is received, the loop stops. Additionally, if a connection error occurs, it’ll simply do nothing, bypassing regular Python requests error handling.
Finally, you can always set a custom timeout by adding a timeout=N argument to requests.get() if timing out is causing an issue.
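For example, here’s a minimal sketch with a 5-second timeout that also catches the exception a timeout raises (the URL is just an example):

```python
import requests

try:
    # timeout=5 aborts the attempt if no response arrives within 5 seconds
    r = requests.get('https://iproyal.com', timeout=5)
    result = r.status_code
except requests.exceptions.Timeout:
    result = 'timed out'
except requests.exceptions.RequestException:
    result = 'connection failed'
print(result)
```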
Retry Failed Requests With HTTPAdapter
We’ll have to import a little more than Python requests for the second strategy.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
HTTPAdapter will let us mount our failed requests retry strategy to a session. Our retry strategy will be defined by the “urllib3” retry utility.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def send_get_request(URL):
    sess = requests.Session()
    retries = Retry(total=5,
                    backoff_factor=1,
                    status_forcelist=[429, 500, 502, 503, 504])
    sess.mount('https://', HTTPAdapter(max_retries=retries))
    get_URL = sess.get(URL)
    print(get_URL.status_code)

send_get_request('https://iproyal.com')
Our function now opens with a session instead of directly sending a request, which is necessary for the current failed requests strategy.
We then define a retry object with a few arguments: the total number of retries (5), a backoff factor of 1, and the status codes that should be retried. The backoff factor produces increasing delays, defined as:
{backoff factor} * (2 ** ({retry number} - 1))
Our first retry will be instant, but others will happen at increasingly long intervals.
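For instance, with a backoff factor of 1, the delays work out as follows. This is a sketch mirroring urllib3’s behavior, where the first retry happens immediately:

```python
# Sketch of how urllib3 (v1.x) computes the delay before each retry
def backoff_delay(backoff_factor, retry_number):
    if retry_number <= 1:
        return 0  # the first retry is instant
    return backoff_factor * (2 ** (retry_number - 1))

print([backoff_delay(1, n) for n in range(1, 6)])  # [0, 2, 4, 8, 16]
```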
Then, we mount the HTTPAdapter to our session, and it performs all the necessary retries. After that, everything is essentially identical to the other strategy.
Finally, any Python request sent this way will wait for a response before proceeding. If you want to send several requests in parallel, asynchronous or multithreaded code will be required.
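One common approach is a thread pool, which sends several requests concurrently without restructuring the code around async/await. A minimal sketch, where the URL list is just an example:

```python
import concurrent.futures
import requests

def fetch(url):
    # return the status code, or None if the request fails entirely
    try:
        return requests.get(url, timeout=10).status_code
    except requests.exceptions.RequestException:
        return None

urls = ['https://iproyal.com', 'https://iproyal.com/proxies']
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
    # map() runs fetch() across the URLs in parallel, preserving order
    results = list(pool.map(fetch, urls))
print(results)
```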
Escape 429 With Proxies
When integrating proxies into your web scraping project, there’s a unique way to avoid 429 (Too Many Requests) instead of using Python requests retry strategies.
Since 429 is tied to an IP address, you can switch proxies to completely avoid the HTTP error code whenever it’s received. As long as you have a pay-as-you-go residential proxy, you can keep switching IP addresses to avoid 429.
You can also have a failed requests retry strategy going as a fallback against other error codes.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def send_get_request(URL):
    sess = requests.Session()
    proxies = {'http': 'http://USER:PASS@HOST:PORT',
               'https': 'http://USER:PASS@HOST:PORT'}
    retries = Retry(total=5,
                    backoff_factor=1,
                    status_forcelist=[500, 502, 503, 504])
    sess.mount('https://', HTTPAdapter(max_retries=retries))
    get_url = sess.get(URL, proxies=proxies)
    if get_url.status_code == 429:
        get_url = sess.get(URL, proxies=proxies)
    print(get_url.status_code)

send_get_request('https://iproyal.com')
Since we’re using rotating residential proxies, all we need to do is send a new request to the same endpoint if we receive a 429 error. Rotating proxies will automatically assign a new IP address.
With sticky sessions, you should generate a larger list of proxy dictionaries, then use an if statement to switch to a new IP address once a 429 is received.
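A minimal sketch of that idea, assuming hypothetical sticky-session proxy endpoints (replace USER:PASS@HOST1:PORT and HOST2 with your own credentials):

```python
import requests
from itertools import cycle

# hypothetical proxy endpoints -- cycle() loops over them endlessly
proxy_pool = cycle([
    {'https': 'http://USER:PASS@HOST1:PORT'},
    {'https': 'http://USER:PASS@HOST2:PORT'},
])

def get_with_rotation(URL, attempts=3):
    r = None
    for _ in range(attempts):
        # each attempt takes the next proxy from the pool
        r = requests.get(URL, proxies=next(proxy_pool), timeout=10)
        if r.status_code != 429:
            break  # any other status code is handled elsewhere
    return r
```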
Conclusion
These basic strategies should let you automatically resolve most HTTP error codes. There are two strategies for avoiding most of the popular HTTP errors. You can set a basic loop to retry your failed requests:
import requests
import time

def send_get_request(URL, retry): # defines a function to send GET requests with two arguments
    r = None
    for i in range(retry): # sets a range for the number of retries
        try:
            r = requests.get(URL)
            if r.status_code not in [200, 404]:
                time.sleep(5) # if 200 or 404 is not received, waits 5 seconds before trying again
            else:
                break # stops retrying once 200 or 404 is received
        except requests.exceptions.ConnectionError:
            pass
    if r is not None:
        print(r.status_code)

send_get_request('https://iproyal.com', 5)
Or you can use the HTTPAdapter Python requests retry strategy, which can be a little slower but less detectable:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def send_get_request(URL): # defines a GET request function with one argument
    sess = requests.Session() # creates a session object
    retries = Retry(total=5,
                    backoff_factor=1,
                    status_forcelist=[429, 500, 502, 503, 504]) # retries 5 times, backoff factor of 1, on the listed HTTP error codes
    sess.mount('https://', HTTPAdapter(max_retries=retries)) # mounts HTTPAdapter to the session
    get_URL = sess.get(URL)
    print(get_URL.status_code)

send_get_request('https://iproyal.com')
Finally, for 429, you can always switch your IP address each time you receive the error code. An if statement and a new status_forcelist is all that’s needed.
Author
Eugenijus Denisov
Senior Software Engineer
With over a decade of experience under his belt, Eugenijus has worked on a wide range of projects - from LMS (learning management system) to large-scale custom solutions for businesses and the medical sector. Proficient in PHP, Vue.js, Docker, MySQL, and TypeScript, Eugenijus is dedicated to writing high-quality code while fostering a collaborative team environment and optimizing work processes. Outside of work, you’ll find him running marathons and cycling challenging routes to recharge mentally and build self-confidence.