IPRoyal

How to Retry Failed Python Requests

Eugenijus Denisov

Python’s requests library is widely used for various purposes, chief among them web scraping. It offers a much simpler interface than Python’s built-in urllib modules (it is built on top of the third-party urllib3 library), making it easier to connect with servers and websites.

Web scraping projects often use requests because of its simplicity and effectiveness. It’s also easy to troubleshoot, which is immensely useful since failed requests come up frequently during scraping.

Getting Started With the Requests Library

Our guide assumes you have basic knowledge of Python and an IDE set up. Once you do, install the requests library:

pip install requests

Run the command in a terminal (most IDEs, such as PyCharm, include one). pip will download and install the requests library, which you can then use to send requests.

You’ll first need to import it, as with every other Python library.

import requests

Sending a request is as simple as calling the get() method (or any other HTTP method you need, such as post()).

import requests

def send_get_request(URL):
    r = requests.get(URL)
    print(r.status_code)

send_get_request('https://iproyal.com')

Printing the response’s status code matters because status codes are your main signal for whether a request failed and whether it’s worth retrying. To run the script in PyCharm, click the green Run arrow in the top-right corner.


Types of Failed Request Responses

Every response a server sends back includes an HTTP status code, whether the request succeeded or not. We’ll skip the successful ones, since you don’t need to retry them.

403 Forbidden

Your destination server understood the request but refuses to fulfill it, because you are not allowed to access that resource (or the entire server). These errors are usually hard to solve, as 403 is most often returned when you need credentials or have been banned.

If you have credentials, they may be included in your GET request.

import requests

def send_get_request(URL, credentials):
    r = requests.get(URL, auth=credentials)
    print(r.status_code)

login_details = ('username', 'password')
send_get_request('https://iproyal.com', login_details)

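Replace the values in the login_details tuple with your username and password to access a protected document. Passing a (username, password) tuple to auth makes requests use HTTP Basic Authentication.

Note that this only works on the few websites that accept HTTP Basic Authentication. Most now use more complex login flows, such as web forms backed by session cookies or tokens.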

429 Too Many Requests

One of the most frequent HTTP error responses when web scraping, 429 means you’ve sent too many requests to the same server in a given amount of time.

Switching proxies or implementing a strategy to retry failed requests after a delay is your best option.
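Servers that return 429 (and sometimes 503) often include a Retry-After header telling you how long to wait, either as a number of seconds or as an HTTP date. Below is a minimal standard-library sketch of turning that header into a delay; the function name retry_after_seconds is our own, and you would pass it something like r.headers.get('Retry-After').

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str]) -> Optional[float]:
    """Convert a Retry-After header value into seconds to wait.

    Returns None when the header is missing or can't be parsed.
    """
    if value is None:
        return None
    value = value.strip()
    # Form 1: a delay in whole seconds, e.g. "120"
    if value.isdigit():
        return float(value)
    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None
    if retry_at.tzinfo is None:
        # Treat dates without a usable timezone as UTC
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    wait = (retry_at - datetime.now(timezone.utc)).total_seconds()
    # A date in the past means you can retry right away
    return max(0.0, wait)
```

If the function returns None, fall back to your own fixed or increasing delay.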

500 Internal Server Error

Something went wrong on the server’s end. A simple retry will often work, either right away or within a few minutes.

502 Bad Gateway

Similar to 500 Internal Server Error: a server acting as a gateway or proxy received an invalid response from the upstream server. Retrying after a short while will likely fix the issue.

503 Service Unavailable

Indicates that the server is temporarily unable to handle requests, usually because it’s overloaded or down for maintenance. You can retry, but if the server is down, the problem will only go away once the administrator fixes it.

504 Gateway Timeout

Indicates that a gateway or proxy didn’t get a timely response from the upstream server, often because of networking issues. Retrying with increasing delays could fix the issue.
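The codes above can be boiled down to a small lookup that a script consults before deciding whether to try again. This is a minimal sketch: the RETRYABLE set mirrors the status_forcelist used with HTTPAdapter in this guide, and the advice strings (including treating 403 as not worth retrying) are our own simplification.

```python
from http import HTTPStatus

# Status codes from this guide that a plain retry can fix
RETRYABLE = {429, 500, 502, 503, 504}


def retry_advice(status_code: int) -> str:
    """Return a short recommendation for a known HTTP status code.

    HTTPStatus raises ValueError for codes it doesn't know.
    """
    phrase = HTTPStatus(status_code).phrase  # e.g. "Too Many Requests"
    if 200 <= status_code < 300:
        return f"{status_code} {phrase}: success, no retry needed"
    if status_code == 403:
        return f"{status_code} {phrase}: check credentials or access; retrying rarely helps"
    if status_code in RETRYABLE:
        return f"{status_code} {phrase}: retry, ideally with increasing delays"
    return f"{status_code} {phrase}: not retried by default"
```

For example, print(retry_advice(r.status_code)) after a request gives a quick hint about what to do next.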

Implementing a Failed Requests Retry Strategy

The requests library gives you all the tools you need to handle most failed requests. Of the HTTP status codes above, only 403 and 429 have unique approaches, although 429 can also be handled with retries like the rest.

There are two common ways to build a retry strategy: a simple loop that retries at fixed intervals, and an approach that uses increasing delays between attempts. The former tends to resolve faster, but its regular timing is also easier for websites to detect.
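Both strategies share the same skeleton and differ only in the list of waits between attempts. A minimal sketch under that framing; retry_with_delays, FIXED, and INCREASING are hypothetical names, and the function takes any callable so it isn’t tied to a particular HTTP call.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def retry_with_delays(
    attempt: Callable[[], T],
    succeeded: Callable[[T], bool],
    delays: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call attempt() once, then once more after each delay until succeeded() is True.

    Returns the last result, whether or not it succeeded.
    """
    result = attempt()
    for delay in delays:
        if succeeded(result):
            break
        sleep(delay)
        result = attempt()
    return result


# Fixed intervals: wait 5 seconds before each of 4 retries
FIXED = [5, 5, 5, 5]
# Increasing delays: double the wait before each of 4 retries
INCREASING = [1, 2, 4, 8]
```

With requests, a call might look like retry_with_delays(lambda: requests.get(url, timeout=10), lambda r: r.status_code == 200, INCREASING). Note that exceptions raised by attempt() are not caught here.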

Retry Failed Requests With a Loop

import requests
import time

def send_get_request(URL, retry):
    r = None  # last response received; stays None if no attempt gets one
    for attempt in range(retry):
        try:
            r = requests.get(URL, timeout=10)
            if r.status_code in (200, 404):
                break  # final answer: success, or a page that doesn't exist
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
            pass  # network-level failure: move on to the next attempt
        if attempt < retry - 1:
            time.sleep(5)  # wait 5 seconds before trying again
    if r is not None:
        print(r.status_code)
    else:
        print('No response received')

send_get_request('https://dashboard.iproyal.com/login', 5)

Since we use time.sleep() to create the delay, we also import the time module. The import order doesn’t matter.

Our function now takes a retry argument, which sets the maximum number of attempts.

A for loop runs up to that many times. Inside it, an if statement checks whether a 200 or 404 was received; 404 counts as final because a missing page won’t appear on a retry. If neither code comes back, the function waits 5 seconds and tries again (it skips the wait after the last attempt).

If 200 or 404 is received, the loop stops. If a connection error or timeout occurs, the exception is caught instead of crashing the script, and the loop moves on to the next attempt. Because r starts as None, the function can still report the case where no attempt got a response at all.

Each attempt also passes timeout=10, so a single stalled request can’t hang the script. Adjust the value (or pass a (connect, read) tuple such as timeout=(3, 10)) if timeouts are causing issues.

Retry Failed Requests With HTTPAdapter

The second strategy needs a couple of extra imports:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

HTTPAdapter lets us attach a retry strategy to a session, and the strategy itself is defined with urllib3’s Retry class.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def send_get_request(URL):
    sess = requests.Session()

    retries = Retry(total=5,
                    backoff_factor=1,
                    status_forcelist=[429, 500, 502, 503, 504],
                    raise_on_status=False)  # return the last response instead of raising RetryError

    sess.mount('https://', HTTPAdapter(max_retries=retries))
    get_URL = sess.get(URL, timeout=10)
    print(get_URL.status_code)

send_get_request('https://iproyal.com')

Our function now creates a session instead of sending the request directly, because the retry strategy is attached to the session.

We then define a Retry object with a few arguments: up to 5 retries in total, a backoff factor of 1, and the status codes that should trigger a retry. The backoff factor controls how long urllib3 sleeps between retries, following this formula:

{backoff factor} * (2 ** ({retry number} - 1))

urllib3 skips the sleep before the first retry, so with a backoff factor of 1, the remaining retries wait 2, 4, 8, and 16 seconds.
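To see the schedule a given backoff factor produces, here is a small sketch that reproduces the calculation in plain Python. It mirrors our understanding of urllib3’s behavior in recent releases (no sleep before the first retry, then the formula above, capped at 120 seconds by default); urllib3_style_backoff is a hypothetical helper, not part of urllib3. Also note that for 413, 429, and 503 responses, urllib3 honors a Retry-After header by default instead of this schedule.

```python
from typing import List


def urllib3_style_backoff(backoff_factor: float, retries: int, backoff_max: float = 120.0) -> List[float]:
    """Return the sleep (in seconds) before each retry, mimicking urllib3's formula."""
    delays = []
    for retry_number in range(1, retries + 1):
        if retry_number == 1:
            # urllib3 does not sleep before the first retry
            delays.append(0.0)
        else:
            # {backoff factor} * (2 ** ({retry number} - 1)), capped at backoff_max
            delays.append(float(min(backoff_max, backoff_factor * 2 ** (retry_number - 1))))
    return delays


print(urllib3_style_backoff(1, 5))  # [0.0, 2.0, 4.0, 8.0, 16.0]
```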

Then, we mount an HTTPAdapter carrying this retry strategy onto the session for all https:// URLs, so the adapter performs the retries automatically. After that, sending the request works the same way as before.

Keep in mind that requests is synchronous: each call blocks until a response arrives (or the retries run out). If you want to send several requests in parallel, you’ll need concurrency, such as threads or asynchronous programming.
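One way to get parallelism without leaving requests is to run each blocking call in a worker thread from asyncio using asyncio.to_thread (Python 3.9+). This is a minimal sketch rather than a true async HTTP client; fetch_all is a hypothetical helper that takes the function to call (for example requests.get) as an argument.

```python
import asyncio
from typing import Any, Callable, Iterable, List


async def fetch_all(urls: Iterable[str], get: Callable[..., Any], timeout: float = 10) -> List[Any]:
    """Run get(url, timeout=...) for every URL concurrently in worker threads.

    Results come back in the same order as the input URLs.
    """
    tasks = [asyncio.to_thread(get, url, timeout=timeout) for url in urls]
    return await asyncio.gather(*tasks)


# Usage with requests (not run here):
#   import requests
#   responses = asyncio.run(fetch_all(["https://iproyal.com"], requests.get))
#   print([r.status_code for r in responses])
```

If any call raises, gather() propagates the first exception; pass return_exceptions=True to gather() if you’d rather collect errors alongside the responses.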

Escape 429 With Proxies

If you use proxies in your web scraping project, there’s another way to deal with 429 (Too Many Requests) besides retrying.

Since the rate limits behind 429 are usually tracked per IP address, you can switch proxies to avoid the error whenever it’s received. As long as you have a pay-as-you-go residential proxy plan, you can keep switching IP addresses to get past 429.

You can also keep a retry strategy in place as a fallback for other error codes.

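import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def send_get_request(URL):
    sess = requests.Session()

    # Replace USER, PASS, HOST, and PORT with your proxy details.
    # Both keys are needed: requests picks the proxy by the target URL's scheme.
    proxies = {
        "http": "http://USER:PASS@HOST:PORT",
        "https": "http://USER:PASS@HOST:PORT",
    }

    # 429 is left out here; it's handled below by requesting again through a new IP
    retries = Retry(total=5,
                    backoff_factor=1,
                    status_forcelist=[500, 502, 503, 504],
                    raise_on_status=False)

    sess.mount('https://', HTTPAdapter(max_retries=retries))
    get_url = sess.get(URL, proxies=proxies, timeout=10)
    if get_url.status_code == 429:
        # A rotating proxy assigns a new IP address to the new request
        get_url = sess.get(URL, proxies=proxies, timeout=10)

    print(get_url.status_code)

send_get_request('https://iproyal.com')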

Since we’re using rotating residential proxies, all we need to do is send a new request to the same endpoint if we receive a 429 error. Rotating proxies automatically assign a new IP address.

With sticky sessions, where the IP address stays the same for a set period, you should prepare a larger list of proxies and switch to the next one once a 429 is received.
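A minimal sketch of that switching logic, assuming you already have a list of sticky proxy URLs: each URL is turned into the proxies dictionary requests expects, and the next one is used whenever a 429 comes back. The helper names and the example proxy addresses are hypothetical; pass requests.get (or a session’s get) as the get argument.

```python
from itertools import cycle
from typing import Any, Callable, Dict, Iterator, List


def proxy_configs(proxy_urls: List[str]) -> Iterator[Dict[str, str]]:
    """Endlessly yield requests-style proxies dicts, one proxy URL at a time."""
    if not proxy_urls:
        raise ValueError("proxy_urls must not be empty")
    for proxy_url in cycle(proxy_urls):
        # requests picks the proxy by the target URL's scheme, so set both keys
        yield {"http": proxy_url, "https": proxy_url}


def get_with_proxy_switching(
    url: str,
    proxy_urls: List[str],
    get: Callable[..., Any],
    max_attempts: int = 5,
) -> Any:
    """Request url, moving to the next proxy whenever a 429 comes back."""
    configs = proxy_configs(proxy_urls)
    response = None
    for _ in range(max_attempts):
        response = get(url, proxies=next(configs), timeout=10)
        if response.status_code != 429:
            break
    return response
```

For example: get_with_proxy_switching('https://iproyal.com', ['http://USER:PASS@HOST1:PORT', 'http://USER:PASS@HOST2:PORT'], requests.get).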

Conclusion

These basic strategies should let you automatically resolve most HTTP error codes. The first option is a basic loop that retries failed requests at fixed intervals:

import requests
import time

def send_get_request(URL, retry): #defines a function to send get requests with two arguments
    r = None #holds the last response; stays None if no attempt gets one
    for attempt in range(retry): #sets a range for the number of attempts
        try:
            r = requests.get(URL, timeout=10) #gives up on a single attempt after 10 seconds
            if r.status_code in (200, 404):
                break #stops the loop if 200 or 404 is received
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
            pass #ignores network errors so the loop can try again
        if attempt < retry - 1:
            time.sleep(5) #waits 5 seconds before trying again
    if r is not None:
        print(r.status_code)
    else:
        print('No response received')

send_get_request('https://iproyal.com', 5)

Or you can use the HTTPAdapter retry strategy, which can resolve a little more slowly but is less detectable:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def send_get_request(URL): #defines a get request function with one argument
    sess = requests.Session() #sets a session object

    retries = Retry(total=5,
                    backoff_factor=1,
                    status_forcelist=[429, 500, 502, 503, 504], #sets the retry amount to 5, backoff_factor to 1, and the HTTP error codes to retry on
                    raise_on_status=False) #returns the last response instead of raising once retries run out

    sess.mount('https://', HTTPAdapter(max_retries=retries)) #mounts HTTPAdapter to the session for HTTPS URLs
    get_URL = sess.get(URL, timeout=10)
    print(get_URL.status_code)

send_get_request('https://iproyal.com')

Finally, for 429, you can switch your IP address each time you receive the error code. All that’s needed is an if statement that checks for 429 and a status_forcelist that leaves 429 out.


Author

Eugenijus Denisov

Senior Software Engineer

With over a decade of experience under his belt, Eugenijus has worked on a wide range of projects - from LMS (learning management system) to large-scale custom solutions for businesses and the medical sector. Proficient in PHP, Vue.js, Docker, MySQL, and TypeScript, Eugenijus is dedicated to writing high-quality code while fostering a collaborative team environment and optimizing work processes. Outside of work, you’ll find him running marathons and cycling challenging routes to recharge mentally and build self-confidence.
