How to Retry Failed Python HTTP Requests

Tutorials

Eugenijus Denisov

Last updated · 11 min read

Key Takeaways

  • The main types of failed requests encountered when web scraping or automating tasks are 403, 429, 500, 502, 503, and 504.

  • Generally, you can resolve most of these HTTP errors with a basic retry loop or using HTTPAdapter for your retry strategy.

  • The 429 error can also be resolved by respecting the server's wait times or using proxies to change your IP address.

Python’s Requests is widely used for various purposes, one of the most popular being web scraping. Requests offers a much simpler interface than Python's built-in urllib modules, making it easy to connect to servers and websites when collecting data.

Web scraping projects often use Requests for its simplicity and effectiveness. It’s also easy to troubleshoot, which is immensely useful since failed requests occur frequently when scraping. Retrying those failed requests, however, is one of the few Requests tasks that can trip up beginners.

Getting Started With the Requests Library

Our guide assumes you have some basic knowledge of Python and an IDE installed. Once you have the basic tools, you’ll need to install the Requests library.

pip install requests

Pip will download, unpack, and install Requests, which you can then use to send requests. As with every other Python library, you'll need to import it first.

import requests

Sending a request is easy, as it's a simple call to the GET method (or any other that you need).

import requests

def send_get_request(URL):
    r = requests.get(URL)
    print(r.status_code)

send_get_request('https://iproyal.com')

Printing the response codes will be important later, as they indicate how each failed request should be handled. You can test the script by clicking the green arrow at the top right (in PyCharm).

Types of Failed Request Responses

All successful and unsuccessful attempts to connect to a server will return some form of HTTP status code. We'll avoid the successful ones, as you don't need to retry them.

403 Forbidden

Your destination server understood the request but refuses to fulfill it because you aren't allowed to access that document (or the entire server). These errors are usually hard to solve, as a 403 means you either need credentials or have been banned.

If you have credentials, they may be included in your GET request.

import requests

def send_get_request(URL, credentials):
    r = requests.get(URL, auth=credentials)
    print(r.status_code)

login_details = ('username', 'password')
send_get_request('https://iproyal.com', login_details)

Replacing the login_details tuple's values with your username and password should allow you to access a protected document. Note that this will only work on a select few websites; most now use more complex login flows.

429 Too Many Requests

One of the most frequent HTTP error responses when scraping is 429. It states that you've been sending too many requests to the same endpoint.

You can parse the Retry-After HTTP header to see whether the server has provided a wait time. Respecting it is likely to solve the problem. Other options include switching proxies or implementing a strategy to retry failed requests through backoff times.

500 Internal Server Error

Something failed on the server's end, usually a bug or misconfiguration, rather than anything in your request. A simple retry will likely work either instantly or within a few minutes.

502 Bad Gateway

502 is nearly identical to the 500 Internal Server Error. It means that something went wrong with the upstream server, causing failed requests. Retrying in a short while will likely fix the issue.

503 Service Unavailable

Indicates that the server is temporarily unavailable, often due to overload or maintenance. You can retry requests with such a response code, but the error will only clear once the server recovers or the administrator fixes the issue.

504 Gateway Timeout

Indicates networking issues, which may be caused by either end of the response-request communication. Retrying the connection with increasing delays could fix the issue.


Implementing a Failed Requests Retry Strategy

Requests provides you with all the tools you need to address most failed requests effectively. Of the status codes above, only 403 and 429 have unique resolutions; in some cases, 429 can also be resolved like the rest.

Fixed Delay vs Backoff Retries

There are two ways to create an efficient strategy for Python Requests retries.

  • Fixed delay retries use a simple loop at set intervals, so the script waits the exact same time between every retry attempt. It's easier to develop, but also easier for websites to detect.
  • Backoff retry logic requires the wait time to increase after each failure. Often, the wait time is doubled or otherwise multiplied after each failure. Such a strategy reduces the server's load and the risk of blocking your script.

Exponential backoff is the recommended practice when connecting to public APIs and high-traffic websites. When sending concurrent requests, you might also want to add a small random offset (jitter) to prevent retries from stacking.
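As a sketch of the backoff-with-jitter idea (the helper name and parameters below are our own, not part of Requests):

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0, jitter=0.5):
    # Exponential growth: base, 2*base, 4*base, ... capped at `cap`,
    # plus a small random offset so concurrent clients don't retry in sync
    delay = min(cap, base * (2 ** (attempt - 1)))
    return delay + random.uniform(0, jitter)

for attempt in range(1, 5):
    print(f"attempt {attempt}: wait ~{backoff_delay(attempt):.2f}s")
```

The cap keeps long retry chains from waiting for minutes, while the jitter spreads simultaneous retries apart.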

Retry Failed Requests With Fixed Delay

import requests
import time

def send_get_request(url, retry):
    r = None
    for i in range(retry):
        try:
            r = requests.get(url)
            # 200 is a success, and 404 won't improve with retries
            if r.status_code not in [200, 404]:
                time.sleep(5)  # fixed delay before the next attempt
            else:
                break
        except requests.exceptions.ConnectionError:
            pass  # swallow the error and let the loop retry

    if r is not None:
        print(r.status_code)
    else:
        print("All retries failed with ConnectionError")

send_get_request('https://httpbin.org/status/503', 5)

Since we'll be using sleep to create the delay, we must import the time library right after requests (the order doesn't matter).

In our function, we now include a retry argument, which specifies the number of times we'll retry failed requests.

Additionally, a for loop is included, which uses the retry number as a range. An if statement verifies whether a 200 or 404 response is received. If neither, then the function sleeps for 5 seconds and repeats the process.

If 200 or 404 is received, the function stops. Additionally, if a connection error occurs, the except clause simply does nothing, suppressing the exception instead of letting regular Python requests error handling take over.

Finally, you can always set a custom timeout for requests.get by adding a timeout=N argument if timing is causing issues.

Timeout vs Retry - Key Difference

A timeout defines how long the script waits for a response before treating an attempt as failed. Requests that don't receive an answer within the timeout window are considered failed. Without a timeout, your script could freeze indefinitely.

Retry rules operate across attempts, catching exceptions and trying again based on the rules you define, such as the maximum attempt count and which error types qualify. The timeout window supplements those rules by bounding each individual attempt.
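To illustrate how the two interact, here's a minimal sketch (the function name is our own): the timeout marks a single attempt as failed, and the loop decides whether that failed attempt gets retried.

```python
import time
import requests

def get_with_timeout_retry(url, attempts=3, timeout=5, delay=2):
    # timeout= bounds how long a single attempt may wait for a response;
    # the surrounding loop applies the retry rules across attempts
    last_exc = None
    for _ in range(attempts):
        try:
            return requests.get(url, timeout=timeout)
        except (requests.exceptions.Timeout,
                requests.exceptions.ConnectionError) as exc:
            last_exc = exc
            time.sleep(delay)
    raise last_exc  # all attempts used up; surface the last failure
```

If every attempt fails, the last exception is re-raised so the caller still sees what went wrong.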

Retry Failed Requests With HTTPAdapter

We'll have to import more than Python requests for the second strategy. Here is the code snippet:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

HTTPAdapter will allow us to mount our failed requests retry strategy to a session. The strategy itself is defined by urllib3's Retry utility. urllib3 is installed automatically as a dependency of Requests, but you can install or upgrade it explicitly:

pip install urllib3

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def send_get_request(url):
    session = requests.Session()

    retries = Retry(
        total=5,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504]
    )

    session.mount("https://", HTTPAdapter(max_retries=retries))
    response = session.get(url)
    print(response.status_code)

send_get_request("https://httpbin.org/status/503")

Our function now opens with a session instead of directly sending a request. We then define a retry object with a few arguments.

First, we set the total number of Python retries to 5, a backoff factor of 1, and set which status codes should be retried. A backoff factor is a more complicated sleep function, which is defined as:

{backoff factor} * (2 ** ({retry number} - 1))

Our first retry will be instant, but others will happen at increasingly longer intervals.
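To see the schedule this formula produces, you can compute it directly (the helper below is illustrative; urllib3 may skip the sleep before the very first retry, in which case the later waits still follow this doubling pattern):

```python
def backoff_schedule(backoff_factor, retries):
    # sleep before retry n (1-based): backoff_factor * 2 ** (n - 1)
    return [backoff_factor * (2 ** (n - 1)) for n in range(1, retries + 1)]

print(backoff_schedule(1, 5))  # [1, 2, 4, 8, 16]
```

With our backoff_factor of 1 and 5 total retries, the waits double from 1 second up to 16 seconds.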

Then, we mount an HTTPAdapter carrying our retry configuration onto the session, and it will perform all the necessary retries. After that, everything is essentially identical to the previous retry strategy.

Finally, any Python request will wait for a response before proceeding. If you want to send multiple requests in parallel, you'll need to implement asynchronous programming principles.

What Does max_retries Mean?

max_retries is the HTTPAdapter parameter that accepts either an integer or a full Retry object to control retry behavior. With an integer, it only retries connection-level failures; if omitted, it defaults to zero. In the code above, we passed it the full Retry object.

As such, the Retry object controls the full retry behavior: attempt count, backoff, and which HTTP statuses trigger a retry. However, this logic does not apply to bare requests.get() calls; it needs a session with an HTTPAdapter mounted to enforce it.

Connection-level failures, such as timeouts, are retried automatically because no response is received. HTTP errors have to be explicitly configured for retries via status_forcelist in the Retry object because the server did respond, just with an error status.
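For illustration, urllib3's Retry also lets you set separate budgets for each failure class (the values below are arbitrary):

```python
from urllib3.util.retry import Retry

# Separate retry budgets per failure class; `total` caps them all
retries = Retry(
    total=5,
    connect=3,   # TCP/TLS-level failures (no response received)
    read=3,      # connection dropped while reading the response
    status=3,    # responses whose code is in status_forcelist
    status_forcelist=[429, 500, 502, 503, 504],
    backoff_factor=1,
)
print(retries.total)
```

This is useful when, for instance, you want to give up quickly on unreachable hosts but be more patient with overloaded servers.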

Escape 429 With Proxies

If you use proxies for your web scraping project, you can employ a unique way to avoid the 429 error instead of using one of the Python requests retry strategies.

Since 429 is assigned to an IP address, you can switch proxies to completely avoid this HTTP error. As long as you have a pay-as-you-go residential proxy plan, you can keep switching IP addresses to avoid 429 indefinitely.

You can also have a similar failed requests retry strategy as a fallback for other errors.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def send_get_request(url):
    session = requests.Session()
    proxies = {
        "http": "http://USER:PASS@HOST:PORT",
        "https": "http://USER:PASS@HOST:PORT",  # https URLs need this key
    }

    retries = Retry(
        total=5,
        backoff_factor=1,
        status_forcelist=[500, 502, 503, 504]
    )

    session.mount("https://", HTTPAdapter(max_retries=retries))
    response = session.get(url, proxies=proxies)

    if response.status_code == 429:
        response = session.get(url, proxies=proxies)

    print(response.status_code)

send_get_request("https://httpbin.org/status/503")

Since we're using rotating residential proxies, all we need to do is send a new request to the same endpoint when we receive a 429 error. Rotating proxies will automatically assign a new IP address.

With sticky sessions, you can place your current IP address in a dictionary object, then use an if statement to change to a new IP address once a 429 is received.
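A minimal sketch of that idea (the pool contents and helper name are hypothetical, and the get parameter is injectable so the logic can be exercised without a live proxy):

```python
import requests

# Hypothetical sticky-session endpoints; replace USER/PASS/HOST/PORTs
# with your provider's actual credentials
PROXY_POOL = [
    {"http": "http://USER:PASS@HOST:10001", "https": "http://USER:PASS@HOST:10001"},
    {"http": "http://USER:PASS@HOST:10002", "https": "http://USER:PASS@HOST:10002"},
]

def get_rotating_on_429(url, proxy_pool, get=requests.get):
    # Walk the pool: if a proxy's IP is rate-limited (429), move on
    # to the next sticky session; return the first non-429 response
    response = None
    for proxies in proxy_pool:
        response = get(url, proxies=proxies, timeout=10)
        if response.status_code != 429:
            break
    return response
```

Each entry in the pool keeps its own IP for the session's duration, so switching entries is equivalent to switching IP addresses.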

Alternatively, you can use the tenacity library for decorator-based retries:

import requests
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(wait=wait_exponential(multiplier=1, min=2, max=10), stop=stop_after_attempt(5))
def fetch(url):
    response = requests.get(url, timeout=5)
    if response.status_code != 200:
        raise Exception(f"Bad response: {response.status_code}")
    return response

print(fetch("https://httpbin.org/status/503"))

Such a strategy offers automatic retries on failure using a clean, decorator-based approach. With @retry, you can implement retry-on-exception logic in Python for functions like HTTP requests or other operations.

Using the Retry-After Header

It's important to note that servers may provide the expected wait time for new requests in the Retry-After header. Respecting it helps avoid the 429 error and reduces blocks since you won't exceed the expected request frequency.

Wait times are expressed in either a number of seconds to wait or an exact HTTP date timestamp indicating when you can retry. Your code needs to read that header from the response, determine which format it is, convert it to a wait duration, and try again appropriately.
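A small helper, using only the standard library, that handles both formats (the function name is our own):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(header_value):
    # Returns how many seconds to wait, or None if the header is absent
    if header_value is None:
        return None
    value = header_value.strip()
    if value.isdigit():
        return int(value)  # delta-seconds form, e.g. "120"
    # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    when = parsedate_to_datetime(value)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())

print(retry_after_seconds("120"))  # 120
```

You'd call this with response.headers.get("Retry-After") and sleep for the returned duration before retrying.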

Popular Python HTTP libraries handle the Retry-After header automatically. Urllib3's Retry object in our code has a respect_retry_after_header parameter that defaults to True. So, as long as error 429 is in our status_forcelist, the Retry-After header will be respected.

Retry With Backoff Pattern / Library

You can also use the backoff library or write your own backoff algorithm. Here is a Python backoff example written without the library. Unless you need something extremely complicated, that will generally work:

import time
import requests

def get_with_backoff(url, max_retries=5, backoff_factor=1):
    for i in range(1, max_retries+1):
        try:
            r = requests.get(url)
            if r.status_code == 200:
                return r
        except requests.exceptions.RequestException:
            pass
        sleep_time = backoff_factor * (2 ** (i - 1))
        time.sleep(sleep_time)
    raise Exception("max retries exceeded")

print(get_with_backoff('https://iproyal.com'))

It demonstrates a manual Python retry loop utilizing a backoff strategy. You could also plug in the backoff library to do the same.

Conclusion

These basic strategies should let you automatically resolve most HTTP error codes. There are two strategies for avoiding commonly encountered HTTP errors. You can set a basic loop to retry failed requests, or you can use the HTTPAdapter Python requests retry strategy.

Additionally, for the 429 error, you can switch your IP address each time you receive the error code. An if statement and a new status_forcelist are all that's needed.

FAQ

How to import requests in Python?

Start your script with import followed by the library name, each on its own line. Make sure you have recent Python and Requests versions installed before importing. That gives you access to Requests functions such as GET (requests.get) and POST (requests.post).

What status codes should I retry?

You usually retry on HTTP codes like 429, 500, 502, 503, 504. However, you should avoid retrying on 403 unless you're sure it's due to a temporary authorization issue. The codes tell your retry logic when to make attempts, but sometimes you must build your own retry logic.

What is backoff_factor?

The backoff_factor is part of a backoff strategy: it controls how long your code waits between retries. For example, wait = backoff_factor × (2 ** (n − 1)), where n is the retry count. It follows an exponential backoff algorithm and is supported by urllib3's Retry class (from urllib3.util.retry import Retry).

How to test retry in Python?

You can simulate failures by hitting a test endpoint that returns errors with codes like 500 or 429. Then watch your code retry and print out HTTP codes. Use loops or a session object with HTTPAdapter, or use a decorator library like tenacity. That's how you can confirm the retry attempts count.

What is 429 retry after in Python?

When you get a 429 Too Many Requests response code, check its Retry-After HTTP header. You can parse that header to see whether the server has provided a wait time. Creating a failed requests logic that respects servers' wait time is likely to be more successful.

Does Python Requests retry by default?

No, Python's Requests does not retry requests automatically. If a request fails due to a network error, timeout, or other issues, it raises an exception, and an error message appears. To avoid disruptions, you'll have to implement Python Requests retry logic manually, using urllib3 or retry logic wrappers.

What is max_retries in Python Requests?

The max_retries argument in the HTTPAdapter class controls how many times the script should retry failed requests. It's set to zero by default, but you can configure it using urllib3's Retry object for specific errors. It allows you to define various counters, logic, and timing parameters.

What is the difference between backoff_factor and retries?

Both are used when building a retry logic in Python Requests. Retries set the number of retry attempts after a failed request, while backoff_factor controls the delay between them. Without retries, backoff_factor won't trigger any retry behaviour, and retries alone will try all attempts immediately with no delay, which isn't recommended.

Should retries handle exceptions or status codes?

Both. Handling only one of them lets some failures slip through. Standard exceptions cover connection drops and timeouts, while the status_forcelist parameter lists error codes so they are caught even when the HTTP exchange technically succeeded but the response indicates a failure.
