In this tutorial, you will learn how to:

  • check for broken links with a Python script
  • detect redirects such as 301, 308, etc.

We will use the Python library requests to read HTTP status codes.

If you want to check for broken links on a page or on a list of pages, see this article: How to Find Broken Links With Python

Detect Redirects with Python

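To detect redirects in Python, we can use the requests library and the history attribute of the response. history is a list of the intermediate responses, ordered from the first redirect to the last.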

Let's do a short demo using the site http://httpbin.org/ to simulate several redirects:

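import requests

# this httpbin URL redirects twice before returning the final page
r = requests.get('http://httpbin.org/absolute-redirect/2')
print(r.status_code)  # status code of the final response
print(r.history)      # intermediate redirect responses, oldest first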

The Python script detects 2 redirects:

[<Response [302]>, <Response [302]>]
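The history list shows only status codes when printed. As a minimal sketch, a small helper can list the status code and URL of every hop, followed by the final response. The helper name redirect_chain and the example.com URLs are assumptions made for illustration. The stand-in objects only mimic the status_code, url and history attributes of a requests response, so the sketch runs without network access.

```python
from types import SimpleNamespace


def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    # response.history is ordered from the first redirect to the last one
    hops = [(hop.status_code, hop.url) for hop in response.history]
    hops.append((response.status_code, response.url))
    return hops


# Offline stand-ins for requests responses (hypothetical URLs)
first = SimpleNamespace(status_code=302, url='http://example.com/a', history=[])
final = SimpleNamespace(status_code=200, url='http://example.com/b', history=[first])

for status, url in redirect_chain(final):
    print(status, url)
```

With a real request, the same helper could be called as redirect_chain(requests.get('http://httpbin.org/absolute-redirect/2')).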

Detect Redirects in a List of URLs

If you need to find all pages with redirects in a list of URLs and get a report as a DataFrame (which can then be exported to a CSV file), we can do:

import requests
import pandas as pd

pages = [
	'http://httpbin.org/absolute-redirect/2',
	'http://httpbin.org/absolute-redirect/3',
	'http://httpbin.org/get',
	'https://httpbin.org/redirect-to?url=/redirect/1&status_code=308'
]

ls = []
for page in pages:
	r = requests.get(page)
	# r.history holds one response per redirect; it is empty if there were none
	ls.append([page, r.history])

# one row per page: column 0 is the URL, column 1 is the redirect history
df = pd.DataFrame(ls)
df

This results in:

0 1
http://httpbin.org/absolute-redirect/2 [<Response [302]>, <Response [302]>]
http://httpbin.org/absolute-redirect/3 [<Response [302]>, <Response [302]>, <Response [302]>]
http://httpbin.org/get []
https://httpbin.org/redirect-to?url=/redirect/1&status_code=308 [<Response [308]>, <Response [302]>]
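A column of Response objects does not export cleanly to CSV. One possible approach, sketched below, is to flatten each response into plain values first. The helper name redirect_row, the column names and the example.com URLs are assumptions for illustration. The sketch also flags chains that contain a permanent redirect (301 or 308). The stand-in objects mimic only the attributes of a requests response that the helper reads.

```python
from types import SimpleNamespace

PERMANENT_REDIRECTS = {301, 308}


def redirect_row(page, response):
    """Flatten a response into a dict of plain values for a report row."""
    codes = [hop.status_code for hop in response.history]
    return {
        'page': page,
        'final_url': response.url,
        'final_status': response.status_code,
        'redirect_count': len(codes),
        'redirect_codes': ' '.join(str(code) for code in codes),
        'permanent': any(code in PERMANENT_REDIRECTS for code in codes),
    }


# Offline stand-ins for requests responses (hypothetical URLs)
hop1 = SimpleNamespace(status_code=308, url='http://example.com/old', history=[])
hop2 = SimpleNamespace(status_code=302, url='http://example.com/mid', history=[])
final = SimpleNamespace(status_code=200, url='http://example.com/new', history=[hop1, hop2])

row = redirect_row('http://example.com/old', final)
print(row)
```

With real requests, rows = [redirect_row(page, requests.get(page)) for page in pages] builds the report, and pd.DataFrame(rows).to_csv('redirects.csv', index=False) writes it to a file (the file name is only an example).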

To check for broken links or HTTP status codes with Python, we can use the same requests library and read the status code of each response:

import requests
import pandas as pd

pages = [
	'http://httpbin.org/absolute-redirect/2',
	'http://httpbin.org/redirecXXXX'
]

ls = []
for page in pages:
	r = requests.get(page)
	status = r.status_code
	ls.append([page, status])


df = pd.DataFrame(ls)

The result is:

0 1
http://httpbin.org/absolute-redirect/2 200
http://httpbin.org/redirecXXXX 404

We can see that the page:

http://httpbin.org/redirecXXXX

returns code 404, which indicates that the page was Not Found.
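For a longer list of pages, it may help to keep only the failing entries. The following is a minimal sketch, assuming that any 4xx or 5xx status counts as broken. The helper name find_broken is hypothetical, and the input has the same shape as the ls list built above.

```python
def find_broken(results):
    """Return the (page, status) pairs whose status is a 4xx or 5xx error."""
    return [(page, status) for page, status in results if status >= 400]


# Same shape as the `ls` list of [page, status] pairs from the example above
results = [
    ['http://httpbin.org/absolute-redirect/2', 200],
    ['http://httpbin.org/redirecXXXX', 404],
]

print(find_broken(results))
```

Note that a URL whose host cannot be reached makes requests.get raise an exception instead of returning a status code, so such links would need separate handling.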

In another article, we will see how to check all subrequests for redirects and broken links: How To Get the Network Tab with Python Requests?
