In this tutorial, you will learn how to:

  • check for broken links with a Python script
  • detect redirects such as 301, 308, etc.

We will use the Python library requests to read the HTTP status code of each response.

Detect Redirects with Python

To detect redirects in Python, we can use the requests library and the history attribute of its response object.

Let's do a short demo using the site http://httpbin.org/ to simulate several redirects:

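import requests

# requests follows redirects by default; the intermediate responses end up in r.history
r = requests.get('http://httpbin.org/absolute-redirect/2')
print(r.status_code)  # status of the final response
print(r.history)      # one Response per redirect that was followed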

The history list shows that 2 redirects were followed:

[<Response [302]>, <Response [302]>]
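Each entry in r.history is itself a Response object, so the status code and URL of every hop can be read directly. The following is a minimal sketch; the helper name redirect_chain is hypothetical and not part of requests:

```python
def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    # response.history holds the intermediate responses, oldest first
    hops = [(hop.status_code, hop.url) for hop in response.history]
    # the response itself is the last stop of the chain
    hops.append((response.status_code, response.url))
    return hops
```

Calling redirect_chain(r) on the response from the demo should list the two 302 hops followed by the final 200 response, assuming httpbin.org is reachable.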

Detect Redirects in a List of URLs

If you need to find all pages with redirects in a list of URLs and get a report as a CSV file or a DataFrame, you can do:

import requests
import pandas as pd

pages = [
	'http://httpbin.org/absolute-redirect/2',
	'http://httpbin.org/absolute-redirect/3',
	'http://httpbin.org/get',
	'https://httpbin.org/redirect-to?url=/redirect/1&status_code=308'
]

# collect each page together with the redirects followed to reach it
ls = []
for page in pages:
	r = requests.get(page)
	ls.append([page, r.history])

df = pd.DataFrame(ls)
df

This results in:

0                                                                1
http://httpbin.org/absolute-redirect/2                           [<Response [302]>, <Response [302]>]
http://httpbin.org/absolute-redirect/3                           [<Response [302]>, <Response [302]>, <Response [302]>]
http://httpbin.org/get                                           []
https://httpbin.org/redirect-to?url=/redirect/1&status_code=308  [<Response [308]>, <Response [302]>]
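The history column holds Response objects, which do not export cleanly to CSV. One possible way to get a report with plain values is sketched below. The helper names history_row and build_report and the column names are assumptions made for illustration:

```python
import pandas as pd


def history_row(url, response):
    """Flatten one response into plain values that are easy to write to CSV."""
    codes = [hop.status_code for hop in response.history]
    return {
        'url': url,
        'redirects': len(codes),
        'redirect_codes': ' '.join(str(code) for code in codes),
        'final_status': response.status_code,
        'final_url': response.url,
    }


def build_report(results):
    """results: iterable of (url, response) pairs collected in a loop."""
    return pd.DataFrame([history_row(url, r) for url, r in results])
```

With the pairs collected as (page, r) in the loop, build_report(pairs).to_csv('redirects.csv', index=False) would write the report to disk; the file name is only an example.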

To check for broken links or other HTTP status codes using Python, we can use the same requests library and read the status code of each response:

import requests
import pandas as pd

pages = [
	'http://httpbin.org/absolute-redirect/2',
	'http://httpbin.org/redirecXXXX'
]

ls = []
for page in pages:
	r = requests.get(page)
	status = r.status_code
	ls.append([page, status])


df = pd.DataFrame(ls)
df

The result is:

0                                       1
http://httpbin.org/absolute-redirect/2  200
http://httpbin.org/redirecXXXX          404
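With more than a handful of URLs, it helps to keep only the rows that point to a problem. A minimal sketch, assuming that any status of 400 or above counts as broken; the function name broken_links and the column names are hypothetical:

```python
import pandas as pd


def broken_links(rows):
    """rows: list of [url, status] pairs, like the ls list in the script."""
    df = pd.DataFrame(rows, columns=['url', 'status'])
    # 4xx are client errors (for example 404 Not Found), 5xx are server errors
    return df[df['status'] >= 400].reset_index(drop=True)
```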

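The first URL shows 200 rather than 302 because requests follows redirects by default and reports the status of the final response. The page:

http://httpbin.org/redirecXXXX

returns code 404, which indicates that the page was Not Found.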
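A status code is only available when the server answers. A dead domain, a refused connection, or a timeout makes requests.get raise an exception instead, which would stop the loop. Below is a minimal sketch of one way to handle this, assuming a 10-second timeout is acceptable. The function check_link and its get parameter are hypothetical names introduced here; get exists only so that the request function can be swapped out:

```python
import requests


def check_link(url, get=requests.get, timeout=10):
    """Return the HTTP status code, or the exception name if no response arrived."""
    try:
        return get(url, timeout=timeout).status_code
    except requests.exceptions.RequestException as exc:
        # base class for DNS failures, refused connections, timeouts, invalid URLs
        return type(exc).__name__
```

Replacing requests.get(page).status_code with check_link(page) in the loop keeps the report going and records an entry such as ConnectionError for links that cannot be reached at all.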

In another article, we will see how to check all subrequests for redirects and broken links: How To Get the Network Tab with Python Requests?
