In this tutorial, you will learn how to:
- check for broken links using a Python script
- detect redirects such as 301 and 308
We will use the Python library requests to read HTTP status codes.
Detect Redirects with Python
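To detect redirects in Python we can use the requests library. It follows redirects by default and stores the intermediate responses in the history attribute of the final response.
Let's do a short demo using the site http://httpbin.org/ to simulate several redirects: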
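    import requests

    # httpbin redirects this URL twice before returning the final page
    r = requests.get('http://httpbin.org/absolute-redirect/2')

    print(r.status_code)  # status code of the final response
    print(r.history)      # intermediate redirect responses, oldest first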
The Python script detects 2 redirects:
    [<Response [302]>, <Response [302]>]
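The history list can also be unpacked hop by hop. Below is a minimal sketch, assuming a response object shaped like the one returned by requests.get (with .history, .status_code and .url); the helper names redirect_chain and was_redirected are hypothetical and not part of requests:

```python
def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response.

    `response` is assumed to look like a requests.Response: it needs
    `.history` (list of earlier responses), `.status_code` and `.url`.
    """
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def was_redirected(response):
    """True if at least one redirect was followed before the final response."""
    return len(response.history) > 0
```

For the demo above, redirect_chain(r) would contain three entries: one for each of the two redirects, followed by the final response.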
Detect Redirects in a List of URLs
If you need to find all pages with redirects in a list of URLs and get a report as a CSV file or a DataFrame, you can do:
    import requests
    import pandas as pd

    pages = [
        'http://httpbin.org/absolute-redirect/2',
        'http://httpbin.org/absolute-redirect/3',
        'http://httpbin.org/get',
        'https://httpbin.org/redirect-to?url=/redirect/1&status_code=308'
    ]

    ls = []
    for page in pages:
        r = requests.get(page)
        # r.history holds one response per redirect that was followed
        ls.append([page, r.history])

    df = pd.DataFrame(ls)
    df  # displays the table when run in a notebook
This results in:
| URL | Redirect history |
|---|---|
| http://httpbin.org/absolute-redirect/2 | [<Response [302]>, <Response [302]>] |
| http://httpbin.org/absolute-redirect/3 | [<Response [302]>, <Response [302]>, <Response [302]>] |
| http://httpbin.org/get | [] |
| https://httpbin.org/redirect-to?url=/redirect/1&status_code=308 | [<Response [308]>, <Response [302]>] |
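To turn such rows into a CSV report, one option is sketched below. It uses the csv module from the standard library rather than pandas, and assumes each row is a [url, history] pair as built in the loop above; the function name redirects_to_csv and the column names are hypothetical:

```python
import csv
import io


def redirects_to_csv(rows):
    """Build CSV text from rows shaped like [url, history].

    `history` is assumed to be a list of response-like objects with a
    `.status_code` attribute, as in `r.history` from requests.
    Pages without redirects (empty history) are skipped.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "redirect_count", "status_codes"])
    for url, history in rows:
        if not history:
            continue  # no redirect was followed for this page
        codes = " ".join(str(r.status_code) for r in history)
        writer.writerow([url, len(history), codes])
    return buffer.getvalue()
```

The returned text can be written to a file with the usual open(...).write(...). If you prefer to stay in pandas, df.to_csv('redirects.csv', index=False) writes the DataFrame directly.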
Check for broken links with Python
To check for broken links or HTTP status codes with Python, we can use the same
requests library and read the status code of each response:
    import requests
    import pandas as pd

    pages = [
        'http://httpbin.org/absolute-redirect/2',
        'http://httpbin.org/redirecXXXX'
    ]

    ls = []
    for page in pages:
        r = requests.get(page)
        status = r.status_code  # status code of the final response, after any redirects
        ls.append([page, status])

    df = pd.DataFrame(ls)
We can see that the page http://httpbin.org/redirecXXXX
returns status code 404, which indicates that the page is Not Found.
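To label status codes in a report, a small helper can map each code to a category and its standard reason phrase. This is a minimal sketch using http.HTTPStatus from the standard library; the function name describe_status and the category names are assumptions made for illustration, including the choice to treat every 4xx and 5xx code as "broken":

```python
from http import HTTPStatus


def describe_status(code):
    """Return a (category, phrase) pair for an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # code is not defined in http.HTTPStatus

    if 200 <= code < 300:
        category = "ok"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 600:
        category = "broken"  # assumption: client and server errors both count
    else:
        category = "other"
    return category, phrase
```

For example, describe_status(404) returns ("broken", "Not Found"), which could be stored next to each URL instead of the bare number.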
In another article we will see how to check all subrequests for redirects and broken links: How To Get the Network Tab with Python Requests?