Python Script to Check for Broken Links And Redirects
In this tutorial, you will learn how to:
- check for broken links using a Python script
- detect redirects such as 301, 308, etc.
We will use the Python library requests to detect HTTP status codes.
If you would like to check for broken URL links on a page or a list of pages, check this article: How to Find Broken Links With Python
Detect Redirects with Python
To detect redirects in Python we can use the requests library and the history attribute of the response object.
Let's do a short demo using the site http://httpbin.org/ to simulate several redirects:
import requests
r = requests.get('http://httpbin.org/absolute-redirect/2')
print(r.status_code)
print(r.history)
The Python script detects 2 redirects in r.history (r.status_code holds the status of the final response, 200):
[<Response [302]>, <Response [302]>]
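Each entry in r.history is itself a response, so it carries its own status code and URL. As a minimal sketch (the helper names redirect_chain and fetch_redirect_chain are hypothetical, not part of requests), the full chain of hops can be listed like this:

```python
def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    # response.history holds the intermediate redirect responses, oldest first.
    hops = [(hop.status_code, hop.url) for hop in response.history]
    # The response itself is the last stop of the chain.
    hops.append((response.status_code, response.url))
    return hops


def fetch_redirect_chain(url, timeout=10):
    """Request `url` and return its redirect chain (needs network access)."""
    import requests  # imported here so redirect_chain() stays usable offline

    return redirect_chain(requests.get(url, timeout=timeout))
```

Calling fetch_redirect_chain('http://httpbin.org/absolute-redirect/2') should return the two 302 hops followed by the final response, assuming the site is reachable.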
Detect Redirects in a List of URLs
If you need to find all pages with redirects in a list of URLs and collect the results in a DataFrame (which can then be exported to a CSV file), you can do:
import requests
pages = [
'http://httpbin.org/absolute-redirect/2',
'http://httpbin.org/absolute-redirect/3',
'http://httpbin.org/get',
'https://httpbin.org/redirect-to?url=/redirect/1&status_code=308'
]
ls = []
for page in pages:
    r = requests.get(page)
    # r.history lists the intermediate redirect responses (empty if none)
    ls.append([page, r.history])
import pandas as pd
df = pd.DataFrame(ls)
df
This results in:
0 | 1 |
---|---|
http://httpbin.org/absolute-redirect/2 | [<Response [302]>, <Response [302]>] |
http://httpbin.org/absolute-redirect/3 | [<Response [302]>, <Response [302]>, <Response [302]>] |
http://httpbin.org/get | [] |
https://httpbin.org/redirect-to?url=/redirect/1&status_code=308 | [<Response [308]>, <Response [302]>] |
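The history column above holds response objects, which do not serialize well to a CSV file. One possible approach, sketched here with the standard csv module and hypothetical helper names (redirect_row, rows_to_csv) and column names, is to flatten each response into plain values first:

```python
import csv
import io

# Column names are illustrative, not taken from the original report.
FIELDS = ["url", "final_status", "redirect_count", "redirect_codes", "final_url"]


def redirect_row(url, response):
    """Flatten one response into plain values suitable for a CSV row."""
    codes = [hop.status_code for hop in response.history]
    return {
        "url": url,
        "final_status": response.status_code,
        "redirect_count": len(codes),
        "redirect_codes": " ".join(str(code) for code in codes),
        "final_url": response.url,
    }


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Inside the loop above, ls.append(redirect_row(page, r)) would collect the flattened rows; the same list of dictionaries can also be passed to pd.DataFrame(ls) and saved with df.to_csv('redirects.csv', index=False), where the file name is only an example.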
Check for broken links with Python
To check for broken links or HTTP status codes using Python we can use the same requests library and read the status code of each response:
import requests
import pandas as pd
pages = [
'http://httpbin.org/absolute-redirect/2',
'http://httpbin.org/redirecXXXX'
]
ls = []
for page in pages:
    r = requests.get(page)
    status = r.status_code
    ls.append([page, status])
df = pd.DataFrame(ls)
The result is:
0 | 1 |
---|---|
http://httpbin.org/absolute-redirect/2 | 200 |
http://httpbin.org/redirecXXXX | 404 |
We can see that the page http://httpbin.org/redirecXXXX returns code 404, which indicates that the page is Not Found.
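A link can also be broken without returning any status code at all, for example when the host does not resolve or the request times out. The sketch below is one way to cover that case; the helper names (classify_status, check_link), the labels and the 10 second timeout are assumptions chosen for illustration:

```python
def classify_status(status_code):
    """Map an HTTP status code (or None for no response) to a short label."""
    if status_code is None:
        return "unreachable"
    if status_code >= 500:
        return "server error"
    if status_code >= 400:
        return "broken"
    return "ok"


def check_link(url, timeout=10):
    """Return (url, status_code, label); status_code is None if the request failed."""
    import requests  # imported here so classify_status() stays usable offline

    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException:
        # Covers connection errors, timeouts, invalid URLs, too many redirects.
        return url, None, classify_status(None)
    return url, response.status_code, classify_status(response.status_code)
```

Replacing the body of the loop above with ls.append(list(check_link(page))) would add a third column with the label, and the 404 page from the example would be reported as "broken".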
In another article we will see how to check all subrequests for redirects and broken links: How To Get the Network Tab with Python Requests?