In this tutorial, you will learn how to use a Python script to:
- check for broken links
- detect redirects such as 301, 308, etc.
We will use the Python library requests to detect HTTP status codes.
Detect Redirects with Python
To detect redirects in Python we can use the requests library and the history attribute of the response.
Let's do a short demo using the site http://httpbin.org/ to simulate several redirects:
import requests
r = requests.get('http://httpbin.org/absolute-redirect/2')
print(r.status_code)
print(r.history)
The final status code is 200, and r.history shows that 2 redirects were followed:
[<Response [302]>, <Response [302]>]
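Each entry in the history is itself a response object, so the whole chain can be listed hop by hop. The helper below is a minimal sketch; the name redirect_chain is hypothetical, and it relies only on the status_code, url and history attributes of a requests response.

```python
def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response.

    `response` is expected to be a requests.Response (or any object that
    exposes the same status_code, url and history attributes).
    """
    # history contains the intermediate redirect responses, oldest first
    hops = [(hop.status_code, hop.url) for hop in response.history]
    # the response itself is the last stop of the chain
    hops.append((response.status_code, response.url))
    return hops
```

For example, redirect_chain(requests.get('http://httpbin.org/absolute-redirect/2')) should list the two 302 hops followed by the final 200 response.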
Detect Redirects in a List of URLs
To find all pages with redirects in a list of URLs and collect a report in a DataFrame (which can then be saved to a CSV file), we can do:
import requests
pages = [
'http://httpbin.org/absolute-redirect/2',
'http://httpbin.org/absolute-redirect/3',
'http://httpbin.org/get',
'https://httpbin.org/redirect-to?url=/redirect/1&status_code=308'
]
ls = []
for page in pages:
    r = requests.get(page)
    # r.history holds one response per redirect that was followed
    ls.append([page, r.history])
import pandas as pd
df = pd.DataFrame(ls)
df
This results in:
0 | 1 |
---|---|
http://httpbin.org/absolute-redirect/2 | [<Response [302]>, <Response [302]>] |
http://httpbin.org/absolute-redirect/3 | [<Response [302]>, <Response [302]>, <Response [302]>] |
http://httpbin.org/get | [] |
https://httpbin.org/redirect-to?url=/redirect/1&status_code=308 | [<Response [308]>, <Response [302]>] |
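The history column above holds response objects, which do not serialize well to CSV. One possible way to get a flat report is to reduce each response to plain values first. This is only a sketch: the helper names redirect_row and write_report and the column names are assumptions, and the standard csv module is used here instead of pandas.

```python
import csv

# Column names are illustrative, not part of the original report
FIELDS = ["url", "redirects", "codes", "final_url", "final_status"]


def redirect_row(url, response):
    """Flatten a requests response into a dict of plain values."""
    codes = [hop.status_code for hop in response.history]
    return {
        "url": url,
        "redirects": len(codes),
        "codes": " ".join(str(code) for code in codes),
        "final_url": response.url,
        "final_status": response.status_code,
    }


def write_report(rows, fileobj):
    """Write the flattened rows as CSV to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

In the loop above, rows could be collected with redirect_row(page, r) and written with write_report(rows, f), where f is a file opened with open('redirects.csv', 'w', newline=''). The same list of dicts can also be passed to pd.DataFrame(rows) and saved with to_csv.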
Check for broken links with Python
To check for broken links or HTTP status codes using Python we can use the same requests library and read the status code of each response:
import requests
import pandas as pd
pages = [
'http://httpbin.org/absolute-redirect/2',
'http://httpbin.org/redirecXXXX'
]
ls = []
for page in pages:
    r = requests.get(page)
    status = r.status_code
    ls.append([page, status])
df = pd.DataFrame(ls)
The result is:
0 | 1 |
---|---|
http://httpbin.org/absolute-redirect/2 | 200 |
http://httpbin.org/redirecXXXX | 404 |
We can see that the page http://httpbin.org/redirecXXXX returns code 404, which indicates that the page is Not Found.
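A link can also be broken without returning any status code at all, for example when the domain does not resolve or the server never answers. A hedged sketch of a more defensive check is shown below; the function names, the labels and the 10 second timeout are assumptions chosen for illustration.

```python
def classify_status(status_code):
    """Map an HTTP status code (or None) to a coarse label for a link report."""
    if status_code is None:
        return "unreachable"  # the request failed before any HTTP response
    if 200 <= status_code < 300:
        return "ok"
    if 300 <= status_code < 400:
        return "redirect"     # only seen when redirects are not followed
    if status_code >= 400:
        return "broken"       # 4xx client errors and 5xx server errors
    return "other"


def check_link(url, timeout=10):
    """Return (status_code, label); status_code is None if the request failed."""
    # Imported here so classify_status can be used without requests installed
    import requests

    try:
        status = requests.get(url, timeout=timeout).status_code
    except requests.exceptions.RequestException:
        # covers connection errors, timeouts, invalid URLs and similar failures
        status = None
    return status, classify_status(status)
```

With this sketch, check_link('http://httpbin.org/redirecXXXX') would be expected to report the 404 as broken, while a URL whose host cannot be reached would be reported as unreachable instead of raising an exception.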
In another article we will see how to check all subrequests for redirects and broken links: How To Get the Network Tab with Python Requests?