In this article, you can find out how to get all subrequests for a page in Python with the selenium-wire library. This is equivalent to the Network tab of Chrome or Firefox, which shows all subrequests of a given page: assets, images, JS files, etc.
To check for redirects or broken links with Python you can use: Python Script to Check for Broken Links And Redirects
To check subrequests we will use an additional library: selenium-wire 5.1.0. This library is installed by:
pip install selenium-wire
Then we can use it to get all subrequests by:
from seleniumwire import webdriver
import pandas as pd

# Pages whose subrequests we want to inspect
pages = [
    'http://httpbin.org/',
    'http://wikipedia.org/',
    'http://google.com/'
]

urls = []
driver = webdriver.Firefox()

for page in set(pages):
    # Optionally swap a www host for its dev counterpart (no effect on the URLs above)
    page = page.replace('//www', '//dev')
    driver.get(page)

    # driver.requests holds every request captured so far
    for request in driver.requests:
        if request.response:
            print(request.url, request.response.status_code, request.response.headers['Content-Type'])
            urls.append([page, request.url, request.response.status_code, request.response.headers['Content-Type']])

driver.quit()
df = pd.DataFrame(urls, columns=['url', 'requests', 'status', 'content-type'])
The code above processes each link in the `pages` list.
The Firefox driver runs in headful mode to load each page. Then all captured subrequests are processed one by one, skipping those without a response.
Finally, we collect all links, their status codes and content types in a Pandas DataFrame.
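One thing to keep in mind with the loop above: selenium-wire keeps every captured request in `driver.requests` until the list is cleared, so rows recorded for a later page can also include requests made while loading earlier pages. Below is a minimal sketch of a per-page variant, assuming you want each subrequest attributed only to the page that triggered it. The helper name `collect_subrequests` is hypothetical; `del driver.requests` is the selenium-wire way to clear the captured list.

```python
def collect_subrequests(driver, pages):
    """Return [page, url, status, content_type] rows, one per captured response.

    `driver` is assumed to be a selenium-wire webdriver (or any object
    exposing the same `get` / `requests` interface).
    """
    rows = []
    # dict.fromkeys removes duplicate pages while keeping their original order
    for page in dict.fromkeys(pages):
        # Drop requests captured while loading previous pages
        del driver.requests
        driver.get(page)
        for request in driver.requests:
            if request.response:  # skip requests that never received a response
                rows.append([
                    page,
                    request.url,
                    request.response.status_code,
                    request.response.headers['Content-Type'],
                ])
    return rows
```

It could be called as `rows = collect_subrequests(webdriver.Firefox(), pages)` and the rows passed to `pd.DataFrame` as before. Totals produced this way will likely be lower than with the original loop, since requests are no longer counted again for later pages.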
The final result contains all requests from those 3 URLs:
url | requests | status | content-type |
---|---|---|---|
http://httpbin.org/ | http://detectportal.firefox.com/canonical.html | 200 | text/html |
http://google.com/ | http://detectportal.firefox.com/canonical.html | 200 | text/html |
http://wikipedia.org/ | https://tracking-protection.cdn.mozilla.net/ads-track-digest256/1695941350 | 200 | application/octet-stream |
http://google.com/ | https://tracking-protection.cdn.mozilla.net/content-track-digest256/1695941350 | 200 | application/octet-stream |
http://wikipedia.org/ | https://wikipedia.org/ | 301 | text/html; charset=iso-8859-1 |
Total number of requests is 117.
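To sanity-check a total like this, the collected rows can be summarised per page and per status code. The sketch below uses only the standard library and a few sample rows copied from the table above. `summarize` is a hypothetical helper name, and the rows are assumed to follow the same `[page, request_url, status, content_type]` layout as the `urls` list.

```python
from collections import Counter


def summarize(rows):
    """Count subrequests per page and per status code, and list failures."""
    per_page = Counter(row[0] for row in rows)
    per_status = Counter(row[2] for row in rows)
    # Treat 4xx/5xx responses as the likely broken subrequests
    failed = [row for row in rows if row[2] >= 400]
    return per_page, per_status, failed


# Sample rows taken from the result table above
sample_rows = [
    ['http://httpbin.org/', 'http://detectportal.firefox.com/canonical.html', 200, 'text/html'],
    ['http://google.com/', 'http://detectportal.firefox.com/canonical.html', 200, 'text/html'],
    ['http://wikipedia.org/', 'https://tracking-protection.cdn.mozilla.net/ads-track-digest256/1695941350', 200, 'application/octet-stream'],
    ['http://google.com/', 'https://tracking-protection.cdn.mozilla.net/content-track-digest256/1695941350', 200, 'application/octet-stream'],
    ['http://wikipedia.org/', 'https://wikipedia.org/', 301, 'text/html; charset=iso-8859-1'],
]

per_page, per_status, failed = summarize(sample_rows)
print('total:', len(sample_rows))
print('per page:', dict(per_page))
print('per status:', dict(per_status))
print('failed:', failed)
```

With the full `urls` list in place of `sample_rows`, `len(urls)` gives the total number of recorded requests.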
You can compare the Python results with the Firefox Network tab by following these steps:
- Open Firefox and load the page
- Right-click on the page and select Inspect
- Open the Network tab
- Reload the page
Errors
If you get the following error:
ModuleNotFoundError: No module named 'blinker._saferef'
You can fix it by downgrading the blinker module (drop the leading `!` when running outside a Jupyter notebook):
!pip install blinker==1.7.0
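Before downgrading, you can check whether the installed blinker still provides the `_saferef` submodule named in the error. This is a minimal sketch using only the standard library. The helper name `has_blinker_saferef` is hypothetical, and the check assumes the error is caused by the submodule being absent from the installed blinker release, as the message suggests.

```python
import importlib.util


def has_blinker_saferef():
    """Return True if the `blinker._saferef` submodule can be located."""
    try:
        return importlib.util.find_spec('blinker._saferef') is not None
    except ModuleNotFoundError:
        # blinker itself is not installed
        return False


if not has_blinker_saferef():
    print('blinker._saferef is missing; consider: pip install blinker==1.7.0')
```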
P.S. The selenium-wire repository has been archived on GitHub, so it is no longer maintained: https://github.com/wkeeling/selenium-wire