In this short tutorial, you'll see how to detect broken links with Python. Two examples will be shown - one using BeautifulSoup with requests, and the other using Selenium.

If you need to check page redirects and broken URLs from a list of pages, you can check this article: Python Script to Check for Broken Links And Redirects.

The first example extracts all links from a given URL and checks each of them:

import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor

def get_broken_links(url):
    # Inner helper: record the link if a HEAD request returns 404
    def _validate_url(url):
        r = requests.head(url)
        # print(url, r.status_code)
        if r.status_code == 404:

    data = requests.get(url).text
    soup = BeautifulSoup(data, features="html.parser")
    links = [link.get("href") for link in soup.find_all("a")]
    broken_links = []
    with ThreadPoolExecutor(max_workers=8) as executor:, links)
    return broken_links

We are checking the website of the creator of the requests library: Kenneth Reitz.

The checks show one broken link on his page:


We can also list each link found along with its status code, e.g.: 200 301 301 200 200 ... 200 404 200

The code works as follows:

  • Defines an inner function _validate_url(url) that marks a URL as broken when its HTTP status is 404.
  • Retrieves the webpage content using requests.get(url).text.
  • Parses the HTML content using BeautifulSoup to extract all links (<a> tags).
  • Initializes an empty list broken_links to store broken links.
  • Utilizes a ThreadPoolExecutor to concurrently execute _validate_url function for each link with a maximum of 8 workers.
  • Returns the list of broken links found during the validation process.

First you need to install the packages (the code below also uses seleniumwire and pandas):

pip install selenium selenium-wire pandas

Then we can use the following code to check multiple pages. It will load the pages in a real browser.

It will check the links in two ways:

  • XPATH - value="//a[@href]"
  • tag name
from import By
import pandas as pd
from seleniumwire import webdriver
import requests

driver = webdriver.Firefox()
dfs = []

def validate_url(url):
        r = requests.head(url)

        if r.status_code == 404:
            print('broken page:', url)
        # page is the parent URL set by the loop below
        return {'page': url, 'status': r.status_code, 'parent': page}
        print(url, 'error checking')
        return {'page': url, 'status': None, 'parent': page}

def validate_page(page):
    href_links = []
    href_links2 = []
    # collect anchors that have an href attribute (XPath)
    elems = driver.find_elements(by=By.XPATH, value="//a[@href]")
    # collect all anchors by tag name
    elems2 = driver.find_elements(by=By.TAG_NAME, value="a")
    for elem in elems:
        l = elem.get_attribute("href")
        if l not in href_links:
    for elem in elems2:
        l = elem.get_attribute("href")
        if l is not None and l not in href_links2:
    # both strategies should collect the same links
    print(href_links == href_links2)
    data = [validate_url(url) for url in href_links]
    df = pd.DataFrame(data)
    dfs.append(df)
    # display works in Jupyter; use print(df) in a plain script
    display(df[df['status'] != 200])

pages = ["", ""]

for page in set(pages):

Finally, it will display and print stats for the statuses and the broken links: