In this short guide, you'll see how to extract all links from a website using Python.
Here you can find the short answer:
(1) Using BeautifulSoup
from bs4 import BeautifulSoup
import requests
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')
links = [a['href'] for a in soup.find_all('a', href=True)]
(2) Using requests-html
from requests_html import HTMLSession
session = HTMLSession()
r = session.get(url)
links = r.html.absolute_links
(3) Using Selenium for dynamic pages
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get(url)
links = [elem.get_attribute('href') for elem in driver.find_elements(By.TAG_NAME, 'a')]
Now let's go through several useful examples of how to extract all links from websites with Python.
1: Extract links using BeautifulSoup
Let's start with the most popular method - using BeautifulSoup to parse HTML and extract all hyperlinks:
from bs4 import BeautifulSoup
import requests
url = 'https://www.python.org'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
links = [a.get('href') for a in soup.find_all('a', href=True)]
print(f"Found {len(links)} links")
print(links[:5])
result will be:
Found 87 links
['#content', '#python-network', '/', '/psf-landing/', '/about/']
This method works well for static websites where all of the content is present in the initial HTML response. BeautifulSoup is fast, lightweight, and handles most HTML parsing needs.
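If you only care about anchor tags on a large page, you can also tell BeautifulSoup to skip everything else while parsing. Below is a minimal sketch using bs4's SoupStrainer (the URL and the early raise_for_status() check are just one reasonable setup, not part of the example above):
from bs4 import BeautifulSoup, SoupStrainer
import requests
response = requests.get('https://www.python.org')
response.raise_for_status()  # stop early on HTTP errors
# parse only <a> tags instead of the whole document
only_anchors = SoupStrainer('a')
soup = BeautifulSoup(response.content, 'html.parser', parse_only=only_anchors)
links = [a['href'] for a in soup.find_all('a', href=True)]
print(f"Found {len(links)} links")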
To get only absolute URLs (full URLs with domain), you can filter the results:
from urllib.parse import urljoin
base_url = 'https://www.python.org'
absolute_links = [urljoin(base_url, link) for link in links if link.startswith('http') or link.startswith('/')]
print(absolute_links[:3])
result:
['https://www.python.org/', 'https://www.python.org/psf-landing/', 'https://www.python.org/about/']
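If you want to normalize every href (including relative paths such as 'about/') and skip non-HTTP links, a slightly more general sketch looks like this; the filter rules here are one reasonable choice, not the only one:
from urllib.parse import urljoin, urlparse
base_url = 'https://www.python.org'
absolute_links = []
for link in links:
    full_url = urljoin(base_url, link)                # resolves relative paths like 'about/'
    if urlparse(full_url).scheme not in ('http', 'https'):
        continue                                      # skips mailto:, javascript:, tel: links
    absolute_links.append(full_url.split('#', 1)[0])  # drop any fragment part
print(len(absolute_links), absolute_links[:3])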
2: Extract unique external links
What if you want to extract only external links (links pointing to other domains)? You can filter based on the domain:
from bs4 import BeautifulSoup
import requests
from urllib.parse import urlparse
url = 'https://www.github.com'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
base_domain = urlparse(url).netloc
external_links = []
for a in soup.find_all('a', href=True):
    link = a['href']
    if link.startswith('http'):
        link_domain = urlparse(link).netloc
        if link_domain != base_domain:
            external_links.append(link)
print(f"Found {len(set(external_links))} unique external links")
print(list(set(external_links))[:3])
result:
Found 12 unique external links
['https://docs.github.com', 'https://skills.github.com', 'https://support.github.com']
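To see which domains a page links out to most often, you can group the external links by domain. A small sketch using collections.Counter, reusing the external_links list from above:
from collections import Counter
from urllib.parse import urlparse
# count how many external links point to each domain
domain_counts = Counter(urlparse(link).netloc for link in external_links)
for domain, count in domain_counts.most_common(5):
    print(domain, count)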
3: Extract links from dynamic websites using Selenium
For websites that load content dynamically with JavaScript (like single-page applications), BeautifulSoup won't capture all links. Use Selenium instead:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument('--headless')
driver = webdriver.Chrome(options=chrome_options)
driver.get('https://www.amazon.com')
links = [elem.get_attribute('href') for elem in driver.find_elements(By.TAG_NAME, 'a')]
links = [link for link in links if link]
print(f"Total links found: {len(links)}")
print(links[:5])
driver.quit()
result:
Total links found: 234
['https://www.amazon.com/gp/help/customer/display.html', 'https://www.amazon.com/ap/signin', 'https://www.amazon.com/gp/cart/view.html', 'https://www.amazon.com/prime', 'https://www.amazon.com/bestsellers']
Selenium is essential for modern websites built with React, Vue, or Angular where content loads after the initial page load.
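On heavily dynamic pages the anchor tags may not exist yet at the moment driver.get() returns, so it is safer to wait for them explicitly. A minimal sketch using Selenium's WebDriverWait (the 10-second timeout and the URL are arbitrary choices):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get('https://www.python.org')
# wait until at least one <a> tag is present before extracting links
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.TAG_NAME, 'a')))
links = [elem.get_attribute('href') for elem in driver.find_elements(By.TAG_NAME, 'a')]
driver.quit()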
4: Extract links with additional metadata
Sometimes you need more than just the URL - you might want the link text, title attribute, or CSS classes:
from bs4 import BeautifulSoup
import requests
url = 'https://news.ycombinator.com'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
link_data = []
for a in soup.find_all('a', href=True):
    link_data.append({
        'url': a.get('href'),
        'text': a.get_text(strip=True),
        'title': a.get('title', ''),
        'class': ' '.join(a.get('class', []))
    })
print(f"Extracted {len(link_data)} links with metadata")
print(link_data[:3])
result:
Extracted 156 links with metadata
[{'url': 'https://news.ycombinator.com', 'text': 'Hacker News', 'title': '', 'class': ''},
{'url': 'newest', 'text': 'new', 'title': '', 'class': ''},
{'url': 'front', 'text': 'past', 'title': '', 'class': ''}]
This approach is useful for content analysis, SEO audits, or building web crawlers that need context about each link.
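For example, for a simple SEO-style audit you can check which of the extracted URLs respond with an error status. A rough sketch using HEAD requests and the link_data list from above (only absolute URLs are checked, error handling is kept minimal, and note that some servers reject HEAD requests):
import requests
broken = []
for item in link_data:
    url = item['url']
    if not url.startswith('http'):
        continue  # skip relative links and fragments
    try:
        status = requests.head(url, allow_redirects=True, timeout=5).status_code
    except requests.RequestException:
        status = None
    if status is None or status >= 400:
        broken.append((url, status))
print(f"{len(broken)} potentially broken links")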
5: Save extracted links to CSV file
Finally, let's save all extracted links to a CSV file for further analysis:
from bs4 import BeautifulSoup
import requests
import pandas as pd
url = 'https://www.reddit.com/r/python'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
links = []
for a in soup.find_all('a', href=True):
    links.append({
        'url': a['href'],
        'text': a.get_text(strip=True)[:50]
    })
df = pd.DataFrame(links)
df.to_csv('extracted_links.csv', index=False)
print(f"Saved {len(df)} links to CSV file")
print(df.head())
result:
Saved 287 links to CSV file
                    url    text
0            /r/Python/  Python
1  /r/Python/wiki/index    Wiki
2            /r/Python/   Rules
3         /r/Python/hot     Hot
4         /r/Python/new     New
This creates a structured dataset perfect for spreadsheet analysis, data visualization, or further processing with pandas.
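If you prefer not to depend on pandas, the same file can be written with Python's built-in csv module. A small sketch reusing the links list of dictionaries built above:
import csv
with open('extracted_links.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['url', 'text'])
    writer.writeheader()   # write the header row
    writer.writerows(links)  # one row per extracted link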