How to scrape a page with Python Requests and BeautifulSoup

This post shows how to scrape tags from a web page using Python:

  • the requests library fetches the HTML content
  • BeautifulSoup parses the HTML and extracts the content
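
The two steps are independent: BeautifulSoup only needs a string of HTML, so the parsing half can be tried without making any request. A minimal sketch, where the `html` string is a made-up stand-in for what `requests.get(url).text` would return:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, standing in for the body of a fetched page
html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <h1>Main heading</h1>
    <h2>First section</h2>
    <h2>Second section</h2>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Text inside the <title> tag
title = soup.title.text

# Text of every <h2> tag, in document order
h2_texts = [h2.text for h2 in soup.find_all("h2")]

print(title)
print(h2_texts)
```

Swapping the inline string for the text of a real response is the only change needed to go from this sketch to a live page.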

Install

Install the requests and beautifulsoup4 libraries if you haven't already:

pip install requests beautifulsoup4


Example 1 - BeautifulSoup extract page title and h1 header

import requests
from bs4 import BeautifulSoup

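# Fetch the page and parse the returned HTML
page = requests.get(
    "https://en.wikipedia.org/wiki/Main_Page")
soup = BeautifulSoup(page.content, 'html.parser')

# Text of the <title> tag
page_title = soup.title.text

print(page_title)

# Text of the first <h1> inside each <body> element
headings = [body.find('h1').text for body in soup.find_all('body')]
print(headings)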

result:

Wikipedia, the free encyclopedia

['Main Page']
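
Example 1 assumes that the page has both a title and an h1 tag. When a tag is missing, `soup.title` and `find()` return `None`, and reading `.text` from it raises an `AttributeError`. A minimal sketch of a guard, where `first_text` is a hypothetical helper (not part of BeautifulSoup) and the HTML string is made up for illustration:

```python
from bs4 import BeautifulSoup

def first_text(soup, tag_name, default=None):
    """Return the stripped text of the first matching tag, or default if absent."""
    tag = soup.find(tag_name)
    return tag.get_text(strip=True) if tag is not None else default

# Made-up HTML with an <h1> but no <title>
soup = BeautifulSoup(
    "<html><body><h1> Main Page </h1></body></html>", "html.parser")

heading = first_text(soup, "h1")
title = first_text(soup, "title", default="(no title)")

print(heading)
print(title)
```

`get_text(strip=True)` also trims the surrounding whitespace that `.text` would keep.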

Example 2 - BeautifulSoup extract all h2 tags from a URL

import requests
from bs4 import BeautifulSoup

def scrape_h2_tags(url):
    # Make a GET request to the URL
    response = requests.get(url)
    
    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        # Parse the HTML content using BeautifulSoup
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Find all <h2> tags
        h2_tags = soup.find_all('h2')
        
        # Print the text content of each <h2> tag
        for h2_tag in h2_tags:
            print(h2_tag.text)
    else:
        print(f"Failed to retrieve the webpage. Status code: {response.status_code}")

# Example usage
url_to_scrape = "https://en.wikipedia.org/wiki/Main_Page"
scrape_h2_tags(url_to_scrape)

result:

From today's featured article
Did you know ...
In the news
On this day
From today's featured list
Today's featured picture
Other areas of Wikipedia
Wikipedia's sister projects
Wikipedia languages
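
Printing inside the function makes the headings hard to reuse. One possible variation, sketched under assumptions, splits fetching from parsing and returns a list instead. The names `fetch_html` and `extract_h2_texts` and the 10 second timeout are illustrative choices, not part of either library:

```python
import requests
from bs4 import BeautifulSoup

def fetch_html(url, timeout=10):
    """Fetch a page and return its HTML text; raise on HTTP error codes."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx
    return response.text

def extract_h2_texts(html):
    """Return the stripped text of every <h2> tag in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]

# Offline demonstration with made-up HTML
sample = "<h2> In the news </h2><p>text</p><h2>On this day</h2>"
h2_texts = extract_h2_texts(sample)
print(h2_texts)
```

For a live page the two functions chain together as `extract_h2_texts(fetch_html(url_to_scrape))`. Keeping the parser separate from the request also lets it be tested against saved HTML.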