In this post, we will use several Python libraries to convert HTML or a web page to an image. We will cover several examples:

  • copy entire page
  • capture screenshots by tag, class and id
  • take image by coordinates
  • libraries
    • html2image
    • pyppeteer
    • Playwright
    • selenium

html2image

We will start with a lightweight Python package, html2image, which acts as a wrapper around the headless mode of existing web browsers to generate images. It can be installed by: pip install html2image

A simple example that takes a screenshot of a page, rendered at the browser window size (1920x1080 by default):

from html2image import Html2Image
hti = Html2Image()

hti.screenshot(url='https://www.python.org', save_as='python_org.png')
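html2image can also render an HTML string instead of a URL. The sketch below assumes the screenshot() parameters html_str, css_str and save_as, and the constructor parameters output_path and size. As far as we know, save_as expects a bare file name rather than a path, so a small helper separates the directory from the name. SAMPLE_HTML, SAMPLE_CSS, split_target() and render_html_string() are hypothetical names used only for illustration:

```python
from pathlib import Path
from typing import Tuple

# Hypothetical sample markup and styling, used only for illustration.
SAMPLE_HTML = "<h1>Hello from html2image</h1>"
SAMPLE_CSS = "body { background: white; } h1 { color: navy; }"


def split_target(target: str) -> Tuple[str, str]:
    """Split 'shots/page.png' into (output directory, file name)."""
    path = Path(target)
    return str(path.parent), path.name


def render_html_string(html: str, css: str, target: str, size=(800, 600)) -> None:
    """Render an HTML string to an image file at the given window size."""
    # Imported here so that split_target() works without html2image installed.
    from html2image import Html2Image

    output_dir, file_name = split_target(target)
    # Assumption: output_path sets the directory, save_as only the file name.
    hti = Html2Image(output_path=output_dir, size=size)
    hti.screenshot(html_str=html, css_str=css, save_as=file_name)
```

Calling render_html_string(SAMPLE_HTML, SAMPLE_CSS, "hello.png") should write hello.png to the current directory, provided a supported browser is installed.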

pyppeteer - local file

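To convert an HTML file to an image using a CSS locator in Python, we can use the pyppeteer library.

It can be installed by: pip install pyppeteer

The example finds the element that matches the locator and clips the screenshot to its bounding box: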

import asyncio
from pathlib import Path

from pyppeteer import launch

async def html_to_image(html_file_path, css_locator, output_image_path):
    browser = await launch()
    page = await browser.newPage()
    # A file:// URL needs an absolute path, so resolve the path first
    await page.goto(Path(html_file_path).resolve().as_uri())
    element = await page.querySelector(css_locator)
    bounding_box = await element.boundingBox()
    await page.screenshot(
        path=output_image_path,
        clip={
            "x": bounding_box["x"],
            "y": bounding_box["y"],
            "width": bounding_box["width"],
            "height": bounding_box["height"],
        },
    )
    await browser.close()

asyncio.run(html_to_image("path/to/file.html", "#my-id", "output/image/path.png"))
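The same clip option can be used to take an image by coordinates, without looking up an element first. The following is a minimal sketch: make_clip() and screenshot_region() are hypothetical helpers, and the padding and clamping behaviour is our own assumption about what is convenient, not something pyppeteer requires:

```python
def make_clip(x: float, y: float, width: float, height: float, padding: float = 0) -> dict:
    """Build a clip rectangle, optionally padded and clamped to the page origin."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Grow the rectangle by the padding, but never start above or left of (0, 0).
    left = max(x - padding, 0)
    top = max(y - padding, 0)
    right = x + width + padding
    bottom = y + height + padding
    return {"x": left, "y": top, "width": right - left, "height": bottom - top}


async def screenshot_region(url: str, clip: dict, output_image_path: str) -> None:
    """Open the URL and save only the clipped region as an image."""
    # Imported here so that make_clip() can be used without a browser installed.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto(url)
        await page.screenshot(path=output_image_path, clip=clip)
    finally:
        # Close the browser even if navigation or the screenshot fails.
        await browser.close()
```

To capture the top-left 400x300 pixels of a page, we could run asyncio.run(screenshot_region("https://www.example.com", make_clip(0, 0, 400, 300), "region.png")) after importing asyncio.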

pyppeteer - save web page as image

The pyppeteer library can also convert live pages to images. We use page.goto() to navigate to the URL, here https://www.example.com:

import asyncio
from pyppeteer import launch

async def live_page_to_image(url, css_locator, output_image_path):
    browser = await launch()
    page = await browser.newPage()
    await page.goto(url)
    element = await page.querySelector(css_locator)
    bounding_box = await element.boundingBox()
    await page.screenshot(
        path=output_image_path,
        clip={
            "x": bounding_box["x"],
            "y": bounding_box["y"],
            "width": bounding_box["width"],
            "height": bounding_box["height"],
        },
    )
    await browser.close()

asyncio.run(live_page_to_image("https://www.example.com", "#my-id", "output/image/path.png"))

pyppeteer - search by tag, class and id

We can extract all div elements with class "my-class" and id "my-id" and save each one as an image:

import asyncio
from pyppeteer import launch

async def extract_to_images(url):
    browser = await launch()
    page = await browser.newPage()
    await page.goto(url)

    elements = await page.querySelectorAll('div.my-class#my-id')

    for i, element in enumerate(elements):
        # Take a screenshot of the element (returned as PNG bytes)
        element_screenshot = await element.screenshot()

        # Save the screenshot to a file (the "output" directory must already exist)
        output_image_path = f"output/image_{i}.png"
        with open(output_image_path, "wb") as f:
            f.write(element_screenshot)

    await browser.close()

asyncio.run(extract_to_images("https://www.example.com"))
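The selector 'div.my-class#my-id' combines a tag, a class and an id in one string. When these parts come from variables, a small helper can assemble the selector. build_selector() below is a hypothetical helper, not part of pyppeteer, and it assumes the names contain no characters that need CSS escaping:

```python
from typing import Iterable, Optional


def build_selector(tag: str = "", classes: Iterable[str] = (), element_id: Optional[str] = None) -> str:
    """Combine a tag name, class names and an id into one CSS selector."""
    parts = [tag]
    if element_id:
        parts.append(f"#{element_id}")
    # Each class name is appended with a leading dot: .first.second
    parts.extend(f".{name}" for name in classes)
    selector = "".join(parts)
    if not selector:
        raise ValueError("at least one of tag, classes or element_id is required")
    return selector
```

The result can be passed directly to page.querySelectorAll(). For example, build_selector("div", ["my-class"], "my-id") gives 'div#my-id.my-class', which matches the same elements as 'div.my-class#my-id'.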

Playwright

Alternatively, we can use the Python library Playwright to take screenshots of web pages.

To capture screenshots in Playwright, we can use the screenshot() method of the Page class. To select elements by tag and class, we can use the query_selector_all() method:

from playwright.sync_api import sync_playwright

def extract_to_images(url: str):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        # Select all div elements with class "my-class"
        elements = page.query_selector_all('div.my-class')

        for i, element in enumerate(elements):
            # Take a screenshot of the element
            element_screenshot = element.screenshot()

            # Save the screenshot to a file (the "output" directory must already exist)
            output_image_path = f"output/image_{i}.png"
            with open(output_image_path, "wb") as f:
                f.write(element_screenshot)

        browser.close()

extract_to_images("https://www.example.com")
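Playwright can also copy the entire page: screenshot() accepts full_page=True to capture the whole scrollable page instead of only the viewport, and a path argument to write the file directly. The sketch below combines this with the per-element capture. numbered_path() and capture_page_and_elements() are hypothetical helpers; the first one creates the output directory so that writing the files does not depend on it already existing:

```python
from pathlib import Path
from typing import List


def numbered_path(directory: str, index: int, prefix: str = "image") -> Path:
    """Return directory/prefix_index.png, creating the directory if needed."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{prefix}_{index}.png"


def capture_page_and_elements(url: str, selector: str, directory: str = "output") -> List[Path]:
    """Save one full-page screenshot plus one image per matching element."""
    # Imported here so that numbered_path() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    saved = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        # full_page=True captures the whole scrollable page, not just the viewport.
        full_page_target = numbered_path(directory, 0, prefix="full_page")
        page.screenshot(path=str(full_page_target), full_page=True)
        saved.append(full_page_target)

        # Passing path= lets Playwright write each element image itself.
        for i, element in enumerate(page.query_selector_all(selector)):
            target = numbered_path(directory, i)
            element.screenshot(path=str(target))
            saved.append(target)

        browser.close()
    return saved
```

A call such as capture_page_and_elements("https://www.example.com", "div.my-class") returns the list of files it wrote.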

selenium

We can also capture screenshots with Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

def extract_to_images(url: str):
    # Set up Firefox driver; Selenium 4 takes the geckodriver path through a Service object
    driver = webdriver.Firefox(service=Service(executable_path='/path/to/geckodriver'))

    # Navigate to URL
    driver.get(url)

    # Select all div elements with class "my-class"
    elements = driver.find_elements(By.CSS_SELECTOR, 'div.my-class')

    for i, element in enumerate(elements):
        # Take a screenshot of the element
        element_screenshot = element.screenshot_as_png

        # Save the screenshot to a file (the "output" directory must already exist)
        output_image_path = f"output/image_{i}.png"
        with open(output_image_path, "wb") as f:
            f.write(element_screenshot)

    driver.quit()

extract_to_images("https://www.example.com")

Firefox

Finally, we will look at two examples of how to take a screenshot manually in Firefox.

To take a screenshot of a page in Firefox, we can do: