In this short guide, we will learn how to extract all text content from a webpage using Selenium in Python. Whether you're web scraping, testing web applications, or extracting data for analysis, getting visible text from pages is a fundamental Selenium operation.

Here you can find the short answer:

(1) Get all visible text

text = driver.find_element(By.TAG_NAME, 'body').text

(2) Get text from specific element

text = driver.find_element(By.CLASS_NAME, 'content').text

(3) Get inner HTML

html = driver.find_element(By.TAG_NAME, 'body').get_attribute('innerHTML')

So let's see multiple methods to extract text from webpages using Selenium.

1: Get All Visible Text from Entire Page

The simplest method to get all visible text from a webpage is accessing the body element:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.python.org')

page_text = driver.find_element(By.TAG_NAME, 'body').text

print(f"Total characters: {len(page_text)}")
print(f"\nFirst 500 characters:\n{page_text[:500]}")

driver.quit()

Output Result:

Total characters: 4523

First 500 characters:
Python
Python is a programming language that lets you work quickly and integrate systems more effectively.
Learn More

Get Started
Whether you're new to programming or an experienced developer, it's easy to learn and use Python.
Start with our Beginner's Guide

Download
Python source code and installers are available for Windows, Linux, macOS, and other platforms.
Latest: Python 3.12.1

Docs
Documentation for Python's standard library, along with tutorials and guides.
Browse Documentation

Jobs
Looking for work or looking to hire? The Python Job...

Key features:

  • Returns only visible text (hidden elements excluded)
  • Preserves line breaks between elements
  • No HTML tags included
  • Fast on most pages, since it takes only two WebDriver commands (locate the body, read its text)

2: Get Text from Specific Elements

Extract text from specific sections like headers, paragraphs, or divs:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://news.ycombinator.com')

titles = driver.find_elements(By.CLASS_NAME, 'titleline')

print(f"Found {len(titles)} article titles:\n")

for idx, title in enumerate(titles[:10], 1):
    print(f"{idx}. {title.text}")

driver.quit()

Output Result:

Found 30 article titles:

1. Show HN: AI-powered code review tool for Python
2. Understanding Database Indexing in PostgreSQL
3. Building Scalable APIs with FastAPI
4. Machine Learning Best Practices for Production
5. Why Rust is the Future of Systems Programming
6. Docker vs Kubernetes: When to Use Each
7. Advanced Python Decorators Explained
8. Microservices Architecture Patterns
9. OAuth 2.0 Authentication Guide
10. GraphQL vs REST: A Practical Comparison

3: Find Text on Page (Search for Specific Text)

Search for specific text on a page and verify its presence:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.github.com')

page_text = driver.find_element(By.TAG_NAME, 'body').text

search_terms = ['developers', 'open source', 'repositories', 'collaboration']

print("Searching for keywords on GitHub homepage:\n")

for term in search_terms:
    if term.lower() in page_text.lower():
        print(f"✓ Found: '{term}'")
        
        occurrences = page_text.lower().count(term.lower())
        print(f"  Appears {occurrences} time(s)\n")
    else:
        print(f"✗ Not found: '{term}'\n")

driver.quit()

Output Result:

Searching for keywords on GitHub homepage:

✓ Found: 'developers'
  Appears 8 time(s)

✓ Found: 'open source'
  Appears 5 time(s)

✓ Found: 'repositories'
  Appears 12 time(s)

✓ Found: 'collaboration'
  Appears 3 time(s)
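The substring check above also counts matches inside longer words (for example, 'source' inside 'resources'). A minimal sketch of whole-word counting follows; the helper name count_whole_words and the sample text are made up for illustration, and it assumes the search terms start and end with a letter or digit:

```python
import re


def count_whole_words(page_text, terms):
    """Return {term: count} using case-insensitive whole-word matching."""
    counts = {}
    for term in terms:
        # \b anchors keep 'source' from matching inside 'resources';
        # re.escape protects terms that contain regex metacharacters.
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, page_text, flags=re.IGNORECASE))
    return counts


# Hypothetical sample text standing in for driver.find_element(...).text
sample_text = "Open source resources for developers. Developers love open source."
print(count_whole_words(sample_text, ["open source", "source", "developer"]))
# {'open source': 2, 'source': 2, 'developer': 0}
```

Note that 'developer' is reported as 0 here because only the plural appears; whether that is the desired behavior depends on what you are verifying.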

4: Get Text with XPath Selector

Use XPath for precise text extraction from complex page structures:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.wikipedia.org')

heading = driver.find_element(By.XPATH, '//h1[@class="central-textlogo"]').text
print(f"Main heading: {heading}")

language_links = driver.find_elements(By.XPATH, '//div[@class="central-featured-lang"]//strong')

print(f"\nTop {len(language_links)} languages:")
for idx, lang in enumerate(language_links, 1):
    print(f"{idx}. {lang.text}")

driver.quit()

Output Result:

Main heading: WIKIPEDIA

Top 10 languages:
1. English
2. 日本語
3. Español
4. Deutsch
5. Русский
6. Français
7. Italiano
8. 中文
9. Português
10. Polski
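XPath can also select elements by the text they contain, which ties in with the search from the previous section. The sketch below builds such an expression safely; xpath_literal and xpath_for_text are hypothetical helper names, not part of Selenium, and the result is meant to be passed to driver.find_elements(By.XPATH, ...):

```python
def xpath_literal(value):
    """Quote a Python string for use inside an XPath 1.0 expression."""
    if '"' not in value:
        return f'"{value}"'
    if "'" not in value:
        return f"'{value}'"
    # Both quote types present: XPath 1.0 has no escape character,
    # so stitch the pieces together with concat().
    parts = value.split('"')
    return "concat(" + ", '\"', ".join(f'"{part}"' for part in parts) + ")"


def xpath_for_text(tag, text):
    """XPath selecting <tag> elements whose whitespace-normalized text contains `text`."""
    return f"//{tag}[contains(normalize-space(.), {xpath_literal(text)})]"


print(xpath_for_text("a", "Read Wikipedia"))
# //a[contains(normalize-space(.), "Read Wikipedia")]
```

normalize-space(.) collapses runs of whitespace before matching, so line breaks in the page markup do not cause a miss.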

5: Get Inner HTML vs Text Content

Understand the difference between text and innerHTML:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://example.com')

element = driver.find_element(By.TAG_NAME, 'body')

text_content = element.text
inner_html = element.get_attribute('innerHTML')
outer_html = element.get_attribute('outerHTML')

print(f"Text content length: {len(text_content)} characters")
print(f"Inner HTML length: {len(inner_html)} characters")
print(f"Outer HTML length: {len(outer_html)} characters")

print(f"\nText content (visible only):\n{text_content[:200]}")
print(f"\nInner HTML (includes tags):\n{inner_html[:200]}")

driver.quit()

Output Result:

Text content length: 234 characters
Inner HTML length: 1456 characters
Outer HTML length: 1468 characters

Text content (visible only):
Example Domain
This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.
More information...

Inner HTML (includes tags):
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>

When to use each:

  • .text - Human-readable content, visible text only
  • innerHTML - HTML structure with tags, includes hidden elements
  • outerHTML - Complete element including wrapper tag
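If you need the text of hidden elements too, but without tags, the DOM property textContent is a possible middle ground between .text and innerHTML. This is a small sketch; text_views is a hypothetical helper, and it assumes the argument behaves like a Selenium WebElement:

```python
def text_views(element):
    """Collect three views of one element's content.

    `element` is expected to behave like a Selenium WebElement, i.e. expose
    a `.text` property and a `get_attribute()` method.
    """
    return {
        # Rendered text only; hidden elements are skipped
        "visible_text": element.text,
        # Text of all descendant nodes, hidden ones included, no tags
        "text_content": element.get_attribute("textContent"),
        # Markup inside the element
        "inner_html": element.get_attribute("innerHTML"),
    }


# Usage with a live driver (not executed here):
#
#     views = text_views(driver.find_element(By.TAG_NAME, 'body'))
#     print(len(views["visible_text"]), len(views["text_content"]))
```

Keep in mind that textContent on the body typically also includes the contents of inline script and style elements, so it may need filtering.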

6: Extract Text from Multiple Pages

Scrape text from multiple pages efficiently:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()

urls = [
    'https://www.python.org',
    'https://www.github.com',
    'https://stackoverflow.com'
]

results = {}

for url in urls:
    driver.get(url)
    time.sleep(2)
    
    page_text = driver.find_element(By.TAG_NAME, 'body').text
    
    results[url] = {
        'text_length': len(page_text),
        'word_count': len(page_text.split()),
        'preview': page_text[:100]
    }

driver.quit()

print("Text extraction summary:\n")
for url, data in results.items():
    print(f"URL: {url}")
    print(f"  Characters: {data['text_length']:,}")
    print(f"  Words: {data['word_count']:,}")
    print(f"  Preview: {data['preview']}...\n")

Output Result:

Text extraction summary:

URL: https://www.python.org
  Characters: 4,523
  Words: 678
  Preview: Python Python is a programming language that lets you work quickly and integrate systems...

URL: https://www.github.com
  Characters: 8,934
  Words: 1,245
  Preview: GitHub Where the world builds software Millions of developers and companies build, ship...

URL: https://stackoverflow.com
  Characters: 12,456
  Words: 1,892
  Preview: Stack Overflow - Where Developers Learn, Share, & Build Careers Every developer has a...
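In the loop above, a single page that fails to load would abort the whole run. One way to make it more tolerant is sketched below; summarize_text and scrape_all are hypothetical helpers, and fetch_text is an assumed callable that wraps the driver calls:

```python
def summarize_text(page_text, preview_len=100):
    """Build the same summary fields used in the loop above."""
    return {
        "text_length": len(page_text),
        "word_count": len(page_text.split()),
        "preview": page_text[:preview_len],
    }


def scrape_all(urls, fetch_text):
    """Call fetch_text(url) for every URL, recording failures instead of aborting."""
    results = {}
    for url in urls:
        try:
            results[url] = summarize_text(fetch_text(url))
        except Exception as exc:  # broad on purpose: one bad page should not stop the run
            results[url] = {"error": f"{type(exc).__name__}: {exc}"}
    return results


# With a live driver, fetch_text could be defined like this (not executed here):
#
#     def fetch_text(url):
#         driver.get(url)
#         return driver.find_element(By.TAG_NAME, 'body').text
#
#     results = scrape_all(urls, fetch_text)
```

Passing the fetch step in as a function also makes the summary logic easy to try out without starting a browser.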

7: Handle Dynamic Content (Wait for Text)

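Wait for text to appear on pages with JavaScript-loaded content. The example waits until the element is visible rather than merely present in the DOM, because .text returns an empty string for elements that are not displayed:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.example.com')

try:
    # Poll for up to 10 seconds until the element is displayed
    element = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.CLASS_NAME, 'dynamic-content'))
    )

    text = element.text
    print(f"Dynamic content loaded:\n{text}")

except TimeoutException as e:
    # until() raises TimeoutException when the condition is never met
    print(f"Element not found: {e}")

finally:
    # Close the browser even if the wait fails
    driver.quit()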

Output Result:

Dynamic content loaded:
Welcome to our website! This content was loaded dynamically after page load.
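An element can exist in the DOM before its text has been filled in, so it can help to wait for the text itself. WebDriverWait.until() accepts any callable that takes the driver, which makes a custom condition straightforward. The sketch below uses a hypothetical helper named element_has_text:

```python
def element_has_text(locator):
    """Build a wait condition that passes once the located element has non-blank text.

    `locator` is a (by, value) tuple such as (By.CLASS_NAME, 'dynamic-content').
    """
    def condition(driver):
        text = driver.find_element(*locator).text.strip()
        # WebDriverWait.until() keeps polling while the result is falsy,
        # so returning False for blank text means "try again".
        return text or False
    return condition


# Usage with a live driver (not executed here):
#
#     text = WebDriverWait(driver, 10).until(
#         element_has_text((By.CLASS_NAME, 'dynamic-content'))
#     )
```

When you already know which text to expect, the built-in EC.text_to_be_present_in_element(locator, text) condition may be simpler.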

8: Get Text Excluding Hidden Elements

Filter out hidden elements to get only truly visible text:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.example.com')

all_elements = driver.find_elements(By.XPATH, '//*')

visible_text = []

for element in all_elements:
    if element.is_displayed() and element.text.strip():
        text = element.text.strip()
        if text not in visible_text:
            visible_text.append(text)

print(f"Unique visible text blocks: {len(visible_text)}\n")

for idx, text in enumerate(visible_text[:10], 1):
    print(f"{idx}. {text[:80]}...")

driver.quit()

Output Result:

Unique visible text blocks: 45

1. Example Domain...
2. This domain is for use in illustrative examples in documents. You may use...
3. More information......
4. IANA Services...
5. Domain Names...
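The loop above sends separate WebDriver commands for every element on the page, and parent elements repeat the text of their children. Since .text on the body already leaves out hidden elements (see section 1), a lighter alternative is to de-duplicate the lines of that single string. This is a sketch with a hypothetical helper name and made-up sample text:

```python
def unique_lines(page_text):
    """Return non-blank lines of page_text, stripped, first occurrence only."""
    stripped = (line.strip() for line in page_text.splitlines())
    # dict.fromkeys keeps insertion order, so duplicates are dropped
    # without reshuffling the remaining lines.
    return list(dict.fromkeys(line for line in stripped if line))


# Hypothetical sample standing in for driver.find_element(By.TAG_NAME, 'body').text
sample = "Example Domain\n\nMore information...\nExample Domain\n  IANA Services  "
print(unique_lines(sample))
# ['Example Domain', 'More information...', 'IANA Services']
```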

9: Extract Text and Save to File

Save extracted text to a file for later analysis:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.python.org')

page_text = driver.find_element(By.TAG_NAME, 'body').text

output_file = 'python_org_text.txt'

with open(output_file, 'w', encoding='utf-8') as f:
    f.write(f"URL: {driver.current_url}\n")
    f.write(f"Title: {driver.title}\n")
    f.write(f"{'='*80}\n\n")
    f.write(page_text)

print(f"✓ Text saved to {output_file}")
print(f"  Text length: {len(page_text):,} characters")

driver.quit()

Output Result:

✓ Text saved to python_org_text.txt
  Text length: 4,523 characters
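When saving text from several pages, a hard-coded file name no longer works. One possible approach is to derive the name from the URL, as in this sketch; filename_for_url is a hypothetical helper and the character whitelist is an assumption you may want to adjust:

```python
import re
from urllib.parse import urlparse


def filename_for_url(url, suffix=".txt"):
    """Turn a URL into a filesystem-friendly file name."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    # Replace every run of characters other than letters, digits, '.', '_' and '-'
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    return (safe or "page") + suffix


print(filename_for_url("https://www.python.org"))             # www.python.org.txt
print(filename_for_url("https://www.python.org/downloads/"))  # www.python.org_downloads.txt
```

With a live driver, filename_for_url(driver.current_url) could replace the fixed output_file value.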

Troubleshooting

Problem: Empty string returned

Solution: Wait for page to load completely:

from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 10).until(lambda d: d.find_element(By.TAG_NAME, 'body').text != '')

Problem: Text contains extra whitespace

Solution: Clean text with string methods:

text = driver.find_element(By.TAG_NAME, 'body').text
clean_text = ' '.join(text.split())
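The one-liner above also removes every line break, which flattens the page into a single line. If the line structure matters, a variant like the following sketch may be preferable; clean_keep_lines is a hypothetical helper name:

```python
def clean_keep_lines(text):
    """Collapse runs of whitespace inside each line and drop blank lines."""
    cleaned = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in cleaned if line)


# Made-up sample with extra spaces, a tab and blank lines
raw = "  Get   Started \n\n\n Download\tnow  "
print(repr(clean_keep_lines(raw)))  # 'Get Started\nDownload now'
```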

Problem: Special characters display incorrectly

Solution: Specify UTF-8 encoding:

with open('output.txt', 'w', encoding='utf-8') as f:
    f.write(page_text)

Problem: StaleElementReferenceException

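Solution: The reference goes stale when the page re-renders or navigates after the element was found. Re-locate the element right before accessing its text:

# Look the element up again instead of reusing an old reference
element = driver.find_element(By.ID, 'content')
text = element.text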
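On pages that re-render often, the element can go stale again between the lookup and the read. A small retry wrapper is one way to handle that. The sketch below is generic on purpose; retry_on is a hypothetical helper, and the number of attempts is an arbitrary choice:

```python
def retry_on(exc_types, action, attempts=3):
    """Call action() until it succeeds or `attempts` tries are used up.

    Only exceptions listed in exc_types trigger another try; the last
    one is re-raised if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exc_types:
            if attempt == attempts:
                raise


# Usage with a live driver (not executed here):
#
#     from selenium.common.exceptions import StaleElementReferenceException
#
#     text = retry_on(
#         StaleElementReferenceException,
#         lambda: driver.find_element(By.ID, 'content').text,
#     )
```

The lambda re-locates the element on every attempt, so each try works with a fresh reference.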

Selenium Python vs Java

To get the full HTML source of the page:

  • python
driver.page_source
  • java / groovy
driver.getPageSource();

You can get only the contents of the body, as HTML markup rather than visible text, with:

  • python
element = driver.find_element(By.TAG_NAME, "body")
element.get_attribute('innerHTML')
  • java / groovy
element.getAttribute("innerHTML");

The code above works in most cases but may fail with some drivers (such as HtmlUnitDriver). The following Java code produces similar output and works more widely:

WebElement element = driver.findElement(By.id("foo"));
String contents = (String) ((JavascriptExecutor) driver).executeScript("return arguments[0].innerHTML;", element);
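The same JavaScript fallback can be written in Python with execute_script. This is a minimal sketch; inner_html_via_js is a hypothetical helper name, and the element ID "foo" in the usage comment is carried over from the Java snippet:

```python
def inner_html_via_js(driver, element):
    """Read an element's innerHTML by running JavaScript in the page."""
    # execute_script passes extra arguments to the script as
    # arguments[0], arguments[1], ...
    return driver.execute_script("return arguments[0].innerHTML;", element)


# Usage with a live driver (not executed here):
#
#     element = driver.find_element(By.ID, "foo")
#     contents = inner_html_via_js(driver, element)
```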

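Full example for python:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('./chromedriver_linux64/chromedriver'))
driver.maximize_window()
driver.get("https://www.google.com/ncr")
print(driver.find_element(By.TAG_NAME, "body").text)

driver.quit()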

result:

Gmail
Images
Sign in
Google offered in: french
A privacy reminder from Google
REMIND ME LATER
REVIEW NOW
France
PrivacyTermsSettings
AdvertisingBusinessAbout

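Note that if you don't provide the path to your chromedriver executable and it is not on your PATH, you may get an error like the following (Selenium 4.6 and later can usually locate or download a matching driver automatically through Selenium Manager):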

FileNotFoundError: [Errno 2] No such file or directory: 'chromedriver': 'chromedriver'

os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home