How to Compare Text Differences in Python Using difflib

In this short tutorial, you'll see how to compare sequences and find differences in Python using the built-in difflib module. Whether you're building version control tools, plagiarism detection, data deduplication systems, or text diff viewers, difflib provides powerful algorithms for sequence comparison, similarity scoring, and fuzzy matching without external dependencies.

The difflib module offers several classes and functions for comparing strings, lists, and files, which makes it a practical choice for text analysis, code review tools, and content management systems.

1. Compare Similarity with SequenceMatcher

SequenceMatcher calculates similarity ratios between sequences, returning values from 0.0 (completely different) to 1.0 (identical).

from difflib import SequenceMatcher

text1 = "Apple Inc is a technology company"
text2 = "Apple Inc is a tech company"

matcher = SequenceMatcher(None, text1, text2)
similarity = matcher.ratio()

print(f"Text 1: {text1}")
print(f"Text 2: {text2}")
print(f"Similarity: {similarity:.2%}")

Output Result:

Text 1: Apple Inc is a technology company
Text 2: Apple Inc is a tech company
Similarity: 90.00%

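How it works: The ratio() method returns 2.0 * M / T, where M is the number of matching elements and T is the total number of elements in both sequences. The matching follows an approach similar to the Ratcliff/Obershelp "gestalt pattern matching" algorithm: find the longest contiguous matching block, then repeat on the pieces to its left and right. This is useful for duplicate detection, fuzzy text matching, and content similarity analysis.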
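To make the score less of a black box, the following sketch recomputes it by hand from the matching blocks that SequenceMatcher reports. The variable names (a, b, matched, manual_ratio) are only illustrative; get_matching_blocks() and ratio() are the standard difflib calls.

```python
from difflib import SequenceMatcher

a = "Apple Inc is a technology company"
b = "Apple Inc is a tech company"

matcher = SequenceMatcher(None, a, b)

# Each block is a Match(a, b, size) tuple. The final block is a
# zero-length sentinel and contributes nothing to the sum.
blocks = matcher.get_matching_blocks()
matched = sum(block.size for block in blocks)

# Recompute the score by hand: 2.0 * M / T
manual_ratio = 2.0 * matched / (len(a) + len(b))

# Show which substrings were matched
for block in blocks:
    if block.size:
        print(repr(a[block.a:block.a + block.size]))

print(f"Matched characters: {matched}")
print(f"Manual ratio: {manual_ratio:.4f}")
print(f"ratio():      {matcher.ratio():.4f}")
```

The two printed ratios should agree, which confirms that the score depends only on how many characters fall inside matching blocks and on the combined length of the inputs.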

Real-World Example: Company Name Matching

from difflib import SequenceMatcher

database_names = ["Microsoft Corporation", "Apple Inc", "Amazon.com Inc", "Google LLC"]
user_input = "Microsft Corp"

for company in database_names:
    ratio = SequenceMatcher(None, user_input.lower(), company.lower()).ratio()
    print(f"{company}: {ratio:.2%} match")

Output Result:

Microsoft Corporation: 76.47% match
Apple Inc: 18.18% match
Amazon.com Inc: 29.63% match
Google LLC: 8.70% match

Use case: Identify typos, variations, or abbreviations in user input for search suggestions, autocomplete, and data cleaning.
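In practice you usually want the single best candidate rather than a list of scores. The sketch below is one possible way to do that; best_match and its threshold parameter are hypothetical names introduced here, not part of difflib. It relies on the documented behaviour that SequenceMatcher caches information about its second sequence, so the query is set once with set_seq2() and each candidate is swapped in with set_seq1().

```python
from difflib import SequenceMatcher


def best_match(query, candidates, threshold=0.6):
    """Return (candidate, score) for the best match, or None if no
    candidate reaches the threshold. Hypothetical helper for illustration."""
    matcher = SequenceMatcher()
    # The second sequence is analysed once and cached by SequenceMatcher
    matcher.set_seq2(query.lower())

    best_name, best_score = None, 0.0
    for name in candidates:
        matcher.set_seq1(name.lower())
        score = matcher.ratio()
        if score > best_score:
            best_name, best_score = name, score

    if best_score < threshold:
        return None
    return best_name, best_score


database_names = ["Microsoft Corporation", "Apple Inc", "Amazon.com Inc", "Google LLC"]

print(best_match("Microsft Corp", database_names))
print(best_match("zzz", database_names))
```

The threshold of 0.6 is an assumption chosen to mirror the default cutoff of get_close_matches(); tune it against your own data. Note that ratio() can give slightly different values depending on which string is passed as the first sequence.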

2. Find Differences with Differ

The Differ class compares sequences of lines and produces a human-readable delta that marks additions, deletions, and unchanged lines, similar in spirit to the output of the Unix diff command.

from difflib import Differ

original = ["Apple", "Google", "Microsoft", "Amazon"]
modified = ["Apple", "Meta", "Microsoft", "Tesla"]

differ = Differ()
diff = list(differ.compare(original, modified))

for line in diff:
    print(line)

Output Result:

  Apple
- Google
+ Meta
  Microsoft
- Amazon
+ Tesla

Legend:

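- indicates items present only in the first sequence (deleted)
+ indicates items present only in the second sequence (added)
(space) indicates items common to both sequences (unchanged)
? marks a hint line that points at the changed characters; it appears only when two lines are similar but not identical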
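Besides the three prefixes in the legend, Differ can emit a fourth kind of line starting with "? " when two lines are nearly identical. The small sketch below, with made-up sample data, shows when that happens: the hint line uses markers such as + and ^ to point at the characters that differ.

```python
from difflib import Differ

# Lines end with '\n', as they would when read from a file
before = ["Apple Inc\n", "Google LLC\n"]
after = ["Apple Inc.\n", "Google LLC\n"]

diff_lines = list(Differ().compare(before, after))

for line in diff_lines:
    print(line, end="")

# Collect the two-character prefix of every output line
prefixes = [line[:2] for line in diff_lines]
print(prefixes)
```

Because the first pair of lines differs by a single character, Differ treats it as an edit of one line rather than an unrelated delete and insert, and adds the "? " guide line. The hint lines are meant for human readers, so filter them out if you process the delta programmatically.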

Unified Diff for Version Control

from difflib import unified_diff

code_v1 = """def calculate_revenue(price, quantity):
    return price * quantity
    
revenue = calculate_revenue(100, 50)"""

code_v2 = """def calculate_revenue(price, quantity, discount=0):
    subtotal = price * quantity
    return subtotal * (1 - discount)
    
revenue = calculate_revenue(100, 50, 0.1)"""

diff = unified_diff(
    code_v1.splitlines(),
    code_v2.splitlines(),
    fromfile='v1.py',
    tofile='v2.py',
    lineterm=''
)

# The input lines and the generated headers carry no trailing newline,
# so join the diff with '\n' instead of ''
print('\n'.join(diff))

Output Result:

--- v1.py
+++ v2.py
@@ -1,4 +1,5 @@
-def calculate_revenue(price, quantity):
-    return price * quantity
+def calculate_revenue(price, quantity, discount=0):
+    subtotal = price * quantity
+    return subtotal * (1 - discount)
     
-revenue = calculate_revenue(100, 50)
+revenue = calculate_revenue(100, 50, 0.1)

Real-world application: Generate Git-style diffs for code review systems, audit trails, or document version control.
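For audit trails or review tools it can be handy to wrap the call in a small function and control how much surrounding context is shown. The sketch below is one way to do it; make_patch, its name and context parameters, and the a/ and b/ path prefixes are assumptions made for illustration, while n and lineterm are real unified_diff arguments.

```python
from difflib import unified_diff


def make_patch(old_text, new_text, name="file.txt", context=3):
    """Return a unified diff of two strings as a single string.
    Hypothetical helper for illustration."""
    lines = unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
        n=context,      # number of unchanged context lines around each change
        lineterm="",    # inputs have no line endings, so headers get none either
    )
    return "\n".join(lines)


old = "alpha\nbeta\ngamma\ndelta\n"
new = "alpha\nbeta\ngamma!\ndelta\n"

print(make_patch(old, new))             # default: up to 3 context lines
print("-----")
print(make_patch(old, new, context=0))  # changed lines only
```

When the two texts are identical, unified_diff yields nothing at all, so make_patch returns an empty string. That makes "is there any change?" a simple truthiness check.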

3. Find Closest Matches with get_close_matches

The get_close_matches() function finds similar strings from a list, perfect for spell checking, search suggestions, and data validation.

from difflib import get_close_matches

companies = [
    "Microsoft Corporation",
    "Apple Inc",
    "Amazon.com Inc", 
    "Alphabet Inc",
    "Meta Platforms",
    "Tesla Inc"
]

user_query = "Amazn"

matches = get_close_matches(user_query, companies, n=3, cutoff=0.3)

print(f"Search query: '{user_query}'")
print(f"Closest matches: {matches}")

Output Result:

Search query: 'Amazn'
Closest matches: ['Amazon.com Inc', 'Alphabet Inc']

Parameters:

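n: Maximum number of matches to return (default 3)
cutoff: Minimum similarity score from 0.0 to 1.0 (default 0.6); candidates scoring below it are dropped

Matches are returned best-first, and the comparison is case-sensitive.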
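The choice of cutoff matters more than it might seem. As a rough illustration using the same company list, the sketch below runs one query with the default cutoff and again with a lower one; the variable names default_hits and relaxed_hits are only illustrative.

```python
from difflib import get_close_matches

companies = [
    "Microsoft Corporation",
    "Apple Inc",
    "Amazon.com Inc",
    "Alphabet Inc",
    "Meta Platforms",
    "Tesla Inc",
]

# Default cutoff is 0.6: a short query against a long name may score too low
default_hits = get_close_matches("Amazn", companies)

# A lower cutoff admits the intended match; n=1 keeps only the best one
relaxed_hits = get_close_matches("Amazn", companies, n=1, cutoff=0.5)

print(f"cutoff=0.6 (default): {default_hits}")
print(f"cutoff=0.5, n=1:      {relaxed_hits}")
```

Short queries matched against long names score low because the ratio is divided by the combined length of both strings, so a lower cutoff is often needed for abbreviations and partial input. The trade-off is more false positives, which n helps to contain.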

Product Search with Fuzzy Matching

from difflib import get_close_matches

products = [
    "iPhone 15 Pro Max",
    "Samsung Galaxy S24 Ultra",
    "Google Pixel 8 Pro",
    "MacBook Pro 16-inch",
    "iPad Pro 12.9-inch"
]

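search_terms = ["iphone pro", "macbok", "samsung galaxy"]

# get_close_matches is case-sensitive, so compare lowercase keys
# and map each hit back to the original product name
lookup = {name.lower(): name for name in products}

for search in search_terms:
    hits = get_close_matches(search.lower(), list(lookup), n=2, cutoff=0.4)
    results = [lookup[hit] for hit in hits]
    print(f"'{search}' → {results}")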

Output Result:

'iphone pro' → ['iPhone 15 Pro Max', 'iPad Pro 12.9-inch']
'macbok' → ['MacBook Pro 16-inch']
'samsung galaxy' → ['Samsung Galaxy S24 Ultra']

References

difflib — Helpers for computing deltas
