In this short tutorial, you'll see how to compare sequences and find differences in Python using the built-in difflib module. Whether you're building version control tools, plagiarism detection, data deduplication systems, or text diff viewers, difflib provides powerful algorithms for sequence comparison, similarity scoring, and fuzzy matching without external dependencies.
The difflib module offers classes and functions for comparing strings, lists, and files, which makes it useful for text analysis, code review tools, and content management systems.
1. Compare Similarity with SequenceMatcher
SequenceMatcher calculates similarity ratios between sequences, returning values from 0.0 (completely different) to 1.0 (identical).
from difflib import SequenceMatcher
text1 = "Apple Inc is a technology company"
text2 = "Apple Inc is a tech company"
matcher = SequenceMatcher(None, text1, text2)
similarity = matcher.ratio()
print(f"Text 1: {text1}")
print(f"Text 2: {text2}")
print(f"Similarity: {similarity:.2%}")
Output Result:
Text 1: Apple Inc is a technology company
Text 2: Apple Inc is a tech company
Similarity: 90.00%
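How it works: The ratio() method is based on the Ratcliff/Obershelp "gestalt pattern matching" approach. It repeatedly finds the longest matching blocks and returns 2 * M / T, where M is the number of matched elements and T is the combined length of both sequences. Here 27 of the 60 total characters are matched on each side, giving 2 * 27 / 60 = 0.90. This makes it useful for duplicate detection, fuzzy text matching, and content similarity analysis.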
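The score can be reproduced by hand from the matching blocks. The following is a minimal sketch that reuses the two strings from the example; the variable names matched, total, and manual_ratio are introduced here for illustration only.

```python
from difflib import SequenceMatcher

text1 = "Apple Inc is a technology company"
text2 = "Apple Inc is a tech company"

matcher = SequenceMatcher(None, text1, text2)

# Each block is (start in text1, start in text2, size). The final block is a
# zero-length sentinel, so it adds nothing to the sum.
blocks = matcher.get_matching_blocks()
matched = sum(block.size for block in blocks)
total = len(text1) + len(text2)

# ratio() is defined as 2 * M / T.
manual_ratio = 2 * matched / total

print(f"Matched characters: {matched}")
print(f"Total characters: {total}")
print(f"Manual ratio: {manual_ratio:.2%}")
```

Because the score depends on the combined length, a long unmatched tail on either string lowers it even when the shorter string is matched completely.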
Real-World Example: Company Name Matching
from difflib import SequenceMatcher
database_names = ["Microsoft Corporation", "Apple Inc", "Amazon.com Inc", "Google LLC"]
user_input = "Microsft Corp"
for company in database_names:
    ratio = SequenceMatcher(None, user_input.lower(), company.lower()).ratio()
    print(f"{company}: {ratio:.2%} match")
Output Result:
Microsoft Corporation: 76.47% match
Apple Inc: 18.18% match
Amazon.com Inc: 29.63% match
Google LLC: 8.70% match
Use case: Identify typos, variations, or abbreviations in user input for search suggestions, autocomplete, and data cleaning.
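In practice you usually want the single best candidate, and only if it is good enough. The sketch below shows one way to do that; best_match and its threshold default of 0.6 are hypothetical choices for this example, not part of difflib.

```python
from difflib import SequenceMatcher


def best_match(query, candidates, threshold=0.6):
    """Return (candidate, score) for the highest-scoring candidate,
    or None if nothing reaches the threshold. Comparison ignores case."""
    scored = [
        (SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    if not scored:
        return None
    score, name = max(scored)
    return (name, score) if score >= threshold else None


database_names = ["Microsoft Corporation", "Apple Inc", "Amazon.com Inc", "Google LLC"]

print(best_match("Microsft Corp", database_names))
print(best_match("Netflix", database_names))
```

The threshold guards against returning a "best" match that is still a poor one, which matters when the query is not in the database at all.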
2. Find Differences with Differ
The Differ class produces a human-readable, item-by-item diff that marks additions, deletions, and unchanged lines, similar to the output of the Unix diff command.
from difflib import Differ
original = ["Apple", "Google", "Microsoft", "Amazon"]
modified = ["Apple", "Meta", "Microsoft", "Tesla"]
differ = Differ()
diff = list(differ.compare(original, modified))
for line in diff:
    print(line)
Output Result:
  Apple
- Google
+ Meta
  Microsoft
- Amazon
+ Tesla
Legend:
- "-" indicates deleted items
- "+" indicates added items
- " " (space) indicates unchanged items
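Since every line starts with a two-character code, the output is easy to post-process. Here is a minimal sketch that collects the added and removed items into lists; the names removed and added are chosen for this example.

```python
from difflib import Differ

original = ["Apple", "Google", "Microsoft", "Amazon"]
modified = ["Apple", "Meta", "Microsoft", "Tesla"]

removed, added = [], []
for line in Differ().compare(original, modified):
    code, item = line[:2], line[2:]
    if code == "- ":
        removed.append(item)
    elif code == "+ ":
        added.append(item)
    # "  " marks unchanged items; "? " lines (intraline hints that Differ
    # may emit for similar lines) are ignored here.

print("Removed:", removed)
print("Added:", added)
```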
Unified Diff for Version Control
from difflib import unified_diff
code_v1 = """def calculate_revenue(price, quantity):
    return price * quantity

revenue = calculate_revenue(100, 50)"""
code_v2 = """def calculate_revenue(price, quantity, discount=0):
    subtotal = price * quantity
    return subtotal * (1 - discount)

revenue = calculate_revenue(100, 50, 0.1)"""
# Split without line endings and pass lineterm='' so that every diff line,
# including the headers, can be joined with a single newline.
diff = unified_diff(
    code_v1.splitlines(),
    code_v2.splitlines(),
    fromfile='v1.py',
    tofile='v2.py',
    lineterm=''
)
print('\n'.join(diff))
Output Result:
--- v1.py
+++ v2.py
@@ -1,4 +1,5 @@
-def calculate_revenue(price, quantity):
-    return price * quantity
+def calculate_revenue(price, quantity, discount=0):
+    subtotal = price * quantity
+    return subtotal * (1 - discount)
 
-revenue = calculate_revenue(100, 50)
+revenue = calculate_revenue(100, 50, 0.1)
Real-world application: Generate Git-style diffs for code review systems, audit trails, or document version control.
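Review tools often summarize a diff as a count of added and removed lines. The sketch below tallies them by prefix; the sample lines and the file names old.py and new.py are made up for illustration.

```python
from difflib import unified_diff

# Hypothetical sample data for this sketch.
old_lines = ["a = 1", "b = 2", "c = 3"]
new_lines = ["a = 1", "b = 20", "c = 3", "d = 4"]

diff_lines = list(
    unified_diff(old_lines, new_lines, fromfile="old.py", tofile="new.py", lineterm="")
)

added = removed = 0
# The first two lines are the "---" / "+++" file headers, so skip them.
for line in diff_lines[2:]:
    if line.startswith("@@"):
        continue  # hunk header such as "@@ -1,3 +1,4 @@"
    if line.startswith("+"):
        added += 1
    elif line.startswith("-"):
        removed += 1

print(f"{added} added, {removed} removed")
```

Skipping the headers by position rather than by prefix avoids miscounting a removed line whose own text happens to begin with dashes.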
3. Find Closest Matches with get_close_matches
The get_close_matches() function finds similar strings from a list, perfect for spell checking, search suggestions, and data validation.
from difflib import get_close_matches
companies = [
    "Microsoft Corporation",
    "Apple Inc",
    "Amazon.com Inc",
    "Alphabet Inc",
    "Meta Platforms",
    "Tesla Inc"
]
user_query = "Amazn"
matches = get_close_matches(user_query, companies, n=3, cutoff=0.3)
print(f"Search query: '{user_query}'")
print(f"Closest matches: {matches}")
Output Result:
Search query: 'Amazn'
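Closest matches: ['Amazon.com Inc', 'Alphabet Inc']
Only two of the six names score at or above the 0.3 cutoff, so fewer than n results come back.
Parameters:
- n: Maximum number of matches to return (default 3)
- cutoff: Minimum similarity threshold from 0.0 to 1.0 (default 0.6); candidates scoring below it are dropped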
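The cutoff has a large effect on short queries matched against long names. As a rough illustration, this sketch runs the same query at three thresholds; the results dictionary is just a convenience for this example.

```python
from difflib import get_close_matches

companies = [
    "Microsoft Corporation",
    "Apple Inc",
    "Amazon.com Inc",
    "Alphabet Inc",
    "Meta Platforms",
    "Tesla Inc",
]

results = {}
for cutoff in (0.6, 0.5, 0.3):
    # Same query and n each time; only the threshold changes.
    results[cutoff] = get_close_matches("Amazn", companies, n=3, cutoff=cutoff)
    print(f"cutoff={cutoff}: {results[cutoff]}")
```

At the default cutoff of 0.6 even "Amazon.com Inc" is rejected, because the unmatched ".com Inc" suffix pulls its ratio down to about 0.53. Lowering the cutoff admits it, at the cost of letting weaker candidates through.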
Product Search with Fuzzy Matching
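from difflib import get_close_matches
products = [
    "iPhone 15 Pro Max",
    "Samsung Galaxy S24 Ultra",
    "Google Pixel 8 Pro",
    "MacBook Pro 16-inch",
    "iPad Pro 12.9-inch"
]
# get_close_matches is case-sensitive, so compare lowercase keys
# and map each hit back to the original product name.
lookup = {name.lower(): name for name in products}
search_terms = ["iphone pro", "macbok", "samsung galaxy"]
for search in search_terms:
    hits = get_close_matches(search.lower(), list(lookup), n=2, cutoff=0.4)
    results = [lookup[hit] for hit in hits]
    print(f"'{search}' → {results}")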
Output Result:
'iphone pro' → ['iPhone 15 Pro Max', 'iPad Pro 12.9-inch']
'macbok' → ['MacBook Pro 16-inch']
'samsung galaxy' → ['Samsung Galaxy S24 Ultra']
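get_close_matches() returns only the matching strings, not their scores. If the scores are needed, for example to show a confidence value next to each suggestion, one option is to rank the candidates with SequenceMatcher directly. This is a sketch under that assumption; ranked_matches is a hypothetical helper, not a difflib function.

```python
from difflib import SequenceMatcher


def ranked_matches(query, choices, n=3, cutoff=0.4):
    """Return up to n (choice, score) pairs, best first, ignoring case."""
    scored = []
    for choice in choices:
        score = SequenceMatcher(None, query.lower(), choice.lower()).ratio()
        if score >= cutoff:
            scored.append((choice, score))
    # Sort by score only, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


products = [
    "iPhone 15 Pro Max",
    "Samsung Galaxy S24 Ultra",
    "Google Pixel 8 Pro",
    "MacBook Pro 16-inch",
    "iPad Pro 12.9-inch",
]

for name, score in ranked_matches("macbok", products):
    print(f"{name}: {score:.2f}")
```

Note that ratio() can depend on the order of its two arguments, so scores computed this way may differ slightly from the ones get_close_matches() uses internally.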