In this short guide, you will learn how to match text between two patterns using regex in Python. Extracting text between delimiters is a common task in:
- Log file analysis
- Web scraping
- HTML parsing
- Data cleaning
- Text preprocessing
We'll cover three practical regex examples in Python, including handling multiple matches and greedy vs non-greedy matching.
Sample Text
Let’s start with a simple example:
text = "Start [important data] End"
Goal: Extract the text between [ and ].
Example 1 — Basic Regex with re.search()
The simplest way to match text between two patterns is using a capturing group.
Example 1 — Extract Text Between Brackets
import re
text = "Start [important data] End"
match = re.search(r"\[(.*?)\]", text)
if match:
    print(match.group(1))
Explanation
- \[ → matches a literal [
- (.*?) → captures everything inside (non-greedy)
- \] → matches a literal ]
- group(1) → returns the captured text
Output:
important data
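The if check in the snippet matters because re.search() returns None when the pattern is not found. A minimal sketch of wrapping that check in a small helper follows; the function name extract_between_brackets is made up for illustration and is not part of the re module.

```python
import re

def extract_between_brackets(text):
    """Return the text inside the first [...] pair, or None if there is none."""
    match = re.search(r"\[(.*?)\]", text)
    # re.search() returns None when nothing matches, so guard before group()
    return match.group(1) if match else None

print(extract_between_brackets("Start [important data] End"))  # important data
print(extract_between_brackets("no brackets here"))            # None
```

Returning None instead of raising keeps the caller's code simple when missing delimiters are expected, for example in noisy log lines.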
Example 2 — Extract Multiple Matches with re.findall()
If your text contains several delimited sections, use re.findall() to collect them all.
Example 2 — Multiple Matches
import re
text = "A [one] B [two] C [three]"
matches = re.findall(r"\[(.*?)\]", text)
print(matches)
Output:
['one', 'two', 'three']
Why use findall()?
- Returns all matches
- Ideal for batch text extraction
- Useful in scraping and NLP pipelines
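If you also need to know where each match sits in the string, the related function re.finditer() may be a better fit, since it yields match objects rather than plain strings. The sketch below is an addition to the examples above and reuses the same sample text.

```python
import re

text = "A [one] B [two] C [three]"

# finditer() yields match objects, so the position of each captured
# group is available through start(1) and end(1)
found = [
    (m.group(1), m.start(1), m.end(1))
    for m in re.finditer(r"\[(.*?)\]", text)
]
print(found)  # [('one', 3, 6), ('two', 11, 14), ('three', 19, 24)]
```

The offsets refer to the captured group, so text[3:6] gives back 'one'.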
Example 3 — Match Between Custom Start and End Patterns
You can extract text between any two markers.
Example 3 — Match Between <tag> and </tag>
import re
text = "<tag>Hello World</tag>"
match = re.search(r"<tag>(.*?)</tag>", text)
if match:
    print(match.group(1))
Output:
Hello World
This technique works for:
- Custom markers
- Log boundaries
- Template parsing
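When the markers are not known in advance, or contain characters that are special in regex (such as [, ( or $), one possible approach is to build the pattern with re.escape(). The helper below is a sketch under that assumption; the name extract_between and its parameters are hypothetical, and the re.DOTALL flag is an added choice so that a match can span several lines.

```python
import re

def extract_between(text, start, end):
    """Return every piece of text found between the literal markers start and end."""
    # re.escape() makes characters such as [ or $ in the markers match literally
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    # re.DOTALL lets . match newlines, so a span may cross line breaks
    return re.findall(pattern, text, flags=re.DOTALL)

print(extract_between("<tag>Hello World</tag>", "<tag>", "</tag>"))
print(extract_between("BEGIN line 1\nline 2 END", "BEGIN ", " END"))
print(extract_between("price: $(42) and $(7)", "$(", ")"))
```

Without re.DOTALL the second call would find nothing, because by default . does not match a newline.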
Greedy vs Non-Greedy Matching (Important!)
Consider:
text = "[first] some text [second]"
Greedy (Wrong for Multiple Matches)
re.findall(r"\[(.*)\]", text)
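This captures too much: .* is greedy, so it runs to the last ] in the string and returns a single match, 'first] some text [second', instead of two.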
Non-Greedy (Correct)
re.findall(r"\[(.*?)\]", text)
The ? after * makes the quantifier non-greedy: it stops at the first ] it can reach, so each bracket pair becomes its own match.
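The difference is easiest to see by running both patterns on the same text. The sketch below also includes a third variant that is not covered above: a negated character class, [^\]]*, which matches anything except a closing bracket and so gives the same result as the non-greedy form for this input.

```python
import re

text = "[first] some text [second]"

# Greedy: .* extends to the last ] in the string
greedy = re.findall(r"\[(.*)\]", text)

# Non-greedy: .*? stops at the first ] it can reach
lazy = re.findall(r"\[(.*?)\]", text)

# Alternative: match any run of characters that are not ]
negated = re.findall(r"\[([^\]]*)\]", text)

print(greedy)   # ['first] some text [second']
print(lazy)     # ['first', 'second']
print(negated)  # ['first', 'second']
```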
Summary
To match text between two patterns with regex in Python, use the following, where START and END stand in for your own delimiters:
re.search(r"START(.*?)END", text)
For multiple matches:
re.findall(r"START(.*?)END", text)
Key takeaway:
- Use non-greedy matching (.*?)
- Use search() for one match
- Use findall() for multiple matches