Python regex match examples
In this post:
- Matching sentences single line string
- Matching sentences multi line string
- Regex Matching N Capital Letters
- Regex Matching Capital Words
- Regex Matching Numbers
Matching sentences single line string
This example matches sentences in a single-line string. It finds every sentence that ends with a dot, an exclamation mark or a question mark. It uses Python's "re" module, which is the standard regular expression module.
import re
# Matching sentences
# "text" and "sentences" are used as names so the built-ins str and all are not shadowed
text = """Python is an interpreted high-level programming language for general-purpose programming? Created by Guido van Rossum and first released in 1991! Python has a design philosophy that emphasizes code readability, and a syntax that allows programmers to express concepts in fewer lines of code, notably using ..."""
sentences = re.findall(r"\w+[^.!?]*[.!?]", text)  # match sentences ending with . ! ?
for sentence in sentences:
    print(sentence)
result
Python is an interpreted high-level programming language for general-purpose programming?
Created by Guido van Rossum and first released in 1991!
Python has a design philosophy that emphasizes code readability, and a syntax that allows programmers to express concepts in fewer lines of code, notably using .
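Note that the pattern stops at the first terminator, so the trailing "..." is cut down to a single dot. If runs of closing punctuation should stay together, one possible variant is sketched below. The helper name split_sentences and the sample text are made up for illustration, and the pattern still knows nothing about abbreviations such as "e.g.":

```python
import re

def split_sentences(text):
    """Return sentences that end in one or more of . ! ? (hypothetical helper)."""
    # \w+      a sentence starts with a word character
    # [^.!?]*  followed by anything that is not a terminator
    # [.!?]+   followed by one or more terminators, so "..." stays intact
    return re.findall(r"\w+[^.!?]*[.!?]+", text)

sample = "Is this Python? Yes! It goes on ..."
for sentence in split_sentences(sample):
    print(sentence)
```

The only change from the pattern above is the + after the last character class.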
Matching sentences multi line string
Matching sentences that span several lines can be a really tricky task because it depends on many factors, such as the OS line separator, locale settings, the environment and the data format (text file, XML, etc.). Here is a small trick that works for most cases: join the lines into one string first, then match as before. The only concerns are performance and the size of the text, since the whole string is copied:
import re
# Matching multiline sentences
text = """Python is an interpreted high-level programming language for general-purpose programming? Created
by Guido van Rossum and first released in 1991!
Python has a design philosophy that emphasizes code readability,
and a syntax that allows programmers to express concepts in fewer lines of code, notably using ..."""
joined = text.replace('\n', ' ')  # join the lines; the space keeps "Created" and "by" apart
sentences = re.findall(r"[A-Z][^.!?]*[.!?]", joined)  # match sentences ending with . ! ?
for sentence in sentences:
    print(sentence)
result
Python is an interpreted high-level programming language for general-purpose programming?
Created by Guido van Rossum and first released in 1991!
Python has a design philosophy that emphasizes code readability, and a syntax that allows programmers to express concepts in fewer lines of code, notably using .
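Replacing only '\n' assumes Unix-style line endings. When the separator is not known in advance, a more tolerant option is to collapse every run of whitespace into a single space. This is a minimal sketch, with join_lines as a hypothetical helper name and a made-up sample string:

```python
import re

def join_lines(text):
    """Collapse each run of whitespace into one space (hypothetical helper)."""
    # \s+ covers newlines, carriage returns, tabs and repeated spaces alike
    return re.sub(r"\s+", " ", text).strip()

raw = "Created\r\nby Guido van Rossum\nand first released\n   in 1991!"
flat = join_lines(raw)
print(flat)
print(re.findall(r"[A-Z][^.!?]*[.!?]", flat))
```

This also removes indentation left over at the start of continuation lines, at the cost of one extra pass over the text.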
Regex Matching N Capital Letters
Matching N capital letters with a regex in Python is an easy task. There are two common options:
- "[A-Z]{5}" - match any 5 consecutive capital letters. It will catch COBOL and also PYTHO from PYTHON
- "\b[A-Z]{5}\b" - match a standalone run of exactly 5 capital letters. It will catch only COBOL, because \b is a word boundary and the 6-letter PYTHON cannot fit between two boundaries.
import re
# Matching capital letters
text = """COBOL is a compiled English-like computer programming language designed for business use. PYTHON is object-o"""
any_five = re.findall(r"[A-Z]{5}", text)  # match any 5 capital letters
exact = re.findall(r"\b[A-Z]{5}\b", text)  # match words of exactly 5 capital letters
for s in any_five:
    print(s)
for s in exact:
    print(s)
result
COBOL
PYTHO
COBOL
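The same idea works for any N by building the pattern from a number. The sketch below assumes a hypothetical helper called capital_runs and its own sample text. It is one way to wrap the two patterns above, not the only one:

```python
import re

def capital_runs(text, n, whole_word=True):
    """Find runs of n capital letters (hypothetical helper)."""
    pattern = r"[A-Z]{%d}" % n
    if whole_word:
        # \b on both sides rejects longer words such as PYTHON when n is 5
        pattern = r"\b" + pattern + r"\b"
    return re.findall(pattern, text)

sample = "COBOL is a compiled language. PYTHON is interpreted."
print(capital_runs(sample, 5))                    # ['COBOL']
print(capital_runs(sample, 5, whole_word=False))  # ['COBOL', 'PYTHO']
print(capital_runs(sample, 6))                    # ['PYTHON']
```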
Regex Matching Capital Words
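Matching words that start with a capital letter:
- \b[A-Z].*?\b - match any word starting with a capital letter. The lazy .*? stops at the first word boundary after the capital letter, so English-like yields only English.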
import re
# Matching words that start with a capital letter
text = """COBOL is a compiled English-like computer programming language designed for business use. PYTHON is object-o"""
capital_words = re.findall(r"\b[A-Z].*?\b", text)  # match words starting with a capital letter
for word in capital_words:
    print(word)
result
COBOL
English
PYTHON
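The pattern above does not distinguish between all-caps words and words that are merely capitalized. If that distinction matters, the variants below may help. The sample sentence is invented for this sketch, and \b[A-Z]\w* is offered as an equivalent, slightly more explicit spelling of the pattern above for ASCII text:

```python
import re

sample = "COBOL and Fortran are older than Python. NASA used both."

# Any word that starts with a capital letter
capitalized = re.findall(r"\b[A-Z]\w*", sample)
# Words made only of capital letters (two or more)
all_caps = re.findall(r"\b[A-Z]{2,}\b", sample)
# One capital letter followed by lowercase letters only
title_case = re.findall(r"\b[A-Z][a-z]+\b", sample)

print(capitalized)  # ['COBOL', 'Fortran', 'Python', 'NASA']
print(all_caps)     # ['COBOL', 'NASA']
print(title_case)   # ['Fortran', 'Python']
```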
Regex Matching Numbers
Extracting numbers with a regex:
- \b[0-9].*?\b - match a number of any length: a digit followed by everything up to the next word boundary
import re
# Matching numbers
text = """121) COBOL is a compiled English-like computer programming language designed for business use. 122. PYTHON is object-o"""
numbers = re.findall(r"\b[0-9].*?\b", text)  # match numbers of any length
for number in numbers:
    print(number)
result
121
122
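The pattern above returns strings and would also accept a token like 1a, since it only requires the first character to be a digit. A stricter sketch, assuming only plain integers and simple decimals are wanted (the sample text is made up for this example):

```python
import re

sample = "121) COBOL appeared in 1959. 122. Python 3.12 followed much later."

# \d+ accepts digits only; int() turns each match into a number
integers = [int(m) for m in re.findall(r"\b\d+\b", sample)]
# The optional non-capturing group keeps a decimal part attached
decimals = re.findall(r"\b\d+(?:\.\d+)?\b", sample)

print(integers)  # [121, 1959, 122, 3, 12]
print(decimals)  # ['121', '1959', '122', '3.12']
```

The first pattern splits 3.12 into 3 and 12, while the second keeps it whole. A dot that merely ends a sentence, as in "1959.", is not consumed because no digit follows it.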