Do you need a regex that matches "no character or one character", for example "zero or one space"? If so, you can use the following syntax to match such patterns:
- [ ]{0,1} - match no space or 1 space
- [-]? - match nothing or a single hyphen
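As a quick check of the two quantifiers above, here is a minimal sketch. The sample strings and the variable names are made up for illustration only; the point is that {0,1} and ? are two spellings of the same "zero or one" rule.

```python
import re

# Hypothetical sample strings, chosen only for illustration.
samples = ["ab", "a b", "a  b", "a-b", "a--b"]

# [ ]{0,1} and [ ]? both mean "zero or one space".
space_braces = re.compile(r"a[ ]{0,1}b")
space_qmark = re.compile(r"a[ ]?b")

# [-]? means "zero or one hyphen".
hyphen_qmark = re.compile(r"a[-]?b")

for s in samples:
    # fullmatch requires the whole string to fit the pattern
    print(
        repr(s),
        bool(space_braces.fullmatch(s)),
        bool(space_qmark.fullmatch(s)),
        bool(hyphen_qmark.fullmatch(s)),
    )
```

Two spaces or two hyphens are rejected because the quantifier allows at most one occurrence.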
Let's demonstrate how to use them with a few examples.
Example 1: Match space or no space in a string
If you have a list of usernames like:
- @ user_1
- @John Doe 1969@
- @-Peter-Parker123@
- @123any_other_user2@
- more users33
- more users2
- @more@
- @last standing@
and you would like to match @ followed by zero or one space, then you can use the regex (@[ ]{0,1}[A-Za-z0-9 ]+) as follows:
import re
texts = ['@ user_1 ', '@John Doe 1969@', '@-Peter-Parker123@', '@123any_other_user2@', 'more users33', 'more users2' , '@more@', '@last standing@']
for text in texts:
    print(re.findall(r"(@[ ]{0,1}[A-Za-z0-9 ]+)", text))
result:
['@ user']
['@John Doe 1969']
[]
['@123any']
[]
[]
['@more']
['@last standing']
How does it work?
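- () - stands for a capture group; everything matched inside it will be extracted
- [ ]{0,1} - matches no space or 1 space
- [A-Za-z0-9 ]+ - matches one or more characters from the following set:
  - A-Z (Range), matches a character in the range "A" to "Z" (char code 65 to 90)
  - a-z (Range), matches a character in the range "a" to "z" (char code 97 to 122)
  - 0-9 (Range), matches a character in the range "0" to "9" (char code 48 to 57)
  - a literal space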
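The character class explains why some usernames above are cut short or skipped: _ and - are not in the set, so the match stops there. The sketch below compares the original pattern with a variant that also allows those two characters. Treating them as valid username characters is an assumption made for illustration, not a requirement of the example.

```python
import re

texts = ['@ user_1 ', '@-Peter-Parker123@', '@123any_other_user2@']

# Pattern from the example: letters, digits and spaces only.
original = re.compile(r"(@[ ]{0,1}[A-Za-z0-9 ]+)")

# Assumed variant: "_" and "-" are also allowed.
# A hyphen placed last inside [...] is a literal hyphen, not a range.
extended = re.compile(r"(@[ ]{0,1}[A-Za-z0-9 _-]+)")

for text in texts:
    print(original.findall(text), extended.findall(text))
```

Because the class also contains a space, the extended pattern keeps the trailing space of the first string in its match.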
Example 2: Match letter s or no letter s in URL
Say that you have a list of URLs and you would like to extract only the URLs which start with http and then contain one letter s or no s at all.
import re
texts = [
'https://en.wikipedia.org/wiki/Main_Page/',
'http://en.wikipedia.org/wiki/National_Park_Service/',
'https://en.wikipedia.org/wiki/Hoover_Dam/',
'http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/',
'https://en.wikipedia.org/wiki/Central_African_Republic/',
'en.wikipedia.org/wiki/Africa/',
'ftp://en.wikipedia.org/wiki/Central_African_Republic/',
]
for text in texts:
    print(re.findall(r"(http[s]{0,1}.*)", text))
This results in:
['https://en.wikipedia.org/wiki/Main_Page/']
['http://en.wikipedia.org/wiki/National_Park_Service/']
['https://en.wikipedia.org/wiki/Hoover_Dam/']
['http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/']
['https://en.wikipedia.org/wiki/Central_African_Republic/']
[]
[]
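Note that re.findall searches the whole string, so the pattern in this example would also match an http that appears in the middle of a string. If the URL really has to start with http or https, one possible sketch is to use re.match, which only matches at the beginning. The helper name starts_with_http and the last sample string are hypothetical, added only for illustration.

```python
import re

def starts_with_http(url):
    """Return True if url begins with http:// or https://."""
    # re.match only matches at the start of the string;
    # s? allows zero or one letter "s".
    return re.match(r"https?://", url) is not None

candidates = [
    'https://en.wikipedia.org/wiki/Main_Page/',
    'http://en.wikipedia.org/wiki/National_Park_Service/',
    'ftp://en.wikipedia.org/wiki/Central_African_Republic/',
    # Hypothetical string: http is present but not at the start.
    'mirror of http://en.wikipedia.org/wiki/Hoover_Dam/',
]

for url in candidates:
    print(starts_with_http(url), url)
```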
Example 3: Match strings with no more than n occurrences
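Finally, if you would like to find out how many times a space (or any other character) occurs in each string, you can use the next regex: re.findall(r"([_])", text). It returns one item per match, so the length of the result is the number of occurrences.
So let's count the number of _ characters in the following URLs: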
import re
texts = [
'https://en.wikipedia.org/wiki/Main_Page/',
'http://en.wikipedia.org/wiki/National_Park_Service/',
'https://en.wikipedia.org/wiki/Hoover_Dam/',
'http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/',
'https://en.wikipedia.org/wiki/Central_African_Republic/',
'en.wikipedia.org/wiki/Africa/',
'ftp://en.wikipedia.org/wiki/Central_African_Republic/',
]
for text in texts:
    print(len(re.findall(r"([_])", text)), end=' - ')
    print(re.findall(r"([_])", text))
result:
1 - ['_']
2 - ['_', '_']
1 - ['_']
4 - ['_', '_', '_', '_']
2 - ['_', '_']
0 - []
2 - ['_', '_']
Of course, Python offers a faster solution if you only need to count:
text.count('_')
The advantage of the regex is customization: you can count not only a single character but also a set of characters or a pattern.
So if you would like to count how many times _, / or spaces occur in the strings, then you can use:
import re
texts = [
'https://en.wikipedia.org/wiki/Main_Page/',
'http://en.wikipedia.org/wiki/National_Park_Service/',
'https://en.wikipedia.org/wiki/Hoover_Dam/',
'http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/',
'https://en.wikipedia.org/wiki/Central_African_Republic/',
'en.wikipedia.org/wiki/Africa/',
'ftp://en.wikipedia.org/wiki/Central_African_Republic/',
]
for text in texts:
    # use the same character class in both calls so the count matches the list
    print(len(re.findall(r"([_/ ])", text)), end=' - ')
    print(re.findall(r"([_/ ])", text))
result:
6 - ['/', '/', '/', '/', '_', '/']
7 - ['/', '/', '/', '/', '_', '_', '/']
6 - ['/', '/', '/', '/', '_', '/']
9 - ['/', '/', '/', '/', '_', '_', '_', '_', '/']
7 - ['/', '/', '/', '/', '_', '_', '/']
3 - ['/', '/', '/']
7 - ['/', '/', '/', '/', '_', '_', '/']
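The counts above can also be turned into a filter for strings with no more than n occurrences, using the same {0,n} quantifier as in the earlier examples. The following is only a sketch under assumptions: the helper name has_at_most is hypothetical, and the check works on a single character. For this simple case, text.count(char) <= n gives the same answer.

```python
import re

def has_at_most(text, char, n):
    """Return True if char occurs in text no more than n times."""
    c = re.escape(char)
    # A run of other characters, then up to n groups of
    # "char followed by a run of other characters".
    pattern = rf"[^{c}]*(?:{c}[^{c}]*){{0,{n}}}"
    return re.fullmatch(pattern, text) is not None

texts = [
    'https://en.wikipedia.org/wiki/Main_Page/',
    'http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/',
    'en.wikipedia.org/wiki/Africa/',
]

for text in texts:
    # keep only strings with at most 2 underscores
    print(has_at_most(text, '_', 2), text)
```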