Regex to Match no space or one space in Python

Do you need a regex that matches "no character or one character", such as "zero or one space"? If so, you can use the {0,1} quantifier or its shorthand ?, which both mean zero or one occurrence of the preceding item:

  • [ ]{0,1} - match no space or 1 space
  • [-]? - match nothing or a single hyphen
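As a quick check that the two spellings behave the same, the minimal sketch below tests both against three made-up strings. The sample strings and variable names are illustrative assumptions, not part of the examples that follow.

```python
import re

samples = ["ab", "a b", "a  b"]  # zero, one, and two spaces between the letters

# Both patterns allow zero or one space between "a" and "b".
long_form = [bool(re.fullmatch(r"a[ ]{0,1}b", s)) for s in samples]
short_form = [bool(re.fullmatch(r"a ?b", s)) for s in samples]

print(long_form)   # [True, True, False]
print(short_form)  # [True, True, False]
```

The string with two spaces is rejected by both forms because the quantifier allows one space at most.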

The examples below show these patterns in use.

Example 1: Match space or no space in a string

If you have a list of usernames like:

  • @ user_1
  • @John  Doe 1969@
  • @-Peter-Parker123@
  • @123any_other_user2@
  • more users33
  • more users2
  • @more@
  • @last standing@

and you would like to match @ followed by zero or one space and then a run of letters, digits or spaces, then you can use the regex (@[ ]{0,1}[A-Za-z0-9 ]+) as:

import re

texts = ['@ user_1  ', '@John  Doe 1969@', '@-Peter-Parker123@', '@123any_other_user2@', 'more users33', 'more users2', '@more@', '@last standing@']
for text in texts:
    # findall returns the text captured by the group for every match in the string
    print(re.findall(r"(@[ ]{0,1}[A-Za-z0-9 ]+)", text))

result:

['@ user']
['@John  Doe 1969']
[]
['@123any']
[]
[]
['@more']
['@last standing']

How does it work?

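  • () - a capture group; re.findall returns the text matched inside it
  • @ - matches a literal @
  • [ ]{0,1} - matches no space or 1 space
  • [A-Za-z0-9 ]+ - matches one or more characters from the set:
    • A-Z (Range), Matches a character in the range "A" to "Z" (char code 65 to 90)
    • a-z (Range), Matches a character in the range "a" to "z" (char code 97 to 122)
    • 0-9 (Range), Matches a character in the range "0" to "9" (char code 48 to 57)
    • a literal space
  • _ and - are not in the set, so '@ user_1' is cut to '@ user' and '@-Peter-Parker123@' gives no match at all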
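One detail worth checking: because the character class [A-Za-z0-9 ] itself contains a space, the pattern still accepts more than one space after @. The sketch below compares it with a stricter variant that drops the space from the class. The stricter pattern and the sample strings are assumptions made for illustration, not part of the original example.

```python
import re

# Hypothetical sample handles with zero, one, and two spaces after "@".
samples = ["@user", "@ user", "@  user"]

original = re.compile(r"(@[ ]{0,1}[A-Za-z0-9 ]+)")  # class includes a space
strict = re.compile(r"(@[ ]{0,1}[A-Za-z0-9]+)")     # class without a space

original_hits = [original.findall(s) for s in samples]
strict_hits = [strict.findall(s) for s in samples]

for s, o, t in zip(samples, original_hits, strict_hits):
    print(repr(s), o, t)
```

With the stricter variant the two-space string no longer matches. The trade-off is that a name such as '@John Doe' would then be cut at the first inner space.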

Example 2: Match letter s or no letter s in URL

Say that you have a list of URLs and you would like to extract only the ones that start with http followed by one letter s or no s at all, i.e. http or https URLs:

import re
texts = [
    'https://en.wikipedia.org/wiki/Main_Page/',
    'http://en.wikipedia.org/wiki/National_Park_Service/',
    'https://en.wikipedia.org/wiki/Hoover_Dam/',
    'http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/',
    'https://en.wikipedia.org/wiki/Central_African_Republic/',
    'en.wikipedia.org/wiki/Africa/',
    'ftp://en.wikipedia.org/wiki/Central_African_Republic/',
]
for text in texts:
    print(re.findall(r"(http[s]{0,1}.*)", text))

result:

['https://en.wikipedia.org/wiki/Main_Page/']
['http://en.wikipedia.org/wiki/National_Park_Service/']
['https://en.wikipedia.org/wiki/Hoover_Dam/']
['http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/']
['https://en.wikipedia.org/wiki/Central_African_Republic/']
[]
[]
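The pattern above is not anchored, so re.findall would also pick up http in the middle of a string. If the goal is strictly URLs that begin with http or https, one option is re.match, which only matches at the start of the string. The sketch below uses the shorter s? form; the URL list (including the example.org entry) and the variable names are assumptions for illustration.

```python
import re

# Hypothetical URLs; the last one contains "http" but does not start with it.
urls = [
    'https://en.wikipedia.org/wiki/Main_Page/',
    'http://en.wikipedia.org/wiki/Hoover_Dam/',
    'en.wikipedia.org/wiki/Africa/',
    'ftp://example.org/http-mirror/',
]

scheme = re.compile(r"https?://")  # "s?" is the same as "[s]{0,1}"

# match() anchors at the start of the string, unlike findall() or search().
web_urls = [url for url in urls if scheme.match(url)]

# The unanchored pattern from the example finds "http" inside the ftp URL.
unanchored = re.findall(r"(http[s]{0,1}.*)", urls[-1])

print(web_urls)
print(unanchored)
```

Here web_urls keeps only the first two entries, while the unanchored pattern returns ['http-mirror/'] for the ftp URL.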

Example 3: Match strings with no more than n occurrences

Finally, if you would like to find strings that contain a given number of spaces (or any other character), start by counting the occurrences. The call re.findall(r"([_])", text) returns one list item per match, so the length of the list is the count.

So let's count the number of _ characters in the following URLs:

import re

texts = [
    'https://en.wikipedia.org/wiki/Main_Page/',
    'http://en.wikipedia.org/wiki/National_Park_Service/',
    'https://en.wikipedia.org/wiki/Hoover_Dam/',
    'http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/',
    'https://en.wikipedia.org/wiki/Central_African_Republic/',
    'en.wikipedia.org/wiki/Africa/',
    'ftp://en.wikipedia.org/wiki/Central_African_Republic/',
]
for text in texts:
    # one list item per underscore found in the string
    underscores = re.findall(r"([_])", text)
    print(len(underscores), end=' - ')
    print(underscores)

result:

1 - ['_']
2 - ['_', '_']
1 - ['_']
4 - ['_', '_', '_', '_']
2 - ['_', '_']
0 - []
2 - ['_', '_']
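These counts can then be used to keep only the strings with no more than n occurrences. The sketch below shows two ways to do that for n = 2: comparing the count with a limit, and a single pattern that uses a {0,2} quantifier. The limit, the shortened URL list and the variable names are assumptions chosen for illustration.

```python
import re

texts = [
    'https://en.wikipedia.org/wiki/Main_Page/',
    'http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/',
    'https://en.wikipedia.org/wiki/Central_African_Republic/',
    'en.wikipedia.org/wiki/Africa/',
]
max_count = 2  # hypothetical limit n

# Option 1: count the matches and compare with the limit.
by_count = [t for t in texts if len(re.findall(r"_", t)) <= max_count]

# Option 2: one pattern; {0,2} allows at most two "_" separated by other characters.
at_most_two = re.compile(r"[^_]*(?:_[^_]*){0,2}")
by_pattern = [t for t in texts if at_most_two.fullmatch(t)]

print(by_count == by_pattern)
for t in by_count:
    print(t)
```

Both options drop only the URL with four underscores. The counting version is easier to adapt when the limit changes at runtime.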

Of course, Python offers a faster solution if you only need to count a single character or a fixed substring:

text.count('_')

The advantage of the regex is customization: you can count not only a single character but also a set of characters or a pattern.

So if you would like to count how many times _, / or a space occurs in the strings, then you can use:

import re

texts = [
    'https://en.wikipedia.org/wiki/Main_Page/',
    'http://en.wikipedia.org/wiki/National_Park_Service/',
    'https://en.wikipedia.org/wiki/Hoover_Dam/',
    'http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/',
    'https://en.wikipedia.org/wiki/Central_African_Republic/',
    'en.wikipedia.org/wiki/Africa/',
    'ftp://en.wikipedia.org/wiki/Central_African_Republic/',
]
for text in texts:
    # the same character set is used for both the count and the list
    found = re.findall(r"([_/ ])", text)
    print(len(found), end=' - ')
    print(found)

result:

6 - ['/', '/', '/', '/', '_', '/']
7 - ['/', '/', '/', '/', '_', '_', '/']
6 - ['/', '/', '/', '/', '_', '/']
9 - ['/', '/', '/', '/', '_', '_', '_', '_', '/']
7 - ['/', '/', '/', '/', '_', '_', '/']
3 - ['/', '/', '/']
7 - ['/', '/', '/', '/', '_', '_', '/']
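If a separate total per character is more useful than one combined count, the list returned by re.findall can be passed to collections.Counter. This is a minimal sketch on the first URL from the list; the variable names are illustrative.

```python
import re
from collections import Counter

text = 'https://en.wikipedia.org/wiki/Main_Page/'

# Counter groups the individual matches, giving one total per character.
per_char = Counter(re.findall(r"[_/ ]", text))

print(per_char['/'], per_char['_'], per_char[' '])
```

A Counter returns 0 for characters that never matched, so the missing space does not need special handling.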