How to match a digit n times with regex in Python
Here are several approaches to matching a digit n times with a regex in Python:
Step 1: Match digits exactly n times
Suppose you want to find a digit sequence of exactly n digits. You can use a pattern like \D(\d{4})\D to match a run of exactly 4 digits surrounded by non-digit characters.
Example:
import re
text = 'abcd123efg123456_1234ghij'
re.findall(r"\D(\d{4})\D", text)
will find:
['1234']
How does it work?
\d{4} - matches exactly 4 digits
\D - matches a single non-digit character
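So the pattern matches a non-digit character, then exactly 4 digits, then another non-digit character. Because only \d{4} is inside the capturing group, re.findall returns just the 4 digits, not the surrounding characters. This is also why '123456' is skipped: its first 4 digits are followed by another digit, not a non-digit.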
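One limitation of \D(\d{4})\D is that it needs a real non-digit character on both sides, so a 4-digit run at the very start or end of the string is not found. A common alternative (a swapped-in technique, not the pattern used above) is zero-width lookarounds: (?<!\d) and (?!\d) assert that no digit comes before or after, without consuming a character. The helper name find_exact_digits is hypothetical; this is a minimal sketch:

```python
import re

def find_exact_digits(text, n):
    """Return every run of exactly n digits (hypothetical helper)."""
    # (?<!\d): not preceded by a digit; (?!\d): not followed by a digit.
    # Both are zero-width, so the start/end of the string also count.
    pattern = rf"(?<!\d)\d{{{n}}}(?!\d)"
    return re.findall(pattern, text)

print(find_exact_digits('abcd123efg123456_1234ghij', 4))  # ['1234']
print(find_exact_digits('1234abc5678', 4))                # ['1234', '5678']
print(re.findall(r"\D(\d{4})\D", '1234abc5678'))          # []
```

The doubled braces in the f-string produce literal { and }, so for n=4 the pattern becomes (?<!\d)\d{4}(?!\d).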
Step 2: Match digits n times or more
What if you would like to extract sequences of n or more digits? You can use a quantifier like \d{3,}, which matches 3 or more consecutive digits:
import re
text = 'abcd523efg123456_1234ghij'
re.findall(r"(\d{3,})", text)
This results in:
['523', '123456', '1234']
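A related sketch: because the only group in (\d{3,}) wraps the whole expression, the parentheses are optional here, since re.findall returns the whole match when a pattern has no groups. The quantifier can also take an upper bound, {n,m}, to match between n and m digits; because it is greedy, a longer run is cut at m digits:

```python
import re

text = 'abcd523efg123456_1234ghij'

# No group: re.findall returns the whole match, same result as (\d{3,}).
print(re.findall(r"\d{3,}", text))   # ['523', '123456', '1234']

# Upper bound of 5: '123456' yields '12345', and the leftover '6'
# has fewer than 3 digits, so it does not match on its own.
print(re.findall(r"\d{3,5}", text))  # ['523', '12345', '1234']
```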
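What if you would like to extract only numbers that stand on their own, such as digits surrounded by spaces? Then you can use \b, which matches a word boundary (the position between a word character and a non-word character, or the start or end of the string):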
import re
text = 'abcd523efg 123456 _ 1234 ghij'
re.findall(r"\b(\d{3,})\b", text)
output:
['123456', '1234']
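Note that \b treats letters, digits and the underscore as word characters. A number glued to an underscore therefore has no boundary there, while commas, parentheses and the start or end of the string do count as boundaries. A small sketch with made-up sample text:

```python
import re

text = "1234, 56789_x (4321)"

# '1234' starts the string and ends before ',': both are boundaries.
# '56789' is followed by '_', a word character, so \b fails there.
# '4321' sits between '(' and ')', both non-word characters.
print(re.findall(r"\b\d{3,}\b", text))  # ['1234', '4321']
```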
Step 3: Match digits n or m times
To find sequences of digits with length 3 or 6, you can try the alternation r"\d{3}|\d{6}".
Important note: the order of the alternatives matters. To demonstrate this, compare the following examples:
import re
text = 'abcd523efg 123456 _ 1234 ghij'
re.findall(r"\d{3}|\d{6}", text)
result:
['523', '123', '456', '123']
while:
import re
text = 'abcd523efg 123456 _ 1234 ghij'
re.findall(r"\d{6}|\d{3}", text)
result:
['523', '123456', '123']
Python's re module tries the alternatives from left to right and keeps the first one that matches, not the longest one. So if you want the longer sequence, put the longer quantifier first.
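Both patterns above still pull 3 digits out of longer runs, such as '123' from '1234'. If the goal is runs that are exactly 3 or exactly 6 digits long (an assumption about intent), one option is to wrap the alternation in a non-capturing group and add the lookarounds (?<!\d) and (?!\d). With the lookahead in place, the engine backtracks into the other alternative when the first one leaves a digit behind, so the order of the alternatives no longer changes the result. A minimal sketch:

```python
import re

text = 'abcd523efg 123456 _ 1234 ghij'

# (?:...) groups the alternatives without capturing, so findall returns the
# whole match; the lookarounds reject runs that continue with more digits.
short_first = r"(?<!\d)(?:\d{3}|\d{6})(?!\d)"
long_first = r"(?<!\d)(?:\d{6}|\d{3})(?!\d)"

print(re.findall(short_first, text))  # ['523', '123456']
print(re.findall(long_first, text))   # ['523', '123456']
```

'1234' is dropped entirely, because neither alternative can cover all four digits.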
Step 4: Match digits n times starting with something
Finally, let's check the case where you would like to find n digits preceded by some pattern. The next example extracts 3 digits preceded by a lowercase letter:
import re
text = 'abcd523efg123456_1234ghij'
re.findall(r"[a-z](\d{3})", text)
result:
['523', '123']
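The letter class in [a-z](\d{3}) is just one kind of prefix. If the prefix is fixed text instead, re.escape can turn it into a literal pattern so characters like '#' or '.' are not read as regex syntax. The helper digits_after and the sample text are hypothetical; this is a minimal sketch:

```python
import re

def digits_after(prefix, n, text):
    """Return n digits that directly follow a literal prefix (hypothetical helper)."""
    # re.escape makes the prefix match literally; the group captures the digits.
    pattern = re.escape(prefix) + rf"(\d{{{n}}})"
    return re.findall(pattern, text)

print(digits_after("#", 3, "order #123, ref #4567, total 89"))  # ['123', '456']
```

As with \d{3} in the example above, nothing checks what follows the digits, so '#4567' still yields its first 3 digits.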
And one more example, extracting 3 digits preceded by a character that is not a lowercase letter:
import re
text = 'abcd523efg123456_1234ghij'
re.findall(r"[^a-z](\d{3})", text)
result:
['234', '123']
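Why '234' rather than '123' from '1234'? The class [^a-z] is not a lookbehind: it consumes one character as part of the match, and here that character is the digit '1'. re.findall only shows the captured group, so the consumed prefix stays hidden. Printing the full match with re.finditer makes it visible:

```python
import re

text = 'abcd523efg123456_1234ghij'

for m in re.finditer(r"[^a-z](\d{3})", text):
    # group(0): full match, including the consumed prefix character
    # group(1): the captured digits, which is what re.findall returns
    print(m.start(), repr(m.group(0)), repr(m.group(1)))
```

This prints the matches '1234' (starting at index 10) and '_123' (starting at index 16). '523' never appears because its first digit follows the letter 'd', and no later position has 3 digits after a non-letter.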