In this post:
- Java regular expression for sentence extraction
- Java regex extract sentence simple
- Java regex extracting date
- 10/10/2015
- 10 MAR 2015
- 10 March 2015
- Java regex match special characters - Match character '^'
- Java 8 regular expression matching phone numbers
- 001 505 434 8774
- 9000-505-434-700
Java regular expression for sentence extraction
Sentences can be extracted in Java with a Pattern and a Matcher:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = " java 7 is good! Java 8 is better. Java 9 is the best?";
Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$) [^.!?]*)*[.!?]?['\"]?(?=\\s|$)", Pattern.MULTILINE | Pattern.COMMENTS);
Matcher matchSentence = re.matcher(str);
while (matchSentence.find()) {
System.out.println(matchSentence.group());
}
result:
java 7 is good!
Java 8 is better.
Java 9 is the best?
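The pattern above contains a literal space in the middle, yet it still matches, because Pattern.COMMENTS makes the regex engine ignore whitespace inside the pattern. The following is a minimal sketch of that effect; the class name CommentsFlagDemo, its helper methods and the short regex are made up for illustration and are not part of the original example.

```java
import java.util.regex.Pattern;

class CommentsFlagDemo {
    // Hypothetical regex with literal spaces around the slash.
    static final String REGEX = "[0-9]{2} / [0-9]{4}";

    // With Pattern.COMMENTS the whitespace inside the regex is ignored,
    // so the effective pattern is [0-9]{2}/[0-9]{4}.
    static boolean matchesWithComments(String input) {
        return Pattern.compile(REGEX, Pattern.COMMENTS).matcher(input).matches();
    }

    // Without the flag the spaces are literal and must appear in the input.
    static boolean matchesPlain(String input) {
        return Pattern.compile(REGEX).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWithComments("10/2015"));   // true
        System.out.println(matchesWithComments("10 / 2015")); // false
        System.out.println(matchesPlain("10/2015"));          // false
        System.out.println(matchesPlain("10 / 2015"));        // true
    }
}
```

If a space in the pattern is meant to match a real space, either drop the COMMENTS flag or write the space as \\s.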
Java regex extract sentence simple
Depending on the text and its punctuation, a simpler pattern can also extract sentences. This second option is less accurate: every match must start with whitespace, and every '.', '!' or '?' ends a sentence:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = " java 7 is good! Java 8 is better. Java 9 is the best?";
Pattern reSentence = Pattern.compile("\\s+[^.!?]*[.!?]", Pattern.MULTILINE | Pattern.COMMENTS);
Matcher matchSentence = reSentence.matcher(str);
while (matchSentence.find()) {
System.out.println(matchSentence.group().trim()); // trim: each match starts with the whitespace consumed by \s+
}
result:
java 7 is good!
Java 8 is better.
Java 9 is the best?
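To see where the simple pattern is less accurate, the two patterns can be run on text that has a dot inside a sentence. This is only a sketch: the class SentenceRegexCompare, the helper findAll and the sample sentence are assumptions added for illustration, and the matches are trimmed before they are collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceRegexCompare {
    // First pattern from this post: punctuation ends a sentence only when
    // it is followed by whitespace or the end of the input.
    static final Pattern PRECISE = Pattern.compile(
        "[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$) [^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
        Pattern.MULTILINE | Pattern.COMMENTS);

    // Second, simpler pattern: whitespace, then anything up to the next . ! or ?
    static final Pattern SIMPLE = Pattern.compile(
        "\\s+[^.!?]*[.!?]", Pattern.MULTILINE | Pattern.COMMENTS);

    // Collects every match of the pattern, trimmed of surrounding whitespace.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group().trim());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Java 1.5 added generics. Java 8 added lambdas!";
        // [Java 1.5 added generics., Java 8 added lambdas!]
        System.out.println(findAll(PRECISE, text));
        // [1., added generics., Java 8 added lambdas!]
        System.out.println(findAll(SIMPLE, text));
    }
}
```

The simple pattern splits at the dot in "1.5" and also loses the first word, because the text does not start with whitespace.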
Java regex extracting date 10/10/2015
Java 8 regex extracting date in format dd/mm/yyyy using groups:
- ([0-9]{1,2}/) - match one or two digits followed by '/'
- ([0-9]{4}) - match four digits
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "This date is 10/10/2015 3 years before this one: 10/10/2018 ";
Pattern reDate = Pattern.compile("([0-9]{1,2}/) ([0-9]{1,2}/) ([0-9]{4})", Pattern.MULTILINE | Pattern.COMMENTS);
Matcher matchDate = reDate.matcher(str);
while (matchDate.find()) {
System.out.println(matchDate.group());
}
result:
10/10/2015
10/10/2018
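The example above prints the whole match, but the groups can also be read one by one with group(1), group(2) and group(3). The sketch below uses a small variation of the pattern, with the slashes moved outside the groups so each group holds only digits; the class DateGroupsDemo and the method describeDates are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateGroupsDemo {
    // Variation of the pattern above: slashes sit outside the groups.
    static final Pattern DATE =
        Pattern.compile("([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})");

    // Returns one "day=.. month=.. year=.." line per date found in the text.
    static List<String> describeDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            // group(0) is the whole match; groups 1..3 are the parenthesized parts
            found.add("day=" + m.group(1) + " month=" + m.group(2) + " year=" + m.group(3));
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "This date is 10/10/2015 3 years before this one: 10/10/2018 ";
        for (String line : describeDates(str)) {
            System.out.println(line);
        }
        // day=10 month=10 year=2015
        // day=10 month=10 year=2018
    }
}
```

The pattern only checks the shape of the date, so a value such as 99/99/2015 would also match.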
Java regex extracting date 10 MAR 2015
First example of extracting a date in the format dd MMM yyyy. This one lists all months explicitly:
- ([0-9]{1,2}) - match one or two digits
- (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) - match JAN or FEB etc.
- ([0-9]{4}) - match four digits
- the three groups are separated by a single space
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "This date 10 MAR 2015 is not 3 years before this one: 10 JAN 2018 ";
Pattern reDate = Pattern.compile("([0-9]{1,2}) (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) ([0-9]{4})");
Matcher matchDate = reDate.matcher(str);
while (matchDate.find()) {
System.out.println(matchDate.group());
}
result:
10 MAR 2015
10 JAN 2018
Java regex matching date 10 MAR 2015
Another way to extract a date in the format dd MMM yyyy in Java. This one is less strict about the month:
- [ADFJMNOS]\w* - match a capital letter from ADFJMNOS (the first letters of the month names) followed by any word characters
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "This date 10 MAR 2015 is not 3 years before this one: 10 JAN 2018 ";
Pattern reDate = Pattern.compile("([0-9]{1,2}) [ADFJMNOS]\\w* ([0-9]{4})");
Matcher matchDate = reDate.matcher(str);
while (matchDate.find()) {
System.out.println(matchDate.group());
}
result:
10 MAR 2015
10 JAN 2018
Java 8 regex extracting date 10 March 2015
Extracting a date that has a full month name with a Java regex:
- [ADFJMNOS]\w* - match a capital letter from ADFJMNOS followed by any word characters
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "This date 10 March 2015 is not 3 years before this one: 10 January 2018 ";
Pattern reDate = Pattern.compile("([0-9]{1,2}) [ADFJMNOS]\\w* ([0-9]{4})");
Matcher matchDate = reDate.matcher(str);
while (matchDate.find()) {
System.out.println(matchDate.group());
}
result:
10 March 2015
10 January 2018
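The [ADFJMNOS]\w* pattern is shorter than the full month list, but it accepts any word that starts with one of those letters. A minimal sketch of the difference follows; the class MonthPatternCompare, the helper findAll and the sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MonthPatternCompare {
    // Strict: only the twelve listed month abbreviations are accepted.
    static final Pattern STRICT = Pattern.compile(
        "([0-9]{1,2}) (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) ([0-9]{4})");

    // Loose: any word starting with one of the month initials is accepted.
    static final Pattern LOOSE = Pattern.compile(
        "([0-9]{1,2}) [ADFJMNOS]\\w* ([0-9]{4})");

    // Collects every match of the pattern in the text.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Seen on 10 MAR 2015 and 12 Monkeys 1995";
        System.out.println(findAll(STRICT, text)); // [10 MAR 2015]
        System.out.println(findAll(LOOSE, text));  // [10 MAR 2015, 12 Monkeys 1995]
    }
}
```

The loose pattern is convenient when both MAR and March must match, at the cost of false positives like the second match here.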
Java regex match special characters
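Characters that have a special meaning in a regular expression, such as '^', must be escaped to be matched literally. Inside a Java string literal the backslash itself is doubled:
- \\^\\d+ - match character '^' followed by one or more digits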
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "This date 10 March 2015 is not ^3 years before this one: 10 January 2018 ";
Pattern reSpecial = Pattern.compile("\\^\\d+" );
Matcher matchSpecial = reSpecial.matcher(str);
while (matchSpecial.find()) {
System.out.println(matchSpecial.group());
}
result:
^3
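As an alternative to escaping by hand, Pattern.quote turns a string into a literal pattern. The sketch below compares the unescaped, the escaped and the quoted form; the class SpecialCharsDemo and the helper findAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecialCharsDemo {
    // Collects every match of the regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "This date 10 March 2015 is not ^3 years before this one";

        // Unescaped '^' is the start-of-input anchor, and the text starts with 'T'.
        System.out.println(findAll("^\\d+", text));                       // []

        // Escaped by hand: '^' is matched as a literal character.
        System.out.println(findAll("\\^\\d+", text));                     // [^3]

        // Pattern.quote wraps the literal part so nothing in it is special.
        System.out.println(findAll(Pattern.quote("^") + "\\d+", text));   // [^3]
    }
}
```

Pattern.quote is mostly useful when the literal part comes from user input or a variable and its contents are not known in advance.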
Java 8 regular expression matching phone numbers
Phone numbers can be caught with one of these patterns:
- [\\d+\\-]+ - match a run of digits, '+' and '-' characters. It will also catch plain numbers like 0016505434877 or 11.
- \d{4}-\d{3}-\d{3}-\d{3} - a version which will catch explicit patterns like dddd-ddd-ddd-ddd
- (\d+)[0-9-( )]( |-)([0-9-( )]+) - a version catching free formats like 001 505 434 8774 and 9000-505-434-700
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = " My phone in April 11 was 001 505 434 8774 but since June is 9000-505-434-700";
Pattern rePhone = Pattern.compile("(\\d+)[0-9-( )]( |-)([0-9-( )]+)" );
Matcher matchPhone = rePhone.matcher(str);
while (matchPhone.find()) {
System.out.println(matchPhone.group());
}
result:
001 505 434 8774
9000-505-434-700
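The three phone patterns listed above behave quite differently on the same text. The following sketch runs all of them on the sample string; the class PhonePatternCompare and the helper findAll are assumptions for illustration, and the matches are trimmed because the free-format pattern can include a trailing space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhonePatternCompare {
    static final String RUN_OF_DIGITS = "[\\d+\\-]+";
    static final String EXPLICIT = "\\d{4}-\\d{3}-\\d{3}-\\d{3}";
    static final String FREE_FORMAT = "(\\d+)[0-9-( )]( |-)([0-9-( )]+)";

    // Collects every match of the regex, trimmed of surrounding whitespace.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group().trim());
        }
        return found;
    }

    public static void main(String[] args) {
        String str = " My phone in April 11 was 001 505 434 8774 but since June is 9000-505-434-700";

        // Splits the spaced number and also catches the day of the month:
        // [11, 001, 505, 434, 8774, 9000-505-434-700]
        System.out.println(findAll(RUN_OF_DIGITS, str));

        // Only the dddd-ddd-ddd-ddd shape: [9000-505-434-700]
        System.out.println(findAll(EXPLICIT, str));

        // Both numbers: [001 505 434 8774, 9000-505-434-700]
        System.out.println(findAll(FREE_FORMAT, str));
    }
}
```

Which pattern fits depends on the data: the explicit version is the safest when the format is known, the free-format version is more tolerant but can pick up other digit groups separated by spaces or hyphens.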