1.

Python RegEx

  • Regex Matching

Python's built-in re module provides functions for regular-expression (regex) matching.

import re
landline = re.compile(r'\d\d\d\d-\d\d\d\d')
num = landline.search('LandLine Number is 2435-4153')
print('Landline Number is: {}'.format(num.group()))
Output:
Landline Number is: 2435-4153

The example above uses regex matching to extract the landline number from the string. search() stores the resulting match object in the num variable, and num.group() returns the matched text.
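search() returns a match object only when the pattern is found. Otherwise it returns None, so calling group() on a failed search raises AttributeError. The following minimal sketch guards against that case and uses findall() to collect every match. The helper name find_landline and the sample strings are hypothetical, chosen only for illustration.

```python
import re

# \d{4} is shorthand for \d\d\d\d
landline = re.compile(r'\d{4}-\d{4}')

def find_landline(text):
    """Hypothetical helper: return the first landline number in text, or None."""
    match = landline.search(text)
    # search() returns None when nothing matches, so check before calling group()
    return match.group() if match else None

print(find_landline('LandLine Number is 2435-4153'))  # 2435-4153
print(find_landline('No number here'))                # None

# findall() returns every non-overlapping match as a list of strings
print(landline.findall('Home 2435-4153, Office 9876-5432'))  # ['2435-4153', '9876-5432']
```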

  • Parenthesis Grouping

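A group is a part of a regex pattern enclosed in parentheses (). Parentheses split a match into separate groups, which can be accessed with the group() method. group(0) (or just group()) returns the entire match. group(1), group(2), ... return the individual groups, numbered by their opening parenthesis from left to right.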

import re
landline = re.compile(r'(\d\d\d\d)-(\d\d\d\d)')
num = landline.search('LandLine Number is 2435-4153')
# group(0) returns the entire match
print(num.group(0))
# group(1) returns the text captured by the first pair of parentheses
print(num.group(1))
# group(2) returns the text captured by the second pair of parentheses
print(num.group(2))
Output:
2435-4153
2435
4153
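Groups can also be retrieved all at once, or given names with the (?P<name>...) syntax so they can be looked up by name instead of by position. The following is a small sketch reusing the landline example. The group names area and local are hypothetical labels chosen for illustration.

```python
import re

# (?P<name>...) names a group; it can still be accessed by its number as well
landline = re.compile(r'(?P<area>\d{4})-(?P<local>\d{4})')
num = landline.search('LandLine Number is 2435-4153')

print(num.groups())       # ('2435', '4153') - all numbered groups as a tuple
print(num.group('area'))  # 2435
print(num.group(2))       # 4153 - named groups keep their numeric index too
print(num.groupdict())    # {'area': '2435', 'local': '4153'}
```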
  • Regex Symbols in Python

Regex symbols each have a specific meaning. The most common ones are listed in the table below:

Symbol       Matches
+            One or more of the preceding group
*            Zero or more of the preceding group
?            Zero or one of the preceding group
^name        String must begin with name
name$        String must end with name
.            Any character except a newline (\n)
{n}          Exactly n of the preceding group
{n,}         n or more of the preceding group
{,n}         0 to n of the preceding group
{n,m}        n to m of the preceding group
*?           Non-greedy (lazy) matching of the preceding group: as few repetitions as possible
[abc]        Any one character enclosed in the brackets
[^abc]       Any one character not enclosed in the brackets
\d, \w, \s   A digit, word character (letter, digit, or underscore), or whitespace character, respectively
\D, \W, \S   Any character except a digit, word character, or whitespace character, respectively
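Several of the symbols in the table can be tried out directly with re.search() and re.findall(). The following is a small sketch using made-up sample strings. It covers only a handful of rows and is not a complete tour of the syntax.

```python
import re

# ^ and $ anchor the pattern to the start and end of the string
print(bool(re.search(r'^Hello', 'Hello world')))  # True
print(bool(re.search(r'^world', 'Hello world')))  # False
print(bool(re.search(r'world$', 'Hello world')))  # True

# ? makes the preceding character optional
print(re.findall(r'colou?r', 'color colour'))     # ['color', 'colour']

# {2,3} repeats the preceding group 2 to 3 times, greedily
print(re.findall(r'\d{2,3}', '7 42 1234'))        # ['42', '123']

# Greedy .* takes as much as possible; non-greedy .*? stops at the first chance
html = '<b>Ann</b> and <b>Bob</b>'
print(re.findall(r'<b>(.*)</b>', html))           # ['Ann</b> and <b>Bob']
print(re.findall(r'<b>(.*?)</b>', html))          # ['Ann', 'Bob']

# [^aeiou] matches any one character that is not a lowercase vowel
print(re.findall(r'[^aeiou]', 'queue'))           # ['q']
```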

Example:

Here we define a regex pattern (written as a raw string, so backslashes do not need to be doubled):

address = r"(\d*)\s?(.+),\s(.+)\s([A-Z]{2,3})\s(\d{4})"

From the above table, we can explain some of the symbols in this pattern:


  • \s?: Zero or one whitespace character.

  • (\d*): Zero or more digits.

  • (.+): One or more characters (any character except a newline).

  • \s: Exactly one whitespace character.

  • ([A-Z]{2,3}): Two or three uppercase letters.

  • (\d{4}): Exactly four digits.
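To see how these pieces combine, the pattern can be applied to an address. The pattern itself is from the text above. The sample address is invented for illustration, and the variable names (street, city, state, postcode) are only a guess at what each group is meant to capture. The pattern appears to target addresses of the form "number street, city STATE postcode".

```python
import re

# The address pattern from above, compiled for reuse
address = re.compile(r"(\d*)\s?(.+),\s(.+)\s([A-Z]{2,3})\s(\d{4})")

# Hypothetical sample address
match = address.search("42 Wallaby Way, Sydney NSW 2000")
if match:
    # Hypothetical names for the five groups
    number, street, city, state, postcode = match.groups()
    print(number)    # 42
    print(street)    # Wallaby Way
    print(city)      # Sydney
    print(state)     # NSW
    print(postcode)  # 2000

# The pattern requires ",\s", so an address without a comma does not match
print(address.search("42 Wallaby Way Sydney NSW 2000"))  # None
```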

