Python for Network Engineers

Regular Expression Functions in Python with examples

by: George El., February 2021, Reading time: 5 minutes

In this post I will explain the main functions in the re module and then go on to discuss capturing and non-capturing groups, the non-greedy operator, and positive and negative look ahead and look behind assertions. Regular expressions can be a little intimidating in the beginning, but once you learn how to use them, you can use them in almost every language. If you don’t want to bother, you can use textfsm to parse your output; for the general cases, it should be sufficient. See my other post on using textfsm to parse cli output.

search(pattern, string, flags=0)

The first function is search. It returns only the first match, or None if there is no match. Examples:


import re

re.search(r"\d{2}", "abc 123 456", flags=0)

    <re.Match object; span=(4, 6), match='12'>

re.search("[0-9][0-5]", "18 15", flags=0)

    <re.Match object; span=(3, 5), match='15'>

re.search("george","hello George", flags=re.I)

    <re.Match object; span=(6, 12), match='George'>
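Because search returns None when nothing matches, it is safer to test the result before calling methods on it. The following is a minimal sketch of that check; the sample line and the variable names are made up for illustration only.

```python
import re

# Hypothetical line of CLI output, used only for illustration
line = "GigabitEthernet0/1 is up, line protocol is up"

m = re.search(r"\d+/\d+", line)
if m:
    # group() returns the matched text, span() the (start, end) indexes
    print(m.group(), m.span())
else:
    print("no match")

# No digits in this string, so search returns None
missing = re.search(r"\d+", "no digits here")
print(missing)
```

Calling missing.group() here would raise an AttributeError, which is why the if test comes first.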

match(pattern, string, flags=0)

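match only looks for a match at the beginning of the string and returns None otherwise, as in the first example below, which prints nothing. Examples: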

re.match("george","hello George", flags=re.I)

re.match("george","George Hello", flags=re.I)

    <re.Match object; span=(0, 6), match='George'>

fullmatch(pattern, string, flags=0)

fullmatch returns a match only if the whole string matches the regular expression pattern, and None otherwise. Examples:

re.fullmatch("george","hello George", flags=re.I)

re.fullmatch("george","George", flags=re.I)

    <re.Match object; span=(0, 6), match='George'>
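To make the difference between the three functions easier to see, here is a small side-by-side sketch that applies one pattern to the same string. The pattern and the sample text are assumptions chosen for illustration.

```python
import re

# The same pattern applied with the three functions
pattern = r"\d+"
text = "vlan 100"

print(re.search(pattern, text))       # finds '100' anywhere in the string
print(re.match(pattern, text))        # None: the string does not start with digits
print(re.fullmatch(pattern, text))    # None: the whole string is not digits
print(re.fullmatch(pattern, "100"))   # matches: the whole string is digits
```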

split(pattern, string, maxsplit=0, flags=0)

split is among the most useful functions. It splits a string wherever the regular expression matches.

re.split(r"\s+", "hello how are you?")

    ['hello', 'how', 'are', 'you?']
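The maxsplit parameter in the signature limits how many splits are made, and a capturing group in the pattern keeps the separators in the result. A short sketch of both behaviours, with made-up sample strings:

```python
import re

text = "hello how are you?"

# maxsplit=1 splits only once; the rest of the string is kept whole
first_split = re.split(r"\s+", text, maxsplit=1)
print(first_split)   # ['hello', 'how are you?']

# A capturing group in the pattern keeps the separators in the result
with_seps = re.split(r"(,)", "a,b,c")
print(with_seps)     # ['a', ',', 'b', ',', 'c']
```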

findall(pattern, string, flags=0)

Returns all non-overlapping matches of pattern in string, as a list of strings.

re.findall(r"\d+", "123 today 456 tonight 79")

    ['123', '456', '79']

If one or more groups are present in the pattern, returns a list of groups; this will be a list of tuples if the pattern has more than one group.

re.findall(r"(\d+)", "123 today 456 tonight 79")

    ['123', '456', '79']

re.findall(r"(\d+) (\w+)", "123 today 456 tonight 79")

    [('123', 'today'), ('456', 'tonight')]

finditer(pattern, string, flags=0)

Returns an iterator object. You can cycle through the iterator to get the individual match objects. Examples:

items = re.finditer(r"\d+", "123 today 456 tonight 79")
for item in items:
    print(item)

    <re.Match object; span=(0, 3), match='123'>
    <re.Match object; span=(10, 13), match='456'>
    <re.Match object; span=(22, 24), match='79'>

items = re.finditer(r"(\d+) (\w+)", "123 today 456 tonight 79")
for item in items:
    print(item)

    <re.Match object; span=(0, 9), match='123 today'>
    <re.Match object; span=(10, 21), match='456 tonight'>

items = re.finditer(r"(\d+) (\w+)", "123 today 456 tonight 79")
for item in items:
    print(item)
    print(item.group(0))
    print(item.group(1))
    print(item.group(2))

    <re.Match object; span=(0, 9), match='123 today'>
    123 today
    123
    today
    <re.Match object; span=(10, 21), match='456 tonight'>
    456 tonight
    456
    tonight

sub(pattern, repl, string, count=0, flags=0)

Replaces every match of pattern with the repl string; repl can also be a function that receives the match object and returns the replacement text. You can specify a count to replace only the first count matches. You can also use backreferences to refer to capturing groups, which are covered later. Examples:

re.sub(r"(\d+)", "number", "123 today 456 tonight 79")

    'number today number tonight number'

re.sub(r"(\d+)", r"*\1*", "123 today 456 tonight 79")

    '*123* today *456* tonight *79*'

re.sub(r"(\w+) (\w+)", r"\2 \1", "first_name last_name")

    'last_name first_name'

def replace1(match):
    # Replace each number with the count of its digits
    return str(len(match.group(0)))


x = re.sub(r"\d+", replace1, "123 today 456 tonight 79")
print(x)

    3 today 3 tonight 2
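The count parameter mentioned above is not shown in the examples, so here is a brief sketch of it, together with the \g<1> form of a backreference. The sample text is the same one used throughout this section.

```python
import re

text = "123 today 456 tonight 79"

# count=1 replaces only the first match
once = re.sub(r"\d+", "number", text, count=1)
print(once)    # 'number today 456 tonight 79'

# \g<1> is an unambiguous way to write the backreference \1,
# useful when the replacement continues with a digit
padded = re.sub(r"(\d+)", r"\g<1>0", text)
print(padded)  # '1230 today 4560 tonight 790'
```

Writing r"\10" instead would be read as a reference to group 10, which is why \g<1> is used here.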

subn(pattern, repl, string, count=0, flags=0)

subn works like sub, but it also returns the number of replacements made.

re.subn(r"(\d+)", "number", "123 today 456 tonight 79")

    ('number today number tonight number', 3)

escape(pattern)

Escapes the special characters in a string so that it can be used as a literal pattern.

re.escape('http://www.python.org')

    'http://www\\.python\\.org'
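A typical reason to use escape is searching for literal text that contains dots. The sketch below uses a made-up address to show how the unescaped version matches more than intended.

```python
import re

# Hypothetical literal text we want to find, used only for illustration
address = "10.1.1.1"

# Unescaped, each "." matches any character, so this also matches "10a1b1c1"
print(bool(re.search(address, "10a1b1c1")))                    # True

# Escaped, the dots are literal and the lookalike no longer matches
print(bool(re.search(re.escape(address), "10a1b1c1")))         # False
print(bool(re.search(re.escape(address), "ip 10.1.1.1 up")))   # True
```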

Non-greedy operator ?

By default, + and * match as many characters as possible. If you put a ? after them, they match as few characters as possible. Example:

re.search(r"\d+", "abc 123456")

    <re.Match object; span=(4, 10), match='123456'>

re.search(r"\d+?", "abc 123456")

    <re.Match object; span=(4, 5), match='1'>
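The difference matters most when the pattern has something to match after the repeated part. A small sketch with a made-up string of bracketed items:

```python
import re

text = "<a><b>"

# Greedy: .* runs to the last ">" in the string
greedy = re.search(r"<.*>", text)
print(greedy.group())  # '<a><b>'

# Non-greedy: .*? stops at the first ">" it can
lazy = re.search(r"<.*?>", text)
print(lazy.group())    # '<a>'
```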

Capturing () and non-capturing (?:) groups

Everything inside () is called a group. Groups can be unnamed or named. Unnamed groups are referenced by number: \1, \2, etc. Capturing groups are often used with the sub function. A non-capturing group (?:) still takes part in the match, so group(0) returns the whole matched text, but its content is not saved as a numbered group. Examples:

m = re.search(r"(abc) (\d+)", "abc 123456 abc def")
m.group(0)
    'abc 123456'
m.group(1)
    'abc'
m.group(2)
    '123456'

m = re.search(r"(?:abc) (\d+)", "abc 123456 abc def")
m.group(0)
    'abc 123456'
m.group(1)
    '123456'
m.group(2)
    IndexError: no such group

Named groups

You can use (?P<name>...) to name a capturing group and then refer to it by name.

m = re.search(r"(?:abc) (?P<digits>\d+)", "abc 123456 abc def")
m.group('digits')
    '123456'
m.group(0)
    'abc 123456'
m.group(1)
    '123456'
m.groupdict()
    {'digits': '123456'}
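Named groups are handy when you want a dictionary straight out of a line of text. The following is only a sketch: the sample line, the pattern and the group names are all assumptions chosen for illustration.

```python
import re

# Hypothetical line of CLI output; the group names are made up too
line = "GigabitEthernet0/1 is up, line protocol is down"

pattern = (
    r"(?P<interface>\S+) is (?P<status>\w+), "
    r"line protocol is (?P<protocol>\w+)"
)
m = re.search(pattern, line)
if m:
    print(m.groupdict())
    # {'interface': 'GigabitEthernet0/1', 'status': 'up', 'protocol': 'down'}
```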

Positive look behind: (?<=)

Continuing from the previous example, if I don’t want the first group to be part of the match at all, I have to use a positive look behind. This means I want my pattern to be preceded by something, but I don’t want that something included in the match. Example:

m = re.search(r"(?<=def) (\d+)", "abc 123456 abc def 789")
m.group(0)
    ' 789'
m.group(1)
    '789'

Negative look behind: (?<!)

This means that I don’t want my pattern to be preceded by something. As with the positive look behind, the text that is checked is not part of the match. Example:

m = re.search(r"(?<!abc) (\d+)", "abc 123456 abc def 789")
m.group(0)
    ' 789'
m.group(1)
    '789'

Positive look ahead: (?=)

I want my pattern to be followed by whatever is in the (?=), but I don’t want that text included in the match.

m = re.search(r"(\d+) (?=abc)", "123456 abc def 789 hij")
m.group(0)
    '123456 '
m.group(1)
    '123456'
m = re.search(r"(\d+) (?=hij)", "123456 abc def 789 hij")
m.group(0)
    '789 '
m.group(1)
    '789'

Negative look ahead: (?!)

I want my pattern to not be followed by whatever is in the (?!)

m = re.search(r"(\d+) (?!abc)", "123456 abc def 789 hij")
m.group(0)
    '789 '
m.group(1)
    '789'
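Look ahead and look behind combine well with findall, because the text they check is never returned. A short sketch with a made-up string of ping-like values:

```python
import re

text = "rtt 12ms ttl 64 rtt 15ms"

# Look ahead: digits followed by "ms"; "ms" is checked but not returned
with_unit = re.findall(r"\d+(?=ms)", text)
print(with_unit)     # ['12', '15']

# Look behind: digits preceded by "ttl "; the label is not returned
after_label = re.findall(r"(?<=ttl )\d+", text)
print(after_label)   # ['64']
```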

I hope this was useful. See you in my next post.
