The Python re (regular expression) module

Date: June 8, 2015

The re module

Python uses the re module to match strings against regular expressions. To see what re provides, read ~/installs/python/lib/python2.7/re.py. The interfaces used most often are the following:

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

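re.match tries to match a pattern at the beginning of the string. The first argument is the regular expression, the second is the string to match against, and the third is the flags value, which defaults to 0. It returns a match object if the pattern matches at the start, and None otherwise.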
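As a minimal sketch of the behaviour described above (the sample strings and variable names are made up for illustration), the following shows a successful match, a failed one, and the optional flags argument:

```python
import re

# The pattern is only tried at position 0 of the string.
m = re.match(r"\d+", "123abc")
print(m.group(0))        # 123

# Digits appear later in the string, but not at the start, so match() fails.
miss = re.match(r"\d+", "abc123")
print(miss)              # None

# flags is optional; re.IGNORECASE makes the match case-insensitive.
m_flag = re.match(r"hello", "Hello world", re.IGNORECASE)
print(m_flag.group(0))   # Hello
```

Because a failed match returns None, code normally checks the result before calling methods on it.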

def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)

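re.search looks for the pattern anywhere in the string and stops at the first match; it returns None if nothing matches. Its arguments are the same as those of re.match. The difference is that match only tries the pattern at the start of the string, whereas search scans the whole string.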
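The difference between the two functions is easiest to see on a string whose match is not at the start. This is an illustrative sketch; the sample text is an assumption, not taken from the original article:

```python
import re

text = "order id: 42, qty: 7"

# match() fails because the string does not begin with a digit.
at_start = re.match(r"\d+", text)

# search() scans forward and stops at the first match it finds.
first = re.search(r"\d+", text)

print(at_start)          # None
print(first.group(0))    # 42
print(first.start())     # 10
```

Note that search() reports only the first hit (42); the later 7 is ignored.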

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

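re.findall returns every non-overlapping match in the string as a list. If the pattern contains groups, the list holds the captured groups instead of the full matches.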
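The shape of the returned list depends on how many groups the pattern has. A small sketch with made-up input shows the three cases:

```python
import re

text = "a=1, b=22, c=333"

# No groups: a list of the full matches.
nums = re.findall(r"\d+", text)
print(nums)     # ['1', '22', '333']

# One group: a list of that group's text only.
keys = re.findall(r"(\w)=\d+", text)
print(keys)     # ['a', 'b', 'c']

# Several groups: a list of tuples, one tuple per match.
pairs = re.findall(r"(\w)=(\d+)", text)
print(pairs)    # [('a', '1'), ('b', '22'), ('c', '333')]
```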

def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)

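re.compile compiles a regular expression into a pattern object. Compiling a frequently used expression once and reusing the resulting object avoids repeated compilation and can make matching more efficient.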
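A minimal sketch of the compile-once, reuse-many pattern follows; the name word_re and the sample lines are hypothetical. The compiled object exposes the same match, search and findall methods, without the pattern argument:

```python
import re

# Compile once, with the flags baked into the pattern object.
word_re = re.compile(r"[a-z]+", re.IGNORECASE)

lines = ["Hello world", "123", "re module"]
results = [word_re.findall(line) for line in lines]
print(results)           # [['Hello', 'world'], [], ['re', 'module']]

# The same object can also be used for search() or match().
first = word_re.search("42 apples")
print(first.group(0))    # apples
```

The re module also keeps an internal cache of recently compiled patterns, so the gain from an explicit compile is likely to matter mostly in tight loops or when many different patterns are in use.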

As noted above, search() and match() return a match object. The attributes and methods of the match object are described below.

Match object

Attributes:

Methods:

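group([group1, …]):
Returns the string captured by one or more groups; when several arguments are given, the result is a tuple. group1 can be a group number or a group name. Number 0 stands for the whole matched substring, and calling group() with no argument returns group(0). A group that captured nothing returns None, and a group that captured several times returns its last capture. A moderately complex example: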

m = re.match(r"(?P<int>\d+)\.(\d*)", '3.14')

After this match, m.group(0) is '3.14', m.group(1) is '3', and m.group(2) is '14'.
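The rules for group() listed above can be checked with a short sketch. The first pattern is the one from the example; the opt and rep patterns are added here purely for illustration:

```python
import re

m = re.match(r"(?P<int>\d+)\.(\d*)", "3.14")

print(m.group())         # 3.14  (same as m.group(0))
print(m.group("int"))    # 3     (lookup by group name)
print(m.group(1, 2))     # ('3', '14')

# A group that did not take part in the match yields None.
opt = re.match(r"(a)|(b)", "b")
print(opt.group(1))      # None

# A group that matched repeatedly keeps only its last capture.
rep = re.match(r"(\d)+", "123")
print(rep.group(1))      # 3
```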

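groups([default]):
Returns the strings captured by all groups as a tuple, equivalent to calling group(1, 2, …, last). default is the value used for groups that captured nothing; it defaults to None.
groupdict([default]):
Returns a dictionary whose keys are the names of the named groups and whose values are the substrings those groups captured. Unnamed groups are not included. default has the same meaning as above.
start([group]):
Returns the index in string where the substring captured by the given group starts (the index of its first character). group defaults to 0.
end([group]):
Returns the index in string where the substring captured by the given group ends (the index of its last character + 1). group defaults to 0.
span([group]):
Returns (start(group), end(group)).
expand(template):
Substitutes the matched groups into template and returns the result.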
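The remaining methods can be exercised together in one short sketch. It reuses the pattern from the group() example; the input strings, the template and the opt pattern are assumptions chosen for illustration:

```python
import re

m = re.search(r"(?P<int>\d+)\.(\d*)", "pi = 3.14")

print(m.groups())        # ('3', '14')
print(m.groupdict())     # {'int': '3'}  (only named groups appear)
print(m.span())          # (5, 9)  i.e. (m.start(), m.end())
print(m.span(2))         # (7, 9)  position of the second group

# In the template, \2 refers to group 2 and \g<int> to the named group.
print(m.expand(r"\2.\g<int>"))   # 14.3

# default replaces groups that did not participate in the match.
opt = re.match(r"(\d+)(?:\.(\d+))?", "7")
print(opt.groups())      # ('7', None)
print(opt.groups("0"))   # ('7', '0')
```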