
Re regular expressions

2022-08-05 07:11:00 【m0_52339560】

Event page: CSDN 21-Day Learning Challenge

re regular expressions

Regular expressions are powerful but fairly complicated, and they are not easy to write from the official documentation alone. This is a brief record; becoming proficient takes practice. After writing part of it, I found the topic a bit difficult to write up.

Overview

A regular expression (also called an RE, a regex, or a regex pattern) is essentially a tiny, highly specialized programming language embedded in Python.

Regular expressions can be used to parse and process strings. The most common operations are matching, replacing, and splitting.
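As a quick illustration of those three operations, here is a minimal sketch using re.findall, re.sub, and re.split from the standard library. The sample text and the separator pattern are made up for this example.

```python
import re

text = "apple, banana;cherry"

# Matching: find every run of lowercase letters.
words = re.findall(r"[a-z]+", text)

# Replacing: turn each comma or semicolon (plus any following spaces) into " | ".
replaced = re.sub(r"[,;]\s*", " | ", text)

# Splitting: cut the string at the same separators.
parts = re.split(r"[,;]\s*", text)

print(words)     # ['apple', 'banana', 'cherry']
print(replaced)  # apple | banana | cherry
print(parts)     # ['apple', 'banana', 'cherry']
```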

Matching

Matching characters is the most important part of a regular expression.

Most letters and characters simply match themselves. For example, the regular expression test matches the string 'test' exactly. Note: unless the relevant flag is set, regular expressions are strictly case-sensitive.

Besides ordinary characters, regular expressions also contain some special characters, which are called metacharacters. The commonly used metacharacters are:
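The case-sensitivity note can be checked with a short sketch. It assumes that the flag meant above is re.IGNORECASE, which is one standard way to relax case matching.

```python
import re

# With no flags, the pattern 'test' matches only the exact lowercase spelling.
exact = re.match(r"test", "test")
wrong_case = re.match(r"test", "Test")
print(exact)       # a Match object
print(wrong_case)  # None

# Assumption: re.IGNORECASE is the kind of parameter meant above.
relaxed = re.match(r"test", "Test", re.IGNORECASE)
print(relaxed.group())  # Test
```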

. ^ $ * + ? { } [ ] \ | ( )

These are introduced one by one below:

.: Matches any character except a newline.
^: Matches the beginning of the string.
$: Matches the end of the string.
*: Matches the preceding regular expression zero or more times, as many times as possible.
+: Matches the preceding regular expression one or more times, as many times as possible.
?: Matches the preceding regular expression zero or one time.
{m}: Matches exactly m repetitions of the preceding regular expression; fewer than m causes the match to fail.
{m,n}: Matches from m to n repetitions of the preceding regular expression, taking as many as possible.
{m,n}?: Matches from m to n repetitions of the preceding regular expression, taking as few as possible.
[...]: Matches any one character listed inside the brackets. Characters can be listed individually, or a range can be written by joining its first and last characters with -. For example, [a-z] matches all lowercase letters: in terms of ASCII codes, it matches every character from a to z.
[^...]: Matches any one character not listed inside the brackets.
|: In A|B, where A and B can be arbitrary regular expressions, matches either A or B.
(): Grouping. Matches the regular expression inside the parentheses. After the match completes, the text matched by each group can be extracted.
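The metacharacters listed above can be tried one at a time. The sketch below runs re.search on a few made-up pattern and text pairs and collects what each one matches.

```python
import re

# (pattern, text) pairs; re.search returns the first match found anywhere.
cases = [
    (r"a.c", "abc"),          # .      any one character except a newline
    (r"^ab", "abc"),          # ^      anchored at the start
    (r"bc$", "abc"),          # $      anchored at the end
    (r"ab*", "abbbc"),        # *      zero or more b, as many as possible
    (r"ab+", "abbbc"),        # +      one or more b, as many as possible
    (r"ab?", "abbbc"),        # ?      zero or one b
    (r"ab{2}", "abbbc"),      # {m}    exactly two b
    (r"ab{1,2}?", "abbbc"),   # {m,n}? one to two b, as few as possible
    (r"[a-c]+", "abcd"),      # [...]  any of a, b, c
    (r"[^a-c]+", "abcd"),     # [^...] anything except a, b, c
    (r"cat|dog", "hotdog"),   # |      either alternative
]

results = [re.search(pattern, text).group() for pattern, text in cases]
print(results)
# ['abc', 'ab', 'bc', 'abbb', 'abbb', 'ab', 'abb', 'ab', 'abc', 'd', 'dog']
```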

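There are also some special sequences:

\d \D \s \S \w \W

Their functions are as follows:

\d
Matches any decimal digit; this is equivalent to [0-9].
\D
Matches any non-digit character; this is equivalent to [^0-9].
\s
Matches any whitespace character; this is equivalent to [ \t\n\r\f\v].
\S
Matches any non-whitespace character; this is equivalent to [^ \t\n\r\f\v].
\w
Matches any alphanumeric character or underscore; this is equivalent to [a-zA-Z0-9_].
\W
Matches any character that is not alphanumeric or an underscore; this is equivalent to [^a-zA-Z0-9_].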
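A small sketch of these sequences in use, on an invented ASCII sample string:

```python
import re

text = "Room 42, floor_3\t(east wing)"

digits = re.findall(r"\d+", text)       # runs of digits
words = re.findall(r"\w+", text)        # runs of letters, digits, underscores
spaces = re.findall(r"\s", text)        # each whitespace character
non_words = re.findall(r"\W+", text)    # runs of everything that is not \w

print(digits)     # ['42', '3']
print(words)      # ['Room', '42', 'floor_3', 'east', 'wing']
print(spaces)     # [' ', ' ', '\t', ' ']
print(non_words)  # [' ', ', ', '\t(', ' ', ')']
```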

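The backslash plague

In a Python string literal, the character \ must be written as \\.

Suppose you write a regex to match the string '\section'. The regex itself must be \\section, because the backslash has to be escaped for the regex engine, and as an ordinary Python string literal that becomes '\\\\section'.

In a regex that uses backslashes repeatedly, this leads to long runs of backslashes and makes the resulting string hard to read. Raw string notation, such as the r'\d+' used later in this article, avoids the doubling: r'\\section' is the same pattern.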
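To see the backslash counts concretely, the sketch below compares an ordinary string literal with a raw string literal (the r'...' prefix). Both spell the same pattern; the sample text is made up.

```python
import re

text = r"\section{Intro}"  # the text contains one literal backslash

# Ordinary string literal: four backslashes are needed in the source code.
plain = re.search("\\\\section", text)

# Raw string literal: the backslashes reach the regex engine unchanged.
raw = re.search(r"\\section", text)

same_pattern = "\\\\section" == r"\\section"
print(same_pattern)   # True
print(plain.group())  # \section
print(raw.group())    # \section
```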

Using regular expressions

Compiling regular expressions

A regular expression is compiled into a pattern object, and the various operations are then performed through that pattern object.

import re
p = re.compile('ab*')  # compile the regular expression into a pattern object
print(type(p))  # <class 're.Pattern'>

res = p.match('abc')
print(type(res))  # <class 're.Match'>

Performing matches

Once you have an object representing a compiled regular expression, what do you do with it? Pattern objects have several methods and attributes. Only the most important ones are covered here.

Method / Attribute	Purpose
`match()`	Determines whether the regex matches at the beginning of the string. The whole string does not have to match.
`search()`	Scans the string, looking for any location where the regex matches.
`findall()`	Finds all substrings the regex matches and returns them as a list.
`finditer()`	Finds all substrings the regex matches and returns them as an iterator.

match(): matches at the beginning of the string and returns a Match object, or None if there is no match

import re
p = re.compile('[a-z]+')
p.match(" ")  #None
m = p.match("tempo")
print(m) #<re.Match object; span=(0, 5), match='tempo'>
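Because match() only anchors at the start of the string, it can succeed even when trailing text does not fit the pattern. A short sketch of the difference, using Pattern.fullmatch() for the case where the entire string must match (the input strings are made up):

```python
import re

p = re.compile(r"[a-z]+")

# match() only requires the pattern to match starting at position 0.
prefix = p.match("tempo123")
print(prefix.group())  # tempo

# It does not skip leading characters that fail to match.
skipped = p.match("123tempo")
print(skipped)  # None

# fullmatch() requires the whole string to match the pattern.
partial = p.fullmatch("tempo123")
whole = p.fullmatch("tempo")
print(partial)        # None
print(whole.group())  # tempo
```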

search(): scans the whole string for the first location where the regex matches and returns a Match object

p = re.compile('[a-z]+')
m = p.search("::: message")
print(m)
# <re.Match object; span=(4, 11), match='message'>

findall(): returns a list of the matched strings

p = re.compile(r'\d+')
m = p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
print(m)
# ['12', '11', '10']

finditer(): returns the match objects as an iterator

iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
print(type(iterator))
# <class 'callable_iterator'>
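The iterator is usually consumed in a loop or a comprehension. A minimal sketch, reusing the full sentence from the findall() example, that collects each match together with its position:

```python
import re

p = re.compile(r"\d+")
text = "12 drummers drumming, 11 pipers piping, 10 lords a-leaping"

# Each item produced by finditer() is a Match object.
found = [(m.group(), m.span()) for m in p.finditer(text)]

for value, span in found:
    print(value, span)
# 12 (0, 2)
# 11 (22, 24)
# 10 (40, 42)
```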

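The re.Match objects returned by the code above have several commonly used methods.

Method / Attribute	Purpose
group()	Returns the string matched by the regex
start()	Returns the start position of the match
end()	Returns the end position of the match
span()	Returns a tuple containing the (start, end) positions of the match

Example: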

import re
p = re.compile('[a-z]+')
p.match(" ")  #None
m = p.match("tempo")
print(m) #<re.Match object; span=(0, 5), match='tempo'>


m.group() # 'tempo'
print(m.start(), m.end()) #0 5
print(m.span()) #(0, 5)

Grouping

If you need to extract part of the matched string, use groups.

p = re.compile(r'(\w*)\s(\w*).*')
res = p.match('abc word hello')
print(res.group(0)) #abc word hello
print(res.group(1)) #abc
print(res.group(2)) #word
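Match objects offer a few other ways to read groups. The sketch below reuses the same pattern and adds a variant with named groups; the names first and second are chosen only for this illustration.

```python
import re

p = re.compile(r"(\w*)\s(\w*).*")
res = p.match("abc word hello")

all_groups = res.groups()    # every group as a tuple
pair = res.group(1, 2)       # several groups in one call
second_span = res.span(2)    # position of group 2 in the string
print(all_groups)   # ('abc', 'word')
print(pair)         # ('abc', 'word')
print(second_span)  # (4, 8)

# Named groups: (?P<name>...) lets a group be read by name instead of by number.
named = re.match(r"(?P<first>\w+)\s(?P<second>\w+)", "abc word hello")
print(named.group("second"))  # word
print(named.groupdict())      # {'first': 'abc', 'second': 'word'}
```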

Greedy and non-greedy

This section introduces .* and .*?.

.*: matches as much as possible.
.*?: matches as little as possible.

s = '<html><head><title>Title</title>'
print(re.match('<.*>', s).group())
# <html><head><title>Title</title>
print(re.match('<.*?>', s).group())
# <html>
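The difference is easier to see when every match is collected. The sketch below applies findall() to the same string; the negated character class [^>]* is shown as one alternative way to get the same result as the non-greedy form here.

```python
import re

s = "<html><head><title>Title</title>"

greedy = re.findall(r"<.*>", s)     # one match running to the last '>'
lazy = re.findall(r"<.*?>", s)      # each match stops at the first '>'
negated = re.findall(r"<[^>]*>", s) # same result here, without '?'

print(greedy)   # ['<html><head><title>Title</title>']
print(lazy)     # ['<html>', '<head>', '<title>', '</title>']
print(negated)  # ['<html>', '<head>', '<title>', '</title>']
```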