Re regular expressions
2022-08-05 07:11:00 【m0_52339560】
Event: CSDN 21-Day Learning Challenge
re regular expressions
Regular expressions are powerful but fairly complex, and even with the official documentation at hand they are not easy to write. This post is a brief record; real proficiency only comes from using them more. After writing part of it, I found it harder to write up than I expected.
Overview
A regular expression (called an RE, a regex, or a regex pattern) is essentially a tiny, highly specialized programming language embedded in Python and exposed through the re module.
Regular expressions let you parse and process strings. The most common operations are matching, substitution, and splitting.
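A minimal sketch of those three operations using the module-level functions re.match, re.sub and re.split. The sample string (this post's timestamp) and the variable names are only for illustration.

```python
import re

text = "2022-08-05 07:11:00"

# Matching: does the string start with a date in YYYY-MM-DD form?
date_match = re.match(r"\d{4}-\d{2}-\d{2}", text)
print(date_match.group())  # 2022-08-05

# Substitution: replace every run of digits with '#'
masked = re.sub(r"\d+", "#", text)
print(masked)  # #-#-# #:#:#

# Splitting: break the string on '-', ' ' or ':'
parts = re.split(r"[- :]", text)
print(parts)  # ['2022', '08', '05', '07', '11', '00']
```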
Matching
Matching characters is the core of regular expressions.
Most letters and characters simply match themselves. For example, the regex test matches only the exact string 'test'. Note: unless the corresponding flag is set, regular expressions are case-sensitive.
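A short sketch of case sensitivity. The "relevant parameter" here is the re.IGNORECASE flag (alias re.I); the sample strings are arbitrary.

```python
import re

# By default, the pattern 'test' only matches the exact lowercase text
exact = re.match("test", "test")
upper = re.match("test", "TEST")
print(exact)   # <re.Match object; span=(0, 4), match='test'>
print(upper)   # None

# With re.IGNORECASE, letter case is ignored
ignored = re.match("test", "TEST", re.IGNORECASE)
print(ignored.group())  # TEST
```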
Besides ordinary characters, a regex can contain special characters called metacharacters. The commonly used metacharacters are:
. ^ $ * + ? { } [ ] \ | ( )
Each one is described below:
.: matches any character except a newline.
^: matches at the start of the string.
$: matches at the end of the string (or just before a newline at the end).
*: matches the preceding expression 0 or more times, taking as many repetitions as possible.
+: matches the preceding expression 1 or more times, taking as many repetitions as possible.
?: matches the preceding expression 0 or 1 times.
{m}: matches exactly m repetitions of the preceding expression; fewer than m makes the match fail.
{m,n}: matches m to n repetitions of the preceding expression, taking as many as possible.
{m,n}?: matches m to n repetitions of the preceding expression, taking as few as possible.
[...]: matches any one of the characters listed inside the brackets. Characters can be listed individually, or a range can be written by joining the first and last characters with -. For example, [a-z] matches all lowercase letters, i.e. every character whose code is between a and z.
[^...]: matches any character that is not listed inside the brackets.
|: in A|B, where A and B can be any regular expressions, matches either A or B.
(): grouping. Matches the regular expression inside the parentheses; after the match, the text captured by the group can be extracted.
\: escapes a metacharacter so that it matches literally (for example, \. matches a period), and also introduces the special sequences below.
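The metacharacters above can be tried out one at a time with re.findall and re.match. A small sketch; the sample strings are made up to show each behavior.

```python
import re

# '.' matches any character except a newline
dot = re.findall(r"a.c", "abc a-c a\nc")
print(dot)                    # ['abc', 'a-c']

# '^' and '$' anchor the match to the start and end of the string
starts = bool(re.search(r"^ab", "abc"))
ends = bool(re.search(r"ab$", "abc"))
print(starts, ends)           # True False

# '*', '+' and '?' repeat the preceding item 0+, 1+ and 0-1 times
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")
opt = re.findall(r"ab?", "a ab abbb")
print(star, plus, opt)        # ['a', 'ab', 'abbb'] ['ab', 'abbb'] ['a', 'ab', 'ab']

# {m} requires exactly m repetitions
three = re.findall(r"[0-9]{3}", "12 345 6789")
print(three)                  # ['345', '678']

# [...] and [^...] match one character inside / outside the set
inside = re.findall(r"[a-c]+", "abcxyzcab")
outside = re.findall(r"[^a-c]+", "abcxyzcab")
print(inside, outside)        # ['abc', 'cab'] ['xyz']

# '|' chooses between alternatives, '(...)' groups and captures
m = re.match(r"(cat|dog)s?", "dogs")
print(m.group(), m.group(1))  # dogs dog

# '\' makes a metacharacter literal
literal = re.findall(r"1\+1", "1+1=2")
print(literal)                # ['1+1']
```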
There are also some special sequences:
\d \D \s \S \w \W
What they do:
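\d matches any decimal digit; equivalent to [0-9].
\D matches any non-digit character; equivalent to [^0-9].
\s matches any whitespace character; equivalent to [ \t\n\r\f\v].
\S matches any non-whitespace character; equivalent to [^ \t\n\r\f\v].
\w matches any alphanumeric character or underscore; equivalent to [a-zA-Z0-9_].
\W matches any character that is neither alphanumeric nor an underscore; equivalent to [^a-zA-Z0-9_].
Note: these bracket equivalents hold exactly only when the re.ASCII flag is set. For str patterns, Python 3 uses Unicode matching by default, so \d, \s and \w also match non-ASCII digits, whitespace and letters (for example, \w matches é).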
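A quick sketch of the special sequences on a made-up string, including how the re.ASCII flag narrows \w to [a-zA-Z0-9_].

```python
import re

text = "Order 42: café\tdone"

digits = re.findall(r"\d+", text)
spaces = re.findall(r"\s", text)
words = re.findall(r"\w+", text)
print(digits)   # ['42']
print(spaces)   # [' ', ' ', '\t']
print(words)    # ['Order', '42', 'café', 'done']

# With re.ASCII, 'é' is no longer a word character
ascii_words = re.findall(r"\w+", text, re.ASCII)
print(ascii_words)  # ['Order', '42', 'caf', 'done']
```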
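The backslash plague
In an ordinary Python string literal, a backslash \ has to be written as \\.
Suppose you want a regex that matches the string '\section'. Because \ is a metacharacter, the regex itself must be \\section, and writing that as an ordinary Python string literal requires '\\\\section'.
In a regex that uses backslashes repeatedly, this produces long runs of repeated backslashes and makes the resulting string hard to read.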
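The usual way out is Python's raw string notation r"...", in which backslashes are not treated as escapes by Python, so only the regex-level escaping remains. A minimal sketch comparing the two spellings:

```python
import re

# Ordinary literal: every regex backslash must be doubled again for Python
escaped = re.compile("\\\\section")
# Raw literal: Python leaves the backslashes alone
raw = re.compile(r"\\section")

print(escaped.pattern == raw.pattern)   # True, both patterns are \\section

# The target string contains a single backslash followed by 'section'
target = "\\section"
print(raw.match(target).group())        # \section
```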
Using regular expressions
Compiling regular expressions
Compile a regular expression into a pattern object, then perform the various operations through that object's methods.
```python
import re

p = re.compile('ab*')  # compile the regular expression into a pattern object
print(type(p))         # <class 're.Pattern'>
res = p.match('abc')
print(type(res))       # <class 're.Match'>
```
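Compiling is not strictly required: the module-level functions such as re.match compile the pattern for you (and cache it). A small sketch showing that both routes give the same result, and that flags are passed at compile time; the inputs are arbitrary.

```python
import re

pattern = re.compile("ab*")
m1 = pattern.match("abbbc")    # method on the compiled pattern object
m2 = re.match("ab*", "abbbc")  # module-level shortcut, compiles internally
print(m1.group(), m2.group())  # abbb abbb

# Flags such as re.IGNORECASE are given when compiling
ci = re.compile("ab*", re.IGNORECASE)
print(ci.match("ABBc").group())  # ABB
```

Compiling explicitly mainly helps readability and reuse when the same pattern is applied many times.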
Performing matches
Once you have a compiled pattern object, what can you do with it? Pattern objects have several methods and attributes; only the most important ones are covered here.
| Method / attribute | Purpose |
|---|---|
| match() | Determine whether the regex matches at the beginning of the string. The rest of the string does not have to match. |
| search() | Scan through the string, looking for any location where the regex matches. |
| findall() | Find all substrings the regex matches and return them as a list. |
| finditer() | Find all substrings the regex matches and return them as an iterator of Match objects. |
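- match(): matches only at the beginning of the string; returns a Match object, or None if there is no match
```python
import re

p = re.compile('[a-z]+')
print(p.match(" "))  # None
m = p.match("tempo")
print(m)             # <re.Match object; span=(0, 5), match='tempo'>
```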
- search(): scans the whole string and returns a Match object for the first location where the regex matches
```python
import re

p = re.compile('[a-z]+')
m = p.search("::: message")
print(m)  # <re.Match object; span=(4, 11), match='message'>
```
- findall(): returns a list of all matching strings
```python
import re

p = re.compile(r'\d+')
m = p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
print(m)  # ['12', '11', '10']
```
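- finditer(): returns the Match objects one by one as an iterator
```python
# p is the r'\d+' pattern from the findall() example
iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
print(type(iterator))  # <class 'callable_iterator'>
for match in iterator:
    print(match.span())  # (0, 2), then (22, 24), then (29, 31)
```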
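To make the difference between match() and search() concrete, here is a small side-by-side sketch on made-up strings. It also uses fullmatch() (available on pattern objects since Python 3.4), which is the method that requires the entire string to match.

```python
import re

p = re.compile("[a-z]+")

at_start = p.match("tempo!")        # matches 'tempo'; the trailing '!' is not required to match
whole = p.fullmatch("tempo!")       # None: fullmatch() needs the whole string to match
no_start = p.match("::: message")   # None: the string does not begin with a letter
anywhere = p.search("::: message")  # search() finds 'message' further in

print(at_start, whole, no_start, anywhere, sep="\n")
```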
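The re.Match objects returned above have several commonly used methods:
| Method / attribute | Purpose |
|---|---|
| group() | Return the string matched by the regex |
| start() | Return the start position of the match |
| end() | Return the end position of the match |
| span() | Return a tuple containing the (start, end) positions of the match |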
Example:
```python
import re

p = re.compile('[a-z]+')
print(p.match(" "))        # None
m = p.match("tempo")
print(m)                   # <re.Match object; span=(0, 5), match='tempo'>
print(m.group())           # tempo
print(m.start(), m.end())  # 0 5
print(m.span())            # (0, 5)
```
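Because match() and search() return None when nothing matches, calling group() directly can raise AttributeError. A common pattern is to test the result first. The helper describe below is hypothetical, written only to illustrate the check.

```python
import re

p = re.compile("[a-z]+")

def describe(s):
    """Hypothetical helper: report where p matches at the start of s."""
    m = p.match(s)
    if m:  # m is None when there is no match
        return f"matched {m.group()!r} at {m.span()}"
    return "no match"

print(describe("tempo"))  # matched 'tempo' at (0, 5)
print(describe(" "))      # no match
```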
Grouping
To extract part of the matched string, use groups.
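```python
import re

p = re.compile(r'(\w*)\s(\w*).*')  # raw string so \w and \s reach the regex engine unchanged
res = p.match('abc word hello')
print(res.group(0))  # abc word hello  (group 0 is the whole match)
print(res.group(1))  # abc
print(res.group(2))  # word
```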
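Two related behaviors worth knowing: groups() returns all captured groups at once, and findall() returns the captured groups (as tuples when there are several) instead of the whole match. A small sketch; the key=value string is made up.

```python
import re

p = re.compile(r'(\w*)\s(\w*).*')
res = p.match('abc word hello')
print(res.groups())   # ('abc', 'word')
print(res.span(2))    # (4, 8), the position of group 2

# With groups in the pattern, findall() returns the groups, not the whole match
pairs = re.findall(r'(\w+)=(\d+)', 'a=1, b=22')
print(pairs)          # [('a', '1'), ('b', '22')]
```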
Greedy vs. non-greedy
This section compares .* and .*?.
.* matches as much as possible, while .*? matches as little as possible.
```python
import re

s = '<html><head><title>Title</title>'
print(re.match('<.*>', s).group())
# <html><head><title>Title</title>
print(re.match('<.*?>', s).group())
# <html>
```
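The same greedy/non-greedy distinction applies to the {m,n} and {m,n}? quantifiers from the metacharacter list. A short sketch, plus a common practical use of the non-greedy form: collecting each tag separately with findall().

```python
import re

s = "aaaaa"
greedy = re.match(r"a{2,4}", s).group()  # takes as many repetitions as allowed
lazy = re.match(r"a{2,4}?", s).group()   # takes as few repetitions as allowed
print(greedy, lazy)  # aaaa aa

html = '<html><head><title>Title</title>'
tags = re.findall(r'<.*?>', html)
print(tags)  # ['<html>', '<head>', '<title>', '</title>']
```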
References
- https://docs.python.org/zh-cn/3.8/howto/regex.html#match-versus-search
- https://docs.python.org/zh-cn/3.8/library/re.html#re-syntax
- https://blog.csdn.net/yuan2019035055/article/details/124217883