Regular expression learning
2022-06-26 09:23:00 【hellolianhua】
1 Syntax
1.1 The + sign means the preceding character must appear at least once (one or more times).
runoo+b matches runoob, runooob, runoooooob, and so on.
1.2 The * sign means the preceding character may appear zero times, once, or many times.
runoo*b matches runob, runoob, runoooooob, and so on.
1.3 The ? sign means the preceding character may appear at most once (zero times or once).
colou?r matches color or colour.
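The three quantifiers above can be tried directly in JavaScript. This is only a minimal sketch: the variable names and the sample word lists are made up for illustration, and the patterns are wrapped in ^ and $ (covered later in this article) so that the whole word has to match.

```javascript
// Anchored versions of the patterns from 1.1 to 1.3
const plus = /^runoo+b$/;     // the "o" after "runo" must occur at least once
const star = /^runoo*b$/;     // the "o" after "runo" may occur zero or more times
const optional = /^colou?r$/; // the "u" may occur zero times or once

const plusResults = ["runob", "runoob", "runoooooob"].map((w) => plus.test(w));
const starResults = ["runob", "runoob", "runoooooob"].map((w) => star.test(w));
const optionalResults = ["color", "colour", "colouur"].map((w) => optional.test(w));

console.log(plusResults);     // [ false, true, true ]
console.log(starResults);     // [ true, true, true ]
console.log(optionalResults); // [ true, true, false ]
```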
1.4 Ordinary characters
1.4.1 [ABC] matches any single character listed inside the brackets.
For example, [aeiou] matches every e, o, u, and a in the string "google runoob taobao".
1.4.2 [^ABC] matches any single character not listed inside the brackets.
For example, [^aeiou] matches every character in "google runoob taobao" except e, o, u, and a (the spaces are matched too).
1.4.3 [A-Z] denotes a range and matches any uppercase letter; [a-z] matches any lowercase letter.
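A short sketch of the three bracket forms, using the article's sample string. The variable names and the "Hello World" string are illustrative assumptions, not part of the original tutorial.

```javascript
const text = "google runoob taobao";

// [aeiou]: every character that is one of the listed vowels
const vowels = text.match(/[aeiou]/g);

// [^aeiou]: every character that is NOT listed, including the two spaces
const others = text.match(/[^aeiou]/g);

// [A-Z]: a range covering the uppercase letters
const upper = "Hello World".match(/[A-Z]/g);

console.log(vowels.join("")); // ooeuooaoao
console.log(others.join("")); // ggl rnb tb
console.log(upper);           // [ 'H', 'W' ]
```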
1.4.4 . matches any single character except the line breaks \n and \r. It is equivalent to [^\n\r].
For example, apply .+ to the following text:
<TagName1/>
<TagName2/>A quick brown fox jumps over lazy dog<TagName3/>
There are two matches, one per line:
Match1:<TagName1/>
Match2:<TagName2/>A quick brown fox jumps over lazy dog<TagName3/>
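The same experiment can be reproduced in JavaScript. This sketch assumes the two lines are separated by a single \n; the variable names are illustrative.

```javascript
const input =
  "<TagName1/>\n<TagName2/>A quick brown fox jumps over lazy dog<TagName3/>";

// . never crosses a line break, so .+ stops at the end of each line
const lineMatches = input.match(/.+/g);

console.log(lineMatches.length); // 2
console.log(lineMatches[0]);     // <TagName1/>
```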
1.4.5 [\s\S] matches any character at all. \s matches any whitespace character, including line breaks; \S matches any non-whitespace character.
For example:
\S matches any non-whitespace character (equivalent to [^\r\n\t\f\v ])
\S+ can be used to match words, and \s+ matches the whitespace between them. Applied to the text "quick brown", \S+ gives two matches:
Match1:quick
Match2:brown
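A small sketch contrasting \S+, . and [\s\S]. The two-line string is an assumed example, chosen to show that [\s\S] crosses a line break while . does not.

```javascript
// \S+ picks out the runs of non-whitespace characters (the words)
const words = "quick brown".match(/\S+/g);

const twoLines = "quick\nbrown";
// .+ stops at the line break
const dotMatch = twoLines.match(/.+/)[0];
// [\s\S]+ matches whitespace and non-whitespace alike, so it spans both lines
const anyMatch = twoLines.match(/[\s\S]+/)[0];

console.log(words);                       // [ 'quick', 'brown' ]
console.log(dotMatch);                    // quick
console.log(JSON.stringify(anyMatch));    // "quick\nbrown"
```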
1.4.6 \w matches any word character: a letter, a digit, or an underscore. It is equivalent to [A-Za-z0-9_].
For example, apply \w+ to the following text:
<TagName1/>
<TagName2/>A quick brown fox jumps over lazy dog<TagName3/>
Match 1 | 1-9 | TagName1 |
Match 2 | 14-22 | TagName2 |
Match 3 | 24-25 | A |
Match 4 | 26-31 | quick |
Match 5 | 32-37 | brown |
Match 6 | 38-41 | fox |
Match 7 | 42-47 | jumps |
Match 8 | 48-52 | over |
Match 9 | 53-57 | lazy |
Match 10 | 58-61 | dog |
Match 11 | 62-70 | TagName3 |
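The list of matches can be reproduced with matchAll. This sketch prints only the matched text, because the offsets in the table depend on how the testing tool counted the line break; the variable names are illustrative.

```javascript
const sample =
  "<TagName1/>\n<TagName2/>A quick brown fox jumps over lazy dog<TagName3/>";

// matchAll requires the g flag and yields one result object per match
const wordMatches = [...sample.matchAll(/\w+/g)].map((m) => m[0]);

console.log(wordMatches.length); // 11
console.log(wordMatches.join(" "));
// TagName1 TagName2 A quick brown fox jumps over lazy dog TagName3
```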
1.5 Non-printing characters
Character | Description |
---|---|
\cx | Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return. x must be one of A-Z or a-z; otherwise c is treated as a literal 'c' character. |
\f | Matches a form feed. Equivalent to \x0c and \cL. |
\n | Matches a line feed (newline). Equivalent to \x0a and \cJ. |
\r | Matches a carriage return. Equivalent to \x0d and \cM. |
\s | Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v]. Note that Unicode-aware regular expressions also match the full-width space. |
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]. |
\t | Matches a tab. Equivalent to \x09 and \cI. |
\v | Matches a vertical tab. Equivalent to \x0b and \cK. |
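The equivalences in the table can be checked in JavaScript, whose regular expressions support the \cx and \xhh forms. This is a sketch with an illustrative `checks` array, not an exhaustive test.

```javascript
// Each entry tests one equivalence from the table
const checks = [
  { name: "\\cM matches \\r", ok: /\cM/.test("\r") },
  { name: "\\cJ matches \\n", ok: /\cJ/.test("\n") },
  { name: "\\cL matches \\f", ok: /\cL/.test("\f") },
  { name: "\\cI matches \\t", ok: /\cI/.test("\t") },
  { name: "\\cK matches \\v", ok: /\cK/.test("\v") },
  { name: "\\x0a matches \\n", ok: /\x0a/.test("\n") },
];

// \s covers every whitespace escape listed in the table
const allWhitespace = /^\s+$/.test(" \f\n\r\t\v");

for (const c of checks) console.log(c.name, c.ok);
console.log("all whitespace:", allWhitespace);
```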
1.6 Special characters
Special characters are characters with a special meaning, such as the * in runoo*b above, which means the preceding character may repeat any number of times. To find a literal * in a string, you need to escape it by putting a \ in front of it: runo\*ob matches the string runo*ob.
Many metacharacters need special treatment when you try to match them literally. To match one of these special characters, you must first escape it, that is, put a backslash \ in front of it. The following table lists the special characters in regular expressions:
Special character | Description |
---|---|
$ | Matches the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$. |
( ) | Mark the beginning and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \). |
* | Matches the preceding subexpression zero or more times. To match the * character, use \*. |
+ | Matches the preceding subexpression one or more times. To match the + character, use \+. |
. | Matches any single character except the line break \n. To match ., use \. . |
[ | Marks the beginning of a bracket expression. To match [, use \[. |
? | Matches the preceding subexpression zero times or once, or marks a quantifier as non-greedy. To match the ? character, use \?. |
\ | Marks the next character as a special character, a literal character, a back reference, or an octal escape. For example, 'n' matches the character 'n', and '\n' matches a newline. The sequence '\\' matches "\", and '\(' matches "(". |
^ | Matches the start of the input string, unless it is used inside a bracket expression, where it negates the character set so that the bracket expression matches any character not listed. To match the ^ character itself, use \^. |
{ | Marks the beginning of a quantifier expression. To match {, use \{. |
| | Indicates a choice between two alternatives. To match |, use \|. |
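The effect of escaping can be sketched as follows. `escapeRegExp` is a hypothetical helper written for this illustration (it is not part of the original tutorial or of the JavaScript standard library used here); it backslash-escapes the special characters from the table so that arbitrary text can be used as a literal pattern.

```javascript
// \* matches a literal asterisk instead of acting as a quantifier
const literalStar = /runo\*ob/;
const matchesStar = literalStar.test("runo*ob"); // true
const matchesPlain = literalStar.test("runoob"); // false: the "*" is required

// Hypothetical helper: escape every special character in a piece of text
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const escaped = escapeRegExp("1+1=2?");
const literalPattern = new RegExp(escaped);

console.log(matchesStar, matchesPlain);        // true false
console.log(escaped);                          // 1\+1=2\?
console.log(literalPattern.test("is 1+1=2?")); // true
console.log(literalPattern.test("111=2"));     // false
```

Without the escaping step, the text 1+1=2? would be read as a pattern in which + and ? are quantifiers, so it would match strings such as "111=2" instead of the literal text.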
1.6.1 $ asserts the position at the end of a line
For example, apply $ (with multiline matching enabled) to the following text.
Test string:
Match 1 1-9 TagName1
Match 2 14-22 TagName2
Result:
Match 1 | 20-20 | null |
Match 2 | 43-43 | null |
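The two zero-length matches can be reproduced in JavaScript. This sketch assumes the test string's lines are separated by a single \n and uses the m (multiline) flag, without which $ only matches at the very end of the input.

```javascript
const testString = "Match 1 1-9 TagName1\nMatch 2 14-22 TagName2";

// With the m flag, $ matches before each line break and at the end of input
const endPositions = [...testString.matchAll(/$/gm)].map((m) => m.index);

// Without the m flag, only the end of the whole string matches
const endOfInputOnly = [...testString.matchAll(/$/g)].map((m) => m.index);

console.log(endPositions);   // [ 20, 43 ]
console.log(endOfInputOnly); // [ 43 ]
```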
1.6.2 () marks the start and end of a subexpression
Each match in the results then also reports a group (I still have some doubts about this).
Match 1 | 8-11 | 1-9 |
Group 1 | 8-11 | 1-9 |
Match 2 | 29-34 | 14-22 |
Group 1 | 29-34 | 14-22 |
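The article does not show which pattern produced these rows. As an assumption, the sketch below uses (\d+-\d+) on the same test string as the neighbouring examples; it happens to give the same offsets. Because the parentheses wrap the entire pattern, Group 1 is identical to the full match, which is why each row appears twice.

```javascript
const subject = "Match 1 1-9 TagName1\nMatch 2 14-22 TagName2";

// Assumed pattern: the original article does not state it
const assumedPattern = /(\d+-\d+)/g;

const groupRows = [...subject.matchAll(assumedPattern)].map((m) => ({
  match: m[0],  // text matched by the whole pattern
  group1: m[1], // text captured by the first pair of parentheses
  start: m.index,
  end: m.index + m[0].length,
}));

console.log(groupRows);
// [ { match: '1-9', group1: '1-9', start: 8, end: 11 },
//   { match: '14-22', group1: '14-22', start: 29, end: 34 } ]
```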
1.6.3 ^ matches the start of the input string, unless it is used inside a bracket expression, where it negates the character set so that the bracket expression matches any character not listed.
Example:
Regular expression: ([^a-z]+-[^a-z]+)
Test string:
Match 1 1-9 TagName1
Match 2 14-22 TagName2
Result:
Match 1 | 5-13 | 1 1-9 T |
Group 1 | 5-13 | 1 1-9 T |
Match 2 | 26-36 | 2 14-22 T |
Group 1 | 26-36 | 2 14-22 T |
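The result can be reproduced with matchAll, assuming the two lines of the test string are joined by a single \n. Note that [^a-z] matches anything that is not a lowercase letter, so spaces, digits, and uppercase letters all qualify, which is why each match starts with a space and ends with "T".

```javascript
const source = "Match 1 1-9 TagName1\nMatch 2 14-22 TagName2";
const negated = /([^a-z]+-[^a-z]+)/g;

// Collect [start, end, text] for every match
const found = [...source.matchAll(negated)].map((m) => [
  m.index,
  m.index + m[0].length,
  m[0],
]);

console.log(found);
// [ [ 5, 13, ' 1 1-9 T' ], [ 26, 36, ' 2 14-22 T' ] ]
```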
1.6.4 | means "or"
For example:
at|ag
Test string:
Match 1 1-9 TagName1
Match 2 14-22 TagName2
Result:
Match 1 | 1-3 | at |
Match 2 | 13-15 | ag |
Match 3 | 22-24 | at |
Match 4 | 36-38 | ag |
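A sketch that reproduces the four matches, again assuming a single \n between the two lines of the test string. The variable names are illustrative.

```javascript
const haystack = "Match 1 1-9 TagName1\nMatch 2 14-22 TagName2";

// at|ag matches either "at" or "ag" wherever it occurs
const alternatives = [...haystack.matchAll(/at|ag/g)].map((m) => [
  m.index,
  m.index + m[0].length,
  m[0],
]);

console.log(alternatives);
// [ [ 1, 3, 'at' ], [ 13, 15, 'ag' ], [ 22, 24, 'at' ], [ 36, 38, 'ag' ] ]
```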
1.7 Quantifiers
Quantifiers specify how many times a given component of a regular expression must occur for a match to succeed. There are six of them: *, +, ?, {n}, {n,}, and {n,m}.
Character | Description |
---|---|
* | Matches the preceding subexpression zero or more times. For example, zo* matches "z" as well as "zoo". * is equivalent to {0,}. |
+ | Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}. |
? | Matches the preceding subexpression zero times or once. For example, "do(es)?" matches "do", the "does" in "does", and the "do" in "doxy". ? is equivalent to {0,1}. |
{n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food". |
{n,} | n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'. |
{n,m} | m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers. |
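The examples from the table can be run as a quick check. This is a minimal sketch; the variable names are illustrative, and the strings are the ones used in the table.

```javascript
const bobHasTwoOs = /o{2}/.test("Bob");            // only one "o", so no match
const foodMatch = "food".match(/o{2}/)[0];         // exactly two o's
const atLeastTwo = "foooood".match(/o{2,}/)[0];    // greedy: all five o's
const oneToThree = "fooooood".match(/o{1,3}/)[0];  // at most the first three
const optionalEs = "do does doxy".match(/do(es)?/g);

console.log(bobHasTwoOs); // false
console.log(foodMatch);   // oo
console.log(atLeastTwo);  // ooooo
console.log(oneToThree);  // ooo
console.log(optionalEs);  // [ 'do', 'does', 'do' ]
```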
1.7.1 The * and + quantifiers are greedy: they match as much text as possible. Adding a ? after them makes the match non-greedy (minimal).
For example, you might search an HTML file for the content of an h1 tag. The HTML code is as follows:
<h1>RUNOOB- Novice tutorial </h1>
Greedy: the following expression matches everything from the opening less-than sign (<) to the greater-than sign (>) that closes the h1 tag.
/<.*>/
1.7.2 Non-greedy: if you only need to match the opening and closing h1 tags themselves, use the following non-greedy expression. Its first match is just <h1>.
/<.*?>/
. matches any character (except for line terminators)
*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)
With the added ?, the quantifier matches as little as possible. The opening tag can also be matched with:
/<\w+?>/
Placing a ? after a *, +, or ? quantifier turns the expression from greedy into non-greedy (minimal) matching.
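The difference between the greedy and non-greedy forms can be seen by running the three patterns on the article's h1 line. This is a sketch; the variable names are illustrative.

```javascript
const html = "<h1>RUNOOB- Novice tutorial </h1>";

// Greedy: .* runs to the LAST ">" in the line
const greedy = html.match(/<.*>/)[0];

// Non-greedy: .*? stops at the FIRST ">" it can
const lazy = html.match(/<.*?>/)[0];

// With the g flag, the non-greedy form finds each tag separately
const lazyAll = html.match(/<.*?>/g);

// \w does not match "/", so only the opening tag qualifies
const wordTags = html.match(/<\w+?>/g);

console.log(greedy);   // <h1>RUNOOB- Novice tutorial </h1>
console.log(lazy);     // <h1>
console.log(lazyAll);  // [ '<h1>', '</h1>' ]
console.log(wordTags); // [ '<h1>' ]
```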
1.8 Anchors
Anchors describe the boundaries of a string or a word: ^ and $ refer to the start and end of a string, \b describes the boundary before or after a word, and \B denotes a non-word boundary.
The regular expression anchors are:
Character | Description |
---|---|
^ | Matches the position where the input string starts. If the RegExp object's Multiline property is set, ^ also matches the position after a \n or \r. |
$ | Matches the position where the input string ends. If the RegExp object's Multiline property is set, $ also matches the position before a \n or \r. |
\b | Matches a word boundary, that is, the position between a word character and a non-word character such as a space. |
\B | Matches a non-word boundary. |
A real chapter title does not merely appear at the beginning of a line; it is the only text on that line, so it sits at both the start and the end of the same line. The following expression therefore matches only chapter titles and not cross-references to them, because it is anchored at both the beginning and the end of a line of text.
1.8.1 /^Chapter [1-9][0-9]{0,1}$/ matches a chapter title that stands alone on its line
1.8.2 /\bCha/ applied to Chapter
Match 1 | 0-3 | Cha |
1.8.3 /ter\b/ applied to Chapter
Match 1 | 4-7 | ter |
1.8.4 /\Bapt/ applied to Chapter
Match 1 | 2-5 | apt |
1.8.5 /\BCha/ applied to Chapter
Your regular expression does not match the subject string. (Cha starts at a word boundary, which \B excludes.)
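The anchor examples can be checked together. In this sketch, `span` is a hypothetical helper written for the illustration; it returns the start and end offsets of the first match, or null if there is none. The sample titles are also assumptions.

```javascript
const chapterTitle = /^Chapter [1-9][0-9]{0,1}$/;

const titleResults = [
  "Chapter 12",                 // a title on its own line
  "See Chapter 12 for details", // a cross-reference inside a sentence
  "Chapter 100",                // three digits: more than [1-9][0-9]{0,1} allows
].map((line) => chapterTitle.test(line));

// Hypothetical helper: [start, end] of the first match in "Chapter", or null
function span(pattern) {
  const m = pattern.exec("Chapter");
  return m ? [m.index, m.index + m[0].length] : null;
}

console.log(titleResults);  // [ true, false, false ]
console.log(span(/\bCha/)); // [ 0, 3 ]
console.log(span(/ter\b/)); // [ 4, 7 ]
console.log(span(/\Bapt/)); // [ 2, 5 ]
console.log(span(/\BCha/)); // null
```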
1.9 Alternation
Enclose all the options in parentheses () and separate adjacent options with |.
() denotes a capture group. The value matched by each group is saved, and the saved values can be retrieved by number n, where n refers to the n-th capture group.
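A minimal sketch of options inside parentheses. The pattern and the sample sentences are made up for illustration; the point is that the parentheses both limit the scope of | and save the chosen option as group 1.

```javascript
// The options are limited to "cats" or "dogs"; the prefix is shared
const grouped = /I like (cats|dogs)/;
const result = "I like dogs".match(grouped);

// Without parentheses, | splits the WHOLE pattern: "I like cats" OR "dogs"
const ungrouped = /I like cats|dogs/;

console.log(result[0]);               // I like dogs
console.log(result[1]);               // dogs  (capture group number 1)
console.log(grouped.test("dogs"));    // false
console.log(ungrouped.test("dogs"));  // true
```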
1.10 The differences between ?=, ?<=, ?!, and ?<!
1.10.1 exp1(?=exp2): finds exp1 that is followed by exp2.
1.10.2 (?<=exp2)exp1: finds exp1 that is preceded by exp2.
1.10.3 exp1(?!exp2): finds exp1 that is not followed by exp2.
1.10.4 (?<!exp2)exp1: finds exp1 that is not preceded by exp2.
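The four forms can be sketched with two assumed sample strings (they are not from the original article). Lookbehind, (?<=...) and (?<!...), requires a JavaScript engine with ES2018 support, such as a current Node.js or browser. In every case exp2 is only checked, not included in the match.

```javascript
const versions = "Windows2000 Windows3.1";
// exp1(?=exp2): "Windows" followed by "2000"
const followedBy = versions.match(/Windows(?=2000)/).index;
// exp1(?!exp2): "Windows" NOT followed by "2000"
const notFollowedBy = versions.match(/Windows(?!2000)/).index;

const prices = "price: $30, qty: 40";
// (?<=exp2)exp1: digits preceded by "$"
const precededBy = prices.match(/(?<=\$)\d+/g);
// (?<!exp2)exp1: digits not preceded by "$"; \d is also excluded so the
// match cannot simply start one digit later, at the "0" of "30"
const notPrecededBy = prices.match(/(?<![$\d])\d+/g);

console.log(followedBy);    // 0
console.log(notFollowedBy); // 12
console.log(precededBy);    // [ '30' ]
console.log(notPrecededBy); // [ '40' ]
```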
1.11 Backreferences
One of the simplest and most useful applications of backreferences is finding two identical adjacent words in a text. Take the following sentence as an example:
Is is the cost of of gasoline going up up?
The sentence clearly contains several duplicated words. It would be useful to locate them all without having to search for each repeated word separately. The following regular expression does this with a single subexpression.
Example
Find duplicate words:
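// Sentence containing three pairs of duplicated adjacent words
var str = "Is is the cost of of gasoline going up up";
// \1 refers back to whatever the first capture group ([a-z]+) matched
var patt1 = /\b([a-z]+) \1\b/igm;
// With the g flag, match() returns every match: Is is,of of,up up
document.write(str.match(patt1));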
Match 1 | 0-5 | Is is |
Group 1 | 0-2 | Is |
Match 2 | 15-20 | of of |
Group 1 | 15-17 | of |
Match 3 | 36-41 | up up |
Group 1 | 36-38 | up |
\1 matches the same text as most recently matched by the 1st capturing group
/\b([a-z]+) \1\b/igm
The captured expression, as specified by [a-z]+, consists of one or more letters. The second part of the regular expression is a reference to the previously captured submatch, that is, the second occurrence of the word must match exactly what the parenthesized expression matched. \1 refers to the first submatch.
The word-boundary metacharacters ensure that only whole words are detected. Otherwise, phrases such as "is issued" or "this is" would be incorrectly identified by this expression.
The global flag g after the regular expression applies the expression to as many matches as can be found in the input string.
The case-insensitive flag i at the end of the expression specifies that matching ignores case.
The multiline flag m specifies that potential matches may occur on either side of a newline character.
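The match table and the role of the word boundaries can be reproduced outside the browser. This sketch uses console.log instead of document.write; the variable names and the boundary-free pattern `withoutBoundaries` are additions for illustration.

```javascript
const sentence = "Is is the cost of of gasoline going up up";
const duplicate = /\b([a-z]+) \1\b/gi;

// One row per match: [start, end, full match, group 1]
const rows = [...sentence.matchAll(duplicate)].map((m) => [
  m.index,
  m.index + m[0].length,
  m[0],
  m[1],
]);
console.log(rows);
// [ [ 0, 5, 'Is is', 'Is' ], [ 15, 20, 'of of', 'of' ], [ 36, 41, 'up up', 'up' ] ]

// The same pattern with and without \b, applied to the two tricky phrases
const withBoundaries = /\b([a-z]+) \1\b/i;
const withoutBoundaries = /([a-z]+) \1/i;

console.log(withBoundaries.test("this is"));            // false
console.log(withoutBoundaries.exec("this is")[0]);      // is is
console.log(withBoundaries.test("is issued"));          // false
console.log(withoutBoundaries.exec("is issued")[0]);    // is is
```

Without the boundaries, the pattern finds "is is" inside both phrases even though neither contains a repeated word.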
Reference documentation:
Regular expressions – Syntax | Novice tutorial (runoob.com)