
Regular expression learning

2022-06-26 09:23:00 【hellolianhua】

1 Syntax

1.1 The + sign means the preceding character must appear at least once (one or more times).

For example, runoo+b matches runoob, runooob, runoooooob, and so on.

1.2 The * sign means the preceding character may appear zero times, once, or many times.

For example, runoo*b matches runob, runoob, runoooooob, and so on.

1.3 The ? sign means the preceding character may appear at most once (zero times or once).

For example, colou?r matches color or colour.
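The three quantifiers above can be tried out directly in JavaScript. This is only a sketch: the `check` helper is a hypothetical name introduced here, and it wraps each pattern in `^...$` (an assumption, not part of the examples above) so that the whole string has to match.

```javascript
// Hypothetical helper: true when `pattern` matches the whole of `text`.
function check(pattern, text) {
  return new RegExp("^(?:" + pattern + ")$").test(text);
}

console.log(check("runoo+b", "runooob")); // true: + accepts one or more o
console.log(check("runoo+b", "runob"));   // false: + needs at least one o
console.log(check("runoo*b", "runob"));   // true: * accepts zero o
console.log(check("colou?r", "color"));   // true: ? accepts zero u
console.log(check("colou?r", "colour"));  // true: ? accepts one u
console.log(check("colou?r", "colouur")); // false: ? accepts at most one u
```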

1.4 Ordinary characters

1.4.1 [ABC] matches any single character listed inside the brackets.

For example, [aeiou] matches every e, o, u, and a in the string "google runoob taobao".

1.4.2 [^ABC] matches any single character not listed inside the brackets.

For example, [^aeiou] matches every character in the string "google runoob taobao" other than e, o, u, and a, including the spaces.

1.4.3 [A-Z] denotes a range and matches any uppercase letter; [a-z] matches any lowercase letter.
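A short sketch of the three bracket forms, using the same sample string. The variable names and the "Hello World" string are illustrative assumptions.

```javascript
const text = "google runoob taobao";

// [aeiou] with the g flag collects every vowel in order of appearance.
const vowels = text.match(/[aeiou]/g).join("");
console.log(vowels); // ooeuooaoao

// [^aeiou] collects everything else, spaces included.
const others = text.match(/[^aeiou]/g).join("");
console.log(others); // ggl rnb tb

// [A-Z] is a range: any single uppercase letter.
const capitals = "Hello World".match(/[A-Z]/g);
console.log(capitals); // [ 'H', 'W' ]
```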

1.4.4 . matches any single character except a line break (\n, \r); it is equivalent to [^\n\r].

For example, apply .+ to the following two lines:

<TagName1/>
<TagName2/>A quick brown fox jumps over lazy dog<TagName3/>

There are two matches, one per line:

Match1:<TagName1/>
Match2:<TagName2/>A quick brown fox jumps over lazy dog<TagName3/>
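The same experiment can be reproduced in JavaScript. In this sketch the two lines are joined with a plain \n, which is an assumption about the original input.

```javascript
const text =
  "<TagName1/>\n<TagName2/>A quick brown fox jumps over lazy dog<TagName3/>";

// . never crosses a line break, so .+ stops at the end of each line.
const lines = text.match(/.+/g);
console.log(lines.length); // 2
console.log(lines[0]);     // <TagName1/>
console.log(lines[1]);     // <TagName2/>A quick brown fox jumps over lazy dog<TagName3/>
```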

1.4.5 [\s\S] matches any character at all. \s matches every whitespace character, including line breaks; \S matches every non-whitespace character, so it never matches a line break.

For example:

\S matches any non-whitespace character (equivalent to [^\r\n\t\f\v ])

\S+ can be used to match words, with \s+ matching the whitespace between them. Applied to the string "quick brown", \S+ gives two matches:

Match1:quick
Match2:brown
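The difference between . and [\s\S] shows up as soon as the input contains a line break. A minimal sketch, with illustrative variable names:

```javascript
const twoLines = "quick\nbrown";

// .+ stops at the line break; [\s\S]+ runs straight through it.
const dotMatch = twoLines.match(/.+/)[0];
const anyMatch = twoLines.match(/[\s\S]+/)[0];
console.log(dotMatch);              // quick
console.log(anyMatch === twoLines); // true

// \S+ picks out the words; \s+ picks out the gaps between them.
const words = "quick brown".match(/\S+/g);
const gaps = "quick brown".match(/\s+/g);
console.log(words); // [ 'quick', 'brown' ]
console.log(gaps);  // [ ' ' ]
```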

1.4.6 \w matches any word character: a letter, a digit, or an underscore. It is equivalent to [A-Za-z0-9_].

For example, apply \w+ to the following text:

<TagName1/>
<TagName2/>A quick brown fox jumps over lazy dog<TagName3/>

Match 1	1-9	TagName1
Match 2	14-22	TagName2
Match 3	24-25	A
Match 4	26-31	quick
Match 5	32-37	brown
Match 6	38-41	fox
Match 7	42-47	jumps
Match 8	48-52	over
Match 9	53-57	lazy
Match 10	58-61	dog
Match 11	62-70	TagName3
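A table like the one above can be generated with `String.prototype.matchAll`. This is a sketch: `listMatches` is a hypothetical helper, and the sample text is shortened. Offsets depend on the exact line-break characters in the input, so they may differ slightly from the table.

```javascript
// Hypothetical helper: returns [start, end, text] for every match of `re`.
// `re` must carry the g flag, as matchAll requires.
function listMatches(re, text) {
  return [...text.matchAll(re)].map((m) => [
    m.index,
    m.index + m[0].length,
    m[0],
  ]);
}

const sample = "<TagName1/>\n<TagName2/>A quick brown fox";
const words = listMatches(/\w+/g, sample);
for (const [start, end, word] of words) {
  console.log(`${start}-${end}\t${word}`);
}
```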

1.5 Non-printing characters

Character	Description
\cx	Matches the control character indicated by x. For example, \cM matches Control-M, a carriage return. x must be a letter in A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\f	Matches a form feed. Equivalent to \x0c and \cL.
\n	Matches a line feed. Equivalent to \x0a and \cJ.
\r	Matches a carriage return. Equivalent to \x0d and \cM.
\s	Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v]. Note that Unicode-aware regular expressions also match the full-width space.
\S	Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t	Matches a tab. Equivalent to \x09 and \cI.
\v	Matches a vertical tab. Equivalent to \x0b and \cK.
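The equivalences in the table can be spot-checked in JavaScript. This is a sketch, and the `checks` object is an illustrative name. The last entry relies on JavaScript's \s including U+3000, the full-width (ideographic) space.

```javascript
const checks = {
  cM_is_carriage_return: /^\cM$/.test("\r"),
  x0a_is_line_feed: /^\x0a$/.test("\n"),
  cI_is_tab: /^\cI$/.test("\t"),
  x0c_is_form_feed: /^\x0c$/.test("\f"),
  x0b_is_vertical_tab: /^\x0b$/.test("\v"),
  s_matches_full_width_space: /^\s$/.test("\u3000"),
};

for (const [name, ok] of Object.entries(checks)) {
  console.log(name, ok); // every entry prints true
}
```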

1.6 Special characters

Special characters are characters with a special meaning, such as the * in runoo*b above, which means "zero or more of the preceding character". To find a literal * in a string, you need to escape it by putting a \ in front of it: runo\*ob matches the string runo*ob.

Many metacharacters need special treatment when you want to match them literally. To match one of these special characters, you must first escape it, that is, put a backslash (\) in front of it. The following table lists the special characters in regular expressions:

Special character	Description
$	Matches the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
( )	Mark the start and end of a subexpression. The subexpression can be captured for later use. To match these characters, use \( and \).
*	Matches the preceding subexpression zero or more times. To match the * character, use \*.
+	Matches the preceding subexpression one or more times. To match the + character, use \+.
.	Matches any single character except the newline \n. To match ., use \. .
[	Marks the start of a bracket expression. To match [, use \[.
?	Matches the preceding subexpression zero times or once, or marks a quantifier as non-greedy. To match the ? character, use \?.
\	Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', and '\n' matches a newline. The sequence '\\' matches "\", and '\(' matches "(".
^	Matches the start of the input string. Inside a bracket expression it instead negates the set, so the listed characters are not accepted. To match the ^ character itself, use \^.
{	Marks the start of a quantifier expression. To match {, use \{.
|	Indicates a choice between two alternatives. To match |, use \|.
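Escaping can be checked with the runo\*ob example from the text. When the literal text is not known in advance, a small helper can escape it before building the pattern. The `escapeRegExp` function below is a hypothetical helper written for this sketch (not a built-in), and the "1+1=2?" string is an illustrative assumption.

```javascript
// A backslash turns * into an ordinary character.
const literalStar = /runo\*ob/;
console.log(literalStar.test("runo*ob")); // true
console.log(literalStar.test("runoob"));  // false

// Hypothetical helper: puts a backslash in front of every special character.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const pattern = new RegExp(escapeRegExp("1+1=2?"));
console.log(pattern.source);                 // 1\+1=2\?
console.log(pattern.test("Is 1+1=2? Yes.")); // true
console.log(pattern.test("Is 11=2 Yes."));   // false
```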

1.6.1 $ asserts position at the end of a line

For example, apply $ with the multiline flag set.

Test string:

Match 1 1-9 TagName1
Match 2 14-22 TagName2

Result (each match is an empty string at the end of a line):

Match 1	20-20	null
Match 2	43-43	null
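The positions 20 and 43 can be reproduced with the m flag. A sketch, assuming the two test lines are separated by a single \n:

```javascript
const text = "Match 1 1-9 TagName1\nMatch 2 14-22 TagName2";

// With m, $ matches before each line break as well as at the very end.
const lineEnds = [...text.matchAll(/$/gm)].map((m) => m.index);
console.log(lineEnds); // [ 20, 43 ]

// Without m, $ matches only at the end of the whole input.
const inputEnd = [...text.matchAll(/$/g)].map((m) => m.index);
console.log(inputEnd); // [ 43 ]
```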

1.6.2 ( ) mark the start and end of a subexpression

A group appears in the match results (I still have some doubts about this):

Match 1	8-11	1-9
Group 1	8-11	1-9

Match 2	29-34	14-22
Group 1	29-34	14-22

1.6.3 ^ matches the start of the input string. Inside a bracket expression it instead negates the set: the characters listed in the brackets are not accepted.

Example:

Regular expression: ([^a-z]+-[^a-z]+)

Test string:

Match 1 1-9 TagName1
Match 2 14-22 TagName2

Result (each match begins with the space that follows "Match"):

Match 1	5-13	 1 1-9 T
Group 1	5-13	 1 1-9 T

Match 2	26-36	 2 14-22 T
Group 1	26-36	 2 14-22 T
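Both roles of ^ can be seen side by side in a short sketch. The test string is the one used above, joined with \n (an assumption), and the variable names are illustrative.

```javascript
const text = "Match 1 1-9 TagName1\nMatch 2 14-22 TagName2";

// Inside brackets, ^ negates the set: [^a-z] is anything but a lowercase letter.
const found = [...text.matchAll(/([^a-z]+-[^a-z]+)/g)].map((m) => [
  m.index,
  m[1],
]);
console.log(found); // [ [ 5, ' 1 1-9 T' ], [ 26, ' 2 14-22 T' ] ]

// Outside brackets, ^ anchors the pattern to the start of the input.
const startsWithMatch = /^Match/.test(text);
const startsWithTag = /^TagName/.test(text);
console.log(startsWithMatch); // true
console.log(startsWithTag);   // false
```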

1.6.4 | means "or"

For example:

at|ag

Test string:

Match 1 1-9 TagName1
Match 2 14-22 TagName2

Result:

Match 1	1-3	at
Match 2	13-15	ag
Match 3	22-24	at
Match 4	36-38	ag
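The four matches can be reproduced as follows. A sketch, assuming the two test lines are separated by a single \n:

```javascript
const text = "Match 1 1-9 TagName1\nMatch 2 14-22 TagName2";

// at|ag matches whichever alternative fits at the current position.
const hits = [...text.matchAll(/at|ag/g)].map((m) => [m.index, m[0]]);
console.log(hits);
// [ [ 1, 'at' ], [ 13, 'ag' ], [ 22, 'at' ], [ 36, 'ag' ] ]
```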

1.7 Quantifiers

Quantifiers specify how many times a given component of a regular expression must occur for a match to succeed. There are six: *, +, ?, {n}, {n,}, and {n,m}.

Character	Description
*	Matches the preceding subexpression zero or more times. For example, zo* matches "z" as well as "zoo". * is equivalent to {0,}.
+	Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?	Matches the preceding subexpression zero times or once. For example, "do(es)?" matches "do", the "does" in "does", and the "do" in "doxy". ? is equivalent to {0,1}.
{n}	n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but it matches the two o's in "food".
{n,}	n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but it matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m}	m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
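The examples from the table, run in JavaScript. This is a sketch, and the `results` object is an illustrative name. The last entry shows what happens in JavaScript's default (non-Unicode) mode when a space slips in after the comma: the braces are then read as literal text rather than as a quantifier.

```javascript
const results = {
  exactlyTwo: "food".match(/o{2}/)[0],        // "oo"
  bobHasNoPair: /o{2}/.test("Bob"),           // false
  atLeastTwo: "foooood".match(/o{2,}/)[0],    // "ooooo"
  oneToThree: "fooooood".match(/o{1,3}/)[0],  // "ooo"
  optionalEs: "doxy".match(/do(es)?/)[0],     // "do"
  withEs: "does".match(/do(es)?/)[0],         // "does"
  spaceBreaksIt: /o{1, 3}/.test("fooo"),      // false: braces taken literally
};

console.log(results);
```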

1.7.1 The * and + quantifiers are greedy: they match as much text as possible. Adding a ? after them makes the match non-greedy (minimal).

For example, you might search an HTML file for the content of an h1 tag. The HTML is as follows:

<h1>RUNOOB- Novice tutorial </h1>

Greedy: the following expression matches everything from the opening less-than sign (<) to the greater-than sign (>) that closes the h1 tag.

/<.*>/

1.7.2 Non-greedy: if you only need to match the opening and closing h1 tags, the following non-greedy expression matches only <h1>.

/<.*?>/

. matches any character (except for line terminators)

*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)

With the added ?, the quantifier matches as few characters as possible. The same applies to +:

/<\w+?>/

Placing a ? after the *, +, or ? quantifier turns a greedy expression into a non-greedy (minimal-match) expression.
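The greedy and non-greedy forms can be compared on the same input. A sketch: the heading text is simplified to avoid stray spaces, which is an assumption and not the exact string above.

```javascript
const html = "<h1>RUNOOB-Novice tutorial</h1>";

// Greedy: .* runs to the last > in the input.
const greedy = html.match(/<.*>/)[0];
console.log(greedy); // <h1>RUNOOB-Novice tutorial</h1>

// Non-greedy: .*? stops at the first > it can reach.
const lazy = html.match(/<.*?>/)[0];
console.log(lazy); // <h1>

// With the g flag, the non-greedy form finds both tags separately.
const tags = html.match(/<.*?>/g);
console.log(tags); // [ '<h1>', '</h1>' ]

// \w+? only fits the opening tag, because / is not a word character.
const opening = html.match(/<\w+?>/g);
console.log(opening); // [ '<h1>' ]
```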

1.8 Anchors

Anchors describe the boundaries of strings or words: ^ and $ refer to the start and end of a string, \b describes the leading or trailing boundary of a word, and \B denotes a non-word boundary.

The regular-expression anchors are:

Character	Description
^	Matches the position where the input string starts. If the RegExp object's Multiline property is set, ^ also matches the position after a \n or \r.
$	Matches the position where the input string ends. If the RegExp object's Multiline property is set, $ also matches the position before a \n or \r.
\b	Matches a word boundary, that is, the position between a word character and a non-word character such as a space.
\B	Matches a non-word boundary.

A real chapter title not only appears at the start of a line, it is also the only text on that line, so it sits at both the beginning and the end of the same line. The following expression matches only chapter titles and not cross-references, because it is anchored to both the start and the end of a line of text.

1.8.1 /^Chapter [1-9][0-9]{0,1}$/ matches a chapter title

1.8.2 /\bCha/ applied to Chapter

Match 1	0-3	Cha

1.8.3 /ter\b/ applied to Chapter

Match 1	4-7	ter

1.8.4 /\Bapt/ applied to Chapter

Match 1	2-5	apt

1.8.5 /\BCha/ applied to Chapter

The regular expression does not match the subject string, because Cha starts at a word boundary.
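The five anchor examples can be verified in a few lines. A sketch: the `title` pattern name and the extra test strings ("Chapter 123", "See Chapter 1") are illustrative assumptions.

```javascript
const title = /^Chapter [1-9][0-9]{0,1}$/;
console.log(title.test("Chapter 1"));     // true
console.log(title.test("Chapter 12"));    // true
console.log(title.test("Chapter 123"));   // false: at most two digits
console.log(title.test("See Chapter 1")); // false: not at the start

const word = "Chapter";
const chaIndex = /\bCha/.exec(word).index;  // 0: boundary before C
const terIndex = /ter\b/.exec(word).index;  // 4: boundary after r
const aptIndex = /\Bapt/.exec(word).index;  // 2: h|a is not a boundary
const noMatch = /\BCha/.exec(word);         // null: C sits at a boundary
console.log(chaIndex, terIndex, aptIndex, noMatch);
```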

1.9 Alternation

Enclose all the alternatives in parentheses () and separate adjacent alternatives with |.

() denotes a capture group. The value matched by each group is saved, and the saved values can be retrieved by number n (where n refers to the n-th capture group).
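In JavaScript, the saved group values appear as numbered entries on the match result. A minimal sketch, in which the pattern and the input string are made-up examples:

```javascript
// Two groups, each offering two alternatives.
const pattern = /(cat|dog)s? and (bird|fish)/;
const result = pattern.exec("two dogs and fish");

console.log(result[0]);    // dogs and fish  (the whole match)
console.log(result[1]);    // dog            (capture group 1)
console.log(result[2]);    // fish           (capture group 2)
console.log(result.index); // 4
```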

1.10 The differences between ?=, ?<=, ?!, and ?<!

1.10.1 exp1(?=exp2): matches exp1 only when it is followed by exp2.

1.10.2 (?<=exp2)exp1: matches exp1 only when it is preceded by exp2.

1.10.3 exp1(?!exp2): matches exp1 only when it is not followed by exp2.

1.10.4 (?<!exp2)exp1: matches exp1 only when it is not preceded by exp2.
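The four assertions can be compared on one string. A sketch: the sample string is a made-up example, and the negative cases use fixed-width patterns ({3}) so that backtracking cannot produce a shorter, surprising match. Lookbehind needs a reasonably recent JavaScript engine.

```javascript
const text = "USD100 EUR200";

const beforeHundred = text.match(/[A-Z]+(?=100)/g);    // letters followed by 100
const afterUsd = text.match(/(?<=USD)\d+/g);           // digits preceded by USD
const notBeforeHundred = text.match(/[A-Z]{3}(?!100)/g); // code not followed by 100
const notAfterUsd = text.match(/(?<!USD)\d{3}/g);      // digits not preceded by USD

console.log(beforeHundred);    // [ 'USD' ]
console.log(afterUsd);         // [ '100' ]
console.log(notBeforeHundred); // [ 'EUR' ]
console.log(notAfterUsd);      // [ '200' ]
```

In every case the assertion itself consumes nothing: only exp1 ends up in the match.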

1.11 Backreferences

One of the simplest and most useful applications of backreferences is finding two identical adjacent words in a text. Take the following sentence as an example:

Is is the cost of of gasoline going up up?

The sentence clearly contains several repeated words. It would be useful to have a way to locate them without searching for the repetition of each word separately. The following regular expression does this with a single subexpression:

Example

Find duplicate words:

var str = "Is is the cost of of gasoline going up up";

var patt1 = /\b([a-z]+) \1\b/igm;

document.write(str.match(patt1));

Match 1	0-5	Is is
Group 1	0-2	Is

Match 2	15-20	of of
Group 1	15-17	of

Match 3	36-41	up up
Group 1	36-38	up

\1 matches the same text as most recently matched by the 1st capturing group

/\b([a-z]+) \1\b/igm

The captured expression, specified by [a-z]+, consists of one or more letters. The second part of the regular expression is a reference to the previously captured submatch: the second occurrence of the word must be exactly what the parenthesized expression matched. \1 refers to the first submatch.

The word-boundary metacharacters ensure that only whole words are detected. Without them, phrases such as "is issued" or "this is" would be wrongly reported as repeated words.

The global flag g after the regular expression specifies that the expression is applied to as many matches as can be found in the input string.

The case-insensitive flag i at the end of the expression specifies that matching ignores case.

The multiline flag m makes ^ and $ match at line breaks as well. This pattern uses neither anchor, so the flag does not change the result here.
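A natural follow-up is to remove the duplicates rather than just find them. The sketch below runs outside the browser, using `console.log` instead of `document.write`, and uses `$1` in the replacement string to keep a single copy of each repeated word. The `unbounded` pattern is an extra assumption added here to show why the word boundaries matter.

```javascript
const str = "Is is the cost of of gasoline going up up?";
const repeated = /\b([a-z]+) \1\b/gi;

// Find the duplicates.
const pairs = str.match(repeated);
console.log(pairs); // [ 'Is is', 'of of', 'up up' ]

// Collapse each pair to its first word; $1 is the text of group 1.
const cleaned = str.replace(repeated, "$1");
console.log(cleaned); // Is the cost of gasoline going up?

// Without \b, parts of different words are reported as repeats.
const unbounded = /([a-z]+) \1/i;
const falseHit = unbounded.exec("this is")[0];
const noHit = "this is".match(repeated);
console.log(falseHit); // is is
console.log(noHit);    // null
```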

Reference:

Regular expressions – Syntax | Novice tutorial (runoob.com)

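Original article

Copyright notice
This article was written by [hellolianhua]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/177/202206260849185303.html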
