Self-taught programming series 1: regular expressions
2022-06-26 09:08:00 【ML_ python_ get√】
Regular expressions
1.1 Without regular expressions
def isPhoneNumber(text):
    # A valid number looks like 415-555-1011: exactly 12 characters
    if len(text) != 12:
        return False
    for i in range(0, 3):
        if not text[i].isdecimal():
            # Each of the first three characters must be a decimal digit
            return False
    if text[3] != '-':
        return False
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False
    if text[7] != '-':
        return False
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True
# print("191-666-1234 is a phone number: ")
# print(isPhoneNumber('191-666-1234'))
# print("bilibili is a phone number: ")
# print(isPhoneNumber('bilibili'))
# # Find phone numbers in a longer string
# message = "call me at 415-555-1011 tomorrow. 415-555-9999 is my office."
# for i in range(len(message)):
#     chunk = message[i:i+12]
#     if isPhoneNumber(chunk):
#         print('phone number found: ' + chunk)
# print('Done')
1.2 Regular expressions
- \d stands for any digit 0-9, so the phone numbers above can be written as \d\d\d-\d\d\d-\d\d\d\d
- In \d{3}-\d{3}-\d{4}, {3} means "match three times". A regular expression describes a pattern to match; the compiled regex object has its own attributes and methods
- The search() method returns a Match object; the Match object's group() method returns the actual matched text
import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
# The r prefix creates a raw string; without it, every backslash \ would have to be escaped as \\, which is more cumbersome
mo = phoneNumRegex.search('My number is 415-555-4242.')
print("phone number found:" + mo.group())
Summary: steps for using a regular expression
- Import the re module
- Call re.compile() to create a regex object
- Call the regex object's search() method with the string you want to search; it returns a Match object
- Call the Match object's group() method to get the actual matched string
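The four steps above can be sketched as one small helper. Keep in mind that search() returns None when nothing matches, so it is safer to check the result before calling group(). The name find_phone_number is a hypothetical helper chosen for illustration, not part of the re module.

```python
import re

# Step 2: compile the pattern once into a regex object
PHONE_REGEX = re.compile(r'\d{3}-\d{3}-\d{4}')


def find_phone_number(text):
    """Return the first phone number found in text, or None.

    find_phone_number is a hypothetical name used only for this sketch.
    """
    mo = PHONE_REGEX.search(text)  # Step 3: a Match object, or None
    if mo is None:
        return None
    return mo.group()              # Step 4: the actual matched text


print(find_phone_number('My number is 415-555-4242.'))
print(find_phone_number('no number here'))
```

Checking for None first avoids the AttributeError raised when group() is called on a failed search.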
1.3 Searching with groups
- Simple grouping
- Matching alternatives with the pipe |
- The ?, *, . and similar symbols
# groups() returns all groups as a tuple
Regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = Regex.search('my number is 123-456-8888.')
print(mo.group(0))
print(mo.group(1))
print(mo.groups())
a,b = mo.groups()
print(a)
print(b)
# To match literal parentheses in the text, escape them as \( and \) so that they lose their grouping meaning
Regex1 = re.compile(r' (\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = Regex1.search('my number is (123) 456-8888.')
print(mo.group(1))
# The pipe | matches one of several alternatives; search() returns whichever alternative occurs first in the text
Regex_hero = re.compile(r'Ironman|Batman')
mo = Regex_hero.search('Ironman and Batman!')
print(mo.group())
mo = Regex_hero.search('Batman and Ironman!')
print(mo.group())
# Use findall() instead of search() to find every match
# A pipe inside a group matches one of several words that share the same prefix
Regex_a = re.compile(r'Bat(man|mobile|copter|bat)')
mo = Regex_a.search('Batbat and Batmobile are best!')
print(mo.group())
# (group)? makes the group optional: it matches zero or one time
Regex_choice = re.compile(r'Bat(wo)?man')
mo = Regex_choice.search(' I am Batman')
mo1 = Regex_choice.search('you are Batwoman!')
print(mo.group())
print(mo1.group())
# (group)* matches the group zero or more times
Regex_new = re.compile(r'Bat(wo)*man')
mo = Regex_new.search('Batman is my lover!')
print(mo.group())
mo1 = Regex_new.search('my name is Batwowowowowoman!')
print(mo1.group())
mo2 = Regex_new.search('my name is Batman')
print(mo2.group())
# (group)+ matches the group one or more times, so the search below fails and returns None
# Regex_add = re.compile(r'Bat(wo)+man')
# mo3 = Regex_add.search('my name is Batman')
# print(mo3.group())
# AttributeError: 'NoneType' object has no attribute 'group'
# {} specifies the number of repetitions: (group){3} exactly 3 times, {3,} 3 or more times, {,5} at most 5 times
Regex_ha = re.compile(r'(ha){3}')
mo = Regex_ha.search('hahaha')
print(mo.group())
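The repetition forms listed above can be compared side by side. This is a minimal sketch that uses re.fullmatch (rather than the search() used in the text) so that the whole string has to match; the helper name matches is an assumption made for illustration.

```python
import re


def matches(pattern, text):
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


print(matches(r'(ha){3}', 'hahaha'))         # exactly three repetitions
print(matches(r'(ha){3}', 'haha'))           # only two, so no match
print(matches(r'(ha){3,}', 'hahahahaha'))    # three or more
print(matches(r'(ha){,5}', 'haha'))          # zero to five
print(matches(r'(ha){,5}', 'hahahahahaha'))  # six repetitions, too many
print(matches(r'(ha){3,5}', 'hahahaha'))     # between three and five
```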
1.4 Greedy and non-greedy matching
- Python regular expressions are greedy by default, i.e. they match the longest possible string; adding ? after the quantifier, as in (group){3,5}?, makes the match non-greedy
Regex_nogreedy = re.compile(r'(ha){3,5}?')
mo = Regex_nogreedy.search('hahahahahaha')
print(mo.group())
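To make the difference visible, the same text can be searched with both forms. A minimal sketch; the variable names are chosen here for illustration.

```python
import re

text = 'hahahahahaha'  # 'ha' repeated six times

greedy = re.compile(r'(ha){3,5}')       # takes as many repetitions as allowed
non_greedy = re.compile(r'(ha){3,5}?')  # takes as few repetitions as allowed

greedy_match = greedy.search(text).group()
non_greedy_match = non_greedy.search(text).group()

print(greedy_match)      # hahahahaha (five repetitions)
print(non_greedy_match)  # hahaha (three repetitions)
```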
1.5 findall and search
- findall() finds every matching string; search() finds only the first string that matches
- findall() returns a list, while search() returns a Match object; findall() can also be used with groups
Regex_phone = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = Regex_phone.search('Cell:123-456-8888 Work:123-567-9999')
mo1 = Regex_phone.findall('Cell:123-456-8888 Work:123-567-9999')
print(mo.group())
print(mo1)
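When the pattern contains groups, findall() returns a list of tuples (one tuple per match, one element per group) instead of a list of strings. A short sketch of that behaviour, with variable names chosen here for illustration:

```python
import re

text = 'Cell:123-456-8888 Work:123-567-9999'

no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
with_groups = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')

plain = no_groups.findall(text)      # list of strings
grouped = with_groups.findall(text)  # list of tuples, one element per group

print(plain)
print(grouped)
```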
1.6 Character classes
- \d matches any digit 0-9; \D matches any character that is not a digit
- \w matches a word character (a letter, digit or underscore); \W matches any non-word character
- \s matches a whitespace character; \S matches any non-whitespace character
Regex_str = re.compile(r'\d+\s*\w+')
# + means one or more times: one or more digits, optional whitespace, then one or more word characters
mo = Regex_str.findall('12 drummers, 11 pipers, 10 lords, 9 ladies, 8maids, 7swans, 6 geese, 5 rings, 4 birds, 3 hens, 2 doves, 1 partridge')
print(mo)
# Define your own character class
Regex_own = re.compile(r'[AEIOUaeiou]')
mo = Regex_own.findall('RoboCop eats baby food. BABY FOOD!')
print(mo)
# [chars] defines a custom class; [^chars] matches any character not in the class; a hyphen expresses a range such as a-z
Regex_own1 = re.compile(r'[^AEIOUaeiou]')
mo1 = Regex_own1.findall('RoboCop eats baby food. BABY FOOD!')
print(mo1)
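The hyphen ranges mentioned above can be sketched as follows. The sample sentence and variable names are made up for illustration.

```python
import re

text = 'Order #A17 shipped to Room 42-B on 2022-06-26.'

# Ranges a-z, A-Z and 0-9 combined into one custom class
letters_and_digits = re.compile(r'[a-zA-Z0-9]+')
# Negated class: anything that is not a letter, digit or whitespace
other_symbols = re.compile(r'[^a-zA-Z0-9\s]')

words = letters_and_digits.findall(text)
symbols = other_symbols.findall(text)

print(words)
print(symbols)
```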
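1.7 Exact matching
- The caret ^ (the match must start at the beginning of the string) and the dollar sign $ (the match must end at the end of the string)
- The wildcard .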
Regex1 = re.compile(r'^Hello')
mo = Regex1.search('Hello world!')
mo1 = Regex1.search('he said hello!')
print(mo,'\n',mo1)
Regex2 = re.compile(r'^\d+$')
mo2 = Regex2.search('111111111x23333333')
mo3 = Regex2.search('222213232131')
print(mo2,'\n',mo3)
Regex3 = re.compile(r'\d+$')
mo4 = Regex3.search('my age is 26')
print(mo4)
# The wildcard . matches any single character except a newline
Regex_at = re.compile(r'.at')
mo = Regex_at.findall('The cat in the hat sat on the flat mat.')
print(mo)
# .* matches any run of characters (except newlines), for example when parsing entered fields such as a name and password
Regex_name = re.compile(r'First name: (.*) Last name: (.*)')
mo = Regex_name.search('First name: AI Last name: Sweigart')
print(mo.group(1))
print(mo.group(2))
# Greedy versus non-greedy matching with .*
Regex_greed = re.compile(r'<.*>')
mo = Regex_greed.search('<To serve man> for dinner>')
print(mo.group())
Regex_nogreed = re.compile(r'<.*?>')
mo1 = Regex_nogreed.search('<To serve man> for dinner>')
print(mo1.group())
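Using ^ and $ together is a common way to validate that an entire string has a given form. A minimal sketch; the helper names is_all_digits and contains_digits are hypothetical.

```python
import re

anchored = re.compile(r'^\d+$')  # the whole string must be digits
unanchored = re.compile(r'\d+')  # digits anywhere in the string


def is_all_digits(text):
    """True if text consists only of digits (hypothetical helper)."""
    return anchored.search(text) is not None


def contains_digits(text):
    """True if text contains at least one digit (hypothetical helper)."""
    return unanchored.search(text) is not None


print(is_all_digits('222213232131'))          # True
print(is_all_digits('111111111x23333333'))    # False: the x breaks the run
print(contains_digits('111111111x23333333'))  # True
```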
1.8 The second argument to compile()
- re.VERBOSE: ignore whitespace and comments inside the pattern
- re.I: ignore case
- re.DOTALL: let the wildcard . match newlines as well
# For the wildcard . to match newline characters, pass re.DOTALL
Regex_nonewline = re.compile(r'.*')
mo = Regex_nonewline.findall('Serve the public trust. \nProtect the innocent\nUphold the law')
print(mo)
Regex_newline = re.compile(r'.*',re.DOTALL)
mo1 = Regex_newline.findall('Serve the public trust. \nProtect the innocent\nUphold the law')
print(mo1)
# Ignore case
Regex_cop = re.compile(r'robocop', re.I)
mo=Regex_cop.search('RoboCop is part man, part machine,all cop.')
print(mo.group())
# re.VERBOSE ignores whitespace and # comments inside the pattern (not in the searched text),
# so a literal space in the pattern must be written as \s or escaped
Regex_group = re.compile(r'Agent \s (\w) \w*', re.VERBOSE)
mo1 = Regex_group.sub(r'\1****', 'Agent Alice gave the secret documents to Agent Bob')
print(mo1)
# To pass several flags in the same argument, combine them with the bitwise OR operator |
Regex_group = re.compile(r'Agent \s (\w) \w*', re.VERBOSE | re.I | re.DOTALL)
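A multi-line pattern shows what re.VERBOSE is typically used for: spreading the pattern over several lines with a comment on each part. Because whitespace inside the pattern is ignored under this flag, the space between the words is spelled out as \s. A sketch combining re.VERBOSE and re.I; the sample sentence is made up for illustration.

```python
import re

agent_regex = re.compile(r'''
    agent   # the literal word; re.I makes it case-insensitive
    \s      # whitespace has to be written explicitly under re.VERBOSE
    (\w)    # group 1: first letter of the name
    \w*     # the rest of the name
    ''', re.VERBOSE | re.I)

censored = agent_regex.sub(r'\1****', 'Agent Alice met AGENT Bob')
print(censored)  # A**** met B****
```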
1.9 Replacing text
- sub()
# sub() replaces every match with the given text
Regex_sub = re.compile(r'Agent \w+')
mo = Regex_sub.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob')
print(mo)
# To keep part of the matched text in the replacement (e.g. the first letter of a name), capture it in a group and refer to it as \1, \2, \3 in the replacement string
Regex_group = re.compile(r'Agent (\w)\w*')
mo1 = Regex_group.sub(r'\1****', 'Agent Alice gave the secret documents to Agent Bob')
print(mo1)
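Several group references can be combined in one replacement string, for instance to reorder the parts of a match. A small sketch with a made-up date example:

```python
import re

# Three groups: year, month, day
date_regex = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

# \3, \2 and \1 refer back to the day, month and year groups
reordered = date_regex.sub(r'\3/\2/\1', 'Posted on 2022-06-26, updated 2022-07-01')
print(reordered)  # Posted on 26/06/2022, updated 01/07/2022
```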
1.10 Phone number and email address extractor
- Copy the text to the clipboard, either manually or programmatically
- Get the text from the clipboard: use the pyperclip module to copy and paste strings, and create two regular expressions, one for phone numbers and one for email addresses
- Find all phone numbers and email addresses in the text: collect every match, not just the first one
- Paste them to the clipboard: format the matched strings neatly and join them into a single string for pasting
- If no match is found, display a message
import pyperclip, re
# Define two regular expressions
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?               # optional area code
    (\s|-|\.)?                       # optional separator: space, - or .
    (\d{3})                          # first three digits
    (\s|-|\.)                        # separator
    (\d{4})                          # last four digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # optional extension; groups[8] holds the extension number
    )''', re.VERBOSE)
# findall() returns one tuple per match: the outermost group comes first, 9 elements in total
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # user name
    @
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # .com, .cn, etc.
    )''', re.VERBOSE)
# Find matches in the clipboard text
text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    print(groups)
    # Normalized form: area code, prefix and line number joined with hyphens
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    # The raw matched text is stored; phoneNum could be stored instead
    matches.append(groups[0])
for groups in emailRegex.findall(text):
    matches.append(groups[0])
# Join the matches into one string and copy it to the clipboard
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard: ')
    print('\n'.join(matches))
else:
    print('No phone numbers or email address found')
For example, using the CSDN home page as input gives the following result:
Copied to clipboard:
400-660-0108
999-2021
472464
1900
658
1101
[email protected].net
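The extraction logic can also be tried without the clipboard by wrapping it in a function that takes the text as an argument. This is a minimal sketch under stated assumptions: the two patterns are deliberately simplified versions of the ones above, and extract_contacts and the sample text are hypothetical names and data used only for illustration.

```python
import re

# Simplified patterns for illustration only; narrower than the full extractor
PHONE = re.compile(r'\d{3}-\d{3}-\d{4}')
EMAIL = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}')


def extract_contacts(text):
    """Return all phone numbers, then all email addresses, found in text."""
    return PHONE.findall(text) + EMAIL.findall(text)


sample = 'Call 415-555-1011 or 415-555-9999, or write to alice@example.com.'
found = extract_contacts(sample)

if found:
    print('\n'.join(found))
else:
    print('No phone numbers or email address found')
```

Keeping the matching in a plain function makes it easy to test with fixed strings before connecting it to pyperclip.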