Lesson 4: BeautifulSoup
2022-06-25 20:43:00 【Osmanthus rice wine balls】
I. Overview
1. Purpose: extract specific content from a web page
2. Import
from bs4 import BeautifulSoup
II. Usage
1. Parsing and basic object types:
from bs4 import BeautifulSoup
# baidu.html is a local file holding the source of a web page that was crawled and saved earlier
with open("./baidu.html", "rb") as file:
    html = file.read()  # raw page source
bs = BeautifulSoup(html, "html.parser")  # parse the HTML into a tree structure using the html.parser parser
print(bs.a)  # Tag: the first <a> tag and its contents
print(bs.title)  # Tag: prints the tag together with its contents; only the first match is returned
print(bs.title.string)  # the text inside the tag
print(type(bs.title.string))  # NavigableString: the string contained in a tag
print(bs.a.attrs)  # dictionary of all attributes of the tag
print(bs)  # BeautifulSoup: the whole document
print(bs.a.string)  # Comment: a special kind of NavigableString, returned when the tag holds only a comment; the output omits the comment markers
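The four object types mentioned in the comments above can be checked directly. The following is a minimal sketch that assumes a small inline HTML string (hypothetical, standing in for the saved baidu.html) so that it runs without any file on disk.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical sample markup, used here instead of the saved baidu.html.
html = """
<html><head><title>Sample page</title></head>
<body>
<a class="mnav" href="http://example.com/news" name="tj_news"><!--News--></a>
<a class="mnav" href="http://example.com/map">Map</a>
</body></html>
"""

bs = BeautifulSoup(html, "html.parser")

title = bs.title        # first <title> tag in the document
first_link = bs.a       # first <a> tag in the document

print(type(bs).__name__)                 # the whole document object
print(type(title).__name__)              # a Tag
print(type(title.string).__name__)       # a NavigableString
print(type(first_link.string).__name__)  # a Comment, because the tag holds only a comment
print(first_link.string)                 # comment text without the <!-- --> markers
print(first_link.attrs)                  # attributes as a dictionary
```

Because Comment is a subclass of NavigableString, code that needs to skip comments has to test for Comment explicitly, for example with isinstance.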
2. Traversing the document:
print(bs.head.contents)  # child nodes of a tag (here, head) returned as a list
print(bs.head.contents[1])  # index into the list to get the element at index 1
Note: for more details, see the "Navigating the tree" section of the Beautiful Soup documentation.
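Besides .contents, a tag can be walked with .children, .descendants and .parent. The sketch below assumes a compact, hypothetical HTML string with no whitespace between tags; in a real saved page, the newlines between tags also appear as child nodes, which is why an index such as contents[1] is often needed to reach the first tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, written without whitespace between tags.
html = (
    '<html><head><meta charset="utf-8"/><title>Sample page</title></head>'
    "<body><p>Hello</p></body></html>"
)
bs = BeautifulSoup(html, "html.parser")

head = bs.head

# .contents is a list of direct children.
print(head.contents)

# .children iterates over the same direct children without building a list.
child_names = [child.name for child in head.children]
print(child_names)

# .descendants also visits nested nodes, including the text inside <title>.
descendant_count = len(list(head.descendants))
print(descendant_count)

# .parent moves one level up the tree.
print(bs.title.parent.name)
```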
3. Searching the document:
import re
# String filter: match the tag name exactly
t_list = bs.find_all("a")  # find all <a> (hyperlink) tags
print(t_list)
# Regular expression filter: match the tag name against a pattern
t_list = bs.find_all(re.compile("a"))  # find all tags whose name contains the letter "a"
print(t_list)
# Function filter: pass in a function and keep the tags for which it returns True
def name_is_exists(tag):
    return tag.has_attr("name")  # True if the tag has a name attribute
t_list = bs.find_all(name_is_exists)
print(t_list)
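The three filter types above (string, regular expression, function) can be compared side by side. This is a minimal sketch on a hypothetical HTML fragment; the helper name has_name_attr is illustrative and not part of the library.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup.
html = """
<div id="head">
  <a class="mnav" href="http://example.com/news" name="tj_news">News</a>
  <a class="mnav" href="http://example.com/map">Map</a>
  <span name="tj_span">Text</span>
</div>
"""
bs = BeautifulSoup(html, "html.parser")

# String filter: only tags named exactly "a".
by_name = bs.find_all("a")

# Regex filter: any tag whose name contains "a" (here <a> and <span>, not <div>).
by_regex = bs.find_all(re.compile("a"))


def has_name_attr(tag):
    """Illustrative filter: keep tags that carry a name attribute."""
    return tag.has_attr("name")


# Function filter: called once per tag; the tag is kept when it returns True.
by_func = bs.find_all(has_name_attr)

print([t.name for t in by_name])
print([t.name for t in by_regex])
print([t.name for t in by_func])
```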
# Keyword arguments (kwargs): search by attribute value
t_list = bs.find_all(id="head")  # tags whose id is "head"
t_list = bs.find_all(class_=True)  # tags that have a class attribute
t_list = bs.find_all(href="http://……")  # tags whose href equals the given URL
# text argument: search the strings inside tags
t_list = bs.find_all(text="Map")  # strings that are exactly "Map"
t_list = bs.find_all(text=["Map", "tieba"])  # strings equal to any item in the list
t_list = bs.find_all(text=re.compile(r"\d"))  # regular expression: all strings (text inside tags) that contain a digit
# limit argument
t_list = bs.find_all("a", limit=3)  # return at most the first three <a> tags
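The keyword, text and limit searches can be tried on a small fragment as follows. This sketch makes two assumptions: the HTML and its URLs are hypothetical, and it uses the string= argument, which is the newer name in Beautiful Soup 4 for the text= argument shown above.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup.
html = """
<div id="head">
  <a class="mnav" href="http://example.com/news">News 24</a>
  <a class="mnav" href="http://example.com/map">Map</a>
  <a href="http://example.com/tieba">Tieba</a>
</div>
"""
bs = BeautifulSoup(html, "html.parser")

by_id = bs.find_all(id="head")                        # attribute equals a value
with_class = bs.find_all(class_=True)                 # attribute is present
by_href = bs.find_all(href="http://example.com/map")  # exact href match

# String searches return the matching strings, not the enclosing tags.
exact_text = bs.find_all(string="Map")
either_text = bs.find_all(string=["Map", "Tieba"])
with_digit = bs.find_all(string=re.compile(r"\d"))

# limit stops the search after the given number of matches.
first_two = bs.find_all("a", limit=2)

print(by_id[0].name, len(with_class), by_href[0].get_text())
print(exact_text, either_text, with_digit)
print(len(first_two))
```

A plain string matches only when the whole string is equal; to match a substring, pass a compiled regular expression instead.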
# CSS selectors
t_list = bs.select("title")  # by tag name
t_list = bs.select(".mnav")  # by class name (class="mnav")
t_list = bs.select("#u1")  # by id (id="u1")
t_list = bs.select("a[class='bri']")  # by attribute (<a class="bri" href="……"……)
t_list = bs.select("head > title")  # child combinator: <title> directly under <head>
t_list = bs.select(".mnav ~ .bri")  # sibling combinator: .bri elements that follow a .mnav sibling
print(t_list[0].get_text())  # text of the first matching element
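The selectors above can be exercised on a small page like this. The markup is a hypothetical stand-in that reuses the class and id names from the comments (mnav, bri, u1); it is a sketch, not the structure of any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup reusing the class and id names from the notes.
html = """
<html><head><title>Sample page</title></head>
<body>
<div id="u1">
  <a class="mnav" href="http://example.com/news">News</a>
  <a class="mnav" href="http://example.com/map">Map</a>
  <a class="bri" href="http://example.com/more">More</a>
</div>
</body></html>
"""
bs = BeautifulSoup(html, "html.parser")

by_tag = bs.select("title")             # tag name
by_class = bs.select(".mnav")           # class name
by_id = bs.select("#u1")                # id
by_attr = bs.select("a[class='bri']")   # attribute value
by_child = bs.select("head > title")    # direct child
siblings = bs.select(".mnav ~ .bri")    # later sibling of a .mnav element

print(by_tag[0].get_text())
print([a.get_text() for a in by_class])
print(by_id[0].name)
print(siblings[0].get_text())
```

select always returns a list, even for a single match; select_one returns the first match or None when only one element is wanted.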