Lesson 4: BeautifulSoup
2022-06-25 20:43:00 [Osmanthus rice wine balls]
I. Overview
1. Purpose: extract specified content from a web page.
2. Import:
from bs4 import BeautifulSoup
II. Usage
1. Parsing a page and accessing tags:
from bs4 import BeautifulSoup

# baidu.html holds page source that was crawled earlier and saved locally
with open("./baidu.html", "rb") as file:
    html = file.read()  # raw page source (bytes)
bs = BeautifulSoup(html, "html.parser")  # parse into a tree with the built-in html.parser

print(bs.a)                   # Tag: the first <a> tag and its contents (only the first match)
print(bs.title)               # Tag: the first <title> tag and its contents
print(bs.title.string)        # the text inside <title>
print(type(bs.title.string))  # NavigableString: the tag's text content
print(bs.a.attrs)             # dict of all attributes of the first <a> tag
print(bs)                     # BeautifulSoup object: the whole document
print(bs.a.string)            # Comment (a special NavigableString) if the tag holds only a comment; printed without <!-- -->
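To see these object types without a saved baidu.html, here is a minimal sketch that parses a small hypothetical HTML string instead of the file (the markup, class names and URLs are made up for illustration; it assumes the beautifulsoup4 package is installed):

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical inline page standing in for the saved baidu.html file
html = """
<html><head><title>Demo page</title></head>
<body>
<a class="mnav" href="http://example.com/news" name="tj_news"><!--News--></a>
<a class="bri" href="http://example.com/more">More</a>
</body></html>
"""

bs = BeautifulSoup(html, "html.parser")

first_a = bs.a                # Tag: only the first <a> in the document
title_text = bs.title.string  # NavigableString: text inside <title>
attrs = first_a.attrs         # dict of the tag's attributes ("class" is a list)
comment = first_a.string      # Comment: the tag's only child is an HTML comment

print(type(first_a).__name__, type(title_text).__name__, type(comment).__name__)
print(title_text)
print(attrs)
print(comment)  # printed without the <!-- --> markers
```

Note that `class` comes back as a list, because HTML allows several space-separated classes on one tag.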
2. Traversing the document:
print(bs.head.contents)     # direct children of the <head> tag, as a list
print(bs.head.contents[1])  # element at index 1 (the second child; whitespace strings count as children)
Note: for details, see "Navigating the tree" in the Beautiful Soup documentation.
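The index 1 in `contents[1]` matters because the line breaks between tags are kept as text children. A small sketch on a hypothetical `<head>` (markup made up for illustration) shows what the list actually contains and how to keep only the tags:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical <head> with line breaks between tags, as real pages usually have
html = """<html><head>
<meta charset="utf-8">
<title>Demo page</title>
</head><body></body></html>"""

bs = BeautifulSoup(html, "html.parser")

children = bs.head.contents  # direct children, whitespace text nodes included
# .children iterates the same direct children; keep only Tag objects
tags_only = [c for c in bs.head.children if isinstance(c, Tag)]

print(children)
print([t.name for t in tags_only])
```

Here `contents[0]` is the newline after `<head>`, so the first real tag sits at index 1.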
3. Searching the document:
import re

# Filter by tag name (a string)
t_list = bs.find_all("a")  # all <a> (hyperlink) tags
print(t_list)
# Regular expression: matched against tag names with re.search()
t_list = bs.find_all(re.compile("a"))  # all tags whose name contains the letter "a"
print(t_list)
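To check which tags the regular expression really matches, here is a sketch on a tiny hypothetical page (markup made up for illustration). The pattern is applied to tag names, not to link text:

```python
import re
from bs4 import BeautifulSoup

html = """<html><head><title>Demo</title></head>
<body><a href="/x">x</a><span>y</span><table></table></body></html>"""

bs = BeautifulSoup(html, "html.parser")

by_name = bs.find_all("a")               # exact tag name
by_regex = bs.find_all(re.compile("a"))  # tag name contains "a" anywhere (re.search)

print([t.name for t in by_name])
print([t.name for t in by_regex])
```

Because the pattern is applied with `search()`, it also hits `head`, `span` and `table`; `re.compile("^a$")` would restrict it to `<a>` only.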
# Pass in a function: keep the tags for which it returns True
def name_is_exists(tag):
    return tag.has_attr("name")  # True if the tag has a "name" attribute

t_list = bs.find_all(name_is_exists)
print(t_list)
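A self-contained sketch of the function filter on a hypothetical form (markup made up for illustration). The function is called once per tag, and only tags for which it returns True are kept:

```python
from bs4 import BeautifulSoup

html = """<form><input name="wd" type="text"><input type="submit">
<a name="tj_news" href="/news">News</a><a href="/more">More</a></form>"""

bs = BeautifulSoup(html, "html.parser")

def name_is_exists(tag):
    # Called with each Tag; text nodes are not passed to a tag-name filter
    return tag.has_attr("name")

matches = bs.find_all(name_is_exists)
print([(t.name, t["name"]) for t in matches])
```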
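# Search by attribute with keyword arguments (kwargs)
t_list = bs.find_all(id="head")         # tags whose id is "head"
t_list = bs.find_all(class_=True)       # tags that have any class attribute (class_ because class is a Python keyword)
t_list = bs.find_all(href="http://……")  # tags whose href equals the given URL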
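The keyword filters can be tried on a hypothetical snippet (ids, classes and URLs made up for illustration). This sketch also shows two behaviors worth knowing: `class_="mnav"` matches a tag that has several classes, and attribute values accept a compiled regex as well as a plain string:

```python
import re
from bs4 import BeautifulSoup

html = """<div id="head">
<a class="mnav c-font" href="http://news.example.com">News</a>
<a class="bri" href="http://more.example.com">More</a>
<span>plain</span>
</div>"""

bs = BeautifulSoup(html, "html.parser")

by_id = bs.find_all(id="head")                           # id equals "head"
with_class = bs.find_all(class_=True)                    # any tag with a class attribute
by_one_class = bs.find_all(class_="mnav")                # one class among several is enough
by_href = bs.find_all(href=re.compile(r"^http://news"))  # regex on the attribute value
```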
# Search by text (string=; older code passes text=, which does the same)
t_list = bs.find_all(string="Map")              # strings exactly equal to "Map"
t_list = bs.find_all(string=["Map", "tieba"])   # strings equal to any item in the list
t_list = bs.find_all(string=re.compile(r"\d"))  # strings (text inside tags) that contain a digit
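Text searches return the strings themselves, not the tags. A sketch on hypothetical markup (link texts made up for illustration) shows that a plain string must match the whole text, while a regex only needs to match somewhere in it:

```python
import re
from bs4 import BeautifulSoup

html = """<div><a href="/map">Map</a><a href="/tieba">tieba</a>
<a href="/news">News 24</a><p>Map of the area</p></div>"""

bs = BeautifulSoup(html, "html.parser")

exact = bs.find_all(string="Map")                  # whole string must equal "Map"
either = bs.find_all(string=["Map", "tieba"])      # any string in the list
has_digit = bs.find_all(string=re.compile(r"\d"))  # regex search: contains a digit

print(exact, either, has_digit)
print(exact[0].parent)  # .parent leads from the string back to its tag
```

"Map of the area" is not returned by `string="Map"`; use `re.compile("Map")` to match substrings.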
# limit parameter
t_list = bs.find_all("a", limit=3)  # at most the first three <a> tags
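A sketch of `limit` next to `find()`, which returns only the first match (or None) instead of a list; the five links are generated purely for illustration:

```python
from bs4 import BeautifulSoup

html = "".join(f'<a href="/p{i}">link {i}</a>' for i in range(5))
bs = BeautifulSoup(html, "html.parser")

first_three = bs.find_all("a", limit=3)  # stops searching after three matches
first_one = bs.find("a")                 # first match as a Tag, not a list
missing = bs.find("table")               # no match: None
```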
# CSS selectors: select() always returns a list
t_list = bs.select("title")           # by tag name
t_list = bs.select(".mnav")           # by class (class="mnav")
t_list = bs.select("#u1")             # by id (id="u1")
t_list = bs.select("a[class='bri']")  # by attribute (<a class="bri" href="……"……)
t_list = bs.select("head>title")      # by child tag (<title> directly inside <head>)
t_list = bs.select(".mnav ~ .bri")    # sibling: .bri elements that follow a .mnav sibling
print(t_list[0].get_text())           # text of the first match
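Each selector can be checked against a hypothetical page that mimics the ids and classes used above (markup made up for illustration; `select()` relies on the soupsieve package, which recent beautifulsoup4 releases install as a dependency):

```python
from bs4 import BeautifulSoup

html = """<html><head><title>Demo</title></head><body>
<div id="u1">
<a class="mnav" href="/news">News</a>
<a class="mnav" href="/map">Map</a>
<a class="bri" href="/more">More</a>
</div></body></html>"""

bs = BeautifulSoup(html, "html.parser")

titles = bs.select("head > title")    # child combinator
navs = bs.select(".mnav")             # class selector
box = bs.select("#u1")                # id selector
bri = bs.select("a[class='bri']")     # attribute selector
after_nav = bs.select(".mnav ~ .bri") # general sibling combinator
first = bs.select_one(".mnav")        # first match only, or None

print(titles[0].get_text(), [a.get_text() for a in navs], after_nav[0].get_text())
```

`select_one()` is convenient when only one element is expected, since it avoids indexing into an empty list.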