Learning web scraping step by step 08 - a detailed guide to using selenium

2022-06-22 01:58:00 Smart Aries

1 Set up the environment

1.1 Install selenium

pip install selenium -i https://pypi.tuna.tsinghua.edu.cn/simple

1.2 Install the browser driver

First, download the browser driver from https://npm.taobao.org/mirrors/chromedriver. Then extract the chromedriver executable and put it in the folder that contains the Python interpreter. To find that folder, open cmd and run: where python
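
As a quick check that the driver ended up in the right place, the sketch below prints the interpreter's folder and looks for chromedriver there and then on PATH. The helper names interpreter_dir and find_chromedriver are made up for this illustration and are not part of selenium.

```python
import shutil
import sys
from pathlib import Path


def interpreter_dir():
    """Return the folder that contains the running Python interpreter."""
    return Path(sys.executable).parent


def find_chromedriver():
    """Look for chromedriver next to the interpreter first, then on PATH.

    Returns a Path if something is found, otherwise None.
    """
    folder = interpreter_dir()
    for name in ("chromedriver.exe", "chromedriver"):
        candidate = folder / name
        if candidate.is_file():
            return candidate
    found = shutil.which("chromedriver")
    return Path(found) if found else None


if __name__ == "__main__":
    print("Interpreter folder:", interpreter_dir())
    print("chromedriver found at:", find_chromedriver())
```

If the second line prints None, the driver is neither beside the interpreter nor on PATH.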

2 Example

# Have selenium launch the Chrome browser
from selenium import webdriver
# from selenium.webdriver import Chrome

# 1. Create a browser object
web = webdriver.Chrome()
# web = Chrome()

# 2. Open a URL
web.get("http://www.baidu.com")
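
The example above leaves the browser running after the script ends. A minimal sketch of one way to make sure it is always closed, even if loading the page raises an error, is to wrap the work in try/finally. The function name fetch_title is hypothetical; get, title and quit are the regular WebDriver members.

```python
def fetch_title(web, url):
    """Open `url` in the given browser, return the page title, then quit."""
    try:
        web.get(url)
        return web.title
    finally:
        # quit() closes every window and ends the driver process
        web.quit()
```

It could be called as fetch_title(webdriver.Chrome(), "http://www.baidu.com"), after which the browser object should not be reused.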

3 Set up a headless browser

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

# Prepare the browser options
opt = Options()
opt.add_argument('--headless')
opt.add_argument('--disable-gpu')

# Pass the options to the browser so that it runs headless
web = Chrome(options=opt)
web.get("http://www.baidu.com")
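
While debugging it is often handy to see the browser window and only go headless for real runs. The sketch below shows one possible way to make that a switch. The names chrome_args and make_chrome are hypothetical helpers; the two flags are the same ones used above.

```python
def chrome_args(headless=True):
    """Return the command-line arguments to pass to Chrome."""
    return ["--headless", "--disable-gpu"] if headless else []


def make_chrome(headless=True):
    """Create a Chrome browser object, headless unless told otherwise."""
    # Imported here so that merely loading this file does not require selenium
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.options import Options

    opt = Options()
    for arg in chrome_args(headless):
        opt.add_argument(arg)
    return Chrome(options=opt)
```

Calling make_chrome(headless=False) opens a visible window; make_chrome() behaves like the headless example.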

4 Switch windows

web.get("xxxxxxxx")
# In selenium, opening a new window does not switch focus to it automatically
# Switch to the newest window
web.switch_to.window(web.window_handles[-1])
# Close the child window
web.close()
# Move selenium's focus back to the original window
web.switch_to.window(web.window_handles[0])
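
Indexing window_handles with [-1] and [0] relies on the handles coming back in the order the windows were opened, which the WebDriver specification does not promise. A more defensive sketch is to remember the handles before the click and pick whichever one is new afterwards. The names newest_handle and switch_to_new_window are hypothetical helpers, not selenium functions.

```python
def newest_handle(before, after):
    """Return the first handle in `after` that is not in `before`, or None."""
    for handle in after:
        if handle not in before:
            return handle
    return None


def switch_to_new_window(web, handles_before):
    """Switch `web` to the window opened since `handles_before` was taken.

    Returns the new handle, or None if no new window appeared.
    """
    handle = newest_handle(handles_before, web.window_handles)
    if handle is not None:
        web.switch_to.window(handle)
    return handle
```

A typical use would be: save original = web.current_window_handle and before = web.window_handles, trigger the action that opens the window, call switch_to_new_window(web, before), and later return with web.switch_to.window(original).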

5 Getting content from inside an iframe

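from selenium.webdriver.common.by import By

# Content inside an iframe is only reachable after you locate the iframe element
# and switch selenium's focus into it
web.get('https://www.91kanju.com/vod-play/541-2-1.html')
# Assumption: the first <iframe> on the page is the one you want; adjust the locator as needed
iframe = web.find_element(By.TAG_NAME, 'iframe')
web.switch_to.frame(iframe)
# Switch back to the original page
web.switch_to.default_content()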
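
It is easy to forget the switch back to the main page, especially when the code inside the iframe raises an error. One possible pattern, sketched here with a hypothetical helper called inside_frame, is a context manager that always calls default_content() on the way out.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(web, frame):
    """Switch `web` into `frame` for the duration of a with-block."""
    web.switch_to.frame(frame)
    try:
        yield web
    finally:
        # Runs even if the with-block raises, so focus is never left in the iframe
        web.switch_to.default_content()
```

Used as: with inside_frame(web, iframe): followed by the lookups that should run inside the iframe.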

6 What to do when the site detects the automated program

6.1 Chrome versions below 88

from selenium.webdriver import Chrome

# Right after starting the browser (before any page content has loaded), inject JS
# that runs on every new document and hides navigator.webdriver
web = Chrome()

web.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
        navigator.webdriver = undefined;
        Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    """
})
web.get(xxxxxx)

6.2 Chrome version 88 or later

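from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

option = Options()
# Optional: this line can be included or left out
# option.add_experimental_option('excludeSwitches', ['enable-automation'])
option.add_argument('--disable-blink-features=AutomationControlled')

web = Chrome(options=option)
web.get(xxxxxxx)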
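
To see whether either workaround took effect, one simple check is to read navigator.webdriver back from the loaded page. The sketch below assumes a page has already been opened with web.get; the function name reports_webdriver is hypothetical, while execute_script is the regular WebDriver method.

```python
def reports_webdriver(web):
    """Return True if the current page still sees navigator.webdriver as truthy."""
    # execute_script runs the JS in the page and hands the returned value back to Python
    value = web.execute_script("return navigator.webdriver")
    return bool(value)
```

A result of False only means this one property is hidden; a site may still use other signals to spot automation.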