Selenium crawl notes
2022-06-24 20:36:00 【Yu Xu】
Import the third-party library selenium.
import selenium
from selenium import webdriver
Download the driver that matches your browser:
edge:https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
chrome:https://code.google.com/p/chromedriver/downloads/list
firefox:https://github.com/mozilla/geckodriver/releases/
IE:NuGet Gallery | Selenium.WebDriver.IEDriver 4.0.0
The download is a compressed archive. Open it and you will find a file named msedgedriver.exe. Copy this file to a drive other than C:, then add its location to the system environment variables.
To configure the environment variable, go to: This PC > right-click > Properties > About > Advanced system settings > Advanced > Environment Variables > System variables > Path.
Add the location of msedgedriver.exe there (Path entries are folders, so this should be the folder that contains the file), then click OK.
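Before launching the browser, it can help to check whether the driver is actually visible on PATH. The following is a minimal sketch using only the standard library; the function name find_driver and its extra_dirs parameter are hypothetical helpers for illustration, not part of Selenium.

```python
import os
import shutil


def find_driver(name="msedgedriver", extra_dirs=()):
    """Return the full path of the driver executable, or None if it is not found."""
    # Search the extra directories first, then the directories listed in PATH.
    search_path = os.pathsep.join([*extra_dirs, os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    location = find_driver()
    if location is None:
        print("msedgedriver was not found on PATH")
    else:
        print("Found driver at", location)
```

If this prints that the driver was not found, a terminal or IDE that was already open before the variable was edited may still be using the old environment, so restarting it is worth trying.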
# Create a browser object. I use the Edge browser here; for Chrome, replace Edge with Chrome, and likewise for Firefox. The first letter must be capitalized!
driver = webdriver.Edge()
driver.get('https://www.taobao.com/?spm=a21bo.jianhua.201857.1.5af911d9NTiGPH')
# Maximize the browser window
driver.maximize_window()
Running the code at this point, the line driver = webdriver.Edge() raises an error.
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the specified file .
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\learn\ test .py", line 4, in <module>
driver = webdriver.Edge()
File "D:\ Study \pycharm practice \learn\lib\site-packages\selenium\webdriver\edge\webdriver.py", line 62, in __init__
super(WebDriver, self).__init__(DesiredCapabilities.EDGE['browserName'], "ms",
File "D:\ Study \pycharm practice \learn\lib\site-packages\selenium\webdriver\chromium\webdriver.py", line 90, in __init__
self.service.start()
File "D:\ Study \pycharm practice \learn\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start
raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: 'msedgedriver' executable needs to be in PATH. Please download from https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
The message says the driver executable needs to be in PATH, yet I believed I had already configured the path. It turned out later that the path is meant to be passed to webdriver.Edge() in the form of an object.
So the code has to be changed as follows.
# Here too, change edge to the name of your own browser; lowercase is fine
from selenium.webdriver.edge.service import Service
# Use Service() to wrap the driver path and assign it to the variable s. The r prefix makes this a raw string (not a regular expression), so the backslash is kept literally
s = Service(r'D:\msedgedriver.exe')
# service is a parameter of Edge(). To see how it is used, hold Ctrl and left-click Edge in the editor; the file where it is defined will open
driver = webdriver.Edge(service=s)
driver.get('https://www.taobao.com/?spm=a21bo.jianhua.201857.1.5af911d9NTiGPH')
# Page maximization
driver.maximize_window()
Run the code and the Taobao page opens. One thing to note: the site behaves differently depending on whether a person or code is browsing:
1. When a person visits the site, they can search, select items, and go all the way to confirming the purchase before the login window for the user account appears;
2. When code drives the browser, the login window pops up as soon as the product is entered in the search bar.
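Returning briefly to the driver setup: the Service-based construction is tied to Edge in this post, but the same pattern can be generalized to Chrome and Firefox. The sketch below is one possible way to do that, assuming Selenium 4 module paths; resolve_browser, make_driver, and the BROWSERS table are hypothetical names introduced here for illustration.

```python
import importlib

# Browser name -> (class name on selenium.webdriver, module holding its Service class).
# Only these three browsers are covered in this sketch.
BROWSERS = {
    "edge": ("Edge", "selenium.webdriver.edge.service"),
    "chrome": ("Chrome", "selenium.webdriver.chrome.service"),
    "firefox": ("Firefox", "selenium.webdriver.firefox.service"),
}


def resolve_browser(name):
    """Return (driver class name, service module path) for a browser name in any letter case."""
    key = name.strip().lower()
    if key not in BROWSERS:
        raise ValueError(f"unsupported browser: {name!r}")
    return BROWSERS[key]


def make_driver(name, driver_path):
    """Start a browser session using an explicit driver executable path."""
    class_name, service_module = resolve_browser(name)
    # Imported lazily so that resolve_browser() works even without Selenium installed.
    webdriver = importlib.import_module("selenium.webdriver")
    service_cls = importlib.import_module(service_module).Service
    service = service_cls(executable_path=driver_path)
    return getattr(webdriver, class_name)(service=service)
```

With this in place, a call such as make_driver("edge", r"D:\msedgedriver.exe") would be expected to behave like the Edge-specific code, with the capitalized class name and the lowercase module name both derived from one argument.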
Let's first write the code for the search.
One more point here:
Normally find_element_by_xpath() is used to get page elements, but my PyCharm showed this at the bottom:
# A different method is needed here. Add: from selenium.webdriver.common.by import By
# find_element_by_xpath() is not recommended; use find_element() instead
find_element_by_* commands are deprecated. Please use find_element() instead
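The deprecation message means each old find_element_by_* call becomes a find_element() call with a locator strategy plus a value. A minimal sketch of that mapping follows; it avoids importing Selenium by using the plain strings that the By constants stand for (for example By.XPATH is the string "xpath"). The names STRATEGIES and modern_locator are hypothetical helpers, not Selenium APIs.

```python
# Locator strategy strings; these are the values of the constants on
# selenium.webdriver.common.by.By (for example By.XPATH == "xpath").
STRATEGIES = {
    "id": "id",
    "name": "name",
    "xpath": "xpath",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "tag_name": "tag name",
    "class_name": "class name",
    "css_selector": "css selector",
}


def modern_locator(old_method, value):
    """Translate e.g. ('find_element_by_xpath', '//a') into ('xpath', '//a')."""
    prefix = "find_element_by_"
    if not old_method.startswith(prefix):
        raise ValueError(f"not a find_element_by_* name: {old_method!r}")
    suffix = old_method[len(prefix):]
    if suffix not in STRATEGIES:
        raise ValueError(f"unknown locator strategy: {suffix!r}")
    # The returned pair can be unpacked straight into driver.find_element(by, value).
    return STRATEGIES[suffix], value
```

For example, driver.find_element(*modern_locator("find_element_by_xpath", "//button")) should be equivalent to driver.find_element(By.XPATH, "//button"). In everyday code, importing By and writing the call directly is the more readable choice; the table is only meant to make the correspondence explicit.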
# In other words, find_element_by_xpath('...') is equivalent to find_element(By.XPATH, 'the element you are looking for')
Here XPath is used to locate the search box element, followed by a random delay of 1 to 3 seconds.
import random
import time
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, '//*[@id="J_TSearchForm"]/div[1]/button').click()
time.sleep(random.randint(1, 3))
Then locate the search button, again with a random delay of 1 to 3 seconds.
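Since the same random delay is repeated after each step, it could be wrapped in a small helper. The sketch below is one way to do that; note that it swaps random.randint (which only yields whole seconds) for random.uniform so the pause can be fractional. The name random_pause and its sleep parameter are hypothetical and exist only for this illustration.

```python
import random
import time


def random_pause(low=1.0, high=3.0, sleep=time.sleep):
    """Sleep for a random duration between low and high seconds and return it."""
    if low > high:
        raise ValueError("low must not be greater than high")
    # uniform() picks a float in the range, unlike randint() which picks an integer.
    delay = random.uniform(low, high)
    # The sleep function is injectable so the helper can be exercised without waiting.
    sleep(delay)
    return delay
```

Calling random_pause() after each click would replace the repeated time.sleep(random.randint(1, 3)) lines with a single name.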