Nanny-level anti-scraping tutorial: JS reverse engineering of font-based anti-scraping
2022-06-23 21:56:00 【Charlie is not a dog】
Hello everyone, I'm Charlie~
Websites use many anti-scraping measures, for example JS-based, IP-based, CSS-based and font-based anti-scraping, CAPTCHAs, and slider or click verification. Today we'll learn about font-based anti-scraping by scraping a recruitment website.
Today's website
The URL is given in encoded form: aHR0cHM6Ly93d3cuc2hpeGlzZW5nLmNvbS8= For safety reasons we encoded the address with base64; decode it with base64 to get the URL.
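Decoding takes one call from Python's standard library. A minimal sketch; the variable names are only illustrative:

```python
import base64

encoded = "aHR0cHM6Ly93d3cuc2hpeGlzZW5nLmNvbS8="
# b64decode returns bytes, so decode them to get a str
site_url = base64.b64decode(encoded).decode("utf-8")
print(site_url)
```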
Font-based anti-scraping
Font-based anti-scraping is a common technique in which the web page and a front-end font file work together. Among the earliest sites to use it were 58.com (58同城) and Autohome (汽车之家); many mainstream websites and apps now use it as an additional anti-scraping measure.
How it works: some of the data in the page is rendered with a custom font. Without the correct decoding method, we cannot recover the real content.
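Custom fonts of this kind often place their glyphs at code points in Unicode's Private Use Area (U+E000 to U+F8FF), which have no standard meaning; codes such as f7ce and f0e2 used later in this article fall in that range. Assuming that holds for the site you are scraping, a quick check like the sketch below can reveal whether the text you fetched still contains undecoded characters. The function name and the sample string are hypothetical.

```python
def find_private_use_chars(text):
    """Return the characters of text that lie in the BMP Private Use Area."""
    return [ch for ch in text if 0xE000 <= ord(ch) <= 0xF8FF]

# Hypothetical scraped text: the digits were replaced by private-use code points
scraped = "salary: \uf7ce\uf324\uf23e/day"
suspicious = find_private_use_chars(scraped)
print([hex(ord(ch)) for ch in suspicious])
```

If the list is non-empty, the data still needs to go through the font mapping before it can be used.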
In HTML, custom fonts are used through @font-face, as shown in the figure below:
Its syntax is:
@font-face {
    font-family: "name";
    src: url('font file link'),
         url('font file link') format('file type');
}
Font files are generally of type ttf, eot or woff. woff is the most widely used, so that is the type you will usually encounter.
Taking a woff file as an example: what does it contain, and what encoding makes the data correspond one-to-one with the codes?
Let's take the font file of a recruitment website as an example. Open the font file in the Baidu font editor, as shown in the figure below:
Open any glyph, as shown in the figure below:
You can see that the glyph for 6 is placed on a coordinate plane and that its code is derived from the points on that plane. I won't explain here how the code for 6 is obtained.
How do we get around font-based anti-scraping?
The mapping can be treated as a dictionary. There are roughly two common methods:
The first: manually extract the correspondence between a set of codes and characters and write it down as a dictionary. The code is as follows:
replace_dict = {
    '0xf7ce': '1',
    '0xf324': '2',
    '0xf23e': '3',
    # .......
    '0xfe43': 'n',
}
# data is the page text that has already been fetched
for key in replace_dict:
    data = data.replace(key, replace_dict[key])
First define a dictionary that maps each code to its character, then use a for loop to replace the codes in the data one by one.
Note: this method is mainly suitable when the font maps only a small number of characters.
The second: download the site's font file, convert it to an XML file, find the font mapping codes inside, decode them with the decode function, combine the decoded codes into a dictionary, and then replace the data according to that dictionary. Because the code is long, no sample is given here; it is shown in the hands-on practice below.
OK, that's all for the theory of font-based anti-scraping. Next, we scrape a recruitment website for real.
Hands-on practice
Finding the custom font file
First, open the recruitment website and open the developer tools, as shown in the figure below:
Here we see that some characters in the source do not display normally and are replaced by codes, so we can tentatively conclude that a custom font file is in use and that we need to find it. Where is the font file? Open the developer tools and click the Network tab, as shown in the figure below:
In general, font files are listed under the Font tab. There are 5 entries, so which one is the custom font file? The custom font file is requested again every time we go to the next page, so we only need to click the next page on the website, as shown in the figure below:
One more entry whose name begins with file appears, so we can tentatively conclude that it is the custom font file. Now download it. This is simple: copy the URL of the entry that begins with file, open it in the browser, download the file, and open it in the Baidu font editor, as shown in the figure below:
It does not open. Did we pick the wrong font file? The site reports that this file type is not supported, so we change the extension of the downloaded file to .woff and try again, as shown in the figure below:
This time it opens successfully.
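Rather than guessing the extension, the container format can usually be read from the first four bytes of the file. The sketch below checks the well-known signatures for WOFF, WOFF2, TrueType and OpenType (CFF) files; the function name is hypothetical, and a file that matches none of them is simply reported as unknown.

```python
def sniff_font_type(data):
    """Guess a font container format from the first four bytes."""
    signatures = {
        b"wOFF": "woff",
        b"wOF2": "woff2",
        b"\x00\x01\x00\x00": "ttf",
        b"OTTO": "otf",
    }
    return signatures.get(data[:4], "unknown")

# Fake header bytes, used only to exercise the function
print(sniff_font_type(b"wOFF" + b"\x00" * 40))
```

On the downloaded file, this would be called on the raw bytes read with open(path, 'rb').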
Font mapping
We have found the custom font file, so how do we use it? We define a function get_fontfile() to handle the custom font file, then turn the mapping in the font file into a dictionary in two steps.
- Download and convert the font file;
- Decode the font mapping.
Downloading and converting the font file
Custom font files are updated very frequently, so we fetch the page's current font file in real time; using a stale font file would produce incorrect data. First, look at the URLs of the custom font file:
https://www.xxxxxx.com/interns/iconfonts/file?rand=0.2254193167485603
https://www.xxxxxx.com/interns/iconfonts/file?rand=0.4313944100724574
https://www.xxxxxx.com/interns/iconfonts/file?rand=0.3615862774301839
Only the rand parameter of the URL changes, and it is a random floating-point number below 1 with sixteen decimal places, so we only need to construct rand. The main code is as follows:
import random
import requests
from fontTools.ttLib import TTFont

def get_fontfile():
    # headers is assumed to be defined elsewhere in the script
    rand = round(random.uniform(0, 1), 17)
    url = f'https://www.xxxxxx.com/interns/iconfonts/file?rand={rand}'
    response = requests.get(url, headers=headers).content
    with open('file.woff', 'wb') as f:
        f.write(response)
    font = TTFont('file.woff')
    font.saveXML('file.xml')
First, random.uniform() bounds the random number and round() limits its number of digits, which gives the rand value. Then .content returns the response body as bytes, which are written to file.woff. TTFont() loads the file and saveXML() saves its content as an XML file. The content of the XML file is shown in the accompanying figure.
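The construction of rand can also be isolated in a small helper. The sketch below is only one way to do it: it formats the number with exactly sixteen decimal places to match the URLs observed above. The helper name is hypothetical, the host is the same placeholder used in this article, and whether the server actually checks the value is not something the article establishes.

```python
import random

def build_font_url(base="https://www.xxxxxx.com/interns/iconfonts/file"):
    # random.random() returns a float in [0, 1); keep 16 decimal places
    rand = f"{random.random():.16f}"
    return f"{base}?rand={rand}"

print(build_font_url())
```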
Decoding and presenting the font mapping
The font's .xml file has 4,589 lines in total. Which part holds the font mapping?
First, look back at what the Baidu font editor showed, as in the figure below:
The code for the Chinese character 人 (person) is f0e2, so we search the font's .xml file for that code, as shown in the figure below:
There are 4 results in total, but on closer inspection they are all identical. We can therefore extract the mapping from the pattern of these entries, decode the corresponding values, and present the result as a dictionary. The main code is as follows:
import re

with open('file.xml') as f:
    xml = f.read()
# code is the character code used in the page; name is "uni" + the real code point
keys = re.findall('<map code="(0x.*?)" name="uni.*?"/>', xml)
values = re.findall('<map code="0x.*?" name="uni(.*?)"/>', xml)
for i in range(len(values)):
    # Names shorter than 4 hex digits are padded so that \uXXXX is a valid escape
    if len(values[i]) < 4:
        values[i] = ('\\u00' + values[i]).encode('utf-8').decode('unicode_escape')
    else:
        values[i] = ('\\u' + values[i]).encode('utf-8').decode('unicode_escape')
word_dict = dict(zip(keys, values))
First read the content of file.xml, then collect the code and name attributes into keys and values respectively. A for loop decodes each entry in values into the character we want, and finally zip() pairs keys with values and dict() turns the pairs into a dictionary. The resulting dictionary is shown in the accompanying figure.
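The same mapping can also be built with an XML parser instead of regular expressions, which avoids depending on attribute order. The sketch below is an alternative, not the code used in this article: build_word_dict is a hypothetical name, the sample XML is a hand-written fragment in the same shape as the map entries above, and it assumes every useful glyph name is "uni" followed by a hex code point.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """
<cmap>
  <cmap_format_4 platformID="0" platEncID="3" language="0">
    <map code="0xe0a1" name="uni31"/>
    <map code="0xf0e2" name="uni4EBA"/>
    <map code="0x78" name="x"/>
  </cmap_format_4>
</cmap>
"""

def build_word_dict(xml_text):
    """Map each page code (e.g. '0xf0e2') to the real character."""
    mapping = {}
    for node in ET.fromstring(xml_text).iter("map"):
        name = node.get("name", "")
        if not name.startswith("uni"):
            continue  # skip glyphs that are not named after a code point
        try:
            mapping[node.get("code")] = chr(int(name[3:], 16))
        except ValueError:
            continue  # name was not plain hex, e.g. a suffixed variant
    return mapping

print(build_word_dict(SAMPLE_XML))
```

Because the result is a plain dict, duplicate map entries (such as the four identical hits for f0e2) collapse into one key automatically.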
Fetching the recruitment data
In the previous step we turned the font mapping into a dictionary. Next, we send a network request to get the data. The main code is as follows:
import parsel

def get_data(word_dict, url):
    # Turn entities such as &#xf0e2 into 0xf0e2 so they match the dictionary keys
    response = requests.get(url, headers=headers).text.replace('&#', '0')
    for key in word_dict:
        response = response.replace(key, word_dict[key])
    selector = parsel.Selector(response)
    datas = selector.xpath('//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div[1]/div[1]/div')
    for i in datas:
        data = {
            'workname': i.xpath('./div[1]/div[1]/p[1]/a/text()').extract_first(),
            'link': i.xpath('./div[1]/div[1]/p[1]/a/@href').extract_first(),
            'salary': i.xpath('./div[1]/div[1]/p[1]/span/text()').extract_first(),
            'place': i.xpath('./div[1]/div[1]/p[2]/span[1]/text()').extract_first(),
            'work_time': i.xpath('./div[1]/div[1]/p[2]/span[3]/text()').extract_first() + i.xpath('./div[1]/div[1]/p[2]/span[5]/text()').extract_first(),
            'company_name': i.xpath('./div[1]/div[2]/p[1]/a/text()').extract_first(),
            'Field_scale': i.xpath('./div[1]/div[2]/p[2]/span[1]/text()').extract_first() + i.xpath('./div[1]/div[2]/p[2]/span[3]/text()').extract_first(),
            'advantage': ','.join(i.xpath('./div[2]/div[1]/span/text()').extract()),
            'welfare': ','.join(i.xpath('./div[2]/div[2]/span/text()').extract())
        }
        saving_data(list(data.values()))
First we define the function get_data(), which receives the dictionary of the font mapping. A for loop replaces each code in the response with the character from the dictionary, then xpath() extracts the fields we want, and finally each record is passed to our own function saving_data().
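The chained replace above rewrites every literal &# in the page, including entities that have nothing to do with the custom font. A more targeted variant, sketched below, swaps only hexadecimal entities that have an entry in the dictionary and leaves everything else untouched. This is an alternative, not the article's method: decode_entities, the sample HTML and the two-entry mapping are hypothetical, and it assumes the dictionary keys use lowercase hex, as in the examples in this article.

```python
import re

def decode_entities(html, word_dict):
    """Replace &#x...; style entities using a code-to-character dictionary."""
    def swap(match):
        key = "0x" + match.group(1).lower()
        # Leave the entity untouched when the font has no mapping for it
        return word_dict.get(key, match.group(0))
    return re.sub(r"&#x([0-9a-fA-F]+);?", swap, html)

word_dict = {"0xf7ce": "1", "0xf324": "2"}  # hypothetical mapping
sample = "<span>&#xf7ce&#xf324&#xf7ce/day</span>"
print(decode_entities(sample, word_dict))
```

The optional semicolon in the pattern lets the same function handle both `&#xf7ce` and `&#xf7ce;` forms.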
Saving the data
The data has been fetched. Next we save it. The main code is as follows:
import pymysql

def saving_data(data):
    # host, user, passwd and port are assumed to be defined elsewhere in the script
    db = pymysql.connect(host=host, user=user, password=passwd, port=port, database='recruit')
    cursor = db.cursor()
    sql = 'insert into recruit_data(work_name, link, salary, place, work_time,company_name,Field_scale,advantage,welfare) values(%s,%s,%s,%s,%s,%s,%s,%s,%s)'
    try:
        cursor.execute(sql, data)
        db.commit()
    except Exception:
        # Undo the insert if anything goes wrong
        db.rollback()
    db.close()
Start the program
OK, the program is almost finished. Next, write the code that runs it. The main code is as follows:
if __name__ == '__main__':
    # create_db() and get_dict() are not shown in this article; get_dict() is
    # presumably the dictionary-building code above wrapped in a function
    create_db()
    get_fontfile()
    for i in range(1, 3):
        url = f'https://www.xxxxxx.com/interns?page={i}&type=intern&salary=-0&city=%E5%85%A8%E5%9B%BD'
        get_data(get_dict(), url)
Result display