Fetching Tongcheng (eLong) Hotel Data
2022-07-24 17:31:00 【tslilove】
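Disclaimer
Everything in this article is for learning and discussion only. Captured traffic, sensitive URLs, and data endpoints have all been desensitized. Commercial or illegal use is strictly prohibited, and the author bears no responsibility for any consequences of such use. If anything here infringes your rights, contact me and I will remove it immediately.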
Target homepage: aHR0cHM6Ly93d3cubHkuY29tLw==
API endpoint: aHR0cHM6Ly93d3cubHkuY29tL3RhcGkvdjIvbGlzdA==
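The two addresses above are Base64-encoded as part of the desensitization. A minimal sketch of decoding such a string with Python's standard library follows; the helper name `decode_b64` is made up for illustration and is not part of the original article.

```python
import base64


def decode_b64(text: str) -> str:
    """Decode a Base64 string into UTF-8 text."""
    return base64.b64decode(text).decode("utf-8")


if __name__ == "__main__":
    # The encoded homepage string is copied from the article.
    print(decode_b64("aHR0cHM6Ly93d3cubHkuY29tLw=="))
```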
Here I select the three settings marked ①, ②, and ③; tick other options if you have different requirements.

The positions marked in the figure are the data to collect.

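Click Search, press F12, then click the next page. The captured traffic shows that the data comes back as JSON, and its fields map one-to-one to what is shown on the page.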

Next, look at the request parameters. The three marked in the figure need simple adjustments; everything else can stay fixed.

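At this point many people will think this is trivial: copy the request headers and parameters, fire off the requests, and then discover that the returned data is repeated. What now? Is some anti-scraping measure kicking in? With that doubt in mind, let's keep looking.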
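One way to confirm that different page numbers really return the same data is to fingerprint each decoded JSON payload and compare the digests. The sketch below is only an illustration: the helper names and the sample payloads are invented, and the real response structure is not shown in this article.

```python
import hashlib
import json


def page_fingerprint(payload) -> str:
    """Return a stable SHA-256 digest of a decoded JSON payload."""
    # sort_keys makes the digest independent of key order.
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_repeated_pages(pages: dict) -> list:
    """Given {page_number: payload}, list (page, earlier_page) pairs with equal payloads."""
    seen = {}
    repeated = []
    for number in sorted(pages):
        digest = page_fingerprint(pages[number])
        if digest in seen:
            repeated.append((number, seen[digest]))
        else:
            seen[digest] = number
    return repeated


if __name__ == "__main__":
    # Invented sample payloads, not real responses from the site.
    sample = {1: {"hotels": ["a", "b"]}, 2: {"hotels": ["a", "b"]}, 3: {"hotels": ["c"]}}
    print(find_repeated_pages(sample))  # [(2, 1)]
```

If every page after the first shows up in the result, the server is most likely ignoring or rejecting something in the request rather than serving genuinely different pages.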

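Click the next page again and a new request appears. Comparing the headers of the two requests shows one value that changes dynamically: traceid, which is different on every request. So how is it generated?
From my debugging, the site does not seem to hide anything, so we can simply search for traceid directly.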
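The header comparison described above can also be done programmatically once both requests have been copied out of the developer tools. This is a rough sketch: `changed_headers` is a hypothetical helper, and the header values in the demo are placeholders rather than captured data.

```python
def changed_headers(first: dict, second: dict) -> dict:
    """Return {name: (old, new)} for headers whose values differ between two requests."""
    names = set(first) | set(second)
    return {
        name: (first.get(name), second.get(name))
        for name in sorted(names)
        if first.get(name) != second.get(name)
    }


if __name__ == "__main__":
    # Placeholder values, not captured from the site.
    page_1 = {"appfrom": "16", "cluster": "idc", "traceid": "aaaa"}
    page_2 = {"appfrom": "16", "cluster": "idc", "traceid": "bbbb"}
    print(changed_headers(page_1, page_2))  # {'traceid': ('aaaa', 'bbbb')}
```

The comparison is case-sensitive on header names, so both dictionaries should be copied in the same casing.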

The search leads to this spot:

Set a breakpoint, click the next page, step over, and print the value in the console. This is exactly what we are looking for.

Then step into the call to see how the value is built. As shown below, it is produced by a function named w:

function w() {
    for (var t = [], e = "0123456789abcdef", n = 0; n < 36; n++)
        t[n] = e.substr(Math.floor(16 * Math.random()), 1);
    t[14] = "4",
    t[19] = e.substr(3 & t[19] | 8, 1),
    t[8] = t[13] = t[18] = t[23] = "-";
    var i = t.join("");
    return i
}
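Read closely, w builds a 36-character string with hyphens at indexes 8, 13, 18 and 23, a fixed "4" at index 14, and one of "8", "9", "a" or "b" at index 19. That is the textual layout of a version-4 UUID. The sketch below checks this layout with a regular expression and shows that Python's `uuid.uuid4()` yields strings of the same shape. Whether the server accepts any well-formed value is an assumption that this article does not verify, and `looks_like_traceid` is a made-up helper name.

```python
import re
import uuid

# 8-4-4-4-12 lowercase hex digits, "4" at index 14, one of 8/9/a/b at index 19.
TRACEID_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)


def looks_like_traceid(value: str) -> bool:
    """Check whether a string has the layout that w() produces."""
    return TRACEID_PATTERN.match(value) is not None


if __name__ == "__main__":
    # The deviceid header value used later in this article has the same layout.
    print(looks_like_traceid("3e7c8fb2-6aa7-4979-a178-16f22a135dd0"))
    print(looks_like_traceid(str(uuid.uuid4())))
```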
Reproducing it in Python
import random
import time

import requests


def getW():
    """Port of the site's JavaScript w(): build a UUID-v4-shaped traceid."""
    chars = "0123456789abcdef"
    t = [random.choice(chars) for _ in range(36)]
    t[14] = "4"
    # JavaScript evaluates `3 & t[19] | 8` on a one-character string;
    # a non-numeric character ("a" to "f") is coerced to 0 by the bitwise AND.
    digit = int(t[19]) if t[19].isdigit() else 0
    t[19] = chars[(3 & digit) | 8]
    t[8] = t[13] = t[18] = t[23] = "-"
    return "".join(t)


def getData(page):
    url = "https://www.ly.com/tapi/v2/list"
    data = {
        "city": "53",
        "inDate": "2022-07-21",
        "outDate": "2022-07-22",
        "filterList": "8888_1",
        "pageIndex": str(page),
        "pageSize": "20",
        "sugActInfo": "",
        "traceToken": "|*|cityId:101|*|qId:60f5dd2a-47d4-426a-923a-658f0d156bf3|*|st:city|*|sId:101|*|scene_ids:0|*|bkt:r1|*|"
    }
    headers = {
        "Accept": "application/json, text/plain, */*",
        # "br" is left out: requests only decodes Brotli when an extra package is installed.
        "Accept-Encoding": "gzip, deflate",
        "appfrom": "16",
        "cluster": "idc",
        "Connection": "keep-alive",
        "Cookie": "firsttime=1654522087825; H5CookieId=3e7c8fb2-6aa7-4979-a178-16f22a135dd0; NewProvinceId=3; NCid=53; NewProvinceName=%E5%8C%97%E4%BA%AC; NCName=%E5%8C%97%E4%BA%AC; Hm_lvt_64941895c0a12a3bdeb5b07863a52466=1658303906; Hm_lpvt_64941895c0a12a3bdeb5b07863a52466=1658303906; 17uCNRefId=RefId=14211945&SEFrom=bing&SEKeyWords=; TicketSEInfo=RefId=14211945&SEFrom=bing&SEKeyWords=; CNSEInfo=RefId=14211945&tcbdkeyid=&SEFrom=bing&SEKeyWords=&RefUrl=https%3A%2F%2Fcn.bing.com%2F; qdid=35297|1|14211945|dd62ba; route=9e4269ab1c446976d6f19828bedd499a; __tctmc=144323752.205791637; __tctmd=144323752.254392154; __tctma=144323752.1654522083234181.1654522083536.1654522083536.1658303905033.2; __tctmb=144323752.592073071986482.1658303905033.1658303905033.1; __tctmu=144323752.0.0; __tctmz=144323752.1658303905033.2.1.utmccn=(referral)|utmcsr=bing.com|utmcct=|utmcmd=referral; longKey=1654522083234181; __tctrack=0; Hm_lvt_c6a93e2a75a5b1ef9fb5d4553a2226e5=1658303908; Hm_lpvt_c6a93e2a75a5b1ef9fb5d4553a2226e5=1658303908; businessLine=hotel; H5Channel=mnoreferseo%2CSEO; indate=2022-07-20; outdate=2022-07-21; lasttime=1658303912283; JSESSIONID=0FC5AA1FB2953B247344B7B588966116",
        "deviceid": "3e7c8fb2-6aa7-4979-a178-16f22a135dd0",
        "Host": "www.ly.com",
        "Referer": "https://www.ly.com/hotel/hotellist?city=53&inDate=2022-07-20&outDate=2022-07-21&filterList=8888_1&pageSize=20&t=1658303927222",
        "Tmapi-Client": "tpc",
        # A fresh traceid is generated for every request.
        "traceid": getW(),
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36 Edg/103.0.1264.62"
    }
    resq = requests.get(url, params=data, headers=headers)
    print(resq.json())


def run():
    # Fetch pages 1 to 4, pausing between requests.
    for page in range(1, 5):
        print(f"Fetching page {page}")
        getData(page)
        time.sleep(3)


if __name__ == '__main__':
    run()
Send the requests and fetch the first 4 pages:

Wow, that works nicely!
That's all. I hope this helps, and feel free to reach out with any questions.
