Anti-Crawler Attack and Defense
2022-08-02 10:11:00 【InfoQ】
1. Background
2. Current situation
- Regularity
- High frequency
- Frontal attack: A large volume of requests with crude disguise, concentrated on a single interface, winning through sheer quantity and variety.
- Espionage: Fewer but more regular requests. These crawlers try to pass as real users and mimic user behavior in order to pull more sensitive information from key interfaces.
- Data security: On an e-commerce platform, once key product and user information is crawled, the likely results are loss of goods, leaked user information, and even telecom fraud and other security problems.
- Skewed big-data statistics: Metrics such as DAU, PV, and UV all depend on the daily interface request logs. Once crawler requests are mixed into those logs alongside real users, the statistics lose their value.
- Service stability: Because of the first attack pattern above, crawlers send a large number of requests, and some come close to a flood attack. This greatly increases server load. If a new promotion launches at the same time and traffic surges, the system can go down.
- Personalization failure: The app derives the search keywords it suggests to ordinary users from what all users search for. If crawlers hit the search interface with a large volume of malicious keywords, the keywords suggested to normal users become less accurate, which hurts the user experience.
3. Prior work
- Login restriction: Require users to log in before they can request an interface. This substantially raises the crawler's cost, but it is heavy-handed and can easily hurt the user experience at key points in the flow.
- Cookie check: The cookie sent with a request can carry data that identifies the user. This data follows its own generation rules and supports validity checks, so verifying it can tell whether the request comes from a real user. Once a crawler cracks the generation rules, or sends requests with a real user's cookie, this method no longer defends effectively.
- Frequency check: Count request frequency by a feature common to every request (usually the IP, since IPs are the most expensive feature to obtain) and decide whether to ban that feature. Because crawlers are regular and high-frequency, this blocks a large share of crawler requests. It brings another problem, though: one IP is sometimes shared by several users, so real users may be banned by mistake. Dimensions other than the IP are cheap to forge.
- CAPTCHA check: When the same IP or user reaches a request threshold, require a CAPTCHA. There are many kinds, such as text, image, and slider. The slider image CAPTCHA works best because image recognition is expensive, but it needs front-end cooperation and also affects the user experience.
- Data encryption: The front end computes an encrypted value from the request data and sends it to the server as a parameter. The server runs the same logic to generate its own value and compares it with the request parameter, returning data only when the two match. This method also needs the client to take part, and because the algorithm is written in plaintext in the JS, a crawler can still work it out.
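4. Cleaner: the crawler cleaner
- Accuracy: Able to catch crawlers while avoiding collateral damage to real users.
- Real-time response: Respond within seconds. If the ban arrives only after the crawler has fetched all the data and left fully loaded, the ban is meaningless.
- Frontal attack: legitimacy check and frequency control.
- Espionage: user behavior analysis.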
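Frequency control with second-level response could look roughly like the sliding-window counter below. It is a minimal in-memory sketch, not the Cleaner implementation: the class name, the choice of a `(feature, interface)` key, and the limits are assumptions, and a production system would more likely keep the counters in a shared store.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold: candidate for a ban
        hits.append(now)
        return True


# Hypothetical usage: three requests per second per (user feature, interface).
limiter = SlidingWindowLimiter(limit=3, window_seconds=1.0)
key = ("device-123", "/search")
decisions = [limiter.allow(key, t) for t in (0.0, 0.1, 0.2, 0.3)]
```

Keying on the feature and the interface together lets the threshold be set per interface. Rejected requests are not recorded here, which is a design choice of the sketch.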
4.1 System model

4.1.1 Data processing center

4.1.2 Ban strategy center

- Legitimacy of the unique user identifier: A user's unique identifier is generated according to certain rules, and ours is no exception, so these rules can be used to judge whether a request comes from a real user. A large share of illegitimate requests need no further checks and can be blocked by the generation rules alone. This strategy mainly stops frontal attacks.
- Frequency: This strategy also mainly stops frontal attacks, and it covers the gap left by the legitimacy strategy. When a crawler sends a large number of requests under a real user's identity, its high frequency gives it away: set a frequency threshold for a specific interface, and when requests exceed the limit, ban the specified user feature.
- EL expressions: The two strategies above greatly weaken frontal attacks but are almost powerless against espionage. After repeated setbacks, a cunning crawler works out the anti-crawling strategy and the frequency thresholds. It then imitates real user behavior, gives up on grabbing a lot of information in a short time, and settles for collecting the complete data slowly. This is where EL expressions come in. Their core input is the list of requests made by a user who may be banned. Cleaner analyzes the behavior in that list and looks for deviations from what a normal user would request. Such a crawler no longer shows high frequency, but periodicity and regularity are its original sin and it can never escape them. Features such as the order in which different interfaces are requested and the frequency ratio between them indicate whether the user is illegitimate.
- Ban records: An auxiliary strategy. The user dimensions that triggered a ban are written to the database and serve as one indicator when deciding on future bans.
- Blacklist and whitelist: Specify features that are skipped or banned unconditionally. Entries can be added manually, as a safeguard against system failure or misbehavior.
- Integration with other ban libraries: An auxiliary strategy. Ban information from other business lines is combined to improve the verdict.
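The behavior analysis described for espionage-style crawlers can be illustrated with two simple signals computed over a user's request list: how regular the gaps between requests are, and how the counts of two interfaces compare. The source evaluates EL expressions for this; the sketch below substitutes plain Python predicates. The paths `/detail` and `/list`, every threshold, and all function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive requests.

    Values near 0 mean near-constant spacing, i.e. machine-like periodicity.
    Returns None when there are too few requests to judge.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        return None
    average = mean(gaps)
    if average == 0:
        return 0.0
    return pstdev(gaps) / average


def interface_ratio(requests, numerator, denominator):
    """Ratio between the request counts of two interfaces."""
    counts = Counter(path for _, path in requests)
    if counts[denominator] == 0:
        return float("inf") if counts[numerator] else 0.0
    return counts[numerator] / counts[denominator]


def looks_like_crawler(requests, min_requests=10, max_cv=0.1, max_ratio=20.0):
    """requests: list of (timestamp_seconds, path). Thresholds are assumptions."""
    if len(requests) < min_requests:
        return False  # not enough evidence; avoid banning real users
    cv = interval_cv(sorted(t for t, _ in requests))
    too_regular = cv is not None and cv < max_cv
    skewed = interface_ratio(requests, "/detail", "/list") > max_ratio
    return too_regular or skewed
```

A client that fetches `/detail` every five seconds without ever requesting `/list` trips both signals, while an irregular mix of list and detail requests does not. Real rules would need tuning against observed traffic to keep false positives low.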
4.1.3 Ban library
4.2 Results

5. Summary
- Basic properties: real-time response and accuracy.
- Basic functions: legitimacy check, frequency control, and user behavior analysis.
- Basic modules: big-data processing center and ban strategy center.
- Basic strategies: legitimacy check, frequency, EL expression, and blacklist/whitelist strategies.
- Two dynamic aspects: refining anti-crawling strategies and plugging them in or out; adjusting the scoring criteria.
- One balance: The contest between crawlers and anti-crawling is a long one, and the two sides eventually reach an equilibrium. Faced with crawlers that keep bouncing back, what we can do is monitor continuously and suppress them quickly.