What is hot data detection?
2022-06-24 16:36:00 【Programmer fish skin】
If data had to be sorted the way garbage is, which bin would hot data go in?
Hello everyone, I'm Fish Skin. Today I'd like to share a bit of technical knowledge.
As we all know, websites and applications cannot run without data, and for enterprises in particular, business data is their lifeblood.
But piling all the data together and processing it uniformly may not meet our requirements for performance and storage space. So we need to classify data to fit different business needs and application scenarios.
One way to classify data is to divide it into "hot data" and "cold data", and even "warm data"!
Just like garbage sorting ~
Let's talk about what hot data is first!
What is hot data?
As the name suggests, hot data is data that is very popular and frequently accessed.
For example, a news item on a trending list may receive thousands of visits per second.
Hot data can be divided into two categories according to how it arises:
- Expected: the data is known in advance to become popular, such as products endorsed by internet celebrities in a pre-announced sales promotion. The Double 11 shopping festival on a certain e-commerce platform is the best example.
- Unexpected: access to the data suddenly soars! The cause may be a malicious attack, a web crawler, or content that unexpectedly becomes popular. For example, when big news breaks and Weibo has not had time to prepare its defenses, the site may go down.
To cope with hot data, we usually use caching: the data is stored in memory ahead of time as K/V (key-value) pairs.
To read cached data, we look up the corresponding value by a key string.
A frequently accessed key is also called a hot key. Hot key is a broad concept that is not limited to caching systems. For example, all of the following are hot keys:
- A primary key that is frequently accessed in a database, such as the appId of a popular application
- A key that is frequently accessed in a K/V caching system
- Request information from malicious attacks or bot traffic, such as a user's userId or a machine's IP
- A frequently accessed interface address, such as /app/query of the app information service
- The frequency with which a single user accesses an interface, such as userId + /app/query
- The frequency with which a machine accesses an interface, such as IP + /app/query
- The frequency with which a user accesses specific content through an interface, such as userId + /app/query + appId
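The combined keys in the list above can be built by joining the dimensions you want to count into one string. Here is a minimal sketch in Python. The function name build_hot_key, the "user:" / "ip:" / "path:" / "app:" prefixes, and the "|" separator are all assumptions made for illustration, not part of any particular framework.

```python
from typing import Optional


def build_hot_key(
    path: Optional[str] = None,
    user_id: Optional[str] = None,
    ip: Optional[str] = None,
    app_id: Optional[str] = None,
) -> str:
    """Join the dimensions that are present into one string key for counting.

    Hypothetical helper: the prefixes and the separator are illustrative only.
    """
    parts = []
    if user_id is not None:
        parts.append(f"user:{user_id}")
    if ip is not None:
        parts.append(f"ip:{ip}")
    if path is not None:
        parts.append(f"path:{path}")
    if app_id is not None:
        parts.append(f"app:{app_id}")
    if not parts:
        raise ValueError("at least one dimension is required")
    return "|".join(parts)


# userId + /app/query + appId, as in the last item of the list
print(build_hot_key(path="/app/query", user_id="42", app_id="1001"))
```

Whatever format you choose, the point is that every access that should be counted together maps to the same string.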
Now that we know what hot data is, let's talk about hot data detection, that is, the technique of "finding the hot data".
Why detect hot data?
The reasons for detecting hot data are simple:
1. Improve performance
With a distributed cache, reads still require network communication, which adds time overhead. If hot data can be cached locally in advance, which is known as preheating, reads on that machine become much faster and the pressure on the underlying cache cluster is reduced.
Of course, this does not mean all data should be stored locally. The more cache levels there are, the more complex updates become and the greater the risk of data inconsistency!
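To make the idea concrete, here is a rough sketch of a local cache sitting in front of a distributed cache, where only keys marked as hot are kept locally. The class LocalHotCache, its method names, and the fetch_remote callback (a stand-in for a call to something like a Redis cluster) are hypothetical. The short TTL is my own assumption, one simple way to limit how long a stale local copy can survive.

```python
import time
from typing import Callable, Dict, Set, Tuple


class LocalHotCache:
    """Keeps values of hot keys in process memory; other keys always go remote."""

    def __init__(
        self,
        fetch_remote: Callable[[str], str],  # stand-in for the distributed cache
        ttl_seconds: float = 5.0,            # assumed short TTL to limit staleness
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_remote = fetch_remote
        self._ttl = ttl_seconds
        self._clock = clock
        self._hot_keys: Set[str] = set()
        self._local: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def mark_hot(self, key: str) -> None:
        """Called when the detection mechanism reports that a key is hot."""
        self._hot_keys.add(key)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._local.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # local hit, no network round trip
        value = self._fetch_remote(key)
        if key in self._hot_keys:
            self._local[key] = (now + self._ttl, value)
        return value
```

Only hot keys pay the cost of an extra cache level, so the inconsistency risk mentioned above stays confined to a small set of keys.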
2. Avoid risk
Unexpected hot data (hot keys) may bring great risk to the business. The risk falls on two layers:
Risks to the data layer
Under normal circumstances, a single Redis instance can support about 100,000 QPS (queries per second), and concurrency can be increased with a cluster. For systems with ordinary concurrency, a Redis cache is enough. But if a product suddenly becomes a hit, or malicious requests arrive, the QPS for that one key may soar to millions or tens of millions! Under the single-threaded working mode of older Redis versions, normal requests then queue up and cannot be answered in time, and in severe cases the entire sharded cluster is paralyzed.
There is another situation: when a hot key suddenly expires, a large number of requests hit the fragile database directly and can bring it down!
Risks to application services
Each application can accept and process only a limited number of requests per unit of time. If it is attacked with malicious requests and a malicious user alone occupies a large share of request-processing resources, other normal, harmless users will not get timely responses.
Therefore, we need a dynamic hot key detection mechanism that discovers unexpected hot data as soon as it appears and handles it specially, for example with local caching, rejecting malicious users, or interface rate limiting / degradation. This avoids possible risks while improving data access performance.
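As a rough illustration of "special handling", the sketch below picks an action based on what kind of hot key was detected. Everything here is an assumption for the sake of the example: the Action enum, the choose_action function, and the convention that hot keys carry a "user:", "ip:" or "path:" prefix. A real system would drive this from configurable rules.

```python
from enum import Enum


class Action(Enum):
    CACHE_LOCALLY = "cache_locally"  # hot data key: serve it from local memory
    DENY = "deny"                    # hot user or IP: reject the requests
    RATE_LIMIT = "rate_limit"        # hot interface: limit or degrade it


def choose_action(hot_key: str) -> Action:
    """Map a detected hot key to a handling strategy (hypothetical convention)."""
    if hot_key.startswith(("user:", "ip:")):
        return Action.DENY
    if hot_key.startswith("path:"):
        return Action.RATE_LIMIT
    return Action.CACHE_LOCALLY
```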
So how do we detect hot data?
How to detect hot data?
First, we need to define a threshold or rule for "hot": how hot is hot?
It can be set from experience, or from the average heat of the system's data. For example, data accessed 1,000 times within 1 second counts as hot data.
For standalone applications, detecting hot data is simple: create a sliding window counter locally for each key, count the total number of accesses per unit of time (the frequency), and store the detected hot keys in a collection.
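Here is a minimal single-process sketch of that idea. It keeps the timestamps of recent accesses for each key in a deque, which is one simple way to implement a sliding window (bucketed counters are a common alternative that uses less memory). The class name SlidingWindowDetector and its parameters are hypothetical, and the defaults follow the "1,000 accesses within 1 second" example.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set


class SlidingWindowDetector:
    """Counts accesses per key over a sliding time window on one machine."""

    def __init__(
        self,
        threshold: int = 1000,
        window_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = threshold
        self._window = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.hot_keys: Set[str] = set()  # collection of detected hot keys

    def record(self, key: str) -> bool:
        """Record one access; return True if the key is hot in the current window."""
        now = self._clock()
        hits = self._hits[key]
        hits.append(now)
        cutoff = now - self._window
        # Drop accesses that have slid out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) >= self._threshold:
            self.hot_keys.add(key)
            return True
        return False
```

Note that this sketch never removes a key from hot_keys once it has cooled down. Expiring hot keys is left out to keep the example short.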
For distributed applications, accesses to a hot key are spread across different machines and cannot be counted independently on each one. Therefore, an independent, centralized hot key computing unit is needed.
Thus hot data detection can be divided into four steps: rule configuration, hot key reporting, hot key statistics, and hot key push:
- Rule configuration: specify the reporting conditions for hot keys and select the keys that need to be monitored
- Hot key reporting: each machine reports the access counts of its own keys to the centralized computing unit
- Hot key statistics: collect the information reported by each application instance and use a sliding window algorithm to calculate each key's heat
- Hot key push: when a key's heat reaches the configured threshold, push the hot key information to all application instances, and each instance caches the key's value locally.
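The four steps can be sketched in a single process as follows. This is only a toy model under stated assumptions: the class HotKeyWorker, the prefix-based rule, and the callback-style push are all hypothetical, and a real deployment would put a network protocol between the application instances and the computing unit.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Set, Tuple


class HotKeyWorker:
    """Toy centralized computing unit: rule, report, statistics, push."""

    def __init__(
        self,
        rule_prefix: str,           # step 1: only keys with this prefix are monitored
        threshold: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rule_prefix = rule_prefix
        self._threshold = threshold
        self._window = window_seconds
        self._clock = clock
        self._reports: Dict[str, Deque[Tuple[float, int]]] = defaultdict(deque)
        self._totals: Dict[str, int] = defaultdict(int)
        self._subscribers: List[Callable[[str], None]] = []
        self._pushed: Set[str] = set()

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """An application instance registers to receive hot key pushes."""
        self._subscribers.append(callback)

    def report(self, counts: Dict[str, int]) -> None:
        """Step 2: an instance reports how often it saw each key recently."""
        now = self._clock()
        for key, count in counts.items():
            if not key.startswith(self._rule_prefix):
                continue  # not selected by the rule
            # Step 3: sliding window over the reports from all instances.
            window = self._reports[key]
            window.append((now, count))
            self._totals[key] += count
            while window and window[0][0] <= now - self._window:
                _, old_count = window.popleft()
                self._totals[key] -= old_count
            # Step 4: push each hot key once to every subscribed instance.
            if self._totals[key] >= self._threshold and key not in self._pushed:
                self._pushed.add(key)
                for notify in self._subscribers:
                    notify(key)
```

On the receiving side, each instance's callback would typically load the key's value and store it in its local cache.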
After these steps, a basic hot key detection mechanism is complete. However, hot data detection systems often face complex business scenarios, and there are other issues to consider, such as handling key invalidation.
To handle high-concurrency scenarios, the design of a hot key detection framework should also focus on the following indicators:
- Real-time: because hot keys appear suddenly (possibly within 1 millisecond), the framework must detect and push hot keys in real time
- High performance: the framework should stay lightweight and fast, which effectively reduces cost
- Accuracy: detect exactly the hot keys that match the rules, with no misses and no false positives
- Consistency: ensure that application instances and their local caches agree on the hot keys, so that no data errors occur
- Scalability: when the number of keys to count is very large, the centralized computing cluster can be scaled horizontally
In addition, an excellent hot key detection framework should also be easy to integrate, non-intrusive to business code, dynamically configurable, and support hot updates of rules and visual management.
Finally, readers who want to learn more can look at JD's open-source hot key detection framework JD-hotkey and Youzan's open-source TMC. Both are very cleverly designed.
I have written analyses of these two frameworks before and will tidy them up when I get the chance.