
Redis cache avalanche, cache penetration and cache breakdown: differences and solutions

2022-06-25 15:32:00 [ITenderL]


Cache avalanche

What is a cache avalanche?

A cache avalanche occurs when a large part of the cache becomes invalid at the same time, so all subsequent requests fall through to the database. The database then has to absorb a huge number of requests in a short period and may collapse. Like an avalanche, once it starts it is unstoppable.

For example: 12 hours before a flash sale starts, we load a large number of products into Redis and set their cache expiration time to 12 hours as well. When the flash sale starts, the cached entries for all of these products expire at the same moment. As a result, the corresponding requests go straight to the database, just like an avalanche.

Solution

For hot cache data expiring at the same time:

Set randomized expiration times on cached data so that a large number of keys do not expire at the same time.
Set hot cache data to never expire.
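A minimal sketch of randomized expiration in Python. The helper name `jittered_ttl` and the 10-minute jitter window used below are illustrative assumptions, not values from the text; the only idea shown is that keys written at the same moment receive slightly different TTLs.

```python
import random
from typing import Optional


def jittered_ttl(base_seconds: int, max_jitter_seconds: int,
                 rng: Optional[random.Random] = None) -> int:
    """Hypothetical helper: base TTL plus a random offset in [0, max_jitter_seconds].

    Keys cached together (e.g. all flash-sale products) then expire
    spread over a window instead of at one instant.
    """
    rng = rng or random.Random()
    return base_seconds + rng.randint(0, max_jitter_seconds)
```

With the redis-py client, the result would be passed as the `ex` argument, e.g. `r.set(key, value, ex=jittered_ttl(12 * 3600, 600))`, so the products from the flash-sale example expire over a ten-minute window rather than all at once.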

For the cache service becoming unavailable:

Deploy Redis as a cluster, so that the failure of a single machine cannot make the entire cache service unavailable.
Apply rate limiting, so that a flood of requests is not processed all at the same time.
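Rate limiting can be implemented in several ways; one common choice is a token bucket. The single-process sketch below is an illustration under our own assumptions (the class name, parameters and injectable clock are hypothetical); a limiter shared by several application servers would additionally need shared state, which this sketch does not have.

```python
import time
from typing import Callable


class TokenBucket:
    """Minimal token-bucket limiter (illustrative sketch, single process).

    Tokens refill at `rate` per second up to `capacity`. Each admitted
    request consumes one token; when the bucket is empty, requests are rejected.
    """

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject or queue the request
```

Requests for which `allow()` returns False would be rejected (or served a fallback) instead of being forwarded to the database.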

Cache penetration

What is cache penetration?

Cache penetration refers to requests for data that exists neither in the cache nor in the database. Since the cache can never be filled for such data, every request falls through to the database, which may collapse under a large number of requests in a short time.

For example: an attacker deliberately sends a large number of requests with keys that do not exist in our cache, causing a flood of requests to hit the database.

Solution

Add validation at the interface layer, such as user authentication and basic checks on the id; for example, requests with id < 0 are rejected directly.
Cache invalid keys: if the data can be found neither in the cache nor in the database, still write a placeholder to the cache with a short expiration time, for example with SET key null EX 30.
Use a Bloom filter: hash all data that may exist into a sufficiently large bitmap. Data that definitely does not exist is then filtered out by the bitmap, so those queries never put pressure on the underlying storage system.
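The first two measures (interface-layer validation and caching empty results) can be sketched together as a cache-aside lookup. Everything here is hypothetical: `TTLCache` is an in-memory stand-in for Redis, a plain dict stands in for the database, `NULL_MARKER` is an arbitrary placeholder string and the one-hour TTL for real data is made up; only the 30-second TTL for misses follows the text. The Bloom filter option is sketched separately further on.

```python
import time
from typing import Dict, Optional, Tuple

NULL_MARKER = "<null>"    # hypothetical placeholder meaning "known not to exist"
NULL_TTL_SECONDS = 30     # short TTL for misses, as in the text
VALUE_TTL_SECONDS = 3600  # hypothetical TTL for real data


class TTLCache:
    """In-memory stand-in for Redis: key -> (value, expiry time)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        return value

    def set(self, key: str, value: str, ex: int) -> None:
        self._data[key] = (value, time.monotonic() + ex)


def get_product(product_id: int, cache: TTLCache, db: Dict[int, str],
                stats: Dict[str, int]) -> Optional[str]:
    # 1. Interface-layer check: reject obviously invalid ids before touching storage.
    if product_id < 0:
        return None
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        # A cached NULL_MARKER means "we already know this id does not exist".
        return None if cached == NULL_MARKER else cached
    # 2. Cache miss: fall back to the database (counted so the effect is visible).
    stats["db_queries"] = stats.get("db_queries", 0) + 1
    value = db.get(product_id)
    if value is None:
        # 3. Cache the miss briefly so repeated requests stop reaching the DB.
        cache.set(key, NULL_MARKER, ex=NULL_TTL_SECONDS)
        return None
    cache.set(key, value, ex=VALUE_TTL_SECONDS)
    return value
```

Requesting a missing id a hundred times in a row queries the database only once; for the next 30 seconds the placeholder answers instead.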

The Bloom filter was proposed by Burton Howard Bloom in 1970. We can think of it as a data structure made of two parts: a binary vector (a bit array) and a series of random mapping functions (hash functions). Compared with the List, Map, Set and other data structures we commonly use, it takes up less space and is more efficient, but its drawback is that its results are probabilistic rather than exact. In theory, the more elements are added to the set, the more likely false positives become. In addition, data stored in a Bloom filter is not easy to delete.


Each element of the bit array takes only 1 bit and can only be 0 or 1. A bit array with 1,000,000 (1 million) elements therefore takes only 1,000,000 bits / 8 = 125,000 bytes = 125,000 / 1024 KB ≈ 122 KB of space.
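The memory figure above can be checked with a few lines of Python (the function name is only for illustration):

```python
from typing import Tuple


def bit_array_memory(n_bits: int) -> Tuple[int, float]:
    """Return (bytes, kilobytes) needed to store n_bits elements at 1 bit each."""
    n_bytes = (n_bits + 7) // 8  # round up to whole bytes
    return n_bytes, n_bytes / 1024


print(bit_array_memory(1_000_000))  # (125000, 122.0703125)
```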

How a Bloom filter works:

When an element is added to the Bloom filter, the following happens:

The element is passed through each of the Bloom filter's hash functions, producing one hash value per hash function.
For each hash value, the bit at the corresponding index of the bit array is set to 1.

When we need to determine whether an element exists in the Bloom filter, the following happens:

The same hash calculations are performed on the given element again.
For each resulting value, check whether the corresponding bit of the bit array is 1. If all of them are 1, the element is considered to be in the Bloom filter; if any of them is not 1, the element is not in the Bloom filter.
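The add and query procedures can be sketched as a small Python class. This is an illustrative implementation under our own assumptions: the k hash functions are simulated by double hashing over a single SHA-256 digest, `h_i(x) = (h1(x) + i * h2(x)) mod m`, which is a common technique but not something the text prescribes; the class and method names are hypothetical.

```python
import hashlib
from typing import List


class BloomFilter:
    """Minimal Bloom filter sketch: an m-bit array plus k hash functions.

    The k hash functions are simulated with double hashing,
    h_i(x) = (h1(x) + i * h2(x)) mod m, with h1 and h2 taken from one SHA-256 digest.
    """

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str) -> List[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, so h2 is never 0
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Adding: compute the k positions and set each of those bits to 1.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # Querying: the item may be present only if all k bits are 1;
        # a single 0 bit proves it was never added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Every added item is reported as present. An item that was never added is usually reported absent, but it may occasionally land on bits that other items already set to 1, which is a false positive.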


As shown in the figure, when a string is added to the Bloom filter, several hash functions first produce different hash values for it, and the bits at the corresponding indices of the bit array are then set to 1 (when the bit array is initialized, all positions are 0). When the same string is stored a second time, the corresponding positions have already been set to 1, so it is easy to tell that the value already exists (which makes deduplication very convenient).

To determine whether a string is in the Bloom filter, we apply the same hash functions to it again and check whether every corresponding bit in the bit array is 1. If all of them are 1, the value is considered to be in the Bloom filter; if any of them is not 1, the string is not in the Bloom filter.

Different strings may hash to the same positions (a collision). In that case we can increase the size of the bit array or adjust our hash functions.
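How much a larger bit array helps can be estimated with the standard approximation of a Bloom filter's false positive rate, p ≈ (1 - e^(-kn/m))^k, where m is the number of bits, n the number of inserted elements and k the number of hash functions; the k that minimizes p is about (m/n)·ln 2. The sketch below only evaluates these formulas (function names and the sample sizes are illustrative), and it also reflects the earlier point that adding more elements raises the false positive rate.

```python
import math


def false_positive_rate(m_bits: int, n_items: int, k_hashes: int) -> float:
    """Approximate false positive rate p ≈ (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def optimal_k(m_bits: int, n_items: int) -> int:
    """Hash-function count that minimises p: about (m/n) * ln 2, at least 1."""
    return max(1, round(m_bits / n_items * math.log(2)))


n = 100_000  # illustrative number of stored elements
for m in (1_000_000, 2_000_000, 4_000_000):
    k = optimal_k(m, n)
    print(f"m={m:>9,} bits  k={k:>2}  p~{false_positive_rate(m, n, k):.6f}")
```

Doubling m (and re-choosing k) lowers the estimate sharply, which matches the advice above to enlarge the bit array when collisions become a problem.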

In summary: when a Bloom filter says an element exists, there is a small probability that it is wrong; when it says an element does not exist, the element is definitely absent.

Use cases for Bloom filters

Checking whether given data exists: for example, checking whether a number is in a huge set of numbers (more than 500 million!), preventing cache penetration (checking whether the requested data can exist, so that requests cannot bypass the cache and hit the database directly), spam email filtering, blacklist features and so on.
Deduplication: for example, when crawling a given set of URLs, skipping URLs that have already been crawled.
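For the crawler example, a hypothetical `UrlSeenFilter` could combine the Bloom filter check and insert into a single call. This sketch derives its k hash functions by salting SHA-256 with the index (a different but equally valid way of getting k hash functions), and the default sizes are arbitrary. Because of false positives, a few never-seen URLs may occasionally be skipped; whether a crawler can tolerate that is a design decision.

```python
import hashlib
from typing import Iterable, List


class UrlSeenFilter:
    """Hypothetical crawler helper: a Bloom filter over already-crawled URLs.

    k hash functions are derived by salting SHA-256 with the index i.
    """

    def __init__(self, m_bits: int = 1 << 20, k_hashes: int = 5) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, url: str) -> List[int]:
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def check_and_add(self, url: str) -> bool:
        """Return True if url was (probably) seen before; record it in either case."""
        seen = True
        for pos in self._positions(url):
            byte_index, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte_index] & mask:
                seen = False  # at least one bit was 0, so this URL is new
                self.bits[byte_index] |= mask
        return seen


def dedupe(urls: Iterable[str], seen: UrlSeenFilter) -> List[str]:
    """Keep URLs in order, dropping ones the filter has (probably) seen already."""
    return [u for u in urls if not seen.check_and_add(u)]
```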

Cache breakdown

What is cache breakdown?

Some keys with an expiration time may, at certain points in time, be accessed with extremely high concurrency; they are very "hot" data. In this case we need to consider the problem of the cache being "broken down". The difference from a cache avalanche is that breakdown concerns the cache of a single key, whereas an avalanche involves many keys.

For example: the cache entry for a key expires at some point in time, and right at that moment a large number of concurrent requests for that key arrive. These requests all find the cache expired, load the data from the backend DB and write it back to the cache, so this burst of concurrent requests may instantly overwhelm the backend DB.
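The scenario can be reproduced with threads. The sketch below (all names hypothetical) releases 50 concurrent requests at the moment a hot key has expired and counts how many of them reach a simulated slow database. It also shows one commonly used mitigation that the text above does not describe: only one request rebuilds the entry behind a mutex, while the others wait and then reuse the rebuilt value. The lock here is in-process only; across several servers the same idea would need a distributed lock, for example one built on Redis SET key value NX EX seconds.

```python
import threading
import time
from typing import Callable, Optional


class HotKeyCache:
    """Toy cache holding a single hot key; value None means expired/missing."""

    def __init__(self, load_from_db: Callable[[], str], use_mutex: bool) -> None:
        self.value: Optional[str] = None
        self.load_from_db = load_from_db
        self.use_mutex = use_mutex
        self.lock = threading.Lock()

    def get(self) -> str:
        value = self.value
        if value is not None:
            return value  # cache hit
        if not self.use_mutex:
            # Naive rebuild: every request that sees the miss queries the DB.
            value = self.load_from_db()
            self.value = value
            return value
        with self.lock:
            # Double-check: another request may have rebuilt the entry while we waited.
            if self.value is None:
                self.value = self.load_from_db()
            return self.value


def simulate(use_mutex: bool, concurrency: int = 50) -> int:
    """Fire `concurrency` simultaneous requests at an expired hot key; return DB call count."""
    db_calls = 0
    counter_lock = threading.Lock()

    def load_from_db() -> str:
        nonlocal db_calls
        with counter_lock:
            db_calls += 1
        time.sleep(0.05)  # pretend the DB query is slow
        return "hot-value"

    cache = HotKeyCache(load_from_db, use_mutex)
    barrier = threading.Barrier(concurrency)

    def request() -> None:
        barrier.wait()  # release all requests at the same instant
        cache.get()

    threads = [threading.Thread(target=request) for _ in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db_calls
```

`simulate(use_mutex=False)` typically reports many database calls for a single expired key, while `simulate(use_mutex=True)` reports exactly one.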