
Differences between Redis cache avalanche, cache penetration, and cache breakdown, and their solutions

2022-06-25 15:32:00 ITenderL

Cache avalanche, cache penetration, and cache breakdown: differences and solutions

Cache avalanche

What is a cache avalanche

A cache avalanche occurs when a large portion of the cache becomes invalid at the same time, so that subsequent requests all fall through to the database. The database has to absorb a huge number of requests in a short time and may collapse, like an avalanche sweeping everything before it.

For example: 12 hours before a flash sale starts, we load a large batch of products into Redis with a cache expiration time of 12 hours. When the flash sale starts, the cache entries for these products all expire at once. As a result, the corresponding requests go straight to the database, like an avalanche.

Solution

When hot cache data expires

  1. Randomize the expiration time of cached data so that large amounts of data do not expire at the same moment.
  2. Set the cache to never expire.
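Randomized expiration can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the 12-hour base comes from the example above, while the 10-minute jitter window and the helper name `jittered_ttl` are assumptions chosen for the sketch.

```python
import random

BASE_TTL_SECONDS = 12 * 60 * 60   # the 12-hour TTL from the example above
MAX_JITTER_SECONDS = 10 * 60      # assumed spread; tune it for your workload


def jittered_ttl(base=BASE_TTL_SECONDS, max_jitter=MAX_JITTER_SECONDS):
    """Return the base TTL plus a random offset in [0, max_jitter] seconds."""
    return base + random.randint(0, max_jitter)


# With a real client the result would be passed as the expiry, for example
# with redis-py: r.set(key, value, ex=jittered_ttl())
sample_ttls = [jittered_ttl() for _ in range(5)]
```

Because each key gets a slightly different TTL, keys written in the same batch expire spread over the jitter window instead of all at once.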

When the cache service is unavailable

  1. Deploy Redis as a cluster, so that a single-machine failure does not make the entire cache service unavailable.
  2. Apply rate limiting, so that the system does not process a large number of requests at the same time.
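The article does not say which rate-limiting algorithm to use. One common choice is a token bucket, sketched below under that assumption. The class name, parameters, and injectable `clock` are illustrative, and the sketch is not thread-safe as written.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller should reject or queue the request
```

Requests for which `allow()` returns `False` are rejected before they reach the cache or the database, which caps the load the database can receive even if the cache is down.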

Cache penetration

What is cache penetration?

Cache penetration refers to requests for data that exists neither in the cache nor in the database. Every such request falls through to the database, which may have to absorb a large number of requests in a short time and collapse.

For example: an attacker deliberately sends a large number of requests for keys that do not exist in our cache, so a large number of requests fall through to the database.

Solution

  1. Add validation at the interface layer, such as user authentication and basic checks on the id; for example, reject requests with id < 0 outright.
  2. Cache invalid keys: if the data cannot be found in the cache or in the database, write a placeholder entry such as SET key null EX 30 to the cache, that is, a null value with a short expiration time.
  3. Use a Bloom filter: hash all data that may exist into a sufficiently large bitmap. A request for data that definitely does not exist is filtered out by the bitmap, which takes the query pressure off the underlying storage system.
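The first two measures can be sketched together. In this sketch a plain dictionary stands in for Redis and another for the database; `NULL_MARKER`, `get_user`, the key format, and the one-hour TTL for real values are all hypothetical names and values chosen for illustration.

```python
import time

NULL_MARKER = "__null__"    # hypothetical sentinel meaning "known to be absent"
NULL_TTL_SECONDS = 30       # short TTL for cached misses, as in the text
DATA_TTL_SECONDS = 3600     # assumed TTL for real values

cache = {}                           # key -> (value, expires_at); stand-in for Redis
database = {1: "alice", 2: "bob"}    # stand-in for the real database
stats = {"db_queries": 0}


def cache_get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() >= expires_at:
        del cache[key]               # expired: behave as if it was never cached
        return None
    return value


def cache_set(key, value, ttl):
    cache[key] = (value, time.monotonic() + ttl)


def get_user(user_id):
    # Measure 1: basic validation, reject IDs that can never exist.
    if not isinstance(user_id, int) or user_id < 0:
        return None
    key = f"user:{user_id}"
    cached = cache_get(key)
    if cached is not None:
        # Measure 2: a cached miss is answered without touching the database.
        return None if cached == NULL_MARKER else cached
    stats["db_queries"] += 1
    value = database.get(user_id)
    if value is None:
        cache_set(key, NULL_MARKER, NULL_TTL_SECONDS)
    else:
        cache_set(key, value, DATA_TTL_SECONDS)
    return value
```

Repeated lookups of the same nonexistent id reach the database only once per 30-second window. The short TTL matters: if the record is created later, the stale "absent" marker disappears quickly.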

The Bloom filter was proposed in 1970 by a man named Bloom. We can think of it as a data structure made of two parts: a binary vector (or bit array) and a series of random mapping functions (hash functions). Compared with the List, Map, Set, and other data structures we usually use, it takes up less space and is more efficient, but its results are probabilistic rather than exact. In theory, the more elements are added to the set, the more likely a false positive becomes. In addition, data stored in a Bloom filter is not easy to delete.


Each element of the bit array occupies only 1 bit, and each element can only be 0 or 1. A bit array with 1,000,000 elements takes only 1,000,000 bits / 8 = 125,000 bytes = 125,000 / 1024 KB ≈ 122 KB of space.
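The arithmetic above can be checked with a small bit array built on Python's `bytearray`, which stores 8 bits per byte. The `BitArray` class is an illustrative helper, not part of any library.

```python
class BitArray:
    """Fixed-size bit array backed by a bytearray (8 bits per byte)."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray((size + 7) // 8)   # round up to whole bytes

    def set(self, i):
        self.data[i // 8] |= 1 << (i % 8)

    def get(self, i):
        return (self.data[i // 8] >> (i % 8)) & 1


bits = BitArray(1_000_000)
print(len(bits.data))   # 125000 bytes, about 122 KB
```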

How a Bloom filter works

When an element is added to the Bloom filter, the following happens:

  1. The hash functions of the Bloom filter are applied to the element to obtain hash values (one hash value per hash function).
  2. For each hash value, the bit at the corresponding index of the bit array is set to 1.

When we need to determine whether an element exists in the Bloom filter, the following happens:

  1. The same hash calculations are performed on the given element.
  2. Each corresponding position in the bit array is checked. If all of them are 1, the element is considered to be in the Bloom filter; if any of them is not 1, the element is not in the Bloom filter.
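The add and lookup steps above can be sketched as follows. This is a minimal illustration, not a production filter: deriving the k hash functions by salting SHA-256 with an index is an assumption of this sketch, and real implementations usually prefer faster non-cryptographic hashes.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits, num_hashes):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)   # all positions start at 0

    def _positions(self, item):
        # Derive one position per hash function by salting SHA-256 with its index.
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)      # set the bit to 1

    def __contains__(self, item):
        # True only if every position is 1; a single 0 proves absence.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(item))
```

Usage follows the text directly: call `bf.add("store")` to insert, then `"store" in bf` to query. A `False` answer is always correct; a `True` answer may occasionally be a false positive.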

[Figure: a string hashed by several hash functions into positions of the bit array]

As shown in the figure, when a string is to be added to the Bloom filter, several hash functions first produce different hash values for it, and the bit array elements at the corresponding indices are set to 1 (when the bit array is initialized, every position is 0). When the same string is added a second time, the corresponding positions are already 1, so it is easy to tell that the value already exists (which makes deduplication very convenient).

To determine whether a string is in the Bloom filter, we perform the same hash calculations on it again and check the corresponding positions in the bit array. If all of them are 1, the value is considered to be in the Bloom filter; if any of them is not 1, the element is not in the Bloom filter.

Different strings may hash to the same positions. To reduce such collisions, we can increase the size of the bit array or adjust the hash functions.
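The effect of the bit array size can be estimated with the standard approximation for a Bloom filter's false-positive probability, (1 - e^(-kn/m))^k, where m is the number of bits, k the number of hash functions, and n the number of inserted elements. The function below is a small sketch of that formula; the parameter names are this sketch's own.

```python
import math


def false_positive_rate(m_bits, k_hashes, n_items):
    """Standard approximation: (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


# Doubling the bit array lowers the estimate for the same data and hash count.
small = false_positive_rate(1_000_000, 3, 100_000)
large = false_positive_rate(2_000_000, 3, 100_000)
```

The same formula also shows why adding more elements to a fixed-size filter raises the false-positive rate: n appears in the exponent, so the estimate grows as n grows.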

To sum up: when a Bloom filter says an element exists, there is a small probability that it is wrong. When a Bloom filter says an element does not exist, the element is definitely not there.

Use scenarios of the Bloom filter

  1. Determining whether a given item exists: for example, checking whether a number is in a very large set of numbers (more than 500 million!), preventing cache penetration (checking whether the requested data is valid so that requests cannot bypass the cache and hit the database directly), spam email filtering, blacklists, and so on.
  2. Deduplication: for example, when crawling URLs, filtering out the URLs that have already been crawled.

Cache breakdown

What is cache breakdown

Some keys have an expiration time set and may be accessed with extremely high concurrency at certain moments; this is very "hot" data. In this case we need to consider the problem of the cache being "broken down". The difference from a cache avalanche is that cache breakdown concerns a single key, whereas an avalanche involves many keys.

For example: at the moment the cache entry for a key expires, a large number of concurrent requests arrive for that key. Each request finds that the cache has expired, loads the data from the back-end DB, and writes it back to the cache. This burst of concurrent requests can overwhelm the back-end DB in an instant.

Solution

  1. Never let hot data expire.
  2. Use a mutex, so that only one request rebuilds the cache entry while the others wait.
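The mutex approach can be sketched with an in-process lock. This is a single-process illustration under stated assumptions: a dictionary stands in for Redis, `load_from_db` is a hypothetical slow loader, and `get_hot_key` is an illustrative name. Across several application servers the lock would need to be distributed, for example one built on the Redis `SET` command with the `NX` and `EX` options.

```python
import threading
import time

cache = {}                      # stand-in for Redis
cache_lock = threading.Lock()
stats = {"db_loads": 0}


def load_from_db(key):
    stats["db_loads"] += 1
    time.sleep(0.05)            # simulate a slow database query
    return f"value-for-{key}"


def get_hot_key(key):
    value = cache.get(key)
    if value is not None:
        return value            # fast path: cache hit, no locking
    with cache_lock:
        # Re-check: another thread may have rebuilt the entry while we waited.
        value = cache.get(key)
        if value is None:
            value = load_from_db(key)
            cache[key] = value
        return value
```

When many threads miss at the same moment, only the first one to take the lock queries the database; the rest find the rebuilt entry on the second check. A single global lock keeps the sketch short; a lock per key would avoid making unrelated keys wait on each other.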

Copyright notice
This article was written by [ITenderL]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202200502168181.html