
Cache design issues

2022-06-21 19:03:00 Xiao Lin also wants a dragon maid

This article is excerpted and summarized from the Geek Time course 《Java Common Mistakes in Business Development: 100 Examples》.

   Before we talk about cache design, I wonder whether you have run into these problems in your day-to-day development:

  1. The hot Key back-to-source problem. If a Key is extremely hot, the caching system may not be able to keep up, because all accesses converge on a single cache server. Do you know a way to spread the query pressure for such a hot Key across multiple Redis nodes?
  2. The big Key problem. If a Key's Value is particularly large, it can severely hurt Redis performance: since Redis uses a single-threaded model, querying or deleting a big Key may block Redis or even trigger a high-availability failover. Do you know how to find big Keys in Redis, and how to split them?

   I will answer these questions at the end. First, let's look at cache data synchronization, avalanche, breakdown (concurrency), penetration, and other common issues.

Don't use Redis as a database

   Usually, we use a distributed in-memory store such as Redis to cache data, but don't use Redis as a database. I have seen many cases where data disappearing from Redis caused business logic errors, and because the original data was not preserved anywhere else, the business could not recover.
   Although Redis has persistence features, that doesn't change its essence: it is a memory-based KV database. So when using Redis as a cache, two things need special attention:

  • First, from the client's point of view, cached data must have a copy in the original data source. Even if a Key's TTL is set to 1 minute, we must accept that it may disappear after 30 seconds. When cached data disappears, we must be able to reload it from the source, and we must never assume that a Key without a TTL will not be deleted.
  • Second, from the Redis server's point of view, the cache can necessarily hold less data than the original source. We must limit Redis's memory usage and configure an appropriate eviction policy; this is very important!

   As the Redis documentation shows, the commonly used eviction policies are:

  1. allkeys-lru: across all Keys, preferentially evict the least recently used;
  2. volatile-lru: among Keys with a TTL, preferentially evict the least recently used;
  3. volatile-ttl: among Keys with a TTL, preferentially evict those closest to expiring (by TTL value);
  4. allkeys-lfu (Redis 4.0 and above): across all Keys, preferentially evict the least frequently used;
  5. volatile-lfu (Redis 4.0 and above): among Keys with a TTL, preferentially evict the least frequently used.

   In fact, these policies are combinations of a Key scope and an eviction algorithm: the scopes are allkeys and volatile, and the algorithms are LRU, TTL, and LFU. Next, I'll look at scope and algorithm separately to explain how to choose a suitable eviction policy.
   First, from the algorithm's point of view, the LFU introduced in Redis 4.0 is more "practical" than LRU. Imagine a Key that is accessed once a day but happens to have been accessed 1 second ago: LRU will not choose to evict it first, yet it might evict a Key that is accessed 5 times per second but has not been accessed in the last 2 seconds. LFU does not have this problem. TTL, meanwhile, is more "simple-minded": it preferentially deletes the Keys closest to expiring, even though some of them may be heavily accessed.
   Then, from the Key scope's point of view, allkeys ensures that even Keys without a TTL can be reclaimed; if clients always "forget" to set a cache expiration time, consider this family of policies. volatile is safer: if a client uses Redis as long-term storage and only initializes the cache once at startup, evicting those no-TTL entries could cause client errors.
   Therefore, both users and operators should think about how Redis is used: users must access Redis with a caching mindset, and operators should set a memory limit and an appropriate eviction policy for Redis to avoid OOM.
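   For example, both settings live in redis.conf. A minimal sketch (the 100mb cap is an arbitrary illustration; pick a limit that fits your deployment):

# redis.conf: cap memory usage so the instance cannot grow unbounded
maxmemory 100mb
# Across all Keys, evict the least frequently used first (requires Redis 4.0+)
maxmemory-policy allkeys-lfu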

Watch out for the cache avalanche problem

   Because the cache sustains far more read/write IO than the database, a large-scale cache invalidation is particularly dangerous. Once it happens, a large amount of data must be fetched from the source database, putting enormous pressure on it; in extreme cases it can crash the database outright.

   There are roughly two scenarios that cause an avalanche:

  1. the caching system itself becomes unavailable, so reads go back to the source;
  2. a large number of Keys expire at the same moment, so reads go back to the source en masse.

   The first is a system-availability problem, not a cache design issue. Today we focus on how to ensure that large numbers of Keys do not passively expire at the same time.
   Consider a case: at startup, the program loads data for 1000 cities into the Redis cache with a 30-second TTL; when an entry expires, the data is fetched from the database and written back to the cache, incrementing a counter on every database read; a scheduled task started at boot prints the counter value every second and resets it to zero.

@Autowired
private StringRedisTemplate stringRedisTemplate;
private AtomicInteger atomicInteger = new AtomicInteger();

@PostConstruct
public void wrongInit() {
    // Load 1000 city records into Redis, each with a 30-second TTL
    IntStream.rangeClosed(1, 1000).forEach(i -> stringRedisTemplate.opsForValue().set("city" + i, getCityFromDb(i), 30, TimeUnit.SECONDS));
    log.info("Cache init finished");
    
    // Print the database access QPS once per second, then reset the counter
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
        log.info("DB QPS : {}", atomicInteger.getAndSet(0));
    }, 0, 1, TimeUnit.SECONDS);
}

@GetMapping("city")
public String city() {
    // Pick a random city to query
    int id = ThreadLocalRandom.current().nextInt(1000) + 1;
    String key = "city" + id;
    String data = stringRedisTemplate.opsForValue().get(key);
    if (data == null) {
        // Cache miss: query the source database
        data = getCityFromDb(id);
        if (!StringUtils.isEmpty(data))
            // Write back to the cache with a 30-second TTL
            stringRedisTemplate.opsForValue().set(key, data, 30, TimeUnit.SECONDS);
    }
    return data;
}

private String getCityFromDb(int cityId) {
    // Simulate a database query: increment the counter
    atomicInteger.incrementAndGet();
    return "citydata" + System.currentTimeMillis();
}

   Use the wrk tool with 10 threads and 10 connections to load-test the city endpoint:

wrk -c10 -t10 -d 100s http://localhost:45678/cacheinvalid/city

   Thirty seconds after startup, the cached entries expire together and the back-to-source database QPS spikes to over 700.

   There are two ways to solve the problem of massive simultaneous Key expiry forcing reads back to the source and spiking database pressure.

   Scheme 1: differentiate cache expiration times so that large numbers of Keys do not expire simultaneously. Specifically, when initializing the cache, set each entry's TTL to 30 seconds plus a random delay of up to 10 seconds, so the Keys no longer all expire around the 30-second mark.

@PostConstruct
public void rightInit1() {
    // Each entry's TTL is 30 seconds plus a random extra delay of up to 10 seconds
    IntStream.rangeClosed(1, 1000).forEach(i -> stringRedisTemplate.opsForValue().set("city" + i, getCityFromDb(i), 30 + ThreadLocalRandom.current().nextInt(10), TimeUnit.SECONDS));
    log.info("Cache init finished");
    // As before, print the database QPS once per second
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
        log.info("DB QPS : {}", atomicInteger.getAndSet(0));
    }, 0, 1, TimeUnit.SECONDS);
}

   After this change, expiring entries no longer go back to the source within the same second, and peak database QPS drops from over 700 to about 100.

   Scheme 2: let the cache never passively expire. When initializing the cached data, set no expiration; then have a background thread refresh all entries every 30 seconds, sleeping briefly between writes to control the rate of database reads and reduce database pressure:

@PostConstruct
public void rightInit2() throws InterruptedException {
    CountDownLatch countDownLatch = new CountDownLatch(1);
    // Fully refresh the cache every 30 seconds
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
        IntStream.rangeClosed(1, 1000).forEach(i -> {
            String data = getCityFromDb(i);
            // Simulate the time it takes to refresh one entry
            try {
                TimeUnit.MILLISECONDS.sleep(20);
            } catch (InterruptedException e) { }
            if (!StringUtils.isEmpty(data)) {
                // No TTL: the cache never expires and is refreshed by the background job
                stringRedisTemplate.opsForValue().set("city" + i, data);
            }
        });
        log.info("Cache update finished");
        // Signal completion so that startup (awaiting below) proceeds after the first refresh
        countDownLatch.countDown();
    }, 0, 30, TimeUnit.SECONDS);

    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
        log.info("DB QPS : {}", atomicInteger.getAndSet(0));
    }, 0, 1, TimeUnit.SECONDS);

    countDownLatch.await();
}

   After this change, a full cache refresh takes about 21 seconds (1000 entries × 20 ms of sleep, plus the database reads), but the pressure on the database stays smooth and steady.

   When using these two approaches, pay special attention to three points:

  • Schemes 1 and 2 are fundamentally different caching approaches. If you cannot fit all the data in the cache, only scheme 1 is applicable;
  • Even with scheme 2, where the cache never expires, the query path still needs back-to-source logic. As noted earlier, we can never assume that data in the cache system will not be lost;
  • With either scheme, when writing data fetched from the database into the cache, validate it first, for example with at least a basic null check.

   Remember: always validate data before putting it into the cache, and raise an alert promptly if anything looks abnormal. This is very important!

Watch out for the cache breakdown problem

   Some Keys hold extremely hot data under very high concurrency. If such a Key expires, a flood of concurrent requests may go back to the source at the same time, which is equivalent to that flood of requests hitting the database directly.

   Let's reproduce the problem: at startup, load one hotspot entry into Redis with a 5-second TTL, and print the back-to-source QPS every second:

@PostConstruct
public void init() {
    // Load one hotspot entry into Redis with a 5-second TTL
    stringRedisTemplate.opsForValue().set("hotsopt", getExpensiveData(), 5, TimeUnit.SECONDS);
    // Print the back-to-source QPS once per second
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
        log.info("DB QPS : {}", atomicInteger.getAndSet(0));
    }, 0, 1, TimeUnit.SECONDS);
}

@GetMapping("wrong")
public String wrong() {
    String data = stringRedisTemplate.opsForValue().get("hotsopt");
    if (StringUtils.isEmpty(data)) {
        data = getExpensiveData();
        // Write back to the cache, again with a 5-second TTL
        stringRedisTemplate.opsForValue().set("hotsopt", data, 5, TimeUnit.SECONDS);
    }
    return data;
}

   You can see that about every 5 seconds, the database takes roughly 20 QPS.

   If the back-to-source operation is particularly expensive, that concurrency cannot be ignored. In that case, consider a locking mechanism to limit back-to-source concurrency. For example, in the code below, Redisson provides a Redis-based distributed lock, and we try to acquire it before querying the database:

@Autowired
private RedissonClient redissonClient;
@GetMapping("right")
public String right() {
    String data = stringRedisTemplate.opsForValue().get("hotsopt");
    if (StringUtils.isEmpty(data)) {
        RLock locker = redissonClient.getLock("locker");
        // Try to acquire the distributed lock
        if (locker.tryLock()) {
            try {
                data = stringRedisTemplate.opsForValue().get("hotsopt");
                // Double-check: another thread may have passed the first null check and waited on the lock while the lock holder wrote the data into Redis
                if (StringUtils.isEmpty(data)) {
                    // Go back to the source database
                    data = getExpensiveData();
                    stringRedisTemplate.opsForValue().set("hotsopt", data, 5, TimeUnit.SECONDS);
                }
            } finally {
                // Don't forget to release the lock. Note the structure: all code after acquiring the lock sits in try+finally, so unlock always runs
                locker.unlock();
            }
        }
    }
    return data;
}

   This limits back-to-source database concurrency to 1.
   In real business scenarios, though, you don't always need to be this rigorous and use a double-checked distributed lock for a global concurrency limit. It minimizes back-to-source database concurrency, but it also limits concurrency whenever the cached entry is invalid. Approaches worth considering instead are:

  • Option one: use an in-process lock instead, so each node can go back to the source with a concurrency of 1 and the cluster-wide concurrency equals the node count;
  • Option two: don't use a lock at all; use a tool like Semaphore to cap concurrency, for example at 10. This keeps back-to-source concurrency from growing too large while still letting several threads refresh the cache at once (see the sketch after this list).
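   Here is a minimal sketch of option two, reusing stringRedisTemplate, getExpensiveData, and the hotsopt key from the examples above; the endpoint path and the permit count of 10 are illustrative assumptions:

// Cap back-to-source concurrency at 10 per node, without fully serializing it
private final Semaphore permits = new Semaphore(10);

@GetMapping("semaphore")
public String semaphore() {
    String data = stringRedisTemplate.opsForValue().get("hotsopt");
    if (StringUtils.isEmpty(data)) {
        // Only threads that obtain a permit go back to the source; the rest simply return the (empty) cached value
        if (permits.tryAcquire()) {
            try {
                data = getExpensiveData();
                stringRedisTemplate.opsForValue().set("hotsopt", data, 5, TimeUnit.SECONDS);
            } finally {
                permits.release();
            }
        }
    }
    return data;
}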

Watch out for the cache penetration problem

   In the examples so far, the back-to-source logic is: when the data is not found in the cache, query the source database. The easy-to-miss loophole is that a cache miss does not necessarily mean the data simply isn't cached; another possibility is that the data does not exist in the source at all.
   Note the difference between cache penetration and cache breakdown:

  • cache penetration means the cache never plays its role as a pressure buffer, because the requested data exists in neither the cache nor the source;
  • cache breakdown means that the instant a cached entry expires, the concurrent requests hit the database.

   There are two solutions to cache penetration.

   Scheme 1: cache a special Value even for non-existent data. For example, when the database lookup for a user returns empty, cache a string with special meaning such as NODATA. The catch is that this can be attacked: a flood of requests for non-existent IDs would fill the cache with large amounts of useless data.
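   A minimal sketch of scheme 1; the getValueFromDb helper, the endpoint name, and the 30-second TTL are illustrative assumptions:

@GetMapping("user")
public String user(@RequestParam("id") int id) {
    String key = "user" + id;
    String data = stringRedisTemplate.opsForValue().get(key);
    if (StringUtils.isEmpty(data)) {
        data = getValueFromDb(id); // hypothetical helper; returns empty when the id does not exist
        if (StringUtils.isEmpty(data)) {
            // Cache a sentinel meaning "confirmed absent", so repeated misses stop reaching the database
            data = "NODATA";
        }
        stringRedisTemplate.opsForValue().set(key, data, 30, TimeUnit.SECONDS);
    }
    // Callers must treat the NODATA sentinel as "not found"
    return data;
}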
   Scheme 2: if that attack risk worries you, use a Bloom filter as a pre-filter.
   Store all possible values in the Bloom filter, and screen every request against it before reading the cache:

  • if the Bloom filter says the value does not exist, then it definitely does not exist, so there is no need to query the cache or the database;
  • with a tiny probability it misjudges, in which case a request for an invalid Key still reaches the cache or the database.

   Using a Bloom filter, we can modify the program with the BloomFilter class from Google's Guava toolkit: at startup, initialize a 10000-element BloomFilter with all valid user IDs, and call its mightContain method before querying the cache to check whether a user ID can possibly exist; if the Bloom filter says the value does not exist, it definitely does not, so return immediately:

private BloomFilter<Integer> bloomFilter;

@PostConstruct
public void init() {
    // Create a Bloom filter sized for 10000 elements with a 1% expected false-positive rate
    bloomFilter = BloomFilter.create(Funnels.integerFunnel(), 10000, 0.01);
    // Populate the Bloom filter with all valid IDs
    IntStream.rangeClosed(1, 10000).forEach(bloomFilter::put);
}

@GetMapping("right2")
public String right2(@RequestParam("id") int id) {
    String data = "";
    // Screen with the Bloom filter first
    if (bloomFilter.mightContain(id)) {
        String key = "user" + id;
        // Cache query 
        data = stringRedisTemplate.opsForValue().get(key);
        if (StringUtils.isEmpty(data)) {
            // Go back to the source database
            data = getCityFromDb(id);
            stringRedisTemplate.opsForValue().set(key, data, 30, TimeUnit.SECONDS);
        }
    }
    return data;
}

   In fact, scheme 2 can be combined with scheme 1: the Bloom filter screens first, and for the rare false positive that slips through, the special value is still cached. This double insurance keeps invalid queries away from the database.

Pay attention to the cache data synchronization strategy

   The three cases above all rely on passively deleting cached data after it expires. In practice, after modifying the source data, we may adopt a strategy of actively updating the cache for the sake of freshness.
   In fact, the best strategy is: update the database first, then delete the cache, and load data into the cache on demand at read time. Even this strategy can produce inconsistency in extreme cases, but the probability is very low and basically negligible.
   Note that the cache delete after the database update can fail. If it does, consider putting the task into a delay queue and retrying, to make sure the stale entry is removed and the cache can be refreshed in time. Since deletion is idempotent, even a duplicate delete does little harm, which is one more reason deletion beats in-place update.
   So the recommended approach to cache updates is: don't have data-update operations proactively push values into the cache; load on demand at read time, and promptly delete the cached entry once the source data changes.
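   A minimal sketch of that recommendation; updateCityInDb and retryExecutor are hypothetical names, and the 5-second retry delay is an arbitrary choice (a production system might use a message or delay queue instead):

private final ScheduledExecutorService retryExecutor = Executors.newSingleThreadScheduledExecutor();

public void updateCity(int id, String newData) {
    // 1. Update the source of truth first
    updateCityInDb(id, newData); // hypothetical DB write
    String key = "city" + id;
    try {
        // 2. Then delete (not update) the cached entry; readers reload it on demand
        stringRedisTemplate.delete(key);
    } catch (Exception e) {
        // 3. Deletion is idempotent, so a delayed retry is safe
        retryExecutor.schedule(() -> { stringRedisTemplate.delete(key); }, 5, TimeUnit.SECONDS);
    }
}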

Answers to the opening questions

   The first question. Here is a scenario: a piece of data is extremely hot, its updates are infrequent but its queries are extremely frequent, and we need to guarantee an essentially 100% cache hit rate. What do we do?
   Our approach trades space for efficiency: keep two copies of the same key, one without a suffix and one with a suffix. The copy without the suffix has a TTL; the copy with the suffix has none. Query the non-suffixed copy first; on a miss, do two things: 1. have a background task query the DB and refresh the cache; 2. query the suffixed copy and return it to the caller. This avoids, as far as possible, the database being brought down by cache breakdown.
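   A minimal sketch of that read path; the key names, the 30-second TTL, and the loadPool/loadFromDb helpers are illustrative assumptions:

private final ExecutorService loadPool = Executors.newSingleThreadExecutor();

@GetMapping("hot")
public String hot() {
    // Copy 1 ("hotkey") carries a TTL; copy 2 ("hotkey:backup") never expires
    String data = stringRedisTemplate.opsForValue().get("hotkey");
    if (StringUtils.isEmpty(data)) {
        // 1. Refresh both copies from the database in the background
        loadPool.submit(() -> {
            String fresh = loadFromDb(); // hypothetical loader
            stringRedisTemplate.opsForValue().set("hotkey", fresh, 30, TimeUnit.SECONDS);
            stringRedisTemplate.opsForValue().set("hotkey:backup", fresh);
        });
        // 2. Serve the caller from the never-expiring backup copy, keeping the hit rate near 100%
        data = stringRedisTemplate.opsForValue().get("hotkey:backup");
    }
    return data;
}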
   The second question:

  • The Key must be read and written in full every time:
       Try splitting the object into several key-value pairs and fetching them with multiGet. The point of the split is to spread the pressure of a single operation evenly across multiple Redis instances, reducing the IO impact on any single one;
  • Only part of the object's data is accessed each time:
       As in the first case, you can split it into several key-value pairs, or store the object in a hash where each field represents an attribute; use hget/hmget to read part of the value and hset/hmset to update some attributes (see the sketch after this list).
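   A minimal sketch of both splits using Spring Data Redis; the key and field names are illustrative:

// Variant 1: the object is split across several independent keys; MGET fetches them in one round trip
List<String> parts = stringRedisTemplate.opsForValue()
        .multiGet(Arrays.asList("user:1:basic", "user:1:profile"));

// Variant 2: the object is stored as a hash, one field per attribute
stringRedisTemplate.opsForHash().put("user:1", "name", "Alice");   // HSET
stringRedisTemplate.opsForHash().put("user:1", "city", "Beijing"); // HSET

// Read only the fields you need instead of the whole big Value (HMGET)
List<Object> fields = stringRedisTemplate.opsForHash()
        .multiGet("user:1", Arrays.asList("name", "city"));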

Copyright notice

This article was created by [Xiao Lin also wants a dragon maid]. Please include the original link when reposting:
https://yzsam.com/2022/172/202206211714486585.html