
1. Data consistency
We know that Redis is mainly used for caching. Once a cache is involved, whether it is local memory or Redis, the problem of keeping the cached data in sync with the database arises.
In general, the flow is: read the cache first; if the data is there, return it immediately. If it is not, read the database, write the result back to the cache so the next read request can be served from there, and return it.
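A minimal sketch of this read path, assuming a Jedis client and a hypothetical db.query helper:

public String readThrough(Jedis jedis, Database db, String key) {
    String value = jedis.get(key);           // 1. try the cache first
    if (value != null) {
        return value;                        // cache hit: return immediately
    }
    value = db.query(key);                   // 2. cache miss: read the database
    if (value != null) {
        jedis.setex(key, 300, value);        // 3. backfill the cache (the 300s TTL is an assumption)
    }
    return value;
}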

This effectively reduces the pressure on the database. But if data in the database is modified or deleted, the cache cannot perceive the change, so the data in the cache becomes inconsistent with the database. How do we solve that?
There are several common solutions:
1.1 Update the cache first, then the database
We usually do not consider this scheme. If the cache is updated successfully but an exception occurs while updating the database, the cached data ends up completely inconsistent with the database, and this is hard to detect because the data in the cache always appears to exist.
1.2 Update the database first, then the cache
We generally do not consider this scheme either, for the same reason as the first: if the database update succeeds but the cache update fails, the data also becomes inconsistent.
In other words, updating the cache is generally avoided, mainly for the following 2 reasons:
1. Concurrency issues
If request A and request B both perform an update at the same time, A's cache update should land before B's; but because of network delays and the like, B may actually update the cache before A does. A's older value then overwrites B's newer one, leaving dirty data in the cache, so this approach is ruled out.
2. Business scenario problems
If the business writes to the database in many scenarios but reads the data in few, this scheme keeps updating the cache before anyone has read it, which wastes performance.
Besides, in more complex caching scenarios, the cached value is often not taken straight from the database. For example, when a field of one table is updated, computing the latest value for its cache may require querying the data of two other tables and running a calculation.
Does that mean every database modification must update the corresponding cache? In simple cases perhaps, but not when the cached value is expensive to compute. If the multiple tables a cache depends on are modified frequently, the cache is also updated frequently; the question is whether that cache is read equally frequently.
For example: a field of a table involved in the cache changes 20, or even 100, times within 1 minute, so the cache is updated 20 or 100 times, yet the cache is read only once in that minute; most of those updates produce cold data.
In fact, if you just delete the cache instead, then within that 1 minute the cache is recomputed only once, which reduces the overhead significantly: compute the cache only when it is used.
Deleting the cache instead of updating it is really a form of lazy computation: do not redo the expensive calculation on every write regardless of whether the result will be used, but recompute only when the value is actually needed.
In the end, choosing between updating the cache and evicting the cache mainly depends on the complexity of updating it. If the update is cheap, prefer updating the cache to keep the hit rate high; if the update is expensive, prefer evicting. Eviction is simple and its only side effect is one extra cache miss, so it is generally used as the default approach.
1.3 Delete the cache first, then update the database
This scheme also has problems, for the following reason:
Request A deletes the cache and then updates the database. Before A's update succeeds, or before its transaction commits, request B misses the cache, queries the old value from the database and writes it back into the cache. The database and Redis are then inconsistent.
So what is the solution? The simplest is the delayed double delete strategy, namely:
1. Delete the cache;
2. Update the database;
3. Sleep for about 1 second;
4. Delete the cache again.
This removes the dirty data that read requests may have written back within that 1-second window.
Then how is this 1 second determined? How long should we sleep?
Evaluate how long your project's read path takes (reading the data plus the business logic). The write request then sleeps for that duration plus a few hundred ms. The purpose is to make sure the read request has finished, so the write request can remove the dirty cache data left behind by the read request.
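A minimal sketch of delayed double delete, assuming a Jedis client and a hypothetical db.update helper:

public void updateWithDoubleDelete(Jedis jedis, Database db, String key, String newValue)
        throws InterruptedException {
    jedis.del(key);                 // 1. delete the cache
    db.update(key, newValue);       // 2. update the database
    Thread.sleep(1000);             // 3. wait roughly one read-path duration plus a margin
    jedis.del(key);                 // 4. delete again to clear dirty data written by concurrent reads
}

In practice the second delete is often done asynchronously, so the write request does not block for the whole second.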
1.4 Update the database first, then delete the cache
This approach is known as the Cache Aside Pattern.
For read requests
Read the cache first; if the data exists, return it directly. If it does not exist, read the database, put the data into the cache, and return the response.
For write requests
Update the database first, then delete the cache.
Can this approach still run into concurrency problems?
Suppose there are two requests: request A does a query and request B does an update. The following can happen:
1. The cache entry has just expired;
2. Request A reads the old value from the database;
3. Request B writes the new value into the database;
4. Request B deletes the cache;
5. Request A writes the old value it read into the cache.
But the probability of this is actually very low: it requires step 3, the database write, to take less time than step 2, the database read, so that step 4 can happen before step 5.
In reality, database reads are much faster than writes, so step 3 rarely finishes faster than step 2, and this situation is very hard to trigger.
Still, it is theoretically possible. How should it be handled? There are usually two kinds of remedies, covered below.
1.5 Problems with the delete strategy
So for Redis cache consistency, the usual approach is to delete the cache. But since there is a delete step, what if the deletion fails? The stale data stays, and every subsequent query returns wrong data. How can that be solved?
Generally there are two schemes:
Use a message queue to retry the deletion as compensation.
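A minimal sketch of queue-based delete compensation; the MessageQueue type with send/poll and the topic name are assumptions:

// Write path: if the cache delete fails, hand the key to a queue for retry.
public void updateAndDelete(Jedis jedis, MessageQueue mq, Database db, String key, String value) {
    db.update(key, value);
    try {
        jedis.del(key);
    } catch (Exception e) {
        mq.send("cache-delete-retry", key);          // enqueue the key for compensation
    }
}

// Consumer: keep retrying until the delete succeeds.
public void retryConsumer(Jedis jedis, MessageQueue mq) {
    while (true) {
        String key = mq.poll("cache-delete-retry");  // blocking poll (hypothetical API)
        try {
            jedis.del(key);
        } catch (Exception e) {
            mq.send("cache-delete-retry", key);      // re-enqueue on failure
        }
    }
}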

But this solution has the disadvantage of intruding heavily into the business code and coupling with it deeply. So there is an optimized scheme: every MySQL update operation appears in the binlog, so we can subscribe to the MySQL binlog and operate on the cache from there.
To subscribe to the binlog you can use Alibaba's open-source framework canal; see this article for details:
Solving the consistency problem between MySQL and Redis based on the canal framework
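A heavily simplified sketch of a canal client loop that invalidates cache entries on binlog events; the connector address, the destination name "example", and the keyFromEntry helper are assumptions, and entry parsing is omitted:

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.Message;
import java.net.InetSocketAddress;

public void run(Jedis jedis) {
    CanalConnector connector = CanalConnectors.newSingleConnector(
            new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
    connector.connect();
    connector.subscribe(".*\\..*");                      // watch all schemas and tables
    while (true) {
        Message message = connector.getWithoutAck(100);  // fetch a batch of binlog entries
        message.getEntries().forEach(entry ->
                jedis.del(keyFromEntry(entry)));         // hypothetical: derive the cache key from the row change
        connector.ack(message.getId());                  // acknowledge the batch
    }
}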
2. Cache penetration
Cache penetration means querying data that simply does not exist: neither the cache layer nor the storage layer hits, and since nothing is found in the storage layer, nothing is written back to the cache layer.
Cache penetration makes every request for nonexistent data reach the storage layer, defeating the point of using a cache to protect backend storage. It increases the load on the backend storage, and since much backend storage cannot handle high concurrency, it may even bring the storage down.
Generally the program can separately count total calls, cache-layer hits, and storage-layer hits; a large number of storage-layer misses suggests a cache penetration problem.
2.1 Cause analysis
There are two basic reasons for cache penetration:
1. Problems in your own business code or data
For example, our database ids all grow from 1. If requests come in with id = -1, or with a huge id that does not exist, and the parameter is not validated, every such request bypasses Redis and goes straight to the database. All ids in the database are greater than 0, so a query with id < 0 always misses; under high concurrency the database can easily collapse.
2. Malicious attacks, crawlers and the like causing a large number of empty hits
2.2 Solution
Cache penetration can be addressed from the following aspects:
1. Add validation
For example user authentication, and basic validation on id: intercept id <= 0 directly;
2. Cache empty objects
Since the data does not exist in the source, keep an empty object in the cache layer; subsequent accesses to this data then hit the cache, which protects the backend data source.
But this raises 2 issues:
Caching null values means more keys in the cache layer and more memory (if it is an attack, the problem is worse). The effective mitigation is to set a shorter expiration time on such entries so they are removed automatically.
The cache layer and the storage layer will be inconsistent for a while, which may affect the business. For example, with a 5-minute expiration, if the storage layer adds the data during that window, the cache and storage disagree until the entry expires; the data consistency schemes above can be applied here.
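A minimal sketch of caching empty objects, assuming a Jedis client, a hypothetical db.query helper, and an empty string as the "not found" sentinel:

private static final String NULL_SENTINEL = "";   // placeholder for nonexistent data (assumption)

public String getWithNullCaching(Jedis jedis, Database db, String key) {
    String value = jedis.get(key);
    if (value != null) {
        return NULL_SENTINEL.equals(value) ? null : value;  // a cached empty object means "not found"
    }
    value = db.query(key);
    if (value == null) {
        jedis.setex(key, 60, NULL_SENTINEL);  // short TTL (60s) so junk keys expire quickly
        return null;
    }
    jedis.setex(key, 300, value);
    return value;
}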
3. The Bloom filter
Before the cache layer and storage layer are accessed, save the existing keys in a Bloom filter in advance and use it as a first-level interceptor.
For example: a recommendation system has 400 million user ids. Every hour the algorithm engineers compute each user's recommendation data from their historical behavior and put it in the storage layer. The newest users have no historical behavior yet, so their requests penetrate the cache. We can therefore put all users who have recommendation data into a Bloom filter; if the filter says a user id does not exist, the request never reaches the storage layer, which protects it to a certain extent.
This method suits scenarios where the hit rate is low, the data is relatively fixed and real-time requirements are low (usually large data sets). The code is more complex to maintain, but it saves cache space.
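A minimal sketch of the first-level interceptor using Guava's BloomFilter (Guava is one possible implementation choice, not mandated by the text):

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class UserFilter {
    // Sized for 400 million ids with a 1% false-positive rate.
    private final BloomFilter<String> filter =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 400_000_000, 0.01);

    public void add(String userId) {
        filter.put(userId);                  // register ids that actually have data
    }

    public boolean mightExist(String userId) {
        return filter.mightContain(userId);  // false means definitely absent: skip cache and storage
    }
}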
3. Cache breakdown
Cache breakdown refers to a hotspot key under heavy, concentrated concurrent access: at the moment the key expires, the sustained high concurrency breaks through the cache and requests hit the database directly.
To guard against cache breakdown, set hotspot data to never expire, or add a mutex.
3.1 Cause analysis
A key with an expiration time set carries high concurrency because it is hot data. In the window between the key expiring and the data being reloaded from MySQL into the cache, the flood of requests can kill the database.
Cache avalanche means a large number of cache entries fail at once; cache breakdown means the cache entry for one piece of hot data fails.
3.2 Solution
1. Use a mutex
A common industry practice is to use a mutex.
Simply put, when the cache misses (the value is judged empty), do not load from the db immediately. First use one of the caching tool's atomic operations with a success return value (such as Redis's SETNX or Memcache's ADD) to set a mutex key. Only when that operation returns success do we load from the db and reset the cache; otherwise, retry the whole get-from-cache method.
The pseudocode is as follows:
public String get(String key) {
    String value = redis.get(key);
    if (value == null) {  // null means the cached value has expired
        // set a 3 min timeout on the mutex, so a failed del cannot block future reloads
        if (redis.setnx(key_mutex, 1, 3 * 60) == 1) {  // setting the mutex succeeded
            value = db.get(key);
            redis.set(key, value, expire_secs);
            redis.del(key_mutex);
            return value;
        } else {
            // another thread has likely loaded the db and refilled the cache; wait and retry
            sleep(50);
            return get(key);  // retry
        }
    } else {
        return value;
    }
}

2. Never expire
Here, never expire has two meanings:
From the Redis point of view, no expiration time is really set, which guarantees there is no hotspot-key-expiry problem; that is, the key is physically non-expiring.
Functionally, if it never expires, does the value become static? No: we store an expiration time inside the key's value, and when a read finds it logically expired, a background asynchronous thread rebuilds the cache; that is, the key is logically non-expiring.
In practice this method is very performance-friendly. The only drawback is that while the cache is being rebuilt, the other threads (those not rebuilding it) may read stale data, which is tolerable for typical Internet features.
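A minimal sketch of logical expiration, assuming a Jedis client, hypothetical db.get and Json helpers, and a wrapper that stores the logical deadline next to the value:

// Wrapper stored in Redis: the value plus its logical deadline (assumed layout).
class CacheEntry {
    String value;
    long expireAt;  // epoch millis after which the entry is logically stale
}

public String getWithLogicalExpire(Jedis jedis, Database db, ExecutorService pool, String key) {
    CacheEntry entry = Json.decode(jedis.get(key), CacheEntry.class);  // hypothetical JSON helper
    if (entry == null) {
        return null;  // never cached: handle as a normal miss
    }
    if (entry.expireAt > System.currentTimeMillis()) {
        return entry.value;  // still logically fresh
    }
    // Logically expired: rebuild asynchronously and return the stale value for now.
    pool.submit(() -> {
        CacheEntry fresh = new CacheEntry();
        fresh.value = db.get(key);
        fresh.expireAt = System.currentTimeMillis() + 60_000;  // 60s logical TTL (assumption)
        jedis.set(key, Json.encode(fresh));  // note: no physical TTL is set
    });
    return entry.value;
}

In a real system the rebuild would also take the mutex from the previous subsection, so that only one thread recomputes the value.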
4. Cache avalanche
Cache avalanche means that at some moment the data in the cache expires in bulk while the query volume is huge, putting so much pressure on the database that it may even go down.
4.1 Cause analysis
The cache layer carries a large share of the requests and effectively protects the storage layer. But if the cache layer cannot provide service for some reason, for example a large area of cached data fails at the same time, at that moment it is as if Redis did not exist: all requests reach the storage layer, calls to it skyrocket, and the storage layer can cascade into downtime.
4.2 Solution
To prevent and mitigate the cache avalanche problem, we can start from the following aspects:
Like an airplane with multiple engines, a highly available cache layer keeps the service up even if individual nodes, individual machines, or even a whole data center go down; Redis Sentinel and Redis Cluster provide such high availability.
Use a multi-level caching mechanism, for example Redis and Memcache together: request -> Redis -> Memcache -> db.
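Since the trigger is many keys expiring at the same instant, a common complementary measure is to spread expiration times with random jitter; a minimal sketch assuming a Jedis client:

import java.util.concurrent.ThreadLocalRandom;

public void setWithJitter(Jedis jedis, String key, String value, int baseTtlSecs) {
    // Add up to 5 minutes of random jitter so keys written together do not expire together.
    int jitter = ThreadLocalRandom.current().nextInt(300);
    jedis.setex(key, baseTtlSecs + jitter, value);
}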
5. Hot Key
In Redis, a key with a very high access frequency is called a hot key; it holds hot data.
5.1 Causes
Hot key problems mainly have two causes:
Unexpected events in daily work and life, for example: price cuts and promotions on popular goods during Double 11. When one of these items is viewed or purchased tens of thousands of times, the demand spike creates a hot key. Likewise, breaking news that is massively published and browsed, hot comments, celebrity live streams and other read-heavy, write-light scenarios also generate hot keys.
When the server reads data, the data is usually sharded, so accesses to a given key land on one particular host. When the access rate exceeds that server's limit, a hot key problem arises.
5.2 Harm
Traffic concentrates until the physical NIC limit is reached.
Too many requests crash the cache shard service; the DB is then hit directly, causing a business avalanche.
As mentioned above, when requests to a hot key exceed the NIC capacity of its host, the over-concentrated traffic makes the other services on that server unavailable. If hotspots are too concentrated and the hot key caches exceed the current cache capacity, the cache shard service collapses.
When the cache service crashes, new requests fall through to the backing DB. Since the DB's performance is comparatively weak, large request volumes easily penetrate it, which can further escalate into an avalanche and seriously degrade the whole system.
5.3 Solution
Once hot keys are found, they need targeted handling. There are usually the following 2 schemes:
1. Use an L2 (local) cache
You can use guava-cache or hcache to load hot keys into the JVM as a local cache. Accessing these keys then hits the local cache directly instead of going to the Redis layer, which effectively protects the cache server.
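A minimal sketch of a JVM-local L2 cache built with guava-cache, falling back to Redis on local misses (the size and TTL are assumptions):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

public class HotKeyCache {
    // Small, short-lived local cache: hot keys can tolerate a few seconds of staleness.
    private final Cache<String, String> local = CacheBuilder.newBuilder()
            .maximumSize(1000)
            .expireAfterWrite(5, TimeUnit.SECONDS)
            .build();

    public String get(Jedis jedis, String key) {
        String value = local.getIfPresent(key);
        if (value == null) {
            value = jedis.get(key);      // local miss: fall back to Redis
            if (value != null) {
                local.put(key, value);
            }
        }
        return value;
    }
}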
2. Key splitting
Split the hot key into multiple sub-keys stored on different machines of the cache cluster; each sub-key's value is identical to the hot key's. When querying through the hot key, a hash algorithm randomly picks one of the sub-keys and accesses that cache machine, spreading the hot traffic across the sub-keys.
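A minimal sketch of key splitting; the shard count and the key naming scheme are assumptions:

import java.util.concurrent.ThreadLocalRandom;

public class SplitHotKey {
    private static final int SHARDS = 10;  // number of sub-keys to spread across (assumption)

    // Write the same value under every sub-key; cluster slotting spreads them over nodes.
    public void set(Jedis jedis, String hotKey, String value, int ttlSecs) {
        for (int i = 0; i < SHARDS; i++) {
            jedis.setex(hotKey + ":" + i, ttlSecs, value);
        }
    }

    // Read a randomly chosen sub-key so the load is spread evenly.
    public String get(Jedis jedis, String hotKey) {
        int shard = ThreadLocalRandom.current().nextInt(SHARDS);
        return jedis.get(hotKey + ":" + shard);
    }
}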
In fact, the hot key problem only appears under very high concurrency. Ordinary single-machine systems never see such load, and an L2 cache complicates the architecture, so whether it is needed depends on the actual business scenario.
For example, a single Redis instance tops out around 100k QPS. With Redis 5.0 we usually deploy a cluster of 3 masters and 6 replicas; a hot key maps to one hash slot, that is one master plus two replicas, which in theory can serve 300k QPS. Even leaving some buffer, 100k QPS is certainly no problem. For higher loads, say millions of accesses, besides expanding the Redis cluster, local caching becomes necessary.
6. Big Key
A bigkey is a key whose value occupies a large amount of memory. For example, a string value can hold up to 512MB, and a list value can hold up to 2^32 - 1 elements. Subdivided by data structure, bigkeys are generally classified as string bigkeys and non-string bigkeys.
String type: a single value that is too large; more than 10KB is generally considered a bigkey, though the threshold also depends on the concrete QPS.
Non-string type: hashes, lists, sets and sorted sets with too many elements.
Bigkeys are unfriendly in both space and time complexity.
6.1 Detection
The redis-cli --bigkeys command reports the distribution of bigkeys. In production, however, developers and operators prefer to define the bigkey threshold themselves, and above all to find out which keys the real bigkeys are, so that problems can be located, solved and optimized. To judge whether a key is a bigkey, run debug object key and look at the serializedlength attribute, the number of bytes of the key's value after serialization.
If there are many keys, scan + debug object is slow; the Pipeline mechanism can speed it up. For data structures with huge element counts, debug object itself runs slowly and may block Redis, so if there is a replica node, consider running it there.
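A minimal sketch of scanning for string bigkeys with Jedis; the 10KB threshold follows the text, and non-string types would be checked by element count instead:

// ScanParams and ScanResult come from the Jedis client library.
public void findStringBigkeys(Jedis jedis) {
    String cursor = "0";
    do {
        ScanResult<String> page = jedis.scan(cursor, new ScanParams().count(100));
        for (String key : page.getResult()) {
            // strlen returns the value size in bytes for string keys
            if ("string".equals(jedis.type(key)) && jedis.strlen(key) > 10 * 1024) {
                System.out.println("bigkey: " + key + " bytes=" + jedis.strlen(key));
            }
        }
        cursor = page.getCursor();
    } while (!"0".equals(cursor));
}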
6.2 Harm
The harm of bigkeys shows in three aspects:
Every fetch of a bigkey generates heavy network traffic. Suppose a bigkey is 1MB and is accessed 1000 times per second: that is 1000MB of traffic per second, a disaster for an ordinary gigabit NIC (128MB/s in bytes). Moreover, servers are usually deployed with multiple instances per machine, so one bigkey may affect other instances too, with dire consequences.
A bigkey is not necessarily fatal: if it exists but is rarely accessed, only the problem of uneven memory distribution remains, which is less important and urgent than the other two problems. But if the bigkey is also a hot key, the harm is unimaginable, so in development and operations we must watch closely for bigkeys.
6.3 Solution
The main idea is splitting: break the data stored under the big key (the big value) into value1, value2 ... valueN.
For example, if the big value is a large json, use mset to scatter its content across the instances under several keys, or store it as a hash where each field is a concrete attribute, using hget/hmget to fetch partial values and hset/hmset to update some attributes.
For example, if the big value is a large list, split it into list_1, list_2 ... list_N; the same applies to other data types.
7. Redis split brain
So-called split brain means that in a master-replica cluster meant to guarantee availability, two master nodes exist at the same time, and both can receive write requests.
The main cause is usually a network problem: the Redis master node ends up in a different network partition from the Redis replica nodes and the sentinel cluster. The sentinels can no longer perceive the master, so they promote a replica to be the new master.
The most direct effect of split brain is that clients do not know which master node to write to, so different clients write data to different masters. Worse, split brain can further lead to data loss.
7.1 Split brain in a sentinel master-replica cluster
Suppose there are three servers, one master and two replicas, plus the sentinel mechanism.

In this environment, a network fluctuation suddenly cuts the master machine off from the normal network, but the master is actually still running. The sentinels elect and promote one replica as the new master.
If at this very moment App Server1 is still connected to the old master while App Server2 is connected to the new master, their data diverges. Once the sentinels perceive the old master again, they demote it to a replica, which then runs a full resynchronization from the new master; the data written to the old master during the split-brain window is lost.

Solution
Configuring the following parameters mitigates split brain:
min-replicas-to-write 1
min-replicas-max-lag 5
The first parameter means at least 1 replica must be attached: a write to the master counts as successful only if it is also synchronized to at least 1 replica node.
The second parameter means the replication and synchronization lag must not exceed 5 seconds.
With these two parameters configured, if a split brain occurs, the original master rejects client writes, which avoids large-scale data loss.
7.2 Split brain in a Redis Cluster
By default, split brain generally does not occur in a Redis Cluster, because the cluster's elections require more than half the votes, and the whole cluster becomes unavailable when any of the 16384 slots is not assigned to a node.
So when building a Redis Cluster, there should be at least 3 master nodes, and the number of nodes should be odd.
Outside the defaults, for example when the number of masters is even, or when the parameter cluster-require-full-coverage is set to no (when it is yes, the whole cluster stops serving as soon as a node goes down and the 16384 slots are no longer fully covered), split brain may still occur. In that case the min-replicas-to-write and min-replicas-max-lag parameters can again be used.