Flexible use of distributed locks to solve the problem of duplicate data insertion
1. Business background
Many consumer-facing Internet services maintain per-user data on the server side, and the Quick App Center is no exception. The Quick App Center allows users to favorite quick apps; the user's favorites list is recorded on the server, keyed by the account identifier OpenID and associated with the package names of the favorited apps.
To keep the favorite status consistent between the Quick App Center's favorites list and the quick app Menubar, we also record the binding between the user's account identifier OpenID and the client-side local identifier local_identifier. The Menubar is held by the quick app engine, which is independent of the Quick App Center: it cannot obtain the account OpenID through the account system and can only obtain the client's local_identifier. So we can only keep the two sides in sync through the mapping between the two identifiers.
Concretely, we trigger a synchronization when the user launches the Quick App Center: the client submits the OpenID together with its local identifier to the server for binding. The server's binding logic is: check whether the OpenID already exists; if not, insert a new row; otherwise, update the local_identifier field of the existing row (a user may log in to the same vivo account on two different phones). In subsequent business flows, we can then look up the local_identifier for a given OpenID, and vice versa.
However, some time after the code went live, we found many duplicate OpenID records in the t_account table. According to the binding logic described above, this should be impossible in theory. Fortunately, the duplicates had no impact on update and query scenarios, because the query SQL includes a LIMIT 1 clause, so for a given OpenID both updates and queries effectively operate only on the record with the smallest ID.
2. Analyzing and locating the problem
Although the redundant data had no impact on the actual business, such an obvious data problem was certainly not tolerable, so we started investigating.
The first thought was to start from the data itself. A rough look at the t_account table showed that about 3% of OpenIDs were duplicated. That is, duplicate insertion happened only occasionally; most requests were handled correctly as expected. We re-read the code and confirmed there were no obvious logical errors in the implementation.
Then we looked at the data more closely. We picked a few duplicated OpenIDs and queried their records, and found that the number of duplicates varied: some OpenIDs were duplicated only once, others more. At this point we noticed a more valuable clue: rows sharing the same OpenID had exactly the same creation time, and their auto-increment IDs were consecutive.
So we guessed the problem was caused by concurrent requests! We simulated concurrent client calls to the interface and indeed reproduced the duplicate insertion, further confirming the guess. But since the client's logic is to synchronize once per user at startup, why would there be concurrent requests for the same OpenID?
In fact, code does not run in as ideal an environment as we imagine. There are often unstable factors, such as the network or server load, that can cause a client request to "fail". That "failure" may not be a real failure: the request may simply take too long, exceed the client's timeout, be judged failed, and then be sent again by the retry mechanism. The same request may thus be submitted multiple times, and those requests may be blocked somewhere along the way (for example, when the server's worker threads are overloaded and requests pile up in a buffer queue); when the blockage clears, the requests may be processed concurrently within a very short window.
This is a typical concurrency-conflict problem, which can be abstracted as: how to avoid writing duplicate data under concurrency. Many common business scenarios face the same problem, for example forbidding duplicate user names at registration.
Generally speaking, the most intuitive way to handle such problems is to query first, and allow the insert only if the data does not yet exist in the database.
Obviously, this flow is fine from the perspective of a single request. But when multiple requests arrive concurrently, request A and request B both run the query first, both get the result "does not exist", and therefore both perform the insert, which ultimately causes the conflict.
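As a minimal sketch of this vulnerable check-then-insert flow (the DAO method names are hypothetical, mirroring the code shown later), the comments mark how two concurrent requests interleave:

// Not atomic as a pair: both request A and request B can pass the null
// check before either insert happens, producing two rows for one openId.
public void submit(String openId, String localIdentifier) {
    Account account = accountDao.find(openId);      // A reads: null; B reads: null
    if (account == null) {
        accountDao.insert(openId, localIdentifier); // A inserts, then B inserts again
    } else {
        accountDao.update(openId, localIdentifier);
    }
}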
3. Exploring feasible solutions
With the problem located, the next step was to find a solution. In this situation we usually have two options: let the database solve it, or solve it in the application.
3.1 Handling it at the database level: a unique index
With MySQL and the InnoDB storage engine, we can use a unique index to guarantee that values in a column are unique. Obviously, in the t_account table we did not create a unique index on the open_id column at the start. To add one now, we can use the following ALTER TABLE statement.
ALTER TABLE t_account ADD UNIQUE uk_open_id (open_id);
Once the unique index exists on the open_id column, when the concurrency above occurs, one of request A and request B is bound to insert first, and the other will receive an error like the one below. This guarantees that only one record with openid=xxx exists in the t_account table.
Error Code: 1062. Duplicate entry 'xxx' for key 'uk_open_id'
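Had we taken this route, the application would also need to treat error 1062 as a normal branch rather than a failure. Below is a minimal JDBC-style sketch of that idea, not the code we actually shipped; the table and column names follow the article, while the fall-back-to-update behavior is our assumption:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLIntegrityConstraintViolationException;
import javax.sql.DataSource;

public class AccountBinder {

    private final DataSource dataSource;

    public AccountBinder(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void bind(String openId, String localIdentifier) throws Exception {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement insert = conn.prepareStatement(
                     "INSERT INTO t_account (open_id, local_identifier) VALUES (?, ?)")) {
            insert.setString(1, openId);
            insert.setString(2, localIdentifier);
            insert.executeUpdate();
        } catch (SQLIntegrityConstraintViolationException e) {
            // Duplicate entry on uk_open_id (MySQL error 1062): the row already
            // exists, so fall back to updating its local_identifier instead.
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement update = conn.prepareStatement(
                         "UPDATE t_account SET local_identifier = ? WHERE open_id = ?")) {
                update.setString(1, localIdentifier);
                update.setString(2, openId);
                update.executeUpdate();
            }
        }
    }
}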
3.2 Handling it at the application level: a distributed lock
The other option is not to rely on the underlying database for uniqueness, but on the application's own code logic to avoid the conflict. An application-level guarantee is actually the more general approach, since we cannot assume that every persistence component used by a system supports uniqueness checks.
How? In short, turn concurrent actions into serial ones. We hit the duplicate-insert problem because "check whether the data exists" and "insert the data" are two separate actions. Because the two steps are not atomic, two different requests can both pass the first check. If we can merge the two actions into one atomic operation, the conflict disappears. This is where a lock comes in, to give the code block atomicity.
In Java, the most familiar locking mechanism is the synchronized keyword.
public synchronized void submit(String openId, String localIdentifier) {
    Account account = accountDao.find(openId);
    if (account == null) {
        // insert
    } else {
        // update
    }
}

However, it is not that simple. Remember that our program is not deployed on a single server but on multiple nodes; the concurrency here is not just between threads but between processes. Therefore, a Java-level locking mechanism cannot solve this synchronization problem. What we need here is a distributed lock.
3.3 The trade-off between the two solutions
From the analysis above, both options seem feasible, but in the end we chose the distributed lock. Why not the first option, which only requires adding an index?
Because the existing online data already contains duplicates in the open_id column, adding a unique index directly would fail. To add the index, we would first have to clean up the existing duplicates. But here comes the problem: the online service keeps running and may keep producing new duplicates. Could we find a window of low user activity to clean the data and finish creating the unique index before new duplicates are inserted? In principle, yes; but such a plan would require coordination among operations, DBAs, and developers, and given the traffic pattern the most suitable window would be in the dead of night. Even with such a strict procedure, there is no 100% guarantee that no new duplicate gets inserted between the cleanup and the index creation. So the unique-index fix, which looks very fitting at first glance, is actually somewhat troublesome to carry out.
In fact, the best time to create the unique index would have been in the initial design phase, which would have prevented the duplicates entirely. But what's done is done, and in the current situation we chose the more operable distributed lock approach: we can first release the distributed-lock fix to block new duplicate insertions, then clean up the existing duplicates, so only one code change and one release are needed. Of course, once the problem is fully resolved, we may reconsider adding a unique index to the table.
Next, let's look at how to implement the distributed-lock approach, starting with a review of distributed locks.
4. Distributed lock overview
4.1 What are the characteristics of a distributed lock?
- In a distributed system, only one thread on one machine may hold the lock at any time;
- Acquiring and releasing the lock must be highly available;
- Acquiring and releasing the lock must be high-performance;
- Reentrancy;
- A lock-expiry mechanism, to prevent deadlock;
- Blocking / non-blocking lock semantics.
4.2 How can a distributed lock be implemented?
There are three main ways to implement a distributed lock:
- based on a database;
- based on ZooKeeper;
- based on Redis.
4.2.1 Database-based implementation
The database-based approach is to create a dedicated lock table and implement locking and unlocking by manipulating its rows. Taking MySQL as an example, we can create a table like the following, with a unique-index constraint on method_name:
CREATE TABLE `myLock` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `method_name` varchar(100) NOT NULL DEFAULT '' COMMENT 'name of the locked method',
  `value` varchar(1024) NOT NULL DEFAULT 'lock information',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uidx_method_name` (`method_name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='methods currently locked';
Then we can lock and unlock by inserting and deleting rows:
# lock
insert into myLock(method_name, value) values ('m1', '1');

# unlock
delete from myLock where method_name = 'm1';

The database-based implementation is simple, but it has some obvious problems:
- There is no lock expiry time. If unlocking fails, the lock record stays in the database forever, causing a deadlock.
- The lock is not reentrant, because the table cannot tell whether the requester is the thread that currently holds the lock.
- The database is a single point; if it goes down, the whole locking mechanism collapses.
4.2.2 ZooKeeper-based implementation
ZooKeeper is an open-source component that provides consistency services for distributed applications. Internally it is a hierarchical, file-system-like tree of nodes, and it guarantees that node names under the same directory are unique.
ZooKeeper nodes (Znodes) come in four types:
- Persistent node (the node still exists after the session disconnects)
- Persistent sequential node
- Ephemeral node (the node is deleted when the session disconnects)
- Ephemeral sequential node
When a new Znode is created as a sequential node, ZooKeeper sets its path by appending a 10-digit sequence number to the original name. For example, if a Znode with path /mynode is created as a sequential node, ZooKeeper changes the path to /mynode0000000001 and sets the next sequence number to 0000000002. The sequence number is maintained by the parent node, so if two sequential nodes are created concurrently, ZooKeeper never assigns the same number to both Znodes.
Based on these characteristics of ZooKeeper, a distributed lock can be implemented as follows:
- Create a directory node mylock;
- Thread A acquires the lock by creating an ephemeral sequential node under the mylock directory;
- Thread A fetches all child nodes of mylock and checks for a sibling with a smaller sequence number; if none exists, its own sequence number is the smallest and it holds the lock;
- Thread B fetches all child nodes, finds that it is not the smallest, and sets a watch on the node immediately smaller than its own;
- When thread A finishes, it deletes its own node; thread B receives the change event, checks whether it is now the smallest node, and if so acquires the lock.
Because the lock nodes are ephemeral, the lock is still released when the thread holding it unexpectedly crashes, so deadlock is avoided. In addition, the queue-and-watch mechanism gives us blocking semantics, and reentrancy can be implemented by carrying the thread ID in the Znode. Meanwhile, the high availability of a ZooKeeper cluster guarantees the availability of the distributed lock. However, because nodes must be created and deleted frequently, the ZooKeeper approach performs worse than the Redis approach.
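The recipe above is essentially what Apache Curator's InterProcessMutex implements, so in practice we rarely need to write it by hand. A minimal sketch, assuming a Curator dependency and a placeholder ZooKeeper connection string:

import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkLockExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // acquire() creates an ephemeral sequential node under /mylock and
        // waits until that node holds the smallest sequence number.
        InterProcessMutex mutex = new InterProcessMutex(client, "/mylock");
        if (mutex.acquire(3, TimeUnit.SECONDS)) {
            try {
                // critical section: the check-then-insert now runs serially across all nodes
            } finally {
                mutex.release();
            }
        }
        client.close();
    }
}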
4.2.3 Redis-based implementation
Redis is an open-source key-value store. It is memory-based and extremely fast, and is often used as a cache.
The core idea of a Redis-based distributed lock is: try to set a specific key; if the set succeeds (the key did not exist before), that is equivalent to acquiring the lock. At the same time, give the key an expiration time to avoid deadlock if the thread exits before releasing the lock. After finishing the synchronized task, the thread actively releases the lock with a delete command.
One thing that needs special attention here is how to lock and set the expiration time atomically. Some implementations use the two commands setnx + expire, but there is a problem: suppose the current thread executes setnx and acquires the lock, but crashes before executing expire; then the lock can never be released. Of course, we can combine the two commands in a single Lua script so that they commit atomically.
In fact, a single SET command can perform setnx and set the expiration time at the same time, completing the lock operation in one step:
SET key value [EX seconds] [PX milliseconds] NX
Unlocking only requires:
DEL key
5. A Redis-based distributed lock solution
In our case, we adopted the Redis-based way of implementing a distributed lock.
5.1 A Java implementation of the distributed lock
Because the project uses the Jedis framework and the online Redis is deployed in cluster mode, we encapsulated a RedisLock class on top of redis.clients.jedis.JedisCluster, providing lock and unlock interfaces.
public class RedisLock {

    private static final String LOCK_SUCCESS = "OK";
    private static final String LOCK_VALUE = "lock";
    private static final int EXPIRE_SECONDS = 3;

    @Autowired
    protected JedisCluster jedisCluster;

    // Acquire the lock: SET <key> <value> NX EX <seconds> succeeds only if the key does not exist yet.
    public boolean lock(String openId) {
        String redisKey = this.formatRedisKey(openId);
        String ok = jedisCluster.set(redisKey, LOCK_VALUE, "NX", "EX", EXPIRE_SECONDS);
        return LOCK_SUCCESS.equals(ok);
    }

    // Release the lock by deleting the key.
    public void unlock(String openId) {
        String redisKey = this.formatRedisKey(openId);
        jedisCluster.del(redisKey);
    }

    private String formatRedisKey(String openId) {
        return "keyPrefix:" + openId;
    }
}

In the concrete implementation we set an expiration time of 3 seconds, because the locked task is only a simple database query plus insert, and the server and database are deployed in the same machine room; under normal circumstances 3 seconds is more than enough for the code to finish.
In fact, the implementation above is a crude version of a Redis distributed lock. It does not consider reentrancy, nor the problem of the lock being mistakenly released by another process, but it is already sufficient for this business scenario. For a more general scenario, we could put an identifier specific to the current process into the value and verify it when locking and releasing, which yields a safer and more reliable Redis distributed lock.
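As a sketch of that improvement (an extension of the RedisLock above, not what we shipped): store a random token as the value when locking, and release with a Lua script that deletes the key only if the token still matches, so one process cannot mistakenly delete a lock now held by another:

import java.util.Collections;
import java.util.UUID;
import org.springframework.beans.factory.annotation.Autowired;
import redis.clients.jedis.JedisCluster;

public class SafeRedisLock {

    // Delete the key only if it still holds our token; the get + del pair runs atomically in Redis.
    private static final String UNLOCK_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
          + "  return redis.call('del', KEYS[1]) "
          + "else "
          + "  return 0 "
          + "end";

    @Autowired
    protected JedisCluster jedisCluster;

    // Returns the token on success (keep it for unlocking), or null if the lock is already held.
    public String lock(String redisKey, int expireSeconds) {
        String token = UUID.randomUUID().toString();
        String ok = jedisCluster.set(redisKey, token, "NX", "EX", expireSeconds);
        return "OK".equals(ok) ? token : null;
    }

    // True if we deleted our own lock; false if it had already expired or changed owner.
    public boolean unlock(String redisKey, String token) {
        Object result = jedisCluster.eval(UNLOCK_SCRIPT,
                Collections.singletonList(redisKey),
                Collections.singletonList(token));
        return Long.valueOf(1L).equals(result);
    }
}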
Of course, frameworks such as Redisson provide quite complete encapsulations of Redis distributed locks, and for business scenarios with stricter requirements I suggest using such a framework directly. Since this article focuses on the train of thought for locating and solving the problem, it does not go further into how Redisson implements its distributed lock; interested readers can find plenty of material online.
5.2 Improved code logic
Now we can use the encapsulated RedisLock to improve the original code.
public class AccountService {

    @Autowired
    private RedisLock redisLock;

    public void submit(String openId, String localIdentifier) {
        if (!redisLock.lock(openId)) {
            // Concurrent requests for the same openId: a thread that fails to grab the lock simply drops the request.
            return;
        }
        // Lock acquired: run the user-data synchronization logic.
        try {
            Account account = accountDao.find(openId);
            if (account == null) {
                // insert
            } else {
                // update
            }
        } finally {
            // Release the lock.
            redisLock.unlock(openId);
        }
    }
}

5.3 Data cleaning
Finally, a brief word about the finishing work. Because the amount of duplicate data was large, cleaning it up by hand was unrealistic, so we wrote a scheduled task class that runs a cleanup every minute, each run handling 1000 duplicated OpenIDs, to avoid the impact a burst of queries and deletes would have on database performance. Once we confirmed that all duplicate data had been cleaned up, we stopped the scheduled task and removed the code in the next version iteration.
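For reference, here is a minimal sketch of such a cleanup task, assuming Spring scheduling and JdbcTemplate; the batch size and schedule follow the text, while the SQL is illustrative (it keeps the row with the smallest ID, matching the behavior of the queries described earlier):

import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class DuplicateAccountCleaner {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    // Runs once a minute; each run cleans at most 1000 duplicated OpenIDs.
    @Scheduled(fixedDelay = 60_000)
    public void cleanBatch() {
        List<String> duplicated = jdbcTemplate.queryForList(
                "SELECT open_id FROM t_account GROUP BY open_id HAVING COUNT(*) > 1 LIMIT 1000",
                String.class);
        for (String openId : duplicated) {
            // Keep the row with the smallest id and delete the rest. The derived
            // table works around MySQL's restriction on deleting from a table
            // that is referenced in its own subquery.
            jdbcTemplate.update(
                    "DELETE FROM t_account WHERE open_id = ? AND id > "
                  + "(SELECT min_id FROM (SELECT MIN(id) AS min_id FROM t_account WHERE open_id = ?) t)",
                    openId, openId);
        }
    }
}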
6. Summary
All kinds of problems inevitably crop up in daily development. We should learn to follow the clues step by step to find the root cause, then look for feasible solutions within our own knowledge, carefully weigh the pros and cons of each option, and finally solve the problem efficiently.
Author: Quick App server R&D team - Lin Yupan