Flexible use of distributed locks to solve the problem of duplicate data insertion
1. Business background
Many consumer-facing Internet services maintain per-user data on the server side, and the Quick App Center is no exception. The Quick App Center allows users to favorite quick apps; the user's favorites list is recorded on the server, keyed by the account identifier OpenID and associated with the package names of the favorited apps.
To keep the favorite status consistent between the Quick App Center's favorites list and the quick app Menubar, we also record the binding between the user's account identifier OpenID and the client-side local identifier local_identifier. The Menubar is held by the quick app engine, which is independent of the Quick App Center: it cannot obtain the account OpenID through the account system and can only obtain the client's local_identifier. So we can only keep the two sides in sync through the mapping between the two identifiers.
Concretely, we trigger a synchronization when the user launches the Quick App Center: the client submits the OpenID together with its local identifier to the server for binding. The server's binding logic is: check whether the OpenID already exists; if not, insert a new row; otherwise, update the local_identifier field of the existing row (a user may log in to the same vivo account on two different phones). In subsequent business flows, we can then look up the local_identifier for a given OpenID, and vice versa.
However, some time after the code went live, we found many duplicate OpenID records in the t_account table. According to the binding logic described above, this should be impossible in theory. Fortunately, the duplicates had no impact on update and query scenarios, because the query SQL includes a LIMIT 1 clause, so for a given OpenID both updates and queries effectively operate only on the record with the smallest ID.
2. Analyzing and locating the problem
Although the redundant data had no impact on the actual business, such an obvious data problem was certainly not tolerable, so we started investigating.
The first thought was to start from the data itself. A rough look at the t_account table showed that about 3% of OpenIDs were duplicated. That is, duplicate insertion happened only occasionally; most requests were handled correctly as expected. We re-read the code and confirmed there were no obvious logical errors in the implementation.
Then we looked at the data more closely. We picked a few duplicated OpenIDs and queried their records, and found that the number of duplicates varied: some OpenIDs were duplicated only once, others more. At this point we noticed a more valuable clue: rows sharing the same OpenID had exactly the same creation time, and their auto-increment IDs were consecutive.
So we guessed the problem was caused by concurrent requests! We simulated concurrent client calls to the interface and indeed reproduced the duplicate insertion, further confirming the guess. But since the client's logic is to synchronize once per user at startup, why would there be concurrent requests for the same OpenID?
In fact, code does not run in as ideal an environment as we imagine. There are often unstable factors, such as the network or server load, that can cause a client request to "fail". That "failure" may not be a real failure: the request may simply take too long, exceed the client's timeout, be judged failed, and then be sent again by the retry mechanism. The same request may thus be submitted multiple times, and those requests may be blocked somewhere along the way (for example, when the server's worker threads are overloaded and requests pile up in a buffer queue); when the blockage clears, the requests may be processed concurrently within a very short window.
This is a typical concurrency-conflict problem, which can be abstracted as: how to avoid writing duplicate data under concurrency. Many common business scenarios face the same problem, for example forbidding duplicate user names at registration.
Generally speaking, the most intuitive way to handle such problems is to query first, and allow the insert only if the data does not yet exist in the database.
Obviously, this flow is fine from the perspective of a single request. But when multiple requests arrive concurrently, request A and request B both run the query first, both get the result "does not exist", and therefore both perform the insert, which ultimately causes the conflict.
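As a minimal sketch of this vulnerable check-then-insert flow (the DAO method names are hypothetical, mirroring the code shown later), the comments mark how two concurrent requests interleave:

// Not atomic as a pair: both request A and request B can pass the null
// check before either insert happens, producing two rows for one openId.
public void submit(String openId, String localIdentifier) {
    Account account = accountDao.find(openId);      // A reads: null; B reads: null
    if (account == null) {
        accountDao.insert(openId, localIdentifier); // A inserts, then B inserts again
    } else {
        accountDao.update(openId, localIdentifier);
    }
}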
3. Exploring feasible solutions
With the problem located, the next step was to find a solution. In this situation we usually have two options: let the database solve it, or solve it in the application.
3.1 Handling it at the database level: a unique index
With MySQL and the InnoDB storage engine, we can use a unique index to guarantee that values in a column are unique. Obviously, in the t_account table we did not create a unique index on the open_id column at the start. To add one now, we can use the following ALTER TABLE statement.
ALTER TABLE t_account ADD UNIQUE uk_open_id (open_id);
Once the unique index exists on the open_id column, when the concurrency above occurs, one of request A and request B is bound to insert first, and the other will receive an error like the one below. This guarantees that only one record with openid=xxx exists in the t_account table.
Error Code: 1062. Duplicate entry 'xxx' for key 'uk_open_id'
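Had we taken this route, the application would also need to treat error 1062 as a normal branch rather than a failure. Below is a minimal JDBC-style sketch of that idea, not the code we actually shipped; the table and column names follow the article, while the fall-back-to-update behavior is our assumption:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLIntegrityConstraintViolationException;
import javax.sql.DataSource;

public class AccountBinder {

    private final DataSource dataSource;

    public AccountBinder(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void bind(String openId, String localIdentifier) throws Exception {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement insert = conn.prepareStatement(
                     "INSERT INTO t_account (open_id, local_identifier) VALUES (?, ?)")) {
            insert.setString(1, openId);
            insert.setString(2, localIdentifier);
            insert.executeUpdate();
        } catch (SQLIntegrityConstraintViolationException e) {
            // Duplicate entry on uk_open_id (MySQL error 1062): the row already
            // exists, so fall back to updating its local_identifier instead.
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement update = conn.prepareStatement(
                         "UPDATE t_account SET local_identifier = ? WHERE open_id = ?")) {
                update.setString(1, localIdentifier);
                update.setString(2, openId);
                update.executeUpdate();
            }
        }
    }
}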
3.2 Handling it at the application level: a distributed lock
The other option is not to rely on the underlying database for uniqueness, but on the application's own code logic to avoid the conflict. An application-level guarantee is actually the more general approach, since we cannot assume that every persistence component used by a system supports uniqueness checks.
How? In short, turn concurrent actions into serial ones. We hit the duplicate-insert problem because "check whether the data exists" and "insert the data" are two separate actions. Because the two steps are not atomic, two different requests can both pass the first check. If we can merge the two actions into one atomic operation, the conflict disappears. This is where a lock comes in, to give the code block atomicity.
In Java, the most familiar locking mechanism is the synchronized keyword.
public synchronized void submit(String openId, String localIdentifier) {
    Account account = accountDao.find(openId);
    if (account == null) {
        // insert
    } else {
        // update
    }
}

However, it is not that simple. Remember that our program is not deployed on a single server but on multiple nodes; the concurrency here is not just between threads but between processes. Therefore, a Java-level locking mechanism cannot solve this synchronization problem. What we need here is a distributed lock.
3.3 The trade-off between the two solutions
From the analysis above, both options seem feasible, but in the end we chose the distributed lock. Why not the first option, which only requires adding an index?
Because the existing online data already contains duplicates in the open_id column, adding a unique index directly would fail. To add the index, we would first have to clean up the existing duplicates. But here comes the problem: the online service keeps running and may keep producing new duplicates. Could we find a window of low user activity to clean the data and finish creating the unique index before new duplicates are inserted? In principle, yes; but such a plan would require coordination among operations, DBAs, and developers, and given the traffic pattern the most suitable window would be in the dead of night. Even with such a strict procedure, there is no 100% guarantee that no new duplicate gets inserted between the cleanup and the index creation. So the unique-index fix, which looks very fitting at first glance, is actually somewhat troublesome to carry out.
In fact, the best time to create the unique index would have been in the initial design phase, which would have prevented the duplicates entirely. But what's done is done, and in the current situation we chose the more operable distributed lock approach: we can first release the distributed-lock fix to block new duplicate insertions, then clean up the existing duplicates, so only one code change and one release are needed. Of course, once the problem is fully resolved, we may reconsider adding a unique index to the table.
Next, let's look at how to implement the distributed-lock approach, starting with a review of distributed locks.
4. Distributed lock overview
4.1 What are the characteristics of a distributed lock?
- In a distributed system, only one thread on one machine may hold the lock at any time;
- Acquiring and releasing the lock must be highly available;
- Acquiring and releasing the lock must be high-performance;
- Reentrancy;
- A lock-expiry mechanism, to prevent deadlock;
- Blocking / non-blocking lock semantics.
4.2 How can a distributed lock be implemented?
There are three main ways to implement a distributed lock:
- based on a database;
- based on ZooKeeper;
- based on Redis.
4.2.1 Database-based implementation
The database-based approach is to create a dedicated lock table and implement locking and unlocking by manipulating its rows. Taking MySQL as an example, we can create a table like the following, with a unique-index constraint on method_name:
CREATE TABLE `myLock` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `method_name` varchar(100) NOT NULL DEFAULT '' COMMENT 'name of the locked method',
  `value` varchar(1024) NOT NULL DEFAULT 'lock information',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uidx_method_name` (`method_name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='methods currently locked';
Then we can lock and unlock by inserting and deleting rows:
# lock
insert into myLock(method_name, value) values ('m1', '1');

# unlock
delete from myLock where method_name = 'm1';

The database-based implementation is simple, but it has some obvious problems:
- There is no lock expiry time. If unlocking fails, the lock record stays in the database forever, causing a deadlock.
- The lock is not reentrant, because the table cannot tell whether the requester is the thread that currently holds the lock.
- The database is a single point; if it goes down, the whole locking mechanism collapses.
4.2.2 ZooKeeper-based implementation
ZooKeeper is an open-source component that provides consistency services for distributed applications. Internally it is a hierarchical, file-system-like tree of nodes, and it guarantees that node names under the same directory are unique.
ZooKeeper nodes (Znodes) come in four types:
- Persistent node (the node still exists after the session disconnects)
- Persistent sequential node
- Ephemeral node (the node is deleted when the session disconnects)
- Ephemeral sequential node
When a new Znode is created as a sequential node, ZooKeeper sets its path by appending a 10-digit sequence number to the original name. For example, if a Znode with path /mynode is created as a sequential node, ZooKeeper changes the path to /mynode0000000001 and sets the next sequence number to 0000000002. The sequence number is maintained by the parent node, so if two sequential nodes are created concurrently, ZooKeeper never assigns the same number to both Znodes.
Based on these characteristics of ZooKeeper, a distributed lock can be implemented as follows:
- Create a directory node mylock;
- Thread A acquires the lock by creating an ephemeral sequential node under the mylock directory;
- Thread A fetches all child nodes of mylock and checks for a sibling with a smaller sequence number; if none exists, its own sequence number is the smallest and it holds the lock;
- Thread B fetches all child nodes, finds that it is not the smallest, and sets a watch on the node immediately smaller than its own;
- When thread A finishes, it deletes its own node; thread B receives the change event, checks whether it is now the smallest node, and if so acquires the lock.
Because the lock nodes are ephemeral, the lock is still released when the thread holding it unexpectedly crashes, so deadlock is avoided. In addition, the queue-and-watch mechanism gives us blocking semantics, and reentrancy can be implemented by carrying the thread ID in the Znode. Meanwhile, the high availability of a ZooKeeper cluster guarantees the availability of the distributed lock. However, because nodes must be created and deleted frequently, the ZooKeeper approach performs worse than the Redis approach.
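The recipe above is essentially what Apache Curator's InterProcessMutex implements, so in practice we rarely need to write it by hand. A minimal sketch, assuming a Curator dependency and a placeholder ZooKeeper connection string:

import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkLockExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // acquire() creates an ephemeral sequential node under /mylock and
        // waits until that node holds the smallest sequence number.
        InterProcessMutex mutex = new InterProcessMutex(client, "/mylock");
        if (mutex.acquire(3, TimeUnit.SECONDS)) {
            try {
                // critical section: the check-then-insert now runs serially across all nodes
            } finally {
                mutex.release();
            }
        }
        client.close();
    }
}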
4.2.3 Redis-based implementation
Redis is an open-source key-value store. It is memory-based and extremely fast, and is often used as a cache.
The core idea of a Redis-based distributed lock is: try to set a specific key; if the set succeeds (the key did not exist before), that is equivalent to acquiring the lock. At the same time, give the key an expiration time to avoid deadlock if the thread exits before releasing the lock. After finishing the synchronized task, the thread actively releases the lock with a delete command.
One thing that needs special attention here is how to lock and set the expiration time atomically. Some implementations use the two commands setnx + expire, but there is a problem: suppose the current thread executes setnx and acquires the lock, but crashes before executing expire; then the lock can never be released. Of course, we can combine the two commands in a single Lua script so that they commit atomically.
In fact, a single SET command can perform setnx and set the expiration time at the same time, completing the lock operation in one step:
SET key value [EX seconds] [PX milliseconds] NX
Unlocking only requires:
DEL key
5. A Redis-based distributed lock solution
In our case, we adopted the Redis-based way of implementing a distributed lock.
5.1 A Java implementation of the distributed lock
Because the project uses the Jedis framework and the online Redis is deployed in cluster mode, we encapsulated a RedisLock class on top of redis.clients.jedis.JedisCluster, providing lock and unlock interfaces.
public class RedisLock {

    private static final String LOCK_SUCCESS = "OK";
    private static final String LOCK_VALUE = "lock";
    private static final int EXPIRE_SECONDS = 3;

    @Autowired
    protected JedisCluster jedisCluster;

    // Acquire the lock: SET <key> <value> NX EX <seconds> succeeds only if the key does not exist yet.
    public boolean lock(String openId) {
        String redisKey = this.formatRedisKey(openId);
        String ok = jedisCluster.set(redisKey, LOCK_VALUE, "NX", "EX", EXPIRE_SECONDS);
        return LOCK_SUCCESS.equals(ok);
    }

    // Release the lock by deleting the key.
    public void unlock(String openId) {
        String redisKey = this.formatRedisKey(openId);
        jedisCluster.del(redisKey);
    }

    private String formatRedisKey(String openId) {
        return "keyPrefix:" + openId;
    }
}

In the concrete implementation we set an expiration time of 3 seconds, because the locked task is only a simple database query plus insert, and the server and database are deployed in the same machine room; under normal circumstances 3 seconds is more than enough for the code to finish.
In fact, the implementation above is a crude version of a Redis distributed lock. It does not consider reentrancy, nor the problem of the lock being mistakenly released by another process, but it is already sufficient for this business scenario. For a more general scenario, we could put an identifier specific to the current process into the value and verify it when locking and releasing, which yields a safer and more reliable Redis distributed lock.
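As a sketch of that improvement (an extension of the RedisLock above, not what we shipped): store a random token as the value when locking, and release with a Lua script that deletes the key only if the token still matches, so one process cannot mistakenly delete a lock now held by another:

import java.util.Collections;
import java.util.UUID;
import org.springframework.beans.factory.annotation.Autowired;
import redis.clients.jedis.JedisCluster;

public class SafeRedisLock {

    // Delete the key only if it still holds our token; the get + del pair runs atomically in Redis.
    private static final String UNLOCK_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
          + "  return redis.call('del', KEYS[1]) "
          + "else "
          + "  return 0 "
          + "end";

    @Autowired
    protected JedisCluster jedisCluster;

    // Returns the token on success (keep it for unlocking), or null if the lock is already held.
    public String lock(String redisKey, int expireSeconds) {
        String token = UUID.randomUUID().toString();
        String ok = jedisCluster.set(redisKey, token, "NX", "EX", expireSeconds);
        return "OK".equals(ok) ? token : null;
    }

    // True if we deleted our own lock; false if it had already expired or changed owner.
    public boolean unlock(String redisKey, String token) {
        Object result = jedisCluster.eval(UNLOCK_SCRIPT,
                Collections.singletonList(redisKey),
                Collections.singletonList(token));
        return Long.valueOf(1L).equals(result);
    }
}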
Of course, frameworks such as Redisson provide quite complete encapsulations of Redis distributed locks, and for business scenarios with stricter requirements I suggest using such a framework directly. Since this article focuses on the train of thought for locating and solving the problem, it does not go further into how Redisson implements its distributed lock; interested readers can find plenty of material online.
5.2 Improved code logic
Now we can use the encapsulated RedisLock to improve the original code.
public class AccountService {

    @Autowired
    private RedisLock redisLock;

    public void submit(String openId, String localIdentifier) {
        if (!redisLock.lock(openId)) {
            // Concurrent requests for the same openId: a thread that fails to grab the lock simply drops the request.
            return;
        }
        // Lock acquired: run the user-data synchronization logic.
        try {
            Account account = accountDao.find(openId);
            if (account == null) {
                // insert
            } else {
                // update
            }
        } finally {
            // Release the lock.
            redisLock.unlock(openId);
        }
    }
}

5.3 Data cleaning
Finally, a brief word about the finishing work. Because the amount of duplicate data was large, cleaning it up by hand was unrealistic, so we wrote a scheduled task class that runs a cleanup every minute, each run handling 1000 duplicated OpenIDs, to avoid the impact a burst of queries and deletes would have on database performance. Once we confirmed that all duplicate data had been cleaned up, we stopped the scheduled task and removed the code in the next version iteration.
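For reference, here is a minimal sketch of such a cleanup task, assuming Spring scheduling and JdbcTemplate; the batch size and schedule follow the text, while the SQL is illustrative (it keeps the row with the smallest ID, matching the behavior of the queries described earlier):

import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class DuplicateAccountCleaner {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    // Runs once a minute; each run cleans at most 1000 duplicated OpenIDs.
    @Scheduled(fixedDelay = 60_000)
    public void cleanBatch() {
        List<String> duplicated = jdbcTemplate.queryForList(
                "SELECT open_id FROM t_account GROUP BY open_id HAVING COUNT(*) > 1 LIMIT 1000",
                String.class);
        for (String openId : duplicated) {
            // Keep the row with the smallest id and delete the rest. The derived
            // table works around MySQL's restriction on deleting from a table
            // that is referenced in its own subquery.
            jdbcTemplate.update(
                    "DELETE FROM t_account WHERE open_id = ? AND id > "
                  + "(SELECT min_id FROM (SELECT MIN(id) AS min_id FROM t_account WHERE open_id = ?) t)",
                    openId, openId);
        }
    }
}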
6. Summary
All kinds of problems inevitably crop up in daily development. We should learn to follow the clues step by step to find the root cause, then look for feasible solutions within our own knowledge, carefully weigh the pros and cons of each option, and finally solve the problem efficiently.
Author: Quick App server R&D team - Lin Yupan